Hi.

SSL does not appear to work for clusters in Aegir 3.2. Here are my environment and the steps to reproduce:

- 1 MySQL server (sqlmaster) in addition to the Aegir master
- 2 web servers (web1 and web2, both apache_ssl) in addition to the Aegir master
- 1 web cluster containing web1 and web2

I create a platform on web1 with a site on it, generating a new certificate. Everything works great.

However, when I try to create a site on the cluster, it does not work, neither with an existing certificate nor with a newly generated one.

Here are some of the symptoms for the cluster site:
- ~aegir/.drush/site-name.drush.alias.drushrc.php: ssl_enabled and ssl_key are MISSING
- ~aegir/config/ssl/new_cert_name is NOT GENERATED
- ~aegir/config/server_web[1,2]/ssl/new_cert_name is NOT COPIED OVER, whether using an existing certificate or a new one
- ~aegir/config/server_web[1,2]/apache/vhost.d/site-name does NOT have the SSL section (port 443), only the port 80 section
- ~aegir/config/server_cluster is EMPTY (presumably by design)


Comments

captainack created an issue.

gboudrias

Issue summary: updated

Do you know if this is a regression, i.e., did this work in 3.1?

captainack

Well, 3.1 had a different SSL problem: the parent ssl.d folder was not being rsynced over (I believe this affected all remote servers, not just clusters). I noticed a fix in the 3.2 release notes, so I upgraded. Remote apache_ssl servers now work, but clusters still do not.

captainack

I've probed further. I'm new to all the code, so I hope I'm barking up the right tree.

But it looks like on the back end, $this->context->ssl_enabled is unset in Provision_Service_http_ssl::config_data(), which is the root of the problem.

That ssl_enabled value comes from ~aegir/.drush/site-name.drush.alias.drushrc.php, which, as I mentioned before, is missing both ssl_enabled and ssl_key.

On the front end, I can see that this value is generated by the hook implementation in modules/aegir/hosting/web_server/ssl/hosting_ssl.drush.inc, but I have no idea what includes that file, so this is where I stopped.

Is this file included in the cluster path?

captainack

I had a little time, so I verified that the hosting_ssl_hosting_site_context_options() hook implementation is being called. But for some reason, by the time execution reaches the template file, ssl_enabled is no longer there. I don't know what happens between the two, and this is as far as my understanding of Aegir/Drush internals will take me.

I guess first things first: is my assumption that ssl_enabled should appear in the site alias file correct in all cases (even for clusters)?

It isn't documented anywhere, and that seems to be the root of the problem.

captainack

Thanks to ergonlogic's help, I now understand why the back end refuses to write it out: setProperty() must be called on a property for it to be persisted by provision-save. It is documented, but it's one of those things you have to go looking for :-/.

Anyway, I adapted the approach the web pack service uses to the cluster service (using _each_server()), and everything seems to work. Patch coming.
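To illustrate the persistence rule described above, here is a toy model in Python. It is not Aegir's PHP code: Context, set_property() and save() are hypothetical stand-ins for a provision context, setProperty() and provision-save, under the assumption that a value assigned to a context but never registered is simply dropped when the alias file is written.

```python
class Context:
    """Toy stand-in for a provision context (hypothetical, not Aegir's API)."""

    def __init__(self, name):
        self.name = name
        self._registered = set()  # property names that save() will persist
        self._values = {}         # everything assigned, registered or not

    def set_property(self, field, default=None):
        # Register the field for persistence; keep a value if one is already set.
        self._registered.add(field)
        self._values.setdefault(field, default)

    def __setitem__(self, field, value):
        # Plain assignment: held in memory only, NOT registered.
        self._values[field] = value

    def __getitem__(self, field):
        return self._values.get(field)

    def save(self):
        # Model of provision-save: only registered properties are written out.
        return {k: v for k, v in self._values.items() if k in self._registered}


site = Context("site-name")
site["ssl_enabled"] = 1            # the front end passes the option...
print(site.save())                 # {}  (dropped: never registered)
site.set_property("ssl_enabled", 0)
print(site.save())                 # {'ssl_enabled': 1}
```

In this model the value survives only after registration, which mirrors why ssl_enabled and ssl_key were missing from the cluster site's alias file.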

captainack

Status: Active » Needs review

ergonlogic

Version: 7.x-3.2 » 7.x-3.x-dev
Status: Needs review » Reviewed & tested by the community

I don't believe this is a regression; it's more likely a long-standing bug. I hadn't come across it because, whenever I use clustering, I usually put an SSL endpoint on a load balancer in front of the cache servers, so SSL was never an issue. I don't think web_pack suffers from it either, since config/ is mounted via NFS, IIRC.

Working through this with Amin, the need to re-dispatch init_site() to the cluster nodes via _each_server() was the obvious point of failure (if anything can be said to be obvious in Aegir).

I haven't tested this myself, but it works for Amin, and that's enough for me. I'll mark it reviewed to ensure it gets into 3.3, but I'd like another core maintainer to take a second look before it's committed.
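The failure point can be sketched as a toy model in Python. This is not Aegir's PHP: ClusterService, SslWebService and forward_init_site are hypothetical names, and a plain dict stands in for the site context, assuming each apache_ssl node registers ssl_enabled and ssl_key during init_site(). The point it shows is that the cluster service has to fan init_site() out to every member node, as _each_server() does, or the nodes' SSL hook never runs.

```python
class SslWebService:
    """Hypothetical stand-in for an apache_ssl node's http service."""

    def __init__(self, server_name):
        self.server_name = server_name

    def init_site(self, site_props):
        # Register the SSL properties on the site (modelled as dict keys).
        site_props.setdefault("ssl_enabled", 0)
        site_props.setdefault("ssl_key", None)


class ClusterService:
    """Hypothetical cluster service that may or may not forward init_site()."""

    def __init__(self, nodes, forward_init_site=True):
        self.nodes = nodes
        self.forward_init_site = forward_init_site

    def _each_server(self, method, *args):
        # Call the same method on every member node, collecting the results.
        return [getattr(node, method)(*args) for node in self.nodes]

    def init_site(self, site_props):
        # Without this fan-out, the nodes' SSL registration never happens.
        if self.forward_init_site:
            self._each_server("init_site", site_props)


cluster = ClusterService([SslWebService("web1"), SslWebService("web2")],
                         forward_init_site=False)
props = {}
cluster.init_site(props)
print(props)  # {}  (the bug: SSL properties never registered)

cluster.forward_init_site = True
cluster.init_site(props)
print(props)  # {'ssl_enabled': 0, 'ssl_key': None}
```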

helmo

Status: Reviewed & tested by the community » Fixed

There's no good/easy way to test this, but it looks good.

  • helmo committed 508c3e6 on 7.x-3.x authored by captainack
    Issue #2613716 by captainack, ergonlogic: Cluster SSL not working for 2...
helmo

Issue tags: +Aegir 3.3

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.