As it stands, we cannot fully disable a service that was at one time enabled on a server, short of disabling the service-type feature. For example, when we enable 'mysql' on a server (in the front-end), the verify task that follows adds entries to the context such as:

  'db_service_type' => 'mysql',
  'db_port' => '3306',
  'master_db' => 'mysql://user:password@hostname',

Disabling the 'mysql' service in the front-end doesn't remove these entries. Similarly, in hosting_https, we have a 'Certificate' service-type with 'LetsEncrypt' and 'SelfSigned' services. Enabling and switching between these services works just fine. However, setting the service to "None" leaves the last 'Certificate_service_type' in the context.

If we disable the service-type module, then the extra parameters in the context go away. A problem arises, however, if we disable the service modules first. Provision then tries to find the service class (e.g. Provision_Service_Certificate_LetsEncrypt), but since the back-end is no longer included from ~aegir/.drush/drushrc.php, the lookup fails. This, in turn, makes verifying the front-end impossible, so re-enabling the necessary module is ineffective at fixing the now-broken Aegir install.


Comments

ergonlogic created an issue.

ergonlogic’s picture

See the attached patches for a working solution.

I believe what was happening here is that, when no services were enabled for a given service type, we would pass "--TYPE_service_type=0" to provision-save. We then run array_filter() over CLI and STDIN options, before merging them onto the existing context values. So, our "0" was never making it to the service registration, and thus never overwriting any existing values. In fact, nothing false-y could make it through that filtering.
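To make the filtering behaviour concrete, here is a minimal sketch in plain PHP. The variable names and the use of array_merge() are assumptions for illustration only; this is not the actual provision-save code. It only shows that array_filter() without a callback discards the string "0", so the stored value is never overwritten.

```php
<?php
// Hypothetical stand-ins for the stored context and the incoming CLI options.
$context = array('db_service_type' => 'mysql', 'db_port' => '3306');
$cli_options = array('db_service_type' => '0');

// Without a callback, array_filter() removes every empty (false-y) entry,
// and the string "0" counts as empty.
$filtered = array_filter($cli_options);

// The override is already gone, so the merge keeps the old value.
$merged = array_merge($context, $filtered);

var_dump($filtered);
echo $merged['db_service_type'], "\n";
```

The filtered array comes back empty, and 'db_service_type' is still 'mysql' after the merge, which matches the symptom described in the summary.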

So, the patches here simply set "--TYPE_service_type=NONE" in the front-end, and check for that "NONE" string in the back-end. Kinda ugly, IMO. But the alternative would be to start playing with those array_filter()s when we build the options arrays, which could have unintended consequences.
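A rough sketch of the sentinel idea, again in plain PHP. The helper name merge_service_options() and the choice to unset the key are assumptions for illustration; the attached patches may handle "NONE" differently in the back-end. The point is only that a non-empty string survives array_filter() and can then be recognised explicitly.

```php
<?php
// Hypothetical helper for illustration; not Provision's actual code.
function merge_service_options(array $context, array $options) {
  // Same filter-then-merge step as described above: false-y values are
  // dropped, but the non-empty string 'NONE' survives the filter.
  $merged = array_merge($context, array_filter($options));

  // Assumption: a 'NONE' value means "no service", so remove the key.
  foreach ($merged as $key => $value) {
    if ($value === 'NONE') {
      unset($merged[$key]);
    }
  }
  return $merged;
}

$context = array('Certificate_service_type' => 'LetsEncrypt');
$result = merge_service_options($context, array('Certificate_service_type' => 'NONE'));
var_dump(isset($result['Certificate_service_type']));
```

In this sketch the stale 'Certificate_service_type' entry no longer appears in the result, whereas passing '0' instead of 'NONE' would leave it in place.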

helmo’s picture

looks reasonable.

helmo’s picture

Status: Needs review » Fixed

Committed to both provision and hosting

helmo’s picture

Status: Fixed » Needs work

It's good we have travis now ... This failed https://travis-ci.org/aegir-project/hosting/builds/169247766

I reproduced locally to get the actual error 'Unable to load NONE driver for the db service: Expecting class Provision_Service_db_NONE' and a notice "Undefined index: db server.php:121"

helmo’s picture

It turned out my local dev was missing the patch for provision in #5 ... after making sure that was applied the test suite ran just fine.

However travis still fails, even after restarting it (https://travis-ci.org/aegir-project/hosting/builds/169324314)

I created a PR just to try it with this hosting commit reverted ... https://github.com/aegir-project/hosting/pull/7 ... that gets a green light.

So... the hosting part is now reverted.

ergonlogic’s picture

Status: Needs work » Needs review

Drush's stability has been pretty spotty lately. I'd suggest pinning to a known-good release (i.e., 8.1.3). We could presumably add a matrix to test against the latest stable and unstable releases too, perhaps with allow_failures, to make it easier to detect where issues are coming from.

In this case though, I think the problem is that our Dockerfile appears to be downloading the latest dev tarball of Provision. Since this fix requires both Hosting and Provision patches in order to operate, and the snapshot is only updated nightly (iirc), I don't think this will ever pass.

If that's the case, then we should presumably figure out a way to use the latest Provision code in these tests. Ideally, we'd be able to specify a branch when cloning Provision. Perhaps we could just match the Provision branch to the Hosting one, which should allow us to test issues such as this one without committing to 7.x-3.x.

I'm setting this issue back to "needs review", since I don't believe the issue is with the patches here.

helmo’s picture

Status: Needs review » Fixed

Did a new test run on https://github.com/aegir-project/hosting/pull/8 Green :)

The provision patch is already included in 3.8 so now I committed the hosting part.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.