The Drupal 7.50 release broke our upgrade test from Aegir 6.x-2.x to 7.x-3.x ....

The following module has moved within the file system: admin_menu. In [error]
order to fix this, clear caches or put the module back in its
original location. For more information, see the documentation page.
in _drupal_trigger_error_with_delayed_logging() (line 1128 of
/var/aegir/hostmaster-7.x-3.x-2016-07-08-0323/includes/bootstrap.inc).

Change record: https://www.drupal.org/node/2581445

Unless anyone finds time to debug this I think it's time to deprecate the upgrade path ... So anyone still wanting to upgrade from 6.x-2.x should first upgrade to Aegir 3.6 (with Drupal 7.44) and then further.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

helmo created an issue. See original summary.

memtkmcc’s picture

FileSize
751 KB

BTW, it doesn't affect BOA, since we run drush @hm rr during hostmaster upgrade, automatically. Just tested and it worked flawlessly:

6.38 -> 7.50

helmo’s picture

@memtkmcc thanks for the report ... would you be able to provide a patch to let Aegir use the same solution?

memtkmcc’s picture

I have checked our forked Provision code, and while there is the `registry-rebuild` command we have added, it is never run against hostmaster, so it looks like it works for us probably because we are using non-vanilla D6 and D7 cores (both with numerous patches), and not because we have `registry-rebuild` added there. Not sure, since I didn't test this with vanilla D6/D7.

Here is the (unrelated to this issue, though) code we have in place. Do you think we could still propose a patch to include it by default?

/**
 * Implementation of drush_hook_post_COMMAND().
 */
function drush_provision_drupal_post_provision_deploy() {
  // Run registry-rebuild without cache-clear, if available.
  if (function_exists('drush_registry_rebuild')) {
    if (drush_drupal_major_version() < 8 && d()->profile != 'hostmaster') {
      provision_backend_invoke(d()->name, 'registry-rebuild --no-cache-clear');
      drush_log(dt('Registry rebuilt with --no-cache-clear in drush_provision_drupal_post_provision_deploy'));
    }
  }
  // Call the drush updatedb command.
  provision_backend_invoke(d()->name, 'updatedb');
  // Run registry-rebuild with cache-clear, if available.
  if (function_exists('drush_registry_rebuild')) {
    if (drush_drupal_major_version() < 8 && d()->profile != 'hostmaster') {
      provision_backend_invoke(d()->name, 'registry-rebuild');
      drush_log(dt('Registry rebuilt without --no-cache-clear in drush_provision_drupal_post_provision_deploy'));
    }
  }

The interesting thing here is that we run `registry-rebuild` twice, but the first time without clearing caches, because it could break things if `updated` modifies tables structure, since cache clear would push bootstrap phase too far and too early. The second rebuild uses a full cache clear (default mode), and both are needed to avoid problems with paths changes due to includes moved in some modules, etc.

I think there was a reason we have added the `d()->profile != 'hostmaster'` exception here before, but maybe it was required to avoid broken hostmaster upgrades before we realized we need to run `registry-rebuild` twice, and the first w/o cache clear, so maybe you could try if this code fixes the problem when using vanilla D6/D7 core, if the hostmaster exception is removed and registry-rebuild extension is present?

I think we could at least test with BOA that part with removing hostmaster exception to see if anything breaks on D6-D7 upgrade with custom core.

helmo’s picture

Thanks @memtkmcc I created a patch from it and posted it in #1694792: Allow provision-deploy to move a site to a platform with different module paths

I still have to test it though.

helmo’s picture

ShaneOnABike’s picture

Priority: Normal » Major

I am going to change the status to Major because essentially anyone attempting to upgrade to the new Aegir cannot. They end up in this mid-point state due to this issue... I would love a fix!

Jon Pugh’s picture

Category: Plan » Bug report

Definitely a bug, I'm tempted to mark it critical.

This is something we should add to our automated testing.

helmo’s picture

Jon Pugh’s picture

Oh! Hah, ok, then let me rephrase:

We should block committing until those tests pass!

Or instead... Add this test to the new behat tests, then make every contribution a pull request: the main branches are configured on github to not allow merging until PRs past tests.

ergonlogic’s picture

Just a reminder, but this was caused by a (questionable) change in Drupal core, not a faulty commit to Aegir.

I think it's time to deprecate the upgrade path ... So anyone still wanting to upgrade from 6.x-2.x should first upgrade to Aegir 3.6 (with Drupal 7.44) and then further.

I don't know how workable that'd be, since it isn't like we keep a 3.6 Debian package around for folks to upgrade from 2.x...

ergonlogic’s picture

Also, the current failures in the 6-to-7 tests are of a different nature:

Unknown option: --force. See `drush help hosting-tasks` for [error]
available options. To suppress this error, add the option --strict=0.

ShaneOnABike’s picture

Category: Bug report » Plan

@ergonlogic: Hey is there any quick workaround that folks can use in the meantime? Tragically, anyone trying to do the upgrade at this point cannot because (well) missing module warnings are considered an invalid upgrade :(

memtkmcc’s picture

It seems to confirm that the registry-rebuild we are using in BOA is in fact helpful only for hosted sites, and is not run against hostmaster, so not expected to help here.

However, it just works for us, and I don't think it works because we are using Pressflow 6 and modified Drupal 7 cores, unless we have patches which prevent this warning.

We are running drush @hostmaster registry-rebuild directly before hostmaster-migrate and then again, immediately after hostmaster-migrate.

But it should still fail, I think, while hostmaster-migrate runs.

So why it does work for us?

Before I will try to dig further, can't we somehow silence all those warnings on upgrade from 2.x ?

memtkmcc’s picture

I mean, if nothing else helps, maybe just force on 2.x to 3.x upgrade:

  error_reporting(0);
  ini_set('display_errors', FALSE);
  ini_set('display_startup_errors', FALSE);
memtkmcc’s picture

Still, running drush @hm registry-rebuild after the upgrade seems to be recommended to avoid issues. At least, it helps to avoid WSOD directly after the upgrade in BOA.

memtkmcc’s picture

It should be easy to test on vanilla Aegir after adding these lines in the local.settings.php in the hostmaster site.

helmo’s picture

I tested with the lines from #15 and that seemed to work :)

After that I also added pm-disable for a few modules that are D6 only.
That does not suppress all the missing module messages, but modalframe jquery_update are dependancies.

Commits to the test suite: http://drupalcode.org/puppet-aegir/commit/?id=d40f142 and http://drupalcode.org/puppet-aegir/commit/?id=039ee44

Tests running now: http://ci.aegirproject.org/view/Upgrades/job/U_aegir_6.x-2.x_to_7.x-3.x_...

memtkmcc’s picture

OK, so this is why it works in BOA then -- we force this in global.inc by default, which is a good practice, by the way.

ergonlogic’s picture

@shaneonabike: It looks like adding the lines in #15 to the Aegir site's local.settings.php will allow the upgrade to succeed (or at least, ignore errors). The upgrade path that I recommend (for Aegir 2.x to Aegir 3.x) is actually to provision a new server, install Aegir 3 on it, and use remote_migrate to move all your sites over to it. Since Aegir 2 is on Drupal 6, that has been EOL for quite a while now, I don't know how much longer we can support a direct upgrade path.

helmo’s picture

Status: Active » Fixed

I've documented the extra steps on http://docs.aegirproject.org/en/3.x/install/upgrade/#major-upgrade-from-... please review.

Jon Pugh’s picture

Interesting...

I'm glad we found a workaround, but is this really a good enough fix?

Would it be possible to add the "error_reporting" hack into a hook_update_N()?

Even better, what about using a hook_update_N() to manually tweak the database so we don't get an error at all?

I know for a fact there are a lot of Aegir 1.x/2.x installs still out there, and the "recommended upgrade" is a lot of work for most.

Regarding Drupal 6 being EoL, it's becoming more and more clear that Drupal 6 isn't going anywhere. There are still plenty of Aegir 1.x out there

There are now 3 vendors providing Drupal 6 LTS (https://www.drupal.org/project/d6lts) and now there is a web app for it: https://tag1consulting.com/blog/tag1-quo-drupal-6-long-term-support

Let's help make it as easy as possible for the long tail Aegir community to upgrade.

ShaneOnABike’s picture

Status: Fixed » Active

Yahoo thanks very much for the documentation as well as posting a solution it works! It sucks that there wasn't an additional module added in drupal that does a proper cleanup. I feel bad for people migrating sites from D6 -> D7 that were originally in D5 (myself included). What a mess of 'missing' modules.

memtkmcc’s picture

You can't really "manually tweak the database so we don't get an error at all", and while using registry-rebuild could help for relocated modules, it will not help with "missing" modules. But why not to disable and uninstall problematic modules before running the upgrade? Then enable these still used (and relocated) after the upgrade. This should even remove the need to use registry-rebuild pre/post upgrade.

We are using this method in BOA, and it works perfectly for 2.x to 3.x upgrades.

ergonlogic’s picture

We'll need something that can work for the Debian packages in a completely automated fashion. I'd suggest that we explore a pre-hostmaster-migrate hook. Presumably we could check whether we're coming from a D6 platform, and simply add the lines from #15, if that's the case.

ergonlogic’s picture

Category: Plan » Bug report
Status: Active » Needs review
FileSize
2.17 KB

This patch appears to mostly work. I need to test the backup/cleanup of local.settings.php, but it allowed the upgrade to succeed with the Debian package.

ergonlogic’s picture

FileSize
2.19 KB

Fixed the cleanup with this latest patch.

ergonlogic’s picture

I committed and pushed that patch, then reverted the tweaks to the tests in puppet-aegir. The upgrade 2.x to 3.x test still succeeds: http://ci.aegirproject.org/view/Upgrades/job/U_aegir_6.x-2.x_to_7.x-3.x_...

The takeaway from this issue is just how important it is to maintain a stable API. All this hassle because somewhere in a 630-comment long issue, a new parameter was added with the (imho) wrong default ($trigger_error = TRUE).

I'm not 100% convinced that this won't bite us again though. How sure are we that the system table is getting clean up? We may want to add a hook_update_N() that effectively performs a registry rebuild.

memtkmcc’s picture

We should check if local.settings.php exists before trying to create .bak, and then check again if .bak exists before restoring original file, to avoid false errors, perhaps?

memtkmcc’s picture

Also, not sure why we are using chmod 02740 ? I mean, the executable bit is not expected on this file.

ergonlogic’s picture

Aegir creates an empty local.settings.php file on install at this point, but I suppose it might not exist if the system is very old. That said, it should fail with warnings in that case, and file_put_contents() should still create the temporary file. Still, it wouldn't have a backup to restore from afterwards, in that case. I think just creating an empty local.settings.php should be sufficient though, as the backup is created within the same function as it's later used.

Also, not sure why we are using chmod 02740?

Just laziness, I guess :p

I suppose that 02640 should be sufficient. Good catch.

ergonlogic’s picture

Here's a follow-up patch.

helmo’s picture

Status: Needs review » Fixed
helmo’s picture

Status: Fixed » Active

Jenkins failed again ... could it be that the last commit was pushed a bit later then the commit time?
http://ci.aegirproject.org/view/Upgrades/job/U_aegir_6.x-2.x_to_7.x-3.x_...

helmo’s picture

Status: Active » Fixed

Jenkins is green again ... must have been a temp issue ... http://ci.aegirproject.org/view/Upgrades/job/U_aegir_6.x-2.x_to_7.x-3.x_...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

helmo’s picture

When upgrading between two dev versions of Aegir I noticed the message "Detected major version upgrade" A 7.50 to 7.51 platform is not what I consider major ;)

The output of version_compare('7.0', trim($platform_version['output']))

I added a commit to compensate.