Problem/Motivation

While the registry is being rebuilt there are scenarios where integrity constraint errors are printed. These errors occur in the two following scenarios:

This issue has been found on several contributed modules issue queues.

Proposed resolution

For the first scenario (two rebuilds in paralel), a lock is being suggested but is still under discussion as there is some overhead added by it (see comment #19).

For the second scenario, using db_merge() instead of db_insert() solves the issue.

Remaining tasks

There seems to be consensus about the status of the patch so it should be committed soon. A double-lock approach will be studied in a separate issue.

API changes

None.

Original report by sun

There are lots of reports from users on D7 about

PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'WHATEVER-class' for key 1: INSERT INTO {registry}

everywhere on the net :( It's almost the new perceived slogan for Drupal out there. :(

Technically, this error "cannot" be triggered normally... unless the registry rebuild would run more than once in parallel. However, the error is repeatedly reported by regular users -- no extremes, no large-scale sites involved.

So, wondering whether we can at least resolve it on the surface by changing that effin' db_insert() into a db_merge().

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Status: Needs review » Needs work

The last submitted patch, drupal8.registry-rebuild-nightmare.0.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
1.6 KB

oopsie, forgot how db_merge()->key() works ;)

catch’s picture

There's no lock around registry rebuild - do we want to consider adding one instead of the db_merge()? That might not be a good idea but came to mind looking at the patch.

juampynr’s picture

If anyone needs to reproduce this issue, it happens on named sites (not using sites/default) when registry paths are registered for one site (ie. sites/d7.localhost/modules/contrib/feeds/plugins/A.inc) and then we try to clean the cache in a cloned Drupal installation with a different sites location (ie sites/d7B.localhost).

A quick sample of failure can be created with the steps listed at http://drupal.org/node/1347894#comment-5366800.

Once the registry has recorded paths, Ctools attempts to re-register them on hook_registry_alter by picking a cached version that has an inexistent path (see #1371700: CTools adds cached plugins with wrong paths during hook_registry_files_alter() in multisite systems), causing the integrity error.

arski’s picture

subscribing, very annoying indeed on D7 sites!

juampynr’s picture

Please stop subscribing by posting a comment and click on the "Follow" link at the top of the issue.

juampynr’s picture

Title: STOP the registry integrity constraint violation nightmare » STOP the registry integrity constraint violation nightmare.
FileSize
1.55 KB
1.53 KB

I backported the patch to 7.x and can confirm that the integrity error gets fixed and the registry table ends in a consistent state after cleaning the cache (hence rebuilding the registry). However, I had to remove the line

+      $newquery = clone $query;

so the values could be merged correctly. As it was, the same Integrity constraint error raised.

Attached are an updated patch for D8 and its backport to D7. This fixes quite a bit or issues in Drupal 7 contributed modules that had minor fixes such as:

If anyone wants to recreate an environment where this happens, please contact me on my personal contact form and I will share a Dropbox folder where I have got a Drupal core and database dump configured to test this particular error when cleaning the cache.

sun’s picture

Added a lock to registry_update().

Status: Needs review » Needs work

The last submitted patch, drupal8.registry-rebuild-nightmare.8.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
2.75 KB

Almost figured that. ;) Attached patch skips the lock when we're in the Drupal installer.

sun’s picture

Title: STOP the registry integrity constraint violation nightmare. » STOP the registry integrity constraint violation nightmare

This patch looks RTBC to me. Note that this is a critical issue.

catch’s picture

How do we know for certain the registry_update() will complete in less than 30 seconds?

sun’s picture

There's no guarantee for that, but wouldn't we potentially hit the max_execution_time roadblock after 30 seconds either way?

David_Rothstein’s picture

Title: STOP the registry integrity constraint violation nightmare. » STOP the registry integrity constraint violation nightmare

There's code in _registry_parse_files() that currently does this, before _registry_parse_file() is called:

// Delete registry entries for this file, so we can insert the new resources.
db_delete('registry')
  ->condition('filename', $filename)
  ->execute();

If we're changing _registry_parse_file() to use db_merge(), can the above code be removed?

mradcliffe’s picture

I modified the d8 patch to work for d7 (s/core\//), and this got things working again on a multisite dev. environment. I did not review for commenting on #14. Just a "it fix it" review for D7.

BernieCram’s picture

I was having a PDOException: SQLSTATE[23000]: error on a site managed through aegir and couldn't clone or migrate the site because of that. I manually applied the changes in the patch at #10 to my 7.12 platform which fixed the PDOException.

So essentially i'd agree that it works on a multisite d7. If anyone can roll a d7 version of the patch, that would be great for future reference.

Thanks for the patch and thanks to @juampy for your messages in the forums on all sorts of module issue queues looking for the cause of the PDOexception.

B

juampynr’s picture

You are welcome!
Just to give some perspective of how this bug affects other modules: for me it all started with an Aegir site too (thus, using the multi site feature). I first saw an error related to Feeds #1347894: Clear cache causes integrity constraint violation when clearing out the cache, then I debugged it further and went down to CTools #1371700: CTools adds cached plugins with wrong paths during hook_registry_files_alter() in multisite systems and finally ended up in core at this issue. It seems that it can be solved at different levels but definitively core fixes them all.

klausi’s picture

@sun: why would we want a lock here? Updating/Rebuilding the registry is a rare task, usually when caches are cleared or modules are enabled/disabled. The lock comes with some problems:

  • as already stated by catch the lock time limit can be exceeded, then we can have a parallel registry update again anyway.
  • Administrator race condition: One administrator clears caches, another one enables a module in the meantime. So the second call to registry_update() just waits and assumes that first call has included the new files of the module. This might not be the case and we get an inconsistent registry state, where a module is enabled but its files are not in registry.
  • code complexity: adding a lock on top of registry updates makes this code even more hard to grasp.

So we should just make the registry update more robust, db_merge() seems perfect for that.

@David_Rothstein: yes, the deletion part beforehand seems useless then.

klausi’s picture

New patch without the lock and without the obsolete delete query.

mradcliffe’s picture

Drupal 7 patch.

aspilicious’s picture

I thik this is rtbc...

ksenzee’s picture

Status: Needs review » Reviewed & tested by the community

Yes, I agree it's RTBC. I am a big db_merge() fan and this is a good place to use it.

sun’s picture

Status: Reviewed & tested by the community » Needs work

#18 tried to argue against the lock, but if you fully think through it, then it actually argued for the lock ;) Here's why:

  1. Request #1 hits the site, changing module/extension information, starts a rebuild...
  2. Request #2 hits the site in parallel, starts a rebuild...
  3. Request #3 hits the site in parallel, starts a rebuild...
  4. Request #4 hits the site in parallel, starts a rebuild...
  5. Request #1 completes the rebuild, based on new module/extension information.
  6. Request #2 completes the rebuild, potentially based on old module/extension information.
  7. ...

To prevent any possibly malformed state information and parallel rebuilds, everything else needs to wait until the request that caused the rebuild has completed. That's what the lock system has been designed for.

In turn, we want to incorporate #14 into #10.

klausi’s picture

Status: Needs work » Needs review

I don't get it, request #2 arrives after #1, so the changed module information should be present already?

Anyway, I would argue that the rebuilding race condition is a separate, non-critical issue. As the issue title says, let's fix the nightmare first.

sun’s picture

Status: Needs review » Needs work

The nightmare is caused by the race condition.

klausi’s picture

Status: Needs work » Needs review
FileSize
3.22 KB

After discussion with sun on IRC and more thinking, here is the patch with the lock again and with the obsolete db_delete() removed. The lock helps to eliminate the race condition.

I did a little manual testing with an artificial sleep(20); delay during the lock in registry_update(). Initiating a cache clear in one request and enabling a module in a parallel request worked as desired. The second request updates the registry twice: once in module_enable() and once through drupal_flush_all_caches() at the end of system_modules_submit(). So even if after the first update the registry is inconsistent, this is cleared up by the cache clear at the end again. I also tested parallel module enabling with drush 5.x, also works as expected. So no worries anymore from my side.

sun’s picture

Status: Needs review » Reviewed & tested by the community

Thanks!

I mentioned that the db_merge() change might be superfluous with the lock addition now. Sleeping over it, the db_merge() makes perfectly sense in terms of performance here (and should even be faster), since the old code pretty much deleted all existing records and re-inserted them afterwards. Since this is a rebuild operation, most of the records will be identical (or get updated), so blatantly deleting them upfront can only be slower.

Thus, _registry_parse_files() will no longer delete records in {registry}. That said, _registry_update() still deletes orphan records for code and files that no longer exist on disk.

Anonymous’s picture

Status: Reviewed & tested by the community » Needs work

sorry folks, this doesn't look right.

previously, we deleted then inserted. AFAICS, now we just merge. so i think that means we can leave stale entries behind.

looking into that now...

Anonymous’s picture

Status: Needs work » Needs review
FileSize
4.09 KB

ok, here's a first stab at deleting stale entries after the merge.

Anonymous’s picture

klausi spotted an unrelated hunk in statistics.test in the patch, rerolled without that.

klausi’s picture

After more thinking about the lock I came to the conclusion that we have to rebuild the registry after waiting on an older, parallel request. This ensures that after a call to module_enable() the registry is always in a consistent state. Patch attached.

sun’s picture

@beejeebus: As mentioned in #27, deletes are still performed in _registry_update(). _registry_parse_files() is only invoked for (still) existing and new files.

@klausi: Can you provide some sane reasoning for #31? I don't understand why that would be needed, so would recommend to go back to #26.

klausi’s picture

Status: Needs review » Needs work

Regarding deletion: classes/interfaces can get removed from a file, if we don't remove those entries they stay as stale entries forever. Example: Add an empty class CommentFoo to comment.entity.inc. Clear caches, CommentFoo now shows up in the registry table. Remove CommentFoo from comment.entity.inc. Clear caches, CommentFoo should now be gone from the registry table. As a matter of fact, I just tested that and the class is not gone, so this patch is not ready yet.

Some further reasoning: we want module_enable() to be atomic and fully consistent. If there is a registry update from an old request going on we have to initiate another registry update to be sure that possible changes are accounted for. Otherwise the outcome of module_enable() could be inconsistent. This is not a problem for Drupal core as my tests have shown (extra cache clearing is done), but any other code that relies on module_enable() could cause problems when the registry is not up to date.

I hoped this would be explained by the comments in the code, but it is tricky to describe :-(

klausi’s picture

Status: Needs work » Needs review
FileSize
3.86 KB

Ok, for whatever reason I must have made a mistake during my testing of the registry entry deletion. It works as advertised.

Here is an updated patch that contains code comments as suggested by beejeebus on https://gist.github.com/1918695

sun’s picture

Status: Needs review » Needs work
+++ b/core/includes/bootstrap.inc
@@ -3215,11 +3215,39 @@ function registry_rebuild() {
+  if (!$in_installer && !lock_acquire(__FUNCTION__)) {
+    // Another request got the lock, wait for it to finish.
+    lock_wait(__FUNCTION__);
+
+    // Try to get the lock again, as the previous run may not incorporate
+    // changes from module enable/disable events.
+    if (!lock_acquire(__FUNCTION__)) {
+      // We failed to get the lock twice, so whichever process got the lock
+      // will incorporate our changes, so we can just wait for it to finish
+      // then return.
+      lock_wait(__FUNCTION__);
+      return FALSE;
+    }
+  }

Sorry, I can't make much sense of the idea of this consecutive lock. The consequences of this idea are entirely not clear, and we don't do something like this anywhere else throughout core. Can we split that into a separate issue?

+++ b/core/includes/registry.inc
@@ -166,17 +162,27 @@ function _registry_parse_files($files) {
+    $all_names = array();
     foreach ($matches[2] as $key => $name) {
...
+      $all_names[] = $name;
...
+      ->condition('name', $all_names, 'NOT IN')

$all_names == array_values($matches[2]);

No need for duplicating that array.

klausi’s picture

Status: Needs work » Needs review
FileSize
4.13 KB

Fixed the array duplication.

I have not removed the double lock acquiring, as I think it is important for an accurate registry state. Why are the consequences of this approach not clear? Let me explain in other words: if a registry update is currently going on and other requests also want to update the registry, they line up in the lock and wait until the old update is finished. Then an arbitrary request picks up the new update and the others just have to wait it out. The state of enabled modules is directly queried from the system table in _registry_update(), so anything that has been changed until this particular point in time will be incorporated in the registry update.

Just because we do not use this anywhere else in core does not mean that this is not reasonable.

klausi’s picture

Urgs, the previous patch contains an unrelated change in comment.entity.inc from my manual testing. Removed that.

David_Rothstein’s picture

Minor issue:

+  $in_installer = defined('MAINTENANCE_MODE') && MAINTENANCE_MODE == 'install';

This should probably just use drupal_installation_attempted().

klausi’s picture

Right, fixed that.

sun’s picture

I've pinged @pwolanin and @Damien Tournoud about the consecutive log usage in this patch.

sun’s picture

I've pinged both several times in the meantime without response.

Can we move the consecutive lock proposal into a separate issue? @klausi mentioned that it actually should also be applied to the menu_rebuild() lock. That would actually support a separate issue. Not to mention that it might make sense to think about a "built-in" opt-in behavior directly in the lock system for this, so we don't need to repeat that overly verbose consecutive lock code all over the place.

klausi’s picture

Sad, but I agree at this point. Let's just fix this now and deal with the minor race condition later.

Anonymous’s picture

i'm ok with getting the merge stuff in, then circling back.

catch’s picture

Status: Needs review » Needs work

The double lock means the process could end up waiting for at least 60 seconds, since max_execution_time can often be 30 seconds we could end up erroring out rather than continuing anyway, so agreed it'd make sense to tackle this in a separate issue.

klausi’s picture

Status: Needs work » Needs review
FileSize
3.4 KB

Rerolled, now with the single lock_wait() in registry_update() again.

tim.plunkett’s picture

Tagging.

juampynr’s picture

Added an issue summary.

tuzonghua’s picture

tuzonghua’s picture

Issue summary: View changes

Added issue summary.

juampynr’s picture

Here is a port of #45 for Drupal 7. I used it to verify that it fixes #1371700: CTools adds cached plugins with wrong paths during hook_registry_files_alter() in multisite systems, which is a registry constraint error that happens during rebuild.

sun’s picture

Status: Needs review » Reviewed & tested by the community

#45 looks ready to fly for me.

mradcliffe’s picture

Was the follow-up issue created yet? I am not sure how to word that one myself.

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Committed/pushed to 8.x, moving to 7.x for backport.

tim.plunkett’s picture

Status: Patch (to be ported) » Needs review
FileSize
3.12 KB

Reuploading #49, no commit credit.

tim.plunkett’s picture

Status: Needs review » Needs work
+++ b/includes/bootstrap.incundefined
@@ -3080,11 +3080,30 @@ function registry_rebuild() {
+ **

This needs fixing.

Also, this hunk is missing from the reroll:

+    // Delete any resources for this file where the name is not in the list
+    // we just merged in.
+    db_delete('registry')
+      ->condition('filename', $filename)
+      ->condition('name', $matches[2], 'NOT IN')
+      ->execute();
cha0s’s picture

Status: Needs work » Needs review
FileSize
3.36 KB

Re-rolled.

aspilicious’s picture

Status: Needs review » Reviewed & tested by the community

Looks identical

sun’s picture

#55: 1372122-registry-integrity-7.patch queued for re-testing.

denjell’s picture

is #55 a patch i can apply to a running 7.12 system?

xjm’s picture

@denjell - The patch should apply to 7.x-dev. 7.12's codebase has slight differences so you would want to test it. If the patch passes committer review, it will also be included in the next point release of Drupal 7.

denjell’s picture

ok thanks. any idea when the next point release of 7 is going to roll out?

webchick’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: +7.13 release notes

Wednesday. These things get announced at http://groups.drupal.org/core.

Committed and pushed to 7.x. Great work! I had actually tested this bug against CTools a few weeks ago while it was still NR and it worked for me as well. This seems worth mentioning in the release notes.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

WoozyDuck’s picture

Status: Closed (fixed) » Active

I am using Drupal 7.14 and still get this error when trying to add a new node!

catch’s picture

Status: Active » Postponed (maintainer needs more info)

Please copy/paste the exact error you're seeing.

sun’s picture

Status: Postponed (maintainer needs more info) » Closed (fixed)

Sorry, without further information this issue can only be reverted to closed.

Feel free to re-open this issue if you want to provide further information. Thanks.

sun’s picture

Do not know what the hell happened. I did write the issue review but it shows authored to @tuzonghua.