Problem

_update_create_fetch_task() attempts to ensure that only one queue item exists per module at any one time. However, it has a race condition: two processes can both miss the per-item record in the {cache_update} table at the same time, and each then creates a queue item.

This has two symptoms:

- A PDOException from the failed duplicate-key insert into {cache_update}.
- Unnecessary additional queue items are created. Some sites have reported up to 2,000 outstanding items in the queue with only a fraction of that number of modules installed.

Proposed solution

Add a lock to guarantee that only one process creates the fetch task for a given project at a time.
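A minimal sketch of the lock approach, using Drupal 7's lock API (lock_acquire()/lock_release() from includes/lock.inc). This is an illustration of the idea, not the committed patch; note that a later comment in this thread points out the scope of the lock matters, since the cache read must also be protected for the fix to be fully race-free:

```php
function _update_create_fetch_task($project) {
  $fetch_tasks = &drupal_static(__FUNCTION__, array());
  if (empty($fetch_tasks)) {
    $fetch_tasks = _update_get_cache_multiple('fetch_task');
  }
  $cid = 'fetch_task::' . $project['name'];
  if (empty($fetch_tasks[$cid])) {
    // Only the process holding the lock creates the queue item and the
    // marker row; concurrent requests that fail to acquire it skip the work.
    if (lock_acquire($cid)) {
      $queue = DrupalQueue::get('update_fetch_tasks');
      $queue->createItem($project);
      db_insert('cache_update')
        ->fields(array(
          'cid' => $cid,
          'created' => REQUEST_TIME,
        ))
        ->execute();
      lock_release($cid);
    }
    $fetch_tasks[$cid] = REQUEST_TIME;
  }
}
```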

Outstanding tasks

The patch will need a backport to Drupal 7, then Drupal 6.

I've opened a follow-up to extend the queue API to add a class handling unique queue items: #1492188: Update module creates duplicate queue items.

Original report

Here is the bug in my logs:

Location http://toto.com/admin/structure/types/manage/pays/fields
Referrer http://toto.com/admin/structure/types

Message PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'fetch_task::captcha' for key 'PRIMARY': INSERT INTO {cache_update} (cid, created) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1); Array ( [:db_insert_placeholder_0] => fetch_task::captcha [:db_insert_placeholder_1] => 1331836672 ) in _update_create_fetch_task() (line 245 of /var/www/toto.com/modules/update/update.fetch.inc).

and

Location http://toto.com/admin/modules
Referrer http://toto.com/admin/modules

PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'fetch_task::addressfield' for key 'PRIMARY': INSERT INTO {cache_update} (cid, created) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1); Array ( [:db_insert_placeholder_0] => fetch_task::addressfield [:db_insert_placeholder_1] => 1331836669 ) in _update_create_fetch_task() (line 245 of /var/www/toto.com/modules/update/update.fetch.inc).

Comments

catch’s picture

Title: PDOException in update.fetch.inc » PDOException (duplicate key in {cache_update}) in update.fetch.inc

The update module isn't using db_merge(); it has its own forked version of the cache API, which hasn't been updated alongside the real one.

Now that cache backends are classes, it could probably use DrupalDatabaseCache directly in Drupal 7 and get the same effect, maybe.
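For illustration, the duplicate-key failure comes from using a plain INSERT where an upsert is needed. The real cache API avoids it with db_merge(), roughly like this (a sketch, not the forked update-module code):

```php
// db_merge() issues an atomic "insert or update" keyed on cid, so two
// processes writing the same record cannot trigger a duplicate-key error:
// whichever runs second simply updates the existing row.
db_merge('cache_update')
  ->key(array('cid' => $cid))
  ->fields(array('created' => REQUEST_TIME))
  ->execute();
```

Note this alone would only silence the PDOException; it would not prevent the duplicate queue items, since the queue item is created before the marker row is written.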

There are several duplicates of this, for example #1242068: Error in exposed filter and #1371786: PDOException : SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry.

There's also #1175054: Add a storage (API) for persistent non-configuration state, which this might be a candidate to use in Drupal 8.

catch’s picture

This is the same bug as #1420372: Modules being checked for updates multiple times, which I've just marked as a duplicate. Copying my comment from there:


function _update_create_fetch_task($project) {
  $fetch_tasks = &drupal_static(__FUNCTION__, array());
  if (empty($fetch_tasks)) {
    $fetch_tasks = _update_get_cache_multiple('fetch_task');
  }
  $cid = 'fetch_task::' . $project['name'];
  if (empty($fetch_tasks[$cid])) {
    $queue = DrupalQueue::get('update_fetch_tasks');
    $queue->createItem($project);
    db_insert('cache_update')
      ->fields(array(
        'cid' => $cid,
        'created' => REQUEST_TIME,
      ))
      ->execute();
    $fetch_tasks[$cid] = REQUEST_TIME;
  }
}

OK so there's a race condition here - before the update_fetch_task item is created, any number of processes could be creating queue items simultaneously. Probably the easiest change would be to add a lock.

catch’s picture

Title: PDOException (duplicate key in {cache_update} in update.fetch.inc » Race condition in _update_create_fetch_task() (duplicate queue items created, PDO Exceptions)
Version: 7.12 » 8.x-dev
Issue tags: +Needs backport to D7
catch’s picture

Status: Active » Needs review
StatusFileSize
new759 bytes

Here's a patch that adds a lock.

catch’s picture

Issue summary: View changes

add another error message

chx’s picture

Status: Needs review » Reviewed & tested by the community

Hopefully #1492188: Update module creates duplicate queue items will add a better fix. But for now, good enough.

dries’s picture

This looks good to me. Not sure if we want to wait for #1492188: Update module creates duplicate queue items. I'll let catch decide on this.

catch’s picture

I think this is quite urgent to fix in D7; #1420372: Modules being checked for updates multiple times suggests that people are ending up with over 2,000 items in the update fetch queue. Additionally, there's already a workaround here for unique queue items; we're compounding it a bit with the lock, but converting the update module to use a unique queue would be quite significant to backport. However, I'll bump #1492188: Update module creates duplicate queue items to major.

catch’s picture

Assigned: Unassigned » webchick
StatusFileSize
new759 bytes

I'm moving this over to webchick for commit since it's my patch.

We can't add tests here because it's purely a race condition; there's nothing the test framework could do to reliably reproduce the bug.

Also attaching D7 backport.

Nick Fedchik’s picture

I applied update_lock_D7.patch to my D7 site, but the situation became worse :( so I rolled it back.

gábor hojtsy’s picture

@Nick: what do you mean by that? In what way did it get worse? More SQL errors like those posted above?

webchick’s picture

Assigned: webchick » Unassigned

I think it makes sense to get this fix in, and to work on the more useful thing in the separate issue.

However, I too am confused/concerned by Nick's comment. Let's maybe leave this another 24 hours to let him respond. There is still time to roll this back before the May 2 release.

Unassigning myself, since any of us can commit this.

Nick Fedchik’s picture

Yes, I got many more SQL errors like those posted above.
I have a lot of modules in my Drupal installation, and it happens often when I save the Modules page.

berdir’s picture

What irritates me is that this race condition shouldn't result in more than a single duplicate, anything else is extremely unlikely, right?

I have seen sites which have literally *dozens* of queue entries for the same project, resulting in a total queue size of ~9000 items just for update.module. There is no way this could have happened by this race condition IMHO. And it's not like it has only happened once, it's so frequent that I usually first do a quick count() query on the queue table before starting drush up...

So there might be more things that are not working as they are supposed to...

Edit: More specifically, I assume this is a valid fix for the duplicate entry PDO exception but not the *actual* duplicates in that queue. I'll try to get my hands on a site that is having this.

catch’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Needs manual testing

Moving this back to needs review and adding the 'needs manual testing' tag in case we can find some more people with D7 sites to test the patch on.

@Berdir, #1492188 should be the 'proper' fix for this, but I agree it looks like something other than the race condition is going on here. Staring at the code a bit didn't help though, so I'd be interested in any debug output you can get.

rooby’s picture

I have a site that regularly checks the status of all modules many times when I try to do a drush update.

Would that site help in terms of manual testing?

catch’s picture

Status: Needs review » Needs work

If you're getting the duplicate key error it'll help, but as for the actual duplicate items in the queue itself: I just discussed this with beejeebus and he pointed out http://api.drupal.org/api/drupal/modules!update!update.module/function/_... which explains the whole issue. So we need to more or less rip that function out and handle cleaning up old cache items in the queue runner itself.

berdir’s picture

Oh nice...Yes, that *is* obvious then ;) I knew it must be something like that ;)

I'm wondering if we still want to get the locking in from this issue, to at least fix the PDO exceptions, and then work on the unique-queue issue (which should then be raised to major if it isn't already). I have a feeling that won't be much harder than cleaning up the mess with that cache clear function.

berdir’s picture

Title: Race condition in _update_create_fetch_task() (duplicate queue items created, PDO Exceptions) » Race condition in _update_create_fetch_task() (PDO Exceptions)
Status: Needs work » Needs review

Nobody said otherwise, so let's try this. Back to needs review; adjusting the focus of the issue by changing the title. The issue summary probably needs to be updated as well.

Anonymous’s picture

Status: Needs review » Needs work

the lock as it is in the patch is still racy; we'd have to move it up and make the scope broader.

(in between the time you get a result resource back from a mysql query, another connection can insert into the update task 'cache' table, but you won't see that row when iterating over your result resource.)

i think we should just wrap the insert, catch the exception and move on to fixing the dupe queue items.

  if (empty($fetch_tasks[$cid])) {
    $queue = DrupalQueue::get('update_fetch_tasks');
    $queue->createItem($project);
    // TODO: fix the race condition here, as it's easy to generate multiple
    // queue items for the same project.
    try {
      db_insert('cache_update')
        ->fields(array(
          'cid' => $cid,
          'created' => REQUEST_TIME,
        ))
        ->execute();
    }
    catch (Exception $e) {
      // It's possible another process inserted before us, just don't blow up.
    }
    $fetch_tasks[$cid] = REQUEST_TIME;
  }

with better comments.
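For instance, the catch block's comment might be expanded along these lines (a sketch of how the comment could read, not the committed wording):

```php
    catch (Exception $e) {
      // The insert can fail if another request raced us here and created the
      // record between our cache read above and this write. That is harmless:
      // the row exists either way, which is all we need, so swallow the
      // duplicate-key exception rather than letting the whole page fail.
      // Note this does not prevent duplicate *queue* items, since the queue
      // item was already created above; see the @todo and the follow-up
      // issue about unique queue items.
    }
```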

catch’s picture

That works for me.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new1.11 KB

Here is a patch and an attempt at the comments. Please improve :)

Anonymous’s picture

StatusFileSize
new1.01 KB

here we go.

Anonymous’s picture

Whoops, cross-post. #21 looks like it has better comments than #22, so I think it's RTBC if the bot comes back green.

Status: Needs review » Needs work

The last submitted patch, 1484216-21-update-task.patch, failed testing.

berdir’s picture

Taxonomy Vocabulary test fails? Really? Can't reproduce that locally...

Testbot, you're lying :) Let's try that again.

berdir’s picture

Status: Needs work » Needs review

#21: catch_exception.patch queued for re-testing.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Marking #21 RTBC. I've just committed the other update module patch to 8.x.

Anonymous’s picture

yay, let's kill these bugs.

xjm’s picture

Do we have a followup issue for that @todo?

catch’s picture

There's http://drupal.org/node/1492188 and the follow-up linked from there (typing from a phone so not grabbing the link). I think the combination of the two patches is probably enough to fix the bug in D7; then the unique queue item API would remove the race condition we're working around here.

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)
Issue tags: +Novice

Committed/pushed to 8.x, moving to 7.x for backport. The backport should be straightforward so tagging as novice.

tim.plunkett’s picture

Status: Patch (to be ported) » Needs review
StatusFileSize
new1 KB

Rerolled. Did not test manually.

Status: Needs review » Needs work

The last submitted patch, drupal-1484216-32.patch, failed testing.

tim.plunkett’s picture

Assigned: Unassigned » tim.plunkett

Duh! I rerolled the wrong patch. I'll fix this in the morning.

xjm’s picture

Assigned: tim.plunkett » Unassigned

That looks like a reroll of the broken #22. The committed patch was #21.

xjm’s picture

xpost, sorry. :)

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new1.1 KB

Time for some testing fun ;)

test.php


<?php

/**
 * Root directory of Drupal installation.
 */
define('DRUPAL_ROOT', getcwd());

require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

db_delete('cache_update')
  ->execute();

module_load_include('inc', 'update', 'update.compare');
$projects = update_get_projects();
foreach ($projects as $key => $project) {
  update_create_fetch_task($project);
}

Then load-testing that with
$ ab -n 20 -c 10 http://d7/test.php

And watching the watchdog table.

Before patch:

 9126  28/Apr 11:56  error  php  PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'fetch_task::driven' for key 'PRIMARY': INSERT INTO {cache_update} (cid, created) VALUES (:db_insert_pla 
 9127  28/Apr 11:56  error  php  PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'fetch_task::paranoia' for key 'PRIMARY': INSERT INTO {cache_update} (cid, created) VALUES (:db_insert_p 
 9128  28/Apr 11:56  error  php  PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'fetch_task::userpoints_contrib' for key 'PRIMARY': INSERT INTO {cache_update} (cid, created) VALUES (:d 
...

With patch:

--- Nothing ---

Which is rather obvious when looking at the code :)

xjm’s picture

Status: Needs review » Reviewed & tested by the community

Correct backport.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Committed and pushed to 7.x. Thanks!

I was confused about why #38 was not a SimpleTest, but it's because it's a race condition, which is hard to test with SimpleTest, as noted in #8.

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updating issue summary