Problem
_update_create_fetch_task() attempts to ensure that only one queue item exists per module at any one time. However, it has a race condition: two processes can both get a miss on the per-queue-item record in the {cache_update} table at the same time.
This has two symptoms:
- a PDO exception from the failed insert into {cache_update}.
- unnecessary additional queue items are created. Some sites have reported up to 2,000 outstanding items in the queue with only a fraction of that number of modules installed.
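The race is the classic check-then-insert pattern. As a self-contained illustration (not Drupal code; it assumes the pdo_sqlite extension and uses an in-memory SQLite table standing in for {cache_update}), here is the interleaving that produces the integrity-constraint violation:

```php
<?php
// Simulation of the check-then-insert race in _update_create_fetch_task().
// An in-memory SQLite table stands in for {cache_update}; the two "processes"
// are interleaved by hand: both check before either inserts.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE cache_update (cid TEXT PRIMARY KEY, created INTEGER)');

function fetch_task_exists(PDO $db, $cid) {
  $stmt = $db->prepare('SELECT 1 FROM cache_update WHERE cid = ?');
  $stmt->execute(array($cid));
  return (bool) $stmt->fetchColumn();
}

$cid = 'fetch_task::captcha';

// Both processes get a miss, because neither has inserted yet.
$process_a_miss = !fetch_task_exists($db, $cid);
$process_b_miss = !fetch_task_exists($db, $cid);

$insert = $db->prepare('INSERT INTO cache_update (cid, created) VALUES (?, ?)');
$duplicate = FALSE;
if ($process_a_miss) {
  // Process A wins the race.
  $insert->execute(array($cid, time()));
}
if ($process_b_miss) {
  try {
    // Process B also saw a miss, so it inserts too -- and hits the
    // PRIMARY KEY constraint, just like the PDOException in the report.
    $insert->execute(array($cid, time()));
  }
  catch (PDOException $e) {
    $duplicate = TRUE;
  }
}
echo $duplicate ? "duplicate key\n" : "no conflict\n";
```

Each process would also have created its own queue item before hitting the exception, which is where the unnecessary extra queue items come from.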
Proposed solution
Add a lock so that only one process can create the fetch task at a time.
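A minimal sketch of what the lock-based fix looks like, assuming Drupal 7's lock API (lock_acquire()/lock_release()). The function name suffix, the cache calls, and the queue name are illustrative, not the committed hunk:

```php
// Sketch only: serialize the check-and-create so parallel requests cannot
// both see a miss. Cache and queue details are illustrative.
function _update_create_fetch_task_sketch($project) {
  $cid = 'fetch_task::' . $project['name'];
  if (!cache_get($cid, 'cache_update')) {
    // Only one process may pass this point at a time.
    if (lock_acquire('update_fetch_task', 10)) {
      // Re-check under the lock: another process may have created the
      // record between our first check and acquiring the lock.
      if (!cache_get($cid, 'cache_update')) {
        DrupalQueue::get('update_fetch_tasks')->createItem($project);
        cache_set($cid, REQUEST_TIME, 'cache_update');
      }
      lock_release('update_fetch_task');
    }
  }
}
```

The double check (before and after acquiring the lock) is what closes the window: the first check is only an optimization, and the second one is authoritative.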
Outstanding tasks
The patch will need a backport to Drupal 7, then Drupal 6.
I've opened a follow-up to extend the queue API to add a class handling unique queue items: #1492188: Update module creates duplicate queue items.
Original report
Here is the bug in my logs:
Location http://toto.com/admin/structure/types/manage/pays/fields
Referrer http://toto.com/admin/structure/types
Message PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'fetch_task::captcha' for key 'PRIMARY': INSERT INTO {cache_update} (cid, created) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1); Array ( [:db_insert_placeholder_0] => fetch_task::captcha [:db_insert_placeholder_1] => 1331836672 ) in _update_create_fetch_task() (line 245 of /var/www/toto.com/modules/update/update.fetch.inc).
and
Location http://toto.com/admin/modules
Referrer http://toto.com/admin/modules
PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'fetch_task::addressfield' for key 'PRIMARY': INSERT INTO {cache_update} (cid, created) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1); Array ( [:db_insert_placeholder_0] => fetch_task::addressfield [:db_insert_placeholder_1] => 1331836669 ) in _update_create_fetch_task() (line 245 of /var/www/toto.com/modules/update/update.fetch.inc).
| Comment | File | Size | Author |
|---|---|---|---|
| #38 | update-fetch-task-catch-exception-1484216-37.patch | 1.1 KB | berdir |
| #32 | drupal-1484216-32.patch | 1 KB | tim.plunkett |
| #22 | 1484216-21-update-task.patch | 1.01 KB | Anonymous (not verified) |
| #21 | catch_exception.patch | 1.11 KB | berdir |
| #8 | update_lock_D7.patch | 759 bytes | catch |
Comments
Comment #1
catch commented:
update module isn't using db_merge(); it has its own forked version of the cache API and hasn't been updated alongside the actual one.
Now that cache backends are classes, it could probably use DrupalDatabaseCache in Drupal 7 directly and end up with the same effect, maybe.
There are several duplicates of this, for example #1242068: Error in exposed filter and #1371786: PDOException : SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry.
There's also #1175054: Add a storage (API) for persistent non-configuration state, which update module might be a candidate to use in Drupal 8.
Comment #2
catch commented:
This is the same bug as #1420372: Modules being checked for updates multiple times, which I've just marked as a duplicate. Copying my comment from there:
OK so there's a race condition here - before the update_fetch_task item is created, any number of processes could be creating queue items simultaneously. Probably the easiest change would be to add a lock.
Comment #3
catch commented:
Comment #4
catch commented:
Here's a patch that adds a lock.
Comment #4.0
catch commented:
add another error message
Comment #5
chx commented:
Hopefully #1492188: Update module creates duplicate queue items will add a better fix. But for now, good enough.
Comment #6
dries commented:
This looks good to me. Not sure if we want to wait for #1492188: Update module creates duplicate queue items. I'll let catch decide on this.
Comment #7
catch commented:
I think this is quite urgent to fix in D7; #1420372: Modules being checked for updates multiple times suggests that people are ending up with over 2,000 items in the update fetch queue. Additionally, there's already a workaround here for unique queue items (we're compounding it a bit with the lock), but converting update module to use a unique queue would be quite significant to backport. However, I'll bump #1492188: Update module creates duplicate queue items to major.
Comment #8
catch commented:
I'm moving this over to webchick for commit since it's my patch.
We can't add tests here because it's purely a race condition; there's nothing in the test framework we could use to reliably reproduce the bug.
Also attaching D7 backport.
Comment #9
Nick Fedchik commented:
I applied update_lock_D7.patch to my D7 site, but the situation became worse :( so I rolled it back.
Comment #10
Gábor Hojtsy commented:
@Nick: what do you mean by that? In what way did it get worse? More SQL errors like those posted above?
Comment #11
webchick commented:
I think getting this fix in makes sense, and working on the more useful thing in the separate issue.
However, I too am confused/concerned by Nick's comment. Let's maybe leave this another 24 hours to let him respond. There is still time to roll this back before the May 2 release.
Unassigning myself, since any of us can commit this.
Comment #12
Nick Fedchik commented:
Yes, I got many more SQL errors like those posted above.
I have a lot of modules in my Drupal installation, and it happens often when I save the Modules page.
Comment #13
berdir commented:
What irritates me is that this race condition shouldn't result in more than a single duplicate; anything else is extremely unlikely, right?
I have seen sites which have literally *dozens* of queue entries for the same project, resulting in a total queue size of ~9000 items just for update.module. There is no way this could have happened by this race condition IMHO. And it's not like it has only happened once, it's so frequent that I usually first do a quick count() query on the queue table before starting drush up...
So there might be more things that are not working as they are supposed to...
Edit: More specifically, I assume this is a valid fix for the duplicate entry PDO exception but not the *actual* duplicates in that queue. I'll try to get my hands on a site that is having this.
Comment #14
catch commented:
Moving this back to needs review and adding the 'needs manual testing' tag in case we can find some more people with D7 sites to test the patch on.
@Berdir, #1492188 should be the 'proper' fix for this, but I agree it looks like something is going on here other than the race condition. Staring at the code a bit didn't help, though, so I'd be interested in debug output if you get some.
Comment #15
rooby commented:
I have a site that regularly checks the status of all modules many times when I try to do a drush update.
Would that site help in terms of manual testing?
Comment #16
catch commented:
If you're getting the duplicate key error, it'll help. But for the actual duplicate items in the queue itself: I just discussed this with beejeebus, and he pointed out http://api.drupal.org/api/drupal/modules!update!update.module/function/_... which explains the whole issue. So we need to more or less rip that function out, and handle cleaning up old cache items in the queue runner itself.
Comment #17
berdir commented:
Oh nice... Yes, that *is* obvious then ;) I knew it must be something like that ;)
I'm wondering if we still want to get locking in from this issue, to at least fix the PDO exceptions, and then work on the unique queue issue (which should then be raised to major if it isn't already). I have a feeling that's not going to be much harder than cleaning up the mess with that cache clear function.
Comment #18
berdir commented:
Nobody said otherwise, so let's try this. Back to needs review; adjusting the focus of the issue by changing the title. The issue summary probably needs to be updated as well.
Comment #19
Anonymous (not verified) commented:
The lock as it is in the patch is still racy; we'd have to move it up and make the scope broader.
(In between the time you get a result resource back from a MySQL query, another connection can insert into the update task 'cache' table, but you won't see that row when iterating over your result resource.)
I think we should just wrap the insert, catch the exception, and move on to fixing the dupe queue items.
With better comments.
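What this amounts to is letting the primary key itself act as the mutex: attempt the insert first, and treat a duplicate-key exception as "another request already created this task". A sketch of that idea under Drupal 7's database API, not the committed hunk (the actual patch's ordering and field list may differ):

```php
// Sketch of the catch-the-exception approach: the INSERT into {cache_update}
// is the mutex. Only the process whose insert succeeds creates a queue item.
try {
  db_insert('cache_update')
    ->fields(array(
      'cid' => 'fetch_task::' . $project['name'],
      'created' => REQUEST_TIME,
    ))
    ->execute();
  // The insert succeeded, so this process "owns" the task.
  DrupalQueue::get('update_fetch_tasks')->createItem($project);
}
catch (Exception $e) {
  // A parallel request inserted the row (and created the queue item)
  // between our cache check and this insert; nothing left to do.
}
```

Unlike the lock, this costs nothing in the common uncontended case and cannot itself deadlock or time out.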
Comment #20
catch commented:
That works for me.
Comment #21
berdir commented:
Here is a patch and an attempt at the comments. Please improve :)
Comment #22
Anonymous (not verified) commented:
Here we go.
Comment #23
Anonymous (not verified) commented:
Whoops, cross-post. #21 looks like it has better comments than #22, so I think it's RTBC if the bot comes back green.
Comment #25
berdir commented:
Taxonomy Vocabulary test fails? Really? Can't reproduce that locally...
Testbot, you're lying :) Let's try that again.
Comment #26
berdir commented:
#21: catch_exception.patch queued for re-testing.
Comment #27
catch commented:
Marking #21 RTBC. I've just committed the other update module patch to 8.x.
Comment #28
Anonymous (not verified) commented:
Yay, let's kill these bugs.
Comment #29
xjm commented:
Do we have a followup issue for that @todo?
Comment #30
catch commented:
There's http://drupal.org/node/1492188 and the followup linked from there (typing from a phone, so not grabbing the link). I think the combination of the two patches is probably enough to fix the bug in D7; then the unique queue item API would remove the race condition we're working around here.
Comment #31
catch commented:
Committed/pushed to 8.x, moving to 7.x for backport. The backport should be straightforward, so tagging as novice.
Comment #32
tim.plunkett commented:
Rerolled. Did not test manually.
Comment #34
tim.plunkett commented:
Duh! I rerolled the wrong patch. I'll fix this in the morning.
Comment #35
xjm commented:
That looks like a reroll of the broken #22. The committed patch was #21.
Comment #36
xjm commented:
Cross-post, sorry. :)
Comment #38
berdir commented:
Time for some testing fun ;)
test.php
Then load-testing that with:
$ ab -n 20 -c 10 http://d7/test.php
and watching the watchdog table.
Before patch:
With patch:
Which is rather obvious, when looking at the code :)
Comment #39
xjm commented:
Correct backport.
Comment #40
webchick commented:
Committed and pushed to 7.x. Thanks!
I was confused about why #38 was not a simpletest, but it's because it's a race condition, which is hard to test from SimpleTest, as noted in #8.
Comment #41.0
Anonymous (not verified) commented:
Updating issue summary.