I noticed some duplicates in the {locales_source} table. This is not supposed to happen and may give unexpected results when trying to fetch the corresponding translation.
For performance reasons (I assume) there is not a UNIQUE index that prevents duplicate rows from being inserted. Luckily we can use the new locking API.
Comment | File | Size | Author |
---|---|---|---|
#68 | locale-race-condition-11-D6.patch | 4.22 KB | c960657 |
#60 | locale-race-condition-10-D6.patch | 4.14 KB | c960657 |
#58 | locale-race-condition-9-D6.patch | 4.22 KB | c960657 |
#55 | locale-race-condition-8-D7.patch | 3.78 KB | c960657 |
#50 | locale-race-condition-50.patch | 2.36 KB | Gábor Hojtsy |
Comments
Comment #1
c960657 CreditAttribution: c960657 commentedComment #3
c960657 CreditAttribution: c960657 commented#1: locale-race-condition-2.patch queued for re-testing.
Comment #4
c960657 CreditAttribution: c960657 commented#1: locale-race-condition-2.patch queued for re-testing.
Comment #5
c960657 CreditAttribution: c960657 commentedUse a separate semaphore per string instead of one common semaphore.
This is an annoying bug, especially on high-traffic sites. I hope this will be fixed in D7.
Comment #6
c960657 CreditAttribution: c960657 commented#5: locale-race-condition-3.patch queued for re-testing.
Comment #7
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis looks like the textbook case for a merge query.
Comment #8
andypostLocking patch could lead to serious bottleneck on i18n sites, by default code uses DB to perform locking
Also md5() been removed from the most of core
Comment #9
andypostI think we should fix #851362: Add hash column to {locales_source} to query faster locale strings first
There was also #811158: We should not index on (source, context)
Comment #10
plachsubscribing
Comment #11
c960657 CreditAttribution: c960657 commentedHere it is with a merge query and without locking.
Why do you think #851362: Add hash column to {locales_source} to query faster locale strings should land before this one? I don't see why the order matters.
Comment #12
andypostSure. Let's fix this one first and then think about uniqueness of strings!
Patch looks good, waiting for bot.
Comment #13
c960657 CreditAttribution: c960657 commentedI would like to add an update hook that removes the duplicates. Would it be sufficient to add this to D7 and D6 only, assuming that all sites will have run the update hook in D7 before they upgrade to D8? It seems a reasonable assumption, given that D8 is not going to be released any time soon.
Comment #14
andypostThere's a issue about #1050616: Figure out backport workflow from Drupal 8 to Drupal 7
Let's decide on it
Comment #15
c960657 CreditAttribution: c960657 commented> Patch looks good, waiting for bot.
The bot is happy :-)
Comment #16
Gábor HojtsyTag for Drupal 8 Multilingual Initiative.
Comment #17
c960657 CreditAttribution: c960657 commentedReroll (due to the fix for #1188430: Rip out textgroup support from locale module).
Comment #18
andypostThe questions are
- should we throw away VERSION as key?
- hook_update_N()
- D7 backport
I think version key is useful
-2 days to next Drupal core point release.
Comment #19
c960657 CreditAttribution: c960657 commentedI'm not sure what you mean? The version field is currently not part of the key. The field is an information field that contains the Drupal core version in use the last time a given string was translated. I am not changing the current behaviour (well, at least that's not my intention).
locale-race-condition-4.patch above applies to D7.
Oh, I had forgotten about that. Above we agreed to postpone the decision about the update hook until some conclusions were made in #1050616: Figure out backport workflow from Drupal 8 to Drupal 7. I must admit that I have not read through that entire issue, but it seems that the update hook issue wasn't discussed in great detail. Anywhere, here is a D7 patch with an upgrade hook that removes the duplicates. As mentioned above, I think it would be sufficient to add the upgrade hook to D7 and then assume that everybody upgrading to D8 has upgraded at least to 7.9 (assuming this fix will be included in that release). I think there was a similar requirement about upgrading to 6.something before upgrading to D7. If you don't agree, it should be fairly easy to add the same upgrade hook to D8 as well (running it twice shouldn't hurt).
Comment #20
c960657 CreditAttribution: c960657 commentedReroll of D8 patch after #22336: Move all core Drupal files under a /core folder to improve usability and upgrades.
Comment #21
Gábor HojtsyI'm not 100% sure how we could write a sane update hook for this. We would need to identify string duplicates and then figure out if either of them have translations in any number of languages, and then try and merge those translations. If there are multiple translations for certain languages for certain duplicates, what would we do? Looks like a messy situation. Sounds like it would be preferable to stop the bleeding (get the fix in) and then figure out a data cleanup routine (which in itself looks like going to be no small feat).
Comment #22
c960657 CreditAttribution: c960657 commentedDid you see my D7 patch? My D8 patch assumes that all duplicates have been removed prior to upgrading D7→D8. I can backport the D8 patch to D7 - this will stop the bleeding.
The D7 patch loops through all strings with duplicates and selects one of the rows in {locales_source} as the "winner", preferably one that has a corresponding row in {locales_target}. This does not handle the case where
1) one string has several different translations in the same language, or
2) the same string is translated into several different languages with the translations referencing several different rows in {locales_source}.
So the proposed algorithm leads to a data loss in some cases. This is also the case for the current code that allows a translation to become "hidden" because of a duplicate (the problem is probably not very visible, because even though databases do not guarantee a special order of the rows when there is no ORDER BY clause, the order is probably stable in practice). Do you think we should try harder to reduce this data loss?
Comment #23
Gábor HojtsyI think we generally try to avoid data loss as much as possible, but I don't have great ideas to overcome it this time. I think however that that kind of data loss would be much more tolerated on a major upgrade vs. a minor one, so I'd put the data cleanup in the major upgrade, not the minor one. I understand that would leave D7 dbs in an inconsistent state, which might not indeed be good. What do you think?
Comment #24
c960657 CreditAttribution: c960657 commentedI agree. Some manual work (incl. checking translations) and planned downtime is expected when doing a major upgrade.
This updated D7 patch carefully deletes duplicates when possible but without data loss. This will probably only leave very few duplicate rows, i.e. only strings that have been translated into two (or more) different strings in the same language.
Then we can add the more aggressive update hook from the patch in #19 to the D8 branch. This will delete any leftover duplicates.
Comment #25
Gábor HojtsyThat looks good to me, but we do need the D8 patch to land first and then "backport" to D7.
Comment #26
c960657 CreditAttribution: c960657 commentedHere is the D8 patch.
Comment #28
c960657 CreditAttribution: c960657 commented#26: locale-race-condition-8.patch queued for re-testing.
Comment #30
c960657 CreditAttribution: c960657 commentedHmm, I cannot reproduce the test bot failure.
Comment #31
c960657 CreditAttribution: c960657 commentedComment #32
c960657 CreditAttribution: c960657 commentedAdding the patch from #1331350: [Upgrade path broken] Entity module not enabled during D8 upgrade to see if that makes the test bot happy.
Comment #33
c960657 CreditAttribution: c960657 commentedComment #35
c960657 CreditAttribution: c960657 commentedI'm still not quite sure what is going on, but I think it is related to #1223502: Update for ripping out textgroup support in Drupal locale module, so now I have included the patch from that ticket.
Comment #37
c960657 CreditAttribution: c960657 commentedThird time lucky? I think #1336170: Add locale module to upgrade tests is the culprit.
Comment #38
c960657 CreditAttribution: c960657 commented#26: locale-race-condition-8.patch queued for re-testing.
Comment #39
c960657 CreditAttribution: c960657 commentedComment #40
c960657 CreditAttribution: c960657 commentedNow that #1336170: Add locale module to upgrade tests has been fixed, the test bot is happy. The D7 patch is attached to comment #24.
Comment #41
Gábor HojtsyCan we get someone else to review/support these patches? :) That would be great.
Comment #42
Gábor HojtsyAdding UI language translation tag.
Comment #43
Gábor HojtsyOnce again, any other opinions, others to verify the bug + the fix is good?
Comment #44
Sutharsan CreditAttribution: Sutharsan commentedChecked the code, investigated and tested the merge query, agree on the hook_update approach.
Comment #45
Gábor HojtsyHere is a very tiny change to conform to current coding standards. "Remove" changed to "Removes" in the first phpdoc line. No other change, RTBC should still stand.
Comment #46
Gábor HojtsyAdd backport tag to indicate this is bug to be fixed in D7 too. There is a solid D7 patch in #24 to be reviewed more and to land after the D8 patch landed.
Comment #47
andypostThis query should be checked against pgsql and sqlite.
Comment #48
Gábor HojtsySounds like you are saying this is not RTBC?
Comment #49
andyposttested update with pgsql & sqlite, updated works!
Comment #50
Gábor HojtsyUpdate 8002 was backed out in #1272840: Add upgrade path for language domains and validation so rerolling to attempt to use that number which is the current last number to take for updates. No other change so should still be RTBC as per above manual tests too.
Comment #51
Sutharsan CreditAttribution: Sutharsan commented#50 passed the test bot. Back to RTBC.
Comment #52
Gábor HojtsyTag for sprint to be easier to follow.
Comment #53
webchickI'm not sure exactly in what order is best to commit D8MI patches, but since this one is a candidate for D7 backport, I started with it. :)
The switch to db_merge() is sensible. I'm a little nervous about the upgrade path accidentally destroying data. :\ But I can see that it does its best to be careful, and I think the reasoning of doing a "lite" data destruction in D7 and a "heavy" data destruction in D8 may be the way to handle that.
Thanks for the additional testing on pgsql and sqlite, too. Yay!
Committed and pushed to 8.x. Sounds like the D7 patch in #24 could use additional review (and possibly needs a re-roll; I'll check in a sec.)
Comment #54
webchick#24: locale-race-condition-7-D7.patch queued for re-testing.
Comment #55
c960657 CreditAttribution: c960657 commentedPatch from #24 reuploaded for the sake of the test bot (on this page the test status is yellow, but on qa.drupal.org it is green).
Comment #56
Gábor HojtsyReviewed again, looks good.
Comment #57
webchickOk, great!
Committed and pushed to 7.x. Tagging to mention in the 7.13 release notes.
Comment #58
c960657 CreditAttribution: c960657 commentedD6 backport. The update hook is successfully tested on a D6 site with close to 1,000,000 duplicates (I'm not sure all of these duplicates were due to this race condition, but the important thing is that they are eliminated by the update hook).
Comment #60
c960657 CreditAttribution: c960657 commentedComment #62
c960657 CreditAttribution: c960657 commented#60: locale-race-condition-10-D6.patch queued for re-testing.
Comment #64
c960657 CreditAttribution: c960657 commented#60: locale-race-condition-10-D6.patch queued for re-testing.
Comment #65
c960657 CreditAttribution: c960657 commentedThis time the test bot is satisfied.
Comment #66
MHLut CreditAttribution: MHLut commentedAfter updating from Drupal core 7.12 to 7.14, I ran update.php. I got the following error:
Your {locales_source} table contains duplicates that could not be removed automatically. See http://drupal.org/node/746240 for more information.
After updating, no visible inconsistencies (related to this error) occurred on the website.
I'm not sure what to do with this. I don't know what caused this error and why it's referring my to a D6 issue. If this rings a bell for anyone, please let me know!
Comment #67
pounard@mhlut I had real world errors and crashes because of this, but in very peculiar cases. If you don't experience anything, you can leave them. But I would advice you to get rid of it: it's pretty simple to achieve:
locales_source
andlocales_target
SQL tablesEDIT: In any cases, backup your database first in case something goes wrong during this procedure!
Comment #68
c960657 CreditAttribution: c960657 commentedUpdated D6 patch that uses update_sql().
Comment #69
surgeonbor CreditAttribution: surgeonbor commented@c960657, the update hook in your D6 patch only touches rows with textgroup = 'default'. Is that by design? Should it be modified to remove duplicates in all textgroups?
Comment #70
c960657 CreditAttribution: c960657 commentedIt is by design. The race condition that caused all the duplicate mess is part of the locale module, and that module only deals with 'default' group, so we can be sure that all duplicates introduces by this specific race condition have textgroup = 'default'.
Of course, other modules using other textgroups may have had a similar bug and requires a similar fix. I don't know which modules use other textgroups or how they use them, so I am reluctant about deleting rows created by them. If relevant, they can create update hooks based on the one in the core. But if there is consensus that cleaning up other textgroups is useful and safe, I don't mind updating the patch.
Comment #71
Gábor HojtsyI think we should leave other textgroups alone, they were never managed by core and should not be touched by core updates IMHO.
Comment #72
cweagansFixing tags per http://drupal.org/node/1517250
Comment #73
hefox CreditAttribution: hefox commentedJust found this as I have seen this in d7 still, but I believe that is due to race condition in db_merge itself due to heavy traffic (e.g. performance tests).