I noticed some duplicates in the {locales_source} table. This is not supposed to happen and may give unexpected results when trying to fetch the corresponding translation.

For performance reasons (I assume) there is not a UNIQUE index that prevents duplicate rows from being inserted. Luckily we can use the new locking API.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

c960657’s picture

Status: Needs review » Needs work

The last submitted patch, locale-race-condition-2.patch, failed testing.

c960657’s picture

Status: Needs work » Needs review

#1: locale-race-condition-2.patch queued for re-testing.

c960657’s picture

#1: locale-race-condition-2.patch queued for re-testing.

c960657’s picture

Use a separate semaphore per string instead of one common semaphore.

This is an annoying bug, especially on high-traffic sites. I hope this will be fixed in D7.

c960657’s picture

#5: locale-race-condition-3.patch queued for re-testing.

Damien Tournoud’s picture

Status: Needs review » Needs work

This looks like the textbook case for a merge query.

andypost’s picture

Locking patch could lead to serious bottleneck on i18n sites, by default code uses DB to perform locking

Also md5() been removed from the most of core

andypost’s picture

plach’s picture

subscribing

c960657’s picture

Status: Needs work » Needs review
FileSize
820 bytes

Here it is with a merge query and without locking.

Why do you think #851362: Add hash column to {locales_source} to query faster locale strings should land before this one? I don't see why the order matters.

andypost’s picture

Sure. Let's fix this one first and then think about uniqueness of strings!

Patch looks good, waiting for bot.

c960657’s picture

I would like to add an update hook that removes the duplicates. Would it be sufficient to add this to D7 and D6 only, assuming that all sites will have run the update hook in D7 before they upgrade to D8? It seems a reasonable assumption, given that D8 is not going to be released any time soon.

andypost’s picture

c960657’s picture

> Patch looks good, waiting for bot.

The bot is happy :-)

Gábor Hojtsy’s picture

Issue tags: +D8MI

Tag for Drupal 8 Multilingual Initiative.

c960657’s picture

andypost’s picture

Status: Needs review » Needs work

The questions are
- should we throw away VERSION as key?
- hook_update_N()
- D7 backport


+++ b/modules/locale/locale.moduleundefined
@@ -668,12 +668,14 @@ function locale($string = NULL, $context = NULL, $langcode = NULL) {
+        ->key(array(
           'source' => $string,
           'context' => (string) $context,
-          'version' => VERSION,

I think version key is useful

-2 days to next Drupal core point release.

c960657’s picture

Status: Needs work » Needs review
FileSize
2.29 KB

- should we throw away VERSION as key?

I'm not sure what you mean? The version field is currently not part of the key. The field is an information field that contains the Drupal core version in use the last time a given string was translated. I am not changing the current behaviour (well, at least that's not my intention).

- D7 backport

locale-race-condition-4.patch above applies to D7.

- hook_update_N()

Oh, I had forgotten about that. Above we agreed to postpone the decision about the update hook until some conclusions were made in #1050616: Figure out backport workflow from Drupal 8 to Drupal 7. I must admit that I have not read through that entire issue, but it seems that the update hook issue wasn't discussed in great detail. Anywhere, here is a D7 patch with an upgrade hook that removes the duplicates. As mentioned above, I think it would be sufficient to add the upgrade hook to D7 and then assume that everybody upgrading to D8 has upgraded at least to 7.9 (assuming this fix will be included in that release). I think there was a similar requirement about upgrading to 6.something before upgrading to D7. If you don't agree, it should be fairly easy to add the same upgrade hook to D8 as well (running it twice shouldn't hurt).

c960657’s picture

Gábor Hojtsy’s picture

I'm not 100% sure how we could write a sane update hook for this. We would need to identify string duplicates and then figure out if either of them have translations in any number of languages, and then try and merge those translations. If there are multiple translations for certain languages for certain duplicates, what would we do? Looks like a messy situation. Sounds like it would be preferable to stop the bleeding (get the fix in) and then figure out a data cleanup routine (which in itself looks like going to be no small feat).

c960657’s picture

Did you see my D7 patch? My D8 patch assumes that all duplicates have been removed prior to upgrading D7→D8. I can backport the D8 patch to D7 - this will stop the bleeding.

The D7 patch loops through all strings with duplicates and selects one of the rows in {locales_source} as the "winner", preferably one that has a corresponding row in {locales_target}. This does not handle the case where
1) one string has several different translations in the same language, or
2) the same string is translated into several different languages with the translations referencing several different rows in {locales_source}.

So the proposed algorithm leads to a data loss in some cases. This is also the case for the current code that allows a translation to become "hidden" because of a duplicate (the problem is probably not very visible, because even though databases do not guarantee a special order of the rows when there is no ORDER BY clause, the order is probably stable in practice). Do you think we should try harder to reduce this data loss?

Gábor Hojtsy’s picture

I think we generally try to avoid data loss as much as possible, but I don't have great ideas to overcome it this time. I think however that that kind of data loss would be much more tolerated on a major upgrade vs. a minor one, so I'd put the data cleanup in the major upgrade, not the minor one. I understand that would leave D7 dbs in an inconsistent state, which might not indeed be good. What do you think?

c960657’s picture

I agree. Some manual work (incl. checking translations) and planned downtime is expected when doing a major upgrade.

This updated D7 patch carefully deletes duplicates when possible but without data loss. This will probably only leave very few duplicate rows, i.e. only strings that have been translated into two (or more) different strings in the same language.

Then we can add the more aggressive update hook from the patch in #19 to the D8 branch. This will delete any leftover duplicates.

Gábor Hojtsy’s picture

That looks good to me, but we do need the D8 patch to land first and then "backport" to D7.

c960657’s picture

Here is the D8 patch.

Status: Needs review » Needs work
Issue tags: -D8MI

The last submitted patch, locale-race-condition-8.patch, failed testing.

c960657’s picture

Status: Needs work » Needs review

#26: locale-race-condition-8.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +D8MI

The last submitted patch, locale-race-condition-8.patch, failed testing.

c960657’s picture

Status: Needs work » Needs review

Hmm, I cannot reproduce the test bot failure.

c960657’s picture

Status: Needs review » Needs work
c960657’s picture

Adding the patch from #1331350: [Upgrade path broken] Entity module not enabled during D8 upgrade to see if that makes the test bot happy.

c960657’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, locale-race-condition-9.patch, failed testing.

c960657’s picture

Status: Needs work » Needs review
FileSize
2.97 KB

I'm still not quite sure what is going on, but I think it is related to #1223502: Update for ripping out textgroup support in Drupal locale module, so now I have included the patch from that ticket.

Status: Needs review » Needs work

The last submitted patch, locale-race-condition-10.patch, failed testing.

c960657’s picture

Third time lucky? I think #1336170: Add locale module to upgrade tests is the culprit.

c960657’s picture

Status: Needs work » Needs review

#26: locale-race-condition-8.patch queued for re-testing.

c960657’s picture

c960657’s picture

Now that #1336170: Add locale module to upgrade tests has been fixed, the test bot is happy. The D7 patch is attached to comment #24.

Gábor Hojtsy’s picture

Can we get someone else to review/support these patches? :) That would be great.

Gábor Hojtsy’s picture

Issue tags: +language-ui

Adding UI language translation tag.

Gábor Hojtsy’s picture

Once again, any other opinions, others to verify the bug + the fix is good?

Sutharsan’s picture

Status: Needs review » Reviewed & tested by the community

Checked the code, investigated and tested the merge query, agree on the hook_update approach.

Gábor Hojtsy’s picture

Here is a very tiny change to conform to current coding standards. "Remove" changed to "Removes" in the first phpdoc line. No other change, RTBC should still stand.

Gábor Hojtsy’s picture

Add backport tag to indicate this is bug to be fixed in D7 too. There is a solid D7 patch in #24 to be reviewed more and to land after the D8 patch landed.

andypost’s picture

+++ b/core/modules/locale/locale.install
@@ -278,6 +278,36 @@ function locale_update_8002() {
+  $results = db_query("SELECT source, context FROM {locales_source} GROUP BY source, context HAVING COUNT(*) > 1");

This query should be checked against pgsql and sqlite.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs review

Sounds like you are saying this is not RTBC?

andypost’s picture

Status: Needs review » Reviewed & tested by the community

tested update with pgsql & sqlite, updated works!

Gábor Hojtsy’s picture

Update 8002 was backed out in #1272840: Add upgrade path for language domains and validation so rerolling to attempt to use that number which is the current last number to take for updates. No other change so should still be RTBC as per above manual tests too.

Sutharsan’s picture

#50 passed the test bot. Back to RTBC.

Gábor Hojtsy’s picture

Issue tags: +sprint

Tag for sprint to be easier to follow.

webchick’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Needs review

I'm not sure exactly in what order is best to commit D8MI patches, but since this one is a candidate for D7 backport, I started with it. :)

The switch to db_merge() is sensible. I'm a little nervous about the upgrade path accidentally destroying data. :\ But I can see that it does its best to be careful, and I think the reasoning of doing a "lite" data destruction in D7 and a "heavy" data destruction in D8 may be the way to handle that.

Thanks for the additional testing on pgsql and sqlite, too. Yay!

Committed and pushed to 8.x. Sounds like the D7 patch in #24 could use additional review (and possibly needs a re-roll; I'll check in a sec.)

webchick’s picture

#24: locale-race-condition-7-D7.patch queued for re-testing.

c960657’s picture

Patch from #24 reuploaded for the sake of the test bot (on this page the test status is yellow, but on qa.drupal.org it is green).

Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -sprint

Reviewed again, looks good.

webchick’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: +7.13 release notes

Ok, great!

Committed and pushed to 7.x. Tagging to mention in the 7.13 release notes.

c960657’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Needs review
FileSize
4.22 KB

D6 backport. The update hook is successfully tested on a D6 site with close to 1,000,000 duplicates (I'm not sure all of these duplicates were due to this race condition, but the important thing is that they are eliminated by the update hook).

Status: Needs review » Needs work

The last submitted patch, locale-race-condition-9-D6.patch, failed testing.

c960657’s picture

Status: Needs work » Needs review
FileSize
4.14 KB

Status: Needs review » Needs work
Issue tags: -D8MI, -language-ui, -7.13 release notes

The last submitted patch, locale-race-condition-10-D6.patch, failed testing.

c960657’s picture

Status: Needs work » Needs review

#60: locale-race-condition-10-D6.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, locale-race-condition-10-D6.patch, failed testing.

c960657’s picture

Status: Needs work » Needs review
Issue tags: +D8MI, +language-ui, +7.13 release notes

#60: locale-race-condition-10-D6.patch queued for re-testing.

c960657’s picture

This time the test bot is satisfied.

MHLut’s picture

After updating from Drupal core 7.12 to 7.14, I ran update.php. I got the following error:
Your {locales_source} table contains duplicates that could not be removed automatically. See http://drupal.org/node/746240 for more information.
After updating, no visible inconsistencies (related to this error) occurred on the website.

I'm not sure what to do with this. I don't know what caused this error and why it's referring my to a D6 issue. If this rings a bell for anyone, please let me know!

pounard’s picture

@mhlut I had real world errors and crashes because of this, but in very peculiar cases. If you don't experience anything, you can leave them. But I would advice you to get rid of it: it's pretty simple to achieve:

  1. Export all your translations as .po files
  2. TRUNCATE the locales_source and locales_target SQL tables
  3. Import your exported .po files

EDIT: In any cases, backup your database first in case something goes wrong during this procedure!

c960657’s picture

Updated D6 patch that uses update_sql().

surgeonbor’s picture

@c960657, the update hook in your D6 patch only touches rows with textgroup = 'default'. Is that by design? Should it be modified to remove duplicates in all textgroups?

c960657’s picture

It is by design. The race condition that caused all the duplicate mess is part of the locale module, and that module only deals with 'default' group, so we can be sure that all duplicates introduces by this specific race condition have textgroup = 'default'.

Of course, other modules using other textgroups may have had a similar bug and requires a similar fix. I don't know which modules use other textgroups or how they use them, so I am reluctant about deleting rows created by them. If relevant, they can create update hooks based on the one in the core. But if there is consensus that cleaning up other textgroups is useful and safe, I don't mind updating the patch.

Gábor Hojtsy’s picture

I think we should leave other textgroups alone, they were never managed by core and should not be touched by core updates IMHO.

cweagans’s picture

hefox’s picture

Issue summary: View changes

Just found this as I have seen this in d7 still, but I believe that is due to race condition in db_merge itself due to heavy traffic (e.g. performance tests).

Status: Needs review » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.