Drupal 10, the latest version of the open-source digital experience platform with even more features, is here.If this is not given then in the database backend all kinds of strange things can happen that the db_update fails (as expire is the same) and such a process cannot extend its own lock ...
This is very easy to reproduce by using memcache and memcache's stampede protection, but then not using memcache's lock.inc.
This can also be reproduced by the following script:
$x = lock_acquire('a', 15);
$y = lock_acquire('a', 15);
$z = lock_acquire('a', 15);
Where $y and $z fail ...
D7 patch:
diff --git a/includes/lock.inc b/includes/lock.inc
index 7dd8db3..f426341 100644
--- a/includes/lock.inc
+++ b/includes/lock.inc
@@ -67,6 +67,11 @@
function lock_initialize() {
global $locks;
+ // The lock system needs at least a precision of 14.
+ if (ini_get('precision') < 14) {
+ ini_set('precision', 14);
+ }
+
$locks = array();
}
| Comment | File | Size | Author |
|---|---|---|---|
| #83 | 2077827-83-precision14-7x_do_not_test.patch | 676 bytes | Mile23 |
| #77 | core-lock-system-not-enough-precision-2077827-77-D8.patch | 743 bytes | Krzysztof Domański |
| #66 | lock_system_needs_a-2077827-49.patch | 750 bytes | mgifford |
| #50 | interdiff-2077827-44-49.txt | 2.05 KB | nlisgo |
| #49 | lock_system_needs_a-2077827-49.patch | 750 bytes | nlisgo |
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 80,853 pass(es). View | |||











Comments
Comment #1
Fabianx CreditAttribution: Fabianx commented.
Comment #2
Fabianx CreditAttribution: Fabianx commentedTagging novice to create a proper patch for D8
Comment #3
littledynamo CreditAttribution: littledynamo commentedWe'll tackle this one at our code sprint next week.
Comment #4
littledynamo CreditAttribution: littledynamo commentedComment #5
Pete B CreditAttribution: Pete B commentedPatch for this
Comment #6
Fabianx CreditAttribution: Fabianx commentedLooks, good, this will need some Tests.
Comment #7
pingers CreditAttribution: pingers commentedWhat would we be testing? ini_set()? Not sure this does need tests.
I don't think it was mentione: 14 is the default precision for PHP. http://www.php.net/manual/en/ini.core.php#ini.precision
Comment #8
Fabianx CreditAttribution: Fabianx commentedYeah, in that case this is probably good to go.
Comment #9
webchickCan we get the script in the OP as an automated test so we don't break this again?
Comment #10
martin107 CreditAttribution: martin107 commented5: drupal_core-lock_system_precision-2077827-5.patch queued for re-testing.
Comment #12
martin107 CreditAttribution: martin107 commentedComment #13
jorgegc CreditAttribution: jorgegc commentedI am happy to do a reroll for patch in #5.
Comment #14
jorgegc CreditAttribution: jorgegc commentedRerolled patch in #5.
Comment #15
jorgegc CreditAttribution: jorgegc commentedAs required by @webchick, I've rewritten the patch and included a test for the precision. Let's see what the test bot comes back with!
Comment #17
benjy CreditAttribution: benjy commentedI think we should expand on this a little. It's not clear that we're setting it to 12 so we can later test to ensure that the constructor updated the prevision to at least 14.
Comment #18
jorgegc CreditAttribution: jorgegc commentedFair enough @benjy.
@Fabianx said that the default precision is 14, If that is the case, why did it break?
Comment #19
Damien Tournoud CreditAttribution: Damien Tournoud commentedInstinctively, this sounds like trying to patch the symptoms.
@Fabianx: do you have more detail on why precision matters? From a quick read of the code we don't seem to be doing any floating point manipulation, other then the expiration logic that we could get rid of, and should not affect the normal code path.
Comment #20
Fabianx CreditAttribution: Fabianx commentedI don't remember the details anymore, but if you use a precision of 12, you get a DB query that is comparing timestamps, but the timestamps are the same.
so the lock_acquire fails - even though you hold the lock ...
leading to very strange behavior.
Comment #21
Xen CreditAttribution: Xen at Reload! commentedDatabaseLockBackend::aquire calls microtime(true) which returns a float. That is then rendered into a string for the SQL, and it makes sense that's where precision kicks in. Testing locally, with a precision of 12, the timestamp gets rendered with only two decimal digits, which is sure to mess something up. Basically, it only prints decimal digits up until the precision.
One way to avoid the problem would be to use microtime() and do some string manipulation to get the proper string, but then we're basically passing the buck to the DB. It's currently a big float which should be sufficient, but figuring out the exact limits we need and using a decimal would be a tad safer.
But as this setting influences *all* float rendering, I think it should be promoted to be set in bootstrapping so it's consistent all across.
I'm elevating this to critical as it *will* break Drupal on hosts configured with non-standard precision.
Comment #22
dawehnerHa, this comment is a bit confusing, let's explain that we test the automatic changing to precision 14 later, maybe with an @see ?
Comment #23
catchLet's fix that comment, otherwise looks RTBC.
Comment #24
gobinathmwill update the comment shortly...
Comment #25
nlisgo CreditAttribution: nlisgo commentedRe-roll only. Leaving comment for @gobinathm.
Comment #26
nlisgo CreditAttribution: nlisgo commentedComment #27
nlisgo CreditAttribution: nlisgo commentedAwaiting change to comment by @gobinathm
Comment #28
nlisgo CreditAttribution: nlisgo commentedComment #29
nlisgo CreditAttribution: nlisgo commentedComment #30
jorgegc CreditAttribution: jorgegc commentedI thought the comment in __construct() was going to be updated as well to explain why we need a precision of 14.
Comment #31
Fabianx CreditAttribution: Fabianx commentedIndeed lets add some information, why we need a precision of at least 14, see above comments for an explanation.
Comment #32
jorgegc CreditAttribution: jorgegc commentedHappy to do it @Fabianx given that the poor comment is mine :-)
Comment #33
jorgegc CreditAttribution: jorgegc commentedComment #34
nlisgo CreditAttribution: nlisgo commentedComment #35
nlisgo CreditAttribution: nlisgo commented@jorgegc, I'd all but exported the patch before I saw your messages. Feel free to change the comment.
While looking into the issue to attempt to write an appropriate comment I did stumble across this post: http://php.net/manual/en/function.microtime.php#113078
It is suggested in that post that a precision of 16 could ensure that the full value that microtime returns could be preserved on string conversions. But precision 14 has been found to be enough for this?
I do also wonder if setting this precision value should be more global. microtime() is used extensively in core.
Comment #36
Fabianx CreditAttribution: Fabianx commentedLets add something like:
Else the process that holds the lock and tries to re-acquire it will fail.
Comment #37
nlisgo CreditAttribution: nlisgo commentedComment #38
Fabianx CreditAttribution: Fabianx commentedIt looks great to me, will re-review whole patch later again to RTBC.
Comment #39
catchI think we should consider just setting this globally to the higher precision in this issue. The lock system could then throw an exception if it's somehow not set correctly.
Comment #40
Fabianx CreditAttribution: Fabianx commented#39 I tend to agree, but where is globally this times? Container level? Drupal Kernel?
Comment #41
catchDrupalKernel::bootEnvironment()Comment #42
Fabianx CreditAttribution: Fabianx commentedCNW based on #41 then
Comment #43
nlisgo CreditAttribution: nlisgo commentedI will work on supplying a patch for review later today.
Comment #44
nlisgo CreditAttribution: nlisgo commentedI hope this is what you were going for. Please provide feedback.
Comment #45
nlisgo CreditAttribution: nlisgo commentedComment #46
Fabianx CreditAttribution: Fabianx commentedLooks great to me.
Comment #47
Fabianx CreditAttribution: Fabianx commentedLet's RTBC this and see what core committers think.
Comment #48
alexpottIs it really right to throw an exception given that DrupalKernel contains the ini_set()?
This fail might be wrong. I think we should skip the unit test in this case. DrupalKernal has not run btw so the only reason this passes is that precision is 14 or more on testbots.
Comment #49
nlisgo CreditAttribution: nlisgo commentedThat leaves us with just setting the precision.
It makes sense to remove this test, particularly since DrupalKernel is not run :) The only thing a test or check would protect from is if someone later decided that setting the precision was unnecessary then the problems described in this issue would re-occur. But git blame on this code should lead someone back to this issue before thinking about changing it.
Here is a patch for review.
Comment #50
nlisgo CreditAttribution: nlisgo commentedWoops! I triggered the testbot with my interdiff because I mistakenly added the .patch extension. Here it is again as .txt :) Sorry testbot!
Comment #51
nlisgo CreditAttribution: nlisgo commentedComment #53
nlisgo CreditAttribution: nlisgo commentedIgnore the failed interdiff-2077827-44-49.patch (#holdsheadinshame). Patch in #49 is good for review.
Comment #54
catch@alexpott I think it's worth throwing the exception to ensure an early failure if DrupalKernel hasn't run.
Comment #55
alexpottThis returns:
Is this a memcache only issue?
Comment #56
alexpottDiscussed with @nlisgo on IRC I think the problem is in DatabaseLockBackend::lockMayBeAvailable()
Given:
See http://php.net/manual/en/language.types.float.php - thanks @catch
Comment #57
nlisgo CreditAttribution: nlisgo commentedHappy to take on the refactoring of this. Will post up for review.
Comment #58
nlisgo CreditAttribution: nlisgo commentedThe solution proposed in #56 does not address the issue described in #21. If the precision value is not high enough then the expire value that is written to the database will not be fit for a comparison later on in the lockMayBeAvailable method.
What the suggested solution in #56 does draw attention to, however, is that microtime comparisons are only at risk because of the precision setting if the value or values being compared are converted to and from a string.
I think we still need to address the precision setting. But it will strengthen this issue if we can provide more reliable steps to reproduce the issue.
@Fabianx can you respond to #55?
Comment #59
nlisgo CreditAttribution: nlisgo commentedIn an effort to continue the momentum of this issue. I believe that we should consider the last patch for review again. When reviewing please take on board the comments since the last patch was submitted.
The patch for review is in #49.
Comment #60
xjmThis fits the definition of a major bug per: https://www.drupal.org/core/issue-priority#major-bugs. There are also workarounds possible on those hosts affected; for example, in Apache configuration, .htaccess, settings.php, etc. Therefore, it need not block release. However, it is major because it will render the system unusable in these cases and will likely be difficult to isolate.
This could also be addressed with a hook_requirements() when a higher precision is needed.
Comment #61
nlisgo CreditAttribution: nlisgo commentedComment #62
Fabianx CreditAttribution: Fabianx commentedI just confirmed the bug again for D7:
gives TRUE,FALSE,FALSE.
And MAMP (!) ships still with ini precision 12 by default ...
Trying D8 now ...
Comment #63
Fabianx CreditAttribution: Fabianx commentedOkay, so this is way way way trickier than I ever expected.
There is a huge BC break in db_update() vs. $this->database->update(), which is exposed by this - unless this is recorded in some change record. EDIT: Yes, this is recorded in a change record, so fine ...
Before db_update() would return 0 when 0 values had been changed, which led to the bug.
Now it returns 1 values.
Basically we get:
on both D7 and D8, but Drupal 8 does use Rows matched, while Drupal 7 does use 'Changed' value.
b) The failure condition does not make sense at all. Tentatively setting to critical again (can be demoted again for sure)
Neither in D7 nor in D8.
Consider the rare edge case of a lock being broken, but the process did hold it:
process_A:
process_B:
Timeline:
process A: lock_wait('a', 2); // Need it only for 2s, but due to some code it needs longer and process A takes 10s.
process B: lock_wait('a', 5);
process B: lock_release('a');
// The lock a is open at this point.
process A: lock_wait('a', 2); // BOOM! This fails even though, because A thinks it still has the lock.
Proposed resolution for this is:
So the else {} is faulty and there is still the float comparison problem, ugh ...
Problems of the lock system
-
Figure out what is up with the database layer, do we really return affected values instead of changed values now? - create critical follow-up- Fix precision problem - at least for Drupal 7 - Patch exists
- Fix failure case to be _correct_. Even if I had a lock, I need to re-try gaining the lock if prolonging did not work.
- Figure out what to do with the float comparison and if this has more bugs ...
Comment #64
Fabianx CreditAttribution: Fabianx commentedI found the change record for the DB behavior change:
https://www.drupal.org/node/2019303
Just q question though, can someone set this option to something else?
In this case, the one major part is only the failure case and the float comparison, but also happy to continue with D7 at this point and open another issue for the other problems.
Comment #65
jhedstromThe issue summary is quite out of date with more recent comments.
Comment #66
mgiffordRe-uploading patch for the bots.
Comment #67
Nephele CreditAttribution: Nephele commentedI just encountered this problem. Since this is one of the most frustrating bugs I've had to deal with, I wanted to share some details in case a concrete example is useful. I'm working in D7, but I was able to duplicate the problem in a clean install of D8.1.x
I started the day trying to test a new page I was developing on my test site. But I couldn't get drupal to recognize the new page. I tried to rebuild menus every way I know; disabled and re-enabled the module; etc. But menu-rebuilding just refused to happen, and there were no warnings or messages in any of the logs. At this point I assumed something was broken with the menu system.
But over the next few hours I kept diving deeper and deeper into core code, and eventually tracked the problem to lock_may_be_available(). Apparently a menu rebuild had failed about a month ago, and had left the following entry in the semaphore table:
Attempts to delete the entry were failing because the deletion conditions were rounding expire to two digits (e.g.,
WHERE expire <= 1455821246.68). Therefore all menu_rebuild requests were being silently ignored.Once I finally found this issue, a D7 version of patch #49 worked (as did the patch provided by the OP). I also duplicated the issue in D8, by inserting the same row into the D8 semaphore table, except with name changed to
menuLinksRebuild. The same problem occurred in D8 -- all attempts to rebuild menus were ignored -- and patch #49 fixed the problem. Also note that memcache is not enabled on my test site, but problems still occurred.One point of my example is to echo #60: lock expiration rounding errors are indeed "difficult to isolate." They make seemingly unrelated drupal components fail and only very determined developers have any chance of identifying the underlying problem. So this may be the cause of many reported errors on this site -- including many errors tagged as unreproducible, given that the behavior depends upon an obscure PHP setting.
I would vote in favor of implementing patch #49, and sooner rather than later, to help resolve all the possibly undiagnosed but related problems. The tests in #44 seem unnecessary, given that testbots will never trigger the error.
However patch #49 is a band-aid rather than a real fix. Perhaps a separate followup issue should be created to explore other, more stable options, that are not dependent upon floating-point rounding. My guess is that development of any such alternative will take some time, which is why the current patch should be implemented first.
Comment #71
bmx269 CreditAttribution: bmx269 commentedTested and verified that #66 works
Comment #72
joelpittet@webchick and @Fabianx the default PHP setting is 14. What should we be testing?
http://php.net/manual/en/ini.core.php#ini.precision
Comment #75
fgmJust wondering: any reason why we cannot use an explicit high-accuracy serialization starting from
microtime(FALSE)?When using that format, the microsecond part appears to have 1E-8 resolution, but 1E-6 accuracy, while microtime(TRUE) only has 1E-4 accuracy.
So we could just replace the double in the semaphore schema by a CHAR(16) (10 digits for seconds, 6 for microseconds) and store the actual microsecond value as a fixed length string. This should both remove the need for the dependency on the PHP precision check and eliminate any risk of rounding error mentioned against @alexpott's code in #56. Think of something like that:
That timing logic could even be moved to LockBackendAbstract() to allow it to be shared between lock backends.
Comment #77
Krzysztof DomańskiRe-uploading patch for re-testing in 8.6.x-dev.
I used the code from #49. It is the same as #66 so interdiff is not needed.
Comment #80
Krzysztof DomańskiComment #81
quietone CreditAttribution: quietone as a volunteer commentedSince this needs tests and an IS update, setting to NW.
Comment #83
Mile23Here's a 7.x patch for anyone who might find it handy.
No guarantee of usefulness or problem-solving-ness... Just a conversion to 7.x.
Also, I noticed in the docs that we also have this: https://www.php.net/manual/en/ini.core.php#ini.serialize-precision
That is,
serialize_precisionis a thing, and maybe we need to care about it.