Proposed commit message
Issue #2425259 by catch, sidharrell, nlisgo, Josh Waihi, Fabianx, Berdir, martin107: Router rebuild lock_wait() condition can result in rebuild later in the request (race condition)
The problem
menu_rebuild() / Router::rebuild() has additional race conditions which aren't accounted for by the lock.
1. Processes that fail to acquire a lock, have the menu_rebuild_needed variable / state and no menu masks in memory. If they subsequently call menu_get_item() / any function on the router that checks for rebuild, that will trigger yet another menu_rebuild() / Router::rebuild().
2. Additionally, there's a window between variable_initialize() / state() retrieval and menu_get_item() / Router::[public] where another process may have fully completed a menu_rebuild() / Router::rebuild() and freed the lock. In this case we don't need to rebuild the menu, but we do anyway.
#1 is a serious race condition - it can cause menu_rebuild() / Router::rebuild() to occur over and over again - as one process acquires the lock, other processes lock_wait() / Lock::wait(), then rebuild, then acquire the lock, then more process lock_wait() / Lock::wait() etc. - proper lock stampede.
#2 is more of an optimization to make the current stampede protection more robust.
Both problems exist in 8.x too despite the code having changed a lot. They have been shown in above code with the / for D8 as the alternative implementation.
Its really just the naming that has changed.
Original report
Everytime menu_rebuild() is called, we get a crazy amount of waits on INSERT INTO menu_router and a bunch of DELETE FROM menu_router. I believe there is some sort of race condition for rebuilding menus. The only way I can kill it is to kill off a bunch of processes. I would imagine that setting a lock for the first user to rebuild that table would be a simple enough fix, but I wanted to open a ticket and see if anyone else had come up with a solution first.
Please help, this is crashing our site: http://www.divx.com/ every time we update a menu or clear the cache.
Comment | File | Size | Author |
---|---|---|---|
#5 | 356399_menu_rebuild-44.patch | 2.42 KB | catch |
Comments
Comment #1
David_Rothstein CreditAttribution: David_Rothstein commentedFor #1, I wonder if #1232346: Easy to trigger multiple menu rebuilds per page (including infinite recursion) via menu_get_item() since 7.12 is related (seems to be a somewhat similar issue and touches the same code).
Comment #2
Fabianx CreditAttribution: Fabianx commentedUhm, we should put our RTBC patch to this issue, so we can finally get this in D7.
And yes, that issue looks strongly related.
Comment #3
Fabianx CreditAttribution: Fabianx commentedPinging catch
Comment #4
Fabianx CreditAttribution: Fabianx commentedhttps://www.drupal.org/node/356399#comment-9176891 had the last RTBC 7.x patch.
Will put here tomorrow if catch does not beat me to it.
Comment #5
catchHere it is.
Comment #6
Fabianx CreditAttribution: Fabianx commentedBack to RTBC (based on my original review)
Comment #8
Fabianx CreditAttribution: Fabianx commentedComment #10
Fabianx CreditAttribution: Fabianx commentedComment #11
David_Rothstein CreditAttribution: David_Rothstein commentedOK, so the other issue basically fixed this in Drupal 8 in a totally different way, and this is just the Drupal 7 version.
The patch makes sense. I'm slightly scared of the part where menu_rebuild() calls itself, but I guess it's only in the context of failing to get a lock and waiting a long time anyway, so hopefully it's all OK.
Committed to 7.x - thanks!
I fixed a couple small issues with the code comments on commit: