Proposed commit message

Issue #2425259 by catch, sidharrell, nlisgo, Josh Waihi, Fabianx, Berdir, martin107: Router rebuild lock_wait() condition can result in rebuild later in the request (race condition)

The problem

menu_rebuild() / Router::rebuild() has additional race conditions which aren't accounted for by the lock.

1. Processes that fail to acquire a lock, have the menu_rebuild_needed variable / state and no menu masks in memory. If they subsequently call menu_get_item() / any function on the router that checks for rebuild, that will trigger yet another menu_rebuild() / Router::rebuild().

2. Additionally, there's a window between variable_initialize() / state() retrieval and menu_get_item() / Router::[public] where another process may have fully completed a menu_rebuild() / Router::rebuild() and freed the lock. In this case we don't need to rebuild the menu, but we do anyway.

#1 is a serious race condition - it can cause menu_rebuild() / Router::rebuild() to occur over and over again - as one process acquires the lock, other processes lock_wait() / Lock::wait(), then rebuild, then acquire the lock, then more process lock_wait() / Lock::wait() etc. - proper lock stampede.

#2 is more of an optimization to make the current stampede protection more robust.

Both problems exist in 8.x too despite the code having changed a lot. They have been shown in above code with the / for D8 as the alternative implementation.
Its really just the naming that has changed.

Original report
Everytime menu_rebuild() is called, we get a crazy amount of waits on INSERT INTO menu_router and a bunch of DELETE FROM menu_router. I believe there is some sort of race condition for rebuilding menus. The only way I can kill it is to kill off a bunch of processes. I would imagine that setting a lock for the first user to rebuild that table would be a simple enough fix, but I wanted to open a ticket and see if anyone else had come up with a solution first.

Please help, this is crashing our site: http://www.divx.com/ every time we update a menu or clear the cache.

CommentFileSizeAuthor
#5 356399_menu_rebuild-44.patch2.42 KBcatch
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

David_Rothstein’s picture

For #1, I wonder if #1232346: Easy to trigger multiple menu rebuilds per page (including infinite recursion) via menu_get_item() since 7.12 is related (seems to be a somewhat similar issue and touches the same code).

Fabianx’s picture

Uhm, we should put our RTBC patch to this issue, so we can finally get this in D7.

And yes, that issue looks strongly related.

Fabianx’s picture

Assigned: Unassigned » catch

Pinging catch

Fabianx’s picture

https://www.drupal.org/node/356399#comment-9176891 had the last RTBC 7.x patch.

Will put here tomorrow if catch does not beat me to it.

catch’s picture

Status: Active » Needs review
FileSize
2.42 KB

Here it is.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Back to RTBC (based on my original review)

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 5: 356399_menu_rebuild-44.patch, failed testing.

Fabianx’s picture

Status: Needs work » Reviewed & tested by the community

Fabianx’s picture

Issue summary: View changes
David_Rothstein’s picture

Status: Reviewed & tested by the community » Fixed

OK, so the other issue basically fixed this in Drupal 8 in a totally different way, and this is just the Drupal 7 version.

The patch makes sense. I'm slightly scared of the part where menu_rebuild() calls itself, but I guess it's only in the context of failing to get a lock and waiting a long time anyway, so hopefully it's all OK.

Committed to 7.x - thanks!

I fixed a couple small issues with the code comments on commit:

--- a/includes/menu.inc
+++ b/includes/menu.inc
@@ -2733,9 +2733,9 @@ function menu_rebuild() {
 
     if (_menu_check_rebuild()) {
       // If we get here and menu_masks was not set, then it is possible a menu
-      // being reloaded, or that the process rebuilding the menu was unable to
-      // complete successfully. A missing menu_masks variable could result in a
-      // 404, so re-run the function.
+      // is being reloaded, or that the process rebuilding the menu was unable
+      // to complete successfully. A missing menu_masks variable could result
+      // in a 404, so re-run the function.
       return menu_rebuild();
     }
     return FALSE;
@@ -2762,7 +2762,7 @@ function menu_rebuild() {
     $transaction->rollback();
     watchdog_exception('menu', $e);
   }
-  // Explicitly commit the transaction now, this ensures that the database
+  // Explicitly commit the transaction now; this ensures that the database
   // operations during the menu rebuild are committed before the lock is made
   // available again, since locks may not always reside in the same database
   // connection. The lock is acquired outside of the transaction so should also

  • David_Rothstein committed 860c060 on 7.x
    Issue #2425259 by catch, sidharrell, nlisgo, Josh Waihi, Fabianx, Berdir...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.