Problem/Motivation
_block_rehash() is a function that loops through all code-declared blocks for a given theme and writes/updates their database entries. It's called by block_flush_caches(), which is invoked on all cron runs when system_cron() attempts to get a list of cache bins to flush. Even if cache_clear_all determines that the block cache doesn't need to be flushed (due to cache minimum lifetime), because _block_rehash() is called directly in the Block module's hook_flush_caches() implementation, it's run on every cron run. That means that there's a DB insert or update for every block declared in code for every active theme on every cron run.
If you happen to have a lot of blocks declared in code, or even a relatively small number of blocks declared in code, but have several active themes, your cron run may take ages to complete, or run out of memory and fail completely, etc. This problem may be especially aggravated if you use the Nodeblock module. This is especially a problem if you run cron often.
Proposed resolution
There may be a few ways to cut down on the number of times _block_rehash() gets run.
- Rather than running _block_rehash() in block_flush_caches() (which seems like a hack), we could probably move it to block's own implementation of hook_cron(). In D8, we could make how often it gets run configurable (though maybe not necessary); in D7, we could make it a hidden variable that defaults to the minimum cache lifetime.
- Another way would be to wrap the call to _block_rehash() in block_flush_caches() with some logic that respects minimum cache lifetime (by either piggybacking off of cache_flush_cache_block or adding a new, similar variable that respects minimum cache lifetime). This might not make the most sense, since this isn't technically cache-related.
- Just makes sure the database does not get updated all the time. There are only three things that can happen to a block coming from the database: the cache gets changed, it goes from enabled to disabled due to invalid region and alter does something. We can track the first two, the third we can compare.
Remaining tasks
Discuss which of the above resolutions is most appropriate, then implement.
User interface changes
Depends on discussion, but probably none.
API changes
If anything, this would be an API addition, but there are ways we could avoid any API changes. See proposed resolutions above.
Comment | File | Size | Author |
---|---|---|---|
#23 | interdiff-22-23.txt | 1.08 KB | David_Rothstein |
#23 | 1693336_23.patch | 6.4 KB | David_Rothstein |
#22 | 1693336_22.patch | 6.43 KB | chx |
#22 | interdiff.txt | 2.34 KB | chx |
#21 | interdiff-8-21.txt | 4.25 KB | David_Rothstein |
Comments
Comment #1
iamEAP CreditAttribution: iamEAP commentedTagging with Performance.
Comment #2
iamEAP CreditAttribution: iamEAP commentedLooks like block module now does this using hook_rebuild() in D8. Moving to D7, removing backport tag.
Comment #3
iamEAP CreditAttribution: iamEAP commentedGonna mark this as a dupe of #2041107: Excess UPDATE {block} queries from _block_rehash(), which has a patch.
Comment #4
catchI actually think this is the right issue - it'd be good to optimize _block_rehash(), but it should simply not be running on hook_cron() ever.
#710874: Add optional argument to hook_flush_caches() so that cron runs can be identified would have fixed this issue, but is currently won't fix. #891600: Page and block caches are cleared the same way on cron as on content change is related and still a bug against 7.x
Attaching a patch that fixes this by calling debug_backtrace() to skip the rehash if we're inside system_cron(), it's a hack but there's no other way to know what's calling block_flush_caches() without the API addition from the other issue.
Comment #4.0
catchMore clear about the scope of the issue.
Comment #5
chx CreditAttribution: chx commentedThere are only three things that can happen to a block coming from the database: the cache gets changed, it goes from enabled to disabled due to invalid region and alter does something. We can track the first two, the third we can compare.
Comment #6
iamEAP CreditAttribution: iamEAP commentedTested #5 and it works very well as advertised. A couple of documentation notes below (#1,2), and a suggestion on code readability (#3).
Minor, but phrasing of this comment is a little ambiguous. Do you mean "In this loop, only the cache might change..." or "In this loop only, the cache might change..."
Similar comment ambiguity here. Would also help by explicitly specifying what the second "this" is.
I feel like the double-negative here is making it a little hard to understand your changes.
The primary action we're concerned with here is writing through changes rather than "not writing through unchanges." If you were to keep track of $changed_database_blocks rather than $unchanged..., and rearrange the conditionals appropriately, I think the whole diff would be a bit easier to grasp.
Comment #7
chx CreditAttribution: chx commentedWell, I guess. Keeping changes would be doable but as the original is needed but then again it's only needed in the alter case and so the additional keeping would be superb small, okay
Comment #8
chx CreditAttribution: chx commentedWell this flip was a great idea, it really did made the code and comments much easier to understand, yes.
Comment #9
iamEAP CreditAttribution: iamEAP commentedLove this! Thanks, chx.
+1 for RTBC from me; someone else (@ultimateboy?) ought to review and flip.
Comment #10
ultimateboy CreditAttribution: ultimateboy commentedAfter many rounds of "apache siege" testing with and without cron runs, I am extremely confident that this patch successfully solves the root issue.
Without this patch, while a (relatively) small load test is being performed, if cron is running, we'd see a spike in "active/locked transactions" and very quickly see connection timeouts. With this patch applied and while cron was running, we saw 100% availability, no spike in MySQL, no failed transactions.
I'm really confident flipping the status to RTBC.
Also, I've gone ahead and marked #2041107: Excess UPDATE {block} queries from _block_rehash() as a duplicate as this patch solves what I was going for over in that issue.
Thank you chx for the patch and iamEAP for the review.
Comment #11
ultimateboy CreditAttribution: ultimateboy commentedJust a quick follow up. I've deployed this patch to our production environment (~100 Drupal sites) and cron is functioning completely as desired at this time.
Comment #11.0
ultimateboy CreditAttribution: ultimateboy commentedUpdated issue summary.
Comment #12
iamEAP CreditAttribution: iamEAP commentedChiming in again: this absolutely improved things after deploying to prod. See this chart from New Relic:
Comment #13
pounardDidn't test the patch myself but +1 seeing this happening.
Comment #14
btopro CreditAttribution: btopro commented+1 for patch #8. Applied to 3 different systems both local and production machines and didn't experience any issues. Also noticed if I had two frames open that cron actively running on 1 frame wasn't negatively impacting the performance of the other (not usually the case w/ how many themes and context driven block positions I have on most distros).
Comment #15
bradallenfisher CreditAttribution: bradallenfisher commentedApplied path and can validate that the only drop in query count was from the patch.
All updates to the block table that would lock the records were prevented from happening.
Was able to navigate around while cron was running via separate browser.
Comment #16
btopro CreditAttribution: btopro commentedSo is there no longer a way to wipe the cache_block table via a cache_clear_all call? While using the admin_menu module and clicking to kill all caches which just trips cache_clear_all; pre patch #8 it wipes the block caches for everything doing the (painfully slow) block_rehash but after the patch it doesn't appear there is any way to globally clear these caches? Is this intended behavior since we're not running under the assumption that caches are being cleaned up correctly in an on-demand manner?
Comment #17
chx CreditAttribution: chx commentedThere is quite some misunderstanding in #16 both of what the patch does and what block module does.
Tthe patch does not affect any logic, absolutely not in Drupal 7 because it's a stable release. The only thing it changes is not running UPDATEs that do nothing because MySQL is too stupid to optimize them away and not lock. Really, no other change.
cache_block
is wiped insystem_cron
anddrupal_flush_all_caches
becauseblock_flush_caches
does areturn array('cache_block');
. The patch does not even touchblock_flush_caches
in any way or form, it changes_block_rehash
which in turn never ever had anything to do withcache_block
but let me repeat myself: you can't find a single thing that has changed in the behavior of_block_rehash
.Comment #18
btopro CreditAttribution: btopro commentedThank you for clarifying, this is a much more layman's explanation of what's being proposed and how it works. I some how knew that and when reviewing the devel query logs to A/B compare started to mix the concepts up in my mind. It also wasn't conceivable to me that something this ineffective / inefficient could be in core so long :).
Drastic performance improvements in my testing for module enable/disable operations, cron runs, and cache clears since it no longer locks the db. Loving this patch. +5 :)
Comment #19
iamEAP CreditAttribution: iamEAP commented8: 1693336_8.patch queued for re-testing.
Comment #20
David_Rothstein CreditAttribution: David_Rothstein commentedGreat patch, and thanks for all the testing. I think we can get this in but see a few issues:
Shouldn't the block be marked as changed in the second part too (when the region is changed)?
Themes can implement alter hooks also.
It would be simplest just to remove the $alter_implemented check altogether. It seems like a miniscule performance improvement compared to the massive performance improvement that is the main goal of the patch.
Comment #21
David_Rothstein CreditAttribution: David_Rothstein commentedHere's a reroll fixing the above issues, and also adding one more very useful assertion to the test (seems like something my fix for point 1 above could easily have been broken).
Good to go?
Comment #22
chx CreditAttribution: chx commentedWell, if we always do the original_database_blocks thing then bookkeeping locally changed blocks is not necessary. It was not for performance, it was for memory but I think we are good (reference counting and all that).
Comment #23
David_Rothstein CreditAttribution: David_Rothstein commentedOh, ha, good point.
Yeah, I think the memory hit of an extra array is not likely to be too large. It could also be improved as a followup.
I removed the $cache variable (since after #22 there's no reason to introduce it; it's not being used). But everything looks great to me now, so as long as this passes tests, it should be good to go.
Comment #24
David_Rothstein CreditAttribution: David_Rothstein commentedCommitted to 7.x - thanks! http://drupalcode.org/project/drupal.git/commit/31d124b
Comment #26
chaunceyt CreditAttribution: chaunceyt commentedI'm responsible for a site that has over 5,000 bean blocks and every time a bean block is created, updated and/or deleted _block_rehash() is called. The query generated by this function is extremely expensive when you have over 5,000 bean blocks. Our team retransformed the query in an attempt to make it less expensive in the stated context.
Issue: https://www.drupal.org/node/2411917
Patch: https://www.drupal.org/files/issues/2411917-_block_rehash.patch