in update_status, you're wasting time on the admin index, you're computing your complete array for checking updates every time someone visits the admin page. This includes only 13000 calls to strpos, 431 to function_exists, 247 to is_array, 205 to unserialize, 182 to variable_get, 159 to htmlspecialchars etc. IMHO you should just fill a cache with your array, when you recived an update from d.o, and just display the needed updates from that cache. I saw this bug during my GHOP Task, making a performance Profile for D6 (3pages), you can find the corresponding DrupalIssue here: http://drupal.org/node/196907
Corni

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dww’s picture

Yes, this is a big problem -- I might even call it critical. I was initially writing for correctness and functionality, not performance. In fact, things just got even a little more expensive thanks to the fix for http://drupal.org/node/209242.

As a first step towards solving this, someone should investigate caching the $data array from update_calculate_project_data(). The only slightly hard part is figuring out when to invalidate that cache (especially given #209242), but even if it was just a 5 minute cache, and you always cleared it when visiting either the modules page, available updates or status reports (but not hook_requirements itself). I suspect that'd cut the problem down tremendously. There might be some other optimizations to make, but this seems like the best spot in the code to get the most bang for the effort.

dww’s picture

The other easy/important optimization would be DB caching (not just a static array in the function) the array returned by update_get_projects(). The same invalidation I wrote in #1 would probably be fine here, too.

dww’s picture

Priority: Normal » Critical

I guess we should call this a critical performance bug, though thankfully it only effects admin/*. Really, I suspect a pretty small patch to add DB caching to those 2 functions would almost completely solve this. Any volunteers?

catch’s picture

Title: Excessive wasting of time on the admin index » Add db caching to update_get_projects() and update_calculate_project_data()

More meaningful title.

agentrickard’s picture

Assigned: Unassigned » agentrickard

Working.

agentrickard’s picture

Status: Active » Needs review
FileSize
31.22 KB

Perhaps not the perfect solution. But a first pass.

This patch caches the data created by update_calculate_project_data() in the {cache_update} table for one hour. On specific page requests, the cache is cleared automatically by the new update_project_cache() function.

Those paths are:

-- 'admin/build/modules'
-- 'admin/build/themes'
-- 'admin/reports'
-- 'admin/reports/updates'
-- 'admin/reports/status'

Plus, any page that invokes cache_clear_all() should wipe the data as well.

dww’s picture

Status: Needs review » Needs work

Cool, looks like a good start. A few concerns:

A) Personally, I find it easier to read code like this:

function update_calculate_project_data($available) {
  // Retrieve the projects from cache, if present.
  $projects = update_project_cache();
  if (!empty($projects)) {
    return $projects;
  }
  // Rest of the function, not nested another 2 spaces deeper...
  ...
}

Is that just me? ;) Sure makes this patch an awfully lot bigger and harder to verify by indenting that entire monster function...

B) Minor code style issue: $q= $_GET['q']; (spacing)

C) Perhaps the cache id should be called "update_project_data" instead of "update_projects" to make it clear this is the giant array of project data, not just a list of projects?

D) We should still DB cache the results of update_get_projects(), since that's used in other places, not just update_calculate_project_data().

Thanks for getting this started!
-Derek

agentrickard’s picture

Not a problem.

agentrickard’s picture

Status: Needs work » Needs review
FileSize
2.97 KB

OK, take two. My concern now is that functions like update_get_available that take a $refresh argument may need to pass that argument on to update_get_projects().

Note also that update_project_cache($cid) now takes a required $cid argument. This assumes that the cache rules can be the same for both update_get_projects() and update_calculate_project_data, which may not be correct.

dww’s picture

FileSize
3.09 KB

Looks much better to me, thanks. ;)

1 minor gripe with your patch. update_get_projects() used to cache the project list in a static cache. we should continue to do that, since there are code paths where we hit that twice in the same page load, and it'd be nice to only go all the way to the DB for that list once. attached patch should solve that.

note: i think using the same cache clearing logic for both caches is fine.

agentrickard’s picture

You are correct, and the new patch tests out.

My only concern now is what happens when update_requirements() fires.

If that is not an issue, then this is likely RTBC.

agentrickard’s picture

FileSize
3.11 KB

Actually, the documentation for update_project_cache was wrong. It _retrieves_ from cache rather than _stores_ to cache.

agentrickard’s picture

FileSize
3.13 KB

Sorry. Another minor doc fix.

dww’s picture

FileSize
3.65 KB

More cleanup and explanation in the phpdoc.

I tested this pretty thoroughly, and everything is behaving as hoped. The cache is cleared on the right pages. On the other pages, things come from cache instead of being recomputed, etc.

The *only* slightly weird behavior I could find, which is exactly to be expected with this kind of caching, is the following scenario:

- Install latest core and this patch
- Visit admin/reports/updates (to fetch available data)
- (assuming your core test site is a CVS checkout) edit modules/system/system.module and change the VERSION constant to 6.0-rc1 (to simulate what happens when you're running a version and a new security release is published).
- Visit some random admin page like admin/settings
- You see a nice red warning about your site being insecure.
- Edit the VERSION constant back (as if you upgraded to a new security release)
- Reload the page.
- (Slightly lame) you still see the big red warning.
- (Slightly confusing) you follow the link in the big red warning, you land at admin/reports/updates, the warning disappears, and you're told that everything is fine and up-to-date.

There's really absolutely nothing we can do about this scenario given any kind of caching here. Either we recompute all this stuff on every page load in /admin/* (as originally written -- maximum accuracy for minimum performance), or we cache it, and live with the occasionally confusing results like this. The only other thing I can think of would be a drupal_set_message() every time we recompute the available update status data. Then at the "Slightly confusing" step above, at least the admin would see a message about "Recomputed available updates" or something, and then it'd be less confusing why the warning went away. That, of course, would (again) break the string freeze, though, so I doubt that's going to fly. :(

Anyway, if anyone has any bright ideas here, I'm all ears, but I think we're just going to have to live with this in the interest of performance. It'd be nice to get someone else participating in this issue so Ken and I don't have to just RTBC our own patch. ;)

dww’s picture

FileSize
3.7 KB

I spoke to webchick in IRC about this in some detail. We have no particularly good solution here, though she now sees how this case is distinct from the bugs we've recently fixed regarding when to invalidate the cache of available update data (http://drupal.org/node/209242 and http://drupal.org/node/210141). One possible "solution" (although whacky) was to tie this caching to the 'cache' variable. The idea is that on a dev site where you've disabled caching, you always see the most accurate stuff all the time, but once you turn on the page cache (prod site), you realize you're getting some caching and things might need a little refreshing here and there if you see something counter-intuitive like the thing I mentioned above. However, that's kinda a hack. Here's a related idea: make the caching behavior conditional on a new setting which has no admin UI in core's update.module, but I could at least add a knob for this over at http://drupal.org/project/update_advanced so that folks that really wanted to disable this caching and always see accurate stuff on every admin/* page load could do so. Probably "cannons on sparrows", but I thought I'd post a patch for consideration. I'm guessing we'd rather go with #14, but here you go, just in case. ;)

dww’s picture

FileSize
3.7 KB

Err, sorry, slightly cleaner version of the variable_get() call.

catch’s picture

- Install latest core and this patch
- Visit admin/reports/updates (to fetch available data)
[...]
- Edit the VERSION constant back (as if you upgraded to a new security release)
- Reload the page.
- (Slightly lame) you still see the big red warning.
- (Slightly confusing) you follow the link in the big red warning, you land at admin/reports/updates, the warning disappears, and you're told that everything is fine and up-to-date.

Officially, you're supposed to visit update.php immediately after upgrading, and update_flush_caches() should get this as well right? In that case imo this isn't a problem in the slightest.

dww’s picture

@catch: Sort of. However, it's a) rare that core upgrades (or really most security update releases, even in contrib) contain schema modifications, and b) update.php only clears all the caches if it ran an update. So, in theory, what you say is true, but in practice, that won't actually solve the "problem" in many cases. The issue I described is still a pretty rare/minor thing, so I'm not sure we really care. I'm just being honest about the implications of committing this (otherwise, RTBC) patch.

catch’s picture

@dww

a) yes that's true, but that's the same as what the instructions at http://drupal.org/upgrade/tutorial-introduction suggest. If people follow the instructions to the letter, they'll get expected behaviour here, if they don't, I don't care.

b) _drupal_flush_css_js is called on the update info page to ensure new query strings even if the db update isn't run. Any reason this couldn't do the same?

dww’s picture

FileSize
4.92 KB

a) yup, all true. ;)
b) sure, that'd work.

Attached patch adds it. Seemed quicker to just call cache_clear_all() than update_invalidate_cache(), so we don't have to mess with checking if update.module is enabled. However, while I was at it, I fixed update_invalidate_cache() to clear the whole table, not just the single cid it used to clear, since we've now got 3 possible entries in that table.

JirkaRybka’s picture

#19 b) is just an exception for update.php itself, as it relies on some js/css itself (i.e. before the final full flush). I don't like the idea of introducing more exceptions here, especially that cache flushes early in update.php might trigger cache rebuilds in-progress of big (major version even) updates, with unpredictable consequences.

But as said above, everyone is expected to run update.php after any change done to the code, so this is IMO the way to go. Perhaps document more visibly, that doing update in such a case is useful even if the selectors say "no updates available"? Make update.php performing the final flush straight away, if no updates are going to be offered in these selectors? Add an item "Only flush caches" to the system module updates, if "no updates available" is going to show up?

Please note, that this is just a discussion on a possible follow-up - I'm not burying #16 (still CNR, and not to be stopped by this IMO).

Edit: Cross-posted over #20. Not sure about cache_clear_all() this early in update.php, triggering a rebuild before updating the database schema (!)

dww’s picture

@JirkaRybka: The cache_clear_all() in the patch in #20 is restricted to the 'update_cache' table, which is exclusively owned and operated by the update.module. Even if there's a schema change someday to the update_cache table, we might as well make those changes to an empty table. ;) Note that cache_clear_all() only deletes records from the cache, it doesn't perform any rebuilding itself. So, these tables would only be rebuilt after the update was completed and the admin visited the status report or available updates report. Some of your suggestions for other update.php usability enhancements are worth considering, but I agree, not in here, and not worth holding this up.

dww’s picture

@agentrickard #9: I'm not worried about the $refresh arg to update_get_available(). That's for refreshing the cache of available update data that we download from d.o. All this stuff is about the local cache of when to recompute the list of projects and their current status. The only time we automatically fetch data if there's nothing in the cache ($refresh == TRUE for update_get_available()) is on admin/reports/updates, and we're invalidating the new cache then, anyway. So, I see this as a non-issue. But, thanks for asking, and sorry I missed it in my earlier replies. ;)

agentrickard’s picture

@CorniI How does this patch affect performance? (An obvious question, I know, but still valid.)

@dww I suppose we have to code the 'update_cache_data' setting into this patch as well?

dww’s picture

@agentrickard: re the UI for the 'update_cache_data' setting -- no way. That should either be removed from the patch, or the UI can go into http://drupal.org/project/update_advanced -- which only exists to add some additional "advanced" settings to the update.module's settings tab that were deemed "too confusing" for the core UI. But, there's no need for the UI in this patch (and in fact, it'd break the D6 core string freeze, so that wouldn't fly, anyway). I'd personally rather leave that variable_get() call in there, so that we *could* disable this caching in update_advanced, but it wouldn't kill me to live without it.

The question on "how much does this really help?" is still a good one, and it'd be great if CorniI could reply, though I think it'll obviously be a pretty huge win. ;)

agentrickard’s picture

If we leave the variable_get() in, shouldn't it be set to FALSE by default? Otherwise, you have to use update_advanced to get any benefit from this patch.

dww’s picture

no, the variable determines if we want caching or not. Therefore, it needs to be TRUE by default.

dww’s picture

Ok, re-rolled for a few minor problems:
- If the setting is turned off, it's silly (and wasteful) to call cache_set(), so we check in those 2 places, too.
- Also, if the setting is off, it's silly to check the path and invalidate the (empty) cache, so I refactored update_project_cache(), a little bit, too.
- Removed stray whitespace from previous patches.

However, to avoid potential re-rolls, I'm also including a version of that doesn't include the setting and always assumes we want caching. I don't feel that strongly which one of these two is committed, though I have a preference for the "setting" version, instead of "no-setting". ;) I'll leave it for the community and the core committers to decide.

agentrickard’s picture

Looks good to me.

dww’s picture

Backport of update_cache_10-setting.patch to update_status 5.x-2.* (DRUPAL-5--2 branch). Testing there would be welcome, too.

CorniI’s picture

@agentrickard:
Shall I give it a try? But how fast it was before filing the bug here, you should see it without benchmarks/profiling if you don't have a really fast comoputer ;)

agentrickard’s picture

@CorniI

I am not set up to do performance testing. We are simply asking you to run your original test, then apply the patch, then run the test again. Then report the results.

Note that after you apply the patch, the first admin* call you make will generate the data. Subsequent calls will pull from cache.

CorniI’s picture

I just enabled xdebug and looked at the generated profiler data's from the admin index and the normal index, no special benchmark test's here. But I'm doing some benchmarks for GHOP #127, after them, I can include one for the admin panel and your patch, too.

agentrickard’s picture

Anything that compares the patch to your original bug report would be fine.

birdmanx35’s picture

Got this message when I tried testing it:

$ patch -p0 < update_cache_10-setting.5x.patch
can't find file to patch at input line 8
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: modules/update_status/update_status.module
|===================================================================
|RCS file: /cvs/drupal-contrib/contributions/modules/update_status/update_status.module,v
|retrieving revision 1.83.2.29
|diff -u -p -r1.83.2.29 update_status.module
|--- modules/update_status/update_status.module	22 Jan 2008 18:42:44 -0000	1.83.2.29
|+++ modules/update_status/update_status.module	23 Jan 2008 04:51:27 -0000
--------------------------
File to patch: 
birdmanx35’s picture

Sorry, was testing the backport instead of the real patch. Testing now.

birdmanx35’s picture

Okay, both patches apply cleanly, and there are no problems as far as I can tell on the user's side.

agentrickard’s picture

Status: Needs review » Reviewed & tested by the community

Ok, bumping for core reviewers. Here's the issue as I see it:

-- This patch sets the data to cache for 1 hour.
-- The cache is cleared on several key pages (like the update status page and the modules page).

Is the 1 hour lag acceptable, given that the Advanced Update module will allow the cache to be turned off? If so, RTBC.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

Is the cache invalidation not called on cron runs, when new project data is available or is it just not documented? This looks like an omission to me from my expectation and reading the docs.

agentrickard’s picture

Status: Needs work » Needs review

@Gabor.

I don't think this is an issue. Update frequency is either 1) daily or 2) weekly, and the project data cache clears hourly.

See:

function update_cron() {
  $frequency = variable_get('update_check_frequency', 1);
  $interval = 60 * 60 * 24 * $frequency;
  if (time() - variable_get('update_last_check', 0) > $interval) {
    update_refresh();
    _update_cron_notify();
  }
}

Valid values for 'update_check_frequency' are '1' and '7'.

But clearing the cache here might not hurt.

  if (time() - variable_get('update_last_check', 0) > $interval) {
    cache_clear_all('update_project_data', 'cache_update');
    cache_clear_all('update_project_projects', 'cache_update');
    update_refresh();
    _update_cron_notify();
  }

Though a wildcard wipe of {cache_update} might be preferred.

@dww -- opinion?

Gábor Hojtsy’s picture

Well, I said we need to clear these caches when new data comes in as well. If this is already done through some indirection, then it needs to be documented. The docs to update_status_project_cache() does not mention it is cleared when updates come in through requests without direct user interaction.

dww’s picture

Status: Needs review » Needs work

I'd be in favor of:
a) Clearing the entire {cache_update} table every time we fetch new available update data.
b) Explicitly checking the expiration time in update_project_cache() like we do update_get_available() to ensure that we actually flush this cache every hour, instead of just marking it as potentially stale after an hour.

dww’s picture

Status: Needs work » Needs review
FileSize
4.52 KB
5.23 KB

Like so. Also, fixes the 3rd arg to cache_clear_all() from the previous patch (should just be TRUE, not '*').
HEAD patch for D6 core and backport for update_status DRUPAL-5--2 branch in contrib both attached.

webernet’s picture

I did some quick testing and nothing appears to be broken.

int’s picture

Assigned: agentrickard » Unassigned
agentrickard’s picture

Status: Needs review » Reviewed & tested by the community

Looks good, and incorporates the requested changes.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Now looks good to me as well, thanks. Committed!

dww’s picture

Great, thanks! Committed the update_status contrib backport patch to DRUPAL-5--2: http://drupal.org/cvs?commit=97798

dww’s picture

Status: Fixed » Needs review
FileSize
699 bytes

Whoops, the final patch that landed didn't include the change to clear {cache_update} when you first hit the update.php info page. Just as well, since the previous version would generate a warning when upgrading from 5.x, since {cache_update} doesn't yet exist in that case. ;) This one is tested and works like a charm.

Gábor Hojtsy’s picture

Status: Needs review » Fixed

OK, committed to Drupal 6.

dww’s picture

Priority: Critical » Minor
Status: Fixed » Needs review
FileSize
590 bytes

Tee hee, whoops. In my haste, I used a less-than-ideal value for the 3rd arg. No big deal, it all works like this, but the attached patch is cleaner. Sorry about that.

RobLoach’s picture

Status: Needs review » Reviewed & tested by the community

Yup.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Committed.

dww’s picture

Priority: Minor » Critical

Cool, thanks. Setting this back to the original priority for posterity...

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.