Closed (duplicate)
Project:
Drupal core
Version:
6.x-dev
Component:
base system
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
12 Dec 2005 at 10:59 UTC
Updated:
7 Sep 2009 at 07:39 UTC
Jump to comment: Most recent file
Comments
Comment #1
chx commentedCompared to the 'cache all aliases' method of old, this one is still faster.
Comment #2
Wesley Tanaka commentedyes, one of the reasons I prematurely upgraded to 4.7 was because what i had read about the old caching behavior.
But "it's faster than before" doesn't seem like a very compelling answer... I'm talking about what it's doing now. Bug's filed against CVS -- it's not necessarily going to be solved in 4.7 branch.
Comment #3
dries commentedWhat is your suggested solution?
How many URL aliases do you have in your database?
Comment #4
Wesley Tanaka commentedLet me explain motivation first: I got a big spike of hits on my site, and I read comments about it which led me to believe that my site had succumbed to the "slashdot effect" and basically gone down during the spike. I was sleeping, and wasn't able to instrument while the spike happened, so I'm looking for possible culprits.
Let me be clear that I'm not sure how much of a performance problem this was under load. However, I was surprised that one page view was generating such a large amount of queries. My pages were previously running about 5-10 queries per page (about 3 or 4 of them in what would be equivalent to blocks) before I switched to Drupal.
I was hoping someone would have a better solution, but here are my thoughts: Most of the links on a page are the same links from page to page. They appear in blocks, which are relatively static; they appear in the header or footer, which are often completely static. Assume we can model a drupal site like this:
* For each page view, print N fixed links from a "fixed pool" and M links at random.
* The "fixed pool" changes slowly (say once every 10 minutes) as different blocks change content. A similar assumption is already encoded in the way blocks are cached for anonymous users.
A further note: on my site, N is a sizeable fraction of the links on most pages, so optimizing for the N links would cut down the number of queries dramatically. Hopefully back into the 10s instead of 100s.
Given that model, it would make sense to split the link pool into two conceptual parts: The part that is in the fixed pool, and the part that is not.
Question: how do you quickly determine which links are in the fixed pool and which ones aren't.
Proposal: use drupal's existing cache mechanism to cache the "fixed pool", by storing a cache of size N+epsilon (epsilon some small number to help increase cache hits) under a single key in the cache table.
Initially, at random, the cache is filled with N random entries.
Every page view loads the cache and tries to use that to resolve url aliases before resorting to a db query.
Every 100 page loads (or one pageload every 10 minutes or something) will then, blind to any global state, use some cheap operation to update the entries in the cache based on the links that were on that page load. With the right update algorithm, the cache would grow to contain all the links that were on all pages, as well as most of the links that were common to many pages.
Comment #5
Wesley Tanaka commentedhttp://drupal.org/files/issues/cache-url-alias.patch
Comments from 47832 (which has been marked as a duplicate of this bug):
Comment #6
merlinofchaos commentedDries: I didn't do any kind of a grand scale benchmark, but I did test a few individual pages on a small CVS installation.
The use of that patch shaved anywhere from 40-70 ms per page hit, plus it shaved a similar number of queries. Where I didn't benchmark it (because I don't have an environment set up and it's time consuming to do so) is on a larger site with a more significant traffic base.
The code style nits can easily be cleaned up when (if) we get to a 2nd version of the patch, which is quite necessary.
killes had suggested that if we set up a predictable crawler we might be able to test it on scratch, which sounds like an ideal place to me.
Comment #7
dries commented"40-70 ms per page hit" has no value unless we can compare it with the total page generation time. Either give the percentage improvement, or both the before and after times. Depending on your setup, 40ms can be 0.1% or 10% improvement. Testing on scratch sounds like a good idea, although it is not exactly an idle machine.
Comment #8
merlinofchaos commentedUnfortunately, the problem there is that it's a pretty flat improvement.
For comparison, on the 'create content' page, which is about as simple of a page as I have, the total generation time went from 180 ms to 120 ms with the patch.
On the front page of angry donuts, which is pulling in 8 or 9 posts and is taking longer to generate for it, I get a slightly bigger savings, -- from around 600ms to around 520ms -- which is a much lower percentage. Though still a pretty good savings.
Comment #9
Wesley Tanaka commentedThat's huge savings per webpage hit on a busy site. I imagine the savings would be even better on a site where the database was busy.
I was poking through the performance of theme('page') today, and thought these visualizations of my page.tpl.php run might be interesting. The main block is the call to _phptemplate_default. I couldn't go any higher than that, since it was called using reflection rather than directly, and the combination of tools I'm using doesn't seem to be able to follow that indirect call. Note that half the time spent is in l(). This is on a completely unused database.
Comment #10
Wesley Tanaka commented2 more levels down the call stack. The call above db_query is drupal_lookup_path. One thing that surprised me (I won't bother uploading the image) is that about 1/4 of the time used up by db_query calls were taken up by variable_get()
Comment #11
Wesley Tanaka commentedI tested my main page on my home machine using 'ab -c 10 -n 100' and got a mean time per request speed-up of a bit more than 12%
With the patch
Concurrency Level: 10
Time taken for tests: 53.511949 seconds
Complete requests: 100
Failed requests: 94
(Connect: 0, Length: 94, Exceptions: 0)
Write errors: 0
Total transferred: 1725501 bytes
HTML transferred: 1698701 bytes
Requests per second: 1.87 [#/sec] (mean)
Time per request: 5351.195 [ms] (mean)
Time per request: 535.119 [ms] (mean, across all concurrent requests)
Transfer rate: 31.49 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 2
Processing: 1500 5326 1185.6 5320 7892
Waiting: 529 1929 937.4 1756 4828
Total: 1500 5326 1185.6 5320 7892
Percentage of the requests served within a certain time (ms)
50% 5320
66% 5920
75% 6152
80% 6369
90% 6862
95% 7196
98% 7315
99% 7892
100% 7892 (longest request)
Without the patch
Concurrency Level: 10
Time taken for tests: 60.978320 seconds
Complete requests: 100
Failed requests: 92
(Connect: 0, Length: 92, Exceptions: 0)
Write errors: 0
Total transferred: 1726379 bytes
HTML transferred: 1699579 bytes
Requests per second: 1.64 [#/sec] (mean)
Time per request: 6097.832 [ms] (mean)
Time per request: 609.783 [ms] (mean, across all concurrent requests)
Transfer rate: 27.63 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 8 29.1 0 138
Processing: 2223 6063 1457.4 5930 10936
Waiting: 742 2525 879.9 2566 4203
Total: 2223 6072 1463.3 5930 10936
Percentage of the requests served within a certain time (ms)
50% 5930
66% 6415
75% 6827
80% 7125
90% 8163
95% 8794
98% 9793
99% 10936
100% 10936 (longest request)
Comment #12
killes@www.drop.org commentedThe standard deviation is larger than the reported gain in performance.
Comment #13
Wesley Tanaka commentedThat seems pretty normal -- the whole curve shifts faster, but less than one standard deviation faster.
Like people from different countries being different heights, but perhaps less than one standard deviation different.
Comment #14
moshe weitzman commentedbenchmarks don't show a speedup which merits the complexity of this semi-cache proposal. if benchmarks od show up which change the argument, lets reopen this.
Comment #15
merlinofchaos commentedAn average 12% is insignificant? Hmm. Given how poorly Drupal fares on my crappy server I'd take a couple percent wherever I could get it; 12% seems like rather a lot to me.
Comment #16
saml commentedI performed a simple test on my local machine (I requested 5 different pages as a logged in user, each page 7 times, one at a time) and collected total SQL query time and page execution time from the devel module.
The result: I got almost only slower page execution times using this patch with i18n and Swedish translation turned on. Total SQL query time was a little shorter though, and when switching to English (still using i18n) there was a slight improvement in general. (All data and calculated meanvalue and change of mean value are given in the attached PDF.)
I'm running
* Win XP
* XAMPP Lite version 1.4.16 ( Apache 2.0.54, MySQL 4.1.14, PHP 5.0.5 + PEAR (basis package))
* CVS-version of Drupal 4.7.2 and patched versions of i18n-files according to: http://drupal.org/node/65801.
I also have introduced a 40 character big Key table of the 'source' column in the locales_source table, which has given truly significant improvements on performance with locale and i18n.
Comment #17
saml commentedDries seems to be right in that it is the PHP preparing the queries that are the real problem, (so let's focus on them instead!): http://buytaert.net/drupal-database-interaction
"Drupal preparing the queries (eg. database prefixing, sanitizing the input to prevent SQL query injection, etc) takes more than 31% of the total execution time. We should look into optimizing the functions that prepare the queries (db_query(), _db_query and _db_query_callback())."
Comment #18
beginner commentedabout cpu usage:
http://lists.drupal.org/archives/development/2006-08/msg00016.html
(and thereby bookmarking this issue, should it be revived...)
Comment #19
brevity commentedHow about a switch similar to 'enable clean urls' to trigger the call of
$path = drupal_get_path_alias($path);in theurl-function (or where appropriate), for a start?Comment #20
mfer commentedNow that work on 6 has begun I think this issue is one that should be on the plate to attack. It's a large performance hit and worth looking into, again.
Comment #21
catchThis is now fixed in Drupal 7.