Interesting finding mentioned by JohnAlbin elsewhere: http://www.cznp.com/blog/3/strtr-vs-str_replace-a-battle-for-speed-and-d...
HEAD:
# ab -c 1 -n 50 http://drupal.test/node/1
Document Path: /node/1
Document Length: 9371 bytes
Concurrency Level: 1
Time taken for tests: 28.608 seconds
Complete requests: 50
Failed requests: 0
Write errors: 0
Total transferred: 489700 bytes
HTML transferred: 468550 bytes
Requests per second: 1.75 [#/sec] (mean)
Time per request: 572.158 [ms] (mean)
Time per request: 572.158 [ms] (mean, across all concurrent requests)
Transfer rate: 16.72 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 11 8.4 16 31
Processing: 469 558 55.6 547 750
Waiting: 453 540 55.5 516 734
Total: 484 569 54.6 562 750
Percentage of the requests served within a certain time (ms)
50% 562
66% 562
75% 562
80% 578
90% 672
95% 719
98% 750
99% 750
100% 750 (longest request)
Patch:
# ab -c 1 -n 50 http://drupal.test/node/1
Document Path: /node/1
Document Length: 9371 bytes
Concurrency Level: 1
Time taken for tests: 26.436 seconds
Complete requests: 50
Failed requests: 0
Write errors: 0
Total transferred: 489700 bytes
HTML transferred: 468550 bytes
Requests per second: 1.89 [#/sec] (mean)
Time per request: 528.723 [ms] (mean)
Time per request: 528.723 [ms] (mean, across all concurrent requests)
Transfer rate: 18.09 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 13 5.8 16 16
Processing: 469 515 46.0 500 703
Waiting: 453 497 46.1 484 687
Total: 484 528 47.3 516 719
Percentage of the requests served within a certain time (ms)
50% 516
66% 531
75% 547
80% 547
90% 578
95% 594
98% 719
99% 719
100% 719 (longest request)
Didn't replace the line for easier benchmarking.
Comment | File | Size | Author |
---|---|---|---|
#130 | db-prefix-benchmark.txt | 3.08 KB | lyricnz |
#130 | db-prefix-benchmark.php_.txt | 5.97 KB | lyricnz |
#126 | 561422-prefix_strreplace-126.patch | 1.65 KB | Mark Theunissen |
#125 | 561422-prefix_strreplace-125.patch | 1.64 KB | Mark Theunissen |
#124 | 561422-prefix_strreplace-124.patch | 1.64 KB | Mark Theunissen |
Comments
Comment #1
mcrittenden CreditAttribution: mcrittenden commentedSweet! Subscribe.
Comment #2
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedI've tried this with a minimal script and I am not convinced this is an improvement.
528 47.3
569 54.6
Lookat these values, they aren't even one std dev apart.
Comment #3
sunstrtr() seems to be much slower than str_replace() when the string subject exceeds a certain char limit, approx. ~100 chars. Perhaps also the amount of replacements plays a role.
1) The patch resulting in the best performance for me. Only t(), format_plural(), and DatabaseConnection converted.
2) A simple test script.
Comment #4
smk-ka CreditAttribution: smk-ka commentedNote that strtr and str_replace are NOT equivalent replacements. From the PHP manual:
This means, if a string contains both @foo and @foo-bar placeholders, strtr will replace the longest match, while str_replace will replace whichever replacement variable comes first, leading to unpredictable results.
Comment #5
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedAh too bad, I guess this pretty much kills it.
I've refined the above script a bit
Comment #6
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedoops
Comment #7
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedMaybe this was a bit premature.
In some cases strtr has no advantage over str_replace, ie for format_plural it should be ok.
Comment #8
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedin t() the change is too dangerous, I think, because if the problem outlined by smk-ka.
in format_plural() it would be safe, but I believe there is only a small gain, if it doesn' hurt. There is only one string to be replaced and the strings are usually short.
In the database class it should get more investigation: Due to the prefixes the change should be safe, the strings are typically long and there are often several replacement in a string.
Comment #9
sunSo we need to benchmark this one.
Comment #10
catchSubscribing. Don't think we'll get much from this, but profiling 1000 node inserts from a migration, strstr takes 1% of the overall request time. 14th most expensive function for exclusive CPU time. So this is worth looking into given it's such a small change.
Comment #11
retester2010 CreditAttribution: retester2010 commented#9: drupal.str-replace.9.patch queued for re-testing.
Comment #13
DaPooch CreditAttribution: DaPooch commentedAfter profiling with XHProf it is quite clear that strtr takes a HUGE performance hit for me. Certain operations take 15-30s to complete and strtr is always listed as the culprit. In some instances it gets over a million calls and can take over 50% of the total operation time. I don't recall this being so performance hefty on other drupal installs... maybe it's a server config issue. Anyone else come across this before?
Comment #14
dalinThis kind of patch could help improving performance of the slowest 5% of page loads. For example the ones right after clearing all the caches where a lot of DB queries happen. In the benchmarks below I've enabled all core modules, all blocks, 10 nodes on the front page, and completely disabled the cache system (by returning FALSE and array() from cache_get() and cache_get_multiple() respectively).
Two things to note:
1) I tried to setup database prefixing to confirm that these variations don't break anything, but any database prefixing that contains a '.' breaks. Or am I misunderstanding how to use prefixing:
2) There doesn't seem to be enough variance between the options to make a firm judgement call. Though the difference should be greater if I could get prefixing working.
First up, the original:
Lets try a static.
Comment #15
dalinSo I figured out how to do the database table prefixing (had to enclose the DB name in ``) and setup prefixes on 19 tables + default. I then repeated benchmarks on the four scenarios that tested in #14 on the front page with 10 nodes, all modules and blocks enabled, anon has all perms, and completely disabled the cache system (by returning FALSE and array() from cache_get() and cache_get_multiple() respectively) to simulate a cache miss. There are 478 queries per page. I did three runs for each scenario. The numbers below are for requests-per-second and number of ms that 75% of pages were served.
no patch
2.21rps 2336ms
2.13rps 2437ms
2.01rps 2582ms
ave: 2.12rps 2451ms
single str_replace()
2.36rps 2164ms
2.36rps 2192ms
2.25rps 2358ms
ave: 2.32rps 2238ms
improvement: 9.4% 8.7%
single str_replace plus static search/replace arrays
2.44rps 2118ms
2.41rps 2135ms
2.37rps 2199ms
ave: 2.41rps 2151ms
improvement: 14% 12%
static replace array + multiple strtr()
2.17rps 2426ms
2.17rps 2383ms
2.18rps 2386ms
ave: 2.17rps 2398ms
improvement: 2.4% 2.2%
Based on these results plus those in #14 it looks like the change makes little improvement when there are no prefixes, but if there are we can see a good gain on high-query-count pages which seems to confirm @sun @Gerhard's research above. I'll throw together a patch using variation #3. Someone correct me if I'm wrong, but prefixes will not change during page generation so we should use plain old static over drupal_static.
Comment #16
dalinComment #18
dalinTestbot answered my question about statics. Simpletest is an example where the prefix changes during the page request. So my first thought was to use the drupal_static_fast pattern, only to discover that it won't work here. To use a static we'd need a workflow something like:
- setup the static variable
- the database connection or prefixes change so reset the static
However we don't have a concept of a "current" connection like we did with db_set_active(). All connections are active and you just choose which one to act on at any given time. Therefore I don't think statics will work.
New patch using variation 2 from #14 and #15.
Comment #19
sunNot sure whether str_replace() is really appropriate here. The last two replacements in the stack are not unique. Furthermore, the passed in $sql string is often a single table name only.
However, playing a bit around with strtr(), it seems like we could at least cut this down to a single strtr().
Comment #20
dalinThe substitutions just need to have the default happen last. str_replace() does replacements in the order of the $search array, this works fine because we add the default to the array last. strtr() does replacements in order of longest to shortest, this works fine because '{' and '}' are always shorter than "{$key}". So either will work fine.
I did benchmarks of the patch in #18 vs. #19 vs. un-patched for 0 prefixes and 19 tables prefixed. 10 nodes on the front page, all modules and blocks enabled, anon has all permissions, the caching system is disabled, 477 queries on the page. No XDebug, APC is enabled. Results attached.
Neither patch makes a statistically significant effect when there aren't any tables prefixed. However with 19 tables prefixed sun's strtr() patch makes a ~3.9% improvement, while my str_replace() makes a ~11% improvement.
Comment #21
dalinI wanted this for a D6 site as well. Here's a patch.
This might actually be the way to do it for D7 as well since it uses strtr() if $db_prefix is not an array.
Comment #22
JamesSharpe CreditAttribution: JamesSharpe commentedThis issue becomes important when there are a large number of prefixes e.g. when using CiviCRM 4 in split database mode where all civicrm tables have a prefix. I was seeing 34% of the page generation time being spent in the prefixTable code. See the benchmarks in #1192620: Performance improvement in prefixTables
Comment #23
sun#1192620: Performance improvement in prefixTables has been marked as duplicate, contains a patch that's 99% identical to #19.
Last comment over there:
Comment #24
lyricnz CreditAttribution: lyricnz commentedHere's a patch for D7/D8 that uses str_replace() rather than strtr() for database prefixing, including pre-calculating the arguments. It uses instance variables rather than static variables, to support >1 DatabaseConnection, and to minimize the code in prefixTables().
Comment #25
lyricnz CreditAttribution: lyricnz commentedFixing crosspost.
Comment #26
pillarsdotnet CreditAttribution: pillarsdotnet commentedI see a repeated pattern of finessing code to gain a 1-5% performance improvement (per-function-call, not overall) based on weird, illogical quirks of the current version of PHP. So we invent silly rules that result in a patch being rejected because it uses (e.g.)
if (empty($foo))
rather than the 5% fasterif (!isset($foo))
.If we're not willing to work toward a 500% improvement in performance, why do we kill ourselves over these tiny incremental gains?
Comment #27
lyricnz CreditAttribution: lyricnz commentedThe change proposed here is not based on some illogical quirk - str_replace() is both logically and practically faster then strtr(). The optimization proposed here is more like 60% faster (typical case: medium length SQL, with single prefix) to 95% faster (medium length SQL, 100 prefixes), for a function that is called a few hundred time per page.
Who says we're not working towards hiphop?
http://fourkitchens.com/blog/2010/02/03/making-drupal-pressflow-more-mun...
http://php.webtutor.pl/en/2011/05/17/drupal-hiphop-for-php-vs-apc-benchm...
Comment #28
catchI have never seen a patch rejected on isset vs empty based on performance, plenty of times because they were incorrectly used.We do avoid array_key_exists() unless it's absolutely necessary, function calls are slower than language constructs regardless of php version. Examples please?
DB table prefixing has been found responsible for 30% slow downs on installs running civicrm, micro-optimization is doing this to code that runs rarely, not that runs potentially hundreds of times on every request.
I'm typing from a phone but would very much question the 500% claim with hiphop. Benchmarks against D6 with hiphop didn't get anything like that vs APC. The main issue with hiphop and D7/8 is not that people don't want to support it, but the poor support in hiphop for pdo and stream wrappers that we rely on, which makes it tricky.
Comment #29
pillarsdotnet CreditAttribution: pillarsdotnet commented@catch:
Why? Incremental performance gains. We avoid
is_array()
and a handful of other PHP functions for the same reason. I wish there was a page somewhere to document the exact subset of PHP that Drupal finds acceptable.(Looking...) Okay, it was
!isset()
vs.is_null()
. In this patch review, @sun authoritatively declared, "Always use !isset() instead of is_null()." Why? 5% performance gain is the only reason I can find. Conversely, coder recommends replacing everysubstr()
by the much slowerdrupal_substr()
, even when the result is guaranteed to contain no unicode byte sequences. So I think there is some inconsistency regarding these rules.Comment #30
catchIf there's no meaningful difference between is_null() and isset() most of the time, why not use the one that's 5% faster? More to the point, why not pick one to standardize on (same as we standardize on print vs. echo despite print being nominally slower) so that there is consistency (same as we do for elseif vs. else if). I'm not sure what you think the problem is?
is_array() isn't avoided, but there are cases where it is called hundreds (or thousands) of times during a request and eats a bunch of cpu. drupal_attributes() in core, and unpack_options() in views are two examples. Sometimes things get solved by trying to optimize the function call, sometimes it's via major refactoring, sometimes both. Drupal's usage is not good, and got considerably worse between D6 and D7, so if there are quick wins for marginal gains they are better than nothing.
Coder is likely playing on the safe side for multibyte string compatibility, if you want to see what the actual 'rules' are look at http://drupal.org/coding-standards, if you think a coding standard is 'silly' open an issue to discuss changing it.
Also, http://drupal.org/sandbox/sun/1181434
Comment #31
sunThe primary reason we're using isset($foo) instead of is_null($foo) is that is_null($foo) throws a PHP notice if $foo is undefined. is_null() only works silently when $foo is actually defined (NULL or an actual value). It's not about performance, it's about preventing sloppy PHP notices. Lastly, is_null() is a negated condition in 99% of all cases - following positive condition logic is much easier for the human brain.
@pillarsdotnet: If you want to discuss a handbook page or whatever other documentation for these things, then open a new issue, please. While I can understand your rant, it's side-tracking the work on the actual issue here.
Comment #32
dalinAnd getting us back to the actual issue at hand...
- Great to see that this works now for more than one database connection.
- Other than that this is almost exactly the same as the patch in #18 except that it removes several of the inline comments. I think we should bring those back in to describe the logic behind what may not be immediately obvious.
Other than that it looks great.
Comment #33
lyricnz CreditAttribution: lyricnz commentedThe latest patch is a bit different in that it pre-calculates the arguments for str_replace() rather than doing this for each invocation of prefixTables().
I think the long comment in #18 is redundant - once the change is committed, the use of strtr() becomes a historical legacy. Updated patch with slightly more verbosity about the table-specific replacement happening ahead of the default prefix.
Comment #34
sunLooks ready to fly for me.
Comment #35
catchThere is a D6-only issue at #42827: optimize/enhance db prefixing, looks like the new code is not as different from there as I thought when updating that issue in 2009, so marking this one for backport.
The patch here is simpler than the patch over there which is also good.
Comment #36
lyricnz CreditAttribution: lyricnz commentedFWIW I just benchmarked another alternative - using str_replace() + creating $search/$replace on the fly, and only including elements that were actually in the string (using strpos()). As expected, performance-wise, it came in between the precalculated version, and the on-the-fly version. Here's the test harness, and results for various numbers of prefixes.
Here's the results (in ms) for 200 iterations of 100 prefixes (roughly realistic for Civicrm)
In order these are:
So the fastest (in all cases, as it happens) is precalculating the arguments and using a single str_replace(). About 20 times faster than current code.
Comment #37
sunThank you for the additional benchmarks!
So the chosen RTBC'ed solution prefixTables_precalc_single_str_replace wins on all ends, and will even shave of some ms for the majority of sites that don't use prefixes, it seems. Very nice.
Comment #38
webchickAwesome! This looks like a nice improvement.
Since this affects the database layer I'd like to see a DB maintainer sign off on this, ideally Crell.
Comment #39
catchThis may help with test run speed, do we have a way to find out?
Comment #40
johnp CreditAttribution: johnp commentedOh, nice to see that there is work going on the prefixing issue!
When I've been profiling drupal 6 sites I noticed that db_prefix_tables took surprisingly long time, so I whipped together my own version. I've used it for a Drupal 6 multisite, with partly shared database, for about a year. It is probably not as elegant, nor as fast as the solution you guys have up come with for D7, but it is about 5-7 times faster than the original when I've tested it . Perhaps someone can find it useful in the wait for a backport to D6.
Here is the dirty hack:
function db_prefix_tables($sql) {
global $db_prefix;
$pattern = '/\{(.+?)\}/';
preg_match_all($pattern, $sql, $tables);
$tables_len=sizeof($tables[0]);
$i=0;
$prefix_default = (array_key_exists("default",$db_prefix)) ? $db_prefix["default"] : "";
while ($tables_len > $i) {
$table = trim($tables[1][$i]);
$table_esc = trim($tables[0][$i]);
$prefix = (array_key_exists($table,$db_prefix)) ? $db_prefix[$table] : $prefix_default;
$sql = str_replace($table_esc,$prefix.$table,$sql);
$i++;
}
return $sql;
}
// John
Comment #41
Crell CreditAttribution: Crell commentedThese should be separate declarations with their own docblocks. I wasn't even aware that PHP allowed multiple properties to be declared together like that. :-) We don't do so anywhere else, so let's not start here.
Same here. This is not a typical pattern in Drupal. Let's not use it.
Powered by Dreditor.
Other than those stylistic issues this looks like a great optimization. Thanks, lyricnz!
Comment #42
lyricnz CreditAttribution: lyricnz commentedComment #43
Crell CreditAttribution: Crell commentedI do love it when we can precompute things.
Comment #44
webchickAwesome. :) Thanks for the sanity check, Crell!
Committed to 8.x and 7.x. Marking for backport to D6.
Comment #45
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis completely broke SQLite on both branches.
Comment #46
Damien Tournoud CreditAttribution: Damien Tournoud commentedPlease commit the attached.
Comment #47
chx CreditAttribution: chx commentedSo SQLite adds a dot for each prefix which is not reflected in the new prefixSearch and prefixReplace properties. The most important part of the patch is
$this->setPrefix($prefixes);
which makes this happen.The other hunk, line by line: it's no longer necessary to mess with $this->prefixes because setPrefix will set it up for us, so the reference is removed. Then, instead of a numeric key we put the default prefix where setPrefix expects it to be. Finally, the foreach is just simplification, reuse the $prefixes variable.
Comment #48
lyricnz CreditAttribution: lyricnz commentedMy bad. But... perhaps we should be bot-testing in some non-MySQL databases?
Comment #49
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis was properly caught three days ago by our testing of secondary databases: https://drupaltesting.org/jenkins/job/test-drupal7-linux-sqlite/
Comment #50
lyricnz CreditAttribution: lyricnz commentedOh sweet, I never knew about that.
Comment #51
aspilicious CreditAttribution: aspilicious commentedI trust you guys but isn't there a possibility to run the tests with this patch on the sql lite test system BEFORE we commit this?
Comment #52
Damien Tournoud CreditAttribution: Damien Tournoud commentedThere is another fix needed for
DatabaseConnection_sqlite::queryTemporary()
. Also, this is an API change, it should be marked as such. Most of the contrib database drivers will likely have to be updated.Comment #53
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis issue introduced an incompatible API change to the Database Engine API. While we are at it, let's clean-up this and remove the defaultPrefix variable. It will make everyone's life easier.
Also fixed the temporary query handling in SQLite, which also needs to mess-up with prefixes.
Comment #54
Damien Tournoud CreditAttribution: Damien Tournoud commentedFix typo.
Comment #55
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe latest patch passed testing on SQLite (https://drupaltesting.org/jenkins/job/test-drupal7-linux-sqlite/241/).
Comment #56
webchickAck!! Good catch, Damien!
Tagging "release blocker". If this patch is ready to go in the next 24 hours or so (would love to see Crell chime in again, if possible), I'll commit it, else I'll roll back the earlier patch so we don't ship 7.3 with broken SQLite support.
Comment #57
Crell CreditAttribution: Crell commentedI didn't realize we had an SQLite test bot now either. Do we have one for Postgres, as well? Does it approve of #54?
It seems sane to me, and I like moving the prefixes into an internal getter method. If Damien and the bots are happy with this on SQLite and Postgres, then I am too.
Comment #58
lyricnz CreditAttribution: lyricnz commentedsetPrefix() should not include literal 'default' in the arrays used for the str_replace() - this is undesirable replacement. Also, this change makes 'default' an unusable table name for explicit prefixing - perhaps we should use a non-valid table name, like '#default'.
Comment #59
lyricnz CreditAttribution: lyricnz commentedAnd the description of field $prefixes "The non-default prefixes used by this database connection" is no longer accurate.
Comment #60
lyricnz CreditAttribution: lyricnz commentedHere's an update of the patch in #54 that updates the field comment, and excludes 'default' from the search/replace arrays - it's still used at the end, of course. I haven't changed the name of the default prefix in the array.
Comment #61
lyricnz CreditAttribution: lyricnz commentedShould function name be plural?
Comment #62
Crell CreditAttribution: Crell commentedHm. Yeah, it probably should be since it returns an array.
Comment #63
lyricnz CreditAttribution: lyricnz commentedI suspect the reason for the name was so that it matched setPrefix() - which supports both $prefix and $prefixes style invocation. Thoughts?
Comment #64
pillarsdotnet CreditAttribution: pillarsdotnet commentedsetPrefix() should then be likewise renamed.
Comment #65
lyricnz CreditAttribution: lyricnz commentedRenamed both.
Comment #66
webchickWe can't arbitrarily rename functions in a stable point release, so we'll need a D7 version of this patch as well.
Comment #67
lyricnz CreditAttribution: lyricnz commentedSo, rename or not rename? I'm not terribly bothered either way.
Comment #68
pillarsdotnet CreditAttribution: pillarsdotnet commentedlyricnz: no rename. Then suggest rename in a follow-up d8-only patch.
@webchick: See #60 for non-rename version.
Comment #69
chx CreditAttribution: chx commentedNo rename. Also, even the most rabid Java OOP fiend would not use a getter to get a property in the same object! I really enjoy (or not) the debate whether properties should have setters and getters and you know where I am on that but the same class? Gimme a break.
This is a reroll of #60 and the interdiff.
Comment #70
lyricnz CreditAttribution: lyricnz commented#69 looks good to me, but would like a DB maintainer to review before RTBC - since the patch is a little bit mine.
Comment #71
Damien Tournoud CreditAttribution: Damien Tournoud commentedTested #69 on SQLite and it passed. I'm very happy with it.
Comment #72
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe patch applies cleanly to both to 7.x and 8.x.
Comment #73
lyricnz CreditAttribution: lyricnz commentedThanks team.
Comment #74
Crell CreditAttribution: Crell commented*stamps his approval*
Comment #75
chx CreditAttribution: chx commentedNote: you can not have a specific prefix for a table named default after this. However, default is a reserved SQL word for everything due to its being used in CREATE / ALTER table probably. I checked this in SQL 92, SQL 99, SQLite, MySQL, PostgreSQL, MS SQL and Oracle. So you can't have a table named default anyways.
Comment #76
Damien Tournoud CreditAttribution: Damien Tournoud commentedThat was true before. The structure of the prefix array (from settings.php) is done that way. There are absolutely no functional change in this patch.
Comment #77
webchickOk great. Thanks a lot for rallying on this, folks!!
Committed to 7.x and 8.x!
Comment #78
webchickComment #79
jhodgdonThis issue is tagged "API change". If it is actually an API change, can someone please provide a summary, if it needs to be added to the update docs in any way?
Also, just pointing out that it is marked "needs d6 port", so once the issue of update docs is resolved, it probably needs to be marked d6 / to be ported?
Comment #80
jhodgdonChanging status until doc question is resolved, so it doesn't get closed behind my back.
Comment #81
catchThis is an API change for contrib database drivers, I don't know what the documentation would look like, maybe just a note and point to this issue.
And it does need backporting to D6, unless we shifted back to #42827: optimize/enhance db prefixing for that.
Downgrading to 'major' since that was original status prior to the regression.
Comment #82
jhodgdonTo document API changes, it's very helpful to have a before/after code example, and a short narrative summary of what changed. Neither of which is obvious to me at first glance on this issue. The issue title is all about strstr, but the patch doesn't seem to have anything to do with that, and seems to have removed part of some database class?
Comment #83
catchThere are two patches:
http://drupal.org/node/561422#comment-4640200 - this fixed the original bug but broke sqlite.
http://drupal.org/node/561422#comment-4659224 - this fixed sqlite. The code change example would be the hunk in that patch dealing with sqlite.
It is possible that contrib drivers aren't affected by this issue in the same way that mysql and postgres didn't need any specific changes.
Comment #84
Crell CreditAttribution: Crell commentedI suspect drivers will only need to be updated if they have custom prefix handling of some kind. Damien obviously knows about it, but we should ping Andrea as well.
Comment #85
catchhttp://drupal.org/update/modules/6/7#driver_prefix - not amazing but should point people in the right direction.
Comment #86
pillarsdotnet CreditAttribution: pillarsdotnet commentedYou mean http://drupal.org/update/modules/6/7#driver_prefixes
Comment #87
catchthat is what I mean, thanks.
Comment #88
lyricnz CreditAttribution: lyricnz commentedNeeds backport to D6
Comment #89
lyricnz CreditAttribution: lyricnz commentedstr_replace() version using a static in db_prefix_tables()
Comment #90
pillarsdotnet CreditAttribution: pillarsdotnet commentedNote that "0 pass(es)" means that it didn't pass, despite the misleading green-tinted "PASSED" prefix.
EDIT: Or maybe that it simply didn't run any tests? At least the test details show that the patch applied properly. Hmm...
Comment #91
lyricnz CreditAttribution: lyricnz commentedDoes D6 simpletest fiddle with prefixes on the fly? Perhaps the static cache is preventing prefixes working.
Comment #92
dalin"Does D6 simpletest fiddle with prefixes on the fly?"
Yes it does.
Comment #93
lyricnz CreditAttribution: lyricnz commentedWith no explicit "set prefix" method, or object that contains enough context, we can't easily [1] cache the arrays used to perform the substitution. So, the closest we can get to a backport is doing a single str_replace on a dynamically generated array. This is option "prefixTables_single_str_replace" from #53 above - about 12x faster than the current code.
[1] Well.. if we kept a second copy of $db_prefix and compared for equality in each db_prefix_tables(), before regenerating the cache, we could still cache it. But that feels a bit ugly+expensive. Thoughts?
Comment #94
lyricnz CreditAttribution: lyricnz commented0 passes. That's strange - simpletest works OK here. Is the testbot doing something wierd?
Comment #95
pillarsdotnet CreditAttribution: pillarsdotnet commentedAsk rfay to be sure, but I think that the d6 testbot *only* checks whether the patch applies; it doesn't actually run any tests.
Comment #96
catchNo unfortunately that is a genuine failure (or non-pass or whatever), there's 190 assertions in D6:
http://qa.drupal.org/pifr/test/16784
Very likely it is the static cache. If you look at the 6.x-1.x memcache branch there is some simpletest-specific code in there to handle the same problem, it's not pretty though.
Comment #97
lyricnz CreditAttribution: lyricnz commentedLatest patch does not include any cache at all - still failed while accessing the DB tables.
Comment #98
andypostAs I see it takes a lot of page request time http://pix.am/qGe6/
Currently str_replace() is used, also marked as duplicate #960798: Faster DatabaseConnection::prefixTables() when database prefix is an array
Test can be found at #960798-3: Faster DatabaseConnection::prefixTables() when database prefix is an array
Comment #99
catch@andypost, this was committed to 7.x and 8.x already?
Comment #100
andypostResults for script from #960798-3: Faster DatabaseConnection::prefixTables() when database prefix is an array
php 5.3.8 - str_replace is much faster!
strtr 0.8406500816 str_replace 0.0321018696
php 5.2.17 results are looks the same
0.0041248798 0.0023090839
Comment #101
andypostCrossposted, suppose we should check this for upcoming PHP 5.4
Comment #102
pillarsdotnet CreditAttribution: pillarsdotnet commented@andypost:
Can you provide a current patch? I don't see where the patch you reference could possibly be faster than the current code, but I would welcome benchmarks to prove otherwise.
At any rate, if you do have a solution that provides better performance than the current code, you should probably open (or re-open) a separate issue, as this one is currently focused on backporting the performance improvements to 6.x.
Comment #103
andypost@pillarsdotnet I see no ability to improve performance for D7 but for D6 it's #93 probably makes sense
d7 & d8 implementation str_replace() d6 implementation strtr()
Comment #104
pillarsdotnet CreditAttribution: pillarsdotnet commented@andypost:
I can assure you that the current 7.x and 8.x code runs fine on my PHP 5.5 site. I'm not certain the Drupal 6.x will ever fully support PHP 5.3 or later.
Well, that's why the issue version is currently at 6.x-dev. The problem with the patch in #93 is that it does not pass the 190 core tests. If you can produce a patch that passes tests, it would be greatly appreciated.
EDIT: Got an answer from webchick and updated the docs.
Comment #105
donquixote CreditAttribution: donquixote commentedJust discovered this problem myself, on a D6 site where the settings.php defines a list of 187 civicrm tables.
I had literally 149.200 calls to strtr() in one request.
I experimented with 5 different solutions:
- original D6 strtr() implementation.
- str_replace() without cache as suggested in #93 (lyricnz)
- str_replace() with static cache (home-made)
- preg_replace_callback() without cache (home-made)
- preg_replace_callback() with static cache in the callback (home-made)
I used array_slice() to experiment with $db_prefix arrays of different size.
I did benchmarking, and compared the result with the result from the original, on a normal page request.
Basically, in every db_prefix_tables() I let all 5 solutions run one after the other, track the time for each, and sum up.
After 500 calls (those that usually happen within one page request), I stop and print the numbers.
Then I do some spreadsheet magic, to calculate what you see here.
higher number = faster
All numbers are relative to the original algorithm.
A number of 31.431 means, this method is around 31.431 times as fast as the original.
This is only for the time for db_prefix_tables(), not the full page generation time.
Conclusion:
preg_replace_callback() does pay off for large numbers, but it only would make sense with a case distinction logic. Probably not worth it (although I love it so much)
The static cache does usually pay off.
We see the biggest effect (not surprisingly) for large numbers.
The static cache could be implemented differently (as an instance variable, etc), which might yield different results.
Comment #106
dalin@donquixote your research seems to confirm the D6 research that I made in #20 a year ago.
Comment #107
Anonymous (not verified) CreditAttribution: Anonymous commented#93: 561422-prefix-strreplace.patch queued for re-testing.
Comment #108
Owen Barton CreditAttribution: Owen Barton commentedSubscribe
Comment #109
fen CreditAttribution: fen commentedSubscribe
Comment #110
andypostBut whats about measures 5.x vs 5.3 as I posted at #561422-100: Replace strtr() with str_replace() for db prefixing
Comment #111
pillarsdotnet CreditAttribution: pillarsdotnet commented@#110 by andypost on October 13, 2011 at 9:18pm:
We need a patch that passes core tests. Nothing else matters.
Comment #112
andypostAs @donquixote pointed in #105 the faster results is static preg_replace_callback() also this looks reasonable to make it ported to upstream.
@pillarsdotnet Both patches are passes tests
current
#561422-93: Replace strtr() with str_replace() for db prefixing
and
#960798-3: Faster DatabaseConnection::prefixTables() when database prefix is an array
But also I'd like to make benchmarks on a small site instance shared 10-30 fields.
Can we make a test of real impact of prefixed tables? (I have no idea of scenario)
Comment #113
pillarsdotnet CreditAttribution: pillarsdotnet commentedNo, the patch in #93 does not pass tests, despite the green color. According to catch in #96:
So we are still waiting for a patch that passes tests.
Also, this is not a "bug report" because nothing is broken.
This is only an opportunity to improve performance for Drupal-6 websites that are running on PHP 5.3 or later.
Adjusting category, priority, and status accordingly.
Comment #114
donquixote CreditAttribution: donquixote commented@pillarsdotnet
Are you aware how severe the performance penalty is on the typical CiviCRM use case we are talking about?
For me that's easily x2 for any request. As mentioned in #105, I saw 149.200 calls to strtr() on one request.
That's not just strtr vs str_replace, the real problem is that our strtr is sitting inside a foreach loop. Even if we stick with strtr, getting it out of foreach should result in a big improvement.
Of course, this only applies to those people with a reason for setting this prefix array. But that's a documented feature of Drupal, with no warning about performance.
Comment #115
pillarsdotnet CreditAttribution: pillarsdotnet commentedSounds like a documentation problem, and you strengthen the point I made earlier. There is an opportunity to improve performance for:
The only time this issue was legitimately a "Critical" "Bug report" was in #45 through #77, where the changes made here "completely broke SQLite on both branches."
Anyhow, complaining is really pointless until someone submits a working patch.
Comment #116
donquixote CreditAttribution: donquixote commented> Anyhow, complaining is really pointless until someone submits a working patch.
Yep.
I would love to.
Unfortunately, I have no idea why the test bot does not run any tests on #93. It seems this is rather a test bot problem, than a problem with the patch. What can we do?
Comment #117
donquixote CreditAttribution: donquixote commented> On PHP 5.3 or later
And I still don't see how this is relevant. Unless you play these points as "OR".
In the civicrm case mentioned above, we could get an improvement even with strtr, just by getting it out of the foreach loop.
Comment #118
dalinSince the testbot changes $db_prefix on the fly (to run the simpletests) I'm guessing that this is an error with the patch, not the bot. Can you get the tests to run locally?
Also, any thoughts on using the technique from the original D6 patch in #20 to use strtr() if $db_prefix is not an array?
Also when you do your benchmarking be sure that neither xDebug or XHProf are being loaded as PHP extensions. They cause a lot of additional overhead on internal PHP functions such as is_array(), and string functions which do not exist in a standard PHP installation (even if you have not enabled profiling).
Also your patch does not conform to coding standards (you are using camel case).
And this is not a minor issue since it can be summarized as "If you flip this switch your performance inexplicably drops by 25%".
Comment #119
lyricnz CreditAttribution: lyricnz commentedCoding standards aside (the code is a backport, from an object-based solution), the patch in #93 doesn't cache anything, so shouldn't be impacted by simpletest changing the prefix. I'll look at it again now.
Comment #120
Mark Theunissen CreditAttribution: Mark Theunissen commented#93: 561422-prefix-strreplace.patch queued for re-testing.
Comment #121
Mark Theunissen CreditAttribution: Mark Theunissen commentedHere's a patch with what I think was the problem, basically the case where the $db_prefix is a simple string was not accounted for, it was setting the default_prefix to be '' instead of $db_prefix.
I also cleaned up the coding standards problems.
Comment #122
lyricnz CreditAttribution: lyricnz commentedYeah, that looks like the bug. Other than renaming the local variables, that's the same as #93 right?
Comment #123
Mark Theunissen CreditAttribution: Mark Theunissen commentedYup.
So I guess we can't use a static cache because of the changing-prefixes-on-the-fly use case? We suspect Simpletest needs to change prefixes on the fly, but is there any other situation where this is required?
Comment #124
Mark Theunissen CreditAttribution: Mark Theunissen commentedRemoving tag 'API change' because this does not break the API.
Here is a new patch that uses preg_replace_callback and doesn't precompute any arrays beforehand. It should work with PHP 4 and 5, as it doesn't use a closure for the callback but rather a real function.
My testing shows this is by far the fastest method, so we should go with this unless there is a problem with the approach?
My test was saving a view on a site with 151 entries in the db_prefixes array, using XHProf to determine the total inclusive wall time spent in the function db_prefix_tables.
There are 9225 total calls to db_prefix_tables on a View save page.
Inc. wall time:
15,503,778 microsecs (15 seconds) with Drupal 6 core default.
3,509,399 microsecs with patch in #121.
433,416 with this patch (using preg_replace_callback).
So with preg_replace_callback it's 12x as fast as the patch in #121.
Comment #125
Mark Theunissen CreditAttribution: Mark Theunissen commentedActually, we don't need the case insensitive match in the regular expression which may speed it up and it certainly doesn't hurt to remove it.
Comment #126
Mark Theunissen CreditAttribution: Mark Theunissen commentedOne more patch, just noticed it's probably safer to use !is_array(), instead of is_string().
Comment #127
lyricnz CreditAttribution: lyricnz commentedpreg_replace_callback() is not universally faster - it depends very much on the number of substitutions and the number of prefixes - see comment 105 above. 151 entries is definitely atypical - any solution needs to be the same or better in the configuration used on 99% of sites: 0-1 prefix.
Also, you seem to have reverted to strtr() in the degenerate (and most common) case. Wasn't str_replace() already proven to be significantly faster?
Comment #128
Mark Theunissen CreditAttribution: Mark Theunissen commentedFor the configuration used on 99% of D6 sites, the 0-1 prefix, my patch leaves Drupal 6 code unchanged. This I believe is the safest option, as Drupal 6 has been out for many years now and optimizing this routine has never been a priority, as it's already very fast. I've just done some testing on a stock Drupal 6 install, and I was unable to produce a statistically significant difference between using str_replace() with two arrays and using strtr().
We also cannot use a static cache as this will break Simpletest, and it is also an API change in Drupal, which could potentially break hundreds of sites. So static cache of $search and $replace arrays are out, in my opinion. Removing those from the results, you'll see preg_replace_callback() is faster or not significantly slower.
In #105, donquixote doesn't actually benchmark the case where there are 0-1 prefixes. The benchmarks start by comparing 3 prefixes. Even then, the improvement is 1.5x vs 1.4x, not significant. As soon as you get to 9 items, preg_replace_callback is faster for the non-static cache.
Regardless of this, donquixote mentions using a static cache in his preg_replace_callback() method, which my patch does not need, as it does not precompute arrays, so comparing my patch vs donquixote's is apples vs. oranges. We need the exact patches used by donquixote and the benchmarking code to make a meaningful comparison.
In cases where there are hundreds of prefixes, preg_replace_callback() is much much faster.
I think the goal of this patch should be to drastically optimize the horrifically slow case, rather than create some complicated set of case statements just to eke out another 5 or 10ms from a page load for the majority of users that for the most part aren't complaining about it.
Comment #129
JacobSingh CreditAttribution: JacobSingh commentedI've been working on a special case where they have 400+ prefixes. We dropped from 256s to 60s on a full cache clear. So +1 to this patch.
Comment #130
lyricnz CreditAttribution: lyricnz commentedI updated the benchmark code from #36 to include a preg_replace_callback() option as used in #126. For 100 prefixes, the various methods took:
i.e. the preg_callback method is about 5 times faster than a single str_replace() and 10 times faster than a single strtr().
However the patch still uses strtr() for the single-prefix-as-string case, and str_replace() is ~20 times faster, so why wouldn't we use that? I know it's not terribly significant in overall page time, but it's a straight replacement in this case.
FWIW, on my machine, the "crossover point", where preg_callback() becomes faster than str_replace() is about 5 prefixes (on a sample SQL with two replacements).
Comment #131
mgiffordIt's been RTBC for 2 years, but don't see this getting into Core for D6.
Comment #132
Fabianx CreditAttribution: Fabianx commentedI disagree, if anyone wants to take over this again for D6 maintenance, they could.
I think we should close this just when D6 is EOL.
Comment #133
donquixote CreditAttribution: donquixote commentedJust saying sorry I did not upload the patches and did not follow up on #128.
And now I don't have this stuff available anymore (at least cannot find it atm), and it no longer applies to any active project I work on.
But #126 looks reasonable, so why not.
(But the rtbc should be renewed by someone still using D6)