Closed (fixed)
Project:
Drupal core
Version:
7.x-dev
Component:
path.module
Priority:
Normal
Category:
Task
Assigned:
Issue tags:
Reporter:
Created:
12 Oct 2008 at 20:03 UTC
Updated:
2 Jan 2014 at 23:45 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
moshe weitzman commentedNice work. And a test too. I hope to test the patch soon.
Comment #2
moshe weitzman commentedThe code here looks good. Not sure how to test it, if at all.
Comment #3
john morahan commentedIf you want to test this now, first apply the latest patch from #298600: DBTNG: make module_implements work regardless of bootstrap phase, then either make the change I suggested and then run the tests, or remove the hidden=TRUE from hook_url_rewrite.info and enable the module to test it manually.
Comment #4
sunwhoa, nice idea. Will try to test this if time permits.
Comment #5
starbow commentedsubscribe
Comment #6
c960657 commentedSeems to work fine. I also tried modifying $options, e.g. $options['base_url'], and that also works.
The test only covers hook_rewrite_url_rewrite_inbound(), not hook_rewrite_url_rewrite_outbound().
I think the
nameanddescriptiontexts from getInfo() are a bit too generic, considering that there are two other tests in the Path group. Also, the outbound rewriting doesn't happen in path.inc but in common.inc. I suggest something like “Path rewriting” and “Test hook_rewrite_url_rewrite_inbound and hook_rewrite_url_rewrite_outbound”.I noticed that hook_rewrite_url_rewrite_inbound() is invoked several times per request. Every time drupal_is_front_page() is called, it calls drupal_get_normal_path() that invokes the hook. This may be optimized, though this is outside the scope of this issue.
Comment #7
john morahan commentedThanks for the review; the details of the test were the part I was least sure of here.
Regarding how often these are invoked - this is no different to how custom_url_rewrite_{in,out}bound in settings.php are invoked already. If it can be reduced, great - but not here.
I'll post a new patch soon with a better test.
Comment #8
john morahan commentedUpdated the test. Also included the simpletest change in this version to make it easier to test. So now simply apply #298600 and this.
Naming the patch .txt to avoid automated testing: this should fail because t.d.o does not know to apply the other patch first. (I wonder how the first one passed?)
Comment #9
john morahan commentedIn light of Gábor's comments on the dev list, benchmarks would also be welcomed.
Comment #10
Dave Cohen commentedHaven't tested the patch, but looked it over. Question...
You use these two loops:
Don't we need to reverse the order of one of these loops? I think the inbound and outbound operations need to be performed in opposite order, although I'm not sure which order should be reversed (or whether it matters which).
For example consider the original url is 'node/42', and module X rewrites it to be X/node/42. Then module Y rewrites it to be Y/X/node/42. If in the other direction we pass Y/X/node/42 to module X, it won't make sense of it. So in the other direction module Y must act first, then X.
Comment #11
cwgordon7 commentedI agree that this needs a bit more thought. The current API does not really support multiple hook implementations very well; $result is modified by reference based on $path, making multiple implementations of the hook problematic. The parameters and return value (if any) of the hook(s) need to be reconsidered.
Comment #12
c960657 commentedI get your point - but wouldn't this be unusual compared to how other hooks work? Currently both "inbound" hooks (hooks that act on form submission) and "outbound" hooks (hooks that change how things are displayed) are invoked in the same order.
Comment #13
Dave Cohen commentedSure it's unusual, but it's better to be unusual and useful, than normal and useless.
If you want to make things more "usual", then we'd make url creation more like form creation. Create a data structure with values similar to the parameters of url(), then call url_alter() or something like that to let modules change it. I think there are some valid reasons to take that sort of approach for outbound URLs.
Comment #14
john morahan commentedHere's a version with just one of the hooks called in reverse order.
Comment #16
john morahan commentedFail is due to #303154-8: Document ini_set() calls in default.settings.php
Comment #17
john morahan commentedComment #18
c960657 commentedSee also #359965: Replace custom_url_rewrite_outbound() and language_url_rewrite() with new hook_url_alter().
Comment #19
dave reidMarked #359965: Replace custom_url_rewrite_outbound() and language_url_rewrite() with new hook_url_alter() as a duplicate of this issue. I'll repost the benchmarking results I did with using drupal_alter() to replace custom_url_rewrite_outbound():
About a +/- 1 ms difference with the patch. Maximum benefit is with the patch and locale module enabled.
Comment #20
moshe weitzman commentedWell done. My only beef is with the test. testHookUrlRewrite does not need to do http requests to assert that url() is working. just call url("user/12") and assert you get back member/12. http requests are the slowest possible way to test something. they are often useful, but not here IMO.
Comment #21
dave reidHere's the progress I made so far for converting both into a hook_url_alter_inbound/outbound. I will need to incorporate some of the tests from this issue and write some simpler ones myself. I don't agree with reversing the module_implements. If modules have weights, those weights should always be respected.
Comment #22
Dave Cohen commentedI will consider this "code needs work" until there is a test involving at least two modules which both implement the inbound and outbound hooks.
...or the order of one of the loops is reversed.
Comment #23
cwgordon7 commentedI agree with Dave Cohen: we should respect the weights, yes, but in this case, respecting the weights means reversing the order of the one of the loops. Consider modifying an outgoing link node/4:
Module1: node/4 -> article/4
Module2: article/4 -> blog/article/4
Module2 in this case has the lower weight, which is being respected. On the incoming url blog/article/4, however, that means the module with the lower weight needs to be considered first to undo what it did:
Module2: blog/article/4 -> article/4
Module1: article/4 -> node/4
Consider what would have happened if it had been the other way around, assuming Module1 is using the regular expression /^article\//:
Module1: blog/article/4 -> blog/article/4
Module2: blog/article/4 -> article/4
The problem would only get worse with more modules implementing the hook. Each module needs to be presented with a certain url for outcoming links and change it to exactly one thing, which is all they should need to worry about changing back.
Hope all that makes sense.
Comment #24
dave reid@cwgordon7: Ok that makes sense.
@DaveCohen: That's why I left the issue as "code needs work" :)
Comment #25
arhak commentedsubscribing
Comment #26
wim leersSubscribing.
Comment #27
moshe weitzman commentedWould be great to revive this and push it in.
Comment #28
dave reidMarking this for my own little patch sprint this weekend.
Comment #29
wim leersI agree with cwgordon7: multiple modules implementing this hook will result in a nightmare. I wonder if it wouldn't be better to stick with the "special function" concept instead of a hook to keep our sanity?
It probably can be made to work with multiple hook implementations, but it will always remain tricky or at the very least, rather difficult to grok.
Comment #30
cwgordon7 commentedHere's an updated patch that has what I believe is the correct implementation. The test case it comes with quite clearly describes the expected behavior and makes the hook easier to understand. The hook is also well documented in system.api.php.
Comment #31
arhak commentedalso check out url_alter
Comment #32
smk-ka commentedI have not tested the patch, but I've noticed this: language_url_rewrite() takes the $options array by reference, and was responsible for initializing $options['language'] if it was unset. The language is then used in a subsequent call to drupal_get_path_alias(). However, that happens BEFORE hook_url_alter_outbound() (including the successor to language_url_rewrite()) is called, so I expect the new code to initialize the language too late. Or, in other words: there seemed to be a reason why language_url_rewrite() was called earlier than custom_url_rewrite().
Comment #34
andypostBot was broken
Comment #36
andypostrestart bot
Comment #38
dries commentedI support this, and I think we should proceed with the hook-approach.
I agree with Wim and cwgordon7 that the problem gets complicated when many modules start rewriting URLs. These cases should be rare -- modules shouldn't rewrite URLs because they can. In some cases, having multiple modules rewrite hooks is a valid use case. So, yes, it can get messy but it is not our task to babysit.
Comment #39
dries commentedBy the way, do we really have to have modules like hook_url_alter_1.module and hook_url_alter_2.module? Those are pretty terrible. We should try to find more elegant ways to these those things, IMO.
Comment #40
sunExactly. I didn't look at its code, nor did I use it in any way (yet) - but the description of http://drupal.org/project/purl sounds like a more stable, reliable, and elegant approach to solve this in a scalable way. Language negotiation in core could be treated as first consumer already. Most of the other consumers are modules that similarly try to provide and determine context. The exception to this are modules that really want to advance on URL aliases only (such as http://drupal.org/project/subpath_alias).
Comment #41
drecute commentedPlease, I am new here, but has anyone test this patch together with a site that runs both domain access and url alter modules at the same time.
Pls let me know thanks.
Comment #42
dave reidTagging and reminding myself to work on this sooner rather than later so we can get this into D7 before code freeze.
Comment #43
aaron commentedsubscribe
Comment #44
moshe weitzman commented@Dave Red - any chance you can revive this patch and the role API one? I think the blocker on role API has been cleared.
Comment #45
smoothify commentedI have been testing out the url_alter module (which uses very similar code to this patch) and have run into an issue with how the inbound hook works.
Currently if a module makes a change to a path and sets the result, the following module isn't aware of that change, so it can only act upon the original unmodified path. If the second module changes the path, then the first modules change is completely lost. Sometimes this may work well, but in cases where you want to act upon the changes of another module this causes problems.
For my use case the following adjustment works:
Original:
Changed:
This carries the $result forward to the next module, which in my use case (using subdomain) solved the problem.
Comment #46
arhak commentedyou mean chaining the url rewrites, right?
that makes a lot of sense
would be helpful the other way around? don't think so
Comment #47
smoothify commentedyes thats right.
For my situation I didn't need to change the outbound links too, but it would probably make sense to keep it consistent.
Chaining both inbound & outbound would allow for the situation mentioned in #23 to work as expected.
One other thought I had on the matter is it might be worth keeping track of the original unaltered path somewhere as well as the chained one.
Comment #48
dave reidIntending to get back and get this finished today if possible. Would this be considered for the code slush phase since we're changing the APIs of an existing feature?
Comment #49
cwgordon7 commentedI don't think this can happen anymore without significant performance implications now that the registry is out.
Comment #50
dave reidI would like to do actual performance tests to back that up on a normal site with and without the locale module enabled. We've shown that there are several use cases for this feature already with Url alter. Asking users to add a custom function to their settings.php in order to make a module work is just sad.
Comment #51
wim leers@cwgordon7: hook_file_url_alter() is also a hook and was recommended/requested by moshe et al. For consistency, it would make sense to also use this approach for these functions. I *think* Dries/webchick would let this one in, because it affects so few people.
I'm too tired to think about the performance consequences at the moment.
Comment #52
mattyoung commented.
Comment #53
dave reidAlrighty, I'm going to focus on re-rolling this because this custom_url_rewrite is a broken API that needs to be fixed in D7.
Comment #54
dave reidComment #55
dave reidMoving path patches to path.module so they're easier to track.
Comment #56
dave reidSo crazy thought, this could and *should* also repalce hook_term_path as well.
Comment #57
dave reidInitial stab at replacing hook_term_path and hand-tested it with forum module and it worked pretty darn good. Finally figured out that I needed to add url_alter_inbound to the list in bootstrap_hooks() in order to get it to fire since drupal_get_normal_path is called before all modules are loaded *duh!*. I didn't run -N to include the tests because I know it doesn't work yet and there are some things we need to decide how exactly this should work.
Currenty we run the url_outbound_alter() before we lookup path aliases in url(). So that means for any forums that want to have path aliases, they need to create the path alias on 'forum/tid' instead of 'taxonomy/term/tid'. That doesn't seem ideal, but maybe it makes sense.
Comment #58
moshe weitzman commentedI think it is fine to expect aliases to happen on the path that admins are seeing (forum/n). They have no clue that this path was taxonomy/term/n for a brief moment. Perhaps we should write an update function for these aliases?
Comment #59
dave reidThe problem that I'm coming into is how we should handle these two scenarios when forum.module is enabled and rewriting forum paths using url():
Case 1: We have an alias for source path 'forum/3' -> 'f3'.
url: taxonomy/term/3
forum_url_alter_outbound: forum/3
drupal_get_path_alias: f3
result of url: f3
drupal_get_normal_path: f3
drupal_lookup_path: forum/3
forum_url_inbound_alter: taxonomy/term/3
reusult of drupal_get_normal_path: taxonomy/term/3
----------------------
Case 2: We have an alias for source path 'taxonomy/term/3' -> 'f3'.
url: taxonomy/term/3
forum_url_outbound_alter: forum/3
drupal_get_path_alias: forum/3
result of url(): forum/3
drupal_get_normal_path: f3
drupal_lookup_path: taxonomy/term/3
forum_url_inbound_alter: taxonomy/term/3
reusult of drupal_get_normal_path: taxonomy/term/3
---------------------
So one way around this is to make sure when url aliases are added is to double check that they are run through drupal_get_normal_path(). So the case 1 the source url for the alias would be 'taxonomy/term/3' instead of 'forum/3'.
But that adds another situation for looking up the path alias when rewriting in url() like in case 2. Becuase we looking the alias before we rewrite paths, we should probably lookup the path alias based on the 'original' path given to url().
Comment #60
dave reidRe-rolled for head.
Comment #62
dave reidTesting to see if I went crazy and this was all caused by a module_list(TRUE, TRUE), which is WTF because the same thing is used to call hook_boot() before hook_url_inbound_alter().
Comment #63
dwwIn IRC, Dave and I determined the source of the badness is that calling module_list(TRUE, TRUE) to refresh the module list from inside a reduced bootstrap level (we're still just at DRUPAL_BOOTSTRAP_PATH when drupal_path_initialize() runs) results in sheer badness. entity_get_info() is confused, we have an empty list of modules (or something), and we end up caching an empty array in {cache}.cid == 'entity_info', resulting in a completely broken site until you manually clear that cache entry. :(
So:
A) WTF, module_list()? Why can't you be refreshed twice during the bootstrap process?
B) WTF, entity_get_info()? Why are you willing to cache an empty array into the DB? ;)
For now, this patch is going to work-around it by calling module_list(FALSE, TRUE) to only get bootstrap modules, but *not* refresh the list.
Comment #64
dwwBTW, forgot to mention: big +1 on the direction of this patch. ;) Making these things hooks will clean up a bunch of stupid in core and remove a lot of WTF. I guess it's implicit that I support this patch if I spent time to help debug it, but I wanted to make it explicit in case that helps anyone decide to commit. ;)
Comment #66
dave reidIf we're going to do this, we should really should run this during a full bootstrap. And looking into _drupal_bootstrap, I don't see any reason why we *shouldn't* just load all modules first, then run drupal_path_initialize(). It's not like we're going to somehow stop execution in DRUPAL_BOOTSTRAP_PATH. It's always going to continue on to DRUPAL_BOOTSTRAP_FULL.
Comment #67
dave reidRevised patch moving drupal_path_initialize to DRUPAL_BOOTSTRAP_FULL and including the missing path alter test modules.
Comment #69
dave reidOk this one is almost ready. Still trying to figure out what's going wrong with the wonky forum tests...
Comment #70
dave reidAdding official D7 api cleanup tag
Comment #72
dave reidForum.module is very broken. Very upset and frustrated.
Comment #74
dave reidThink I've got it finally. Should not have had forum_url_inbound_alter to turn forum/x into taxonomy/term/x because they needed to use the 'forum/x' path for display.
Comment #75
dave reidTry again without contrib patch noise.
Comment #76
dave reidUn-did changes to forum paths.
Comment #78
dave reidNow with 100% random 1 failure from the bot. Also 100% less whitespace changes from my IDE. Now working on profiling.
Comment #79
dave reidComment #80
dave reidRemoved the crappy, crappy node title rewriting examples, because that was just silly. Node titles are not unique. Now onto profiling.
Comment #81
dave reidBasic benchmarks since devel_generate is massively broken with all the recent patches:
Before without custom_url_rewrite_outbound() in settings.php:
----------------------------
Before with custom_url_rewrite_outbound in settings.php:
----------------------------
POST_PATCH (forum_url_alter_outbound() is implemented):
Comment #82
dwwGenerally, I agree that initializing the path can wait until the full bootstrap. However, it turns out that the new authorize.php needs $_GET['q'] initialized (so that BatchAPI works) but definitely does not want a full bootstrap, or it becomes really nasty to try to upgrade existing modules while they're being loaded. :(
Luckily, we don't actually care about a true Drupal Path(tm) inside authorize.php. The only times $_GET['q'] matter are when it's already set. It just turns out there are a lot of places in core that assume it's always set. So, we can just initialize it ourselves inside authorize.php instead of calling drupal_initialize_path(). This is actually an improvement for authorize.php, regardless of this patch (although we need the change if this patch is committed, so I'm just including it here).
We could potentially reconsider if we want to always assume $_GET['q'] is defined in our code base, but that seems like a totally separate issue. Basically, that'd just be a series of PHP notice bug fixes in a followup issue, if we care.
Comment #83
dwwNote to webchick:
Comment #85
dwwSorry, forgot hunks from statistics.module.
Comment #86
dwwOh, and Dave pointed out !isset() would probably be more appropriate in authorize.php than empty() for this...
Comment #87
dave reidI think between dww, moshe and myself, we have pretty much got all of this covered.
Today's patch is brought to you by the phrase "Yay for removing hacks!"
Executive summary:
And reminder again for committing this patch that the following CVS commands must be run after patching, but before committing:
Comment #88
moshe weitzman commented7. Comes with a boatload of tests. Nice work, Dave ... Pleas wait for green before commit.
Comment #89
catchI'd like to know the number of calls to url() on the page which was benchmarked - this number can vary wildly depending on what sort of page is viewed. ideally in the form of webgrind screenshots and/or microbenchmarks similar to those done for #523284: Optimize url() so we can see the difference internal to url() and drupal_lookup_path().
Comment #90
dave reidOk I used a node page with lots of comments. Kcachegrind shows url() being called 207 times. I can attach the screenshot if needed. Run with ab -c 1 -n 500 three times for each condition and took the middle run:
BEFORE PATCH (without custom_url_rewrite_outbound() in settings.php):
BEFORE PATCH (wtih custom_url_rewrite_outbound() in settings.php):
AFTER PATCH:
Comment #91
catchOK that looks pretty good, nothing disastrous anyway.
Comment #92
dave reidFixed a very minor spelling error in the test assertion. No need to reset status.
Comment #93
dave reidRemoving 'needs tests' tag.
Comment #94
sunThis really is a nice patch.
I'm not sure in which horrible hook-weighting disaster we'll run into with this, but I guess there's no other way to just find out. Especially, because it makes the installation of a series of contributed modules a lot easier.
So, +1 for getting this last-minute "clean-up" in. Doesn't really break anything, especially not in core - but rather improves the situation for contributed modules. Aside from that, it was already RTBC around 10/15.
Comment #95
sunoops, cross-posted.
Comment #96
wim leersThis *must* get in if we want to provide a decent developer experience, i.e. if we want to provide a consistent API. The API to alter file URLs (hook_file_url_alter()) specifically uses a hook instead of a "special/magical" function because this was going in as well, because URL rewriting was going to use hooks as well.
Performance impact is extremely minor, if any. This patch has had extensive reviews and solid tests. So I definitely agree that it's time to commit this one. Excellent work, Dave Reid et al. :)
Comment #97
andypost+1 to RTBC
This hooks could be good as solution for #308263: Allow privileged users to bypass the validation of menu items
to make outbound URL protected with some #anchor for example and inbound to check #anchor for existence then pass to original menu item - ability to protect URL like #606608: Use proper menu router paths for comment/* from csrf and leave them without confirmation
Comment #99
dave reidRe-rolled only.
Comment #101
dave reidArg. Reference parameter introduced in path_save(). Tested and passes tests locally.
Comment #102
sunI'd like to see a third $original_path argument for hook_url_outbound_alter() as well.
This review is powered by Dreditor.
Comment #103
dave reidRevised patch addressing sun's concerns.
Comment #104
dave reidwebchick wanted some benchmarks, so I enabled forum and statistics, put 15 comments on a forum topic, enabled the 'administer comments' permission for anonymous user (lots of links on the page), and added French language, so that all paths had to be re-written with the /fr/ prefix.
ab -c 1 -n 500 http://mysql.drupal7dev.local/fr/node/1Before patch:
After patch:
About a +1-2 ms difference with the patch. At this point I know it's a teeny-tiny increase but at least we're doing things the right way now.
Comment #105
webchickOk, cool. My main concern with this patch was removing DRUPAL_BOOTSTRAP_PATH in favour of a full bootstrap. It looks like in the worst-case scenario where you are hitting locale, forum, and statistics-related changes in this patch, you're getting a negligible slow-down.
I also had some questions about that weird $_GET['q'] stuff in authorize.php. But basically, it's only required because authenticate.php is trying to do the least amount of bootstrapping possible. So the only other places that might need to use this are like file_layer_lite.php or something.
I also wondered if we'd hit any problems now that the language URL re-writing is part of locale.module (an optional core module) rather than includes/language.inc. I guess worst-case, a module implementing locale-like features will just need to copy/paste this code.
Otherwise, I agree this is a really solid clean-up, reviewed by lots of people big into both performance and i18n. Although I weep at no longer being one of like 30 people who know about custom_url_rewrite functions (hee hee).
And technically, this did make code freeze, albeit at the last possible second. :P~ Since Dries said he wanted to see this earlier, I feel comfortable committing this, although it is an API change, without checking with him first. (Dries, if that was the wrong call, sorry!)
Therefore, committed to HEAD. Please ensure this API addition (and the removal of hook_term_path()/language_rewrite_url()) is documented in the upgrade guide.
Comment #106
dave reidWorking on the update handbook now, but D'OH! Noticed I forgot to remove custom_url_rewrite_crap from system.api.php. On the bright side, didn't even need to undocument hook_term_path since it wasn't even documented to begin with. Hah!
Comment #107
webchickD'oh! Good catch. Committed that to HEAD too.
Back to needs work for docs.
Comment #108
dave reidYAAAAAAAAAAAAAAAAY!
Updating docs completed ready for review at http://drupal.org/update/modules/6/7#hook_url_outbound_alter and http://drupal.org/update/modules/6/7#hook_url_inbound_alter
Comment #109
chx commentedThis must be rolled back. The original solution worked just fine , we (NowPublic) have a module where custom_url_rewrite sits and it works (i presume it needs to be a bootstrap module) and doing drupal_alter just for the sake of it IS slower than a direct function call and the benchmarks are wrong because a) they were not made on a heavily linked page , node/1 on a def install is a joke b) there was no implementation. It's highly arguable that you want more than one implementation, this is a site-specific performance optiomization, having a hook is pointless. I will make a real page with 300 links and then add an implementation and then we shall see.
Comment #110
chx commentedCurrent HEAD:
Comment #111
chx commentedSry for
function linktest_node_boot() {}i made the module bootstrap by hand anywyas -- not that it matters for outbound.Comment #112
chx commentedOh well. I bow before the developer experience and webchick says most links will be cached anyways. At worst, I will run a hacked Drupal 7 -- that was always case, now won't be any other way.
Comment #113
webchickEr. The question I asked on IRC was whether the benchmarks you ran painted a realistic picture of a Drupal site, since calls to l() within hook_node_view would become part of the content array which afaik is cached. If this is still a realistic scenario regardless of that fact, then fine. I certainly don't want to slow Drupal 7 down any more than it actually is.
Comment #114
dave reidHave my doubts that it was a realistic test, but I do understand the concern. Also see #523284: Optimize url().
Comment #115
dave reidAnd to properly compare old vs new, make sure we are using worst-case scenario. Locale.module enabled and altering language prefixes (a non-default language path), and also using taxonomy_term_path() with forum.module.
Comment #116
dave reidAlso consider there *are* other modules out there that want/need to alter outgoing or incoming paths and using custom_url_rewrite inside a module is a complete WTF. See the list at http://drupal.org/project/url_alter
Comment #117
chx commentedYes, there is that -- the test did not include the existing pieces indeed. There is more to be done here but now is not the time for me. I laid the groundwork, I let others continue.
Comment #118
effulgentsia commented@chx and others concerned about performance of url(): please weigh in on #619566: Don't invoke hook_url_outbound_alter() if there's nothing to invoke (comment 15 on down).
Comment #120
dave reidPlease don't do that. I hate this retest functionality in PIFR2.
Comment #121
gpk commentedThe benchmarks at #104 look completely wrong to me. Previously I could get a cached 304 response (as recorded in the timer column of {accesslog} by statistics.module) in about 9 or 10 ms [on a well-managed shared server]. Now it is 350- 400 ms. (An uncached response is around 450 - 700ms.) This does not surprise me - a full bootstrap is patently an order of magnitude heavier than booting up to the old "DRUPAL_BOOTSTRAP_PATH" level.
What this means is that at the moment the page cache doesn't do a lot for sites running statistics.module. I suspect that #104 had the page cache turned off, so in other words you get a 1 - 2 ms increase in page generation time (about 0.2% - 0.4%) for non-cached responses that are already doing a full bootstrap.
Comment #122
gpk commentedBumping to critical...
What would be the best approach to resolving the problem with statistics.module? Given that it's too late to remove it entirely - maybe restore the DRUPAL_BOOTSTRAP_PATH phase and track which modules implement hook_url_in/outbound_alter() in the bootstrap column in {system} table?
Comment #123
dave reidThe problem is that the hook_url_alter() hooks run best with full bootstrap. Reducing these hooks to run on a reduced bootstrap is going to cause problems. The best solution at this point is people concerned about performance shouldn't be using statistics.module. I'm also working on a revamped 'analytics' module that could replace statistics in D8 that's more like a Google Analytics lite.
Comment #124
effulgentsia commentedPossibly related to the bootstrap issue: #619566: Don't invoke hook_url_outbound_alter() if there's nothing to invoke
Comment #125
effulgentsia commentedShameless plug for #535612: Fix Drupal 6 url() function to call custom_url_rewrite_outbound for external links too. It's been sitting there for a while, and perhaps someone on this list is interested and can help move it along.
Comment #126
webchickThat's not remotely an acceptable solution. I would prefer to roll this patch back than tell people this.
Comment #127
dave reidEither way statistics.module causes load on cached pages before this patch and after it. It's always going to have an effect. That's what I'm referring to. I'd like to get a solution to help make this work better. Let's come up with something.
Comment #128
moshe weitzman commented@webchick - thats always been my recommendation since drupal 1.0 or so. access logging in drupal is a really bad idea. you might think it is unacceptable, but it isn't new either.
Comment #129
catchYeah no-one should ever use statistics module on sites which care about performance. A cached page load should be around 5-10ms on head, a db_insert() is something like 7-15ms, so you just slowed your site down 50% for anonymous users. If statistics module does something else that's bad for performance, it's unlikely to make this already shocking situation significantly worse.
Comment #130
webchickOh. If we're talking normal "db writes on every page load is a really bad idea" kind of performance hit, then fine. Dave made it sound like this hook had introduced a horrific regression even worse than that.
Comment #131
effulgentsia commentedLooks to me like #121 is referring to the hunk in the patch that got committed (#103) that changed statistics_exit() to do a full bootstrap. So it's not just the cost of a db insert, it's the cost of a full bootstrap. @Dave: it's not obvious to me why this is needed. What does statistics_exit() do that requires a full bootstrap? What's the connection to url() and hook_url_outbound_alter()?
Comment #132
effulgentsia commentedOh, I see. it's not url() and hook_url_outbound_alter(), it's having $_GET['q'] properly set, with hook_url_inbound_alter() having had a chance to run. Crap.
Comment #133
gpk commented@130:
>Dave made it sound like this hook had introduced a horrific regression even worse than that.
Yes it has, if statistics.module is enabled.
Here are typical page timers I've seen for viewing a simple node or front page (as recorded in accesslog) with statistics.module enabled, on a shared server running an opcode cache. Vanilla core Drupal, only a few modules enabled, just a few nodes created.
- Prior to this patch making it into core: 9 - 10ms
- After this patch: 350 - 400ms
This is because of the full bootstrap in statistics_exit(). See #121.
For quiet or only moderately busy sites the performance delta can be an order of magnitude worse than this even since not all the relevant bits and pieces (code/cached code/related file & directory info) may be sitting in server RAM and may therefore have to be fetched from the HD. If there is other disk-intensive activity taking place then fetching all this stuff can take a while.
Comment #134
effulgentsia commentedSetting back to fixed as per #112. Regarding #113: #619566: Don't invoke hook_url_outbound_alter() if there's nothing to invoke
Regarding #133: #692044: A site with statistics module enabled is much slower in serving cached pages in D7 than in D6
Comment #135
catchOK that's considerably worse than the db_insert() and sounds unacceptable to me. Although a full bootstrap taking 350ms also seems completely off the wall.
Comment #136
catchresetting status.
Comment #137
gpk commented>a full bootstrap taking 350ms also seems completely off the wall
Seems to me roughly in line with what Dave and chx report above?