Comments

moshe weitzman’s picture

Nice work. And a test too. I hope to test the patch soon.

moshe weitzman’s picture

The code here looks good. Not sure how to test it, if at all.

john morahan’s picture

If you want to test this now, first apply the latest patch from #298600: DBTNG: make module_implements work regardless of bootstrap phase, then either make the change I suggested and then run the tests, or remove the hidden=TRUE from hook_url_rewrite.info and enable the module to test it manually.

sun’s picture

whoa, nice idea. Will try to test this if time permits.

starbow’s picture

subscribe

c960657’s picture

Status: Needs review » Needs work

Seems to work fine. I also tried modifying $options, e.g. $options['base_url'], and that also works.

The test only covers hook_rewrite_url_rewrite_inbound(), not hook_rewrite_url_rewrite_outbound().

I think the name and description texts from getInfo() are a bit too generic, considering that there are two other tests in the Path group. Also, the outbound rewriting doesn't happen in path.inc but in common.inc. I suggest something like “Path rewriting” and “Test hook_rewrite_url_rewrite_inbound and hook_rewrite_url_rewrite_outbound”.

I noticed that hook_rewrite_url_rewrite_inbound() is invoked several times per request. Every time drupal_is_front_page() is called, it calls drupal_get_normal_path() that invokes the hook. This may be optimized, though this is outside the scope of this issue.

john morahan’s picture

Thanks for the review; the details of the test were the part I was least sure of here.

Regarding how often these are invoked - this is no different to how custom_url_rewrite_{in,out}bound in settings.php are invoked already. If it can be reduced, great - but not here.

I'll post a new patch soon with a better test.

john morahan’s picture

Status: Needs work » Needs review
StatusFileSize
new4.4 KB

Updated the test. Also included the simpletest change in this version to make it easier to test. So now simply apply #298600 and this.

Naming the patch .txt to avoid automated testing: this should fail because t.d.o does not know to apply the other patch first. (I wonder how the first one passed?)

john morahan’s picture

In light of Gábor's comments on the dev list, benchmarks would also be welcomed.

Dave Cohen’s picture

Status: Needs review » Needs work

Haven't tested the patch, but looked it over. Question...

You use these two loops:

foreach (module_implements('url_rewrite_outbound') as $module) {
  // do something
}

foreach (module_implements('url_rewrite_inbound') as $module) {
  // do something else
}

Don't we need to reverse the order of one of these loops? I think the inbound and outbound operations need to be performed in opposite order, although I'm not sure which order should be reversed (or whether it matters which).

For example consider the original url is 'node/42', and module X rewrites it to be X/node/42. Then module Y rewrites it to be Y/X/node/42. If in the other direction we pass Y/X/node/42 to module X, it won't make sense of it. So in the other direction module Y must act first, then X.

cwgordon7’s picture

I agree that this needs a bit more thought. The current API does not really support multiple hook implementations very well; $result is modified by reference based on $path, making multiple implementations of the hook problematic. The parameters and return value (if any) of the hook(s) need to be reconsidered.

c960657’s picture

Don't we need to reverse the order of one of these loops?

I get your point - but wouldn't this be unusual compared to how other hooks work? Currently both "inbound" hooks (hooks that act on form submission) and "outbound" hooks (hooks that change how things are displayed) are invoked in the same order.

Dave Cohen’s picture

Sure it's unusual, but it's better to be unusual and useful, than normal and useless.

If you want to make things more "usual", then we'd make url creation more like form creation. Create a data structure with values similar to the parameters of url(), then call url_alter() or something like that to let modules change it. I think there are some valid reasons to take that sort of approach for outbound URLs.

john morahan’s picture

Status: Needs work » Needs review
StatusFileSize
new4.01 KB

Here's a version with just one of the hooks called in reverse order.

Status: Needs review » Needs work

The last submitted patch failed testing.

john morahan’s picture

john morahan’s picture

Status: Needs work » Needs review
dave reid’s picture

Marked #359965: Replace custom_url_rewrite_outbound() and language_url_rewrite() with new hook_url_alter() as a duplicate of this issue. I'll repost the benchmarking results I did with using drupal_alter() to replace custom_url_rewrite_outbound():

ab -c 5 -n 1000 http://mysql.drupalhead.local with 50 nodes, 5 comments per node, and the recent blog posts block enabled:

w/o patch w/o locale module enabled

Requests per second:    5.20 [#/sec] (mean)
Time per request:       961.254 [ms] (mean)
Time per request:       192.251 [ms] (mean, across all concurrent requests)
Transfer rate:          78.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   2.0      0      32
Processing:   539  959 123.4    949    2075
Waiting:      502  883 117.4    875    1916
Total:        539  959 123.5    949    2075



w/o patch w/ locale module enabled

Requests per second:    5.05 [#/sec] (mean)
Time per request:       991.042 [ms] (mean)
Time per request:       198.208 [ms] (mean, across all concurrent requests)
Transfer rate:          75.85 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   2.1      0      33
Processing:   570  989 159.4    967    2180
Waiting:      552  911 152.6    891    2063
Total:        570  989 159.4    967    2180



w/ patch w/o locale module enabled

Requests per second:    5.17 [#/sec] (mean)
Time per request:       967.278 [ms] (mean)
Time per request:       193.456 [ms] (mean, across all concurrent requests)
Transfer rate:          77.72 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.9      0      30
Processing:   525  966 120.7    954    2030
Waiting:      513  890 116.3    883    1899
Total:        525  966 120.8    954    2030



w/ patch w/ locale module enabled

Requests per second:    5.07 [#/sec] (mean)
Time per request:       986.513 [ms] (mean)
Time per request:       197.303 [ms] (mean, across all concurrent requests)
Transfer rate:          76.20 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.8      0      29
Processing:   426  985 122.0    974    2048
Waiting:      413  906 118.2    899    1908
Total:        426  985 122.0    974    2048

About a +/- 1 ms difference with the patch. Maximum benefit is with the patch and locale module enabled.

moshe weitzman’s picture

Status: Needs review » Needs work

Well done. My only beef is with the test. testHookUrlRewrite does not need to do http requests to assert that url() is working. just call url("user/12") and assert you get back member/12. http requests are the slowest possible way to test something. they are often useful, but not here IMO.

dave reid’s picture

StatusFileSize
new6.84 KB

Here's the progress I made so far for converting both into a hook_url_alter_inbound/outbound. I will need to incorporate some of the tests from this issue and write some simpler ones myself. I don't agree with reversing the module_implements. If modules have weights, those weights should always be respected.

Dave Cohen’s picture

I will consider this "code needs work" until there is a test involving at least two modules which both implement the inbound and outbound hooks.

...or the order of one of the loops is reversed.

cwgordon7’s picture

If modules have weights, those weights should always be respected.

I agree with Dave Cohen: we should respect the weights, yes, but in this case, respecting the weights means reversing the order of the one of the loops. Consider modifying an outgoing link node/4:

Module1: node/4 -> article/4
Module2: article/4 -> blog/article/4

Module2 in this case has the lower weight, which is being respected. On the incoming url blog/article/4, however, that means the module with the lower weight needs to be considered first to undo what it did:

Module2: blog/article/4 -> article/4
Module1: article/4 -> node/4

Consider what would have happened if it had been the other way around, assuming Module1 is using the regular expression /^article\//:

Module1: blog/article/4 -> blog/article/4
Module2: blog/article/4 -> article/4

The problem would only get worse with more modules implementing the hook. Each module needs to be presented with a certain url for outcoming links and change it to exactly one thing, which is all they should need to worry about changing back.

Hope all that makes sense.

dave reid’s picture

@cwgordon7: Ok that makes sense.
@DaveCohen: That's why I left the issue as "code needs work" :)

arhak’s picture

subscribing

wim leers’s picture

Subscribing.

moshe weitzman’s picture

Would be great to revive this and push it in.

dave reid’s picture

Assigned: Unassigned » dave reid

Marking this for my own little patch sprint this weekend.

wim leers’s picture

I agree with cwgordon7: multiple modules implementing this hook will result in a nightmare. I wonder if it wouldn't be better to stick with the "special function" concept instead of a hook to keep our sanity?

It probably can be made to work with multiple hook implementations, but it will always remain tricky or at the very least, rather difficult to grok.

cwgordon7’s picture

Status: Needs work » Needs review
StatusFileSize
new16.61 KB

Here's an updated patch that has what I believe is the correct implementation. The test case it comes with quite clearly describes the expected behavior and makes the hook easier to understand. The hook is also well documented in system.api.php.

arhak’s picture

also check out url_alter

smk-ka’s picture

I have not tested the patch, but I've noticed this: language_url_rewrite() takes the $options array by reference, and was responsible for initializing $options['language'] if it was unset. The language is then used in a subsequent call to drupal_get_path_alias(). However, that happens BEFORE hook_url_alter_outbound() (including the successor to language_url_rewrite()) is called, so I expect the new code to initialize the language too late. Or, in other words: there seemed to be a reason why language_url_rewrite() was called earlier than custom_url_rewrite().

Status: Needs review » Needs work

The last submitted patch failed testing.

andypost’s picture

Status: Needs work » Needs review

Bot was broken

Status: Needs review » Needs work

The last submitted patch failed testing.

andypost’s picture

Status: Needs work » Needs review

restart bot

Status: Needs review » Needs work

The last submitted patch failed testing.

dries’s picture

I support this, and I think we should proceed with the hook-approach.

I agree with Wim and cwgordon7 that the problem gets complicated when many modules start rewriting URLs. These cases should be rare -- modules shouldn't rewrite URLs because they can. In some cases, having multiple modules rewrite hooks is a valid use case. So, yes, it can get messy but it is not our task to babysit.

dries’s picture

By the way, do we really have to have modules like hook_url_alter_1.module and hook_url_alter_2.module? Those are pretty terrible. We should try to find more elegant ways to these those things, IMO.

sun’s picture

Exactly. I didn't look at its code, nor did I use it in any way (yet) - but the description of http://drupal.org/project/purl sounds like a more stable, reliable, and elegant approach to solve this in a scalable way. Language negotiation in core could be treated as first consumer already. Most of the other consumers are modules that similarly try to provide and determine context. The exception to this are modules that really want to advance on URL aliases only (such as http://drupal.org/project/subpath_alias).

drecute’s picture

Please, I am new here, but has anyone test this patch together with a site that runs both domain access and url alter modules at the same time.

Pls let me know thanks.

dave reid’s picture

Issue tags: +custom_url_rewrite

Tagging and reminding myself to work on this sooner rather than later so we can get this into D7 before code freeze.

aaron’s picture

subscribe

moshe weitzman’s picture

@Dave Red - any chance you can revive this patch and the role API one? I think the blocker on role API has been cleared.

smoothify’s picture

I have been testing out the url_alter module (which uses very similar code to this patch) and have run into an issue with how the inbound hook works.

Currently if a module makes a change to a path and sets the result, the following module isn't aware of that change, so it can only act upon the original unmodified path. If the second module changes the path, then the first modules change is completely lost. Sometimes this may work well, but in cases where you want to act upon the changes of another module this causes problems.

For my use case the following adjustment works:

Original:

foreach (array_reverse(module_implements('url_alter_inbound')) as $module) {
  $function = $module . '_url_alter_inbound';
  $function($result, $path, $path_language);
}

Changed:

foreach (array_reverse(module_implements('url_alter_inbound')) as $module) {
  $function = $module . '_url_alter_inbound';
  $function($result, $path, $path_language);
  $path = !empty($result) ? $result : $path;
}

This carries the $result forward to the next module, which in my use case (using subdomain) solved the problem.

arhak’s picture

you mean chaining the url rewrites, right?
that makes a lot of sense
would be helpful the other way around? don't think so

smoothify’s picture

you mean chaining the url rewrites, right?
that makes a lot of sense
would be helpful the other way around? don't think so

yes thats right.

For my situation I didn't need to change the outbound links too, but it would probably make sense to keep it consistent.

Chaining both inbound & outbound would allow for the situation mentioned in #23 to work as expected.

One other thought I had on the matter is it might be worth keeping track of the original unaltered path somewhere as well as the chained one.

dave reid’s picture

Intending to get back and get this finished today if possible. Would this be considered for the code slush phase since we're changing the APIs of an existing feature?

cwgordon7’s picture

I don't think this can happen anymore without significant performance implications now that the registry is out.

dave reid’s picture

I would like to do actual performance tests to back that up on a normal site with and without the locale module enabled. We've shown that there are several use cases for this feature already with Url alter. Asking users to add a custom function to their settings.php in order to make a module work is just sad.

wim leers’s picture

@cwgordon7: hook_file_url_alter() is also a hook and was recommended/requested by moshe et al. For consistency, it would make sense to also use this approach for these functions. I *think* Dries/webchick would let this one in, because it affects so few people.

I'm too tired to think about the performance consequences at the moment.

mattyoung’s picture

.

dave reid’s picture

Issue tags: +API clean-up

Alrighty, I'm going to focus on re-rolling this because this custom_url_rewrite is a broken API that needs to be fixed in D7.

dave reid’s picture

Category: feature » task
dave reid’s picture

Component: base system » path.module

Moving path patches to path.module so they're easier to track.

dave reid’s picture

So crazy thought, this could and *should* also repalce hook_term_path as well.

dave reid’s picture

StatusFileSize
new18.63 KB

Initial stab at replacing hook_term_path and hand-tested it with forum module and it worked pretty darn good. Finally figured out that I needed to add url_alter_inbound to the list in bootstrap_hooks() in order to get it to fire since drupal_get_normal_path is called before all modules are loaded *duh!*. I didn't run -N to include the tests because I know it doesn't work yet and there are some things we need to decide how exactly this should work.

Currenty we run the url_outbound_alter() before we lookup path aliases in url(). So that means for any forums that want to have path aliases, they need to create the path alias on 'forum/tid' instead of 'taxonomy/term/tid'. That doesn't seem ideal, but maybe it makes sense.

moshe weitzman’s picture

I think it is fine to expect aliases to happen on the path that admins are seeing (forum/n). They have no clue that this path was taxonomy/term/n for a brief moment. Perhaps we should write an update function for these aliases?

dave reid’s picture

The problem that I'm coming into is how we should handle these two scenarios when forum.module is enabled and rewriting forum paths using url():

Case 1: We have an alias for source path 'forum/3' -> 'f3'.

url: taxonomy/term/3
forum_url_alter_outbound: forum/3
drupal_get_path_alias: f3
result of url: f3

drupal_get_normal_path: f3
drupal_lookup_path: forum/3
forum_url_inbound_alter: taxonomy/term/3
reusult of drupal_get_normal_path: taxonomy/term/3

----------------------

Case 2: We have an alias for source path 'taxonomy/term/3' -> 'f3'.

url: taxonomy/term/3
forum_url_outbound_alter: forum/3
drupal_get_path_alias: forum/3
result of url(): forum/3

drupal_get_normal_path: f3
drupal_lookup_path: taxonomy/term/3
forum_url_inbound_alter: taxonomy/term/3
reusult of drupal_get_normal_path: taxonomy/term/3

---------------------

So one way around this is to make sure when url aliases are added is to double check that they are run through drupal_get_normal_path(). So the case 1 the source url for the alias would be 'taxonomy/term/3' instead of 'forum/3'.

But that adds another situation for looking up the path alias when rewriting in url() like in case 2. Becuase we looking the alias before we rewrite paths, we should probably lookup the path alias based on the 'original' path given to url().

dave reid’s picture

Status: Needs work » Needs review
StatusFileSize
new23.4 KB

Re-rolled for head.

Status: Needs review » Needs work

The last submitted patch failed testing.

dave reid’s picture

Status: Needs work » Needs review
StatusFileSize
new22.37 KB

Testing to see if I went crazy and this was all caused by a module_list(TRUE, TRUE), which is WTF because the same thing is used to call hook_boot() before hook_url_inbound_alter().

dww’s picture

In IRC, Dave and I determined the source of the badness is that calling module_list(TRUE, TRUE) to refresh the module list from inside a reduced bootstrap level (we're still just at DRUPAL_BOOTSTRAP_PATH when drupal_path_initialize() runs) results in sheer badness. entity_get_info() is confused, we have an empty list of modules (or something), and we end up caching an empty array in {cache}.cid == 'entity_info', resulting in a completely broken site until you manually clear that cache entry. :(

So:

A) WTF, module_list()? Why can't you be refreshed twice during the bootstrap process?

B) WTF, entity_get_info()? Why are you willing to cache an empty array into the DB? ;)

For now, this patch is going to work-around it by calling module_list(FALSE, TRUE) to only get bootstrap modules, but *not* refresh the list.

dww’s picture

BTW, forgot to mention: big +1 on the direction of this patch. ;) Making these things hooks will clean up a bunch of stupid in core and remove a lot of WTF. I guess it's implicit that I support this patch if I spent time to help debug it, but I wanted to make it explicit in case that helps anyone decide to commit. ;)

Status: Needs review » Needs work

The last submitted patch failed testing.

dave reid’s picture

Status: Needs work » Needs review

If we're going to do this, we should really should run this during a full bootstrap. And looking into _drupal_bootstrap, I don't see any reason why we *shouldn't* just load all modules first, then run drupal_path_initialize(). It's not like we're going to somehow stop execution in DRUPAL_BOOTSTRAP_PATH. It's always going to continue on to DRUPAL_BOOTSTRAP_FULL.

dave reid’s picture

StatusFileSize
new27.41 KB

Revised patch moving drupal_path_initialize to DRUPAL_BOOTSTRAP_FULL and including the missing path alter test modules.

Status: Needs review » Needs work

The last submitted patch failed testing.

dave reid’s picture

Status: Needs work » Needs review
StatusFileSize
new30.14 KB

Ok this one is almost ready. Still trying to figure out what's going wrong with the wonky forum tests...

dave reid’s picture

Issue tags: +D7 API clean-up

Adding official D7 api cleanup tag

Status: Needs review » Needs work

The last submitted patch failed testing.

dave reid’s picture

Status: Needs work » Needs review
StatusFileSize
new34.08 KB

Forum.module is very broken. Very upset and frustrated.

Status: Needs review » Needs work

The last submitted patch failed testing.

dave reid’s picture

Status: Needs work » Needs review
StatusFileSize
new36.36 KB

Think I've got it finally. Should not have had forum_url_inbound_alter to turn forum/x into taxonomy/term/x because they needed to use the 'forum/x' path for display.

dave reid’s picture

StatusFileSize
new34.55 KB

Try again without contrib patch noise.

dave reid’s picture

StatusFileSize
new31.12 KB

Un-did changes to forum paths.

Status: Needs review » Needs work

The last submitted patch failed testing.

dave reid’s picture

Status: Needs work » Needs review

Now with 100% random 1 failure from the bot. Also 100% less whitespace changes from my IDE. Now working on profiling.

dave reid’s picture

StatusFileSize
new27.12 KB
dave reid’s picture

StatusFileSize
new26.68 KB

Removed the crappy, crappy node title rewriting examples, because that was just silly. Node titles are not unique. Now onto profiling.

dave reid’s picture

Basic benchmarks since devel_generate is massively broken with all the recent patches:

Before without custom_url_rewrite_outbound() in settings.php:

Concurrency Level:      1
Time taken for tests:   177.262 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      6908500 bytes
HTML transferred:       6664500 bytes
Requests per second:    2.82 [#/sec] (mean)
Time per request:       354.523 [ms] (mean)
Time per request:       354.523 [ms] (mean, across all concurrent requests)
Transfer rate:          38.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   285  354  55.6    349     484
Waiting:      269  336  52.9    331     467
Total:        285  354  55.6    349     484

Percentage of the requests served within a certain time (ms)
  50%    349
  66%    389
  75%    412
  80%    419
  90%    431
  95%    435
  98%    439
  99%    442
 100%    484 (longest request)

----------------------------

Before with custom_url_rewrite_outbound in settings.php:

function custom_url_rewrite_outbound(&$path, $options, $original_path) {
  if ($path == 'blah') {
    $path = 'foobar';
  }
}
Concurrency Level:      1
Time taken for tests:   180.408 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      6908500 bytes
HTML transferred:       6664500 bytes
Requests per second:    2.77 [#/sec] (mean)
Time per request:       360.815 [ms] (mean)
Time per request:       360.815 [ms] (mean, across all concurrent requests)
Transfer rate:          37.40 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   285  361  51.7    368     455
Waiting:      270  342  49.2    349     432
Total:        285  361  51.7    368     455

Percentage of the requests served within a certain time (ms)
  50%    368
  66%    391
  75%    407
  80%    420
  90%    428
  95%    436
  98%    441
  99%    447
 100%    455 (longest request)

----------------------------

POST_PATCH (forum_url_alter_outbound() is implemented):

Concurrency Level:      1
Time taken for tests:   178.556 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      6908500 bytes
HTML transferred:       6664500 bytes
Requests per second:    2.80 [#/sec] (mean)
Time per request:       357.113 [ms] (mean)
Time per request:       357.113 [ms] (mean, across all concurrent requests)
Transfer rate:          37.78 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   286  357  54.0    353     475
Waiting:      270  338  51.5    333     443
Total:        286  357  54.0    353     475

Percentage of the requests served within a certain time (ms)
  50%    353
  66%    389
  75%    408
  80%    421
  90%    433
  95%    436
  98%    441
  99%    449
 100%    475 (longest request)
dww’s picture

StatusFileSize
new28.18 KB

Generally, I agree that initializing the path can wait until the full bootstrap. However, it turns out that the new authorize.php needs $_GET['q'] initialized (so that BatchAPI works) but definitely does not want a full bootstrap, or it becomes really nasty to try to upgrade existing modules while they're being loaded. :(

Luckily, we don't actually care about a true Drupal Path(tm) inside authorize.php. The only times $_GET['q'] matter are when it's already set. It just turns out there are a lot of places in core that assume it's always set. So, we can just initialize it ourselves inside authorize.php instead of calling drupal_initialize_path(). This is actually an improvement for authorize.php, regardless of this patch (although we need the change if this patch is committed, so I'm just including it here).

We could potentially reconsider if we want to always assume $_GET['q'] is defined in our code base, but that seems like a totally separate issue. Basically, that'd just be a series of PHP notice bug fixes in a followup issue, if we care.

dww’s picture

Note to webchick:

cvs add modules/simpletest/tests/url_alter_test.info
cvs add modules/simpletest/tests/url_alter_test.install
cvs add modules/simpletest/tests/url_alter_test.module

Status: Needs review » Needs work

The last submitted patch failed testing.

dww’s picture

Status: Needs work » Needs review
StatusFileSize
new28.9 KB

Sorry, forgot hunks from statistics.module.

dww’s picture

StatusFileSize
new28.9 KB

Oh, and Dave pointed out !isset() would probably be more appropriate in authorize.php than empty() for this...

dave reid’s picture

Status: Needs review » Reviewed & tested by the community

I think between dww, moshe and myself, we have pretty much got all of this covered.

Today's patch is brought to you by the phrase "Yay for removing hacks!"

Executive summary:

  1. Adds a hook_url_outbound_alter() that replaces custom_url_rewrite_outbound() and allows other modules to use this functionality. Yay for removing hacks!
  2. Converts language_url_rewrite() to locale.module as locale_url_outbound_alter(). Yay for proper hook implementations.
  3. Adds a hook_url_inbound_alter() that replaces custom_url_rewrite_inbound(). Yay for removing hacks! Because more than one module may be altering paths, we want to make sure they always run in opposite order of hook_url_outbound_alter. For example, if the outbound path 'foo' is altered to 'bar' and then to 'ferzle', the inbound path 'ferzle' should be altered to 'bar' first, then back to 'foo.
  4. Removes the DRUPAL_BOOTSTRAP_PATH phase because we want to invoke drupal_alter during drupal_initialize_path(). Since we are just a short, short step from DRUPAL_BOOTSTRAP_FULL and loading all modules, the path phase has been merged into DRUPAL_BOOTSTRAP_FULL.
  5. {url_alias}.source paths will always be run through durpal_get_normal_path(), therefore run through hook_url_inbound_alter to get the absolute 'normal' path. We also make sure that if there is an url alias for a path used with url(), the alias will always override any path altering.
  6. Removes taxonomy_term_path() since modules can just implement hook_url_outbound_alter() now. Yay for removing hacks! See the implementation of forum_url_outbound_alter():
    function forum_url_outbound_alter(&$path, &$options) {
      if (preg_match('!^taxonomy/term/(\d+)!', $path, $matches)) {
        $term = taxonomy_term_load($matches[1]);
        if ($term && $term->vocabulary_machine_name == 'forums') {
          $path = 'forum/' . $matches[1];
        }
      }
    }
    
  7. As moshe points out, comes with extensive tests. I also manually tested the install process, update.php, authorize.php, and lots of other random click-throughs.

And reminder again for committing this patch that the following CVS commands must be run after patching, but before committing:

cvs add modules/simpletest/tests/url_alter_test.info
cvs add modules/simpletest/tests/url_alter_test.install
cvs add modules/simpletest/tests/url_alter_test.module
moshe weitzman’s picture

7. Comes with a boatload of tests. Nice work, Dave ... Pleas wait for green before commit.

catch’s picture

I'd like to know the number of calls to url() on the page which was benchmarked - this number can vary wildly depending on what sort of page is viewed. ideally in the form of webgrind screenshots and/or microbenchmarks similar to those done for #523284: Optimize url() so we can see the difference internal to url() and drupal_lookup_path().

dave reid’s picture

Ok I used a node page with lots of comments. Kcachegrind shows url() being called 207 times. I can attach the screenshot if needed. Run with ab -c 1 -n 500 three times for each condition and took the middle run:

BEFORE PATCH (without custom_url_rewrite_outbound() in settings.php):

Requests per second:    2.91 [#/sec] (mean)
Time per request:       343.446 [ms] (mean)
Time per request:       343.446 [ms] (mean, across all concurrent requests)
Transfer rate:          106.78 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   326  343  15.5    339     420
Waiting:      312  329  14.9    324     406
Total:        326  343  15.5    339     420

BEFORE PATCH (wtih custom_url_rewrite_outbound() in settings.php):

Requests per second:    2.89 [#/sec] (mean)
Time per request:       345.625 [ms] (mean)
Time per request:       345.625 [ms] (mean, across all concurrent requests)
Transfer rate:          106.11 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   328  346  15.1    342     426
Waiting:      314  331  14.6    328     407
Total:        328  346  15.1    342     426

AFTER PATCH:

Requests per second:    2.90 [#/sec] (mean)
Time per request:       345.387 [ms] (mean)
Time per request:       345.387 [ms] (mean, across all concurrent requests)
Transfer rate:          106.18 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   329  345  16.6    340     456
Waiting:      315  331  15.8    326     441
Total:        329  345  16.6    340     456
catch’s picture

OK that looks pretty good, nothing disastrous anyway.

dave reid’s picture

StatusFileSize
new28.9 KB

Fixed a very minor spelling error in the test assertion. No need to reset status.

dave reid’s picture

Issue tags: -Needs tests

Removing 'needs tests' tag.

sun’s picture

Issue tags: +Needs tests

This really is a nice patch.

I'm not sure in which horrible hook-weighting disaster we'll run into with this, but I guess there's no other way to just find out. Especially, because it makes the installation of a series of contributed modules a lot easier.

So, +1 for getting this last-minute "clean-up" in. Doesn't really break anything, especially not in core - but rather improves the situation for contributed modules. Aside from that, it was already RTBC around 10/15.

sun’s picture

Issue tags: -Needs tests, -hooks, -url, -path

oops, cross-posted.

wim leers’s picture

This *must* get in if we want to provide a decent developer experience, i.e. if we want to provide a consistent API. The API to alter file URLs (hook_file_url_alter()) specifically uses a hook instead of a "special/magical" function because this was going in as well, because URL rewriting was going to use hooks as well.

Performance impact is extremely minor, if any. This patch has had extensive reviews and solid tests. So I definitely agree that it's time to commit this one. Excellent work, Dave Reid et al. :)

andypost’s picture

+1 to RTBC
This hooks could be good as solution for #308263: Allow privileged users to bypass the validation of menu items
to make outbound URL protected with some #anchor for example and inbound to check #anchor for existence then pass to original menu item - ability to protect URL like #606608: Use proper menu router paths for comment/* from csrf and leave them without confirmation

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

dave reid’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new27.89 KB

Re-rolled only.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

dave reid’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new27.91 KB

Arg. Reference parameter introduced in path_save(). Tested and passes tests locally.

sun’s picture

+++ includes/common.inc	20 Oct 2009 17:31:24 -0000
@@ -2387,10 +2388,12 @@ function url($path = NULL, array $option
+  // Preserve the original path before altering or aliasing.
+  $original_path = $path;
+
+  // Allow other modules to alter the outbound URL and options.
+  drupal_alter('url_outbound', $path, $options);
+++ modules/system/system.api.php	20 Oct 2009 17:31:26 -0000
@@ -2652,5 +2652,59 @@ function hook_page_delivery_callback_alt
+ * @param $original_path
+ *   The original path, before being checked for path aliases or altered by the
+ *   modules.
...
+function hook_url_inbound_alter(&$path, $original_path, $path_language) {
...
+function hook_url_outbound_alter(&$path, &$options) {

I'd like to see a third $original_path argument for hook_url_outbound_alter() as well.

This review is powered by Dreditor.

dave reid’s picture

Status: Reviewed & tested by the community » Needs review
StatusFileSize
new28.15 KB

Revised patch addressing sun's concerns.

dave reid’s picture

webchick wanted some benchmarks, so I enabled forum and statistics, put 15 comments on a forum topic, enabled the 'administer comments' permission for anonymous user (lots of links on the page), and added French language, so that all paths had to be re-written with the /fr/ prefix.

ab -c 1 -n 500 http://mysql.drupal7dev.local/fr/node/1

Before patch:

Requests per second:    2.08 [#/sec] (mean)
Time per request:       480.172 [ms] (mean)
Time per request:       480.172 [ms] (mean, across all concurrent requests)
Transfer rate:          64.46 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   459  480  16.8    476     626
Waiting:      442  463  16.6    459     609
Total:        459  480  16.8    476     626

Requests per second:    2.08 [#/sec] (mean)
Time per request:       481.345 [ms] (mean)
Time per request:       481.345 [ms] (mean, across all concurrent requests)
Transfer rate:          64.31 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   457  481  20.9    476     700
Waiting:      440  464  20.5    459     678
Total:        457  481  20.9    476     700

After patch:

Requests per second:    2.08 [#/sec] (mean)
Time per request:       481.528 [ms] (mean)
Time per request:       481.528 [ms] (mean, across all concurrent requests)
Transfer rate:          64.29 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   462  481  16.8    479     686
Waiting:      446  465  16.2    462     647
Total:        462  481  16.8    479     686

Requests per second:    2.07 [#/sec] (mean)
Time per request:       482.825 [ms] (mean)
Time per request:       482.825 [ms] (mean, across all concurrent requests)
Transfer rate:          64.11 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   464  483  14.7    479     561
Waiting:      448  466  14.5    462     545
Total:        464  483  14.7    479     561

About a +1-2 ms difference with the patch. At this point I know it's a teeny-tiny increase but at least we're doing things the right way now.

webchick’s picture

Status: Needs review » Needs work
Issue tags: -D7 API clean-up +Needs documentation

Ok, cool. My main concern with this patch was removing DRUPAL_BOOTSTRAP_PATH in favour of a full bootstrap. It looks like in the worst-case scenario where you are hitting locale, forum, and statistics-related changes in this patch, you're getting a negligible slow-down.

I also had some questions about that weird $_GET['q'] stuff in authorize.php. But basically, it's only required because authenticate.php is trying to do the least amount of bootstrapping possible. So the only other places that might need to use this are like file_layer_lite.php or something.

I also wondered if we'd hit any problems now that the language URL re-writing is part of locale.module (an optional core module) rather than includes/language.inc. I guess worst-case, a module implementing locale-like features will just need to copy/paste this code.

Otherwise, I agree this is a really solid clean-up, reviewed by lots of people big into both performance and i18n. Although I weep at no longer being one of like 30 people who know about custom_url_rewrite functions (hee hee).

And technically, this did make code freeze, albeit at the last possible second. :P~ Since Dries said he wanted to see this earlier, I feel comfortable committing this, although it is an API change, without checking with him first. (Dries, if that was the wrong call, sorry!)

Therefore, committed to HEAD. Please ensure this API addition (and the removal of hook_term_path()/language_rewrite_url()) is documented in the upgrade guide.

dave reid’s picture

Status: Needs work » Needs review
StatusFileSize
new2.62 KB

Working on the update handbook now, but D'OH! Noticed I forgot to remove custom_url_rewrite_crap from system.api.php. On the bright side, didn't even need to undocument hook_term_path since it wasn't even documented to begin with. Hah!

webchick’s picture

Status: Needs review » Needs work

D'oh! Good catch. Committed that to HEAD too.

Back to needs work for docs.

dave reid’s picture

Status: Needs work » Fixed
chx’s picture

Status: Fixed » Active

This must be rolled back. The original solution worked just fine , we (NowPublic) have a module where custom_url_rewrite sits and it works (i presume it needs to be a bootstrap module) and doing drupal_alter just for the sake of it IS slower than a direct function call and the benchmarks are wrong because a) they were not made on a heavily linked page , node/1 on a def install is a joke b) there was no implementation. It's highly arguable that you want more than one implementation, this is a site-specific performance optiomization, having a hook is pointless. I will make a real page with 300 links and then add an implementation and then we shall see.

chx’s picture

function linktest_node_view() {
  for ($i = 0; $i < 500; $i++) l($i, "node/$i");
}
function linktest_node_boot() {}
function linktest_url_outbound_alter() {}
function custom_url_rewrite_outbound() {}
Rolled back aka before patch:
<code>
This is ApacheBench, Version 2.3 <$Revision: 655654 $>              
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/      

Benchmarking localhost (be patient).....done


Server Software:        Apache/2.2.12
Server Hostname:        localhost    
Server Port:            80           

Document Path:          /drupal/node/3
Document Length:        6173 bytes

Concurrency Level:      1
Time taken for tests:   17.414 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      664200 bytes
HTML transferred:       617300 bytes
Requests per second:    5.74 [#/sec] (mean)
Time per request:       174.136 [ms] (mean)
Time per request:       174.136 [ms] (mean, across all concurrent requests)
Transfer rate:          37.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   161  174  16.0    170     293
Waiting:      161  174  16.0    170     292
Total:        161  174  16.0    170     293

Percentage of the requests served within a certain time (ms)
  50%    170
  66%    176
  75%    179
  80%    181
  90%    188
  95%    203
  98%    205
  99%    293
 100%    293 (longest request)

Current HEAD:

This is ApacheBench, Version 2.3 <$Revision: 655654 $>                                         
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/                       
Licensed to The Apache Software Foundation, http://www.apache.org/                             

Benchmarking localhost (be patient).....done


Server Software:        Apache/2.2.12
Server Hostname:        localhost    
Server Port:            80           

Document Path:          /drupal/node/3
Document Length:        6173 bytes    

Concurrency Level:      1
Time taken for tests:   18.902 seconds
Complete requests:      100           
Failed requests:        0             
Write errors:           0             
Total transferred:      664200 bytes  
HTML transferred:       617300 bytes  
Requests per second:    5.29 [#/sec] (mean)
Time per request:       189.024 [ms] (mean)
Time per request:       189.024 [ms] (mean, across all concurrent requests)
Transfer rate:          34.31 [Kbytes/sec] received                        

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       3
Processing:   172  189  12.2    187     240
Waiting:      172  189  12.2    186     240
Total:        172  189  12.1    187     240

Percentage of the requests served within a certain time (ms)
  50%    187                                                
  66%    192                                                
  75%    195                                                
  80%    197                                                
  90%    203                                                
  95%    219                                                
  98%    222                                                
  99%    240                                                
 100%    240 (longest request)    
chx’s picture

Sry for function linktest_node_boot() {} i made the module bootstrap by hand anywyas -- not that it matters for outbound.

chx’s picture

Status: Active » Fixed

Oh well. I bow before the developer experience and webchick says most links will be cached anyways. At worst, I will run a hacked Drupal 7 -- that was always case, now won't be any other way.

webchick’s picture

Status: Fixed » Active

Er. The question I asked on IRC was whether the benchmarks you ran painted a realistic picture of a Drupal site, since calls to l() within hook_node_view would become part of the content array which afaik is cached. If this is still a realistic scenario regardless of that fact, then fine. I certainly don't want to slow Drupal 7 down any more than it actually is.

dave reid’s picture

Have my doubts that it was a realistic test, but I do understand the concern. Also see #523284: Optimize url().

dave reid’s picture

And to properly compare old vs new, make sure we are using worst-case scenario. Locale.module enabled and altering language prefixes (a non-default language path), and also using taxonomy_term_path() with forum.module.

dave reid’s picture

Also consider there *are* other modules out there that want/need to alter outgoing or incoming paths and using custom_url_rewrite inside a module is a complete WTF. See the list at http://drupal.org/project/url_alter

chx’s picture

Yes, there is that -- the test did not include the existing pieces indeed. There is more to be done here but now is not the time for me. I laid the groundwork, I let others continue.

effulgentsia’s picture

@chx and others concerned about performance of url(): please weigh in on #619566: Don't invoke hook_url_outbound_alter() if there's nothing to invoke (comment 15 on down).

Status: Active » Needs review

linno requested that failed test be re-tested.

dave reid’s picture

Status: Needs review » Active

Please don't do that. I hate this retest functionality in PIFR2.

gpk’s picture

Issue tags: +Performance

The benchmarks at #104 look completely wrong to me. Previously I could get a cached 304 response (as recorded in the timer column of {accesslog} by statistics.module) in about 9 or 10 ms [on a well-managed shared server]. Now it is 350- 400 ms. (An uncached response is around 450 - 700ms.) This does not surprise me - a full bootstrap is patently an order of magnitude heavier than booting up to the old "DRUPAL_BOOTSTRAP_PATH" level.

What this means is that at the moment the page cache doesn't do a lot for sites running statistics.module. I suspect that #104 had the page cache turned off, so in other words you get a 1 - 2 ms increase in page generation time (about 0.2% - 0.4%) for non-cached responses that are already doing a full bootstrap.

gpk’s picture

Priority: Normal » Critical

Bumping to critical...

What would be the best approach to resolving the problem with statistics.module? Given that it's too late to remove it entirely - maybe restore the DRUPAL_BOOTSTRAP_PATH phase and track which modules implement hook_url_in/outbound_alter() in the bootstrap column in {system} table?

dave reid’s picture

The problem is that the hook_url_alter() hooks run best with full bootstrap. Reducing these hooks to run on a reduced bootstrap is going to cause problems. The best solution at this point is people concerned about performance shouldn't be using statistics.module. I'm also working on a revamped 'analytics' module that could replace statistics in D8 that's more like a Google Analytics lite.

effulgentsia’s picture

effulgentsia’s picture

Shameless plug for #535612: Fix Drupal 6 url() function to call custom_url_rewrite_outbound for external links too. It's been sitting there for a while, and perhaps someone on this list is interested and can help move it along.

webchick’s picture

The best solution at this point is people concerned about performance shouldn't be using statistics.module.

That's not remotely an acceptable solution. I would prefer to roll this patch back than tell people this.

dave reid’s picture

Either way statistics.module causes load on cached pages before this patch and after it. It's always going to have an effect. That's what I'm referring to. I'd like to get a solution to help make this work better. Let's come up with something.

moshe weitzman’s picture

@webchick - thats always been my recommendation since drupal 1.0 or so. access logging in drupal is a really bad idea. you might think it is unacceptable, but it isn't new either.

catch’s picture

Yeah no-one should ever use statistics module on sites which care about performance. A cached page load should be around 5-10ms on head, a db_insert() is something like 7-15ms, so you just slowed your site down 50% for anonymous users. If statistics module does something else that's bad for performance, it's unlikely to make this already shocking situation significantly worse.

webchick’s picture

Oh. If we're talking normal "db writes on every page load is a really bad idea" kind of performance hit, then fine. Dave made it sound like this hook had introduced a horrific regression even worse than that.

effulgentsia’s picture

Looks to me like #121 is referring to the hunk in the patch that got committed (#103) that changed statistics_exit() to do a full bootstrap. So it's not just the cost of a db insert, it's the cost of a full bootstrap. @Dave: it's not obvious to me why this is needed. What does statistics_exit() do that requires a full bootstrap? What's the connection to url() and hook_url_outbound_alter()?

effulgentsia’s picture

Oh, I see. it's not url() and hook_url_outbound_alter(), it's having $_GET['q'] properly set, with hook_url_inbound_alter() having had a chance to run. Crap.

gpk’s picture

@130:
>Dave made it sound like this hook had introduced a horrific regression even worse than that.
Yes it has, if statistics.module is enabled.

Here are typical page timers I've seen for viewing a simple node or front page (as recorded in accesslog) with statistics.module enabled, on a shared server running an opcode cache. Vanilla core Drupal, only a few modules enabled, just a few nodes created.
- Prior to this patch making it into core: 9 - 10ms
- After this patch: 350 - 400ms

This is because of the full bootstrap in statistics_exit(). See #121.

For quiet or only moderately busy sites the performance delta can be an order of magnitude worse than this even since not all the relevant bits and pieces (code/cached code/related file & directory info) may be sitting in server RAM and may therefore have to be fetched from the HD. If there is other disk-intensive activity taking place then fetching all this stuff can take a while.

effulgentsia’s picture

Priority: Critical » Normal
Status: Active » Fixed
Issue tags: -Performance
catch’s picture

Category: task » bug
Priority: Normal » Critical
Status: Fixed » Active
Issue tags: +Performance

OK that's considerably worse than the db_insert() and sounds unacceptable to me. Although a full bootstrap taking 350ms also seems completely off the wall.

catch’s picture

Category: bug » task
Priority: Critical » Normal
Status: Active » Fixed

resetting status.

gpk’s picture

>a full bootstrap taking 350ms also seems completely off the wall
Seems to me roughly in line with what Dave and chx report above?

Status: Fixed » Closed (fixed)
Issue tags: -Performance, -DX (Developer Experience), -Needs documentation, -custom_url_rewrite, -API clean-up

Automatically closed -- issue fixed for 2 weeks with no activity.