Immediately after emptying the menu cache, the drupal_lookup_path() function fails to find any path aliases. Steps to reproduce:

  1. Make sure that the menu and the path modules are enabled.
  2. Create a new node (e.g. a static page). Create a menu item for the node on-the-fly (e.g. in 'primary links'), and create a path alias for it on-the-fly.
  3. Submit the node.
  4. You will be taken to the page for the new node. However, the path you will be taken to is the internal path (i.e. node/x, NOT the aliased path. If there are any links to aliased pages (e.g. sidebar menu items) on the page, you should notice that they are also showing their internal paths, not their aliased paths.
  5. Navigate somewhere else (e.g. to the front page of your site). You should notice that all links to aliased pages on the page are now showing their aliased paths, not their internal paths.

This only happens when you create a menu item and a path alias on-the-fly for a node. When you create a menu item on-the-fly, the menu cache is emptied, and this bug becomes evident. The bug can also be reproduced by manually emptying your site's menu cache, and then navigating to any page that should display aliased links. All such links will instead display their internal paths. Reload that same page, and the links will then become aliased.

This is a VERY serious bug, particularly for sites that (a) use full page caching, (b) have a lot of menu items for nodes, and (c) have a lot of path aliases for nodes. For such sites, a lot of their pages will be displayed to anonymous users with all links pointing to internal paths, rather than aliased paths.

Attached patch is a simple, one-line fix for the issue. The patch ensures that the url_alias table is actually checked, for the first time that drupal_lookup_path() is called with $action = 'alias' for a given $path.

The patch applies to Drupal 5.0 beta, and to Drupal 4.7 (same patch, with no offsets).

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

webchick’s picture

Status: Needs review » Reviewed & tested by the community

Confirmed both the presence of the bug, and the fact that this fix seems to work for me. RTBC.

chx’s picture

Status: Reviewed & tested by the community » Needs work
      if (isset($map[$path])) {
        return $map[$path];
      }
      if ($alias = db_result(db_query("SELECT dst FROM {url_alias} WHERE src = '%s'", $path))) {
        $map[$path] = $alias;
        return $alias;
      }

if there is no alias $map[$path] becomes FALSE. If you apply this patch then you have killed the caching.

Jaza’s picture

Status: Needs work » Needs review
FileSize
616 bytes

OK, after further thought, and some discussion on IRC, it seems that the problem lies with the _menu_build() function in menu.inc:

When _menu_build() is run, it calls drupal_lookup_path() with $action = 'source' for all paths (via drupal_get_normal_path()), and this causes $map[$path] to get set to FALSE for everything. A better solution would be for _menu_build() to wipe the drupal_lookup_path() cache, once it's done checking for aliases; because _menu_build() is the one that's corrupting the cache in the first place, so it should wipe the cache in order to fix it up.

Attached patch takes this new approach. Tested, and it gives the same good results as the original patch, except without killing the alias caching mechanism in drupal_lookup_path().

chx’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
479 bytes

Much better. Rerolled because the patch did not include the -p.

webchick’s picture

Confirmed that this too fixes the problem. Would mark back to RTBC but it already was.

Dries’s picture

Status: Reviewed & tested by the community » Needs work
+    // Delete the alias cache, since calling drupal_get_normal_path() with
+    // paths that aren't aliases may have corrupted it.

Could we make the code comment a bit more accessible? Maybe add some context. Why would paths that aren't aliases corrupt the cache? Great thing you added documentation, but I don't think the documentation is transparent enough.

Jaza’s picture

Status: Needs work » Needs review
FileSize
455 bytes

I know this is getting a bit silly, but hey, I've changed my mind about this patch again! :P

Thanks for asking your question, Dries. It made me think: why would paths that aren't aliases corrupt the cache? And the only answer I could come up with was: they wouldn't. Or at least, they shouldn't.

This leads me to believe that the problem lies in the fact that the lookup of paths and the lookup of aliases is shared by a single cache. Attached is a new patch, that takes a THIRD approach to this bug, by creating two separate caches within drupal_get_path(), by using the $map[$action][$path] construct instead of the $map[$path] construct. In this way, the lookup of (cached) paths is completely separate to the lookup of (cached) aliases, and the two caching mechanisms cannot possibly interfere with each other, as is happening at the moment.

I realise that this has a few disadvantages, such as: (a) the need to hold two (usually equivalent) caches in memory; and (b) the possibility of an empty cache being looked at, while a populated cache is ignored. But surely the advantage of the cache always being correct is enough to outweigh these disadvantages.

No... on second thoughts, having two equivalent caches really is silly. So, here's a FOURTH attempt (I ditched the third attempt before uploading it). The problem is that each key of the $map cache is meant to be a system path, and each value is meant to be an alias. When drupal_get_path() is called with $action = 'source', it's meant to be passed an alias (through $path), and it's meant to return a system URL. If it finds the given alias in the DB, then it saves the system path for that alias in the cache, and it returns that system path. If it doesn't find the given alias, then it assumes that it was given a system path instead of an alias, and it returns FALSE.

But what it shouldn't be doing is saving FALSE to the cache, as the alias value for that system path. Just because the system path is not an alias, doesn't mean that we've checked if it HAS any aliases, and doesn't mean that it hasn't GOT any aliases. So the simple solution is to return FALSE, but to not save that to the static cache.

Attached patch implements this approach. As with the previous patches, this patch has been tested, and it works at fixing the original bug. Hopefully I've finally hit upon the (most) correct way to do it.

Jaza’s picture

FileSize
499 bytes

Oops, forgot the -p again. :P

chx’s picture

I will not comment any more on this. I mailed Matt and Goba instead :)

m3avrck’s picture

Jaza's patch removes the same patch that went in to fix lookups: http://drupal.org/node/83234#comment-135035

drumm’s picture

Status: Needs review » Fixed

I remember the patch that put those lines in. I tried getting the duplicate queries with this patch installed and didn't see any.

Committed to HEAD.

m3avrck’s picture

Version: 5.x-dev » 4.7.x-dev

This should be backported for 4.7 too.

chx’s picture

Status: Fixed » Reviewed & tested by the community

Still I would like to hear from the path alias masters about all this.

Gábor Hojtsy’s picture

I would say others who submitted patches around this isse should also get contacted. Like http://drupal.org/node/65493 is a big issue which partly also deals with (and removes) this FALSE value assignment. Unfortunately I am not that deep in path aliasing now to intelligently comment on the issue.

robertDouglass’s picture

Version: 4.7.x-dev » 5.x-dev
Status: Reviewed & tested by the community » Needs work

This rolls back an important optimization. We need a different solution. If you have a Drupal site with path module turned on, with one node and one alias, this adds 16 additional database queries to the front page. Multiply that by every additional node, and you'll probably agree that this solution is not acceptable.

robertDouglass’s picture

@drumm: to get duplicate queries on the current drupal HEAD:

1. enable path module
2. create a node
3. give it an alias
4. look at the front page.

Don't depend on the devel module to spot duplicate queries, there is a bug in it (I'm working on that at the moment).

robertDouglass’s picture

This is the query that gets repeated:

SELECT src FROM url_alias WHERE dst = 'node'

robertDouglass’s picture

FileSize
1.48 KB

I've simplified the whole caching scheme. This saves some processing and is easier to read. I've reinstated caching FALSE because in this function, one combination of $action and $path is going to always return the same value, every time the function is called.

As far as I can tell, the patch that got committed only forced the function to do a database lookup and conclude that FALSE should be returned instead of returning the cached FALSE.

@Jeremy, I couldn't reproduce your problem, so I probably didn't understand it correctly. To clarify, in every case, after I've edited a node, updating it's menu and path settings (thus triggering a menu cache clear), I get q=alias on the View tab and q=node/1/edit on the edit tab, and q=alias in the menu.

Tobias Maier’s picture

please have a look at my patch of http://drupal.org/node/65493
I think it could help you, too

robertDouglass’s picture

Tobias: your patch can be rerolled slightly simpler if mine goes in. The right order for the two issues is this on first, then yours. Also, I need a half a day to understand the 5-6 lines in block_list().

drumm’s picture

Does this code need work or review?

chx’s picture

drupal_get_normal_path does not get called as often as drupal_get_path_alias -- only with two URLs on a normal page from drupal_init_path and drupal_is_front_page. It's also called from drupal_access_denied, drupal_not_found and search_index, _menu_build and _menu_site_is_offline.

What I want to say is that the double caching has a very small memory penalty for the source part is small. I am not fond of. If mathias can find a better solution I would be happier with that .

Jaza’s picture

Assigned: Jaza » Unassigned

As I said in #7 (above):

No... on second thoughts, having two equivalent caches really is silly.

I agree with chx: let's try and implement a better solution than using two caches.

If the solution that was committed really does cause duplicate queries, then I think we should roll back what was committed, and simply commit my patch from #3 (above), which wipes the cache at the end of _menu_build(). That has already been tested to fix the original problem, WITHOUT destroying the alias caching mechanism.

Anyway, I feel that I've been taking shots in the dark for too long in this thread - someone who knows what they're doing needs to fix this up now. :P

chx’s picture

that patch masks only the problem. if some other script would call drupal_get_normal_path with system paths then the alias cache in memory gets corruped.

robertDouglass’s picture

I just wanted to officially retract #18. The potentially higher memory usage doesn't bother me as much as the fact that looking up a source and then a destination always results in two queries instead of the destination benefitting from the first lookup. This would be the disadvantage with any double cache, I think.

chx’s picture

Status: Needs work » Needs review
FileSize
1.68 KB

Let's begin with http://drupal.org/node/83234 for we are actually back to where we were before that went in.

What's the intention there? We want to record the negative result of SELECT src FROM {url_alias} WHERE dst = '%s'. Then let's do that and nothing else. This does not mandate two full caches.

robertDouglass’s picture

I like Chx's approach. There was a bug in it (wrong return value for one case) and I reworked the code comments.

Tobias Maier’s picture

+      	// If there is no alias for this path, cache the path as its own alias
         $map[$path] = $path;

is false
I investigated this function very deeply in this issue
the right code is

+      	// If there is no alias for this path, cache FALSE, so we know that we already looked for it.
         $map[$path] = FALSE;

believe it or not
but drupal_lookup_path() should return FALSE if we did not find any alias or the real source.

+      if ($src = db_result(db_query("SELECT src FROM {url_alias} WHERE dst = '%s'", $path))) {
+        // Cache the $src/$path pair, but return the $src.
+        $map[$src] = $path;

$map[$src] = $path; is not good, too
Because why do you believe that $path is the same as what you would get as if you would do drupal_lookup_path('alias', 'foo');?
(this happens if you have more than one alias for any path defined)

I think there is no other way then to separate the the $map array in a part which is there for 'alias' and one for 'source'
This is what you can see in my last patch, too
If you look closely then you can see $map_source[$path] = $src;. You could argue that $map_source... is missing but that cant be defined in this case.

please try my patch it may fix your problems, too...
I know I mixed a lot of different things in this patch. (Fixing a few bugs or inconsistencies I found)

chx’s picture

Title: Path alias lookup fails after emptying menu cache » Path alias memory cache is broken
FileSize
1.68 KB

Because why do you believe that $path is the same as what you would get as if you would do drupal_lookup_path('alias', 'foo');?

How do you know what you are going to get when you run drupal_lookup_path('alias', 'foo') if there are several aliases of the same path? With the current code you have no idea which alias you will get, especially on postgresql there is no such a thing as 'table order'. So what we return after this patch is not worse than it was -- it is simply one of the many possible corrects. I updated the doxygen to reflect this situation.

It is true that $map[$path] = FALSE; is more correct i rerolled the patch as such.

chx’s picture

I upped the wrong patch -- and suddenly it struck me that we can simplify the code further so I did.

chx’s picture

Even less code, same principles.

drumm’s picture

Is devel module fixed?
Does the patch from #8 need to be reverted first?

I applied #31 and devel module is showing duplicate queries.

chx’s picture

What duplicate queries? Have you tried checking by hand (for example inserting a message into the new parts of the drupal_lookup_path function)?

chx’s picture

As this fixes some errant behaviour, too, http://drupal.org/node/78377 should get in after this.

robertDouglass’s picture

FileSize
2.62 KB

#31 has a bug in it because isset($array) isn't working right when the array value is NULL. This version looks RTBC to me.

robertDouglass’s picture

Status: Needs review » Reviewed & tested by the community

Scratch #35, it is just a workaround for THIS CRITICAL BUG: http://drupal.org/node/98988

So #31 is the patch to go with, and is RTBC, but we will not reap the benefits of static caching without 98988.

drumm’s picture

Any objection to applying #35 if #98988 doesn't move quickly? We can always undo the workaround later, but having working code in the meantime is nice.

chx’s picture

FileSize
2.64 KB

I read your concerns on db_results and find them unfounded but I will coment there.

If you insist then at least please consider this one. array_key_exists is 2.5x slower than isset we learned this during form API and used the same construct.

drumm’s picture

Status: Reviewed & tested by the community » Fixed

Committed to HEAD. Sorry about the commit message, didn't notice that was already applied locally.

The extra condition in the if statement can be cleaned up once the db_result() change happens.

Tobias Maier’s picture

Anonymous’s picture

Status: Fixed » Closed (fixed)
RobRoy’s picture

Version: 5.x-dev » 4.7.4
Status: Closed (fixed) » Patch (to be ported)

Does that last patch need to be ported to 4.7.x-dev?

chx’s picture

Status: Patch (to be ported) » Reviewed & tested by the community
FileSize
2.34 KB

I simply copypasted HEAD drupal_lookup_path, did a quick sanity review and here it is.

killes@www.drop.org’s picture

Status: Reviewed & tested by the community » Fixed

applied

Anonymous’s picture

Status: Fixed » Closed (fixed)