Immediately after emptying the menu cache, the drupal_lookup_path()
function fails to find any path aliases. Steps to reproduce:
- Make sure that the menu and the path modules are enabled.
- Create a new node (e.g. a static page). Create a menu item for the node on-the-fly (e.g. in 'primary links'), and create a path alias for it on-the-fly.
- Submit the node.
- You will be taken to the page for the new node. However, the path you will be taken to is the internal path (i.e.
node/x
, NOT the aliased path. If there are any links to aliased pages (e.g. sidebar menu items) on the page, you should notice that they are also showing their internal paths, not their aliased paths. - Navigate somewhere else (e.g. to the front page of your site). You should notice that all links to aliased pages on the page are now showing their aliased paths, not their internal paths.
This only happens when you create a menu item and a path alias on-the-fly for a node. When you create a menu item on-the-fly, the menu cache is emptied, and this bug becomes evident. The bug can also be reproduced by manually emptying your site's menu cache, and then navigating to any page that should display aliased links. All such links will instead display their internal paths. Reload that same page, and the links will then become aliased.
This is a VERY serious bug, particularly for sites that (a) use full page caching, (b) have a lot of menu items for nodes, and (c) have a lot of path aliases for nodes. For such sites, a lot of their pages will be displayed to anonymous users with all links pointing to internal paths, rather than aliased paths.
Attached patch is a simple, one-line fix for the issue. The patch ensures that the url_alias
table is actually checked, for the first time that drupal_lookup_path()
is called with $action = 'alias'
for a given $path
.
The patch applies to Drupal 5.0 beta, and to Drupal 4.7 (same patch, with no offsets).
Comment | File | Size | Author |
---|---|---|---|
#43 | path_8.patch | 2.34 KB | chx |
#38 | path_7_0.patch | 2.64 KB | chx |
#35 | path_7.patch | 2.62 KB | robertDouglass |
#31 | path_alias_negative_cache_3.patch | 2.42 KB | chx |
#30 | path_alias_negative_cache_2.patch | 2.53 KB | chx |
Comments
Comment #1
webchickConfirmed both the presence of the bug, and the fact that this fix seems to work for me. RTBC.
Comment #2
chx CreditAttribution: chx commentedif there is no alias
$map[$path]
becomes FALSE. If you apply this patch then you have killed the caching.Comment #3
Jaza CreditAttribution: Jaza commentedOK, after further thought, and some discussion on IRC, it seems that the problem lies with the
_menu_build()
function in menu.inc:Attached patch takes this new approach. Tested, and it gives the same good results as the original patch, except without killing the alias caching mechanism in
drupal_lookup_path()
.Comment #4
chx CreditAttribution: chx commentedMuch better. Rerolled because the patch did not include the -p.
Comment #5
webchickConfirmed that this too fixes the problem. Would mark back to RTBC but it already was.
Comment #6
Dries CreditAttribution: Dries commentedCould we make the code comment a bit more accessible? Maybe add some context. Why would paths that aren't aliases corrupt the cache? Great thing you added documentation, but I don't think the documentation is transparent enough.
Comment #7
Jaza CreditAttribution: Jaza commentedI know this is getting a bit silly, but hey, I've changed my mind about this patch again! :P
Thanks for asking your question, Dries. It made me think: why would paths that aren't aliases corrupt the cache? And the only answer I could come up with was: they wouldn't. Or at least, they shouldn't.
This leads me to believe that the problem lies in the fact that the lookup of paths and the lookup of aliases is shared by a single cache. Attached is a new patch, that takes a THIRD approach to this bug, by creating two separate caches withindrupal_get_path()
, by using the$map[$action][$path]
construct instead of the$map[$path]
construct. In this way, the lookup of (cached) paths is completely separate to the lookup of (cached) aliases, and the two caching mechanisms cannot possibly interfere with each other, as is happening at the moment.I realise that this has a few disadvantages, such as: (a) the need to hold two (usually equivalent) caches in memory; and (b) the possibility of an empty cache being looked at, while a populated cache is ignored. But surely the advantage of the cache always being correct is enough to outweigh these disadvantages.No... on second thoughts, having two equivalent caches really is silly. So, here's a FOURTH attempt (I ditched the third attempt before uploading it). The problem is that each key of the
$map
cache is meant to be a system path, and each value is meant to be an alias. Whendrupal_get_path()
is called with$action = 'source'
, it's meant to be passed an alias (through$path
), and it's meant to return a system URL. If it finds the given alias in the DB, then it saves the system path for that alias in the cache, and it returns that system path. If it doesn't find the given alias, then it assumes that it was given a system path instead of an alias, and it returns FALSE.But what it shouldn't be doing is saving FALSE to the cache, as the alias value for that system path. Just because the system path is not an alias, doesn't mean that we've checked if it HAS any aliases, and doesn't mean that it hasn't GOT any aliases. So the simple solution is to return FALSE, but to not save that to the static cache.
Attached patch implements this approach. As with the previous patches, this patch has been tested, and it works at fixing the original bug. Hopefully I've finally hit upon the (most) correct way to do it.
Comment #8
Jaza CreditAttribution: Jaza commentedOops, forgot the -p again. :P
Comment #9
chx CreditAttribution: chx commentedI will not comment any more on this. I mailed Matt and Goba instead :)
Comment #10
m3avrck CreditAttribution: m3avrck commentedJaza's patch removes the same patch that went in to fix lookups: http://drupal.org/node/83234#comment-135035
Comment #11
drummI remember the patch that put those lines in. I tried getting the duplicate queries with this patch installed and didn't see any.
Committed to HEAD.
Comment #12
m3avrck CreditAttribution: m3avrck commentedThis should be backported for 4.7 too.
Comment #13
chx CreditAttribution: chx commentedStill I would like to hear from the path alias masters about all this.
Comment #14
Gábor HojtsyI would say others who submitted patches around this isse should also get contacted. Like http://drupal.org/node/65493 is a big issue which partly also deals with (and removes) this FALSE value assignment. Unfortunately I am not that deep in path aliasing now to intelligently comment on the issue.
Comment #15
robertDouglass CreditAttribution: robertDouglass commentedThis rolls back an important optimization. We need a different solution. If you have a Drupal site with path module turned on, with one node and one alias, this adds 16 additional database queries to the front page. Multiply that by every additional node, and you'll probably agree that this solution is not acceptable.
Comment #16
robertDouglass CreditAttribution: robertDouglass commented@drumm: to get duplicate queries on the current drupal HEAD:
1. enable path module
2. create a node
3. give it an alias
4. look at the front page.
Don't depend on the devel module to spot duplicate queries, there is a bug in it (I'm working on that at the moment).
Comment #17
robertDouglass CreditAttribution: robertDouglass commentedThis is the query that gets repeated:
SELECT src FROM url_alias WHERE dst = 'node'
Comment #18
robertDouglass CreditAttribution: robertDouglass commentedI've simplified the whole caching scheme. This saves some processing and is easier to read. I've reinstated caching FALSE because in this function, one combination of $action and $path is going to always return the same value, every time the function is called.
As far as I can tell, the patch that got committed only forced the function to do a database lookup and conclude that FALSE should be returned instead of returning the cached FALSE.
@Jeremy, I couldn't reproduce your problem, so I probably didn't understand it correctly. To clarify, in every case, after I've edited a node, updating it's menu and path settings (thus triggering a menu cache clear), I get q=alias on the View tab and q=node/1/edit on the edit tab, and q=alias in the menu.
Comment #19
Tobias Maier CreditAttribution: Tobias Maier commentedplease have a look at my patch of http://drupal.org/node/65493
I think it could help you, too
Comment #20
robertDouglass CreditAttribution: robertDouglass commentedTobias: your patch can be rerolled slightly simpler if mine goes in. The right order for the two issues is this on first, then yours. Also, I need a half a day to understand the 5-6 lines in block_list().
Comment #21
drummDoes this code need work or review?
Comment #22
chx CreditAttribution: chx commenteddrupal_get_normal_path does not get called as often as drupal_get_path_alias -- only with two URLs on a normal page from drupal_init_path and drupal_is_front_page. It's also called from drupal_access_denied, drupal_not_found and search_index, _menu_build and _menu_site_is_offline.
What I want to say is that the double caching has a very small memory penalty for the source part is small. I am not fond of. If mathias can find a better solution I would be happier with that .
Comment #23
Jaza CreditAttribution: Jaza commentedAs I said in #7 (above):
I agree with chx: let's try and implement a better solution than using two caches.
If the solution that was committed really does cause duplicate queries, then I think we should roll back what was committed, and simply commit my patch from #3 (above), which wipes the cache at the end of
_menu_build()
. That has already been tested to fix the original problem, WITHOUT destroying the alias caching mechanism.Anyway, I feel that I've been taking shots in the dark for too long in this thread - someone who knows what they're doing needs to fix this up now. :P
Comment #24
chx CreditAttribution: chx commentedthat patch masks only the problem. if some other script would call drupal_get_normal_path with system paths then the alias cache in memory gets corruped.
Comment #25
robertDouglass CreditAttribution: robertDouglass commentedI just wanted to officially retract #18. The potentially higher memory usage doesn't bother me as much as the fact that looking up a source and then a destination always results in two queries instead of the destination benefitting from the first lookup. This would be the disadvantage with any double cache, I think.
Comment #26
chx CreditAttribution: chx commentedLet's begin with http://drupal.org/node/83234 for we are actually back to where we were before that went in.
What's the intention there? We want to record the negative result of
SELECT src FROM {url_alias} WHERE dst = '%s'
. Then let's do that and nothing else. This does not mandate two full caches.Comment #27
robertDouglass CreditAttribution: robertDouglass commentedI like Chx's approach. There was a bug in it (wrong return value for one case) and I reworked the code comments.
Comment #28
Tobias Maier CreditAttribution: Tobias Maier commentedis false
I investigated this function very deeply in this issue
the right code is
believe it or not
but drupal_lookup_path() should return FALSE if we did not find any alias or the real source.
$map[$src] = $path;
is not good, tooBecause why do you believe that $path is the same as what you would get as if you would do
drupal_lookup_path('alias', 'foo');
?(this happens if you have more than one alias for any path defined)
I think there is no other way then to separate the the $map array in a part which is there for 'alias' and one for 'source'
This is what you can see in my last patch, too
If you look closely then you can see
$map_source[$path] = $src;
. You could argue that$map_source...
is missing but that cant be defined in this case.please try my patch it may fix your problems, too...
I know I mixed a lot of different things in this patch. (Fixing a few bugs or inconsistencies I found)
Comment #29
chx CreditAttribution: chx commentedHow do you know what you are going to get when you run
drupal_lookup_path('alias', 'foo')
if there are several aliases of the same path? With the current code you have no idea which alias you will get, especially on postgresql there is no such a thing as 'table order'. So what we return after this patch is not worse than it was -- it is simply one of the many possible corrects. I updated the doxygen to reflect this situation.It is true that
$map[$path] = FALSE;
is more correct i rerolled the patch as such.Comment #30
chx CreditAttribution: chx commentedI upped the wrong patch -- and suddenly it struck me that we can simplify the code further so I did.
Comment #31
chx CreditAttribution: chx commentedEven less code, same principles.
Comment #32
drummIs devel module fixed?
Does the patch from #8 need to be reverted first?
I applied #31 and devel module is showing duplicate queries.
Comment #33
chx CreditAttribution: chx commentedWhat duplicate queries? Have you tried checking by hand (for example inserting a message into the new parts of the drupal_lookup_path function)?
Comment #34
chx CreditAttribution: chx commentedAs this fixes some errant behaviour, too, http://drupal.org/node/78377 should get in after this.
Comment #35
robertDouglass CreditAttribution: robertDouglass commented#31 has a bug in it because isset($array) isn't working right when the array value is NULL. This version looks RTBC to me.
Comment #36
robertDouglass CreditAttribution: robertDouglass commentedScratch #35, it is just a workaround for THIS CRITICAL BUG: http://drupal.org/node/98988
So #31 is the patch to go with, and is RTBC, but we will not reap the benefits of static caching without 98988.
Comment #37
drummAny objection to applying #35 if #98988 doesn't move quickly? We can always undo the workaround later, but having working code in the meantime is nice.
Comment #38
chx CreditAttribution: chx commentedI read your concerns on db_results and find them unfounded but I will coment there.
If you insist then at least please consider this one. array_key_exists is 2.5x slower than isset we learned this during form API and used the same construct.
Comment #39
drummCommitted to HEAD. Sorry about the commit message, didn't notice that was already applied locally.
The extra condition in the if statement can be cleaned up once the db_result() change happens.
Comment #40
Tobias Maier CreditAttribution: Tobias Maier commentednew regression: drupal_lookup_path returns NULL instead of FALSE
Comment #41
(not verified) CreditAttribution: commentedComment #42
RobRoy CreditAttribution: RobRoy commentedDoes that last patch need to be ported to 4.7.x-dev?
Comment #43
chx CreditAttribution: chx commentedI simply copypasted HEAD drupal_lookup_path, did a quick sanity review and here it is.
Comment #44
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedapplied
Comment #45
(not verified) CreditAttribution: commented