Right now you are using "LIKE CONCAT(path, '%')" to compare the request with both {menu_router} and {url_alias}. That approach is quite imprecise and misses a lot of invalid paths.
You should definitely not add the % wildcard to the end of any path in {menu_router}, because this table contains all paths Drupal will be able to display, already containing all possible wildcards! With your current implementation, if you had a valid menu path "/music", all requests to any subpath such as "/music/wrongpage" would automatically be found valid, even if they don't exist.
The aliases in {url_alias} are technically supposed to be absolute, so no real need for an appended % either. However, if you want to include people like me, who are using the Extended path aliases module, you'll have to reproduce this functionality.
Here are the steps I think are necessary for a reliable, full verification:
- Check if the request exists in {menu_router}, using "LIKE path".
- Check if the request exists in {url_alias}, using "= alias".
- Check if the request exists in {url_alias}, using "LIKE CONCAT(alias, '%')".
- If 3 is true, substitute the found 'alias' part in the request with 'source', then check {menu_router} again.
Comment | File | Size | Author |
---|---|---|---|
#4 | fast_404-alias_exact_comparison-1297122-4.patch | 896 bytes | acrollet |
Comments
Comment #1
soyarma CreditAttribution: soyarma commentedI'll need to double check this. I was under the impression that paths for views get added to the menu_router table and they can accept and process arguments even if they are not defined.
Comment #2
ralf.strobel CreditAttribution: ralf.strobel commentedYou are right. As it turns out it's not just views, but it seems to be Drupal's default behavior to look for a page down the path if the exact request does not exist and call it with the remaining path as arguments.
However, that only goes for {menu_router}. Like I said, the entries in {url_alias} are supposed to be absolute by Drupal default. Extended path aliases will check them recursively by repeatedly popping arguments off the end and testing each truncated version against {menu_router}.
If you don't want to include support for path_alias_xt, just change your {url_alias} lookup to an absolute comparison and you should have reproduced Drupal's 404 behavior. With the support you should check using module_exists() whether the current installation uses path_alias_xt and then use my steps 2-4.
Comment #3
ralf.strobel CreditAttribution: ralf.strobel commentedActually, the precise {menu_router} comparison could be an interesting optional feature for your module. I have to say that I don't like Drupal's default behavior at all because it makes it possible to have several URLs resulting in the the same output, which is not SEO friendly and a waste of cache.
Comment #4
acrollet CreditAttribution: acrollet commentedI agree that this is a problem, as it doesn't match Drupal's alias validation behavior, and eliminates a good part of the functionality of this module. One may verify by looking at http://api.drupal.org/api/drupal/includes%21path.inc/function/drupal_loo... ->
In all cases, Drupal does a direct comparison of the alias when attempting to retrieve a source. So, I'm changing the issue type to a bug, and attaching a patch for review. This patch does not add compatibility with path_alias_xt, IMO that should be added as a separate feature request.
Comment #5
acrollet CreditAttribution: acrollet commentedTo clarify, the patch in #4 does not change the query for menu_router, only for url_alias.
Comment #6
Treidge CreditAttribution: Treidge commentedThat's pretty interesting and I believe it's an important issue. Seems like steps described in 1st post is exactly what I'd like to see added to this module.
This is a pretty nasty thing to have on a website. My D7 installation almost all the time is under siege by spammers. They are using xRumer to scan it and try to post spam comments all over the nodes and forums.
Quite unfortunately, but software they're using sends an malformed URL requests looking like this:
valid-node-path+Result:+chosen+nickname+%22NonEloseloutt%22;+nofollow+is+found;+success;
Because the first part of the path is valid, Drupal makes the full page display each time such URL is called, resulting in wasted server resources. It would be much better if we can just shot down such requests with 404 error.
I've applied provided patch and now trying to test it on a production environment. So far seems working good with no problems (except the fact that I've modified it a little bit to include fix from #1561418: fast_404_path_check not working with i18n to support i18n paths with language prefixes). Some wrong paths that prior the patch was displaying full page view now shows 404 error as expected, but some others don't. An issue with {menu_router} as was told in a 1st post?..
I will report in a few days to confirm that patch is working as expected.
Comment #7
Treidge CreditAttribution: Treidge commentedPatch in #4 works with no problems for 5 days on a production site, seems stable enough for me.
Comment #8
acrollet CreditAttribution: acrollet commentedMarking RTBC according to #7
Comment #9
soyarma CreditAttribution: soyarma commentedI've included the patch in #4. I think that supporting path_alias_xt is beyond the scope of this module. Also, as much as I also hate the way that foo and foo/moo could return the same page if the later does not exist, there's too much functionality in Drupal and contrib that expects this to work.
Comment #10
Leagnus CreditAttribution: Leagnus commentedon D7.41 with
$conf['fast_404_path_check'] = TRUE;
i can't even login or clear the cache.
I get "Not Found" page on /user page
and blank empty line break instead of admin menu on other pages.