Right now you are using "LIKE CONCAT(path, '%')" to compare the request with both {menu_router} and {url_alias}. That approach is quite imprecise and misses a lot of invalid paths.

You should definitely not add the % wildcard to the end of any path in {menu_router}, because this table contains all paths Drupal will be able to display, already containing all possible wildcards! With your current implementation, if you had a valid menu path "/music", all requests to any subpath such as "/music/wrongpage" would automatically be found valid, even if they don't exist.

The aliases in {url_alias} are technically supposed to be absolute, so no real need for an appended % either. However, if you want to include people like me, who are using the Extended path aliases module, you'll have to reproduce this functionality.

Here are the steps I think are necessary for a reliable, full verification:

  1. Check if the request exists in {menu_router}, using "LIKE path".
  2. Check if the request exists in {url_alias}, using "= alias".
  3. Check if the request exists in {url_alias}, using "LIKE CONCAT(alias, '%')".
  4. If 3 is true, substitute the found 'alias' part in the request with 'source', then check {menu_router} again.
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

soyarma’s picture

I'll need to double check this. I was under the impression that paths for views get added to the menu_router table and they can accept and process arguments even if they are not defined.

ralf.strobel’s picture

You are right. As it turns out it's not just views, but it seems to be Drupal's default behavior to look for a page down the path if the exact request does not exist and call it with the remaining path as arguments.

However, that only goes for {menu_router}. Like I said, the entries in {url_alias} are supposed to be absolute by Drupal default. Extended path aliases will check them recursively by repeatedly popping arguments off the end and testing each truncated version against {menu_router}.

If you don't want to include support for path_alias_xt, just change your {url_alias} lookup to an absolute comparison and you should have reproduced Drupal's 404 behavior. With the support you should check using module_exists() whether the current installation uses path_alias_xt and then use my steps 2-4.

ralf.strobel’s picture

Actually, the precise {menu_router} comparison could be an interesting optional feature for your module. I have to say that I don't like Drupal's default behavior at all because it makes it possible to have several URLs resulting in the the same output, which is not SEO friendly and a waste of cache.

acrollet’s picture

Category: feature » bug
Status: Active » Needs review
FileSize
896 bytes

I agree that this is a problem, as it doesn't match Drupal's alias validation behavior, and eliminates a good part of the functionality of this module. One may verify by looking at http://api.drupal.org/api/drupal/includes%21path.inc/function/drupal_loo... ->

    elseif ($action == 'source' && !isset($cache['no_source'][$path_language][$path])) {
      // Look for the value $path within the cached $map
      $source = FALSE;
      if (!isset($cache['map'][$path_language]) || !($source = array_search($path, $cache['map'][$path_language]))) {
        $args = array(
          ':alias' => $path, 
          ':language' => $path_language, 
          ':language_none' => LANGUAGE_NONE,
        );
        // See the queries above.
        if ($path_language == LANGUAGE_NONE) {
          unset($args[':language']);
          $result = db_query("SELECT source FROM {url_alias} WHERE alias = :alias AND language = :language_none ORDER BY pid DESC", $args);
        }
        elseif ($path_language > LANGUAGE_NONE) {
          $result = db_query("SELECT source FROM {url_alias} WHERE alias = :alias AND language IN (:language, :language_none) ORDER BY language DESC, pid DESC", $args);
        }
        else {
          $result = db_query("SELECT source FROM {url_alias} WHERE alias = :alias AND language IN (:language, :language_none) ORDER BY language ASC, pid DESC", $args);
        }
        if ($source = $result->fetchField()) {
          $cache['map'][$path_language][$source] = $path;
        }
        else {
          // We can't record anything into $map because we do not have a valid
          // index and there is no need because we have not learned anything
          // about any Drupal path. Thus cache to $no_source.
          $cache['no_source'][$path_language][$path] = TRUE;
        }
      }
      return $source;
    }

In all cases, Drupal does a direct comparison of the alias when attempting to retrieve a source. So, I'm changing the issue type to a bug, and attaching a patch for review. This patch does not add compatibility with path_alias_xt, IMO that should be added as a separate feature request.

acrollet’s picture

To clarify, the patch in #4 does not change the query for menu_router, only for url_alias.

Treidge’s picture

That's pretty interesting and I believe it's an important issue. Seems like steps described in 1st post is exactly what I'd like to see added to this module.

if you had a valid menu path "/music", all requests to any subpath such as "/music/wrongpage" would automatically be found valid, even if they don't exist.

This is a pretty nasty thing to have on a website. My D7 installation almost all the time is under siege by spammers. They are using xRumer to scan it and try to post spam comments all over the nodes and forums.

Quite unfortunately, but software they're using sends an malformed URL requests looking like this:

valid-node-path+Result:+chosen+nickname+%22NonEloseloutt%22;+nofollow+is+found;+success;

Because the first part of the path is valid, Drupal makes the full page display each time such URL is called, resulting in wasted server resources. It would be much better if we can just shot down such requests with 404 error.

I've applied provided patch and now trying to test it on a production environment. So far seems working good with no problems (except the fact that I've modified it a little bit to include fix from #1561418: fast_404_path_check not working with i18n to support i18n paths with language prefixes). Some wrong paths that prior the patch was displaying full page view now shows 404 error as expected, but some others don't. An issue with {menu_router} as was told in a 1st post?..

I will report in a few days to confirm that patch is working as expected.

Treidge’s picture

Patch in #4 works with no problems for 5 days on a production site, seems stable enough for me.

acrollet’s picture

Status: Needs review » Reviewed & tested by the community

Marking RTBC according to #7

soyarma’s picture

Status: Reviewed & tested by the community » Closed (fixed)

I've included the patch in #4. I think that supporting path_alias_xt is beyond the scope of this module. Also, as much as I also hate the way that foo and foo/moo could return the same page if the later does not exist, there's too much functionality in Drupal and contrib that expects this to work.

Leagnus’s picture

on D7.41 with
$conf['fast_404_path_check'] = TRUE;
i can't even login or clear the cache.
I get "Not Found" page on /user page
and blank empty line break instead of admin menu on other pages.