The aliases for nodes are not being indexed, yet indexing them would go a long way toward increasing relevancy. It would make projects (search d.o. for jstools, for example) bubble higher toward the top because of their aliases. The solution is to implement an update index hook in path.module's nodeapi.

Comments

robertDouglass’s picture

Component: search.module » path.module
Assigned: Unassigned » robertDouglass
Status: Active » Needs review
FileSize
1015 bytes

Here it is. Path alias is fetched in a language specific manner, ascribed a high ranking value, and given to the indexer.

moshe weitzman’s picture

very nice idea

puregin’s picture

We need to address nodes that are specifically added to menus, by adding the menu label text and giving the node a rank boost.

douggreen’s picture

LIVE FROM THE MINNESOTA SEARCH SPRINT...

I talked to Robert about this, and I'm not yet convinced that this is a great idea, especially for core. This patch will only improve keyword relevancy on nodes that have single-word path aliases. For example, if the URL alias is http://example/drupal_handbook, and we index <H2>drupal_handbook</H2>, the search engine will convert "drupal_handbook" to "drupalhandbook", and searches for "drupal", "handbook", or "drupal handbook" will all fail.

While this is a good idea, it's not going to give us much of a benefit until we improve the handling of hyphenations and underscores.

And if we don't do this here, any contrib module is free to implement the 'update index' op and add this code.

lilou’s picture

The patch no longer applies against HEAD since the commit for #314244: remove $op from hook_nodeapi.

lilou’s picture

Status: Needs review » Needs work

@douggreen : we can use a simple replacement like

$output = str_replace('_', ' ', drupal_get_path_alias($path, $language));

robertDouglass’s picture

@douggreen:

"And if we don't do this here, any contrib module is free to implement the 'update index' op and add this code."

My opinion is that the path module should interact with the rest of Drupal core in an ideal way and not have to depend on a whole contrib module just to add a path alias to the search index.

As for the hyphenation handling, it's a bug in the search module, and while it inhibits the usefulness of this patch, I don't think it should be addressed here.

robertDouglass’s picture

Status: Needs work » Needs review
FileSize
1.05 KB

Rerolled.

Anonymous’s picture

Status: Needs review » Needs work

If there is a need for

+  $language = empty($node->language) ? '' : $node->language;

shouldn't it be done in drupal_lookup_path or in node_save? I think this is fluff that can be removed.

robertDouglass’s picture

@earnie: Maybe the $node has the alias as an attribute when it comes in and I didn't notice. Otherwise, the signature for drupal_get_path_alias pretty much demands that if you want a localized path, you send in a language, which is an attribute on the node.

function drupal_get_path_alias($path, $path_language = '') {
  $result = $path;
  if ($alias = drupal_lookup_path('alias', $path, $path_language)) {
    $result = $alias;
  }
  return $result;
}

Anonymous’s picture

Robert, $node->language can be passed to drupal_get_path_alias without checking for its existence. The value of a nonexistent class property is NULL, so there isn't a need to protect against a nonexistent property.

robertDouglass’s picture

Ah, you're totally correct. Now I understand.

robertDouglass’s picture

I think you might get a warning for trying to access a nonexistent member of an object. That's probably why path's node_load implementation looks like this:

function path_nodeapi_load(&$node, $arg) {
  $language = isset($node->language) ? $node->language : '';
  $path = 'node/' . $node->nid;
  $alias = drupal_get_path_alias($path, $language);
  if ($path != $alias) {
    $node->path = $alias;
  }
}

This also means, however, that I have the language adjusted path already available, so I don't need to repeat the work.
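
The two positions above can be checked in isolation. What follows is a minimal sketch, assuming a plain stdClass object standing in for $node; example_node_language() is a hypothetical helper, not part of path.module. Reading an unset property does evaluate to NULL, but PHP also raises a notice (a warning as of PHP 8), which the isset() guard avoids.

```php
<?php
// Hypothetical helper for illustration only; not part of path.module.
function example_node_language($node) {
  // isset() is FALSE for a missing property and for a NULL value,
  // so no notice or warning is raised when the property is absent.
  return isset($node->language) ? $node->language : '';
}

// Stand-ins for nodes with and without a language property.
$with = new stdClass();
$with->language = 'de';

$without = new stdClass();

var_dump(example_node_language($with));
var_dump(example_node_language($without));
```

On PHP 7 and later, `$node->language ?? ''` would be an equivalent, shorter guard.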

robertDouglass’s picture

FileSize
818 bytes

This may be the simplest implementation. Now, however, I'm thinking that if a node has many path aliases (this happens sometimes), they should all get added. I think I'm going to roll another one that actually polls the database to get all of the aliases.

robertDouglass’s picture

Status: Needs work » Needs review

But if I don't have time to re-roll, someone else should, and if they don't, this patch is still an improvement.

deviantintegral’s picture

I'm not sure if every alias should be indexed. What if the node's title has changed, and the user isn't doing something smarter like a redirect for the old URL? Since 302 redirects aren't in core, I've seen quite a few non-technical users with sites where multiple path aliases exist just so links don't break.

Also, I think the suggestion up at #6 should be implemented. Pathauto has a set of characters which are user configurable to be considered punctuation. Is there something similar in core we can use?

robertDouglass’s picture

"Is there something similar in core we can use?"

Actually, the handling for putting things in the search index should handle punctuation better. Several common characters get stripped (the minus sign, -, is one of them) that should in fact either be replaced with a space or be indexed both ways (added in two different forms). Therefore my recommendation is to let this patch go in without taking care of punctuation, and then handle improved punctuation in core search in another patch.
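
As a rough illustration of the "added in two different forms" idea, here is a minimal sketch. It assumes that only hyphens and underscores are treated as punctuation, and example_both_forms() is a hypothetical helper, not part of search.module.

```php
<?php
// Hypothetical helper for illustration only; not part of search.module.
function example_both_forms($word) {
  // Form 1: punctuation replaced with spaces, e.g. "drupal handbook".
  $spaced = trim(preg_replace('/[-_]+/', ' ', $word));
  // Form 2: punctuation removed entirely, e.g. "drupalhandbook".
  $joined = preg_replace('/[-_]+/', '', $word);
  // Avoid emitting the same text twice when there was no punctuation.
  return $spaced === $joined ? $spaced : $spaced . ' ' . $joined;
}

echo example_both_forms('drupal_handbook'), "\n";
echo example_both_forms('jstools'), "\n";
```

Indexing both forms would let a search for "drupal", "handbook", or "drupalhandbook" match the same alias, at the cost of a slightly larger index.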

catch’s picture

Centralising punctuation replacement/stripping in another patch seems good.

I think this should come with a test though.

Status: Needs review » Needs work

The last submitted patch failed testing.

Gurpartap Singh’s picture

Status: Needs work » Needs review

Simple feature with an interesting scope!

Patch still applies.

cburschka’s picture

Status: Needs review » Needs work

1.) The hook is called "hook_node_update_index", not "hook_nodeapi_update_index".

2.) If underscores and hyphens aren't handled properly, then what does the search module do with the forward slashes that aliased paths contain by definition? Are those separated properly? Even more so than for underscores or hyphens, this is essential. If we end up with compound words like "contenttitle" from content/title, or "projectdrupal" from project/drupal, then this patch adds nothing of value to the search index, and shouldn't go in without the stripping logic taken care of first (whether in a separate patch or here).
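
To make the concern in point 2 concrete, here is a minimal sketch of splitting an alias into separate words before it reaches the indexer. example_alias_index_text() is a hypothetical helper, not a core or search.module API, and the separator set (slash, underscore, hyphen) is an assumption taken from the characters discussed in this issue. How the indexer itself treats slashes remains the open question; this sketch sidesteps it by splitting first.

```php
<?php
// Hypothetical helper for illustration only; not a core or search.module API.
function example_alias_index_text($alias) {
  // Split on one or more slashes, underscores, or hyphens and drop
  // empty pieces (for example from a leading or trailing slash).
  $words = preg_split('#[/_-]+#', $alias, -1, PREG_SPLIT_NO_EMPTY);
  return implode(' ', $words);
}

echo example_alias_index_text('project/drupal'), "\n";
echo example_alias_index_text('content/title'), "\n";
echo example_alias_index_text('drupal_handbook'), "\n";
```

With this preprocessing, project/drupal would be handed over as the two words "project" and "drupal" rather than as one compound word.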

deviantintegral’s picture

Version: 7.x-dev » 8.x-dev

jhodgdon’s picture

Title: Path module should add URL alias to update index in nodapi » Add path aliases to search index
Component: path.module » search.module

I'm moving this to the search module issue queue, as it's mostly a search feature. Still needs to be D8 at this point.

jhodgdon’s picture

Just as a note, this should apply also for links to nodes internal to the node text, see
#256692: Search indexer doesn't take aliases into account when determining relevancy
which I've marked as a duplicate of this issue.

jhodgdon’s picture

Version: 8.0.x-dev » 8.1.x-dev
Issue summary: View changes

Since 8.0.x-beta1 has been released, our policy at this point is No feature requests until 8.1.x. See #2350615: [policy, no patch] What changes can be accepted during the Drupal 8 beta phase?. Sorry, it's just too late for 8.0.x at this point, so even if we had a viable patch, the core committers would not commit it. So unless we decide this is a Task or a Bug (and I don't think it is), we'll have to delay it.

mgifford’s picture

Assigned: robertDouglass » Unassigned

There has been no new work on this issue in quite some time, so I'm assuming the person assigned is no longer actively pursuing it. Sincere apologies if this is wrong.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.