The aliases for nodes are not being indexed, yet indexing them would go a long way toward increasing relevancy. It would make projects (search d.o. for jstools, for example) bubble closer to the top because of their aliases. The solution is to implement an update index hook in path.module's nodeapi.
Comment | File | Size | Author |
---|---|---|---|
#14 | path.patch | 818 bytes | robertDouglass |
#8 | path_search.patch | 1.05 KB | robertDouglass |
#1 | search-path-alias.patch | 1015 bytes | robertDouglass |
Comments
Comment #1
robertDouglass commented: Here it is. The path alias is fetched in a language-specific manner, assigned a high ranking value, and passed to the indexer.
Comment #2
moshe weitzman commented: Very nice idea.
Comment #3
puregin commented: We need to address nodes that are specifically added to menus, by adding the menu label text and giving the node a rank boost.
Comment #4
douggreen commented: LIVE FROM THE MINNESOTA SEARCH SPRINT...
I talked to Robert about this and I'm not convinced yet that this is a great idea, especially for core. This patch will only improve the keyword relevancy on nodes that have single-word path aliases. For example, if the URL alias is http://example/drupal_handbook, and we index <H2>drupal_handbook</H2>, the search engine will convert "drupal_handbook" to "drupalhandbook", and searching for "drupal", "handbook", or "drupal handbook" will all fail. While this is a good idea, it's not going to give us much of a benefit until we improve the handling of hyphenations and underscores.
And if we don't do this here, any contrib module is free to implement the 'update index' op and add this code.
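The tokenization concern in this comment can be reproduced in plain PHP. The following is only a minimal sketch of the behaviour described above, not the search module's actual code; the `sketch_*` function names are hypothetical and exist only for illustration.

```php
<?php
// Simplified model: joiner characters are removed outright, so the words
// on either side fuse into one token (the behaviour described in #4).
function sketch_strip_joiners(string $text): string {
  return preg_replace('/[_-]+/', '', $text);
}

// Alternative: treat the same characters as word boundaries.
function sketch_space_joiners(string $text): string {
  return trim(preg_replace('/[_-]+/', ' ', $text));
}

// A keyword "matches" only if it equals one whole indexed word.
function sketch_matches(string $indexed, string $keyword): bool {
  $words = preg_split('/\s+/', $indexed, -1, PREG_SPLIT_NO_EMPTY);
  return in_array($keyword, $words, true);
}

$alias = 'drupal_handbook';
$stripped = sketch_strip_joiners($alias); // "drupalhandbook"
$spaced = sketch_space_joiners($alias);   // "drupal handbook"

var_dump(sketch_matches($stripped, 'drupal')); // bool(false)
var_dump(sketch_matches($spaced, 'drupal'));   // bool(true)
```

With the joiners stripped, neither "drupal" nor "handbook" is an indexed word on its own, which is why the alias adds little relevancy until separators are treated as word boundaries.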
Comment #5
lilou commented: The patch no longer applies against HEAD since #314244: remove $op from hook_nodeapi was committed.
Comment #6
lilou commented: @douggreen: we can use a simple replacement like
$output = str_replace('_', ' ', drupal_get_path_alias($path, $language));
Comment #7
robertDouglass commented: @douggreen:
My opinion is that the path module should interact with the rest of Drupal core in an ideal way and not have to depend on a whole contrib module just to add a path alias to the search index.
As for the hyphenation handling, it's a bug in the search module, and while it inhibits the usefulness of this patch, I don't think it should be addressed here.
Comment #8
robertDouglass commented: Rerolled.
Comment #9
Anonymous (not verified) commented: If there is a need for
shouldn't it be done in drupal_lookup_path or in node_save? I think this is fluff that can be removed.
Comment #10
robertDouglass commented: @earnie: Maybe the $node has the alias as an attribute when it comes in and I didn't notice. Otherwise, the signature for drupal_get_path_alias pretty much demands that if you want a localized path, you send in a language, which is an attribute on the node.
Comment #11
Anonymous (not verified) commented: Robert, $node->language can be passed to drupal_get_path_alias without needing to check for its existence. The value of a non-existent class property is NULL, so there is no need to protect against a non-existent property.
Comment #12
robertDouglass commented: Ah, you're totally correct. Now I understand.
Comment #13
robertDouglass commented: I think you might get a warning for trying to access a nonexistent member of an object. That's probably why path's node_load implementation looks like this:
This also means, however, that I have the language adjusted path already available, so I don't need to repeat the work.
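The disagreement in #11 through #13 can be checked in plain PHP. This is a small standalone sketch, not the path module snippet referred to above: it uses a bare `stdClass` as a stand-in for `$node` and a temporary error handler to show that reading a missing property yields NULL but also raises a diagnostic (a notice in PHP 7, a warning in PHP 8), while an `isset()` guard stays silent.

```php
<?php
// Collect any diagnostics PHP raises instead of printing them.
$messages = [];
set_error_handler(function (int $errno, string $errstr) use (&$messages): bool {
  $messages[] = $errstr;
  return true; // Mark the error as handled.
});

// Stand-in for a node object that has no language property.
$node = new stdClass();
$node->nid = 1;

// Direct read: evaluates to NULL, but PHP reports an undefined property.
$direct = $node->language;

// Guarded read: same NULL result, no diagnostic raised.
$guarded = isset($node->language) ? $node->language : NULL;

restore_error_handler();

var_dump($direct);          // NULL
var_dump($guarded);         // NULL
var_dump(count($messages)); // int(1)
```

Both reads produce NULL, so passing the value along works either way; the guard only matters if notices or warnings are being logged or displayed.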
Comment #14
robertDouglass commented: This may be the simplest implementation. Now, however, I'm thinking that if a node has many path aliases (this happens sometimes), they should all be added. I think I'm going to roll another patch that actually queries the database to get all of the aliases.
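As a rough illustration of collecting every alias for a node, here is a sketch that filters in-memory rows instead of querying the database. Everything in it is an assumption made for the example: the function name is hypothetical, the `source`, `alias`, and `language` keys are illustrative stand-ins for whatever the alias table actually stores, and an empty string is assumed to mean "language neutral".

```php
<?php
// Return all distinct aliases that point at node/$nid and are either
// language neutral ('' here, by assumption) or in the requested language.
function sketch_aliases_for_node(array $rows, int $nid, string $language): array {
  $aliases = [];
  foreach ($rows as $row) {
    $same_node = $row['source'] === 'node/' . $nid;
    $same_language = in_array($row['language'], [$language, ''], true);
    if ($same_node && $same_language) {
      $aliases[] = $row['alias'];
    }
  }
  return array_values(array_unique($aliases));
}

// Hypothetical rows standing in for the result of a database query.
$rows = [
  ['source' => 'node/7', 'alias' => 'handbook', 'language' => ''],
  ['source' => 'node/7', 'alias' => 'docs/handbook', 'language' => 'en'],
  ['source' => 'node/7', 'alias' => 'handbuch', 'language' => 'de'],
  ['source' => 'node/8', 'alias' => 'about', 'language' => ''],
];

print_r(sketch_aliases_for_node($rows, 7, 'en'));
```

For node 7 in English this yields `handbook` and `docs/handbook`; the German alias and the other node's alias are left out.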
Comment #15
robertDouglass commented: But if I don't have time to re-roll, someone else should, and if they don't, this patch is still an improvement.
Comment #16
deviantintegral commented: I'm not sure if every alias should be indexed. What if the node's title has changed, and the user isn't doing something smarter like a redirect for the old URL? Since 302 redirects aren't in core, I've seen quite a few non-technical users with sites where multiple path aliases exist just so links don't break.
Also, I think the suggestion up at #6 should be implemented. Pathauto has a set of characters which are user configurable to be considered punctuation. Is there something similar in core we can use?
Comment #17
robertDouglass commented: "Is there something similar in core we can use?"
Actually, the code that puts things into the search index should handle punctuation better. Several common characters get stripped (the minus sign is one of them) when they should instead either be replaced with a space or be indexed in both forms (stripped and space-separated). My recommendation is therefore to let this patch go in without taking care of punctuation, and to improve punctuation handling in core search in a separate patch.
Comment #18
catch commented: Centralising punctuation replacement/stripping in another patch seems good.
I think this should come with a test though.
Comment #20
Gurpartap Singh commented: Simple feature with an interesting scope!
Patch still applies.
Comment #21
cburschka commented: 1.) The hook is called "hook_node_update_index", not "hook_nodeapi_update_index".
2.) If underlines and hyphens aren't stripped, then what does the search module do with the forward slashes that alias paths contain by definition? Are those separated properly? Even more so than the underline or hyphens, this is essential. If we end up with compound words like "contenttitle" from content/title, or "projectdrupal" from project/drupal, then this patch adds nothing of value to the search index, and shouldn't go in without the stripping logic taken care of first (whether in a separate patch or here).
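One way to sidestep the compound-word problem raised in point 2 would be to split the alias into words before handing it to the indexer. The sketch below is only an illustration under that assumption: the function name is hypothetical, it is not taken from any of the attached patches, and the `<h2>` wrapper simply follows the example markup quoted in #4.

```php
<?php
// Split an alias on slashes, underscores, hyphens and whitespace, then
// wrap the resulting words in a heading so they can be indexed separately.
function sketch_alias_to_index_markup(string $alias): string {
  $words = preg_split('#[/_\-\s]+#', $alias, -1, PREG_SPLIT_NO_EMPTY);
  $text = htmlspecialchars(implode(' ', $words), ENT_QUOTES, 'UTF-8');
  return '<h2>' . $text . '</h2>';
}

echo sketch_alias_to_index_markup('project/drupal'), "\n";
// <h2>project drupal</h2>
echo sketch_alias_to_index_markup('content/my-page_title'), "\n";
// <h2>content my page title</h2>
```

Splitting up front means the indexed text contains "project" and "drupal" as separate words regardless of how the indexer itself treats slashes, underscores, or hyphens.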
Comment #22
deviantintegral commented
Comment #23
jhodgdon commented: I'm moving this to the search module issue queue, as it's mostly a search feature. Still needs to be D8 at this point.
Comment #24
jhodgdon commented: Just as a note, this should also apply to links to nodes that appear inside the node text; see
#256692: Search indexer doesn't take aliases into account when determining relevancy
which I've marked as a duplicate of this issue.
Comment #25
jhodgdon commented: Since 8.0.x-beta1 has been released, our policy at this point is No feature requests until 8.1.x. See #2350615: [policy, no patch] What changes can be accepted during the Drupal 8 beta phase?. Sorry, it's just too late for 8.0.x at this point, so even if we had a viable patch, the core committers would not commit it. So unless we decide this is a Task or a Bug (and I don't think it is), we'll have to delay it.
Comment #26
mgifford commented: There has been no new work on this issue in quite some time, so I'm assuming the person assigned is no longer actively pursuing it. Sincere apologies if this is wrong.