See http://groups.drupal.org/node/4105. The summary is:

  • modules such as voting modules or other metrics systems can inject their own scoring factors. Wouldn't it be cool if the votes a node has received could improve its positioning in the search results?
  • it cleans up node.module code

The technical details are:

I've create a new hook_node_rank() function which returns an array of ranks which allows each hook_node_rank to define multiple rankings. The lower level array defines title, join, score, and terms; terms is an array; join and terms are optional. For example, hook_node_rank for comment.module:

function comment_node_rank() {
  return array(
    'node_rank_comments' => array(
      'title' => t('Number of comments'),
      'join' => 'LEFT JOIN {node_comment_statistics} c ON c.nid = i.sid',
      'score' => '2.0 - 2.0 / (1.0 + c.comment_count * %f)',
      'terms' => array(variable_get('node_cron_comments_scale', 0)),
    ),
  );
}

I moved the existing code out of node_search and implemented three hook_node_rank functions (node_, comment_, and statistics_). I think that this refactoring makes more sense, moving the code to the module that actually defines it. Thus, this patch impacts node.module, comments.module, and statistics.module.

You might notice that the weights are not included in the definition. I was able to easily extend the admin/settings/search page with all of the 'node_rank' variables, and then automatically add these to the generated query. This makes it a lot easier to add new node rankings, since the system handles the UI.

Lastly, I wrapped the score values with the SQL COALESCE(..., 0) because during testing of a 5.x system, I encountered NULL score values (I think related to comments) that we're messing up the results.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

nedjo’s picture

+1 in principle (haven't tested). This is part of what we need to do all over core: remove the hard-coded special treatment given to core modules (in this case, comment and statistics). Enabling contrib modules to affect scoring will indeed be a significant benefit as we get e.g. voting and other user-driven ranking parameters.

This also opens the door to other uses of node ranking outside search.

douggreen’s picture

FileSize
8.27 KB

Updated patch, fixing 'field' reference that should of been 'score'...

Scott Reynolds’s picture

subscribing. Great idea

Zothos’s picture

definitely +1 for the idea. :)

douggreen’s picture

FileSize
8.32 KB

I'm updating patch to current Drupal head. I think we're probably too late to get this into 6.x, but it doesn't break any api's. Anyone care to test and mark it RTBC?

nedjo’s picture

Minor comments on the code, I haven't tested.

Looks very good overall.

The fact we're attaching this new hook to node rather than search reflects a general weakness in search module (the need to search separately for different types of objects, the relatively full implementation for node and lack thereof for other object types like user, taxonomy term, etc.). It'd be nice to see a solution that moved us toward general search support rather than overdeveloped node support, but this is at least a good step toward generalizing the hard-codedness.

Code comments should be full sentences (capitalized, periods or other punctuation).

I find the use of the word 'terms' here confusing, e.g.,


+      $rankings['terms'][] = $node_rank;

They are fed into a function argument called $argument2. These seem to be arguments fed into the SQL? If so, let's call them 'arguments'. If not, they need some code comment to explain.

nedjo’s picture

A couple of possible longer term directions:

1. Move ranking from node to search module, generalizing to accept a 'type' argument. Search is the only place I know of in core where we have built-in support for multiple object types (node, user, etc.). Compare e.g. the node_access system, hard-coded to just nodes. On the surface, generalizing hook_node_rank() to be hook_search_rank() would seem fairly simple: add a $type argument (e.g., 'node'), add a switch ($type) test to implementations. But adding e.g. a user ranking implementation would of course be more work.

2. Factor out the ranking into e.g. rank.module so it can be used independently of search. There are many other cases in which we wish to rank content (or users) by a synthesis of multiple criteria, e.g., a "recommended content" block.

douggreen’s picture

This is definitely an incremental improvement. $arguments2 is part of the current core search code. When submitting patches, I frequently try to change as little as possible. Are suggesting that I/we discuss more significant changes.

I agree that hook_search_rank is a better name.

I don't know that it could be generalized, however, because the rank methods actually return SQL that is related to the search. For example, the 'join' array element.

You say that you find the use of the word "terms" confusing in one place. I've used the array keys of 'title', 'join', 'score', and 'terms'. I agree that 'terms' should probably be renamed to something like 'join args'.

nedjo’s picture

My feeling is, if this goes in for 6.x, great, it's definitely an incremental improvement (though it's hard to say if it's enough of a priority to make it in at this point). If not, we might consider some more generalization. We have a core search system that's well developed for nodes and almost nonexistant for anything else. User searching is a skeletal "SELECT name, uid FROM {users} WHERE LOWER(name) LIKE LOWER('%%%s%%')". There's zip AFAIK for taxonomy terms. Moving logic from node to search may be a step toward more balanced implementations.

I don't know that it could be generalized, however, because the rank methods actually return SQL that is related to the search. For example, the 'join' array element.

If we wanted to consider moving ranking to a separate implementation rather than tying it to search, I guess we'd need to do something similar to what we do with db_rewrite_sql(), that is, feed in a table alias and field name. In the case of node ranking, that would be 'i', 'sid' for search and 'n', 'nid' for node.

nedjo’s picture

I've taken a first crack at sketching in a generalized ranking system based on this patch. See this issue http://drupal.org/node/173593 on Content Recommendation Engine (CRE) and this module in my sandbox http://cvs.drupal.org/viewvc.py/drupal/contributions/sandbox/nedjo/modul.... I'm hoping this might go in the short term into CRE.

In the longer term, based on this experience, I am thinking we could probably factor out search's ranking into a small, general core ranking API that search and other modules could implement and extend.

Whether or not we do so, this patch on its is a positive step in the right direction.

catch’s picture

Status: Needs review » Needs work

This is a very nice idea but it no longer applies.

douggreen’s picture

Version: 6.x-dev » 7.x-dev
Assigned: Unassigned » douggreen

This issue will be one of my focus areas during the 7.x release cycle.

dropcube’s picture

Subscribing.

robertDouglass’s picture

Status: Needs work » Needs review
FileSize
8.31 KB

Rerolled against HEAD, tested, fixed a bug with an unbalanced (, and somewhat addressed Nedjo's concern about the variable name (I renamed term into factor since these are referred to scoring factors elsewhere).

douggreen’s picture

Status: Needs review » Needs work
FileSize
8.27 KB

LIVE FROM THE MINNESOTA SEARCH SPRINT, the attached patch does:

  1. rerolled to current Drupal HEAD
  2. changes the new hook to be named hook_ranking which is still consistent with the original verbage in the code, but is a better hook name without an underscore
  3. enforce the variable names to always be prepended as 'node_rank_', and then removes these from the new hook_ranking array index. This change will make it so that a hook_ranking implementor can't pollute other variable name spaces

This is still a work in progress, but something we should finish this morning.

Next, we need to review all of the existing core modules, and see if there is any low hanging fruit to further extend the search rankings.

BlakeLucchesi’s picture

* Tests Needed
- Relevancy: Two nodes with similar nodes, one node has more instances of the same search keyword than the other.
- Recency: Two nodes with the exact same content; different timestamp.
- Statistics: Two nodes with the exact same content; different number of views.
- Comments: Two nodes with the exact same content; different number of comments.
- Promoted: Two nodes with the exact same content; one node is promoted.
- Sticky: Two nodes with the exact same content; one node is marked sticky.

douggreen’s picture

FileSize
8.53 KB

LIVE FROM THE MINNESOTA SEARCH SPRINT, this fixes the patch, which was not searching properly. If the hook_ranking 'score' uses SQL arguments, this is now renamed to 'arguments' and the code is fixed to build the proper SQL arguments. It also removes factors with a value of 0 from the SQL.

BlakeLucchesi’s picture

FileSize
9.32 KB

Re-rolled to make sure that there is no division by zero error when normalizing the score factors.

cwgordon7’s picture

Aren't we setting ourselves up for conflicts if we use table abbreviations such as {node_comment_statistics} c? Perhaps the hook should be refactored to force node_comment_statistics.nid notation? Something like:

function comment_ranking() {
  return array(
    'comments' => array(
      'title' => t('Number of comments'),
      'join' => array(
        'table' => 'node_comment_statistics',
        'type' => 'left',
        'on' => 'node_comment_statistics.nid = search_index.sid'
      ),
      'score' => '2.0 - 2.0 / (1.0 + node_comment_statistics.comment_count * %f)',
      'arguments' => array(variable_get('node_cron_comments_scale', 0)),
    ),
  );
}
douggreen’s picture

FileSize
9.55 KB

Rerolled with a simpletest (original version courtesy of Blake and Chad). I think that the test case has uncovered a problem in the code... it's either that or the test.

douggreen’s picture

FileSize
12.58 KB

This update restores part of the original patch that was lost. The test is still failing the "recent", "comments", and "views" (statistics) tests.

douggreen’s picture

FileSize
12.52 KB

This patch fixes the "recent" test failure, which was only a problem with the test case, and not the patch code.

douggreen’s picture

FileSize
12.96 KB

And now, a finally working version, complete with simpletest. I'm marking as CNR, but I think cwgordon7 has a good point that we should consider before RTBC.

To cwgordon7's point, the existing search joins do use aliases, and one just needs to know to work around them. However, since we're opening this up to contrib, you could work around the core aliases, but you'll never know what another contrib module might alias. In this case, I think that standard should be to not use aliases at all. And if that's the standard for contrib, we should probably do this in core too.

douggreen’s picture

Status: Needs work » Needs review
BlakeLucchesi’s picture

agree to to reroll without aliases.

douggreen’s picture

FileSize
13.05 KB

Agreed, the solution then is to just not introduce SQL table aliasing in hook_ranking. While we could do this in core (because everyone will know that core implements certain aliases and will avoid them), core is also an example to contrib. We don't want contrib doing this, so we just shouldn't do it in core. I've retested and rerolled without the aliases in comment_ranking and statistics_ranking.

robertDouglass’s picture

Status: Needs review » Reviewed & tested by the community

Nice work. Great new opportunities for contrib modules to interact with search.

BlakeLucchesi’s picture

Status: Reviewed & tested by the community » Needs work

Changing to "needs work" because we are going to add another ranking factor which increases the score of nodes with incoming links.

douggreen’s picture

Status: Needs work » Reviewed & tested by the community

And after further discussion, the new ranking factor is really a separate patch that is going to have to wait until this one commits.

coupet’s picture

Highly interesting. subscribing.

douggreen’s picture

FileSize
13.67 KB

Dries commented on #256792, when I think he intended to comment here.

  • We don't use the word 'Node' in output.
  • We write 'Node is sticky' not 'Node is Sticky'.
  • Code comments are inconsistent.
  • It would be useful to document the score functions (i.e. the math).

So, I re-rolled this patch changing "Node is Sticky" to "Content is sticky at top of lists" and changing "Node is Promoted" to "Content is promoted to the front page". I made sure that all of the ranking scores had comments, which were accidentally lost in the refactoring. I made sure that all of the new comments were sentences. I also tried to improve the comments in the new _node_rankings() function. And lastly I diffed the old patch with the new one, just to make sure that only comments and strings were changed. I did this hoping we could leave it RTBC.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS. Thanks Doug!

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

Dave Reid’s picture

If someone could follow-up and document this new hook in #364564: Undocumented hook_ranking in search.api.php that would be awesome.

BenK’s picture

Just need to keep track of this thread...