See http://groups.drupal.org/node/4105. The summary is:
- modules such as voting modules or other metrics systems can inject their own scoring factors. Wouldn't it be cool if the votes a node has received could improve its positioning in the search results?
- it cleans up node.module code
The technical details are:
I've create a new hook_node_rank() function which returns an array of ranks which allows each hook_node_rank to define multiple rankings. The lower level array defines title, join, score, and terms; terms is an array; join and terms are optional. For example, hook_node_rank for comment.module:
function comment_node_rank() {
return array(
'node_rank_comments' => array(
'title' => t('Number of comments'),
'join' => 'LEFT JOIN {node_comment_statistics} c ON c.nid = i.sid',
'score' => '2.0 - 2.0 / (1.0 + c.comment_count * %f)',
'terms' => array(variable_get('node_cron_comments_scale', 0)),
),
);
}
I moved the existing code out of node_search and implemented three hook_node_rank functions (node_, comment_, and statistics_). I think that this refactoring makes more sense, moving the code to the module that actually defines it. Thus, this patch impacts node.module, comments.module, and statistics.module.
You might notice that the weights are not included in the definition. I was able to easily extend the admin/settings/search page with all of the 'node_rank' variables, and then automatically add these to the generated query. This makes it a lot easier to add new node rankings, since the system handles the UI.
Lastly, I wrapped the score values with the SQL COALESCE(..., 0)
because during testing of a 5.x system, I encountered NULL score values (I think related to comments) that we're messing up the results.
Comment | File | Size | Author |
---|---|---|---|
#31 | 145242.patch | 13.67 KB | douggreen |
#26 | 145242.patch | 13.05 KB | douggreen |
#23 | 145242.patch | 12.96 KB | douggreen |
#22 | 145242.patch | 12.52 KB | douggreen |
#21 | 145242.patch | 12.58 KB | douggreen |
Comments
Comment #1
nedjo+1 in principle (haven't tested). This is part of what we need to do all over core: remove the hard-coded special treatment given to core modules (in this case, comment and statistics). Enabling contrib modules to affect scoring will indeed be a significant benefit as we get e.g. voting and other user-driven ranking parameters.
This also opens the door to other uses of node ranking outside search.
Comment #2
douggreen CreditAttribution: douggreen commentedUpdated patch, fixing 'field' reference that should of been 'score'...
Comment #3
Scott Reynolds CreditAttribution: Scott Reynolds commentedsubscribing. Great idea
Comment #4
Zothos CreditAttribution: Zothos commenteddefinitely +1 for the idea. :)
Comment #5
douggreen CreditAttribution: douggreen commentedI'm updating patch to current Drupal head. I think we're probably too late to get this into 6.x, but it doesn't break any api's. Anyone care to test and mark it RTBC?
Comment #6
nedjoMinor comments on the code, I haven't tested.
Looks very good overall.
The fact we're attaching this new hook to node rather than search reflects a general weakness in search module (the need to search separately for different types of objects, the relatively full implementation for node and lack thereof for other object types like user, taxonomy term, etc.). It'd be nice to see a solution that moved us toward general search support rather than overdeveloped node support, but this is at least a good step toward generalizing the hard-codedness.
Code comments should be full sentences (capitalized, periods or other punctuation).
I find the use of the word 'terms' here confusing, e.g.,
They are fed into a function argument called
$argument2
. These seem to be arguments fed into the SQL? If so, let's call them 'arguments'. If not, they need some code comment to explain.Comment #7
nedjoA couple of possible longer term directions:
1. Move ranking from node to search module, generalizing to accept a 'type' argument. Search is the only place I know of in core where we have built-in support for multiple object types (node, user, etc.). Compare e.g. the node_access system, hard-coded to just nodes. On the surface, generalizing
hook_node_rank()
to behook_search_rank()
would seem fairly simple: add a $type argument (e.g., 'node'), add aswitch ($type)
test to implementations. But adding e.g. a user ranking implementation would of course be more work.2. Factor out the ranking into e.g. rank.module so it can be used independently of search. There are many other cases in which we wish to rank content (or users) by a synthesis of multiple criteria, e.g., a "recommended content" block.
Comment #8
douggreen CreditAttribution: douggreen commentedThis is definitely an incremental improvement. $arguments2 is part of the current core search code. When submitting patches, I frequently try to change as little as possible. Are suggesting that I/we discuss more significant changes.
I agree that hook_search_rank is a better name.
I don't know that it could be generalized, however, because the rank methods actually return SQL that is related to the search. For example, the 'join' array element.
You say that you find the use of the word "terms" confusing in one place. I've used the array keys of 'title', 'join', 'score', and 'terms'. I agree that 'terms' should probably be renamed to something like 'join args'.
Comment #9
nedjoMy feeling is, if this goes in for 6.x, great, it's definitely an incremental improvement (though it's hard to say if it's enough of a priority to make it in at this point). If not, we might consider some more generalization. We have a core search system that's well developed for nodes and almost nonexistant for anything else. User searching is a skeletal
"SELECT name, uid FROM {users} WHERE LOWER(name) LIKE LOWER('%%%s%%')"
. There's zip AFAIK for taxonomy terms. Moving logic from node to search may be a step toward more balanced implementations.If we wanted to consider moving ranking to a separate implementation rather than tying it to search, I guess we'd need to do something similar to what we do with
db_rewrite_sql()
, that is, feed in a table alias and field name. In the case of node ranking, that would be'i', 'sid'
for search and'n', 'nid'
for node.Comment #10
nedjoI've taken a first crack at sketching in a generalized ranking system based on this patch. See this issue http://drupal.org/node/173593 on Content Recommendation Engine (CRE) and this module in my sandbox http://cvs.drupal.org/viewvc.py/drupal/contributions/sandbox/nedjo/modul.... I'm hoping this might go in the short term into CRE.
In the longer term, based on this experience, I am thinking we could probably factor out search's ranking into a small, general core ranking API that search and other modules could implement and extend.
Whether or not we do so, this patch on its is a positive step in the right direction.
Comment #11
catchThis is a very nice idea but it no longer applies.
Comment #12
douggreen CreditAttribution: douggreen commentedThis issue will be one of my focus areas during the 7.x release cycle.
Comment #13
dropcube CreditAttribution: dropcube commentedSubscribing.
Comment #14
robertDouglass CreditAttribution: robertDouglass commentedRerolled against HEAD, tested, fixed a bug with an unbalanced (, and somewhat addressed Nedjo's concern about the variable name (I renamed term into factor since these are referred to scoring factors elsewhere).
Comment #15
douggreen CreditAttribution: douggreen commentedLIVE FROM THE MINNESOTA SEARCH SPRINT, the attached patch does:
This is still a work in progress, but something we should finish this morning.
Next, we need to review all of the existing core modules, and see if there is any low hanging fruit to further extend the search rankings.
Comment #16
BlakeLucchesi CreditAttribution: BlakeLucchesi commented* Tests Needed
- Relevancy: Two nodes with similar nodes, one node has more instances of the same search keyword than the other.
- Recency: Two nodes with the exact same content; different timestamp.
- Statistics: Two nodes with the exact same content; different number of views.
- Comments: Two nodes with the exact same content; different number of comments.
- Promoted: Two nodes with the exact same content; one node is promoted.
- Sticky: Two nodes with the exact same content; one node is marked sticky.
Comment #17
douggreen CreditAttribution: douggreen commentedLIVE FROM THE MINNESOTA SEARCH SPRINT, this fixes the patch, which was not searching properly. If the hook_ranking 'score' uses SQL arguments, this is now renamed to 'arguments' and the code is fixed to build the proper SQL arguments. It also removes factors with a value of 0 from the SQL.
Comment #18
BlakeLucchesi CreditAttribution: BlakeLucchesi commentedRe-rolled to make sure that there is no division by zero error when normalizing the score factors.
Comment #19
cwgordon7 CreditAttribution: cwgordon7 commentedAren't we setting ourselves up for conflicts if we use table abbreviations such as
{node_comment_statistics} c
? Perhaps the hook should be refactored to force node_comment_statistics.nid notation? Something like:Comment #20
douggreen CreditAttribution: douggreen commentedRerolled with a simpletest (original version courtesy of Blake and Chad). I think that the test case has uncovered a problem in the code... it's either that or the test.
Comment #21
douggreen CreditAttribution: douggreen commentedThis update restores part of the original patch that was lost. The test is still failing the "recent", "comments", and "views" (statistics) tests.
Comment #22
douggreen CreditAttribution: douggreen commentedThis patch fixes the "recent" test failure, which was only a problem with the test case, and not the patch code.
Comment #23
douggreen CreditAttribution: douggreen commentedAnd now, a finally working version, complete with simpletest. I'm marking as CNR, but I think cwgordon7 has a good point that we should consider before RTBC.
To cwgordon7's point, the existing search joins do use aliases, and one just needs to know to work around them. However, since we're opening this up to contrib, you could work around the core aliases, but you'll never know what another contrib module might alias. In this case, I think that standard should be to not use aliases at all. And if that's the standard for contrib, we should probably do this in core too.
Comment #24
douggreen CreditAttribution: douggreen commentedComment #25
BlakeLucchesi CreditAttribution: BlakeLucchesi commentedagree to to reroll without aliases.
Comment #26
douggreen CreditAttribution: douggreen commentedAgreed, the solution then is to just not introduce SQL table aliasing in hook_ranking. While we could do this in core (because everyone will know that core implements certain aliases and will avoid them), core is also an example to contrib. We don't want contrib doing this, so we just shouldn't do it in core. I've retested and rerolled without the aliases in comment_ranking and statistics_ranking.
Comment #27
robertDouglass CreditAttribution: robertDouglass commentedNice work. Great new opportunities for contrib modules to interact with search.
Comment #28
BlakeLucchesi CreditAttribution: BlakeLucchesi commentedChanging to "needs work" because we are going to add another ranking factor which increases the score of nodes with incoming links.
Comment #29
douggreen CreditAttribution: douggreen commentedAnd after further discussion, the new ranking factor is really a separate patch that is going to have to wait until this one commits.
Comment #30
coupet CreditAttribution: coupet commentedHighly interesting. subscribing.
Comment #31
douggreen CreditAttribution: douggreen commentedDries commented on #256792, when I think he intended to comment here.
So, I re-rolled this patch changing "Node is Sticky" to "Content is sticky at top of lists" and changing "Node is Promoted" to "Content is promoted to the front page". I made sure that all of the ranking scores had comments, which were accidentally lost in the refactoring. I made sure that all of the new comments were sentences. I also tried to improve the comments in the new _node_rankings() function. And lastly I diffed the old patch with the new one, just to make sure that only comments and strings were changed. I did this hoping we could leave it RTBC.
Comment #32
Dries CreditAttribution: Dries commentedCommitted to CVS. Thanks Doug!
Comment #33
Anonymous (not verified) CreditAttribution: Anonymous commentedAutomatically closed -- issue fixed for two weeks with no activity.
Comment #34
Dave ReidIf someone could follow-up and document this new hook in #364564: Undocumented hook_ranking in search.api.php that would be awesome.
Comment #35
BenK CreditAttribution: BenK commentedJust need to keep track of this thread...