Problem/Motivation
The Search module's Content (Node) search allows you to specify result ranking/ordering factors. These factors include Number of Comments and Number of Page Views. However, neither the comments nor page views ranking is currently working. This has been broken since Drupal 7.
The reason that it is not working is that hook_update_index() [which lets search modules add items to the search index during search_cron()] in Drupal 7 was changed so that it was only called on search modules that were marked as "active" on the Search settings page. In Drupal 8, this was even more the case, since indexing happens by calling the updateIndex() method on the plugins of active search pages. But the comment/statistics ratings relied some code in comment_update_index() and statistics_update_index() being executed, and it was never executed.
Proposed resolution
Fix so that the rankings work. This was done by moving the necessary code (which basically was just calculating normalization factors for these rankings) into the hook_cron() for these two modules. This could be slightly inefficient, since it will be executed even if there isn't a node search module active that needs these factors to be calculated, but it's difficult to detect that and it's only one line of query and a settings update anyway, and it only happens during cron runs where runtime should not be that huge of an issue.
In getting it to work, I also found that a CAST in the search query code that actually added the rankings to queries was causing problems, so I took that out.
And there were some problems with the tests themselves, so those were fixed as well (in Drupal 8, simulating in a test that a node had been viewed was kind of problematic, so the test now updates the database directly to simulate it).
Remaining tasks
None, except committing the patch.
User interface changes
Rankings for node searches based on comments and statistics will work.
API changes
None.
Original report by @rayasa
In the search settings, when ONLY the ranking factor 'number of comments" is set (i.e. >0), the scores for all search results is being calculated to 0 even if they have comments.
Observations:
1) The functionality to rank search results based on number of comments requires that the variable 'node_cron_comments_scale" is properly set. This variable is set by the hook function comment_update_index, but, it seems that this function is never invoked. I guess, the function search_cron is expected to call all hook_update_index implementations, but currently it's not being done for comment module implementation of the hook.
function search_cron() {
// We register a shutdown function to ensure that search_total is always up
// to date.
drupal_register_shutdown_function('search_update_totals');
foreach(variable_get('search_active_modules', array('node', 'user')) as $module) {
// Update word index
module_invoke($module, 'update_index');
}
}
I suppose, instead of doing a module_invoke only on the active search modules, search_cron should do a module_invoke_all('update_index'). The patch attached does the change.
2) There maybe a small glitch in the hook implementation comment_rankings:
function comment_ranking() {
return array(
'comments' => array(
'title' => t('Number of comments'),
'join' => array(
'type' => 'LEFT',
'table' => 'node_comment_statistics',
'alias' => 'node_comment_statistics',
'on' => 'node_comment_statistics.nid = i.sid',
),
// Inverse law that maps the highest reply count on the site to 1 and 0 to 0.
'score' => '2.0 - 2.0 / (1.0 + node_comment_statistics.comment_count * CAST(:scale AS DECIMAL))',
'arguments' => array(':scale' => variable_get('node_cron_comments_scale', 0)),
),
);
}
Here, a CAST to Decimal for ':scale' is not required as the variable 'node_cron_comment_scale' is already loaded with a value of type Decimal. Moreover, doing the CAST sets the score to 0 again. The attached patch handles this too.
Comment | File | Size | Author |
---|---|---|---|
#27 | 893302-27.patch | 5.08 KB | jhodgdon |
#23 | 893302-22.patch | 2.67 KB | jhodgdon |
#1 | comment_count_search_rankings.patch | 1.18 KB | rayasa |
comment_count_search_rankings.patch | 1.18 KB | rayasa | |
Comments
Comment #1
rayasa CreditAttribution: rayasa commentedComment #2
jhodgdonInteresting, good catch!
We don't want to do a module_invoke_all on hook_update_index, because it's inefficient to index modules that are deactivated. So we need to do something else.
Comment #3
rayasa CreditAttribution: rayasa commentedI find even search rankings based on number of page views calculates all the search result scores to 0. The problem is similar to the one mentioned in the observation 1 of the description. In this case its the statistics module hook_update_index that never gets invoked.
So, as per the suggestion of 'jhodgdon', I have removed the module_invoke_all from the previous patch and have used search_active_modules variable instead. In the attached patch I update the search_active_modules variable to include comment module or statistics module when either of them are enabled and remove them from the variable when they are disabled.
Comment #4
jhodgdonThat's not the right approach. search_active_modules is what is setting up tabs for different types of searching (content, users, etc.) on the search page.
I think the right approach is that the node module should see if the comment and statistics modules are enabled when node_update_index() is called, and invoke those modules' functions at that time. Both of these operations are related to the node module's search, and they shouldn't be invoked unless node searching is enabled.
Comment #5
rayasa CreditAttribution: rayasa commentedGot you.... will work on it
Comment #6
rayasa CreditAttribution: rayasa commentedThe attached patch does a module_invoke on comment and statistics module for hook update_index when the node_update_index is called. By default the module_invoke function invokes the hook only if the module is installed and enabled.
Comment #7
jhodgdonI don't like that it hard-wires in the comment and statistics modules. What if some other module implements hook_ranking() in order to define a rank? I think what we should do is to invoke hook_update_index() on any module that has supplied a hook_ranking(). And maybe any module that implements hook_node_update_index(), just in case.
The attached patch does that, and also documents this behavior.
Comment #8
rayasa CreditAttribution: rayasa commentedDefinitely better.
Comment #9
jhodgdon#7: 893302.patch queued for re-testing.
Comment #10
pwolanin CreditAttribution: pwolanin commentedThis seems like a weird solution - perhaps node module should jsut invoke some other hook at this point?
Comment #11
jhodgdonProbabably. If we go with that approach, we probably need to do it today (API change).
Comment #12
jhodgdon#7: 893302.patch queued for re-testing.
Comment #13
jhodgdon#7: 893302.patch queued for re-testing.
Comment #14
jhodgdonThis is still an issue in 8.x. See #2003482-55: Convert hook_search_info to plugin system
Comment #15
ianthomas_ukNeeds a patch for 8.x
Comment #16
jhodgdonI'm circling back to this issue, and yes we still need a patch...
So the problem is that actually we don't even have a hook_update_index() any more in D8, and both Comment and Statistics module-provided rankings rely on the D6 behavior where in their hook_update_index(), they set a scale value so that their rankings would be properly scaled. This hasn't worked since 7.0 either, because (as noted in the summary), hook_update_index() was only invoked on enabled search modules in 7. In 8, it's now a method on the Search plugin, and is only invoked for enabled search pages' plugins.
So what do we do? Two options:
a) The NodeSearch plugin has hook_ranking(), which allows modules to define rankings, and it could have a new hook that these modules can implement, which will be run at the end of node indexing, so they can update their scaling values.
b) The comment and statistics ranking code could be updated so that it doesn't rely on these calculated scaling values, but since ranking values have to be between 0 and 1, there has to be some scale, so I think that wouldn't really work.
So, we need a new hook. We have now hook_node_update_index(), which is invoked for each node during node indexing... We could use that but then it would get called for each node instead of just once at the end, which would be inefficient. Too bad, because that would have been a good name. :) So... maybe what we should do actually is go ahead and call it hook_update_index() again, and just document that it's invoked from NodeSearch after all nodes for this run have been indexed? That would probably fix the whole problem.
This is basically what the last patch did, except we need to update it for 2014 D8 and put the invoke code into NodeSearch::updateIndex().
Thoughts?
And we probably need a test.
Comment #17
ianthomas_ukIf it's at all possible I think this would be better calculated per-search, i.e. the maximum number of comments returned in any of the search results. I can see that being slow though, or maybe not even possible with the db abstraction.
The scale doesn't need to be perfect, so long as values over 1 are scaled back down to 1 (again, assuming this is possible with the API), so it could just be calculated in hook_cron().
Comment #18
jhodgdonCalculating per search... that is an interesting idea, but it would require the NodeSearch method to alter the SearchQuery in some way that would be really complicated. I don't think it's viable.
And you are right, we could just make those two former hook_update_index() things into hook_cron() implmentations. Actually, that is probably simpler... let's just do that. Neither one is dependent on indexing to calculate the scales anyway.
We obviously also need a test. Actually I think there are ranking tests for comments and stats that were commented out until we fixed this issue.
So... the proposed patch would be to change the names of comment_update_index() and statistics_update_index() to be comment_cron() and statistics_cron(), and uncomment those test lines that were supposed to test this behavior.
Comment #19
ianthomas_ukI'd say this is a major as this functionality is just broken. If this ends up being the one issue blocking the release of D8 then it should be easy to just remove the broken rankings and this then becomes a feature request.
Comment #20
jhodgdonThis is going to conflict with
#2042807: Convert search plugins to use a ConfigEntity and a PluginBag
because it takes care of making node rankings not use variable_get/set.
However, just for fun, let's see if this patch works, and then set the issue to Postponed.
Comment #21
jhodgdonComment #23
jhodgdonTurns out there was already a statistics_cron() function. Slightly different patch.
Comment #24
jhodgdonComment #26
jhodgdonWell, that patch didn't work right -- comments and ranking tests both failed. So, we need to figure out why and make a viable patch.
This issue also doesn't need to be postponed any more, because the other issue's patch was committed.
Comment #27
jhodgdonSo... I think I have this figured out. There were multiple reasons this didn't work:
a) The test didn't work as it was because it didn't run cron, and the comment/statistics rankings now depend on a cron run. So instead of trying to manually update the search index, I substituted a cron run.
b) The statistics module is not actually working very well. In my manual tests, viewing a node did not actually do anything to the node_counter table -- see #2170685: Statistics page views are not being logged. And on this other issue #1684968: More reliable statistics.js: retry after 2 s when request fails I learned that we should probably be doing something else in the test for statistics logging anyway rather than trying to duplicate what the Statistics logging tests are doing already. So instead of trying to figure out how to simulate a visit of node pages, I just added a manual database query.
c) The score functions for comments and statistics were using a CAST(scale AS DECIMAL) and by experimenting in MySQL I found out that CAST(0.1 as DECIMAL) is resulting in a value of 0, thereby multiplying the factors being used to rank nodes by zero, doh! So I took out the cast. I have no idea why that was even in there, or how to make it work, but it was causing the tests to fail because it screwed up the scoring completely (multiplying the stats count by a scale of zero basically takes it out of the equation). And of course it's probably been like this since who knows when, but since search ranking has been broken for these two factors forever, no one noticed.
So the ranking test now passes on my local machine.
I don't think anyone has reviewed this patch yet so I didn't bother with an interdiff. I hope that's OK.
Comment #30
jhodgdon27: 893302-27.patch queued for re-testing.
Comment #33
jhodgdon27: 893302-27.patch queued for re-testing.
Comment #34
jhodgdonOK, that was two different completely unrelated random "test could not complete due to a fatal error" fails. :( Trying again...
Comment #35
mikemiles86Comment #36
mikemiles86Representing Boston Sprint, reviewed patch #27.
Comment #37
Nick_vhAwesome, let's get this in.
Comment #38
jhodgdonUpdating the issue summary
Comment #39
jhodgdonComment #40
catchWhy does this need the cast to float? Would we ever store a non-float? If we really do have to cast somewhere could it be on save instead?
Comment #41
jhodgdonWe can possibly take that out... The previous code was even worse, as it had a cast to DECIMAL in the SQL code:
So what I did is move it into PHP instead. I didn't want to make a judgement about the need for a cast, but the SQL cast was not working correctly in the queries.
Comment #42
andypostI think this cron implementations needs to check that required modules and settings active/valid before running
If this hook is not executed then should be removed from node.api.php
Comment #43
jhodgdonThe hook in node.api.php is hook_node_update_index(), not hook_update_index(). They are not at all the same thing, and hook_node_update_index() still exists.
Regarding the other part of #42 "I think this cron implementations needs to check that required modules and settings active/valid before running" ... Here is comment_cron():
It seems like a quick query to do during cron. It is technically only necessary to do if the Search module is enabled and there is at least one NodeSearch page enabled... at least at the moment. Checking this is probably going to take more time than just doing the query, and some contrib module is going to come along and make their own node searching plugin (which might or might not extend NodeSearch) and want to use the same ranking, and the check will not be valid. So I think it's just better to do it without checking. OK?
Comment #44
webchickSounds like this still needs discussion?
Comment #45
jhodgdonI think the review comments above have been addressed, but it would be good if catch and andypost would add a quick either "I still think it's wrong" or "OK I accept your explanation" so that either I can make a new patch or we can get this one in. Meanwhile, I'll click retest on it -- the last test was nearly a month ago and things have been changing underfoot.
Comment #46
jhodgdon27: 893302-27.patch queued for re-testing.
Comment #47
jhodgdonActually can I do it this way:
catch/andypost: If you still have objections, please set the patch back to "needs work". I think your comments have been addressed.
Comment #48
catchI don't really like the cast, but since there was an existing, worse, cast, we can handle that in a follow-up. It should be left in the backport anyway.
The query every cron run isn't fantastic but if that's the correct check then there's not an obvious way to optimize it.
Committed/pushed to 8.x, thanks!
Comment #49
tstoecklerAlso came here because the cast threw me off in the commitdiff. I had to look up PHP operator precedence in order to know whether the 0 in the ternary gets casted as well or not (it doesn't). Let's explore removing that, I'd say.
Comment #50
catchOpened #2194403: Remove unnecessary cast to float from comment_ranking() and statistics_ranking().
Comment #51
jhodgdonThanks!