Problem/Motivation

The Search module's Content (Node) search allows you to specify result ranking/ordering factors. These factors include Number of Comments and Number of Page Views. However, neither the comments nor page views ranking is currently working. This has been broken since Drupal 7.

The reason that it is not working is that hook_update_index() [which lets search modules add items to the search index during search_cron()] in Drupal 7 was changed so that it was only called on search modules that were marked as "active" on the Search settings page. In Drupal 8, this was even more the case, since indexing happens by calling the updateIndex() method on the plugins of active search pages. But the comment/statistics ratings relied some code in comment_update_index() and statistics_update_index() being executed, and it was never executed.

Proposed resolution

Fix so that the rankings work. This was done by moving the necessary code (which basically was just calculating normalization factors for these rankings) into the hook_cron() for these two modules. This could be slightly inefficient, since it will be executed even if there isn't a node search module active that needs these factors to be calculated, but it's difficult to detect that and it's only one line of query and a settings update anyway, and it only happens during cron runs where runtime should not be that huge of an issue.

In getting it to work, I also found that a CAST in the search query code that actually added the rankings to queries was causing problems, so I took that out.

And there were some problems with the tests themselves, so those were fixed as well (in Drupal 8, simulating in a test that a node had been viewed was kind of problematic, so the test now updates the database directly to simulate it).

Remaining tasks

None, except committing the patch.

User interface changes

Rankings for node searches based on comments and statistics will work.

API changes

None.

Original report by @rayasa

In the search settings, when ONLY the ranking factor 'number of comments" is set (i.e. >0), the scores for all search results is being calculated to 0 even if they have comments.

Observations:

1) The functionality to rank search results based on number of comments requires that the variable 'node_cron_comments_scale" is properly set. This variable is set by the hook function comment_update_index, but, it seems that this function is never invoked. I guess, the function search_cron is expected to call all hook_update_index implementations, but currently it's not being done for comment module implementation of the hook.

function search_cron() {
  // We register a shutdown function to ensure that search_total is always up
  // to date.
  drupal_register_shutdown_function('search_update_totals');

  foreach(variable_get('search_active_modules', array('node', 'user')) as $module) {
    // Update word index
    module_invoke($module, 'update_index');
  }
}

I suppose, instead of doing a module_invoke only on the active search modules, search_cron should do a module_invoke_all('update_index'). The patch attached does the change.

2) There maybe a small glitch in the hook implementation comment_rankings:

function comment_ranking() {
  return array(
    'comments' => array(
      'title' => t('Number of comments'),
      'join' => array(
        'type' => 'LEFT',
        'table' => 'node_comment_statistics',
        'alias' => 'node_comment_statistics',
        'on' => 'node_comment_statistics.nid = i.sid',
      ),
      // Inverse law that maps the highest reply count on the site to 1 and 0 to 0.
      'score' => '2.0 - 2.0 / (1.0 + node_comment_statistics.comment_count * CAST(:scale AS DECIMAL))',
      'arguments' => array(':scale' => variable_get('node_cron_comments_scale', 0)),
    ),
  );
}

Here, a CAST to Decimal for ':scale' is not required as the variable 'node_cron_comment_scale' is already loaded with a value of type Decimal. Moreover, doing the CAST sets the score to 0 again. The attached patch handles this too.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

rayasa’s picture

Status: Active » Needs review
FileSize
1.18 KB
jhodgdon’s picture

Status: Needs review » Needs work

Interesting, good catch!

We don't want to do a module_invoke_all on hook_update_index, because it's inefficient to index modules that are deactivated. So we need to do something else.

rayasa’s picture

Title: Search ranking based on the factor "number of comments" is broken » Search ranking based on the factor "number of comments" & "No of Page Views" is broken
Status: Needs work » Needs review
FileSize
2.83 KB

I find even search rankings based on number of page views calculates all the search result scores to 0. The problem is similar to the one mentioned in the observation 1 of the description. In this case its the statistics module hook_update_index that never gets invoked.

So, as per the suggestion of 'jhodgdon', I have removed the module_invoke_all from the previous patch and have used search_active_modules variable instead. In the attached patch I update the search_active_modules variable to include comment module or statistics module when either of them are enabled and remove them from the variable when they are disabled.

jhodgdon’s picture

That's not the right approach. search_active_modules is what is setting up tabs for different types of searching (content, users, etc.) on the search page.

I think the right approach is that the node module should see if the comment and statistics modules are enabled when node_update_index() is called, and invoke those modules' functions at that time. Both of these operations are related to the node module's search, and they shouldn't be invoked unless node searching is enabled.

rayasa’s picture

Status: Needs review » Needs work

Got you.... will work on it

rayasa’s picture

Status: Needs work » Needs review
FileSize
1.73 KB

The attached patch does a module_invoke on comment and statistics module for hook update_index when the node_update_index is called. By default the module_invoke function invokes the hook only if the module is installed and enabled.

jhodgdon’s picture

FileSize
4.48 KB

I don't like that it hard-wires in the comment and statistics modules. What if some other module implements hook_ranking() in order to define a rank? I think what we should do is to invoke hook_update_index() on any module that has supplied a hook_ranking(). And maybe any module that implements hook_node_update_index(), just in case.

The attached patch does that, and also documents this behavior.

rayasa’s picture

Definitely better.

jhodgdon’s picture

#7: 893302.patch queued for re-testing.

pwolanin’s picture

This seems like a weird solution - perhaps node module should jsut invoke some other hook at this point?

jhodgdon’s picture

Probabably. If we go with that approach, we probably need to do it today (API change).

jhodgdon’s picture

#7: 893302.patch queued for re-testing.

jhodgdon’s picture

#7: 893302.patch queued for re-testing.

jhodgdon’s picture

Version: 7.x-dev » 8.x-dev
ianthomas_uk’s picture

Issue summary: View changes
Status: Needs review » Needs work

Needs a patch for 8.x

jhodgdon’s picture

I'm circling back to this issue, and yes we still need a patch...

So the problem is that actually we don't even have a hook_update_index() any more in D8, and both Comment and Statistics module-provided rankings rely on the D6 behavior where in their hook_update_index(), they set a scale value so that their rankings would be properly scaled. This hasn't worked since 7.0 either, because (as noted in the summary), hook_update_index() was only invoked on enabled search modules in 7. In 8, it's now a method on the Search plugin, and is only invoked for enabled search pages' plugins.

So what do we do? Two options:

a) The NodeSearch plugin has hook_ranking(), which allows modules to define rankings, and it could have a new hook that these modules can implement, which will be run at the end of node indexing, so they can update their scaling values.

b) The comment and statistics ranking code could be updated so that it doesn't rely on these calculated scaling values, but since ranking values have to be between 0 and 1, there has to be some scale, so I think that wouldn't really work.

So, we need a new hook. We have now hook_node_update_index(), which is invoked for each node during node indexing... We could use that but then it would get called for each node instead of just once at the end, which would be inefficient. Too bad, because that would have been a good name. :) So... maybe what we should do actually is go ahead and call it hook_update_index() again, and just document that it's invoked from NodeSearch after all nodes for this run have been indexed? That would probably fix the whole problem.

This is basically what the last patch did, except we need to update it for 2014 D8 and put the invoke code into NodeSearch::updateIndex().

Thoughts?
And we probably need a test.

ianthomas_uk’s picture

If it's at all possible I think this would be better calculated per-search, i.e. the maximum number of comments returned in any of the search results. I can see that being slow though, or maybe not even possible with the db abstraction.

The scale doesn't need to be perfect, so long as values over 1 are scaled back down to 1 (again, assuming this is possible with the API), so it could just be calculated in hook_cron().

jhodgdon’s picture

Calculating per search... that is an interesting idea, but it would require the NodeSearch method to alter the SearchQuery in some way that would be really complicated. I don't think it's viable.

And you are right, we could just make those two former hook_update_index() things into hook_cron() implmentations. Actually, that is probably simpler... let's just do that. Neither one is dependent on indexing to calculate the scales anyway.

We obviously also need a test. Actually I think there are ranking tests for comments and stats that were commented out until we fixed this issue.

So... the proposed patch would be to change the names of comment_update_index() and statistics_update_index() to be comment_cron() and statistics_cron(), and uncomment those test lines that were supposed to test this behavior.

ianthomas_uk’s picture

Priority: Normal » Major

I'd say this is a major as this functionality is just broken. If this ends up being the one issue blocking the release of D8 then it should be easy to just remove the broken rankings and this then becomes a feature request.

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
2.32 KB

This is going to conflict with
#2042807: Convert search plugins to use a ConfigEntity and a PluginBag
because it takes care of making node rankings not use variable_get/set.

However, just for fun, let's see if this patch works, and then set the issue to Postponed.

jhodgdon’s picture

Status: Needs review » Postponed

Status: Postponed » Needs work

The last submitted patch, 20: 893302-20.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
2.67 KB

Turns out there was already a statistics_cron() function. Slightly different patch.

jhodgdon’s picture

Status: Needs review » Postponed

Status: Postponed » Needs work

The last submitted patch, 23: 893302-22.patch, failed testing.

jhodgdon’s picture

Well, that patch didn't work right -- comments and ranking tests both failed. So, we need to figure out why and make a viable patch.

This issue also doesn't need to be postponed any more, because the other issue's patch was committed.

jhodgdon’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
5.08 KB

So... I think I have this figured out. There were multiple reasons this didn't work:

a) The test didn't work as it was because it didn't run cron, and the comment/statistics rankings now depend on a cron run. So instead of trying to manually update the search index, I substituted a cron run.

b) The statistics module is not actually working very well. In my manual tests, viewing a node did not actually do anything to the node_counter table -- see #2170685: Statistics page views are not being logged. And on this other issue #1684968: More reliable statistics.js: retry after 2 s when request fails I learned that we should probably be doing something else in the test for statistics logging anyway rather than trying to duplicate what the Statistics logging tests are doing already. So instead of trying to figure out how to simulate a visit of node pages, I just added a manual database query.

c) The score functions for comments and statistics were using a CAST(scale AS DECIMAL) and by experimenting in MySQL I found out that CAST(0.1 as DECIMAL) is resulting in a value of 0, thereby multiplying the factors being used to rank nodes by zero, doh! So I took out the cast. I have no idea why that was even in there, or how to make it work, but it was causing the tests to fail because it screwed up the scoring completely (multiplying the stats count by a scale of zero basically takes it out of the equation). And of course it's probably been like this since who knows when, but since search ranking has been broken for these two factors forever, no one noticed.

So the ranking test now passes on my local machine.

I don't think anyone has reviewed this patch yet so I didn't bother with an interdiff. I hope that's OK.

Status: Needs review » Needs work

The last submitted patch, 27: 893302-27.patch, failed testing.

The last submitted patch, 27: 893302-27.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review

27: 893302-27.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, 27: 893302-27.patch, failed testing.

The last submitted patch, 27: 893302-27.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review

27: 893302-27.patch queued for re-testing.

jhodgdon’s picture

OK, that was two different completely unrelated random "test could not complete due to a fatal error" fails. :( Trying again...

mikemiles86’s picture

Assigned: Unassigned » mikemiles86
mikemiles86’s picture

Assigned: mikemiles86 » Unassigned
Status: Needs review » Reviewed & tested by the community
Issue tags: +SprintWeekend2014

Representing Boston Sprint, reviewed patch #27.

  • Patch applied cleanly.
  • After applying patch, SearchRankingTest passes locally.
  • Manually tested by:
    1. generated content with varied number of comments (using devel generate)
    2. set search settings for content to rank only on number of comments (Influence > 0)
    3. Indexed site content and ran cron.
    4. Ran search query. results listed in correct order of number of comments
Nick_vh’s picture

Awesome, let's get this in.

jhodgdon’s picture

Issue summary: View changes

Updating the issue summary

jhodgdon’s picture

Issue tags: +Needs backport to D7
catch’s picture

+++ b/core/modules/comment/comment.module
@@ -1623,8 +1624,8 @@ function comment_ranking() {
+      'arguments' => array(':scale' => (float) \Drupal::state()->get('comment.node_comment_statistics_scale') ?: 0),

Why does this need the cast to float? Would we ever store a non-float? If we really do have to cast somewhere could it be on save instead?

jhodgdon’s picture

We can possibly take that out... The previous code was even worse, as it had a cast to DECIMAL in the SQL code:

-      'score' => '2.0 - 2.0 / (1.0 + ces.comment_count * CAST(:scale AS DECIMAL))',
-      'arguments' => array(':scale' => \Drupal::state()->get('comment.node_comment_statistics_scale') ?: 0),
+      'score' => '2.0 - 2.0 / (1.0 + ces.comment_count * :scale)',
+      'arguments' => array(':scale' => (float) \Drupal::state()->get('comment.node_comment_statistics_scale') ?: 0),

So what I did is move it into PHP instead. I didn't want to make a judgement about the need for a cast, but the SQL cast was not working correctly in the queries.

andypost’s picture

I think this cron implementations needs to check that required modules and settings active/valid before running

+++ b/core/modules/comment/comment.module
@@ -1037,10 +1037,11 @@ function comment_node_update_index(EntityInterface $node, $langcode) {
- * Implements hook_update_index().
...
-function comment_update_index() {

+++ b/core/modules/statistics/statistics.module
@@ -228,21 +231,14 @@ function statistics_ranking() {
- * Implements hook_update_index().
...
-function statistics_update_index() {

If this hook is not executed then should be removed from node.api.php

jhodgdon’s picture

The hook in node.api.php is hook_node_update_index(), not hook_update_index(). They are not at all the same thing, and hook_node_update_index() still exists.

Regarding the other part of #42 "I think this cron implementations needs to check that required modules and settings active/valid before running" ... Here is comment_cron():

+function comment_cron() {
+  // Store the maximum possible comments per thread (used for node search
+  // ranking by reply count).
   \Drupal::state()->set('comment.node_comment_statistics_scale', 1.0 / max(1, db_query('SELECT MAX(comment_count) FROM {comment_entity_statistics}')->fetchField()));

It seems like a quick query to do during cron. It is technically only necessary to do if the Search module is enabled and there is at least one NodeSearch page enabled... at least at the moment. Checking this is probably going to take more time than just doing the query, and some contrib module is going to come along and make their own node searching plugin (which might or might not extend NodeSearch) and want to use the same ranking, and the check will not be valid. So I think it's just better to do it without checking. OK?

webchick’s picture

Status: Reviewed & tested by the community » Needs review

Sounds like this still needs discussion?

jhodgdon’s picture

I think the review comments above have been addressed, but it would be good if catch and andypost would add a quick either "I still think it's wrong" or "OK I accept your explanation" so that either I can make a new patch or we can get this one in. Meanwhile, I'll click retest on it -- the last test was nearly a month ago and things have been changing underfoot.

jhodgdon’s picture

27: 893302-27.patch queued for re-testing.

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

Actually can I do it this way:

catch/andypost: If you still have objections, please set the patch back to "needs work". I think your comments have been addressed.

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

I don't really like the cast, but since there was an existing, worse, cast, we can handle that in a follow-up. It should be left in the backport anyway.

The query every cron run isn't fantastic but if that's the correct check then there's not an obvious way to optimize it.

Committed/pushed to 8.x, thanks!

tstoeckler’s picture

Also came here because the cast threw me off in the commitdiff. I had to look up PHP operator precedence in order to know whether the 0 in the ternary gets casted as well or not (it doesn't). Let's explore removing that, I'd say.

catch’s picture

jhodgdon’s picture

  • catch committed 3ea2e2c on 8.3.x
    Issue #893302 by jhodgdon, rayasa: Search ranking based on the factor '...

  • catch committed 3ea2e2c on 8.3.x
    Issue #893302 by jhodgdon, rayasa: Search ranking based on the factor '...

  • catch committed 3ea2e2c on 8.4.x
    Issue #893302 by jhodgdon, rayasa: Search ranking based on the factor '...

  • catch committed 3ea2e2c on 8.4.x
    Issue #893302 by jhodgdon, rayasa: Search ranking based on the factor '...