Search Ranking Recency scoring algorithm [#303574]

Comment	File	Size	Author
#61	303574-search-ranking-61.patch	12.65 KB	jhodgdon

#57	interdiff.txt	3.27 KB	jhodgdon
#57	303574-search-ranking-57.patch	12.65 KB	jhodgdon

#53	explain_cron.png	16.38 KB	jhodgdon
#48	303574-d8-with-test-48.patch	9.18 KB	jhodgdon

#45	303574-d8-with-test-45.patch	9.23 KB	jhodgdon

#42	interdiff.txt	2.69 KB	jhodgdon
#42	303574-d8-with-test-42.patch	9.85 KB	jhodgdon

#39	303574-d8-with-test-reroll.patch	8.93 KB	jhodgdon

#32	303574-d8-test-only-FAIL.patch	6.9 KB	jhodgdon

#32	303574-d8-with-test.patch	8.92 KB	jhodgdon

#30	proposedrecency.png	12.18 KB	jhodgdon
#30	commentstats.png	10.8 KB	jhodgdon
#27	303574.patch	22.04 KB	douggreen

#22	303574.patch	10.92 KB	douggreen

#18	303574.patch	11.15 KB	douggreen

#9	303574.patch	3.91 KB	douggreen

#7	303574.patch	1.57 KB	douggreen

#5	search_ranking_recency.patch	1.42 KB	BlakeLucchesi

#3	search_ranking_recency.patch	1.33 KB	BlakeLucchesi

#1	Picture 1.png	78.48 KB	BlakeLucchesi
#1	Picture 2.png	72.03 KB	BlakeLucchesi
#1	search_ranking_recency.patch	1.37 KB	BlakeLucchesi

Comment #1

BlakeLucchesi CreditAttribution: BlakeLucchesi commented 4 September 2008 at 21:37

File	Size
search_ranking_recency.patch	1.37 KB

Picture 2.png	72.03 KB
Picture 1.png	78.48 KB

Ok sorry, for some reason my patch and screen shots didn't get posted. Here they are:

Log in or register to post comments

Comment #2

4 September 2008 at 21:42

Patch failed to apply. More information can be found at http://testing.drupal.org/node/14060. If you need help with creating patches please look at http://drupal.org/patch/create

Log in or register to post comments

Comment #3

BlakeLucchesi CreditAttribution: BlakeLucchesi commented 4 September 2008 at 21:50

File	Size
search_ranking_recency.patch	1.33 KB

Fixed typo sorry.

Log in or register to post comments

Comment #4

4 September 2008 at 21:52

Patch failed to apply. More information can be found at http://testing.drupal.org/node/14062. If you need help with creating patches please look at http://drupal.org/patch/create

Log in or register to post comments

Comment #5

BlakeLucchesi CreditAttribution: BlakeLucchesi commented 5 September 2008 at 06:03

File	Size
search_ranking_recency.patch	1.42 KB

Ok working patch for real this time. I used textmate's cvs bundle the last few times and it did the diff with respect to the specific file not Drupal root directory.

Log in or register to post comments

Comment #6

douggreen CreditAttribution: douggreen commented 8 September 2008 at 01:42

Status:

Needs review

» Needs work

Blake, could you give a little more detail of what this new algorithm is doing, and possibly add a comment to the code that explains it (for example, the old comment said "Exponential decay with half-life of 6 months, starting at last indexed node"). I know that 2.718 is e, what's the -5 for?

Remembering that these values must be normalized between 0 and 1, I looked at the two algorithms. The low end of both algorithms is 0, but the high end of the current algorithm does not appear to be one. Is it possible that the only problem with the current algorithm is that the decay isn't working right? Should the multiplication by 6.43e-8 (which is 1 / (60 * 60 * 24 * 365.25 / 2) which I think is a normalization factor, be inside the POW()?

I'm changing to CNR, so that we can discuss and write some simpletests. I think that a test that looped through times from a certain date forward, and showed that the algorithm produced the expected results would be helpful. For example, with the current implementation, we should have a test that starts 6 months ago, and shows that newer items are weighted twice as high as the next newer item, and so on.

Log in or register to post comments

Comment #7

douggreen CreditAttribution: douggreen commented 21 April 2010 at 05:31

Status:

Needs work

» Needs review

File	Size
303574.patch	1.57 KB

If this is a problem, we should try to fix. Re-rolling to current HEAD.

Log in or register to post comments

Comment #8

21 April 2010 at 06:00

Status:

Needs review

» Needs work

The last submitted patch, 303574.patch, failed testing.

Log in or register to post comments

Comment #9

douggreen CreditAttribution: douggreen commented 21 April 2010 at 07:16

Status:

Needs work

» Needs review

File	Size
303574.patch	3.91 KB

Attached patch includes a few improvements:

scales from oldest changed to the newest changed, instead of oldest to REQUEST_TIME.
assumes vars are set, by using an update handler and sensible var defaults. These default are only used after the first node is added and before cron runs the first time. Since we now have poor mans cron, this time shouldn't be too long, yet I think the current defaults will not really fail, just not be the best possible values.
better comment
fixes test to set these vars.

Log in or register to post comments

Comment #10

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 21 April 2010 at 07:20

Status:

Needs review

» Reviewed & tested by the community

Looks good to me. Unless the test bot objects...

Log in or register to post comments

Comment #11

Dave Reid

he/him

English

Nebraska USA

CreditAttribution: Dave Reid commented 21 April 2010 at 08:01

Status:

Reviewed & tested by the community

» Needs review

+++ modules/node/node.module	21 Apr 2010 07:15:37 -0000
@@ -1620,17 +1633,21 @@ function node_ranking() {
+        ':time_range' => MAX($newest_time - $oldest_time, 1),

Should be max() instead of MAX since its in PHP. And needs tests possibly? Or test chunk isn't included in the patch?

Powered by Dreditor.

Log in or register to post comments

Comment #12

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 21 April 2010 at 08:03

Status:	Needs review	» Needs work
Issue tags:		+Needs tests

Hrm. If the tests aren't breaking now and this bug exists, it seems like we need extra test coverage for this..?

Log in or register to post comments

Comment #13

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 21 April 2010 at 17:18

- MAX is not in PHP, really, it's a fragment of a query.
- I'm not sure about a test for this. The old algorithm was bad in that it wasn't giving a reasonable distribution of relevancy rankings. We think the new one is better. The current tests for rankings just test the ordering, not the actual numbers, and the ordering hasn't changed...

Log in or register to post comments

Comment #14

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 21 April 2010 at 17:18

Duh. Scratch that about MAX, you are right it is in PHP.

Log in or register to post comments

Comment #15

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 21 April 2010 at 18:03

Yeah, I guess what I'm saying is it seems like the old tests were insufficient for catching this bug. So it seems like we need expanded test coverage for also checking the relevancy rankings distribution, in addition to the ordering. Unless I misunderstand.

Log in or register to post comments

Comment #16

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 21 April 2010 at 20:45

I hear you. But I don't know if not having tests for relevancy distributions should block fixing this really broken one. Can we make it a separate issue, because not having these tests applies to all the other rankings too, and I also think that while writing them is important, it might not be quite as high priority as some other search issues?

Log in or register to post comments

Comment #17

douggreen CreditAttribution: douggreen commented 22 April 2010 at 04:36

Assigned:

Unassigned

» douggreen

I've written a test, but it found bugs, so I'm fixing the bugs.

Log in or register to post comments

Comment #18

douggreen CreditAttribution: douggreen commented 22 April 2010 at 05:46

Status:

Needs work

» Needs review

File	Size
303574.patch	11.15 KB

Attached patch adds a test, changed MAX to max, and fixes a problem we had with our current tests related to changed times not set properly.

Log in or register to post comments

Comment #19

douggreen CreditAttribution: douggreen commented 22 April 2010 at 05:48

Issue tags:

-Needs tests

Removing "needs tests" tag now that there's a test...

Log in or register to post comments

Comment #20

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 22 April 2010 at 06:06

Status:

Needs review

» Reviewed & tested by the community

Assuming that the test bot agrees, I think this is ready to go.

Log in or register to post comments

Comment #22

douggreen CreditAttribution: douggreen commented 22 April 2010 at 06:18

File	Size
303574.patch	10.92 KB

Ugh, removing debugging...

Log in or register to post comments

Comment #23

douggreen CreditAttribution: douggreen commented 22 April 2010 at 06:20

One more note, this patch fixes the range of the recent node ranking to be from the oldest changed node to the newest changed node, rather than from the oldest changed node to the current time. This way, if nothing has changed in the last N days, the newest node still has the highest recency of 1.0.

Log in or register to post comments

Comment #24

chx CreditAttribution: chx commented 22 April 2010 at 06:48

Good to go. Although using EXP() instead of POW would be a slightly better but we are not bound by anything, it's totally arbitrary.

Log in or register to post comments

Comment #25

22 April 2010 at 07:10

Status:

Reviewed & tested by the community

» Needs work

The last submitted patch, 303574.patch, failed testing.

Log in or register to post comments

Comment #26

douggreen CreditAttribution: douggreen commented 22 April 2010 at 07:13

I'll change it to use EXP(). Plus the new test uncovered a bug that I'm working on the fix for.

Log in or register to post comments

Comment #27

douggreen CreditAttribution: douggreen commented 24 April 2010 at 02:16

File	Size
303574.patch	22.04 KB

I need to test this on postgres, and break out some of the changes into a separate patch, but attaching here so we don't lose it.

Log in or register to post comments

Comment #28

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 7 January 2014 at 19:26

Version:	7.x-dev	» 8.x-dev
Issue summary:	View changes
Issue tags:		+Needs backport to D8

We should be working on this in 8.x now.

Log in or register to post comments

Comment #29

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 7 March 2014 at 00:54

Assigned:	douggreen	» jhodgdon
Issue tags:	-Needs backport to D8	+Needs backport to D7

This look like something we should revive. I'm going to look into it. The latest patch is D7 and also has a bunch of other unrelated stuff in it, but it looks like a good start if we can distill out the relevant pieces and put them into 8.x.

Log in or register to post comments

Comment #30

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 7 March 2014 at 21:32

Issue summary:

View changes

File	Size
commentstats.png	10.8 KB
proposedrecency.png	12.18 KB

I did some analysis of all the existing rankings (gotta put all that math background I have to use sometimes!).

There are 3 hook_ranking() implementations in Core:
- comment_ranking() - Nodes with more comments are ranked higher.
- statistics_ranking() - Nodes with more views are ranked higher
- node_ranking() - provides 4 ranking algorithms:
- Keyword relevance (this is the standard ranking provided actually by the SearchQuery class)
- Sticky
- Promoted to front page
- Recent - Nodes more recently posted are ranked higher

Analysis:
a) The "Sticky" and "Promoted" rankings are fine - nodes with those bits set get a 1, and nodes without it get a 0.

b) Comment and Statistics rankings both use an equation like this:
2.0 - 2.0 / (1.0 + count * scale)
where count is the # of comments or # of views for the node, and scale is calculated during cron to be 1 / (max count). This seems to be OK. I analyzed what it would do for nodes with a count of 1 to 30 (assuming the max in the database was 30), and the values ranged from 0.06 to 1. Here's a graph:
Graph of comment/statistics ranking values for node search

c) Recently updated (the subject of this issue) - This is currently in 8.x using an equation that looks like this:
2^ (updated time - scale) [2^ means "2 to the power of"]
where scale is set to the updated time of the latest indexed node.

Two problems with this:
- The latest indexed node may or may not really be the node with the latest updated time that exists.
- As reported originally in this issue, even for a node 2 hours older than the most recent node, since the times are in seconds, this drops way way too quickly down to essentially zero. [So, I confirmed this report.]

The proposal is instead to use
e^(-5 * (1 - (updated - oldest)/(newest - oldest))))
Here's a graph with hours old on the x-axis (ranging from a just-updated node to a year old):
Graph of proposed recency algorithm

This seems reasonable. Content that is a day old is ranked pretty close to 1. Content that is a week old gets a .91 ranking. Content that is a month old gets 0.66. Comment that is 3 months old gets 0.29. Year-old gets 0.0067. In any case, it's way better than what is there now.

But... the EXP function seems to be available in most variants of SQL. However, it looks like neither POW nor EXP is available in SQLite. I wonder if that is OK? We could instead use something more like what the comment and statistics rankings are using, such as:

2.0 - 2.0 / (1.0 + (updated - oldest)/(newest-oldest))

But this doesn't really fall off fast enough. If you have content that ranges from current to 1 year old, the 6-month-old content would still have a ranking of 0.67, and all the stuff from the first month would be above .95. So I think we probably do want it to be exponential. Using exp() is probably not any worse than using pow(), which is what the existing code does, and exp() seems to be supported in most other databases (unlike pow() which is power() in some databases anyway).

So I think we should stick with exp().

d) Keyword relevance

This is using an equation that looks like (scale * score * count) for each node, where scale = 1 / max(sum(score * count)). So this is just linear in the "score * count" value, and it is weighted with the most relevant node having value 1. I think we can assume this is working OK.

So, I've reconfirmed this issue, and validated that the proposed ranking should be much better than what we have now. I just need to take the patch and port it to D8. What it needs to do:

a) Each cron run, calculate the oldest and newest update times for nodes, and store them in the state system.
b) Use the equation above for the ranking instead of what is there.

I've updated the issue summary and I'm working on a patch.

Log in or register to post comments

Comment #31

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 7 March 2014 at 23:27

Hm. In looking at this code vs. what is there, I also noticed that the current code is based on the created time, and the proposal here is based on the updated time. The ranking is called "recently posted" in the UI. So I think we should actually use the created time and not the updated time in this search ranking, which would be more compatible with what was intended in the original code.

Also I have noticed that it is impossible to test vs. the updated time, because Node::preSave() always sets the updated time to the REQUEST_TIME, which is constant in tests, and there is no way to override this.
#2213411: WebTestBase::drupalCreateNode() shouldn't imply you can set 'changed' time.

Log in or register to post comments

Comment #32

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 8 March 2014 at 01:02

Status:

Needs work

» Needs review

File	Size
303574-d8-with-test.patch	8.92 KB

303574-d8-test-only-FAIL.patch	6.9 KB

OK, here's my first go at a patch for this in Drupal 8. Tests work locally -- and they fail where you'd expect without the patch, so I think they are good tests. Let's see what the bot thinks.

I added some tests, and combined some test methods together. There was a test method called testDoubleRankings() but it was kind of lame (it basically just turned on two rankings and verified you could still search, to test #771596: Turning on multiple search weights causes PDO exception during search). So I removed this, and instead added this functionality test to the testRankings() method, which had nodes for all the ranking options, so that I could actually test that if you prioritized one ranking over another, the ordering was correct. Seemed much better.

Log in or register to post comments

Comment #33

8 March 2014 at 02:02

Status:

Needs review

» Needs work

The last submitted patch, 32: 303574-d8-test-only-FAIL.patch, failed testing.

Log in or register to post comments

Comment #34

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 10 March 2014 at 15:58

Status:

Needs work

» Needs review

The test that failed was supposed to fail in exactly the way it did, to illustrate the problem of this issue. So, we just need a review.

Log in or register to post comments

Comment #35

10 March 2014 at 15:59

The last submitted patch, 27: 303574.patch, failed testing.

Log in or register to post comments

Comment #36

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 10 March 2014 at 16:04

12 files were hidden/shown/deleted

File	Size
search_ranking_recency.patch	1.37 KB

Picture 2.png	72.03 KB
Picture 1.png	78.48 KB
search_ranking_recency.patch	1.33 KB

search_ranking_recency.patch	1.42 KB

303574.patch	1.57 KB

303574.patch	3.91 KB

303574.patch	11.15 KB

303574.patch	10.92 KB

303574.patch	22.04 KB

commentstats.png	10.8 KB
proposedrecency.png	12.18 KB

Huh, wonder why the patch in #27 suddenly got retested?!?

Log in or register to post comments

Comment #37

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 31 March 2014 at 22:50

32: 303574-d8-with-test.patch queued for re-testing.

Log in or register to post comments

Comment #38

31 March 2014 at 22:52

The last submitted patch, 32: 303574-d8-with-test.patch, failed testing.

Log in or register to post comments

Comment #39

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 31 March 2014 at 23:57

File	Size
303574-d8-with-test-reroll.patch	8.93 KB

Easy reroll.

Log in or register to post comments

Comment #40

dawehner

German

CreditAttribution: dawehner commented 2 April 2014 at 05:29

+++ b/core/modules/node/node.module
@@ -773,6 +773,18 @@ function node_permission() {
+  if ($changed = db_query("SELECT MIN(created) AS oldest, MAX(created) AS newest FROM {node_field_data}")->fetchObject()) {

Is there any way to use entity query instead? It does support aggregation to a certain limit.

+++ b/core/modules/node/node.module
@@ -794,14 +806,24 @@ function node_ranking() {
+  if (($node_newest = \Drupal::state()->get('node.newest_update_time')) &&
+    ($node_oldest = \Drupal::state()->get('node.oldest_update_time'))) {

Should we store the information in one state to not require two queries, potentially.

Log in or register to post comments

Comment #41

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 2 April 2014 at 14:53

RE #40...

1. No, I don't think entity query would work. At least, the docs for the execute method say that it returns either an int (if it's a count query) or a list of node IDs (if it's not a count query). Neither is what we want for this query. Besides which, it seems like it would be a huge amount of overhead for a simple query like this.

2. Yes, we could store the result as one state variable. I thought it was clearer code this way, partly because I couldn't think of a good name for a combined state variable and these two names, I thought, were clear... I also don't think, in practice, that two state->get() calls right after each other like that will ever require two different queries?

Log in or register to post comments

Comment #42

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 2 April 2014 at 16:58

File	Size
303574-d8-with-test-42.patch	9.85 KB

interdiff.txt	2.69 KB

1 file was hidden/shown/deleted

File	Size
303574-d8-with-test.patch	8.92 KB

dawehner and I discussed this in IRC, and dawehner supplied code in a pastebin that uses entity aggregate query, which I incorporated.

Also, as suggested, the state is now using one variable instead of two.

Here's a new patch. (No changes to the test part of the patch, so I didn't upload another "test-only" patch.)

dawehner credit!!

Log in or register to post comments

Comment #43

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 23 April 2014 at 16:05

42: 303574-d8-with-test-42.patch queued for re-testing.

Log in or register to post comments

Comment #44

23 April 2014 at 16:08

Status:

Needs review

» Needs work

The last submitted patch, 42: 303574-d8-with-test-42.patch, failed testing.

Log in or register to post comments

Comment #45

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 23 April 2014 at 16:36

Status:

Needs work

» Needs review

File	Size
303574-d8-with-test-45.patch	9.23 KB

2 files were hidden/shown/deleted

File	Size
303574-d8-test-only-FAIL.patch	6.9 KB

303574-d8-with-test-reroll.patch	8.93 KB

Straight reroll. Some other patch took care of the hunk in Drupal.php.

Log in or register to post comments

Comment #46

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 23 April 2014 at 16:36

1 file was hidden/shown/deleted

File	Size
303574-d8-test-only-FAIL.patch	6.9 KB

Log in or register to post comments

Comment #47

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 7 May 2014 at 15:28

45: 303574-d8-with-test-45.patch queued for re-testing.

Log in or register to post comments

Comment #48

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 30 May 2014 at 17:49

File	Size
303574-d8-with-test-48.patch	9.18 KB

Reroll for PSR-4

Log in or register to post comments

Comment #49

9 July 2014 at 15:12

jhodgdon queued 48: 303574-d8-with-test-48.patch for re-testing.

Log in or register to post comments

Comment #50

dawehner

German

CreditAttribution: dawehner commented 9 July 2014 at 18:15

Status:

Needs review

» Reviewed & tested by the community

This patch feels quite good

Log in or register to post comments

Comment #51

catch

he/him

English

CreditAttribution: catch commented 10 July 2014 at 12:15

Status:

Reviewed & tested by the community

» Needs review

+++ b/core/modules/node/node.module
@@ -755,6 +755,29 @@ function node_permission() {
+  $max_alias = 'max_created';
+  $result = \Drupal::entityQueryAggregate('node')
+    ->aggregate('created', 'MIN', NULL, $min_alias)
+    ->aggregate('created', 'MAX', NULL, $max_alias)
+    ->execute();

Could someone run an EXPLAIN on the generated query, feels like it should be fine since there's a dedicated index for the created column, but just in case.

Log in or register to post comments

Comment #52

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 10 July 2014 at 18:20

catch: I can do that, but ... is that the relevant query? That query is in the hook_cron(), so even if it is slightly slow, I don't think it matters a whole lot.

The more relevant query is probably the one that is done during search execute (implied by the hook_ranking() implementation).

Anyway, I'll see what I can do about an explain for both of them.

Log in or register to post comments

Comment #53

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 31 July 2014 at 00:08

File	Size
explain_cron.png	16.38 KB

I forgot about this...

OK, so there are a couple of queries here.

The first one is during node_cron():

+  $min_alias = 'min_created';
+  $max_alias = 'max_created';
+  $result = \Drupal::entityQueryAggregate('node')
+    ->aggregate('created', 'MIN', NULL, $min_alias)
+    ->aggregate('created', 'MAX', NULL, $max_alias)
+    ->execute();

This results in a query like this:

SELECT MIN(node_field_data.created) AS min_created, MAX(node_field_data.created) AS max_created FROM node base_table LEFT JOIN node_field_data node_field_data ON node_field_data.vid = base_table.vid

Here's a screen shot of explain output in phpmyadmin:
explain output on above query

I'll look at the other one in the next comment... I really don't know much about explain queries... not sure if this is good or bad, but this particular query is only happening once per cron run, so I am not sure why we even care too much?

Log in or register to post comments

Comment #54

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 31 July 2014 at 00:38

b) The second query happens during searches, and is in the node_ranking() code.

Without this patch, we have:

     $ranking['recent'] = array(
      'title' => t('Recently posted'),
      // Exponential decay with half-life of 6 months, starting at last indexed node
      'score' => 'POW(2.0, (GREATEST(n.created, n.changed) - :node_cron_last) * 6.43e-8)',
      'arguments' => array(':node_cron_last' => $node_cron_last),
     );

And with this patch, we have:

    $ranking['recent'] = array(
      'title' => t('Recently created'),
      'join' => array(
        'type' => 'LEFT',
        'table' => 'node_field_data',
        'alias' => 'nfd',
        'on' => 'nfd.nid = sid',
      ),
      // Exponential decay with half life of 14% of the age range of nodes.
      'score' => 'EXP(-5 * (1 - (nfd.created - :node_oldest) / :node_range))',
      'arguments' => array(
        ':node_oldest' => $node_min_max['min_created'],
        ':node_range' => max($node_min_max['max_created'] - $node_min_max['min_created'], 1),
      ),
    );

These are both added into the search query. So, to turn this on:
- Add a node.
- Go to the Search settings page, edit the NodeSearch page, and set the influence on Recent to 10 (default is 0 meaning "ignore this factor").

Then do a search. This results in a query like this with the patch:

[Note: EDIT - with and without patches were reversed in this section!]

SELECT i.langcode AS langcode, i.type AS type, i.sid AS sid, SUM((CAST (:multiply_0 AS DECIMAL(10,4))) * COALESCE((((CAST (:normalization_0 AS DECIMAL(10,4))) * i.score * t.count)), 0) / (CAST (:total_0 AS DECIMAL(10,4))) + (CAST (:multiply_1 AS DECIMAL(10,4))) * COALESCE((EXP(-5 * (1 - (nfd.created - :node_oldest) / :node_range))), 0) / (CAST (:total_1 AS DECIMAL(10,4)))) AS calculated_score FROM search_index i INNER JOIN node_field_data n ON n.nid = i.sid LEFT JOIN node_field_data nfd ON nfd.nid = sid INNER JOIN search_total t ON i.word = t.word INNER JOIN search_dataset d ON i.sid = d.sid AND i.type = d.type AND i.langcode = d.langcode WHERE (n.status = :db_condition_placeholder_0) AND( (i.word = :db_condition_placeholder_1) )AND (i.type = :db_condition_placeholder_2) AND( (d.data LIKE :db_condition_placeholder_3 ESCAPE '\\') ) GROUP BY i.langcode, i.type, i.sid HAVING (COUNT(*) >= :matches) ORDER BY calculated_score DESC LIMIT 10 OFFSET 0

And without the patch:

SELECT i.langcode AS langcode, i.type AS type, i.sid AS sid, SUM((CAST (:multiply_0 AS DECIMAL(10,4))) * COALESCE((((CAST (:normalization_0 AS DECIMAL(10,4))) * i.score * t.count)), 0) / (CAST (:total_0 AS DECIMAL(10,4))) + (CAST (:multiply_1 AS DECIMAL(10,4))) * COALESCE((POW(2.0, (GREATEST(n.created, n.changed) - :node_cron_last) * 6.43e-8)), 0) / (CAST (:total_1 AS DECIMAL(10,4)))) AS calculated_score FROM search_index i INNER JOIN node_field_data n ON n.nid = i.sid INNER JOIN search_total t ON i.word = t.word INNER JOIN search_dataset d ON i.sid = d.sid AND i.type = d.type AND i.langcode = d.langcode WHERE (n.status = :db_condition_placeholder_0) AND( (i.word = :db_condition_placeholder_1) )AND (i.type = :db_condition_placeholder_2) AND( (d.data LIKE :db_condition_placeholder_3 ESCAPE '\\') ) GROUP BY i.langcode, i.type, i.sid HAVING (COUNT(*) >= :matches) ORDER BY calculated_score DESC LIMIT 10 OFFSET 0

Ugh. I have to go and those placeholders... do you need an explain on this one?

Log in or register to post comments

Comment #55

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 31 July 2014 at 03:35

Status:

Needs review

» Reviewed & tested by the community

These two queries are essentially the same. The difference is that with the patch, the calculation of the score goes from

POW(2.0, (GREATEST(n.created, n.changed) - 100000) * 6.43e-8)

to

EXP(-5 * (1 - (nfd.created - 100) / 50))

(so only created is used instead of changed, and a different mathematical function).

And there is a new join between node_field_data and search_index, on nfd.nid = i.sid

When I run an explain on a query that does this join, it says it is using PRIMARY for that, so I think it is OK. There is not any difference between the two queries in the WHERE, ORDER BY, or GROUP BY, or HAVING, except the score is the ORDER BY, so the ordering will be different. But if anything, the score is involving fewer database fields in the new query.

I'm provisionally going to set this back to RTBC...

Log in or register to post comments

Comment #56

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott commented 4 August 2014 at 08:49

Status:

Reviewed & tested by the community

» Needs work

We're not using the node.cron_last state anymore there we can should remove all sets and deletes of it - NodeSearch::indexNode() and node_uninstall() need fixing.

Log in or register to post comments

Comment #57

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 5 August 2014 at 22:08

Status:

Needs work

» Needs review

File	Size
303574-search-ranking-57.patch	12.65 KB

interdiff.txt	3.27 KB

4 files were hidden/shown/deleted

File	Size
interdiff.txt	2.69 KB
303574-d8-with-test-45.patch	9.23 KB

303574-d8-with-test-48.patch	9.18 KB

explain_cron.png	16.38 KB

Ah, good idea.

Here's a new patch. This removes everything having to do with the state variable 'node.cron_last', and in addition, NodeSearch doesn't require State any more.

Log in or register to post comments

Comment #58

20 August 2014 at 13:17

jhodgdon queued 57: 303574-search-ranking-57.patch for re-testing.

Log in or register to post comments

Comment #59

10 September 2014 at 11:30

anavarre queued 57: 303574-search-ranking-57.patch for re-testing.

Log in or register to post comments

Comment #60

10 September 2014 at 11:31

Status:

Needs review

» Needs work

The last submitted patch, 57: 303574-search-ranking-57.patch, failed testing.

Log in or register to post comments

Comment #61

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 10 September 2014 at 16:14

Status:

Needs work

» Reviewed & tested by the community

File	Size
303574-search-ranking-61.patch	12.65 KB

3 files were hidden/shown/deleted

File	Size
303574-d8-with-test-42.patch	9.85 KB

303574-search-ranking-57.patch	12.65 KB

interdiff.txt	3.27 KB

Straight reroll. This is probably RTBC...

Log in or register to post comments

Comment #62

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott commented 12 September 2014 at 10:24

Issue tags:

+Needs change record

We need a CR to outline the change and removal of node.cron_last

Log in or register to post comments

Comment #63

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 12 September 2014 at 13:31

Issue tags:

-Needs change record

Good idea. https://www.drupal.org/node/2337437 is the change record.

Log in or register to post comments

Comment #64

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott commented 12 September 2014 at 16:35

Version:	8.0.x-dev	» 7.x-dev
Status:	Reviewed & tested by the community	» Patch (to be ported)

Committed 829c436 and pushed to 8.0.x. Thanks!

Log in or register to post comments

Comment #65

12 September 2014 at 18:22

alexpott committed 829c436 on 8.0.x

Issue #303574 by jhodgdon, douggreen, BlakeLucchesi: Fixed Search...

Log in or register to post comments

Comment #66

jhodgdon

she/her

English

CreditAttribution: jhodgdon commented 12 September 2014 at 20:46

Assigned:

jhodgdon

» Unassigned

Thanks alexpott!

Log in or register to post comments

Comment #67

19 November 2015 at 17:53

alexpott committed 829c436 on 8.1.x

Issue #303574 by jhodgdon, douggreen, BlakeLucchesi: Fixed Search...

Log in or register to post comments

Comment #68

3 August 2016 at 00:04

alexpott committed 829c436 on 8.3.x

Issue #303574 by jhodgdon, douggreen, BlakeLucchesi: Fixed Search...

Log in or register to post comments

Comment #69

3 August 2016 at 00:05

alexpott committed 829c436 on 8.3.x

Issue #303574 by jhodgdon, douggreen, BlakeLucchesi: Fixed Search...

Log in or register to post comments

Comment #70

27 January 2017 at 17:53

alexpott committed 829c436 on 8.4.x

Issue #303574 by jhodgdon, douggreen, BlakeLucchesi: Fixed Search...

Log in or register to post comments

Comment #71

27 January 2017 at 17:52

alexpott committed 829c436 on 8.4.x

Issue #303574 by jhodgdon, douggreen, BlakeLucchesi: Fixed Search...

Log in or register to post comments

Search Ranking Recency scoring algorithm

Problem/Motivation

Proposed resolution

Remaining tasks

User interface changes

API changes

Original report by @BlakeLucchesi

Comments

Change records for this issue