Part of #2195815: [meta] Search module tests do not pass on PostgreSQL

I ran the Search module tests on PostgreSQL today, with the patch for #2158339: Search results page broken on PostgreSQL applied.

The test SearchRankingTest failed with an exception, in the "number of views" step:

PDOException: SQLSTATE[22P02]: Invalid text representation: 7 
ERROR: invalid input syntax for integer: "0.2" LINE 1: ...CE(( 2.0 - 2.0 / (1.0 + node_counter.totalcount * '0.2')), 0... ^: 
SELECT i.langcode AS langcode, i.type AS type, i.sid AS sid, SUM(CAST(:multiply_0 AS DECIMAL) * COALESCE(( 2.0 - 2.0 / (1.0 + node_counter.totalcount * :scale)), 0) / CAST(:total_0 AS DECIMAL)) AS calculated_score FROM {search_index} i INNER JOIN {node_field_data} n ON n.nid = i.sid INNER JOIN {search_total} t ON i.word = t.word LEFT JOIN {node_counter} node_counter ON node_counter.nid = i.sid INNER JOIN {search_dataset} d ON i.sid = d.sid AND i.type = d.type WHERE (n.status = :db_condition_placeholder_0) AND( (i.word = :db_condition_placeholder_1) )AND (i.type = :db_condition_placeholder_2) AND( (d.data ILIKE :db_condition_placeholder_3) ) GROUP BY i.type, i.sid, i.langcode HAVING (COUNT(*) >= :matches) ORDER BY calculated_score DESC LIMIT 10 OFFSET 0; Array ( [:db_condition_placeholder_0] => 1 [:db_condition_placeholder_1] => rocks [:db_condition_placeholder_2] => node_search [:db_condition_placeholder_3] => % rocks % [:matches] => 1 [:scale] => 0.2 [:multiply_0] => 10 [:total_0] => 10 ) 
in Drupal\node\Plugin\Search\NodeSearch->execute() (line 232 of /home/jhodgdon/gitrepos/drupal_pg8/core/modules/node/lib/Drupal/node/Plugin/Search/NodeSearch.php). 

Hm... this is weird, I thought we took the CAST statements out of this in #893302: Search ranking based on the factor "number of comments" & "No of Page Views" is broken -- will need to investigate.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

bzrudi71’s picture

The scale value is quoted e.g. '0.2' in the database query which is wrong and will fail, see PG log:

ERROR:  invalid input syntax for integer: "0.2" at character 146
STATEMENT:  SELECT i.langcode AS langcode, i.type AS type, i.sid AS sid, SUM(CAST('10' AS DECIMAL) * COALESCE(( 2.0 - 2.0 / (1.0 + node_counter.totalcount * '0.2')), 0) / CAST('10' AS DECIMAL)) AS calculated_score
	FROM 
	simpletest572179search_index i
	INNER JOIN simpletest572179node_field_data n ON n.nid = i.sid
	INNER JOIN simpletest572179search_total t ON i.word = t.word
	LEFT JOIN simpletest572179node_counter node_counter ON node_counter.nid = i.sid
	INNER JOIN simpletest572179search_dataset d ON i.sid = d.sid AND i.type = d.type
	WHERE  (n.status = '1') AND( (i.word = 'rocks') )AND (i.type = 'node_search') AND( (d.data::text ILIKE '% rocks %') )
	GROUP BY i.type, i.sid, i.langcode
	HAVING  (COUNT(*) >= '1') 
	ORDER BY calculated_score DESC
	LIMIT 10 OFFSET 0
jhodgdon’s picture

Right. And I think this is the part of the query that's generated inside of the SearchQuery class, not the parts specifically generated in NodeSearch from the hook_rankings() stuff from Statistics and other modules.

I think we're going to have to revisit the really convoluted code in SearchQuery that adds the scale to the rankings stuff. We should also look at
#2194403: Remove unnecessary cast to float from comment_ranking() and statistics_ranking()
in PostgreSQL and see if we can clean all of that code up, plus making sure it works in PostgreSQL. It's really... weird queries and weird code. To say the least.

jhodgdon’s picture

Assigned: Unassigned » jhodgdon

I am working on a patch for this issue.

jhodgdon’s picture

OK, I don't get it.

What's happening here is that statistics_ranking() is setting a "score" expression of:

'2.0 - 2.0 / (1.0 + node_counter.totalcount * :statistics_scale)'

and then it is adding an argument:

       'arguments' => array(':statistics_scale' => (float) \Drupal::state()->get('statistics.node_counter_scale') ?: 0),

However, when PostgreSQL substitutes that argument into the query, for some reason it is going in as '0.2' instead of the number 0.2.

I do not understand why that would be. Any ideas?

jhodgdon’s picture

I did some further debugging.

In my test site with manual testing, I made some content and set Search to use Number of Views.

I'm getting pretty much the same error that comes up in the tests, with a slightly different number for the scale coming through:

PDOException: SQLSTATE[22P02]: Invalid text representation: 7 ERROR: invalid input syntax for integer: "0.5" LINE 1: ...SCE((2.0 - 2.0 / (1.0 + node_counter.totalcount * '0.5')), 0... ^ in PDOStatement->execute() (line 61 of core/lib/Drupal/Core/Database/Statement.php).

The full query in the error is this, as reported above:

SELECT i.langcode AS langcode, i.type AS type, i.sid AS sid, SUM(:multiply_0 * COALESCE((2.0 - 2.0 / (1.0 + node_counter.totalcount * :statistics_scale)), 0) / :total_0) AS calculated_score
FROM 
{search_index} i
INNER JOIN {node_field_data} n ON n.nid = i.sid
INNER JOIN {search_total} t ON i.word = t.word
LEFT JOIN {node_counter} node_counter ON node_counter.nid = i.sid
INNER JOIN {search_dataset} d ON i.sid = d.sid AND i.type = d.type
WHERE  (n.status = :db_condition_placeholder_0) AND( (i.word = :db_condition_placeholder_1) )AND (i.type = :db_condition_placeholder_2) AND( (d.data ILIKE :db_condition_placeholder_3) )
GROUP BY i.type, i.sid, i.langcode
HAVING  (COUNT(*) >= :matches) 
ORDER BY calculated_score DESC
LIMIT 10 OFFSET 0

So in core/lib/Drupal/Core/Database/Query::arguments(), I put in some debugging and I verified that in the arguments array that this function returns, the argument called ':statistics_scale' is definitely a float value (not a string).

I also looked into the database code, and it looks to me like once arguments() is called to get the substitution list, it is just passed into the PDO execute() method for final substitution.

So I still don't get it. Why would PDO decide that this particular argument should be substituted in as a string?

jhodgdon’s picture

Status: Active » Needs review
FileSize
6.92 KB

It looks like if you embed a numeric placeholder deep inside a query expression like this, the PostgreSQL PDO driver simply gets confused and it might sometimes put in a string, and although PostgreSQL can sometimes do automatic casts, it doesn't always work out.

So the only solution appears to be to put a CAST into the SQL, which will negate any possible problems with this.

Here's a patch. I'm going to run all the Search tests with this patch installed and see if they work, while I set the bot loose on the MySQL tests.

Status: Needs review » Needs work

The last submitted patch, 6: 2195907-search-pg-6.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
6.92 KB

Oops, I missed a ) in comment_ranking() in the score expression. This should fix it. I had tested manually on PostgreSQL with the statistics ranking turned on, but not comment ranking, which I broke with the patch.

So, with this new patch, all of the Search module tests pass in PostgreSQL except for one that is a separate and unrelated issue already.

And by the way, this patch also gets rid of the whole idea of the kludgy code that had in the past caused problems with international users not being able to use search, if they had "," as their decimal point. This was fixed in 8 but it was still a kludge; it's slightly less so now.

bzrudi71’s picture

Status: Needs review » Reviewed & tested by the community

I even spend a hour and more to trace down where the string cast happened in the code with no luck :-( The argument in the score expression in addScore() is for sure still a float, I checked that in several places with debug messages. It's still even a float in Query::arguments() but then at some point, and that was where I gave up, it get's casted to string, no idea. So I think the proposed solution to do the SQL casting is the way to go.
Patch from #8 makes PG pass all the search tests, beside the one left failing as noted in #8. As the SQL casting seems to work even for MySQL bot this seems RTBC from my side.

btw, nice normalization stuff cleanup :-)

jhodgdon’s picture

Great!

Just a note for whoever commits this: It will conflict with #2101183: Move {comment_entity_statistics} to proper service, but this issue has better code in it that will work with PostgreSQL. We need to make sure the better code here eventually gets into the new class being created for comment statistics on that other issue.

ianthomas_uk’s picture

Shouldn't we add a comment explaining why the cast is there? We've already removed it once because it looked unnecessary.

jhodgdon’s picture

Status: Reviewed & tested by the community » Needs work

OK.

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
7.79 KB
4.12 KB

OK, here's a new patch. The only difference is that I expanded the code comments in comment.module and statistics.module. Interdiff also attached.

ianthomas_uk’s picture

Status: Needs review » Needs work

The SQL-cast comments look good, but I don't think the php-cast changes are needed at all.

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
7.54 KB
1.48 KB

Good point.

ianthomas_uk’s picture

Status: Needs review » Reviewed & tested by the community

That looks better, RTBC assuming tests pass.

jhodgdon’s picture

Assigned: jhodgdon » Unassigned

Unassigning because I know some potential committers ignore issues in the RTBC queue assigned to me.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Wow, nice sleuthing!

Committed and pushed to 8.x. Thanks! One step closer to PostgreSQLishisness. :)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

amateescu’s picture

Just for posterity, I'd like to note that SQLite does not have a DECIMAL data type, so the search ranking algorithm is currently broken for SQLite. The good news is that it's being fixed in #2454731: SQLite: Fix search\Tests\SearchRankingTest :)