Problem/Motivation

This is a follow up of #2158229: Site search indexing broken on PostgreSQL As soon as your search database contains some entries you will not be able to do a search on PostgreSQL, because the search results page throws an error of: The website has encountered an error. Please try again later.
From the PostgreSQL logs, this error is due a missing GROUP BY statement:

ERROR: column "i.langcode" must appear in the GROUP BY clause or be used in an aggregate function at character 8
STATEMENT: SELECT i.langcode AS langcode, i.type AS type, i.sid AS sid, SUM((2.5797131359 * i.score * t.count)) AS calculated_score
FROM
search_index i
INNER JOIN node_field_data n ON n.nid = i.sid
INNER JOIN search_total t ON i.word = t.word
INNER JOIN search_dataset d ON i.sid = d.sid AND i.type = d.type
WHERE (n.status = '1') AND( (i.word = 'drupal') )AND (i.type = 'node_search') AND( (d.data::text ILIKE '% drupal %') )
GROUP BY i.type, i.sid
HAVING (COUNT(*) >= '1')
ORDER BY calculated_score DESC
LIMIT 10 OFFSET 0

Proposed resolution
The fix is to add "i.langcode" to the GROUP BY clause in SearchQuery.php in search module.

Remaining tasks
Testing.

User interface changes
None.

API changes
None.

Comments

bzrudi71’s picture

Assigned: Unassigned » bzrudi71
Issue summary: View changes
Status: Active » Needs review
StatusFileSize
new594 bytes
jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

Thanks! Simple fix and looks correct to me, from what I know about PostgreSQL vs. MySQL.

chx’s picture

Status: Reviewed & tested by the community » Needs review

I am much less sure about it. Not saying it's wrong because I do not have a good grasp on what the table contains multilanguage wise but a) shouldn't it be the first in grouping b) instead of grouping; if it's always the same in a group; why not MAX(i.langcode) AS langcode in select and be done? Of course FIRST or LAST would be the best but guess what? MySQL doesn't support that.

jhodgdon’s picture

I'm not sure about the order of groupings. Does it matter?

But yes I think it needs to be a group by and not a max(), first() etc. For a given value of sid, there can be multiple languages. For instance, when the Node module uses the search index to index a node, it adds an entry for each language that exists on the node, and "sid" here is the node ID.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Sorry, I was wrong, the order of grouping (here) doesn't matter. In ANSI SQL it doesn't matter and we are not using the MySQLisms where it does.

jhodgdon’s picture

Status: Reviewed & tested by the community » Needs work

I was too quick to set this to RTBC, as pointed out by alexpott in IRC today.

So. The problem is that i.langcode is not always added to the search query, and in particular, at the time of SearchQuery::executeFirstPass(), in particular, it has not yet been added to the query, and it will not be there for all search queries for general search plugins.

So, let's step back here and look at NodeSearch::execute().

First off, this method will add i.langcode to the query if the user is running an advanced search that is limited by language. This code will most likely not work in PostgreSQL because it is adding where() clauses without necessarily adding the fields to the query. So we should test advanced searches in PostgreSQL and see if they are working.

Second, this method will add i.langcode **after excecuteFirstPass()**, in this code:

    $find = $query
      // Add the language code of the indexed item to the result of the query,
      // since the node will be rendered using the respective language.
      ->fields('i', array('langcode'))
      ->limit(10)
      ->execute();

So here, where it does fields() it also needs to add the groupBy there.

So we need a new patch and we also need to test advanced searches on PostgreSQL to see if they work.

bzrudi71’s picture

I will do some advanced testing with languages at least tomorrow.

bzrudi71’s picture

@jhodgdon Okay, here we go...

I started testing of advanced search and did the following:

1. Enabled Language and Content translation
2. Made Article node type translatable
3. Enabled user translation for Body field
4. Created three Articles, one in English, one in German, one in Dutch language
5. All Articles contain some common keywords to search for e.g. 'testkeyword"

Now started an advance search just for 'testkeyword' in the 'Containing any of the words' field with no other options selected. Works without any problem. All three articles show up with the highlighted keyword.

Next started the same advanced search with just one selected language. Works without problems but still shows all three articles.

Next up, same search with two of three languages selected. Still no error, but again all three articles show up.

So it seems the language select options are silently ignored, hmmh... I now added a new node type to see if search limiting by node type works. And yes that worked, only the selected node type containing the keyword shows up.

Guess search limited by language should be tested with MySQL too as it may be a bug and not a PG problem. That said, advanced search unexpectedly shows no PDOExceptions;-)

bzrudi71’s picture

Just to make sure. I looked into the search_index and search_dataset relations and the langcode is set right for the created articles (e.g. en/de/nl), so no problem here...

jhodgdon’s picture

Did you test with your earlier (incorrect) patch in there? That could be a problem.

I think we have tests (I hope we have tests!) that verify the language advanced search works, and if so they have been run on MySQL... I'm testing now and will report back shortly.

jhodgdon’s picture

I just tested and I agree that in MySQL the advanced search is totally broken by language. I've just filed a separate issue on that:
#2161067: Advanced search with Language filter does not work

So let's in this issue try to just fix the PostgreSQL bug on plain non-advanced search.

bzrudi71’s picture

Good to see it's not always PG breaking things ;-)
So, what's next? I added the groupBy('i') to the find query as you suggested. Currently it doesn't have any effect. This may change once search limited by language is working, so should I rewrite patch to also include the groupBy('i')?
Please advice, thanks!

bzrudi71’s picture

Assigned: bzrudi71 » Unassigned

Unfortunately I'm off for 4 weeks for a trip and can't work on this before February, so anyone is invited to pick up and fix this issues in the meantime ;-)

jhodgdon’s picture

Status: Needs work » Needs review
StatusFileSize
new828 bytes

I installed D8 with PostgreSQL today and tested this. It was still broken, and running the existing Search module tests was sufficient to trigger this error.

I analyzed what was going on. The error you get is:

PDOException: SQLSTATE[42803]: Grouping error: 7 ERROR: column "i.langcode" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT i.langcode AS langcode, i.type AS type, i.sid AS sid,... ^ in PDOStatement->execute() (line 61 of /home/jhodgdon/gitrepos/drupal_pg8/core/lib/Drupal/Core/Database/Statement.php).

This is happening at the main execute() step in SearchQuery (after the first pass query is done). The reason is that NodeSearch has a line saying:

   // Load results.
    $find = $query
      // Add the language code of the indexed item to the result of the query,
      // since the node will be rendered using the respective language.
      ->fields('i', array('langcode'))

So it is adding the i.langcode field to the query. This doesn't work in PostgreSQL because the SearchQuery::execute() method has made it ba an aggregate query:

   $this
      ->condition('i.type', $this->type)
      ->groupBy('i.type')
      ->groupBy('i.sid')
      ->having('COUNT(*) >= :matches', array(':matches' => $this->matches));

So now that NodeSearch is adding a field, it also needs to either add a groupBy for the field, or aggregate it.

I believe in this case that the correct answer is to add a groupBy -- we would want to have each translation of each node as a separate search result.

So, the attached patch fixes the problem.

Do we have a way to trigger a PostgreSQL test? On my local machine, it is fixing the tests so that instead of mostly failing, they mostly work... So far at least; I'm still running them and they take a while, but tests that were failing before this patch are now passing, so I've verified by manually running the existing Search module tests on PostgreSQL that it works. I also verified manually that without this patch, I get the above error when I try to run any search on nodes, and with this patch, node searching works.

jhodgdon’s picture

Priority: Normal » Major

Also, functionality that is broken completely for PostgreSQL is a "major" bug.

bzrudi71’s picture

I came up with the exactly same solution while doing more PG local testing but didn't upload the patch yet because of the ongoing fixes in search to prevent re-rolls. That said, the patch works as expected and makes search tests pass locally. Except one failing test that is related to #1003788: PostgreSQL: PDOException:Invalid text representation when attempting to load an entity with a string or non-scalar ID but is out of scope here...

Regarding TestBot, sadly there is not much progress and TestBot runs out of Disk space while testing :-( please see: this comment from jthorson...

So +1 for RTBC and thanks for setting up PostgreSQL!

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: +Quick fix

Rather than +1 for RTBC, how about just marking it RTBC? Or I will, based on comment #16 -- it's a one-line fix anyway and fairly obvious I think... Thanks for the review!

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed e17552a and pushed to 8.x. Thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.