There's a common misunderstanding of (e)dismax, which is the default query parser used by Search API Solr Search. Even the multilingual backend adds more query fields using the qf parameter. But that leads to fewer or no results in some cases, especially if people add more fulltext fields to the index and configure an exposed fulltext search in Views that queries multiple fields.

See https://opensourceconnections.com/blog/2013/04/15/querying-more-fields-m...

Querying More Fields != More Results

The Search API database backend doesn't know anything about a dismax concept at all. And the Solr "Any Schema Backend" might hit a select handler that is configured with any kind of strange combination of query parsers. So let's force explicit query handlers and build an edismax query the way most people understand the documentation.

Example: If we send the parameters q=(+a +b) and qf=field1^50 field2, the edismax query parser creates something like

DISMAX(field1:a^50 OR field2:a) AND DISMAX(field1:b^50 OR field2:b)

But what we and most users expect is something like

DISMAX((field1:a AND field1:b)^50 OR (field2:a AND field2:b))

That would mostly be what the database backend does, with the benefit of an edismax on top.
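
The per-field shape above can be sketched as a small helper. This is only an illustration in Python, not the code from any patch on this issue: `build_per_field_query` and its arguments are hypothetical names, the helper does no escaping of special characters, and it joins the field groups with a plain boolean OR, so scores are summed rather than combined by a true DisjunctionMax.

```python
def build_per_field_query(terms, boosts):
    """Build a Lucene-syntax query that requires all terms within one field.

    terms:  list of already-escaped search keys, e.g. ["a", "b"]
    boosts: mapping of field name to boost, e.g. {"field1": 50, "field2": 1}
    (Both names are hypothetical; they are not part of any Solr API.)
    """
    groups = []
    for field, boost in boosts.items():
        # Every term is mandatory (+) inside its own field group.
        required = " ".join(f"+{field}:{term}" for term in terms)
        group = f"({required})"
        if boost != 1:
            group += f"^{boost}"
        groups.append(group)
    # Groups without a +/- prefix are optional clauses. With the lucene
    # parser's default OR operator, a document matches if any group matches.
    return " ".join(groups)


print(build_per_field_query(["a", "b"], {"field1": 50, "field2": 1}))
# (+field1:a +field1:b)^50 (+field2:a +field2:b)
```

Sent with defType=lucene, such a string keeps the "all keys in at least one field" semantics of the database backend while still applying the per-field boosts.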

At the moment it is very easy to end up with zero results if you simply add many fulltext fields, or if you use the multilingual backend and enter stopwords, or whenever you do something that is too sophisticated for the edismax query parser.
On the other hand it will be a huge amount of work and an API break to adjust the code to use the standard query parser again, including all its disadvantages.
Therefore I suggest keeping the current erroneous usage of the solarium edismax component in our hooks and tweaking the query right before sending it to Solr.
Maybe we can add a switch to turn that tweaking off for real experts that don't use Views but just the API.
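
The stopword case can be reproduced with a toy model. The following Python sketch is not Solr code: the field names `text_en` and `text_und`, the one-word stopword list, and the matching functions are all assumptions made for illustration. It assumes every search key is mandatory, as in q=(+a +b), and that a field whose analyzer drops a key contributes no clause for that key.

```python
STOPWORDS = {"the"}


def analyze_with_stopwords(text):
    # Stand-in for a language-specific field type that removes stopwords.
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]


def analyze_plain(text):
    # Stand-in for a language-neutral field type without stopword removal.
    return text.lower().split()


# Hypothetical field names, chosen for this example only.
ANALYZERS = {"text_en": analyze_with_stopwords, "text_und": analyze_plain}


def index_document(doc):
    # Analyze each field's text with that field's own analyzer.
    return {f: set(analyze(doc.get(f, ""))) for f, analyze in ANALYZERS.items()}


def matches_per_term(indexed, terms, fields):
    # edismax-like shape: one group per key, and every group must match.
    for term in terms:
        clauses = []
        for field in fields:
            tokens = ANALYZERS[field](term)
            if tokens:  # a field that drops the key adds no clause
                clauses.append(tokens[0] in indexed[field])
        if clauses and not any(clauses):
            return False
    return True


def matches_per_field(indexed, terms, fields):
    # Expected shape: all keys must match within at least one field.
    for field in fields:
        tokens = [tok for term in terms for tok in ANALYZERS[field](term)]
        if tokens and all(tok in indexed[field] for tok in tokens):
            return True
    return False


doc = index_document({"text_en": "The quick fox"})
terms = ["the", "fox"]

print(matches_per_term(doc, terms, ["text_en"]))               # True
print(matches_per_term(doc, terms, ["text_en", "text_und"]))   # False
print(matches_per_field(doc, terms, ["text_en", "text_und"]))  # True
```

In this model, adding the second field turns a hit into a miss for the per-term shape, because "the" survives analysis only in `text_und` and the document has no content there. The per-field shape still matches through `text_en`.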

Comment | File | Size | Author
#5 | 2948469.patch | 11.13 KB | mkalkbrenner
#3 | 2948469.patch | 4.21 KB | mkalkbrenner

Comments

mkalkbrenner created an issue. See original summary.

mkalkbrenner:

Status: Active » Needs review
Attachment: 2948469.patch (new, 4.21 KB)

Here's a first patch. Testers welcome!

mkalkbrenner:

Title: Wrong usage of edismax causes fewer or no results » Wrong usage of edismax causes fewer or no results on Solr 7.1 and earlier; Exceptions on Solr 7.2

Unfortunately this patch only solves the issue for Solr versions up to 7.1.
For 7.2 it doesn't work anymore. The Solr 7.2 upgrade notes say:

Users should be aware of the following major changes from v7.1:

Starting a query string with local parameters {!myparser …} is used to switch from one query parser to another, and is intended for use by Solr system developers, not end users doing searches. To reduce negative side-effects of unintended hack-ability, Solr now limits the cases when local parameters will be parsed to only contexts in which the default parser is "lucene" or "func".

So, if defType=edismax then q={!myparser …} won’t work. In that example, put the desired query parser into the defType parameter.

Another example is if deftype=edismax then hl.q={!myparser …} won’t work for the same reason. In this example, either put the desired query parser into the hl.qparser parameter or set hl.qparser=lucene. Most users won’t run into these cases but some will need to change.

If you must have full backwards compatibility, use luceneMatchVersion=7.1.0 or an earlier version.

The eDisMax parser by default no longer allows subqueries that specify a Solr parser using either local parameters, or the older _query_ magic field trick.

For example, {!prefix f=myfield v=enterp} or _query_:"{!prefix f=myfield v=enterp}" are not supported by default any longer. If you want to allow power-users to do this, set uf=*,_query_ or some other value that includes _query_.

If you need full backwards compatibility for the time being, use luceneMatchVersion=7.1.0 or something earlier.

see https://lucene.apache.org/solr/guide/7_2/solr-upgrade-notes.html

But that change introduces another critical issue as well. We already use different query parsers within an edismax query, especially in the location searches. These will lead to an error if you replace the currently preset luceneMatchVersion=7.0 with luceneMatchVersion=7.2.
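
To make the quoted notes concrete, here is a hedged Python sketch of the request parameters involved. The function names are hypothetical, the values are taken from the upgrade notes quoted above, and none of this has been verified against a running Solr or against the location searches.

```python
from urllib.parse import urlencode


def params_switching_parser_in_q(query):
    # Style that, per the quoted notes, stops working on 7.2:
    # edismax as default parser, another parser selected via local params.
    return {"defType": "edismax", "q": "{!lucene}" + query}


def params_with_parser_in_deftype(query):
    # Alternative named in the notes: put the desired parser into defType.
    return {"defType": "lucene", "q": query}


def params_allowing_nested_parsers(query):
    # Opt-in named in the notes for subqueries inside edismax.
    return {"defType": "edismax", "q": query, "uf": "*,_query_"}


query = "(+field1:a +field1:b)^50 (+field2:a +field2:b)"
# Encoded query string as it could be appended to a select handler URL.
print(urlencode(params_with_parser_in_deftype(query)))
```

The third variant would be the one to try for nested parsers inside an edismax query, but whether it covers the location searches is an open question here.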

mkalkbrenner:

Attachment: 2948469.patch (new, 11.13 KB)

OK, I swapped the edismax parser for the standard lucene parser, the only one that gives us full flexibility. Therefore I refactored the query builder. Most users shouldn't notice a difference, except that they get better(!) search results.
But for advanced users I'll re-add edismax on demand in a follow-up issue.

  • mkalkbrenner committed 6b42c2c on 8.x-2.x
    Issue #2948469 by mkalkbrenner: Wrong usage of edismax causes fewer or...
mkalkbrenner:

Status: Needs review » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.