Hoping you might be able to point me in the right direction.

We have an indexed search (using a SOLR server) and the view we have to search the indexed:node is working fine. Essentially, it is an exposed Fulltext search from a form that displays the results.

However, when anything is put in quotes -- no results are ever found.

So a search for Pizza Pie will give results, a search for "pizza pie" will not.

Is there a tweak or setting I should be looking at, either on the view, in the Search API module, or in the Solr indexing itself, that could shed some light on this problem?

Been banging my head against a wall for days.

Workflow:
Standard HTML form that posts the field 'search_api_views_fulltext' to /search
/search is the page created by a view that takes the 'search_api_views_fulltext' parameter for the exposed filter search.

So a search for "pizza pie" hits
/search?search_api_views_fulltext="Pizza+pie"

Any help appreciated. Or let me know if I should be looking somewhere else.

Comment | File | Size | Author
#4 | brokensearch.tgz | 12.12 KB | jp.stacey

Comments

drunken monkey’s picture

Make sure that in the "Query settings" (in the view, under "Advanced"), "Parse mode" is NOT set to "Single term" (both others should work fine). Also make sure that for the "Search: Fulltext search" filter "Use as" is set to "Search keys".
Other than that, you could post the query from the Solr log here and let me see whether I see something odd. But other than the two suggestions above, I'm not sure what could cause this.

CLEE25’s picture

Thank you so much for the reply. I really appreciate you taking the time to help.

When my query string is set to 'Direct Query' I can get results when putting the search phrase in quotes -- although it just seems to be ignoring the quotes.

For example, if I search for "King David" I get results for anything with the word "king" or "David" -- not just the phrase "King David."

at least that is better than getting nothing -- which was happening until I got your advice :)

However, when I changed the parsing from 'Direct Query' to 'Multiple Term' -- and put something in quotes, I get nothing.

My USE AS is set to "Search Keys", and I have clicked 'Contains all words' as well, but no joy.

I don't have access to the logs -- Pantheon (my host) doesn't have them available. They do have a query analyzer that I am trying to work with but having no luck.

If I can get some sort of log or this analyzer to work I will post more.

drunken monkey’s picture

Issue tags: -quotes, -encoding, -Phrase Searching

Are you absolutely sure that your index is set to use the Solr server? Ignoring quotes is usually only what the database backend does.
Which version of Solr are you using? While I can't think of any specific relevant problem at the moment, there are a lot of issues with Solr 1.4 which could probably also cause this behavior.
If you don't have access to the logs, you can maybe insert debugging code into your Search API Solr Search module?
Near the bottom of includes/solr_connection.inc, insert watchdog('debug_solr', $queryString, NULL, WATCHDOG_DEBUG); right after $queryString = $this->httpBuildQuery($params);. Then, after a query is sent to Solr, you will find it in your Drupal site's "Recent log messages". (This assumes that you are using a recent version of the Solr module. Otherwise, you should update in any case (the Search API module as well), since maybe your problem is already fixed in a newer version.)

(Also, please don't put tags on issues, these are primarily for maintainer use. See the issue tag guidelines.)

jp.stacey’s picture

Hi,

we've got a similar problem with a search-powered view: any ideas? We're using:

* search_api 7.x-1.12+2-dev
* search_api_solr 7.x-1.5+1-dev
* Solr 3.6.2 (.2013.12.15.17.23.54)
* Unedited version of schema drupal-4.2-solr-3.x
* Pantheon (but the behaviour is reproducible on a local build with standard solrconfig.xml)

(But since we can see the URLs hitting Solr (see below), I don't think module versions as such can be the issue.)

When I search for e.g. deep biosphere (no quotes) I get plenty of results, some containing the phrase. When I add the quotes, e.g. "deep biosphere", I get no results. Here are the direct Solr URLs (based on catalina.out), with and without quotes. (The "minimum match" mm is a parameter our custom code is adding, as Pantheon sets it to 1, which returns a lot of irrelevant results. Removing it doesn't affect this issue.)

We've set the following in the view:

* Filter: Search: Fulltext search: "Use as" set to "Search keys"
* Advanced: Query Settings: "Parse mode" set to "Direct query"

The title field is set to type="fulltext" for this index, and under filters, "Ignore case" and "Tokenizer" are both set to run on title. I mention title, because that's a field that we know should match the quoted phrase.

I've attached a zipfile of a feature with our view, service configuration and facet setup: let me know if you need any more information.

slinky’s picture

We are just using Search API and having this same problem, same results and getting inconsistent answers elsewhere.

1. Filter: Search: Fulltext Search: "Use As": set to "Search Keys"
2. Advanced: Query Settings: "Parse Mode" set to "Multiple Terms" which is defined as:

The query is interpreted as multiple keywords separated by spaces. Keywords containing spaces may be "quoted". Quoted keywords must still be separated by spaces.

Search for "pizza pie" results in the search query exactly as stated above - search?search_api_views_fulltext="pizza+pie" - and yields results containing both "pizza" and "pie" but not the phrase "pizza pie". A few different developers have put many hours into this, and none has yet found out why the Parse Mode described above treats the words as if the quotes didn't exist instead of matching the phrase itself.

EDIT - I'm noticing that using the quotes provides a different result. It searches ONE field for ALL terms. Without quotes, as long as both words appear in any of the search fields enumerated, that generates a hit. But with quotes, both words have to be in the same field (e.g. in title and not one word in title and one in body.)

drunken monkey’s picture

@ slinky: Which backend are you using (Solr, database, Elasticsearch, …)?
The database backend doesn't support phrase queries, it will just ignore the quotes (more or less).

awolfey’s picture

I'm also having this issue using Pantheon Solr backend.

Filter: Search: Fulltext Search: "Use As": set to "Search Keys"
Searching for a known phrase in indexed nodes: when the parse mode is set to direct query, quote marks are ignored and results include partial phrase matches. When it is set to multiple terms, no results are returned.

Thanks for any help.

drunken monkey’s picture

@ awolfey: I cannot reproduce this behavior. Maybe Pantheon preprocesses the request in some way?
Could you post the generated Solr request here? (See the handbook.)

awolfey’s picture

Thanks.

Without quotes:

https://xxxxxxxxxxxx-8b45-3a77ecc1871c-private.panth.io:449/sites/self/environments/dev/index/select?fl=item_id%2Cscore&qf=tm_search_api_aggregation_1%5E21.0&qf=tm_search_api_viewed%5E0.1&fq=index_id%3A%22full_site%22&fq=hash%3A6s8uca&start=0&rows=25&sort=score%20desc&wt=json&json.nl=map&q=%28%28%22health%22%29%20OR%20%28%22homes%22%29%29

With quotes:

https://xxxxxxxxxxxx-8b45-3a77ecc1871c-private.panth.io:449/sites/self/environments/dev/index/select?fl=item_id%2Cscore&qf=tm_search_api_aggregation_1%5E21.0&qf=tm_search_api_viewed%5E0.1&fq=index_id%3A%22full_site%22&fq=hash%3A6s8uca&start=0&rows=25&sort=score%20desc&wt=json&json.nl=map&q=%22health%20homes%22

Pantheon uses its own connection class PantheonApachesolrSearchApiSolrConnection.
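
For anyone comparing the two requests above, the only difference is the q parameter, which is easier to see once it is URL-decoded. The following is a small Python sketch for doing that (Python is used purely for illustration; it is not part of Drupal or Search API, and the helper name solr_q is made up). The query strings are copied from the two URLs above, with the host part left out.

```python
from urllib.parse import parse_qs

# Parameters shared by both requests, copied from the URLs above.
COMMON = (
    "fl=item_id%2Cscore&qf=tm_search_api_aggregation_1%5E21.0"
    "&qf=tm_search_api_viewed%5E0.1&fq=index_id%3A%22full_site%22"
    "&fq=hash%3A6s8uca&start=0&rows=25&sort=score%20desc&wt=json&json.nl=map"
)

WITHOUT_QUOTES = COMMON + "&q=%28%28%22health%22%29%20OR%20%28%22homes%22%29%29"
WITH_QUOTES = COMMON + "&q=%22health%20homes%22"


def solr_q(query_string):
    """Return the URL-decoded q parameter of a Solr select query string."""
    return parse_qs(query_string)["q"][0]


print(solr_q(WITHOUT_QUOTES))  # (("health") OR ("homes"))
print(solr_q(WITH_QUOTES))     # "health homes"
```

So the unquoted search is sent as two single-word terms joined with OR, and the quoted search is sent as one phrase.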

drunken monkey’s picture

Are these with "direct query" or "multiple terms" parse mode? They look correct to me for the latter (if you have the operator set to "contains any of these words").
In that case, it might just be that in the text indexed in Solr, there is some token between "health" and "homes" – maybe an HTML tag which doesn't get removed or ignored, or something similar. You'd have to look at the indexed data for an item which you think should match to find that out.

amykhailova’s picture

I can confirm the same issue. We are using Open SOLR with config from Search API module and we have multiple keywords setting. We also tried with direct query. If multiple keywords is selected then query returns no results, if direct query is selected then the results are not narrowed and partial results are returned.

lukamr’s picture

I have the same problem.. but I'm using Tika not Solr :/

JasonSafro’s picture

Are you sure we want "search keys"? That doesn't seem consistent with the descriptions:

Search keys – multiple words will be split and the filter will influence relevance. You can change how search keys are parsed under "Advanced" > "Query settings".

Search filter – use as a single phrase that restricts the result set but doesn't influence relevance.

Based on that description, "search filter" seems more like what I'd want.

lukamr’s picture

I'm still having the same problem with quoted phrases.
Tika is not a search engine; it extracts the PDF text for indexing, and that part is OK.
The problem is in the Search API view: every quoted phrase is searched as separate words.

I changed the "Advanced" > "Query settings" to "direct query", and "search key" to "search filter"... but still getting the same results.

Any suggestions?

drunken monkey’s picture

Are you maybe using a Database server? That would be the only explanation I can think of here.

lukamr’s picture

Yes, I'm using Database server. But what's the point?

drunken monkey’s picture

Database servers don't support quoted phrases.

acbramley’s picture

In case anyone else is having the same issue on Drupal 8, even with the Multiple words parse mode set, you need to check how your fields are indexed.

I have a field which contains strings like "EP 01 00 00 02 SP", and when searching with quotes around this kind of string, I was getting no results. A quick look at the index showed me what was going on. I was indexing this field as "text" and using a bunch of processors on it like stopwords and tokeniser which resulted in the indexed value being "01 00 00 02".

To fix this, I changed the index type to "Fulltext tokens" (or "solr_text_wstoken"), and removed those previously mentioned processors and everything started working. The key is to make sure your index actually has the full string you're searching for!
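
A rough way to picture why this matters: a phrase search only matches when the phrase's tokens appear next to each other, in order, in what was actually indexed. The Python sketch below is a simplified model of that idea, not how Solr or Search API implement it. The function names are made up, and the assumption that the processors dropped exactly "EP" and "SP" while the quoted phrase was left intact is only inferred from the example above.

```python
def tokenize(text, drop=frozenset()):
    """Lowercase, split on whitespace, and remove any tokens listed in `drop`."""
    return [t for t in text.lower().split() if t not in drop]


def contains_phrase(indexed_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside indexed_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        indexed_tokens[i:i + n] == phrase_tokens
        for i in range(len(indexed_tokens) - n + 1)
    )


value = "EP 01 00 00 02 SP"
phrase = tokenize(value)                          # the phrase typed in quotes
full_index = tokenize(value)                      # field indexed with the full string
stripped_index = tokenize(value, drop={"ep", "sp"})  # assumed effect of the processors

print(contains_phrase(full_index, phrase))      # True
print(contains_phrase(stripped_index, phrase))  # False
```

The same model covers the case mentioned earlier in the thread where some extra token sits between two words of the phrase in the indexed text.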

TrevorBradley’s picture

Just came at this from Drupal 8 as well, using the Multiple Terms Parse Plugin (with search_api_solr).

Looking at the Solr results, Multiple Terms just decides to entirely discard the quotes when passing the query to Solr, e.g. a search for "foo bar" with quotes results in:

tm_my_field:foo bar^1

Which is mangled, garbage Solr. "bar" is entirely discarded.

Meanwhile, without quotes I get something quite different:

+(tm_my_field:foo^1) +(tm_my_field:bar^1)

Which is a lovely AND query, but not the same as searching for "foo bar" as a single string.

If I switch over to direct query, quotes work for "foo bar", but without the quotes, the result is now:

tm_my_field:(foo bar)^1

Which is not at all what I want (I want that lovely AND behaviour over multiple fields).

I suspect very strongly that (at least for Drupal 8) Plugin/search_api/parse_mode/Terms.php is fundamentally flawed. It's splitting "foo bar" into "foo (with a dangling left quote) and bar" (with a dangling right quote), and then stripping away the garbage quote characters. Instead, the parser needs to be aware that quoted items are atomic and can't be subdivided just because they happen to contain a space.

I'm going to resolve this in my own case by creating my own custom ParseModePluginBase, and try to respect quotation marks before cutting up the search query into chunks.
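
The quote-aware splitting described above can be sketched roughly like this. It is a Python illustration of the idea only, not the code in Terms.php or in any Search API plugin; parse_keys and the shape of its return value are hypothetical.

```python
import re

# Either a double-quoted run (kept whole) or a run of non-space characters.
_TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_keys(query, conjunction="AND"):
    """Split a search string into keys, keeping quoted phrases atomic."""
    keys = []
    for phrase, word in _TOKEN.findall(query):
        # Quoted match: keep the inner text. Bare word: drop stray quotes.
        key = phrase.strip() if phrase else word.strip('"')
        if key:
            keys.append(key)
    return {"keys": keys, "conjunction": conjunction}


print(parse_keys('"foo bar"'))      # {'keys': ['foo bar'], 'conjunction': 'AND'}
print(parse_keys('foo bar'))        # {'keys': ['foo', 'bar'], 'conjunction': 'AND'}
print(parse_keys('"foo bar" baz'))  # {'keys': ['foo bar', 'baz'], 'conjunction': 'AND'}
```

An unbalanced quote falls back to plain word splitting in this sketch, which is one possible choice rather than a requirement.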

EDIT: OK, I'm now becoming convinced this isn't a search_api issue, but rather how search_api_solr is handling this request passed to it.

EDIT: In the end, it was kind of a search api issue. I ended up extending "Multiple Terms" into "Multiple Phrases". See #3017342: keys of fulltext searches must be encoded as phrases for terms parse mode

This could be search_api_solr weirdness, or an actual feature search_api needs. I don't know enough about either system to say for sure.

drunken monkey’s picture

@ TrevorBradley: This seems to be wrong handling in the Solr backend, the Search API side seems to be working fine.
To check, you can use Views’ “Live preview” and show the query there. You want "foo bar" to be parsed as:

array(
  'foo bar',
  '#conjunction' => 'AND',
),

Individual array keys should be treated as phrases by the backend. According to your observations, it seems this might not be implemented correctly in Solr.
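
To illustrate what "treated as phrases" could look like on the Solr side, here is a rough Python sketch that turns a list of keys into a query string in the same style as the queries quoted earlier in this thread. It is an assumption about the desired output, not what search_api_solr actually generates; keys_to_solr and solr_phrase are hypothetical names, and the field name is taken from the example above.

```python
def solr_phrase(field, key, boost=1):
    """Render one key as a quoted Solr phrase on a field, escaping \\ and "."""
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"^{boost}'


def keys_to_solr(field, keys, conjunction="AND"):
    """Join phrase clauses: every clause required for AND, OR-joined otherwise."""
    clauses = [f"({solr_phrase(field, key)})" for key in keys]
    if conjunction == "AND":
        return " ".join("+" + clause for clause in clauses)
    return " OR ".join(clauses)


print(keys_to_solr("tm_my_field", ["foo bar"]))
# +(tm_my_field:"foo bar"^1)
print(keys_to_solr("tm_my_field", ["foo", "bar"]))
# +(tm_my_field:"foo"^1) +(tm_my_field:"bar"^1)
```

The point is only that a key containing a space stays inside one pair of quotes, so Solr sees a single phrase rather than a term followed by a stray word.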

TrevorBradley’s picture

@drunken monkey: Looks like the devs of search_api_solr have fixed this behaviour as a result of #3017342: keys of fulltext searches must be encoded as phrases for terms parse mode.

drunken monkey’s picture

Status: Active » Fixed

OK, then this can be closed?
Please re-open with current open questions if not.

TrevorBradley’s picture

Original ticket is 7.x. You might have to ask @CLEE25

(I've got no preference one way or the other!)

Edit: Whoops, I missed that it was just closed. Agreed - best to leave it closed unless someone has data to open it.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

damondt’s picture

For me the fix was indexing the field in question as "Fulltext Tokens" instead of "Fulltext"