Hi,

I'm using Solr 3.6 with search_api_solr 7.x-1.0-rc2 (but my request is also valid for latest dev version).

I have enabled SnowballPorterFilterFactory filter for the field type "text" and want to use the autocompletion feature provided by search_api_autocomplete on those fields.

Autocompletion works fine, but the suggestions returned are the "stemmed" terms and I want the original terms (e.g. I want "cats" but the suggestions contain only "cat").

I can use another Solr index to store original terms, but it seems that there is no way to use a different Solr index for autocompletion.
In other words, what I want is the possibility to have different values for 'qf' and 'facet.field' parameters. For example "qf=t_title&facet.field=ss_title_autocomplete"

Is it possible to have this feature in search_api_solr? (I can write a patch if so.)
Or maybe I'm thinking about this wrong and there is a much simpler solution?

Thanks for any help.

Comments

Julian Maurice’s picture

Hi again,

I ran some tests on the latest dev version and saw that there are dynamic fields in schema.xml that are not used at all, for example 'tus_*' and 'tum_*', which are perfect for what I'm trying to do.
So I made a tiny patch that makes use of those fields for the autocomplete feature.

I tested it on Solr 3.6 only.

Please tell me if the following patch can be integrated in search_api_solr.

drunken monkey’s picture

No, sorry, that patch seems too specialized to be generally useful, and thus to be included. Also, since we are using the common configurations, we cannot just change our config files without them being changed in that project, too.

The good news, though, is that this should be very easy to do in a custom module, with hook_search_api_solr_query_alter() (you can use the Search API query's search id option to determine whether it's an Autocomplete request), and by adding the <copyField> directive to schema_extra_fields.xml.

Also, there is already the option to use different fields than the ones searched for the autocomplete suggestions. That would not help you directly with your use case, of course. However, you could use an aggregated field to copy the fulltext data and then change that field's type to the one you want to use. That should work without any custom code, just with config file manipulations.

Julian Maurice’s picture

It sounds weird to me to use stemmed fields for autocomplete suggestions.

Ok, you can choose what fields are used for autocompletion in Search API Autocomplete configuration page, but in this case the same fields are used for every textbox. So I can't have different suggestions for my "Title" and "Authors" search fields.

And if I understand well your last solution (aggregated field + changing field's type) it will correctly return unstemmed words for autocomplete suggestions, but will disallow "stemmed search" on this field, right?

So... is there no possibility at all to automatically use tum_* instead of tm_* (maybe as a configurable behaviour)?

drunken monkey’s picture

> And if I understand well your last solution (aggregated field + changing field's type) it will correctly return unstemmed words for autocomplete suggestions, but will disallow "stemmed search" on this field, right?

No, since you're creating a copy of the field, you can use the one for unstemmed autocomplete suggestions and the other for stemmed searching. But, as you say, this won't work if you want autocompletion for two fulltext fields on the same view. (I am, however, actually not sure whether this works at all, currently.)

But there's always the first option I mentioned, of altering the Solr query yourself.

Julian Maurice’s picture

Status: Active » Closed (works as designed)

> But there's always the first option I mentioned, of altering the Solr query yourself.

For now, the "patch" solution is OK, even if it is not integrated into the module, as it applies easily to all versions since 1.0-rc3. But altering the Solr query will be a great solution once the patch no longer applies to future versions.

Thanks for your answers.

I close the issue.

drunken monkey’s picture

Status: Closed (works as designed) » Active

At DrupalCon, and also outside it, I have heard several complaints about exactly this in the last few days. Therefore, it might really be a good idea to index fulltext fields twice by default.
I'll see what I can do.

checker’s picture

I have a similar problem using EdgeNGramFilterFactory and autocomplete (#2093939: How to use autocomplete if solr EdgeNGramFilterFactory is used?). I just want to post my solution, which builds on drunken monkey's suggestion.

To get good results for autocomplete I had to change the field type for the autocomplete fields.
For that I'm using an aggregated field that I created with the search_api UI. This field contains all the other fields I want to use for autocomplete as fulltext (for example the node title). After that I modified schema_extra_fields.xml to change the field type of the aggregated field, using something like this: <field name="tm_search_api_aggregation" type="text_und" indexed="true" stored="true" multiValued="true" termVectors="true" />

Now autocomplete works fine! Thanks for your help drunken monkey.

almc’s picture

I've applied the patch #1 to search_api_solr module with the dev version of Autocomplete and it stopped making suggestions at all, so I had to revert search_api_solr to the original files from its package.

f0ns’s picture

Has anyone figured out how to do this without hacking the module or writing custom modules? I would prefer adjusting schema.xml to get it working (easiest to reuse).

At the moment I use

<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />

for indexing, and my autocomplete suggestions are filled with stemmed results (that don't add value).

Thanks in advance!

narkoff’s picture

I fixed the stemming issue by performing the steps outlined in comment #7. This fixed stemming issues with single-word search terms being returned (e.g. Manhattan is now only returned as 'Manhattan', not 'Man', 'Manhatt', etc.). I tested different field types in schema_extra_fields.xml, and the type "text_ws" worked best for me.

However, I still have issues with returning multi-word search terms. Using an aggregated field that includes all of the fields for Autocomplete to search, the returned results mix the fields. For example, entering 'san d' will return 'San Diego'. But it will also return 'San United', 'San States', 'San America', 'San California' (the aggregated field includes city, state, country).

I don't believe this to be a stemming issue, but I haven't been able to resolve it by changing the Views exposed filter Search Keys/Search Filter selection or the Views Query Settings either.

f0ns’s picture

So what did you add to schema.xml? I tried #7 but didn't get the results I expected.

narkoff’s picture

I didn't touch schema.xml. I edited schema_extra_fields.xml. I added the following line (edit according to your field name): <field name="tm_search_api_aggregation_4" type="text_ws" indexed="true" stored="true" multiValued="true"/>

I also went into the Solr Search API Server settings > Advanced > Autocomplete and turned off 'Suggest additional words'.

I also restarted the Solr services (to be safe).

f0ns’s picture

OK, I got it working:

  1. Added <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" /> to the index analyzer of <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  2. Enable aggregated fields (Search API > Index > Workflow)
  3. Add a new aggregated field (and add to it the fields you want indexed for your autocomplete)
  4. Add the field definition in schema_extra_fields.xml. In my case it was <field name="tm_search_api_aggregation_1" type="text_und" indexed="true" stored="true" multiValued="true"/>. You can of course also copy text_und to your schema_extra_types.xml, give it a different name and change it around a bit (depending on your use case)
  5. Enable this newly created field for your autocomplete; also see my remark at the bottom of the post (HTML filter)
  6. Restart Solr + reindex the website
  7. You are done! Woohoo! (Get some coffee/beer, depending on what time it is, I guess)

Hope this helps people in the future.

If you want to filter your autocomplete suggestion by language (not default) here is how to do it: https://drupal.org/node/2166113

Remark:
I've noticed that p and li characters were attached to the results. Don't forget to add your new aggregated field to the HTML filter under processors (admin/config/search/search_api/index/default_node_index/workflow).

narkoff’s picture

Issue summary: View changes

I have autocomplete working, but it still needs some fine tuning. I can't get multi-word suggestions to return properly.

I am using an aggregated fulltext field that contains address data: city, state, country (e.g. San Diego, California, United States). This data is stored in one field, and then added to the aggregated field. I'd like the behavior to be that, as I type, matching results are returned for the complete field (i.e. San Diego, California, United States). This is a common use case, as seen on hotels.com.

However, what is happening is that only one matching word at a time is returned. So typing 'san' only returns 'San', 'Santa', etc. When I type 'san d', then 'San Diego', 'San Dimas', etc. are returned. I'm looking to return the complete field: city, state, country.

I had stemming issues with partial words being returned that #7 resolved. However, I can't figure out what Tokenizer or TokenFilter in schema.xml may be causing only one whole word at a time to be returned. Any ideas?

GaëlG’s picture

For anyone interested, I wanted something generic and did it this way:

<!--
  Unstemmed fields used for autocompletion.
-->
    <copyField source="ts_*" dest="tus_*" />
    <copyField source="tm_*" dest="tum_*" />

/**
 * Implements hook_search_api_solr_query_alter().
 */
function MY_MODULE_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Only alter autocomplete requests, identified by their search id.
  if (!preg_match('/^search_api_autocomplete:.+/', (string) $query->getOption('search id'))) {
    return;
  }
  // Use the unstemmed copies (tus_*/tum_*) instead of ts_*/tm_*.
  foreach (array('qf', 'facet.field') as $param) {
    // Skip parameters that are missing or not a list of fields.
    if (empty($call_args['params'][$param]) || !is_array($call_args['params'][$param])) {
      continue;
    }
    foreach ($call_args['params'][$param] as &$query_field) {
      $query_field = preg_replace('/^t([sm])_/', 'tu$1_', $query_field);
    }
    // Break the reference so later code cannot overwrite the last element.
    unset($query_field);
  }
}

sgurlt’s picture

#13 works perfectly, thanks for the advice :-)
Just a little tip for everyone trying to get this working: the aggregated field name you enter is not the field name to use in the schema_extra_fields.xml file.

Solr needs the field ID itself. For example, if you're creating your first aggregated field, your field ID would be: "search_api_aggregation_1"

But this won't do it on its own; you have to put it into your schema_extra_fields.xml like this: "tm_search_api_aggregation_1".

I don't know why you have to add "tm_", but without it, the magic won't work :-)

kitikonti’s picture

#13 also works for me, but my problem is that I don't understand what the hell I am doing here, which makes me a bit nervous. OK, I understand that we add an extra field based on the original field we want autocomplete suggestions for. Why do we do this? Couldn't we just use the original field? Or is it because we don't want to lose the stemming on this field? What is an EdgeNGramFilterFactory? Do I need this always, or was this just a special use case of #13? What happens in 1 and 4 of the #13 description? And where does the part happen that makes this aggregated field unstemmed?

drunken monkey’s picture

> Why do we do this? Couldn't we just use the original field? Or is it because we don't want to lose the stemming on this field?

Exactly. The original field, used for normal searching, should still be stemmed, but we want an unstemmed version for autocomplete suggestions. That's why we need two fields.

> What is an EdgeNGramFilterFactory? Do I need this always, or was this just a special use case of #13?

That's a bit confusing, you're right – that's just the special case of f0ns, not anything you have to do for unstemmed autocomplete.
EdgeNGramFilterFactory makes searches match on substrings, something a lot of people want to add to their Solr searches (see, e.g., #1760076: Search part of the word (schema.xml), #1414838: Partial word matching, #1307784: Fuzzy Search). But with that configuration, autocomplete suggestions will become almost useless, which is why this issue then becomes all the more important – which is probably what led f0ns here, and to post the instructions.

> What happens in 1 and 4 of the #13 description?

For 1. see above, 4. changes the field type of the aggregated field to the unstemmed one – see the handbook.

> And where does the part happen that makes this aggregated field unstemmed?

By changing the field type for the aggregated field, as seen in 4. – the text_und is a pre-defined type in the configs for unstemmed text, exactly what we want here.

I hope this answers your question.

kitikonti’s picture

Thx for the answer.

So, if I understand this right, you say that in most cases it is better not to use the EdgeNGramFilterFactory for the autocomplete field?

drunken monkey’s picture

> So, if I understand this right, you say that in most cases it is better not to use the EdgeNGramFilterFactory for the autocomplete field?

For the autocomplete field, you should never use it. f0ns used it for the normal text field – and therefore had to create a dedicated autocomplete field.

Anonymous’s picture

> For the autocomplete field, you should never use it. f0ns used it for the normal text field – and therefore had to create a dedicated autocomplete field.

Could you please briefly explain why? Will there be issues with performance?

drunken monkey’s picture

> Could you please briefly explain why?

Because that field then contains not only complete words, but also substrings. So, e.g., as completions for "lib" you'd get something like:
"lib"
"libr"
"libra"
"librar"
"library"
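
To make the effect concrete, here is a minimal PHP sketch that imitates what an edge n-gram filter does to a single token at index time, and what a prefix lookup then finds. Both function names (edge_ngrams, suggest) are hypothetical helpers written for this illustration; they are not part of Solr or of any Drupal module, and the prefix lookup is only an assumption about how suggestions are derived from the indexed terms. The sketch uses strlen/substr and therefore only handles single-byte text.

```php
<?php
// Hypothetical helper: mimics an edge n-gram filter for one token by
// emitting every leading substring between $min and $max characters.
function edge_ngrams(string $token, int $min, int $max): array {
  $grams = [];
  $length = strlen($token);
  for ($size = $min; $size <= min($max, $length); $size++) {
    $grams[] = substr($token, 0, $size);
  }
  return $grams;
}

// Hypothetical helper: a plain prefix lookup over the indexed terms, used
// here as a stand-in for how suggestions are picked from the index.
function suggest(array $indexed_terms, string $typed): array {
  $matches = array_filter($indexed_terms, function ($term) use ($typed) {
    return strpos($term, $typed) === 0;
  });
  return array_values($matches);
}

// With minGramSize="2" and maxGramSize="25", "library" is indexed as
// li, lib, libr, libra, librar, library.
$with_ngrams = edge_ngrams('library', 2, 25);
print_r(suggest($with_ngrams, 'lib'));

// Without the filter only the whole word is indexed.
print_r(suggest(['library'], 'lib'));
```

With the n-grams in the index, every fragment starting with "lib" is a candidate suggestion; without them, only "library" is.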

Amir Simantov’s picture

Hi guys. I have been referred here when looking for a solution to suggest only whole words. However, this issue deals only with the Solr service class. I am currently using search_api_db and was wondering whether I can make it work there as well. Just to make sure I am clear about my need: if the user types "inf", she should not be offered "info" but only "information". Thanks!

louisnagtegaal’s picture

The solution in #13 almost worked for me. One mistake I made was that I did not add the new aggregated field to the fields to index. But after I corrected that, I still got autocomplete with stemmed words.

When inspecting the aggregated field in the Solr admin, I noticed that the field type was somehow set to "text" and not to "text_und". I resolved this by implementing hook_search_api_solr_field_mapping_alter() and changing the field from the "tm_*" type to the "tum_*" type, i.e. changing "tm_search_api_aggregation_2" to "tum_search_api_aggregation_2" (I also have another aggregated field).
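
A rough sketch of what such a hook implementation might look like follows. The hook signature is written from memory of search_api_solr.api.php and should be checked against the installed version; MY_MODULE_use_unstemmed_fields() is a hypothetical helper introduced here only so the renaming logic can be run on its own, and it assumes the mapping array is keyed by index field ID with the Solr field name as value.

```php
<?php
/**
 * Hypothetical helper: switches the given index fields from the stemmed
 * multi-valued prefix (tm_) to the unstemmed one (tum_).
 */
function MY_MODULE_use_unstemmed_fields(array $fields, array $field_ids) {
  foreach ($field_ids as $field_id) {
    if (isset($fields[$field_id])) {
      $fields[$field_id] = preg_replace('/^tm_/', 'tum_', $fields[$field_id]);
    }
  }
  return $fields;
}

/**
 * Implements hook_search_api_solr_field_mapping_alter().
 *
 * Assumption: the signature below matches the installed module version.
 */
function MY_MODULE_search_api_solr_field_mapping_alter(SearchApiIndex $index, array &$fields) {
  // "search_api_aggregation_2" is the aggregated field from this comment.
  $fields = MY_MODULE_use_unstemmed_fields($fields, array('search_api_aggregation_2'));
}
```

Only the listed field is remapped, so other fulltext fields keep their stemmed tm_* names for normal searching.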

This corrected the problem, but I wonder if this solution is "better" than the solution in #15 which copies all "tm_-fields" to corresponding "tum_-fields".

drunken monkey’s picture

> Hi guys. I have been referred here when looking for a solution to suggest only whole words. However, this issue deals only with the Solr service class. I am currently using search_api_db and was wondering whether I can make it work there as well. Just to make sure I am clear about my need: if the user types "inf", she should not be offered "info" but only "information". Thanks!

"info" is maybe a bad example, as that's used as a word, too, and therefore also might be indexed. And in other situations, I can't imagine how this problem should arise with the DB backend. But please create an issue there if you are sure this problem exists there, too, and describe it in detail.

sgurlt’s picture

This has all been working well for me for a couple of months. I only have two issues left that I never figured out how to solve; maybe someone could give me a hint.
I have several fields inside my autocomplete; one of those is "company name".
Let's imagine a user added the company name "Bright Solutions" in his profile. When a visitor now starts typing "Brig" in the autocomplete field, the following is shown:

[Screenshot: Autocomplete]

This is OK, but not perfect: I want to see the full company name here. Any idea how to achieve this?
The next weird thing is that when the user types "Bright " (with a space after the word), the following results are returned:

[Screenshot: Autocomplete]

Only one of them, "Bright solutions", really makes sense here; the rest is random stuff... Any idea how to avoid this?

Greetings

Sebastian

drunken monkey’s picture

You can index the field as a string by changing the Solr configuration. As the type, maybe just use sortString, that should work fine in your case.
If you want closer control over how the suggestions are generated, take a look at #2475435: Add support for switching the autocomplete implementation and the other recently committed (or pending) Search API Autocomplete issues.

sgurlt’s picture

Hm, sadly I still could not resolve this, even after updating to the latest dev version. Here is my full setup:

1. Created a new aggregation field and added all fields to it that should be searched.

[Screenshot: Aggregation field]

2. Added the field to the index and tried indexing it as both fulltext and string. String does not return any results at all, so I left this on fulltext.

[Screenshot: indexed field]

3. Configured autocomplete

[Screenshot: autocomplete setup]

4. Changed the field type in schema_extra_fields.xml

[Screenshot: Schema extra fields]

5. Checked in the Solr interface whether the right field type is applied

[Screenshot: Solr Interface]

6. Results:

[Screenshot: results]

Did I miss something?

drunken monkey’s picture

Hm. No, seems about right. Or which fields did you include in the aggregation? Just the company name, or which others?
The suggestions should all be complete field values from there. (If you have more than one field in the aggregation, I'd use the "List" aggregation type, though.)

If not, try to debug the query sent to the Solr server (see the handbook).

Oh, also, did you reindex after changing the Solr configuration? You'll have to do that.

sgurlt’s picture

The aggregation contains multiple other fields. As the search field is for keyword search, it contains fields like company name, user name, company tasks, ... and some more.
Even after changing the aggregation type to "List" I am running into the same result, and yep, I reindexed ;)

The aggregated field must be indexed as fulltext on the field settings page, right?
I will write you a private message; maybe we could check it together using Skype or Hangout? :)

drunken monkey’s picture

As said, try debugging the Solr query. Also try deactivating the search server's "Suggest additional words" option, if you haven't already.

sgurlt’s picture

Feels like I am pretty close to get this finally working. I added a new fieldType to my schema.xml:

<fieldType name="suggest_phrase" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>

Now I receive the following:

[Screenshot: autocomplete]

The last remaining issue is that when I now add a space after a word, the suggestions stop showing:

[Screenshot: autocomplete]

Maybe another idea how to fix that? :-)

drunken monkey’s picture

> Maybe another idea how to fix that? :-)

Ah, yes, that's because the current autocomplete code treats everything in front of the last space (if any) as completed words, not as prefixes for a word you're still typing ("bright solutions" counting as one word, in this case, due to your tokenizing – but the autocomplete code doesn't know that).
You can either modify the autocomplete code in this module slightly to change that (i.e., hack it) or, much more cleanly, use #2475435: Add support for switching the autocomplete implementation to write a suggester plugin with mostly identical code to the autocomplete code from the Solr module, but with this change for your use case. (While you're at it, you might find other spots where you can customize the code to better match the expectations of users on your site.)
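
As a minimal illustration of the behaviour described above (not the module's actual code), the following PHP sketch contrasts the word-based split with a phrase-based variant that a custom suggester could use for a KeywordTokenizer field. Both function names and the returned array keys are hypothetical.

```php
<?php
// Word-based behaviour as described above: everything before the last space
// counts as completed words, the rest is the prefix still being typed.
function split_by_last_space(string $input): array {
  $pos = strrpos($input, ' ');
  if ($pos === false) {
    return ['complete' => '', 'incomplete' => $input];
  }
  return [
    'complete' => substr($input, 0, $pos),
    'incomplete' => (string) substr($input, $pos + 1),
  ];
}

// Phrase-based variant (an assumption about what a custom suggester might
// do): the whole input, lowercased like the indexed values, is one prefix.
function split_as_phrase(string $input): array {
  return ['complete' => '', 'incomplete' => strtolower(ltrim($input))];
}

// "bright " ends in a space, so the word-based split sees a finished word
// and an empty prefix; the phrase variant keeps "bright " as the prefix.
print_r(split_by_last_space('bright '));
print_r(split_as_phrase('Bright '));
```

With the word-based split, "bright " leaves nothing to complete, which matches the suggestions disappearing once a space is typed. The phrase variant keeps the trailing space in the prefix, so a value such as "bright solutions" would still match.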

Anyways, this has little or nothing to do with this issue. If you need any more help, please create a new issue for it.

michaelmallett’s picture

sg88 can you please elaborate on what you've altered in your schema.xml? I've found this on stack overflow, is it you?
http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplet...

1. You seem to no longer be using the aggregate field, how are you setting everything up in drupal now to query that suggest_name field?

2. Is it possible to make the configuration changes (searchComponent, requestHandler) in schema.xml only? I am using Pantheon and they only allow changes to schema.xml. Otherwise, is there a way of making those configuration changes somewhere other than solrconfig.xml?

3. How do you force the search api autocomplete querying on spellcheck.q rather than q? I have tried using the search_api_spellcheck module and adding the params to the query but it's not helped.

If that's not you, then sorry! But how did you complete the configuration for your search_api_setup? You have the exact same issue as me, or at least I am having a similar issue, where I am trying to index a term reference field and only retrieve autocomplete suggestions for exact matches. I have followed the directions in #13 to the point where I can return full terms without stemming, but similar to your issue, I am getting mixed multi-word terms back. So where I have terms
"african descent"
"asian ethnicity"
I'm getting "african ethnicity", "african asian" etc, when I type 'african'.

michaelmallett’s picture

Never mind! I got it all sorted. My issue was that I was not delimiting on the right character, and so was either getting a mixture of terms back together, or all the keywords in the field were being indexed as a single term. I also had the 'suggest words' option turned on in the Solr server, which was the cause of the terms being mixed together.

So, for anyone who wants to have a solr powered taxonomy autocomplete search, I did the following:

1. Added an aggregated field on the taxonomy term reference, so that it takes the taxonomy term name.
2. Added a new field in schema.xml:
<field name="tm_search_api_aggregation_1" type="suggest_phrase" indexed="true" stored="false" multiValued="true"/>
I did this in schema.xml because of a platform limitation, but this is probably best done in schema_extra_fields.xml instead.
3. Added the field type to schema.xml:

<fieldType name="suggest_phrase" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>

KeywordTokenizerFactory is used to index the entire term in the case of multi-word terms.
4. In the tokenizer settings for the index, set the PCRE character class to [\n], to target newlines. It turned out that when the field value was passed to Solr, it was being passed as one long string with \n\n between terms.

Thanks to everyone in this thread for the work they did that got me here!

alexanansi’s picture

Hi Michael

Can you elaborate on point 4? I am not sure where / how to put this in my schema file as I am quite new to solr - thanks!

Alex

michaelmallett’s picture

Hi Alex.

This one isn't in your schema, it's in the Search API UI. When you are setting up the filters in the Search API form there is a checkbox called Tokenizer. Enable this, and then at the bottom of the form there is a text box for a regex-style character class (though I don't think it's strictly regex). If you replace what's in there with [\n], including the square brackets, it should split the results by newline, rather than indexing them as one massive string.
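
To illustrate the difference the character class makes, here is a small PHP sketch. split_terms() is a hypothetical helper that only imitates splitting a value on a PCRE character class; it is not the Search API tokenizer itself, and the sample value is made up from the terms mentioned earlier in this thread.

```php
<?php
// Hypothetical helper: split a raw field value into terms on runs of the
// characters in the given PCRE character class, dropping empty pieces.
function split_terms(string $value, string $char_class): array {
  $parts = preg_split('/' . $char_class . '+/', $value, -1, PREG_SPLIT_NO_EMPTY);
  return array_map('trim', $parts);
}

// Two taxonomy terms, joined by a blank line as described above.
$value = "african descent\n\nasian ethnicity";

// Splitting on newlines only keeps each multi-word term intact.
print_r(split_terms($value, '[\n]'));

// Splitting on any whitespace breaks the terms into single words.
print_r(split_terms($value, '[\s]'));
```

Splitting on [\n] yields the two complete terms, while a class that also matches spaces yields four separate words, which is what allows terms to be recombined into suggestions like "african ethnicity".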

If this isn't clear (I'm on a phone) I'll come back with screenshot later

mstrelan’s picture

So what happens if you have a single solr server with separate indexes and separate searches, and on one index the aggregate field name is search_api_aggregation_1 and the other is search_api_aggregation_2? There doesn't seem to be a way to change the machine name of the aggregation.

drunken monkey’s picture

> So what happens if you have a single solr server with separate indexes and separate searches, and on one index the aggregate field name is search_api_aggregation_1 and the other is search_api_aggregation_2? There doesn't seem to be a way to change the machine name of the aggregation.

You can change those machine names programmatically, or just add/delete fields until they have the desired machine names. Granted, it's more a hack than a proper solution, but it works.

ciss’s picture

@mstrelan For a patch that allows you to customize the aggregated field's machine name see #2905585: Allow custom machine name for aggregation fields.

wil2091’s picture

I'm on the same page as #32. The suggestions stop when a space is added. Are there any possible solutions?
Also, I'm trying to sort the suggestion values. Below is a sample list:
bproduct01-1
aproduct-2
hproduct-2
cproduct004-3

I want to sort it in ascending order:
aproduct-2
bproduct01-1
cproduct004-3
hproduct-2

I'm using the aggregated field with the Fulltext type. But the Fields tab clearly states:

"Fields indexed with type "Fulltext" and multi-valued fields (marked with 1) cannot be used for sorting"

This clearly means Fulltext and List types can't be used for sorting, while the aggregated field must be indexed as Fulltext to be used for Autocomplete. Does this mean we can't sort the autocomplete values unless they are integers or strings?
Any ideas?

OanaIlea’s picture

Status: Active » Closed (outdated)

This issue was closed due to lack of activity over a long period of time. If the issue is still acute for you, feel free to reopen it and describe the current state.