Hi,
I'm using Solr 3.6 with search_api_solr 7.x-1.0-rc2 (but my request is also valid for latest dev version).
I have enabled the SnowballPorterFilterFactory filter for the field type "text" and want to use the autocompletion feature provided by search_api_autocomplete on those fields.
Autocompletion works fine, but the suggestions returned are the "stemmed" terms and I want the original terms (e.g. I want "cats" but the suggestions contain only "cat").
I can use another Solr index to store original terms, but it seems that there is no way to use a different Solr index for autocompletion.
In other words, what I want is the possibility to have different values for 'qf' and 'facet.field' parameters. For example "qf=t_title&facet.field=ss_title_autocomplete"
Is it possible to have this feature in search_api_solr? (I can write a patch if so.)
Or maybe I'm thinking about this the wrong way and there is a much simpler solution?
Thanks for any help.
Comment | File | Size | Author |
---|---|---|---|
#32 | Screen Shot 2015-08-17 at 15.03.26.png | 10.76 KB | sgurlt |
#32 | Screen Shot 2015-08-17 at 15.03.14.png | 12.53 KB | sgurlt |
#28 | Screen Shot 2015-08-10 at 11.57.48.png | 23.76 KB | sgurlt |
#28 | Screen Shot 2015-08-10 at 11.53.09.png | 66.83 KB | sgurlt |
#28 | Screen Shot 2015-08-10 at 11.53.01.png | 89.49 KB | sgurlt |
Comments
Comment #1
Julian Maurice
Hi again,
I ran some tests on the latest dev version and saw that there are dynamic fields in schema.xml that are not used at all, for example 'tus_*' and 'tum_*', which are perfect for what I'm trying to do.
So I made a tiny patch that makes use of those fields for the autocomplete feature.
I tested it on Solr 3.6 only.
Please tell me if the following patch can be integrated in search_api_solr.
Comment #2
drunken monkey
No, sorry, that patch seems too specialized to be generally useful, and thus to be included. Also, since we are using the common configurations, we cannot just change our config files without them being changed in that project, too.
The good news, though, is that this should be very easy to do in a custom module, with hook_search_api_solr_query_alter() (you can use the Search API query's "search id" option to determine whether it's an Autocomplete request), and by adding the <copyField> directive to schema_extra_fields.xml.
Also, there is already the option to use different fields than the ones searched for the autocomplete suggestions. That would not help you directly with your use case, of course. However, you could use an aggregated field to copy the fulltext data and then change that field's type to the one you want to use. That should work without any custom code, just with config file manipulations.
Comment #3
Julian Maurice
It sounds weird to me to use stemmed fields for autocomplete suggestions.
OK, you can choose which fields are used for autocompletion on the Search API Autocomplete configuration page, but in this case the same fields are used for every textbox. So I can't have different suggestions for my "Title" and "Authors" search fields.
And if I understand your last solution correctly (aggregated field + changing the field's type), it will correctly return unstemmed words for autocomplete suggestions, but will disallow "stemmed search" on this field, right?
So... is there no possibility at all to automatically use tum_* instead of tm_* (maybe as a configurable behaviour)?
Comment #4
drunken monkey
No, since you're creating a copy of the field, you can use the one for unstemmed autocomplete suggestions and the other for stemmed searching. But, as you say, this won't work if you want autocompletion for two fulltext fields on the same view. (I am, however, actually not sure whether this works at all, currently.)
But there's always the first option I mentioned, of altering the Solr query yourself.
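To illustrate the two-field idea discussed above, here is a minimal Python sketch. It is only a toy model: `toy_stem` is a hypothetical stand-in that strips a trailing "s" and is not the Snowball/Porter algorithm, and the two sets merely stand in for a stemmed search field and its unstemmed copy.

```python
def toy_stem(word):
    """Toy stemmer for illustration only: strips one trailing 's'.

    This is NOT the Snowball/Porter algorithm that Solr applies.
    """
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def index_terms(text, stem):
    """Return the set of terms a field would store for the given text."""
    words = text.lower().split()
    return {toy_stem(w) if stem else w for w in words}


def suggest(terms, prefix):
    """Suggestions are the indexed terms that start with the typed prefix."""
    return sorted(t for t in terms if t.startswith(prefix.lower()))


title = "Cats and dogs"
stemmed_field = index_terms(title, stem=True)     # stands in for the searched field
unstemmed_field = index_terms(title, stem=False)  # stands in for the copied field

print(suggest(stemmed_field, "ca"))    # ['cat']
print(suggest(unstemmed_field, "ca"))  # ['cats']
```

Because suggestions are drawn from the terms stored in the index, the stemmed field can only ever offer "cat", while the unstemmed copy offers "cats" and leaves the stemmed field available for searching.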
Comment #5
Julian Maurice
For now, the "patch" solution is OK, even if it is not integrated in the module, as it applies easily to all versions since 1.0-rc3. But a built-in solution would be great once the patch no longer applies to future versions.
Thanks for your answers.
I'm closing the issue.
Comment #6
drunken monkey
At DrupalCon, and also outside it, I have heard several complaints about exactly this in the last few days. Therefore, it might really be a good idea to index fulltext fields twice by default.
I'll see what I can do.
Comment #7
checker
I have a similar problem using EdgeNGramFilterFactory and autocomplete (#2093939: How to use autocomplete if solr EdgeNGramFilterFactory is used?). I just want to post my solution, which builds on drunken monkey's suggestion.
To get good results for autocomplete I had to change the field type for the autocomplete fields.
For that I'm using an aggregated field that I created with the search_api UI. This field contains all other fields I want to use for autocomplete as fulltext (for example the node title). After that I modified schema_extra_fields.xml to change the field type of the aggregated field, using something like this:
<field name="tm_search_api_aggregation" type="text_und" indexed="true" stored="true" multiValued="true" termVectors="true" />
Now autocomplete works fine! Thanks for your help drunken monkey.
Comment #8
almc
I applied the patch from #1 to the search_api_solr module with the dev version of Autocomplete and it stopped making suggestions at all, so I had to revert search_api_solr to the original files from its package.
Comment #9
f0ns
Has anyone figured out how to do this without hacking the module or writing custom modules? I would prefer adjusting schema.xml in order to get it working (easiest to reuse).
At the moment I use
for indexing, and my autocomplete suggestions are filled with stemmed results (that don't add value).
Thanks in advance!
Comment #10
narkoff
I fixed the stemming issue by performing the steps outlined in comment #7. This fixed the stemming issues with single-word search terms being returned (e.g. Manhattan is now only returned as 'Manhattan', not 'Man', 'Manhatt', etc.). I tested different field types in schema_extra_fields.xml, and the type "text_ws" worked best for me.
However, I still have issues with returning multi-word search terms. Using an aggregated field that includes all of the fields for Autocomplete to search, the returned results mix the fields. For example, entering 'san d' will return 'San Diego'. But it will also return 'San United', 'San States', 'San America', 'San California' (the aggregated field includes city, state, country).
I don't believe this to be a stemming issue, but haven't been able to resolve by changing Views exposed filter Search Keys/Search Filter selection or the Views Query Settings either.
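A hedged sketch of how such mixed suggestions can arise: later comments in this thread attribute the mixing to the server's "Suggest additional words" option. The Python below is a much-simplified model of that idea, assuming the suggester offers any other term that co-occurs in documents containing the already completed word. The sample documents and the `additional_words` function are hypothetical and are not the module's code.

```python
from collections import Counter

# Hypothetical aggregated field values (city, state, country), lower-cased.
docs = [
    "san diego california united states",
    "san dimas california united states",
    "santa fe new mexico united states",
]


def additional_words(documents, completed_word):
    """Count every other term in documents that contain the completed word."""
    counts = Counter()
    for doc in documents:
        terms = set(doc.split())
        if completed_word in terms:
            counts.update(terms - {completed_word})
    return counts


counts = additional_words(docs, "san")
# Order by frequency, then alphabetically, so the result is deterministic.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
suggestions = ["san " + term for term, _ in ranked]
print(suggestions)
# ['san california', 'san states', 'san united', 'san diego', 'san dimas']
```

In this model every word of the aggregated value is an independent term, so "san united" is just as valid a candidate as "san diego".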
Comment #11
f0ns
So what did you add to schema.xml? I tried #7 but didn't get the results I expected.
Comment #12
narkoff
I didn't touch schema.xml. I edited schema_extra_fields.xml. I added the following line (edit according to your field name):
<field name="tm_search_api_aggregation_4" type="text_ws" indexed="true" stored="true" multiValued="true"/>
I also went into the Solr Search API Server settings > Advanced > Autocomplete and turned off 'Suggest additional words'.
I also restarted the Solr services (to be safe).
Comment #13
f0ns
OK, I got it working:
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
under the index analyzer of <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<field name="tm_search_api_aggregation_1" type="text_und" indexed="true" stored="true" multiValued="true"/>
You can of course also copy text_und to your schema_extra_types.xml, give it a different name and change it around a bit (depending on your use case).
Hope this helps people in the future.
If you want to filter your autocomplete suggestion by language (not default) here is how to do it: https://drupal.org/node/2166113
Remark: I've noticed that "p" and "li" characters were attached to the results. Don't forget to add your new aggregated field to the HTML filter under the processors (admin/config/search/search_api/index/default_node_index/workflow).
Comment #14
narkoff
I have autocomplete working, but it still needs some fine tuning. I can't get multi-word suggestions to return properly.
I am using an aggregated fulltext field that contains address data: city, state, country (e.g. San Diego, California, United States). This data is stored in one field, and then added to the aggregated field. I'd like the behavior to be that, as I type, matching results are returned for the complete field (i.e. San Diego, California, United States). This is a common use case, as seen on hotels.com.
However, what is happening is that only one matching word at a time is being returned. So typing 'san' only returns 'San', 'Santa', etc. When I type 'san d', then 'San Diego', 'San Dimas', etc. are returned. I'm looking to return the complete field: city, state, country.
I had stemming issues with partial words being returned that #7 resolved. However, I can't figure out what Tokenizer or TokenFilter in schema.xml may be causing only one whole word at a time to be returned. Any ideas?
Comment #15
GaëlG
For anyone interested, I wanted something generic and did it this way:
Comment #16
sgurlt
#13 works perfectly, thanks for the advice :-)
Just a little tip for everyone trying to get this working: the aggregated field name you enter is not your field name for the schema_extra_fields.xml file.
Solr needs the field ID itself. For example, if you're creating your first aggregated field, your field ID would be "search_api_aggregation_1".
But this won't do it on its own; you have to insert it into your schema_extra_fields.xml like this: "tm_search_api_aggregation_1".
I don't know why you have to enter "tm_", but without it, the magic won't work :-)
Comment #17
kitikonti
#13 also works for me, but my problem is that I don't understand what the hell I am doing here, which makes me a bit nervous. OK, I understand that we add an extra field based on the original field we want autocomplete suggestions for. Why do we do this? Couldn't we just use the original field? Or is it because we don't want to lose the stemming on this field? What is an EdgeNGramFilterFactory? Do I always need this, or was this just a special use case of #13? What happens in 1 and 4 of the #13 description? And where is the part that makes sure this aggregated field will not be stemmed?
Comment #18
drunken monkey
Exactly. The original field, used for normal searching, should still be stemmed, but we want an unstemmed version for autocomplete suggestions. That's why we need two fields.
That's a bit confusing, you're right. That's just the special case of f0ns, not anything you have to do for unstemmed autocomplete. EdgeNGramFilterFactory makes searches match on substrings, something a lot of people want to add to their Solr searches (see, e.g., #1760076: Search part of the word (schema.xml), #1414838: Partial word matching, #1307784: Fuzzy Search). But with that configuration, autocomplete suggestions will become almost useless, which is why this issue then becomes all the more important, and which is probably what led f0ns here and to post the instructions.
For 1. see above; 4. changes the field type of the aggregated field to the unstemmed one (see the handbook).
By changing the field type for the aggregated field, as seen in 4.: text_und is a pre-defined type in the configs for unstemmed text, exactly what we want here.
I hope this answers your question.
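To make the effect of edge n-grams on suggestions concrete, here is a small Python sketch. It assumes the gram sizes from #13 (minGramSize="2", maxGramSize="25") and models only the general idea of an edge n-gram filter, namely emitting every leading substring of a token within those size limits. The function name `edge_ngrams` is made up for this example.

```python
def edge_ngrams(word, min_gram=2, max_gram=25):
    """Return the leading substrings of `word` between min_gram and max_gram characters."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


# Terms that would end up in the index for a single word.
indexed_terms = edge_ngrams("manhattan")
print(indexed_terms)
# ['ma', 'man', 'manh', 'manha', 'manhat', 'manhatt', 'manhatta', 'manhattan']

# Suggestions drawn from indexed terms now include every partial word.
completions = [t for t in indexed_terms if t.startswith("man")]
print(completions)
# ['man', 'manh', 'manha', 'manhat', 'manhatt', 'manhatta', 'manhattan']
```

This is fine for matching partial words during search, but it is the reason a field analyzed this way is a poor source of autocomplete suggestions.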
Comment #19
kitikonti
Thanks for the answer.
So, if I understand this right, you say that in most cases it is better not to use the EdgeNGramFilterFactory for the autocomplete field?
Comment #20
drunken monkey
For the autocomplete field, you should never use it. f0ns used it for the normal text field, and therefore had to create a dedicated autocomplete field.
Comment #21
Anonymous (not verified)
Could you please briefly explain why? Will there be issues with performance?
Comment #22
drunken monkey
Because that field then contains not only complete words, but also substrings. So, e.g., as a completion for "lib" you'd get something like:
"lib"
"libr"
"libra"
"librar"
"library"
Comment #23
Amir Simantov
Hi guys. I was referred here when looking for a solution to suggest only whole words. However, this issue deals only with Solr as a service class. I am currently using search_api_db and was wondering whether I can make it work there as well. Just to make sure I am clear about my need: if a user types "inf", she should not be offered "info" but only "information". Thanks!
Comment #24
louisnagtegaal
The solution in #13 almost worked for me. One mistake I made was that I did not add the new aggregated field to the fields to index. But after I corrected that, I still got autocomplete with stemmed words.
When inspecting the aggregated field in the Solr admin, I noticed that the field type was somehow set to "text" and not to "text_und". I resolved this by implementing hook_search_api_solr_field_mapping_alter and changing the field from the "tm_*" to the "tum_*" type, i.e. by changing "tm_search_api_aggregation_2" to "tum_search_api_aggregation_2" (I also have another aggregated field).
This corrected the problem, but I wonder if this solution is "better" than the solution in #15, which copies all "tm_" fields to corresponding "tum_" fields.
Comment #25
drunken monkey
"info" is maybe a bad example, as that's used as a word, too, and therefore also might be indexed. And in other situations, I can't imagine how this problem should arise with the DB backend. But please create an issue there if you are sure this problem exists there, too, and describe it in detail.
Comment #26
sgurlt
This has all been working well for me for a couple of months. I only have two issues left that I never figured out how to solve; maybe someone could give me a hint.
I have several fields inside my autocomplete, one of which is "company name".
Let's imagine a user added the company name "Bright Solutions" to his profile. When a visitor now starts typing "Brig" in the autocomplete field, the following is printed out:
This is OK, but not perfect. I want to see the full company name here. Any idea how to achieve this?
The next weird thing is that when the user types "Bright " (with a white space after the word), the following results are returned:
Only one of them, "Bright solutions", really makes sense here; the rest is random stuff... Any idea how to avoid this?
Greetings
Sebastian
Comment #27
drunken monkey
You can index the field as a string by changing the Solr configuration. As the type, maybe just use sortString, that should work fine in your case.
If you want closer control over how the suggestions are generated, take a look at #2475435: Add support for switching the autocomplete implementation and the other recently committed (or pending) Search API Autocomplete issues.
Comment #28
sgurlt
Hm, sadly I still could not resolve this, even after updating to the latest dev version. Here is my full setup:
1. Created a new aggregation field and added all fields to it that should be searched.
2. Added the field to the index and tried indexing it as fulltext and as string. String does not return any results at all, so I left this on fulltext.
3. Configured autocomplete
4. Changed the field type in schema_extra_fields.xml
5. Checked in the Solr interface that the right field type is applied
6. Results:
Did I miss something?
Comment #29
drunken monkey
Hm. No, seems about right. Or which fields did you include in the aggregation? Just the company name, or which others?
The suggestions should all be complete field values from there. (If you have more than one field in the aggregation, I'd use the "List" aggregation type, though.)
If not, try to debug the query sent to the Solr server (see the handbook).
Oh, also, did you reindex after changing the Solr configuration? You'll have to do that.
Comment #30
sgurlt
The aggregation contains multiple other fields. As the search field is for keyword search, it contains fields like company name, user name, company tasks, and some more.
Even after changing the aggregation field type to "List" I am running into the same result, and yep, I reindexed ;)
The aggregated field must be indexed as fulltext on the field settings page, right?
I will write you a private message; maybe we could check it together using Skype or Hangouts? :)
Comment #31
drunken monkey
As said, try debugging the Solr query. Also try deactivating the search server's "Suggest additional words" option, if you haven't already.
Comment #32
sgurlt
Feels like I am pretty close to finally getting this working. I added a new fieldType to my schema.xml:
Now I receive the following:
The last issue left over is that when I now add a blank after a word, the suggestions stop showing:
Maybe another idea how to fix that? :-)
Comment #33
drunken monkey
Ah, yes, that's because the current autocomplete code treats everything in front of the last space (if any) as completed words, not as prefixes for a word you're still typing ("bright solutions" counts as one word in this case, due to your tokenizing, but the autocomplete code doesn't know that).
You can either modify the autocomplete code in this module slightly to change that (i.e., hack it) or, much more cleanly, use #2475435: Add support for switching the autocomplete implementation to write a suggester plugin with mostly identical code to the autocomplete code from the Solr module, but with this change for your use case. (While you're at it, you might find other spots where you can customize the code to better match the expectations of users on your site.)
Anyways, this has little or nothing to do with this issue. If you need any more help, please create a new issue for it.
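The behavior described in this comment can be sketched in Python. This is only a model of the idea, not the module's code: `split_input` mimics splitting the typed text at the last space, and `suggest_whole_input` is a hypothetical alternative that treats the whole input as one prefix, which is what a phrase-tokenized field would need.

```python
def split_input(user_input):
    """Split at the last space: (completed words, prefix still being typed)."""
    completed, _, incomplete = user_input.rpartition(" ")
    return completed, incomplete


def suggest_whole_input(terms, user_input):
    """Alternative for phrase-tokenized fields: match the entire input as a prefix."""
    prefix = user_input.lower()
    return sorted(t for t in terms if t.startswith(prefix))


# Terms as a keyword-style tokenizer would store them (whole phrases).
terms = ["bright solutions", "brighton"]

print(split_input("brig"))     # ('', 'brig')
print(split_input("bright "))  # ('bright', '')  -> nothing left to complete
print(suggest_whole_input(terms, "bright "))  # ['bright solutions']
```

With the last-space split, typing "bright " leaves an empty prefix and "bright" is treated as a finished word, although the index only contains the single term "bright solutions". Matching on the whole input avoids that mismatch.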
Comment #34
michaelmallett
sg88, can you please elaborate on what you've altered in your schema.xml? I've found this on Stack Overflow, is it you?
http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplet...
1. You seem to no longer be using the aggregate field, how are you setting everything up in drupal now to query that suggest_name field?
2. Is it possible to make the configuration changes (searchComponent, requestHandler) in schema.xml only? It's just that I am using Pantheon and they only allow changes to schema.xml. Otherwise, is there a way of making those configuration changes somewhere other than solrconfig.xml?
3. How do you force the Search API autocomplete to query on spellcheck.q rather than q? I have tried using the search_api_spellcheck module and adding the params to the query, but it hasn't helped.
If that's not you, then sorry! But how did you complete the configuration for your Search API setup? You have the exact same issue as me, or at least I am having a similar issue, where I am trying to index a term reference field and only retrieve autocomplete suggestions for exact matches. I have followed the directions in #13 to the point where I can return full terms without stemming, but similar to your issue, I am getting mixed multi-word terms back. So where I have the terms
"african descent"
"asian ethnicity"
I'm getting "african ethnicity", "african asian" etc, when I type 'african'.
Comment #35
michaelmallett
Never mind! I got it all sorted. My issue was that I was not delimiting on the right character, and so was either returning a mixture of terms together, or all the keywords on the field were being indexed as a single term. I also had the 'suggest words' option turned on in the Solr server, which was the cause of the terms being mixed together.
So, for anyone who wants to have a solr powered taxonomy autocomplete search, I did the following:
1. Added the aggregate field on the taxonomy name for the taxonomy term reference to take the taxonomy term name.
2. Added a new field in schema.xml:
<field name="tm_search_api_aggregation_1" type="suggest_phrase" indexed="true" stored="false" multiValued="true"/>
I did this in schema.xml because of a platform limitation, but this is probably best done in schema_extra_fields.xml instead.
3. Added the field type to schema.xml
KeywordTokenizerFactory is used to index the entire term in the case of multi-word terms.
4. In the tokenizer settings for the index, set the PCRE character class to [\n], to target newlines. It turned out that when the field value was passed to Solr, it was being passed as one long string with \n\n between terms.
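The effect of step 4 can be illustrated with a short Python sketch. Python's `re` module is used here only as a stand-in for the tokenizer's character class setting, the sample value is taken from the terms mentioned in #34, and the `[\s]` class is a hypothetical contrast case rather than the module's actual default.

```python
import re

# A field value as described above: terms separated by blank lines.
raw_value = "african descent\n\nasian ethnicity"


def tokenize(value, char_class):
    """Split `value` on runs of characters matching the given character class."""
    return [token for token in re.split(char_class + "+", value) if token]


# Splitting on any whitespace breaks multi-word terms into single words.
print(tokenize(raw_value, r"[\s]"))
# ['african', 'descent', 'asian', 'ethnicity']

# Splitting only on newlines keeps each taxonomy term intact.
print(tokenize(raw_value, r"[\n]"))
# ['african descent', 'asian ethnicity']
```

Only the second variant hands whole terms such as "african descent" to Solr, which is what a keyword-style field type needs in order to return complete terms as suggestions.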
Thanks to everyone in this thread for the work they did that got me here!
Comment #36
alexanansi
Hi Michael
Can you elaborate on point 4? I am not sure where / how to put this in my schema file as I am quite new to solr - thanks!
Alex
Comment #37
michaelmallett
Hi Alex.
This one isn't in your schema, it's in the Search API UI. When you are setting up the filters in the Search API form, there is a checkbox called "Tokenizer". Enable this, and then at the bottom of the form there is a text box for adding regex-style character targeting (though I don't think it's strictly regex). If you replace what's in there with [\n], including the square brackets, it should split the values by newline, rather than indexing them as one massive string.
If this isn't clear (I'm on a phone) I'll come back with screenshot later
Comment #38
mstrelan
So what happens if you have a single Solr server with separate indexes and separate searches, and on one index the aggregated field name is search_api_aggregation_1 and on the other it is search_api_aggregation_2? There doesn't seem to be a way to change the machine name of the aggregation.
Comment #39
drunken monkey
You can change those machine names programmatically, or just add/delete fields until they have the desired machine names. Granted, it's more a hack than a proper solution, but it works.
Comment #40
ciss at yousign GmbH
@mstrelan For a patch that allows you to customize the aggregated field's machine name see #2905585: Allow custom machine name for aggregation fields.
Comment #41
wil2091 as a volunteer
I'm on the same page as #32. The suggestions stop when a white space is added. Any possible solutions?
Also, I'm trying to sort the suggestion values. Below is a sample list:
bproduct01-1
aproduct-2
hproduct-2
cproduct004-3
I want to sort it asc -
aproduct-2
bproduct01-1
cproduct004-3
hproduct-2
I'm using the aggregated field with the Fulltext type. But on the fields tab it's clearly mentioned as
Which clearly means Fulltext and List types can't be used for sorting, while the aggregated field must be indexed as Fulltext to use it for autocomplete. Does this mean we can't sort the autocomplete values, except for integers and strings?
Any ideas?
Comment #42
OanaIlea at bio.logis Genetic Information Management GmbH
This issue was closed due to lack of activity over a long period of time. If the issue is still acute for you, feel free to reopen it and describe the current state.