Solr provides EdgeNGramTokenizerFactory, which indexes the leading-edge n-grams (prefixes) of a field's text so that queries can match partial strings.
There is a (rather old) example implementation here; finding more information would take some research (a Google search is a starting point).
http://coderrr.wordpress.com/2008/05/08/substring-queries-with-solr-acts...
I guess we would have to copy the node's body (?) into a new field, and write the PHP portion that queries Solr against just that index to power autocomplete.
Implementing this might have several consequences: (a) it would probably increase the index size (by how much?), and (b) the Solr PHP class probably (?) does not support this.
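To make the edge n-gram idea concrete, here is a minimal sketch in Python (not Drupal PHP, and not Solr's actual implementation). The function names and the `min_gram`/`max_gram` defaults are assumptions chosen for illustration. It also hints at concern (a): every indexed word contributes one entry per prefix, so the index grows with word length.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Return the leading-edge n-grams (prefixes) of a token."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_prefix_index(titles, min_gram=2):
    """Map each lowercase prefix to the set of titles containing a word with that prefix.

    Hypothetical in-memory stand-in for a Solr field analyzed with edge n-grams.
    """
    index = {}
    for title in titles:
        for word in title.lower().split():
            for gram in edge_ngrams(word, min_gram=min_gram):
                index.setdefault(gram, set()).add(title)
    return index


index = build_prefix_index(["Star Wars", "Stargate"])
print(sorted(index["sta"]))  # titles that have a word starting with "sta"
```

Looking up what the user has typed so far in `index` then returns the candidate titles directly, which is the behavior an autocomplete field needs.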
| Comment | File | Size | Author |
|---|---|---|---|
| #29 | 394076_29.patch | 9.75 KB | janusman |
| #29 | 394076_29.png | 12.58 KB | janusman |
| #18 | 394076-18.patch | 7.35 KB | janusman |
| #17 | 394076-14.patch | 7.53 KB | janusman |
| #10 | 394076-10.patch | 11.23 KB | janusman |
Comments
Comment #1
pwolanin commented
If you have totally public content, there is an easier way to do auto-complete. See: http://drupal.org/node/375341
Comment #2
janusman commented
Went ahead and wrote a contrib module: apachesolr_autocomplete
It's based on @pwolanin's comment on issue #375341: use TermsCompponent for autocomplete or a directory? for project.module.
Overview: it basically enables autocomplete in the block and main query box.
Instructions:
Caveats:
Perhaps this is best included with the core module, as it requires modifications to solrconfig.xml...?
Comment #3
janusman commented
Forgot to set as Needs Review... and it *really* needs some work, as the caveats are really bad IMO =| This is just a starting point.
Comment #4
janusman commented
Some cruft removed, and switched over to using the "spell" field, which includes title and body.
Comment #5
janusman commented
Fixed a callback in hook_menu that was not marked as MENU_CALLBACK. Hopefully now fit for review.
Comment #6
janusman commented
New patch. No code changes, just packaged everything into one patch that adds the module inside contrib/ and patches solrconfig.xml.
To test, you need to:
Comment #7
pwolanin commented
A good start, but I'm not sure that it is suitable for general use since, AFAIK, you can never have access controls work with this.
The Solr path being hard-coded in the PHP is a problem. Why don't you just use the current path and append '/autocomplete'?
Also, why do you search for 'solr'?
see also:
http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent...
http://wiki.apache.org/solr/TermsComponent
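As a rough illustration of what a TermsComponent autocomplete request looks like, the sketch below builds the query string in Python. The `terms`, `terms.fl`, `terms.prefix` and `terms.limit` parameters are TermsComponent parameters; the base URL, the `/autocomplete` handler path (taken from the suggestion above), the `spell` field and the helper name are assumptions about how a given site is configured.

```python
from urllib.parse import urlencode


def terms_autocomplete_url(base_url, prefix, field="spell", limit=10):
    """Build a TermsComponent request URL for terms starting with `prefix`.

    Assumes a request handler with the TermsComponent enabled is mapped
    to <base_url>/autocomplete in solrconfig.xml (hypothetical setup).
    """
    params = {
        "terms": "true",                 # turn the TermsComponent on
        "terms.fl": field,               # field whose indexed terms are listed
        "terms.prefix": prefix.lower(),  # only terms starting with what was typed
        "terms.limit": limit,            # maximum number of terms returned
        "wt": "json",
    }
    return base_url.rstrip("/") + "/autocomplete?" + urlencode(params)


print(terms_autocomplete_url("http://localhost:8983/solr", "Lear"))
```

Note that nothing in this request restricts the terms to documents the current user may see, which is the access-control limitation raised in this comment.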
Comment #8
janusman commented
@pwolanin: oops =) And thanks for the links... in there, Tim said something that might let us do access control:
... which might not be great performance-wise.
I'll try this out; if it works, I think we could default to the method in #6 when apachesolr_nodeaccess.module is disabled and use the latter to enable access control.
Comment #9
janusman commented
Here's a new try. Some changes:
Also see attached screenshot.
Thoughts?
Comment #10
janusman commented
I like this one better =) This one:
How to test: start with the latest 6.x beta or DEV version. Apply the patch (which will just create a new contrib module, it doesn't change any files). Go to admin/build/modules and enable the module. FLUSH ALL CACHES. Go to /search/apachesolr_search and type in a few keywords and wait for the autocomplete to kick in.
See attached patch and screenshot.
Comment #11
robertdouglass commented
Use drupal_json instead.
I'd get rid of the define and hardcode the path into the form. It doesn't make it any easier to change if it's a define, and it makes it harder to read.
We don't need a variable for this
+ // Set parameters for partial term query
+ $autocomplete_field_name = 'spell';
You can just make it
+ $matches = $response->facet_counts->facet_fields->spell
We can get rid of $html
+ $html = theme('apachesolr_autocomplete_highlight', $keys, $suggestion, $count);
+ $suggestions[$suggestion] = $html;
Comment #12
robertdouglass commented
I'm not sure what I think of the hook. What would you do with the node title? Search for the whole title?
Why is the function _nodetitles_apachesolr? I admit I'm not a fan of _function notation.
Also (and I know this is really nit-picky), please don't keep copying the code comments on these lines. In fact... it'd be nice if we could improve them so they make more sense =)
Comment #13
pwolanin commented
Probably also want to handle form ID "search_theme_form"
Comment #14
robertdouglass commented
Could you explain this limitation to me?
Thanks!
Comment #15
robertdouglass commented
Is this part needed?
I guess it's because we want extra space? But won't it mess with people's themes?
Comment #16
robertdouglass commented
Really nice feature. Congrats.
Comment #17
janusman commented
@robertDouglass: corrected the things you mention in #11. Re: #12, that function wasn't used by the module; deleted it.
@pwolanin: took care of form id search_theme_form.
New patch.
Comment #18
janusman commented
Glad I have your attention! =)
New patch fixing #14, #15 =)
Comment #19
janusman commented
Drat! Found a place where this fails: if you are starting from a search that has an active filter, and have the "Retain current filters" checkbox under the search box enabled, the suggestions *might* not actually return any results.
Fixing this (I think?) means writing a new autocomplete handler in JS that knows about that checkbox's value. That's a bit beyond my reach, expertise- and time-wise... help would be appreciated.
Comment #20
robertdouglass commented
I think it's fair to put some things like that into the $_SESSION object. That's what I'm doing to make autocompletes work right in a different setting on a client project. Every time you submit the form, save the state of that checkbox; then it's there for the autocomplete to call on.
Comment #21
robertdouglass commented
The menu handler doesn't need a title:
Comment #22
robertdouglass commented
Nor does it need page arguments:
Comment #23
robertdouglass commented
drupal_json does the printing for you:
Comment #24
robertdouglass commented
I'm still skeptical of the hook: + $results = module_invoke_all('apachesolr_autocomplete_suggestions', $query);
We could reduce these three functions to one:
becomes
Or if you think that getting the suggestions is somehow a useful API function for others, then two functions: one to get the suggestions, and the callback to drupal_json them. But the hook seems like overkill to me (unless you can explain some compelling use cases).
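The two-function split suggested here (one function that computes suggestions, one thin callback that serializes them) can be sketched as follows. This is Python rather than the module's PHP: `json.dumps` stands in for `drupal_json`, and `term_counts` is a hypothetical in-memory stand-in for whatever Solr returns. None of these names come from the patch.

```python
import json


def get_suggestions(keys, term_counts, limit=5):
    """Return {suggested search: count} for terms completing the last word typed."""
    words = keys.lower().split()
    if not words:
        return {}
    prefix, head = words[-1], " ".join(words[:-1])
    # Most frequent terms first; ties broken alphabetically.
    matches = sorted(
        ((term, count) for term, count in term_counts.items() if term.startswith(prefix)),
        key=lambda tc: (-tc[1], tc[0]),
    )[:limit]
    return {(head + " " + term).strip(): count for term, count in matches}


def autocomplete_callback(keys, term_counts):
    """Thin page callback: serialize the suggestions and nothing else."""
    return json.dumps(get_suggestions(keys, term_counts))


counts = {"learning": 12, "learn": 30, "object": 7}
print(autocomplete_callback("open lear", counts))
```

Keeping the lookup separate from the serialization means another module could reuse `get_suggestions` without going through the menu callback, which is the only reuse case this comment concedes.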
Comment #25
Scott Reynolds commented
Here are my comments on this patch as it stands:
- Remove inline styles from theme functions
- static cache copy and paste error:
AND
Reading this patch (I haven't run it locally yet), it seems that the autocomplete doesn't work with phrases, meaning it only autocompletes a single word.
I would suggest instead leveraging TermsComponent*: http://wiki.apache.org/solr/TermsComponent
Oh, and I would suggest looking for a JavaScript-only solution. There are solid JS-to-Solr examples. It avoids bootstrapping Drupal, which is a huge boon. You don't need Drupal except for apachesolr_nodeaccess. So it would be possible to put the data you need for the node access* into JS variables, which would let you query Solr from JS and avoid Drupal.
So if I were writing this patch, I would look to leverage the TermsComponent handler, and I would work hard for a JS-only solution that avoids the Drupal bootstrap.
*Edit: OK, I get why you're not using TermsComponent: node access. But you're not returning nodes, you're returning term matches, so I'm not sure node_access is important here. It would suck to have the autocomplete suggest a term and have it not match anything you have access to.
Comment #26
janusman commented
@robertDouglass: re #20, I think the _SESSION object would solve the problem *if* we didn't have that "Retain current filters" checkbox, which can be changed by the end user at any moment =) So we would need some JS to send that state into the suggestion callback. =( (I'm thinking just adding an argument to the callback URL, somehow; need a Drupal behaviors JS guru here)
re: #21-24, I agree, I'll incorporate those. I thought the hook would be a good idea to let others add suggestions... but in the end it's a bit nuts to think there'd be more than one active module making suggestions? (Say, a module wants to add suggestions using only the taxonomy name field... or suggest using the spellchecker... dunno).
Comment #27
janusman commented
@Scott: Saw your last edit, and will explain a bit... perhaps you already *got* this =)
The whole point is to get suggestions that, as you say, don't suck (actually do match something).
Hence going all the way to ensure that, if apachesolr_nodeaccess (or who knows what other modules that could alter the Solr query) is active, it also alters how the suggestions work.
TermsComponent was my first clue into this (look at #6) but (AFAIK) just uses one field's values across the whole index (which in my patch in #6 was the "spell" field). It can't do things like suggest phrases (like "star w") nor act on filters (you want suggestions only for "movie-type" nodes). Or can it? =)
Using facets for suggestions solves this, as it accepts a query and other filters from which to suggest terms (which in reality are values from the field "spell" from items matching a keyword query, e.g. "learn", and other parameters from modules such as apachesolr_nodeaccess).
Also, I'm sure there are ways to optimize this... I too am concerned with performance but have not gotten that far; I have been trying to at least get usable results first and go from there. I read that using facets for term suggestions *might* be worse than TermsComponent, but have not seen nor done any benchmarking. Bypassing Drupal of course would boost performance; would we lose apachesolr_nodeaccess or other functionality? Would it be that much quicker *and* keep the UX intact?
I'm aiming for an easy-to-install contrib module that *just* works. However of course I don't want Drupal or Solr to die from too many requests... will benchmark soon.
Oh, and I might be totally wrong on my assumptions, feel free to shamelessly educate me =)
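The facet-based approach described in this comment can be sketched as a request builder: the completed words become the query, the partial word becomes a facet prefix on the "spell" field, and any filters (for example those added by apachesolr_nodeaccess) ride along as `fq`. The Solr parameters used (`facet`, `facet.field`, `facet.prefix`, `facet.limit`, `facet.mincount`, `fq`, `rows`) are standard; the function name, the defaults and the `type:movie` filter are assumptions, and this is Python rather than the module's PHP.

```python
from urllib.parse import urlencode


def facet_suggestion_params(keys, field="spell", filters=(), limit=5):
    """Build Solr parameters that suggest completions for the last word of `keys`."""
    words = keys.split()
    partial = words[-1].lower() if words else ""
    completed = " ".join(words[:-1])
    params = [
        # Assumption: "*:*" is match-all for the standard query parser;
        # a dismax setup would rely on q.alt instead.
        ("q", completed or "*:*"),
        ("rows", "0"),               # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", partial),   # terms must start with the partial word
        ("facet.limit", str(limit)),
        ("facet.mincount", "1"),     # never suggest a term with no matches
    ]
    params.extend(("fq", f) for f in filters)
    return params


print(urlencode(facet_suggestion_params("star w", filters=["type:movie"])))
```

Because the facet counts are computed over the documents matching `q` and every `fq`, a suggestion that comes back is backed by at least one document within that filtered set, which is the property TermsComponent cannot offer.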
Comment #28
robertdouglass commented
I get why we would want to bypass Drupal but I don't think it's realistic. The assumption of the module so far is that Solr is a protected resource and Drupal is its gatekeeper. I'd not like to introduce the exception to that here.
I'm working on this patch and have made serious changes (and it's looking good!) but I'm tired and going to bed. I'll post my updates tomorrow.
Comment #29
janusman commented
@robertDouglass: I'm working on this too... hopefully we don't run over each other, and can merge the best from both efforts.
New patch. Changes:
After installing this patch you need to flush all caches for it to work.
Check out the attached screenshot.
Comment #30
janusman commented
Forgot to set as needs review.
Comment #31
janusman commented
Found a slight problem when suggesting additional terms (e.g. typing "learning" and getting suggestions like "learning objects").
Each "additional term" suggestion reports the expected number of results; however since our solrconfig.xml has a 'mm = 2<-35%' setting, that number is not always correct.
A scenario:
How would we solve this? We could get rid of the # of results preview in the suggestions when the number of terms > 2... or altogether.
Ideas welcome.
Comment #32
robertdouglass commented
We can change mm for this case. I know how to do that, I'll put it in the merged patch tomorrow.
Some changes I've made:
- I'm either sending it to term completion (if the last keystroke was a character), or to further suggested terms (if the last keystroke was a space). I like this much better than always merging the two. When you're typing a word you want word completion suggestions, and when you type a space you want guidance for the next word.
- I've overhauled the algorithm for word suggestions introducing a naive scoring and some stop word detection.
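The either/or dispatch on the last keystroke described in the first item above can be sketched like this. It is a Python illustration, not the patch's PHP; the function name and the mode labels are hypothetical.

```python
def choose_mode(keys):
    """Decide which kind of suggestion to fetch from what has been typed so far.

    Returns (mode, completed_words, partial_word):
      - "complete":  the user is mid-word, so suggest completions of partial_word
      - "next_term": the last keystroke was a space, so suggest a following word
      - "none":      nothing useful has been typed yet
    """
    words = keys.split()
    if not words:
        return ("none", "", "")
    if keys[-1].isspace():
        return ("next_term", " ".join(words), "")
    return ("complete", " ".join(words[:-1]), words[-1])


for typed in ["learn", "learning ", "star w"]:
    print(repr(typed), choose_mode(typed))
```

Only one Solr request is then needed per keystroke, instead of always running and merging both kinds of suggestion.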
Comment #33
robertdouglass commented
I've merged our two patches, more or less. We've both started looking at scoring algorithms to find the most "interesting" of the word suggestions. I also renamed some functions. The most important differences I'll post here for you to think about. My version of the patch isn't really done yet as I'm not happy with the results.
Here you can see that I stopped combining 5 suggestions with 5 terms, and made it an either/or thing based on the last character. I like this much better and suggest we both adopt it going forward.
The other major changes I've been playing with are in the term suggestions. I'll post here for your inspiration, but I'm not happy with the results yet.
Also note that I removed "results" from the count display due to real-estate concerns. I think it takes up too much space.
Comment #34
janusman commented
@robertDouglass:
I think we're getting there =) I think we're getting at least acceptable results now and need to continue to fine-tune.
Re: using only one of the two algorithms depending on $last_char: I think it's better performance-wise, but we *really* don't know if the user has indeed finished entering a "valid" word (one that's in the index) without actually searching for it. See the last portion of this comment for an idea.
Re: your apachesolr_autocomplete_additional_term() function
Re: my spellcheck portion in the last patch: don't know if you missed or disliked that. (I suspect you were looking into it since you opened #499640: Are we stemming our spelling index?...).
AND... I say we should look at what Google, Amazon (& others) are doing.
I did a quick test of Google Suggest and Amazon.com's autocomplete features: entering "terminat" nets suggestions like "terminator salvation", "terminator 3", "terminator 4", "terminator sarah connor", etc. The order is related to neither term length nor number of results returned. Note how it also does term completion and additional terms for each suggestion. I have no concrete ideas on how to do this with Solr yet, but I'm guessing it would require multiple queries.
Comment #35
robertdouglass commented
Some stuff to look at for inspiration: http://dzineblog.com/2009/07/20-ready-to-use-auto-completion-scripts.html
Comment #36
janusman commented
... and we still have the basic problem of "mm" from #32...
@robertDouglass: BTW did you mean to post a patch and didn't in #33?
Comment #37
robertdouglass commented
errr... If I intended to, I didn't, and I've wiped the install since then :( Fortunately it looks like I recorded the salient bits in the comment.
Comment #38
robertdouglass commented
For mm, this is sample code that I have in modify_query in a custom module:
The xor flag is a switch between AND and OR. If mm = 1 then it is an OR query. If it equals 100%, it is an AND query.
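A minimal sketch of the switch just described, written in Python rather than the custom module's PHP. The function name and the plain-dict representation of the query parameters are assumptions; only the two mm values come from the comment.

```python
def apply_operator(params, operator):
    """Return a copy of `params` with mm set for an AND or an OR query.

    mm = 1    -> at least one term must match (behaves like OR)
    mm = 100% -> every term must match (behaves like AND)
    """
    mm_by_operator = {"OR": "1", "AND": "100%"}
    updated = dict(params)  # leave the caller's parameters untouched
    updated["mm"] = mm_by_operator[operator.upper()]
    return updated


print(apply_operator({"q": "one two three"}, "and"))
```

For the autocomplete case, forcing `100%` on the follow-up search would make the result count agree with a suggestion that was computed by requiring every earlier term.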
Comment #39
janusman commented
@robertDouglass: Let me explain the mm issue a bit more... :)
The number of *expected* results shown by the autocomplete widget does not match the real results for that query, ONLY when the suggestion has 3 or more terms.
This is because the number of expected results in the suggestions are coming from facet count (which are not affected by mm). If the user accepts a suggestion, it is launched as a keyword search that *is* affected by mm.
For example: The user types in "one two t" in the search box. The suggestion "one two three" comes back, and says this will return 10 results (the facet count for "three" = 10). However search/apachesolr_search/one+two+three returns a LOT more records.
This is because the suggestion was calculated from "q=one two", which is equivalent to "one AND two", and then counting the occurrences of "three", but the final query search/apachesolr_search/one+two+three is the equivalent of (one AND two) OR (one AND three) OR (two AND three) ... because of the mm = 2<-35% setting in solrconfig.xml for dismax.
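To see why three terms behave differently from two, the arithmetic behind 'mm = 2<-35%' can be worked through in a small Python sketch. It is a simplified reading of the documented dismax "minimum should match" rules and handles only a plain integer, a percentage, or a single "N<spec" condition; the function name is hypothetical.

```python
def required_clauses(mm, n):
    """How many of `n` optional query terms must match under a simple mm spec."""
    if "<" in mm:
        threshold, _, rest = mm.partition("<")
        if n <= int(threshold):
            return n          # at or below the threshold, every term is required
        mm = rest             # above it, the spec after "<" applies
    if mm.endswith("%"):
        pct = int(mm[:-1])
        amount = (n * abs(pct)) // 100   # percentages are rounded down
        result = n - amount if pct < 0 else amount
    else:
        value = int(mm)
        result = n + value if value < 0 else value
    return max(0, min(n, result))


for terms in (2, 3, 4):
    print(terms, "terms ->", required_clauses("2<-35%", terms), "required")
```

With two terms both are required, but with three terms 35% of 3 rounds down to 1, so only two of the three must match. That is the "(one AND two) OR (one AND three) OR (two AND three)" behavior described above, and it is why the facet count for the third word undercounts the real result set.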
We could fix this by:
(A) not showing the expected number of results for any suggestion (less feedback but hey, Google does it)
(B) same as (A) but only when the wordcount in the suggested search > 2
(C) querying Solr for each suggestion to get the *real* number of expected results (which *might* place a heavy load on Solr)
(D) have an mm = 100% parameter by default in dismax so that, for example, search/apachesolr_search/one+two+three returns the same number of documents as the suggestion "one two three".
I'm thinking (A) for now, for being consistent and fast. But we lose that [potentially] great feedback :(
Thoughts appreciated.
Comment #40
robertdouglass commented
Comment #41
wonder95 commented
I really have a need to have this autocomplete feature for a project. I'm new to Solr, but I'm willing to help this along as best I can.
I've downloaded 6.x-2.x-dev, but I don't see this code in there, and it looks like there has been a lot of discussion since the last posted patch in this thread. Is there somewhere I can get the most current code for this?
Thanks.
Comment #42
francewhoa
Wow. Nice feature. Subscribing.
Comment #43
janusman commented
Finally opened up that new project! See:
http://drupal.org/project/apachesolr_autocomplete
I added most of the code from @robertDouglass's comments. I sidestep the mm issue from #39 by not showing the result-count preview.
Should I close this out and continue discussion over there?
Comment #44
wonder95 commented
I just did the same thing and have been testing it over the past hour. It rocks! This is exactly what we needed. I'll have a bunch of time to work on this next week if needed, so let me know what you think needs to be improved.
To answer your question, go ahead and close this one, and I'll grab what you have from the new project page.
Comment #45
robertdouglass commented
janusman, yeah, I'll close it for now. Your new module will be the center of activity on this feature now. Great work!
FIXED: See the dedicated module for this feature: http://drupal.org/project/apachesolr_autocomplete