We've just finished testing the D5 and D6 versions of this module in-house, and it appears that Solr's partial-word completion doesn't work in the 5.x version. Search for 'sub' hoping to find 'submarine', and you'd expect the stemmer to kick in and return all the nodes with 'sub' AND all the nodes with 'submarine'. This does not happen in 5.x, but it does in 6.x. The synonym filter is working, however.

What changed? Who managed to enable this feature in D6, and can they backport it to D5 please?

Thanks in advance,
*****
Senpai

Comments

Senpai’s picture

Assigned: Unassigned » Senpai

Update. It turns out that there is no difference between the two versions of this module as it pertains to partial-word completion. The real deal is that the $query that's sent to Solr needs to have varied parameters in order to provide partial-word completion or sounds-like completion matches.

I'll roll up a patch that takes care of this, and makes it an admin-settable checkbox.

robertDouglass’s picture

please provide examples of the different queries. Also please specify what Solr version you are using. Thanks.

robertDouglass’s picture

also note I've committed some stuff in the last minutes so you might want to cvs up before you start rolling patches.

Senpai’s picture

Title: Partial-word search works in D6 version, but not in D5 version. Why? » This patch enables partial-word search completion
Version: 5.x-1.0-beta2 » 5.x-1.x-dev
Status: Active » Needs review
FileSize
2.09 KB

Thanks to the genius mind of Bill O'Connor, the apachesolr searches can now do partial word completions! Thanks for the heads-up on the recent commits, Robert.

(I sure hope this patch applies to 5.x-dev. I rolled it with Textmate, and I'm not sure I did a multifile patch properly.)

janusman’s picture

Just my $.02...

The patch would only add "*" to the last word typed; for example, searching for "brand name products" would tell Solr to search "brand name product*", which would only affect the word "product". Adding "*" in the correct places would require a good regular expression, which the following isn't:

$query = preg_replace('/(:?[\s|$|\)])/', '*\\1', $query)

Also... I don't like the *idea* of this implementation... sure, partial word completion will bring more results, but is the search better because of it? If at all implemented, I would make results that are a partial-word match sink to the bottom of the search results, not mixed with exact-word match. After all, the searcher *must* know something =) This can be done rewriting the search in Drupal before sending to apache, adding ^ (boost) to exact terms and adding extra terms with "*" with no boost.

Example:

   (yellow sub)^5 (yellow* sub*)

This boosts exact matches vs. partial-word matches.
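A minimal sketch of that rewrite, assuming the request handler in use accepts both boosts and wildcards in the query string. The function name `example_boost_exact_over_partial()` is hypothetical and not part of the apachesolr module, and no escaping of Lucene special characters is attempted:

```php
<?php
/**
 * Hypothetical helper (not part of apachesolr): builds a query that boosts
 * the exact terms and adds un-boosted wildcard copies of the same terms.
 */
function example_boost_exact_over_partial($keys, $boost = 5) {
  // Split on whitespace and drop empty tokens.
  $terms = preg_split('/\s+/', trim($keys), -1, PREG_SPLIT_NO_EMPTY);
  if (empty($terms)) {
    return '';
  }
  $exact = implode(' ', $terms);
  // Append "*" to every term for the partial-match clause.
  $partial_terms = array();
  foreach ($terms as $term) {
    $partial_terms[] = $term . '*';
  }
  $partial = implode(' ', $partial_terms);
  return '(' . $exact . ')^' . $boost . ' (' . $partial . ')';
}

// Prints: (yellow sub)^5 (yellow* sub*)
echo example_boost_exact_over_partial('yellow sub'), "\n";
```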

Senpai’s picture

Category: support » feature

Changing the Category to Feature Request.

Robert, the queries in question here would be along the lines of 'pil' finding all instances of 'Pilates', or 'sub' finding all occurrences of 'submarine'. In either case, traditional stemming wouldn't be good enough.

Senpai’s picture

Whoops, I messed up on the actual search token. This is the patch that works. No, really!

@janusman in #5:

partial word completion will bring more results, but is the search better because of it? If at all implemented, I would make results that are a partial-word match sink to the bottom of the search results, not mixed with exact-word match.

The kind folks of the Lucene project have already taken your concerns into account. With the patch in #7 applied, the search *will* return exact matches as well as partial-word matches, but the partial-word results are at the bottom of the pile by default. It took us a while to figure out that this was happening already, so here's an example to help you understand how Solr handles this patch.

Example: Search for 'member', and all nodes that have the word 'member' in their $title or $body are found and the word member is bolded, but underneath those results are nodes that have the word 'membership', but non-bolded. Try it, you'll like it!

chuckdeal97’s picture

OK, I got this patched into 6.x-1.0-alpha3, but the concerns from #5 still stand as far as multiple tokens go.

Does anyone know if it is feasible to preprocess the query to append the wildcard to each token? I can see that at the very least, the preprocessor would need to ignore 'OR' and 'AND'. Any other gotchas? Is a simple whitespace tokenizer on the query appropriate? Would it be more appropriate to write some kind of Lucene/Solr plugin to handle this?

I don't know that I am qualified to solve this problem, but I'll contribute if I can.
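One possible shape for such a preprocessor, as a rough sketch only. It assumes a plain whitespace tokenizer is acceptable, skips the uppercase operators AND, OR and NOT, and leaves any token that is not a plain word (quotes, parentheses, field prefixes, existing wildcards) untouched. The function name is hypothetical:

```php
<?php
/**
 * Hypothetical preprocessor: appends "*" to each plain word in the query.
 */
function example_append_wildcards($keys) {
  $operators = array('AND', 'OR', 'NOT');
  $tokens = preg_split('/\s+/', trim($keys), -1, PREG_SPLIT_NO_EMPTY);
  $out = array();
  foreach ($tokens as $token) {
    // Operators and anything containing non-alphanumeric characters
    // are passed through unchanged.
    $is_plain_word = preg_match('/^[\p{L}\p{N}]+$/u', $token) === 1;
    if (in_array($token, $operators, TRUE) || !$is_plain_word) {
      $out[] = $token;
    }
    else {
      $out[] = $token . '*';
    }
  }
  return implode(' ', $out);
}

// Prints: brand* name* OR products*
echo example_append_wildcards('brand name OR products'), "\n";
```

Quoted phrases such as "yellow sub" pass through unchanged here, which is probably the safer default; whether that is the right behavior is a design decision for the patch.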

jaumenet’s picture

Version: 5.x-1.x-dev » 6.x-1.x-dev

pwolanin’s picture

Status: Needs review » Needs work

I don't think this patch will apply to 6.x

Also, we are using the dismax handler, and according to the Solr wiki:

Wildcards in this "q" parameter are not supported.

See: http://wiki.apache.org/solr/DisMaxRequestHandler

Dismax gives us easy searching across fields and easier boosting, but only very simplified keyword handling. If you think you want to revert to the standard handler, please provide a patch that replicates all that.

aufumy’s picture

How about using the patch mentioned here:
http://drupal.org/node/467810 to pass in $caller as 'mycustom_search'

apachesolr_search_execute('tes*', 'type:blog', '', '', 0, 'mycustom_search')

so that when calling

apachesolr_modify_query($query, $params, $caller);

Then, custom module can specify:

function my_module_apachesolr_modify_query(&$query, &$params, $caller) {
  if ($caller == 'mycustom_search') {
    $params['q.alt'] = $query;
    $query = '';
  }
  // I only want to see articles by the admin!
  $query->add_field("uid", 1);          
}

aufumy’s picture

Title: This patch enables partial-word search completion » Partial-word search completion with Regex

don't have this working yet, just a proposed idea.

aufumy’s picture

FileSize
702 bytes

This adds remove_keys() to Solr_Base_Query.php so that q.alt can be populated and q can be unset, to be able to use regular expressions.

function my_module_apachesolr_modify_query(&$query, &$params, $caller) {
  if ($caller == 'mycustom_search') {
    $params['q.alt'] = $query->get_query_basic();
    $query->remove_keys();
  }
  // I only want to see articles by the admin!
  $query->add_field("uid", 1);          
}

pwolanin’s picture

Status: Needs work » Needs review

Unsetting like that probably gives you PHP warnings later?

Scott Reynolds’s picture

Also please update the Interface definition as well.

aufumy’s picture

FileSize
1.35 KB

I checked the error logs, and do not see warnings.

Updated the interface as well in this patch.

aufumy’s picture

Peter, what do you suggest then, setting it to NULL or empty string or?

And Scott, are you talking to me or Peter, and if me, could you please be more specific?

Scott Reynolds’s picture

That patch is exactly what I was talking about. I wanted the Interface updated so that a similar thing could be done with the Views query object

++

aufumy’s picture

Assigned: Senpai » Unassigned
FileSize
2.3 KB

Added hook_apachesolr_modify_query() to apachesolr_search.module so that if it finds * or ~ it will use q.alt.

Scott Reynolds’s picture

hmm a module shouldn't implement its own hook. this can be done in the hook_search or somewhere else in the flow.

Also, your modify query should, at minimum, check to ensure it's not an mlt query.

edit: use strpos instead of a regex. Much faster.

aufumy’s picture

Would 2 or 3 strpos calls be faster than one regex? I don't know the details on that. I can surely change it to strpos if 2 or 3 would be faster.

Okay, instead of invoking the hook, I will work on an admin switch that turns on searching with '*', '?' and '~' through q.alt instead of dismax, with a message that boosts will not apply.
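As a sketch of the detection step: PHP's strpbrk() checks for any of several characters in one call, which avoids both the regex and the repeated strpos calls. The function names and the plain parameter array below are hypothetical stand-ins, not the module's query object, and the sketch assumes q.alt is only consulted when q is absent:

```php
<?php
/**
 * Returns TRUE if the keywords contain a wildcard or fuzzy operator.
 */
function example_has_wildcard($keys) {
  // strpbrk() returns FALSE when none of the listed characters occur.
  return strpbrk($keys, '*?~') !== FALSE;
}

/**
 * Hypothetical routing: wildcard queries go to q.alt, everything else to q.
 */
function example_route_query($keys) {
  $params = array();
  if (example_has_wildcard($keys)) {
    // Leave q unset so that q.alt is used instead.
    $params['q.alt'] = $keys;
  }
  else {
    $params['q'] = $keys;
  }
  return $params;
}

print_r(example_route_query('sub*'));
print_r(example_route_query('submarine'));
```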

robertDouglass’s picture

Version: 6.x-1.x-dev » 6.x-2.x-dev

moving to 6.2 so that it can get more attention. 6.1 is in code freeze. aufumy, Scott Reynolds - let's get this rtbc by DrupalCon at the latest?

marcvangend’s picture

Hi, what's the status of this patch? I just found out about this thread - is partial word searching still an option for the 2.x version?

jpmckinney’s picture

Added the remove_keys method. Not sure I would add the other functionality to this module. If anyone wants to get the rest in, re-roll the patch with an admin setting. http://drupal.org/cvs?commit=359546

jpmckinney’s picture

Status: Needs review » Needs work

kscheirer’s picture

I do actually like the idea of this patch - I would actually expect that a search for "sub" would return results including "submarine". Perhaps that's a result of being conditioned by Google's search.

The two patches provided actually take slightly different approaches. In #7, the wildcard is silently added to the last term of any search. In #19, the wildcard is only added when the user uses one in their search. I prefer the former; asking the user to type in sub* doesn't seem as good. But the patch provided in #7 doesn't have any effect anymore, I presume because the DisMaxRequestHandler now ignores wildcards in the main query.

I did get this working in 6.x-1.0 by patching in the remove_keys() function, and then using aufumy's approach to move the query into q.alt. I can provide a patch for 6.x-1.0 if anyone is interested.

I would expect this to be core behavior for the module, but it would mean all searches would be run through q.alt. I don't know enough about Solr or the DisMaxRequestHandler to know if that's a bad idea.

marcvangend’s picture

At first sight, the silent wildcards from #7 seem like a good idea, but when you think about it, it can cause difficulties as well. Imagine searching for "quest" because you want to know about a certain quest (or maybe a magazine or company called Quest) and getting results with words like 'question', 'conquest', 'equestrian', 'request' and everything else in this list.

I think that silent wildcards can only be usable when combined with some kind of relevancy algorithm that prefers the search term as word over the search term as partial match.

kscheirer’s picture

From #7, it seems like solr will already prefer whole matches over partial matches - but I haven't confirmed that this is still true.

ygerasimov’s picture

FileSize
144.93 KB
25.37 KB
144.79 KB

I can't get partial matches working.

I am trying to use the admin panel of Solr.
http://localhost:8983/solr/core1/select/?q.alt=body:Uitstroombevordering...
Returns one result (screenshot 035)

http://localhost:8983/solr/core1/select/?q.alt=body:Uitstroombevorderin*...
Returns no results (screenshot 036)

Interestingly, using the fuzzy search operator does return a result.
http://localhost:8983/solr/core1/select/?q.alt=body:Uitstroombevorderin~...
screenshot 037

Using fuzzy search instead is not nice at all, as some long words should be found with only the first three or four letters typed.

ygerasimov’s picture

Title: Partial-word search completion with Regex » Partial-word search completion

Can we somehow use the ExtendedDismaxQParser handler? According to http://stackoverflow.com/questions/2413946/wildcard-searching-and-highli... it might help.

pwolanin’s picture

Yes, it's not that hard to backport EDismax - but you have to build Solr yourself. I consider this the main acceptable solution. see: #713142: Add configuration option to a search environment if we want to use Dismax or EDismax.

Using the standard query handler via q.alt is really not ideal - and note that it will only sort of work, since the body is the default field.

thtas’s picture

I'm trying to get this working like in #7 but using version 2.x dev.
My method is to just add a wildcard using hook_apachesolr_modify_query

like this:

(added to my_module.module)

function my_module_apachesolr_modify_query(&$query, &$params, $caller) {
   $query->set_keys($query->get_keys().'*');
}

This seems to work, but doesn't produce the desired results.

So I checked the Solr admin panel on my server at http ://my-server.com:8080/solr/admin/ and found that testing queries by adding wildcards in the test form also doesn't work.

*but*, if i try and do a search with an added wildcard on the "FULL INTERFACE" form located at http ://my-server.com:8080/solr/admin/form.jsp
I get exactly the results i expect with searches for both foo* and foo~ returning results for foobar as expected!

So what is this "full interface" doing differently? and how can i make the apachesolr module run a query in a similar way?

Any ideas greatly appreciated.

I'll reply here if i eventually get this working

*EDIT*

Okay, I got it working. By comparing the params between the standard and full interface results, I found that the following param is required for the partial matching to work:
<str name="qt">standard</str>

So here is the final code which works:

function my_module_apachesolr_modify_query(&$query, &$params, $caller) {
   $query->set_keys($query->get_keys().'*');
   $params["qt"]="standard";  
}

This is great as it means we can control how we want this wildcard matching to work. i.e. *query* or just query* or even query~
Nice module :)

Nick_vh’s picture

I'd like to see this in the form of a patch, since this is important functionality that should be enabled or disabled by the user who is configuring Apache Solr!

Edit: I tried this myself and seems not to work properly in my instance of solr :
The error I have when I enable those 2 lines :

warning: htmlspecialchars() [function.htmlspecialchars]: Invalid multibyte sequence in argument in /opt/svn/trunk/www/core/includes/bootstrap.inc on line 857.

When I disable

$query->set_keys($query->get_keys().'*');

it runs fine again but the wildcard search does not function yet.

Nick_vh’s picture

FileSize
1.76 KB

Included is a patch that brings this functionality to the administration backend.

TODO : whenever a person does not request a word and just tries to fetch everything, the qt parameter should not be set.
Setting it on an empty query generates an error, along with a message that Apache Solr is not reachable anymore.

TODO : Improve the check if we do a query with a word or just a wildcard search
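The two TODOs could be covered by a guard along these lines. This is only a sketch with a hypothetical function name, working on a plain keyword string and parameter array rather than the module's query object; it leaves the request untouched when nothing was typed or when the input consists only of wildcard characters:

```php
<?php
/**
 * Hypothetical guard: only switch to the standard handler and append a
 * wildcard when the user actually typed a word.
 *
 * Returns array($keys, $params).
 */
function example_apply_partial_match($keys, array $params) {
  $keys = trim($keys);
  // Empty input, or nothing left once wildcard characters are stripped:
  // do not set qt and do not append anything.
  if ($keys === '' || trim($keys, '*?~ ') === '') {
    return array($keys, $params);
  }
  $params['qt'] = 'standard';
  // Avoid producing "sub**" when the user already typed a trailing wildcard.
  if (substr($keys, -1) !== '*') {
    $keys .= '*';
  }
  return array($keys, $params);
}

list($keys, $params) = example_apply_partial_match('sub', array());
// Prints: sub* standard
echo $keys, ' ', $params['qt'], "\n";
```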

Please review

Nick_vh’s picture

Status: Needs work » Needs review

Changed status to needs review

Nick_vh’s picture

This is not yet the best option, since it disables the 'did you mean' option and returns unreliable results when queried against a large amount of data.

Apparently this affects which query engine is used. As far as I understand, when you use qt = standard a lot of the server-side checks (such as the spellchecker) are not executed, and some unwanted content could pop up in your search results.

The better option would be to migrate to the edismax query engine (see http://drupal.org/node/713142)

So if you are concerned about the normal functionality of your search do NOT use this patch.

jpmckinney’s picture

Status: Needs review » Closed (duplicate)

I think #713142: Add configuration option to a search environment if we want to use Dismax or EDismax. is the way to go. This issue hasn't made much progress since the first patch. Still too many TODOs and pwolanin's concerns are still valid.