The "Search" on the upper right of the drupal page works so badly that it can prevent people from finding what they are searching for. I have been trying to use it to get needed information on drupal, modules, internals, &c., and it's maddening.
For instance, I'm trying to find out what's in the file "cache_content", so I put that string, including quotes and underscore, into the Search . . . and it returns any form of the word cache OR any form of the word content OR any word that's real similar to one of the two.
Search needs more user control. If you want a general, fuzzy search for general, fuzzy questions, fine, but if that's the only thing available, it's a serious problem.
Comments
Comment #1
vm commented
Small help, but there is a separate modules search once you enter the modules area.
Not sure what documentation can do about this; it reads more like you want the apache_solr module to do something more than it is doing. Filing an issue with that module, which is running on drupal.org, may be a better avenue.
Comment #2
leehunter commented
Moving to infrastructure
Comment #3
drumm
Blanket statements about how bad search is are not helpful. Specifics are helpful: what you searched for and what you expected to find.
I am not familiar with "cache_content"; is this a file included with a module? I am guessing this may be improved by two projects: integrating API search results (#84207: Integrate API full text search into the ApacheSolr setup) and documenting all contributed modules, which I mention at http://delocalizedham.com/state-of-the-api-module.
Comment #4
pwolanin commented
"cache_content" is likely the table {cache_content} (which comes from CCK, not Drupal core), so I'm guessing it doesn't even show up on api.d.o in a search.
If you are doing searches about Drupal internals you should be here: http://api.drupal.org
This search does turn up many relevant results: http://drupal.org/search/apachesolr_search/%22cache_content%22
but it does indeed seem that Solr splits on the punctuation into 2 tokens, which get stemmed and matched as a phrase, as expected:
http://wiki.apache.org/solr/DisMaxRequestHandler see 'q'
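To illustrate the behavior described above, here is a minimal Python sketch of a toy analyzer that splits on punctuation, applies a crude stemmer, and then matches the resulting tokens as a phrase. It is not Solr's actual analysis chain; the function names and the suffix list are made up for illustration only.

```python
import re

def stem(token):
    """Crude illustrative stemmer: strip one common suffix (not Solr's real stemmer)."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Lowercase, split on anything that is not a letter or digit, then stem."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

def phrase_match(query, document):
    """True if the analyzed query tokens appear consecutively in the document."""
    q, d = analyze(query), analyze(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# The underscore is treated as a separator, so the query becomes two stemmed tokens.
print(analyze('"cache_content"'))
print(phrase_match('"cache_content"', "must rebuild all the cached content"))
```

Under these assumptions, the quoted query and the unrelated text "cached content" reduce to the same token sequence, which is why such pages show up as phrase matches.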
Comment #5
leehunter commented
It does seem that queries in quotation marks are not being handled correctly. Or maybe just not the way that I would expect.
For example, if I do a search for "report generation" (including quotes) my expectation is that I would see only pages with that exact term, or at least that results with that term would be displayed before any other pages that only have stemmed variations of the component words. Instead, what happens is that the first displayed hit is for an exact match and the rest are a jumble of exact matches and guesses.
If I go to the trouble of putting a search term in quotation marks, it means that I'm trying to exclude those other results (or at least push up the results with an exact match).
Comment #6
leehunter commented
Changed title to be more inclusive
Comment #7
pwolanin commented
Quotes force a phrase search, not an exact match.
Comment #8
leehunter commented
Is it possible to change this behavior?
User expectations for search have been very heavily conditioned by Google, where quotes indicate the searcher is looking for an exact match.
From the Google Help docs: "By putting double quotes around a set of words, you are telling Google to consider the exact words in that exact order without any change." http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=136861
Comment #9
pwolanin commented
Cranking down the 'qs' value gives closer to an exact match (i.e. there can be fewer or no intervening words). So, d.o could change that by adding it to their solrconfig.xml as a query default, or via a module to add it at query time, but Solr's dismax handler doesn't support exact phrase matching. Putting in "solr single" or "single solr" will match the same docs.
So, basically, no. Feel free to use: http://www.google.com/search?q=site%3Adrupal.org
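For anyone wanting to experiment with the 'qs' (query phrase slop) setting mentioned above, here is a rough Python sketch of how it might be passed at query time. The base URL is a hypothetical local Solr instance, not drupal.org's setup, and the helper name is made up; only the `q`, `qs`, and `defType` parameter names come from Solr's dismax handler.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration; drupal.org's real Solr URL is not public here.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

def build_dismax_url(phrase, slop=0):
    """Build a dismax query URL for a quoted phrase with the given phrase slop."""
    params = {
        "defType": "dismax",       # use the dismax query parser
        "q": '"%s"' % phrase,      # quotes request a phrase search
        "qs": slop,                # slop allowed for phrases typed in the query
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)

print(build_dismax_url("cache_content", slop=0))
```

A lower slop allows fewer intervening words between the phrase terms; as noted above, it still does not turn the query into an exact string match.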
Comment #10
Neil_in_Chicago commented
LeeHunter @2
Thanks. I think I'm learning my way around, but it's still foggy.
drumm @3
I thought
"I'm trying to find out what's in the file 'cache_content', so I put that string, including quotes and underscore, into the Search . . . and it returns any form of the word cache OR any form of the word content OR any word that's real similar to one of the two.
Search needs more user control. If you want a general, fuzzy search for general, fuzzy questions, fine, but if that's the only thing available, it's a serious problem"
was pretty specific.
I want items including the string "cache_content".
I don't want "Content set contents typically don't change often, so cache", or "must rebuild all the cached content", etc.
Comment #11
drumm
Neil_in_Chicago: yes, that was specific enough to get us started and hopefully we did figure out the root issue. I read the issue as "Search sucks, oh and here is why," where I would like to have seen simply "phrases don't work."
Drupal.org's search has had plenty of problems. We recently put in much more flexible tools, such as Apache Solr, so we can fix many issues if we know what we are looking for. Bug reports are appreciated so we can continue to improve.
Pwolanin: would cranking down qs affect all searches or just phrase searching? Would any effect on all searches be negative or positive?
Comment #12
pwolanin commented
Checking into this more: the default qs seems to be 0 (or NULL) already, so I'm not sure there is much to do. We could document this, i.e. that "" finds words in immediate proximity, but not exact matches or ordering.
Increasing qs might help some users, but leaving as-is seems fine to me. I think most people will just put in their keywords and never assume that they know the exact string in advance.
We might look at the tokenizer and see if there is some way to avoid splitting on _ or ->, for example, though right now we are limited to using the solr.CharStreamAwareWhitespaceTokenizerFactory since we are using the filter to map accented to non-accented characters. This is probably more relevant for api.d.o searching.
For api.d.o we can also think about having a filter to match exactly on function names, for example.
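As a rough illustration of the tokenizer point above, this Python sketch contrasts whitespace-only tokenization (with accent folding) against splitting on punctuation. It only imitates the idea; it is not the solr.CharStreamAwareWhitespaceTokenizerFactory or Solr's mapping filter, and the function names are invented for the example.

```python
import re
import unicodedata

def fold_accents(text):
    """Map accented characters to unaccented ones by dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def whitespace_tokens(text):
    """Split on whitespace only, so '_' and '->' stay inside a token."""
    return fold_accents(text).lower().split()

def punctuation_tokens(text):
    """Split on every non-alphanumeric character, breaking identifiers apart."""
    return re.findall(r"[a-z0-9]+", fold_accents(text).lower())

sample = "Café cache_content $node->nid"
print(whitespace_tokens(sample))
print(punctuation_tokens(sample))
```

With whitespace-only splitting, identifiers such as cache_content survive as single tokens, which is the property an exact-match filter for function names would rely on.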
Comment #13
moshe weitzman commented
I have to agree that exact match is what most searchers will expect and it is surprising that solr can't easily support this.
Comment #14
gerhard killesreiter commented
http://drupal.org/search/apachesolr_search/%22most%20searchers%22
Seems to work now.