The simpletest's written by Steven for the search module (see #205631) uncovered a bug in the search results regarding search queries with short words.

The problem is with the logic for "simple" searches. "Simple" searches are ones that don't need to be joined with the {search_dataset} -- something previously referred to as the second pass or second pass arguments. Searches are faster when they avoid the join with {search_dataset}, so the setting of the "simple" variable tries to avoid this. However, the simpletest's found one case that wasn't quite as simple as original thought. This patch fixes that.

I'm not completely satisfied with this patch, because it adds 5 lines of code, and uses a variable named "count2". If we do go with the extra count variable, the variable names need to be cleaned up. But before we go with this solution, I'd like to see if a more elegant solution materializes, possibly by counting counting the terms in the $query array.

We should do some performance tests and/or look at the queries, just to make sure that this patch is not joining {search_dataset} too often.

CommentFileSizeAuthor
#14 205631.patch2.16 KBchx
#8 205631.patch2.81 KBdouggreen
205631.patch1.46 KBdouggreen
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Rok Žlender’s picture

I just committed search tests which run with no failures after this patch. http://drupal.org/node/205631

Gábor Hojtsy’s picture

This patch does not look like affecting performance much by looking at the code. Are you talking about performance implications of a possible alternate version?

douggreen’s picture

I had two concerns: (a) a minor concern that the code didn't seem elegant to me, and thinking that there might be a more elegant way to set the $simple variable, (b) a bigger concern that the additional case which sets the $simple variable (introduced by this patch) is as narrow as it can be, and doesn't introduce full table queries on the {search_dataset} more often than needed.

chx’s picture

Status: Needs review » Reviewed & tested by the community

The simpletests provided by Steven (thanks much!) prove that the patch works. The patch needs a bit of explanation which I will try to give:

      if ($num || drupal_strlen($s) >= variable_get('minimum_word_size', 3)) {
        $s = $num ? ((int)ltrim($s, '-0')) : $s;
        if (!isset($scores[$s])) {
          $scores[$s] = $s;
          $count++;
        }
        $count2++; // This is the new line.
      }

While $count counts scoring words, $count2 counts the words that are either numeric or are long enough. So if you just type in a bunch of words -- which I would say is the typical case -- that will be interpreted as a series of words ANDed together, this code will fire and if just one of them is long enough then it's still a simple query.

douggreen’s picture

if there are short words, i.e. words smaller than the minimum_word_size, then it's not a simple query and {search_dataset} must be used. The simple test for phrases is handled elsewhere. So this test just checks for simple AND and simple OR terms, and if any of the words are smaller than the minimum_word_size, then, it's not so simple after all.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

Let's get a better variable name instead of $count2 then.

chx’s picture

" if any of the words are smaller than the minimum_word_size, then, it's not so simple after all." -- you sure that's what this piece does? To my eyes it seems that if there are no words above the minimum size then will the new !$simple fire.

douggreen’s picture

Status: Needs work » Needs review
FileSize
2.81 KB

If the word is too small then $count2 == 0 and $simple is set to FALSE. @chx, does that make sense, or do you still think I'm seeing something wrong?

How about changing both $count variables to $num_new_scores and $num_valid_words, respectively. The first count, actually keeps track of the number of words that were added to the $scores array, that weren't already there... thus the suggested variable name, while the second count keeps track of the number of valid words regardless of whether they've already been added to the $scores array or not.

Gábor Hojtsy’s picture

New variable names looks way better! I guess the performance argument is left.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

Chx notes there is an already committed hunk from your previously committed patch included.

chx’s picture

Status: Needs work » Needs review

I am probably daft but I still do not get it. The issue here says "if any of the words are smaller than the minimum_word_size" but $num_valid_words apparently counts the number of the words that are longer than the minimum_word_size and if that's zero then it's not a simple query. So it's "if all of the words are smaller than the minimum_word_size" notice the any vs all. What am I missing here?

chx’s picture

Status: Needs review » Needs work

Sorry, crosspost.

douggreen’s picture

search_parse_query() first the breaks the search phrase into individual array elements, with the exception that expressions are still lumped into a single array. Then _search_parse_query is called on each of these array elements.

If you search for alpha beta, which looks for the words alpha and beta, search_parse_query() first puts this into the $keys['positive'] array where each of these words in the array('alpha', 'beta'). Then _search_parse_query() is called twice, once with each of these words, and $num_valid_words is set to 1, on each call.

If you search for "alpha beta", which looks for the words alpha and beta, but also does a second pass using {search_dataset} to make sure that these two words occur next to each other as in a phrase search for "alpha beta", This time, search_parse_query() puts the two words into a single array element, and _search_parse_query() is called only once, and the explode and loop in _search_parse_query, then iterate over both words, and $num_valid_words returns 2.

If either of these words is smaller than the minimum_word_size, We need to set $simple to FALSE We actually only need to figure this out for the first case, the AND case, because we already catch phrase searches on line 1285. So, looking at the first case, the AND search, _search_parse_query is called twice, once with each word. One of these calls with return $num_valid_words of 1, and the other call will return $num_valid_words of 0. On that second call that returns $num_valid_words of 0, $simple will be set to FALSE.

At least that's what I think (and hope) is happening.

chx’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
2.16 KB

Oh that thingie is called per word when it matters! My bad. Let me resubmit your patch (do not credit me) with the already committed hunk removed.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Committed, thanks.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.