Not LIVE FROM THE MINNESOTA SEARCH SPRINT, but found on the plane ride home...

In search_index (the function that scores HTML), a SQL to UPDATE the {search_index} is done, and if that fails then an INSERT is done. However, since search_wipe is called just before this loop, there will never be any matching items for the UPDATE and it will always fail. The loop also can't introduce any duplicate records because the array key is the word. Thus I don't think we ever need to do the UPDATE. And the attached patch will half the number of queries needed for every word inserted into the index!

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Damien Tournoud’s picture

Status: Needs review » Closed (works as designed)

Hum. And what about the big comment in front of the UPDATE that says:

The database will collate similar words (accented and non-accented forms, etc.),
and the score is additive, so first add and then insert.

This UPDATE is required, so this is by design.

douggreen’s picture

FileSize
1.21 KB

Ah, I think you're right on the collate comment. But if that's the only case where we need to UPDATE, we should try the INSERT first and then do the UPDATE if the INSERT fails. The duplicate INSERT will fail because of the word_sid_type unique key. I think that this approach will result in one query instead of two 99% of the time.

Damien Tournoud’s picture

Status: Closed (works as designed) » Needs work

Right, we should invert those queries, because the INSERT will infrequently fail.

douggreen’s picture

Status: Needs work » Needs review
douggreen’s picture

FileSize
1.21 KB

And since scores are floats and not numbers the %d in the UPDATE should be fixed.

Damien Tournoud’s picture

Please also fix the type of the sid column in the UPDATE query (it should be an int). Oh this was messy, and the fact that I fixed that bug on the day of the release of D6 is not an excuse.

douggreen’s picture

What's wrong with the type? It's a string, which is correct.

Damien Tournoud’s picture

What's wrong with the type? It's a string, which is correct.

The sid is an int column:

      'sid' => array(
        'type' => 'int',
        'unsigned' => TRUE,
        'not null' => TRUE,
        'default' => 0,
        'description' => t('The {search_dataset}.sid of the searchable item to which the word belongs.'),
      ),

But there the %d in the UPDATE query is in quotes.

douggreen’s picture

FileSize
1.21 KB

I didn't notice the quotes around '%d'... The attached patch fixes those too.

moshe weitzman’s picture

Looks good to me. This is a big performance win, and should be backported IMO.

Damien Tournoud’s picture

Assigned: douggreen » Unassigned
FileSize
1.38 KB

Ooops. We overslept on that one.

Here is a slightly modified version of douggreen's patch (with an updated comment).

Please review!

douggreen’s picture

@Damien, thanks for fixing the comment, I obviously missed that!

Is it at all possible to keep the comment on two lines, or did it break 80 characters that way?

Please remove the @ in front of db_query. I'm not sure why and when people need to add this, but I don't think it's needed.

douggreen’s picture

I replaced @db_query. with db_query. I tried to make that comment fit on 2 lines, but couldn't make it terse yet explanatory. But while working on it, I modified Damien's text. I'm not going to be stubborn on what the comment says, it's the code here that's really important. As moshe points out, we might want to backport this, so let's try to agree on the comment verbiage, and get this RTBC!

moshe weitzman’s picture

@douggreen - did you mean to attach a patch?

douggreen’s picture

FileSize
1.46 KB

doh!

Damien Tournoud’s picture

Status: Needs review » Needs work

Hum. The comment looks good. But the @ in front of the db_query is required. The INSERT should fail with a duplicated key warning.

douggreen’s picture

What does the @ do in php? I thought it was redundant. The only place in core that there is an @db_query is in modules/node/node.module line +194, and I kinda assumed that this was a mistake.

Damien Tournoud’s picture

The @ prefix suppresses error reporting (http://www.php.net/manual/en/language.operators.errorcontrol.php).

The only place in core that there is an @db_query is in modules/node/node.module

... as well as in cache.inc and bootstrap.inc, everywhere an INSERT query could potentially fail.

douggreen’s picture

FileSize
1.46 KB

I think that this is ready now... new patch attached restoring the '@' sign. Damien's patch (#11) above was good enough too. My latest patch should be identical to his, just with a slightly different comment.

The reason this patch has take so long is that it just took some education on my part to understand why we needed @db_query!

Damien Tournoud’s picture

Status: Needs work » Reviewed & tested by the community

@douggreen, I like your comment better.

Marking #19 as (very well) reviewed. We finally made it!

catch’s picture

still applies cleanly.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Tested, reviewed and committed. Thanks.

moshe weitzman’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Reviewed & tested by the community

patch applies just fine to D6.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Thanks, committed to 6.x.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.