Data Search is wiping search index for node [#847674]

After enabling the data_search module, I was finding that I would no longer get any search results on my site node content. And I was finding that when I ran cron, my search index would reach its cap on whatever number of items it was going to index and it would suddenly drop back to 0. So I started doing some further debugging and the issue seems to be coming from line 99 of data_search.module:

db_query("DELETE sd, si, snl FROM {search_dataset} sd LEFT JOIN {{$name}} dt ON sd.type = '{$name}' AND sd.sid = dt.{$base_field} LEFT JOIN {search_index} si ON sd.sid = si.sid AND sd.type = si.type LEFT JOIN {search_node_links} snl ON sd.sid = snl.sid AND sd.type = snl.type WHERE dt.{$base_field} IS NULL");

When drupal runs the query, it translates out to something like the following:

DELETE sd, si, snl FROM search_dataset sd LEFT JOIN feeds_data_regular_feed dt ON sd.type = 'feeds_data_regular_feed' AND sd.sid = dt.id LEFT JOIN search_index si ON sd.sid = si.sid AND sd.type = si.type LEFT JOIN search_node_links snl ON sd.sid = snl.sid AND sd.type = snl.type WHERE dt.id IS NULL

Where you can replace 'feeds_data_regular_feed' with your table name. When I replaced the beginning of it with "SELECT *" I get all of the search index content that are not in the feeds_data_regular feed table (which turned out to be all the nodes) which were then purged from the search index. I'm not entirely sure of how to make sure this work correctly other than perhaps splitting this into 3 separate queries with something like:

DELETE FROM {search_dataset} sd WHERE sd.type = {$name} AND sd.sid NOT IN (SELECT dt.{$base_field} FROM {{$name}})

And repeat for the other search tables. I'll report back once I am able to test some of this out in better detail.

Comment	File	Size	Author
#8	data.b847674.patch	999 bytes	brad.bulger
#3	847671-btmash--3.patch	1.1 KB	btmash
#2	847671-btmash--2.patch	1.07 KB	btmash
#1	847671-btmash--1.patch	1.38 KB	btmash

Comments

Comment #1

btmash commented 7 July 2010 at 20:22

Status:

Active

» Needs review

Status	File	Size
new	847671-btmash--1.patch	1.38 KB

So I tested out the issue by commenting out the query that is currently running and it did 'fix' the part of the issue of not deleting node data (or potentially any other data) from the search index/dataset/node_links. But we still need a patch and as a first step towards it, the patch that I am currently using is:

    db_query("DELETE FROM {search_dataset} sd WHERE sd.type = {$name} AND sd.sid NOT IN (SELECT dt.{$base_field} FROM {{$name}})");
    db_query("DELETE FROM {search_index} si WHERE si.type = {$name} AND si.sid NOT IN (SELECT dt.{$base_field} FROM {{$name}})");
    db_query("DELETE FROM {search_node_links} snl WHERE snl.type = {$name} AND snl.sid NOT IN (SELECT dt.{$base_field} FROM {{$name}})");

Attaching patch along with issue for further review (patch is applied from data directory, not data_search).

Comment #2

btmash commented 7 July 2010 at 20:23

Status	File	Size
new	847671-btmash--2.patch	1.07 KB

Should have attached patch without the previous query (which I had originally commented out).

Comment #3

btmash commented 7 July 2010 at 20:44

Status	File	Size
new	847671-btmash--3.patch	1.1 KB

Hopefully, third time is the charm - there were some errors I had missed in how I had built out the query.

Comment #4

btmash commented 8 July 2010 at 17:35

Version:

6.x-1.0-alpha9

» 6.x-1.0-alpha11

Ack...should have said this applies to alpha 11 in the version.

Comment #5

alex_b commented 25 July 2010 at 19:27

Ok, this is looking good. We should roll this into the next Data release.

Comment #6

alex_b commented 28 July 2010 at 20:47

Status:

Needs review

» Needs work

On closer review I notice that the subselect is on a potentially very large data set (the entire data table without a limit on the result).

Why did you not use a LEFT JOIN [datatable] WHERE [datatable].[basefield] IS NULL pattern?

Comment #7

btmash commented 29 July 2010 at 08:13

Hence the 'needs review' ;-) This was a first stab at getting this to work correctly. Your suggestion makes more sense (I simply didn't think of a left join to check on null) and *should* be considerably faster. I'll try to roll out a patch tomorrow.

Comment #8

brad.bulger commented 4 November 2010 at 08:29

Version:

6.x-1.0-alpha11

» 6.x-1.x-dev

Status	File	Size
new	data.b847674.patch	999 bytes

If you add sd.type = '{$name}' to the WHERE clause of the original query, that will keep it from deleting all the rows that are indexing things besides your table. You might still want to split this into three queries if you think you might have orphan records in just one of them.

Comment #9

brad.bulger commented 4 November 2010 at 08:39

looking at this a bit more, that query is more or less what data_search_wipe() does, except it doesn't add the type to the left join, and it doesn't touch search_node_links. maybe some reuse of that function would be a better way to handle the task ultimately.

Comment #10

attiks commented 12 April 2011 at 20:26

Patch in #8 is working, but I also agree with #9

Comment #11

joachim commented 6 January 2012 at 09:43

Status:

Needs work

» Fixed

I have no idea what this is doing, and the query on one long line is unreadable anyway, but this works for the people who've tested it so that's good enough for me.

> maybe some reuse of that function would be a better way to handle the task ultimately

Refactoring is always good :)
But best get this fixed first, as it's critical (even if it has been in the queue for over a year -- has everyone just been running patched installations or something??)

- #847674 by BTMash, brad.bulger: Fixed data search wiping search index for node.

Comment #12

20 January 2012 at 09:50

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Data Search is wiping search index for node

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

News items

Our community

Documentation

Drupal code base

Governance of community