After enabling the data_search module, I was finding that I would no longer get any search results on my site node content. And I was finding that when I ran cron, my search index would reach its cap on whatever number of items it was going to index and it would suddenly drop back to 0. So I started doing some further debugging and the issue seems to be coming from line 99 of data_search.module:

db_query("DELETE sd, si, snl FROM {search_dataset} sd LEFT JOIN {{$name}} dt ON sd.type = '{$name}' AND sd.sid = dt.{$base_field} LEFT JOIN {search_index} si ON sd.sid = si.sid AND sd.type = si.type LEFT JOIN {search_node_links} snl ON sd.sid = snl.sid AND sd.type = snl.type WHERE dt.{$base_field} IS NULL");

When drupal runs the query, it translates out to something like the following:

DELETE sd, si, snl FROM search_dataset sd LEFT JOIN feeds_data_regular_feed dt ON sd.type = 'feeds_data_regular_feed' AND sd.sid = dt.id LEFT JOIN search_index si ON sd.sid = si.sid AND sd.type = si.type LEFT JOIN search_node_links snl ON sd.sid = snl.sid AND sd.type = snl.type WHERE dt.id IS NULL

Where you can replace 'feeds_data_regular_feed' with your table name. When I replaced the beginning of it with "SELECT *" I get all of the search index content that are not in the feeds_data_regular feed table (which turned out to be all the nodes) which were then purged from the search index. I'm not entirely sure of how to make sure this work correctly other than perhaps splitting this into 3 separate queries with something like:

DELETE FROM {search_dataset} sd WHERE sd.type = {$name} AND sd.sid NOT IN (SELECT dt.{$base_field} FROM {{$name}})

And repeat for the other search tables. I'll report back once I am able to test some of this out in better detail.

Comments

btmash’s picture

Status: Active » Needs review
StatusFileSize
new1.38 KB

So I tested out the issue by commenting out the query that is currently running and it did 'fix' the part of the issue of not deleting node data (or potentially any other data) from the search index/dataset/node_links. But we still need a patch and as a first step towards it, the patch that I am currently using is:

    db_query("DELETE FROM {search_dataset} sd WHERE sd.type = {$name} AND sd.sid NOT IN (SELECT dt.{$base_field} FROM {{$name}})");
    db_query("DELETE FROM {search_index} si WHERE si.type = {$name} AND si.sid NOT IN (SELECT dt.{$base_field} FROM {{$name}})");
    db_query("DELETE FROM {search_node_links} snl WHERE snl.type = {$name} AND snl.sid NOT IN (SELECT dt.{$base_field} FROM {{$name}})");

Attaching patch along with issue for further review (patch is applied from data directory, not data_search).

btmash’s picture

StatusFileSize
new1.07 KB

Should have attached patch without the previous query (which I had originally commented out).

btmash’s picture

StatusFileSize
new1.1 KB

Hopefully, third time is the charm - there were some errors I had missed in how I had built out the query.

btmash’s picture

Version: 6.x-1.0-alpha9 » 6.x-1.0-alpha11

Ack...should have said this applies to alpha 11 in the version.

alex_b’s picture

Ok, this is looking good. We should roll this into the next Data release.

alex_b’s picture

Status: Needs review » Needs work

On closer review I notice that the subselect is on a potentially very large data set (the entire data table without a limit on the result).

Why did you not use a LEFT JOIN [datatable] WHERE [datatable].[basefield] IS NULL pattern?

btmash’s picture

Hence the 'needs review' ;-) This was a first stab at getting this to work correctly. Your suggestion makes more sense (I simply didn't think of a left join to check on null) and *should* be considerably faster. I'll try to roll out a patch tomorrow.

brad.bulger’s picture

Version: 6.x-1.0-alpha11 » 6.x-1.x-dev
StatusFileSize
new999 bytes

If you add sd.type = '{$name}' to the WHERE clause of the original query, that will keep it from deleting all the rows that are indexing things besides your table. You might still want to split this into three queries if you think you might have orphan records in just one of them.

brad.bulger’s picture

looking at this a bit more, that query is more or less what data_search_wipe() does, except it doesn't add the type to the left join, and it doesn't touch search_node_links. maybe some reuse of that function would be a better way to handle the task ultimately.

attiks’s picture

Patch in #8 is working, but I also agree with #9

joachim’s picture

Status: Needs work » Fixed

I have no idea what this is doing, and the query on one long line is unreadable anyway, but this works for the people who've tested it so that's good enough for me.

> maybe some reuse of that function would be a better way to handle the task ultimately

Refactoring is always good :)
But best get this fixed first, as it's critical (even if it has been in the queue for over a year -- has everyone just been running patched installations or something??)

- #847674 by BTMash, brad.bulger: Fixed data search wiping search index for node.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.