After enabling the data_search module, I was finding that I would no longer get any search results on my site node content. And I was finding that when I ran cron, my search index would reach its cap on whatever number of items it was going to index and it would suddenly drop back to 0. So I started doing some further debugging and the issue seems to be coming from line 99 of data_search.module:
db_query("DELETE sd, si, snl FROM {search_dataset} sd LEFT JOIN {{$name}} dt ON sd.type = '{$name}' AND sd.sid = dt.{$base_field} LEFT JOIN {search_index} si ON sd.sid = si.sid AND sd.type = si.type LEFT JOIN {search_node_links} snl ON sd.sid = snl.sid AND sd.type = snl.type WHERE dt.{$base_field} IS NULL");
When drupal runs the query, it translates out to something like the following:
DELETE sd, si, snl FROM search_dataset sd LEFT JOIN feeds_data_regular_feed dt ON sd.type = 'feeds_data_regular_feed' AND sd.sid = dt.id LEFT JOIN search_index si ON sd.sid = si.sid AND sd.type = si.type LEFT JOIN search_node_links snl ON sd.sid = snl.sid AND sd.type = snl.type WHERE dt.id IS NULL
Where you can replace 'feeds_data_regular_feed' with your table name. When I replaced the beginning of it with "SELECT *" I get all of the search index content that are not in the feeds_data_regular feed table (which turned out to be all the nodes) which were then purged from the search index. I'm not entirely sure of how to make sure this work correctly other than perhaps splitting this into 3 separate queries with something like:
DELETE FROM {search_dataset} sd WHERE sd.type = {$name} AND sd.sid NOT IN (SELECT dt.{$base_field} FROM {{$name}})
And repeat for the other search tables. I'll report back once I am able to test some of this out in better detail.
| Comment | File | Size | Author |
|---|---|---|---|
| #8 | data.b847674.patch | 999 bytes | brad.bulger |
| #3 | 847671-btmash--3.patch | 1.1 KB | btmash |
| #2 | 847671-btmash--2.patch | 1.07 KB | btmash |
| #1 | 847671-btmash--1.patch | 1.38 KB | btmash |
Comments
Comment #1
btmash commentedSo I tested out the issue by commenting out the query that is currently running and it did 'fix' the part of the issue of not deleting node data (or potentially any other data) from the search index/dataset/node_links. But we still need a patch and as a first step towards it, the patch that I am currently using is:
Attaching patch along with issue for further review (patch is applied from data directory, not data_search).
Comment #2
btmash commentedShould have attached patch without the previous query (which I had originally commented out).
Comment #3
btmash commentedHopefully, third time is the charm - there were some errors I had missed in how I had built out the query.
Comment #4
btmash commentedAck...should have said this applies to alpha 11 in the version.
Comment #5
alex_b commentedOk, this is looking good. We should roll this into the next Data release.
Comment #6
alex_b commentedOn closer review I notice that the subselect is on a potentially very large data set (the entire data table without a limit on the result).
Why did you not use a
LEFT JOIN [datatable] WHERE [datatable].[basefield] IS NULLpattern?Comment #7
btmash commentedHence the 'needs review' ;-) This was a first stab at getting this to work correctly. Your suggestion makes more sense (I simply didn't think of a left join to check on null) and *should* be considerably faster. I'll try to roll out a patch tomorrow.
Comment #8
brad.bulger commentedIf you add
sd.type = '{$name}'to the WHERE clause of the original query, that will keep it from deleting all the rows that are indexing things besides your table. You might still want to split this into three queries if you think you might have orphan records in just one of them.Comment #9
brad.bulger commentedlooking at this a bit more, that query is more or less what data_search_wipe() does, except it doesn't add the type to the left join, and it doesn't touch search_node_links. maybe some reuse of that function would be a better way to handle the task ultimately.
Comment #10
attiks commentedPatch in #8 is working, but I also agree with #9
Comment #11
joachim commentedI have no idea what this is doing, and the query on one long line is unreadable anyway, but this works for the people who've tested it so that's good enough for me.
> maybe some reuse of that function would be a better way to handle the task ultimately
Refactoring is always good :)
But best get this fixed first, as it's critical (even if it has been in the queue for over a year -- has everyone just been running patched installations or something??)
- #847674 by BTMash, brad.bulger: Fixed data search wiping search index for node.