On sites I've deployed recently, I use search_config to set up some basic search restrictions.
However, on some of those sites the search_dataset and search_index tables still grow explosively, especially when there are thousands of users with node_profile or profile2 profiles.
I've implemented the following hook_cron() to reduce the size of these tables and to prevent restricted content types from being exposed accidentally:
```php
function MYMODULE_cron() {
  // Search workaround: keep excluded node types out of the search tables.
  $types = variable_get('search_config_disable_index_type', array());
  if (empty($types)) {
    return;
  }
  $ph = db_placeholders($types, 'varchar');
  // Search indexing works by combing search_dataset for missing sids, or sids
  // marked for reindexing, so we can't simply delete the rows. Set the
  // search_dataset data to an empty string, so that the search module won't
  // try to index the content.
  db_query("INSERT INTO {search_dataset}
    (sid, type, data, reindex)
    SELECT nid, 'node', '', 0 FROM {node} WHERE {node}.type IN ($ph)
    ON DUPLICATE KEY UPDATE reindex = 0, data = ''", $types);
  // Entries in search_index can be deleted. Restrict to type 'node' so that
  // rows indexed for other search types sharing a sid are left alone.
  db_query("DELETE FROM {search_index} WHERE type = 'node' AND sid IN
    (SELECT nid FROM {node} WHERE {node}.type IN ($ph))", $types);
}
```
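The snippet above relies on db_placeholders(), which exists in Drupal 6 only. For sites on Drupal 7 (where profile2 lives), a rough equivalent might look like the sketch below. It assumes the excluded types are still stored under the same variable name, which may not match your search_config version, and it keeps the MySQL-specific ON DUPLICATE KEY UPDATE clause, so treat it as a starting point rather than a tested patch.

```php
/**
 * Implements hook_cron().
 *
 * Drupal 7 sketch; goes in a custom module's MYMODULE.module file.
 */
function MYMODULE_cron() {
  // Assumption: same variable name as the Drupal 6 snippet. array_filter()
  // drops unchecked (0) entries; array_values() gives clean numeric keys.
  $types = array_values(array_filter(variable_get('search_config_disable_index_type', array())));
  if (empty($types)) {
    return;
  }

  // Blank the dataset rows and clear the reindex flag so the search module
  // skips these nodes. The query builder has no upsert for INSERT ... SELECT,
  // hence the raw query. MySQL only.
  db_query("INSERT INTO {search_dataset} (sid, type, data, reindex)
    SELECT n.nid, 'node', '', 0 FROM {node} n WHERE n.type IN (:types)
    ON DUPLICATE KEY UPDATE reindex = 0, data = ''", array(':types' => $types));

  // Delete the word index rows for the same nodes.
  $nids = db_select('node', 'n')
    ->fields('n', array('nid'))
    ->condition('n.type', $types, 'IN');
  db_delete('search_index')
    ->condition('type', 'node')
    ->condition('sid', $nids, 'IN')
    ->execute();
}
```

A node that is edited gets flagged for reindexing again, so the cleanup only holds as long as this hook keeps running on cron.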
This has reduced the size of search_dataset significantly, and in one case, eliminated millions of rows from search_index.
Would other folks find this useful in the search_config module?
If so, I will write a patch that adds this as a checkbox option on the search config admin page.
Comment | File | Size | Author |
---|---|---|---|
#4 | batch_indexing-hack-for-search_config-1977798-3-no-test.patch | 936 bytes | Alan D. |
#3 | 1977798-3-search_config-allow-nodes-to-be-excluded-from-index-tables.patch | 7.1 KB | Alan D. |
Comments
Comment #1
Josika commented: I must say that I could really use this function. The growth of search_index and search_dataset has been giving me a serious headache for the last few days. If you could write the patch, I would be incredibly grateful.
Comment #1.0
Josika commented: Formatting fixed.
Comment #2
alberto facchini commented: Where did you put the hook_cron implementation?
Comment #3
Alan D. commented: Alpha patch!
To start clean, nuke the existing search indexes first; I don't believe anything in core will remove content that has already been indexed:
```sql
TRUNCATE {search_index};
TRUNCATE {search_dataset};
TRUNCATE {search_node_links};
TRUNCATE {search_total};
```
This leaves search completely dead until the content is re-indexed.
Comment #4
Alan D. commented: And a quick hack to the batch_indexing module for this alter (without it, that module will not work with this new feature).
Comment #5
Alan D. commented: Duh. $settings['index'] should be $settings['excluded_types'].
Comment #6
AltaGrade commented: Hello everyone! Consider using the new Search Index module, which tackles the problem from the other end: you are cleaning up the database after the search index has been generated, but what about preventing unnecessary content from being indexed in the first place?
Comment #7
AstonVictor (at DevBranch) commented: I'm closing this because the issue was created a long time ago and has seen no further steps.
If you still need it, please raise a new one.
Thanks.