In sites I've deployed recently, I use search_config to set up some basic search restrictions.
However, some of those sites still have exploding search_dataset and search_index table size, especially if i have thousands of users and node_profile or profile2.

I've implemented the following hook_cron to reduce the size of these tables, and prevent any accidental exposing of restricted content types:

function MYMODULE_cron() {
  // More search workaround'ing.
  $types = variable_get('search_config_disable_index_type', array());
  if (empty($types)) {
    return;
  }
  $ph = db_placeholders($types, 'varchar');
  // Search indexing works by combing search_dataset for missing sids, or sids
  // marked for reindexing,so we can't simply delete the rows. Set the 
  // search_dataset data to empty string, so that the search module won't try to
  // index the content.
  db_query("INSERT INTO search_dataset 
      (sid, type, data, reindex)
      SELECT nid, 'node', '', 0 FROM node WHERE node.type IN ($ph)
      ON DUPLICATE KEY UPDATE reindex = 0, data = ''", $types);
  // Entry from search_index can be deleted.
  db_query("DELETE from search_index where sid in 
(SELECT nid FROM node WHERE node.type IN ($ph))", $types);
}

This has reduced the size of search_dataset significantly, and in one case, eliminated millions of rows from search_index.
Would other folks find this useful in search_config module?
If so I will write a patch that includes this option as a checkbox on the search config admin page.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Josika’s picture

I must say, that I really can use this function. Expanding of search-index and search_data gives me serious headache for last few days. If you could write the patch, I will be incredibly grateful.

Josika’s picture

Issue summary: View changes

formatting fixed

alberto facchini’s picture

Where have you inserted the hook_cron?

Alan D.’s picture

Alpha patch!!!!!!!!!!!!!!!!!!!!

Nuke existing search indexes, I don't believe anything in core will remove already indexed content

TRUNCATE {search_index};
TRUNCATE {search_dataset};
TRUNCATE {search_node_links};
TRUNCATE {search_total};

This will make search completely dead until re-indexed

Alan D.’s picture

And a quick hack to batch_indexing module for this alter (without, this module will not work with this new feature)

Alan D.’s picture

Duh. $settings['index'] should be $settings['excluded_types']

+function _search_config_update_index() {
+  $limit = (int)variable_get('search_cron_limit', 100);
+  $settings = search_config_node_settings();
+
+  $excluded_types = empty($settings['index']) ? array() : $settings['index'];
+  $excluded_types = empty($settings['excluded_types']) ? array() : $settings['excluded_types'];
AltaGrade’s picture

Hello everyone! Consider using the new module Search Index that tackles the problem from the other end: you, guys, are trying to clean up the database after the search index was generated, but what about preventing it from indexing unnecessary content from the very beginning?

AstonVictor’s picture

Status: Needs work » Closed (outdated)

I'm closing it because the issue was created a long time ago without any further steps.

if you still need it then raise a new one.
thanks