When you use one Apache Solr instance to index multiple sites, content from all of those sites ends up in the same index.

To make sure that search results show only content from the current site, you can filter the Apache Solr index on a field called the 'Site Hash'.
The Apache Solr module sends this value every time the site is indexed, so it is stored together with each related page.
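On the Solr side, a filter like this is typically sent as an fq (filter query) parameter alongside the user's search terms. The standalone sketch below only illustrates what such a request looks like; the helper name solr_select_url(), the endpoint URL and the hash value are hypothetical, and the Drupal module builds its own requests for you.

```php
<?php
// Hypothetical helper: builds a Solr select URL with one fq per field => value.
// Solr expects repeated "fq" keys, so each is encoded separately instead of
// passing a nested array to http_build_query() (which would emit fq[0]=...).
function solr_select_url(string $endpoint, string $q, array $filters): string {
  $parts = ['q=' . urlencode($q)];
  foreach ($filters as $field => $value) {
    // e.g. "hash:rw4cf4", URL-encoded.
    $parts[] = 'fq=' . urlencode($field . ':' . $value);
  }
  return $endpoint . '?' . implode('&', $parts);
}

echo solr_select_url('http://localhost:8983/solr/select', 'drupal', ['hash' => 'rw4cf4']), "\n";
// http://localhost:8983/solr/select?q=drupal&fq=hash%3Arw4cf4
```

Adding a second entry to $filters (for example a placeholder 'domain_id' => 4) appends another fq, and Solr only returns documents that match every fq.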

If you are familiar with the Google Search Appliance (GSA) and are looking for something like its "collection" behavior, this filter may be the answer.

Note: the site hash is generated once, based on the base_url, by a function called apachesolr_site_hash(), and is then stored in a variable in your database. So if your sites share the base_url or the database, they will also share the Site Hash.
In that case, you may need to add another filter (e.g. the Domain ID if you use the Domain Access module).
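The generate-once-then-store behavior is what makes a shared database produce a shared hash. The standalone sketch below models that pattern; site_hash() is a hypothetical stand-in for apachesolr_site_hash(), the $variables array stands in for Drupal's variable table, and the exact hashing recipe is an assumption that may differ between module versions.

```php
<?php
// Sketch only: $variables stands in for Drupal's variable table and
// site_hash() is a hypothetical stand-in for apachesolr_site_hash().
function site_hash(array &$variables, string $base_url): string {
  if (empty($variables['apachesolr_site_hash'])) {
    // First call: derive a short base-36 string from a random ID seeded with
    // the base URL (assumed recipe), then store it.
    $hash = substr(base_convert(sha1(uniqid($base_url, TRUE)), 16, 36), 0, 6);
    $variables['apachesolr_site_hash'] = $hash;
  }
  // Later calls return the stored value, whatever the base URL is now.
  return $variables['apachesolr_site_hash'];
}

$site_a = [];
$first = site_hash($site_a, 'http://a.example.com');

// Cloning the database copies the stored variable, so the clone gets the same hash.
$site_b = $site_a;
$second = site_hash($site_b, 'http://b.example.com');
var_dump($first === $second); // bool(true)

// Deleting the variable (what `drush vdel apachesolr_site_hash` does) forces a new hash.
unset($site_b['apachesolr_site_hash']);
$third = site_hash($site_b, 'http://b.example.com');
```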

To define the filter, implement the following hook in a custom module, choosing the code for your Drupal core version and replacing mymodule with your module's name:

Drupal 6

function mymodule_apachesolr_modify_query(&$query, &$params, $caller) {
  $query->add_filter('hash', apachesolr_site_hash());
}

If you use the Apache Solr Multisite Search module, you may need the following code so that multisite searches are not filtered by the hash:

function mymodule_apachesolr_modify_query(&$query, &$params, $caller) {
  // Leave multisite searches unfiltered so they still cover all sites.
  if ($caller != "apachesolr_multisitesearch") {
    $query->add_filter('hash', apachesolr_site_hash());
  }
}

Drupal 7

The hook and method names changed in Drupal 7 (hook_apachesolr_modify_query() became hook_apachesolr_query_alter(), and add_filter() became addFilter()), so the code for the same filter behavior is:

function mymodule_apachesolr_query_alter($query) {
  $query->addFilter('hash', apachesolr_site_hash());
}

Deleting the index based on the Site Hash

You can also use the Site Hash to restrict the 'delete index' action in the administration pages, so that it removes only the current site's pages from the index.

Use the code that matches your Drupal core version:

Drupal 6

function mymodule_apachesolr_delete_index_alter(&$query) {
  $query = 'hash:' . apachesolr_site_hash();
}

Drupal 7

function mymodule_apachesolr_delete_by_query_alter(&$query) {
  // Restrict "delete index" to documents carrying this site's hash.
  $query = 'hash:' . apachesolr_site_hash();
}
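The delete hooks only change the query string; Solr then deletes every document matching it. For reference, Solr's standard XML update syntax for a delete-by-query looks like the output of the sketch below. The helper site_delete_xml() and the example hash are hypothetical, and in Drupal the Solr client library sends the request for you.

```php
<?php
// Hypothetical helper: builds a Solr XML delete-by-query body scoped to one site.
function site_delete_xml(string $site_hash): string {
  $query = 'hash:' . $site_hash;
  // Escape for XML in case the value ever contains markup characters.
  return '<delete><query>' . htmlspecialchars($query, ENT_XML1) . '</query></delete>';
}

echo site_delete_xml('rw4cf4'), "\n";
// <delete><query>hash:rw4cf4</query></delete>
```

Because the query matches only hash:rw4cf4, documents indexed by other sites (with different hashes) are left untouched.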

Comments

zaloni

In the above example, to skip filtering by hash when running a multisite search:

function mymodule_apachesolr_modify_query(&$query, &$params, $caller) {
  if ($caller != "apachesolr_multisitesearch") {
    $query->add_filter('hash', apachesolr_site_hash());
  }
}
obrienmd

I've added a custom module, where the custommodule.module has the following text:

<?php
function mymodule_apachesolr_modify_query(&$query, &$params, $caller) {
  $query->add_filter('hash', apachesolr_site_hash());
}
?>

This doesn't seem to work. I've re-enabled the module, cleared all caches and removed the trailing "?>" from the module code, but nothing helps. Any ideas? I'm using Apache Solr 6.x-1.0.

jdwfly

I just implemented those functions on my sites and they are working as advertised.

Offhand, I would say you forgot to change mymodule to the name of your custom module (custommodule), so Drupal never calls the hook.

obrienmd

Hehe - You're right. That was dumb, sorry!

MickC

Is this step necessary when using multicore, as described at http://drupal.org/node/484800?

I have this set up, and two indexes are being built in two separate folders. So far, search returns results only for the relevant site, even though both sites run on the same server.

yngens

I second MickC's question. Which method is better to go with?

miiimooo

This is really helpful, but what I'm looking for is a solution similar to what Acquia offers, where you use some sort of authentication (e.g. HTTP Basic Auth) and the Solr server then adds a filter that limits the results to a particular site. I see the code at http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication in the instock example, but I'd rather not have a unique path for each site.
I'm not a Java or Solr developer, so I wonder where to start. I've managed to get HTTP auth working on Jetty, and now it would be good if the username could be the limiting fq (I think that's the filter, but it's all a bit like Spanish to me). I couldn't find anything when searching for this, so any help would be very much appreciated.

techgirlgeek

I'm a little late replying to this post, but I recently created a solution for limiting a result set to only one site: http://drupal.org/node/1139240

chirale

I use this in both Drupal 5 (Apache Solr 5.x-2.x-dev) and Drupal 6 (Apache Solr 6.x-1.0). The code is the same; only the two .info files differ.

askibinski

This hook has been renamed to hook_apachesolr_query_alter in the Drupal 7 version.
See http://drupal.org/node/1146976

Albert Skibinski

damien_vancouver

I bashed my head against the wall a bit on this one. It turns out it was not working because my sites started out as clones of the same database, which means they all had the same site hash.

This is because ApacheSolr generates a random site_hash and stores it in the variable table. So if you clone the site, all of its clones will have the same value for apachesolr_site_hash(), and you will see results from all sites.

The solution is to clear that row in the variable table, then rebuild the index. The ApacheSolr module will generate a new random site hash once it sees the old one is gone.

Using Drush: drush vdel apachesolr_site_hash

or in SQL, followed by a cache clear (Drupal caches variables, so the old value may otherwise linger): delete from variable where name = 'apachesolr_site_hash'

Sel_Space

This module works, but with the filter block enabled I get the following by default:
-- Current search
-- Search found 57 items (57 = all my content)
-- (-) rw4cf4

I've cleared the cache and run cron, but nothing changed.

Any ideas?

Anonymous

If you want to exclude certain results, here is how to do that:

function unification_search_apachesolr_query_alter($query) {
  $query->addFilter("access_node_tupwml_domain_id", 4, TRUE);
}

The TRUE argument excludes matching documents; by default it is FALSE.
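In Solr query syntax, excluding a value means prefixing the clause with a minus sign. The sketch below only illustrates that syntax; filter_clause() is a hypothetical helper, and we are assuming (not quoting the module) that addFilter() with TRUE produces a negated clause of this shape.

```php
<?php
// Hypothetical helper: renders a field:value clause, negated with "-" when
// $exclude is TRUE (Solr's syntax for "must not match").
function filter_clause(string $field, $value, bool $exclude = FALSE): string {
  return ($exclude ? '-' : '') . $field . ':' . $value;
}

echo filter_clause('hash', 'rw4cf4'), "\n";                        // hash:rw4cf4
echo filter_clause('access_node_tupwml_domain_id', 4, TRUE), "\n"; // -access_node_tupwml_domain_id:4
```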