We already have one site that implements a search page using Search API solr. Now we're creating another one, which will rely heavily on search functionality, and I have to decide whether to create a new Solr instance or to use the existing one.

The second option looks like the better decision, since it means less system administration and possibly allows us to implement a general search sometime in the future (this is a likely requirement). Is it possible to do that with search_api_solr? Is there anything else I should think about before jumping into the water?


Comments

slashrsm’s picture

Status: Active » Fixed

Never mind... I just realized that search_api creates a meta-field called "Index ID", which is always used as a filter when using Views to output results from the index.

Taking this into consideration, the requested feature works out of the box for me.

Good job!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

guillaumev’s picture

Status: Closed (fixed) » Active

I'd like to reopen this because I'm trying to achieve the same thing, and I'm wondering if you or someone else was able to successfully set up a multi-site search with Solr and search_api. If so, how did you do it?

slashrsm’s picture

Title: Multiple sites sharing same solr instance? » Multi-site search with Search API and solr

Changing title.

e2thex’s picture

FileSize
2.31 KB

I have made an attempt at this, but it requires the following patch as well as a sandbox module, search_api_site:

https://drupal.org/sandbox/e2thex/2033065

The patch does two basic things. First, if the search_api_site module is present, it adds a site hash to the index_id of each item (this way there are no collisions). Second, it makes sure that when results are returned, the entity ID is not used as the key of the returned array (otherwise you get collisions).
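To illustrate the first part, here is a minimal sketch of the idea (not the patch's actual code; the function name and the hash derivation are hypothetical):

```php
<?php

/**
 * Builds a collision-free Solr document ID by prefixing a per-site hash.
 *
 * Sketch only: assumes a short hash derived from the site's base URL, so
 * two sites indexing the same entity ID produce distinct document IDs.
 */
function example_solr_document_id($index_id, $item_id) {
  global $base_url;
  // A short, site-unique, alphanumeric hash.
  $site_hash = substr(sha1($base_url), 0, 6);
  return $site_hash . '-' . $index_id . '-' . $item_id;
}
```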

drunken monkey’s picture

Status: Needs work » Active
+++ b/includes/service.inc
@@ -968,7 +977,8 @@ class SearchApiSolrService extends SearchApiAbstractService {
-        $results['results'][$result['id']] = $result;
+        #$results['results'][$result['id']] = $result;
+        $results['results'][] = $result;

This change is violating the contract for the search() function: the results have to be keyed by the ID*.

Other than that, it's a nice idea. If the search_api_site module would become a proper project, we could definitely introduce a similar patch to support it, if it's as easy as that.
However, the issue remains that you'd have to have identical index definitions on both servers to make use of this, right? You couldn't, e.g., filter on a field that's only indexed (or even present) on the other server. But I guess there's no easy way around that, and Sarnia is already trying to support that use case.

* Admittedly I don't really know an easy (and clean) way around that, though. The fact is, returning several results with the same ID simply doesn't fit the Search API logic. Different results should have different IDs – that's what "ID" means, after all.
To solve this cleanly, you would probably have to define a new datasource controller, offering multi-site-capable versions of all item types. The upside would be that you could, in theory, also provide additional properties that way, thereby supporting the usage of fields in filters, etc., that are only present on a different server. That would probably be really hard, though, as you'd have to gain access to the property information on a different server.
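To make the keying problem concrete, a sketch of what the result loop could do instead of dropping the keys (the document field names here are hypothetical):

```php
<?php
// Sketch: key results by a combined site-hash ID instead of using
// numeric keys, so keys stay unique across sites and the search()
// contract (results keyed by ID) is still honored.
foreach ($response->docs as $doc) {
  $result = array(
    'id' => $doc->site_hash . '-' . $doc->entity_id,
    'score' => $doc->score,
  );
  $results['results'][$result['id']] = $result;
}
```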

drunken monkey’s picture

Title: Multi-site search with Search API and solr » Add support for multi-site searches
Version: 7.x-1.0-rc2 » 7.x-1.x-dev
Component: Miscellaneous » Code
Category: support » feature
Status: Active » Needs work
e2thex’s picture

Status: Active » Needs work

@drunken monkey: I want to just change the ID of the item to include the site hash, but it seems that the ID has to be an int (it is supposed to be an entity ID?).

So I am looking to see what part of the contract I am violating, and I don't see anything in the docs (the abstract class function just says to return an associative array containing the search results). Could this return a combination of the site key and the ID, or does it have to match the ID field?

Also, the search_api_site module can totally go full module; I just want to get it out and working before doing so.

drunken monkey’s picture

@drunken monkey: I want to just change the ID of the item to include the site hash, but it seems that the ID has to be an int (it is supposed to be an entity ID?).

Basically, it can have any (scalar) data type at all. However, the type is of course determined by the item type (or, rather, its datasource controller). That's the one which has to be able to load the items using those IDs, after all. So, when you're just using an index of type, e.g., Node, of course things could go wrong when you try to pass strings as NIDs. (If you're successfully avoiding entity loads by retrieving the data from Solr, however, I don't know where it would go wrong. But maybe that's not completely working? Bear the caveat about Field API fields (last paragraph) in mind.)
To solve this cleanly, as said, you'd have to provide your own datasource controller and item types which support this behavior. See this sandbox where I do something similar with a datasource controller, providing slightly modified versions of existing entities with a new (string) ID.
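A hedged sketch of what such a datasource controller could look like, following the Search API 7.x pattern (the class name is hypothetical and this is illustrative, not a drop-in):

```php
<?php

/**
 * Datasource controller exposing string IDs that combine site hash and
 * entity ID, so items from several sites can coexist in one index.
 */
class ExampleMultisiteDataSourceController extends SearchApiEntityDataSourceController {

  public function getIdFieldInfo() {
    // Declare the item ID as a string instead of an integer, so a
    // "sitehash-entityid" compound value is legal.
    return array(
      'key' => 'multisite_id',
      'type' => 'string',
    );
  }

}
```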

Also, the search_api_site module can totally go full module; I just want to get it out and working before doing so.

OK, great!

drunken monkey’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
12.66 KB

Oops, I totally forgot about this issue and created a duplicate: #2146893: Add site hash to indexed documents. I've closed that one now and am posting the patch here.
This version is, I think, much more robust, doesn't rely on another module, and also follows what the Apachesolr module does, making it more likely that storing items from both Search API and Apachesolr on the same Solr server will work. (As long as you have the Apachesolr Multi-Site module installed; otherwise Apachesolr will delete your Search API items.)

@ e2thex: Are you still working on your sandbox and interested in making it a full module? Could it work with this patch, too?

Everyone else, what are your opinions on this? With the patch, you could at least use a Solr server for as many sites as you want out of the box, without worrying about clashes. (A more detailed description is in the other issue.)

Nick_vh’s picture

$index_id = call_user_func(array($this->connection_class, 'phrase'), $index_id);

This could use some documentation

$query = "index_id:$index_id";

Isn't it nicer if we do not rely on double quotes for variable interpolation? 'index_id:' . $index_id; IDEs also play a little nicer that way.

// If multi-site compatibility is enabled, add the site hash and
+        // language-specific base URL.

Why wouldn't we enable it by default?

+    $site_hash = !empty($this->options['site_hash']) ? search_api_solr_site_hash() . '-' : '';
+    return "$site_hash$index_id-$item_id";

Short code is fun, but less readable. Also, the last one could be written a little more clearly, without packing so much into a single string.

if (is_array($ids)) {

It's a bit odd to start with this check when your default value is 'all', a string. You might want to flip the order around and make the common path more predictable.

if (!($hash = variable_get('search_api_solr_site_hash', FALSE))) {

In your code you're saying that the hash is only alphanumeric. This is not correct, as it can be a user-defined variable. I think you need to check and filter it, at least.
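A minimal sketch of such a check (function name hypothetical; it falls back to a freshly derived hash when the stored value is missing or unsafe):

```php
<?php

/**
 * Returns the site hash, guaranteeing it is alphanumeric even when it
 * was overridden via a user-defined variable, since it ends up inside
 * a Solr filter query.
 */
function example_site_hash() {
  $hash = variable_get('search_api_solr_site_hash', FALSE);
  if (!$hash || !preg_match('/^[a-zA-Z0-9]+$/', $hash)) {
    global $base_url;
    $hash = substr(sha1($base_url), 0, 6);
    variable_set('search_api_solr_site_hash', $hash);
  }
  return $hash;
}
```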

Nick_vh’s picture

Status: Needs review » Needs work
drunken monkey’s picture

Status: Needs work » Needs review
FileSize
12.79 KB

Isn't it nicer if we do not rely on double quotes for variable interpolation? 'index_id:' . $index_id; IDEs also play a little nicer that way.

This and some of your other remarks are just a matter of taste. The Drupal coding standards explicitly allow placing variables in double quotes, so there's nothing wrong with it here, I'd say. (Also, at least in my IDE, it doesn't make any difference.)

Why wouldn't we enable it by default?

It is enabled by default for new servers. For old ones we cannot just enable it, though, as that would completely break their functionality and require clearing all their indexes – we can't do that without explicit permission from the user.

In your code you're saying that the hash is only alphanumeric. This is not correct as it can be a user defined variable. I think you do need to check it and filter it at least.

In that case, apachesolr_multisitesearch_apachesolr_delete_by_query_alter() is also buggy:

$query = 'hash:' . apachesolr_site_hash();

In the case of this module, we explicitly specify in README.txt that the site hash can only contain alphanumeric characters. I also don't think that will be much of an issue. But of course, we can also easily allow any kind of (non-empty) string, if you think that would be better.

Attached is a patch which, for now, adds a clarifying comment to the escaping of the index ID. (I also had an idea of how to call these functions in a cleaner fashion, but that should be done in a different patch.)

thePanz’s picture

Hi drunken monkey, any news on this issue? I am going to use this feature in a new project and I would like to help with the development.
Is the @e2thex sandbox up to date?
Thanks in advance for your help! :)

My two cents on this patch: the hash is correctly set and avoids conflicts when the Solr index is used to store more than one Drupal site.
However, the hash is automatically added as a filter query to every request, so no multi-site search functionality can be provided (AFAIK). I'll investigate further soon.

drunken monkey’s picture

Title: Add support for multi-site searches » Add support for using a Solr server with multiple sites
Status: Needs review » Fixed

You are right, the issue was a bit misnamed; we are currently just trying to make using a Solr server with multiple sites possible (or at least easier). Some other module would then be needed to build real multi-site searches on top of this new functionality.

Anyways, committed. Thanks for your input and help, everyone!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

xamanu’s picture

Note: This commit breaks the Sarnia module, because Sarnia uses the search_api_solr module to retrieve content that was indexed and stored by the Apachesolr module. With the changes from this issue the indexes get mixed up and Sarnia returns no results, even when not using the multi-site functionality implemented here.

basvredeling’s picture

For multi-site searches, the pieces of code that say:

$query = 'hash:' . apachesolr_site_hash();

should really say something like:

$query = 'hash:(' . $sitehash_a . ' OR ' . $sitehash_b . ')';

Especially in \SearchApiSolrService::search()

$fq[] = 'hash:' . search_api_solr_site_hash();

This breaks multi-site searches.
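The suggestion above generalizes to any number of sites; a sketch, assuming a hypothetical `$site_hashes` array of allowed hashes (note the parentheses, which keep the OR scoped to the hash field in Solr syntax):

```php
<?php
// Sketch: build one filter query matching documents from several sites.
$site_hashes = array($sitehash_a, $sitehash_b);
$fq[] = 'hash:(' . implode(' OR ', $site_hashes) . ')';
```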

basvredeling’s picture

Status: Closed (fixed) » Active
FileSize
10.59 KB

I'm reopening this issue. The current multi-site setup doesn't work. I'm typing out a recipe / proof of concept. Plus, we need a patch to tackle the problem from #18. This needs to be tested.

The recipe includes a feature module and some steps to follow to reproduce the use case. The settings mentioned in the steps match the settings stored in the feature. If you need to test under different settings, you'll need to override the feature to match the correct Solr core and host:port.

recipe

  1. Install a Solr 4 server:
    • that runs on localhost
    • with a core called multisite
    • accessible on port 8983
  2. Install 2 Drupal 7 sites.
  3. Enable the attached feature module on both sites (including necessary dependencies).
  4. Generate or create some uniquely identifiable content on both sites.
  5. Index the content on: /admin/config/search/search_api/index/nodes
  6. Check the permissions; the feature enforces a custom permission for authenticated users only.
  7. Clear the cache (just to be sure).
  8. The search page should be accessible at: /search/multisite
  9. Execute a search on this page.
  10. If the search returns results, two facets should be visible: language and site hash, so you can determine whether multi-site search is working.

Patch and feature module uploaded... the patch merely comments out line 838 of search_api_solr/includes/service.inc.

other stuff to do

  • Rewrite the site hashes into friendly site names... like in apachesolr_multisite (with raw hashes the facets are unusable to end-users).
• Find a way around node_access on the remote site. Can we presume a user from site A is always an anonymous user to site B?
  • test in more complex setups (multisite + multilanguage + multi-index for instance)

NB: patch was uploaded in #20... sorry about that mess

basvredeling’s picture

and the patch

drunken monkey’s picture

Status: Active » Closed (fixed)

Note: This commit breaks the Sarnia module

Sarnia is unmaintained, so it's clear that it's gonna drift apart from this module over time. It shouldn't be too hard to fix this – however, I guess we could make it a lot easier by moving the adding of the filter into its own method, to simplify overriding.

However, that as well as any suggestions following from #19 should be done in another issue.

@ basvredeling: If you have any suggestions how to make this easier, please create a new issue for it. E.g., we might provide a setting on the search server to switch this filter on/off. Otherwise, you can also just use a hook_search_api_solr_query_alter() implementation to remove the filter.
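The hook-based workaround could look something like this (a sketch; the module name is hypothetical, and it assumes the filter was added to the 'fq' parameter as "hash:&lt;site hash&gt;", as the patch does):

```php
<?php

/**
 * Implements hook_search_api_solr_query_alter().
 *
 * Removes the site-hash filter so results from all sites on the shared
 * Solr server are returned.
 */
function mymodule_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  $hash_filter = 'hash:' . search_api_solr_site_hash();
  foreach ($call_args['params']['fq'] as $key => $fq) {
    if ($fq === $hash_filter) {
      unset($call_args['params']['fq'][$key]);
    }
  }
}
```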
