We already have one site that implements a search page using Search API Solr. Now we're creating another one, which will rely heavily on search functionality, and I have to decide whether to create a new Solr instance or to use the existing one.
The second option looks like the better decision, since it means less system administration and possibly allows us to implement a general cross-site search sometime in the future (this is a likely requirement). Is it possible to do that with search_api_solr? Is there anything else I should think about before jumping into the water?
| Comment | File | Size | Author |
|---|---|---|---|
| #20 | apachesolr_multisite-1776534-19.patch | 656 bytes | basvredeling |
| #19 | search_api_multisite_demo.zip | 10.59 KB | basvredeling |
| #13 | 1776534-12--multi_site.patch | 12.79 KB | drunken monkey |
| #10 | 1776534-10--multi_site.patch | 12.66 KB | drunken monkey |
| #5 | 1776534.patch | 2.31 KB | e2thex |
Comments
Comment #1
slashrsm CreditAttribution: slashrsm commented
Nevermind... Now I realized that search_api creates a meta-field called "Index ID", which is always used as a filter when outputting results from the index with Views.
Taking this into consideration, I realized that the requested feature works out of the box for me.
Good job!
Comment #3
guillaumev CreditAttribution: guillaumev commented
I'd like to reopen this because I'm trying to achieve the same thing, and I'm wondering if you or someone else was able to successfully set up a multisite search with Solr and search_api, and if so, how did you do it?
Comment #4
slashrsm CreditAttribution: slashrsm commented
Changing title.
Comment #5
e2thex CreditAttribution: e2thex commented
I have made an attempt at this, but it requires the following patch as well as a sandbox module, search_api_site:
https://drupal.org/sandbox/e2thex/2033065
The patch does two basic things: first, if the search_api_site module is around, it adds a site hash to the index_id of each item (this way there are no collisions). Second, it makes sure that when results are returned, the entity ID is not used as the key of the returned array (otherwise you get collisions).
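A rough sketch of that idea might look like the following. The function names and the exact hash derivation are assumptions for illustration, not the actual patch or search_api_site code:

```php
<?php

// Hypothetical sketch of the approach in #5: namespace each indexed item's
// ID with a per-site hash so that items from different sites sharing one
// Solr index cannot collide. Names are illustrative, not the module's API.

/**
 * Derives a short site hash from the site's base URL.
 */
function example_site_hash($base_url) {
  // Six alphanumeric characters are enough to tell sites apart.
  return substr(base_convert(hash('sha256', $base_url), 16, 36), 0, 6);
}

/**
 * Builds a collision-free Solr document ID for an item.
 */
function example_multisite_item_id($site_hash, $index_id, $item_id) {
  return $site_hash . '-' . $index_id . '-' . $item_id;
}
```

With IDs built this way, two sites indexing the same entity ID into the same index no longer overwrite each other's documents.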
Comment #6
drunken monkey
This change violates the contract for the search() function: the results have to be keyed by the ID.*
Other than that, it's a nice idea. If the search_api_site module becomes a proper project, we could definitely introduce a similar patch to support it, if it's as easy as that.
However, the issue remains that you'd have to have identical index definitions on both servers to make use of this, right? You couldn't, e.g., filter on a field that's only indexed (or even present) on the other server. But I guess there's no easy way around that, and Sarnia is already trying to support that use case.
* Admittedly, I don't really know an easy (and clean) way around that. The fact is, returning several results with the same ID simply doesn't fit the Search API logic. Different results should have different IDs – that's what "ID" means, after all.
To solve this cleanly, you would probably have to define a new datasource controller, offering multi-site-capable versions of all item types. The upside would be that you could, in theory, also provide additional properties that way, thereby supporting the use of fields in filters, etc., that are only present on a different server. That would probably be really hard, though, as you'd have to gain access to the property information on a different server.
Comment #7
drunken monkey
Comment #8
e2thex CreditAttribution: e2thex commented
@drunken monkey: I want to just change the ID of the item to include the site hash, but it seems that the ID has to be an int (is it supposed to be an entity ID?).
So I am looking to see what part of the contract I am violating, and I do not see anything in the docs (the abstract class function just says to return an associative array containing the search results). Could this return a combination of the site key and the ID, or does it have to match the ID field?
Also, the search_api_site module can totally go full module; I just want to get it out and working before doing so.
Comment #9
drunken monkey
Basically, it can have any (scalar) data type at all. However, the type is of course determined by the item type (or, rather, its datasource controller) – that's the one which has to be able to load the items using those IDs, after all. So, when you're using an index of type Node, for example, things could of course go wrong when you try to pass strings as NIDs. (If you're successfully avoiding entity loads by retrieving the data from Solr, however, I don't know where it would go wrong. But maybe that's not completely working? Bear the caveat about Field API fields (last paragraph) in mind.)
To solve this cleanly, as said, you'd have to provide your own datasource controller and item types which support this behavior. See this sandbox, where I do something similar with a datasource controller, providing slightly modified versions of existing entities with a new (string) ID.
OK, great!
Comment #10
drunken monkey
Oops, I totally forgot about this issue and created a duplicate: #2146893: Add site hash to indexed documents. Closed that one now and am posting the patch here.
This version is, I think, much more robust: it doesn't rely on another module and is also along the lines of what the Apachesolr module does, making it more likely that storing items from both Search API and Apachesolr on the same Solr server will work. (As long as you have the Apachesolr Multi-Site module installed – otherwise, Apachesolr will delete your Search API items.)
@e2thex: Are you still working on your sandbox and interested in making it a full module? Could it work with this patch, too?
Everyone else, what are your opinions on this? With the patch, you could at least use a Solr server for as many sites as you want out of the box, without worrying about clashes. (A more detailed description is in the other issue.)
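The approach this patch takes can be sketched roughly like this. The helper names are assumptions for illustration (the real code lives in the service class and uses the stored server settings); the core idea is that item IDs stay untouched, and isolation happens through an extra field plus a filter query:

```php
<?php

// Illustrative sketch of the committed approach: every indexed document
// gets a 'hash' field identifying its site, and every query is restricted
// to that hash with a Solr filter query (fq). Names are assumptions, not
// the module's actual API.

/**
 * Stamps a document with the indexing site's hash before sending it to Solr.
 */
function example_prepare_document(array $document, $site_hash) {
  $document['hash'] = $site_hash;
  return $document;
}

/**
 * Returns the filter query that hides other sites' documents.
 */
function example_site_filter_query($site_hash) {
  return 'hash:' . $site_hash;
}
```

Because the restriction is a filter query rather than a change to the IDs, existing indexes only need re-indexing to pick up the new field, and documents from Apachesolr (which uses a similar hash field) can coexist on the same server.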
Comment #11
Nick_vh
This could use some documentation.
Isn't it nicer if we do not rely on double quotes to get string concatenation? 'index_id:' . $index_id; IDEs also play a little nicer that way.
Why wouldn't we enable it by default?
Short code is fun, but less readable. Also, the last one could be written a little nicer and without so much in one single quote.
It's a bit weird to start with that branch if your default value is 'all' and a string. You might want to flip the order around and gain some bytecode and code predictions.
In your code you're saying that the hash is only alphanumeric. This is not correct, as it can be a user-defined variable. I think you do need to check it and filter it at least.
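Two of the remarks above – preferring explicit concatenation over interpolation, and validating the user-configurable site hash – could look roughly like this (illustrative helpers, not the module's code):

```php
<?php

// Sketch of two review points from #11 (hypothetical helper names):
// concatenate instead of interpolating inside double quotes, and validate
// the site hash because it can come from a user-defined variable.

/**
 * Builds the index filter with explicit concatenation,
 * i.e. 'index_id:' . $index_id rather than "index_id:$index_id".
 */
function example_index_filter($index_id) {
  return 'index_id:' . $index_id;
}

/**
 * Checks that a site hash is a non-empty alphanumeric string before it is
 * trusted inside a Solr query.
 */
function example_is_valid_site_hash($hash) {
  return is_string($hash) && preg_match('/^[a-zA-Z0-9]+$/', $hash) === 1;
}
```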
Comment #12
Nick_vh
Comment #13
drunken monkey
This and some of your other remarks are just a matter of taste. The Drupal coding standards explicitly allow placing variables in double quotes, so there's nothing wrong with it here, I'd say. (Also, at least in my IDE, it doesn't make any difference.)
It is enabled by default for new servers. For old ones we cannot just enable it, though, as that would completely break their functionality and require clearing all their indexes – we can't do that without explicit permission from the user.
In that case, apachesolr_multisitesearch_apachesolr_delete_by_query_alter() is also buggy.
In the case of this module, we explicitly specify in README.txt that the site hash can only contain alphanumeric characters. I also don't think that will be much of an issue. But of course, we can easily allow any kind of (non-empty) string, if you think that would be better.
Attached is a patch which, for now, adds a clarifying comment to the escaping of the index ID. (I also had an idea of how to call these functions in a cleaner fashion, but that should be done in a different patch.)
Comment #14
thePanz CreditAttribution: thePanz commented
Hi drunken monkey, any news on this issue? I am going to use this feature for a new project and I would like to help with the development.
Is the @e2thex sandbox updated?
Thank you in advance for your help! :)
My two cents on this patch: the hash code is correctly set and avoids conflicts when the Solr index is used to store more than one Drupal site.
However, the hash code is automatically injected as a filter query for every request sent to the Solr server, thus no multi-site search functionality can be provided (AFAIK). I'll perform more investigation soon.
Comment #15
drunken monkey
You are right, the issue is a bit misnamed – we are currently just trying to make using a Solr server with multiple sites (more easily) possible. Some other module would then be needed to use this new functionality to allow real multi-site searches.
Anyway, committed. Thanks for your input and help, everyone!
Comment #17
xamanu CreditAttribution: xamanu commented
Note: this commit breaks the Sarnia module, because Sarnia uses the search_api_solr module to obtain content from Solr that was indexed and stored by the ApacheSolr module. With the changes in this issue, the indexes are getting mixed up and Sarnia gets no results, even when not using the multi-site functionality implemented here.
Comment #18
basvredeling
For multisite searches, the pieces of code that say:
should really say something like:
Especially in \SearchApiSolrService::search(), this breaks multisite searches.
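In other words, a hypothetical sketch of the point (not the actual service code): the hard-coded single-site filter would need to admit a configurable set of site hashes for a multisite search to work:

```php
<?php

// Hypothetical illustration of #18: a multisite-capable filter has to match
// a whole list of participating site hashes instead of pinning every query
// to the local one. Function names are illustrative.

/**
 * Single-site filter, restricting results to one site's documents.
 */
function example_single_site_filter($site_hash) {
  return 'hash:' . $site_hash;
}

/**
 * Multisite filter, admitting documents from all listed sites.
 */
function example_multisite_filter(array $site_hashes) {
  return 'hash:(' . implode(' OR ', $site_hashes) . ')';
}
```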
Comment #19
basvredeling
I'm reopening this issue. The current multisite setup doesn't work. I'm typing out a recipe / proof of concept. Plus, we need a patch to tackle the problem from #18. This needs to be tested.
The recipe includes a feature module and some steps to follow to reproduce the use case. The settings mentioned in the steps match the settings stored in the feature. If you need to test under different settings, you'll need to override the feature to match the correct Solr core and host:port.
recipe
Patch and feature module uploaded... the patch merely comments out line 838 of search_api_solr/includes/service.inc.
other stuff to do
NB: the patch was uploaded in #20... sorry about that mess.
Comment #20
basvredeling
And the patch.
Comment #21
drunken monkey
Sarnia is unmaintained, so it's clear that it's going to drift apart from this module over time. It shouldn't be too hard to fix this – however, I guess we could make it a lot easier by moving the adding of the filter into its own method, to simplify overriding.
However, that, as well as any suggestions following from #19, should be done in another issue.
@basvredeling: If you have any suggestions for making this easier, please create a new issue for it. E.g., we might provide a setting on the search server to switch this filter on/off. Otherwise, you can also just use a hook_search_api_solr_query_alter() implementation to remove the filter.
Comment #22
basvredelingok, see: #2357897: Fix the Solr multisite search use case