Another requirement we have for our use case is cross-entity searches. Currently, one can create an Index that applies to any one entity type: Node, User, File, etc.

In our case, we need to make cross-entity searches. That is, we want to search for, say, "anthropology", and get back Nodes for books on anthropology, videos (Media entities) on anthropology, and Users who describe themselves (in text Fields on the User entity) as anthropologists.

I am unclear on how we could do that, but if someone can point me in the right direction I should be able to put in the time to add such functionality.

(In our case we're using Solr, but presumably it shouldn't matter.)

Comment | File | Size | Author
#14 | search_api_multi.patch | 16.83 KB | drunken monkey

Comments

drunken monkey’s picture

Sorry for taking a bit longer for the answer, I had to thoroughly think about this …

You are right, this would be a great feature with quite a few solid use cases. However, as you noted, at the moment this can't be done easily, as an index always uses a single entity type and a search always uses a single index. At the core, this is probably too deeply rooted to be easily changed, so I guess you'll need to hack around it in a more or less clean way. I had the following ideas regarding that:

1. Execute multiple searches at once and merge the results
   You could patch the search_api_page module to allow for search pages that search on several indexes at once.
   Pros: no real hack, uses defined functionality.
   Cons: probably rather hard to merge the results correctly; don't even want to think about paging; would also break facets; Views integration would have to be done separately, which would probably be even harder.
   OK, forget about that one …

2. Trick an index into indexing different entity types
   You could create a new entity type that contains all entities from the types you want to search, with a new unique ID field. You'd have to define which properties from which entity type you want to index, and maybe map different properties from different types to a single one in your type for displaying (and easier searching). Then, by implementing hook_entity_insert/update/delete(), you can keep your entities up to date, and the Search API automatically keeps track of which of your entities have changed.
   Pros: although the custom entity type is a bit ugly, this would also work quite naturally with the framework.
   Cons: custom fields could be a bit tricky to get right; site-specific solution that can hardly be contributed back (without investing some extra effort for a generic solution).

3. Query the Solr server (more) directly
   It does matter that you are using a Solr server: since the data of all indexes on a server is stored in a single Solr index, it would be fairly easy to search on several indexes at once if you just circumvent the Search API for executing the search. You'd only (I think) have to add "index_id" to the fl parameter in solrconfig.xml and subclass the SolrService class, adding a method that accepts multiple indexes.
   Then, of course, you'd also have to somehow create search pages or views that can use this method, which would probably require completely redoing most of that work.
   Pros: "add-on", all normal stuff would work, too.
   Cons: Solr-only (but at least not site-specific); would need patches/hacks to the facet module to work with facets (Solr could deliver them, but the facet module wouldn't know about the search to begin with, and couldn't work with one spanning multiple indexes anyway).
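
To make the paging concern in the first option concrete, here is a small language-neutral sketch (written in Python, not actual Search API code). The hit dictionaries, scores and the function name merged_page are all made up for illustration. It assumes each index returns its hits sorted by score, and it shows that serving page N of the merged list means fetching the top (offset + limit) hits from every index first. It also silently assumes the scores of separate searches are comparable, which may well not hold in practice.

```python
import heapq
from itertools import islice


def merged_page(result_lists, offset, limit):
    """Return one page of hits merged from several score-sorted lists.

    Every input list must already be sorted by score, highest first,
    and must contain at least its top (offset + limit) hits.
    """
    # heapq.merge expects ascending keys, so the score is negated.
    merged = heapq.merge(*result_lists, key=lambda hit: -hit["score"])
    return list(islice(merged, offset, offset + limit))


# Hypothetical hits from a node index and a user index.
node_hits = [
    {"type": "node", "id": 12, "score": 3.1},
    {"type": "node", "id": 7, "score": 1.4},
]
user_hits = [
    {"type": "user", "id": 12, "score": 2.2},
    {"type": "user", "id": 3, "score": 0.9},
]

page = merged_page([node_hits, user_hits], offset=1, limit=2)
print([(hit["type"], hit["id"]) for hit in page])
```

Note that node 12 and user 12 share a numeric ID, so a merged list is only usable if every hit carries its entity type along.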

If I can think of another method, I'll let you know. Otherwise just decide and ask if something is unclear, or if I should elaborate a bit.
As you see, there sadly isn't a clean method right now that would let all users use this functionality. But it should be a priority when rethinking the framework on a basic level (i.e., doing a 2.0 version or a D8 port).

Crell’s picture

Good morning!

I was leaning toward option 2, as it seems the most sustainable. I'm not sure that it would necessarily be a site-specific solution. With Entity API shouldn't it be possible to treat them generically? I haven't looked into that part of the code quite deeply enough, but that's on my todo list for today. If I can figure out a way to do so that is contributable back I'll provide a patch or several.

If you're around IRC at all I'll be camped out in #Drupal, and will likely have questions. :-)

Crell’s picture

Update: Yeah, I see what you mean about the single-entity assumption being baked into the system. I just went through to see how hard it would be to change, and it would be a really deep change. Ah well.

Another thought I had. It appears that multiple indexes can use the same server entity. If two indexes bind to two different entity types but feed into the same search server (Solr core), wouldn't that give us the index we need? Then we just need to solve the querying question, viz., which index is going to "own" the searching.

Or just update the apachesolr_views module to D7 and hit Solr directly through that. Not impossible, I guess. I'd rather keep it all within Search API if I can, though.

drunken monkey’s picture

> Another thought I had. It appears that multiple indexes can use the same server entity. If two indexes bind to two different entity types but feed into the same search server (Solr core), wouldn't that give us the index we need? Then we just need to solve the querying question, viz., which index is going to "own" the searching.

Yes, that's exactly what I meant with the third approach. Maybe I didn't explain it that well …

fago’s picture

Yep, option 3 sounds best to me, too. Baking support for cross-index search (on a single server, of course) into the search_api_views module would be great.

drunken monkey’s picture

When you find a clean way to support this on a server, define a new feature ("multi-index search" or something? But probably with a properly namespaced prefix.) and query it on the service class with supportsFeature().
For example, services supporting that feature would have to provide an additional method, searchMultiple(), for searching multiple indexes on the server at once. That done, you could provide a way of creating a view that uses multiple search indexes, and if that checks for the feature correctly, the Search API could even "officially" support it (e.g., I could directly patch the Solr service class, without you having to subclass it). The module providing multi-index views could either be a new one or be merged into search_api_views.
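
The feature-detection pattern described here could look roughly like the following sketch. It is written in Python purely for illustration; the real service classes are PHP, and the feature name "example_multi_index", the class names and the return value are all hypothetical. Only the idea (ask the service whether it supports the feature, then call the extra method) comes from the description above.

```python
# Hypothetical, namespaced feature name; not an identifier from Search API.
MULTI_INDEX = "example_multi_index"


class Service:
    """Base class: a service advertises optional features by name."""

    features = frozenset()

    def supports_feature(self, feature):
        return feature in self.features


class DatabaseLikeService(Service):
    """A service that does not implement the optional feature."""


class SolrLikeService(Service):
    """A service that opts in and provides the extra method."""

    features = frozenset({MULTI_INDEX})

    def search_multiple(self, index_ids, keys):
        # A real service would query its backend here; the sketch
        # only echoes the request with an empty result list.
        return {"indexes": list(index_ids), "keys": keys, "results": []}


def multi_index_search(service, index_ids, keys):
    """Call search_multiple() only on services that declare the feature."""
    if not service.supports_feature(MULTI_INDEX):
        raise NotImplementedError("service cannot search several indexes")
    return service.search_multiple(index_ids, keys)


response = multi_index_search(
    SolrLikeService(), ["node_index", "user_index"], "anthropology"
)
print(response["indexes"])
```

The caller never checks the concrete class, so any other service could opt in later without changes to the calling code.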

Damn, now I'm interested myself. If only I had the time, I'd try it myself … :-/

Crell’s picture

What I was able to get working on Friday was two indexes, one for nodes and one for users, both indexing the same Fields shared between both entities, and both pointing to the same Solr core / Service. That seemed to work, although I periodically get indexes that "jam" with inexplicable 500 errors when indexing that go away if I delete and recreate the index. (I was getting that before trying this, too.)

However, when I then configure a search page or a view on either index, I get only results for that entity type even though they're both pointing to the same Solr core. That is, the user index-based search page only returns users and the node index-based search page only returns nodes.

Are the indexes perhaps adding an additional filter on a particular field within Solr to restrict by entity type? Is that something that could be removed, conditionally? I haven't looked at the Views integration in detail yet (since Views is still in such an unstable state itself).

drunken monkey’s picture

> However, when I then configure a search page or a view on either index, I get only results for that entity type even though they're both pointing to the same Solr core. That is, the user index-based search page only returns users and the node index-based search page only returns nodes.
>
> Are the indexes perhaps adding an additional filter on a particular field within Solr to restrict by entity type?

Yes, of course they do, that's the whole point. Sorry, I thought that was clear …
Whenever a search is executed on an index that lies on a Solr server, a filter query "index_id:ID" is added. Normally, that is what you want: no matter what indexes lie on a server, you only want the results for the index you are searching.
You would have to provide an additional method in the service class that doesn't add this filter query (and somehow combines the metadata of several indexes).
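
As a rough sketch of what such a method would change, the snippet below builds Solr request parameters for one or several indexes. It is Python for illustration only, not the PHP service class; the function names are hypothetical, and quoting the IDs is my own assumption. The fq and fl parameters and the index_id field are the ones mentioned in this thread, and the "field:(a OR b)" form is standard Solr query syntax.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_search_params(keys, index_ids):
    """Build request parameters restricted to the given index IDs."""
    if not index_ids:
        raise ValueError("at least one index ID is required")
    terms = [quote_term(index_id) for index_id in index_ids]
    if len(terms) == 1:
        # The normal single-index case: one index_id filter query.
        filter_query = "index_id:" + terms[0]
    else:
        # Multi-index case: accept documents from any listed index.
        filter_query = "index_id:(" + " OR ".join(terms) + ")"
    return {
        "q": keys,
        "fq": filter_query,
        # Return index_id with each hit so the caller can tell which
        # index (and therefore which entity type) a result belongs to.
        "fl": "id,score,index_id",
    }


print(build_search_params("anthropology", ["node_index", "user_index"])["fq"])
```

Keeping a filter on the listed indexes, rather than dropping the filter entirely, avoids pulling in documents from unrelated indexes that happen to live on the same server.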

Crell’s picture

Urf. OK, so this means a new server class, not a new index. Geesh.

Any other pointers to where I should be digging for this? :-) I've been tracing through the code with a debugger for the past few hours and while educational I still don't have it working the way it needs to be. (Anything I do here I will try to make contributable, worry not.)

Crell’s picture

Hm. Well I found the line that adds that filter and commented it out (in SearchApiSolrService::search()), but the index is still showing only node records, not nodes and users. I'm somewhat grasping here as I am by no means a Solr expert. What else need I do?

drunken monkey’s picture

> Hm. Well I found the line that adds that filter and commented it out (in SearchApiSolrService::search()), but the index is still showing only node records, not nodes and users. I'm somewhat grasping here as I am by no means a Solr expert. What else need I do?

Well, if you say that they have the same fields (with the same types), then the Solr fields should be the same as well and a search on one of those fields will return results from both indexes (when the "index_id:foo" filter isn't added).

However, the service class gets only the IDs back and, when the search was executed on the node index, will interpret all of them as node IDs. So what you'll have to do is add "index_id" to the fl parameter (either in the search() method or directly in solrconfig.xml). Then, when processing the results, you'll have to load the right entity depending on the index_id stored with each result. You can do that either directly in search() (store it alongside "id" and "score" as "entity" in the results array, and Views and Page will find it) or wherever the results are used (Views or Page).

Right now, you should get nodes that don't match the query whenever a matching user has the same UID as the node's NID.
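
The ID collision and its fix can be illustrated with a toy example. This is a Python sketch with invented data, not Search API code: the in-memory dictionaries stand in for entity loading, and the index names and the index-to-type mapping are hypothetical. The "entity" key alongside "id" and "score" follows the suggestion above.

```python
# Toy entity storage; real code would load entities from the database.
NODES = {12: "Intro to Anthropology (book)", 7: "Cooking basics (book)"}
USERS = {7: "alice, anthropologist"}
STORES = {"node": NODES, "user": USERS}

# Hypothetical mapping from index ID to the entity type it indexes.
INDEX_ENTITY_TYPE = {"node_index": "node", "user_index": "user"}


def load_naive(hits):
    """Treat every returned ID as a node ID (the behaviour described)."""
    return [NODES.get(hit["id"]) for hit in hits]


def load_by_index(hits):
    """Load each hit from the entity type its index_id points to."""
    results = []
    for hit in hits:
        entity_type = INDEX_ENTITY_TYPE[hit["index_id"]]
        entity = STORES[entity_type].get(hit["id"])
        # Store the loaded entity alongside "id" and "score".
        results.append({**hit, "entity": entity})
    return results


# Hits as they might come back for "anthropology" with index_id in fl.
hits = [
    {"id": 12, "index_id": "node_index", "score": 2.0},
    {"id": 7, "index_id": "user_index", "score": 1.5},
]

print(load_naive(hits))
print([result["entity"] for result in load_by_index(hits)])
```

The naive loader returns the cooking book for the second hit, because user 7 matched but node 7 was loaded; the index-aware loader returns the user.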

drunken monkey’s picture

Assigned: Unassigned » drunken monkey
Status: Active » Needs work

Working on it, should be done tomorrow or Tuesday.

drunken monkey’s picture

Assigned: drunken monkey » Unassigned
Status: Needs work » Needs review
File size: 16.83 KB

Did I say "tomorrow"? Obviously I meant "in a few minutes" …
Just tested it a bit and contrary to any assumptions it worked almost immediately! So please, check out the new Search API multi-index searches (have to think of a less complicated name, maybe …) project, I think it's a rather good solution. Took me just a little more than ten hours, too, so those previous worries of mine were also unnecessary.

There were a few changes necessary in the Search API project itself (for one, the Solr service had to implement the feature, of course, and Views also had a few too many hardcoded assumptions in it), which I haven't committed yet because they aren't as well tested as they should be before being committed (it's a little late here already). So to test the new module, please apply the attached patch to the search_api module (and maybe also quickly check that the things that worked previously aren't broken, either).

Oh, and what is included might also be good to know: What it basically does is just define the feature and how it should work in service classes. But when the search_api_views module is enabled, too, it provides new base tables for all compatible search servers (i.e., all Solr servers, at the moment). Those accumulate all fields, filters, arguments and sorts from their enabled indexes (where sorts might not work, since it's kinda hard to sort on a field which only some items have).
Through joining table columns, the output can be made to look pretty nice, too (otherwise there are a lot of empty fields, of course).
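
One way to picture the accumulation of fields across indexes is the sketch below. It is Python for illustration and does not reflect how the module is actually implemented; the index names, field names and the in_all_indexes flag are assumptions of mine. The flag simply marks fields that only some indexes have, which is where sorting and empty columns become an issue.

```python
def accumulate_fields(indexes):
    """Combine the field lists of several indexes into one mapping.

    `indexes` maps an index ID to a {field name: field type} dict.
    Conflicting types for the same field name are not handled here;
    the first type seen wins.
    """
    combined = {}
    for index_id, fields in indexes.items():
        for name, field_type in fields.items():
            entry = combined.setdefault(
                name, {"type": field_type, "indexes": []}
            )
            entry["indexes"].append(index_id)
    for entry in combined.values():
        # Fields missing from some indexes yield empty columns and
        # are poor candidates for sorting.
        entry["in_all_indexes"] = len(entry["indexes"]) == len(indexes)
    return combined


# Hypothetical field definitions for two indexes on one server.
fields = accumulate_fields({
    "node_index": {"title": "text", "body": "text", "created": "date"},
    "user_index": {"name": "text", "body": "text", "created": "date"},
})
print(sorted(name for name, f in fields.items() if f["in_all_indexes"]))
```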

drunken monkey’s picture

Just committed this, but I'd still appreciate some reviews.
Well, otherwise people will just normally complain if something goes wrong …

drunken monkey’s picture

Status: Needs review » Fixed

Let's call this fixed.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

j0rd’s picture

Component: Code » Framework

If this patch is committed to search_api, should you remove a link to this module on the search_api project page?

Additionally, this project should also be deprecated and / or deleted.

drunken monkey’s picture

What are you talking about?

klausi’s picture

Status: Closed (fixed) » Active

I think he referred to the search_api_multi project page where the patch in this issue is still linked?

drunken monkey’s picture

Status: Active » Fixed

Ah, thanks a bunch for deciphering that! Fixed.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.