We have node types with multi-value fields. These may be term fields or any other type of field. We need to show this data in tabular form and make it sortable. To do this, a single node has to show up multiple times in the search results, once for each possible combination of values in its multi-value fields. With plain MySQL this is easy to do, but we haven't found a way to do it with search_api/search_api_solr. We'd prefer a reusable/releasable approach rather than a one-off hack.

This is a common problem for things like product displays in e-commerce. I believe I've seen a discussion in the Commerce project about using 'display' nodes to handle this. That's messy and I'd rather avoid it. However, it got me thinking about creating a new index with pseudo nodes for each variant. The pseudo nodes would never actually exist, but would ultimately map back to real nodes. The trick is keeping track of all the possible variants: if a node's values change, we'd have to prune some of the variants or just regenerate them all. We can use the search_api hooks to pull from the special index rather than the normal index, and ultimately return results as if they came from the normal index.

Any ideas on how to accomplish this in a clean fashion?

Comments

rwohleb

After talking with another dev, I think I need to clarify this a bit. Here is an example structure of a node (json form):

node = {
    nid: 1234,
    field1: [1, 2, 3],
    field2: [1, 2, 3]
}

The resulting search results should include:

NID     field1   field2
------------------------
1234    1        1
1234    1        2
1234    1        3
1234    2        1
1234    2        2
1234    2        3
1234    3        1
1234    3        2
1234    3        3

As you can see, we want the node listed multiple times, once for each possible combination of values, and we need to be able to sort these variants.
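A minimal Python sketch of the expansion in the table above, assuming a node is represented as a plain dict shaped like the JSON example (the function name `expand_variants` is hypothetical). It takes the Cartesian product of the listed multi-value fields, so the number of rows is the product of the field lengths, 3 × 3 = 9 here.

```python
from itertools import product


def expand_variants(node, fields):
    """Return one flat row per combination of values in the given fields.

    An empty field is treated as a single None value so the node still
    appears once, similar to what a SQL LEFT JOIN would produce.
    """
    value_lists = [node.get(f) or [None] for f in fields]
    # product() varies the last field fastest, so rows come out in the same
    # order as the table above (field1 outer, field2 inner).
    return [
        {"nid": node["nid"], **dict(zip(fields, combo))}
        for combo in product(*value_lists)
    ]


node = {"nid": 1234, "field1": [1, 2, 3], "field2": [1, 2, 3]}
rows = expand_variants(node, ["field1", "field2"])
for row in rows:
    print(row["nid"], row["field1"], row["field2"])

# Once the variants are flat rows, sorting them is an ordinary sort.
by_field2_desc = sorted(rows, key=lambda r: (-r["field2"], r["field1"]))
```

The hard part the thread goes on to discuss is not this expansion itself but where it happens: MySQL gets it for free from joins, while a flat Solr index needs the rows to exist as separate documents.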

ohthehugemanatee

Working on the same problem. :)

Basically the problem is that we want to emulate a relational data manipulation in a non-relational DB.

To explain: all Views does with MySQL to produce this result is a LEFT JOIN, which works because of the way we store field data. Each value of a field gets its own row in field_my_field, indexed by entity ID, entity type, and revision ID. That way we can do LEFT JOIN field_my_field ON node.nid = field_my_field.entity_id AND (field_my_field.entity_type = 'node'). This is clever use of a relational database. Solr is not a relational database, though. AFAIK our schema (and Solr's model in general) is flat; it just treats each node as a single document. I was trying to think about how we could map our data structure more closely onto Solr to get this done, but the problem is that although Solr does SOME basic relational stuff, it doesn't do relational JOINs the way SQL does. I think doing this the "normal" Views way (i.e., in the DB layer) is probably not possible as long as the DB is Solr.
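To make the LEFT JOIN point concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The tables are simplified, hypothetical versions of the node table and per-field tables, not Drupal's exact schema. Joining two multi-value fields multiplies the rows, which yields exactly the per-combination result from the opening post, and ORDER BY makes the variants sortable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical, simplified tables: one row per field value, keyed by entity.
cur.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY);
CREATE TABLE field1 (entity_type TEXT, entity_id INTEGER, value INTEGER);
CREATE TABLE field2 (entity_type TEXT, entity_id INTEGER, value INTEGER);
INSERT INTO node VALUES (1234);
INSERT INTO field1 VALUES ('node', 1234, 1), ('node', 1234, 2), ('node', 1234, 3);
INSERT INTO field2 VALUES ('node', 1234, 1), ('node', 1234, 2), ('node', 1234, 3);
""")

# Each LEFT JOIN repeats the node row once per matching field value,
# so two 3-value fields produce 3 x 3 = 9 rows.
rows = cur.execute("""
SELECT node.nid, f1.value, f2.value
FROM node
LEFT JOIN field1 f1 ON node.nid = f1.entity_id AND f1.entity_type = 'node'
LEFT JOIN field2 f2 ON node.nid = f2.entity_id AND f2.entity_type = 'node'
ORDER BY f2.value DESC, f1.value
""").fetchall()

print(len(rows))  # 9
print(rows[0])    # (1234, 1, 3)
```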

Given that we can't do this in the data storage layer, another approach would be to change our schema in this case. I.e., if you want a multi-value field to create multiple objects in the display, we actually store it as multiple objects in the Solr index. That's the only way I can think of to get the desired result out of a flat data model. In fact, it's the same approach Commerce takes with product display nodes; we're just doing it across layers.

So, just as Views has a checkbox for "Display all values in the same row", our Search Index Fields tab could have a checkbox for "Display all values in the same result". If any of these are unchecked, we create a separate object in the Solr index for each combination of single values in the unchecked fields. Then we can let Solr do its thing normally.
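A rough sketch of the proposed indexing behavior, assuming each item has already been flattened into a dict of field values. The `split_fields` parameter, the `item_id` key, and the `<item_id>-<delta>` document ID format are all hypothetical. Only the unchecked fields are expanded; checked fields keep all their values in every document, and each document keeps a pointer back to the real node.

```python
from itertools import product


def denormalize(item_id, values, split_fields):
    """Turn one item into several flat, Solr-style documents.

    Fields in split_fields (the "unchecked" ones) are expanded into one
    document per combination of single values; every other field keeps its
    full list of values in each document.
    """
    split_lists = [values.get(f) or [None] for f in split_fields]
    docs = []
    for delta, combo in enumerate(product(*split_lists)):
        doc = dict(values)                    # shared, non-split fields
        doc.update(zip(split_fields, combo))  # single values for split fields
        doc["id"] = f"{item_id}-{delta}"      # unique per variant (hypothetical format)
        doc["item_id"] = item_id              # maps back to the real node
        docs.append(doc)
    return docs


docs = denormalize(
    1234,
    {"field1": [1, 2, 3], "field2": [1, 2, 3], "tags": ["a", "b"]},
    ["field1", "field2"],
)
```

Because the variant IDs are positional, the simplest way to handle a changed node is the one suggested in the opening post: delete every document carrying that `item_id` and regenerate them all.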

Thoughts?

Coyote

This would be very desirable behavior. It's easy to get multiple rows with a content view, but we need the same functionality in a (faceted) search view.

jgraham

Working denormalized search index

I have a sandbox project over here http://drupal.org/sandbox/jgraham/1777454

This creates an alternate, denormalized entity search index. It works well with search_api_solr for pushing denormalized node entities into the Solr index. Everything makes it into the index as expected; e.g., for the example in comment 2 we get 9 Solr entries from our one node.

Issues

Getting the results back out is less successful. I can get it to report that it found matching documents in the Solr index, but I'm running into the limitation that search_api assumes the keys returned from search_api_meta_entity_search_api_item_type_info() are entity types. To avoid stomping on search_api's standard entity implementation we can't use actual entity type names, so the code in my sandbox uses 'denormalized-ENTITY_NAME' as the item_type keys. Combined with setting the 'type' attribute to the actual entity type in our SearchApiDataSourceControllerInterface::__construct(), this gets us pretty far. However, search index objects fall back to using 'denormalized-ENTITY_NAME' as the 'item_type', which in turn gets used in various places under the assumption that it *is* the entity type.

It seems like it would be great to decouple the assumption about what defines a datasource as an entity or not. That is, the index should defer to the datasource about what the entity type is, if any. The index should not make the assumption that the datasource key is the entity type.

Proposed solution

  1. Document an optional 'entity_type' key that hook_search_api_item_type_info() can return.
  2. Update the SearchApiDataSourceControllerInterface interface to include a getEntityType() method that returns either '' or the entity type, if one is defined in hook_search_api_item_type_info().
  3. Update existing Search API modules that use $index->item_type or $this->index->item_type as the entity type to call $index->datasource()->getEntityType() instead.
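The three steps can be modelled in a few lines. This is a language-neutral Python sketch, not Search API's PHP code: `ITEM_TYPE_INFO` stands in for what hook_search_api_item_type_info() might return with the proposed optional 'entity_type' key, and the class and method names only mirror the proposal. The key point is that callers ask the datasource for the entity type instead of treating `item_type` as one.

```python
# Hypothetical item type info, including the proposed optional 'entity_type'.
ITEM_TYPE_INFO = {
    "node": {"name": "Node", "entity_type": "node"},
    "denormalized-node": {"name": "Denormalized node", "entity_type": "node"},
    "external_doc": {"name": "External document"},  # not an entity
}


class DataSourceController:
    """Stand-in for a datasource controller implementing getEntityType()."""

    def __init__(self, item_type):
        self.item_type = item_type

    def get_entity_type(self):
        # Step 2: return the declared entity type, or '' if there is none.
        return ITEM_TYPE_INFO[self.item_type].get("entity_type", "")


class Index:
    """Stand-in for a search index; item_type is only the datasource key."""

    def __init__(self, item_type):
        self.item_type = item_type

    def datasource(self):
        return DataSourceController(self.item_type)


# Step 3: callers go through the datasource instead of $index->item_type.
index = Index("denormalized-node")
print(index.item_type)                       # denormalized-node
print(index.datasource().get_entity_type())  # node
```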

I'm hoping someone more familiar with search_api, or one of the search_api maintainers, can chime in and say whether the above approach makes sense, whether a patch accomplishing it would be accepted, and whether there is any interest in decoupling the assumptions about entity_type. I think there could be other use cases for alternate indexes of various entity types that behave differently from the default indexes provided by search_api. Any other approaches that would let an alternate search index like the one in my sandbox leverage the rest of search_api would also be appreciated.

jgraham

Status: Active » Needs review

Attached patch implements the proposed solution in comment 4.

With the attached patch I can get results back via search_api_page and search_api_views. Both make additional assumptions about what an entity ID is and fail to load our items properly. Perhaps our datasource can be improved to create an etid entry that the various Search API display modules can use as a loading ID, rather than assuming that the returned ID is the entity ID.
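The etid idea might look roughly like this: each variant document carries the real entity ID in a field (called etid here, as suggested above), and the display layer collects the distinct IDs and loads them in one batch rather than once per result row. The result shape and the `load_entities` callback are hypothetical placeholders for whatever the display module actually uses.

```python
def attach_entities(results, load_entities):
    """Attach the underlying entity to each variant row using its etid.

    load_entities takes a list of entity IDs and returns {id: entity}.
    """
    results = list(results)
    # Many variant rows share one node, so deduplicate while keeping order.
    unique_ids = list(dict.fromkeys(r["etid"] for r in results))
    entities = load_entities(unique_ids)  # one batched load for all rows
    return [{**r, "entity": entities[r["etid"]]} for r in results]


calls = []


def fake_load(ids):
    # Records each call so we can check that loading happens only once.
    calls.append(ids)
    return {i: {"nid": i} for i in ids}


results = [
    {"id": "1234-0", "etid": 1234},
    {"id": "1234-1", "etid": 1234},
    {"id": "5678-0", "etid": 5678},
]
rows = attach_entities(results, fake_load)
```

Loading each distinct entity once keeps the cost proportional to the number of real nodes, not the number of variant rows.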

Also attached is a corresponding patch for search_api_page, which does the rudimentary steps from comment 4 but still needs work.

Regardless of whether this denormalized Solr approach turns out to be fruitful, the patch removing assumptions about entity_type seems generically useful.

Status: Needs review » Needs work

The last submitted patch, search_api-page-1760706-5.patch, failed testing.

jgraham

Status: Needs work » Needs review

Here is an adjusted patch (without the search_api_page patch). This one now works, with the denormalized results displaying in a Views search.

There is a section around line 276 in contrib/search_api_views/includes/query.inc that we can hopefully adjust, as it is not the most performant option; however, this is the line that lets us load our denormalized entries rather than the normal, full, normalized entity. This was tested at commit 0f213681484ad20d0eb4388195f5c8d69b644779 of the sandbox project linked in comment 4, with a Solr search backend.

A screenshot is attached showing facets working alongside denormalized results for two distinct nodes, resulting in 16 permutations.

das-peter

Replaced some other occurrences of $index->item_type and added getEntityType() to the interface definition.
Let's see if this passes the tests.

das-peter

Category: support » feature

Changing to feature request :)

drunken monkey

Status: Needs review » Needs work

Yes, I definitely think this extension makes sense! I'm always for adding more genericity, it's a pity I didn't think about this as a restriction right away …

+++ b/contrib/search_api_views/includes/query.inc
@@ -276,7 +276,11 @@ class SearchApiViewsQuery extends views_plugin_query {
       }
       else {
-        $row['entity'] = $id;
+        // @todo review this line to see if this can still defer loading or
+        // if we can selectively call loadItems(). This line here is the key
+        // for our results, but we would rather not have to call loadItems() as
+        // that is not ideal for performance reasons.
+        $row['entity'] = reset($this->index->loadItems(array($id)));

This is much too invasive for such a niche feature. Being able to display results without loading the entities was one of the key requirements for the Views integration, which I don't want to throw away. Especially since that could get in the way of other niche features (data sources which don't implement item loading).

+++ b/contrib/search_api_views/includes/query.inc
@@ -369,7 +373,7 @@ class SearchApiViewsQuery extends views_plugin_query {
-    $is_entity = (boolean) entity_get_info($this->index->item_type);
+    $is_entity = (boolean) entity_get_info($this->index->datasource()->getEntityType());

There are many occasions like this which simply don't work when the type isn't an entity. In this case, e.g., this will always return TRUE, because passing an empty parameter to entity_get_info() returns the info for all entity types, not an empty result.
In this example, you could just use (boolean) $this->index->datasource()->getEntityType() instead.
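The pitfall can be reproduced with a toy model of Drupal 7's entity_get_info(), which returns the info for every entity type when called with an empty argument. The Python functions below are illustrative stand-ins, not the real API: the first check is the pattern from the patch, the second is the suggested fix of testing the entity type itself.

```python
ENTITY_INFO = {"node": {"label": "Node"}, "user": {"label": "User"}}


def entity_get_info(entity_type=None):
    """Toy model: empty argument -> all entity info; unknown type -> None."""
    if not entity_type:
        return ENTITY_INFO
    return ENTITY_INFO.get(entity_type)


def is_entity_buggy(entity_type):
    # Mirrors (boolean) entity_get_info($entity_type): always true for ''.
    return bool(entity_get_info(entity_type))


def is_entity_fixed(entity_type):
    # Mirrors (boolean) $this->index->datasource()->getEntityType().
    return bool(entity_type)


print(is_entity_buggy(""))  # True, even though there is no entity type
print(is_entity_fixed(""))  # False
```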

Please search the patch for other code like this and always think about what happens for an index of non-entities (as well as other edge cases, if possible).

Also, I think we should add a getEntityType() method to the index class, which passes the call to the datasource controller. Just a bit shorter to write.
I'd also use NULL instead of an empty string as the return value for non-entities.

Oh, and in the entity datasource controller, you don't have to call the method, just use the property directly!

Please make these changes and I'll look at the patch in more detail.

das-peter

Status: Needs work » Needs review

@drunken monkey Thank you very much for your feedback. I've adjusted the patch accordingly.

das-peter

Found a potential issue - actually it struck me in my special setup.

heyyo

Could we use this patch with the Search API Database backend, or only with Solr?
Could you provide any guidelines on how to use it?

Status: Needs review » Needs work

The last submitted patch, search_api-1760706-13.patch, failed testing.

das-peter

Status: Needs work » Needs review

Here's a re-roll. I hope I found all the changes ;)

Status: Needs review » Needs work

The last submitted patch, search_api-1760706-16.patch, failed testing.

das-peter

Status: Needs work » Needs review

Looks like the method getEntityType() in the index class was lost - re-added.

drunken monkey

Title: Displaying each multi-value node variant » Add a flexible way for determining whether an index contains entities
Component: Miscellaneous » Framework
Issue tags: +API change

Renaming this and tagging it appropriately. (The API change is not completely backwards-compatible in that the datasource controller interface changed.)

Will test/review later.

drunken monkey

OK, the “later” admittedly turned out to be a lot later, sorry. Anyway, here's a revised version of the patch. It lacked some documentation and also used the new method in several places where the plain item type needs to be used. It also often didn't take into account that the method's return value can be empty, which would probably have led to some weird bugs.

So, could you please test this with your setup, does it work for you?

das-peter

I gave it a try, unfortunately I wasn't able to test it with "Search API Denormalized Entity Index" but I think that was caused by my odd existing setup and not the patch ;)
The only things I found were some inconsistencies in the api documentation.
I've adjusted that and here's the adjusted patch.

drunken monkey

Ah, thanks for catching that!
However, I think we should at least have someone test this successfully with a custom datasource with entities before we commit this. Otherwise that part is completely untested.

das-peter

Do you know someone with such a custom entity datasource, so I can bother them? ;)

drunken monkey

Status: Needs review » Fixed

Good question. jgraham and rwohleb don't seem interested any more. I'd have to search through the issue queue to find someone. Actually, I'd hoped you would get it running with your setup, since you're the one who wants this patch committed.
But in the end, the patch doesn't add a regression in any case, so I guess we can just commit it and wait for someone to complain if they want to use it. Or maybe you'll find some flaw, or succeed in making it work with this patch.

So, committed. Thanks for all your work, everyone!

das-peter

Thank you very much for committing it - I'll update my version asap and I'll complain when it causes an error :P

Automatically closed -- issue fixed for 2 weeks with no activity.

regal

Version: 7.x-1.x-dev » 7.x-1.16

When I create an indexed search with a repeating date field, I am unable to uncheck the "Display all values in the same row."
I'm not a developer, so I couldn't get everything above, but it seemed like whatever was resolved was committed to the current module.

Can you explain how I can index a repeating date field so that the event node appears multiple times in the view?

I'm using the current version of Search API and Views and Solr.

UPDATE: I see that this is a known limitation. I'm trying to see whether Date Repeat Entity can help with this issue.