Hi!
I searched but did not find what I think I need. In case I missed something, I am posting here to get some advice.

[Context]
I'm in a scientific community that uses a module called Tripal, which basically brings data from another database (another PostgreSQL schema, to be more precise) into Drupal. The current Tripal release (v2) associates/syncs data from the other database with Drupal nodes. The new release (v3) will work with the Entity API. While waiting for the new release, we can't index the content of the other database, only the associated nodes. For example, we deal with biological data like genes. A gene is stored in a database table and several other properties of that gene are stored in other (linked) tables. The node associated with the gene only has the gene name and the gene key identifier, which is used to retrieve it in the other database.

With the current Search API we can't index the data related to the gene. v3 of Tripal will solve that issue, but while waiting (and it can take a while! :) ) we need a temporary solution. On my site, I designed a module that just implements hook_entity_property_info_alter() and adds the fields I need to index to the "gene" nodes. It can't be used by others, though, since their needs are not mine: they often need to index other fields (and they might store their data in a different way than I do). We need a more generic solution.

[Proposal]
After thinking a little, I came up with a more generic idea that could also interest a _larger community_ than ours. With the Tripal v2 module we can create views (with relationships/joins) that can aggregate the data of a gene from the other database. So my idea is:
what about indexing the views result?

What I would like to do is create a module allowing you to create a new (virtual) entity type and associate this entity type with a view that would take as argument an identifier. Each field of the view would be an entity field. Then, the Search API would be able to index the content of those virtual entities.

This approach would allow indexing _any_ view results, regardless of whether the view result fields are generated from a real database field or from something else ("views PHP", views on things other than databases,...).

[Approach]
I plan to create a simple module that has an entity type "virtual view entity" (VVE). Each instance of a VVE would associate an entity name with a view. The user interface would allow you to select any view that has at least one argument. That argument would be the entity's unique identifier, and the same field must also appear among the view fields. If the argument is left empty, the view returns all the entities.
Once a new VVE instance is created, a new associated entity type would also be created (hook_entity_property_info). Whenever Drupal needs to know the content of such an entity, my module would call the associated view with the appropriate argument in order to return the requested values.

For instance:
I have a gene in my other database called "eye color" with the properties "chromosome location", "alleles", "gene length", "publications",...
I create a view "gene_details" that would return those properties with their values and which supports the "gene_id" argument.
I create a new VVE that I will call "gene data" associated with that view.
Now Drupal knows a new entity type called "gene data" (which is read only and not stored in a table).
The Search API can now index this new "gene data" entity type and my problem is solved.
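
To make the idea more concrete, here is a minimal, language-neutral sketch of the mapping, written in Python rather than as Drupal code so that it stays self-contained. Everything in it is an assumption made for illustration: the class name VirtualViewEntityType, the gene_details() stand-in for the view, and the sample property values are all made up, and nothing here is Tripal, Views or Search API code.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
# A "view" is modelled as a callable: given an identifier it returns the
# matching rows, and given None (empty argument) it returns all rows.
View = Callable[[Optional[int]], List[Row]]


class VirtualViewEntityType:
    """Read-only entity type whose instances are the rows of a view."""

    def __init__(self, name: str, view: View, id_field: str) -> None:
        self.name = name
        self.view = view
        self.id_field = id_field  # must also be one of the view's fields

    def load(self, entity_id: int) -> Optional[Row]:
        """Call the view with the identifier and return the first row, if any."""
        rows = self.view(entity_id)
        return rows[0] if rows else None

    def all_ids(self) -> List[object]:
        """Call the view with an empty argument to enumerate every entity."""
        return [row[self.id_field] for row in self.view(None)]


# Hypothetical stand-in for the "gene_details" view (made-up values).
_GENES: List[Row] = [
    {"gene_id": 1, "name": "eye color", "gene_length": 1000},
    {"gene_id": 2, "name": "hair color", "gene_length": 2000},
]


def gene_details(gene_id: Optional[int] = None) -> List[Row]:
    return [dict(g) for g in _GENES if gene_id is None or g["gene_id"] == gene_id]


gene_data = VirtualViewEntityType("gene data", gene_details, "gene_id")
```

An indexer would then only need all_ids() to enumerate the items and load() to fetch the fields of each one; nothing is ever written back, which matches the read-only nature of the proposed entity type.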

[Questions/advice request]
Before I spend time developing such a module:
1) do you think it's a good approach? Did I miss something?
2) have you got any other suggestions?

Thanks in advance for your time. I know it's a lot to read and understand.

Comments

drunken monkey

First off, the Searchlight project back in 2010 (shortly before I started work on the Search API) had a similar approach, so you might find useful bits and pieces there.

Second, if you only want the data to be usable with the Search API, then instead of creating a new entity type I would suggest writing a new SearchApiDatasourceController plugin that works this way and then using hook_search_api_item_type_info() to define a new item type for each such view. The rest would work similarly to your proposal, but since you wouldn't be dealing with the Search API through an extra layer of abstraction (i.e., entities), you can implement the functionality directly as the Search API needs it. I think that should make some things easier.

In any case, the largest problem I see with this is detecting when there are new items to index. You need to invoke hook_entity_insert() (if it's an entity type) or call search_api_track_item_insert() (if it's a datasource) each time there is a new gene, and similarly for updated or deleted entries. This won't really work through the Views abstraction: you'd either have to implement that part of the functionality by circumventing Views, or you'd have to periodically fetch all results and see which ones are new or have been removed (and mark all existing ones as changed, if they can change, so they will be indexed again).
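
The periodic check could look roughly like the sketch below. It is written in Python only to keep it self-contained and is not Search API code; the names fingerprint() and diff_snapshot() are made up for the illustration. Note that it goes one step beyond marking every existing item as changed: it assumes you can afford to store a hash per row between runs, so that only rows whose content actually differs are reported as changed.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Mapping, Tuple

# Maps each item ID to a hash of its row as seen on the previous run.
Snapshot = Dict[int, str]


def fingerprint(row: Mapping[str, object]) -> str:
    """Stable hash of a row; keys are sorted so field order does not matter."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_snapshot(
    previous: Snapshot,
    rows: Iterable[Mapping[str, object]],
    id_field: str,
) -> Tuple[List[int], List[int], List[int], Snapshot]:
    """Compare the full view result with the previous snapshot.

    Returns (inserted_ids, changed_ids, deleted_ids, new_snapshot).
    """
    current: Snapshot = {row[id_field]: fingerprint(row) for row in rows}
    inserted = sorted(set(current) - set(previous))
    deleted = sorted(set(previous) - set(current))
    changed = sorted(
        item_id
        for item_id, digest in current.items()
        if item_id in previous and previous[item_id] != digest
    )
    return inserted, changed, deleted, current
```

On each cron run, the three ID lists would be handed to search_api_track_item_insert() and its counterparts for changed and deleted items, and the new snapshot stored for the next run. If storing hashes is not an option, dropping the fingerprint comparison and reporting every surviving ID as changed gives the simpler variant described above.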

I hope this helps!
However, normally, please create a "Support request" in the issue queue for questions if you want feedback from the maintainer. I, at least, never look into the forums.

guignonv

Thanks for your time and your answer, I really appreciate it. I will investigate the datasource approach. It seems to fit better. :-)

About the "issue queue": I thought about it but believed that this post was not dealing with Search API code directly. Since I was thinking of a new module, I didn't want it to appear in the Search API issue queue, where only the maintainer would have a look. But I also thought you might not have noticed this post (that's why I notified you through your contact form). We don't all have the same logic and nothing is obvious in fact... ;-)

Valentin Guignon, Bioversity International