Even though Solr, for instance, is able to retrieve and return the indexed fields when searching, whenever a search's caller needs the complete, unprocessed entity (or other item), it currently always has to be loaded from the database. Apart from possible performance drawbacks, this introduces or aggravates problems with stale data, the most serious of which are security problems.

In my opinion, in the module's D8 branch we should introduce a generic per-index setting for whether the entity/item should be retrieved from the server, and let service classes follow that. Apart from solving the problems mentioned above, this would also make use cases like multi-site searches easier to implement. And I think most backends should have some method for storing a serialized entity, and even if they don't, just storing it in the database would be fine, too.
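A minimal sketch of how such a per-index setting could steer item retrieval, written in Python purely for illustration (the module itself is PHP). The names `retrieve_items_from_server`, `ResultRow`, `stored_item` and `resolve_item` are hypothetical and not part of any existing Search API code:

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Index:
    # Hypothetical per-index setting; not an existing option.
    retrieve_items_from_server: bool = False


@dataclass
class ResultRow:
    item_id: str
    # Serialized item as the backend stored it at index time, if any.
    stored_item: Optional[str] = None


def resolve_item(index: Index, row: ResultRow,
                 load_from_db: Callable[[str], dict]) -> dict:
    """Return the complete item for one search result row."""
    if index.retrieve_items_from_server and row.stored_item is not None:
        # Use exactly what was indexed; no database round trip.
        return json.loads(row.stored_item)
    # Otherwise keep today's behaviour and load from the database.
    return load_from_db(row.item_id)
```

With the setting off, or when the backend returned no stored copy, the caller still gets an item, just via the database as before.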

(Actually, we could maybe, additionally or instead, just try to support revisions: keep track of which revision we've indexed and optionally load that instead of the most current one. Depending on the implementation of revisions in core, this might be enough to at least minimize the problems with stale data.)
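The revision idea could look roughly like the following, again as an illustrative Python sketch under assumed names (`IndexedRevisionTracker`, `mark_indexed`, `revision_to_load` are all hypothetical); how revisions are actually loaded would depend on core:

```python
class IndexedRevisionTracker:
    """Remembers which revision of each entity was last indexed."""

    def __init__(self):
        self._indexed = {}  # entity ID -> revision ID that was indexed

    def mark_indexed(self, entity_id, revision_id):
        self._indexed[entity_id] = revision_id

    def revision_to_load(self, entity_id, current_revision_id,
                         prefer_indexed=True):
        """Pick the revision a search result should be loaded as."""
        if prefer_indexed and entity_id in self._indexed:
            # Load what the index actually knows about.
            return self._indexed[entity_id]
        # Nothing tracked (or option disabled): use the current revision.
        return current_revision_id
```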

Comments

drunken monkey’s picture

Clarified that I'm talking about D8.

drunken monkey’s picture

Parent issue: » #2044421: [meta] Upgrade to Drupal 8

drunken monkey’s picture

A different plan here would be to provide a uniform way of storing result HTML on the search server, and to retrieve that along with the results. This would help multi-site searches even more, since they'd just have to display that HTML. (Since we already have view-mode support there, this could probably even just live in a processor, with some framework support. And we could even store the result in several different view modes, if different searches want to use different ones. But I guess that's one step further again, and I'm not sure it's worth it. At least planning for it, framework-wise, might be a good idea, though.)

And for the "complete item" storage solution we could have a feature to let backends advertise whether they support storing/retrieving the indexed item or not.
(Also, some types of items might not support serializing natively in PHP, so we might want to make the serialization a datasource method.)
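A rough sketch of how these two pieces could fit together, in Python for illustration only. The feature name `"item_storage"` and all class and method names here are assumptions, not existing module API; the point is just that the backend declares the capability while the datasource owns serialization:

```python
import json


class JsonDatasource:
    """Hypothetical datasource whose items are plain dicts."""

    def serialize_item(self, item):
        # The datasource decides how its item type is serialized.
        return json.dumps(item, sort_keys=True)

    def unserialize_item(self, data):
        return json.loads(data)


class Backend:
    # Features this backend advertises; empty by default.
    FEATURES = frozenset()

    def supports_feature(self, feature):
        return feature in self.FEATURES


class StoringBackend(Backend):
    # "item_storage" is an assumed feature name for this sketch.
    FEATURES = frozenset({"item_storage"})

    def __init__(self):
        self._store = {}  # stands in for the search server's storage

    def index_item(self, item_id, item, datasource):
        self._store[item_id] = datasource.serialize_item(item)

    def load_stored_item(self, item_id, datasource):
        return datasource.unserialize_item(self._store[item_id])
```

Calling code would check `supports_feature("item_storage")` before relying on `load_stored_item()`, and fall back to a normal item load otherwise.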

drunken monkey’s picture

Issue tags: +release target

Time to decide: do we want that or not?
Note that it's very easy for backends to implement this on their own, so I'm not sure a generic solution is really necessary (or preferable).

Renrhaf’s picture

I hit the same issue with a custom Datasource that fetches data from an API. I'm able to index data properly, but when doing searches on it, I'm always requesting data from the API again even though all data is available through the ElasticSearch indexed fields. That causes heavy performance issues and I'm trying to figure out a workaround.

Maybe I'll end up storing the fully rendered entity in the index. Is that what is advised?

drunken monkey’s picture

If it's a custom datasource, you could maybe just cache the field data from the ES response in the datasource and return that when asked to load the same items, instead of going through the API.
Otherwise, yes, using the "Rendered item" field could also work, if you can avoid the item loads (can be a bit tricky, unfortunately).
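The caching suggestion might look something like this, sketched in Python for illustration (a real datasource would be a PHP plugin). The shape of a hit (`"id"` and `"fields"` keys) and the names `CachingDatasource`, `remember_results` and `load_multiple` are assumptions for the example:

```python
class CachingDatasource:
    """Serves items from search-response field data when available."""

    def __init__(self, fetch_from_api):
        # fetch_from_api(ids) -> {id: item}; stands in for the remote API.
        self._fetch_from_api = fetch_from_api
        self._cache = {}

    def remember_results(self, hits):
        """Store the field data of each hit in the search response."""
        for hit in hits:
            self._cache[hit["id"]] = hit["fields"]

    def load_multiple(self, ids):
        items = {}
        missing = []
        for item_id in ids:
            if item_id in self._cache:
                items[item_id] = self._cache[item_id]
            else:
                missing.append(item_id)
        if missing:
            # Only items not seen in a search response hit the API.
            items.update(self._fetch_from_api(missing))
        return items
```

This only helps if the indexed fields contain everything the display needs; anything else would still require the API call.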