Summary of problem:
If a node's fields are modified during hook_entity_insert(), and the node is then immediately sent to apachesolr for indexing, also during hook_entity_insert(), the entity available to hook_apachesolr_index_document_build() will contain stale field information.
Situation:
1. I have filefield_paths enabled, which updates a node's image file path on hook_entity_insert().
2. In a custom module I implement hook_apachesolr_index_document_build_node() to add my node's image uri to the solr index.
3. Also in the custom module, I implement hook_entity_insert() to send my nodes directly to be indexed by apachesolr as soon as they are inserted.
The workflow is thus:
- A new node is created, and Drupal invokes hook_entity_insert() in every module that implements it.
- filefield_paths adjusts the node's image uri on hook_entity_insert().
- My custom module sends the node to be indexed by apachesolr right away, also on hook_entity_insert().
- My custom module implements hook_apachesolr_index_document_build_node() to add the node's image uri as a solr field.
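For illustration, the last step of this workflow might look roughly like the following in a Drupal 7 module. This is only a sketch, not the reporter's actual code: the module name `mymodule`, the field name `field_image`, and the solr field name `ss_image_uri` are assumptions, and the hook signature should be checked against apachesolr.api.php in the installed version.

```php
<?php
/**
 * Implements hook_apachesolr_index_document_build_ENTITY_TYPE() for nodes.
 *
 * Hypothetical: 'mymodule', 'field_image' and 'ss_image_uri' are placeholders.
 */
function mymodule_apachesolr_index_document_build_node(ApacheSolrDocument $document, $entity, $env_id) {
  // field_get_items() reads from the entity object handed to this hook, so if
  // that object came from a stale cache, the uri read here is stale as well.
  $items = field_get_items('node', $entity, 'field_image');
  if (!empty($items[0]['uri'])) {
    $document->addField('ss_image_uri', $items[0]['uri']);
  }
}
```

The hook only sees whatever entity object apachesolr loaded, which is why the freshness of that load matters.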
Problem:
The image uri which is indexed by apachesolr is the stale uri from before filefield_paths acted on the node.
Solution:
The entity needs to be loaded with the $reset flag set to TRUE in apachesolr_index_entity_to_documents(), so that the node's fields are not stale.
Comment | File | Size | Author |
---|---|---|---|
#3 | apachesolr-stale-entity-cache-2125595-3.patch | 2.89 KB | JordanMagnuson |
#2 | apachesolr-stale-entity-cache-2125595-2.patch | 886 bytes | JordanMagnuson |
Comments
Comment #1
JordanMagnuson commented:
Here's how I'm sending the node to be indexed by solr immediately on hook_entity_insert(). Note that my module is acting after filefield_paths here; the file uri has already been updated, and is available properly if I check dpm($entity) in this function.
Comment #2
JordanMagnuson commented:
Patch attached: reset internal cache on entity_load() in apachesolr_index_entity_to_documents().
Comment #3
JordanMagnuson commented:
Seems like it might be better to make resetting the entity cache an optional parameter in apachesolr_index_entity_to_documents(), so that it doesn't always have to get reset... another patch attached.
Comment #4
JordanMagnuson commented:
Any chance of getting this patch reviewed/applied? I have been using it successfully for almost a year now, and it is a required patch for my apachesolr use case.
Also note that because of the default parameter value, it does not change anything for existing installations.
Comment #5
JordanMagnuson commented:
I think this issue can also be addressed by manually resetting the entity cache for a given entity before sending it to solr (as per this comment).
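A minimal sketch of that manual reset, assuming Drupal 7 and the apachesolr 7.x-1.x indexing helpers. The module name `mymodule` is hypothetical, and the shape of `$item` (an object with `entity_type` and `entity_id`) and the two apachesolr function signatures are assumptions that should be verified against apachesolr.index.inc in the installed version. It is not the code from the linked comment.

```php
<?php
/**
 * Implements hook_entity_insert().
 *
 * Hypothetical module name 'mymodule'; indexes new nodes immediately.
 */
function mymodule_entity_insert($entity, $type) {
  if ($type != 'node') {
    return;
  }
  list($id) = entity_extract_ids($type, $entity);

  // Drop only this node from the static entity cache, so the entity_load()
  // inside apachesolr reads the values other modules saved during insert.
  entity_get_controller($type)->resetCache(array($id));

  // Assumed apachesolr 7.x-1.x API: the indexing helpers live in
  // apachesolr.index.inc and take a queue-style item plus an environment id.
  module_load_include('inc', 'apachesolr', 'apachesolr.index');
  $env_id = apachesolr_default_environment();
  $item = (object) array('entity_type' => $type, 'entity_id' => $id);
  $documents = apachesolr_index_entity_to_documents($item, $env_id);
  if (!empty($documents)) {
    apachesolr_index_send_to_solr($env_id, $documents);
  }
}
```

Passing an id list to resetCache() clears just that entry, so other nodes already loaded in the request stay cached. This needs no patch to apachesolr, but the module still has to run after filefield_paths.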
Comment #6
amontero commented:
Warning: the patch above introduces a performance issue, because calling entity_load() with $reset=TRUE flushes the *entire* node entity cache.
The fix is to use entity_load_unchanged(), which always reads fresh, uncached data, instead of entity_load() with $reset=TRUE, as done in issue #2623562: Avoid clearing all node cache when entity_load.
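The difference between the two calls can be sketched as follows, using only Drupal 7 core functions. The helper name `mymodule_load_fresh_node()` is hypothetical and is not taken from either patch.

```php
<?php
/**
 * Hypothetical helper: load one node, bypassing the static entity cache.
 */
function mymodule_load_fresh_node($nid) {
  // entity_load('node', array($nid), array(), TRUE) would also return fresh
  // data, but $reset = TRUE empties the static cache for every loaded node.
  // entity_load_unchanged() resets the cache entry for this id only, then
  // loads the entity again from storage.
  return entity_load_unchanged('node', $nid);
}
```

entity_load_unchanged() returns the entity object, or FALSE if no entity with that id exists, so callers should check the result before building documents from it.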