Summary of problem:
If a node's fields are modified during hook_entity_insert, and then the node is immediately sent to be indexed by apachesolr, also on hook_entity_insert, then the entity available to hook_apachesolr_index_document_build() will have stale field information.

Situation:

1. I have filefield_paths enabled, which updates a node's image file path on hook_entity_insert().

2. In a custom module I implement hook_apachesolr_index_document_build_node() to add my node's image uri to the solr index.

3. Also in the custom module, I implement hook_entity_insert() to send my nodes directly to be indexed by apachesolr as soon as they are inserted.

The workflow is thus:

  1. A new node is created, and Drupal invokes hook_entity_insert() in every module that implements it.
  2. filefield_paths adjusts the node's image uri on hook_entity_insert().
  3. My custom module sends the node to be indexed by apachesolr right away, also on hook_entity_insert().
  4. My custom module implements hook_apachesolr_index_document_build_node() to add the node's image uri as a solr field.

Problem:
The image uri which is indexed by apachesolr is the stale uri from before filefield_paths acted on the node.
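
For reference, the build hook from step 4 might look roughly like the sketch below. The field name field_image and the solr field name ss_image_uri are assumptions made for illustration; they are not taken from the actual module. The $entity passed in here is the one that apachesolr loaded itself, which is why it can carry the stale uri.

```php
<?php
/**
 * Implements hook_apachesolr_index_document_build_node().
 *
 * Sketch only: 'field_image' and 'ss_image_uri' are assumed names.
 */
function custom_helper_apachesolr_index_document_build_node(ApacheSolrDocument $document, $entity, $env_id) {
  // Field values are keyed by language code, then by delta.
  if (!empty($entity->field_image[LANGUAGE_NONE][0]['uri'])) {
    // Copy the (possibly stale) image uri into a single-valued string field.
    $document->addField('ss_image_uri', $entity->field_image[LANGUAGE_NONE][0]['uri']);
  }
}
```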

Solution:
The entity needs to be loaded with the $reset flag set to TRUE in apachesolr_index_entity_to_documents(), so that the node's fields are not stale.
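
To illustrate what loading with the $reset flag means, here is a minimal sketch. It is not the attached patch, and custom_helper_load_entity() is a hypothetical helper name. It relies only on the fourth parameter of Drupal 7's entity_load().

```php
<?php
/**
 * Hypothetical helper: loads one entity, optionally bypassing the static cache.
 */
function custom_helper_load_entity($type, $id, $reset = FALSE) {
  // The fourth argument of entity_load() is $reset. When TRUE, the static
  // cache for the whole entity type is cleared before loading.
  $entities = entity_load($type, array($id), array(), $reset);
  // entity_load() returns an array keyed by ID; return the single entity.
  return $entities ? reset($entities) : FALSE;
}
```

Calling custom_helper_load_entity('node', $nid, TRUE) inside the insert workflow should return the node with the uri that filefield_paths wrote, rather than the cached copy.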


Comments

JordanMagnuson’s picture

Here's how I'm sending the node to be indexed by solr immediately on hook_entity_insert(). Note that my module is acting after filefield_paths here, and the file uri has already been updated, and is available properly if I check dpm($entity) in this function.

/**
 * Implements hook_entity_insert().
 */
function custom_helper_entity_insert($entity, $type) {
  if ($type == 'node') {
    $env_id = apachesolr_default_environment();
    $bundles_to_index = apachesolr_get_index_bundles($env_id, $type);
  
    // Check if entity should be indexed
    $info = entity_get_info($type);
    if (!empty($info['entity keys']['bundle'])) {
      $bundle = $entity->{$info['entity keys']['bundle']};
    }
    else {
      $bundle = $type;
    }
  
    if (!in_array($bundle, $bundles_to_index)) {
      return;
    }
  
    // Entity should be indexed, so send it to solr
    $ids = entity_extract_ids($type, $entity);
    $id = $ids[0];
  
    // If this entity is unpublished, remove from index
    if ($entity->status != 1) {
      apachesolr_remove_entity($env_id, $type, $id);
      return;
    }
  
    $item = new stdClass();
    $item->entity_type = $type;
    $item->entity_id = $id;
  
    module_load_include('inc', 'apachesolr', 'apachesolr.index');
    $doc = apachesolr_index_entity_to_documents($item, $env_id);
    apachesolr_index_send_to_solr($env_id, $doc);
  }
}

JordanMagnuson’s picture

Status: Active » Needs review

Patch attached: reset internal cache on entity_load() in apachesolr_index_entity_to_documents().

JordanMagnuson’s picture

Seems like it might be better to make resetting the entity cache an optional parameter in apachesolr_index_entity_to_documents(), so that it doesn't always have to get reset... another patch attached.

JordanMagnuson’s picture

Any chance of getting this patch reviewed/applied? I have been using it successfully for almost a year now, and it is a required patch for my apachesolr use case.

Also note that because of the default parameter value, it does not change anything for existing installations.

JordanMagnuson’s picture

I think this issue can also be addressed by manually resetting the entity cache for a given entity before sending it to solr (as per this comment).

entity_get_controller($type)->resetCache(array($id));
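
Put together with the indexing calls from my earlier comment, the workaround could look something like this. The function name custom_helper_index_fresh() is made up for the example, and I have not verified this exact form on a live site.

```php
<?php
/**
 * Hypothetical helper: clears one entity from the static cache, then indexes it.
 */
function custom_helper_index_fresh($env_id, $type, $id) {
  // Drop only this entity from the static cache, so the load performed
  // inside apachesolr_index_entity_to_documents() reads fresh field data.
  entity_get_controller($type)->resetCache(array($id));

  // Build the minimal item object that the indexing function expects.
  $item = new stdClass();
  $item->entity_type = $type;
  $item->entity_id = $id;

  module_load_include('inc', 'apachesolr', 'apachesolr.index');
  $docs = apachesolr_index_entity_to_documents($item, $env_id);
  apachesolr_index_send_to_solr($env_id, $docs);
}
```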

amontero’s picture

Status: Needs review » Needs work
Related issues: +#2623562: Avoid clearing all node cache when entity_load

Warning: the patch above introduces a performance issue. Calling entity_load() with $reset = TRUE flushes the *entire* node entity cache, not just the entry for the node being indexed.
The fix is to use entity_load_unchanged() instead of entity_load() with $reset = TRUE. It always reads fresh, uncached data for the requested entity, as done in issue #2623562: Avoid clearing all node cache when entity_load.
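
As a rough illustration of the difference (a sketch, not the code from #2623562; custom_helper_load_fresh() is a hypothetical name):

```php
<?php
/**
 * Hypothetical helper: returns an uncached copy of a single entity.
 */
function custom_helper_load_fresh($type, $id) {
  // entity_load_unchanged() resets the static cache for this one ID only
  // and reloads the entity, leaving other cached entities of the same
  // type in place. It returns the entity object, or FALSE if not found.
  return entity_load_unchanged($type, $id);
}
```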