If using workbench moderate the status of the node is zero until workbench_moderation_store is ran which happens during drupal_register_shutdown_function and still the change is not available in the db until after the node_save is complete and since apachesolr does a node_load the see if the node is published during apachesolr_index_node_status it is '0' during a node_update.

To remedy this issue I added an additional parameter to apachesolr_index_node_status to pass the entity that is handed over to apachesolr_entity_update and in apachesolr_index_node_status_callback I am checking to see if the entity object is available and if so use it instead of doing the node_load, since the entity at that point would contain the correct status state.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

grndlvl’s picture

Title: Nodes are not requeued for index if using workbench_moderate » Nodes are not requeued after update for index if using workbench_moderate
Nick_vh’s picture

This is weird,

I am not sure I understand why we need the complete entity object + the type and id? How does workbench handle this differently? If it has been loaded, it should be in the cache right?

pwolanin’s picture

This sounds like a bug in workbench_moderation - it's trying to be clever and not using the Drupal API normally

grndlvl’s picture

I don't care for the shutdown too much either, but I think it is currently a necessary evil.

As noted in comment #5 on #1452016: Incompatibilities with Migrate module.

The shutdown function is crufty. It's necessary to re-save the published version of a node after saving an unpublished "forward revision." All of the modules in this space, ERS, State Machine, etc use something like this (hook_exit also works) to get around the fact that core always puts the most recently saved thing in the node table.
I'll be bringing this up at my core conversation in Denver: http://denver2012.drupal.org/content/i-just-want-edit-node

Nick_vh’s picture

Status: Needs review » Closed (won't fix)

Don't think we can fix this in the module. Please move it over to the workbench module and see if they can fix it using regular drupal core apis.

If necessary, I recommend you to create a custom module that hooks into both and creates the compatibility?

David_Rothstein’s picture

Title: Nodes are not requeued after update for index if using workbench_moderate » Pass the full entity to Apache Solr status callbacks so modules which manage node revisions can use it
Category: bug » task
Status: Closed (won't fix) » Needs review
FileSize
1.89 KB

It actually turns out there are some scenarios where the above patch doesn't quite work correctly.

I'm going to post a patch for Workbench Moderation; however, I don't think it can be entirely fixed there because Apache Solr simply isn't passing enough information to the status callback for the module to be able to do its job. (By only passing the entity ID, there is no way you can get information about the particular revision that is being saved, only the current revision.)

So here's a reroll of the above patch that simply passes the entity along (as before), but doesn't try to do anything with it in the Apache Solr module itself. The rest can be handled in Workbench Moderation.

David_Rothstein’s picture

David_Rothstein’s picture

Rerolled to apply to the latest 7.x-1.x code.

pwolanin’s picture

This looks pretty harmless, though I'm still suspicious of what workbench is doing

rylowry@gmail.com’s picture

The patch in #8 looks like its working for me.

pwolanin’s picture

Version: 7.x-1.x-dev » 6.x-3.x-dev
Status: Needs review » Patch (to be ported)

committed to 7.x. Needs backport.

j0rd’s picture

I had another bug related to this code in older version of solr. After I pulled -dev I noticed this new code is here.

It's unfortunately not optimal and I believe it has bugs.

FROM MY PREVIOUS CLOSED ISSUE #2017593: apachesolr_index_node_status_callback() bug when publishing / unpublishing content and using entitycache.
---

QUOTE

I've found this to be a bug within entitycache where passing NULL to resetCache doesn't actually cause cache to get reset.
#1333542: Possible issue with EntityCacheControllerHelper::resetEntityCache()

It doesn't appear the node_load($entity_id, NULL, TRUE) code is longer in -dev. So I suppose this doesn't matter.

I did notice the code in -dev might still be effected by this, since you're doing an entity_load() with out a cache reset in hook_entity_update, which will cause you to get the "old cached entry" in the DB, since it's not cleared till later I believe in the saving process.

There's simply a small bug causing this issue though, since you are passing the updated $entity into the function (but then not using it for some reason)

Here's the fix for this issue I believe.

I've also attached a "dont load code" patch, which doesn't load some code, if it's not required.

---

So here are the patched I made before I found this issue.

At best, my code fixes a caching bug with entity_load() (which i'm not 100% sure exists), at worst, it avoids an un-needed entity_load().

You can't call entity_load() in hook_entity_update() as you'll get the node, which has not yet been saved. Previously required a node_load($nid, NULL, TRUE) or entity_cache_controller('node')->resetCache(array($nid)); to be able to node_load() the valid node I believe. I could be wrong, but this was the bug I ran into today.

j0rd’s picture

Version: 6.x-3.x-dev » 7.x-1.x-dev
Status: Patch (to be ported) » Needs review

Changing status.

pwolanin’s picture

I'm a little unclear if the bug you see is a consequence of what was committed, or is a separate bug?

j0rd’s picture

In this old patch you'll notice the node_load with reset = TRUE.

https://drupal.org/files/apachesolr-allow_updates_to_use_pass_entity.patch

Some other patch, probably replaced node_load() with entity_load() (and correctly so), but it seems to also have removed the $reset = TRUE, for some unknown reasons to myself.

Changing to entity_load is correct, but not resetting the cache during the load causes a bug, as calling node_load or entity_node within a hook_update loads the old "un-saved" node.

Previously calling node_load with the reset was required to get the "hook_entity_update" node, as only entity_id was the argument being passed in. Now that we're passing in the entity, the entity_load is not longer required and will avoid this caching issue all together.

This is what my patch does, and I believe it's the most correct.

So while I don't think the bug was introduced by any of these patches, the logic in this patch is incorrect and will causing problems when applied to current -dev. My patch is the most correct version of this patch, as it removed an un-required entity_load (and you end up avoiding a caching bug as a bonus).

pwolanin’s picture

We were previously calling reset to avoid memory issues, so perhaps there is more of a problem here.

j0rd’s picture

apachesolr_index_node_status_callback() is called via two endpoint hooks.

One in apachesolr_entity_update()

In this case you need to call reset, or used the passed in entity as there's a DrupalEntityController "quirk" which loads the old "before-saved" entity from cache instead of the changed/updated $entity which is passed into hook_entity_update()

See this comment:
https://api.drupal.org/comment/25648#comment-25648

or the comments in here:
https://api.drupal.org/api/drupal/modules!node!node.api.php/function/hoo...

This is previously why reset was called. It's currently not being called, but since we're now passing in that updated entity in apachesolr_entity_update we don't need to call entity_load(). We just need to make sure we use the passed in entity if it's available. Which is what my patch does.

The other places we call apachesolr_index_node_status_callback() are in apachesolr_cron()
via apachesolr_index_get_entities_to_index() and apachesolr_index_entities_document()

Since these are not called in apachesolr_entity_update() we do not need the reset. Things should work as expected.

So does that make sense?

I found this issue due to a bug in EntityCache module, and upon reviewing the ApacheSolr latest dev code + this patch, I realized that it would suffer from this bug.

acbramley’s picture

I came here from #1836680: Nodes controlled by Workbench Moderation are sometimes added or removed from the Apache Solr index at the wrong times and applying the 2 patches in #12 stopped nodes being removed from the index when a draft version was saved twice using workbench_moderation

ianthomas_uk’s picture

I've also run into this issue. In my case I have a parent entity (a node) that has an associated set of child entities (a custom entity type) and when the node gets published/unpublished I have custom code to call apachesolr_index_node_status_callback(), but it's failing because entity_load returns the older version of the entity.

2017593-apache_solr_node_status_callback-3.patch looks correct to me and is very similar to what I was about to write. j0rd: I'm a bit confused by your comments, are you saying that this patch would cause problems if committed?

j0rd’s picture

@ianmthomasuk, my patch avoids the issue as it doesn't do an entity_load when the entity is passed in. The entity is passed in, when the callback is called via hook_entity_update, so my patch avoids the issue entirely.

This previous committed patch (apachesolr-allow_updates_to_use_pass_entity-1565766-8.patch), still has the issue of entity_load in hook_entity_update(), since it calls entity_load with out resetting the cache. Since it's already committed, you need to also apply my patch to fix the problem.

ianthomas_uk’s picture

Status: Needs review » Reviewed & tested by the community

OK, that makes sense. I'm using 7.x-1.3 which includes the earlier patch, so I was only really looking at #12 onwards.

In that case, RTBC for 2017593-apache_solr_node_status_callback-3.patch
(by the way, the second number in the patch name is conventionally the comment number that it will be attached to, so should have been 12 rather than 3)

2017593-dontloadcode-3.patch might be better in it's own issue.

j0rd’s picture

Thanks. I actually made this patch in another issue, but it was relevant to this issue, so I closed my old one out and added it here. Sorry for the confusion. That bug turned out to be a problem with entity_cache module.

#2017593: apachesolr_index_node_status_callback() bug when publishing / unpublishing content and using entitycache.

pwolanin’s picture

Version: 7.x-1.x-dev » 6.x-3.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)
FileSize
1.57 KB

committed 2017593-apache_solr_node_status_callback-3_0.patch after moving the code comment down and fixing code style (see attached).

back to needing to be ported

damontgomery’s picture

To be clear, #23 is going to be part of 7.x-1.4 (which I can't seem to download right now, but maybe I'm trying too quickly)?

Thanks.

pwolanin’s picture

pandaeskimo - I haven't actually tagged & released it yet. Check out the latest 7.x-1.x and let me know if there are any critical bugs...

damontgomery’s picture

For some reason #23 is giving me this error when I try to apply it to 7.x-1.x-dev or 7.x-1.3 :

error: patch failed: docroot/sites/all/modules/apachesolr/CHANGELOG.txt:1
error: docroot/sites/all/modules/apachesolr/CHANGELOG.txt: patch does not apply

Applying #12 works fine. I'm going to test it.

Sorry I can't really help you with other testing unless I get more direction. I have really just started diving into these search modules and I'm not clear on the entire system yet.

pwolanin’s picture

check out 7.x-1.x from git. Or in any case, it doesn't matter if the CHANGELOG part doesn't apply

damontgomery’s picture

Well, here is #23 without the changelog bit in case someone wants to use that without modifying the file.

acbramley’s picture

j0rd’s picture

The don't load code patch could probably get committed, as it just avoids loading code which doesn't need to get loaded. It's really my fault for adding two patches to the same issue queue.

damontgomery’s picture

I believe this is now fixed in version 1.4.