The Apache Solr module's default behavior is to add a node to the search index when the revision being saved is published, and to remove it from the index when it is not. That doesn't quite match the model used by Workbench Moderation, in particular when Workbench Moderation needs to re-save the node in its shutdown function to get the final version correct.

There are probably several ways to trigger this, but the one I was able to reproduce is: start with a published node, create a new draft for it, then edit the draft and save it again. The node disappears from the Solr search index, even though it still has a published revision that should be shown.

Here is a patch that seems to fix things. Note: to work, it relies on the Apache Solr patch posted at #1565766: "Pass the full entity to Apache Solr status callbacks so modules which manage node revisions can use it".

Comments

David_Rothstein’s picture

Status: Active » Needs review

Here is the patch (1.98 KB).

rylowry@gmail.com’s picture

This patch looks like it's working for me, in conjunction with #8 from the corresponding Apache Solr issue.

Josh Waihi’s picture

Status: Needs review » Needs work

Thanks for the patch! After applying it, Solr (with apachesolr module 1.3 or later) began indexing Workbench-moderated content again, which is great. However, the problem still occurs when a node is reverted: the document is deleted from the index and the reverted revision is not re-indexed.

acbramley’s picture

Status: Needs work » Reviewed & tested by the community

Reverting works fine for me. Reverting does not automatically publish the revision; it simply creates a new draft revision that is a copy of the revision you reverted to. After publishing this new revision, it works. This patch fixed all other Workbench functionality too, including the issue described in the summary. You do need the 2 patches from https://drupal.org/node/1565766#comment-7527059 for apachesolr, though.

alexkb’s picture

Just an FYI for others: apachesolr 7.x-1.3 already includes a patch from #2017593 ("apachesolr_index_node_status_callback() bug when publishing / unpublishing content and using entitycache"), but the two other patches that acbramley points out are not part of 1.3 and need to be applied as well.

damontgomery’s picture

With the 2 patches mentioned here (#1 for Workbench Moderation, and #12 from #1565766: "Pass the full entity to Apache Solr status callbacks so modules which manage node revisions can use it"), things seem to work as expected.

I believe the new apachesolr module will be released soon and this can make it into the next Workbench Moderation release.

Dave Reid’s picture

Status: Reviewed & tested by the community » Postponed (maintainer needs more info)

This seems like it would be problematic for anything that has some kind of custom node access logic. Why isn't apachesolr checking node_access('view', $node, drupal_anonymous_user()) by default instead? You have a full node object anyway, and it would make it compatible with more modules.

David_Rothstein’s picture

Status: Postponed (maintainer needs more info) » Needs review

If the Apache Solr module were to check node_access() when indexing, that would prevent private content from appearing in search results at all. Instead, I believe sites which are using node access are supposed to turn on the Apache Solr Access sub-module, which filters the search results on display so that the user who is searching only sees results that they personally have access to.

Since the Apache Solr module itself just uses the status callback to filter out unpublished content (I guess the idea is that no one should see unpublished content in search), this patch is trying to do the same, but modified for how Workbench Moderation handles published vs. unpublished.
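For readers who have not opened the patch, the idea in the paragraph above can be sketched roughly as follows. This is not the attached patch. The module name `example` and the callback `example_node_status_callback()` are hypothetical, the third `$entity` argument assumes the Apache Solr patch from #1565766 is applied, and reading `$entity->workbench_moderation['published']` is an assumption about how Workbench Moderation exposes the live revision on a loaded node.

```php
<?php

/**
 * Implements hook_apachesolr_entity_info_alter() for a hypothetical
 * module named "example".
 *
 * Swaps the default node status callback for a moderation-aware one.
 */
function example_apachesolr_entity_info_alter(array &$entity_info) {
  if (empty($entity_info['node']['status callback'])) {
    return;
  }
  foreach ($entity_info['node']['status callback'] as $key => $callback) {
    if ($callback === 'apachesolr_index_node_status_callback') {
      $entity_info['node']['status callback'][$key] = 'example_node_status_callback';
    }
  }
}

/**
 * Hypothetical status callback: TRUE means "keep this node in the index".
 *
 * Assumption: $entity is only passed when the patch from #1565766 is applied.
 */
function example_node_status_callback($entity_id, $entity_type, $entity = NULL) {
  if (!is_object($entity)) {
    // Without the full entity there is nothing to inspect.
    return FALSE;
  }
  if (isset($entity->workbench_moderation)) {
    // Assumption: a non-empty 'published' entry means a live revision exists,
    // even when the revision being saved is an unpublished draft.
    return !empty($entity->workbench_moderation['published']);
  }
  // Unmoderated content: fall back to the plain published flag.
  return !empty($entity->status);
}
```

With a rule like this, saving a draft on top of a published node would keep the node in the index, because the decision no longer depends only on the status of the revision being saved.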

Dave Reid’s picture

I'm still not sure why we can trust $node->status, but based on the comments, it sounds like apachesolr is performing this action in a node_save() call via hooks. Wouldn't it be safer to use drupal_register_shutdown_function() from the hook to delay the action until the end of the request when you can do a node_load() after workbench_moderation has made the changes to the node object and $node->status accurately reflects what you need?
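A minimal sketch of that suggestion, assuming a hypothetical module named `example` running inside a Drupal 7 bootstrap. `example_sync_search_index()` and `example_update_index_for()` are made-up names, and the latter is only a placeholder for whatever index update apachesolr would perform. Shutdown functions run in registration order, so this only helps if it is registered after Workbench Moderation's own shutdown function, which is not guaranteed here.

```php
<?php

/**
 * Implements hook_node_update() for a hypothetical module named "example".
 *
 * Defers the index decision instead of trusting $node->status mid-save.
 */
function example_node_update($node) {
  drupal_register_shutdown_function('example_sync_search_index', $node->nid);
}

/**
 * Hypothetical shutdown callback: reload the node and decide from its
 * final state.
 */
function example_sync_search_index($nid) {
  // The third argument resets the static cache so the reloaded node reflects
  // any re-save done earlier in the shutdown phase.
  $node = node_load($nid, NULL, TRUE);
  if ($node) {
    example_update_index_for($node, !empty($node->status));
  }
}

/**
 * Placeholder (hypothetical): queue the node for indexing or removal.
 */
function example_update_index_for($node, $should_index) {
  // Intentionally empty in this sketch.
}
```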

rv0’s picture

I just stumbled across this issue.
I remember that in an older project we had a dirty fix that ran apachesolr_cron() from hook_workbench_moderation_transition(), which seemed to do the trick at the time.
Perhaps that hook is a better place to implement an appropriate fix than overriding callbacks?

ykhadilkar’s picture

If you don't want to apply the patch mentioned in comment #2, there is a module, ApacheSolr Workbench Moderation, which seems to accomplish the same thing by implementing the same hooks.

csimmons44’s picture

Issue summary: View changes

I have a similar issue using Search API, and I am wondering whether this patch would work for Search API as well as for Apache Solr search.

cristiroma’s picture

@csimmons44 Have you found a solution for the Search API? We have the same issue. Thanks!

acbramley’s picture

This patch won't work for Search API, as the hooks are specific to the apachesolr module.

I've just been looking into whether search_api has any similar hooks available, and it looks like it does. In theory:

* search_api.module implements hook_search_api_alter_callback_info(), which defines a "filter" applied to nodes at index time, called search_api_alter_node_status, which "Excludes unpublished nodes".
* There's a hook called hook_search_api_alter_callback_info_alter() where you can alter these alter callbacks.
* We could implement this hook in workbench_moderation and swap the class for that callback from SearchApiAlterNodeStatus to a new WorkbenchModerationSearchApiAlterNodeStatus class, then implement code similar to the patch to ensure nodes are indexed correctly.
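The class swap in the last step could look roughly like the sketch below. It only shows the alter hook. The module prefix `example` is hypothetical, and WorkbenchModerationSearchApiAlterNodeStatus does not exist yet: it would still have to be written (presumably extending SearchApiAlterNodeStatus) and listed in the module's .info file so Drupal can autoload it.

```php
<?php

/**
 * Implements hook_search_api_alter_callback_info_alter() for a hypothetical
 * module named "example".
 *
 * Points the "node status" data alteration at a moderation-aware class.
 */
function example_search_api_alter_callback_info_alter(array &$callbacks) {
  if (isset($callbacks['search_api_alter_node_status'])) {
    // Assumption: this class would be provided by the module doing the swap.
    $callbacks['search_api_alter_node_status']['class'] = 'WorkbenchModerationSearchApiAlterNodeStatus';
  }
}
```

Leaving the callback ID unchanged would mean existing index configurations that already enable the "node status" alteration pick up the new class without being reconfigured.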

I haven't had much experience with Search API, so this may be all wrong, but I'm going to be implementing this soon and will hopefully get a patch up.

acbramley’s picture

I've just tested workbench_moderation 1.4 with search_api 1.15 and search_api_solr 1.8, and everything seems to be working as expected. There are some issues with fields updating at different times if you don't have "Retrieve result data from Solr" ticked in your Search API server config, though.

mitchalbert’s picture

Hey,

I just came across this issue and I have a similar problem (or even the same problem).

Problem:

Node 1 version 1 has the word 'test'
Node 1 version 2 (the published one on the site): 'test' was removed from the node

When searching for 'test', it returns node 1, which shouldn't be returned.

Am I in the right issue for my problem? :)

*Edit: this isn't a problem anymore and isn't related to this issue.

pbattino’s picture

Status: Needs review » Reviewed & tested by the community

This patch works for me and I think it should be merged. True, it only works with one of the 2 ways you can use Solr, but I don't think it is wise to keep this waiting. Either we fork this issue into one regarding only Search API + Solr + Workbench Moderation, or we keep it open and move the code into another module like https://drupal.org/project/apachesolr_workbench_moderation. BTW, I tried that module and its internal logic seems completely flawed to me. There's a patch to that module that seems to work, but I think it's flawed as well (it gives false positives on published status).

A third alternative is to turn the patch into a replacement of apachesolr_workbench_moderation.

preksha’s picture

I am also facing this problem with the Search API module. For a content type with Workbench Moderation enabled, I can see that the nodes are indexed in the Solr Admin query, but they do not appear on the search results page. I have used Search API Views to display the search results.

Has anyone faced this issue with Search API + Workbench Moderation?

Any input will be appreciated!

DamienMcKenna’s picture

Another issue should be opened for Search API and it should refer back to this one.

preksha’s picture

I thought this issue was with Workbench Moderation, but it is not caused by Workbench Moderation. I used Search API Views to render the search results, and in that view one checkbox (boolean) field was failing to render the correct result.

Otherwise, Search API (7.x-1.14) + Workbench Moderation (7.x-1.4) + Search API Exclude (7.x-1.x-dev) is working perfectly for me.

Charles Belov’s picture

@preksha, this issue is about Apache Solr and Workbench Moderation. Please add your own separate issue to discuss Search API and Workbench Moderation. You are welcome to relate that issue to this one, but discussing Search API here will confuse the two separate issues.