Problem/Motivation

In some cases, nodes published from moderation lose their associated files.

Steps to reproduce

Quoted from comment #6:

if I attach files and publish via the "moderate" tab, then all is OK (ie files don't seem to go missing). If I publish directly after saving the draft by selecting the "publish" action from the drop down list on that page, then files start to go missing. By example:
1) create draft page and attach file A - > save, hit moderate tab and then publish from there (file present)
2) new draft page and attach a second file (now have File A and B) -> save, hit moderate and publish from there (both files present)
3) new draft page and attach a third file (now have File's A, B and C) -> save, hit moderate and publish from there (all three files present)

All is OK until this point. Now we try and publish our next draft by saving and selecting publish directly from the drop down on the draft page (ie rather than going via the moderate tab):
4) new draft page and attach fourth file (now A, B, C & D) -> save and publish from the drop down presented on the draft page directly (files now start dropping from the page).

A less than ideal fix was committed at #35: http://cgit.drupalcode.org/workbench_moderation/commit/?id=a66a37e

comment #42 lays out what is going on:

You want to save a draft with a new file and you submit the form
node_save gets called and a new revision gets saved and is now the new current version of the node (this is super dump and the main issue here anyway)
to "pull back" the revision which is actually just a draft node_save gets called again with the former "stable" version of the node.
$node->original = entity_load_unchanged('node', $node->nid); will pull up the "draft" as its the current "head" revision
file_field_update() will compare the unchanged "stable" version against the "draft" (it should compare it against the "stable" version)
With this screwed up perspective it looks as if the file (that was actually added to the draft) is missing in the "stable" version.
Therefore file_field_delete_file() gets called, the file_usage counter gets reduced, most likely to 0 and the file is dropped
The unchanged "stable" version gets saved and is now the most current version again. The "draft" ends up with a reference to the uploaded file, which no longer exists as it got deleted.

An additional related problem was reported in #50:

I'm using the file_entity and media modules and came across the issue where removing a file from a field was not removing the usage.

I set up a vanilla environment with the same file field using the media widget on 2 content types. I set up 1 content type with revisions disabled, and the other to use workbench. The content type using workbench encountered the bug I described, but the non-revisioned type did not. I suspect this is a similar issue to what's happening with adding files. But to reproduce:

1) Create a node and upload a file
2) Save and publish the node (The file usage now reports it is in use on the same node twice).
3) Create new draft, remove the file and save. File usage now reports it is in use on the node 3 times even though I just removed it.
4) Publish node, file usage goes back to 2 and file is not deleted.

Proposed resolution

The latest patch in #55 is a re-roll of #53. #53 improved upon #47 (which was RTBC by one person) by avoiding issue with #1963992: Don't use $reset flag with node_load(). This patch removes the original "brute force" sub-optimal solution that was previously committed.

Remaining tasks

Review & test latest patch. Need to confirm or deny the concerns mentioned in #54.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

binford2k’s picture

Status: Active » Needs review
FileSize
389 bytes

This should fix it. I don't know that it's all that needs to be done though, so please test.

kirby14’s picture

This fixed it for me with my basic testing.

stevector’s picture

binford2k, can you describe how you knew there was a problem? That'll make testing easier.

Also do you know why there are only three list calls to this function?
http://api.drupal.org/api/drupal/modules--field--field.attach.inc/functi...

It seems odd that it is getting called in comments, taxonomy and users but not nodes.

kirby14’s picture

edit: scratch that, seems to be a different issue.

binford2k’s picture

Steve, I saved a revision and the file attachments were not saved.

It is called in node_save, but it's constructed in a tricky fashion so the docs system doesn't pick it up.

// Save fields.
    $function = "field_attach_$op";
    $function('node', $node);

    module_invoke_all('node_' . $op, $node);
    module_invoke_all('entity_' . $op, $node, 'node');
drkrusher’s picture

To add some debugging info to this issue, I'm finding that if I attach files and publish via the "moderate" tab, then all is OK (ie files don't seem to go missing). If I publish directly after saving the draft by selecting the "publish" action from the drop down list on that page, then files start to go missing. By example:
1) create draft page and attach file A - > save, hit moderate tab and then publish from there (file present)
2) new draft page and attach a second file (now have File A and B) -> save, hit moderate and publish from there (both files present)
3) new draft page and attach a third file (now have File's A, B and C) -> save, hit moderate and publish from there (all three files present)

All is OK until this point. Now we try and publish our next draft by saving and selecting publish directly from the drop down on the draft page (ie rather than going via the moderate tab):
4) new draft page and attach fourth file (now A, B, C & D) -> save and publish from the drop down presented on the draft page directly (files now start dropping from the page).

I've reproduced the above in multiple combinations. I have workbench files, access and moderator enabled and workbench media disabled.

kirby14’s picture

I can verify what drkusher listed above. When I added the patch in comment #1 the new draft had the proper new item but when I used the drop down to publish it, it would error.

kirby14’s picture

A little more information: If you comment out the node_save inside workbench_moderation_moderate() it will properly save the file/image to the revision. The problem is that it set that revision to the published state automatically.

Somehow the additional node_save (being called during a hook_node_update()) seems to be what is causing the issue. Is there a chance that a saving a node during a node update is interfering somehow?

The comment just before that node_save even says "@TODO: do we trust node_save() here?"

stevector’s picture

Calling a node_save() from within a hook_node_update seems like a bad idea. In fact I'm kind of surprised we don't have an infinite loop here since hook_node_update is called from node_save(). http://api.drupal.org/api/drupal/modules--node--node.api.php/function/ho...

Perhaps workbench_moderation_moderate() can take an additional argument for whether or not to save the node. If the node is not saved workbench_moderation_moderate() could return it instead. In the case we're discussing here that would return the node to the hook_update_node implementation.

CB’s picture

I posted an issue #1126690: Updated content triggers twice about the triggers being triggered twice, stevector suspects it may be this call to node_save(). Would love any ideas / opinions.

Thanks.

kirby14’s picture

workbench_moderation_moderate line 1370 has $live_revision->workbench_moderation['updating_live_revision'] = TRUE and the first thing the hook_node_update implementation looks for is to verify that updating_live_revision isn't set. I commented out the line and got the endless loop we all expected :P

So it looks like this is the sequence of events:
1. A new revision is submitted, firing off hook_node_update (if nothing is done, this new revision will be published by default)
2. workbench_moderation tables are updated to reflect the new revision
3. the current live revision (not the one we're actually submitting) is loaded and we verify it is set to be published and not a new revision
4. that live revision is then saved which fires off hook_node_update but is automatically returned
5. the original hook_node_update is done and returns processing to core

Without steps 3 and 4, the new revision that we submitted will be published instead of set to be a draft, etc. 3 and 4 go update the previously published revision to still be published so that our revision will stay in draft mode.

The rest of you probably already understood all of what was going on but it helped me to write it out. Is any of what I listed above incorrect?

kirby14’s picture

Starting with a fresh install of this to see if I can work out what exactly is happening. Are there any other modules that do this sort of thing that we can check out?

stevector’s picture

Here's the patch I outlined in #9

kirby14 I'm not sure if other modules do something like this. Perhaps we can get a comment from another workbench maintainer.

Ivan Zugec’s picture

patch from #13 seems to fix file upload issue.

But after applying the patch, drafts are automatically published. When I click on "New draft" and save the form, it publishes that draft. I don't get the "View draft" or "Edit draft" tabs anymore.

kirby14’s picture

If I'm not mistaken, setting $node = workbench_moderation_moderate() is not needed. Objects are passed by reference automatically now so all of the changes to $node are already there without needing to explicitly return it.

Also, I don't believe the additional variable is needed based on what I said in post #11

workbench_moderation_moderate line 1370 has $live_revision->workbench_moderation['updating_live_revision'] = TRUE and the first thing the hook_node_update implementation looks for is to verify that updating_live_revision isn't set.

I could be reading the patch incorrectly, though.

Taxoman’s picture

Priority: Normal » Major
agentrickard’s picture

kirby14’s picture

I tested the patch from #1087044: Custom URL path settings do not survive moderation and it still doesn't correct the issue. It really doesn't make sense, I guess my assumption about what was causing the issue was wrong.

Previously the node_save that set the current published node BACK to published happened before the new revision was actually done saving. With agentrickard's patch, that is happening afterwards and yet it still doesn't work.

With the patch from agent, if I create a new revision and attach a file/image, it doesn't show up. If I comment out the shutdown call, the image DOES show up but that revision is now the published revision instead of sitting in draft still.

There has to be something else we're missing.

agentrickard’s picture

We've had a fairly long debate about this process, and my opinion is that we should never node_save() the unpublished version. I think we should strategically update that node record.

The problem with files (and other things that aren't revisioned) is that they aren't consistently loaded onto the node.

kirby14’s picture

Looks like the revisioning module is doing all of this correctly already! It also supports scheduling the release date/time of the published revision.

That module appears to have a much more robust system for dealing with revisions, I wonder if there is some way to just use that module and have it tie in to Workbench. Obviously if all else fails we can see how they are doing it to get it working properly here but that just seems like a bad way of going about it.

agentrickard’s picture

This is a classic case of module developers not working together, since in D6 we had about 4 modules trying to do the same thing. (Note that Revisioning is itself a fork of Revision Moderation.)

That module wasn't available for D7 when we started. I don't recall why we based from Content Moderation instead of Revisioning, though I seem to recall that Revisioning had lots of hardcoded exception handlingfor specific modules, a practice that we want to avoid.

I don't think Revisioning handles the muti-stage workflow (though it lets Workflow integration handle that -- yet another module that wasn't ready for D7 when we started.)

You should be able to use Revisioning just fine. It just doesn't have the Views. You could build those and add them in for your site.

agentrickard’s picture

So all a Workbench Revisioning module would need, I think, is the Views we want for the My Workbench pages. That and maybe a little language cleanup for consistency in the user interface.

kirby14’s picture

    // Hack!
    // Because of core bug #1120272: node object inconsistent after node_load($nid), if the current revision is loaded,
    // $node body and fields may in fact be that of the LATEST revision.
    // So reload with FIELD_LOAD_REVISION ($node->vid is already set correctly).
    // Make sure to unset the already loaded fields or we end up with 2 copies
    // of each field, e.g. 2 tags, 2 image attachments etc.

That information is in the hook_node_load of the revisioning module. It sounds like that is the exact issue we're having. They fixed it with this hack while this module tried to fix it by re-saving the published revision in the node update.

The hook_node_update in the revisioning module also notes some bugs in core that they have gotten around. The end result is that they just have to do a simple db query to update the published revision instead of needing to save the revision again.

I only put this in here to show that it should be pretty simple to use these fixes to get workbench moderation working properly. The real question is if it is worth it to have competing modules or if it would be better to just create a wrapper module that makes revisioning work with workbench.

I'll see what I can come up with when I get some more free time.

agentrickard’s picture

IIRC, the re-save performs some other functions as well. I've always argued against it, but some of the other developers insist on it.

Switching to Revisioning loses the workflow state functionality.

mfer’s picture

I am also experiencing the problem. UID 1 seems to be unaffected and works fine. Other users, not so much.

mfer’s picture

To add some more context....

  • In my tests of files the correct values and references are being saved to the field tables and field revision tables.
  • The file_managed table does not have the file nor does it have the fid for the file referenced from the field system.
  • The file itself it not saved to the filesystem.

More on this to come.

Also, I'm interested in fixing this bug... no real preference on a switch to a different module as long as there is an upgrade path and the functionality people use still rocks out.

mfer’s picture

Another piece of contextual information... a user with all permissions does not have the issue.

Note, I have http://drupal.org/node/1087044#comment-4456326 applied.

Still working on this.

mfer’s picture

Unfortunately, upon further testing I'm finding the problem for all users including UID 1. Well, isn't that a bugger.

mfer’s picture

Found the problem! When the draft is saved the files are saved. But, when node_save is called a second time to make a previous revision the live revision file_field_delete_file is called. This sees the file is being removed and that no other place is using the file. So, it deletes it.

mfer’s picture

After digging through all of this it turns out the patch in #1 works for me.

Calling field_attach_update() saves the fields a second time. This increments the usage to 2 instead of one for new files. When the second node_save occurs to revert to the live revision this number is decremented to 1. Since it is not 0 and removed the file is kept.

I'm not sure this is the best solution but it worked for me.

stevector’s picture

#1 also works for me. I also don't know if it is the best solution. I think we can put this patch in to get rid of the bug before trying a more complete solution.

kirby14’s picture

I remember upon further testing that #1 ended up breaking other stuff for me. It seemed like it worked but there were additional issues I found later.

I'll try to find exactly what was the issue with it.

RdeBoer’s picture

@agentrickard, #21:
Sorry for the late response, I only just saw this.
"This is a classic case of module developers not working together"

Well if you'd just asked, then maybe we would have. :-)
Bit of background. Yes Revisioning started it's live as a fork of/contribution to Revision Moderation. Back in 2009 contacted both the original developer (webchick) and the maintainer at the time, but they didn't seem to want to be in that space anymore. To meet our client's requirements we then re-wrote Revision Moderation and contributed it back to the community in the form of Revisioning,

"That module (Revisioning) wasn't available for D7 when we started."
Again, if you'd just asked, then you might have found that work was underway...

"You should be able to use Revisioning just fine. It just doesn't have the Views."
Err... yes it does! It comes with a nice canned View named "Content summary" that appears automatically in your navigation menu when Views and Revisioning are both installed. It shows the pseudo workflow state of each node, the number of revisions it has etc. and may be customised to suit your needs.

PS: agree that loading of revisions in core D7 currently sucks! With D7.2 now out we have to change our workarounds again!

agentrickard’s picture

@RdeBoer

I agree on all points. I just know we were on a deadline and went with the Content Moderation fork, though I don't recall why.

Has 7.2 changed anything specific?

becw’s picture

Status: Needs review » Fixed

I've committed the patch from #1; I've also committed a test for this, which is my first test evar, so while it proves that this specific issue is fixed it's probably not going to catch bugs that this *creates*. See a66a3.

One issue with this patch is that it makes the file_usage counter for files on published revisions less accurate. Files are still removed if the node is removed, though.

@kirby14: let me know if you recall the other bugs you encountered with this patch.

jaiiali’s picture

subscribe

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

agentrickard’s picture

Version: 7.x-1.0-beta5 » 7.x-2.x-dev
Status: Closed (fixed) » Active

I'm seeing this behavior again with the State Machine version. Testing:

- Upload a new file on an existing node using Media field.
- Check {file_managed} for the new file record.
- Save the node as a draft (any state except for published).
- Check {file_managed} and the new file record is gone.
- Oddly. the field_revision_X table still points to the now missing file id.

I was able to work around this using File Lock -- http://drupal.org/project/file_lock -- but I wonder what the root cause is.

robeano’s picture

Assigned: Unassigned » becw
Issue tags: +Workbench Moderation
robeano’s picture

Tagging for sprint.

stevector’s picture

Status: Active » Closed (duplicate)

The file handling problem in 2.x comes from the dependency on State Machine 3.x

#1867182: File handling on forward revisions

ti2m’s picture

I'm posting here even if its marked duplicate as I ran into the same problem and checked workbench_moderation for reference (no clue about state machine). I'm working on an automated import process of salesforce products that need to be "staged" (a new revision which is not active yet) and can then be approved. I tracked it for 2 days and as far as I can tell the problem lies within node_save itself.

The reason why files get deleted are the wrong file_usage counts, which are caused by calls to file_field_delete_file() in function file_field_update() in file.field.inc.

When files are removed from a node (e.g. 1 of 3 images), file_field_delete_file() needs to be called when the node gets overwritten(!). It will then reduce the file_usage counter. To check which files were removed the function compares the new version of the node with the "original" version, which itself was stored by node_save in $node->original. And that's were node_save screws things up (as far as I can tell). Root of all evil in node_save is $node->original = entity_load_unchanged('node', $node->nid); as this will ALWAYS get the currently active node version as it doesn't care for the given revision id of the node revision that you actually want to overwrite.

So, to point out what happens:

  1. You want to save a draft with a new file and you submit the form
  2. node_save gets called and a new revision gets saved and is now the new current version of the node (this is super dump and the main issue here anyway)
  3. to "pull back" the revision which is actually just a draft node_save gets called again with the former "stable" version of the node.
  4. $node->original = entity_load_unchanged('node', $node->nid); will pull up the "draft" as its the current "head" revision
  5. file_field_update() will compare the unchanged "stable" version against the "draft" (it should compare it against the "stable" version)
  6. With this screwed up perspective it looks as if the file (that was actually added to the draft) is missing in the "stable" version.
  7. Therefore file_field_delete_file() gets called, the file_usage counter gets reduced, most likely to 0 and the file is dropped
  8. The unchanged "stable" version gets saved and is now the most current version again. The "draft" ends up with a reference to the uploaded file, which no longer exists as it got deleted.

This seems to be the exact same issue for workbench, but here it gets more confusing as field_attach_update() gets called so file_usage will be of no matter what. Though I post it anyway in node_save withar is to replace

$node->original = entity_load_unchanged('node', $node->nid); in node_save (node.module Line 1036) with
$node->original = node_load($node->nid, $node->vid, TRUE); and that seems to work. You could also use hook presave.

stevector’s picture

Status: Closed (duplicate) » Active

Timm, are you using Workbench Moderation 1.x or 2.x?

ti2m’s picture

Sorry, my fault, rookie mistake, just pulled it down with drush and didn't look for the version as it totally matched the error... Will check version 2.x and see if it applies as well.

stevector’s picture

Version: 7.x-2.x-dev » 7.x-1.x-dev

That's not a mistake Timm :) 1.x is the more stable version. I was asking because I thought this issue had been solved in 1.x per the comment in #35. So if it's not fixed I'm reopening this issue and changing it back to 1.x

stevector’s picture

Assigned: becw » Unassigned
ti2m’s picture

Haha, yeah, I just looked at the patch in #35. I would call that a bit of a brute force patch. I was wondering why it was necessary to call field_attach_update() there as that results in most of the file usage counts to be off. If it's really only to pump up the counts then I would try another approach as described in #42.

I just tested it, check the attached patch, that seems to work for me. I just took out the extra field_attach_update() call and placed

$live_revision->original = node_load($live_revision->nid, $live_revision->vid, TRUE);

before the node_save($live_revision) call in workbench_moderation_store(). Not sure about your workflow, maybe you even have the old original node already at hand somewhere. In the meantime I have checked the revisioning module and they just switch back to the old "stable" revision through a database update of the node table in hook node_update, which is great because then it is within the original node_save transaction scope. As I need to do batch operations this is way faster as well. But that just on the side.

stevector’s picture

Status: Active » Needs review

Setting to needs review so that tests get run.

I can't totally explain the file_attach_update() approach off the top of my head. I think this is tied to the nature of this module and the need to double save nodes. For forward revisions to work in D7 we need to save the forward (draft) revision and then resave the published revision. If we can get rid of it, all the better.

ti2m’s picture

Turns out the DB update isn't that easy either to switch back to the stable node. So I'm back to double saving as well as that is way less risky and consistent with the API. The revisioning module must do some other stuff, but I didn't get into it entirely.
As far as I can tell file_attach_update() has nothing to do with the double saving. The "draft" gets saved through the node form submit and you resave the stable version with an extra call to node_save. The field_attach_update() happens before the node_save call. But I guess it just needs testing and we see what happens. I think I'll also open up a core ticket as this is a bit of a strange behavior. I don't get why there isn't a simple switch in node_save which lets you just save a new revision without pushing it to "head". Hopefully that will change in D8 at least.

acbramley’s picture

I'm having a similar (yet sort of the opposite) issue with files and usage. I'm using the file_entity and media modules and came across the issue where removing a file from a field was not removing the usage.

I set up a vanilla environment with the same file field using the media widget on 2 content types. I set up 1 content type with revisions disabled, and the other to use workbench. The content type using workbench encountered the bug I described, but the non-revisioned type did not. I suspect this is a similar issue to what's happening with adding files. But to reproduce:

1) Create a node and upload a file
2) Save and publish the node (The file usage now reports it is in use on the same node twice).
3) Create new draft, remove the file and save. File usage now reports it is in use on the node 3 times even though I just removed it.
4) Publish node, file usage goes back to 2 and file is not deleted.

Doing the exact same actions with the non-revisioned content type gives the results we want.

Edit: I've realised that deleting a file or removing the usage when a file is still used in an old revision IS what we want. So I thought maybe if I deleted the node it would remove the file and usage, however it did not and now the usage reports the file is in use twice even though it's not attached to anything. The db table contains the following:
+-----+--------+------+----+-------+
| fid | module | type | id | count |
+-----+--------+------+----+-------+
| 4 | file | node | 4 | 2 |
+-----+--------+------+----+-------+

acbramley’s picture

Status: Needs review » Reviewed & tested by the community

Wow, unbelieveably patch #47 fixed all my issues! Thanks a lot :) Usage counts now report correctly as 1 even when resaving a node and removing the file from a new revision. File is also deleted when last node that uses it is deleted, but this only happens if the file is attached to the published revision.

If a file is attached to a previous revision and the node is deleted, the file remains and the usage table is not updated. Is this a bug with file_entity now?

hefox’s picture

Issue summary: View changes
Status: Reviewed & tested by the community » Needs work

#1963992: Don't use $reset flag with node_load() needs work based on the node_load

recrit’s picture

Status: Needs work » Needs review
FileSize
1.25 KB
1.15 KB

The attached patch uses the existing loaded live node from workbench_moderation_node_live_load() as the "original". If the patch in [1963992] is used, then it will not reset the cache.

Note: this patch also fixes a double save issue with field collections since it saves the field collection item entities in it's hook_field_update().

jweowu’s picture

#53 seems like a good improvement.

Certainly (as per #52) the unnecessary entity load cache flush in the earlier patch was very bad.

The only potential issue that springs to mind with the new patch is the fact that clone only creates a shallow copy of the object, so the resulting node object isn't the same as the one generated with #47 (as all references will be identical to those of the original object).

e.g. $node->workbench_moderation['current'] and $node->original->workbench_moderation['current'] are the same object.

I think that's fine in these circumstances, though? None of the properties directly modified in this function are references:

  // Make sure we're published.
  $live_revision->status = 1;
  // Don't create a new revision.
  $live_revision->revision = 0;
  // Prevent another moderation record from being written.
  $live_revision->workbench_moderation['updating_live_revision'] = TRUE;

So the only question is whether the node_save() process could have issues if additional changes to $node made during that process could potentially affect $node->original (or vice versa).

I'm pretty sure this wouldn't be an issue in practice, but I wanted to get more input.

If we're paranoid, then the alternative would be to make a second call to workbench_moderation_node_live_load(), to do a fresh load (like the earlier patch but, given #1963992: Don't use $reset flag with node_load(), not blowing away the entire load cache in the process).

It would be nice to avoid the inefficiency of a second workbench_moderation_node_live_load() if we don't need it, though.

dooug’s picture

The #53 patch didn't apply cleanly. I re-rolled it. Though I am also curious about the concerns mentioned in #54.

dooug’s picture

Issue summary: View changes

reviewed thread and improved summary for easy readin'!

ram4nd’s picture

dgtlmoon’s picture

Just a comment - Does this help? #2568053: What's the reasoning for file_field_update deleting files? can someone clarify why Drupal would want to delete files between revisions at all?

  • becw committed a66a37e on 7.x-3.x
    Issue #1084436 by binford2k, becw: force saving of file attachments on...
fgjohnson@lojoh.ca’s picture

#55

Seems to work better BUT the usage count is different depending on where the image is displayed.

Primary Image

1 count for itself,
1 count each for the English and French translations(?) of the nodes for each version.

So a node with 1 Featured Image and 3 versions would have a count of 7.
Removing a version of the node deprecates the file usage by 2.

Embedded Image

1 count for itself,
1 count for the node for each version. (Doesn't consider translations?)

So a node with 1 Embedded Image and 3 versions would have a count of 4.
Removing a version of the node deprecates the file usage by 1.