Fields always load the latest revision value even if node.vid points to an older revision, which is something we need to make #282122: D7UX: "Save draft" and "Publish" buttons on node forms work.

How to reproduce:

1. Create a node and some revisions changing the body text.
2. Change manually the vid value in the node table to an older revision.
3. The body keeps showibn the latest revision.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Pedro Lozano’s picture

Issue tags: +Fields in Core

Tagging.

catch’s picture

subscribing. This is going to be an interesting one.

yched’s picture

That's by design, older revisions need to be fetched using field_attach_load_revisions().
This was originally designed to allow field storage to separate 'current revision' data from the more rarely accessed 'older revisions' data, and let the former be faster.

Pedro Lozano’s picture

@ysched, I've debugged it a little and the field (body) is been load using field_attach_load_revisions() but the arguments seems to be wrong.

I'm not familiar with D7 fields code so I've not come with a patch, but something isn't right with the $vid variable passed throught node_load -> node_load_multiple -> field_attach_load_revisions -> field_attach_load.

bjaspan’s picture

Priority: Critical » Normal

I installed a new D7 site, created node 1, created a second revision of node 1, and viewed the original revision of node 1, and the body displayed correctly (showing revision 1's content). Until further information is available, I think there is not a bug here.

In the original description, it sounded to me like Pedro wants field_attach_load() to load the revision specified by vid automatically, even if field_attach_load_revision() is not called. As yched says, this is not how it works or will work.

I'll leave this open for now but without more info we'll mark it "won't fix" at some point.

catch’s picture

@bjaspan, this is less of a bug than an API change. In Drupal 6, modules like revision moderation and others have used the ability to set an arbitrary vid in order to make that the 'active' one. The only way I could find to do a similar thing in D7 was to load, save and delete revisions, messing with timestamps etc., to have a revision newer than the active one. So, all those modules will break, and there's a bit of trickery involved to get similar behaviour working now (although arguably a less trickery than 'update {node} set vid = $some_arbitrary_vid').

bjaspan’s picture

Ah. So you want to be able munge the node.vid column and have node_load() get the vid you set, but it won't because node_load() calls field_attach_load() which ignores the vid property since it is loading the "current" value. Right?

The "easy" solution, if node.module wants to support having the vid column munged in this way, is for node_load() to always call field_attach_load_revision(), which will work fine for the current revision and all past revisions. This would basically be a reversion to the way CCK worked: always querying a table with all revisions in it.

A better solution, I think, is for node.module to export another API function: node_set_active_vid($nid, $vid). This function would internally munge the vid column, and also set a new column (e.g.) "vid_is_munged" to TRUE. Then node_load() would know to use field_attach_load_revision() when the vid is munged, and field_attach_load() otherwise.

Pedro Lozano’s picture

1. There is at least an API terminology problem here. What is considered current revision? The interface shows as "Current revision" the one in the node.vid column (munged or not). But FIELD_LOAD_CURRENT actually means the LAST created revision when passed to field_attach_load(), not the "current", and FIELD_LOAD_REVISION really loads the CURRENT revision.

2. When you do a node_load() without the vid argument the title of the node loaded is the title of the "current" revision (the one in node.vid), not the last one. But the fields (ie: body) always load the last revision, so there is an inconsistency between core node fields and fields from the fields api.

bjaspan’s picture

Component: field system » node.module

Yes, to Field API, the most current revision is the one most recently saved. This is by design, for a very good reason: it is by far the most common mode of operation and is much more efficient than the alternative.

The fact that node.module chooses to consider the vid column as "the most recent" revision is different than Field API, and that's fine; there is no rule that says they have to use the same model. We just have to bridge the impedance mismatch. I specified in #7 an easy way to do that correctly.

I believe this is now a node.module issue. If someone wants to argue that every fieldable entity will have this problem and thus that Field API should provide a solution, go ahead. :-)

catch’s picture

I already found a workaround for this on #282122: D7UX: "Save draft" and "Publish" buttons on node forms, and it really only needs to be done when you want to save a new revision but not make it active. What we need, I think, is a node_save_revision() which handles saving to the node_revision table without altering the current {node}.vid - I think from yched's comments there'd be some field API work to do there.

yched’s picture

Drawback with solution #7 is that field_attach_load_revision() skips the field cache, and thus implies a performance penalty on having nodes with draft revisions.

IMO we should aim at keeping the data in field storage's 'current revisions' bucket always to-date with what is actually the current (published) vid.
What i was thinking in #282122-80: D7UX: "Save draft" and "Publish" buttons on node forms was: if node.module did the job of setting a $node->is_current_revision = FALSE flag (or equivalent) before calling field_attach_update() when saving the draft revision, then it would be easy for storage modules to only save in the 'other revisions' bucket, keeping the data in 'current revision' intact.

This only works for 'create a new revision (draft) while keeping the published revision unchanged' - which is the use case catch mentions in #10. This doesn't work if the 'current vid' ({node}.vid) is changed under us - which I don't see how we can support. Such abrupt change should never happen IMO. Moving 'current' (published) revision from vid N to N+x should *create a new revision* N+(x+1) - like core currently does when 'reverting to an older revision': it doesn't just hack {node}.vid, it creates a new revision. In this case, $node->is_current_revision = TRUE and field data are saved in the 'current revisions' bucket, and we're OK.

[edit: the $node->is_current_revision flag could also be an additional param for f_a_update(), but I guess the flag could also be useful for non-field nodeapi modules that want to optimize their data storage in a similar way that Field does]

yched’s picture

This doesn't work if the 'current vid' ({node}.vid) is changed under us - which I don't see how we can support. Such abrupt change should never happen IMO. Moving 'current' (published) revision from vid N to N+x should *create a new revision* N+(x+1) - like core currently does when 'reverting to an older revision'

If I understand #282122-172: D7UX: "Save draft" and "Publish" buttons on node forms right, it seems the current approach over there still creates a new revision for every step of a node lifecycle, throughout drafts and published versions. So the proposal above still stands.

catch’s picture

It does, but the only way it's able to do that, is by deleting old revisions and cloning them, which is a bit of a mess (although frankly not too bad in the scheme of things). I don't have high hopes for that patch getting in before freeze though, however the code there should hopefully allow for revision_moderation to update to D7.

yched’s picture

by deleting old revisions and cloning them, which is a bit of a mess (although frankly not too bad in the scheme of things)

Yeah, as I said above, I'd also tend to see this as good practice.

Damien Tournoud’s picture

Component: node.module » field system
Priority: Normal » Critical

I am raising this to critical, we need to find a solution here, or Drupal becomes very very broken as a CMS. Having new but unpublished revisions is quite a basic requirement.

If I understand correctly, #7 is based on the premise that loading revisions skip the field cache. Nearly two years after, is it still a problem? What are the drawbacks of having a field cache even for revisions?

Let's discuss how to fix that.

catch’s picture

Version: 7.x-dev » 8.x-dev
Issue tags: -Fields in Core +Needs backport to D7
Damien Tournoud’s picture

Status: Active » Needs review
FileSize
2.04 KB

Here is a suggested way forward, that could possibly be backported to Drupal 7.

  • We add a 'base revision' entity key, in addition to the 'revision' key; defaults to the 'revision' key if not present;
  • When the 'base revision' entity key is not present or it is equal to the 'revision' key, the semantic of entity_load() is unchanged: Load the row from 'revision table' which 'revision' column is equal to the value of the 'revision' column in the 'base table';
  • When the 'base revision' entity key is present and different then the 'revision' key, the semantic of entity_load() is changed in this way: Load the row from 'revision table' which 'revision' column is equal to the value of the 'base revision' column in the 'base table';
  • We call field_attach_load_revision() in the first case, but we still call field_attach_load() in the second.

This is a small API change, and we need to extend the tests, but I think this should pass all our current test suite.

fago’s picture

Honestly, to me this two similar flags are rather confusing (while I see how they could be used to serve your use-case).

What about the approach yched outlined in #11?

Imo it's totally weird that the API does not provide a way to tell whether $node is the current revision or not. The in #11 proposed $node->is_current_revision would solve that too.

Damien Tournoud’s picture

My point is that all our issues come from the fact that modules want to change the semantic of the 'revision' key.

Drupal core is built around the assumption that 'revision' (in the base table) is the key of the latest revision. Let's keep it that way, and allow other modules to define other columns in the base table with different semantics (latest published revision, etc.).

The way I see it, the problem here is that we assume there is a 'tip' revision. The whole field_data vs. field_revision doesn't make a lot of sense to me, and I would be happy to see it ripped apart for Drupal 8. We probably should not do that in Drupal 7, and the change I suggest will restore flexibility in that regard without breaking anything else.

Of course, the name of the key can be discussed :)

fago’s picture

>The whole field_data vs. field_revision doesn't make a lot of sense to me, and I would be happy to see it ripped apart for Drupal 8.

hm, I like that approach as it avoids having to deal with inactive-revision data when querying just for current stuff (as we do in 99% of the cases) + it works already similar as the CRAP approach that is proposed for d8.

>Drupal core is built around the assumption that 'revision' (in the base table) is the key of the latest revision. Let's keep it that way, and allow other modules to define other columns in the base table with different semantics (latest published revision, etc.).

Yes, I agree we need to keep the semantic of the node revision key, which is to specify the current revision. But instead of introducing just another key, we could follow the solution in #11 what would allow us to make the current field API model properly work with the node model even if latest revision != current revision.

E.g. we could support the following to add a new unpublished revision:

$node->is_current_revision = FALSE;
$node->revision = TRUE;
node_save($node);

To me this would be much more intuitive.

catch’s picture

Yeah I still like #11 here as well. We wouldn't even be minting the bid like d6, just saving the revision directly without updating the node table at all.

Damien Tournoud’s picture

hm, I like that approach as it avoids having to deal with inactive-revision data when querying just for current stuff (as we do in 99% of the cases) + it works already similar as the CRAP approach that is proposed for d8.

That was the initial idea behind the field_current_*/field_revision_* table architecture. But it was just a ill-conceived idea. Most of the time everything is loaded from the field cache anyway, so we are just duplicating the data for no reason whatsoever.

I would be super happy to see this go away for Drupal 8.

Also, I don't agree at all with the CRAP approach, which is basically hardcoding the notion of "current" revision.

Yes, I agree we need to keep the semantic of the node revision key, which is to specify the current revision.

The revision key is absolutely not the "current" revision, at least not in Drupal 7 and above. It's the "latest" revision. That's the semantic I think we should keep, and allow contrib modules to define other columns with different semantics.

fago’s picture

>The revision key is absolutely not the "current" revision, at least not in Drupal 7 and above. It's the "latest" revision. That's the semantic I think we should keep, and allow contrib modules to define other columns with different semantics.

The node schema description in d7 still says it is the "current revision". I think it always was the *current* revision and I'm not aware that this has been changed. That said, I'd agree with #8 that just the field API works different by loading always its last revision as current revision.

catch’s picture

I've never thought the current vs. revision split was about loading field data. That's cached and even when not it's an indexed query. What it does make some sense for is taking the edge off unindexed queries against field values, which are generally going to be worse the more rows there are in the table. Now it'd be better if we didn't run unindexed queries, but that's not the case yet. There's something to be said for being able to tell if a field value is in use in any current revision - for example if you wanted to clean out old tags, but a flag on a single table would work for that too. If we want something that works for D7 then it's a bit off topic though.

Damien, can you explain why you prefer latest revision to current revision? In D6 the vid column always meant current revision more or less - queries would join on this, IMO it is only field api which doesn't follow this rule, and judging by this issue that was mostly accidental. Also I would remind people that if it comes to it, revision moderation could create two revisions - first the new edit, then a copy of the existing revision. If it comes to if that hack seems a lot less bad to me than node table column additions in D7 which is serious overkill for something that can be worked around.

fmizzell’s picture

subscribing

catch’s picture

Title: Fields always load the latest revision » Make it easier to create draft revisions in core
Component: field system » node.module
Category: bug » task
Priority: Critical » Major
Status: Needs review » Active

I posted #1184318: Avoid using vid munging for pending revisions and http://drupal.org/node/995876#comment-4583786

Those are the only two modules I'm aware of that will have trouble with this being ported from Drupal 6 to 7. Let's see if the code from that doomed save as draft issue helps them.

The use case is also handled by workbench judging by the project description, I've not tried that module yet though: http://drupal.org/project/workbench

Downgrading this to major task. It might be possible to make this easier in D7, but it is not breaking anything. A quick review of revisions module shows it is already working around the bug by adding field_attach_load() with a vid in hook_node_load(), - that's a bit ugly but no uglier than what Barry was suggesting which would do the same thing in core.

Changing the actual field API behaviour is a completely different discussion, and that can be its own D8 issue, so I'm also moving this to node module for now.

Azol’s picture

Subscribing.

agentrickard’s picture

@catch

For what it's worth, I think file handling was the biggest D7 problem for Workbench Moderation and Revisioning modules. Of course, the Revisioning fix circles back to this issue... #1120272: node object inconsistent after node_load($nid).

jstoller’s picture

Subscribing. Please see #218755: Support revisions in different states for more discussions around this issue.

webchick’s picture

Subscribing.

colan’s picture

Subscribing.

Bojhan’s picture

Would be awesome if we can work on this soon, I want to solve some of these content creation issues early in the cycle.

catch’s picture

I did some initial work in #1082292: Clean up/refactor revisions.

jstoller’s picture

I had posted a comment in #218755: Support revisions in different states that I think relates to this issue, the gist of which is as follows.

What if we had four different node tables: {node}, {node_revision}, {node_active} and {node_latest}.

{node} would contain only that information which always applies to the entire node, like nid, type, language, created, changed, comment, tnid and translate.

{node_revision} would be a slightly expanded version of the current table of that name, containing any node data that could potentially vary from one version to the next. For instance, timestamp should be replaced by separate created and changed fields.

{node_active} and {node_latest} would be denormalization tables for the two most basic views of node data and could be similar to today's {node} table in structure, but would contain nothing that isn't also in the new {node} and {node_revision} tables. {node_active} would reference the most current version of each node with a "Published" status. {node_latest}, on the other hand, would reference the latest version of each node (Published or Unpublished).

Here "active" and "latest" could be thought of as two standard entity display modes for nodes. If I'm viewing the site in the "active" mode, then I'm seeing only the publicly visible content. If I'm viewing the site in the "latest" display mode, then I'm seeing all the latest content, even if it's still unpublished. Which mode I'm in would be context dependent, like a build mode for the entire site. This same concept could be expanded to all entities. For instance, we could have {users}, {users_revision}, {users_active}, {users_latest}, {comment}, {comment_revision}, {comment_active} and {comment_latest}. I could now see my entire site as if the latest revisions of all my content were published, like a virtual staging site for content!

Further expanding on this concept, we could allow modules to define additional display modes, other than the two defaults I've outlined here. For instance, maybe I want something similar to {node_latest}, but I only want it to display unpublished content if it has some flag set on it, thus filtering out initial drafts which the author isn't ready to release.

I think this sort of approach would support much more robust workflow systems being built on top of it.

colan’s picture

jstoller: Interesting idea. Would we really need to have {node_active} and {node_latest} as separate tables, or could we simply keep their data in {node}?

jstoller’s picture

@colan:
That depends on your perspective. In theory, everything in {node_active} and {node_latest} could be found by joining {node} and {node_revision}, so in that sense the tables aren't necessary. However, there still seems to be a desire for tables with denormalized data. And if you want that, then yes, you do need {node_active} and {node_latest}. Remember, {node} would only contain data that always applies to the entire node, regardless of revision. That means even things like the title would no longer appear in {node}, since it is revision-dependent. I'm not a DB expert, but cramming your denormalized data in that table as well doesn't seem like a good idea to me. And I expect it would make it more difficult to add additional display modes, beyond the "active" and "latest" modes I mention here.

yoroy’s picture

I'm only project-managing a bit here, but after 6 months of inactivity in a major issue it would be good to get an update on how this one relates to #1528028: Add tests for reverting revisions where revision_uid and uid differ, #218755: Support revisions in different states and #1082292: Clean up/refactor revisions.

Thanks :)

Bojhan’s picture