There is a good deal of code (300 loc and counting) in node module that deals with revisions. I think we could see many benefits from stripping this out (at least the UI part) into its own separate module.

Advantages:
- People who don't use and will never use revisions don't need to load that code on every page view.
- With the code separated out, we could do some fancy stuff with revisions like integrate diffing and allow moderated revisions without further complicating the logic (or adding to the 3,000 lines of code) of node.module.
- If we go 'all the way', and make it so that the extra join on the node_revisions table is only done when revisions are enabled for a particular node, this will provide performance benefits (chx's idea).

I'm going to start simple and just strip out the blatantly revision-related stuff (permissions, UI, log field, revisions checkbox).

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

webchick’s picture

Category: bug » feature
Status: Active » Needs work
FileSize
25.94 KB

Here's a preliminary patch. Still needs lots and lots of work. For the most part, this is a copy/paste job from node.module, except for:
- moved columns around in node / node_revisions table in preparation for tweaking various queries
- made log message actually use the theme_node_revisions_log_message function (as opposed to it just sitting there like it is currently)
- changed 3rd argument from $user to $account in node/node_revisions_user, per coding standards.

Still TODO:
* - help page.
* - node_load.
* - node_save.
* - upgrade path.
* - rework a bazillion queries :)

The idea is to only join on node_revisions if:
a) the node is revision-enabled
b) you are dealing with a revision other than the currently published one.

This requires copying the body/teaser/format fields back to {node}. So the stuff in {node} will always be the currently published revision's stuff. No need to go digging around the node_revisions table unless you're on a page like node/#/revisions.

webchick’s picture

FileSize
32.18 KB

node_save updated. node_load isn't though, so patch is still pretty useless atm.

webchick’s picture

FileSize
34.12 KB

Updated patch, with some node_load action...only querying the node table unless a particular revision is specified.

"in theory" I'd like to get the query to node_revisions out of node_load entirely. Not sure yet how to do that though... I talked with chx about this and he pointed out that I can't just override $node->body it in nodeapi op 'load' because other modules might want to override its output.

webchick’s picture

FileSize
34.19 KB

This patch has two small changes:
- Gets rid of the theme_revisions_overview function; This was an attempt to give some control to module authors over the output of that page, but I'll futz with that more in a separate patch.
- Adds a return $node->nid to the end of node_save, so calling modules can know what they just added.

webchick’s picture

FileSize
32.62 KB

also getting rid of node_revisions_load function I was dinking around with but which wasn't doing anything.

webchick’s picture

FileSize
32.97 KB

Here's a patch that fixes a bunch of stuff so that node_revisions.module should now be (mostly) working.

Known issues:
- When you revert a revision, timestamp is not there so it defaults to Jan 1 1970.
- I had to take out the where condition in node_load in order to get it to not show "page not found" on node/#/revisions/# .. this is probably sub-optimal. :P

Enough of a patch is here now though that this could probably be benchmarked to see if it's worth going the rest of the way.

webchick’s picture

Status: Needs work » Needs review

Even though there are still issues, I've marked this code needs review ... several people have expressed interest in benchmarking this, and I'd love to get some feedback on it from them before I go further.

webchick’s picture

FileSize
33.49 KB

here's a patch with the -up option, which also gets rid of the return $node->nid from above (will handle fighting for that in a separate patch :)) and adds in a missing check for node_access('delete') on the revisions.

Here's the command for those who are interested:
svn diff --diff-cmd /usr/bin/diff -x "-up" > node_revisions.patch

webchick’s picture

FileSize
34.76 KB

Here's the start of an upgrade path. Doesn't work yet. I'll try and look more when I get home later.

chx’s picture

FileSize
35.07 KB

I gave a little love to the update path.

webchick’s picture

FileSize
22.97 KB

Now with brand spankin' new upgrade path!*

*BACK UP before running it. It destroys the node and node revisions table, dontchaknow. ;)

webchick’s picture

FileSize
38.31 KB

chx caught a logic error in the upgrade path. fixed now. This patch also moves the CREATE {node_revisions} stuff to the node_revisions.install file and adds in some missing $Id$s.

webchick’s picture

FileSize
38.31 KB

Yeesh. :P thanks chx for reminding me not to compare table values that don't exist yet. ;)

webchick’s picture

FileSize
24.46 KB

Needed a re-roll.

rötzi’s picture

subscribing

ChrisKennedy’s picture

FileSize
24.17 KB

Minor: changed the system.install function to system_update_2005().

toemaz’s picture

me too

webchick’s picture

Status: Needs review » Needs work

I need to update this latest patch, since it's missing, oh. the node_revision module. :P

webchick’s picture

Status: Needs work » Needs review
FileSize
37.2 KB

Ok! This should have all the bits. :)

Btw, that wasn't a slam against Chris Kennedy's re-rolling.. I meant the re-rolling I did at "way too late a.m." the other day. ;)

webchick’s picture

Status: Needs review » Needs work

I don't have time to re-roll the patch atm, but I was talking to Nate / quicksketch about this the other day, and he had a great idea.

In order to truly de-couple node and node_revisions.module, we should make a separate node_revisions_revision_load($nid, $vid) function in the node_revisions.module, and completely remove any references to revision in node_load.

catch’s picture

Version: 6.x-dev » 7.x-dev

Needs a complete rewrite after the module split patch. The other benefits sound very nice.

Dave Reid’s picture

I definitely think this would provide a substantial benefit to a large number of users like me that don't have any need for node revisions on some of my sites. Will help test.

webchick’s picture

Assigned: webchick » Unassigned

I'm obviously not going to have time to work on this anymore. ;) But would love to review and commit such a patch. :D

Pancho’s picture

For all the reasons webchick advanced, I absolutely support this request!

This is logically the first step.
Secondly we would integrate diffing.
Third important step would be integrating Save as draft's logic into the new core node_revisions.module, which simply means: The latest revision doesn't necessarily need to be the one that is currently published.
Fourth step: Of Revision Moderation there should be only a few lines left that could be merged into modr8.
Whether we then want to have modr8 in core or not, that's an entirely different issue.

Does that sound like a good plan?

lilou’s picture

Does that sound like a good plan ?

... awesome !

Pancho’s picture

... but it's probably not gonna happen this way: diffing will be included rightaway, because the other patch is close to rtbc. This means we need to wait until that one is committed and then reconsider how we want to split node.module.
So the best thing we can do right at the moment is to review the "diff in core" patch. The more reviews the better, so we can soon return to this issue here.

catch’s picture

This makes complete sense to me, and then adding drafts and simplifying moderation are good plans too.

Pancho’s picture

FileSize
62.93 KB

Oh my,

I've been working all day on a new version of this patch. As you all know, many things have changed in the meanwhile, so I had to start all over again.
But now that most of the work seemed to be done, I was hit by a logic error, which turned out to be a major problem:

If {node_revisions} has never been installed:
We can copy our {node}.nid to {node}.vid.
If {node_revisions} is installed and Node Revisioning is enabled:
We can copy the appropriate {node_revisions}.vid to {node}.vid.
If {node_revisions} is installed, but Node Revisioning has been disabled:
We can't assume the next unused {node}.vid without querying {node_revisions}.

I see the following options:

  1. We start without a {node_revisions} table (so the node table needs to be self-sufficient). As soon as Node Revisioning gets installed, we copy over the existing nodes and start using both tables. Now, if Node Revisioning is being disabled, we can double-check if any node has additional revisions. If not, we can automatically uninstall {node_revisions}. But if yes:
    1. we reintroduce some kind of sequence variable, so we know the next vid to use in {node}.
    2. instead we could query the highest {node_revisions}.vid and the highest {node}.vid and take the max of both +1 as the next vid.

    In both cases, if later Node Revisioning gets enabled again, we need to copy over our new nodes from {node} to {node_revisions}.

  2. We start without a {node_revisions} table, but once Node Revisioning has been installed we keep using the table until a full uninstall. As long as Node Revisioning is only disabled, we keep syncing {node_revisions} on every added, edited or deleted {node} entry. Should Node Revisioning be enabled again, we can just keep on working.
  3. Or we keep creating the {node_revisions} table on every install (self-sufficient, though) and keep using it. Only code that is specific to manipulating multiple revisions is split out into a new module. Then we don't have to cope with this, but have the least performance gains for simple websites w/o revisioning.

Either of these ways will work. But we need to take a decision which way we want to go with this patch. Remember, this is about improving performance for small sites that don't need node revisioning. Please throw in your cents ;)

P.S. Only for the records, I enclose my latest patch. It follows the first solution, but doesn't implement a) or b) yet. At the moment this is not worth testing (so I added ".txt" to keep it from being slandered by the testbed ;-).

(1st edit: clarification on solution 2)
(2nd edit: clarification on solution 3)

catch’s picture

Category: feature » task

Does the current patch allow enough to benchmark the difference between 1 and 3? I'd be interested to know how much difference the extra join makes on node/1 and node pages. If it's not a great deal of gain, then it would simplify things a lot to keep it as, is except for moving the code around so that's not loaded unless necessary - both option 1 and 2 are going to take quite a bit of thought and complexity to keep things synced nicely.

I've also been wondering if we could remove the join on the user table - either by denormalising name, and/or moving that into user_nodeapi and only adding the fields if authoring information / user pictures are enabled.

Pancho’s picture

I clarified the alternatives in my post above. Hopefully this is also what catch thought them to be, but after the edits I can't nail him down on his opinion anymore ;-)

We are actually having a fourth alternative:

  1. Finally we can leave the data structure as-is, which means that we still have to join {node-revision} on viewing a node. As in (3), only code that is specific to manipulating multiple revisions is split out into a new module.
Pancho’s picture

Thanks catch for guessing with me, but I agree that we need hard facts, i.e. benchmarks. Therefore we need to materialize some (maybe two) of the four alternatives to a working patch, and then benchmark them against each other and the status quo. We can't do this for all of them - this would just be too much wasted work. To help us pick two of the four alternatives (at least for now), I did some "theoretical benchmarks". If they give us a general idea about the performance in comparison, they exactly fulfill what they are meant to.

In theory, things should be as following:

LOAD/UPDATE NODE alternative 1 alternative 2 alternative 3 alternative 4 status quo
Node Revisioning uninstalled 10/5 10/-1 2/1 0/0
Node Revisioning installed, but disabled 10/4 10/-2
Node Revisioning installed and enabled 8/-2 0/0
LOAD NODE: UPDATE NODE:
+8 for no join on {node_revision}
+2 for 500 lines less code loaded
-1 for needing to determine whether {node_revision} exists
-1/-2 for 1 resp. 2 queries to determine {node}.vid
-2 for needing to write a full record on {node}
+5 for no need to write a record on {node_revision}
+3 for no need to update {node}.vid
+1 for 1500 lines less code loaded

Notes:

  • More is better.
  • Diff and save_as_draft are already expected to be included afterwards, which should only affect the amount of loaded code. If you disagree, you need to substract 1 (for loads) resp. 0,5 (for updates) from the first two rows, except from the status quo.
  • Note that the values are just freely weighed estimates, that might or might not turn out to be close to the truth. Also I'm obviously weighing loads higher than updates.
  • My personal opinion?
    1 and 3 look promising:

    • While 1 should bring us maximum performance, it is a bit tricky to implement. Especially the syncing after reenabling Node Revisioning will be quite a task - fortunately we don't need to care too much about this, just to get it benchmarked.
    • 2 would be a compromise, if 1 in the end doesn't work out. Ensuring data integrity is no issue, and at least with a completely uninstalled {node_revisions} table even updates should perform pretty well.
    • 3 is rather defensive and easy to implement, but should still give us good performance on loads. The slightly worse performance on updates is still acceptable.
    • 4 would be a last resort, if we otherwise wouldn't get anything done. In the end it is just some code reorganization which still brings us better modularization, a less cluttered admin UI and some minor performance improvements.
    • Some more input on this would be great, otherwise my plans would be to start working out alternative 1 to compare its performance to the status quo. This would show how large the difference between good and evil actually is. Then it might be easier to decide on the route to go.

Pancho’s picture

Status: Needs review » Needs work
FileSize
4.9 KB
4.89 KB
63.79 KB

I reworked the last patch to reflect the renaming of tables, the introduction of node_load_multiple() and other changes in core.

However the results of the benchmarks I made are not impressive at all. I tested both / and /node/xyz with and without cache. Also I benchmarked a PHP page running 100 heavy node_load_multiple() queries (with $reset flag on).

Apache Bench shows no significant improvements in overall performance.
The Devel query log indicates an improvement around 1/3ms per node_load_multiple() query. This is virtually nothing compared to the heavy 'cache-get' queries or 'upload' (which should get a batch loader as well).

Obviously the additional join doesn't make any difference, and the performance increase of node_load_multiple() can't be further improved.

This certainly doesn't mean we should not make the {node} table independent from {node_revisions}, if we have other reasons. But - if I got it right - we should not expect a significant performance increase from that.

If somebody could repeat and verify my benchmarks (maybe on a more reliable testing server than mine), we should be ready to decide on our path and move forward.

The enclosed patch separates node_revision.module from node.module. Note that the 'Node revision' module should stay disabled, and saving nodes in the stripped node.module will also fail (because determining the vid has not yet been implemented, see above). However, loading and viewing nodes works fine, which at the moment is all we need for doing benchmarks.

Pancho’s picture

Status: Needs work » Needs review

Switching this to CNR, so it has at least a chance to get some review... :)

Status: Needs review » Needs work

The last submitted patch failed testing.

Pancho’s picture

Status: Needs work » Needs review

Not bad - I expected more fails... as said in #32, everything needed for benchmarking works fine.
Some more benchmarks for verification and feedback on the general plan is appreciated.

Dave Reid’s picture

Nice work Pancho! I do see a small improvement in the benchmarks [you did] with the patch, not to mention how this will make node editing/creation less complicated for a large number of Drupal sites.

Pancho’s picture

Status: Needs work » Needs review

That'd be cool! Could you please post the figures and how they were measured?

Also, note that the question is not if we separate out the node revisioning code, but to which extent node.module should be independent from the {node_revision} table.
So either way we go, we're both getting the UI simplification for simple websites and opening up the possibility to add more features to node revisioning (diffing, for example).

catch’s picture

It'd be very much worth considering splitting out the loading of revisions from node_load() - or at least doing it using hook_query_alter() or similar. node revisions, and the lowercase queries in user and taxonomy module are the main barriers to re-using code in the new multiple_load() functions - which could probably be abstracted into one or two common helper functions otherwise.

Damien Tournoud’s picture

Please note that node_revisions in its current form may be something of the past thanks to the fields in core initiative. Storing the full $node object in a node_archived_revisions table or something like this could perhaps be enough, and would really simplify the handling code.

bjaspan’s picture

subscribe

jstoller’s picture

Please keep in mind the importance of node moderation when playing with revision system. It remains a gaping hole in Core. One I sincerely hope is plugged in Drupal 7.

See: #218755: Support revisions in different states

jstoller’s picture

Should the priority be optimizing performance for small sites, or optimizing performance for large sites? It seems to me that larger, more complex sites are more likely to push Drupal's performance limits and hence would see more benefit from optimization. These sites also seem more likely to make use of revisions (especially if dif and moderation are included). Simplifying things and improving performance for sites that don't use revisions is a worthy goal, but shouldn't the adoption of these changes be contingent on their not having a negative impact on performance when revisions are enabled? Just a thought.

DamienMcKenna’s picture

Given we have fields in core, shouldn't Body be a separate field? Especially when it's optional anyway.

yched’s picture

DamienMcKenna’s picture

yched: thanks for referencing that, it's a related change but wasn't mentioned here previously. Cool :)

catch’s picture

I haven't checked the status of this for a while - a useful cleanup in itself would be a node_load_revision() function to remove all the $vid special casing from node_load_multiple().

cburschka’s picture

Do we still wait for #120955: Integrate Diff into Core with this?

It was linked here as RTBC last September, but I just came back from rerolling a patch last updated in December, and the discussion there left it unclear if it would happen at all.

wretched sinner - saved by grace’s picture

My thoughts are to get this in, and then we can take #120955: Integrate Diff into Core further afterwards. I know that Dries has some reservations about Diff in Core, so it might seem that we will have a better win on getting the split done first, which then would make more sense in having Diff integrated to Revisions.

My thought is to postpone #120955: Integrate Diff into Core on this issue, get Revisions split into it's own module which we can then work on improving without playing around with Node module.

And another thought as to why splitting Revisions off could be worth while - long term, we could look to expand Revisions to cover more than just Nodes, possibly?

Soli Deo Gloria

catch’s picture

Status: Needs review » Postponed

Let's do this once #372743: Body and teaser as fields is in, which will make it a much smaller patch.

cwgordon7’s picture

Status: Postponed » Needs work

#372743: Body and teaser as fields is in, this shouldn't be postponed for that issue's documentation. :)

catch’s picture

Version: 7.x-dev » 8.x-dev
ilo’s picture

Now that it is postponed, instead of using diff, I guess a good 'auditing' system to track changes on specific fields could be a good approach. Although the output of diff could be used to view the changes, it is not focused on content auditing, well.. entity (why only content, users would require auditing also) auditing.

mitchell’s picture

Status: Needs work » Active
FileSize
11.1 KB
104.68 KB

See also: #996696: Support revisions in Entity API. And, fago's D8 Entity API improvements - entity-property-fago branch - appears to have this synced. Attached are "egrep -r revision ./" and "egrep -r Revision ./" of the latest snapshot.

Setting back to active because #32 is from 2009.

andypost’s picture

Status: Active » Needs review

Yay! EntityNG support revisions but probably it's still possible to decouple this functionality to separate mode.

Let's see how many tests will be broken

mitchell’s picture

Status: Needs review » Closed (duplicate)

@andypost: Nah, that was just grep output. My bad.

For the real progress, see:
* #1723892: Support for revisions for entity save and delete operations
* #1082292: Clean up/refactor revisions

The diff issues in OP are still active.

Pancho’s picture

Status: Closed (duplicate) » Active
Issue tags: +Usability, +node revisions, +EditorUX

Not really a duplicate, so reopening as active.

Now that revision storage is so deeply integrated into node storage and handling, the amount of code might not be the central argument.
However, while revision management is something many people expect from a good CMS, quite many others simply don't need it and don't want to cope with the additional complexity.

Comparing with how fine-grained modularity is, for example with field modules, it is obvious we should separate out this functionality to a separate module, even if we end up removing not much more than some UI from node.module.

Pancho’s picture

Status: Active » Needs review
Issue tags: +Performance
FileSize
116.2 KB

So here is a patch separating the node revision UI from Node module.
Even UI separation is not quite complete, let alone the deeper API separation, yet I could already move ~800 lines of functional code and ~500 lines of testing code out of Node module.
Given the slightly lower memory footprint, his should already improve performance a bit for basic sites not using revisions. But to really make a difference much more needs to be taken into consideration.

Node (Revision) tests should pass, while some others would probably fail. Regard it as a first step or proof-of-concept.

Status: Needs review » Needs work

The last submitted patch, node_revision_separate-120967-57.patch, failed testing.

giorgio79’s picture

Related issue that impacts performance massively. Even if a site does not use revisions, they get to use two field tables, one for the field and another one for revisions.
#1279440: Deeply decouple revisions from Entity API and Node module

Pancho’s picture

@giorgio79:
Thanks for pointing me to that one, which I fear won't hit D8 anymore, unless all performance potentials need to be unveiled.
This one here aims for a more shallow separation of at least the revisioning UI, so it has better chances to land in D8. At the same time it would be a first step.

giorgio79’s picture

andypost’s picture

timmillwood’s picture

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Pancho’s picture

Status: Needs work » Closed (duplicate)

Indeed, this effort is superseded by #1279440: Deeply decouple revisions from Entity API and Node module and #2083451: Reconsider the separate field revision data tables and largely irrelevant to D8 as it has evolved since then. Closing this one as a duplicate.