There is a good deal of code (300 loc and counting) in node module that deals with revisions. I think we could see many benefits from stripping this out (at least the UI part) into its own separate module.
Advantages:
- People who don't use and will never use revisions don't need to load that code on every page view.
- With the code separated out, we could do some fancy stuff with revisions like integrate diffing and allow moderated revisions without further complicating the logic (or adding to the 3,000 lines of code) of node.module.
- If we go 'all the way', and make it so that the extra join on the node_revisions table is only done when revisions are enabled for a particular node, this will provide performance benefits (chx's idea).
I'm going to start simple and just strip out the blatantly revision-related stuff (permissions, UI, log field, revisions checkbox).
Comment | File | Size | Author |
---|---|---|---|
#57 | node_revision_separate-120967-57.patch | 116.2 KB | Pancho |
#53 | entity-revision-d8.txt | 104.68 KB | mitchell |
#53 | entity-Revision-d8.txt | 11.1 KB | mitchell |
#32 | node_revisions_17.patch | 63.79 KB | Pancho |
#32 | as-is.txt | 4.89 KB | Pancho |
Comments
Comment #1
webchickHere's a preliminary patch. Still needs lots and lots of work. For the most part, this is a copy/paste job from node.module, except for:
- moved columns around in node / node_revisions table in preparation for tweaking various queries
- made log message actually use the theme_node_revisions_log_message function (as opposed to it just sitting there like it is currently)
- changed 3rd argument from $user to $account in node/node_revisions_user, per coding standards.
Still TODO:
* - help page.
* - node_load.
* - node_save.
* - upgrade path.
* - rework a bazillion queries :)
The idea is to only join on node_revisions if:
a) the node is revision-enabled
b) you are dealing with a revision other than the currently published one.
This requires copying the body/teaser/format fields back to {node}. So the stuff in {node} will always be the currently published revision's stuff. No need to go digging around the node_revisions table unless you're on a page like node/#/revisions.
Comment #2
webchicknode_save updated. node_load isn't though, so patch is still pretty useless atm.
Comment #3
webchickUpdated patch, with some node_load action...only querying the node table unless a particular revision is specified.
"in theory" I'd like to get the query to node_revisions out of node_load entirely. Not sure yet how to do that though... I talked with chx about this and he pointed out that I can't just override $node->body it in nodeapi op 'load' because other modules might want to override its output.
Comment #4
webchickThis patch has two small changes:
- Gets rid of the theme_revisions_overview function; This was an attempt to give some control to module authors over the output of that page, but I'll futz with that more in a separate patch.
- Adds a return $node->nid to the end of node_save, so calling modules can know what they just added.
Comment #5
webchickalso getting rid of node_revisions_load function I was dinking around with but which wasn't doing anything.
Comment #6
webchickHere's a patch that fixes a bunch of stuff so that node_revisions.module should now be (mostly) working.
Known issues:
- When you revert a revision, timestamp is not there so it defaults to Jan 1 1970.
- I had to take out the where condition in node_load in order to get it to not show "page not found" on node/#/revisions/# .. this is probably sub-optimal. :P
Enough of a patch is here now though that this could probably be benchmarked to see if it's worth going the rest of the way.
Comment #7
webchickEven though there are still issues, I've marked this code needs review ... several people have expressed interest in benchmarking this, and I'd love to get some feedback on it from them before I go further.
Comment #8
webchickhere's a patch with the -up option, which also gets rid of the return $node->nid from above (will handle fighting for that in a separate patch :)) and adds in a missing check for node_access('delete') on the revisions.
Here's the command for those who are interested:
svn diff --diff-cmd /usr/bin/diff -x "-up" > node_revisions.patch
Comment #9
webchickHere's the start of an upgrade path. Doesn't work yet. I'll try and look more when I get home later.
Comment #10
chx CreditAttribution: chx commentedI gave a little love to the update path.
Comment #11
webchickNow with brand spankin' new upgrade path!*
*BACK UP before running it. It destroys the node and node revisions table, dontchaknow. ;)
Comment #12
webchickchx caught a logic error in the upgrade path. fixed now. This patch also moves the CREATE {node_revisions} stuff to the node_revisions.install file and adds in some missing $Id$s.
Comment #13
webchickYeesh. :P thanks chx for reminding me not to compare table values that don't exist yet. ;)
Comment #14
webchickNeeded a re-roll.
Comment #15
rötzi CreditAttribution: rötzi commentedsubscribing
Comment #16
ChrisKennedy CreditAttribution: ChrisKennedy commentedMinor: changed the system.install function to system_update_2005().
Comment #17
toemaz CreditAttribution: toemaz commentedme too
Comment #18
webchickI need to update this latest patch, since it's missing, oh. the node_revision module. :P
Comment #19
webchickOk! This should have all the bits. :)
Btw, that wasn't a slam against Chris Kennedy's re-rolling.. I meant the re-rolling I did at "way too late a.m." the other day. ;)
Comment #20
webchickI don't have time to re-roll the patch atm, but I was talking to Nate / quicksketch about this the other day, and he had a great idea.
In order to truly de-couple node and node_revisions.module, we should make a separate node_revisions_revision_load($nid, $vid) function in the node_revisions.module, and completely remove any references to revision in node_load.
Comment #21
catchNeeds a complete rewrite after the module split patch. The other benefits sound very nice.
Comment #22
Dave ReidI definitely think this would provide a substantial benefit to a large number of users like me that don't have any need for node revisions on some of my sites. Will help test.
Comment #23
webchickI'm obviously not going to have time to work on this anymore. ;) But would love to review and commit such a patch. :D
Comment #24
PanchoFor all the reasons webchick advanced, I absolutely support this request!
This is logically the first step.
Secondly we would integrate diffing.
Third important step would be integrating Save as draft's logic into the new core node_revisions.module, which simply means: The latest revision doesn't necessarily need to be the one that is currently published.
Fourth step: Of Revision Moderation there should be only a few lines left that could be merged into modr8.
Whether we then want to have modr8 in core or not, that's an entirely different issue.
Does that sound like a good plan?
Comment #25
lilou CreditAttribution: lilou commented... awesome !
Comment #26
Pancho... but it's probably not gonna happen this way: diffing will be included rightaway, because the other patch is close to rtbc. This means we need to wait until that one is committed and then reconsider how we want to split node.module.
So the best thing we can do right at the moment is to review the "diff in core" patch. The more reviews the better, so we can soon return to this issue here.
Comment #27
catchThis makes complete sense to me, and then adding drafts and simplifying moderation are good plans too.
Comment #28
PanchoOh my,
I've been working all day on a new version of this patch. As you all know, many things have changed in the meanwhile, so I had to start all over again.
But now that most of the work seemed to be done, I was hit by a logic error, which turned out to be a major problem:
I see the following options:
In both cases, if later Node Revisioning gets enabled again, we need to copy over our new nodes from {node} to {node_revisions}.
Either of these ways will work. But we need to take a decision which way we want to go with this patch. Remember, this is about improving performance for small sites that don't need node revisioning. Please throw in your cents ;)
P.S. Only for the records, I enclose my latest patch. It follows the first solution, but doesn't implement a) or b) yet. At the moment this is not worth testing (so I added ".txt" to keep it from being slandered by the testbed ;-).
(1st edit: clarification on solution 2)
(2nd edit: clarification on solution 3)
Comment #29
catchDoes the current patch allow enough to benchmark the difference between 1 and 3? I'd be interested to know how much difference the extra join makes on node/1 and node pages. If it's not a great deal of gain, then it would simplify things a lot to keep it as, is except for moving the code around so that's not loaded unless necessary - both option 1 and 2 are going to take quite a bit of thought and complexity to keep things synced nicely.
I've also been wondering if we could remove the join on the user table - either by denormalising name, and/or moving that into user_nodeapi and only adding the fields if authoring information / user pictures are enabled.
Comment #30
PanchoI clarified the alternatives in my post above. Hopefully this is also what catch thought them to be, but after the edits I can't nail him down on his opinion anymore ;-)
We are actually having a fourth alternative:
Comment #31
PanchoThanks catch for guessing with me, but I agree that we need hard facts, i.e. benchmarks. Therefore we need to materialize some (maybe two) of the four alternatives to a working patch, and then benchmark them against each other and the status quo. We can't do this for all of them - this would just be too much wasted work. To help us pick two of the four alternatives (at least for now), I did some "theoretical benchmarks". If they give us a general idea about the performance in comparison, they exactly fulfill what they are meant to.
In theory, things should be as following:
+2 for 500 lines less code loaded
-1/-2 for 1 resp. 2 queries to determine {node}.vid
-2 for needing to write a full record on {node}
+5 for no need to write a record on {node_revision}
+3 for no need to update {node}.vid
+1 for 1500 lines less code loaded
Notes:
My personal opinion?
1 and 3 look promising:
Some more input on this would be great, otherwise my plans would be to start working out alternative 1 to compare its performance to the status quo. This would show how large the difference between good and evil actually is. Then it might be easier to decide on the route to go.
Comment #32
PanchoI reworked the last patch to reflect the renaming of tables, the introduction of node_load_multiple() and other changes in core.
However the results of the benchmarks I made are not impressive at all. I tested both / and /node/xyz with and without cache. Also I benchmarked a PHP page running 100 heavy node_load_multiple() queries (with $reset flag on).
Apache Bench shows no significant improvements in overall performance.
The Devel query log indicates an improvement around 1/3ms per node_load_multiple() query. This is virtually nothing compared to the heavy 'cache-get' queries or 'upload' (which should get a batch loader as well).
Obviously the additional join doesn't make any difference, and the performance increase of node_load_multiple() can't be further improved.
This certainly doesn't mean we should not make the {node} table independent from {node_revisions}, if we have other reasons. But - if I got it right - we should not expect a significant performance increase from that.
If somebody could repeat and verify my benchmarks (maybe on a more reliable testing server than mine), we should be ready to decide on our path and move forward.
The enclosed patch separates node_revision.module from node.module. Note that the 'Node revision' module should stay disabled, and saving nodes in the stripped node.module will also fail (because determining the vid has not yet been implemented, see above). However, loading and viewing nodes works fine, which at the moment is all we need for doing benchmarks.
Comment #33
PanchoSwitching this to CNR, so it has at least a chance to get some review... :)
Comment #35
PanchoNot bad - I expected more fails... as said in #32, everything needed for benchmarking works fine.
Some more benchmarks for verification and feedback on the general plan is appreciated.
Comment #36
Dave ReidNice work Pancho! I do see a small improvement in the benchmarks [you did] with the patch, not to mention how this will make node editing/creation less complicated for a large number of Drupal sites.
Comment #37
PanchoThat'd be cool! Could you please post the figures and how they were measured?
Also, note that the question is not if we separate out the node revisioning code, but to which extent node.module should be independent from the {node_revision} table.
So either way we go, we're both getting the UI simplification for simple websites and opening up the possibility to add more features to node revisioning (diffing, for example).
Comment #38
catchIt'd be very much worth considering splitting out the loading of revisions from node_load() - or at least doing it using hook_query_alter() or similar. node revisions, and the lowercase queries in user and taxonomy module are the main barriers to re-using code in the new multiple_load() functions - which could probably be abstracted into one or two common helper functions otherwise.
Comment #39
Damien Tournoud CreditAttribution: Damien Tournoud commentedPlease note that node_revisions in its current form may be something of the past thanks to the fields in core initiative. Storing the full $node object in a node_archived_revisions table or something like this could perhaps be enough, and would really simplify the handling code.
Comment #40
bjaspan CreditAttribution: bjaspan commentedsubscribe
Comment #41
jstollerPlease keep in mind the importance of node moderation when playing with revision system. It remains a gaping hole in Core. One I sincerely hope is plugged in Drupal 7.
See: #218755: Support revisions in different states
Comment #42
jstollerShould the priority be optimizing performance for small sites, or optimizing performance for large sites? It seems to me that larger, more complex sites are more likely to push Drupal's performance limits and hence would see more benefit from optimization. These sites also seem more likely to make use of revisions (especially if dif and moderation are included). Simplifying things and improving performance for sites that don't use revisions is a worthy goal, but shouldn't the adoption of these changes be contingent on their not having a negative impact on performance when revisions are enabled? Just a thought.
Comment #43
DamienMcKennaGiven we have fields in core, shouldn't Body be a separate field? Especially when it's optional anyway.
Comment #44
yched CreditAttribution: yched commentedDamienMcKenna : See #372743: Body and teaser as fields
Comment #45
DamienMcKennayched: thanks for referencing that, it's a related change but wasn't mentioned here previously. Cool :)
Comment #46
catchI haven't checked the status of this for a while - a useful cleanup in itself would be a node_load_revision() function to remove all the $vid special casing from node_load_multiple().
Comment #47
cburschkaDo we still wait for #120955: Integrate Diff into Core with this?
It was linked here as RTBC last September, but I just came back from rerolling a patch last updated in December, and the discussion there left it unclear if it would happen at all.
Comment #48
wretched sinner - saved by grace CreditAttribution: wretched sinner - saved by grace commentedMy thoughts are to get this in, and then we can take #120955: Integrate Diff into Core further afterwards. I know that Dries has some reservations about Diff in Core, so it might seem that we will have a better win on getting the split done first, which then would make more sense in having Diff integrated to Revisions.
My thought is to postpone #120955: Integrate Diff into Core on this issue, get Revisions split into it's own module which we can then work on improving without playing around with Node module.
And another thought as to why splitting Revisions off could be worth while - long term, we could look to expand Revisions to cover more than just Nodes, possibly?
Soli Deo Gloria
Comment #49
catchLet's do this once #372743: Body and teaser as fields is in, which will make it a much smaller patch.
Comment #50
cwgordon7 CreditAttribution: cwgordon7 commented#372743: Body and teaser as fields is in, this shouldn't be postponed for that issue's documentation. :)
Comment #51
catchComment #52
ilo CreditAttribution: ilo commentedNow that it is postponed, instead of using diff, I guess a good 'auditing' system to track changes on specific fields could be a good approach. Although the output of diff could be used to view the changes, it is not focused on content auditing, well.. entity (why only content, users would require auditing also) auditing.
Comment #53
mitchell CreditAttribution: mitchell commentedSee also: #996696: Support revisions in Entity API. And, fago's D8 Entity API improvements - entity-property-fago branch - appears to have this synced. Attached are "egrep -r revision ./" and "egrep -r Revision ./" of the latest snapshot.
Setting back to active because #32 is from 2009.
Comment #54
andypostYay! EntityNG support revisions but probably it's still possible to decouple this functionality to separate mode.
Let's see how many tests will be broken
Comment #55
mitchell CreditAttribution: mitchell commented@andypost: Nah, that was just grep output. My bad.
For the real progress, see:
* #1723892: Support for revisions for entity save and delete operations
* #1082292: Clean up/refactor revisions
The diff issues in OP are still active.
Comment #56
PanchoNot really a duplicate, so reopening as active.
Now that revision storage is so deeply integrated into node storage and handling, the amount of code might not be the central argument.
However, while revision management is something many people expect from a good CMS, quite many others simply don't need it and don't want to cope with the additional complexity.
Comparing with how fine-grained modularity is, for example with field modules, it is obvious we should separate out this functionality to a separate module, even if we end up removing not much more than some UI from node.module.
Comment #57
PanchoSo here is a patch separating the node revision UI from Node module.
Even UI separation is not quite complete, let alone the deeper API separation, yet I could already move ~800 lines of functional code and ~500 lines of testing code out of Node module.
Given the slightly lower memory footprint, his should already improve performance a bit for basic sites not using revisions. But to really make a difference much more needs to be taken into consideration.
Node (Revision) tests should pass, while some others would probably fail. Regard it as a first step or proof-of-concept.
Comment #59
giorgio79 CreditAttribution: giorgio79 commentedRelated issue that impacts performance massively. Even if a site does not use revisions, they get to use two field tables, one for the field and another one for revisions.
#1279440: Deeply decouple revisions from Entity API and Node module
Comment #60
Pancho@giorgio79:
Thanks for pointing me to that one, which I fear won't hit D8 anymore, unless all performance potentials need to be unveiled.
This one here aims for a more shallow separation of at least the revisioning UI, so it has better chances to land in D8. At the same time it would be a first step.
Comment #61
giorgio79 CreditAttribution: giorgio79 commentedSome movement here #2144631: Add a revisionable key to field definitions
Comment #62
andypostSeems there's no actionable battleplan here, and looks mostly as duplicate of #1279440: Deeply decouple revisions from Entity API and Node module
Comment #63
timmillwoodIt'd be great to get the functionality of https://www.drupal.org/project/multiversion into 8.1.x as demoed in https://events.drupal.org/losangeles2015/sessions/we-need-revisions-and-...
Comment #69
PanchoIndeed, this effort is superseded by #1279440: Deeply decouple revisions from Entity API and Node module and #2083451: Reconsider the separate field revision data tables and largely irrelevant to D8 as it has evolved since then. Closing this one as a duplicate.