Currently all node revisions are stored in a serialized field in the node table and retrieved on _each_ page view, although they are rarely needed. However, we have agreed that serializing data is bad and that we should try to keep Drupal's memory footprint small.

Therefore I propose to create a separate revisions table which would be in principle identical to the node table, except that it could hold several old copies of the same node. Extra data added by other modules could go in a serialized field unless we find a better solution.
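To make the proposal concrete, here is a rough sketch of what such a table could look like (the column names are illustrative only, not part of the proposal):

```sql
-- Illustrative sketch: mirrors the node table's columns, but allows
-- several stored copies of the same node, one row per revision.
CREATE TABLE node_revisions (
  nid INT NOT NULL,        -- the node this copy belongs to
  vid INT NOT NULL,        -- revision id
  uid INT NOT NULL,        -- author of this revision
  title VARCHAR(128) NOT NULL,
  body LONGTEXT,
  timestamp INT NOT NULL,
  extra LONGTEXT,          -- serialized data from other modules, for now
  PRIMARY KEY (nid, vid)
);
```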

Comment | File | Size | Author
#146 | blog_8.patch | 1.88 KB | killes@www.drop.org
#142 | drupal-head-revisions_43+pgsql_2.diff | 14.37 KB | Cvbge
#137 | drupal-head-revisions_43+pgsql.diff | 13.84 KB | Cvbge
#136 | revisions_43.patch | 54.49 KB | killes@www.drop.org
#135 | revisions_42.patch | 54 KB | killes@www.drop.org
#132 | revisions_41.patch_0.patch | 1.24 KB | Cvbge
#130 | revisions_41.patch.patch | 1.14 KB | Cvbge
#129 | revisions_41.patch | 52.4 KB | killes@www.drop.org
#128 | revisions_40.patch | 52.22 KB | killes@www.drop.org
#124 | revisions_39.patch | 52.25 KB | killes@www.drop.org
#122 | revisions_38.patch | 52.26 KB | killes@www.drop.org
#120 | revisions_37.patch | 52.29 KB | killes@www.drop.org
#119 | revisions_36.patch | 55.66 KB | killes@www.drop.org
#118 | revisions_35.patch | 55.51 KB | killes@www.drop.org
#115 | scripts.tar.gz | 4.39 KB | killes@www.drop.org
#114 | revisions_34.patch | 52.92 KB | killes@www.drop.org
#111 | revisions_33.patch | 51.27 KB | killes@www.drop.org
#107 | revisions_32.patch | 51.27 KB | killes@www.drop.org
#103 | revisions_31.patch | 48.78 KB | killes@www.drop.org
#97 | revisions_30.patch | 53.13 KB | killes@www.drop.org
#88 | revisions_29.patch | 54.83 KB | killes@www.drop.org
#86 | revisions_28.patch | 53.47 KB | killes@www.drop.org
#80 | revisions_27.patch | 39.05 KB | killes@www.drop.org
#78 | revisions_26.patch | 46.45 KB | killes@www.drop.org
#74 | revisions_25.patch.patch | 528 bytes | Jeremy
#73 | revisions_25.patch | 60.38 KB | killes@www.drop.org
#68 | revisions_24_0.patch | 60.39 KB | killes@www.drop.org
#67 | revisions_24.patch | 60.39 KB | killes@www.drop.org
#65 | revisions_23.patch | 61.17 KB | killes@www.drop.org
#61 | revisions_22.patch | 60.29 KB | killes@www.drop.org
#60 | revisions_21.patch | 59.42 KB | Anonymous (not verified)
#59 | revisions_20.patch | 75.52 KB | Anonymous (not verified)
#58 | revisions_18.patch | 52.31 KB | killes@www.drop.org
#57 | revisions_17.patch | 49.64 KB | killes@www.drop.org
#56 | revisions_16.patch | 52.49 KB | killes@www.drop.org
#48 | revisions_15.patch | 52.39 KB | killes@www.drop.org
#47 | revisions_14.patch | 49 KB | killes@www.drop.org
#46 | revisions_13.patch | 49.83 KB | killes@www.drop.org
#45 | revisions_12.patch | 49.83 KB | killes@www.drop.org
#44 | revisions_11.patch | 48.95 KB | killes@www.drop.org
#43 | revisions_10.patch | 49.67 KB | killes@www.drop.org
#41 | revisions_9.patch | 49.29 KB | killes@www.drop.org
#39 | revisions_8.patch | 55.71 KB | killes@www.drop.org
#38 | revisions_7.patch | 55.69 KB | killes@www.drop.org
#37 | revisions_6.patch | 49.18 KB | killes@www.drop.org
#36 | revisions_5.patch | 51.13 KB | killes@www.drop.org
#34 | revisions_4.patch | 30 KB | killes@www.drop.org
#33 | revisions_3.patch | 29.93 KB | killes@www.drop.org
#32 | revisions_2.patch | 49.49 KB | killes@www.drop.org
#31 | revisions_1.patch | 51.77 KB | killes@www.drop.org
#30 | revisions-bug.png | 76.06 KB | Dries
#29 | revisions_0.patch | 55.96 KB | killes@www.drop.org
#25 | revisions.patch | 52.96 KB | killes@www.drop.org
#22 | Drupal-Improved_Revision_Schema_10-26-2004-revisited.patch.gz | 11.02 KB | killes@www.drop.org
#18 | Drupal-Improved_Revision_Schema_10-26-2004.patch_0.gz | 11 KB | Jerk Face
#16 | Drupal-Improved_Revision_Schema_10-26-2004.patch.gz | 11 KB | Jerk Face
#10 | Drupal-Improved_Revision_Schema_08-23-2004.patch.gz | 10.92 KB | Jerk Face
#4 | Drupal-Improved_Revision_Schema_07-30-2004.patch.gz | 10.47 KB | Jerk Face

Comments

jhriggs’s picture

I too think the serialized approach is less than desirable, but here's an alternative. It would likely take considerable rework in core and contrib, but the following is how we handle similar situations in our databases at work. It is more elegant than a separate table and avoids the (almost exact) duplication of a table. Instead of separate tables, keep all revisions of nodes in the node table as follows:

* add field: active (0/1 or Y/N)
* add field: revision
* every revision of a node is stored in the node table; however, only one revision can be active at any given time
* nid can no longer be unique -- primary/unique key becomes (nid, active)
* any time a node is loaded, updated (without revision), etc., the active version is used.
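A sketch of how that single-table scheme might look (illustrative DDL, not from the comment; note that with several archived rows per node all sharing active = 0, something like (nid, revision) has to serve as the unique key in practice):

```sql
-- Illustrative sketch of the single-table scheme: all revisions live in
-- the node table, and at most one row per nid is marked active.
CREATE TABLE node (
  nid INT NOT NULL,           -- no longer unique on its own
  revision INT NOT NULL,      -- increments with each new revision
  active TINYINT NOT NULL,    -- 1 for the published copy, 0 otherwise
  title VARCHAR(128) NOT NULL,
  body LONGTEXT,
  PRIMARY KEY (nid, revision)
);

-- Loading a node always selects the active copy:
-- SELECT * FROM node WHERE nid = 42 AND active = 1;
```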

Thoughts?

killes@www.drop.org’s picture

I am not opposed to your scheme, but I want to stress the following:

* Duplicating a table's structure is not bad (IMHO) as long as the content is different.

* Having two tables will allow us to keep the node table rather small. This is (maybe) a performance gain.

jhriggs’s picture

I don't necessarily think that duplicating a table's structure is _bad_. It just seems wasteful and a pain to maintain. (Every change to the node table has to be made twice... easy to do, but also easy to miss.)

As for performance, as long as nid and the active indicator are indexed, there shouldn't be any performance loss. Also, archiving an old version when making a new revision will be much simpler: just change the active indicator rather than copying an entire node to another table (and ensuring everything gets copied... again, a potential maintenance issue).

To be honest, I would just like to see the serialized data go away, regardless of what approach is taken.

Jerk Face’s picture

I'm interested in using Drupal for a large scale wiki-type project. In order to do this, I need revisions to be in their own table.

Attached is a patch to do just that. Most of the changes are pretty self explanatory. Spreading out node data across two tables meant that I had to add database functions to do locking/transactions. Without this, race conditions in which the database becomes corrupted are possible.

Jerk Face’s picture

Oh yeah... The patch is a diff against Drupal CVS

Anonymous’s picture

Gerhard speaking.

Nick, thanks a lot for your nice patch! It saves me a great deal of labour. I looked through it and immediately liked it. You put not only the old revisions into a new table but also the current one. Do you have an estimate of how much more expensive the additional join is?

Besides a few minor coding style issues I found a major one: just a few hours before you uploaded your patch, JonBob's node access patch hit core. That means your patch won't apply anymore, as all the queries you change have themselves been changed. Can I bug you to update your patch?

Anonymous’s picture

Also I think that your upgrade path loses existing revisions.

drumm’s picture

I think this is the proper way to do things. No columns are duplicated, there is no serialized data, and only the fields that are logically revised are stored. Nothing jumped out at me as a way for my node modules to keep a table of revisions of additional fields. I'm guessing this could be done within the confines of _insert and _update.

Assuming the upgrade path works and modules can extend it I give it a +1.

Jerk Face’s picture

It figures that just as I finish a big patch, another patch comes along and breaks it. Oh well, it should be pretty easy to fix. I'll work on it.

Fixing the upgrade path to keep revisions should be fairly painless.

I found another issue that needs to be fixed before this patch gets merged. The format of a node needs to be stored for each revision. Otherwise, for modules that store a format for their nodes, such as page and book, if you write one revision in PHP and the next in HTML, the PHP revision will be displayed as HTML. This is part of a larger issue of how node modules should store revisions of additional fields. I think each module that wants to do this should create another table with (nid, revid) as the primary key, just as it creates another table with nid as the primary key when it wants to add fields to a node.
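For instance (a hypothetical table, purely to illustrate the convention suggested above), a module such as page could keep its per-revision format in its own table keyed on (nid, revid):

```sql
-- Hypothetical per-module revision table following the (nid, revid)
-- convention, so each revision remembers its own input format.
CREATE TABLE page_revisions (
  nid INT NOT NULL,
  revid INT NOT NULL,
  format INT NOT NULL,   -- e.g. HTML vs. PHP for this revision
  PRIMARY KEY (nid, revid)
);
```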

As far as performance goes, for sites that make heavy use of revisions, an extra join on primary keys is going to be a lot faster than grabbing all of the revisions from the database every time. We would need to run benchmarks to determine whether the overall difference in speed for an average site is a gain or a loss. I'm guessing it's very minor either way.

Jerk Face’s picture

Assigned: Unassigned » Jerk Face
FileSize
10.92 KB

Here's an updated patch against CVS that puts revisions in their own table, provides an upgrade path, and fixes the format related bugs in the last patch.

Hopefully, this can make it into CVS as soon as the freeze is over.

moshe weitzman’s picture

Interesting patch... drumm's question is still outstanding: how do modules store revisions of their fields? Are they expected to manage this on their own? That's not how it works today.

As an aside, I am seeing profile_ fields in my node.revisions column. One could argue that those need not be saved; they pertain to the node author, not to the node itself.

Jerk Face’s picture

Having modules be responsible for storing revisions of their own fields is a side-effect of storing revision data in tables. There's really no way around it. However, revisions generally don't make sense for node types that don't have PHP/HTML content, such as polls. I think it's going to be a pretty rare scenario for a new node type to want another field to change per-revision, so it's a pretty good trade-off.

Storing fields that shouldn't be part of revisions, such as the profile_ fields, is a side-effect of storing revisions as serialized objects. Applying this patch will free up that wasted space. :)

Anonymous’s picture

There should be a hook that lets a module declare whether it supports history. This way a module author can prevent the user from doing something that may break his module or just cause undefined behavior. If the module doesn't support history, then don't let the user/admin choose to add history to nodes of that type.

Craig

Jerk Face’s picture

I agree, there should be an API change to make specifying support for revisions easier. In the interests of keeping patches small and keeping to one change per patch, I think the API change should be a separate issue.

A sort of ad-hoc API to decide whether or not a module supports revisions by default already exists. Instead of having a hook, modules set the default value of the "Create new revision" field in the edit form. The admin can change this option in admin/node/configure/defaults. This patch doesn't change that.

Revisions are broken for node types that have their own database structure, like polls, even when storing them as serialized objects. This patch doesn't change that, either.

moshe weitzman’s picture

I'm guessing that someone is going to have to demonstrate that this patch performs as well as current Drupal before it gets committed. I think this patch is a few benchmarks away from being committed.

Jerk Face’s picture

I ran some really unscientific benchmarks, and it looks like this patch has a negligible effect on performance.

I used apache bench and the database from theregular.org, which doesn't contain any revisions (worst case scenario for this patch) and contains several hundred nodes. Both the patched and unpatched versions hovered between 2.36 and 2.38 requests per second.

The command I used was:
ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2' http://192.168.0.100/

An updated patch that should apply to CVS is attached.

elias1884’s picture

Please rethink the revision system's default workflow as well. Don't look at the revision system as an isolated system, but as part of the whole workflow system!

If you combine revisions with the moderation queue, the most logical default workflow would be this:

auth user creates node (revision #0)
admin approves the node (status = 1, moderation = 0)
=> node publicly available
auth user finds typo and changes node (revision #1, status = 0, moderation = 1)
-------
What happens at that point right now is that the node is not accessible at all anymore until the new revision is approved by the admin. Of course the new revision should not go online until it has been reviewed and approved, that is absolutely correct, but there is no reason to take the old revision offline, since it was already approved and should therefore stay online until the new revision is approved. It is not practical for a node to disappear just because the author corrected a typo.
-------
admin approves the node (status = 1, moderation = 0)

Even though I first thought a plain boolean active field would not be capable of providing that functionality, I finally came to the conclusion that it can. The trick is to set that bit not when a new revision is created, but when it is approved (in case moderation is activated under the default workflow). Every revision should have its own moderation, status, and active field, and on approval they are set like this: (status=1, moderation=0, active=Y).

When you want to roll back to an old revision, you can choose among all revisions that already have the moderation bit set back to 0 and the status bit set to 1. There should be an extra permission for rollback!

Another concern I have about the default workflow is that users can't see the content they have just created when moderation is enabled. Even though a big fat "submission accepted" message is presented after submission, inexperienced users tend to distrust the information those stupid tin cans give them if they can't find their content afterwards. Many users are really lazy and don't even read the status messages. The best feedback that a story was submitted successfully is being able to find it somewhere on the site, perhaps with a status message on top mentioning that the content is not publicly available yet because it has not been approved. There should be a "my content" section under "my account", like somebody is trying to do with the workspace module, I guess.

So my suggestion is to keep nodes with (status=0, moderation=1) visible to their creator under a "my content" section somewhere!

Jerk Face’s picture

I agree. The current workflow for moderation queues and revisions needs to change, but this patch isn't the place for it. The patch is already too big, and it only does the backend work.

Instead of adding more to this patch and making it take even longer to get into core, would you mind creating a new issue for your UI suggestions so those changes can be added as a separate patch?

Thanks,
Nick

Dries’s picture

This patch is _much_ needed so I'd love to see someone revive it. In order for this patch to be accepted, the following needs to be done:

  1. Update this patch to CVS HEAD.
  2. Rename revid to vid.
  3. Rename node_rev to node_revisions.
  4. Rename node_rev.changed to node_revisions.timestamp.
  5. Rename $rnode to $revision.
  6. Fix the coding style to match Drupal's: proper spacing, single quotes where possible, proper variable names.
  7. Benchmark this patch with a large database with enough revisions. I'd be happy to benchmark this on my local copy of the drupal.org database.
  8. The book.log field should probably move to the node_revisions table. This can be done in a separate patch.
  9. Investigate whether transactions are well-supported.
killes@www.drop.org’s picture

I've worked a bit on the patch (coding style issues, as mentioned by Dries). One thing I noticed is that the patch uses REPLACE. IIRC this needs to be changed to "UPDATE, and if that fails, INSERT" for pgsql compatibility.
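For reference, the portable pattern would look roughly like this in Drupal-style PHP (a sketch only; the query, columns, and use of db_query()/db_affected_rows() are illustrative, not code from the patch):

```php
// Portable alternative to MySQL's REPLACE: try the UPDATE first,
// and only INSERT when no existing row was touched.
db_query("UPDATE {node_revisions} SET title = '%s', body = '%s' WHERE nid = %d AND vid = %d",
  $node->title, $node->body, $node->nid, $node->vid);
if (!db_affected_rows()) {
  db_query("INSERT INTO {node_revisions} (nid, vid, title, body) VALUES (%d, %d, '%s', '%s')",
    $node->nid, $node->vid, $node->title, $node->body);
}
```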

Nick, are you still interested in working on that patch? I'd like to know how it works on your site and work on getting it into core.

Dries’s picture

Gerhard: your patch does not apply.

killes@www.drop.org’s picture

Yes, I know, that was the same version as I mailed to you earlier.

killes@www.drop.org’s picture

Assigned: Jerk Face » killes@www.drop.org
FileSize
52.96 KB

Ok, updated the patch to CVS.

Dries’s picture

Some more comments:

  • db_begin_transaction() and db_end_transaction() do not belong in database.inc, but in database.mysql.inc and database.pgsql.inc respectively.
  • The node module calls node_revisionsision_list() which is not defined. (Fixed that on my local copy.)
  • Do db_begin_transaction() and db_end_transaction() deprecate Jeremy's table locking patch?
  • The upgrade path assigns the wrong user ID to each revision.
  • The upgrade path assigns the wrong date to each revision (that or a node's revision page shows the wrong usernames/dates).
  • The coding style needs a bit of work, but we can worry about that later.
Jerk Face’s picture

If you need any help getting those things fixed, just let me know.

Jerk Face’s picture

How this relates to Jeremy's node locking patch:

There was lots of discussion, and node locking was decided against because from an end user point of view you never want a node to be locked. He's now advocating for a much simpler patch that warns users if their changes will overwrite someone elses. That patch still has a race condition, which might be fixed using db_begin_transaction().

http://drupal.org/node/6025

killes@www.drop.org’s picture

FileSize
55.96 KB

Here is an updated patch that tries to address Dries' concerns.

Dries’s picture

FileSize
76.06 KB

It didn't fix the aforementioned bugs. See attached screenshot.

killes@www.drop.org’s picture

FileSize
51.77 KB

Ok, here is a new version. Dries and I worked hard on it, so please have a look.

What is still missing:

- database upgrades for the core modules that have their own tables
- contrib modules need an upgrade too
- do we need nid and vid in both the node and the node_revisions table?
- the number of SQL queries calls for good stress testing on large databases

killes@www.drop.org’s picture

FileSize
49.49 KB

Here is an updated patch. We discussed keeping the current title in the node table and also in the revisions table. This is content duplication, but it will save many joins, as many queries only need the title of a node. Discussion is welcome.

killes@www.drop.org’s picture

FileSize
29.93 KB

I've implemented the aforementioned solution. This makes the patch much smaller. The patch now also removes taxonomy_node_has_term(), which wasn't used anywhere. I'd really appreciate it if some people could test-drive the patch. It will be another huge improvement for 4.6.

killes@www.drop.org’s picture

FileSize
30 KB

Another revision. Steven didn't like my literal $node->vid in queries.

killes@www.drop.org’s picture

- database upgrades for the core modules that have their own tables
- contrib modules need an upgrade too
- do we need nid and vid in both the node and the node_revisions table?
- the number of SQL queries calls for good stress testing on large databases

These issues are still open, btw. Especially the first one needs to be tackled.

killes@www.drop.org’s picture

FileSize
51.13 KB

Here is a patch that has the database tables updated for forum, book, and page module.

killes@www.drop.org’s picture

FileSize
49.18 KB

Yet another update to keep it working with HEAD. The patch now also removes the table definitions for the page table.

killes@www.drop.org’s picture

FileSize
55.69 KB

Sorry, that was the old version, this is the right one.

killes@www.drop.org’s picture

FileSize
55.71 KB

Updated once more.

Dries’s picture

Anyone to help review/test this?

killes@www.drop.org’s picture

FileSize
49.29 KB

Updated again, the update functions occurred twice. Thanks Bart.

killes@www.drop.org’s picture

Don't know if the db I am using is corrupted or what. I am still having some difficulties.
The latest patch is attached.

killes@www.drop.org’s picture

FileSize
49.67 KB

I am probably slowly going mad ...

killes@www.drop.org’s picture

FileSize
48.95 KB

The update issue still needs investigating. This patch is updated for cvs.

killes@www.drop.org’s picture

FileSize
49.83 KB

Ok, here is a new version. I've solved my troubles with book.module. There are still some issues with forum module. Possibly due to inconsistent database.

killes@www.drop.org’s picture

FileSize
49.83 KB

Turns out the drupal.org database did indeed have some quirks. Please run this query on your oldest db and tell me the result:

select nid,type from node where type like '%/%';

If you get a non-zero result we might need to add another security update.

The patch could use still more testing, though.

killes@www.drop.org’s picture

FileSize
49 KB

Ok, we are getting somewhere. At a first glance the update is working. There is a problem remaining: the revisions tab will be shown whether the node has revisions or not. Not sure we can/need to fix this.

People with a drupal.org account can log in at http://killes.drupaldevs.org/revision/ and poke around. Your permissions will be the same as on drupal.org. Feel free to break everything, but don't forget to file complaints here. (Note: this is only a pruned version of the drupal.org database, with all project nodes and nodes with nids > 7000 dropped.)

killes@www.drop.org’s picture

FileSize
52.39 KB

There was an error in node_save, and the patches to the database.inc files also got lost...

robertDouglass’s picture

Submitting book pages doesn't work on your test site. It puts the entire content of the preview inside the body textarea. I wrote a sentence in the body and the log, and pressing preview put several lines of HTML containing both sentences in the body textarea on the preview page, plus the book page wouldn't submit.

-R

Junyor’s picture

0 results here. I started using Drupal with version 4.4, though.

killes@www.drop.org’s picture

@Junyor: Thanks, that's a good sign. Maybe somebody else has an older db to try.

@robertDouglass: The first effect you describe is due to drupaldevs running on PHP 5. I am unsure why the second thing does not work. In node_save() the node object has a nid although there is none in the form. Very strange.

I've enabled display of db queries on the testsite.

dmjossel’s picture

No results here on the query:

select nid,type from node where type like '%/%';

On a database that was put in place prior to Drupal 4 and is now running on 4.5.2.

killes@www.drop.org’s picture

@dmjossel: thanks.

@all. The strange problem I reported was apparently php 5 related. After applying Steven's php 5 patch it went away. One error is remaining: If I create a new forum topic it will be shown as part of the book on preview. Hmm, that was due to a db that got corrupted during testing so that is fixed too.

Please poke around at the test site and look if you find more errors.

Steven’s picture

By the way, I just remembered that Drupal.org has some blogs lingering on in the database even though blog.module is not enabled. Perhaps this is causing troubles?

Anonymous’s picture

I can't see why it would. Drupal.org will need extra updates for images and project nodes because those have their own tables. GK.

killes@www.drop.org’s picture

FileSize
52.49 KB

Updated to apply to cvs again.

killes@www.drop.org’s picture

FileSize
49.64 KB

Updated again.

All we need is a patch to upload module and an upgrade path for it.

killes@www.drop.org’s picture

FileSize
52.31 KB

Updated once more. Moved log field from book to node_revisions table as discussed in Antwerp. upload module still missing.

We need to decide under which circumstances the log field should be displayed. Should that be added to the workflow? Should it depend on the revisions setting?

Anonymous’s picture

Category: task » bug
FileSize
75.52 KB

Ok, here it is: Yet another revision of this grrrrreat patch.

Changes from previous versions:
- supports versioning for uploaded files. A problem is that if you delete a file, it will be gone for all revisions.
- the log field is now in the node_revsions table, but each module has to decide whether to show it or not.
I've implemented it for the page and the book type odes. Also, the field can be edited when adding non-book nodes to the book. The log is displayed on the revisions page and if a node is moderated.
- the revisions are moved to an old_revisions table to a) get the node table smaller and b) still leave the mavailable for contrib modules that want to retreive old version data.

The patch has been applied to killes.drupaldevs.org/revision where it can be tested by anybody (especially people who have "site admin" rights on drupal.org).
The database is from drupal.org, and you should be able to log in with your password or simply mail yourself a new one.

Gerhard

Anonymous’s picture

FileSize
59.42 KB

BTW, I marked this as a bug because at the moment the revisions field can grow quite big. Neil has reported problems from some users who were not able to load some nodes due to too many large revisions.

Also, some unrelated stuff crept into the patch. New version attached.

killes@www.drop.org’s picture

Version: » x.y.z
FileSize
60.29 KB

Ok, I think I got it.

Changes to last version:

- uploads are now properly versioned.

Still missing are pgsql checks and updates.

Anonymous’s picture

Version: x.y.z »

Was able to get http://drupal.org/files/issues/revisions_21.patch to work with drupal-cvs.tar.gz (10 March 2005) by:

- includes/database.mysql.inc: commenting out the duplicate definitions of db_begin_transaction() and db_commit_transaction()

- modules/node.module: removing "'title' => $node->title," from the $node_table_values declaration and removing "'title' => "'%s'"," from the $node_table_types declaration.

Happy to submit a patch if requested. I'll watch this thread.

killes@www.drop.org’s picture

The duplicate function has been removed in rev 22 of this patch.

Why do you think the changes in node_save are needed? Titles are saved in both tables for performance reasons.

jlerner’s picture

Hi - I posted comment #62. The changes to node_save appear to be needed because recent patches (both 21 and 22) remove the field 'title' from table 'node'. So without the changes to node_save, node.module is broken and generates errors.

Joshua

killes@www.drop.org’s picture

FileSize
61.17 KB

Thanks, Joshua, for catching this. node:title is there to stay.

moshe weitzman’s picture

since HEAD is open again, perhaps it is a good time to revisit this patch.

once this is committed, lets address - http://drupal.org/node/11071 "node_validate does not respect group editing"

killes@www.drop.org’s picture

FileSize
60.39 KB

Updated.

killes@www.drop.org’s picture

FileSize
60.39 KB

Updated.

Dries’s picture

I'll commit this patch later this week! If you haven't checked this patch already, I urge you to test/check it out because it will have significant impact on existing code and modules!

Dries’s picture

Also, what do people think about the n.title being duplicated?

chx’s picture

I won't lose any sleep because of duplicated titles...

killes@www.drop.org’s picture

Let me explain why I have chosen to duplicate the title (and also the uid): if you look at all the queries in Drupal, you will find that most of them only need the title and the uid of a node. By duplicating them, we save expensive joins on the node_revisions table. Because of this, the patch is actually a performance improvement.
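Schematically (these queries are invented for illustration, not taken from the patch), the duplication means a typical listing query stays on one table:

```sql
-- With title and uid duplicated into node, listings need no join:
SELECT nid, title FROM node WHERE status = 1;

-- Without the duplication, every such query would pay a join:
SELECT n.nid, r.title FROM node n
INNER JOIN node_revisions r ON n.vid = r.vid
WHERE n.status = 1;
```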

A note about updating contrib modules:

Strictly speaking, they don't need to be updated. They only need to be if their authors decide that their info should be available for revisioning. The upgrade path for forum.module in my update.inc patch (plus the forum patch) should show you what needs to be done. I will write a note for the update page once the patch hits core.

killes@www.drop.org’s picture

FileSize
60.38 KB

Updated to cvs.

Dries: Based on some remarks in #drupal this is the last update I am going to do. Apply it or won't fix it.

Jeremy’s picture

FileSize
528 bytes

That's a big patch. I've only started looking through it. I noticed one little typo, affecting updates. A patch to your last patch is attached.

I'm running with the revision patch on my dev server now happily. I like the concept.

What happens if you click 'stop' on your browser in the middle of a MySQL "transaction"? I assume that kills the connection to MySQL, and the lock is freed? But this then leaves changes only partially applied?

What exactly does locking/unlocking the tables buy us in MySQL? I don't see anywhere that we detect if an apply fails part way through, and thus roll back...? What am I missing?

Dries’s picture

Jeremy: many of us are worried about the performance ramifications this patch introduces. Early experiments showed a small performance improvement (while a performance regression might be expected). More performance reports from large sites like kerneltrap.org would certainly help this patch. Mind doing a quick performance comparison and reporting back with some numbers? Thanks.

Jeremy’s picture

Dries: I'm not running HEAD on kerneltrap, so this really isn't a possibility. Furthermore, until I understand why we're locking tables, I don't like it. The idea of revisions in their own tables is great. The idea of locking tables to get there (without any obvious benefit) really worries me.

killes@www.drop.org’s picture

@Jeremy: Thanks for looking at the patch! Also for catching the typo. :)

Did you try to upgrade your database? If yes, how did it go? One of
Dries' concerns is the complexity of the upgrade. How many nodes and
revisions did the db have?

About database locking: This part of the patch was created by Nick and
I simply continued to use it.

Maybe the code should rather be:

if (db_begin_transaction(array('{node}', '{node_revisions}', '{watchdog}', '{sessions}', '{files}'))) {
  db_query($node_query, $node_table_values);
  db_query($revisions_query, $revisions_table_values);
  db_commit_transaction();
  ...
}

The idea is probably to avoid two updates at the same time. I don't
think the locking helps if you abort the script at an inconvenient
time. Rollbacks aren't implemented in all mysql versions.

We could omit the db locking if deemed inappropriate. Maybe Nick can
explain his ideas behind this.

@Dries: I wonder who the "many of us" are. They certainly haven't
spoken to me. Moshe had some reservations about the upgrade path and
project module, but the time that project module abused revisions to
store issue updates was long ago and his reservations were resolved.
Nobody else (besides you of course and now Jeremy) has voiced
reservations in a way that was audible to me.

If you grep through the patch you will notice that there are only four
queries which have a join on the node_revisions table. Two of them are
in node_load and in the other cases the join replaced a join on the
node table. The two queries in node_load are the only ones that have
both a join on the node and the revisions query. Thus, loading of
individual nodes might become somewhat slower. All other queries will
be faster since the node table is now much smaller. Also, node loading
does not have to be slower, it depends on your node table. If you had
a lot of revisions and thus a large table, then the new scheme will
make your queries actually faster since we do not load the revisions
on each and every node load anymore. If you didn't have many revisions
your node_load might be somewhat slower.

WRT the update script, Karoly pointed out that we could use multiple-row
insert queries instead of one query per revision. This would probably
make the update somewhat faster. I am willing to work on this iff you
declare that you will commit the patch afterwards. I'd need to know if
this will work on pgsql and on all supported mysql versions before I
work on it.
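For illustration, the multiple-row insert Karoly suggested would look roughly like this (the column names here are only a sketch, not the patch's actual schema):

```sql
-- Current approach: one statement (and one round-trip) per revision.
INSERT INTO node_revisions (nid, vid, title, body) VALUES (1, 1, 'Rev 1', 'first body');
INSERT INTO node_revisions (nid, vid, title, body) VALUES (1, 2, 'Rev 2', 'second body');

-- MySQL "extended insert": many revisions in a single statement.
INSERT INTO node_revisions (nid, vid, title, body) VALUES
  (1, 1, 'Rev 1', 'first body'),
  (1, 2, 'Rev 2', 'second body'),
  (1, 3, 'Rev 3', 'third body');
```

The multi-row form works on all supported MySQL versions, but PostgreSQL did not accept multi-row VALUES lists until 8.2, so the update script would need a database-specific code path.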

About locking: Database locking is dog slow, at least on mysql. I was
using locks in an earlier version of the upgrade script but had to
remove it for (serious!) performance reasons.

killes@www.drop.org’s picture

FileSize
46.45 KB

Ok, another update, because I need it myself.

I've left out the transaction stuff for now. It is in principle unrelated to this patch and should be discussed elsewhere.

This also makes the patch smaller and easier to review (hint, hint).

killes@www.drop.org’s picture

The patch contained the update functions twice.

killes@www.drop.org’s picture

FileSize
39.05 KB

The patch contained the update functions twice.

Dries’s picture

Got one error during the upgrade path:

ALTER TABLE {book} ADD PRIMARY KEY vid (vid)
FAILED
killes@www.drop.org’s picture

This had happened to me as well, when I tested this patch. The reason is that for some reason the vid is not unique. Most likely there are some entries with vid = 0 in there. Can you check which node types those have? It was always an error in the test database. See: http://drupal.org/node/7582#comment-20678
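A quick sketch of how to check (assuming the new vid column has already been added by the update):

```sql
-- Count the rows that never got a real revision id, broken down by type.
SELECT type, COUNT(*) AS broken FROM node WHERE vid = 0 GROUP BY type;
```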

Dries’s picture

Actually, I got 2850 errors during the upgrade.

Some of these:
sprintf() [function.sprintf]: Too few arguments in drupal-cvs/includes/database.inc on line 154.

Some of these:
Query was empty query: in drupal-cvs/includes/database.mysql.inc on line 66.

And this:
Unknown table 'n' in field list query: SELECT n.nid, n.vid FROM node INNER JOIN files f ON n.nid = f.nid in drupal-cvs/includes/database.mysql.inc on line 66.

:-)

Dries’s picture

Or this:

user error: Unknown column 'log' in 'field list'
query: SELECT parent, weight, log FROM book WHERE nid = 1 in drupal-cvs/includes/database.mysql.inc on line 66
Dries’s picture

The time required to generate my main page went from 902 ms (before upgrade) to 2139 ms (after upgrade).

The time required to generate a forum listing (?q=forum/x) went from 1872 ms (before upgrade) to 2874 ms (after upgrade).

Maybe this is because my database is not consistent as the result of the upgrade errors (yet I don't see any errors on the pages I benchmarked).

killes@www.drop.org’s picture

FileSize
53.47 KB

Ok, let me get to this from the bottom to the top:

- my test runs indicated a different development wrt timing. If I had gotten your results, I would have stopped working on this long ago. So your results are wrong for some reason.

- user error: Unknown column 'log' in 'field list'
Wasn't my day, the book patch got lost. It is contained now. First -R the old patch, then apply this one.

- Unknown table 'n' in field list query:
Walkah found this, but I forgot to fix it. Fixed now.

- I've no idea where the other queries come from. I am hoping that either your test db is broken or they are follow-ups from the other ones.

If you let me have your test db, I'll try some debugging.

Thanks for wasting your time, too.

Dries’s picture

I double-checked and the numbers don't seem to lie. I'll test some more after work on another machine to make sure it is not platform-specific.

killes@www.drop.org’s picture

FileSize
54.83 KB

Ok, here I am again.

What I did:

1) Ask Dries to let me have the drupal.org database
2) get 400MB of SQL inserts...
3) take 23 minutes to import said data
4) Remove all image and project nodes (don't want to install their modules), 11765 nodes left
5) back up data
6) take tests on non-cached /node page (as anonymous user).

ab results:

-c 1 -n 25:

Requests per second: 1.29 [#/sec] (mean)

Connection Times (ms)
min mean[+/-sd] median max
Total: 663 775 179.7 689 1264

7) Do the same for the tracker page:

Requests per second: 0.83 [#/sec] (mean)

Total: 1182 1199 7.4 1199 1217

8) Apply my patch (rev. 28).

9) run db update and hold breath

10) update times out...

11) play back backup from 5)

12) wait

13) getting annoyed and removing cache, watchdog, and accesslog before playing back the backup.

14) wait again. Understand why Dries doesn't try this patch often. Maybe a smaller DB would do for testing?

15) wait more. get really annoyed.

16) Set time limit to 18000 in update.php

17) try again

18) fails again before the second update is completed.

19) curse.

20) delete search stuff from db. Ooops, sooo much smaller...

21) import again, below 2 minutes...

22) rewrite to use extended insert. Found a bug.

23) still does not complete. Mysql logging to the rescue!

24) tid = 0? Not good.

25) Well, the update works fine till node 10834. 5595 nodes done, 6136 to go.

26) Writing a shell-based update script. Discovery: 24 MB aren't enough. Hopefully 64 are. Nope.
Extended inserts for revisions are apparently not the brightest idea: huge memory consumption.
Hmm, no, all updates got through. Selecting the revisions to put them into the old_revisions table screwed it. Learned about the CREATE TABLE old_revisions SELECT syntax.
Yay! Finally. Still, 24 MB are just not enough; the update.php script seems to still break.

27) Benchmarks!

/node
Requests per second: 1.54 [#/sec] (mean)
Connection Times (ms)
min mean[+/-sd] median max
Total: 632 649 40.5 636 791

/tracker

Requests per second: 0.86 [#/sec] (mean)

Total: 1119 1165 65.8 1160 1461

Ok: So we get an improvement for many node_loads, but none for simple selects from node.
More tests can be done.

28) roll new patch
Ain't Drupal fun?
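For reference, the CREATE TABLE ... SELECT trick from step 26, sketched against the old schema (the serialized revisions column is the one from the pre-patch node table):

```sql
-- Copy the old serialized revisions aside in one server-side statement,
-- instead of pulling every row through PHP (MySQL syntax).
CREATE TABLE old_revisions SELECT nid, type, revisions FROM node WHERE revisions != '';
```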

Dries’s picture

I did another round of tests on _another_ machine and it is not looking good:

                              Before upgrade        After upgrade

?q= (main page)               218 ms/request         340 ms/request
?q=forum (forum overview)     754 ms/request        1520 ms/request
?q=about (book page)          375 ms/request        5400 ms/request

The upgrade process itself gave me a number of 'query was empty' and 'sprintf(): too few arguments' reports. Everything seems to work fine though.

Looking at the ?q=about page, I see that the following query is executed twice _and_ that each time, it takes more than 2 seconds to complete:

SELECT n.nid, n.title, b.parent, b.weight FROM node n INNER JOIN book b ON n.vid = b.vid WHERE n.status = 1 AND n.moderate = 0 ORDER BY b.weight, n.title; 
--8<-- snip --8<---
351 rows in set (2.39 sec)

If you investigate this some more, you see that there is no index on vid?

EXPLAIN SELECT n.nid, n.title, b.parent, b.weight FROM node n INNER JOIN book b ON n.vid = b.vid WHERE n.status = 1 AND n.moderate = 0 ORDER BY b.weight, n.title;
+-------+------+--------------------------------+------+---------+------+-------+---------------------------------+
| table | type | possible_keys                  | key  | key_len | ref  | rows  | Extra                           |
+-------+------+--------------------------------+------+---------+------+-------+---------------------------------+
| b     | ALL  | NULL                           | NULL |    NULL | NULL |   369 | Using temporary; Using filesort |
| n     | ALL  | node_moderate,node_status_type | NULL |    NULL | NULL | 15294 | Using where                     |
+-------+------+--------------------------------+------+---------+------+-------+---------------------------------+

mysql> SHOW INDEX FROM node;
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name            | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| node  |          0 | PRIMARY             |            1 | nid         | A         |       20392 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | uid                 |            1 | uid         | A         |        4078 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_title_type     |            1 | title       | A         |       20392 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_title_type     |            2 | type        | A         |       20392 |        4 | NULL   |      | BTREE      |         |
| node  |          1 | node_moderate       |            1 | moderate    | A         |           2 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_promote_status |            1 | promote     | A         |           2 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_promote_status |            2 | status      | A         |           4 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_type           |            1 | type        | A         |           8 |       10 | NULL   |      | BTREE      |         |
| node  |          1 | created             |            1 | created     | A         |       20392 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_created        |            1 | created     | A         |       20392 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_changed        |            1 | changed     | A         |       20392 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_status_type    |            1 | status      | A         |           2 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_status_type    |            2 | type        | A         |          13 |     NULL | NULL   |      | BTREE      |         |
| node  |          1 | node_status_type    |            3 | nid         | A         |       20392 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
14 rows in set (0.00 sec)

mysql> SHOW INDEX FROM book;
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| book  |          1 | book_parent |            1 | parent      | A         |          92 |     NULL | NULL   |      | BTREE      |         |
| book  |          1 | nid         |            1 | nid         | A         |         369 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.00 sec)

The book table does not appear to have a primary key? Sounds like a bad idea, so I added one:

mysql> ALTER TABLE book ADD PRIMARY KEY nid (nid);
Query OK, 369 rows affected (0.02 sec)
Records: 369  Duplicates: 0  Warnings: 0

Next, I wanted to make the vid column a unique key in all node tables:

mysql> ALTER TABLE node ADD UNIQUE vid (vid);
Query OK, 20392 rows affected (0.81 sec)
Records: 20392  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE book ADD UNIQUE vid (vid);
ERROR 1062: Duplicate entry '0' for key 2

mysql> ALTER TABLE forum ADD UNIQUE vid (vid);
Query OK, 10806 rows affected (0.10 sec)
Records: 10806  Duplicates: 0  Warnings: 0

As you can see, it fails for the book table which makes me believe there is some inconsistent data ... I set out to fix that:

mysql> SELECT nid, COUNT(nid) AS vids FROM book GROUP BY vid HAVING vids > 1;
+-----+------+
| nid | vids |
+-----+------+
| 871 |    2 |
+-----+------+
1 row in set (0.00 sec)

mysql> SELECT title FROM node WHERE nid = 871;
Empty set (0.00 sec)

mysql> DELETE FROM book WHERE nid = 871;
Query OK, 1 row affected (0.00 sec)

mysql> ALTER TABLE book ADD UNIQUE vid (vid);
Query OK, 368 rows affected (0.01 sec)
Records: 368  Duplicates: 0  Warnings: 0

Looks like everything is well now. Ran some new benchmarks:

                            Before upgrade      After upgrade      With indices
?q= (main page)             218 ms/request       340 ms/request     336 ms/request
?q=forum (forum overview)   754 ms/request      1520 ms/request    1531 ms/request
?q=about (book page)        375 ms/request      5400 ms/request     475 ms/request

Unfortunately, we're still slower than the original code.

killes@www.drop.org’s picture

Dries, thanks for testing it again.

I do think the broken queries you observed have something to do with the bad performance after the update. Please log the queries and I will have a look at them. I've never seen any such queries.

My update script also tries to create the appropriate indices, but it will of course fail if the database contains cruft. The indices for the forum are probably missing, too.

I am still convinced that the patch is core worthy.

Dries’s picture

It wouldn't hurt if more people would benchmark this patch. The patch's current performance worries me.

Did you check your watchdog messages after upgrading the drupal.org database? Depending on your settings, errors might only be shown in the watchdog. I'll look into the remaining glitches as time permits.

Thanks for your persistence in keeping this patch up-to-date. :)

killes@www.drop.org’s picture

Dries: Can you please let me have your updated database? I want to have a look at it and try my own benchmarks with it.

And yes, if I learned anything on this project, it is how to be persistent. ;)

killes@www.drop.org’s picture

Here is an idea that occurred to me:

The problem with the upgrade process is that keeping the existing revisions requires a lot of work. It generates a huge number of SQL queries for a large database and also requires a huge amount of memory.

My suggestion is to let update.php only handle the basic upgrade, i.e. without old revisions. An additional module could be created that would implement a cron-based approach to upgrading old revisions one node at a time. It could expose a hook to let contrib modules do their own upgrades.
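A minimal sketch of what that cron-based converter could look like (the function and hook names here are hypothetical, not part of the patch):

```php
// Convert a small batch of nodes per cron run, so neither the PHP time
// limit nor the memory limit is hit on large sites.
function legacy_cron() {
  $result = db_query_range('SELECT DISTINCT nid FROM {old_revisions}', 0, 50);
  while ($row = db_fetch_object($result)) {
    legacy_convert_revisions($row->nid);
    // Let contrib modules migrate their own per-revision data.
    module_invoke_all('convert_revisions', $row->nid);
  }
}
```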

Dries, what do you think? (I am writing "Dries" because he seems to be the only one who is interested in getting this into core...)

Junyor’s picture

Killes:

I'm also interested in seeing this hit core. What about adding something to legacy.module to do it?

chx’s picture

This is a sensible approach. Maybe this is the _only_ sensible approach.

I have a little problem though: while the conversion is running somehow both revision handlers should be available.

killes@www.drop.org’s picture

hehe, one only has to whine and bug enough and one gets some feedback. ;)

@junyor: legacy.module would be a good place. My current idea is to auto-enable it in update.php and then disable it again in legacy_cron after all nodes are updated. ;)

@chx: When somebody wants to look at revisions of a node that node could be auto-updated.

The only problem is contrib modules: they'd need to have some hook in order to update their own data. When somebody looks at the revisions of a node that cannot be updated because the contrib module in question has no such hook, we can optionally let the user discard old revisions, I guess.

Dries, what do you think?

killes@www.drop.org’s picture

FileSize
53.13 KB

Sooooo; I've updated this patch once again.
Dries didn't like my idea of legacy updates so we will have an option to discard old revisions in case the update should prove difficult. Always keep a backup.

Bèr Kessels’s picture

can we please accept and commit this patch? We can iron out any issues later. This patch is just far too big and complex to be perfect-at-once.

Dries’s picture

@Ber: no, we can't commit this patch blindly.

Data loss is a much bigger problem than a syntax error or other code glitch. We can break Drupal but we can't break people's data.

I've spent quite a bit of time testing the previous version of this patch and noticed significant performance degradation.

Tell me, Ber, why can't you test this patch first?

Bèr Kessels’s picture

I was not referring to not testing it. If it is just the upgrade path that proves to be cumbersome, then why can we not fix it afterwards, i.e. when everyone has had a good chance to look at it?

Testing such a huge patch requires a lot of work, something no-one just does in his spare hours. We had discussions before about dealing with big changes in a slightly different way: committing them quicker and leaving the ironing out of any leftovers to the community. I hooked into that discussion here, for two reasons. One is that Gerhard has spent numerous hours on maintaining this patch. The other is that the community can help iron out issues in such a large change much better than Gerhard can on his own.
And yes, data loss is very bad, but no-one should lose data if he/she followed the instructions (backing up)...

Dries’s picture

Ber: applying this patch takes 15 seconds. Whether I apply this patch for you, or whether you apply it yourself, it will hardly reduce the 'testing cost'. The problem is that data loss can be subtle; it might go unnoticed for a couple days. Make no mistake, I'd like to see this patch committed ASAP, but it warrants some testing. Let's test/benchmark it and commit it.

drumm’s picture

The upgrade script already takes a long time to execute and does not provide feedback to the user about how it is doing. I plan on making the upgrade script able to spread the updates across multiple page views and give user feedback showing that progress is being made. This will hopefully make the speed of this update a moot point.

I am working on this for my dayjob, CivicSpace, so it should get done "real soon now", but it should be expected to take a while (a smallish number of weeks). Please make compromises if necessary.

killes@www.drop.org’s picture

FileSize
48.78 KB

@drumm: the timeout isn't usually the problem; memory consumption is. I'll look into doing the updates in chunks of 1000 nodes or so.

Clousseau found a bug in the patch, updated.

moshe weitzman’s picture

Even if the upgrade issues are resolved, we have to figure out why this patch is slowing down drupal (according to benchmarks posted here).

Dries’s picture

We can't proceed without some additional benchmarking. I already benchmarked it on two machines (see comments above), and I'll benchmark it again after August 10. The performance impact of this patch worries me, so if two other people could test and benchmark this patch extensively, that would go a long way. August 10 is close to the code freeze, so some help is necessary.

If Gerhard/killes reviewed/tested one of your patches, it is time to return the favor ...

Jose Reyero’s picture

After looking at the patch I'm not really surprised this slows down everything. I thought that the reason why we wanted revisions in their own table was in the first place to have a simpler -and faster- node table. But this patch:
- Adds fields and complexity not only to the node table but also to all the main tables for data
- Needs additional joins just to retrieve single nodes (which is done many times for a typical main page)
- Does away with the encapsulation of the revision functionality, requiring other modules to handle revisions-related data.

So please, please, please:

- Make revisions table only needed when actually using revisions
- Keep that old nice thing of hiding revisions system from other modules
- And what's so bad about serializing data anyway? I agree that usually there are better ways to store data. But this is one of the cases where serialization does make sense. Not the current huge serialized field, but a new table with one serialized node per row.
- In general, do it much simpler. This is way too complex and it removes more functionality than it adds.

In just four words: keep it simple, please.

killes@www.drop.org’s picture

FileSize
51.27 KB

Patch updated (v31 was buggy, and there was an update to updates.inc). I've had a discussion with Jose and he will maybe do some testing over the weekend. John is also doing some testing as we speak.

Jerk Face’s picture

It has been exactly one year since I first submitted this patch.

You know what that means: old patch party in IRC! Virtual beers on me!

w00t!
Jose Reyero’s picture

Hi again.
I had a talk with killes -I have to say he enlightened me about this- and also did some testing. Sorry, but I don't have a good setup to provide benchmarks.

Here are my conclusions -and also some explanation for the varied results about performance:

- This is faster for simple node listings
- This is slower for full node/load
....thus the performance increase/decrease for a given page should depend on the relation between the number of simple node listings (usually blocks) and the number of nodes displayed in the main page.

I've also experienced some trouble with attachments -not re-created when creating a new revision- and book relations, same.

I'm not sure I like this new "feature" of creating a new revision every time you rollback an old one -in Drupal 4.6 this seems not to happen.

Besides these details, I think this patch is good enough -and quite a powerful approach, actually- though I still feel it's a bit too complex when simply dealing with nodes -without revisions-, which is what we do most of the time, and the complexity will be even higher with contrib modules using still more tables, not to mention flexinode -versioning info will have to be scattered everywhere in the db.

So this is not a +1 nor -1. On one side I think this is powerful. On the other side I feel this is too complex and really, it's more than I need.

I would be happy with a 'revision' table having: nid, vid, data(serialized), that you can forget about when not using revisions.
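Spelled out, Jose's alternative would be little more than this (a sketch of his suggestion, not what the patch implements):

```sql
-- One serialized node object per old revision; the node table stays as it
-- is today, and this table simply stays empty when revisions are unused.
CREATE TABLE revisions (
  nid int unsigned NOT NULL default 0,
  vid int unsigned NOT NULL default 0,
  data longtext NOT NULL,
  PRIMARY KEY (nid, vid)
);
```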

Jerk Face’s picture

On a much more serious note than my last comment, unless I'm missing something, you've introduced a subtle bug into this patch since I left it.

A and B are both updating revision 5 of a node.

A calls db_next_id() and gets a revision id of 6
B calls db_next_id() and gets a revision id of 7
B updates the node's revision to 7
B adds revision 7 to the revision table
A updates the node's revision to 6
A adds revision 6 to the revision table

Now, you have a situation where the current revision (6) isn't actually the last revision (7). For some uses, that's not a big deal, but in the case of wikis, which is what I originally hoped to use this patch for, it creates problems.
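One possible guard against this lost update, sketched here as an assumption rather than anything in the patch: make promoting the new revision conditional on the revision the editor started from, and check the affected-row count:

```php
// Optimistic check: $node->old_vid (hypothetical field) holds the vid the
// user loaded before editing. Only promote the new revision if nobody
// else replaced that revision in the meantime.
db_query('UPDATE {node} SET vid = %d WHERE nid = %d AND vid = %d',
  $new_vid, $node->nid, $node->old_vid);
if (!db_affected_rows()) {
  // Somebody saved first: report an edit conflict instead of silently
  // making an older revision the current one.
  drupal_set_message(t('This node was changed by another user while you were editing it.'), 'error');
}
```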

killes@www.drop.org’s picture

FileSize
51.27 KB

Updated the patch. updates.inc was changed and chx found a typo.

@Nick: I don't think the issue is due to this patch. Drupal currently has no mechanism to prevent people from committing a new version of the same node at the same time, whether with or without revisions. db_next_id ensures that the revision IDs are unique and that the later revision will be the current one. Which revision is the current one does not depend on the order of the version IDs, btw.

@Jose: what exactly were your problems with uploads? what do I need to do to reproduce?

Jose Reyero’s picture

Sorry, I don't know what I did the other day, but I cannot reproduce it either :-)

I applied this new patch. Problems I'm having now (testing with "stories"):

- Book outline is gone after creating a new revision
- I cannot delete attachments (not sure it is related to your patch or to something else)
- Usage of the log message: Is that message intended for revisions or only for book outlines? Currently the field is displayed only for outlines but shown in the revision tab.

Btw, the db update failed for me with patch 32, so I did it manually. I didn't do a db update again after rolling back v32 and applying v33 -should I?

killes@www.drop.org’s picture

Jose, thanks a lot for testing!

"- Book outline is gone after creating a new revision"

Sounds like a bug, it should not.

"- I cannot delete attachments (not sure it is related to your patch or to something else)"

Another bug

"- Usage of the log message: Is that mensage intended for revisions or only for book outlines? Currently the field is displayed only for outlines but shown in the revision tab"

Log messages are possible for nodes of type book and page, as well as all nodes that are put into the book. Log messages can be added for any node type; module authors should implement them.

"Btw, the db update failed for me in patch 32, so I did it manually. I didnt do a db update again after rolling back v32 and applying v33 -should I?"

No, the db update should be the same.

I will try to fix the bugs you found.

WRT the db update I've got an announcement to make: I will not migrate old revisions.

Why?

1) It is a rather complicated process that requires a lot of memory and is likely to give people on cheap hosting trouble.
2) Testing it takes a lot of time.
3) I don't personally need it.

I will still move old revisions to an old_revisions table from where they can be converted by a later update or a custom script.

killes@www.drop.org’s picture

FileSize
52.92 KB

Ok, here is incarnation No. 34 of this patch. Dries agrees with me that we can put the revision rescue operation into a later update, possibly using Drumm's installer.

The patch fixes the issue with the file uploads not being deletable; I could not reproduce the problem with the outlines.

killes@www.drop.org’s picture

FileSize
4.39 KB

I've started to do some performance testing.

I created a script that creates nodes that have revisions according to the new system. A node has a 50% chance of having revisions, and then an equal chance of having between 1 and 25 of them. I've created a database with 50 nodes and 500 comments, 7 vocabularies and 100 taxonomy terms in total.

I've also created a script to convert these revisions to the old format (much easier than the other way around). Initial tests didn't show a significant change for /node. I'll try with more nodes and /tracker. Attached are the two scripts in a tarball. They need to be run against the patched install.

Also included are the two preliminary results.

killes@www.drop.org’s picture

I've now created a database which has 1000 nodes, and each node has 50 or 51 revisions. I have no taxonomy terms assigned and no comments (those parts didn't change). I am only using stories and pages. I am now fairly confident that this patch is a performance improvement for this kind of setup.

for the new patch on the /node page I get

Requests per second: 0.27 [#/sec] (mean)
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 3592 3648 56.9 3640 3985
Waiting: 3573 3626 55.1 3620 3949
Total: 3592 3648 56.9 3640 3985

for the old one I get:

Requests per second: 0.23 [#/sec] (mean)
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 4247 4297 64.0 4286 4657
Waiting: 4156 4263 65.3 4253 4624
Total: 4247 4297 64.0 4286 4657

For the tracker the results are even better:

old:
Requests per second: 2.00 [#/sec] (mean)
new:
Requests per second: 3.64 [#/sec] (mean)

Anybody interested in obtaining this database can mail me. It is 60MB (gzipped).

killes@www.drop.org’s picture

Ok, I've now deleted the revisions and only kept the current one. The initial tests showed that the performance of the patch was miserable. Then I remembered that I had shredded the revisions table, and did a MySQL table optimization.

Now the results are as follows:

/node:

Requests per second: 3.19 [#/sec] (mean)

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 275 312 44.3 290 499
Waiting: 255 291 44.1 268 479
Total: 275 312 44.3 290 499

Requests per second: 3.18 [#/sec] (mean)
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 274 314 42.4 292 407
Waiting: 255 294 42.0 273 388
Total: 274 314 42.4 292 407

The lower one is the new system; statistically, there is no difference.

/tracker
old:
Requests per second: 3.72 [#/sec] (mean)

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 238 268 33.8 249 332
Waiting: 219 247 33.3 230 313
Total: 238 268 33.8 249 332

new:
Requests per second: 3.77 [#/sec] (mean)

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 240 264 29.8 250 324
Waiting: 221 244 29.0 229 305
Total: 240 264 29.8 250 324

Same applies, no real difference.

So, as soon as somebody tests the upgrade path the patch should be committed.

killes@www.drop.org’s picture

FileSize
55.51 KB

Ok, I could hear them in advance: "1000 nodes without revisions are not sufficient". Well, here are results for 30000 nodes without revisions.

/node
new:
Requests per second: 0.45 [#/sec] (mean)

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 2162 2202 58.9 2188 2485
Waiting: 2143 2182 54.1 2169 2413
Total: 2162 2202 58.9 2188 2485

old:
Requests per second: 0.43 [#/sec] (mean)

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 2319 2346 77.6 2332 2761
Waiting: 2223 2324 79.0 2311 2742
Total: 2319 2346 77.6 2332 2761

One could argue that the new approach is slightly faster. I'd like to see more results to believe it, but it is pretty clear it isn't slower.

/tracker
new:
Requests per second: 0.41 [#/sec] (mean)
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 2330 2418 116.9 2403 3194
Waiting: 2302 2397 117.7 2382 3175
Total: 2330 2418 116.9 2403 3194

old:
Requests per second: 0.39 [#/sec] (mean)

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 2496 2548 50.1 2536 2841
Waiting: 2477 2528 50.0 2518 2822
Total: 2496 2548 50.1 2536 2841

Statistically insignificant differences.

All those tests were done with ab and as user No. 1.
model name : Intel(R) Celeron(R) CPU 2.40GHz
512MB RAM

Besides the differences for revisions, the change in served pages with different configurations is interesting.

A new version is attached. node_save now returns the complete node (not only the nid).

killes@www.drop.org’s picture

FileSize
55.66 KB

Ok, here is version 36, which incorporates today's changes in CVS. It also changes the upgrade path. I was trying my old solution and had to up my PHP memory limit to 80 MB in order to get it to run through. I hope this one is better, but it needs testing.

killes@www.drop.org’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
52.29 KB

Ok, I've now moved _all_ of the heavy lifting to MySQL; pgsql needs to catch up. No problems with either time-out or memory limit anymore. Since pgsql plays catch-up all the time, I set it to "ready to be committed".

Stefan Nagtegaal’s picture

I could apply the patch on my localhost and did not see any errors or malfunctioning Drupal features while creating, revising, editing or deleting nodes.

So, +1 for this patch; qua implementation it makes sense. Unfortunately I haven't got a clue how to benchmark, so that's for another person.

@Gerhard:
I really think you are a great person for rewriting this patch over and over again, until Dries decides it is good enough to get into the trunk.
I doubt there are many other people in our community who are like you: people who would revise their patch 37 times, and even after that are still open for discussion.

Gerhard, I think you are a great example of how a real coder should do it..
I think we can all learn from you..

killes@www.drop.org’s picture

FileSize
52.26 KB

Karoly pointed out that node_save(&$node) is nicer.

@Steef: nice of you to write that, but a _really_ good coder would only have needed one revision. ;p

Dries’s picture

function book_update($node) {
-  db_query("UPDATE {book} SET parent = %d, weight = %d, log = '%s' WHERE nid = %d", $node->parent, $node->weight, $node->log, $node->nid);
+  if ($node->is_new || $node->revision) {
+    db_query("INSERT INTO {book} (nid, vid, parent, weight) VALUES (%d, %d, %d, %d)", $node->nid, $node->vid, $node->parent, $node->weight);
+  }
+  else {
+    db_query("UPDATE {book} SET parent = %d, weight = %d WHERE vid = %d", $node->parent, $node->weight, $node->vid);
+  }
 }

Why 'is_new' and 'revision'? Isn't 'is_new' redundant?

I loaded a fresh copy of the drupal.org database on my test system. How do I test this? Where is the upgrade script/path?

killes@www.drop.org’s picture

FileSize
52.25 KB

is_new is indeed redundant, since it is handled in _update. New patch attached.

The upgrade is, as usual, in updates.inc and accessed through update.php.

jakeg’s picture

This is probably a little off-topic, but I can't find info elsewhere and this may affect the revisions changes.

It appears as though all revisions of a node are accessible to all users (including anonymous users) using ?revision=x. Surely only the 'current' revision should be accessible by default, and there should be a permission that can be assigned to roles to view non-current revisions?

I don't want regular users looking at non-current revisions! A revision may have been made for various private reasons. Only the admin should be able to do that.

Dries’s picture

On the latest snapshot of the drupal.org database I get this:

ALTER TABLE {book} ADD PRIMARY KEY vid (vid)
FAILED
ALTER TABLE {forum} ADD PRIMARY KEY vid (vid)
FAILED
ALTER TABLE {node_revisions} ADD KEY nid (nid)
FAILED
ALTER TABLE {node_revisions} ADD KEY uid (uid)
FAILED
ALTER TABLE {node_revisions} ADD PRIMARY KEY vid (vid)
FAILED
CREATE TABLE {old_revisions} SELECT nid, revisions FROM {node} WHERE revisions != ''
OK
ALTER TABLE {book} DROP log
OK
ALTER TABLE {node} DROP teaser
OK
ALTER TABLE {node} DROP body
OK
ALTER TABLE {node} DROP format
OK
ALTER TABLE {node} DROP revisions
OK
killes@www.drop.org’s picture

That's strange; it worked for me with an older version of the same db. Anyway, I've re-rolled the patch and rearranged the update queries a bit. Also, everything is now in one update; thanks to Neil for pointing that out.

killes@www.drop.org’s picture

FileSize
52.22 KB

and the patch.

killes@www.drop.org’s picture

FileSize
52.4 KB

I've tested the upgrade path on an older copy of the drupal.org db and found and squished two bugs. The test cycle can be shortened a bit by deleting non-essential db tables before importing (cvs, history, project_*, ...)

Cvbge’s picture

FileSize
1.14 KB

Hi,

I've tested the patch a bit:

1. in database.pgsql, in the files table:
a) after removing PRIMARY KEY (fid) you also need to remove the "," from the line above
b) CREATE INDEX files_fid_idx ON node(fid); and CREATE INDEX files_vid_idx ON node(vid); are created on the wrong table: it should be files, not node

2. in the newly created node_revisions table:
a) in the mysql version there is no DEFAULT, while in postgres there is DEFAULT 0; maybe the mysql version should match the postgres one?
b) there's no log longtext NOT NULL default '' in the pgsql version
c) you have translated format int(4) NOT NULL default '0' to format smallint NOT NULL default '0'; it should be integer, not smallint
d) you create the index CREATE INDEX node_revisions_title_idx ON node_revisions(title,type);, but there is no type column in the table, and there is no similar index in the mysql version

The attached patch should fix these errors.
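Based on the points above, the corrected pgsql statements would presumably read (a sketch reconstructed from the list, not copied from the attached patch):

```sql
-- Indexes created on the files table instead of node (point 1b):
CREATE INDEX files_fid_idx ON files(fid);
CREATE INDEX files_vid_idx ON files(vid);

-- Column definition using integer instead of smallint (point 2c),
-- shown here as it would appear inside the CREATE TABLE statement:
--   format integer NOT NULL default '0'
```

The node_revisions_title_idx index from point 2d would simply be dropped, since the type column it references does not exist and mysql has no equivalent index.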

Cvbge’s picture

pgsql needs a sequence to be created; that is fixed in this patch.

An error occurs when trying to view an old revision, e.g. at ?q=node/3/revision/1:

warning: pg_query(): Query failed: ERROR:  column "c.format" must appear in the GROUP BY clause or be used in an aggregate function in /var/www/dt/d/includes/database.pgsql.inc on line 69.

user error: 
query: SELECT c.cid, c.pid, c.nid, c.subject, c.comment, c.format, c.timestamp, c.name, c.mail, c.homepage, u.uid, u.name AS registered_name, u.picture, u.data, c.score, c.users FROM comments c INNER JOIN users u ON c.uid = u.uid WHERE c.cid = 0 AND c.status = 0 GROUP BY c.cid, c.pid, c.nid, c.subject, c.comment, c.timestamp, c.name, c.mail, u.picture, c.homepage, u.uid, u.name, u.picture, u.data, c.score, c.users in /var/www/dt/d/includes/database.pgsql.inc on line 86.

It's comment.module, line 796; adding c.format to the GROUP BY clause fixed the problem.
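For reference, the failing query quoted above with the described fix applied (c.format added to the GROUP BY list) would read as follows; PostgreSQL requires every non-aggregated selected column to appear in GROUP BY:

```sql
-- The comment.module query with c.format added to GROUP BY,
-- reconstructed from the error message above:
SELECT c.cid, c.pid, c.nid, c.subject, c.comment, c.format, c.timestamp,
       c.name, c.mail, c.homepage, u.uid, u.name AS registered_name,
       u.picture, u.data, c.score, c.users
FROM comments c INNER JOIN users u ON c.uid = u.uid
WHERE c.cid = 0 AND c.status = 0
GROUP BY c.cid, c.pid, c.nid, c.subject, c.comment, c.format, c.timestamp,
         c.name, c.mail, u.picture, c.homepage, u.uid, u.name, u.data,
         c.score, c.users;
```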

That's all for now...

Cvbge’s picture

FileSize
1.24 KB

I forgot to attach the patch for the sequence.

Dries’s picture

Cvbge: you attached the same patch twice?

Cvbge’s picture

Nope, the last one has a CREATE SEQUENCE statement.

killes@www.drop.org’s picture

FileSize
54 KB

Thanks, I've included the pgsql fixes. Could you look at the upgrade path, too?

killes@www.drop.org’s picture

FileSize
54.49 KB

Ok, this updated patch fixes the last buglet discovered by Dries. It also adds the node type to the old revisions table. There is an incomplete attempt at a pgsql update based on feedback from our new pgsql maintainer. ;)

Cvbge’s picture

Here's my idea on how to support both mysql and pgsql databases. db_add_column() and db_change_column() come from http://drupal.org/node/29082. I tested them then and they worked well. I've now also added 2-3 new functions that are not yet tested, but they are quite simple, so they should more or less work. I can test it tomorrow if I know this has any chance of getting in...

The patch is against HEAD+_43.

hunmonk’s picture

killes: as you requested, I tested this patch on the most recent version of HEAD, with mysqld in ANSI mode and about 300 nodes (none of which had any old revisions prior to the patch and update). The update script worked flawlessly and I encountered no problems with the site after installation. I also created a few new revisions: the revisions tab works properly, displaying only when there are new revisions for a node. The 'view' and 'set active' operations in the revisions tab also worked perfectly. Let me know if there's anything else I can do to help!

Cvbge’s picture

A question: in the database.* schema, the node_revisions table has PRIMARY KEY (nid, vid), but in updates.inc you do ADD PRIMARY KEY (nid). Does that mean the table will have only nid as its primary key?

Cvbge’s picture

It looks like after CREATE TABLE ... SELECT, column attributes like NOT NULL and DEFAULT are lost. If that was not intentional, I suggest using CREATE TABLE + INSERT INTO ... SELECT instead.

Cvbge’s picture

Sorry, I was wrong in my last post.

Cvbge’s picture

An updated version of my patch; it is now complete for mysql and almost complete for postgresql.

I've tested this with the simple script shown below: update_146() with my patch executes *exactly* the same SQL statements as the original (ok, I have not tested the line that depends on db_fetch_object).

The script showing which SQL statements are executed in update_146():

include "database/updates.inc";
include "includes/common.inc";
// Stub out the database layer so that update_146() only prints the
// queries it would run instead of executing them.
function drupal_pg_version() { return "704"; }
function db_next_id($blah) { return "false vid"; }
function db_fetch_object($blah) { return FALSE; }
function db_query($query) {
  $args = func_get_args();
//   $query = db_prefix_tables($query); // Removed for debug version
  if (count($args) > 1) {
    if (is_array($args[1])) {
      $args = array_merge(array($query), $args[1]);
    }
    $args = array_map('db_escape_string', $args);
    $args[0] = $query;
    $query = call_user_func_array('sprintf', $args);
  }
  return _db_query($query);
}
// Print the query instead of sending it to the database.
function _db_query($query) { echo "$query\n"; }
$db_type = 'mysql';
$db_type = 'mysql';

update_146();
Dries’s picture

Status: Reviewed & tested by the community » Needs work

Please provide a patch that does not introduce new database functions/APIs.

Cvbge’s picture

Why?
Instead of writing the same code again and again, it's better to write functions.

Anyway, I don't think the patch will be pgsql compatible before the freeze, so if the mysql part is ok then commit it.

ax’s picture

The committed revisions patch (revisions_43.patch) doesn't update blog.module. Now blog feeds throw an error:

user error: Unknown column 'n.teaser' in 'field list'
query: SELECT n.nid, n.title, n.teaser, n.created, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid WHERE n.type = 'blog' AND u.uid = 1 AND n.status = 1 ORDER BY n.created DESC LIMIT 0, 15

There are 2 occurrences of "$result = db_query_range(db_rewrite_sql("SELECT n.nid, n.title, n.teaser,...)" in blog.module which I guess have to be fixed. I would much appreciate it if someone familiar with the revision changes could look at this (I had half an hour and couldn't get it). Thanks a lot!

killes@www.drop.org’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
1.88 KB

ax: It is really not that difficult, see the patch. It must have been lost between rev 15 and 35 or so. ;)

Dries: Please close this issue after submitting the patch; Cvbge will make a new issue for the pgsql upgrade path. This issue has close to 150 updates and is a bit too long to handle.

Oh, and btw: why are blog titles not translated, and why don't we cache RSS feeds?

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to HEAD. Thanks.

Cvbge’s picture

The Postgres db_* stuff has moved back to http://drupal.org/node/29082

Anonymous’s picture

Status: Fixed » Closed (fixed)