It seems the {sequences} table is used only for MySQL. Looking at the code for the PostgreSQL api:
/**
* Return a new unique ID in the given sequence.
*
* For compatibility reasons, Drupal does not use auto-numbered fields in its
* database tables. Instead, this function is used to return a new unique ID
* of the type requested. If necessary, a new sequence with the given name
* will be created.
*/
function db_next_id($name) {
$id = db_result(db_query("SELECT nextval('%s_seq')", db_prefix_tables($name)));
return $id;
}
whereas the MySQL api looks like this:
/**
* Return a new unique ID in the given sequence.
*
* For compatibility reasons, Drupal does not use auto-numbered fields in its
* database tables. Instead, this function is used to return a new unique ID
* of the type requested. If necessary, a new sequence with the given name
* will be created.
*/
function db_next_id($name) {
$name = db_prefix_tables($name);
db_query('LOCK TABLES {sequences} WRITE');
$id = db_result(db_query("SELECT id FROM {sequences} WHERE name = '%s'", $name)) + 1;
db_query("REPLACE INTO {sequences} VALUES ('%s', %d)", $name, $id);
db_query('UNLOCK TABLES');
return $id;
}
1) The description of the PostgreSQL function is wrong, because in fact, it is supported with no need to use the {sequences} table.
2) To get rid of the {sequences} table once and for all, the function could look like:
/**
* MySQL version
* The argument is ignored here, it returns a value corresponding to the lastest insert stmt.
*/
function db_next_id($name = '') {
global $active_db;
return @mysql_insert_id($active_db);
}
/**
* PostgreSQL version
*/
function db_next_id($name) {
$id = db_result(db_query("SELECT currval('%s_seq')", db_prefix_tables($name)));
return $id;
}
However, the usage would be different. With that change one has to execute the INSERT statement and then invoke db_next_id() to get the value.
The only problem is that all scripts that currently use db_next_id() would have to be adapted, but the rest are benefits, even performance improvements, I believe.
Thoughts?
Comment | File | Size | Author |
---|---|---|---|
#87 | db_last_insert_id_11.patch | 28.52 KB | chx |
#78 | db_last_insert_id_10.patch | 5.33 KB | chx |
#76 | db_last_insert_id_9.patch | 28.64 KB | asimmonds |
#73 | db_last_insert_id_8.patch | 27.09 KB | chx |
#66 | db_last_insert_id_7.patch | 24.9 KB | kbahey |
Comments
Comment #1
markus_petrux CreditAttribution: markus_petrux commentedChanging the meaning of the function could be somehow dangerous if a contrib module has not been adapted.
Maybe it would be wiser to rename the function to something like
db_insert_id()
. Since there are many other things that contrib module authors should do make their modules compatible with 4.7, it seems a good oportunity to try to implement this change as well.Comment #2
Cvbge CreditAttribution: Cvbge commentedHello,
is this really 'critical' issue? Lowered it to normal.
We could use mysql_insert_id() and currval(), but there are some difficulties:
We also could have another db_last_insert_id() "macro" which could be used e.g. in this way:
and db_last_insert_id() would return
LAST_INSERT_ID()
/lastval()
for mysql/pgsql.That would be usefull, because new db_insert_id() for postgresql would still have to do a select('currval()') which is slower...
This is post-4.7 probably...
Usefull links:
http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html
http://www.postgresql.org/docs/current/static/functions-sequence.html
http://pl.php.net/mysql_insert_id
Comment #3
markus_petrux CreditAttribution: markus_petrux commentedIf that helps, the method I pointed to is used for phpBB's DBAL.
I believe it would at least avoid having the {sequences} table and that lock/replace/unlock for MySQL.
As per performance, in PostgreSQL, if you use currval() before inserting, probably the INSERT will run faster, as there are chances that the same record is still in cache.
Comment #4
Cvbge CreditAttribution: Cvbge commentedCould you elaborate?
Comment #5
markus_petrux CreditAttribution: markus_petrux commentedOpps, sorry. In my last sentence I meant the oposite.
As per performance, in PostgreSQL, if you use currval() after inserting, there are chances that the record is still in cache, which was updated by the previous INSERT.
Something else to consider with current method, for postgresql, is the db_next_id() doesn't lock the record/table. It might happen that by the time the INSERT is done, another process incremented the sequence (race condition), so that would generate a dup key error.
Comment #6
Cvbge CreditAttribution: Cvbge commentedComparing
$id = select nextval(); insert ($id)
(current code) withinsert (); $id = select currval()
(proposed db_insert_id()) I think they will have the same speed.The db_insert_last_id() comes from the fact that with even with new db_insert_id() you'd have to do
insert(); $id = select currval(); insert($id)
. With db_insert_last_id() you could doinsert(); insert(lastval())
.Each insert and select is 1 db_query() call. So with db_insert_last_id() you could do 2 db_query() calls instead of 3. That's not much, and we won't have any noticable gain, but since we're using auto increments we could use all of it's features.
Hm did you mean postgresql, or mysql? Because postgresql's sequences does not need locking and have no race conditions.
Comment #7
markus_petrux CreditAttribution: markus_petrux commentedI agree there might be little difference with PostgreSQL. I should say I've been unfortunate with my comments here. I do not have much experience with postgresql, only used a couple of times. I was just thinking about possible issues...
The difference seems to be more clear with MySQL, where the {sequences} table is used. If there was a db_insert_id (ie. to call after INSERT statements), there would be no need for {sequences} table (ie. no need for that lock table, select, replace, unlock table, just using mysql_insert_id).
If using db_next_id (current code), in MySQL, is there a reason to use auto_increment in table definitions where the {sequences} table is used too?
db_insert_id() is an approach that can be ported to PostgreSQL (using currval instead of nextval) and other SQL engines.
Since Drupal 4.7 already breaks all 4.6 modules, I believe it is a good oportunity to think about possible/better alternatives to the {sequences} table. It is just that.
Comment #8
chx CreditAttribution: chx commentedmarkus, rest assured 4.8 (or 5.0) will break things all over. this should not happen for 4.7, too big change. imagine the number of bugs introduced by changing all queries and logic related to this.
Comment #9
markus_petrux CreditAttribution: markus_petrux commentedYes, that's a real possibility.
hmm... I know there's no roadmap in Drupal, but is there a TODO list or something similar?
Comment #10
Cvbge CreditAttribution: Cvbge commentedNo, I see no reason. In fact you can't use both, you need to stick with one way.
I agree about postgresql. I don't know about sqlite or oracle.
I just think that if we keep introducing new features we'll never release 4.7. See recent cry on -devel list ;)
Comment #11
Cvbge CreditAttribution: Cvbge commentedConcurrent editing is bad ;)
Comment #12
markus_petrux CreditAttribution: markus_petrux commentedlol
Look at ADOdb or phpBB DBAL. db_insert_id() would also be easily portable to MSSQL and others. ;)
True
Comment #13
chx CreditAttribution: chx commentedhttp://sqlite.org/capi3ref.html#sqlite3_last_insert_rowid SQLite is OK.
Comment #14
markus_petrux CreditAttribution: markus_petrux commentedSQLite: Call sqlite_last_insert_rowid after an INSERT
http://www.php.net/sqlite_last_insert_rowid
MSSQL: Query for SCOPE_INDENTITY after an INSERT
http://www.databasejournal.com/features/mssql/article.php/10894_3307541_2
Oracle: SELECT sequence_id.currval FROM DUAL
http://www.oracle-base.com/articles/8i/AutoNumber.php
http://www.oracle.com/technology/pub/articles/deployphp/lim_deployphp.html
Comment #15
markus_petrux CreditAttribution: markus_petrux commentedReading the development mailing list (Khalid's comment to the "Can someone confirm -- Minimum MySQL Privileges?" thread) made me remind of this issue.
Well, this issue is somehow related, as if there was a db_insert_id() function, there would be no need to use the sequences table, hence no reason to use LOCK TABLE.
Comment #16
Cvbge CreditAttribution: Cvbge commentedIt's needed by cache_set() and variable_set().
Comment #17
chx CreditAttribution: chx commentedI have something for 4.7 which keeps db_next_id but uses mysql_insert_id for mysql. http://drupal.org/node/55516
Comment #18
m3avrck CreditAttribution: m3avrck commentedAny new updates on this? Sounds like a fairly straightforward plan to me, even if a few modules break, the benefits far outweigh the breakage.
Comment #19
moshe weitzman CreditAttribution: moshe weitzman commentedComment #20
quicksketchJust putting into 6.x and giving a bump.
Comment #21
m3avrck CreditAttribution: m3avrck commentedThere is an entirely different concept being thrown around by Souvent22 and myself: http://earnestberry.com/node/13
Comment #22
sammys CreditAttribution: sammys commentedI wholly support stopping use of the sequences table (yes this is a bottleneck). +1 for that!
+1 to changing db_next_id() to db_insert_id() and flipping the statement ordering of the insert query and call to db_insert_id().
Doing this will also avoid incrementing IDs if the insert query fails for some reason.
--
Sammy Spets
Synerger
http://synerger.com
Comment #23
chx CreditAttribution: chx commentedAll attempts to fix db_next_id failed, so get rid of it. there are too icky parts: batch -- but that's long running anyways, so one more query does not matter. node_save on insert can't insert with the relevant nid into node_revisions however that's fixed soon after and we rely anyways on all queries executing so I am not too concerned on node_revisions.nid being 0 for a very short while -- there was always a gap because node_save was not a transaction nor did it lock. Now the gap is a bit more wide.
Comment #24
chx CreditAttribution: chx commentedWhy now? First not earlier, because earlier we did not have subselects and schema API which standardized serial / auto_increment. Later is not an option either because db_next_id is broken currently.
Comment #25
chx CreditAttribution: chx commentedRelevant devel post http://lists.drupal.org/pipermail/development/2005-December/012108.html
Comment #26
chx CreditAttribution: chx commentedDavid Strauss catched a copy-paste error in form.inc
Comment #27
David StraussAside from an issue I told chx about in IRC that he's fixing right now, the patch needs to update all the *.schema files to make the columns autoincrement/serial. This solution should be portable to MSSQL. Does anyone know about Oracle or DB2?
Comment #28
dwwsay you've got a bunch of your own tables, and you need to update a nid in all of them when you add a new node (this happens all the time in project*, for example, where you've got {node} and {node_revisions}, but then a bunch of project* specific tables for all the project-specific fields and data). how are you sure that db_last_insert_id() gives you the id of the last node you inserted? imagine:
i think i ran into exactly this problem with the cvs access control scripts or something, and there was no good way to ensure this worked right without adding another id to the {sequences} table, so each thread could safely get its own id at the start of its operations, and then consistently use the same id, regardless of who was actually updating which tables when.
seems like now, everyone has to wrap the insert and the call to db_last_insert_id() in a transaction or otherwise lock the DB, to prevent someone else from inserting something before you discover your id. your patch doesn't do this anywhere. so, unless i completely misunderstand what's going on (which is quite likely), i think this patch would introduce a bunch of evil race conditions in core, and generally make it harder for developers to get this right.
in short: big ol' -1 on removing the {sequences} table until these issues are clarified or fixed.
Comment #29
David Strauss@dww Don't worry about it. LAST_INSERT_ID is connection (and hence thread) specific.
Comment #30
add1sun CreditAttribution: add1sun commentedFrom a functionality review the only thing I'm having issues with it comments.
When creating a comment I am getting this error with the patch, which goes away when the patch is reversed:
I went through users, taxonomy, aggregator, contact, blocks, nodes and things seemed to be working fine.
Comment #31
chx CreditAttribution: chx commentedComments fixed. David, the schema already provides serial for us because postgresql needs them.
Comment #32
David StraussI'd like to see benchmarks for this patch versus the old db_next_id() code. kbahley said in a related issue that InnoDB does a table lock to implement autoincrement, so increasing use of autoincrement might not help versus the old db_next_id() table-locking code.
Comment #33
dww@david strauss: yay, the missing bit of completely undocumented, non-obvious wisdom that makes everything work. ;) thanks for clarifying... i knew i must have been wrong, i just wasn't sure why. ;)
@chx: therefore, please just add some doxygen to db_last_insert_id() so others know about this. not only will it help developers know they can use this function safely, it'll help remind people adding support for other DBs that they better make sure their version provides the same semantics we're assuming.
@all: thanks!
$dww_vote += 2;
Comment #34
add1sun CreditAttribution: add1sun commentedyup, comments fixed now. looks good
Comment #35
add1sun CreditAttribution: add1sun commentedno more broken bits. chx said to rtbc
Comment #36
chx CreditAttribution: chx commentedYes. Because there is no need for an update path, system_update_6019 already solved that for us. yay.
Comment #37
dwwdarn busy threads. my last comment didn't mean to set to "needs work"... sorry about that.
Comment #38
David Strauss@dww The archaic runes documenting LAST_INSERT_ID are carefully obscured from the eyes of mere mortals on the "How to Get the Unique ID for the Last Inserted Row" MySQL manual page:
Comment #39
dww@david strauss: i just meant "undocumented" in the sense of not documented anywhere in the drupal code or in this patch. that said, thanks for the pointer to the mysql docs on this.
Comment #40
recidive CreditAttribution: recidive commentedGreat patch!
Question: why not just
db_insert_id()
rather thandb_last_insert_id()
?Comment #41
bjaspan CreditAttribution: bjaspan commentedIt is late and I'm very tired, so I'm probably missing something. db_last_insert_id() for pgsql takes a $name argument and db_last_insert_id() for mysql does not. Is it just that mysql never needs a name because there is only a single last-id per connection?
Even so, the mysql function ought to require the same parameters as the pgsql function (a) for consistency and (b) so that module developers do not break pgsql sites by concluding that if it works without an arg for mysql, that must be okay.
I haven't reviewed the patch any further than this. I seem to recall while adding type 'serial' support to Schema API that at one point I thought, "Ah, yes, *this* is why db_next_id() is necessary and always using auto-increment is not sufficient." But I don't remember why.
Comment #42
kbahey CreditAttribution: kbahey commented@David Strauss, kbahley? The last person who mispelled my nick had his insurance rates go up ... ;-)
What is the risk of this patch creating a lot of "ghost" rows in the database tables? Can it insert incomplete stuff and then the user abandons the operation, and these are left in the database causing confusion for modules and the like?
Comment #43
David Strauss@bjaspan I expressed the exact same sentiments in conversation with chx earlier this evening.
Comment #44
dww@bjaspan and david -- i agree. therefore, this still needs work before it's RTBC.
Comment #45
joshk CreditAttribution: joshk commentedI fear this. Why, you might ask? Because the sequences table has saved my ass on numerous occasions.
I realize that a smart DBA can determine all this information from her database, and even set gaps in ids or mass-update auto_increments to help with content migrations, but it's still very nice to have a central place where all my important drupal ids live. I would be sorry to see it go.
However, if it's already gone from postgres, well, maybe it's just time. (sniff).
Comment #46
chx CreditAttribution: chx commentedDoxygen'd the new function, made a better db_last_insert_id for postgresql 8.1+ and used better parameters for db_last_insert_id and made them mandatory.
Comment #47
Dries CreditAttribution: Dries commentedVery interesting patch, and an incredible amount of overnight activity in this issue. What did you guys have for dinner? :)
David mentioned he'd like to understand the performance implications, and I tend to agree. It would be nice to understand this a bit better.
From http://www.mysqlperformanceblog.com/2007/05/09/mysql-users-conference-in... :
I don't think it is a show-stopper though. In fact, I think we might have a winner. :)
Comment #48
chx CreditAttribution: chx commentedEven better postgresql code but it mandates one more argument. Noted that db_last_insert_id is thread safe.
Comment #49
chx CreditAttribution: chx commentedThat innodb stuff is only relevant if you have a transaction or an extended insert. We have neither. If someone has postgresql below 8.1 and changed the default DB configuration to not have OIDs will have to do some magic with his database to get this working. Not my problem, though.
Comment #50
chx CreditAttribution: chx commentedFurther Googling and sammys' tests show that currval is per connection. So I went back to the currval based solution because it's thread safe and a hell lot simpler.
Comment #51
chx CreditAttribution: chx commentedRemoved lastval as it's not needed -- no oids, no version trouble.
Comment #52
sammys CreditAttribution: sammys commentedTests confirmed that currval operates per connection. FYI: Used two DB shell sessions and a table with a serial field. Inserted a record into the table from each session then called currval. Value returned by the session with the first insertion remained the same after the insertion in the other session. QED pffft
--
Sammy Spets
Synerger
http://synerger.com
Comment #53
Owen Barton CreditAttribution: Owen Barton commentedSubscribing
Comment #54
quicksketch@chx Why the naming of the function as db_last_insert_id() instead of db_insert_id()?
While db_last_insert_id() is used in the SQL, db_insert_id() would make it inline with php functions mysql_insert_id() and mysqli_insert_id().
Comment #55
David Strauss@chx InnoDB treats each statement as an individual transaction unless you're in a larger transaction. (This is why the mode is called "autocommit.") So, to say it's "only relevant if you have a transaction or extended insert" is true but misleading.
Comment #56
David Strauss@quicksketch It's called mysql_insert_id in the MySQL library because that's the only way to get an ID in MySQL -- after insert. Other databases let you get it before, so it's ambiguous to just call it db_insert_id when you support databases like PostgreSQL.
Comment #57
quicksketchThanks for the explaination David. Sounds like a winner to me.
Comment #58
quicksketchWorks awesome. The patch needed a db_last_insert_id() moved from upload.module to file.inc to keep up with the file-system overhaul that went in last night. We're ready to roll with this one.
Comment #59
Crell CreditAttribution: Crell commentedOh the things one misses when one goes out of town for a few days...
My concern with dropping sequences in favor of auto_increment is two fold. One, all the DB guys I talk to keep insisting that sequences are more flexible than auto-increment. One could easily point out that while we can emulate auto-increment on sequence-based databases, one can emulate sequences on auto-increment-based databases, too. I do not claim to be an expert on the subject, however.
Two, one of my goals for Drupal 7 at this point is to finish the PDO driver I started but had to back-burner, and eventually we should move to an all-PDO backend for better abstraction, flexibility, and security. That leads to this http://us.php.net/manual/en/function.PDO-lastInsertId.php . Specifically, lastInsertId only sometimes works. There's no indication that I see on which PDO drivers support auto-increment. (I'd presume the MySQL one does, but I don't know about the others.)
Can anyone sooth my concern that switching Drupal 6 to auto-increment instead of sequences won't make a PDO driver for Drupal 7 that much more painful?
Comment #60
chx CreditAttribution: chx commentedI can't claim to know PDO but look at the pgsql code, it uses sequences still. So I guess here you would need database specific stuff still, there is no way to avoid that. Same with schema, I believe.
Comment #61
kbahey CreditAttribution: kbahey commented@Crell
If PDO is going to set up back with more restrictions, caveats, gotchas, ...etc. then we really need to evaluate whether going this route is worth it or not (benefits vs. restrictions).
@all
I tested this patch on a clean database of HEAD, and things are working well. All hail the auto_increment.
There was a bunch of menu errors though, like:
Undefined index: mlid in ../includes/menu.inc on line 1340.
Not sure if this is related to this patch or something else.
+1 from me.
Comment #62
David StraussTo add to kbahey's post, we need to be reconsider what this patch is trying to solve. The only released GNU/Linux distribution that we thought had a flawed version of MySQL was Ubuntu 7.04, but we later discovered that they use the enterprise version, which doesn't have the problem. Debian testing seems to use a flawed version, but it's not clear that Debian will even continue to use that version for the next release. We have not verified that Debian testing actually has the flaw.
Comment #63
kbahey CreditAttribution: kbahey commentedFound a problem when creating comments:
When reviewing a comment, I get:
Then when clicking submit, I get this, and the comment is NOT created.
Comment #64
kbahey CreditAttribution: kbahey commentedI backed out the patch, and tried again creating a comment. The preview errors still appear.
However when pressing Submit, the comment IS INDEED CREATED, and only one error appears:
notice: Undefined index: name in /../modules/comment/comment.module on line 823.
So the patch needs work, at least for me.
Comment #65
Crell CreditAttribution: Crell commentedWhatever is currently Debian Testing is virtually guaranteed to never be a "stable" release. Debian Etch was just released as stable, so the next stable is at least a year away. We can/should ignore whatever version it has in it because it will change a few dozen times in the next few months. Etch/stable and maybe Sarge/old-stable are the only Debians we should worry about in terms of bug-workarounds.
<--Former Debian user, now running, of course, Ubuntu Feisty. :-) I wonder if Ubuntu's bug/sec fixes have included a fix for this yet. I'll have to check tonight if I can, since I'm on a hell project this week and have almost no Drupal time. :-(
Comment #66
kbahey CreditAttribution: kbahey commentedHere is a reroll of the patch with a fix for the comments. A missing paren was the culprit.
Comment #67
David Strauss@Crell Ubuntu Feisty isn't vulnerable in the first place. Ubuntu uses the enterprise, not community, version of MySQL.
Comment #68
chx CreditAttribution: chx commentedThis is a feature request from ages ago. Instead of our somewhat hackish solution we let the databsae do what a database should do. The result is less code in Drupal, better working code in Drupal etc.
Comment #69
Dries CreditAttribution: Dries commentedIt would be great to see someone test this patch on PostgreSQL.
Another good reason for this patch is to support master-master configurations with MySQL.
Comment #70
matt_paz CreditAttribution: matt_paz commentedI love, love, love the fact that action is being taken on this! Hooray!
Comment #71
bjaspan CreditAttribution: bjaspan commentedI'll try this on pgsql "real soon now."
Comment #72
asimmonds CreditAttribution: asimmonds commentedThe warnings that are mentioned in #61 for a fresh install are still generated
When building a new menu, we have a problem with the parents array, the comment in _menu_link_parents_set() says:
but with this patch, we can't set one of these parents with mlid as we don't know the mlid yet. In the resulting installation the menu doesn't expand as you navigate.
Comment #73
chx CreditAttribution: chx commentedComment #74
dvessel CreditAttribution: dvessel commentedTested patch #73, installed with no notices. All the Nav links were intact..
This was with a mysqli connection using ver. 4.1.21 and PHP 5.1.
Comment #75
chx CreditAttribution: chx commentedThen it's ready.
Comment #76
asimmonds CreditAttribution: asimmonds commentedDoing a quick test under postgres 8.2.4, when adding a new page node I get:
Comparing the pgsql and mysql databases, the nid column in node_revisions has a default of 0 in mysql but not in pgsql.
Adding a 'default 0' in the node_revisions schema, fixes this but is it out of scope for this patch?
Also, something is wrong with node revisions, nid is left 0 for new revisions and the vid in node table doesn't get set to the new revision id. This happens under both mysql and pgsql.
In the attached patch, I've moved the update node table query builder after the the insert or update of the node_revisions table. This seems to have fixed the node revisions under mysql and pgsql.
Comment #77
David Strauss@asimmonds Shouldn't {node_revisions} include the actual nid in the insert? Even if we're not using real foreign keys, inserting into {node_revisions} without specifying an nid is bad.
Comment #78
chx CreditAttribution: chx commentedAssimmonds' patch is great, just it had code style errors.
David, the problem is that we can not have two autoincrement at the same time. So, we first insert into revisions with a nid of 0, then we insert into node and update node_revisions because we can JOIN the two tables on vid. If you mean by "bad" that the database is in an incorrect state before the UDPATE, sure it is, but it is not different than before actually -- if node_save does not complete you have a problem. But, even so, the UPDATE is written in such a way that it tries to updates every row with the relevant nid where the nid is zero -- so Drupal tries very hard to return the database to correctness. There is no unique key on nid (thanks god) so having (0, vid) pairs is not a problem.
Comment #79
David Strauss@chx I'm fully aware of the limitations of autoincrement in MySQL. I'm saying that we need to insert into {node} before we create a corresponding record in {node_revisions}. Even though we don't use referential integrity, we should interact with the database so its addition would be transparent. Conceptually, {node_revisions} has a foreign key to {node}. No record should ever exist in {node_revisions} without a corresponding record in {node}.
Bad:
1. Insert into {node_revisions} with nid = 0.
2. Insert into {node}.
3. Update {node_revisions} with the new nid.
Good:
1. Insert into {node}.
2. Insert into {node_revisions}.
3. Update {node} with the new vid.
Really Good (probably infeasible):
1. Begin transaction.
2. Insert into {node}.
3. Insert into {node_revisions}.
4. Update {node} with the new vid.
5. Commit transaction.
Because we use INNER JOINs between {node} and {node_revisions}, having a record in {node} without a corresponding record in {node_revisions} will not cause significant problems. If anything only refers to {node}, then we need to have it require vid > 0.
Comment #80
chx CreditAttribution: chx commentedI need vid first which is the true PK here and nid second hence the order of inserts.
Comment #81
bjaspan CreditAttribution: bjaspan commentedDavid is right. If node is the master table for nodes with nid as the PK, then we (implicitly or explicitly) want to have a foreign key from node_revisions.nid to node.nid. That means that we have to insert into node first because node_revisions.nid cannot point to a non-existent node.nid (* see below).
chx says "I need vid first ... and nid second hence the order of inserts." Looking at the patch, it seems he needs vid first he needs to insert it into the node.vid column. So, he does (1) insert into node_revisions with invalid nid, save new vid, (2) insert in node, specifying the vid, save new nid (3) update node_revisions with saved nid.
Unless I'm mistaken, we can accomplish the same goal and provide ref integrity by changing the order: (1) insert into node with invalid vid, save nid, (2) insert node_revisions, specifying the nid, save vid, (3) update node with saved vid.
However, I'm speculating about something else. Why does node have a vid column at all? Revisions are a 1-to-many property of a node. So are taxonomy terms, but we don't have a tid column in node. The answer is "because node_load wants to load exactly one, either the newest or a specified, revision." That's a perfectly valid operational decision but I wonder if the problem is that by having a vid column we are representing the operational decision in the database in a way it should not be, and it is causing/will cause us grief regard ref integrity.
node_load, in the "get the newest revision" mode, could just as easily say:
SELECT * FROM node n INNER JOIN node_revision r ON n.nid=r.nid ORDER BY r.vid DESC LIMIT 1
So, we resolve the problem with node.vid and can remove it from the node table entirely. True, we add a sort to every node load, but there are not usually many revisions, and sorting by a primary key (vid) ought to be very fast.
If we decide that the ORDER BY r.vid LIMIT 1 solution is no good for performance reasons, then we should acknowledge that node.vid exists purely as a cache of the most recent vid per node. We might even rename it to node.vid_cache just to be clear about that.
To wrap up, I agree with David's "Good" scenario. Unless I'm totally on crack, in which case I'm sure someone will point that out to me. :-)
* Footnote: Actually, we could insert into node_revisions first by putting a sentinel (dummy) entry in the node table with nid = -1 (not 0, as that will confuse auto_increment). But that is ugly and likely to cause other problems.
Comment #82
bjaspan CreditAttribution: bjaspan commentedSince I'm actually hoping we can maybe get ref integrity into D6 now that we have more time, I don't feel like this patch is ready until we (a) change it so ref integrity is possible or (b) explicitly decide that is not a goal.
Comment #83
chx CreditAttribution: chx commentedReference integrity won't help you here. But, the biggest problem is that there is a unique key on node.vid so if two threads manage to insert into node with a vid of zero (before the INSERT INTO node_revisions and the UPDATE) then you are througly hosed. There is no such unique index on node_revisions.nid as it's truly not unique. I have choosen this order rather carefully.
Comment #84
David Strauss@chx If we want to keep the unique key on {node}.vid, we just have to allow it to be NULL. Unique keys (unlike primary keys) only require non-null values to be unique. Furthermore, if we formally establish {node}.vid as a foreign key of {node_revisions}.vid, the NULL values will work there, too. Using zero as a pseudo-null value is kind of a hack. In any case, node.vid is a denormalization, so it's a lower priority to maintain its integrity than the real data.
@bjaspan Even removing {node}.vid doesn't necessarily force a sort if {node_revisions} has a (nid, vid) index. I'll have to check this, though.
Comment #85
David Strauss@bjaspan Also, taxonomy terms have a many-to-many relationship with nodes. Restricting it further (to, say, many-to-one) is only done in the application software.
Comment #86
chx CreditAttribution: chx commentedThen write it. I do not care. I tried, again, again and again and I do not see any real reasons here. Surely pie in the sky is great.
Comment #87
chx CreditAttribution: chx commentedAny more future dreams we want to support?
Comment #88
David StraussShiny on #drupal brought up an interesting point about this patch: reading the sequence from PostgreSQL after the INSERT isn't guaranteed to return the same value used in the INSERT. Unlike MySQL's LAST_INSERT_ID(), PostgreSQL's sequences aren't per-connection. They're global.
Options:
* Encapsulate INSERTs and the sequence lookup in transactions.
* Provide a db_insert() function that returns the ID and properly juggles sequences and INSERTs.
* Abandon this patch and just use db_next_id().
Previously, dww raised a similar objection about the MySQL implementation. My response in #29 remains correct; LAST_INSERT_ID is thread/connection-specific. However, PostgreSQL doesn't use LAST_INSERT_ID semantics.
Comment #89
David StraussThe race condition in PostgreSQL needs fixing.
Comment #90
Shiny CreditAttribution: Shiny commentedLet me clarify by quoting dww:
The above is solved by mysql's last_insert_id being per connection --- however you're not calling last_insert_id in postgresql - you're instead doing exactly what dww describes above.
mysql lacks sequences. Every other database i've ever used implements sequences. Drupal implements these with a table for mysql. It works. Why fix it?
Comment #91
Shiny CreditAttribution: Shiny commenteda sequencing error caused a race condition on this thread.
Changing status to (code needs work)
Comment #92
chx CreditAttribution: chx commented#50, #52 .
Comment #93
chx CreditAttribution: chx commentedI am so very happy that everyone is such a big database theorist -- which, by now, delayed this patch needlessly twice -- but reading the issue which clearly states that currval is per connection or testing the patch, that's too much to ask.
Comment #94
Shiny CreditAttribution: Shiny commentedapologies - I've tested and yes it is per connection on pg7.4 and 8.1
this will instead become another issue for those wanting persistent connections. If we're going this way, can we remove the comment in includes/database.pgsql.inc about persistent connections improving performance?
Comment #95
David Strauss@Shiny What is the issue with persistent connections?
Comment #96
chx CreditAttribution: chx commentedENOUGH! Drupal does not support persistent connections. And if someone would hack it, he would have problems with for example search using temporary tables.
And guys, if anyone else follows up with more bullshit, I swear that I will go into database level and remove the followup.
Comment #97
chx CreditAttribution: chx commentedI have submitted a new issue instead of this poor abused one and I am not linking, if you want to abuse that too, then at least work a little.