It seems the {sequences} table is used only for MySQL. Looking at the code for the PostgreSQL api:

/**
 * Return a new unique ID in the given sequence.
 *
 * For compatibility reasons, Drupal does not use auto-numbered fields in its
 * database tables. Instead, this function is used to return a new unique ID
 * of the type requested. If necessary, a new sequence with the given name
 * will be created.
 */
function db_next_id($name) {
  $id = db_result(db_query("SELECT nextval('%s_seq')", db_prefix_tables($name)));
  return $id;
}

whereas the MySQL api looks like this:

/**
 * Return a new unique ID in the given sequence.
 *
 * For compatibility reasons, Drupal does not use auto-numbered fields in its
 * database tables. Instead, this function is used to return a new unique ID
 * of the type requested. If necessary, a new sequence with the given name
 * will be created.
 */
function db_next_id($name) {
  $name = db_prefix_tables($name);
  db_query('LOCK TABLES {sequences} WRITE');
  $id = db_result(db_query("SELECT id FROM {sequences} WHERE name = '%s'", $name)) + 1;
  db_query("REPLACE INTO {sequences} VALUES ('%s', %d)", $name, $id);
  db_query('UNLOCK TABLES');

  return $id;
}

1) The description of the PostgreSQL function is wrong: in fact, PostgreSQL supports sequences natively, with no need for the {sequences} table.

2) To get rid of the {sequences} table once and for all, the function could look like:

/**
 * MySQL version
 * The argument is ignored here; the function returns the ID generated by the latest INSERT statement.
 */
function db_next_id($name = '') {
  global $active_db;
  return @mysql_insert_id($active_db);
}
/**
 * PostgreSQL version
 */
function db_next_id($name) {
  $id = db_result(db_query("SELECT currval('%s_seq')", db_prefix_tables($name)));
  return $id;
}

However, the usage would be different. With that change, one has to execute the INSERT statement first and then invoke db_next_id() to get the value.

The only problem is that all scripts that currently use db_next_id() would have to be adapted, but the rest is all benefits, even performance improvements, I believe.
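To make the flipped ordering concrete, here is a minimal pure-PHP simulation (the "table" is just an array; all names are invented, and this is not Drupal code):

```php
<?php
// Toy in-memory "table" plus an auto-increment counter. This only
// illustrates the calling pattern, not real database behaviour.
$rows = array();
$auto_increment = 0;
$last_insert_id = 0;

// New pattern, step 1: insert first; the "engine" assigns the ID.
function toy_insert(array &$rows, &$auto_increment, &$last_insert_id, array $row) {
  $auto_increment++;
  $rows[$auto_increment] = $row;
  // This is the value mysql_insert_id() / currval() would later report.
  $last_insert_id = $auto_increment;
}

// New pattern, step 2: ask which ID the last insert was given.
function toy_last_insert_id($last_insert_id) {
  return $last_insert_id;
}

toy_insert($rows, $auto_increment, $last_insert_id, array('title' => 'first'));
$id = toy_last_insert_id($last_insert_id);
echo $id; // 1
```

The old pattern simply reverses the two steps: reserve an ID from {sequences} first, then write it into the INSERT explicitly.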

Thoughts?


Comments

markus_petrux’s picture

Changing the meaning of the function could be somewhat dangerous if a contrib module has not been adapted.

Maybe it would be wiser to rename the function to something like db_insert_id(). Since there are many other things that contrib module authors must do to make their modules compatible with 4.7, it seems a good opportunity to try to implement this change as well.

Cvbge’s picture

Priority: Critical » Normal

Hello,

is this really a 'critical' issue? I've lowered it to normal.

We could use mysql_insert_id() and currval(), but there are some difficulties:

  1. You can't use the sequences table and auto-increment at the same time for the same table, so we probably can't make a slow, partial transition.
  2. There is a slight difference between mysql_insert_id() and currval(): the first returns 0 if no auto-increment was used, while the second throws an error, so we would have to write in capital letters "use only if you're sure you used an auto increment".

We also could have another db_last_insert_id() "macro" which could be used e.g. in this way:

db_query("insert into {table with auto increment} values ('%s')", 'foo');
db_query("insert into {different table} values (%s, '%s')", db_last_insert_id(), 'bar');

and db_last_insert_id() would return LAST_INSERT_ID()/lastval() for mysql/pgsql.
That would be useful, because a new db_insert_id() for postgresql would still have to do a select('currval()'), which is slower...
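A rough sketch of what the two per-driver functions might look like (pseudocode on my part; the function name and the '%s_seq' sequence-naming convention are assumptions, not code from any patch):

```php
// database.mysql.inc -- sketch: per-connection, no {sequences} table needed.
function db_insert_id() {
  global $active_db;
  return mysql_insert_id($active_db);
}

// database.pgsql.inc -- sketch: read back the value the serial column's
// implicit sequence just handed out on this connection.
function db_insert_id($name) {
  return db_result(db_query("SELECT currval('%s_seq')", db_prefix_tables($name)));
}
```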

This is post-4.7 probably...

Useful links:
http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html
http://www.postgresql.org/docs/current/static/functions-sequence.html
http://pl.php.net/mysql_insert_id

markus_petrux’s picture

If that helps, the method I pointed to is used for phpBB's DBAL.

I believe it would at least avoid having the {sequences} table and that lock/replace/unlock for MySQL.

As per performance, in PostgreSQL, if you use currval() before inserting, probably the INSERT will run faster, as there are chances that the same record is still in cache.

Cvbge’s picture

As per performance, in PostgreSQL, if you use currval() before inserting, probably the INSERT will run faster, as there are chances that the same record is still in cache.

Could you elaborate?

markus_petrux’s picture

Oops, sorry. In my last sentence I meant the opposite.

As per performance, in PostgreSQL, if you use currval() after inserting, there are chances that the record is still in cache, which was updated by the previous INSERT.

Something else to consider with the current method, for postgresql, is that db_next_id() doesn't lock the record/table. It might happen that by the time the INSERT is done, another process has incremented the sequence (a race condition), which would generate a duplicate key error.

Cvbge’s picture

As per performance, in PostgreSQL, if you use currval() after inserting, there are chances that the record is still in cache, which was updated by the previous INSERT.

Comparing $id = select nextval(); insert ($id) (the current code) with insert (); $id = select currval() (the proposed db_insert_id()), I think they will have the same speed.

The db_last_insert_id() idea comes from the fact that even with a new db_insert_id() you'd have to do insert(); $id = select currval(); insert($id). With db_last_insert_id() you could do insert(); insert(lastval()).

Each insert and select is one db_query() call. So with db_last_insert_id() you could do 2 db_query() calls instead of 3. That's not much, and we won't see any noticeable gain, but since we're using auto-increments we might as well use all of their features.
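In pseudocode (table and column names invented), the two call sequences being counted look like this:

```php
// With db_insert_id(): three db_query() round trips.
db_query("insert into {a} (x) values ('%s')", 'foo');
$id = db_insert_id();
db_query("insert into {b} (aid, y) values (%d, '%s')", $id, 'bar');

// With the db_last_insert_id() "macro" expanded inside the SQL: two.
db_query("insert into {a} (x) values ('%s')", 'foo');
db_query("insert into {b} (aid, y) values (%s, '%s')", db_last_insert_id(), 'bar');
```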

Something else to consider with the current method, for postgresql, is that db_next_id() doesn't lock the record/table. It might happen that by the time the INSERT is done, another process has incremented the sequence (a race condition), which would generate a duplicate key error.

Hm, did you mean postgresql or mysql? Because postgresql's sequences do not need locking and have no race conditions.

markus_petrux’s picture

Title: Getting rid of the sequences tables adding support for nextid in mysql » Getting rid of the sequences table, using db_insert_id() instead of db_next_id()

I agree there might be little difference with PostgreSQL. I should say I've been unfortunate with my comments here. I do not have much experience with postgresql; I've only used it a couple of times. I was just thinking about possible issues...

The difference seems clearer with MySQL, where the {sequences} table is used. If there were a db_insert_id() (i.e. to call after INSERT statements), there would be no need for the {sequences} table (i.e. no lock table, select, replace, unlock table; just mysql_insert_id).

If using db_next_id (current code), in MySQL, is there a reason to use auto_increment in table definitions where the {sequences} table is used too?

db_insert_id() is an approach that can be ported to PostgreSQL (using currval instead of nextval) and other SQL engines.

Since Drupal 4.7 already breaks all 4.6 modules, I believe it is a good opportunity to think about possible/better alternatives to the {sequences} table. It is just that.

chx’s picture

markus, rest assured 4.8 (or 5.0) will break things all over. this should not happen for 4.7, too big a change. imagine the number of bugs introduced by changing all the queries and logic related to this.

markus_petrux’s picture

Status: Active » Postponed

Yes, that's a real possibility.

hmm... I know there's no roadmap in Drupal, but is there a TODO list or something similar?

Cvbge’s picture

Status: Postponed » Active

If using db_next_id (current code), in MySQL, is there a reason to use auto_increment in table definitions where the {sequences} table is used too?

No, I see no reason. In fact you can't use both, you need to stick with one way.

db_insert_id() is an approach that can be ported to PostgreSQL (using currval instead of nextval) and other SQL engines.

I agree about postgresql. I don't know about sqlite or oracle.

Since Drupal 4.7 already breaks all 4.6 modules, I believe it is a good opportunity to think about possible/better alternatives to the {sequences} table. It is just that.

I just think that if we keep introducing new features we'll never release 4.7. See recent cry on -devel list ;)

Cvbge’s picture

Status: Active » Postponed

Concurrent editing is bad ;)

markus_petrux’s picture

lol

I don't know about sqlite or oracle.

Look at ADOdb or phpBB DBAL. db_insert_id() would also be easily portable to MSSQL and others. ;)

I just think that if we keep introducing new features we'll never release 4.7. See recent cry on -devel list ;)

True

chx’s picture

markus_petrux’s picture

markus_petrux’s picture

Reading the development mailing list (Khalid's comment in the "Can someone confirm -- Minimum MySQL Privileges?" thread) reminded me of this issue.

Well, this issue is somewhat related: if there were a db_insert_id() function, there would be no need for the sequences table, and hence no reason to use LOCK TABLES.

Cvbge’s picture

It's needed by cache_set() and variable_set().

chx’s picture

I have something for 4.7 which keeps db_next_id but uses mysql_insert_id for mysql. http://drupal.org/node/55516

m3avrck’s picture

Any new updates on this? Sounds like a fairly straightforward plan to me, even if a few modules break, the benefits far outweigh the breakage.

moshe weitzman’s picture

Status: Postponed » Active
quicksketch’s picture

Version: x.y.z » 6.x-dev

Just putting into 6.x and giving a bump.

m3avrck’s picture

There is an entirely different concept being thrown around by Souvent22 and myself: http://earnestberry.com/node/13

sammys’s picture

I wholly support stopping use of the sequences table (yes this is a bottleneck). +1 for that!

+1 to changing db_next_id() to db_insert_id() and flipping the statement ordering of the insert query and call to db_insert_id().

Doing this will also avoid incrementing IDs if the insert query fails for some reason.

--
Sammy Spets
Synerger
http://synerger.com

chx’s picture

Title: Getting rid of the sequences table, using db_insert_id() instead of db_next_id() » Getting rid of the sequences table, using db_last_insert_id() instead of db_next_id()
Component: base system » database system
Assigned: Unassigned » chx
Priority: Normal » Critical
Status: Active » Needs review
FileSize
21.79 KB

All attempts to fix db_next_id failed, so get rid of it. There are two icky parts: batch -- but that's long-running anyway, so one more query does not matter. node_save on insert can't insert the relevant nid into node_revisions, however that's fixed soon after, and we rely anyway on all queries executing, so I am not too concerned about node_revisions.nid being 0 for a very short while -- there was always a gap because node_save was not a transaction, nor did it lock. Now the gap is a bit wider.

chx’s picture

Why now? Not earlier, because earlier we did not have subselects or the schema API, which standardized serial/auto_increment. Later is not an option either, because db_next_id is currently broken.

chx’s picture

chx’s picture

FileSize
21.8 KB

David Strauss caught a copy-paste error in form.inc

David Strauss’s picture

Status: Needs review » Needs work

Aside from an issue I told chx about in IRC that he's fixing right now, the patch needs to update all the *.schema files to make the columns autoincrement/serial. This solution should be portable to MSSQL. Does anyone know about Oracle or DB2?

dww’s picture

say you've got a bunch of your own tables, and you need to update a nid in all of them when you add a new node (this happens all the time in project*, for example, where you've got {node} and {node_revisions}, but then a bunch of project* specific tables for all the project-specific fields and data). how are you sure that db_last_insert_id() gives you the id of the last node you inserted? imagine:

  1. thread A inserts something into {node} (nid == 10)
  2. thread B inserts something else into {node} (nid == 11)
  3. thread A calls db_last_insert_id() and gets back 11, since that was the last thing inserted.
  4. thread B calls db_last_insert_id() and also gets back 11
  5. both threads start populating their own other tables with nid 11, and get duplicate key errors or worse when they race each other to see who gets their stuff written to the DB first. :(

i think i ran into exactly this problem with the cvs access control scripts or something, and there was no good way to ensure this worked right without adding another id to the {sequences} table, so each thread could safely get its own id at the start of its operations, and then consistently use the same id, regardless of who was actually updating which tables when.

seems like now, everyone has to wrap the insert and the call to db_last_insert_id() in a transaction or otherwise lock the DB, to prevent someone else from inserting something before you discover your id. your patch doesn't do this anywhere. so, unless i completely misunderstand what's going on (which is quite likely), i think this patch would introduce a bunch of evil race conditions in core, and generally make it harder for developers to get this right.

in short: big ol' -1 on removing the {sequences} table until these issues are clarified or fixed.

David Strauss’s picture

@dww Don't worry about it. LAST_INSERT_ID is connection (and hence thread) specific.
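A toy model of that guarantee (plain PHP, invented names; each "connection" tracks only its own last ID, mirroring dww's nid 10/11 scenario):

```php
<?php
// Each connection remembers the last ID *it* generated; another
// connection's insert cannot overwrite it. Not real MySQL code.
class ToyTable {
  private $auto_increment = 9;   // pretend nids 1..9 already exist
  public function next() { return ++$this->auto_increment; }
}
class ToyConnection {
  private $last_id = 0;
  public function insert(ToyTable $t) { $this->last_id = $t->next(); }
  public function last_insert_id() { return $this->last_id; }
}

$node = new ToyTable();
$a = new ToyConnection();   // thread A
$b = new ToyConnection();   // thread B

$a->insert($node);          // thread A gets nid 10
$b->insert($node);          // thread B gets nid 11

echo $a->last_insert_id(), ' ', $b->last_insert_id(); // 10 11
```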

add1sun’s picture

From a functionality review, the only thing I'm having issues with is comments.

When creating a comment I am getting this error with the patch, which goes away when the patch is reversed:

user warning: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '2, 0, 0, 'cvdv', 'svsdf', 1, '192.168.2.1', 1180487790, 0, 0, 'a:1:{i:0;i:0;}', ' at line 1 query: INSERT INTO comments (nid, pid, uid, subject, comment, format, hostname, timestamp, status, score, users, thread, name, mail, homepage) VALUES 2, 0, 0, 'cvdv', 'svsdf', 1, '192.168.2.1', 1180487790, 0, 0, 'a:1:{i:0;i:0;}', '02/', '', '', '') in /Users/addi/Sites/patch/includes/database.mysql.inc on line 163.

I went through users, taxonomy, aggregator, contact, blocks, nodes and things seemed to be working fine.

chx’s picture

Status: Needs work » Needs review
FileSize
21.8 KB

Comments fixed. David, the schema already provides serial for us because postgresql needs them.

David Strauss’s picture

I'd like to see benchmarks for this patch versus the old db_next_id() code. kbahley said in a related issue that InnoDB does a table lock to implement autoincrement, so increasing use of autoincrement might not help versus the old db_next_id() table-locking code.

dww’s picture

Status: Needs review » Needs work

@david strauss: yay, the missing bit of completely undocumented, non-obvious wisdom that makes everything work. ;) thanks for clarifying... i knew i must have been wrong, i just wasn't sure why. ;)

@chx: therefore, please just add some doxygen to db_last_insert_id() so others know about this. not only will it help developers know they can use this function safely, it'll help remind people adding support for other DBs that they better make sure their version provides the same semantics we're assuming.

@all: thanks!

$dww_vote += 2;

add1sun’s picture

yup, comments fixed now. looks good

add1sun’s picture

Status: Needs work » Reviewed & tested by the community

no more broken bits. chx said to rtbc

chx’s picture

Yes. Because there is no need for an update path, system_update_6019 already solved that for us. yay.

dww’s picture

darn busy threads. my last comment didn't mean to set to "needs work"... sorry about that.

David Strauss’s picture

@dww The archaic runes documenting LAST_INSERT_ID are carefully obscured from the eyes of mere mortals on the "How to Get the Unique ID for the Last Inserted Row" MySQL manual page:

For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a non-magic value (that is, a value that is not NULL and not 0). Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed.

dww’s picture

@david strauss: i just meant "undocumented" in the sense of not documented anywhere in the drupal code or in this patch. that said, thanks for the pointer to the mysql docs on this.

recidive’s picture

Great patch!

Question: why not just db_insert_id() rather than db_last_insert_id()?

bjaspan’s picture

It is late and I'm very tired, so I'm probably missing something. db_last_insert_id() for pgsql takes a $name argument and db_last_insert_id() for mysql does not. Is it just that mysql never needs a name because there is only a single last-id per connection?

Even so, the mysql function ought to require the same parameters as the pgsql function (a) for consistency and (b) so that module developers do not break pgsql sites by concluding that if it works without an arg for mysql, that must be okay.
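A sketch of the consistent-signature approach bjaspan suggests (the parameter names and the {table}_{field}_seq naming convention are my assumptions, not code from the patch):

```php
// database.mysql.inc -- takes the same arguments for API consistency,
// even though MySQL's per-connection LAST_INSERT_ID() ignores them.
function db_last_insert_id($table, $field) {
  return db_result(db_query('SELECT LAST_INSERT_ID()'));
}

// database.pgsql.inc -- needs the table and field to name the implicit
// sequence that a serial column creates.
function db_last_insert_id($table, $field) {
  return db_result(db_query("SELECT currval('{%s}_%s_seq')", $table, $field));
}
```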

I haven't reviewed the patch any further than this. I seem to recall while adding type 'serial' support to Schema API that at one point I thought, "Ah, yes, *this* is why db_next_id() is necessary and always using auto-increment is not sufficient." But I don't remember why.

kbahey’s picture

@David Strauss, kbahley? The last person who misspelled my nick had his insurance rates go up ... ;-)

What is the risk of this patch creating a lot of "ghost" rows in the database tables? Can it insert incomplete stuff and then the user abandons the operation, and these are left in the database causing confusion for modules and the like?

David Strauss’s picture

@bjaspan I expressed the exact same sentiments in conversation with chx earlier this evening.

dww’s picture

Status: Reviewed & tested by the community » Needs work

@bjaspan and david -- i agree. therefore, this still needs work before it's RTBC.

joshk’s picture

I fear this. Why, you might ask? Because the sequences table has saved my ass on numerous occasions.

I realize that a smart DBA can determine all this information from her database, and even set gaps in ids or mass-update auto_increments to help with content migrations, but it's still very nice to have a central place where all my important drupal ids live. I would be sorry to see it go.

However, if it's already gone from postgres, well, maybe it's just time. (sniff).

chx’s picture

Status: Needs work » Needs review
FileSize
22.51 KB

Doxygen'd the new function, made a better db_last_insert_id for postgresql 8.1+ and used better parameters for db_last_insert_id and made them mandatory.

Dries’s picture

Very interesting patch, and an incredible amount of overnight activity in this issue. What did you guys have for dinner? :)

David mentioned he'd like to understand the performance implications, and I tend to agree. It would be nice to understand this a bit better.

From http://www.mysqlperformanceblog.com/2007/05/09/mysql-users-conference-in... :

Yet another known Innodb problem is table locks which are taken for statement duration for Inserts with auto-increment column. This starts giving you problems if you have a lot of concurrent inserts happening to the same table.

This was born same as bunch of others due to statement level MySQL Replication which among other requirements, needs all auto-increment values in multiple value insert to be sequential.

To be honest Heikki probably took a bit of shortcut and kept code simple and same for all insert cases - in fact if the insert is single value insert or the number of values in the bulk is known you can allocate all values at once when statement is started instead of handling auto increment lock for full statement duration.

The problem with multi value inserts however is the following - even if you know number of values in insert statement you do not know if all of them need autoincrement value allocated, some may already have values specified. I’m not sure if Heikki will find a way to count exact number of auto increment values needed or if we get behavior change by having potential “holes” when auto increment values are assigned but never used. There are probably similar problems in INSERT IGNORE and ON DUPLICATE KEY UPDATE cases.

Now In MySQL 5.1+ it is also possible to fix this problem for cases when number of rows in auto-increment batch insert is not known as we do not have to allocate sequential auto increment values if row level replication is used. This however will be again behavior change to watch out for.

So these are all possibilities but are we expected to have the problem fixed soon ? Yes indeed there is patch out where already which may come in MySQL 5.2 or even MySQL 5.1. I surely would like to see it sooner at least in more flexible community version.

I don't think it is a show-stopper though. In fact, I think we might have a winner. :)

chx’s picture

FileSize
23.15 KB

Even better postgresql code but it mandates one more argument. Noted that db_last_insert_id is thread safe.

chx’s picture

That innodb stuff is only relevant if you have a transaction or an extended insert. We have neither. If someone has postgresql below 8.1 and has changed the default DB configuration to not have OIDs, they will have to do some magic with their database to get this working. Not my problem, though.

chx’s picture

FileSize
22.56 KB

Further Googling and sammys' tests show that currval is per connection. So I went back to the currval-based solution because it's thread-safe and a hell of a lot simpler.

chx’s picture

FileSize
22.11 KB

Removed lastval as it's not needed -- no oids, no version trouble.

sammys’s picture

Tests confirmed that currval operates per connection. FYI: I used two DB shell sessions and a table with a serial field, inserted a record into the table from each session, then called currval. The value returned by the session with the first insertion remained the same after the insertion in the other session. QED pffft
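The test sammys describes looks roughly like this in two concurrent psql sessions (table and values invented):

```sql
-- Session A:
CREATE TABLE t (id serial PRIMARY KEY, v text);
INSERT INTO t (v) VALUES ('a');
SELECT currval('t_id_seq');   -- returns 1

-- Session B:
INSERT INTO t (v) VALUES ('b');
SELECT currval('t_id_seq');   -- returns 2

-- Session A again:
SELECT currval('t_id_seq');   -- still 1: currval is per connection
```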

--
Sammy Spets
Synerger
http://synerger.com

Owen Barton’s picture

Subscribing

quicksketch’s picture

@chx Why the naming of the function as db_last_insert_id() instead of db_insert_id()?

While db_last_insert_id() matches the wording used in the SQL, db_insert_id() would bring it in line with the PHP functions mysql_insert_id() and mysqli_insert_id().

David Strauss’s picture

That innodb stuff is only relevant if you have a transaction or an extended insert.

@chx InnoDB treats each statement as an individual transaction unless you're in a larger transaction. (This is why the mode is called "autocommit.") So, to say it's "only relevant if you have a transaction or extended insert" is true but misleading.

David Strauss’s picture

@quicksketch It's called mysql_insert_id in the MySQL library because that's the only way to get an ID in MySQL -- after insert. Other databases let you get it before, so it's ambiguous to just call it db_insert_id when you support databases like PostgreSQL.

quicksketch’s picture

Thanks for the explanation, David. Sounds like a winner to me.

quicksketch’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
23.83 KB

Works awesome. The patch needed a db_last_insert_id() moved from upload.module to file.inc to keep up with the file-system overhaul that went in last night. We're ready to roll with this one.

Crell’s picture

Oh the things one misses when one goes out of town for a few days...

My concern with dropping sequences in favor of auto_increment is twofold. One, all the DB guys I talk to keep insisting that sequences are more flexible than auto-increment. One could easily point out that while we can emulate auto-increment on sequence-based databases, one can emulate sequences on auto-increment-based databases, too. I do not claim to be an expert on the subject, however.

Two, one of my goals for Drupal 7 at this point is to finish the PDO driver I started but had to back-burner; eventually we should move to an all-PDO backend for better abstraction, flexibility, and security. That leads to this: http://us.php.net/manual/en/function.PDO-lastInsertId.php . Specifically, lastInsertId only sometimes works. There's no indication that I can see of which PDO drivers support auto-increment. (I'd presume the MySQL one does, but I don't know about the others.)

Can anyone sooth my concern that switching Drupal 6 to auto-increment instead of sequences won't make a PDO driver for Drupal 7 that much more painful?

chx’s picture

I can't claim to know PDO but look at the pgsql code, it uses sequences still. So I guess here you would need database specific stuff still, there is no way to avoid that. Same with schema, I believe.

kbahey’s picture

@Crell

If PDO is going to set us back with more restrictions, caveats, gotchas, etc., then we really need to evaluate whether going that route is worth it or not (benefits vs. restrictions).

@all

I tested this patch on a clean database of HEAD, and things are working well. All hail the auto_increment.

There was a bunch of menu errors though, like:

Undefined index: mlid in ../includes/menu.inc on line 1340.

Not sure if this is related to this patch or something else.

+1 from me.

David Strauss’s picture

To add to kbahey's post, we need to reconsider what this patch is trying to solve. The only released GNU/Linux distribution that we thought had a flawed version of MySQL was Ubuntu 7.04, but we later discovered that it uses the enterprise version, which doesn't have the problem. Debian testing seems to use a flawed version, but it's not clear that Debian will even keep that version for the next release. We have not verified that Debian testing actually has the flaw.

kbahey’s picture

Found a problem when creating comments:

When reviewing a comment, I get:

    * notice: Undefined property: stdClass::$nid in ../modules/comment/comment.module on line 1749.
    * notice: Undefined property: stdClass::$cid in ../modules/comment/comment.module on line 1757.
    * notice: Undefined property: stdClass::$comment in ../modules/comment/comment.module on line 1761.
    * notice: Undefined property: stdClass::$format in ../modules/comment/comment.module on line 1761.
    * notice: Undefined property: stdClass::$subject in ../modules/comment/comment.module on line 1845.
    * notice: Undefined property: stdClass::$cid in ../modules/comment/comment.module on line 1845.

Then when clicking submit, I get this, and the comment is NOT created.

    * notice: Undefined index: name in ../modules/comment/comment.module on line 822.
    * user warning: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '2, 0, 0, 'comment 1', 'comment 1', 1, '192.168.0.249', 1180570374, 0, 0, 'a:1:{i' at line 1 query: INSERT INTO comments (nid, pid, uid, subject, comment, format, hostname, timestamp, status, score, users, thread, name, mail, homepage) VALUES 2, 0, 0, 'comment 1', 'comment 1', 1, '192.168.0.249', 1180570374, 0, 0, 'a:1:{i:0;i:0;}', '01/', '', '', '') in ../includes/database.mysql.inc on line 163.
kbahey’s picture

Status: Reviewed & tested by the community » Needs work

I backed out the patch, and tried again creating a comment. The preview errors still appear.

However when pressing Submit, the comment IS INDEED CREATED, and only one error appears:

notice: Undefined index: name in /../modules/comment/comment.module on line 823.

So the patch needs work, at least for me.

Crell’s picture

Whatever is currently Debian Testing is virtually guaranteed to never be a "stable" release. Debian Etch was just released as stable, so the next stable is at least a year away. We can/should ignore whatever version it has in it because it will change a few dozen times in the next few months. Etch/stable and maybe Sarge/old-stable are the only Debians we should worry about in terms of bug-workarounds.

<--Former Debian user, now running, of course, Ubuntu Feisty. :-) I wonder if Ubuntu's bug/sec fixes have included a fix for this yet. I'll have to check tonight if I can, since I'm on a hell project this week and have almost no Drupal time. :-(

kbahey’s picture

Status: Needs work » Needs review
FileSize
24.9 KB

Here is a reroll of the patch with a fix for the comments. A missing paren was the culprit.

David Strauss’s picture

@Crell Ubuntu Feisty isn't vulnerable in the first place. Ubuntu uses the enterprise, not community, version of MySQL.

chx’s picture

Status: Needs review » Reviewed & tested by the community

This is a feature request from ages ago. Instead of our somewhat hackish solution, we let the database do what a database should do. The result is less code in Drupal, better-working code in Drupal, etc.

Dries’s picture

It would be great to see someone test this patch on PostgreSQL.

Another good reason for this patch is to support master-master configurations with MySQL.

matt_paz’s picture

I love, love, love the fact that action is being taken on this! Hooray!

bjaspan’s picture

I'll try this on pgsql "real soon now."

asimmonds’s picture

Status: Reviewed & tested by the community » Needs work

The warnings mentioned in #61 for a fresh install are still generated:

# notice: Undefined index: mlid in .\includes\menu.inc on line 1341.
# notice: Undefined index: mlid in .\includes\menu.inc on line 1518.

When building a new menu, we have a problem with the parents array, the comment in _menu_link_parents_set() says:

  // The parent (p1 - p6) corresponding to the depth always equals the mlid.

but with this patch, we can't set one of these parents to the mlid because we don't know the mlid yet. In the resulting installation, the menu doesn't expand as you navigate.

chx’s picture

Status: Needs work » Needs review
FileSize
27.09 KB
dvessel’s picture

Tested patch #73, installed with no notices. All the Nav links were intact..

This was with a mysqli connection using ver. 4.1.21 and PHP 5.1.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Then it's ready.

asimmonds’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
28.64 KB

Doing a quick test under postgres 8.2.4, when adding a new page node I get:

    * warning: pg_query() [function.pg-query]: Query failed: ERROR: null value in column "nid" violates not-null constraint in .\includes\database.pgsql.inc on line 161.
    * user warning: query: INSERT INTO node_revisions (title, body, teaser, timestamp, uid, format, log) VALUES ('page one', 'page one', 'page one', 1180824062, 1, 1, '') in .\includes\database.pgsql.inc on line 180.

Comparing the pgsql and mysql databases, the nid column in node_revisions has a default of 0 in mysql but not in pgsql.
Adding a 'default 0' to the node_revisions schema fixes this, but is that out of scope for this patch?

@@ -65,7 +65,7 @@ function node_schema() {

   $schema['node_revisions'] = array(
     'fields' => array(
-      'nid'       => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE),
+      'nid'       => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE, 'default' => 0),
       'vid'       => array('type' => 'serial', 'unsigned' => TRUE, 'not null' => TRUE),
       'uid'       => array('type' => 'int', 'not null' => TRUE, 'default' => 0),
       'title'     => array('type' => 'varchar', 'length' => 128, 'not null' => TRUE, 'default' => ''),

Also, something is wrong with node revisions: nid is left 0 for new revisions, and the vid in the node table doesn't get set to the new revision id. This happens under both mysql and pgsql.

In the attached patch, I've moved the update-node-table query builder after the insert or update of the node_revisions table. This seems to have fixed node revisions under mysql and pgsql.

David Strauss’s picture

@asimmonds Shouldn't {node_revisions} include the actual nid in the insert? Even if we're not using real foreign keys, inserting into {node_revisions} without specifying an nid is bad.

chx’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
5.33 KB

asimmonds' patch is great; it just had code-style errors.

David, the problem is that we cannot have two autoincrement columns at the same time. So we first insert into revisions with a nid of 0, then we insert into node and update node_revisions, because we can JOIN the two tables on vid. If by "bad" you mean that the database is in an incorrect state before the UPDATE, sure it is, but it is no different from before, actually -- if node_save does not complete, you have a problem. But even so, the UPDATE is written in such a way that it updates every row with the relevant vid where the nid is zero -- so Drupal tries very hard to return the database to correctness. There is no unique key on nid (thank god), so having (0, vid) pairs is not a problem.
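A minimal sketch of that insert order in SQL (table and column lists are abbreviated, and the literal IDs 7 and 42 are purely illustrative):

```sql
-- 1. Insert the revision first; the nid is not known yet, so use 0 as a placeholder.
INSERT INTO node_revisions (nid, title) VALUES (0, 'page one');      -- autoincrement yields vid, e.g. 7

-- 2. Insert the node, carrying the vid just generated.
INSERT INTO node (vid, type, title) VALUES (7, 'page', 'page one');  -- autoincrement yields nid, e.g. 42

-- 3. Repair the placeholder; restricting on nid = 0 makes this safe to re-run
--    if node_save is interrupted and retried.
UPDATE node_revisions SET nid = 42 WHERE vid = 7 AND nid = 0;
```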

David Strauss’s picture

@chx I'm fully aware of the limitations of autoincrement in MySQL. I'm saying that we need to insert into {node} before we create a corresponding record in {node_revisions}. Even though we don't use referential integrity, we should interact with the database so its addition would be transparent. Conceptually, {node_revisions} has a foreign key to {node}. No record should ever exist in {node_revisions} without a corresponding record in {node}.

Bad:
1. Insert into {node_revisions} with nid = 0.
2. Insert into {node}.
3. Update {node_revisions} with the new nid.

Good:
1. Insert into {node}.
2. Insert into {node_revisions}.
3. Update {node} with the new vid.

Really Good (probably infeasible):
1. Begin transaction.
2. Insert into {node}.
3. Insert into {node_revisions}.
4. Update {node} with the new vid.
5. Commit transaction.

Because we use INNER JOINs between {node} and {node_revisions}, having a record in {node} without a corresponding record in {node_revisions} will not cause significant problems. If anything only refers to {node}, then we need to have it require vid > 0.
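The "Really Good" variant above, sketched as SQL under the assumption of a backend with transactions and currval()-style sequence reads (PostgreSQL syntax; the sequence names are the ones PostgreSQL would generate for serial columns, and the column lists are abbreviated):

```sql
BEGIN;
-- 1. Create the node shell; the vid is not known yet.
INSERT INTO node (vid, type, title) VALUES (0, 'page', 'page one');
-- 2. Create the revision, pointing at the nid generated on this connection.
INSERT INTO node_revisions (nid, title)
  VALUES (currval('node_nid_seq'), 'page one');
-- 3. Backfill the denormalized vid on the node row.
UPDATE node SET vid = currval('node_revisions_vid_seq')
  WHERE nid = currval('node_nid_seq');
COMMIT;
```

Because both currval() calls read values assigned within this session, concurrent inserts on other connections cannot leak into the backfill.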

chx’s picture

I need vid first, which is the true PK here, and nid second; hence the order of inserts.

bjaspan’s picture

David is right. If node is the master table for nodes with nid as the PK, then we (implicitly or explicitly) want to have a foreign key from node_revisions.nid to node.nid. That means that we have to insert into node first because node_revisions.nid cannot point to a non-existent node.nid (* see below).

chx says "I need vid first ... and nid second hence the order of inserts." Looking at the patch, it seems he needs vid first because he needs to insert it into the node.vid column. So he does: (1) insert into node_revisions with an invalid nid and save the new vid, (2) insert into node, specifying the vid, and save the new nid, (3) update node_revisions with the saved nid.

Unless I'm mistaken, we can accomplish the same goal and provide ref integrity by changing the order: (1) insert into node with invalid vid, save nid, (2) insert node_revisions, specifying the nid, save vid, (3) update node with saved vid.

However, I'm speculating about something else. Why does node have a vid column at all? Revisions are a one-to-many property of a node. So are taxonomy terms, but we don't have a tid column in node. The answer is "because node_load wants to load exactly one revision, either the newest or a specified one." That's a perfectly valid operational decision, but I wonder whether, by having a vid column, we are representing that operational decision in the database in a way it should not be, and it is causing (or will cause) us grief regarding ref integrity.

node_load, in the "get the newest revision" mode, could just as easily say:

SELECT * FROM node n INNER JOIN node_revisions r ON n.nid = r.nid WHERE n.nid = %d ORDER BY r.vid DESC LIMIT 1

So, we resolve the problem with node.vid and can remove it from the node table entirely. True, we add a sort to every node load, but there are not usually many revisions, and sorting by a primary key (vid) ought to be very fast.

If we decide that the ORDER BY r.vid DESC LIMIT 1 solution is no good for performance reasons, then we should acknowledge that node.vid exists purely as a cache of the most recent vid per node. We might even rename it to node.vid_cache just to be clear about that.

To wrap up, I agree with David's "Good" scenario. Unless I'm totally on crack, in which case I'm sure someone will point that out to me. :-)

* Footnote: Actually, we could insert into node_revisions first by putting a sentinel (dummy) entry in the node table with nid = -1 (not 0, as that will confuse auto_increment). But that is ugly and likely to cause other problems.

bjaspan’s picture

Status: Reviewed & tested by the community » Needs work

Since I'm actually hoping we can maybe get ref integrity into D6 now that we have more time, I don't feel like this patch is ready until we (a) change it so ref integrity is possible or (b) explicitly decide that is not a goal.

chx’s picture

Status: Needs work » Reviewed & tested by the community

Referential integrity won't help you here. But the biggest problem is that there is a unique key on node.vid, so if two threads manage to insert into node with a vid of zero (before the INSERT INTO node_revisions and the UPDATE), then you are thoroughly hosed. There is no such unique index on node_revisions.nid, as it is truly not unique. I have chosen this order rather carefully.

David Strauss’s picture

@chx If we want to keep the unique key on {node}.vid, we just have to allow it to be NULL. Unique keys (unlike primary keys) only require non-null values to be unique. Furthermore, if we formally establish {node}.vid as a foreign key of {node_revisions}.vid, the NULL values will work there, too. Using zero as a pseudo-null value is kind of a hack. In any case, node.vid is a denormalization, so it's a lower priority to maintain its integrity than the real data.
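What David describes could look like this in PostgreSQL DDL (a sketch only; the constraint and index names are made up, and Drupal's schema API would express this differently):

```sql
-- Allow NULL in node.vid; a unique index ignores NULLs, so many
-- freshly-inserted node rows can coexist before their vid is backfilled.
ALTER TABLE node ALTER COLUMN vid DROP NOT NULL;
CREATE UNIQUE INDEX node_vid_key ON node (vid);

-- Formalize the conceptual foreign key; NULL vid values pass the check,
-- so the "not yet revised" state never violates integrity.
ALTER TABLE node
  ADD CONSTRAINT node_vid_fk FOREIGN KEY (vid) REFERENCES node_revisions (vid);
```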

@bjaspan Even removing {node}.vid doesn't necessarily force a sort if {node_revisions} has a (nid, vid) index. I'll have to check this, though.

David Strauss’s picture

@bjaspan Also, taxonomy terms have a many-to-many relationship with nodes. Restricting it further (to, say, many-to-one) is only done in the application software.

chx’s picture

Assigned: chx » Unassigned
Status: Reviewed & tested by the community » Needs work

Then write it. I do not care. I tried, again, again and again and I do not see any real reasons here. Surely pie in the sky is great.

chx’s picture

Status: Needs work » Needs review
FileSize
28.52 KB

Any more future dreams we want to support?

David Strauss’s picture

Shiny on #drupal brought up an interesting point about this patch: reading the sequence from PostgreSQL after the INSERT isn't guaranteed to return the same value used in the INSERT. Unlike MySQL's LAST_INSERT_ID(), PostgreSQL's sequences aren't per-connection. They're global.

Options:
* Encapsulate INSERTs and the sequence lookup in transactions.
* Provide a db_insert() function that returns the ID and properly juggles sequences and INSERTs.
* Abandon this patch and just use db_next_id().

Previously, dww raised a similar objection about the MySQL implementation. My response in #29 remains correct: LAST_INSERT_ID() is thread/connection-specific. However, PostgreSQL sequences do not have LAST_INSERT_ID() semantics.
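For reference, a sketch of the per-connection behaviour under discussion (PostgreSQL; the sequence name is illustrative). currval() reports the value that nextval() last returned in the current session, so a competing INSERT on another connection cannot change what this connection reads -- whereas reading the sequence's live value, or SELECTing MAX(nid), would race exactly as feared:

```sql
-- Connection A
INSERT INTO node (type, title) VALUES ('page', 'A');  -- nextval() assigns nid 10

-- Connection B (concurrently)
INSERT INTO node (type, title) VALUES ('page', 'B');  -- nextval() assigns nid 11

-- Connection A, afterwards
SELECT currval('node_nid_seq');  -- returns 10, not 11: currval() is session-local
```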

David Strauss’s picture

Status: Needs review » Needs work

The race condition in PostgreSQL needs fixing.

Shiny’s picture

Status: Needs work » Needs review

Let me clarify by quoting dww:

thread A inserts something into {node} (nid == 10)
thread B inserts something else into {node} (nid == 11)
thread A calls db_last_insert_id() and gets back 11, since that was the last thing inserted.
thread B calls db_last_insert_id() and also gets back 11
both threads start populating their own other tables with nid 11, and get duplicate key errors or worse when they race each other to see who gets their stuff written to the DB first. :(

The above is solved by MySQL's last_insert_id() being per-connection -- however, you're not calling last_insert_id() in PostgreSQL; you're instead doing exactly what dww describes above.

MySQL lacks sequences. Every other database I've ever used implements sequences. Drupal implements these with a table for MySQL. It works. Why fix it?

Shiny’s picture

Status: Needs review » Needs work

a sequencing error caused a race condition on this thread.
Changing status to (code needs work)

chx’s picture

Status: Needs work » Needs review

#50, #52 .

chx’s picture

I am so very happy that everyone is such a big database theorist -- which has, by now, needlessly delayed this patch twice -- but reading the issue, which clearly states that currval is per connection, or testing the patch: that's too much to ask.

Shiny’s picture

Apologies -- I've tested, and yes, it is per connection on pg 7.4 and 8.1.

this will instead become another issue for those wanting persistent connections. If we're going this way, can we remove the comment in includes/database.pgsql.inc about persistent connections improving performance?

David Strauss’s picture

@Shiny What is the issue with persistent connections?

chx’s picture

ENOUGH! Drupal does not support persistent connections. And if someone were to hack that in, they would have problems with, for example, search's use of temporary tables.

And guys, if anyone else follows up with more bullshit, I swear that I will go into database level and remove the followup.

chx’s picture

Status: Needs review » Closed (fixed)

I have submitted a new issue instead of this poor abused one, and I am not linking it; if you want to abuse that one too, then at least work a little.