It seems the {sequences} table is used only for MySQL. Looking at the code for the PostgreSQL api:

/**
 * Return a new unique ID in the given sequence.
 *
 * For compatibility reasons, Drupal does not use auto-numbered fields in its
 * database tables. Instead, this function is used to return a new unique ID
 * of the type requested. If necessary, a new sequence with the given name
 * will be created.
 */
function db_next_id($name) {
  $id = db_result(db_query("SELECT nextval('%s_seq')", db_prefix_tables($name)));
  return $id;
}

whereas the MySQL api looks like this:

/**
 * Return a new unique ID in the given sequence.
 *
 * For compatibility reasons, Drupal does not use auto-numbered fields in its
 * database tables. Instead, this function is used to return a new unique ID
 * of the type requested. If necessary, a new sequence with the given name
 * will be created.
 */
function db_next_id($name) {
  $name = db_prefix_tables($name);
  db_query('LOCK TABLES {sequences} WRITE');
  $id = db_result(db_query("SELECT id FROM {sequences} WHERE name = '%s'", $name)) + 1;
  db_query("REPLACE INTO {sequences} VALUES ('%s', %d)", $name, $id);
  db_query('UNLOCK TABLES');

  return $id;
}

1) The description of the PostgreSQL function is wrong: in fact, PostgreSQL supports sequences natively, with no need for the {sequences} table.

2) To get rid of the {sequences} table once and for all, the function could look like:

/**
 * MySQL version
 * The argument is ignored here; the function returns the ID generated by the latest INSERT statement.
 */
function db_next_id($name = '') {
  global $active_db;
  return @mysql_insert_id($active_db);
}
/**
 * PostgreSQL version
 */
function db_next_id($name) {
  $id = db_result(db_query("SELECT currval('%s_seq')", db_prefix_tables($name)));
  return $id;
}

However, the usage would be different. With that change, one has to execute the INSERT statement first and then invoke db_next_id() to get the value.

The only problem is that all scripts that currently use db_next_id() would have to be adapted, but the rest is all benefits, even performance improvements, I believe.
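To make the flipped ordering concrete, here is a minimal pure-PHP simulation (the "table" is just an array; all names are invented, and this is not Drupal code):

```php
<?php
// Toy in-memory "table" plus an auto-increment counter. This only
// illustrates the calling pattern, not real database behaviour.
$rows = array();
$auto_increment = 0;
$last_insert_id = 0;

// New pattern, step 1: insert first; the "engine" assigns the ID.
function toy_insert(array &$rows, &$auto_increment, &$last_insert_id, array $row) {
  $auto_increment++;
  $rows[$auto_increment] = $row;
  // This is the value mysql_insert_id() / currval() would later report.
  $last_insert_id = $auto_increment;
}

// New pattern, step 2: ask which ID the last insert was given.
function toy_last_insert_id($last_insert_id) {
  return $last_insert_id;
}

toy_insert($rows, $auto_increment, $last_insert_id, array('title' => 'first'));
$id = toy_last_insert_id($last_insert_id);
echo $id; // 1
```

The old pattern simply reverses the two steps: reserve an ID from {sequences} first, then write it into the INSERT explicitly.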

Thoughts?


Comments

markus_petrux’s picture

Changing the meaning of the function could be somewhat dangerous if a contrib module has not been adapted.

Maybe it would be wiser to rename the function to something like db_insert_id(). Since there are many other things that contrib module authors must do to make their modules compatible with 4.7, it seems a good opportunity to try to implement this change as well.

Cvbge’s picture

Priority: Critical » Normal

Hello,

is this really a 'critical' issue? I've lowered it to normal.

We could use mysql_insert_id() and currval(), but there are some difficulties:

  1. You can't use the sequences table and auto-increment at the same time for the same table, so we probably can't make a slow, partial transition.
  2. There is a slight difference between mysql_insert_id() and currval(): the first returns 0 if no auto-increment was used, while the second throws an error, so we would have to write in capital letters "use only if you're sure you used an auto increment".

We also could have another db_last_insert_id() "macro" which could be used e.g. in this way:

db_query("insert into {table with auto increment} values ('%s')", 'foo');
db_query("insert into {different table} values (%s, '%s')", db_last_insert_id(), 'bar');

and db_last_insert_id() would return LAST_INSERT_ID()/lastval() for mysql/pgsql.
That would be useful, because a new db_insert_id() for postgresql would still have to do a select('currval()'), which is slower...
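A rough sketch of what the two per-driver functions might look like (pseudocode on my part; the function name and the '%s_seq' sequence-naming convention are assumptions, not code from any patch):

```php
// database.mysql.inc -- sketch: per-connection, no {sequences} table needed.
function db_insert_id() {
  global $active_db;
  return mysql_insert_id($active_db);
}

// database.pgsql.inc -- sketch: read back the value the serial column's
// implicit sequence just handed out on this connection.
function db_insert_id($name) {
  return db_result(db_query("SELECT currval('%s_seq')", db_prefix_tables($name)));
}
```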

This is post-4.7 probably...

Useful links:
http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html
http://www.postgresql.org/docs/current/static/functions-sequence.html
http://pl.php.net/mysql_insert_id

markus_petrux’s picture

If that helps, the method I pointed to is used for phpBB's DBAL.

I believe it would at least avoid having the {sequences} table and that lock/replace/unlock for MySQL.

As per performance, in PostgreSQL, if you use currval() before inserting, probably the INSERT will run faster, as there are chances that the same record is still in cache.

Cvbge’s picture

As per performance, in PostgreSQL, if you use currval() before inserting, probably the INSERT will run faster, as there are chances that the same record is still in cache.

Could you elaborate?

markus_petrux’s picture

Oops, sorry. In my last sentence I meant the opposite.

As per performance, in PostgreSQL, if you use currval() after inserting, there are chances that the record is still in cache, which was updated by the previous INSERT.

Something else to consider with the current method, for postgresql, is that db_next_id() doesn't lock the record/table. It might happen that by the time the INSERT is done, another process has incremented the sequence (a race condition), which would generate a duplicate key error.

Cvbge’s picture

As per performance, in PostgreSQL, if you use currval() after inserting, there are chances that the record is still in cache, which was updated by the previous INSERT.

Comparing $id = select nextval(); insert ($id) (the current code) with insert (); $id = select currval() (the proposed db_insert_id()), I think they will have the same speed.

The db_last_insert_id() idea comes from the fact that even with a new db_insert_id() you'd have to do insert(); $id = select currval(); insert($id). With db_last_insert_id() you could do insert(); insert(lastval()).

Each insert and select is one db_query() call. So with db_last_insert_id() you could do 2 db_query() calls instead of 3. That's not much, and we won't see any noticeable gain, but since we're using auto-increments we might as well use all of their features.
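In pseudocode (table and column names invented), the two call sequences being counted look like this:

```php
// With db_insert_id(): three db_query() round trips.
db_query("insert into {a} (x) values ('%s')", 'foo');
$id = db_insert_id();
db_query("insert into {b} (aid, y) values (%d, '%s')", $id, 'bar');

// With the db_last_insert_id() "macro" expanded inside the SQL: two.
db_query("insert into {a} (x) values ('%s')", 'foo');
db_query("insert into {b} (aid, y) values (%s, '%s')", db_last_insert_id(), 'bar');
```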

Something else to consider with the current method, for postgresql, is that db_next_id() doesn't lock the record/table. It might happen that by the time the INSERT is done, another process has incremented the sequence (a race condition), which would generate a duplicate key error.

Hm, did you mean postgresql or mysql? Because postgresql's sequences do not need locking and have no race conditions.

markus_petrux’s picture

Title: Getting rid of the sequences tables adding support for nextid in mysql » Getting rid of the sequences table, using db_insert_id() instead of db_next_id()

I agree there might be little difference with PostgreSQL. I should say I've been unfortunate with my comments here. I do not have much experience with postgresql; I've only used it a couple of times. I was just thinking about possible issues...

The difference seems clearer with MySQL, where the {sequences} table is used. If there were a db_insert_id() (i.e. to call after INSERT statements), there would be no need for the {sequences} table (i.e. no lock table, select, replace, unlock table; just mysql_insert_id).

If using db_next_id (current code), in MySQL, is there a reason to use auto_increment in table definitions where the {sequences} table is used too?

db_insert_id() is an approach that can be ported to PostgreSQL (using currval instead of nextval) and other SQL engines.

Since Drupal 4.7 already breaks all 4.6 modules, I believe it is a good opportunity to think about possible/better alternatives to the {sequences} table. It is just that.

chx’s picture

markus, rest assured 4.8 (or 5.0) will break things all over. this should not happen for 4.7, too big a change. imagine the number of bugs introduced by changing all the queries and logic related to this.

markus_petrux’s picture

Status: Active » Postponed

Yes, that's a real possibility.

hmm... I know there's no roadmap in Drupal, but is there a TODO list or something similar?

Cvbge’s picture

Status: Postponed » Active

If using db_next_id (current code), in MySQL, is there a reason to use auto_increment in table definitions where the {sequences} table is used too?

No, I see no reason. In fact you can't use both, you need to stick with one way.

db_insert_id() is an approach that can be ported to PostgreSQL (using currval instead of nextval) and other SQL engines.

I agree about postgresql. I don't know about sqlite or oracle.

Since Drupal 4.7 already breaks all 4.6 modules, I believe it is a good opportunity to think about possible/better alternatives to the {sequences} table. It is just that.

I just think that if we keep introducing new features we'll never release 4.7. See recent cry on -devel list ;)

Cvbge’s picture

Status: Active » Postponed

Concurrent editing is bad ;)

markus_petrux’s picture

lol

I don't know about sqlite or oracle.

Look at ADOdb or phpBB DBAL. db_insert_id() would also be easily portable to MSSQL and others. ;)

I just think that if we keep introducing new features we'll never release 4.7. See recent cry on -devel list ;)

True

chx’s picture

markus_petrux’s picture

markus_petrux’s picture

Reading the development mailing list (Khalid's comment in the "Can someone confirm -- Minimum MySQL Privileges?" thread) reminded me of this issue.

Well, this issue is somewhat related: if there were a db_insert_id() function, there would be no need for the sequences table, and hence no reason to use LOCK TABLES.

Cvbge’s picture

It's needed by cache_set() and variable_set().

chx’s picture

I have something for 4.7 which keeps db_next_id but uses mysql_insert_id for mysql. http://drupal.org/node/55516

m3avrck’s picture

Any new updates on this? Sounds like a fairly straightforward plan to me, even if a few modules break, the benefits far outweigh the breakage.

moshe weitzman’s picture

Status: Postponed » Active
quicksketch’s picture

Version: x.y.z » 6.x-dev

Just putting into 6.x and giving a bump.

m3avrck’s picture

There is an entirely different concept being thrown around by Souvent22 and myself: http://earnestberry.com/node/13

sammys’s picture

I wholly support stopping use of the sequences table (yes this is a bottleneck). +1 for that!

+1 to changing db_next_id() to db_insert_id() and flipping the statement ordering of the insert query and call to db_insert_id().

Doing this will also avoid incrementing IDs if the insert query fails for some reason.

--
Sammy Spets
Synerger
http://synerger.com

chx’s picture

Title: Getting rid of the sequences table, using db_insert_id() instead of db_next_id() » Getting rid of the sequences table, using db_last_insert_id() instead of db_next_id()
Component: base system » database system
Assigned: Unassigned » chx
Priority: Normal » Critical
Status: Active » Needs review
FileSize
21.79 KB

All attempts to fix db_next_id failed, so get rid of it. There are two icky parts: batch -- but that's long-running anyway, so one more query does not matter. node_save on insert can't insert the relevant nid into node_revisions, however that's fixed soon after, and we rely anyway on all queries executing, so I am not too concerned about node_revisions.nid being 0 for a very short while -- there was always a gap because node_save was not a transaction, nor did it lock. Now the gap is a bit wider.

chx’s picture

Why now? Not earlier, because earlier we did not have subselects or the schema API, which standardized serial/auto_increment. Later is not an option either, because db_next_id is currently broken.

chx’s picture

chx’s picture

FileSize
21.8 KB

David Strauss caught a copy-paste error in form.inc

David Strauss’s picture

Status: Needs review » Needs work

Aside from an issue I told chx about in IRC that he's fixing right now, the patch needs to update all the *.schema files to make the columns autoincrement/serial. This solution should be portable to MSSQL. Does anyone know about Oracle or DB2?

dww’s picture

say you've got a bunch of your own tables, and you need to update a nid in all of them when you add a new node (this happens all the time in project*, for example, where you've got {node} and {node_revisions}, but then a bunch of project* specific tables for all the project-specific fields and data). how are you sure that db_last_insert_id() gives you the id of the last node you inserted? imagine:

  1. thread A inserts something into {node} (nid == 10)
  2. thread B inserts something else into {node} (nid == 11)
  3. thread A calls db_last_insert_id() and gets back 11, since that was the last thing inserted.
  4. thread B calls db_last_insert_id() and also gets back 11
  5. both threads start populating their own other tables with nid 11, and get duplicate key errors or worse when they race each other to see who gets their stuff written to the DB first. :(

i think i ran into exactly this problem with the cvs access control scripts or something, and there was no good way to ensure this worked right without adding another id to the {sequences} table, so each thread could safely get its own id at the start of its operations, and then consistently use the same id, regardless of who was actually updating which tables when.

seems like now, everyone has to wrap the insert and the call to db_last_insert_id() in a transaction or otherwise lock the DB, to prevent someone else from inserting something before you discover your id. your patch doesn't do this anywhere. so, unless i completely misunderstand what's going on (which is quite likely), i think this patch would introduce a bunch of evil race conditions in core, and generally make it harder for developers to get this right.

in short: big ol' -1 on removing the {sequences} table until these issues are clarified or fixed.

David Strauss’s picture

@dww Don't worry about it. LAST_INSERT_ID is connection (and hence thread) specific.
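A toy model of that guarantee (plain PHP, invented names; each "connection" tracks only its own last ID, mirroring dww's nid 10/11 scenario):

```php
<?php
// Each connection remembers the last ID *it* generated; another
// connection's insert cannot overwrite it. Not real MySQL code.
class ToyTable {
  private $auto_increment = 9;   // pretend nids 1..9 already exist
  public function next() { return ++$this->auto_increment; }
}
class ToyConnection {
  private $last_id = 0;
  public function insert(ToyTable $t) { $this->last_id = $t->next(); }
  public function last_insert_id() { return $this->last_id; }
}

$node = new ToyTable();
$a = new ToyConnection();   // thread A
$b = new ToyConnection();   // thread B

$a->insert($node);          // thread A gets nid 10
$b->insert($node);          // thread B gets nid 11

echo $a->last_insert_id(), ' ', $b->last_insert_id(); // 10 11
```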

add1sun’s picture

From a functionality review, the only thing I'm having issues with is comments.

When creating a comment I am getting this error with the patch, which goes away when the patch is reversed:

user warning: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '2, 0, 0, 'cvdv', 'svsdf', 1, '192.168.2.1', 1180487790, 0, 0, 'a:1:{i:0;i:0;}', ' at line 1 query: INSERT INTO comments (nid, pid, uid, subject, comment, format, hostname, timestamp, status, score, users, thread, name, mail, homepage) VALUES 2, 0, 0, 'cvdv', 'svsdf', 1, '192.168.2.1', 1180487790, 0, 0, 'a:1:{i:0;i:0;}', '02/', '', '', '') in /Users/addi/Sites/patch/includes/database.mysql.inc on line 163.

I went through users, taxonomy, aggregator, contact, blocks, nodes and things seemed to be working fine.

chx’s picture

Status: Needs work » Needs review
FileSize
21.8 KB

Comments fixed. David, the schema already provides serial for us because postgresql needs them.

David Strauss’s picture

I'd like to see benchmarks for this patch versus the old db_next_id() code. kbahley said in a related issue that InnoDB does a table lock to implement autoincrement, so increasing use of autoincrement might not help versus the old db_next_id() table-locking code.

dww’s picture

Status: Needs review » Needs work

@david strauss: yay, the missing bit of completely undocumented, non-obvious wisdom that makes everything work. ;) thanks for clarifying... i knew i must have been wrong, i just wasn't sure why. ;)

@chx: therefore, please just add some doxygen to db_last_insert_id() so others know about this. not only will it help developers know they can use this function safely, it'll help remind people adding support for other DBs that they better make sure their version provides the same semantics we're assuming.

@all: thanks!

$dww_vote += 2;

add1sun’s picture

yup, comments fixed now. looks good

add1sun’s picture

Status: Needs work » Reviewed & tested by the community

no more broken bits. chx said to rtbc

chx’s picture

Yes. Because there is no need for an update path, system_update_6019 already solved that for us. yay.

dww’s picture

darn busy threads. my last comment didn't mean to set to "needs work"... sorry about that.

David Strauss’s picture

@dww The archaic runes documenting LAST_INSERT_ID are carefully obscured from the eyes of mere mortals on the "How to Get the Unique ID for the Last Inserted Row" MySQL manual page:

For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a non-magic value (that is, a value that is not NULL and not 0). Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed.

dww’s picture

@david strauss: i just meant "undocumented" in the sense of not documented anywhere in the drupal code or in this patch. that said, thanks for the pointer to the mysql docs on this.

recidive’s picture

Great patch!

Question: why not just db_insert_id() rather than db_last_insert_id()?

bjaspan’s picture

It is late and I'm very tired, so I'm probably missing something. db_last_insert_id() for pgsql takes a $name argument and db_last_insert_id() for mysql does not. Is it just that mysql never needs a name because there is only a single last-id per connection?

Even so, the mysql function ought to require the same parameters as the pgsql function (a) for consistency and (b) so that module developers do not break pgsql sites by concluding that if it works without an arg for mysql, that must be okay.
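A sketch of the consistent-signature approach bjaspan suggests (the parameter names and the {table}_{field}_seq naming convention are my assumptions, not code from the patch):

```php
// database.mysql.inc -- takes the same arguments for API consistency,
// even though MySQL's per-connection LAST_INSERT_ID() ignores them.
function db_last_insert_id($table, $field) {
  return db_result(db_query('SELECT LAST_INSERT_ID()'));
}

// database.pgsql.inc -- needs the table and field to name the implicit
// sequence that a serial column creates.
function db_last_insert_id($table, $field) {
  return db_result(db_query("SELECT currval('{%s}_%s_seq')", $table, $field));
}
```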

I haven't reviewed the patch any further than this. I seem to recall while adding type 'serial' support to Schema API that at one point I thought, "Ah, yes, *this* is why db_next_id() is necessary and always using auto-increment is not sufficient." But I don't remember why.

kbahey’s picture

@David Strauss, kbahley? The last person who misspelled my nick had his insurance rates go up ... ;-)

What is the risk of this patch creating a lot of "ghost" rows in the database tables? Can it insert incomplete stuff and then the user abandons the operation, and these are left in the database causing confusion for modules and the like?

David Strauss’s picture

@bjaspan I expressed the exact same sentiments in conversation with chx earlier this evening.

dww’s picture

Status: Reviewed & tested by the community » Needs work

@bjaspan and david -- i agree. therefore, this still needs work before it's RTBC.

joshk’s picture

I fear this. Why, you might ask? Because the sequences table has saved my ass on numerous occasions.

I realize that a smart DBA can determine all this information from her database, and even set gaps in ids or mass-update auto_increments to help with content migrations, but it's still very nice to have a central place where all my important drupal ids live. I would be sorry to see it go.

However, if it's already gone from postgres, well, maybe it's just time. (sniff).

chx’s picture

Status: Needs work » Needs review
FileSize
22.51 KB

Doxygen'd the new function, made a better db_last_insert_id for postgresql 8.1+ and used better parameters for db_last_insert_id and made them mandatory.

Dries’s picture

Very interesting patch, and an incredible amount of overnight activity in this issue. What did you guys have for dinner? :)

David mentioned he'd like to understand the performance implications, and I tend to agree. It would be nice to understand this a bit better.

From http://www.mysqlperformanceblog.com/2007/05/09/mysql-users-conference-in... :

Yet another known Innodb problem is table locks which are taken for statement duration for Inserts with auto-increment column. This starts giving you problems if you have a lot of concurrent inserts happening to the same table.

This was born same as bunch of others due to statement level MySQL Replication which among other requirements, needs all auto-increment values in multiple value insert to be sequential.

To be honest Heikki probably took a bit of shortcut and kept code simple and same for all insert cases - in fact if the insert is single value insert or the number of values in the bulk is known you can allocate all values at once when statement is started instead of handling auto increment lock for full statement duration.

The problem with multi value inserts however is the following - even if you know number of values in insert statement you do not know if all of them need autoincrement value allocated, some may already have values specified. I’m not sure if Heikki will find a way to count exact number of auto increment values needed or if we get behavior change by having potential “holes” when auto increment values are assigned but never used. There are probably similar problems in INSERT IGNORE and ON DUPLICATE KEY UPDATE cases.

Now In MySQL 5.1+ it is also possible to fix this problem for cases when number of rows in auto-increment batch insert is not known as we do not have to allocate sequential auto increment values if row level replication is used. This however will be again behavior change to watch out for.

So these are all possibilities but are we expected to have the problem fixed soon ? Yes indeed there is patch out where already which may come in MySQL 5.2 or even MySQL 5.1. I surely would like to see it sooner at least in more flexible community version.

I don't think it is a show-stopper though. In fact, I think we might have a winner. :)

chx’s picture

FileSize
23.15 KB

Even better postgresql code but it mandates one more argument. Noted that db_last_insert_id is thread safe.

chx’s picture

That innodb stuff is only relevant if you have a transaction or an extended insert. We have neither. If someone has postgresql below 8.1 and has changed the default DB configuration to not have OIDs, they will have to do some magic with their database to get this working. Not my problem, though.

chx’s picture

FileSize
22.56 KB

Further Googling and sammys' tests show that currval is per connection. So I went back to the currval-based solution because it's thread-safe and a hell of a lot simpler.

chx’s picture

FileSize
22.11 KB

Removed lastval as it's not needed -- no oids, no version trouble.

sammys’s picture

Tests confirmed that currval operates per connection. FYI: I used two DB shell sessions and a table with a serial field, inserted a record into the table from each session, then called currval. The value returned by the session with the first insertion remained the same after the insertion in the other session. QED pffft
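The test sammys describes looks roughly like this in two concurrent psql sessions (table and values invented):

```sql
-- Session A:
CREATE TABLE t (id serial PRIMARY KEY, v text);
INSERT INTO t (v) VALUES ('a');
SELECT currval('t_id_seq');   -- returns 1

-- Session B:
INSERT INTO t (v) VALUES ('b');
SELECT currval('t_id_seq');   -- returns 2

-- Session A again:
SELECT currval('t_id_seq');   -- still 1: currval is per connection
```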

--
Sammy Spets
Synerger
http://synerger.com

Owen Barton’s picture

Subscribing

quicksketch’s picture

@chx Why the naming of the function as db_last_insert_id() instead of db_insert_id()?

While db_last_insert_id() matches the wording used in the SQL, db_insert_id() would bring it in line with the PHP functions mysql_insert_id() and mysqli_insert_id().

David Strauss’s picture

That innodb stuff is only relevant if you have a transaction or an extended insert.

@chx InnoDB treats each statement as an individual transaction unless you're in a larger transaction. (This is why the mode is called "autocommit.") So, to say it's "only relevant if you have a transaction or extended insert" is true but misleading.

David Strauss’s picture

@quicksketch It's called mysql_insert_id in the MySQL library because that's the only way to get an ID in MySQL -- after insert. Other databases let you get it before, so it's ambiguous to just call it db_insert_id when you support databases like PostgreSQL.

quicksketch’s picture

Thanks for the explanation, David. Sounds like a winner to me.

quicksketch’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
23.83 KB

Works awesome. The patch needed a db_last_insert_id() moved from upload.module to file.inc to keep up with the file-system overhaul that went in last night. We're ready to roll with this one.

Crell’s picture

Oh the things one misses when one goes out of town for a few days...

My concern with dropping sequences in favor of auto_increment is twofold. One, all the DB guys I talk to keep insisting that sequences are more flexible than auto-increment. One could easily point out that while we can emulate auto-increment on sequence-based databases, one can emulate sequences on auto-increment-based databases, too. I do not claim to be an expert on the subject, however.

Two, one of my goals for Drupal 7 at this point is to finish the PDO driver I started but had to back-burner; eventually we should move to an all-PDO backend for better abstraction, flexibility, and security. That leads to this: http://us.php.net/manual/en/function.PDO-lastInsertId.php . Specifically, lastInsertId only sometimes works. There's no indication that I can see of which PDO drivers support auto-increment. (I'd presume the MySQL one does, but I don't know about the others.)

Can anyone sooth my concern that switching Drupal 6 to auto-increment instead of sequences won't make a PDO driver for Drupal 7 that much more painful?

chx’s picture

I can't claim to know PDO but look at the pgsql code, it uses sequences still. So I guess here you would need database specific stuff still, there is no way to avoid that. Same with schema, I believe.

kbahey’s picture

@Crell

If PDO is going to set us back with more restrictions, caveats, gotchas, etc., then we really need to evaluate whether going that route is worth it or not (benefits vs. restrictions).

@all

I tested this patch on a clean database of HEAD, and things are working well. All hail the auto_increment.

There was a bunch of menu errors though, like:

Undefined index: mlid in ../includes/menu.inc on line 1340.

Not sure if this is related to this patch or something else.

+1 from me.

David Strauss’s picture

To add to kbahey's post, we need to reconsider what this patch is trying to solve. The only released GNU/Linux distribution that we thought had a flawed version of MySQL was Ubuntu 7.04, but we later discovered that it uses the enterprise version, which doesn't have the problem. Debian testing seems to use a flawed version, but it's not clear that Debian will even keep that version for the next release. We have not verified that Debian testing actually has the flaw.

kbahey’s picture

Found a problem when creating comments:

When reviewing a comment, I get:

    * notice: Undefined property: stdClass::$nid in ../modules/comment/comment.module on line 1749.
    * notice: Undefined property: stdClass::$cid in ../modules/comment/comment.module on line 1757.
    * notice: Undefined property: stdClass::$comment in ../modules/comment/comment.module on line 1761.
    * notice: Undefined property: stdClass::$format in ../modules/comment/comment.module on line 1761.
    * notice: Undefined property: stdClass::$subject in ../modules/comment/comment.module on line 1845.
    * notice: Undefined property: stdClass::$cid in ../modules/comment/comment.module on line 1845.

Then when clicking submit, I get this, and the comment is NOT created.

    * notice: Undefined index: name in ../modules/comment/comment.module on line 822.
    * user warning: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '2, 0, 0, 'comment 1', 'comment 1', 1, '192.168.0.249', 1180570374, 0, 0, 'a:1:{i' at line 1 query: INSERT INTO comments (nid, pid, uid, subject, comment, format, hostname, timestamp, status, score, users, thread, name, mail, homepage) VALUES 2, 0, 0, 'comment 1', 'comment 1', 1, '192.168.0.249', 1180570374, 0, 0, 'a:1:{i:0;i:0;}', '01/', '', '', '') in ../includes/database.mysql.inc on line 163.
kbahey’s picture

Status: Reviewed & tested by the community » Needs work

I backed out the patch, and tried again creating a comment. The preview errors still appear.

However when pressing Submit, the comment IS INDEED CREATED, and only one error appears:

notice: Undefined index: name in /../modules/comment/comment.module on line 823.

So the patch needs work, at least for me.

Crell’s picture

Whatever is currently Debian Testing is virtually guaranteed to never be a "stable" release. Debian Etch was just released as stable, so the next stable is at least a year away. We can/should ignore whatever version it has in it because it will change a few dozen times in the next few months. Etch/stable and maybe Sarge/old-stable are the only Debians we should worry about in terms of bug-workarounds.

<--Former Debian user, now running, of course, Ubuntu Feisty. :-) I wonder if Ubuntu's bug/sec fixes have included a fix for this yet. I'll have to check tonight if I can, since I'm on a hell project this week and have almost no Drupal time. :-(

kbahey’s picture

Status: Needs work » Needs review
FileSize
24.9 KB

Here is a reroll of the patch with a fix for the comments. A missing paren was the culprit.

David Strauss’s picture

@Crell Ubuntu Feisty isn't vulnerable in the first place. Ubuntu uses the enterprise, not community, version of MySQL.

chx’s picture

Status: Needs review » Reviewed & tested by the community

This is a feature request from ages ago. Instead of our somewhat hackish solution, we let the database do what a database should do. The result is less code in Drupal, better-working code in Drupal, etc.

Dries’s picture

It would be great to see someone test this patch on PostgreSQL.

Another good reason for this patch is to support master-master configurations with MySQL.

matt_paz’s picture

I love, love, love the fact that action is being taken on this! Hooray!

bjaspan’s picture

I'll try this on pgsql "real soon now."

asimmonds’s picture

Status: Reviewed & tested by the community » Needs work

The warnings mentioned in #61 for a fresh install are still generated:

# notice: Undefined index: mlid in .\includes\menu.inc on line 1341.
# notice: Undefined index: mlid in .\includes\menu.inc on line 1518.

When building a new menu, we have a problem with the parents array, the comment in _menu_link_parents_set() says:

  // The parent (p1 - p6) corresponding to the depth always equals the mlid.

but with this patch, we can't set one of these parents to the mlid because we don't know the mlid yet. In the resulting installation, the menu doesn't expand as you navigate.

chx’s picture

Status: Needs work » Needs review
FileSize
27.09 KB
dvessel’s picture

Tested patch #73, installed with no notices. All the Nav links were intact..

This was with a mysqli connection using ver. 4.1.21 and PHP 5.1.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Then it's ready.

asimmonds’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
28.64 KB

Doing a quick test under postgres 8.2.4, when adding a new page node I get:

    * warning: pg_query() [function.pg-query]: Query failed: ERROR: null value in column "nid" violates not-null constraint in .\includes\database.pgsql.inc on line 161.
    * user warning: query: INSERT INTO node_revisions (title, body, teaser, timestamp, uid, format, log) VALUES ('page one', 'page one', 'page one', 1180824062, 1, 1, '') in .\includes\database.pgsql.inc on line 180.

Comparing the pgsql and mysql databases, the nid column in node_revisions has a default of 0 in mysql but not in pgsql.
Adding a 'default 0' to the node_revisions schema fixes this, but is that out of scope for this patch?

@@ -65,7 +65,7 @@ function node_schema() {

   $schema['node_revisions'] = array(
     'fields' => array(
-      'nid'       => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE),
+      'nid'       => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE, 'default' => 0),
       'vid'       => array('type' => 'serial', 'unsigned' => TRUE, 'not null' => TRUE),
       'uid'       => array('type' => 'int', 'not null' => TRUE, 'default' => 0),
       'title'     => array('type' => 'varchar', 'length' => 128, 'not null' => TRUE, 'default' => ''),

Also, something is wrong with node revisions: nid is left 0 for new revisions, and the vid in the node table doesn't get set to the new revision id. This happens under both mysql and pgsql.

In the attached patch, I've moved the update-node-table query builder after the insert or update of the node_revisions table. This seems to have fixed node revisions under mysql and pgsql.

David Strauss’s picture

@asimmonds Shouldn't {node_revisions} include the actual nid in the insert? Even if we're not using real foreign keys, inserting into {node_revisions} without specifying an nid is bad.

chx’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
5.33 KB

asimmonds' patch is great; it just had code-style errors.

David, the problem is that we cannot have two autoincrement columns at the same time. So we first insert into revisions with a nid of 0, then we insert into node and update node_revisions, because we can JOIN the two tables on vid. If by "bad" you mean that the database is in an incorrect state before the UPDATE, sure it is, but it is no different from before, actually -- if node_save does not complete, you have a problem. But even so, the UPDATE is written in such a way that it updates every row with the relevant vid where the nid is zero -- so Drupal tries very hard to return the database to correctness. There is no unique key on nid (thank god), so having (0, vid) pairs is not a problem.
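A minimal sketch of that insert order in SQL (table and column lists are abbreviated, and the literal IDs 7 and 42 are purely illustrative):

```sql
-- 1. Insert the revision first; the nid is not known yet, so use 0 as a placeholder.
INSERT INTO node_revisions (nid, title) VALUES (0, 'page one');      -- autoincrement yields vid, e.g. 7

-- 2. Insert the node, carrying the vid just generated.
INSERT INTO node (vid, type, title) VALUES (7, 'page', 'page one');  -- autoincrement yields nid, e.g. 42

-- 3. Repair the placeholder; restricting on nid = 0 makes this safe to re-run
--    if node_save is interrupted and retried.
UPDATE node_revisions SET nid = 42 WHERE vid = 7 AND nid = 0;
```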

David Strauss’s picture

@chx I'm fully aware of the limitations of autoincrement in MySQL. I'm saying that we need to insert into {node} before we create a corresponding record in {node_revisions}. Even though we don't use referential integrity, we should interact with the database so its addition would be transparent. Conceptually, {node_revisions} has a foreign key to {node}. No record should ever exist in {node_revisions} without a corresponding record in {node}.

Bad:
1. Insert into {node_revisions} with nid = 0.
2. Insert into {node}.
3. Update {node_revisions} with the new nid.

Good:
1. Insert into {node}.
2. Insert into {node_revisions}.
3. Update {node} with the new vid.

Really Good (probably infeasible):
1. Begin transaction.
2. Insert into {node}.
3. Insert into {node_revisions}.
4. Update {node} with the new vid.
5. Commit transaction.

Because we use INNER JOINs between {node} and {node_revisions}, having a record in {node} without a corresponding record in {node_revisions} will not cause significant problems. If anything only refers to {node}, then we need to have it require vid > 0.
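The "Really Good" variant above, sketched as SQL under the assumption of a backend with transactions and currval()-style sequence reads (PostgreSQL syntax; the sequence names are the ones PostgreSQL would generate for serial columns, and the column lists are abbreviated):

```sql
BEGIN;
-- 1. Create the node shell; the vid is not known yet.
INSERT INTO node (vid, type, title) VALUES (0, 'page', 'page one');
-- 2. Create the revision, pointing at the nid generated on this connection.
INSERT INTO node_revisions (nid, title)
  VALUES (currval('node_nid_seq'), 'page one');
-- 3. Backfill the denormalized vid on the node row.
UPDATE node SET vid = currval('node_revisions_vid_seq')
  WHERE nid = currval('node_nid_seq');
COMMIT;
```

Because both currval() calls read values assigned within this session, concurrent inserts on other connections cannot leak into the backfill.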

chx’s picture

I need vid first, which is the true PK here, and nid second; hence the order of inserts.

bjaspan’s picture

David is right. If node is the master table for nodes with nid as the PK, then we (implicitly or explicitly) want to have a foreign key from node_revisions.nid to node.nid. That means that we have to insert into node first because node_revisions.nid cannot point to a non-existent node.nid (* see below).

chx says "I need vid first ... and nid second hence the order of inserts." Looking at the patch, it seems he needs vid first because he needs to insert it into the node.vid column. So he does: (1) insert into node_revisions with an invalid nid and save the new vid, (2) insert into node, specifying the vid, and save the new nid, (3) update node_revisions with the saved nid.

Unless I'm mistaken, we can accomplish the same goal and provide ref integrity by changing the order: (1) insert into node with invalid vid, save nid, (2) insert node_revisions, specifying the nid, save vid, (3) update node with saved vid.

However, I'm speculating about something else. Why does node have a vid column at all? Revisions are a one-to-many property of a node. So are taxonomy terms, but we don't have a tid column in node. The answer is "because node_load wants to load exactly one revision, either the newest or a specified one." That's a perfectly valid operational decision, but I wonder whether, by having a vid column, we are representing that operational decision in the database in a way it should not be, and it is causing (or will cause) us grief regarding ref integrity.

node_load, in the "get the newest revision" mode, could just as easily say:

SELECT * FROM node n INNER JOIN node_revisions r ON n.nid = r.nid WHERE n.nid = %d ORDER BY r.vid DESC LIMIT 1

So, we resolve the problem with node.vid and can remove it from the node table entirely. True, we add a sort to every node load, but there are not usually many revisions, and sorting by a primary key (vid) ought to be very fast.

If we decide that the ORDER BY r.vid DESC LIMIT 1 solution is no good for performance reasons, then we should acknowledge that node.vid exists purely as a cache of the most recent vid per node. We might even rename it to node.vid_cache just to be clear about that.

To wrap up, I agree with David's "Good" scenario. Unless I'm totally on crack, in which case I'm sure someone will point that out to me. :-)

* Footnote: Actually, we could insert into node_revisions first by putting a sentinel (dummy) entry in the node table with nid = -1 (not 0, as that will confuse auto_increment). But that is ugly and likely to cause other problems.

bjaspan’s picture

Status: Reviewed & tested by the community » Needs work

Since I'm actually hoping we can maybe get ref integrity into D6 now that we have more time, I don't feel like this patch is ready until we (a) change it so ref integrity is possible or (b) explicitly decide that is not a goal.

chx’s picture

Status: Needs work » Reviewed & tested by the community

Referential integrity won't help you here. But the biggest problem is that there is a unique key on node.vid, so if two threads manage to insert into node with a vid of zero (before the INSERT INTO node_revisions and the UPDATE), then you are thoroughly hosed. There is no such unique index on node_revisions.nid, as it is truly not unique. I have chosen this order rather carefully.

David Strauss’s picture

@chx If we want to keep the unique key on {node}.vid, we just have to allow it to be NULL. Unique keys (unlike primary keys) only require non-null values to be unique. Furthermore, if we formally establish {node}.vid as a foreign key of {node_revisions}.vid, the NULL values will work there, too. Using zero as a pseudo-null value is kind of a hack. In any case, node.vid is a denormalization, so it's a lower priority to maintain its integrity than the real data.
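What David describes could look like this in PostgreSQL DDL (a sketch only; the constraint and index names are made up, and Drupal's schema API would express this differently):

```sql
-- Allow NULL in node.vid; a unique index ignores NULLs, so many
-- freshly-inserted node rows can coexist before their vid is backfilled.
ALTER TABLE node ALTER COLUMN vid DROP NOT NULL;
CREATE UNIQUE INDEX node_vid_key ON node (vid);

-- Formalize the conceptual foreign key; NULL vid values pass the check,
-- so the "not yet revised" state never violates integrity.
ALTER TABLE node
  ADD CONSTRAINT node_vid_fk FOREIGN KEY (vid) REFERENCES node_revisions (vid);
```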

@bjaspan Even removing {node}.vid doesn't necessarily force a sort if {node_revisions} has a (nid, vid) index. I'll have to check this, though.

David Strauss’s picture

@bjaspan Also, taxonomy terms have a many-to-many relationship with nodes. Restricting it further (to, say, many-to-one) is only done in the application software.

chx’s picture

Assigned: chx » Unassigned
Status: Reviewed & tested by the community » Needs work

Then write it. I do not care. I tried, again, again and again and I do not see any real reasons here. Surely pie in the sky is great.

chx’s picture

Status: Needs work » Needs review
FileSize
28.52 KB

Any more future dreams we want to support?

David Strauss’s picture

Shiny on #drupal brought up an interesting point about this patch: reading the sequence from PostgreSQL after the INSERT isn't guaranteed to return the same value used in the INSERT. Unlike MySQL's LAST_INSERT_ID(), PostgreSQL's sequences aren't per-connection. They're global.

Options:
* Encapsulate INSERTs and the sequence lookup in transactions.
* Provide a db_insert() function that returns the ID and properly juggles sequences and INSERTs.
* Abandon this patch and just use db_next_id().

Previously, dww raised a similar objection about the MySQL implementation. My response in #29 remains correct: LAST_INSERT_ID() is thread/connection-specific. However, PostgreSQL sequences do not have LAST_INSERT_ID() semantics.
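For reference, a sketch of the per-connection behaviour under discussion (PostgreSQL; the sequence name is illustrative). currval() reports the value that nextval() last returned in the current session, so a competing INSERT on another connection cannot change what this connection reads -- whereas reading the sequence's live value, or SELECTing MAX(nid), would race exactly as feared:

```sql
-- Connection A
INSERT INTO node (type, title) VALUES ('page', 'A');  -- nextval() assigns nid 10

-- Connection B (concurrently)
INSERT INTO node (type, title) VALUES ('page', 'B');  -- nextval() assigns nid 11

-- Connection A, afterwards
SELECT currval('node_nid_seq');  -- returns 10, not 11: currval() is session-local
```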

David Strauss’s picture

Status: Needs review » Needs work

The race condition in PostgreSQL needs fixing.

Shiny’s picture

Status: Needs work » Needs review

Let me clarify by quoting dww:

thread A inserts something into {node} (nid == 10)
thread B inserts something else into {node} (nid == 11)
thread A calls db_last_insert_id() and gets back 11, since that was the last thing inserted.
thread B calls db_last_insert_id() and also gets back 11
both threads start populating their own other tables with nid 11, and get duplicate key errors or worse when they race each other to see who gets their stuff written to the DB first. :(

The above is solved by MySQL's last_insert_id() being per-connection -- however, you're not calling last_insert_id() in PostgreSQL; you're instead doing exactly what dww describes above.

MySQL lacks sequences. Every other database I've ever used implements sequences. Drupal implements these with a table for MySQL. It works. Why fix it?

Shiny’s picture

Status: Needs review » Needs work

a sequencing error caused a race condition on this thread.
Changing status to (code needs work)

chx’s picture

Status: Needs work » Needs review

#50, #52 .

chx’s picture

I am so very happy that everyone is such a big database theorist -- which has, by now, needlessly delayed this patch twice -- but reading the issue, which clearly states that currval is per connection, or testing the patch: that's too much to ask.

Shiny’s picture

Apologies -- I've tested, and yes, it is per connection on pg 7.4 and 8.1.

this will instead become another issue for those wanting persistent connections. If we're going this way, can we remove the comment in includes/database.pgsql.inc about persistent connections improving performance?

David Strauss’s picture

@Shiny What is the issue with persistent connections?

chx’s picture

ENOUGH! Drupal does not support persistent connections. And if someone were to hack that in, they would have problems with, for example, search's use of temporary tables.

And guys, if anyone else follows up with more bullshit, I swear that I will go into database level and remove the followup.

chx’s picture

Status: Needs review » Closed (fixed)

I have submitted a new issue instead of this poor abused one, and I am not linking it; if you want to abuse that one too, then at least work a little.