For inserts, the PostgreSQL driver needs to know two things:
- which columns (if any) are BLOBs because those are special-cased in the PDO implementation
- the name of the sequence(s) used on the target table
Currently the implementation relies on drupal_get_schema() which creates all sorts of fun chicken and egg problems. We could easily change this by introspecting the database, without fearing too much about the cost of some additional queries.
Comments
Comment #1
damien tournoud commentedAs a RFC, here is a first implementation.
Comment #2
damien tournoud commentedNew version, hopefully with less obvious mistakes.
Comment #3
Crell commentedThere's already an issue for this, which asks to put this functionality into the schema itself. #301038: Add a cross-compatible database schema introspection API IMO that's where it belongs, not in the connection object.
Working to get that issue taken care of, though, would be awesome. :-)
Comment #4
damien tournoud commented@Crell: please stop kidding: we only need two information about the table, and we only need them when dealing with InsertQuery. There is zero need to load the full-blown schema object for this. Please review the patch.
Comment #5
Crell commentedWho's kidding? As of right now we do not support ANY non-local databases (local = where Drupal is installed) aside from MySQL and SQLite, because of the requirement of the schema for field-type checking. If we make the schema object able to detect the schema as needed, even if it requires some DB-specific logic, we have enabled remote-DB access for all supported database types. This patch works only for Postgres, and puts information about the schema in the connection object rather than the schema object. That's the wrong place for it.
The schema object should be the abstraction through which all information about a given connection's tables are accessed so that we can do whatever weirdness we need to do. Note, I'm talking about the DatabaseSchema class, not the drupal_get_schema() mega-array. IMO that mega array should be split up by table and moved behind the schema object to begin with.
Also, we care for UpdateQuery, too, not just InsertQuery.
Comment #6
damien tournoud commentedI never said we don't need to add introspection in schema.inc. But this is completely another issue; for Insert and Update queries we only need some very specific information, and we really don't need nor want to load the full schema.inc for this.
Comment #7
drewish commentedsubscribing. chx pointed me to this issue after I posted #351002: Cannot update D6 to D7 on pgsql. this patch fixes that issue and in light of that i'm marking this as critical.
Comment #8
drewish commentedJust to follow up. I gave the patch a good look and it looked okay to me but I'm not comfortable with the affected code to mark it RTBC.
Considering that #301038: Add a cross-compatible database schema introspection API has no patch at this point I'd say we should fix this and then consider a heavier solution like that.
Comment #9
keith.smith commentedStrictly just reading the code comments, "informations" has an extra "s".
Comment #10
dries commentedLet's fix the typo identified by Keith.
I'd like to get another update from Crell too.
Comment #11
drewish commentedHere's a re-roll that fixed the typo.
Comment #12
Crell commentedI am still not convinced that this belongs in the connection object. Information about the DB's schema belongs in the schema object. If the schema object is too clunky right now, then let's make it less clunky.
Comment #13
Crell commentedCross posted. Setting back to CNR for the bot, even though I still do not agree with the approach.
Comment #14
drewish commentedit'd be great to get it in even if it's imperfect because without it you're unable to run updates on HEAD.
Comment #15
damien tournoud commentedI'm not worried about the API, I'm worried about the size of the code itself. I don't see why we should force loading a full schema.inc on every page request for PostgreSQL, while we need only a very specialized piece of information.
The attached patch takes care about updating UpdateQuery_pgsql too, improve code comments and consistency.
Comment #16
drewish commentedI don't think @see tags are complete sentences and therefore shouldn't end with a period.
Tried to do a clean install after applying the patch and got the following error:
Comment #17
damien tournoud commentedTesting the new tag system.
Comment #18
josh waihi commentedsomething like this will help satisfy both parties - I agree with Crell, I see no reason why that extra schema stuff should be in the connection. This patch attempts to put it in Schema though this is untested and most likey won't work. but surely we can do something like this?
Comment #19
damien tournoud commented@Josh: well, this is one of the worst solution.
queryTableInformation()is a very specialized function: it only fetches information about blobs and sequences used on a table. It is not a general schema query solution, and has nothing to do in schema.inc. My points in making that this way are:(1) we simply don't need to query the whole table information for Insert and Update queries
(2) we don't need to load the full schema.inc file at every request (the PostgreSQL driver is slow enough)
And removed the title that is of no use anymore, now that we have proper tags :)
Comment #20
Crell commentedJust how much of a performance hit is there really to loading the schema.inc file? It's not an especially complicated class, so it should parse quickly, and loading the class itself doesn't automatically load the full schema out of the cache. (Side note: We really do need to break that up still. Bah.)
Comment #21
josh waihi commentedSorry, I was under the impression that the schema.inc had already been included. I thought it was the drupal_get_schema() function calls you wanted to avoid. I still think that if its schema related then it should reside in the schema.inc. I'm still not convinced that
queryTableInfomation()belongs in Database_pgsql.I'm not sure how my suggestion loads all the table information:
$this->connection->schema()->queryTableInformation($this->table);I assumed this did the same as
$this->connection->queryTableInformation($this->table);but stored the function in the schema.Comment #22
josh waihi commentedin regards to my comment above, when
schema()is called is mearly initializes an instance of DatabaseSchema_pgsql which in turn assigns the connection to a variable within the instance. Whenschema()is called, it also assumes that schema.inc has been included. So I'm still a little confused as to how movingqueryTableInformationto the schema.inc will slow things down.Comment #23
damien tournoud commented@Josh: calling ->schema() will trigger a
$this->schema = new $class_type($this);, which will trigger, via the SPL autoloader, the inclusion of schema.inc.Comment #24
josh waihi commentedok fair enough, I'm convinced. This is Damien's patch with a few small changes and error clean-ups. am currently testing it on a PostgreSQL testbed. will post results when done
Comment #25
josh waihi commentedforgot to change status
Comment #26
josh waihi commentedalso test results on postgres: Failed: 8138 passes, 5 fails, 0 exceptions <-- no different to CVS HEAD
Comment #27
josh waihi commentedstupid status defaults!!!
Comment #28
Crell commentedI still disagree that this belongs in the connection object rather than the schema object. At absolute minimum, if the performance difference is enough to justify it we should document why we're doing this lookup in the "wrong" place.
Comment #30
josh waihi commentedfaulty slave, retesting
Comment #31
josh waihi commentedok. Time trials, I selected the database unit tests (717 tests) and ran them through the web interface. Results:
HEAD 1/30/09: The tests finished in 14 min 34 sec.
Patch applied: The tests finished in 14 min 22 sec.
Patch Below applied:The tests finished in 14 min 45 sec. (logic was stored in schema as Crell suggested)
In this instance we are talking seconds of difference but there is no doubt that the patch above in #24 is the optimal.
Comment #32
catchJosh - I know it's annoying, but we probably need those run through again, I'm sure there's a few seconds margin of error running tests on most machines anyway. Either that or can you run ab on a database heavy page?
Comment #33
josh waihi commented@catch, this was only over 717 tests out of 8000+ I'd imagine the results would amplify and become clearer if all 8000+ tests were run. Time is the issue here. wish boombatower would get those postgres testbeds running ;)
Comment #34
bjaspan commentedThis patch seems to work to fix the problem with installation on pgsql identified by #361683: Field API initial patch. I did not review the code other than to try the patch.
Comment #35
josh waihi commented#24 fixes #375264: Field API failing on PostgreSQL
Crell, can we get this into head? Cause HEAD is broken on PostgreSQL
Comment #36
damien tournoud commentedThis is a reroll of #24. It's quite time to get this little one in.
Comment #37
webchickHm. I want to commit this, I really do, because every second that we can't install D7 on pgsql means we are likely introducing more pgsql bugs in subsequent patches since we are effectively unable to test.
However, I still can't tell if Crell's concerns were addressed. :(
Crell, are you out there? Would you support committing this even as an interim fix before we re-work schema.inc in the way that you want?
Comment #38
webchickComment #39
Crell commentedI'll try to have a look at this patch tonight US time. Sorry, the past week and a half have been totally insane. :-(
Comment #40
Crell commentedI still disagree that this information belongs in the connection object at all. Putting it there instead of in the schema object is a hack and I'm still not convinced that it's a necessary hack. However, it's a single-database-specific hack and we do need to make simpletest work in postgres. So for now RTBC, and let's try really hard to block out time to clean up schema so we can move this logic there later.
Comment #41
webchickSome minor issues:
Probably "the" information required...
use - rather than * for the bulleted list.
Just a question.. why are these not keyed like the blob_fields are above?
We don't abbreviate variable names. $blob_count. Also consistent with the lines at the bottom of the patch.
$blob_count++ would be the norm. Could you place a comment there to explain why we pre-increment?
Comment #42
Crell commentedPre-increment is marginally faster than post-increment in PHP, for deep engine reasons I don't fully understand but have seen benchmarks for. In cases where either work equally well logically, pre-increment is therefore a (rather) slight performance benefit.
I don't know if intricacies of PHP engine semantics like that are worth a comment, or if they cross into the "we expect you to already know this" territory (as we do for a lot of other micro-optimizations).
Comment #43
josh waihi commentedComment #44
webchickCommitted to HEAD. Thanks!
Comment #45
josh waihi commentedah crap, the patch I posted were fixes made to the schema version. @Crell, I guess you got your wish
Comment #47
josh waihi commentedclosing -