For inserts, the PostgreSQL driver needs to know two things:
- which columns (if any) are BLOBs because those are special-cased in the PDO implementation
- the name of the sequence(s) used on the target table

Currently the implementation relies on drupal_get_schema() which creates all sorts of fun chicken and egg problems. We could easily change this by introspecting the database, without fearing too much about the cost of some additional queries.

Comments

damien tournoud’s picture

Status: Active » Needs review
StatusFileSize
new5.33 KB

As a RFC, here is a first implementation.

damien tournoud’s picture

StatusFileSize
new5.31 KB

New version, hopefully with less obvious mistakes.

Crell’s picture

Status: Needs review » Closed (duplicate)

There's already an issue for this, which asks to put this functionality into the schema itself. #301038: Add a cross-compatible database schema introspection API IMO that's where it belongs, not in the connection object.

Working to get that issue taken care of, though, would be awesome. :-)

damien tournoud’s picture

Status: Closed (duplicate) » Needs review

@Crell: please stop kidding: we only need two information about the table, and we only need them when dealing with InsertQuery. There is zero need to load the full-blown schema object for this. Please review the patch.

Crell’s picture

Who's kidding? As of right now we do not support ANY non-local databases (local = where Drupal is installed) aside from MySQL and SQLite, because of the requirement of the schema for field-type checking. If we make the schema object able to detect the schema as needed, even if it requires some DB-specific logic, we have enabled remote-DB access for all supported database types. This patch works only for Postgres, and puts information about the schema in the connection object rather than the schema object. That's the wrong place for it.

The schema object should be the abstraction through which all information about a given connection's tables are accessed so that we can do whatever weirdness we need to do. Note, I'm talking about the DatabaseSchema class, not the drupal_get_schema() mega-array. IMO that mega array should be split up by table and moved behind the schema object to begin with.

Also, we care for UpdateQuery, too, not just InsertQuery.

damien tournoud’s picture

I never said we don't need to add introspection in schema.inc. But this is completely another issue; for Insert and Update queries we only need some very specific information, and we really don't need nor want to load the full schema.inc for this.

drewish’s picture

Priority: Normal » Critical

subscribing. chx pointed me to this issue after I posted #351002: Cannot update D6 to D7 on pgsql. this patch fixes that issue and in light of that i'm marking this as critical.

drewish’s picture

Just to follow up. I gave the patch a good look and it looked okay to me but I'm not comfortable with the affected code to mark it RTBC.

Considering that #301038: Add a cross-compatible database schema introspection API has no patch at this point I'd say we should fix this and then consider a heavier solution like that.

keith.smith’s picture

Strictly just reading the code comments, "informations" has an extra "s".

dries’s picture

Status: Needs review » Needs work

Let's fix the typo identified by Keith.

I'd like to get another update from Crell too.

drewish’s picture

Status: Needs work » Needs review
StatusFileSize
new5.67 KB

Here's a re-roll that fixed the typo.

Crell’s picture

Status: Needs review » Needs work

I am still not convinced that this belongs in the connection object. Information about the DB's schema belongs in the schema object. If the schema object is too clunky right now, then let's make it less clunky.

Crell’s picture

Status: Needs work » Needs review

Cross posted. Setting back to CNR for the bot, even though I still do not agree with the approach.

drewish’s picture

it'd be great to get it in even if it's imperfect because without it you're unable to run updates on HEAD.

damien tournoud’s picture

StatusFileSize
new7.49 KB

Information about the DB's schema belongs in the schema object. If the schema object is too clunky right now, then let's make it less clunky.

I'm not worried about the API, I'm worried about the size of the code itself. I don't see why we should force loading a full schema.inc on every page request for PostgreSQL, while we need only a very specialized piece of information.

The attached patch takes care about updating UpdateQuery_pgsql too, improve code comments and consistency.

drewish’s picture

Status: Needs review » Needs work

I don't think @see tags are complete sentences and therefore shouldn't end with a period.

Tried to do a clean install after applying the patch and got the following error:

PDOException: UPDATE batch SET token=:db_update_placeholder_0, batch=:db_update_placeholder_1 WHERE (bid = :db_condition_placeholder_13) - Array ( [target] => default [return] => 2 [already_prepared] => 1 ) SQLSTATE[08P01]: <<Unknown error>>: 7 ERROR: bind message supplies 2 parameters, but prepared statement "pdo_pgsql_stmt_0203b6ac" requires 3 in batch_process() (line 2633 of /Users/amorton/Sites/dh/includes/form.inc).
damien tournoud’s picture

Issue tags: +PostgreSQL Surge

Testing the new tag system.

josh waihi’s picture

StatusFileSize
new7.24 KB

something like this will help satisfy both parties - I agree with Crell, I see no reason why that extra schema stuff should be in the connection. This patch attempts to put it in Schema though this is untested and most likey won't work. but surely we can do something like this?

damien tournoud’s picture

Title: PostgreSQL surge #15: Make the postgresql driver independant of the schema » Make the postgresql driver independant of the schema

@Josh: well, this is one of the worst solution. queryTableInformation() is a very specialized function: it only fetches information about blobs and sequences used on a table. It is not a general schema query solution, and has nothing to do in schema.inc. My points in making that this way are:

(1) we simply don't need to query the whole table information for Insert and Update queries
(2) we don't need to load the full schema.inc file at every request (the PostgreSQL driver is slow enough)

And removed the title that is of no use anymore, now that we have proper tags :)

Crell’s picture

Just how much of a performance hit is there really to loading the schema.inc file? It's not an especially complicated class, so it should parse quickly, and loading the class itself doesn't automatically load the full schema out of the cache. (Side note: We really do need to break that up still. Bah.)

josh waihi’s picture

Sorry, I was under the impression that the schema.inc had already been included. I thought it was the drupal_get_schema() function calls you wanted to avoid. I still think that if its schema related then it should reside in the schema.inc. I'm still not convinced that queryTableInfomation() belongs in Database_pgsql.

I'm not sure how my suggestion loads all the table information:
$this->connection->schema()->queryTableInformation($this->table);
I assumed this did the same as $this->connection->queryTableInformation($this->table); but stored the function in the schema.

josh waihi’s picture

in regards to my comment above, when schema() is called is mearly initializes an instance of DatabaseSchema_pgsql which in turn assigns the connection to a variable within the instance. When schema() is called, it also assumes that schema.inc has been included. So I'm still a little confused as to how moving queryTableInformation to the schema.inc will slow things down.

damien tournoud’s picture

@Josh: calling ->schema() will trigger a $this->schema = new $class_type($this);, which will trigger, via the SPL autoloader, the inclusion of schema.inc.

josh waihi’s picture

StatusFileSize
new7.49 KB

ok fair enough, I'm convinced. This is Damien's patch with a few small changes and error clean-ups. am currently testing it on a PostgreSQL testbed. will post results when done

josh waihi’s picture

Status: Needs work » Needs review

forgot to change status

josh waihi’s picture

Status: Needs review » Needs work

also test results on postgres: Failed: 8138 passes, 5 fails, 0 exceptions <-- no different to CVS HEAD

josh waihi’s picture

Status: Needs work » Needs review

stupid status defaults!!!

Crell’s picture

I still disagree that this belongs in the connection object rather than the schema object. At absolute minimum, if the performance difference is enough to justify it we should document why we're doing this lookup in the "wrong" place.

Status: Needs review » Needs work

The last submitted patch failed testing.

josh waihi’s picture

Status: Needs work » Needs review

faulty slave, retesting

josh waihi’s picture

ok. Time trials, I selected the database unit tests (717 tests) and ran them through the web interface. Results:

HEAD 1/30/09: The tests finished in 14 min 34 sec.
Patch applied: The tests finished in 14 min 22 sec.
Patch Below applied:The tests finished in 14 min 45 sec. (logic was stored in schema as Crell suggested)

In this instance we are talking seconds of difference but there is no doubt that the patch above in #24 is the optimal.

catch’s picture

Josh - I know it's annoying, but we probably need those run through again, I'm sure there's a few seconds margin of error running tests on most machines anyway. Either that or can you run ab on a database heavy page?

josh waihi’s picture

@catch, this was only over 717 tests out of 8000+ I'd imagine the results would amplify and become clearer if all 8000+ tests were run. Time is the issue here. wish boombatower would get those postgres testbeds running ;)

bjaspan’s picture

This patch seems to work to fix the problem with installation on pgsql identified by #361683: Field API initial patch. I did not review the code other than to try the patch.

josh waihi’s picture

#24 fixes #375264: Field API failing on PostgreSQL

Crell, can we get this into head? Cause HEAD is broken on PostgreSQL

damien tournoud’s picture

Status: Needs review » Reviewed & tested by the community
StatusFileSize
new7.44 KB

This is a reroll of #24. It's quite time to get this little one in.

webchick’s picture

Hm. I want to commit this, I really do, because every second that we can't install D7 on pgsql means we are likely introducing more pgsql bugs in subsequent patches since we are effectively unable to test.

However, I still can't tell if Crell's concerns were addressed. :(

Crell, are you out there? Would you support committing this even as an interim fix before we re-work schema.inc in the way that you want?

webchick’s picture

Status: Reviewed & tested by the community » Needs review
Crell’s picture

I'll try to have a look at this patch tonight US time. Sorry, the past week and a half have been totally insane. :-(

Crell’s picture

Status: Needs review » Reviewed & tested by the community

I still disagree that this information belongs in the connection object at all. Putting it there instead of in the schema object is a hack and I'm still not convinced that it's a necessary hack. However, it's a single-database-specific hack and we do need to make simpletest work in postgres. So for now RTBC, and let's try really hard to block out time to clean up schema so we can move this logic there later.

webchick’s picture

Status: Reviewed & tested by the community » Needs work

Some minor issues:

+   * We introspect the database to collect that information required by insert
+   * and update queries.

Probably "the" information required...

+   *   An object with two member variables:
+   *     * 'blob_fields' that lists all the blob fields in the table.
+   *     * 'sequences' that lists the sequences used in that table.

use - rather than * for the bulleted list.

+          $table_information->sequences[] = $matches[1];

Just a question.. why are these not keyed like the blob_fields are above?

+          $blobs[$blob_cnt] = fopen('php://memory', 'a');

We don't abbreviate variable names. $blob_count. Also consistent with the lines at the bottom of the patch.

+          ++$blob_cnt;

$blob_count++ would be the norm. Could you place a comment there to explain why we pre-increment?

Crell’s picture

Pre-increment is marginally faster than post-increment in PHP, for deep engine reasons I don't fully understand but have seen benchmarks for. In cases where either work equally well logically, pre-increment is therefore a (rather) slight performance benefit.

I don't know if intricacies of PHP engine semantics like that are worth a comment, or if they cross into the "we expect you to already know this" territory (as we do for a lot of other micro-optimizations).

josh waihi’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new9.29 KB
webchick’s picture

Status: Reviewed & tested by the community » Fixed

Committed to HEAD. Thanks!

josh waihi’s picture

Status: Fixed » Needs review

ah crap, the patch I posted were fixes made to the schema version. @Crell, I guess you got your wish

Status: Needs review » Needs work

The last submitted patch failed testing.

josh waihi’s picture

Status: Needs work » Closed (fixed)

closing -