Problem
- PostgreSQL converts all table column names into lowercase, unless quoted.
- Drupal's Postgres driver does not quote the table/column/alias identifiers, so Postgres creates them in lowercase and also fails to query them.
- → External databases that are using identifiers with uppercase letters cannot be queried.
- → Drupal 8: Special content entity properties added via expressions are set with incorrect case such as
$entity->isDefaultRevision, which prevents entity revisions from loading correctly.
Goal
- Retain letter-casing of table column names and aliases.
Details
- PostgreSQL treats all identifiers in schema operations as lowercase, so the following creates NOT the table you intended:
CREATE TABLE Events (
EventId SERIAL NOT NULL PRIMARY KEY,
);
- To retain natural casing in schema operations, identifiers have to be quoted:
CREATE TABLE "Events" (
"EventId" SERIAL NOT NULL PRIMARY KEY,
);
Proposed solution
- Quote identifiers to retain their natural casing by implementing escapeField(), escapeTable(), and escapeAlias() methods.
- Escape aliases used in expressions.
- Drupal 8: Fix failing tests to ignore quoted identifiers as a 1:1 query string comparison is not valid.
Challenges
- Conditions and Order By in the default classes allow escapeField() to be used with table alias. It is also valid to create tables with columns that contain the quoted "." character.
Links
Related issues
Original report by @xeniak
I have to connect to an external Postgres database that has column names with capital letters. In Postgres, these column names *must* be in double quotes, or the query will fail.
When I build my SelectQuery in Drupal, Drupal ignores the double quotes no matter how I try to escape them:
// Create select query to video database
$query = new SelectQuery('bocc_meeting', 'bocc_meeting', Database::getConnection('default', 'video'));
// Columns to select
$query->fields('bocc_meeting', array("\"meeting_Title\"", "\"meeting_Date\""));
$query->orderBy("\"meeting_Date\"", 'DESC');
return $query->execute()->fetchAll();
gets me:
PDOException: SQLSTATE[42703]: Undefined column: 7 ERROR: column bocc_meeting.meeting_title does not exist LINE 1:
SELECT bocc_meeting.meeting_Title AS meeting_Title, bocc_mee... ^:
.
SELECT bocc_meeting.meeting_Title AS meeting_Title, bocc_meeting.meeting_Date AS meeting_Date FROM {bocc_meeting} bocc_meeting ORDER BY "meeting_Date" DESC;
Array ( )
in get_video_list() (line 69 of ...).
Notice that my column names are unquoted, except in the ORDER BY clause, where the double quotes appear correctly.
I've also tried single quotes on the outside with both escaped and unescaped double quotes. Same result.
Comments
Comment #1
pounardI think this is an issue for the Drupal project, wrong issue queue, re-assigning it.
Randomly setting the last stable, core people will categorize it right.
Comment #2
bendiy commentedI tracked this down to /includes/database/database.inc The escapeField() and escapeAlias() are removing the double quotes.
I've attached a patch that stops them from being removed. It probably has SQL injection written all over it, but I'm marking it as needs review.
According to the comments on escapeField():
That may be a better solution, force the PDO drive to double quote all field names, but I don't know where that setting would be made.
Comment #3
bendiy commentedAnd your hook_schema would look like:
That's single quote ('), double quote ("), column name, double quote ("), single quote ('). No escape slashes.
Comment #4
bendiy commentedAttached is a new patch that should address any SQL injection issues. It checks for double quotes at the start and end of the string. It then sanitizes the string the old way and if both quotes were there, it adds them back.
Comment #5
damien tournoud commentedYou are going to need to override those methods to the PostgreSQL-specific implementation. The base implementation is sane and should not be touched.
Comment #6
bendiy commentedI've moved them to the includes/database/pgsql/database.inc file so they are PostgreSQL-specific.
I know this has to go through 8.x, but could it be backported to D7 if approved?
Should another approach be taken to get support for this?
Thanks for the feedback.
Comment #7
bendiy commentedPatch attached.
Comment #8
bendiy commentedOh... and needs review.
Comment #10
bendiy commentedHere's a D8 version.
Comment #11
cweagansFixing tags per http://drupal.org/node/1517250
Comment #12
liam morlandTagging
Comment #13
valthebaldIMO it's better to leave field name in schema definition without double quotes
but add double quotes in escapeField()
This could make code a little bit safer, and allow the same schema work in both mysql and postgresql
Comment #14
valthebaldComment #15
bendiy commentedIMO that is a bad approach.
If the double quotes are moved to escapeField() in the PostgreSQL driver, ALL fields would be escaped with double quotes on ALL queries. That is a very aggressive change to make. There is a chance that existing working code would break. If the column name in existing code was 'firstName' => array..., the default behavior has been to change that to 'firstname' when a query is ran. By moving the double quoting to escapeField() that will become '"firstName"' from now on for ALL queries in PostgreSQL. That query will now fail in PostgreSQL on previous installs that upgrade to this feature.
By keeping the double quotes in the schema definition, I'm being explicit that the column name should be double quoted for that column only. The schema definition is the best place to be explicit that a column in a database schema should be double quoted. One could argue that an array property be added and is a better place to store this information:
The reason for the double quotes is to add compatibility for querying third party databases that use this convention. That is how you write the queries in PostgreSQL, even using psql to query the database directly from the command line. There is no reason to support both MySQL and PostgreSQL when you are trying to interface with a third party PostgreSQL database that actually requires queries be written like this:
If support for MySQL is a concern, my current patch will actually work fine on MySQL. The double quotes will be removed by the default database driver's escapeField(). Also, in MySQL field names are case-insensitive regardless. The following query would be generated for MySQL and run just fine.
Comment #16
valthebaldFirst, I don't see how postgres version of escapeField() could affect MySQL. Field name in schema definition is without quotes, and quotes are added by postgresql driver.
Second, while escaping all field names is not a problem at all (and, if I remember correctly is even encouraged by SQL-92), it is possible to escape only "special" field names. Postgres is doing just fine with all-lowercase field names, without double quoting, but requires double quotes if there are other symbols, i.e. uppercase characters. drupal core tables, which do not use upper case, remain the same in both drivers.
Third, since this is a driver-specific problem, I find it more appropriate to solve it in driver, but not in (driver-agnostic) schema.
Comment #17
bendiy commentedWill create a query like this in MySQL currently and it will run just fine. Double quoting a schema column does not effect MySQL.
Will now be:
That query will fail on an existing install because when the table was created the column as added as all lower case letters.
Yes, in PostgreSQL you can quote all column names and you should be able to. However, you don't make a change like that to a production system without a lot of testing to ensure you do not break your queries. The patch as it is has no potential to break queries. Quoting EVERYTHING does and would require all third party modules to test their code against this change.
Changing the default behavior in PostgreSQL to quote ALL fields is dangerous without a massive amount of testing.
Comment #18
bzrudi71 commentedSeems this is more or less a duplicate of #1622982: PostgreSQL auto-converts column names into lowercase.
Comment #19
sunIndeed, this is pretty much the same bug report, both were even created around the same time. Looks like I didn't find this issue back then, because it isn't properly categorized and its title is too vague. Fixing that now + copying over the issue summary.
The shared trigger for both issues is that Drupal code needs to be able to query a Postgres database using table/column identifiers that may be using uppercase letters.
This issue identified the need to query an external Postgres database (not owned by Drupal). My issue identified that a Drupal module cannot use table column names that are simply re-using the letter-casing of response parameters of an external RESTful API.
In order to proceed, I think we should merge the patch from #1622982: PostgreSQL auto-converts column names into lowercase into the patch here.
Comment #20
bzrudi71 commentedMerged both patches as suggested by sun. While the patch seems okay for me for Drupal 8 and gives the expected results, I think back port to Drupal 7 needs some discussion because of #17. However let's get things done for D8 at least :-)
Comment #21
bzrudi71 commentedAdded parent issue
Comment #22
robertwb commentedCan someone give me a leg up on where to find the D7 code to implement this patch?
Thanks
Comment #23
stefan.r commented@robertwb I don't think there was any yet, try this backport. I also added the extra check mentioned in comment #5 of #1622982: PostgreSQL auto-converts column names into lowercase.
Is there anything we can do to address the concerns in #17? Perhaps a
module_invoke_all('schema')in an update hook and renaming the relevant lowercase-only fields to reflect the capitalization in the schema definitions?Comment #24
stefan.r commentedComment #25
stefan.r commentedSetting back to 8.x now
Comment #27
bzrudi71 commentedGreat, patch still applies. Did some docker test runs, incl. Database tests and patch doesn't break anything as far as I can see. Anyone doing review, so that we can haz RTBC :-)
Comment #28
mradcliffeShould have "string" as the return type. See API documentation and comment standards.
This matches the MySQL documentation regarding identifiers.
It's probably a rare case that someone would want their localized non-latin characters or latin with diacritics in their column names. And this would technically allow someone to attempt to create a column that begins with a number or underscore.
Also, this has the "." character in the class, but that's not documented (or supported?).
Comment #29
mradcliffeNeither the sqlite
or mysql driverscheck the identifier for valid characters.I guess that means follow-up issues.Edit: This is not necessarily true. MySQL supports other characters if quoted so it probably doesn't need to implement escapeField.
Comment #30
broonSince I just ran into the same problem where I need to query an external PostgreSQL DB with CamelCase table (and field) names via Drupal 7, I want to add that the escape function for fields should be extended.
The patch works fine for simple queries (without joins). However, if you need to set a WHERE condition on a field using table aliases, the current patch is not enough since
"alias.fieldName"doesn't work. It's eitheralias."fieldName"or"alias"."fieldName".So, the line
should be added to
escapeField()in order to allowtableName.fieldNameconstructs:Additionally, I added an
escapeTable()function (current D7 patch only has functions for field names and aliases):Comment #31
mradcliffeThis affects revision handling for entities as the "isDefaultRevision" property is set via a Sql When statement (If/Then).
Comment #32
mradcliffeescapeAlias should always escape with quotes in my opinion. We should escape the alias in places such as addExpression (see patch). I incorporated some of my comments from above into this patch as well.
Needs a run on a pgsql testbot - all tests or database + node revision tests.
Edit: I generated interdiff poorly as i modified the prior patch before committing locally :(. Please ignore that file.
Comment #33
mradcliffeComment #34
bzrudi71 commentedPlease find attached the output from node and database PG test. Seems we are quoting sometimes to much ;-)
Comment #35
mradcliffeNo fair, your machine is faster than mine! :-)
- Patch failed Database tests on db-pgsql-9.1 + web-5.4 test run after I finally got to rebuild my testbot box. (SchemaTest, SelectComplexTest, SelectTest, UpdateComplexTest, AlterTest). (+10 fails, +13 exceptions)
- All node revision tests + NodeTypeRenameConfigImportTest pass now though NodeAccessPagerTest failed (-17 fails, +3 fails)
Still think that escapeAlias should quote aliases for AddExpression and what not cases, and that escapeField should take into account whether it needs to quote or not because it may have already gone through escapeAlias and escapeTable respectively.
Comment #36
bzrudi71 commentedRe #35 Yep, mradcliffe I'm with you. We should try to quote all and everything. But since this seems to rise some new fails I just startet a complete (--all) testbot run so we will have a better overview of what breaks and what will be fixed. Will attach output later, and even if my box seems a bit faster, this will still take some time :-)
Comment #37
bzrudi71 commentedHere is the full run PostgreSQL log. As I still have bad cold, just posting here ;-)
Comment #38
mradcliffeAlterTest, SelectTest, SelectComplexTest fail because the alias is set as the key. That should be easy.
SchemaTest as we already know fails due to copyTable() not being implemented.
However...
UpdateComplexTest::testSubSelectUpdate fails because the test is bad and asserts that the query string will be the same on all drivers. This is not necessarily the case because of other things in core (the isdefaultrevision alias issue). However the query result is the same despite this failure. I think that requires a follow-up issue of itself to look at if we can even assert identical query strings anymore in core tests.
I could also add the quoting mechanism to the default addExpression() and that'll cause the test to fail on current testbot (mysql) too. :-)
OR we could create an API change in Drupal 8.0.x to change the method signature to only optionally use escapeAlias(). I'm not sure if that would make anyone too happy as we're in beta. I think it would be very confusing to a new developer to wonder why one wouldn't want to escape aliases sent into addExpression() by default.
Comment #39
mradcliffe@alexpott mentions that the test against the query string is necessary in the typo'd comment in #1837118: UPDATE foo SET bar=(SELECT...) is not supported: https://www.drupal.org/node/1837118#comment-6755630
I don't think __toString should get rid of SQL specific escaping or quoting, but that's also an option.
Comment #40
bzrudi71 commentedIf the assertIdentical for query strings is really required, the only thing I could think of is fixing it within the tests themselves. Given that MySQL will never ever do any quoting within the query string (else we should have failing test on MySQL also) it should be save to hard remove all quotes for the comparison. Guess API change is no option and as you mentioned - why would one not quote?
Let's not add the next workaround. If we could agree with doing the fixes within the tests we should first figure out how many tests use query string comparison ;-)
Comment #41
bzrudi71 commentedJust did a quick grep for assertIdentical query strings but just found the one in UpdateComplexTest. So should not something like this be good enough
str_replace('"', '', $query_string)to make test pass?Comment #42
mradcliffeAdded quote string for mysql too, but that may come back with a failure. Verified that sqlite uses quote for aliases and columns.
This patch won't fix all the test failures (SelectTest and SelectComplexTest). It looks like adding the quote to expressions is screwing up the orderby logic that we have in pgsql driver.
Comment #43
bzrudi71 commentedYep, there seems to be a problem with orderBy() since the quoting in addExpression(). Especially the comment and search test groups are totally broken since. I already spend some minutes on that but was not able to figure out yet. Anyway thanks for the patch update :-)
Comment #44
mradcliffeFound the issue with orderBy() and select tests. There are some more exact query matches, but those tests use assertEqual().
Additionally I moved the use of escapeAlias() in Select::addExpression().
I only ran this passed the Database test suite with passes except for known copyTable test. Don't have enough time for a Node or full test before my flight.
Edit: could run some tests on my next flight.
Comment #45
bzrudi71 commentedGrat work @mradcliffe! Just started complete test run and it looks very promising so far. Just think that we need an issue summary update here before we can haz RTBC?
Comment #46
mradcliffeComment #47
bzrudi71 commentedWe are getting closer and closer :-) Patch from #44 does not show any new failures, but fixes a couple of the known ones. All fails in editor and field group tests pass now, as well as some single tests in other test groups! Full test run log attached...
Comment #48
mradcliffeI think my preg_replace() in the test is lazy, and probably should be
preg_replace('/[`"][A-Za-z0-9_]+[`"]+/')instead. \S+ may match things that it shouldn't and confuse future developers.Comment #49
mradcliffeAnd the revised patch.
Comment #50
mradcliffePer #45 and #47, marking RTBC.
Comment #51
bzrudi71 commentedJust did a review of the patch for syntax but found nothing obviously. RTBC!
Comment #52
alexpottSo I think we should add a new assertion to DatabaseTestBase that does the query string mangling only for postgres rather than all dbs. Something like assertQueryStringEqual()
We also seem to be missing explicit test coverage of the change.
Comment #53
damien tournoud commentedThis doesn't make any sense to me. The field passed to
escapeFieldshould not be escaped already, so we should unconditionally escape it here.As far as the tests are concerned, please just fix them. Tests should not try to do dumb comparisons of query strings, ever.
Comment #54
mradcliffeI think that the reason escapeField needs to check if it's already quoted is that in some cases escapeField is called on a "table.field". I was seeing things like "table"."column" trying to get re-escaped as ""table"."column"".
Also, @Damien Tournoud, do you mean "fix them" as in do as @alexpott suggests or "fix them" as in just leave it as in the latest patch? I'm not sure which is the dumb query string comparison: a) assuming query string will always be the same or b) normalizing the query string before comparison?
Comment #55
damien tournoud commentedThat's fine, in that case it should split it and encode it separately.
Neither; tests really should not do any string comparison. Let's try to change them so that we still assert for the right thing without doing a full string comparison: the two comment tests should really just look for the comment, not compare the whole query string.
testSubSelectUpdate()I would prefer would assert on the result set than on the query string.Comment #56
bzrudi71 commented@mradcliffe any chance to move forward here ;-)
Comment #57
mradcliffeI've been mainly working in contrib land for the past few months, but I was able to look at this today.
I reworked the existing tests to not do full query string comparisons per @Damien Tournoud, but I quite like the simplicity of the current patch, so I think that should remain.
I added a unit test for PostgreSQL connection with one test method for escape field, which can be expanded on later. This would also be useful as a starter example for MySQL, SQLite, and other database driver unit tests.
I think this unit test needs some work with regard to adding more data to test. I am not as familiar with all the uses for these methods that need to be tested.
Comment #58
mradcliffeFlip back to needs work.
Comment #59
mradcliffeAs an example, I'm not sure if this is what we should be expecting. That doesn't /feel/ correct.
Comment #60
mradcliffeComment #61
mradcliffeHere's another try. I expanded the unit tests to capture what I think is an acceptable use case of potential strings to escape beyond the base escape methods. I also no longer escape every field by default and am only focusing on case-sensitivity so that it matches the Drupal functionality provided by the base Connection class.
I need to run this through postgresql test bot, but the phpunit tests should pass on the pifr test bot.
Comment #64
bzrudi71 commented@mradcliffe did you run the patch on PG already? If not, I can do a bot run later. Currently bot is busy with testing my patches...
Comment #65
mradcliffeNot yet, @bzrudi71.
Comment #66
bzrudi71 commentedOkay, just verified that all tests in patch pass PG bot!
Comment #67
andypostAwesome patch! just a few nits
Should use {@inheritdoc} with extended comment.
Also better to fix parent doc-block, maybe separate issue
comment is not clear, "postgresql" means server or pdo or drupal driver
why start from 2? please add inline comment
contains is assertNotEqual(FALSE, strpos($query, $expocted), ...contains
extra line--
Comment #68
mradcliffeThanks for the review.
#1. I re-read the @inheritdoc documentation, and it has to be stand-alone. I rewrote the comments inline instead. I removed the bit about case-sensitivity because we're just supporting what Drupal already expects.
#2. Clarified comment
#3. I looked back to the interdiff that had this change, and this comes from the core driver in multiple places, but without any comment about why it begins with 2. Probably need to go git blame Drupal 7.x code base for the commit. I added an inline comment in the driver regarding the code is from the abstract class.
#4. Fixed test per comment above.
#5. Removed line
I still need to ping @Crell regarding the new unit test.
Comment #70
andypostFixed tests
Comment #71
bzrudi71 commentedAny reason why we hold this one back? Patch looks good and seems RTBC from my side. Is it because of:
If so, can't we file a follow up for the unit tests and get this one in?
Comment #72
mradcliffeYes, that's the reason. I think the coverage is sufficient, but I still have a second-guess fear that I'm missing something security-wise.
What do you think?
Comment #73
bzrudi71 commented@mradcliffe: Can't find anything obvious wrong with the patch, but if you have concerns let's better hold it back until some more feedback.
Comment #74
jaredsmith commentedA few comments on the patch:
Not sure what this new function has to do with capital letters in pgsql identifiers -- can you please explain?
I don't understand this change -- it seems unrelated to the purpose of this patch.
Same thing here... what does this have to do with capital letters in pgsql identifiers?
To address the concerns that @mradcliffe expressed about security -- I specifically looked for security concerns in my review, and didn't see anything that jumped out at me. That being said, I think it's good to get as many eyes on this patch as possible before settings it as RTBC.
Comment #76
devpreview commentedNew patch (#1013034: PostgreSQL constraints do not get renamed by db_rename_table()).
Comment #77
andypostComment #78
jaredsmith commentedAnother review on the latest patch:
This comment says that it's escaping any invalid start character, but it's really just removing any invalid characters. Please be precise...
I still don't understand why this test is being changed. It seems unrelated to the purpose of this patch.
Same thing here...
This change also seems unrelated to this issue.
Comment #79
andypostFixed #78.1 comment, suppose RTBC
Can someone test and report the effect of the patch on testing bot?
Answer to #78 2-3-4 in #55
Comment #80
bzrudi71 commentedPG bot seems happy with latest patch, no side effects just some more fixed tests. +1 for RTBC
Comment #81
daffie commentedThe patch looks good to me. Good work!
Some PHPUnit test nitpicks:
Can we replace this by @coversDefaultClass \Drupal\Core\Database\Driver\pgsql\Connection
Can we replace this by @covers ::escapeTable
Can we replace this by @covers ::escapeAlias and can we rename the function to testEscapeAlias.
Can we replace this by @covers ::escapeField
Comment #82
daffie commentedFixed my PHPUnit test nitpicks from comment #81 myself.
Comment #83
catchThis is necessary for postgres to pass tests, so bumping to critical per #2157455: [Meta] Make Drupal 8 work with PostgreSQL or remove support from core before release.
Comment #84
bzrudi71 commentedWith a hand full of reviews and all nitpicks fixed it seems that this is RTBC now, thanks!
Comment #85
alexpottSQLite is not affected by the test changes. This issue addresses a critical bug and is allowed per https://www.drupal.org/core/beta-changes. Committed 60f36e8 and pushed to 8.0.x. Thanks!
Remove unused mock.