Problem/Motivation
The postgresql driver is unable to convert from text/varchar to binary types correctly.
Earlier in this issue a patch was committed that attempted to escape backslashes, but this was incorrect both due to different treatment of escaping on postgres (standard_conforming_strings
can be configured differently and the default is different after 9.1. Also even with or without that setting, the backslash escaping didn't take into account PHP's own backslash escaping.
Proposed resolution
Remove the attempted workaround for bytea and just set standard_conforming_strings
to off in the connection.
Remaining tasks
Open a follow-up to add explicit database tests for changing varchar to binary that expose the bytea issue, so that we don't only run into this either immediately before or after major core cycles.
User interface changes
API changes
Data model changes
Upgrading from Drupal 6.20 to 7.0, the update #7055 returned the following error:
Failed: PDOException: SQLSTATE[22P02]: Invalid text representation: 7 ERROR: sintaxis de entrada no válida para tipo bytea: ALTER TABLE {variable} ALTER "value" TYPE bytea USING "value"::bytea; Array ( ) in db_change_field() (line 2888 of /var/www/drupal/includes/database/database.inc).
After that, there are 17 pending updates, that won't be excuted.
The DB is PostgreSQL 8.4 on Debian squeeze, and uses UTF8 encoding. The field 'value', in the table '{variable}', has type 'text'
If I execute the query in psql client, I get the same error, Postgres seems incapable of casting field type text into type bytea. I haven't seen any issue with this same error, so I have to think that no one hit into this. I'm completely puzzled. Any help would be appreciated.
Thank you in advance.
Comment | File | Size | Author |
---|---|---|---|
#142 | D7.pgsql-bytea-1031122-142.patch | 1.58 KB | joseph.olstad |
#122 | 1031122-2.122.patch | 12.54 KB | alexpott |
#122 | 116-122-interdiff.txt | 2.49 KB | alexpott |
#116 | 114-116-interdiff.txt | 880 bytes | alexpott |
#116 | 1031122-2.116.patch | 12.35 KB | alexpott |
Comments
Comment #1
BerdirCan you try the same query on an empty variable table?
Is it maybe a specific content that triggers the error?
Because this is an unconditional change that occurs on every upgrade path and with have upgrade tests which pass on PostgreSQL.
Comment #2
mcotelo CreditAttribution: mcotelo commentedBerdir, thank you, that worked.
The problem is that with that I loose a lot of configurations, that I'll have to restore. Copying the table as was in Drupal 6 doesn't seem really good idea, as some variables might have change names??.
What can be the problem? Too long values? Special chars as ñ, á, ...?
Comment #3
BerdirYes, that was only supposed to help with debugging of this problem :)
Next would be to try and figure out which of the entries could possibly cause the error.
Not sure what could cause this, special characters could be a problem yes. You could also try delete the greatest values and try again to see if it has something to do with the size.
Comment #4
mcotelo CreditAttribution: mcotelo commentedjeje, I was fearing that.
At least I have a clue now. I will try a few things, and post whatever I'd find, just in case it happens to someone else.
Again, thank you for your help.
Comment #5
mcotelo CreditAttribution: mcotelo commentedI've found the troublesome record. The módule XML sitemap, uses a config variable
xmlsitemap_lastmod_format
, and the saved value wass:12:"Y-m-d\TH:i\Z";
.Changing the "Last modification date format" setting in XML Sitemap from 'Medium' to 'Short' or 'Complete', avoids the escaped characters '\T' and '\Z' and the conversion worked OK.
Comment #6
Dave ReidI'm not sure how this is our fault? How else am I supposed to define this constant?
Comment #7
mcotelo CreditAttribution: mcotelo commentedSorry, I am not sure that it is your fault neither.
But the string 's:12:"Y-m-d\TH:i\Z"', should be stored in database as 's:12:"Y-m-d\\TH:i\\Z"', scaping the \ character (at least in PostgreSQL).
Maybe this issue should be moved back to core.
Comment #8
greg.1.anderson CreditAttribution: greg.1.anderson commentedI also ran into this problem with a postgres site that had the following pattern:
menu_breadcrumb_pattern_matches a:1:{s:11:"book-toc-16";s:16:"/^book-toc-\d+$/";}
The variable menu_breadcrumb_menus also contained this same pattern. By replacing these two variables temporarily with values that did not contain a backslash, I was able to successfully execute system_update_7055() without error. The problem here seems to be related to the way that Postgres expects bytea (blob) data to be escaped; see 8.4.2. bytea escape format in the Postgres manual.
There does not seem to be anything wrong with storing a backslash in a string in Postgres (that is, I do not think that it is necessary to convert the storage format as suggested in #7); the issue has to do with the implicit type coercion done when altering the data type of a column.
The documentation on alter table has this to say:
So, converting from a text field to a bytea field is the same as an assignment cast from text to bytea, but bytea expects its input to be escaped, hence the failure. Drupal currently builds the following SQL to modify the value field of the variable table:
The SQL that it needs to build for conversions to bytea should instead be as follows:
That decode + replace business looks mighty suspicious, but I found two references on the internet using that solution [1] [2]. I could not find any more sage advice than this in the Postgres documentation, and experimentation revealed that the above-quoted technique does in fact work. [Note that
replace("value", '\\', '\\\\')
is essentially the same asencode("value", 'escape')
, but the later does not work without typecasting, and I did not pursue this alternative further.]The attached patch affects this change only for fields being altered to type bytea (blobs); after applying it, system_update_7055 worked correctly for me on a postgres site.
Comment #9
greg.1.anderson CreditAttribution: greg.1.anderson commentedThis fixes a typo in the patch above: iF -> if. I guess php keywords are case-insensitive.
Comment #10
greg.1.anderson CreditAttribution: greg.1.anderson commentedThe Drupal-8 version of this patch is nearly identical.
Comment #11
greg.1.anderson CreditAttribution: greg.1.anderson commentedIncreasing priority to critical, since this blocks the d6 -> d7 upgrade path and d7 minor updates for all postgres Drupal sites. Feel free to adjust the priority if you do not agree.
To completely fix the d6 -> d7 upgrade path for postgres, we will also need #1575790: Update #7002 fails on postgres - ILIKE operator on bytea not supported.
Comment #12
chx CreditAttribution: chx commentedI am happy the postgresql upgrade path gets fixed but is there any way to avoid writing I do not even know how many backslashes? str_repeat('\\' ?
Comment #13
greg.1.anderson CreditAttribution: greg.1.anderson commentedI know what you mean about the backslashes. The goal is to replace single backslashes in the field contents with double backslashes; to express this in SQL requires '\\' (two backslashes) and '\\\\' (four backslashes), as each backslash must be escaped. To express two and four backslashes in PHP requires four and eight backslashes. You are right that eight backslashes is hard to read.
str_repeat is easier to read, but is also a little confusing. You still want the SQL to come out as two and four backslashes, respectively, so the code would be:
If that is clear enough, I will re-roll the patch.
Comment #14
chx CreditAttribution: chx commentedEverything is clearer than eight backslashes. Perhaps the comment needs to reflect how did we get to eight...
Comment #15
greg.1.anderson CreditAttribution: greg.1.anderson commentedWe really only get to four in the final code, as
str_repeat('\\', 4)
really means "repleat one backslash four times". Of course, that is equivalent to eight backslashes written out longhand, so the question of how many backslashes that really makes (four or eight) is somewhat subjective. I'll re-roll the patch tomorrow; when I do, I will try to think of a clear and correct but not-to-wordy way to explain. It really is confusing, though, because we go through PHP encoding to produce a string that is SQL encoded in order to apply encoding to the contents of an SQL field -- three separate data domains are involved. Maybe I should just add "see http://drupal.org/node/1031122" to the comment. I'll see what I can do.Comment #16
chx CreditAttribution: chx commentedWhat about chr(92) two or four times and explanation?
Comment #17
greg.1.anderson CreditAttribution: greg.1.anderson commentedI could go with
str_repeat(chr(92), 2), str_repeat(chr(92), 4))
. This has the advantage of at least being really clear about how many characters are being produced. The downside is that chr(92) is not immediately recognizable as a backslash, but I can solve that in the comment.Comment #18
chx CreditAttribution: chx commentedAre you telling me that people don't have the ASCII table in their heads these days :D ? Well, I am forgetting bits and pieces here and there but the 92 is really recognizable as the backslash IMO ;)
Comment #19
greg.1.anderson CreditAttribution: greg.1.anderson commentedI just discovered that I made a serious error above. The patches in #9 and #10 are not correct. I was confused by Bash escaping, and misinterpreted the SQL that needed to be constructed (I was testing with drush sql-query instead of psql cli). I am not sure how my test went wrong yesterday, but when I retested clean today I discovered the error. :(
The actual SQL needed is:
Sorry about the confusion with the number of backslashes required; once the SQL is corrected to the one and two backslashes needed, the PHP code requires only two and four backslashes, respectively, which is much more readable. In the current Drupal 8 code base, there are many instances where four and five backslashes in a row appear, but no instance of chr(92). I am not sure which version of the patch is best at this point. For comparison, the inline backslashes now look like this:
Using chr(92) looks like this:
I think that the former is actually more readable, but I'm willing to go with either version, as desired.
Also, there was one other problem with the previous patch: changing a bytea field to bytea would cause a failure. The latest patch first checks the type of the field, and uses a normal typecast if the field is already a bytea. This operation could be optimized to skip the conversion entirely for any field type if the type is already the desired format, if that is viewed as a worthwhile optimization.
Comment #20
chx CreditAttribution: chx commentedYes four is okay. Eight wasn't. My eyes glazed over that.
Comment #21
greg.1.anderson CreditAttribution: greg.1.anderson commentedHere is an updated patch; it is functionally equivalent to #19, but the conditionals are cleaner here, and the ALTER TABLE field type conversion is omitted entirely for bytea-to-bytea conversion.
Comment #22
sunLooks good to go.
Cleaned up the comments only. No other changes.
Comment #23
Dries CreditAttribution: Dries commentedCommitted to 8.x. Thanks all.
Comment #24
greg.1.anderson CreditAttribution: greg.1.anderson commentedThanks, guys. I'll re-roll #22 per #9 for d7.
Comment #25
greg.1.anderson CreditAttribution: greg.1.anderson commentedRe-rolled for 7.x-dev.
Comment #26
wbarnes CreditAttribution: wbarnes commentedI just tried the patch from #25 on the latest Drupal 7.x-dev, and when I ran update.php, I got this:
The following updates returned messages
user module
Update #7009
Failed: PDOException: SQLSTATE[42601]: Syntax error: 7 ERROR: syntax error at or near "\" LINE 1: ...ta" TYPE bytea USING decode(replace("data", '\', '\\'), 'esc... ^: ALTER TABLE {users} ALTER "data" TYPE bytea USING decode(replace("data", '\', '\\'), 'escape');; Array ( ) in db_change_field() (line 2988 of /var/www/dev/includes/database/database.inc).
system module
Update #7074
Failed: PDOException: SQLSTATE[42883]: Undefined function: 7 ERROR: operator does not exist: bytea ~~* unknown LINE 4: WHERE (options ILIKE '%query%') ^ HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.: SELECT ml.mlid AS mlid, ml.options AS options FROM {menu_links} ml WHERE (options ILIKE :db_condition_placeholder_0) ; Array ( [:db_condition_placeholder_0] => %query% ) in system_update_7074() (line 3025 of /var/www/dev/modules/system/system.install).
Any suggestions?
Comment #27
greg.1.anderson CreditAttribution: greg.1.anderson commentedSee #11, above: you also need the patch in #1575790: Update #7002 fails on postgres - ILIKE operator on bytea not supported to successfully complete the upgrade.
Comment #28
sunSo pgsql seems to escape the single-quote in
'\'
. I was slightly skeptical about that part already.Here's a new patch for D7. We can adjust this in D8 later.
@wbarnes: Can you test again with this one? That would be great :)
Comment #29
wbarnes CreditAttribution: wbarnes commented# patch -p1 patching file includes/database/pgsql/schema.inc
Hunk #1 FAILED at 530.
1 out of 1 hunk FAILED -- saving rejects to file includes/database/pgsql/schema.inc.rej
it didn't work for me.
Below is my schema.inc.rej:
# cat schema.inc.rej
--- includes/database/pgsql/schema.inc
+++ includes/database/pgsql/schema.inc
@@ -530,7 +530,23 @@
// Remove old default.
$this->fieldSetNoDefault($table, $field);
- $this->connection->query('ALTER TABLE {' . $table . '} ALTER "' . $field . '" TYPE ' . $typecast . ' USING "' . $field . '"::' . $typecast);
+ // Convert field type.
+ // Usually, we do this via a simple typecast 'USING fieldname::type'. But
+ // the typecast does not work for conversions to bytea.
+ // @see http://www.postgresql.org/docs/current/static/datatype-binary.html
+ if ($spec['pgsql_type'] != 'bytea') {
+ $this->connection->query('ALTER TABLE {' . $table . '} ALTER "' . $field . '" TYPE ' . $typecast . ' USING "' . $field . '"::' . $typecast);
+ }
+ else {
+ // Do not attempt to convert a field that is bytea already.
+ $table_information = $this->queryTableInformation($table);
+ if (!in_array($field, $table_information->blob_fields)) {
+ // Convert to a bytea type by using the SQL replace() function to
+ // convert any single backslashes in the field content to double
+ // backslashes ('\' to '\\').
+ $this->connection->query('ALTER TABLE {' . $table . '} ALTER "' . $field . '" TYPE ' . $typecast . ' USING decode(replace("' . $field . '"' . ", '\\\\', '\\\\\\\\'), 'escape');");
+ }
+ }
if (isset($spec['not null'])) {
if ($spec['not null']) {
Comment #30
wbarnes CreditAttribution: wbarnes commentedI just applied this and I still get:
The following updates returned messages
user module
Update #7009
Failed: PDOException: SQLSTATE[42601]: Syntax error: 7 ERROR: syntax error at or near "\" LINE 1: ...ta" TYPE bytea USING decode(replace("data", '\', '\\'), 'esc... ^: ALTER TABLE {users} ALTER "data" TYPE bytea USING decode(replace("data", '\', '\\'), 'escape');; Array ( ) in db_change_field() (line 2988 of /var/www/dev/includes/database/database.inc).
Comment #31
greg.1.anderson CreditAttribution: greg.1.anderson commented#30 looks like you are still running the patch from #25, not the one from #28; please double-check your schema.inc file.
Comment #32
sunThe reason for this seems clear:
The backslashes need to be escaped for the PHP string itself, since
'\\\\'
becomes'\\'
in PHP already.The backslash then needs to be escaped in the query with
'\\'
, since it would otherwise escape the closing single-quote; e.g., as in'My friend\'s house.'
.Comment #33
wbarnes CreditAttribution: wbarnes commentedExcellent, that worked, thank you!
I had to roll back my database.inc and schema.inc to 7.x dev version.
Then I patched my database.inc from #11, and schema.inc from #28 and now the update.php worked.
Comment #34
sunThanks for testing!
So back to D8. Apparently, the original commit does not appear in the repo for me. So here's a fresh patch for D8.
Skimming the issue once again, I also found the quoted code comment in #13 much clearer. Also adding the details from #32 about the backslash escaping.
Comment #35
greg.1.anderson CreditAttribution: greg.1.anderson commentedGuys, this bug is clearly driving me crazy. I just re-tested, and I still find that #28 is broken, and #25 works. However, prior to #19 above I clearly agreed with you that #28 is correct, so I am guardedly cautious about this situation.
Here are a few simple tests I ran in psql (version 9.1.3):
This implies that psql does not expect or require the '\' to be escaped. php itself requires '\'s to be escaped, so if I am correct, we want
decode(replace("field", '\\', '\\\\'), 'escape')
, just like #25. Now, if I am correct and #25 is right, then patch #28 will still work if all of the data in "field" is populated only with even-number-of-backslash sequences, and will fail if there is an odd number (see last two examples above). This could potentially explain my pre-#19 opinion. The thing that is driving me crazy, though, is I have no good theory as to why patch #25 would fail for you, unless psql really did expect '\'s to be escaped. By my most recent tests, I do not find that to be true, nor can I get #28 to work for me. Could you please re-test your results -- maybe run the queries above and see if you get the same response? If I've made some mistake here, I'd really like to know what it is.Comment #36
greg.1.anderson CreditAttribution: greg.1.anderson commentedFurther tests, this time with PHP:
Code:
Result:
This is also consistent with #25 being correct, and #28 being incorrect. If you change the test string in A and B to \\\\2 (evaluates to two backslashes followed by a two), then you get:
This is also the expected result for even numbers of backslashes (#25 and #28 then work the same).
Comment #37
plachance CreditAttribution: plachance commentedI've applied #25 manually to version 7.15 and it produce a sql syntax error for me. The correct line with my config (pgsql 8.3) is
I've manually inserted values "a\b" and "a\\b" and the result after the update is "a\\b" and "a\\\\b" as it should.
There's 2 escaping. The first by the php string inside double-quotes. The second by pgsql. That's why we need 4 and 8 backslashes. And it seems the E constant before the escape string is suggested because backslash escape without it should be disabled in a future release. Here's the doc about it : http://www.postgresql.org/docs/8.3/interactive/sql-syntax-lexical.html#S...
Comment #38
dgv CreditAttribution: dgv commentedIt does expect that when standard_conforming_strings is set to OFF (which is its default value up to PG version 9.0). In this case, \ in a literal string is an escape character and must be itself escaped.
On the other hand, when standard_conforming_strings is set to ON, the backslash in a literal is a normal character that does not escape anything and must not be escaped, which conforms to the SQL standard. This is the default value starting with PG version 9.1 and will be for future releases.
When a literal string is prefixed with the E letter, it's interpreted as if standard_conforming_strings was set to OFF for this specific string.
To avoid depending on PG version and the local db settings, the best practice for an application is to decide as a rule whether its literal strings are standard or non-standard, set the parameter accordingly, either at connect time or once and for all with ALTER DATABASE, and then stick to that choice in all its code.
When making that choice, it should be noted that PDO has a bug with regard to backslash and standard_conforming_strings:
https://bugs.php.net/bug.php?id=55335
Unfortunately it's not getting fixed, they wrongly qualified this issue as "Not a Bug".
Apparently, the PDO generic parser always treat \ as a special character while this behavior should be driver-dependant.
This is problematic since going forward, more and more db installs will have standard_conforming_strings=ON.
Comment #39
andypostSuppose better fix it at connect time as we do for mysql
Comment #40
catchI'm downgrading this to major since it's driver-specific (and not the MySQL driver). When we can support different databases with qa.drupal.org then bugs like this would cause actual test failures (and hence be critical), but until then it doesn't make sense to hold up everything else.
Comment #41
jweowu CreditAttribution: jweowu commentedBased on dgv's explanation in #38 I've used the following as a workaround on Postgres 9.1, in conjunction with the patch in #28.
ALTER DATABASE (name) SET standard_conforming_strings='off';
This setting can then be observed from the psql prompt with the command
\drds
(i.e. "list per-database role settings")My upgrade sequence is therefore:
* import old database into postgres 9.1
* ALTER database as above
* drush updb
This works nicely, although I observe that if I drop the database completely and recreate it from a dump (i.e.
pg_dump, pg_drop, pg_restore
), I lose thestandard_conforming_strings
setting, becausepg_dump
doesn't include it. You can obtain the correct database recreation and alteration statements from the output ofpg_dumpall -s
.Using
pg_restore -c
instead of dropping the database will also retain the setting, but that might not always be ideal, so in general you'll want to be aware of where this setting is stored (if using this ALTER approach), and make sure that you don't lose it in the meantime (at least until this issue can be fixed in the connection).Comment #42
benjifisher#34: drupal8.postgres-bytea.34.patch queued for re-testing.
Comment #43
dcam CreditAttribution: dcam commentedhttp://drupal.org/node/1427826 contains instructions for updating the issue summary with the summary template.
The summary may need to be updated with information from comments.
Comment #45
jrglasgow CreditAttribution: jrglasgow commentedComment #46
jrglasgow CreditAttribution: jrglasgow commentedI went to manually apply this patch and it was already in the 8.x branch. This was committed 29 Jul 2012 http://drupalcode.org/project/drupal.git/commit/c0b07c41ba4e0a9315059681...
Comment #47
jweowu CreditAttribution: jweowu commentedIs there a separate issue for comments 38/39 ? If not, then this can't be marked as fixed.
Comment #48
jweowu CreditAttribution: jweowu commentedRe-opening, in the absence of any indication that comments 38/39 have been dealt with (and in any case it needs a D7 back-port before it is closed).
Comment #49
martin107 CreditAttribution: martin107 commentedComment #50
stefan.r CreditAttribution: stefan.r commentedOne way to address comments 38/39 is to do
SET standard_conforming_strings='off';
at connect time (see attached patch).#910388: Installation fails on PostgreSQL >=8.4: Invalid escape sequence. suggests it should be set to on instead but that also means we'd need to fix all code that currently assumes it to be off :)
Comment #51
stefan.r CreditAttribution: stefan.r commentedD7 backport
Comment #52
stefan.r CreditAttribution: stefan.r commentedComment #54
stefan.r CreditAttribution: stefan.r commentedThe testbot somehow thought this was a D8 patch, even if the issue was marked 7.x-dev while the patch was being posted, did I do something wrong there? :)
Comment #55
stefan.r CreditAttribution: stefan.r commented51: drupal7-postgres_init_strings-1031122-51.patch queued for re-testing.
Comment #56
catchThis just resurfaced in #2548725: Fix regression to menu serialization in upgrade path that causes thousands of errors in PostgreSQL.
Comment #57
catchThis just resurfaced in #2548725: Fix regression to menu serialization in upgrade path that causes thousands of errors in PostgreSQL.
Comment #58
catchComment #59
catchComment #60
catchJust noting this was opened just 13 days after 7.0 was released, 8.x was branched long before the initial commit, and it's now holding up 8.x's release.
Even if we swallowed the bug in the menu update to get 8.x out of the door, it's still already added hours to the past week or so's debugging of the upgrade path - since we had the same PHP error due to at least three separate bugs including this one, which muddied the water considerably.
Comment #61
catchMarked #2560691: Consider more robust alternatives for PostgreSQL when changing fields to a blob type as duplicate.
Re-uploading #50.
Comment #62
dgv CreditAttribution: dgv commentedIt looks like the initial intention behind the SQL statement in error is to convert a text into its utf-8 representation as a series of bytes in a bytea field.
A cast from text to bytea shouldn't be used for that, as such a cast acts on the premise that the text is a bytea expressed as a literal representation (so embedded as text and fully quoted) . But that is not what you have, as "value" designates a field in a database, not a literal embedded inside a text query.
I'd posit that the SQL statement should be like:
ALTER TABLE {variable} ALTER "value" TYPE bytea USING convert_to("value", 'UTF8')
convert_to() is documented at http://www.postgresql.org/docs/current/static/functions-string.html
This works whatever is in the "value" field.
PS: at the caller level, you could question the sense of this operation. Dynamically converting a text field in a schema into its non-text representation is a quite weird and hackish thing to do.
Comment #63
catchConfirmed via some back and forth with pwolanin that this does indeed fix #2548725: Fix regression to menu serialization in upgrade path that causes thousands of errors in PostgreSQL. Since this is the older issue, has more background, was already committed to core once with an incorrect patch, has broader commit credit, and additionally documents the 7.x upgrade path issue, I've marked that issue as duplicate of this, and am bumping it back to critical.
I'm moving the beta target tag over here for now, but given this turns out to be a postgres driver issue after all and doesn't require a change to the actual menu update, I think we could put it in after the beta if necessary. However I'm also tempted to just RTBC this patch once we get a clean run on the right issue.
Comment #64
catchCross-posted with #62, if that works without the connection option then it'd be preferable.
Comment #65
plachExperimenting a bit with this #62, stay tuned.
Comment #66
plachComment #67
DamienMcKennaComment #68
dgv CreditAttribution: dgv commentedThis is a comment about 2548725-58.patch
Even though changing "standard_conforming_strings" is not the correct fix *for this issue*, as that would be a wrong interpretation of the root cause of the problem (per#62) , it still makes sense for Drupal to enforce a specific value for "standard_conforming_strings"
Why? It tells how every literal string in every SQL statement should be interpreted (in essence, whether backslash inside the string has a special role or not). A close equivalent in MySQL is NO_BACKSLASH_ESCAPES in sql_mode.
Not fixing it implies hazardous behavior. For example, a query like this:
UPDATE table SET field=replace(field, '\n', '<br>');
will either :
It seems very unwise to make this depend on "whatever standard_conforming_strings happens to be in the environment".
However, it is inefficient to SET it at every connection to always the same value.
It could be made permanent with
ALTER DATABASE nameofdb SET standard_conforming_strings TO 'off';
when doing that, every future connection to this database will inherit this setting as a default.
Another point is that, if it is decided that standard_conforming_strings should be set to 'off' ("old syntax"), it should also mute the warning that comes with it on every use of that "old syntax", with a command like this (again having a permanent effect and meant to run at install time):
ALTER DATABASE nameofdb SET escape_string_warning TO 'off';
Comment #69
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedI agree a bit with dgv that this is a hackish thing to do in general to convert some text to bytea and especially in this update function. I should have thought harder about it initially. I think we should fix the update function per the other issue in addition to any change to the schema class.
I'lll contrast with the most common use of this conversion in the past which was an incorrect schema where a long text field was being used to store serialized data instead of a blob type field, and had to be converted (I know Views and maybe some places in core had this bug).
Here's a combined patch with #63 that includes the update function fix.
Comment #70
DamienMcKennaMinor quibble: this line was indented too far.
Comment #71
stefan.r CreditAttribution: stefan.r commented@dgv is it measurably inefficient to set this on connection? If we set it as a setting on the database we might still need hook_requirements-type check to make sure it's not been changed?
Comment #72
pwolanin CreditAttribution: pwolanin at Acquia commentedI agree it would be better to set it once, but we don't necessarily expect the Drupal user to have the ALTER DATABASE permission do we?
https://www.drupal.org/requirements/database
Comment #73
dgv CreditAttribution: dgv commented@stefan.r By itself
SET standard_conforming_strings = off
is a fast query and shouldn't be noticeable in the midst of a lot of other queries; but it still implies a network round-trip to the SQL engine.
To check the setting client-side, a fast method would be to quote a string containing a single standalone backslash, with the database API method used to quote literals (I don't know yet about Drupal8, this would be the equivalent of PDO::quote() with PDO or pg_escape_string() with the pg_*-style PHP API).
Then if the result of that call is \\, two backslashes, it means that standard_conforming_strings is OFF, whereas if the result is "\" (input string unchanged) it's ON.
Comment #74
catchWe should discuss the connection options in a follow-up. That's worth having (I've looked at the SET NAMES utf8 call from MySQL in strace more than once with a raised eyebrow) but I'd rather have postgres slow with no data corruption, than hold up the data corruption fix on this.
Comment #75
pwolanin CreditAttribution: pwolanin at Acquia commentedIt seems pgsql isalso already doing a 'SET NAMES' call on every connection.
It's unclear to me in terms of the referenced PDO or C-APi bug whether it really affects us - in general in 8 we are NOT using quote() the way it's used in the bug report:
https://bugs.php.net/bug.php?id=55335
We are instead using the bindParam() method. There are a few cases we do quote in the pgsql, but they wouldn't have a backslash like:
Comment #76
pwolanin CreditAttribution: pwolanin at Acquia commentedPer catch, re-opening the other issue for just fixing the update function, and leaving this one open to get a driver fix + test coverage.
Comment #77
dgv CreditAttribution: dgv commented@pwolanin: indeed there's the question of permission. ALTER DATABASE is permitted for the owner of the database (or superuser(s), who can do anything anyways).
The db account used by Drupal is typically the owner of the Drupal database.
That's what you'd get when following instructions in the doc:
https://www.drupal.org/documentation/install/create-database#postgresql
The situation where the Drupal account does not own its database is theorically possible. It would require a bit of manual setting of permissions though, since by default we can connect to a db we don't own but we couldn't create objects in it.
BTW, as an alternative to ALTER DATABASE, you may also do:
ALTER USER drupaluser SET standard_conforming_string TO 'off'
which has a a similar effect than the ALTER DATABASE discussed above, except if applies to every connection initiated by this user, to any database. If db and user have conflicting defaults, it's the setting given by ALTER USER that wins.
Comment #78
catchCommitting the patch at #2548725: Fix regression to menu serialization in upgrade path that causes thousands of errors in PostgreSQL means we'll lose the implicit test coverage for this issue, so let's add that here.
Comment #79
mradcliffeI've been out of it for the past few days.
In the distant past, I've had to store Java classes in the database. We changed to storing in hex when we moved from Oracle to PostgreSQL. Hex as part of bytea was introduced in 9.0, but I recall doing a workaround in 8.0 using
encode/decode
which did support hex.So maybe the driver should look at storing as hex instead either when converting a field, and maybe also in general?
Comment #80
mradcliffeOh, right. #926636: Drupal install error on PostgreSQL 9.0 database, which suggests that unserialize() and hex output has problems.
Edit: but maybe that won't matter since we could still have bytea output as escape, but be able to input as hex?
Comment #81
dawehnerI tried to run #2561229: Upgrade content tests fails on postgres for translated comment with this patch and it took forever.
Comment #82
catch#2548725: Fix regression to menu serialization in upgrade path that causes thousands of errors in PostgreSQL is in, this is now a bug in the driver, not with the actual beta12-HEAD upgrade path (even on postgres), so untagging.
Comment #83
alexpott#2561229: Upgrade content tests fails on postgres for translated comment is also a bytea issue. I think Postgres's install Tasks are currently broken.
Comment #84
alexpottImplemented #77 and set it on the db during the installer.
Pseudo interdiff as part of #69 had been implemented in HEAD.
Comment #85
stefan.r CreditAttribution: stefan.r commentedNot sure if critical but a hook_requirements() runtime check and an update hook might be nice here as well
^
Comment #86
plachIt seems we have a failure with the latest HEAD: https://www.drupal.org/pift-ci-job/27107
Comment #87
alexpottIt looks like this patch causes a fail on postgres in Drupal\system\Tests\Update\UpdatePathTestBaseFilledTest::testUpdateHookN - unfortunately I do get the fail locally.
Comment #88
alexpottAlso given that Postgres is now green with all of our test coverage is this actually critical?
Comment #89
dawehnerGood question. Let's see the definition of a critical issue.
This could apply here if I understand the issue correctly.
Comment #91
alexpottI can't reproduce the postgres fail in
Drupal\system\Tests\Update\UpdatePathTestBaseFilledTest::testUpdateHookN
locally :( and HEAD postgres is now passing - so this issue is causing the postgres fail.Comment #92
catchI'm posting a combined patch which is a revert of 475299e63c6d3d6baa9058 (which removed our implicit test coverage for this bug by doing the update differently) + the patch here. Just to see what happens...
Comment #93
MixologicRe-ran just that one failing test on a clean postgres9.1 container and am seeing this reams and reams of this continously repeated.
The weird thing is that the drupalci script just stops running the test, yet pgsql continues in the background.
Comment #94
alexpottSo reading the documentation for standard_conforming_strings I think we want it on - it makes postgres behave like mysql.
We've removed all escaping here...
If we want it to work with 'off' we need to then we also need to turn the warning off as #68 suggests.
Comment #95
alexpottDrupal\system\Tests\Update\UpdatePathTestBaseFilledTest
is green locally for me with #94.Comment #96
alexpottEmphasis theirs ... https://dba.stackexchange.com/questions/71681/postgres-9-3-changes-the-s... ... the answer is from the top postgres expert on stackexchange.
Comment #97
jhedstromMakes sense to me to turn
standard_conforming_strings
on. Especially since we get to remove all of this code :).
Comment #98
alexpottAdding a test to prove the current changes to
Drupal\Core\Database\Driver\pgsql\Schema
don't work. Reverting those changes does not work either but that might be because of testing fun.Comment #99
dgv CreditAttribution: dgv commentedThat test in 94-98-interdiff.txt looks wrong.
It assumes that when you get a BYTEA from the db, it must have the same *text representation* that when it got in. That expectation is not correct, because there are multiple text representations for BYTEA.
For exemple if 'a' is in a text field and that field gets converted to BYTEA.
Then the field will now contain a single byte, value 0x61 (hexa).
When selecting back that field it may come out as '\x61', or 'a', or '\141'.
All three are valid representations of a BYTEA whose value is 0x61.
If the Drupal API has the equivalent of pg_unescape_bytea
http://php.net/manual/en/function.pg-unescape-bytea.php
the test should apply this function to the field it selects from the database before comparing the result to the string it knows should be the result.
Comment #100
alexpott@dgv in the install tasks we set the bytea_output to be escape.
So funnily enough HEAD is work for varchar to bytea... what we actually have a problem with is going the other way...
Comment #101
alexpottAnd now for a green patch on postgres that proves we can go from varchar to bytea and back again and adds more tests into the mix too.
Yay for tests.
Comment #102
dawehnerJust some small points. I have no clue about pgsql.
Nitpick: protected or public function?
Nitpick: so you got the connection object, then you could use the ->query() method on it
Could this code be executed when the field is alraedy bytea but is maybe just renamed or something like that?
I don't see changes to KerneltestBaseTNG
acction => action
Comment #103
bzrudi71 CreditAttribution: bzrudi71 commentedComment #104
alexpottSo reading https://github.com/laravel/framework/issues/3669 and looking at the the postgres fail above i think we need to seriously reconsider the usage of bytea fields, and just use text instead. It seems that bytea and PHP serialization just don't play well.
Comment #105
alexpottWhat is weird though is that storing serialized data works fine in bytea with the patch in #101... the problem we're having is when you convert a column that contains serialized data. It's because PHP adds some chr(0) to serialized output that gets messed up when converting between types.
Comment #106
alexpottComment #107
alexpottWe've been around this bend before... #690746: Text column type doesn't reliably hold serialized variables
Comment #108
alexpottSo tldr; bytea definitely is the column type we should be using to store serialized PHP that contains objects - since the
text
will break them. If they don't contain object then it might work (as the patch shows) but it is bad practice. I don't think we need a CR or anything to tell module developers to store serialized PHP in blobs cause if you're and someone using Postgres uses you module it will be broken - therefore we don't have an upgrade path to worry about.re #102 - thanks @dawehner
if ($spec['pgsql_type'] != 'bytea') {
is checked just above.Comment #109
dawehnerSo we seem to have test failures here again ... :(
Comment #110
alexpottWell yay for postgres testing... this check is completely bogus... the structure of
$table_information->blob_fields
isfield_name => TRUE
.Comment #111
catchIf this is green it'd be interesting to re-revert the system.install changes to see if it's green with the original update that rediscovered this. Not sure we want to do that in the final patch though.
Comment #112
alexpott@catch the patch contains that revert already :) (I think)
Comment #113
catchha only looked at the interdiff.
So pwolanin was pretty adamant we should make that change regardless of this issue. I quite like it as implicit test coverage for this one though. To save having that discussion block this issue we could possibly commit this without that change (i.e. remove it from the patch), then open a follow-up? Do you remember if that hunk has specifically caught anything the other test coverage here didn't?
Comment #114
alexpottOk reverted the changes to system.install - I think I agree with @pwolanin since having a field which is supposed to contain serialised data have incorrect data in seems like a recipe for weird bugs.
I've added tests to ensure that the databases tasks are run when we expect them to be. Unfortunately I don't think we can test mysql since its tasks don't make any testable changes to the db. Same with sqlite.
Comment #115
dawehnerLet's do this now
Comment #116
alexpottGood point. Not bothering to test on postgres since this is just a comment change.
Comment #117
pwolanin CreditAttribution: pwolanin at Acquia commentedI don't understand why if standard conforming strings is ON do we still need the code to convert backslashes?
Comment #118
alexpott@pwolanin this because
bytea
is not string storage see http://www.postgresql.org/docs/9.4/static/datatype-binary.html.Comment #119
dawehnerWe have tests, the feedback from @pwolanin got addressed. We have good documentation
Comment #120
alexpottCreated #2564275: Check database requirements as part of the status report to report on wrong postgres settings on the system status report.
Comment #121
catchI feel like we need docs somewhere that say 'never ever store serialized data in a text column ever'.
Given this issue arose in the first place because 6.x did that and 7.x tried to convert it to bytea that seems more than likely people will keep doing it.
Should we explicitly point out MySQL doesn't do anything assertable?
Leaving RTBC those are both minor.
Comment #122
alexpottUpdated docs in patch and on https://www.drupal.org/node/159605 - there does not seem to be anywhere obvious in code to document this. Also the problem is somewhat mitigated by the fact that it is only serialisation of objects. The test added shows we're able to move stuff around that's serialising everything else.
Note the documentation:
From https://secure.php.net/manual/en/function.serialize.php
Comment #124
catchPosted #2564985: Add a serialized_data schema type.
I think we've done as much as we can here. This gets 8.x right for new sites, and while it's not fixing the original reported issue, it at least moves things a long way towards doing that for 6-7 - at least for those sites lucky enough to have standard_conforming_strings on and no modules storing objects in {variables}.
Committed/pushed to 8.0.x, thanks!
Moving back to 7.x for backport.
Comment #125
dgv CreditAttribution: dgv commentedConcerning #94:
Only if sql_mode has NO_BACKSLASH_ESCAPES SQL, which I expect is the minority in MySQL users.
PG's standard_conforming_strings to ON is actually the opposite of MySQL's default.
That's not a good setting for Drupal as much its users will test their db queries on MySQL only.
Comment #126
alexpott@dgv can you make a failing test to show the problems? I think the tests added by this patch prove we have consistent behaviour between mysql, postgres and mysql since all are able to store the text - retrieve it unaltered and convert the column to other things that can store text including bytea/blob.
Comment #127
dgv CreditAttribution: dgv commentedI don't have the environment and Drupal-fu to write a proper test in the testsuite but my point is that, under the regime of standard_conforming_strings to ON, a query like this:
SELECT length('\n')
More generally, any query containing backslashes in string literals is problematic.
Comment #128
alexpottInteresting critical followup #2565241: First test fails on postgres because of a stale connection.
@dgv thanks for this I will test.
Comment #129
alexpott@dgv opened #2565529: Postgres standard_conforming_strings - should it be on or off? to discuss and test your concerns.
Comment #131
LiceBaseAdmin CreditAttribution: LiceBaseAdmin commentedI find the provisioning of patches confusing, why are there drupal 8 patches in a d7 issue? Would someone like to clarify or re-write the summary?
Reading the summary, it is now extremely unclear, what the problem is, what it affects what are proper and improper resolutions.
In particular:
.
It is left unclear which one is the incorrect patch (assume the bytea.27.patch), and what the correct escaping of backslashes would be. Even the latest patch includes a
change wrt the escaping of backslashes!
What is the latest >working< solution for 7 then? Or does it mean there is none?
Comment #132
alexpott@LiceBaseAdmin the issue status is "Patch (to be ported)" which means that the issue has a patch for another version of Drupal that can be ported to the version specified in the issue. See #124 for where this happened.
Comment #133
LiceBaseAdmin CreditAttribution: LiceBaseAdmin commentedalexpott,
unfortunately #124 adds to the confusion, as it states the patch to be back ported doesn't fix the original problem (but something (?) else).
I understand that the patch for d8 needs to be backported, but I think you are confusing 'normal' users with the way the issues are treated here.
Comment #134
alexpott#124 states that the backport will have to do more work to resolve inconsistencies between sites with standard_conforming_strings either on or off because previsou version of drupal support lower versions of postgres and do not enforce a value for this setting. It is giving whoever works on the patch the idea that the solution for D7 is likely to be more complex.
I'm unsure of exactly what you mean here.
Comment #135
LiceBaseAdmin CreditAttribution: LiceBaseAdmin commentedI meant that it is very hard to grasp if and what should an admin running drupal 7 with postgres do, or on which ground to make that decision. And at least from my perspective as not being a core developer ('normal user') this is far from clear.
Some confounding factors, which I would just like to see clarified, and I would be more than happy to help with that:
Comment #136
alexpott@LiceBaseAdmin the bug is:
This is fixed in Drupal 8 and still a problem in previous versions of Drupal. If your Drupal site uses postgres and code tries to call
db_change_field()
and make a text field a binary field then you might get this error. If you site has not tried to do this then you will be unaffected by this bug.The patch in #122 added tests to
SchemaTest
- the first step in porting the patch would be to port the test. Doing this would allow someone to run the test on D7 against postgres to confirm how it behaves and the nature of the problem. #8 contains a very old patch for D7 if you merge the tests and that patch it might fix it. However there are complications to consider given that different postgres installs might have different settings forstandard_conforming_strings
.Comment #142
joseph.olstadMy client wants to prototype managing 200 Drupal 9 sites (in multisite mode) using Aegir (Aegir currently based on D7) on postgresql.
Patch 27 still rolls cleanly against HEAD of the 7.x branch and is pretty much identical to what went into D8.
It would be nice to see some automated tests so I will re-upload this patch.
Comment #143
stephencamilo CreditAttribution: stephencamilo as a volunteer commentedComment #144
poker10 CreditAttribution: poker10 at ActivIT s.r.o. commentedPlease do not change issues statuses without further comments.
Comment #145
quietone CreditAttribution: quietone at PreviousNext commentedNo longer relevant to D8, remove tag.