The "Simpletest functionality" test now fails of PostgreSQL. This was triggered by #986992: Insane etid / {field_config_entity_type} abstraction changing the structure of the field tables, but is actually a different problem.
The problem we have is that Simpletest prefixes are long, and are added to an existing prefix. When the "Simpletest functionality" test creates a testing environment inside another testing environment, we end up with insane table names like this:
simpletest88284simpletest467931field_revision_comment_body
The problem is that index names are global in PostgreSQL and as a consequence we prefix them with the table name, ending up with a name like: simpletest88284simpletest467931field_revision_comment_body_entity_id_idx. Because all identifiers are limited to 63 characters in PostgreSQL, we end up with duplicate index names and errors like this one:
PDOException: SQLSTATE[42P07]: Duplicate table: 7 ERROR: relation "simpletest88284simpletest467931field_revision_comment_body_enti" already exists: CREATE INDEX "simpletest88284simpletest467931field_revision_comment_body_entity_id_idx" ON {field_revision_comment_body} ("entity_id"); Array ( ) in db_create_table() (line 2588 of /var/lib/drupaltestbot/pgsql/includes/database/database.inc).
There is no good solution for that, so I suggest we just reduce the length of the Simpletest prefix.
| Comment | File | Size | Author |
|---|---|---|---|
| #98 | 998898-98.patch | 10.4 KB | mcdruid |
| #98 | interdiff-998898-95-98.txt | 418 bytes | mcdruid |
Comments
Comment #1
damien tournoud commentedHere.
Comment #2
damien tournoud commentedNot nearly enough.
Comment #3
damien tournoud commentedActually, it might be easier to just do this.
Comment #4
damien tournoud commentedSome of the schema functions were not using the API correctly...
Comment #5
Stevel commentedsha1 returns a 40-character hash, so the substr is not needed here.
We also need a kind of upgrade path here. tableObjectName would be the place for this I think. Proposed approach:
Check if a constraint / index exists with the old naming convention.
If the constraint / index exists, return the name of the existing constraint / index
If it doesn't, return the name using the new naming method.
Comment #6
damien tournoud commentedComment #7
chx commentedWe have a GCI submission here
Comment #8
chx commentedBack to #3 / #4 this approach only touches the surface and while it does make the test pass, field API can very easily lead to too long identifiers.
Comment #9
Stevel commentedI implemented the proposed idea in #5. This keeps at least the upgrade tests happy. Schema API tests are failing because of indexes not being renamed now. (renameTable checks for the index name based on table name).
Comment #10
chalet16 commentedIgnore this, above patch already fix that
----
Patch to make "Comment fields" test passed on PostgreSQL
Comment #11
chalet16 commentedComment #12
josh waihi commentedThe 63 character limit is configured in PostgreSQL when PostgreSQL is compiled. It can be changed if you build PostgreSQL from source, but distributions like Debian and Ubuntu set 63 (default) as the character limit. I've encountered this issue with Views SQL aliases as Views uses extremely long aliases to prevent field conflicts.
As its simpletest that is breaking and currently not the upgrade path, could we not just rename the simpletest prefix to stxxxxx?
Comment #13
josh waihi commented#9: 998898-postgresql-identifiers-update.patch queued for re-testing.
Comment #14
josh waihi commentedsimpletest88284simpletest467931 is a ridiculous prefix. If we changed that to st88284st467931 for starters, then the problem will disappear from core. However, this is a known issue with Views aliases (#571548: Identifiers longer than 63 characters are truncated, causing Views to break on Postgres) but could be fixed with a proposed patch to hash field aliases.
And while PostgreSQL is limited to 63 characters for identifiers, according to docs, MySQL is only one character better (http://dev.mysql.com/doc/refman/5.1/en/identifiers.html) at 64 characters with the exception of aliases which is why Views gets away long alias names.
So it only makes sense then, to do something globally about the identifier length. My suggested approach is set a default length of 64 characters and then alter where Drupal can query the database and discover the correct maximum length.
Attached is the patch that validates an identifier and throws an Exception if the length is too long. It currently doesn't apply to field aliases.
I've also shortened the simpletest prefix which will fix this issue with core.
Comment #15
damien tournoud commentedPlease. Can we please not derail this issue?
The problem here is not the size of the identifiers actually specified in the schema (table names), it is the size of derived identifiers (index and constraint names). Throwing an exception at the user will not fix anything.
Comment #17
josh waihi commented#14: 998898-restrict-identifiers.patch queued for re-testing.
Comment #18
josh waihi commentedcurrently (to my knowledge) there are no restrictions to fields about how long a field value can be which correlates to sooner or later, identifier lengths will be exceeded. Maybe this is another issue, but IMO, the database layer should preserve this by throwing an exception to the module executing the query where is can deal with the issue appropriately.
I'm not suggesting we throw an exception to the user but catch it and deal with is accordingly.
Comment #20
steven jones commentedThere's an issue to generally change how indexes names are generated on PostgreSQL: #1008128: Do not use a single underscore as table and index separator on PostgreSQL and SQLite maybe work on this issue should continue there?
Comment #21
Stevel commentedActually, this is waiting for #1010188: Add support for database-driver updates. Both issues address different bugs (length vs namespacing), but they do touch the same codebase.
Comment #22
basic commentedPostgreSQL will create index names automatically if none are given, is there good reason not to investigate going this route?
From http://www.postgresql.org/docs/current/static/sql-createindex.html:
Comment #23
bzrudi71 commentedOkay, here is the port to D8. This patch combines some ideas from all patches regarding 63 chars limit in PostgreSQL from the issues queue. I did extensive testing to verify that indexes, constraints and primary keys are created and deleted as expected. I used the Fields and Comment test for that as they are affected by the limit. With this patch not a single test is failing because of to long identifiers. Also I ran the Database tests themselves with no problems.
Comment #24
bzrudi71 commentedAdded parent issue
Comment #25
liam morlandRelated: #571548: Identifiers longer than 63 characters are truncated, causing Views to break on Postgres
Comment #26
basic commentedComment #27
stefan.r commentedBackport of #23 to D7
Comment #28
bzrudi71 commented@stefan.r Thanks for the back port. Guess for the D8 patch we should do a small update using \Drupal\Component\Utility\Crypt; instead of md5 directly just like the way #2224847: Automatically shorten cid's in Cache\DatabaseBackend and PhpBackend does. Also this issue needs to be set back to D8 as the D8 a version needs to get in before the D7 one.
Comment #29
stefan.r commented@bzrudi71 yes that was just for the testbot.
But I don't think it runs Postgres tests properly anyway, does it? Does it look like we'll be able to run those for 7.x any time soon do you know?
Comment #30
bzrudi71 commented@steafn.r Sadly there is no (working) PG testbot this time so patch just passes MySQL bot ;-) PG bot is broken since ages, but at least for D8 we need a working PG bot environment for the beta, or at least for the RC's. So I don't give up hope :-)
FYI: Results from D7 PG environment and here for the D8 one.
Comment #31
stefan.r commented@bzrudi71 it fails 74 tests so if it's the same 74 tests as any other patch at least it doesn't break any further tests :).
As to the hash, it looks like we can just use
drupal_hash_base64()(for D7) andCrypt::hashBase64()(for D8) instead ofmd5(), it has 11 more characters but we should still stay well within the 63 characters with that.Apparently we shouldn't use md5() anyway considering comments in #2268875: [Policy, no patch] Using md5()/sha1()/crc32b in Drupal code :)
Comment #32
stefan.r commentedUpdated D8 patch
Comment #33
stefan.r commentedUpdated D7 patch
Comment #34
stefan.r commentedSetting issue back to D8
Comment #35
bzrudi71 commented@stefan.r Perfect! Thanks a lot for the updates :-) Hopefully we can see the results by PG testbot soon and then fix even more issues.
FYI: See #697220: Run tests on all supported database servers for testbot issues
Comment #36
cdracars commentedWhen attempting to do a re-roll of the patch there was a merge conflict.
Comment #37
stefan.r commented@cdracars it should apply fine on a clean install. Are you also using another patch which modifies the same code?
Comment #38
cmonnow commentedThanks for the work. This one is long overdue.
I suppose any outstanding PostgreSQL issues should intertwine since there could be residual conflicts unless agreements on best practice are made. For example, the use of uppercase letters in field names causing index creation errors in PostgreSQL schema due to double quotes in the _createKeySql($fields) function (https://www.drupal.org/node/1600670).
Comment #39
mradcliffeThis method will replace characters with hyphens, which is not allowed in PostgreSQL (syntax error thrown).
When running the pgsql test bot with this patch jaredsmith and I discovered this issue.
nitpick: else on new line
Edit: we tested on PostgreSQL 8.3.
Comment #40
mradcliffe- Created a separate method on the Schema class to do the hashing without being url-encoded per PostgreSQL documentation.
- Fixed coding standard issue.
- Added back method to default Schema class, which I'm not sure why it was removed and does not seem to have an affect on testing. i was able to pass FieldStorageCrudTest with this patch.
Comment #41
mradcliffeCrell mentioned that we do not need to use a security-based hashing for database stuff. He things sha1 should be fine.
Comment #42
mradcliffeSwitching to sha1 for hashing per comments.
Also removed Crypt from being used by class.
Comment #43
stefan.r commentedDoes this fix #1008128: Do not use a single underscore as table and index separator on PostgreSQL and SQLite as well?
Comment #44
stefan.r commentedSo I reviewed the previous patch and noticed it still uses single underscores, just wondering if this change would be enough to fix #1008128: Do not use a single underscore as table and index separator on PostgreSQL and SQLite as well?
Comment #45
mradcliffeI ran the "field" testgroup on this patch which is a way to reproduce the issue.
Attached are results with patch and without patch. As you can see this patch helps to uncover more bugs in Drupal :-)
Comment #46
mradcliffeI created #2350791: Link, Boolean, Email field tests attempt to query integer entity id with strings such as "manage" or "structure" as a follow-up because fixing this exposes that, which we've seen before.
Comment #47
mradcliffeViews references this issue in core/modules/views/src/Plugin/views/query/Sql.php wrt to aliases.
We should handle aliases as well.
Comment #48
stefan.r commentedWouldn't it be enough to do like they did in views and just truncate the alias to 60 characters in the addField() method in the query builder for SELECT statements?
If a solution would be more complicated than that, the alias thing should probably be in a separate issue as it isn't breaking any core tests (as opposed to this issue).
Comment #49
mradcliffeAh, that makes sense.
Comment #50
mradcliffeSo, uh, pwolanin just mentioned that we definitely should not use sha1 and there are no performance issues with using sha256. :-)
Comment #51
cmonnow commentedYeah, but sha256 hashes are already 64 characters long.
On another note, while extremely unlikely with non-random or "structured" data like we have, I suppose there still exists the extremely small chance that one unlucky drupaller will create a field name that causes its sha-1 hashes to clash with another.
Comment #52
stefan.r commentedSo I guess we can put back that call to Crypt::hashBase64, which is a base64 encoded raw binary sha256 hash (44 characters). As to the chance of any sha256 collisions, afaik it is tiny enough for us to be able to ignore it :)
Comment #53
stefan.r commentedI looked into the alias problem mentioned in #47 and as it has already been fixed in Views, if we want to fix this in the postgreSQL driver itself, we should probably do so in a separate issue (as merely truncating the alias is not enough). It's also not as pressing as it's not breaking anything yet :)
Revised patch with sha256 hash attached.
Comment #54
stefan.r commentedComment #55
bzrudi71 commentedLooks good to me and because of #39 this seems the right way to go :-)
Comment #56
stefan.r commentedI had accidently taken out the fix for #1008128: Do not use a single underscore as table and index separator on PostgreSQL and SQLite mentioned in #44, here it is again.
Comment #57
bzrudi71 commentedDocumentation nitpick:
We are back to sha256 :-)
Comment #58
stefan.r commentedComment #59
jhedstromTested this manually with Postgres 9.3.5, and also ran the
Drupal\SimpleTestsuite. Not sure if there is anything remaining to be done here. RTBC I think.Comment #60
bzrudi71 commentedPatch applies cleanly and local tests show it works. All changes are within PG driver only, so we won't break anything related to mysql. Lets's not hold it back for another 4 years. RTBC
Comment #61
webchickGreat, thanks for the work on this. Agreed that now this is restricted to just pgsql the risk in committing this is relatively low.
Committed and pushed to 8.x. Thanks!
Comment #63
stefan.r commentedThanks! This should probably be in D7 as well?
Comment #64
kpv commented+ * table/relation names, indexes, primary keys, and constraints. So we map all to long
typo: "too long"
Comment #65
webchickThat's also really awkward English. Fixed as:
And just committed that to 8.0.x directly. Will need to be factored into the 7.x fix.
Comment #67
agerard commentedReally appreciate all the work that has gone into this so far! Just curious, since I may need to migrate some Drupal 7 sites from MySQL to PostgreSQL shortly - was this ever ported back or rolled into D7? If not, any prospect of that happening?
Comment #68
milos.kroulik commentedAnyone tested patch in #33? It seems to be likely candidate for D7 patch.
Comment #69
stefan.r commented#33 is very close to what a backport should look like, it just needs to still incorporate some of the changes from #58.
Comment #70
mradcliffeThere was also a change made to this code in #2443659: PostgreSQL: Fix system\Tests\Entity\FieldSqlStorageTest, which would be useful. But that could also be made in a back port of #1600670: Cannot query Postgres database that has column names with capital letters.
Comment #72
rosk0It's great that this is fixed for PostgreSQL but MySQL has the same issues http://dev.mysql.com/doc/refman/5.7/en/identifiers.html. So should we create another issue for MySQL or reopen this for 8.*?
Comment #75
sylus commentedAttaching an updated patch + interdiff for D7 taking into account the comments since D7 patch posted. Since we kept the sha256 only remaining fixes I think that were neeeded related to comments #39 + #44:
#998898-39: Make sure that the identifiers are not more the 63 characters on PostgreSQL
#998898-44: Make sure that the identifiers are not more the 63 characters on PostgreSQL
Setting to needs review ^_^
Comment #77
rosk0Patch working and fixing identifiers length issue.
Attaching patch from #75 with minor code style fixes.
Comment #80
stefan.r commentedSince I worked on this probably someone else should commit this
Comment #81
joseph.olstadStefan.r, please commit, u are expert and postgresql db layer needs some tlc and it sounds like you're well aware .
Noticed this as I work on :
#2615496: A serial/primary key field can not be added to an existing table for some databases
SVP merci
Comment #82
David_Rothstein commentedLooks like a nice fix to me overall.
Why is this patch removing a public function, especially one that is accessible outside the PostgreSQL driver? If it's really not needed anymore (for any database driver) it could be marked deprecated or something like that. But it shouldn't be removed it in a stable release.
Note that this function still exists (and seems to even still be used) in Drupal 8.
I do not understand how this works. The code in the ensureIdentifiersLength() method specifically references the third argument to the function (via
$args[2]after setting$args = func_get_args()) but these examples only pass two arguments in.This is still missing the fix from #64/#65 above.
The documentation of the $identifier parameter doesn't match the parameter name. Also, it's confusing because it refers to multiple arguments. I think what this actually needs to do is document $identifier on its own, then separately document the extra arguments. https://api.drupal.org/api/drupal/includes%21module.inc/function/module_... is a good example of how to do that.
Also as described above it seems like the third argument to this function is effectively required (and also has specific intended values 'idx', 'key' or 'pkey' as documented earlier in the documentation) so that should be mentioned here too.
This applies to Drupal 8 too - could be a separate issue I guess.
Comment #83
grub3 commentedNice patch.
This is not a PostgreSQL only issue, all standard databases are impacted.
According to SQL99 standards, names should be 18 characters long maximum.
But this is a rather old standard and I wonder if other databases request so low values.
https://mariadb.com/kb/en/sql-99/naming-rules/#26-try-to-stop-a-name-at-...
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2...
So maybe we should truncate $this->hashBase64($identifierName) to an "acceptable" lower value than 63 characters:
$saveIdentifier = 'drupal_' . $this->hashBase64($identifierName) . '_' . $args[2];A maximum length of 8 characters for $this->hashBase64($identifierName) should be enough.
$saveIdentifier = 'drupal_' . substr($this->hashBase64($identifierName),0,8) . '_' . $args[2];And it will be easier to read.
Comment #84
dbiscalchin commentedAfter I applied the patch #77, db_drop_unique_key() stopped working.
Comment #85
pwolanin commentedWas the committed patch correct? From https://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html
it seems unquoted identifiers are folded to lower case, and hence you could have collisions. If quoted, you don't need to remove + and /, and actually by mapping those both to _ leads also a possible collision here.
Comment #86
mradcliffeYes, you're right, @pwolanin. There could be some collisions regardless. The original code had the same issue because the driver isn't consistent with quoting identifiers:
The result being (with or without the reverting that commit directly using DBTNG):
Probably need a new issue regarding this for Drupal 8.
Aside: Also should we create a follow-up issue since this issue was created before the new policy on creating follow-up backport issues?
Comment #87
mradcliffeLooks like we fixed it in #2443659: PostgreSQL: Fix system\Tests\Entity\FieldSqlStorageTest. I should have checked the actual code base instead of just the patch.
Comment #88
joseph.olstadwhat is the current status of postgresql for D7?
anyone have a list of patches they recommend for Postgresql?
I have a client interested in running Aegir to manage 200 D9 multisites using postgresql and Aegir is based on D7.
Comment #89
poker10 commentedI have updated the patch from #77 to match as closely as possible the current Drupal 9.4.x branch. This code is used for 7 years in D8/D9 (from the first commit to the D8 branch) and it seems like that without any issues.
I have made some small changes to this patch to reflect current naming convention in D7 so that if we have primary keys named like "table_pkey" it will still be "table_pkey" after this change and not "table__pkey" (and the same for "table_name_type" vs "table__name__type"). See:
As per #82 (1) I have removed deletion of the unused function, #82 (2) the ensureIdentifiersLength() is now practically the same as in D9, #82 (3, 4) functions comments now match D9.
Another changes I have to made was to adjust usage of quotes with identifiers. There are still some inconsistencies in D9 and D7 codebase potentialy causing issues in PostgreSQL because identifiers without quotes are always folded to lowercase. We have in D9:
So I have to remove quotes in the new identifier name generated when the maxIdentifierLength is exceeded and then was able to keep quotes/no quotes in the respective functions calling ensureIdentifiersLength() as it was before - for the backwards compatibility. This is also related with the change in hashBase64() function, where strtolower needs to be added to workaround these exceptions (until there is some fix available for this behavior):
Caused by:
However I think that fixing all inconsistencies in uppercase/lowercase letters in identifiers is out of scope of this issue and that it should be a separate issue for D7 (or we need to extend this issue: #3262341: [D7] Database test table TEST_UPPERCASE causes PostgreSQL tests to fail to include also indentifiers, not only table names).
When I have applied this new patch + temporarily (as there is no patch yet) removed TEST_UPPERCASE tests (problems caused by #3262341: [D7] Database test table TEST_UPPERCASE causes PostgreSQL tests to fail), the PostgreSQL tests seems to run again (without un-catched or new exceptions). I hope there are still some sites running PostgreSQL able to test a review this patch. It will be great, because this issue is still causing a lot of (un-catched) Simpletest exceptions in D7 and thus tests do not run properly at the moment.
Comment #90
liam morlandIs there a single or small list of tests I can run to test this patch? The whole test suite takes hours on my computer.
Comment #91
joseph.olstadaside from the duplicate table issue, this latest patch looks actually pretty good
[0mPDOException: SQLSTATE[42P07]: Duplicate table: 7 ERROR: relation "simpletest783806test_uppercase" already exists in /var/www/html/includes/database/database.inc:2284postgres db driver has some issue with upper/lowercase , ignoring case it looks like? not sure
Comment #92
liam morlandThe case issue is explained in #3262341: [D7] Database test table TEST_UPPERCASE causes PostgreSQL tests to fail.
Comment #93
poker10 commented@Liam Morland Primarily I have tried tests from "Upgrade path", because this is the place where these exceptions are currently thrown. These tests will create pre-populated database with fields and long table name with indexes. And you can also run all "Database" tests, because this changes affects primarily database operations. But I think that the majority of tests contains database table operations (such as create, delete, rename), because Simpletest uses a lot of temporary tables and data for testing.
The current exceptions in testbot looks like this (these are the ones related with this issue):
These are gone for me after applying the patch.
Comment #94
mcdruid commentedIf we're removing the quoting around the identifier (per #89) we don't need the empty string after
$taghere.nit: cSpell comments are unnecessary in D7.
The extra logic for
pkeyisn't in D9. Why does D7 need this?Comment #95
mcdruid commentedRemoved that trailing empty string, and tweaked comments a little.
#94.3 is the main thing I'd like to address before saying I'm happy with this patch.
Comment #96
andypostnitpick - the first line should be one-liner
Comment #97
poker10 commentedThis part is here to keep the current naming convention for primary keys in D7.
Primary keys in D9 are created with this pattern:
[table_name]__[]__[pkey]And the the preg_match
preg_match('/^' . preg_quote($old_full_name) . '__(.*)__' . preg_quote($index_type) . '/', $index->indexname, $matches);then extracts the
$index_namefrom$matches[1]In D7 the current pattern for primary keys is:
[table_name]_[pkey](and not[table_name]_[]_[pkey])To keep that naming convention the same I had two options - modify
preg_match('/^' . preg_quote($old_full_name) . '_(.*)_' . preg_quote($index_type) . '/', $index->indexname, $matches);because if unchanged, this will return empty
$matcheson existing primary keys names likenode_pkey, or add this additional condition to handle primary keys separately.For this same reason this additional statement was also added (in contrast with D9):
This will assure that
ensureIdentifiersLength()will create primay keys like this[table_name]_[pkey](and not like this[table_name]_[]_[pkey]).Comment #98
mcdruid commented#96 you're right @andypost but this is copy-pasted from D9. Should be fixed there and backported :)
#97 thanks for the explanation which makes sense. I've made another little tweak to that comment, but I'm generally pretty happy with this patch.
I'll ask @Fabianx to review.
Comment #99
fabianx commentedUsually I would suggest to take the prefix and just add the hash to use the full length available (minus the drupal_).
But as this only is for constraints and indexes and MOSTLY concerns testbot in real world usage, I think it's fine to leave as is.
---
RTBC, but one question: I though the Postgres Driver already cut down to 63 chars for table names, field names, etc.
Are we sure we cannot re-use any of that code?
Comment #101
mcdruid commentedFabianx I couldn't find the existing code you're referring to?
Thanks everyone - this is a big step in the direction of getting tests running again for PostgreSQL.
Comment #102
poker10 commented@Fabianx I am not aware that there is a code in Drupal PostgreSQL driver doing this currently. This is done automatically but on the database layer (outside of Drupal scope). And because of this, the problem arises when the real index name (in database) does not match Drupal 7 constructed name - as all identifiers longer than 63 characters were shortened by PostgreSQL database when created.
The patch is/was mostly the backport of the D8/D9 fix (commited 7 years ago).