Problem/Motivation
The commit from #2532476: Menu links should use a TranslationWrapper to encapsulate safe translatable strings from YAML files caused regressions in the upgrade path as demonstrated by the PostgreSQL test bots results since 7/31/2015:
The exceptions all look like this:
unserialize(): Error at offset 61 of 64 bytes<pre class="backtrace">unserialize('O:48:"Drupal\Core\StringTranslation\TranslationWrapper":3:{s:9:"')
Drupal\system\Tests\Update\MenuTreeSerializationTitleTest->testUpdate()
Drupal\simpletest\TestBase->run(Array)
dawehner notes that this is due possibly to incorrect NULL byte handling during the data migration.
Proposed resolution
Rather than converting the existing fields, re-name them, create the blob fields, migrate the data into the serialized objects, and then drop the original fields.
Remaining tasks
Investigate the root cause
Update issue summary to fill out unknown sections.
Write a patch
Review patch
User interface changes
None.
API changes
None
Data model changes
None. However update function is changed.
Beta phase evaluation
Comments
Comment #2
mradcliffeI didn't get to look at this over the weekend like I mentioned in my comment in #2532476: Menu links should use a TranslationWrapper to encapsulate safe translatable strings from YAML files .
Comment #3
andypostlooks like entity schema updates are not applied properly, so conversion from varchar to blob could be broken
Comment #4
andypostJust tested manually on "9.4.4" test fails
Comment #5
mradcliffeAlso could be related to #2549045: Update Tests can't run with --die-on-fail.
Comment #6
mradcliffeYeah, this is more related to #2542748: Automatic entity updates can fail when there is existing content, leaving the site's schema in an unpredictable state which was filed on the same day as the exceptions started showing up on test bots. Corroborated by a manual test run in simpletest. Used the test class that @neclimdul was using today:
Comment #7
neclimdulYes the --die-on-fail has to do with a weird hack to simpletest that doesn't actually work with --die-on-fail. Sadly not related to this.
Comment #8
andypostI remember the same issue was in d7 when we converted cache tables to blobs, "bytea" has issues with convertion
http://www.postgresql.org/docs/9.1/static/datatype-binary.html
EDIT http://www.postgresql.org/docs/9.1/static/ddl-alter.html#AEN2793
Comment #9
mradcliffeFrom Issue #2532476, #27, @epong mentions
Comment #10
pwolanin commentedSo it seems like there is a more fundamental bug in the pgsql driver in terms of schema changes? or is there some other step we need to take?
Comment #11
mradcliffeThere is special handling in the driver for cases of bytea conversions, but it looks like the Test is not ever getting to the update.
I finally had a chance to sit down and run the test. The test uses the database dump, but then the base class does a rebuild of cache, and that causes the menu serialization to occur on a database that is not ready to do so. And so the update never gets run.
drupal_flush_all_caches()
Drupal\simpletest\WebTestBase->resetAll()
Drupal\simpletest\WebTestBase->rebuildAll()
Drupal\system\Tests\Update\UpdatePathTestBase->rebuildAll()
Drupal\system\Tests\Update\UpdatePathTestBase->setUp()
Drupal\system\Tests\Update\MenuTreeSerializationTitleTest->setUp()
Comment #12
mradcliffeModifying the die-on-fail settings to be less specific helps, but it looks like something still isn't going right. The test still fatal errors, but after the update and for the same unserialize() issue. I did not notice anything in the query logs.
The update was successful and title and description both have bytea as the type (blob:big = bytea).
(to needs review and then back to needs work)
Comment #13
mradcliffeOne oddity I've noticed about update tests is that they run every single test. This makes the test suite more expensive as core adds more database updates.
I haven't found anything that could fix this issue. The test case needs to clear cache, and by that time it's too late, the new code is going to bork because the database isn't ready to handle storing serialized class instance in the database table.
So why is MySQL test bot passing? Why isn't the test suite running MySQL running into the same error when it tries to rebuild menu? Does the test suite running MySQL even try to do the same thing? Why not?
Comment #14
catchComment #15
catchSince we agreed to support postgres for 8.0.0, and this is a critical postgres failure, bumping to critical.
However one way to 'fix' this would be to change the database dump to be based on beta14 instead of beta12.
We might have to add some extra test coverage for entity updates, but then this particular update would never need to run and the problem goes away.
Comment #16
catch@mradcliffe if the cache clear happens after the updates have run, then why would it be failing?
We shouldn't be clearing caches before updates have run at all.
Comment #17
mradcliffeThe base test class does a cache rebuild in setUp. I need to spin up a mysql environment and see why it's not failing there.
Comment #18
catchOh I see https://api.drupal.org/api/drupal/core%21modules%21system%21src%21Tests%...
Opened #2558247: Remove rebuildAll() and module install from UpdatePathTestBase.
Comment #19
webchickJust to let peoples' heart rate down a touch. ;)
Comment #20
andypostcalled to re-test pg after #2542748: Automatic entity updates can fail when there is existing content, leaving the site's schema in an unpredictable state
Comment #21
catchPostponing on #2558247: Remove rebuildAll() and module install from UpdatePathTestBase, turns out we were hiding this error in the update path tests with a hack to literally remove the errors from the assertion checking, but in a way that only works for MySQL. That's not really a postgres issue at all.
If #2558247: Remove rebuildAll() and module install from UpdatePathTestBase gets done there might be nothing to do here.
Comment #22
xjmDiscussed with @catch, @webchick, @alexpott, and @effulgentsia. If #2558247: Remove rebuildAll() and module install from UpdatePathTestBase resolves all of this, we can close it as a duplicate. If it doesn't, we should evaluate the criticality of any remaining issues for postgres at that time (so marking for later discussion).
Comment #23
berdirSo, #2558247: Remove rebuildAll() and module install from UpdatePathTestBase now has a green patch, I just did a postgresql test run there: https://www.drupal.org/pift-ci-job/24511
So, that seems to fix about 50% of the failing tests, but there are still a few that have those notices in menu storage. can someone with a working postgresql environment test that and post a full backtrace of those notices and when they happen exactly?
Comment #24
berdirOk, looks like that issue does not fix this.
But we assume that the reason is quite different now. It is likely that the update functions aren't running through and then we get those errors after an imcomplete update.php run.
So, still unpostponed but we can then look into those real problems once that other issue is in.
Comment #25
dawehnerFor me those tests work locally (I mean they just produce notices):
Comment #26
dawehnerSome more interesting result: These are the entries which are stored in the database after the menu rebuild:
For some reasons though, these null bytes cause issues in pgsql, see https://php.net/manual/de/function.serialize.php#96504 as one example
People seem to suggest to use something like https://secure.php.net/manual/en/function.pg-escape-bytea.php#89036
Do we need to care about that in our DB layer?
Comment #27
berdir@dawehner: I'm sure we serialize objects with protected properties already, e.g. in the cache tables? Also, shouldn't all tests be failing with those if it would be a more generic problem?
Comment #28
dawehnerWell sure, I hope that the trait does not make sense worse ... but well you see this unserialize call which is triggered as part the drupal_flush_caches() at the end of update.php
At that point the entries should be fine already. Yeah in other words no clue.
Here is a try to fix some of the test failures which are caused by the fact that we don't convert some exceptions properly
Comment #29
pwolanin commentedLooking at the PDO docs, is seems the way we cal execute, all params are just treated as strings:
https://secure.php.net/manual/en/pdostatement.execute.php
Would it make sense so have a way to to bind the values so we can specify that it's binary?
https://secure.php.net/manual/en/pdostatement.bindvalue.php
Comment #30
pwolanin commentedNever mind, looks like we are already doing that in the postgres driver.
Could this relate to caching of the table schema? I see in the postgres driver we are checking the schema info to decide which fields are blob to handle them correctly using bindParam() with type \PDO::PARAM_LOB.
Comment #31
alexpott#2558247: Remove rebuildAll() and module install from UpdatePathTestBase has landed
Comment #32
catchRunning #28 past the bot.
Comment #33
catchIf #28 comes back green, let's split it off as a quick fix. The less postgres fails we have, the less easy it is to introduce new ones at least.
Comment #34
dawehnerJust ran the other tests:
And well of course, those tests work fine on my local pgsql installation, :(
Comment #35
catchWell #28 fixes half of the fails vs. HEAD so still worth getting in on its own?
Comment #36
webchickCommitted and pushed #28, setting back to active for the rest.
Comment #38
dawehnerJust an alternative idea for an solution:
Don't store the translation wrapper but rather an array:
['title' => 'meh', 'context' => 't_context', 'translatable' => TRUE]and special case some behaviour.Comment #39
dawehnerHere is a patch just to run some pgsql test.
Comment #40
dawehnerjust some ordinary tagging
Comment #41
pwolanin commentedOk, here's a possible approach now that the early rebuild is removed. Seems to fix the bug for me in a local install with pgsql - possibly it just can't work there to convert UTF-8 varchar to bytea and then try to read it back and serialize it.
Comment #42
pwolanin commentednote that this is similar to an approach dawehner suggested originally of creating columns with new names, but the renaming only lasts during the update function so we don't have to change any code.
Comment #43
catch#41 looks good - both the patch itself and the test results.
All the other test failures are unrelated to this one - we'll need a new postgres critical to track them though.
Comment #44
dawehnerWe are removing test failures, this is great!
Comment #45
catchNot sure where, but it'd be good to document this workaround centrally for other upgrade paths to emulate.
Is there any way we could make the postgres driver handle some of this?
I'm thinking if changeField() did the following:
- rename old field
- create new field
- copy old field data to new field with an INSERT INTO ..SELECT FROM.
in this case we'd have to then take the copied data and update it again to the correct format.
CNR for that question, if the answer is 'no' then I think this is right.
Comment #46
catchAlso since this requires changing the update that runs for MySQL too, bumping to beta target.
Comment #47
catchINSERT INTO ... SELECT FROM won't work if the field is NOT NULL or has a default, since then it'll have data as soon as it's created.
I did find this though: http://www.postgresql.org/message-id/25980.1358264400@sss.pgh.pa.us
That seems worth looking into.
Comment #48
pwolanin commented@catch - this update was particularly problematic since we wanted to take the existing data and serialize it inside the TranslationWrapper and re-use the field name. That's not a very common pattern for updates.
the postgres driver already has some special handling for bytea fields, but possibly this would be a better general strategy. I don't think it's in scope for this issue, and it's unclear if it would have prevented the problem.
Let's make a follow-up for that: #2560691: Consider more robust alternatives for PostgreSQL when changing fields to a blob type
Comment #49
pwolanin commentedComment #50
pwolanin commentedComment #51
catchDo we know that this is the actual problem though?
Or is it that the data is corrupted when we do the conversion of the type, then that corruption persists through the actual data update? If it's that, then just fixing the initial type conversion would fix this - and that's data loss just from a type change, not from the extra conversion.
Comment #52
catchI looked to see where our bytea special casing was added. It does the exact opposite of what everyone on the internet says you should do with a bytea column (i.e. cast to it).
I get commit hash 9166c436a054187feb29dbf89f9100dbd60c220b
Which was #2443681: PostgreSQL: Fix user\Tests\UserAccountLinksTest, which never had a patch with that code in.
Trying to just nuke that to see what happens.
Comment #53
pwolanin commentedI don't think that's right. The current code was added in #1031122: postgres changeField() is unable to convert to bytea column type correctly specifically to avoid this problem
Comment #54
catchThis is now a re-roll of #1031122: postgres changeField() is unable to convert to bytea column type correctly.
We committed a bad patch, then neither reverted nor fixed it.
Comment #55
pwolanin commentedI sill think that's wrong since standard_conforming_strings in ON by default, but we are adding quoting as if it is off.
Guess let's see what the result is of the test, but I can also try it locally.
Comment #56
catchUploading stefan_r's patch from https://www.drupal.org/node/1031122#comment-8836081
If one of these passes, we should probably bump that issue back to critical and mark this one as duplicate.
Comment #57
pwolanin commentedPatch #56 fails for me locally.
Comment #58
pwolanin commentedLet's see if this combination of parts is better.
Comment #59
catchNice one! 25 fails down from 60+.
Going to do what I said in #56 and mark this duplicate of #1031122: postgres changeField() is unable to convert to bytea column type correctly. See you over there.
Comment #60
pwolanin commentedre-opening per catch to just fix the update function.
Just a typo fix in code comment and whitespace fix compared to #41.
Comment #61
dawehnerThis issue has been RTBC already
Comment #62
catchSo yes I think #1031122: postgres changeField() is unable to convert to bytea column type correctly is a 'critical postgresql' issue that this rediscovered.
However there are good arguments for doing the update like this regardless of that issue.
Once we commit this (regardless of which issue it goes in for), there is no implicit test coverage for the bytea issue, and that issue also needs more time to sort out the proper fix.
So I think it makes sense to commit this now - to get the total number of postgres test faiures down by over half to around 25. I've just split out the remaining failures to dedicated issues and doubled the criticals list - most of these are new test coverage added by other critical issues afaict. The less, and more consistent, postgres fails we have, the harder it is to introduce new ones because we have at least a chance of spotting them.
Will just let the bot run to get final numbers on remaining test fails, then I'll commit this one.
Comment #63
catchI only uploaded other people's patches here, so I think I'm good to commit this.
Committed/pushed to 8.0.x, thanks!