Problem/Motivation

The commit from #2532476: Menu links should use a TranslationWrapper to encapsulate safe translatable strings from YAML files caused regressions in the upgrade path as demonstrated by the PostgreSQL test bots results since 7/31/2015:

The exceptions all look like this:

unserialize(): Error at offset 61 of 64 bytes<pre class="backtrace">unserialize('O:48:"Drupal\Core\StringTranslation\TranslationWrapper":3:{s:9:"')
Drupal\system\Tests\Update\MenuTreeSerializationTitleTest->testUpdate()
Drupal\simpletest\TestBase->run(Array)

dawehner notes that this is due possibly to incorrect NULL byte handling during the data migration.

Proposed resolution

Rather than converting the existing fields, re-name them, create the blob fields, migrate the data into the serialized objects, and then drop the original fields.

Remaining tasks

  • Investigate the root cause
  • Update issue summary to fill out unknown sections.
  • Write a patch
  • Review patch

User interface changes

None.

API changes

None

Data model changes

None. However update function is changed.

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Bug because of PHP errors
Issue priority Major because PostgreSQL test failures
Prioritized changes Main goal is to fix regressions and broken head.
Disruption not disruptive

Comments

mradcliffe created an issue. See original summary.

mradcliffe’s picture

I didn't get to look at this over the weekend like I mentioned in my comment in #2532476: Menu links should use a TranslationWrapper to encapsulate safe translatable strings from YAML files .

andypost’s picture

looks like entity schema updates are not applied properly, so conversion from varchar to blob could be broken

andypost’s picture

mradcliffe’s picture

mradcliffe’s picture

Yeah, this is more related to #2542748: Automatic entity updates can fail when there is existing content, leaving the site's schema in an unpredictable state which was filed on the same day as the exceptions started showing up on test bots. Corroborated by a manual test run in simpletest. Used the test class that @neclimdul was using today:

Drupal\Core\Entity\EntityStorageException: SQLSTATE[42703]: Undefined column: 7 ERROR: column "revision_translation_affected" of relation "simpletest425427node_field_data" does not exist LINE 1: ..., uid, status, created, changed, promote, sticky, revision_t... ^: INSERT INTO simpletest425427node_field_data (nid, vid, type, langcode, title, uid, status, created, changed, promote, sticky, revision_translation_affected, default_langcode) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9, :db_insert_placeholder_10, :db_insert_placeholder_11, :db_insert_placeholder_12); Array ( ) in Drupal\Core\Entity\Sql\SqlContentEntityStorage->save() (line 755 of /var/www/drupal8.dev/webroot/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php).

Drupal\Core\Entity\Sql\SqlContentEntityStorage->save(Object)
Drupal\Core\Entity\Entity->save()
Drupal\block\Tests\Update\BlockContextMappingUpdateTest->testUpdateHookN()
neclimdul’s picture

Yes the --die-on-fail has to do with a weird hack to simpletest that doesn't actually work with --die-on-fail. Sadly not related to this.

andypost’s picture

I remember the same issue was in d7 when we converted cache tables to blobs, "bytea" has issues with convertion
http://www.postgresql.org/docs/9.1/static/datatype-binary.html

EDIT http://www.postgresql.org/docs/9.1/static/ddl-alter.html#AEN2793

5.5.6. Changing a Column's Data Type

To convert a column to a different data type, use a command like:

ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2);

This will succeed only if each existing entry in the column can be converted to the new type by an implicit cast. If a more complex conversion is needed, you can add a USING clause that specifies how to compute the new values from the old.

PostgreSQL will attempt to convert the column's default value (if any) to the new type, as well as any constraints that involve the column. But these conversions might fail, or might produce surprising results. It's often best to drop any constraints on the column before altering its type, and then add back suitably modified constraints afterwards.

mradcliffe’s picture

From Issue #2532476, #27, @epong mentions

It looks like system_update_8001 fails in postgresql as after the update, the title column is still of type varchar(255). Its data, however, appears to be serialized.

pwolanin’s picture

So it seems like there is a more fundamental bug in the pgsql driver in terms of schema changes? or is there some other step we need to take?

mradcliffe’s picture

There is special handling in the driver for cases of bytea conversions, but it looks like the Test is not ever getting to the update.

I finally had a chance to sit down and run the test. The test uses the database dump, but then the base class does a rebuild of cache, and that causes the menu serialization to occur on a database that is not ready to do so. And so the update never gets run.

drupal_flush_all_caches()
Drupal\simpletest\WebTestBase->resetAll()
Drupal\simpletest\WebTestBase->rebuildAll()
Drupal\system\Tests\Update\UpdatePathTestBase->rebuildAll()
Drupal\system\Tests\Update\UpdatePathTestBase->setUp()
Drupal\system\Tests\Update\MenuTreeSerializationTitleTest->setUp()

mradcliffe’s picture

Status: Active » Needs review
StatusFileSize
new1.03 KB

Modifying the die-on-fail settings to be less specific helps, but it looks like something still isn't going right. The test still fatal errors, but after the update and for the same unserialize() issue. I did not notice anything in the query logs.

The update was successful and title and description both have bytea as the type (blob:big = bytea).

 title            | bytea                  |                                                          | extended |            
  | The serialized title for the link. May be a TranslationWrapper.
 description      | bytea                  |                                                          | extended |            
  | The serialized description of this link - used for admin pages and title attribute. May be a TranslationWrapper.

(to needs review and then back to needs work)

mradcliffe’s picture

Status: Needs review » Needs work

One oddity I've noticed about update tests is that they run every single test. This makes the test suite more expensive as core adds more database updates.

I haven't found anything that could fix this issue. The test case needs to clear cache, and by that time it's too late, the new code is going to bork because the database isn't ready to handle storing serialized class instance in the database table.

So why is MySQL test bot passing? Why isn't the test suite running MySQL running into the same error when it tries to rebuild menu? Does the test suite running MySQL even try to do the same thing? Why not?

catch’s picture

Issue tags: +D8 upgrade path
catch’s picture

Priority: Major » Critical

Since we agreed to support postgres for 8.0.0, and this is a critical postgres failure, bumping to critical.

However one way to 'fix' this would be to change the database dump to be based on beta14 instead of beta12.

We might have to add some extra test coverage for entity updates, but then this particular update would never need to run and the problem goes away.

catch’s picture

@mradcliffe if the cache clear happens after the updates have run, then why would it be failing?

We shouldn't be clearing caches before updates have run at all.

mradcliffe’s picture

The base test class does a cache rebuild in setUp. I need to spin up a mysql environment and see why it's not failing there.

webchick’s picture

Title: Fix regression to menu serialization in upgrade path that causes thousands of errors » Fix regression to menu serialization in upgrade path that causes thousands of errors in PostgreSQL

Just to let peoples' heart rate down a touch. ;)

catch’s picture

Status: Needs work » Postponed

Postponing on #2558247: Remove rebuildAll() and module install from UpdatePathTestBase, turns out we were hiding this error in the update path tests with a hack to literally remove the errors from the assertion checking, but in a way that only works for MySQL. That's not really a postgres issue at all.

If #2558247: Remove rebuildAll() and module install from UpdatePathTestBase gets done there might be nothing to do here.

xjm’s picture

Issue tags: +D8 critical triage deferred

Discussed with @catch, @webchick, @alexpott, and @effulgentsia. If #2558247: Remove rebuildAll() and module install from UpdatePathTestBase resolves all of this, we can close it as a duplicate. If it doesn't, we should evaluate the criticality of any remaining issues for postgres at that time (so marking for later discussion).

berdir’s picture

So, #2558247: Remove rebuildAll() and module install from UpdatePathTestBase now has a green patch, I just did a postgresql test run there: https://www.drupal.org/pift-ci-job/24511

So, that seems to fix about 50% of the failing tests, but there are still a few that have those notices in menu storage. can someone with a working postgresql environment test that and post a full backtrace of those notices and when they happen exactly?

berdir’s picture

Ok, looks like that issue does not fix this.

But we assume that the reason is quite different now. It is likely that the update functions aren't running through and then we get those errors after an imcomplete update.php run.

So, still unpostponed but we can then look into those real problems once that other issue is in.

dawehner’s picture

For me those tests work locally (I mean they just produce notices):

Notice: unserialize(): Error at offset 61 of 64 bytes in Drupal\Core\Menu\MenuTreeStorage->loadFullMultiple() (line 741 of core/lib/Drupal/Core/Menu/MenuTreeStorage.php).
Drupal\Core\Menu\MenuTreeStorage->loadFullMultiple(Array)
Drupal\Core\Menu\MenuTreeStorage->loadFull('standard.front_page')
Drupal\Core\Menu\MenuTreeStorage->doSave(Array)
Drupal\Core\Menu\MenuTreeStorage->saveRecursive('standard.front_page', Array, Array)
Drupal\Core\Menu\MenuTreeStorage->rebuild(Array)
Drupal\Core\Menu\MenuLinkManager->rebuild()
Drupal\Core\EventSubscriber\MenuRouterRebuildSubscriber->menuLinksRebuild()
Drupal\Core\EventSubscriber\MenuRouterRebuildSubscriber->onRouterRebuild(Object, 'routing.route_finished', Object)
Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher->dispatch('routing.route_finished', Object)
Drupal\Core\Routing\RouteBuilder->rebuild()
Drupal\Core\ProxyClass\Routing\RouteBuilder->rebuild()
drupal_flush_all_caches()
Drupal\system\Controller\DbUpdateController::batchFinished(1, Array, Array, '29 sec')
call_user_func_array(Array, Array)
_batch_finished()
_batch_page(Object)
Drupal\system\Controller\DbUpdateController->handle('start', Object)
call_user_func_array(Array, Array)
Drupal\Core\Update\UpdateKernel->handleRaw(Object)
Drupal\Core\Update\UpdateKernel->handle(Object)
dawehner’s picture

StatusFileSize
new1.42 KB

Some more interesting result: These are the entries which are stored in the database after the menu rebuild:

>>> $b = new \Drupal\Core\StringTranslation\TranslationWrapper('Books');
=> Drupal\Core\StringTranslation\TranslationWrapper {#564}
>>> serialize($b);
=> "O:48:"Drupal\Core\StringTranslation\TranslationWrapper":3:{s:9:"\x00*\x00string";s:5:"Books";s:12:"\x00*\x00arguments";a:0:{}s:10:"\x00*\x00options";a:0:{}}"

For some reasons though, these null bytes cause issues in pgsql, see https://php.net/manual/de/function.serialize.php#96504 as one example
People seem to suggest to use something like https://secure.php.net/manual/en/function.pg-escape-bytea.php#89036
Do we need to care about that in our DB layer?

berdir’s picture

@dawehner: I'm sure we serialize objects with protected properties already, e.g. in the cache tables? Also, shouldn't all tests be failing with those if it would be a more generic problem?

dawehner’s picture

StatusFileSize
new988 bytes

Well sure, I hope that the trait does not make sense worse ... but well you see this unserialize call which is triggered as part the drupal_flush_caches() at the end of update.php
At that point the entries should be fine already. Yeah in other words no clue.

Here is a try to fix some of the test failures which are caused by the fact that we don't convert some exceptions properly

pwolanin’s picture

Looking at the PDO docs, is seems the way we cal execute, all params are just treated as strings:
https://secure.php.net/manual/en/pdostatement.execute.php

Would it make sense so have a way to to bind the values so we can specify that it's binary?

https://secure.php.net/manual/en/pdostatement.bindvalue.php

pwolanin’s picture

Never mind, looks like we are already doing that in the postgres driver.

Could this relate to caching of the table schema? I see in the postgres driver we are checking the schema info to decide which fields are blob to handle them correctly using bindParam() with type \PDO::PARAM_LOB.

alexpott’s picture

catch’s picture

Status: Needs work » Needs review

Running #28 past the bot.

catch’s picture

If #28 comes back green, let's split it off as a quick fix. The less postgres fails we have, the less easy it is to introduce new ones at least.

dawehner’s picture

Just ran the other tests:

fail: [Completion check] Line 54 of core/modules/aggregator/src/Tests/FeedLanguageTest.php:
The test did not complete due to a fatal error.


exception: [Uncaught exception] Line 868 of core/vendor/symfony/dependency-injection/ContainerBuilder.php:
Symfony\Component\DependencyInjection\Exception\InvalidArgumentException: The service definition "renderer" does not exist. in Symfony\Component\DependencyInjection\ContainerBuilder->getDefinition() (line 868 of /var/www/html/core/vendor/symfony/dependency-injection/ContainerBuilder.php). Symfony\Component\DependencyInjection\ContainerBuilder->getDefinition('renderer')
Symfony\Component\DependencyInjection\ContainerBuilder->get('renderer')
Drupal::service('renderer')
install_database_errors(Array, './sites/simpletest/207534/settings.php')
install_verify_database_settings('sites/simpletest/207534')
install_begin_request(Object, Array)
install_drupal(Object, Array)
Drupal\simpletest\WebTestBase->doInstall(Array)
Drupal\simpletest\WebTestBase->setUp()
Drupal\aggregator\Tests\AggregatorTestBase->setUp()
Drupal\aggregator\Tests\FeedLanguageTest->setUp()
Drupal\simpletest\TestBase->run(Array)
simpletest_script_run_one_test('15', 'Drupal\aggregator\Tests\FeedLanguageTest')

And well of course, those tests work fine on my local pgsql installation, :(

catch’s picture

Well #28 fixes half of the fails vs. HEAD so still worth getting in on its own?

webchick’s picture

Status: Needs review » Active

Committed and pushed #28, setting back to active for the rest.

  • webchick committed 4b7ba21 on 8.0.x
    Issue #2548725 by dawehner: Fix database exception handling, causing...
dawehner’s picture

Just an alternative idea for an solution:

Don't store the translation wrapper but rather an array: ['title' => 'meh', 'context' => 't_context', 'translatable' => TRUE] and special case some behaviour.

dawehner’s picture

Status: Active » Needs review
StatusFileSize
new281 bytes

Here is a patch just to run some pgsql test.

dawehner’s picture

Issue tags: +D8 Accelerate

just some ordinary tagging

pwolanin’s picture

StatusFileSize
new3.04 KB

Ok, here's a possible approach now that the early rebuild is removed. Seems to fix the bug for me in a local install with pgsql - possibly it just can't work there to convert UTF-8 varchar to bytea and then try to read it back and serialize it.

pwolanin’s picture

note that this is similar to an approach dawehner suggested originally of creating columns with new names, but the renaming only lasts during the update function so we don't have to change any code.

catch’s picture

#41 looks good - both the patch itself and the test results.

All the other test failures are unrelated to this one - we'll need a new postgres critical to track them though.

dawehner’s picture

Status: Needs review » Reviewed & tested by the community

We are removing test failures, this is great!

catch’s picture

Status: Reviewed & tested by the community » Needs review

Not sure where, but it'd be good to document this workaround centrally for other upgrade paths to emulate.

Is there any way we could make the postgres driver handle some of this?

I'm thinking if changeField() did the following:

- rename old field
- create new field
- copy old field data to new field with an INSERT INTO ..SELECT FROM.

in this case we'd have to then take the copied data and update it again to the correct format.

CNR for that question, if the answer is 'no' then I think this is right.

catch’s picture

Issue tags: +beta target

Also since this requires changing the update that runs for MySQL too, bumping to beta target.

catch’s picture

INSERT INTO ... SELECT FROM won't work if the field is NOT NULL or has a default, since then it'll have data as soon as it's created.

I did find this though: http://www.postgresql.org/message-id/25980.1358264400@sss.pgh.pa.us


ALTER TABLE test ALTER COLUMN val TYPE varchar(255) USING convert(val);

with whatever is appropriate in place of convert().

That seems worth looking into.

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community

@catch - this update was particularly problematic since we wanted to take the existing data and serialize it inside the TranslationWrapper and re-use the field name. That's not a very common pattern for updates.

the postgres driver already has some special handling for bytea fields, but possibly this would be a better general strategy. I don't think it's in scope for this issue, and it's unclear if it would have prevented the problem.

Let's make a follow-up for that: #2560691: Consider more robust alternatives for PostgreSQL when changing fields to a blob type

pwolanin’s picture

pwolanin’s picture

Issue summary: View changes
catch’s picture

@catch - this update was particularly problematic since we wanted to take the existing data and serialize it inside the TranslationWrapper and re-use the field name.

Do we know that this is the actual problem though?

Or is it that the data is corrupted when we do the conversion of the type, then that corruption persists through the actual data update? If it's that, then just fixing the initial type conversion would fix this - and that's data loss just from a type change, not from the extra conversion.

catch’s picture

Status: Reviewed & tested by the community » Needs review
StatusFileSize
new1.69 KB

I looked to see where our bytea special casing was added. It does the exact opposite of what everyone on the internet says you should do with a bytea column (i.e. cast to it).

I get commit hash 9166c436a054187feb29dbf89f9100dbd60c220b

Which was #2443681: PostgreSQL: Fix user\Tests\UserAccountLinksTest, which never had a patch with that code in.

Trying to just nuke that to see what happens.

pwolanin’s picture

I don't think that's right. The current code was added in #1031122: postgres changeField() is unable to convert to bytea column type correctly specifically to avoid this problem

catch’s picture

StatusFileSize
new954 bytes

This is now a re-roll of #1031122: postgres changeField() is unable to convert to bytea column type correctly.

We committed a bad patch, then neither reverted nor fixed it.

pwolanin’s picture

I sill think that's wrong since standard_conforming_strings in ON by default, but we are adding quoting as if it is off.

Guess let's see what the result is of the test, but I can also try it locally.

catch’s picture

Uploading stefan_r's patch from https://www.drupal.org/node/1031122#comment-8836081

If one of these passes, we should probably bump that issue back to critical and mark this one as duplicate.

pwolanin’s picture

Patch #56 fails for me locally.

pwolanin’s picture

StatusFileSize
new3 KB

Let's see if this combination of parts is better.

catch’s picture

Status: Needs review » Closed (duplicate)

Nice one! 25 fails down from 60+.

Going to do what I said in #56 and mark this duplicate of #1031122: postgres changeField() is unable to convert to bytea column type correctly. See you over there.

pwolanin’s picture

Status: Closed (duplicate) » Needs review
StatusFileSize
new3.05 KB
new1.28 KB

re-opening per catch to just fix the update function.

Just a typo fix in code comment and whitespace fix compared to #41.

dawehner’s picture

Status: Needs review » Reviewed & tested by the community

This issue has been RTBC already

catch’s picture

So yes I think #1031122: postgres changeField() is unable to convert to bytea column type correctly is a 'critical postgresql' issue that this rediscovered.

However there are good arguments for doing the update like this regardless of that issue.

Once we commit this (regardless of which issue it goes in for), there is no implicit test coverage for the bytea issue, and that issue also needs more time to sort out the proper fix.

So I think it makes sense to commit this now - to get the total number of postgres test faiures down by over half to around 25. I've just split out the remaining failures to dedicated issues and doubled the criticals list - most of these are new test coverage added by other critical issues afaict. The less, and more consistent, postgres fails we have, the harder it is to introduce new ones because we have at least a chance of spotting them.

Will just let the bot run to get final numbers on remaining test fails, then I'll commit this one.

catch’s picture

Status: Reviewed & tested by the community » Fixed

I only uploaded other people's patches here, so I think I'm good to commit this.

Committed/pushed to 8.0.x, thanks!

  • catch committed 475299e on 8.0.x
    Issue #2548725 by pwolanin, catch, dawehner, mradcliffe: Fix regression...

  • catch committed 447a049 on 8.0.x
    Revert "Issue #2548725 by dawehner: Fix database exception handling,...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.