Pasting text that contains 4-byte UTF-8 characters into a field leads to lost text and a broken screen (see screenshot) on installs using MySQL. Instead, the character could be escaped or skipped, or an error message could be shown to the user.
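
The three options above (escape, skip, or show an error) can be sketched as follows. This is a minimal, language-neutral illustration in Python, not Drupal code: the function names and the error message are hypothetical, and it assumes that "4-byte UTF-8 character" means any code point above U+FFFF.

```python
import re

# Code points U+10000 and above are exactly those that UTF-8 encodes
# in 4 bytes (everything outside the Basic Multilingual Plane).
FOUR_BYTE_CHARS = re.compile("[\U00010000-\U0010FFFF]")


def has_four_byte_chars(text):
    """Return True if the text contains at least one 4-byte UTF-8 character."""
    return FOUR_BYTE_CHARS.search(text) is not None


def strip_four_byte_chars(text):
    """Option 'skip': drop the characters that cannot be stored."""
    return FOUR_BYTE_CHARS.sub("", text)


def escape_four_byte_chars(text):
    """Option 'escape': replace each character with a numeric reference."""
    return FOUR_BYTE_CHARS.sub(
        lambda match: "&#x{:X};".format(ord(match.group())), text
    )


def validate_field(text):
    """Option 'error': return a list of messages (hypothetical wording)."""
    if has_four_byte_chars(text):
        return ["This field contains characters that cannot be saved."]
    return []
```

Which option is appropriate depends on the field: stripping silently changes user input, escaping changes how the stored value must be rendered, and validation keeps the input intact but requires the user to edit it.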

Steps to reproduce:
1. install D8
2. create a new article
3. paste a character in the body or title from http://www.i18nguy.com/unicode/supplementary-test.html#utf8 or http://grumdrig.com/emoji-list/
4. save
Result: error.

Full Unicode cannot be saved because of a bug in MySQL; see #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)
Similar WordPress ticket: http://core.trac.wordpress.org/ticket/13590 (Inserting a 4-byte UTF-8 character truncates data)

Comments

grendzy’s picture

Status: Closed (duplicate) » Active

Sorry I didn't see you'd already commented on the previous issue. Can you clarify how this differs from the main utf8mb4 issue?

Hanno’s picture

Damien said:

The database layer triggers an exception, that's all that it does. Catching it and processing it belongs in the upper layers. If they don't do that correctly, it's a bug there, not in the database layer.

So that's why I opened this issue here. I'm not sure, though, whether we should handle this in an upper layer like the field system, as this is MySQL-specific and does not affect other databases as far as I know.

Pancho’s picture

Priority: Normal » Major
Issue tags: +Needs backport to D7

Yes, the difference is that the other issue is a normal task (or maybe normal bug) with a proper fix bringing major consequences. Therefore it might or might not land in D8, and probably won't be backported.

This one, on the other hand, is easy to fix, yet it might even be considered critical because data is lost. In this case, major might be enough because only a few users are affected and data loss would in most cases be limited to some direct input. But it has to be both fixed in D8 and backported to D7.

pingers’s picture

So, ideally we should get back to the previous page (in this case content creation), without losing form data.

if ($node->id()) is treated as the success condition in NodeFormController->save(). However, $node->id is set regardless of whether the rest of the database transaction fails or not.

Also, this is dealing with a mysql specific error well outside the realm of mysql storage controller. If it's a form, what choice do you have though?

form_set_error() will not show a message when the form is rebuilt, so I'm using drupal_set_message() instead.

Here's a terrible patch which catches much more than just this particular exception. But, you do get back to the form and don't lose data.

pingers’s picture

Uh, now a patch without a syntax error. Doh.

swentel’s picture

Component: field system » entity system
Priority: Major » Normal
Status: Active » Needs work

That's really not the right place. Fields can be on every entity type. This needs to happen in the entity storage controller somewhere, probably DatabaseStorageControllerNG or so.

This is not limited to the field system, as it can also happen to titles, which are part of the entity system, so I'm moving the component too.

pingers’s picture

Okay, but DatabaseStorageControllerNG::save() is what is throwing the EntityStorageException in the first place.

(Relevant) Call stack is:
DatabaseStorageControllerNG::save()
Entity::save()
NodeFormController::save()

Entity object doesn't know about context (form submission), but we need to inform a user that the exception occurred, rather than just WSOD. That leaves NodeFormController... and so either all forms should handle exceptions when saving entities, or we just don't throw the exception in the first place (which feels wrong). I could be missing an obvious solution here. Thoughts?
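
The option of handling the exception at the form level can be sketched like this. It is a language-neutral simulation in Python, not Drupal code: `Storage`, `Form`, the stand-in `EntityStorageException` class, the returned dictionaries, and the message text are all hypothetical, and the storage layer here only imitates a database that rejects characters above U+FFFF.

```python
class EntityStorageException(Exception):
    """Stand-in for the exception thrown by the storage layer."""


class Storage:
    """Imitates a storage backend limited to 3-byte UTF-8."""

    def save(self, values):
        for value in values.values():
            if any(ord(char) > 0xFFFF for char in value):
                raise EntityStorageException("Incorrect string value")
        return 1  # hypothetical ID of the saved entity


class Form:
    """Imitates the form layer, which is the only layer that knows the user."""

    def __init__(self, storage):
        self.storage = storage
        self.messages = []

    def submit(self, values):
        try:
            entity_id = self.storage.save(values)
        except EntityStorageException:
            # Tell the user what happened and hand the submitted values
            # back so the form can be rebuilt without losing input.
            self.messages.append("The content could not be saved.")
            return {"saved": False, "values": values}
        # Success is decided by save() completing, not by an ID being set.
        return {"saved": True, "id": entity_id}
```

The storage layer still throws, so nothing is silently swallowed there; only the form decides how to present the failure.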

pingers’s picture

Issue summary: View changes

Wordpress ticket

Damien Tournoud’s picture

Issue summary: View changes

The try/catch belongs in NodeFormController, nowhere else.

Damien Tournoud’s picture

Status: Needs work » Needs review

So I think #7 is actually what we want.

The last submitted patch, 6: 2002100_6_catch_entitystorageexception.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 7: 2002100_7_catch_entitystorageexception.patch, failed testing.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

David_Rothstein’s picture

Priority: Normal » Major

Is this still an issue in Drupal 8, or can it be bumped back down to Drupal 7?

For Drupal 7, there are some relevant patches by @pwolanin such as #2488180-48: Support full UTF-8 (emojis, Asian symbols, mathematical symbols) on MySQL and other database drivers when they are configured to allow it which can be copied here, though following that issue they will need to be updated to check whether the database driver supports 4 byte UTF-8 and only act if it doesn't.

I think this should be major priority since it can lead to a PDOException based on user input.

There are some contrib modules like https://www.drupal.org/project/strip_utf8mb4 and https://www.drupal.org/project/unicode which address this too, but it's really something we should try to fix in core.

David_Rothstein’s picture

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

alexpott’s picture

Status: Closed (duplicate) » Active

Hmmm, maybe closing is wrong. Perhaps in D7, where full UTF-8 is not enabled, we need to error before trying to save the content.

donquixote’s picture

Just saying, the same error can be triggered by a site search.
Pick your favorite popular D7 site and open "/search/site/%F0%9F%98%89". If the site does not have utf8mb4, you get an error page.
Probably the same can happen with watchdog entries or anything else.

So imo every site should be encouraged to enable utf8mb4, by something stronger than the current pleasant green notice in the status report.
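
To see why that search path triggers the problem, the percent-encoded keyword can be decoded. This is a small Python sketch for illustration only; the variable names are arbitrary.

```python
from urllib.parse import unquote

path = "/search/site/%F0%9F%98%89"

# unquote() decodes percent-escapes as UTF-8 by default.
keyword = unquote(path).rsplit("/", 1)[-1]

code_point = ord(keyword)                    # the single decoded character
byte_length = len(keyword.encode("utf-8"))   # size of its UTF-8 encoding

print(f"U+{code_point:X}", byte_length)  # prints "U+1F609 4"
```

The keyword is a single character above U+FFFF, so it needs 4 bytes in UTF-8, which is the case this issue is about.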

donquixote’s picture

Pick your favorite popular D7 site and open "/search/site/%F0%9F%98%89". If the site does not have utf8mb4, you get an error page.

I found exactly one popular site with this problem. It was the first one I tested, so I thought it would be more common.
I will not post it here so they don't get spammed. But I will contact them.

hass’s picture

I tried to switch to utf8mb4, but the problem seems to be that you need file per table. Most hosters have file handle limits. I'm not sure why file per table is needed; does anyone know? I cannot enable it without taking the system down because there are too many tables.