Pasting text containing 4-byte UTF-8 characters into a field leads to lost text and a broken screen (see screenshot) on installs using MySQL. Instead, the character could be escaped or skipped, or an error message could be shown to the user.
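The three options above (skip, escape, or report an error) all start from the same check: a character takes 4 bytes in UTF-8 exactly when its code point is above U+FFFF, and those are the characters MySQL's 3-byte utf8 charset cannot store. A minimal Python sketch for illustration only; the helper names are hypothetical (not Drupal APIs), and HTML numeric entities are just one assumed escaping scheme.

```python
import re

# Any code point outside the Basic Multilingual Plane needs 4 bytes in UTF-8.
SUPPLEMENTARY = re.compile("[\U00010000-\U0010FFFF]")


def has_4byte_chars(text: str) -> bool:
    """Error option: detect input that a 3-byte utf8 column cannot store."""
    return SUPPLEMENTARY.search(text) is not None


def strip_4byte_chars(text: str, replacement: str = "") -> str:
    """Skip option: drop (or replace) every 4-byte character."""
    return SUPPLEMENTARY.sub(replacement, text)


def escape_4byte_chars(text: str) -> str:
    """Escape option: turn every 4-byte character into an HTML numeric entity."""
    return SUPPLEMENTARY.sub(lambda m: "&#x%X;" % ord(m.group(0)), text)


if __name__ == "__main__":
    title = "Party \U0001F389 time"
    print(has_4byte_chars(title))     # True
    print(escape_4byte_chars(title))  # Party &#x1F389; time
```

Characters such as "é" or "中" are 2 or 3 bytes in UTF-8 and pass through untouched; only supplementary-plane characters (most emoji, some CJK and math symbols) are affected.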
Steps to reproduce:
1. Install D8.
2. Create a new article.
3. Paste a character from http://www.i18nguy.com/unicode/supplementary-test.html#utf8 or http://grumdrig.com/emoji-list/ into the body or title.
4. Save.
Full Unicode is not saved because of a MySQL limitation; see #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)
Similar WordPress ticket: http://core.trac.wordpress.org/ticket/13590 (Inserting a 4-byte UTF-8 character truncates data)
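To make the data loss concrete, here is a simplified Python model (an assumption, not MySQL code) of what happens when such a string reaches a column that accepts at most 3 bytes per character: in a non-strict sql_mode the value is cut off at the first unsupported character, matching the WordPress ticket title, while a strict mode rejects the whole write with an error. Real behaviour depends on the server version and sql_mode.

```python
class IncorrectStringValue(Exception):
    """Stands in for the database error raised in strict mode."""


def store_in_utf8mb3(value: str, strict: bool) -> str:
    """Return what a 3-byte-only column would end up holding (simplified)."""
    for index, char in enumerate(value):
        if len(char.encode("utf-8")) == 4:
            if strict:
                # Strict mode: the whole INSERT/UPDATE fails.
                raise IncorrectStringValue("Incorrect string value at position %d" % index)
            # Non-strict mode: everything from the first 4-byte character on is lost.
            return value[:index]
    return value


if __name__ == "__main__":
    body = "Before \U0001F609 after"
    print(repr(store_in_utf8mb3(body, strict=False)))  # 'Before '
```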
| Comment | File | Size | Author |
|---|---|---|---|
| #19 | Screen Shot 2016-09-27 at 12.39.56.png | 141.05 KB | alexpott |
| #7 | 2002100_7_catch_entitystorageexception.patch | 2.22 KB | pingers |
| #6 | 2002100_6_catch_entitystorageexception.patch | 2.25 KB | pingers |
| | screenshotutf8.png | 150.35 KB | Hanno |
Comments
Comment #1
Hanno commented.
Comment #2
grendzy commented: #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)
Comment #3
grendzy commented: Sorry, I didn't see you'd already commented on the previous issue. Can you clarify how this differs from the main utf8mb4 issue?
Comment #4
Hanno commented: Damien said:
So that's why I opened this issue here. Not sure, though, whether we should handle this in an upper layer like the field system, as this is MySQL-specific and, as far as I know, does not affect other databases.
Comment #5
Pancho commented: Yes, the difference is that the other issue is a normal task (or maybe a normal bug) whose proper fix brings major consequences. Therefore it may or may not land in D8, and it probably won't be backported.
This one, on the other hand, is easy to fix, yet it might even be considered critical because data is lost. Major might be enough here, because only a few users are affected and in most cases the data loss would be limited to some direct input. But it has to be fixed in D8 and backported to D7.
Comment #6
pingers commented: So, ideally we should get back to the previous page (in this case content creation) without losing form data.
if ($node->id()) is treated as the success condition in NodeFormController->save(). However, $node->id is set regardless of whether the rest of the database transaction fails or not.
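The problem with treating a set ID as success can be reproduced outside Drupal. In this Python sketch, SQLite stands in for MySQL, the node/body tables and the Node class are hypothetical, and the MySQL failure is faked with a Python exception raised after the base-table insert has already handed out an ID.

```python
import sqlite3


class StorageError(Exception):
    """Stand-in for the MySQL "Incorrect string value" failure."""


class Node:
    def __init__(self, title: str, body: str) -> None:
        self.id = None
        self.title = title
        self.body = body


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE body (nid INTEGER, value TEXT)")
    return conn


def save(conn: sqlite3.Connection, node: Node) -> None:
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        cur = conn.execute("INSERT INTO node (title) VALUES (?)", (node.title,))
        node.id = cur.lastrowid  # the ID exists before the save has finished
        if any(ord(ch) > 0xFFFF for ch in node.body):
            raise StorageError("body contains a 4-byte UTF-8 character")
        conn.execute("INSERT INTO body (nid, value) VALUES (?, ?)", (node.id, node.body))


if __name__ == "__main__":
    conn = make_db()
    node = Node("Hello", "Hi \U0001F609")
    try:
        save(conn, node)
    except StorageError:
        pass
    print(node.id)  # 1: the object still carries an ID...
    print(conn.execute("SELECT COUNT(*) FROM node").fetchone()[0])  # 0: ...but no row was kept
```

In this sketch the reliable success signal is that save() returned without raising, not that node.id is set.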
Also, this deals with a MySQL-specific error well outside the realm of the MySQL storage controller. But if it's a form, what choice do you have?
form_set_error() does not show a message when the form is rebuilt, so the patch uses drupal_set_message() instead.
Here's a terrible patch that catches much more than just this particular exception. But you do get back to the form without losing data.
Comment #7
pingers commented: Uh, now a patch without a syntax error. Doh.
Comment #8
swentel commented: That's really not the right place. Fields can be on every entity type. This needs to happen somewhere in the entity storage controller, probably DatabaseStorageControllerNG or so.
This is not purely a field system problem, as it can happen to titles as well, which are part of the entity system, so I'm moving the component too.
Comment #9
pingers commented: Okay, but DatabaseStorageControllerNG::save() is what throws the EntityStorageException in the first place.
(Relevant) Call stack is:
DatabaseStorageControllerNG::save()
Entity::save()
NodeFormController::save()
The entity object doesn't know about the context (form submission), but we need to inform the user that the exception occurred rather than just showing a WSOD. That leaves NodeFormController... so either all forms should handle exceptions when saving entities, or we don't throw the exception in the first place (which feels wrong). I could be missing an obvious solution here. Thoughts?
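The layering in the call stack above can be sketched as follows (Python; all class names are hypothetical stand-ins for the Drupal classes in the stack): storage raises, the entity lets the exception pass because it has no form context, and only the form handler catches it, reports it, and keeps the submitted values so the form can be rebuilt, which is the behaviour described for the #6/#7 patches.

```python
class EntityStorageException(Exception):
    pass


class Storage:
    def save(self, entity: "Entity") -> None:
        # Simulated storage failure for text a 3-byte utf8 column cannot hold.
        if any(ord(ch) > 0xFFFF for ch in entity.values.get("title", "")):
            raise EntityStorageException("Incorrect string value")
        entity.saved = True


class Entity:
    def __init__(self, storage: Storage, values: dict) -> None:
        self.storage = storage
        self.values = values
        self.saved = False

    def save(self) -> None:
        # No try/except: the entity cannot know whether a form, a script,
        # or a web service is saving it, so it lets the exception propagate.
        self.storage.save(self)


class NodeForm:
    def __init__(self) -> None:
        self.messages = []
        self.default_values = {}

    def submit(self, values: dict) -> bool:
        entity = Entity(Storage(), values)
        try:
            entity.save()
        except EntityStorageException as exc:
            # Tell the user what happened and keep their input for the
            # rebuilt form instead of ending in a white screen.
            self.messages.append("The content could not be saved: %s" % exc)
            self.default_values = dict(values)
            return False
        self.messages.append("Saved.")
        return True
```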
Comment #9.0
pingers commented: WordPress ticket
Comment #10
Damien Tournoud commented: The try/catch belongs in NodeFormController, nowhere else.
Comment #11
Damien Tournoud commented: So I think #7 is actually what we want.
Comment #15
David_Rothstein commented: Is this still an issue in Drupal 8, or can it be moved back to Drupal 7?
For Drupal 7, there are some relevant patches by @pwolanin, such as #2488180-48: Support full UTF-8 (emojis, Asian symbols, mathematical symbols) on MySQL and other database drivers when they are configured to allow it, which can be copied here. Following that issue, though, they will need to be updated to check whether the database driver supports 4-byte UTF-8 and to act only if it doesn't.
I think this should be major priority, since it can lead to a PDOException triggered by user input.
There are some contrib modules, such as https://www.drupal.org/project/strip_utf8mb4 and https://www.drupal.org/project/unicode, that address this too, but it's really something we should try to fix in core.
Comment #17
David_Rothstein commented.
Comment #19
alexpott commented: This is fixed in D8; see the screenshot.
I also think there are options to fix this in D7 now; see #2488180: Support full UTF-8 (emojis, Asian symbols, mathematical symbols) on MySQL and other database drivers when they are configured to allow it
Comment #20
alexpott commented: Hmmm, maybe closing is wrong. Perhaps in D7, where full UTF-8 is not enabled, we need to show an error before trying to save the content.
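A sketch of that idea: validate the submitted values before saving, but only when the connection is not configured for utf8mb4, so sites that already support full UTF-8 are unaffected (the driver check mentioned in #15). The function, its arguments, and the plain charset string are hypothetical; Python is used for illustration.

```python
from typing import Dict, List


def validate_utf8mb4(values: Dict[str, str], charset: str) -> List[str]:
    """Return one error message per field that the database could not store."""
    if charset == "utf8mb4":
        return []  # full UTF-8 is enabled: nothing to check
    errors = []
    for field, text in values.items():
        if any(ord(ch) > 0xFFFF for ch in text):
            errors.append("%s contains characters (such as emoji) that this site cannot store." % field)
    return errors
```

Returning form errors before save() is called means the storage layer never sees the bad input, so there is no exception to catch and no partially written row.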
Comment #21
donquixote commented: Just saying, the same error can be triggered by a site search.
Pick your favorite popular D7 site and open "/search/site/%F0%9F%98%89". If the site does not have utf8mb4 enabled, you get an error page.
The same can probably happen with watchdog entries or anything else.
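The path segment in that URL is just the percent-encoded UTF-8 form of a single emoji, as a quick check with the Python standard library shows:

```python
from urllib.parse import unquote

path = "/search/site/%F0%9F%98%89"
term = unquote(path.rsplit("/", 1)[1])  # percent-decoding uses UTF-8 by default
print(hex(ord(term)))                   # 0x1f609 (WINKING FACE)
print(len(term.encode("utf-8")))        # 4: too wide for a 3-byte utf8 column
```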
So IMO every site should be encouraged to enable utf8mb4 by something stronger than the current pleasant green notice in the status report.
Comment #22
donquixote commented: I found exactly one popular site with this problem. It was the first one I tested, so I thought it would be more common.
I will not post it here so that they don't get spammed, but I will contact them.
Comment #23
hass commented: I tried to switch to utf8mb4, but the problem seems to be that you need file-per-table. Most hosters have file handle limits... Not sure why file-per-table is needed; does anyone know? I cannot enable it without taking the system down... too many tables.
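A likely explanation, offered with hedging since exact requirements vary by MySQL/MariaDB version: with InnoDB's older COMPACT/REDUNDANT row formats an index key prefix is limited to 767 bytes, and MySQL reserves the worst-case width per character. A fully indexed VARCHAR(255) therefore needs 255 × 3 = 765 bytes in utf8 but 255 × 4 = 1020 bytes in utf8mb4. Raising the limit to 3072 bytes (innodb_large_prefix) requires the Barracuda file format with DYNAMIC or COMPRESSED rows, and on MySQL 5.5/5.6 Barracuda tables must live in their own file-per-table tablespace. The arithmetic as a small Python check (constant names are illustrative):

```python
BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}  # worst-case width MySQL reserves
COMPACT_LIMIT = 767         # index prefix limit, COMPACT/REDUNDANT (Antelope)
LARGE_PREFIX_LIMIT = 3072   # DYNAMIC/COMPRESSED with innodb_large_prefix (Barracuda)


def index_key_bytes(varchar_length: int, charset: str) -> int:
    """Bytes an index over the full column may need."""
    return varchar_length * BYTES_PER_CHAR[charset]


for charset in ("utf8", "utf8mb4"):
    size = index_key_bytes(255, charset)
    print(charset, size, "fits" if size <= COMPACT_LIMIT else "needs large prefix")
# utf8 765 fits
# utf8mb4 1020 needs large prefix
```

The other common workaround is a shorter index prefix: 191 × 4 = 764 bytes still fits under 767.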