Problem/Motivation
The Drupal MySQL driver does not currently provide full UTF-8 support. The (confusingly named) MySQL utf8
charset only provides support for the basic character plane (out of 17 in total), which constitutes 6% of possible characters. As of MySQL 5.5.3 (2010), the utf8mb4
charset provides full UTF-8 support, being completely backwards compatible, not requiring more space than utf8
for characters that are in the utf8
set, and using extra byte for characters outside of the utf8
set.
Proposed resolution
Provide an optional setting in settings.php to enable utf8mb4:
$database['default']['default']['charset'] = 'utf8mb4';
$database['default']['default']['collation'] = 'utf8mb4_general_ci';
Note: everything in the attached patch is completely optional. All current Drupal installs are unaffected until the database settings for a MySQL database are changed to the utf8mb4 charset.
The patch should completely fix new installations, however when using this setting on an existing installation, all existing tables must be converted to the utf8mb4 charset first.
In order for this setting to work, all of the following conditions need to be met:
1. In order to allow for large indexes, MySQL must be set up with the following my.cnf settings:
[mysqld]
innodb_large_prefix=true
innodb_file_format=barracuda
innodb_file_per_table=true
These settings are available as of MySQL 5.5.14, and are defaults in MySQL 5.7.7 and up.
2. The PHP MySQL driver must support the utf8mb4 charset (libmysqlclient 5.5.3 and up, as well as mysqlnd 5.0.9 and up).
3. The MySQL server must support the utf8mb4 charset (5.5.3 and up).
Remaining tasks
- Review/commit
- Review installation instructions in [#2754539]
- Work on follow-up issue to implement a UI for this in the installer: #2600982: Create UI support for utf8mb4 support
User interface changes
N/A
If you set up your database in settings.php prior to running the installer, and your database system does not support utf8mb4, we finish early and show the following error:
API changes
N/A
Comment | File | Size | Author |
---|---|---|---|
#171 | interdiff.txt | 6.89 KB | David_Rothstein |
#171 | 2488180-171.patch | 14.58 KB | David_Rothstein |
#165 | 2488180-165.patch | 14.58 KB | stefan.r |
#165 | interdiff.txt | 493 bytes | stefan.r |
#164 | interdiff.txt | 1.84 KB | stefan.r |
Comments
Comment #1
stovak CreditAttribution: stovak commentedComment #2
stovak CreditAttribution: stovak commentedComment #3
stovak CreditAttribution: stovak commentedComment #4
stovak CreditAttribution: stovak commentedComment #5
stovak CreditAttribution: stovak commentedComment #6
stovak CreditAttribution: stovak commentedComment #7
stovak CreditAttribution: stovak commentedThe last one fixes an issue with schema lengths being incompatible with utf8mb4. Indexed varchar fields cannot be longer than 191 bytes.
Comment #8
stovak CreditAttribution: stovak commentedComment #9
stovak CreditAttribution: stovak commentedComment #10
stovak CreditAttribution: stovak commentedComment #11
stovak CreditAttribution: stovak commentedComment #12
David Strauss-1 on adding a configuration option. If "utf8mb4" is as good as you're saying, why not always use it?
Comment #13
joelpittet@David Strauss Wordpress recently auto-updated to this and from what I've heard they had a few issues with the transition to utfmb4.
https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/
https://core.trac.wordpress.org/ticket/21212
Maybe the issues weren't all that severe but opt-in may be a better approach here?
Comment #14
David Strauss@joelpittet Looks like the only issue, other than minimum versions, is hitting the maximum index size a little faster.
Comment #15
joelpittet@David Strauss they are auto updating the charset/collation as well on the upgrade which we are not. And I failed pretty bad last attempt at this upgrade on a D7 site(willing to try again though). So just wanted to put in a warning.
We could try to do an update hook that covers all the tables Drupal knows about.
https://core.trac.wordpress.org/attachment/ticket/21212/21212.2.diff
Comment #16
stovak CreditAttribution: stovak commented@david - Existing installs will not have the tables converted and moving the entire Drupal platform to use this encoding exclusively was not something I assumed everyone would want. I have a script to convert the tables to utf8mb4. The initial goal as just "making drupal go" on mb4, which it currently does not. Should i include the table updater in /scripts/ folder with this patch?
Comment #17
joelpittet@stovak I'm interested to see the script you use and give it a whirl.
Comment #18
joelpittetI'm leaning towards the full on utf8mb4 by default method. How can we mitigate the risks?
Comment #19
joelpittetDefinitely needs some docs and some changes to default.settings.php
Extra white space at the end of the lines.
Whitespace removal not needed.
Capital letter to start period at the end. Coding standards.
Period at the end.
Comment #20
mikeytown2 CreditAttribution: mikeytown2 commentedJust throwing this out that if we had support for charset and collation in hook_schema that would be awesome. I've switched some tables over to ascii, ascii_bin for 5-20% faster queries. Best example being the bunder in advagg where I've been able to drop 40ms of query time (100ms to 60ms) #2487087: Speed up the bundler, use char instead of varchar; also use ascii_bin instead of utf8_bin.
Comment #21
joelpittethappened quite a few times with:
ALTER TABLE table_name_here CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
The following tables:
How safe is it to convert the 255 varchar columns to 191? Wouldn't that truncate data? Or is there a better way?
Comment #22
stovak CreditAttribution: stovak commented@joel - here's a symfony console command i used on my sites. If included in the patch, I would port the script so that the symfony stack wasn't a requirement and use the conventions of the other files in the /scripts folder, e.g. use database from bootstrap.inc.
https://gist.github.com/stovak/eb7361c5de2921197f04
Yes, the script truncates the columns. Currently, i'm only working on a single site buildout that has the emoji requirement and I didn't have any columns with long values that were varchar, so truncation to varchar(191) was not an issue for me. I'm not sure how to mitigate the risk for the rest of the community. That would need some careful thought. Off the top of my head, I think node titles would be the biggest culprit.
I agree that EVENTUALLY D7 should embrace mb4 as its default. MySQL totally screwed the pooch with it's 3byte implementation and it's ridiculous that we find ourselves here. I'm just not sure how the Drupal community... who are resistant to upgrading from php 5.3... would feel having this forced on them in a point Drupal release. There would need to be messaging and pre-warning about the fact that it was a breaking change. There's no going back once you've made the conversion.
If it just pops up in a release that the update changes the table structure of the entire db, I think there would be villagers with pitchforks.
Comment #23
stovak CreditAttribution: stovak commented@mikey = I think that kind of a change is out of scope of this thread. That would need to be a different issue. This focus is not performance, its ability to store correct text values for languages that require 4-byte characters.
Comment #24
stovak CreditAttribution: stovak commentedHere's the first stab at converting the DB script away from Symfony and using it standalone:
https://gist.github.com/stovak/8b0e45774226b915d0ac
Comment #25
mikeytown2 CreditAttribution: mikeytown2 commentedIf D7 schema supported different charset and collations the conversion could easily be done by using db_change_field().
Code could look something like this once the changes are put into the correct hook_schema implementations.
Edit: Here's the script I created to change the collation in advagg
Comment #26
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedWhy was this filed as a separate issue from #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)? I think we might want to close it as a duplicate; that issue is already marked for backport, and having discussions going on in two places at once isn't really good.
Regarding the default behavior, seems worth keeping an eye on WordPress's experience. We don't want to break things, but it would be really nice to make this the default. Presumably if we did that we'd do it in a similar way as they did (don't change the core MySQL requirement; just make it the default only for sites that are running on a new enough version of MySQL). But yeah, making it an opt-in configuration setting is certainly the safer way to go.
In https://www.drupal.org/node/2473301#comment-9858035 @catch suggested making it the default on new installs only (again, only if their MySQL version is high enough), which may be a good compromise.
Comment #27
stovak CreditAttribution: stovak commented@david The back port fix is so completely different from the D8 fix I assumed it would be tracked separately. The DB subsystems are completely different. But if this conversation needs to go on somewhere else, i'm more than happy to close this issue. Not really interested in anything other than getting the issue resolved and back ported to 7.
The "patch" process for core is a mystery to me so I did what I knew to do.
Comment #28
stefan.r CreditAttribution: stefan.r commentedIn terms of charsets we should probably only allow utf8/utf8mb4 (and possibly ascii, although that may be outside of the scope of this issue) as options. We don't support other charsets on D8 either.
Truncating tables to 191 characters seems quite invasive. We'd only need to do it on columns with unique indexes or primary keys (regular columns such as node title are fine), but there are still a lot of those in core and contrib, and some may actually need the extra 64 characters.
Our best may be to only use utf8mb4 only if
innodb_large_prefixes
is enabled (as of MySQL 5.7 that will be the case by default, in earlier versions that requires some extra configuration), which allows for the varchar fields to stay at 255 characters even with utf8mb4. That way we wouldn't need to deal with the 191 character limitation or backport workarounds such as #1923406: Use ASCII character set on alphanumeric fields so we can index all 255 charactersComment #29
stefan.r CreditAttribution: stefan.r commentedCan we summarize the comments from this thread in #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)? There's no need for this to be a separate issue and the D8 patch is close to being committed anyway.
Comment #30
rlustemberg CreditAttribution: rlustemberg commentedI don't understand why this is being closed as a duplicate. I have not seen any Drupal 7 solution on https://www.drupal.org/node/1314214
Comment #31
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedThanks, and no worries. It looks like we got some valuable discussion here anyway, about how to handle the D7-specific part of this.
From a code standpoint, I actually don't think they are very different at all. Most of the Drupal 8 patch is dealing with 191 character issues around the codebase which would need to be considered in Drupal 7 too... Although I guess @stefan.r has an interesting idea above for how to avoid dealing with that in Drupal 7.
Comment #32
stovak CreditAttribution: stovak commentedI agree that truncating those fields is not anything we should put in core. Hence my reticence at including my "update" script. I hadn't thought about truncating ONLY fields that have indexes. I will look into that today and modify the latest version of my conversion script.
@stefan - i think we have three different use cases here:
My suggestion would be incremental change that eventually resolves all three use cases. I would say that the first push to core should simply solve the problems with *ENABLING* mb4. In other words... its there if you want it, we're going to solve all the problems before we make it mandatory for all users in a later update.
So including an "update" script (not necessarily mine) and a way to simply 'make it go' to solve the problem for people who _NEED_ to solve the problem. Then later come back and update core to ONLY use mb4 encoding once the install and new field process has been sorted.
Comment #33
stovak CreditAttribution: stovak commentedComment #34
stovak CreditAttribution: stovak commentedI'd like to continue this conversation a little while longer before we push back to the other ticket, after the d8 patch is committed, if that's ok with all concerned.
Comment #35
stefan.r CreditAttribution: stefan.r commentedWell the utf8->utf8mb migration module could also be contrib. And actually regular indexes are fine (it's not a big deal if those are only 191 characters, InnoDB will just index 191 characters out of the 255 automatically), it's just primary keys and unique keys that pose a problem.
Which can be worked around if we really need to support utf8mb4 on older MySQL versions, but those workarounds would be a bigger effort than the D8 effort, because of contrib and migration path/BC issues.
Comment #36
stefan.r CreditAttribution: stefan.r commentedD8 patch has now been committed so feel free to summarize and move discussion to #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)
Comment #37
pwolanin CreditAttribution: pwolanin at Acquia commentedIt seems the D8 patch is not done?
@David_Rothstein, for Drupal 7 I would suggest we just strip them on input as a 1st pass. I don't see how the Drupal 7 mysql requirement could be raised to support this?
Comment #38
stefan.r CreditAttribution: stefan.r commentedYes, the D8 patch was reverted and needs review now
Comment #39
pwolanin CreditAttribution: pwolanin at Acquia commentedLooks like you could just filter using like:
or maybe:
Not sure how slow that might be, however.
Looks like the best place to try it might be in drupal_write_record() - it wouldn't catch 100% of queries, but the schema is already loaded there and we already case some fields to int or string.
Comment #40
pwolanin CreditAttribution: pwolanin at Acquia commentedHere's a little patch to start. Let's see if it breaks anything.
Comment #42
stefan.r CreditAttribution: stefan.r commentedmysql_character_set may not always be set
Comment #43
mikeytown2 CreditAttribution: mikeytown2 commentedpreg_replace can return NULL so I would want to make sure it's not NULL at a minimum.
http://stackoverflow.com/questions/3466035/how-to-skip-invalid-character...
Comment #44
pwolanin CreditAttribution: pwolanin at Acquia commentedok, easy to add isset() check
Comment #45
pwolanin CreditAttribution: pwolanin at Acquia commentedoops - forgot to deal w/ NuLL question - we can cast to string like the else branch.
Comment #46
pwolanin CreditAttribution: pwolanin at Acquia commentedComment #47
pwolanin CreditAttribution: pwolanin at Acquia commentedLooks like we should use instead the invalid character utf-8 character: U+FFFD
Comment #48
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedSeems like the patch above is wrong - $info['mysql_character_set'] is only set when getting the CREATE TABLE sql.
This ought to be closer (includes a test) but everything blows up.
Comment #49
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedNote that #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) was committed to Drupal 8 a few days ago - should we merge these issues back together?
Comment #50
stovak CreditAttribution: stovak commentedI have created a table_converter module. Would welcome feedback.
https://github.com/stovak/table_converter
-tom
Comment #51
stovak CreditAttribution: stovak commentedRerolled original patch for current Drupal HEAD
Comment #52
geerlingguy CreditAttribution: geerlingguy commentedSince #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) was merged into D8, should this issue be marked duplicate now, and the work merged back into that issue?
Comment #53
dpopdan CreditAttribution: dpopdan commentedIt is working now on Drupal 8, but what about Drupal 7 ?
Comment #54
stovak CreditAttribution: stovak commentedre-rolled for 7.38
Comment #55
stovak CreditAttribution: stovak commented@geerlingguy - This is a non-breaking change that only ENABLES the ability to do UTF8MB4. What do you think the chances are we can get this in a future update to d7?
Comment #56
geerlingguy CreditAttribution: geerlingguy commentedI can't speak for the community, or for the D7 core maintainers, but this patch does only add the ability to set a default charset and collation for each database connection, so it's non-destructive, and would be a welcome improvement to allow D7 sites everywhere to work with utf8mb4 if desired.
From #25:
That's basically all this patch does, and since it's non-destructive and doesn't change anything for existing Drupal 7 sites, I definitely give a thumbs up. However, one thing that this patch would introduce that is not present in Drupal 8 (AFAICT) is the database
charset
setting. In Drupal 8's default.settings.php, only thecollation
is configurable per-database. Also, I think there was a whitespace issue or two in the patch when I grabbed it and applied it to my sites; that would need cleaning up.If we want that setting added, I would think it needs to be added to D8 first then backported like this patch.
Setting to needs review so the testbot can have a go.
Comment #58
joelpittetI much prefer this incremental approach but echo what @geerlingguy said in #56, glad to see collation and charset configurable per database. Seems to pass tests. May want to add some notes in default.settings.php and clean up the comment coding standards to help this patch get through the core gates.
Same here re isset/empty vs array_key_exists.
Nit: Capital 'a' to start the sentence.
We usually use isset() or empty() unless you expect a NULL or flasey value'd charset key.
Will this affect existing fields?
Comment #59
stovak CreditAttribution: stovak commented@joel -
1-3 NP. Will clean these up and re-roll.
4. No, only new fields. This code does not change the database. For changes to existing structure: https://github.com/stovak/table_converter
-t
Comment #60
Noe_ CreditAttribution: Noe_ at Devhouse Spindle commentedAdded some documentation about the patch in /sites/default/default.settings.php
Replaced the
array_key_exists
byisset
function calls.Fixed the nitpicking start with a capital :)
Comment #61
joelpittetThanks for the clean up, not sure I know enough about this system to RTBC but thanks for code standard nit cleanup and answering my question.
Comment #62
stovak CreditAttribution: stovak commentedThanks guys, sorry, got in the weeds with another project.
Comment #63
geerlingguy CreditAttribution: geerlingguy as a volunteer commentedRTBC by me (@stovak, you can confirm as well) for D7, but with the caveat that we need to add the
charset
configuration option to Drupal 8 if we add it in D7, otherwise we're introducing a new feature in D7 that's not supported in D8!This patch adds two optional settings.php configuration directives to specify
charset
andcollation
per-database, and in Drupal's critical path, it only adds oneisset()
call and two string concatenations, so it only adds a couple microseconds to execution time, while affording every D7 site the ability, if so desired, to support emojis. Not having support is a serious deficiency in today's mobile-centric world, where emoji are fairly common in community sites (and even in many 'brochure' style sites, eesh!).Comment #64
geerlingguy CreditAttribution: geerlingguy as a volunteer commentedShould we consolidate efforts with #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)? It seems that the only major differences here are we're not changing the default collation, since we have to support older versions of MySQL in D7, and we're adding
charset
configuration (which should first be added to Drupal 8).Comment #65
stefan.r CreditAttribution: stefan.r commentedThis needs a check for $charset being either utf8 or utf8mb4, as we probably shouldn't be supporting all possible charsets in core. Maybe we can throw an exception if it's neither of those? Or make it a boolean instead? (TRUE for utf8mb4, FALSE for utf8)
This will also break primary/unique keys that are > 191 characters, unless
inno_db_large_prefixes
is turned on. IMO we should only support utf8mb4 on D7 if that database option is turned on, otherwise we have to deal with the gargantuan task of reworking all contrib modules.#63 I don't think this will need a "forward port" to Drupal 8 as a decision was made for Drupal 8 not to support utf8
Comment #66
stovak CreditAttribution: stovak commented@geerlingguy I don't see this as a "feature that d7 has" so much as "hacky bullshit way of making d7 support something d8 already does by default." A/D?
@stefan.r this module doesn't make any changes to the tables themselves, only makes it POSSIBLE for utf8mb4. It's not turned on by default and you have to explicitly convert the tables in order for the 191 length restriction to be in effect.
-tom
Comment #67
stefan.r CreditAttribution: stefan.r commentedActually the current implementation won't necessarily break for fields over 191 characters, but if the charset is set to utf8mb4 in the database settings it _will_ limit all new fields (including non-string fields, where the limitation is not needed) to 191 characters on field creation, which could have a couple of unintended consequences we need to see that we can live with.
So this probably shouldn't be RTBC yet :)
Comment #68
joelpittetre #67 thanks that was really the worry I had tried to ask in #58.4
Comment #69
stefan.r CreditAttribution: stefan.r commentedWell one of the main concerns with the current patch is how problematic it would be to support in both core and contrib. Mixing charsets (and by implication, character cutoffs) on the same install doesn't seem like a good idea to begin with, and we also shouldn't have maintainers trying to figure out which table is running on which charset whenever there's a support request caused by this patch.
As to the character limitation, in D8 we found a way to deal with it (with varchar_ascii and some rework) but in D7 I doubt we will, considering the amount of contrib modules out there.
However, if
innodb_large_prefixes
is turned on, the character limitation doesn't apply and the support concerns disappear as for new installs everything would still be the same as usual, and as far as the upgrade path is concerned, the steps would merely be:1. Backup the tables
2. Convert all tables to utf8mb4
3. Everything still works as usual, maybe with a tiny performance degradation?
So for any patch we do here, I would suggest to only allow utf8mb4 if
inno_db_large_prefixes
is enabled (which as of MySQL 5.7 GA will be the case by default anyway), so we can forget about the whole 191 character problem in D7.Comment #70
stovak CreditAttribution: stovak commentedi'm on board with this. It was my desire to only _ENABLE_ this and not to actually change anyone's install. What changes need to be made to the patch in order for this plan to go forward? Turn off the 191 limit if that setting is applied?
Comment #71
stefan.r CreditAttribution: stefan.r commentedFor the brave ones who are fine with tinkering the 191 limit may be OK (as well as the current patch along with @stovak's migration script), but if we want to get this patch into core it should probably not have any trace of a 191 character limit at all.
I believe we can test for this with
SELECT @innodb_large_prefix
, so if that doesn't have the right value, we should error if the charset is set to utf8mb4.The patch should also disallow marking any new fields as utf8mb4 before all existing tables have been converted as we don't want to cause new issues by mixing charsets.
Comment #72
joelpittetCan we provide some instructions on how to enable innodb_large_prefix?
Comment #73
joelpittetAlso is the query not
SELECT @@innodb_large_prefix
?SELECT @innodb_large_prefix
gives meNULL
. (Doesn't know much about server variable queries).I'm using mariadb 10 locally but same
NULL
on production MySQL 5.5.44 servers.Comment #74
stefan.r CreditAttribution: stefan.r commentedNot sure about the query. As to how to turn it on, see https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_in...
The following must be in my.cnf:
tables must also be created with
ROW_FORMAT=DYNAMIC
Comment #75
joelpittetI think query with the double @@ seems to work quite well on mariadb and mysql. Just need a confirm.
I know how to do these things myself but if we expect people to change MySQL configuration we should give them some kind of rudimentary instructions.
How doyou check forROW_FORMAT=DYNAMIC
:Edit:
SELECT row_format FROM information_schema.tables WHERE table_schema="your_db"
And could you explain why those settings are all needed and will all of them be on by default in 5.7?Edit: it says most of this in the link you provided;)
Comment #76
Noe_ CreditAttribution: Noe_ at Devhouse Spindle commentedFrom what I understand utf8mb4 is in D8 see: https://www.drupal.org/node/1314214#comment-10036841
And what are the next actionable steps to get this into core.
I really need this because now my core is patched, so I would be willing to help out. I just need to know what eed to do.
First check if @@innodb_large_prefix is on and if it is, just allow utf8 and utf8mb4?
Comment #77
stovak CreditAttribution: stovak commentedSo I think the suggestion on the table is the code surrounding the 191 character limit on Indexes needs to be replaced with a check for this config flag.
If its there, allow the index to continue without intervention, if not, limit the column to 191 chars...???? I don't know here, i'm asserting. Please feel free to suggest anything you think makes sense.
-t
Comment #78
stefan.r CreditAttribution: stefan.r commentedInstead of a silent failure like that it should fail harder IMO, like a big error screen if the
utf8mb4
flag is set in the config file but the innodb_large_prefix option is disabled... along with a warning in the documentation right above the config file flag that enabling the flag will break your site if that innodb setting is not set.And we should probably keep failing if that flag is set but tables haven't been converted to the current character set yet (similarly if all tables have been converted to utf8mb4 and people set the config file setting back to utf8)...
So for existing installs, the actual steps would be:
1. Set up InnoDB database settings
2. Set up the config setting
3. Turn on maintenance mode
4. Run migration script that backups database and converts all tables if the backup is successful
5. Turn off maintenance mode
While for new installs it would be:
1. Set up InnoDB database settings
2. Set up the config setting
3. Install Drupal as usual
Comment #79
hass CreditAttribution: hass commentedShould this patch not allow people to install Drupal with the installer correctly? Editing the settings file first is not very common. People expect this to work from installer, too. I think we need to be able to select the collation in installer, too. The patch also does not verify if mySQL is capable of doing all right plus is does not check the client requirements. There is also a D8 case open for this. As people may lose data otherwise or corrupt it's database I think this need to be much more fools prove.
Comment #80
stefan.r CreditAttribution: stefan.r commentedI agree ideally we want this to be an installer toggle that's only checkable when the database client/server have support (and the right config settings).
I actually dont think there's any chance of corruption/data loss though, the database driver would fail rather than corrupt anything.
Comment #81
stovak CreditAttribution: stovak commentedExcellent point & i agree completely. I will attempt to add this in the near future if someone doesn't beat me to it.
Personal note: The sites I work on are retail so August/Sept is my busy season getting ready for xmas. Won't have much time in the next 30-60 days.
-tom
Comment #82
devbanana CreditAttribution: devbanana commentedI'd love any updates on this. I converted my database and tables to utf8mb4, but now can't install any new modules that create tables because they try to create it in utf8 by default.
Comment #83
darol100 CreditAttribution: darol100 as a volunteer and commentedWhat are the steps to continue fixing this issue ? What things needs to be done in order to commit this to Drupal core ?
Comment #84
stovak CreditAttribution: stovak commentedNot sure about anyone else, but I think we left it needing a hook for the installer to have the option of installing utf8mb4 if support was available. I haven't been able to code that and probably won't get to it until after xmas.
Comment #85
stovak CreditAttribution: stovak commentedRerolled for 7.41
Comment #86
twistor CreditAttribution: twistor as a volunteer commentedComment #87
adammaloneOk let's take a quick pass at this. It's not pretty but it seems to work. I'm open to ideas about how we present the option to the user. Unfortunately we can't check utf8mb4 presence before presenting the db connection form because we'll have no credentials. We are also unable to check after that form has been submitted because the tables will have been created so it's a bit of a Frankenstein's monster in between job.
Comment #88
stovak CreditAttribution: stovak commentedSo I think there's 2 parts to this. First basic support, which the #85 patch does. Can we disassociate the user-facing stuff to get it elegant and the developer-facing stuff to just add support? I'd like someone to mark this RTBC on my #85 patch to get it into the next core update. Then we can continue this thread on the changes to the installer to allow utf8mb4 during the install process.
Agree/Disagree?
-tom
Comment #89
geerlingguy CreditAttribution: geerlingguy as a volunteer commented@stovak:
I vote +1 for this too—let's get in the backend functionality first ("support full UTF-8"), then work on the user-facing components for new installs. Since that will likely require more bikeshedding, I vote we create a separate ticket to implement the UI for choosing utf8mb4 support during site install, and work to get what's in #85 into core.
Comment #90
adammaloneWorks for me. Marking this as RTBC based on #85 and spinning out #2600982: Create UI support for utf8mb4 support for the UI patch and inevitable bikeshedding.
Comment #91
stovak CreditAttribution: stovak commentedAs they say in Texas, "muchos grassy ass"
Comment #92
stefan.r CreditAttribution: stefan.r commentedCould we use the Drupal 8 code here instead?
I don't think this will be able to go into a core release. I understand this will only apply to new tables but just silently changing the length of the fields from 255 to 191 is a BC break and will lead to issues with contrib or custom modules as well.
Per #67 maybe we could check for the innodb_large_prefix option instead (which is now the default in mysql 5.7 GA) and refrain from changing the length of fields?
Comment #93
adammaloneThanks for the review @stefan.r. Could I politely ask for part 1 of your review to be cross posted over to #2600982: Create UI support for utf8mb4 support since we're limiting the scope of this issue to what the patch in #85 adds; that is, the backend functionality.
For part 2, if the change only applies to new tables and new Drupal installs does that count as BC breakage? With utf8mb4 collation we do have to limit field sizes due to the extra information that's stored - are you suggesting we make innodb_large_prefix support a prerequisite for utf8mb4 collations?
Comment #94
stefan.r CreditAttribution: stefan.r commentedYes that is what I was suggesting, as we can't just have a field that refers to the same thing in two tables that has a length of 255 in one place and 191 in another. And changing max length of all string fields from 255 to 191 is definitely a BC break for both contrib and core. We don't do this in D8 either. For D8 we went through a big effort to make this workable for installs without innodb_large_prefix (see #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)), but I feel doing the same for D7 is not realistic.
So even if we split out the UI functionality some points will still need to be addressed before this is close to RTBC:
Comment #95
darol100 CreditAttribution: darol100 as a volunteer and commentedComment #96
geerlingguy CreditAttribution: geerlingguy as a volunteer commentedI hid the patch from #87, addressing point #4
Comment #97
stovak CreditAttribution: stovak commentedI will address Stefan's concerns one at a time: addressing #94.1 "Column Length Issue":
So what this code does is truncate the index for fields > 192 chars when installed AFTER the config for UTF8MB4 has been set. This will not effect the install process because we are not making UTF8MB4 available to the install process in this ticket. This is for newly-created fields and enabled modules and doesn't truncate any currently existing data. So I'm thinking we surround the current check for field size with a check for innodb_large_prefix and if innodb_large_prefix fails, then set the length at 192?
Again, this will not actually CHANGE any currently existing field, but will change newly created fields after the charset and collation has been applied and configured.
Stefan?
Comment #98
stefan.r CreditAttribution: stefan.r commented@stovak I'm aware of that, and didn't we go through this earlier in the issue? I thought that after #31/#32 you agreed that truncation of the fields should not be put in core...
IMO if innodb_large_prefix fails we should just not allow utf8mb4 at all as we have no good way to deal with the 191 character problem.
Comment #99
stovak CreditAttribution: stovak commented@stefan: Probably, I don't remember. This thing has dragged out six months and honestly I'm actually very sorry I wanted to contribute any code to core.
Considering the fact that is developer-facing only how do we "prevent UTF8MB4 from being used"? I say we use my fix above with a drupal_set_message that the field has been truncated and let the developer solve the problem by enabling large_prefix...??
Comment #100
stovak CreditAttribution: stovak commented@stefan:
Comment #101
stovak CreditAttribution: stovak commentedUpdated documentation:
Comment #102
stefan.r CreditAttribution: stefan.r commentedI understand your frustration but we all want to get this into core. This is a hard issue which will need quite a bit of work so you help is very much appreciated!
I think truncating is a bad idea, full stop. The Drupal 7 maintainer already mentioned in #31 that _if_ we truncate, we need to work around all the issues that may cause, and a trigger_error on createField is not nearly enough. Ideally we would do a fatal error on connect if people set the charset to utf8mb4 without setting the innodb_large_prefix setting, but that may have performance implications as we don't want to add an extra query every time on connect.
Also see #74, in order to be able to set innodb_large_prefix there are further prerequisites (and this is only available on MySQL 5.5.14 and up).
Comment #103
nithinkolekar CreditAttribution: nithinkolekar commentedUntil this hard issue get fixed (related to 191 characters) and committed to core could it be possible to support converting 4byte character to its equivalent html entities? just like https://www.drupal.org/node/2531908#comment-10122828 instead of stripping those characters.
update for future readers: feature implemented by helgi in module https://www.drupal.org/project/unicode.
Comment #104
stefan.r CreditAttribution: stefan.r commented#103 this seems like a good idea. Maybe a new filter as a contrib module?
Comment #105
nithinkolekar CreditAttribution: nithinkolekar commentedback to original issue...
I gone through complete thread and like to revise few key points based on above comments
1. #2600982: Create UI support for utf8mb4 support should be postponed until this issue get RTBC and committed .
2. make utf8mb4 only applicable/available(fresh install) for MySQL >=5.5.14 (this way we can force user to upgrade the mysql if they really need utf8mb4 ?)
3. If MySQL >=5.5.14 check for innodb_large_prefix fails do nothing. (also applies to #2600982: Create UI support for utf8mb4 support As per the #87's concern this must be checked after credential submission. If innodb_large_prefix is disabled then notify user to enable it or suggest them to stick to utf8)
long live 255 chars :)
Comment #106
joelpittet@stovak was testing your script out in #24 but it had some issue with
$db_url
which looks like D6 variable.Have you thought of maybe a drush script, it would be bootstrapped already and they could just run drush scr convert.php
Comment #107
joelpittet@stovak similar script but for drush, didn't have to touch the individual column it would seem (my test was to throw some emoji's on a node).
https://gist.github.com/14c155902e6b62bc8c8e
I had a pain of a time getting
open_files_limit
working on my OSX because of the conversion toinnodb_file_per_table
.Here's the settings I needed:
I also had to do this a few times (locally) but that may be because I was changing buffer pool sizes.
http://dba.stackexchange.com/questions/1261/how-to-safely-change-mysql-i...
mysql.server start doesn't work well on OS X with mariadb... (tones of reboots instead;-) <- lack of emoji)
Comment #108
sebas79 CreditAttribution: sebas79 commentedHello guys,
I'm following this issue for a couple of weeks now. With the guide of midwesternmac i got it working for my site.
As I also mentioned on Midwesternmac I can only add emoji to the body field and not to the title or comment field. Do you have the same experience?
Comment #109
joelpittetOh I did need to touch the columns, I've updated my script to add some of the stuff @mikeytown2 mentioned in #25
https://gist.github.com/joelpittet/14c155902e6b62bc8c8e
In the process of testing it out some more.
@sebas79 Sorry haven't tried out that table converter module, it looks pretty good maybe @stovak can help with why it didn't work for title and other fields.
Comment #110
infojunkieFor those interested, we've forked @stovak's module to add
drush
support: https://github.com/meedan/drupal_utf8mb4_converter. We made a few other fixes to handle different edge cases. The big part that still needs fixing is the truncation of keys.Comment #111
joelpittet@infojunkie I added some large prefix support to the README that I mentioned in #107 as a pull request to @stovak's repo
Comment #112
stovak CreditAttribution: stovak commentedSo I love the changes you guys have done with my table converter. Send me a pull request and i'll pull all your changes back into my source. I also like the name change.
What I've been wresting with is what to do if innodb_large_prefix = OFF and Schema decides to install a new table with a 255 char length string field index. Anything more than 191 will throw a sequel error. My current solution was to truncate the field index and send the user a message. Stefan is dead set against it and I understand his point.
I just don't know what to do in this eventuality.
For Reference:
Comment #113
stefan.r CreditAttribution: stefan.r commentedWhat we did in Drupal 8 was introduce varchar_ascii, which does allow for 255 character indexes. I don't think we can go through that exercise for Drupal 7, so if innodb_large_prefix = OFF, just don't use utf8mb4 - either throw an error or fall back to utf8
Comment #114
stovak CreditAttribution: stovak commentedI'm ok with it throwing an exception. I don't think the developer should get unexpected behavior of reverting to utf8.
Anyone else care to chime in?
Comment #115
nithinkolekar CreditAttribution: nithinkolekar commented+1 for 113 and that is what I suggested at #105. Instead of struggling with the old version mysql , just update to version which supports utf8mb4.
Comment #116
stovak CreditAttribution: stovak commentedLook well o, wolves:
Comment #117
KhaledBlah CreditAttribution: KhaledBlah commented@infojunkie @stovak
I still received the error message:
comment: SQLSTATE[HY000]: General error: 1709 Index column size too large. The maximum column size is 767 bytes.
Turns out that the table's row format must be either ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED as said before by @joelpittet and @stefan.r.
I've forked @infojunkie's repo at https://github.com/khaledblah/drupal_utf8mb4_converter and sent a pull request (https://github.com/meedan/drupal_utf8mb4_converter/pull/1).
Comment #118
KhaledBlah CreditAttribution: KhaledBlah commentedSo I used the patch from #87 to support utf8mb4 on my system but although I had all the necessary settings it would still fail with the error message: "General error: 1709 Index column size too large. The maximum column size is 767 bytes." I figured this was because when the table was created ROW_FORMAT=DYNAMIC was not added to the query and ROW_FORMAT=COMPACT (the default) just does not work in conjunction with "innodb_large_prefix=1". (see ROW_FORMAT at https://dev.mysql.com/doc/refman/5.6/en/create-table.html).
So I took the patch from #112 (I slightly modified the message) and added the bit "ROW_FORMAT=DYNAMIC" in createTableSql() in case innodb_large_prefix is set to ON. This is in accordance to the mysql documentation and obviously only done if charset is set to "utf8mb4".
Comment #119
infojunkieThanks KhaledBlah. Can you explain the overlap between the pull request at #117 and the patch at #118? Are they meant to work independently, or together? I'm testing the pull request locally, without the patch at #118, and it seems to function correctly for my use case.
Comment #120
stefan.r CreditAttribution: stefan.r commentedApplied the fix from #116. Leaving needs work as the issues in #94 still need to be addressed.
Comment #121
KhaledBlah CreditAttribution: KhaledBlah commentedHi infojunkie,
the pull request and the patch in #118 are related because in that patch, new fields will be created using a table with the ROW_FORMAT=DYNAMIC setting if innodb_large_prefix is "ON". In my pull request there is no check for the "innodb_large_prefix" setting, however, since I have quite a few fields with rows of type varchar and a length > 191 characters it was necessary to first alter the tables before the rows and second make the sure a table has the appropriate ROW_FORMAT setting. This way the same indexes will be available on the converted tables/rows like before even if they have more than 191 characters (obviously the "innodb_large_prefix" setting is "ON" in my DBMS).
Comment #122
infojunkieOK thanks KhaledBlah. Merged your pull request.
Comment #123
helmo CreditAttribution: helmo at Initfour websolutions commentedAfter some more reading I found a module (strip_utf8mb4) with a quickfix/workaround.
It was mentioned already in the D8 discussion, but not yet in this issue.
+1 for fixing it properly though.
Comment #124
Helgi Andri Jónsson CreditAttribution: Helgi Andri Jónsson commentedHello everyone,
I made a module that uses entity metadata wrappers to loop over the properties of an entity on hook_entity_presave and it converts all unicode characters to html entities for properties of type text and formatted text. I know it would be great have this solved in core and its not really a solution to strip them from input as that would be counter to the expectation of a user experience.
I hope the module solves a middle ground. I also made sure to include configurable settings to allow us to blacklist or whitelist entities to check for unicode. Therefore we can isolate such input to certain entities like comments or messages.
https://www.drupal.org/project/unicode
Best Regards,
Helgi.
Comment #125
jduhls CreditAttribution: jduhls as a volunteer commentedWorking with an existing (dev) site/database:
1. I dumped my database and (UPDATED!) replaced
CHARSET=utf8
withCHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC
in the create table statements2. I upgraded from MySQL 5.1 to 5.6.
3. I updated my mysql configuration using all except "init_connect": http://stackoverflow.com/a/36405914
4. I re-imported my database dump.
5. I updated charset/collation of my drupal database using: https://gist.github.com/jduhls/dd77f7dfea879766d01ee60279ab0c3e
6. I applied #120.
7. I updated my settings.php: http://www.jeffgeerling.com/blogs/jeff-geerling/solving-emoji-problem-dr...
So far so good working with Twitter API results.
Comment #126
DamienMcKennaJust to mention it, the Unicode module does indeed fix the problem for Field API, though you have to specifically enable it for the entity types that are affected. So +1 for it as a workaround.
Comment #127
stefan.r CreditAttribution: stefan.r commentedI will look at this during a Drupal Dev Days next week :)
Aiming to get a simple version of this patch that makes this work with late MySQL 5.6 and 5.7 into Drupal 7.50 (which is due to be released early July at the earliest) :)
Comment #128
barobba CreditAttribution: barobba commentedI am glad this is being addressed for D7 because it is pretty important to me. I am working on implementing a site for a Native American tribe has a 5-byte character (1XXXX). So, I am looking at what to change in the existing site now that I am inserting text, and Drupal (in combination with MySQL) is crashing.
Comment #129
stefan.r CreditAttribution: stefan.r commentedJust posting a first draft of the backport here. The idea is we allow people to set the charset to be 'utf8mb4' in the database connection info array prior to running the installer and then do a hook_requirements() check.
We'll point to https://www.drupal.org/node/2754539 for instructions on how to make this work for existing installations, for which we'll work on a D7 contrib project that checks requirements and converts all tables.
Comment #131
stefan.r CreditAttribution: stefan.r commentedComment #133
stefan.r CreditAttribution: stefan.r commentedLet's see if this fixes the test failure
Comment #134
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedIt seems like these methods need to be implemented in the base DatabaseConnection class?
Does this patch work on postgres or sqlite?I see now there is a check that the method exists, but still seems like it would be better to have them implemented in the base class or every subclass for consistency?
Comment #135
Fabianx CreditAttribution: Fabianx as a volunteer commentednit - This line is a _little_ long ... :D
Lets use a constant for that.
In case this is the same in D8, lets open a child issue to fix that in both versions.
--
Nothing checks that this fails due to the charset being set to utf8mb4. It could be set to garbage and this error message is misleading.
Overall the approach is nice, however I can't see why we don't autodetect support for UTF-8 MB and use that automatically?
As that is only for new installations, I don't think it would be a BC-break.Contrib might totally not support it, so it should indeed be an optional feature - even if the DX to set this up prior to installation is pretty bad.
--
And yes, tests are still much missing. Kill the tests with 'fire'! :D
--
Edit: Do we need to change things in core from varchar to varchar_ascii as well?We do not as we only support special configurations.
Comment #136
Fabianx CreditAttribution: Fabianx as a volunteer commentedThe technical approach has been approved by a framework manager. (me)
Comment #137
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedI think this needs tests for 4-byte utf8, which should pass on all 3 supported SQL backends (needs some hacking around to set the mysql options in the conntection)
Comment #138
stefan.r CreditAttribution: stefan.r commentedAdded tests and addressed #134, #135 :)
Can only run this locally as far as MySQL is concerned, test results attached
Comment #139
stefan.r CreditAttribution: stefan.r commentedComment #140
Fabianx CreditAttribution: Fabianx as a volunteer commentedThis looks very great to me! RTBC
Comment #141
stefan.r CreditAttribution: stefan.r commentedComment #142
stefan.r CreditAttribution: stefan.r commentedComment #143
stefan.r CreditAttribution: stefan.r commentedComment #144
stefan.r CreditAttribution: stefan.r commentedCreated a contrib project at https://www.drupal.org/project/utf8mb4_convert that we can point to in the release notes
Comment #145
stefan.r CreditAttribution: stefan.r commentedUpdated installation instructions in https://www.drupal.org/node/2754539
Comment #146
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedSome minor concerns over code comments and text
This code comment doesn't seem quite right - it should be something like "By default we assume database back-ends may not support 4 byte utf-8"
It would be good to get this to run on the testbot - I'm not clear why it can't now - at the least a follow-up issue for that?
This string is not accurate - MySQL always supports up to 3 byte utf-8, so this is a test for 4 byte or maybe "extended multibyte"?
Comment #147
stefan.r CreditAttribution: stefan.r commentedFiled #2756079: Ensure testbot can test 4 byte UTF-8.
I'm going with "4 byte UTF-8" since "extended multibyte" is not very Googleable (only 2 results ;)
Comment #148
Fabianx CreditAttribution: Fabianx as a volunteer commentedBack to RTBC
Comment #149
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedMarking for commit, even though probably David should take a look, too.
Comment #150
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedIt looks like the default.settings.php examples are much less detailed for Drupal 8? Perhaps those need to be updated too?
Comment #151
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedDiscussed in IRC:
- We want this to be only enabled if we can be sure that an admin converted all tables first OR is just installing the system
Therefore we will be guarding the DB option with an additional variable that all DB tables have been converted, which is obviously true for new installs by itself.
And warn on hook_requirements() that it was enabled, but tables had not been converted and only reference the contrib project, but not the variable.
Comment #152
stefan.r CreditAttribution: stefan.r commentedComment #153
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedBack to RTBC. Already now user input can kill a table and the user warning is loud and clear.
Comment #154
stefan.r CreditAttribution: stefan.r commentedAssigning to David for final review
Comment #155
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedOverall I like the way this is trying to be nonintrusive for existing installations. Here's an initial review; it seems to need work and probably more review and testing from someone else.
Does this ever run in the installer at all? I couldn't get it to (I think the requirements check doesn't ever happen after settings have been verified).
This does not explain why it's a warning condition - what bad thing happens if I don't enable it?
Also if the only way I can enable it is to edit settings.php by hand, that's going to be a pretty bad user experience (especially if it interrupts the installer).
Is there a reason we can't just make this a checkbox on the Database Configuration screen in the installer? Then (a) it won't interrupt anything, (b) it should be possible to write it to settings.php automatically (along with all the other database connection info), (c) none of the hook_requirements() code would need to run in the installer at all.
Don't think we can have the leading "\" on any of these (for PHP 5.2).
Also (minor) there's an extra space in the last one after "catch".
Pretty sure it should use st() rather than t().
Is all this really just so we can give a better user experience to someone who edits settings.php and puts the wrong character encoding in it? If so, is that even worth it compared to just letting them see the "Can't initialize character set" error that the database gives them?
Maybe I'm not understanding what this is for.
Is this really supposed to be always supported but never active?
This doesn't look quite right to me. What will it do on a command-line installation?
I would think the 'mysql' check isn't needed here. Don't we always want to skip these tests (for any database driver) if utf8mb4 isn't turned on?
Should this be drupal_strlen()? I don't think we require mb_strlen()...
This should have some link text (maybe "documentation for adding Multi-byte UTF-8 support").
The page is not anywhere in the drupal.org book structure right now - guess it will need to be moved there eventually. Or should it just become a change notice?
We haven't ever done that in Drupal 7; though I can see the point in starting someday (if it works with api.drupal.org). It's going to lead to inconsistent documentation though.
Probably put the FALSE on the right side - that's how it tends to be done elsewhere in core.
Formatting is wrong.
"Asian" (capitalized)
"contrib" => "contributed". Also the line spacing looks off on the first couple lines.
It would be good to link to https://www.drupal.org/project/utf8mb4_convert directly here.
I think it's OK to reference that contrib project in the core settings.php file, but right now it has an alpha release and doesn't look like many people have tried to use it, so I think the project page should indicate its experimental status somehow. Just so people know what they're getting into... :)
Comment #156
stovak CreditAttribution: stovak commentedNever ask yourselves why more people don't contribute to core. Seriously, this started out as a 5-10 line patch that i've been using in production without issue for over a year. We could have added the patch to core to *support* MB4 for those that needed it and then "bike-shedded" it to fuck and back but whatever.
Good luck with whoever this ends up. i'm out.
-tom
Comment #157
stefan.r CreditAttribution: stefan.r commentedThanks!
1.
Absolutely unnecessary indeed!Seems it does run, hence the failing test :)2. We actually decided to split this out on purpose (validated with xjm, alexpott, pfrenssen) and implement the UI in a follow-up (most likely a checkbox). But yes, very good idea, and the very next thing I will work on as soon as this is committed.
3. If we believe the interruption is too intrusive (maybe it is, but it can also be ignored and clicked away, similarly to the opcache warning in Drupal 8), perhaps we can downgrade the severity level of that specific error during the installer? I believe that would stop the error from showing unless there were other issues. Or alternatively, specifically spell out that it's an expert operation, and tell the user to click continue and ignore the message if it's not a feature they want?
5. Yes, it's backported from D8, to prevent an ugly error from appearing and the installer dying (in preparation for the UI work as well). But for now, it only happens if people actually manually override their settings.php, so the reason for it appearing will probably quite clear (unlike in D8, where utf8mb4 is always on). So for this issue, let's take this all out, and in the UI followup we properly look at putting this back in.
7. Yes, most likely that should be an exception instead. But now that 5 is taken out, we don't need it :)
Comment #158
stefan.r CreditAttribution: stefan.r commentedComment #159
stefan.r CreditAttribution: stefan.r commented@stovak good to hear the patch worked in production -- the current one is functionally very similar.
Comment #161
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commented#156: I can understand your frustration, but we are working to improve the core process.
On the other hand we do have a responsibility and we try to prevent users from shooting themselves in the foot or loosing data due to bugs.
Also we try to prevent 100s of support requests from happening in the first place.
If someone applies the patch, they are on their own. But if someone just reads in a blogpost that for their emoji, they "just" need to change one line in settings.php, then from experience we get to deal with all the different bug cases: MySQL version not supported, Tables just partially converted, etc.
Here you just have 3 core maintainers discussing the last tweaks on this patch. I am pretty sure it will be part of 7.50.
Comment #162
stefan.r CreditAttribution: stefan.r commentedSo we're going to have to put the code from point 1 back, seems it runs after all :)
Comment #163
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedI would say #2 helps answer #3. Postponing the UI to a followup definitely makes sense to me, but that should mean actually postponing it :) A warning that interrupts the installer is a pretty big UI change.
I think if you want to include a REQUIREMENT_INFO that informs people of the new feature and links to the documentation page, that's a small UI change which is probably reasonable. (And so is a REQUIREMENT_ERROR if they tried to configure it in settings.php but failed to do it correctly.) But I don't see the reasons for putting in any warnings at this point.
Sorry, my earlier comment about this was a little unclear. What I meant was that this code:
will always return when called during the requirements phase of the installer; any code that runs after it will never be reached.
Except actually I wasn't 100% right... During a normal install with a fresh settings.php I think I was right. But if settings.php has been preconfigured (by hand) with database connection info before running the installer, then your code does work OK.
I am not sure if you intended it to only run in the second case (if so, I'd suggest a code comment), but that seems to be what it's currently doing.
Also, this should probably check
isset($install_state['settings_verified'])
beforehand since outside of the installer$install_state
itself will always be empty?Comment #164
stefan.r CreditAttribution: stefan.r commentedDoes this seem right? (the warning on enabled was a debugging artifact, slipped in in the previous patch)
Comment #165
stefan.r CreditAttribution: stefan.r commentedComment #166
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedYes, taking a quick look that looks right to me now - thanks!
Comment #167
travelvc CreditAttribution: travelvc commentedHi all thanks very much for this work on this, including stovak and stefan.r
I successfully applied patch #165 and ran stefan.r's drush tool 'utf8mb4_convert' and it works great. I have about 20,000 users and over 2,000 nodes.
I did find it worthwhile to check the character sets of my tables after running the script with 'show table status from DATABASE_NAME' which identified a few module tables which didn't get converted (such as search_api_db tables) due to what I assume are key lengths.
As a general comment - with the advent of more mobile users with emoji keyboards, I'm seeing many more of our users enter this kind of data into fields. So this update is much appreciated and I'm looking forward to it being officially released so I can move this to my project's master branch.
Comment #168
stefan.r CreditAttribution: stefan.r commented@travelvc please file an issue against the contrib project issue queue with further information -- I am assuming it skipped those tables as only Drupal-managed tables (returned by drupal_get_schema()) are converted
Comment #169
scuba_flyI successfully applied patch #165 as well.
Also changed my db settings using http://www.jeffgeerling.com/blogs/jeff-geerling/solving-emoji-problem-drupal-7 Thnx @geerlingguy!
I can now successfully insert smileys into the field_body
Comment #170
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedBack to RTBC, changes look great to me.
I think David should commit this.
Comment #171
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedI reviewed this again and did some testing. (I especially tested the status report page since the changes there had less testing than the others.) Good to see that the main functionality has had some nice testing above as well.
I had to make a few changes to the patch (see attached). I'm leaving this RTBC because we really do want to get it in today and the changes aren't huge, but a couple are notable:
I think we don't want @link in settings.php documentation; it's not used anywhere else and the file is supposed to be "human-readable".
For this:
I thought about whether we want to display the human-readable driver name rather than the machine name here... for the most part we would, but "4 byte UTF-8 for MySQL, MariaDB, or equivalent is disabled" looked too weird when I tried it, so I left it as is (with the machine name) :)
Thanks everyone. Hoping to be able to commit this later today.
Comment #172
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedWe need a change record here. Mostly it can just point people to the documentation page (https://www.drupal.org/node/2754539) I think.
I edited that page now to move it into the drupal.org book structure, as discussed above. Wasn't entirely sure where it belonged, but I put it under the "Troubleshooting" section for now since people will tend to get to it via a message they see on the Status Report page.
Ideally that documentation page should be edited to say something brief about other database drivers too, not just MySQL? (Since to start with, anyone using a non-core driver is going to get a message on the status report page pointing them to this documentation....)
There may be some followups to do (for drupal.org documentation or maybe INSTALL.txt) to point people to this documentation from there as well.
Comment #173
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedActually it should also address the new class methods that database drivers in contrib can implement if they want to support this.
Comment #174
stefan.r CreditAttribution: stefan.r commentedThose changes look great, thanks. Ha at the SQLite mess-up ;)
Added a bit about SQLite/PostgreSQL in the book page.
Comment #175
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedRTBC for the interdiff in #171 as well.
Comment #176
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedClicking through to the test results, the new tests added in this issue pass on all environments (there's a failure on the SQLite tests but it's due to a failure to drop tables at the end of the test or something like that - happens for the other tests too and not related to this issue). No obvious regressions either, so this is good to go.
Committed to 7.x - thanks!
I wrote a change record here (review needed):
https://www.drupal.org/node/2761183
I also updated the documentation a bit more regarding other database drivers:
https://www.drupal.org/node/2754539/revisions/view/9847329/9848305
Reviews welcome of that too (especially what I wrote about MariaDB/etc, which I think is correct but am not positive).
Comment #178
hgoto CreditAttribution: hgoto as a volunteer and at Studio Umi commentedGreat!! I'm looking forward to getting this change in my sites. Thank you all who addressed this issue.
Comment #179
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedFor some followups noted above:
Hopefully that covers everything.
Comment #180
ptmkenny CreditAttribution: ptmkenny commentedI'm really glad to see this in core, thanks! For the past year, I was using a patch in this thread to add emoji support, although the linked patch only affects textfields.
I see that the first post now says that however when using this setting on an existing installation, all existing tables must be converted to the utf8mb4 charset first., but with the old patch I was able to only convert the textfields to utf8mb4 and leave everything else as utf8. Is there really no way to use the utf8mb4 collation in 7.50 without converting all tables?
I ask because there is no easy way to run the utf8mb4_convert drush utility on my server, which means I will have to convert the database manually, which may be a bit beyond my skill level.
Comment #181
cilefen CreditAttribution: cilefen commentedComment #182
cilefen CreditAttribution: cilefen commentedComment #183
jibranCreated #2770319: Testbot MySQL config doesn't use large prefix hiding potential errors for testbot.
Comment #185
geek-merlinComment #186
vpsaravanan CreditAttribution: vpsaravanan commentedutf8mb4_convert requires drupal version 7.50. We have to convert character set and collation in all tables or we can convert the affected tables. provide detailed explaination for sites running lesser than 7.50 version
This variable drupal_all_databases_are_utf8mb4 is only set while running installer. This variable is not set for the existing sites. As this variable is not set, it throws requirement error while running update.php in status report page