Using Drupal 8 (git checkout and install from today) and CKEditor, I tried to type some Greek in a node.
I typed this:
English: We have wysiwyg be default.
Greek: Άραγε δουλεύει σωστά ή τα κάνει μαντάρα?
French: à, è, ù
Then I checked what gets stored in the database. All Greek text was converted to HTML entities.
This is what resides in MySQL:
<p>English: We have wysiwyg be default.</p>
<p>Greek: ?ραγε δουλε?ει σωστ? ? τα κ?νει μαντ?ρα?</p>
<p>French: à, è, ù</p>
For those wondering, the Greek text above translates as: "Does it work or is it messy?" :)
This used to be an issue with the ckeditor module too, but it was solved long ago; I am marking the related issues.
I searched for an existing issue against D8 but found nothing, so I created this one.
I strongly believe that this conversion to HTML entities should not happen.
Comment | File | Size | Author |
---|---|---|---|
#6 | ckeditor_html_entities-2345037-6.patch | 1.38 KB | Wim Leers |
Comments
Comment #1
bserem commented: Added French characters example.
Comment #2
bserem commented: The attached patch does the trick for me for Greek characters, and it does not affect basicEntities (<, >, & still get stored as HTML entities).
If I'm going in the wrong direction, please tell me.
Comment #3
bserem commented: Also, git diff produced this, complaining about a newline. Attaching this too.
Comment #4
Wim Leers commented: Oh, wow, amazing catch!
You found the right settings, but we don't modify CKEditor itself. We modify CKEditor's settings :) Also, I think we just want to tell CKEditor to not create HTML entities by default at all (CKEDITOR.config.entities = false), which then also prevents the problems with Greek and French characters.
I looked up the documentation for this and http://docs.ckeditor.com/#!/api/CKEDITOR.config-cfg-entities tells me CKEditor also converts the single quote into an entity by default. We also don't want that. It also tells me Chinese is not HTML-encoded by default. We want to verify that. So I'm testing with this example:
When stored, this yields:
As expected and needed, the things that must be converted to HTML entities (the ampersand and the less-than and greater-than symbols) are converted. (To disable those, we'd need to set CKEDITOR.config.basicEntities to false, which we don't want to do, because it'd result in invalid HTML.)
I asked the CKEditor team to chime in, to confirm that this is the correct approach.
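To make the intended combination concrete, here is a minimal sketch, not the committed patch. The two option names (entities, basicEntities) are the CKEditor settings discussed above; the object name editorSettings and the helper escapeBasicEntities are hypothetical and exist only for illustration. The helper is not CKEditor's implementation; it only mimics the result we want: Greek and French characters are left alone, while &, < and > are still encoded.

```javascript
// Sketch only. `entities` and `basicEntities` are the CKEditor options
// discussed in this issue; `editorSettings` is a hypothetical name for the
// settings object handed to the editor.
const editorSettings = {
  entities: false,     // do not turn non-ASCII text (Greek, French, ...) into HTML entities
  basicEntities: true, // keep encoding &, < and > so the stored markup stays valid
};

// Hypothetical helper, NOT CKEditor's implementation: it illustrates the
// desired outcome by escaping only the three characters that would
// otherwise break the HTML.
function escapeBasicEntities(text) {
  return text
    .replace(/&/g, '&amp;') // ampersand first, so later entities are not double-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(escapeBasicEntities('Greek: Άραγε & French: à < è > ù'));
```

The ampersand is replaced first on purpose; doing it last would re-encode the & inside the freshly produced &lt; and &gt;.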
Comment #6
Wim Leers commented: Updated the test coverage, will be green now.
Comment #7
wwalc commented: This default behaviour (config.entities = true) was set many years ago, in early FCKeditor times. A long time ago UTF-8 was not that popular on websites and people set wrong database encodings too often, so this was a remedy for common issues about "characters being destroyed" etc. Since Drupal is using UTF-8 correctly, there is no sense in keeping it enabled, hence this issue is valid.
Comment #8
bserem commented: Oh man... I couldn't figure out where to control CKEditor's settings.
The patch in #6 is working for me, and keeps the greater-than/less-than symbols and the ampersand as HTML entities, as it should.
I'm moving this on to "reviewed".
As a side note: on D7 + ckeditor this functionality was controllable from the UI. I never liked it, since it never made sense for Greek users. Do you believe we should re-implement something like that?
From wwalc's comment I understand that we shouldn't.
PS: Wim, will you be in Amsterdam next week?
Comment #9
Wim Leers commented: No, we shouldn't have a setting for this in the UI, because we should just always use UTF-8! :)
Yes, I'll be in Amsterdam! And so will wwalc, by the way. He's presenting about CKEditor in Drupal 8!
Comment #10
bserem commented: Good, see you both there then. I haven't found the equivalent of last year's beer museum yet though :P
On to the next bug...
Comment #11
Wim Leers commented: Another language-and-WYSIWYG-related bug is #2318237: CKEditor translates its user interface even if interface translation is turned off. If you could roll a patch for that one, I'll definitely review it :)
Comment #12
bserem commented: I'll have a look at it. I'm not a patch master, but I'll see what I can do. Thanks
Comment #13
webchick commented: Committed and pushed to 8.x. Thanks!