When I work with TinyMCE (3.2.1.1) in rich-text mode and enter a non-breaking space (&nbsp;) with the character map, the space is shown. But when I hit TinyMCE's code button and look at the HTML, there is only a plain space.
A side issue: when I try to add two non-breaking spaces next to each other, I have to insert three in order to see two spaces, but when I look at the code, only one plain space is left.
When I try the same with the "Full feature example" on the Moxiecode site, I can see the non-breaking-space-entities in the source code.
That leads me to believe that the problem lies with the Drupal implementation of TinyMCE.
I was able to reproduce this on several installations (even under Drupal 5.12 with the old 5.x-1.9 module and TinyMCE 2.1.3 - except for the side-issue).
Comment | File | Size | Author |
---|---|---|---|
#18 | wysiwyg-HEAD.entities-18.patch | 2.13 KB | sun |
#17 | wysiwyg-HEAD.entities.patch | 1.08 KB | sun |
#2 | wysiwyg-DRUPAL-6--1.tinymce-entities.patch | 649 bytes | sun |
Comments
Comment #1
spade: I got a hint to check the entity_encoding setting and found out it was set to 'raw'. This setting is explained here. Thanks to Pete.
Changing line 112 in sites/all/modules/wysiwyg/editors/tinymce.inc to
'entity_encoding' => 'named',
solved the missing non-breaking-space entity issue, but the question remains: shouldn't this be the default setting?
The side issue stated above remains as well: to get one space to show, I enter one space through the character-map popup. To get two spaces I have to enter three, and to get three I have to enter five. The HTML source shows one
after I entered one non-breaking space; it shows
after three non-breaking spaces and
after five.
Comment #2
sun: "named" is TinyMCE's default value, so we can simply remove the entire setting.
I have no idea why this was set to "raw". I always thought it was because of multilingual sites, where TinyMCE would replace special UTF-8 characters, but according to the default value documented at the time at http://wiki.moxiecode.com/index.php/TinyMCE:Configuration/entities, "named" will only replace well-known HTML entities.
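To make the difference between the two modes concrete, here is a rough PHP approximation, not TinyMCE's actual code. It assumes "raw" behaves as TinyMCE's configuration documentation describes it (everything stays UTF-8 except the XML default entities &amp; &lt; &gt; &quot;, which is what htmlspecialchars() produces) and that "named" additionally replaces every character in the "entities" list. The function names are hypothetical, and the four-entry map is only an excerpt of the default list quoted later in this thread.

```php
<?php
// Illustrative sketch only, not TinyMCE's implementation: approximates the
// two entity_encoding modes for the text content of a single node.

// "raw": characters stay as UTF-8; only the XML default entities
// (&amp; &lt; &gt; &quot;) are produced, which matches htmlspecialchars().
function encode_raw(string $text): string {
    return htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
}

// "named": like raw, but every character listed in $entities
// (UTF-8 character => entity name) is also replaced by its named entity.
function encode_named(string $text, array $entities): string {
    $replacements = [];
    foreach ($entities as $char => $name) {
        $replacements[$char] = '&' . $name . ';';
    }
    // The keys are non-ASCII characters, so they never collide with the
    // entities htmlspecialchars() has already produced.
    return strtr(encode_raw($text), $replacements);
}

// Four entries from TinyMCE's default "entities" list (160, 945, 246, 228).
$defaults = ["\u{00A0}" => 'nbsp', 'α' => 'alpha', 'ö' => 'ouml', 'ä' => 'auml'];

$text = "öidä\u{00A0}α & co";
echo encode_named($text, $defaults), "\n"; // &ouml;id&auml;&nbsp;&alpha; &amp; co
var_dump(encode_raw($text) === "öidä\u{00A0}α &amp; co"); // bool(true)
```

Under "raw", the non-breaking space survives only as an invisible U+00A0 character, which looks just like a plain space in a code view. Under "named", the same list that produces &nbsp; also turns accented and Greek letters into entities, which is the trade-off the rest of this thread is about.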
Comment #3
spade: Well, thanks a bunch, and keep up the good work!
Comment #4
sun: Thanks, committed to all branches.
Comment #5
kardave: Yes, "raw" was there because of multilingual sites, where TinyMCE would replace special UTF-8 characters.
Now I have a bunch of nodes with ugly source: all of the language-specific accented characters were changed in the source.
Please try to find a compromise; my authors also edit the text as plain text. Now I have to apologize for the "obfuscated" text.
I put the raw option line back, so I won't have more problems with new content. But the nodes created between the Wysiwyg update and today? That hurts.
Last but not least, big thanks for your work and time.
David
Comment #6
sun: What's required to fix this?
Comment #7
karol.haltenberger.old: There should be a setting in the admin section for entity encoding where one could choose the preferred method.
If it is set to "named", one can specify which entities TinyMCE converts through the "entities" option:
http://wiki.moxiecode.com/index.php/TinyMCE:Configuration/entities
Therefore I suggest such an entry field too.
I've just patched my "tinymce.inc" with a static value, and it seems to work:
$init['entities'] = "160,nbsp,38,amp,60,lt,62,gt";
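The "entities" value is a comma-separated list that alternates decimal code points and entity names. The PHP helpers below are hypothetical (they exist neither in TinyMCE nor in the Wysiwyg module); they sketch how such a string maps characters to entities, assuming "named" mode only touches the listed characters. With the restricted list above, accented and Greek letters stay as UTF-8.

```php
<?php
// Hypothetical helpers for illustration only; not part of TinyMCE or the
// Wysiwyg module.

// Parse a TinyMCE-style "entities" string ("160,nbsp,38,amp,...") into a
// map of UTF-8 character => "&name;".
function parse_entities_option(string $option): array {
    $parts = array_map('trim', explode(',', $option));
    $map = [];
    // The string alternates: code point, name, code point, name, ...
    for ($i = 0; $i + 1 < count($parts); $i += 2) {
        // Decode a numeric reference such as "&#160;" to get the character.
        $char = html_entity_decode('&#' . (int) $parts[$i] . ';', ENT_QUOTES, 'UTF-8');
        $map[$char] = '&' . $parts[$i + 1] . ';';
    }
    return $map;
}

// Encode the text of a node: only characters listed in the map become
// entities; strtr() works in a single pass, so output is never re-encoded.
function encode_with_entities(string $text, array $map): string {
    return strtr($text, $map);
}

$map = parse_entities_option('160,nbsp,38,amp,60,lt,62,gt');
echo encode_with_entities("öidä α\u{00A0}& <p>", $map), "\n";
// öidä α&nbsp;&amp; &lt;p&gt;
```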
Comment #8
GiorgosK: I agree with Lorak:
offer a choice in the settings between "raw" and "named", and for "named" let users choose the "entities".
"raw" is pretty much needed for languages with non-Latin characters.
Here is an example:
let's say I input the Greek letter α. TinyMCE will transform it to &alpha;, and that is how it's going to be saved in the database and what an editor will see if TinyMCE is disabled (disable rich-text), which is pretty ugly.
Hope it helps.
Comment #9
sun: To give this issue a direction: a configurable setting is too technical and would be a pain to explain. We want to implement hard-coded settings that work flawlessly on all Drupal sites.
So I need someone to flesh out the proper settings. To be very concrete: we want a solution like the one #7 suggests.
In addition, I want to know why http://wiki.moxiecode.com/index.php/TinyMCE:Configuration/entities shows a very sane default value that does not seem to be the actual default. So what is the default?
I really hope one of you can flesh this out thoroughly as soon as possible. In the meantime, the broken setting is mangling countless pieces of new content on many sites.
Comment #10
sun: Is anyone up for solving this issue? I think this is currently the most critical issue.
Comment #11
kari.kaariainen: On that TinyMCE wiki page, they only list part of the default value: "Part of the default value of this option is placed in the example below."
This is the complete value:
160,nbsp,161,iexcl,162,cent,163,pound,164,curren,165,yen,166,brvbar,167,sect,
168,uml,169,copy,170,ordf,171,laquo,172,not,173,shy,174,reg,175,macr,
176,deg,177,plusmn,178,sup2,179,sup3,180,acute,181,micro,182,para,
183,middot,184,cedil,185,sup1,186,ordm,187,raquo,188,frac14,189,frac12,
190,frac34,191,iquest,192,Agrave,193,Aacute,194,Acirc,195,Atilde,196,Auml,
197,Aring,198,AElig,199,Ccedil,200,Egrave,201,Eacute,202,Ecirc,203,Euml,
204,Igrave,205,Iacute,206,Icirc,207,Iuml,208,ETH,209,Ntilde,210,Ograve,
211,Oacute,212,Ocirc,213,Otilde,214,Ouml,215,times,216,Oslash,217,Ugrave,
218,Uacute,219,Ucirc,220,Uuml,221,Yacute,222,THORN,223,szlig,224,agrave,
225,aacute,226,acirc,227,atilde,228,auml,229,aring,230,aelig,231,ccedil,
232,egrave,233,eacute,234,ecirc,235,euml,236,igrave,237,iacute,238,icirc,
239,iuml,240,eth,241,ntilde,242,ograve,243,oacute,244,ocirc,245,otilde,
246,ouml,247,divide,248,oslash,249,ugrave,250,uacute,251,ucirc,252,uuml,
253,yacute,254,thorn,255,yuml,402,fnof,913,Alpha,914,Beta,915,Gamma,
916,Delta,917,Epsilon,918,Zeta,919,Eta,920,Theta,921,Iota,922,Kappa,
923,Lambda,924,Mu,925,Nu,926,Xi,927,Omicron,928,Pi,929,Rho,931,Sigma,
932,Tau,933,Upsilon,934,Phi,935,Chi,936,Psi,937,Omega,945,alpha,946,beta,
947,gamma,948,delta,949,epsilon,950,zeta,951,eta,952,theta,953,iota,
954,kappa,955,lambda,956,mu,957,nu,958,xi,959,omicron,960,pi,961,rho,
962,sigmaf,963,sigma,964,tau,965,upsilon,966,phi,967,chi,968,psi,969,omega,
977,thetasym,978,upsih,982,piv,8226,bull,8230,hellip,8242,prime,8243,Prime,
8254,oline,8260,frasl,8472,weierp,8465,image,8476,real,8482,trade,
8501,alefsym,8592,larr,8593,uarr,8594,rarr,8595,darr,8596,harr,8629,crarr,
8656,lArr,8657,uArr,8658,rArr,8659,dArr,8660,hArr,8704,forall,8706,part,
8707,exist,8709,empty,8711,nabla,8712,isin,8713,notin,8715,ni,8719,prod,
8721,sum,8722,minus,8727,lowast,8730,radic,8733,prop,8734,infin,8736,ang,
8743,and,8744,or,8745,cap,8746,cup,8747,int,8756,there4,8764,sim,8773,cong,
8776,asymp,8800,ne,8801,equiv,8804,le,8805,ge,8834,sub,8835,sup,8836,nsub,
8838,sube,8839,supe,8853,oplus,8855,otimes,8869,perp,8901,sdot,8968,lceil,
8969,rceil,8970,lfloor,8971,rfloor,9001,lang,9002,rang,9674,loz,9824,spades,
9827,clubs,9829,hearts,9830,diams,338,OElig,339,oelig,352,Scaron,353,scaron,
376,Yuml,710,circ,732,tilde,8194,ensp,8195,emsp,8201,thinsp,8204,zwnj,
8205,zwj,8206,lrm,8207,rlm,8211,ndash,8212,mdash,8216,lsquo,8217,rsquo,
8218,sbquo,8220,ldquo,8221,rdquo,8222,bdquo,8224,dagger,8225,Dagger,
8240,permil,8249,lsaquo,8250,rsaquo,8364,euro
This value is found in the JavaScript source file tinymce/jscripts/tiny_mce/tiny_mce_src.js.
The side effect I'm seeing is that when I enter Finnish text containing a string such as "öidä" anywhere, mod_security blocks the request with a 501 "Method Not Implemented" error, because it encounters "&ouml;id&auml;" and apparently treats semicolon + "id" + ampersand as a potential attack.
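A small PHP sketch of the effect described above: named encoding of "öidä" produces the ";id&" sequence, while the raw UTF-8 text does not contain it. The looks_suspicious() check is a hypothetical stand-in for the pattern reported here, not an actual mod_security rule.

```php
<?php
// Illustrative sketch only. looks_suspicious() is a hypothetical stand-in
// for the pattern reported in this comment, not a real mod_security rule.

// Returns true if the text contains semicolon + "id" + ampersand.
function looks_suspicious(string $body): bool {
    return strpos($body, ';id&') !== false;
}

// Two entries from TinyMCE's default "entities" list (246,ouml and 228,auml).
$named = ['ö' => '&ouml;', 'ä' => '&auml;'];

$raw = 'öidä';                  // stored as-is by "raw" encoding
$encoded = strtr($raw, $named); // stored by "named" encoding

echo $encoded, "\n";                                          // &ouml;id&auml;
echo looks_suspicious($encoded) ? 'match' : 'no match', "\n"; // match
echo looks_suspicious($raw) ? 'match' : 'no match', "\n";     // no match
```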
Comment #12
sun: Well, yeah. I need someone to figure out the proper default value for Drupal.
Comment #13
sun: Given this default list of entities:
and comparing them to their actual characters in http://de.selfhtml.org/html/referenz/zeichen.htm (sorry, German, but you should get the point),
I would suggest using this custom default (removed entries indented):
So my suggestion is to only include:
a) invisible HTML characters with special meaning,
b) typographic symbols (—, etc.),
c) common HTML entities (©, etc.).
Resulting list:
b) and c) are up for debate. Basically, Drupal is Unicode, so we could just as well rely 100% on it and go with a) only.
Comment #14
kardave: Please don't forget some extra Hungarian characters:
336, Ő
337, ő
368, Ű
369, ű
You can find them here, for example.
Thanks,
David
Comment #15
sun: @kardave: Why should we convert those to HTML entities? As stated in #13, my proposal is to convert only a very specific set of characters to HTML entities.
Comment #16
kardave: The default list of entities didn't contain them... :)
Comment #17
sun: After further discussion with smk-ka, we should probably convert HTML control characters and invisible characters. Meaning:
Attached patch works for me.
Comment #18
sun: While I was at it, I also fixed FCKeditor.
Thanks for reporting, reviewing, and testing! Committed to all branches.
A new development snapshot will be available within the next 12 hours. This improvement will be available in the next official release.