When I work with TinyMCE (3.2.1.1) in rich-text-mode and enter a non-breaking space (&nbsp;) with the character map, the space is shown, but there is only a simple space when I hit TinyMCE's code button and look at the HTML.

A side-issue: When I try to add two non-breaking-spaces next to each other, I have to insert three in order to see two spaces, but when I look at the code, there is only one simple space left.

When I try the same with the "Full feature example" on the Moxiecode site, I can see the non-breaking-space-entities in the source code.

That leads me to believe that the problem lies with the Drupal implementation of TinyMCE.

I was able to reproduce this on several installations (even under Drupal 5.12 with the old 5.x-1.9 module and TinyMCE 2.1.3 - except for the side-issue).


Comments

spade’s picture

I got a hint to check the entity_encoding and found out it was set to 'raw'. This setting is explained here. Thanks to Pete.

Changing line 112 in sites/all/modules/wysiwyg/editors/tinymce.inc to 'entity_encoding' => 'named', solved the missing non-breaking-space-entity issue, but the question remains: shouldn't this be the default setting?

The above stated side-issue remains as well: to get one space to show, I enter one space through the character-map popup. To get two spaces I have to enter three and to get three I have to enter five. The HTML source shows one &nbsp; after I entered one non-breaking-space, it shows &nbsp; &nbsp; after three non-breaking-spaces and &nbsp; &nbsp; &nbsp; after five.

sun’s picture

Status: Active » Needs review
FileSize
649 bytes

"named" is TinyMCE's default value, so we can just remove the entire setting.

I have no idea why this was set to "raw". I always thought it was because of multilingual sites, where TinyMCE would replace special UTF-8 characters, but according to the (then) used default value of http://wiki.moxiecode.com/index.php/TinyMCE:Configuration/entities, "named" will only replace well-known HTML entities.

spade’s picture

Well, thanks a bunch and keep up the good work!

sun’s picture

Title: TinyMCE filters &nbsp; » TinyMCE: entity_encoding raw removes HTML entities
Status: Needs review » Fixed

Thanks, committed to all branches.

kardave’s picture

Version: 6.x-0.5 » 6.x-1.0
Status: Fixed » Active

Yes, raw was because of multilingual sites, where TinyMCE would replace special UTF-8 characters.
Now I have a bunch of nodes with ugly source. All of the language-specific accented characters were changed in the source.

Please try to find a compromise; my authors edit the text in plain-text mode too. Now I have to apologize for the "obfuscated" text.
I put the raw option line back in, so I won't have more problems with new content. But the nodes created between the wysiwyg update and today? That hurts.

Last but not least, big thanks for your work and time.

David

sun’s picture

Priority: Normal » Critical

What's required to fix this?

karol.haltenberger.old’s picture

There should be a setting in the admin section for entity encoding where one could choose the preferred method.
If set to "named", one can specify which entities TinyMCE converts by passing them to TinyMCE in the "entities" variable.
http://wiki.moxiecode.com/index.php/TinyMCE:Configuration/entities
Therefore I suggest such an entry field too.

I've just patched my "tinymce.inc" with a static value and it seems to work.
$init['entities'] = "160,nbsp,38,amp,60,lt,62,gt";
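
To make the effect of such a short list concrete, here is a minimal sketch in JavaScript of what "named" encoding with a custom "entities" value amounts to. The helper names `parseEntities` and `encodeNamed` are hypothetical and this is not TinyMCE's own code; it only assumes the "code,name,code,name" format shown in the line above.

```javascript
// Hypothetical helper, not TinyMCE's implementation: parse an "entities"
// option string ("code,name,code,name,...") into a character -> name lookup.
function parseEntities(list) {
  const parts = list.split(",");
  const map = new Map();
  for (let i = 0; i + 1 < parts.length; i += 2) {
    map.set(String.fromCodePoint(Number(parts[i])), parts[i + 1]);
  }
  return map;
}

// Replace only the listed characters with named entities.
// Every other character (accented letters, Greek, ...) is left untouched.
function encodeNamed(text, list) {
  const map = parseEntities(list);
  let out = "";
  for (const ch of text) {
    out += map.has(ch) ? "&" + map.get(ch) + ";" : ch;
  }
  return out;
}

const entities = "160,nbsp,38,amp,60,lt,62,gt";

// \u00a0 is a non-breaking space, \u03b1 is the Greek letter alpha.
console.log(encodeNamed("a\u00a0b & \u03b1", entities));
// prints: a&nbsp;b &amp; α
```

With this list the non-breaking space survives as an entity, while characters that are not listed stay as plain UTF-8 in the source.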

GiorgosK’s picture

I agree with Lorak

Make it a choice in the settings, "raw" or "named", and for "named" let users choose the "entities".

"raw" is pretty much needed for languages with non-Latin characters.

Here is an example:

Let's say I input the Greek letter α. TinyMCE will transform it to &alpha;, and that is how it is going to be saved in the database and what an editor will see if TinyMCE is disabled (disable rich-text) ... which is pretty ugly.

hope it helps

sun’s picture

Version: 6.x-1.0 » 6.x-1.1

To give this issue a direction: A configurable setting is too technical and would be a pain to explain. We want to implement hard-coded settings that work flawlessly on all Drupal sites.

So, I need someone to flesh out the proper settings. To be very concrete: we want a solution like the one #7 suggests.

In addition to that, I want to know why http://wiki.moxiecode.com/index.php/TinyMCE:Configuration/entities shows a very sane default value, but it does not seem to be the default. So what's the default?

I really hope one of YOU can flesh this out thoroughly as soon as possible. In the meantime, the malformed setting is mangling new content on many sites.

sun’s picture

Issue tags: +Release blocker

Anyone up for solving this issue? I think this is the most critical issue currently.

kari.kaariainen’s picture

On that TinyMCE wiki page they only list part of the default value: "Part of the default value of this option is placed in the example below."

This is the complete value:

160,nbsp,161,iexcl,162,cent,163,pound,164,curren,165,yen,166,brvbar,167,sect,
168,uml,169,copy,170,ordf,171,laquo,172,not,173,shy,174,reg,175,macr,
176,deg,177,plusmn,178,sup2,179,sup3,180,acute,181,micro,182,para,
183,middot,184,cedil,185,sup1,186,ordm,187,raquo,188,frac14,189,frac12,
190,frac34,191,iquest,192,Agrave,193,Aacute,194,Acirc,195,Atilde,196,Auml,
197,Aring,198,AElig,199,Ccedil,200,Egrave,201,Eacute,202,Ecirc,203,Euml,
204,Igrave,205,Iacute,206,Icirc,207,Iuml,208,ETH,209,Ntilde,210,Ograve,
211,Oacute,212,Ocirc,213,Otilde,214,Ouml,215,times,216,Oslash,217,Ugrave,
218,Uacute,219,Ucirc,220,Uuml,221,Yacute,222,THORN,223,szlig,224,agrave,
225,aacute,226,acirc,227,atilde,228,auml,229,aring,230,aelig,231,ccedil,
232,egrave,233,eacute,234,ecirc,235,euml,236,igrave,237,iacute,238,icirc,
239,iuml,240,eth,241,ntilde,242,ograve,243,oacute,244,ocirc,245,otilde,
246,ouml,247,divide,248,oslash,249,ugrave,250,uacute,251,ucirc,252,uuml,
253,yacute,254,thorn,255,yuml,402,fnof,913,Alpha,914,Beta,915,Gamma,
916,Delta,917,Epsilon,918,Zeta,919,Eta,920,Theta,921,Iota,922,Kappa,
923,Lambda,924,Mu,925,Nu,926,Xi,927,Omicron,928,Pi,929,Rho,931,Sigma,
932,Tau,933,Upsilon,934,Phi,935,Chi,936,Psi,937,Omega,945,alpha,946,beta,
947,gamma,948,delta,949,epsilon,950,zeta,951,eta,952,theta,953,iota,
954,kappa,955,lambda,956,mu,957,nu,958,xi,959,omicron,960,pi,961,rho,
962,sigmaf,963,sigma,964,tau,965,upsilon,966,phi,967,chi,968,psi,969,omega,
977,thetasym,978,upsih,982,piv,8226,bull,8230,hellip,8242,prime,8243,Prime,
8254,oline,8260,frasl,8472,weierp,8465,image,8476,real,8482,trade,
8501,alefsym,8592,larr,8593,uarr,8594,rarr,8595,darr,8596,harr,8629,crarr,
8656,lArr,8657,uArr,8658,rArr,8659,dArr,8660,hArr,8704,forall,8706,part,
8707,exist,8709,empty,8711,nabla,8712,isin,8713,notin,8715,ni,8719,prod,
8721,sum,8722,minus,8727,lowast,8730,radic,8733,prop,8734,infin,8736,ang,
8743,and,8744,or,8745,cap,8746,cup,8747,int,8756,there4,8764,sim,8773,cong,
8776,asymp,8800,ne,8801,equiv,8804,le,8805,ge,8834,sub,8835,sup,8836,nsub,
8838,sube,8839,supe,8853,oplus,8855,otimes,8869,perp,8901,sdot,8968,lceil,
8969,rceil,8970,lfloor,8971,rfloor,9001,lang,9002,rang,9674,loz,9824,spades,
9827,clubs,9829,hearts,9830,diams,338,OElig,339,oelig,352,Scaron,353,scaron,
376,Yuml,710,circ,732,tilde,8194,ensp,8195,emsp,8201,thinsp,8204,zwnj,
8205,zwj,8206,lrm,8207,rlm,8211,ndash,8212,mdash,8216,lsquo,8217,rsquo,
8218,sbquo,8220,ldquo,8221,rdquo,8222,bdquo,8224,dagger,8225,Dagger,
8240,permil,8249,lsaquo,8250,rsaquo,8364,euro

The javascript source file this is found in is tinymce/jscripts/tiny_mce/tiny_mce_src.js

The side effect I'm having with this situation is that when I try to enter Finnish text that has strings such as "öidä", anywhere in the text, mod_security blocks this with a 501 error "method not implemented", as it encounters "& ouml ; id & auml ;" (without the spaces) and apparently semicolon + id + ampersand is regarded as a potential attack.
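
The sequence described above can be reproduced with a small JavaScript sketch. The `encode` function is a hypothetical stand-in for "named" encoding that uses only the two relevant entries from the default list (246,ouml and 228,auml); the `;id&` check merely mirrors my guess about the pattern, not an actual mod_security rule.

```javascript
// Two entries from the default entity list quoted above (246 = ö, 228 = ä).
const names = new Map([[246, "ouml"], [228, "auml"]]);

// Hypothetical stand-in for what "named" encoding does to these characters.
function encode(text) {
  return Array.from(text, (ch) => {
    const name = names.get(ch.codePointAt(0));
    return name ? `&${name};` : ch;
  }).join("");
}

const rawText = "\u00f6id\u00e4"; // "öidä"
const encoded = encode(rawText);

console.log(encoded);                  // &ouml;id&auml;
console.log(encoded.includes(";id&")); // true
console.log(rawText.includes(";id&")); // false
```

The suspicious-looking sequence only exists in the entity-encoded form; the raw UTF-8 text does not contain it.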

sun’s picture

Well, yeah. I need someone to figure out the proper default value we need for Drupal.

sun’s picture

Given this default list of entities:

160,nbsp
161,iexcl
162,cent
163,pound
164,curren
165,yen
166,brvbar
167,sect
168,uml
169,copy
170,ordf
171,laquo
172,not
173,shy
174,reg
175,macr
176,deg
177,plusmn
178,sup2
179,sup3
180,acute
181,micro
182,para
183,middot
184,cedil
185,sup1
186,ordm
187,raquo
188,frac14
189,frac12
190,frac34
191,iquest
192,Agrave
193,Aacute
194,Acirc
195,Atilde
196,Auml
197,Aring
198,AElig
199,Ccedil
200,Egrave
201,Eacute
202,Ecirc
203,Euml
204,Igrave
205,Iacute
206,Icirc
207,Iuml
208,ETH
209,Ntilde
210,Ograve
211,Oacute
212,Ocirc
213,Otilde
214,Ouml
215,times
216,Oslash
217,Ugrave
218,Uacute
219,Ucirc
220,Uuml
221,Yacute
222,THORN
223,szlig
224,agrave
225,aacute
226,acirc
227,atilde
228,auml
229,aring
230,aelig
231,ccedil
232,egrave
233,eacute
234,ecirc
235,euml
236,igrave
237,iacute
238,icirc
239,iuml
240,eth
241,ntilde
242,ograve
243,oacute
244,ocirc
245,otilde
246,ouml
247,divide
248,oslash
249,ugrave
250,uacute
251,ucirc
252,uuml
253,yacute
254,thorn
255,yuml
402,fnof
913,Alpha
914,Beta
915,Gamma
916,Delta
917,Epsilon
918,Zeta
919,Eta
920,Theta
921,Iota
922,Kappa
923,Lambda
924,Mu
925,Nu
926,Xi
927,Omicron
928,Pi
929,Rho
931,Sigma
932,Tau
933,Upsilon
934,Phi
935,Chi
936,Psi
937,Omega
945,alpha
946,beta
947,gamma
948,delta
949,epsilon
950,zeta
951,eta
952,theta
953,iota
954,kappa
955,lambda
956,mu
957,nu
958,xi
959,omicron
960,pi
961,rho
962,sigmaf
963,sigma
964,tau
965,upsilon
966,phi
967,chi
968,psi
969,omega
977,thetasym
978,upsih
982,piv
8226,bull
8230,hellip
8242,prime
8243,Prime
8254,oline
8260,frasl
8472,weierp
8465,image
8476,real
8482,trade
8501,alefsym
8592,larr
8593,uarr
8594,rarr
8595,darr
8596,harr
8629,crarr
8656,lArr
8657,uArr
8658,rArr
8659,dArr
8660,hArr
8704,forall
8706,part
8707,exist
8709,empty
8711,nabla
8712,isin
8713,notin
8715,ni
8719,prod
8721,sum
8722,minus
8727,lowast
8730,radic
8733,prop
8734,infin
8736,ang
8743,and
8744,or
8745,cap
8746,cup
8747,int
8756,there4
8764,sim
8773,cong
8776,asymp
8800,ne
8801,equiv
8804,le
8805,ge
8834,sub
8835,sup
8836,nsub
8838,sube
8839,supe
8853,oplus
8855,otimes
8869,perp
8901,sdot
8968,lceil
8969,rceil
8970,lfloor
8971,rfloor
9001,lang
9002,rang
9674,loz
9824,spades
9827,clubs
9829,hearts
9830,diams
338,OElig
339,oelig
352,Scaron
353,scaron
376,Yuml
710,circ
732,tilde
8194,ensp
8195,emsp
8201,thinsp
8204,zwnj
8205,zwj
8206,lrm
8207,rlm
8211,ndash
8212,mdash
8216,lsquo
8217,rsquo
8218,sbquo
8220,ldquo
8221,rdquo
8222,bdquo
8224,dagger
8225,Dagger
8240,permil
8249,lsaquo
8250,rsaquo
8364,euro

and comparing that to their actual characters in http://de.selfhtml.org/html/referenz/zeichen.htm (sorry, German, but you should get the point)

I would suggest using this custom default (removed entries are prefixed with "--"):

160,nbsp
-- 161,iexcl
-- 162,cent
-- 163,pound
-- 164,curren
-- 165,yen
-- 166,brvbar
-- 167,sect
-- 168,uml
169,copy
-- 170,ordf
171,laquo
-- 172,not
173,shy
174,reg
-- 175,macr
-- 176,deg
-- 177,plusmn
-- 178,sup2
-- 179,sup3
-- 180,acute
-- 181,micro
-- 182,para
183,middot
-- 184,cedil
-- 185,sup1
-- 186,ordm
187,raquo
-- 188,frac14
-- 189,frac12
-- 190,frac34
-- 191,iquest

-- 192 - 255

-- 913 - 982

-- 402,fnof

-- 8226,bull
8230,hellip
-- 8242,prime
-- 8243,Prime
-- 8254,oline
-- 8260,frasl
-- 8472,weierp
-- 8465,image
-- 8476,real
8482,trade
-- 8501,alefsym
-- 8592,larr
-- 8593,uarr
-- 8594,rarr
-- 8595,darr
-- 8596,harr
-- 8629,crarr
-- 8656,lArr
-- 8657,uArr
-- 8658,rArr
-- 8659,dArr
-- 8660,hArr

-- 8704 - 8901

-- 8968 - 9002

-- 9674,loz
-- 9824,spades
-- 9827,clubs
-- 9829,hearts
-- 9830,diams

-- 338 - 376

-- 710,circ
-- 732,tilde

8194,ensp
8195,emsp
8201,thinsp
8204,zwnj
8205,zwj
8206,lrm
8207,rlm
8211,ndash
8212,mdash

-- 8216 - 8250

-- 8364,euro

So my suggestion is to only include

a) invisible HTML characters with special meaning (&nbsp;, &shy;, etc.)

b) Typographic symbols (—, etc.)

c) Common HTML entities (©, etc.)

Resulting list:

160,nbsp
169,copy
171,laquo
173,shy
174,reg
183,middot
187,raquo
8230,hellip
8482,trade
8194,ensp
8195,emsp
8201,thinsp
8204,zwnj
8205,zwj
8206,lrm
8207,rlm
8211,ndash
8212,mdash

b) and c) are up for debate. Basically, Drupal is Unicode, so we could as well rely 100% on it, and just go with a).

kardave’s picture

Please don't forget some extra Hungarian characters:
336, Ő
337, ő
368, Ű
369, ű

You can find them here, for example.

Thanks,
David

sun’s picture

@kardave: Why should we convert those to HTML entities? As stated in #13, my proposal is to only convert very certain characters to HTML entities.

kardave’s picture

The default list of entities didn't contain them... :)

sun’s picture

Version: 6.x-1.1 » 6.x-2.0-alpha1
Status: Active » Needs review
FileSize
1.08 KB

After further discussion with smk-ka, we probably should convert only HTML control characters and invisible characters. Meaning:

160,nbsp
173,shy
8194,ensp
8195,emsp
8201,thinsp
8204,zwnj
8205,zwj
8206,lrm
8207,rlm
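
For illustration only, this is roughly how the nine pairs above turn into the flat string TinyMCE expects. It is a JavaScript sketch, not the attached patch; `entity_encoding` and `entities` are the TinyMCE options discussed in this issue, while the `pairs` and `settings` variable names are made up.

```javascript
// The nine code/name pairs from the list above.
const pairs = [
  [160, "nbsp"], [173, "shy"], [8194, "ensp"], [8195, "emsp"], [8201, "thinsp"],
  [8204, "zwnj"], [8205, "zwj"], [8206, "lrm"], [8207, "rlm"],
];

// TinyMCE's "entities" option is a flat "code,name,code,name,..." string.
const entities = pairs.map(([code, name]) => `${code},${name}`).join(",");

// Illustrative settings object (assumption: how the two options would be
// combined); it is not passed to TinyMCE here.
const settings = { entity_encoding: "named", entities };

console.log(settings.entities);
// prints: 160,nbsp,173,shy,8194,ensp,8195,emsp,8201,thinsp,8204,zwnj,8205,zwj,8206,lrm,8207,rlm
```

Everything outside this list, including accented and non-Latin letters, would then be left as UTF-8.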

Attached patch works for me.

sun’s picture

Status: Needs review » Fixed
FileSize
2.13 KB

While I was at it, I also fixed FCKeditor.

Thanks for reporting, reviewing, and testing! Committed to all branches.

A new development snapshot will be available within the next 12 hours. This improvement will be available in the next official release.

Status: Fixed » Closed (fixed)
Issue tags: -Release blocker

Automatically closed -- issue fixed for 2 weeks with no activity.