Hello,
When I create a node with HTML Entities (manually, or with FCK Editor replacing some characters by their HTML entity), Drupal may break an HTML entity when generating the teaser version. For example, if my node body is:
This is a post with some HTML entities é éééé éé éééé éé éééé éé éééé The end
Where 'é' are HTML entities (so "&eacute;"), my teaser is:
This is a post with some HTML entities é éééé éé éééé éé éééé é&ea
I use Filtered HTML input format, and HTML Corrector is activated.
I hope this can get corrected soon. ;)
Regards,
David
Comments
Comment #1
David Stosik commented
Forgot to mention that the teaser limit is set to 200 characters.
And I just realized that 'é' written as "&eacute;" counts as 8 characters... :(
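To illustrate the point above, here is a minimal sketch in Python (not Drupal's PHP code) of what happens when a limit is applied to the stored markup rather than to the rendered text. The function name naive_teaser and the sample string are made up for this illustration; it only assumes the cut is a plain character cut, which matches the symptom reported here.

```python
# Illustration only: a plain character cut on the stored markup.
# naive_teaser is a hypothetical helper, not a Drupal function.
def naive_teaser(body, limit):
    """Cut the body at a fixed number of characters, ignoring entities."""
    return body[:limit]

# Each rendered 'é' is stored as an 8-character entity.
entity = "&eacute;"
body = "entities " + entity * 4

# "entities " is 9 characters, one full entity brings it to 17,
# so a limit of 20 lands three characters into the second entity.
teaser = naive_teaser(body, 20)
print(len(entity))
print(teaser)
```

The teaser ends in "&ea", the same kind of broken fragment as in the original report.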
Comment #2
David Stosik commented
Anybody having the same problem?
Comment #3
span commented
Yes, I'm having the same problem when using entities for å, ä, ö in Swedish. Has anyone found a good solution?
Comment #4
Stephen Scholtz commented
Yup, I'm having the same problem, although I don't think this has anything to do with the HTML Corrector filter.
As far as I can tell, the teaser is generated by node_teaser(). Because we haven't specified a delimiter (the <!--break--> code) and we're letting Drupal auto-limit the teaser length, node_teaser() calls truncate_utf8().
Nothing too special is going on in truncate_utf8(), except that it's chopping the HTML entity in half. In my particular case, the HTML entity happens to be part of an anchor tag's title attribute, so the cut ends up breaking the page 'cuz of a half-opened tag. :P
Original code:
Truncated code:
I'm guessing what needs to happen is that the rest of the stuff in node_teaser(), the stuff that's supposed to be trying to cut the teaser back until it finds something useful to break at (ending paragraph tag, for example) needs to be set up to watch out for HTML entities?
Or is it the job of HTML Corrector filter to catch the broken html entity (and in my case, the broken, half finished anchor tag) and clean it up? I'm not sure where the solution to this problem should go, in node_teaser or in the HTML Corrector.
BTW, this is still an issue in Drupal 6.12, which is what I'm currently using.
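One way the "watch out for HTML entities" idea could look, sketched in Python rather than in Drupal's PHP: after cutting, drop any entity that was started but not finished. The function name truncate_entity_safe and the regular expression are assumptions made for this sketch, not anything taken from node_teaser() or the HTML Corrector.

```python
import re

# Matches an entity that was started but not closed at the very end of
# the text: "&", an optional "#", then letters or digits, with no ";".
_PARTIAL_ENTITY = re.compile(r"&#?[A-Za-z0-9]*$")

# Hypothetical helper for illustration; not part of Drupal.
def truncate_entity_safe(text, limit):
    """Cut text at limit characters, then remove a trailing partial entity."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # A complete entity ends in ";" and is left alone; a fragment such
    # as "&ea" or "&#23" is stripped.
    return _PARTIAL_ENTITY.sub("", cut)

print(truncate_entity_safe("entities &eacute;&eacute;", 20))
```

This only covers the broken entity. It does nothing about a tag that is cut in half, such as the anchor tag case described above, and it would also strip a bare "&" followed by letters at the cut point, which is only acceptable if ampersands in the body are always escaped.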
Comment #5
jhodgdon commented
I'm sure this is still an issue in the latest node.module.
The reason for both of these issues is that truncate_utf8() should not be used to make a substring.
#200185: truncate_utf8() is used as a substring function
See comment #4:
http://drupal.org/node/200185#comment-662567
Comment #6
jhodgdon commented
Separate issue on truncate_utf8():
#768040: truncate_utf8() only works for latin languages (and drupal_substr has a bug)
Comment #7
jhodgdon commented
If #768040: truncate_utf8() only works for latin languages (and drupal_substr has a bug) is ever fixed in Drupal 6, it will then be fine to use truncate_utf8() as a substring function. But it would probably be best to tell it to split on word boundaries, which would then take care of this problem.
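A rough sketch of why splitting on word boundaries would help, written in Python and not meant to mirror truncate_utf8() itself: if the cut always backs up to the last space, an entity that sits inside a word can never be split. The function name truncate_on_word_boundary is hypothetical.

```python
# Hypothetical helper for illustration; not Drupal's implementation.
def truncate_on_word_boundary(text, limit):
    """Cut text at limit characters, backing up to the previous space."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the next character is whitespace, the cut already ends a word.
    if text[limit].isspace():
        return cut.rstrip()
    # Otherwise drop the partial last word, if there is a space to back up to.
    head, sep, _ = cut.rpartition(" ")
    return head.rstrip() if sep else cut

print(truncate_on_word_boundary("one &eacute;t&eacute; two", 10))
```

Here the limit of 10 falls inside the first entity, so the whole word containing it is dropped and the result is "one". When the text has no space before the limit, this sketch falls back to the plain cut, so an entity could still be split in that case.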
Comment #8
AlexisWilke commented
I have a partial fix here for you guys:
#221257: text_summary() should output valid HTML and Unicode text
This won't check entities though. Good point! 8-}
I use FCKeditor with the entities feature turned off.
Thank you.
Alexis