Hello,
When I create a node containing HTML entities (typed manually, or inserted by FCKeditor, which replaces some characters with their HTML entities), Drupal may cut an entity in half when generating the teaser. For example, if my node body is:

This is a post with some HTML entities é éééé éé éééé éé éééé éé éééé The end

where each 'é' is stored as an HTML entity ("&eacute;"), my teaser is:

This is a post with some HTML entities é éééé éé éééé éé éééé é&ea

I use Filtered HTML input format, and HTML Corrector is activated.

I hope this can get corrected soon. ;)

Regards,
David

Comments

David Stosik

I forgot to mention that the teaser length limit is set to 200 characters.
And I just realized that 'é' written as "&eacute;" counts as 8 characters... :(
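To put numbers on that, here is a quick Python sketch (not Drupal code) comparing how many characters the stored body uses with how many the reader actually sees. It assumes each é is stored as the named entity &eacute;.

```python
import html

# Assumption: each "é" is stored in the node body as the named entity "&eacute;".
stored = "&eacute;" * 4

# Length as counted by a plain character-based teaser limit.
stored_length = len(stored)
# Length after decoding the entities, i.e. what the reader sees on the page.
visible_length = len(html.unescape(stored))

print(stored_length, visible_length)  # prints: 32 4
```

So if a body were made up entirely of such entities, a 200-character limit would cover only 25 visible characters.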

David Stosik

Anybody having the same problem?

span

Yes, I'm having the same problem when using entities for å, ä and ö in Swedish. Has anyone found a good solution?

Stephen Scholtz

Yup, I'm having the same problem, although I don't think this has anything to do with the HTML Corrector filter.

As far as I can tell, the teaser is generated by node_teaser(). Because we haven't specified a delimiter (the <!--break--> code) and we're letting Drupal limit the teaser length automatically, node_teaser() calls truncate_utf8().
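The two paths described above can be modeled roughly like this. It is a simplified Python sketch, not Drupal's PHP. It assumes the delimiter is <!--break--> and that, without one, the fallback is a plain character cut. The function and constant names are hypothetical.

```python
# Assumption: the Drupal 6 teaser delimiter is the HTML comment "<!--break-->".
BREAK_DELIMITER = "<!--break-->"

def generate_teaser(body: str, limit: int) -> str:
    """Simplified model of the delimiter vs. auto-limit paths (not Drupal's code)."""
    if BREAK_DELIMITER in body:
        # Explicit delimiter: the author chose the cut point, so no entity is split.
        return body.split(BREAK_DELIMITER, 1)[0]
    # No delimiter: a plain length-based cut, which can land inside "&eacute;".
    return body[:limit]
```

With the delimiter the cut point is chosen by the author. Without it, generate_teaser("caf&eacute; au lait", 5) returns "caf&e", which is the same kind of broken output reported in this thread.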

Nothing too special is going on in truncate_utf8(), except that it chops the HTML entity in half. In my particular case, the entity happens to be part of an anchor tag's title attribute, so the page ends up broken because of a half-open tag. :P

Original code:

...fond de <a title="ontario-h&eacute;bert-toronto star" href="http://www.thestar.com/Canada/Columnist/article/616732" target="_blank">fl&eacute;chissement des appuis conservateurs </a>....

Truncated code:

...fond de <a title="ontario-h&e
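One way to keep the cut from landing inside an entity is this. Find the last '&' before the cut. If an entity starts there and ends after the cut, back off to just before the '&'. Here is a minimal Python sketch of that idea. The function name is hypothetical, and this is not how truncate_utf8() works.

```python
import re

# An HTML entity: named (&eacute;), decimal (&#233;) or hexadecimal (&#xE9;).
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")

def entity_safe_truncate(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters without splitting an entity."""
    if len(text) <= limit:
        return text
    amp = text.rfind("&", 0, limit)
    if amp != -1:
        match = ENTITY.match(text, amp)
        # The entity starts before the cut but ends after it: drop it entirely.
        if match and match.end() > limit:
            return text[:amp]
    return text[:limit]
```

This only protects the entity. In the title-attribute case above, it would return '...fond de <a title="ontario-h', which still leaves a half-open tag behind.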

My guess is that the rest of node_teaser(), the part that tries to cut the teaser back until it finds a sensible break point (a closing paragraph tag, for example), needs to watch out for HTML entities?

Or is it the job of the HTML Corrector filter to catch the broken HTML entity (and, in my case, the broken, half-finished anchor tag) and clean it up? I'm not sure where the fix belongs: in node_teaser() or in the HTML Corrector.
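Wherever the fix ends up living, the half-open tag can be handled the same way as the entity. If the truncated text has a '<' with no '>' after it, drop everything from that '<' onward. Here is a minimal Python sketch. The helper is hypothetical and assumes attribute values never contain a raw '<'. It does not close tags that were opened completely, which would still be the HTML Corrector's job.

```python
def drop_partial_tag(cut: str) -> str:
    """Remove a trailing tag that was opened ('<') but never closed ('>')."""
    last_open = cut.rfind("<")
    if last_open != -1 and cut.find(">", last_open) == -1:
        return cut[:last_open]
    return cut

truncated = '...fond de <a title="ontario-h&e'
print(repr(drop_partial_tag(truncated)))  # prints: '...fond de '
```

In this example the broken entity sits inside the half-open tag, so dropping the tag also removes the "&e" fragment.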

BTW, this is still an issue in Drupal 6.12, which is what I'm currently using.

jhodgdon

Title: HTML Corrector sometimes breaks teaser when using HTML Entities » node teaser generation breaks when using entities
Version: 6.6 » 6.x-dev
Component: filter.module » node.module

I'm sure this is still an issue in the latest node.module.

The root cause of both this issue and the one below is that truncate_utf8() is being used as a substring function, which it should not be.
#200185: truncate_utf8() is used as a substring function
See comment #4:
http://drupal.org/node/200185#comment-662567

jhodgdon

If #768040: truncate_utf8() only works for latin languages (and drupal_substr has a bug) is ever fixed in Drupal 6, it will then be fine to use truncate_utf8() as a substring function. But it would probably be best to tell it to split on word boundaries, which would then take care of this problem.
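Splitting on word boundaries can be sketched like this in Python. It is a hypothetical helper, not Drupal's truncate_utf8(). If the cut lands in the middle of a word, it backs off to the last whitespace. An entity such as &eacute; never contains whitespace, so the cut can no longer land inside one, as long as there is a space somewhere before it.

```python
def truncate_on_word_boundary(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, backing off to the last whitespace."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if text[limit].isspace():
        # The cut already falls between two words.
        return cut.rstrip()
    head = cut.rsplit(None, 1)
    # If there is no whitespace at all, keep the hard cut rather than return "".
    return head[0] if len(head) == 2 else cut
```

One caveat: in the title-attribute example earlier in this thread, the last whitespace before the cut is inside the <a tag. A word-boundary cut would return '...fond de <a', so the half-open tag problem would remain.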

AlexisWilke

I have a partial fix here for you guys:

#221257: text_summary() should output valid HTML and Unicode text

This won't check for entities, though. Good point! 8-}

I use FCKeditor with the entities feature turned off.

Thank you.
Alexis

Status: Active » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.