node teaser generation breaks when using entities [#362609]

Hello,
When I create a node with HTML Entities (manually, or with FCK Editor replacing some characters by their HTML entity), Drupal may break an HTML entity when generating the teaser version. For example, if my node body is:

This is a post with some HTML entities é éééé éé éééé éé éééé éé éééé The end

Where 'é' are HTML entities (so "&eacute;"), my teaser is:

This is a post with some HTML entities é éééé éé éééé éé éééé é&ea

I use Filtered HTML input format, and HTML Corrector is activated.

I hope this can get corrected soon. ;)

Regards,
David

Comments

Comment #1

David Stosik commented 22 January 2009 at 14:59

I forgot to mention that the teaser limit is set to 200 characters.
And I just realized that 'é' written as "&eacute;" counts as 8 characters... :(
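
To make the count concrete, here is a small sketch in plain Python (not Drupal code). The sample body and the limit of 7 are made up for illustration; the limit is chosen so that the cut lands inside the first entity.

```python
# Illustration only: plain Python, not Drupal's teaser code.
entity = "&eacute;"
print(len(entity))  # 8: the entity counts as eight characters, not one

# Hypothetical body in which each accented letter is stored as its entity.
body = "caf" + entity + " cr" + entity + "me"

# Hypothetical limit, picked so the fixed-length cut falls inside an entity.
limit = 7
teaser = body[:limit]
print(teaser)  # caf&eac
```

A fixed-length cut counts the source characters, so the entity is both more expensive than the visible letter and liable to be split.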

Comment #2

David Stosik commented 26 January 2009 at 10:20

Anybody having the same problem?

Comment #3

span commented 24 March 2009 at 20:21

Yes, I'm having the same problem when using entities for å, ä, ö in Swedish. Has anyone found a good solution?

Comment #4

Stephen Scholtz commented 26 May 2009 at 16:36

Yup, I'm having the same problem, although I don't think this has anything to do with the HTML Corrector filter.

As far as I can tell, the teaser is generated by node_teaser(). Because we haven't specified a delimiter (the <!--break--> code) and we're letting Drupal auto-limit the teaser length, node_teaser() calls truncate_utf8().

Nothing too special going on in truncate_utf8(), except the fact that it's chopping the HTML entity in half. In my particular case, the entity happens to be part of an anchor tag's title attribute, so the cut ends up breaking the page 'cuz of a half-opened tag. :P

Original code:

...fond de <a title="ontario-h&eacute;bert-toronto star" href="http://www.thestar.com/Canada/Columnist/article/616732" target="_blank">fl&eacute;chissement des appuis conservateurs </a>....

Truncated code:

...fond de <a title="ontario-h&e

I'm guessing what needs to happen is that the rest of the stuff in node_teaser(), the stuff that's supposed to be trying to cut the teaser back until it finds something useful to break at (ending paragraph tag, for example) needs to be set up to watch out for HTML entities?

Or is it the job of HTML Corrector filter to catch the broken html entity (and in my case, the broken, half finished anchor tag) and clean it up? I'm not sure where the solution to this problem should go, in node_teaser or in the HTML Corrector.
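
As a rough illustration of the first option (making the truncation step itself watch out for entities and open tags), here is a minimal sketch in plain Python. It is not Drupal's node_teaser() or truncate_utf8(); the function name truncate_safely, the regular expression, and the "last '<' without a later '>'" heuristic are all assumptions made for this example.

```python
import re

# Matches named, decimal, and hexadecimal character references.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")

def truncate_safely(text, limit):
    """Cut text to at most `limit` characters without splitting an entity or a tag.

    Hypothetical helper for illustration; not part of Drupal.
    """
    if len(text) <= limit:
        return text
    cut = limit
    for match in ENTITY_RE.finditer(text):
        # The cut falls strictly inside this entity: back off to its start.
        if match.start() < cut < match.end():
            cut = match.start()
            break
        # Entities are visited in order, so later ones cannot contain the cut.
        if match.start() >= cut:
            break
    result = text[:cut]
    # Heuristic: a '<' with no '>' after it means a tag was left half open.
    last_open = result.rfind("<")
    if last_open > result.rfind(">"):
        result = result[:last_open]
    return result

sample = 'fond de <a title="ontario-h&eacute;bert" href="#">fl&eacute;chissement</a>'
print(repr(truncate_safely(sample, 30)))
```

With the sample above, a limit of 30 lands inside the first entity, so the sketch first backs off to before the entity and then drops the half-open anchor tag as well. It does not close elements that were opened before the cut; that part would still be left to something like the HTML Corrector.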

BTW, this is still an issue in Drupal 6.12, which is what I'm currently using.

Comment #5

jhodgdon commented 2 April 2010 at 16:05

Title:	HTML Corrector sometimes breaks teaser when using HTML Entities	» node teaser generation breaks when using entities
Version:	6.6	» 6.x-dev
Component:	filter.module	» node.module

I'm sure this is still an issue in the latest node.module.

The reason for both of these issues is that truncate_utf8() should not be used to make a substring.
#200185: truncate_utf8() is used as a substring function
See comment #4:
http://drupal.org/node/200185#comment-662567

Comment #6

jhodgdon commented 11 April 2010 at 14:18

Separate issue on truncate_utf8():
#768040: truncate_utf8() only works for latin languages (and drupal_substr has a bug)

Comment #7

jhodgdon commented 10 June 2010 at 15:30

If #768040: truncate_utf8() only works for latin languages (and drupal_substr has a bug) is ever fixed in Drupal 6, it will then be fine to use truncate_utf8() as a substring function. But it would probably be best to tell it to split on word boundaries, which would then take care of this problem.
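
To show why splitting on word boundaries avoids the broken entity, here is a minimal sketch in plain Python. It is not truncate_utf8(); the function name truncate_wordsafe and the sample text are made up for this example. Because an entity contains no whitespace, backing off to the last space can never leave half an entity behind.

```python
def truncate_wordsafe(text, limit):
    """Cut text to at most `limit` characters, backing off to the last space.

    Hypothetical helper for illustration; not part of Drupal.
    """
    if len(text) <= limit:
        return text
    # The character right after the cut is a space: already on a boundary.
    if text[limit] == " ":
        return text[:limit]
    head = text[:limit]
    last_space = head.rfind(" ")
    # No space at all: fall back to a hard cut (an entity could still be split).
    if last_space == -1:
        return head
    return head[:last_space]

body = "entities &eacute; &eacute;&eacute;&eacute;&eacute; The end"
print(truncate_wordsafe(body, 22))  # entities &eacute;
```

One caveat: a space inside a tag attribute (such as the title attribute in comment #4) also counts as a word boundary here, so this approach alone would not prevent a half-open tag.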

Comment #8

AlexisWilke commented 19 September 2010 at 09:31

I have a partial fix here for you guys:

#221257: text_summary() should output valid HTML and Unicode text

This won't check entities though. Good point! 8-}

I use FCKeditor with the entities feature turned off.

Thank you.
Alexis

Comment #9

2 March 2016 at 22:18

Status:	Active	» Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.
