The Post settings page says:

Length of trimmed posts:
The maximum number of characters used in the trimmed version of a post. Drupal will use this setting to determine at which offset long posts should be trimmed.

HTML markup should not be counted as "characters" when determining where to trim the post. Currently this causes a particular problem with feeds which often embed a lot of tags in their content, and for which reliable automatic teaser generation is a must. Attached patch uses a regex to allow the node_teaser() function to work out where in $body corresponds to 600 characters (or whatever the value of $size is) of actual content.

Have tested with a variety of content, seems to work reliably but obviously needs some thorough independent testing.

A minor side effect (benefit) of this patch is that mangled tags (such as <stron>) should be prevented.

Could be enhanced so that anything between <script></script> or <style><style> tags is also "not counted", but I'm not convinced this is adding any significant value.

Also modified a couple of comments to make more sense given that we now have the JS teaser splitter.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Rowanw’s picture

I would recommend renaming $cum_length to something else.

gpk’s picture

OK OK and erm thanks for the link :-P

So ... $cumulative_length ? $cumul_length ? I'm uninspired this morning...

catch’s picture

$cumulative_length
Sounds fine to me. and lol at rowanw, well spotted.

gpk’s picture

Yes, thanks, it looks fine, I guess I'm thinking back to the days of 80 x 24 B&W monitor and 32KB RAM when less was more!

cburschka’s picture

Now that the code is SFW, I guess I can test. =P

With the patch, nodes that are heavy on markup will not be trimmed as short as they would otherwise. So it works as advertised.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Works for me too.

Gábor Hojtsy’s picture

Version: 6.x-dev » 7.x-dev
Category: bug » feature

This is a whole new feature, and goes to Drupal 7. The trim formatting works as in Drupal 6 for years now.

gpk’s picture

Status: Reviewed & tested by the community » Needs work

Actually there is a problem with my patch since if you use HTML comments in the Body to comment out a section of markup, the "breakpoint search algorithm" may find a suitable breakpoint within the comment, resulting in broken HTML. This is not a new bug, but should really be fixed at the same time IMO.

Gábor Hojtsy’s picture

Status: Needs work » Reviewed & tested by the community

Does not sound like a related problem to me, please open a new issue with reproduction instructions.

gpk’s picture

Done, see http://drupal.org/node/221257.

It's not so much that it's related but having used a regex to split out the tags it's then relatively easy to fix that problem too (and reduce the possible extent of other mangling of markup by node_teaser()).

gpk’s picture

Category: feature » bug

(re. #7)
Ahhh, OK, I thought perhaps I heard trumpets tuning up ... :-D

I still think this is more of a (long standing) bug than a feature though because it just doesn't do anything sensible in some pretty ordinary situations, and as a result Drupal's automatic teasers don't have much of a reputation. For example see http://drupal.org/node/155337#comment-645458 and http://drupal.org/node/181720. And there are other posts where people describe various ways of hacking node_teaser() to get it to do something reasonable for their site.

sun’s picture

Title: HTML markup in a post should not be counted when trimming the teaser to the required length » HTML markup is counted in for the length of trimmed posts
Version: 7.x-dev » 6.x-dev
Status: Reviewed & tested by the community » Needs review
FileSize
1.24 KB

Me too thinks that this is a long outstanding bug in Drupal. Take for example Image Assist, insert an image to the beginning of a post, and set the length of trimmed posts to a small value. Readers of your post will see only an image where normally a teaser text should be displayed.

The markup for a regular image displayed inline with Image Assist is about 400 characters. A teaser text with about 400 characters is... quite long. Drupal admins are faced with a situation in which they need to explain their users to properly use the teaser delimiter, or need to edit affected nodes themselves.

Thanks to gpk's input, here's a much cleaner and simpler patch that fixes this bug.

catch’s picture

Version: 6.x-dev » 7.x-dev

afaik, bugs still have to be fixed in 7.x then be backported since yesterday. But someone please correct me if I'm wrong. I'd like to see this fixed in 6.x though - it's only a little bit featurish and a lot bug fixing.

gpk’s picture

@sun: If I've understood Image Assist correctly you can set it to insert a custom [img_assist] tag. In this case, since teaser generation happens *before* filtering, node_teaser() will not recognise the tag, which will still count towards the teaser length (and in rare instances node_teaser could break it). At least the custom tag is not as long as an HTML <img> tag though.

Maybe you were referring to the HTML Code insert behaviour?

(And thanks for simplifying the patch BTW!)

gpk’s picture

Status: Needs review » Needs work

Hmm this seems pretty broken now. Needs to capture the length of markup as well as text: $chunks[i][1] is not the length of markup (and can generate an "undefined offset" PHP notice. Also immediately after the loop should check for $text_length <= $size and return $body if so.

sun’s picture

Err... my fault, don't know how I hit on that. :-/

Now, I don't know how much this patch gains us. While it certainly would work for posts containing plenty of HTML markup, I strongly believe that many users experience this bug when a post contains Inline macros which also can become quite lengthy.

Cory Goodwin’s picture

Can any of these patches work with 5.7?

vladimir.dolgopolov’s picture

FileSize
2.06 KB

I have made the test (for Simpletest module) for the issue, so patch #4 pass and patch #12 fail the one.

gpk’s picture

Status: Needs work » Needs review
FileSize
2.01 KB

Thanks, don't fully understand how the test works but your results are as I'd expect, i.e. #4 is OK and #12 doesn't work.

I couldn't see any way of fixing #12 without ending up with something like #4. Have rerolled #4, and changed one comment, renamed $split -> $chunks, and removed unwanted whitespace from one line. Otherwise identical to #4. Probably RTBC as per Gábor @ #4 but I'll leave it at CNR for now. Still plays nicely :-).

gpk’s picture

@sun: forgot to say in previous comment - I think inline macros would need further thought; originally I did not intend this issue to deal with them. In principle it's just another regex but I'm not sufficiently familiar with typical macros used to be able to propose a sensible way forward. Maybe any macro(s) used by a filter and that should be excluded from teaser length calculations could be declared for specified $delta in hook_filter() with $op = 'list macros' or somesuch. Or maybe, even, teaser generation should happen *after* filtering ...? Anyway I think this would belong in a new issue. I'd still say the attached patch is very worthwhile even though it doesn't address the question of inline macros.

Freso’s picture

One thing about in-line macros is that they pretty much just replace themselves with something else. If the teaser 'splitting' (would) happen after having run the applicable filters, these should be a non-issue. I don't know how it works now, but that might be an idea. Thought it would probably be somewhat of a resource drain. Hm.

Robin Monks’s picture

Status: Needs review » Needs work

Patch applied OK, results of the node teaser simpletest in core. For this patch to get in core you will need to include an updated test as well.

http://www.picamatic.com/show/2008/09/01/04/55/933749_958x3667.png

Robin

jrao’s picture

So what is the status of this issue now? It's been abandoned? It's too bad since I think the patch is try to solve a serious problem in Drupal's teaser feature.

Dane Powell’s picture

I agree- this is a serious problem on my site and unfortunately it's not one that's easily fixed unless it's fixed in core. If this doesn't make it into 7, does anyone know of any way to backport it to a 6.x module?

butler360’s picture

I'd like to see this in 6.x as well.

bryancasler’s picture

Is #19 for D6.x ?

BrightBold’s picture

+1 for backporting to 6.x - I found this thread while looking for a way to have a view on my homepage with three articles side-by-side, trimmed to equal length. But the fact that links and other HTML code count towards the trim length make this difficult, so it's great that you are working on a fix.

sun’s picture

Component: node.module » field system

re @Freso #21:

One thing about in-line macros is that they pretty much just replace themselves with something else. If the teaser 'splitting' (would) happen after having run the applicable filters, these should be a non-issue. I don't know how it works now, but that might be an idea. Thought it would probably be somewhat of a resource drain. Hm.

I agree that this is actually how it should work. Less a performance issue, as filtered content is cached.

Cutting after filtering would mean that we'd have to manually run the HTML Corrector filter on the resulting markup (which actually happens already).

The new/current code lives in http://api.drupal.org/api/function/text_summary/7

It would be awesome to get this epic bug finally fixed for D7.

Moving to "field system" component, as there is no component for Text module yet.

BrightBold’s picture

It would be awesome to get this epic bug finally fixed for D7.

It would indeed. But does the fact that sun's link points to api.drupal.org mean it's an API change and that ship has sailed? (Sorry for my ignorance on back-end core issues.) If there's no chance of this getting solved for D7, can we bump to D8? Or is there still hope? (Even a partial solution that allowed for images but omitted macros would be better than what we have now!)

bryancasler’s picture

I will sponsor a patch to d6. By sponsor I mean I will pay for an have delivered a pizza to your house. Since it's almost the weekend I though I would offer.

james.williams’s picture

the patch #19 looks fairly good for D6. However, it can truncate text before ending HTML tags. (it doesn't chop tags in half, but can be cut off the text between a starting tag and an ending tag.)

Full testing is still needed on it, but it's definitely an improvement over the existing code, so I would suggest that if people want to at least improve the current broken state of the teasering, use that patch to at least improve things, and then test it some more :-)

bryancasler’s picture

Been using it with no problems, but I don't think the final count is accurate still.

ex. www.ivawbeta.org/staff

sun’s picture

Status: Needs work » Needs review
FileSize
2.3 KB

Straight re-roll against HEAD. Didn't actually review or verify the code.

Status: Needs review » Needs work

The last submitted patch, drupal.text-summary-html.33.patch, failed testing.

bryancasler’s picture

Anyone able to take a look at this again?

pillarsdotnet’s picture

pillarsdotnet’s picture

Status: Needs work » Closed (duplicate)
alexverb’s picture

If anyone is still looking for a D7 solution I have a sandbox provided here: [D7] Field HTML Trim. If you want to get this into contrib, help me out with some reviews.