The Post settings page says:
Length of trimmed posts:
The maximum number of characters used in the trimmed version of a post. Drupal will use this setting to determine at which offset long posts should be trimmed.
HTML markup should not be counted as "characters" when determining where to trim the post. Currently this causes a particular problem with feeds which often embed a lot of tags in their content, and for which reliable automatic teaser generation is a must. Attached patch uses a regex to allow the node_teaser() function to work out where in $body corresponds to 600 characters (or whatever the value of $size is) of actual content.
Have tested with a variety of content, seems to work reliably but obviously needs some thorough independent testing.
A minor side effect (benefit) of this patch is that mangled tags (such as <stron>) should be prevented.
Could be enhanced so that anything between <script></script> or <style><style> tags is also "not counted", but I'm not convinced this is adding any significant value.
Also modified a couple of comments to make more sense given that we now have the JS teaser splitter.
Comment | File | Size | Author |
---|---|---|---|
#33 | drupal.text-summary-html.33.patch | 2.3 KB | sun |
#19 | node_teaser_markup_220783_1.patch | 2.01 KB | gpk |
#18 | 220783.test_.txt | 2.06 KB | vladimir.dolgopolov |
#12 | drupal6.node-teaser-markup.patch | 1.24 KB | sun |
#4 | node_teaser_markup_220783.patch | 2.06 KB | gpk |
Comments
Comment #1
Rowanw CreditAttribution: Rowanw commentedI would recommend renaming $cum_length to something else.
Comment #2
gpk CreditAttribution: gpk commentedOK OK and erm thanks for the link :-P
So ... $cumulative_length ? $cumul_length ? I'm uninspired this morning...
Comment #3
catch$cumulative_length
Sounds fine to me. and lol at rowanw, well spotted.
Comment #4
gpk CreditAttribution: gpk commentedYes, thanks, it looks fine, I guess I'm thinking back to the days of 80 x 24 B&W monitor and 32KB RAM when less was more!
Comment #5
cburschkaNow that the code is SFW, I guess I can test. =P
With the patch, nodes that are heavy on markup will not be trimmed as short as they would otherwise. So it works as advertised.
Comment #6
catchWorks for me too.
Comment #7
Gábor HojtsyThis is a whole new feature, and goes to Drupal 7. The trim formatting works as in Drupal 6 for years now.
Comment #8
gpk CreditAttribution: gpk commentedActually there is a problem with my patch since if you use HTML comments in the Body to comment out a section of markup, the "breakpoint search algorithm" may find a suitable breakpoint within the comment, resulting in broken HTML. This is not a new bug, but should really be fixed at the same time IMO.
Comment #9
Gábor HojtsyDoes not sound like a related problem to me, please open a new issue with reproduction instructions.
Comment #10
gpk CreditAttribution: gpk commentedDone, see http://drupal.org/node/221257.
It's not so much that it's related but having used a regex to split out the tags it's then relatively easy to fix that problem too (and reduce the possible extent of other mangling of markup by node_teaser()).
Comment #11
gpk CreditAttribution: gpk commented(re. #7)
Ahhh, OK, I thought perhaps I heard trumpets tuning up ... :-D
I still think this is more of a (long standing) bug than a feature though because it just doesn't do anything sensible in some pretty ordinary situations, and as a result Drupal's automatic teasers don't have much of a reputation. For example see http://drupal.org/node/155337#comment-645458 and http://drupal.org/node/181720. And there are other posts where people describe various ways of hacking node_teaser() to get it to do something reasonable for their site.
Comment #12
sunMe too thinks that this is a long outstanding bug in Drupal. Take for example Image Assist, insert an image to the beginning of a post, and set the length of trimmed posts to a small value. Readers of your post will see only an image where normally a teaser text should be displayed.
The markup for a regular image displayed inline with Image Assist is about 400 characters. A teaser text with about 400 characters is... quite long. Drupal admins are faced with a situation in which they need to explain their users to properly use the teaser delimiter, or need to edit affected nodes themselves.
Thanks to gpk's input, here's a much cleaner and simpler patch that fixes this bug.
Comment #13
catchafaik, bugs still have to be fixed in 7.x then be backported since yesterday. But someone please correct me if I'm wrong. I'd like to see this fixed in 6.x though - it's only a little bit featurish and a lot bug fixing.
Comment #14
gpk CreditAttribution: gpk commented@sun: If I've understood Image Assist correctly you can set it to insert a custom [img_assist] tag. In this case, since teaser generation happens *before* filtering, node_teaser() will not recognise the tag, which will still count towards the teaser length (and in rare instances node_teaser could break it). At least the custom tag is not as long as an HTML <img> tag though.
Maybe you were referring to the HTML Code insert behaviour?
(And thanks for simplifying the patch BTW!)
Comment #15
gpk CreditAttribution: gpk commentedHmm this seems pretty broken now. Needs to capture the length of markup as well as text: $chunks[i][1] is not the length of markup (and can generate an "undefined offset" PHP notice. Also immediately after the loop should check for $text_length <= $size and return $body if so.
Comment #16
sunErr... my fault, don't know how I hit on that. :-/
Now, I don't know how much this patch gains us. While it certainly would work for posts containing plenty of HTML markup, I strongly believe that many users experience this bug when a post contains Inline macros which also can become quite lengthy.
Comment #17
Cory Goodwin CreditAttribution: Cory Goodwin commentedCan any of these patches work with 5.7?
Comment #18
vladimir.dolgopolov CreditAttribution: vladimir.dolgopolov commentedI have made the test (for Simpletest module) for the issue, so patch #4 pass and patch #12 fail the one.
Comment #19
gpk CreditAttribution: gpk commentedThanks, don't fully understand how the test works but your results are as I'd expect, i.e. #4 is OK and #12 doesn't work.
I couldn't see any way of fixing #12 without ending up with something like #4. Have rerolled #4, and changed one comment, renamed $split -> $chunks, and removed unwanted whitespace from one line. Otherwise identical to #4. Probably RTBC as per Gábor @ #4 but I'll leave it at CNR for now. Still plays nicely :-).
Comment #20
gpk CreditAttribution: gpk commented@sun: forgot to say in previous comment - I think inline macros would need further thought; originally I did not intend this issue to deal with them. In principle it's just another regex but I'm not sufficiently familiar with typical macros used to be able to propose a sensible way forward. Maybe any macro(s) used by a filter and that should be excluded from teaser length calculations could be declared for specified $delta in hook_filter() with $op = 'list macros' or somesuch. Or maybe, even, teaser generation should happen *after* filtering ...? Anyway I think this would belong in a new issue. I'd still say the attached patch is very worthwhile even though it doesn't address the question of inline macros.
Comment #21
Freso CreditAttribution: Freso commentedOne thing about in-line macros is that they pretty much just replace themselves with something else. If the teaser 'splitting' (would) happen after having run the applicable filters, these should be a non-issue. I don't know how it works now, but that might be an idea. Thought it would probably be somewhat of a resource drain. Hm.
Comment #22
Robin Monks CreditAttribution: Robin Monks commentedPatch applied OK, results of the node teaser simpletest in core. For this patch to get in core you will need to include an updated test as well.
http://www.picamatic.com/show/2008/09/01/04/55/933749_958x3667.png
Robin
Comment #23
jrao CreditAttribution: jrao commentedSo what is the status of this issue now? It's been abandoned? It's too bad since I think the patch is try to solve a serious problem in Drupal's teaser feature.
Comment #24
Dane Powell CreditAttribution: Dane Powell commentedI agree- this is a serious problem on my site and unfortunately it's not one that's easily fixed unless it's fixed in core. If this doesn't make it into 7, does anyone know of any way to backport it to a 6.x module?
Comment #25
butler360 CreditAttribution: butler360 commentedI'd like to see this in 6.x as well.
Comment #26
bryancasler CreditAttribution: bryancasler commentedIs #19 for D6.x ?
Comment #27
BrightBold+1 for backporting to 6.x - I found this thread while looking for a way to have a view on my homepage with three articles side-by-side, trimmed to equal length. But the fact that links and other HTML code count towards the trim length make this difficult, so it's great that you are working on a fix.
Comment #28
sunre @Freso #21:
I agree that this is actually how it should work. Less a performance issue, as filtered content is cached.
Cutting after filtering would mean that we'd have to manually run the HTML Corrector filter on the resulting markup (which actually happens already).
The new/current code lives in http://api.drupal.org/api/function/text_summary/7
It would be awesome to get this epic bug finally fixed for D7.
Moving to "field system" component, as there is no component for Text module yet.
Comment #29
BrightBoldIt would indeed. But does the fact that sun's link points to api.drupal.org mean it's an API change and that ship has sailed? (Sorry for my ignorance on back-end core issues.) If there's no chance of this getting solved for D7, can we bump to D8? Or is there still hope? (Even a partial solution that allowed for images but omitted macros would be better than what we have now!)
Comment #30
bryancasler CreditAttribution: bryancasler commentedI will sponsor a patch to d6. By sponsor I mean I will pay for an have delivered a pizza to your house. Since it's almost the weekend I though I would offer.
Comment #31
james.williams CreditAttribution: james.williams commentedthe patch #19 looks fairly good for D6. However, it can truncate text before ending HTML tags. (it doesn't chop tags in half, but can be cut off the text between a starting tag and an ending tag.)
Full testing is still needed on it, but it's definitely an improvement over the existing code, so I would suggest that if people want to at least improve the current broken state of the teasering, use that patch to at least improve things, and then test it some more :-)
Comment #32
bryancasler CreditAttribution: bryancasler commentedBeen using it with no problems, but I don't think the final count is accurate still.
ex. www.ivawbeta.org/staff
Comment #33
sunStraight re-roll against HEAD. Didn't actually review or verify the code.
Comment #35
bryancasler CreditAttribution: bryancasler commentedAnyone able to take a look at this again?
Comment #36
pillarsdotnet CreditAttribution: pillarsdotnet commented@animelion -- see #221257: text_summary() should output valid HTML and Unicode text, and not count markup characters as part of the text length, especially comment #93: Summary requested by ysched.
Comment #37
pillarsdotnet CreditAttribution: pillarsdotnet commentedMarking this as a dup of #221257: text_summary() should output valid HTML and Unicode text, and not count markup characters as part of the text length since that issue has a working patch.
Comment #38
alexverb CreditAttribution: alexverb commentedIf anyone is still looking for a D7 solution I have a sandbox provided here: [D7] Field HTML Trim. If you want to get this into contrib, help me out with some reviews.