print node_teaser("A question? A sentence. Another sentence.", 0, 30) returns "A question?" instead of "A question? A sentence."

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Bevan’s picture

Version: 7.x-dev » 6.0-beta1
FileSize
2.36 KB

A simple fix for this is

 Index: node.module
===================================================================
RCS file: /cvs/drupal/drupal/modules/node/node.module,v
retrieving revision 1.889
diff -r1.889 node.module
314c314
<       $position = 0 - $length - $offset;
---
>       $position = 0 - $min_length - $offset;

Which makes me think that this was a simple oversight that never got discovered or fixed till now. However further evaluation of the code led me to find other areas that could be improved, leading to this patch

yched’s picture

Status: Needs review » Needs work

Interesting. I also encountered strange teaser cuts.

But you have to create your patches in the format expected on drupal.org if you want them to get reviewed (not to mention applied :-) : http://drupal.org/patch/create

Bevan’s picture

FileSize
2.65 KB

Oh bugger. This again. I hope this is the right format.

Owen Barton’s picture

This looks good to me. Certainly the $position = 0 - $length - $offset; bug should be fixed in Drupal 6.

It would be good to get some testers (especially multilingual) to try it out and use retease module to regenerate all their site teasers (on a dev site, of course) to verify that this is still playing nicely with all the I18n edge cases.

In terms of code style, the comments should wrap at 80 columns.
I think having a separate $breakpoints array (as it was before) was neater, but that if very subjective ;)

Bevan’s picture

Status: Needs work » Needs review

Thanks for reviewing Owen. bump

seanr’s picture

Works here as far as I can tell.

Bevan’s picture

Hmmm, RTBC?

steinmb’s picture

Status: Needs review » Reviewed & tested by the community

Tested on nightly build 3.nov 2007. That patch works just fine. RTBC.

--
Stein Magne

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

Coding style concerns were not resolved.

Bevan’s picture

Version: 6.0-beta1 » 6.x-dev
Status: Needs work » Reviewed & tested by the community
FileSize
2.75 KB

I addressed the two issues grugnog raised in #5; length of comment lines and declaring breakpoints in arrays outside of the foreach clause. I can't find any other issues using coder module or by skimming through coding standards http://drupal.org/node/318.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

Seems like a little bit of optimization could still be possible with reusing $size at the end instead of the strlen($reverse). $size contains the same value at that point still as it seems. Also add a comment there, that // If a better breakpoint is found before the end of the maximum teaser length, break there. Or something along these lines, so we remember what $size is. (Or maybe better to rename $size to $max_length, and leave $size alone, not changing its meaning).

Bevan’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
2.74 KB

Ah, good spotting. I also implemented a === operator as these should both be integers. I think this communicates that more clearly to any future coders. I did a more extended round of testing on this one to be sure it doens't break anything.

Bevan’s picture

FileSize
2.74 KB

Ah, good spotting. I also implemented a === operator as these should both be integers. I think this communicates that more clearly to anyone reading this code. I did a more extended round of testing on this one to be sure it doesn't break anything.

Bevan’s picture

Ah, good spotting. I also implemented a === operator as these should both be integers. I think this communicates that more clearly to any future coders. I did a more extended round of testing on this one to be sure it doesn't break anything.

This is the third time I've tried to upload the patch. I'm leaving the attachment off this time...

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

As I said, using $max_length would be so much cleaner, instead of redefining what $size means. Also the return at the end of function could make use of this $max_length.

Bevan’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
3.04 KB

Ooops. I hadn't understood all of your previous comment. I implemented $max_length and left $size unchanged. I agree it is tidier that way and easier to understand. I also reevaluated all the comments in this patch and updated to make them a little clearer and consistent.

Sorry for the duplicate posts. This new comment system takes several minutes to display new posts, it seems.

Bevan’s picture

I rewrote this patch with some minor 'intelligence' enhancements and reducing the amount of code. I put it in a new issue, as I'm not sure if the new patch is a feature or still just a bug fix. http://drupal.org/node/190374

If 190374 is still good for d6, then close this in favor of that. Otherwise we can push 190374 off to d7.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

I think we can fold in all code improvements without adding the word boundary feature (ie. the third breakpoints group). Otherwise your patch in the other issue looks better.

Bevan’s picture

Assigned: Unassigned » Bevan

Okay. So I'm gonna reroll 190374 as a feature-less bug fix (removing the third breakpoints and the sentence-in-parenthesis cases) and upload here. I'll push 190374 off to d7...

Thanks for your time Gabor. Your comments and contributions are MASSIVELY appreciated! :)

Bevan’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
3.41 KB

So this is 190374 rerolled without the features of words and sentences in parenthesis. I tested 190374 very extensively. The changes here are only 'configuration' changes due to the new simpler and more flexible structure. So I didn't test this one as much.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

I don't think the second $break_points[] needs to be on multiple lines. Also we modified enough that this needs some independent (ie. not bevan) testing again. So once this coding style error is fixed, set this to "for review", and let's see what people find.

Bevan’s picture

Status: Needs work » Needs review
FileSize
3.09 KB

Ah, yes. That was leftover from 190374's features.

STyL3’s picture

subscribe

Gábor Hojtsy’s picture

Status: Needs review » Needs work

Phew, not a unified patch. :(

Bevan’s picture

FileSize
3.33 KB

doh

Bevan’s picture

Status: Needs work » Needs review
gpk’s picture

Status: Needs review » Needs work

Hi Bevan and thanks for picking up the problematic node_teaser() function which has been causing angst for a long time!

I ran some tests (output is below) and unfortunately there are a couple of new issues.

The first is that in some cases an empty teaser is returned. This is because the case $position == 0 (in the unpatched version) is no longer handled.

The second is that the patched version is much more likely to return invalid HTML. This is because in the original version it prioritised <p> over <br> and more to the point over an embedded line break - in other words, it "tried" to return a complete <p> element.

This also means that the behaviour of the function has changed somewhat when handling the different types of line break as it no longer prioritises them in the order specified.

On the plus side, the code is now much better commented. On more subjective matters, perhaps variable names could be improved further? Depending on which variable it is being used in the tem "length" seems to mean either the length of a string, a position in a string (from the RH end), or the size of a chunk to truncate from the RH end. I'd suggest using something like $rpos instead of $length, and $rtrunc instead of $min_length.

I know you've already taken out word breaks from this but I wonder if it would be best for now to go back to something like one of your earlier patches which would I think fix the original problem without introducing new issues, and then pick up any outstanding functionality we want in a separate issue?

Sorry to be the bearer of bad news! Anyway, here's the PHP code I used to test the function:

<pre>
<?php
$a = "<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p>";

// Print out $a
print htmlspecialchars($a, ENT_QUOTES) . "\n\n";
// Print out $a with embedded newlines shown as \n
print htmlspecialchars(str_replace("\n", '\n', $a), ENT_QUOTES) . ' (length: ' . strlen($a) . ")\n\n";

// Test node_teaser() for different values of $size
for ($size = 37; $size >= 0; $size--) {
  $b = node_teaser($a, NULL, $size);
  print htmlspecialchars(str_replace("\n", '\n', $b), ENT_QUOTES) . ' (size: ' . $size . ', actual length: ' . strlen($b) . ")\n";
}
?>
</pre>

Results from running this after patch 12:

<p>
Hi
</p>
<p>
folks
<br />
!
</p>

<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (length: 35)

<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 37, actual length: 35)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 36, actual length: 35)
 (size: 35, actual length: 0)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n! (size: 34, actual length: 30)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n! (size: 33, actual length: 30)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n! (size: 32, actual length: 30)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n! (size: 31, actual length: 30)
<p>\nHi\n</p>\n<p>\nfolks\n<br /> (size: 30, actual length: 28)
<p>\nHi\n</p>\n<p>\nfolks\n<br /> (size: 29, actual length: 28)
<p>\nHi\n</p>\n<p>\nfolks\n (size: 28, actual length: 22)
<p>\nHi\n</p>\n<p>\nfolks (size: 27, actual length: 21)
<p>\nHi\n</p>\n<p>\nfolks (size: 26, actual length: 21)
<p>\nHi\n</p>\n<p>\nfolks (size: 25, actual length: 21)
<p>\nHi\n</p>\n<p>\nfolks (size: 24, actual length: 21)
<p>\nHi\n</p>\n<p>\nfolks (size: 23, actual length: 21)
<p>\nHi\n</p>\n<p>\nfolks (size: 22, actual length: 21)
<p>\nHi\n</p>\n<p> (size: 21, actual length: 15)
<p>\nHi\n</p>\n<p> (size: 20, actual length: 15)
<p>\nHi\n</p>\n<p> (size: 19, actual length: 15)
<p>\nHi\n</p>\n<p> (size: 18, actual length: 15)
<p>\nHi\n</p>\n<p> (size: 17, actual length: 15)
<p>\nHi\n</p>\n<p> (size: 16, actual length: 15)
<p>\nHi\n</p> (size: 15, actual length: 11)
<p>\nHi\n</p> (size: 14, actual length: 11)
<p>\nHi\n</p> (size: 13, actual length: 11)
<p>\nHi\n</p> (size: 12, actual length: 11)
 (size: 11, actual length: 0)
<p>\nHi (size: 10, actual length: 6)
<p>\nHi (size: 9, actual length: 6)
<p>\nHi (size: 8, actual length: 6)
<p>\nHi (size: 7, actual length: 6)
<p> (size: 6, actual length: 3)
<p> (size: 5, actual length: 3)
<p> (size: 4, actual length: 3)
<p> (size: 3, actual length: 3)
<p (size: 2, actual length: 2)
< (size: 1, actual length: 1)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 0, actual length: 35)

And the test results for the unpatched node_teaser():

<p>
Hi
</p>
<p>
folks
<br />
!
</p>

<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (length: 35)

<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 37, actual length: 35)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 36, actual length: 35)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 35, actual length: 35)
<p>\nHi\n</p> (size: 34, actual length: 11)
<p>\nHi\n</p> (size: 33, actual length: 11)
<p>\nHi\n</p> (size: 32, actual length: 11)
<p>\nHi\n</p> (size: 31, actual length: 11)
<p>\nHi\n</p> (size: 30, actual length: 11)
<p>\nHi\n</p> (size: 29, actual length: 11)
<p>\nHi\n</p> (size: 28, actual length: 11)
<p>\nHi\n</p> (size: 27, actual length: 11)
<p>\nHi\n</p> (size: 26, actual length: 11)
<p>\nHi\n</p> (size: 25, actual length: 11)
<p>\nHi\n</p> (size: 24, actual length: 11)
<p>\nHi\n</p> (size: 23, actual length: 11)
<p>\nHi\n</p> (size: 22, actual length: 11)
<p>\nHi\n</p> (size: 21, actual length: 11)
<p>\nHi\n</p> (size: 20, actual length: 11)
<p>\nHi\n</p> (size: 19, actual length: 11)
<p>\nHi\n</p> (size: 18, actual length: 11)
<p>\nHi\n</p> (size: 17, actual length: 11)
<p>\nHi\n</p> (size: 16, actual length: 11)
<p>\nHi\n</p> (size: 15, actual length: 11)
<p>\nHi\n</p> (size: 14, actual length: 11)
<p>\nHi\n</p> (size: 13, actual length: 11)
<p>\nHi\n</p> (size: 12, actual length: 11)
<p>\nHi\n</p> (size: 11, actual length: 11)
<p>\nHi (size: 10, actual length: 6)
<p>\nHi (size: 9, actual length: 6)
<p>\nHi (size: 8, actual length: 6)
<p>\nHi (size: 7, actual length: 6)
<p> (size: 6, actual length: 3)
<p> (size: 5, actual length: 3)
<p> (size: 4, actual length: 3)
<p> (size: 3, actual length: 3)
<p (size: 2, actual length: 2)
< (size: 1, actual length: 1)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 0, actual length: 35)
Bevan’s picture

Thanks for the excellent review and your test code! Kudos to you.

If we want to go with a simple patch that fixes the issue and doesn't improve readability for future editors / readers, then probably #1 or maybe #17 is the way to go. However I'll rework the patch to 'prefer' paragraph breaks.

In the latest patches I've worked this into a format that completely separates breakpoints from code that handles them in such a way that breakpoints can be grouped by preference. So this should just be a matter of regrouping them.

This format will also allow for additional breakpoints to be added and for breakpoints to be customized in the future, which may prove important for i18n.

Bevan’s picture

Status: Needs work » Needs review
FileSize
3.47 KB

And here it is. I renamed length, min_length and max_length to rpos, min_rpos, and max_rpos. I agree 'rpos' makes more sense than 'length', since it is the index from the RHS where the teaser will be sliced, not the length of the teaser. I don't think 'rtrunc' makes anything clearer.

I fixed the case ($min_rpos === 0). I had misunderstood what ($position == 0) was handling in the original.

I left <br> <br /> and \n in the same preference-group as they are all equivalent as far as breaking up text is concerned and make no difference to markup-validity.

The current node_teaser() can also cut a teaser off in the middle of a <p>, but as you pointed out I had made this more likely with my previous patch. I think there is an html corrector filter in drupal core now that should help alleviate that issue.

Btw<p> with no </p> is valid html, it's not valid xhtml.

The output using your test code with this patch is;

<p>
Hi
</p>
<p>
folks
<br />
!
</p>

<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (length: 35)

<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 37, actual length: 35)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 36, actual length: 35)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 35, actual length: 35)
<p>\nHi\n</p> (size: 34, actual length: 11)
<p>\nHi\n</p> (size: 33, actual length: 11)
<p>\nHi\n</p> (size: 32, actual length: 11)
<p>\nHi\n</p> (size: 31, actual length: 11)
<p>\nHi\n</p> (size: 30, actual length: 11)
<p>\nHi\n</p> (size: 29, actual length: 11)
<p>\nHi\n</p> (size: 28, actual length: 11)
<p>\nHi\n</p> (size: 27, actual length: 11)
<p>\nHi\n</p> (size: 26, actual length: 11)
<p>\nHi\n</p> (size: 25, actual length: 11)
<p>\nHi\n</p> (size: 24, actual length: 11)
<p>\nHi\n</p> (size: 23, actual length: 11)
<p>\nHi\n</p> (size: 22, actual length: 11)
<p>\nHi\n</p> (size: 21, actual length: 11)
<p>\nHi\n</p> (size: 20, actual length: 11)
<p>\nHi\n</p> (size: 19, actual length: 11)
<p>\nHi\n</p> (size: 18, actual length: 11)
<p>\nHi\n</p> (size: 17, actual length: 11)
<p>\nHi\n</p> (size: 16, actual length: 11)
<p>\nHi\n</p> (size: 15, actual length: 11)
<p>\nHi\n</p> (size: 14, actual length: 11)
<p>\nHi\n</p> (size: 13, actual length: 11)
<p>\nHi\n</p> (size: 12, actual length: 11)
<p>\nHi\n</p> (size: 11, actual length: 11)
<p>\nHi (size: 10, actual length: 6)
<p>\nHi (size: 9, actual length: 6)
<p>\nHi (size: 8, actual length: 6)
<p>\nHi (size: 7, actual length: 6)
<p> (size: 6, actual length: 3)
<p> (size: 5, actual length: 3)
<p> (size: 4, actual length: 3)
<p> (size: 3, actual length: 3)
<p (size: 2, actual length: 2)
< (size: 1, actual length: 1)
<p>\nHi\n</p>\n<p>\nfolks\n<br />\n!\n</p> (size: 0, actual length: 35)
gpk’s picture

Status: Needs review » Needs work

OK great stuff, will try to have a proper look at this tomorrow my time (GMT-London).

A thought - a newline should only be interpreted as a line break if the input format includes the line break filter. Would it be appropriate to include a check for this in this patch? Would fix most of the problems with null teasers being returned (as per related issues). Frequently null teasers are caused by editors like TinyMCE returning HTML in the format I used in my test, with the initial paragraph longer than the max teaser length. (In such cases the line break filter should actually have been turned off.) This corresponds to the case $size = 6/5/4/3 in my not very realistic test example (except that in a real case there would usually be a sentence within the teaser size limit where a good break could be made). In point of fact $size = 6 ought to return the same as $size = 7. Not quite sure why it doesn't ... need fresh brain ... Not really worth worrying about though.

Spotted another problem - the PHP filter no longer has key 'filter/1'. This one not of your making ;-) Possibly warrants a critical Priority?

Also a comment near the top refers to word breaks (boundaries) ...

Also (2) what do you think about incorporating http://drupal.org/node/115912? Will probably never happen otherwise as it's so trivial.

Yes it looks as though the new HTML corrector filter should fix typical problems with incomplete elements. Therefore it was perhaps not strictly necessary to fix the invalid HTML after all (!), though in my opinion we have a much better solution than relying on the corrector. With the latest patch it does of course also behave more like the original function.

Yes indeed additional breakpoints could be added as a new feature in the future - such as ... all the block elements ..?! To be fair though with the improved teaser handling on the node edit form the need for automatic teaser generation is probably reducing. Will be interesting to see to what extent the WYSYWYG editors integrate with this.

Bevan’s picture

Assigned: Bevan » Unassigned
Status: Needs work » Needs review
FileSize
3.47 KB

Pending your re-testing I think this is probably RTBC. WDYT?

> a newline should only be interpreted as a line break if the input format includes the line break filter. Would it be appropriate to include a check for this in this patch?

This would be a feature, and should wait till d7. We could roll that up as part of 190374.

> Spotted another problem - the PHP filter no longer has key 'filter/1'. This one not of your making ;-) Possibly warrants a critical Priority?

This should go into a new issue for d6 dev. I agree that it's 'critical'.

> Also a comment near the top refers to word breaks (boundaries) ...

Fixed in this patch.

> Also (2) what do you think about incorporating http://drupal.org/node/115912? Will probably never happen otherwise as it's so trivial.

Never say never! :) It's not dealing with the same issue as this patch/issue is. I'm sure someone will get around to committing spatz4000's patch sometime...

> Therefore it was perhaps not strictly necessary to fix the invalid HTML after all

Perhaps not, but I think the issue of preferring <p> over other line breaks is important.

> additional breakpoints could be added as a new feature in the future - such as ... all the block elements ..?!

Maybe not all, but certainly divs, uls, ols, blockquotes and others would be useful. Again, that'd be for d7.

> the need for automatic teaser generation is probably reducing

Yes. I noticed this widget after starting on this patch.

gpk’s picture

Status: Needs review » Reviewed & tested by the community

OK further testing has not revealed any more problems with your patch. Have also tested multibyte content, and examined code in this regard - all looks OK.

> a newline should only be interpreted as a line break if the input format includes the line break filter. Would it be appropriate to include a check for this in this patch?

This would be a feature, and should wait till d7. We could roll that up as part of 190374.

I meant to say, this is actually the bug reported in http://drupal.org/node/155337 - which is where the fix belongs. :-)

Bevan’s picture

Thanks so much for the testing Giles! :) Can I review a patch of yours in exchange?

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Thanks for the continued work and testing guys. Committed.

Bevan’s picture

Thanks Gabor! At last this one can get some rest! :)

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.