Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
print node_teaser("A question? A sentence. Another sentence.", 0, 30)
returns "A question?" instead of "A question? A sentence."
Comment | File | Size | Author |
---|---|---|---|
#32 | node_teaser.patch | 3.47 KB | Bevan |
#30 | node_teaser.patch | 3.47 KB | Bevan |
#26 | node_teaser.patch | 3.33 KB | Bevan |
#23 | node_teaser.patch | 3.09 KB | Bevan |
#21 | node_teaser.patch | 3.41 KB | Bevan |
Comments
Comment #1
Bevan CreditAttribution: Bevan commentedA simple fix for this is
Which makes me think that this was a simple oversight that never got discovered or fixed till now. However further evaluation of the code led me to find other areas that could be improved, leading to this patch
Comment #2
Bevan CreditAttribution: Bevan commentedDuplicates; http://drupal.org/node/155337 http://drupal.org/node/165534
Related: http://drupal.org/node/108456 http://drupal.org/node/115912 old; http://drupal.org/node/73910
Comment #3
yched CreditAttribution: yched commentedInteresting. I also encountered strange teaser cuts.
But you have to create your patches in the format expected on drupal.org if you want them to get reviewed (not to mention applied :-) : http://drupal.org/patch/create
Comment #4
Bevan CreditAttribution: Bevan commentedOh bugger. This again. I hope this is the right format.
Comment #5
Owen Barton CreditAttribution: Owen Barton commentedThis looks good to me. Certainly the $position = 0 - $length - $offset; bug should be fixed in Drupal 6.
It would be good to get some testers (especially multilingual) to try it out and use retease module to regenerate all their site teasers (on a dev site, of course) to verify that this is still playing nicely with all the I18n edge cases.
In terms of code style, the comments should wrap at 80 columns.
I think having a separate $breakpoints array (as it was before) was neater, but that if very subjective ;)
Comment #6
Bevan CreditAttribution: Bevan commentedThanks for reviewing Owen. bump
Comment #7
seanrWorks here as far as I can tell.
Comment #8
Bevan CreditAttribution: Bevan commentedHmmm, RTBC?
Comment #9
steinmb CreditAttribution: steinmb commentedTested on nightly build 3.nov 2007. That patch works just fine. RTBC.
--
Stein Magne
Comment #10
Gábor HojtsyCoding style concerns were not resolved.
Comment #11
Bevan CreditAttribution: Bevan commentedI addressed the two issues grugnog raised in #5; length of comment lines and declaring breakpoints in arrays outside of the foreach clause. I can't find any other issues using coder module or by skimming through coding standards http://drupal.org/node/318.
Comment #12
Gábor HojtsySeems like a little bit of optimization could still be possible with reusing $size at the end instead of the strlen($reverse). $size contains the same value at that point still as it seems. Also add a comment there, that
// If a better breakpoint is found before the end of the maximum teaser length, break there.
Or something along these lines, so we remember what $size is. (Or maybe better to rename $size to $max_length, and leave $size alone, not changing its meaning).Comment #13
Bevan CreditAttribution: Bevan commentedAh, good spotting. I also implemented a === operator as these should both be integers. I think this communicates that more clearly to any future coders. I did a more extended round of testing on this one to be sure it doens't break anything.
Comment #14
Bevan CreditAttribution: Bevan commentedAh, good spotting. I also implemented a === operator as these should both be integers. I think this communicates that more clearly to anyone reading this code. I did a more extended round of testing on this one to be sure it doesn't break anything.
Comment #15
Bevan CreditAttribution: Bevan commentedAh, good spotting. I also implemented a === operator as these should both be integers. I think this communicates that more clearly to any future coders. I did a more extended round of testing on this one to be sure it doesn't break anything.
This is the third time I've tried to upload the patch. I'm leaving the attachment off this time...
Comment #16
Gábor HojtsyAs I said, using $max_length would be so much cleaner, instead of redefining what $size means. Also the return at the end of function could make use of this $max_length.
Comment #17
Bevan CreditAttribution: Bevan commentedOoops. I hadn't understood all of your previous comment. I implemented $max_length and left $size unchanged. I agree it is tidier that way and easier to understand. I also reevaluated all the comments in this patch and updated to make them a little clearer and consistent.
Sorry for the duplicate posts. This new comment system takes several minutes to display new posts, it seems.
Comment #18
Bevan CreditAttribution: Bevan commentedI rewrote this patch with some minor 'intelligence' enhancements and reducing the amount of code. I put it in a new issue, as I'm not sure if the new patch is a feature or still just a bug fix. http://drupal.org/node/190374
If 190374 is still good for d6, then close this in favor of that. Otherwise we can push 190374 off to d7.
Comment #19
Gábor HojtsyI think we can fold in all code improvements without adding the word boundary feature (ie. the third breakpoints group). Otherwise your patch in the other issue looks better.
Comment #20
Bevan CreditAttribution: Bevan commentedOkay. So I'm gonna reroll 190374 as a feature-less bug fix (removing the third breakpoints and the sentence-in-parenthesis cases) and upload here. I'll push 190374 off to d7...
Thanks for your time Gabor. Your comments and contributions are MASSIVELY appreciated! :)
Comment #21
Bevan CreditAttribution: Bevan commentedSo this is 190374 rerolled without the features of words and sentences in parenthesis. I tested 190374 very extensively. The changes here are only 'configuration' changes due to the new simpler and more flexible structure. So I didn't test this one as much.
Comment #22
Gábor HojtsyI don't think the second $break_points[] needs to be on multiple lines. Also we modified enough that this needs some independent (ie. not bevan) testing again. So once this coding style error is fixed, set this to "for review", and let's see what people find.
Comment #23
Bevan CreditAttribution: Bevan commentedAh, yes. That was leftover from 190374's features.
Comment #24
STyL3 CreditAttribution: STyL3 commentedsubscribe
Comment #25
Gábor HojtsyPhew, not a unified patch. :(
Comment #26
Bevan CreditAttribution: Bevan commenteddoh
Comment #27
Bevan CreditAttribution: Bevan commentedComment #28
gpk CreditAttribution: gpk commentedHi Bevan and thanks for picking up the problematic node_teaser() function which has been causing angst for a long time!
I ran some tests (output is below) and unfortunately there are a couple of new issues.
The first is that in some cases an empty teaser is returned. This is because the case $position == 0 (in the unpatched version) is no longer handled.
The second is that the patched version is much more likely to return invalid HTML. This is because in the original version it prioritised <p> over <br> and more to the point over an embedded line break - in other words, it "tried" to return a complete <p> element.
This also means that the behaviour of the function has changed somewhat when handling the different types of line break as it no longer prioritises them in the order specified.
On the plus side, the code is now much better commented. On more subjective matters, perhaps variable names could be improved further? Depending on which variable it is being used in the tem "length" seems to mean either the length of a string, a position in a string (from the RH end), or the size of a chunk to truncate from the RH end. I'd suggest using something like $rpos instead of $length, and $rtrunc instead of $min_length.
I know you've already taken out word breaks from this but I wonder if it would be best for now to go back to something like one of your earlier patches which would I think fix the original problem without introducing new issues, and then pick up any outstanding functionality we want in a separate issue?
Sorry to be the bearer of bad news! Anyway, here's the PHP code I used to test the function:
Results from running this after patch 12:
And the test results for the unpatched node_teaser():
Comment #29
Bevan CreditAttribution: Bevan commentedThanks for the excellent review and your test code! Kudos to you.
If we want to go with a simple patch that fixes the issue and doesn't improve readability for future editors / readers, then probably #1 or maybe #17 is the way to go. However I'll rework the patch to 'prefer' paragraph breaks.
In the latest patches I've worked this into a format that completely separates breakpoints from code that handles them in such a way that breakpoints can be grouped by preference. So this should just be a matter of regrouping them.
This format will also allow for additional breakpoints to be added and for breakpoints to be customized in the future, which may prove important for i18n.
Comment #30
Bevan CreditAttribution: Bevan commentedAnd here it is. I renamed length, min_length and max_length to rpos, min_rpos, and max_rpos. I agree 'rpos' makes more sense than 'length', since it is the index from the RHS where the teaser will be sliced, not the length of the teaser. I don't think 'rtrunc' makes anything clearer.
I fixed the case
($min_rpos === 0)
. I had misunderstood what($position == 0)
was handling in the original.I left
<br>
<br />
and\n
in the same preference-group as they are all equivalent as far as breaking up text is concerned and make no difference to markup-validity.The current
node_teaser()
can also cut a teaser off in the middle of a<p>
, but as you pointed out I had made this more likely with my previous patch. I think there is an html corrector filter in drupal core now that should help alleviate that issue.Btw
<p>
with no</p>
is valid html, it's not valid xhtml.The output using your test code with this patch is;
Comment #31
gpk CreditAttribution: gpk commentedOK great stuff, will try to have a proper look at this tomorrow my time (GMT-London).
A thought - a newline should only be interpreted as a line break if the input format includes the line break filter. Would it be appropriate to include a check for this in this patch? Would fix most of the problems with null teasers being returned (as per related issues). Frequently null teasers are caused by editors like TinyMCE returning HTML in the format I used in my test, with the initial paragraph longer than the max teaser length. (In such cases the line break filter should actually have been turned off.) This corresponds to the case $size = 6/5/4/3 in my not very realistic test example (except that in a real case there would usually be a sentence within the teaser size limit where a good break could be made). In point of fact $size = 6 ought to return the same as $size = 7. Not quite sure why it doesn't ... need fresh brain ... Not really worth worrying about though.
Spotted another problem - the PHP filter no longer has key 'filter/1'. This one not of your making ;-) Possibly warrants a critical Priority?
Also a comment near the top refers to word breaks (boundaries) ...
Also (2) what do you think about incorporating http://drupal.org/node/115912? Will probably never happen otherwise as it's so trivial.
Yes it looks as though the new HTML corrector filter should fix typical problems with incomplete elements. Therefore it was perhaps not strictly necessary to fix the invalid HTML after all (!), though in my opinion we have a much better solution than relying on the corrector. With the latest patch it does of course also behave more like the original function.
Yes indeed additional breakpoints could be added as a new feature in the future - such as ... all the block elements ..?! To be fair though with the improved teaser handling on the node edit form the need for automatic teaser generation is probably reducing. Will be interesting to see to what extent the WYSYWYG editors integrate with this.
Comment #32
Bevan CreditAttribution: Bevan commentedPending your re-testing I think this is probably RTBC. WDYT?
> a newline should only be interpreted as a line break if the input format includes the line break filter. Would it be appropriate to include a check for this in this patch?
This would be a feature, and should wait till d7. We could roll that up as part of 190374.
> Spotted another problem - the PHP filter no longer has key 'filter/1'. This one not of your making ;-) Possibly warrants a critical Priority?
This should go into a new issue for d6 dev. I agree that it's 'critical'.
> Also a comment near the top refers to word breaks (boundaries) ...
Fixed in this patch.
> Also (2) what do you think about incorporating http://drupal.org/node/115912? Will probably never happen otherwise as it's so trivial.
Never say never! :) It's not dealing with the same issue as this patch/issue is. I'm sure someone will get around to committing spatz4000's patch sometime...
> Therefore it was perhaps not strictly necessary to fix the invalid HTML after all
Perhaps not, but I think the issue of preferring
<p>
over other line breaks is important.> additional breakpoints could be added as a new feature in the future - such as ... all the block elements ..?!
Maybe not all, but certainly divs, uls, ols, blockquotes and others would be useful. Again, that'd be for d7.
> the need for automatic teaser generation is probably reducing
Yes. I noticed this widget after starting on this patch.
Comment #33
gpk CreditAttribution: gpk commentedOK further testing has not revealed any more problems with your patch. Have also tested multibyte content, and examined code in this regard - all looks OK.
I meant to say, this is actually the bug reported in http://drupal.org/node/155337 - which is where the fix belongs. :-)
Comment #34
Bevan CreditAttribution: Bevan commentedThanks so much for the testing Giles! :) Can I review a patch of yours in exchange?
Comment #35
Gábor HojtsyThanks for the continued work and testing guys. Committed.
Comment #36
Bevan CreditAttribution: Bevan commentedThanks Gabor! At last this one can get some rest! :)
Comment #37
(not verified) CreditAttribution: commentedAutomatically closed -- issue fixed for two weeks with no activity.