Problem/Motivation

The text_summary function in modules/field/modules/text/text.module will cut off text in the middle of a word if the length of the summary (passed in $size) it is producing is less than the length of the first sentence or paragraph. Also, if there are long sentences then the cut off will not be close to the desired size.

Proposed resolution

Cut at words instead of sentences. This is a problem we solved for Drupal (not 100% standard as the standard is insanely hard to implement but it's good enough): #604002: Poor search support of some Unicode scripts contains the relevant regular expression and #768040: truncate_utf8() only works for latin languages (and drupal_substr has a bug) moved it into unicode.inc so it's available.

Remaining tasks

Use a regular expression to find the last PREG_CLASS_UNICODE_WORD_BOUNDARY plus the few tags being used now, port to D8, write test.

User interface changes

API changes

Data model changes

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

pounard’s picture

Project: Core Library » Drupal core
Version: 7.x-2.x-dev » 7.x-dev
Component: Code » field system

Wrong project issue queue, I'm assigning it to the right one. Please beepy assign the right core version.

beepy’s picture

Thanks for fixing it; sorry about my errors. I don't know what core version to assign; I discovered the bug in an install of 7.8, but the bug still appears to exist in the latest version of 8.x I could find.

longwave’s picture

Status: Active » Needs review
FileSize
701 bytes

Confirm this is an issue with long first paragraphs and short summary trim lengths.

I realise this should be fixed in 8.x and backported, but here's a 7.x patch for testbot to chew on to start with.

Status: Needs review » Needs work

The last submitted patch, 1482174-text-summary-word-break.patch, failed testing.

longwave’s picture

Status: Needs work » Needs review
FileSize
701 bytes

Uploaded the patch with 0 instead of 1, let's try that again (editing text that switches to RTL is hard!)

longwave’s picture

Version: 7.x-dev » 8.x-dev
Status: Needs review » Needs work
Issue tags: +Needs tests, +Needs backport to D7

Changing version, adding tags.

ezheidtmann’s picture

Version: 8.x-dev » 7.x-dev
Status: Needs work » Needs review
FileSize
1.22 KB

Hitting this bug in 7.x; here's one alternate approach for the testbot.

ezheidtmann’s picture

Version: 7.x-dev » 8.x-dev

Passed testbot; bumping back up to 8.x

neRok’s picture

Status: Needs review » Needs work

There is a different - but tightly coupled - issue #1620104: text_summary() returns very small summaries if no stop characters are hit.
I have tested the above patches on drupal 7.22.
I tested the patch in #7, and it did not work for the other issue.
I tested the patch in #5, and it did work for the other issue.

Merging the features of the 2 patches together, I came up with the following code. You would have to read both patches to understand the context of the code (where it goes). I am unable to make a patch at the moment, hopefully someone can my clues out and test the code. I also didnt bother to understand the $break_points array of arrays, but mashed it together in a way that hopefully works for both issues.

  // If the first paragraph is too long, split at the end of a sentence.
  $break_points[] = array('. ' => 1, '! ' => 1, '? ' => 1, '。' => 0, '؟ ' => 1, ' ' => 1);

snip

      $word_breaks = array( ' ' => 0, "\t" => 0);
      // Append ellipsis if we broke a sentence
      if (in_array($point, array_keys($word_breaks))) {
        $summary .= '…';
      }
swentel’s picture

Component: field system » text.module

Moving to text module

hefox’s picture

Issue summary: View changes

#7 works for this specific problem in d7. What's the weird character in 5?

DamienMcKenna’s picture

@hefox: It seems like the code is being mangled slightly by git because the code in the file is correct, it just doesn't convert properly to a git patch for some reason. Maybe that code should be converted to use the chr() command or something?

chx’s picture

This whole function looks like garbage to me. Premature optimization perhaps. Or just too clever. Maybe there was a time when this wanted to produce, I dunno, best summary? but users expect, you know, if they set this to 250 to get a roughly 250 characters long summary, while not cutting in the middle of a word. But what is the code, I do not even. I have PCRE and I am not afraid to use it unlike whoever authored this originally.

There's some truly strange oddity here: for some demented reason we picked "IDEOGRAPHIC FULL STOP" and "ARABIC QUESTION MARK" as sentence ending. These are "Punctuation, Other" and there are 513 such characters in Unicode. Why these two?? I have no idea but in a stable release we shall keep them. Perhaps Drupal 8.1 could add all others Po or perhaps all P which have a neat PCRE in search already... Please find a Drupal 7 patch attached.

Edit: this code is definitely prehistory. These Unicode characters were added in 2004 (back then this was in node_teaser), breaking on various HTML tags were added in late 2002. I think we can safely ignore whatever the intent was then.

dawehner’s picture

Status: Needs work » Needs review

What does the bot say!

Status: Needs review » Needs work

The last submitted patch, 13: 1482178_13.patch, failed testing.

longwave’s picture

Version: 8.0.x-dev » 7.x-dev
Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 13: 1482178_13.patch, failed testing.

chx’s picture

Version: 7.x-dev » 8.0.x-dev

I believe the policy is 8.0.x first but I didn't bother porting and testing until the community says "yes this is what we want" which can be done rather easily from the D7 patch. Also, testing will be *very* easy for this one ;)

chx’s picture

Still D7 patch just to make sure someone has a clean ground when writing new tests -- the current tests are completely meaningless.

chx’s picture

Issue summary: View changes
chx’s picture

Issue summary: View changes
chx’s picture

I did code this for D8 but I feel it changes text_summary too much. Posted #2661632: Trim to word boundary when using character count for the Smart Trim project.

Wim Leers’s picture

We should simply remove the automatic trimming and summary features from Drupal 8. They are broken by design.

This issue clearly demonstrates how trimming is broken for technical reasons. But it's also broken at a more fundamental level: trimming can never be smart enough (until we have strong AI in Drupal 8 core, which will never ever make sense).

Summaries are also broken, for similar reasons: #2671162: Also use text editor (CKEditor) for "summary" of a text field was mentioned in IRC and sparked a related discussion. See below.

14:31:59 <webflo> swentel: https://www.drupal.org/node/2671162
14:32:01 <Druplicon> https://www.drupal.org/node/2671162 => Initialize CKEditor for summary part of a textfield #2671162: Also use text editor (CKEditor) for "summary" of a text field => 0 comments, 1 IRC mention
14:33:25 <WimLeers> webflo: :(
14:33:32 <WimLeers> webflo: swentel I can't believe we still have that insane split
14:33:54 <WimLeers> webflo: swentel we should just have a separate "summary" field when you want a summary imo
14:34:04 <swentel> that's what I always do tbh
14:34:27 <webflo> WimLeers: swentel yeah makes sense
14:35:04 <WimLeers> I tried to get rid of it in D8, but there were more important things to do, so I didn't spend much time on it
14:35:12 <WimLeers> (same goes for the comment subject field)
14:35:52 <swentel> oh yeah, that one is even worse
14:36:10 <webflo> i wonder if i should try to to make it work with ckeditor or change my migration path.
14:36:19 <chx_afk> urgh summary
14:36:27 <chx_afk> so text_summary makes no bloody sense
14:36:39 <chx_afk> it's one of those old Drupal routines which try to be so bloody smart
14:36:51 <chx_afk> it's surely a great trick
14:36:58 <chx_afk> except it doesnt do whatever the user wants.
14:37:53 <chx_afk> https://www.drupal.org/node/1482178#comment-10808604
14:37:56 <Druplicon> https://www.drupal.org/node/1482178 => text_summary does not break on word boundaries if the first sentence is longer than the length of the summary being produced #1482178: text_summary does not break on word boundaries if the first sentence is longer than the length of the summary being produced => 22 comments, 4 IRC mentions
14:38:00 <chx_afk> rant over, i am going to sleep.
14:38:07 <webflo> WimLeers: on the other hand, teaser break could actually with the awesome ckeditor widgets
14:41:00 <WimLeers> webflo: Yes, for the common case. But what if you try to break in the middle of a <ol> or <ul> or <dl>? Or between two <tbody>s?
14:41:16 <WimLeers> webflo: but yeah, usually that'll work better than what we have today
14:41:57 <WimLeers> webflo: https://www.drupal.org/node/2516038
14:41:59 <Druplicon> https://www.drupal.org/node/2516038 => Add "Teaser Break" button #2516038: Add "Teaser Break" button => 3 comments, 1 IRC mention
14:42:06 <WimLeers> Bojhan said "Nope, this was a usability nightmare."
14:42:28 <WimLeers> We should get rid of automated summaries, whether using <!--break--> or automatic trimming.
14:42:33 <WimLeers> Either way is crappy.
14:42:49 <WimLeers> swentel: webflo ^^
14:43:39 <swentel> agreed
14:43:42 <WimLeers> The *real* problem is that we still kept the fields and widgets for it. We shouldn't have. Because it means people use it, and then they're right: the experience sucks. But it always sucks. We should only let contrib do it
14:43:47 <webflo> the usability of teaser break was crap because it never was integrated well in the editor
14:44:03 <WimLeers> webflo: https://www.drupal.org/project/ckeditor has it
14:44:07 <WimLeers> (in D7)
14:44:25 <webflo> never used ckeditor 4 in d7
14:49:43 <WimLeers> And it's hard to get right, see https://www.drupal.org/node/1217352
14:49:45 <Druplicon> https://www.drupal.org/node/1217352 => Teaser break = unwanted paragraph break #1217352: Teaser break = unwanted paragraph break => 19 comments, 1 IRC mention

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.9 was released on December 7, 2022 and is the final full bugfix release for the Drupal 9.4.x series. Drupal 9.4.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.5.x-dev branch from now on, and new development or disruptive changes should be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

smustgrave’s picture

Status: Needs work » Closed (duplicate)

This appears to be a duplicate of #2835615: Trimmed Formatter no work with html code space. which seems further along.

Will move over credit.