Add a way to truncate HTML strings without counting or damaging HTML elements (and use it in Views) - Html::truncate() [#2279655]

Comment	File	Size	Author
#21	Screen Shot 2014-07-20 at 1.06.51 pm.png	106.96 KB	thedavidmeister

Comment #1

thedavidmeister commented 4 June 2014 at 13:11

Title:

Views' FieldPluginBase::trimText() and Unicode::truncate() should be functionally identical.

» Views' FieldPluginBase::trimText() should use Unicode::truncate() and/or a new Html::truncate()

Comment #3

thedavidmeister commented 5 June 2014 at 14:12

Title:

Views' FieldPluginBase::trimText() should use Unicode::truncate() and/or a new Html::truncate()

» Add a way to truncate HTML strings without counting or damaging HTML elements (and use it in Views) - Html::truncate()

So, Views isn't great because, unless I've read the function wrong, truncating the following to 10 characters:

<strong>foo bar baz foo</strong>

Gives:

<strong>fo</strong>

Which is technically correct I suppose, but I'm sure it's not what many people *intend* when dealing with HTML.
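A minimal sketch of the difference, assuming nothing about the actual Views code: one cut counts every character including markup, the other counts only visible text. Both function names are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helpers for illustration only; neither is a Drupal API.

// Counts every character, including those inside tags.
function naive_cut(string $html, int $max): string {
  return mb_substr($html, 0, $max, 'UTF-8');
}

// Counts only the text a reader would see (tags are dropped entirely).
function visible_cut(string $html, int $max): string {
  return mb_substr(strip_tags($html), 0, $max, 'UTF-8');
}

$html = '<strong>foo bar baz foo</strong>';
echo naive_cut($html, 10), "\n";   // <strong>fo
echo visible_cut($html, 10), "\n"; // foo bar ba
```

Running an HTML corrector over the first result closes the tag and yields <strong>fo</strong>. The second result has the right length but loses the markup, which is why a tag-aware approach is being asked for.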

Woah http://alanwhipple.com/2011/05/25/php-truncate-string-preserving-html-ta...

Comment #4

thedavidmeister commented 5 June 2014 at 15:48

From IRC:

https://ideone.com/5Gbe5J
https://ideone.com/u0cd9c
https://ideone.com/gm7Zgb
https://ideone.com/0fZUBk

Comment #5

mgifford commented 6 June 2014 at 04:43

Issue tags:

+typography

You've made a good case for merging the two. Interesting link to http://alanwhipple.com

Comment #6

thedavidmeister commented 7 June 2014 at 06:18

more reading - http://www.pjgalbraith.com/2011/11/truncating-text-html-with-php/

Comment #7

thedavidmeister commented 7 June 2014 at 07:57

Category:

Task

» Feature request

This is sort of a feature request... I guess...

Comment #8

thedavidmeister commented 7 June 2014 at 08:39

Category:

Feature request

» Bug report

More things wrong with the Views approach... Because of the regex being run, an existing malformed HTML entity inside a chunk of text will cause the whole thing to be chopped right back to the entity.

sometext &nbsp more text

becomes:

sometext

This is also sort of a bug report because Views seems pretty buggy (or at least naive) at the moment.
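The chopping can be reproduced with a pattern of the kind described. The regex below is an assumption modelled on the clean-up step in the Views trimming code, so treat it as an approximation rather than the exact source; the function name is hypothetical.

```php
<?php
// Assumed pattern: drop everything from the first "<" or "&" that has no
// closing ">" or ";" anywhere after it. The real Views regex may differ.
function strip_trailing_scraps(string $value): string {
  return rtrim(preg_replace('/(?:<(?!.+>)|&(?!.+;)).*$/us', '', $value));
}

echo strip_trailing_scraps('sometext &nbsp more text'), "\n";  // sometext
echo strip_trailing_scraps('sometext &nbsp; more text'), "\n"; // sometext &nbsp; more text
```

The well-formed entity survives because a ";" follows the "&". The malformed one has no ";" after it, so the pattern treats the rest of the string as a broken scrap and removes it.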

Comment #9

mgifford commented 23 June 2014 at 15:30

Component:

markup

» views.module

I think this has to be resolved by the Views folks.

Comment #10

thedavidmeister commented 28 June 2014 at 02:36

Component:

views.module

» markup

No, I don't think this should be something in Views!

This functionality should be lower level than that, and Views should simply use it.

I suspect that the only reason it was ever in Views in the first place is because Core has never provided something decent to achieve this totally normal functionality.

Comment #11

mgifford commented 10 July 2014 at 12:48

Are you going to have time to write up a patch with Html::truncate() & FieldPluginBase::trimText()?

Comment #12

thedavidmeister commented 10 July 2014 at 13:57

I've been thinking about it on and off. Not sure of the best way yet...

I put the start of a sandbox up at https://github.com/thedavidmeister/html_truncate_sandbox

The bit I'm wondering about atm:

Say you have 'foo&nbsp;bar' and you want to truncate it at 5 characters. What you want to see is "foo b" without word safe, and "foo" with word safe.

If we move the cursor to "foo&n", which is 5 characters, we can't easily know that "&n" is actually the start of &nbsp; (which we'd count as one character). This messes with our counting.

This is basically where I assume the Views people got to, which led to what I was complaining about in #8
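One way around the cursor problem, sketched here under the assumption that the input has no tags, is to split the string into tokens first so that a complete entity is a single token. The helper name and the pattern are hypothetical, not taken from the sandbox.

```php
<?php
// Hypothetical helper: treat each well-formed entity as one character.
function entity_aware_cut(string $text, int $max): string {
  // Match a complete entity first, otherwise any single character.
  $pattern = '/&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);|./us';
  preg_match_all($pattern, $text, $matches);
  // Keep the first $max tokens and glue them back together.
  return implode('', array_slice($matches[0], 0, $max));
}

echo mb_substr('foo&nbsp;bar', 0, 5, 'UTF-8'), "\n"; // foo&n
echo entity_aware_cut('foo&nbsp;bar', 5), "\n";      // foo&nbsp;b
```

A malformed entity such as "&nbsp" without the semicolon simply falls through to the single-character branch, so it is counted letter by letter instead of being dropped.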

Comment #13

mgifford commented 10 July 2014 at 14:42

Nice to have this here https://github.com/thedavidmeister/html_truncate_sandbox/blob/master/src...

Would something like this account for the character entities?

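        if (!$wordsafe) {
          // $delta is how far the running count overshoots the limit.
          $delta = $counter - $maxlength;
          $fragment = mb_substr($token, 0, $delta);
          $newtext[] = $fragment;
        }
        // strpos() returns an offset or FALSE, never TRUE.
        elseif (strpos($str, '&') !== FALSE) {
          // Leave extra room for an entity.
          $maxlength = $maxlength + 4;
        }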

We just need a bit of an extra buffer in the function to accommodate the entities, right?

Comment #14

thedavidmeister commented 11 July 2014 at 05:32

Well, not exactly that, because some HTML entities are longer than that, like &curren;.

I was actually thinking using get_html_translation_table(), then getting the length of the longest entity from that and doing something similar to what you suggested.
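A rough sketch of that idea: ask PHP for its entity table and take the longest value as the buffer size. The helper name is hypothetical, and the HTML 4.01 table is an assumption; a different table flag would give a different maximum.

```php
<?php
// Hypothetical helper: length of the longest named entity PHP knows about.
function longest_entity_length(): int {
  // Maps each character to its entity, e.g. "¤" => "&curren;".
  $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
  return max(array_map('strlen', $table));
}

// Use the result as the look-ahead buffer instead of a hard-coded 4.
$buffer = longest_entity_length();
```

This only bounds named entities from the chosen table. Numeric references such as &#x1F600; can be longer, so the buffer is a heuristic rather than a guarantee.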

Comment #15

mgifford commented 11 July 2014 at 12:09

I've been trying to think of an elegant way to use PHP's get_html_translation_table, but I'm coming up short. We first need to determine if there is a "&" within the first few characters being truncated. We'd then need to isolate that html entity to determine how long it is. Finally we'd adjust the maxsize to account for that.

What about if in every string we just used html_entity_decode to convert them to single characters, then we calculated the maxsize, before finally adding back in the entities with htmlentities.

I do worry about performance for doing this type of check, although it would have already been fairly well optimized in PHP I would assume.
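The decode, truncate, re-encode round trip could look roughly like this. It is a sketch with a hypothetical function name, and it assumes the input contains entities but no tags, since any "<" would be escaped on the way back.

```php
<?php
// Hypothetical helper; only meaningful for input that contains no tags.
function truncate_via_decode(string $text, int $max): string {
  $flags = ENT_QUOTES | ENT_HTML401;
  // 1. Turn every entity into the single character it stands for.
  $plain = html_entity_decode($text, $flags, 'UTF-8');
  // 2. Truncate by characters, not bytes.
  $cut = mb_substr($plain, 0, $max, 'UTF-8');
  // 3. Re-encode so the result is safe to output as HTML again.
  return htmlentities($cut, $flags, 'UTF-8');
}

echo truncate_via_decode('foo&nbsp;bar', 5), "\n"; // foo&nbsp;b
```

One side effect worth testing: the output is not byte-for-byte the input. A literal character that has a named entity is re-encoded, and a stray "&" from a malformed entity comes back as "&amp;".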

Comment #16

thedavidmeister commented 11 July 2014 at 14:25

"What about if in every string we just used html_entity_decode to convert them to single characters, then we calculated the maxsize, before finally adding back in the entities with htmlentities."

That could potentially work, would have to write some more tests to see if we can break that.

Comment #17

damien tournoud commented 11 July 2014 at 14:38

If you know that you don't have any HTML tags in the input, just convert to plaintext, do the truncation there and convert back to HTML.

If you have any HTML tags in the input... all bets are off and good luck with that.

Comment #18

thedavidmeister commented 12 July 2014 at 15:38

"If you have any HTML tags in the input..."

Well that's exactly what Views claims (and has claimed for years) to handle.

Comment #19

damien tournoud commented 12 July 2014 at 20:56

@thedavidmeister: I don't see any grand claim in the current implementation. It really just tries to not truncate in the middle of an HTML entity, but that's really about it. If you pass it anything with tags, it's going to mess it up pretty nicely.

I stand by #17: there is only one case where truncating HTML is doable, and it's when there are no tags whatsoever. In that case, just convert to plaintext, do the truncation there and convert back to HTML.

If there are any tags, it's basically anyone's guess what the proper behavior should be. Should the result be a truncation of the *visible* text rendered in the browser? How do you know what that is going to be without knowing the CSS context? What is it reasonable to do with other visible elements (images and stuff)?

So I would recommend that we stop pretending we can even remotely handle truncation of arbitrary HTML.

Comment #20

mgifford commented 13 July 2014 at 03:24

At the very least can we move the Views truncation functionality to \Drupal\Component\Utility\Html?

Html::truncate() & FieldPluginBase::trimText() seem like useful central functions even if we don't have a solution for arbitrary HTML.

Comment #21

thedavidmeister commented 20 July 2014 at 03:17

Status	File	Size
new	Screen Shot 2014-07-20 at 1.06.51 pm.png	106.96 KB

No "grand claims" for sure, but from the D7 interface, I'll show you where the confusion comes from, for me:

Trim this field to a maximum length
Enable to trim the field to a maximum length of characters

and also

Field can contain HTML
If checked, HTML corrector will be run to ensure tags are properly closed after trimming.

I certainly expect, after reading this in the UI and not reading the code, the following:

- Views is aware of HTML (not limited to HTML entities, it just says "HTML")
- The characters being counted for determining the maximum for trimming won't include invisible characters inside tags. After all, we've told Views that this is an HTML string and there are no caveats listed in the UI
- Views won't do anything at all to my HTML entities, it didn't mention HTML entities once in the UI, why would it be damaging those?

"I stand by #17: there is only one case where truncating HTML is doable, and it's when there are no tags whatsoever. In that case, just convert to plaintext, do the truncation there and convert back to HTML."

What's wrong with the DOMDocument approach - using that to normalize the string, then getting the inner text of tags? That looks like it would work to me.

"At the very least can we move the Views truncation functionality to \Drupal\Component\Utility\Html?"

You're probably right, this issue could benefit from being broken into two parts - improving the organization/centralization of some decent existing functionality, and then improving said functionality.
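For reference, the DOMDocument approach mentioned above might be sketched as follows: parse the fragment, walk the tree, spend the character budget on text nodes only, and drop whatever comes after the budget runs out. Both function names are hypothetical and this is not a proposed Html::truncate() implementation.

```php
<?php
// Hypothetical sketch, not a Drupal API: count only text nodes, keep tags intact.
function html_truncate_sketch(string $html, int $max): string {
  $doc = new DOMDocument();
  // Wrap the fragment in a full document so the parser reads it as UTF-8.
  $wrapped = '<!DOCTYPE html><html><head><meta http-equiv="Content-Type" '
    . 'content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>';
  $previous = libxml_use_internal_errors(true); // Tolerate sloppy markup.
  $doc->loadHTML($wrapped);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $body = $doc->getElementsByTagName('body')->item(0);
  $remaining = $max;
  truncate_children($body, $remaining);

  // Serialize the body's children back into a fragment.
  $out = '';
  foreach ($body->childNodes as $child) {
    $out .= $doc->saveHTML($child);
  }
  return $out;
}

function truncate_children(DOMNode $node, int &$remaining): void {
  // Copy the child list first, because nodes are removed while iterating.
  foreach (iterator_to_array($node->childNodes) as $child) {
    if ($remaining <= 0) {
      // The budget is spent: drop everything that follows.
      $node->removeChild($child);
    }
    elseif ($child instanceof DOMText) {
      $length = mb_strlen($child->data, 'UTF-8');
      if ($length > $remaining) {
        $child->data = mb_substr($child->data, 0, $remaining, 'UTF-8');
      }
      $remaining -= min($length, $remaining);
    }
    else {
      truncate_children($child, $remaining);
    }
  }
}

echo html_truncate_sketch('<strong>foo bar baz foo</strong>', 10), "\n";
// <strong>foo bar ba</strong>
```

Because the parser decodes entities into text nodes, each entity counts as one character here. The sketch does not address the objections in #19: it knows nothing about CSS-hidden text, and it treats images and other non-text elements as zero length.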

Comment #22

20 July 2014 at 03:17

Version:

8.0.x-dev

» 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #23

20 July 2014 at 03:17

Version:

8.1.x-dev

» 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #24

manuel garcia commented 1 October 2016 at 12:29

Comment #25

1 October 2016 at 12:29

Version:

8.2.x-dev

» 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #26

1 October 2016 at 12:29

Version:

8.3.x-dev

» 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #27

1 October 2016 at 12:29

Version:

8.4.x-dev

» 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #28

1 October 2016 at 12:29

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #29

1 October 2016 at 12:29

Version:

8.6.x-dev

» 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #30

1 October 2016 at 12:29

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #31

1 October 2016 at 12:29

Version:

8.9.x-dev

» 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #32

1 October 2016 at 12:29

Version:

9.2.x-dev

» 9.3.x-dev

Comment #33

1 October 2016 at 12:29

Version:

9.3.x-dev

» 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #34

1 October 2016 at 12:29

Version:

9.4.x-dev

» 9.5.x-dev

Drupal 9.4.9 was released on December 7, 2022 and is the final full bugfix release for the Drupal 9.4.x series. Drupal 9.4.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.5.x-dev branch from now on, and new development or disruptive changes should be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #35

larowlan commented 23 February 2023 at 00:13

Category:

Bug report

» Feature request

Issue tags:

+Bug Smash Initiative

Adding a new API is a feature request in my book.

Comment #36

larowlan commented 23 February 2023 at 00:17

Comment #37

23 February 2023 at 00:17

Version:

9.5.x-dev

» 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #38

23 February 2023 at 00:17

Version:

11.x-dev

» main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.
