Drupal\Component\Utility\Html::normalize() leaves messy </body></html> in certain situations [#418620]

Problem/Motivation

Under certain circumstances, Drupal\Component\Utility\Html::normalize() (in D7, it's _filter_htmlcorrector()) will add messy </body></html> to the resulting HTML. This happens when the HTML ends in the middle of an attribute, for example:

<p>Here <img alt="ao

This will produce output like:

You can reproduce on Drupal 7 or 8 by following these steps:

Drupal 8

Install Drupal 8.3.1 with the standard profile
Go to /admin/structure/types/manage/article/display/teaser and configure the "Body" filed to trim at 20 characters
Go to /admin/config/content/formats/manage/basic_html and both (a) enable the "Correct faulty and chopped off HTML" filter and (b) disable the "Restrict images to this site" filter (only necessary for the example HTML, not necessary to trigger the bug)
Go to /node/add/article and use this HTML as the body (be sure to click the "Source" button in the WYSIWYG toolbar before pasting it, otherwise you're adding text not HTML):
```
Here <img alt="aoeunhteoas unthoaesn theoausnth oaesntheo asnthoae" src="http://flowjournal.org/wp-content/uploads/2011/12/Im-Not-Here.png"  /> it is
```
Go to /node and observe output like in the screenshot

Drupal 7

Install Drupal 7.54 with the standard profile
Go to /admin/structure/types/manage/article/display/teaser and configure the "Body" filed to trim at 20 characters
Go to /admin/config/content/formats/filtered_html and add <img> to the "Allowed HTML tags" (under "Limit allowed HTML tags")

Go to /node/add/article and use this HTML as the body:

Here <img alt="aoeunhteoas unthoaesn theoausnth oaesntheo asnthoae" src="http://flowjournal.org/wp-content/uploads/2011/12/Im-Not-Here.png"  /> it is

Go to /node and observe output like in the screenshot

Proposed resolution

Remove the potentially-misinterpreted </body></html> closing tags that \Drupal\Component\Utility\Html::load() adds to the end of the text before parsing into the DOM, as they are not required for the HTML soup to be successfully parsed.

By the way, \Drupal\views\Plugin\views\field\FieldPluginBase::trimText() has some code to avoid this exact problem when trimming fields:

      // Remove scraps of HTML entities from the end of a strings
      $value = rtrim(preg_replace('/(?:<(?!.+>)|&(?!.+;)).*$/us', '', $value));

Remaining tasks

Consider backporting patch to Drupal 7 and Drupal 10

User interface changes

None.

API changes

None.

Data model changes

None.

Original summary

_filter_htmlcorrector leaves in fragmentary tags that may be passed in and break the rest of the page. This is most liable to happen when using the "Field can contain HTML" filter in the Views module, but could also occur any other time a developer were to trim a string that contains HTML and pass it to this function.

Example (trimmed to 250 chars):

Lorem ipsum dolor sit amet, consectetur adipiscing elit. <strong>Aliquam posuere enim</strong>. Sed ultrices semper tortor. Pellentesque cenim consectetur. Nulla sed risus eu ipsum venenatis <a class="sample" href="http://www.example.com/partial/path

Output is identical to input, breaking any HTML that follows on the page. Ideal output would be:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. <strong>Aliquam posuere enim</strong>. Sed ultrices semper tortor. Pellentesque cenim consectetur. Nulla sed risus eu ipsum venenatis

Patch attached.

Comment	File	Size	Author
#41	418620-40.patch	2.08 KB	mfb
#41	11.x: PHP 8.2 & MySQL 8 30,367 pass, 2 fail
#35	418620-35.patch	2.21 KB	mfb
#35	11.x: PHP 8.2 & MySQL 8 30,371 pass
#35	interdiff-33-35.txt	584 bytes	mfb
#33	418620-33.patch	1.46 KB	mfb
#33	11.x: PHP 8.2 & MySQL 8 30,358 pass, 1 fail
#33	interdiff-26-33.txt	923 bytes	mfb
#26	418620-26.patch	1.42 KB	smustgrave
#26	10.1.x: PHP 8.1 & MySQL 5.7 29,098 pass, 83 fail 11.x: PHP 8.2 & MySQL 8 30,360 pass
#26	interdiff-21-26.txt	1.68 KB	smustgrave
#21	418620-21.patch	1.37 KB	smustgrave
#21	9.5.x: PHP 8.1 & MySQL 5.7 30,165 pass
#21	418620-21-tests-only.patch	833 bytes	smustgrave
#21	9.5.x: PHP 8.1 & MySQL 5.7 30,155 pass, 1 fail
#9	drupal7-htmlcorrector-tag-fragments-418620-9.patch	990 bytes	dsnopek
#9	7.x: PHP 5.3 & MySQL 5.5 2,033 pass, 1 fail PHP 5.5 & MySQL 5.5 2,034 pass
#5	drupal7-htmlcorrector-tag-fragments-418620-5.patch	537 bytes	dsnopek
#5	7.x: PHP 5.3 & MySQL 5.5 2,033 pass, 1 fail
	_filter_htmlcorrector.patch	541 bytes	greenbeans

#4	Selection_053.png	12.63 KB	dsnopek

Issue fork drupal-418620

Show commands

Start within a Git clone of the project using the version control instructions.

Add & fetch this issue fork’s repository

Or, if you do not have SSH keys set up on git.drupalcode.org:

Add & fetch this issue fork’s repository

418620-drupalcomponentutilityhtmlnormalize-leaves-messy changes, plain diff MR !4930
Check out this branch

About issue forks

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Comment #1

seaneffel CreditAttribution: seaneffel commented 6 April 2009 at 21:48

Confirming the problem, when selecting the "Field can contain HTML" as on or off, the resulting field always displays clipped HTML tags. Glad someone saw this, I thought it was my bad CSS on the theme layer. Here is a sample of an unclosed a and p tag that clued me in to the problem:

<div class="field-content"><p>This Week in NeighborMedia: a recap of the week's captivating Cambridge stories produced by <a href="http://www.cctvcambridge.org/...</div>

The ellipsis comes from the field setting in views.

I'll try the patch and report back.

Comment #2

vann CreditAttribution: vann commented 22 April 2009 at 15:59

This patch corrects the same problem I was having, however it doesn't correct the problem if the "Trim only on a word boundary" checkbox is checked in Views. Since I don't know what code is executed to trim the text, I can't tell precisely where the code is deciding where the "word boundary" is, but it clearly thinks it's somewhere inside a tag. The result is the trimmed tag is incorrectly displayed as text to the end user.

Here is the input and the incorrect output with a trim length of 500, only on word boundary:

input:

<div class="snap_preview"><br />
<p>Listen to an interview with Gustavo Vilchis on Worldview on Chicago Public Radio!</p>
</div>
<div class="snap_preview"><br />
<p>Listen to an interview with Gustavo Vilchis on Worldview on Chicago Public Radio!</p>
<ul>
    <li><a href="http://www.chicagopublicradio.org/Program_WV.aspx?episode=30344">http://www.chicagopublicradio.org/Program_WV.aspx?episode=30344</a></li>
</ul>
<a rel="nofollow"><img alt="" style="border: 0px;" src="http://feeds.wordpress.com/1.0/comments/teachingrebellion.wordpress.com/353/" /></a> <a rel="nofollow"><img alt="" style="border: 0px;" src="http://feeds.wordpress.com/1.0/delicious/teachingrebellion.wordpress.com/353/" /></a> <a rel="nofollow"><img alt="" style="border: 0px;" src="http://feeds.wordpress.com/1.0/stumble/teachingrebellion.wordpress.com/353/" /></a> <a rel="nofollow"><img alt="" style="border: 0px;" src="http://feeds.wordpress.com/1.0/digg/teachingrebellion.wordpress.com/353/" /></a> <a rel="nofollow"><img alt="" style="border: 0px;" src="http://feeds.wordpress.com/1.0/reddit/teachingrebellion.wordpress.com/353/" /></a> <img alt="" style="border: 0px;" src="http://stats.wordpress.com/b.gif?host=teachingrebellion.wordpress.com&amp;blog=5191692&amp;post=353&amp;subd=teachingrebellion&amp;ref=&amp;feed=1" /></div>

output:

<div class="snap_preview"><br/>
<p>Listen to an interview with Gustavo Vilchis on Worldview on Chicago Public Radio!</p>
<ul>
    <li><a href="http://www.chicagopublicradio.org/Program_WV.aspx?episode=30344" class="views-processed">http://www.chicagopublicradio.org/Program_WV.aspx?episode=30344</a></li>
</ul>
<a rel="nofollow" class="views-processed"><img style="border: 0px none ;" src="http://feeds.wordpress.com/1.0/comments/teachingrebellion.wordpress.com/353/" alt=""/></a> <a rel="nofollow" class="views-processed">img alt="" src="http://feeds.wordpress.com/1.0/</a></div>

Comment #3

multiplextor CreditAttribution: multiplextor commented 8 May 2013 at 15:06

Status:

Needs review

» Closed (won't fix)

Closed. The reason: expired.

Comment #4

dsnopek

he/him

English

USA

CreditAttribution: dsnopek commented 2 May 2017 at 20:05

Title:	_filter_htmlcorrector leaves tag fragments	» _filter_htmlcorrector() leaves messy </body></html> in certain situations
Version:	6.10	» 7.x-dev
Issue summary:	View changes
Status:	Closed (won't fix)	» Active

File	Size
Selection_053.png	12.63 KB

Update issues summary because this problem still exists in Drupal 7 (might be present in Drupal 8 too)

Comment #5

dsnopek

he/him

English

USA

CreditAttribution: dsnopek commented 2 May 2017 at 20:07

Status:

Active

» Needs review

File	Size
drupal7-htmlcorrector-tag-fragments-418620-5.patch	537 bytes
7.x: PHP 5.3 & MySQL 5.5 2,033 pass, 1 fail

1 file was hidden/shown/deleted

File	Size
_filter_htmlcorrector.patch	541 bytes

Here's a Drupal 7 patch that works in my testing (based on Views code)

Comment #6

dsnopek

he/him

English

USA

CreditAttribution: dsnopek commented 2 May 2017 at 20:48

Drupal 8 theoretically has the same issue. In D8, the function is Drupal\Component\Utility\Html::normalize(), which uses pretty much the exact same code as _filter_htmlcorrector(). If I run this code via Devel or drush, it produces the same messy HTML as in Drupal 7:

use Drupal\Component\Utility\Html;
print Html::normalize('<p>Here <img alt="ao');

However, I can't seem to be able to reproduce on Drupal 8 using similar steps as I can on Drupal 7! I'm not sure what's different, but maybe we can backport whatever that is?

Comment #7

2 May 2017 at 20:53

Status:

Needs review

» Needs work

The last submitted patch, 5: drupal7-htmlcorrector-tag-fragments-418620-5.patch, failed testing.

Comment #8

dsnopek

he/him

English

USA

CreditAttribution: dsnopek commented 2 May 2017 at 20:57

Title:	_filter_htmlcorrector() leaves messy </body></html> in certain situations	» Drupal\Component\Utility\Html::normalize() leaves messy </body></html> in certain situations
Version:	7.x-dev	» 8.3.x-dev
Issue summary:	View changes

Aha! The difference is that the "Correct faulty and chopped off HTML" filter isn't enabled on the "Basic HTML" text format by default on Drupal 8. So, I can reproduce on D8 now too!

I've updated the issue summary to reflect this.

Comment #9

dsnopek

he/him

English

USA

CreditAttribution: dsnopek commented 2 May 2017 at 21:23

Status:

Needs work

» Needs review

File	Size
drupal7-htmlcorrector-tag-fragments-418620-9.patch	990 bytes
7.x: PHP 5.3 & MySQL 5.5 2,033 pass, 1 fail PHP 5.5 & MySQL 5.5 2,034 pass

1 file was hidden/shown/deleted

File	Size
drupal7-htmlcorrector-tag-fragments-418620-5.patch	537 bytes
7.x: PHP 5.3 & MySQL 5.5 2,033 pass, 1 fail

Here's an updated Drupal 7 patch which should fix the failed test. It doesn't add new test coverage for this specific bug - which should really be added - for now, it just fixes the existing tests.

Comment #10

2 May 2017 at 21:34

Status:

Needs review

» Needs work

The last submitted patch, 9: drupal7-htmlcorrector-tag-fragments-418620-9.patch, failed testing.

Comment #11

2 May 2017 at 21:34

Version:

8.3.x-dev

» 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #12

2 May 2017 at 21:34

Version:

8.4.x-dev

» 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #13

2 May 2017 at 21:34

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #14

2 May 2017 at 21:34

Version:

8.6.x-dev

» 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #15

2 May 2017 at 21:34

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #16

2 May 2017 at 21:34

Version:

8.9.x-dev

» 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #17

2 May 2017 at 21:34

Version:

9.2.x-dev

» 9.3.x-dev

Comment #18

2 May 2017 at 21:34

Version:

9.3.x-dev

» 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #19

smustgrave CreditAttribution: smustgrave at Mobomo commented 31 August 2022 at 23:28

Status:

Needs work

» Postponed (maintainer needs more info)

Wondering if this is still an issue in D9,5?

Normalize() seems to be a different from when this ticket was opened.

Comment #20

mfb

they or he

CreditAttribution: mfb commented 31 August 2022 at 23:56

Version:	9.4.x-dev	» 9.5.x-dev
Status:	Postponed (maintainer needs more info)	» Needs work

I followed the instructions in the issue summary to verify, and yes, unfortunately this is still an issue in D9.5 and D10.1

Comment #21

smustgrave CreditAttribution: smustgrave at Mobomo commented 6 September 2022 at 15:38

Status:	Needs work	» Needs review
Issue tags:		+Bug Smash Initiative

File	Size
418620-21-tests-only.patch	833 bytes
9.5.x: PHP 8.1 & MySQL 5.7 30,155 pass, 1 fail
418620-21.patch	1.37 KB
9.5.x: PHP 8.1 & MySQL 5.7 30,165 pass

Not really sold on this solution but the regex using for D7 didn't work on D9.5

But added some tests

Comment #22

6 September 2022 at 16:42

The last submitted patch, 21: 418620-21-tests-only.patch, failed testing. View results

Comment #23

6 September 2022 at 16:42

Version:

9.5.x-dev

» 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #24

smustgrave CreditAttribution: smustgrave at Mobomo commented 27 January 2023 at 19:10

Wonder if ckeditor5 has any bearing on this.

Comment #25

catch

he/him

English

CreditAttribution: catch at Third and Grove commented 28 March 2023 at 14:57

Status:

Needs review

» Needs work

ckeditor5 won't make any difference here. There is #2441811: Upgrade filter system to HTML5 but I don't think that really affects this either.

Added test coverage looks good, I'm also not entirely sure about the string replace but don't really have a better idea either. However if we go that way we should add a comment.

Comment #26

smustgrave CreditAttribution: smustgrave at Mobomo commented 7 April 2023 at 15:27

Status:

Needs work

» Needs review

File	Size
interdiff-21-26.txt	1.68 KB
418620-26.patch	1.42 KB
10.1.x: PHP 8.1 & MySQL 5.7 29,098 pass, 83 fail 11.x: PHP 8.2 & MySQL 8 30,360 pass

File

Size

interdiff-21-26.txt

1.68 KB

418620-26.patch

1.42 KB

10.1.x:
PHP 8.1 & MySQL 5.7 29,098 pass, 83 fail

11.x:
PHP 8.2 & MySQL 8 30,360 pass

2 files were hidden/shown/deleted

File	Size
418620-21-tests-only.patch	833 bytes
9.5.x: PHP 8.1 & MySQL 5.7 30,155 pass, 1 fail
418620-21.patch	1.37 KB
9.5.x: PHP 8.1 & MySQL 5.7 30,165 pass

Added comment

Comment #27

larowlan

🇦🇺🏝.au GMT+10

CreditAttribution: larowlan at PreviousNext commented 10 April 2023 at 23:57

+++ b/core/lib/Drupal/Component/Utility/Html.php
@@ -252,7 +252,9 @@ public static function resetSeenIds() {
+    // Remove any leftover <body> or <html> tags.
+    return str_replace(['&lt;/body&gt;', '&lt;/html&gt;'], '', $serialize);

Super nit, but they're technically 'closing' tags.

What I don't understand is how the </body> and </html> get there - \Drupal\Component\Utility\Html::serialize works by finding the body element and then looping over its child elements, so the wrapping html and body elements shouldn't even be in the picture.

I think we need to do some more forensics here.

Comment #28

mfb

they or he

CreditAttribution: mfb commented 11 April 2023 at 01:20

As far as I understand, the HTML-encoded closing tags come from the load() method, where they are parsed as part of the cut-off attribute, not closing tags, and they get "repaired" (i.e. HTML-encoded) to make them valid as part of the attribute.

Comment #29

borisson_

Dutch

Mechelen, 🇧🇪

CreditAttribution: borisson_ at Calibrate commented 22 April 2023 at 14:01

Status:

Needs review

» Reviewed & tested by the community

What I don't understand is how the and get there - \Drupal\Component\Utility\Html::serialize works by finding the body element and then looping over its child elements, so the wrapping html and body elements shouldn't even be in the picture.

They are added because of the reasons mentioned by @mfb in #28.
https://3v4l.org/r6nNq#v8.1.18 is a 3v4l with the minimum of load & serialize. I think this could also be a bug in php core's dom handling, but I think we should just fix it like in #26.
Setting this back to rtbc.

Comment #30

olli CreditAttribution: olli commented 27 April 2023 at 06:49

#26 seems to remove any closing body and html elements, also intentional ones inside code blocks.

Should we instead try to fix this somehow in Html::load() that adds them? https://3v4l.org/vm89s

Comment #31

28 April 2023 at 17:08

Status:

Reviewed & tested by the community

» Needs work

The last submitted patch, 26: 418620-26.patch, failed testing. View results

Comment #32

28 April 2023 at 17:08

Version:

10.1.x-dev

» 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #33

mfb

they or he

CreditAttribution: mfb commented 2 October 2023 at 03:42

Status:

Needs work

» Needs review

File	Size
interdiff-26-33.txt	923 bytes
418620-33.patch	1.46 KB
11.x: PHP 8.2 & MySQL 8 30,358 pass, 1 fail

3 files were hidden/shown/deleted

File	Size
drupal7-htmlcorrector-tag-fragments-418620-9.patch	990 bytes
7.x: PHP 5.3 & MySQL 5.5 2,033 pass, 1 fail PHP 5.5 & MySQL 5.5 2,034 pass
interdiff-21-26.txt	1.68 KB
418620-26.patch	1.42 KB
10.1.x: PHP 8.1 & MySQL 5.7 29,098 pass, 83 fail 11.x: PHP 8.2 & MySQL 8 30,360 pass

Yes, it actually seems quite reasonable to just leave out the potentially-misinterpreted </body></html> and rely on DOMDocument to deal with the HTML soup. Let's see if tests pass.

Comment #34

2 October 2023 at 04:26

Status:

Needs review

» Needs work

The last submitted patch, 33: 418620-33.patch, failed testing. View results

Comment #35

mfb

they or he

CreditAttribution: mfb commented 2 October 2023 at 04:37

Status:

Needs work

» Needs review

File	Size
interdiff-33-35.txt	584 bytes
418620-35.patch	2.21 KB
11.x: PHP 8.2 & MySQL 8 30,371 pass

1 file was hidden/shown/deleted

File	Size
418620-33.patch	1.46 KB
11.x: PHP 8.2 & MySQL 8 30,358 pass, 1 fail

Well, the one failing test honestly seems like an improvement, so I'm fixing the test.

Comment #36

smustgrave CreditAttribution: smustgrave at Mobomo commented 2 October 2023 at 15:02

Status:

Needs review

» Reviewed & tested by the community

Seems odd the closing bracket was the failing test but agreed.

Comment #37

mfb

they or he

CreditAttribution: mfb commented 3 October 2023 at 19:26

Issue summary:

View changes

Clarified proposed resolution and remaining tasks. Btw, it looks like these closing tags should still be removed even with #2441811: Upgrade filter system to HTML5, which is not surprising since DOMDocument is still in use under the hood.

Comment #38

4 October 2023 at 06:06

Status:

Reviewed & tested by the community

» Needs work

The last submitted patch, 35: 418620-35.patch, failed testing. View results

Comment #39

borisson_

Dutch

Mechelen, 🇧🇪

CreditAttribution: borisson_ at Calibrate commented 4 October 2023 at 06:23

Status:

Needs work

» Reviewed & tested by the community

Known fail

Comment #40

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Thunder commented 4 October 2023 at 09:16

Status:

Reviewed & tested by the community

» Needs work

We recently committed #2441811: Upgrade filter system to HTML5 which makes this one not apply.

Comment #41

mfb

they or he

CreditAttribution: mfb commented 4 October 2023 at 09:25

Status:

Needs work

» Needs review

File	Size
418620-40.patch	2.08 KB
11.x: PHP 8.2 & MySQL 8 30,367 pass, 2 fail

1 file was hidden/shown/deleted

File	Size
418620-35.patch	2.21 KB
11.x: PHP 8.2 & MySQL 8 30,371 pass

Rerolled.

Comment #42

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Thunder commented 4 October 2023 at 09:28

Status:

Needs review

» Needs work

I think I agree with #33 and

Yes, it actually seems quite reasonable to just leave out the potentially-misinterpreted </body></html> and rely on DOMDocument to deal with the HTML soup. Let's see if tests pass.

+++ b/core/tests/Drupal/Tests/Core/EventSubscriber/RssResponseRelativeUrlFilterTest.php
@@ -49,8 +49,7 @@ public function providerTestOnResponse() {
   <item>
      <title>Drupal 8 turns one!</title>
      <link>https://www.drupal.org/blog/drupal-8-turns-one</link>
-     <description>&lt;a href="localhost/node/1"&gt;Hello&lt;/a&gt;
-    </description>
+     <description>&lt;a href="localhost/node/1"&gt;Hello&lt;/a&gt;</description>
   </item>
   </channel>
 </rss>

This change is not longer needed.

Comment #43

4 October 2023 at 09:33

longwave made their first commit to this issue’s fork.

Comment #44

4 October 2023 at 09:38

longwave opened merge request !4930

Comment #45

longwave

he/him

English

CreditAttribution: longwave at Full Fat Things commented 4 October 2023 at 09:40

Status:

Needs work

» Needs review

Discussed with @alexpott. It seems that most of the HTML boilerplate in Html::load() is no longer required and we can get away with just parsing <body> . $html.

Doctype is not needed, if I reserialize the DOMDocument then the doctype is added.
Encoding meta tag is not needed, UTF-8 is the default but we can also specify this to the parser.
<html> tag is not needed, again if I reserialize the document then the tag is added around <body>.

Opened an MR with the new test and my suggested changes.

Comment #46

mfb

they or he

CreditAttribution: mfb commented 4 October 2023 at 10:01

1 file was hidden/shown/deleted

File	Size
418620-40.patch	2.08 KB
11.x: PHP 8.2 & MySQL 8 30,367 pass, 2 fail

Changes look good to me (especially getting rid of the ancient Content-Type meta tag :)

Comment #47

4 October 2023 at 10:09

Status:

Needs review

» Needs work

The last submitted patch, 41: 418620-40.patch, failed testing. View results

Comment #48

longwave

he/him

English

CreditAttribution: longwave at Full Fat Things commented 4 October 2023 at 10:13

Status:

Needs work

» Needs review

Added the comment, reworded slightly.

Comment #49

smustgrave CreditAttribution: smustgrave at Mobomo commented 4 October 2023 at 14:57

Status:

Needs review

» Reviewed & tested by the community

Looks good! Big win getting #2441811: Upgrade filter system to HTML5 to land!

Comment #50

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Thunder commented 4 October 2023 at 16:23

Status:

Reviewed & tested by the community

» Fixed

Committed f101ff7 and pushed to 11.x. Thanks!

Comment #51

4 October 2023 at 16:23

alexpott committed f101ff7f on 11.x

Issue #418620 by mfb, smustgrave, longwave, dsnopek, greenbeans,...

Comment #52

mfb

they or he

CreditAttribution: mfb commented 4 October 2023 at 21:39

This bugfix will need a little tweaking if folks want to backport to Drupal 7 or Drupal 10.1, but should be easy..

Comment #53

18 October 2023 at 21:39

Status:

Fixed

» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.