Problem/Motivation

On Drupal 9, faulty HTML does not break the TOC. On Drupal 10, it does. It works on D9 because the problem was fixed in #3224559: Invalid HTML causes errors.

That fix no longer works because Drupal has changed the HTML serializer. If core/lib/Drupal/Component/Utility/Html.php is replaced with the copy from D9, the problem goes away. I have not worked out exactly what changed, but the core issue in which the change was made, which links to the commit, is #2441811: Upgrade filter system to HTML5.

Steps to reproduce

Take a node with a TOC that includes an <h3> heading. Keep the opening <h3> tag and change the closing tag to </h2>.
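
To make the kind of fault concrete, here is a minimal, language-neutral sketch in Python (not the module's PHP, and not part of any patch on this issue) that detects a heading closed by a different heading tag. The names HeadingMismatchFinder and find_heading_mismatches are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingMismatchFinder(HTMLParser):
    """Hypothetical helper: records headings closed by a different heading tag."""

    def __init__(self):
        super().__init__()
        self.open_headings = []  # stack of heading tags that are currently open
        self.mismatches = []     # (opening tag, closing tag) pairs

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.open_headings.append(tag)

    def handle_endtag(self, tag):
        if tag not in HEADINGS:
            return
        if not self.open_headings:
            # A closing heading tag with no opening tag at all.
            self.mismatches.append((None, tag))
            return
        opened = self.open_headings.pop()
        if opened != tag:
            self.mismatches.append((opened, tag))


def find_heading_mismatches(html):
    """Return a list of (opening, closing) heading tag pairs that do not match."""
    finder = HeadingMismatchFinder()
    finder.feed(html)
    finder.close()
    return finder.mismatches


# The markup from the steps above: <h3> opened, </h2> closed.
print(find_heading_mismatches("<h2>Intro</h2><h3>Details</h2><p>Body</p>"))
```

For the markup produced by the steps above, the sketch reports one mismatch, an h3 closed by an h2. Well-formed headings produce an empty list.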

Proposed resolution

Work out where the problem lies, and look for ways to update the fix in #3224559: Invalid HTML causes errors.

Files: comment #4, toc_api-fix_faulty_html-3416816_4.patch (2.81 KB), by john_b

Issue fork toc_api-3416816

Comments

John_B created an issue. See original summary.

john_b updated the issue summary.

john_b:

It seems that the Html::normalize call in TocBuilder.php fixes mismatched tags when the HTML is parsed with PHP's DOMDocument class, as it is in Drupal 9.

As a result, the toc_api module fixes broken HTML that would not be fixed by Drupal core's 'Fix broken HTML' filter.

However, this function does not fix mismatched tags when the HTML is parsed with Masterminds/html5-php, which D10's Html utility uses.

Cf. https://github.com/Masterminds/html5-php/issues/247

john_b:

Status: new. File size: 2.81 KB.

In case someone wants to use the idea I suggested in #3224559: Invalid HTML causes errors, namely calling the PHP HTML Tidy extension, here is a patch that does that.

joseph.olstad:

Version: 8.x-1.x-dev » 2.0.x-dev
Status: Active » Needs work

Please note that 1.x is no longer supported.

All MRs should now be reviewed and, if still needed, retargeted against 2.0.x.

mrinalini9 made their first commit to this issue’s fork.

mrinalini9:

Status: Needs work » Needs review

joseph.olstad:

It would be best to wrap the normal processing call in a try/catch: perform normal processing in the try block and, if the HTML is mangled, catch the exception and log it in the dblog so that people know about it. Ideally, auto-correcting faulty HTML would be optional, so add a configuration option for it. Auto-correct the faulty HTML only after logging that an exception due to faulty HTML has occurred, and only if the configuration option is enabled.
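
The flow described in this comment could look roughly like the sketch below. It is written in Python purely to show the control flow; it is not code from toc_api, from the patch in #4, or from the issue fork. Every name in it (FaultyHtmlError, build_toc, autocorrect, build_toc_safely, autocorrect_enabled) is hypothetical, the regex-based heading handling is a stand-in for real parsing, and returning an empty TOC when auto-correction is disabled is an assumption, since the comment does not say what should happen in that case.

```python
import logging
import re

logger = logging.getLogger("toc_api_sketch")  # stand-in for the dblog channel

# Matches an opening heading tag, its content, and the next closing heading tag.
HEADING_PAIR = re.compile(
    r"<(h[1-6])\b([^>]*)>(.*?)</(h[1-6])\s*>", re.IGNORECASE | re.DOTALL
)


class FaultyHtmlError(ValueError):
    """Hypothetical exception: a heading is closed by a different heading tag."""


def build_toc(html):
    """Stand-in for normal processing: return (level, title) pairs or raise."""
    toc = []
    for match in HEADING_PAIR.finditer(html):
        opening, closing = match.group(1).lower(), match.group(4).lower()
        if opening != closing:
            raise FaultyHtmlError(f"<{opening}> closed by </{closing}>")
        toc.append((int(opening[1]), match.group(3).strip()))
    return toc


def autocorrect(html):
    """Rewrite each heading's closing tag so that it matches its opening tag."""
    return HEADING_PAIR.sub(
        lambda m: f"<{m.group(1)}{m.group(2)}>{m.group(3)}</{m.group(1)}>", html
    )


def build_toc_safely(html, autocorrect_enabled=False):
    """Try normal processing; on faulty HTML, log first, then maybe correct."""
    try:
        return build_toc(html)
    except FaultyHtmlError as error:
        # Always log before doing anything else, so the fault is visible.
        logger.warning("Faulty HTML in TOC source: %s", error)
        if not autocorrect_enabled:
            return []  # assumption: no TOC rather than a broken one
        return build_toc(autocorrect(html))


faulty = "<h2>Intro</h2><h3>Details</h2>"
print(build_toc_safely(faulty))
print(build_toc_safely(faulty, autocorrect_enabled=True))
```

With the option off, the faulty markup is logged and no TOC is built. With it on, the closing tag is rewritten to </h3> after logging and the TOC contains both headings.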

joseph.olstad:

Status: Needs review » Needs work