Problem/Motivation
On Drupal 9, faulty HTML does not break the TOC; on D10 it does. It works on D9 because it was fixed in #3224559: Invalid HTML causes errors.
That fix no longer works because Drupal has changed how its Html utility parses and serializes HTML. If core/lib/Drupal/Component/Utility/Html.php is replaced with a copy from D9, the problem goes away. I have not worked out exactly what changed, but the core issue that made the change, with links to the commit, is #2441811: Upgrade filter system to HTML5.
Steps to reproduce
Take a node whose TOC includes an <h3> heading. Keep the opening <h3> tag, but change its closing tag to </h2>.
Proposed resolution
Work out where the problem lies, and look for ways to update the fix in #3224559: Invalid HTML causes errors.
| Comment | File | Size | Author |
|---|---|---|---|
| #4 | toc_api-fix_faulty_html-3416816_4.patch | 2.81 KB | john_b |
Issue fork toc_api-3416816
Comments
Comment #2
john_b commented
Comment #3
john_b commented
It seems that Html::normalize() in TocBuilder.php fixes mismatched tags when the underlying parser is PHP's DOMDocument class, as it is in Drupal 9.
As a result, on D9 the toc_api module fixes broken HTML that Drupal core's 'Fix broken HTML' filter would not fix.
However, the same function does not fix mismatched tags when the parser is Masterminds/html5-php, which D10's Html utility uses.
Cf. https://github.com/Masterminds/html5-php/issues/247
Comment #4
john_b commented
In case someone wants to use the idea I suggested in #3224559: Invalid HTML causes errors, namely calling the PHP HTML Tidy extension, here is a patch that does that.
Comment #5
joseph.olstad commented
Please note that 1.x is no longer supported.
If still needed, all MRs should now be reviewed and targeted at 2.0.x.
Comment #8
mrinalini9 commented
Comment #9
joseph.olstad commented
It would be best to wrap the normal processing logic in a try/catch: perform normal processing in the try block and, if there is mangled HTML, catch the exception and log it to dblog so that people know about it. Ideally, auto-correcting faulty HTML should be optional, so add a configuration option for it. Only auto-correct the faulty HTML after an exception caused by faulty HTML has been logged, and only if the configuration option is enabled.
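The control flow proposed in comment #9 can be sketched independently of Drupal. The Python below is a minimal, hypothetical sketch, not toc_api or core code. check_html() stands in for the normal TOC-building path, and FaultyHtmlError, process_body(), the autocorrect flag and the fixer callable are all illustrative names. The tag-mismatch check is a naive stack-based heuristic built on Python's html.parser, so it will also flag legitimately omitted end tags such as an unclosed <li>. In the module itself, the fixer could be something like the Tidy-based repair from #4, and the flag would be a configuration setting.

```python
import logging
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("toc_sketch")

# Void elements never have an end tag, so they are never pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class FaultyHtmlError(ValueError):
    """Hypothetical exception raised when mismatched tags are found."""


class TagMismatchFinder(HTMLParser):
    """Records end tags that do not close the most recently opened element."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.mismatches: List[Tuple[str, str]] = []  # (expected, found)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens and closes nothing here.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # Naive heuristic: assume this end tag was meant to close the
            # innermost open element, and record the mismatch.
            expected = self.stack.pop() if self.stack else ""
            self.mismatches.append((expected, tag))


def find_mismatches(html: str) -> List[Tuple[str, str]]:
    parser = TagMismatchFinder()
    parser.feed(html)
    parser.close()
    return parser.mismatches


def check_html(html: str) -> str:
    """Stand-in for the normal processing path: raises on faulty markup."""
    mismatches = find_mismatches(html)
    if mismatches:
        raise FaultyHtmlError(mismatches)
    return html


def process_body(
    html: str,
    autocorrect: bool,
    fixer: Optional[Callable[[str], str]] = None,
) -> str:
    try:
        return check_html(html)
    except FaultyHtmlError as exc:
        # Always log first, so site builders learn about the faulty markup.
        logger.warning("Faulty HTML in TOC source: %s", exc)
        # Only then, and only if the setting is enabled, try to repair it.
        if autocorrect and fixer is not None:
            return fixer(html)
        return html
```

With the markup from the steps to reproduce, find_mismatches("<h3>Intro</h2>") reports [("h3", "h2")]. process_body() then logs a warning and returns the input unchanged unless autocorrect is true and a fixer is supplied. Logging always happens before any repair, which matches the ordering requested above.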
Comment #10
joseph.olstad commented