Problem/Motivation
This issue is part of #1333730: [Meta] PHP DOM (libxml2) misinterprets HTML5.
The Drupal\system\Tests\Theme\FunctionsTest needs to be upgraded to HTML5, along with other parts of the code using the php DOM extension to do parsing of HTML.
Blocked on #2667340: Usage of field_prefix and container-inline creates invalid markup. which was fixed in #3072752: Invalid markup in ImageItem (causes test failures with symfony/dom-crawler:4.3+).
Beta phase evaluation
| Issue priority | Major because the testing system needs HTML5 support to test HTML5 code. |
|---|---|
| Prioritized changes | The main goal of this issue is a bugfix to for HTML5 support, which is part of the Drupal 8 product. |
Commit credits
Please give commit credits to all who have worked on the parent issue (#1333730: [Meta] PHP DOM (libxml2) misinterprets HTML5).
Steps to reproduce
N.A., is about the use of Masterminds for HTML parsing on tests instead of php DOM extension parsing for HTML.
Proposed resolution
Use Masterminds HTML parsing.
Remaining tasks
Decide if: (a) apply the patch at #94 as it is, and open a follow-up for the help topics test fix; or (b) apply the patch from #100 with the work-around, and open a follow-up for fixing the one edge case in the help topics test that is failing; or (c) continue fixing that help topics test case here.
User interface changes
N.A.
API changes
N.A.
Data model changes
N.A.
Release notes snippet
N.A.
| Comment | File | Size | Author |
|---|
Issue fork drupal-2441373
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #1
daffie commentedComment #2
daffie commentedPostponed on #2429363: Add HTML5-lib to Drupal 8 core for the filter system and for the testing system.
Comment #3
daffie commentedComment #4
daffie commentedThe library #2429363: Add HTML5-lib to Drupal 8 core for the filter system and for the testing system has landed.
Comment #5
star-szr@daffie so is anything else needed or is the patch here complete from your perspective?
Comment #6
joelpittetThis is not really a bug but I does look good and consistent to what we've previously done in core with StyleTestBase.
Comment #9
joelpittetBack to RTBC
Comment #10
alexpottThe meta issue lists several issues each changing a single test - can these please be rolled into 1 issue - breaking this type of work into several issues makes more work for everyone. Also
makes it a lot of work for a committer to work out who to credit.
Comment #12
daffie commentedThe new patch contains the patches from:
Comment #14
daffie commentedComment #15
jhedstromComment #16
alvar0hurtad0Start working on reroll
Comment #17
alvar0hurtad0Rerolled #1
Comment #18
daffie commented@alvar0hurtad0: Thank you for the reroll of the patch from comment #1, but what needs to be done is to reroll the patch from comment #12.
Comment #19
alvar0hurtad0ups @daffie,
lets go :D
Comment #20
daffie commented@alvar0hurtad0: Can you add an interdiff.txt file. I makes reviewing of your patch a lot easier. :-)
Comment #21
alvar0hurtad0Applying reroll of #12
Comment #24
mgiffordComment #25
effulgentsia commentedWhy are the following changes to non-test code needed? If we can remove them, then this can be tagged "rc eligible" per https://www.drupal.org/core/d8-allowed-changes#rc.
.
.
.
Comment #26
eporama commentedHere's a straight reroll of #21. No interdiff since I haven't changed anything except to move sections to make it apply. I checked #21 against #12 and the only functional change (other than moving blocks was that #12 had the following:
which #21 seems to have condensed to:
which seems functionally equivalent.
I will take my reroll and remove the changes to non-test files as noted by @effulgentsia in #25 since those are commenting field prefix and suffix that shouldn't matter and changing the return type hint for the Entity.php label function should also not be part of this test as far as I can see.
Comment #27
eporama commentedHere is a new patch and an interdiff. The interdiff only removes the trivial changes to non-test files.
Comment #29
twistor commentedThe HTML5 parser has added the option 'disable_html_ns' which removes all of the weird namespace stuff I was doing before. Woot.
Comment #31
twistor commentedThis adds the meta-refresh hack back.
Comment #33
twistor commentedThis adds back some commented out field prefixes. The reason for those is to just get the tests passing. We're generating invalid markup in some places. That should be a separate issue.
Comment #35
twistor commentedComment #37
twistor commentedComment #39
twistor commentedThis adds an incrementor inWebTestBase::checkForMetaRefresh(). I frankly don't understand how it worked before.Edit: It was there the whole time, I just removed it in a previous patch.
Comment #40
joelpittetThanks for tackling this @twistor!
Had a review through the latest patch:
Of course we'd love to move that to the template, are these making some failures?
multiple is valid HTML5 as boolean attributes. What made this change?
https://www.w3.org/html/wg/drafts/html/master/infrastructure.html#boolea...
Nice to see we don't ahve to supress.
This move is incorrect in the ordering. b before C.
Wasn't trimmed before, why is this needed?
Looks scope creepy, why all the changes in this hunk?
in scope?
Would be nice if xpath continued to work with XML or this could mess with contrib and may make this harder to get in.
Comment #41
twistor commentedThanks for the review! This was just a very quick attempt to get things passing so we could figure out what's broken.
Comment #42
twistor commentedThis addresses #4 and #8.
Comment #44
twistor commentedRegarding point 2, in comment #40:
This is changed because simpletest is emulating the ajax API, and the HTML5 parser is changing the attributes. This could be fixed, but I'm not sure if it matters.
Comment #45
joelpittetI figured as much, I just thought I'd see what direction this was going.
I had a pain with one test that didn't work with DomDocument or HTML5, which was really sad. I ended up going for Regex to ensure the regression didn't happen. IIRC they were ignoring or correcting the markup which is not cool. The problem was nested
<form>tags in views_form. For your reference in-case you want to maybe avoid some strange things #2558061: Nested form tag in views formsComment #46
joelpittetFeel free to spin-off follow-ups for things that start diverging from the main IS goal.
Comment #47
twistor commented#2667340: Usage of field_prefix and container-inline creates invalid markup. created.
Here's a patch with that added.
Comment #48
twistor commentedUpdate on point 2, in comment #40:
Core is settings multiple="multiple" by default. It was \DOMDocument that was changing it. The current test and behavior is actually more correct.
Comment #49
twistor commented#2667358: CheckboxTest doesn't test what it thinks it's testing. addresses point 7 from #40.
Comment #53
daffie commentedComment #54
daffie commented#2667358: CheckboxTest doesn't test what it thinks it's testing. has been fixed.
Comment #55
Sonal.Sangale commentedComment #56
daffie commentedOn #2667340: Usage of field_prefix and container-inline creates invalid markup..
Comment #57
Sonal.Sangale commentedComment #66
quietone commentedTriaging issues in simpletest.module as part of the Bug Smash Initiative to determine if they should be in the Simpletest Project or core.
This looks like it belongs in the testing component, phpunit.
Comment #68
catchComment #69
catchComment #70
jibranThe blocker issue #2667340: Usage of field_prefix and container-inline creates invalid markup. was fixed in #3072752: Invalid markup in ImageItem (causes test failures with symfony/dom-crawler:4.3+) so this is not postponed anymore.
Comment #71
jibranHere is a reroll from last patch. We added more
new \DOMDocument();since then so this patch NW for that.Comment #73
joseph.olstadTwo patches, one is just comming out of thin air, (73b)
the other is a reroll of patch 71 (73)
lets see how ugly it looks
Comment #75
joseph.olstadI'm not sure what needs to be done (yet) to get the last passing tests, 73b wasn't good so start from 73.
any assistance appreciated
https://www.drupal.org/files/issues/2021-07-20/D93x-2441373-73.patch
Comment #76
longwaveI don't understand why we need to make these changes. The tests are passing with the HTML4 DOMDocument parser, and the updated tests don't seem to be any different except for the parser. What benefits does switching to the HTML5 parser bring to the tests?
Comment #77
longwaveThe IS only talks about one class but scope was expanded to more classes, it needs updating to cover what we are supposed to be fixing here and why.
Comment #78
joseph.olstadDOMDocument should be from masterminds HTML5 not xhtml
there's many postponed issues held up on this issue.
Currently the "Limit allowed HTML tags and correct faulty HTML" parser in core is still using xhtml not HTML5, it's rewriting HTML5 into xhtml. There's a whole bunch of postponed issues that are hinging on updating the tests to use the HTML5 masterminds api in this above patch.
Perhaps I've misunderstood but it seems that this issue blocks almost all of the HTML5 fixes from moving forward.
***EDIT*** I will retest my observations on a fresh install of of D9.2.1
Comment #79
joseph.olstadok perhaps #3133749: Deprecate masterminds/html5 as a production dependency for Drupal 10
changes things
one way or another, there's some work to do to completely fix HTML5 support and test coverage
Comment #80
longwaveI still don't see why the test changes have been split out to a different issue. How do we know which tests to change, and why are we doing them separately from the actual change to the filter system?
Comment #81
joseph.olstadI asked myself the very same question as brought up by @longwave in comment #80
Looks like we're very close to completion with #2441811: Upgrade filter system to HTML5
Comment #86
andypostas commited #2441811: Upgrade filter system to HTML5
Comment #87
vsujeetkumar commentedRe-roll patch created as per #86. rdf changes has been removed from the patch as it is not part of core now.
Comment #88
vsujeetkumar commentedFixed the test case, Please have a look.
Comment #89
smustgrave commentedWas previously tagged for issue summary update which believe still needs to happen.
Also not 100% about the test fix in #88. Is that what HTML5 requires?
Comment #90
vsujeetkumar commented@smustgrave, Will take the reference from "related & committed" issue in #86, Please have a look on file "core/modules/filter/tests/src/Kernel/FilterCaptionTwigDebugTest.php" in the related issue and advise.
Comment #91
kostyashupenkoComment #92
longwaveCan we use
Html::load()here instead of using the HTML5 parser directly? Same for all the other places that we call newHTML5().Comment #93
vsujeetkumar commentedAddressed #92, Used Html::load() wherever we use HTML5(). Patch created, Please have a look.
Comment #94
vsujeetkumar commentedFixed test cases, Please have a look.
Comment #95
longwaveThis looks good to me so far. The remaining uses of
new DOMDocument()in tests are:The latter two are explicit cases where we still need DOMDocument - loading XML instead of HTML and testing a specific serialization bug.
Not sure what to do with HelpTopicsSyntaxTest; it uses libxml to ensure the HTML is valid and unsure how we can port this to the HTML5 library.
Comment #96
marvil07 commented@longwave, there may be a port for HelpTopicsSyntaxTest .
Here a version of the validation there using Masterminds\HTML5 functionality.
Let us see what CI thinks.
Comment #97
vsujeetkumar commentedFixed the CCF issues.
Comment #99
marvil07 commented@vsujeetkumar, thanks for the variable typo fix, I was hoping to not need to run it locally :'-)
The test fail in #97 was related to the approach change.
Basically the error strings are different on Masterminds\HTML5 parser, than in php DOM extension libxml2.
Gladly there seems to be only one actual relevant error tested from the library output, the rest are errors created on the test itself related to structure.
After changing a bit how this works, I arrived to a point were it is mainly OK, but there is a problem in one case, on
bad_html3help topic error template.Likely the parsing is different and produces a bit different error for that case, or there is an actual error to fix.
I am attaching a couple of patches, one expected to fail without the workaround, and other skipping the check on that specific case.
At this point we may either use the patch at #94 and open a follow-up specifically for help topics, that my use the code on the patches in this comment; or continue here.
Comment #100
vsujeetkumar commented#99 patch fails to apply, created updated path without fails. Also on below comment I am also thinking the same for now we can go with #94 and do followups on help topics, Here we need expert advise.
Comment #102
marvil07 commented@vsujeetkumar: thanks for re-exporting the patch as p1!
It seems I exported p0 instead :'-)
I see the test results reflect the hypothesis \o/
Comment #103
smustgrave commentedWas previously tagged for issue summary update which still believe is needed.
Since this is fixing tests I'm wondering if it should just continue here. Is there a large number of help_topic tests that need to be fixed?
Comment #104
marvil07 commented@smustgrave,
I have updated the description.
Maybe :-)
Per my comment above at #99, I would suggest to apply the change on #94, which cover most of the cases, and open at least one follow-up for the help_topic test.
To answer the question, it is only one edge case in one of the help topics tests failing, but since that test is walking over all help topics data in core, it may be worth a follow-up.
Also, I noticed I did not even mentioned the other two cases @longwave mentioned on #95, but I arrived to the same conclusion.
On core/tests/Drupal/KernelTests/AssertContentTrait.php case, that new DOMDocument object is only used for parsing xml content, and around it, it already uses the Html::load() helper that will use HTML5 version from Masterminds.
Here an extract:
On core/tests/Drupal/Tests/Component/Utility/HtmlTest.php case, it also seems OK.
The use there is just a new empty object passed on the next line to
Html::serialize(), of which the signature of the method requires the DOMDocument class.In other words, it is not really used for parsing, so it is OK to leave it as it.
Comment #105
smustgrave commented@marvil07 thanks for the explanation. Opened up a follow up for help_topics.
Comment #106
alexpottThere's a broken test. And...
Is this checking for
<?xml<code> actually necessary? Before we were adding <code><?xml encoding="UTF-8">to the content so how would have the content that starts withWhich results in the following SimpleXml element
Which is nonsense compared to the input. We've added a body - because we're treating it as HTML and removed the spurious body from the HTML.
I think parse has always been expecting HTML and nothing else.
Comment #107
alexpottI guess #94 is supposed to be the patch to commit. Can that be re-uploaded so it is the last patch on the issue otherwise it is very very confusing. The comment about AssertContentTrait::parse() from #106 still stand. I couldn't see any supporting evidence on the issue that we need to support XML in that method and if we do then we need to have some test coverage of it in AssertContentTraitTest.
Comment #109
marvil07 commented@alexpott, that may be the case, I guess the changes were being cautious, trying to not change the logic more than needed.
I see that the actual
parse()method was added at 0858e9350b8d99a8476b78a845e1531fb124acef in 2018, so it has been there for a while.That method is part of
AssertContentTrait, which is mainly used inKernelTestBase, an naturally onDrupalKernelby extension.I do not see direct uses of
parse().The following is an attempt to search for that, which only leads a false positive, i.e. not using
$this.Even if no direct uses,
parse()is used indirectly through the highly usedxpath()method on the same trait.Said that, I could not find anything, at least with the following search, of use with non-HTML.
In other words, it likely makes sense to remove the support to try to parse XML, since all found uses are about HTML.
I see test results pass after the change, so you were probably right!
Yes, I guess we have not been hiding files enough in this issue to make that clear.
I have just opened a new merge request with two commits, the first apply the patch in 94, and the second removes support for XML on the
parse()method.The interdiff would be the new commit, so let me point to that directly instead of uploading one here .
Comment #110
smustgrave commentedOnly other test I see that has loadXML is HelpTopicsSyntaxTest which was broken off to a follow up. So remarking this one.
Comment #111
alexpottCommitted and pushed 13883cc3808 to 11.x and ca47a29ef9f to 10.2.x. Thanks!
Backported to 10.2.x as this is making only test changes.