Hey,
I've just found an issue with the text filter 'Correct faulty and chopped off HTML' on D8.
It seems to incorrectly close the <source> tag.

For example:

<video>
    <source src="..." type="video/mp4">
</video>

With the filter enabled, this outputs the following:

<video>
    <source src="..." type="video/mp4"></source>
</video>

This happens even if I self-close the tag. Either way, the resulting </source> end tag is invalid HTML, since <source> is a void element.

The correct result would be to not act on <source>.

I'm also not sure whether this filter actually removes incorrect HTML closing tags.

Thanks,
Oliver.

Comments

ocastle created an issue.

eporama

I think this is due to how we process the filtered text in FilterHtmlCorrector::process(). We call Html::normalize($text), which loads the text snippet as the body of a new HTML document, builds a DOMDocument from it, and then calls the saveXML() method to write out valid XML for each snippet.

However, Html::load() builds the new HTML document as an XHTML document:

  public static function load($html) {
    $document = <<<EOD
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
<body>!html</body>
</html>
EOD;
    // PHP's \DOMDocument serialization adds extra whitespace when the markup
    // of the wrapping document contains newlines, so ensure we remove all
    // newlines before injecting the actual HTML body to be processed.
    $document = strtr($document, array("\n" => '', '!html' => $html));

    $dom = new \DOMDocument();
    // Ignore warnings during HTML soup loading.
    @$dom->loadHTML($document);

    return $dom;
  }

However, <source> isn't a valid XHTML element, so it ends up serialized with an explicit closing tag. I don't know what the full ramifications of changing that XHTML document template to an HTML5 template would be, but I ran a quick test with this template:

    $document = <<<EOD
<!DOCTYPE html>
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
<body>!html</body>
</html>
EOD;

This produced the following:

<video>
  <source src="..." type="video/mp4"/>
</video>
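To reproduce this comparison outside Drupal, here is a minimal standalone sketch. It assumes PHP with the DOM extension enabled. normalize_with_template() is a hypothetical helper that only approximates what Html::normalize() does as described above (wrap the snippet in a document template, load it with loadHTML(), then write out each <body> child with saveXML()); it is not Drupal's actual code. The exact output depends on the libxml2 version PHP is linked against, so no expected output is claimed here.

```php
<?php
// Hypothetical helper (not Drupal's code): approximates Html::load() followed
// by serializing the <body> children with saveXML(), so different wrapper
// templates can be compared side by side.
function normalize_with_template(string $html, string $template): string {
    // Strip newlines from the wrapper and inject the snippet in a single
    // strtr() pass, mirroring the approach shown in Html::load().
    $document = strtr($template, ["\n" => '', '!html' => $html]);

    $dom = new DOMDocument();
    // Ignore warnings during HTML soup loading (libxml2 warns on tags it
    // does not know, such as <video>).
    @$dom->loadHTML($document);

    // Serialize each child node of <body> as XML and concatenate the results.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $node) {
            $out .= $dom->saveXML($node);
        }
    }
    return $out;
}

// The XHTML wrapper currently used by Html::load().
$xhtml_template = <<<EOD
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
<body>!html</body>
</html>
EOD;

// The HTML5 wrapper from the quick test above.
$html5_template = <<<EOD
<!DOCTYPE html>
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
<body>!html</body>
</html>
EOD;

// Example snippet; the file name is a placeholder.
$snippet = '<video><source src="movie.mp4" type="video/mp4"></video>';

echo "XHTML wrapper: ", normalize_with_template($snippet, $xhtml_template), "\n";
echo "HTML5 wrapper: ", normalize_with_template($snippet, $html5_template), "\n";
```

Running it prints the serialized snippet for each wrapper, so the handling of <source> under the two doctypes can be compared directly on your own PHP/libxml2 build.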
eporama

It looks like this is already being discussed in #1333730: [Meta] PHP DOM (libxml2) misinterprets HTML5, and a much more extensive solution is being handled in #2441811: Upgrade filter system to HTML5, which includes the same change as my quick test. So I would say this is probably a duplicate and can be closed.

ocastle

Status: Active » Closed (duplicate)

Agreed, #1333730 covers the issue.

Closing as a duplicate.