#2191285-17: Text module is not required, but is marked as required discovered that _filter_htmlcorrector() is used in many places throughout core, not related to Filter module.

→ Convert the internals of _filter_htmlcorrector() into a Drupal\Core\Utility\Html class.


Note: #1333730: [Meta] PHP DOM (libxml2) misinterprets HTML5 might replace this utility class with an external HTML5 library later on, but for now, we need to unblock #2191285: Text module is not required, but is marked as required, so we can fix a hidden broken module dependency problem.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

sun’s picture

Status: Active » Needs review
FileSize
22.45 KB

Attached patch moves the corresponding functions into a new Drupal\Component\Utility\Html class.

Next to fixing/modernizing phpDoc + signatures, it also adds a Html::normalize() helper function that essentially acts as a 1:1 replacement for _filter_htmlcorrector().

sun’s picture

FileSize
29.6 KB
7.81 KB

Actually, let's get rid of the filter_dom_* functions entirely while being there.

The last submitted patch, 1: filter.dom_.1.patch, failed testing.

sun’s picture

FileSize
29.6 KB
678 bytes

Reverted change in theme_file_upload_help(). → #2195891: Clean faulty HTML in all administrative strings

The last submitted patch, 2: filter.dom_.2.patch, failed testing.

amateescu’s picture

+++ b/core/lib/Drupal/Component/Utility/Html.php
@@ -0,0 +1,135 @@
+/**
+ * Provides DOMDocument helpers for parsing and serializing HTML strings.
+ */
+class Html {

A Html class name doesn't really make sense on its own (without taking the namespace into consideration), how about DOMDocument or DOMDocumentHelper?

jibran’s picture

Issue tags: +PHPUnit
  1. +++ b/core/lib/Drupal/Component/Utility/Html.php
    @@ -0,0 +1,135 @@
    +   * @param string $html
    

    IMHO $html should be $text or $xml because it is not html.

  2. +++ b/core/lib/Drupal/Component/Utility/Html.php
    @@ -0,0 +1,135 @@
    +    $prefix = <<<EOD
    +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    +<html xmlns="http://www.w3.org/1999/xhtml">
    +<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
    +<body>
    +EOD;
    +    $suffix = '</body></html>';
    

    Can we rename it to doc? We can print $html in the heredoc so no need for $prefix and $suffix.

  3. +++ b/core/modules/filter/lib/Drupal/filter/Plugin/Filter/FilterHtmlCorrector.php
    --- a/core/modules/filter/lib/Drupal/filter/Tests/FilterUnitTest.php
    +++ b/core/modules/filter/lib/Drupal/filter/Tests/FilterUnitTest.php
    

    We should convert it phpUnitTest so that we can use dataProvider.

Status: Needs review » Needs work

The last submitted patch, 4: filter.dom_.3.patch, failed testing.

Berdir’s picture

+++ b/core/lib/Drupal/Component/Utility/Html.php
@@ -0,0 +1,135 @@
+    $suffix = '</body></html>';
+
+    $document = new DOMDocument();
+    // Ignore warnings during HTML soup loading.

Needs to be \DOMDocument

We discussed PHPUnit too, but that test class tests a bunch of other functions too, which we would either have to convert too or split the test up. I'd prefer to not do that here.

sun’s picture

Status: Needs work » Needs review
FileSize
29.56 KB
1005 bytes

Fixed Html::load().

re: #6:

I wasn't sure what the most appropriate class name would be at the beginning, but after studying the provided functionality, this class solely provides helper methods for operating on snippets of HTML strings, whereas the most common use-case is just simply Html::normalize().

In other words, this is not a helper class for DOMDocument — usage of that is a detail that is internal to the utility class and should not be exposed in the public naming.

I considered HtmlHelper first, but given that String is also just String and not StringHelper, and because I think it is rather unlikely that there's going to be a different Html class anywhere (and both imported into the same file), the name Html appears to be most appropriate and most consistent (→ DX) right now.

We can consider to change the utility class names from Foo into FooHelper, but for DX/consistency, that effort should then rename all of them IMHO.

re: #7:

The accepted variable is actually supposed to contain HTML, not plain-text. Therefore, $html is the proper variable name.

As @Berdir mentioned, I'd like to defer the filter test conversion to a separate issue. The primary purpose of this issue is to unblock #2191285: Text module is not required, but is marked as required ASAP.

Status: Needs review » Needs work

The last submitted patch, 10: filter.dom_.10.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
29.67 KB
744 bytes

Fixed text_summary() is not able to supply all required parameters for FilterInterface::process().

Status: Needs review » Needs work

The last submitted patch, 12: filter.dom_.12.patch, failed testing.

jibran’s picture

I am removing phpunit tag and adding needs followup tag also now we need change record before the commit so adding the tag for a remainder.

The accepted variable is actually supposed to contain HTML, not plain-text. Therefore, $html is the proper variable name.

How about xhtml?
Some minor doc issues.

  1. +++ b/core/lib/Drupal/Component/Utility/Html.php
    @@ -0,0 +1,134 @@
    +   * a full DOMDocument object that represents this document.
    ...
    +   * Use Html::serialize() to serialize this DOMDocument back to a string.
    ...
    +   *   A DOMDocument that represents the loaded (X)HTML snippet.
    ...
    +   * The function serializes the body part of a DOMDocument back to an XHTML
    +   * snippet. The resulting XHTML snippet will be properly formatted to be
    ...
    +   *   A DOMDocument object to serialize, only the tags below the first <body>
    ...
    +   * DOMDocument::loadHTML() in Html::load() makes CDATA sections from the
    

    \DOMDocument, \Html and (X)HTML

  2. +++ b/core/lib/Drupal/Component/Utility/Html.php
    @@ -0,0 +1,134 @@
    +        // See drupal_get_js().  This code is more or less duplicated there.
    

    I think _drupal_get_js know

sun’s picture

Status: Needs work » Needs review
FileSize
29.95 KB
4.04 KB

How about xhtml?

No, it's just HTML. Details are internal to the methods and not going to be addressed here.

Fixed Heredoc in Html::load() causes extraneous whitespace upon serialization.

amateescu’s picture

sun’s picture

Would it be possible to move the naming discussion into a separate follow-up issue, please? The primary objective of this issue here is to unblock #2191285: Text module is not required, but is marked as required ASAP.

jibran’s picture

Issue tags: -PHPUnit

Just minor issues other then that RTBC for me. We just need followups and change record after this.

  1. +++ b/core/lib/Drupal/Component/Utility/Html.php
    @@ -33,41 +33,47 @@ public static function normalize($html) {
    +    // PHP's DOMDocument serialization adds straw whitespace in case the markup
    

    New comment it should be \DOMDocument.

  2. +++ b/core/lib/Drupal/Component/Utility/Html.php
    @@ -0,0 +1,139 @@
    + * Provides DOMDocument helpers for parsing and serializing HTML strings.
    ...
    +   * operates on an HTML string instead of a DOMDocument.
    

    \DOMDocumnet

  3. +++ b/core/lib/Drupal/Component/Utility/Html.php
    @@ -0,0 +1,139 @@
    +   * Converts a DOM object back to an HTML snippet.
    

    Instead of DOM object can we use \DOMDocument

  4. +++ b/core/lib/Drupal/Component/Utility/Html.php
    @@ -0,0 +1,139 @@
    +   * Adds comments around the <!CDATA section in a DOM element.
    

    Instead of DOM element can we use \DOMNode

  5. +++ b/core/lib/Drupal/Component/Utility/Html.php
    @@ -0,0 +1,139 @@
    +   * @param \DOMDocument $document
    +   *   The \DOMDocument that contains $node.
    +   * @param \DOMNode $node
    

    Just for the record. We can access \DOMDocument form \DOMNode. Maybe we can fix it in follow up.

sun’s picture

FileSize
29.85 KB
3.82 KB
  1. Fixed phpDoc.
  2. Leverage DOMNode::$ownerDocument.
jibran’s picture

Thank you very much for the changes. It is RTBC for me. It just needs followup and change record. :)

sun’s picture

Change notice: https://drupal.org/node/2197131

I'm leaving the creation of a follow-up issue to whoever feels that it is necessary, as I'm comfortable with the current code. As mentioned before, it would be best if such an issue would then have the objective to rename all of the utility classes into *Helper for consistency/DX reasons.

jibran’s picture

Thanks for the change notice. Can we add other function as well to it?

sun’s picture

Updated change notice for filter_dom_*() functions.

filter_dom_escape_cdata_element() is not included, because it never was a public API function, only used internally by filter_dom_serialize().

@jibran, you could have edited the change notice yourself, especially after nitpicking this patch to death. ;)

jibran’s picture

Status: Needs review » Reviewed & tested by the community

Thanks for the changes so RTBC. :)

@sun you seem perfectly motivated to make the changes so I don't want to disturb your flow. :D

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed a16b563 and pushed to 8.x. Thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

yched’s picture

[edited out : wrong issue]

yched’s picture

Oops, never mind the previous post, wrong issue.