#374441: Refactor Drupal HTML corrector (PHP5) added some code to load text into a DOM object and then convert it back to a string.
As shown by #362972: Filters should remove "rel" attributes instead of just adding rel="nofollow" and #16161: Move Read More link to end of node content, exactly the same code will be used by any filter that doesn't want to use regexp voodoo to alter the HTML.
Given that, here is a patch that creates two functions to load and serialize the DOM. The phpdoc probably needs improvement.
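
To illustrate the idea, here is a minimal sketch of such a load/serialize pair and of a filter built on top of it. It is not the code from the patch: the names example_dom_load(), example_dom_serialize() and example_add_nofollow() are hypothetical, and wrapping the snippet in a bare html/body skeleton with a UTF-8 meta tag is an assumption made for this example.

```php
<?php
/**
 * Hypothetical helper: parse an HTML snippet into a DOMDocument.
 *
 * The snippet is wrapped in a minimal document so that it ends up inside
 * <body> and is read as UTF-8 (an assumption of this sketch).
 */
function example_dom_load($text) {
  $dom_document = new DOMDocument();
  // Collect parser warnings internally instead of emitting them, since
  // user-submitted HTML is often tag soup.
  $previous = libxml_use_internal_errors(TRUE);
  $dom_document->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $text . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  return $dom_document;
}

/**
 * Hypothetical helper: turn the children of <body> back into a string.
 */
function example_dom_serialize(DOMDocument $dom_document) {
  $body_node = $dom_document->getElementsByTagName('body')->item(0);
  $output = '';
  foreach ($body_node->childNodes as $child_node) {
    $output .= $dom_document->saveXML($child_node);
  }
  return $output;
}

/**
 * Hypothetical filter: set rel="nofollow" on every link without regexps.
 */
function example_add_nofollow($text) {
  $dom_document = example_dom_load($text);
  foreach ($dom_document->getElementsByTagName('a') as $link) {
    // setAttribute() replaces any existing "rel" value.
    $link->setAttribute('rel', 'nofollow');
  }
  return example_dom_serialize($dom_document);
}
```

A filter would then only need to call example_add_nofollow($text); the parsing and serializing stay in one place, which is the point of sharing the two helpers.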

Comments

tic2000’s picture

And a patch with a one-line HTML corrector.

tic2000’s picture

And now a patch that removes the first regexp.
I hope that not many kittens are killed in the process.

Status: Needs review » Needs work

The last submitted patch failed testing.

tic2000’s picture

Status: Needs work » Needs review

I saw that coming. New patch. It should fix the exceptions.

Damien Tournoud’s picture

Thanks, tic2000.

A few nitpicks:

+ * You can use filter_dom_serialize( ) to serialize this DOMDocument

^ That should be filter_dom_serialize()

+ * The function serializes the body part of a DOMDocument
+ * back to an XHTML snippet.
+ * The resulting XHTML snippet will have a space before the trailing />
+ * of closing elements, for better rendering on HTML user agents.

There is a blank line missing between the two sentences.

Let's rewrite the second one as "The resulting XHTML snippet will be properly formatted to be compatible with HTML user agents."
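
For reference, the formatting being described can be sketched as a single replacement over the serialized string. This is only an illustration: example_space_self_closing() is a hypothetical name, and the pattern below is a simplified stand-in chosen for this example, not necessarily the one used in the patch. It assumes the input is already serialized XML, where "/>" only occurs at the end of an empty element.

```php
<?php
/**
 * Hypothetical helper: make sure empty elements end in " />".
 *
 * Any whitespace directly before "/>" is collapsed to a single space, so
 * "<br/>" becomes "<br />" and "<br />" is left unchanged.
 */
function example_space_self_closing($xhtml) {
  return preg_replace('~\s*/>~', ' />', $xhtml);
}
```

The space matters because HTML user agents that do not understand XML syntax can misread the trailing slash when it directly follows an element name or an unquoted attribute value.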

tic2000’s picture

Damien Tournoud’s picture

Status: Needs review » Reviewed & tested by the community

Yay! Thanks.

Dries’s picture

Status: Reviewed & tested by the community » Needs work

Committed this to CVS HEAD. Thanks!

I'm marking this 'code needs work' because I feel it is lacking some isolated tests -- or at least, maybe we should consider refactoring some of the existing tests. If you disagree, feel free to mark this 'fixed'.

tic2000’s picture

What should they test? They are just wrapper functions. I mean, the load wrapper just calls DOMDocument::loadHTML(). I don't think we need a test for that.
The serialize function is a little more complex, but I think the tests for the HTML corrector cover that.

dropcube’s picture

Tests in FilterUnitTest->testHtmlCorrectorFilter() cover that.

tic2000’s picture

Status: Needs work » Fixed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.