Offshoot of #721536: HTML corrector filter has problems with unescaped CDATA and incorrectly closed tags.
The HTML corrector filter is pretty flawed: it uses the PHP DOM extension to try to force HTML to be XHTML-compliant. That is not a true tidier, and it can have side effects. Tidy, on the other hand, produces, well... tidier output, and it is bundled with PHP.
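To make concrete what "correcting" HTML involves: mostly closing elements that were left open, dropping stray closing tags and normalizing void tags. The Python sketch that follows is a hypothetical, minimal tag balancer built on the standard library's `html.parser`. It is neither Drupal's filter (which, as described above, goes through PHP's DOM extension) nor Tidy, and the names `TagBalancer` and `correct` are made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (a subset of the HTML void elements).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Hypothetical sketch: re-emit markup so every opened element is closed, in order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # serialized output fragments
        self.stack = []  # currently open (non-void) elements, innermost last

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive unescaped, so re-escape them on output;
        # value-less attributes become XHTML-style name="name".
        attr_text = "".join(
            f' {name}="{escape(value if value is not None else name)}"'
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text} />")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag (or a void tag's "</br>"): drop it
        # Close any elements left open inside this one, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Text arrives with entities decoded; re-escape &, < and >.
        self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def correct(markup: str) -> str:
    parser = TagBalancer()
    parser.feed(markup)
    parser.close()
    # Close whatever is still open at the end of the input, innermost first.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

For example, `correct("<b><i>x</b></i>")` returns `"<b><i>x</i></b>"`. It only balances tags: unlike Tidy or HTML Purifier it knows nothing about content models, so a `<div>` inside a `<p>` passes through unchanged.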
Comments
Comment #1
Owen Barton commented: I don't think it is quite accurate to say that Tidy comes bundled with PHP, at least not on Ubuntu.
I can install it using apt-get install php5-tidy, but that means users on shared servers will be stuck with ugly, broken HTML.
I would suggest we look at http://htmlpurifier.org/, which handles security filtering, standards validation and tidying, and is very high quality in my experience. It is LGPL, but we could legally fork it to a GPL licence each time (Drupal contributors would need to pass patches upstream themselves, however).
Comment #2
JacobSingh commented: Yeah, I've used that tool. It is quite good.
When I said bundled, I meant that it comes with the PHP source download; it isn't a PECL extension. Just because it isn't enabled by default on Ubuntu doesn't mean we can't use it, though. mod_rewrite isn't enabled by default either: clean URLs just work if you enable it, and if you don't, you get the fallback behavior.
I see this as a similar case. We could make all Drupal URLs look like /index.php/content/story so that we never depend on mod_rewrite, but that would be another example of using a substandard approach just to avoid a configuration step.
Even if we don't use Tidy (which I'm biased towards because it is built for exactly this and written in C), I would prefer HTML Purifier over the current DOM silliness. It would also be nice if it could replace our XSS filtering, which is a little too brute-force for my liking.
-J
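HTML Purifier and Drupal's XSS filter are both built around an allowed-tags list rather than trying to recognize every dangerous construct. A hypothetical, deliberately small Python sketch of that whitelist idea: keep only tags in `ALLOWED`, discard every attribute, drop `<script>`/`<style>` together with their contents, and re-escape all text. The tag list and the names `WhitelistFilter` and `filter_html` are assumptions for illustration; this is not how either project is implemented, it does not balance tags, and it has not been reviewed as a security filter.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; a real filter would make this configurable.
ALLOWED = {"p", "em", "strong", "ul", "ol", "li", "code", "blockquote"}
# Elements whose *content* must be discarded, not just their tags.
DROP_CONTENT = {"script", "style"}


class WhitelistFilter(HTMLParser):
    """Hypothetical sketch: emit only whitelisted tags, without attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED and not self.skip_depth:
            # All attributes are discarded, so no on* handlers or URLs survive.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so a decoded "&lt;script&gt;" cannot become markup.
            self.out.append(escape(data, quote=False))

    # Comments are dropped: HTMLParser.handle_comment is a no-op by default.


def filter_html(markup: str) -> str:
    parser = WhitelistFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

For example, `filter_html('<p onclick="x()">Hi <b>there</b></p>')` returns `"<p>Hi there</p>"`: the `onclick` attribute and the non-whitelisted `<b>` tags are gone, but the text survives.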
Comment #3
sun commented: This is sort of built into PHP 5's DOMDocument now.
It just needs a flag to be set to TRUE.
Comment #4
Hanno commented: Related: #1333730: [Meta] PHP DOM (libxml2) misinterprets HTML5
Comment #17
quietone at PreviousNext commented: @JacobSingh, thank you for the idea to help improve Drupal core.
The proposal doesn't meet the Criteria for evaluating proposed changes. In this case, there is no demonstrated demand and support for the change.