Does this module support HTML5 tags? I'm just about to recommend it for a new site & realizing that I can't find it in the docs.

Comments

ezyang’s picture

Status: Active » Fixed

Unfortunately, not at present.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

TravisCarden’s picture

Status: Closed (fixed) » Active

Are there any changes on this front? HTML Purifier is important to several of my sites, but HTML5 is also becoming quite important. Does the library itself support HTML5? If so, what needs to happen to make the module support it? I could possibly help with a patch, if the required changes aren't too extensive.

mgifford’s picture

I don't think much work has gone into HTML5 in the main library. There's this:
http://htmlpurifier.org/doxygen/html/classHTML5.html

Tidy's got a version that's supporting it though:
https://github.com/w3c/tidy-html5

For some reason it's still experimental though.

ezyang’s picture

There are two thrusts to HTML5. The first is implementing the HTML5 parsing specification; absent an extremely motivated individual, this is probably never going to happen. The second is implementing support for all of the new tags that HTML5 adds; I'm not planning on doing this, but the process is not difficult, just a little tedious, and if someone submits a patch to upstream for support and is willing to work with me to get it up to the standards of the HTML Purifier codebase, getting this in is plausible.

TravisCarden’s picture

Thanks for the update, ezyang. That's too bad. So do you foresee a future for HTML Purifier, or does this mean that it's basically gone into "minimal maintenance"? Do you recommend any alternatives for those of us who need HTML5 support? I do thank you for everything you gave us for so long!

ezyang’s picture

It's been minimal maintenance for a while ;-) I haven't seen anything with close to feature parity to HTML Purifier come into being yet; it's a bit of a shame, frankly.

TravisCarden’s picture

Well, @ezyang, HTML Purifier is central to at least one application I support, and HTML5 support is paramount to that application's ongoing development. How much effort might we be looking at to add support to the library? Days of work? Weeks? Months? I don't have free time to undertake a new project, but if the investment were manageable for my employer, I would be more than happy to work on it.

ezyang’s picture

I think you could get a large subset of HTML5 going in a few days of work. The easy part is adding the new tags that HTML5 introduced, which are basically the same as DIVs. The hard part is the really new stuff, like SVG, Canvas, or parsing.
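As a rough illustration of the "easy" part, here is a minimal sketch that registers a few of the new tags as if they were DIVs, using HTML Purifier's custom definition API. The include path, the definition ID 'html5-sketch', and the choice of tags are assumptions made for the example, and treating these elements as plain Block content is a simplification rather than a spec-accurate definition.

```php
<?php
// Assumption: the HTML Purifier library is available on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// A custom definition needs an ID and a revision number.
$config->set('HTML.DefinitionID', 'html5-sketch'); // hypothetical ID
$config->set('HTML.DefinitionRev', 1);
// Disable the definition cache while experimenting, so changes take effect.
$config->set('Cache.DefinitionImpl', null);

// Returns the raw, editable definition (or null if a cached one is used).
if ($def = $config->maybeGetRawHTMLDefinition()) {
    foreach (array('section', 'nav', 'article', 'aside') as $tag) {
        // Same shape as DIV: block element, flow content, common attributes.
        $def->addElement($tag, 'Block', 'Flow', 'Common');
    }
}

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<section><p>Hello</p></section>');
```

This only covers new elements that behave like existing ones; it does nothing for SVG, Canvas, or HTML5 parsing.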

heddn’s picture

Status: Active » Postponed

Waiting for support upstream.

heddn’s picture

If you want to follow the upstream conversation: http://htmlpurifier.org/phorum/read.php?2,6847,6847#msg-6847

Rob230’s picture

Any idea what problems it might cause if I use this on an HTML5 site? And which would be better to choose, XHTML 1.0 or HTML 4.01 Strict? I'm guessing it is not as simple as HTML4 being a subset of HTML5, but is HTMLPurifier really going to break anything if I'm just using CKEditor which doesn't use any new tags like <section>?

I really can't find an alternative module that offers the same level of usefulness, but unfortunately it looks like HTML5 support for HTMLPurifier won't be coming based on the thread linked above. The best alternative I can see is htmLawed, which has beta support for HTML5.

Sarah_G’s picture

Issue summary: View changes

I'm also anxious for HTML Purifier to support HTML5. I wanted to add a role attribute and could not get HTML Purifier to allow it. Isn't there any way to override or force it to allow an attribute or element?
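For reference, when the library is used directly (outside the Drupal module), a custom definition can add an attribute to an existing element. The following is only a sketch: the include path, the definition ID 'role-sketch', and the list of role values are assumptions, and whether the module exposes a place to run this is not covered here.

```php
<?php
// Assumption: the HTML Purifier library is available on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'role-sketch'); // hypothetical ID
$config->set('HTML.DefinitionRev', 1);
// Disable the definition cache while experimenting.
$config->set('Cache.DefinitionImpl', null);

if ($def = $config->maybeGetRawHTMLDefinition()) {
    // Allow a role attribute on div, restricted to an illustrative
    // list of values instead of arbitrary text.
    $def->addAttribute('div', 'role', 'Enum#navigation,main,banner,contentinfo');
}

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<div role="navigation">Menu</div>');
```

Restricting the attribute to an enumerated list keeps the filter strict; a free-text attribute would accept any value.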

Sarah_G’s picture

I'm trying to dig further, and it looks like some progress was made toward allowing the HTML Purifier library to use HTML5: https://github.com/kennberg/php-htmlpurfier-html5 Does the Drupal module not use the same library? Or can I change libraries and have it work?

lukus’s picture

Hi

I've managed to get the script above working with htmlpurifier. I'm making use of a forked version of the script, but there might be a more elegant way to integrate this. I'll submit a patch later today with my progress.

Do any of you know the best way to obtain the list of allowed elements (and attributes) from the htmlpurifier profile that's currently in use?

Thanks

Luke

lukus’s picture

Status: Postponed » Needs review
FileSize
3.01 KB

Hi

Here's the patch.

I've also created a forked version of the script, which is available at https://github.com/lukusw/php-htmlpurfier-html5

To make use of this, the repo contents need to be downloaded and added to your codebase at

/sites/all/libraries/htmlpurifier_html5

I still need to provide the list of allowed tags from the htmlpurifier profile; currently I've hardcoded the elements that are relevant to me. Any pointers to enable me to do this would be appreciated.

Thanks

Luke

Rob230’s picture

From looking at the code, I think this should work:

$config->get('HTML.Allowed');

lukus’s picture

Hey Rob

Thanks, I was assuming the same, but that provided an error—which led me to discover the script then goes on to set the value again.

Then it hit me—I don't need to set the allowed elements at all, as this is already done by the private function _htmlpurifier_get_config(). I just need to pass $config to the script, so it can set additional options.

As a result, I've attached another patch. I've also updated the repo for the forked version of the htmlpurifier_html5 script.

Could you let me know if this works for you?

Thanks

Luke

heddn’s picture

re #16: Not sure that you still need it but in fact, there is an example of how to do this in the Phorum module. We will use a similar solution for porting to D8, because it wants to know the tags that are touched by a filter.

    // Assumes $config is the HTMLPurifier_Config object for the active profile.
    $html_definition = $config->getDefinition('HTML');
    // The keys of ->info are the names of the allowed elements.
    $allowed = array_keys($html_definition->info);
    sort($allowed);
    $allowed_text = implode(', ', $allowed);
    echo $allowed_text;

heddn’s picture

I'm considering accepting #18 as a patch, but I'm curious if you've had any luck posting a patch upstream to http://htmlpurifier.org? Ideally, these changes should be incorporated there, rather than fragmenting the library. I'm pretty sure that Edward (the upstream maintainer and co-maintainer for this module) is willing to accept patches that provide HTML5 support.

lukus’s picture

Hi—that sounds like a great idea, I'll contact Edward and ask for his opinion.

Thanks

Luke

lukus’s picture

Hi Lucas

I've contacted Edward; I'll let you know his response.

Thanks

luke

ezyang’s picture

I responded by email, but here is my response pasted here:

- All of the HTML5 content needs to be gated, so it is only
available when a user specifies an HTML5 doctype. You
could try to put all of the HTML5 definitions in a new
HTMLModule.

- Tests! Tests for all of the bugs I mention here would be good.

- section/nav/aside/article are not Block content but Sectioning
content. Flow should be redefined to include Sectioning
(similar to how HTMLPurifier/HTMLModule/Text.php does Flow)

- header and footer need to exclude header/footer/main descendants;
see the 'excludes' attribute; also an example in Text.php (pre)

- Ditto with address, use the same technique

- hgroup got removed from the HTML5 spec, so doesn't belong here.

- The figure specification doesn't look right; I think you need
an asterisk after the Flow. A plain spec 'Flow' is special-cased.
I suspect your specifications also exclude plain text.

- figcaption is not Inline, give it false instead.

- I'm a little worried about the video tag, but the definition you've
given is probably OK. I'm not sure if it should be allowed by
default. Definitely autoplay should not be allowed. The contents
have the same problem as figure.

- We should already have the inline elements; are the existing
definitions buggy?

- For ins/del datetime, ideally we would apply the HTML5 "parse a
date or time string" algorithm and validate it, see
http://www.w3.org/TR/html5/infrastructure.html#parse-a-date-or-time-string

- data-mce-src/data-mce-json don't look like they're in the HTML5
spec. Also the 'Text' specification is a bit worrisome; does
TinyMCE specify how to interpret these?

- iframe allowfullscreen isn't an HTML5 attribute. And it shouldn't
be allowed by default anyway, should be gated by Tricky at least.

- All the other attributes should already exist. Is there a reason
they had to be set to Text? That's also worrisome.
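To make the 'excludes' point above concrete, here is a minimal sketch of header and footer definitions that forbid header/footer/main descendants, following the same idea Text.php uses for pre. The include path and the definition ID 'html5-excludes-sketch' are assumptions, the elements are still registered as plain Block content for brevity, and none of the doctype gating or tests asked for above are included.

```php
<?php
// Assumption: the HTML Purifier library is available on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'html5-excludes-sketch'); // hypothetical ID
$config->set('HTML.DefinitionRev', 1);
// Disable the definition cache while experimenting.
$config->set('Cache.DefinitionImpl', null);

if ($def = $config->maybeGetRawHTMLDefinition()) {
    // Lookup array of element names that may not appear as descendants.
    $no_nesting = array('header' => true, 'footer' => true, 'main' => true);

    // addElement() returns the element definition, so excludes can be set on it.
    $header = $def->addElement('header', 'Block', 'Flow', 'Common');
    $header->excludes = $no_nesting;

    $footer = $def->addElement('footer', 'Block', 'Flow', 'Common');
    $footer->excludes = $no_nesting;
}

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<header><p>Title</p><footer>nested</footer></header>');
```

A real patch would put these definitions in a dedicated HTMLModule tied to an HTML5 doctype rather than in an ad hoc raw definition.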

lukus’s picture

Thanks Edward—I haven't got time to carry out all of the alterations at the moment. If anyone else would like to take this on, please do. I'll attempt to look at this again in a couple of weeks.

Jorge Navarro’s picture

When using CKEditor with HTML Purifier and allowing HTML5 tags such as <audio>, all HTML5 tags are stripped.
Subscribing since it's a must have!

lukus’s picture

Assigned: Unassigned » lukus

paulsheldrake’s picture

I was not able to get the patch from #18 to work. I got the latest dev version of the module and applied the patch, but started getting fatal errors in the HTML Purifier library about unexpected indexes and finalised config objects. I ended up backing the patch out.

I used the latest 2.x-dev version and managed to find a hook that allows us to edit the HTML definition. I've attached an example module that uses that hook and adds the HTML5 tags that Lukus had referenced on GitHub. Using this hook, I no longer get errors when adding HTML5 tags to the allowedTags field in the UI, and the filter works as expected.

Jorge Navarro’s picture

Any news on this? Should we move to 2.x-dev to get HTML5 support?

vyasamit2007’s picture

Version: 7.x-2.x-dev » 8.x-1.x-dev

Should we implement something like this? - https://github.com/xemlock/htmlpurifier-html5 and make sure HTML5 is supported in D8 version as well? Changing the version for this issue.

Thanks!
~Amit

silverham’s picture