Hi,

I was under the impression that HTML Purifier would correct my HTML issues (according to standards). However, it seems to remove almost everything, including images and Google Maps.

Is there any way of treating this or is this just normal behaviour?

Comment  File           Size      Author
#43      html-one.png   24.35 KB  Anonymous (not verified)
#20      iframes.patch  1.5 KB    devkinetic

Comments

sp_key’s picture

Update:

Articles I've written that contain images are stripped as soon as I install html purifier (without even enabling it).

Looking at the input formats, I can see Purifier is not enabled for all profiles, yet I still have no images on my site. As soon as I uninstall the module, my images come back.

Any help or suggestion would be hugely appreciated!

ezyang’s picture

Hi!

External images are disabled by default, since Drupal's default HTML doesn't allow any images at all. You can turn them back on by setting "DisableExternalResources" to No.
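For anyone configuring the library directly rather than through the module's settings form, the same switch can be sketched in PHP. This is only an illustration: the include path and the sample markup are assumptions, and in the library the directive's full name is URI.DisableExternalResources.

```php
<?php
// Assumed location of the HTML Purifier library; adjust to your install.
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// false = external resources such as <img src="http://..."> are kept.
$config->set('URI.DisableExternalResources', false);

$purifier = new HTMLPurifier($config);

// Hypothetical sample markup with an externally hosted image.
echo $purifier->purify('<p><img src="http://example.com/photo.png" alt="Photo" /></p>');
```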

As for Google Maps, it utilizes iframes, which are disallowed by HTML Purifier for obvious security reasons. There are some ways to work around this, but it will require writing a nominal amount of code. I can teach you how to do it, but it'll be kind of nontrivial if you've never written PHP before.

Cheers,
Edward

ezyang’s picture

Status: Active » Postponed (maintainer needs more info)
sp_key’s picture

ezyang,

Many thanks for your response.
I can confirm that turning DisableExternalResources off does indeed allow us to embed images from external sources.

With regard to Google Maps, can you think of a workaround that would allow us to use them?
I'm afraid my knowledge of PHP is nil. I'm confident enough to paste some code into a file, but writing my own code...

Any alternative approach would be extremely appreciated!

Cheers

ezyang’s picture

Would a SafeIframe functionality work for you? This would require you to explicitly whitelist domains that you'd want to allow iframes from.

bryancasler’s picture

I would like to second the SafeIframe whitelist idea. I think being explicit about who you trust is a perfect solution.

sp_key’s picture

Sounds like an excellent idea.
Can you show me the right direction? I need to see a few examples, maybe an article or a drupal resource?

Many thanks!

ezyang’s picture

Title: HTML Purifier removes Images and Google maps » SafeIframe configuration for images and google maps
Status: Postponed (maintainer needs more info) » Needs work

Renamed.

ezyang’s picture

Status: Needs work » Postponed (maintainer needs more info)

We also need this because YouTube changed its embed code to use iframes. I need some UI advice from you guys: what kind of whitelisting mechanism do you want? Domains? Regexes? Arbitrary code? If we allow multiple whitelisting mechanisms, how do they interact with each other?
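To make the trade-off between the first two options concrete, here is a rough sketch of a domain whitelist next to a regex whitelist. Both helper functions and the sample lists are hypothetical and are not part of HTML Purifier or the module; they only show how the two mechanisms differ in what they can express.

```php
<?php
/**
 * Domain whitelist: compare the parsed host against an explicit list.
 * (Hypothetical helper, for illustration only.)
 */
function host_is_whitelisted(string $url, array $allowedHosts): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) && in_array(strtolower($host), $allowedHosts, true);
}

/**
 * Regex whitelist: match against the whole URL, so the path can be
 * constrained as well as the host. (Hypothetical helper.)
 */
function url_matches_regex(string $url, string $pattern): bool
{
    return preg_match($pattern, $url) === 1;
}

$hosts = array('www.youtube.com', 'player.vimeo.com');
$regex = '%^https?://(www\.youtube\.com/embed/|player\.vimeo\.com/video/)%';

$url = 'http://www.youtube.com/watch?v=abc';

// The domain check only looks at the host; the regex also requires /embed/.
var_dump(host_is_whitelisted($url, $hosts));
var_dump(url_matches_regex($url, $regex));
```

A domain list is easier to fill in through a settings form, while a regex can restrict which paths on a trusted host may be embedded.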

bryancasler’s picture

domain whitelisting would work to solve issues for non-mainstream websites.

Example Embed Code
ex www.democracynow.org

<script type="text/javascript" src="http://www.democracynow.org/embed_show_v2/300/2011/1/25/story/do_you_know_the_full_story"></script>
gesko’s picture

I have some problems embedding Amazon banner code:

<iframe src="http://rcm-de.amazon.de/e/cm?t=xxxxxxxxxxxx&o=3&p=20&l=ur1&category=generic&banner=1VH46RJT28QKG4Q5HM02&f=ifr" width="120" height="90" scrolling="no" border="0" marginwidth="0" style="border:none;" frameborder="0"></iframe>

I think domain whitelisting would be great.

Any other ideas on how I could embed this code to a block with htmlpurifier turned on?

kevinquillen’s picture

I had to write a custom filter for HTML Purifier and tell the HTML Purifier module to add the filter in its config:

http://stackoverflow.com/questions/5144189/htmlpurifier-iframe-regex-iss...

Now I can embed Google maps and other iFrame content.

It would be nice to add a domain whitelist, so iframes would be allowed if the source was Google, Youtube, Vimeo, etc.

mgifford’s picture

Ya, this is annoying. I had trouble embedding Youtube videos with this module enabled.

ParisLiakos’s picture

@Kevin Quillen:

Can you provide more specific details on how you managed this?
I did what you mention on Stack Overflow and also read the HTML Purifier forum, but it won't work :(

kevinquillen’s picture

In the HTMLPurifier module you also have to add to _htmlpurifier_get_config():

$config->set('Filter.Custom', array( new HTMLPurifier_Filter_MyIframe() ));
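For orientation, the class that this line registers has roughly the following shape. This is a pass-through skeleton only, not the Stack Overflow code: the include path is an assumption, and the two methods return the HTML unchanged where a real filter would put its iframe handling.

```php
<?php
// Assumed location of the HTML Purifier library; adjust to your install.
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

class HTMLPurifier_Filter_MyIframe extends HTMLPurifier_Filter
{
    public $name = 'MyIframe';

    // Called on the raw input before purification. A real filter would
    // swap trusted iframes for a placeholder here; this one changes nothing.
    public function preFilter($html, $config, $context)
    {
        return $html;
    }

    // Called on the purified output. A real filter would turn its
    // placeholders back into iframes here; this one changes nothing.
    public function postFilter($html, $config, $context)
    {
        return $html;
    }
}

$config = HTMLPurifier_Config::createDefault();
$config->set('Filter.Custom', array(new HTMLPurifier_Filter_MyIframe()));

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<p>Hello</p>');
```

The class has to be loaded before the `$config->set()` call runs, which is why its location in the module matters.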

kevinquillen’s picture

I know I should not hack the module, unless there is a hook I simply did not see.

It might be best to have the _config function invoke a hook so other modules can set their own filters or other HTML Purifier settings through code.

In my case, I cannot enable Advanced mode with the iFrame plugin (PHP error, something about it cannot render it in the form). So I had to adjust the module. Is there any other way to change settings through code without editing the module? I could not get the format specific config file to work.

ParisLiakos’s picture

Thanks a lot Kevin!!

I know that I shouldn't hack the module either, but my client can't wait :/

So I'll just add this to my list of hacked modules to watch out for on upgrades.

devkinetic’s picture

Kevin,

Can you provide a more detailed explanation?

I added: $config->set('Filter.Custom', array( new HTMLPurifier_Filter_MyIframe() )); to _htmlpurifier_get_config()

But I'm unsure where to add the snippet from http://stackoverflow.com/questions/5144189/htmlpurifier-iframe-regex-iss....

Thanks!

ParisLiakos’s picture

devkinetic

I added it to HTMLPurifier_DefinitionCache_Drupal.php and it works perfectly :)

devkinetic’s picture

FileSize
1.5 KB

UPDATE: The issue I was having was that the line break converter in Drupal was wrapping the iframe in a P tag. The code was working correctly, but because the block element was placed within the inline p tag, Purifier was stripping it out anyway because it was invalid HTML.

Here is a patch file that combines the suggestions in this thread.

Back to the main point though: a safe-list sounds like the best bet!

ParisLiakos’s picture

yeap +1 for domain whitelisting

El Bandito’s picture

Another +1 for whitelisting.

Cheers

El B

kevinquillen’s picture

I was able to get this working in 7.x but only briefly. Returning to the Text Format config form for any format utilizing HTML Purifier results in the following PHP error:

"Object of class HTMLPurifier_Filter_MyIframe could not be converted to string" in HTMLPurifier_Printer_ConfigForm_default->render()

The page is not editable; it just shows a generic error message. The error points to line 266 of ConfigForm.php in the Printer library of HTML Purifier:

case HTMLPurifier_VarParser::ALIST:
    $value = implode(PHP_EOL, $value);
    break;

Commenting out $value makes the form show up.

What are some possible solutions to this problem? Is it the plugin code, or the way it is trying to be interpreted? Casting (string) on the imploded value there also makes the form re-appear, though I do not know what implications that has on the library.
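The error arises because implode() has to convert every list element to a string, and the Filter.Custom list holds a filter object. The sketch below is not a patch for the library: ExampleFilter and list_to_lines() are hypothetical names, and it only shows one display-safe way to flatten a list that may contain objects.

```php
<?php
// Hypothetical stand-in for a custom filter object stored in Filter.Custom.
class ExampleFilter
{
}

/**
 * Turn a mixed list into printable lines; objects are shown by class name
 * instead of being cast to string. (Hypothetical helper.)
 */
function list_to_lines(array $values): string
{
    $lines = array_map(function ($value) {
        return is_object($value) ? get_class($value) : (string) $value;
    }, $values);

    return implode(PHP_EOL, $lines);
}

echo list_to_lines(array('plain value', new ExampleFilter())), PHP_EOL;
```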

kevinquillen’s picture

Version: 6.x-2.1 » 7.x-2.x-dev
btopro’s picture

HTML Purifier 4.4.0 now supports SafeIframe.

ezyang’s picture

Status: Postponed (maintainer needs more info) » Fixed

Fixed. You need HTML Purifier 4.4.0, and you need to access the "Advanced Settings" (as they are not shown in basic settings.) The configuration you need to set is: turn on HTML.SafeIframe, and fill in URI.SafeIframeRegexp with the necessary values. Here is an example that allows YouTube and Vimeo: %^http://(www.youtube.com/embed/|player.vimeo.com/video/)%

Don't forget to add iframe (and the necessary attributes) to your allowed elements list, if you are manually configuring this.
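One detail worth knowing about the example pattern: its dots are unescaped, so each one matches any character. The standalone check below compares it with an escaped variant. The escaped pattern and the look-alike host are illustrative assumptions, and the code exercises only the regular expressions, not HTML Purifier itself.

```php
<?php
// Pattern as given above, and a variant with the dots escaped (assumption).
$loose  = '%^http://(www.youtube.com/embed/|player.vimeo.com/video/)%';
$strict = '%^http://(www\.youtube\.com/embed/|player\.vimeo\.com/video/)%';

$urls = array(
    'http://www.youtube.com/embed/e3OthM-seJs',
    'http://wwwXyoutube.com/embed/e3OthM-seJs', // hypothetical look-alike host
);

foreach ($urls as $url) {
    // preg_match() returns 1 on a match and 0 otherwise.
    echo $url, ' loose=', preg_match($loose, $url), ' strict=', preg_match($strict, $url), PHP_EOL;
}
```

Whether the looser form matters depends on how much you trust your authors; escaping the dots costs nothing.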

kevinquillen’s picture

This still doesn't work. It gets stripped out.

btopro’s picture

If you have the remove-empty option turned on, then yes, it will. To fix this, give the iframe a name property and Purifier will skip its remove-empty rule for it. You might also have to refresh the page after saving; I've noticed I have to do this every time after newly saving a node (6.x, but it should still be the same).

kevinquillen’s picture

It's super confusing. I turned those off and cleared the cache, but iframes did not show up until the 8th reload. Why is that?

oriol_e9g’s picture

I have tested this and you have to touch more things.

1. You need: RemoveEmpty: No

2. If you have: RemoveEmpty.RemoveNbsp: Yes, then you need to add > RemoveEmpty.RemoveNbsp.Exceptions: iframe

3. If you use HTML Allowed > You need to add here: iframe[frameborder|marginheight|marginwidth|scrolling|src]

4. Put SafeIframe: Yes and I use for SafeIframeRegexp: %^http://(www.youtube.|player.vimeo.|maps.google.|www.slideshare.)%
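For anyone setting this up in code rather than in the settings form, the four steps above might look roughly like this. Treat it as a sketch: the include path, the extra `p` and `a[href]` entries in the allowed list, and the sample iframe are assumptions, and the directive names carry the library's AutoFormat, HTML and URI prefixes, which the form may not show.

```php
<?php
// Assumed location of the HTML Purifier library; adjust to your install.
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Step 1: do not strip empty elements (an embedded iframe has no content).
$config->set('AutoFormat.RemoveEmpty', false);

// Step 2: only matters when RemoveEmpty and RemoveNbsp are both switched on.
// td and th are the library defaults; iframe is the addition.
$config->set(
    'AutoFormat.RemoveEmpty.RemoveNbsp.Exceptions',
    array('td' => true, 'th' => true, 'iframe' => true)
);

// Step 3: if you restrict the allowed HTML, list iframe and its attributes.
// The p and a[href] entries are placeholders for your own list.
$config->set(
    'HTML.Allowed',
    'p,a[href],iframe[frameborder|marginheight|marginwidth|scrolling|src]'
);

// Step 4: enable SafeIframe and whitelist the trusted sources.
$config->set('HTML.SafeIframe', true);
$config->set(
    'URI.SafeIframeRegexp',
    '%^http://(www.youtube.|player.vimeo.|maps.google.|www.slideshare.)%'
);

$purifier = new HTMLPurifier($config);

// Hypothetical sample embed.
echo $purifier->purify(
    '<iframe src="http://www.youtube.com/embed/e3OthM-seJs" frameborder="0"></iframe>'
);
```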

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

theMusician’s picture

Version: 7.x-2.x-dev » 7.x-1.0-rc1

This does not appear to work with 7.x-1.0-rc1 of HTML Purifier. I am using 4.4.0 of the HTMLPurifier library.

I wish to embed the following video.
http://www.youtube.com/embed/e3OthM-seJs?wmode=opaque

My Settings:
SafeIframe: Yes
SafeIframeRegexp: %^http://(www.youtube.|player.vimeo.|maps.google.|www.slideshare.)%
RemoveEmpty: No
RemoveEmpty.RemoveNbsp: No
I have added the following to AllowedFrameTargets:
_blank
_self
_top
_parent
I am using the default allowed HTML.

The output src attribute of the iframe is stripped out when I use HTML Purifier, however with Full HTML allowed I can see that the src that is output is as follows: //www.youtube.com/embed/e3OthM-seJs?wmode=opaque

I am guessing the regex is incorrect but everything I have tried is not working. The src link is being created by the media module filter that runs before HTML Purifier.

Any ideas as to why I cannot get a video to appear? If 7.x-1.0-rc1 does not support this where can I grab the 2.x-dev version?

theMusician’s picture

Category: feature » bug
Status: Closed (fixed) » Active

I tried this in another environment and have the same results. No YouTube video is shown. The src attribute is stripped out upon save when using HTML purifier.

heddn’s picture

I tend to agree it's a regex thing. Can you confirm whether this is an upstream library issue? If so, I'll point you to http://htmlpurifier.org/phorum/list.php?3.

theMusician’s picture

I have tried this with the standalone PHP library and it works great. The settings in the code block match what I have in Drupal.

require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Allow iframes, but only from URLs matching the whitelist regex.
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^http://(www.youtube.com/embed/|player.vimeo.com/video/)%');
// Same frame targets as in the Drupal settings listed earlier.
$config->set('Attr.AllowedFrameTargets', '_blank, _self, _top, _parent');
$config->set('Attr.EnableID', true);
$config->set('AutoFormat.Linkify', true);
$purifier = new HTMLPurifier($config);

// Run the markup through the purifier so the settings are actually exercised.
echo $purifier->purify('<h1>Show me my Movie</h1> <iframe width="560" height="315" src="http://www.youtube.com/embed/e3OthM-seJs" frameborder="0" allowfullscreen></iframe>');

I followed this thread, http://htmlpurifier.org/phorum/read.php?3,6237,6237#msg-6237 to set up the standalone version.

heddn’s picture

Make sure that none of the other filters, including core's HTML filter, break what HTML Purifier is doing. Disable all the other filters and see if it still doesn't work...

heddn’s picture

Status: Active » Postponed (maintainer needs more info)
theMusician’s picture

I apologize for the delay. I am only using videos on a few areas of this site. I have been using the default full html text format for the moment.

I turned off the two other filters, image resize and convert media tags to markup. I also tried each combination with one filter on and the other off, with no luck. However, perhaps the markup produced by the media tag conversion is interfering with HTML Purifier. I am converting the media tags first in the filter processing order, and HTML Purifier is running last.

If I switch that order and have the media tag markup filtered last the videos are output correctly. I am guessing this just avoids the regex check applied by HTML Purifier.

If it helps in diagnosis, the media markup that is output if I do not convert the markup with the media module's filter is as follows:

Video 1:
[[{"type":"media","view_mode":"media_large","fid":"33","attributes":{"alt":"Intro.mov","class":"media-image","typeof":"foaf:Image"}}]]

Video 2:
[[{"type":"media","view_mode":"media_large","fid":"35","attributes":{"alt":"WWU Summer Commencement 2011","class":"media-image","typeof":"foaf:Image"}}]]

For now, I will keep the order swapped as it works and the media sources are currently vetted before being posted. Thank you for the help.

Working Filter Order on text format

  • Image Resize
  • HTML Purifier
  • Convert media tags to markup
heddn’s picture

Status: Postponed (maintainer needs more info) » Fixed

Glad that worked for you.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Have you noticed a problem where the embedded media is wrapped in paragraph tags and mucks up the markup when the source is viewed?

Anonymous’s picture

Title: SafeIframe configuration for images and google maps » SafeIframe configuration for images, google maps, and videos
Version: 7.x-1.0-rc1 » 7.x-1.0
Status: Closed (fixed) » Active

Going to reopen this because I am having the same issue.
Yes, I understand the fix is to run the Convert media filter after htmlpurifier. However, that still isn't optimal, since you lose out on stripping the automatic paragraph tags that are wrapped around your embedded content.

To reproduce, use wysiwyg, media + media_youtube, and htmlpurifier. Remove all filters except Convert media and htmlpurifier. Run 1) Convert media before htmlpurifier, then 2) htmlpurifier before Convert media.

Create some content and insert a youtube video with media. You'll notice in the first case the iframe renders but the src and embedded markup do not exist so you get a blank square, in addition there are no P tags around the iframe's container div if you view source. In the second case, the iframe and video are rendered correctly, but viewing source you see a pair of empty P tags above and below the iframe container div.

Any ideas?

Anonymous’s picture

FileSize
24.35 KB

Here are two images to illustrate my previous post.

one

two

trkest’s picture

Thank you oriol_e9g - this works for me!

hawkeye.twolf’s picture

#30 worked for me too (Thanks, oriol_e9g!) but note that I had to clear caches after making the configuration changes. Probably just clearing the HTML Purifier cache at admin/config/content/htmlpurifier should suffice.

Not recommended, but you can allow content from all sources by using %^.*% in the SafeIframeRegexp field.

ergow’s picture

#30 doesn't work for me: I lose images, YouTube and Vimeo videos. I'm using the HTML Purifier module 7.x-1.0 and the HTML Purifier library v4.5.0. Should I change to HTML Purifier 7.x-2.x-dev?

Core is 7.23.

Thanks a lot!

csuggs4’s picture

I've had some success with the #30's steps, plus the following regex for the SafeIframeRegexp:
%^(https?:)?//(www\.youtube(?:-nocookie)?\.com/embed/|player\.vimeo\.com/video/)%
This way it accommodates for a src that starts with "//", and also if you have http or https. I got it from the HTMLPurifier documentation.
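A quick standalone check of that claim, comparing the regex from #30 with the one above against the three forms a src can take. The allowed_by() helper is hypothetical, and the code exercises only the regular expressions, not HTML Purifier itself.

```php
<?php
// Pattern from #30 and the pattern quoted above.
$old = '%^http://(www.youtube.|player.vimeo.|maps.google.|www.slideshare.)%';
$new = '%^(https?:)?//(www\.youtube(?:-nocookie)?\.com/embed/|player\.vimeo\.com/video/)%';

/** Hypothetical helper: does the src pass the whitelist pattern? */
function allowed_by(string $pattern, string $src): bool
{
    return preg_match($pattern, $src) === 1;
}

$sources = array(
    '//www.youtube.com/embed/e3OthM-seJs?wmode=opaque', // protocol-relative
    'http://www.youtube.com/embed/e3OthM-seJs',
    'https://www.youtube.com/embed/e3OthM-seJs',
);

foreach ($sources as $src) {
    echo $src,
        ' old=', var_export(allowed_by($old, $src), true),
        ' new=', var_export(allowed_by($new, $src), true),
        PHP_EOL;
}
```

The older pattern anchors on `http://`, so a src beginning with `//` or `https://` cannot match it.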

Now, I said "some" success. I'm using the Media embed toolbar button. I can get the video to embed, and it shows when I view the node, but if I go to edit it again, all the iframe stuff gets stripped from the field. Has anyone else had this experience?

k.dani’s picture

Issue summary: View changes

Same problem. Just one more thing: if I disable and re-enable CKEditor on the node edit form, the iframe is removed in the same way as when someone edits an existing node.

heddn’s picture

Category: Bug report » Support request

Based on the conversation involved here, this is more of a support question. Not a bug.

gisle’s picture

Since this is still active, I'll just point to an alternative solution for whitelisting (or "purifying") HTML while allowing iframes to be embedded, but only from whitelisted domains.

This is the WYSIWYG filter + Src whitelist text filter.