PHP 4.3.0 change in strip_tags breaks flexinode+HTML Filter

By Damien Tournoud on 13 May 2005 at 17:09 UTC

Since PHP 4.3.0, comments are also stripped by the strip_tags() function. That means that the HTML filter also strips the <!--break--> tag. That is generally the right behavior, because the teaser is cut before filters are applied to the content of the node.

BUT: flexinode's teasers are generated AFTER the filters are applied (because each field must be parsed to generate the node content). So the PHP 4.3.0 change breaks teaser generation for flexinodes.
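
For anyone who wants to reproduce the behavior described here, a minimal sketch follows. The sample body text and the shortened allowed-tag list are made up for illustration; only strip_tags() itself comes from the discussion.

```php
<?php
// Illustrative node body containing the teaser delimiter.
$body = 'Teaser text.<!--break-->Rest of the body.';

// strip_tags() drops HTML comments along with disallowed tags,
// so the delimiter does not survive the HTML filter.
$filtered = strip_tags($body, '<a><em><strong>');

// Any code that looks for the delimiter after filtering finds nothing.
$has_break = strpos($filtered, '<!--break-->') !== false;

echo $filtered, "\n";   // Teaser text.Rest of the body.
var_dump($has_break);   // bool(false)
```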

My suggestion is to let the <!--break--> tag pass through the HTML filter (patch enclosed).

Damien TOURNOUD

--- filter.module.orig  2005-05-13 19:07:21.425782058 +0200
+++ filter.module       2005-05-13 19:07:59.788613542 +0200
@@ -916,6 +916,9 @@
  * HTML filter. Provides filtering of input into accepted HTML.
  */
 function _filter_html($text, $format) {
+
+  $text = str_replace("<!--break-->", "%%break%%", $text);
+
   if (variable_get("filter_html_$format", FILTER_HTML_STRIP) == FILTER_HTML_STRIP) {
     // Allow users to enter HTML, but filter it
     $text = strip_tags($text, variable_get("allowed_html_$format", '<a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>'));
@@ -929,6 +932,8 @@
     // Escape HTML
     $text = check_plain($text);
   }
+
+  $text = str_replace("%%break%%", "<!--break-->", $text);

   if (variable_get("filter_html_nofollow_$format", FALSE)) {
     $text = preg_replace('/<a([^>]+)>/i', '<a\\1 rel="nofollow">', $text);

Comments

Out of band information

Steven commented 13 May 2005 at 17:21

This is really not a good way to fix it. Before, I've used bytes 0xFF and 0xFE to encode out-of-band information (as they are not used in UTF-8), but this depends on PHP not actually knowing UTF-8.

It would be better to start using private-use characters for this and standardise their use in Drupal.
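
A rough sketch of what the private-use idea could look like is below. The choice of U+E000, the constant name BREAK_PLACEHOLDER and the function name filter_html_keep_break() are all hypothetical; nothing here is an agreed Drupal convention.

```php
<?php
// Hypothetical placeholder: U+E000, the first code point of the Unicode
// Private Use Area, written as its UTF-8 byte sequence.
define('BREAK_PLACEHOLDER', "\xEE\x80\x80");

// Hypothetical helper: run strip_tags() while keeping the teaser delimiter.
function filter_html_keep_break($text, $allowed) {
  // Drop any placeholder already in the input so users cannot forge a break.
  $text = str_replace(BREAK_PLACEHOLDER, '', $text);
  // Swap the delimiter for the out-of-band character before stripping.
  $text = str_replace('<!--break-->', BREAK_PLACEHOLDER, $text);
  $text = strip_tags($text, $allowed);
  // Put the delimiter back once the tags are gone.
  return str_replace(BREAK_PLACEHOLDER, '<!--break-->', $text);
}

echo filter_html_keep_break('<p>Teaser</p><!--break--><em>Body</em>', '<em>'), "\n";
// Teaser<!--break--><em>Body</em>
```

Because the placeholder is a single character that has no meaning in HTML, input such as %break<br/>% cannot turn into a delimiter by accident.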

PS: To anyone reading, the break got stripped out in the code above because of teaser handling too. The reason is we strip <!--break--> from the content before filtering it, but at this point the <code> tags haven't been parsed by codefilter yet.

--
If you have a problem, please search before posting a question.

Ok, my mistake...

Damien Tournoud commented 13 May 2005 at 17:41

... I didn't pay attention to the code block. By the way, what's the use of filtering the break comment?

  // Remove the delimiter (if any) that separates the teaser from the body.
  // TODO: this strips legitimate uses of '<!--break-->' also.
  $node->body = str_replace('<!--break-->', '', $node->body);

Non-HTML content

Steven commented 14 May 2005 at 23:15

The reason it's there is because the content may not be in HTML. Leaving the comment in would mean it would appear on screen, if the content was marked up in Textile for example.
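
As a small illustration of that point, the sketch below uses htmlspecialchars() as a stand-in for a format that does not treat the input as HTML. That stand-in is an assumption made for the example (Textile itself is not used here), and the sample text is made up.

```php
<?php
$body = 'Intro paragraph.<!--break-->Remaining text.';

// If the delimiter is left in, an escaping filter turns it into visible text.
$with_delimiter = htmlspecialchars($body, ENT_QUOTES);

// Removing the delimiter first, as the node code does, avoids that.
$without_delimiter = htmlspecialchars(str_replace('<!--break-->', '', $body), ENT_QUOTES);

echo $with_delimiter, "\n";     // Intro paragraph.&lt;!--break--&gt;Remaining text.
echo $without_delimiter, "\n";  // Intro paragraph.Remaining text.
```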


So what's a clean solution?

darius commented 14 May 2005 at 23:03

I just happened onto the same problem today (strip_tags stripping HTML comments, including "break"). Is this a flexinode specific issue only? Should there be a bug report about this? Or will this have to be addressed in filter.module or node.module?

Darius

No perfect solution?

Damien Tournoud commented 16 May 2005 at 10:32

I don't think there is any perfect solution for this problem except using reserved characters: strip_tags will deeply change the string, and you can always find a case where a simple replacement will break.

A simple solution that works in most cases could be to use some string magic to protect a "%break%" tag that could be in the user input (see below). But this breaks with the following input:

%break<br/>%

Damien

--- filter.module.orig 2005-05-13 19:07:21.425782058 +0200
+++ filter.module 2005-05-13 19:07:59.788613542 +0200
@@ -916,6 +916,9 @@
  * HTML filter. Provides filtering of input into accepted HTML.
  */
 function _filter_html($text, $format) {
+
+  $text = str_replace("%break%", "%break %", $text);
+  $text = str_replace("<!--break-->", "%break%", $text);
+
   if (variable_get("filter_html_$format", FILTER_HTML_STRIP) == FILTER_HTML_STRIP) {
     // Allow users to enter HTML, but filter it
     $text = strip_tags($text, variable_get("allowed_html_$format", '<a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>'));
@@ -929,6 +932,8 @@
     // Escape HTML
     $text = check_plain($text);
   }
+
+  $text = str_replace("%break%", "<!--break-->", $text);
+  $text = str_replace("%break %", "%break%", $text);
   if (variable_get("filter_html_nofollow_$format", FALSE)) {
     $text = preg_replace('/<a([^>]+)>/i', '<a\\1 rel="nofollow">', $text);
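
The failure case mentioned above can be reproduced with a condensed version of the patch logic. The function name filter_with_percent_marker() is hypothetical, the delimiter is written without any extra space, and strip_tags() is called with no allowed tags to keep the example short.

```php
<?php
// Hypothetical condensed version of the %break% escaping approach.
function filter_with_percent_marker($text) {
  // Protect a literal "%break%" typed by the user.
  $text = str_replace('%break%', '%break %', $text);
  // Hide the real delimiter from strip_tags().
  $text = str_replace('<!--break-->', '%break%', $text);
  $text = strip_tags($text);
  // Restore the delimiter, then the user's literal text.
  $text = str_replace('%break%', '<!--break-->', $text);
  $text = str_replace('%break %', '%break%', $text);
  return $text;
}

// A literal "%break%" survives as intended.
echo filter_with_percent_marker('Literal %break% and <b>bold</b>'), "\n";
// Literal %break% and bold

// The marker only forms once the tag is stripped, so a delimiter is forged.
echo filter_with_percent_marker('%break<br/>%'), "\n";
// <!--break-->
```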

Another way

darius commented 17 May 2005 at 16:13

I ended up using this:

+ $text = str_replace(array('<!--', '-->'), array('&lt;!--', '--&gt;'), $text);
...

+ $text = str_replace(array('&lt;!--', '--&gt;'), array('<!--', '-->'), $text);

This retains any HTML in the code.

Darius
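
For reference, the two replacements above can be wrapped around strip_tags() roughly as follows. The wrapper name strip_tags_keep_comments() is hypothetical, and this is only a sketch of the idea, not the code that went into any module.

```php
<?php
// Hypothetical wrapper: escape comment delimiters, strip tags, restore them.
function strip_tags_keep_comments($text, $allowed = '') {
  // Turn comment delimiters into entities so strip_tags() ignores them.
  $text = str_replace(array('<!--', '-->'), array('&lt;!--', '--&gt;'), $text);
  $text = strip_tags($text, $allowed);
  // Turn the entities back into real comment delimiters.
  return str_replace(array('&lt;!--', '--&gt;'), array('<!--', '-->'), $text);
}

echo strip_tags_keep_comments('<p>Teaser</p><!--break--><em>Body</em>', '<em>'), "\n";
// Teaser<!--break--><em>Body</em>

// Caveat: entities the user typed by hand are converted as well.
echo strip_tags_keep_comments('a &lt;!-- b'), "\n";
// a <!-- b
```

Note that this keeps every HTML comment, not just the break delimiter, and the second call shows that text which was already escaped in the input comes out as a real comment opener.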