I've seen a bit of discussion on filtering HTML from user input and on how users specify whether their submission is text or HTML.
Below is a modification to two functions in node.module.
1. node_filter()
2. node_filter_html()
With the code below, HTML filtering is all or nothing. If filtering is turned on, the allowed tags pass through unchanged and the disallowed tags have their angle brackets escaped, so <tag> is converted to:
&lt;tag&gt;
If HTML filtering is turned off, the Drupal "line break" filter is used and all tags are converted as above.
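As a rough illustration of the conversion described above, here is a minimal standalone sketch of the escaping step. The helper name escape_markup() is hypothetical and is not part of node.module; the match and replace arrays are assumed to mirror the ones used in the functions below.

```php
<?php
// Hypothetical helper, not part of node.module: escape every tag in $text.
function escape_markup($text) {
  // "&" must be handled first; otherwise the "&" produced for "&lt;"
  // would itself be escaped on a later pass.
  $match   = array('#&#', '#<#', '#>#');
  $replace = array('&amp;', '&lt;', '&gt;');
  // preg_replace() applies the pattern/replacement pairs in array order.
  return preg_replace($match, $replace, $text);
}

echo escape_markup('<tag> & more'), "\n";  // &lt;tag&gt; &amp; more
```

The ordering of the arrays is the only subtle part: swapping "&" to the end would double-escape the entities created for "<" and ">".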
To test this code the allowed_html entry in the variables table needs to contain the allowable tags in the following format:
a,b,dd,dl,dt,i,li,ol,u,ul,p,br
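For reference, a small sketch of how a comma-separated value in that format could be turned into a list of tag names. parse_allowed_tags() is a hypothetical helper written for this example only; the code below does the equivalent inline.

```php
<?php
// Hypothetical helper: turn "a,b,dd,..." into an array of tag names.
function parse_allowed_tags($setting) {
  $tags = array();
  foreach (explode(',', $setting) as $tag) {
    // Tolerate stray spaces and upper-case entries such as " BR ".
    $tag = strtolower(trim($tag));
    if ($tag !== '') {
      $tags[] = $tag;
    }
  }
  return $tags;
}

print_r(parse_allowed_tags('a,b,dd,dl,dt,i,li,ol,u,ul,p,br'));
```

Skipping empty entries matters if the stored value ever ends with a trailing comma, since an empty tag name would otherwise end up in the list.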
Note: I am by no means a PHP coder. The code below is predominantly a cut-and-paste job from the phpBB code. I don't know whether it breaks anything else; if it breaks your system, sorry ....
-- node_filter --
<code>function node_filter($text) {
$html_entities_match = array('#&#', '#<#', '#>#');
$html_entities_replace = array('&amp;', '&lt;', '&gt;');
$unhtml_specialchars_match = array('#&gt;#', '#&lt;#', '#&quot;#', '#&amp;#');
$unhtml_specialchars_replace = array('>', '<', '"', '&');
if (variable_get("filter_html", 0)) {
/*
** filter out any unwanted html tags
*/
$text = node_filter_html($text);
}
else
{
/*
** use the drupal "line break" filter
*/
$text = preg_replace($html_entities_match, $html_entities_replace, node_filter_line($text));
}
/*
** filter links
*/
if (variable_get("filter_link", 0)) $text = node_filter_link($text);
return $text;
}</code>
-- node_filter_html --
<code>function node_filter_html($text) {
/*
** This function will prepare a posted message for
** entry into the database.
**
** This code has been shamelessly "borrowed" from the
** phpBB project [www.phpbb.com]
**
*/
$html_entities_match = array('#&#', '#<#', '#>#');
$html_entities_replace = array('&amp;', '&lt;', '&gt;');
$unhtml_specialchars_match = array('#&gt;#', '#&lt;#', '#&quot;#', '#&amp;#');
$unhtml_specialchars_replace = array('>', '<', '"', '&');
//
// Clean up the message
//
$message = trim($text);
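// explode() is enough for a literal comma separator; split() is a regex function and is gone from current PHP
$allowed_html_tags = explode(',', variable_get("allowed_html", "a,b,dd,dl,dt,i,li,ol,u,ul,p,br"));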
$end_html = 0;
$start_html = 1;
$tmp_message = '';
$message = ' ' . $message . ' ';
while ( $start_html = strpos($message, '<', $start_html) )
{
$tmp_message .= preg_replace($html_entities_match, $html_entities_replace, substr($message, $end_html + 1, ( $start_html - $end_html - 1 )));
if ( $end_html = strpos($message, '>', $start_html) )
{
$length = $end_html - $start_html + 1;
$hold_string = substr($message, $start_html, $length);
if ( ( $unclosed_open = strrpos(' ' . $hold_string, '<') ) != 1 )
{
$tmp_message .= preg_replace($html_entities_match, $html_entities_replace, substr($hold_string, 0, $unclosed_open - 1));
$hold_string = substr($hold_string, $unclosed_open - 1);
}
$tagallowed = false;
for($i = 0; $i < sizeof($allowed_html_tags); $i++)
{
$match_tag = trim($allowed_html_tags[$i]);
// \b stops a short allowed tag such as "b" from also matching <body> or <blink>
if ( preg_match('/^<\/?' . $match_tag . '\b(?!(\s*)style(\s*)\\=)/i', $hold_string) )
{
$tagallowed = true;
}
}
$tmp_message .= ( $length && !$tagallowed ) ? preg_replace($html_entities_match, $html_entities_replace, $hold_string) : $hold_string;
$start_html += $length;
}
else
{
$tmp_message .= preg_replace($html_entities_match, $html_entities_replace, substr($message, $start_html, strlen($message)));
$start_html = strlen($message);
$end_html = $start_html;
}
}
if ( $end_html != strlen($message) && $tmp_message != '' )
{
$tmp_message .= preg_replace($html_entities_match, $html_entities_replace, substr($message, $end_html + 1));
}
$message = ( $tmp_message != '' ) ? trim($tmp_message) : trim($message);
$text = $message;
return $text;
}</code>
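The whitelist test inside the loop can be tried on its own. The sketch below is a hypothetical standalone version (tag_is_allowed() is not a node.module function). It assumes the tag name should end at a word boundary, since a pattern without one lets an allowed "b" also match <body>, and it rejects any tag whose first attribute is style.

```php
<?php
// Hypothetical helper: is the tag at the start of $markup on the whitelist?
function tag_is_allowed($markup, $allowed_tags) {
  foreach ($allowed_tags as $tag) {
    $tag = trim($tag);
    if ($tag === '') {
      continue;  // an empty name would match every tag
    }
    // Optional "/" for closing tags, the tag name, a word boundary,
    // and a negative lookahead that refuses a style attribute.
    $pattern = '/^<\/?' . preg_quote($tag, '/') . '\b(?!\s*style\s*=)/i';
    if (preg_match($pattern, $markup)) {
      return true;
    }
  }
  return false;
}

$allowed = explode(',', 'a,b,dd,dl,dt,i,li,ol,u,ul,p,br');
var_dump(tag_is_allowed('<b>', $allowed));                    // bool(true)
var_dump(tag_is_allowed('<body>', $allowed));                 // bool(false)
var_dump(tag_is_allowed('<p style="color:red">', $allowed));  // bool(false)
```

This only looks at the start of the tag, as the loop above does, so a style attribute placed after another attribute still gets through; treat it as an illustration of the matching rule rather than a complete sanitizer.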
Comments
Comment #1
al commented: Is this still an issue?
IMHO, we still haven't got <br /> filtering right yet. The conditions under which it is applied aren't very obvious; it doesn't work intuitively.
We probably need to look at this in more detail...
Comment #2
jonbob commented: I don't think this applies anymore.