Why does Drupal filter on output?

Last updated on
20 September 2016

Some web applications process or filter user input in the name of security before storing it in the database. Historically, Drupal has preserved user input as is and filtered it only on output. This is occasionally debated within the Drupal community.

Steven Wittens' excellent article "Safe string theory for the web" provides a full technical explanation of why it is best to preserve the original user input. The type of filtering needed depends on the output context. Filtering on input is problematic because you cannot know which characters are forbidden without knowing the context in which they will appear.
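As a minimal sketch of that point, the snippet below prepares one string for three different output contexts using plain PHP functions. The sample input and variable names are hypothetical and chosen only for illustration; each context needs a different transformation, so no single treatment applied at input time could be right for all of them.

```php
<?php
// Hypothetical user input, stored exactly as submitted.
$input = 'Tom & "Jerry" <3';

// HTML text or attribute context: escape &, <, > and quotes.
$as_html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// URL component context: percent-encode reserved characters instead.
$as_url = rawurlencode($input);

// JSON context: apply JSON string escaping (default flags).
$as_json = json_encode($input);

echo $as_html, "\n";  // Tom &amp; &quot;Jerry&quot; &lt;3
echo $as_url, "\n";   // Tom%20%26%20%22Jerry%22%20%3C3
echo $as_json, "\n";  // "Tom & \"Jerry\" <3"
```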

To make things even trickier, a given string can appear in more than one context at the same time. For example, the same string might be used as HTML text and as an HTML attribute:
<a title="$node->title">$node->title</a>

If you attempt to strip all "special" characters from this string on input, you'll be unable to output meaningful text in an HTML page. Encoding those characters on input won't help either, because the correct encoding differs from one output context to another. Encoding also creates a second problem: processing escaped or encoded text is very cumbersome (for example, consider the difficulty of extracting a teaser from an HTML-escaped node body).
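The teaser problem can be sketched in a few lines. Here example_teaser() is a hypothetical helper, not a Drupal function, and it uses substr() only because the sample text is plain ASCII; real code would need a multibyte-aware function. Truncating the raw text and escaping afterwards gives valid output, while truncating text that was escaped on input can cut an entity in half.

```php
<?php
// Hypothetical node body, stored exactly as the user typed it.
$body = 'R&D costs > revenue';

// Hypothetical teaser helper: the first $length bytes of the text.
// substr() is adequate only because this sample is plain ASCII.
function example_teaser($text, $length) {
  return substr($text, 0, $length);
}

// Filter on output: truncate the raw text, then escape the result.
$good = htmlspecialchars(example_teaser($body, 3), ENT_QUOTES, 'UTF-8');

// Filter on input: the stored text is already escaped, so the same
// truncation splits the "&amp;" entity.
$stored_escaped = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');
$bad = example_teaser($stored_escaped, 3);

echo $good, "\n";  // R&amp;D
echo $bad, "\n";   // R&a
```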

The best choice is to store the user input unchanged, and perform proper escaping upon output. As much as possible, variables should be escaped prior to the theme layer in a way appropriate for their most likely use.
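A minimal sketch of this approach, applied to the link markup shown earlier, might look as follows. The $node object here is a hypothetical stand-in built by hand rather than loaded by Drupal, and plain htmlspecialchars() is used so the snippet runs outside Drupal; in Drupal 7, check_plain() is understood to wrap the same call with ENT_QUOTES and UTF-8.

```php
<?php
// Hypothetical stand-in for a loaded node; the title is stored as submitted.
$node = new stdClass();
$node->title = 'Say "hi" to <b>Bob</b> & friends';

// Escape at output time. With ENT_QUOTES the result is safe both as
// HTML text and inside a quoted HTML attribute.
$safe_title = htmlspecialchars($node->title, ENT_QUOTES, 'UTF-8');

$link = '<a title="' . $safe_title . '">' . $safe_title . '</a>';
echo $link, "\n";
```

The stored title is never modified, so it remains available in its original form for other contexts such as plain-text e-mail or a feed, each of which would apply its own escaping.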

For information on which functions to use to sanitize output, see http://api.drupal.org/api/drupal/includes!common.inc/group/sanitization/7