Handle text in a secure fashion

Last updated on 13 March 2019

Drupal 7 will no longer be supported after January 5, 2025.

When handling and outputting text in HTML, you need to make sure that it is properly filtered or escaped. Otherwise, there may be bugs when users try to use angle brackets or ampersands or, worse, you could open up cross-site scripting (XSS) vulnerabilities.

Think carefully about any variable or data you output to the page. If it came from a user, a form, a node edit page, a file upload or an API call, you need to sanitize it before you output it.

What are the risks?

Data that comes from users, even trusted users, may contain HTML tags. Since you're essentially assembling an HTML document, letting other people's HTML tags into your page is potentially dangerous. What could happen if, for example, somebody saved some HTML tags in their node title field, and your site just displayed them as-is? What if they inserted just an opening tag, with no closing tag? Your whole page could break!

Particularly dangerous are <script> tags. If somebody were to put a script tag into a field or form value somewhere, and your site were to just print it, they would have injected their own code into your site, and that code would run in the browser of every user who viewed the page, potentially thousands of users.

Why isn't the data safe to use?

When handling data, the golden rule is to store exactly what the user typed. When users edit a post they created earlier, the form should contain exactly what they originally submitted (including any dangerous content). This means that conversions are performed when content is output, not when it is saved to the database (be sure to read the db_query() documentation on how to use the database API securely).

Take care to sanitize your data

Most theme functions and APIs expect HTML for their arguments, but a few automatically sanitize text by first passing it through check_plain(). If you're not sure which kind a function is, look up its documentation. Cases where Drupal does (some of) the escaping for you include:

t(): the placeholders (e.g. '%name' or '@name') are passed as plain-text and will be escaped when inserted into the translatable string. You can disable this escaping by using placeholders of the form '!name', but only if you are sure that the string is safe.
l(): the link caption should be passed as plain-text (unless overridden with the $html parameter).
menu items and breadcrumbs: the menu item titles and breadcrumb titles are automatically sanitized.
theme('placeholder'): the placeholder text is plain-text.
Block descriptions
Watchdog messages: the message and its variables are put through t(), so treat them as you would for t().
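To make the difference between the placeholder prefixes concrete, here is a minimal standalone sketch of placeholder substitution. It is loosely modeled on Drupal 7's format_string(), which t() relies on. The functions example_check_plain() and example_format_string() are hypothetical stand-ins so that the snippet runs without Drupal. In module code, call t() directly.

```php
<?php
// Hypothetical stand-in for check_plain(): escape quotes, ampersands and
// angle brackets so the browser displays the text literally.
function example_check_plain(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Hypothetical sketch of t()-style placeholder substitution:
// '@name' is escaped, '%name' is escaped and emphasized, and '!name' is
// inserted verbatim (only safe for text you already know is safe).
function example_format_string(string $string, array $args = array()): string {
  foreach ($args as $key => $value) {
    switch ($key[0]) {
      case '@':
        $args[$key] = example_check_plain($value);
        break;

      case '!':
        // Pass-through: no escaping at all.
        break;

      case '%':
      default:
        $args[$key] = '<em class="placeholder">' . example_check_plain($value) . '</em>';
        break;
    }
  }
  // strtr() replaces every placeholder in one pass.
  return strtr($string, $args);
}

$input = '<script>alert("hi")</script>';
// Escaped: the script tag is displayed as text.
print example_format_string('Old data: @data', array('@data' => $input)) . "\n";
// Not escaped: the script tag would run in the browser.
print example_format_string('Old data: !data', array('!data' => $input)) . "\n";
```

This is why the '@' form is the safe default for user-supplied values, and why '!' should be reserved for strings that have already been sanitized.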

Form elements

When using the Form API to create your form, some element properties are sanitized for you. Most, though, you'll need to sanitize yourself using an appropriate filter.

Sanitize yourself: #title, #description, #value

Sanitized by Drupal: #default_value, #options

Example:

$form['my_safe_element'] = array(
  '#type' => 'select',
  '#title' => check_plain($node->title),
  '#description' => check_plain($user_description),
  '#default_value' => $user_value, // FAPI passes #default_value through check_plain().
  '#options' => node_get_types('names'),  // FAPI will sanitize the '#options' attribute with check_plain() for select boxes.
);

Good and bad examples:

$form['bad'] = array(
  '#type' => 'textfield',
  '#default_value' => check_plain($u_supplied),  // Bad: escaped twice.
  '#description' => t("Old data: !data", array('!data' => $u_supplied)), // Bad: XSS risk!
);

$form['good'] = array(
  '#type' => 'textfield',
  '#default_value' => $u_supplied, // Good: FAPI escapes #default_value itself.
  '#description' => t("Old data: @data", array('@data' => $u_supplied)), // Good: @data is escaped.
);
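The "escaped twice" problem in the bad example is easy to reproduce outside Drupal. The sketch below assumes that check_plain() behaves like PHP's htmlspecialchars() with ENT_QUOTES and UTF-8, which is roughly how Drupal 7 implements it. It uses a hypothetical stand-in function so that it runs standalone.

```php
<?php
// Hypothetical stand-in for check_plain().
function example_check_plain(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$u_supplied = 'Tom & Jerry';

// Escaping once is correct: the browser shows "Tom & Jerry".
$once = example_check_plain($u_supplied);

// Escaping the already-escaped value (what happens when you call
// check_plain() on #default_value and FAPI escapes it again):
// the browser shows the literal text "Tom &amp; Jerry".
$twice = example_check_plain($once);

print $once . "\n";   // Tom &amp; Jerry
print $twice . "\n";  // Tom &amp;amp; Jerry
```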

Formats and use-cases

Plain-text

To output something as plain text, with no working markup, pass it through check_plain().

This converts quotes, ampersands and angle brackets into HTML entities, so the string is shown literally in the browser. This is what you want in most scenarios, unless you're specifically working with rich text or HTML fields. The text is displayed exactly as the user entered it and is not interpreted by the browser in any way, which makes it safe.

If in doubt about which format to use, try check_plain() anyway and see if your field displays how you need it to.
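As a rough illustration of what check_plain() does, the standalone sketch below escapes a user-supplied title for use both as element content and inside a quoted attribute. It assumes the Drupal 7 behavior, which is htmlspecialchars() with ENT_QUOTES and UTF-8. The function example_check_plain() is a hypothetical stand-in, so use check_plain() itself in module code.

```php
<?php
function example_check_plain(string $text): string {
  // ENT_QUOTES escapes both double and single quotes, so the result is
  // also safe inside a quoted HTML attribute.
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$title = '<u>"Tom" & Jerry\'s</u>';
$safe = example_check_plain($title);

// The same escaped string works for element content and attribute values.
print '<td>' . $safe . '</td>' . "\n";
print '<a href="#" title="' . $safe . '">view node</a>' . "\n";
```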

Rich text

This is text that is marked up in some language (HTML, Textile, etc.). It is stored in that markup-specific format and converted to HTML on output by the filters enabled for its input format. This is generally the format used for multi-line text fields.

All you need to do is pass the rich text through check_markup() and you'll get back HTML that is safe to output. You should also let the user choose the input format (in Drupal 7, with a text_format form element) and pass the chosen format along to check_markup().

You must also make sure that the author of a post is allowed to use the chosen input format, typically by checking with filter_access() when the content is submitted. In Drupal 6, check_markup() performs this check for the current user by default. However, because content is filtered on output, the current user is often not the person who originally wrote the content. In that case, you can disable the check by passing $check = FALSE to check_markup().

Admin-only HTML

There are some places in the administration section where it is impractical to invoke the filter system (for rich text), but where some simple markup is desired, such as a link or some emphasis (so plain text is not acceptable). Examples include the mission statement, posting guidelines, and forum descriptions.

For such cases, you can use a regular textarea and pass the text through filter_xss_admin() when you output it. This allows most HTML tags through while still blocking potentially harmful scripts and styles.

URLs

URLs across Drupal require special handling in two ways:

If you wish to put any sort of dynamic data into a URL, you need to pass it through urlencode().
If you don't, characters like '#' or '?' will disrupt the normal URL semantics. urlencode() will prevent this by escaping them with %XX syntax.

Note that Drupal paths (e.g. 'node/123') are passed through urlencode() as a whole so you don't need to urlencode individual parts of it. This convenience does not apply to other parts of the URL like GET query arguments or fragment identifiers.
When using user-submitted URLs in a hyperlink, you need to use check_url() rather than just check_plain().
check_url() will call check_plain(), but also perform additional XSS checks to ensure the URL is safe for clicking on.
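The following small standalone illustration covers the first point. Dynamic data placed into a query string must be encoded, or characters such as '&', '#' and '?' change the meaning of the URL. The 'search/node' path and the 'keys' parameter are hypothetical. In Drupal 7, url() and l() can also build the query string for you from an array passed as the 'query' option.

```php
<?php
$search = 'cats & dogs #1?';

// Unsafe: '&' starts a new parameter and '#' starts the fragment, so the
// server would see only "cats " as the value of "keys", and everything
// after '#' would never reach it.
$broken = 'search/node?keys=' . $search;

// Safe: urlencode() turns the special characters into %XX escapes
// (and spaces into '+', the form-encoding convention for query strings).
$encoded = 'search/node?keys=' . urlencode($search);

print $broken . "\n";
print $encoded . "\n";  // search/node?keys=cats+%26+dogs+%231%3F
```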

Note that all Drupal functions which return URLs (url(), request_uri(), etc.) output plain URLs which have not been HTML escaped in any way (in other words, they are plain-text). Remember to use check_url() to escape them when outputting HTML (or XML). Don't use check_url() in situations where a real URL is expected, e.g. in the HTTP Location: ... header.
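To show why check_plain() alone is not enough for URLs, here is a simplified standalone sketch of the idea behind check_url(). In Drupal 7, check_url() roughly amounts to check_plain(drupal_strip_dangerous_protocols($uri)). It removes URL schemes that are not on an allowlist, such as javascript:, and then escapes the result. The function name example_check_url() and the short allowlist are assumptions for illustration. Drupal's real allowlist is longer and configurable.

```php
<?php
function example_check_url(string $uri): string {
  // Hypothetical, shortened allowlist of safe schemes.
  $allowed = array('http', 'https', 'ftp', 'mailto');

  // Repeatedly strip any scheme that is not allowed, so that input such as
  // "javascript:javascript:alert(1)" cannot survive a single pass.
  do {
    $before = $uri;
    $colon = strpos($uri, ':');
    if ($colon !== false && $colon > 0) {
      $scheme = substr($uri, 0, $colon);
      // A '/', '?' or '#' before the colon means this is a relative URL,
      // not a scheme, so there is nothing to strip.
      if (preg_match('![/?#]!', $scheme)) {
        break;
      }
      // Scheme comparison is case-insensitive.
      if (!in_array(strtolower($scheme), $allowed, true)) {
        $uri = substr($uri, $colon + 1);
      }
    }
  } while ($before !== $uri);

  // Finally escape it for HTML, as check_plain() would.
  return htmlspecialchars($uri, ENT_QUOTES, 'UTF-8');
}

$user_url = 'javascript:alert(document.cookie)';
// HTML escaping alone leaves the javascript: URL intact and clickable.
print '<a href="' . htmlspecialchars($user_url, ENT_QUOTES, 'UTF-8') . '">unsafe</a>' . "\n";
// The dangerous scheme is removed before escaping.
print '<a href="' . example_check_url($user_url) . '">safer</a>' . "\n";
```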

In practice

All the rules above can be summed up quite easily: no piece of user-submitted content should ever be placed into HTML without first being escaped or filtered. If you are unsure whether this is the case, you can test it by submitting a piece of text like <u>xss</u> into your module's fields. If the text comes out underlined or mangles existing tags, you know you have a problem.
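The <u>xss</u> probe can also be automated. The sketch below feeds it through two hypothetical rendering functions, one mirroring the bad table-cell example below and one mirroring the good example, and checks whether the probe's markup survives. It uses htmlspecialchars() as a standalone stand-in for check_plain(). In a real module you would exercise your actual output code instead.

```php
<?php
// Hypothetical rendering functions mirroring the table-cell examples.
function example_render_row_bad(string $title): string {
  return "<tr><td>$title</td></tr>";
}

function example_render_row_good(string $title): string {
  // htmlspecialchars() with ENT_QUOTES stands in for check_plain().
  return '<tr><td>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</td></tr>';
}

// Returns TRUE if the probe's tags made it into the output unescaped,
// i.e. the browser would render the text underlined.
function example_probe_leaks(callable $render): bool {
  $probe = '<u>xss</u>';
  return strpos($render($probe), $probe) !== false;
}

var_dump(example_probe_leaks('example_render_row_bad'));   // bool(true)
var_dump(example_probe_leaks('example_render_row_good'));  // bool(false)
```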

Here are some examples of good and bad code. $title, $body and $url are assumed to be user-submitted fields containing a title, a piece of marked up text and a URL respectively. They are fresh from the database and thus contain exactly what the user submitted without any changes.

Bad:
<?php print "<tr><td>$title</td></tr>"; ?>
<?php print '<a href="/..." title="' . $title . '">view node</a>'; ?>

Good (the title is plain-text and may not be placed into HTML as is):
<?php print '<tr><td>'. check_plain($title) .'</td></tr>'; ?>
<?php print '<a href="/..." title="'. check_plain($title) .'">view node</a>'; ?>

Bad:
<?php print l(check_plain($title), 'node/'. $nid); ?>

Good (l() already contains a check_plain() call by default):
<?php print l($title, 'node/'. $nid); ?>

Bad:
<?php print '<a href="' . $url . '">'; ?>
<?php print '<a href="'. check_plain($url) .'">'; ?>

Good (URLs must be checked with check_url()):
<?php print '<a href="'. check_url($url) .'">'; ?>

Writing filters

When writing a filter which translates from another markup language into HTML, you need to ensure you don't open any holes yourself. Generally, the same rules apply: check URLs with check_url() and ensure no literal HTML can be injected by escaping appropriately using check_plain().
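As an illustration of those rules, here is a minimal standalone sketch of a filter for a made-up markup syntax in which [label](url) becomes a link. Everything outside the link syntax is escaped, the label is escaped, and the URL is checked before it goes into the href attribute. The syntax, the function names and the shortened scheme allowlist are all hypothetical. A real Drupal filter would call check_plain() and check_url() rather than these stand-ins.

```php
<?php
// Stand-in for check_plain().
function example_check_plain(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Simplified stand-in for check_url(): strip disallowed schemes, then escape.
function example_check_url(string $uri): string {
  $allowed = array('http', 'https', 'mailto');
  do {
    $before = $uri;
    $colon = strpos($uri, ':');
    if ($colon !== false && $colon > 0) {
      $scheme = substr($uri, 0, $colon);
      if (preg_match('![/?#]!', $scheme)) {
        break;  // Relative URL: the colon is not part of a scheme.
      }
      if (!in_array(strtolower($scheme), $allowed, true)) {
        $uri = substr($uri, $colon + 1);
      }
    }
  } while ($before !== $uri);
  return example_check_plain($uri);
}

// Converts "[label](url)" into a link and escapes everything else.
function example_link_filter(string $text): string {
  // With two capture groups, preg_split() returns pieces in the order
  // text, label, url, text, label, url, ..., text.
  $parts = preg_split('/\[([^\]]*)\]\(([^)\s]*)\)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
  $html = '';
  for ($i = 0; $i < count($parts); $i += 3) {
    // Literal text between links: never let raw HTML through.
    $html .= example_check_plain($parts[$i]);
    if (isset($parts[$i + 2])) {
      $html .= '<a href="' . example_check_url($parts[$i + 2]) . '">'
        . example_check_plain($parts[$i + 1]) . '</a>';
    }
  }
  return $html;
}

print example_link_filter('Read [the docs](https://example.com/?a=1&b=2) or <b>this</b>.') . "\n";
```

Escaping each piece separately, rather than escaping the output after generating HTML, keeps the filter's own tags intact while making sure the user cannot inject any of their own.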
