When handling and outputting text in HTML, you must filter or escape it properly to prevent cross-site scripting (XSS) exploits.

When handling data, the golden rule is to store exactly what the user typed. When a user edits a post they created earlier, the form should contain the same things as it did when they first submitted it. This means that conversions are performed when content is output, not when saved to the database (be sure to read the db_query() documentation on how to use the database API securely).

User-submitted data in Backdrop can be divided into three categories:

  1. Plain-text

    This is simple text without any markup. What the user entered is displayed exactly on screen as is, and is not interpreted in any form. This is generally the format used for single-line text fields.

    When outputting plain-text, you need to pass it through check_plain() before it can be put inside HTML. This will convert quotes, ampersands and angle brackets into entities, causing the string to be shown literally on screen in the browser.

    Most themeable functions and APIs take HTML for their arguments, but a few automatically sanitize text by first passing it through check_plain():

    • t(): the placeholders (e.g. '%name' or '@name') are passed as plain-text and will be escaped when inserted into the translatable string. You can disable this escaping by using placeholders of the form '!name' but only if you are sure that the string is safe.
    • l(): the link caption should be passed as plain-text (unless overridden with the $html parameter).
    • menu items and breadcrumbs: the menu item titles and breadcrumb titles are automatically sanitized.
    • theme('placeholder'): the placeholder text is plain-text.
    • Block descriptions (but not titles--see below)
    • Form API (FAPI) #default_value element and #options element when the type is a select box.
      Examples:
      <?php
      $form['safe'] = array(
        '#type' => 'textfield',
        '#default_value' => $u_supplied,
      );

      $form['also_safe'] = array(
        '#type' => 'select',
        '#default_value' => 0, // FAPI will pass this through check_plain().
        '#options' => node_get_types('names'),  // FAPI will sanitize the '#options' attribute with check_plain() for select boxes.
      );

      ?>

    Some places require that you first sanitize any text:

    • Watchdog messages
      Examples (the message and variables are passed through t() by the watchdog function):
      watchdog('content', "Deleted !title", array('!title' => $node->title)); // Bad: '!' inserts the title unescaped (XSS)
      watchdog('content', "Deleted %title", array('%title' => $node->title)); // Good: '%' (or '@') escapes the title
    • Form elements #description and #title
      Examples:
      $form['bad'] = array(
          '#type' => 'textfield',
          '#default_value' => check_plain($u_supplied),  // bad: escaped twice
          '#description' => t("Old data: !data", array('!data' => $u_supplied)), // XSS
          );

      $form['good'] = array(
          '#type' => 'textfield',
          '#default_value' => $u_supplied,
          '#description' => t("Old data: @data", array('@data' => $u_supplied)),
          );
    • Form elements - #value of #type markup and item need to be safe. Note that the
      default form element #type is markup!
      Examples:
      $form['unsafe'] = array('#value' => $user->name); // XSS
      $form['safe'] = array('#value' => check_plain($user->name));
      // or
      $form['safe'] = array('#value' => theme('username', $user));
  2. Rich text

    This is text which is marked up in some language (HTML, Textile, etc). It is stored in the markup-specific format, and converted to HTML on output using the various filters that are enabled. This is generally the format used for multi-line text fields.

    All you need to do is pass the rich text through check_markup() and you'll get HTML returned, safe for outputting. You should also allow the user to choose the input format with a format widget through filter_form() and should pass the chosen format along to check_markup().

    Note that you must make sure that the author of a post is allowed to use a particular input format, typically by checking with filter_access() when the content is being submitted. However, because content is filtered on output, the user viewing the page at that point is often not the person who originally wrote the content. In that case, you can disable this check by passing $check = false to check_markup().

  3. Admin-only HTML

    Examples include the mission statement, posting guidelines, and forum descriptions.

    For such cases, you can use a regular textarea, and pass the text through filter_xss_admin() when you output it. This will allow most HTML tags to pass through, while still blocking possibly harmful scripts or styles.
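
The plain-text rule above (escape once, at output time) can be illustrated outside Backdrop with a minimal sketch. It assumes that check_plain() behaves like PHP's htmlspecialchars() with ENT_QUOTES and UTF-8; the function example_check_plain() is a hypothetical stand-in and not part of the Backdrop API.

```php
<?php
// Hypothetical stand-in for check_plain(). Assumption: it is equivalent to
// htmlspecialchars() with ENT_QUOTES and UTF-8.
function example_check_plain($text) {
  return htmlspecialchars((string) $text, ENT_QUOTES, 'UTF-8');
}

// Stored exactly as the user typed it.
$title = '<u>xss</u> & "quotes"';

// Escape once, when the value is placed into HTML.
$once = example_check_plain($title);
print $once . "\n";

// Escaping an already escaped string mangles the entities, which is why
// values that an API sanitizes itself must be passed in unescaped.
$twice = example_check_plain($once);
print $twice . "\n";
```

The first line of output shows the tags as literal text in a browser. The second shows visible entity debris such as &amp;lt;, which is the symptom of escaping twice.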

URLs across Backdrop require special handling in two ways:

  1. If you wish to put any sort of dynamic data into a URL, you need to pass it through urlencode(). If you don't, characters like '#' or '?' will disrupt the normal URL semantics. urlencode() prevents this by escaping them with %XX syntax. Note that Backdrop paths (e.g. 'node/123') are passed through urlencode() as a whole, so you don't need to urlencode individual parts of them. This convenience does not apply to other parts of the URL like GET query arguments or fragment identifiers.
  2. When using user-submitted URLs in a hyperlink, you need to use check_url() rather than just check_plain(). check_url() will call check_plain(), but also performs additional XSS checks to ensure the URL is safe to click on.

Note that all Backdrop functions which return URLs (url(), request_uri(), etc.) output plain URLs which have not been HTML escaped in any way (in other words, they are plain-text). Remember to use check_url() to escape them when outputting HTML (or XML). Don't use check_url() in situations where a real URL is expected, e.g. in the HTTP Location: ... header.
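
The two URL rules above can be sketched in plain PHP. urlencode() and htmlspecialchars() are standard PHP functions. example_check_url(), its allowed-scheme list, and the 'search?keys=' path are hypothetical; the function only approximates the behavior described for check_url() (extra protocol checks plus HTML escaping) and is not Backdrop's implementation.

```php
<?php
// Rule 1: encode dynamic data before placing it in a query string, so that
// '&', '#' and '?' cannot change the meaning of the URL.
$keywords = 'tea & cake #1?';
$href = 'search?keys=' . urlencode($keywords);
print $href . "\n"; // search?keys=tea+%26+cake+%231%3F

// Rule 2: hypothetical, simplified stand-in for check_url().
// Assumption: a URL whose scheme is outside a small allowed list is rejected,
// and everything else is escaped for use inside an HTML attribute.
function example_check_url($url, array $allowed = array('http', 'https', 'mailto')) {
  $url = trim((string) $url);
  // A colon that appears before any '/', '?' or '#' marks a scheme.
  if (preg_match('~^([^/?#:]*):~', $url, $matches)) {
    if (!in_array(strtolower($matches[1]), $allowed, TRUE)) {
      // Unknown or dangerous scheme (e.g. javascript:): return an empty URL.
      return '';
    }
  }
  return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

print example_check_url('http://example.com/?a=1&b=2') . "\n";
print example_check_url('javascript:alert(1)') . "\n";
```

The escaped result belongs in HTML output only. For a redirect header, the unescaped URL would be used instead, as noted above.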

In practice

All the rules above can be summed up quite easily: no piece of user-submitted content should ever be placed as-is into HTML. If you are unsure of whether this is the case, you can always test it by submitting a piece of text like <u>xss</u> into your module's fields. If the text comes out underlined or mangles existing tags, you know you have a problem.

Here are some examples of good and bad code. $title, $body and $url are assumed to be user-submitted fields containing a title, a piece of marked up text and a URL respectively. They are fresh from the database and thus contain exactly what the user submitted without any changes.

Bad:
<?php print '<tr><td>'. $title .'</td></tr>'; ?>
<?php print '<a href="/..." title="'. $title .'">view node</a>'; ?>

Good (the title is plain-text and may not be placed into HTML as is):
<?php print '<tr><td>'. check_plain($title) .'</td></tr>'; ?>
<?php print '<a href="/..." title="'. check_plain($title) .'">view node</a>'; ?>

Bad:
<?php print l(check_plain($title), 'node/'. $nid); ?>

Good (l() already contains a check_plain() call by default):
<?php print l($title, 'node/'. $nid); ?>

Bad:
<?php print '<a href="/'. $url .'">'; ?>
<?php print '<a href="/'. check_plain($url) .'">'; ?>

Good (URLs must be checked with check_url()):
<?php print '<a href="/'. check_url($url) .'">'; ?>