Deep Dive: Text Formats, Editors & Filters
Text input and formatting is a major function of any content management system, including Backdrop, since text forms the bulk of website content. Furthermore, humans type text, but browsers display HTML. A CMS, therefore, needs to be able to safely convert and format text for display on computer screens. Text is entered into Backdrop via two types of HTML fields:
- Text field (also "textfield") - a single line text input element
- Textarea - a multi-line input element
Backdrop captures text input from users in its raw form, saving whatever gets submitted straight to the database without alteration. However, before displaying this content in the browser, Backdrop processes the text to ensure that it is properly formatted, but also, and most importantly, that it is safe to display. Several security issues may arise from text which is allowed to be displayed without filtering. These issues will be discussed later in this document.
Why doesn't Backdrop filter text before saving input into the database? The answer is simple: flexibility. Changing the text a user has entered before saving it to the database would make it impossible to recover the original. By filtering on output rather than on input, Backdrop lets the site administrator change how content is displayed at any time.
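The idea of filtering on output can be sketched in a few lines. This is an illustrative Python sketch, not Backdrop's PHP implementation; the `database` dictionary and the `save_body` and `render_body` helpers are hypothetical names made up for the example.

```python
import html

# Hypothetical in-memory stand-in for the database: raw input is stored untouched.
database = {}

def save_body(node_id, raw_text):
    """Store the text exactly as submitted; nothing is filtered on input."""
    database[node_id] = raw_text

def render_body(node_id, output_filter):
    """Apply whichever filter is configured right now, only at display time."""
    return output_filter(database[node_id])

save_body(1, "<b>Hello</b> & welcome")

# At first the administrator chooses to escape all HTML on output...
escaped = render_body(1, html.escape)

# ...and later switches to passing the text through unchanged.
unfiltered = render_body(1, lambda text: text)

print(escaped)      # &lt;b&gt;Hello&lt;/b&gt; &amp; welcome
print(unfiltered)   # <b>Hello</b> & welcome
```

Because the stored text never changes, switching the output filter changes how existing content is displayed without touching the saved data.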
Backdrop manages text entry, filtering, and display through the Editor and Filter systems.
Filters and Text Formats
A filter is a set of rules that can be applied to transform text in some way. Some filters strip certain HTML tags or security hazards from text. Other filters look for special patterns and expand the text in a meaningful way. Filters know how to do one thing, and do it well; text in, filtered text out.
Some filters have extra configuration options. The "Limit allowed HTML tags" filter, for example, strips all but an allowed set of HTML tags from text. The set of allowed tags is chosen by the administrator.
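A configurable filter can be thought of as a function that takes the text plus its settings. The following is a minimal Python sketch of a tag-limiting filter, assuming a simple regular expression is good enough for illustration; it is not Backdrop's implementation and is not safe for real security filtering. The names `limit_allowed_tags` and `admin_allowed` are hypothetical.

```python
import re

# Matches an opening or closing tag and captures its name,
# e.g. "<div class='x'>" captures "div". Illustration only.
TAG_PATTERN = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")

def limit_allowed_tags(text, allowed_tags):
    """Strip every tag whose name is not in allowed_tags; keep the text inside."""
    def keep_or_strip(match):
        return match.group(0) if match.group(1).lower() in allowed_tags else ""
    return TAG_PATTERN.sub(keep_or_strip, text)

# The administrator's configuration: the set of tags that may pass through.
admin_allowed = {"a", "em", "strong"}

result = limit_allowed_tags("<div><em>Hi</em> <script>x()</script></div>", admin_allowed)
print(result)  # <em>Hi</em> x()
```

Changing `admin_allowed` changes the output for the same input, which is the role the filter's configuration option plays.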
Backdrop ships with the following core filters:
Display any HTML as plain text - Escapes HTML special characters so that any markup in the text is shown as literal text rather than interpreted by the browser.
Limit allowed HTML tags - This filter is primarily responsible for removing HTML tags from text. It can be configured to allow any number of tags (a whitelist), and it will remove the rest, either by stripping them or by escaping them into entities like this: &lt;div&gt;. If tags are escaped, they show up in the output as visible tags: <div>Some text</div>. The tags allowed by default are: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
The final task of the "Limit allowed HTML tags" filter is to add a spam link deterrent to anchor tags. The deterrent, proposed by Google, tells search engines not to follow the marked links when crawling the web. If this option is enabled, rel="nofollow" is added as an attribute to all anchor tags.
Convert line breaks into HTML - This filter converts line breaks into <br> or <p> tags depending on whether a single or double line break is found. This preserves the paragraph formatting in the text that is input.
Convert URLs into links - Any web or email addresses that are found in the text will be converted to clickable links, thus saving the user the hassle of having to type <a href="....">.
Convert image captions to figure and figcaption elements - This filter works with CKEditor's image caption plugin to convert captions typed onto images into HTML5 <figure> and <figcaption> elements.
Float images left and right using the data-align attribute - CKEditor assigns a "data-align" property to images when you indicate that an image should be aligned. This filter converts the data-align property to a corresponding CSS "align-*" class. For example, an image aligned left in the CKEditor image dialog would be saved in the node as <img data-align="left" ...>; on rendering, this data property would be converted to <img class="align-left" ...>
Correct faulty and chopped off HTML - Scans the text and makes sure that HTML tags are properly closed.
Contributed modules can add many more filter options to help format text.
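To make the "text in, filtered text out" idea concrete, here are simplified Python sketches of three of the core filters described above. They are assumptions written for illustration, not Backdrop's PHP code: the function names are hypothetical, the URL pattern only handles http and https addresses, and the data-align version ignores images that already have a class attribute.

```python
import re

def convert_line_breaks(text):
    """Wrap blocks separated by a blank line in <p>; turn single newlines into <br>."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n".join("<p>" + p.replace("\n", "<br>\n") + "</p>" for p in paragraphs)

def convert_urls(text):
    """Turn bare http(s) addresses into clickable links."""
    return re.sub(r"(https?://[^\s<]+)", r'<a href="\1">\1</a>', text)

def convert_data_align(text):
    """Rewrite data-align="left" (or "right") into class="align-left"."""
    return re.sub(r'data-align="(left|right)"', r'class="align-\1"', text)

print(convert_line_breaks("First line\nsecond line\n\nNew paragraph"))
print(convert_urls("Visit https://example.com today"))
print(convert_data_align('<img data-align="left" src="a.png">'))
```

Each function does one job and knows nothing about the others, which is what allows them to be combined freely into text formats.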
Text formats
A Text format is an ordered collection of filters. Any HTML text that is saved in Backdrop is saved with an associated text format, and when this HTML is being displayed to the browser it is run through the filters in its Text format first. The Text format then applies all of the filters, in the right order, so that one filter feeds its output to the next, forming a chain.
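The chain, and why the order matters, can be sketched as follows. This is a Python illustration under simplifying assumptions, not Backdrop's implementation; `apply_format`, `escape_html` and `convert_urls` are hypothetical names.

```python
import html
import re

def escape_html(text):
    """Show any HTML as plain text by escaping &, < and >."""
    return html.escape(text, quote=False)

def convert_urls(text):
    """Turn bare http(s) addresses into clickable links."""
    return re.sub(r"(https?://[^\s<]+)", r'<a href="\1">\1</a>', text)

def apply_format(text, filters):
    """Run the text through each filter in order; each output feeds the next."""
    for text_filter in filters:
        text = text_filter(text)
    return text

raw = "<b>See</b> https://example.com"

# Escape first, then link: the user's HTML is neutralized, the link still works.
right_order = apply_format(raw, [escape_html, convert_urls])

# Link first, then escape: the generated <a> tag is escaped too and shows up as text.
wrong_order = apply_format(raw, [convert_urls, escape_html])

print(right_order)
print(wrong_order)
```

The same two filters give different results depending on their order, which is why a Text format is an ordered collection rather than just a set.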
Plain text format
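This format displays any HTML the user types as literal text instead of rendering it; the only markup in the output is the links, line breaks, and paragraphs added by the filters themselves. It applies the following filters: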
- Display any HTML as plain text
- Convert URLs into links
- Convert line breaks into HTML (i.e. <br> and <p>)
Filtered HTML format
This format applies the following filters to text:
- Limit allowed HTML tags
- Convert URLs into links
- Convert line breaks into HTML (i.e. <br> and <p>)
- Convert image captions to figure and figcaption elements
- Float images left and right using the data-align attribute
- Correct faulty and chopped off HTML
Full HTML format
This format applies the following filters to text:
- Convert URLs into links
- Convert line breaks into HTML (i.e. <br> and <p>)
- Convert image captions to figure and figcaption elements
- Float images left and right using the data-align attribute
- Correct faulty and chopped off HTML
Note that this is the same list as the Filtered HTML format except that HTML tags are not limited.
Text entry editor
The text entry editor (otherwise known simply as the Editor) in default Backdrop is CKEditor. It is an open source WYSIWYG (What You See Is What You Get) text editor designed to bring common word processor features directly to web pages, simplifying their content creation. Read more about CKEditor on the official website.
CKEditor allows Backdrop to attach multiple text manipulation buttons to textareas, which allow you to apply HTML formatting to text without needing to type in HTML directly.
For example, selecting text and clicking the Bold button will surround the selected text with HTML <strong></strong> tags. The underlying HTML is not displayed directly, however. Instead, CKEditor displays the text as it would appear in the browser, so that the formatting you see in CKEditor is what you get on the web page.
Before switching an existing site from using no text editor to using CKEditor, please see "Caution when switching an existing site from no text editor to CKEditor" on the CKEditor Backdrop documentation page.
The Backdrop editor API can also accommodate other editors as contributed modules.
Editor configurations
Editors may be enabled on textarea fields. The Body fields in the default Post and Page content types are typical examples of textareas with an enabled Editor. An Editor configuration is a configuration in which a text entry editor has been paired with a Text format (a combination of text filters).
The Formatting options fieldset below the Body field holds the select field which allows you to choose different Editor configurations. In a newly installed site, there are two Editor configurations:
- Filtered HTML
- Full HTML
Selecting a new Editor configuration updates the editor attached to the textarea being edited. For example, on the Body field, changing from the Filtered HTML configuration to Full HTML reloads CKEditor with a new (and more comprehensive) set of buttons, which allow the full range of HTML tags and elements to be used. Importantly, this action also attaches the Full HTML Text format to this piece of content.
Saving text with a chosen format
Text which has been entered using a particular Text format will have the filters in that format applied once the text is being displayed (rendered).
If, for example, you are creating or editing a Post and are typing content into the Body field, you will note that the default Editor configuration is Filtered HTML. The CKEditor instance attached to the field offers a limited set of buttons, reflecting the allowed tags from the "Limit allowed HTML tags" filter. If, however, you were to reveal the source HTML text (there is a "View source" button in CKEditor), it would be perfectly possible to type in any HTML element or tag you wish (a <table> tag, for example). The Body content you entered would be saved as typed, including the <table> tag.
However, on viewing the content (by viewing the published node) you will note that the table is not displayed and that the <table> tags are not in the HTML source Backdrop sends to the browser. This is because the node was saved with Filtered HTML as its format, so on view this format applies its "Limit allowed HTML tags" filter and strips out the disallowed <table> tag.
How then to display table tags on a Backdrop site?
If you were to open this node's edit form, and simply change the Editor configuration to Full HTML and save, the node would now be saved with this new format. On view, all tags would now be rendered, since the "Limit allowed HTML tags" filter is not part of the Full HTML format. No tags would then be stripped from the HTML source.
Alternatively, you could modify the allowed tags in the "Limit allowed HTML tags" filter to include and allow the <table> tag.
Or a final alternative would be to create an entirely new Text format which either includes the "Limit allowed HTML tags" filter but allows <table> tags, or does not include this filter at all, and use this new format when editing or creating your nodes.
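The three options above can be compared in a small Python sketch. It is a simplified illustration, not Backdrop's implementation: only the tag-limiting filter is modeled, the regular expression is not safe for real filtering, and the format keys (including `filtered_html_with_tables`) and the `render` helper are hypothetical.

```python
import re

# Illustration-only tag matcher; captures the tag name.
TAG_PATTERN = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")

# Default allowed tags as listed for the "Limit allowed HTML tags" filter.
DEFAULT_ALLOWED = {"a", "em", "strong", "cite", "code", "ul", "ol", "li", "dl", "dt", "dd"}

def limit_tags(allowed):
    """Build a filter that strips every tag not in the allowed set."""
    def run(text):
        return TAG_PATTERN.sub(
            lambda m: m.group(0) if m.group(1).lower() in allowed else "", text)
    return run

# Hypothetical formats; other filters are left out to keep the sketch short.
FORMATS = {
    "filtered_html": [limit_tags(DEFAULT_ALLOWED)],
    "full_html": [],  # no tag limiting at all
    "filtered_html_with_tables": [limit_tags(DEFAULT_ALLOWED | {"table", "tr", "td"})],
}

def render(node):
    """Apply the filters of the format stored with the node, at view time."""
    text = node["body"]
    for text_filter in FORMATS[node["format"]]:
        text = text_filter(text)
    return text

# The body is saved exactly as typed, together with its format.
node = {"body": "<table><tr><td><em>Cell</em></td></tr></table>", "format": "filtered_html"}
as_filtered = render(node)   # <em>Cell</em>

node["format"] = "full_html"
as_full = render(node)       # table markup kept

node["format"] = "filtered_html_with_tables"
as_custom = render(node)     # table markup kept, other tags still limited
```

In every case the saved body is identical; only the format attached to the node decides whether the table markup reaches the browser.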
Modifying Editor configurations
To modify Editor configurations, go to Configuration > Content authoring > Text editors and formats to view a table list of all available Editor configurations.
Click the "Configure" link on the row for the Editor configuration you wish to change. This loads the configuration form which provides several fields.
- Name - this field allows you to rename the Text format
- Editor - attaches an editor to this configuration. By default, only CKEditor is available; you can also select none.
- CKEditor toolbar - a drag and drop interface for changing CKEditor buttons for this format. Icons may be dragged from the "Available buttons" row onto the "Active toolbar" row to activate these features for this configuration.
- Image uploading - fields for enabling and configuring image uploads in CKEditor. Once the "Enable image uploads" checkbox is checked, the fieldset becomes expanded allowing you to set the Upload directory for images, the Maximum file size and Maximum dimensions for each uploaded image.
- Roles - allows you to select which roles can use this format.
- Enabled filters - allows you to add filters to this format. Selections made here dynamically update the next two sections.
- Filter processing order - lists the enabled filters in draggable rows to determine the order that filters will be processed.
- Filter settings - presents the settings fields for any configurable filters.
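The relationship between the "Enabled filters" and "Filter processing order" sections can be pictured as follows. This Python sketch assumes each filter carries an enabled flag and a numeric weight that the draggable rows adjust; the `filter_settings` structure and its values are hypothetical and do not reflect how Backdrop stores its configuration.

```python
# Hypothetical settings for one Text format (names taken from the core filter list).
filter_settings = [
    {"name": "Correct faulty and chopped off HTML", "enabled": True, "weight": 10},
    {"name": "Limit allowed HTML tags", "enabled": True, "weight": -10},
    {"name": "Display any HTML as plain text", "enabled": False, "weight": -5},
    {"name": "Convert URLs into links", "enabled": True, "weight": 0},
]

def processing_order(settings):
    """Return the names of enabled filters, lowest weight (top row) first."""
    enabled = [f for f in settings if f["enabled"]]
    return [f["name"] for f in sorted(enabled, key=lambda f: f["weight"])]

order = processing_order(filter_settings)
for position, name in enumerate(order, start=1):
    print(position, name)
```

Disabled filters drop out of the list entirely, and dragging a row in the form corresponds to changing where that filter sits in the resulting order.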
Creating new Editor configurations
The process is similar to the one above: go to Configuration > Content authoring > Text editors and formats, click the "Add text editor" link, and then fill in the same fields described in the previous section.