
Topic: XSS and HTML filters

from a response to an email

HTML markup, and scripts

Browser applications interpret the HTML markup of web-pages to render a page as well as to perform actions such as form submissions and script executions. Among other things, scripts (most commonly JavaScript code) enhance the usability of web-pages.

However, scripts can be, and are, used for less benevolent purposes, such as displaying ads on a web-page and collecting private information, from users' browsing habits to their account usernames and passwords.

XSS

In most cases it is the HTML markup of a web-page that either contains a script or calls one. XSS, or cross-site scripting, generally refers to making a browser application interpret the HTML markup of a web-page in a way that leads to the execution of 'malevolent' scripts.

XSS thus has many aspects:

* What counts as malevolent depends on one's perspective (e.g., someone may be secretly collecting data for some good purpose).

* The interpretation of the HTML markup or the browser-behavior will vary depending on the type and version of the browser, and on the operating system, installed plug-ins, etc.

* The ultimate effect can depend on other factors such as a web-page server's configuration.

* The 'dangerous' HTML markup (or code) may be very 'visible' or it may have been obfuscated to escape attention.

* The dangerous code may have been deliberately put in by a web-page's author(s), or it could have been introduced by others either indirectly (e.g., database hacks using SQL injection) or directly (e.g., HTML code in a comment posted on a blog).

* Dangerous code may also be indirectly called using non-text content (such as a Flash movie) on a web-page.
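
The blog-comment case from the list above can be illustrated with a minimal Python sketch. The function names and the payload are hypothetical and only serve the illustration; the point is that untrusted text pasted into a page unchanged becomes live markup, while the same text converted to character entities is shown as plain text.

```python
from html import escape

def render_comment_unsafe(comment: str) -> str:
    # The untrusted text is pasted into the page as-is,
    # so any markup inside it is interpreted by the browser.
    return '<div class="comment">' + comment + '</div>'

def render_comment_escaped(comment: str) -> str:
    # html.escape converts <, >, & and quotes to character entities,
    # so the browser displays the text instead of interpreting it.
    return '<div class="comment">' + escape(comment) + '</div>'

payload = "<script>alert('xss')</script>"  # hypothetical hostile comment
print(render_comment_unsafe(payload))
print(render_comment_escaped(payload))
```

Escaping everything is the bluntest option: it also removes any harmless formatting the commenter intended, which is where HTML filters come in.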

Input HTML filters and XSS

It should thus be clear that input HTML filters cannot prevent every XSS scenario. Code put in by a page's own author, or called through non-text content, never passes through such a filter.

Secondly, blocking XSS using HTML filters generally means blocking scripting altogether, which also means that elements like 'iframe', 'object' and 'embed', and attributes like 'style', 'onclick' and 'onmouseover' have to be filtered out. That, in many cases, can reduce a web-page's functionality.
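
As a rough illustration of what 'filtering out' means here, the following Python sketch drops the elements and attributes named above using the standard html.parser module. It is not how htmLawed works internally; the class and function names are hypothetical, and a sketch this small misses many XSS vectors that a real filter has to handle.

```python
from html import escape
from html.parser import HTMLParser

# Assumed policy for this sketch: elements removed together with their content,
# elements removed as bare tags, and attributes removed by name.
DROP_WITH_CONTENT = {"script", "iframe", "object"}
DROP_TAG_ONLY = {"embed"}
BLOCKED_ATTRIBUTES = {"style"}

class ScriptStrippingFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in DROP_TAG_ONLY:
            return
        kept = []
        for name, value in attrs:
            # Event-handler attributes (onclick, onmouseover, ...) start with "on".
            if name.startswith("on") or name in BLOCKED_ATTRIBUTES:
                continue
            if value is None:
                kept.append(f" {name}")
            else:
                kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag in DROP_TAG_ONLY:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped on output; text inside dropped elements is discarded.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

def strip_scripting(html_text: str) -> str:
    parser = ScriptStrippingFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

sample = ('<p onclick="steal()" style="color:red" class="x">Hi '
          '<script>alert(1)</script><b>there</b></p>'
          '<iframe src="http://evil.example/"></iframe>')
print(strip_scripting(sample))
```

The lost 'style' attribute in the sample shows the trade-off described above: the output is safer, but the author's formatting is gone as well.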

Different sections of a web-page can thus require the use of different types of HTML filtering. Highly customizable filters like htmLawed prove helpful in this context too. E.g., an admin may use htmLawed with a setting that permits 'onclick' on input from a 'trusted' user authoring a blog post, but use a different setting that disables scripting altogether on input from an anonymous person commenting on the blog post.
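
The idea of choosing a filter setting per source of input can be sketched as follows. This is a Python illustration only: htmLawed itself is a PHP library with its own configuration parameters, and the role names, the FilterPolicy class and the helper functions below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FilterPolicy:
    allow_event_attributes: bool   # onclick, onmouseover, ...
    allow_style: bool              # the 'style' attribute
    denied_elements: frozenset

# Hypothetical roles: a trusted blog author and an anonymous commenter.
POLICIES = {
    "trusted_author": FilterPolicy(True, True, frozenset()),
    "anonymous": FilterPolicy(
        False, False, frozenset({"script", "iframe", "object", "embed"})
    ),
}

def policy_for(role: str) -> FilterPolicy:
    # Unknown roles fall back to the strictest policy.
    return POLICIES.get(role, POLICIES["anonymous"])

def attribute_allowed(name: str, policy: FilterPolicy) -> bool:
    name = name.lower()
    if name.startswith("on"):
        return policy.allow_event_attributes
    if name == "style":
        return policy.allow_style
    return True

print(attribute_allowed("onclick", policy_for("trusted_author")))
print(attribute_allowed("onclick", policy_for("anonymous")))
```

Falling back to the strictest policy for an unrecognized role keeps a misconfiguration from silently granting scripting to untrusted input.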

htmLawed more than anti-XSS

Besides anti-XSS filtering when asked, htmLawed does other things like balance tags and nest them properly, check character entities for standards-compliance, obfuscate email addresses to reduce spam, transform deprecated tags (e.g., 'b' to 'strong'), etc.

Also see this web-page illustrating the anti-XSS efficacy of htmLawed with 'safe'=>1 against the XSS code listed in RSnake's XSS cheat-sheet (http://ha.ckers.org/xss.html).

BBCode best anti-XSS for untrusted user input

For the limited scenario of filtering untrusted user input, the best anti-XSS means may be to accept only BBCode (code like [b]bold[/b] text) and use it exclusively to permit limited HTML.
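
A minimal Python sketch of this approach follows, assuming a tiny hypothetical tag set ([b], [i], [u]) and a hypothetical function name. All HTML in the input is first converted to character entities, and only then are the known BBCode pairs turned into HTML, so the only markup that can reach the page is markup the converter itself produced.

```python
import re
from html import escape

# Assumed mapping from BBCode tag to the HTML element it produces.
SIMPLE_TAGS = {"b": "strong", "i": "em", "u": "u"}

def bbcode_to_html(text: str) -> str:
    # Step 1: neutralize any raw HTML in the untrusted input.
    out = escape(text, quote=True)
    # Step 2: translate matched BBCode pairs into the permitted HTML elements.
    for bb, html_tag in SIMPLE_TAGS.items():
        pattern = re.compile(
            rf"\[{bb}\](.*?)\[/{bb}\]", re.IGNORECASE | re.DOTALL
        )
        out = pattern.sub(rf"<{html_tag}>\1</{html_tag}>", out)
    return out

print(bbcode_to_html("[b]bold[/b] text <script>alert(1)</script>"))
```

Unmatched or unknown BBCode tags are simply left as literal text. The sketch does not repair wrongly nested pairs, which a fuller converter would need to do.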