(edited by patnaik 2013-06-09 22:04:08)

Topic: Evaluating htmLawed

The htmLawed website provides a number of resources for evaluating htmLawed: a demo page, extensive documentation, the source code, test cases, and the results of filtering a large and varied set of XSS attack code.

These resources should help one independently evaluate htmLawed.

htmLawed is fast, small (a single file of about 50 KB), light on memory, highly customizable, and rich in features.

htmLawed allows both black- and white-listing of tags and attributes, and it does not require many configuration values to be set. Using htmLawed can be as simple as calling 'htmLawed($input)', and 'dangerous' HTML can be filtered out with just 'htmLawed($input, array('safe'=>1))'.

A comparison of standalone HTML filters shows that HTMLPurifier comes closest to htmLawed in efficacy and features. Though good, HTMLPurifier is slow, 15-20 times bigger, spread over scores (possibly hundreds) of files, and consumes a few megabytes of memory just to load. It does not provide full HTML support, lacks some features such as customizable code beautification, and has poor end-user documentation. Its code is also no longer PHP 4-compatible.

htmLawed is also a simpler alternative to using HTML Tidy as there is no need to install an external, non-PHP library or a PHP extension.

htmLawed does have minor limitations, which are also detailed in the documentation. For example, it permits '<table></table>' even though the standards do not allow empty 'table' elements. But from a practical perspective, does that break page display or layouts, introduce security vulnerabilities, or crash applications? No.

The logic of htmLawed prioritizes safety, speed, tolerance for HTML as it is commonly written, and customization over absolute adherence to standards. Note that no browser enforces 100% standards compliance. HTML standards themselves continue to evolve, with multiple specifications and varying degrees of support among browsers and browser versions.