
Topic: Dev: Allowing non-standard attributes / elements

Admins may occasionally want htmLawed to permit more non-standard HTML attributes, especially when the filter is applied to old text. Some may also want to modify the htmLawed code to admit non-standard elements like 'noembed', or new, not-yet-standard elements like 'canvas'.

Permitting non-standard attributes

(Update, August 2012: Since version 1.1.13, htmLawed allows use of custom, non-standard attributes or custom rules for standard attributes.)

htmLawed's logic uses a white-list of all standard (incl. deprecated) attributes and a few commonly used non-standard ones. This is the default behavior; it can be configured to restrict the attributes further using white- or black-lists. The documentation has a list of the accepted attributes. All other attributes are always filtered out, and the values they carry are lost, because htmLawed transforms only deprecated (and a few non-standard) attributes. The 'hook_tag' functionality, which allows custom manipulation of the attributes in an opening tag, runs only after htmLawed has transformed and filtered the attributes, so it cannot rescue information declared through non-standard attributes like 'background' either.
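The ordering problem described above can be illustrated with a minimal sketch. It is a simplified model and not htmLawed's actual code: the function names 'sketch_process_tag' and 'sketch_hook' and the three-entry white-list are assumptions made up for the illustration.

```php
<?php
// Simplified model, NOT htmLawed's real code: attributes are filtered
// first, and only the survivors are handed to the hook.

// Hypothetical white-list for this sketch (attribute name => 1).
$sketchAllowed = array('id' => 1, 'class' => 1, 'width' => 1);

// Hypothetical stand-in for the tag-processing step.
function sketch_process_tag($element, array $attributes, array $allowed, $hook)
{
    // Step 1: drop every attribute that is not white-listed.
    $filtered = array_intersect_key($attributes, $allowed);
    // Step 2: the hook runs afterwards and sees only what is left.
    return call_user_func($hook, $element, $filtered);
}

// Hypothetical hook that reports the attribute names it receives.
function sketch_hook($element, array $attributes)
{
    return array_keys($attributes);
}

$seen = sketch_process_tag(
    'table',
    array('id' => 't1', 'background' => 'bg.png'),
    $sketchAllowed,
    'sketch_hook'
);
// $seen holds only 'id'; 'background' never reaches the hook.
```

In this model the hook has no way to recover the 'background' value, which is why the white-list itself has to be changed.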

Modifying htmLawed (as of version 1.1) to permit more attributes requires editing the code of the function 'hl_tag'. This post provides an example using the 'background' attribute.

Permitting new elements

htmLawed covers 86 elements: all those in the (X)HTML 4 standard, incl. those for Ruby, and 'embed'. The 86 include deprecated elements like 'font' which can be optionally transformed to another element by htmLawed. Any other element is always removed.

As an example, the code modifications needed to cover the 'canvas' element proposed in the WHATWG HTML5 standard are described here. The following HTML code shows the use of a canvas tag:

<div>
  <canvas id="graph" width="150" height="150">
    <p>current price: Rs. 315.50 <small>+1.50</small></p>
  </canvas>
</div>

To modify htmLawed code to cover a new element, the following have to be considered:

* if transformation of the element is to be facilitated
* the attributes (incl. deprecated ones) the element can have
* if transformation of the deprecated attributes is to be facilitated
* parent-child relationships of the element with other elements

The brief outline below ignores the first and third points above.

To begin, 'canvas' is added to the default set of elements by modifying the $e array in function 'htmLawed':

// config eles
$e = array('a'=>1, 'abbr'=>1, ... 'canvas'=>1 ...

Were 'canvas' to be an 'empty' element like 'img', one would also modify the $eE array in function 'hl_tag':

static $eE = array('area'=>1, 'br'=>1, 'canvas'=>1, 'col'=>1 ...

'width' and 'height' are the two attributes specific to 'canvas', which can also have universal attributes like 'class' and 'id'. htmLawed allows universal attributes in every element, so the only thing left to do is to allow 'width' and 'height' for 'canvas' (there are no deprecated attributes to consider for transformation). This is done by modifying the $aN array in function 'hl_tag':

static $aN = array( ...
'height' => array('canvas'=>1, 'embed'=>1, 'iframe'=>1 ...
...
'width' => array('canvas'=>1, 'embed'=>1, 'hr'=>1 ...
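The shape of the array above (attribute name first, then the elements allowed to carry it) suggests a simple two-level lookup. The following is only a sketch of such a lookup: the array holds just the entries visible in the excerpt above, and the helper 'sketch_attr_allowed' is a hypothetical name, not a function in htmLawed.

```php
<?php
// Array shaped like $aN: attribute => elements allowed to carry it.
// Only the entries shown in the excerpt are included; the real array
// in 'hl_tag' is longer.
$aN = array(
    'height' => array('canvas' => 1, 'embed' => 1, 'iframe' => 1),
    'width'  => array('canvas' => 1, 'embed' => 1, 'hr' => 1),
);

// Hypothetical helper: is this element-specific attribute allowed?
function sketch_attr_allowed(array $aN, $attribute, $element)
{
    // isset() on a missing key simply returns false, without a notice.
    return isset($aN[$attribute][$element]);
}

$canvasWidth  = sketch_attr_allowed($aN, 'width', 'canvas');  // listed above
$canvasHeight = sketch_attr_allowed($aN, 'height', 'canvas'); // listed above
$canvasSrc    = sketch_attr_allowed($aN, 'src', 'canvas');    // not listed
```

With 'canvas' added under both keys, the first two lookups succeed, while an attribute that has no 'canvas' entry is still rejected.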

The last step is to specify the nesting properties of 'canvas' by modifying a few arrays declared at the beginning of function 'hl_bal' (this is not required if htmLawed is not used for tag-balancing). The array $eI lists the elements that are considered inline; 'canvas' is added to it. The array $cF lists the elements whose child elements can be either inline- or block-type; 'canvas' is added to it too.

// eles by content
$cB = array('blockquote'=>1 ...); // Block
$cE = array('area'=>1 ...); // Empty
$cF = array('canvas'=>1, 'button'=>1 ...); // Flow...
$cI = array('a'=>1 ...); // Inline
$cN = array('a'=> ...); // Illegal
$cN2 = array_keys($cN);
$cR = array('blockquote'=>1 ...);
$cS = array('colgroup'=> ...); // Specific (immediate parent-child)
$cO = array('address'=> ...); // Other

// eles by block/inline type; ins & del both type; #pcdata: plain text
$eB = array('address'=>1 ...);
$eI = array('canvas'=>1, '#pcdata'=>1 ...);
$eN = array('a'=>1 ...); // Exclude from specific ele; $cN values
$eO = array('area'=>1 ...); // Missing in $eB & $eI
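To see how the two additions interact, here is a much-simplified model of a parent-child check. It is not the algorithm in 'hl_bal': it ignores $cN, $cR, $cS, $cO and everything else, the arrays are shortened with entries chosen for the illustration, and 'sketch_child_allowed' is a hypothetical name.

```php
<?php
// Shortened, illustrative arrays; NOT the full lists from 'hl_bal'.
$cF = array('canvas' => 1, 'button' => 1, 'div' => 1); // parents taking block or inline children
$cI = array('a' => 1, 'span' => 1);                    // parents taking inline children only
$eB = array('address' => 1, 'div' => 1, 'p' => 1);     // block-type elements
$eI = array('canvas' => 1, '#pcdata' => 1, 'a' => 1, 'span' => 1); // inline-type elements

// Hypothetical, simplified check: may $child sit directly inside $parent?
function sketch_child_allowed($parent, $child, $cF, $cI, $eB, $eI)
{
    if (isset($cF[$parent])) {
        // Flow-content parent: block and inline children are both fine.
        return isset($eB[$child]) || isset($eI[$child]);
    }
    if (isset($cI[$parent])) {
        // Inline-content parent: inline children only.
        return isset($eI[$child]);
    }
    return false; // Parent not covered by this sketch.
}

$canvasInDiv = sketch_child_allowed('div', 'canvas', $cF, $cI, $eB, $eI);
$pInCanvas   = sketch_child_allowed('canvas', 'p', $cF, $cI, $eB, $eI);
$divInSpan   = sketch_child_allowed('span', 'div', $cF, $cI, $eB, $eI);
```

Under these assumptions, the nesting in the earlier HTML example ('div' containing 'canvas' containing 'p') passes, because 'canvas' counts as inline when it is a child and as a flow container when it is a parent.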