1 (edited by lriggle 2013-07-11 15:49:52)

Topic: Trying to understand the groups of elements

Hi There!

A while ago the organization I work for refactored htmLawed (keeping all the same functionality) to make it easier for us to read, understand, and use on our end. It plugs into our code tools quite well! (It's open source; you can find it here: http://github.com/LinkUp/htmlCleaner )

However, looking back on it, I noticed that while we were able to figure out what a lot of the variables in the code represent, we didn't understand what all of them were for. So...I was wondering if someone could help me flesh out my definitions a bit.

These are all in hl_bal()
$cB = Block level elements
$cE = Empty elements
$cF = Flow elements (not sure what flow is supposed to mean, but that's what the comment stated)
$cI = Inline elements
$cN = Illegal elements
$cN2 = Array keys for $cN
$cR = ????
$cS = Parent/Child relationships between tags
$cO = Other misc tags
$cT = Tags with omittable closing tags
$eB = ????
$eI = ????
$eN = ????
$eO = ????
$eF = Combination of eB and eI. Don't know what it means though.

These are all in hl_tag()
$eD = Deprecated tags
$eE = Tags with no close elements
$aN = Attributes specific to certain elements
$aNE = Empty attributes
$aNP = ????
$aNU = ????
$aNL = ????
$eAL = ????
$aND = Deprecated attributes
$eAD = Elements with those deprecated attributes


There are others, but these are the main ones that have been bothering me for a while.

Thanks for the help!

2

Re: Trying to understand the groups of elements

Thanks for writing about this. I should seriously consider putting out well-commented htmLawed code for use by developers like you. The following is some information on the various variables that you mention. I hope it is helpful. Some of the variables might appear redundant, but they are there for faster lookups, etc. Note that the information may change in future versions of htmLawed (1.2+).

The array variables in function hl_bal

(In order of appearance)

(1) $cB = Elements (tags) that require a 'block' element as an immediate child in their content. E.g., 'form' requires its immediate child element to be a block element like 'div' or 'table.'

(2) $cE = 'Empty' elements like 'img' and 'br' that do not take any other element as a child in their content.

(3) $cF = Elements that can have a 'flow' element in their content.

(4) $cI = Elements that can have an 'inline' element in their content.

(5) $cN = Elements that are not allowed inside certain other elements; the array is indexed by those containing elements. E.g., 'a' is not allowed within another 'a.'

(6) $cN2 = List of elements that are keys in the $cN array.

(7) $cR = List of elements that require a child element. E.g., 'table' is required to have elements like 'td' within it. Currently, this variable's value is actually not used for htmLawed's functionality; see this section of documentation.

(8) $cS = Elements that are used as immediate child elements of some specific elements. E.g., if 'tr' has a child element, it has to be 'td' or 'th.'

(9) $cO = Some other permitted parent-child element pairs. E.g., 'param' inside 'object.'

(10) $cT = Elements that need not have a closing tag. E.g., a new 'tr' can be started without closing the 'tr' before it (<tr><td></td><td></td><tr>...).

(11) $eB = 'Block' elements like 'div' and 'table.'

(12) $eI = 'Inline' elements like 'em' and 'strong.'

(13) $eN = List of elements that are values in the $cN array.

(14) $eO = List of elements missing in $eB and $eI.

(15) $eF = 'Flow' elements; i.e., list of both 'block' and 'inline' elements.
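
To show how such lookup arrays can fit together, here is a minimal sketch. It is not the actual htmLawed code: the element lists are heavily abbreviated, the helper is_child_allowed() is hypothetical, and the layout of $cN (containing element as key, disallowed elements as a nested array) is an assumption based on the description of $cN above.

```php
<?php
// Abbreviated, illustrative element lists; the real arrays are much longer.
$eB = array('div' => 1, 'table' => 1, 'form' => 1, 'p' => 1); // 'block' elements
$eI = array('a' => 1, 'em' => 1, 'strong' => 1, 'img' => 1);  // 'inline' elements
$eF = $eB + $eI; // 'flow' elements: the union of block and inline

// Assumed layout: containing element => elements not allowed inside it.
$cN = array('a' => array('a' => 1), 'form' => array('form' => 1));
$cN2 = array_keys($cN); // elements that are keys in $cN

// Collect every element that appears as a value in $cN (the role of $eN).
$eN = array();
foreach ($cN as $banned) {
    $eN += $banned;
}

// Hypothetical helper: may $child appear inside $parent in this toy setup?
function is_child_allowed($child, $parent, $cN, $eF)
{
    if (!isset($eF[$child])) {
        return false; // not a known flow element here
    }
    // isset() on a missing key is simply false, so unknown parents pass.
    return !isset($cN[$parent][$child]);
}

var_dump(is_child_allowed('em', 'a', $cN, $eF)); // bool(true)
var_dump(is_child_allowed('a', 'a', $cN, $eF));  // bool(false)
```

Keying the arrays by element name (with a dummy value of 1) lets each check be a single isset() call instead of a scan with in_array(), which is the kind of faster lookup that can make some of the variables look redundant.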


The array variables in function hl_tag

(In order of appearance)

(1) $eD = Elements (tags) like 'center' that are designated as deprecated in HTML 4 specs. Tag-content with such elements is passed on to function hl_tag2 if the config. parameter 'make_tag_strict' is enabled; see documentation for 'make_tag_strict'.

(2) $eE = 'Empty' elements like 'img' and 'br' that do not require a closing tag.

(3) $aN = Attributes (names) and the elements that they can be in. See this appendix in the documentation for a list that this array's value is based on.

(4) $aNE = 'Empty' attributes (names) that do not require a value; such as 'disabled' for the 'input' element.

(5) $aNP = Attributes (names) whose values have or can have URL protocols/schemes; e.g., the 'href' attribute can have a URL value with the 'http' protocol/scheme. The values are run through function hl_prot to 'disable' protocols that are not permitted as per config. parameter 'schemes'; see documentation for 'schemes'.

(6) $aNU = 'Universal' attributes (names) that are acceptable in any element except those indexed in this variable.

(7) $aNL = Values of attributes (not attribute names) that take one of a set of defined standard values when they are in certain elements (defined in $eAL). E.g., the 'type' attribute of the 'input' element can only be one of 'password,' 'checkbox,' 'radio,' etc. When the config. parameter 'lc_std_val' is enabled, these attribute values are lower-cased by htmLawed. See documentation for 'lc_std_val'.

(8) $eAL = See $aNL.

(9) $aND = 'Deprecated' attributes (names) that the HTML 4 specs. consider deprecated when they are present in the elements indexed in this variable. $eAD is a list of elements that can potentially contain a deprecated attribute. Deprecated attributes are transformed as per code in the hl_tag function if config. parameter 'no_deprecated_attr' is in effect; see documentation for 'no_deprecated_attr'.

(10) $eAD = See $aND.

(11) $eAR = 'Required' attributes (names) that are required to be present in certain elements. E.g., the 'alt' attribute in 'img'. See documentation on addition of required attributes and values.
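
As a rough illustration of how arrays like $aNE, $aNP, $aNL and $eAL could drive attribute handling, here is a small sketch. It is not the hl_tag code: the helper clean_attr(), the $schemes array, the abbreviated lists, the name="name" form for empty attributes, and the 'denied:' marker are all assumptions made for the example. It also matches standard values without looking at the attribute name, which is a simplification.

```php
<?php
// Illustrative data only; the real arrays cover many more names and values.
$aNE = array('checked' => 1, 'disabled' => 1);                 // attributes that need no value
$aNP = array('href' => 1, 'src' => 1);                         // values may carry a URL scheme
$aNL = array('checkbox' => 1, 'password' => 1, 'radio' => 1);  // standard attribute values
$eAL = array('input' => 1);                                    // elements using those values
$schemes = array('http' => 1, 'https' => 1);                   // hypothetical permitted schemes

// Hypothetical helper: normalize one attribute value of one element.
function clean_attr($element, $name, $value, $aNE, $aNP, $aNL, $eAL, $schemes)
{
    if (isset($aNE[$name])) {
        return $name; // assumed convention: empty attribute becomes name="name"
    }
    $lower = strtolower($value);
    if (isset($eAL[$element]) && isset($aNL[$lower])) {
        return $lower; // lower-case a standard value, in the spirit of 'lc_std_val'
    }
    if (isset($aNP[$name])
        && preg_match('`^([a-z][a-z0-9+.-]*):`i', $value, $m)
        && !isset($schemes[strtolower($m[1])])
    ) {
        return 'denied:' . $value; // mark a value whose scheme is not permitted
    }
    return $value;
}

echo clean_attr('input', 'type', 'CheckBox', $aNE, $aNP, $aNL, $eAL, $schemes), "\n"; // checkbox
echo clean_attr('input', 'disabled', '', $aNE, $aNP, $aNL, $eAL, $schemes), "\n";     // disabled
```

Relative URLs such as '/page' have no scheme, so in this sketch they pass through unchanged.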