5.2 Valid attribute-element combinations
Valid attribute-element combinations as per W3C specs:
* includes deprecated attributes (marked ^), and attributes for the non-standard embed element (marked *)
* only non-frameset, HTML body elements
* name for a and map, and lang are invalid in XHTML 1.1
* target is valid for a in XHTML 1.1 and higher
* xml:space is only for XHTML 1.1
abbr - td, th
accept - form, input
accept-charset - form
accesskey - a, area, button, input, label, legend, textarea
action - form
align - caption^, embed, applet, iframe, img^, input^, object^, legend^, table^, hr^, div^, h1^, h2^, h3^, h4^, h5^, h6^, p^, col, colgroup, tbody, td, tfoot, th, thead, tr
alt - applet, area, img, input
archive - applet, object
axis - td, th
bgcolor - embed, table^, tr^, td^, th^
border - table, img^, object^
cellpadding - table
cellspacing - table
char - col, colgroup, tbody, td, tfoot, th, thead, tr
charoff - col, colgroup, tbody, td, tfoot, th, thead, tr
charset - a, script
checked - input
cite - blockquote, q, del, ins
classid - object
clear - br^
code - applet
codebase - object, applet
codetype - object
color - font
cols - textarea
colspan - td, th
compact - dir, dl^, menu, ol^, ul^
coords - area, a
data - object
datetime - del, ins
declare - object
defer - script
dir - bdo
disabled - button, input, optgroup, option, select, textarea
enctype - form
face - font
for - label
frame - table
frameborder - iframe
headers - td, th
height - embed, iframe, td^, th^, img, object, applet
href - a, area
hreflang - a
hspace - applet, img^, object^
ismap - img, input
label - option, optgroup
language - script^
longdesc - img, iframe
marginheight - iframe
marginwidth - iframe
maxlength - input
method - form
model* - embed
multiple - select
name - button, embed, textarea, applet^, select, form^, iframe^, img^, a^, input, object, map^, param
nohref - area
noshade - hr^
nowrap - td^, th^
object - applet
onblur - a, area, button, input, label, select, textarea
onchange - input, select, textarea
onfocus - a, area, button, input, label, select, textarea
onreset - form
onselect - input, textarea
onsubmit - form
pluginspage* - embed
pluginurl* - embed
prompt - isindex
readonly - textarea, input
rel - a
rev - a
rows - textarea
rowspan - td, th
rules - table
scope - td, th
scrolling - iframe
selected - option
shape - area, a
size - hr^, font, input, select
span - col, colgroup
src - embed, script, input, iframe, img
standby - object
start - ol^
summary - table
tabindex - a, area, button, input, object, select, textarea
target - a^, area, form
type - a, embed, object, param, script, input, li^, ol^, ul^, button
usemap - img, input, object
valign - col, colgroup, tbody, td, tfoot, th, thead, tr
value - input, option, param, button, li^
valuetype - param
vspace - applet, img^, object^
width - embed, hr^, iframe, img, object, table, td^, th^, applet, col, colgroup, pre^
xml:space - pre, script, style
The following attributes are allowed in all elements except the ones listed:
class - param, script
dir - applet, bdo, br, iframe, param, script
id - script
lang - applet, br, iframe, param, script
onclick - applet, bdo, br, font, iframe, isindex, param, script
ondblclick - applet, bdo, br, font, iframe, isindex, param, script
onkeydown - applet, bdo, br, font, iframe, isindex, param, script
onkeypress - applet, bdo, br, font, iframe, isindex, param, script
onkeyup - applet, bdo, br, font, iframe, isindex, param, script
onmousedown - applet, bdo, br, font, iframe, isindex, param, script
onmousemove - applet, bdo, br, font, iframe, isindex, param, script
onmouseout - applet, bdo, br, font, iframe, isindex, param, script
onmouseover - applet, bdo, br, font, iframe, isindex, param, script
onmouseup - applet, bdo, br, font, iframe, isindex, param, script
style - param, script
title - param, script
xml:lang - applet, br, iframe, param, script
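The two lists above lend themselves to a simple lookup. The following is a minimal sketch only: the names $attrElements, $attrExcluded and is_valid_combo_sketch are hypothetical and not part of htmLawed, and only a few rows are copied from the lists.

```php
<?php
// Hypothetical lookup tables; only a few rows copied from the lists above.
// Attributes allowed only in the listed elements.
$attrElements = array(
    'abbr'    => array('td', 'th'),
    'colspan' => array('td', 'th'),
    'href'    => array('a', 'area'),
    'rows'    => array('textarea'),
);
// Attributes allowed in all elements except the listed ones.
$attrExcluded = array(
    'class' => array('param', 'script'),
    'id'    => array('script'),
    'title' => array('param', 'script'),
);

// Hypothetical helper: is $attr valid on $element according to the partial tables?
function is_valid_combo_sketch($attr, $element, array $attrElements, array $attrExcluded)
{
    if (isset($attrElements[$attr])) {
        return in_array($element, $attrElements[$attr], true);
    }
    if (isset($attrExcluded[$attr])) {
        return !in_array($element, $attrExcluded[$attr], true);
    }
    return false; // attribute not covered by this partial table
}

var_dump(is_valid_combo_sketch('href', 'a', $attrElements, $attrExcluded));       // bool(true)
var_dump(is_valid_combo_sketch('href', 'div', $attrElements, $attrExcluded));     // bool(false)
var_dump(is_valid_combo_sketch('class', 'script', $attrElements, $attrExcluded)); // bool(false)
```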
5.6 Brief on htmLawed code
Much of the code's logic and reasoning can be understood from the documentation above.
The output of htmLawed is a text string containing the processed input. There is no custom error tracking, etc.
Function arguments for htmLawed are:
* $in - 1st argument; a text string; the input text to be processed. Any extraneous slashes added by PHP when magic quotes are enabled should be removed beforehand using PHP's stripslashes function.
* $cf - 2nd argument; an associative array; optional. The array has keys with names like balance and keep_bad, and the values, which can be boolean, string, or array depending on the key, are read to set the corresponding configurable parameters. Any configurable parameter whose value is not specified by the user through $cf receives a default value. Finalized $cf is thus a filtered and possibly larger array.
* $spec - 3rd argument; a text string; optional. The string has rules, written in an htmLawed-designated format, specifying element-specific attribute and attribute value restrictions. Function hl_spec is used to convert the string to an associative array for internal use. Finalized $spec is thus an array.
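How a user-supplied $cf might become a "filtered and possibly larger" array can be sketched as below. This is an illustration under assumptions, not htmLawed's code: the function name finalize_cf_sketch is hypothetical, only the two keys named above are used, the default values are placeholders, and "filtered" is read here as "unknown keys are dropped".

```php
<?php
// Hypothetical sketch; htmLawed's real parameter set and defaults are larger.
function finalize_cf_sketch(array $cf)
{
    // Placeholder defaults for the two keys mentioned in the text.
    $defaults = array('balance' => 1, 'keep_bad' => 6);
    // Keep only keys that are known parameters ("filtered") ...
    $known = array_intersect_key($cf, $defaults);
    // ... and add defaults for anything the caller omitted ("possibly larger").
    return $known + $defaults;
}

print_r(finalize_cf_sketch(array('balance' => 0, 'bogus' => 'x')));
```

The array union operator keeps the caller's value for balance and takes keep_bad from the defaults, while the unknown key is discarded.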
Finalized $cf and $spec are made global variables while htmLawed is at work. Values of any pre-existing global variables with the same names are noted, and their values are restored after htmLawed finishes processing the input. Depending on $cf, another global variable, hl_Ids, may be set to track id attribute values for uniqueness. Unlike the other two variables, this one is not reset (or unset) post-processing.
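The note-and-restore handling of global variables can be illustrated with a small sketch. The function names and the upper key are hypothetical, chosen only to show the pattern; htmLawed's own implementation may differ.

```php
<?php
// Hypothetical sketch of the save/restore pattern; not htmLawed's code.
function process_with_globals_sketch($in, array $cf)
{
    // Note any pre-existing global variable of the same name.
    $had = array_key_exists('cf', $GLOBALS);
    $old = $had ? $GLOBALS['cf'] : null;

    $GLOBALS['cf'] = $cf;      // globalize for the helper functions
    $out = helper_sketch($in); // helpers read the global configuration

    // Restore (or remove) the global once processing is done.
    if ($had) {
        $GLOBALS['cf'] = $old;
    } else {
        unset($GLOBALS['cf']);
    }
    return $out;
}

function helper_sketch($in)
{
    global $cf;
    return empty($cf['upper']) ? $in : strtoupper($in);
}

$cf = 'caller value';
echo process_with_globals_sketch('abc', array('upper' => 1)), "\n"; // ABC
echo $cf, "\n"; // caller value
```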
Except for the main function htmLawed and the functions kses and kses_hook, htmLawed function names are name-spaced using the hl_ prefix. The functions and their roles are:
* hl_attrval - checking attribute values against $spec
* hl_bal - tag balancing
* hl_cmtcd - handling CDATA sections and HTML comments
* hl_ent - entity handling
* hl_prot - checking a URL scheme/protocol
* hl_regex - checking syntax of a regular expression
* hl_spec - converting user-supplied $spec value to one used by htmLawed internally
* hl_tag - handling tags
* hl_tag2 - transforming tags
* hl_version - reporting htmLawed version
* htmLawed - main function
* kses - main function of kses
* kses_hook - hook function of kses
The last two are for compatibility with pre-existing code that uses the kses script. htmLawed's kses() works out $cf and $spec from the argument values supplied to it and then passes the filtering task on to htmLawed(). kses_hook() is "blank"; it is meant to be filled with custom code by kses script users who were using such a hook.
htmLawed() finalizes $spec (with the help of hl_spec()) and $cf, and globalizes them. Finalization of $cf involves setting default values if an inappropriate or invalid one is supplied. This includes calling hl_regex() to check well-formedness of regular expression patterns if such expressions are user-supplied through $cf.
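A well-formedness check for a user-supplied pattern can be sketched with PHP's preg_match(), which returns false when a pattern fails to compile. The helper name below is hypothetical, and hl_regex() itself may go about the check differently.

```php
<?php
// Hypothetical helper; htmLawed's own hl_regex() may work differently.
function regex_is_well_formed_sketch($pattern)
{
    if (!is_string($pattern) || $pattern === '') {
        return false;
    }
    // preg_match() returns false when the pattern cannot be compiled;
    // @ suppresses the warning PHP would otherwise emit.
    return @preg_match($pattern, '') !== false;
}

var_dump(regex_is_well_formed_sketch('/^[a-z]+$/i')); // bool(true)
var_dump(regex_is_well_formed_sketch('/([a-z]+/'));   // bool(false)
```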
htmLawed() then removes invalid characters like nulls and x01 and appropriately handles entities using hl_ent(). HTML comments and CDATA sections are identified and treated as per $cf values with the help of hl_cmtcd(). When retained, < and > of their markups are replaced with control characters until the end to avoid their being mis-read as tag markup.
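The idea of temporarily hiding < and > inside retained comments can be sketched as follows. The helper names and the choice of \x01 and \x02 as placeholders are assumptions made for illustration; the sketch handles HTML comments only and is not htmLawed's hl_cmtcd().

```php
<?php
// Hypothetical helpers; the placeholder characters (\x01, \x02) are an assumption.
function mask_comments_sketch($text)
{
    return preg_replace_callback(
        '/<!--.*?-->/s',
        function ($m) {
            // Hide the angle brackets of the comment, including any inside it.
            return strtr($m[0], array('<' => "\x01", '>' => "\x02"));
        },
        $text
    );
}

function unmask_sketch($text)
{
    // Put the angle brackets back once tag processing is finished.
    return strtr($text, array("\x01" => '<', "\x02" => '>'));
}

$in = 'a <!-- <b>note</b> --> <i>z</i>';
$masked = mask_comments_sketch($in);
// Only the real tags (<i> and </i>) still contain "<" at this point.
echo preg_match_all('/</', $masked), "\n"; // 2
echo unmask_sketch($masked) === $in ? "round-trip ok\n" : "mismatch\n";
```

Such a scheme relies on the placeholder characters not occurring in the input, which fits with invalid control characters being removed in an earlier step.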
htmLawed() identifies tags using regex and processes them with the help of hl_tag(), a large function that analyzes tag content, filtering it as per HTML standards, $cf and $spec. Among other things, hl_tag() transforms deprecated elements using hl_tag2(), removes attributes from closing tags, checks attribute values as per $spec rules using hl_attrval(), and checks URL protocols using hl_prot().
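A URL scheme check of the kind mentioned above might look like the following. This is a deliberately simplified sketch: the function name and the allow-list are hypothetical, and it does not deal with obfuscated schemes (entity-encoded or whitespace-broken), which a real filter has to handle.

```php
<?php
// Hypothetical helper and allow-list; not htmLawed's hl_prot().
function scheme_allowed_sketch($url, array $allowed = array('http', 'https', 'mailto'))
{
    // A URL without a leading "scheme:" part is treated as relative and accepted.
    if (!preg_match('/^\s*([a-z][a-z0-9+.\-]*):/i', $url, $m)) {
        return true;
    }
    return in_array(strtolower($m[1]), $allowed, true);
}

var_dump(scheme_allowed_sketch('HTTPS://example.com/')); // bool(true)
var_dump(scheme_allowed_sketch('javascript:alert(1)'));  // bool(false)
var_dump(scheme_allowed_sketch('/path/page.html'));      // bool(true)
```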
htmLawed() performs tag balancing and nesting checks at the end with a call to hl_bal().
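Tag balancing in general can be illustrated with a stack of open elements. The sketch below is a toy under stated assumptions: balance_sketch is a hypothetical name, the list of empty elements is abbreviated, and no nesting rules are checked, so it shows the idea only and is far simpler than hl_bal().

```php
<?php
// Hypothetical toy balancer; htmLawed's hl_bal() also enforces nesting rules.
function balance_sketch($html)
{
    $void = array('br', 'hr', 'img', 'input'); // abbreviated list of empty elements
    $stack = array();
    $out = '';
    // Split into text and tag pieces, keeping the tags.
    $parts = preg_split('/(<\/?[a-z][a-z0-9]*[^>]*>)/i', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $part) {
        if (!preg_match('/^<(\/?)([a-z][a-z0-9]*)[^>]*>$/i', $part, $m)) {
            $out .= $part; // plain text
            continue;
        }
        $name = strtolower($m[2]);
        if ($m[1] === '') { // opening tag
            if (!in_array($name, $void, true)) {
                $stack[] = $name;
            }
            $out .= $part;
        } elseif (in_array($name, $stack, true)) {
            // Close any elements left open inside the one being closed.
            do {
                $open = array_pop($stack);
                $out .= "</$open>";
            } while ($open !== $name);
        }
        // A closing tag with no matching open element is dropped.
    }
    // Close whatever is still open at the end of the input.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

echo balance_sketch('<p><b>bold</i> text'), "\n"; // <p><b>bold text</b></p>
```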
htmLawed 1.0.3, 3 Mar. 2008
Copyright Santosh Patnaik
GPLv3 license
A PHP Labware internal utility - http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed