1

Topic: Spec and Elements to Restrict Tags & Attribute Values

Greetings Dr Patnaik and other forum users,

I would like to use htmLawed in its default safe mode, prohibit the style attribute, and allow iframe and object tags only when specific attributes match given regular expressions.

I've cooked up the following lines (with help from http://www.bioinformatics.org/phplabware/forum/viewtopic.php?id=186 ). I have had success using the following for iframe:

$spec = 'iframe=-*,height,width,type,src(match="`^https?://(www\.)?((youtube)|(dailymotion)|(vimeo))\.com/`i")';
$result = htmLawed($text, array('safe'=>1, 'elements'=>'*+iframe', 'deny_attribute'=>'style'), $spec);

This is where I have run into trouble. The following does not sanitize the object tag:

$spec = 'iframe=-*,height,width,type,src(match="`^https?://(www\.)?((youtube)|(dailymotion)|(vimeo))\.com/`i"); object=-*,height,width,type,data(match="`^https?://(www\.)?((youtube)|(dailymotion)|(vimeo))\.com/`i")';
$result = htmLawed($text, array('safe'=>1, 'elements'=>'*+iframe+object', 'deny_attribute'=>'style'), $spec);

Here is sample data that I am looking to sanitize:

<div class="youtube" style="width: 350; height: 300;"><object width="350" height="300" data="http://www.youtube.com/v/bCtkqfY0S-Y" type="application/x-shockwave-flash"><param name="wmode" value="transparent" /><param name="src" value="http://www.youtube.com/v/bCtkqfY0S-Y" /></object></div>

<p><iframe src="http://www.youtube.com/embed/bCtkqfY0S-Y" frameborder="0" width="425" height="350"></iframe></p>

Kind regards and appreciation for your assistance,

Carson

2

Re: Spec and Elements to Restrict Tags & Attribute Values

Sorry for the delay in responding to your post.

I tried your suggested 'spec' setting and input example on the htmLawed test page. It seems that the 'data' attribute of 'object' does get filtered as 'expected'. A few notes on this:

(1) With htmLawed's default value for the 'elements' config. setting, 'object' and 'iframe' are not filtered out. Thus, in your case, 'elements' need not be specified as 'elements'=>'*+iframe+object'.

(2) The rules specified in 'spec' operate at the attribute level. Thus, with your regular expression pattern, if the value of the 'data' attribute of an 'object' fails the match, only 'data' and its value will be removed. The rest of the 'object' element will remain.
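
To make note (2) concrete, here is a minimal sketch in plain PHP. It is not htmLawed's actual code: the helper filter_attributes() is hypothetical, and the example.com URL is made up for illustration. Only the pattern is taken from the 'spec' above. It shows the assumed behavior that a failed match drops just the offending attribute and leaves the others alone.

```php
<?php
// Hypothetical helper (NOT part of htmLawed): mimics an attribute-level
// rule by dropping only those attributes whose values fail their pattern.
function filter_attributes(array $attributes, array $rules): array
{
    $kept = array();
    foreach ($attributes as $name => $value) {
        // In this sketch, attributes without a rule are kept unchanged.
        if (!isset($rules[$name]) || preg_match($rules[$name], $value) === 1) {
            $kept[$name] = $value;
        }
    }
    return $kept;
}

// Same pattern as in the 'spec' above, with backticks as regex delimiters.
$rules = array(
    'data' => '`^https?://(www\.)?((youtube)|(dailymotion)|(vimeo))\.com/`i',
);

// 'data' passes the pattern, so every attribute survives.
$good = filter_attributes(
    array('width' => '350', 'height' => '300', 'data' => 'http://www.youtube.com/v/bCtkqfY0S-Y'),
    $rules
);

// 'data' fails the pattern (made-up host), so only 'data' is dropped.
$bad = filter_attributes(
    array('width' => '350', 'height' => '300', 'data' => 'http://example.com/v/x'),
    $rules
);

print_r(array_keys($good)); // keys: width, height, data
print_r(array_keys($bad));  // keys: width, height
```

If the whole 'object' element should disappear when 'data' is rejected, that would need a separate step after filtering, since the attribute-level rule by itself leaves the element in place.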