Topic: !DOCTYPE being stripped

Hello,

I'm using htmLawed to clean up HTML email documents that are loaded into an iframe in a webmail application. I want to strip out many dangerous tags and attributes, but I do not want the DOCTYPE declaration to be removed. However, I can't figure out what 'elements' string to put in the config array to prevent this. Here's my config array as it stands now:

$aConfig = array(
    'balance' => 0, // don't worry about balancing tags
    'clean_ms_char' => 1, // clean up funky windows-1252 chars that the browser gets annoyed with
    'elements' => '*+!DOCTYPE+html+head+title+img+style', // allow these tags in addition to those already allowed by htmLawed
    'schemes' => 'src: cid' // allow cid in img src attribute so we can match up inline images
);

Anyone have any recommendations?  Thanks!

-Bryan

Re: !DOCTYPE being stripped

htmLawed does not handle markup that lies outside the HTML BODY (HEAD, TITLE, etc.). Perhaps you can locate the BODY with a PHP substring or regular-expression function and pass just that part to htmLawed.
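A rough sketch of that approach, in case it helps. The function name sanitizeEmailHtml and the "rebuild a minimal document" strategy are my own assumptions, not part of htmLawed. It saves the DOCTYPE verbatim, pulls out the inner BODY markup with a regular expression, runs only that through a sanitizer callback, and then reassembles. HEAD content is deliberately dropped rather than passed through unfiltered, because anything kept from HEAD (a script element, for example) would bypass the sanitizer. The regexes assume ordinary mail HTML with one DOCTYPE and one body element; this is not a full HTML parser.

```php
<?php
// Hypothetical helper (not an htmLawed API): sanitize only the BODY of an
// HTML email while keeping its DOCTYPE declaration unchanged.
// Assumes one DOCTYPE and a single <body>...</body> pair; regex-based.
function sanitizeEmailHtml(string $html, callable $sanitize): string
{
    // Keep the DOCTYPE declaration verbatim, if the message has one.
    $doctype = '';
    if (preg_match('~<!DOCTYPE[^>]*>~i', $html, $m)) {
        $doctype = $m[0];
    }

    // Take the inner BODY markup; with no BODY tags, use the whole input.
    // Flags: i = case-insensitive, s = '.' also matches newlines.
    $body = $html;
    if (preg_match('~<body\b[^>]*>(.*)</body\s*>~is', $html, $m)) {
        $body = $m[1];
    }

    // Only the BODY content goes through the sanitizer (e.g. htmLawed()).
    // HEAD content is dropped instead of being passed through unfiltered.
    $prefix = ($doctype !== '') ? $doctype . "\n" : '';
    return $prefix . '<html><head></head><body>' . $sanitize($body) . '</body></html>';
}
```

With htmLawed.php included, the callback could simply be function ($b) use ($aConfig) { return htmLawed($b, $aConfig); }, in which case the '!DOCTYPE', 'html', 'head' and 'title' entries would no longer be needed in the 'elements' string.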

Re: !DOCTYPE being stripped

Fair enough. Thanks for the response, and the awesome tool!