1

Topic: Apply 'keep_bad' setting to tags that do not match regex for HTML tags

--- trunk/phpgwapi/inc/htmLawed/htmLawed.php    2012/07/27 09:59:49    39957
+++ trunk/phpgwapi/inc/htmLawed/htmLawed.php    2012/07/27 10:17:06    39958
@@ -412,7 +412,9 @@
 if($t == '< '){return '&lt; ';}
 if($t == '>'){return '&gt;';}
 if(!preg_match('`^<(/?)([a-zA-Z][a-zA-Z1-6]*)([^>]*?)\s?>$`m', $t, $m)){
- return str_replace(array('<', '>'), array('&lt;', '&gt;'), $t);
+ return (($C['keep_bad']%2) ? str_replace(array('<', '>'), array('&lt;', '&gt;'), $t) : '');
 }elseif(!isset($C['elements'][($e = strtolower($m[2]))])){
  return (($C['keep_bad']%2) ? str_replace(array('<', '>'), array('&lt;', '&gt;'), $t) : '');
 }

That way, one can use the 'keep_bad' setting to decide whether to keep (entity-escaped) or drop tag-like constructs that are not valid HTML tags, such as

<![if !vml]> some stuff <![endif]>
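
To illustrate the effect of the patch on the example above, here is a minimal stand-alone sketch. The function name `neutralize_or_drop` is hypothetical and not part of htmLawed; it only mirrors the patched branch, assuming (as the diff shows) that an odd 'keep_bad' value escapes the string and an even value drops it.

```php
<?php
// Hypothetical helper, not part of htmLawed: mirrors only the patched branch.
function neutralize_or_drop($t, $keepBad)
{
    // Same pattern as in the diff: matches well-formed HTML-style tags.
    $tagPattern = '`^<(/?)([a-zA-Z][a-zA-Z1-6]*)([^>]*?)\s?>$`m';

    if (!preg_match($tagPattern, $t)) {
        // Not a regular tag: odd keep_bad escapes it, even keep_bad removes it.
        return ($keepBad % 2)
            ? str_replace(array('<', '>'), array('&lt;', '&gt;'), $t)
            : '';
    }
    return $t; // Regular tags: further processing omitted in this sketch.
}

echo neutralize_or_drop('<![if !vml]>', 1), "\n"; // &lt;![if !vml]&gt;
var_dump(neutralize_or_drop('<![if !vml]>', 2));  // string(0) ""
```

Before the patch, both calls would have returned the escaped string, regardless of 'keep_bad'.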

2

Re: Apply 'keep_bad' setting to tags that do not match regex for HTML tags

An earlier topic was posted about this. I need to think about whether this modification is the best approach... see this post.

3

Re: Apply 'keep_bad' setting to tags that do not match regex for HTML tags

The modification suggested in the first posting can be simplified to

--if(!preg_match('`^<(/?)([a-zA-Z][a-zA-Z1-6]*)([^>]*?)\s?>$`m', $t, $m)){
-- return str_replace(array('<', '>'), array('&lt;', '&gt;'), $t);
--}elseif(!isset($C['elements'][($e = strtolower($m[2]))])){
++if(!preg_match('`^<(/?)([a-zA-Z][a-zA-Z1-6]*)([^>]*?)\s?>$`m', $t, $m) or !isset($C['elements'][($e = strtolower($m[2]))])){
 return (($C['keep_bad']%2) ? str_replace(array('<', '>'), array('&lt;', '&gt;'), $t) : '');
}

However, this raises issues with certain kinds of input (see this post). A better approach might be to add a second 'preg_match' that looks for non-HTML 'tags': strings that fail the first 'preg_match', as expected, but that the input writer is still using as 'tags'.

if(!preg_match('`^<(/?)([a-zA-Z][a-zA-Z1-6]*)([^>]*?)\s?>$`m', $t, $m)){
-- return str_replace(array('<', '>'), array('&lt;', '&gt;'), $t);
++ if(preg_match('`^<[a-zA-Z?/!\[][^>]*?[a-zA-Z"\'/\]]>$`m', $t)){return (($C['keep_bad']%2) ? str_replace(array('<', '>'), array('&lt;', '&gt;'), $t) : '');}
++ else{return str_replace(array('<', '>'), array('&lt;', '&gt;'), $t);}
}elseif(!isset($C['elements'][($e = strtolower($m[2]))])){
 return (($C['keep_bad']%2) ? str_replace(array('<', '>'), array('&lt;', '&gt;'), $t) : '');
}
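
The two-stage check above can be tried in isolation with a sketch like the following. The names `classify_token` and `escape_brackets` and the `$elements` array are hypothetical stand-ins (for the surrounding function and for `$C['elements']`); only the two patterns and the 'keep_bad' logic are taken from the snippet.

```php
<?php
// Hypothetical helper: entity-escape angle brackets, as the snippet does inline.
function escape_brackets($s)
{
    return str_replace(array('<', '>'), array('&lt;', '&gt;'), $s);
}

// Hypothetical stand-alone version of the branch; $elements stands in for $C['elements'].
function classify_token($t, $keepBad, array $elements)
{
    $htmlTag = '`^<(/?)([a-zA-Z][a-zA-Z1-6]*)([^>]*?)\s?>$`m';
    $tagLike = '`^<[a-zA-Z?/!\[][^>]*?[a-zA-Z"\'/\]]>$`m';

    if (!preg_match($htmlTag, $t, $m)) {
        if (preg_match($tagLike, $t)) {
            // Looks like a non-HTML 'tag': let keep_bad decide.
            return ($keepBad % 2) ? escape_brackets($t) : '';
        }
        // Does not look like a tag at all: always escape, never drop.
        return escape_brackets($t);
    }
    if (!isset($elements[strtolower($m[2])])) {
        // Well-formed tag, but not an allowed element.
        return ($keepBad % 2) ? escape_brackets($t) : '';
    }
    return $t; // Allowed element: further processing omitted in this sketch.
}

$elements = array('b' => 1);
var_dump(classify_token('<![endif]>', 2, $elements)); // string(0) ""
echo classify_token('<![endif]>', 1, $elements), "\n"; // &lt;![endif]&gt;
echo classify_token('<3>', 2, $elements), "\n";        // &lt;3&gt;
```

The point of the second pattern is visible in the last call: `<3>` fails both patterns, so it is escaped rather than removed even when 'keep_bad' is even.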