1

Topic: Bug: img with data tags are ill-escaped

I am not sure if this is the right place to report a bug and a fix, but I thought I would at least try.

When an img tag contains a data URI (a base64-encoded PNG or similar), it can happen that the regular expression

if(!preg_match('`^<(/?)([a-zA-Z][a-zA-Z1-6]*)([^>]*?)\s?>$`m', $t, $m)){

does not match, so the whole tag falls through to str_replace(), which replaces < and > with '&lt;' and '&gt;' and produces

&lt;img src=".."&gt;

This breaks the document. We patched the code so that a tag containing data:image is left alone; see the patch against 1.2.4.2 below.


@@ -425,7 +425,11 @@
   if($t == '< '){return '&lt; ';}
   if($t == '>'){return '&gt;';}
   if(!preg_match('`^<(/?)([a-zA-Z][a-zA-Z1-6]*)([^>]*?)\s?>$`m', $t, $m)){
-    return str_replace(array('<', '>'), array('&lt;', '&gt;'), $t);
+    if(strstr($t, 'data:image')){
+      return $t;
+    }else{
+      return str_replace(array('<', '>'), array('&lt;', '&gt;'), $t);
+    }
   }elseif(!isset($C['elements'][($e = strtolower($m[2]))])){
     return (($C['keep_bad']%2) ? str_replace(array('<', '>'), array('&lt;', '&gt;'), $t) : '');
   }


As far as we could test, it does not break current behavior.
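To make the effect of the patch easier to follow, here is a minimal standalone sketch of the patched branch (PHP 7 or later). check_tag_sketch() is a hypothetical name. In the original code this branch sits inside a larger tag-handling function that goes on to check the element and its attributes after a successful match. The sketch also separates a genuine non-match (preg_match() returns 0) from a PCRE error (preg_match() returns false), because both take the !preg_match(...) branch. Whether large data URIs actually hit the error case is an assumption that preg_last_error() can confirm or rule out.

```php
<?php
// Minimal sketch (PHP 7+) of the patched branch from the diff above.
// check_tag_sketch() is a hypothetical name; the real code continues with
// element and attribute checks after a successful match.

function check_tag_sketch(string $t): string
{
    $pattern = '`^<(/?)([a-zA-Z][a-zA-Z1-6]*)([^>]*?)\s?>$`m';
    $r = preg_match($pattern, $t, $m);

    if ($r === false) {
        // preg_match() returns false (not 0) when PCRE aborts, for example
        // on a backtrack-limit error; log the code to tell the cases apart.
        error_log('preg_match error ' . preg_last_error() . ' on tag of length ' . strlen($t));
    }

    if (!$r) {
        // The patch: leave anything mentioning data:image untouched.
        if (strpos($t, 'data:image') !== false) {
            return $t;
        }
        // Unchanged behaviour: escape the angle brackets of a non-tag.
        return str_replace(array('<', '>'), array('&lt;', '&gt;'), $t);
    }

    // Well-formed tag; element/attribute checks would follow here.
    return $t;
}

echo check_tag_sketch('<3 hearts>'), "\n";
```

strpos() !== false stands in for the patch's strstr(); both only test whether the substring is present. As in the patch, any text that fails the regex but contains data:image is returned unescaped, not only well-formed img tags.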

2

Re: Bug: img with data tags are ill-escaped

I apologize for the delay in responding; it was due to a personal issue. I will look into your suggestion in detail soon and post my comments.

3

Re: Bug: img with data tags are ill-escaped

Hi, can you provide an example to illustrate the issue that you refer to? My understanding is that '<' and '>' characters are not permitted within data URIs, and the issue should not arise because these characters will not be present inside <img src="data:image,..." />.
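The point about data URIs can be checked quickly. This sketch (PHP 7 or later; the helper names are hypothetical) confirms two things. First, base64_encode() output contains no '<' or '>'. Second, a small <img> tag with a base64 data URI matches the tag regex from the first post. It does not exercise very large images. If the original report involved those, running tag_regex_result() on the actual failing tag would show whether preg_match() returns an error rather than 0.

```php
<?php
// Sketch: base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=' padding,
// so a base64 data URI cannot contain '<' or '>'. Helper names are hypothetical.

function base64_has_angle_brackets(string $bytes): bool
{
    // strpbrk() returns false when none of the listed characters occur.
    return strpbrk(base64_encode($bytes), '<>') !== false;
}

function tag_regex_result(string $tag): string
{
    $r = preg_match('`^<(/?)([a-zA-Z][a-zA-Z1-6]*)([^>]*?)\s?>$`m', $tag, $m);
    if ($r === 1) {
        return 'match';
    }
    if ($r === 0) {
        return 'no match';
    }
    // preg_match() returned false: report the PCRE error code.
    return 'error ' . preg_last_error();
}

// All 256 byte values as input to the encoder.
$allBytes = '';
for ($i = 0; $i < 256; $i++) {
    $allBytes .= chr($i);
}

$small = '<img src="data:image/png;base64,' . base64_encode($allBytes) . '" />';

var_dump(base64_has_angle_brackets($allBytes));
echo tag_regex_result($small), "\n";
```

If a real tag reports an error code, comparing it with PREG_BACKTRACK_LIMIT_ERROR or PREG_JIT_STACKLIMIT_ERROR would show which PCRE limit was reached.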