Topic: Keeping nested UL untouched (revisited)
An earlier topic discusses nested UL tags: HTMLawed is very strict about standards compliance, and therefore strips out the nested <ul> and </ul> in examples like this:
<ul>
<li>first element</li>
<li>second element</li>
<ul>
<li>first sub-element</li>
</ul>
<li>third element</li>
</ul>
In the other topic, the suggested solution is to use a regular expression to add an <li> tag. This would produce:
<ul>
<li>first element</li>
<li>second element</li><li>
<ul>
<li>first sub-element</li>
</ul>
<li>third element</li>
</ul>
This is incorrect. I don't know about other browsers, but Firefox adds an extra dot to the rendered list when it sees this empty <li> tag. The more correct solution is:
<ul>
<li>first element</li>
<li>second element
<ul>
<li>first sub-element</li>
</ul></li>
<li>third element</li>
</ul>
This is actually very hard to achieve with a regular expression, since it means recursively parsing the content of the <ul> tag to figure out where it ends. So, we can settle for this:
<ul>
<li>first element</li>
<li>second element
<ul>
<li>first sub-element</li>
</ul>
<li>third element</li>
</ul>
Here the </li> before the nested list is removed completely. Here's the revised regexp:
$s = preg_replace('`</li>([^<]*)(<[uo]l\b)`i', '$1$2', $s);
Of course, in an ideal world, there would be a configuration option to add the appropriate tags to the $cS array. We have tens of thousands of existing pages in a CMS that get parsed by HTMLawed, and it seems excessive to have to filter them all through the above regular expression first.
I hesitate to modify the source to add them to $cS, because then I have to be sure to make the same change if we ever upgrade HTMLawed. That is easy to forget at some point in the far future, especially with a group of four programmers all working on the same project.
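For what it's worth, here is a minimal sketch of how the regexp workaround could be kept in one place without touching the library source. The function names keep_nested_lists() and filter_page() are made up for this post and are not part of HTMLawed. The pattern only matches <ul and <ol (the class [uo] followed by l), so tags such as <u> or <object> are left alone. The wrapper takes the real filter as a callable, so the sketch runs on its own.

```php
<?php
// Hypothetical helper (not part of HTMLawed): drop the </li> that
// directly precedes a nested <ul> or <ol>, so the nested list ends up
// inside the preceding list item instead of directly inside the outer list.
function keep_nested_lists(string $html): string
{
    // Match </li>, then any non-tag text (usually just whitespace),
    // then the start of a <ul or <ol tag. Keep everything except the </li>.
    // Fall back to the unmodified input if preg_replace reports an error.
    return preg_replace('`</li>([^<]*)(<[uo]l\b)`i', '$1$2', $html) ?? $html;
}

// Hypothetical wrapper: apply the pre-filter, then hand the result to
// whatever filter is passed in. $filter is any callable that takes a
// string and returns a string.
function filter_page(string $html, callable $filter): string
{
    return $filter(keep_nested_lists($html));
}

// The sample list from the top of this post.
$in = "<ul>\n"
    . "<li>first element</li>\n"
    . "<li>second element</li>\n"
    . "<ul>\n"
    . "<li>first sub-element</li>\n"
    . "</ul>\n"
    . "<li>third element</li>\n"
    . "</ul>";

// Prints the list with the </li> after "second element" removed.
echo keep_nested_lists($in), "\n";
```

With the library loaded, the call site would presumably look something like filter_page($s, 'htmLawed'), assuming the library's main function accepts the text as its only required argument. That way the pages in the CMS stay as they are and the fix is applied on the fly, at the cost of one extra preg_replace per page.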