1

Topic: Keeping nested UL untouched (revisited)

An earlier topic discusses nested UL tags: htmLawed is very strict about standards compliance, and so it strips out the nested <UL> and </UL> in examples like this:

<ul>
  <li>first element</li>
  <li>second element</li>
  <ul>
    <li>first sub-element</li>
  </ul>
  <li>third element</li>
</ul>

In the other topic, the suggested solution is to use a regular expression to add an <li> tag. This would produce:

<ul>
  <li>first element</li>
  <li>second element</li><li>
  <ul>
    <li>first sub-element</li>
  </ul>
  <li>third element</li>
</ul>

which is incorrect. I don't know about other browsers, but Firefox adds an extra bullet to the rendered list when it sees this empty <li> tag. The more correct solution is:

<ul>
  <li>first element</li>
  <li>second element
  <ul>
    <li>first sub-element</li>
  </ul></li>
  <li>third element</li>
</ul>

This is very hard to achieve with a regular expression, since it means recursively parsing the content of the nested <ul> tag to figure out where it ends. So, we can settle for this:

<ul>
  <li>first element</li>
  <li>second element
  <ul>
    <li>first sub-element</li>
  </ul>
  <li>third element</li>
</ul>

where the </li> is removed completely. Here's the revised regexp:

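// [uo]l matches only <ul and <ol; the class [u|o] would also match a literal '|' and tags such as <u> or <object>
$s = preg_replace('`</li>([^<]*)(<[uo]l\b)`i', '$1$2', $s);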
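
To check what the replacement does, here is a minimal sketch that runs it on the example list from the top of this post. It assumes PHP 7 or later. The function name is hypothetical, and the tag match is written as `<[uo]l\b` so that only list tags are affected (my own tightening, not something htmLawed requires).

```php
<?php
// Hypothetical helper: drop a </li> that is followed only by text or
// whitespace and then the opening tag of a nested <ul> or <ol>.
function unclose_li_before_nested_list($html)
{
    // Fall back to the untouched input if preg_replace reports an error (null)
    return preg_replace('`</li>([^<]*)(<[uo]l\b)`i', '$1$2', $html) ?? $html;
}

$input = "<ul>\n"
    . "  <li>first element</li>\n"
    . "  <li>second element</li>\n"
    . "  <ul>\n"
    . "    <li>first sub-element</li>\n"
    . "  </ul>\n"
    . "  <li>third element</li>\n"
    . "</ul>";

$output = unclose_li_before_nested_list($input);
echo $output, "\n";
```

Only the </li> after 'second element' is removed, so the nested list ends up inside that item; the missing </li> after the nested </ul> still has to be balanced afterwards, for example by htmLawed.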

Of course, in an ideal world, there would be a configuration option to add the appropriate tags to the $cS array. We have tens of thousands of existing pages in a CMS that are getting parsed by htmLawed, and it seems excessive to have to filter them all through the above regular expression first.

I hesitate to modify the source to add them to $cS, because then I have to be sure to make the same change whenever we upgrade htmLawed. That is easy to forget at some date in the far future, when a group of four programmers is working on the same project.

2

Re: Keeping nested UL untouched (revisited)

Gribnif wrote:

In the other topic, the suggested solution is to use a regular expression to add an <li> tag. This would produce:

<ul>
  <li>first element</li>
  <li>second element</li><li>
  <ul>
    <li>first sub-element</li>
  </ul>
  <li>third element</li>
</ul>

which is incorrect.

The result of the regular expression-based replacement that adds the '<li>' after the 'second element' item is 'incorrect' only until it is passed through htmLawed, which adds a '</li>' before the 'third element' item to balance the '<li>'. You are right that Firefox does render the new '<li>' as an 'empty' bullet. I am not sure if this is a Firefox bug (e.g., see https://bugzilla.mozilla.org/show_bug.cgi?id=383289). The regular expression code you suggest is clearly better. Processing input with it before handing it to htmLawed results in output that renders properly and is standards-compliant.

Let me look into the possibility of having a configurable option to allow 'ul' and 'ol' as direct descendants of 'ul' and 'ol' without a need for an intermediary 'li'.

3

Re: Keeping nested UL untouched (revisited)

The latest version of htmLawed, 1.1.10 (22 Oct. 2011 release), now provides an option to allow a list to be a direct descendant of another. I.e., 'ul' or 'ol' can go directly within another 'ul' or 'ol' (this is not standards-compliant, though browsers don't seem to care). To enable the option, use the $config parameter 'direct_list_nest'.
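
A minimal sketch of enabling the option, assuming htmLawed 1.1.10 or later has already been loaded (the include path depends on your installation, so it is left out). The sample markup is the list from the first post. The function_exists() guard is only there so the snippet does nothing if the library is missing.

```php
<?php
// Allow 'ul'/'ol' directly inside 'ul'/'ol' without an intermediary 'li'
$config = array('direct_list_nest' => 1);

$html = "<ul>\n"
    . "  <li>second element</li>\n"
    . "  <ul>\n"
    . "    <li>first sub-element</li>\n"
    . "  </ul>\n"
    . "  <li>third element</li>\n"
    . "</ul>";

// Assumption: htmLawed.php was included earlier in the script
if (function_exists('htmLawed')) {
    echo htmLawed($html, $config);
}
```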