1

Topic: Making 'span' like 'div'

In the following example, htmLawed drops all h1, h2, ul, and li elements for no good reason that I can think of. I tried both the current version and the version from 16 July 2009.

My settings are:

$ntrd = htmLawed($ntrd, array('safe'=>1,'balance'=>1,'anti_link_spam'=>array('/./','')));

<span>
<h1 style="margin: 0px 0px 7px; padding: 0px; border-width: 0px; font-size: 23px; vertical-align: baseline; background-color: transparent; font-family: 'Trebuchet MS','Liberation Sans','DejaVu Sans',sans-serif; font-weight: bold;">
Languages</h1>
<h2 style="margin: 0px 0px 7px; padding: 0px; border-width: 0px; font-size: 19px; vertical-align: baseline; background-color: transparent; font-family: 'Trebuchet MS','Liberation Sans','DejaVu Sans',sans-serif; font-weight: bold;">
C</h2>
<ul style="margin: 0px 0px 1em 30px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent; list-style-type: disc;">
<li style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent;">
<a href="http://publications.gbdirect.co.uk/c_book/" rel="nofollow" style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent; color: rgb(0, 119, 204); text-decoration: none; cursor: pointer;">
The C book</a>
</li>
<li style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent;">
<a href="http://www.knosof.co.uk/cbook/cbook.html" rel="nofollow" style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent; color: rgb(0, 119, 204); text-decoration: none; cursor: pointer;">
The new C standard - an annotated reference</a>
</li>
</ul>
<h2 style="margin: 0px 0px 7px; padding: 0px; border-width: 0px; font-size: 19px; vertical-align: baseline; background-color: transparent; font-family: 'Trebuchet MS','Liberation Sans','DejaVu Sans',sans-serif; font-weight: bold;">
C++</h2>
<ul style="margin: 0px 0px 1em 30px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent; list-style-type: disc;">
<li style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent;">
<a href="http://www.mindviewinc.com/downloads/TICPP-2nd-ed-Vol-one.zip" rel="nofollow" style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent; color: rgb(0, 119, 204); text-decoration: none; cursor: pointer;">
Thinking in C++</a>
</li>
<li style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent;">
<a href="http://cppannotations.sourceforge.net/" rel="nofollow" style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent; color: rgb(0, 119, 204); text-decoration: none; cursor: pointer;">
C++ Annotations</a>
</li>
<li style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent;">
<a href="http://openbookproject.net/thinkcs/cpp.php" rel="nofollow" style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent; color: rgb(0, 119, 204); text-decoration: none; cursor: pointer;">
How to Think Like a Computer Scientist</a>
</li>
<li style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent;">
<a href="http://www.agner.org/optimize/" rel="nofollow" style="margin: 0px; padding: 0px; border-width: 0px; font-size: 14px; vertical-align: baseline; background-color: transparent; color: rgb(0, 119, 204); text-decoration: none; cursor: pointer;">
Software optimization resources by Agner Fog</a>
</li>
</ul>
</span>

2

Re: Making 'span' like 'div'

Can you re-check? I tried your input on the htmLawed test-page, and did not find any issue.

3

Re: Making 'span' like 'div'

Patnaik, thanks a lot for your prompt reply, and for your excellent and generous work on htmLawed.

Sorry, I did not paste all the HTML. If you include the previous example inside a span element, you will lose all h1, h2, ul, and li elements. I have checked this on the htmLawed test-page.

I guess htmLawed does so because some standard says something like block elements cannot be inside spans. But the fact is that the web is full of HTML like this, and 99% of browsers render such examples perfectly fine.

How can I modify the htmLawed config or source code so that it does not drop those elements when they are inside a span?

Thanks again.

4

Re: Making 'span' like 'div'

To let htmLawed treat 'span' the unusual way that you want it to, either turn off the tag-balancing setting ('balance'=>0), or edit function hl_bal so that 'span'=>1 is in the $cF and $eB arrays and not in the $cI or $eI arrays.

ps. I edited the topic of your original post, and put the HTML sample code within 'span'.

5

Re: Making 'span' like 'div'

Thank you very much for your reply.

The problem is not just about span. How about all the other inline elements?

In this case it seems that many tags are being removed, changing the way the HTML is rendered, even though the removed tags do not represent any security risk.

I think it would be good if, by default, htmLawed only removed tags when they represent a security risk or when they break rendering in one of the main browsers (IE, Firefox or Chrome). Otherwise a large percentage of the HTML code out there will stop rendering the way it is meant to after going through htmLawed. Lots of people put divs inside spans, b, and other inline elements. This is not correct, but it is not a security risk (AFAIK).

6

Re: Making 'span' like 'div'

I understand your point. Unfortunately, to accommodate such flexibility, htmLawed would have to be made even more configurable, which can slow it down and/or make it more confusing to use.

Short of altering the code as described for 'span' in my previous post, the best way I can suggest is to turn off tag-balancing ('balance'=>0 in the config array). That will not introduce any security risk; only the standards-compliance check/filtering of tag nesting will be turned off (basic tag-balance will remain; i.e., there won't be any unclosed tag, etc., after filtering).
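
For reference, here is a minimal sketch of the call from the first post with tag-balancing switched off. The include path and the shortened sample input are assumptions for illustration; the other settings are copied unchanged from the original call.

```php
<?php
// Assumption: htmLawed.php sits next to this script; adjust the path as needed.
require_once 'htmLawed.php';

// Shortened stand-in for the sample HTML in the first post (illustrative only).
$ntrd = '<span><h1>Languages</h1><ul><li>The C book</li></ul></span>';

$config = array(
    'safe'           => 1,                  // unchanged from the original call
    'balance'        => 0,                  // turn off the tag-balancing setting
    'anti_link_spam' => array('/./', ''),   // unchanged from the original call
);

$ntrd = htmLawed($ntrd, $config);
```

Only the 'balance' value differs from the settings in the first post, so this is an easy change to try before resorting to editing hl_bal.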