1

Topic: Problem regarding 'balance' together with 'make_tag_strict'

Hi Patanik,
this issue is probably related to the post "Making 'span' like 'div'". Consider the following input:

<HTML>
<BODY BGCOLOR="white"><FONT FACE="Arial, Helvetica">
<CENTER><FONT SIZE="+2" COLOR="#000080"><B>
List of Sessions
</B></FONT></CENTER><BR>
<CENTER>
<FONT COLOR="#505050">
Cell Manager: SRVTAR1
<BR>
Creation Date: 15.05.2012 07:00:06
</FONT>
</CENTER>
<BR><BR>
<CENTER><TABLE BORDER="1" CELLSPACING="0" CELLPADDING="2" WIDTH="95%">
<TR>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white">Session Type</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white">Specification</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white">Status</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white">Mode</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white">Start Time</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white">Queuing</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white">Duration</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white">GB Written</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white"># Media</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white"># Errors</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white"># Warnings</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white"># Files</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white">Success</FONT></TH>
<TH BGCOLOR="#000080"><FONT FACE="Arial, Helvetica" COLOR="white">Session ID</FONT></TH>
</TR>
<TR>
<TD BGCOLOR="white" ALIGN="LEFT"><FONT FACE="Arial, Helvetica">Backup</FONT></TD>
<TD BGCOLOR="white" ALIGN="LEFT"><FONT FACE="Arial, Helvetica">Linux Fileserver NTSRV1</FONT></TD>
<TD BGCOLOR="white" ALIGN="LEFT"><FONT FACE="Arial, Helvetica"><FONT COLOR="green">Completed/Errors</FONT></FONT></TD>
<TD BGCOLOR="white" ALIGN="LEFT"><FONT FACE="Arial, Helvetica">full</FONT></TD>
<TD BGCOLOR="white" ALIGN="LEFT"><FONT FACE="Arial, Helvetica">14.05.2012 23:45:05</FONT></TD>
<TD BGCOLOR="white" ALIGN="RIGHT"><FONT FACE="Arial, Helvetica">0:00</FONT></TD>
<TD BGCOLOR="white" ALIGN="RIGHT"><FONT FACE="Arial, Helvetica">4:42</FONT></TD>
<TD BGCOLOR="white" ALIGN="RIGHT"><FONT FACE="Arial, Helvetica">771,81</FONT></TD>
<TD BGCOLOR="white" ALIGN="RIGHT"><FONT FACE="Arial, Helvetica">2</FONT></TD>
<TD BGCOLOR="white" ALIGN="RIGHT"><FONT FACE="Arial, Helvetica">6</FONT></TD>
<TD BGCOLOR="white" ALIGN="RIGHT"><FONT FACE="Arial, Helvetica">8</FONT></TD>
<TD BGCOLOR="white" ALIGN="RIGHT"><FONT FACE="Arial, Helvetica">2111887</FONT></TD>
<TD BGCOLOR="white" ALIGN="RIGHT"><FONT FACE="Arial, Helvetica"><CENTER><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" WIDTH="100%" HEIGHT="100%"><TR><TD BGCOLOR="green" WIDTH="100"><FONT FACE="Arial, Helvetica">&nbsp;</FONT></TD></TR></TABLE></CENTER></FONT></TD>
<TD BGCOLOR="white" ALIGN="LEFT"><FONT FACE="Arial, Helvetica">2012/05/14-9</FONT></TD>
</TR>
</TABLE></CENTER>
<BR><P><BR><P>
</FONT></BODY>
</HTML>

If I use a configuration like this:

$htmLawed_config = array('comment'=>1,
                'keep_bad'=>6,
                'balance'=>1,
                'tidy'=>1,
                'elements' => "* -script",
                'schemes'=>'href: file, ftp, http, https, mailto; src: cid, data, file, ftp, http, https; *:file, http, https',
                'hook_tag' =>"hl_email_tag_transform",
            );

I run into the situation where make_tag_strict turns the first font tag into a span tag. With balance switched on, the expected table is then mangled, because during tag balancing a span may only hold inline elements as children.
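To make the effect concrete, here is a minimal sketch of the kind of parent/child check that balancing performs. It is a toy model only: the arrays and the function name are hypothetical and much smaller than htmLawed's real $cI/$eI/$cF/$eB tables.

```php
<?php
// Toy content model for illustration only; these are NOT htmLawed's real
// tables, just a few hand-picked elements.
$toyInline = array('b' => 1, 'font' => 1, 'span' => 1);      // inline elements
$toyBlock  = array('center' => 1, 'div' => 1, 'table' => 1); // block-level elements

// Hypothetical helper: may $child appear directly inside $parent?
function toy_child_allowed($parent, $child, array $inline, array $block)
{
    // An inline parent such as 'span' accepts only inline children;
    // any other parent in this toy model accepts inline and block children.
    $allowed = isset($inline[$parent]) ? $inline : $inline + $block;
    return isset($allowed[$child]);
}
```

In this model a 'table' or 'center' is rejected inside 'span' but accepted inside 'div', which is why the table does not survive once the outer font has become a span.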

I could use the change suggested in "Making 'span' like 'div'" (adding span to $cF and $eB and removing it from $cI and $eI), or switch balancing off, which results in other unwanted effects elsewhere.

If I switch off make_tag_strict, the result is the same, because font is not part of the $cF and $eB arrays either, and I give up the benefits of make_tag_strict.

I considered using div as replacement for the span tag when running into font elements in hl_tag2.
This works with the particular example, but may fail elsewhere.

Any suggestions?

2

Re: Problem regarding 'balance' together with 'make_tag_strict'

I already found some examples where the approach of "using div as replacement for the span tag when running into font elements in hl_tag2" fails.
:-( So this was not a well-thought-out idea of mine.

3

Re: Problem regarding 'balance' together with 'make_tag_strict'

With 'make_tag_strict' enabled, htmLawed converts 'font' elements to 'span' elements with 'style' attribute property values derived from the attributes of 'font'. This is OK when 'font' is used as an inline element (e.g., within 'th' in your input example above) but not when it is used more like a 'block' element (e.g., the very first 'font', after 'body', in the input example). For the latter, conversion to 'div' instead of 'span' is more suitable.

Right now, I cannot think of an easy and inexpensive way to have htmLawed recognize the intended use of 'font' (inline or block). One option, then, is to have htmLawed convert 'font' to an inline 'div' (with the right 'style' attribute value). But this too has issues (e.g., when 'font' is within a 'p').

Maybe the best way is for administrators to implement a hook_tag function or to pre-filter the input (e.g., with a regex-based search-and-replace).
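As a rough sketch of the pre-filter idea, the following handles only the one pattern seen in the example input: a single 'font' that wraps the entire 'body' content. The function names are hypothetical, only the 'face' and 'color' attributes are translated, and a regex like this will not cope with arbitrary markup.

```php
<?php
// Hypothetical helper: translate the 'face' and 'color' attributes of a
// font tag into a CSS style string. 'size' is deliberately not handled.
function font_attrs_to_style($attrs)
{
    $style = '';
    if (preg_match('~\bface\s*=\s*(["\'])(.*?)\1~is', $attrs, $m)) {
        $style .= 'font-family: ' . $m[2] . '; ';
    }
    if (preg_match('~\bcolor\s*=\s*(["\'])(.*?)\1~is', $attrs, $m)) {
        $style .= 'color: ' . $m[2] . '; ';
    }
    // Escape quotes so the value is safe inside style="..."
    return htmlspecialchars(rtrim($style), ENT_QUOTES, 'UTF-8', false);
}

// Hypothetical pre-filter: if one font element wraps everything between
// <body> and </body>, rewrite it as a div before the input is filtered.
function prefilter_body_font($html)
{
    $pattern = '~(<body\b[^>]*>\s*)<font\b([^>]*)>(.*)</font>(\s*</body>)~is';
    $out = preg_replace_callback($pattern, function ($m) {
        $style = font_attrs_to_style($m[2]);
        $open = $style === '' ? '<div>' : '<div style="' . $style . '">';
        return $m[1] . $open . $m[3] . '</div>' . $m[4];
    }, $html);
    // preg_replace_callback returns null on a PCRE error; fall back to input
    return $out === null ? $html : $out;
}
```

The result would then be passed to htmLawed as usual. Input that does not match the pattern is returned unchanged, so inline uses of 'font' still go through the normal 'span' conversion.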

4

Re: Problem regarding 'balance' together with 'make_tag_strict'

Right now I am using this patch within hl_bal:

@@ -183,8 +183,8 @@
   $p = array_pop($q);
   $q[] = $p;
   if(isset($cS[$p])){$ok = $cS[$p];}
-  elseif(isset($cI[$p])){$ok = $eI; $cI['del'] = 1; $cI['ins'] = 1;}
-  elseif(isset($cF[$p])){$ok = $eF; unset($cI['del'], $cI['ins']);}
+  elseif(isset($cI[$p])&&$p!='span'){$ok = $eI; $cI['del'] = 1; $cI['ins'] = 1;}
+  elseif(isset($cF[$p])||$p=='span'){$ok = $eF; unset($cI['del'], $cI['ins']);}
   elseif(isset($cB[$p])){$ok = $eB; unset($cI['del'], $cI['ins']);}
   if(isset($cO[$p])){$ok = $ok + $cO[$p];}
   if(isset($cN[$p])){$ok = array_diff_assoc($ok, $cN[$p]);}

to avoid the problem. As far as I have tested it, it works as desired.
I agree that this approach is not elegant and does not really follow the intention of hl_bal.
But given how span (or font) is used in practice, I think this behavior could be worth either a config option, or could be activated whenever balancing is switched on together with make_tag_strict.
I do have a hook_tag function, but could not think of a way to handle the problem there.

5

Re: Problem regarding 'balance' together with 'make_tag_strict'

While this modification lets 'span' be treated like 'div', the output is still not standards-compliant. E.g., with the following input, one gets a 'div' within a 'span' in the output, which the specification does not permit. Of course, this does not matter if compliance with standards is not a big issue, especially if browsers render the HTML as intended.

// input
<center><font color='red'><div>hi</div><div>QQQ</div></font></center>

// output
<div style="text-align: center;"><span style="color: red;"><div>hi</div><div>QQQ</div></span></div>
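To spot this kind of nesting in filtered output, a small check along the following lines might help. It is only a sketch: the function name and the default list of block elements are hypothetical, and it assumes the markup is already balanced.

```php
<?php
// Hypothetical check: return the names of block-level elements that occur
// somewhere inside a 'span'. Assumes balanced markup, e.g. filtered output.
function find_blocks_in_span($html, array $blocks = array('div', 'p', 'table'))
{
    $stack = array(); // names of currently open elements
    $found = array();
    preg_match_all('~<(/?)([a-z][a-z0-9]*)\b[^>]*>~i', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if ($tag[1] === '/') {
            // Closing tag: pop until the matching opener has been removed
            while ($stack && array_pop($stack) !== $name) {
            }
        } else {
            if (in_array($name, $blocks, true) && in_array('span', $stack, true)) {
                $found[] = $name;
            }
            $stack[] = $name;
        }
    }
    return $found;
}
```

For the output shown above, the check reports both inner 'div' elements, since each one is opened while the 'span' is still on the stack.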

Still, I will look for a good solution for this issue.

6

Re: Problem regarding 'balance' together with 'make_tag_strict'

Thanks. I have a week off, so I will not be able to respond or test much until next week.

7

Re: Problem regarding 'balance' together with 'make_tag_strict'

Because 'div' is more permissive than 'span' as to what elements can reside within it, one approach would be to transform 'font' to 'div' instead of 'span' during tag-transformation when 'make_tag_strict' is in effect. Then, during balancing, a 'div' is converted to 'span' if the 'div' is not otherwise permitted.

Finer aspects of this approach are:

(1) If 'balance' is not enabled, then 'span' replaces 'font', as it does now; otherwise, 'div' replaces 'font'. In general, when 'font' is in the input, it is more likely that it is being used as an inline element and not as a block one. The issue we are addressing arises from the latter case.

(2) The 'div' made from 'font' can be given the style property of 'display: inline' to avoid possibly breaking page layout because 'div' is displayed as a block element by default.

(3) The 'div' made from 'font' can also be given a proprietary style property of '-htmlawed-transform: 1' to allow administrators to still identify such a post-transformation element in case they want to operate on it using CSS, JavaScript, PHP, etc. (Note that hl_tag2 returns a value that can only be used within the 'style' attribute value.)

(4) During balancing ('balance' in effect), a 'div' is converted to 'span' only if 'make_tag_strict' is in effect.

--

Coming back to your original issue, the following modifications, based on the above points, seem to fix it. Line numbers of file htmLawed.php refer to those of htmLawed version 1.1.4.

// line 637; in function hl_tag2
 $e = 'span'; return ltrim($a2);

// replace above with following
 if($GLOBALS['C']['balance']){
  $e = 'div'; return 'display: inline; -htmlawed-transform: 1; '. ltrim($a2);
 }else{
  $e = 'span'; return ltrim($a2);
 }

// add following after line 229; in function hl_bal
 if($e == 'div' && !isset($ok['div']) && strpos($a, '-htmlawed-transform') !== false){
  $t[$i] = "span{$a}>{$x}"; unset($e, $x); --$i; continue;
 }

For the final part of the modification above, one can get rid of the third, 'strpos' condition to speed up balancing. An extra advantage could be that this takes care of an input with an illegal nesting like a 'div' within an 'h1' (one sees this, e.g., in the code of Huffington Post web pages: www.huffingtonpost.com). But, by keeping the condition, I espouse htmLawed's philosophy of trying to (1) not guess the input writer's intention, (2) have minimal corrective behavior, and (3) be on the side of standards.

Some input texts to test/tweak the above modification:

<h4><div></div></h4> <!-- div not allowed in h4 -->
<h4><span></span></h4>
<h4><div><font></font></div></h4>
<h4><span><font></font></span></h4>
<h4><font><div></div></font></h4> <!-- font meant as block? -->
<h4><font><span></span></font></h4> <!-- font meant as inline -->
<center><font color='red'><div>hi</div><div>QQQ</div></font></center> <!-- font meant as block? -->
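A small harness like the one below could be used to run such inputs through the filter and compare results before and after a modification. The helper name is hypothetical, and the htmLawed call is left commented out because it assumes htmLawed.php is available on the include path.

```php
<?php
// Hypothetical helper: run each test input through a filter callable
// and collect input => output pairs for side-by-side inspection.
function run_filter_cases(array $inputs, $filter)
{
    $results = array();
    foreach ($inputs as $input) {
        $results[$input] = call_user_func($filter, $input);
    }
    return $results;
}

$cases = array(
    '<h4><div></div></h4>',
    '<h4><font><div></div></font></h4>',
    "<center><font color='red'><div>hi</div><div>QQQ</div></font></center>",
);

// With htmLawed loaded (assumption: htmLawed.php is on the include path):
// require 'htmLawed.php';
// $config = array('balance' => 1, 'make_tag_strict' => 1);
// $results = run_filter_cases($cases, function ($in) use ($config) {
//     return htmLawed($in, $config);
// });
// foreach ($results as $in => $out) { echo $in, "\n  => ", $out, "\n"; }
```

Passing the filter as a callable keeps the harness independent of the configuration, so the same cases can be run with and without the patch.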