1

Topic: Plain text lost in elements like form that don't allow direct text

Balance switched on:
Config:

$htmLawed_config = array('comment'=>1, //remove comments
                'keep_bad'=>6,
                'balance'=>1, // tag balancing on; setting 'balance'=>0 would turn off only the standards-compliant nesting check (basic balancing remains, so no tag is left unclosed) and would not introduce a security risk
                'tidy'=>1,
                'elements' => "* -script",
                'schemes'=>'href: file, ftp, http, https, mailto; src: cid, data, file, ftp, http, https; *:file, http, https',
                'hook_tag' =>"hl_email_tag_transform",
            );

Example:

<BLOCKQUOTE> <DIV><BR></DIV>Other Text (vanished)<BR>Some Text<BR></BLOCKQUOTE>

Result:

<blockquote>\n <div>\n  <br />\n </div>\n <div>\n  <br />\n  Some Text<br />\n </div>\n</blockquote>

Also tested on the test page, with no special config used there. Same result.

2 (edited by leithoff 2012-08-03 07:32:17)

Another one:

<BLOCKQUOTE>
      <DIV>&nbsp;</DIV>text that vanishes<BR>&nbsp; more text
</blockquote>

Result:

 <blockquote><div>
      &nbsp;<div>&nbsp;</div>text that vanishes<br />&nbsp; more text
</div></blockquote>

If I add a space (or text) before the code snippet in question, everything seems fine:

<BLOCKQUOTE>
      &nbsp;<DIV>&nbsp;</DIV>text that vanishes<BR>&nbsp; more text
</blockquote>

Tested on the test page with balance and make_tag_strict enabled. Switching make_tag_strict off produces the same result.

3

The elements 'blockquote', 'form', 'map' and 'noscript' need a block element as the immediate child. Thus, plain text is not allowed directly within 'blockquote', or 'input' directly within 'form'.

htmLawed checks for this rule (function hl_bal) when 'balance' is enabled. If the rule is broken, it tries to correct the markup by inserting a 'div'. Thus, 'x' in the input below will be put within a 'div':

// input
<blockquote>x<div>y</div></blockquote>

// output
<blockquote><div>x<div>y</div></div></blockquote>

This corrective behavior is not triggered when 'x' comes just before the closing 'blockquote' tag, and 'x' is dropped instead:

// input
<blockquote><div>y</div>x</blockquote>

// output
<blockquote><div>y</div></blockquote>

I have tried to keep the corrective behavior of htmLawed to a minimum, mainly because it is difficult to deduce what the input writer had in mind. Anyway, I will look into this issue.

4

OK, I have looked into this, and it seems that the following modification fixes the issue. It triggers the above-mentioned corrective behavior in the cases where it was not being applied. I have checked a large number of input cases, and the modification does not appear to introduce any bugs. Still, let me know if you encounter anything. I will also continue checking more test cases.

 // the following line, line number 197 (in function hl_bal) of htmLawed.php, version 1.1.13 has to be replaced

  if($do < 3 or isset($ok['#pcdata'])){echo $x;}

// replace with

  if(strlen(trim($x)) && (($ql && isset($cB[$p])) or (isset($cB[$in]) && !$ql))){
   echo '<div>', $x, '</div>';
  }
  elseif($do < 3 or isset($ok['#pcdata'])){echo $x;}
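
The intended effect (loose text directly inside an element such as 'blockquote' gets its own 'div') can be illustrated with a standalone toy. This is only a sketch under stated assumptions: wrap_loose_text is a hypothetical helper, not part of htmLawed and not a reimplementation of hl_bal. It assumes already well-formed input, knows only a handful of void elements, and ignores inline elements, so its output may differ from what htmLawed produces.

```php
<?php
// Hypothetical helper for illustration only; not part of htmLawed.
function wrap_loose_text(string $html): string
{
    // Elements that need a block element as the immediate child.
    $needBlock = ['blockquote' => 1, 'form' => 1, 'map' => 1, 'noscript' => 1];
    // A few void elements that never get a closing tag (not exhaustive).
    $void = ['br' => 1, 'hr' => 1, 'img' => 1, 'input' => 1];

    // Split into tag/comment tokens and text runs, keeping the tags.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $stack = []; // names of the currently open elements
    $out = '';
    foreach ($tokens as $tok) {
        if ($tok[0] !== '<') {
            // Text run: wrap it only if it is non-blank and its immediate
            // parent is one of the block-only containers.
            $parent = $stack ? $stack[count($stack) - 1] : null;
            if ($parent !== null && isset($needBlock[$parent]) && strlen(trim($tok))) {
                $tok = '<div>' . $tok . '</div>';
            }
        } elseif (preg_match('~^</([a-z][a-z0-9]*)~i', $tok, $m)) {
            // Closing tag: pop the matching open element, if any.
            if ($stack && $stack[count($stack) - 1] === strtolower($m[1])) {
                array_pop($stack);
            }
        } elseif (preg_match('~^<([a-z][a-z0-9]*)~i', $tok, $m)) {
            // Opening tag: remember it unless it is void or self-closed.
            $name = strtolower($m[1]);
            if (!isset($void[$name]) && substr($tok, -2) !== '/>') {
                $stack[] = $name;
            }
        }
        // Comments and anything unrecognized pass through unchanged.
        $out .= $tok;
    }
    return $out;
}

echo wrap_loose_text('<blockquote><div>y</div>x</blockquote>'), "\n";
// prints <blockquote><div>y</div><div>x</div></blockquote>
```

Text inside a plain 'div' is left alone by this toy, because 'div' is not one of the four block-only containers.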

Below are some example inputs to test if you want to tweak the code:

<blockquote>QQQ<div>x</div></blockquote>
<blockquote><div>x</div>QQQ</blockquote>
<blockquote><div>x</div>QQQ<div>x</div></blockquote>
<blockquote><div>x</div>QQQ</blockquote><p>x</p>
<blockquote><p>x</p>QQQ</blockquote>
<blockquote>QQQ<p>x</p>RRR</blockquote>

<blockquote>QQQ<div>x</div><!-- comment --></blockquote>
<blockquote><div>x</div><!-- comment -->QQQ</blockquote>
<blockquote><!-- comment --><div>x</div>QQQ<div>x</div></blockquote>
<blockquote><div>x<!-- comment --></div>QQQ</blockquote><p>x</p>

<div>QQQ<div>x</div></div>
<div><div>x</div>QQQ</div>
<div><div>x</div>QQQ<div>x</div></div>
<div><div>x</div>QQQ</div><p>x</p>
<div><p>x</p>QQQ</div>
<div>QQQ<p>x</p>RRR</div>

<form>QQQ<div>x</div></form>
<form><div>x</div>QQQ</form>
<form><div>x</div>QQQ<div>x</div></form>
<form><div>x</div>QQQ</form><p>x</p>
<form><p>x</p>QQQ</form>
<form>QQQ<p>x</p>RRR</form>
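
When running inputs like these in bulk, one possible check is to compare the visible text before and after filtering and flag any pair where text went missing. The helpers below (visible_text, text_was_lost) are hypothetical names for illustration and not part of htmLawed. The sketch relies only on PHP's strip_tags and preg_replace, and it treats whitespace differences as irrelevant.

```php
<?php
// Hypothetical helpers for illustration only; not part of htmLawed.
function visible_text(string $html): string
{
    // strip_tags() removes tags and HTML comments; whitespace is dropped so
    // that re-indentation (e.g. by 'tidy') does not count as a difference.
    return preg_replace('/\s+/', '', strip_tags($html));
}

function text_was_lost(string $input, string $output): bool
{
    return visible_text($input) !== visible_text($output);
}

// Input/output pairs quoted from the explanation earlier in this thread.
var_dump(text_was_lost(
    '<blockquote>x<div>y</div></blockquote>',
    '<blockquote><div>x<div>y</div></div></blockquote>'
)); // bool(false)
var_dump(text_was_lost(
    '<blockquote><div>y</div>x</blockquote>',
    '<blockquote><div>y</div></blockquote>'
)); // bool(true)
```

With htmLawed.php included, the second argument would presumably come from a call such as htmLawed($input, $htmLawed_config) for each test input.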

5

Thanks a lot for looking into this. I have a week off, so I will not be able to respond or test until next week.

6

The new release of htmLawed (1.1.14, 8 Aug'12) includes this code modification.