1 (edited by Vladimir 2013-07-09 16:26:44)

Topic: htmLawed results

htmLawed.1.2.beta.2-9June2013

htmLawed('<p><а href="link">Edit description</а></p>', array('safe' => 1,'keep_bad' => 1,));

Out

<p></p><а href="link">Edit description</а> text &lt;/p&gt;

2 (edited by Vladimir 2013-07-09 16:48:38)

Input (old-style YouTube embed code):

{object width="560" height="315"}
{param name="movie" value="-/-/w-w-w.youtube.com/v/FL9sJygKzgk?version=3&amp;hl=ru_RU"}{/param}
{param name="allowFullScreen" value="true"}{/param}
{param name="allowscriptaccess" value="always"}{/param}
{embed src="-/-/w-w-w.youtube.com/v/FL9sJygKzgk?version=3&amp;hl=ru_RU" type="application/x-shockwave-flash" width="560" height="315" allowscriptaccess="always" allowfullscreen="true"}{/embed}
{/object}

PHP

$HLSpec = 'iframe=-*,height,width,type,src(match="`^(?:https:|http:)?//(?:www\.)?(?:youtube\.com)/`i"); object=-*,height,width,type,data(match="`youtube\.com/`i")';
$HLConfig = array(
    'safe' => 1,
    'keep_bad' => 1,
    'elements' => '*-script-style+iframe+object+embed', //  +iframe
);
$htmLawed = htmLawed($text, $HLConfig, $HLSpec);

Output:

{object width="560" height="315"}
{param name="movie" value="-/-/w-w-w.youtube.com/v/FL9sJygKzgk?version=3&amp;hl=ru_RU" /}&lt;/param&gt;
{param name="allowFullScreen" value="true" /}&lt;/param&gt;
{param name="allowscriptaccess" value="always" /}&lt;/param&gt;
{embed src="-/-/w-w-w.youtube.com/v/FL9sJygKzgk?version=3&amp;hl=ru_RU" type="application/x-shockwave-flash" width="560" height="315" /}&lt;/embed&gt;
{/object}

To read the code above as HTML, replace { with <, } with >, and -/-/w-w-w with //www.
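If you want to try the mangled snippets above on the htmLawed test page, the replacements can be done mechanically. This is a minimal PHP sketch; the function name unmangle_forum_markup is hypothetical and not part of htmLawed:

```php
<?php
// Hypothetical helper: undo the forum-safe notation used in the post above,
// where '{' stands for '<', '}' for '>' and '-/-/w-w-w' for '//www'.
function unmangle_forum_markup(string $s): string
{
    // strtr() with an array tries the longest keys first and never
    // re-scans text it has already replaced.
    return strtr($s, [
        '{'         => '<',
        '}'         => '>',
        '-/-/w-w-w' => '//www',
    ]);
}

echo unmangle_forum_markup('{param name="movie" value="-/-/w-w-w.youtube.com/v/FL9sJygKzgk"}{/param}'), "\n";
// <param name="movie" value="//www.youtube.com/v/FL9sJygKzgk"></param>
```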

3

Thanks for writing. I will look into it.

4

patnaik wrote:

Thanks for writing. I will look into it.

I am referring to the first post.

For the second post that you have, I do not understand the issue. htmLawed does not have the ability to convert BBCode- or Markdown-like code to HTML. Thus, it will not convert {tag}abc{/tag} to <tag>abc</tag>. It also does not perform checks on all attribute values. Thus, it does not know that '-/-/w-w-w' means '//www.'

5

If I insert links such as
www.some.link
www.some.link
www.some.link
and ...

the forum shows this error:

Warning! The following errors must be corrected before your message can be posted:

    Too more links in message. Allowed 3 links. Reduce number of links and post it again.

6

Here is the plain HTML code:
http://pastebin.com/nJhTkDuS

7

The 'allowed 3 links' message seems to be because of some setting of the forum software that you are using.

Regarding the HTML snippet you put on Pastebin, I tried it with both the current release and the current beta of htmLawed on the htmLawed test pages. I get different outputs than the ones you note.

<object width="560" height="315">
<param name="movie" value="//www.youtube.com/v/FL9sJygKzgk?version=3&amp;hl=ru_RU" />
<param name="allowFullScreen" value="true" />
<param name="allowscriptaccess" value="always" />
<embed src="//www.youtube.com/v/FL9sJygKzgk?version=3&amp;hl=ru_RU" type="application/x-shockwave-flash" width="560" height="315" />
</object>

htmLawed does remove the 'allowscriptaccess' and 'allowfullscreen' attributes of the 'embed' element, which is expected as those attributes are not in the HTML specs. However, to allow their use, you can use the $spec parameter of htmLawed:

$config = array(/* ... */); // htmLawed config.
$spec = 'embed = *, allowscriptaccess, allowfullscreen';
$out = htmLawed($in, $config, $spec);

8

$config = array(
'safe' => 1,
'keep_bad' => 1,
'elements' => '*-script-style+iframe+object+embed',
);

I use the keep_bad option. I thought this option keeps only bad (denied) tags and removes "wrong" tags.

9 (edited by patnaik 2013-07-15 21:21:14)

With your $config values, I still get the expected result as noted in my last post; that is, I do not see an issue.

Regarding 'keep_bad,' it declares how htmLawed deals with tags that are filtered out. With 'keep_bad' set to 1, htmLawed neutralizes such tags by converting the '<' and '>' characters to entities ('&lt;' and '&gt;'). See the documentation on keep_bad.

// input; 'param' is an empty element; that is, it should not have the closing tag '</param>'
<param name="allowscriptaccess" value="always"></param>

// output, with keep_bad = 6 (default value, used when not declared)
<param name="allowscriptaccess" value="always" />

// output, with keep_bad = 1
<param name="allowscriptaccess" value="always" />&lt;/param&gt;

10

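Thanks.
You could add a new option: "Remove end tags for elements that don't have them".

Another feature request: maybe add a check like this to the code:

function htmLawed($t, $C=1, $S=array()){
    // Proposed: return input that contains no tags unchanged
    if (strpos($t, '<') === false) {
        return $t;
    }
    // .... rest of htmLawed() unchanged
}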

11

Vladimir wrote:

You could add a new option: "Remove end tags for elements that don't have them".

It looks like a good feature to have, but I feel the current configurability of the 'keep_bad' config. parameter is sufficient for handling 'bad' tags, and having this additional option is not worth the additional script execution time.

You can also remove the closing tags for these empty elements with plain PHP before passing the input to htmLawed. E.g.:

// remove closing tags for empty elements that shouldn't have them
$in = str_ireplace(array('</br>', '</embed>', '</hr>', '</img>' /* ... */), '', $in);
$out = htmLawed($in, $config /* ... */);
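A slightly more robust variant of the same preprocessing step, as a sketch: it uses a case-insensitive regular expression so that variants like '</PARAM >' are also caught, and it includes 'param', the element from this thread. The function name and the element list (the HTML 4 empty elements plus 'embed') are assumptions, not part of htmLawed; pass the result to htmLawed() as before.

```php
<?php
// Hypothetical helper: strip end tags of empty (void) elements before filtering.
// Assumed list: HTML 4 empty elements plus 'embed'.
function strip_empty_element_end_tags(string $html): string
{
    $empty = ['area', 'base', 'basefont', 'br', 'col', 'embed', 'frame', 'hr',
              'img', 'input', 'isindex', 'link', 'meta', 'param'];
    // Matches e.g. '</param>' or '</PARAM >' but not '</p>' or '</colgroup>'.
    $pattern = '#</\s*(?:' . implode('|', $empty) . ')\s*>#i';
    return preg_replace($pattern, '', $html);
}

$in = '<param name="allowscriptaccess" value="always"></param>';
echo strip_empty_element_end_tags($in), "\n";
// <param name="allowscriptaccess" value="always">
```

A regex costs a little more than str_ireplace(), but it avoids listing every case and whitespace variant by hand.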

12 (edited by patnaik 2013-07-24 05:11:23)

Vladimir wrote:

htmLawed.1.2.beta.2-9June2013

htmLawed('<p><а href="link">Edit description</а></p>', array('safe' => 1,'keep_bad' => 1,));

Out

<p></p><а href="link">Edit description</а> text &lt;/p&gt;

This is not a bug. The 'a' in the 'a' element of the input text in your first post is the Cyrillic 'а' (code point U+0430, encoded in UTF-8 as the bytes D0 B0), not the Latin 'a' (code point U+0061, byte 61). You can see this by pasting the text into a text-to-hex converter such as this online tool: http://www.swingnote.com/tools/texttohex.php. HTML uses the Latin 'a' to denote the 'a' element.
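The same check can be done in PHP without an online tool. A small sketch: bin2hex() and the "\u{...}" string escape (PHP 7+) are standard PHP, while non_ascii_tag_names() is a hypothetical helper that flags tag names containing non-ASCII bytes:

```php
<?php
// The two look-alike letters, written with PHP 7+ Unicode escapes.
$latin_a = 'a';          // U+0061 LATIN SMALL LETTER A
$cyrillic_a = "\u{430}"; // U+0430 CYRILLIC SMALL LETTER A

echo bin2hex($latin_a), "\n";    // 61
echo bin2hex($cyrillic_a), "\n"; // d0b0

// Hypothetical helper: return the distinct tag names in $html that contain
// non-ASCII bytes, i.e. "tags" that only look like HTML elements.
function non_ascii_tag_names(string $html): array
{
    // Capture the name after '<' or '</', up to whitespace, '/' or '>'.
    preg_match_all('#</?([^\s/>]+)#', $html, $m);
    $bad = [];
    foreach ($m[1] as $name) {
        if (preg_match('/[^\x00-\x7F]/', $name)) {
            $bad[] = $name;
        }
    }
    return array_values(array_unique($bad));
}

$in = '<p><' . $cyrillic_a . ' href="link">Edit description</' . $cyrillic_a . '></p>';
print_r(non_ascii_tag_names($in)); // one entry: the Cyrillic letter
```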

13

OK! No Cyrillic.

htmLawed('<p><a href="link">Edit description</a></p>', array('safe' => 1,'keep_bad' => 1,));
<p></p><a href="link">Edit description</a>&lt;/p&gt;

14

Yes, you are right. This is indeed a bug in the beta release. I will work at it. Thanks.

15

The new release of the htmLawed 1.2 beta (6 August 2013) fixes this issue. Thanks again for your feedback, Vladimir.