1

Topic: Prevent conversion of empty tags to self-closing ones

htmLawed does not have an option to prevent conversion of empty tags to self-closing ones (for example, '<br>' to '<br />'). One can consider one of the following methods to retain empty tags without the self-closing ' /' in the htmLawed output:

(1) A string-replace run on htmLawed-filtered output: Perhaps the simplest way. E.g.,

$out = htmLawed($in, ...);
$out = str_replace(" />", ">", $out);
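
A narrower variant of the same post-processing idea is sketched below. It uses preg_replace() and rewrites only a fixed list of empty elements instead of every " />" in the output. The function name strip_void_self_closing() and the element list (br, hr, img, input, wbr) are assumptions for illustration, not htmLawed features. Like the plain string replacement, it assumes that attribute values in the output contain no literal ">".

```php
<?php
// Hypothetical helper (not part of htmLawed): remove the self-closing ' /'
// only from an assumed list of empty elements, leaving everything else alone.
function strip_void_self_closing($html)
{
    // $1 = element name, $2 = attribute text; the trailing "\s*/>" is dropped.
    return preg_replace('`<(br|hr|img|input|wbr)\b([^>]*?)\s*/>`i', '<$1$2>', $html);
}

$out = '<p>Line one<br />Line two</p><img src="a.png" alt="" />';
echo strip_void_self_closing($out), "\n";
// prints <p>Line one<br>Line two</p><img src="a.png" alt="">
```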

(2) Use the $config 'hook_tag' parameter: When the $config parameter 'hook_tag' is set to the name of a custom function, htmLawed passes each tag, after any security/administration-related filtering, to that function and uses the function's return value as the tag in the output. This can be used to prevent conversion of empty tags to self-closing ones. E.g.,

// For htmLawed hook_tag; ensures that empty tags are not self-closed (e.g., '<br>' stays as '<br>')
function my_no_self_closing_function($element, $attribute_array=0){
    // If second argument is not received, it means a closing tag is being handled
    if(is_numeric($attribute_array)){
        return "</$element>";
    }
    $string = '';
    foreach($attribute_array as $k=>$v){
        $string .= " {$k}=\"{$v}\"";
    }
    return "<{$element}{$string}>";
}
// htmLawed processing
$config = array(..., 'hook_tag' => 'my_no_self_closing_function', ...);
$out = htmLawed($in, $config, ...);

(3) Modify htmLawed code: E.g.

// Find in htmLawed.php, at the end of function hl_tag():
return "<{$e}{$aA}". (isset($eE[$e]) ? ' /' : ''). '>'; 
// Change to:
return "<{$e}{$aA}>"; 

2

Re: Prevent conversion of empty tags to self-closing ones

That's great, thanks very much.

Since my HTML writers use a mixture of self-closing style and not (and are occasionally very opinionated on the matter!), rather than enforcing one way or the other, is there some way I can leave them untouched?

3

Re: Prevent conversion of empty tags to self-closing ones

Synchro wrote:

...is there some way I can leave them untouched?

Modifying the htmLawed code is the only way I can think of to leave empty tags as they are (i.e., self-closing if they were self-closing in the input, and non-self-closing otherwise).

You can try this modification:

// First, find in htmLawed.php (line 459 for version 1.1.19), in the middle of function hl_tag():
$mode = 0; $a = trim($a, ' /'); $aA = array();
// Add the following new line of code BEFORE the above line:
$self_close = (isset($eE[$e]) && substr($a, -1) == '/') ? 1 : 0;

// Then, find in htmLawed.php (line 613 for version 1.1.19), towards the end of function hl_tag():
return "<{$e}{$aA}". (isset($eE[$e]) ? ' /' : ''). '>'; 
// And change the line to:
return "<{$e}{$aA}". ($self_close ? ' /' : ''). '>';

Example input and output with and without above modification:

// Input
<br/><br><br />

// Output without the code modification
<br /><br /><br />

// Output with the code modification
<br /><br><br />
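
The effect of this modification on empty tags can be approximated outside htmLawed, which may help when checking expected output. The sketch below is a hypothetical emulation, not htmLawed code: it applies the same "attribute text ends in '/'" test and the same trim(' /') to each empty-element tag found with preg_replace_callback(). The function name and the short element list are assumptions; htmLawed's own list of empty elements is longer.

```php
<?php
// Hypothetical emulation (not htmLawed code) of the modification above:
// keep ' /' on an empty element only if the input tag already ended in '/'.
function emulate_self_close_preservation($html)
{
    // Assumed short list of empty elements; htmLawed's own list is longer.
    $void = 'br|hr|img|input|wbr';
    return preg_replace_callback(
        '`<(' . $void . ')\b([^>]*?)\s?>`i',
        function ($m) {
            $e = strtolower($m[1]);
            $a = $m[2];
            // Same test as the new $self_close line: does the attribute text end in '/'?
            $self_close = substr($a, -1) == '/';
            // Same trim as in hl_tag(): drop surrounding spaces and slashes.
            $a = trim($a, ' /');
            return "<{$e}" . ($a === '' ? '' : " {$a}") . ($self_close ? ' /' : '') . '>';
        },
        $html
    );
}

echo emulate_self_close_preservation('<br/><br><br />'), "\n";
// prints <br /><br><br />
```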

4

Re: Prevent conversion of empty tags to self-closing ones

Thanks again.

I do have the doctype on hand, so I think I'll enforce self-closing for XHTML and non-self-closing for everything else. That should make for more consistent code.

Something I tried to do for the above suggestions was to implement the hook_tag function as a closure, so as not to pollute my global namespace, like this:

'hook_tag' => function ($element, $attribute_array = 0) { ...

However, that won't work because its existence is checked with function_exists (which limits you to built-in or global functions), and it's called as a variable function with "$C['hook_tag']($e, $a)". It would be nicer if hook_tag were a callable: it could be checked with is_callable and called with call_user_func. This would work with the existing syntax (a function name as a string), but would also allow more flexible structures, such as a closure or a static or dynamic method.

Just a suggestion.
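
A minimal sketch of the callable-based approach suggested above. run_tag_hook() is a hypothetical stand-in for the place where htmLawed invokes the hook; htmLawed's actual internals may differ. It only shows that a single is_callable check plus call_user_func accepts a function name, a closure, and a static method alike. The three hooks are deliberately trivial and ignore attributes.

```php
<?php
// Hypothetical stand-in for the point where htmLawed invokes 'hook_tag'.
// Opening tags get (element, attributes); closing tags get only (element).
function run_tag_hook($hook, $element, $attributes = null)
{
    if (!is_callable($hook)) {
        return null; // a real caller would fall back to its default tag output
    }
    return $attributes === null
        ? call_user_func($hook, $element)
        : call_user_func($hook, $element, $attributes);
}

// The same trivial hook (attributes ignored) written three ways.
function global_hook($e, $a = 0)
{
    return is_numeric($a) ? "</$e>" : "<$e>";
}

class TagHooks
{
    public static function staticHook($e, $a = 0)
    {
        return is_numeric($a) ? "</$e>" : "<$e>";
    }
}

$closure = function ($e, $a = 0) {
    return is_numeric($a) ? "</$e>" : "<$e>";
};

echo run_tag_hook('global_hook', 'br', array()), "\n";                  // <br>
echo run_tag_hook($closure, 'p'), "\n";                                  // </p>
echo run_tag_hook(array('TagHooks', 'staticHook'), 'hr', array()), "\n"; // <hr>
```

The existing string form keeps working, because a function name passes is_callable just as it passes function_exists.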

5

Re: Prevent conversion of empty tags to self-closing ones

The suggestion to check 'hook_tag' with is_callable and call it with call_user_func is a good idea. Thanks. I will put this change in the next htmLawed release.

6 (edited by Synchro 2015-01-21 12:39:55)

Re: Prevent conversion of empty tags to self-closing ones

There's an obvious problem with the code in your number 2 suggestion: it's not fussy about which tags it tries to close, so it self-closes all of them. I added a filter to self-close only tags that are always void (in this context we can't tell whether an optionally self-closed tag has a separate end tag), using this list of tags:

if (in_array($element, array('br', 'hr', 'img', 'input', 'wbr'))) { ...

I think these are the only ones that can always be self-closed safely. Tags like base, meta and link would also apply, but they should only appear inside head tags, which htmLawed doesn't touch anyway; command, source, etc. are HTML5 elements, and HTML5 doesn't need self-closing slashes anyway.
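
One way to combine this filter with the earlier plan (self-closing for XHTML, non-self-closing otherwise) is sketched below, under the assumption that 'hook_tag' still only accepts a function name. A shared renderer takes an explicit XHTML flag, and two thin named wrappers serve as the actual hooks. All three function names are hypothetical, and the element list is the one from this post.

```php
<?php
// Shared renderer; $xhtml decides whether always-void elements get ' /'.
function my_render_tag($element, $attribute_array, $xhtml)
{
    // htmLawed calls the hook with only the element name for closing tags,
    // so the wrappers below pass on their default 0 in that case.
    if (is_numeric($attribute_array)) {
        return "</$element>";
    }
    $string = '';
    foreach ($attribute_array as $k => $v) {
        $string .= " {$k}=\"{$v}\"";
    }
    // Elements that are always void, per the list above.
    $void = array('br', 'hr', 'img', 'input', 'wbr');
    $slash = ($xhtml && in_array($element, $void)) ? ' /' : '';
    return "<{$element}{$string}{$slash}>";
}

// Named wrappers, since 'hook_tag' is checked with function_exists
// (as described earlier in this thread).
function my_xhtml_hook($element, $attribute_array = 0)
{
    return my_render_tag($element, $attribute_array, true);
}

function my_html_hook($element, $attribute_array = 0)
{
    return my_render_tag($element, $attribute_array, false);
}

echo my_xhtml_hook('br', array()), "\n";                   // <br />
echo my_xhtml_hook('p', array('class' => 'note')), "\n";   // <p class="note">
echo my_html_hook('br', array()), "\n";                    // <br>
echo my_xhtml_hook('p'), "\n";                             // </p>
```

Choosing the hook then comes down to something like 'hook_tag' => ($is_xhtml ? 'my_xhtml_hook' : 'my_html_hook'), where $is_xhtml is a hypothetical flag derived from the doctype.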

7

Re: Prevent conversion of empty tags to self-closing ones

Synchro wrote:

There's an obvious problem with the code in your number 2 suggestion...

Thanks for picking up this possible issue.