1 (edited by gmf 2012-10-07 11:06:58)

Topic: Restricting elements, attributes, and attribute values / 5 questions

Question n°1:
I want to allow the posting of certain html tags only. I would use for the config:
   'elements' => 'p, br, span, b, i, em, strong, sub, sup, strike, table, caption, tbody, tr, td',
Would that be correct?

Question n°2: How can I strip everything from style="..." attributes except the allowed properties?
I could not find out how to use htmLawed to clean CSS styles. I want to allow only these style properties: "color", "font-weight", "text-decoration", "background-color"

Question n°3: Are empty style attributes removed automatically?
After the style properties have been removed, is the style attribute itself removed as well? E.g., what does <span style="">test</span> become?

Question n°4: How can class="" and id="" attributes be removed completely?
What settings do we need or is the hook_tag function required?

Question n°5: How can we remove empty tags, such as <b></b> or <p></p>, but only from the end of the content?

Thanks for your time and answers!
Kai


PS: Thanks for the great tool!

2

Re: Restricting elements, attributes, and attribute values / 5 questions

******** Q1. Yes, the config. is correct.

******** Q2. There are a couple of ways to achieve this. One uses the 'spec' argument of htmLawed, and the other a 'hook_tag' function. For the latter, see this web-page or this forum topic. Below is an example that uses 'spec':

// The 'style' attribute value for these elements ('p', 'span', 'b'...) cannot soft-match our pattern.
// The pattern looks for presence of a CSS style property name (like 'align') in the 'style' value.
// The 'style' attribute name and value will be filtered if 'style' value includes a property whose name
//   does not end in 'color', 'font-weight', 'text-decoration', 'background-color'

$spec = '
  p, span, b, i, em, strong, sub, sup, strike, table, caption, tbody, tr, td = 
    style(nomatch=%"("?<!background-color"|"color"|"font-weight"|"text-decoration")"\s*:%i);
';

$out = htmLawed($in, $config, $spec);
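
For the 'hook_tag' route, the core of the job is a whitelist filter applied to the 'style' value. Below is a minimal, self-contained sketch of that step only. The helper names keep_allowed_css and filter_style_attribute are hypothetical: they are not part of htmLawed and not the code from the linked topic. The sketch assumes that declarations are separated by ';' with no semicolons inside values, and it does not check the property values themselves. A 'hook_tag' function could call something like this on the attributes it is given; that wiring is left out here.

```php
<?php
// Hypothetical helper (not part of htmLawed): keep only whitelisted CSS
// properties in the value of a 'style' attribute.
function keep_allowed_css(string $style, array $allowed): string
{
    $kept = [];
    // Naive split on ';' (assumes no semicolons inside property values).
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // skip empty or malformed declarations
        }
        $name  = strtolower(trim($parts[0]));
        $value = trim($parts[1]);
        if ($value !== '' && in_array($name, $allowed, true)) {
            $kept[] = $name . ': ' . $value;
        }
    }
    return implode('; ', $kept);
}

// Hypothetical wrapper for an attribute array such as
// ['style' => '...', 'title' => '...']: cleans 'style' and drops it if empty.
function filter_style_attribute(array $attributes, array $allowed): array
{
    if (isset($attributes['style'])) {
        $clean = keep_allowed_css($attributes['style'], $allowed);
        if ($clean === '') {
            unset($attributes['style']);
        } else {
            $attributes['style'] = $clean;
        }
    }
    return $attributes;
}

$allowed = ['color', 'font-weight', 'text-decoration', 'background-color'];
echo keep_allowed_css('color: red; float: left; FONT-WEIGHT: bold', $allowed), "\n";
// prints: color: red; font-weight: bold
```

Unlike the 'spec' rule above, which drops the whole 'style' attribute as soon as one property fails the pattern, this kind of filter keeps the allowed properties and discards only the rest.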

******** Q3. With either the 'hook_tag' or the 'spec' option, both the 'style' attribute name and the attribute value will be removed from the attribute string of the element.

******** Q4. There are a number of options, including using 'hook_tag'. But the simplest is to use the 'deny_attribute' config. parameter:

$config['deny_attribute'] = 'class, id';
$out = htmLawed($in, $config, $spec);

******** Q5. I assume you mean that you want to trim the end of the content to remove all white space and empty elements (elements without non-white-space content). htmLawed has no built-in functionality for this. I suggest using a regular expression-based search-replace operation on the content before it is htmLawed-filtered.
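
A minimal sketch of such a regular expression-based trim follows. The function name trim_trailing_empty is hypothetical, and the pattern rests on a few assumptions: "empty" means only white space or &nbsp; between the tags, and an empty element nested inside another empty element (such as <p><b></b></p>) is not handled.

```php
<?php
// Hypothetical helper: strip white space and empty elements from the END
// of an HTML fragment; empty elements elsewhere are left alone.
function trim_trailing_empty(string $html): string
{
    // One trailing empty pair such as <b></b> or <p class="x"> </p>, plus
    // the white space around it. The back-reference \1 makes the closing
    // tag name match the opening one; '$' anchors the match at the end.
    $pattern = '%\s*<([a-z][a-z0-9]*)\b[^>]*>(?:\s|&nbsp;)*</\1\s*>\s*$%i';
    do {
        // Repeat until no trailing empty element is left.
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);
    return rtrim($html);
}

echo trim_trailing_empty("<p>Hello</p><p></p> <b> </b>\n"), "\n";
// prints: <p>Hello</p>
```

Void elements such as <br> are not touched, because the pattern requires a closing tag.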

3 (edited by gmf 2012-10-08 12:50:18)

Re: Restricting elements, attributes, and attribute values / 5 questions

Very helpful, thank you for this detailed answer! Next I have to dive into the css filtering, thanks for the two links.

We use htmLawed to filter user input at: www.gute-mathe-fragen.de which uses the open-source forum software: question2answer.org

Your php tool makes content secure! Great work!

4

Re: Restricting elements, attributes, and attribute values / 5 questions

Dear patnaik,

For cleaning CSS styles I tried the $spec argument; however, it does not suit my needs, as it ends up removing all style attributes.

The function my_css_filter (1), on the other hand, works great, as it keeps the other allowed CSS properties!

The only concern I have is performance as htmLawed + this filter function are called every time content gets posted AND content gets read from the database.

Do you think performance concerns are negligible here?
Thank you!
Kai

(1) http://www.bioinformatics.org/phplabware/forum/viewtopic.php?id=211

5

Re: Restricting elements, attributes, and attribute values / 5 questions

The 'spec' example I gave can certainly be tweaked to suit your needs. But, the 'hook_tag' approach is better when 'style' attribute values are expected to be more complex.

In a simple test that I did using an input of ~2500 characters with ~150 HTML tags, 10 of them with 'style' attributes, use of 'my_css_filter' increased htmLawed processing time by only ~10%-15% (to an overall time of ~16 ms in my setup). So the extra load is marginal, but it may still be significant in your case, depending on your setup and how often the filter runs.
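
A comparable measurement can be repeated on your own server with a small timing loop. The sketch below is only an illustration: average_ms is a hypothetical helper, and the stand-in filter uses strip_tags so that the example runs on its own. To measure the real thing, pass a closure that calls htmLawed with your own $config, once with and once without the 'hook_tag' function.

```php
<?php
// Hypothetical helper: average wall-clock time of one call to $filter,
// in milliseconds, measured over $runs calls.
function average_ms(callable $filter, string $input, int $runs = 100): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $filter($input);
    }
    return (microtime(true) - $start) * 1000 / $runs;
}

// Stand-in filter so that the sketch is runnable without htmLawed.
$standIn = function (string $html): string {
    return strip_tags($html, '<p><b><i>');
};

// Sample input; replace it with content that is typical for your site.
$sample = str_repeat('<p style="color: red">Some <b>sample</b> text</p>', 50);

printf("%.3f ms per call\n", average_ms($standIn, $sample));
```

Averaging over many runs smooths out timer noise; the absolute numbers will differ from machine to machine.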

6 (edited by gmf 2012-10-08 14:53:14)

Re: Restricting elements, attributes, and attribute values / 5 questions

Thanks again for your reply, and for the performance test!
I implemented this solution.

... and gave you some "credit": http://question2answer.org/qa/tag/htmlawed