1 (edited by imp 2012-10-28 19:36:28)

Topic: Ignoring HTML elements within 'pre'

Hello, I would like to use the wonderful tidy config option; however, everything inside pre elements loses its formatting. Is there a setting that makes tidy ignore everything inside particular tags such as pre? I ask because I am using syntaxhighlighter to parse code in pre elements.

I cannot find anything in the documentation about this, and from the code I would think that if(strpos('pre,script,textarea', "$p,")){return $t;} would do this anyway?

I am using htmLawed in Drupal. Any insight would be helpful, thanks.

This is the basic config I am using.
'safe'=>1, 'elements'=>'pre, p, a, em, strong, cite, code, ol, ul, li, dl, dt, dd', 'tidy'=>1

2

Re: Ignoring HTML elements within 'pre'

htmLawed allows within the 'pre' element all elements that are legally permitted by standards (e.g., 'b' and 'em'). Illegal elements like 'p' are removed. The checking of what elements are allowed within other elements is performed by the hl_bal() function.

(1) The function can be turned off using the 'balance' config. parameter. This has drawbacks, of course, but may be permissible depending on your situation (the types of input you get, expected quality of inputs, etc.).
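
As a rough sketch of option (1), this is the config from the original post with balancing switched off. The only addition is the 'balance' entry; the variable name $config is just for illustration, and the commented-out call assumes htmLawed.php has already been loaded.

```php
<?php
// Settings from the original post, plus 'balance' => 0.
$config = array(
    'safe'     => 1,
    'elements' => 'pre, p, a, em, strong, cite, code, ol, ul, li, dl, dt, dd',
    'tidy'     => 1,
    'balance'  => 0, // turn off the nesting checks done by hl_bal()
);

// $out = htmLawed($in, $config); // needs htmLawed.php to be included first
```

With balancing off, unclosed or badly nested tags anywhere in the input are no longer corrected, so this only makes sense if you trust the input to be reasonably well-formed.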

(2) The function can be altered to allow 'pre' to contain any element. This too is not recommended (e.g., because there are probably 'pre' elements in inputs that do not contain code to syntax-highlight), but may be permissible in your situation. To edit the function, simply edit the various arrays at the beginning of the function's code so that 'pre' mimics 'div'.

(3) Another suggestion is to convert '<', '>', and '&' characters within 'pre' to character entities like '&lt;' before the input is sent to htmLawed for filtering. This can be done, for instance, using PHP's preg_replace_callback function. If the replacement affects code recognition by the syntax highlighter, you may have to re-configure the highlighter, consider another highlighting software, replace the character entities within 'pre' using another search-replace operation, etc. (I don't know which Drupal module you use for syntax highlighting).
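
A minimal sketch of suggestion (3), assuming the 'pre' blocks in your input are not nested and are properly closed. The function name escape_pre_contents() is made up for this example and is not part of htmLawed; it uses only PHP's built-in preg_replace_callback() and strtr().

```php
<?php
// Hypothetical helper (not part of htmLawed): turn '&', '<' and '>' inside
// every <pre>...</pre> block into character entities, leaving the rest of
// the input untouched.
function escape_pre_contents(string $html): string
{
    $result = preg_replace_callback(
        // Group 1: opening tag, group 2: contents, group 3: closing tag.
        // 'i' = case-insensitive, 's' = let '.' match newlines too.
        '~(<pre\b[^>]*>)(.*?)(</pre>)~is',
        function (array $m): string {
            // strtr() with an array makes a single pass, so the '&' of a
            // freshly inserted '&lt;' is not escaped a second time.
            $escaped = strtr($m[2], array(
                '&' => '&amp;',
                '<' => '&lt;',
                '>' => '&gt;',
            ));
            return $m[1] . $escaped . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to
    // the unmodified input in that case.
    return $result ?? $html;
}

$in = '<p>Example:</p><pre class="brush: php">if ($a < $b && $c > $d) {}</pre>';
echo escape_pre_contents($in), "\n";
// The result could then be passed on, e.g. htmLawed(escape_pre_contents($in), $config)
```

Note that any markup already present inside 'pre' (such as 'em' or 'b') gets escaped as well, and entities that were already in the input are escaped a second time, so this fits best when 'pre' holds raw code only.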

(4) If the result of syntax highlighting is code with character entities used for the HTML characters '<', '>', and '&', then perhaps you can configure Drupal to execute htmLawed at the very end (i.e., as the last filter).