1 (edited by gmf 2016-01-17 14:03:35)

Topic: Can HTMLawed merge multiple line breaks with nested br and nbsp;?

I am using too many regexes to filter dirty HTML content; here is an overview: http://stackoverflow.com/q/34841209/1066234

I just remembered htmLawed, which has already helped me a lot.

Is there any way to merge redundant line breaks that are empty or hold an arbitrary amount of br and &nbsp;?

2

Re: Can HTMLawed merge multiple line breaks with nested br and nbsp;?

htmLawed does not have a feature to merge redundant line breaks: such content is valid HTML, and the software cannot discern an admin's intent.

Your serial regex search-and-replace approach seems the best option. Below is my suggestion, which seems to accomplish the goal in two search-and-replace passes.

// Remove P elements that contain only whitespace, &nbsp; entities, or BR elements, together with any such filler around them
$in = preg_replace('`((&nbsp;)|(<br[^>]*>)|\s)*<p[^>]*>((&nbsp;)|(<br[^>]*>)|\s)*</p>((&nbsp;)|(<br[^>]*>)|\s)*`i', '', $in);

// Collapse a BR followed by any run of BR elements, &nbsp; entities, or whitespace into that first BR
$in = preg_replace('`(<br[^>]*>)((&nbsp;)|(<br[^>]*>)|\s)*`i', '\\1', $in);
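To see what the two passes do, here is a minimal sketch that wraps them in a function and runs them on a made-up sample. The function name collapse_breaks and the sample string are only assumptions for illustration; the two patterns are the ones given above, unchanged.

```php
<?php
// Hypothetical wrapper (the name is illustrative, not part of htmLawed).
function collapse_breaks(string $in): string
{
    // Pass 1: drop P elements holding only &nbsp;, BR, or whitespace,
    // plus any such filler immediately before or after them.
    $in = preg_replace('`((&nbsp;)|(<br[^>]*>)|\s)*<p[^>]*>((&nbsp;)|(<br[^>]*>)|\s)*</p>((&nbsp;)|(<br[^>]*>)|\s)*`i', '', $in);

    // Pass 2: keep the first BR of a run and discard the BRs, &nbsp;
    // entities, and whitespace that directly follow it.
    return preg_replace('`(<br[^>]*>)((&nbsp;)|(<br[^>]*>)|\s)*`i', '\\1', $in);
}

// Made-up sample input: one filler-only paragraph and one run of breaks.
$dirty = '<p>First</p><p>&nbsp;<br /> </p><p>Second<br><br />&nbsp;<br>Third</p>';

echo collapse_breaks($dirty), "\n";
// prints: <p>First</p><p>Second<br>Third</p>
```

Note that the second pass also strips whitespace and &nbsp; that directly follow a single BR, so "a<br> b" becomes "a<br>b". If that matters for your content, the pattern would need adjusting.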