1 (edited by gmf 2013-02-24 07:32:14)

Topic: Is it possible to filter email address from text with htmLawed?

I would like to filter email addresses from my users' posts.

E.g. something like this: "If you know an answer, just write me an email to mike@example.com" should become "email to ***" (or the address should just be removed).

In the documentation I found the option anti_mail_spam, but as I understand it, this only checks the href attribute. I need to check the entire text instead.

I would be glad to get a how-to-do-it on that :)

2

Re: Is it possible to filter email address from text with htmLawed?

Perhaps running the text, after the htmLawed filtering step, through a simple find-replace that uses a regular expression will be best. Example code:

// Text is run through htmLawed
$text = htmLawed($text, ...);

// Replace email addresses that are not in an 'href' attribute value.
// Group 1 captures the character before the address (anything but ':').
$emailAddress_pattern = '`(^|[^:])\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b`i';
$emailAddress_replacement = '${1}***'; // '${1}' puts the captured character back; use '${1}' alone to just remove the address

if(strpos($text, '@') !== false){ // skip preg_replace if the text contains no '@'
 $text = preg_replace($emailAddress_pattern, $emailAddress_replacement, $text);
}

The regex pattern above is based on http://www.regular-expressions.info/email.html, with the (^|[^:]) prefix added to exclude addresses that follow 'mailto:' in href attribute values. You can modify/test the pattern online at http://www.regextester.com.
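
To see how the (^|[^:]) prefix behaves, here is a minimal, self-contained sketch. The function name mask_email_addresses and its $mask parameter are hypothetical (they are not part of htmLawed), and the input is assumed to have already been run through htmLawed. It uses preg_replace_callback so that the character captured in front of the address is put back before the mask:

```php
<?php
// Hypothetical helper, not part of htmLawed: masks email addresses in
// already-filtered text while leaving 'mailto:' href values untouched.
function mask_email_addresses($text, $mask = '***')
{
    // Cheap check first: without an '@' there is nothing to replace.
    if (strpos($text, '@') === false) {
        return $text;
    }

    // Group 1 is the character before the address (or empty at the very
    // start of the text). It may not be ':', so 'mailto:...' is skipped.
    $pattern = '`(^|[^:])\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b`i';

    // Re-emit the captured character, then the mask in place of the address.
    return preg_replace_callback(
        $pattern,
        function ($m) use ($mask) {
            return $m[1] . $mask;
        },
        $text
    );
}

echo mask_email_addresses('Write me an email to mike@example.com'), "\n";
// prints: Write me an email to ***

echo mask_email_addresses('<a href="mailto:mike@example.com">contact</a>'), "\n";
// prints: <a href="mailto:mike@example.com">contact</a>
```

Note that {2,4} limits the top-level domain to four letters, so an address such as john@example.museum is not matched; widening it to {2,} is one possible adjustment if you need to catch longer domains.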

3

Re: Is it possible to filter email address from text with htmLawed?

Thanks for your help, I will give it a try :)