I was thinking of something a little more robust. Tags should be stripped or converted to entities, but why not also kill everything that is not in rows 2 through 7 of the ASCII chart, excluding the DEL character:
http://en.wikipedia.org/wiki/File:ASCII_Code_Chart.svg
That would mean only allowing hex 20 through hex 7e (space through tilde). The only important things missing would be tabs and line breaks, which \s covers. So something like:
PHP Code:
$new_string = strip_tags(preg_replace("/[^\x20-\x7e\s]/", '', $body));
Notes:
- I used preg instead of ereg because the ereg functions have been deprecated (as of PHP 5.3).
- Unless you still need the original contents of $body, there's no reason to create a new variable; just overwrite it:
PHP Code:
$body = strip_tags(preg_replace("/[^\x20-\x7e\s]/", '', $body));
That would turn this:
into:
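To see the effect on a concrete string, here is a minimal sketch with a made-up input (not the original sample). The function name clean_text is hypothetical, chosen only for illustration; the pattern is the same one used above, written in single quotes so the escapes reach the regex engine untouched.

```php
<?php
// Hypothetical helper name; the pattern is the one from the post.
function clean_text(string $body): string
{
    // Drop every byte outside 0x20-0x7e unless it is whitespace (\s),
    // then remove whatever tags are left.
    $ascii_only = preg_replace('/[^\x20-\x7e\s]/', '', $body);
    return strip_tags($ascii_only);
}

// Made-up input: bold tags, an accented e and a copyright sign (both
// two bytes in UTF-8), a NUL byte and a tab.
$before = "<b>Caf\xC3\xA9</b> \xC2\xA9 2010\x00\tdone";
$after  = clean_text($before);

var_dump($after);
```

Note that this filter works byte by byte, so the accented letter is dropped outright rather than converted to a plain "e". If the text needs to keep non-English characters, this approach is too aggressive.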
It occurred to me that you can use the literal characters in the regex, so I did and made a working demo:
PHP Code:
<!DOCTYPE html>
<html>
<head>
<title>Strip Tags & Machine Code ('high' and 'low' ASCII)</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<?php
$str = "<span>Some stuff©</span>\r<div>More Stuff@stuff.com</div><script type='text/javascript'>
Some Destructive Javascript Code That Won't Run Without Script Tags
</script>";
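// Remove everything that is not printable ASCII (space through tilde) or whitespace, then strip tags.
$str = strip_tags(preg_replace("/[^ -~\s]/", '', $str));
// Escape what is left so a stray & or < cannot break the textarea markup.
echo '<textarea cols="50" rows="5">' . htmlspecialchars($str) . '</textarea>';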
?>
</body>
</html>
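The other option mentioned at the top, converting tags to entities instead of stripping them, could look roughly like this. It is only a sketch: escape_text is a hypothetical name, and it assumes the text should be shown literally rather than having its markup removed.

```php
<?php
// Hypothetical helper: same character filter as the demo, but tags are
// escaped so they display as text instead of being removed.
function escape_text(string $body): string
{
    $ascii_only = preg_replace('/[^ -~\s]/', '', $body);
    // ENT_QUOTES converts single quotes as well as double quotes.
    return htmlspecialchars($ascii_only, ENT_QUOTES, 'UTF-8');
}

echo escape_text("<script>alert('hi')</script>\xC2\xA9");
```

With this version the script tag shows up in the page as visible text, so a reader can still see what was submitted, but the browser will not run it.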