XtremeGamer99
07-27-2007, 08:27 AM
Okay, I'm making a class that parses user input for allowed HTML tags and whatnot.
It's coming along nicely, but I'm having trouble with something... First things first, here's the code. Forgive me, it comes packed with junk for testing purposes (the $first, $second, etc. properties in the class; the bit at the bottom; and all that commenting). I figured I'd bloat it a bit so that people would know what was happening and whatnot. >_>
<?php
class parse {
/*
Very Messy. I wanted to make it example-ish fast.
*/
private $final = null;
// steps. There's a better way to do this, but I'm tired and I want to do this quickly.
// Using this for demo purposes only.
public $first;
public $second;
public $third;
public $fourth;
// maybe make it public. Not sure yet... =/
private static $allowed = array('b', 'i', 'a', 'strong', 'em', 'pre', 'code', 'sup', 'sub');
function __construct($input) {
if(!is_array(self::$allowed)) {
// throw an error later
}
/*
This creates something like this:
(?:b|i|a|strong|em|pre|code|sup|sub)
*/
$tags = '(?:'.implode('|', self::$allowed).')';
$this->first = $tags;
// Replace valid tags so that they have a special character before and after.
// Note: attributes on valid tags (the "( [^>]*)" part) are passed through unfiltered.
/*
This makes something like this: ([] = special char, for display purposes >.>; <bad> = non-valid tag)
blah []<b>[]blah[]</b>[] <bad>blah</bad> blah
*/
$temp = preg_replace('#(</?'.$tags.'( [^>]*)?>)#i', chr(0).'\1'.chr(0), trim($input));
$this->second = $temp;
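// Use that special char to mark 'points', separating tags and content. Makes an array.
// Only 'valid' tags will be 'wrapped' in these points, so invalid tags are treated as 'content'.
// From what I've tested so far, valid tags are _always_ odd-indexed, while content is _always_ even-indexed.
// (That follows from each valid tag getting exactly one marker on each side; two adjacent tags just
// leave an empty string between them. It only breaks if the input itself contains chr(0).)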
/*
This makes it look like this:
Array
(
[0] => blah
[1] => <b>
[2] => blah
[3] => </b>
[4] => <bad>blah</bad> blah
)
*/
$temp = explode(chr(0), $temp);
$this->third = $temp;
// Convert all 'content', then skip the next index (the valid tags)
/*
This converts tags such as <bad> into entities.
*/
for($i = 0, $n = count($temp); $i < $n; $i += 2) {
$temp[$i] = htmlspecialchars($temp[$i], ENT_QUOTES); }
$this->fourth = $temp;
// Put it all back together.
$temp = implode('', $temp);
$this->final = $temp;
}
function show() {
// bad function name. Oh well...
return $this->final; }
}
// Let's try it
$html = <<<HTML
blah <b>blah</b> <bad>blah</bad> blah
HTML;
$parse = new parse($html);
echo
"<html>
<body>
<p>This is the original HTML:</p>\n".
htmlspecialchars($html).
"<p>This is what it looks like parsed:</p>\n".
$parse->show().
"<br /><br /><p>Here's some more info.</p>\n".
// the pattern itself contains '<', so escape it before printing
"<p><b>Reg Ex Pattern:</b> <code>".htmlspecialchars("'#(</?{$parse->first}( [^>]*)?>)#i'")."</code></p>\n".
"<p>After special char insertion (they don't show up correctly, but whatever, they're not supposed to. >_>. Also, <bad></bad> doesn't display, but it shows up in the source):</p>\n".
$parse->second.
"<p>The array that you get when it explodes... (again, <bad></bad> only shows up in the source)</p>\n<pre>\n";
// print_r() prints by itself; "echo print_r(...)" would also print its return value (a stray "1")
print_r($parse->third);
echo "</pre>\n<p>and finally, what you get after the conversion of the content:</p>\n<pre>\n";
print_r($parse->fourth);
echo "</pre>\n".
"<p>This is a very messy example, but whatever, it works.</p>
</body>
</html>";
?>
Right. So what that does is separate the 'valid' tags from the content in $input, then run htmlspecialchars() on the content. It works really well for me.
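For comparison, here is a minimal sketch of the same split-and-escape idea using preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag in place of the chr(0) markers (a swapped-in technique, not what the class above does). With one capturing group around the whole tag, preg_split() returns each matched tag between the pieces of content, so tags land at odd indexes and content at even ones, empty strings included. split_tags() is a hypothetical helper name; the attribute group is made non-capturing so it doesn't add extra array entries. A side effect is that a NUL byte inside the user's input no longer throws the indexes off.

```php
<?php
// Hypothetical helper: split $input into alternating content/tag tokens.
// preg_split() with PREG_SPLIT_DELIM_CAPTURE also returns whatever the single
// capturing group matched, so valid tags end up at odd indexes and content
// (including empty strings between adjacent tags) at even indexes.
function split_tags($input, array $allowed) {
    $tags = '(?:' . implode('|', $allowed) . ')';
    // the attribute part is non-capturing so it doesn't become an extra token
    return preg_split('#(</?' . $tags . '(?: [^>]*)?>)#i', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
}

$allowed = array('b', 'i', 'a', 'strong', 'em', 'pre', 'code', 'sup', 'sub');
$tokens  = split_tags('blah <b>blah</b> <bad>blah</bad> blah', $allowed);

// Same loop as in the class: escape content, leave valid tags alone.
for ($i = 0, $n = count($tokens); $i < $n; $i += 2) {
    $tokens[$i] = htmlspecialchars($tokens[$i], ENT_QUOTES);
}
echo implode('', $tokens);
// prints: blah <b>blah</b> &lt;bad&gt;blah&lt;/bad&gt; blah
```

For input that starts with a tag, such as '<b><i>x</i></b>', you get 9 tokens, with '' at index 0 and between the adjacent tags, so the odd/even rule still holds.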
My problem:
I would like HTML tags inside a <pre> tag to be converted to special chars as well. At the moment, valid tags inside a <pre> block are left alone, so the code inside the <pre> comes out formatted (a <b> actually makes the text bold). Since I'm mostly using <pre> for posting code, I need it to show the tags literally instead. To see what I mean, copy the code to a file on your local server and change $html around a bit.
I also need it to be flexible. I was thinking about a switch statement inside the for(): if the tag is '<pre>', '<php>' (a custom tag, of course), or '<pre class="blah">', have it do something other than the default. For example, for <pre>, run htmlspecialchars() on everything inside; for <php>, make a <pre> with syntax highlighting; etc.
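To keep one loop instead of one function per special case, the per-tag behaviour could live in a lookup table of callbacks, as in the sketch below. Everything here is an assumption for illustration: $handlers, tag_name() and render() are hypothetical names, <php> would also have to be added to $allowed, blocks of the same special tag are assumed not to nest, and the anonymous functions need PHP 5.3 or later. highlight_string() with its second argument set to true returns the highlighted HTML instead of printing it; the exact wrapper markup it produces differs between PHP versions, so the 'php' handler returns it as-is rather than adding another <pre>.

```php
<?php
// Hypothetical table of special tags: tag name => callback($openTag, $rawInner).
// Each callback gets the raw text between the opening and closing tag.
$handlers = array(
    // <pre ...>: keep the opening tag, escape everything inside (valid tags too)
    'pre' => function ($open, $inner) {
        return $open . htmlspecialchars($inner, ENT_QUOTES) . '</pre>';
    },
    // <php>: hypothetical custom tag; highlight_string(..., true) returns
    // escaped, coloured HTML (its wrapper markup varies by PHP version)
    'php' => function ($open, $inner) {
        return highlight_string($inner, true);
    },
);

// Lower-case tag name of a tag token, e.g. '<PRE class="x">' gives 'pre'.
function tag_name($token) {
    return preg_match('#^</?([a-z]+)#i', $token, $m) ? strtolower($m[1]) : '';
}

// Walk the tokens once: content (even indexes) is escaped, ordinary valid
// tags pass through, and special opening tags hand their block to a callback.
function render(array $tokens, array $handlers) {
    $out = '';
    for ($i = 0, $n = count($tokens); $i < $n; $i++) {
        if ($i % 2 === 0) {
            $out .= htmlspecialchars($tokens[$i], ENT_QUOTES);
            continue;
        }
        $name = tag_name($tokens[$i]);
        if ($tokens[$i][1] === '/' || !isset($handlers[$name])) {
            $out .= $tokens[$i];
            continue;
        }
        // Collect the raw tokens up to the matching closing tag.
        $inner = '';
        for ($j = $i + 1; $j < $n; $j++) {
            if ($j % 2 === 1 && $tokens[$j][1] === '/' && tag_name($tokens[$j]) === $name) {
                break;
            }
            $inner .= $tokens[$j];
        }
        $handler = $handlers[$name];
        $out .= $handler($tokens[$i], $inner);
        $i = $j; // resume after the closing tag (or stop if there wasn't one)
    }
    return $out;
}

$allowed = array('b', 'i', 'pre', 'php');
$pattern = '#(</?(?:' . implode('|', $allowed) . ')(?: [^>]*)?>)#i';
$tokens  = preg_split($pattern, 'a <pre><b>x</b></pre> <b>y</b>', -1, PREG_SPLIT_DELIM_CAPTURE);
echo render($tokens, $handlers);
// prints: a <pre>&lt;b&gt;x&lt;/b&gt;</pre> <b>y</b>
```

Adding another special tag is then just another entry in $handlers; the loop itself doesn't change.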
I just don't know how I would do this. Any thoughts? So far I've come up with this, but it doesn't get me anywhere:
Replace the current for() with this:
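// $temp[0] is the content before the first tag, so it has to be escaped up front
$temp[0] = htmlspecialchars($temp[0], ENT_QUOTES);
for($i = 1, $n = count($temp); $i < $n; $i++) {
switch($temp[$i]) {
// exact string match: '<PRE>' or '<pre class="blah">' won't land here
case '<pre>':
//do something that will make everything inside <pre></pre> special chars
break;
default:
// move on to the content after this tag and escape it
$i++;
$temp[$i] = htmlspecialchars($temp[$i], ENT_QUOTES);
}
}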
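One way the switch idea above could be finished, sketched as a standalone function (convert() is a hypothetical name, not from the class). It assumes $temp is the exploded array from the class (content at even indexes, valid tags at odd ones), switches on the lower-cased tag name rather than the whole tag string so that '<PRE>' and '<pre class="blah">' reach the same case, and for 'pre' escapes every token, valid tags included, until the matching </pre>. Nested <pre> blocks are not handled.

```php
<?php
// Hypothetical function finishing the switch idea. $temp is the exploded
// array from the class: content at even indexes, valid tags at odd indexes.
function convert(array $temp) {
    // $temp[0] is the content before the first valid tag
    $out = htmlspecialchars($temp[0], ENT_QUOTES);
    $n = count($temp);
    for ($i = 1; $i < $n; $i += 2) {
        // Tag name with its optional slash, lower-cased: '<PRE>' and
        // '<pre class="blah">' give 'pre', a closing '</pre>' gives '/pre'.
        preg_match('#^<(/?[a-z]+)#i', $temp[$i], $m);
        switch (strtolower($m[1])) {
            case 'pre':
                $out .= $temp[$i];
                // Escape every token, valid tags included, until the matching </pre>
                for ($j = $i + 1; $j < $n; $j++) {
                    if ($j % 2 === 1 && preg_match('#^</pre\b#i', $temp[$j])) {
                        $out .= $temp[$j]; // keep the real closing tag
                        break;
                    }
                    $out .= htmlspecialchars($temp[$j], ENT_QUOTES);
                }
                // Escape the content right after </pre>, then resume from there
                if ($j + 1 < $n) {
                    $out .= htmlspecialchars($temp[$j + 1], ENT_QUOTES);
                }
                $i = $j;
                break;
            default:
                // Ordinary valid tag: keep it, escape the content that follows
                $out .= $temp[$i] . htmlspecialchars($temp[$i + 1], ENT_QUOTES);
        }
    }
    return $out;
}

// Build $temp the same way the class does (chr(0) markers, then explode).
$allowed = array('b', 'i', 'a', 'strong', 'em', 'pre', 'code', 'sup', 'sub');
$tags    = '(?:' . implode('|', $allowed) . ')';
$input   = 'x <pre class="blah"><b>bold?</b></pre> <b>y</b>';
$temp    = explode(chr(0), preg_replace('#(</?' . $tags . '( [^>]*)?>)#i', chr(0) . '\1' . chr(0), $input));
echo convert($temp);
// prints: x <pre class="blah">&lt;b&gt;bold?&lt;/b&gt;</pre> <b>y</b>
```

Because a stray '</pre>' yields '/pre', it falls through to the default case and is passed along like any other closing tag.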
I don't know where to go from there.
At the moment, I have a preg_replace_callback() call in place to match all <pre> blocks. They go through a function that's basically the same as the one above, except that instead of leaving valid tags alone, it converts them too. It works, but I would rather have everything done in __construct() (with a bit of a change), especially since it's the same function... (I can post that function if it helps.) I'd just rather not have one function for every 'special case' I have. <_<
I probably confused everyone with that, so if you have any questions, I'll gladly answer them.
And any help would be greatly appreciated. =D