09-24-2006, 06:49 PM
I am working on a small comments feature for a site I'm developing. Right now it does absolutely no checks on the input, which is just plain stupid. I know I can use strip_tags to remove HTML, but I would like to allow the user a few HTML tags (b, strong, em, i, ul, ol, li, a; you get the idea). So my question is: how can I go about removing the attributes of these tags to prevent XSS?

09-24-2006, 09:51 PM
Well, you could use a search and replace to find any HTML tags, using wildcards with preg_replace, though I really don't understand how that works.
Then just check whether each one is an allowed tag.
Or you could do it the longer way without preg_replace: search for < and > and check what's in between.
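
For what it's worth, the "find each tag and check what's in the middle" idea might look roughly like the sketch below. The function name filter_tags and the tag list are made up for illustration, and `a` is left out on purpose because its href would need separate validation. Treat it as a sketch and not a complete XSS defence; regex-based filtering is easy to get wrong.

```php
<?php
// Hypothetical helper: keeps a few bare tags, drops every other tag,
// and escapes all remaining text so no stray "<" survives.
function filter_tags(string $input): string
{
    // Assumed whitelist; "a" is omitted because href needs its own checks.
    $allowed = ['b', 'strong', 'em', 'i', 'ul', 'ol', 'li'];

    // Split into alternating pieces: text (even index), "<...>" (odd index).
    $pieces = preg_split('~(<[^>]*>)~', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0) {
            // Plain text between tags: escape it.
            $out .= htmlspecialchars($piece, ENT_QUOTES, 'UTF-8');
            continue;
        }
        // A "<...>" chunk: re-emit it without attributes only if the
        // tag name is on the whitelist; otherwise drop it entirely.
        if (preg_match('~^<(/?)([a-z][a-z0-9]*)\b[^>]*>$~i', $piece, $m)
            && in_array(strtolower($m[2]), $allowed, true)) {
            $out .= '<' . $m[1] . strtolower($m[2]) . '>';
        }
    }
    return $out;
}
```

Because the allowed tags are rebuilt from just the tag name, any attributes (onclick, style and so on) are thrown away instead of being inspected one by one.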

The other option is using markup codes, like on this board. I'm coding some myself, and it's not that complex. However, verifying things like whether there's a closing tag is a bit annoying.

09-24-2006, 10:33 PM
You mean BBCode? I wrote a BBCode parser a long time ago that worked pretty well, but I would prefer to allow HTML. If you are trying to code a BBCode parser, it is probably best done with preg_replace/str_replace.
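
A minimal sketch of the preg_replace route, assuming only three paired codes are supported. The function name bbcode_to_html and the code-to-tag map are invented for the example. Escaping the input first means any raw HTML the user types is shown as text, and only the codes below turn into tags.

```php
<?php
// Hypothetical helper: converts a few paired BBCode tags to HTML.
function bbcode_to_html(string $input): string
{
    // Escape first, so user-supplied HTML is displayed and never rendered.
    $html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Assumed mapping of BBCode names to HTML tag names.
    $map = ['b' => 'strong', 'i' => 'em', 'u' => 'u'];

    foreach ($map as $code => $tag) {
        // Only a complete [code]...[/code] pair is converted; an opening
        // code without its closing partner is left as literal text.
        $html = preg_replace(
            '~\[' . $code . '\](.*?)\[/' . $code . '\]~is',
            '<' . $tag . '>$1</' . $tag . '>',
            $html
        );
    }
    return $html;
}
```

Matching whole pairs sidesteps most of the closing-tag checking, though nesting the same code inside itself is not handled by a pattern this simple.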

09-24-2006, 10:52 PM
Use preg_replace() to "whitelist" the allowed tags by swapping them for placeholder strings, call htmlentities() on the data, then replace the placeholders with the real tags, angle brackets restored.
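
One possible variation on this, sketched under a few assumptions: it escapes everything first and then restores only bare, attribute-free whitelisted tags, which avoids having to invent placeholder strings. The function name sanitize_comment and the tag list are illustrative only, and `a` is left out because a link needs its href validated separately.

```php
<?php
// Hypothetical helper: escape everything, then un-escape a short
// whitelist of tags that carry no attributes.
function sanitize_comment(string $input): string
{
    // Assumed whitelist of tags allowed without attributes.
    $allowed = ['b', 'strong', 'em', 'i', 'ul', 'ol', 'li'];

    // Step 1: every < > & and quote becomes an entity.
    $escaped = htmlentities($input, ENT_QUOTES, 'UTF-8');

    // Step 2: turn "&lt;tag&gt;" and "&lt;/tag&gt;" back into real tags.
    // A tag written with attributes never matches, so it stays escaped.
    $pattern = '~&lt;(/?)(' . implode('|', $allowed) . ')&gt;~i';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            return '<' . $m[1] . strtolower($m[2]) . '>';
        },
        $escaped
    );
}
```

A tag written with attributes, such as a b tag carrying an onclick, is shown to the reader as literal text instead of being cleaned up, so its closing tag can end up unbalanced. Whether that is acceptable depends on the site.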