> In particular, I have a form with a textarea that I wish to allow *some*
> HTML tags (strong,em,a href,img,etc.) - such as used in many blogging apps.
> I'm searching for a methodology to check input for possible maliciousness.

Don't use the "check for known badness; if none is found, let the message through unaltered" approach. As well as having to know every way script can be inserted into a page, you would have to know the parsing bugs of all the major browsers that can cause malformed input, which your checks couldn't detect, to be interpreted as malicious code. (For example, disallowing "<script" fails to take into account that IE will happily parse "<[ASCII 0 character]script" as a script tag.)

I've been collecting examples of JavaScript injection techniques, which I can post if anyone's interested. But the point is that these sorts of bugs have been found again and again in every application that passes user markup straight through, including every webmail provider and bulletin board system. You just can't make this approach secure without infinite debugging time. It's better to parse the input yourself and then write it back out through a process that you know cannot generate malicious or malformed code.

> Any suggestions? UBB type markup? Regex? Other?

Easiest is probably to parse the input with a standard XML parser, remove all but a small number of allowed element and attribute names, then write out the result with a standard XML serialiser. That way stray quotes, ampersands and the like get escaped correctly, and characters that XML doesn't allow at all (such as most control characters) are rejected by the parser, rather than causing potential security problems. Of course, this requires XHTML to be used everywhere (though you could conceivably use HTML Tidy as an input stage, so users don't have to enter well-formed markup).

Alternatively, invent your own noddy markup language which you can parse and then output with proper escaping, so that user input such as '<' and '&' is never echoed directly to the browser.
Here you can start simple, with plain text where each newline starts a new paragraph, and then add just the features that are needed. UBB started out like this, but got a bit carried away and added every conceivable feature *and HTML markup as well*, which isn't brilliant for security.

Other issues to look out for:

- URIs. Only ever allow a few known-good URI schemes such as http, https and ftp. Don't attempt to just detect and disallow the known-bad ones like javascript:, as there are more of them than you think and many ways to obfuscate them.
- Unicode. IE and old versions of Opera accept invalid UTF-8 sequences, so a user can submit byte 0xC0 followed by byte 0xBC (an overlong encoding of '<') and get a '<' past many naive filters. Ideally your server environment should use Unicode strings for everything internally (I don't know whether that's the case with ASP), so that the invalid sequence is caught when the input is decoded; otherwise you'd have to check for such sequences manually.

Allowing user input securely can be pretty hard. It doesn't help that 90% of webapp example code doesn't even try to get it right.
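The parse, filter and re-serialise approach described above can be sketched with Python's standard xml.etree.ElementTree (Python rather than ASP, purely for illustration). This is a minimal sketch, not a vetted sanitiser: the whitelist in ALLOWED_ELEMENTS, the scheme list in ALLOWED_SCHEMES and the choice to keep the text of removed elements are assumptions, and the HTML Tidy input stage is not shown. Wrapping the input in a <div> lets a fragment parse as a document (and leaves no place for a DOCTYPE with entity declarations); input that isn't well-formed XML raises ET.ParseError. href and src values also go through the scheme whitelist from the URI point.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Assumed whitelist: element name -> attributes permitted on that element.
ALLOWED_ELEMENTS = {
    "strong": set(),
    "em": set(),
    "a": {"href"},
    "img": {"src", "alt"},
}
URI_ATTRIBUTES = {"href", "src"}             # values that must be URIs
ALLOWED_SCHEMES = {"http", "https", "ftp"}   # assumed known-good schemes


def _uri_ok(value: str) -> bool:
    """Accept only absolute URIs whose scheme is on the whitelist."""
    try:
        scheme = urlsplit(value.strip()).scheme.lower()
    except ValueError:  # e.g. a malformed IPv6 host
        return False
    return scheme in ALLOWED_SCHEMES


def _append_text(parent: ET.Element, text) -> None:
    """Append text after parent's last child, or inside parent if it has none."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy_children(src: ET.Element, dst: ET.Element) -> None:
    """Copy src's children into dst, keeping only whitelisted markup."""
    for child in src:
        if child.tag in ALLOWED_ELEMENTS:
            new = ET.SubElement(dst, child.tag)
            for name, value in child.attrib.items():
                if name not in ALLOWED_ELEMENTS[child.tag]:
                    continue  # drop onclick, style, etc.
                if name in URI_ATTRIBUTES and not _uri_ok(value):
                    continue  # drop javascript:, data:, relative URIs, etc.
                new.set(name, value)
            new.text = child.text
            _copy_children(child, new)
            new.tail = child.tail
        else:
            # Disallowed element: discard the tag but keep its text, which
            # the serialiser will escape as plain text.
            _append_text(dst, child.text)
            _copy_children(child, dst)
            _append_text(dst, child.tail)


def sanitise(user_input: str) -> str:
    """Parse, filter and re-serialise; raises ET.ParseError if malformed."""
    src = ET.fromstring("<div>" + user_input + "</div>")
    out = ET.Element("div")
    out.text = src.text
    _copy_children(src, out)
    return ET.tostring(out, encoding="unicode")
```

For example, sanitise('<strong>hi</strong> <script>alert(1)</script>') returns '<div><strong>hi</strong> alert(1)</div>': the script element is gone and what's left of it is inert text.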
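For the home-grown markup route, the simplest starting point mentioned above (plain text, each newline starting a new paragraph) might look something like this Python sketch. The name render_plain_text and the decision to strip C0 control characters are assumptions; the property that matters is that every piece of user text passes through html.escape before it can reach the browser, so further features can be layered on top without ever echoing raw '<' or '&'.

```python
import html
import re

# Assumption: C0 control characters (except tab, LF, CR) and DEL are dropped.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def render_plain_text(text: str) -> str:
    """Escape user text and turn each non-blank line into a <p> element."""
    text = _CONTROL_CHARS.sub("", text)
    lines = (line.strip() for line in text.splitlines())
    # html.escape converts &, <, > and both quote characters to entities.
    return "\n".join("<p>" + html.escape(line) + "</p>" for line in lines if line)
```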
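On the Unicode point, one way to get the "caught when the input is decoded" behaviour is to decode the raw request bytes strictly before doing anything else with them. A Python sketch, where decode_user_input is a hypothetical helper that returns None for invalid input rather than guessing: Python's UTF-8 codec rejects overlong forms such as 0xC0 0xBC, so they never turn into a '<' further down the line.

```python
from typing import Optional


def decode_user_input(raw: bytes) -> Optional[str]:
    """Strictly decode request bytes as UTF-8; return None if invalid.

    Overlong sequences such as b"\\xc0\\xbc" (an overlong '<') are invalid
    UTF-8 and make the strict codec raise UnicodeDecodeError."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

A request that fails to decode can then be refused outright, before any filtering or escaping is attempted.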