Pantek Library

Re: Preventing cross site scripting

From: Tim Greer <chatmaster(at)charter.net>
Date: Fri Jun 20 2003 - 17:49:58 EDT

----- Original Message -----
From: "Laurian Gridinoc" <laur@grapefruitdesign.com>
Subject: Re: Preventing cross site scripting

...
>
> > Again, interesting idea, but I don't see the advantage for me
> > personally.
>
> I replied to your message but in the context of the thread starter
> message - filtering html; and doing it by treating html as a language
> rather than just text.

But you can't. You have to look at it as text and determine which characters will be dangerous. HTML is only a markup language; there are no dictionary-type matches. You would also end up with a very large index if you attempted to determine everything that was valid. That is okay, and reasonable if done properly, but it is not the problem. The problem is XSS: someone can insert characters or values into otherwise valid HTML tags to cause the problem. Barring a lot of static assumptions and, basically, a huge whitelist, the only way to determine whether a tag is valid and safe would be to simply strip out, or refuse to render, any HTML tag that contains a character that could be used to insert something and create an XSS attack.
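A minimal Python sketch of that "refuse to render" rule, under stated assumptions: the set of risky characters, the short list of allowed tag names and the helper names tag_is_safe and render_conservatively are all hypothetical, not something specified in this thread. The tag-name list is still needed because a bare <script> contains no risky characters at all.

```python
import html
import re

# Anything shaped like a tag: "<", no nested angle brackets, then ">".
TAG_RE = re.compile(r"<[^<>]*>")

# Hypothetical policy: characters that could break out of an attribute or
# start script/URL syntax. A tag containing any of them is not rendered.
RISKY_CHARS = set("\"'`()&;:\\")

# Hypothetical short list of renderable tag names, so that <script> or
# <iframe> is refused even though it contains no risky characters.
ALLOWED_TAGS = {"b", "i", "u", "p", "br", "font"}

NAME_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")
# Event-handler attributes (onclick=, onmouseover=, ...) after a space or "/".
EVENT_ATTR_RE = re.compile(r"[\s/]on[a-z]*\s*=", re.IGNORECASE)


def tag_is_safe(tag: str) -> bool:
    """Return True only for allowed tags built from non-risky characters."""
    if any(ch in RISKY_CHARS for ch in tag):
        return False
    m = NAME_RE.match(tag)
    if m is None or m.group(1).lower() not in ALLOWED_TAGS:
        return False
    return EVENT_ATTR_RE.search(tag) is None


def render_conservatively(text: str) -> str:
    """Keep safe tags as markup; escape suspicious tags and all other text."""
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        tag = m.group(0)
        out.append(tag if tag_is_safe(tag) else html.escape(tag))
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

A refused tag is escaped, so it shows up as visible text rather than vanishing; returning "" for it instead would strip it out entirely.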

In reality, only so many HTML tags would allow someone to do this. For the ones that do, any attribute and value can appear in any combination within the tag and still be valid, so they require some very specific checks, and some should simply be denied because they would be too open to faults. Anyway, like you said, people send email in HTML (personally, I would either not render any email with HTML or render only safe tags, and screw the people who want to send HTML-ized email), so it can get rather involved, unless you simply remove those 4 or 6 vital characters from within a specific tag that could cause the problem.
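A minimal sketch of neutralizing the vital characters inside one specific tag. The post does not list which "4 or 6" characters it means, so VITAL_CHARS below is an assumption, and strip_vital and font_tag are hypothetical names. Two variants are shown: removing the characters, as described above, and escaping them with Python's html.escape, which is a swapped-in alternative.

```python
import html

# Hypothetical choice of "vital" characters: ones that can end a quoted
# attribute value or start a new tag or entity, plus the backtick as an
# extra precaution.
VITAL_CHARS = set("<>\"'&`")


def strip_vital(value: str) -> str:
    """Removal variant: drop every vital character from an untrusted value."""
    return "".join(ch for ch in value if ch not in VITAL_CHARS)


def font_tag(color: str) -> str:
    """Escaping variant: html.escape(..., quote=True) turns &, <, >, " and '
    into entities, so the value cannot close the quoted attribute."""
    return f'<font color="{html.escape(color, quote=True)}">'
```

Neither variant makes URL-valued attributes such as href safe, because a javascript: URL needs none of these characters; that is where the "very specific checks" are still required.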

And why would someone need those characters in a tag anyway? You can check all of this and allow the special characters only in the places where they must be valid, even in a string with multiple single or double quotes. It's just as effective and much simpler this way. Text is what creates the markup language, after all, so you can't treat it only as a language and be safe. Otherwise you are going to have to do a lot more work, and modify your filter for each newly implemented tag in D?HTML, as well as for anything that could be an *XML, PHP, etc. type of tag.
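A rough sketch of "special characters only in valid places", assuming the only valid place for a quote is as the delimiter around an attribute value. The tag and attribute allowlists and the helper names attrs_well_placed and tag_allowed are hypothetical. A tag with several quoted values, in either quote style, passes; a stray extra quote or an unknown attribute fails.

```python
import re

# Hypothetical allowlists; neither comes from the thread.
ALLOWED_TAGS = {"b", "i", "p", "br", "font"}
ALLOWED_ATTRS = {"face", "size", "color"}

# A start or end tag split into its name and the raw attribute text.
TAG_RE = re.compile(r"</?(?P<name>[A-Za-z]+)(?P<attrs>[^<>]*)>")

# One attribute: whitespace, name, "=", then a value. A quote is accepted only
# as the delimiter around a value; no quote, angle bracket, "&" or backtick
# may appear inside one.
ATTR_RE = re.compile(
    r"""\s+(?P<name>[A-Za-z]+)=(?:"[^"'<>&`]*"|'[^"'<>&`]*'|[A-Za-z0-9#]+)"""
)


def attrs_well_placed(attrs: str) -> bool:
    """Accept attribute text only if it is a run of allowed name=value pairs."""
    pos = 0
    while True:
        m = ATTR_RE.match(attrs, pos)
        if m is None:
            # Whatever is left may only be whitespace or a self-closing "/".
            return attrs[pos:].strip() in ("", "/")
        if m.group("name").lower() not in ALLOWED_ATTRS:
            return False
        pos = m.end()


def tag_allowed(tag: str) -> bool:
    m = TAG_RE.fullmatch(tag)
    return (
        m is not None
        and m.group("name").lower() in ALLOWED_TAGS
        and attrs_well_placed(m.group("attrs"))
    )
```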

> > Whatever works, works though. Also, regexes don't have to be written on one
> > line. In Perl, for example, simply use the /x modifier and you can break it up
> > to be very readable.
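The /x point, illustrated with Python's counterpart flag, re.VERBOSE: the same tag-matching pattern written once on a single line and once spread out with comments. It is only a readability illustration, not a filter, and the names COMPACT and READABLE are made up for this example.

```python
import re

# One-line form: a start tag with optional name=value attributes.
COMPACT = re.compile(r'<([A-Za-z][A-Za-z0-9]*)((?:\s+[A-Za-z]+=(?:"[^"<>]*"|[A-Za-z0-9]+))*)\s*/?>')

# The same pattern with re.VERBOSE: whitespace outside character classes is
# ignored and "#" starts a comment, much like Perl's /x.
READABLE = re.compile(
    r"""
    <
    ([A-Za-z][A-Za-z0-9]*)      # group 1: tag name
    (                           # group 2: every attribute, as raw text
      (?:
        \s+ [A-Za-z]+ =         # whitespace, attribute name, "="
        (?: "[^"<>]*"           # a double-quoted value with no quote or angle bracket inside
          | [A-Za-z0-9]+        # or a bare alphanumeric value
        )
      )*
    )
    \s* /? >                    # optional whitespace, optional "/", then ">"
    """,
    re.VERBOSE,
)
```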
>
> I wasn't aware of it.
>
> > Nonetheless, if you develop anything along the lines you
> > speak, please let me know, I'd like to check it out and what you're
> > doing.
>
> Yup, I'm already using stuff like what I posted in production, whitelist
> style, to validate/clean HTML input. I have a WYSIWYG editor (MSIE
> IFRAME in edit mode) which outputs extremely ugly/bad HTML (combine it
> with a copy/paste from MS Word and you get something extremely loaded
> with custom MS style definitions); all of this I have to clean according to
> a whitelist.
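A minimal sketch of that kind of whitelist cleaning, using Python's standard html.parser module rather than regexes. The WHITELIST contents, the class name WhitelistCleaner and the clean() helper are hypothetical; real editor output would need a longer list. Unknown tags such as Word's <o:p> disappear, class and style attributes are dropped, and script/style contents are discarded.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attributes allowed to survive.
WHITELIST = {
    "p": set(), "br": set(), "b": set(), "i": set(), "u": set(),
    "ul": set(), "ol": set(), "li": set(),
    "font": {"size", "color"},
}


class WhitelistCleaner(HTMLParser):
    """Rebuild HTML keeping only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        allowed = WHITELIST.get(tag)
        if allowed is None:
            return  # unknown tag (e.g. "o:p"): drop it, keep its text
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs if name in allowed
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if tag in WHITELIST and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments (including Word's conditional comments) and processing
    # instructions fall through to HTMLParser's no-op defaults and are dropped.


def clean(fragment: str) -> str:
    parser = WhitelistCleaner()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```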
>
> When I have more examples I'll post them to the list.
>
>
> Cheers,
> --
> Laurian Gridinoc
> Chief Developer
Received on Sat Jun 21 22:27:29 2003

This archive was generated by hypermail 2.1.8 : Wed Aug 23 2006 - 14:07:53 EDT

