Pantek Library

Re: Preventing cross site scripting

From: Laurian Gridinoc <laur(at)grapefruitdesign.com>
Date: Fri Jun 20 2003 - 17:55:18 EDT

On Fri, 2003-06-20 at 20:11, Tim Greer wrote:
> Please provide some examples of this. I'd like to see your idea(s) at work
> and how it would solve this problem. I'm honestly not quite clear on the
> context in which you mean this to solve this problem and I'm interested in
> knowing. I'm not sure I agree right now, so some examples illustrating it
> would be great--if you'd be so kind. Thanks.

This thread started with `how to safely export HTML mail messages to the web'.
That may require dealing with some of the following issues:

  1. broken markup (<ni <foo href="a"d"" bar='> baz> &quot no semicolon)
  2. unacceptable entities
  3. unacceptable tags (applet, object)
  4. unacceptable attributes on acceptable tags (onmouseover, ...)
  5. unacceptable attribute values (href="javascript:...", width="100000")
  6. unacceptable text tokens (offensive words)

I suggest dealing with them in the stated order, and not treating the HTML as a mere string: dissect it into markup and content, clean the markup first (elements, then the attributes of the accepted elements), and then clean the text.

[1] is solved nicely by filtering the input through tidy with XML (XHTML) output; that output is the data for the next steps.

The rest of the issues can be handled by an XSL transformation on the XML generated above.

[2] With a proper DTD you can alter the `rendering' of any unaccepted entity. Say I want to change &Acirc; (capital A, circumflex accent) to a plain capital A: I simply define it that way in the DTD: <!ENTITY Acirc "A">

Note that &lt;, &gt;, &amp; and &quot; cannot be handled this way, since XML predefines them.
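
As a rough illustration of the entity trick outside XSLT, here is a minimal sketch using Python's standard library (my choice for the example, not something the thread uses). It assumes the fragment is already well-formed (e.g. tidy output) and that prepending an internal DTD subset is acceptable; the helper name `parse_with_overrides` and the single `Acirc` mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset that maps the unwanted entity to a plain ASCII letter.
# The entity name and its replacement text are only illustrative.
DOCTYPE = '<!DOCTYPE p [ <!ENTITY Acirc "A"> ]>'

def parse_with_overrides(xhtml_fragment: str) -> ET.Element:
    """Parse a well-formed fragment with the entity override in scope."""
    return ET.fromstring(DOCTYPE + xhtml_fragment)

root = parse_with_overrides("<p>&Acirc;ngstrom &lt;b&gt;</p>")
print(root.text)
```

The parser expands &Acirc; to the replacement text from the DTD, while the predefined &lt; and &gt; keep their usual meaning.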

[3] Unacceptable tags. It is preferable to use whitelists, but let's look at a blacklist solution first:

<!-- drop script silently -->
<xsl:template match="script" />

<!-- or drop script and leave a note -->
<xsl:template match="script">
    <xsl:comment>here was an evil script</xsl:comment>
</xsl:template>

<!-- drop applet, preserving its content (e.g. the `backup' markup for user agents that don't understand the applet tag) -->
<xsl:template match="applet">
    <xsl:apply-templates />
</xsl:template>

<!-- and accept everything else, since this is a blacklist solution -->
<xsl:template match="*|@*|text()|comment()">
    <xsl:copy>
        <xsl:apply-templates select="*|@*|text()|comment()" />
    </xsl:copy>
</xsl:template>

The whitelist solution would match only accepted tags:

<!-- accept only p, ul, li and attributes on them (and text nodes too, and comments) -->
<xsl:template match="p|ul|li|@*|text()|comment()">
    <xsl:copy>
        <xsl:apply-templates select="*|@*|text()|comment()" />
    </xsl:copy>
</xsl:template>
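
The same whitelist idea can be sketched as a plain tree walk. This is only an illustration in Python's standard library, assuming well-formed input without namespaces (real tidy output may carry the XHTML namespace, which this sketch ignores); `ALLOWED_TAGS` and `render` are hypothetical names. As with the template above, an element that is not whitelisted loses its tag but its children are still processed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

ALLOWED_TAGS = {"p", "ul", "li"}  # same whitelist as the template above

def render(elem: ET.Element) -> str:
    """Serialize elem, keeping whitelisted tags and unwrapping the rest."""
    keep = elem.tag in ALLOWED_TAGS
    parts = []
    if keep:
        # Attributes are copied as-is here; filtering them is a separate step.
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
        parts.append(f"<{elem.tag}{attrs}>")
    parts.append(escape(elem.text or ""))
    for child in elem:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))  # text following the child
    if keep:
        parts.append(f"</{elem.tag}>")
    return "".join(parts)

doc = ET.fromstring(
    '<p class="x">hi <script>alert(1)</script><b>bold</b> &amp; bye</p>'
)
print(render(doc))
```

Note that the text inside the unwrapped script survives as inert, escaped text. If that is not wanted, such elements need an explicit `drop silently' rule, as in the blacklist example.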

[4] unacceptable attributes, blacklist version:

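<!-- accept everything on `a' except on* attributes -->
<!-- (the template body was lost in the archived copy; this is a reconstruction that follows the comment above) -->
<xsl:template match="a">
    <xsl:copy>
        <xsl:for-each select="@*">
            <xsl:choose>
                <xsl:when test="starts-with(name(), 'on')">
                    <!-- drop event handler attributes -->
                </xsl:when>
                <xsl:otherwise>
                    <xsl:copy />
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each>
        <xsl:apply-templates />
    </xsl:copy>
</xsl:template>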

Whitelist version:

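<!-- accept only href and title on `a' -->
<!-- (the template body was lost in the archived copy; this is a reconstruction that follows the comment above) -->
<xsl:template match="a">
    <xsl:copy>
        <xsl:if test="@href">
            <xsl:copy-of select="@href" />
        </xsl:if>
        <xsl:if test="@title">
            <xsl:copy-of select="@title" />
        </xsl:if>
        <xsl:apply-templates />
    </xsl:copy>
</xsl:template>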
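
To make the difference between the two attribute policies concrete, here is a small sketch in Python, operating on a plain dictionary of attribute names and values. The function names and the sample attributes are hypothetical, chosen only for illustration.

```python
def blacklist_attrs(attrs: dict) -> dict:
    """Drop event-handler attributes (names starting with 'on'), keep the rest."""
    return {k: v for k, v in attrs.items() if not k.lower().startswith("on")}

def whitelist_attrs(attrs: dict, allowed=("href", "title")) -> dict:
    """Keep only the attributes whose names are explicitly allowed."""
    return {k: v for k, v in attrs.items() if k.lower() in allowed}

attrs = {
    "href": "/x",
    "title": "t",
    "onclick": "evil()",
    "style": "width:100000px",
}
print(blacklist_attrs(attrs))
print(whitelist_attrs(attrs))
```

The blacklist lets the style attribute through because nobody thought to list it; the whitelist drops it by default. That is the main reason to prefer whitelists.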

[5, 6] Unacceptable attribute values and text values. Here it gets funny: the string manipulation functions in XSL are few and not as powerful as regexes, but it is not impossible to build proper value validation with them.

On strings (node and attribute names, attribute values and text node values) you have just concat, contains, starts-with, string-length, substring, substring-after, substring-before and translate. That is almost nothing compared to the power of regexes, but in the end this is not a contest of writing everything on one line.

I'm not writing this to say that regexes are bad. I'm just stating that not everything that can be held in a string should be treated as one. HTML should be represented as (parsed into) a DOM tree, where only node and attribute names, attribute values, text nodes and comments are separate strings. Only what cannot be divided any further into another set of tokens (an atom) should be validated as a string or a number. Even then, an attribute value that represents a URL should be validated with a parser built specifically for that task (based on the URL grammar).
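
As a hedged sketch of that last point, the following uses the URL parser from Python's standard library (`urllib.parse.urlsplit`) instead of pattern matching on the raw string. The allowed-scheme policy and the name `is_acceptable_url` are assumptions made for the example, not a complete policy.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "mailto"}  # illustrative policy only

def is_acceptable_url(value: str) -> bool:
    """Accept relative URLs and absolute ones whose scheme is allowed."""
    # Remove spaces and ASCII control characters first, so that tricks like
    # "java\tscript:" cannot hide the scheme from the parser.
    cleaned = "".join(ch for ch in value if ch > " ")
    try:
        parts = urlsplit(cleaned)
    except ValueError:
        return False  # the parser rejected the value as malformed
    # urlsplit lowercases the scheme; an empty scheme means a relative URL.
    return parts.scheme == "" or parts.scheme in ALLOWED_SCHEMES

for candidate in ("http://example.com/a", "/relative/path",
                  "JavaScript:alert(1)", "java\tscript:alert(1)"):
    print(repr(candidate), is_acceptable_url(candidate))
```

The decision is made on the parsed scheme component rather than on a prefix match, so case variations and embedded whitespace do not need their own special cases.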

Cheers,

-- 
Laurian Gridinoc
Chief Developer
GRAPEFRUIT DESIGN

tel/fax: +40.232.233068
tel/fax: +1.646.349.2916
mobile: +40.745.304379
e-mail: laur@gd.ro
www.grapefruitdesign.com
www.gd.ro
Received on Sat Jun 21 22:26:31 2003

This archive was generated by hypermail 2.1.8 : Wed Aug 23 2006 - 14:07:53 EDT

