The String Obfuscation Algorithm in Dotfuscator
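Code obfuscation tools such as Dotfuscator and Xenocode Postbuild all offer one important feature: string obfuscation. It sounds simple enough, but what exactly is it, and how does it work?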
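This article takes Dotfuscator 4.x as its example and uses a trivial ConsoleApplication as the guinea pig to get a glimpse of how string obfuscation works. Here is the code of that ConsoleApplication: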
using System;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("This is the unencrypted string.");
        }
    }
}
Compile it, then obfuscate the result with Dotfuscator. I used Dotfuscator 4.x Pro: on the Option tab set Disable String Encryption to No, on the Input tab add the compiled output of the project above, and on the String Encryption tab either check every item or add two rules, one with type * and one with method *. Then build. The obfuscated ConsoleApplication1.exe appears in the output directory. Opening it in Reflector shows the following code:
private static void a(string[] A_0)
{
    int num = 2;
    Console.WriteLine(a("軙듛럝鏟싡跣闥죧黩蓫语탯蟱髳鏵雷駹軻蟽烿瘁愃戅⠇礉砋簍礏簑猓㠕", num));
}
The string has become gibberish, and a new method named a has been added. So what is this a? Reflector reports:
/* private scope */ static string a(string A_0, int A_1)
{
    // This item is obfuscated and can not be translated.
}
Clearly this method has also been through Control Flow obfuscation, so the only way in is through the IL:
.method privatescope hidebysig static string a(string A_0, int32 A_1) cil managed
{
.maxstack 8
.locals init (
[0] char[] chArray,
[1] int32 num,
[2] int32 num2,
[3] uint8 num3,
[4] uint8 num4)
L_0000: ldarg.0
L_0001: callvirt instance char[] [mscorlib]System.String::ToCharArray()
L_0006: stloc.0
L_0007: ldc.i4 0xe74d6d7
L_000c: ldarg.1
L_000d: add
L_000e: stloc.1
L_000f: ldc.i4.0
L_0010: dup
L_0011: ldc.i4.1
L_0012: blt.s L_0047
L_0014: dup
L_0015: stloc.2
L_0016: ldloc.0
L_0017: ldloc.2
L_0018: ldloc.0
L_0019: ldloc.2
L_001a: ldelem.i2
L_001b: dup
L_001c: ldc.i4 0xff
L_0021: and
L_0022: ldloc.1
L_0023: dup
L_0024: ldc.i4.1
L_0025: add
L_0026: stloc.1
L_0027: xor
L_0028: conv.u1
L_0029: stloc.3
L_002a: dup
L_002b: ldc.i4.8
L_002c: shr
L_002d: ldloc.1
L_002e: dup
L_002f: ldc.i4.1
L_0030: add
L_0031: stloc.1
L_0032: xor
L_0033: conv.u1
L_0034: stloc.s num4
L_0036: pop
L_0037: ldloc.s num4
L_0039: ldloc.3
L_003a: stloc.s num4
L_003c: stloc.3
L_003d: ldloc.s num4
L_003f: ldc.i4.8
L_0040: shl
L_0041: ldloc.3
L_0042: or
L_0043: conv.u2
L_0044: stelem.i2
L_0045: ldc.i4.1
L_0046: add
L_0047: dup
L_0048: ldloc.0
L_0049: ldlen
L_004a: conv.i4
L_004b: blt.s L_0014
L_004d: pop
L_004e: ldloc.0
L_004f: newobj instance void [mscorlib]System.String::.ctor(char[])
L_0054: call string [mscorlib]System.String::Intern(string)
L_0059: ret
}
I will not explain the IL in detail here, since this is not an introduction to MSIL. If you are interested, consult MSDN, a relevant book, or Google.
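Reading the IL, the obfuscated logic first evaluates a condition that is always true (equivalent to if (0 < 1)) and jumps, and only then reaches the real loop. The loop walks every char of the string. For each char it XORs the low byte and then the high byte with a running key (the constant 0xe74d6d7 plus the int argument, incremented after each byte), swaps the two resulting bytes, and ORs them back together into one char. The result is the original string.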
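To make the branch layout concrete, here is a rough C# rendering of the IL that keeps the jumps as gotos. It is only a sketch: the names ControlFlowSketch and DecodeWithGotos are made up for illustration, and the offsets in the comments refer to the IL listing above.

```csharp
using System;

public static class ControlFlowSketch
{
    // Hypothetical helper that mirrors the branch layout of the IL.
    // It is a hand-written approximation, not Dotfuscator's own source.
    public static string DecodeWithGotos(string source, int salt)
    {
        char[] data = source.ToCharArray();
        int key = unchecked(0xe74d6d7 + salt);           // L_0007..L_000e
        int i = 0;

        // Opaque predicate: 0 < 1 is always true, so this jump is always taken.
        if (0 < 1) goto Check;                           // L_000f..L_0012

    Body:
        {
            char c = data[i];
            byte first = unchecked((byte)((c & 0xff) ^ key++));  // L_001b..L_0029
            byte second = unchecked((byte)((c >> 8) ^ key++));   // L_002a..L_0034
            // The IL swaps the two locals, then shifts and ORs them.
            data[i] = (char)((first << 8) | second);     // L_0037..L_0044
            i++;                                         // L_0045..L_0046
        }

    Check:
        if (i < data.Length) goto Body;                  // L_0047..L_004b

        return string.Intern(new string(data));          // L_004e..L_0059
    }
}
```

The always-taken jump does no useful work. It turns an ordinary loop into a shape that decompilers of the time failed to reconstruct, which fits Reflector's refusal to translate the method.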
Tidied up, the algorithm looks like this:
static string GetString(string source, int salt)
{
    int index = 0;
    char[] data = source.ToCharArray();
    salt += 0xe74d6d7; // This constant is generated by Dotfuscator
    while (index < data.Length)
    {
        char key = data[index];
        // XOR each byte with the running salt, which advances after every byte
        byte low = (byte)((key & '\x00ff') ^ salt++);
        byte high = (byte)((key >> 8) ^ salt++);
        // Swap: the decoded low byte becomes the high byte of the result
        data[index] = (char)((low << 8 | high));
        index++;
    }
    return string.Intern(new string(data));
}
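The decoder also implies what the build-time encoder must roughly look like, because XOR with the same key is its own inverse and the byte swap only needs to be undone. The sketch below is an assumption derived from the decoder, not code taken from Dotfuscator. StringObfuscationSketch, Encode, and Decode are illustrative names.

```csharp
using System;

public static class StringObfuscationSketch
{
    // Constant taken from the IL listing in this article.
    private const int KeyBase = 0xe74d6d7;

    // Same transform as the decoder recovered from the IL (without interning).
    public static string Decode(string source, int salt)
    {
        char[] data = source.ToCharArray();
        int key = unchecked(KeyBase + salt);
        for (int i = 0; i < data.Length; i++)
        {
            char c = data[i];
            byte a = unchecked((byte)((c & 0xff) ^ key++)); // from the encoded low byte
            byte b = unchecked((byte)((c >> 8) ^ key++));   // from the encoded high byte
            data[i] = (char)((a << 8) | b);                 // a becomes the high byte
        }
        return new string(data);
    }

    // Assumed inverse of Decode: the plain high byte is XORed with the first
    // key value and stored low, the plain low byte is XORed with the second
    // key value and stored high.
    public static string Encode(string plain, int salt)
    {
        char[] data = plain.ToCharArray();
        int key = unchecked(KeyBase + salt);
        for (int i = 0; i < data.Length; i++)
        {
            char p = data[i];
            byte low = unchecked((byte)((p >> 8) ^ key++));
            byte high = unchecked((byte)((p & 0xff) ^ key++));
            data[i] = (char)((high << 8) | low);
        }
        return new string(data);
    }
}
```

With this sketch, Encode("T", 2) produces the code unit 0x8ED9, which should correspond to the first character of the obfuscated literal shown earlier. Decode(Encode(s, n), n) returns s for any string s and salt n.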
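As this shows, string obfuscation is far from free: every use of an obfuscated literal now costs a ToCharArray copy, a pass over every character, a new string allocation, and a call to String.Intern. Commercial applications should avoid relying on it where possible, which in practice means not keeping sensitive information in hard-coded strings. Moreover, this kind of string obfuscation only hinders static reverse engineering. In .NET every string is transparent to the CLR runtime host, so an attacker with a debugger or a tool such as ProcessExplorer can easily recover the secrets held in those strings.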
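One common alternative to hard-coding a secret, offered here only as a minimal sketch, is to read it from the process environment at startup. The variable name APP_API_KEY and the SecretSource class are hypothetical and do not come from the original example.

```csharp
using System;

public static class SecretSource
{
    // Hypothetical variable name, used purely for illustration.
    public const string VariableName = "APP_API_KEY";

    // Returns the secret from the environment, or throws if it is missing,
    // so that no fallback value has to be embedded in the assembly.
    public static string Load()
    {
        string value = Environment.GetEnvironmentVariable(VariableName);
        if (string.IsNullOrEmpty(value))
        {
            throw new InvalidOperationException(
                "Environment variable " + VariableName + " is not set.");
        }
        return value;
    }
}
```

This keeps the secret out of the assembly's string table. The value is still an ordinary string in process memory once loaded, so it only addresses static analysis of the binary.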
To be the apostrophe which changed “Impossible” into “I’m possible”
----------------------------------------------------
WinkingZhang's Blog (http://winkingzhang.cnblogs.com)
GCDN(http://gcdn.grapecity.com/cs)