MD5即Message-Digest Algorithm 5(信息-摘要算法 5),用于确保信息传输完整一致。是计算机广泛使用的雜湊算法之一(又译摘要算法、哈希算法),主流编程语言普遍已有MD5实现。
将数据(如汉字)运算为另一固定长度值是雜湊算法的基础原理,MD5的前身有MD2、MD3和MD4。
历史与密码学
1992年8月Ronald L. Rivest在向IETF提交了一份重要文件,描述了这种算法的原理,由于这种算法的公开性和安全性,在90年代被广泛使用在各种程序语言中,用以確保资料傳遞無誤等。
MD5由MD4、MD3、MD2改进而来,主要增强算法复杂度和不可逆性。
弱点
MD5较老,散列长度通常为128位,随着计算机运算能力提高,找到“碰撞”是可能的。因此,在少数安全要求高的场合不使用MD5。
2004年,中国数学家王小云证明MD5数字签名算法可以产生碰撞[1]。2007年,Marc Stevens,Arjen K. Lenstra和Benne de Weger进一步指出通过伪造软件签名,可重复性攻击MD5算法[2]。研究者使用前缀碰撞法(chosen-prefix collision),使程序前端包含恶意程序,利用后面的空间添上垃圾代码凑出同样的MD5 哈希值。
MD5算法以16个32位子分组即512位分组来提供数据雜湊,经过程序流程,生成四个32位数据,最后联合起来成为一个128位散列。基本方式为,求余、取余、调整长度、与链接变量进行循环运算。得出结果。
是 XOR, AND, OR , NOT 的符号。
伪代码
//Note: All variables are unsigned 32 bits and wrap modulo 2^32 when calculating
var int[64] r, k
//r specifies the per-round shift amounts
r[ 0..15] := {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22}
r[16..31] := {5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20}
r[32..47] := {4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23}
r[48..63] := {6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21}
//Use binary integer part of the sines of integers as constants:
for i from 0 to 63
k[i] := floor(abs(sin(i + 1)) × 2^32)
//Initialize variables:
var int h0 := 0x67452301
var int h1 := 0xEFCDAB89
var int h2 := 0x98BADCFE
var int h3 := 0x10325476
//Pre-processing:
append "1" bit to message
append "0" bits until message length in bits ≡ 448 (mod 512)
append bit length of message as 64-bit little-endian integer to message
//Process the message in successive 512-bit chunks:
for each 512-bit chunk of message
break chunk into sixteen 32-bit little-endian words w[i], 0 ≤ i ≤ 15
//Initialize hash value for this chunk:
var int a := h0
var int b := h1
var int c := h2
var int d := h3
//Main loop:
for i from 0 to 63
if 0 ≤ i ≤ 15 then
f := (b and c) or ((not b) and d)
g := i
else if 16 ≤ i ≤ 31
f := (d and b) or ((not d) and c)
g := (5×i + 1) mod 16
else if 32 ≤ i ≤ 47
f := b xor c xor d
g := (3×i + 5) mod 16
else if 48 ≤ i ≤ 63
f := c xor (b or (not d))
g := (7×i) mod 16
temp := d
d := c
c := b
b := ((a + f + k[i] + w[g]) leftrotate r[i]) + b
a := temp
//Add this chunk's hash to result so far:
h0 := h0 + a
h1 := h1 + b
h2 := h2 + c
h3 := h3 + d
var int digest := h0 append h1 append h2 append h3 //(expressed as little-endian)
var int[64] r, k
//r specifies the per-round shift amounts
r[ 0..15] := {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22}
r[16..31] := {5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20}
r[32..47] := {4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23}
r[48..63] := {6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21}
//Use binary integer part of the sines of integers as constants:
for i from 0 to 63
k[i] := floor(abs(sin(i + 1)) × 2^32)
//Initialize variables:
var int h0 := 0x67452301
var int h1 := 0xEFCDAB89
var int h2 := 0x98BADCFE
var int h3 := 0x10325476
//Pre-processing:
append "1" bit to message
append "0" bits until message length in bits ≡ 448 (mod 512)
append bit length of message as 64-bit little-endian integer to message
//Process the message in successive 512-bit chunks:
for each 512-bit chunk of message
break chunk into sixteen 32-bit little-endian words w[i], 0 ≤ i ≤ 15
//Initialize hash value for this chunk:
var int a := h0
var int b := h1
var int c := h2
var int d := h3
//Main loop:
for i from 0 to 63
if 0 ≤ i ≤ 15 then
f := (b and c) or ((not b) and d)
g := i
else if 16 ≤ i ≤ 31
f := (d and b) or ((not d) and c)
g := (5×i + 1) mod 16
else if 32 ≤ i ≤ 47
f := b xor c xor d
g := (3×i + 5) mod 16
else if 48 ≤ i ≤ 63
f := c xor (b or (not d))
g := (7×i) mod 16
temp := d
d := c
c := b
b := ((a + f + k[i] + w[g]) leftrotate r[i]) + b
a := temp
//Add this chunk's hash to result so far:
h0 := h0 + a
h1 := h1 + b
h2 := h2 + c
h3 := h3 + d
var int digest := h0 append h1 append h2 append h3 //(expressed as little-endian)
MD5散列
一般128位的MD5散列被表示为32位十六进制数字。以下是一个43位长ASCII字母列的MD5散列:
MD5("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6
即使在原文中作一个小变化(比如用c取代d)其散列也会发生巨大的变化:
MD5("The quick brown fox jumps over the lazy cog") = 1055d3e698d289f2af8663725127bd4b
空文的散列为:
MD5("") = d41d8cd98f00b204e9800998ecf8427e