Handling binary data in Python with the struct module
Picture this scenario:
You have a file full of integers stored in binary form, and you need to read them out. So you dash off something like this:
f = open("file", "rb")
# read one integer (4 bytes)
data = f.read(4)
# done reading, close the file
f.close()
# convert
num = int(data)
And it promptly blows up:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: null byte in argument for int()
You take a look at data:
data = '8\x00:\x00'
What?! It's a bunch of hex escapes!! How are you supposed to convert this thing?
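For what it's worth, those four characters are not decimal digits at all; they are the raw bytes of the integer. A minimal sketch, assuming Python 3 (where the data would be a bytes object) and assuming the file was written in little-endian order, that rebuilds the number by hand:

```python
# The four bytes from the example above, written as a Python 3 bytes literal.
data = b"8\x00:\x00"

# Each element is a byte value from 0 to 255, not a decimal digit.
print(list(data))  # [56, 0, 58, 0]

# Assumption: little-endian layout, i.e. the least significant byte comes first.
num = data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24)
print(num)  # 3801144
```

Shifting bytes by hand works, but it gets tedious fast, which is exactly the job the struct module takes over.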
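You flip open the Cookbook, which rambles on about a pile of irrelevant things you will never use and says nothing about this problem. Just as despair sets in, you spot the struct module in the documentation, which says: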
This module performs conversions between Python values and C structs represented as Python strings. It uses format strings (explained below) as compact descriptions of the lay-out of the C structs and the intended conversion to/from Python values. This can be used in handling binary data stored in files or from network connections, among other sources.
After a quick skim, the two functions that matter most are pack and unpack.
pack:
struct.pack(fmt, v1, v2, ...)
Return a string containing the values v1, v2, ... packed according to the given format. The arguments must match the values required by the format exactly.
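A quick sketch of pack in action, assuming Python 3 (where the result is a bytes object rather than a str) and using the "<" prefix to force little-endian byte order with standard sizes. The value 3801144 is chosen only because it produces the same four bytes seen earlier:

```python
import struct

# "<" asks for little-endian byte order with standard sizes, so "i" is 4 bytes.
packed = struct.pack("<i", 3801144)
print(packed)       # b'8\x00:\x00'
print(len(packed))  # 4

# Several values can be packed at once: a 2-byte short followed by a 4-byte int.
pair = struct.pack("<hi", 1, 2)
print(len(pair))    # 6
```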
unpack:
struct.unpack(fmt, string)
Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
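The two details in that description that tend to trip people up are the tuple result and the exact-length requirement. A small sketch, again assuming Python 3 and little-endian data:

```python
import struct

data = b"8\x00:\x00"

# The result is always a tuple, even for a single value.
result = struct.unpack("<i", data)
print(result)  # (3801144,)
num = result[0]

# calcsize reports how many bytes a format expects.
print(struct.calcsize("<i"))  # 4

# A buffer of the wrong length raises struct.error.
try:
    struct.unpack("<i", data[:3])
except struct.error as exc:
    print("unpack failed:", exc)
```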
Further down there is also a table mapping format characters to C data types:
Format | C Type | Python | Notes |
---|---|---|---|
x | pad byte | no value | |
c | char | string of length 1 | |
b | signed char | integer | |
B | unsigned char | integer | |
? | _Bool | bool | (1) |
h | short | integer | |
H | unsigned short | integer | |
i | int | integer | |
I | unsigned int | integer or long | |
l | long | integer | |
L | unsigned long | long | |
q | long long | long | (2) |
Q | unsigned long long | long | (2) |
f | float | float | |
d | double | float | |
s | char[] | string | |
p | char[] | string | |
P | void * | long | |
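To get a feel for the table, the sketch below asks calcsize how wide each numeric code is. It assumes Python 3 and uses the "<" prefix, which selects standard sizes; without a prefix the native sizes apply and some codes (l and L, for instance) can differ between platforms.

```python
import struct

# With the "<" prefix (little-endian, standard sizes) each code has a fixed width.
sizes = {code: struct.calcsize("<" + code) for code in "bBhHiIlLqQfd"}
for code, size in sizes.items():
    print(code, size)

# "s" takes a count that means "a byte string of this length".
name, value = struct.unpack("<4sH", b"abcd\x01\x00")
print(name, value)  # b'abcd' 1
```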
So you change the code to:
import struct

f = open("file", "rb")
data = f.read(4)
# convert the 4 raw bytes; unpack returns a one-element tuple
num = struct.unpack("i", data)
f.close()
print(num)
Now it prints properly, right?
Heh, time to head home early.
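Since the file in this scenario holds nothing but integers, you would probably want all of them, not just the first. A minimal sketch, assuming Python 3.4 or later (for struct.iter_unpack) and 4-byte little-endian integers; the sample file and its values are made up here purely so the example runs on its own:

```python
import os
import struct
import tempfile

# Build a small sample file so the sketch is self-contained (hypothetical data).
values = [1, -2, 3801144]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("<3i", *values))
    path = tmp.name

# Read the whole file back and decode one 4-byte int at a time.
with open(path, "rb") as f:
    raw = f.read()
numbers = [item[0] for item in struct.iter_unpack("<i", raw)]
print(numbers)  # [1, -2, 3801144]

os.remove(path)
```

Note that iter_unpack expects the buffer length to be a multiple of the format size, so a truncated file would raise struct.error.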
If it still isn't clear, keep reading the documentation. This article is also worth a look:
http://www.cnblogs.com/tonychopper/archive/2010/07/23/1783501.html
Once again, all praise to concise, do-anything Python.