Text files vs. binary files: a problem raised by the file.seek() method

How the problem came up

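The Runoob tutorial (菜鸟教程) has a section on the file.seek() method. First, a brief description of seek():

seek(offset, whence) moves the file's read pointer to the specified position.
Parameter offset -- the offset, that is, the number of bytes to move
Parameter whence -- optional, defaults to 0. 0 counts from the beginning of the file, 1 from the current position, 2 from the end of the file
Return value: the tutorial says there is none, which is true of Python 2; in Python 3, seek() returns the new absolute position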
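
A minimal sketch of the three whence values in Python 3. It uses an in-memory io.BytesIO buffer as a stand-in for a file opened in binary mode, and the buffer contents are made up for illustration:

```python
import io
import os

# In-memory stand-in for a file opened in binary mode (illustrative data).
buf = io.BytesIO(b"0123456789")

pos_start = buf.seek(3, os.SEEK_SET)    # whence=0: 3 bytes from the beginning
pos_current = buf.seek(2, os.SEEK_CUR)  # whence=1: 2 bytes past the current position
pos_end = buf.seek(-1, os.SEEK_END)     # whence=2: 1 byte before the end

print(pos_start, pos_current, pos_end)  # 3 5 9
```

The constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END are simply the named forms of 0, 1 and 2.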

Example

The file runoob.txt contains:

1:www.runoob.com
2:www.runoob.com
3:www.runoob.com
4:www.runoob.com
5:www.runoob.com

Read the file contents:

# Open the file
fo = open("runoob.txt", "r+")
print("File name: ", fo.name)

line = fo.readline()
print("Data read: %s" % line)

# Move the file pointer 2 bytes forward from the current position
fo.seek(2, 1)
line = fo.readline()
print("Data read: %s" % line)

# Close the file
fo.close()

Running it produces an error:

D:\Program\python34\python.exe D:/python_workshop/python6/study_file.py
File name:  runoob.txt
Data read: 1:www.runoob.com

Traceback (most recent call last):
  File "D:/python_workshop/python6/study_file.py", line 158, in <module>
    fo.seek(2, 1)
io.UnsupportedOperation: can't do nonzero cur-relative seeks

Process finished with exit code 1

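Analyzing the cause

Text files and binary files

To explain this, we have to start with text files and binary files.

In the broad (physical) sense, binary files include text files, because everything a computer ultimately stores is binary, so physically the two are the same thing. In the narrow (logical) sense, however, the two store data differently.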

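A text file is also called an ASCII file. When it is stored on disk, each character occupies one byte (8 bits) that holds the character's ASCII code. For example:

ASCII code:  00110101  00110110  00110111  00111000
                 |         |         |         |
Character:       5         6         7         8

This takes 4 bytes in total. An ASCII file can be displayed on screen character by character.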

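A binary file stores data in its binary encoding. For example, the number 5678 is stored as 00010110 00101110, which takes only 2 bytes. A binary file can also be displayed on screen, but its contents are not human-readable. C's I/O system does not distinguish between the two types when processing these files: it treats both as a stream of characters and processes them byte by byte. The start and end of the input/output stream are controlled only by the program, not by physical symbols (such as the carriage return), which is why such files are also called "stream files".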
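
The two storage forms of 5678 can be checked with a short sketch. The choice of a 16-bit big-endian integer (struct format ">H") is an assumption made here so that the bytes come out in the same order as in the text:

```python
import struct

as_text = "5678".encode("ascii")     # one byte per character
as_binary = struct.pack(">H", 5678)  # 16-bit unsigned integer, big-endian (assumed layout)

# Show each byte as 8 binary digits.
text_bits = [format(b, "08b") for b in as_text]
binary_bits = [format(b, "08b") for b in as_binary]

print(len(as_text), text_bits)      # 4 ['00110101', '00110110', '00110111', '00111000']
print(len(as_binary), binary_bits)  # 2 ['00010110', '00101110']
```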

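Differences when reading and writing

When a file is read in text mode on Windows, the byte 0x1A (26, Ctrl+Z) can be treated as the end-of-file marker (EOF). This behavior comes from the Windows C runtime's text mode, which Python 2's open() relied on; Python 3's io module does not stop at 0x1A, although its text mode still decodes bytes and translates newlines and so remains unsuitable for binary data. Reading a binary file with "r" can therefore return incomplete data. For example:

Suppose a binary file contains the following bytes, listed from the lowest offset to the highest: 7F 32 1A 2F 3D 2C 12 2E 76
If it is read with 'r' under such a runtime, reading stops at the third byte, which is taken as the end of the file
If it is read with 'rb', the bytes are returned as they are, without being converted to characters, which avoids the problem above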
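
A small sketch of the 'rb' case, assuming Python 3. The scratch file name sample.bin is hypothetical and the file is created in a temporary directory only for this demonstration:

```python
import os
import tempfile

data = bytes([0x7F, 0x32, 0x1A, 0x2F, 0x3D, 0x2C, 0x12, 0x2E, 0x76])

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(data)

# Binary mode returns every byte, including the 0x1A in third position.
with open(path, "rb") as f:
    read_back = f.read()

print(len(read_back))  # 9
```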

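When writing in text mode on Windows, a '\n' is implicitly converted to "\r\n" before it is written to the file; when reading, "\r\n" is implicitly converted back to '\n' before the data reaches the variable. In Python 3 this translation is controlled by the newline parameter of open(). Binary mode does no interpretation: it processes the data one byte at a time and converts nothing.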
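
The translation can be observed on any platform with a sketch like the following. Passing newline="\r\n" when writing is an assumption made here to force the Windows-style conversion; on Windows the default would behave the same way. The file name newline_demo.txt is hypothetical:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# newline="\r\n" forces the Windows-style translation on any platform.
with open(path, "w", newline="\r\n") as f:
    f.write("line1\nline2\n")

with open(path, "rb") as f:
    raw = f.read()   # bytes exactly as stored on disk

with open(path, "r") as f:
    text = f.read()  # default newline=None: "\r\n" is translated back to "\n"

print(raw)   # b'line1\r\nline2\r\n'
print(text == "line1\nline2\n")  # True
```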

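r+ and rb+

"r+" opens a file for reading and writing in text mode. The file pointer is placed at the beginning of the file
"rb+" opens a file for reading and writing in binary mode. The file pointer is placed at the beginning of the file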

The official Python 3.4 explanation

To change the file object’s position, use f.seek(offset, from_what). The position is computed from adding offset to a reference point; the reference point is selected by the from_what argument. A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and defaults to 0, using the beginning of the file as the reference point.

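In other words, to change the file object's position, use f.seek(offset, from_what). The final position equals the reference point plus the offset, and the reference point is chosen by the from_what argument. from_what takes three values: 0 means the beginning of the file, 1 means the current position, and 2 means the end of the file. When from_what is omitted, it defaults to 0.

The official Python documentation also gives an example:

>>> f = open('workfile', 'rb+')                      # note that workfile is opened for reading and writing in binary mode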
>>> f.write(b'0123456789abcdef')
16
>>> f.seek(5)     # Go to the 6th byte in the file
5
>>> f.read(1)
b'5'
>>> f.seek(-3, 2) # Go to the 3rd byte before the end
13
>>> f.read(1)
b'd'

In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2))

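In text files, only seeks relative to the beginning of the file (from_what = 0, the default) are allowed. Non-zero offsets relative to the current position (from_what = 1) or to the end of the file (from_what = 2) are not. The one exception the documentation names is seek(0, 2), which jumps to the very end of the file; the wording of the error message ("nonzero cur-relative seeks") also indicates that a zero offset from the current position is accepted.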
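
A minimal sketch of which seeks a text-mode file accepts, assuming Python 3. The file is a two-line copy of runoob.txt written to a temporary directory, and the results dictionary is just a hypothetical way to collect the outcomes:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "runoob.txt")
with open(path, "w", encoding="ascii", newline="\n") as f:
    f.write("1:www.runoob.com\n2:www.runoob.com\n")

results = {}
with open(path, "r", encoding="ascii") as f:
    f.readline()
    results["seek(0, 1)"] = f.seek(0, 1)  # zero offset from current position: allowed
    results["seek(0, 2)"] = f.seek(0, 2)  # jump to the very end of the file: allowed
    results["seek(0)"] = f.seek(0)        # absolute seek to the beginning: allowed
    try:
        f.seek(2, 1)                      # non-zero offset from current position
    except io.UnsupportedOperation as exc:
        results["seek(2, 1)"] = str(exc)

for call, outcome in results.items():
    print(call, "->", outcome)
```

Only the last call fails, with the same message as in the traceback.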

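Now look back at our first piece of code and its error message. We opened the file in "r+" mode, so it was treated as a text file, and then called fo.seek() with from_what set to 1 and a non-zero offset, which is why the error was raised.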

# Open the file
fo = open("runoob.txt", "r+")
print("File name: ", fo.name)

line = fo.readline()
print("Data read: %s" % line)

# Move the file pointer 2 bytes forward from the current position
fo.seek(2, 1)
line = fo.readline()
print("Data read: %s" % line)

# Close the file
fo.close()

D:\Program\python34\python.exe D:/python_workshop/python6/study_file.py
File name:  runoob.txt
Data read: 1:www.runoob.com

Traceback (most recent call last):
  File "D:/python_workshop/python6/study_file.py", line 158, in <module>
    fo.seek(2, 1)
io.UnsupportedOperation: can't do nonzero cur-relative seeks

Process finished with exit code 1

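Going further

What if we change the mode to "rb+" and read and write in binary? That does work, but there is one small issue that should not be overlooked: binary mode does not translate Windows line endings (\r\n, 0x0D 0x0A), and the data comes back as bytes, so the result we see is:

File name:  runoob.txt
Data read: b'1:www.runoob.com\r\n'
Data read: b'www.runoob.com\r\n'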
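
One possible way to get clean strings back in binary mode is to decode each line and strip the line ending by hand. This is a sketch under stated assumptions, not the only approach: it assumes the file is ASCII with Windows line endings, and it writes a two-line copy of runoob.txt to a temporary directory so that it runs anywhere:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "runoob.txt")
with open(path, "wb") as f:
    f.write(b"1:www.runoob.com\r\n2:www.runoob.com\r\n")

with open(path, "rb+") as fo:
    first = fo.readline()
    fo.seek(2, 1)          # relative seek is allowed in binary mode; skips "2:"
    second = fo.readline()

# Decode the bytes and drop the untranslated "\r\n" manually.
lines = [raw.decode("ascii").rstrip("\r\n") for raw in (first, second)]
print(lines)  # ['1:www.runoob.com', 'www.runoob.com']
```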

References

https://blog.csdn.net/timberwolf_2012/article/details/28499615

https://www.zhihu.com/question/19971994

https://docs.python.org/3.4/tutorial/inputoutput.html

http://bbs.fishc.com/thread-60449-1-1.html

https://www.cnblogs.com/kingleft/p/5142469.html

https://www.cnblogs.com/xisheng/p/7636736.html

https://bbs.csdn.net/wap/topics/350127738

https://www.cnblogs.com/pengwangguoyh/articles/3223072.html

https://blog.csdn.net/seu_xuxueqi/article/details/621904

http://www.xuebuyuan.com/367184.html

posted @ 2018-04-21 22:14 cnhkzyy