Python 3: encoding error when opening a CSV file

Environment: Python 3.4 & Django 1.9

import csv

with open(r'C:\test\test.csv', newline='', encoding="utf-8") as f:
    data_reader = csv.reader(f)
    for row in data_reader:
        print(row)

test.csv is encoded as UTF-8 without BOM (according to Notepad++).

Error message: 'utf-8' codec can't decode byte 0xa0 in position 1396: invalid start byte

The fix is to pass errors="ignore":
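To see which byte triggers the error, one option is to read the file in binary mode and inspect the UnicodeDecodeError. The following is only a minimal sketch: the helper name first_decode_error and the sample bytes are made up for illustration, and the sample merely imitates a file that contains a stray 0xa0 byte.

```python
def first_decode_error(data, encoding="utf-8", context=10):
    """Return (position, bad_bytes, surrounding_bytes), or None if data decodes cleanly."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        # exc.start / exc.end delimit the bytes the codec could not decode
        return exc.start, data[exc.start:exc.end], data[lo:exc.end + context]
    return None


# Hypothetical sample: a CSV fragment with a single 0xa0 byte in it
sample = b"name,price\r\nwidget,10\xa0USD\r\n"
result = first_decode_error(sample)
print(result)

# For a real file, read it as bytes first, for example:
# with open(r'C:\test\test.csv', 'rb') as f:
#     print(first_decode_error(f.read()))
```

The surrounding bytes usually make it obvious whether the file is really UTF-8 with a few stray bytes or is in a different encoding altogether.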

with open(r'C:\test\test.csv', newline='', encoding="utf-8", errors="ignore") as f:
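Note that errors="ignore" silently drops the bytes that cannot be decoded. The sketch below compares it with errors="replace" and with decoding as cp1252, where byte 0xa0 is a non-breaking space. It is an illustration under assumptions: the sample bytes and the helper read_rows are hypothetical, an in-memory buffer stands in for the file, and whether the real test.csv is actually cp1252 is not known.

```python
import csv
import io

# Hypothetical sample data containing one byte (0xa0) that is not valid UTF-8
raw = b"name,note\r\nwidget,10\xa0USD\r\n"


def read_rows(raw_bytes, encoding, errors="strict"):
    """Parse CSV rows from bytes, decoding the same way open() would."""
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding,
                            errors=errors, newline="")
    return list(csv.reader(text))


ignored = read_rows(raw, "utf-8", errors="ignore")    # bad byte is dropped
replaced = read_rows(raw, "utf-8", errors="replace")  # bad byte becomes U+FFFD
as_cp1252 = read_rows(raw, "cp1252")                  # 0xa0 decodes to U+00A0

for label, rows in [("ignore", ignored), ("replace", replaced), ("cp1252", as_cp1252)]:
    print(label, [ascii(cell) for cell in rows[1]])
```

With "ignore" the cell becomes '10USD', so the separator between the number and the unit is lost; "replace" at least leaves a visible marker where data was damaged.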

References:

Python open CSV file with supposedly mixed encodings

open()

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

errors is an optional string that specifies how encoding and decoding errors are to be handled (this cannot be used in binary mode). A variety of standard error handlers are available, though any error handling name that has been registered with codecs.register_error() is also valid. The standard names are:

'strict' to raise a ValueError exception if there is an encoding error. The default value of None has the same effect.
'ignore' ignores errors. Note that ignoring encoding errors can lead to data loss.
'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data.
'surrogateescape' will represent any incorrect bytes as low surrogate code units ranging from U+DC80 to U+DCFF. These surrogate code units will then be turned back into the same bytes when the surrogateescape error handler is used when writing data. This is useful for processing files in an unknown encoding.
'xmlcharrefreplace' is only supported when writing to a file. Characters not supported by the encoding are replaced with the appropriate XML character reference &#nnn;.
'backslashreplace' (also only supported when writing) replaces unsupported characters with Python’s backslashed escape sequences.
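The handlers listed above can be tried directly with bytes.decode() and str.encode(), which accept the same errors names as open(). This is a small sketch with made-up sample data, not part of the quoted documentation:

```python
bad = b"caf\xe9"  # Latin-1 bytes for "café"; not valid UTF-8

# 'strict' (the default) raises UnicodeDecodeError, a subclass of ValueError
try:
    bad.decode("utf-8")
    strict_raised = False
except ValueError:
    strict_raised = True

ignored = bad.decode("utf-8", errors="ignore")    # undecodable byte dropped
replaced = bad.decode("utf-8", errors="replace")  # U+FFFD inserted instead

# 'surrogateescape' keeps the raw byte as a surrogate so it can be written back
escaped = bad.decode("utf-8", errors="surrogateescape")
round_trip = escaped.encode("utf-8", errors="surrogateescape")

# Handlers used when encoding (writing)
xml_ref = "café".encode("ascii", errors="xmlcharrefreplace")
backslashed = "café".encode("ascii", errors="backslashreplace")

print(strict_raised, ascii(ignored), ascii(replaced), ascii(escaped))
print(round_trip, xml_ref, backslashed)
```

Only 'surrogateescape' preserves the original bytes exactly, which is why it suits files whose encoding is unknown.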

posted on 2017-10-13 11:24 by cyn_413