Python Pandas read_csv error

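To deduplicate the text (compare the previously collected comments against each other and delete the duplicates), I wrote the following code.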

#-*- coding: utf-8 -*-
import pandas as pd

inputfile = 'e:/data/H_KJ300F-JAC2101W.txt' # comment file
outputfile = 'e:/data/H_KJ300F-JAC2101W_process_1.txt' # where the processed comments are saved
data = pd.read_csv(inputfile, encoding = 'utf-8', header = None)
l1 = len(data)
data = pd.DataFrame(data[0].unique())
l2 = len(data)
data.to_csv(outputfile, index = False, header = False, encoding = 'utf-8')
print(u'Removed %s duplicate comments.' % (l1 - l2))

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
    data = parser.read()
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
  File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
  File "pandas\parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas\parser.c:11437)
  File "pandas\parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11308)
  File "pandas\parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas\parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 360, saw 2
>>> data =pd.read_csv(inputfile,encoding ='utf-8',header = None)
Traceback (most recent call last):
  ...
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 361, saw 2

Solution: replace every half-width comma "," in the file with a full-width comma ",".
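The replacement does not have to be done by hand in an editor. The following is a minimal sketch of one way to script it, not the code used for this post; the helper name `to_fullwidth_commas` and its parameters are made up for illustration, and it assumes the file is UTF-8 and small enough to read into memory.

```python
# -*- coding: utf-8 -*-

HALFWIDTH_COMMA = ','       # the character read_csv treats as a delimiter
FULLWIDTH_COMMA = '\uff0c'  # full-width comma, not a delimiter


def to_fullwidth_commas(src_path, dst_path, encoding='utf-8'):
    """Copy src_path to dst_path with half-width commas made full-width.

    Hypothetical helper for illustration. Returns the number of commas
    that were replaced.
    """
    # newline='' keeps the original line endings untouched on read and write.
    with open(src_path, encoding=encoding, newline='') as src:
        text = src.read()
    with open(dst_path, 'w', encoding=encoding, newline='') as dst:
        dst.write(text.replace(HALFWIDTH_COMMA, FULLWIDTH_COMMA))
    return text.count(HALFWIDTH_COMMA)
```

Writing to a separate output path keeps the original comment file intact, so the replacement can be checked before read_csv is run on the new file.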

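Cause: when no separator is specified, read_csv uses "," as the default delimiter. The first line of the file contains no half-width comma, so the parser expects one field per line; a comment that contains a half-width comma (line 360 here) is split into two fields, and parsing fails.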
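The cause can be reproduced on a tiny in-memory sample, which also suggests an alternative that leaves the file unchanged: pass a separator that never occurs in the comments. The sketch below rests on assumptions that are not from the post: the sample text is made up, the comments are assumed to contain no tab character, and it targets a recent pandas, where the exception is `pandas.errors.ParserError` rather than the `pandas.io.common.CParserError` shown in the traceback.

```python
import csv
import io

import pandas as pd

# Made-up sample: the second comment contains a half-width comma.
sample = "works well\nquiet, but pricey\nworks well\n"

# With the default sep=',' the second line looks like two fields,
# while the first line told the parser to expect only one.
try:
    pd.read_csv(io.StringIO(sample), header=None)
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)

# Assumption: no comment contains a tab, so sep='\t' never splits a line.
# QUOTE_NONE keeps double quotes inside comments as ordinary characters.
data = pd.read_csv(io.StringIO(sample), header=None, sep='\t',
                   quoting=csv.QUOTE_NONE)

# Same deduplication step as in the original script.
unique_comments = pd.DataFrame(data[0].unique())
removed = len(data) - len(unique_comments)
```

Compared with rewriting the commas, this keeps the comments exactly as collected, but it only works as long as the chosen separator really never appears in the data.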

 

posted @ 2017-04-22 20:59  <编程小白>