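python -- An Excel file renamed to a .csv extension is still an Excel file when read with pandas
Create a data file such as c_data.xlsx (extension .xlsx), then right-click it, choose Rename, and change the name, extension included, to "c_data.csv".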
Read the data from the file:
import pandas as pd
data = pd.read_csv('E:/python_workspace/data_space/c_data.csv')
This fails with the following error:
Traceback (most recent call last):
  File "E:/python_workspace/Demo/pandas_pratices.py", line 3, in <module>
    data = pd.read_csv('E:/python_workspace/data_space/c_data.csv')
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 460, in _read
    data = parser.read(nrows)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 1198, in read
    ret = self._engine.read(nrows)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 2157, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 7, saw 2
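I tried several fixes found online, and none of them worked.
Then it occurred to me: perhaps a file that was created as .xlsx and simply renamed to .csv is still, in essence, an Excel file, and that is why read_csv fails.
To check, open c_data.csv, use "Save As" to save it in CSV format under the name "a_data.csv", and compare the two files in the tests below.
(1) demo1: First, assume that c_data.csv is still an Excel file despite the changed extension, and read it with read_excel.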
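That suspicion can also be checked directly, without pandas. An .xlsx file is a ZIP archive, while a real CSV is plain text, so looking inside the file tells the two apart regardless of the extension. The sketch below is only an illustration: the function name `looks_like_xlsx` is made up for this post, and treating the presence of `xl/workbook.xml` as the marker of a workbook is an assumption that holds for typical .xlsx files.

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if the file content looks like an .xlsx workbook,
    whatever its extension says (hypothetical helper)."""
    # An .xlsx file is a ZIP archive; a plain-text CSV is not.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Assumption: a part named xl/workbook.xml marks an .xlsx workbook.
        return "xl/workbook.xml" in archive.namelist()
```

For the files in this post, one would expect `looks_like_xlsx` to return True for the renamed c_data.csv and False for a genuinely re-saved CSV, which is a hint to use read_excel for the former and read_csv for the latter.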
import pandas as pd
data = pd.read_excel('E:/python_workspace/data_space/c_data.csv')
print(data)
This runs without an error and prints the data that was read:
id name score
0 1 小米 78.01
1 2 小白 88.02
2 3 小新 99.03
3 4 小圆 99.04
4 5 小羊 NaN
(2) demo2: Read the newly saved CSV file "a_data.csv" with read_csv.
import pandas as pd
data = pd.read_csv('E:/python_workspace/data_space/a_data.csv')
print(data)
The earlier ParserError is gone; what appears now is only an encoding problem:
Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1119, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1244, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1259, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1450, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 2: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:/python_workspace/MyWriter/CDA_Demo/pandas_pratices.py", line 3, in <module>
    data = pd.read_csv('E:/python_workspace/data_space/a_data.csv')
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 460, in _read
    data = parser.read(nrows)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 1198, in read
    ret = self._engine.read(nrows)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 2157, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1126, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1244, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1259, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1450, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 2: invalid continuation byte
Add the encoding argument:
import pandas as pd
data = pd.read_csv('E:/python_workspace/data_space/a_data.csv', encoding='gbk')
print(data)
After running it, the console prints:
id name score
0 1 小米 78.01
1 2 小白 88.02
2 3 小新 99.03
3 4 小圆 99.04
4 5 小羊 NaN
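The encoding fix in demo2 can be made a little more general. The sketch below tries a list of candidate encodings in order and returns the first one that decodes the whole file. The function name `first_working_encoding` and the default candidate list are assumptions chosen for this example, and a successful decode only shows that the bytes are valid in that encoding, not that it is the encoding the file was written with.

```python
def first_working_encoding(path, candidates=("utf-8", "gbk")):
    """Return the first candidate encoding that decodes the whole file
    (hypothetical helper; the candidate list is an assumption)."""
    with open(path, "rb") as f:
        raw = f.read()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
        return name
    raise ValueError(f"none of {candidates} can decode {path}")
```

The returned name can then be passed on, for example as `pd.read_csv(path, encoding=first_working_encoding(path))`. Order matters: stricter encodings such as utf-8 should come first, because permissive ones accept almost any byte sequence.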
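Taken together, the two tests show that renaming an .xlsx file to .csv changes only its name, not its content. The file is still an Excel workbook: pandas reads it with read_excel, while read_csv fails with a ParserError.
If you run into this problem, open the file and re-save it in CSV format ("Save As"); read_csv can then read it, with an encoding argument such as encoding='gbk' if needed.
That's all for this post~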
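The reason behind this conclusion is that a rename touches only the file name and leaves the bytes alone. A small sketch, assuming a throwaway file you are free to rename, compares a hash of the content before and after the rename; the helper names `sha256_of` and `rename_and_compare` are made up for this post.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash the raw bytes of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def rename_and_compare(src, dst):
    """Rename src to dst and report whether the content is unchanged
    (hypothetical helper)."""
    before = sha256_of(src)
    Path(src).rename(dst)  # changes the name only
    return before == sha256_of(dst)
```

Calling `rename_and_compare` on a copy of c_data.xlsx with a .csv target should return True, which is exactly why the renamed file still behaves like an Excel workbook.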