Python: an Excel file renamed to .csv is still an Excel file when read with pandas

 

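Create a data file such as c_data.xlsx (with the .xlsx extension), then right-click it, choose Rename, and change the extension together with the name so that the file becomes "c_data.csv".

Read the data from the file: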

import pandas as pd

data = pd.read_csv('E:/python_workspace/data_space/c_data.csv')

This raises the following error:

Traceback (most recent call last):
  File "E:/python_workspace/Demo/pandas_pratices.py", line 3, in <module>
    data = pd.read_csv('E:/python_workspace/data_space/c_data.csv')
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 460, in _read
    data = parser.read(nrows)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 1198, in read
    ret = self._engine.read(nrows)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 2157, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 7, saw 2

 

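I tried the fixes from several online tutorials, but none of them solved it.

Then it occurred to me: the file was created with the .xlsx extension and only renamed to .csv, so perhaps it is still an Excel file underneath, and that is why read_csv fails.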
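One way to check this suspicion without pandas is to look at the file content itself. An .xlsx workbook is a ZIP archive, and renaming the file does not change its bytes. The sketch below is only an illustration: the helper name looks_like_xlsx is made up for this post, and it assumes the workbook contains the usual xl/workbook.xml member.

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if the file at `path` is a ZIP archive holding a workbook part."""
    # An .xlsx file is a ZIP container; a real CSV file is plain text.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Assumption: workbooks written by Excel normally contain this member.
        return "xl/workbook.xml" in archive.namelist()
```

Calling looks_like_xlsx('E:/python_workspace/data_space/c_data.csv') would be expected to return True for the renamed file, and False for a file that was really saved as CSV.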

 

Open c_data.csv and use Save As to save it in CSV format under the name "a_data.csv", so the two files can be compared in the tests below.

 

1) demo1: first, assume that c_data.csv is still an Excel file even though its extension was changed, and read it with read_excel

import pandas as pd

data = pd.read_excel('E:/python_workspace/data_space/c_data.csv')
print(data)

It runs without any error and prints the data:

   id   name  score
0   1   小米   78.01   
1   2   小白   88.02   
2   3   小新   99.03   
3   4   小圆   99.04   
4   5   小羊    NaN

 

2) demo2: read the new CSV file "a_data.csv" with read_csv

import pandas as pd

data = pd.read_csv('E:/python_workspace/data_space/a_data.csv')
print(data)

After running it, the earlier error is gone, but an encoding problem appears instead:

Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1119, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1244, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1259, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1450, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 2: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:/python_workspace/MyWriter/CDA_Demo/pandas_pratices.py", line 3, in <module>
    data = pd.read_csv('E:/python_workspace/data_space/a_data.csv')
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 460, in _read
    data = parser.read(nrows)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 1198, in read
    ret = self._engine.read(nrows)
  File "E:\python_workspace\MyWriter\venv\lib\site-packages\pandas\io\parsers.py", line 2157, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1126, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1244, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1259, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1450, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 2: invalid continuation byte

 

Add the encoding argument:

import pandas as pd

data = pd.read_csv('E:/python_workspace/data_space/a_data.csv', encoding='gbk')
print(data)

After running it, the console prints:

   id   name  score
0   1   小米   78.01   
1   2   小白   88.02   
2   3   小新   99.03   
3   4   小圆   99.04   
4   5   小羊    NaN
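The UnicodeDecodeError in demo2 can be reproduced without any file. The sketch below assumes the CSV was saved in GBK, which is what the successful read with encoding='gbk' suggests. It tries UTF-8 first and falls back to GBK; the helper name decode_with_fallback and the candidate list are assumptions made for illustration.

```python
def decode_with_fallback(raw, encodings=("utf-8", "gbk")):
    """Try each encoding in order; return (text, encoding) for the first that works."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Bytes as a GBK-encoded CSV would store them (header and first row of the table above).
raw = "id,name,score\n1,小米,78.01\n".encode("gbk")

text, used = decode_with_fallback(raw)
print(used)
print(text)
```

UTF-8 is tried first on purpose: GBK accepts a very wide range of byte sequences, so putting it first could silently turn UTF-8 data into garbled text. The encoding name found this way could then be passed to read_csv through its encoding argument.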

 

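Taken together, the two tests show that renaming a .xlsx file to .csv changes only its name: the content is still an Excel workbook. read_excel can therefore still read it, while read_csv, which expects plain text, fails with the ParserError above.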

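If you run into this problem, open the file and save it again in CSV format (Save As); the new file can then be read with read_csv, adding an encoding argument such as encoding='gbk' if needed, as in demo2.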
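If re-saving the file is not convenient, another option is to pick the pandas reader from the file content instead of the extension. This is a rough sketch only: pick_reader and load_table are hypothetical names, the check covers only ZIP-based .xlsx workbooks (not the older .xls format), and the gbk default is an assumption taken from what worked for a_data.csv.

```python
import zipfile


def pick_reader(path):
    """Return the name of the pandas reader that matches the file's content."""
    # .xlsx workbooks are ZIP archives; plain CSV text is not.
    return "read_excel" if zipfile.is_zipfile(path) else "read_csv"


def load_table(path, csv_encoding="gbk"):
    """Load `path` with pandas, choosing the reader by content, not by extension."""
    import pandas as pd  # imported lazily so pick_reader works without pandas

    if pick_reader(path) == "read_excel":
        return pd.read_excel(path)
    # csv_encoding defaults to gbk only because that is what worked in demo2.
    return pd.read_csv(path, encoding=csv_encoding)
```

With this, load_table('E:/python_workspace/data_space/c_data.csv') would go through read_excel, while the re-saved a_data.csv would go through read_csv.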

 

That's all for this post~

 

posted @ 2023-05-14 17:22  lmei