Fork me on GitHub

使用pandas导入csv文件到MySQL

之前尝试过用命令行来解决csv文件导入到MySQL这个问题,没想到一直没有成功。之后会继续更新的吧,现在先用pandas来解决这个问题,虽然会复杂一点,但至少能用。

例子是导入movielens的rating数据,如下:

1. 创建表格

CREATE TABLE ratings(
    id BIGINT NOT NULL AUTO_INCREMENT, 
    userId BIGINT NOT NULL, 
    movieId BIGINT NOT NULL, 
    rating DOUBLE NOT NULL, 
    timestamp BIGINT NOT NULL, 
	PRIMARY KEY(id)
);

2. pandas 操作

下列操作如遇到no module,请自行用pip进行下载😄。

2.1. 创建与数据库的连接

from sqlalchemy import create_engine

engine = create_engine('mysql+mysqldb://root:password@localhost:3306/test?charset=utf8')

#format: create_engine(mysql+mysqldb://用户名:密码@localhost:端口/数据库名?编码)

上面的操作可能会遇到这个问题:

Traceback (most recent call last):
  ...
  ...
ImportError: dlopen(/Users/wooka/anaconda3/lib/python3.6/site-packages/_mysql.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libmysqlclient.21.dylib
  Referenced from: /Users/wooka/anaconda3/lib/python3.6/site-packages/_mysql.cpython-36m-darwin.so
  Reason: image not found

虽然没太弄懂是原因,但是应该是导入错误。尝试了一些方法,直接提供可行的解决方法:

原文是创建一个symbolic link,命令行如下,具体可见注释2:(将libmysqlclient.21.dylib 的版本改为自己的版本号)

sudo ln -s /usr/local/mysql/lib/libmysqlclient.21.dylib /usr/lib/libmysqlclient.21.dylib

2.2 导入csv为DataFrame

import pandas as pd
data = pd.read_csv('~/Desktop/ratings.csv', encoding='utf-8')

2.3 导出到MySQL

data.to_sql('ratings', con=engine, if_exists='append', index_label='id')

哈哈,我发现这样有错误:

sqlalchemy.exc.IntegrityError: (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry '1' for key 'PRIMARY'") [SQL: 'INSERT INTO ratings (id, `userId`, `movieId`, rating, timestamp)...

之前在数据库中创建表格的时候定义了主键id,且自动递增,而主键的index应该从1开始。然而这里的index_label明显是从0开始的,所以会产生这样的错误。

其实看了一些方法,有定义函数,有先创建表格后定义主键的。但是,这样最简便:

data.to_sql(con=engine, name='ratings', if_exists='replace', flavor='mysql', index=False)

with engine.connect() as con:
    con.execute('ALTER TABLE `ratings` ADD PRIMARY KEY (`id`);')

Reference:

  1. https://blog.csdn.net/qq_36523839/article/details/80588092
  2. https://stackoverflow.com/questions/6383310/python-mysqldb-library-not-loaded-libmysqlclient-18-dylib
  3. https://stackoverflow.com/questions/30867390/python-pandas-to-sql-how-to-create-a-table-with-a-primary-key
posted @ 2019-01-10 16:26  Byron_NG  阅读(4405)  评论(0编辑  收藏  举报