批量将excel文件导入到数据库
一、背景
1.1 需求
如下图,该文件夹下有许多csv文件,要将这些文件导入到数据库中,数据库中的表名与文件夹中的表名一致。
1.2 配置
Python 版本:3.9
Python编辑器:Pycharm 2022.2.1
数据库:
Microsoft SQL Server 2019 (RTM-GDR) (KB5021125) - 15.0.2101.7 (X64) Jan 23 2023 13:08:05 Copyright (C) 2019 Microsoft Corporation Developer Edition (64-bit) on Windows 10 Home China 10.0 <X64> (Build 22000: )
二、连接数据库
2.1 代码
import pymssql class MysqlOperation(object): def __init__(self, config): self.connection = pymssql.connect(server=config['server'], user=config['user'], password=config['password'], database=config['database'], charset='utf8' ) def read_sql(self, sql): with self.connection.cursor() as cursor: try: cursor.execute(sql) result = cursor.fetchall() return result except Exception as e: self.connection.rollback() # 回滚 print('事务失败', e) def insert_sql(self, sql): with self.connection.cursor() as cursor: try: cursor.execute(sql) self.connection.commit() except Exception as e: self.connection.rollback() print('事务失败', e) def update_sql(self, sql): # sql_update ="update user set username = '%s' where id = %d" with self.connection.cursor() as cursor: try: cursor.execute(sql) # 像sql语句传递参数 # 提交 self.connection.commit() except Exception as e: # 错误回滚 self.connection.rollback() def delect_sql(self, sql_delete): with self.connection.cursor() as cursor: try: cursor.execute(sql_delete) # 像sql语句传递参数 # 提交 self.connection.commit() except Exception as e: # 错误回滚 self.connection.rollback() def read_one(self, sql): with self.connection.cursor() as cursor: try: cursor.execute(sql) result = cursor.fetchone() return result except Exception as e: self.connection.rollback() # 回滚 print('事务失败', e)
2.2 说明
- 该段代码参考:https://www.cnblogs.com/qianslup/p/12567284.html。是用来操作mysql数据库的,有些代码可能不适用。未进行全部测试。
- 代码能否成功的关键是cursor、rollback、fetchall。
- 该段代码的文件名为:ssmsConnect ,下面会作为一个包导入。
三、 代码
3.1 思路
- 首先确定数据库连接成功,并进行SQL语句测试
- 将Excel表遍历,获的所有Excel表的路径,准备后面的循环。
- 读取Excel表的表名,列名,在数据库中建立相应的表
- 读取Excel表的内容,并以行为单位转化成可插入语句。
- 如果Excel表的内容过多,防止数据库过载,可以每次只插入一定量的数据,循环插入,直至全部插入完成。
- 可以选择一个较小的Excel表进行测试,Python中的所有SQL语句,都需要再SQL编辑器中执行一下,确保语句无误。
3.2 展示
# -*- coding: utf-8 -*- """ @time : 2023/9/11 19:18 @File: pb_01.py @Software: PyCharm @Author : qianslup @Version: python3.9 """ from ssmsConnect import MysqlOperation import math import os import pandas as pd import pymysql import pymssql # 配置数据库 config = {'server': 'QIANSL', 'user': 'sa', 'password': 'sa123456', 'database': 'cnblogs' } mssql = MysqlOperation(config=config) # 建表 def create_table(columns,table_name): sql_drop = 'Drop table if exists cnblogs.pb.['+table_name+']' # print(sql_drop) mssql.delect_sql(sql_drop) # 用删除的语句drop掉 print('drop表:'+table_name) sql_create = 'CREATE TABLE cnblogs.pb.['+table_name+'] (' for column in columns: sql_create = sql_create + ' [' +column + '] ' + 'nvarchar(255),' sql_create = sql_create[:len(sql_create)-1] sql_create = sql_create + ")" # print(sql_create) mssql.delect_sql(sql_create) # 建表 print('create表:'+table_name) def insert_sql(values, table_name, columns): sql_delete = 'delete from cnblogs.pb.['+table_name+']' mssql.delect_sql(sql_delete) # 先将表清空,防止表中已经有其他数据。表是新建的,不可能有其他数据,但还是习惯防错清空 print('delte表:'+table_name) # print(sql_delete) sql_insert_0 = 'insert into cnblogs.pb.[' + table_name + '](' for column in columns: sql_insert_0 = sql_insert_0 + ' [' +column + '], ' sql_insert_0 = sql_insert_0[:len(sql_insert_0)-2] n=1000 #每次做多插入1000行,可根据实际情况调整, 插入行数太多,数据库扛不住 len_v = math.ceil(len(values)/n) for i in range(len_v): value = values[i*n:(i+1)*n] sql_insert = sql_insert_0+") VALUES "+ ','.join(value) sql_insert = sql_insert.replace("'Null'", "Null") if len(columns) ==1: # 如果只有一列,插入SQL语句需要变化一下。 sql_insert = sql_insert.replace(",)", ")") # print(sql_insert) mssql.insert_sql(sql_insert) print("insert表:"+table_name) def convert_value(df): values =[] for v in df.iloc[:].values: values.append(f"{tuple(v)}") return values if __name__== '__main__': path_file = r'E:\PowerBI学习\零售源文件' for i in os.listdir(path_file): path_file_i = path_file+'\\'+i table_name = i.replace(".csv", "") df = pd.read_csv(path_file_i, encoding='utf-8') df.fillna("Null", inplace=True) columns = df.columns create_table(columns=columns, table_name=table_name) values = convert_value(df) insert_sql(values=values, table_name=table_name, columns=columns) # 调出一个表进行测试,无误之后再进行循环 # path_file = r'E:\PowerBI学习\零售源文件' # i = '02_Hour.csv' # path_file_i = path_file + '\\' + i # table_name = i.replace(".csv", "") # df = pd.read_csv(path_file_i, encoding='utf-8') # df.fillna("Null", inplace=True) # columns = df.columns # create_table(columns=columns, table_name=table_name) # values = convert_value(df) # insert_sql(values=values, table_name=table_name, columns=columns)
四、结果展示
4.1 查看表
所有excel表都被存在数据库中。
4.2 查看内容
主要是核对行数是否一致。