批量将excel文件导入到数据库

一、背景

1.1 需求

如下图,该文件夹下有许多csv文件,要将这些文件导入到数据库中,数据库中的表名与文件夹中的表名一致。

 

 

 

1.2 配置

Python 版本:3.9 

Python编辑器:Pycharm 2022.2.1

数据库:
Microsoft SQL Server 2019 (RTM-GDR) (KB5021125) - 15.0.2101.7 (X64)   Jan 23 2023 13:08:05   Copyright (C) 2019 Microsoft Corporation  Developer Edition (64-bit) on Windows 10 Home China 10.0 <X64> (Build 22000: ) 

 

二、连接数据库

2.1 代码

import pymssql

class MysqlOperation(object):
    def __init__(self, config):
        self.connection = pymssql.connect(server=config['server'],
                                          user=config['user'],
                                          password=config['password'],
                                          database=config['database'],
                                          charset='utf8'
                                          )

    def read_sql(self, sql):
        with self.connection.cursor() as cursor:
            try:
                cursor.execute(sql)
                result = cursor.fetchall()
                return result
            except Exception as e:
                self.connection.rollback()  # 回滚
                print('事务失败', e)

    def insert_sql(self, sql):
        with self.connection.cursor() as cursor:
            try:
                cursor.execute(sql)
                self.connection.commit()
            except Exception as e:
                self.connection.rollback()
                print('事务失败', e)

    def update_sql(self, sql):
        # sql_update ="update user set username = '%s' where id = %d"

        with self.connection.cursor() as cursor:
            try:
                cursor.execute(sql)  # 像sql语句传递参数
                # 提交
                self.connection.commit()
            except Exception as e:
                # 错误回滚
                self.connection.rollback()

    def delect_sql(self, sql_delete):
        with self.connection.cursor() as cursor:
            try:
                cursor.execute(sql_delete)  # 像sql语句传递参数
                # 提交
                self.connection.commit()
            except Exception as e:
                # 错误回滚
                self.connection.rollback()

    def read_one(self, sql):
        with self.connection.cursor() as cursor:
            try:
                cursor.execute(sql)
                result = cursor.fetchone()
                return result
            except Exception as e:
                self.connection.rollback()  # 回滚
                print('事务失败', e)

 

2.2 说明

  1. 该段代码参考:https://www.cnblogs.com/qianslup/p/12567284.html。是用来操作mysql数据库的,有些代码可能不适用。未进行全部测试。
  2. 代码能否成功的关键是cursor、rollback、fetchall。
  3. 该段代码的文件名为:ssmsConnect ,下面会作为一个包导入。

 

三、 代码

3.1 思路

  1. 首先确定数据库连接成功,并进行SQL语句测试
  2. 将Excel表遍历,获的所有Excel表的路径,准备后面的循环。
  3. 读取Excel表的表名,列名,在数据库中建立相应的表
  4. 读取Excel表的内容,并以行为单位转化成可插入语句。
  5. 如果Excel表的内容过多,防止数据库过载,可以每次只插入一定量的数据,循环插入,直至全部插入完成。
  6. 可以选择一个较小的Excel表进行测试,Python中的所有SQL语句,都需要再SQL编辑器中执行一下,确保语句无误。

 

3.2 展示

# -*- coding: utf-8 -*-
"""
@time :    2023/9/11  19:18
@File:     pb_01.py
@Software: PyCharm
@Author :  qianslup
@Version:  python3.9
"""


from ssmsConnect import MysqlOperation
import math
import os
import pandas as pd
import pymysql
import pymssql
# 配置数据库
config = {'server': 'QIANSL',
          'user': 'sa',
          'password': 'sa123456',
          'database': 'cnblogs'
          }
mssql = MysqlOperation(config=config)



# 建表

def create_table(columns,table_name):
    sql_drop = 'Drop table if exists cnblogs.pb.['+table_name+']'
    # print(sql_drop)
    mssql.delect_sql(sql_drop)         # 用删除的语句drop掉
    print('drop表:'+table_name)
    sql_create = 'CREATE TABLE cnblogs.pb.['+table_name+'] ('
    for column in columns:
        sql_create = sql_create + ' [' +column + '] ' + 'nvarchar(255),'
    sql_create = sql_create[:len(sql_create)-1]
    sql_create = sql_create + ")"
    # print(sql_create)
    mssql.delect_sql(sql_create)  # 建表
    print('create表:'+table_name)


def insert_sql(values, table_name, columns):
    sql_delete = 'delete from cnblogs.pb.['+table_name+']'
    mssql.delect_sql(sql_delete)               # 先将表清空,防止表中已经有其他数据。表是新建的,不可能有其他数据,但还是习惯防错清空
    print('delte表:'+table_name)
    # print(sql_delete)
    sql_insert_0 = 'insert into cnblogs.pb.[' + table_name + ']('
    for column in columns:
        sql_insert_0 = sql_insert_0 + ' [' +column + '], '
    sql_insert_0 = sql_insert_0[:len(sql_insert_0)-2]
    n=1000                                               #每次做多插入1000行,可根据实际情况调整, 插入行数太多,数据库扛不住
    len_v = math.ceil(len(values)/n)
    for i in range(len_v):
        value = values[i*n:(i+1)*n]
        sql_insert = sql_insert_0+") VALUES "+ ','.join(value)
        sql_insert = sql_insert.replace("'Null'", "Null")
        if len(columns) ==1:           # 如果只有一列,插入SQL语句需要变化一下。
            sql_insert = sql_insert.replace(",)", ")")
        # print(sql_insert)
        mssql.insert_sql(sql_insert)
    print("insert表:"+table_name)

def convert_value(df):
    values =[]
    for v  in df.iloc[:].values:
        values.append(f"{tuple(v)}")
    return values


if __name__== '__main__':
    path_file = r'E:\PowerBI学习\零售源文件'
    for i in os.listdir(path_file):
        path_file_i = path_file+'\\'+i
        table_name = i.replace(".csv", "")
        df = pd.read_csv(path_file_i, encoding='utf-8')
        df.fillna("Null", inplace=True)
        columns = df.columns
        create_table(columns=columns, table_name=table_name)
        values = convert_value(df)
        insert_sql(values=values, table_name=table_name, columns=columns)


# 调出一个表进行测试,无误之后再进行循环
# path_file = r'E:\PowerBI学习\零售源文件'
# i = '02_Hour.csv'
# path_file_i = path_file + '\\' + i
# table_name = i.replace(".csv", "")
# df = pd.read_csv(path_file_i, encoding='utf-8')
# df.fillna("Null", inplace=True)
# columns = df.columns
# create_table(columns=columns, table_name=table_name)
# values = convert_value(df)
# insert_sql(values=values, table_name=table_name, columns=columns)

 

 

四、结果展示

4.1 查看表

所有excel表都被存在数据库中。

 

 

4.2 查看内容

主要是核对行数是否一致。

 

posted @ 2023-09-11 22:28  qsl_你猜  阅读(110)  评论(0编辑  收藏  举报