Python Json分别存入Mysql、MongoDB数据库,使用Xlwings库转成Excel表格
前情提要:将 [第5天] Python 爬虫基础 - 小能日记 (cnblogs.com) 的电影数据 data.json 数据通过xlwings库转换成excel表格,存入mysql,mongodb数据库中。
数据下载:2020_3/data.json
xlwings文档: xlwings中文文档
MongoDB资料:2020_3/MongoDB 实战.pdf
总共用时:2小时 (代码在最后面)
学习内容:python基础语法、xlwings库、mysql库、pymongo库、mongoDB数据库复习
Excel表格
MYSQL
MongoDB
踩过的坑
1、‘gbk’ codec can’t decode byte 0xae
open(path+'/5_json/data.json',encoding = "utf-8")
2、python 打开文件,保存文件时相对路径报错
import os
# ^ 获取当前py脚本文件夹路径
path = os.path.dirname(__file__)
# path + '/test.txt' 合并为绝对路径
3、python中with...as的用法
晚点更
4、python中list与string的转换
(6条消息) python中list与string的转换_bufengzj的博客-CSDN博客_python str转list
晚点更
5、python mysql插入null数据
None if i['release'] == "" else i['release'],
将想要存储为Null的值填为None
6、python 集合里不能放列表,可以放元组
我的代码
# pip install xlwings -i https://pypi.tuna.tsinghua.edu.cn/simple
# pip install pymongo -i https://pypi.tuna.tsinghua.edu.cn/simple
# pip install mysql-connector -i https://pypi.tuna.tsinghua.edu.cn/simple
import json
import xlwings as xw
import os
import mysql.connector
import pymongo
import sys
import requests
# ^ 获取当前py脚本文件夹路径
path = os.path.dirname(__file__)
# ^ 阿里云OSS读取数据集文件
content = requests.get("https://xiaonenglife.oss-cn-hangzhou.aliyuncs.com/static/cnblogs/2020_3/films_data.json")
if content.status_code != 200:
print("文件加载不成功")
sys.exit() # ^ 退出程序
f = open(path+'/data.json', 'w', encoding="utf-8")
f.write(content.text)
f.close()
# ^ 加载json文件
# WARN 'gbk' codec can't decode byte 0xae in position
file = open(path+'/data.json', 'r', encoding="utf-8")
# ^ json转换为字典数组
films = json.loads(file.read())
print(file)
filmArr = []
for i in films:
filmArr.append([ # ^ 不用 list() 因为给定了八个参数,list只要一个
i['chineseName'],
i['foreignName'],
'、'.join(list(str(s) for s in i['tags'])),
i['address'],
i['time'],
i['release'],
i['desc'],
i['score']
])
# ^ excel
wb = xw.Book()
sht = wb.sheets['sheet1']
sht.range('A1').value = ['中文名', '外语名', '类别', '出版地', '时长', '发行日期', '介绍', '评分']
sht.range('A2').value = filmArr
wb.save(path+'/优秀电影.xlsx')
# ^ mysql
sqlDB = mysql.connector.connect(
host="localhost",
user="root",
passwd="sql2008",
auth_plugin='mysql_native_password' # ^ 验证方式必须要有
)
mycursor = sqlDB.cursor()
mycursor.execute("drop DATABASE if exists test")
mycursor.execute("CREATE DATABASE test")
sqlDB = mysql.connector.connect(
host="localhost",
user="root",
passwd="sql2008",
database="test",
auth_plugin='mysql_native_password'
)
mycursor = sqlDB.cursor()
mycursor.execute("drop table if exists films")
mycursor.execute("""
CREATE TABLE films(
id INT AUTO_INCREMENT PRIMARY KEY,
chinesename VARCHAR(255),
foreignName VARCHAR(255),
tags json,
address VARCHAR(255),
time decimal(5,1),
releasetime date,
description mediumtext,
score decimal(3,1)
)
""")
sql = "INSERT INTO films (chinesename, foreignName,tags,address,time,releasetime,description,score) VALUES (%s, %s,%s,%s,%s,%s,%s,%s)"
filmArr = [] # ^ 里面放元组
for i in films:
filmArr.append((
i['chineseName'],
i['foreignName'],
json.dumps(i['tags']),
i['address'],
i['time'],
None if i['release'] == "" else i['release'],
i['desc'],
i['score']
))
mycursor.executemany(sql, filmArr) # ^ 执行多条
sqlDB.commit() # ^ 修改数据库的时候必须加
print(mycursor.rowcount, "条数据被插入")
# ^ mongoDB
client = pymongo.MongoClient('mongodb://localhost:27017')
mongoDB = client['test'] # ^ 数据库
col = mongoDB['films'] # ^ 集合
col.drop() # ^ 删除集合
filmArr = []
for i in films:
filmArr.append(dict(
chineseName=i['chineseName'],
foreignName=i['foreignName'],
tags=i['tags'],
address=i['address'],
time=float(i['time']),
release=i['release'],
desc=i['desc'],
score=float(i['score'])
))
x = col.insert_many(filmArr)
# print(x.inserted_ids)
x = col.find_one()
print(x)
for x in col.find({}, {'chineseName': 1, 'time': 1, 'score': 1}).sort('score'):
print(x)
print('----------------------')
for x in col.find({}, {'chineseName': 1, 'time': 1, 'score': 1}).limit(5).sort('score', -1):
print(x)
print('----------------------')
# ^ 查找评分大于等于9.0且时长小于90分钟的电影
query = {'score': {"$gte": 9.0}, 'time': {"$lt": 90}}
for x in col.find(query, {'chineseName': 1, 'time': 1, 'score': 1}).sort('score'):
print(x)
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!
· 没有Manus邀请码?试试免邀请码的MGX或者开源的OpenManus吧
· 【自荐】一款简洁、开源的在线白板工具 Drawnix
· 园子的第一款AI主题卫衣上架——"HELLO! HOW CAN I ASSIST YOU TODAY
· Docker 太简单,K8s 太复杂?w7panel 让容器管理更轻松!