第十四篇 Mongodb数据库
Mongodb
windows安装
下载链接:https://www.mongodb.com/try/download/community (安装3.22.2 msi)
安装参考:https://www.runoob.com/mongodb/mongodb-mongodump-mongorestore.html
数据备份与恢复:
# 数据备份 mongodump -h dbhost -d dbname -o dbdirectory -h: 服务器地址 127.0.0.1:27017 -d: 需要备份的数据库实例,例如:test -o: 备份的数据存放位置,例如:c:\data\dump,该目录需提前建立 # 数据恢复 mongorestore -h <hostname><:port> -d dbname <path>(数据存放位置) 详情查看:https://www.runoob.com/mongodb/mongodb-mongodump-mongorestore.html
源码安装mongodb(ubuntu)
1 下载 到官网查找适合自己的版本 执行下载命令: root@ubuntu:~# wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu1404-3.4.7.tgz 2 创建mongodb安装目录以及日志记录 root@ubuntu:~# mkdir /data/mongodb root@ubuntu:~# mkdir /var/log/mongodb 3 创建mongodb配置文件 root@ubuntu:~# vim /etc/mongodb.conf 详情如下: #数据库安装位置 dbpath=/data/mongodb #日志存储位置 logpath=/var/log/mongodb/mongodb.log #是否不覆盖原日志 logappend=true #是否立即持久化存储 journal=true #是否以守护进程方式运行 fork=true #绑定使用IP bind_ip = 127.0.0.1 #绑定端口默认27017 #port = 27017 #是否以安全认证方式运行(即用户登录信息) noauth = true #auth = true 4 解压下载的文件 root@ubuntu:~# tar -xvf mongodb-linux-x86_64-ubuntu1404-3.4.7.tgz 5 将 mongo的shell工具拷贝到 usr/bin root@ubuntu:~# cp ./mongodb-linux-x86_64-ubuntu1404-3.4.7/bin/* /usr/bin/ 6 启动数据库 root@ubuntu:~# mongod --config /etc/mongodb.conf about to fork child process, waiting until server is ready for connections. forked process: 3850 child process started successfully, parent exiting 7 测试 root@ubuntu:~# mongo >show dbs admin 0.000GB local 0.000GB
密码验证相关
1配置密码 #a.在配置文件添加如下: security: authorization : enabled #b.设置管理员权限 user admin db.createUser({user:"root",pwd:"123",roles:["root"]}) #c.密码登录使用: mongo use root db.auth('root', '123456') 2 在 mongoengine中使用 import mongoengine # connect('tumblelog') # 无密码连接与登录 # conn = mongoengine.connect("meanning12",username='root', password='xxx',host="xxx",port=27017,authentication_source='admin') # 线上
#密码登录使用: mongo use admin db.auth('root', 'micagent20211015') 文章参考:https://www.hangge.com/blog/cache/detail_2613.html#:~:text=%E4%BD%86%E6%98%AF%E9%BB%98%E8%AE%A4%E6%83%85%E5%86%B5%E4%B8%8B%EF%BC%8C%E5%90%AF%E5%8A%A8%E7%9A%84%20MongoDB%20%E6%B2%A1%E6%9C%89%E7%99%BB%E5%BD%95%E5%AF%86%E7%A0%81%EF%BC%8C%E5%9C%A8%E7%94%9F%E4%BA%A7%E7%8E%AF%E5%A2%83%E4%B8%AD%E8%BF%99%E6%98%AF%E9%9D%9E%E5%B8%B8%E4%B8%8D%E5%AE%89%E5%85%A8%E7%9A%84%E3%80%82%20MongoDB%20%E5%85%B6%E5%AE%9E%E4%B9%9F%E5%8F%AF%E4%BB%A5%E5%BC%80%E5%90%AF%E7%99%BB%E5%BD%95%E5%AF%86%E7%A0%81%E9%AA%8C%E8%AF%81%EF%BC%8C%E4%BD%86%E6%98%AF%E4%B8%8D%E5%90%8C%E4%BA%8E,MySQL%20%E3%80%81%20Oracle%20%E7%AD%89%E5%85%B3%E7%B3%BB%E5%9E%8B%E6%95%B0%E6%8D%AE%E5%BA%93%EF%BC%8C%20MongoDB%20%E4%B8%AD%E6%AF%8F%E4%B8%80%E4%B8%AA%E5%BA%93%E9%83%BD%E6%9C%89%E7%8B%AC%E7%AB%8B%E7%9A%84%E5%AF%86%E7%A0%81%EF%BC%8C%E5%9C%A8%E5%93%AA%E4%B8%80%E4%B8%AA%E5%BA%93%E4%B8%AD%E5%88%9B%E5%BB%BA%E7%94%A8%E6%88%B7%E5%B0%B1%E8%A6%81%E5%9C%A8%E5%93%AA%E4%B8%80%E4%B8%AA%E5%BA%93%E4%B8%AD%E9%AA%8C%E8%AF%81%E5%AF%86%E7%A0%81%E3%80%82
其他 bug(重新启动,排查空间磁盘等。。)
1bug:mongo非正常退出,无法启动 将data/db目录下的 mongod.lock 和 diagnostic.data文件删掉,这时再重启mongodb 2 磁盘空间不足,少于2g(df -h) 参考:https://blog.csdn.net/weixin_43842451/article/details/106584027?utm_medium=distribute.pc_relevant.none-task-blog-2~default~baidujs_title~default-0.pc_relevant_paycolumn_v3&spm=1001.2101.3001.4242.1&utm_relevant_index=3 参考:https://www.codenong.com/cs106651111/#:~:text=exception%20in%20initAndListen%3A%2015926%20Insufficient%20free%20space%20for,files%20%E5%8E%9F%E5%9B%A0%E6%98%AF%E5%9B%A0%E4%B8%BA%20mongo%20%E7%9A%84%20journa%20%E7%9B%AE%E5%BD%95%E4%B8%8B%E7%A9%BA%E9%97%B4%E5%B0%8F%E4%BA%8E3379MB%E3%80%82%20journal%20%E8%87%B3%E5%B0%91%E4%BB%A52G%E7%9A%84%E6%95%B0%E9%87%8F%E8%BF%9B%E8%A1%8C%E5%A2%9E%E9%95%BF%EF%BC%8C%E5%BD%93%E7%A3%81%E7%9B%98%E7%A9%BA%E9%97%B4%E4%B8%8D%E8%B6%B3%E6%97%B6%EF%BC%8C%E5%B0%B1%E4%BC%9A%E6%8A%A5%E8%AF%A5%E9%94%99%E8%AF%AF%E3%80%82
SQL - 结构化查询语言 - 关系数据库全都同SQL来操作 MongoDB - MongoDB是一个NoSQL的数据库 - MongoDB是一款文档型数据库 - 数据库指的就是一个存储数据的仓库 数据库可以使我们完成对数据的持久化的操作 - MongoDB数据库中存储的数据的基本单位就是文档, MongoDB中存储的就是文档,所谓文档其实就是一个“JSON” - MongoDB中的“JSON”我们称为BSON,比普通的JSON的功能要更加的强大 - MongoDB数据库使用的是JavaScript进行操作的,在MongoDB含有一个对ES标准实现的引擎, 在MongoDB中所有ES中的语法中都可以使用 - 安装 - 配置环境变量 C:\Program Files\MongoDB\Server\3.2\bin - 在c盘根目录 - 创建一个文件夹 data - 在data中创建一个文件夹db - MongoDB的基本的指令 - 启动服务器 mongod --dbpath 路径 --port 端口号 - 启动客户端 mongo - 基本概念 数据库(database) 集合(collection) 文档(document) - 在MongoDB中,数据库和集合都不需要手动创建, 当我们创建文档时,如果文档所在的集合或数据库不存在会自动创建数据库和集合 - MongoDB的CRUD的操作 - 基本操作 use 数据库 - 进入指定的数据库 show dbs - 显示所有的数据库 show collections - 显示数据库中所有的集合 db - 显示当前所在的数据库 - 向数据库中插入文档 - db.collection.insert() - insert()可以向集合中插入一个或多个文档 - 例子:向test数据库中的,stus集合中插入一个新的学生对象 {name:"孙悟空",age:18,gender:"男"} db.stus.insert({name:"孙悟空",age:18,gender:"男"} - db.collection.insertOne() - 向集合中插入一个文档 - db.collection.insertMany() - 向集合中插入多个文档 - 查询数据库中的文档 - db.collection.find() - 可以根据指定条件从集合中查询所有符合条件的文档 - 返回的是一个数组 示例: db.stu.find({age:28}) - db.collection.findOne() - 查询第一个符合条件的文档 - 返回的是一个对象 - db.collection.find().count() - 查询符合条件的文档的数量 - 修改数据库中的文档 - db.collection.update() - 可以修改、替换集合中的一个或多个文档 - db.collection.updateOne() - 修改集合中的一个文档 - db.collection.updateMany() - 修改集合中的多个文档 - db.collection.replaceOne() - 替换集合中的一个文档 - 删除集合中的文档 - db.collection.remove() - 删除集合中的一个或多个文档(默认删除多个) - db.collection.deleteOne() - 删除集合中的一个文档 - db.collection.deleteMany() - 删除集合中的多个文档 - 清空一个集合 db.collection.remove({}) - 删除一个集合 db.collection.drop() - 删除一个数据库 db.dropDatabase() 示例: 1查询文档 /* 查询 db.collection.find() - find()用来查询集合中所有符合条件的文档 - find()可以接收一个对象作为条件参数 {} 表示查询集合中所有的文档 {属性:值} 查询属性是指定值的文档 - find()返回的是一个数组 db.collection.findOne() - 用来查询集合中符合条件的第一个文档 - findOne()返回的是一个文档对象 db.collection.find({}).count() - 查询所有结果的数量 pretty() 方法以格式化的方式来显示所有文档。 ---db.col.find().pretty() */ db.stus.find({_id:"hello"}); db.stus.find({age:16 , name:"白骨精"}); db.stus.find({age:28}); db.stus.findOne({age:28}); db.stus.find({}).count(); 2插入文档 /* 向数据库插入文档 db.<collection>.insert() - 向集合中插入一个或多个文档 - 当我们向集合中插入文档时,如果没有给文档指定_id属性,则数据库会自动为文档添加_id 该属性用来作为文档的唯一标识 - _id我们可以自己指定,如果我们指定了数据库就不会在添加了,如果自己指定_id 也必须确保它的唯一性 db.collection.insertOne() - 插入一个文档对象 db.collection.insertMany() - 插入多个文档对象 */ db.stus.insert({name:"猪八戒",age:28,gender:"男"}); db.stus.insert([ {name:"沙和尚",age:38,gender:"男"}, {name:"白骨精",age:16,gender:"女"}, {name:"蜘蛛精",age:14,gender:"女"} ]); db.stus.insert({_id:"hello",name:"猪八戒",age:28,gender:"男"}); db.stus.find(); ObjectId() 3 修改文档 /* 修改 db.collection.update(查询条件,新对象) - update()默认情况下会使用新对象来替换旧的对象 - 如果需要修改指定的属性,而不是替换需要使用“修改操作符”来完成修改 $set 可以用来修改文档中的指定属性 $unset 可以用来删除文档的指定属性 - update()默认只会修改一个 db.collection.updateMany() - 同时修改多个符合条件的文档 db.collection.updateOne() - 修改一个符合条件的文档 db.collection.replaceOne() - 替换一个文档 */ db.stus.find({}); //替换 db.stus.update({name:"沙和尚"},{age:28}); db.stus.update( {"_id" : ObjectId("59c219689410bc1dbecc0709")}, {$set:{ gender:"男", address:"流沙河" }} ) db.stus.update( {"_id" : ObjectId("59c219689410bc1dbecc0709")}, {$unset:{ address:1 }} ) db.stus.updateMany( {"name" : "猪八戒"}, { $set:{ address:"猪老庄" } } ); db.stus.update( {"name" : "猪八戒"}, { $set:{ address:"呵呵呵" } } , { multi:true } ) db.stus.find() 4删除文档 /* db.collection.remove() - 删除一个或多个,可以第二个参数传递一个true,则只会删除一个 - 如果传递一个空对象作为参数,则会删除所有的 db.collection.deleteOne() db.collection.deleteMany() db.collection.drop() 删除集合 db.dropDatabase() 删除数据库 - 一般数据库中的数据都不会删除,所以删除的方法很少调用 一般会在数据中添加一个字段,来表示数据是否被删除 */ db.stu.deleteOne({name:"齐天大圣"}) db.stu.remove({name:"孙悟空"}, {justOne: false}) db.stus.insert([ { name:"zbj", isDel:0 }, { name:"shs", isDel:0 }, { name:"ts", isDel:0 } ]); db.stus.updateOne({name:"ts"},{$set:{isDel:1}}); db.stus.find({isDel:0})
/* 文档之间的关系 一对一(one to one) - 夫妻 (一个丈夫 对应 一个妻子) - 在MongoDB,可以通过内嵌文档的形式来体现出一对一的关系 一对多(one to many)/多对一(many to one) - 父母 - 孩子 用户 - 订单 文章 - 评论 - 也可以通过内嵌文档来映射一对多的关系 多对多(many to many) - 分类 - 商品 老师 - 学生 */ db.wifeAndHusband.insert([ { name:"黄蓉", husband:{ name:"郭靖" } },{ name:"潘金莲", husband:{ name:"武大郎" } } ]); db.wifeAndHusband.find(); //一对多 用户(users) 和 订单(orders) db.users.insert([{ username:"swk" },{ username:"zbj" }]); db.order.insert({ list:["牛肉","漫画"], user_id: ObjectId("59c47e35241d8d36a1d50de0") }); db.users.find() db.order.find() //查找用户swk的订单 var user_id = db.users.findOne({username:"zbj"})._id; db.order.find({user_id:user_id}); //多对多 db.teachers.insert([ {name:"洪七公"}, {name:"黄药师"}, {name:"龟仙人"} ]); db.stus.insert([ { name:"郭靖", tech_ids:[ ObjectId("59c4806d241d8d36a1d50de4"), ObjectId("59c4806d241d8d36a1d50de5") ] },{ name:"孙悟空", tech_ids:[ ObjectId("59c4806d241d8d36a1d50de4"), ObjectId("59c4806d241d8d36a1d50de5"), ObjectId("59c4806d241d8d36a1d50de6") ] } ]) db.teachers.find() db.stus.find()
示例:
//1.进入my_test数据库 use my_test //2.向数据库的user集合中插入一个文档 db.users.insert({ username:"sunwukong" }); //3.查询user集合中的文档 db.users.find(); //4.向数据库的user集合中插入一个文档 db.users.insert({ username:"zhubajie" }); //5.查询数据库user集合中的文档 db.users.find(); //6.统计数据库user集合中的文档数量 db.users.find().count(); //7.查询数据库user集合中username为sunwukong的文档 db.users.find({username:"sunwukong"}); //8.向数据库user集合中的username为sunwukong的文档,添加一个address属性,属性值为huaguoshan db.users.update({username:"sunwukong"},{$set:{address:"huaguoshan"}}); //9.使用{username:"tangseng"} 替换 username 为 zhubajie的文档 db.users.replaceOne({username:"zhubajie"},{username:"tangseng"}); //10.删除username为sunwukong的文档的address属性 db.users.update({username:"sunwukong"},{$unset:{address:1}}); //11.向username为sunwukong的文档中,添加一个hobby:{cities:["beijing","shanghai","shenzhen"] , movies:["sanguo","hero"]} //MongoDB的文档的属性值也可以是一个文档,当一个文档的属性值是一个文档时,我们称这个文档叫做 内嵌文档 db.users.update({username:"sunwukong"},{$set:{hobby:{cities:["beijing","shanghai","shenzhen"] , movies:["sanguo","hero"]}}}); db.users.find(); //12.向username为tangseng的文档中,添加一个hobby:{movies:["A Chinese Odyssey","King of comedy"]} db.users.update({username:"tangseng"},{$set:{hobby:{movies:["A Chinese Odyssey","King of comedy"]}}}) //13.查询喜欢电影hero的文档 //MongoDB支持直接通过内嵌文档的属性进行查询,如果要查询内嵌文档则可以通过.的形式来匹配 //如果要通过内嵌文档来对文档进行查询,此时属性名必须使用引号 db.users.find({'hobby.movies':"hero"}); //14.向tangseng中添加一个新的电影Interstellar //$push 用于向数组中添加一个新的元素 //$addToSet 向数组中添加一个新元素 , 如果数组中已经存在了该元素,则不会添加 db.users.update({username:"tangseng"},{$push:{"hobby.movies":"Interstellar"}}); db.users.update({username:"tangseng"},{$addToSet:{"hobby.movies":"Interstellar"}}); db.users.find(); //15.删除喜欢beijing的用户 db.users.remove({"hobby.cities":"beijing"}); //16.删除user集合 db.users.remove({}); db.users.drop(); show dbs; //17.向numbers中插入20000条数据 7.2s for(var i=1 ; i<=20000 ; i++){ db.numbers.insert({num:i}); } db.numbers.find() db.numbers.remove({}); //0.4s var arr = []; for(var i=1 ; i<=20000 ; i++){ arr.push({num:i}); } db.numbers.insert(arr);
//查询文档时,默认情况是按照_id的值进行排列(升序) //sort()可以用来指定文档的排序的规则,sort()需要传递一个对象来指定排序规则 1表示升序 -1表示降序 //limit skip sort 可以以任意的顺序进行调用 db.emp.find({}).sort({sal:1,empno:-1}); //在查询时,可以在第二个参数的位置来设置查询结果的 投影(展示所要展示的字段)) db.emp.find({},{ename:1 , _id:0 , sal:1});
//18.查询numbers中num为500的文档 db.numbers.find({num:500}) //19.查询numbers中num大于5000的文档 db.numbers.find({num:{$gt:500}}); db.numbers.find({num:{$eq:500}}); //20.查询numbers中num小于30的文档 db.numbers.find({num:{$lt:30}}); //21.查询numbers中num大于40小于50的文档 db.numbers.find({num:{$gt:40 , $lt:50}}); //22.查询numbers中num大于19996的文档 db.numbers.find({num:{$gt:19996}}); //23.查看numbers集合中的前10条数据 db.numbers.find({num:{$lte:10}}); //limit()设置显示数据的上限 db.numbers.find().limit(10); //在开发时,我们绝对不会执行不带条件的查询 db.numbers.find(); //24.查看numbers集合中的第11条到20条数据 /* 分页 每页显示10条 1-10 0 11-20 10 21-30 20 。。。 skip((页码-1) * 每页显示的条数).limit(每页显示的条数); skip()用于跳过指定数量的数据 MongoDB会自动调整skip和limit的位置 */ db.numbers.find().skip(10).limit(10); //25.查看numbers集合中的第21条到30条数据 db.numbers.find().skip(20).limit(10); db.numbers.find().limit(10).skip(10);
//26.将dept和emp集合导入到数据库中 db.dept.find() db.emp.find() //27.查询工资小于2000的员工 db.emp.find({sal:{$lt:2000}}); //28.查询工资在1000-2000之间的员工 db.emp.find({sal:{$lt:2000 , $gt:1000}}); //29.查询工资小于1000或大于2500的员工 db.emp.find({$or:[{sal:{$lt:1000}} , {sal:{$gt:2500}}]}); //30.查询财务部的所有员工 //(depno) var depno = db.dept.findOne({dname:"财务部"}).deptno; db.emp.find({depno:depno}); //31.查询销售部的所有员工 var depno = db.dept.findOne({dname:"销售部"}).deptno; db.emp.find({depno:depno}); //32.查询所有mgr为7698的所有员工 db.emp.find({mgr:7698}) //33.为所有薪资低于1000的员工增加工资400元 db.emp.updateMany({sal:{$lte:1000}} , {$inc:{sal:400}}); db.emp.find()
综合使用
/*查找小于result.record_time的文档个数*/ db.ai_data.find({"result.record_time":{$lt:new Date(2022,10,1)}}).count(); /* 删除指定日期的文档*/ db.ai_data.remove({"result.record_time":{$lt:new Date(2022,8,1)}});
pymongo模块
1.连接mongo
# -*- coding:utf-8 -*- import pymongo # 导入pymongo模块 def mongodb_init01(): """连接mongodb""" mongo = pymongo.MongoClient(host='127.0.0.1', port=27017, tz_aware=True) def mongodb_init02(): """连接mongodb""" uri = "mongodb://{}:{}".format('127.0.0.1', 27017) mongo = pymongo.MongoClient(uri, tz_aware=True)
2.操作文档
emp1 = {"_id":1,"name":"武大郎","sex":"male","age":18,"hire_date":"20170301","post":"烧饼检察官","salary":7300.33} emp2 = {"_id":2,"name":"武松","sex":"male","age":78,"hire_date":"20150302","post":"公务员","salary":71000000.31} emp3 = {"_id":3,"name":"宋江","sex":"male","age":81,"hire_date":"20130305","post":"公务员","salary":78300} emp4 = {"_id":4,"name":"林冲","sex":"male","age":73,"hire_date":"20140701","post":"公务员","salary":73500} emp5 = {"_id":5,"name":"林冲","sex":"male","age":73,"hire_date":"20140701","post":"公务员","salary":73500} emp6 = {"_id":6,"name":"柴进","sex":"male","age":28,"hire_date":"20121101","post":"公务员","salary":72100} emp7 = {"_id":7,"name":"卢俊义","sex":"female","age":18,"hire_date":"20110211","post":"公务员","salary":79000} emp8 = {"_id":8,"name":"高俅","sex":"male","age":18,"hire_date":"19000301","post":"公务员","salary":730000} emp9 = {"_id":9,"name":"鲁智深","sex":"male","age":48,"hire_date":"20101111","post":"公务员","salary":710000} emp10 = {"_id":10,"name":"史进","sex":"female","age":48,"hire_date":"20150311","post":"打手","salary":73000.13} emp11 = {"_id":11,"name":"李逵","sex":"female","age":38,"hire_date":"20101101","post":"打手","salary":72000.35} emp12 = {"_id":12,"name":"周通","sex":"female","age":18,"hire_date":"20110312","post":"打手","salary":71000.37} emp13 = {"_id":13,"name":"石秀","sex":"female","age":18,"hire_date":"20160513","post":"打手","salary":73000.29} emp14 = {"_id":14,"name":"李忠","sex":"female","age":28,"hire_date":"20170127","post":"打手","salary":74000.33} emp15 = {"_id":15,"name":"吴用","sex":"male","age":28,"hire_date":"20160311","post":"文人","salary":710000.13} emp16 = {"_id":16,"name":"萧让","sex":"male","age":18,"hire_date":"19970312","post":"文人","salary":720000} emp17 = {"_id":17,"name":"安道全","sex":"female","age":18,"hire_date":"20130311","post":"文人","salary":719000} emp18 = {"_id":18,"name":"公孙胜","sex":"male","age":18,"hire_date":"20150411","post":"文人","salary":718000} emp19 = {"_id":19,"name":"朱贵","sex":"female","age":18,"hire_date":"20140512","post":"文人","salary":717000} db.emp.insertMany([emp1, emp2, emp3, emp4, emp5, emp6, emp7, emp8, emp9, emp10, emp11, emp12, emp13, emp14, emp15, emp16, emp17, emp18, emp19])
# -*- coding:utf-8 -*- import pymongo # 导入pymongo模块 from bson import ObjectId from pymongo import IndexModel from pymongo.database import Database mongo = pymongo.MongoClient(host='192.168.10.100', port=27017, tz_aware=True) def handler_db(): """操作数据库""" # 创建数据库 # db = Database(name='abc', client=mongo) db = mongo.abc print(db) # 删除数据库 print(mongo.drop_database('abc')) # 获取数据库 for db_name in mongo.list_database_names(): print(db_name) def handler_collection(): """操作集合""" # db = mongo.get_database('abc') # 获取数据库,如果没有会自动创建 db = mongo.abc # 创建一个集合 col = db.create_collection('col') # 创建集合,如果存在就会报错 print('创建的集合:', col) # 获取一个集合 # col = db.get_collection('col') # 获取集合,如果没有会自动创建集合 col = db.col # 获取集合,如果没有会自动创建集合 print('获取的集合:', col) # 获取所有集合 collection_names = db.list_collection_names() # 获取集合名称,返回list for name in collection_names: print('集合名称:', name) collections = db.list_collections() # 返回所有集合对象 for collection in collections: print('获取集合:', collection) # 删除集合 print(db.drop_collection('col')) # 删除集合,返回字典 def handler_index(): """操作索引""" # 创建升序索引 users = mongo.sxt.users r = users.create_index([('name', pymongo.ASCENDING)]) print(r) # 创建降序索引 r = users.create_index([('age', pymongo.DESCENDING)]) print(r) # 创建混合索引 r = users.create_index([('name', pymongo.ASCENDING), ('status', pymongo.DESCENDING)]) print(r) # 创建唯一索引 r = users.create_index([('user_id', pymongo.ASCENDING)], unique=True) print(r) # 删除索引 users.drop_index('name_1') # 如果索引不存在,就会报错 users.drop_indexes() # 删除所有索引 # 创建多个索引 user_id_index = IndexModel([('user_id', pymongo.ASCENDING)], unique=True) name_index = IndexModel([('name', pymongo.DESCENDING)]) r = users.create_indexes([user_id_index, name_index]) # 返回索引名称的list print(r) def handler_document(): """操作文档""" shsxt = Database(name='sxt', client=mongo) # 获取数据库 col = shsxt.get_collection(name='col') # 获取集合 # 添加 doc = {'name': 'xiaohuang', 'age': 18, 'sex': '男'} r = col.insert_one(doc) print(r.acknowledged, r.inserted_id) # acknowledged是否成功,inserted_id主键id doc1 = {'name': '老王', 'age': 20, 'sex': '男'} doc2 = {'name': '老李', 'age': 15, 'sex': '女'} rs = col.insert_many([doc1, doc2], ordered=True) print(rs.acknowledged, rs.inserted_ids) # acknowledged是否成功,inserted_id主键id # 删除 d = col.delete_one({'name': '老王'}) # 删除一条 print(d.acknowledged, d.deleted_count) # acknowledged是否成功,deleted_count删除记录数 d = col.delete_many({'name': '老李'}) # 删除多条 print(d.acknowledged, d.deleted_count) # acknowledged是否成功,deleted_count删除记录数 # 修改 upt = col.update_one(filter={'name': '老李'}, update={'$set': {'name': '小李', 'age': 19}}, upsert=True) print(upt.acknowledged, upt.matched_count, upt.modified_count, upt.raw_result, upt.upserted_id) upt = col.update_many(filter={'name': '老李'}, update={'$set': {'name': '小李', 'age': 19}}, upsert=True) print(upt.acknowledged, upt.matched_count, upt.modified_count, upt.raw_result, upt.upserted_id) # upt = col.update_one(filter={'_id': ObjectId("5b2d041fa3dbd1ad597db57e")}, update={'$set': {'name': '小李'}}) # print(upt.acknowledged, upt.matched_count, upt.modified_count, upt.raw_result, upt.upserted_id) if __name__ == '__main__': handler_document()
# -*- coding:utf-8 -*- from collections import OrderedDict import pymongo client = pymongo.MongoClient(host='192.168.10.100', port=27017, tz_aware=True) db = client.get_database('shsxt') emp = db.get_collection('emp') user = db.get_collection('user') def show(document): for d in document: print(d) def test_and(): """and操作""" ''' db.user.find({"$and":[ {"_id":{"$gte":3,"$lte":4}}, {"age":{"$gte":4}} ]}) ''' r = user.find({"$and": [ {"_id": {"$gte": 3, "$lte": 4}}, {"age": {"$gte": 4}} ]}) # 返回Cursor print(r) show(r) def test_or(): """or操作""" ''' db.user.find({"$or":[ {"_id":{"$lte":1,"$gte":0}}, {"_id":{"$gte":4}}, {"name":"tianqi"} ]}) ''' r = user.find({"$or": [ {"_id": {"$lte": 1, "$gte": 0}}, {"_id": {"$gte": 4}}, {"name": "tianqi"} ]}) show(r) def test_reg(): """正则操作""" # 匹配规则:z开头、n或u结尾,不区分大小写 # db.user.find({'name':/^z.*(n|u)$/i}) r = user.find({"name": {"$regex": '^z.*(n|u)$'}}) show(r) # db.users.find({name: /a/}) r = user.find({'name': {'$regex': 'a'}}) print() print('like %a%') show(r) # db.users.find({name: /^zh/}) // like 'pa%' r = user.find({'name': {'$regex': '^zh'}}) print() print('like zh%') show(r) # db.users.find({name: /qi$/}) // like'%ro' r = user.find({'name': {'$regex': 'qi$'}}) print() print('like %qi') show(r) def test_array(): """数组操作""" ''' #查看所有人的第1个到第3个爱好,第一个{}表示查询条件为所有,第二个是显示条件 db.user.find( {}, { "_id":0, "name":0, "age":0, "addr":0, "hobbies":{"$slice":[0,2]}, } ) ''' r = user.find({}, { "_id": 0, "name": 0, "age": 0, "addr": 0, "hobbies": {"$slice": [0, 2]}, }) show(r) def test_page(page=1, page_size=2): """分页操作""" # 获取结果集 r = user.find().skip((page - 1) * page_size).limit(page_size) show(r) # 获取总记录数 # count = db.user.find().count() count = user.count() print('总记录数:', count) def test_project(): """投影操作""" ''' db.user.find({ "name":/^z.*(n|u)$/i }, { "_id":0, "name":1, "age":1 } ) ''' r = user.find({"name": {"$regex": "^z.*(n|u)$"}}, { "_id": 0, "name": 1, "age": 1 }) show(r) def test_sort(): """排序操作""" ''' # 按姓名正序 db.user.find().sort({"name":1}) # 按年龄倒序 按id正序 db.user.find().sort({"age":-1,'_id':1}) ''' r = user.find({}, {'age': 1}).sort('age', pymongo.ASCENDING) show(r) print() r = user.find({}, {'age': -1, '_id': 1}).sort([('age', pymongo.DESCENDING), ('_id', pymongo.ASCENDING)]) show(r) def test_group(): """分组查询""" # {"$match": {"字段": "条件"}}, 可以使用任何常用查询操作符$gt,$lt,$in等 # select * from db1.emp where post='公务员'; r = db.emp.aggregate([{"$match": {"post": "公务员"}}]) # select * from db1.emp where id > 3 group by post; r = db.emp.aggregate([ {"$match": {"_id": {"$gt": 3}}}, {"$group": {"_id": "$post", 'avg_salary': {"$avg": "$salary"}}} ]) # select * from db1.emp where id > 3 group by post having avg(salary) > 10000; r = db.emp.aggregate([ {"$match": {"_id": {"$gt": 3}}}, {"$group": {"_id": "$post", 'avg_salary': {"$avg": "$salary"}}}, {"$match": {"avg_salary": {"$gt": 10000}}} ]) # {"$group": {"_id": 分组字段, "新的字段名": 聚合操作符}} # 1、将分组字段传给$group函数的_id字段即可 # {"$group": {"_id": "$sex"}} # 按照性别分组 # {"$group": {"_id": "$post"}} # 按照职位分组 # {"$group": {"_id": {"state": "$state", "city": "$city"}}} # 按照多个字段分组,比如按照州市分组 # 2、分组后聚合得结果,类似于sql中聚合函数的聚合操作符:$sum、$avg、$max、$min、$first、$last # 例1:select post,max(salary) from db1.emp group by post; r = db.emp.aggregate([{"$group": {"_id": "$post", "max_salary": {"$max": "$salary"}}}]) # 例2:取每个部门最大薪资与最低薪资 r = db.emp.aggregate( [{"$group": {"_id": "$post", "max_salary": {"$max": "$salary"}, "min_salary": {"$min": "$salary"}}}]) # 例3:如果字段是排序后的,那么$first,$last会很有用,比用$max和$min效率高 r = db.emp.aggregate([{"$group": {"_id": "$post", "first_id": {"$first": "$_id"}}}]) # 例4:求每个部门的总工资 r = db.emp.aggregate([{"$group": {"_id": "$post", "count": {"$sum": "$salary"}}}, {"$sort": {"count": 1}}]) # 例5:求每个部门的人数 r = db.emp.aggregate([{"$group": {"_id": "$post", "count": {"$sum": 1}}}, {"$sort": {"count": 1}}]) # 3、数组操作符 # {"$addToSet": expr} # 不重复 # {"$push": expr} # 重复 # 例:查询岗位名以及各岗位内的员工姓名:select post,group_concat(name) from db1.emp group by post; r = db.emp.aggregate([{"$group": {"_id": "$post", "names": {"$push": "$name"}}}]) r = db.emp.aggregate([{"$group": {"_id": "$post", "names": {"$addToSet": "$name"}}}]) show(r) def test_project02(): """投影操作""" # {"$project": {"要保留的字段名": 1, "要去掉的字段名": 0, "新增的字段名": "表达式"}} # select name,post,(age+1) as new_age from db1.emp; r = db.emp.aggregate([ { "$project": { "name": 1, "post": 1, "new_age": {"$add": ["$age", 1]} } } ]) show(r) def test_sort_limit_skip(): """排序限制跳过""" # 例1、取平均工资最高的前两个部门 r = db.emp.aggregate([ { "$group": {"_id": "$post", "平均工资": {"$avg": "$salary"}} }, { "$sort": {"平均工资": -1} }, { "$limit": 2 } ]) # 例2、取平均工资最高的第二个部门 r = db.emp.aggregate([ { "$group": {"_id": "$post", "平均工资": {"$avg": "$salary"}} }, { "$sort": {"平均工资": -1} }, { "$limit": 2 }, { "$skip": 1 } ]) show(r) def test_sample(): """随机获取""" # 随机获取3个文档 r = db.emp.aggregate([ {"$sample": {"size": 3}} ]) show(r) def test_str(): """字符串操作""" # 截取字符串 r = db.emp.aggregate([ { "$project": { "_id": 0, "str": {"$substr": ["$sex", 0, 2]} } } ]) # 拼接 r = db.emp.aggregate([ { "$project": { "name": 1, "post": 1, "name_sex": {"$concat": ["$name", "测试拼接", "$sex"]} } } ]) # 将性别的英文转为大写 r = db.emp.aggregate([{"$project": {"sex": {"$toUpper": "$sex"}}}]) show(r) if __name__ == '__main__': test_str()
3 示例:scrapy+mongo(爬取bibi排行榜相关数据)
bibi_ranke.py
import scrapy from bibi.items import BibiItem class BibiRankeSpider(scrapy.Spider): name = 'bibi_ranke' # allowed_domains = ['https://www.bilibili.com'] start_urls = ['https://www.bilibili.com/v/popular/rank/all'] def parse(self, response): titles =response.xpath('//div[@class="info"]//a[@class="title"]/text()').extract() play_nums =response.xpath('//div[@class="detail"]/span[@class="data-box"][1]/text()').extract() up_names =response.xpath("//div[@class='detail']/a//text()").extract() scores =response.xpath('//div[@class="info"]//div[@class="pts"]/div/text()').extract() play_nums=[i.strip() for i in play_nums] up_names=[i.strip() for i in up_names] # print(titles,play_nums,up_names,scores) # 打包 for title,play_num,up_name,score in zip(titles,play_nums,up_names,scores): item=BibiItem() item['play_num']=play_num item['title']=title item['up_name']=up_name item['score']=score yield item
item.py
# Define here the models for your scraped items # # See documentation in: # https://docs.scrapy.org/en/latest/topics/items.html import scrapy class BibiItem(scrapy.Item): _id =scrapy.Field() # define the fields for your item here like: title = scrapy.Field() play_num = scrapy.Field() up_name = scrapy.Field() score = scrapy.Field()
pipelines.py
# Define your item pipelines here # # Don't forget to add your pipeline to the ITEM_PIPELINES setting # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html # useful for handling different item types with a single interface from itemadapter import ItemAdapter import pymongo class BibiPipeline: # 爬虫开始时连接数据库 def open_spider(self,spider): self.client =pymongo.MongoClient() def process_item(self, item, spider): # 存入mongodb数据库中 self.client.bilibli.rank.insert_one(item) return item # 爬虫结束关闭数据库 def close_spider(self,spider): self.client.close()
# Scrapy settings for bibi project # # For simplicity, this file contains only settings considered important or # commonly used. You can find more settings consulting the documentation: # # https://docs.scrapy.org/en/latest/topics/settings.html # https://docs.scrapy.org/en/latest/topics/downloader-middleware.html # https://docs.scrapy.org/en/latest/topics/spider-middleware.html BOT_NAME = 'bibi' SPIDER_MODULES = ['bibi.spiders'] NEWSPIDER_MODULE = 'bibi.spiders' from fake_useragent import UserAgent # Crawl responsibly by identifying yourself (and your website) on the user-agent #USER_AGENT = 'bibi (+http://www.yourdomain.com)' USER_AGENT = UserAgent().random # 随机生成UA伪装 # Obey robots.txt rules ROBOTSTXT_OBEY = False # Configure maximum concurrent requests performed by Scrapy (default: 16) #CONCURRENT_REQUESTS = 32 # Configure a delay for requests for the same website (default: 0) # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay # See also autothrottle settings and docs DOWNLOAD_DELAY = 1 # The download delay setting will honor only one of: #CONCURRENT_REQUESTS_PER_DOMAIN = 16 #CONCURRENT_REQUESTS_PER_IP = 16 # Disable cookies (enabled by default) #COOKIES_ENABLED = False # Disable Telnet Console (enabled by default) #TELNETCONSOLE_ENABLED = False # Override the default request headers: #DEFAULT_REQUEST_HEADERS = { # 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', # 'Accept-Language': 'en', #} # Enable or disable spider middlewares # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html #SPIDER_MIDDLEWARES = { # 'bibi.middlewares.BibiSpiderMiddleware': 543, #} # Enable or disable downloader middlewares # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html #DOWNLOADER_MIDDLEWARES = { # 'bibi.middlewares.BibiDownloaderMiddleware': 543, #} # Enable or disable extensions # See https://docs.scrapy.org/en/latest/topics/extensions.html #EXTENSIONS = { # 'scrapy.extensions.telnet.TelnetConsole': None, #} # Configure item pipelines # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html ITEM_PIPELINES = { 'bibi.pipelines.BibiPipeline': 300, } # Enable and configure the AutoThrottle extension (disabled by default) # See https://docs.scrapy.org/en/latest/topics/autothrottle.html #AUTOTHROTTLE_ENABLED = True # The initial download delay #AUTOTHROTTLE_START_DELAY = 5 # The maximum download delay to be set in case of high latencies #AUTOTHROTTLE_MAX_DELAY = 60 # The average number of requests Scrapy should be sending in parallel to # each remote server #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0 # Enable showing throttling stats for every response received: #AUTOTHROTTLE_DEBUG = False # Enable and configure HTTP caching (disabled by default) # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings #HTTPCACHE_ENABLED = True #HTTPCACHE_EXPIRATION_SECS = 0 #HTTPCACHE_DIR = 'httpcache' #HTTPCACHE_IGNORE_HTTP_CODES = [] #HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
作者:华王
博客:https://www.cnblogs.com/huahuawang/