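Intermediate Python ----> Storing JSON data with pymongo
Here we introduce pymongo, the third-party library for working with MongoDB from Python, and use the requests library to write a simple crawler. The warmth and coldness of human relationships are like flowers blooming and fading; better to think of it as an inevitable season.
Installing pymongo and preparation
1. Installing and starting MongoDB
Test machine: Windows 10, MongoDB v3.4.0, Python 3.6.3.
MongoDB installation directory: D:\Database\DataBase\Mongo. Data directory: E:\data\database\mango\data. First we start the MongoDB server; the log below shows that the server started successfully on local port 27017.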
D:\Database\DataBase\Mongo\Server\3.4\bin>mongod --dbpath E:\data\database\mango\data
2017-11-21T20:48:38.458+0800 I CONTROL [initandlisten] MongoDB starting : pid=20484 port=27017 dbpath=E:\data\database\mango\data 64-bit host=Linux
2017-11-21T20:48:38.461+0800 I CONTROL [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2017-11-21T20:48:38.462+0800 I CONTROL [initandlisten] db version v3.4.0
2017-11-21T20:48:38.463+0800 I CONTROL [initandlisten] git version: f4240c60f005be757399042dc12f6addbc3170c1
2017-11-21T20:48:38.464+0800 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1t-fips 3 May 2016
2017-11-21T20:48:38.465+0800 I CONTROL [initandlisten] allocator: tcmalloc
2017-11-21T20:48:38.466+0800 I CONTROL [initandlisten] modules: none
2017-11-21T20:48:38.466+0800 I CONTROL [initandlisten] build environment:
2017-11-21T20:48:38.467+0800 I CONTROL [initandlisten] distmod: 2008plus-ssl
2017-11-21T20:48:38.468+0800 I CONTROL [initandlisten] distarch: x86_64
2017-11-21T20:48:38.469+0800 I CONTROL [initandlisten] target_arch: x86_64
2017-11-21T20:48:38.469+0800 I CONTROL [initandlisten] options: { storage: { dbPath: "E:\data\database\mango\data" } }
2017-11-21T20:48:38.491+0800 I - [initandlisten] Detected data files in E:\data\database\mango\data created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-11-21T20:48:38.493+0800 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=5573M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-11-21T20:48:39.931+0800 I CONTROL [initandlisten]
2017-11-21T20:48:39.933+0800 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-11-21T20:48:39.936+0800 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2017-11-21T20:48:39.940+0800 I CONTROL [initandlisten]
2017-11-21T20:48:41.253+0800 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory 'E:/data/database/mango/data/diagnostic.data'
2017-11-21T20:48:41.259+0800 I NETWORK [thread1] waiting for connections on port 27017
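Before connecting with a client, it can be handy to confirm from Python that something is listening on port 27017. The sketch below uses only the standard library's socket module. The helper name is_port_open is made up for illustration, and a successful TCP connect only shows that the port accepts connections, not that the listener is actually MongoDB.

```python
import socket


def is_port_open(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not listening".
        return False
```

While the mongod process started above is running, is_port_open() should return True.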
Starting the MongoDB client: D:\Database\DataBase\Mongo\Server\3.4\bin\mongo.exe. Double-click it to run.
MongoDB shell version v3.4.0
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.0
Server has startup warnings:
2017-11-21T20:48:39.931+0800 I CONTROL [initandlisten]
2017-11-21T20:48:39.933+0800 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-11-21T20:48:39.936+0800 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2017-11-21T20:48:39.940+0800 I CONTROL [initandlisten]
>
2. Installing pymongo for Python
pip install pymongo
Here is a brief introduction to using pymongo. The code below is taken from the getting-started example on GitHub.
>>> import pymongo
>>> client = pymongo.MongoClient("localhost", 27017)
>>> db = client.test
>>> db.name
u'test'
>>> db.my_collection
Collection(Database(MongoClient('localhost', 27017), u'test'), u'my_collection')
>>> db.my_collection.insert_one({"x": 10}).inserted_id
ObjectId('4aba15ebe23f6b53b0000000')
>>> db.my_collection.insert_one({"x": 8}).inserted_id
ObjectId('4aba160ee23f6b543e000000')
>>> db.my_collection.insert_one({"x": 11}).inserted_id
ObjectId('4aba160ee23f6b543e000002')
>>> db.my_collection.find_one()
{u'x': 10, u'_id': ObjectId('4aba15ebe23f6b53b0000000')}
>>> for item in db.my_collection.find():
...     print(item["x"])
...
10
8
11
>>> db.my_collection.create_index("x")
u'x_1'
>>> for item in db.my_collection.find().sort("x", pymongo.ASCENDING):
...     print(item["x"])
...
8
10
11
>>> [item["x"] for item in db.my_collection.find().limit(2).skip(1)]
[8, 11]
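The last line may look surprising: limit(2) is written before skip(1), yet the result is [8, 11] rather than [8]. MongoDB applies skip before limit regardless of the order in which the two are chained on the cursor. The following is a pure-Python model of that behaviour, not pymongo code. The function name apply_cursor_options is hypothetical, and the list stands in for the collection in insertion order.

```python
def apply_cursor_options(docs, skip=0, limit=0):
    """Model how a find() cursor combines skip and limit.

    The server skips first and limits second, no matter which method
    was chained first. A limit of 0 means "no limit".
    """
    remaining = docs[skip:]
    return remaining[:limit] if limit else remaining


# Documents in insertion order, as in the session above.
docs = [{"x": 10}, {"x": 8}, {"x": 11}]
result = [d["x"] for d in apply_cursor_options(docs, skip=1, limit=2)]
print(result)  # [8, 11]
```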
A pymongo usage example
1. A Python crawler that stores its data with pymongo
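import json

import pymongo
import requests


def requestData():
    # POST the query parameters as a JSON string and parse the JSON response.
    url = 'http://****.com/*.do'
    data = {
        'projectId': 90,
        'myTaskFlag': 1,
        'userId': 40
    }
    json_data = requests.post(url, data=json.dumps(data)).json()
    return json_data


def output_data(json_data):
    # Store the task list in the tasks collection of the test database.
    client = pymongo.MongoClient(host='localhost', port=27017)
    db = client.test
    collection = db.tasks
    tasks_data = json_data.get('taskList')
    # insert() is deprecated since pymongo 3.0; insert_many() expects a
    # non-empty list of documents, which taskList is assumed to be.
    if tasks_data:
        collection.insert_many(tasks_data)
    client.close()


if __name__ == '__main__':
    json_data = requestData()
    output_data(json_data)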
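The script above adds new documents on every run, so running it twice stores each task twice. One possible way around that is an upsert keyed on a field that identifies the task. The sketch below assumes that taskId is unique per task, which the original does not state. The helper name upsert_tasks is made up for illustration, and collection is expected to behave like a pymongo Collection, whose update_one method accepts upsert=True.

```python
def upsert_tasks(collection, tasks, key="taskId"):
    """Insert or update each task, matching existing documents on `key`.

    `collection` is expected to behave like a pymongo Collection.
    Returns the number of tasks processed.
    """
    count = 0
    for task in tasks or []:
        # upsert=True inserts the document when no match exists;
        # otherwise $set overwrites the stored fields.
        collection.update_one({key: task[key]}, {"$set": task}, upsert=True)
        count += 1
    return count
```

Inside output_data, the insert call could then be replaced by upsert_tasks(collection, tasks_data). This issues one round trip per task, which is fine for small lists.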
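We store the retrieved data in the tasks collection, using MongoDB's default test database. After the program has run, we can inspect the data from the MongoDB client: running db.tasks.find().pretty() lists every document in the tasks collection.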
{
    "_id" : ObjectId("5a1427a2edc9f04be40bc02d"),
    "taskId" : 1,
    "summary" : "PC版“个人信息”页面优化",
    "status" : 8,
    "categoryId" : 3,
    "creatorId" : 7,
    "projectId" : 1,
    "dateSubmit" : NumberLong("1481105108000"),
    "level" : 1,
    "handlerId" : 2,
    "ViewState" : 2,
    "priority" : 2
}
{
    "_id" : ObjectId("5a1427a2edc9f04be40bc02e"),
    "taskId" : 2,
    "summary" : "PC版“添加新任务”界面字体太大",
    "status" : 8,
    "categoryId" : 3,
    "creatorId" : 7,
    "projectId" : 1,
    "dateSubmit" : NumberLong("1481105195000"),
    "level" : 1,
    "handlerId" : 2,
    "ViewState" : 2,
    "priority" : 1
}
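The dateSubmit values in this output look like Unix timestamps in milliseconds. That is an assumption based on their magnitude, since the original does not say what the field holds. Under that assumption, a small helper (the name millis_to_datetime is hypothetical) can turn them into readable datetimes:

```python
from datetime import datetime, timezone


def millis_to_datetime(millis):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# dateSubmit of the first task in the output above.
print(millis_to_datetime(1481105108000).isoformat())  # 2016-12-07T10:05:08+00:00
```

The result is in UTC; on the +0800 test machine the same instant is 18:05:08 local time.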
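Author: huhx
Source: www.cnblogs.com/huhx
Motto: Only after you have done your best are you entitled to say your luck is bad.
Copyright: The copyright of this article is shared by the author huhx and cnblogs (博客园). Reposting is welcome. Unless the author agrees otherwise, this notice must be retained and a link to the original must be given in a prominent place on the article page; otherwise the right to pursue legal liability is reserved.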