MongoDB and pymongo

First, make sure the following are installed:

1. Python

2. MongoDB

3. pymongo

Importing the module

import pymongo

Connecting to MongoDB with MongoClient

from pymongo import MongoClient
client = MongoClient(host, port)

MongoClient creates the connection instance. MongoDB's default host and port are:

client = MongoClient('localhost', 27017)

Alternatively, use the MongoDB URI format:

client = MongoClient('mongodb://localhost:27017/')
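
To see what the URI form encodes, here is a small sketch that splits the same URI into its parts with Python's standard urllib.parse. It only illustrates the URI layout; it is not how pymongo itself parses connection strings, and the variable names are chosen for this example.

```python
from urllib.parse import urlsplit

# The same connection string used above.
uri = 'mongodb://localhost:27017/'

parts = urlsplit(uri)
scheme = parts.scheme    # the 'mongodb' prefix
host = parts.hostname    # the server name
port = parts.port        # the port, converted to an int

print(scheme, host, port)
```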

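Getting a database

A single client instance can access one or more independent databases. Assuming a database named test_database, you can get it with attribute-style access:

db = client.test_database

If the database name is not a valid Python attribute name (for example, it contains a hyphen), use dictionary-style access instead:

db = client['test-database']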

Getting a collection

A collection is a group of documents stored in MongoDB, roughly equivalent to a table in a relational database. You get one the same way you get a database:

collection = db.test_collection

Or:

collection = db['test-collection']

Data format

MongoDB stores data as JSON-style documents (BSON internally). In pymongo, documents are represented as dictionaries. For example:

>>> import datetime
>>> post = {"author": "Mike",
...         "text": "My first blog post!",
...         "tags": ["mongodb", "python", "pymongo"],
...         "date": datetime.datetime.now(tz=datetime.timezone.utc)}
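
One practical difference between plain JSON and what MongoDB stores: JSON has no date type, whereas MongoDB documents can hold datetime values directly. The sketch below uses only the standard json module, with a hypothetical sample_post, to show that a document like the one above is not directly JSON-serializable.

```python
import datetime
import json

# A hypothetical document shaped like the post above, with a fixed date.
sample_post = {"author": "Mike",
               "text": "My first blog post!",
               "tags": ["mongodb", "python", "pymongo"],
               "date": datetime.datetime(2009, 11, 12, 11, 14)}

try:
    json.dumps(sample_post)
    json_ok = True
except TypeError:
    # The standard json encoder has no representation for datetime.
    json_ok = False

# Converting unsupported values to strings works, but the type is lost.
as_json = json.dumps(sample_post, default=str)
round_tripped = json.loads(as_json)
print(json_ok, type(round_tripped["date"]).__name__)
```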

Inserting a document

Use the insert_one() method to insert a single document:

>>> posts = db.posts
>>> post_id = posts.insert_one(post).inserted_id
>>> post_id
ObjectId('...')

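If the document does not contain an "_id" key, one is added automatically when the document is inserted. The value of "_id" must be unique across the collection.

Inserting the post also creates the posts collection on the server, since collections are created lazily. We can verify this by listing the collections in the database:

>>> db.list_collection_names()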
[u'posts']

Querying a single document

The most basic query in pymongo is find_one(). It returns a single document; when several documents match the query, it returns the first one.

>>> import pprint
>>> pprint.pprint(posts.find_one())
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

The result is also returned as a dictionary.

find_one() also supports querying on specific fields. For example, to get the post whose author is Mike:

>>> pprint.pprint(posts.find_one({"author": "Mike"}))
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

If we query for an author that does not exist, such as Eliot, we get no result (find_one() returns None):

>>> posts.find_one({"author": "Eliot"})
>>>

Querying by _id

Because _id is unique, we can look up a document by its _id when we know it:

>>> post_id
ObjectId(...)
>>> pprint.pprint(posts.find_one({"_id": post_id}))
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

Note: the ObjectId in the example above is not a str, so querying with its string form finds nothing:

>>> post_id_as_str = str(post_id)
>>> posts.find_one({"_id": post_id_as_str}) # No result
>>>

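In web applications, a common task is to take an ID from the request URL and look up the matching document. In that case, remember to convert the string to an ObjectId before passing it to find_one():

from bson.objectid import ObjectId

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId:
    document = client.db.collection.find_one({'_id': ObjectId(post_id)})
    return document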
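
An ID taken from a URL may not be a valid ObjectId at all, and ObjectId() raises an exception for malformed input. The sketch below checks the usual string form, 24 hexadecimal characters, using only the standard library so it runs without a database. The helper name looks_like_object_id is hypothetical; in real code bson's ObjectId.is_valid() serves this purpose.

```python
import re

# An ObjectId's string form is 24 hexadecimal characters (12 bytes).
_OBJECT_ID_RE = re.compile(r'[0-9a-fA-F]{24}')

def looks_like_object_id(value):
    """Return True if value is a str that could be converted to an ObjectId."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None

print(looks_like_object_id('507f1f77bcf86cd799439011'))
print(looks_like_object_id('not-an-id'))
```

A request handler could run such a check first and respond with "not found" instead of letting the conversion fail.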

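Inserting multiple documents

Use insert_many() to insert several documents at once. It takes a list of documents and sends them to the server in a single command.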

>>> new_posts = [{"author": "Mike",
...               "text": "Another post!",
...               "tags": ["bulk", "insert"],
...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
...              {"author": "Eliot",
...               "title": "MongoDB is fun",
...               "text": "and pretty easy too!",
...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]
>>> result = posts.insert_many(new_posts)
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...')]

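Note: the second document has a "title" field that the first one lacks (and no "tags" field), yet the database raises no error. This is one of MongoDB's strengths: documents in a collection are not bound to a fixed schema.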
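
Because documents in the same collection can have different fields, code that reads them should not assume every key is present. A minimal sketch using plain dictionaries shaped like the two documents above (no database needed; the describe helper is hypothetical):

```python
# Plain dicts standing in for documents returned by a query.
documents = [
    {"author": "Mike", "text": "Another post!", "tags": ["bulk", "insert"]},
    {"author": "Eliot", "title": "MongoDB is fun", "text": "and pretty easy too!"},
]

def describe(doc):
    # dict.get() returns a default instead of raising KeyError for missing fields.
    title = doc.get("title", "(untitled)")
    tags = doc.get("tags", [])
    return "{}: {} [{}]".format(doc["author"], title, ", ".join(tags))

summaries = [describe(doc) for doc in documents]
for line in summaries:
    print(line)
```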

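Querying multiple documents

Use find() to query for more than one document. It returns a Cursor instance, which lets us iterate over all matching documents.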

>>> for post in posts.find():
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}

Like find_one(), find() supports conditional queries:

>>> for post in posts.find({"author": "Mike"}):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}

Counting

When we only need to know how many documents match a query, we can use count_documents() instead of fetching them.

>>> posts.count_documents({})
3

>>> posts.count_documents({"author": "Mike"})
2

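Range queries

MongoDB supports many kinds of advanced queries. For example, the query below restricts results to posts older than a given date and sorts them by author: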

>>> d = datetime.datetime(2009, 11, 12, 12)
>>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}
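
To make the semantics of this query concrete, the sketch below applies the same condition and ordering to an in-memory list of dictionaries. It only mirrors what {"date": {"$lt": d}} followed by sort("author") selects; the real filtering and sorting happen on the MongoDB server. The first post's date is a made-up value standing in for "later than d".

```python
import datetime

# Dicts standing in for the three posts; the first date is a made-up recent value.
docs = [
    {"author": "Mike", "text": "My first blog post!",
     "date": datetime.datetime(2017, 4, 19, 11, 53)},
    {"author": "Mike", "text": "Another post!",
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot", "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)},
]

d = datetime.datetime(2009, 11, 12, 12)

# Equivalent of {"date": {"$lt": d}}: keep documents strictly older than d.
matching = [doc for doc in docs if doc["date"] < d]

# Equivalent of .sort("author"): ascending order by the author field.
matching.sort(key=lambda doc: doc["author"])

authors = [doc["author"] for doc in matching]
print(authors)
```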

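Indexes

Indexes can speed up certain queries and can also add functionality, such as enforcing uniqueness. This example shows how to create and use a unique index.

First, create the index: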

>>> result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
...                                   unique=True)
>>> sorted(list(db.profiles.index_information()))
[u'_id_', u'user_id_1']

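The result lists two indexes: one on _id, which MongoDB creates automatically, and the one on user_id that we just created. Now let's add some user profiles: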

>>> user_profiles = [
...     {'user_id': 211, 'name': 'Luke'},
...     {'user_id': 212, 'name': 'Ziltoid'}]
>>> result = db.profiles.insert_many(user_profiles)

The index now prevents us from inserting a document whose user_id already exists in the collection:

>>> new_profile = {'user_id': 213, 'name': 'Drew'}
>>> duplicate_profile = {'user_id': 212, 'name': 'Tommy'}
>>> result = db.profiles.insert_one(new_profile)  # This is fine.
>>> result = db.profiles.insert_one(duplicate_profile)
Traceback (most recent call last):
DuplicateKeyError: E11000 duplicate key error index: test_database.profiles.$user_id_1 dup key: { : 212 }
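
Conceptually, a unique index keeps track of the indexed values already present and rejects an insert that repeats one. The toy class below models only that rule in plain Python; ToyUniqueIndex and its DuplicateKey exception are hypothetical names for illustration and say nothing about how MongoDB implements indexes. In real code you would catch pymongo.errors.DuplicateKeyError around insert_one().

```python
class DuplicateKey(Exception):
    """Stand-in for the error a unique index raises on a repeated value."""


class ToyUniqueIndex:
    def __init__(self, field):
        self.field = field
        self.seen = set()

    def insert(self, doc):
        # Reject the document if its indexed value was already inserted.
        value = doc[self.field]
        if value in self.seen:
            raise DuplicateKey("duplicate {}: {!r}".format(self.field, value))
        self.seen.add(value)


index = ToyUniqueIndex('user_id')
for profile in [{'user_id': 211, 'name': 'Luke'},
                {'user_id': 212, 'name': 'Ziltoid'},
                {'user_id': 213, 'name': 'Drew'}]:
    index.insert(profile)

try:
    index.insert({'user_id': 212, 'name': 'Tommy'})
    rejected = False
except DuplicateKey:
    rejected = True

print(rejected)
```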

posted @ 2017-04-19 11:53 cyoutetsu
