NoSQL Databases - CouchDB
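CouchDB is a rather interesting database. To summarize, its important characteristics are:
1. Its most distinctive feature is its file layout and commitment system, which is what lets it guarantee ACID properties. That is quite unusual among NoSQL stores; see 5.1.6.
2. It uses a view mechanism, which is very convenient: a view can be defined simply in JavaScript and generated with map/reduce logic. Note, however, that this is pseudo map/reduce: it runs on a single machine only and merely borrows the programming model.
There is one problem, though: views are updated at read time, so after a large number of data updates, view generation becomes very slow. Workarounds:
Query the view periodically from a cron job to trigger regular index updates, which reduces the time real read operations have to wait.
Version 1.1.0 added the stale=update_after option, which returns the old data first and then updates the index in the background.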
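3. A complete replication mechanism
CouchDB provides a very convenient, easy-to-use replication mechanism. When the network is down you can still read and write on any node without being affected, and once the network recovers the replicas synchronize automatically. This is one of CouchDB's signature features.
At the same time, I personally think it also exposes one of its weaknesses: scalability, specifically horizontal scalability.
CouchDB can scale horizontally only through replication and offers no sharding. In my view this does not really solve the horizontal scaling problem at all, because every read and write can only be completed on a single node, and even map/reduce operates on a single node's documents.
So CouchDB is an interesting DB whose strengths and weaknesses are both very pronounced; its file layout and append-only model in particular are well worth borrowing from.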
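Although CouchDB and MongoDB are both document DBs, the two really have nothing in common apart from the document as an abstract data model... totally different.
Compared with MongoDB's conventional, mainstream design philosophy, CouchDB's design seems very unorthodox, and is hard for most traditional database developers to accept and understand.
In certain scenarios, CouchDB can still be a good choice:
The data volume is not very large and there is no strong need for sharding
Machine nodes are unstable and may be added or removed at any time, and you do not want the service to be affected
Read operations are relatively fixed
Consistency requirements for reads are low, and some delay between a write and the corresponding read is acceptable
Write efficiency and atomicity are important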
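The two view-warming workarounds from point 2 of the summary above can be sketched as plain URL construction. This is only an illustration: the host, the database name "wiki", the design document "articles" and the view "by_category" are made-up placeholders, and using limit=0 for the cron request is an assumption of this sketch (it asks for no rows, so only the index update remains).

```javascript
// Build the URL for querying a CouchDB view.
// All argument values used below are placeholders for this sketch.
function viewUrl(host, db, designDoc, view, params = {}) {
  const query = new URLSearchParams(params).toString();
  const path =
    `/${encodeURIComponent(db)}/_design/${encodeURIComponent(designDoc)}` +
    `/_view/${encodeURIComponent(view)}`;
  return host + path + (query ? `?${query}` : "");
}

// Workaround 1: a cron job requests the view periodically so that the index
// is already fresh when a real reader arrives.
const warmUrl = viewUrl("http://localhost:5984", "wiki", "articles", "by_category", { limit: 0 });

// Workaround 2 (CouchDB 1.1.0 and later): accept the old index state now and
// let the index be updated after the response.
const staleUrl = viewUrl("http://localhost:5984", "wiki", "articles", "by_category", {
  stale: "update_after",
});

console.log(warmUrl);
console.log(staleUrl);
```

The first URL would go into a scheduled job; the second is what a latency-sensitive client would request directly.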
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Document databases are considered by many as the next logical step from simple key-/value-stores to slightly more complex and meaningful data structures as they at least allow key-/value-pairs to be encapsulated in documents. On the other hand there is no strict schema documents have to conform to, which eliminates the need for schema migration efforts (cf. [Ipp09]).
In this chapter Apache CouchDB and MongoDB as the two major representatives for the class of document databases will be investigated.
5.1. Apache CouchDB
5.1.1. Overview
CouchDB is a document database written in Erlang. The name CouchDB is nowadays sometimes referred to as “Cluster of unreliable commodity hardware” database.
CouchDB can be regarded as a descendant of Lotus Notes for which CouchDB’s main developer Damien Katz worked at IBM before he later initiated the CouchDB project on his own. A lot of concepts from Lotus Notes can be found in CouchDB: documents, views, distribution, and replication between servers and clients.
CouchDB can be briefly characterized as a document database which is accessible via a RESTful HTTP interface, containing schema-free documents in a flat address space.
The most notable use of CouchDB in production is Ubuntu One ([Can10a]; Ubuntu One appears to have since abandoned CouchDB), the cloud storage and replication service for Ubuntu Linux ([Can10b]). CouchDB is also part of the BBC’s new web application platform (cf. [Far09]). Furthermore some (less prominent) blogs, wikis, social networks, Facebook apps and smaller web sites use CouchDB as their datastore (cf. [C+10]).
http://wiki.apache.org/couchdb/
5.1.2. Data Model and Key Abstractions
Documents
The main abstraction and data structure in CouchDB is a document.
Documents consist of named fields that have a key/name and a value.
A fieldname has to be unique within a document and its assigned value may be a string (of arbitrary length), number, boolean, date, an ordered list or an associative map (cf. [Apa10a]).
Documents may contain references to other documents (URIs, URLs) but these do not get checked or held consistent by the database (cf. [PLL09]).
A further limitation is that documents in CouchDB cannot be nested (cf. [Ipp09]).
A wiki article may be an example of such a document:
{
  "Title": "CouchDB",
  "Last editor": "172.5.123.91",
  "Last modified": "9/23/2010",
  "Categories": ["Database", "NoSQL", "Document Database"],
  "Body": "CouchDB is a ...",
  "Reviewed": false
}
CouchDB considers itself as a semi-structured database.
While relational databases are designed for structured and interdependent data and key-/value-stores operate on uninterpreted, isolated key-/value-pairs,
document databases like CouchDB pursue a third path: data is contained in documents which do not correspond to a fixed schema (schema-free) but have some inner structure known to applications as well as the database itself.
The advantages of this approach are that first there is no need for schema migrations which cause a lot of effort in the relational databases world; secondly compared to key-/value-stores data
can be evaluated more sophisticatedly (e. g. in the calculation of views).
In the web application field there are a lot of document-oriented applications which CouchDB addresses as its data model fits this class of applications and the possibility to iteratively extend or change documents can be done with a lot less effort compared to a relational database (cf. [Apa10a]).
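It sits between relational databases and key/value stores: it makes schema migrations easy, and it can describe more complex structures than a KV store. It is mainly in the web application field that there are many suitable use cases...
The data model has no nesting and no hierarchy, just a single-level flat namespace that contains all the documents.
Each CouchDB database consists of exactly one flat/non-hierarchical namespace that contains all the documents which have a unique identifier (consisting of a document id and a revision number aka sequence id) calculated by CouchDB. This is because it does not support nesting.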
Document indexing is done in B-Trees which are indexing the document’s id and revision number (sequence id; cf. [Apa10b]).
Views
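CouchDB's way to query, present, aggregate and report the semi-structured document data is views (cf. [Apa10a], [Apa10b]).
This concept should be easy to understand and is used in many places: however your data is stored, you can generate all kinds of views on demand to suit different clients. You can think of it this way: in a relational database, each SELECT statement produces a view.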
A typical example for views is to separate different types of documents (such as blog posts, comments, authors in a blog system) which are not distinguished by the database itself as all of them are just documents to it ([PLL09]).
View definitions are strictly virtual and only display the documents from the current database instance, making them separate from the data they display and compatible with replication. CouchDB views are defined inside special **design documents** and can replicate across database instances like regular documents, so that not only data replicates in CouchDB, but entire application designs replicate too.
Views are defined by JavaScript functions which neither change nor save or cache the underlying documents but only present them to the requesting user or client application.
As all documents of the database are processed by a view’s functions this can be time consuming and resource intensive for large databases. Therefore a view is not created and indexed when write operations occur but on demand (at the first request directed to it) and updated incrementally when it is requested again.
View Indexes
Views are a dynamic representation of the actual document contents of a database, and CouchDB makes it easy to create useful views of data. But generating a view of a database with hundreds of thousands or millions of documents is time and resource consuming; it's not something the system should do from scratch each time.
To keep view querying fast, the view engine maintains indexes of its views, and incrementally updates them to reflect changes in the database. CouchDB’s core design is largely optimized around the need for efficient, incremental creation of views and their indexes.
Views and their functions are defined inside special “design” documents, and a design document may contain any number of uniquely named view functions. When a user opens a view and its index is automatically updated, all the views in the same design document are indexed as a single group.
Why are all Views in a single Index
For example:
view1: { "map": "function(doc) { if (doc.type === 'foo') { emit(key, value); } }", "reduce": "_count" }
view2: { "map": "function(doc) { if (doc.type === 'foo') { emit(key, value); } }", "reduce": "_sum" }
Here view1 and view2 have exactly the same map function. If they were in different design documents, there would be two b-trees (in two different index files) for exactly the same data.
Views are stored in design documents. Note that a design document and a view index are different things: the design document holds the view's definition, while the view index holds the result of running the view against a particular database.
To update a view, the component responsible for it (called view-builder) compares the sequence id of the whole database and checks if it has changed since the last refresh of the view.
While the view-builder is updating a view, data from the view’s old state can be read by clients. It is also possible to present the old state of the view to one client and the new one to another client as view indexes are also written in an append-only manner and the compaction of view data does not omit an old index state while a client is still reading from it (more on that in subsection 5.1.7).
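The sequence-id check and the incremental refresh described above can be sketched in a few lines. This is a toy in-memory model, not CouchDB's view-builder: the `Database` and `ViewIndex` classes and their method names are hypothetical, and `emit` is passed in explicitly instead of being a global.

```javascript
// Hypothetical in-memory database: every update gets a new sequence number.
class Database {
  constructor() {
    this.seq = 0;
    this.changes = [];
  }
  save(doc) {
    this.seq += 1;
    this.changes.push({ seq: this.seq, doc });
    return this.seq;
  }
  // Changes recorded after the given sequence number.
  changesSince(seq) {
    return this.changes.filter((change) => change.seq > seq);
  }
}

// View index that remembers the sequence number it was built up to.
class ViewIndex {
  constructor(map) {
    this.map = map;
    this.lastSeq = 0;
    this.rowsByDoc = new Map();
  }
  // Called on read: only documents changed since lastSeq are mapped again.
  refresh(db) {
    if (db.seq === this.lastSeq) return 0; // database unchanged, nothing to do
    const pending = db.changesSince(this.lastSeq);
    for (const { doc } of pending) {
      const rows = [];
      this.map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
      this.rowsByDoc.set(doc._id, rows); // replaces rows of an older revision
    }
    this.lastSeq = db.seq;
    return pending.length;
  }
  rows() {
    return [...this.rowsByDoc.values()].flat();
  }
}

const db = new Database();
db.save({ _id: "a", type: "foo", n: 1 });
db.save({ _id: "b", type: "bar", n: 2 });
const index = new ViewIndex((doc, emit) => {
  if (doc.type === "foo") emit(doc._id, doc.n);
});
const first = index.refresh(db); // maps both documents
db.save({ _id: "a", type: "foo", n: 5 });
const second = index.refresh(db); // maps only the one changed document
const third = index.refresh(db); // sequence unchanged, no work
console.log(first, second, third, index.rows());
```

The point of the sketch is the cost model: the first read pays for the whole database, later reads pay only for what changed in between.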
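What characterizes CouchDB views is that they are produced by map/reduce. When views must be generated dynamically over large data, this is the inevitable choice... but this m/r is single-node.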
The JavaScript functions defining a view are called map and reduce which have similar responsibilities as in Google’s MapReduce approach (cf. [DG04]).
The map function gets a document as a parameter, can do any calculation and may emit arbitrary data for it if it matches the view’s criteria; if the given document
does not match these criteria the map function emits nothing.
The data structure emitted by the map function is a triple consisting of the document id, a key and a value which can be chosen by the map function.
After the map function has been executed its results get passed to an optional reduce function which can do some aggregation on the view (cf. [PLL09]).
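To make the map/reduce description concrete, here is a small single-node simulation. It is a sketch under assumptions: the sample documents and the `type` and `categories` fields are invented, `runView` is a hypothetical stand-in for the view engine, and `emit` is passed as a parameter here whereas in a real CouchDB view function it is provided by the environment.

```javascript
// Sample documents; the `type` field is an application convention.
const docs = [
  { _id: "a1", type: "article", title: "CouchDB", categories: ["Database", "NoSQL"] },
  { _id: "a2", type: "article", title: "MongoDB", categories: ["Database"] },
  { _id: "u1", type: "author", name: "Alice" },
];

// Map: emit one (key, value) pair per category for matching documents,
// and emit nothing for documents that do not match the view's criteria.
function map(doc, emit) {
  if (doc.type === "article") {
    for (const category of doc.categories) emit(category, 1);
  }
}

// Reduce: aggregate the values emitted for one key.
function reduce(keys, values) {
  return values.reduce((a, b) => a + b, 0);
}

// Hypothetical single-node "view engine": run map over every document, keep
// (document id, key, value) triples sorted by key, then reduce per key.
function runView(allDocs, mapFn, reduceFn) {
  const rows = [];
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
  const grouped = new Map();
  for (const row of rows) {
    if (!grouped.has(row.key)) grouped.set(row.key, []);
    grouped.get(row.key).push(row.value);
  }
  const reduced = {};
  for (const [key, values] of grouped) reduced[key] = reduceFn([key], values);
  return { rows, reduced };
}

const result = runView(docs, map, reduce);
console.log(result.rows);
console.log(result.reduced);
```

The author document produces no rows at all, which is how a view separates document types that the database itself does not distinguish.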
5.1.3. Versioning
Documents are updated optimistically and update operations do not imply any locks.
If an update is issued by some client the contacted server creates a new document revision in a copy-on-modify manner (see section 3.3) and a history of recent revisions is stored in CouchDB until the database gets compacted the next time.
If a document is updated, not only the current revision number is stored but also a list of revision numbers preceding it, to allow the database (when replicating with another node or
processing read requests) as well as client applications to reason on the revision history in the presence of conflicting versions (cf. [PLL09]).
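Reading the academic prose this guy writes takes me back to doing reading-comprehension exercises: the sentences are so long that you really cannot follow them without reading carefully...
CouchDB's write strategy is called optimistic writing: it uses no locks at all. As with Dynamo, you can update freely without waiting for a lock.
This inevitably leads to conflicts, and resolving them requires knowing the temporal and causal relationships between updates. Dynamo records these with vector clocks, whereas CouchDB records revisions.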
CouchDB does not consider version conflicts as an exception but rather a normal case.
They can not only occur by different clients operating on the same CouchDB node but also due to clients operating on different replicas of the same database. It is not prohibited by the database to have an unlimited number of concurrent versions.
A CouchDB database can deterministically detect which versions of a document succeed each other and which are in conflict and have to be resolved by the client application.
Conflict resolution may occur on any replica node of a database, as the node receiving the resolved version transmits it to all replicas, which have to accept this version as valid. It may occur that conflict resolution is issued on different nodes concurrently; the locally resolved versions on both nodes then are detected to be in conflict and get resolved just like all other version conflicts (cf. [Apa10b]).
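CouchDB treats conflicts as normal and allows multiple concurrent versions to coexist, but it can detect which versions of the data are in conflict.
As in Dynamo, conflict resolution is carried out by the client, which knows the business logic and is therefore the right place for it.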
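How a revision history makes conflicts detectable can be shown with a deliberately simplified model. This is a sketch, not CouchDB's internal revision tree: each version is assumed to carry its current revision plus a flat list of the revisions preceding it, and the revision strings are invented.

```javascript
// Decide how two versions of the same document relate to each other.
function compareVersions(a, b) {
  if (a.rev === b.rev) return "same";
  if (b.history.includes(a.rev)) return "b succeeds a";
  if (a.history.includes(b.rev)) return "a succeeds b";
  return "conflict"; // neither version descends from the other
}

const base = { rev: "1-aaa", history: [] };
// Two clients update the same base version on different nodes.
const editOnNodeA = { rev: "2-bbb", history: ["1-aaa"] };
const editOnNodeB = { rev: "2-ccc", history: ["1-aaa"] };

const plainUpdate = compareVersions(base, editOnNodeA);
const concurrentEdits = compareVersions(editOnNodeA, editOnNodeB);
console.log(plainUpdate, concurrentEdits);
```

A plain update is recognized as a successor, while the two concurrent edits are reported as a conflict that the client application would have to resolve.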
5.1.4. Distribution and Replication
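CouchDB is designed for distributed setups that follow a peer approach where each server has the same set of responsibilities and there are no distinguished roles (like in master/slave-setups, standby-clusters etc.).
This is similar to MongoDB's replica sets, except that all nodes are peers in a decentralized design.
Because it does not need to maintain consistency the way MongoDB does, the decentralized design is simpler: every node is independent and can handle reads and writes on its own, which gives strong partition tolerance and availability.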
Different database nodes can by design operate completely independently and process read and write requests. Two database nodes can replicate databases (documents, document attachments, views) bilaterally if they can reach each other via the network.
The replication process works incrementally and can detect conflicting versions in a simple manner. From the current revision number as well as the list of outdated revision numbers CouchDB can determine whether two versions are conflicting or not;
if there are version conflicts both nodes have a notion of them and can escalate the conflicting versions to clients for conflict resolution;
if there are no version conflicts the node not having the most recent version of the document updates it (cf. [Apa10a], [Apa10b], [PLL09]).
The replication process operates incrementally and document-wise.
Incrementally means that only data changed since the last replication gets transmitted to another node and that not even whole documents are transferred but only changed fields and attachment-blobs;
document-wise means that each document successfully replicated does not have to be replicated again if a replication process crashes (cf. [Apa10b]).
Besides replicating whole databases CouchDB also allows for partial replicas. For these a JavaScript filter function can be defined which passes through the data for replication and rejects the rest of the database (cf. [Apa10b]). This partial replication mechanism can be used to shard data manually by defining different filters for each CouchDB node.
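A filter used for manual sharding might look like the sketch below. The filter has the `(doc, req)` shape that CouchDB replication filters use; everything else is an assumption for illustration: the `region` field, the `region` query parameter, and the `replicateFiltered` helper, which only mimics which documents a filtered replication would pass through.

```javascript
// Replication filter: return true for documents that should be replicated.
// The `region` field and query parameter are made up for this example.
function byRegion(doc, req) {
  return doc.region === req.query.region;
}

// Hypothetical helper standing in for a filtered replication run.
function replicateFiltered(sourceDocs, filter, query) {
  return sourceDocs.filter((doc) => filter(doc, { query }));
}

const source = [
  { _id: "1", region: "eu" },
  { _id: "2", region: "us" },
  { _id: "3", region: "eu" },
];

// One filter definition, a different parameter per target node.
const euNode = replicateFiltered(source, byRegion, { region: "eu" });
const usNode = replicateFiltered(source, byRegion, { region: "us" });
console.log(euNode.map((d) => d._id), usNode.map((d) => d._id));
```

Each node ends up with a disjoint slice of the data, but routing reads and writes to the right node remains the application's job.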
5.1.5. Interface
CouchDB databases are addressed via a RESTful HTTP interface that allows to read and update documents (cf. [Apa10b]).
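The shape of the HTTP requests for basic document operations can be sketched without contacting a server. The functions below only describe requests; the database name "wiki", the document id "couchdb" and the revision string are placeholders, and the helper names are hypothetical.

```javascript
// Create a document, or update it when the body carries the current _rev.
function putDocument(db, id, doc) {
  return {
    method: "PUT",
    path: `/${db}/${encodeURIComponent(id)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}

// Read the current revision of a document.
function getDocument(db, id) {
  return { method: "GET", path: `/${db}/${encodeURIComponent(id)}` };
}

// Delete a document; the revision being deleted is passed as a query parameter.
function deleteDocument(db, id, rev) {
  return {
    method: "DELETE",
    path: `/${db}/${encodeURIComponent(id)}?rev=${encodeURIComponent(rev)}`,
  };
}

const createReq = putDocument("wiki", "couchdb", { Title: "CouchDB", Reviewed: false });
const readReq = getDocument("wiki", "couchdb");
const deleteReq = deleteDocument("wiki", "couchdb", "1-abc");
console.log(createReq, readReq, deleteReq);
```

Because updates are optimistic, an update or delete names the revision it is based on; a request based on an outdated revision is rejected rather than blocking on a lock.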
5.1.6. ACID Properties
The CouchDB file layout and commitment system features all Atomic Consistent Isolated Durable properties.
File layout
On-disk, CouchDB never overwrites committed data or associated structures, ensuring the database file is always in a consistent state. This is a “crash-only" design where the CouchDB server does not go through a shut down process, it's simply terminated.
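This is CouchDB's most distinctive feature: all update operations (including document creation, modification and deletion) are performed by appending to the end of the couch file, which gives rise to the Multi-Version Concurrency Control (MVCC) model.
So writers need neither waiting nor locks, since the original data is never modified and new versions are simply appended. A crash does not matter much either: at worst some data whose update had not finished is lost, but the old data is unaffected.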
Document updates (add, edit, delete) are serialized, except for binary blobs which are written concurrently. Database readers are never locked out and never have to wait on writers or other readers. Any number of clients can be reading documents without being locked out or interrupted by concurrent updates, even on the same document. CouchDB read operations use a Multi-Version Concurrency Control (MVCC) model where each client sees a consistent snapshot of the database from the beginning to the end of the read operation.
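Update operations are serialized: they are all appends, and one append must finish before the next can begin.
Reads are completely unaffected: even if clients are concurrently modifying the document, you can still read it. This too is guaranteed by the append-only design.
Even better, isolation is supported as well: each client sees a consistent snapshot of the database from the beginning to the end of the read operation. (With append-only storage this is quite easy to implement: filter by a timestamp so that newer updates are filtered out.)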
Documents are indexed in B-trees by their name (DocID) and a Sequence ID. Each update to a database instance generates a new sequential number. Sequence IDs are used later for incrementally finding changes in a database. These b-tree indexes are updated simultaneously when documents are saved or deleted. The index updates always occur at the end of the file (append-only updates).
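To understand this part, refer to the material below, which contains a diagram of the structure of a CouchDB database file.
The database file is divided into a header and a body; the body stores the documents and the indexes.
When documents are stored, two B-tree indexes are built (on DocID and on Sequence ID). Although the vast majority of the B-tree data lives in the body, the root node of each B-tree is stored in the header.
Documents are updated by appending; when a document is updated its index must be updated too, and index updates are also appends.
The important point here is that if you merely keep appending to the body, that data is not visible to users. Why?
Because the index root is stored in the header. Every time a user reads data, the B-tree is traversed starting from the root node, so if the root node in the header is not updated, the data you reach is still the old version. In CouchDB all updates are appends; the sole exception is the root node in the header, which is overwritten. To make sure the root node update is correct, two identical copies of the header are kept.
So, as the diagram shows, the updated content (green) is appended to the body, but unless the root is updated users can still only see the old content (yellow). Only once the header update completes can users see the new content.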
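The visibility rule described above (appended data stays invisible until the root in the header is switched) can be modelled in a few lines. This is a toy sketch with hypothetical names: each appended entry is a full copy of the index, whereas a real B-tree would append only the changed path from leaf to root.

```javascript
// Toy append-only store. `body` only ever grows; `header.root` is the single
// value that gets overwritten and decides what readers can see.
class AppendOnlyStore {
  constructor() {
    this.body = []; // appended index states (stand-ins for B-tree nodes)
    this.header = { root: -1 }; // offset of the current root, -1 = empty
  }
  // Step 1: append a new root that includes the change. Not visible yet.
  // Updates are assumed to be serialized: one append+commit at a time.
  append(id, value) {
    const current = this.header.root >= 0 ? this.body[this.header.root] : {};
    this.body.push({ ...current, [id]: value });
    return this.body.length - 1; // offset of the new, uncommitted root
  }
  // Step 2: overwrite the root pointer; the change becomes visible at once.
  commit(offset) {
    this.header = { root: offset };
  }
  // A reader captures the root once and keeps that consistent snapshot.
  snapshot() {
    const root = this.header.root;
    return (id) => (root >= 0 ? this.body[root][id] : undefined);
  }
}

const store = new AppendOnlyStore();
store.commit(store.append("doc1", "old"));
const readerBefore = store.snapshot();

const pending = store.append("doc1", "new");
const duringWrite = store.snapshot()("doc1"); // header not updated yet
store.commit(pending);
const afterCommit = store.snapshot()("doc1");
const oldReader = readerBefore("doc1"); // still reads its original snapshot
console.log(duringWrite, afterCommit, oldReader);
```

A reader that started before the commit keeps seeing the old root, which is the snapshot behaviour of the MVCC model without any locking.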
Commitment system
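When a document is updated in CouchDB, the commit is split into the following two steps to keep the data consistent:
- The document data and index data are first written to the database file on disk
- Two consecutive headers (4 KB) are generated and then written to the database file
If a failure (system crash or power loss) occurs during step 1, the header of the couch file has not changed, so all the appended data is ignored. If a failure occurs during step 2, the header may be corrupted; the first and second headers are then verified, and if either one is usable the database file is usable.
This is how CouchDB guarantees that updates are atomic.
An ordinary database that needs to guarantee atomicity must have a rollback mechanism: because ordinary databases overwrite in place, if a change is half done when a crash occurs, whatever was already changed has to be changed back, which is fairly complex.
With CouchDB it is simple. Thanks to the append mechanism, new data does not take effect as long as the root node is unchanged, so an all-or-nothing mechanism is easy to implement.
To guard against a crash during a header update leaving the header data corrupted, two copies of the header are stored; if one is scrambled, the other can still be used to recover. Very convenient indeed.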
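The recovery rule for the duplicated header can be sketched as follows. This is an illustration under assumptions, not CouchDB's on-disk format: the header is modelled as a JSON payload plus a toy checksum, and all function names are hypothetical.

```javascript
// Toy checksum for the sketch (not what CouchDB actually uses).
function checksum(text) {
  let sum = 0;
  for (const ch of text) sum = (sum * 31 + ch.charCodeAt(0)) >>> 0;
  return sum;
}

// A header copy stores the root offset together with its checksum.
function makeHeader(root) {
  const payload = JSON.stringify({ root });
  return { payload, sum: checksum(payload) };
}

function isValid(header) {
  return header != null && checksum(header.payload) === header.sum;
}

// On open: use the first header copy that verifies.
function recoverRoot(headers) {
  const good = headers.find(isValid);
  if (!good) throw new Error("both header copies are corrupt");
  return JSON.parse(good.payload).root;
}

// Crash while overwriting the first copy: it is torn, but the second copy
// still describes the old root, so the half-finished update is ignored.
const tornFirst = [{ payload: '{"root":4', sum: makeHeader(42).sum }, makeHeader(10)];
const rootAfterFirstCrash = recoverRoot(tornFirst);

// Crash while overwriting the second copy: the first copy is already
// complete, so the update counts as committed.
const tornSecond = [makeHeader(42), { payload: '{"ro', sum: 0 }];
const rootAfterSecondCrash = recoverRoot(tornSecond);
console.log(rootAfterFirstCrash, rootAfterSecondCrash);
```

Either way the file opens with a root that points at a fully written state, which is the all-or-nothing behaviour described above.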
5.1.7. Compaction
Wasted space is recovered by occasional compaction. On schedule, or when the database file exceeds a certain amount of wasted space, the compaction process clones all the active data to a new file and then discards the old file. The database remains completely online the entire time and all updates and reads are allowed to complete successfully. The old file is deleted only when all the data has been copied and all users transitioned to the new file.
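Because updates are appended, after the database has been running for a while it needs to be "slimmed down" by cleaning out the old document data. This process is called compaction. The database remains available during compaction. Note, however, that compaction works by traversing the DBName.couch file and copying the latest data into a DBName.compact file, so the process can take up a lot of storage space. If you run compaction while the system is busy (mainly with writes), you may run out of disk space, so be careful!
The ten biggest headaches with CouchDB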
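The copy-the-live-data idea behind compaction can be sketched as follows. This is a simplified model with invented data: the old file is a list of appended revisions, and deleted documents are dropped entirely, which is a simplification of this sketch.

```javascript
// The old file is an append-only list of revisions; for each document id the
// entry that was appended last is the live one.
function compact(oldFile) {
  const latest = new Map();
  for (const entry of oldFile) latest.set(entry.id, entry); // last write wins
  return [...latest.values()].filter((entry) => !entry.deleted);
}

const oldFile = [
  { id: "a", rev: 1, body: "x" },
  { id: "b", rev: 1, body: "y" },
  { id: "a", rev: 2, body: "z" },
  { id: "b", rev: 2, deleted: true },
];

const newFile = compact(oldFile);

// Until the old file is discarded, both files occupy disk space.
const entriesOnDiskDuringCompaction = oldFile.length + newFile.length;
console.log(newFile, entriesOnDiskDuringCompaction);
```

The last line is the reason for the disk-space warning: the old file can only be deleted after the new one is complete, so both exist side by side for the duration of the copy.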
http://blog.nosqlfan.com/html/3667.html