Comparing Mongo DB and Couch DB
http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB, English
http://www.searchdatabase.com.cn/showcontent_46595.htm, Chinese
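The most fundamental difference is in the intended use cases, which comes down to different trade-offs under CAP.
MongoDB chooses to sacrifice availability in order to keep consistency and atomicity, while CouchDB gives up consistency in order to stay highly available. As a result, CouchDB feels more like a key-value/NoSQL store, while MongoDB feels very much like a relational database.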
We are getting a lot of questions "how are mongo db and couch different?" It's a good question: both are document-oriented databases with schemaless JSON-style object data storage. Both products have their place -- we are big believers that databases are specializing and "one size fits all" no longer applies.
We are not CouchDB gurus so please let us know in the forums if we have something wrong.
MVCC
One big difference is that CouchDB is MVCC based, and MongoDB is more of a traditional update-in-place store.
MVCC is very good for certain classes of problems:
- problems which need intense versioning;
- problems with offline databases that resync later;
- problems where you want a large amount of master-master replication happening.
Along with MVCC comes some work too:
- first, the database must be compacted periodically, if there are many updates.
- Second, when conflicts occur on transactions, they must be handled by the programmer manually (unless the db also does conventional locking -- although then master-master replication is likely lost).
MongoDB updates an object in-place when possible. Problems requiring high update rates of objects are a great fit; compaction is not necessary. Mongo's replication works great but, without the MVCC model, it is more oriented towards master/slave and auto failover configurations than to complex master-master setups. With MongoDB you should see high write performance, especially for updates.
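This is probably the biggest difference. MongoDB follows a strategy similar to traditional databases: update in-place, master/slave, auto failover. Its hallmark is efficiency, especially for updates.
CouchDB instead takes the less common append-only approach, which lets it support MVCC, master/master replication, and offline operation.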
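The contrast between the two storage models can be sketched in a few lines of Python. This is a toy illustration only, not how either database is implemented: the class names `AppendOnlyStore` and `InPlaceStore`, the integer revision numbers, and the `expected_rev` parameter are all assumptions made for the example.

```python
class AppendOnlyStore:
    """Toy MVCC-style store: every update appends a new revision."""

    def __init__(self):
        self.revisions = {}  # doc_id -> list of (rev_number, body)

    def put(self, doc_id, body, expected_rev=0):
        history = self.revisions.setdefault(doc_id, [])
        current_rev = history[-1][0] if history else 0
        if expected_rev != current_rev:
            # The writer based its change on a stale revision;
            # resolving the conflict is left to the caller.
            raise ValueError("conflict: expected rev %d, current rev %d"
                             % (expected_rev, current_rev))
        history.append((current_rev + 1, dict(body)))
        return current_rev + 1

    def compact(self):
        """Drop all but the latest revision of each document."""
        for doc_id in list(self.revisions):
            self.revisions[doc_id] = self.revisions[doc_id][-1:]


class InPlaceStore:
    """Toy update-in-place store: one copy per document, overwritten."""

    def __init__(self):
        self.docs = {}

    def increment(self, doc_id, field, amount=1):
        doc = self.docs.setdefault(doc_id, {})
        doc[field] = doc.get(field, 0) + amount
        return doc[field]


mvcc = AppendOnlyStore()
rev = mvcc.put("page", {"hits": 1})
rev = mvcc.put("page", {"hits": 2}, expected_rev=rev)
mvcc.compact()  # old revisions pile up until compaction runs

counters = InPlaceStore()
counters.increment("page", "hits")  # no history kept, nothing to compact
```

In the append-only sketch, old revisions accumulate until `compact()` runs and a stale writer gets a conflict it has to resolve itself. The in-place sketch keeps no history, which is why high-rate counter updates are cheap there.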
Horizontal Scalability
One fundamental difference is that a number of Couch users use replication as a way to scale.
With Mongo, we tend to think of replication as a way to gain reliability/failover rather than scalability. Mongo uses (auto) sharding as our path to scalability (sharding is GA as of 1.6). In this sense MongoDB is more like Google BigTable. (We hear that Couch might one day add partitioning too.)
This is also a major weakness of CouchDB. I do not think replication counts as a way to scale horizontally if partitioning is not supported...
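To make the distinction concrete, here is a minimal sketch of why partitioning scales data volume while plain replication does not. The classes `RangeShardedCluster` and `ReplicatedCluster` and the fixed split points are hypothetical. Real sharding also involves chunk migration, a config service, and routing processes, none of which is modeled here.

```python
import bisect


class RangeShardedCluster:
    """Toy cluster that routes each key to exactly one shard by key range."""

    def __init__(self, split_points):
        # split_points ["h", "p"] yields three shards:
        # keys < "h", "h" <= keys < "p", and keys >= "p".
        self.split_points = sorted(split_points)
        self.shards = [dict() for _ in range(len(self.split_points) + 1)]

    def shard_for(self, key):
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key, doc):
        self.shards[self.shard_for(key)][key] = doc

    def find(self, key):
        return self.shards[self.shard_for(key)].get(key)


class ReplicatedCluster:
    """Toy cluster where every node stores a full copy of the data."""

    def __init__(self, replica_count):
        self.replicas = [dict() for _ in range(replica_count)]

    def insert(self, key, doc):
        for replica in self.replicas:
            replica[key] = doc


sharded = RangeShardedCluster(["h", "p"])
replicated = ReplicatedCluster(3)
for name in ["alice", "henry", "zed"]:
    sharded.insert(name, {"name": name})
    replicated.insert(name, {"name": name})
```

After these inserts each shard holds one document, while each replica holds all three. Adding replicas adds read capacity and redundancy, but every node still has to store and apply every write.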
Query Expression
Couch uses a clever index building scheme to generate indexes which support particular queries. There is an elegance to the approach, although one must predeclare these structures for each query one wants to execute. One can think of them as materialized views.
Mongo uses traditional dynamic queries. As with, say, MySQL, we can do queries where an index does not exist, or where an index is helpful but only partially so. Mongo includes a query optimizer which makes these determinations. We find this is very nice for inspecting the data administratively, and this method is also good when we don't want an index: such as insert-intensive collections. When an index corresponds perfectly to the query, the Couch and Mongo approaches are then conceptually similar. We find expressing queries as JSON-style objects in MongoDB to be quick and painless though.
Update Aug2011: Couch is adding a new query language "UNQL".
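Like traditional database systems, MongoDB supports dynamic queries: arbitrary queries can be run even on fields that have no index. CouchDB is different. It does not support dynamic queries; you must create a view for each query pattern and then query against that view.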
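The two query styles can be imitated in plain Python. The sketch below is illustrative only. `matches`, `find`, and `build_view` are hypothetical helpers, and the matcher handles just equality plus the `$gt` and `$lt` operators. It does not model MongoDB's query optimizer or CouchDB's incremental B-tree view updates.

```python
def matches(doc, query):
    """Return True if doc satisfies a JSON-style query object.

    Supports equality and the $gt / $lt operators only.
    """
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$gt":
                    if value is None or not value > operand:
                        return False
                elif op == "$lt":
                    if value is None or not value < operand:
                        return False
                else:
                    raise ValueError("unsupported operator: " + op)
        elif value != condition:
            return False
    return True


def find(collection, query):
    """Dynamic query: scan every document, no index required."""
    return [doc for doc in collection if matches(doc, query)]


def build_view(collection, map_fn):
    """Predeclared view: run map_fn over each document, keep rows sorted by key."""
    rows = []
    for doc in collection:
        rows.extend(map_fn(doc))
    return sorted(rows, key=lambda row: row[0])


people = [
    {"name": "ann", "age": 31, "city": "Oslo"},
    {"name": "bob", "age": 25, "city": "Rome"},
    {"name": "cy", "age": 40, "city": "Oslo"},
]

# Ad hoc query, decided at query time.
older_in_oslo = find(people, {"city": "Oslo", "age": {"$gt": 35}})

# View declared up front; later lookups read the sorted rows.
by_city = build_view(people, lambda doc: [(doc["city"], doc["name"])])
```

The dynamic query needs no preparation but scans the whole collection when no index helps. The view costs work up front and only answers the question it was built for, much like a materialized view.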
Atomicity
Both MongoDB and CouchDB support concurrent modifications of single documents. Both forgo complex transactions involving large numbers of objects.
Both support atomic modifications of a single document.
Durability
CouchDB is a "crash-only" design where the db can terminate at any time and remain consistent.
Previous versions of MongoDB used a storage engine that would require a repairDatabase() operation when starting up after a hard crash (similar to MySQL's MyISAM). Version 1.7.5 and higher offer durability via journaling; specify the --journal command line option.
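CouchDB is append-only and adds a two-step commit on top, which guarantees data consistency: a crash only loses new data and never leaves old data inconsistent.
MongoDB behaves like a traditional database: unless journaling is enabled, a hard crash leaves the data inconsistent and a repair is needed.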
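The general idea behind journaling can be shown with a generic write-ahead log. This is a sketch of the technique in the abstract, not MongoDB's journal format or CouchDB's file layout. The function names `journaled_write` and `recover` and the one-JSON-record-per-line format are assumptions made for the example.

```python
import json
import os


def journaled_write(journal_path, data, key, value):
    """Append the change to the journal and sync it before applying it."""
    with open(journal_path, "a") as journal:
        journal.write(json.dumps({"key": key, "value": value}) + "\n")
        journal.flush()
        os.fsync(journal.fileno())  # the record is on disk before we proceed
    data[key] = value  # the in-memory (or in-place) update comes second


def recover(journal_path):
    """Rebuild state after a crash by replaying complete journal records."""
    data = {}
    if not os.path.exists(journal_path):
        return data
    with open(journal_path) as journal:
        for line in journal:
            try:
                entry = json.loads(line)
            except ValueError:
                break  # torn record from a crash mid-write: stop replaying
            data[entry["key"]] = entry["value"]
    return data
```

Because a record is synced before the data is touched, a crash can lose at most the write that was in flight. Everything already acknowledged can be replayed, so no full repair pass is needed.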
Map Reduce
Both CouchDB and MongoDB support map/reduce operations. For CouchDB map/reduce is inherent to the building of all views. With MongoDB, map/reduce is only for data processing jobs but not for traditional queries.
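CouchDB's map/reduce is only used for view queries on a single node, which is rather weak.
MongoDB's map/reduce is used for aggregating data across multiple shards, which is comparatively more dependable.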
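For readers unfamiliar with the pattern, the map/reduce flow itself is small enough to sketch in Python. Both databases express the map and reduce functions in JavaScript and run them server-side. The `map_reduce` helper, the sample documents, and the tag-counting job below are hypothetical and only show the shape of the computation on one machine.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run map_fn over each document, group emitted values by key, then reduce."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def emit_tags(doc):
    # Emit one (tag, 1) pair per tag on the document.
    return [(tag, 1) for tag in doc.get("tags", [])]


def count(key, values):
    return sum(values)


posts = [
    {"title": "a", "tags": ["db", "nosql"]},
    {"title": "b", "tags": ["db"]},
    {"title": "c"},
]
tag_counts = map_reduce(posts, emit_tags, count)
```

In CouchDB the result of such a job is kept as a view and updated as documents change. In MongoDB it is run as a batch job whose output is queried afterwards.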
Javascript
Both CouchDB and MongoDB make use of Javascript. CouchDB uses Javascript extensively, including in the building of views.
MongoDB supports the use of Javascript but more as an adjunct. In MongoDB, query expressions are typically expressed as JSON-style query objects; however one may also specify a javascript expression as part of the query. MongoDB also supports running arbitrary javascript functions server-side and uses javascript for map/reduce operations.
Simply put, the two use Javascript in different places.
REST
Couch uses REST as its interface to the database. With its focus on performance, MongoDB relies on language-specific database drivers for access to the database over a custom binary protocol. Of course, one could add a REST interface atop an existing MongoDB driver at any time -- that would be a very nice community project. Some early stage REST implementations exist for MongoDB.
Performance
Philosophically, Mongo is very oriented toward performance, at the expense of features that would impede performance. We see Mongo DB being useful for many problems where databases have not been used in the past because databases are too "heavy". Features that give MongoDB good performance are:
- client driver per language: native socket protocol for client/server interface (not REST)
- use of memory mapped files for data storage (which is why it is so memory-hungry)
- collection-oriented storage (objects from the same collection are stored contiguously)
- update-in-place (not MVCC)
- written in C++
Use Cases
It may be helpful to look at some particular problems and consider how we could solve them.
- if we were building Lotus Notes, we would use Couch as its programmer versioning reconciliation/MVCC model fits perfectly. Any problem where data is offline for hours then back online would fit this. In general, if we need several eventually consistent master-master replica databases, geographically distributed, often offline, we would use Couch.
- mobile
  - Couch is better as a mobile embedded database on phones, primarily because of its online/offline replication/sync capabilities.
  - we like Mongo server-side; one reason is its geospatial indexes.
- if we had very high performance requirements we would use Mongo. For example, web site user profile object storage and caching of data from other sources.
- for a problem with very high update rates, we would use Mongo, as its "update-in-place" design is well suited to that. For example, see updating real-time analytics counters.
- in contrast to the above, couch is better when lots of snapshotting is a requirement because of its MVCC design.
Generally, we find MongoDB to be a very good fit for building web infrastructure.