erlang 分布式数据库Mnesia 实现及应用
Since all operations performed by Dets are disk operations, it is important to realize that a single look-up operation involves a series of disk seek and read operations. For this reason, the Dets functions are much slower than the corresponding Ets functions, although Dets exports a similar interface.
Dets organizes data as a linear hash list and the hash list grows gracefully as more data is inserted into the table. Space management on the file is performed by what is called a buddy system. The current implementation keeps the entire buddy system in RAM, which implies that if the table gets heavily fragmented, quite some memory can be used up. The only way to defragment a table is to close it and then open it again with the repair option set to force.
- Mnesia
First of all, mnesia has no 2 gigabyte limit. It is limited on a 32bit architecture, but hardly any are present anymore for real work. And on 64bit, you are not limited to 2 gigabyte. I have seen databases on the order of several hundred gigabytes. The only problem is the initial start-up time for those.
- Very low latency K/V lookup, not necessarily linearizible.
- Proper transactions with linearizible changes (C in the CAP theorem). These are allowed to run at a much worse latency as they are expected to be relatively rare.
- On-line schema change
- Survival even if nodes fail in a cluster (where cluster is smallish, say 10-50 machines at most)
The design is such that you avoid a separate process since data is in the Erlang system already. You have QLC for datalog-like queries. And you have the ability to store any Erlang term.
Mnesia fares well if the above is what you need. Its limits are:
- You can't get a machine with more than 2 terabytes of memory. And loading 2 teras from scratch is going to be slow.
- Since it is a CP system and not an AP system, the loss of nodes requires manual intervention. You may not need transactions as well. You might also want to be able to seamlessly add more nodes to the system and so on. For this, Riak is a better choice.
- It uses optimistic locking which gives trouble if many processes tries to access the same row in a transaction.