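How many projects are gearing up to compete with Hadoop?
Which projects can go head to head with Hadoop, currently the most popular of them all? Below are the projects that, like Hadoop, implement the MapReduce distributed processing model: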
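1. Sector: implements its own GFS-like file system and processing library, and is used to process terabyte-scale astronomical data. See http://sector.sourceforge.net/
Its self-reported comparison with Hadoop is as follows: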

| | Hadoop | Sector |
|---|---|---|
| Storage Unit | Blocks. Better granularity, better disk usage; may reduce performance due to block lookup and movement; may waste disk space for small files. | Files. Good performance for lookup and wide area data transfer. Robust (no permanent metadata required). Requires users' knowledge to split files; may waste disk space when disks are near full. |
| Data Replication | Real time. Emphasizes data reliability, but slow. | Periodic. Favors fast IO with less reliability (but still provides long-term replicas). |
| Programming Model | MapReduce | Stream processing paradigm and MapReduce |
| Programming Language | System written in Java. Native programming language is Java, but supports any executable through Hadoop Streaming. | System written in C++. Native programming language is C++, but any program can be called by Sphere for data processing. |
| Data Transfer and Message Passing | TCP. Inefficient over wide area; sometimes requires parameter tuning. | UDP/UDT. High performance, firewall friendly, more secure, and tuning-free. |

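
The table mentions that Hadoop can run any executable through Hadoop Streaming. As a rough illustration of what that means, here is a minimal sketch of a word count in the Streaming style: the map step turns input lines into tab-separated `key<TAB>value` lines, and the reduce step receives those lines sorted by key. The function names `mapper` and `reducer` are made up for this example, and the `sorted()` call merely stands in for the sort Hadoop performs between the two phases.

```python
from itertools import groupby


def mapper(lines):
    # Emit one "word<TAB>1" line per word, the text format used by Streaming.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    # Input lines must already be sorted by key, so equal words are adjacent.
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(value) for _, value in group)}"


if __name__ == "__main__":
    text = ["to be or not", "to be"]
    shuffled = sorted(mapper(text))  # local stand-in for Hadoop's sort phase
    for line in reducer(shuffled):
        print(line)
```

In an actual Streaming job the two steps would most likely live in separate scripts that read from standard input and write to standard output; here they are plain generator functions so the sketch can run on one machine without a cluster.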
2. disco: the core is written in Erlang, and the external interface is Python.
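An M/R program written in Python:

    from disco.core import Disco, result_iterator

    # Note: Python 2 syntax (print statement, dict.iteritems).

    def fun_map(e, params):
        # Emit a (word, 1) pair for every word in the input entry.
        return [(w, 1) for w in e.split()]

    def fun_reduce(iter, out, params):
        # Sum the counts per word, then write each total to the output.
        s = {}
        for w, f in iter:
            s[w] = s.get(w, 0) + int(f)
        for w, f in s.iteritems():
            out.add(w, f)

    # Submit the job to the Disco master and wait for it to finish.
    results = Disco("disco://localhost").new_job(
        name = "wordcount",
        input = ["http://discoproject.org/chekhov.txt"],
        map = fun_map,
        reduce = fun_reduce).wait()

    for word, frequency in result_iterator(results):
        print word, frequency
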
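
To see what the word count above computes without setting up a Disco cluster, the same map and reduce logic can be run on a single machine. The following is only a sketch in Python 3: `run_local` is a hypothetical helper invented for this example and is not part of Disco, and the function signatures are simplified (no `params` or `out` arguments).

```python
from collections import defaultdict


def fun_map(line):
    # Emit a (word, 1) pair for every whitespace-separated word.
    return [(w, 1) for w in line.split()]


def fun_reduce(pairs):
    # Sum the counts for each word.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += int(count)
    return dict(totals)


def run_local(lines):
    # Hypothetical stand-in for the cluster: map every line, then reduce all pairs.
    pairs = []
    for line in lines:
        pairs.extend(fun_map(line))
    return fun_reduce(pairs)


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog"]
    for word, frequency in sorted(run_local(sample).items()):
        print(word, frequency)
```

On a real cluster the map calls would be spread across many nodes and the intermediate pairs would be partitioned before reduction; the local version collapses all of that into one loop.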
3. skynet: a MapReduce implementation in Ruby.
As for GFS-like systems, there is the KosMos File System (KFS, written in C++, which can replace HDFS in Hadoop), while Hypertable aims to become an alternative to HBase.