How many projects are set to compete with Hadoop?

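Which projects can take on Hadoop, currently the most popular option? The following projects implement the MapReduce distributed processing model, just as Hadoop does: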

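1. Sector: implements its own GFS-like file system and processing library, and has been used to process terabyte-scale astronomical data. See http://sector.sourceforge.net/
The project's own comparison with Hadoop reads as follows: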

Storage unit
    Hadoop: Blocks. Better granularity, better disk usage; may reduce performance due to block lookup and movement; may waste disk space for small files.
    Sector: Files. Good performance for lookup and wide area data transfer. Robust (no permanent metadata required). Requires users' knowledge to split files; may waste disk space when disks are near full.

Data replication
    Hadoop: Real time. Emphasizes data reliability, but slow.
    Sector: Periodic. Favors fast IO with less reliability (but still provides long term replicas).

Programming model
    Hadoop: MapReduce.
    Sector: Stream processing paradigm and MapReduce.

Programming language
    Hadoop: System written in Java. Native programming language is Java, but supports any executable through Hadoop Streaming.
    Sector: System written in C++. Native programming language is C++, but any program can be called by Sphere for data processing.

Data transfer and message passing
    Hadoop: TCP. Inefficient over wide area; sometimes requires parameter tuning.
    Sector: UDP/UDT. High performance, firewall friendly, more secure, and tuning-free.
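The comparison above notes that Hadoop supports any executable through Hadoop Streaming. Below is a minimal sketch of what such a mapper and reducer might look like, assuming the usual line-oriented convention: one record per line, key and value separated by a tab, and reducer input sorted by key. The function names (map_lines, reduce_lines, run_local) are hypothetical, and the sort step only stands in for the framework's shuffle; nothing here calls Hadoop itself.

```python
import itertools


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_records):
    """Reducer: records arrive sorted by key, so equal words are adjacent."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Local stand-in for the pipeline: map, sort (the 'shuffle'), reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    for record in run_local(["a b a", "b c"]):
        print(record)
```

In a real streaming job the mapper and reducer would be separate programs reading stdin and writing stdout; keeping them as generator functions here just makes the sketch easy to run on a single machine.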

2. disco: the core is written in Erlang, and the external interface is Python.

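An M/R program written in Python:
# Python 2 syntax (print statement, dict.iteritems), as in the original example.
from disco.core import Disco, result_iterator

def fun_map(e, params):
    # Emit a (word, 1) pair for every word in the input entry.
    return [(w, 1) for w in e.split()]

def fun_reduce(iter, out, params):
    # Sum the counts per word, then write each total to the output.
    s = {}
    for w, f in iter:
        s[w] = s.get(w, 0) + int(f)
    for w, f in s.iteritems():
        out.add(w, f)

# Submit the job to the Disco master and block until it finishes.
results = Disco("disco://localhost").new_job(
    name = "wordcount",
    input = ["http://discoproject.org/chekhov.txt"],
    map = fun_map,
    reduce = fun_reduce).wait()

for word, frequency in result_iterator(results):
    print word, frequency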
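
To see what the map and reduce functions compute without a running Disco master, the same word count can be traced in-process. This is only a sketch in Python 3: it does not use Disco at all, the helper names (group_by_key, run_wordcount) are hypothetical, and the simplified signatures drop the params and out arguments of the example above.

```python
from collections import defaultdict


def fun_map(entry):
    """Map step: one (word, 1) pair per word, as in the Disco example."""
    return [(word, 1) for word in entry.split()]


def group_by_key(pairs):
    """Stand-in for the framework's shuffle: collect values per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def fun_reduce(word, counts):
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(int(count) for count in counts)


def run_wordcount(entries):
    """Run map, group, and reduce in a single process."""
    intermediate = []
    for entry in entries:
        intermediate.extend(fun_map(entry))
    grouped = group_by_key(intermediate)
    return dict(fun_reduce(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_wordcount(["to be or", "not to be"]))
```

On a cluster, the map calls run in parallel on different inputs and the framework performs the grouping between the two phases; here everything happens sequentially in one process.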

3. skynet: a MapReduce implementation in Ruby.


As for GFS-like systems, there is the KosMos File System (KFS, written in C++, which can replace HDFS in Hadoop), while Hypertable aims to become an alternative to HBase.
posted @ 2010-04-27 22:07  searchDM