My naive MapReduce notes

Challenges in parallel computing

1. Node failures on huge clusters are frequent (~1000 nodes fail per day), and data stored only on a failed node is lost.

2. Network bottlenecks: with a network bandwidth of 1 Gbps, moving 10 TB of data takes approximately 1 day (see the quick check after this list).

3. Distributed programming is hard!
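
(Quick check on the 1-day figure in item 2: 10 TB is about 8 × 10^13 bits; at 1 Gbps = 10^9 bits/s, transferring it takes about 8 × 10^4 seconds ≈ 22 hours, i.e. roughly one day.)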

MapReduce addresses all three of these challenges:

1. MapReduce addresses data loss from node failures by storing data redundantly: each piece of data is replicated on multiple nodes, so even if one of those nodes fails, the data is still available on the others.

2. MapReduce addresses the network bottleneck by moving computation close to the data, which avoids copying large datasets around the network.

3. MapReduce provides a very simple programming model that hides the complexity of parallel computing: the programmer only writes a map function and a reduce function (see the sketch after this list), and the system takes care of scheduling, data movement, and fault tolerance.
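
To make item 3 concrete, here is a minimal sketch of the programming model in plain Python (no framework; the function names are my own, and the "shuffle" is simulated with a dictionary). The user code is only the map and reduce functions; in a real system those run in parallel across many machines. The classic example is word count:

    from collections import defaultdict

    def map_fn(document):
        # Map: emit a (word, 1) pair for every word in one input document.
        for word in document.split():
            yield (word, 1)

    def reduce_fn(word, counts):
        # Reduce: sum the partial counts for a single word.
        return word, sum(counts)

    def word_count(documents):
        # Simulated shuffle: group all map outputs by key (the word).
        grouped = defaultdict(list)
        for doc in documents:
            for word, one in map_fn(doc):
                grouped[word].append(one)
        # Reduce phase: one reduce call per distinct key.
        return dict(reduce_fn(w, counts) for w, counts in grouped.items())

    print(word_count(["the quick brown fox", "the lazy dog jumps over the fox"]))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1, 'jumps': 1, 'over': 1}

Everything outside map_fn and reduce_fn (partitioning the input, shuffling by key, retrying failed tasks) is the framework's job, which is exactly what makes the model simple for the programmer.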

 

posted @ 2016-06-01 10:25  Rui Yan