MapReduce: Simplified Data Processing on Large Clusters (Chinese Translation, Part 1)
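[Note: I am a beginner who enjoys Hadoop-style big data processing. I am currently studying the three Google papers that changed the world, and I am translating them as I read. Mistakes are inevitable, so I welcome your criticism; it will motivate me to keep learning.]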
MapReduce: Simplified Data Processing on Large Clusters
Authors: Jeffrey Dean and Sanjay Ghemawat
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
---------------------------Translation follows---------------------------
MapReduce: Simplified Data Processing on Large Clusters
Authors: Jeffrey Dean and Sanjay Ghemawat
Abstract
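MapReduce is a programming model, together with an associated implementation, for processing and generating large data sets. The user first specifies (writes) a map function that processes a key/value pair and produces a set of intermediate key/value pairs, and then specifies (writes) a reduce function that merges the values of all intermediate key/value pairs that share the same key. Many real-world tasks fit this model, as the paper shows.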
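Programs written in this functional style (MapReduce programs) are automatically parallelized and run on a large cluster of commodity machines. The run-time system takes care of the details: partitioning the input data, scheduling the program's execution across a set of machines (the cluster), handling machine failures, and managing the necessary communication between machines. This lets programmers with no experience in parallel computing or distributed systems make use of the abundant resources of a large distributed system more easily.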
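Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes terabytes of data on thousands of machines. Programmers find the system very easy to use: hundreds of MapReduce programs have been implemented at Google, and more than one thousand MapReduce jobs are run every day.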
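
To make the map/reduce description in the abstract more concrete, here is a minimal word-count sketch in Python. This is my own illustration, not code from the paper or from Hadoop: the names map_fn, reduce_fn, and run_mapreduce are hypothetical, and run_mapreduce is only a single-process stand-in for the run-time system, so it does no partitioning across machines, no scheduling, and no failure handling.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: (document name, document text) -> list of intermediate (word, 1) pairs."""
    return [(word, 1) for word in value.split()]


def reduce_fn(key, values):
    """Reduce: merge all intermediate values for one word into a total count."""
    return sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical single-process driver standing in for the run-time system."""
    # Group intermediate values by intermediate key (the "shuffle" step).
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    # Call reduce once per distinct intermediate key.
    return {ikey: reduce_fn(ikey, ivalues) for ikey, ivalues in intermediate.items()}


documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = run_mapreduce(documents, map_fn, reduce_fn)
print(counts)
```

The user only writes map_fn and reduce_fn; everything inside run_mapreduce is what a real system would do on the user's behalf, spread over many machines. In this toy input, "the" is counted three times and "fox" twice.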