MapReduce: Simplified Data Processing on Large Clusters (Chinese Translation, Part 3)

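[Note: I am a complete beginner who enjoys big data processing in the Hadoop ecosystem. I am currently studying the three Google papers that changed the world, translating as I read. Mistakes are inevitable, so criticism is welcome; it will motivate me to keep learning.]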

2 Programming Model

  The computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The user of the MapReduce library expresses the computation as two functions: Map and Reduce.

  Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
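  The grouping step described above can be illustrated with a minimal single-process sketch. The helper names (run_map_phase, group_by_key, first_letter_map) are hypothetical and chosen only for illustration; the real library performs this grouping across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def run_map_phase(map_fn: Callable[[str, str], Iterable[Pair]],
                  inputs: Iterable[Pair]) -> List[Pair]:
    """Apply the user's Map function to every input pair and collect
    all intermediate key/value pairs it produces."""
    intermediate: List[Pair] = []
    for key, value in inputs:
        intermediate.extend(map_fn(key, value))
    return intermediate


def group_by_key(pairs: Iterable[Pair]) -> Dict[str, List[str]]:
    """Group intermediate values that share the same intermediate key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def first_letter_map(key: str, value: str) -> Iterable[Pair]:
    """Hypothetical Map function: emit (first letter, word) for each word."""
    return [(word[0], word) for word in value.split()]


inputs = [("doc1", "apple avocado banana"), ("doc2", "blueberry apricot")]
grouped = group_by_key(run_map_phase(first_letter_map, inputs))
# Each key in `grouped` would be handed to one Reduce invocation.
```

  Each entry of grouped corresponds to one Reduce call: the key plus every value that any Map call emitted for it.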

  The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user's reduce function via an iterator. This allows us to handle lists of values that are too large to fit in memory.
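  To see why an iterator helps, here is a small sketch in which the values for a key are produced lazily and consumed one at a time, so the full list never has to exist in memory. The names sum_reduce and counts_stream are assumptions made for this example; counts_stream stands in for values the library would read back from storage.

```python
from typing import Iterable, Iterator


def sum_reduce(key: str, values: Iterator[str]) -> Iterable[str]:
    """Reduce function that consumes its values one at a time and
    yields a single output value."""
    total = 0
    for v in values:
        total += int(v)
    yield str(total)


def counts_stream(n: int) -> Iterator[str]:
    """Hypothetical stand-in for intermediate values supplied lazily."""
    for _ in range(n):
        yield "1"


# Only one value is held at a time, however long the stream is.
output = list(sum_reduce("the", counts_stream(100_000)))
```

  Because sum_reduce only keeps a running total, its memory use stays constant regardless of how many values arrive for the key.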

2.1 Example

  Consider the problem of counting the number of occurrences of each word in a large collection of documents. The user would write code similar to the following pseudo-code:

  map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
      EmitIntermediate(w, "1");

  reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
      result += ParseInt(v);
    Emit(AsString(result));

  The map function emits each word plus an associated count of occurrences (just '1' in this simple example). The reduce function sums together all counts emitted for a particular word.
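  The word-count pseudo-code can be turned into a runnable single-process sketch in Python. This is only an illustration of the programming model, not the library's implementation: run_word_count is a hypothetical driver that replaces the distributed runtime, and the sample documents are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(key: str, value: str) -> Iterable[Tuple[str, str]]:
    # key: document name, value: document contents
    for word in value.split():
        yield word, "1"


def reduce_fn(key: str, values: Iterator[str]) -> Iterable[str]:
    # key: a word, values: an iterator over its counts
    result = 0
    for v in values:
        result += int(v)
    yield str(result)


def run_word_count(documents: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical driver: map every document, group by word, reduce."""
    intermediate: Dict[str, List[str]] = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return {word: list(reduce_fn(word, iter(counts)))[0]
            for word, counts in intermediate.items()}


counts = run_word_count({
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog the end",
})
```

  As in the pseudo-code, counts are passed around as strings and parsed inside the reduce function.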

  In addition, the user writes code to fill in a mapreduce specification object with the names of the input and output files, and optional tuning parameters. The user then invokes the MapReduce function, passing it the specification object. The user's code is linked together with the MapReduce library (implemented in C++). Appendix A contains the full program text for this example.
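  The idea of a specification object can be sketched as follows. Everything here is an assumption for illustration: JobSpec, its fields, and map_reduce are hypothetical Python names, not the C++ API of the MapReduce library, and the tuning dictionary is carried along but ignored by this single-process version.

```python
import os
import tempfile
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobSpec:
    """Hypothetical specification object (not the library's real API)."""
    input_files: List[str]
    output_file: str
    tuning: Dict[str, int] = field(default_factory=dict)  # optional parameters


def word_map(key, value):
    for word in value.split():
        yield word, "1"


def sum_reduce(key, values):
    yield str(sum(int(v) for v in values))


def map_reduce(spec, map_fn, reduce_fn):
    """Hypothetical entry point: read inputs, map, group, reduce, write."""
    intermediate = defaultdict(list)
    for path in spec.input_files:
        with open(path, encoding="utf-8") as f:
            for k, v in map_fn(path, f.read()):
                intermediate[k].append(v)
    with open(spec.output_file, "w", encoding="utf-8") as out:
        for k in sorted(intermediate):
            for result in reduce_fn(k, iter(intermediate[k])):
                out.write(f"{k}\t{result}\n")


# Usage: fill in the specification, then hand it to map_reduce.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "in.txt")
with open(in_path, "w", encoding="utf-8") as f:
    f.write("to be or not to be")

spec = JobSpec(input_files=[in_path],
               output_file=os.path.join(workdir, "out.txt"),
               tuning={"reduce_tasks": 1})  # unused by this sketch
map_reduce(spec, word_map, sum_reduce)

with open(spec.output_file, encoding="utf-8") as f:
    output_lines = f.read().splitlines()
```

  Keeping file names and tuning knobs in one object separates the job's configuration from the Map and Reduce logic, which is the separation the paragraph above describes.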

------------------------------Chinese Translation (rendered back into English)------------------------------

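2 Programming Model

  The computation takes a set of input key/value pairs and produces a set of output key/value pairs. Users of the MapReduce library express this computation with two functions: Map and Reduce.

  The Map function, written by the user, processes an input key/value pair and produces a set of intermediate key/value pairs. The MapReduce library gathers all intermediate values that share the same intermediate key I and passes them to the Reduce function.

  The Reduce function, also written by the user, accepts an intermediate key I and the set of values associated with that key. It merges these values into a possibly smaller set of values. Typically, each Reduce invocation produces only zero or one output value. The intermediate values are supplied to the Reduce function through an iterator, which makes it possible to handle lists of values too large to fit in memory.

2.1 Example

  Consider the problem of counting how many times each word occurs in a large collection of documents. The user could write code similar to the following pseudo-code: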

 

  map(String key, String value):
    for each word w in value:
      EmitIntermediate(w, "1");

  reduce(String key, Iterator values):
    int result = 0;
    for each v in values:
      result += ParseInt(v);
    Emit(AsString(result));

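  The Map function emits each word together with a count of its occurrences (simply 1 in this example). The Reduce function sums all the counts emitted for a given word.

  In addition, the user writes code that fills in a mapreduce specification object with the names of the input and output files and optional tuning parameters. The user then invokes the MapReduce function, passing it this specification object. The user's code is linked with the MapReduce library (implemented in C++). Appendix A contains the full program text for this example.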

 

posted @ 2013-03-24 20:46  二手产品经理