hadoop2.0.x [2]: Apache Hadoop MapReduce - Migrating from Apache Hadoop 1.x to Apache Hadoop 2 (Translation and Analysis)

Introduction

This document provides information for users to migrate their Apache Hadoop MapReduce applications from Apache Hadoop 1.x to Apache Hadoop 2.x.


In Apache Hadoop 2.x we have spun off the resource management capabilities into Apache Hadoop YARN, a general-purpose, distributed application management framework, while Apache Hadoop MapReduce (aka MRv2) remains a pure distributed computation framework.


In general, the previous MapReduce runtime (aka MRv1) has been reused and no major surgery has been conducted on it. Therefore, MRv2 is able to ensure satisfactory compatibility with MRv1 applications. However, due to some improvements and code refactorings, a few APIs have been rendered backward-incompatible.


The remainder of this page will discuss the scope and the level of backward compatibility that we support in Apache Hadoop MapReduce 2.x (MRv2).


Binary Compatibility

First, we ensure binary compatibility for applications that use the old mapred APIs. This means that applications which were built against the MRv1 mapred APIs can run directly on YARN without recompilation, merely by pointing them to an Apache Hadoop 2.x cluster via configuration.
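As a minimal sketch of what "pointing them to an Apache Hadoop 2.x cluster via configuration" can look like on the client side, the class below assembles the relevant properties as generic `-D key=value` options. It assumes the job driver is launched through `ToolRunner`, so that `GenericOptionsParser` applies these options; the host names and ports are hypothetical placeholders, and in a real deployment the same keys normally live in the client's `mapred-site.xml`, `yarn-site.xml` and `core-site.xml`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class YarnClientOptions {

    // Client-side properties that route an unchanged MRv1 (mapred API) job to YARN.
    // The host:port values passed in are hypothetical placeholders.
    public static Map<String, String> yarnClientProperties(String resourceManager, String nameNode) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.framework.name", "yarn");              // submit to YARN, not the local runner
        props.put("yarn.resourcemanager.address", resourceManager); // ResourceManager client address
        props.put("fs.defaultFS", nameNode);                        // default file system of the 2.x cluster
        return props;
    }

    // Turns the properties into generic options ("-D", "key=value") as accepted by
    // GenericOptionsParser when the driver runs through ToolRunner.
    public static List<String> toGenericOptions(Map<String, String> props) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            args.add("-D");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        List<String> opts = toGenericOptions(
                yarnClientProperties("rm.example.com:8032", "hdfs://nn.example.com:8020"));
        System.out.println(String.join(" ", opts));
    }
}
```

`mapreduce.framework.name=yarn` is the switch that sends the job to YARN; the other two keys only tell the client where the cluster lives.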


Source Compatibility

We cannot ensure complete binary compatibility with applications that use the mapreduce APIs, as these APIs have evolved a lot since MRv1. However, we ensure source compatibility for the mapreduce APIs that break binary compatibility. In other words, users should recompile their applications that use the mapreduce APIs against the MRv2 jars. One notable binary incompatibility is in Counter and CounterGroup.
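The straightforward fix for the Counter/CounterGroup break is recompiling against the MRv2 jars. As I understand it, the break comes from `Counter` being a class in 1.x but an interface in 2.x, so a call compiled against one version is linked with the wrong invocation instruction and fails with `IncompatibleClassChangeError` on the other. Some libraries that must ship a single jar for both versions sidestep this with reflection; the sketch below shows that alternative technique, which this page does not prescribe. `CounterCompat` and `StubCounter` are hypothetical names; only the method `increment(long)` comes from Hadoop's counter API.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class CounterCompat {

    // Looks up "increment(long)" by name at run time, so the compiled call site
    // does not care whether Counter is a class (1.x) or an interface (2.x).
    // Assumes the counter's runtime class is public.
    public static void incrementCounter(Object counter, long amount) {
        try {
            Method m = counter.getClass().getMethod("increment", long.class);
            m.invoke(counter, amount);
        } catch (NoSuchMethodException | IllegalAccessException e) {
            throw new IllegalArgumentException("not a counter: " + counter.getClass(), e);
        } catch (InvocationTargetException e) {
            throw new RuntimeException(e.getCause());
        }
    }

    // Hypothetical stand-in for a Hadoop counter, used only to exercise the helper.
    public static class StubCounter {
        private long value;
        public void increment(long incr) { value += incr; }
        public long getValue() { return value; }
    }

    public static void main(String[] args) {
        StubCounter c = new StubCounter();
        incrementCounter(c, 5);
        incrementCounter(c, 2);
        System.out.println(c.getValue()); // prints 7
    }
}
```

The cost of this approach is losing compile-time checking, which is why recompiling is the simpler choice when a separate build per Hadoop version is acceptable.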


Not Supported

MRAdmin has been removed in MRv2 because the mradmin commands no longer exist; they have been replaced by the commands in rmadmin. We support neither binary compatibility nor source compatibility for applications that use this class directly.
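Scripts or tools that shelled out to `hadoop mradmin` have to move to `yarn rmadmin`. The helper below is a hypothetical sketch that rewrites such a command line. It only accepts `-refreshQueues` and `-refreshNodes`, which I assume exist under both commands, and rejects every other option for manual review; check the rmadmin usage output of your own 2.x release before relying on it. Code that used the `MRAdmin` class itself still has to be rewritten.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AdminCommandMigration {

    // Options assumed to exist under both "hadoop mradmin" (1.x) and
    // "yarn rmadmin" (2.x); anything else is rejected for manual review.
    private static final Set<String> SHARED_OPTIONS =
            new HashSet<>(Arrays.asList("-refreshQueues", "-refreshNodes"));

    // Rewrites e.g. [hadoop, mradmin, -refreshQueues] into [yarn, rmadmin, -refreshQueues].
    public static List<String> toRmAdmin(List<String> argv) {
        if (argv.size() < 3 || !argv.get(0).equals("hadoop") || !argv.get(1).equals("mradmin")) {
            throw new IllegalArgumentException("not a 'hadoop mradmin' command: " + argv);
        }
        String option = argv.get(2);
        if (!SHARED_OPTIONS.contains(option)) {
            throw new IllegalArgumentException("no known rmadmin equivalent for " + option);
        }
        List<String> out = new ArrayList<>(Arrays.asList("yarn", "rmadmin"));
        out.addAll(argv.subList(2, argv.size()));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toRmAdmin(Arrays.asList("hadoop", "mradmin", "-refreshQueues")));
        // prints [yarn, rmadmin, -refreshQueues]
    }
}
```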


Tradeoffs between MRv1 Users and Early MRv2 Adopters

Unfortunately, maintaining binary compatibility for MRv1 applications may lead to binary incompatibility issues for early MRv2 adopters, in particular Hadoop 0.23 users. For the mapred APIs, we have chosen to be compatible with MRv1 applications, which have a larger user base. For the mapreduce APIs, if a change does not significantly break Hadoop 0.23 applications, we still make it to stay compatible with MRv1 applications. Below is the list of MapReduce APIs that are incompatible with Hadoop 0.23.


    Problematic Function                                                         Incompatibility Issue
    org.apache.hadoop.util.ProgramDriver#drive                                   Return type changes from void to int
    org.apache.hadoop.mapred.jobcontrol.Job#getMapredJobID                       Return type changes from String to JobID
    org.apache.hadoop.mapred.TaskReport#getTaskId                                Return type changes from String to TaskID
    org.apache.hadoop.mapred.ClusterStatus#UNINITIALIZED_MEMORY_VALUE            Data type changes from long to int
    org.apache.hadoop.mapreduce.filecache.DistributedCache#getArchiveTimestamps  Return type changes from long[] to String[]
    org.apache.hadoop.mapreduce.filecache.DistributedCache#getFileTimestamps     Return type changes from long[] to String[]
    org.apache.hadoop.mapreduce.Job#failTask                                     Return type changes from void to boolean
    org.apache.hadoop.mapreduce.Job#killTask                                     Return type changes from void to boolean
    org.apache.hadoop.mapreduce.Job#getTaskCompletionEvents                      Return type changes from o.a.h.mapred.TaskCompletionEvent[] to o.a.h.mapreduce.TaskCompletionEvent[]
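Why does a return-type change such as void to int break binaries even when the source still compiles? A compiled call site refers to a method by name plus a descriptor that encodes the return type, so bytecode built against one version looks for a method the other version does not have and fails with `NoSuchMethodError` (a changed field type similarly yields `NoSuchFieldError`); recompiling regenerates the call site. The sketch uses only `java.lang.invoke.MethodType` to print the two descriptors for a method taking `String[]`, an illustrative signature rather than one copied from the Hadoop sources.

```java
import java.lang.invoke.MethodType;

public class ReturnTypeDescriptors {

    // JVM descriptor of a method that takes a String[] and returns returnType.
    public static String descriptor(Class<?> returnType) {
        return MethodType.methodType(returnType, String[].class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Same parameter list, different return type => different descriptors,
        // so the JVM treats them as two unrelated methods when linking.
        System.out.println(descriptor(void.class)); // ([Ljava/lang/String;)V
        System.out.println(descriptor(int.class));  // ([Ljava/lang/String;)I
    }
}
```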

Malicious

For users who are going to try hadoop-examples-1.x.x.jar on YARN, please note that hadoop jar hadoop-examples-1.x.x.jar will still use hadoop-mapreduce-examples-2.x.x.jar, which is installed together with the other MRv2 jars. By default, the Hadoop framework jars appear before the users' jars in the classpath, so the classes from the 2.x.x jar will still be picked. Users should either remove hadoop-mapreduce-examples-2.x.x.jar from the classpath of all the nodes in the cluster, or set HADOOP_USER_CLASSPATH_FIRST=true and HADOOP_CLASSPATH=...:hadoop-examples-1.x.x.jar to run their target examples jar, and add the following configuration in mapred-site.xml so that the processes in YARN containers pick this jar as well.


    <property>
        <name>mapreduce.job.user.classpath.first</name>
        <value>true</value>
    </property>
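The classpath problem above boils down to "the first jar that contains the class wins". The sketch below is a deliberately simplified model of that rule, not Hadoop's actual class-loading code (which also involves parent-first delegation and per-container classpaths). `Jar` and `resolve` are hypothetical names, and the class name `org.apache.hadoop.examples.ExampleDriver` is assumed, for illustration, to be present in both examples jars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ClasspathOrderModel {

    // A jar reduced to its file name and the fully qualified classes it contains.
    public static class Jar {
        final String name;
        final Set<String> classes;

        public Jar(String name, String... classes) {
            this.name = name;
            this.classes = new HashSet<>(Arrays.asList(classes));
        }
    }

    // Simplified model: the first jar on the path that contains the class wins.
    // userFirst mimics HADOOP_USER_CLASSPATH_FIRST=true by putting user jars ahead.
    public static String resolve(String className, List<Jar> framework, List<Jar> user, boolean userFirst) {
        List<Jar> path = new ArrayList<>();
        path.addAll(userFirst ? user : framework);
        path.addAll(userFirst ? framework : user);
        for (Jar jar : path) {
            if (jar.classes.contains(className)) {
                return jar.name;
            }
        }
        return null; // a real JVM would throw ClassNotFoundException here
    }

    public static void main(String[] args) {
        String driver = "org.apache.hadoop.examples.ExampleDriver";
        List<Jar> framework = Arrays.asList(new Jar("hadoop-mapreduce-examples-2.x.x.jar", driver));
        List<Jar> user = Arrays.asList(new Jar("hadoop-examples-1.x.x.jar", driver));
        System.out.println(resolve(driver, framework, user, false)); // hadoop-mapreduce-examples-2.x.x.jar
        System.out.println(resolve(driver, framework, user, true));  // hadoop-examples-1.x.x.jar
    }
}
```

Removing the 2.x.x examples jar corresponds to emptying it from `framework`; setting the user-first flags corresponds to `userFirst = true`. Both make the 1.x.x jar win.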

posted on 2014-03-20 19:19  AI001
