吹静静

Questions and discussion are welcome on QQ: 592590682

Environment preparation

CentOS 7

apache-maven-3.6.3

hadoop-2.6.0-cdh5.16.2

protobuf-2.5.0       Download: https://github.com/protocolbuffers/protobuf/releases?after=v3.0.0-alpha-4.1

apache-tez-0.9.2-src.tar.gz  Download: https://dlcdn.apache.org/tez/0.9.2/

Note: building on Windows requires Git. In addition, on Windows do not install protobuf-2.5.0; install protoc-2.5.0-win32.zip instead, which is on the same download page (scroll down to find it). Building on Linux needs no additional software.

Protobuf installation and configuration

Protobuf is required to build Tez and must be installed.

Installing on Linux

[root@basecoalmine source]# tar -zxvf protobuf-2.5.0.tar.gz -C ../software/
[root@basecoalmine source]# cd ../software/
[root@basecoalmine software]# cd protobuf-2.5.0/
[root@basecoalmine protobuf-2.5.0]# ./configure
[root@basecoalmine protobuf-2.5.0]# make && make install
# Verify the installation
[root@basecoalmine protobuf-2.5.0]# protoc --version
libprotoc 2.5.0
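
Because the build depends on exactly this protoc version, it can help to check the version in a script before starting Maven. The following is a minimal sketch; the function names protoc_version_of and protoc_version_ok are hypothetical helpers, not part of Protobuf or Tez.

```shell
# Extract the version number from a `protoc --version` line such as "libprotoc 2.5.0".
protoc_version_of() {
    # $1: the text printed by `protoc --version`
    printf '%s\n' "$1" | awk '{print $2}'
}

# Return 0 if the given `protoc --version` output matches the required version.
protoc_version_ok() {
    # $1: output of `protoc --version`; $2: required version (defaults to 2.5.0)
    [ "$(protoc_version_of "$1")" = "${2:-2.5.0}" ]
}

# Example usage:
#   protoc_version_ok "$(protoc --version)" || echo "protoc 2.5.0 is required" >&2
```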

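Installing on Windows

Extract protoc-2.5.0-win32.zip into any directory, then add that directory to the environment variables.

Environment variable (append to Path):
D:\software\protoc-2.5.0-win32

Verify the installation:
C:\Users\King>protoc --version
libprotoc 2.5.0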

Modify and build

Modify the pom file

Extract the apache-tez-0.9.2-src.tar.gz source package and edit the pom.xml file.

(1) First change (line 40):

    <!-- Set the Hadoop version -->
    <hadoop.version>2.6.0-cdh5.16.2</hadoop.version>

(2) Second change (line 94):

    <!-- Add the CDH repository -->
    <repository>
      <id>cloudera</id>
      <name>cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      <snapshots>
            <enabled>false</enabled>
      </snapshots>
    </repository>

(3) Third change (line 117):

    <!-- Add the CDH plugin repository -->
    <pluginRepository>
      <id>cloudera</id>
      <name>cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </pluginRepository>

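(4) Fourth change (line 779):

These two modules are of little use, and leaving them enabled requires setting up additional build environments, so the simplest option is to exclude them.

    <!-- Comment out the following modules -->
    <!-- <module>tez-ext-service-tests</module>
    <module>tez-ui</module> -->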
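
The Hadoop version change in step (1) can also be scripted instead of edited by hand. This is only a sketch: set_hadoop_version is a hypothetical helper, it assumes the property sits on a single line as <hadoop.version>...</hadoop.version>, and it writes to a new file so the original pom.xml stays untouched. The repository and module changes in steps (2) to (4) are easier to make in an editor.

```shell
# Rewrite the <hadoop.version> property of a pom.xml and write the result to a new file.
set_hadoop_version() {
    # $1: input pom.xml, $2: new version string, $3: output file
    # Assumes the version string contains no "|" or "&", which sed would treat specially.
    sed "s|<hadoop.version>[^<]*</hadoop.version>|<hadoop.version>$2</hadoop.version>|" "$1" > "$3"
}

# Example usage:
#   set_hadoop_version pom.xml 2.6.0-cdh5.16.2 pom.xml.new && mv pom.xml.new pom.xml
```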

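Modify the code

File to modify: D:\apache-tez-0.9.2-src\tez-mapreduce\src\main\java\org\apache\tez\mapreduce\hadoop\mapreduce\JobContextImpl.java

Add the following method at the end of the class:

At first @Override was left in and the build failed, so it is commented out here.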

  // @Override
  public boolean userClassesTakesPrecedence() {
    return getJobConf().getBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, false);
  } 

Add the import at the top of the file:

import org.apache.tez.mapreduce.hadoop.MRJobConfig;

Build and package

Change to the Tez source root directory and start the build. On Windows, run the following command in Git Bash:

mvn clean package -DskipTests=true -Dmaven.test.skip=true -Dmaven.javadoc.skip=true

The built packages are in the apache-tez-0.9.2-src\tez-dist\target directory.
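
Before moving on, it may be worth confirming that both tarballs were actually produced. The sketch below assumes the two file names used later in this post; check_tez_artifacts is a hypothetical helper.

```shell
# Report whether both Tez tarballs exist in a tez-dist/target directory.
check_tez_artifacts() {
    # $1: path to tez-dist/target, $2: Tez version (defaults to 0.9.2)
    target_dir=$1
    version=${2:-0.9.2}
    for name in "tez-${version}.tar.gz" "tez-${version}-minimal.tar.gz"; do
        if [ -f "$target_dir/$name" ]; then
            echo "found: $name"
        else
            # Stop at the first missing artifact and signal failure to the caller.
            echo "missing: $name" >&2
            return 1
        fi
    done
}

# Example usage:
#   check_tez_artifacts apache-tez-0.9.2-src/tez-dist/target
```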

Software installation

The build produces tez-0.9.2.tar.gz and tez-0.9.2-minimal.tar.gz, both of which are used below.

(1) Upload tez-0.9.2.tar.gz to the HDFS directory /user/tez/.

[root@basecoalmine software]# hadoop fs -mkdir /user/tez/
[root@basecoalmine software]# hadoop fs -put /opt/source/tez-0.9.2.tar.gz /user/tez/

(2) tez-0.9.2-minimal.tar.gz is used as the local dependency.

[root@basecoalmine software]# mkdir tez-0.9.2
[root@basecoalmine software]# mv tez-0.9.2-minimal.tar.gz ./tez-0.9.2
[root@basecoalmine software]# cd tez-0.9.2
# Extract the package and delete the original archive
[root@basecoalmine tez-0.9.2]# tar -zxvf tez-0.9.2-minimal.tar.gz
[root@basecoalmine tez-0.9.2]# rm -rf tez-0.9.2-minimal.tar.gz
# Create a symbolic link
[root@basecoalmine tez-0.9.2]# cd /opt/app
[root@basecoalmine app]# ln -s /opt/software/tez-0.9.2 tez
# Create the configuration directory
[root@basecoalmine app]# mkdir /opt/app/tez/conf

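Configuration changes

Tez can be configured in two ways:

Configure it in Hadoop: every MR job run on this Hadoop cluster is then submitted through Tez, and Hive also runs on Tez by default with no further configuration;
Configure it in Hive: only Hive programs can switch the execution engine dynamically, while other MapReduce programs still run as plain MapReduce on YARN.

(2) Hadoop integration with Tez

On the Hadoop master node, create a tez-site.xml file in the $HADOOP_HOME/etc/hadoop/ directory with the following content: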

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <!-- Points to the tez-0.9.2.tar.gz package on HDFS -->
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/user/tez/tez-0.9.2.tar.gz</value>
    </property>
    <!-- Let Tez use the cluster's Hadoop jars at runtime -->
    <property>
         <name>tez.use.cluster.hadoop-libs</name>
         <value>true</value>
    </property>
</configuration>

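Modify the yarn-site.xml configuration (the official Tez installation guide sets this property in mapred-site.xml instead):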

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn-tez</value>
    </property>

Modify yarn-env.sh by appending the following at the end, so that all Tez jars are on the classpath and YARN loads the Tez dependencies at startup:

export TEZ_HOME=/opt/app/tez   
for jar in `ls $TEZ_HOME |grep jar`; do
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/lib/$jar
done
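
The loops above parse the output of ls. As an alternative sketch, the same classpath can be assembled with shell globs, which avoids parsing ls and copes with unusual file names. Note one difference: this version only picks up files ending in .jar, whereas the second loop above adds every file under lib. The function name tez_classpath is hypothetical.

```shell
# Build a colon-separated classpath from the jars in $1 and $1/lib.
tez_classpath() {
    # $1: Tez installation directory (for example /opt/app/tez)
    classpath=""
    for jar in "$1"/*.jar "$1"/lib/*.jar; do
        # An unmatched glob stays literal, so skip anything that is not a real file.
        [ -f "$jar" ] || continue
        classpath="${classpath:+$classpath:}$jar"
    done
    printf '%s\n' "$classpath"
}

# Example usage in yarn-env.sh:
#   export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$(tez_classpath /opt/app/tez)
```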

Restart Hadoop.

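Functional test

Create a text file test.txt, write some arbitrary words into it, and upload it to the /tmp directory on HDFS.

Create the result output directory /tmp/out, then run the following command to test: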

[root@basecoalmine tmp]# hadoop jar $TEZ_HOME/tez-examples-0.9.2.jar orderedwordcount /tmp/test.txt /tmp/out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/software/hadoop-2.6.0-cdh5.16.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/software/tez-0.9.2/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/03/04 00:36:52 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.6.0-cdh5.16.2, majorVersion=2, minorVersion=6
22/03/04 00:36:52 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim27, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.6.0-cdh5.16.2, majorVersion=2, minorVersion=6
22/03/04 00:36:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/03/04 00:36:52 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.9.2, revision=${buildNumber}, SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2022-03-03T09:23:19Z ]
22/03/04 00:36:53 INFO client.RMProxy: Connecting to ResourceManager at basecoalmine/192.168.111.56:8032
22/03/04 00:36:54 INFO examples.OrderedWordCount: Running OrderedWordCount
22/03/04 00:36:54 INFO client.TezClient: Submitting DAG application with id: application_1646372190034_0001
22/03/04 00:36:54 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://basecoalmine:9000/user/tez/tez-0.9.2.tar.gz
22/03/04 00:36:54 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
22/03/04 00:36:54 INFO client.TezClient: Tez system stage directory hdfs://basecoalmine:9000/tmp/root/tez/staging/.tez/application_1646372190034_0001 doesn't exist and is created
22/03/04 00:36:55 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1646372190034_0001, dagName=OrderedWordCount, callerContext={ context=TezExamples, callerType=null, callerId=null }
22/03/04 00:36:56 INFO impl.YarnClientImpl: Submitted application application_1646372190034_0001
22/03/04 00:36:56 INFO client.TezClient: The url to track the Tez AM: http://basecoalmine:8088/proxy/application_1646372190034_0001/
22/03/04 00:37:05 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running
22/03/04 00:37:05 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:05 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:05 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:05 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 33.33% TotalTasks: 3 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 66.67% TotalTasks: 3 Succeeded: 2 Running: 1 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG: State: SUCCEEDED Progress: 100% TotalTasks: 3 Succeeded: 3 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG completed. FinalState=SUCCEEDED
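
To sanity-check the job result, the word frequencies of test.txt can also be computed locally and compared with the files under /tmp/out. This is a rough reference only: the exact output format of the Tez example is not shown in this post, so the tab-separated layout and ascending order below are assumptions, and local_word_count is a hypothetical helper.

```shell
# Count word frequencies in a text file and print "word<TAB>count", lowest count first.
local_word_count() {
    # $1: path to a local text file
    tr -s '[:space:]' '\n' < "$1" \
        | grep -v '^$' \
        | sort \
        | uniq -c \
        | sort -k1,1n -k2,2 \
        | awk '{print $2 "\t" $1}'
}

# Example usage:
#   local_word_count test.txt
#   hadoop fs -cat /tmp/out/*
```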

(3) Hive integration with Tez

Create a tez-site.xml file in the /opt/app/tez/conf directory with the following content:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <!-- Points to the tez-0.9.2.tar.gz package on HDFS -->
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/user/tez/tez-0.9.2.tar.gz</value>
    </property>
    <!-- Let Tez use the cluster's Hadoop jars at runtime -->
    <property>
         <name>tez.use.cluster.hadoop-libs</name>
         <value>true</value>
    </property>
</configuration>
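
Both integration modes use a tez-site.xml with the same two properties, so the file can be generated by a small script. The sketch below is a minimal version that omits the xml-stylesheet line; write_tez_site is a hypothetical helper. The point to watch is quoting: ${fs.defaultFS} must reach the file literally, so it is kept inside single quotes where the shell does not expand it.

```shell
# Write a minimal tez-site.xml into the given directory.
write_tez_site() {
    # $1: target directory (for example /opt/app/tez/conf)
    # $2: path of the Tez tarball on HDFS (for example /user/tez/tez-0.9.2.tar.gz)
    # The format strings are single-quoted so that ${fs.defaultFS} is written
    # literally; Hadoop resolves it when the configuration is loaded.
    {
        printf '%s\n' '<?xml version="1.0"?>' '<configuration>'
        printf '    <property>\n        <name>tez.lib.uris</name>\n'
        printf '        <value>${fs.defaultFS}%s</value>\n    </property>\n' "$2"
        printf '    <property>\n        <name>tez.use.cluster.hadoop-libs</name>\n'
        printf '        <value>true</value>\n    </property>\n'
        printf '%s\n' '</configuration>'
    } > "$1/tez-site.xml"
}

# Example usage:
#   write_tez_site /opt/app/tez/conf /user/tez/tez-0.9.2.tar.gz
```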

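Edit the hive-env.sh configuration file and add the following so that Hive can load the Tez dependencies at startup:

export TEZ_HOME=/opt/app/tez
# Directory containing tez-site.xml
export TEZ_CONF_DIR=/opt/app/tez/conf
# Add the Tez jars to the Hive environment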
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
   export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
  export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

Restart Hive.

Functional test

-- Increase the Tez container size
set hive.tez.container.size=3220;
-- Use the Tez engine
set hive.execution.engine=tez;
-- Create a table
create table student(id int, name string);
-- Insert data into the table
insert into student values(1,"zhangsan");
insert into student values(2,"lisi");
-- If no errors are reported, the setup works
select * from student;
select count(*) from student;

 

posted on 2022-03-07 11:41  吹静静