Environment Preparation
CentOS 7
apache-maven-3.6.3
hadoop-2.6.0-cdh5.16.2
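protobuf-2.5.0, download: https://github.com/protocolbuffers/protobuf/releases?after=v3.0.0-alpha-4.1
apache-tez-0.9.2-src.tar.gz, download: https://dlcdn.apache.org/tez/0.9.2/
Note: if you compile on Windows, Git must be installed. Also, on Windows you do not install protobuf-2.5.0; install protoc-2.5.0-win32.zip instead. It is on the same download page; scroll further down to find it. If you compile on Linux, no other extra software is needed.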
Protobuf Installation and Configuration
Protobuf is required to compile Tez, so it must be installed.
Installation on Linux
[root@basecoalmine source]# tar -zxvf protobuf-2.5.0.tar.gz -C ../software/
[root@basecoalmine source]# cd ../software/
[root@basecoalmine software]# cd protobuf-2.5.0/
[root@basecoalmine protobuf-2.5.0]# ./configure
[root@basecoalmine protobuf-2.5.0]# make && make install
# Verify the installation
[root@basecoalmine protobuf-2.5.0]# protoc --version
libprotoc 2.5.0
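Because the Tez build is pinned to protobuf 2.5.0, it can help to confirm the exact version before starting Maven; a different protoc on PATH typically makes the build's protoc step fail. The following is a minimal sketch; the function name protoc_version_ok is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given `protoc --version`
# output reports exactly protobuf 2.5.0.
protoc_version_ok() {
  # $1 is the full version string, e.g. "libprotoc 2.5.0"
  [ "$1" = "libprotoc 2.5.0" ]
}

# Check the protoc binary currently on PATH (if any).
if command -v protoc >/dev/null 2>&1; then
  version=$(protoc --version)
  if protoc_version_ok "$version"; then
    echo "OK: $version"
  else
    echo "Unexpected protoc version: $version (need libprotoc 2.5.0)" >&2
  fi
else
  echo "protoc not found on PATH" >&2
fi
```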
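Installation on Windows
Extract protoc-2.5.0-win32.zip into any directory, then add that directory to the PATH environment variable.
Environment variable (PATH entry):
D:\software\protoc-2.5.0-win32
Verify:
C:\Users\King>protoc --version
libprotoc 2.5.0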
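Modifying the Source and Compiling
Modifying the pom File
Extract the apache-tez-0.9.2-src.tar.gz source archive and modify the root pom.xml as follows.
(1) First change (line 40):
<!-- Change the Hadoop version -->
<hadoop.version>2.6.0-cdh5.16.2</hadoop.version>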
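The version change above can also be scripted instead of edited by hand. This sketch assumes each <hadoop.version> element sits on a single line; set_hadoop_version is a hypothetical helper. It rewrites every such element in the file and keeps the original as pom.xml.bak, so compare the two files afterwards.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the <hadoop.version> property in a pom.xml.
# Usage: set_hadoop_version <pom-file> <version>
set_hadoop_version() {
  pom=$1
  version=$2
  # -i.bak edits in place and keeps the original as <pom-file>.bak
  sed -i.bak "s#<hadoop.version>[^<]*</hadoop.version>#<hadoop.version>${version}</hadoop.version>#" "$pom"
}

# Example (run from the Tez source root):
# set_hadoop_version pom.xml 2.6.0-cdh5.16.2
```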
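(2) Second change (line 94), inside the <repositories> section:
<!-- Add the Cloudera (CDH) repository -->
<repository>
  <id>cloudera</id>
  <name>cloudera Repository</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
(3) Third change (line 117), inside the <pluginRepositories> section:
<!-- Add the Cloudera (CDH) repository -->
<pluginRepository>
  <id>cloudera</id>
  <name>cloudera Repository</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</pluginRepository>
(4) Fourth change (line 779):
These two modules are of little use here, and building them would require setting up additional build environments, so for convenience they are simply excluded.
<!-- Comment out the following modules -->
<!--
<module>tez-ext-service-tests</module>
<module>tez-ui</module>
-->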
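Modifying the Code
File to modify: D:\apache-tez-0.9.2-src\tez-mapreduce\src\main\java\org\apache\tez\mapreduce\hadoop\mapreduce\JobContextImpl.java
Add the following method at the end of the class (before its closing brace).
Initially the @Override annotation was left in place and the compilation failed with an error, so it was commented out.
// @Override
// Returns whether user classes take precedence over the cluster's classes
// (MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, default false).
public boolean userClassesTakesPrecedence() {
  return getJobConf().getBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, false);
}
Add the following import at the top of the file: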
import org.apache.tez.mapreduce.hadoop.MRJobConfig;
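Before compiling, a quick grep can confirm that both edits landed in JobContextImpl.java. This is a hypothetical check (check_jobcontext_patch is not part of Tez); it only looks for the method signature and the import line, and does not verify that the method sits inside the class body.

```shell
#!/bin/sh
# Hypothetical check: does JobContextImpl.java contain both the new method
# and the MRJobConfig import? Usage: check_jobcontext_patch <java-file>
check_jobcontext_patch() {
  file=$1
  if ! grep -q 'public boolean userClassesTakesPrecedence()' "$file"; then
    echo "missing method userClassesTakesPrecedence() in $file" >&2
    return 1
  fi
  if ! grep -q '^import org\.apache\.tez\.mapreduce\.hadoop\.MRJobConfig;' "$file"; then
    echo "missing MRJobConfig import in $file" >&2
    return 1
  fi
  echo "patch looks applied: $file"
}

# Example (from the Tez source root):
# check_jobcontext_patch tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/mapreduce/JobContextImpl.java
```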
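Compiling and Packaging
Go to the root directory of the Tez source tree and start the build. On Windows, run the following command in Git Bash:
mvn clean package -DskipTests=true -Dmaven.test.skip=true -Dmaven.javadoc.skip=true
The built packages are located in the apache-tez-0.9.2-src\tez-dist\target directory.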
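To confirm the build produced the two archives used in the installation steps, the check below looks for them in tez-dist/target. It is a sketch; the function name check_tez_dist and its default version argument are assumptions.

```shell
#!/bin/sh
# Hypothetical check: verify that the Tez build produced both archives.
# Usage: check_tez_dist <tez-source-root> [version]
check_tez_dist() {
  target="$1/tez-dist/target"
  version=${2:-0.9.2}
  missing=0
  for f in "tez-${version}.tar.gz" "tez-${version}-minimal.tar.gz"; do
    if [ -f "$target/$f" ]; then
      echo "found: $target/$f"
    else
      echo "missing: $target/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_tez_dist /path/to/apache-tez-0.9.2-src
```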
Installation
The compiled tez-0.9.2.tar.gz and tez-0.9.2-minimal.tar.gz are used in the following steps.
(1) Upload tez-0.9.2.tar.gz to the HDFS directory /user/tez/.
[root@basecoalmine software]# hadoop fs -mkdir /user/tez/
[root@basecoalmine software]# hadoop fs -put /opt/source/tez-0.9.2.tar.gz /user/tez/
(2) tez-0.9.2-minimal.tar.gz is unpacked locally and used as a dependency.
[root@basecoalmine software]# mkdir tez-0.9.2
[root@basecoalmine software]# mv tez-0.9.2-minimal.tar.gz ./tez-0.9.2
[root@basecoalmine software]# cd tez-0.9.2
# Extract the package and delete the original archive
[root@basecoalmine tez-0.9.2]# tar -zxvf tez-0.9.2-minimal.tar.gz
[root@basecoalmine tez-0.9.2]# rm -rf tez-0.9.2-minimal.tar.gz
# Create a symlink
[root@basecoalmine tez-0.9.2]# cd /opt/app
[root@basecoalmine app]# ln -s /opt/software/tez-0.9.2 tez
# Create the configuration directory
[root@basecoalmine app]# mkdir /opt/app/tez/conf
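The unpack, symlink, and conf-directory steps above can be collected into one function. This is a minimal sketch under the same layout (unpacked files under /opt/software, the symlink at /opt/app/tez); install_tez_minimal is a hypothetical name. Unlike the manual steps, it extracts with tar -C and leaves the original archive in place.

```shell
#!/bin/sh
# Hypothetical helper mirroring the manual steps:
#   1. unpack tez-<ver>-minimal.tar.gz into <software-dir>/tez-<ver>
#   2. symlink <app-dir>/tez -> <software-dir>/tez-<ver>
#   3. create <app-dir>/tez/conf for tez-site.xml
# Usage: install_tez_minimal <minimal-tarball> <software-dir> <app-dir> [version]
install_tez_minimal() {
  tarball=$1
  software_dir=$2
  app_dir=$3
  version=${4:-0.9.2}
  tez_dir="$software_dir/tez-$version"

  mkdir -p "$tez_dir" || return 1
  # The minimal tarball has no top-level directory, so extract into tez_dir.
  tar -xzf "$tarball" -C "$tez_dir" || return 1
  mkdir -p "$app_dir" || return 1
  # -n: do not descend into an existing tez symlink; -f: replace it.
  ln -sfn "$tez_dir" "$app_dir/tez" || return 1
  mkdir -p "$app_dir/tez/conf"
}

# Example matching this guide:
# install_tez_minimal /opt/software/tez-0.9.2-minimal.tar.gz /opt/software /opt/app
```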
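Configuration
Tez can be configured in either of two ways:
Configure it in Hadoop: every MR job run on this Hadoop cluster is then submitted through Tez, and Hive also runs on Tez by default without any further configuration;
Configure it in Hive: only Hive programs can switch the execution engine dynamically, while other MapReduce programs keep running on plain YARN (MapReduce).
(2) Integrating Tez with Hadoop
On the Hadoop master node, create a tez-site.xml file under $HADOOP_HOME/etc/hadoop/ with the following content: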
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Points to the tez-0.9.2.tar.gz package on HDFS -->
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/user/tez/tez-0.9.2.tar.gz</value>
  </property>
  <!-- Tez reads the cluster's Hadoop jars at runtime -->
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
</configuration>
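If the same tez-site.xml has to be created on several machines or in several directories, a heredoc keeps it reproducible. This sketch writes the two properties shown above; write_tez_site is a hypothetical helper, and the default tez.lib.uris value is the /user/tez/tez-0.9.2.tar.gz location used in this guide. The ${fs.defaultFS} placeholder is written literally because Hadoop's Configuration expands it when the file is read.

```shell
#!/bin/sh
# Hypothetical helper: write tez-site.xml with the two properties from this guide.
# Usage: write_tez_site <target-dir> [tez.lib.uris value]
write_tez_site() {
  dir=$1
  # Single quotes keep ${fs.defaultFS} literal for Hadoop to expand later.
  default_uri='${fs.defaultFS}/user/tez/tez-0.9.2.tar.gz'
  uri=${2:-$default_uri}
  mkdir -p "$dir" || return 1
  cat > "$dir/tez-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Tez archive on HDFS -->
  <property>
    <name>tez.lib.uris</name>
    <value>$uri</value>
  </property>
  <!-- Use the cluster's Hadoop jars at runtime -->
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
</configuration>
EOF
}

# Example: write_tez_site "$HADOOP_HOME/etc/hadoop"
```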
Modify mapred-site.xml, where mapreduce.framework.name is normally set:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
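Modify yarn-env.sh and append the following at the end, so that all Tez jars are added to HADOOP_CLASSPATH and the Tez dependencies are loaded when YARN starts:
export TEZ_HOME=/opt/app/tez
# Add the top-level Tez jars
for jar in `ls $TEZ_HOME | grep jar`; do
  export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/$jar
done
# Add every file under lib/
for jar in `ls $TEZ_HOME/lib`; do
  export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/lib/$jar
done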
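The loops above append the jars under $TEZ_HOME and $TEZ_HOME/lib to HADOOP_CLASSPATH. The sketch below builds the same kind of list with shell globs instead of parsing ls output, which avoids surprises with unusual file names. tez_classpath is a hypothetical helper, and unlike the original lib loop it only picks up files ending in .jar.

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated list of all jars in
# <tez-home> and <tez-home>/lib (no leading or trailing colon).
# Usage: tez_classpath <tez-home>
tez_classpath() {
  home=$1
  cp=""
  for jar in "$home"/*.jar "$home"/lib/*.jar; do
    # An unmatched glob stays literal; skip it.
    [ -f "$jar" ] || continue
    if [ -z "$cp" ]; then
      cp=$jar
    else
      cp=$cp:$jar
    fi
  done
  printf '%s\n' "$cp"
}

# Possible use in yarn-env.sh (assumes TEZ_HOME is already exported):
# export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$(tez_classpath "$TEZ_HOME")
```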
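Restart Hadoop.
Functional Test
Create a text file test.txt, write some arbitrary words into it, and upload it to the /tmp directory on HDFS.
Create the result output directory /tmp/out, then run the following command to test.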
[root@basecoalmine tmp]# hadoop jar $TEZ_HOME/tez-examples-0.9.2.jar orderedwordcount /tmp/test.txt /tmp/out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/software/hadoop-2.6.0-cdh5.16.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/software/tez-0.9.2/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/03/04 00:36:52 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.6.0-cdh5.16.2, majorVersion=2, minorVersion=6
22/03/04 00:36:52 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim27, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.6.0-cdh5.16.2, majorVersion=2, minorVersion=6
22/03/04 00:36:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/03/04 00:36:52 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.9.2, revision=${buildNumber}, SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2022-03-03T09:23:19Z ]
22/03/04 00:36:53 INFO client.RMProxy: Connecting to ResourceManager at basecoalmine/192.168.111.56:8032
22/03/04 00:36:54 INFO examples.OrderedWordCount: Running OrderedWordCount
22/03/04 00:36:54 INFO client.TezClient: Submitting DAG application with id: application_1646372190034_0001
22/03/04 00:36:54 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://basecoalmine:9000/user/tez/tez-0.9.2.tar.gz
22/03/04 00:36:54 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
22/03/04 00:36:54 INFO client.TezClient: Tez system stage directory hdfs://basecoalmine:9000/tmp/root/tez/staging/.tez/application_1646372190034_0001 doesn't exist and is created
22/03/04 00:36:55 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1646372190034_0001, dagName=OrderedWordCount, callerContext={ context=TezExamples, callerType=null, callerId=null }
22/03/04 00:36:56 INFO impl.YarnClientImpl: Submitted application application_1646372190034_0001
22/03/04 00:36:56 INFO client.TezClient: The url to track the Tez AM: http://basecoalmine:8088/proxy/application_1646372190034_0001/
22/03/04 00:37:05 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running
22/03/04 00:37:05 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:05 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:05 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:05 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 33.33% TotalTasks: 3 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 66.67% TotalTasks: 3 Succeeded: 2 Running: 1 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG: State: SUCCEEDED Progress: 100% TotalTasks: 3 Succeeded: 3 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG completed. FinalState=SUCCEEDED
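One way to sanity-check the result is to count the words of the same local test.txt with standard Unix tools and compare against the files under /tmp/out (for example via hadoop fs -cat). This is an independent sketch, not what Tez runs internally: local_wordcount is a hypothetical helper that splits on whitespace and orders by ascending count, and the column layout of the Tez output may differ.

```shell
#!/bin/sh
# Hypothetical helper: local word count for comparison with the Tez job.
# Prints "word<TAB>count", ordered by count (ascending), then by word.
# Usage: local_wordcount <text-file>
local_wordcount() {
  # C locale so character classes and sorting are byte-wise and reproducible.
  LC_ALL=C tr -s '[:space:]' '\n' < "$1" \
    | sed '/^$/d' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }' \
    | LC_ALL=C sort -k2,2n -k1,1
}

# Example: local_wordcount test.txt
```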
(3) Integrating Tez with Hive
Create a tez-site.xml file in the /opt/app/tez/conf directory with the following content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Points to the tez-0.9.2.tar.gz package on HDFS -->
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/user/tez/tez-0.9.2.tar.gz</value>
  </property>
  <!-- Tez reads the cluster's Hadoop jars at runtime -->
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
</configuration>
Modify hive-env.sh and add the following, so that Hive can load the Tez dependencies at startup:
export TEZ_HOME=/opt/app/tez
# Directory containing tez-site.xml
export TEZ_CONF_DIR=/opt/app/tez/conf
# Add the Tez jars to the Hive environment
export TEZ_JARS=""
for jar in `ls $TEZ_HOME | grep jar`; do
  export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
  export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
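Since Hive is pointed at /opt/app/tez/conf through TEZ_CONF_DIR, a missing or misplaced tez-site.xml is an easy mistake. The check below is a hypothetical helper (check_tez_conf) that confirms the directory contains a tez-site.xml defining tez.lib.uris and prints the configured value. It scans text rather than parsing XML, so it assumes one element per line with the <name> line before its <value> line.

```shell
#!/bin/sh
# Hypothetical check: confirm <conf-dir>/tez-site.xml exists and sets
# tez.lib.uris, then print the configured value.
# Assumes one XML element per line (name line before value line).
# Usage: check_tez_conf <conf-dir>
check_tez_conf() {
  file="$1/tez-site.xml"
  if [ ! -f "$file" ]; then
    echo "no tez-site.xml in $1" >&2
    return 1
  fi
  # Print the first <value> that follows the tez.lib.uris <name>.
  value=$(awk '
    /<name>tez\.lib\.uris<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }' "$file")
  if [ -z "$value" ]; then
    echo "tez.lib.uris not set in $file" >&2
    return 1
  fi
  echo "tez.lib.uris = $value"
}

# Example: check_tez_conf "$TEZ_CONF_DIR"
```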
Restart Hive.
Functional Test
-- Give Tez containers more memory (MB)
set hive.tez.container.size=3220;
-- Use the Tez engine
set hive.execution.engine=tez;
-- Create a table
create table student(id int, name string);
-- Insert data into the table
insert into student values(1,"zhangsan");
insert into student values(2,"lisi");
-- If these queries run without errors, the setup works
select * from student;
select count(*) from student;