Hive: compiling from source to add a built-in UDF

Download the Hive source

[root@hadoop001 ~]# cd /opt

[root@hadoop001 opt]# mkdir sourcecode

[root@hadoop001 opt]# cd sourcecode

[root@hadoop001 sourcecode]# wget http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0-src.tar.gz

[root@hadoop001 sourcecode]# ll

    -rw-r--r--  1 root root 14652104 Apr 21 10:23 hive-1.1.0-cdh5.7.0-src.tar.gz

 

Extract the source

[root@hadoop001 sourcecode]# tar -xzf hive-1.1.0-cdh5.7.0-src.tar.gz

[root@hadoop001 sourcecode]# ll
total 14316
drwxrwxr-x 31 root root 4096 Mar 24 2016 hive-1.1.0-cdh5.7.0
-rw-r--r-- 1 root root 14652104 Apr 21 10:23 hive-1.1.0-cdh5.7.0-src.tar.gz

 

Add the UDF class

HelloUDF.java

[root@hadoop001 udf]# pwd

/opt/sourcecode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/udf

[root@hadoop001 udf]# rz    ## upload the UDF class you wrote

 

 

 

[root@hadoop001 udf]# vim HelloUDF.java

Change the first line so that the class's package declaration reads: package org.apache.hadoop.hive.ql.udf;

The package name mirrors the directory that holds HelloUDF.java: org/apache/hadoop/hive/ql/udf
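For reference, here is a minimal sketch of what HelloUDF.java might contain. The post does not show the class body, so the evaluate logic is an assumption inferred from the Hello:zz output in the test at the end. In the Hive source tree the class would carry the package line, import org.apache.hadoop.hive.ql.exec.UDF, and be declared as public class HelloUDF extends UDF. Those pieces appear only as comments here so that the sketch compiles without Hive on the classpath. The class name HelloUdfSketch is hypothetical.

```java
// Standalone sketch; in the Hive tree the file would instead begin with:
//   package org.apache.hadoop.hive.ql.udf;
//   import org.apache.hadoop.hive.ql.exec.UDF;
//   public class HelloUDF extends UDF { ... }
class HelloUdfSketch {

    // Old-style Hive UDFs expose their logic through a method named evaluate.
    // Assumed behavior: prefix the input with "Hello:" and pass NULL through.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return "Hello:" + input;
    }

    public static void main(String[] args) {
        System.out.println(new HelloUdfSketch().evaluate("zz"));
    }
}
```

Returning null for a null input keeps the function safe to call on columns that contain NULL values.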

 

 

Register the function

[root@hadoop001 exec]#  pwd

/opt/sourcecode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/exec/

[root@hadoop001 exec]# vim FunctionRegistry.java

 

At line 135, add:

import org.apache.hadoop.hive.ql.udf.HelloUDF;

 

 

At line 176, add:

 

system.registerUDF("HelloUDF", HelloUDF.class, false);

### "HelloUDF" is the function name and can be anything you like; the second argument, HelloUDF.class, is the UDF class.
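The function is registered as "HelloUDF" but shows up as helloudf in the show functions output at the end, which suggests the registry normalizes names to lower case. Below is a toy sketch of that idea, assuming a plain map keyed by the lower-cased name. ToyRegistry and its methods are hypothetical and are not Hive's FunctionRegistry API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a function registry; not Hive's implementation.
class ToyRegistry {
    private final Map<String, Class<?>> functions = new TreeMap<>();

    // Store the class under a lower-cased name so lookups ignore case.
    public void register(String name, Class<?> implementation) {
        functions.put(name.toLowerCase(Locale.ROOT), implementation);
    }

    // Returns the registered class, or null if the name is unknown.
    public Class<?> lookup(String name) {
        return functions.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        // String.class stands in for HelloUDF.class in this sketch.
        registry.register("HelloUDF", String.class);
        System.out.println(registry.lookup("helloudf") == String.class);
    }
}
```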

 

Compile Hive

[root@hadoop001 hive-1.1.0-cdh5.7.0]# pwd
/opt/sourcecode/hive-1.1.0-cdh5.7.0

[root@hadoop001 hive-1.1.0-cdh5.7.0]# mvn clean package -DskipTests -Phadoop-2 -Pdist

 

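  • Wait for the build to succeed, or to fail with assorted errors. The failures almost always come down to configuration. I fought build errors for two exhausting days, so here is what I learned:
  • 1. Check your Maven version and prefer the latest. The newest when I wrote this was apache-maven-3.6.1; with apache-maven-3.3.9 the build failed with errors.
  • 2. After switching versions, check that the environment variables were updated too. If they still point to the old installation, the build fails.
  • 3. Keep the user-level and system-wide environment settings consistent, or configure only the user-level one; otherwise the build fails.
  • 4. Settings file configuration: back up what you had, delete it all, and paste in the following. Note that Maven accepts a top-level <repositories> element only in a pom.xml; in settings.xml it has to be nested inside a <profile>.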
 <repositories>
   <!-- This needs to be removed before checking in-->
    <repository>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>cdh.releases.repo</id>
      <url>https://repository.cloudera.com/content/groups/cdh-releases-rcs</url>
      <name>CDH Releases Repository</name>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>cdh.snapshots.repo</id>
      <url>https://repository.cloudera.com/content/repositories/snapshots</url>
      <name>CDH Snapshots Repository</name>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>datanucleus</id>
      <name>datanucleus maven repository</name>
      <url>http://www.datanucleus.org/downloads/maven2</url>
      <layout>default</layout>
      <releases>
        <enabled>true</enabled>
        <checksumPolicy>warn</checksumPolicy>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>glassfish-repository</id>
      <url>http://maven.glassfish.org/content/groups/glassfish</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>glassfish-repo-archive</id>
      <url>http://maven.glassfish.org/content/groups/glassfish</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>sonatype-snapshot</id>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>

 

Build succeeded

 

Note: partway through the build you may see this happen. Don't worry; just keep waiting.

 

The package we need is apache-hive-1.1.0-cdh5.7.0-bin.tar.gz.

[root@hadoop001 target]# pwd
/opt/sourcecode/hive-1.1.0-cdh5.7.0/packaging/target

[root@hadoop001 target]# ll
total 129260
drwxr-xr-x 2 root root 4096 Apr 22 21:17 antrun
drwxr-xr-x 3 root root 4096 Apr 22 21:17 apache-hive-1.1.0-cdh5.7.0-bin
-rw-r--r-- 1 root root 105854885 Apr 22 21:17 apache-hive-1.1.0-cdh5.7.0-bin.tar.gz
-rw-r--r-- 1 root root 12656493 Apr 22 21:18 apache-hive-1.1.0-cdh5.7.0-jdbc.jar
-rw-r--r-- 1 root root 13823053 Apr 22 21:18 apache-hive-1.1.0-cdh5.7.0-src.tar.gz
drwxr-xr-x 2 root root 4096 Apr 22 21:17 archive-tmp
drwxr-xr-x 3 root root 4096 Apr 22 21:17 maven-shared-archive-resources
drwxr-xr-x 3 root root 4096 Apr 22 21:17 tmp
drwxr-xr-x 2 root root 4096 Apr 22 21:17 warehouse

[root@hadoop001 lib]# pwd
/opt/sourcecode/hive-1.1.0-cdh5.7.0/packaging/target/apache-hive-1.1.0-cdh5.7.0-bin/apache-hive-1.1.0-cdh5.7.0-bin/lib

[root@hadoop001 lib]# ll hive-exec-1.1.0-cdh5.7.0.jar 
-rw-r--r-- 1 root root 19272399 Apr 22 21:17 hive-exec-1.1.0-cdh5.7.0.jar
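
Before swapping jars, it can be worth confirming that the rebuilt hive-exec jar really contains the new class. Below is a minimal sketch that uses only the JDK's java.util.jar API. The class name JarCheck is hypothetical, and the entry path assumes HelloUDF sits in the org.apache.hadoop.hive.ql.udf package as described earlier.

```java
import java.io.IOException;
import java.util.jar.JarFile;

class JarCheck {

    // Returns true if the jar at jarPath has an entry with the given name.
    public static boolean containsEntry(String jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: JarCheck <path-to-hive-exec-jar>");
            return;
        }
        // Class files are stored in the jar under their package path.
        String entry = "org/apache/hadoop/hive/ql/udf/HelloUDF.class";
        System.out.println(entry + " present: " + containsEntry(args[0], entry));
    }
}
```

Run it with the path to hive-exec-1.1.0-cdh5.7.0.jar as the only argument. If the entry is missing, the UDF was probably not compiled into the ql module.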

  

## Copy hive-exec-1.1.0-cdh5.7.0.jar into the lib directory of the deployed Hive, replacing the original jar (moved aside below)

[root@hadoop001 lib]# su - hadoop

[hadoop@hadoop001 lib]$ pwd
/home/hadoop/app/hive-1.1.0-cdh5.7.0/lib

[hadoop@hadoop001 lib]$ ll hive-exec-1.1.0-cdh5.7.0.jar 
-rw-r--r-- 1 hadoop hadoop 19274557 Apr 21 18:54 hive-exec-1.1.0-cdh5.7.0.jar

[hadoop@hadoop001 lib]$ mv hive-exec-1.1.0-cdh5.7.0.jar hive-exec-1.1.0-cdh5.7.0.jar_yuan   ## original jar renamed as a backup

  

Copy the new jar into /home/hadoop/app/hive-1.1.0-cdh5.7.0/lib/

[root@hadoop001 lib]# cp hive-exec-1.1.0-cdh5.7.0.jar /home/hadoop/app/hive-1.1.0-cdh5.7.0/lib/

 

 

Test

hive (default)> show functions;

helloudf

hive (default)> select helloudf('zz') from dual;
OK
Hello:zz
Time taken: 0.922 seconds, Fetched: 1 row(s)

 

Success.

posted @ 2019-04-22 22:23  任重而道远的小蜗牛