Developing a Hive UDF and running it on CDH 5.14 (with an example)

Hive has three kinds of user-defined functions:

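UDF: one-to-one. One value goes in and one value comes out (row mapping). It works at the row level, e.g. upper, substr.
UDAF: many-to-one. Many rows go in and one value comes out (aggregation), e.g. sum, min.
UDTF: one-to-many. One row goes in and many rows come out, e.g. explode, usually used together with LATERAL VIEW.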

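To write a simple Hive UDF, extend org.apache.hadoop.hive.ql.exec.UDF and implement an evaluate method. evaluate is not an abstract method that you override. Hive locates it by reflection, so a UDF can declare several evaluate overloads. The steps are: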
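Because evaluate is found by reflection rather than through an abstract method, the lookup can be illustrated with plain Java. The sketch below is a simplified analogy, assuming exact argument-class matching only. Hive's real resolver also handles type conversion. `EvaluateLookup`, `MyUdf` and `callEvaluate` are hypothetical names.

```java
import java.lang.reflect.Method;

// Simplified analogy of reflection-based evaluate() lookup (hypothetical names;
// Hive's actual resolver also converts argument types, this one does not)
public class EvaluateLookup {

    // Overloaded evaluate methods, as a simple Hive UDF may declare
    public static class MyUdf {
        public String evaluate(String s) {
            return s == null ? null : s.toUpperCase();
        }

        public Integer evaluate(Integer a, Integer b) {
            return (a == null || b == null) ? null : a + b;
        }
    }

    // Find the evaluate overload whose parameter classes exactly match the arguments
    public static Object callEvaluate(Object udf, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        try {
            Method m = udf.getClass().getMethod("evaluate", types);
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no matching evaluate() for the given argument types", e);
        }
    }

    public static void main(String[] args) {
        MyUdf udf = new MyUdf();
        System.out.println(callEvaluate(udf, "hive")); // HIVE
        System.out.println(callEvaluate(udf, 2, 3));   // 5
    }
}
```

This is also why the method name must be exactly `evaluate`. A misspelled name compiles fine but is never found at query time.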

1. Create a Maven project.

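2. Add the required dependencies to the pom file. Choose versions that match your cluster. Mine is CDH 5.14.2, which you can check in the Cloudera Manager (CM) console.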

POM file:

<!-- Set the version properties to match the Hadoop and Hive versions you connect to -->
<properties>
    <hadoop.version>2.6.0-cdh5.14.2</hadoop.version>
    <hive.version>1.1.0-cdh5.14.2</hive.version>
</properties>


<!-- Because we use CDH's Hadoop and Hive, add Cloudera's official repository so the matching dependencies can be downloaded -->
<!-- This repository is not needed if you use Apache Hadoop and Hive -->
<repositories>
    <repository>
        <id>cloudera</id>
        <name>cloudera</name>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>


<!-- Dependencies, resolved using the version properties and repository configured above -->
<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>

Where to check the CDH component versions in Cloudera Manager:

Hive service -->> Hive Metastore Server -->> any host (uat-nn-01) -->> Components

3. Build the jar with Maven: run clean, then package.

4. Upload the jar to HDFS.

- The usual location is /user/hive/udf. Keep the owner and permissions consistent with the existing directories, normally the hive user.

$ hdfs dfs -mkdir -p /user/hive/udf
$ hdfs dfs -put -f hiveUdf-1.0-SNAPSHOT.jar /user/hive/udf
$ hdfs dfs -ls /user/hive/udf

5. Open the Hive shell and create the function.

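* Syntax:
    * The function name should take the form database_name.function_name. If the database is omitted, the function is created in the current database.
    * The class name must be fully qualified, i.e. package.ClassName.
    * The jar path on HDFS must be a full URI (hdfs://...), and Hive must have permission to read that path.
    * CREATE FUNCTION [db_name.]function_name AS 'package.ClassName' USING JAR 'full_HDFS_path';
    * Example: create function addmonthudf as 'com.hive.addMonthUdf' using jar 'hdfs://namenod:8020/user/hive/udf/hiveUdf-1.0-SNAPSHOT.jar';
    * To drop a function: DROP FUNCTION [IF EXISTS] function_name;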
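The three syntax rules above are easy to get wrong when typing the statement by hand. The sketch below is a hypothetical helper, not part of any Hive API. It assembles the CREATE/DROP statements and rejects an unqualified class name or a non-`hdfs://` jar path. The `default` database in the usage is an assumption.

```java
// Hypothetical helper that builds CREATE/DROP FUNCTION statements and checks
// the rules above: db-qualified name, fully qualified class, full hdfs:// path
public class CreateFunctionSql {

    public static String build(String db, String function, String className, String jarUri) {
        if (db == null || db.isEmpty() || function == null || function.isEmpty()) {
            throw new IllegalArgumentException("database and function name are required");
        }
        if (className == null || !className.contains(".")) {
            throw new IllegalArgumentException("class name must include the package: " + className);
        }
        if (jarUri == null || !jarUri.startsWith("hdfs://")) {
            throw new IllegalArgumentException("jar path must be a full hdfs:// URI: " + jarUri);
        }
        return "CREATE FUNCTION " + db + "." + function
                + " AS '" + className + "'"
                + " USING JAR '" + jarUri + "';";
    }

    public static String drop(String db, String function) {
        return "DROP FUNCTION IF EXISTS " + db + "." + function + ";";
    }

    public static void main(String[] args) {
        // "default" is an assumed database name
        System.out.println(build("default", "addmonthudf", "com.hive.addMonthUdf",
                "hdfs://namenod:8020/user/hive/udf/hiveUdf-1.0-SNAPSHOT.jar"));
        System.out.println(drop("default", "addmonthudf"));
    }
}
```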

6. UDF example

Example code for adding or subtracting months:

package com.hive;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

@Description(name = "addMonthUdf",
        value = "_FUNC_(Text input, Integer num) - returns the yyyy-MM month shifted by num months",
        extended = "Example:\n" +
                "  > SELECT _FUNC_('2020-09', 1);\n" +
                "  2020-10")
public class addMonthUdf extends UDF {

    // Hive finds evaluate() by reflection; input is a yyyy-MM string, num may be negative
    public Text evaluate(Text input, Integer num) throws ParseException {
        // NULL in, NULL out, like Hive's built-in functions
        if (input == null || num == null) {
            return null;
        }

        // Parse the Text value into a Date (SimpleDateFormat is not thread-safe, so create one per call)
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM");
        Date ctime = formatter.parse(input.toString());

        // Load the date into a Calendar
        Calendar c1 = Calendar.getInstance();
        c1.setTime(ctime);

        // Add num months (subtracts when num is negative) and format back to yyyy-MM
        c1.add(Calendar.MONTH, num);
        String newTime = formatter.format(c1.getTime());

        // Return the result as Hadoop Text
        return new Text(newTime);
    }

    // Local test only; can be removed for production
    public static void main(String[] args) throws ParseException {
        addMonthUdf u1 = new addMonthUdf();
        Text text = u1.evaluate(new Text("2020-08"), -1);
        System.out.println(text.toString()); // prints 2020-07
    }
}

[Figure: screenshot of the test run]
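The same month arithmetic can also be written with java.time (Java 8+). This is a swapped-in technique, not the author's code. `YearMonth` handles the yyyy-MM value directly, and `DateTimeFormatter` is immutable and thread-safe, so one shared instance is enough. `MonthShift` and `addMonths` are hypothetical names. An evaluate method could delegate to `addMonths` and wrap the result in `Text`.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Alternative month shift with java.time (hypothetical class, not the original UDF)
public class MonthShift {

    // uuuu is the proleptic year; the formatter is thread-safe, unlike SimpleDateFormat
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("uuuu-MM");

    // Returns null for null input, mirroring Hive's NULL-in/NULL-out convention
    public static String addMonths(String month, Integer num) {
        if (month == null || num == null) {
            return null;
        }
        return YearMonth.parse(month, FMT).plusMonths(num).format(FMT);
    }

    public static void main(String[] args) {
        System.out.println(addMonths("2020-09", 1));  // 2020-10
        System.out.println(addMonths("2020-01", -1)); // 2019-12
    }
}
```

Unlike the Calendar version, an invalid month such as "2020-13" throws a DateTimeParseException instead of being silently rolled over to the next year.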

posted @ 2020-09-04 16:34 by 冷幽篁
