Developing a Hive UDF

2020-03-23 17:58 DataBases

<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.3</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-service</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
</dependency>
</dependencies>

package com.yuejiesong;

import jodd.util.StringUtil;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

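public class CountToolsUDF extends UDF {
    /**
     * Counts the semicolon-separated entries in votetools.
     * Returns 0 for NULL or blank input.
     */
    public IntWritable evaluate(Text votetools) {
        // Hive passes a Java null for SQL NULL, so guard before calling toString().
        if (votetools == null) {
            return new IntWritable(0);
        }
        String value = votetools.toString();
        if (StringUtil.isBlank(value)) {
            return new IntWritable(0);
        }
        // String.split drops trailing empty strings, so "a;b;" counts as 2.
        int length = value.trim().split(";").length;
        return new IntWritable(length);
    }

    // Quick local check without a Hive session; prints 2.
    public static void main(String[] args) {
        Text text = new Text("1234;56");
        System.out.println(new CountToolsUDF().evaluate(text));
    }
}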
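
The count depends on how String.split treats empty segments, which is easy to misjudge. Below is a minimal sketch in plain Java, with no Hive or jodd on the classpath. The class ToolCounter and both method names are made up for illustration and are not part of the original project. count mirrors the UDF's rule; countNonEmpty is a stricter variant that ignores empty entries.

```java
import java.util.Arrays;

// Hypothetical helper (not in the original project): the UDF's counting
// logic in plain Java so it can be exercised without Hive on the classpath.
final class ToolCounter {

    // Same rule as CountToolsUDF: null or blank gives 0, otherwise the
    // number of segments produced by String.split(";").
    static int count(String value) {
        if (value == null || value.trim().isEmpty()) {
            return 0;
        }
        return value.trim().split(";").length;
    }

    // Stricter variant: counts only non-blank entries, so stray
    // separators such as ";a;;b" do not inflate the result.
    static int countNonEmpty(String value) {
        if (value == null) {
            return 0;
        }
        return (int) Arrays.stream(value.split(";"))
                .filter(s -> !s.trim().isEmpty())
                .count();
    }

    public static void main(String[] args) {
        System.out.println(count("1234;56"));
        System.out.println(count("a;b;"));
        System.out.println(count(";a;;b"));
        System.out.println(countNonEmpty(";a;;b"));
    }
}
```

A trailing separator is ignored by split, but a leading or doubled one produces empty segments that count still includes. If the votetools column can contain such values, the stricter variant may be the safer choice.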

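UDF: User-Defined Function
Implementation steps
1. Extend the UDF class
org.apache.hadoop.hive.ql.exec.UDF
2. Method rules
a. Implement one or more methods named 'evaluate'.
b. 'evaluate' must not be a void method; it may return 'null' if needed.
c. The return type and parameters of 'evaluate' can be Java primitive types or Hadoop types; Java types are converted to Hadoop types.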
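
The method rules above can be sketched as follows. This is an illustration under assumptions, not code from the original project: the class name SplitCountSketch and the two-argument overload are hypothetical. In a real project the class would extend org.apache.hadoop.hive.ql.exec.UDF; that is left out here so the sketch compiles without Hive jars. It shows two 'evaluate' overloads, Java types (String, Integer) in place of Hadoop types, and a 'null' return.

```java
import java.util.regex.Pattern;

// Hypothetical class for illustration. In a real project it would extend
// org.apache.hadoop.hive.ql.exec.UDF; omitted so this compiles standalone.
final class SplitCountSketch {

    // Overload 1: uses the default ';' delimiter.
    public Integer evaluate(String value) {
        return evaluate(value, ";");
    }

    // Overload 2: caller-supplied delimiter, treated as a literal string.
    // Returns null (SQL NULL in Hive) when an argument is missing.
    public Integer evaluate(String value, String delimiter) {
        if (value == null || delimiter == null || delimiter.isEmpty()) {
            return null;
        }
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // Pattern.quote keeps regex metacharacters such as '|' literal.
        return trimmed.split(Pattern.quote(delimiter)).length;
    }

    public static void main(String[] args) {
        SplitCountSketch udf = new SplitCountSketch();
        System.out.println(udf.evaluate("a;b;c"));
        System.out.println(udf.evaluate("a|b", "|"));
        System.out.println(udf.evaluate(null));
    }
}
```

Returning 'null' for a NULL input, instead of 0, is a design choice: it lets a query tell "no value" apart from "empty value".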
3. Package and test
Build the JAR: mvn clean package
Add the JAR to the classpath:
add jar /opt/modules/hive-1.2.1-bin/HiveCountToolsUDF-1.0-SNAPSHOT.jar;
Register the function (a temporary function cannot take a database-qualified name):
create temporary function count_tools_length as 'com.yuejiesong.CountToolsUDF';

Test:
select ntools, count_tools_length(votetools) as len
from
db_hive.count_tools_length;
List all functions:
show functions;
Describe a function:
desc function extended functionname;
UDAF: User-Defined Aggregate Function
UDTF: User-Defined Table-Generating Function
Hive window functions
Data format:
"27.38.5.159" "31/Aug/2015:00:04:37 +0800"
ip: IP address; datetime_str: access time
Two UDFs:
- Strip the double quotes from a field
- Convert the date-time format
2015-08-31 00:04:37
Query result:
select
removeQuato(ip) AS ip,
transforDate(datetime_str) AS date_str
from
access_log limit 5;
27.38.5.159,20150831000437
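
The post does not show the bodies of removeQuato and transforDate, so the following is only a sketch of what their core logic might look like, in plain Java without Hive classes. The class AccessLogFields and its method names are hypothetical. The output pattern yyyyMMddHHmmss is an assumption chosen to match the sample result 20150831000437.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helpers; the original post does not show the bodies of
// removeQuato and transforDate.
final class AccessLogFields {

    // Input pattern for values like 31/Aug/2015:00:04:37 +0800.
    // Locale.ENGLISH is needed so that "Aug" parses on any machine.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Assumed output pattern, matching the sample result 20150831000437.
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Removes every double quote from the field; null stays null.
    static String removeQuotes(String field) {
        if (field == null) {
            return null;
        }
        return field.replace("\"", "");
    }

    // Strips quotes, parses the log timestamp, and reformats it.
    // The local time is kept as written; the +0800 offset is not applied.
    static String transformDate(String raw) {
        if (raw == null) {
            return null;
        }
        String cleaned = removeQuotes(raw).trim();
        return OffsetDateTime.parse(cleaned, INPUT).format(OUTPUT);
    }

    public static void main(String[] args) {
        System.out.println(removeQuotes("\"27.38.5.159\""));
        System.out.println(transformDate("\"31/Aug/2015:00:04:37 +0800\""));
    }
}
```

Changing the output pattern to yyyy-MM-dd HH:mm:ss would produce 2015-08-31 00:04:37 instead. To use these as Hive functions, each method would be wrapped in its own UDF class with an 'evaluate' method.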
