
Developing Hive UDFs

2020-03-23 17:58  DataBases

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-service</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>1.2.1</version>
    </dependency>
</dependencies>

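package com.yuejiesong;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

/**
 * Counts the ';'-separated entries in a string column.
 * Returns 0 for NULL or blank input.
 */
public class CountToolsUDF extends UDF {
    public IntWritable evaluate(Text votetools) {
        // Hive passes SQL NULL as a Java null, so check before calling toString()
        if (votetools == null) {
            return new IntWritable(0);
        }
        String value = votetools.toString().trim();
        if (value.isEmpty()) {
            return new IntWritable(0);
        }
        // Number of entries after splitting on ';'
        int length = value.split(";").length;
        return new IntWritable(length);
    }

    // Quick local check without a Hive session
    public static void main(String[] args) {
        Text text = new Text("1234;56");
        System.out.println(new CountToolsUDF().evaluate(text)); // prints 2
    }
}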
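The value CountToolsUDF returns is simply the length of the array produced by String.split(";"), so that method's edge cases carry over to the UDF. The sketch below is a hypothetical plain-Java mirror of the same logic (the class SplitCountDemo and method countTools are made-up names) that uses String instead of Hadoop's Text/IntWritable, so it runs without Hive on the classpath.

```java
/**
 * Hypothetical plain-Java mirror of CountToolsUDF.evaluate, used only to
 * show how String.split(";") counts entries in edge cases.
 */
public class SplitCountDemo {
    // Same rules as the UDF: null or blank -> 0, otherwise split(";").length
    public static int countTools(String value) {
        if (value == null || value.trim().isEmpty()) {
            return 0;
        }
        return value.trim().split(";").length;
    }

    public static void main(String[] args) {
        String[] samples = {"1234;56", "a;;b", "a;b;", ";a", ";", "  "};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + countTools(s));
        }
    }
}
```

For these samples it prints 2, 3, 2, 2, 0 and 0: an empty entry between two separators is counted ("a;;b"), a leading empty entry is counted (";a"), but trailing empty entries are dropped ("a;b;", ";"). If only non-empty entries should count, the split result would need to be filtered.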


UDF: User-Defined Functions
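Implementation steps
1. Extend the UDF class:
org.apache.hadoop.hive.ql.exec.UDF
2. Method requirements
a. Implement one or more methods named 'evaluate'.
b. 'evaluate' must not be a void method; it may return null if needed.
c. The arguments and return type of 'evaluate' can be Java primitive types or Hadoop Writable types; Java types are converted to Hadoop types automatically.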
3. Package and test
Build the jar: mvn clean package
Add the jar to Hive's classpath:
add jar /opt/modules/hive-1.2.1-bin/HiveCountToolsUDF-1.0-SNAPSHOT.jar;
Register the function (a temporary function cannot take a database prefix):
create temporary function count_tools_length as 'com.yuejiesong.CountToolsUDF';

Test:
select ntools, count_tools_length(votetools) as len
from
db_hive.count_tools_length;
List all functions:
show functions;
Show details of a single function:
desc function extended count_tools_length;
UDAF: User-Defined Aggregation Functions
UDTF: User-Defined Table-Generating Functions
Hive window functions
Data format:
"27.38.5.159" "31/Aug/2015:00:04:37 +0800"
ip: IP address; datetime_str: access time
Two UDFs:
- remove the double quotes around a field
- convert the date-time format
2015-08-31 00:04:37
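The post does not include the source of removeQuato and transforDate. As a minimal sketch, assuming the input is exactly the quoted format shown above and that the output should be yyyyMMddHHmmss as in the sample query result (27.38.5.159,20150831000437), the string logic could look like the following. The class AccessLogParsing and its methods are hypothetical names; the code uses only the JDK (java.time) so it runs without Hive. A real UDF would extend org.apache.hadoop.hive.ql.exec.UDF and call these methods from its evaluate.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/**
 * Hypothetical helper holding the string logic that removeQuato and
 * transforDate could delegate to (the original UDF sources are not shown).
 */
public class AccessLogParsing {
    // Access-log timestamp, e.g. 31/Aug/2015:00:04:37 +0800.
    // Locale.ENGLISH so "Aug" parses regardless of the JVM default locale.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    // Assumed output format, matching the sample result 20150831000437
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Removes every double-quote character; returns null for null input. */
    public static String removeQuotes(String field) {
        return field == null ? null : field.replace("\"", "");
    }

    /** Converts the log timestamp to yyyyMMddHHmmss, keeping its original local time. */
    public static String transformDate(String field) {
        String raw = removeQuotes(field);
        if (raw == null || raw.trim().isEmpty()) {
            return null;
        }
        OffsetDateTime parsed = OffsetDateTime.parse(raw.trim(), INPUT);
        return parsed.format(OUTPUT);
    }

    public static void main(String[] args) {
        System.out.println(removeQuotes("\"27.38.5.159\""));                 // 27.38.5.159
        System.out.println(transformDate("\"31/Aug/2015:00:04:37 +0800\"")); // 20150831000437
    }
}
```

The formatted result keeps the log's own local time (+0800) instead of converting to another time zone. Malformed input makes OffsetDateTime.parse throw DateTimeParseException; inside a UDF one option is to catch it and return null, which Hive treats as SQL NULL.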
Query:
select
removeQuato(ip) AS ip,
transforDate(datetime_str) AS date_str
from
access_log limit 5;
Sample output row:
27.38.5.159,20150831000437