Hive Custom Functions (UDF, UDTF, UDAF)
UDF: one row in, one row out
1. Create a Maven project and add the dependency:
| <dependency> |
| <groupId>org.apache.hive</groupId> |
| <artifactId>hive-exec</artifactId> |
| <version>1.2.1</version> |
| </dependency> |
2. When you later package the jar, you may see this error:
| Could not transfer artifact org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde |
Fix: exclude the problematic artifact from the hive-exec dependency in the pom:
| <dependency> |
| <groupId>org.apache.hive</groupId> |
| <artifactId>hive-exec</artifactId> |
| <version>1.2.1</version> |
| <exclusions> |
| <exclusion> |
| <groupId>org.pentaho</groupId> |
| <artifactId>pentaho-aggdesigner-algorithm</artifactId> |
| </exclusion> |
| </exclusions> |
| </dependency> |
The deprecated way to write a UDF (still worth writing down):
| package customer.UDF; |
| |
| import org.apache.hadoop.hive.ql.exec.UDF; |
| |
| // Deprecated API: extend UDF and define an evaluate() method, which Hive |
| // resolves by reflection and calls once per row. |
| public class MyUDF extends UDF { |
| public String evaluate(String str) { |
| return "日期" + str; |
| } |
| } |
1. Package the project into a jar and upload it to the server.
2. In Hive, add the jar as a resource to the Hive session (the path here is my own):
| add jar /usr/soft/shellsdir/hive.jar; |

3. Register a temporary function: the name after FUNCTION is whatever you choose; the string after AS is the fully qualified class name copied from IDEA:
| create temporary function sp as 'customer.UDF.MyUDFDemo1'; |

4. You can check first that the temporary function now exists:
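For example, list the registered functions and look for the name you just created:
| show functions; |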

5. Apply the function to the data:
| select date_time,sp(date_time) from test1; |

Example: convert names to lowercase (to convert to uppercase, just change one method call):
| empno,ename,job,mgr,hiredate,sal,comm,deptno |
| 7369,SMITH,CLERK,7902,1980-12-17,800,null,20 |
| 7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30 |
| 7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30 |
| 7566,JONES,MANAGER,7839,1981-04-02,2975,null,20 |
| 7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30 |
| 7698,BLAKE,MANAGER,7839,1981-05-01,2850,null,30 |
| 7782,CLARK,MANAGER,7839,1981-06-09,2450,null,10 |
| 7788,SCOTT,ANALYST,7566,1987-07-13,3000,null,20 |
| 7839,KING,PRESIDENT,null,1981-11-17,5000,null,10 |
| 7844,TURNER,SALESMAN,7698,1981-09-08,1500,0,30 |
| 7876,ADAMS,CLERK,7788,1987-07-13,1100,null,20 |
| 7900,JAMES,CLERK,7698,1981-12-03,950,null,30 |
| 7902,FORD,ANALYST,7566,1981-12-03,3000,null,20 |
| 7934,MILLER,CLERK,7782,1982-01-23,1300,null,10 |
| |
| create table if not exists emp(empno bigint,ename string,job string,mgr bigint,hiredate date,sal bigint,comm string,deptno bigint) row format delimited fields terminated by ','; |
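Then load the CSV into the table; the local path below is an assumption, use wherever you saved the file:
| -- assumed path; adjust to your own file location |
| load data local inpath '/usr/soft/shellsdir/emp.csv' into table emp; |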
After writing a new class, repackaging, and uploading the new jar, stop hiveserver2 and restart it before reconnecting:
| ctrl+c |
| nohup hiveserver2 & |

The remaining steps are the same as above:
| add jar /usr/soft/shellsdir/hive.jar; |
| |
| create temporary function sp2 as 'customer.UDF.MyUDFDemo2'; |
| |
| show functions; |
| |
| select ename,sp2(ename) from emp; |

Creating a permanent function:
1. Upload the jar to HDFS (first create a directory to hold jars):
| hadoop fs -mkdir -p /jar |
| hadoop fs -put hadoop-mapreduce-1.0-SNAPSHOT.jar /jar/ |

2. Create the permanent function from the Hive command line:
| create function myUp as 'customer.UDF.MyUDFDemo2' using jar 'hdfs:/jar/hive1-1.0-SNAPSHOT-jar-with-dependencies.jar'; |

3. Exit Hive, reconnect, and run a test:
| use test1; |
| select myUp('ASGUIAD'); |

4. Drop the permanent function and verify (an error when you call it again means the drop succeeded).
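For example:
| drop function myUp; |
| -- this should now error out, which confirms the function is gone |
| select myUp('ASGUIAD'); |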

The new way to write a UDF (GenericUDF):
| package customer.UDF; |
| import org.apache.hadoop.hive.ql.exec.UDFArgumentException; |
| import org.apache.hadoop.hive.ql.metadata.HiveException; |
| import org.apache.hadoop.hive.ql.udf.generic.GenericUDF; |
| import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; |
| import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory; |
| |
| public class MyGenericUDF extends GenericUDF { |
| // initialize() runs once: it receives the argument inspectors and |
| // returns the inspector for the result type (a Java String here). |
| @Override |
| public ObjectInspector initialize(ObjectInspector[] d2) throws UDFArgumentException { |
| return PrimitiveObjectInspectorFactory.javaStringObjectInspector; |
| } |
| |
| // evaluate() runs once per row; DeferredObject delays materializing the |
| // argument until get() is called. |
| @Override |
| public Object evaluate(DeferredObject[] d1) throws HiveException { |
| String canshu = null; |
| Object o = d1[0].get(); |
| if (o != null) { |
| canshu = o.toString(); |
| } |
| return "ABC" + canshu; |
| } |
| |
| // What EXPLAIN and error messages display for this UDF. |
| @Override |
| public String getDisplayString(String[] strings) { |
| return "111ABCDEFG"; |
| } |
| } |
The remaining steps are the same as above: register the class and test it.
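Concretely, something like this (the function name sp3 is arbitrary; the jar path follows the earlier step, and the expected result is derived from evaluate() above):
| add jar /usr/soft/shellsdir/hive.jar; |
| create temporary function sp3 as 'customer.UDF.MyGenericUDF'; |
| -- expected: ABC2019-02-08 |
| select sp3('2019-02-08'); |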

UDTF: one row in, multiple rows out
Illustrated with an example. Given the record below, we want to format it into three columns, id, name, card, one output row per comma-separated group. Within each group the fields are separated by #: the value prefixed with M is the id, the value prefixed with S is the card, and the remaining value is the name.
| M1001#xiaohu#S324231212,sp#M1002#S2543412432,S21312312412#M1003#dyj |
| package customer.UDTF; |
| import org.apache.hadoop.hive.ql.exec.UDFArgumentException; |
| import org.apache.hadoop.hive.ql.metadata.HiveException; |
| import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF; |
| import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; |
| import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; |
| import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; |
| import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory; |
| |
| import java.util.ArrayList; |
| |
| // UDTF: a single input row can emit multiple output rows via forward(). |
| public class MyUDTF extends GenericUDTF { |
| // initialize() declares the output schema: three string columns. |
| @Override |
| public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException { |
| // output column names |
| ArrayList<String> colNames = new ArrayList<>(); |
| // output column types, parallel to colNames |
| ArrayList<ObjectInspector> colTypes = new ArrayList<>(); |
| colNames.add("id"); |
| colTypes.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector); |
| colNames.add("name"); |
| colTypes.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector); |
| colNames.add("card"); |
| colTypes.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector); |
| return ObjectInspectorFactory.getStandardStructObjectInspector(colNames, colTypes); |
| } |
| |
| // process() is called once per input row; the single argument here is the |
| // whole comma-separated record. |
| @Override |
| public void process(Object[] args) throws HiveException { |
| // one output row: [id, name, card] |
| String[] rows = new String[3]; |
| String col = args[0].toString(); |
| // each comma-separated group becomes one output row |
| String[] infos = col.split(","); |
| for (String info : infos) { |
| String[] strings = info.split("#"); |
| for (String i : strings) { |
| if (i.startsWith("M")) { |
| rows[0] = i.substring(1); // strip the M prefix -> id |
| } else if (i.startsWith("S")) { |
| rows[2] = i.substring(1); // strip the S prefix -> card |
| } else { |
| rows[1] = i; // no prefix -> name |
| } |
| } |
| // emit one row per group |
| forward(rows); |
| } |
| } |
| |
| @Override |
| public void close() throws HiveException { |
| // nothing to clean up |
| } |
| } |
1. The steps are the same as before:
| add jar /usr/soft/shellsdir/hive1-1.0-SNAPSHOT-jar-with-dependencies.jar; |
| |
| create temporary function sp1 as 'customer.UDTF.MyUDTF'; |
| |
| show functions; |
2. Process the record directly:
| select sp1('M1001#xiaohu#S324231212,sp#M1002#S2543412432,S21312312412#M1003#dyj'); |
3. Hive may report a type-mismatch error here; fix it by converting the argument with .toString() in the Java code (as process() above already does), then repackage and repeat the steps above.

4. After the change, the query runs successfully.
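Against a real table you would normally invoke a UDTF through LATERAL VIEW; a minimal sketch, assuming a table lines with a single string column line holding the raw records (both names are hypothetical):
| select t.id, t.name, t.card |
| from lines |
| lateral view sp1(line) t as id, name, card; |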

UDAF: multiple rows in, one row out
Demonstrated with an example:
| id datestr amount |
| 1,2019-02-08,6214.23 |
| 1,2019-02-08,6247.32 |
| 1,2019-02-09,85.63 |
| 1,2019-02-09,967.36 |
| 1,2019-02-10,85.69 |
| 1,2019-02-12,769.85 |
| 1,2019-02-13,943.86 |
| 1,2019-02-14,538.42 |
| 1,2019-02-15,369.76 |
| 1,2019-02-16,369.76 |
| 1,2019-02-18,795.15 |
| 1,2019-02-19,715.65 |
| 1,2019-02-21,537.71 |
| 2,2019-02-08,6214.23 |
| 2,2019-02-08,6247.32 |
| 2,2019-02-09,85.63 |
| 2,2019-02-09,967.36 |
| 2,2019-02-10,85.69 |
| 2,2019-02-12,769.85 |
| 2,2019-02-13,943.86 |
| 2,2019-02-14,943.18 |
| 2,2019-02-15,369.76 |
| 2,2019-02-18,795.15 |
| 2,2019-02-19,715.65 |
| 2,2019-02-21,537.71 |
| 3,2019-02-08,6214.23 |
| 3,2019-02-08,6247.32 |
| 3,2019-02-09,85.63 |
| 3,2019-02-09,967.36 |
| 3,2019-02-10,85.69 |
| 3,2019-02-12,769.85 |
| 3,2019-02-13,943.86 |
| 3,2019-02-14,276.81 |
| 3,2019-02-15,369.76 |
| 3,2019-02-16,369.76 |
| 3,2019-02-18,795.15 |
| 3,2019-02-19,715.65 |
| 3,2019-02-21,537.71 |
Table creation statement:
| create table deal_tb( |
| id string |
| ,datestr string |
| ,amount string |
| )row format delimited fields terminated by ','; |
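As with the emp table, load the sample data first (the local path is again an assumption):
| load data local inpath '/usr/soft/shellsdir/deal.txt' into table deal_tb; |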
Requirement: for each user, compute the number of consecutive login days and how many days have passed since the previous login.
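The post stops at the requirement, but as a sketch, both parts can be answered with window functions rather than a hand-written UDAF: lag() gives the previous login date, and grouping by date_sub(datestr, row_number()) collapses each run of consecutive dates to a single key (the alias names d, t, and grp below are my own):
| -- days since the previous login, per user and date |
| select id, datestr, |
| datediff(datestr, lag(datestr) over (partition by id order by datestr)) as days_since_last |
| from (select distinct id, datestr from deal_tb) d; |
| |
| -- length of each consecutive-login streak |
| select id, min(datestr) as start_date, max(datestr) as end_date, count(*) as continuous_days |
| from (select id, datestr, |
| date_sub(datestr, row_number() over (partition by id order by datestr)) as grp |
| from (select distinct id, datestr from deal_tb) d) t |
| group by id, grp; |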