Writing a custom Hive function (UDF) in Java and using it from Hive (including the signature-verification failure when a crypto library is bundled: JCE cannot authenticate the provider BC)

Create a Maven project (do not use Spring Boot)

Add the dependencies

        <!-- Hive dependency -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>3.1.1</version>
        </dependency>

        <!-- Hadoop dependency -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.1.1</version>
        </dependency>

Add the packaging plugin. The final jar has to bundle every dependency it uses, so the assembly plugin is configured as follows.

    <!-- Packaging configuration -->
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.0.0</version>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id> <!-- this is used for inheritance merges -->
                        <phase>package</phase> <!-- run the jar merge during the package phase -->
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

Write the custom function class

This demo function simply outputs the length of a string.

MyStrLengthUDF.java

package com.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

/**
 * Generic UDF that returns the length of its single string argument.
 * The length is returned as a string purely for demonstration.
 */
@Description(name = "my_str_length_udf",
        value = "_FUNC_(str) - returns the length of str as a string",
        extended = "Computes the length of a string")
public class MyStrLengthUDF extends GenericUDF {

    // Inspector for the string argument, captured in initialize()
    private transient StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        // Require exactly one argument, and it must be of string type
        if (arguments.length != 1 || !arguments[0].getCategory().equals(ObjectInspector.Category.PRIMITIVE)
                || !((PrimitiveObjectInspector) arguments[0]).getPrimitiveCategory().equals(PrimitiveObjectInspector.PrimitiveCategory.STRING)) {
            throw new UDFArgumentException("my_str_length_udf() takes exactly one string argument");
        }
        inputOI = (StringObjectInspector) arguments[0];
        // The output ObjectInspector is a writable string (Text)
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
        // Check for SQL NULL before converting, otherwise toString() would throw a NullPointerException
        Object arg = deferredObjects[0].get();
        if (arg == null) {
            return null;
        }
        String str = inputOI.getPrimitiveJavaObject(arg);
        if (str == null) {
            return null;
        }
        // Compute the length; it is returned as a string only for demonstration
        return new Text(String.valueOf(str.length()));
    }

    @Override
    public String getDisplayString(String[] strings) {
        return getStandardDisplayString("my_str_length_udf", strings);
    }
}

You can put your own logic in the evaluate() method.
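One way to keep the logic in evaluate() easy to test is to move it into a plain static method that uses no Hive types, and have the UDF call it. The following is only a minimal sketch of that idea; the class name StrLengthLogic and the method lengthAsString are hypothetical and not part of the original project.

```java
public class StrLengthLogic {

    /**
     * Returns the length of the input as a string, or null for null input.
     * (Hypothetical helper; the UDF's evaluate() could delegate to it.)
     */
    public static String lengthAsString(String input) {
        if (input == null) {
            return null;
        }
        return String.valueOf(input.length());
    }

    public static void main(String[] args) {
        // Same input as the Hive query used later in this post
        System.out.println(lengthAsString("123"));
    }
}
```

With this split, the Hive-specific class only unwraps the argument and wraps the result in Text, while the logic itself can be checked without a Hive environment.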

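my_str_length_udf: the name of the custom function, which is what we use later when running Hive SQL. The name actually registered in Hive is the one given in the CREATE FUNCTION statement, so keep the two the same.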

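Then run the package command (for example, mvn clean package).

This produces the jar-with-dependencies file (udf-1.0-SNAPSHOT-jar-with-dependencies.jar in this example), which also contains the dependencies we rely on.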

Because the jar will be used from Hive, I upload it to HDFS here; choose whatever location suits your setup.

HDFS upload command

hdfs dfs -put /home/udf-1.0-SNAPSHOT-jar-with-dependencies.jar   /udf/

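This uploads the jar to the /udf/ directory; adjust the path for your environment.

Then log in to Hive

Usually you start the client with the command below and log in as a user that is allowed to create functions.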

beeline

Create the UDF

Temporary function: a function created this way is only valid for the current session.

add jar /home/udf-1.0-SNAPSHOT-jar-with-dependencies.jar;

CREATE TEMPORARY FUNCTION my_str_length_udf as 'com.hive.udf.MyStrLengthUDF';

Permanent function

create function my_str_length_udf as 'com.hive.udf.MyStrLengthUDF' using jar 'hdfs:///udf/udf-1.0-SNAPSHOT-jar-with-dependencies.jar' ;

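my_str_length_udf: the function name used in the code above

com.hive.udf.MyStrLengthUDF: the fully qualified name (package plus class name) of the class above

hdfs:///udf/udf-1.0-SNAPSHOT-jar-with-dependencies.jar: the HDFS path of the jar we uploaded

If the statement prints OK, the function was added successfully.

To check that it was added, list the functions with the following Hive command (it shows built-in and user-defined functions alike).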

SHOW FUNCTIONS;

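If the function we added appears in the list, it worked. A function created without specifying a database is registered in the current database, so here it appears with the default. prefix.

Then call it from a Hive query

For example

select my_str_length_udf("123");

The result is 3, the length of the input string.

Drop a UDF that has already been added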

drop function default.my_str_length_udf;

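default.my_str_length_udf: the function name. The default. prefix is there because we did not specify a database; check the output of SHOW FUNCTIONS for the exact name in your environment.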

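What to do if you use the Bouncy Castle crypto library and hit "JCE cannot authenticate the provider BC"

This happens because the environment runs Oracle JDK, which verifies the signature of the crypto provider jar. The packaging approach above merges the crypto library into our own jar, which breaks the original signed jar, so the verification fails.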
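To see what the merge does to the signature, it can help to list the signature-related entries in the original Bouncy Castle jar and in the assembled jar. The sketch below assumes the usual JAR signing layout, where signature files sit directly under META-INF with the extensions .SF, .RSA, .DSA or .EC; the class name JarSignatureFiles and its methods are hypothetical helpers, not part of the original project.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarSignatureFiles {

    private static final String META_INF = "META-INF/";

    /** True if the entry name looks like a JAR signature file directly under META-INF. */
    public static boolean isSignatureEntry(String entryName) {
        String upper = entryName.toUpperCase(Locale.ROOT);
        if (!upper.startsWith(META_INF)) {
            return false;
        }
        String fileName = upper.substring(META_INF.length());
        if (fileName.isEmpty() || fileName.contains("/")) {
            return false; // the directory itself, or something in a subdirectory
        }
        return fileName.endsWith(".SF") || fileName.endsWith(".RSA")
                || fileName.endsWith(".DSA") || fileName.endsWith(".EC");
    }

    /** Lists the signature-related entries of the jar at the given path. */
    public static List<String> listSignatureEntries(String jarPath) throws IOException {
        List<String> found = new ArrayList<>();
        // verify = false: we only read entry names, we do not check signatures here
        try (JarFile jar = new JarFile(jarPath, false)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isSignatureEntry(name)) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarSignatureFiles <path-to-jar>");
            return;
        }
        for (String name : listSignatureEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```

Whatever this prints for the assembled jar, the signature was created for the original Bouncy Castle jar, not for a jar that also contains our classes and other dependencies.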

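Switching the environment to OpenJDK resolves it.

If the JDK of the environment cannot be changed,

you can instead put the crypto jar directly among the Hive environment's dependency jars and exclude it from the UDF jar (for example, by giving that dependency provided scope), so that the UDF loads the crypto library from the Hive environment. This works as well.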
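When the crypto library comes from the Hive environment rather than from the UDF jar, the UDF code can look the provider up at runtime instead of assuming it is present. The following is a minimal sketch under that assumption: it uses the standard provider name "BC" and the class org.bouncycastle.jce.provider.BouncyCastleProvider, loaded by reflection so the sketch compiles without Bouncy Castle on the classpath. The class BcProviderLoader and the method ensureProvider are hypothetical names.

```java
import java.security.Provider;
import java.security.Security;

public class BcProviderLoader {

    static final String BC_NAME = "BC";
    static final String BC_CLASS = "org.bouncycastle.jce.provider.BouncyCastleProvider";

    /**
     * Returns true if a provider with the given name is registered, trying to
     * load and register it from the classpath if it is not registered yet.
     */
    public static boolean ensureProvider(String providerName, String providerClassName) {
        if (Security.getProvider(providerName) != null) {
            return true; // already registered, nothing to do
        }
        try {
            Class<?> clazz = Class.forName(providerClassName);
            Provider provider = (Provider) clazz.getDeclaredConstructor().newInstance();
            Security.addProvider(provider);
            return Security.getProvider(providerName) != null;
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Class not on the classpath, not instantiable, or not a Provider
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("BC available: " + ensureProvider(BC_NAME, BC_CLASS));
    }
}
```

Registering the provider does not bypass the signature check; it only makes sure the provider is taken from the intact, separately deployed jar.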

posted @ 2024-04-23 13:51 yvioo
