Testing Machine Learning Operators Locally

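I. Configure the authentication environment (this setup is for the testhb test environment; other environments need their own configuration files)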

  Install and configure Kerberos

  1. Install Kerberos, then obtain a ticket from the keytab:

    kinit -kt D:\ml\test\test001.keytab test001@DEVTEST.BONC
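If the ticket needs to be obtained from a script rather than typed by hand, the same command can be wrapped in Python. This is only a sketch: the helper names `build_kinit_command` and `obtain_ticket` are hypothetical, and it assumes `kinit` is on the PATH.

```python
import subprocess

def build_kinit_command(keytab_path, principal):
    """Return the argument list for `kinit -kt <keytab> <principal>`."""
    return ["kinit", "-kt", keytab_path, principal]

def obtain_ticket(keytab_path, principal):
    """Run kinit; raises CalledProcessError if kinit exits with a non-zero code."""
    subprocess.run(build_kinit_command(keytab_path, principal), check=True)

# Raw string so the backslashes in the Windows path are kept literally.
cmd = build_kinit_command(r"D:\ml\test\test001.keytab", "test001@DEVTEST.BONC")
```

Calling `obtain_ticket(...)` with the same two arguments would actually run the command.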

  2. Copy krb5.conf from the server (/etc/krb5.conf) and rename it to krb5.ini.

    Place it in three locations:

      C:\Windows

      C:\Program Files\MIT\Kerberos

      C:\ProgramData\MIT\Kerberos5
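Copying the file into the three directories can also be scripted. The sketch below is one possible way to do it; `install_krb5_ini` is a hypothetical helper, and writing into C:\Windows or C:\Program Files will most likely require an elevated prompt.

```python
import shutil
from pathlib import Path

# The three directories listed above.
KRB5_TARGET_DIRS = [
    r"C:\Windows",
    r"C:\Program Files\MIT\Kerberos",
    r"C:\ProgramData\MIT\Kerberos5",
]

def install_krb5_ini(krb5_conf, target_dirs):
    """Copy the server's krb5.conf into each directory under the name krb5.ini."""
    written = []
    for directory in target_dirs:
        target = Path(directory) / "krb5.ini"
        shutil.copyfile(krb5_conf, target)
        written.append(target)
    return written
```

For example, `install_krb5_ini("krb5.conf", KRB5_TARGET_DIRS)` would copy a krb5.conf that was downloaded into the current directory.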

  3. test001.keytab

    Specify its path when starting Spark.

  4. Set a variable for the program (add it as an environment variable in PyCharm):

    USERDNSDOMAIN=DEVTEST.BONC
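When the script is run outside PyCharm, the variable could instead be set at the very top of the program. This is an assumption rather than something the setup above was tested with; it relies on the variable being set before the Spark session (and its JVM process) is created, so that the child process inherits it.

```python
import os

# Must run before SparkSession.builder...getOrCreate() launches the JVM.
os.environ["USERDNSDOMAIN"] = "DEVTEST.BONC"
```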

  5. Copy the following files into Spark's conf directory (the server-side source path is given in parentheses):

    core-site.xml (/opt/beh/core/hadoop/etc/hadoop/core-site.xml)

    hdfs-site.xml (/opt/beh/core/hadoop/etc/hadoop/hdfs-site.xml)

    hive-site.xml (/opt/beh/core/spark/conf/hive.xml)

    yarn-site.xml (/opt/beh/core/hadoop/etc/hadoop/yarn-site.xml)
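A quick check that all four files really ended up in the conf directory can save a confusing startup failure later. The helper below is a hypothetical sketch; the conf directory path is whatever the local Spark installation uses.

```python
from pathlib import Path

# The four site files listed above.
REQUIRED_SITE_FILES = ["core-site.xml", "hdfs-site.xml", "hive-site.xml", "yarn-site.xml"]

def missing_site_files(conf_dir):
    """Return the names of required site files that are not present in conf_dir."""
    conf = Path(conf_dir)
    return [name for name in REQUIRED_SITE_FILES if not (conf / name).is_file()]
```

An empty list from `missing_site_files(<spark conf dir>)` means every file is in place.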

II. Modify base and utils

Add Kerberos authentication where the Spark session is started:

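from pyspark.sql import SparkSession

# Build a Hive-enabled session that authenticates to the cluster with the keytab.
# The keytab path is a raw string: in a plain literal, the "\t" in
# "E:\test001.keytab" would be interpreted as a tab character.
spark = SparkSession.builder \
    .config("mapreduce.output.fileoutputformat.compress", "false") \
    .config("spark.broadcast.compress", "false") \
    .config("spark.sql.parquet.compression.codec", "uncompressed") \
    .config("spark.yarn.principal", "test001@DEVTEST.BONC") \
    .config("spark.yarn.keytab", r"E:\test001.keytab") \
    .config("hadoop.security.authentication", "Kerberos") \
    .config("dfs.client.use.datanode.hostname", "true") \
    .enableHiveSupport() \
    .getOrCreate()
sc = spark.sparkContext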
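One detail worth checking in Windows paths such as the keytab path above: inside an ordinary Python string literal, `\t` is the escape sequence for a tab, so a path beginning with `E:\t...` is silently altered. The short comparison below illustrates the difference; the variable names are only for illustration.

```python
plain = "E:\test001.keytab"     # "\t" is read as a TAB, so the path is wrong
raw = r"E:\test001.keytab"      # raw string keeps the backslash
escaped = "E:\\test001.keytab"  # doubling the backslash also works

print(repr(plain))
print(repr(raw))
```

Either the raw string or the doubled backslash gives Spark the intended path.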

III. Comment out part of the code

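IV. Notes

1. Models stored on HDFS cannot be read.

2. Tables cannot be written to the server.

3. The datatype cannot be read from the server, so data_json cannot be written.

4. Configure the parameters as follows: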

--output=file:///F:\BONC\项目工作\机器学习算子\model
--label_col=c_IS_OUTNET
--features_col=c_ZERO,c_MEAN
--inputpath=hdfs://devtestcluster/test001/datascience-vbap-data-mart/dataSet/vbap52bdbda6ae28dc714d383881
--inputpath_sql="SELECT * FROM devtest.vbap1fa5a097693483b9b3df7780"
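How the operator scripts actually parse these options is not shown here. As a rough illustration only, the same flags could be read with `argparse`; the parser below is a hypothetical sketch, and splitting `--features_col` on commas is an assumption.

```python
import argparse

def build_parser():
    """Build a parser for the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Run an operator locally")
    parser.add_argument("--output")
    parser.add_argument("--label_col")
    # Assumption: feature columns are passed as one comma-separated value.
    parser.add_argument("--features_col", type=lambda s: s.split(","))
    parser.add_argument("--inputpath")
    parser.add_argument("--inputpath_sql")
    return parser

args = build_parser().parse_args([
    "--label_col=c_IS_OUTNET",
    "--features_col=c_ZERO,c_MEAN",
])
print(args.label_col, args.features_col)
```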


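"""
Notes:
1. Paths that are read from or written to the local machine must be prefixed with file:///
2. Paths to data on HDFS must be prefixed with the cluster address hdfs://devtestcluster/
"""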
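The two prefix rules can be captured in small helpers so the prefixes are not retyped for every path. This is a minimal sketch; `local_uri` and `hdfs_uri` are hypothetical names, and the default cluster name is taken from the example paths above.

```python
def local_uri(path):
    """Prefix a local path with file:/// unless it already carries the prefix."""
    return path if path.startswith("file:///") else "file:///" + path

def hdfs_uri(path, cluster="devtestcluster"):
    """Prefix an HDFS path with hdfs://<cluster>/ unless it is already a full URI."""
    if path.startswith("hdfs://"):
        return path
    return "hdfs://" + cluster + "/" + path.lstrip("/")

print(local_uri(r"F:\model"))
print(hdfs_uri("/test001/dataSet"))
```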

posted @ 2021-11-16 14:46 Mr·Li程序员
