Hive(16) Hive的参数传递

Hive的参数传递

查看hive的命令的参数

[hadoop@node03 ~]$ hive -help
usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)

语法结构

hive [-hiveconf x=y]* [<-i filename>]* [<-f filename>|<-e query-string>] [-S]
说明：
- -i 从文件初始化HQL。
- -e从命令行执行指定的HQL
- -f 执行HQL脚本
- -v 输出执行的HQL语句到控制台
- -p connect to Hive Server on port number
- -hiveconf x=y Use this to set hive/hadoop configuration variables. 设置hive运行时候的参数配置

Hive参数配置方式

Hive参数大全：官网地址

开发Hive应用时，不可避免地需要设定Hive的参数。设定Hive的参数可以调优HQL代码的执行效率，或帮助定位问题。然而实践中经常遇到的一个问题是，为什么设定的参数没有起作用？这通常是错误的设定方式导致的。

第一种方式：配置文件

Hive的配置文件包括

用户自定义配置文件：$HIVE_CONF_DIR/hive-site.xml
默认配置文件：$HIVE_CONF_DIR/hive-default.xml
用户自定义配置会覆盖默认配置。
另外，Hive也会读入Hadoop的配置，因为Hive是作为Hadoop的客户端启动的，Hive的配置会覆盖Hadoop的配置。配置文件的设定对本机启动的所有Hive进程都有效。

第二种方式：命令行参数

启动Hive（启动hive shell客户端或hiveServer方式）时，可以在命令行添加-hiveconf param=value来设定参数，例如：

# 启动hive客户端方式 传递参数:
[hadoop@node03 ~]$ hive --hiveconf javax.jdo.option.ConnectionUserName=root
# 这一设定只对本次启动的Session有效!!!

# 启动hiveserver2方式 传递参数:
[hadoop@node03 ~]$ nohup hive --service hiveserver2 & --hiveconf javax.jdo.option.ConnectionUserName=root
# 对于这种Server方式的启动，对所有请求的Sessions都有效

通过beeline连接hiveserver2时也可以传递参数：

[hadoop@node03 ~]$ beeline --hiveconf javax.jdo.option.ConnectionUserName=root

第三种方式：参数声明

进入hive shell客户端后，或者通过beeline连接hiveserver2后，可以在HQL中使用SET关键字设定参数，这种方式设置的参数都是hiveconf类型，例如：

# 进入hive shell后配置参数
hive (default)> set javax.jdo.option.ConnectionUserName=root;
# 这一设定的作用域是session级的。只对当前session有效

# beeline连接hiveserver2后，配置参数，设置mr中reduce个数
0: jdbc:hive2://node03:10000> set mapreduce.job.reduces=100;
# 这一设定的作用域是session级的。只对当前session有效

参数配置的优先级

上述三种设定方式的优先级依次递增。

即参数声明覆盖命令行参数，命令行参数覆盖配置文件设定。
注意某些系统级的参数，例如log4j相关的设定，必须用前两种方式设定，因为那些参数的读取在Session建立以前已经完成了。

参数声明  >   命令行参数   >  配置文件参数（hive）

第四种方式：使用变量传递参数

实际工作当中，我们一般都是将hive的hql语法开发完成之后，就写入到一个脚本里面去，然后定时的通过命令 hive -f 去执行hive的语法即可
然后通过定义变量来传递参数到hive的脚本当中去，那么我们接下来就来看看如何使用hive来传递参数。
hive0.9以及之前的版本是不支持传参
hive1.0版本之后支持传参，语法格式是： hive -f 传递参数
在hive当中我们一般可以使用hivevar或者hiveconf来进行参数的传递

案例1：hiveconf

hiveconf用于定义HIVE执行上下文的属性(配置参数)，可覆盖覆盖hive-site.xml（hive-default.xml）中的参数值，如用户执行目录、日志打印级别、执行队列等。例如我们可以使用hiveconf来覆盖我们的hive属性配置。hiveconf配置的属性一般是hive-default.xml配置文件里有的。

获取hiveconf变量的值必须要使用hiveconf作为前缀参数，具体格式如下:

${hiveconf:key}

0: jdbc:hive2://node03:10000> set my=80;  #仅对当前session有效，一旦!quit退出，my就会失效
No rows affected (0.015 seconds)
0: jdbc:hive2://node03:10000> set my;
+--------+--+
|  set   |
+--------+--+
| my=80  |
+--------+--+
1 row selected (0.017 seconds)

0: jdbc:hive2://node03:10000> use db1;
No rows affected (0.048 seconds)

0: jdbc:hive2://node03:10000> select * from score7_1;
+----------------+----------------+-------------------+-----------------+--+
| score7_1.s_id  | score7_1.c_id  | score7_1.s_score  | score7_1.month  |
+----------------+----------------+-------------------+-----------------+--+
| 01             | 01             | 80                | 201805          |
| 01             | 02             | 90                | 201805          |
| 01             | 03             | 99                | 201805          |
| 02             | 01             | 70                | 201805          |
| 02             | 02             | 60                | 201805          |
| 02             | 03             | 80                | 201805          |
| 03             | 01             | 80                | 201805          |
| 03             | 02             | 80                | 201805          |
| 03             | 03             | 80                | 201805          |
| 04             | 01             | 50                | 201805          |
| 04             | 02             | 30                | 201805          |
| 04             | 03             | 20                | 201805          |
| 05             | 01             | 76                | 201805          |
| 05             | 02             | 87                | 201805          |
| 06             | 01             | 31                | 201805          |
| 06             | 03             | 34                | 201805          |
| 07             | 02             | 89                | 201805          |
| 07             | 03             | 98                | 201805          |
+----------------+----------------+-------------------+-----------------+--+
18 rows selected (0.102 seconds)

0: jdbc:hive2://node03:10000> select * from score7_1 where s_score=${hiveconf:my};
+----------------+----------------+-------------------+-----------------+--+
| score7_1.s_id  | score7_1.c_id  | score7_1.s_score  | score7_1.month  |
+----------------+----------------+-------------------+-----------------+--+
| 01             | 01             | 80                | 201805          |
| 02             | 03             | 80                | 201805          |
| 03             | 01             | 80                | 201805          |
| 03             | 02             | 80                | 201805          |
| 03             | 03             | 80                | 201805          |
+----------------+----------------+-------------------+-----------------+--+

案例2：hivevar

hivevar用于定义HIVE运行时的变量替换，类似于JAVA中的“PreparedStatement”，与${key}配合使用或者与 ${hivevar:key}

获取hivevar变量的值可以不使用前缀hivevar，具体格式如下：

-- 使用前缀:
 ${hivevar:key}
-- 不使用前缀:
 ${key}

[hadoop@node03 ~]$ beeline --hivevar myscore=80;  #myscore仅对当前session有效

beeline> !connect jdbc:hive2://node03:10000

0: jdbc:hive2://node03:10000> use db1;
No rows affected (0.105 seconds)

0: jdbc:hive2://node03:10000> select * from score7_1 where s_score=${myscore};
+----------------+----------------+-------------------+-----------------+--+
| score7_1.s_id  | score7_1.c_id  | score7_1.s_score  | score7_1.month  |
+----------------+----------------+-------------------+-----------------+--+
| 01             | 01             | 80                | 201805          |
| 02             | 03             | 80                | 201805          |
| 03             | 01             | 80                | 201805          |
| 03             | 02             | 80                | 201805          |
| 03             | 03             | 80                | 201805          |
+----------------+----------------+-------------------+-----------------+--+
5 rows selected (0.247 seconds)

案例3：define

define与hivevar用途完全一样，可以简写成“-d

[hadoop@node03 ~]$ hive -d myscore=90;  #myscore仅对当前session有效

hive (db1)> select * from score7_1 where s_score>${myscore};
OK
score7_1.s_id   score7_1.c_id   score7_1.s_score        score7_1.month
01      03      99      201805
07      03      98      201805
Time taken: 0.879 seconds, Fetched: 2 row(s)

案例4：hive脚本实战

第一步：创建表并加载数据

create external table score(s_id string,c_id string,s_score int) 
partitioned by(month string) 
row format delimited fields terminated by '\t';

load data local inpath '/kkb/install/hivedatas/score.csv' overwrite into table score partition(month='201805');

create external table student(s_id string, s_name string, s_birth string, s_sex string) 
row format delimitedfields terminated by '\t';

load data local inpath '/kkb/install/hivedatas/student.csv' overwrite into table student;

0: jdbc:hive2://node03:10000> select * from student;
+---------------+-----------------+------------------+----------------+--+
| student.s_id  | student.s_name  | student.s_birth  | student.s_sex  |
+---------------+-----------------+------------------+----------------+--+
| 01            | 赵雷              | 1990-01-01       | 男              |
| 02            | 钱电              | 1990-12-21       | 男              |
| 03            | 孙风              | 1990-05-20       | 男              |
| 04            | 李云              | 1990-08-06       | 男              |
| 05            | 周梅              | 1991-12-01       | 女              |
| 06            | 吴兰              | 1992-03-01       | 女              |
| 07            | 郑竹              | 1989-07-01       | 女              |
| 08            | 王菊              | 1990-01-20       | 女              |
+---------------+-----------------+------------------+----------------+--+
8 rows selected (0.093 seconds)

0: jdbc:hive2://node03:10000> select * from score;
+-------------+-------------+----------------+--------------+--+
| score.s_id  | score.c_id  | score.s_score  | score.month  |
+-------------+-------------+----------------+--------------+--+
| 01          | 01          | 80             | 201805       |
| 01          | 02          | 90             | 201805       |
| 01          | 03          | 99             | 201805       |
| 02          | 01          | 70             | 201805       |
| 02          | 02          | 60             | 201805       |
| 02          | 03          | 80             | 201805       |
| 03          | 01          | 80             | 201805       |
| 03          | 02          | 80             | 201805       |
| 03          | 03          | 80             | 201805       |
| 04          | 01          | 50             | 201805       |
| 04          | 02          | 30             | 201805       |
| 04          | 03          | 20             | 201805       |
| 05          | 01          | 76             | 201805       |
| 05          | 02          | 87             | 201805       |
| 06          | 01          | 31             | 201805       |
| 06          | 03          | 34             | 201805       |
| 07          | 02          | 89             | 201805       |
| 07          | 03          | 98             | 201805       |
+-------------+-------------+----------------+--------------+--+
18 rows selected (0.136 seconds)

第二步：定义hive脚本

开发hql脚本，并使用hiveconf和hivevar进行参数传入
node03执行以下命令定义hql脚本

cd /kkb/instal/hivedatas

vim hivevariable.hql

#使用脚本1：
use db1;  # 脚本一定要加上这一行，否则会找不到表
select * from student left join score on student.s_id = score.s_id where score.month = ${hiveconf:month} and score.s_score > ${hivevar:s_score} and score.c_id = ${c_id}; 

#或者使用脚本2：
use db1;
select * from student st left join score sc on st.s_id = sc.s_id where sc.month = ${hiveconf:month} and sc.s_score > ${hivevar:s_score} and sc.c_id = ${c_id};

第三步：调用hive脚本并传递参数

node03执行以下命令，调用脚本2：

hive --hiveconf month=201905 --hivevar s_score=80 --hivevar c_id=03  -f /kkb/install/hivedatas/hivevariable.hql

st.s_id st.s_name  st.s_birth  st.s_sex   sc.s_id sc.c_id sc.s_score    sc.month
01      赵雷        1990-01-01      男      01         3       99         201805
07      郑竹        1989-07-01      女      07         3       98         201805
Time taken: 18.187 seconds, Fetched: 2 row(s)

posted @ 2020-08-24 00:29 Whatever_It_Takes 阅读(1267) 评论(0) 编辑收藏举报

刷新页面返回顶部

cqzs+nl