1. View Operations
-------
1) Create a test table.
```
hive> create table test(id int,name string);
OK
Time taken: 0.385 seconds
hive> desc test;
OK
id int
name string
Time taken: 0.261 seconds, Fetched: 2 row(s)
```
2) Create a view named test_view based on the table test.
```
hive> create view test_view(id,name_length) as select id,length(name) from test;
```
3) Inspect the schema of the test_view view.
```
hive> desc test_view;
OK
id int
name_length int
Time taken: 0.071 seconds, Fetched: 2 row(s)
```
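A view stores only its defining query, not any data. Beyond the plain column list above, the stored definition itself can be inspected; a quick sketch (both commands work on views in recent Hive releases):
```
hive> desc formatted test_view;
hive> show create table test_view;
```
desc formatted reports the table type as VIRTUAL_VIEW together with the original view text.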
4) Query the view.
```
hive> select * from test_view;
```
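The table test is still empty at this point, so the view returns no rows. A minimal sketch to verify the length(name) column, assuming a Hive version with INSERT ... VALUES support (0.14 or later); the sample rows are made up:
```
hive> insert into table test values (1, 'tom'), (2, 'jerry');
hive> select * from test_view;
```
With these rows, the view should report name_length values of 3 and 5.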
2. Index Operations
-------
1) Create an index in Hive.
```
hive> create table user like group_test;
OK
Time taken: 0.232 seconds
hive> create index user_index on table user(uid) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild IN TABLE user_index_table;
OK
Time taken: 0.183 seconds
```
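The IN TABLE clause materializes the index into user_index_table. With the compact handler, that table normally holds the indexed column plus two bookkeeping columns, _bucketname (the HDFS file name) and _offsets (the offsets within it). A sketch to check the generated layout:
```
hive> desc user_index_table;
```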
2) Rebuild the index to update its data.
```
hive> alter index user_index on user rebuild;
Query ID = hadoop_20180518043232_ebdf97bd-5984-4310-a3c8-6befef328133
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1526553207632_0018, Tracking URL = http://master:8088/proxy/application_1526553207632_0018/
Kill Command = /opt/modules/hadoop-2.6.0/bin/hadoop job -kill job_1526553207632_0018
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-05-18 04:32:55,632 Stage-1 map = 0%, reduce = 0%
2018-05-18 04:33:04,400 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
2018-05-18 04:33:12,406 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.93 sec
MapReduce Total cumulative CPU time: 2 seconds 930 msec
Ended Job = job_1526553207632_0018
Loading data to table default.user_index_table
Table default.user_index_table stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.93 sec HDFS Read: 290 HDFS Write: 50 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 930 msec
OK
Time taken: 25.609 seconds
```
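Rebuilding fills user_index_table, but the optimizer does not automatically rewrite every query through it. On Hive branches that still ship indexing, it is usually enabled with settings like the following (a sketch; the uid filter value is only illustrative):
```
hive> set hive.optimize.index.filter=true;----let the optimizer push WHERE filters through available indexes
hive> set hive.optimize.index.filter.compact.minsize=0;----apply the compact index even to small inputs
hive> select * from user where uid=1;
```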
3) Show indexes.
```
hive> show index on user;
OK
user_index user uid user_index_table compact
Time taken: 0.046 seconds, Fetched: 1 row(s)
```
4) Drop the index.
```
hive> drop index user_index on user;
OK
Time taken: 0.094 seconds
hive> show index on user;
OK
Time taken: 0.036 seconds
```
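Dropping the index also removes the backing index table, so user_index_table is gone as well; that is why the second show index on user above returns nothing. A quick sketch to confirm:
```
hive> show tables 'user_index_table';
```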
5) Example: creating a table and an index.
```
hive> create table index_test(id INT,name STRING) PARTITIONED BY (dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```
This creates an index test table, index_test, with dt as the partition column; "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','" means the fields in each row are separated by commas.
6) Create a temporary staging table index_tmp.
```
hive> create table index_tmp(id INT,name STRING,dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```
7) Load local data into the index_tmp table.
```
[hadoop@master test]$ sudo vim test.txt
02,female,192.168.1.3
01,male,192.168.1.26
03,male,192.168.1.5
08,female,192.168.1.62
04,male,192.168.1.9
hive> load data local inpath '/home/hadoop/test/test.txt' into table index_tmp;
Loading data to table default.index_tmp
Table default.index_tmp stats: [numFiles=1, totalSize=106]
OK
Time taken: 0.224 seconds
hive> select * from index_tmp;
OK
2 female 192.168.1.3
1 male 192.168.1.26
3 male 192.168.1.5
8 female 192.168.1.62
4 male 192.168.1.9
```
Set Hive's dynamic-partition properties so that the insert in the next step can write into partitions determined at run time. The commands are as follows.
```
hive> set hive.exec.dynamic.partition.mode=nonstrict;----allow all partition columns to be dynamic (nonstrict mode)
hive> set hive.exec.dynamic.partition=true;----enable dynamic partitioning
```
8) Query the data in index_tmp and insert it into the index_test table.
```
hive> insert overwrite table index_test partition(dt) select id,name,dt from index_tmp;
Query ID = hadoop_20180518044343_97e7fe67-a5a1-408b-be8e-e9dadb2f9e48
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1526553207632_0019, Tracking URL = http://master:8088/proxy/application_1526553207632_0019/
Kill Command = /opt/modules/hadoop-2.6.0/bin/hadoop job -kill job_1526553207632_0019
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-05-18 04:43:42,621 Stage-1 map = 0%, reduce = 0%
2018-05-18 04:43:48,835 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
MapReduce Total cumulative CPU time: 870 msec
Ended Job = job_1526553207632_0019
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://ns/tmp/hive/hadoop/9f7dd0d3-a14c-4535-9291-557b9cb6259b/hive_2018-05-18_04-43-36_337_559705388802402645-1/-ext-10000
Loading data to table default.index_test partition (dt=null)
Time taken for load dynamic partitions : 278
Loading partition {dt=192.168.1.62}
Loading partition {dt=192.168.1.3}
Loading partition {dt=192.168.1.5}
Loading partition {dt=192.168.1.26}
Loading partition {dt=192.168.1.9}