|NO.Z.00005|——————————|BigDataEnd|——|Hadoop&Impala.V05|——|Impala.v05|Lab Examples|
1. Impala Getting-Started Example
### --- Impala getting-started example
~~~ After installing Impala via Yum, impala-shell is available globally;
~~~ run impala-shell to enter Impala's interactive shell.
[root@linux123 ~]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to linux123:21000
Server version: impalad version 2.5.0-cdh5.7.6 RELEASE (build ecbba4f4e6d5eec6c33c1e02412621b8b9c71b6a)
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
(Impala Shell v2.5.0-cdh5.7.6 (ecbba4f) built on Tue Feb 21 14:54:50 PST 2017)
When pretty-printing is disabled, you can use the '--output_delimiter' flag to set
the delimiter for fields in the same row. The default is ','.
***********************************************************************************
[linux123:21000] >
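~~~ # By default impala-shell connects to the impalad on the local machine. To reach a
~~~ # specific impalad, the -i flag takes a host:port (the address below is this cluster's):
[root@linux121 ~]# impala-shell -i linux123:21000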
### --- List all databases
[linux123:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name | comment |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default | Default Hive database |
| mydb | |
| mydb1 | |
| mydb2 | this is mydb2 |
| sale | |
| tuning | |
+------------------+----------------------------------------------+
~~~ # To query data with Impala, the data must first be loaded into it. How is data loaded into Impala?
~~~ Use an Impala external table. This suits the case where data files already exist:
~~~ copy the files to HDFS, create an Impala external table, and point the table's
~~~ storage location at the files (similar to Hive).
~~~ Insert data with INSERT statements, which suits the case where no data files exist.
~~~ Both paths are walked through in the lab below.
2. Lab Example
### --- Prepare the data file user.csv
[root@linux121 ~]# vim user.csv
392456197008193000,张三,20,0
267456198006210000,李四,25,1
892456199007203000,王五,24,1
492456198712198000,赵六,26,2
392456197008193000,张三,20,0
392456197008193000,张三,20,0
### --- Create the HDFS directory for the data
[root@linux121 ~]# hadoop fs -mkdir -p /user/impala/t1
~~~ # Upload the local user.csv to HDFS /user/impala/t1
[root@linux121 ~]# hadoop fs -put user.csv /user/impala/t1
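~~~ # Optionally confirm that the file landed where expected:
[root@linux121 ~]# hadoop fs -ls /user/impala/t1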
### --- Create the table
~~~ # Enter impala-shell
[root@linux123 ~]# impala-shell
~~~ # Drop the table if it already exists
[linux123:21000] > drop table if exists t1;
~~~ # Create the external table
[linux123:21000] > create external table t1(id string,name string,age int,gender int)
row format delimited fields terminated by ','
location '/user/impala/t1';
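~~~ # The external table does not copy or move anything; it simply points at the files
~~~ # under /user/impala/t1. The DDL Impala recorded can be checked with:
[linux123:21000] > show create table t1;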
### --- Query the data
[linux123:21000] > select * from t1;
Query: select * from t1
+--------------------+------+-----+--------+
| id | name | age | gender |
+--------------------+------+-----+--------+
| 392456197008193000 | 张三 | 20 | 0 |
| 267456198006210000 | 李四 | 25 | 1 |
| 892456199007203000 | 王五 | 24 | 1 |
| 492456198712198000 | 赵六 | 26 | 2 |
| 392456197008193000 | 张三 | 20 | 0 |
| 392456197008193000 | 张三 | 20 | 0 |
+--------------------+------+-----+--------+
### --- Create table t2
~~~ # Create an internal (managed) table
[linux123:21000] > create table t2(id string,name string,age int,gender int)
row format delimited fields terminated by ',';
~~~ # View the table structure
[linux123:21000] > desc t1;
+--------+--------+---------+
| name | type | comment |
+--------+--------+---------+
| id | string | |
| name | string | |
| age | int | |
| gender | int | |
+--------+--------+---------+
[linux123:21000] > desc formatted t2;
+------------------------------+------------------------------------------------------------+----------------------+
| name | type | comment |
+------------------------------+------------------------------------------------------------+----------------------+
| # col_name | data_type | comment |
| | NULL | NULL |
| id | string | NULL |
| name | string | NULL |
| age | int | NULL |
| gender | int | NULL |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | default | NULL |
| Owner: | root | NULL |
| CreateTime: | Tue Aug 31 18:11:23 CST 2021 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://linux121:9000/user/hive/warehouse/t2 | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | transient_lastDdlTime | 1630404683 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | 0 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | , |
| | serialization.format | , |
+------------------------------+------------------------------------------------------------+----------------------+
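~~~ # Note the Table Type (MANAGED_TABLE) and the Location under the Hive warehouse.
~~~ # For comparison, desc formatted on t1 would report an external table whose
~~~ # Location is the /user/impala/t1 directory specified at creation time:
[linux123:21000] > desc formatted t1;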
### --- Insert data into t2
[linux123:21000] > insert overwrite table t2 select * from t1 where gender =0;
~~~ # Verify the data
[linux123:21000] > select * from t2;
Query: select * from t2
+--------------------+------+-----+--------+
| id | name | age | gender |
+--------------------+------+-----+--------+
| 392456197008193000 | 张三 | 20 | 0 |
| 392456197008193000 | 张三 | 20 | 0 |
| 392456197008193000 | 张三 | 20 | 0 |
+--------------------+------+-----+--------+
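~~~ # insert overwrite replaces the table's existing contents. To append instead,
~~~ # insert into can be used; for example, this would add three more rows to t2:
[linux123:21000] > insert into t2 select * from t1 where gender = 0;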
### --- Metadata sync: connect to Hive with Beeline and inspect the data;
~~~ tables created through Impala and the data loaded into them are all visible to Hive.
[linux123:21000] > show tables;
Query: show tables
+-----------------+
| name |
+-----------------+
| t1 |
| t2 |
+-----------------+
[linux123:21000] > select * from t1;
Query: select * from t1
+--------------------+------+-----+--------+
| id | name | age | gender |
+--------------------+------+-----+--------+
| 392456197008193000 | 张三 | 20 | 0 |
| 267456198006210000 | 李四 | 25 | 1 |
| 892456199007203000 | 王五 | 24 | 1 |
| 492456198712198000 | 赵六 | 26 | 2 |
| 392456197008193000 | 张三 | 20 | 0 |
| 392456197008193000 | 张三 | 20 | 0 |
+--------------------+------+-----+--------+
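~~~ # For the Hive-side check, a typical Beeline connection (assuming HiveServer2 runs
~~~ # on linux123 with its default port 10000) looks like:
[root@linux123 ~]# beeline -u jdbc:hive2://linux123:10000
0: jdbc:hive2://linux123:10000> show tables;
0: jdbc:hive2://linux123:10000> select * from t1;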
3. Summary
### --- In the case above, the Impala data file was a comma-delimited text file.
~~~ In fact, Impala also supports RCFile, SequenceFile, Parquet, and other file formats.
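~~~ # For instance, a Parquet copy of t1 can be created straight from impala-shell
~~~ # (the table name t1_parquet is ours, purely for illustration):
[linux123:21000] > create table t1_parquet stored as parquet as select * from t1;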
### --- How are Impala and Hive metadata related?
~~~ Metadata updates made in Hive are not visible to Impala automatically;
~~~ metadata updates made in Impala are visible to Hive.
~~~ # Command to sync Hive metadata into Impala:
~~~ run invalidate metadata manually (covered in detail later)
~~~ # Impala accesses and manipulates Hive's metadata through the Hive metastore service,
~~~ but when tables are created, dropped, or altered in Hive, Impala cannot detect
~~~ those metadata changes automatically. To make Impala see Hive's metadata changes,
~~~ the first thing to do after entering impala-shell is to run invalidate metadata,
~~~ which invalidates all of Impala's cached metadata and reloads it from the metastore.
~~~ The metadata refresh commands are covered in detail later.
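~~~ # invalidate metadata with no table name reloads everything, which can be expensive
~~~ # on a large catalog; with a table name it targets one table, and refresh only picks
~~~ # up new data files for a table Impala already knows about:
[linux123:21000] > invalidate metadata;
[linux123:21000] > invalidate metadata t1;
[linux123:21000] > refresh t1;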
### --- Impala accesses HDFS as the impala user, so to avoid permission problems
~~~ we can disable HDFS permission checking by adding the following to hdfs-site.xml:
[root@linux121 ~]# vim /opt/yanqi/servers/hadoop-2.9.2/etc/hadoop/hdfs-site.xml
<!-- Disable HDFS permission checking -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
### --- Distribute the file to the other nodes
[root@linux121 ~]# rsync-script /opt/yanqi/servers/hadoop-2.9.2/etc/hadoop/hdfs-site.xml
### --- Restart HDFS and YARN
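~~~ # The standard Hadoop control scripts can be used for the restart (run them on the
~~~ # nodes hosting the NameNode and ResourceManager; paths assumed from this install):
[root@linux121 ~]# /opt/yanqi/servers/hadoop-2.9.2/sbin/stop-dfs.sh
[root@linux121 ~]# /opt/yanqi/servers/hadoop-2.9.2/sbin/start-dfs.sh
[root@linux121 ~]# /opt/yanqi/servers/hadoop-2.9.2/sbin/stop-yarn.sh
[root@linux121 ~]# /opt/yanqi/servers/hadoop-2.9.2/sbin/start-yarn.sh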