|NO.Z.00005|——————————|BigDataEnd|——|Hadoop&Impala.V05|——|Impala.v05|Lab Examples|
1. Impala Getting-Started Example
### --- Impala getting-started example
~~~ After installing Impala via Yum, impala-shell is available globally;
~~~ run impala-shell to enter Impala's interactive shell.
[root@linux123 ~]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to linux123:21000
Server version: impalad version 2.5.0-cdh5.7.6 RELEASE (build ecbba4f4e6d5eec6c33c1e02412621b8b9c71b6a)
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
(Impala Shell v2.5.0-cdh5.7.6 (ecbba4f) built on Tue Feb 21 14:54:50 PST 2017)
When pretty-printing is disabled, you can use the '--output_delimiter' flag to set
the delimiter for fields in the same row. The default is ','.
***********************************************************************************
[linux123:21000] >
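~~~ # By default impala-shell connects to the impalad on the local machine. To reach a
~~~ # specific impalad, the -i flag takes a host:port (the address below is this cluster's):
[root@linux121 ~]# impala-shell -i linux123:21000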
### --- List all databases
[linux123:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name | comment |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default | Default Hive database |
| mydb | |
| mydb1 | |
| mydb2 | this is mydb2 |
| sale | |
| tuning | |
+------------------+----------------------------------------------+
~~~ # To query data with Impala, the data must first be loaded into it. How is data loaded into Impala?
~~~ Use an Impala external table. This suits the case where data files already exist:
~~~ copy the files to HDFS, create an Impala external table, and point the table's
~~~ storage location at the files (similar to Hive).
~~~ Insert data with INSERT statements, which suits the case where no data files exist.
~~~ Both paths are walked through in the lab below.
2. Lab Example
### --- Prepare the data file user.csv
[root@linux121 ~]# vim user.csv
392456197008193000,张三,20,0
267456198006210000,李四,25,1
892456199007203000,王五,24,1
492456198712198000,赵六,26,2
392456197008193000,张三,20,0
392456197008193000,张三,20,0
### --- Create the HDFS directory for the data
[root@linux121 ~]# hadoop fs -mkdir -p /user/impala/t1
~~~ # Upload the local user.csv to HDFS /user/impala/t1
[root@linux121 ~]# hadoop fs -put user.csv /user/impala/t1
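~~~ # Optionally confirm that the file landed where expected:
[root@linux121 ~]# hadoop fs -ls /user/impala/t1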
### --- Create the table
~~~ # Enter impala-shell
[root@linux123 ~]# impala-shell
~~~ # Drop the table if it already exists
[linux123:21000] > drop table if exists t1;
~~~ # Create the external table
[linux123:21000] > create external table t1(id string,name string,age int,gender int)
row format delimited fields terminated by ','
location '/user/impala/t1';
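~~~ # The external table does not copy or move anything; it simply points at the files
~~~ # under /user/impala/t1. The DDL Impala recorded can be checked with:
[linux123:21000] > show create table t1;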
### --- Query the data
[linux123:21000] > select * from t1;
Query: select * from t1
+--------------------+------+-----+--------+
| id | name | age | gender |
+--------------------+------+-----+--------+
| 392456197008193000 | 张三 | 20 | 0 |
| 267456198006210000 | 李四 | 25 | 1 |
| 892456199007203000 | 王五 | 24 | 1 |
| 492456198712198000 | 赵六 | 26 | 2 |
| 392456197008193000 | 张三 | 20 | 0 |
| 392456197008193000 | 张三 | 20 | 0 |
+--------------------+------+-----+--------+
### --- Create table t2
~~~ # Create an internal (managed) table
[linux123:21000] > create table t2(id string,name string,age int,gender int)
row format delimited fields terminated by ',';
~~~ # View the table structure
[linux123:21000] > desc t1;
+--------+--------+---------+
| name | type | comment |
+--------+--------+---------+
| id | string | |
| name | string | |
| age | int | |
| gender | int | |
+--------+--------+---------+
[linux123:21000] > desc formatted t2;
+------------------------------+------------------------------------------------------------+----------------------+
| name | type | comment |
+------------------------------+------------------------------------------------------------+----------------------+
| # col_name | data_type | comment |
| | NULL | NULL |
| id | string | NULL |
| name | string | NULL |
| age | int | NULL |
| gender | int | NULL |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | default | NULL |
| Owner: | root | NULL |
| CreateTime: | Tue Aug 31 18:11:23 CST 2021 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://linux121:9000/user/hive/warehouse/t2 | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | transient_lastDdlTime | 1630404683 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | 0 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | , |
| | serialization.format | , |
+------------------------------+------------------------------------------------------------+----------------------+
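~~~ # Note the Table Type (MANAGED_TABLE) and the Location under the Hive warehouse.
~~~ # For comparison, desc formatted on t1 would report an external table whose
~~~ # Location is the /user/impala/t1 directory specified at creation time:
[linux123:21000] > desc formatted t1;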
### --- Insert data into t2
[linux123:21000] > insert overwrite table t2 select * from t1 where gender =0;
~~~ # Verify the data
[linux123:21000] > select * from t2;
Query: select * from t2
+--------------------+------+-----+--------+
| id | name | age | gender |
+--------------------+------+-----+--------+
| 392456197008193000 | 张三 | 20 | 0 |
| 392456197008193000 | 张三 | 20 | 0 |
| 392456197008193000 | 张三 | 20 | 0 |
+--------------------+------+-----+--------+
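~~~ # insert overwrite replaces the table's existing contents. To append instead,
~~~ # insert into can be used; for example, this would add three more rows to t2:
[linux123:21000] > insert into t2 select * from t1 where gender = 0;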
### --- Metadata sync: connect to Hive with Beeline and inspect the data;
~~~ tables created through Impala and the data loaded into them are all visible to Hive.
[linux123:21000] > show tables;
Query: show tables
+-----------------+
| name |
+-----------------+
| t1 |
| t2 |
+-----------------+
[linux123:21000] > select * from t1;
Query: select * from t1
+--------------------+------+-----+--------+
| id | name | age | gender |
+--------------------+------+-----+--------+
| 392456197008193000 | 张三 | 20 | 0 |
| 267456198006210000 | 李四 | 25 | 1 |
| 892456199007203000 | 王五 | 24 | 1 |
| 492456198712198000 | 赵六 | 26 | 2 |
| 392456197008193000 | 张三 | 20 | 0 |
| 392456197008193000 | 张三 | 20 | 0 |
+--------------------+------+-----+--------+
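~~~ # For the Hive-side check, a typical Beeline connection (assuming HiveServer2 runs
~~~ # on linux123 with its default port 10000) looks like:
[root@linux123 ~]# beeline -u jdbc:hive2://linux123:10000
0: jdbc:hive2://linux123:10000> show tables;
0: jdbc:hive2://linux123:10000> select * from t1;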
3. Summary
### --- In the case above, the Impala data file was a comma-delimited text file.
~~~ In fact, Impala also supports RCFile, SequenceFile, Parquet, and other file formats.
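~~~ # For instance, a Parquet copy of t1 can be created straight from impala-shell
~~~ # (the table name t1_parquet is ours, purely for illustration):
[linux123:21000] > create table t1_parquet stored as parquet as select * from t1;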
### --- How are Impala and Hive metadata related?
~~~ Metadata updates made in Hive are not visible to Impala automatically;
~~~ metadata updates made in Impala are visible to Hive.
~~~ # Command to sync Hive metadata into Impala:
~~~ run invalidate metadata manually (covered in detail later)
~~~ # Impala accesses and manipulates Hive's metadata through the Hive metastore service,
~~~ but when tables are created, dropped, or altered in Hive, Impala cannot detect
~~~ those metadata changes automatically. To make Impala see Hive's metadata changes,
~~~ the first thing to do after entering impala-shell is to run invalidate metadata,
~~~ which invalidates all of Impala's cached metadata and reloads it from the metastore.
~~~ The metadata refresh commands are covered in detail later.
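~~~ # invalidate metadata with no table name reloads everything, which can be expensive
~~~ # on a large catalog; with a table name it targets one table, and refresh only picks
~~~ # up new data files for a table Impala already knows about:
[linux123:21000] > invalidate metadata;
[linux123:21000] > invalidate metadata t1;
[linux123:21000] > refresh t1;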
### --- Impala accesses HDFS as the impala user, so to avoid permission problems
~~~ we can disable HDFS permission checking by adding the following to hdfs-site.xml:
[root@linux121 ~]# vim /opt/yanqi/servers/hadoop-2.9.2/etc/hadoop/hdfs-site.xml
<!-- Disable HDFS permission checking -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
### --- Distribute the file to the other nodes
[root@linux121 ~]# rsync-script /opt/yanqi/servers/hadoop-2.9.2/etc/hadoop/hdfs-site.xml
### --- Restart HDFS and YARN
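~~~ # The standard Hadoop control scripts can be used for the restart (run them on the
~~~ # nodes hosting the NameNode and ResourceManager; paths assumed from this install):
[root@linux121 ~]# /opt/yanqi/servers/hadoop-2.9.2/sbin/stop-dfs.sh
[root@linux121 ~]# /opt/yanqi/servers/hadoop-2.9.2/sbin/start-dfs.sh
[root@linux121 ~]# /opt/yanqi/servers/hadoop-2.9.2/sbin/stop-yarn.sh
[root@linux121 ~]# /opt/yanqi/servers/hadoop-2.9.2/sbin/start-yarn.sh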