Hive通过mysql元数据表删除分区
1 创建表 hive命令行操作
1 2 3 4 5 6 7 8 9 10 11 12 | CREATE TABLE IF NOT EXISTS emp( name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING,FLOAT>, address STRUCT<street:STRING,city:STRING,province:STRING,zip:INT> ) PARTITIONED BY (province STRING,city STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':’; |
2 造数据
shanxi.txt
1 2 3 4 5 6 | zj1 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,xian,shanxi, 1 zj2 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,xian,shanxi, 2 zj3 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,xian,shanxi, 3 zj4 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,xian,shanxi, 4 zj5 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,xian,shanxi, 5 zj6 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,xian,shanxi, 6 |
hunan.txt
1 2 3 4 5 | zbq1 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,zhangjiajie,hunan, 1 zbq2 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,zhangjiajie,hunan, 2 zbq3 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,zhangjiajie,hunan, 3 zbq4 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,zhangjiajie,hunan, 4 zbq5 10000 james,datacloase jim: 1.2 ,james: 2.1 ,lilly: 3.8 huaxing,zhangjiajie,hunan, 5 |
3 导入数据 hive命令行操作
1 2 3 4 5 | LOAD DATA LOCAL INPATH '/tmp/logs/shanxi.txt' OVERWRITE INTO TABLE emp PARTITION(province= 'shanxi' ,city='xian’); LOAD DATA LOCAL INPATH '/tmp/logs/hunan.txt' OVERWRITE INTO TABLE emp PARTITION(province= 'hunan' ,city='zhangjiajie’); |
4 查询hive数据
hive表结构
hive> describe extended emp;
查询hive分区数据
hive> select * from emp where province='shanxi' and city = 'xian';
5 查看HDFS的hive目录
[root@hdp1 /tmp/logs]#hdfs dfs -ls /user/hive/warehouse/emp
6 删除hive中hunan的分区
A 进入hive使用的MySQL
B 切换为hive数据库
mysql> use hive;
C 查询相关表
mysql> SELECT * FROM TBLS WHERE TBL_NAME='emp';
发现出现1条记录,所以使用TBLS表的SD_ID字段去SDS表查询LOCATION字段的值,通过LOCATION字段,就可以知道emp这张表的数据库,TBLS中TBL_ID为6的这行记录就是我要查询的表的ID
mysql> select * from SDS where SD_ID=6;
接下来,要根据TBLS表的TBL_ID和hive表分区字段的值(模糊查询)去PARTITIONS表查询,需要得到PARTITIONS表的PART_ID的值(2)
mysql> select * from PARTITIONS t where t.tbl_id=6 and PART_NAME like '%hunan%';
D开始删除
最后,通过TBLS表的TBL_ID(70)和PARTITIONS表的PART_ID(202354)就可以删除hive的分区了
mysql> mysql> delete from PARTITION_KEY_VALS where part;
Query OK, 0 rows affected (0.00 sec)
mysql> delete from PARTITION_KEY_VALS where part_id=2;
Query OK, 2 rows affected (0.00 sec)
mysql> delete from PARTITION_PARAMS where part_id=2;
Query OK, 6 rows affected (0.01 sec)
mysql> delete from PARTITIONS where tbl_id=6 and part_id=2;
Query OK, 1 row affected (0.00 sec)
删除hdfs相关分区目录
[root@hdp1 /root]#hdfs dfs -rm -r "/user/hive/warehouse/emp/province=hunan"
Deleted /user/hive/warehouse/emp/province=hunan
7 查询分区是否删除
已经查询不出来hunan的分区数据了
hive> select * from emp where province='hunan';
OK
Time taken: 0.073 seconds
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 没有源码,如何修改代码逻辑?
· 一个奇形怪状的面试题:Bean中的CHM要不要加volatile?
· [.NET]调用本地 Deepseek 模型
· Blazor Hybrid适配到HarmonyOS系统
· Obsidian + DeepSeek:免费 AI 助力你的知识管理,让你的笔记飞起来!
· 解决跨域问题的这6种方案,真香!
· 一套基于 Material Design 规范实现的 Blazor 和 Razor 通用组件库
· 5. Nginx 负载均衡配置案例(附有详细截图说明++)