Hive external tables automatically read new files in their directory
You can create an external table by specifying the EXTERNAL keyword at creation time. An external table's files live in the directory given by LOCATION; when a new file is added to that directory, the table picks it up as well (provided the file's format matches the table definition). Dropping an external table does not delete the files under the LOCATION directory.
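For contrast, here is a minimal sketch of the two DROP behaviors; the table names and the /user/hadoop1/demo path below are made up for illustration:

-- Managed (internal) table: DROP TABLE deletes both the metadata and the
-- data files under the Hive warehouse directory.
create table tb_demo_managed (id int);

-- External table: DROP TABLE deletes only the metadata; the files under
-- the LOCATION directory are left in place.
create external table tb_demo_external (id int)
location '/user/hadoop1/demo';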
1. Check the files under the HDFS directory /user/hadoop1/myfile
[hadoop1@node1]$ hadoop fs -ls /user/hadoop1/myfile/
Found 1 items
-rw-r--r--   3 hadoop1 supergroup     567839 2014-10-29 16:50 /user/hadoop1/myfile/tb_class.txt
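Before defining the table, it is worth confirming that the file really is pipe-delimited; hadoop fs -tail prints the last kilobyte of a file. The sample row in the comment below is hypothetical, not taken from tb_class.txt:

# inspect the end of the file to check the delimiter and column layout
[hadoop1@node1]$ hadoop fs -tail /user/hadoop1/myfile/tb_class.txt
# a pipe-delimited row matching the table defined next would look like:
# 10001|some_class_name|2014-10-28 10:00:00|2014-10-28 10:00:00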
2. Create an external table pointing at the files under the myfile directory
hive (hxl)> create external table tb_class_info_external
          > (id int,
          > class_name string,
          > createtime timestamp,
          > modifytime timestamp)
          > ROW FORMAT DELIMITED
          > FIELDS TERMINATED BY '|'
          > location '/user/hadoop1/myfile';
OK
Time taken: 0.083 seconds
Note that location here points to a path on HDFS, not on the local machine. The table tb_class_info_external will read all files under the myfile directory.
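To double-check which directory a table is bound to, describe formatted prints the table's metadata; among its output you should find lines like the ones below. The hdfs://node1:9000 prefix is an assumption about this cluster's namenode address, and the exact layout varies by Hive version:

hive (hxl)> describe formatted tb_class_info_external;
...
Location:            hdfs://node1:9000/user/hadoop1/myfile
Table Type:          EXTERNAL_TABLE
...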
3. Query the external table
hive (hxl)> select count(1) from tb_class_info_external;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201410300915_0009, Tracking URL = http://node1:50030/jobdetails.jsp?jobid=job_201410300915_0009
Kill Command = /usr1/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=http://192.168.56.101:9001 -kill job_201410300915_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-10-30 15:25:10,652 Stage-1 map = 0%, reduce = 0%
2014-10-30 15:25:12,664 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:13,671 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:14,682 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:15,690 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:16,697 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:17,704 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:18,710 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:19,718 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:20,725 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.21 sec
2014-10-30 15:25:21,730 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.21 sec
2014-10-30 15:25:22,737 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.21 sec
MapReduce Total cumulative CPU time: 1 seconds 210 msec
Ended Job = job_201410300915_0009
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 1.21 sec   HDFS Read: 568052 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 210 msec
OK
10001
Time taken: 14.742 seconds
The table row count is 10001. Next, add another file, tb_class_bak.txt, to the myfile directory.
4. Add a file to the myfile directory
$ hadoop fs -cp /user/hadoop1/myfile/tb_class.txt /user/hadoop1/myfile/tb_class_bak.txt
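hadoop fs -cp is only one way to land a new file; any operation that places a correctly formatted file in the directory has the same effect. Two common alternatives, with hypothetical source paths:

# upload a file from the local filesystem into the table's directory
[hadoop1@node1]$ hadoop fs -put /tmp/tb_class_new.txt /user/hadoop1/myfile/
# or let Hive move an existing HDFS file into the directory
hive (hxl)> load data inpath '/user/hadoop1/staging/tb_class_new.txt' into table tb_class_info_external;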
5. Query the table row count again
hive (hxl)> select count(1) from tb_class_info_external;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201410300915_0010, Tracking URL = http://node1:50030/jobdetails.jsp?jobid=job_201410300915_0010
Kill Command = /usr1/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=http://192.168.56.101:9001 -kill job_201410300915_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-10-30 15:32:02,275 Stage-1 map = 0%, reduce = 0%
2014-10-30 15:32:04,286 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:05,292 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:06,300 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:07,306 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:08,313 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:09,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:10,327 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:11,331 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:12,338 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.16 sec
2014-10-30 15:32:13,343 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.16 sec
2014-10-30 15:32:14,350 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.16 sec
MapReduce Total cumulative CPU time: 1 seconds 160 msec
Ended Job = job_201410300915_0010
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 1.16 sec   HDFS Read: 1135971 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 160 msec
OK
20002
Time taken: 14.665 seconds
The row count has doubled, which shows the table has picked up the newly added file.
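Besides the count, a quick way to sanity-check the new file is to sample a few rows; rows whose format does not match the table definition would come back with NULL columns:

hive (hxl)> select * from tb_class_info_external limit 5;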
6. Drop the table
hive (hxl)> drop table tb_class_info_external;
OK
Time taken: 1.7 seconds
The files backing the table have not been deleted:
[hadoop1@node1]$ hadoop fs -ls /user/hadoop1/myfile/
Found 2 items
-rw-r--r--   3 hadoop1 supergroup     567839 2014-10-29 16:50 /user/hadoop1/myfile/tb_class.txt
-rw-r--r--   3 hadoop1 supergroup     567839 2014-10-30 15:28 /user/hadoop1/myfile/tb_class_bak.txt
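Because the files survive the drop, re-creating the external table over the same location makes the data queryable again immediately, with no reload step. A minimal sketch, simply repeating the earlier DDL:

hive (hxl)> create external table tb_class_info_external
          > (id int,
          > class_name string,
          > createtime timestamp,
          > modifytime timestamp)
          > ROW FORMAT DELIMITED
          > FIELDS TERMINATED BY '|'
          > location '/user/hadoop1/myfile';

Running the earlier count(1) query against the re-created table would again return 20002, since both files are still in place.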