Hadoop operations on Linux

Hadoop for beginners: common HDFS commands on a Linux system

 

Hadoop: HDFS file operations

 

Hadoop fs commands explained

 

 

Official documentation

 

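sudo su - hdfs: log in as the hdfs user without needing that account's password; the hdfs user can operate on HDFS files

logout: leave the hdfs session and return to the previous account

sudo su - root: switch to the root account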

 

List the HDFS root directory:
hadoop fs -ls /

 

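rm -rf <directory>: recursively delete a local directory without prompting (this is the local Linux command; on HDFS use hadoop fs -rm -r)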

 

 

sh dvm_auto_hive_ci_test.sh 2017-11-22 2017-11-22 criteo

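Copy a file or directory from HDFS to the local current directory (a local destination can be given as a second argument):
hadoop fs -get /report/dvm_test/script/bashScript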

 

ls -l: show file permissions

 

chmod 777 mm.txt: change file permissions (777 gives read, write and execute to owner, group and others)
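ls -l shows the permission bits as three rwx triplets (owner, group, others). Each digit of a numeric mode such as 777 is the sum r=4 + w=2 + x=1 for one triplet. Below is a minimal sketch on a throwaway temp file. It assumes GNU coreutils stat, because the -c format flag differs on macOS/BSD:

```bash
#!/usr/bin/env bash
# Demonstrate numeric (octal) chmod modes on a scratch file standing in for mm.txt.
# Each octal digit is r=4 + w=2 + x=1 for owner, group and others.
set -euo pipefail

tmp=$(mktemp)                         # scratch file, removed on exit
trap 'rm -f "$tmp"' EXIT

chmod 777 "$tmp"                      # rwxrwxrwx: everyone may read, write, execute
mode_777=$(stat -c '%a %A' "$tmp")    # GNU stat: octal mode and symbolic mode

chmod 640 "$tmp"                      # rw-r-----: owner rw, group r, others nothing
mode_640=$(stat -c '%a %A' "$tmp")

echo "$mode_777"   # 777 -rwxrwxrwx
echo "$mode_640"   # 640 -rw-r-----
```

777 is convenient for quick tests. A tighter mode such as 640 or 750 is usually what you want on shared machines.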

 

cat criteo.log: print a file's contents

 

 sh dvm_auto_hive_criteoTransaction_test.sh -d "2017-11-22" -P "criteoTransaction" --input-folder "/report/dvm_test/naa" --hdfs-script "/report/dvm_test/script/etl"
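The post does not show the script's source. The sketch below is a hypothetical guess at how a bash wrapper taking -d, -P, --input-folder and --hdfs-script could parse those options with a while/case loop. The variable names are invented for illustration, and the meaning of -P is not stated in the post:

```bash
#!/usr/bin/env bash
# Hypothetical option parser for a wrapper like dvm_auto_hive_criteoTransaction_test.sh.
# Every option takes exactly one value; unknown options or a missing value are errors.
parse_args() {
  RUN_DATE="" P_VALUE="" INPUT_FOLDER="" HDFS_SCRIPT=""
  while [[ $# -gt 0 ]]; do
    # Guard so that "shift 2" never runs with only one argument left.
    if [[ $# -lt 2 ]]; then
      echo "missing value for $1" >&2
      return 1
    fi
    case "$1" in
      -d)             RUN_DATE=$2;     shift 2 ;;
      -P)             P_VALUE=$2;      shift 2 ;;  # meaning of -P not stated in the post
      --input-folder) INPUT_FOLDER=$2; shift 2 ;;
      --hdfs-script)  HDFS_SCRIPT=$2;  shift 2 ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
  done
}

parse_args -d "2017-11-22" -P "criteoTransaction" \
  --input-folder "/report/dvm_test/naa" --hdfs-script "/report/dvm_test/script/etl"
echo "$RUN_DATE $P_VALUE $INPUT_FOLDER $HDFS_SCRIPT"
```

A while/case loop is used here instead of getopts because getopts only handles single-letter options, and the command above also uses long ones.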

 

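Remove an empty HDFS directory (-rmdir fails on a non-empty directory; use hadoop fs -rm -r to delete one recursively):
hadoop fs -rmdir /tmp/out/report/dvm_test/naa/TransactionCriteo/2017/11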

 

Run a map-only Hadoop streaming job (with -reducer NONE the mapper output is written directly to the -output directory, which must not already exist):
hadoop jar "/usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.2.0-205.jar" -input "/report/dvm_test/naa/TransactionCriteo/2017/11/22" -output "/tmp/out/report/dvm_test/naa/TransactionCriteo/2017/11/22" -mapper "python /report/dvm_test/script/etl/TransactionCriteo_naa_map.py" -reducer NONE
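Hadoop streaming runs the -mapper command as a separate process. It feeds input lines on stdin and treats each stdout line as an output record, with key and value separated by the first tab. The post does not show TransactionCriteo_naa_map.py. The sketch below is a hypothetical mapper written in shell instead, assuming comma-separated input where field 1 should become the key and field 3 the value. It can be checked locally with a pipe before submitting a job:

```bash
#!/usr/bin/env bash
# Hypothetical streaming mapper (stand-in for TransactionCriteo_naa_map.py).
# Assumption: input lines are comma-separated; emit field 1 as key, field 3 as value.
naa_map() {
  # awk reads stdin line by line, skips empty lines (NF == 0)
  # and prints "key<TAB>value" to stdout.
  awk -F',' 'NF > 0 { print $1 "\t" $3 }'
}

# Local check without Hadoop: pipe sample lines through the mapper.
# Prints "id1<TAB>10.5" and "id2<TAB>3.0".
printf 'id1,2017-11-22,10.5\nid2,2017-11-22,3.0\n' | naa_map
```

To use it in a job, the logic would have to be saved as an executable script and made available on the task nodes, for example with the generic -files option.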

 

 

Delete all rows of a table but keep the table definition:
truncate table table_name;

 

Delete a table (IF EXISTS suppresses the error if the table does not exist):
DROP TABLE [IF EXISTS] table_name;

 

 

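Drop a range of date partitions (variants of the range condition; whether comparison operators are accepted in DROP PARTITION depends on the Hive version):
ALTER TABLE myTable DROP IF EXISTS PARTITION
(date>='date1' and date<='date2');

ALTER TABLE myTable DROP IF EXISTS PARTITION
(date>='date1' && date<='date2');

ALTER TABLE myTable DROP IF EXISTS PARTITION
(date between 'date1' and 'date2');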

 

 

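Change the location of a partition (hdfs:/// with an empty authority resolves against the default NameNode; hdfs://user/... would treat "user" as a host name):

ALTER TABLE logs PARTITION(year = 2012, month = 12, day = 18)
SET LOCATION 'hdfs:///user/darcy/logs/2012/12/18';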




Drop a single partition:
ALTER TABLE logs DROP IF EXISTS PARTITION(year = 2012, month = 12, day = 18);



I implemented a workaround for dropping a range of partitions by generating the statements with a shell script, for instance:
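# Zero-padded brace ranges such as {01..12} need bash 4 or later.
for y in {2011..2014}
do
  for m in {01..12}
  do
    # Dummy partition first, so every real partition can be appended as ", PARTITION (...)"
    echo -n "ALTER TABLE reporting.frontend DROP IF EXISTS PARTITION (year=0000,month=00,day=00,hour=00)"
    for d in {01..31}   # non-existent days (e.g. Feb 30) are harmless thanks to IF EXISTS
    do
      for h in {00..23}   # all 24 hours, 00 through 23
      do
        echo -n ", PARTITION (year=$y,month=$m,day=$d,hour=$h)"
      done
    done
    echo ";"
  done
done > drop_partitions_v1.hql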

The resulting .hql file can then be executed with the -f option of hive (or beeline), e.g. hive -f drop_partitions_v1.hql.

The loops must generate exactly the range you want to drop, which may be non-trivial. In the worst case you will need several such scripts to cover the desired range of dates.

Note that in my case the partitions had four keys (year, month, day, hour). If your dates or partitions are stored as strings (not a good idea in my opinion), you will have to build the target string from the variables y, m, d and h in the shell script and print it in the echo command. The dummy partition (containing only zeros) exists only so that the loops can emit every real partition with a leading ", ". The ALTER TABLE statement takes a comma-separated list of PARTITION clauses, and the dummy means the first element never needs special handling.
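For partitions stored as a single string column, the statement can also be built by walking the calendar one day at a time instead of nesting loops. That way only real dates are emitted. Below is a minimal sketch assuming GNU date. The table name reporting.frontend_daily and its string partition column dt in YYYY-MM-DD format are hypothetical:

```bash
#!/usr/bin/env bash
# Build one "ALTER TABLE ... DROP IF EXISTS PARTITION (...), PARTITION (...);"
# statement covering every day in an inclusive date range.
# Assumptions (hypothetical): a single string partition column "dt" holding
# YYYY-MM-DD dates; GNU date is available for the day arithmetic.
drop_date_range() {
  local table=$1 start=$2 end=$3
  local d=$start sep=""
  printf 'ALTER TABLE %s DROP IF EXISTS' "$table"
  # ISO dates sort correctly as strings, so loop until d passes end.
  while [[ ! "$d" > "$end" ]]; do
    printf "%s PARTITION (dt='%s')" "$sep" "$d"
    sep=","
    # TZ=UTC avoids daylight-saving surprises when adding a day.
    d=$(TZ=UTC date -d "$d +1 day" +%F)
  done
  printf ';\n'
}

drop_date_range reporting.frontend_daily 2017-11-20 2017-11-22
```

Redirect the output to a file and run it with hive -f (or beeline -f), as with the nested-loop script. Month and year boundaries are handled by date, so no per-month loops are needed.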

 

 

posted @ PanPan003