Hadoop学习笔记——HDFS

1.查看hdfs文件的block信息

不正常的文件

hdfs fsck /logs/xxx/xxxx.gz.gz -files -blocks -locations
Connecting to namenode via http://xxx-01:50070/fsck?ugi=xxx&files=1&blocks=1&locations=1&path=%2Flogs%2Fxxx%2Fxxx%2F401294%2Fds%3Dxxxx-07-14%2Fxxx.gz.gz
FSCK started by xxxx (auth:KERBEROS_SSL) from /10.90.1.91 for path xxxxx.gz.gz at Mon Jul 15 11:44:13 CST 2019
Status: HEALTHY
 Total size:	0 B (Total open files size: 194 B)
 Total dirs:	0
 Total files:	0
 Total symlinks:		0 (Files currently being written: 1)
 Total blocks (validated):	0 (Total open file blocks (not validated): 1)
 Minimally replicated blocks:	0
 Over-replicated blocks:	0
 Under-replicated blocks:	0
 Mis-replicated blocks:		0
 Default replication factor:	3
 Average block replication:	0.0
 Corrupt blocks:		0
 Missing replicas:		0
 Number of data-nodes:		99
 Number of racks:		3
FSCK ended at Mon Jul 15 11:44:13 CST 2019 in 0 milliseconds

 正常的文件

Connecting to namenode via http://xxx:50070/fsck?ugi=xxx&files=1&blocks=1&locations=1&path=%2Fxxx%2Fxxx%2Fxxx%2F401294%2Fds%3Dxxx-07-14%2Fxx.gz
FSCK started by xxxx (auth:KERBEROS_SSL) from /10.90.1.91 for path /logs/xxxx.gz at Mon Jul 15 11:46:12 CST 2019
/logs/xxxx.gz 74745 bytes, 1 block(s):  OK
0. BP-1760298736-10.90.1.6-1536234810107:blk_1392467116_318836510 len=74745 Live_repl=3 [DatanodeInfoWithStorage[10.90.1.99:1004,DS-9d465b1f-943f-4716-bce0-8b36e5631b4a,DISK], DatanodeInfoWithStorage[10.90.1.216:1004,DS-160924c6-4cd7-4822-93c0-9ac9cf9c5784,DISK], DatanodeInfoWithStorage[10.90.1.191:1004,DS-d0a2e418-610f-4bef-8f1d-4ce045533656,DISK]]

Status: HEALTHY
 Total size:	74745 B
 Total dirs:	0
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	1 (avg. block size 74745 B)
 Minimally replicated blocks:	1 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		99
 Number of racks:		3
FSCK ended at Mon Jul 15 11:46:12 CST 2019 in 1 milliseconds

 

2.修复hdfs文件命令

hdfs debug recoverLease -path /logs/xxxx.gz.gz -retries 3

修复之后

hdfs fsck /logs/xxx.gz.gz -files -blocks -locations
Connecting to namenode via http://xxx-01:50070/fsck?ugi=xxx&files=1&blocks=1&locations=1&path=%2Flogs%2Fnsh%2Fjson%2F401294%2Fds%3D2019-07-14%2Fxxx.gz.gz
FSCK started by xxx (auth:KERBEROS_SSL) from /10.90.1.91 for path /logs/xxxx.gz.gz at Mon Jul 15 11:48:01 CST 2019
/logs/xxxx.gz.gz 67157 bytes, 1 block(s):  OK
0. BP-1760298736-10.90.1.6-1536234810107:blk_1392594522_319757834 len=67157 Live_repl=3 [DatanodeInfoWithStorage[10.90.1.213:1004,DS-6aee5c90-c834-475e-8f20-7a0f8bd8d315,DISK], DatanodeInfoWithStorage[10.90.1.207:1004,DS-cd79bacc-89ff-4fb3-82b5-79341391ae8d,DISK], DatanodeInfoWithStorage[10.90.1.97:1004,DS-ba5953f8-c0c3-444a-8996-3bcfa1bcf851,DISK]]

Status: HEALTHY
 Total size:	67157 B
 Total dirs:	0
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	1 (avg. block size 67157 B)
 Minimally replicated blocks:	1 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		99
 Number of racks:		3
FSCK ended at Mon Jul 15 11:48:01 CST 2019 in 1 milliseconds

 

3. 其他博客:Hadoop学习笔记—HDFS

 

4. 一些脚本

遍历hdfs路径,并截取最后一层子路径,其中 ##*/ 是截取 / 后面的字符,参考:Shell当中的字符串切割

#!/usr/bin/env bash

function grep_dir(){
   kinit -kt xxx.keytab xxx
   # 遍历hdfs路径
   for file in `hadoop fs -ls /logs/xxxx | awk '{print $8}'`
        do
            echo $file"   "${file##*/}
        done
}

grep_dir

输出

/logs/xxxx/ds=2019-04-01   ds=2019-04-01
/logs/xxxx/ds=2019-04-02   ds=2019-04-02
/logs/xxxx/ds=2019-04-03   ds=2019-04-03
/logs/xxxx/ds=2019-04-04   ds=2019-04-04
/logs/xxxx/ds=2019-04-05   ds=2019-04-05
/logs/xxxx/ds=2019-04-06   ds=2019-04-06
/logs/xxxx/ds=2019-04-07   ds=2019-04-07

 

遍历hdfs路径,并截取中间层路径,其中

logid=`echo $file | awk -F "/" '{print $5}'`

是使用 / 切分后,取对应数组第5个位置的字符串

#!/usr/bin/env bash

function grep_dir(){
   kinit -kt xxxx.keytab xxxx
   # 遍历hdfs路径
   for file in `hadoop fs -ls /logs/xxx/json/status-1 | awk '{print $8}'`
        do
            echo $file"   "${file##*/}
            logid=`echo $file | awk -F "/" '{print $5}'`
            echo $logid
        done
}

grep_dir

输出

/logs/xxx/json/status-1/ds=2021-04-18   ds=2021-04-18
status-1
/logs/xxx/json/status-1/ds=2021-04-19   ds=2021-04-19
status-1
/logs/xxx/json/status-1/ds=2021-04-20   ds=2021-04-20
status-1

  

5.webhdfs

http://master:14000/webhdfs/v1/?op=liststatus&user.name=hadoop

 

 

参考:WebHDFS与HttpFS的使用 

 

posted @ 2019-07-15 11:51  tonglin0325  阅读(371)  评论(0编辑  收藏  举报