Hadoop: HDFS data blocks and replica count
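The smallest unit of storage for a file on HDFS is the block. The block size is configurable; the default is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later.
The number of blocks in a file affects how many partitions the RDD has when Spark reads that file from HDFS (for splittable input, typically one partition per block by default).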
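To make the relationship between file size and block count concrete, here is a minimal Python sketch. The helper name split_into_blocks is hypothetical and not part of Hadoop; the numbers are taken from the fsck output shown later in this post (a 1079758684-byte file stored with 128 MB blocks).

```python
from typing import List


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the length in bytes of each block a file would occupy.

    Hypothetical helper for illustration only: every block is full
    except possibly the last one, which holds the remainder.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


# Values from the fsck output in this post: 128 MB blocks.
blocks = split_into_blocks(1079758684, 128 * 1024 * 1024)
print(len(blocks), blocks[-1])  # 9 6016860
```

The result matches the fsck report: eight full blocks of 134217728 bytes and a final block of 6016860 bytes.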
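HDFS also stores several replicas of every block; the replica count is configurable.
If a DataNode fails, the blocks it held are copied from their surviving replicas to other DataNodes.
For example: a block has 2 replicas, on DataNode machine 1 and machine 2. If machine 2 goes down, HDFS copies the block to another DataNode (possibly machine 3) so that the block still has 2 replicas.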
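The example above can be sketched as a toy simulation. This is only an illustration of the idea, not how the NameNode actually chooses targets: real HDFS applies a block placement policy, while this sketch simply takes the first live node that does not yet hold the block. The function name re_replicate and the machine names are assumptions made for the example.

```python
from typing import Dict, List, Set


def re_replicate(placement: Dict[str, Set[str]],
                 nodes: List[str],
                 failed_node: str,
                 target_replicas: int) -> Dict[str, Set[str]]:
    """Toy model of re-replication after one DataNode fails.

    placement maps a block id to the set of nodes holding a replica.
    Assumption: the first eligible live node is chosen as the new target.
    """
    live_nodes = [n for n in nodes if n != failed_node]
    result = {}
    for block, holders in placement.items():
        remaining = set(holders) - {failed_node}
        # Live nodes that do not already store this block.
        candidates = [n for n in live_nodes if n not in remaining]
        # A block with no surviving replica cannot be copied from anywhere.
        while remaining and candidates and len(remaining) < target_replicas:
            remaining.add(candidates.pop(0))
        result[block] = remaining
    return result


before = {"blk_1": {"machine1", "machine2"}}
after = re_replicate(before, ["machine1", "machine2", "machine3"],
                     failed_node="machine2", target_replicas=2)
print(sorted(after["blk_1"]))  # ['machine1', 'machine3']
```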
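Setting the default block size and replica count for files stored on HDFS
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml (dfs.block.size is the legacy property name; Hadoop 2.x and later call it dfs.blocksize and still accept the old name):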
<property>
<name>dfs.block.size</name>
<value>67108864</value>
<description>64MB</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
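As a quick sanity check on values like 67108864, the following Python sketch reads the two properties and converts the block size to MB. It assumes the properties sit inside the usual <configuration> root element of hdfs-site.xml; the SAMPLE string and the read_properties helper are illustrative and not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Assumed layout: the properties above wrapped in a <configuration> root.
SAMPLE = """<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>
    <description>64MB</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""


def read_properties(xml_text: str) -> dict:
    """Map each property name to its value (both as strings)."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}


props = read_properties(SAMPLE)
block_size_mb = int(props["dfs.block.size"]) // (1024 * 1024)
print(block_size_mb, props["dfs.replication"])  # 64 3
```

To check a real file instead of the sample string, pass the contents of hdfs-site.xml to read_properties.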
To set the block size and replica count for a single file only, specify them when uploading it:
hdfs dfs -Ddfs.replication=4 -Ddfs.block.size=134217728 -put file.txt hdfs_file.txt
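Note: changing the default replica count and block size of HDFS does not affect files that already exist; it only applies to newly uploaded files. (The replica count of an existing file can be changed afterwards with hdfs dfs -setrep; its block size cannot be changed without rewriting the file.)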
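When uploads are scripted, it can be less error-prone to compute the byte value than to type it by hand. The sketch below only builds the argument list for the upload command shown above; build_put_command is a hypothetical helper, and actually running the command assumes the hdfs client is installed and on PATH.

```python
import subprocess  # only needed if the command is actually executed
from typing import List


def build_put_command(local_path: str, hdfs_path: str,
                      replication: int, block_size_mb: int) -> List[str]:
    """Build the argument list for a per-file upload with custom settings."""
    if replication < 1 or block_size_mb < 1:
        raise ValueError("replication and block_size_mb must be positive")
    block_size_bytes = block_size_mb * 1024 * 1024
    return [
        "hdfs", "dfs",
        f"-Ddfs.replication={replication}",
        f"-Ddfs.block.size={block_size_bytes}",
        "-put", local_path, hdfs_path,
    ]


cmd = build_put_command("file.txt", "hdfs_file.txt",
                        replication=4, block_size_mb=128)
print(" ".join(cmd))
# To run it on a machine with the hdfs client available:
# subprocess.run(cmd, check=True)
```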
Viewing a file's block information
hdfs fsck input/wordcount.txt -files -blocks -locations
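-files
Shows a summary for the file: its size, number of blocks, replica count, and so on.
-files -blocks
In addition to the above, shows the size and replica count of each block.
-files -blocks -locations
In addition to the above, shows which DataNodes store each block.
The command above (the relative path resolves to /user/root/input/wordcount.txt) prints: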
Connecting to namenode via http://master:50070/fsck?ugi=root&files=1&blocks=1&locations=1&path=%2Fuser%2Froot%2Finput%2Fwordcount.txt
FSCK started by root (auth:SIMPLE) from /10.180.29.180 for path /user/root/input/wordcount.txt at Tue Jan 09 15:59:59 CST 2018
/user/root/input/wordcount.txt 1079758684 bytes, 9 block(s): OK
0. BP-1761322274-10.180.29.180-1514957510720:blk_1073741900_1076 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.182:50010,DS-f528fad3-deed-4268-a338-71999435b055,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK]]
1. BP-1761322274-10.180.29.180-1514957510720:blk_1073741901_1077 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.183:50010,DS-f8bd931f-35ef-43c8-97ac-fd9b1b4d8049,DISK]]
2. BP-1761322274-10.180.29.180-1514957510720:blk_1073741902_1078 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK]]
3. BP-1761322274-10.180.29.180-1514957510720:blk_1073741903_1079 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.183:50010,DS-f8bd931f-35ef-43c8-97ac-fd9b1b4d8049,DISK]]
4. BP-1761322274-10.180.29.180-1514957510720:blk_1073741904_1080 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK]]
5. BP-1761322274-10.180.29.180-1514957510720:blk_1073741905_1081 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.183:50010,DS-f8bd931f-35ef-43c8-97ac-fd9b1b4d8049,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK]]
6. BP-1761322274-10.180.29.180-1514957510720:blk_1073741906_1082 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.183:50010,DS-f8bd931f-35ef-43c8-97ac-fd9b1b4d8049,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK]]
7. BP-1761322274-10.180.29.180-1514957510720:blk_1073741907_1083 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.183:50010,DS-f8bd931f-35ef-43c8-97ac-fd9b1b4d8049,DISK]]
8. BP-1761322274-10.180.29.180-1514957510720:blk_1073741908_1084 len=6016860 repl=3 [DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK]]
Status: HEALTHY
Total size: 1079758684 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 9 (avg. block size 119973187 B)
Minimally replicated blocks: 9 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 5
Number of racks: 1
FSCK ended at Tue Jan 09 15:59:59 CST 2018 in 3 milliseconds
The filesystem under path '/user/root/input/wordcount.txt' is HEALTHY
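The per-block lines in this output are regular enough to parse when you want to see how replicas are spread across DataNodes. The following Python sketch is written against the line format shown above only; the regular expressions and the parse_fsck_blocks helper are assumptions and may need adjusting for other Hadoop versions. The two sample lines are copied from the output above.

```python
import re
from collections import Counter

# Matches lines such as "0. BP-...:blk_1073741900_1076 len=... repl=3 [...]".
BLOCK_LINE = re.compile(
    r"^\s*(\d+)\.\s+\S+:(blk_\d+)_\d+\s+len=(\d+)\s+repl=(\d+)\s+\[(.*)\]\s*$"
)
# Captures the IP address of each DataNode in the location list.
NODE = re.compile(r"DatanodeInfoWithStorage\[([\d.]+):\d+,")


def parse_fsck_blocks(output: str) -> list:
    """Extract index, block id, length, replica count and DataNode IPs."""
    blocks = []
    for line in output.splitlines():
        match = BLOCK_LINE.match(line)
        if not match:
            continue  # skip headers, the summary, and blank lines
        index, block_id, length, repl, locations = match.groups()
        blocks.append({
            "index": int(index),
            "block_id": block_id,
            "length": int(length),
            "replication": int(repl),
            "datanodes": NODE.findall(locations),
        })
    return blocks


SAMPLE = """\
0. BP-1761322274-10.180.29.180-1514957510720:blk_1073741900_1076 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.182:50010,DS-f528fad3-deed-4268-a338-71999435b055,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK]]
8. BP-1761322274-10.180.29.180-1514957510720:blk_1073741908_1084 len=6016860 repl=3 [DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK]]
Status: HEALTHY
"""

parsed = parse_fsck_blocks(SAMPLE)
replicas_per_node = Counter(ip for b in parsed for ip in b["datanodes"])
print([b["length"] for b in parsed])  # [134217728, 6016860]
print(replicas_per_node["10.180.29.185"])  # 2
```

Feeding the full fsck output to parse_fsck_blocks instead of SAMPLE would give the replica count per DataNode for the whole file.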