Hadoop: HDFS Blocks and Replication Factor

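The smallest unit in which HDFS stores a file is the block. The block size is configurable. The default is 64MB in Hadoop 1.x and 128MB in Hadoop 2.x and later. A file smaller than one block does not take up a full block of disk space.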
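As a quick check of how a file maps onto blocks, the sketch below computes the block count and the size of the final block from the file size and the block size. The function name block_layout is hypothetical. Plugging in the file from the fsck output later in this post (1079758684 bytes, 128MB blocks) gives 9 blocks with a 6016860-byte last block, which matches what fsck reports.

```python
from typing import Tuple

MB = 1024 * 1024

def block_layout(file_size: int, block_size: int) -> Tuple[int, int]:
    """Return (number of blocks, bytes in the last block) for one HDFS file.

    All blocks except the last are exactly block_size bytes; the last block
    holds the remainder and only occupies that many bytes on disk.
    """
    if block_size <= 0 or file_size < 0:
        raise ValueError("block_size must be positive and file_size non-negative")
    if file_size == 0:
        return 0, 0  # an empty file has no data blocks
    num_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    last_block = file_size - (num_blocks - 1) * block_size
    return num_blocks, last_block

# File size taken from the fsck output in this post.
print(block_layout(1079758684, 128 * MB))  # (9, 6016860)
print(block_layout(1079758684, 64 * MB))   # (17, 6016860)
```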

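The number of blocks in a file affects how many partitions the RDD has when Spark reads that file from HDFS. By default there is roughly one partition per block.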
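The block count influences the partition count through Hadoop's split calculation. As far as I can tell from the Hadoop 2.x source, sc.textFile goes through Hadoop's old-API FileInputFormat.getSplits. That method picks splitSize = max(minSize, min(goalSize, blockSize)) and lets the last split run up to 10% over (SPLIT_SLOP = 1.1). The function below re-implements only that arithmetic for a single non-empty, splittable file. It is not Spark's or Hadoop's code, compute_splits is a hypothetical name, and min_partitions=2 assumes Spark's default minPartitions of min(defaultParallelism, 2).

```python
from typing import List

SPLIT_SLOP = 1.1  # Hadoop allows the final split to be up to 10% larger than split_size

def compute_splits(file_size: int, block_size: int,
                   min_partitions: int = 2, min_split_size: int = 1) -> List[int]:
    """Simplified model of how one splittable, non-empty file is cut into input splits.

    split_size = max(min_split_size, min(goal_size, block_size)),
    goal_size  = file_size // min_partitions.
    Each split becomes one RDD partition.
    """
    goal_size = file_size // max(min_partitions, 1)
    split_size = max(min_split_size, min(goal_size, block_size))
    splits = []
    remaining = file_size
    # Keep cutting full splits while more than 1.1 splits' worth of data is left.
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining > 0:
        splits.append(remaining)  # the tail (up to 1.1 * split_size) becomes one split
    return splits

parts = compute_splits(1079758684, 128 * 1024 * 1024)
print(len(parts), parts[-1])  # 8 140234588
```

For the 1079758684-byte file from the fsck output below (9 blocks of 128MB), this model gives 8 splits rather than 9. After 7 full splits, the remaining 140234588 bytes are only about 1.04 times the split size, which is below the 1.1 slop. As a result, the last full block and the 6016860-byte tail end up in one split.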

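In addition, HDFS keeps multiple replicas of every file (the number is configurable). If a DataNode fails, the data it held is still available from replicas on other nodes.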

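For example, suppose a block has 2 replicas, stored on DataNodes machine 1 and machine 2. If machine 2 goes down, HDFS copies the block onto another DataNode (say, machine 3) so that the block again has 2 replicas.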
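The recovery in this example can be illustrated with a toy model. It is deliberately simplified. The real NameNode detects failures through missed DataNode heartbeats and chooses targets with its rack-aware placement policy. This sketch just picks random live nodes instead. The function re_replicate and the node names are hypothetical.

```python
import random
from typing import Dict, List, Set

def re_replicate(block_locations: Dict[str, Set[str]], live_nodes: List[str],
                 failed_node: str, replication: int, seed: int = 0) -> Dict[str, Set[str]]:
    """Toy model of HDFS recovering from one failed DataNode.

    Removes failed_node from every block's replica set, then copies each
    under-replicated block to randomly chosen live nodes that do not hold it yet.
    """
    rng = random.Random(seed)
    result: Dict[str, Set[str]] = {}
    for block, nodes in block_locations.items():
        survivors = set(nodes) - {failed_node}
        if not survivors:
            result[block] = set()  # every replica was on the failed node: block is lost
            continue
        candidates = [n for n in live_nodes if n != failed_node and n not in survivors]
        rng.shuffle(candidates)
        while len(survivors) < replication and candidates:
            survivors.add(candidates.pop())  # copy from a surviving replica to this node
        result[block] = survivors
    return result

# The example from the text: 2 replicas on machine1 and machine2, machine2 fails.
locations = {"blk_1": {"machine1", "machine2"}}
nodes = ["machine1", "machine2", "machine3"]
after = re_replicate(locations, nodes, failed_node="machine2", replication=2)
print(sorted(after["blk_1"]))  # ['machine1', 'machine3']
```

A block whose only replica was on the failed node cannot be recovered this way, which is why the model returns an empty set for it. Keeping more than one replica is exactly what avoids that case.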

Setting the default block size and replication factor

Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml and add the following properties inside its <configuration> element:

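<!-- dfs.block.size is the Hadoop 1.x name; Hadoop 2.x and later call it dfs.blocksize and still accept the old name as a deprecated alias -->
<property>
    <name>dfs.block.size</name>
    <value>67108864</value>
    <description>64MB</description>
</property>
<!-- number of replicas kept for each block -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>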
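To check what a site file actually sets (for instance, that 67108864 bytes is 64MB), a few lines of Python can read the <property> entries. This sketch assumes the properties above sit inside the usual <configuration> root. read_properties is a hypothetical helper. Unlike Hadoop's own Configuration class, it ignores variable substitution, final flags and deprecated property names.

```python
import xml.etree.ElementTree as ET

# The properties from above, wrapped in the <configuration> root element.
HDFS_SITE = """<configuration>
<property>
    <name>dfs.block.size</name>
    <value>67108864</value>
    <description>64MB</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
</configuration>"""

def read_properties(xml_text: str) -> dict:
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props

props = read_properties(HDFS_SITE)
block_size = int(props["dfs.block.size"])
print(block_size // (1024 * 1024), "MB,", props["dfs.replication"], "replicas")  # 64 MB, 3 replicas
```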

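To set the block size and replication factor for a single file only, specify them at upload time. The example below uses 4 replicas and a 128MB (134217728-byte) block size: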

hdfs dfs -Ddfs.replication=4 -Ddfs.block.size=134217728 -put file.txt hdfs_file.txt
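When uploads are scripted, the per-file settings can be assembled programmatically. The sketch below only builds the argument list for the command above, and put_command is a hypothetical helper. The multiple-of-512 check is an assumption based on the default dfs.bytes-per-checksum of 512, since HDFS expects the block size to be a multiple of the checksum chunk size.

```python
import shlex
from typing import List

BYTES_PER_CHECKSUM = 512  # assumption: default value of dfs.bytes-per-checksum

def put_command(local_path: str, hdfs_path: str,
                replication: int, block_size: int) -> List[str]:
    """Build the argv for `hdfs dfs -put` with per-file replication and block size."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if block_size <= 0 or block_size % BYTES_PER_CHECKSUM != 0:
        raise ValueError(f"block size must be a positive multiple of {BYTES_PER_CHECKSUM}")
    return [
        "hdfs", "dfs",
        f"-Ddfs.replication={replication}",  # generic -D options go before -put
        f"-Ddfs.block.size={block_size}",
        "-put", local_path, hdfs_path,
    ]

cmd = put_command("file.txt", "hdfs_file.txt", replication=4, block_size=128 * 1024 * 1024)
print(shlex.join(cmd))
# hdfs dfs -Ddfs.replication=4 -Ddfs.block.size=134217728 -put file.txt hdfs_file.txt
```

Passing cmd to subprocess.run(cmd, check=True) would perform the upload on a machine where the hdfs client is on the PATH.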

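Note: changing the default replication factor or block size does not affect files already stored in HDFS. The new values only apply to files uploaded afterwards.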


Viewing a file's block information

hdfs fsck input/wordcount.txt -files -blocks -locations

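-files shows an overview of the file: its size, number of blocks, replication, and so on
-files -blocks additionally shows the size and replication factor of each block
-files -blocks -locations additionally shows which DataNodes store each block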

The output of the command above is:

Connecting to namenode via http://master:50070/fsck?ugi=root&files=1&blocks=1&locations=1&path=%2Fuser%2Froot%2Finput%2Fwordcount.txt
FSCK started by root (auth:SIMPLE) from /10.180.29.180 for path /user/root/input/wordcount.txt at Tue Jan 09 15:59:59 CST 2018
/user/root/input/wordcount.txt 1079758684 bytes, 9 block(s):  OK
0. BP-1761322274-10.180.29.180-1514957510720:blk_1073741900_1076 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.182:50010,DS-f528fad3-deed-4268-a338-71999435b055,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK]]
1. BP-1761322274-10.180.29.180-1514957510720:blk_1073741901_1077 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.183:50010,DS-f8bd931f-35ef-43c8-97ac-fd9b1b4d8049,DISK]]
2. BP-1761322274-10.180.29.180-1514957510720:blk_1073741902_1078 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK]]
3. BP-1761322274-10.180.29.180-1514957510720:blk_1073741903_1079 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.183:50010,DS-f8bd931f-35ef-43c8-97ac-fd9b1b4d8049,DISK]]
4. BP-1761322274-10.180.29.180-1514957510720:blk_1073741904_1080 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK]]
5. BP-1761322274-10.180.29.180-1514957510720:blk_1073741905_1081 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.183:50010,DS-f8bd931f-35ef-43c8-97ac-fd9b1b4d8049,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK]]
6. BP-1761322274-10.180.29.180-1514957510720:blk_1073741906_1082 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.183:50010,DS-f8bd931f-35ef-43c8-97ac-fd9b1b4d8049,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK]]
7. BP-1761322274-10.180.29.180-1514957510720:blk_1073741907_1083 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.183:50010,DS-f8bd931f-35ef-43c8-97ac-fd9b1b4d8049,DISK]]
8. BP-1761322274-10.180.29.180-1514957510720:blk_1073741908_1084 len=6016860 repl=3 [DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK]]

Status: HEALTHY
 Total size:	1079758684 B
 Total dirs:	0
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	9 (avg. block size 119973187 B)
 Minimally replicated blocks:	9 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		5
 Number of racks:		1
FSCK ended at Tue Jan 09 15:59:59 CST 2018 in 3 milliseconds


The filesystem under path '/user/root/input/wordcount.txt' is HEALTHY
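For long block lists, it can help to extract len, repl and the DataNode addresses with a script. The regular expressions below are written against the Hadoop 2.x output format shown above and may need adjusting for other versions. parse_fsck_blocks and BlockInfo are hypothetical names, and SAMPLE holds lines copied from that output.

```python
import re
from typing import List, NamedTuple

# One block line: "<index>. <pool>:<blk_id> len=<bytes> repl=<n> [DatanodeInfoWithStorage[...], ...]"
BLOCK_LINE = re.compile(
    r"^(?P<index>\d+)\.\s+(?P<pool>[^:\s]+):(?P<block>blk_\d+_\d+)\s+"
    r"len=(?P<length>\d+)\s+repl=(?P<repl>\d+)(?P<locations>.*)$"
)
# Captures the "host:port" right after each "DatanodeInfoWithStorage["
DATANODE = re.compile(r"DatanodeInfoWithStorage\[([^,\]]+)")

class BlockInfo(NamedTuple):
    index: int
    block_id: str
    length: int
    replication: int
    datanodes: List[str]

def parse_fsck_blocks(output: str) -> List[BlockInfo]:
    """Extract per-block details from `hdfs fsck ... -files -blocks -locations` output."""
    blocks = []
    for line in output.splitlines():
        m = BLOCK_LINE.match(line.strip())
        if m is None:
            continue  # header, summary and blank lines
        blocks.append(BlockInfo(
            index=int(m["index"]),
            block_id=m["block"],
            length=int(m["length"]),
            replication=int(m["repl"]),
            datanodes=DATANODE.findall(m["locations"]),
        ))
    return blocks

# The file header line plus block lines 0 and 8 copied from the output above.
SAMPLE = """\
/user/root/input/wordcount.txt 1079758684 bytes, 9 block(s):  OK
0. BP-1761322274-10.180.29.180-1514957510720:blk_1073741900_1076 len=134217728 repl=3 [DatanodeInfoWithStorage[10.180.29.182:50010,DS-f528fad3-deed-4268-a338-71999435b055,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK], DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK]]
8. BP-1761322274-10.180.29.180-1514957510720:blk_1073741908_1084 len=6016860 repl=3 [DatanodeInfoWithStorage[10.180.29.184:50010,DS-0e0411e4-e022-48a1-8824-15c187dd0cac,DISK], DatanodeInfoWithStorage[10.180.29.181:50010,DS-34b04f22-0cba-40a6-beca-86dba07c2924,DISK], DatanodeInfoWithStorage[10.180.29.185:50010,DS-6f62d6ba-17d9-454a-903a-c2a74ea57c3e,DISK]]
"""

for b in parse_fsck_blocks(SAMPLE):
    print(b.index, b.length, b.replication, b.datanodes)
```

Summing len over all nine block lines gives 8 * 134217728 + 6016860 = 1079758684 bytes, which is the Total size that fsck reports. Likewise, 1079758684 / 9 is about 119973187 B, which is the average block size fsck prints.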
posted @ 2019-01-03 15:27 xuejianbest