Hadoop 3.x EC: Common Commands
I. How EC Works
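Erasure coding stores k data units plus m parity units computed from them. Any m lost units can be rebuilt from the survivors, at a storage cost of (k+m)/k instead of 3 for HDFS's default 3-way replication. The simplest codec is XOR, used by the XOR-2-1-1024k policy listed below, where the single parity unit is the XOR of the data units. The following is a minimal Python sketch of that idea on toy byte strings. It is not HDFS's codec code, and the function names are hypothetical.

```python
# Toy illustration of XOR erasure coding (2 data units + 1 parity unit).
# HDFS works on 1 MiB cells; here the "cells" are short byte strings.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("units must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode(d0: bytes, d1: bytes) -> bytes:
    """Compute the parity unit for two data units."""
    return xor_bytes(d0, d1)


def recover(survivor: bytes, parity: bytes) -> bytes:
    """Rebuild the lost data unit from the surviving data unit and the parity."""
    return xor_bytes(survivor, parity)


if __name__ == "__main__":
    d0, d1 = b"hello!", b"world."
    p = encode(d0, d1)
    # Pretend the DataNode holding d1 died: rebuild d1 from d0 and p.
    print("recovered:", recover(d0, p))
```

Reed-Solomon codecs such as rs generalize this to m > 1 parity units, so RS-6-3 survives the loss of any 3 of its 9 units.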
II. Common Commands and What They Do
1. List the supported EC policies
hdfs ec -listPolicies
2023-05-30 10:10:43,251 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=DISABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=ENABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED
In the output above, only RS-6-3-1024k is enabled (State=ENABLED). All other policies are DISABLED. A policy name encodes its schema. For example, RS-6-3-1024k means the Reed-Solomon codec with 6 data units and 3 parity units, using a 1024k (1 MiB) cell size, which matches its Schema and CellSize fields.
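To read these fields programmatically, the following minimal Python sketch parses the -listPolicies text shown above. For each policy it derives the storage overhead, (k+m)/k, and the number of internal-block losses the policy tolerates, m. It assumes the output format matches this Hadoop version's transcript exactly, since other versions may print extra schema options. The names parse_policies and SAMPLE are hypothetical. In practice you would feed it the captured stdout of hdfs ec -listPolicies.

```python
import re

# One policy entry as printed by `hdfs ec -listPolicies` in the transcript above.
# Assumption: no extra ECSchema options are printed (other versions may differ).
POLICY_RE = re.compile(
    r"ErasureCodingPolicy=\[Name=(?P<name>[\w-]+), "
    r"Schema=\[ECSchema=\[Codec=(?P<codec>[\w-]+), "
    r"numDataUnits=(?P<k>\d+), numParityUnits=(?P<m>\d+)\]\], "
    r"CellSize=(?P<cell>\d+), Id=(?P<id>\d+)\], State=(?P<state>\w+)"
)

# Two entries copied from the transcript, used as sample input.
SAMPLE = (
    "ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, "
    "numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=ENABLED "
    "ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, "
    "numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED"
)


def parse_policies(text: str) -> list:
    """Return one dict per policy with its schema, state and derived figures."""
    policies = []
    for mt in POLICY_RE.finditer(text):
        k, m = int(mt["k"]), int(mt["m"])
        policies.append({
            "name": mt["name"],
            "codec": mt["codec"],
            "data_units": k,
            "parity_units": m,
            "cell_size": int(mt["cell"]),
            "state": mt["state"],
            # Raw bytes stored per byte of user data (3-way replication = 3.0).
            "storage_overhead": (k + m) / k,
            # Any m internal blocks of a block group can be lost and rebuilt.
            "tolerated_failures": m,
        })
    return policies


if __name__ == "__main__":
    for p in parse_policies(SAMPLE):
        print(p["name"], p["state"], p["storage_overhead"], p["tolerated_failures"])
```

For example, RS-6-3-1024k stores 1.5 bytes per byte of data and tolerates 3 lost internal blocks, whereas 3-way replication stores 3 bytes and tolerates 2 lost replicas.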
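2. Check the EC policy of a directory or file (by default, a newly created directory or file has no policy set)
hdfs ec -getPolicy -path /test.txt
2023-05-30 10:36:06,524 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The erasure coding policy of /test.txt is unspecified
This file has no policy set yet ("unspecified"), so it is stored with ordinary replication.
3. Enable or disable EC policies, and set a policy on directories and files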
Disable a policy (afterwards every policy is DISABLED):
hdfs ec -disablePolicy -policy RS-6-3-1024k
2023-05-30 11:37:28,365 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Erasure coding policy RS-6-3-1024k is disabled
[root@worker1 ~]# hdfs ec -listPolicies
2023-05-30 11:37:37,981 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=DISABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=DISABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED
Enable a policy (afterwards RS-3-2-1024k is ENABLED):
[root@worker1 ~]# hdfs ec -enablePolicy -policy RS-3-2-1024k
2023-05-30 11:40:18,905 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Erasure coding policy RS-3-2-1024k is enabled
[root@worker1 ~]# hdfs ec -listPolicies
2023-05-30 11:40:25,288 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=ENABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=DISABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED
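Setting a policy directly on a file throws an exception (RemoteException: Attempt to set an erasure coding policy for a file /test.txt). EC policies are set on directories, not on individual files.
Setting a policy on a directory does not throw an exception. However, files that already exist in the directory are not converted to EC (Warning: setting erasure coding policy on a non-empty directory will not automatically convert existing files to RS-3-2-1024k erasure coding policy). Once the policy is set, files newly written into the directory are encoded with it. In the transcript below, copying /test.txt into /usr produces an EC-encoded /usr/test.txt.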
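The rules above can be captured in a toy model:

- Policies attach to directories.
- A file takes the policy of its nearest ancestor directory at the moment it is written.
- Files that already exist keep their layout.

The following Python sketch is a hypothetical in-memory model for illustration only, not an HDFS API. It ignores permissions, whether a policy is enabled, the -replicate option, and everything else except policy inheritance. The file /usr/old.txt is a made-up stand-in for whatever already existed in /usr.

```python
class EcNamespace:
    """Hypothetical toy model of HDFS EC policy inheritance (not an HDFS API)."""

    def __init__(self):
        self.dir_policy = {}   # directory path -> policy set with -setPolicy
        self.file_policy = {}  # file path -> policy fixed at write time (None = replicated)

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def _nearest_dir_policy(self, directory):
        # Walk up from `directory` to the root and return the first policy found.
        while True:
            if directory in self.dir_policy:
                return self.dir_policy[directory]
            if directory == "/":
                return None
            directory = self._parent(directory)

    def set_policy(self, path, policy):
        if path in self.file_policy:
            # Mirrors: RemoteException: Attempt to set an erasure coding policy for a file
            raise ValueError(f"Attempt to set an erasure coding policy for a file {path}")
        # Only the directory setting changes; files already under it keep their layout.
        self.dir_policy[path] = policy

    def create_file(self, path):
        # A newly written file (e.g. via put or cp) takes its nearest ancestor's policy.
        self.file_policy[path] = self._nearest_dir_policy(self._parent(path))

    def get_policy(self, path):
        if path in self.file_policy:
            return self.file_policy[path]
        return self._nearest_dir_policy(path)


if __name__ == "__main__":
    ns = EcNamespace()
    ns.create_file("/test.txt")          # existing replicated file
    ns.create_file("/usr/old.txt")       # hypothetical file already in /usr
    ns.set_policy("/usr", "RS-3-2-1024k")
    ns.create_file("/usr/test.txt")      # like: hdfs dfs -cp /test.txt /usr
    print(ns.get_policy("/test.txt"), ns.get_policy("/usr/old.txt"), ns.get_policy("/usr/test.txt"))
```

The model reproduces the sequence in the transcript that follows. /test.txt stays unspecified, the pre-existing file in /usr is not converted, and the copy /usr/test.txt picks up RS-3-2-1024k.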
[root@worker1 ~]# hdfs dfs -ls /
2023-05-30 11:42:13,816 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 root supergroup 6 2023-05-29 14:50 /test.txt
drwxr-xr-x - root supergroup 0 2023-05-30 11:19 /usr
[root@worker1 ~]# hdfs ec -getPolicy -path /test.txt
2023-05-30 11:42:39,194 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The erasure coding policy of /test.txt is unspecified
[root@worker1 ~]# hdfs ec -getPolicy -path /usr
2023-05-30 11:42:47,192 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The erasure coding policy of /usr is unspecified
[root@worker1 ~]# hdfs ec -setPolicy -path /usr -policy RS-3-2-1024k
2023-05-30 11:43:43,737 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Set RS-3-2-1024k erasure coding policy on /usr
Warning: setting erasure coding policy on a non-empty directory will not automatically convert existing files to RS-3-2-1024k erasure coding policy
[root@worker1 ~]# hdfs ec -setPolicy -path /test.txt -policy RS-3-2-1024k
2023-05-30 11:44:13,565 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
RemoteException: Attempt to set an erasure coding policy for a file /test.txt
[root@worker1 ~]# hdfs ec -getPolicy -path /usr
2023-05-30 11:44:35,958 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
RS-3-2-1024k
[root@worker1 ~]# hdfs ec -getPolicy -path /test.txt
2023-05-30 11:44:46,720 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The erasure coding policy of /test.txt is unspecified
[root@worker1 ~]# hdfs dfs -cp /test.txt /usr
2023-05-30 11:45:24,300 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2023-05-30 11:45:25,427 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
2023-05-30 11:45:25,521 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-05-30 11:45:25,856 WARN hdfs.DFSOutputStream: Cannot allocate parity block(index=3, policy=RS-3-2-1024k). Not enough datanodes? Exclude nodes=[]
2023-05-30 11:45:25,856 WARN hdfs.DFSOutputStream: Cannot allocate parity block(index=4, policy=RS-3-2-1024k). Not enough datanodes? Exclude nodes=[]
2023-05-30 11:45:25,858 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-05-30 11:45:26,055 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-05-30 11:45:26,512 WARN hdfs.DFSOutputStream: Block group <1> failed to write 2 blocks. It's at high risk of losing data.
[root@worker1 ~]# hdfs ec -getPolicy -path /usr/test.txt
2023-05-30 11:45:38,334 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
RS-3-2-1024k
[root@worker1 ~]#
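The "Cannot allocate parity block" warnings above arise because this cluster has 3 DataNodes (see "Number of data-nodes: 3" in the fsck output), while RS-3-2-1024k writes 5 internal blocks per block group (3 data + 2 parity), and HDFS places each internal block of a group on a different DataNode. As a result, two parity blocks had nowhere to go. The following minimal Python sketch makes that arithmetic explicit. It assumes the built-in naming scheme CODEC-k-m-CELLSIZE and one internal block per DataNode. The function placement_report is hypothetical.

```python
import re


def placement_report(policy_name: str, live_datanodes: int) -> dict:
    """Compare a built-in policy's block-group width with the live DataNode count.

    Assumes names like RS-3-2-1024k / RS-LEGACY-6-3-1024k / XOR-2-1-1024k and
    that every internal block of a group needs its own DataNode.
    """
    mt = re.fullmatch(r"(?:RS|RS-LEGACY|XOR)-(\d+)-(\d+)-(\d+)k", policy_name)
    if mt is None:
        raise ValueError(f"unrecognised policy name: {policy_name}")
    k, m = int(mt.group(1)), int(mt.group(2))
    width = k + m                              # internal blocks per block group
    missing = max(0, width - live_datanodes)   # internal blocks with no DataNode
    return {
        "data_units": k,
        "parity_units": m,
        "datanodes_needed": width,
        "blocks_without_a_node": missing,
        # 0 means one more lost internal block can lose data; a negative value
        # means there are fewer DataNodes than data units.
        "remaining_fault_tolerance": m - missing,
    }


if __name__ == "__main__":
    print(placement_report("RS-3-2-1024k", 3))
```

For RS-3-2-1024k on 3 DataNodes, this reports 2 blocks without a node and no remaining fault tolerance. That matches "Block group <1> failed to write 2 blocks. It's at high risk of losing data."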
4. Inspect an EC-encoded file
hdfs fsck /usr/test.txt -files -blocks -locations
2023-05-30 11:53:31,626 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://master:50070/fsck?ugi=root&files=1&blocks=1&locations=1&path=%2Fusr%2Ftest.txt
FSCK started by root (auth:SIMPLE) from /172.16.20.239 for path /usr/test.txt at Tue May 30 11:53:32 CST 2023
/usr/test.txt 6 bytes, erasure-coded: policy=RS-3-2-1024k, 1 block(s): OK
0. BP-132737199-172.16.20.156-1685330616691:blk_-9223372036854775792_1003 len=6 Live_repl=3
[blk_-9223372036854775792:DatanodeInfoWithStorage[172.16.20.239:9866,DS-909e37f9-2ba2-4c24-b777-367cd8c16c72,DISK],
blk_-9223372036854775789:DatanodeInfoWithStorage[172.16.20.193:9866,DS-e8d953b7-f21e-41f1-8fd3-a273cd3d49a1,DISK],
blk_-9223372036854775788:DatanodeInfoWithStorage[172.16.20.156:9866,DS-105ef4d2-e454-4acc-bbc7-1cc49ed7bfa5,DISK]
]
Status: HEALTHY
Number of data-nodes: 3
Number of racks: 1
Total dirs: 0
Total symlinks: 0
Replicated Blocks:
Total size: 0 B
Total files: 0
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 1
Average block replication: 0.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0
Erasure Coded Block Groups:
Total size: 6 B
Total files: 1
Total block groups (validated): 1 (avg. block group size 6 B)
Minimally erasure-coded block groups: 1 (100.0 %)
Over-erasure-coded block groups: 0 (0.0 %)
Under-erasure-coded block groups: 0 (0.0 %)
Unsatisfactory placement block groups: 0 (0.0 %)
Average block group size: 3.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0 (0.0 %)
FSCK ended at Tue May 30 11:53:32 CST 2023 in 2 milliseconds
The filesystem under path '/usr/test.txt' is HEALTHY
In the output above, the line 0. BP-132737199-172.16.20.156-1685330616691:blk_-9223372036854775792_1003 len=6 Live_repl=3 describes the file's single block group, which has three internal blocks:
- blk_-9223372036854775792 (index 0) is the data block that actually holds the 6 bytes.
- blk_-9223372036854775789 and blk_-9223372036854775788 (indices 3 and 4) are the 2 parity blocks.
Data blocks 1 and 2 are not listed because a 6-byte file fills only part of the first 1 MiB cell, which leaves them empty.
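HDFS stores an internal block's position in the low 4 bits of its ID. The block group ID has those bits set to zero, and internal block i has ID group_id + i. Data blocks use indices 0 to k-1, and parity blocks use indices k to k+m-1. The following minimal Python sketch decodes the three IDs from the fsck output under that rule for RS-3-2 (k = 3). The helper decode_internal_block is hypothetical. The mask value 0xF mirrors, to our understanding, HDFS's BLOCK_GROUP_INDEX_MASK.

```python
# Low 4 bits of a striped block ID hold the internal block index
# (assumed to mirror HDFS's BLOCK_GROUP_INDEX_MASK = 15).
BLOCK_GROUP_INDEX_MASK = 0xF


def decode_internal_block(block_id: int, data_units: int) -> tuple:
    """Return (group_id, index, kind) for an internal block of a striped file."""
    index = block_id & BLOCK_GROUP_INDEX_MASK
    # Python ints use two's-complement semantics for &, so this matches
    # Java long arithmetic for these negative 64-bit IDs.
    group_id = block_id & ~BLOCK_GROUP_INDEX_MASK
    kind = "data" if index < data_units else "parity"
    return group_id, index, kind


if __name__ == "__main__":
    for bid in (-9223372036854775792, -9223372036854775789, -9223372036854775788):
        print(bid, decode_internal_block(bid, data_units=3))
```

All three IDs map to the same group ID (-9223372036854775792, which is also the "0." line's block name) with indices 0, 3 and 4. That corresponds to one data block and two parity blocks.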
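Summary: with the RS-3-2-1024k policy, files are striped in cells of 1024k (1 MiB).
1. A file smaller than 1 MiB is not split. It is stored in a single data block, and each parity block is the same size as that data block.
2. A larger file is cut into 1 MiB cells that are written round-robin across the 3 data blocks. The last cell may be shorter than 1 MiB. Once the file exceeds 3 MiB, the cells wrap around to the first data block again, so the data ends up spread roughly evenly over the 3 data blocks. Each parity block is as large as the largest data block.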
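To make the striping rule concrete, the following minimal Python sketch computes the lengths of the internal blocks in a single block group. It rests on these assumptions:

- Cells are 1 MiB and are assigned round-robin to the k data blocks.
- Each parity block is as long as data block 0, which is always the longest.
- The file fits in one block group, that is, at most 3 × the HDFS block size for RS-3-2.

The function name internal_block_lengths is hypothetical.

```python
MiB = 1024 * 1024


def internal_block_lengths(file_size: int, k: int = 3, m: int = 2, cell_size: int = MiB):
    """Return (data_block_lengths, parity_block_lengths) for one block group."""
    data = [0] * k
    offset, cell = 0, 0
    while offset < file_size:
        n = min(cell_size, file_size - offset)  # last cell may be partial
        data[cell % k] += n                     # round-robin over data blocks
        offset += n
        cell += 1
    # Each parity cell is as long as the first (longest) cell of its stripe,
    # so every parity block ends up the same length as data block 0.
    parity = [data[0]] * m
    return data, parity


if __name__ == "__main__":
    print(internal_block_lengths(6))              # the 6-byte /usr/test.txt
    print(internal_block_lengths(5 * MiB // 2))   # 2.5 MiB
    print(internal_block_lengths(7 * MiB))        # wraps around after 3 MiB
```

For the 6-byte test file, this yields data blocks of 6, 0 and 0 bytes and two 6-byte parity blocks. That is consistent with fsck listing only indices 0, 3 and 4.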