Managing Hadoop Quotas
Author: Yin Zhengjie
Copyright notice: This is an original work. Reposting is not permitted; violations will be pursued legally.
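I. Overview of Hadoop Quotas
You can configure quotas on HDFS directories to limit the HDFS space that users or applications consume.
Space allocation in HDFS is not directly tied to space allocation on the underlying Linux file system.
Hadoop supports two types of quotas: space quotas and name quotas.
Name quota:
Sets the maximum number of files and directories in the tree rooted at the directory.
Space quota:
Sets an upper limit on the space used by a single directory.
Tip:
If you create a user's home directory without assigning a name quota or a space quota, that user has unlimited storage in HDFS, which is poor practice.
Name quotas and space quotas are set per directory, not per user.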
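II. Managing Name Quotas
You can set a name quota to limit the number of files and directories in any directory. If a user tries to create a file or directory beyond the quota, the creation fails.
The command below shows the quota information (at this point no quota has been configured on HDFS yet):
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root    # the "-q" option shows the space quota and name quota information
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf            none             inf           15           18            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
The columns mean the following:
QUOTA: the name quota, i.e. the limit on the number of files and directories.
REM_QUOTA: how many more files and directories can still be created under the name quota.
SPACE_QUOTA: the space quota granted to the directory.
REM_SPACE_QUOTA: the space quota still remaining.
DIR_COUNT: number of directories.
FILE_COUNT: number of files.
CONTENT_SIZE: total size of the files.
PATHNAME: path name.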
1>. Setting a name quota on HDFS
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -help setQuota    # use "dfsadmin -setQuota" to set the HDFS name quota of a directory
-setQuota <quota> <dirname>...<dirname>: Set the quota <quota> for each directory <dirName>.
    The directory quota is a long integer that puts a hard limit
    on the number of names in the directory tree
    For each directory, attempt to set the quota. An error will be reported if
    1. quota is not a positive integer, or
    2. User is not an administrator, or
    3. The directory does not exist or is a file.
    Note: A quota of 1 would force the directory to remain empty.
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls -R /user/root
drwx------   - root  admingroup          0 2020-08-15 08:00 /user/root/.Trash
drwx------   - root  admingroup          0 2020-08-14 19:32 /user/root/.Trash/200814193733
-rw-r--r--   3 root  admingroup        490 2020-08-14 19:31 /user/root/.Trash/200814193733/fstab
-rw-r--r--   3 root  admingroup      10779 2020-08-14 19:32 /user/root/.Trash/200814193733/sysctl.conf
drwxr-xr-x   - root  admingroup          0 2020-08-14 19:04 /user/root/.Trash/200814193733/test2
drwxr-xr-x   - root  admingroup          0 2020-08-14 19:04 /user/root/.Trash/200814193733/test2/sub1
drwxr-xr-x   - root  admingroup          0 2020-08-14 19:04 /user/root/.Trash/200814193733/test2/sub1/sub2
drwx------   - root  admingroup          0 2020-08-14 19:21 /user/root/.Trash/200814193733/yinzhengjie
drwx------   - root  admingroup          0 2020-08-15 00:04 /user/root/.Trash/200815080000
-rw-r--r--   3 root  admingroup          0 2020-08-14 22:47 /user/root/.Trash/200815080000/a.txt
-rw-r--r--   3 root  admingroup  392115733 2020-08-14 23:25 /user/root/.Trash/200815080000/hadoop-2.10.0.tar.gz
-rw-r--r--   3 root  admingroup          0 2020-08-14 22:58 /user/root/.Trash/200815080000/hdfs2020.log
-rw-r--r--   3 root  admingroup         26 2020-08-14 23:42 /user/root/.Trash/200815080000/hostname
-rw-r--r--   3 root  admingroup        371 2020-08-14 23:49 /user/root/.Trash/200815080000/hosts2020
-rw-r--r--   3 root  admingroup         69 2020-08-14 23:14 /user/root/.Trash/200815080000/wc.txt.gz
drwx-w-r-x   - jason admingroup          0 2020-08-14 21:46 /user/root/.Trash/200815080000/yinzhengjie
drwx-w-r-x   - jason admingroup          0 2020-08-14 07:07 /user/root/.Trash/200815080000/yinzhengjie/data
drwx-w-r-x   - jason admingroup          0 2020-08-14 07:07 /user/root/.Trash/200815080000/yinzhengjie/data/hadoop
drwx-w-r-x   - jason admingroup          0 2020-08-14 07:07 /user/root/.Trash/200815080000/yinzhengjie/data/hadoop/hdfs
drwx-w-r-x   - jason admingroup          0 2020-08-14 21:46 /user/root/.Trash/200815080000/yinzhengjie/softwares
drwxr-xr-x   - root  admingroup          0 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020
-rw-r--r--   3 root  admingroup         69 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/wc.txt.gz
drwxr-xr-x   - root  admingroup          0 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d
-rw-r--r--   3 root  admingroup       1664 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-Base.repo
-rw-r--r--   3 root  admingroup       1309 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-CR.repo
-rw-r--r--   3 root  admingroup        649 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-Debuginfo.repo
-rw-r--r--   3 root  admingroup        630 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-Media.repo
-rw-r--r--   3 root  admingroup       1331 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-Sources.repo
-rw-r--r--   3 root  admingroup       5701 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-Vault.repo
-rw-r--r--   3 root  admingroup        314 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-fasttrack.repo
-rw-r--r--   3 root  admingroup       1050 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/epel-testing.repo
-rw-r--r--   3 root  admingroup        951 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/epel.repo
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls -R /user/root | wc -l    # 32 entries under "/user/root"; counting "/user/root" itself, that makes 33 files and directories
32
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf            none             inf           15           18            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -setQuota 35 /user/root    # set a name quota of 35 on "/user/root"; watch how the "count" statistics change
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
          35               2            none             inf           15           18            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
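The REM_QUOTA value reported after setting the quota follows from simple arithmetic: the quota directory itself counts as one name, so DIR_COUNT + FILE_COUNT names are already in use. A minimal sketch of that calculation, where the helper name `remaining_names` is made up for illustration and the figures are the ones from the output above:

```shell
# remaining_names QUOTA DIR_COUNT FILE_COUNT
# Hypothetical helper: prints how many more names fit under a name quota.
# DIR_COUNT already includes the quota directory itself.
remaining_names() {
    echo $(( $1 - ($2 + $3) ))
}

remaining_names 35 15 18   # prints 2, matching REM_QUOTA for /user/root
```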
2>. Verifying that the name quota takes effect
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
          35               2            none             inf           15           18            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls /user/root    # next, create a file and a directory under the directory that has the name quota to check that it takes effect immediately
Found 1 items
drwx------   - root admingroup          0 2020-08-15 08:00 /user/root/.Trash
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -touchz /user/root/a.txt
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -mkdir /user/root/test
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
          35               0            none             inf           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -touchz /user/root/b.txt
touchz: The NameSpace quota (directories and files) of directory /user/root is exceeded: quota=35 file count=36
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -mkdir /user/root/test02
mkdir: The NameSpace quota (directories and files) of directory /user/root is exceeded: quota=35 file count=36
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls /user/root
Found 3 items
drwx------   - root admingroup          0 2020-08-15 08:00 /user/root/.Trash
-rw-r--r--   3 root admingroup          0 2020-08-19 18:32 /user/root/a.txt
drwxr-xr-x   - root admingroup          0 2020-08-19 18:32 /user/root/test
[root@hadoop101.yinzhengjie.com ~]#
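Instead of waiting for the quota error, a script can read the first two columns of the count output and decide up front whether a batch of new names will fit. The following is only a sketch: `names_fit` is a hypothetical helper, and it assumes the row comes from `hdfs dfs -count -q` without `-h` and `-v`, so that there is no header and every column is a single token.

```shell
# names_fit N   (reads one data row of `hdfs dfs -count -q <dir>` on stdin)
# Hypothetical helper: succeeds when N more names fit under the name quota.
names_fit() {
    # Only the first two columns matter here: QUOTA and REM_QUOTA.
    read -r quota rem_quota _rest
    case "$quota" in
        none) return 0 ;;          # no name quota set on this directory
    esac
    [ "$1" -le "$rem_quota" ]
}

# Possible use on a live cluster (not executed here):
#   hdfs dfs -count -q /user/root | names_fit 2 && hdfs dfs -mkdir /user/root/test
```

With the rows shown above, a request for 2 names passes while REM_QUOTA is 2 and a request for 1 name fails once REM_QUOTA has dropped to 0.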
3>. Clearing the current name quota
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -help clrQuota    # use "dfsadmin -clrQuota" to clear the current name quota
-clrQuota <dirname>...<dirname>: Clear the quota for each directory <dirName>.
    For each directory, attempt to clear the quota. An error will be reported if
    1. the directory does not exist or is a file, or
    2. user is not an administrator.
    It does not fault if the directory has no quota.
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
          35               0            none             inf           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -clrQuota /user/root    # this clears the name quota
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf            none             inf           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
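III. Managing Space Quotas
You can limit the storage allocated to a specific HDFS directory; the quota is the number of bytes that all files in the directory may use. Once a directory has used up its space quota, users and applications can no longer create files in it.
A space quota puts a hard limit on the disk space available to all files in an HDFS directory tree. You can cap a user's space consumption by setting a quota on the user's home directory or on other directories the user shares with others. A directory without a space quota has no disk space limit and can use the whole of HDFS.
When configuring a space quota, keep in mind that HDFS needs enough remaining quota to hold a whole block. Suppose a user has 200 MB of free space left in the quota. Leaving the replication factor aside, if the HDFS block size is larger than 200 MB (for example 256 MB), no new file can be created, no matter how small the file is.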
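The whole-block rule can be written as a one-line check: a new block is only granted if block size times replication still fits in the remaining quota. This is a sketch of that rule as described above, not HDFS code; `block_fits` is a hypothetical name.

```shell
# block_fits REMAINING_BYTES BLOCK_SIZE_BYTES REPLICATION
# Hypothetical helper: succeeds when one more full block, counted once per
# replica, still fits in the remaining space quota.
block_fits() {
    [ $(( $2 * $3 )) -le "$1" ]
}

MB=$(( 1024 * 1024 ))

# The case from the text: 200 MB left, 256 MB blocks, replication ignored (1).
if block_fits $(( 200 * MB )) $(( 256 * MB )) 1; then
    echo "a new file can be created"
else
    echo "quota too small for a single block"   # this branch is taken
fi
```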
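Tip:
A space quota counts all replicated data. With a 30 GB quota, the user uses up the whole quota by storing just 10 GB of actual data in the HDFS directory (with the default replication factor of 3, HDFS stores 10 GB x 3 = 30 GB).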
1>. Setting a space quota on HDFS
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -help setSpaceQuota
-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>: Set the space quota <quota> for each directory <dirName>.
    The space quota is a long integer that puts a hard limit
    on the total size of all the files under the directory tree.
    The extra space required for replication is also counted. E.g.
    a 1GB file with replication of 3 consumes 3GB of the quota.
    Quota can also be specified with a binary prefix for terabytes,
    petabytes etc (e.g. 50t is 50TB, 5m is 5MB, 3p is 3PB).
    For each directory, attempt to set the quota. An error will be reported if
    1. quota is not a positive integer or zero, or
    2. user is not an administrator, or
    3. the directory does not exist or is a file.
    The storage type specific quota is set when -storageType option is specified.
    Available storageTypes are
    - RAM_DISK
    - DISK
    - SSD
    - ARCHIVE
[root@hadoop101.yinzhengjie.com ~]#
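The help text says the quota accepts binary prefixes. To see what a value such as 2g means in bytes, the expansion can be sketched in plain shell. `quota_to_bytes` is a hypothetical helper written for this note, and it only handles the lowercase suffixes k, m, g, t and p:

```shell
# quota_to_bytes VALUE
# Hypothetical helper: expands a size such as 2g or 50t into bytes, using
# binary multiples (1k = 1024). Runs in a subshell so `num` does not leak.
quota_to_bytes() (
    num=${1%[kmgtp]}
    case "$1" in
        *k) echo $(( num * 1024 )) ;;
        *m) echo $(( num * 1024 * 1024 )) ;;
        *g) echo $(( num * 1024 * 1024 * 1024 )) ;;
        *t) echo $(( num * 1024 * 1024 * 1024 * 1024 )) ;;
        *p) echo $(( num * 1024 * 1024 * 1024 * 1024 * 1024 )) ;;
        *)  echo "$num" ;;
    esac
)

quota_to_bytes 2g    # prints 2147483648
quota_to_bytes 5m    # prints 5242880
```

The first figure is the same byte count that HDFS reports in its quota error messages for a 2 GB quota.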
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root    # note the space quota columns
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf            none             inf           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -setSpaceQuota 2g /user/root    # here only "/user/root" gets a 2G space quota
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             2 G         926.1 M           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
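The 926.1 M left over right after setting a 2G quota can be checked by hand. The sketch below assumes that the 392141136 bytes obtained by adding up the 18 file sizes in the earlier `hdfs dfs -ls -R /user/root` listing is the full content of the directory, and that every file keeps the replication factor of 3 shown there:

```shell
content_bytes=392141136                      # sum of the file sizes in the listing
replication=3                                # replication column of that listing
quota_bytes=$(( 2 * 1024 * 1024 * 1024 ))    # -setSpaceQuota 2g

# Every replica is charged against the quota.
remaining=$(( quota_bytes - content_bytes * replication ))
echo "$remaining"                            # prints 971060240

# Same value in MiB with one decimal, the way the -h column shows it.
awk -v b="$remaining" 'BEGIN { printf "%.1f M\n", b / 1048576 }'   # prints 926.1 M
```

CONTENT_SIZE stays at 374.0 M because it reports the logical file size, while the space quota is charged for all three replicas.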
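Tip:
(1) The space quota of a directory includes the capacity taken by replicas;
(2) A single command can set the space quota of several directories at once;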
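Since `-setSpaceQuota` accepts several directory names, one command can cover a batch of directories. The sketch below only prints the command instead of running it, so it is safe to try anywhere; `plan_space_quota` is a hypothetical helper and `/user/jason` is an example path that may not exist on your cluster.

```shell
# plan_space_quota QUOTA DIR...
# Hypothetical helper: prints (does not run) one setSpaceQuota command that
# covers every directory given.
plan_space_quota() (
    quota=$1
    shift
    echo "hdfs dfsadmin -setSpaceQuota $quota $*"
)

plan_space_quota 2g /user/root /user/jason
# prints: hdfs dfsadmin -setSpaceQuota 2g /user/root /user/jason
```

Reviewing the printed command before running it as an HDFS administrator helps avoid putting a quota on the wrong directory.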
2>. Verifying that the space quota takes effect
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root    # the remaining space quota is 926.1MB
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             2 G         926.1 M           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# ll -h
total 375M
-rw-r--r-- 1 root root 374M Aug 10 15:42 hadoop-2.10.0.tar.gz
-rw------- 1 root root 265K Aug 20 16:49 messages
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -put messages /user/root    # think about it: why does uploading a 265K file fail with a quota-exceeded error?
put: The DiskSpace quota of /user/root is exceeded: quota = 2147483648 B = 2 GB but diskspace consumed = 2787036144 B = 2.60 GB
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -D dfs.blocksize=32m -put messages /user/root    # why does the upload succeed now?
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             2 G         925.3 M           16           20            374.2 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -D dfs.blocksize=128m -put hadoop-2.10.0.tar.gz /user/root    # why does the upload still fail even with an explicit block size?
put: The DiskSpace quota of /user/root is exceeded: quota = 2147483648 B = 2 GB but diskspace consumed = 2385196977 B = 2.22 GB
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -D dfs.blocksize=128m -D dfs.replication=1 -put hadoop-2.10.0.tar.gz /user/root    # why does it succeed with these settings?
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root    # think about it: why is 551.3MB of space quota still left?
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             2 G         551.3 M           16           21            748.2 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls -h /user/root
Found 5 items
drwx------   - root admingroup          0 2020-08-15 08:00 /user/root/.Trash
-rw-r--r--   3 root admingroup          0 2020-08-19 18:32 /user/root/a.txt
-rw-r--r--   1 root admingroup    374.0 M 2020-08-20 17:13 /user/root/hadoop-2.10.0.tar.gz
-rw-r--r--   3 root admingroup    265.0 K 2020-08-20 17:12 /user/root/messages
drwxr-xr-x   - root admingroup          0 2020-08-19 18:32 /user/root/test
[root@hadoop101.yinzhengjie.com ~]#
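The questions in the comments above can be answered with arithmetic. Each time a block is requested, the usage already charged plus one full block per replica must stay within the quota. The sketch below reproduces the two error figures under some stated assumptions: the default block size of 512 MiB is inferred from the first error message (the session never shows dfs.blocksize), the exact size of messages (271339 bytes) is inferred from the second one, and the tarball is taken to be the same 392115733 bytes as the copy in the earlier listing. `projected_usage` is a hypothetical helper.

```shell
MiB=$(( 1024 * 1024 ))

# projected_usage USED_BYTES BLOCK_SIZE_BYTES REPLICATION
# Usage the quota check compares with the quota when one more block is
# requested: what is already charged plus a full block for every replica.
projected_usage() {
    echo $(( $1 + $2 * $3 ))
}

quota=$(( 2048 * MiB ))        # -setSpaceQuota 2g
used=$(( 392141136 * 3 ))      # existing files under /user/root, 3 replicas each

# 1) messages, replication 3, block size assumed to be 512 MiB
projected_usage "$used" $(( 512 * MiB )) 3       # prints 2787036144, over quota

# 2) same file with dfs.blocksize=32m
projected_usage "$used" $(( 32 * MiB )) 3        # prints 1277086704, within quota

# messages is stored now; only its real size (inferred) is charged per replica
used=$(( used + 271339 * 3 ))

# 3) tarball, 128 MiB blocks, replication 3: two blocks fit, the third does not
after_two=$(( used + 2 * 128 * MiB * 3 ))
projected_usage "$after_two" $(( 128 * MiB )) 3  # prints 2385196977, over quota

# 4) same tarball with dfs.replication=1: every block request fits
final=$(( used + 392115733 ))
echo $(( quota - final ))                        # prints 578130490 (about 551.3 MiB)
```

In short, a small file can be rejected because a whole block per replica is reserved up front, a smaller block size or a lower replication factor shrinks that reservation, and once a file is closed only its real size times its replication stays charged.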
3>. Clearing the current space quota
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -help clrSpaceQuota
-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>: Clear the space quota for each directory <dirName>.
    For each directory, attempt to clear the quota. An error will be reported if
    1. the directory does not exist or is a file, or
    2. user is not an administrator.
    It does not fault if the directory has no quota.
    The storage type specific quota is cleared when -storageType option is specified.
    Available storageTypes are
    - RAM_DISK
    - DISK
    - SSD
    - ARCHIVE
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root    # note the space quota columns
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             2 G         551.3 M           16           21            748.2 M /user/root
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -clrSpaceQuota /user/root    # clear the space quota
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf            none             inf           16           21            748.2 M /user/root
[root@hadoop101.yinzhengjie.com ~]#