Cloudera Certified Associate Administrator Case Studies: Manage
Author: Yin Zhengjie
Copyright notice: original work, reproduction is not permitted. Violations will be pursued legally.
I. Download the NameNode image file
Problem description:
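The NameNode of the company cluster failed today, and you want to troubleshoot by analyzing the fsimage file. Download the latest fsimage file, name it "timestamp_xxx", where xxx is the Unix timestamp (in seconds) at the moment you perform the operation, and upload it to the HDFS directory /yinzhengjie/debug/hdfs/log/.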
Solution:
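This task involves the dfsadmin and dfs subcommands of the hdfs command, plus a few basic Linux commands. Ideally you should be able to do it without consulting the official documentation, or at most by skimming each command's help output.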
1>. Download the image file
[root@node101.yinzhengjie.org.cn ~]# ll
total 0
[root@node101.yinzhengjie.org.cn ~]# hdfs dfsadmin -fetchImage ./      # The HDFS cluster must be running normally, otherwise the download fails
19/06/15 15:27:57 INFO namenode.TransferFsImage: Opening connection to http://node101.yinzhengjie.org.cn:50070/imagetransfer?getimage=1&txid=latest
19/06/15 15:27:57 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
19/06/15 15:27:57 INFO namenode.TransferFsImage: Transfer took 0.02s at 3263.16 KB/s
[root@node101.yinzhengjie.org.cn ~]# ll
total 64
-rw-r--r-- 1 root root 64384 Jun 15 15:27 fsimage_0000000000000004578
[root@node101.yinzhengjie.org.cn ~]#
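-fetchImage names the downloaded file after the last transaction ID it contains (here fsimage_0000000000000004578), so the exact name is not known in advance. A minimal sketch, assuming bash, that picks the newest fsimage_* file in a download directory so the later steps do not depend on copying the name by hand. The helper name latest_fsimage is hypothetical and not part of Hadoop; it relies on the transaction ID being zero-padded to a fixed width, as in the name above, so a plain lexical sort orders files by transaction ID.

```bash
#!/usr/bin/env bash
# Print the path of the fsimage file with the highest transaction ID in a directory.
# The txid suffix is zero-padded to a fixed width, so lexical order == numeric order.
latest_fsimage() {
  local dir="${1:-.}" name
  # Keep only fsimage_<digits> names (skips checksum or temp files), take the last one.
  name="$(ls -1 "$dir" 2>/dev/null | grep -E '^fsimage_[0-9]+$' | sort | tail -n 1)"
  [ -n "$name" ] || return 1        # nothing has been downloaded yet
  printf '%s/%s\n' "${dir%/}" "$name"
}
```

Usage on a cluster node might look like: mkdir -p /tmp/fsimage && hdfs dfsadmin -fetchImage /tmp/fsimage && img="$(latest_fsimage /tmp/fsimage)".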
2>. Rename the image file
[root@node101.yinzhengjie.org.cn ~]# ll
total 64
-rw-r--r-- 1 root root 64384 Jun 15 15:27 fsimage_0000000000000004578
[root@node101.yinzhengjie.org.cn ~]# mv fsimage_0000000000000004578 timestamp_`date +%s`
[root@node101.yinzhengjie.org.cn ~]# ll
total 64
-rw-r--r-- 1 root root 64384 Jun 15 15:27 timestamp_1560583829
[root@node101.yinzhengjie.org.cn ~]#
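The backquoted date +%s is evaluated once, when mv runs, so the resulting name has to be read back with ll before uploading. A small sketch, assuming bash and a date that supports +%s, that captures the timestamp into a variable first and prints the final name so the same value can be reused for the upload. rename_with_timestamp is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Rename a local file to timestamp_<unix seconds> and print the new path.
# The timestamp is captured once, so the printed name is exactly what mv used.
rename_with_timestamp() {
  local src="$1" ts dst
  ts="$(date +%s)"                              # seconds since the Unix epoch
  dst="$(dirname "$src")/timestamp_${ts}"
  mv -- "$src" "$dst" || return 1
  printf '%s\n' "$dst"
}
```

For example, new="$(rename_with_timestamp ./fsimage_0000000000000004578)" followed by hdfs dfs -copyFromLocal "$new" /yinzhengjie/debug/hdfs/log/ avoids retyping the timestamp.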
3>. If the directory does not exist, create the HDFS path manually
[root@node101.yinzhengjie.org.cn ~]# su hdfs      # With the default "simple" authentication HDFS trusts the local user name, and root is not the HDFS superuser, so switch to hdfs; otherwise mkdir fails with "Permission denied"
[hdfs@node101.yinzhengjie.org.cn /root]$ hdfs dfs -mkdir -p /yinzhengjie/debug/hdfs/log
[hdfs@node101.yinzhengjie.org.cn /root]$ hdfs dfs -chmod 777 /yinzhengjie/debug/hdfs/log/
[hdfs@node101.yinzhengjie.org.cn /root]$ exit
[root@node101.yinzhengjie.org.cn ~]#
4>. Upload the renamed image file to HDFS
[root@node101.yinzhengjie.org.cn ~]# ll
total 64
-rw-r--r-- 1 root root 64384 Jun 15 15:27 timestamp_1560583829
[root@node101.yinzhengjie.org.cn ~]# hdfs dfs -copyFromLocal timestamp_1560583829 /yinzhengjie/debug/hdfs/log/
[root@node101.yinzhengjie.org.cn ~]# hdfs dfs -ls /yinzhengjie/debug/hdfs/log/
Found 1 items
-rw-r--r--   3 root supergroup      64384 2019-06-15 15:35 /yinzhengjie/debug/hdfs/log/timestamp_1560583829
[root@node101.yinzhengjie.org.cn ~]#
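Steps 3 and 4 can be collapsed into a short, re-runnable sequence. This is a sketch under assumptions: sudo is available, hdfs is the HDFS superuser (as on this cluster), and DRY_RUN is a hypothetical print-only switch, not a Hadoop feature. Because -mkdir -p already succeeds when the directory exists, no separate existence check is needed. Instead of chmod 777 it hands the directory to the uploading user, which is a narrower alternative to what the transcript does.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

# upload_to_hdfs <local file> <hdfs dir> <owner of the hdfs dir>
upload_to_hdfs() {
  local file="$1" dir="$2" owner="$3"
  # Create the directory as the HDFS superuser; -p makes this a no-op if it exists.
  run sudo -u hdfs hdfs dfs -mkdir -p "$dir" || return 1
  # Give the directory to the uploading user instead of opening it to everyone.
  run sudo -u hdfs hdfs dfs -chown "$owner" "$dir" || return 1
  # Upload, then list the directory to confirm the file arrived.
  run hdfs dfs -copyFromLocal "$file" "$dir/" || return 1
  run hdfs dfs -ls "$dir/"
}
```

Run as root on node101, upload_to_hdfs timestamp_1560583829 /yinzhengjie/debug/hdfs/log root would repeat steps 3 and 4; prefix it with DRY_RUN=1 to only print the commands.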
II. Manually balance DataNode data
Problem description:
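The company cluster has just been expanded with a batch of new worker nodes, but the new nodes hold no data yet, so data is unevenly distributed across the cluster.
You know that the HDFS balancer can fix this. Limit the bandwidth used by the balancer to at most 1 GB, and start the balancer with a threshold of 5.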
Solution:
If an interviewer asks you this, it is basically a giveaway: all we need to do is run the balancer.
1>. In Cloudera Manager, click "HDFS"
2>. Click "Configuration" and search for the keyword "dfs.datanode.balance.bandwidth"
3>. Set the maximum bandwidth each DataNode may use for balancing (dfs.datanode.balance.bandwidthPerSec) to 1 GB
4>. Search for the keyword "Rebalancing Threshold" (or simply "Threshold")
5>. Change the rebalancing threshold to 5
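The same two settings can also be applied from the command line. A sketch, assuming it runs as the hdfs superuser on a cluster node; DRY_RUN and the helper names are hypothetical. hdfs dfsadmin -setBalancerBandwidth takes a value in bytes per second and changes the limit on running DataNodes only (it is not persisted, so the Cloudera Manager setting is what survives a restart). hdfs balancer -threshold 5 keeps moving blocks until every DataNode's disk utilization is within 5 percentage points of the cluster average.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

# Convert a whole number of GiB to bytes (the unit -setBalancerBandwidth expects, per second).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# rebalance <bandwidth in GiB/s> <threshold in percent>
rebalance() {
  local bw
  bw="$(gib_to_bytes "$1")"
  # Runtime-only change on the DataNodes; lost when they restart.
  run hdfs dfsadmin -setBalancerBandwidth "$bw" || return 1
  # Balance until each DataNode is within <threshold> % of the average utilization.
  run hdfs balancer -threshold "$2"
}
```

rebalance 1 5 corresponds to the task: 1 GiB/s of balancing bandwidth and a threshold of 5.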
III. Lower the HDFS replication factor (from 3 to 2)
Problem description:
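You notice that total HDFS usage on the company cluster has exceeded 80% with the default three replicas. You want to change the replication factor of the larger files in a certain directory from 3 to 2 to free up some capacity.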
Solution:
If an interviewer asks you this, congratulations: it is another giveaway.
1>. Upload files to the HDFS cluster
[hdfs@node101.yinzhengjie.org.cn ~]$ ll /yinzhengjie/softwares/jdk1.8.0_201/
total 25980
drwxr-xr-x 2 10 143     4096 Dec 16 03:45 bin
-r--r--r-- 1 10 143     3244 Dec 16 03:45 COPYRIGHT
drwxr-xr-x 3 10 143      132 Dec 16 03:45 include
-rw-r--r-- 1 10 143  5207434 Dec 12  2018 javafx-src.zip
drwxr-xr-x 5 10 143      185 Dec 16 03:45 jre
drwxr-xr-x 5 10 143      245 Dec 16 03:45 lib
-r--r--r-- 1 10 143       40 Dec 16 03:45 LICENSE
drwxr-xr-x 4 10 143       47 Dec 16 03:45 man
-r--r--r-- 1 10 143      159 Dec 16 03:45 README.html
-rw-r--r-- 1 10 143      424 Dec 16 03:45 release
-rw-r--r-- 1 10 143 21103945 Dec 16 03:45 src.zip
-rw-r--r-- 1 10 143   108109 Dec 12  2018 THIRDPARTYLICENSEREADME-JAVAFX.txt
-r--r--r-- 1 10 143   155002 Dec 16 03:45 THIRDPARTYLICENSEREADME.txt
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -mkdir /yinzhengjie/data
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -put /yinzhengjie/softwares/jdk1.8.0_201/* /yinzhengjie/data/
[hdfs@node101.yinzhengjie.org.cn ~]$
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
Found 13 items
-rw-r--r--   3 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
-rw-r--r--   3 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
-rw-r--r--   3 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
-rw-r--r--   3 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
-rw-r--r--   3 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
-rw-r--r--   3 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
-rw-r--r--   3 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
-rw-r--r--   3 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:20:48 CST 2019
......
Status: HEALTHY
 Total size:    397764951 B
 Total dirs:    205
 Total files:   1635
 Total symlinks:    0
 Total blocks (validated):    1614 (avg. block size 246446 B)
 Minimally replicated blocks:    1614 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0      # Files under this directory currently have a replication factor of 3
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      4
 Number of racks:       1
FSCK ended at Sat Jun 15 18:20:48 CST 2019 in 78 milliseconds

The filesystem under path '/yinzhengjie/data' is HEALTHY
[hdfs@node101.yinzhengjie.org.cn ~]$
2>. Change the replication factor of the files in an HDFS directory to 2
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -setrep 2 -R -w /yinzhengjie/data/
......
Replication 2 set: /yinzhengjie/data/man/man1/javadoc.1
Replication 2 set: /yinzhengjie/data/man/man1/javafxpackager.1
Replication 2 set: /yinzhengjie/data/man/man1/javah.1
Replication 2 set: /yinzhengjie/data/man/man1/javap.1
Replication 2 set: /yinzhengjie/data/man/man1/javapackager.1
Replication 2 set: /yinzhengjie/data/man/man1/javaws.1
Replication 2 set: /yinzhengjie/data/man/man1/jcmd.1
Replication 2 set: /yinzhengjie/data/man/man1/jconsole.1
Replication 2 set: /yinzhengjie/data/man/man1/jdb.1
Replication 2 set: /yinzhengjie/data/man/man1/jdeps.1
Replication 2 set: /yinzhengjie/data/man/man1/jhat.1
Replication 2 set: /yinzhengjie/data/man/man1/jinfo.1
Replication 2 set: /yinzhengjie/data/man/man1/jjs.1
Replication 2 set: /yinzhengjie/data/man/man1/jmap.1
Replication 2 set: /yinzhengjie/data/man/man1/jmc.1
Replication 2 set: /yinzhengjie/data/man/man1/jps.1
Replication 2 set: /yinzhengjie/data/man/man1/jrunscript.1
Replication 2 set: /yinzhengjie/data/man/man1/jsadebugd.1
Replication 2 set: /yinzhengjie/data/man/man1/jstack.1
Replication 2 set: /yinzhengjie/data/man/man1/jstat.1
Replication 2 set: /yinzhengjie/data/man/man1/jstatd.1
Replication 2 set: /yinzhengjie/data/man/man1/jvisualvm.1
Replication 2 set: /yinzhengjie/data/man/man1/keytool.1
Replication 2 set: /yinzhengjie/data/man/man1/native2ascii.1
Replication 2 set: /yinzhengjie/data/man/man1/orbd.1
Replication 2 set: /yinzhengjie/data/man/man1/pack200.1
Replication 2 set: /yinzhengjie/data/man/man1/policytool.1
Replication 2 set: /yinzhengjie/data/man/man1/rmic.1
Replication 2 set: /yinzhengjie/data/man/man1/rmid.1
Replication 2 set: /yinzhengjie/data/man/man1/rmiregistry.1
Replication 2 set: /yinzhengjie/data/man/man1/schemagen.1
Replication 2 set: /yinzhengjie/data/man/man1/serialver.1
Replication 2 set: /yinzhengjie/data/man/man1/servertool.1
Replication 2 set: /yinzhengjie/data/man/man1/tnameserv.1
Replication 2 set: /yinzhengjie/data/man/man1/unpack200.1
Replication 2 set: /yinzhengjie/data/man/man1/wsgen.1
Replication 2 set: /yinzhengjie/data/man/man1/wsimport.1
Replication 2 set: /yinzhengjie/data/man/man1/xjc.1
Replication 2 set: /yinzhengjie/data/release
Replication 2 set: /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$
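The task asks for the larger files of a directory, while the transcript changes every file under /yinzhengjie/data/. A sketch of the narrower variant, assuming bash and that paths contain no spaces (the awk field split would break otherwise): it reads the byte-sized output of hdfs dfs -ls -R, keeps regular files (permission string starting with "-") whose size column is at least a chosen threshold, and prints their paths for -setrep. The helper name large_files and the 1 MiB threshold in the example are illustrative choices. Note that -setrep already recurses into directories (-R is accepted only for backward compatibility), and -w makes the command wait until the blocks actually reach the new replication, which can take a long time on big directories.

```bash
#!/usr/bin/env bash
# Read `hdfs dfs -ls -R` output (without -h) on stdin; print files of at least min_bytes.
# Field layout: perms replication owner group size date time path.
large_files() {
  local min_bytes="$1"
  awk -v min="$min_bytes" '$1 ~ /^-/ && $5 >= min { print $8 }'
}
# Example (on a cluster node, as hdfs; xargs -r is GNU):
#   hdfs dfs -ls -R /yinzhengjie/data | large_files 1048576 | xargs -r hdfs dfs -setrep -w 2
```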
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
Found 13 items
-rw-r--r--   2 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
-rw-r--r--   2 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
-rw-r--r--   2 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
-rw-r--r--   2 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
-rw-r--r--   2 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
-rw-r--r--   2 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
-rw-r--r--   2 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
-rw-r--r--   2 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:24:03 CST 2019
......
Status: HEALTHY
 Total size:    397764951 B
 Total dirs:    205
 Total files:   1635
 Total symlinks:    0
 Total blocks (validated):    1614 (avg. block size 246446 B)
 Minimally replicated blocks:    1614 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.0      # Files under this directory now have a replication factor of 2 (the cluster default is still 3)
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      4
 Number of racks:       1
FSCK ended at Sat Jun 15 18:24:03 CST 2019 in 32 milliseconds

The filesystem under path '/yinzhengjie/data' is HEALTHY
[hdfs@node101.yinzhengjie.org.cn ~]$
IV. Raise the HDFS replication factor (from 2 to 3)
Problem description:
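During a routine check of the cluster, you find that a few important files have only two replicas, even though the cluster's default replication factor is 3 and has never been changed. Fix the insufficient replication of the files under "/yinzhengjie/data/".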
Solution:
You should be fluent with the basic HDFS commands; if an interview tests them, it is almost always a giveaway.
1>. Change the replication factor of all files in the directory
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
Found 13 items
-rw-r--r--   2 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
-rw-r--r--   2 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
-rw-r--r--   2 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
-rw-r--r--   2 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
-rw-r--r--   2 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
-rw-r--r--   2 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
-rw-r--r--   2 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
-rw-r--r--   2 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:24:03 CST 2019
......
Status: HEALTHY
 Total size:    397764951 B
 Total dirs:    205
 Total files:   1635
 Total symlinks:    0
 Total blocks (validated):    1614 (avg. block size 246446 B)
 Minimally replicated blocks:    1614 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.0      # The current replication factor is 2
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      4
 Number of racks:       1
FSCK ended at Sat Jun 15 18:24:03 CST 2019 in 32 milliseconds

The filesystem under path '/yinzhengjie/data' is HEALTHY
[hdfs@node101.yinzhengjie.org.cn ~]$
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -setrep 3 /yinzhengjie/data/
......
Replication 3 set: /yinzhengjie/data/man/man1/javap.1
Replication 3 set: /yinzhengjie/data/man/man1/javapackager.1
Replication 3 set: /yinzhengjie/data/man/man1/javaws.1
Replication 3 set: /yinzhengjie/data/man/man1/jcmd.1
Replication 3 set: /yinzhengjie/data/man/man1/jconsole.1
Replication 3 set: /yinzhengjie/data/man/man1/jdb.1
Replication 3 set: /yinzhengjie/data/man/man1/jdeps.1
Replication 3 set: /yinzhengjie/data/man/man1/jhat.1
Replication 3 set: /yinzhengjie/data/man/man1/jinfo.1
Replication 3 set: /yinzhengjie/data/man/man1/jjs.1
Replication 3 set: /yinzhengjie/data/man/man1/jmap.1
Replication 3 set: /yinzhengjie/data/man/man1/jmc.1
Replication 3 set: /yinzhengjie/data/man/man1/jps.1
Replication 3 set: /yinzhengjie/data/man/man1/jrunscript.1
Replication 3 set: /yinzhengjie/data/man/man1/jsadebugd.1
Replication 3 set: /yinzhengjie/data/man/man1/jstack.1
Replication 3 set: /yinzhengjie/data/man/man1/jstat.1
Replication 3 set: /yinzhengjie/data/man/man1/jstatd.1
Replication 3 set: /yinzhengjie/data/man/man1/jvisualvm.1
Replication 3 set: /yinzhengjie/data/man/man1/keytool.1
Replication 3 set: /yinzhengjie/data/man/man1/native2ascii.1
Replication 3 set: /yinzhengjie/data/man/man1/orbd.1
Replication 3 set: /yinzhengjie/data/man/man1/pack200.1
Replication 3 set: /yinzhengjie/data/man/man1/policytool.1
Replication 3 set: /yinzhengjie/data/man/man1/rmic.1
Replication 3 set: /yinzhengjie/data/man/man1/rmid.1
Replication 3 set: /yinzhengjie/data/man/man1/rmiregistry.1
Replication 3 set: /yinzhengjie/data/man/man1/schemagen.1
Replication 3 set: /yinzhengjie/data/man/man1/serialver.1
Replication 3 set: /yinzhengjie/data/man/man1/servertool.1
Replication 3 set: /yinzhengjie/data/man/man1/tnameserv.1
Replication 3 set: /yinzhengjie/data/man/man1/unpack200.1
Replication 3 set: /yinzhengjie/data/man/man1/wsgen.1
Replication 3 set: /yinzhengjie/data/man/man1/wsimport.1
Replication 3 set: /yinzhengjie/data/man/man1/xjc.1
Replication 3 set: /yinzhengjie/data/release
Replication 3 set: /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$
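On a larger directory it may be preferable to touch only the files that are actually below the target. A sketch, assuming bash and paths without spaces: it filters the output of hdfs dfs -ls -R on the replication column (field 2, which shows "-" for directories) and prints files whose factor is below the wanted value. under_replicated is a hypothetical helper name. Without -w, -setrep returns as soon as the new factor is recorded and the extra replicas are created in the background, which is why the result is checked with fsck afterwards.

```bash
#!/usr/bin/env bash
# Read `hdfs dfs -ls -R` output on stdin; print files whose replication is below target.
# Field 2 is the replication factor for files and "-" for directories.
under_replicated() {
  local target="$1"
  awk -v t="$target" '$1 ~ /^-/ && $2 + 0 < t + 0 { print $8 }'
}
# Example (on a cluster node, as hdfs; xargs -r is GNU):
#   hdfs dfs -ls -R /yinzhengjie/data | under_replicated 3 | xargs -r hdfs dfs -setrep -w 3
```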
2>. Verify that the replication factor was changed successfully
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
Found 13 items
-rw-r--r--   3 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
-rw-r--r--   3 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
-rw-r--r--   3 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
-rw-r--r--   3 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
-rw-r--r--   3 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
-rw-r--r--   3 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
-rw-r--r--   3 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
-rw-r--r--   3 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:37:24 CST 2019
......
Status: HEALTHY
 Total size:    397764951 B
 Total dirs:    205
 Total files:   1635
 Total symlinks:    0
 Total blocks (validated):    1614 (avg. block size 246446 B)
 Minimally replicated blocks:    1614 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0      # Files under this directory are back to a replication factor of 3
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      4
 Number of racks:       1
FSCK ended at Sat Jun 15 18:37:24 CST 2019 in 17 milliseconds

The filesystem under path '/yinzhengjie/data' is HEALTHY
[hdfs@node101.yinzhengjie.org.cn ~]$
V. Copy an HDFS file to another directory with a specified block size
Problem description:
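You notice that some large files in the cluster have a 64MB block size, so MapReduce jobs that read them spawn more map tasks by default, which wastes resources.
You decide to back these files up to another directory with a 128MB block size. Copy the files under "/yinzhengjie/data/input" to "/yinzhengjie/data/output" with a 128MB block size.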
Solution:
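This task tests your understanding of HDFS: besides the cluster-wide default, the block size can be set per file, but once a file has been written its block size cannot be changed; the only option is to write a new copy.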
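Why the block size affects MapReduce: with the default FileInputFormat behaviour a splittable file yields roughly one input split, and thus one map task, per block, so the map count is about ceil(file size / block size). A small sketch of that arithmetic, assuming the split size equals the block size (the usual case when no minimum or maximum split size is configured); blocks_for is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Number of HDFS blocks for a file (about the number of map tasks when split size == block size).
# Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive b.
blocks_for() {
  local size_bytes="$1" block_bytes="$2"
  echo $(( (size_bytes + block_bytes - 1) / block_bytes ))
}
MiB=$((1024 * 1024))
# Example: a 1 GiB file needs 16 blocks at 64 MB but only 8 at 128 MB.
#   blocks_for $((1024 * MiB)) $((64 * MiB))
```

The 64384-byte file used in the steps below fits in a single block at either size, so this walkthrough shows the mechanics rather than an actual reduction in map tasks.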
1>. Prepare test data: copy an HDFS file to another directory with a 64MB block size
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -mkdir -p /yinzhengjie/data/input
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls /yinzhengjie/debug/hdfs/log
Found 1 items
-rw-r--r--   3 root supergroup      64384 2019-06-15 16:37 /yinzhengjie/debug/hdfs/log/timestamp_1560583829
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -Ddfs.block.size=67108864 -cp /yinzhengjie/debug/hdfs/log/timestamp_1560583829 /yinzhengjie/data/input
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls /yinzhengjie/data/input
Found 1 items
-rw-r--r--   3 hdfs supergroup      64384 2019-06-15 18:44 /yinzhengjie/data/input/timestamp_1560583829
[hdfs@node101.yinzhengjie.org.cn ~]$
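2>. Check the cluster's default block size (as shown in the figure below, the default block size here is already 256MB, so the block size must be specified explicitly when making the backup; if the default were 128MB, the parameter could be omitted)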
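Besides the Cloudera Manager page, the configured default can be read on a node with the HDFS client configuration via hdfs getconf -confKey dfs.blocksize. The value may be plain bytes (e.g. 268435456) or carry a size suffix such as 256m, which dfs.blocksize accepts. A sketch, assuming bash, that normalizes either form to bytes so it can be compared with 134217728; to_bytes is a hypothetical helper and only handles the k, m, g and t suffixes.

```bash
#!/usr/bin/env bash
# Convert a Hadoop size string (plain bytes, or a k/m/g/t suffix in any case) to bytes.
to_bytes() {
  local v num unit
  v="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"   # lower-case the value
  num="${v%[kmgt]}"                                      # strip one trailing suffix letter, if any
  unit="${v:${#num}}"                                    # the stripped suffix ("" when none)
  case "$unit" in
    "") echo "$num" ;;
    k)  echo $(( num * 1024 )) ;;
    m)  echo $(( num * 1024 ** 2 )) ;;
    g)  echo $(( num * 1024 ** 3 )) ;;
    t)  echo $(( num * 1024 ** 4 )) ;;
  esac
}
# Example (on a cluster node):
#   [ "$(to_bytes "$(hdfs getconf -confKey dfs.blocksize)")" -eq 134217728 ] || echo "pass -D dfs.blocksize=134217728"
```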
3>. Create the backup directory and copy the data into it
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -mkdir /yinzhengjie/data/output
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -Ddfs.block.size=134217728 -cp /yinzhengjie/data/input/timestamp_1560583829 /yinzhengjie/data/output
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls /yinzhengjie/data/input
Found 1 items
-rw-r--r--   3 hdfs supergroup      64384 2019-06-15 18:44 /yinzhengjie/data/input/timestamp_1560583829
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls /yinzhengjie/data/output
Found 1 items
-rw-r--r--   3 hdfs supergroup      64384 2019-06-15 18:59 /yinzhengjie/data/output/timestamp_1560583829
[hdfs@node101.yinzhengjie.org.cn ~]$
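The transcript copies a single file and never shows the resulting block size, since -ls does not display it. A sketch that backs up every file in the input directory with an explicit block size and then prints each copy's block size (%o), replication (%r) and name (%n) with hdfs dfs -stat. It uses dfs.blocksize, the current name of the key; dfs.block.size as used above is the older, deprecated alias. It assumes bash and a user allowed to write the target; DRY_RUN and backup_with_blocksize are hypothetical.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

# backup_with_blocksize <src dir> <dst dir> <block size in bytes>
backup_with_blocksize() {
  local src="$1" dst="$2" bs="$3"
  run hdfs dfs -mkdir -p "$dst" || return 1
  # The glob is quoted so that HDFS, not the local shell, expands it.
  # Generic options such as -D must come before the subcommand (-cp).
  run hdfs dfs -D dfs.blocksize="$bs" -cp "$src/*" "$dst/" || return 1
  # -ls does not show block size; -stat does (%o block size, %r replication, %n name).
  run hdfs dfs -stat "%o %r %n" "$dst/*"
}
```

For this task: backup_with_blocksize /yinzhengjie/data/input /yinzhengjie/data/output 134217728, where each printed block size should be 134217728.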
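This article is from cnblogs, author: Yin Zhengjie. Please credit the original link when reposting: https://www.cnblogs.com/yinzhengjie/p/10995701.html. Personal WeChat: "JasonYin2020" (when adding, please state where you found it and your purpose; paid consultation).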