Hadoop Study Notes - Pseudo-Distributed Mode
Pseudo-Distributed Mode
P.S. (a note up front): these are personal study notes, kept mainly for my own reference, so some parts may look cryptic.
Preparation
- Environment
  - Set the IP address and hostname
  - Disable the firewall and SELinux
  - Configure the hosts mapping
- Hadoop
  - Java environment (JAVA_HOME)
  - Passwordless SSH login
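The environment steps above can be sketched roughly as follows on a CentOS-style system; the IP address `192.168.1.101` is a placeholder for illustration, while `node01` is the hostname used in the configs later in these notes:

```shell
# Set the hostname used throughout these notes
hostnamectl set-hostname node01

# Hosts mapping: point the hostname at this machine's IP (example IP, adjust to yours)
echo "192.168.1.101 node01" >> /etc/hosts

# Disable the firewall
systemctl stop firewalld
systemctl disable firewalld

# Disable SELinux now and across reboots
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
```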
SSH problems encountered
- The root account cannot log in:
  - Edit /etc/ssh/sshd_config and change
    PermitRootLogin prohibit-password
    to: PermitRootLogin yes
  - Restart the ssh service
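On a systemd-based system (e.g. CentOS 7), the restart is typically:

```shell
# Reload sshd so the PermitRootLogin change takes effect
systemctl restart sshd
```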
- Passwordless login:

  ```shell
  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys
  ```

  - If B's authorized_keys contains A's public key, then A can log in to B without a password
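To extend this to a second host (so that A can reach B without a password), A's public key has to end up in B's authorized_keys; one common way, assuming B is reachable as `node01`:

```shell
# Append the local public key to node01's authorized_keys
# (asks for the password once; logins are passwordless afterwards)
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node01
```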
Setting up Hadoop (configuration)
- Configure the Hadoop environment (set JAVA_HOME in hadoop-env.sh)
- In core-site.xml, replace localhost with the hostname:

  ```xml
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://node01:9000</value>
    </property>
  </configuration>
  ```
- hdfs-site.xml:

  ```xml
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/var/bigdata/hadoop/local/dfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/var/bigdata/hadoop/local/dfs/data</value>
    </property>
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>node01:50090</value>
    </property>
    <property>
      <name>dfs.namenode.checkpoint.dir</name>
      <value>/var/bigdata/hadoop/local/dfs/secondary</value>
    </property>
  </configuration>
  ```
- slaves / workers: configures which hosts DataNodes start on (the file is named slaves in Hadoop 2.x, workers in 3.x)
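For this single-node setup the file holds just the one hostname; a sketch, assuming `$HADOOP_HOME` points at the Hadoop install and the 3.x filename `workers`:

```shell
# Pseudo-distributed: the DataNode runs on the same host as the NameNode
echo "node01" > "$HADOOP_HOME/etc/hadoop/workers"
```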
Initialization and startup
- Initialization: hdfs namenode -format
  - Creates the metadata directories
  - Initializes an empty fsimage
  - Writes a VERSION file that stores the cluster ID
- Signs of success:

  ```
  2020-10-24 13:24:34,088 INFO common.Storage: Storage directory /var/bigdata/hadoop/local/dfs/name has been successfully formatted.
  2020-10-24 13:24:34,217 INFO namenode.FSImageFormatProtobuf: Saving image file /var/bigdata/hadoop/local/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
  2020-10-24 13:24:34,440 INFO namenode.FSImageFormatProtobuf: Image file /var/bigdata/hadoop/local/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 399 bytes saved in 0 seconds .
  ```
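To see what the format actually produced, look inside the name directory (the path matches dfs.namenode.name.dir configured above):

```shell
# Inspect the freshly formatted metadata directory
ls /var/bigdata/hadoop/local/dfs/name/current

# VERSION stores the clusterID that the other roles will adopt on first start
cat /var/bigdata/hadoop/local/dfs/name/current/VERSION
```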
- Startup (start-dfs.sh)
  - Error when running as root:

    ```
    Starting namenodes on [node01]
    ERROR: Attempting to operate on hdfs namenode as root
    ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
    Starting datanodes
    ERROR: Attempting to operate on hdfs datanode as root
    ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
    Starting secondary namenodes [node01]
    ERROR: Attempting to operate on hdfs secondarynamenode as root
    ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
    ```

  - Fix: declare the user for each role in hadoop-env.sh:

    ```shell
    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    ```
- On the first start-dfs.sh, the DataNode and SecondaryNameNode roles initialize their own data directories
  - The clusterID in each role's VERSION file is the same (copied from the NameNode)
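A quick way to confirm that all three roles share the cluster ID, using the directories configured in hdfs-site.xml above:

```shell
# All three should print the same clusterID
grep clusterID /var/bigdata/hadoop/local/dfs/name/current/VERSION
grep clusterID /var/bigdata/hadoop/local/dfs/data/current/VERSION
grep clusterID /var/bigdata/hadoop/local/dfs/secondary/current/VERSION
```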
- Startup Progress (also visible as a tab in the NameNode web UI)
  - Go into name/current and check that the edit log transaction IDs continue past the fsimage's transaction ID
- hdfs dfs : lists the supported file-system commands
- hdfs dfs -mkdir /bigdata
- hdfs dfs -mkdir -p /user/root (-p creates parent directories recursively)
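A minimal round trip with these commands, assuming the cluster from above is running; `hello.txt` is just an example file name:

```shell
# Upload a local file into the home directory created above, then read it back
echo "hello hadoop" > hello.txt
hdfs dfs -put hello.txt /user/root
hdfs dfs -ls /user/root
hdfs dfs -cat /user/root/hello.txt
```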
- The SecondaryNameNode only needs to copy the latest FSImage plus the incremental EditLog from the NameNode (it merges them into a new FSImage at checkpoint time)
- Block storage on the DataNode, e.g.
  /var/bigdata/hadoop/local/dfs/data/current/BP-1520940053-172.17.0.2-1603517073931/current/finalized/subdir0/subdir0
  - The file is split into four blocks, each accompanied by a checksum (.meta) file
- Custom block size:

  ```shell
  # Generate a test file of roughly 1.9 MB
  for i in `seq 100000`; do echo "hello hadoop $i" >> data.txt; done
  # Upload it with a 1 MB (1048576-byte) block size
  hdfs dfs -D dfs.blocksize=1048576 -put data.txt /user/root
  ```

  Then check the split blocks under
  /var/bigdata/hadoop/local/dfs/data/current/BP-1520940053-172.17.0.2-1603517073931/current/finalized/subdir0/subdir0
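The block layout can also be checked from HDFS itself rather than on the local disk; `hdfs fsck` reports a file's blocks and their locations (the path assumes the upload above went to /user/root):

```shell
# Show how many blocks data.txt was split into and where each replica lives
hdfs fsck /user/root/data.txt -files -blocks -locations
```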