Setting Up a Single-Node Hadoop 2.6 Pseudo-Distributed Cluster in Docker
1 Get a simple base Docker image and create a container
1.1 Here I choose to download the CentOS image
docker pull centos
1.2 Rename the downloaded image to centos with the docker tag command, then create a simple container
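For reference, a hypothetical tag command would look like this (the source name docker.io/centos is an assumption; it depends on how your Docker version names pulled images):
docker tag docker.io/centos centos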
docker run -it --name=client1 centos /bin/bash
2 Download and install Java in the Docker container
2.1 Download the JDK
Go to the Oracle website and pick the JDK to download:
http://www.oracle.com/technetwork/java/javase/archive-139210.html
After picking a version, select the Linux x64 tar package on the download page, right-click it, and open it in a new tab.
Copy the download URL from that new tab; it is what wget will fetch.
Then download it with wget, attaching the special cookie that Oracle requires for JDK downloads:
wget --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F" http://download.oracle.com/otn/java/jdk/7u80-b15/jdk-7u80-linux-x64.tar.gz
2.2 Install the JDK
Choose an installation directory, extract the tarball there, and create a symlink:
[root@f795f10ac377 java]# pwd
/usr/local/java
[root@f795f10ac377 java]# tar -zxvf jdk-7u80-linux-x64.tar.gz
[root@f795f10ac377 java]# ln -s jdk1.7.0_80 jdk
[root@f795f10ac377 java]# ls
jdk jdk1.7.0_80
Set the environment variables: edit /etc/profile and append the following at the end:
export JAVA_HOME=/usr/local/java/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
After saving and exiting, run source /etc/profile (or . /etc/profile) so the changes take effect immediately. Then check the Java version with java -version:
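If everything is in place, the output should look roughly like this (for 7u80; the exact build strings here are from memory and may differ):
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)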
3 Install and configure SSH
3.1 Install via yum
yum install openssh-server
yum install openssh-clients
3.2 Start sshd. It fails to start properly, with errors about missing host keys.
[root@f795f10ac377 hadoop]# /usr/sbin/sshd
Could not load host key: /etc/ssh/ssh_host_rsa_key
Could not load host key: /etc/ssh/ssh_host_ecdsa_key
Could not load host key: /etc/ssh/ssh_host_ed25519_key
The fix is to generate the missing host keys:
# In ssh-keygen, -q runs in quiet mode (no progress output) and -t sets the type of host key to generate
[root@f795f10ac377 hadoop]# ssh-keygen -q -t rsa -b 2048 -f /etc/ssh/ssh_host_rsa_key -N ''
[root@f795f10ac377 hadoop]# ssh-keygen -q -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key -N ''
[root@f795f10ac377 hadoop]# ssh-keygen -t dsa -f /etc/ssh/ssh_host_ed25519_key -N ''
Generating public/private dsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_ed25519_key.
Your public key has been saved in /etc/ssh/ssh_host_ed25519_key.pub.
The key fingerprint is:
d4:8a:c2:e0:75:cf:fc:2b:46:b2:8a:b4:d9:a2:8b:7a root@f795f10ac377
The key's randomart image is:
+--[ DSA 1024]----+
|                 |
|        .        |
|   . . . . .     |
|  . + . * .      |
|   . o . S       |
|     ..  ..      |
|      . + .      |
|..E= . o .       |
|*++.o. . ..      |
+-----------------+
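Two side notes: as the output shows, the last command above actually generated a DSA key into the ed25519 file name; and on OpenSSH versions that support it, a single command creates every missing default host key, avoiding such mismatches:
ssh-keygen -A    # generate all missing default host keys in one step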
3.3 Set the root password and test logging in to the local machine.
[root@f795f10ac377 ssh]# /usr/sbin/sshd # start the sshd service
[root@f795f10ac377 ssh]# netstat -tnulp # check that it started successfully
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 158/sshd
tcp6 0 0 :::22 :::* LISTEN 158/sshd
[root@f795f10ac377 ssh]# passwd root # set the root account's password
Changing password for user root.
New password:
BAD PASSWORD: The password fails the dictionary check - it does not contain enough DIFFERENT characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@f795f10ac377 ssh]# ssh root@localhost # log in to the local machine over ssh
root@localhost's password:
[root@f795f10ac377 ~]# ls
anaconda-ks.cfg
[root@f795f10ac377 ~]# exit # log out
logout
Connection to localhost closed.
3.4 Set up passwordless login to the local machine
[root@f795f10ac377 ~]# ssh-keygen # defaults to -t rsa, generating a public/private key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): # press Enter to use the default directory and file name
Enter passphrase (empty for no passphrase): # press Enter for an empty passphrase
Enter same passphrase again: # press Enter to confirm
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
c5:3b:1f:cc:07:d5:6b:e3:12:57:c2:84:a3:2e:97:32 root@f795f10ac377
The key's randomart image is:
+--[ RSA 2048]----+
|       +o.       |
|      . o.o     o|
|       o...    .o|
|      ..+ o =    |
|      S.o.+ * .  |
|      E +o + .   |
|       =  . .    |
|                 |
|                 |
+-----------------+
The public and private keys now exist under ~/.ssh/:
[root@f795f10ac377 .ssh]# pwd
/root/.ssh
[root@f795f10ac377 .ssh]# ls
id_rsa  id_rsa.pub
Create the authorized_keys file:
[root@f795f10ac377 .ssh]# touch authorized_keys
[root@f795f10ac377 .ssh]# cat id_rsa.pub >> authorized_keys
[root@f795f10ac377 .ssh]# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8e4J0ogiNuH3SC8/pXKa5gfHZxWY+soo9wOBoYnkq2HCga/p/cmhpy87mO+IZHLdyskH8TylK2/tFaovbnWNDXHE7uH2gToPjbQG0wCOoRWYy0Irz++wmK64eMTsVYF0L4/AEF6l46iYonQ1RT9xCC/BNgcKaiPNnNlu2O5jMw1ZQCJGg5IDT9RGFms/aYw/cblafYRkwF14keULWpGHFQgyiNthFP/1faaWIu9KJqBr9I93FXWE3cD7F05M/EGV0cRlrVnPOUD5oLUS7y+useBm3Cu8IRUy5SvaJ1qoUb78fX1ExhUFcewt4D1K9XNsFGTi6a4Q60RN7jTjHvRm/ root@f795f10ac377
Test passwordless login to the local machine:
[root@f795f10ac377 /]# ssh root@localhost # logs straight in, no password prompt
[root@f795f10ac377 ~]# exit
logout
Connection to localhost closed.
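If ssh still prompts for a password at this point, the usual culprit is file permissions: sshd's StrictModes check rejects keys whose files or parent directories are group- or world-writable. Tightening the modes is a safe fix:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys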
4 Download, install, and configure Hadoop
4.1 Download the Hadoop 2.6 package in the container with curl. The download address is: http://apache.fayea.com/hadoop/common/hadoop-2.6.0/
curl -o hadoop-2.6.0.tar.gz http://apache.fayea.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
4.2 Extract and rename
The rest of this guide assumes the result lives at /usr/local/hadoop, so run this under /usr/local:
tar -zxvf hadoop-2.6.0.tar.gz
mv hadoop-2.6.0 hadoop
4.3 Add the Hadoop-related environment variables
Edit /etc/profile, add a HADOOP_HOME variable, and add Hadoop's bin and sbin directories to PATH (sbin holds the start-dfs.sh and start-yarn.sh scripts used later):
export HADOOP_HOME=/usr/local/hadoop
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
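After saving, reload the profile and confirm that the hadoop command is found:
source /etc/profile
hadoop version # should report Hadoop 2.6.0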
4.4 Configure pseudo-distributed mode
Edit the /usr/local/hadoop/etc/hadoop/core-site.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/data/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
Edit the /usr/local/hadoop/etc/hadoop/hdfs-site.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/data/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/data/hadoop/dfs/data</value>
    </property>
</configuration>
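One pitfall worth noting (a precaution from common practice, not a step in the original walkthrough): the Hadoop scripts do not always inherit JAVA_HOME from /etc/profile. If a later step fails with "JAVA_HOME is not set", edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh and replace its default export line with the concrete path:
# in /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk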
4.5 Format the NameNode
[root@f795f10ac377 bin]# ./hdfs namenode -format
./hdfs: line 28: which: command not found
dirname: missing operand
Try 'dirname --help' for more information.
/usr/local/hadoop/bin/../libexec/hdfs-config.sh: line 21: which: command not found
Although the format itself succeeds, the errors above show that the which command is missing. Its absence would affect later steps, so install it:
[root@f795f10ac377 /]# yum install which
Run the format again:
[root@f795f10ac377 bin]# ./hdfs namenode -format
16/08/06 16:12:08 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = f795f10ac377/172.17.0.2
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
................................
16/08/06 16:12:11 INFO common.Storage: Storage directory /data/hadoop/dfs/name has been successfully formatted. # formatted successfully
16/08/06 16:12:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/08/06 16:12:11 INFO util.ExitUtil: Exiting with status 0
16/08/06 16:12:11 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at f795f10ac377/172.17.0.2
************************************************************/
4.6 Start HDFS and YARN
[root@f795f10ac377 hadoop]# start-dfs.sh
[root@f795f10ac377 hadoop]# start-yarn.sh
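A note in passing: start-dfs.sh and start-yarn.sh launch each daemon by logging in over ssh, which is exactly why the passwordless login from section 3 is needed. sshd also does not start automatically inside a container, so after a container restart, bring it back up before running these scripts:
/usr/sbin/sshd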
4.7 Testing
Check which Java processes are running:
[root@f795f10ac377 hadoop]# jps
1322 DataNode
1460 SecondaryNameNode
2068 Jps
1692 ResourceManager
1775 NodeManager
1214 NameNode
Check the listening ports:
# If network commands such as netstat and ifconfig are missing, install them with yum install net-tools
[root@f795f10ac377 hadoop]# netstat -tnulp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1322/java
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 1214/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1460/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1214/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 304/sshd
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1322/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1322/java
tcp6 0 0 :::8032 :::* LISTEN 1692/java
tcp6 0 0 :::37216 :::* LISTEN 1775/java
tcp6 0 0 :::8033 :::* LISTEN 1692/java
tcp6 0 0 :::8040 :::* LISTEN 1775/java
tcp6 0 0 :::8042 :::* LISTEN 1775/java
tcp6 0 0 :::22 :::* LISTEN 304/sshd
tcp6 0 0 :::8088 :::* LISTEN 1692/java
tcp6 0 0 :::8030 :::* LISTEN 1692/java
tcp6 0 0 :::8031 :::* LISTEN 1692/java
Check the HDFS status:
[root@f795f10ac377 hadoop]# hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
16/08/06 16:32:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 10726932480 (9.99 GB)
Present Capacity: 9665077248 (9.00 GB)
DFS Remaining: 9665073152 (9.00 GB)
DFS Used: 4096 (4 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (1):

Name: 127.0.0.1:50010 (localhost)
Hostname: f795f10ac377
Decommission Status : Normal
Configured Capacity: 10726932480 (9.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 1061855232 (1012.66 MB)
DFS Remaining: 9665073152 (9.00 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.10%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Aug 06 16:32:59 UTC 2016
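As a final smoke test (a minimal exercise of my own, beyond the original steps), create a directory in HDFS, upload a file, and list it back:
hdfs dfs -mkdir -p /user/root
hdfs dfs -put /usr/local/hadoop/etc/hadoop/core-site.xml /user/root/
hdfs dfs -ls /user/root # should list core-site.xml
The web UIs are also reachable from the Docker host at the container IP printed during formatting (172.17.0.2 here): port 50070 for the NameNode and port 8088 for the YARN ResourceManager.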