Hadoop deployment modes (Notes 2)
Hadoop deployment modes:
Local (standalone) mode:
Pseudo-distributed mode:
Turn off the firewall:
service iptables stop       // stop the firewall
service iptables status     // check the firewall status
Configure whether the firewall starts at boot:
chkconfig iptables --list   // list the current runlevel settings
gedit /etc/inittab          // view the runlevel details
chkconfig iptables off      // disable start at boot
Change the IP address: gedit /etc/sysconfig/network-scripts/ifcfg-eth0
Add:
IPADDR="192.168.8.88"
NETMASK="255.255.255.0"
GATEWAY="192.168.8.1"
# the DNS entries are optional
DNS1="8.8.8.8"
DNS2="8.8.4.4"
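After editing ifcfg-eth0 the network service has to be restarted for the new address to take effect; a minimal sketch (these commands are not in the original notes):
service network restart
ifconfig eth0   # confirm that 192.168.8.88 is now in use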
Change the hostname: gedit /etc/sysconfig/network (set HOSTNAME=retacn1)
gedit /etc/hosts and add:
192.168.8.88 retacn1
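The new hostname normally only takes effect after a reboot; to apply it immediately, a small sketch (not recorded in the original notes):
hostname retacn1
hostname   # should now print retacn1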
Set up passwordless SSH login, so that the machines in the cluster can log in to each other without a password
Install SSH on Ubuntu:
apt-get install openssh-client
apt-get install openssh-server
Restart SSH:
sudo /etc/init.d/ssh restart
# enter the user's home directory
retacn@vm:/# cd retacn
# generate a key pair
retacn@vm:~# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
a9:dc:6f:dd:96:04:fb:2c:96:9b:fb:7d:d3:8a:e2:ac root@vm
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| |
| . . |
| S o |
| . o . . |
| o . . * ..|
| oo =o*oo|
| E++o=*o.+|
+-----------------+
# enable passwordless login
retacn@vm:~/.ssh# cp id_rsa.pub authorized_keys
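If SSH still asks for a password afterwards, the key file permissions usually need tightening; a sketch with the commonly required modes (these commands are assumed, not in the original notes):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost   # the first login confirms the host key, afterwards no password should be asked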
Install the JDK
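No JDK commands were recorded here; a minimal sketch, assuming the Sun JDK 1.6u34 self-extracting installer (the file name is an assumption) and the /sdk path that hadoop-env.sh points to below:
# unpack the JDK and move it under /sdk
chmod +x jdk-6u34-linux-i586.bin
./jdk-6u34-linux-i586.bin
mkdir -p /sdk && mv jdk1.6.0_34 /sdk/
# expose JAVA_HOME to all shells
echo 'export JAVA_HOME=/sdk/jdk1.6.0_34' >> /etc/profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
source /etc/profile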
Install Hadoop
Download the release tarball
Extract it to the target directory: root@vm:/software/hadoop# tar zxvf hadoop-0.20.2.tar.gz
The configuration files are as follows:
1 hadoop-env.sh: environment variables
Open the file:
retacn@vm:/software/hadoop/hadoop-0.20.2/conf# gedit hadoop-env.sh
Add the following:
export JAVA_HOME=/sdk/jdk1.6.0_34
2 core-site.xml: core configuration, such as I/O settings shared by HDFS and MapReduce
Add the following (when the namenode runs on a different machine, replace localhost accordingly):
<property>
<!-- namenode -->
<name>fs.default.name</name>
<!-- local port -->
<value>hdfs://localhost:9000</value>
</property>
<!-- temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/tmp/hadoop-${user.name}</value>
</property>
3 hdfs-site.xml: settings for the HDFS daemons: the namenode, the secondary namenode and the datanodes
Add the following:
<!-- where the datanode stores its data -->
<property>
<name>dfs.data.dir</name>
<value>/usr/hadoop-0.20.2/data</value>
</property>
<!-- number of replicas kept for each data block -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
4 mapred-site.xml: settings for the MapReduce daemons, the JobTracker and the TaskTrackers
Add the following (in fully distributed mode, replace localhost):
<!-- job tracker address -->
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapred.job.tmp</name>
<value>/opt/temp</value>
</property>
5 masters: list of machines that run the secondary namenode
root@vm:/software/hadoop/hadoop-0.20.2/conf# cat masters
localhost // the secondary namenode runs on the local machine here
6 slaves: list of machines that run a datanode and a TaskTracker
root@vm:/software/hadoop/hadoop-0.20.2/conf# cat slaves
localhost // node(s) running the datanode and TaskTracker; with multiple nodes, list one host per line
7 hadoop-metrics.properties: properties controlling how Hadoop publishes metrics
8 log4j.properties: properties for the system log files, the namenode audit log and the TaskTracker child process logs
Format the HDFS distributed file system (that is, format the namenode):
root@vm:/software/hadoop/hadoop-0.20.2# bin/hadoop namenode -format
15/09/19 11:14:37 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = vm/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
15/09/19 11:14:38 INFO namenode.FSNamesystem: fsOwner=root,root
15/09/19 11:14:38 INFO namenode.FSNamesystem: supergroup=supergroup
15/09/19 11:14:38 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/09/19 11:14:38 INFO common.Storage: Image file of size 94 saved in 0 seconds.
15/09/19 11:14:38 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
15/09/19 11:14:38 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at vm/127.0.1.1
************************************************************/
Start Hadoop with bin/start-all.sh
root@vm:/software/hadoop/hadoop-0.20.2# bin/start-all.sh
starting namenode, logging to /software/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-vm.out
localhost: starting datanode, logging to /software/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-vm.out
localhost: starting secondarynamenode, logging to /software/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-vm.out
starting jobtracker, logging to /software/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-vm.out
localhost: starting tasktracker, logging to /software/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-vm.out
Check that the daemons started:
root@vm:/software/hadoop/hadoop-0.20.2# /sdk/jdk1.6.0_34/bin/jps
4673 NameNode
6192 SecondaryNameNode
5435 DataNode
7045 TaskTracker
7105 Jps
6269 JobTracker
Stop Hadoop with bin/stop-all.sh
root@vm:/software/hadoop/hadoop-0.20.2# bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
Configure networking between the host and the virtual machine
Enable vmnet1 (host-only), e.g. host address 192.168.8.100
Note: vmnet8 is the NAT adapter
Cluster (fully distributed) mode:
At least three virtual machines are needed
The server can run ESXi, with the VMware client installed on the client machine
Configure the hosts file
Edit /etc/hosts and add entries so that the machines can resolve each other's hostnames to IP addresses:
192.168.8.88 retacn1
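For a real cluster every machine needs an entry on every node; a sketch assuming three machines, where the extra hostnames and addresses are hypothetical:
192.168.8.88 retacn1
192.168.8.89 retacn2
192.168.8.90 retacn3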
Create a Hadoop account
Create a dedicated account for running Hadoop
# add the user
root@vm:/etc# useradd retacn
# set the user's password
root@vm:/etc# passwd retacn
# change the user's home directory
root@vm:/home# usermod -d /home/retacn retacn
# add a group
root@vm:/# groupadd superman
# change the group's GID (groupmod, not groupadd, modifies an existing group)
root@vm:/# groupmod -g 355 superman
# change the user's primary group
root@vm:/# usermod -g superman retacn
Configure passwordless SSH login
The procedure is the same as in the pseudo-distributed section above: generate a key pair with ssh-keygen -t rsa, then cp id_rsa.pub authorized_keys in ~/.ssh.
Note:
Collect the authorized_keys entries from all nodes into a single file and replace each node's authorized_keys with it; the nodes can then log in to one another without a password, as in the sketch below.
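A minimal sketch of that merge, run on retacn1 as the hadoop user (the node names retacn2 and retacn3 are hypothetical):
# gather every node's public key into one authorized_keys file
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
ssh retacn2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh retacn3 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# push the merged file back to the other nodes
scp ~/.ssh/authorized_keys retacn2:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys retacn3:~/.ssh/authorized_keys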
Download and extract the Hadoop release tarball
Configure the namenode by editing the site files
Same as in the pseudo-distributed setup, except that localhost is replaced with the IP address or hostname
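For example, with the namenode and JobTracker both running on retacn1 (the hostname is an assumption based on the hosts entry above), the relevant properties would become:
<!-- core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://retacn1:9000</value>
</property>
<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>retacn1:9001</value>
</property>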
Configure hadoop-env.sh
Same as above
Configure the masters and slaves files
root@vm:/software/hadoop/hadoop-0.20.2/conf# cat masters
localhost
root@vm:/software/hadoop/hadoop-0.20.2/conf# cat slaves
localhost
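In an actual cluster these two files would list the real machines instead of localhost; a sketch assuming retacn1 runs the secondary namenode and retacn2/retacn3 run the datanodes and TaskTrackers (hypothetical layout):
masters:
retacn1
slaves:
retacn2
retacn3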
Copy Hadoop to each node
scp -r ./hadoop-0.20.2 hostname:/home/retacn (the user name, i.e. the user's home directory)
Repeat the above for each additional node
Format the namenode
root@vm:/software/hadoop/hadoop-0.20.2# bin/hadoop namenode -format
15/09/19 11:14:37 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = vm/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
15/09/19 11:14:38 INFO namenode.FSNamesystem: fsOwner=root,root
15/09/19 11:14:38 INFO namenode.FSNamesystem: supergroup=supergroup
15/09/19 11:14:38 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/09/19 11:14:38 INFO common.Storage: Image file of size 94 saved in 0 seconds.
15/09/19 11:14:38 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
15/09/19 11:14:38 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at vm/127.0.1.1
************************************************************/
Start Hadoop with bin/start-all.sh
root@vm:/software/hadoop/hadoop-0.20.2# bin/start-all.sh
starting namenode, logging to /software/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-vm.out
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
starting jobtracker, logging to /software/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-vm.out
localhost: ssh: connect to host localhost port 22: Connection refused
Check that the daemons started (note: the "Connection refused" lines above mean sshd was not reachable on localhost, so the datanode, secondary namenode and TaskTracker were not started):
root@vm:/software/hadoop/hadoop-0.20.2# ls /sdk/jdk1.6.0_34/bin/jps
/sdk/jdk1.6.0_34/bin/jps
root@vm:/software/hadoop/hadoop-0.20.2# /sdk/jdk1.6.0_34/bin/jps
3354 Jps
3183 JobTracker
3063 NameNode
On a datanode, the output would instead look something like:
Jps
DataNode
TaskTracker
Installing Hadoop on Windows
First install Cygwin (installation steps omitted)
Configure the environment variables:
Installation directory:
CYGWIN_HOME d:/tools/cygwin
Path %CYGWIN_HOME%
The CYGWIN variable:
CYGWIN ntsec tty
Run ssh-host-config and answer the prompts in order:
No
Yes
/
No
enter a password
Start the service, either from the Windows service manager or with the command: net start sshd
Stop the service: net stop sshd
Passwordless login:
ssh-keygen
cd ~/.ssh
ls
cp id_rsa.pub authorized_keys
ssh localhost
yes
who
Install Hadoop, same as under Linux
Copy Hadoop to each node
scp -r ./hadoop-0.20.2 hostname:/home/username
Monitoring Hadoop activity through the web interfaces
JobTracker: port 50030
http://192.168.8.88:50030
http://192.168.8.88:50030/jobtracker.jsp
NameNode: port 50070
http://192.168.8.88:50070
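The same information can also be queried from the command line; a small sketch using standard 0.20 commands:
bin/hadoop dfsadmin -report   # capacity and status of each datanode
bin/hadoop fs -ls /           # browse the HDFS root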
Physical location of the stored data (the dfs.data.dir directory configured earlier); see the sketch after the listing:
root@vm:/usr/hadoop-0.20.2/data# ls -lR
.:
total 16
drwxr-xr-x 2 root root 4096 Oct  2 21:30 current
drwxr-xr-x 2 root root 4096 Sep 30 23:39 detach
-rw-r--r-- 1 root root  157 Sep 30 23:39 storage
drwxr-xr-x 2 root root 4096 Oct  2 21:30 tmp
./current:
total 80
# block data file
-rw-r--r-- 1 root root    13 Oct  2 21:22 blk_-1675081577279755485
# block metadata
-rw-r--r-- 1 root root    11 Oct  2 21:22 blk_-1675081577279755485_1006.meta
-rw-r--r-- 1 root root    12 Oct  2 21:22 blk_-2439359072604098835
-rw-r--r-- 1 root root    11 Oct  2 21:22 blk_-2439359072604098835_1007.meta
-rw-r--r-- 1 root root    25 Oct  2 21:30 blk_-4637427120727666256
-rw-r--r-- 1 root root    11 Oct  2 21:30 blk_-4637427120727666256_1013.meta
-rw-r--r-- 1 root root 16819 Oct  2 21:28 blk_5582641510528402204
-rw-r--r-- 1 root root   139 Oct  2 21:28 blk_5582641510528402204_1012.meta
-rw-r--r-- 1 root root  8614 Oct  2 21:30 blk_-5776174436279791856
-rw-r--r-- 1 root root    75 Oct  2 21:30 blk_-5776174436279791856_1013.meta
-rw-r--r-- 1 root root     4 Oct  2 21:22 blk_698400160766810009
-rw-r--r-- 1 root root    11 Oct  2 21:22 blk_698400160766810009_1005.meta
# data file checksums (block verification log)
-rw-r--r-- 1 root root  1159 Oct  2 21:36 dncp_block_verification.log.curr
-rw-r--r-- 1 root root   155 Oct  2 21:21 VERSION
./detach:
total 0
./tmp:
total 0
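To relate these blk_* files back to HDFS files, one can put a file into HDFS and ask the namenode for its block list; a sketch (test.txt is a hypothetical file name):
bin/hadoop fs -put test.txt /test.txt
bin/hadoop fsck /test.txt -files -blocks -locations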