Hadoop Pseudo-Distributed Setup (Part 1)
This article explains how to set up a Hadoop pseudo-distributed cluster in a virtual machine on Windows, how to run the wordcount example, and how to browse the HDFS file system from a web page.
1 Downloading and installing the software
The Hadoop build on the official Apache site is 32-bit. To run it in a 64-bit Linux environment you would need to recompile Hadoop; the network-drive link below provides a 64-bit build.
This article uses Hadoop 2.0.0-cdh4.2.1. Download: http://pan.baidu.com/s/1gdsC1TT
Operating system: 64-bit Ubuntu. Download: http://pan.baidu.com/s/14XVI2
JDK: java version "1.7.0_79". Download: http://pan.baidu.com/s/1sjuua1B
Of course, readers can also download other stable releases from the official site; for more details, see the references at the end of this article.
Stable Hadoop releases are available at http://archive.apache.org/dist/hadoop/core/stable/ ; move up the directory tree to browse other Hadoop versions.
2 Create a new user
For example, create a user named sms.
This command creates a user sms; the -d and -m options create a home directory /home/sms for the login name sms (in general, /home is the default parent directory of user home directories).
Specify the user's group, here oinstall.
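The command itself is not shown above; a minimal sketch, assuming the group oinstall already exists:

useradd -d /home/sms -m -g oinstall sms    # as root: create user sms with home directory /home/sms in group oinstall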
As root, set the login password for the sms user:
#passwd sms
3 After creating the user, configure the environment variables
3.1 Configure the user's environment variables
Open ~/.profile in vim and append the following at the end of the file.
3.1.1 Configure the JAVA_HOME environment variable
My JDK is installed at /opt/jdk1.7.0_79.
# java settings
export JAVA_HOME=/opt/jdk1.7.0_79    # change this to your own Java installation path
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
3.1.2 Configure the HADOOP_HOME environment variable
My Hadoop is installed at /home/sms/hadoop.
# hadoop settings
export HADOOP_HOME=/home/sms/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Run source ~/.profile to make the environment variables take effect immediately.
Verify that Java and Hadoop are configured correctly:
1 java -version      # check the Java version
2 hadoop version     # check the Hadoop version
3.1.3 Map the hostname to the machine's IP
1 ifconfig                               # look up this machine's IP address
2 sudo vim /etc/hosts                    # map the hostname to the IP
3 sudo apt-get install openssh-server    # install the ssh server
4 sudo ufw disable                       # disable the firewall
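For example, assuming the VM's IP is 192.168.67.133 and its hostname is ubuntu (the values used throughout this article), the /etc/hosts entry would be:

192.168.67.133 ubuntu    # hostname-to-IP mapping referenced by the Hadoop configs below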
4 The Hadoop directory after installation
The tmp directory is newly added; create it inside the Hadoop installation directory with mkdir tmp.
4.1 Enter etc/hadoop and modify the Hadoop configuration files
Modify these five files: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
4.1.1 Modify hadoop-env.sh
Set JAVA_HOME in hadoop-env.sh to the Java installation path configured above.
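The original screenshot is omitted here; with the JDK path configured earlier, the relevant line in hadoop-env.sh would read:

export JAVA_HOME=/opt/jdk1.7.0_79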
4.1.2 Modify core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- ubuntu is my hostname; replace it with your own IP or hostname -->
    <value>hdfs://ubuntu:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/sms/hadoop/tmp</value>
  </property>
</configuration>
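A side note: fs.default.name is the legacy name for this property; later Hadoop 2.x releases prefer fs.defaultFS, though the old name still works in this CDH4 release.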
4.1.3 Modify hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
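dfs.replication is set to 1 because a pseudo-distributed cluster runs only a single DataNode; the default replication factor of 3 could never be satisfied on one node.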
4.1.4 Modify mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
4.1.5 Modify yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ubuntu</value>
  </property>
</configuration>
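A side note: the value mapreduce.shuffle matches Hadoop 2.0.0-cdh4.2.1; from Hadoop 2.2.0 onward the auxiliary service is named mapreduce_shuffle, so adjust this value on newer releases.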
5 Passwordless ssh login
1 Generate an RSA key pair with an empty passphrase:
ssh-keygen -t rsa -P ""
2 Append id_rsa.pub to the authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3 Verify that you can log in without a password:
ssh localhost
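If ssh localhost still asks for a password, file permissions are a common cause, since sshd ignores an authorized_keys file that is group- or world-writable. A hedged fix:

chmod 700 ~/.ssh                     # only the owner may access the .ssh directory
chmod 600 ~/.ssh/authorized_keys     # only the owner may read or modify the key list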
6 Hadoop initialization and startup
Because core-site.xml sets hadoop.tmp.dir to /home/sms/hadoop/tmp, the tmp directory must exist first; create it as described in section 4.
1 hadoop namenode -format    # initialize Hadoop: format the NameNode
2 start-dfs.sh               # start the HDFS file system
3 start-yarn.sh              # start the YARN compute framework
4 jps                        # list the running Java daemon processes
jps must show the five Hadoop daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager; if any of them is missing, something went wrong.
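A sketch of the expected jps output (the process IDs are made up and will differ on your machine):

4866 NameNode
5109 DataNode
5360 SecondaryNameNode
5611 ResourceManager
5891 NodeManager
6120 Jps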
7 Test the wordcount program
Go into hadoop/share/hadoop/mapreduce under the Hadoop installation directory; there you will find the file hadoop-mapreduce-examples-2.0.0-cdh4.2.1.jar.
7.1 Create a file named hello
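The original does not show the contents of hello; any text will do. A sketch, whose word counts the later steps assume:

echo "hello world hello hadoop" > hello    # sample input file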
7.2 Upload it to the HDFS file system
1 hadoop fs -put hello test    # upload the file hello to HDFS as a file named test
2 hadoop fs -ls test           # view the uploaded file test
7.3 Run the hadoop-mapreduce-examples-2.0.0-cdh4.2.1.jar package
Change into the hadoop/share/hadoop/mapreduce directory:
hadoop jar hadoop-mapreduce-examples-2.0.0-cdh4.2.1.jar wordcount test test-out    # run the bundled wordcount program
7.4 View the results
hadoop fs -ls test-out                  # on success, Hadoop creates part-r-00000 and _SUCCESS; the results are stored in part-r-00000
hadoop fs -cat test-out/part-r-00000    # view the results
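Assuming the sample hello file sketched in 7.1, the output would be one word per line, followed by a tab and the word's count:

hadoop	1
hello	2
world	1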
7.5 View the results in a web page
On the Windows host, edit the hosts file at C:\Windows\System32\drivers\etc
and add the virtual machine's IP address and hostname.
# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host

192.168.67.133 ubuntu    # newly added VM IP and hostname
Then, in a browser on the Windows host, open http://192.168.67.133:50070 to view the HDFS file system.
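Likewise, the YARN ResourceManager web UI listens on port 8088 by default, so http://192.168.67.133:8088 should show the cluster and its running applications.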
8 Problems encountered
...
References
0 Mapping between CDH and Apache Hadoop versions
1 Relationships between historical versions
2 Hadoop family, Storm, Spark, Linux, Flume jars and install packages (continuously updated)
http://www.aboutyun.com/thread-8178-1-1.html
3 Notes on connecting Eclipse on Win7 to Hadoop 2.4 running in an Ubuntu VM
http://www.aboutyun.com/thread-7784-1-1.html
4 Various exceptions encountered when installing Hadoop and their solutions (1)
http://www.it165.net/admin/html/201409/3704.html
5 Solutions to common Hadoop problems
http://www.aboutyun.com/home.php?mod=space&uid=61&do=blog&view=me&from=space
6 Official Hadoop installation tutorial
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
7 Official wordcount example
http://hadoop.apache.org/docs/r2.7.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0