Hadoop Pseudo-Distributed Setup (Part 1)

The following explains how to set up a Hadoop pseudo-distributed environment in a virtual machine running on a Windows host, how to run the wordcount program, and how to browse the HDFS file system from a web page.

 

1 Downloading and installing the required software

 

The Hadoop build on the Apache website is 32-bit. To run it in a 64-bit Linux environment you would need to recompile Hadoop yourself; the network-disk link below provides a 64-bit build.

This article uses Hadoop 2.0.0-cdh4.2.1. Download: http://pan.baidu.com/s/1gdsC1TT

Operating system: 64-bit Ubuntu. Download: http://pan.baidu.com/s/14XVI2

JDK: java version "1.7.0_79". Download: http://pan.baidu.com/s/1sjuua1B

 

Readers can of course download other stable releases from the official site; for more information, see the references at the end of this article.

A stable Hadoop release can be downloaded from http://archive.apache.org/dist/hadoop/core/stable/ ; go up a level in that URL to browse other Hadoop versions.

 

 

2 Creating a new user

For example, create a user named sms.
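A minimal sketch of the command, assuming the standard useradd utility:

useradd -d /home/sms -m sms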

This command creates the user sms; the -d and -m options give the login name sms a home directory /home/sms (in general, /home is the default parent directory for user home directories).

 

Assign the user to a group, for example oinstall.

Note that usermod -g groupname username changes the user's primary group and overwrites the old one. To put the user in multiple groups, use usermod -G group1,group2,... username instead, as sketched below.
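A short example, assuming the group oinstall exists (the hadoop group here is hypothetical):

usermod -g oinstall sms          # set sms's primary group to oinstall (overwrites the old one)
usermod -G oinstall,hadoop sms   # hypothetical: set several supplementary groups at once
                                 # note: -G replaces the supplementary group list; add -a to append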
 

As root, set the login password for the sms user:

#passwd sms

 

3 After creating the user, configure the relevant environment variables

 

3.1 Configure the user's personal environment variables

Open ~/.profile in vim and append the following content at the end.

3.1.1 My JDK is installed in /opt/jdk1.7.0_79

Configure the JAVA_HOME environment variable:

#java settings
export JAVA_HOME=/opt/jdk1.7.0_79    # change this to your own Java installation path
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

3.1.2 Configure the HADOOP_HOME environment variable

My Hadoop is installed in /home/sms/hadoop.

#hadoop settings
export HADOOP_HOME=/home/sms/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

 

Run source ~/.profile to make the environment variables take effect immediately.

 

Verify that Java and Hadoop are configured correctly:

java -version     # check the Java version
hadoop version    # check the Hadoop version

3.1.3 Map the hostname to its IP address

ifconfig                               # check the local IP address

vim /etc/hosts                         # map the hostname to the IP address (see the example entry below)

sudo apt-get install openssh-server    # install the ssh server

sudo ufw disable                       # disable the firewall
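For example, the /etc/hosts entry matching the hostname and IP address used later in this article would be:

192.168.67.133 ubuntu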

 

4 The Hadoop directory after installation is shown below

 

The tmp directory is newly added; simply create it with mkdir tmp.

 

4.1 Go into etc/hadoop and edit the Hadoop configuration files

 

Edit the five files highlighted in the green box: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.

4.1.1 Edit hadoop-env.sh

Change the JAVA_HOME in hadoop-env.sh to your actual Java installation path, as shown below.
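Using the JDK path from section 3.1.1, the relevant line in hadoop-env.sh would read:

export JAVA_HOME=/opt/jdk1.7.0_79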

 

4.1.2 Edit core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://ubuntu:9000</value>  <!-- "ubuntu" is my hostname; replace it with your own IP or hostname -->
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/sms/hadoop/tmp</value>
    </property>
</configuration>

4.1.3 Edit hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

4.1.4 Edit mapred-site.xml
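If mapred-site.xml does not exist yet (some Hadoop distributions ship only a template), you may need to create it first; this assumes the template file is present:

cp mapred-site.xml.template mapred-site.xml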

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

4.1.5 Edit yarn-site.xml

<?xml version="1.0"?>
<configuration>

<!-- Site specific YARN configuration properties -->
    
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce.shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>ubuntu</value>
    </property>

</configuration>

 

5 Passwordless SSH login

1 Generate an RSA key pair with an empty passphrase:
ssh-keygen -t rsa -P ""

2 Append id_rsa.pub to the authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

3 Verify that passwordless login works:
ssh localhost

 

 

6 Initializing and starting Hadoop

core-site.xml specifies the hadoop.tmp.dir directory, so we must create the tmp directory first, as shown in the figure in section 4.
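With the path configured in core-site.xml, a sketch of the command:

mkdir -p /home/sms/hadoop/tmp    # -p also creates any missing parent directories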

 

1 Initialize Hadoop:

hadoop namenode -format    # format the NameNode

2 Start the HDFS file system:

start-dfs.sh

3 Start the YARN framework:

start-yarn.sh

4 Check the Java daemon processes:

jps


 

As shown in the figure, all five daemons (NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager) must appear in the jps output; if any of them is missing, something went wrong.

 

7 Testing the wordcount program

Go into hadoop/share/hadoop/mapreduce under the Hadoop installation directory, where you will find the file hadoop-mapreduce-examples-2.0.0-cdh4.2.1.jar.

7.1 Create a file named hello
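Any small text file will do as input; a minimal example (the exact contents here are hypothetical):

echo "hello world hello hadoop" > hello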

 

7.2 Upload the file to the HDFS system

hadoop fs -put hello test    # upload the local file hello into HDFS as the file test
hadoop fs -ls test           # check the uploaded file test

 

7.3 Run hadoop-mapreduce-examples-2.0.0-cdh4.2.1.jar

Go into the hadoop/share/hadoop/mapreduce directory:

hadoop jar hadoop-mapreduce-examples-2.0.0-cdh4.2.1.jar wordcount test test-out    # run the wordcount program that ships with the distribution

 

 

 

7.4 View the results

hadoop fs -ls test-out    # on success, Hadoop creates the files part-r-00000 and _SUCCESS; the results are stored in part-r-00000

 

hadoop fs -cat test-out/part-r-00000    # view the results
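With the hypothetical input suggested in section 7.1, the output would look something like:

hadoop  1
hello   2
world   1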

 

 

7.5 Viewing the results in a web browser

On the Windows host, edit the hosts file under C:\Windows\System32\drivers\etc and add the virtual machine's IP address and hostname.

# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host


192.168.67.133 ubuntu    # newly added entry: the VM's IP address and hostname

 

In a browser on the Windows host, open http://192.168.67.133:50070 (or http://ubuntu:50070 once the hosts entry is in place) to view the HDFS file system.

 

 

 

8 Troubleshooting

...

 

References

0 CDH to Apache Hadoop version mapping

http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH-Version-and-Packaging-Information/cdhvd_cdh_package_tarball.html#topic_3_unique_8

1 Relationships between historical versions

http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH-Version-and-Packaging-Information/cdhvd_cdh_package_previous.html

2 Hadoop family, Storm, Spark, Linux, Flume jar and installation package downloads (continuously updated)

http://www.aboutyun.com/thread-8178-1-1.html

3 Notes on connecting from Eclipse on Win7 to Hadoop 2.4 in an Ubuntu virtual machine

http://www.aboutyun.com/thread-7784-1-1.html

4 Various exceptions encountered during Hadoop installation and their solutions (1)

http://www.it165.net/admin/html/201409/3704.html


5 Solutions to common Hadoop problems
http://www.aboutyun.com/home.php?mod=space&uid=61&do=blog&view=me&from=space

 

6 Official Hadoop installation tutorial

http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

 

7 Official wordcount example

http://hadoop.apache.org/docs/r2.7.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0

 

posted @ 2015-05-25 11:51 kongmeng