Hadoop Study Notes: Installing Hadoop

sudo mv /home/common/下载/hadoop-2.7.2.tar.gz /usr/local
cd /usr/local
sudo tar -xzvf hadoop-2.7.2.tar.gz
sudo mv hadoop-2.7.2 hadoop    # rename
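
The unpack-and-rename steps can also be wrapped in a small function so they are repeatable. This is only a sketch: install_hadoop and its three arguments are hypothetical names, and it assumes the archive has a single top-level directory, as the release tarball does.

```shell
#!/bin/sh
# Sketch: unpack a Hadoop tarball into a prefix and rename the result.
# install_hadoop and its arguments are illustrative, not part of Hadoop.
install_hadoop() {
  tarball=$1   # e.g. /usr/local/hadoop-2.7.2.tar.gz
  prefix=$2    # e.g. /usr/local
  name=$3      # e.g. hadoop

  # Name of the top-level directory inside the archive.
  top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  [ -n "$top" ] || return 1

  # Never overwrite an existing installation.
  if [ -e "$prefix/$name" ]; then
    echo "refusing to overwrite $prefix/$name" >&2
    return 1
  fi

  tar -xzf "$tarball" -C "$prefix" || return 1
  [ "$top" = "$name" ] || mv "$prefix/$top" "$prefix/$name"
}

# Example (needs write access to /usr/local, e.g. in a root shell):
#   install_hadoop /usr/local/hadoop-2.7.2.tar.gz /usr/local hadoop
```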

Add the following to /etc/profile:

export HADOOP_HOME=/usr/local/hadoop
export PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
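
When these lines are added by a script rather than by hand, it helps to append each one only if it is not already there, so that re-running the setup does not pile up duplicates. A minimal sketch; append_once is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: append a line to a file only when that exact line is missing.
# append_once is an illustrative helper, not a standard command.
append_once() {
  file=$1
  line=$2
  # -F fixed string, -x whole-line match, -q quiet
  grep -Fxq -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run in a root shell, since /etc/profile is owned by root).
# Single quotes keep $PATH and $HADOOP_HOME unexpanded in the file:
#   append_once /etc/profile 'export HADOOP_HOME=/usr/local/hadoop'
#   append_once /etc/profile 'export PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin'
```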

1. Edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_121

2. Edit /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>

        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/lintong/software/apache/hadoop-2.9.1/tmp</value>
        </property>
        <property>
                <name>hadoop.native.lib</name>
                <value>false</value>
        </property>

</configuration>

Add your machine's external IP address to /etc/hosts:

XXXX    master
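
Since hdfs://master:9000 depends on this entry, it is worth checking which address the name master maps to in the hosts file. The sketch below prints the first match; host_ip is a hypothetical helper, and it only reads the file, it does not query DNS.

```shell
#!/bin/sh
# Sketch: print the IP address that a hosts file assigns to a host name.
# host_ip is an illustrative helper name.
host_ip() {
  hosts_file=$1
  host=$2
  awk -v h="$host" '
    { sub(/#.*/, "") }   # strip comments before looking at fields
    { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }
  ' "$hosts_file"
}

# Example:
#   host_ip /etc/hosts master
```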

If a project needs to access HDFS, add a core-site.xml file to its resources directory:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>

</configuration>

 

3. Edit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<configuration>

        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.name.dir</name>
                <value>file:/home/lintong/software/apache/hadoop-2.9.1/tmp/dfs/name</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>file:/home/lintong/software/apache/hadoop-2.9.1/tmp/dfs/data</value>
        </property>
        <property>
                <name>dfs.namenode.checkpoint.dir</name>
                <value>file:/home/lintong/software/apache/hadoop-2.9.1/tmp/dfs/namenode</value>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>

</configuration>
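
The file: values above point at local directories that must be writable by the user running the daemons. One way to make sure of that is to create them up front. The sketch below pulls the paths out of the file with sed; it assumes each value element sits on one line, as in the listing above, and is not a real XML parser. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: list and create the local directories named by file: values.
# Assumes one <value>file:...</value> element per line.
dfs_local_paths() {
  sed -n 's|.*<value>file:\(.*\)</value>.*|\1|p' "$1"
}

make_dfs_dirs() {
  dfs_local_paths "$1" | while IFS= read -r dir; do
    mkdir -p "$dir" || exit 1
  done
}

# Example:
#   make_dfs_dirs /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```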

 

4. Edit /usr/local/hadoop/etc/hadoop/mapred-site.xml (the file derived from mapred-site.xml.template)

<configuration>

        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>

</configuration>
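
If mapred-site.xml does not exist yet, it can be created from the template before editing. A small sketch that leaves an existing file untouched; ensure_mapred_site is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: create mapred-site.xml from its template unless it already exists.
# ensure_mapred_site is an illustrative helper name.
ensure_mapred_site() {
  conf_dir=$1
  [ -f "$conf_dir/mapred-site.xml" ] && return 0
  cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
}

# Example:
#   ensure_mapred_site /usr/local/hadoop/etc/hadoop
```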

 

5. Edit /usr/local/hadoop/etc/hadoop/yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>

</configuration>

 

6. Apply the changes in /etc/profile

source /etc/profile
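
Note that source (or its POSIX spelling, a single dot) is a shell builtin. It has to run in the shell whose environment should change, so running the file through sudo or in a child shell has no effect on the current session. The demo below uses a temporary file and a made-up variable, DEMO_HADOOP_HOME, so it does not touch any real settings.

```shell
#!/bin/sh
# Demo: a child shell cannot change the parent's environment; sourcing can.
# DEMO_HADOOP_HOME is a made-up variable used only for this illustration.
unset DEMO_HADOOP_HOME
profile=$(mktemp)
printf '%s\n' 'export DEMO_HADOOP_HOME=/usr/local/hadoop' > "$profile"

sh "$profile"                                  # runs in a child process
echo "after child shell: ${DEMO_HADOOP_HOME:-unset}"   # prints "after child shell: unset"

. "$profile"                                   # runs in the current shell
echo "after sourcing: $DEMO_HADOOP_HOME"       # prints "after sourcing: /usr/local/hadoop"

rm -f "$profile"
```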

Contents of /etc/profile:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_121
export JRE_HOME=${JAVA_HOME}/jre 
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib 
export PATH=${JAVA_HOME}/bin:$PATH

export PATH=/usr/local/texlive/2015/bin/x86_64-linux:$PATH 
export MANPATH=/usr/local/texlive/2015/texmf-dist/doc/man:$MANPATH 
export INFOPATH=/usr/local/texlive/2015/texmf-dist/doc/info:$INFOPATH

export HADOOP_HOME=/usr/local/hadoop
export PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

export M2_HOME=/opt/apache-maven-3.3.9
export M2=$M2_HOME/bin
export PATH=$M2:$PATH

export GRADLE_HOME=/opt/gradle/gradle-3.4.1
export PATH=$GRADLE_HOME/bin:$PATH

Contents of ~/.bashrc:

export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL

For SSH and Hadoop user setup, see:

http://www.cnblogs.com/CheeseZH/p/5051135.html

http://www.powerxing.com/install-hadoop/

Passwordless login

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
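
The cat command assumes that a key pair already exists, and running it twice adds the key twice. A slightly more defensive sketch follows; authorize_key is a hypothetical helper, and the ssh-keygen line in the comments assumes an RSA key with an empty passphrase is acceptable on this machine.

```shell
#!/bin/sh
# Sketch: add a public key to an authorized_keys file exactly once,
# with the restrictive permissions that sshd expects.
# authorize_key is an illustrative helper name.
authorize_key() {
  pub=$1
  auth=$2
  dir=$(dirname "$auth")
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  touch "$auth" && chmod 600 "$auth" || return 1
  grep -Fxq -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
}

# Example:
#   [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   ssh localhost
```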

 

<i> If the DataNode fails to start, see:

http://www.aboutyun.com/thread-12803-1-1.html

Check the log files in Hadoop's logs directory, then edit the VERSION file under /usr/local/hadoop/tmp/dfs/data/current accordingly.
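
A frequent cause of this problem, though not necessarily the one in every case, is that the clusterID recorded in the DataNode's VERSION file no longer matches the NameNode's after the NameNode has been reformatted. The sketch below compares the two values. The function names are hypothetical, and the NameNode path in the example is an assumption based on the data path above.

```shell
#!/bin/sh
# Sketch: compare the clusterID lines of two VERSION files.
# cluster_id and same_cluster are illustrative helper names.
cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

same_cluster() {
  a=$(cluster_id "$1")
  b=$(cluster_id "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (the name/ path is assumed, adjust it to your configuration):
#   same_cluster /usr/local/hadoop/tmp/dfs/name/current/VERSION \
#                /usr/local/hadoop/tmp/dfs/data/current/VERSION \
#     || echo "clusterID mismatch"
```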

 

<ii> Fixing "Error: JAVA_HOME is not set and could not be found" when starting Hadoop on Ubuntu

In /usr/local/hadoop/etc/hadoop/hadoop-env.sh, set JAVA_HOME to an absolute path.
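
Before writing the path into hadoop-env.sh, it can be sanity-checked: it should be absolute and contain an executable bin/java. A minimal sketch; check_java_home is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that a candidate JAVA_HOME is absolute and contains bin/java.
# check_java_home is an illustrative helper name.
check_java_home() {
  case $1 in
    /*) ;;
    *) echo "not an absolute path: $1" >&2; return 1 ;;
  esac
  if [ ! -x "$1/bin/java" ]; then
    echo "no executable found at $1/bin/java" >&2
    return 1
  fi
}

# Example:
#   check_java_home /usr/lib/jvm/jdk1.8.0_121 && echo "JAVA_HOME looks usable"
```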

 

Permissions on the Hadoop directory

 

Format a new distributed file system

hdfs namenode -format

Run Hadoop
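
The notes do not list the start commands. On a single-node setup with Hadoop's sbin directory on PATH, the usual sequence is start-dfs.sh followed by start-yarn.sh, after which jps shows the running Java processes. The sketch below reports which daemons are absent from jps output. missing_daemons is a hypothetical helper, and the list of expected daemons is an assumption for a single-node HDFS plus YARN setup.

```shell
#!/bin/sh
# Sketch: read `jps` output on stdin and print every expected daemon
# that does not appear in it. The daemon list is an assumption for a
# single-node HDFS + YARN setup.
missing_daemons() {
  jps_out=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>"; compare the name column exactly.
    printf '%s\n' "$jps_out" |
      awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' || echo "$d"
  done
}

# Example:
#   start-dfs.sh
#   start-yarn.sh
#   jps | missing_daemons
```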

Run a Hadoop example

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 5

Output

Number of Maps  = 2
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Starting Job
17/03/26 11:49:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/03/26 11:49:47 INFO input.FileInputFormat: Total input paths to process : 2
17/03/26 11:49:47 INFO mapreduce.JobSubmitter: number of splits:2
17/03/26 11:49:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1490497943530_0002
17/03/26 11:49:48 INFO impl.YarnClientImpl: Submitted application application_1490497943530_0002
17/03/26 11:49:48 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1490497943530_0002/
17/03/26 11:49:48 INFO mapreduce.Job: Running job: job_1490497943530_0002
17/03/26 11:49:55 INFO mapreduce.Job: Job job_1490497943530_0002 running in uber mode : false
17/03/26 11:49:55 INFO mapreduce.Job:  map 0% reduce 0%
17/03/26 11:50:02 INFO mapreduce.Job:  map 100% reduce 0%
17/03/26 11:50:08 INFO mapreduce.Job:  map 100% reduce 100%
17/03/26 11:50:08 INFO mapreduce.Job: Job job_1490497943530_0002 completed successfully
17/03/26 11:50:08 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=353898
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=524
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters 
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=9536
		Total time spent by all reduces in occupied slots (ms)=3259
		Total time spent by all map tasks (ms)=9536
		Total time spent by all reduce tasks (ms)=3259
		Total vcore-milliseconds taken by all map tasks=9536
		Total vcore-milliseconds taken by all reduce tasks=3259
		Total megabyte-milliseconds taken by all map tasks=9764864
		Total megabyte-milliseconds taken by all reduce tasks=3337216
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=288
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=319
		CPU time spent (ms)=2570
		Physical memory (bytes) snapshot=719585280
		Virtual memory (bytes) snapshot=5746872320
		Total committed heap usage (bytes)=513802240
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=236
	File Output Format Counters 
		Bytes Written=97
Job Finished in 21.472 seconds
Estimated value of Pi is 3.60000000000000000000
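
The estimate of 3.6 is coarse because only 2 x 5 = 10 points are sampled. As far as I understand, the example draws its points from a Halton sequence (bases 2 and 3) and counts how many fall inside a circle inscribed in the unit square. Under that assumption, the arithmetic can be reproduced with the sketch below; estimate_pi is a hypothetical name and this is not the example's actual code.

```shell
#!/bin/sh
# Sketch: quasi-Monte Carlo estimate of pi from the first n Halton points
# (bases 2 and 3). An assumed re-creation of the arithmetic, not Hadoop code.
estimate_pi() {
  awk -v n="$1" '
    # Radical inverse of i in the given base: one coordinate of a Halton point.
    function halton(i, base,    f, r) {
      f = 1; r = 0
      while (i > 0) {
        f = f / base
        r = r + f * (i % base)
        i = int(i / base)
      }
      return r
    }
    BEGIN {
      inside = 0
      for (k = 1; k <= n; k++) {
        x = halton(k, 2) - 0.5
        y = halton(k, 3) - 0.5
        if (x * x + y * y <= 0.25) inside++   # circle of radius 0.5
      }
      printf "%.1f\n", 4 * inside / n
    }'
}

estimate_pi 10   # prints 3.6 (9 of the 10 points fall inside the circle)
```

Raising the number of maps or samples per map in the pi command brings the estimate closer to 3.14.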

 

Open the web interface at http://localhost:50070 to view NameNode and DataNode information and to browse the files in HDFS online.

 

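Once YARN is running, examples are launched the same way as before; only resource management and task scheduling differ. The logs show the difference: without YARN, jobs are run by "mapred.LocalJobRunner"; with YARN, they are run by "mapred.YARNRunner". One benefit of running YARN is that job status can be followed in the web interface: http://localhost:8088/cluster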

Click history to inspect each job. If master:19888 cannot be reached, start the JobHistory server (the script lives in Hadoop's sbin directory):

mr-jobhistory-daemon.sh start historyserver

 

Leaving HDFS safe mode

bin/hdfs dfsadmin -safemode leave

  

For Hadoop's architecture, see this post:

Hadoop HDFS概念学习系列之初步掌握HDFS的架构及原理1(一)

For how HDFS reads data, see this post:

Hadoop HDFS概念学习系列之初步掌握HDFS的架构及原理2(二)

For how HDFS writes data, see this post:

Hadoop HDFS概念学习系列之初步掌握HDFS的架构及原理3(三)

For the role of the SecondaryNameNode (SNN) in Hadoop, see:

http://blog.csdn.net/xh16319/article/details/31375197

 

posted @ 2017-03-10 22:44  tonglin0325