Hadoop Cluster Configuration Guide
Contents
1. Preparation
2. Install the JDK
3. Set up passwordless SSH login among the three machines
4. Install Hadoop
5. Configure *-site.xml (identical on all nodes)
6. Configure the master and slave node lists (identical on all nodes)
7. Format HDFS
8. Start Hadoop
9. Test WordCount
1. Preparation
Machines: one master and two slaves.
Software:
Hadoop version: 1.0.2
JDK version: 1.7.0_03
Install path: /usr/local/webserver/hadoop
Data directory: /data/hadoop/
Cluster information:
Hostname | IP address  | Role
master   | 192.168.1.1 | NameNode, JobTracker
slave1   | 192.168.1.2 | DataNode, TaskTracker
slave2   | 192.168.1.3 | DataNode, TaskTracker
First, give each machine a unique hostname.
On 192.168.1.1:
[root@localhost src]# echo "master" > hostname
[root@localhost src]# mv hostname /etc/hostname
[root@localhost src]# hostname -F /etc/hostname
# check
[root@localhost src]# hostname
master
On 192.168.1.2:
[root@localhost src]# echo "slave1" > hostname
[root@localhost src]# mv hostname /etc/hostname
[root@localhost src]# hostname -F /etc/hostname
# check
[root@localhost src]# hostname
slave1
On 192.168.1.3:
[root@localhost src]# echo "slave2" > hostname
[root@localhost src]# mv hostname /etc/hostname
[root@localhost src]# hostname -F /etc/hostname
# check
[root@localhost src]# hostname
slave2
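Note that hostname -F only changes the running hostname; whether it survives a reboot depends on the distribution. On RHEL/CentOS-style systems (an assumption here, since the JDK in section 2 is installed from an RPM), the boot-time name comes from /etc/sysconfig/network, so it can also be set there; a minimal sketch for master (use slave1 / slave2 on the other machines):
[root@localhost src]# sed -i 's/^HOSTNAME=.*/HOSTNAME=master/' /etc/sysconfig/network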
Configure /etc/hosts on every machine so that the hosts can reach one another by hostname, for example:
[root@localhost src]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1   localhost.localdomain localhost
::1         localhost6.localdomain6 localhost6
# hadoop
192.168.1.1 master
192.168.1.2 slave1
192.168.1.3 slave2
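A quick way to confirm that name resolution works (a sketch; run it on each of the three machines):
[root@localhost src]# for h in master slave1 slave2; do ping -c 1 $h > /dev/null && echo "$h ok"; done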
2. Install the JDK
Step 1: install the package.
[root@localhost src]# rpm -ivh jdk-7u3-linux-x64.rpm
Step 2: set the environment variables.
[root@localhost src]# vi /etc/profile
# append at the end:
# set java environment
JAVA_HOME=/usr/java/jdk1.7.0_03
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
Run the following command so the environment variables take effect:
[root@localhost src]# source /etc/profile
Step 3: use the echo command in a terminal to check the environment variables.
[root@localhost src]# echo $JAVA_HOME
/usr/java/jdk1.7.0_03
[root@localhost src]# echo $CLASSPATH
.:/usr/java/jdk1.7.0_03/lib/tools.jar
[root@localhost src]# echo $PATH
/usr/java/jdk1.7.0_03/bin:/usr/java/jdk-1_5_0_02/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
Step 4: verify that the JDK installed successfully.
[root@localhost src]# java -version
java version "1.7.0_03"
Java(TM) SE Runtime Environment (build 1.7.0_03-b04)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)
If the JVM version and related information are shown, the installation succeeded.
3. Set up passwordless SSH login among the three machines
Note: in our test environment, StrictHostKeyChecking in /etc/ssh/ssh_config has already been changed to no. The command below checks the setting; if yours differs, changing this file will make the following steps go more smoothly.
[root@localhost src]# cat /etc/ssh/ssh_config | grep StrictHostKeyChecking
StrictHostKeyChecking no
Perform the following on "master".
Generate the key and copy it to the other nodes:
[root@localhost src]# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
[root@localhost src]# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
[root@localhost src]# scp -r ~/.ssh slave1:~/
[root@localhost src]# scp -r ~/.ssh slave2:~/
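Note that scp -r ~/.ssh replaces whatever already exists in ~/.ssh on the slaves. If the slaves have their own keys to keep, appending only the public key is gentler; a sketch using ssh-copy-id (shipped with OpenSSH on most Linux distributions):
[root@localhost src]# ssh-copy-id root@slave1
[root@localhost src]# ssh-copy-id root@slave2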
Test that you can log in without being asked for a password:
[root@localhost src]# ssh slave1
[root@localhost src]# ssh master
[root@localhost src]# ssh slave2
[root@localhost src]# ssh master
[root@localhost src]# exit
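If a password is still requested, sshd is usually rejecting the key because the permissions on ~/.ssh are too open. Tightening them on every node normally fixes it:
[root@localhost src]# chmod 700 ~/.ssh
[root@localhost src]# chmod 600 ~/.ssh/authorized_keys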
4. Install Hadoop
[root@localhost src]# tar zxvf hadoop-1.0.2.tar.gz
[root@localhost src]# mkdir /usr/local/webserver
[root@localhost src]# mv hadoop-1.0.2 /usr/local/webserver/hadoop
[root@localhost src]# cd /usr/local/webserver/hadoop
Add export JAVA_HOME=/usr/java/jdk1.7.0_03 to conf/hadoop-env.sh:
[root@localhost hadoop]# vim conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_03
Check that Hadoop is installed correctly:
[root@localhost hadoop]# bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  fetchdt              fetch a delegation token from the NameNode
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  historyserver        run job history servers as a standalone daemon
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl>   copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest>   create a hadoop archive
  classpath            prints the class path needed to get the Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
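Optionally, put Hadoop's bin directory on the PATH so that the hadoop command can be run from any directory (a convenience sketch using the install path from section 1):
[root@localhost hadoop]# echo 'export PATH=/usr/local/webserver/hadoop/bin:$PATH' >> /etc/profile
[root@localhost hadoop]# source /etc/profile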
5. Configure *-site.xml (identical on all nodes)
Three configuration files need to be edited: core-site.xml, hdfs-site.xml, and mapred-site.xml.
[root@localhost hadoop]# vim conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/hadoop_home/var</value>
  </property>
</configuration>
[root@localhost hadoop]# vim conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/name1,/data/hadoop/name2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/data1,/data/hadoop/data2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
[root@localhost hadoop]# vim conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/hadoop_home/var</value>
  </property>
</configuration>
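All of the directories referenced above live under /data/hadoop. Hadoop generally creates them on demand (namenode -format creates the name directories and the DataNodes create the data directories), but pre-creating the parents on every node avoids ownership and permission surprises; a sketch matching the values above:
[root@localhost hadoop]# mkdir -p /data/hadoop /data/hadoop/hadoop_home/var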
6. Configure the master and slave node lists (identical on all nodes)
Edit conf/masters and conf/slaves to define the master and slave nodes. Use hostnames, one per line, and make sure the machines can reach one another by those hostnames.
[root@localhost hadoop]# vim conf/masters
master
[root@localhost hadoop]# vim conf/slaves
slave1
slave2
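Sections 5 and 6 both require identical configuration on every node. After editing on master, the simplest approach is to copy the whole conf directory to the slaves (a sketch, assuming the same install path on every machine):
[root@localhost hadoop]# scp conf/* slave1:/usr/local/webserver/hadoop/conf/
[root@localhost hadoop]# scp conf/* slave2:/usr/local/webserver/hadoop/conf/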
7. Format HDFS
The Hadoop environment is now configured. Before starting the services, format HDFS; this is a one-time step and is performed only on the NameNode (master):
[root@localhost hadoop]# bin/hadoop namenode -format
12/04/17 11:21:55 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
************************************************************/
12/04/17 11:21:55 INFO util.GSet: VM type       = 64-bit
12/04/17 11:21:55 INFO util.GSet: 2% max memory = 17.77875 MB
12/04/17 11:21:55 INFO util.GSet: capacity      = 2^21 = 2097152 entries
12/04/17 11:21:55 INFO util.GSet: recommended=2097152, actual=2097152
12/04/17 11:21:56 INFO namenode.FSNamesystem: fsOwner=root
12/04/17 11:21:56 INFO namenode.FSNamesystem: supergroup=supergroup
12/04/17 11:21:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/04/17 11:21:56 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/04/17 11:21:56 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/04/17 11:21:56 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/04/17 11:21:56 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/04/17 11:21:56 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
12/04/17 11:21:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
(This sample output was captured with default settings; with the hdfs-site.xml above, the storage directories reported as formatted will be /data/hadoop/name1 and /data/hadoop/name2.)
8. Start Hadoop
Next, use start-all.sh to start all of the services: the NameNode, SecondaryNameNode and JobTracker on master, and a DataNode and TaskTracker on each slave.
[root@localhost hadoop]# bin/start-all.sh
starting namenode, logging to /usr/local/webserver/hadoop/libexec/../logs/hadoop-root-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /usr/local/webserver/hadoop/libexec/../logs/hadoop-root-datanode-localhost.localdomain.out
localhost: starting secondarynamenode, logging to /usr/local/webserver/hadoop/libexec/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /usr/local/webserver/hadoop/libexec/../logs/hadoop-root-jobtracker-localhost.localdomain.out
localhost: starting tasktracker, logging to /usr/local/webserver/hadoop/libexec/../logs/hadoop-root-tasktracker-localhost.localdomain.out
Done! Check the running status.
After startup, you can open the following pages to verify that the services are working (replace localhost with master when browsing from another machine):
http://localhost:50030/ - Hadoop Map/Reduce administration (JobTracker)
http://localhost:50060/ - Hadoop TaskTracker status
http://localhost:50070/ - Hadoop DFS (NameNode) status
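The same information is available from the command line; a quick sketch (jps ships with the JDK):
[root@localhost hadoop]# jps                          # on master: NameNode, SecondaryNameNode, JobTracker
[root@localhost hadoop]# bin/hadoop dfsadmin -report  # should list both DataNodes as live
Run jps on slave1 and slave2 as well; each should show a DataNode and a TaskTracker.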
9. Test WordCount
[root@localhost hadoop]# mkdir input
[root@localhost hadoop]# echo "hello world" >> input/a.txt
[root@localhost hadoop]# echo "hello hadoop" >> input/b.txt
Copy the local data into HDFS:
[root@localhost hadoop]# bin/hadoop fs -put input in
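Optionally list the uploaded files to confirm the copy worked:
[root@localhost hadoop]# bin/hadoop fs -ls in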
Run the test job (in the capture below, the relative path "in" resolved to hdfs://localhost:9000/user/root/in):
[root@localhost hadoop]# bin/hadoop jar hadoop-examples-1.0.2.jar wordcount in out
12/04/16 20:37:27 INFO input.FileInputFormat: Total input paths to process : 2
12/04/16 20:37:27 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/04/16 20:37:27 WARN snappy.LoadSnappy: Snappy native library not loaded
12/04/16 20:37:27 INFO mapred.JobClient: Running job: job_201204161711_0008
12/04/16 20:37:28 INFO mapred.JobClient:  map 0% reduce 0%
12/04/16 20:37:42 INFO mapred.JobClient:  map 100% reduce 0%
12/04/16 20:37:54 INFO mapred.JobClient:  map 100% reduce 100%
12/04/16 20:37:59 INFO mapred.JobClient: Job complete: job_201204161711_0008
12/04/16 20:37:59 INFO mapred.JobClient: Counters: 29
12/04/16 20:37:59 INFO mapred.JobClient:   Job Counters
12/04/16 20:37:59 INFO mapred.JobClient:     Launched reduce tasks=1
12/04/16 20:37:59 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=19236
12/04/16 20:37:59 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/04/16 20:37:59 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/04/16 20:37:59 INFO mapred.JobClient:     Launched map tasks=2
12/04/16 20:37:59 INFO mapred.JobClient:     Data-local map tasks=2
12/04/16 20:37:59 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=11028
12/04/16 20:37:59 INFO mapred.JobClient:   File Output Format Counters
12/04/16 20:37:59 INFO mapred.JobClient:     Bytes Written=25
12/04/16 20:37:59 INFO mapred.JobClient:   FileSystemCounters
12/04/16 20:37:59 INFO mapred.JobClient:     FILE_BYTES_READ=55
12/04/16 20:37:59 INFO mapred.JobClient:     HDFS_BYTES_READ=235
12/04/16 20:37:59 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=64792
12/04/16 20:37:59 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
12/04/16 20:37:59 INFO mapred.JobClient:   File Input Format Counters
12/04/16 20:37:59 INFO mapred.JobClient:     Bytes Read=25
12/04/16 20:37:59 INFO mapred.JobClient:   Map-Reduce Framework
12/04/16 20:37:59 INFO mapred.JobClient:     Map output materialized bytes=61
12/04/16 20:37:59 INFO mapred.JobClient:     Map input records=2
12/04/16 20:37:59 INFO mapred.JobClient:     Reduce shuffle bytes=31
12/04/16 20:37:59 INFO mapred.JobClient:     Spilled Records=8
12/04/16 20:37:59 INFO mapred.JobClient:     Map output bytes=41
12/04/16 20:37:59 INFO mapred.JobClient:     CPU time spent (ms)=2900
12/04/16 20:37:59 INFO mapred.JobClient:     Total committed heap usage (bytes)=602996736
12/04/16 20:37:59 INFO mapred.JobClient:     Combine input records=4
12/04/16 20:37:59 INFO mapred.JobClient:     SPLIT_RAW_BYTES=210
12/04/16 20:37:59 INFO mapred.JobClient:     Reduce input records=4
12/04/16 20:37:59 INFO mapred.JobClient:     Reduce input groups=3
12/04/16 20:37:59 INFO mapred.JobClient:     Combine output records=4
12/04/16 20:37:59 INFO mapred.JobClient:     Physical memory (bytes) snapshot=499847168
12/04/16 20:37:59 INFO mapred.JobClient:     Reduce output records=3
12/04/16 20:37:59 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1718542336
12/04/16 20:37:59 INFO mapred.JobClient:     Map output records=4
View the results:
[root@localhost hadoop]# bin/hadoop fs -cat out/*
hadoop  1
hello   2
world   1
cat: File does not exist: /user/root/out/_logs
(The final error is harmless: out/_logs is a directory of job logs, not a regular file, so fs -cat cannot print it.)
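To see only the reducer output and skip the _logs directory, cat the part file directly (a sketch; part-r-00000 is the usual output file name for this example job):
[root@localhost hadoop]# bin/hadoop fs -cat out/part-r-00000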
Reference blog post: 實作七: Hadoop 叢集安裝 (Lab 7: Hadoop Cluster Installation).