Pseudo-Distributed Installation of Hadoop 1.0.3
1. Create the user account and the Hadoop deployment and data directories
# Create the hadoop group
/usr/sbin/groupadd hadoop
# Create the hadoop user and assign it to the hadoop group
/usr/sbin/useradd hadoop -g hadoop
# Set a password for the hadoop user
passwd hadoop
# Create the Hadoop installation directory
mkdir -p /opt/modules/hadoop/
# Create the Hadoop data directory
mkdir -p /opt/data/hadoop/
# Change ownership of both directories to hadoop
chown -R hadoop:hadoop /opt/modules/hadoop/
chown -R hadoop:hadoop /opt/data/hadoop/
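To confirm the account and ownership took effect, a quick check (standard coreutils, nothing Hadoop-specific) might look like this:
# Show the hadoop user's UID/GID and group membership
id hadoop
# Both directories should be owned by hadoop:hadoop
ls -ld /opt/modules/hadoop/ /opt/data/hadoop/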
2. Configure machine time synchronization
# As root, add a crontab entry that syncs the clock from an NTP pool daily at 01:00
# crontab -e
0 1 * * * /usr/sbin/ntpdate cn.pool.ntp.org
# For reference, the default CentOS pool servers (as listed, commented out, in /etc/ntp.conf):
#server 0.centos.pool.ntp.org
#server 1.centos.pool.ntp.org
#server 2.centos.pool.ntp.org
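To synchronize immediately instead of waiting for the cron job, the same command can be run once by hand as root:
/usr/sbin/ntpdate cn.pool.ntp.org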
3. Configure the machine's network environment
Set the hostname of the first machine to master:
# vi /etc/sysconfig/network
NETWORKING=yes          # enable networking
NETWORKING_IPV6=no
HOSTNAME=master         # hostname
# service network restart
# hostname master
4. Configure the cluster hosts list (local DNS)
vi /etc/hosts
# Append the following entry
192.168.16.220 master
5. Generate SSH login keys
# Switch to the hadoop user
su hadoop
cd /home/hadoop/
# Generate the key pair (no passphrase)
ssh-keygen -q -t rsa -N "" -f /home/hadoop/.ssh/id_rsa
cd .ssh
cat id_rsa.pub > authorized_keys
chmod go-rwx authorized_keys
# Public key: the contents of id_rsa.pub are copied into authorized_keys above
# In a multi-node cluster, also copy id_rsa.pub to node1:/home/hadoop/.ssh/authorized_keys
# Verify
ll -a /home/hadoop/.ssh
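Passwordless login to the local machine should now work; the first connection will prompt once to accept the host key, but never for a password:
ssh master hostname
# expected: prints "master" without asking for a password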
6. Download and extract Hadoop (preferably as the hadoop user)
# Change to the Hadoop installation path
cd /opt/modules/hadoop/
# Download the Hadoop release from hadoop.apache.org
# If it is already downloaded, copy the tarball into the installation directory
cp hadoop-1.0.3.tar.gz /opt/modules/hadoop/
cd /opt/modules/hadoop/
tar -xzvf hadoop-1.0.3.tar.gz
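If the tarball still needs to be fetched, a download from the Apache release archive might look like this (the URL assumes the archive's standard layout for legacy Hadoop releases):
wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz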
7. Install the Java JDK (as root)
# su root
# chmod +x jdk-6u21-linux-i586-rpm.bin
# ./jdk-6u21-linux-i586-rpm.bin
Configure the environment variables:
vi /etc/profile.d/java.sh
export JAVA_HOME=/usr/java/jdk1.6.0_21/
export HADOOP_HOME=/opt/modules/hadoop/hadoop-1.0.3/
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
# Apply the changes to the current shell immediately
source /etc/profile
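A quick sanity check that the JDK and PATH changes took effect:
java -version      # should report java version "1.6.0_21"
hadoop version     # should report Hadoop 1.0.3 once the tarball from step 6 is in place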
8. Check the base environment (preferably as the hadoop user)
# Test commands
/sbin/ifconfig
ping master
ssh master (run as the hadoop user)
jps
echo $JAVA_HOME
echo $HADOOP_HOME
9. Configure environment variables in hadoop-env.sh
# Set HADOOP_HEAPSIZE (default 1000 MB); because the VM has only 512 MB of RAM, use a small value here
vi /opt/modules/hadoop/hadoop-1.0.3/conf/hadoop-env.sh
export HADOOP_HEAPSIZE=32
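Hadoop 1.x ships conf/hadoop-env.sh with a commented-out JAVA_HOME line; if the daemons later fail to find Java (for example, when they do not inherit the shell environment), set it explicitly in the same file:
export JAVA_HOME=/usr/java/jdk1.6.0_21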
10. Configure the Hadoop Common components: core-site.xml, hdfs-site.xml, mapred-site.xml
10.1 Configure core-site.xml
vi /opt/modules/hadoop/hadoop-1.0.3/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>/data/hadoop/hdfs/namesecondary</value>
    </property>
    <property>
        <name>fs.checkpoint.period</name>
        <value>1800</value>
    </property>
    <property>
        <name>fs.checkpoint.size</name>
        <value>33554432</value>
    </property>
    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
</configuration>
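In short: fs.default.name is the NameNode URI that clients connect to; the fs.checkpoint.* properties direct the SecondaryNameNode to store its checkpoints under /data/hadoop/hdfs/namesecondary and to checkpoint every 1800 seconds, or sooner if the edit log reaches 32 MB; fs.trash.interval keeps deleted files in the trash for 1440 minutes (24 hours) before they are permanently removed.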
10.2 Configure hdfs-site.xml for the HDFS NameNode and DataNode components
vi /opt/modules/hadoop/hadoop-1.0.3/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/data/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/data/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.secondary.http.address</name>
        <value>node1:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.datanode.du.reserved</name>
        <value>1073741824</value>
    </property>
    <property>
        <name>dfs.block.size</name>
        <value>134217728</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
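Note: dfs.replication=3 anticipates a real cluster; in a single-node pseudo-distributed setup there is only one DataNode, so blocks will stay under-replicated unless this is lowered to 1. Likewise dfs.secondary.http.address points at node1, which exists only in a multi-node layout; on one machine it can point at master instead. The remaining values reserve 1 GB of disk for non-HDFS use (dfs.datanode.du.reserved), set a 128 MB block size (dfs.block.size), and disable permission checking (dfs.permissions).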
10.3 Configure mapred-site.xml for the MapReduce JobTracker and TaskTracker
vi /opt/modules/hadoop/hadoop-1.0.3/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/data/hadoop/mapred/mrlocal</value>
        <final>true</final>
    </property>
    <property>
        <name>mapred.system.dir</name>
        <value>/data/hadoop/mapred/mrsystem</value>
        <final>true</final>
    </property>
    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>2</value>
        <final>true</final>
    </property>
    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>1</value>
        <final>true</final>
    </property>
    <property>
        <name>io.sort.mb</name>
        <value>32</value>
        <final>true</final>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx64M</value>
    </property>
    <property>
        <name>mapred.compress.map.output</name>
        <value>true</value>
    </property>
</configuration>
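These settings are deliberately small for the 512 MB VM: at most 2 concurrent map tasks and 1 reduce task per TaskTracker, a 64 MB heap per task JVM (mapred.child.java.opts), and a 32 MB in-memory sort buffer (io.sort.mb). mapred.compress.map.output=true compresses intermediate map output to reduce shuffle I/O.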
11. Start Hadoop
<root user>
mkdir -p /data/
mkdir -p /data/hadoop/
chown -R hadoop:hadoop /data/hadoop/
chown -R hadoop:hadoop /opt/modules/hadoop/*
<hadoop user>
su hadoop
mkdir -p /data/hadoop/hdfs/namesecondary
mkdir -p /data/hadoop/hdfs/name
mkdir -p /data/hadoop/hdfs/data
chmod go-w /data/hadoop/hdfs/data
# Format the HDFS filesystem (run once, before the first start)
/opt/modules/hadoop/hadoop-1.0.3/bin/hadoop namenode -format
# Start the HDFS daemons (NameNode and DataNode)
/opt/modules/hadoop/hadoop-1.0.3/bin/hadoop-daemon.sh start namenode
/opt/modules/hadoop/hadoop-1.0.3/bin/hadoop-daemon.sh start datanode
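# Create the MapReduce local and system directories, then start the JobTracker and TaskTracker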
mkdir -p /data/hadoop/mapred/mrlocal
mkdir -p /data/hadoop/mapred/mrsystem
/opt/modules/hadoop/hadoop-1.0.3/bin/hadoop-daemon.sh start jobtracker
/opt/modules/hadoop/hadoop-1.0.3/bin/hadoop-daemon.sh start tasktracker
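If everything started cleanly, jps run as the hadoop user should list all four daemons:
jps
# expected (PIDs will differ):
# NameNode
# DataNode
# JobTracker
# TaskTracker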
12. Test that the startup succeeded
Check that the NameNode and DataNode are healthy:
http://192.168.16.220:50070/
Check that the JobTracker and TaskTracker are healthy:
http://192.168.16.220:50030/
Run the bundled pi example to confirm the cluster works end to end:
cd /opt/modules/hadoop/hadoop-1.0.3
bin/hadoop jar hadoop-examples-1.0.3.jar pi 10 100
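On success, the job's final line should read something like "Estimated value of Pi is 3.14..." (the exact estimate depends on the 10 maps and 100 samples per map). As an additional HDFS smoke test, round-trip a file (the /test path here is just an example):
bin/hadoop fs -mkdir /test
bin/hadoop fs -put conf/core-site.xml /test/
bin/hadoop fs -ls /test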