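Setting up a Hadoop platform on Ubuntu
I finally got both standalone mode and pseudo-distributed mode working, so I am recording the steps here.
1. Passwordless SSH configuration
In pseudo-distributed mode the NameNode and the DataNode both run on this machine, and Hadoop's start and stop scripts launch the daemons over SSH, so passwordless SSH login to localhost must be configured.
Step 1: install and start the SSH server: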
~$ sudo apt-get install openssh-server
~$ sudo /etc/init.d/ssh start
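Step 2: generate a key pair and append the public key to authorized_keys (authorized_keys holds the public keys of every client allowed to log in to the SSH server as the current user):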
~$ ssh-keygen -t rsa -P ""
~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
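Running the `cat ... >>` command twice appends the same key twice. The following is a minimal sketch of an idempotent version of that step. `authorize_key` is a hypothetical helper, not part of OpenSSH or Hadoop. It only appends the key if it is not already listed, and it sets the file to mode 600, because sshd may refuse an authorized_keys file that others can write to.

```python
import os
from pathlib import Path


def authorize_key(pub_key_path: Path, authorized_keys_path: Path) -> bool:
    """Append the public key to authorized_keys unless it is already listed.

    Hypothetical helper. Returns True if the key was appended,
    False if an identical line was already present.
    """
    key = pub_key_path.read_text().strip()
    text = authorized_keys_path.read_text() if authorized_keys_path.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False  # already authorized, nothing to do
    # ~/.ssh should be private; mode applies only when the directory is created here
    authorized_keys_path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    with authorized_keys_path.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # avoid gluing the new key onto the previous line
        f.write(key + "\n")
    # Keep the file private so sshd does not ignore it
    os.chmod(authorized_keys_path, 0o600)
    return True
```

For the setup above you would call `authorize_key(Path.home() / ".ssh" / "id_rsa.pub", Path.home() / ".ssh" / "authorized_keys")`. After that, `ssh localhost` should log in without a password prompt.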
2. Install Java:
~$ sudo apt-get install openjdk-6-jdk
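3. Install Hadoop
Step 1: download Hadoop from the Apache mirror at http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/ (the commands below use version 1.1.2; adjust the file names if you download a different release such as 1.2.1). Extract it, move it to /usr/local, and give ownership to the hadoop user (this assumes a hadoop user and group already exist):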
~$ sudo tar -xzf hadoop-1.1.2.tar.gz
~$ sudo mv hadoop-1.1.2 /usr/local/hadoop
~$ sudo chown -R hadoop:hadoop /usr/local/hadoop
Step 2: configure the Java environment in /usr/local/hadoop/conf/hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin
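If JAVA_HOME points at the wrong directory, the Hadoop scripts cannot start Java. The JVM directory name may also differ on your system, for example on 64-bit installs. Below is a minimal sketch with two hypothetical helpers, `read_exports` and `looks_like_java_home`. The first reads the simple `export NAME=value` lines back out of hadoop-env.sh. The second checks that JAVA_HOME contains an executable `bin/java`. The parser deliberately handles only unquoted values and ignores commented-out lines.

```python
import os
import re
from pathlib import Path
from typing import Dict

# Matches "export NAME=value" with an unquoted value; commented lines do not match
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(\S+)\s*$")


def read_exports(text: str) -> Dict[str, str]:
    """Collect simple export assignments (no quoting or $VAR expansion is done)."""
    exports = {}
    for line in text.splitlines():
        m = EXPORT_RE.match(line)
        if m:
            exports[m.group(1)] = m.group(2)
    return exports


def looks_like_java_home(path: str) -> bool:
    """True if <path>/bin/java exists and is executable, as expected of a JDK home."""
    java = Path(path) / "bin" / "java"
    return java.is_file() and os.access(java, os.X_OK)
```

Example: `looks_like_java_home(read_exports(Path("/usr/local/hadoop/conf/hadoop-env.sh").read_text())["JAVA_HOME"])` should return True on a correctly configured machine.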
Step 3: configure core-site.xml, hdfs-site.xml, and mapred-site.xml (all three live in /usr/local/hadoop/conf):
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
</configuration>
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
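All three files share the same shape: a `<configuration>` element containing `<property>` elements, each with a `<name>` and a `<value>`. The following is a minimal sketch that generates them from one table using Python's standard `xml.etree.ElementTree`. It also parses them back, which is a cheap way to catch typos such as unclosed tags. `hadoop_config_xml`, `read_config`, and `CONFIGS` are hypothetical names. The generated files omit the optional `xml-stylesheet` line, which only affects how a browser displays the file. `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# The same properties as the three files above (hypothetical table name)
CONFIGS = {
    "core-site.xml": {
        "fs.default.name": "hdfs://localhost:9000",
        "hadoop.tmp.dir": "/usr/local/hadoop/tmp",
    },
    "hdfs-site.xml": {
        "dfs.replication": 1,
        "dfs.name.dir": "/usr/local/hadoop/hdfs/name",
        "dfs.data.dir": "/usr/local/hadoop/hdfs/data",
    },
    "mapred-site.xml": {
        "mapred.job.tracker": "localhost:9001",
    },
}


def hadoop_config_xml(properties: Dict[str, object]) -> str:
    """Render a Hadoop-style <configuration> document from name -> value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def read_config(xml_text: str) -> Dict[str, str]:
    """Parse a configuration document back into name -> value strings."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
```

To write the files, you could loop over `CONFIGS` and save each `hadoop_config_xml(props)` result to `/usr/local/hadoop/conf/<file name>`.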
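Step 4: load the environment variables into the current shell (source only affects the current session) and format HDFS: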
~$ source /usr/local/hadoop/conf/hadoop-env.sh
~$ hadoop namenode -format
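Formatting initializes the storage directory configured as dfs.name.dir (/usr/local/hadoop/hdfs/name above). As far as I understand the 1.x storage layout, a formatted directory contains a properties-style `current/VERSION` file with a `storageType=NAME_NODE` entry. The following is a heuristic sketch built on that assumption, so verify it against your own directory. `read_storage_version` and `looks_formatted` are hypothetical helpers, and Java properties escaping is not handled.

```python
from pathlib import Path
from typing import Dict


def read_storage_version(name_dir: str) -> Dict[str, str]:
    """Read <name_dir>/current/VERSION as key=value pairs.

    Assumption: formatting writes this file; '#' lines are comments.
    Raises FileNotFoundError if the file does not exist.
    """
    props = {}
    version_file = Path(name_dir) / "current" / "VERSION"
    for line in version_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and anything that is not key=value
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def looks_formatted(name_dir: str) -> bool:
    """Heuristic check that name_dir holds formatted NameNode storage."""
    try:
        return read_storage_version(name_dir).get("storageType") == "NAME_NODE"
    except FileNotFoundError:
        return False
```

For this setup you would call `looks_formatted("/usr/local/hadoop/hdfs/name")` after `hadoop namenode -format`.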
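Step 5: start Hadoop and list the running Java processes with jps to check that the installation works. This command and the ones that follow use relative paths, so run them from /usr/local/hadoop. In pseudo-distributed mode, jps should list NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker, plus Jps itself: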
~$ bin/start-all.sh
~$ jps
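Each line of jps output is a process ID followed by the main class name. The following is a minimal sketch of a hypothetical helper, `missing_daemons`, that reports which of the five Hadoop 1.x daemons listed above are absent. It takes the output as a string, so it can be fed from `subprocess.run(["jps"], capture_output=True, text=True).stdout` or from a pasted copy of the output.

```python
from typing import Iterable, Set

# Daemons expected in a Hadoop 1.x pseudo-distributed setup
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker"}


def missing_daemons(jps_output: str, expected: Iterable[str] = EXPECTED_DAEMONS) -> Set[str]:
    """Return the expected daemon names that do not appear in jps output.

    Each jps line looks like '<pid> <MainClassName>'; other lines are ignored.
    """
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])  # second field is the main class name
    return set(expected) - running
```

An empty result means every daemon is up. A missing DataNode, for instance, usually warrants a look at the logs under /usr/local/hadoop/logs.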
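Step 6: test the environment by copying the files in conf/ into an HDFS directory named input, running the bundled wordcount example on it, and printing the result: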
~$ bin/hadoop dfs -mkdir input
~$ hadoop dfs -copyFromLocal conf/* input
~$ hadoop jar hadoop-examples-1.1.2.jar wordcount input output
~$ hadoop dfs -cat output/*
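To see what the wordcount job computes, here is a single-process Python sketch of the same map, shuffle, and reduce flow. It is only an illustration, not Hadoop's own implementation. The bundled example tokenizes on whitespace, approximated here with `str.split()`. With a single reducer, its output is one `word<TAB>count` line per word, sorted by word. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the counts per word, keeping keys in sorted order."""
    return {word: sum(counts) for word, counts in sorted(groups.items())}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(shuffle(pairs))


def format_output(counts: Dict[str, int]) -> str:
    """Render results like the part files printed by `hadoop dfs -cat output/*`."""
    return "\n".join(f"{word}\t{n}" for word, n in counts.items())
```

For example, `word_count(["hello hadoop", "hello world"])` yields `{"hadoop": 1, "hello": 2, "world": 1}`.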
Step 7: stop the Hadoop daemons:
~$ bin/stop-all.sh