Installing Hadoop on Ubuntu (combining a book and the official documentation)
Operating system: Ubuntu 12.04
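1. $ sudo apt-get install ssh (note: you need to answer yes when prompted; this installs the OpenSSH server and one other package whose name I have forgotten)
2. The official documentation also requires: $ sudo apt-get install rsync (already installed on this system)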
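3 Install Java (my own method)
$ chmod +x jdk-6u30-linux-i586.bin
$ ./jdk-6u30-linux-i586.bin
Find the extracted directory jdk1.6.0_30
$ sudo mv jdk1.6.0_30 /usr/java (/usr/java does not exist by default; create it first with sudo mkdir -p /usr/java, otherwise mv renames the JDK directory to /usr/java itself)
(At this point java -version prints no version information, because the JDK's bin directory is not on the PATH. This does not affect Hadoop. If you need to set it up, see my other post: http://www.cnblogs.com/xioyaozi/archive/2012/05/21/2511562.html)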
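For readers who do want java -version to work, here is a minimal sketch of one common approach. It assumes the JDK ended up in /usr/java/jdk1.6.0_30 as in the mv step above; the linked post may do it differently.

```shell
# Assumed install location, taken from the mv step above.
export JAVA_HOME=/usr/java/jdk1.6.0_30
# Put the JDK's bin directory first so that `java` resolves to this JDK.
export PATH="$JAVA_HOME/bin:$PATH"
```

These two lines only affect the current shell. Appending them to ~/.bashrc is one way to make them persistent.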
4 Extract the Hadoop package
$ tar -zxvf hadoop-1.0.3-bin.tar.gz
5 In the extracted hadoop-1.0.3 directory, edit conf/hadoop-env.sh and set:
export JAVA_HOME=/usr/java/jdk1.6.0_30 (in a shell script # starts a comment, so make sure this line has no leading #)
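A quick way to confirm that the JAVA_HOME line is really uncommented is sketched below. The function name has_java_home is hypothetical and exists only for this illustration; it uses nothing beyond grep.

```shell
# Hypothetical helper: succeeds only if the given file contains an
# uncommented `export JAVA_HOME=...` line.
has_java_home() {
    grep -Eq '^[[:space:]]*export[[:space:]]+JAVA_HOME=' "$1"
}
# Usage: has_java_home conf/hadoop-env.sh && echo "JAVA_HOME is set"
```

A line that still begins with # does not match, because the pattern only allows whitespace before export.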
6 In the hadoop-1.0.3 directory, run $ bin/hadoop. If it prints the usage message for the hadoop script, the setup works.
Now you are ready to start your Hadoop cluster in one of the three supported modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
7 Standalone mode (the English text below comes from the official documentation; it is enough for a quick test)
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
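To get a feel for what this example job reports, the sketch below does a rough local equivalent with ordinary command-line tools: it counts every match of the regular expression across the XML files and lists the most frequent first. The function name count_matches is hypothetical, and this is only an approximation of the job's output, not how Hadoop implements it.

```shell
# Hypothetical helper: count regex matches across the *.xml files in a
# directory and print them with the most frequent match first.
count_matches() {
    dir=$1
    pattern=$2
    # -o prints only the matched text, -h omits file names, -E enables
    # extended regular expressions (needed for +).
    grep -ohE "$pattern" "$dir"/*.xml | sort | uniq -c | sort -rn
}
# Usage: count_matches input 'dfs[a-z.]+'
```

Comparing its output with cat output/* is a simple sanity check on the standalone run.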
8 Pseudo-Distributed Mode
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
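9 Set up passwordless SSH login
$ ssh localhost
This prompts for a password, so key-based login has to be configured first:
$ ssh-keygen -t dsa
Press Enter at each prompt to accept the defaults. ssh-keygen creates the .ssh directory in your home directory automatically; it is hidden because its name starts with a dot, which does not matter here.
$ cd ~/.ssh
$ cp id_dsa.pub authorized_keys (this overwrites any existing authorized_keys file)
After that, $ ssh localhost logs in without asking for a password.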
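If authorized_keys may already hold other keys, appending is safer than copying over it. The sketch below shows one way to do that. The function name authorize_key is hypothetical, and the chmod 600 rests on the assumption that sshd is running with its default strict permission checks, under which an overly permissive file can be ignored.

```shell
# Hypothetical helper: append a public key to an authorized_keys file
# unless that exact line is already present, then restrict permissions.
authorize_key() {
    pub=$1
    auth=$2
    touch "$auth"
    # -q quiet, -x match whole lines, -F treat the key as a fixed string
    grep -qxF -e "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    chmod 600 "$auth"
}
# Usage: authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```

Running it twice leaves a single copy of the key, so it can be repeated safely.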
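10 I have not configured fully distributed mode yet, so that is all for now. I am a beginner myself.
Once the pseudo-distributed setup is in place, you can try the wordcount experiment; see the next post: http://www.cnblogs.com/xioyaozi/archive/2012/05/28/2521161.html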
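Before running jobs in pseudo-distributed mode, the official single-node guide for Hadoop 1.x goes on to format HDFS and start the daemons. The sketch below wraps those two commands. The function name start_pseudo_cluster and the default directory hadoop-1.0.3 are assumptions made for this illustration.

```shell
# Hypothetical wrapper around the two commands from the official guide.
# $1 is the unpacked Hadoop directory (assumed default: hadoop-1.0.3).
start_pseudo_cluster() {
    hadoop_dir=${1:-hadoop-1.0.3}
    if [ ! -x "$hadoop_dir/bin/hadoop" ]; then
        echo "bin/hadoop not found in $hadoop_dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$hadoop_dir" || exit 1
        # First run only: formatting creates a new, empty HDFS.
        bin/hadoop namenode -format || exit 1
        # Starts the HDFS and MapReduce daemons.
        bin/start-all.sh
    )
}
# Usage: start_pseudo_cluster hadoop-1.0.3
```

Formatting wipes any existing HDFS metadata, so on later runs only bin/start-all.sh is needed.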