Big Data Cluster Setup
1. VirtualBox
1. When shutting down a VM, choose the first option (save the machine state), which preserves the state of all running processes. Powering the VM off directly kills those processes and can corrupt the environment.
2. CentOS 7
1. Configure the network
Use bridged networking, and configure the host and the VM so they can ping each other.
vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
BOOTPROTO=static
IPADDR=192.168.0.106   # must be on the same subnet as the host
GATEWAY=192.168.0.1
NETMASK=255.255.255.0
ONBOOT=yes
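After saving, restart the network service and verify the static address took effect (a minimal check using the example addresses above):
systemctl restart network
ip addr show enp0s3      # confirm the configured IP is assigned
ping -c 3 192.168.0.1    # the example gateway from the config above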
2. Change the hostname, disable SELinux, and stop the firewall
hostnamectl set-hostname spark1   # spark2 / spark3 on the other nodes
vi /etc/selinux/config
SELINUX=disabled
systemctl stop firewalld
3. Edit the hosts file
vi /etc/hosts
192.168.0.106 spark1
192.168.0.107 spark2
192.168.0.108 spark3
4. Set up passwordless SSH access
ssh-keygen -t rsa
touch /root/.ssh/authorized_keys
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
ssh-copy-id -i spark3
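One way to confirm passwordless login works to every node after the keys are distributed:
for h in spark1 spark2 spark3; do ssh $h hostname; done   # should print each hostname without a password prompt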
3. JDK 1.7
4. MobaXterm
5. Hadoop 2.4.1
tar -zxvf hadoop-2.4.1.tar.gz
mv hadoop-2.4.1 hadoop
vim ~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source ~/.bashrc
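A quick check that the Hadoop environment variables took effect:
hadoop version   # should report Hadoop 2.4.1
which hdfs       # should resolve to /usr/local/hadoop/bin/hdfs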
Edit the configuration files under hadoop's etc/hadoop directory.
Edit core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://spark1:9000</value>
  </property>
</configuration>
Edit hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/data/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/data/datanode</value>
  </property>
  <property>
    <name>dfs.tmp.dir</name>
    <value>/usr/local/data/tmp</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
Edit mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit yarn-site.xml:
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>spark1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Edit slaves:
spark1
spark2
spark3
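The configured hadoop directory, ~/.bashrc, and the /usr/local/data paths referenced in hdfs-site.xml must also exist on spark2 and spark3 before startup; a sketch of distributing them from spark1, assuming identical paths on every node:
scp -r /usr/local/hadoop root@spark2:/usr/local/
scp -r /usr/local/hadoop root@spark3:/usr/local/
scp ~/.bashrc root@spark2:~/
scp ~/.bashrc root@spark3:~/
mkdir -p /usr/local/data                # on spark1 itself
ssh spark2 mkdir -p /usr/local/data
ssh spark3 mkdir -p /usr/local/data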
Start the Hadoop cluster.
Format the NameNode (run this on spark1):
hdfs namenode -format
start-dfs.sh
Install a JDK with development tools (provides the jps command):
yum install java-1.8.0-openjdk-devel.x86_64
After startup, verify the processes:
spark1 should be running NameNode, DataNode, and SecondaryNameNode
spark2 should be running DataNode
spark3 should be running DataNode
http://spark1:50070/dfshealth.html#tab-overview should be reachable
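One way to check the expected daemons on every node at once:
for h in spark1 spark2 spark3; do echo == $h ==; ssh $h jps; done
hdfs dfsadmin -report   # should list three live datanodes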
Start the YARN cluster
start-yarn.sh
spark1: ResourceManager, NodeManager
spark2: NodeManager
spark3: NodeManager
http://spark1:8088/cluster should be reachable
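A quick YARN smoke test using the examples jar shipped in the Hadoop 2.4.1 distribution (adjust the path if your layout differs):
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 10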
6. Hive 0.13
1. Configure Hive
tar -zxvf apache-hive-0.13-bin.tar.gz
mv apache-hive-0.13-bin hive
vim ~/.bashrc
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
Then source ~/.bashrc to apply the environment variables.
2. Install mysql-server
$ wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
$ sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum install -y mysql-server
service mysqld start
chkconfig mysqld on
yum install -y mysql-connector-java
cp /usr/share/java/mysql-connector-java.jar /usr/local/hive/lib/
3. Log in to MySQL and create the Hive metastore database
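A minimal sketch of that step; the hive_metadata database name and the hive user below are illustrative choices, not fixed by anything above:
mysql -u root
# then, inside the mysql shell:
create database if not exists hive_metadata;
grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
flush privileges;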
4. Configure Hive
In hive-site.xml, remove createDatabaseIfNotExist from the JDBC connection URL and append serverTimezone=Asia/Shanghai to it.
To verify Hive, run hive and check that it enters the CLI:
create table t(id int);
select * from t;
drop table t;
7. ZooKeeper 3.4.5
tar -zxvf zookeeper-3.4.5.tar.gz
mv zookeeper-3.4.5 zk
Configure environment variables:
vim ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin
Edit the configuration file under ZooKeeper's conf directory:
mv zoo_sample.cfg zoo.cfg
vim zoo.cfg
dataDir=/usr/local/zk/data
server.0=spark1:2888:3888
server.1=spark2:2888:3888
server.2=spark3:2888:3888
Create the data directory and the myid file (matching dataDir above):
mkdir /usr/local/zk/data
echo 0 > /usr/local/zk/data/myid
Copy the zk directory and ~/.bashrc to spark2 and spark3, source the environment variables on each, and change myid to 1 and 2 respectively (the commands below show spark3; repeat for spark2):
scp -r zk/ root@spark3:/usr/local/
scp ~/.bashrc root@spark3:~/
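After copying, set a distinct myid on each node; the values must match the server.N lines in zoo.cfg (0 stays on spark1):
ssh spark2 'echo 1 > /usr/local/zk/data/myid'
ssh spark3 'echo 2 > /usr/local/zk/data/myid'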
Start ZooKeeper (run on all three nodes):
zkServer.sh start
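Then confirm the roles on each node; expect one leader and two followers:
zkServer.sh status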
8. kafka_2.9.2-0.8.1
Extract scala-2.11.4.tgz and rename the directory to scala.
Configure environment variables:
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin
Extract kafka_2.9.2-0.8.1.
Edit the config file kafka/config/server.properties:
zookeeper.connect=spark1:2181,spark2:2181,spark3:2181
broker.id=0
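broker.id must be unique per broker, and the steps above only configure spark1. A sketch of distributing Kafka and bumping the id on the other nodes (the /usr/local/kafka path is an assumption; adjust to where you extracted it):
scp -r /usr/local/kafka root@spark2:/usr/local/
scp -r /usr/local/kafka root@spark3:/usr/local/
ssh spark2 "sed -i 's/broker.id=0/broker.id=1/' /usr/local/kafka/config/server.properties"
ssh spark3 "sed -i 's/broker.id=0/broker.id=2/' /usr/local/kafka/config/server.properties"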
Extract slf4j-1.7.6.zip and copy slf4j-nop-1.7.6.jar into kafka/libs.
Start Kafka (run on every node):
nohup bin/kafka-server-start.sh config/server.properties &
Test the cluster:
bin/kafka-topics.sh --zookeeper 192.168.0.106:2181,192.168.0.107:2181,192.168.0.108:2181 --topic Test --replication-factor 1 --partitions 1 --create
bin/kafka-console-producer.sh --broker-list spark1:9092,spark2:9092,spark3:9092 --topic Test
bin/kafka-console-consumer.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic Test --from-beginning
9. Spark 1.3.0
Upload spark-1.3.0-bin-hadoop2.4.tgz, extract it, and rename the directory to spark.
Configure environment variables:
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export CLASS_PATH=.:$CLASS_PATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
cd /usr/local/spark/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/usr/lib/jvm/jre
export SCALA_HOME=/usr/local/scala
export SPARK_MASTER_IP=192.168.0.106
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
cp slaves.template slaves
vim slaves
spark1
spark2
spark3
Copy the spark directory and ~/.bashrc to spark2 and spark3, then run source ~/.bashrc on each.
Start Spark:
cd /usr/local/spark/sbin
./start-all.sh
Verify with jps: spark1 should show Master and Worker; spark2 and spark3 should each show a Worker.
Run spark-shell and confirm it starts and drops into the Scala REPL prompt.
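Besides the shell, a quick standalone smoke test using the SparkPi example bundled with the 1.3.0 distribution:
cd /usr/local/spark
./bin/run-example SparkPi 10   # should end with a line like: Pi is roughly 3.14...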