Big Data Cluster Setup
1. VirtualBox
1. When shutting down a VM, choose the first option (save the machine state), which preserves the state of all running processes. Powering the VM off directly kills those processes and can corrupt the environment.
2. CentOS 7
1. Configure the network
Use bridged networking, and configure the host and the VM so they can ping each other.
vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
BOOTPROTO=static
IPADDR=192.168.0.106   # must be on the same subnet as the host
GATEWAY=192.168.0.1
NETMASK=255.255.255.0
ONBOOT=yes
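After saving, restart the network service and verify the static address took effect (a minimal check using the example addresses above):
systemctl restart network
ip addr show enp0s3      # confirm the configured IP is assigned
ping -c 3 192.168.0.1    # the example gateway from the config above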
2. Change the hostname, disable SELinux, and stop the firewall
hostnamectl set-hostname spark1   # spark2 / spark3 on the other nodes
vi /etc/selinux/config
SELINUX=disabled
systemctl stop firewalld
3. Edit the hosts file
vi /etc/hosts
192.168.0.106 spark1
192.168.0.107 spark2
192.168.0.108 spark3
4. Set up passwordless SSH access
ssh-keygen -t rsa
touch /root/.ssh/authorized_keys
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
ssh-copy-id -i spark3
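One way to confirm passwordless login works to every node after the keys are distributed:
for h in spark1 spark2 spark3; do ssh $h hostname; done   # should print each hostname without a password prompt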
3. JDK 1.7
4. MobaXterm
5. Hadoop 2.4.1
tar -zxvf hadoop-2.4.1.tar.gz
mv hadoop-2.4.1 hadoop
vim ~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source ~/.bashrc
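A quick check that the Hadoop environment variables took effect:
hadoop version   # should report Hadoop 2.4.1
which hdfs       # should resolve to /usr/local/hadoop/bin/hdfs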
Edit the configuration files under hadoop's etc/hadoop directory.
Edit core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://spark1:9000</value>
  </property>
</configuration>
Edit hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/data/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/data/datanode</value>
  </property>
  <property>
    <name>dfs.tmp.dir</name>
    <value>/usr/local/data/tmp</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
Edit mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit yarn-site.xml:
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>spark1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Edit slaves:
spark1
spark2
spark3
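The configured hadoop directory, ~/.bashrc, and the /usr/local/data paths referenced in hdfs-site.xml must also exist on spark2 and spark3 before startup; a sketch of distributing them from spark1, assuming identical paths on every node:
scp -r /usr/local/hadoop root@spark2:/usr/local/
scp -r /usr/local/hadoop root@spark3:/usr/local/
scp ~/.bashrc root@spark2:~/
scp ~/.bashrc root@spark3:~/
mkdir -p /usr/local/data                # on spark1 itself
ssh spark2 mkdir -p /usr/local/data
ssh spark3 mkdir -p /usr/local/data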
Start the Hadoop cluster.
Format the NameNode (run this on spark1):
hdfs namenode -format
start-dfs.sh
Install a JDK with development tools (provides the jps command):
yum install java-1.8.0-openjdk-devel.x86_64
After startup, verify the processes:
spark1 should be running NameNode, DataNode, and SecondaryNameNode
spark2 should be running DataNode
spark3 should be running DataNode
http://spark1:50070/dfshealth.html#tab-overview should be reachable
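One way to check the expected daemons on every node at once:
for h in spark1 spark2 spark3; do echo == $h ==; ssh $h jps; done
hdfs dfsadmin -report   # should list three live datanodes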
Start the YARN cluster
start-yarn.sh
spark1: ResourceManager, NodeManager
spark2: NodeManager
spark3: NodeManager
http://spark1:8088/cluster should be reachable
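A quick YARN smoke test using the examples jar shipped in the Hadoop 2.4.1 distribution (adjust the path if your layout differs):
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 10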
6. Hive 0.13
1. Configure Hive
tar -zxvf apache-hive-0.13-bin.tar.gz
mv apache-hive-0.13-bin hive
vim ~/.bashrc
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
Then source ~/.bashrc to apply the environment variables.
2. Install mysql-server
$ wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
$ sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum install -y mysql-server
service mysqld start
chkconfig mysqld on
yum install -y mysql-connector-java
cp /usr/share/java/mysql-connector-java.jar /usr/local/hive/lib/
3. Log in to MySQL and create the Hive metastore database
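A minimal sketch of that step; the hive_metadata database name and the hive user below are illustrative choices, not fixed by anything above:
mysql -u root
# then, inside the mysql shell:
create database if not exists hive_metadata;
grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
flush privileges;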
4. Configure Hive
In hive-site.xml, remove createDatabaseIfNotExist from the JDBC connection URL and append serverTimezone=Asia/Shanghai to it.
To verify Hive, run hive and check that it enters the CLI:
create table t(id int);
select * from t;
drop table t;
7. ZooKeeper 3.4.5
tar -zxvf zookeeper-3.4.5.tar.gz
mv zookeeper-3.4.5 zk
Configure environment variables:
vim ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin
Edit the configuration file under ZooKeeper's conf directory:
mv zoo_sample.cfg zoo.cfg
vim zoo.cfg
dataDir=/usr/local/zk/data
server.0=spark1:2888:3888
server.1=spark2:2888:3888
server.2=spark3:2888:3888
Create the data directory and the myid file (matching dataDir above):
mkdir /usr/local/zk/data
echo 0 > /usr/local/zk/data/myid
Copy the zk directory and ~/.bashrc to spark2 and spark3, source the environment variables on each, and change myid to 1 and 2 respectively (the commands below show spark3; repeat for spark2):
scp -r zk/ root@spark3:/usr/local/
scp ~/.bashrc root@spark3:~/
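After copying, set a distinct myid on each node; the values must match the server.N lines in zoo.cfg (0 stays on spark1):
ssh spark2 'echo 1 > /usr/local/zk/data/myid'
ssh spark3 'echo 2 > /usr/local/zk/data/myid'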
Start ZooKeeper (run on all three nodes):
zkServer.sh start
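Then confirm the roles on each node; expect one leader and two followers:
zkServer.sh status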
8. kafka_2.9.2-0.8.1
Extract scala-2.11.4.tgz and rename the directory to scala.
Configure environment variables:
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin
Extract kafka_2.9.2-0.8.1.
Edit the config file kafka/config/server.properties:
zookeeper.connect=spark1:2181,spark2:2181,spark3:2181
broker.id=0
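broker.id must be unique per broker, and the steps above only configure spark1. A sketch of distributing Kafka and bumping the id on the other nodes (the /usr/local/kafka path is an assumption; adjust to where you extracted it):
scp -r /usr/local/kafka root@spark2:/usr/local/
scp -r /usr/local/kafka root@spark3:/usr/local/
ssh spark2 "sed -i 's/broker.id=0/broker.id=1/' /usr/local/kafka/config/server.properties"
ssh spark3 "sed -i 's/broker.id=0/broker.id=2/' /usr/local/kafka/config/server.properties"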
Extract slf4j-1.7.6.zip and copy slf4j-nop-1.7.6.jar into kafka/libs.
Start Kafka (run on every node):
nohup bin/kafka-server-start.sh config/server.properties &
Test the cluster:
bin/kafka-topics.sh --zookeeper 192.168.0.106:2181,192.168.0.107:2181,192.168.0.108:2181 --topic Test --replication-factor 1 --partitions 1 --create
bin/kafka-console-producer.sh --broker-list spark1:9092,spark2:9092,spark3:9092 --topic Test
bin/kafka-console-consumer.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic Test --from-beginning
9. Spark 1.3.0
Upload spark-1.3.0-bin-hadoop2.4.tgz, extract it, and rename the directory to spark.
Configure environment variables:
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export CLASS_PATH=.:$CLASS_PATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
cd /usr/local/spark/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/usr/lib/jvm/jre
export SCALA_HOME=/usr/local/scala
export SPARK_MASTER_IP=192.168.0.106
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
cp slaves.template slaves
vim slaves
spark1
spark2
spark3
Copy the spark directory and ~/.bashrc to spark2 and spark3, then run source ~/.bashrc on each.
Start Spark:
cd /usr/local/spark/sbin
./start-all.sh
Verify with jps: spark1 should show Master and Worker; spark2 and spark3 should each show a Worker.
Run spark-shell and confirm it starts and drops into the Scala REPL prompt.
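Besides the shell, a quick standalone smoke test using the SparkPi example bundled with the 1.3.0 distribution:
cd /usr/local/spark
./bin/run-example SparkPi 10   # should end with a line like: Pi is roughly 3.14...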