Building the Data Collection Pipeline for an Offline Data Warehouse

Offline Data Warehouse Setup: Installing the Data Collection Tools


1. ZooKeeper Installation and Configuration

(1) Installing ZooKeeper 3.5.9

First download the zookeeper-3.5.9 package and place it in the package directory on flink102:

cd /opt/software

Once the package is in place, extract it:

tar -zxvf apache-zookeeper-3.5.9-bin.tar.gz -C /opt/module
# Then check /opt/module
cd /opt/module
ll # or ls
# Rename the directory for later convenience
mv apache-zookeeper-3.5.9-bin zookeeper-3.5.9

(2) Editing the ZooKeeper Configuration

Go into the ZooKeeper directory and list its contents:

cd /opt/module/zookeeper-3.5.9
ll

Create a zkData directory to store ZooKeeper's data (the snapshot directory):

mkdir zkData

Now edit the ZooKeeper configuration file. First rename the sample config:

mv conf/zoo_sample.cfg conf/zoo.cfg
# Then edit it
vim conf/zoo.cfg

The modified file is shown below. In the cluster section, server.X=host:2888:3888 means X must match that host's myid, 2888 is the port followers use to talk to the leader, and 3888 is the leader-election port:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/module/zookeeper-3.5.9/zkData
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
#########cluster#########
server.2=flink102:2888:3888
server.3=flink103:2888:3888
server.4=flink104:2888:3888

(3) Adding ZooKeeper Environment Variables

The previous post set up a custom environment variable file; append to it:

sudo vim /etc/profile.d/my_env.sh

Add ZOO_HOME:

# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
# ZOO_HOME
export ZOO_HOME=/opt/module/zookeeper-3.5.9
export PATH=$PATH:$ZOO_HOME/bin
source /etc/profile
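To confirm the new variables are visible in the current shell, a quick sanity check (expected values match the paths configured above):

echo $ZOO_HOME
# expected: /opt/module/zookeeper-3.5.9
which zkServer.sh
# expected: /opt/module/zookeeper-3.5.9/bin/zkServer.sh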

(4) Starting ZooKeeper

Start the ZooKeeper server. Note that since zoo.cfg already lists the cluster servers, each node also needs the myid file created in step (5); if startup fails complaining about a missing myid, create that file first:

cd /opt/module/zookeeper-3.5.9
bin/zkServer.sh start
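As a quick check, query the server state. Because zoo.cfg lists three servers, status may report an error until a quorum (at least two nodes) is running:

bin/zkServer.sh status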

Start the ZooKeeper client:

bin/zkCli.sh
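Inside the client, a couple of basic commands can confirm the connection (a minimal example; output will vary):

ls /
# [zookeeper]
quit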

(5) ZooKeeper Cluster Configuration

The single-node setup above is done; now distribute the files to the flink103 and flink104 nodes:

cd /opt/module
xsync zookeeper-3.5.9
sudo xsync /etc/profile.d/my_env.sh

Remember to source the environment file on flink103 and flink104.
Then create and edit a myid file under zkData:

cd /opt/module/zookeeper-3.5.9
vim zkData/myid
# Write an ID; it must match this host's server.x number in zoo.cfg
2

Distribute it to flink103 and flink104:

xsync zkData/myid

Then change the ID to 3 on flink103 and 4 on flink104, matching their server.x entries in zoo.cfg.
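A non-interactive way to set those IDs from flink102, assuming the passwordless SSH configured earlier:

ssh flink103 "echo 3 > /opt/module/zookeeper-3.5.9/zkData/myid"
ssh flink104 "echo 4 > /opt/module/zookeeper-3.5.9/zkData/myid"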

(6) Writing a ZooKeeper Cluster Script

cd into our scripts directory:

cd /home/flink/bin
vim zk.sh

The script contents:

#!/bin/bash
if [ $# -lt 1 ]
then
    echo "No Args Input"
    exit
fi
case $1 in
"start")
    for i in flink102 flink103 flink104
    do
        echo "==================$i=================="
        ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh start
    done
    for i in flink102 flink103 flink104
    do
        echo "==================$i=================="
        ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh status
    done
;;
"stop")
    for i in flink102 flink103 flink104
    do
        echo "==================$i=================="
        ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh stop
    done
;;
"status")
    for i in flink102 flink103 flink104
    do
        echo "==================$i=================="
        ssh $i /opt/module/zookeeper-3.5.9/bin/zkServer.sh status
    done
;;
*)
    echo "Args Error"
;;
esac

Make it executable:

chmod 777 zk.sh

Now try starting the whole ZooKeeper cluster at once:

zk.sh start
# Check whether the processes came up on all nodes
jpsall
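If the cluster came up, each node should show a QuorumPeerMain process. Illustrative jpsall output (PIDs, separators, and any other daemons will differ):

================ flink102 ================
4531 QuorumPeerMain
================ flink103 ================
4012 QuorumPeerMain
================ flink104 ================
3987 QuorumPeerMain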

2. Kafka Installation and Configuration

(1) Installing Kafka

Here I use kafka_2.11-2.4.1.tgz, which can be downloaded from the official site.
As with ZooKeeper, place the package in the package directory:

cd /opt/software
# Extract Kafka
tar -zxvf kafka_2.11-2.4.1.tgz -C /opt/module
# Rename the directory under /opt/module
cd /opt/module
mv kafka_2.11-2.4.1 kafka

(2) Editing the Kafka Configuration

Enter the kafka directory and inspect its layout:

cd kafka
ll

drwxr-xr-x.  3 flink flink  4096 Mar  3  2020 bin
drwxr-xr-x.  2 flink flink  4096 Nov 12 14:44 config
drwxrwxr-x. 20 flink flink  4096 Nov 16 12:33 datas
drwxr-xr-x.  2 flink flink  4096 Nov 12 10:12 libs
-rw-r--r--.  1 flink flink 32216 Mar  3  2020 LICENSE
drwxrwxr-x.  2 flink flink  4096 Nov 16 12:00 logs
-rw-r--r--.  1 flink flink   337 Mar  3  2020 NOTICE
drwxr-xr-x.  2 flink flink  4096 Mar  3  2020 site-docs

Go into the config directory and edit the server configuration:

cd config
vim server.properties

The modified settings:

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/opt/module/kafka/datas

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=flink102:2181,flink103:2181,flink104:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
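Note the /kafka suffix on zookeeper.connect: it is a ZooKeeper chroot, so all of Kafka's znodes live under /kafka instead of cluttering ZooKeeper's root. Once the brokers are running later, this can be confirmed from zkCli.sh (output is illustrative):

ls /kafka
# [admin, brokers, cluster, config, consumers, controller, ...]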

(3) Configuring Kafka Environment Variables

sudo vim /etc/profile.d/my_env.sh

Add the following:

# KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin

Remember to run source /etc/profile afterwards.

(4) Distributing to flink103 and flink104

xsync /opt/module/kafka
sudo xsync /etc/profile.d/my_env.sh

Remember to source on every node. Then go to /opt/module/kafka/config on flink103 and flink104, edit server.properties, and set broker.id to 1 and 2 respectively.
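A non-interactive way to make those edits from flink102 (a sketch; the sed pattern assumes broker.id sits on its own line as shown above):

ssh flink103 'sed -i "s/^broker.id=.*/broker.id=1/" /opt/module/kafka/config/server.properties'
ssh flink104 'sed -i "s/^broker.id=.*/broker.id=2/" /opt/module/kafka/config/server.properties'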

(5) Kafka Cluster Start Script

First try starting the Kafka server on flink102:

cd /opt/module/kafka
bin/kafka-server-start.sh -daemon config/server.properties

Now write the Kafka cluster start/stop script:

cd /home/flink/bin
vim kf.sh
#! /bin/bash
case $1 in
"start"){
    for i in flink102 flink103 flink104
    do
        echo " --------Starting Kafka on $i--------"
        ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
    done
};;
"stop"){
    for i in flink102 flink103 flink104
    do
        echo " --------Stopping Kafka on $i--------"
        ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh stop"
    done
};;
esac
Make it executable:

chmod 777 kf.sh

Try starting the Kafka cluster, then check the background processes with jpsall:

kf.sh start
jpsall
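As a smoke test, create and list a topic once the brokers are up; the topic name "test" here is just an example, and kafka-topics.sh is on the PATH via KAFKA_HOME:

kafka-topics.sh --bootstrap-server flink102:9092 --create --topic test --partitions 1 --replication-factor 3
kafka-topics.sh --bootstrap-server flink102:9092 --list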

Note: ARM-based servers require some additional changes.

3. Flume Installation and Configuration

(1) Installing and Configuring Flume

For Flume I chose 1.9.0, downloadable from the official site.
Place the package in /opt/software, then extract it:

tar -zxvf /opt/software/apache-flume-1.9.0-bin.tar.gz -C /opt/module

Rename the directory:

cd /opt/module
mv apache-flume-1.9.0-bin flume-1.9.0

After installing Flume, delete one jar for compatibility with Hadoop 3.1.3 (Flume 1.9.0 ships guava-11.0.2, which conflicts with the newer Guava that Hadoop 3.1.3 provides):

rm /opt/module/flume-1.9.0/lib/guava-11.0.2.jar
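To verify that Flume still works after removing the jar, print its version:

cd /opt/module/flume-1.9.0
bin/flume-ng version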

At this point, all the tools required for the data collection pipeline have been installed.

