Setting up a ZooKeeper + Kafka cluster
I. Environment and preparation
Cluster environment:
Software versions:
Pre-deployment steps:
Disable the firewall and SELinux (in production, enable or disable them according to your own policy).
Synchronize server time, using either a public NTP server or one you run yourself.
[root@es1 ~]# crontab -l #for convenience, a public time server is used here
#update time
*/5 * * * * /usr/bin/rdate -s time-b.nist.gov &>/dev/null
II. Installing and configuring the ZooKeeper cluster
1. Install the Java runtime (all three nodes)
Install the JDK
[root@node01 ~]# rpm -ivh jdk1.8.0_162-x64.rpm #install 1.8 now to avoid upgrade trouble later
Preparing... ########################################### [100%]
1:jdk1.8.0_162 ########################################### [100%]
Set up the Java environment
[root@node01 ~]# cat /etc/profile.d/java.sh #edit the Java environment file
export JAVA_HOME=/usr/java/latest
export CLASSPATH=$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
[root@node01 ~]# . /etc/profile.d/java.sh
[root@node01 ~]# java -version #verify the configuration
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
2. Install and configure ZooKeeper
[root@node01 ~]# wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.4.13/zookeeper-3.4.13.tar.gz #if this mirror is unavailable, download from the official releases page: https://zookeeper.apache.org/releases.html
[root@node01 ~]#tar xf zookeeper-3.4.13.tar.gz -C /usr/local
[root@node01 ~]#cd /usr/local
[root@node01 local]#ln -sv zookeeper-3.4.13 zookeeper
[root@node01 local]#cd zookeeper/conf
[root@node01 conf]# cp zoo_sample.cfg zoo.cfg
[root@node01 conf]# vim zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Data/zookeeper
clientPort=2181
server.1=172.16.150.154:2888:3888
server.2=172.16.150.155:2888:3888
server.3=172.16.150.156:2888:3888
#Parameter descriptions:
tickTime: the heartbeat interval. Every tickTime, a heartbeat is exchanged between client and server and between servers; the heartbeats both monitor liveness and pace follower-to-leader communication. Default 2000 ms.
initLimit: the maximum number of heartbeats (in tickTime units) a follower (F) may take to complete its initial connection and sync with the leader (L).
syncLimit: the maximum number of ticks a follower (F) may take to respond to a request from the leader (L).
dataDir: the directory holding the myid file along with snapshots, transaction logs, and the server's unique ID information.
clientPort: the port clients use to connect; ZooKeeper listens here for client requests. Default 2181.
server.N=YYY:A:B
N: the server number (the value in that node's myid file)
YYY: the server address
A: the port for follower-to-leader communication, i.e. internal cluster traffic (default 2888)
B: the leader-election port (default 3888)
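The effective timeouts these settings imply are simple multiples of tickTime, which is worth checking before changing any of them. A quick calculation using the zoo.cfg values above:

```shell
# Derive the effective ZooKeeper timeouts (in ms) from the zoo.cfg values above.
tickTime=2000   # heartbeat interval in ms
initLimit=10    # ticks allowed for a follower's initial sync with the leader
syncLimit=5     # ticks a follower may take to answer the leader

echo "initial sync timeout: $(( tickTime * initLimit )) ms"    # 20000 ms
echo "follower reply timeout: $(( tickTime * syncLimit )) ms"  # 10000 ms
```

With the defaults, a follower therefore has 20 seconds to finish its initial sync and 10 seconds to answer the leader before it is considered dead.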
Create the directory and myid file ZooKeeper needs
[root@node01 conf]# mkdir -pv /Data/zookeeper
mkdir: created directory "/Data"
mkdir: created directory "/Data/zookeeper"
[root@node01 conf]# echo "1" > /Data/zookeeper/myid #the myid file holds a number identifying this host; ZooKeeper will not start without it
The other nodes are configured identically, except:
echo "x" > /Data/zookeeper/myid #unique per node
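If passwordless SSH between the nodes is already set up, the per-node myid step can be scripted instead of repeated by hand. A sketch (dry run: it only prints the commands; the IPs are the ones used in this guide, so adjust them to your environment):

```shell
# Print the command that writes each node's unique myid.
# The index in server.N (zoo.cfg) must match the number in that node's myid.
nodes=(172.16.150.154 172.16.150.155 172.16.150.156)
for i in "${!nodes[@]}"; do
    id=$(( i + 1 ))   # server.1 -> first node, server.2 -> second, and so on
    echo "ssh root@${nodes[$i]} 'mkdir -p /Data/zookeeper && echo ${id} > /Data/zookeeper/myid'"
done
```

Remove the outer echo (run the ssh commands directly) to apply it for real.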
3. Start ZooKeeper (all three nodes)
[root@node01 zookeeper]# cd /usr/local/zookeeper/bin
[root@node01 bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@node01 bin]# tailf zookeeper.out
2019-02-13 14:05:28,088 [myid:] - INFO [main:QuorumPeerConfig@136] - Reading configuration from: /usr/local/zookeeper/bin/../conf/zoo.cfg
2019-02-13 14:05:28,102 [myid:] - INFO [main:QuorumPeer$QuorumServer@184] - Resolved hostname: 172.16.150.154 to address: /172.16.150.154
2019-02-13 14:05:28,102 [myid:] - INFO [main:QuorumPeer$QuorumServer@184] - Resolved hostname: 172.16.150.156 to address: /172.16.150.156
2019-02-13 14:05:28,103 [myid:] - INFO [main:QuorumPeer$QuorumServer@184] - Resolved hostname: 172.16.150.155 to address: /172.16.150.155
2019-02-13 14:05:28,103 [myid:] - INFO [main:QuorumPeerConfig@398] - Defaulting to majority quorums
2019-02-13 14:05:28,108 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2019-02-13 14:05:28,108 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2019-02-13 14:05:28,108 [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2019-02-13 14:05:28,119 [myid:1] - INFO [main:QuorumPeerMain@130] - Starting quorum peer
2019-02-13 14:05:28,128 [myid:1] - INFO [main:ServerCnxnFactory@117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2019-02-13 14:05:28,134 [myid:1] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2019-02-13 14:05:28,144 [myid:1] - INFO [main:QuorumPeer@1158] - tickTime set to 2000
2019-02-13 14:05:28,144 [myid:1] - INFO [main:QuorumPeer@1204] - initLimit set to 10
2019-02-13 14:05:28,144 [myid:1] - INFO [main:QuorumPeer@1178] - minSessionTimeout set to -1
2019-02-13 14:05:28,144 [myid:1] - INFO [main:QuorumPeer@1189] - maxSessionTimeout set to -1
2019-02-13 14:05:28,151 [myid:1] - INFO [main:QuorumPeer@1467] - QuorumPeer communication is not secured!
2019-02-13 14:05:28,153 [myid:1] - INFO [main:QuorumPeer@1496] - quorum.cnxn.threads.size set to 20
2019-02-13 14:05:28,196 [myid:1] - INFO [ListenerThread:QuorumCnxManager$Listener@736] - My election bind port: /172.16.150.154:3888
........
Check the ZooKeeper services
[root@node01 bin]# netstat -nlpt | grep -E "2181|2888|3888"
tcp        0      0 0.0.0.0:2181             0.0.0.0:*      LISTEN      6242/java
tcp        0      0 172.16.150.154:3888      0.0.0.0:*      LISTEN      6242/java
[root@node02 ~]# netstat -nlpt | grep -E "2181|2888|3888"
tcp        0      0 0.0.0.0:2181             0.0.0.0:*      LISTEN      5197/java
tcp        0      0 172.16.150.155:3888      0.0.0.0:*      LISTEN      5197/java
[root@node03 ~]# netstat -nlpt | grep -E "2181|2888|3888"
tcp        0      0 0.0.0.0:2181             0.0.0.0:*      LISTEN      5304/java
tcp        0      0 172.16.150.156:2888      0.0.0.0:*      LISTEN      5304/java #only the leader listens on 2888, so node03 is currently the leader
tcp        0      0 172.16.150.156:3888      0.0.0.0:*      LISTEN      5304/java
Test that the servers respond
[root@node01 bin]# yum install telnet nc -y
[root@node01 bin]# telnet 172.16.150.154 2181
Trying 172.16.150.154...
Connected to 172.16.150.154.
Escape character is '^]'.
exit
Connection closed by foreign host.
[root@node01 bin]# echo "stat"|nc 172.16.150.154 2181 #conf shows the configuration, cons shows details of every client connection, and mntr is more detailed than stat
Zookeeper version: 3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 04:05 GMT
Clients:
/172.16.150.154:54989[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x1000000d4
Mode: follower
Node count: 138
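These four-letter commands also make it easy to script a health check across the whole ensemble. A minimal sketch, assuming nc is installed and using the node IPs from this guide:

```shell
# Report each node's role by sending the "stat" four-letter command.
zk_mode() {                     # extract the value of the "Mode:" line on stdin
    awk -F': ' '/^Mode:/ {print $2}'
}
for host in 172.16.150.154 172.16.150.155 172.16.150.156; do
    mode=$(echo stat | nc -w 2 "$host" 2181 2>/dev/null | zk_mode)
    echo "${host}: ${mode:-unreachable}"
done
```

Exactly one node should report leader and the rest follower; an "unreachable" entry means that node's ZooKeeper is down or filtered.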
Check the cluster state
[root@node01 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower #either follower or leader
Connect to ZooKeeper
[root@node01 bin]# ./zkCli.sh -server 172.16.150.154:2181
Connecting to 172.16.150.154:2181
2019-02-13 14:25:24,060 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 04:05 GMT
....
[zk: 172.16.150.154:2181(CONNECTED) 0] h #show the command help
ZooKeeper -server host:port cmd args
stat path [watch]
set path data [version]
ls path [watch]
delquota [-n|-b] path
ls2 path [watch]
setAcl path acl
setquota -n|-b val path
history
redo cmdno
printwatches on|off
delete path [version]
sync path
listquota path
rmr path
get path [watch]
create [-s] [-e] path data acl
addauth scheme auth
quit
getAcl path
close
connect host:port
[zk: 172.16.150.154:2181(CONNECTED) 1] quit #exit
Set up jconsole access to ZooKeeper
[root@node01 bin]# vim zkServer.sh #edit line 54; 172.16.150.154 is this host's IP, 8899 is the port jconsole will connect to; SSL and authentication are disabled
ZOOMAIN="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=$JMXLOCALONLY -Djava.rmi.server.hostname=172.16.150.154 -Dcom.sun.management.jmxremote.port=8899 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false org.apache.zookeeper.server.quorum.QuorumPeerMain"
./zkServer.sh stop && ./zkServer.sh start #restart the service, then open jconsole, choose a remote connection, and enter 172.16.150.154:8899
#log in with jconsole
Enable the ZooKeeper super user #for ZooKeeper ACL details, see the official documentation
What if you set permissions on a znode and then forget the password? If the authorized username and password for a znode are lost, they cannot practically be recovered, because ZooKeeper stores only the Base64-encoded SHA-1 digest of the credentials, so control of that znode is effectively gone. Fortunately, ZooKeeper provides a super-administrator mechanism.
[root@node01 bin]# cd /usr/local/zookeeper/lib/
[root@node01 lib]# java -cp ../zookeeper-3.4.13.jar:./log4j-1.2.17.jar:./slf4j-api-1.7.25.jar:./slf4j-log4j12-1.7.25.jar org.apache.zookeeper.server.auth.DigestAuthenticationProvider super:super
super:super->super:gG7s8t3oDEtIqF6DM9LlI/R+9Ss= #the generated digest
[root@node01 lib]# vim ../bin/zkServer.sh
SUPER_ACL="-Dzookeeper.DigestAuthenticationProvider.superDigest=super:gG7s8t3oDEtIqF6DM9LlI/R+9Ss="
#add the line above
Verify that the super user works
[root@node01 lib]# cd ../bin/
[root@node01 bin]# ./zkServer.sh stop #restart the service after editing the configuration
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
[root@node01 bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@node01 bin]# ./zkCli.sh -server 172.16.150.154
......
[zk: 172.16.150.154(CONNECTED) 0] addauth digest super:super #authenticate as the super user added above
[zk: 172.16.150.154(CONNECTED) 1] quit
III. Installing the Kafka cluster
Kafka also requires Java. Since it runs on the same machines as ZooKeeper, which already have Java installed, that step can be skipped here.
1. Install Kafka
[root@node01 ~]#wget http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.0.1/kafka_2.11-2.0.1.tgz
[root@node01 ~]#tar xf kafka_2.11-2.0.1.tgz -C /usr/local
[root@node01 ~]#cd /usr/local
[root@node01 local]# ln -sv kafka_2.11-2.0.1 kafka
[root@node01 local]# cd kafka/config/
[root@node01 config]#cp server.properties server.properties-bak
[root@node01 config]# grep "^[a-Z]" server.properties
broker.id=1 #unique per node
listeners=PLAINTEXT://172.16.150.154:9092 #change to this host's address
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/Data/kafka-logs #data directory; the kafka-logs directory is created automatically
num.partitions=3
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=172.16.150.154:2181,172.16.150.155:2181,172.16.150.156:2181 #ZooKeeper cluster addresses, comma-separated
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
The other nodes are configured identically, except:
broker.id=x #unique per node
listeners=PLAINTEXT://x.x.x.x:9092 #this host's own address
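Writing these per-node overrides by hand is error-prone, so they can be templated from the node list instead. A sketch that writes the overrides into a temp directory (so it can run anywhere; on the real nodes you would merge each file into that node's server.properties):

```shell
# Generate per-node broker.id/listeners overrides from the node list.
tmpdir=$(mktemp -d)
nodes=(172.16.150.154 172.16.150.155 172.16.150.156)
for i in "${!nodes[@]}"; do
    id=$(( i + 1 ))
    cat > "${tmpdir}/server-${id}.properties" <<EOF
broker.id=${id}
listeners=PLAINTEXT://${nodes[$i]}:9092
EOF
done
grep -H broker.id "${tmpdir}"/server-*.properties   # show what was generated
```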
Start the service
[root@node01 config]# cd ../bin
[root@node01 bin]# ./kafka-server-start.sh -daemon ../config/server.properties #run in the background
Verify that the service is running
Check from ZooKeeper:
[zk: 172.16.150.154(CONNECTED) 5] get /brokers/ids/1 #view the info for broker id 1
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://172.16.150.154:9092"],"jmx_port":-1,"host":"172.16.150.154","timestamp":"1549953312989","port":9092,"version":4}
cZxid = 0x10000002e
ctime = Tue Feb 12 14:35:13 CST 2019
mZxid = 0x10000002e
mtime = Tue Feb 12 14:35:13 CST 2019
pZxid = 0x10000002e
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x10077feb7bc0001
dataLength = 198
numChildren = 0
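The advertised endpoint can also be extracted from that metadata in a script. A sketch using only sed (it assumes the single-line JSON layout shown above; in practice you would pipe the output of `./zkCli.sh get /brokers/ids/1` through it instead of the sample string used here):

```shell
# Extract the first advertised endpoint from broker metadata JSON.
broker_endpoint() {
    sed -n 's/.*"endpoints":\["\([^"]*\)"\].*/\1/p'
}
# Sample metadata in the same shape as the zkCli output above.
meta='{"endpoints":["PLAINTEXT://172.16.150.154:9092"],"host":"172.16.150.154","port":9092}'
echo "$meta" | broker_endpoint    # PLAINTEXT://172.16.150.154:9092
```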
Create a topic to verify
#create a producer on 154
[root@node01 ~]# cd /usr/local/kafka/bin/
[root@node01 bin]# ./kafka-topics.sh --create --zookeeper 172.16.150.154:2181 --replication-factor 1 --partitions 1 --topic Test
Created topic "Test".
[root@node01 bin]# ./kafka-console-producer.sh --broker-list 172.16.150.154:9092 --topic Test
#create a consumer on another server
[root@node02 ~]# cd /usr/local/kafka/bin/
[root@node02 bin]# ./kafka-console-consumer.sh --bootstrap-server 172.16.150.155:9092 --topic Test --from-beginning
#once both are running, type anything on 154 and check that it appears on the other machine
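The interactive round trip above can also be run non-interactively, which is handy for automated smoke tests. This helper only prints the two commands (a dry run, so nothing here requires a live broker); drop the outer echo to execute them on a node where Kafka is installed. The --max-messages and --timeout-ms flags keep the consumer from blocking forever:

```shell
# Print a non-interactive produce/consume round trip for a topic.
kafka_roundtrip() {
    local broker=$1 topic=$2 msg=$3
    echo "echo '${msg}' | /usr/local/kafka/bin/kafka-console-producer.sh --broker-list ${broker} --topic ${topic}"
    echo "/usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server ${broker} --topic ${topic} --from-beginning --max-messages 1 --timeout-ms 10000"
}
kafka_roundtrip 172.16.150.154:9092 Test "hello kafka"
```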
IV. Monitoring tools for ZooKeeper and Kafka
1. ZooKeeper monitoring (not installed here; see the official documentation if needed)
ZooKeeper monitoring tool: https://github.com/soabase/exhibitor
2. Kafka monitoring tools
1) KafkaOffsetMonitor
[root@node01 ~]# mkdir KafkaMonitor
[root@node01 ~]# cd KafkaMonitor/
[root@node01 KafkaMonitor]# wget https://github.com/quantifind/KafkaOffsetMonitor/releases/download/v0.2.1/KafkaOffsetMonitor-assembly-0.2.1.jar
[root@node01 KafkaMonitor]# nohup java -cp KafkaOffsetMonitor-assembly-0.2.1.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb --zk 172.16.150.154:2181,172.16.150.155:2181,172.16.150.156:2181 --port 8088 --refresh 5.seconds --retain 1.days &
Open the web UI (since the test environment has no data, the screenshots below are from a production environment):
View past consumer activity
View the details of any single consumer
Note the lag column, which shows whether consumption is falling behind
View topics
2) kafka-manager
[root@node01 ~]# unzip kafka-manager-1.3.3.7.zip #use a pre-built package (link: https://pan.baidu.com/s/12sswyPo7-e9R3mZQ3ba-dA extraction code: jz6s)
[root@node01 ~]# cd kafka-manager-1.3.3.7
[root@node01 kafka-manager-1.3.3.7]# cd conf/
[root@node01 conf]# vim application.conf
kafka-manager.zkhosts="172.16.150.154:2181,172.16.150.155:2181,172.16.150.156:2181" #the ZooKeeper server addresses and ports
[root@node01 conf]# cd ../bin/
[root@node01 bin]# ./kafka-manager -Dconfig.file=../conf/application.conf -Dhttp.port=8888 #8888 is the listening port; browse to it once started
#building kafka-manager from source is complex and failure-prone, so using a package someone else has already built is recommended
3) Kafka Eagle (since renamed EFAK)
Prerequisites:
1. A Java environment matching the EFAK version; the current version requires JDK 1.8 (installation omitted)
2. MySQL 5.7 (installation omitted); create a database named ke
1. Download
wget https://github.com/smartloli/kafka-eagle-bin/archive/v3.0.1.tar.gz
tar xf v3.0.1.tar.gz
cd kafka-eagle-bin-3.0.1/
tar xf efak-web-3.0.1-bin.tar.gz
mv efak-web-3.0.1 /usr/local/efak
2. Set the environment variables
vim /etc/profile.d/efak.sh
export KE_HOME=/usr/local/efak
export PATH=$PATH:$KE_HOME/bin
. /etc/profile.d/efak.sh
3. Configure the service
Standalone deployment mode
cd /usr/local/efak/conf/
cp system-config.properties system-config.properties-bak
vim system-config.properties
5 efak.zk.cluster.alias=cluster1 #any name you like
6 cluster1.zk.list=node1:2181,node2:2181,node3:2181 #the ZooKeeper addresses
7 #cluster2.zk.list=xdn10:2181,xdn11:2181,xdn12:2181 #comment this line out
126 efak.username=root #the database username and password
127 efak.password=Qwer@123
cd ../bin
vim ke.sh
export JAVA_HOME=/usr/local/java
./ke.sh start #start the service
Home page
Distributed deployment mode
When managing multiple Kafka clusters, or one large cluster, a single-node EFAK deployment runs many scheduled threads to collect metrics on consumers, topics, producers, brokers, and ZooKeeper. On a low-spec server this generates heavy load, especially on the CPU. To address this, EFAK supports a distributed deployment in which several low-spec servers form an EFAK cluster that monitors and manages the Kafka clusters together.
1. Synchronize the environment variables to all nodes
2. Synchronize the installation directory
3. Edit the system-config.properties configuration
efak.distributed.enable=true #enable distributed mode
efak.cluster.mode.status=master #set this to slave on non-master nodes
efak.worknode.master.host=192.168.1.131 #the master's address, identical on every node
4. On the master, edit the works file in the conf directory and list the slave addresses; the master must be able to SSH to the slaves without a password
slave1
slave2
5. Start the service
ke.sh cluster start
4. Configure JMX for Kafka
1. Edit Kafka's bin/kafka-run-class.sh, keeping the following JMX section and deleting the rest of it
# JMX settings
if [ -z "$KAFKA_JMX_OPTS" ]; then
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false "
fi
# JMX port to use
JMX_PORT=9999
JMX_RMI_PORT=9998
ISKAFKASERVER="false"
if [[ "$*" =~ "kafka.Kafka" ]]; then
ISKAFKASERVER="true"
fi
if [ $JMX_PORT ] && [ "true" == "$ISKAFKASERVER" ]; then
KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT -Dcom.sun.management.jmxremote.rmi.port=$JMX_RMI_PORT "
echo set KAFKA_JMX_PORT:$KAFKA_JMX_OPTS
fi
That is, delete these lines:
if [ $JMX_PORT ]; then
KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT "
fi
2. Push the change to every node and restart the Kafka cluster
3. Restart EFAK
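After the restart, the JMX listeners can be verified from any host before pointing EFAK at them. A sketch assuming nc is available and the ports configured above:

```shell
# Report whether a TCP port is accepting connections.
check_port() {
    if nc -z -w 2 "$1" "$2" 2>/dev/null; then echo open; else echo closed; fi
}
for port in 9999 9998; do    # JMX_PORT and JMX_RMI_PORT from kafka-run-class.sh
    echo "172.16.150.154:${port} -> $(check_port 172.16.150.154 "$port")"
done
```

Both ports should report open on every broker; a closed port usually means the edited kafka-run-class.sh was not synchronized to that node or the broker was not restarted.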
References:
https://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html
https://blog.csdn.net/pdw2009/article/details/73794525
https://blog.csdn.net/lizhitao/article/details/25667831
https://www.cnblogs.com/dadonggg/p/8242682.html
https://www.cnblogs.com/dadonggg/p/8205302.html