Deploying a RabbitMQ 3.11.11 cluster on Kubernetes
The cluster is deployed with the official RabbitMQ Docker image and the rabbitmq-peer-discovery-k8s plugin.
0. Environment
Kubernetes 1.24
RabbitMQ 3.11.11
1. Namespace
All RabbitMQ resources live in the rabbitmq namespace.
Namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: rabbitmq
2. Configuration
A ConfigMap is used to mount the configuration files into the rabbitmq container.
Config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: rabbitmq
data:
  enabled_plugins: |
    [rabbitmq_management,rabbitmq_mqtt,rabbitmq_web_mqtt,rabbitmq_peer_discovery_k8s].
  rabbitmq.conf: |
    ## Cluster formation. See https://www.rabbitmq.com/cluster-formation.html to learn more.
    cluster_formation.peer_discovery_backend = k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    ## Service name is rabbitmq by default but can be overridden using the cluster_formation.k8s.service_name key if needed
    cluster_formation.k8s.service_name = rabbitmq-internal
    ## It is possible to append a suffix to peer hostnames returned by Kubernetes using cluster_formation.k8s.hostname_suffix
    cluster_formation.k8s.hostname_suffix = .rabbitmq-internal.rabbitmq.svc.cluster.local
    ## Should RabbitMQ node name be computed from the pod's hostname or IP address?
    ## IP addresses are not stable, so using [stable] hostnames is recommended when possible.
    ## Set to "hostname" to use pod hostnames.
    ## When this value is changed, so should the variable used to set the RABBITMQ_NODENAME
    ## environment variable.
    cluster_formation.k8s.address_type = hostname
    ## How often should node cleanup checks run?
    cluster_formation.node_cleanup.interval = 30
    ## Set to false if automatic removal of unknown/absent nodes
    ## is desired. This can be dangerous, see
    ## * https://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup
    ## * https://groups.google.com/forum/#!msg/rabbitmq-users/wuOfzEywHXo/k8z_HWIkBgAJ
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal
    ## See https://www.rabbitmq.com/ha.html#master-migration-data-locality
    queue_master_locator=min-masters
    ## This is just an example.
    ## This enables remote access for the default user with well known credentials.
    ## Consider deleting the default user and creating a separate user with a set of generated
    ## credentials instead.
    ## Learn more at https://www.rabbitmq.com/access-control.html#loopback-users
    loopback_users.guest = false
    ## https://www.rabbitmq.com/memory.html#configuring-threshold
    vm_memory_high_watermark.relative = 0.6
    ## On first start RabbitMQ will create a vhost and a user. These
    ## config items control what gets created.
    ## Relevant doc guide: https://rabbitmq.com/access-control.html
    ##
    default_vhost = /
    default_user = system
    default_pass = rbmqu0101081710
    # =======================================
    # MQTT section
    # =======================================
    ## TCP listener settings.
    ##
    # mqtt.listeners.tcp.1 = 127.0.0.1:61613
    # mqtt.listeners.tcp.2 = ::1:61613
    mqtt.listeners.tcp.default = 1883
    ## Set the default user name and password used for anonymous connections (when client
    ## provides no credentials). Anonymous connections are highly discouraged!
    ##
    mqtt.default_user = mqtt_admin
    mqtt.default_pass = rbmqmqtt_07231816
    ## Enable anonymous connections. If this is set to false, clients MUST provide
    ## credentials in order to connect. See also the mqtt.default_user/mqtt.default_pass
    ## keys. Anonymous connections are highly discouraged!
    ##
    mqtt.allow_anonymous = false
    ## If you have multiple vhosts, specify the one to which the
    ## adapter connects.
    ##
    mqtt.vhost = /
    ## Specify the exchange to which messages from MQTT clients are published.
    ##
    mqtt.exchange = exchange_mqtt_topic
    ## Specify TTL (time to live) to control the lifetime of non-clean sessions.
    ##
    mqtt.subscription_ttl = 1800000
    ## Set the prefetch count (governing the maximum number of unacknowledged
    ## messages that will be delivered).
    ##
    mqtt.prefetch = 10
This file configures the account, cluster, and MQTT settings. (I have only just tested the MQTT part, and subscribing currently fails.)
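The hostname_suffix and address_type settings only help if every node name follows the same pattern as the RABBITMQ_NODENAME value set in the StatefulSet. As a rough sketch, the names that peer discovery should end up with can be printed as follows. The node_names helper is hypothetical, and it assumes the cluster.local cluster domain used throughout this article:

```shell
# Print the RabbitMQ node name each StatefulSet replica is expected to use.
# Arguments: StatefulSet name, headless Service name, namespace, replica count.
# (node_names is an illustrative helper, not part of RabbitMQ or kubectl.)
node_names() {
  sts="$1"; svc="$2"; ns="$3"; replicas="$4"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # StatefulSet pods are named <statefulset>-<ordinal>, starting at 0.
    printf 'rabbit@%s-%s.%s.%s.svc.cluster.local\n' "$sts" "$i" "$svc" "$ns"
    i=$((i + 1))
  done
}

node_names rabbitmq rabbitmq-internal rabbitmq 3
```

If the names reported by rabbitmqctl cluster_status differ from this list, the suffix in rabbitmq.conf and the node name in the StatefulSet have probably drifted apart.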
3. Secret
A Secret passes the Erlang cookie and the default user credentials to the container as environment variables.
Secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-secret
  namespace: rabbitmq
type: Opaque
data:
  RABBITMQ_ERLANG_COOKIE: MTIzajE5dWVkYXM3ZGFkODEwMjNqMTM5ZGph
  # base64 of "system" (6 bytes, so no "=" padding)
  RABBITMQ_DEFAULT_USER: c3lzdGVt
  # base64 of "rbmqu0101081710" (15 bytes, so no "=" padding)
  RABBITMQ_DEFAULT_PASS: cmJtcXUwMTAxMDgxNzEw
The values under data must be base64-encoded. Kubernetes decodes them automatically before exposing them as environment variables in the pod.
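A small sketch of producing and checking such values from a shell. The encode_secret and decode_secret helpers are made up for illustration. printf '%s' is used instead of echo so that no trailing newline ends up in the encoded value, and the local base64 tool is assumed to accept --decode (GNU coreutils and macOS both do):

```shell
# Encode a plain value for the `data:` field of a Secret.
# tr removes the line wrapping that some base64 implementations add.
encode_secret() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a value again to confirm that it round-trips.
decode_secret() {
  printf '%s' "$1" | base64 --decode
}

encode_secret 'system'; echo            # c3lzdGVt
encode_secret 'rbmqu0101081710'; echo   # cmJtcXUwMTAxMDgxNzEw
decode_secret 'c3lzdGVt'; echo          # system
```

Padding characters are part of the encoding, not decoration: appending extra "=" to a value that does not need them makes it invalid base64, and the API server rejects the Secret.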
4. RBAC
rabbitmq-peer-discovery needs RBAC permission to read endpoints so that it can discover the cluster nodes automatically.
Rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
  namespace: rabbitmq
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rabbitmq-peer-discovery-rbac
  namespace: rabbitmq
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rabbitmq-peer-discovery-rbac
  namespace: rabbitmq
subjects:
  - kind: ServiceAccount
    name: rabbitmq
    namespace: rabbitmq
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rabbitmq-peer-discovery-rbac
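5. Services
A headless Service acts as the service entry point for the StatefulSet. A second Service of type NodePort exposes MQTT, AMQP, and the management UI outside the cluster.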
Service.yaml
kind: Service
apiVersion: v1
metadata:
  namespace: rabbitmq
  name: rabbitmq-internal
  labels:
    app: rabbitmq
spec:
  clusterIP: None
  ports:
    - name: mqtt
      protocol: TCP
      port: 1883
    - name: epmd
      protocol: TCP
      port: 4369
    - name: amqp
      protocol: TCP
      port: 5672
    - name: amqp-tls
      protocol: TCP
      port: 5671
    - name: http
      protocol: TCP
      port: 15672
    - name: inter-node-cli
      protocol: TCP
      port: 25672
  selector:
    app: rabbitmq
---
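kind: Service
apiVersion: v1
metadata:
  namespace: rabbitmq
  name: rabbitmq
  labels:
    app: rabbitmq
spec:
  type: NodePort
  # The nodePort values below are outside the default NodePort range (30000-32767).
  # They are accepted only if the API server's --service-node-port-range includes them.
  ports:
    - name: mqtt
      protocol: TCP
      port: 1883
      nodePort: 1883
    - name: amqp
      protocol: TCP
      port: 5672
      nodePort: 5672
    - name: http
      protocol: TCP
      port: 15672
      nodePort: 15672
  selector:
    app: rabbitmq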
6. Persistent volumes
These volumes are where the StatefulSet stores its data.
PersistentVolume.yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbit-pv1
  labels:
    type: rabbitmq
spec:
  storageClassName: rabbitmq
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/opt/rabbitmq_data1"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbit-pv2
  labels:
    type: rabbitmq
spec:
  storageClassName: rabbitmq
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/opt/rabbitmq_data2"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbit-pv3
  labels:
    type: rabbitmq
spec:
  storageClassName: rabbitmq
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/opt/rabbitmq_data3"
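7. StatefulSet deployment
Following the approach recommended in the official clustering guide, RabbitMQ is deployed as a StatefulSet. Data is kept on persistent volumes that each replica claims through volumeClaimTemplates.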
Statefulset.yaml
apiVersion: apps/v1
# See the Prerequisites section of https://www.rabbitmq.com/cluster-formation.html#peer-discovery-k8s.
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: rabbitmq
spec:
  serviceName: rabbitmq-internal
  # Three nodes is the recommended minimum. Some features may require a majority of nodes
  # to be available.
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      nodeSelector:
        # Use Linux nodes in a mixed OS kubernetes cluster.
        # Learn more at https://kubernetes.io/docs/reference/kubernetes-api/labels-annotations-taints/#kubernetes-io-os
        kubernetes.io/os: linux
      initContainers:
        - name: fix-readonly-config
          image: busybox:1.31.1
          command:
            - sh
            - -c
            - cp /tmp/config/* /etc/rabbitmq;
          volumeMounts:
            - name: rabbitmq-config
              mountPath: /etc/rabbitmq
            - name: tmp-dir
              mountPath: /tmp/config
      containers:
        - name: rabbitmq
          image: rabbitmq:3.11.11
          # Learn more about what ports various protocols use
          # at https://www.rabbitmq.com/networking.html#ports
          ports:
            - name: mqtt
              protocol: TCP
              containerPort: 1883
            - name: epmd
              protocol: TCP
              containerPort: 4369
            - name: amqp
              protocol: TCP
              containerPort: 5672
            - name: amqp-tls
              protocol: TCP
              containerPort: 5671
            - name: http
              protocol: TCP
              containerPort: 15672
          livenessProbe:
            exec:
              # This is just an example. There is no "one true health check" but rather
              # several rabbitmq-diagnostics commands that can be combined to form increasingly comprehensive
              # and intrusive health checks.
              # Learn more at https://www.rabbitmq.com/monitoring.html#health-checks.
              #
              # Stage 2 check:
              command: ["rabbitmq-diagnostics", "status"]
            initialDelaySeconds: 60
            # See https://www.rabbitmq.com/monitoring.html for monitoring frequency recommendations.
            periodSeconds: 60
            timeoutSeconds: 15
          readinessProbe:
            exec:
              # This is just an example. There is no "one true health check" but rather
              # several rabbitmq-diagnostics commands that can be combined to form increasingly comprehensive
              # and intrusive health checks.
              # Learn more at https://www.rabbitmq.com/monitoring.html#health-checks.
              #
              # Stage 2 check:
              command: ["rabbitmq-diagnostics", "status"]
              # To use a stage 4 check:
              # command: ["rabbitmq-diagnostics", "check_port_connectivity"]
            initialDelaySeconds: 20
            periodSeconds: 60
            timeoutSeconds: 10
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: RABBITMQ_NODENAME
              value: rabbit@$(POD_NAME).rabbitmq-internal.$(POD_NAMESPACE).svc.cluster.local
            - name: RABBITMQ_USE_LONGNAME
              value: "true"
          envFrom:
            - secretRef:
                name: rabbitmq-secret
          volumeMounts:
            - name: rabbitmq-config
              mountPath: /etc/rabbitmq
            - name: rabbitmq-data
              mountPath: /var/lib/rabbitmq
      volumes:
        - name: rabbitmq-config
          emptyDir: {}
        - name: tmp-dir
          configMap:
            name: rabbitmq-config
  volumeClaimTemplates:
    - metadata:
        name: rabbitmq-data
        namespace: rabbitmq
        labels:
          app: rabbitmq
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: rabbitmq
        resources:
          requests:
            storage: 10Gi
8. Deployment
sudo kubectl create -f Namespace.yaml
sudo kubectl create -f PersistentVolume.yaml
sudo kubectl create -f Rbac.yaml
sudo kubectl create -f Secret.yaml
sudo kubectl create -f Config.yaml
sudo kubectl create -f Statefulset.yaml
sudo kubectl create -f Service.yaml
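The same steps can be wrapped in a small function so that the run stops at the first failure and then waits for the pods. This is only a sketch under a few assumptions: deploy_all and the KUBECTL variable are hypothetical names, kubectl apply is swapped in for kubectl create so the script can be re-run, and Service.yaml is applied before Statefulset.yaml because the pod DNS names come from the headless Service (the order listed above also worked in this setup):

```shell
# Command used to talk to the cluster; override with e.g. KUBECTL="sudo kubectl".
KUBECTL="${KUBECTL:-kubectl}"

# Apply every manifest from this article in order, stopping at the first error,
# then wait until all StatefulSet replicas report ready.
deploy_all() {
  for f in Namespace.yaml PersistentVolume.yaml Rbac.yaml Secret.yaml \
           Config.yaml Service.yaml Statefulset.yaml; do
    $KUBECTL apply -f "$f" || return 1
  done
  $KUBECTL rollout status statefulset/rabbitmq -n rabbitmq --timeout=600s
}

# Usage (run from the directory that holds the manifests):
#   deploy_all
```

Setting KUBECTL=echo before calling deploy_all prints the commands without touching the cluster, which is a cheap way to review the order first.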
9. View the pods
qiteck@server:~/program/rabbitmq/3.11.11/k8s$ sudo kubectl get pods -n rabbitmq -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP             NODE     NOMINATED NODE   READINESS GATES
rabbitmq-0   1/1     Running   0          32m   10.244.1.133   server   <none>           <none>
rabbitmq-1   1/1     Running   0          31m   10.244.1.134   server   <none>           <none>
rabbitmq-2   1/1     Running   0          30m   10.244.1.135   server   <none>           <none>
10. View the services
qiteck@server:~/program/rabbitmq/3.11.11/k8s$ sudo kubectl get service -n rabbitmq -o wide
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                   AGE   SELECTOR
rabbitmq            NodePort    10.96.172.247   <none>        1883:1883/TCP,5672:5672/TCP,15672:15672/TCP               32m   app=rabbitmq
rabbitmq-internal   ClusterIP   None            <none>        1883/TCP,4369/TCP,5672/TCP,5671/TCP,15672/TCP,25672/TCP   32m   app=rabbitmq
11. View in the management UI
12. Check the cluster
12.1. rabbitmqctl cluster_status
root@rabbitmq-0:/# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local ...
Basics
Cluster name: rabbit@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local
Total CPU cores available cluster-wide: 6
Disk Nodes
rabbit@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local
rabbit@rabbitmq-1.rabbitmq-internal.rabbitmq.svc.cluster.local
rabbit@rabbitmq-2.rabbitmq-internal.rabbitmq.svc.cluster.local
Running Nodes
rabbit@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local
rabbit@rabbitmq-1.rabbitmq-internal.rabbitmq.svc.cluster.local
rabbit@rabbitmq-2.rabbitmq-internal.rabbitmq.svc.cluster.local
Versions
rabbit@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local: RabbitMQ 3.11.11 on Erlang 25.3
rabbit@rabbitmq-1.rabbitmq-internal.rabbitmq.svc.cluster.local: RabbitMQ 3.11.11 on Erlang 25.3
rabbit@rabbitmq-2.rabbitmq-internal.rabbitmq.svc.cluster.local: RabbitMQ 3.11.11 on Erlang 25.3
CPU Cores
Node: rabbit@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local, available CPU cores: 2
Node: rabbit@rabbitmq-1.rabbitmq-internal.rabbitmq.svc.cluster.local, available CPU cores: 2
Node: rabbit@rabbitmq-2.rabbitmq-internal.rabbitmq.svc.cluster.local, available CPU cores: 2
Maintenance status
Node: rabbit@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local, status: not under maintenance
Node: rabbit@rabbitmq-1.rabbitmq-internal.rabbitmq.svc.cluster.local, status: not under maintenance
Node: rabbit@rabbitmq-2.rabbitmq-internal.rabbitmq.svc.cluster.local, status: not under maintenance
12.2. Check the startup log
tail -f /var/log/rabbitmq/rabbit\@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local.log:
Feature flags
Flag: classic_mirrored_queue_version, state: enabled
Flag: classic_queue_type_delivery_support, state: enabled
Flag: direct_exchange_routing_v2, state: enabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: feature_flags_v2, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: listener_records_in_ets, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: stream_single_active_consumer, state: enabled
Flag: tracking_records_in_ets, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
root@rabbitmq-0:/# tail -f /var/log/rabbitmq/rabbit\@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local.log
2023-03-31 08:33:03.820636+00:00 [info] <0.724.0> Server startup complete; 5 plugins started.
2023-03-31 08:33:03.820636+00:00 [info] <0.724.0> * rabbitmq_peer_discovery_k8s
2023-03-31 08:33:03.820636+00:00 [info] <0.724.0> * rabbitmq_peer_discovery_common
2023-03-31 08:33:03.820636+00:00 [info] <0.724.0> * rabbitmq_management
2023-03-31 08:33:03.820636+00:00 [info] <0.724.0> * rabbitmq_web_dispatch
2023-03-31 08:33:03.820636+00:00 [info] <0.724.0> * rabbitmq_management_agent
2023-03-31 08:34:03.957323+00:00 [info] <0.638.0> node 'rabbit@rabbitmq-1.rabbitmq-internal.rabbitmq.svc.cluster.local' up
2023-03-31 08:34:07.136896+00:00 [info] <0.638.0> rabbit on node 'rabbit@rabbitmq-1.rabbitmq-internal.rabbitmq.svc.cluster.local' up
2023-03-31 08:35:07.762592+00:00 [info] <0.638.0> node 'rabbit@rabbitmq-2.rabbitmq-internal.rabbitmq.svc.cluster.local' up
2023-03-31 08:35:10.314678+00:00 [info] <0.638.0> rabbit on node 'rabbit@rabbitmq-2.rabbitmq-internal.rabbitmq.svc.cluster.local' up
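13. Problems encountered
13.1. this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'
Many commands cannot be run until the service is up.
The Kubernetes PostStart hook runs immediately after the container is created, and at that point rabbit has not started yet.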
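One way around this is to poll until the node answers before running anything that needs the rabbit app. The retry helper below is a generic, hypothetical sketch rather than something RabbitMQ ships. The usage line assumes it runs inside the rabbitmq container and relies on rabbitmq-diagnostics check_running, which fails while the application is not running:

```shell
# Run a command repeatedly until it succeeds or the attempts are used up.
# Arguments: max attempts, delay in seconds between attempts, then the command.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while ! "$@"; do
    # Give up once the last allowed attempt has failed.
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage inside the container (up to 30 attempts, 5 seconds apart):
#   retry 30 5 rabbitmq-diagnostics -q check_running && rabbitmqctl list_users
```

The attempt count and delay are arbitrary. They should be sized to how long the node takes to boot in the given environment.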
13.2. PVC error: storageclass.storage.k8s.io "rabbitmq" not found: the access modes and storage class must match
The accessModes and storageClassName of each PersistentVolume must match those in the StatefulSet's volumeClaimTemplates.
13.3. Error: secret "rabbitmq-secret" not found:
The Secret does not exist. Creating it fixes the error.
13.4. Secret in version "v1" cannot be handled as a Secret: illegal base64 data at input byte 12
The values under data in the Secret must be valid base64.
13.5. Feature flags: `maintenance_mode_status`: required feature flag not enabled! It must be enabled before upgrading RabbitMQ.
It turned out that the cluster had been reconfigured without deleting the old PersistentVolume data. When the cluster is redeployed, the PersistentVolume data must be wiped completely.
13.6. Node 'rabbit@rabbitmq-1.rabbitmq-internal.rabbitmq.svc.cluster.local' thinks it's clustered with node 'rabbit@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local', but 'rabbit@rabbitmq-0.rabbitmq-internal.rabbitmq.svc.cluster.local' disagrees
The cause was the same: the cluster had been reconfigured without deleting the old PersistentVolume data. When the cluster is redeployed, the PersistentVolume data must be wiped completely.
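For reference, a hedged sketch of such a full reset. It is destructive and deletes all broker data. reset_storage and KUBECTL are hypothetical names, and the PVC names assume the usual StatefulSet pattern of <claim template>-<pod name>. Deleting the PersistentVolume objects does not remove the files under the hostPath directories, so those still have to be emptied on the node itself:

```shell
# Command used to talk to the cluster; override with e.g. KUBECTL="sudo kubectl".
KUBECTL="${KUBECTL:-kubectl}"

# Remove the StatefulSet, its claims, and the volumes from this article.
reset_storage() {
  $KUBECTL delete statefulset rabbitmq -n rabbitmq || return 1
  $KUBECTL delete pvc -n rabbitmq \
    rabbitmq-data-rabbitmq-0 rabbitmq-data-rabbitmq-1 rabbitmq-data-rabbitmq-2 || return 1
  $KUBECTL delete pv rabbit-pv1 rabbit-pv2 rabbit-pv3 || return 1
}

# Afterwards, on the node that hosts the hostPath directories:
#   sudo rm -rf /opt/rabbitmq_data1/* /opt/rabbitmq_data2/* /opt/rabbitmq_data3/*
# Then recreate PersistentVolume.yaml and Statefulset.yaml.
```

Running it with KUBECTL=echo first prints the three delete commands without executing them.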