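[Cloud Native] Azkaban on k8s: Explanation and Hands-On Walkthrough
I. Overview
The frameworks on a big data platform support many development languages, and developers come from very different backgrounds. As a result, many different types of programs (jobs) end up running on the platform, for example MapReduce, Hive, Pig, Spark, Java, Shell, and Python. Azkaban is an open-source batch workflow scheduler that is used to orchestrate and schedule jobs like these.
Official documentation: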
https://azkaban.readthedocs.io/en/latest/
https://azkaban.github.io/azkaban/docs/latest/
GitHub: https://github.com/azkaban/azkaban
See also my earlier article: "Big Data Hadoop: the Azkaban Job Scheduler (Azkaban Environment Deployment)"
II. Deployment
Official documentation: https://azkaban.readthedocs.io/en/latest/getStarted.html
1) Download Azkaban
git clone https://github.com/azkaban/azkaban.git
# Build the Azkaban installation packages
cd azkaban; ./gradlew build installDist
2) Initialize the Azkaban tables
MySQL here is also deployed on k8s. If you are not familiar with that setup, see my article "[Cloud Native] MySQL on k8s Environment Deployment".
# [Note] Most companies forbid connecting with `mysql -u root -p123456`: the password ends up in the shell history, which is a security risk and can get you flagged in a security audit.
# Get the root password
MYSQL_ROOT_PASSWORD=$(kubectl get secret --namespace mysql mysql -o jsonpath="{.data.mysql-root-password}" | base64 -d)
# Log in to the pod
kubectl exec -it mysql-primary-0 -n mysql -- bash
# Connect to MySQL
mysql -u root -p"$MYSQL_ROOT_PASSWORD"
CREATE DATABASE azkaban;
CREATE USER 'azkaban'@'%' IDENTIFIED BY 'azkaban';
GRANT SELECT,INSERT,UPDATE,DELETE ON azkaban.* to 'azkaban'@'%' WITH GRANT OPTION;
flush privileges;
# Copy the SQL file from the host into the pod
kubectl cp azkaban-db/create-all-sql-3.91.0-386-ge35281d.sql mysql/mysql-primary-0:/tmp/
# Log in to the pod
kubectl exec -it mysql-primary-0 -n mysql -- bash
mysql -u root -p"$MYSQL_ROOT_PASSWORD"
use azkaban;
# The SQL file name differs between versions; look for create-all-sql-*.sql
source /tmp/create-all-sql-3.91.0-386-ge35281d.sql
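The note above warns against putting the password on the command line. One common alternative is a client option file passed with `--defaults-extra-file`, which has to be the first option given to `mysql`. The following is only a minimal sketch: the helper name `write_mysql_client_cnf` and the use of a temporary file are my own assumptions, not part of the original setup, and passwords containing double quotes or backslashes would need extra escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original article): writes a MySQL client
# option file that only the current user can read, then prints its path.
write_mysql_client_cnf() {
    local user="$1" password="$2" host="$3"
    local cnf
    cnf="$(mktemp)" || return 1
    chmod 600 "$cnf"
    {
        printf '[client]\n'
        printf 'user=%s\n' "$user"
        printf 'password="%s"\n' "$password"
        printf 'host=%s\n' "$host"
    } > "$cnf"
    printf '%s\n' "$cnf"
}

# Usage (needs the mysql client; --defaults-extra-file must come first):
#   cnf="$(write_mysql_client_cnf root "$MYSQL_ROOT_PASSWORD" mysql-primary.mysql)"
#   mysql --defaults-extra-file="$cnf" -e 'SHOW DATABASES;'
#   rm -f "$cnf"
```

With this approach the password never appears in the process arguments or in the shell history of the `mysql` call itself.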
[Note] It is better to initialize the database from a script when the service starts. The SQL file (init.sql) looks like this:
/*
-- Uncomment if the script has to be re-run from scratch; this is destructive, so use it with care
drop database azkaban;
delete from mysql.db where user="azkaban";
delete from mysql.user where user="azkaban";
flush privileges;
*/
CREATE DATABASE IF NOT EXISTS azkaban;
CREATE USER IF NOT EXISTS 'azkaban'@'%' IDENTIFIED BY 'azkaban';
GRANT SELECT,INSERT,UPDATE,DELETE ON azkaban.* to 'azkaban'@'%' WITH GRANT OPTION;
flush privileges;
use azkaban;
source /opt/apache/azkaban/azkaban-db/create-all-sql-3.91.0-386-ge35281d.sql
Test run
# Log in to the pod
kubectl exec -it mysql-primary-0 -n mysql -- bash
# Initialize the SQL
mysql -u root -p"$MYSQL_ROOT_PASSWORD" -h mysql-primary.mysql </opt/apache/azkaban/azkaban-db/init.sql
3) Build the image
docker-entrypoint.sh
#!/bin/bash
### Initialize the database only if it does not exist yet
if ! mysql -u"${MYSQL_USER_NAME}" -p"${MYSQL_USER_PASSWORD}" -h"${MYSQL_HOST}" -e "show databases;" | grep -qw "${MYSQL_DB}"; then
    mysql -u"${MYSQL_USER_NAME}" -p"${MYSQL_USER_PASSWORD}" -h"${MYSQL_HOST}" < "${AZKABAN_HOME}/azkaban-db/init.sql"
fi
funStartExec(){
    ### start azkaban exec
    echo "start azkaban exec..."
    # Activate the executor in the background once its port is listening
    funActivateExec &
    cd "${AZKABAN_HOME}/azkaban-exec-server/" || exit 1
    ./bin/internal/internal-start-executor.sh 2>&1 | tee -a "executorServerLog__$(date +%F+%T).out"
}
funStartWeb(){
    ### start azkaban web
    echo "start azkaban web..."
    cd "${AZKABAN_HOME}/azkaban-web-server/" || exit 1
    ./bin/internal/internal-start-web.sh 2>&1 | tee -a "webServerLog_$(date +%F+%T).out"
}
funActivateExec(){
    until netstat -ntlp | grep -q :12321; do echo "waiting for azkaban-exec"; sleep 1; done
    curl -G "$(hostname):12321/executor?action=activate" && echo
}
if [ "$1" = "exec" ];then
    funStartExec
elif [ "$1" = "web" ];then
    funStartWeb
elif [ "$1" = "all" ];then
    # The executor runs in the foreground, so start it in the background here;
    # otherwise the web server would never be started
    funStartExec &
    funStartWeb
else
    echo "please input args [exec|web|all]"
fi
[Note] The web server must be started from inside the azkaban-web-server directory: cd into it first, then run the startup script.
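The activation step in the entrypoint polls until the executor port is listening, and that loop never gives up. If you would rather fail after a while, a bounded wait can be sketched as below. This is only an illustration: `wait_for_port` is a hypothetical helper name, and it relies on bash's `/dev/tcp` pseudo-device instead of `netstat`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until host:port accepts TCP connections,
# giving up after max_tries attempts (one attempt per second).
wait_for_port() {
    local host="$1" port="$2" max_tries="${3:-60}"
    local i=0
    while [ "$i" -lt "$max_tries" ]; do
        # bash opens a TCP connection for /dev/tcp/<host>/<port>; the subshell
        # closes the descriptor again as soon as it exits.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "timed out waiting for ${host}:${port}" >&2
    return 1
}

# Usage inside the entrypoint (assumption: same port and activation URL as above):
#   wait_for_port "$(hostname)" 12321 120 && curl -G "$(hostname):12321/executor?action=activate"
```

Returning a non-zero status on timeout lets the caller decide whether to exit the container, so that k8s restarts it, instead of waiting forever.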
deleteExec.sh
#!/bin/bash
# Remove this executor's record from the executors table when the pod stops
EXEC_HOST="$(hostname).azkaban-exec.azkaban.svc.cluster.local"
mysql -u${MYSQL_USER_NAME} -p${MYSQL_USER_PASSWORD} -h${MYSQL_HOST} ${MYSQL_DB} -e "DELETE FROM executors WHERE host=\"${EXEC_HOST}\""
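Azkaban keeps its registered executors in the `executors` table, so the cleanup boils down to a single DELETE keyed on the host name. As a small sketch, the statement can be built separately and printed first as a dry run before it is handed to `mysql`. The helper name `build_delete_executor_sql` is hypothetical, and the host name in the example is just what I would expect for the first pod of this StatefulSet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the DELETE statement for one executor host.
# Single quotes in the host name are doubled so the SQL string literal stays valid.
build_delete_executor_sql() {
    local host="$1"
    local escaped
    escaped="$(printf '%s' "$host" | sed "s/'/''/g")"
    printf "DELETE FROM executors WHERE host='%s';\n" "$escaped"
}

# Dry run: only prints the statement, nothing is sent to MySQL
build_delete_executor_sql "azkaban-exec-0.azkaban-exec.azkaban.svc.cluster.local"
```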
Dockerfile
FROM myharbor.com/bigdata/centos:7.9.2009
RUN rm -f /etc/localtime && ln -sv /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo "Asia/Shanghai" > /etc/timezone
# ENV persists into the running container; `RUN export` would only affect that single build step
ENV LANG zh_CN.UTF-8
### install tools
RUN yum install -y vim tar wget curl less telnet net-tools lsof mysql
RUN mkdir -p /opt/apache
### JDK
ADD jdk-8u212-linux-x64.tar.gz /opt/apache/
ENV JAVA_HOME /opt/apache/jdk1.8.0_212
ENV PATH $JAVA_HOME/bin:$PATH
### Azkaban
RUN mkdir /opt/apache/azkaban
ENV AZKABAN_HOME /opt/apache/azkaban
ADD azkaban-exec-server.tar.gz $AZKABAN_HOME
ADD azkaban-web-server.tar.gz $AZKABAN_HOME
ADD azkaban-db.tar.gz $AZKABAN_HOME
# docker-entrypoint.sh reads init.sql from $AZKABAN_HOME/azkaban-db/
COPY init.sql $AZKABAN_HOME/azkaban-db/
# deleteExec.sh is called by the preStop hook of the executor pods
COPY docker-entrypoint.sh deleteExec.sh /opt/apache/
RUN chmod +x /opt/apache/docker-entrypoint.sh /opt/apache/deleteExec.sh
RUN groupadd --system --gid=9999 admin && useradd --system --home-dir /opt/home --uid=9999 --gid=admin admin
RUN chown -R admin:admin /opt/apache
# Set the working directory
WORKDIR $AZKABAN_HOME
# Entrypoint script: not executed at build time, only when a container starts
ENTRYPOINT ["/opt/apache/docker-entrypoint.sh"]
Build the image
docker build -t myharbor.com/bigdata/azkaban:4.0 . --no-cache
# Push the image
docker push myharbor.com/bigdata/azkaban:4.0
# Remove the local image
docker rmi myharbor.com/bigdata/azkaban:4.0
crictl rmi myharbor.com/bigdata/azkaban:4.0
4) Write the YAML manifests
1. configmap
azkaban-exec-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: azkaban-exec
  name: azkaban-exec-cm
data:
  azkaban.properties: |-
    # Azkaban Personalization Settings
    azkaban.name=Test
    azkaban.label=My Local Azkaban
    azkaban.color=#FF3601
    azkaban.default.servlet.path=/index
    web.resource.dir=web/
    default.timezone.id=Asia/Shanghai
    # Azkaban UserManager class
    user.manager.class=azkaban.user.XmlUserManager
    user.manager.xml.file=conf/azkaban-users.xml
    # Loader for projects
    executor.global.properties=conf/global.properties
    azkaban.project.dir=projects
    # Velocity dev mode
    velocity.dev.mode=false
    # Azkaban Jetty server properties.
    jetty.use.ssl=false
    jetty.maxThreads=25
    jetty.port=8081
    # Where the Azkaban web server is located
    azkaban.webserver.url=http://azkaban-web.azkaban:8081
    # mail settings
    mail.sender=
    mail.host=
    # User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
    # enduser -> myazkabanhost:443 -> proxy -> localhost:8081
    # when this parameters set then these parameters are used to generate email links.
    # if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
    # azkaban.webserver.external_hostname=myazkabanhost.com
    # azkaban.webserver.external_ssl_port=443
    # azkaban.webserver.external_port=8081
    job.failure.email=
    job.success.email=
    lockdown.create.projects=false
    cache.directory=cache
    # JMX stats
    jetty.connector.stats=true
    executor.connector.stats=true
    # Azkaban plugin settings
    azkaban.jobtype.plugin.dir=plugins/jobtypes
    # Azkaban mysql settings by default. Users should configure their own username and password.
    database.type=mysql
    mysql.port=3306
    mysql.host=mysql-primary.mysql
    mysql.database=azkaban
    mysql.user=azkaban
    mysql.password=azkaban
    mysql.numconnections=100
    # Azkaban Executor settings
    executor.maxThreads=50
    executor.flow.threads=30
    azkaban.executor.runtimeProps.override.eager=false
    executor.port=12321
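Several values in this file have to match other parts of the setup. For example, `executor.port` has to match the Service port and the activation URL in the entrypoint. A quick way to double-check a single value, either in a local copy or in the file mounted into the pod, is a tiny lookup helper. This is just a sketch: `get_prop` is a hypothetical name, and it only handles plain `key=value` lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one key from a .properties file.
# Comment lines are skipped; if the key occurs more than once, the last value wins.
# Exits with status 1 when the key is not present.
get_prop() {
    local file="$1" key="$2"
    awk -v k="$key" '
        /^[[:space:]]*#/ { next }
        index($0, k "=") == 1 { v = substr($0, length(k) + 2); found = 1 }
        END { if (found) print v; else exit 1 }
    ' "$file"
}

# Usage (assumption: the properties were saved locally as azkaban.properties):
#   get_prop azkaban.properties executor.port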
azkaban-web-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: azkaban-web
  name: azkaban-web-cm
data:
  azkaban.properties: |-
    # Azkaban Personalization Settings
    azkaban.name=Test
    azkaban.label=My Local Azkaban
    azkaban.color=#FF3601
    azkaban.default.servlet.path=/index
    web.resource.dir=web/
    default.timezone.id=Asia/Shanghai
    # Azkaban UserManager class
    user.manager.class=azkaban.user.XmlUserManager
    user.manager.xml.file=conf/azkaban-users.xml
    # Loader for projects
    executor.global.properties=conf/global.properties
    azkaban.project.dir=projects
    # Velocity dev mode
    velocity.dev.mode=false
    # Azkaban Jetty server properties.
    jetty.use.ssl=false
    jetty.maxThreads=25
    jetty.port=8081
    # Azkaban Executor settings
    # mail settings
    mail.sender=
    mail.host=
    # User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
    # enduser -> myazkabanhost:443 -> proxy -> localhost:8081
    # when this parameters set then these parameters are used to generate email links.
    # if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
    # azkaban.webserver.external_hostname=myazkabanhost.com
    # azkaban.webserver.external_ssl_port=443
    # azkaban.webserver.external_port=8081
    job.failure.email=
    job.success.email=
    lockdown.create.projects=false
    cache.directory=cache
    # JMX stats
    jetty.connector.stats=true
    executor.connector.stats=true
    # Azkaban mysql settings by default. Users should configure their own username and password.
    database.type=mysql
    mysql.port=3306
    mysql.host=mysql-primary.mysql
    mysql.database=azkaban
    mysql.user=azkaban
    mysql.password=azkaban
    mysql.numconnections=100
    #Multiple Executor
    azkaban.use.multiple.executors=true
    azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus
    azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
    azkaban.executorselector.comparator.Memory=1
    azkaban.executorselector.comparator.LastDispatched=1
    azkaban.executorselector.comparator.CpuUsage=1
  azkaban-users.xml: |-
    <azkaban-users>
      <user groups="azkaban" password="azkaban" roles="admin" username="azkaban"/>
      <user password="metrics" roles="metrics" username="metrics"/>
      <role name="admin" permissions="ADMIN"/>
      <role name="metrics" permissions="METRICS"/>
    </azkaban-users>
2. secret
secret.yaml
apiVersion: v1
data:
  # echo -n 'WyfORdvwVm' | base64
  mysql-root-password: V3lmT1JkdndWbQ==
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: azkaban
  name: azkaban-secret
type: Opaque
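Values under `data` in a Secret are base64-encoded, not encrypted, so anyone who can read the Secret can decode them. The round trip hinted at in the comment above can be sketched like this. The helper names are hypothetical, and `printf '%s'` is used so that no trailing newline sneaks into the encoded value.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the value of a Secret "data" entry.
encode_secret_value() {
    # tr removes the line wrapping that base64 adds for long inputs
    printf '%s' "$1" | base64 | tr -d '\n'
}
decode_secret_value() {
    printf '%s' "$1" | base64 -d
}

encode_secret_value 'WyfORdvwVm'; echo          # prints V3lmT1JkdndWbQ==
decode_secret_value 'V3lmT1JkdndWbQ=='; echo    # prints WyfORdvwVm
```

If you prefer not to encode by hand, the `stringData` field of a Secret accepts the plain value and k8s encodes it for you.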
3. service
azkaban-exec-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: exec
    app.kubernetes.io/name: azkaban
  name: azkaban-exec
spec:
  ports:
    - name: azkaban-exec
      port: 12321
      protocol: TCP
  selector:
    app.kubernetes.io/component: exec
    app.kubernetes.io/name: azkaban
  type: ClusterIP
azkaban-web-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: web
    app.kubernetes.io/name: azkaban
  name: azkaban-web
spec:
  ports:
    - name: azkaban-web-http
      nodePort: 30081
      port: 8081
      protocol: TCP
  selector:
    app.kubernetes.io/component: web
    app.kubernetes.io/name: azkaban
  type: NodePort
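A `nodePort` has to fall inside the cluster's NodePort range. The default range is 30000-32767, and 30081 above sits inside it. If you pick a different port, a quick sanity check before applying the manifest can look like the sketch below. `is_valid_nodeport` is a hypothetical helper, and it assumes the cluster has not changed the range with `--service-node-port-range`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the argument is a number inside the
# default k8s NodePort range (30000-32767).
is_valid_nodeport() {
    local port="$1"
    case "$port" in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely numeric
    esac
    [ "$port" -ge 30000 ] && [ "$port" -le 32767 ]
}

if is_valid_nodeport 30081; then
    echo "30081 is inside the default NodePort range"
fi
```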
4. Controllers
azkaban-exec-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: azkaban-exec
spec:
  serviceName: azkaban-exec
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: exec
      app.kubernetes.io/name: azkaban
  template:
    metadata:
      labels:
        app.kubernetes.io/component: exec
        app.kubernetes.io/name: azkaban
    spec:
      containers:
        - name: azkaban-exec
          image: myharbor.com/bigdata/azkaban:4.0
          #command: ["/opt/apache/docker-entrypoint.sh"]
          args: ["exec"]
          imagePullPolicy: IfNotPresent
          ports:
            - name: azkaban-exec
              containerPort: 12321
              protocol: TCP
          env:
            - name: MYSQL_HOST
              value: mysql-primary.mysql
            - name: MYSQL_USER_NAME
              value: root
            - name: MYSQL_DB
              value: azkaban
            - name: MYSQL_USER_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: azkaban-secret
                  key: mysql-root-password
          volumeMounts:
            - name: azkaban-exec-volume
              mountPath: /opt/apache/azkaban/azkaban-exec-server/conf/azkaban.properties
              subPath: azkaban.properties
          readinessProbe:
            initialDelaySeconds: 10
            periodSeconds: 5
            tcpSocket:
              port: azkaban-exec
          livenessProbe:
            initialDelaySeconds: 10
            periodSeconds: 5
            tcpSocket:
              port: azkaban-exec
          lifecycle:
            preStop: # Delete this executor's record from MySQL
              exec:
                command: ["/opt/apache/deleteExec.sh"]
          securityContext:
            runAsUser: 9999
            privileged: true
      volumes:
        - name: azkaban-exec-volume
          configMap:
            name: azkaban-exec-cm
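Pods of a StatefulSet get stable names of the form `<statefulset-name>-<ordinal>`, with ordinals starting at 0. With `replicas: 2` you should therefore see `azkaban-exec-0` and `azkaban-exec-1`. A small sketch that lists the expected names, which is handy when looping over pods for logs or checks (`statefulset_pod_names` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pod names a StatefulSet with the given
# name and replica count is expected to create, one per line.
statefulset_pod_names() {
    local name="$1" replicas="$2" i=0
    while [ "$i" -lt "$replicas" ]; do
        printf '%s-%d\n' "$name" "$i"
        i=$((i + 1))
    done
}

statefulset_pod_names azkaban-exec 2
# prints:
#   azkaban-exec-0
#   azkaban-exec-1

# Usage idea (needs cluster access):
#   for p in $(statefulset_pod_names azkaban-exec 2); do kubectl logs "$p" -n azkaban --tail=20; done
```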
azkaban-web-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: azkaban-web
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: web
      app.kubernetes.io/name: azkaban
  template:
    metadata:
      labels:
        app.kubernetes.io/component: web
        app.kubernetes.io/name: azkaban
    spec:
      initContainers:
        - name: waiting-exec
          image: myharbor.com/bigdata/azkaban:4.0
          command: ['sh', '-c', "until (echo 'q')|telnet -e 'q' azkaban-exec.azkaban 12321 >/dev/null 2>&1; do echo waiting for exec; sleep 1; done"]
      containers:
        - name: azkaban-web
          image: myharbor.com/bigdata/azkaban:4.0
          #command: ["/opt/apache/docker-entrypoint.sh"]
          args: ["web"]
          imagePullPolicy: IfNotPresent
          ports:
            - name: azkaban-web
              containerPort: 8081
              protocol: TCP
          env:
            - name: MYSQL_HOST
              value: mysql-primary.mysql
            - name: MYSQL_USER_NAME
              value: root
            - name: MYSQL_DB
              value: azkaban
            - name: MYSQL_USER_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: azkaban-secret
                  key: mysql-root-password
          volumeMounts:
            - name: azkaban-web-volume
              mountPath: /opt/apache/azkaban/azkaban-web-server/conf/azkaban.properties
              subPath: azkaban.properties
            - name: azkaban-users-volume
              mountPath: /opt/apache/azkaban/azkaban-web-server/conf/azkaban-users.xml
              subPath: azkaban-users.xml
          readinessProbe:
            initialDelaySeconds: 10
            periodSeconds: 5
            tcpSocket:
              port: azkaban-web
          livenessProbe:
            initialDelaySeconds: 10
            periodSeconds: 5
            tcpSocket:
              port: azkaban-web
          securityContext:
            runAsUser: 9999
            privileged: true
      volumes:
        - name: azkaban-web-volume
          configMap:
            name: azkaban-web-cm
        - name: azkaban-users-volume
          configMap:
            name: azkaban-web-cm
5) Deploy
kubectl create ns azkaban
kubectl apply -f azkaban-exec-cm.yaml -n azkaban
kubectl apply -f azkaban-web-cm.yaml -n azkaban
kubectl apply -f secret.yaml -n azkaban
kubectl apply -f azkaban-exec-svc.yaml -n azkaban
kubectl apply -f azkaban-web-svc.yaml -n azkaban
kubectl apply -f azkaban-exec-statefulset.yaml -n azkaban
kubectl apply -f azkaban-web-deployment.yaml -n azkaban
Check the result
kubectl get pods,svc -n azkaban -owide
web: http://192.168.182.110:30081/
Username/password: azkaban/azkaban (as defined in azkaban-users.xml above)
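The apply commands above can also be wrapped in a small script that stops at the first failure and then waits for both workloads to become ready using `kubectl rollout status`. This is only a sketch under a few assumptions: the function name `deploy_azkaban`, the 300s timeout, and the `KUBECTL` / `NAMESPACE` override variables are my own additions. The namespace still has to be created first.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden (for example KUBECTL=echo) to get a dry run that
# only prints the arguments each call would receive.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-azkaban}"

# Hypothetical wrapper around the manual steps shown above.
deploy_azkaban() {
    local f
    for f in azkaban-exec-cm.yaml azkaban-web-cm.yaml secret.yaml \
             azkaban-exec-svc.yaml azkaban-web-svc.yaml \
             azkaban-exec-statefulset.yaml azkaban-web-deployment.yaml; do
        "$KUBECTL" apply -f "$f" -n "$NAMESPACE" || return 1
    done
    # Block until the executors and the web server report ready (or time out)
    "$KUBECTL" rollout status statefulset/azkaban-exec -n "$NAMESPACE" --timeout=300s || return 1
    "$KUBECTL" rollout status deployment/azkaban-web -n "$NAMESPACE" --timeout=300s
}

# Usage:
#   kubectl create ns azkaban
#   deploy_azkaban
# Dry run without a cluster:
#   KUBECTL=echo deploy_azkaban
```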
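6) Test and verify
Official documentation: https://azkaban.readthedocs.io/en/latest/createFlows.html
[Example]
1. Create a file named helloworld.project with the following content:
cat >helloworld.project<<EOF
azkaban-flow-version: 2.0
EOF
2. Create a file named helloworld.flow with the following content:
cat > helloworld.flow <<EOF
nodes:
  - name: jobA
    type: command
    config:
      command: echo "Hello World"
EOF
3. Compress the two files into a single zip file. Only zip is supported at the moment, and the file name must be in English.
4. Create a new Project in the web UI
5. Upload the zip package
6. Schedule and run it
This is a deliberately minimal example whose only purpose is to verify that Azkaban works. In practice, companies usually use Azkaban to schedule Spark, Flink, and similar jobs; my machine resources are limited, so I cannot bring up all of those services here. If you have questions, feel free to leave me a comment.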
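Steps 1 to 3 above can be scripted so the project package is easy to regenerate. The following is a minimal sketch: `make_helloworld_project` is a hypothetical helper, the archive name `helloworld.zip` is my own choice, and the zip step is skipped with a message when the `zip` command is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write helloworld.project and helloworld.flow into the
# given directory and, if zip is available, package them as helloworld.zip.
make_helloworld_project() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    printf 'azkaban-flow-version: 2.0\n' > "$dir/helloworld.project"
    # Quoted 'EOF' keeps the content literal (no variable expansion)
    cat > "$dir/helloworld.flow" <<'EOF'
nodes:
  - name: jobA
    type: command
    config:
      command: echo "Hello World"
EOF
    if command -v zip >/dev/null 2>&1; then
        (cd "$dir" && zip -q helloworld.zip helloworld.project helloworld.flow)
    else
        echo "zip not found; package the two files manually" >&2
    fi
}

# Usage:
#   make_helloworld_project ./helloworld
#   # then upload ./helloworld/helloworld.zip in the web UI
```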
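7) Uninstall
kubectl delete ns azkaban --force
8) Deploy with Helm
There is no ready-made chart, so start by creating a blank chart scaffold.
helm create azkaban
The YAML file contents are not reproduced here; the git repository is linked at the end of this article, and you are welcome to leave me a comment if you have questions. Install it directly:
helm install azkaban ./azkaban -n azkaban --create-namespace
kubectl get pods,svc -n azkaban -owide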
NOTES
NAME: azkaban
LAST DEPLOYED: Fri Oct 7 15:21:22 2022
NAMESPACE: azkaban
STATUS: deployed
REVISION: 1
NOTES:
1. Get the application URL by running these commands:
export NODE_PORT=$(kubectl get --namespace azkaban -o jsonpath="{.spec.ports[0].nodePort}" services azkaban-web)
export NODE_IP=$(kubectl get nodes --namespace azkaban -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
web: http://192.168.182.110:31081/index
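9) Uninstall with Helm
helm uninstall azkaban -n azkaban
kubectl delete ns azkaban --force
Git repository: https://gitee.com/hadoop-bigdata/azkaban-on-k8s
Baidu Netdisk download link for the prebuilt Azkaban deployment package:
Link: https://pan.baidu.com/s/1TqmMSCT1--z_LcqBlAancA?pwd=9y17
Extraction code: 9y17
That is it for this explanation and hands-on walkthrough of Azkaban on k8s. If you have questions, feel free to leave me a comment. I will keep publishing tutorials on [Cloud Native + Big Data], so stay tuned.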