一、环境准备
1.服务器准备
主机名 |
公网IP |
内网IP |
stg-airflow001 |
68.79.16.69 |
172.31.47.207 |
2.安装版本说明
#1.安装版本限制
Python: 3.6, 3.7, 3.8
Databases:
PostgreSQL: 9.6, 10, 11, 12, 13
MySQL: 5.7, 8
SQLite: 3.15.0+
Kubernetes: 1.18.15 1.19.7 1.20.2
注意:
1)MySQL 5.x 版本不能或有运行多个调度程序的限制——请参阅:调度程序。MariaDB 未经过测试/推荐。
2)SQLite 用于 Airflow 测试。不要在生产中使用它。建议使用最新的 SQLite 稳定版本进行本地开发。
3)就 Python 3 支持而言,Airflow 2.0.0 已使用 Python 3.6、3.7 和 3.8 进行测试,但尚不支持 Python 3.9。
#2.安装工具
只有pip安装目前正式支持。
3.版本选择
安装工具 |
版本 |
用途 |
Python |
3.8.6 |
安装airflow及其依赖包、开发airflow的dag使用 |
MySQL |
5.7 |
作为airflow的元数据库 |
Airflow |
2.1.0 |
任务调度平台 |
二、格式化文件系统
#1.查看所有磁盘分区情况
[stg-airflow001 ~]$ fdisk -l
Disk /dev/nvme1n1: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/nvme0n1: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000acf0a
Device Boot Start End Blocks Id System
/dev/nvme0n1p1 * 2048 104857566 52427759+ 83 Linux
#2.进行磁盘分区
[stg-airflow001 ~]$ fdisk /dev/nvme1n1
Welcome to fdisk (util-linux 2.23.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0x39e17a4f.
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-209715199, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-209715199, default 209715199):
Using default value 209715199
Partition 1 of type Linux and of size 100 GiB is set
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
三、上传系统优化脚本
#1.编写系统优化脚本
[stg-airflow001 ~]$ vim Opt-Centos.sh
#!/usr/bin/bash
# Author:jh
# Time:2021-04-11 18:48:19
# Name:Opt-Centos.sh
# Version: 1.0
# Discription: To
local_IP=`ifconfig |awk -F ' ' 'NR==2{print $2}'`
local_hostname=`hostname`
base_yum="CentOS-Base.repo"
epel_yum="epel.repo"
yum_dir="/etc/yum.repos.d/"
cron_dir="/var/spool/cron/root"
ssh_dir="/etc/ssh/sshd_config"
linux_comm_software=(net-tools vim tree htop iftop gcc gcc-c++ glibc iotop lrzsz sl wget unzip telnet nmap nc psmisc dos2unix bash-completion bash-completion-extra sysstat rsync nfs-utils httpd-tools expect)
#1.修改主机名
source /etc/init.d/functions
if [ $# -ne 1 ];then
echo "/bin/sh $0 New hostname"
exit 1
fi
hostnamectl set-hostname $1
if [ $? -eq 0 ];then
action "hostname update is" /usr/bin/true
else
action "hostname update is" /usr/bin/false
fi
#2.配置ssh连接成功显示
platform=`uname -i`
if [ $platform != "x86_64" ];then
echo "this script is only for 64bit Operating System !"
exit 1
fi
echo "the platform is ok"
cat << EOF
+---------------------------------------+
| your system is CentOS 7 x86_64 |
| start optimizing....... |
+---------------------------------------
EOF
#3.配置yum仓库
mv $yum_dir$base_yum $yum_dir${base_yum}.bak
mv $yum_dir$epel_yum $yum_dir${epel_yum}.bak
curl -o $yum_dir$base_yum http://mirrors.aliyun.com/repo/Centos-7.repo
curl -o $yum_dir$epel_yum http://mirrors.aliyun.com/repo/epel-7.repo
yum clean all
yum makecache
#4.安装基础软件包
for i in ${linux_comm_software[*]}
do
rpm -q $i &>/dev/null
if [ $? -eq 0 ];then
echo "$i is installed"
else
yum -y install $i &>/dev/null
action "$i is installing" /usr/bin/true
fi
done
#5.关闭防火墙firewalld
#systemctl disable firewalld
#systemctl stop firewalld
#6.关闭selinux
#sed 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
#7.修改本地解析
echo "$local_IP $local_hostname" >> /etc/hosts
#8.设置时间同步
timedatectl set-timezone Asia/Shanghai
/usr/sbin/ntpdate time1.aliyun.com
echo '#Timing synchronization time' >> $cron_dir
echo "* 4 * * * /usr/sbin/ntpdate time1.aliyun.com > /dev/null 2>&1" >> $cron_dir
systemctl restart crond.service
#9.ssh参数优化
#sed -i 's/^GSSAPIAuthentication yes$/GSSAPIAuthentication no/g' $ssh_dir
#sed -i 's/#UseDNS yes/UseDNS no/g' $ssh_dir
#sed -i 's/PermitRootLogin yes/PermitRootLogin no/g' $ssh_dir
#sed -i 's/#port 22/poort 520/g' $ssh_dir
#10.加大文件描述符
tail -1 /etc/security/limits.conf &>/dev/null
[ $? -eq 0 ] && echo "文件描述符以加大" || echo '* - nofile 65535 ' >>/etc/security/limits.conf
#11.环境变量及别名优化
cat>>/etc/profile.d/color.sh<<EOF
alias ll='ls -l --color=auto --time-style=long-iso'
PS1="\[\e[37;40m\][\[\e[32;1m\]\u\[\e[37;40m\]@\h \[\e[36;40m\]\w\[\e[0m\]]\[\e[32;1m\]\\$ \[\e[0m\]"
export HISTTIMEFORMAT='%F-%T '
EOF
source /etc/profile
#12.内核优化
cat >>/etc/sysctl.conf<<EOF
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_keepalive_time = 600
net.ipv4.ip_local_port_range = 4000 65000
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.route.gc_timeout = 100
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_orphans = 16384
net.ipv4.ip_forward = 1
net.ipv4.icmp_echo_ignore_all=1
EOF
sysctl -p
#13.关闭NetworkManager
#systemctl stop NetworkManager
#systemctl disable NetworkManager
#14.更新软件
yum -y update && > /dev/null
#15.设置中文字符集
localectl set-locale LANG=zh_CN.UTF-8
#16.备份显示系统版本和内核的文件
cp /etc/issue{,.bak}
cp /etc/issue.net{,.bak}
> /etc/issue
> /etc/issue.net
#17.重读分区表
partprobe &&
#18.磁盘格式化
xfs_disk_info=`fdisk -l |awk 'NR==10{print $1}'`
mkfs.xfs $xfs_disk_info -f
#19.新建数据目录data
mkdir /data
#20.挂载目录
mount $xfs_disk_info /data/
#21.查看挂载点
df -h
#22.实现永久挂载
uuid_disk_info=`blkid |awk -F ' ' 'NR==2{print $2}' |awk -F '\"' '{print $2}'`
echo "UUID=$uuid_disk_info /data xfs defaults 0 0 ">>/etc/fstab
#23.查看挂载信息
tail -1 /etc/fstab
#24.优化完成
cat << EOF
+-------------------------------------------------+
| 优 化 已 完 成 |
| 请 重启 这台服务器 ! |
+-------------------------------------------------+
EOF
sleep 5
rm -rf ./Opt-Centos.sh
#2.增加执行权限
[stg-airflow001 ~]$ chmod +x Opt-Centos.sh
#3.执行系统优化脚本
[stg-airflow001 ~]$ sh Opt-Centos.sh
四、安装python3
1.安装依赖
#1.安装相关依赖
[root@stg-airflow001 ~]$ yum -y install zlib zlib-devel
[root@stg-airflow001 ~]$ yum -y install bzip2 bzip2-devel
[root@stg-airflow001 ~]$ yum -y install ncurses ncurses-devel
[root@stg-airflow001 ~]$ yum -y install readline readline-devel
[root@stg-airflow001 ~]$ yum -y install openssl openssl-devel
[root@stg-airflow001 ~]$ yum -y install openssl-static
[root@stg-airflow001 ~]$ yum -y install xz lzma xz-devel
[root@stg-airflow001 ~]$ yum -y install sqlite sqlite-devel
[root@stg-airflow001 ~]$ yum -y install gdbm gdbm-devel
[root@stg-airflow001 ~]$ yum -y install tk tk-devel
[root@stg-airflow001 ~]$ yum -y install db4-devel libpcap-devel libffi-devel
[root@stg-airflow001 ~]$ yum -y install epel-release
[root@stg-airflow001 ~]$ yum -y install gcc
2.下载安装包
#1.使用wget下载Python源码压缩包到/root目录下
[root@stg-airflow001 ~]$ cd /data/software
[root@stg-airflow001 /data/software]$ wget https://www.python.org/ftp/python/3.8.6/Python-3.8.6.tgz
#2.解压python3安装包
[root@stg-airflow001 /data/software]$ tar -zxvf Python-3.8.6.tgz -C /root
#3.进入安装目录
[root@stg-airflow001 /data/software]$ cd /root/Python-3.8.6/
#4.创建python3程序目录
[root@stg-airflow001 ~/Python-3.8.6]$ mkdir /usr/local/python3.8.6
3.生成Makefile文件
[root@stg-airflow001 ~/Python-3.8.6]$ mkdir bld
[root@stg-airflow001 ~/Python-3.8.6]$ cd bld/
[root@stg-airflow001 ~/Python-3.8.6/bld]$ ../configure --prefix=/usr/local/python3.8.6
4.编译安装
#1.编译安装
[root@stg-airflow001 ~/Python-3.8.6/bld]$ make && make install
#2.做软连接
[root@stg-airflow001 ~/Python-3.8.6/bld]$ cd /usr/local/
[root@stg-airflow001 /usr/local]$ ln -s python3.8.6 python3
5.配置环境变量
[root@stg-airflow001 ~/Python-3.8.6/bld]$ vim /etc/profile.d/python3.sh
export PATH=/usr/local/python3/bin:$PATH
[root@stg-airflow001 ~/Python-3.8.6/bld]$ source /etc/profile
6.查看python版本
#1.查看Python版本
[root@stg-airflow001 ~/Python-3.8.6/bld]$ python3 -V
Python 3.8.6
#2.检测pip是否可用
[root@stg-airflow001 ~/Python-3.8.6/bld]$ pip3 -V
pip 20.2.3 from /usr/local/python3/lib/python3.8/site-packages/pip (python 3.9)
#3.升级pip
[root@stg-airflow001 ~/Python-3.8.6/bld]$ pip3 install --upgrade pip
#4.再次查看pip版本
[root@stg-airflow001 ~/Python-3.8.6/bld]$ pip3 -V
pip 21.1.2 from /usr/local/python3/lib/python3.8/site-packages/pip (python 3.9)
五、安装MySQL
#1.卸载mariadb
[root@stg-airflow001 ~]$ rpm -qa | grep mariadb
mariadb-libs-5.5.68-1.el7.x86_64
mariadb-devel-5.5.68-1.el7.x86_64
[root@stg-airflow001 ~]$ rpm -e --nodeps mariadb-libs-5.5.68-1.el7.x86_64
[root@stg-airflow001 ~]$ rpm -e --nodeps mariadb-devel-5.5.68-1.el7.x86_64
#2.下载mysql的repo源
[root@stg-airflow001 ~]$ wget -P /root http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
#3.通过rpm安装
[root@stg-airflow001 ~]$ rpm -ivh mysql-community-release-el7-5.noarch.rpm
#安装mysql
[root@stg-airflow001 ~]$ yum -y install mysql-server
#授权
[root@stg-airflow001 ~]$ chown -R mysql:mysql /var/lib/mysql
#开启Mysql服务
[root@stg-airflow001 ~]$ service mysqld start
#用root用户连接登录mysql:
[root@stg-airflow001 ~]$ mysql -uroot 或者 /usr/bin/mysql -uroot
#重置mysql密码
mysql> use mysql;
mysql> update user set password=password('root') where user='root';
mysql> flush privileges;
#为Airflow建库、建用户
#建库:
mysql> CREATE DATABASE airflow CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
#建用户:
mysql> create user 'airflow'@'%' identified by 'airflow';
mysql> create user 'airflow'@'localhost' identified by 'airflow';
#为用户授权:
mysql> grant all on airflow.* to 'airflow'@'%';
mysql> grant all on airflow.* to 'root'@'%';
mysql> flush privileges;
mysql> quit
Bye
#4.配置my.cnf如下
内容如下
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
[mysqld]
collation-server = utf8mb4_unicode_ci
init-connect='SET NAMES utf8mb4'
character-set-server = utf8mb4
explicit_defaults_for_timestamp=1
六、安装 Airflow
1.安装Airflow相关包
[root@stg-airflow001 ~]$ yum -y install mysql-devel
[root@stg-airflow001 ~]$ yum -y install python-devel
[root@stg-airflow001 ~]$ yum -y install python3-devel
[root@stg-airflow001 ~]$ yum -y install mysql-devel
[root@stg-airflow001 ~]$ pip3 install mysqlclient
[root@stg-airflow001 ~]$ pip3 install apache-airflow
[root@stg-airflow001 ~]$ pip3 install apache-airflow[mysql]
2.修改配置文件
#1.设置airflow的根目录,不设置默认当前家用户目录下生成airflow目录
[root@stg-airflow001 ~]$ echo "export AIRFLOW_HOME=/data/airflow" >> /root/.bashrc
[root@stg-airflow001 ~]$ source /root/.bashrc
# 初始化原始库 执行完以下命令后会生成airflow目录
[root@stg-airflow001 ~]$ airflow db init
[root@stg-airflow001 ~]$ cd airflow/
[root@stg-airflow001 ~]$ vim airflow.cfg
# 配置数据库,这里使用了mysql
executor = LocalExecutor
sql_alchemy_conn = mysql+pymysql://root:123456@localhost:3306/airflow
sql_alchemy_conn = mysql://user:password@IP:3306/airflow
# 设置时区
default_timezone = Asia/Shanghai
# web ui 界面使用的时区
default_ui_timezone = Asia/Shanghai
4.创建用户
[root@stg-airflow001 ~]$ airflow users create --username admin --password admin --firstname admin --lastname admin --role Admin --email example@XX.com
七、启动 Airflow
1.命令行启动
#1.命令行启动
[root@stg-airflow001 ~]$ ps -ef|grep airflow|cut -c 9-15|xargs kill -9
[root@stg-airflow001 ~]$ nohup airflow webserver >>werserver.log 2>&1 & #启动web服务,默认端口8080
[root@stg-airflow001 ~]$ nohup airflow scheduler >>scheduler.log 2>&1 & #启动定时任务
2.system启动
#1.添加配置文件
[root@stg-airflow001 ~/airflow]$ vim /etc/sysconfig/airflow
AIRFLOW_CONFIG=/root/airflow/airflow.cfg
AIRFLOW_HOME=/root/airflow
HADOOP_USER_NAME=hdfs
#2.添加gunicorn软连接
[root@stg-airflow001 ~/airflow]$ ln -fs /usr/local/python3.8.6/bin/gunicorn /bin/gunicorn
#3.添加System启动
[root@stg-airflow001 ~/airflow]$ vim /usr/lib/systemd/system/airflow-webserver.service
[Unit]
Description=Airflow Webserver
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=root
Group=root
Restart=on-failure
EnvironmentFile=/etc/sysconfig/airflow
ExecStart=/usr/local/python3/bin/airflow webserver
RestartSec=5s
PrivateTmp=true
LimitNOFILE=10000
TimeoutStopSec=20
[Install]
WantedBy=multi-user.target
[root@stg-airflow001 ~/airflow]$ vim /usr/lib/systemd/system/airflow-scheduler.service
[Unit]
Description=Airflow Scheduler
Wants=network-online.target
After=network-online.target
[Service]
User=root
Group=root
Type=simple
Restart=on-failure
EnvironmentFile=/etc/sysconfig/airflow
ExecStart=/usr/local/python3/bin/airflow scheduler
RestartSec=5s
PrivateTmp=true
LimitNOFILE=10000
TimeoutStopSec=20
[Install]
WantedBy=multi-user.target
#4.重载system服务
[root@stg-airflow001 ~/airflow]$ systemctl daemon-reload
#5.启动服务
[root@stg-airflow001 ~]$ systemctl enable --now airflow-webserver.service
[root@stg-airflow001 ~]$ systemctl enable --now airflow-scheduler.service
#6.查看服务有没有设置开机启动
[root@stg-airflow001 ~]$ systemctl is-enabled airflow-webserver.service
enabled
[root@stg-airflow001 ~]$ systemctl is-enabled airflow-scheduler.service
enabled
#7.验证服务
[root@stg-airflow001 ~/airflow]$ systemctl status airflow-webserver.service
● airflow-webserver.service - Airflow Webserver
Loaded: loaded (/usr/lib/systemd/system/airflow-webserver.service; disabled; vendor preset: disabled)
Active: active (running) since 一 2021-06-28 11:18:06 CST; 11min ago
Main PID: 26274 (airflow)
Tasks: 19
Memory: 430.9M
CGroup: /system.slice/airflow-webserver.service
├─26274 /usr/local/python3.8.6/bin/python3.8 /usr/local/python3/bin/airflow webserver -D
├─26292 gunicorn: master [airflow-webserver]
├─26294 [ready] gunicorn: worker [airflow-webserver]
├─26295 [ready] gunicorn: worker [airflow-webserver]
├─26296 [ready] gunicorn: worker [airflow-webserver]
└─26297 [ready] gunicorn: worker [airflow-webserver]
6月 28 11:18:06 stg-airflow001 systemd[1]: Started Airflow Webserver.
6月 28 11:18:07 stg-airflow001 airflow[26274]: ____________ _____________
6月 28 11:18:07 stg-airflow001 airflow[26274]: ____ |__( )_________ __/__ /________ __
6月 28 11:18:07 stg-airflow001 airflow[26274]: ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
6月 28 11:18:07 stg-airflow001 airflow[26274]: ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
6月 28 11:18:07 stg-airflow001 airflow[26274]: _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
6月 28 11:18:07 stg-airflow001 airflow[26274]: [2021-06-28 11:18:07,513] {dagbag.py:487} INFO - Filling up the DagBag from /dev/null
6月 28 11:18:20 stg-airflow001 systemd[1]: Current command vanished from the unit file, execution of the command list won't be resumed.
[root@stg-airflow001 ~/airflow]$ systemctl status airflow-scheduler.service
● airflow-scheduler.service - Airflow Scheduler
Loaded: loaded (/usr/lib/systemd/system/airflow-scheduler.service; disabled; vendor preset: disabled)
Active: active (running) since 一 2021-06-28 11:27:50 CST; 9s ago
Main PID: 27436 (airflow)
Tasks: 3
Memory: 80.5M
CGroup: /system.slice/airflow-scheduler.service
├─27436 /usr/local/python3.8.6/bin/python3.8 /usr/local/python3/bin/airflow scheduler
├─27439 /usr/local/python3.8.6/bin/python3.8 /usr/local/python3/bin/airflow scheduler
└─27440 airflow scheduler -- DagFileProcessorManager
6月 28 11:27:50 stg-airflow001 airflow[27436]: WARNING: This is a development server. Do not use it in a production deployment.
6月 28 11:27:50 stg-airflow001 airflow[27436]: Use a production WSGI server instead.
6月 28 11:27:50 stg-airflow001 airflow[27436]: * Debug mode: off
6月 28 11:27:50 stg-airflow001 airflow[27436]: [2021-06-28 11:27:50,958] {_internal.py:113} INFO - * Running on http://0.0.0.0:...o quit)
6月 28 11:27:50 stg-airflow001 airflow[27436]: [2021-06-28 11:27:50,960] {scheduler_job.py:1253} INFO - Starting the scheduler
6月 28 11:27:50 stg-airflow001 airflow[27436]: [2021-06-28 11:27:50,961] {scheduler_job.py:1258} INFO - Processing each file at ...1 times
6月 28 11:27:50 stg-airflow001 airflow[27436]: [2021-06-28 11:27:50,963] {dag_processing.py:254} INFO - Launched DagFileProcesso...: 27440
6月 28 11:27:50 stg-airflow001 airflow[27436]: [2021-06-28 11:27:50,964] {scheduler_job.py:1822} INFO - Resetting orphaned tasks...ag runs
6月 28 11:27:50 stg-airflow001 airflow[27436]: [2021-06-28 11:27:50,967] {settings.py:52} INFO - Configured default timezone Tim...('UTC')
6月 28 11:27:50 stg-airflow001 airflow[27436]: [2021-06-28 11:27:50,974] {dag_processing.py:529} WARNING - Because we cannot use...m to 1.
Hint: Some lines were ellipsized, use -l to show in full.
#7.验证端口
[root@stg-airflow001 ~/airflow]$ netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 25360/rpcbind
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 26292/gunicorn: mas
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 18655/sshd
tcp 0 0 0.0.0.0:8793 0.0.0.0:* LISTEN 27439/python3.8
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 24970/master
tcp6 0 0 :::111 :::* LISTEN 25360/rpcbind
tcp6 0 0 :::22 :::* LISTEN 18655/sshd
tcp6 0 0 ::1:25 :::* LISTEN 24970/master
5.登录测试
# 浏览器输入:http://68.79.16.69:8080 ,输入创建的用户名和密码,登陆成功,至此安装Airflow结束