k8s and Surrounding Services (Detailed Summary)

1. Business Deployment Overview

  We build AI algorithms. Each algorithm is packaged into its own image, e.g. image1 (intrusion detection) and image2 (seat-belt detection).

  Combined with k8s, the flow is: ingress-nginx (deployed with hostNetwork: true, which removes one layer of Service forwarding), whose host name and port can be reached directly -> the client builds the request URL from the host + path published by the Ingress plus the service's own API path -> ingress-nginx forwards the request to the matching backend Service according to its rules -> Deployment -> Pods (PVC -> PV).

  In short: the client calls a fixed API format, and ingress-nginx routes each request to a different backend Service that provides the compute capability.
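  As a concrete illustration (the host name and path prefix below are the ones used later in section 11.2; the request body is a placeholder), a client call might look like this:

# the client builds the URL from the ingress host + published path prefix + the algorithm's own API path
curl -X POST http://funuo.ai.com/A/infer -d @request.json
# ingress-nginx matches host funuo.ai.com and prefix /A/, rewrites the path,
# and forwards the request to the ai-svc Service, which load-balances across the ai pods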

 

 

2. Completely Uninstalling k8s

# First delete all nodes from the cluster (the pods running on them are cleaned up along with the nodes)
kubectl delete node --all

# Stop all k8s services with a loop
for service in kube-apiserver kube-controller-manager kubelet etcd kube-proxy kube-scheduler; 
do
    systemctl stop $service
done

# Reset the node with kubeadm
kubeadm reset -f

# Remove the k8s packages
yum -y remove kube*

# Unload the ipip kernel module (used by the CNI tunnels) and check the loaded modules
modprobe -r ipip
lsmod

# Then manually delete the config files and the CNI/flannel network config and interfaces:
rm -rf /etc/cni
rm -rf /root/.kube
# delete the cni bridge and the flannel interface
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1

# Delete the leftover configuration files and data directories
rm -rf ~/.kube/
rm -rf /etc/kubernetes/
rm -rf /etc/systemd/system/kubelet.service.d
rm -rf /etc/systemd/system/kubelet.service
rm -rf /etc/systemd/system/multi-user.target.wants/kubelet.service
rm -rf /var/lib/kubelet
rm -rf /usr/libexec/kubernetes/kubelet-plugins
rm -rf /usr/bin/kube*
rm -rf /opt/cni
rm -rf /var/lib/etcd
rm -rf /var/etcd

# Refresh the yum cache
yum clean all
yum makecache

 

3. Preparation Before Installing k8s

3.1 Server Allocation

k8master:  172.16.4.58  # master runs on a VM and hosts no business workloads
k8node1:   172.16.3.199 # physical machine (RTX 2060 GPU), runs the AI business workloads

OS: CentOS 7.8
k8s version: v1.23.0
docker version: 19.03.8
ingress-nginx version:

 

3.2 Configure Hostname Resolution (all nodes)

[root@k8master ~]# cat /etc/hosts
#append the following
172.16.4.58 k8master
172.16.3.199 k8node1

 

3.3 Set the Hostname (all nodes)

#run on the k8master node
hostnamectl set-hostname k8master

#run on the k8node1 node
hostnamectl set-hostname k8node1

 

3.4 Install a Time Server (all nodes)

yum -y install bash-completion chrony iotop sysstat
## configure and start the time service
cat > /etc/chrony.conf <<EOF
server ntp.aliyun.com iburst
stratumweight 0
driftfile /var/lib/chrony/drift
rtcsync
makestep 10 3
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
keyfile /etc/chrony.keys
commandkey 1
generatecommandkey
logchange 0.5
logdir /var/log/chrony
EOF
systemctl enable chronyd
systemctl start chronyd
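After chronyd is running, time synchronization can be sanity-checked with chronyc:

# show the selected NTP source and the sync status
chronyc sources -v
chronyc tracking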

 

3.5 Disable SELinux and firewalld (all nodes)

#stop and disable firewalld
systemctl stop firewalld
systemctl disable firewalld
#disable selinux
sed -i 's/enforcing/disabled/' /etc/selinux/config # takes effect after a reboot

 

3.6 Disable the Swap Partition

#disable swap for the current session
swapoff -a        

#disable it permanently
sed -i 's/.*swap.*/#&/' /etc/fstab 

 

3.7 Enable Bridge Filtering and IP Forwarding (all nodes)

cat > /etc/sysctl.d/kubernetes.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

# Then apply the settings
sysctl --system
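If the two net.bridge.* settings report "No such file or directory", the br_netfilter kernel module is probably not loaded yet; a small sketch of loading it (typically only needed on minimal installs):

# load the bridge netfilter module and keep it loaded across reboots
modprobe br_netfilter
echo "br_netfilter" > /etc/modules-load.d/br_netfilter.conf
# verify
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward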

 

3.8 Install Docker (all nodes)

#install dependencies
yum install -y yum-utils device-mapper-persistent-data lvm2

#add the docker repository
yum-config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo

#install docker ce (the version used by our workloads; pick the version you need)
yum install -y containerd.io-1.2.13 docker-ce-19.03.8 docker-ce-cli-19.03.8

#create the docker config directory
mkdir /etc/docker

#write the docker daemon config file
cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"graph": "/data/docker_storage",
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
],
"insecure-registries" : ["172.168.4.99:7090","152.199.254.168:7090"],
"registry-mirrors": ["https://g427vmjy.mirror.aliyuncs.com"],
"live-restore": true
}
EOF

#expose the docker API listening port
sed -i 's/^ExecStart.*/#&/' /lib/systemd/system/docker.service
sed -i '15i ExecStart=/usr/bin/dockerd -H tcp://localhost:2375 -H unix:///var/run/docker.sock -H fd:// --containerd=/run/containerd/containerd.sock' /lib/systemd/system/docker.service

#start docker
systemctl daemon-reload
systemctl restart docker
systemctl enable docker

 

{
  // Note: daemon.json must contain this line to set the cgroup driver to systemd; the remaining settings can be adapted to your environment
  "exec-opts": ["native.cgroupdriver=systemd"],
}
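After restarting docker it is worth confirming that the cgroup driver really is systemd, since it must match the kubelet's cgroup driver:

# should print "Cgroup Driver: systemd"
docker info | grep -i "cgroup driver"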

 

3.9 Switch the Kubernetes Repository to a Domestic Mirror (all nodes)

cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

 

3.10 Install Specific Versions of kubeadm, kubelet and kubectl (1.23.0 here)

yum install -y kubelet-1.23.0 kubeadm-1.23.0 kubectl-1.23.0

# enable kubelet at boot
systemctl enable kubelet

 

3.11 Change the kubelet Data Directory (optional, can be skipped) (all nodes)

#create the directory
mkdir /data/kubelet

#edit /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf and add --root-dir=/data/kubelet/ to point kubelet at the new directory
vim /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --root-dir=/data/kubelet/ --kubeconfig=/etc/kubernetes/kubelet.conf"

#apply the change

  systemctl daemon-reload
  systemctl restart docker
  systemctl restart kubelet

 

4. Deploying the Kubernetes Cluster

4.1 Override the k8s Image Registry (the init commands only run on the master node)

(1) First override kubeadm's image registry. The default registry is not reachable from inside China, so it has to be replaced with a domestic mirror. The following command lists the images the cluster needs during setup:

[root@k8master ~]# kubeadm config images list
I0418 18:26:04.047449   19242 version.go:255] remote version is much newer: v1.27.1; falling back to: stable-1.23
k8s.gcr.io/kube-apiserver:v1.23.17
k8s.gcr.io/kube-controller-manager:v1.23.17
k8s.gcr.io/kube-scheduler:v1.23.17
k8s.gcr.io/kube-proxy:v1.23.17
k8s.gcr.io/pause:3.6
k8s.gcr.io/etcd:3.5.1-0
k8s.gcr.io/coredns/coredns:v1.8.6

(2) Switch to the Aliyun image registry

[root@k8master ~]# kubeadm config images list  --image-repository registry.aliyuncs.com/google_containers
I0418 18:28:18.740057   20021 version.go:255] remote version is much newer: v1.27.1; falling back to: stable-1.23
registry.aliyuncs.com/google_containers/kube-apiserver:v1.23.17
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.23.17
registry.aliyuncs.com/google_containers/kube-scheduler:v1.23.17
registry.aliyuncs.com/google_containers/kube-proxy:v1.23.17
registry.aliyuncs.com/google_containers/pause:3.6
registry.aliyuncs.com/google_containers/etcd:3.5.1-0
registry.aliyuncs.com/google_containers/coredns:v1.8.6

 

(3) Then pull the images manually so that initialization is faster (alternatively, pull them directly with docker; with a domestic registry mirror configured, docker pulls them quickly as well):

[root@k8master ~]# kubeadm config images pull  --image-repository registry.aliyuncs.com/google_containers
I0418 18:28:31.795554   20088 version.go:255] remote version is much newer: v1.27.1; falling back to: stable-1.23
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.23.17
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.23.17
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.23.17
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.23.17
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.6
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.1-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.8.6

 

(4) Initialize Kubernetes (run the init command on the master node only)

# Initialize Kubernetes, specifying the network CIDRs and the image registry (worker nodes can be added dynamically later with the join command)

kubeadm init \
  --apiserver-advertise-address=172.16.4.58 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.23.0 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16 \
  --ignore-preflight-errors=all

# --apiserver-advertise-address # cluster advertise address (the master machine's IP)
# --image-repository # the default registry k8s.gcr.io is unreachable from China, so the Aliyun mirror registry is used instead
# --kubernetes-version # the k8s version, matching what was installed above
# --service-cidr # the cluster-internal virtual network and unified entry point for Pods; the value above can be used as-is
# --pod-network-cidr # the Pod network, which must match the CNI component's YAML deployed below; the value above can be used as-is

 

#After the command finishes, a few follow-up commands must be run manually (in particular, copy and save the join command used to add nodes to the cluster)

[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.16.4.58:6443 --token nnzdrq.1mqngk9jnh88nkyn \
    --discovery-token-ca-cert-hash sha256:de19ee27e18341ce9acf6248d76664c3bc932372745930fe28687a20073c179a

 

(5) Run the post-init commands

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

vim /root/.bash_profile
#append at the end
# kubeconfig for the root user
export KUBECONFIG=/etc/kubernetes/admin.conf
# set an alias
alias k=kubectl
# enable kubectl command completion
source <(kubectl completion bash)

#make it take effect
source /root/.bash_profile
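kubectl can now talk to the apiserver. Note that the master usually shows NotReady until the CNI plugin (calico, section 4.2) has been applied:

kubectl get nodes
kubectl get pods -n kube-system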

 

(6) Copy and save this command (it comes from the output of a successful k8s init; calico or flannel must be configured before worker nodes can join). Worker nodes run it later to join the master:

kubeadm join 172.16.4.58:6443 --token nnzdrq.1mqngk9jnh88nkyn \
    --discovery-token-ca-cert-hash sha256:de19ee27e18341ce9acf6248d76664c3bc932372745930fe28687a20073c179a

 

4.2 k8s Network Deployment with the Calico Plugin (run on k8master only)

(1) Download the calico.yaml file

wget https://docs.projectcalico.org/v3.20/manifests/calico.yaml --no-check-certificate

(2) Modify calico.yaml

vim /data/calico.yaml

 

#Explanation of the entries added/changed in calico.yaml

- name: IP_AUTODETECTION_METHOD

  value: interface=ens192

Purpose: tells the calico plugin to auto-detect the node IP on the ens192 network interface. Calico will look for a usable IP address on that interface and use it for this node's pod network traffic.

 

- name: CALICO_IPV4POOL_CIDR

  value: "10.244.0.0/16"

Purpose: calico will allocate pod IPs from the range 10.244.0.0 - 10.244.255.255 (a /16 prefix). This value must match the one passed to:

kubeadm init \
  --apiserver-advertise-address=172.16.4.58 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.23.0 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16 \
  --ignore-preflight-errors=all

namely the --pod-network-cidr=10.244.0.0/16 flag.
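For orientation, a sketch of where these two entries sit inside calico.yaml, in the env list of the calico-node container of the DaemonSet (surrounding fields omitted; the image tag and interface name are illustrative and must match your environment):

        - name: calico-node
          image: docker.io/calico/node:v3.20.6   # illustrative tag
          env:
            # detect the node IP on this interface (adjust to your NIC name)
            - name: IP_AUTODETECTION_METHOD
              value: "interface=ens192"
            # must match kubeadm's --pod-network-cidr
            - name: CALICO_IPV4POOL_CIDR
              value: "10.244.0.0/16"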

 

apiVersion: policy/v1beta1
change to:
apiVersion: policy/v1


Purpose: in this manifest the policy API group is used by the PodDisruptionBudget resource; policy/v1 is its stable version, and policy/v1beta1 is deprecated on newer Kubernetes releases, so the stable version should be used here.

 

(3) Apply the calico manifests

kubectl apply -f calico.yaml

 

(4) Check whether calico is running

[root@k8master data]# kubectl get pods -n kube-system 
NAME                                       READY   STATUS    RESTARTS      AGE
calico-kube-controllers-5b9cd88b65-vg4gq   1/1     Running   3 (17h ago)   21h
calico-node-86l8c                          1/1     Running   2 (17h ago)   21h
calico-node-lg2mg                          1/1     Running   0             21h
coredns-6d8c4cb4d-wm8d2                    1/1     Running   0             23h
coredns-6d8c4cb4d-xxdmm                    1/1     Running   0             23h

 

(5) Download the calicoctl tool

#github release page
https://github.com/projectcalico/calicoctl/releases/tag/v3.20.6

 

 

mv calicoctl-linux-amd64 calicoctl
chmod +x calicoctl 
mv calicoctl /usr/bin/

#Run calicoctl node status; a state of "up" means the mesh is established
[root@k8master data]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 172.16.3.199 | node-to-node mesh | up    | 12:15:08 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

 

 


At this point the k8s master node is fully deployed!

5. Joining the k8node1 Worker to the Cluster (the following steps run on k8node1, the worker node)
5.1 Install the NVIDIA Driver (choose the driver version that fits your workload)
# Disable the built-in Nouveau driver
sed -i "s/blacklist nvidiafb/#&/" /usr/lib/modprobe.d/dist-blacklist.conf
cat >> /usr/lib/modprobe.d/dist-blacklist.conf <<EOF
blacklist nouveau
options nouveau modeset=0
EOF



# Back up the system initramfs image and rebuild it
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
reboot


# After rebooting, confirm Nouveau is disabled (the command should print nothing)
lsmod | grep nouveau


# Install the driver (type out --kernel-source-path by hand; do not blindly copy this line)
sh /data/nvidia-drive/NVIDIA-Linux-x86_64-440.82.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.102.1.el7.x86_64/ -k $(uname -r)
#Note: check the actual kernel-source-path directory on your machine


#  The rest is an interactive installer (omitted)

5.2 Install nvidia-docker2 for k8s GPU Support
# Install nvidia-container-toolkit && nvidia-container-runtime

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
yum install -y nvidia-container-toolkit nvidia-container-runtime

# Install nvidia-docker2 so that k8s containers can use the GPU driver.
# Installing it overwrites /etc/docker/daemon.json, so back that file up first and merge
# the old content with the newly generated one (see 5.3 below).
yum install -y nvidia-docker2
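Once nvidia-docker2 is installed and docker has been restarted, GPU access from inside a container can be sanity-checked (the image tag is only an example; use any CUDA image that matches the installed driver). Note that for Kubernetes itself to schedule the nvidia.com/gpu resource used later in ai.yaml, the NVIDIA k8s device plugin DaemonSet also needs to be running on the GPU node:

# with "default-runtime": "nvidia" in daemon.json, nvidia-smi should work inside a container
docker run --rm nvidia/cuda:10.2-base nvidia-smi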

 

5.3 Merge the daemon.json File

[root@k8node1 ~]# cat /etc/docker/daemon.json 
{
"exec-opts": ["native.cgroupdriver=systemd"],
"graph": "/data/docker_storage",
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
],
"insecure-registries" : ["172.168.4.90:8090","152.188.254.169:8090"],
"registry-mirrors": ["https://g427vmjy.mirror.aliyuncs.com"],
"live-restore": true,
"default-runtime": "nvidia",
"runtimes": {
    "nvidia": {
        "path": "/usr/bin/nvidia-container-runtime",
        "runtimeArgs": []
    }
  }
}

 

5.4 Check that the GPU can be used; output like the following means it works

[root@k8node1 aibox-ai-server]# nvidia-smi 
Thu Nov  9 14:17:29 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:B3:00.0 Off |                  N/A |
|  0%   38C    P8     3W / 160W |      0MiB /  5934MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

 

5.5 Restart the docker and kubelet Services

systemctl restart docker 
systemctl restart kubelet

 

5.6 Join k8node1 to the k8master Master (run on the k8node1 worker)

kubeadm join 172.16.4.58:6443 --token nnzdrq.1mqngk9jnh88nkyn --discovery-token-ca-cert-hash sha256:de19ee27e18341ce9acf6248d76664c3bc932372745930fe28687a20073c179a

 

5.7 Verify the Join Succeeded (run on the k8master master)

#give k8node1 a role label (run on k8master)

kubectl label nodes k8node1 node-role.kubernetes.io/work=work
[root@k8master data]# kubectl get node -o wide
NAME       STATUS   ROLES                  AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                 CONTAINER-RUNTIME
k8master   Ready    control-plane,master   23h   v1.23.0   172.16.4.58    <none>        CentOS Linux 7 (Core)   3.10.0-1127.el7.x86_64         docker://19.3.8
k8node1    Ready    work                   23h   v1.23.0   172.16.3.199   <none>        CentOS Linux 7 (Core)   3.10.0-1160.102.1.el7.x86_64   docker://19.3.8
 As shown above, k8node1 has joined the cluster.

5.8 Removing a Worker Node (run on the master node)
# kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
# where <node name> is the node name shown by <kubectl get nodes>
# suppose we are removing the worker node node3
[root@node1 home]# kubectl drain node3 --delete-local-data --force --ignore-daemonsets
[root@node1 home]# kubectl delete node node3

[root@node3 home]# # reset k8s on the removed worker
[root@node3 home]# kubeadm reset

 

6. Deploying a k8s Dashboard (KubePi is used here)

KubePi is a simple and efficient graphical management tool for k8s clusters. It makes day-to-day cluster administration easier and speeds up log inspection and troubleshooting.

6.1 Deploy KubePi (any node works; here it is deployed on the master):

[root@k8master ~]# docker pull kubeoperator/kubepi-server
[root@k8master ~]# # run the container
[root@k8master ~]# docker run --privileged -itd --restart=unless-stopped --name kube_dashboard -v /home/docker-mount/kubepi/:/var/lib/kubepi/ -p 8000:80 kubeoperator/kubepi-server
Address: http://172.16.4.58:8000
Default username: admin
Default password: kubepi

 

6.2 Fill in the cluster name, keep the default authentication mode, and enter the apiserver address and token

 

 

6.3 Obtain the IP address and token needed to log in

[root@k8master ~]# # create a service account on the k8s master and get its token
[root@k8master ~]# kubectl create sa k8admin --namespace kube-system
serviceaccount/k8admin created
[root@k8master ~]# kubectl create clusterrolebinding k8admin --clusterrole=cluster-admin --serviceaccount=kube-system:k8admin
clusterrolebinding.rbac.authorization.k8s.io/k8admin created
[root@k8master ~]# 
[root@k8master ~]# # get the token of the newly created k8admin account on the master
[root@k8master ~]# kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep k8admin | awk '{print $1}') | grep token: | awk '{print $2}'
eyJhbGciOiJSUzI1NiIsImtpZCI6IkhVeUtyc1BpU1JvRnVacXVqVk1PTFRkaUlIZm1KQTV6Wk9WSExSRllmd0kifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcGktdXNlci10b2tlbi10cjVsMiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcGktdXNlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjJiYzlhZDRjLWVjZTItNDE2Mi04MDc1LTA2NTI0NDg0MzExZiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTprdWJlcGktdXNlciJ9.QxkR1jBboqTYiVUUVO4yGhfWmlLDA5wHLo_ZnjAuSLZQDyVevCgBluL6l7y7UryRdId6FmBZ-L0QitvOuTsurcjGL2QHxPE_yZsNW7s9K7eikxJ8q-Q_yOvnADtAueH_tcMGRGW9Zyec2TlmcGTZCNaNUme84TfMlWqX7oP3GGJGMbMGN7H4fPXh-Qqrdp-0MJ3tP-dk3koZUEu3amrq8ExSmjIAjso_otrgFWbdSOMkCXKsqb9yuZzaw7u5Cy18bH_HW6RbNCRT5jGs5aOwzuMAd0HQ5iNm-5OISI4Da6jGdjipLXejcC1H-xWgLlJBx0RQWu41yoPNF57cG1NubQ
[root@k8master ~]# 
[root@k8master ~]# # get the apiserver address on the master
[root@k8master ~]# cat ~/.kube/config | grep server: | awk '{print $2}'
https://172.16.4.58:6443

6.4 After confirming, the cluster shows up in the dashboard

 

7. Installing the metrics-server Cluster Monitoring Plugin

https://zhuanlan.zhihu.com/p/572406293

 

8. Reference Document for the Overall k8s Deployment:

https://zhuanlan.zhihu.com/p/627310856?utm_id=0

 

At this point all of the k8s setup is complete!

 

8. Deploying the Business Services

8.1 Deployment Content and Flow

Create PV -> create PVC -> create the aiserver workload -> create the Service -> create the Ingress proxy

8.2 The YAML Files for Each Step

(1)pv.yaml

[root@k8master new]# cat pv.yaml 
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-aimodel
  labels:
    pv: aimodel
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/aibox-common/aimodel
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-common
  labels:
    pv: common
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/aibox-common/common
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ai-logs
  labels:
    pv: ai-logs
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/aibox-common/ai-server/logs
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ai-dmi
  labels:
    pv: ai-dmi
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /sys/firmware/dmi/
---

 

(2)pvc.yaml

[root@k8master new]# cat pvc.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-aimodel
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-common
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-ai-logs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-ai-dmi
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
---
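A quick sketch of applying these manifests and checking that every PVC binds to a PV (file names assumed to match the listings above):

kubectl apply -f pv.yaml -f pvc.yaml
# each PVC should go from Pending to Bound
kubectl get pv,pvc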

 

(3) ai.yaml + service.yaml (the ai workload is pinned to k8node1 because that node hosts the business services and has the GPU)

[root@k8master new]# cat ai.yaml 
---
apiVersion: v1
kind: Service
metadata:
  name: ai-svc
  labels:
    app: ai
spec:
  type: NodePort
  ports:
    - port: 28865
      targetPort: 28865
      nodePort: 31000
  selector:
    app: ai
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai
  #namespace: kube-fjyd
spec:
  replicas: 5
  selector:
    matchLabels:
      app: ai
  template:
    metadata:
      labels:
        app: ai
    spec:
      nodeName: k8node1  #schedule onto k8node1, which has the GPU the workload needs
      containers:
        - name: ai
          image: 172.168.4.60:8090/rz4.5.0.0/aiserver:v4.5.3.0019_v4.5.15.16   #our own business image; not publicly pullable
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 28865
          volumeMounts:
            - name: logs
              mountPath: /home/nvidia/aibox/logs
            - name: aimodel
              mountPath: /home/nvidia/aibox/aimodel
            - name: common
              mountPath: /home/nvidia/aibox/common
            - name: dmi
              mountPath: /mnt/sys/firmware/dmi
            - name: localtime
              mountPath: /etc/localtime
              readOnly: true
          resources:
              limits:
                nvidia.com/gpu: 1  # request 1 GPU
      volumes:
        - name: logs
          persistentVolumeClaim:
            claimName: pvc-ai-logs
        - name: aimodel
          persistentVolumeClaim:
            claimName: pvc-aimodel
        - name: common
          persistentVolumeClaim:
            claimName: pvc-common
        - name: dmi
          persistentVolumeClaim:
            claimName: pvc-ai-dmi
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ""
      restartPolicy: Always
---

 

9. Ingress Deployment

9.1 What is an Ingress

  An Ingress is an API object that manages external access to the services in a cluster, typically over HTTP, and it can also provide load balancing.

  An Ingress exposes HTTP and HTTPS routes from outside the cluster to Services inside it. Traffic routing is controlled by the rules defined on the Ingress resource.

  Below is a simple Ingress example that sends all traffic to the same Service:
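That example is reproduced here as a minimal sketch (the Service name and port are placeholders):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
spec:
  ingressClassName: nginx
  defaultBackend:
    service:
      name: my-service     # placeholder: every request is sent to this Service
      port:
        number: 80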

 

9.2 Deploying the Ingress-nginx Controller

  Pitfalls in deploy.yaml:

    The Ingress-nginx site at https://kubernetes.github.io/ingress-nginx/ points to the deploy.yaml file.

    Newer versions of deploy.yaml are slightly different and require pulling the following two images:

    k8s.gcr.io/ingress-nginx/controller:v1.1.2

    k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1

   These usually cannot be pulled from inside China, so they have to be replaced. Equivalent images can be found on Docker Hub, for example these two:

[root@k8master ingress]# docker images | egrep "longjianghu|liangjw"
longjianghu/ingress-nginx-controller                              v1.1.2                   7e5c1cecb086        20 months ago       286MB               #k8s.gcr.io/ingress-nginx/controller:v1.1.2
liangjw/kube-webhook-certgen                                      v1.1.1                   c41e9fcadf5a        2 years ago         47.7MB              #k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1

 

9.3 Summary of Pitfalls:

  (1) Newer versions introduce IngressClass, which must be specified when writing an Ingress.
  (2) The images in deploy.yaml cannot be pulled; replace them with the two images above.
  (3) Deploying ingress-nginx-controller with hostNetwork: true removes one layer of forwarding compared to NodePort, but it then requires selecting a labeled node via nodeSelector: app: ingress.

 

9.4 The modified deploy.yaml, for reference

[root@k8master ingress]# cat deploy.yaml  #the file is long; the full content follows
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  name: ingress-nginx
---
apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx
  namespace: ingress-nginx
---
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    app.kubernetes.io/component: admission-webhook
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-admission
  namespace: ingress-nginx
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx
  namespace: ingress-nginx
rules:
  - apiGroups:
      - ""
    resources:
      - namespaces
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - configmaps
      - pods
      - secrets
      - endpoints
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingressclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resourceNames:
      - ingress-controller-leader
    resources:
      - configmaps
    verbs:
      - get
      - update
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    app.kubernetes.io/component: admission-webhook
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-admission
  namespace: ingress-nginx
rules:
  - apiGroups:
      - ""
    resources:
      - secrets
    verbs:
      - get
      - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - endpoints
      - nodes
      - pods
      - secrets
      - namespaces
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingressclasses
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    app.kubernetes.io/component: admission-webhook
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-admission
rules:
  - apiGroups:
      - admissionregistration.k8s.io
    resources:
      - validatingwebhookconfigurations
    verbs:
      - get
      - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx
  namespace: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ingress-nginx
subjects:
  - kind: ServiceAccount
    name: ingress-nginx
    namespace: ingress-nginx
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    app.kubernetes.io/component: admission-webhook
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-admission
  namespace: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ingress-nginx-admission
subjects:
  - kind: ServiceAccount
    name: ingress-nginx-admission
    namespace: ingress-nginx
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ingress-nginx
subjects:
  - kind: ServiceAccount
    name: ingress-nginx
    namespace: ingress-nginx
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    app.kubernetes.io/component: admission-webhook
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-admission
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ingress-nginx-admission
subjects:
  - kind: ServiceAccount
    name: ingress-nginx-admission
    namespace: ingress-nginx
---
apiVersion: v1
data:
  allow-snippet-annotations: "true"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-controller
  namespace: ingress-nginx
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  externalTrafficPolicy: Local
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - appProtocol: http
      name: http
      port: 80
      protocol: TCP
      targetPort: http
    - appProtocol: https
      name: https
      port: 443
      protocol: TCP
      targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-controller-admission
  namespace: ingress-nginx
spec:
  ports:
    - appProtocol: https
      name: https-webhook
      port: 443
      targetPort: webhook
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  minReadySeconds: 0
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/name: ingress-nginx
    spec:
      hostNetwork: true #changed: run ingress-nginx-controller in hostNetwork mode
      nodeSelector: #only deploy on nodes labeled app=ingress
        app: ingress
      containers:
        - args:
            - /nginx-ingress-controller
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
            - --election-id=ingress-controller-leader
            - --controller-class=k8s.io/ingress-nginx
            - --ingress-class=nginx
            - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
            - --validating-webhook=:8443
            - --validating-webhook-certificate=/usr/local/certificates/cert
            - --validating-webhook-key=/usr/local/certificates/key
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: LD_PRELOAD
              value: /usr/local/lib/libmimalloc.so
          image: longjianghu/ingress-nginx-controller:v1.1.2 #replaced image address
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                  - /wait-shutdown
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: controller
          ports:
            - containerPort: 80
              name: http
              protocol: TCP
            - containerPort: 443
              name: https
              protocol: TCP
            - containerPort: 8443
              name: webhook
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            requests:
              cpu: 100m
              memory: 90Mi
          securityContext:
            allowPrivilegeEscalation: true
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            runAsUser: 101
          volumeMounts:
            - mountPath: /usr/local/certificates/
              name: webhook-cert
              readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 300
      volumes:
        - name: webhook-cert
          secret:
            secretName: ingress-nginx-admission
---
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    app.kubernetes.io/component: admission-webhook
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-admission-create
  namespace: ingress-nginx
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/component: admission-webhook
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/part-of: ingress-nginx
        app.kubernetes.io/version: 1.1.2
        helm.sh/chart: ingress-nginx-4.0.18
      name: ingress-nginx-admission-create
    spec:
      containers:
        - args:
            - create
            - --host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc
            - --namespace=$(POD_NAMESPACE)
            - --secret-name=ingress-nginx-admission
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          image: liangjw/kube-webhook-certgen:v1.1.1 #replaced image address
          imagePullPolicy: IfNotPresent
          name: create
          securityContext:
            allowPrivilegeEscalation: false
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: OnFailure
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 2000
      serviceAccountName: ingress-nginx-admission
---
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    app.kubernetes.io/component: admission-webhook
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-admission-patch
  namespace: ingress-nginx
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/component: admission-webhook
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/part-of: ingress-nginx
        app.kubernetes.io/version: 1.1.2
        helm.sh/chart: ingress-nginx-4.0.18
      name: ingress-nginx-admission-patch
    spec:
      containers:
        - args:
            - patch
            - --webhook-name=ingress-nginx-admission
            - --namespace=$(POD_NAMESPACE)
            - --patch-mutating=false
            - --secret-name=ingress-nginx-admission
            - --patch-failure-policy=Fail
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          image: liangjw/kube-webhook-certgen:v1.1.1 #replaced image address
          imagePullPolicy: IfNotPresent
          name: patch
          securityContext:
            allowPrivilegeEscalation: false
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: OnFailure
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 2000
      serviceAccountName: ingress-nginx-admission
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: nginx
spec:
  controller: k8s.io/ingress-nginx
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  labels:
    app.kubernetes.io/component: admission-webhook
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: ingress-nginx-4.0.18
  name: ingress-nginx-admission
webhooks:
  - admissionReviewVersions:
      - v1
    clientConfig:
      service:
        name: ingress-nginx-controller-admission
        namespace: ingress-nginx
        path: /networking/v1/ingresses
    failurePolicy: Fail
    matchPolicy: Equivalent
    name: validate.nginx.ingress.kubernetes.io
    rules:
      - apiGroups:
          - networking.k8s.io
        apiVersions:
          - v1
        operations:
          - CREATE
          - UPDATE
        resources:
          - ingresses
    sideEffects: None

 

[root@k8master ingress]# kubectl get all -n ingress-nginx
NAME                                            READY   STATUS      RESTARTS   AGE
pod/ingress-nginx-admission-create-fqsl7        0/1     Completed   0          122m
pod/ingress-nginx-admission-patch-nmbrd         0/1     Completed   0          122m
pod/ingress-nginx-controller-6b68d8cbbf-9xj8t   1/1     Running     0          122m

NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/ingress-nginx-controller             LoadBalancer   10.109.255.117   <pending>     80:30297/TCP,443:31879/TCP   122m
service/ingress-nginx-controller-admission   ClusterIP      10.99.13.106     <none>        443/TCP                      122m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ingress-nginx-controller   1/1     1            1           122m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/ingress-nginx-controller-6b68d8cbbf   1         1         1       122m

NAME                                       COMPLETIONS   DURATION   AGE
job.batch/ingress-nginx-admission-create   1/1           8s         122m
job.batch/ingress-nginx-admission-patch    1/1           24s        122m

 

 

9.5 Deploying Ingress-nginx
(1) Preparation
  Label the k8node1 node with app=ingress, because the ingress-nginx-controller above uses hostNetwork mode (it only exposes the pod's real host ports) together with a nodeSelector.

[root@k8master ingress]# kubectl get node -o wide
NAME       STATUS   ROLES                  AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                 CONTAINER-RUNTIME
k8master   Ready    control-plane,master   47h   v1.23.0   172.16.4.58    <none>        CentOS Linux 7 (Core)   3.10.0-1127.el7.x86_64         docker://19.3.8
k8node1    Ready    work                   47h   v1.23.0   172.16.3.199   <none>        CentOS Linux 7 (Core)   3.10.0-1160.102.1.el7.x86_64   docker://19.3.8

[root@k8master ingress]# kubectl label node k8node1 app=ingress
node/k8node1 labeled
[root@k8master ingress]# kubectl get node --show-labels
NAME       STATUS   ROLES                  AGE   VERSION   LABELS
k8master   Ready    control-plane,master   45h   v1.23.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8master,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8node1    Ready    work                   44h   v1.23.0   app=ingress,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8node1,kubernetes.io/os=linux,node-role.kubernetes.io/work=work

 

(2) Deploy deploy.yaml with kubectl apply -f

kubectl apply -f deploy.yaml #deploy with kubectl apply; the images must already be available, otherwise the rollout fails

 

[root@k8master ingress]# kubectl apply -f deploy.yaml
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
serviceaccount/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
configmap/ingress-nginx-controller created
service/ingress-nginx-controller created
service/ingress-nginx-controller-admission created
deployment.apps/ingress-nginx-controller created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created
ingressclass.networking.k8s.io/nginx created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created

 

(3) Check the status
kubectl get all -n ingress-nginx #inspect everything deployed in the ingress-nginx namespace

 

[root@k8master ingress]# kubectl get all -n ingress-nginx
NAME                                            READY   STATUS      RESTARTS   AGE
pod/ingress-nginx-admission-create-fqsl7        0/1     Completed   0          147m
pod/ingress-nginx-admission-patch-nmbrd         0/1     Completed   0          147m
pod/ingress-nginx-controller-6b68d8cbbf-9xj8t   1/1     Running     0          147m

NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/ingress-nginx-controller             LoadBalancer   10.109.255.117   <pending>     80:30297/TCP,443:31879/TCP   147m
service/ingress-nginx-controller-admission   ClusterIP      10.99.13.106     <none>        443/TCP                      147m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ingress-nginx-controller   1/1     1            1           147m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/ingress-nginx-controller-6b68d8cbbf   1         1         1       147m

NAME                                       COMPLETIONS   DURATION   AGE
job.batch/ingress-nginx-admission-create   1/1           8s         147m
job.batch/ingress-nginx-admission-patch    1/1           24s        147m

 

(4) Check the logs of the ingress-nginx-controller

kubectl logs -f ingress-nginx-controller-6b68d8cbbf-9xj8t -n ingress-nginx

 

(5) Test access (the app=ingress label was put on k8node1)

Simply access k8node1's IP. The ingress-nginx-controller listens on port 80, and because of the nodeSelector above (only nodes labeled app=ingress are eligible), the controller runs on k8node1, the labeled node.
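A quick test from any machine, without editing hosts files, is to curl the node directly (a sketch; at this point no Ingress rules exist yet, so ingress-nginx answers with its default 404 backend, which at least proves the controller is reachable):

# the controller listens on ports 80/443 of k8node1 via hostNetwork
curl -i http://172.16.3.199/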

 

(6) Deploy a tomcat to test Ingress-nginx

  Deploy a tomcat to verify that the Ingress-nginx proxying works.

   1. Write deploy-tomcat.yaml

  • A Deployment that runs tomcat:8.0-alpine
  • A Service that exposes the tomcat pod
  • An Ingress resource that forwards every request for the host tomcat.demo.com and path / to the tomcat-demo Service
  • An IngressClass, the resource introduced in newer versions; it is referenced when defining Ingress resources and is useful when the cluster runs more than one Ingress controller
[root@k8master ingress]# cat deploy-tomcat.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-demo
spec:
  selector:
    matchLabels:
      app: tomcat-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: tomcat-demo
    spec:
      containers:
      - name: tomcat-demo
        image: tomcat:8.0-alpine
        ports:
        - containerPort: 8080
---

apiVersion: v1
kind: Service
metadata:
  name: tomcat-demo
spec:
  selector:
    app: tomcat-demo
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
---

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tomcat-demo
spec:
  defaultBackend:
    service:
      name: default-http-backend
      port:
        number: 80
  ingressClassName: nginx
  rules:
  - host: tomcat.demo.com
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: tomcat-demo
            port:
              number: 80
---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: default-http-backend
  labels:
    app: default-http-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: default-http-backend
  template:
    metadata:
      labels:
        app: default-http-backend
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: default-http-backend
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/defaultbackend:1.4
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
---

apiVersion: v1
kind: Service
metadata:
  name: default-http-backend
  labels:
    app: default-http-backend
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: default-http-backend

 

  2. Explanation of deploy-tomcat.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-demo
spec:
  selector:
    matchLabels:
      app: tomcat-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: tomcat-demo
    spec:
      containers:
      - name: tomcat-demo
        image: tomcat:8.0-alpine
        ports:
        - containerPort: 8080


Creates a Deployment named tomcat-demo that runs the Tomcat application.
Uses a label selector to match pods with app: tomcat-demo.
Sets the replica count to 1.
Defines the pod template, using the Tomcat 8.0 Alpine image with the container exposing port 8080.

 

 

apiVersion: v1
kind: Service
metadata:
  name: tomcat-demo
spec:
  selector:
    app: tomcat-demo
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080

Creates a Service named tomcat-demo that exposes the Tomcat application.
Uses a label selector to pick the pods labeled app: tomcat-demo.
Exposes port 80 on the Service and forwards traffic to port 8080 of the pods.

 

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tomcat-demo
spec:
  defaultBackend:
    service:
      name: default-http-backend
      port:
        number: 80
  ingressClassName: nginx
  rules:
  - host: tomcat.demo.com
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: tomcat-demo
            port:
              number: 80


Creates an Ingress named tomcat-demo that defines the routing rules.
Requests that match no rule are handled by the default backend service default-http-backend.
Specifies nginx as the IngressClass to use.
Defines one rule: when the request host is tomcat.demo.com, traffic for path "/" is forwarded to port 80 of the tomcat-demo Service.

 

apiVersion: apps/v1
kind: Deployment
metadata:
  name: default-http-backend
  labels:
    app: default-http-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: default-http-backend
  template:
    metadata:
      labels:
        app: default-http-backend
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: default-http-backend
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/defaultbackend:1.4
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi


Creates a Deployment named default-http-backend that runs the default backend service.
Sets the replica count to 1.
Uses a label selector to match pods with app: default-http-backend.
Defines the pod template using the default backend image, with the container exposing port 8080.
Configures a livenessProbe to make sure the service stays healthy.

 

apiVersion: v1
kind: Service
metadata:
  name: default-http-backend
  labels:
    app: default-http-backend
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: default-http-backend


Creates a Service named default-http-backend that exposes the default backend.
Exposes port 80 on the Service and forwards traffic to port 8080 of the pods.
Uses a label selector to pick the pods labeled app: default-http-backend.
Together, this file defines a complete Kubernetes deployment with the Tomcat application, the default backend, and the related Service and Ingress resources. The Ingress rules route traffic to the right Service based on host name and path.

 

  3. Apply deploy-tomcat.yaml

[root@k8master ingress]# kubectl apply -f deploy-tomcat.yaml 
deployment.apps/tomcat-demo unchanged
service/tomcat-demo unchanged
ingress.networking.k8s.io/tomcat-demo created
deployment.apps/default-http-backend created
service/default-http-backend created

 

4. Test that the site is reachable

  Edit the Windows hosts file:

172.16.3.199 tomcat.demo.com
172.16.3.199 api.demo.com

 

 

 

 

10. Reference Deployment Document

https://blog.51cto.com/u_16213624/7693786

 

11. Our Own Business Requirements

11.1 Business Requirements

  We build AI algorithms. Each algorithm is packaged into its own image, e.g. image1 (intrusion detection) and image2 (seat-belt detection).

  Combined with k8s, the flow is: ingress-nginx (deployed with hostNetwork: true, which removes one layer of Service forwarding), whose host name and port can be reached directly -> the client builds the request URL from the host + path published by the Ingress plus the service's own API path -> ingress-nginx forwards the request to the matching backend Service according to its rules -> Deployment -> Pods (PVC -> PV).

  In short: the client calls a fixed API format, and the matching route sends each request to a different backend Service that provides the algorithm.

 

11.2 Deploying ai-ingress

[root@k8master ingress]# cat ai-ingress.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /ai/test/alg/$1
spec:
  defaultBackend:
    service:
      name: default-http-backend
      port:
        number: 80
  ingressClassName: nginx
  rules:
  - host: funuo.ai.com
    http:
      paths:
      - pathType: Prefix
        path: "/A/(.*)"
        backend:
          service:
            name: ai-svc
            port:
              number: 28865
      - pathType: Prefix
        path: "/B/(.*)"
        backend:
          service:
            name: ai1-svc
            port:
              number: 28865
---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: default-http-backend
  labels:
    app: default-http-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: default-http-backend
  template:
    metadata:
      labels:
        app: default-http-backend
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: default-http-backend
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/defaultbackend:1.4
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
---

apiVersion: v1
kind: Service
metadata:
  name: default-http-backend
  labels:
    app: default-http-backend
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: default-http-backend

 

11.3 Explanation of the Ingress Rules

In this Ingress configuration, path: "/B/(.*)" together with the annotation nginx.ingress.kubernetes.io/rewrite-target: /ai/test/alg/$1 uses a regular expression to capture part of the path and reuse the captured value in the rewrite target.

Specifically:

path: "/B/(.*)": a regex path rule in which (.*) captures any character sequence. The path starts with /B/, and (.*) captures everything that follows.

annotations: nginx.ingress.kubernetes.io/rewrite-target: /ai/test/alg/$1: this annotation tells the Ingress controller to rewrite the path before sending the request to the backend service, where $1 is the value of the first capture group (the (.*) part of the regex).

If the request path is /B/infer/ai/test/alg/infer, then (.*) in /B/(.*) captures infer/ai/test/alg/infer, $1 is replaced with that value, and the rewritten path becomes /ai/test/alg/infer/ai/test/alg/infer.

If the request path is /B/foo/bar, then (.*) captures foo/bar, $1 is replaced with it, and the rewritten path becomes /ai/test/alg/foo/bar.
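A sketch of exercising these rules with curl, resolving the ingress host to the node that runs the controller (k8node1):

# /A/infer is captured by "/A/(.*)" and rewritten to /ai/test/alg/infer, then sent to ai-svc
curl -i -H "Host: funuo.ai.com" http://172.16.3.199/A/infer
# /B/infer is rewritten the same way but is sent to ai1-svc
curl -i -H "Host: funuo.ai.com" http://172.16.3.199/B/infer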

 

11.4 Checking the Ingress Status

[root@k8master ingress]# kubectl get ingress
NAME          CLASS   HOSTS             ADDRESS   PORTS   AGE
ai-ingress    nginx   funuo.ai.com                80      3h34m
tomcat-demo   nginx   tomcat.demo.com             80      3h51m

 

11.5 Results (the backend services themselves must already be running; that part is business-specific, and the remaining deployment steps can follow the reference documents)

 

 

12. k8s Autoscaling (CPU, memory)

12.1 Types of Autoscaling

Horizontal Pod Autoscaler (HPA): HPA adjusts the number of Pod replicas dynamically according to the application's load. It watches Pod CPU or memory usage and adds or removes Pods as needed so the application can keep up with demand.

Cluster Autoscaler (CA): CA adjusts the number of nodes at the cluster level. It watches node resource utilization and adds or removes nodes as needed so the cluster can satisfy the Pods' resource requests. CA handles autoscaling of the cluster itself rather than of individual applications.

Vertical Pod Autoscaler (VPA): VPA adjusts a container's resource allocation according to its actual needs. It watches container CPU and memory usage and tunes the container's requests and limits so the container gets enough resources.

Horizontal Vertical Pod Autoscaler (HVPA): HVPA combines HPA and VPA, adjusting both the replica count and the resource allocation of Pods and containers based on load and resource demand.

 

12.2 HPA工作原理

1)对于每个pod的资源指标(如CPU),控制器从资源指标API中获取每一个 HorizontalPodAutoscaler指定的pod的指标,然后,如果设置了目标使用率,控制器获取每个pod中的容器资源使用情况,并计算资源使用率。如果使用原始值,将直接使用原始数据(不再计算百分比)。然后,控制器根据平均的资源使用率或原始值计算出缩放的比例,进而计算出目标副本数。需要注意的是,如果pod某些容器不支持资源采集,那么控制器将不会使用该pod的CPU使用率

2)如果 pod 使用自定义指标,控制器机制与资源指标类似,区别在于自定义指标只使用原始值,而不是使用率。

3)如果pod 使用对象指标和外部指标(每个指标描述一个对象信息)。这个指标将直接与目标设定值相比较,并生成一个上面提到的缩放比例。在autoscaling/v2beta2版本API中,这个指标也可以根据pod数量平分后再计算。通常情况下,控制器将从一系列的聚合API(metrics.k8s.io、custom.metrics.k8s.io和external.metrics.k8s.io)中获取指标数据。metrics.k8s.io API通常由 metrics-server(需要额外启动)提供。
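HPA 计算期望副本数的核心公式大致为:期望副本数 = ceil(当前副本数 × 当前指标值 / 目标指标值)。下面用一个 shell 小例子验算(数值取自后文压测时 179%/50% 的场景,仅作示意):

# 当前 1 个副本,CPU 平均使用率 179%,目标 50%:ceil(1*179/50)=4
awk 'BEGIN { c=1; cur=179; target=50; d=c*cur/target; if (d>int(d)) d=int(d)+1; print d }'
# 输出 4,与后文压测时副本数从 1 扩到 4 的现象一致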

 

12.3 HPA版本介绍

[root@k8master data]# kubectl api-versions | grep autoscal
autoscaling/v1
autoscaling/v2
autoscaling/v2beta1
autoscaling/v2beta2

 

autoscaling/v1:
这是最早的稳定版 Autoscaling API,提供基本的 Horizontal Pod Autoscaler(HPA)能力。
只支持基于 CPU 使用率的自动伸缩,不支持内存和自定义指标。

autoscaling/v2:
在 Kubernetes 1.23 正式 GA,是 v2beta2 的稳定版本,字段和功能与 v2beta2 基本一致。
支持 CPU、内存、自定义指标(Custom Metrics)和外部指标(External Metrics),是 1.23+ 集群推荐使用的版本。

autoscaling/v2beta1:
在 Kubernetes 1.8 引入,为 HPA 增加了多指标支持,例如同时基于 CPU 和内存进行伸缩。
同时开始支持通过 Custom Metrics API 使用自定义指标,而不仅仅是默认的 CPU 指标。

autoscaling/v2beta2:(1.23+ 中已标记为废弃,1.26 中移除)
在 Kubernetes 1.12 引入,进一步细化了指标的 target 类型(Utilization、AverageValue、Value),并支持外部指标。
本文环境(v1.23.0)仍然可以使用,只是 apply 时会提示 deprecated 警告(见后文)。
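可以用 kubectl explain 指定 API 版本查看各版本 HPA 支持的字段,顺便确认当前集群是否支持该版本(命令供参考):

kubectl explain hpa.spec.metrics --api-version=autoscaling/v2beta2
kubectl explain hpa.spec.metrics --api-version=autoscaling/v2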

 

12.4 metrics server

    metrics-server 是集群范围内资源使用数据的聚合器。它只负责采集并暴露实时指标,不提供历史数据存储,主要实现资源度量 API(metrics.k8s.io),提供节点和 Pod 的 CPU、内存等指标,供集群内的 kubectl top、HPA、scheduler 等组件使用

(1)metrics server部署文档

https://zhuanlan.zhihu.com/p/572406293

 

(2)修改后的配置文件展示

[root@k8master data]# cat metrics-server-components.yaml 
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls    #lipc 添加:跳过 kubelet 证书校验,其余参数保持默认不变
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

 

(3)应用metrics server

kubectl apply -f metrics-server-components.yaml
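apply 之后可以先确认 metrics API 是否注册成功、指标数据是否能取到(命令供参考):

# 查看 APIService 状态,AVAILABLE 为 True 才算就绪
kubectl get apiservice v1beta1.metrics.k8s.io
# 直接从聚合 API 取一次节点指标,能返回 JSON 即正常
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"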

 

(4)查看metrics server服务状态

[root@k8master data]# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS        AGE
calico-kube-controllers-5b9cd88b65-vg4gq   1/1     Running   3 (5d15h ago)   5d20h
calico-node-86l8c                          1/1     Running   2 (5d15h ago)   5d20h
calico-node-lg2mg                          1/1     Running   0               5d20h
coredns-6d8c4cb4d-wm8d2                    1/1     Running   0               5d21h
coredns-6d8c4cb4d-xxdmm                    1/1     Running   0               5d21h
etcd-k8master                              1/1     Running   0               5d21h
kube-apiserver-k8master                    1/1     Running   0               5d21h
kube-controller-manager-k8master           1/1     Running   0               5d21h
kube-proxy-bbvzc                           1/1     Running   2 (5d15h ago)   5d21h
kube-proxy-smhnc                           1/1     Running   0               5d21h
kube-scheduler-k8master                    1/1     Running   0               5d21h
metrics-server-fd9598766-495c9             1/1     Running   3 (5d15h ago)   5d20h

 

(5)测试kubectl top命令

[root@k8master data]# kubectl top node
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k8master   278m         3%     3218Mi          20%       
k8node1    300m         1%     14085Mi         44%       
[root@k8master data]# kubectl top pods
NAME                                   CPU(cores)   MEMORY(bytes)   
ai1-6d98756bdd-4b86q                   3m           1257Mi          
ai1-6d98756bdd-5dwlz                   3m           1251Mi          
ai1-6d98756bdd-fhgvv                   3m           1265Mi          
ai1-6d98756bdd-jgrxb                   3m           1243Mi          
ai1-6d98756bdd-mc5zp                   3m           1248Mi          
ai1-6d98756bdd-t2sv4                   3m           1264Mi          
ai1-6d98756bdd-w5vsq                   3m           1275Mi          
ai1-6d98756bdd-z6ptz                   3m           1262Mi          
default-http-backend-ff744689f-wxqnf   1m           3Mi             
php-apache-866cb4fc88-zvq6g            1m           8Mi             
tomcat-demo-55b6bbcb97-kmt25           1m           336Mi           
v1                                     0m           0Mi 

 

(6)top参数说明

在 Kubernetes 中,CPU 资源以"毫核"(millicore,写作 m)为单位计量,1 毫核等于 0.001 核。

因此,当你在 Pod 配置中设置 CPU 请求为 "100m" 时,表示该容器请求 0.1 个 CPU 核,因为 100m 表示 100 毫核,而 100 毫核等于 0.1 核。

举例来说:
"100m" 表示请求 0.1 个 CPU 核。
"200m" 表示请求 0.2 个 CPU 核。
"1000m" 表示请求 1.0 个 CPU 核。

 

12.5 测试HPA的autoscaling/v2beta2版-基于CPU、MEM自动扩缩容

  用Deployment创建一个php-apache服务,然后利用HPA进行自动扩缩容。步骤如下:

12.5.1 创建并运行一个php-apache服务,通过deployment创建pod,在k8s的master节点操作

1)使用dockerfile构建一个新的镜像,在k8s的master节点构建

[root@k8master dockerfile]# cat Dockerfile 
FROM php:5-apache
ADD index.php /var/www/html/index.php
RUN chmod a+rx index.php

2)创建前端php文件

[root@k8master dockerfile]# cat index.php 
<?php
  $x = 0.0001;
  for ($i = 0; $i <= 1000000;$i++) {
    $x += sqrt($x);
  }
  echo "OK!";
?>

3)打包镜像

docker build . -t 172.168.4.177:8090/test/hpa-example:v1
docker push 172.168.4.177:8090/test/hpa-example:v1

4)通过deployment部署一个php-apache服务

[root@k8master hpa]# cat php-apache.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: 172.168.4.177:8090/test/hpa-example:v1
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: "200m"
            memory: "256Mi"
          requests:
            cpu: "100m"
            memory: "128Mi"

---

apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

5)创建deployment、service

kubectl apply -f php-apache.yaml

6)查看是否创建成功

[root@k8master hpa]# kubectl get pods -l run=php-apache
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-866cb4fc88-zvq6g   1/1     Running   0          147m

 

12.5.2 创建HPA

1)创建HPA文件

[root@k8master hpa]# cat php-apache-hpa.yaml 
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment    #监控的是deployment
    name: php-apache    #deployment的名称
  minReplicas: 1        #最小pod个数
  maxReplicas: 10       #最大pod个数
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization         #类型百分比
        averageUtilization: 50    #50%
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue   #类型数值
        averageValue: 200Mi  # 设置目标平均内存值

 

2)启用自动伸缩

kubectl apply -f php-apache-hpa.yaml
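如果只需要基于 CPU 使用率的简单场景,也可以不写 YAML,直接用命令创建(生成的是 autoscaling/v1 的 HPA,不包含内存指标,仅作对比参考):

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10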

 

3)查看是否创建成功

[root@k8master hpa]# kubectl get hpa
NAME             REFERENCE               TARGETS                 MINPODS   MAXPODS   REPLICAS   AGE
php-apache-hpa   Deployment/php-apache   9195520/200Mi, 1%/50%   1         10        1          143m

 

4)压测php-apache服务,只是针对CPU做压测

[root@k8master hpa]# kubectl run v1 -it --image=busybox /bin/sh
If you don't see a command prompt, try pressing enter.
/ # while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!^C
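压测观察完后,可以删除上面 kubectl run 创建的压测容器(pod 名为 v1):

kubectl delete pod v1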

 

5)监控hpa伸缩变化

[root@k8master hpa]# kubectl get hpa -w
NAME             REFERENCE               TARGETS                 MINPODS   MAXPODS   REPLICAS   AGE
php-apache-hpa   Deployment/php-apache   6074368/200Mi, 1%/50%   1         10        1          49m
php-apache-hpa   Deployment/php-apache   8585216/200Mi, 179%/50%   1         10        1          49m
php-apache-hpa   Deployment/php-apache   8585216/200Mi, 179%/50%   1         10        4          49m
php-apache-hpa   Deployment/php-apache   7797418666m/200Mi, 141%/50%   1         10        4          49m
php-apache-hpa   Deployment/php-apache   7801856/200Mi, 141%/50%       1         10        4          50m
php-apache-hpa   Deployment/php-apache   8285184/200Mi, 48%/50%        1         10        4          50m
php-apache-hpa   Deployment/php-apache   8591360/200Mi, 49%/50%        1         10        4          50m
php-apache-hpa   Deployment/php-apache   8604672/200Mi, 64%/50%        1         10        4          50m
php-apache-hpa   Deployment/php-apache   8620032/200Mi, 48%/50%        1         10        6          51m
php-apache-hpa   Deployment/php-apache   8141209600m/200Mi, 34%/50%    1         10        6          51m
php-apache-hpa   Deployment/php-apache   8118954666m/200Mi, 44%/50%    1         10        6          51m
php-apache-hpa   Deployment/php-apache   8123733333m/200Mi, 37%/50%    1         10        6          51m
php-apache-hpa   Deployment/php-apache   8444586666m/200Mi, 38%/50%    1         10        6          52m
php-apache-hpa   Deployment/php-apache   8519680/200Mi, 33%/50%        1         10        6          52m
php-apache-hpa   Deployment/php-apache   8624128/200Mi, 35%/50%        1         10        6          52m
php-apache-hpa   Deployment/php-apache   8633002666m/200Mi, 33%/50%    1         10        6          52m
php-apache-hpa   Deployment/php-apache   8658261333m/200Mi, 35%/50%    1         10        6          53m
php-apache-hpa   Deployment/php-apache   8642560/200Mi, 27%/50%        1         10        6          53m
php-apache-hpa   Deployment/php-apache   8647338666m/200Mi, 38%/50%    1         10        6          53m
php-apache-hpa   Deployment/php-apache   8648704/200Mi, 34%/50%        1         10        6          53m
php-apache-hpa   Deployment/php-apache   8649386666m/200Mi, 42%/50%    1         10        6          54m
php-apache-hpa   Deployment/php-apache   8652800/200Mi, 35%/50%        1         10        6          54m
php-apache-hpa   Deployment/php-apache   8655530666m/200Mi, 43%/50%    1         10        6          54m
php-apache-hpa   Deployment/php-apache   8656213333m/200Mi, 25%/50%    1         10        6          54m
php-apache-hpa   Deployment/php-apache   8677376/200Mi, 37%/50%        1         10        6          55m
php-apache-hpa   Deployment/php-apache   8742912/200Mi, 38%/50%        1         10        6          55m
php-apache-hpa   Deployment/php-apache   8746325333m/200Mi, 34%/50%    1         10        6          55m
php-apache-hpa   Deployment/php-apache   8746325333m/200Mi, 24%/50%    1         10        6          55m
php-apache-hpa   Deployment/php-apache   8746325333m/200Mi, 5%/50%     1         10        6          56m
php-apache-hpa   Deployment/php-apache   8746325333m/200Mi, 1%/50%     1         10        6          56m
php-apache-hpa   Deployment/php-apache   8746325333m/200Mi, 1%/50%     1         10        6          59m
php-apache-hpa   Deployment/php-apache   8762163200m/200Mi, 1%/50%     1         10        5          59m
php-apache-hpa   Deployment/php-apache   8762163200m/200Mi, 1%/50%     1         10        5          60m
php-apache-hpa   Deployment/php-apache   8835072/200Mi, 1%/50%         1         10        3          60m
php-apache-hpa   Deployment/php-apache   9195520/200Mi, 1%/50%         1         10        1          61m

 

6)php-apache的pods个数变化观察

[root@k8master ~]# kubectl get pods -l run=php-apache
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-866cb4fc88-7dmrp   1/1     Running   0          4m26s
php-apache-866cb4fc88-7sx4z   1/1     Running   0          4m26s
php-apache-866cb4fc88-b4hjw   1/1     Running   0          5m56s
php-apache-866cb4fc88-km7nx   1/1     Running   0          5m56s
php-apache-866cb4fc88-mwx5j   1/1     Running   0          5m56s
php-apache-866cb4fc88-zvq6g   1/1     Running   0          62m
[root@k8master ~]# kubectl get pods -l run=php-apache
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-866cb4fc88-7sx4z   1/1     Running   0          9m11s
php-apache-866cb4fc88-b4hjw   1/1     Running   0          10m
php-apache-866cb4fc88-km7nx   1/1     Running   0          10m
php-apache-866cb4fc88-mwx5j   1/1     Running   0          10m
php-apache-866cb4fc88-zvq6g   1/1     Running   0          67m
[root@k8master ~]# kubectl get pods -l run=php-apache
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-866cb4fc88-zvq6g   1/1     Running   0          69m

 

12.5.3 基于autoscaling/v2beta2对内存进行压测

1)创建nginx的deployment的pod、service

[root@k8master hpa]# cat deploy-nginx.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hpa
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.9.1
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        resources:
          requests:
            cpu: 0.01
            memory: 25Mi
          limits:
            cpu: 0.05
            memory: 60Mi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  type: NodePort
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30080

 

2)应用deploy-nginx的deployment

[root@k8master hpa]# kubectl apply -f deploy-nginx.yaml

 

3)查看运行状态

[root@k8master hpa]# kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
ai1-6d98756bdd-4b86q                   1/1     Running   0          14h
ai1-6d98756bdd-5dwlz                   1/1     Running   0          14h
ai1-6d98756bdd-fhgvv                   1/1     Running   0          14h
ai1-6d98756bdd-jgrxb                   1/1     Running   0          14h
ai1-6d98756bdd-mc5zp                   1/1     Running   0          14h
ai1-6d98756bdd-t2sv4                   1/1     Running   0          14h
ai1-6d98756bdd-w5vsq                   1/1     Running   0          14h
ai1-6d98756bdd-z6ptz                   1/1     Running   0          14h
default-http-backend-ff744689f-wxqnf   1/1     Running   0          4d
nginx-hpa-7f5d6fbfd9-fp5qm             1/1     Running   0          47s
php-apache-866cb4fc88-zvq6g            1/1     Running   0          168m
tomcat-demo-55b6bbcb97-kmt25           1/1     Running   0          4d1h
v1                                     1/1     Running   0          114m

 

4)创建一个hpa(可以基于百分比,也可以基于具体数值定义伸缩的阈值)

[root@k8master hpa]# cat nginx-hpa.yaml 
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-hpa
  minReplicas: 1
  maxReplicas: 10
  metrics:
  #- type: Resource
    #resource:
    #  name: cpu
    #  target:
    #    type: Utilization
    #    averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 50Mi  # 设置目标平均内存值

 

5)应用hpa

[root@k8master hpa]# kubectl apply -f nginx-hpa.yaml 
Warning: autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+
horizontalpodautoscaler.autoscaling/nginx-hpa created

 

6)查看hpa状态

[root@k8master hpa]# kubectl get hpa
NAME             REFERENCE               TARGETS                 MINPODS   MAXPODS   REPLICAS   AGE
nginx-hpa        Deployment/nginx-hpa    1380352/50Mi            1         10        1          17s
php-apache-hpa   Deployment/php-apache   9195520/200Mi, 1%/50%   1         10        1          168m

 

7)压测nginx内存,会自动伸缩pod

[root@k8master hpa]# kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
nginx-hpa-7f5d6fbfd9-fp5qm             1/1     Running   0          10m

[root@k8master hpa]# kubectl exec -it nginx-hpa-7f5d6fbfd9-fp5qm bash 
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@nginx-hpa-7f5d6fbfd9-fp5qm:/# dd if=/dev/zero of=/tmp/a

 

8)hpa展示

[root@k8master hpa]# kubectl get hpa -w
NAME             REFERENCE               TARGETS                 MINPODS   MAXPODS   REPLICAS   AGE
nginx-hpa        Deployment/nginx-hpa    31979520/20Mi           1         10        2          11m
nginx-hpa        Deployment/nginx-hpa    31979520/20Mi           1         10        4          11m
nginx-hpa        Deployment/nginx-hpa    10496k/20Mi             1         10        4          11m
nginx-hpa        Deployment/nginx-hpa    16656384/20Mi           1         10        4          12m
nginx-hpa        Deployment/nginx-hpa    16721920/20Mi           1         10        4          12m
nginx-hpa        Deployment/nginx-hpa    16721920/20Mi           1         10        1          15m

 

9)nginx pod伸缩展示

[root@k8master hpa]# kubectl get pods -l app=nginx -w
NAME                         READY   STATUS    RESTARTS   AGE
nginx-hpa-7f5d6fbfd9-cf4hg   1/1     Running   0          4m20s
nginx-hpa-7f5d6fbfd9-fp5qm   1/1     Running   0          19m
nginx-hpa-7f5d6fbfd9-p59wk   1/1     Running   0          19s
nginx-hpa-7f5d6fbfd9-pkdn5   1/1     Running   0          19s
[root@k8master hpa]# kubectl get pods -l app=nginx -w
NAME                         READY   STATUS    RESTARTS   AGE
nginx-hpa-7f5d6fbfd9-cf4hg   1/1     Running   0          4m20s

 

12.5.4 k8s HPA自动伸缩文档

https://www.52dianzi.com/category/article/2a49706c73ced38df05ad1aba2966cf0.html

 

至此我们基于autoscaling/v2beta2 对cpu和内存自定义pod的伸缩就完成了!!!

 

13 k8s基于GPU值的自动伸缩

13.1 参考文档

https://www.cnblogs.com/lixinliang/p/16938630.html

13.2 

 

 

14 argocd部署与应用

14.1 argocd介绍

14.1.1 argocd简介

  • Argo CD 是用于 Kubernetes 的声明性 GitOps 连续交付工具
  • Argo CD 的主要职责是 CD(Continuous Delivery,持续交付),将应用部署到 Kubernetes 等环境中,而 CI(Continuous Integration,持续集成)主要是交给 Jenkins,Gitlab CI 等工具来完成

 14.1.2 架构图

 

 

  • Argo CD 从 Git Repo 拉取应用的配置,部署在 Kubernetes 集群中。
  • 当有人新增功能时,提交一个 Pull Requests 到 Git Repo 修改应用的部署配置,等待合并。
  • 在 Pull Requests 合并之后,通过 Webhook 触发 Argo CD 执行更新操作。
  • 应用得到更新,发送通知

 

14.2 argocd部署

14.2.1 版本信息

        k8s集群版本:v1.23.0

        argocd版本:v2.10.0-rc1

 

14.2.2 argocd官网

https://argo-cd.readthedocs.io/en/stable/operator-manual/installation/

 

#选择我们需要的yaml部署文件,我这里是测试,没有使用ha相关的,可以根据需求自己获取对应的文件

 

#点击上边会跳转到github,下载对应版本的install.yaml文件,我选择的是目前(2023-12-20)最新的版本,如果wget下载不了,可以在浏览器中先下载,再传到服务器

 

 

14.2.3 创建argocd的命名空间

kubectl create namespace argocd 
kubectl get ns

 

14.2.4 将install.yaml上传到服务器(本次使用 v2.10.0-rc1;如果不想处理后文的 gpg 报错,也可以直接用 2.3.5 版本)

rz   #上传 install.yaml(v2.10.0-rc1)

 

14.2.5 应用argocd yaml

kubectl apply -n argocd -f install.yaml 

 

  • 部署遇到的问题:在部署最新版本以及较新版本时,argocd-repo-server 报错
[root@k8master argocd]# kubectl logs -f argocd-repo-server-54c8d6cf58-c9wxz -n argocd
time="2023-12-19T09:39:15Z" level=info msg="ArgoCD Repository Server is starting" built="2023-12-01T23:05:50Z" commit=6eba5be864b7e031871ed7698f5233336dfe75c7 port=8081 version=v2.9.3+6eba5be
time="2023-12-19T09:39:15Z" level=info msg="Generating self-signed TLS certificate for this session"
time="2023-12-19T09:39:15Z" level=info msg="Initializing GnuPG keyring at /app/config/gpg/keys"
time="2023-12-19T09:39:15Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe224381730" dir= execID=96582
time="2023-12-19T09:39:21Z" level=error msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe224381730` failed exit status 2" execID=96582
time="2023-12-19T09:39:21Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe224381730]" dir= operation_name="exec gpg" time_ms=6036.826093000001
time="2023-12-19T09:39:21Z" level=fatal msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe224381730` failed exit status 2"

 

  • 目前的解决方法:

(1)降低argocd的版本,部署argocd的2.3.5是没有问题的

(2)通过argocd中的一些博客可以看到解决方案(推荐使用)

https://github.com/argoproj/argo-cd/issues/9809#issuecomment-1243415495

https://github.com/argoproj/argo-cd/issues/11647#issuecomment-1348758596

 

  • 问题解决:编辑 argocd-repo-server 的 deployment 文件,删除下面两行配置

#编辑文件

kubectl edit deploy -n argocd argocd-repo-server

#找到这两行,删除完进行保存

seccompProfile:
    type: RuntimeDefault

#如图:
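
除了 kubectl edit 手动删除,也可以尝试用 JSON patch 一条命令移除该字段(示意写法,假设 seccompProfile 位于 repo-server 第一个容器的 securityContext 下,实际路径请先用 kubectl get deploy argocd-repo-server -n argocd -o yaml 确认后再调整):

kubectl -n argocd patch deployment argocd-repo-server --type=json \
  -p='[{"op":"remove","path":"/spec/template/spec/containers/0/securityContext/seccompProfile"}]'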

 

14.2.6 查看安装详情

kubectl get pods -n argocd -o wide

 #如图

 

14.2.7 给argocd-server设置nodeport映射端口

kubectl patch svc argocd-server -p '{"spec": {"type": "NodePort"}}' -n argocd

 #如图
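
修改为 NodePort 后,查看实际分配到的端口(就是后面 UI 和 CLI 登录用的端口):

kubectl get svc argocd-server -n argocd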

 

14.2.8 获取argocd ui登陆密码(admin/z7A7xxxxvxR-h3i1)

kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

 #如图

 

14.2.9 argocd ui登陆方式

https://172.16.4.58:32066
admin/z7A7xxxxvxR-h3i1

 argocd web UI

 

14.2.10 argocd CLI命令行工具安装(浏览器登陆下载指定版本)

#下载指定版本
https://github.com/argoproj/argo-cd/releases/download/v2.3.5/argocd-linux-amd64
https://github.com/argoproj/argo-cd/releases/download/v2.10.0-rc1/argocd-linux-amd64  #我当前使用的是这个版本


#上传到服务器,并给执行权限
chmod +x argocd-linux-amd64 

#做软连接全局可用
ln -s /data/argocd/argocd-linux-amd64 /usr/bin/argocd

#查看版本信息
argocd version

展示:

[root@k8master argocd]# argocd version
argocd: v2.10.0-rc1+d919606
  BuildDate: 2023-12-18T21:04:29Z
  GitCommit: d9196060c2d8ea3eadafe278900e776760c5fcc6
  GitTreeState: clean
  GoVersion: go1.21.5
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.10.0-rc1+d919606
  BuildDate: 2023-12-18T20:45:12Z
  GitCommit: d9196060c2d8ea3eadafe278900e776760c5fcc6
  GitTreeState: clean
  GoVersion: go1.21.3
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v5.2.1 2023-10-19T20:13:51Z
  Helm Version: v3.13.2+g2a2fb3b
  Kubectl Version: v0.26.11
  Jsonnet Version: v0.20.0

 

14.2.11 使用 CLI 登录集群(IP 是当前部署服务器或 master 机器的地址,端口是 argocd-server svc 的 NodePort 端口)

argocd login 172.16.4.58:32066 --username admin --password z7A7xxxxvxR-h3i1
展示:
[root@k8master argocd]# argocd login 172.16.4.58:32066 --username admin --password z7A7xxxxvxR-h3i1
WARNING: server certificate had error: tls: failed to verify certificate: x509: cannot validate certificate for 172.16.4.58 because it doesn't contain any IP SANs. Proceed insecurely (y/n)? y
'admin:login' logged in successfully
Context '172.16.4.58:32066' updated
 

14.2.12 修改密码(admin/keshen@1234),修改后用新密码登陆

argocd account update-password --account admin --current-password z7A7xxxxvxR-h3i1 --new-password keshen@1234
展示:
[root@k8master argocd]# argocd account update-password --account admin --current-password z7A7xxxxvxR-h3i1 --new-password keshen@1234
Password updated
Context '172.16.4.58:32066' updated

 

 14.3 gitlab代码仓准备

  • 在 Gitlab 上创建项目,取名为 argocd-lab,为了方便实验将仓库设置为 public 公共仓库。
  • 在仓库中创建 yfile 目录,在目录中创建两个 yaml 资源文件,分别是 myapp-deployment.yaml 和 myapp-service.yaml。

14.3.1 创建一个项目

 

 

14.3.2 创建一个test分支

 

14.3.3 将yaml文件推送到gitlab

配置git全局账号
git config --global user.email "lipc@zhengjue-ai.com"
git config --global user.name "lipc"

拉取gitlab项目
git clone http://172.16.4.53/lipc/argocd-lab.git

切换分支test
cd argocd-lab/
git checkout test

创建yfile目录,存放argocd使用的yaml文件资源,也是后边配置argocd的path资源路径的值
mkdir yfile

创建yaml文件,并拷贝到yfile目录中,文件在下边有展示
myapp-deployment.yaml  
myapp-service.yaml

将yaml文件提交到gitlab
git add .
git commit -m "file yaml"
git push origin test

 

 #yaml文件展示

[root@k8master yfile]# cat myapp-deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ksapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ksapp
  template:
    metadata:
      labels:
        app: ksapp
    spec:
      containers:
      - image: registry.cn-shanghai.aliyuncs.com/public-namespace/myapp:v1
        name: ksapp
        ports:
        - containerPort: 80

 

[root@k8master yfile]# cat myapp-service.yaml 
apiVersion: v1
kind: Service
metadata:
  name: ksapp
spec:
  ports:
  - port: 80
    targetPort: 80
    nodePort: 32060
  type: NodePort
  selector:
    app: ksapp

 

#如图展示

 

 

 

 

14.4 创建 Argo CD App

14.4.1首先创建一个命名空间 devops 用于 Argo CD 部署应用

kubectl create ns devops
kubectl get ns

 

14.4.2 创建argocd app

方式一:使用 UI 创建 App

  • Application Name: 自定义的应用名。
  • Project: 使用默认创建好的 default 项目。
  • SYNC POLICY: 同步方式,可以选择自动或者手动,这里我们选择手动同步。

 

 

 

  • Repository URL: 项目的 Git 地址。
  • Revision: 分支名。
  • Path: yaml 资源文件所在的相对路径。

 

 

  • Cluster URL: Kubernetes API Server 的访问地址,由于 Argo CD 和下发应用的 Kubernetes 集群是同一个,因此可以直接使用 https://kubernetes.default.svc 来访问。关于 Kubernetes 中 DNS 解析规则可以查看 Pod 与 Service 的 DNS。
  • Namespace: 部署应用的命名空间。

 

 

 

  •  创建完成后如下图所示,此时处于 OutOfSync 的状态:

 

 

 

  • 由于设置的是手动同步,因此需要点一下下面的 SYNC 进行同步:

 

  • 在弹出框点击 SYNCHRONIZE,确认同步:

 

 

  • 等待同步完成

 

  • 同步完成

 

  • 在 Argo CD 上点击应用进入查看详情,如下图:

 

 

方式二:使用 CLI 创建 APP
argocd app create ksapp2 \
  --repo http://172.16.4.53/lipc/argocd-lab.git \
  --path yfile \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace devops \
  --revision test

 

  • argocd 查看创建信息
[root@k8master yfile]# argocd app list  #列出app应用
NAME           CLUSTER                         NAMESPACE  PROJECT  STATUS     HEALTH   SYNCPOLICY  CONDITIONS                REPO                                    PATH   TARGET
argocd/ksapp   https://kubernetes.default.svc  devops     default  Synced     Healthy  <none>      <none>                    http://172.16.4.53/lipc/argocd-lab.git  yfile  test
argocd/ksapp2  https://kubernetes.default.svc  devops     default  OutOfSync  Healthy  <none>      SharedResourceWarning(2)  http://172.16.4.53/lipc/argocd-lab.git  yfile  test
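
上面 ksapp2 处于 OutOfSync 状态,因为创建时没有配置自动同步,可以用 CLI 手动触发一次同步(应用名以实际为准):

argocd app sync ksapp2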

 

[root@k8master yfile]# argocd app get ksapp #查看ksapp应用
Name:               argocd/ksapp
Project:            default
Server:             https://kubernetes.default.svc
Namespace:          devops
URL:                https://172.16.4.58:32066/applications/ksapp
Repo:               http://172.16.4.53/lipc/argocd-lab.git
Target:             test
Path:               yfile
SyncWindow:         Sync Allowed
Sync Policy:        <none>
Sync Status:        Synced to test (42ca17b)
Health Status:      Healthy

GROUP  KIND        NAMESPACE  NAME   STATUS  HEALTH   HOOK  MESSAGE
       Service     devops     myapp  Synced  Healthy        service/myapp created
apps   Deployment  devops     myapp  Synced  Healthy        deployment.apps/myapp created

 

 

 

方式三:使用 YAML 文件创建

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ksapp
  namespace: argocd
spec:
  destination:
    namespace: devops # 部署应用的命名空间
    server: https://kubernetes.default.svc # API Server 地址
  project: default # 项目名
  source:
    path: yfile # 资源文件所在的相对路径(与上面保持一致)
    repoURL: http://172.16.4.53/lipc/argocd-lab.git # Git 仓库地址
    targetRevision: test # 分支名
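
将上面的内容保存为一个 yaml 文件(文件名自定,这里假设叫 ksapp-app.yaml)后直接 apply,Argo CD 就会在 argocd 命名空间下创建这个 Application:

kubectl apply -f ksapp-app.yaml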

 

14.5 版本升级

  • 将 ksapp 应用从手动同步改成自动同步。点击 APP DETAILS -> SYNC POLICY,点击 ENABLE AUTO-SYNC

 

 

 

  • 当前版本

 

 

  • 编辑 ksapp 资源文件,将版本从 v1 改为 v2,点击 Commit changes,提交更改:

 

 

  • 等待一会 Argo CD 会自动更新应用,如果等不及可以点击 Refresh,Argo CD 会去立即获取最新的资源文件。可以看到此时 ksapp Deployment 会新创建 v2 版本的 Replicaset,v2 版本的 Replicaset 会创建并管理 v2 版本的 Pod

 

  • 升级之后的版本v2

 

 

14.6 版本回滚

  • 升级到 v2 版本以后, v1 版本的 Replicaset 并没有被删除,而是继续保留,这是为了方便我们回滚应用。在 ksapp 应用中点击 HISTORY AND ROLLBACK 查看历史记录,可以看到有 2 个历史记录:

 

 

  • 假设刚刚上线的 v2 版本出现了问题,需要回滚回 v1 版本,那么可以选中 v1 版本,然后点击 Rollback 进行回滚:

 

 

  • 在回滚的时候需要禁用 AUTO-SYNC 自动同步,点击 OK 确认即可:

 

 

  • 等待一会可以看到此时已经回滚成功,此时 Pod 是 v1 版本的,并且由于此时线上的版本并不是 Git 仓库中最新的版本,因此此时同步状态是 OutOfSync:

 

 

 

14.7 argocd部署参考文档

#主要
https://blog.csdn.net/lzhcoder/article/details/131763500
#辅助
https://blog.csdn.net/Coder_Boy_/article/details/131747958

https://blog.csdn.net/m0_37723088/article/details/129991679
#问题解决
https://linuxea.com/3086.html

 

 

15. 新的node节点加入k8s集群

15.1 初始化新的node节点

  按照上边步骤——> 3.安装k8s准备工作

15.2 在master上获取加入k8s集群的命令和授权

kubeadm token create --print-join-command

获取的信息:

kubeadm join 172.16.4.58:6443 --token m7ngmk.4mowyu019g6cwttp --discovery-token-ca-cert-hash sha256:de19ee27e18341ce9acf6248d76664c3bc932372745930fe28687a20073c179a

15.3 在新的node节点上执行上一步获取到的 kubeadm join 命令

15.4 执行完成后master查看node信息

[root@k8master ~]# kubectl get node

 

16 core dns验证

参考文档:

http://www.taodudu.cc/news/show-634984.html?action=onClick

 可以在容器里边ping或者nc指定服务的域名,格式如下:

#statefulset
<PodName>.<ServiceName>.<Namespace>.svc.cluster.local

mysql-ss-0.mysql-svc.default.svc.cluster.local 3306


#deployment
<ServiceName>.<Namespace>.svc.cluster.local

nginx-svc.default.svc.cluster.local
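
也可以临时起一个 busybox 容器做解析验证(pod 名随意,这里用 dns-test;busybox 建议用 1.28 版本,较新版本镜像的 nslookup 对集群 DNS 解析有兼容问题;服务域名换成自己环境里实际存在的):

kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- nslookup nginx-svc.default.svc.cluster.local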

 
