20221215 1. Installing a k8s Cluster

Server resources

Three machines, all running CentOS 7.

Node          Internal IP    Role
k8s-master    172.16.0.7     k8s control-plane node
k8s-node1     172.16.0.3     k8s worker node 1
k8s-node2     172.16.0.12    k8s worker node 2

Deploying Docker

Install Docker Engine on CentOS | Docker Documentation

# install the latest version (recommended)
yum -y install docker-ce
systemctl start docker
systemctl status docker
systemctl enable docker
# check the version
docker version

Configure registry mirrors

vim /etc/docker/daemon.json

{
  "registry-mirrors": [
    "https://mirror.ccs.tencentyun.com",
    "https://hub-mirror.c.163.com",
    "https://mirror.baidubce.com"
  ]
}
systemctl daemon-reload
systemctl restart docker
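
A quick check that Docker picked up the mirrors:

# the configured mirrors should be listed under "Registry Mirrors"
docker info | grep -A 3 "Registry Mirrors"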

Deploying k8s

Preparation (run on every node)

Set hostnames

hostnamectl set-hostname k8s-master   # run on the master
hostnamectl set-hostname k8s-node1    # run on node 1
hostnamectl set-hostname k8s-node2    # run on node 2

Configure /etc/hosts

vim /etc/hosts

172.16.0.7 k8s-master
172.16.0.3 k8s-node1
172.16.0.12 k8s-node2

Disable the firewall

systemctl stop firewalld
systemctl disable firewalld

Disable SELinux

# check current status
getenforce
# permanent (takes effect after reboot)
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
# immediate (until reboot)
setenforce 0

Bridge filtering

vim /etc/sysctl.conf

# append the following settings:
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv6.conf.all.forwarding = 1
vm.swappiness = 0

# load the bridge netfilter module and apply the settings
modprobe br_netfilter
sysctl -p
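
modprobe alone does not persist across reboots; the standard systemd mechanism for loading the module at boot:

# load br_netfilter automatically at boot
cat > /etc/modules-load.d/br_netfilter.conf <<EOF
br_netfilter
EOF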

Enable IPVS

vim /etc/sysconfig/modules/ipvs.modules

#!/bin/bash
ipvs_modules="ip_vs ip_vs_lc ip_vs_wlc ip_vs_rr ip_vs_wrr ip_vs_lblc ip_vs_lblcr ip_vs_dh ip_vs_sh ip_vs_fo ip_vs_nq ip_vs_sed ip_vs_ftp nf_conntrack"
for kernel_module in ${ipvs_modules}; do
  /sbin/modinfo -F filename ${kernel_module} > /dev/null 2>&1
  if [ $? -eq 0 ]; then
    /sbin/modprobe ${kernel_module}
  fi
done
chmod 755 /etc/sysconfig/modules/ipvs.modules 
sh /etc/sysconfig/modules/ipvs.modules 
lsmod | grep ip_vs
vim ~/.bashrc

# append so the modules are reloaded at login after a reboot
sh /etc/sysconfig/modules/ipvs.modules
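
Note that loading these modules only makes IPVS available; kube-proxy still defaults to iptables mode. One way to switch it once the cluster is up (edits the kubeadm-generated ConfigMap):

# set mode: "ipvs" in the kube-proxy ConfigMap
kubectl -n kube-system edit configmap kube-proxy
# recreate the kube-proxy pods so the change takes effect
kubectl -n kube-system delete pod -l k8s-app=kube-proxy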

Synchronize time

Install chrony and remove ntp.

yum -y remove ntp
yum -y install chrony
systemctl start chronyd
systemctl enable chronyd
# keep the hardware clock in UTC
timedatectl set-local-rtc 0
# restart services that depend on the system time
systemctl restart rsyslog && systemctl restart crond
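
To confirm chrony is actually syncing:

# list time sources and their reachability
chronyc sources -v
# check overall NTP sync status
timedatectl status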

Command completion

# install bash-completion (bash-completion-extras comes from the EPEL repo)
yum -y install bash-completion bash-completion-extras
# enable it in the current shell
source /etc/profile.d/bash_completion.sh

Disable swap

# check current usage
free -h
# temporary (until reboot):
swapoff -a

# permanent:
vi /etc/fstab

# comment out the /dev/mapper/centos-swap line:
#/dev/mapper/centos-swap swap swap defaults 0 0

# confirm swap is off: the Swap row should show all zeros
free -m
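
An equivalent non-interactive edit, useful for scripting (comments out every fstab line that mentions swap):

sed -ri 's/.*swap.*/#&/' /etc/fstab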

Make bridged traffic visible to iptables

cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sysctl --system

# verify: each should return 1
sysctl -n net.bridge.bridge-nf-call-iptables
sysctl -n net.bridge.bridge-nf-call-ip6tables

Change the Docker cgroup driver to systemd

kubeadm expects the kubelet and the container runtime to use the same cgroup driver; systemd is the recommended one.

vim /etc/docker/daemon.json

# add to the existing /etc/docker/daemon.json:
"exec-opts": ["native.cgroupdriver=systemd"],

systemctl daemon-reload
systemctl restart docker
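
For reference, the combined /etc/docker/daemon.json from the steps so far would look like this:

{
  "registry-mirrors": [
    "https://mirror.ccs.tencentyun.com",
    "https://hub-mirror.c.163.com",
    "https://mirror.baidubce.com"
  ],
  "exec-opts": ["native.cgroupdriver=systemd"]
}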

Add Docker registry mirrors

Optional; add more mirrors only if needed.

vi /etc/docker/daemon.json

# append another value to the "registry-mirrors" array in /etc/docker/daemon.json, e.g.
# "https://ustc-edu-cn.mirror.aliyuncs.com",
systemctl daemon-reload
systemctl restart docker
# check status
systemctl status docker
# show details
docker info

Install k8s

Add the Kubernetes yum repo

cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes    
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# rebuild the yum cache
yum clean all && yum -y makecache

Install kubeadm, kubelet, and kubectl

# list available versions
yum list kubelet --showduplicates | sort -r

# install the latest version (recommended)
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet

# or pin a version: Kubernetes 1.24 removed dockershim (Docker support), so stay on 1.23.x with Docker
version=1.23.12-0
yum install -y kubelet-${version} kubeadm-${version} kubectl-${version}
systemctl enable kubelet
# uninstall
yum remove -y kubelet kubeadm kubectl
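
A quick sanity check of what got installed:

kubeadm version -o short
kubelet --version
kubectl version --client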

Command completion

# install the bash completion package
yum install bash-completion -y
# enable kubectl and kubeadm completion; takes effect at next login
kubectl completion bash > /etc/bash_completion.d/kubectl
kubeadm completion bash > /etc/bash_completion.d/kubeadm

Bootstrap the cluster

Run on the master node:

kubeadm init --image-repository registry.aliyuncs.com/google_containers   --pod-network-cidr=10.244.0.0/16

Output:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.16.0.7:6443 --token k6mjct.ns01hfjbs0mjzrhd \
        --discovery-token-ca-cert-hash sha256:a1b220bfc75ef98214f13ea21e06887b1f5571c63a909470373a7a00c776f63a
# check status
systemctl status kubelet
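
Join tokens expire after 24 hours by default; a fresh join command can be printed on the master at any time:

kubeadm token create --print-join-command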

Run on each worker node (use the exact join command printed by kubeadm init on the master; the example below is from a different run):

kubeadm join 9.135.144.24:6443 --token 5tnocg.0uvnrem6xzt06nb6 \
        --discovery-token-ca-cert-hash sha256:d1cfac7a2682d1982dc8b26258f61e13e3c32412b15c304ba86e8e12a4be4ec9

Output

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

If something goes wrong and you need to reinstall, run:

kubeadm reset
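
kubeadm reset does not clean up everything; as its own output notes, CNI configuration and iptables rules have to be removed manually before re-initializing:

# leftover CNI config and kubeconfig
rm -rf /etc/cni/net.d $HOME/.kube/config
# flush rules created by kube-proxy and the CNI plugin
iptables -F && iptables -t nat -F && iptables -X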

After a successful init, run the commands from its output to start using kubectl:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

Deploy the flannel CNI plugin

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# note: the "Network" CIDR in the file must match the --pod-network-cidr used at init
kubectl apply -f kube-flannel.yml

Contents of kube-flannel.yml:

---
kind: Namespace
apiVersion: v1
metadata:
  name: kube-flannel
  labels:
    pod-security.kubernetes.io/enforce: privileged
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
        # image: flannelcni/flannel-cni-plugin:v1.1.0 for ppc64le and mips64le (dockerhub limitations may apply)
        image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
        # image: flannelcni/flannel:v0.19.2 for ppc64le and mips64le (dockerhub limitations may apply)
        image: docker.io/rancher/mirrored-flannelcni-flannel:v0.19.2
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        # image: flannelcni/flannel:v0.19.2 for ppc64le and mips64le (dockerhub limitations may apply)
        image: docker.io/rancher/mirrored-flannelcni-flannel:v0.19.2
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
        - name: xtables-lock
          mountPath: /run/xtables.lock
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate

Verify

# view cluster nodes
kubectl get nodes
# view all pods
kubectl get pod -A
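
Once flannel is up, its pods should be Running and the nodes should report Ready:

# flannel runs as a DaemonSet in the kube-flannel namespace
kubectl get pods -n kube-flannel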

GPU Support

Optional.

Install the NVIDIA driver

Download the NVIDIA driver

lspci | grep -i nvidia

Download the driver that matches your GPU from the NVIDIA website.

Install the NVIDIA driver detection tool

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install nvidia-detect -y
# detect the recommended driver for the installed GPU
nvidia-detect -v

Install build dependencies

yum install kernel-devel gcc dkms -y

Install the driver

# check the kernel version
uname -r
chmod +x NVIDIA-Linux-x86_64-440.118.02.run
# --kernel-source-path must point at the headers for the running kernel
./NVIDIA-Linux-x86_64-440.118.02.run --kernel-source-path=/usr/src/kernels/4.14.105-19-0024 -k $(uname -r)

Installer prompts

Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later.

NO

Install NVIDIA's 32-bit compatibility libraries?

NO

An incomplete installation of libglvnd was found. All of the essential libglvnd libraries are present, but one or more optional components are missing. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries.

Don't install libglvnd files

Verify

[root@k8s-master ~]# nvidia-smi 
Thu Nov  3 17:27:02 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:08.0 Off |                    0 |
| N/A   67C    P0    27W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

NVIDIA GPU driver and nvidia-docker2 installation

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install -y kmod-nvidia-515.76-1.el7_9.elrepo
curl -s -L https://nvidia.github.io/nvidia-docker/centos7/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo 
# back up /etc/docker/daemon.json
cp /etc/docker/daemon.json /etc/docker/daemon.json.bak
sudo yum install -y nvidia-container-toolkit nvidia-docker2
vim /etc/docker/daemon.json

# merge these keys into the existing daemon.json (keep registry-mirrors and exec-opts):
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
# restart docker
systemctl daemon-reload
systemctl restart docker
# verify nvidia-docker2 works: this should print the same nvidia-smi table as on the host
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Expose the GPU to k8s

On the k8s master:

kubectl create -f nvidia-device-plugin.yml  
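
nvidia-device-plugin.yml is not included above; it typically comes from the NVIDIA/k8s-device-plugin project. A sketch of fetching and verifying it (the tag and path are assumptions and vary by release):

# fetch the manifest (tag/path are assumptions; newer releases keep it under deployments/static/)
wget https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.2/nvidia-device-plugin.yml
# after kubectl create, the plugin runs as a DaemonSet in kube-system
kubectl get pods -n kube-system | grep nvidia-device-plugin
# GPU nodes should then advertise nvidia.com/gpu capacity
kubectl describe node k8s-master | grep nvidia.com/gpu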

Test
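
gpu-pod.yml is not shown in the original; a minimal sketch that requests one GPU and runs the CUDA vector-add sample matching the output below (the image name is an assumption):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
  - name: cuda-vector-add
    image: nvidia/samples:vectoradd-cuda11.2.1   # assumed sample image
    resources:
      limits:
        nvidia.com/gpu: 1   # request a single GPU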

kubectl create -f gpu-pod.yml
kubectl logs gpu-pod

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

Troubleshooting

kubeadm init fails with a CRI error

[root@k8s-master ~]# kubeadm init --image-repository registry.aliyuncs.com/google_containers   --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.25.3
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR CRI]: container runtime is not running: output: E1015 10:17:22.425895   29854 remote_runtime.go:948] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
time="2022-10-15T10:17:22+08:00" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Solution

The config.toml shipped with the containerd.io package disables the CRI plugin (disabled_plugins = ["cri"]); removing it and restarting containerd re-enables CRI:

rm -rf /etc/containerd/config.toml
systemctl restart containerd

systemctl start kubelet fails: missing config file config.yaml

/var/lib/kubelet/config.yaml is generated by kubeadm, so this error is expected until kubeadm init (or kubeadm join) has run. Reference: K8S服务搭建过程中出现的憨批错误_84岁带头冲锋的博客-CSDN博客

IDEA k8s Environment Setup

Install the Kubernetes plugin

Plugin name: Kubernetes

Live templates for quickly generating file content: Settings -> Editor -> Live Templates

Configure an SSH client

Create an SSH session: Settings -> Tools -> SSH Configurations -> New

Use SSH: Tools -> Start SSH Session -> select the SSH configuration created above

Configure Remote Host and deployment

Goal: upload files from the IDEA project to the k8s master node.

Configuration: Settings -> Build, Execution, Deployment -> Deployment -> New

  1. Connection: choose SFTP, select the SSH configuration above, and set the root path.

  2. Mappings: set the deployment path (using the project name is recommended).

Right-click the project, choose deployment, and upload to the server; the project files are uploaded to root path + deployment path.

Useful views:

  • Terminal

  • Remote Host
