Preface

Development and operations are the two key links that keep an Internet product iterating quickly, yet the two move at different speeds: development wants to ship fast, operations wants to stay stable, and the gap between them is hard to close.

A DevOps toolchain is needed in the middle to connect the development and operations teams and make them work as one whole, so that deployment efficiency can keep up with rapid product iteration and the product can be iterated and delivered quickly.

In recent years, with microservice web applications, distributed big-data processing, distributed model training, and the rise of the MLOps/DevOps culture, Kubernetes (K8s) has gradually become the mainstream way to deploy applications:

  • Traditional deployment: applications installed directly on the OS of physical servers / virtual machines
  • Ansible/Puppet/SaltStack: automated configuration management and application orchestration tools
  • Docker: application containerization
  • Kubernetes: distributed container orchestration

I. Kubernetes Cluster Concepts

If the software architecture is a traditional single-host / layered one, deploying it with docker / docker compose already solves the problem.

If the architecture is made of microservices, deploying many services becomes much harder, and a container orchestration tool is needed to answer questions such as:

When a container fails, how do we immediately start another container to take its place and keep serving traffic? (Without fault tolerance, a single point of failure can avalanche into a full service outage.)

When traffic spikes, how do we scale the number of containers out dynamically, and scale back in when traffic drops? (dynamic scaling)

These container-management problems are collectively called container orchestration problems.

Kubernetes, K8s for short, is a container-based distributed application deployment solution that solves the container orchestration problem.

K8s is essentially a cluster of servers and offers the following advantages:

  • Automatic bin packing: places containers automatically based on resource requirements and constraints
  • Self-healing for high availability: if a container exits, a replacement container is started within seconds to take over
  • Elastic scaling: the number of containers in the cluster can be adjusted dynamically to match real demand
  • Service discovery: services automatically find the services they depend on
  • Load balancing: if a service is backed by multiple containers, traffic is balanced across them automatically
  • Release management: rolling updates and rollbacks

II. Communication Inside a Kubernetes Cluster

Inside an OpenStack cluster, the components communicate through an asynchronous message queue.

Inside a Kubernetes cluster, the components use the List-Watch mechanism to communicate with the API server (the only client of the etcd database); this keeps each component's data in sync while keeping the components decoupled from one another.

1. Characteristics

A well-designed asynchronous messaging system must satisfy at least the following four requirements:

  • reliable delivery
  • real-time delivery
  • ordered delivery
  • high performance

2. Workflow

Etcd stores the cluster's data. The API server is the single entry point: every operation on etcd data must go through the API server.

Clients (kubelet / kube-scheduler / controller-manager) use list-watch to subscribe to create, update, and delete events of resources (Pods, ReplicaSets, RCs, etc.) on the API server.

Each client then calls the appropriate handler for the event type it receives.

list-watch consists of two API verbs, List and Watch:

list

Calls the resource's list API to enumerate the current objects of the watched type; implemented over a short-lived HTTP request (full snapshot).

watch

Calls the resource's watch API to subscribe to change events on that resource; implemented over a long-lived HTTP connection (incremental updates).
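
As a rough sketch of this list-then-watch pattern, the snippet below uses the kubernetes Python client (the same client used in section 1.2 further down). It only illustrates the API semantics, not the controllers' actual client-go code; the namespace and timeout are arbitrary.

from kubernetes import client, config, watch

config.load_kube_config()            # or config.load_incluster_config() inside a Pod
v1 = client.CoreV1Api()

# "list": full snapshot over a short-lived HTTP request
pods = v1.list_namespaced_pod(namespace="default")
print("current pods:", [p.metadata.name for p in pods.items])

# "watch": incremental events over a long-lived HTTP connection,
# resuming from the resourceVersion returned by the list call
w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod,
                      namespace="default",
                      resource_version=pods.metadata.resource_version,
                      timeout_seconds=60):
    print(event["type"], event["object"].metadata.name)   # ADDED / MODIFIED / DELETED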

3. How Informer, client-go, and kubebuilder relate

Every controller, whether custom or built into K8s, uses the list-watch mechanism to talk to the API server.

Informer

Informer is a module inside client-go; it is the part that implements the List-Watch mechanism (listing, watching, and caching resources, then dispatching events to handlers).

client-go

client-go wraps the Informer module and handles the interaction with the Kubernetes API server.

With client-go you can write Go programs that create, modify, and delete Kubernetes objects such as Pods, Deployments, and Services.

kubebuilder

kubebuilder builds on top of client-go and is a scaffolding framework for implementing Operators.

III. Kubernetes Cluster Components

Kubernetes consists of the following components, which cooperate through the list-watch mechanism to get container orchestration done.

1. Master node

The control plane of K8s.

It runs three daemons: APIServer, Scheduler, and ControllerManager.

  • APIServer: the brain of K8s; it exposes the API to the outside world and issues instructions to the internal components. By design, only the API server may operate on the etcd database, which keeps everything else decoupled from storage.
  • Scheduler: picks the best Node for each Pod.
  • ControllerManager: almost every resource type in K8s has a dedicated Controller maintaining it, so who manages the Controllers? The ControllerManager bundles all Controllers together and keeps them running healthily.
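
A quick way to check that those control-plane daemons are alive is the componentstatuses API. A minimal sketch with the Python client follows (componentstatuses is deprecated in newer releases, but it still works on the v1.17 cluster built later in this post):

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# health of scheduler, controller-manager and etcd as seen by the API server
for cs in v1.list_component_status().items:
    conditions = ",".join("%s=%s" % (c.type, c.status) for c in cs.conditions)
    print(cs.metadata.name, conditions)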

1.1. Getting the K8s credentials (kubeconfig)

First install the client SDK, then copy the content of ~/.kube/config into the local directory and save it as kubeconfig.yaml:

[root@k8s-m ~]# cp .kube/config    kubeconfig.yaml

1.2. Calling the K8s API server from Python

from kubernetes import client, config

# load the kubeconfig saved above
config.load_kube_config(config_file="./kubeconfig.yaml")
# get a client object for the CoreV1 API group
v1 = client.CoreV1Api()

# list the namespaces
# for ns in v1.list_namespace().items:
#     print(ns.metadata.name)

# list all pods across all namespaces
ret = v1.list_pod_for_all_namespaces(watch=False)
for i in ret.items:
    print("%s\t%s\t%s\t%s" % (i.spec.node_name, i.status.pod_ip, i.metadata.namespace, i.metadata.name))
kubernets.py

2. Node (worker) nodes

The worker plane of K8s.

  • kubelet runs on every Node in the cluster; it watches the Master (API server) for instructions and calls the container engine to create Pods on its Node.
  • The container engine (docker) receives the instructions relayed from the API server and manages the containers.
  • kube-proxy watches for Service changes in real time and turns them into iptables/IPVS forwarding rules on its node.

3. Pod

Everything in K8s revolves around the Pod.

Volumes give Pods storage that survives moves across nodes; Services and Ingresses give Pods a stable access point; Pod controllers give the applications inside Pods high availability and dynamic scaling.

  • A Pod is roughly the equivalent of a virtual machine in a virtualization-based deployment.
  • A container is the shell around an application, and a Pod is the shell around containers.
  • The Pod is the smallest unit K8s schedules.
  • A Pod may contain multiple containers; exactly one of them is the main container, and the others assist it with extra functionality.
  • The helper containers are called sidecars.
  • Within a Pod, storage volumes are shared between the containers (see the sketch after this list).
  • In practice, most Pods contain a single container.
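
A minimal sketch of a main container plus a sidecar that share one emptyDir volume (and, implicitly, one network namespace). The Pod name, images, and mount path are only illustrative, borrowed from the demo images used later in this post:

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# the main container can write files that the sidecar reads from the same volume
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "sidecar-demo"},
    "spec": {
        "volumes": [{"name": "shared-data", "emptyDir": {}}],
        "containers": [
            {"name": "myapp",
             "image": "ikubernetes/myapp:v1",
             "volumeMounts": [{"name": "shared-data", "mountPath": "/data"}]},
            {"name": "sidecar",
             "image": "busybox:latest",
             "command": ["/bin/sh", "-c", "sleep 86400"],
             "volumeMounts": [{"name": "shared-data", "mountPath": "/data"}]},
        ],
    },
}
v1.create_namespaced_pod(namespace="default", body=pod)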

4. Pod controllers

An operations engineer has three core duties: release management, change management, and incident handling.

A controller is like an ops engineer who works 24/7 and takes over all three of those duties.

Kubernetes is a container orchestration tool, and the core job of such a tool is orchestrating containers, where "container" really means the application.

In Kubernetes we do not create, delete, or update containers directly; instead we define Pod controllers and let them manage the containers indirectly.

In Kubernetes the Pod is the smallest unit of management, but Kubernetes rarely manages Pods directly; it manages them through Pod controllers, which ultimately keep the containers running.

Kubernetes ships many kinds of Pod controllers; different controllers create different kinds of Pods for different deployment scenarios. The common ones are listed below (a client sketch for creating a Deployment follows the list):

  • ReplicationController: the original Pod controller, now deprecated and replaced by ReplicaSet
  • ReplicaSet: keeps the number of replicas exactly at the desired value, and supports scaling and image version upgrades
  • Deployment: controls Pods through ReplicaSets and adds rolling version upgrades; commonly used for gray, canary, and blue-green releases
  • Horizontal Pod Autoscaler: adjusts the number of Pods horizontally based on cluster load, with no human intervention
  • DaemonSet: runs exactly one replica on each selected Node; used for daemon-style workloads (monitoring, log collection)
  • Job: its Pods exit as soon as the task finishes and are not restarted or recreated; used for one-off tasks
  • CronJob: its Pods run periodic tasks that do not need to stay resident (scheduled jobs, data backups)
  • StatefulSet: manages stateful applications
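
As a sketch of how controllers are used in practice, the snippet below creates a Deployment through the Python client and then only changes the desired replica count; the Deployment and ReplicaSet controllers do the rest. The name myapp-demo and the image are illustrative, mirroring the demo app used later in this post:

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "myapp-demo"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "myapp-demo"}},
        "template": {
            "metadata": {"labels": {"app": "myapp-demo"}},
            "spec": {"containers": [{"name": "myapp",
                                     "image": "ikubernetes/myapp:v1",
                                     "ports": [{"containerPort": 80}]}]},
        },
    },
}
apps.create_namespaced_deployment(namespace="default", body=deployment)

# scaling only edits spec.replicas; the controllers converge the real Pod count to it
apps.patch_namespaced_deployment_scale(name="myapp-demo", namespace="default",
                                       body={"spec": {"replicas": 3}})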

5. Endpoints

Endpoints is a built-in K8s resource that describes the backend Pods a Service proxies, tying the Pods and the Service together.

When a Service is created, an Endpoints object with the same name is created automatically.

We can also hand-create a Service and an Endpoints object sharing the same name to proxy a backend that is not a Pod,

which is how external, non-Pod resources are brought into the cluster and attached to a Service (see the sketch below).
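
A minimal sketch of that trick with the Python client: a selector-less Service plus a hand-written Endpoints object with the same name, pointing at an address outside the cluster. The name, IP, and port below are placeholders:

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Service without a selector, so no Endpoints are generated automatically
v1.create_namespaced_service(namespace="default", body={
    "apiVersion": "v1", "kind": "Service",
    "metadata": {"name": "external-mysql"},
    "spec": {"ports": [{"port": 3306, "targetPort": 3306}]},
})
# Endpoints with the SAME name, pointing at a non-Pod backend
v1.create_namespaced_endpoints(namespace="default", body={
    "apiVersion": "v1", "kind": "Endpoints",
    "metadata": {"name": "external-mysql"},
    "subsets": [{"addresses": [{"ip": "192.168.56.100"}],
                 "ports": [{"port": 3306}]}],
})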

6. Service

A Service gets service discovery and registration through CoreDNS.

Through the kube-proxy component, a Service dynamically generates IPVS/iptables rules in the Linux kernel that load-balance user traffic across the backend Pods bound to the Service.

These rules work at layer 4 of the network stack (the transport layer).

[root@master ~]# kubectl get svc -A
NAMESPACE     NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes      ClusterIP   10.96.0.1       <none>        443/TCP                  7d3h
default       myapp-service   ClusterIP   10.98.160.120   <none>        80/TCP                   4h2m
kube-system   kube-dns        ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   7d3h
[root@master ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 192.168.56.18:6443           Masq    1      0          0         
TCP  10.96.0.10:53 rr
  -> 10.244.104.1:53              Masq    1      0          0         
  -> 10.244.166.131:53            Masq    1      0          0         
TCP  10.96.0.10:9153 rr
  -> 10.244.104.1:9153            Masq    1      0          0         
  -> 10.244.166.131:9153          Masq    1      0          0         
TCP  10.98.160.120:80 rr
UDP  10.96.0.10:53 rr
  -> 10.244.104.1:53              Masq    1      0          0         
  -> 10.244.166.131:53            Masq    1      0          0

A Service can be seen as a reverse proxy for a group of Pods of the same kind; through this fixed access point, clients reach the backend Pods via the proxy.

When a Service proxies a single Pod it is simply a reverse proxy; with multiple backend Pods it also provides layer-4 (transport-layer) load balancing across the group.

A Service supports session affinity, which always routes requests from the same client to the same Pod (see the sketch below).

A cluster can hold many Services; since every Service has its own IP address, different Services are free to use the same port.
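
For illustration, a ClusterIP Service with session affinity enabled could be created like this with the Python client; the Service name and the app=myapp selector are assumptions that mirror the demo Deployment used later:

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# requests from the same client IP keep going to the same backend Pod
v1.create_namespaced_service(namespace="default", body={
    "apiVersion": "v1", "kind": "Service",
    "metadata": {"name": "myapp-sticky"},
    "spec": {
        "selector": {"app": "myapp"},
        "ports": [{"port": 80, "targetPort": 80}],
        "sessionAffinity": "ClientIP",
        "sessionAffinityConfig": {"clientIP": {"timeoutSeconds": 10800}},
    },
})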

[root@master zhanggen]# kubectl get svc
NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
kubernetes     ClusterIP   10.96.0.1        <none>        443/TCP   22h
myapp          ClusterIP   10.111.198.253   <none>        80/TCP    19m
nginx-deploy   ClusterIP   10.98.82.22      <none>        80/TCP    14h

If a Service's type is NodePort, kube-proxy creates the corresponding iptables rules on every node, after which users can reach the Service through any node's IP address.

[root@master zhanggen]# kubectl get svc
NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
kubernetes     ClusterIP   10.96.0.1        <none>        443/TCP   22h
myapp          ClusterIP   10.111.198.253   <none>        80/TCP    29m
nginx-deploy   ClusterIP   10.98.82.22      <none>        80/TCP    14h
[root@master zhanggen]# kubectl delete svc/myapp
service "myapp" deleted
[root@master zhanggen]# kubectl create service nodeport myapp --tcp=80:80
service/myapp created
[root@master zhanggen]# kubectl get svc
NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes     ClusterIP   10.96.0.1      <none>        443/TCP        22h
myapp          NodePort    10.98.226.20   <none>        80:30121/TCP   3s
nginx-deploy   ClusterIP   10.98.82.22    <none>        80/TCP         14h
[root@master zhanggen]# kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   22h   v1.17.4
node1    Ready    <none>   22h   v1.17.4
node2    Ready    <none>   22h   v1.17.4
[root@master zhanggen]# curl node1:30121
Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>
[root@master zhanggen]# curl node2:30121
Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>

6.1. Service types

ClusterIP

The Service's virtual IP. It cannot be pinged; the mapping between the virtual IP and the Service name is stored as an A record in CoreDNS. Used for access from inside the cluster.

Headless Service

ClusterIP=None. The backend Pods do not need load balancing from the Service; the client picks a backend itself, or Pods use it to address each other directly and stably.

NodePort

Opens a port on every Node to expose the service externally; the service can be reached at NodeIP:NodePort.

LoadBalancer

LoadBalancer = NodePort + an external load balancer; used to expose the service externally.

Besides getting a cluster-internal IP and being exposed on a NodePort, the cluster asks the cloud provider for a load balancer that forwards traffic to NodeIP:NodePort on each node.

The load balancer can be provided by a cloud vendor or self-hosted (for example MetalLB).

ExternalName

An ExternalName Service is a special Service type in k8s.

It has no Endpoints and proxies no Pods; it simply maps the Service's domain name to another domain through a DNS CNAME.

It can point at a Service in a different namespace, or (provided CoreDNS can resolve external names) at a real domain outside the cluster (a sketch of a headless and an ExternalName Service follows this list):

  • for example mysql.db.svc, a mysql service living in the db namespace (cross-namespace access)
  • or a real external domain such as mysql.example.com (access across the public network)
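
Two small sketches of the less common types, a headless Service and an ExternalName Service, created with the Python client; the names, the selector, and the external domain are illustrative only:

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# headless Service: clusterIP is None, CoreDNS returns the Pod IPs directly
v1.create_namespaced_service(namespace="default", body={
    "apiVersion": "v1", "kind": "Service",
    "metadata": {"name": "myapp-headless"},
    "spec": {"clusterIP": "None",
             "selector": {"app": "myapp"},
             "ports": [{"port": 80, "targetPort": 80}]},
})

# ExternalName Service: no Endpoints, just a DNS CNAME to an outside domain
v1.create_namespaced_service(namespace="default", body={
    "apiVersion": "v1", "kind": "Service",
    "metadata": {"name": "mysql-external"},
    "spec": {"type": "ExternalName",
             "externalName": "mysql.example.com"},
})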

7. LabelSelector

Pods are dynamic, so their IPs and ports keep changing. How does a Service find its backend Pods? DNS handles service discovery, and the label selector is what associates the Service with its backend Pods.

  • Containers in a K8s cluster are rescheduled all the time, so their IP addresses change frequently, but Pod labels are stable; labels mark and categorize Pods.
  • Pod controllers and Services define label selectors, Pods define labels, and through the selectors the controllers and Services find and manage the matching Pods (see the sketch after this list).
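
A small sketch of label selectors with the Python client: list Pods with the same selector syntax a Service would use, then patch an extra label onto a Pod. The labels and the pod-demo name mirror the example at the end of this post:

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# select Pods by label, exactly like a Service or Pod controller does
for pod in v1.list_namespaced_pod(namespace="default",
                                  label_selector="app=myapp").items:
    print(pod.metadata.name, pod.status.pod_ip)

# add/overwrite a label on a running Pod (like `kubectl label --overwrite`)
v1.patch_namespaced_pod(name="pod-demo", namespace="default",
                        body={"metadata": {"labels": {"tier": "frontend"}}})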

8. DNS Addon

If a Service is deleted and recreated, its ClusterIP changes, so how do clients keep reaching the backend Pods through it? This is where DNS comes in.

In K8s, the DNS add-on provides two things: automatic service discovery and DNS resolution.

Whenever a Service resource in the cluster changes, the change is automatically reflected in CoreDNS's records.

Inside the cluster, client Pods reach server Pods through the Service's fully qualified name (nginx-deploy.default.svc.cluster.local.) rather than through the Service IP directly.

In detail:

To reach a Service, the client first asks CoreDNS to resolve the full Service name; CoreDNS returns the Service IP for that domain, and the client then connects to that Service IP.

For a headless Service, CoreDNS resolves the full Service name directly to the list of Endpoint IP addresses behind the Service.

Pod DNS policy

A Pod's DNS policy can be set when the Pod is created (a sketch follows the list):

  • ClusterFirst: the default policy; when the Pod resolves a name it asks the in-cluster CoreDNS first. The Pod's /etc/resolv.conf is automatically pointed at the kube-dns Service address.
  • None: Kubernetes ignores the cluster DNS settings; you must supply a dnsConfig field, otherwise the Pod may not be able to resolve anything.
  • Default: the Pod simply inherits the DNS configuration of the node it runs on.
  • ClusterFirstWithHostNet: forces the ClusterFirst policy for Pods running with hostNetwork (which would otherwise default to Default).
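
A sketch of the None policy with an explicit dnsConfig via the Python client; the nameserver is the kube-dns/CoreDNS Service IP (10.96.0.10) seen elsewhere in this post, and the Pod name and image are illustrative:

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# dnsPolicy=None ignores the cluster defaults and uses dnsConfig verbatim
v1.create_namespaced_pod(namespace="default", body={
    "apiVersion": "v1", "kind": "Pod",
    "metadata": {"name": "dns-policy-demo"},
    "spec": {
        "dnsPolicy": "None",
        "dnsConfig": {
            "nameservers": ["10.96.0.10"],
            "searches": ["default.svc.cluster.local", "svc.cluster.local"],
        },
        "containers": [{"name": "busybox", "image": "busybox:latest",
                        "command": ["/bin/sh", "-c", "sleep 86400"]}],
    },
})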

9. Exposing services outside the cluster

Through the Service network, K8s load-balances traffic to the Pod network and lets Pods reach each other inside the cluster.

So how does K8s expose the internal Service network to clients outside the cluster?

K8s can expose Services to the user side in the following ways.

NodePort

With a NodePort Service, kube-proxy opens a listening port on every Node and load-balances the traffic arriving at that port across the backend Pods bound to the Service.

LoadBalancer

NodePort + LoadBalancer

A Service of type LoadBalancer builds on NodePort and uses a cloud vendor's load balancer (for example Alibaba Cloud SLB) to spread user traffic across the NodePorts exposed on the Nodes.

Ingress

Ingress + LoadBalancer

Once the cluster grows large, managing NodePorts becomes a nightmare:

each port can serve only one service, and the port range is limited to 30000-32767.

Ingress needs just one NodePort or one load balancer to expose many Services.

Ingress, like Service, is a built-in K8s resource; it provides layer-7 (application-layer) load balancing.

The Ingress controller performs the layer-7 load balancing in k8s and can be deployed on the machines in several ways, for example NodePort or HostNetwork.

HostNetwork

The Pod shares the network namespace of the Node it runs on and uses the Node's IP address directly.

apiVersion: v1
kind: Pod
metadata:
  name: host-network-pod
  namespace: default
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  hostNetwork: true

HostPort

Maps a container port to a port on the Node where the Pod runs.

apiVersion: v1
kind: Pod
metadata:
  name: host-port-pod
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - name: nginx-web
      protocol: TCP
      containerPort: 80
      hostPort: 8080

11. Namespace

  • A K8s cluster can run thousands of Pods, and Pods can talk to each other directly over the network.
  • Namespaces partition the Pods in a cluster into logical groups for multi-tenant isolation (network isolation between namespaces additionally requires NetworkPolicy); a small sketch follows.
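
A minimal sketch with the Python client: create a namespace, then list only the Pods inside it (the namespace name is just an example):

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# create an isolated namespace and query Pods scoped to it
v1.create_namespace(body={"apiVersion": "v1", "kind": "Namespace",
                          "metadata": {"name": "zhanggen"}})
for pod in v1.list_namespaced_pod(namespace="zhanggen").items:
    print(pod.metadata.name)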

12. Pod creation workflow

Creating a single Pod on its own works like this:
1. The user submits the Pod definition to the apiserver via kubectl or another API client.
2. The apiserver authenticates and authorizes the client, builds the Pod object, stores it in etcd, and returns a confirmation to the client.
3. The apiserver then reflects the change to the Pod object in etcd; the other components track changes on the apiserver through the watch mechanism.
4. The scheduler notices a new Pod that has no node assigned, runs its scheduling algorithm to pick the best node, and writes the result back to the apiserver.
5. The kubelet on that node sees, through its watch, that a Pod has been scheduled to it, calls docker to start the containers, and reports the result back to the apiserver.
6. The apiserver stores the Pod's status information in etcd.

Creating multiple Pods through a Deployment:
1. The user submits a request to create a Deployment with kubectl create or kubectl apply.
2. The api-server authenticates the request (the client's identity comes from the credentials under ~/.kube), authorizes the user, accepts the operation, persists the Deployment to etcd, and returns a confirmation to the client.
3. The apiserver reflects the newly created object from etcd, and the other components follow the change through the watch mechanism.
4. The controller-manager watches the api-server. It hosts many controller types; the Deployment Controller sees the new Deployment and creates a ReplicaSet for it. The ReplicaSet Controller in turn sees the new ReplicaSet, reads its template, finds that the desired Pods do not exist, and creates them. A freshly created Pod has an empty nodeName, meaning it has not been scheduled yet.
5. The Scheduler, also watching the apiserver, finds the unscheduled Pods. Based on its algorithms, the nodes' resources, and the Pod's affinity/anti-affinity rules, it picks a set of candidate nodes, chooses the best one, binds the Pod to that node, and reports back to the api-server.
6. The kubelet on each Node, again via watch, notices a Pod scheduled to its own Node. If the Pod does not exist locally it creates it: if no external storage is needed this is essentially a docker run, except the container network is wired up through a CNI plugin rather than docker's own network; if external storage is needed, CSI is called to mount it. When done, kubelet reports back to the api-server, which writes the Pod info into etcd.
7. Once the Pod is up, the ReplicaSet Controller keeps watching it; if the Pod dies unexpectedly or is deleted by hand, the controller notices and creates a new Pod to keep the replica count at the desired value.

A small watch sketch that lets you observe this flow is shown below.
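
To observe this flow, the sketch below streams Pod events with the Python client and prints each Pod's nodeName and phase; right after a Deployment is created you can watch Pods appear with an empty node, get bound by the scheduler, and move to Running (namespace and timeout are arbitrary):

from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod, namespace="default",
                      timeout_seconds=120):
    pod = event["object"]
    # nodeName stays empty until the scheduler binds the Pod to a node
    print(event["type"], pod.metadata.name,
          "node:", pod.spec.node_name, "phase:", pod.status.phase)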

IV. Building a Kubernetes Cluster

There are three main ways to set up Kubernetes:

  • Minikube: a tool that builds a single-node Kubernetes cluster (good for developers learning K8s);
  • kubeadm: a tool that bootstraps a Kubernetes cluster quickly (good for operators);
  • Binary installation: download each component from the official releases and install them one by one; this is the best way to understand the components (and common in production).

The idea behind the kubeadm deployment tool is simple: containerize most of the K8s cluster components and run them as static Pods.

So who creates the static Pods for those components?

Besides following instructions from the Master, the kubelet can also create Pods directly from local manifest files without going through any controller; such Pods are called static Pods, and this is exactly how kubeadm works (it drops the component manifests into /etc/kubernetes/manifests).

Based on that, building a K8s cluster with kubeadm roughly comes down to:

  • downloading the images of the cluster components (kube-apiserver, kube-scheduler, kube-controller-manager, kube-proxy, Flannel)
  • installing the docker, kubelet, kubectl and kubeadm programs
  • letting kubeadm drive the kubelet to run the cluster component images (kube-apiserver, kube-scheduler, kube-controller-manager, kube-proxy, Flannel) as Pods
[root@master net.d]# kubectl get ns
NAME              STATUS   AGE
default           Active   20h
kube-flannel      Active   20h
kube-node-lease   Active   20h
kube-public       Active   20h
kube-system       Active   20h
[root@master net.d]# kubectl get nodes -n kube-system 
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   20h   v1.17.4
node1    Ready    <none>   20h   v1.17.4
[root@master net.d]# 

kubeadm greatly simplifies writing the cluster configuration and generating the CA certificates by hand.

The three kubeadm functions we use most are:

  • init: initialize the Master node of a K8s cluster

  • join: join the current worker node to the K8s cluster

  • reset: best-effort rollback of the changes that init or join made to this host

1. Preparing the hosts

Prepare an odd number of CentOS 7 virtual machines (three here) to build a complete K8s cluster:

Role     IP address       Components
master   192.168.56.18    docker, kubectl, kubeadm, kubelet
node1    192.168.56.19    docker, kubectl, kubeadm, kubelet
node2    192.168.56.20    docker, kubectl, kubeadm, kubelet

1.1. Check the OS version

Installing K8s with kubeadm requires CentOS 7.5 or later:

[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core) 

1.2. Hostname resolution

vim /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.18 master
192.168.56.19 node1
192.168.56.20 node2

1.3. Time synchronization

[root@master zhanggen]# timedatectl set-timezone Asia/Shanghai
[root@master zhanggen]# ntpdate ntp1.aliyun.com

1.4. Disable firewalld and iptables

[root@master zhanggen]# systemctl stop firewalld
[root@master zhanggen]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@master zhanggen]# systemctl disable iptables
Failed to execute operation: No such file or directory
[root@master zhanggen]# systemctl stop iptables
Failed to stop iptables.service: Unit iptables.service not loaded.
[root@master zhanggen]# 

1.5. Disable SELinux

[root@localhost zhanggen]# cat /etc/selinux/config 

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected. 
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted 
Check it:

[root@node1 zhanggen]# getenforce
Disabled
[root@node1 zhanggen]#

1.6. Disable the swap partition

vim /etc/fstab

#
# /etc/fstab
# Created by anaconda on Wed Jun 15 18:12:38 2022
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=eae1aeba-8612-48a4-9d76-fbd933efdd47 /                       xfs     defaults        0 0
UUID=674b7ccf-9765-47dd-b43f-c658c19dbd9b /boot                   xfs     defaults        0 0
#UUID=fedfdf53-e6a3-48e1-b280-db3f2fa44809 swap                    swap    defaults 

1.7. Kernel parameters for Kubernetes

cat <<EOF> kubernetes.conf 
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
net.ipv4.tcp_tw_recycle=0
vm.swappiness=0 # do not use swap; only allow it when the system is OOM
vm.overcommit_memory=1 # do not check whether enough physical memory is available
vm.panic_on_oom=0 # do not panic on OOM (let the OOM killer handle it)
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
net.ipv6.conf.all.disable_ipv6=1
net.netfilter.nf_conntrack_max=2310720
EOF

cp kubernetes.conf /etc/sysctl.d/kubernetes.conf

Load the kernel modules and apply the sysctl settings:

[root@master netfilter]# modprobe br_netfilter
[root@master netfilter]# modprobe nf_conntrack
[root@master netfilter]# sysctl -p /etc/sysctl.d/kubernetes.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv4.tcp_tw_recycle = 0
vm.swappiness = 0
vm.overcommit_memory = 1
vm.panic_on_oom = 0
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 1048576
fs.file-max = 52706963
fs.nr_open = 52706963
net.ipv6.conf.all.disable_ipv6 = 1
net.netfilter.nf_conntrack_max = 2310720
[root@master netfilter]# 

1.8. Configure rsyslogd and systemd-journald

[root@master netfilter]#  lsmod |grep  br_netfilter
br_netfilter           22256  0 
bridge                151336  1 br_netfilter
[root@master netfilter]# mkdir /var/log/journal 
[root@master netfilter]# mkdir /etc/systemd/journald.conf.d
[root@master netfilter]# cat > /etc/systemd/journald.conf.d/99-prophet.conf <<EOF
> [Journal]
> # persist logs to disk
> Storage=persistent
> 
> # compress historical logs
> Compress=yes
> 
> SyncIntervalSec=5m
> RateLimitInterval=30s
> RateLimitBurst=1000
> 
> # cap total disk usage at 10G
> SystemMaxUse=10G
> 
> # cap a single log file at 200M
> SystemMaxFileSize=200M
> 
> # keep logs for 2 weeks
> MaxRetentionSec=2week
> 
> # do not forward logs to syslog
> ForwardToSyslog=no
> EOF
[root@master netfilter]# systemctl restart systemd-journald
[root@master netfilter]# 

1.9. Prerequisites for enabling IPVS in kube-proxy

Kubernetes Services have two proxy implementations, iptables and IPVS; IPVS performs better than iptables, so load the IPVS modules into the kernel:

 yum install ipset ipvsadm -y
[root@node1 zhanggen]# cat <<EOF> /etc/sysconfig/modules/ipvs.modules 
> #!/bin/bash
> modprobe -- ip_vs
> modprobe -- ip_vs_rr
> modprobe -- ip_vs_wrr
> modprobe -- ip_vs_sh
> modprobe -- nf_conntrack_ipv4
> EOF
[root@node1 zhanggen]# chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
ip_vs_sh               12688  0 
ip_vs_wrr              12697  0 
ip_vs_rr               12600  0 
ip_vs                 145497  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack_ipv4      15053  2 
nf_defrag_ipv4         12729  1 nf_conntrack_ipv4
nf_conntrack          133095  6 ip_vs,nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4
libcrc32c              12644  4 xfs,ip_vs,nf_nat,nf_conntrack
[root@node1 zhanggen]# 

2. Installing the K8s components with kubeadm

kubeadm does not install or manage kubelet or kubectl for you,

so make sure the versions of kubelet, kubectl, and kubeadm match the control-plane version you are installing.

2.1. Install Docker

yum install -y yum-utils device-mapper-persistent-data lvm2

yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

yum install -y docker-ce

## create the /etc/docker directory
mkdir /etc/docker

cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
}
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
# restart the docker service
systemctl daemon-reload && systemctl restart docker && systemctl enable docker

Check the docker repo file:

[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg

[docker-ce-stable-debuginfo]
name=Docker CE Stable - Debuginfo $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/debug-$basearch/stable
enabled=0
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg

[docker-ce-stable-source]
name=Docker CE Stable - Sources
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/source/stable
enabled=0
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg

[docker-ce-test]
name=Docker CE Test - $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/$basearch/test
enabled=0
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg

[docker-ce-test-debuginfo]
name=Docker CE Test - Debuginfo $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/debug-$basearch/test
enabled=0
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg

[docker-ce-test-source]
name=Docker CE Test - Sources
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/source/test
enabled=0
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg

[docker-ce-nightly]
name=Docker CE Nightly - $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/$basearch/nightly
enabled=0
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg

[docker-ce-nightly-debuginfo]
name=Docker CE Nightly - Debuginfo $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/debug-$basearch/nightly
enabled=0
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg

[docker-ce-nightly-source]
name=Docker CE Nightly - Sources
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/source/nightly
enabled=0
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg
/etc/yum.repos.d/docker-ce.repo

2.2. Install kubeadm, kubelet, and kubectl

  • kubelet is the node agent service; every node relies on it to drive the underlying container runtime and create Pods;

  • kubectl is the command-line client we use to issue commands and manage k8s resources;

  • kubeadm is the bootstrap tool used to set up and manage the cluster's nodes.

2.2.1. Configure the Kubernetes yum repo

/etc/yum.repos.d/kubernetes.repo

[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
            http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg

2.2.2. Install kubeadm, kubelet, kubectl

Make sure kubeadm, kubelet, and kubectl are all the same version (1.17.4-0 here):

[root@master ~]# yum install --setopt=obsoletes=0 kubeadm-1.17.4-0 kubelet-1.17.4-0 kubectl-1.17.4-0 -y

Check which files yum installed:

[root@master net.d]# rpm -ql kubelet
/etc/kubernetes/manifests  
/etc/sysconfig/kubelet
/usr/bin/kubelet
/usr/lib/systemd/system/kubelet.service
[root@master net.d]# rpm -ql kubeadm
/usr/bin/kubeadm
/usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
[root@master net.d]# 

2.2.3. Configure kubelet

/etc/sysconfig/kubelet

KUBELET_CGROUP_ARGS="--cgroup-driver=systemd"
KUBE_PROXY_MODE="ipvs"

2.2.4. Enable kubelet at boot

[root@master ~]# systemctl enable kubelet

2.2.5. Manually pull the component images kubeadm needs

The component images kubeadm needs sit in registries that are unreachable from inside China for network reasons, so here is a workaround:

[root@master net.d]# kubeadm -h
Available Commands:
  alpha       Kubeadm experimental sub-commands
  completion  Output shell completion code for the specified shell (bash or zsh)
  config      Manage configuration for a kubeadm cluster persisted in a ConfigMap in the cluster
  help        Help about any command
  init        Run this command in order to set up the Kubernetes control plane
  join        Run this on any machine you wish to join an existing cluster
  reset       Performs a best effort revert of changes made to this host by 'kubeadm init' or 'kubeadm join'
  token       Manage bootstrap tokens
  upgrade     Upgrade your cluster smoothly to a newer version with this command
  version     Print the version of kubeadm

Flags:
      --add-dir-header           If true, adds the file directory to the header
  -h, --help                     help for kubeadm
      --log-file string          If non-empty, use this log file
      --log-file-max-size uint   Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
      --rootfs string            [EXPERIMENTAL] The path to the 'real' host root filesystem.
      --skip-headers             If true, avoid header prefixes in the log messages
      --skip-log-headers         If true, avoid headers when opening log files
  -v, --v Level                  number for the log level verbosity

Check the default parameters kubeadm init loads:

[root@master net.d]# kubeadm config print init-defaults 
W1205 09:27:15.918128  130084 validation.go:28] Cannot validate kube-proxy config - no validator is available
W1205 09:27:15.918569  130084 validation.go:28] Cannot validate kubelet config - no validator is available
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io    # registry the images are pulled from
kind: ClusterConfiguration
kubernetesVersion: v1.17.0     # must match the version installed via yum; otherwise specify it manually during init
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12  # default Service network
scheduler: {}

Pull the component images kubeadm needs from the Aliyun mirror in advance.

[root@master ~]# kubeadm config images list    # show the images kubeadm depends on

images=( kube-apiserver:v1.17.4 kube-controller-manager:v1.17.4 kube-scheduler:v1.17.4 kube-proxy:v1.17.4 pause:3.1 etcd:3.4.3-0 coredns:1.6.5 )
for imageName in ${images[@]}; do
  docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName
  docker tag  registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName
  docker rmi  registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName
done

Images ready:

[root@master yum.repos.d]# docker images
REPOSITORY                           TAG       IMAGE ID       CREATED       SIZE
k8s.gcr.io/kube-proxy                v1.17.4   6dec7cfde1e5   2 years ago   116MB
k8s.gcr.io/kube-apiserver            v1.17.4   2e1ba57fe95a   2 years ago   171MB
k8s.gcr.io/kube-controller-manager   v1.17.4   7f997fcf3e94   2 years ago   161MB
k8s.gcr.io/kube-scheduler            v1.17.4   5db16c1c7aff   2 years ago   94.4MB
k8s.gcr.io/coredns                   1.6.5     70f311871ae1   3 years ago   41.6MB
k8s.gcr.io/etcd                      3.4.3-0   303ce5db0e90   3 years ago   288MB
k8s.gcr.io/pause                     3.1       da86e6ba6ca1   4 years ago   742kB
[root@master yum.repos.d]# 

Check the installed k8s version

The kubectl version command prints both client and server version info;

Client is the version of kubectl itself,

Server is the k8s version running on the master node.

[root@master chapter5]# kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T21:03:42Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
[root@master chapter5]# 

2.3. Initialize the Kubernetes Master node

First rehearse kubeadm init with --dry-run (the errors below come from a host that already runs a cluster; on a clean host the preflight checks pass):

[root@master net.d]# kubeadm init  --apiserver-advertise-address=192.168.56.18 --dry-run
I1205 09:45:45.183662    9203 version.go:251] remote version is much newer: v1.25.4; falling back to: stable-1.17
W1205 09:45:46.092452    9203 validation.go:28] Cannot validate kube-proxy config - no validator is available
W1205 09:45:46.092475    9203 validation.go:28] Cannot validate kubelet config - no validator is available
[init] Using Kubernetes version: v1.17.17
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.21. Latest validated version: 19.03
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR Port-6443]: Port 6443 is in use
    [ERROR Port-10259]: Port 10259 is in use
    [ERROR Port-10257]: Port 10257 is in use
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR Port-2379]: Port 2379 is in use
    [ERROR Port-2380]: Port 2380 is in use
    [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Initializing the Master node means starting, as containers, all the components that run on a Kubernetes Master:

kubeadm init --apiserver-advertise-address=192.168.56.18 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version=v1.17.4 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16

Set up the kubeconfig so kubectl can authenticate to the newly created cluster:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

Generate the token and CA cert hash a Node needs to join the cluster:

kubeadm token create --print-join-command

Join the Node to the K8s cluster:

[root@node1 yum.repos.d]# kubeadm join 192.168.56.18:6443 --token 7npygu.rc2yrxy5i4fekbtf \
                      --discovery-token-ca-cert-hash sha256:a97ae99ed9d6ddf8659a574e24da69ffda0dd3a8c283d31832a8eb2b53a282a5 

The nodes only become Ready after a network plugin is installed in the cluster, so next we deploy one.

2.4. Deploy the Calico network plugin

A K8s cluster involves the following three networks:

  • Node network: the network of the hosts that run the Pods; the interface for communicating with the outside world.
  • Service network: Service IPs only ever appear in iptables/IPVS rules, never on a network interface, so they cannot be pinged (no network stack answers); they are only used to route and schedule traffic destined for a Service IP to concrete Pods.
  • Pod network: created by the CNI plugin using SDN; it dynamically assigns IPs to the ever-changing Pods and is pingable inside the cluster; because Pods are ephemeral, the Service network is used as the stable front end.

When deploying Calico, the address ranges must match what was passed to kubeadm init:

  • Pod network: 10.244.0.0/16
  • Service network: 10.96.0.0/12
  • Node network: the physical machines' network
---
# Source: calico/templates/calico-config.yaml
# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Typha is disabled.
  typha_service_name: "none"
  # Configure the backend to use.
  calico_backend: "bird"

  # Configure the MTU to use
  veth_mtu: "1440"

  # The CNI network configuration to install on each node.  The special
  # values in this config will be automatically populated.
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": __CNI_MTU__,
          "ipam": {
              "type": "calico-ipam"
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        }
      ]
    }

---
# Source: calico/templates/kdd-crds.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: felixconfigurations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: FelixConfiguration
    plural: felixconfigurations
    singular: felixconfiguration
---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ipamblocks.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: IPAMBlock
    plural: ipamblocks
    singular: ipamblock

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: blockaffinities.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BlockAffinity
    plural: blockaffinities
    singular: blockaffinity

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ipamhandles.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: IPAMHandle
    plural: ipamhandles
    singular: ipamhandle

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ipamconfigs.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: IPAMConfig
    plural: ipamconfigs
    singular: ipamconfig

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgppeers.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BGPPeer
    plural: bgppeers
    singular: bgppeer

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgpconfigurations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BGPConfiguration
    plural: bgpconfigurations
    singular: bgpconfiguration

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ippools.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: IPPool
    plural: ippools
    singular: ippool

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: hostendpoints.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: HostEndpoint
    plural: hostendpoints
    singular: hostendpoint

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: clusterinformations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: ClusterInformation
    plural: clusterinformations
    singular: clusterinformation

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: globalnetworkpolicies.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: GlobalNetworkPolicy
    plural: globalnetworkpolicies
    singular: globalnetworkpolicy

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: globalnetworksets.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: GlobalNetworkSet
    plural: globalnetworksets
    singular: globalnetworkset

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: networkpolicies.crd.projectcalico.org
spec:
  scope: Namespaced
  group: crd.projectcalico.org
  version: v1
  names:
    kind: NetworkPolicy
    plural: networkpolicies
    singular: networkpolicy

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: networksets.crd.projectcalico.org
spec:
  scope: Namespaced
  group: crd.projectcalico.org
  version: v1
  names:
    kind: NetworkSet
    plural: networksets
    singular: networkset
---
# Source: calico/templates/rbac.yaml

# Include a clusterrole for the kube-controllers component,
# and bind it to the calico-kube-controllers serviceaccount.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-kube-controllers
rules:
  # Nodes are watched to monitor for deletions.
  - apiGroups: [""]
    resources:
      - nodes
    verbs:
      - watch
      - list
      - get
  # Pods are queried to check for existence.
  - apiGroups: [""]
    resources:
      - pods
    verbs:
      - get
  # IPAM resources are manipulated when nodes are deleted.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - ippools
    verbs:
      - list
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - blockaffinities
      - ipamblocks
      - ipamhandles
    verbs:
      - get
      - list
      - create
      - update
      - delete
  # Needs access to update clusterinformations.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - clusterinformations
    verbs:
      - get
      - create
      - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-kube-controllers
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-kube-controllers
subjects:
- kind: ServiceAccount
  name: calico-kube-controllers
  namespace: kube-system
---
# Include a clusterrole for the calico-node DaemonSet,
# and bind it to the calico-node serviceaccount.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-node
rules:
  # The CNI plugin needs to get pods, nodes, and namespaces.
  - apiGroups: [""]
    resources:
      - pods
      - nodes
      - namespaces
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - endpoints
      - services
    verbs:
      # Used to discover service IPs for advertisement.
      - watch
      - list
      # Used to discover Typhas.
      - get
  - apiGroups: [""]
    resources:
      - nodes/status
    verbs:
      # Needed for clearing NodeNetworkUnavailable flag.
      - patch
      # Calico stores some configuration information in node annotations.
      - update
  # Watch for changes to Kubernetes NetworkPolicies.
  - apiGroups: ["networking.k8s.io"]
    resources:
      - networkpolicies
    verbs:
      - watch
      - list
  # Used by Calico for policy information.
  - apiGroups: [""]
    resources:
      - pods
      - namespaces
      - serviceaccounts
    verbs:
      - list
      - watch
  # The CNI plugin patches pods/status.
  - apiGroups: [""]
    resources:
      - pods/status
    verbs:
      - patch
  # Calico monitors various CRDs for config.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - globalfelixconfigs
      - felixconfigurations
      - bgppeers
      - globalbgpconfigs
      - bgpconfigurations
      - ippools
      - ipamblocks
      - globalnetworkpolicies
      - globalnetworksets
      - networkpolicies
      - networksets
      - clusterinformations
      - hostendpoints
    verbs:
      - get
      - list
      - watch
  # Calico must create and update some CRDs on startup.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - ippools
      - felixconfigurations
      - clusterinformations
    verbs:
      - create
      - update
  # Calico stores some configuration information on the node.
  - apiGroups: [""]
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  # These permissions are only requried for upgrade from v2.6, and can
  # be removed after upgrade or on fresh installations.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - bgpconfigurations
      - bgppeers
    verbs:
      - create
      - update
  # These permissions are required for Calico CNI to perform IPAM allocations.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - blockaffinities
      - ipamblocks
      - ipamhandles
    verbs:
      - get
      - list
      - create
      - update
      - delete
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - ipamconfigs
    verbs:
      - get
  # Block affinities must also be watchable by confd for route aggregation.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - blockaffinities
    verbs:
      - watch
  # The Calico IPAM migration needs to get daemonsets. These permissions can be
  # removed if not upgrading from an installation using host-local IPAM.
  - apiGroups: ["apps"]
    resources:
      - daemonsets
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-node
subjects:
- kind: ServiceAccount
  name: calico-node
  namespace: kube-system

---
# Source: calico/templates/calico-node.yaml
# This manifest installs the calico-node container, as well
# as the CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: calico-node
  namespace: kube-system
  labels:
    k8s-app: calico-node
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: calico-node
      annotations:
        # This, along with the CriticalAddonsOnly toleration below,
        # marks the pod as a critical add-on, ensuring it gets
        # priority scheduling and that its resources are reserved
        # if it ever gets evicted.
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      nodeSelector:
        beta.kubernetes.io/os: linux
      hostNetwork: true
      tolerations:
        # Make sure calico-node gets scheduled on all nodes.
        - effect: NoSchedule
          operator: Exists
        # Mark the pod as a critical add-on for rescheduling.
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoExecute
          operator: Exists
      serviceAccountName: calico-node
      # Minimize downtime during a rolling upgrade or deletion; tell Kubernetes to do a "force
      # deletion": https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
      terminationGracePeriodSeconds: 0
      priorityClassName: system-node-critical
      initContainers:
        # This container performs upgrade from host-local IPAM to calico-ipam.
        # It can be deleted if this is a fresh installation, or if you have already
        # upgraded to use calico-ipam.
        - name: upgrade-ipam
          image: calico/cni:v3.8.9
          command: ["/opt/cni/bin/calico-ipam", "-upgrade"]
          env:
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
          volumeMounts:
            - mountPath: /var/lib/cni/networks
              name: host-local-net-dir
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
          securityContext:
            privileged: true
        # This container installs the CNI binaries
        # and CNI network config file on each node.
        - name: install-cni
          image: calico/cni:v3.8.9
          command: ["/install-cni.sh"]
          env:
            # Name of the CNI config file to create.
            - name: CNI_CONF_NAME
              value: "10-calico.conflist"
            # The CNI network config to install on each node.
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: cni_network_config
            # Set the hostname based on the k8s node name.
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # CNI MTU Config variable
            - name: CNI_MTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # Prevents the container from sleeping forever.
            - name: SLEEP
              value: "false"
          volumeMounts:
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
          securityContext:
            privileged: true
        # Adds a Flex Volume Driver that creates a per-pod Unix Domain Socket to allow Dikastes
        # to communicate with Felix over the Policy Sync API.
        - name: flexvol-driver
          image: calico/pod2daemon-flexvol:v3.8.9
          volumeMounts:
          - name: flexvol-driver-host
            mountPath: /host/driver
          securityContext:
            privileged: true
      containers:
        # Runs calico-node container on each Kubernetes node.  This
        # container programs network policy and routes on each
        # host.
        - name: calico-node
          image: calico/node:v3.8.9
          env:
            # Use Kubernetes API as the backing datastore.
            - name: DATASTORE_TYPE
              value: "kubernetes"
            # Wait for the datastore.
            - name: WAIT_FOR_DATASTORE
              value: "true"
            # Set based on the k8s node name.
            - name: NODENAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # Choose the backend to use.
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
            # Cluster type to identify the deployment type
            - name: CLUSTER_TYPE
              value: "k8s,bgp"
            # Auto-detect the BGP IP address.
            - name: IP
              value: "autodetect"
            # Enable IPIP
            - name: CALICO_IPV4POOL_IPIP
              value: "Always"
            # Set MTU for tunnel device used if ipip is enabled
            - name: FELIX_IPINIPMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # The default IPv4 pool to create on startup if none exists. Pod IPs will be
            # chosen from this range. Changing this value after installation will have
            # no effect. This should fall within `--cluster-cidr`.
            - name: CALICO_IPV4POOL_CIDR
              value: "10.244.0.0/16"
            - name: IP_AUTODETECTION_METHOD
              value: "interface=ens.*"
            # Disable file logging so `kubectl logs` works.
            - name: CALICO_DISABLE_FILE_LOGGING
              value: "true"
            # Set Felix endpoint to host default action to ACCEPT.
            - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
              value: "ACCEPT"
            # Disable IPv6 on Kubernetes.
            - name: FELIX_IPV6SUPPORT
              value: "false"
            # Set Felix logging to "info"
            - name: FELIX_LOGSEVERITYSCREEN
              value: "info"
            - name: FELIX_HEALTHENABLED
              value: "true"
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: 250m
          livenessProbe:
            exec:
              command:
                - /bin/calico-node
                - -felix-live
                - -bird-live
            periodSeconds: 10
            initialDelaySeconds: 10
            failureThreshold: 6
          readinessProbe:
            exec:
              command:
              - /bin/calico-node
              - -bird-ready
              - -felix-ready
            periodSeconds: 10
          volumeMounts:
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /run/xtables.lock
              name: xtables-lock
              readOnly: false
            - mountPath: /var/run/calico
              name: var-run-calico
              readOnly: false
            - mountPath: /var/lib/calico
              name: var-lib-calico
              readOnly: false
            - name: policysync
              mountPath: /var/run/nodeagent
      volumes:
        # Used by calico-node.
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: var-run-calico
          hostPath:
            path: /var/run/calico
        - name: var-lib-calico
          hostPath:
            path: /var/lib/calico
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
        # Used to install CNI.
        - name: cni-bin-dir
          hostPath:
            path: /opt/cni/bin
        - name: cni-net-dir
          hostPath:
            path: /etc/cni/net.d
        # Mount in the directory for host-local IPAM allocations. This is
        # used when upgrading from host-local to calico-ipam, and can be removed
        # if not using the upgrade-ipam init container.
        - name: host-local-net-dir
          hostPath:
            path: /var/lib/cni/networks
        # Used to create per-pod Unix Domain Sockets
        - name: policysync
          hostPath:
            type: DirectoryOrCreate
            path: /var/run/nodeagent
        # Used to install Flex Volume Driver
        - name: flexvol-driver-host
          hostPath:
            type: DirectoryOrCreate
            path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-node
  namespace: kube-system

---
# Source: calico/templates/calico-kube-controllers.yaml

# See https://github.com/projectcalico/kube-controllers
apiVersion: apps/v1
kind: Deployment
metadata:
  name: calico-kube-controllers
  namespace: kube-system
  labels:
    k8s-app: calico-kube-controllers
spec:
  # The controllers can only have a single active instance.
  replicas: 1
  selector:
    matchLabels:
      k8s-app: calico-kube-controllers
  strategy:
    type: Recreate
  template:
    metadata:
      name: calico-kube-controllers
      namespace: kube-system
      labels:
        k8s-app: calico-kube-controllers
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      nodeSelector:
        beta.kubernetes.io/os: linux
      tolerations:
        # Mark the pod as a critical add-on for rescheduling.
        - key: CriticalAddonsOnly
          operator: Exists
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      serviceAccountName: calico-kube-controllers
      priorityClassName: system-cluster-critical
      containers:
        - name: calico-kube-controllers
          image: calico/kube-controllers:v3.8.9
          env:
            # Choose which controllers to run.
            - name: ENABLED_CONTROLLERS
              value: node
            - name: DATASTORE_TYPE
              value: kubernetes
          readinessProbe:
            exec:
              command:
              - /usr/bin/check-status
              - -r

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-kube-controllers
  namespace: kube-system
---
# Source: calico/templates/calico-etcd-secrets.yaml

---
# Source: calico/templates/calico-typha.yaml

---
# Source: calico/templates/configure-canal.yaml
calico.yaml

Apply the manifest to deploy Calico:

kubectl apply -f calico.yaml

If the nodes stay NotReady, check the kubelet service logs with:

journalctl -f -u kubelet.service

2.5. Viewing Pod logs

kubectl logs myapp-5c6976696c-tggsq -n default -f --tail=1   # if the pod has more than one container, pick one with -c
kubectl logs pod-zhanggen -n default -c myapp

3. Testing the Kubernetes cluster

Verify that the k8s components, the CoreDNS add-on, and the Calico add-on all provide service normally;

verify how Services, Pod controllers, and Pod networking work.

1. Create a Pod controller and a Service

Create a Pod controller (Deployment) named nginx-deploy; the controller creates the Pod automatically.

Create a Service with the same name, nginx-deploy; kubectl gives it the selector app=nginx-deploy, which matches the Pods created by the Deployment, so the two are associated automatically.

[root@master ~]# kubectl create deploy nginx-deploy --image=nginx:1.14-alpine
deployment.apps/nginx-deploy created
[root@master ~]# kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
nginx-deploy-75d47d8b7-brtlm   1/1     Running   0          2m11s
[root@master ~]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP                NODE    NOMINATED NODE   READINESS GATES
nginx-deploy-75d47d8b7-brtlm   1/1     Running   0          2m32s   192.168.166.130   node1   <none>           <none>
[root@master ~]# kubectl create svc clusterip nginx-deploy --tcp=80:80
service/nginx-deploy created

2. Test the core k8s features

Access the backend Pods through the Service's IP address;

access the backend Pods through the Service's full domain name.

[root@master ~]# kubectl describe svc/nginx-deploy
Name:              nginx-deploy
Namespace:         default
Labels:            app=nginx-deploy
Annotations:       <none>
Selector:          app=nginx-deploy
Type:              ClusterIP
IP:                10.98.82.22          # the Service IP
Port:              80-80  80/TCP
TargetPort:        80/TCP
Endpoints:         10.244.166.133:80   # IP of the Pod behind the Service
Session Affinity:  None
Events:            <none>
[root@master ~]# kubectl get pod  -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP               NODE    NOMINATED NODE   READINESS GATES
nginx-deploy-75d47d8b7-z29gn   1/1     Running   0          8m41s   10.244.166.133   node1   <none>           <none>

#1. Access the backend Pods through the Service IP
[root@master ~]# curl 10.98.82.22
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
#2. Access the backend Pods through the Service's full domain name
[root@master ~]# kubectl get svc -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   8h
[root@master ~]# vim /etc/resolv.conf 
[root@master ~]# cat /etc/resolv.conf 
# Generated by NetworkManager
#nameserver 114.114.114.114
nameserver 10.96.0.10
[root@master ~]# curl nginx-deploy.default.svc.cluster.local.
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Scale the Pods out online, then test load balancing through the Service's domain name:

[root@master zhanggen]# kubectl scale --replicas=3 deployment myapp
deployment.apps/myapp scaled
[root@master zhanggen]# kubectl get pod -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP               NODE    NOMINATED NODE   READINESS GATES
myapp-5c6976696c-8vrd6         1/1     Running   0          5s    10.244.104.2     node2   <none>           <none>
myapp-5c6976696c-frg2p         1/1     Running   0          12m   10.244.166.134   node1   <none>           <none>
myapp-5c6976696c-tggsq         1/1     Running   0          5s    10.244.166.135   node1   <none>           <none>
nginx-deploy-75d47d8b7-z29gn   1/1     Running   0          14h   10.244.166.133   node1   <none>           <none>
[root@master zhanggen]# curl myapp.default.svc.cluster.local./hostname.html
myapp-5c6976696c-frg2p
[root@master zhanggen]# curl myapp.default.svc.cluster.local./hostname.html
myapp-5c6976696c-tggsq
[root@master zhanggen]# curl myapp.default.svc.cluster.local./hostname.html
myapp-5c6976696c-frg2p
[root@master zhanggen]# curl myapp.default.svc.cluster.local./hostname.html
myapp-5c6976696c-8vrd6
[root@master zhanggen]# curl myapp.default.svc.cluster.local./hostname.html
myapp-5c6976696c-frg2p
[root@master zhanggen]# curl myapp.default.svc.cluster.local./hostname.html
myapp-5c6976696c-frg2p
[root@master zhanggen]# curl myapp.default.svc.cluster.local./hostname.html
myapp-5c6976696c-8vrd6

Define a Pod that contains a busybox container:

apiVersion: v1
kind: Pod
metadata:
  name: pod-zhanggen
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 86400"]

Exec into the pod-zhanggen Pod and test the Pod network:

[root@master basic]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP               NODE    NOMINATED NODE   READINESS GATES
myapp-5c6976696c-8vrd6         1/1     Running   0          3h8m    10.244.104.2     node2   <none>           <none>
myapp-5c6976696c-frg2p         1/1     Running   0          3h21m   10.244.166.134   node1   <none>           <none>
myapp-5c6976696c-tggsq         1/1     Running   0          3h8m    10.244.166.135   node1   <none>           <none>
nginx-deploy-75d47d8b7-z29gn   1/1     Running   0          17h     10.244.166.133   node1   <none>           <none>
pod-zhanggen                   2/2     Running   0          18m     10.244.166.136   node1   <none>           <none>
[root@master basic]# kubectl exec -it pod-zhanggen -c busybox -n default /bin/sh
/ # ping 10.244.104.3
PING 10.244.104.3 (10.244.104.3): 56 data bytes
64 bytes from 10.244.104.3: seq=0 ttl=62 time=0.815 ms
64 bytes from 10.244.104.3: seq=1 ttl=62 time=0.522 ms
^C
--- 10.244.104.3 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.522/0.668/0.815 ms
/ # nslookup nginx-deploy.default.svc.cluster.local.
Server:        10.96.0.10
Address:    10.96.0.10:53

在同1个Pod中的多个容器共享同1个网络名称空间

/ # ifconfig 
eth0      Link encap:Ethernet  HWaddr 26:D3:5E:86:83:3C  
          inet addr:10.244.166.136  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1440  Metric:1
          RX packets:25 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:3059 (2.9 KiB)  TX bytes:1952 (1.9 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:22 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1912 (1.8 KiB)  TX bytes:1912 (1.8 KiB)

/ # netstat -tnl #在busybox容器中哪里开启的80端口?
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      
/ # ps -ef
PID   USER     TIME  COMMAND
    1 root      0:00 sleep 86400
   26 root      0:00 /bin/sh
   44 root      0:00 ps -ef
/ # wget -O - -q 127.0.0.1
Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>
/ # 

 

五、Kubernetes集群管理

显示管理

[root@master ~]# kubectl get ns/zhanggen -o wide
[root@master ~]# kubectl get ns/zhanggen -o yaml
#生成资源清单的模板
kubectl get pod myapp-5c6976696c-8vrd6 -o yaml > ./pod-deamon.yaml
kubectl get ns/zhanggen -o json

集群组件管理

[root@master ~]# kubectl get cm -n kube-system
NAME                                 DATA   AGE
calico-config                        4      6d19h
coredns                              1      7d2h
extension-apiserver-authentication   6      7d2h
kube-proxy                           2      7d2h
kubeadm-config                       2      7d2h
kubelet-config-1.17                  1      7d2h
[root@master ~]# kubectl edit cm kube-proxy -n kube-system

标签管理

借助标签+标签选择器,可以从K8s集群的众多Pod中筛选,定位到K8s管理员关注的特定k8s资源。

创建标签

可以在yaml文件的metadata.labels字段中定义标签,也可以通过kubectl创建和覆盖标签。

#新增标签
[root@master basic]# kubectl get pods --show-labels
NAME       READY   STATUS    RESTARTS   AGE     LABELS
pod-demo   1/1     Running   0          6h54m   app=pod-demo,rel=stable
[root@master basic]# kubectl label pods pod-demo tier=frontend
pod/pod-demo labeled
[root@master basic]# kubectl get pods --show-labels
NAME       READY   STATUS    RESTARTS   AGE     LABELS
pod-demo   1/1     Running   0          6h55m   app=pod-demo,rel=stable,tier=frontend
[root@master basic]# kubectl label pods pod-demo app=myapp
error: 'app' already has a value (pod-demo), and --overwrite is false
[root@master basic]# kubectl label pods pod-demo app=myapp --overwrite
pod/pod-demo labeled
[root@master basic]# 

标签查询

根据标签查询Pod资源

#查看标签的app=zhanggen的pod
[root@master basic]# kubectl get pods --show-labels -l app=zhanggen
No resources found in default namespace.
#查看标签的app!=zhanggen的pod
[root@master basic]# kubectl get pods --show-labels -l app!=zhanggen
NAME       READY   STATUS    RESTARTS   AGE     LABELS
pod-demo   1/1     Running   0          3m51s   app=pod-demo,rel=stable
#查看标签的值在某1范围内的
[root@master basic]# kubectl get pods --show-labels -l "app in (pod-demo)"
NAME       READY   STATUS    RESTARTS   AGE     LABELS
pod-demo   1/1     Running   0          4m40s   app=pod-demo,rel=stable
[root@master basic]# kubectl get pods --show-labels -l "app in (pod-demo)" -L app
NAME       READY   STATUS    RESTARTS   AGE     APP        LABELS
pod-demo   1/1     Running   0          5m19s   pod-demo   app=pod-demo,rel=stable
#查看标签的值不在某1范围内的
[root@master basic]# kubectl get pods --show-labels -l "app notin (pod-demo)" -L app
No resources found in default namespace.
[root@master basic]# kubectl get pods --show-labels -l "app notin (pod-demo)"
No resources found in default namespace.
#查看标签的键为app的Pod
[root@master basic]# kubectl get pods --show-labels -l app
NAME       READY   STATUS    RESTARTS   AGE     LABELS
pod-demo   1/1     Running   0          7m14s   app=pod-demo,rel=stable
[root@master basic]# kubectl get pods -l '!app'
No resources found in default namespace.
[root@master basic]# 

删除多个deployment

[root@master basic]# kubectl get deploy
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
myapp          3/3     3            3           2d
nginx-deploy   1/1     1            1           2d15h
[root@master basic]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP               NODE    NOMINATED NODE   READINESS GATES
myapp-5c6976696c-frg2p         1/1     Running   0          2d      10.244.166.134   node1   <none>           <none>
myapp-5c6976696c-tggsq         1/1     Running   0          47h     10.244.166.135   node1   <none>           <none>
myapp-5c6976696c-wrxq4         1/1     Running   0          7m31s   10.244.104.4     node2   <none>           <none>
nginx-deploy-75d47d8b7-z29gn   1/1     Running   0          2d14h   10.244.166.133   node1   <none>           <none>
[root@master basic]# kubectl delete deployments myapp nginx-deploy
deployment.apps "myapp" deleted
deployment.apps "nginx-deploy" deleted
[root@master basic]# 

 

六、Kubernetes资源管理

K8s的API是RESTful风格的,在RESTful风格的API中,资源类似OOP中的类,类中的方法就是HTTP所支持的固定方法(Get/Post/Put/Delete...),K8s有下列对象。

Kubernetes contains a number of abstractions that represent the state of your system: deployed containerized applications and workloads, their associated network and disk resources, and other information about what your cluster is doing.
These abstractions are represented by objects in the Kubernetes API.

  • The basic Kubernetes objects include Pod, Service, Namespace and Volume.
  • In addition, Kubernetes contains a number of higher-level abstractions called Controllers. Controllers build upon the basic objects, and provide additional functionality and convenience features.
  • Controllers include ReplicaSet, Deployment, DaemonSet, StatefulSet and Job.

 在K8s中我们可以在yaml文件中定义各种集群资源;

1.Pod资源管理

在k8s中Pod是核心资源,所有其他资源都是为Pod资源服务的;

Pod控制器资源是为了动态地创建、管理Pod资源;

Service和Ingress资源是为了实现Pod资源的网络访问;

1.1.Pod的生命周期

在1个Pod中通常包含以下3类容器,这3类容器的分工各有不同。

初始化容器

执行主容器运行之前所需的初始化工作,初始化容器运行在主容器启动之前,初始化容器串行运行完毕之后,主容器才会启动,否则主容器不启动;

主容器

执行重要任务,生命周期包含PostStartHook(启动后)、Runing(运行中)、PreStopHook(停止前)3个阶段,主容器和Sidecar容器同时运行,同时销毁;

主容器在运行的过程中可以设置3类探针以监控容器的启动、就绪、存活状态;

启动探针(startupProbe)

探测容器内的应用是否启动成功,在启动探针探测成功之前,其它类型的探针都会暂时处于禁用状态;启动探测失败时,容器会被杀掉并按重启策略处理,Pod的READY状态为NotReady;
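
下面是1个启动探针的最小定义示意(仅作示意,非本文实验环境的实际配置:镜像名myapp:v1、8080端口和/healthz路径均为假设;startupProbe字段在K8s 1.16才引入、1.18起默认启用,本文实验环境为1.17):

apiVersion: v1
kind: Pod
metadata:
  name: startup-demo
spec:
  containers:
  - name: app
    image: myapp:v1                   # 假设的镜像名,仅用于演示
    ports:
    - containerPort: 8080             # 假设应用监听8080端口
    startupProbe:                     # 启动探测成功之前,liveness/readiness探针不会生效
      httpGet:
        path: /healthz                # 假设的健康检查接口
        port: 8080
      failureThreshold: 30            # 最多允许 30*10 秒的启动时间
      periodSeconds: 10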

就绪探针(ReadinessProbe)

探测Pod是否进入READY状态,并做好接收请求的准备。如果探测失败Pod则会进入NOTREADY状态(READY为0/1)并且从所关联的Service资源的端点(Endpoints)中踢出,Service将不会再把访问请求转发给这个Pod

存活探针(LivenessProbe)

探测容器是否运行正常。如果探测失败,控制器会修改该Pod的资源状态进而触发Kubelet杀掉容器(不是Pod),容器会根据重启策略决定是否重启;

SideCar容器

执行辅助业务,生命周期包含ready和stop阶段,Sidecar和主容器同时运行,同时销毁;

1.1.1.健康状态检测

livenessProbe:在当前容器启动后,通过执行Shell命令、发送HTTP请求、连接Socket这3种方法,对容器的运行状态进行周期性的健康状态检测

如果当前容器的健康状态检测失败,当前容器就会被重启。

1.1.1.1.exec command
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness-exec
  name: liveness-exec
spec:
  containers:
  - name: liveness-demo
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - test
        - -e
        - /tmp/healthy

kubectl describe pod liveness-exec

Name:         liveness-exec
Namespace:    default
Priority:     0
Node:         node1/192.168.56.19
Start Time:   Sat, 10 Dec 2022 10:45:45 +0800
Labels:       test=liveness-exec
Annotations:  cni.projectcalico.org/podIP: 10.244.166.138/32
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"test":"liveness-exec"},"name":"liveness-exec","namespace":"default...
Status:       Running
IP:           10.244.166.138
IPs:
  IP:  10.244.166.138
Containers:
  liveness-demo:
    Container ID:  docker://8a653437e5326a55aacf20ce1b7544fc7bd352802fbc24fb4091a872d9445ae4
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:5acba83a746c7608ed544dc1533b87c737a0b0fb730301639a0179f9344b1678
    Port:          <none>
    Host Port:     <none>
    Args:
      /bin/sh
      -c
      touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sat, 10 Dec 2022 11:34:02 +0800
      Finished:     Sat, 10 Dec 2022 11:35:31 +0800
    Ready:          False
    Restart Count:  15
    Liveness:       exec [test -e /tmp/healthy] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-2wg5g (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-2wg5g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-2wg5g
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  <unknown>             default-scheduler  Successfully assigned default/liveness-exec to node1
  Normal   Created    47m (x3 over 51m)     kubelet, node1     Created container liveness-demo
  Normal   Started    47m (x3 over 51m)     kubelet, node1     Started container liveness-demo
  Normal   Killing    46m (x3 over 50m)     kubelet, node1     Container liveness-demo failed liveness probe, will be restarted
  Normal   Pulling    30m (x9 over 51m)     kubelet, node1     Pulling image "busybox"
  Normal   Pulled     19m (x12 over 51m)    kubelet, node1     Successfully pulled image "busybox"
  Warning  BackOff    9m15s (x90 over 39m)  kubelet, node1     Back-off restarting failed container
  Warning  Unhealthy  3m56s (x43 over 50m)  kubelet, node1     Liveness probe failed:
1.1.1.2.发送HTTP请求
deployment:
  spec:
    replicas: 2
    strategy:
      rollingUpdate:
        maxSurge: 0
        maxUnavailable: 1
    template:
      spec:
        nodeSelector:
          ailme: "a100"
        containers:
        - name: app
          command:
            - "python"
            - "serve.py"
          resources:
            limits:
              cpu: '2'
              memory: 2000Mi
            requests:
              cpu: '1'
              memory: 1000Mi
          env:
          - name: ENV
            value: cn
          volumeMounts:
          - mountPath: /mnt
            name: model
        - name: text-generation-inference-ailme
          image: 'qa-roc.apuscn.com/deploy_prod/nlp/text-generation-inference-ailme:sha-5a58226'
          imagePullPolicy: Always
          args: ["--model-id", "/data/ailme13b","--port","8080","--json-output","--sharded","false","--max-input-length", "2040","--max-total-tokens","2048"]
          ports:
          - containerPort: 8080
            protocol: TCP
          resources:
            requests:
              cpu: "2"
              memory: 10000Mi
            limits:
              cpu: "4"
              memory: 32000Mi
              tke.cloud.tencent.com/qgpu-core: "100"
          volumeMounts:
            - mountPath: /data
              name: model
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /info
              port: 8080
            initialDelaySeconds: 120
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
        - name: apusagent
        volumes:
        - name: model
          nfs:
            server: 10.56.9.229
            path: /
ingress:
service:
  spec:
    type: ClusterIP
wangzhe.yaml

测试

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness-demo
    image: nginx:1.14-alpine
    ports:
    - name: http
      containerPort: 80
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - -c
          - 'echo Healthy > /usr/share/nginx/html/healthz'
    livenessProbe:
      httpGet:
        path: /healthz
        port: http
        scheme: HTTP
      periodSeconds: 2
      failureThreshold: 2
      initialDelaySeconds: 3              

kubectl describe pod liveness-http

Containers:
  liveness-demo:
    Container ID:   docker://5455b4c3b1220617e5455b0ebf0d32aec39de58ce028f64b06c6eb3d4e03c939
    Image:          nginx:1.14-alpine
    Image ID:       docker-pullable://nginx@sha256:485b610fefec7ff6c463ced9623314a04ed67e3945b9c08d7e53a47f6d108dc7
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 10 Dec 2022 11:17:21 +0800
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:http/healthz delay=3s timeout=1s period=2s #success=1 #failure=2
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-2wg5g (ro)

1.1.2.就绪状态检测

ReadinessProbe用于检查容器中运行的应用程序是否准备就绪?是否可以负载均衡来自Service的流量?

1.1.2.1.exec command
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness-exec
  name: readiness-exec
spec:
  containers:
  - name: readiness-demo
    image: busybox
    args: ["/bin/sh", "-c", "while true; do rm -f /tmp/ready; sleep 30; touch /tmp/ready; sleep 300; done"]
    readinessProbe:
      exec:
        command: ["test", "-e", "/tmp/ready"]
      initialDelaySeconds: 5
      periodSeconds: 5                 

kubectl describe pod readiness-exec

Name:         readiness-exec
Namespace:    default
Priority:     0
Node:         node1/192.168.56.19
Start Time:   Sat, 10 Dec 2022 15:54:15 +0800
Labels:       test=readiness-exec
Annotations:  cni.projectcalico.org/podIP: 10.244.166.143/32
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"test":"readiness-exec"},"name":"readiness-exec","namespace":"defau...
Status:       Running
IP:           10.244.166.143
IPs:
  IP:  10.244.166.143
Containers:
  readiness-demo:
    Container ID:  docker://b476ef68b3eb3aa38851eb13c929a20e1c89ff67ff54845db82c15bad170d79f
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:5acba83a746c7608ed544dc1533b87c737a0b0fb730301639a0179f9344b1678
    Port:          <none>
    Host Port:     <none>
    Args:
      /bin/sh
      -c
      while true; do rm -f /tmp/ready; sleep 30; touch /tmp/ready; sleep 300; done
    State:          Running
      Started:      Sat, 10 Dec 2022 15:54:31 +0800
    Ready:          True
    Restart Count:  0
    Readiness:      exec [test -e /tmp/ready] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-2wg5g (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-2wg5g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-2wg5g
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned default/readiness-exec to node1
  Normal   Pulling    61s                kubelet, node1     Pulling image "busybox"
  Normal   Pulled     45s                kubelet, node1     Successfully pulled image "busybox"
  Normal   Created    45s                kubelet, node1     Created container readiness-demo
  Normal   Started    45s                kubelet, node1     Started container readiness-demo
  Warning  Unhealthy  16s (x5 over 36s)  kubelet, node1     Readiness probe failed: 

1.2.Pod的状态

1、Pending 挂起,Pod 已被 Kubernetes 系统接收,但仍有一个或多个容器未被创建,可以通过kubectl describe 查看处于 Pending 状态的原因

2、Running 运行中,Pod 已经被绑定到一个节点上,并且所有的容器都已经被创建,而且至少有一个是运行状态,或者是正在启动或者重启,可以通过 kubectl logs 查看 Pod 的日志

3、Succeeded 成功,所有容器执行成功并终止,并且不会再次重启,可以通过 kubectl logs 查看 Pod 日志

4、Failed 失败,所有容器都已终止,并且至少有一个容器以失败的方式终止,也就是说这个容器要么以非零状态退出,要么被系统终止,可以通过 logs 和 describe 查看 Pod 日志和状态

5、Unknown 未知,通常是由于API-Server和Kubelet之间的通信问题,造成API-Server无法获得 Pod 的状态

6、ImagePullBackOff 镜像拉取失败,一般是由于镜像不存在、网络不通或者需要登录认证引起的,可以使用 describe 命令查看具体原因

7、CrashLoopBackOff 容器启动失败,可以通过 logs 命令查看具体原因,一般为启动命令不正确,健康检查不通过等

8、OOMKilled 容器内存溢出,一般是容器的内存 Limit 设置的过小,或者程序本身有内存溢出,可以通过 logs 查看程序启动日志

9、Terminating Pod 正在被删除,可以通过 describe 查看状态

10、SysctlForbidden Pod 自定义了内核配置,但 kubelet 没有添加内核配置或配置的内核参数不支持,可以通过 describe 查看具体原因

11、Completed 容器内部主进程退出,一般计划任务执行结束会显示该状态,此时可以通过 logs 查看容器日志

12、ContainerCreating Pod 正在创建,一般为正在下载镜像,或者有配置不当的地方,可以通过 describe 查看具体原因

13、InvalidImageName 无法解析镜像名称

14、ImageInspectError 无法校验镜像

15、ErrImageNeverPull 策略禁止拉取镜像

16、RegistryUnavailable 连接不到镜像中心

17、CreateContainerConfigError 不能创建kubelet使用的容器配置

18、CreateContainerError 创建容器失败

19、m.internalLifecycle.PreStartContainer 执行hook报错

20、RunContainerError 启动容器失败

21、PostStartHookError 执行hook报错

22、ContainersNotInitialized 容器没有初始化完毕

23、ContainersNotReady 容器没有准备完毕

24、PodInitializing pod 初始化中

25、DockerDaemonNotReady docker还没有完全启动

26、NetworkPluginNotReady 网络插件还没有完全启动

1.3.Pod的安全上下文

定义容器以什么系统用户身份运行
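
下面是1个安全上下文的定义示意(runAsUser等字段均为K8s内置的securityContext配置,具体数值仅为示例):

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:                # Pod级别:对Pod内所有容器生效
    runAsUser: 1000               # 以UID=1000的用户身份运行容器内进程
    runAsGroup: 3000
    fsGroup: 2000                 # 挂载卷的属组
  containers:
  - name: demo
    image: busybox
    command: ["/bin/sh", "-c", "id; sleep 3600"]
    securityContext:              # 容器级别:可覆盖Pod级别的设置
      allowPrivilegeEscalation: false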

1.4.Pod的资源配额

在K8s集群中可以定义Pod的资源配额,这就为容器云的多租户、弹性计算提供了条件;

CPU资源的计量方式:1个核心=1000个微核心,即1=1000m,0.5=500m;

内存资源的计量方式:默认单位为字节,也可以使用K/E/P/T/G/M或者Ki/Ei/Pi/Ti/Gi/Mi的形式为单位后缀;

我们可以定义Pod对Node节点的存储、CPU、内存资源的要求(requests)和上限(limits),CPU属于可压缩资源,超限时只会被限流;当容器使用的内存、硬盘资源达到上限时会被Kill;

apiVersion: v1
kind: Pod
metadata:
  name: stress-pod
spec:
  containers:
  - name: stress
    image: ikubernetes/stress-ng
    command: ["/usr/bin/stress-ng", "-c 1", "-m 1", "--metrics-brief"]
    resources:
      requests:
        memory: "128Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "400m"            

1.5.Pod服务质量类别

根据Pod对象的requests、limits属性,Kubernetes把Pod对象归类为BestEffort、Burstable、Guaranteed三个服务质量类别(Quality of Service,QoS),Guaranteed类别的定义示例见下方清单;

  • Guaranteed:每个容器都为CPU/内存资源设置了相同值的requests和limits属性,这类Pod资源具有最高优先级;
  • Burstable:至少有1个容器设置了CPU/内存资源的requests属性值,但不满足Guaranteed类别的条件;
  • BestEffort:没有任何1个容器设置request和limit属性
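
Guaranteed类别的定义示意如下(每个容器的requests与limits完全相等,镜像沿用上文的stress-ng示例,资源数值仅为示例):

apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-pod
spec:
  containers:
  - name: stress
    image: ikubernetes/stress-ng
    command: ["/usr/bin/stress-ng", "-c 1", "-m 1", "--metrics-brief"]
    resources:
      requests:                 # requests与limits完全相同 --> Guaranteed
        memory: "256Mi"
        cpu: "200m"
      limits:
        memory: "256Mi"
        cpu: "200m"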

2.控制器资源管理

我们通过.yaml文件创建出来的各种类型的控制器对象,称为活动的控制器对象(LiveObjects),它们统称为控制器资源;

控制器对象(LiveObjcet)通过ReconciliationLoop保证LiveObjects的状态属性(Status)和用户期望的状态属性(Spec)一致;

在k8s中几乎每1种特定资源(Pod/Service/Ingress)都有1种特定的Controller对象来创建、维护,那谁来创建、维护各种Controller对象呢?

ControllerManager中包含各类Controller类的代码,ControllerManager可以创建各种类型的Controller,这些Controllers由ControllerManager进行统一管理

各种类型的Controller被包含在ControllerManager中,所有的Controllers最终被运行为1个守护进程,这个组织了各种Controllers的守护进程名称为kube-controller-manager,我们称这个守护进程为ControllerManager;

/etc/kubernetes/manifests/kube-controller-manager.yaml

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-cidr=10.244.0.0/16
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --node-cidr-mask-size=24
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --use-service-account-credentials=true
    image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.17.4
    imagePullPolicy: IfNotPresent

例如:1个由Deployment控制器对象管理的Pod对象,之上的管理组织架构应当如下:

运行在kube-controller-manager守护进程中的DeploymentPod控制器-----------管理--------->各种Deployment控制器对象(Nginx-deploy、Django-deploy......)

各种Deployment控制器对象(Nginx-deploy、Django-deploy...)-----------管理--------->各种Pod资源对象(Nginx-pod、Django-pod......)

2.1.Pod控制器

Pod控制器是用于管理Pod基础资源的;

Pod是用来管理容器的,容器是用来运行应用程序的,根据应用程序业务类型不同,k8s管理员需要选择不同的Pod控制器来管理部署不同的应用程序;

应用程序可以划分为多种类型:

守护进程

  • 无状态:上一次用户请求和当前用户请求无关;
    非系统级的无状态服务:采用Deployment控制器管理;
    系统级的无状态服务:采用DaemonSet控制器管理;
  • 有状态:上一次用户请求和当前用户请求有关;采用StatefulSet控制器管理;

非守护进程

  • 非周期性运行:采用Job控制器管理;
  • 周期性运行:采用CronJob控制器管理;

以下将介绍在.yaml文件定义Pod控制器的格式,实例化出各种Pod控制器;

2.1.1.ReplicaSetController

ReplicaSet控制器是可以直接控制Pod的控制器, 可以保证用户指定的Pod副本数量,精确满足用户要求;

定义1个名称为myapp-rs的ReplicaSet控制器

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: myapp-rs
spec:
  replicas: 2
  selector:
     matchLabels:
       app: myapp-pod
  template:
    metadata:
      labels:
        app: myapp-pod
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80

删除名称为myapp-rs的ReplicaSet控制器

由控制器创建出来的Pod资源,通过删除控制器来删除Pod。

[root@master chapter5]# kubectl delete rs myapp-rs
replicaset.apps "myapp-rs" deleted

Pod数量的在线扩缩容

当Pod资源被控制器创建出来之后,我们可以对Pod的数量进行动态调整,达到在线扩缩容的目的,但无法触发Pod中应用程序版本的滚动更新功能;

[root@master chapter5]# kubectl scale --replicas=3 rs myapp-rs
replicaset.apps/myapp-rs scaled

2.1.2.DeploymentController

 

DeploymentPod控制器不直接控制Pod资源;

由于ReplicaSet控制器无法实现版本滚动更新和回退 ,Deployment控制器封装了ReplicaSet控制器,间接为ReplicaSet控制器扩展了滚动部署的功能;

DeploymentPod控制器通过控制ReplicaSet控制器的方式实现版本滚动更新;

创建Deployment控制器

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-nginx
spec:
  replicas: 3
  minReadySeconds: 10
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.10-alpine
        ports:
        - containerPort: 80
          name: http
        readinessProbe:
          periodSeconds: 1
          httpGet:
            path: /
            port: http

查看RS控制器的名称和Pod名称

[root@master chapter5]# kubectl apply -f deploy-nginx.yaml 
deployment.apps/deploy-nginx unchanged
[root@master chapter5]# kubectl get deploy
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
deploy-nginx   3/3     3            3           4m15s
[root@master chapter5]# kubectl get rs
NAME                      DESIRED   CURRENT   READY   AGE
deploy-nginx-5745bb45d7   3         3         3       4m19s
[root@master chapter5]# kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
deploy-nginx-5745bb45d7-2vspv   1/1     Running   0          4m22s
deploy-nginx-5745bb45d7-db4vd   1/1     Running   0          4m22s
deploy-nginx-5745bb45d7-qhdgx   1/1     Running   0          4m22s
[root@master chapter5]# 

一键版本滚动更新

kubectl set image deployment deploy-nginx nginx=nginx:1.15-alpine

 

滚动更新策略

在不影响线上业务的情况下,如果镜像的版本需要更新,可以采用Deployment的滚动更新功能;

Deployment控制器支持滚动更新操作,也可以在yaml中设置其滚动更新的详细步骤,即滚动更新策略;

如用户指定了滚动更新策略replicas=3,maxSurge=1,maxUnavailable=1,就等于声明了Deployment控制器在滚动更新过程中允许出现的新旧Pod副本总数量的范围。

为什么要从maxSurge和maxUnavailable这2个层面来限制滚动更新过程中新旧Pod的总数量呢?

因为有时候整个K8s集群必须运行特定数量的Pod;

  • K8S集群之中的Node资源,无法运行(replicas+1)个Pod,但是可以少1个;                 (性能限制)
  • 线上服务不允许运行(replicas-1)个Pod,服务整体会出问题,但是可以多1个;          (服务要求限制)

Deployment控制器在滚动更新过程中

  • maxSurge=1:       新旧Pod副本的总数量可以多1个,那么新旧Pod数量总和的上限= 3+1=4个;
  • maxUnavailable=1: 新旧Pod副本的总数量可以少1个,那么新旧Pod数量总和的下限=3-1=2个;

Deployment控制器在滚动更新之后

  • 用户期望新版本Pod的副本(replicas)总数量最终=3个;
strategy:
    rollingUpdate:
      maxSurge: 1        #滚动更新时新旧Pod的副本总数最多可以比replicas多1个
      maxUnavailable: 1  #滚动更新时可用Pod的副本总数最多可以比replicas少1个
    type: RollingUpdate

rollingUpdate策略的更新步骤应该如下

先删除1个old版本Pod,同时新建2个new版本Pod(新旧总数不超过replicas+maxSurge=4个);
待new版本Pod就绪后,再新建第3个new版本Pod,并逐个删除剩余的2个old版本Pod(可用总数不低于replicas-maxUnavailable=2个)。
[root@master chapter5]# vim deploy-nginx.yaml 
[root@master chapter5]# kubectl apply -f deploy-nginx.yaml 
deployment.apps/deploy-nginx configured
[root@master chapter5]# kubectl get pod -w
NAME                            READY   STATUS              RESTARTS   AGE
deploy-nginx-5745bb45d7-db4vd   1/1     Running             0          12m
deploy-nginx-5745bb45d7-qhdgx   1/1     Running             0          12m
deploy-nginx-754874567-2sfs9    1/1     Running             0          4s
deploy-nginx-754874567-d2dbn    0/1     ContainerCreating   0          4s
^C[root@master chapter5]# kubectl get pod 
NAME                            READY   STATUS              RESTARTS   AGE
deploy-nginx-5745bb45d7-db4vd   1/1     Running             0          13m
deploy-nginx-5745bb45d7-qhdgx   1/1     Running             0          13m
deploy-nginx-754874567-2sfs9    1/1     Running             0          15s
deploy-nginx-754874567-d2dbn    0/1     ContainerCreating   0          15s
[root@master chapter5]# kubectl get pod -w
NAME                            READY   STATUS              RESTARTS   AGE
deploy-nginx-5745bb45d7-db4vd   1/1     Running             0          13m
deploy-nginx-5745bb45d7-qhdgx   1/1     Running             0          13m
deploy-nginx-754874567-2sfs9    1/1     Running             0          20s
deploy-nginx-754874567-d2dbn    0/1     ContainerCreating   0          20s
deploy-nginx-754874567-d2dbn    0/1     Running             0          73s
deploy-nginx-5745bb45d7-qhdgx   1/1     Terminating         0          13m
deploy-nginx-754874567-pr9jb    0/1     Pending             0          0s
deploy-nginx-754874567-pr9jb    0/1     Pending             0          0s
deploy-nginx-754874567-pr9jb    0/1     ContainerCreating   0          0s
deploy-nginx-754874567-d2dbn    1/1     Running             0          74s
deploy-nginx-754874567-pr9jb    0/1     ContainerCreating   0          1s
deploy-nginx-5745bb45d7-qhdgx   0/1     Terminating         0          13m
deploy-nginx-754874567-pr9jb    0/1     Running             0          2s
deploy-nginx-754874567-pr9jb    1/1     Running             0          2s
deploy-nginx-5745bb45d7-qhdgx   0/1     Terminating         0          14m
deploy-nginx-5745bb45d7-qhdgx   0/1     Terminating         0          14m
deploy-nginx-5745bb45d7-db4vd   1/1     Terminating         0          14m
deploy-nginx-5745bb45d7-db4vd   0/1     Terminating         0          14m
deploy-nginx-5745bb45d7-db4vd   0/1     Terminating         0          14m
deploy-nginx-5745bb45d7-db4vd   0/1     Terminating         0          14m
^C[root@master chapter5]# kubectl get pod 
NAME                           READY   STATUS    RESTARTS   AGE
deploy-nginx-754874567-2sfs9   1/1     Running   0          110s
deploy-nginx-754874567-d2dbn   1/1     Running   0          110s
deploy-nginx-754874567-pr9jb   1/1     Running   0          37s
[root@master chapter5]# kubectl get rs
NAME                      DESIRED   CURRENT   READY   AGE
deploy-nginx-5745bb45d7   0         0         0       14m    #DeploymentPod控制器可以控制ReplicaSet控制器的方式实现版本滚动更新;
deploy-nginx-754874567    3         3         3       2m12s  #DeploymentPod控制器底层换了1个ReplicaSet控制器

maxUnavailable和maxSurge这2个滚动更新配置项,可以同时指定2个,也可以仅指定其中的1个。

例如:目前只有1个node节点支持GPU功能,且这个节点上目前正在运行着1个Pod(Pod副本数量为1)。

这时滚动更新策略可以仅设置maxUnavailable: 1,那么新的Pod就会更新成功;

 

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-nginx
spec:
  replicas: 1
  minReadySeconds: 10
  strategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.10-alpine
        ports:
        - containerPort: 80
          name: http
        readinessProbe:
          periodSeconds: 1
          httpGet:
            path: /
            port: http

Recreate策略

其实当replicas=1时,更加简单直接的方式是采用Recreate策略,而不是滚动更新策略。

在测试环境中,还有1种更加简单直接的策略去更新Pod,那就是重新创建pod。

重建更新策略不支持maxSurge、maxUnavailable参数,此策略是将原有Pod删除后重建新的Pod,更新期间Pod(应用)将不可用,这也是不推荐此策略的原因。

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-nginx
spec:
  replicas: 1
  minReadySeconds: 10
  strategy:
    type: Recreate

版本回滚策略

[root@master ~]# kubectl get rs -o wide
NAME                      DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES              SELECTOR
deploy-nginx-5745bb45d7   0         0         0       64m     nginx        nginx:1.10-alpine   app=nginx,pod-template-hash=5745bb45d7
deploy-nginx-5b5d689658   3         3         3       6m21s   nginx        nginx:1.15-alpine   app=nginx,pod-template-hash=5b5d689658
deploy-nginx-754874567    0         0         0       52m     nginx        nginx:1.14-alpine   app=nginx,pod-template-hash=754874567
[root@master ~]# kubectl rollout history deployment/deploy-nginx
deployment.apps/deploy-nginx 
REVISION  CHANGE-CAUSE
1         <none>
2         <none>
3         <none>
[root@master ~]# kubectl rollout undo deployment/deploy-nginx
deployment.apps/deploy-nginx rolled back
[root@master ~]# kubectl rollout history deployment/deploy-nginx
deployment.apps/deploy-nginx 
REVISION  CHANGE-CAUSE
1         <none>
3         <none>
4         <none>
[root@master ~]# kubectl get rs -o wide
NAME                      DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES              SELECTOR
deploy-nginx-5745bb45d7   0         0         0       68m     nginx        nginx:1.10-alpine   app=nginx,pod-template-hash=5745bb45d7
deploy-nginx-5b5d689658   0         0         0       9m35s   nginx        nginx:1.15-alpine   app=nginx,pod-template-hash=5b5d689658
deploy-nginx-754874567    3         3         3       55m     nginx        nginx:1.14-alpine   app=nginx,pod-template-hash=754874567
[root@master ~]#

Rollout Pause和Rollout resume

如果在版本滚动更新的过程中,只更新了一部分Pod之后就Pause暂停了, 产生了新旧2种类型的Pod,称为金丝雀发布;

[root@master ~]# kubectl set image deployment deploy-nginx nginx=nginx:1.16-alpine && kubectl rollout pause deployment/deploy-nginx
deployment.apps/deploy-nginx paused
[root@master ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
deploy-nginx-5b5d689658-dtcps 1/1 Running 0 17m
deploy-nginx-5b5d689658-mdmqs 1/1 Running 0 17m
deploy-nginx-84957d6795-kmc8d 1/1 Running 0 52s  #放2只金丝雀
deploy-nginx-84957d6795-lr4lz 1/1 Running 0 52s

借助Istio引入一部分用户流量之后,发现金丝雀版本没bug,那就彻底放开继续resume发新版吧,称为滚动更新

[root@master ~]# kubectl set image deployment deploy-nginx nginx=nginx:1.15-alpine && kubectl rollout resume deployment/deploy-nginx
[root@master ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
deploy-nginx-84957d6795-bnqxs 1/1 Running 0 15s
deploy-nginx-84957d6795-kmc8d 1/1 Running 0 2m15s
deploy-nginx-84957d6795-lr4lz 1/1 Running 0 2m15s

至此,DeploymentPod控制器又有了一键滚动更新功能,运维工程师就不用再像之前那样,写脚本精心策划版本更新策略了;

DeploymentPod控制器完全可以替代运维人员的3大核心职能,即发布管理、变更管理、故障处理;

2.1.3.DaemonSet

DaemonSet Pod控制器可以保证每1个节点上只运行1个Pod,支持滚动更新功能;

用户无法自定义Pod副本的数量,当DaemonSet控制器中指定的NodeSelector匹配K8s集群中所有Nodes时,DaemonSet控制器控制的Pod副本数量=Node节点的数量;

一旦新的Node节点加入K8s集群Pod的数量就会增加,一旦新的Node节点退出K8s集群,该节点中运行的Pod被垃圾回收;

kubectl apply -f filebeat-ds.yaml 

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat-ds
  labels:
    app: filebeat
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
      name: filebeat
    spec:
      containers:
      - name: filebeat
        image: ikubernetes/filebeat:5.6.5-alpine
        env:
        - name: REDIS_HOST
          value: db.ikubernetes.io:6379
        - name: LOG_LEVEL
          value: info

查看DaemonSet控制器

[root@master ~]# kubectl get ds
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
filebeat-ds   2         2         2       2            2           <none>          17m

查看创建的Pod

[root@master chapter5]# kubectl get pods -l app=filebeat -o wide --show-labels
NAME                READY   STATUS    RESTARTS   AGE     IP               NODE    NOMINATED NODE   READINESS GATES   LABELS
filebeat-ds-5w4g4   1/1     Running   0          6m34s   10.244.166.166   node1   <none>           <none>            app=filebeat,controller-revision-hash=fb6b847cc,pod-template-generation=1
filebeat-ds-xtv4l   1/1     Running   0          6m34s   10.244.104.18    node2   <none>           <none>            app=filebeat,controller-revision-hash=fb6b847cc,pod-template-generation=1

Daemonset控制器滚动更新

[root@master ~]# kubectl set image daemonset/filebeat-ds filebeat=ikubernetes/filebeat:5.6.6-alpine -n default
daemonset.apps/filebeat-ds image updated
[root@master chapter5]# kubectl get pods -w
NAME                READY   STATUS    RESTARTS   AGE
filebeat-ds-5w4g4   1/1     Running   0          15m
filebeat-ds-xtv4l   1/1     Running   0          15m
filebeat-ds-5w4g4   1/1     Terminating   0          21m
filebeat-ds-5w4g4   0/1     Terminating   0          21m
filebeat-ds-5w4g4   0/1     Terminating   0          21m
filebeat-ds-5w4g4   0/1     Terminating   0          21m
filebeat-ds-s2vm8   0/1     Pending       0          0s
filebeat-ds-s2vm8   0/1     Pending       0          0s
filebeat-ds-s2vm8   0/1     ContainerCreating   0          0s
filebeat-ds-s2vm8   0/1     ContainerCreating   0          1s
filebeat-ds-s2vm8   1/1     Running             0          20s
filebeat-ds-xtv4l   1/1     Terminating         0          22m
filebeat-ds-xtv4l   0/1     Terminating         0          22m
filebeat-ds-xtv4l   0/1     Terminating         0          22m
filebeat-ds-xtv4l   0/1     Terminating         0          22m
filebeat-ds-98r79   0/1     Pending             0          0s
filebeat-ds-98r79   0/1     Pending             0          0s
filebeat-ds-98r79   0/1     ContainerCreating   0          0s
filebeat-ds-98r79   0/1     ContainerCreating   0          2s
filebeat-ds-98r79   1/1     Running             0          6s

 增加nodeSelector选择指定节点

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat-ds
  labels:
    app: filebeat
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
      name: filebeat
    spec:
      containers:
      - name: filebeat
        image: ikubernetes/filebeat:5.6.5-alpine
        env:
        - name: REDIS_HOST
          value: db.ikubernetes.io:6379
        - name: LOG_LEVEL
          value: info
      nodeSelector:
        logcollecting: "on"

 node节点设置logcollecting="on"标签

[root@master chapter5]# kubectl get pods
No resources found in default namespace.
[root@master chapter5]# kubectl label node node2 logcollecting="on" --overwrite
node/node2 labeled
[root@master chapter5]# kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
filebeat-ds-wnfzg   1/1     Running   0          3s

2.1.4.Job

Job控制器能创建出1个或多个Pod,控制Pod执行完1次性作业任务,之后退出;

如果1次性作业任务可以被切分成N个子任务,可以使用Job控制器创建多个Pod并行去完成各个子任务。(多进程)

2.1.4.1.restartPolicy任务重启策略

Never:      Pod在执行作业时出现了错误,当前Pod不会被重启直接退出,Pod状态为Failed,但是Job控制器会创建新的Pod;

OnFailure:Pod在执行作业时出现了错误,当前Pod会被不断重启,直到作业完成。

2.1.4.2.单路作业
apiVersion: batch/v1
kind: Job
metadata:
  name: job-example
spec:
  template:
    metadata:
      labels:
        app: myjob
    spec:
      containers:
      - name: myjob
        image: alpine
        command: ["/bin/sh",  "-c", "sleep 10"]
      restartPolicy: Never

--------------------------------

[root@master chapter5]# kubectl apply -f job-example.yaml 
job.batch/job-example unchanged
[root@master chapter5]# kubectl get job
NAME          COMPLETIONS   DURATION   AGE
job-example   1/1           29s        6m6s
[root@master chapter5]# kubectl get job -o wide
NAME          COMPLETIONS   DURATION   AGE     CONTAINERS   IMAGES   SELECTOR
job-example   1/1           29s        6m12s   myjob        alpine   controller-uid=8818358f-2dd6-4a15-ab20-e4cb4c59bde7
[root@master chapter5]# kubectl get pod
NAME                READY   STATUS      RESTARTS   AGE
job-example-pqh8t   0/1     Completed   0          6m33s
[root@master chapter5]# 
2.1.4.3.多路作业
apiVersion: batch/v1
kind: Job
metadata:
  name: job-multi
spec:
  completions: 5
  parallelism: 2
  template:
    metadata:
      labels:
        app: myjob
    spec:
      containers:
      - name: myjob
        image: alpine
        command: ["/bin/sh",  "-c", "sleep 3"]
      restartPolicy: Never

completions: 5 parallelism: 2的含义:共有5个Pod(任务),1次最多能同时执行2个任务,所以需要分3个批次完成;

[root@master ~]# kubectl get pods -w
NAME              READY   STATUS    RESTARTS   AGE
job-multi-c5sb7   0/1     Pending   0          0s
job-multi-c5sb7   0/1     Pending   0          0s
job-multi-s4prd   0/1     Pending   0          0s
job-multi-s4prd   0/1     Pending   0          0s
job-multi-c5sb7   0/1     ContainerCreating   0          0s
job-multi-s4prd   0/1     ContainerCreating   0          0s
job-multi-s4prd   0/1     ContainerCreating   0          1s
job-multi-c5sb7   0/1     ContainerCreating   0          1s
job-multi-s4prd   1/1     Running             0          16s
job-multi-c5sb7   1/1     Running             0          17s
job-multi-s4prd   0/1     Completed           0          19s
job-multi-d2x7w   0/1     Pending             0          0s
job-multi-d2x7w   0/1     Pending             0          1s
job-multi-d2x7w   0/1     ContainerCreating   0          1s
job-multi-d2x7w   0/1     ContainerCreating   0          1s
job-multi-c5sb7   0/1     Completed           0          21s
job-multi-p4dlz   0/1     Pending             0          0s
job-multi-p4dlz   0/1     Pending             0          0s
job-multi-p4dlz   0/1     ContainerCreating   0          0s
job-multi-p4dlz   0/1     ContainerCreating   0          0s
job-multi-d2x7w   1/1     Running             0          17s
job-multi-d2x7w   0/1     Completed           0          20s
job-multi-7779j   0/1     Pending             0          0s
job-multi-7779j   0/1     Pending             0          0s
job-multi-7779j   0/1     ContainerCreating   0          0s
job-multi-7779j   0/1     ContainerCreating   0          1s
job-multi-7779j   1/1     Running             0          12s
job-multi-p4dlz   1/1     Running             0          30s
job-multi-7779j   0/1     Completed           0          14s
job-multi-p4dlz   0/1     Completed           0          32s

2.1.5.CronJob

类似于replicaSet和deployment控制器的关系;

CronJob控制器通过控制Job控制器,去周期性执行Job任务;

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cronjob-example
  labels:
    app: mycronjob
spec:
  schedule: "*/2 * * * *"
  jobTemplate:
    metadata:
      labels:
        app: mycronjob-jobs
    spec:
      parallelism: 2
      template:
        spec:
          containers:
          - name: myjob
            image: alpine
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster; sleep 10
          restartPolicy: OnFailure

查看

[root@master chapter5]# kubectl get job
NAME                         COMPLETIONS   DURATION   AGE
cronjob-example-1670823360   2/1 of 2      41s        79s
[root@master chapter5]# kubectl get cronjob
NAME              SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob-example   */2 * * * *   False     0        98s             2m46s
[root@master chapter5]# kubectl get pod
NAME                               READY   STATUS      RESTARTS   AGE
cronjob-example-1670823360-ljmzw   0/1     Completed   0          92s
cronjob-example-1670823360-ts48l   0/1     Completed   0          92s

2.1.6.StatefulSets

StatefulSet 是用来管理有状态应用的工作负载 API 对象。

实例之间有不对等关系,以及实例对外部数据有依赖关系的应用,称为有状态应用;

StatefulSet 用来管理一组 Pod 的部署和扩缩,并且能为这些 Pod 提供稳定的序号和唯一性保证。

StatefulSets 对于需要满足以下一个或多个需求的应用程序很有价值:

  • 稳定的、唯一的网络标识符。
  • 稳定的、持久的存储。
  • 有序的、优雅的部署和缩放。
  • 有序的、自动的滚动更新。
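
下面是1个最简StatefulSet的定义示意(假设集群中已存在1个名为myapp-headless的Headless Service,为Pod提供稳定的网络标识,见后文3.3.5节的示例):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myapp-sts
spec:
  serviceName: myapp-headless      # 关联的Headless Service(假设已创建)
  replicas: 2
  selector:
    matchLabels:
      app: myapp-sts
  template:
    metadata:
      labels:
        app: myapp-sts
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - containerPort: 80
          name: http

创建后Pod的名称将是有序且稳定的myapp-sts-0、myapp-sts-1。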

3.Service资源管理

Pod对象的动态性会给访问Pod的客户端带来以下困扰:

  • Pod资源对象存在生命周期且不可以被重建,必要时仅能创建1个新的替代者;
  • Pod对象在其控制器进行应用规模伸缩时,同1应用程序的Pod对象会增加或减少;

Service资源为动态管理的Pod对象添加了1个有着固定访问入口的抽象层

  • Service通过标签选择器关联到拥有相关标签的Pod对象;
  • 客户端向Service发送请求,间接到达目标Pod

3.1.Service创建流程

简单来说,1个Service对象本质是存在于每1个工作节点(Node)上的1组iptables/ipvs规则;

  • 每1个Node节点上的kube-proxy始终watch着kube-apiserver上的每1个Service资源的变更;
  • 一旦Service资源发生变动,kube-apiserver马上通知给每1个Node节点上的kube-proxy;
  • 1个Node节点上的kube-proxy把Service的定义转换为当前Node上相应的iptables/ipvs规则;

3.2.Service转发流程

基于以上Service的创建流程,现在iptables/ipvs规则已经生成,并正常工作在每1个节点的内核空间中;

ClientPod通过Service访问ServerPod的大致流程如下:

  • Pod中的容器本质是运行在Node节点上的进程,进程请求Service的IP地址时,一定会被当前Node节点内核空间中的iptables/ipvs规则拦截到;
  • 请求根据Node节点内核空间中生成的iptables/ipvs规则被转发到某个Endpoint;
  • 如果Endpoint是PodIP则直接通过Pod网络通信,如果Endpoint是外部IP则通过外网通信;

 

3.3.Service资源定义

Service资源是K8s的标准资源之一,可以通过yaml文件进行定义;

Service资源分为以下几种类型,分别实现不同Service的调用功能;

kubectl explain service.spec.type

KIND:     Service
VERSION:  v1

FIELD:    type <string>

DESCRIPTION:
     type determines how the Service is exposed. Defaults to ClusterIP. Valid
     options are ExternalName, ClusterIP, NodePort, and LoadBalancer.
     "ExternalName" maps to the specified externalName. "ClusterIP" allocates a
     cluster-internal IP address for load-balancing to endpoints. Endpoints are
     determined by the selector or if that is not specified, by manual
     construction of an Endpoints object. If clusterIP is "None", no virtual IP
     is allocated and the endpoints are published as a set of endpoints rather
     than a stable IP. "NodePort" builds on ClusterIP and allocates a port on
     every node which routes to the clusterIP. "LoadBalancer" builds on NodePort
     and creates an external load-balancer (if supported in the current cloud)
     which routes to the clusterIP. More info:

3.3.1.ClusterIP

默认类型,为Service分配1个仅集群内部可达的虚拟IP(ClusterIP),流量最终负载均衡到后端的PodIP

apiVersion: v1
kind: Service
metadata:
 name: myapp-service
 namespace: default
spec:
 ports:
  - name: http
    port: 80
    targetPort: 80
 selector:
  app: myapp-pod

3.3.2.NodePort

在每1个Node节点上暴露30080端口;

apiVersion: v1
kind: Service
metadata:
  name: myapp-service
  namespace: default
spec:
  ports:
    - name: http
      port: 80
      nodePort: 30080
      targetPort: 80
  selector:
    app: myapp-pod
  type: NodePort

3.3.3.LoadBalancer

在NodePort类型的基础上引入自动管理的外部负载均衡器,需要向底层的云计算厂商发API请求,得购买LBaaS服务;
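
LoadBalancer类型Service的定义示意如下(仅在支持LBaaS的云环境中生效,本文的实验环境无法实际创建外部负载均衡器):

apiVersion: v1
kind: Service
metadata:
  name: myapp-lb
  namespace: default
spec:
  type: LoadBalancer              # 在NodePort的基础上,向云厂商申请1个外部负载均衡器
  ports:
  - name: http
    port: 80
    targetPort: 80
  selector:
    app: myapp-pod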

3.3.4.ExternalName

如果K8s集群中Pod需要调用外网的服务,可以使用ExternalName类型的Service;

ExternalName类型的Service可以将集群外部的Service引入集群内部,提供给集群内部的各个客户端Pod使用;
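
1个ExternalName类型Service的定义示意如下(db.example.com为假设的外部域名):

apiVersion: v1
kind: Service
metadata:
  name: external-db
  namespace: default
spec:
  type: ExternalName
  externalName: db.example.com    # 集群内访问external-db.default.svc.cluster.local时会被CNAME到该外部域名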

3.3.5.HeadLessService

客户端Pod通过CoreDNS的解析结果直接和服务端Pod通信,不经过Service的负载均衡调度(解析结果受DNS缓存影响),常用于StatefulSet类型的Pod;
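
Headless Service的定义示意如下(与ClusterIP类型的唯一区别是clusterIP显式设置为None;名称沿用前文StatefulSet示例假设的myapp-headless):

apiVersion: v1
kind: Service
metadata:
  name: myapp-headless
  namespace: default
spec:
  clusterIP: None                 # 不分配虚拟IP,DNS直接解析为各个PodIP
  ports:
  - name: http
    port: 80
    targetPort: 80
  selector:
    app: myapp-sts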

4.Ingress资源管理

虽然通过HostNetwork/HostPort或NodePort类型的Service可以引入K8s集群外部客户端的流量,但是Service本质是工作在节点(Node)Linux内核空间中的1组iptables/ipvs规则,所以Service工作在传输层,无法实现应用层的代理功能;

如果在K8s集群中,我们想根据URL的方式进行负载均衡,可以借助Ingress实现七层代理功能;

4.1.Ingress与IngressController

Ingress

Ingress仅是用于定义流量转发规则和调度方式的通用格式的配置信息;

这些通用格式的配置信息需要被IngressController转换为具有HTTP协议转换和调度功能的应用程序(例如:nginx、haproxy、traefik等应用程序)的配置文件,并由相应的应用程序加载、生效相应的配置信息后完成7层流量的转发代理功能;

IngressController

IngressController运行为1个Pod, 这个Pod实时Watch着Ingress资源的变动,把用户定义在Ingress中的配置信息,转换为自身配置的应用程序(nginx、haproxy、traefik)的7层流量路由规则;

4.2.IngressController与Service

Ingress自身并不使用标签选择器挑选正在提供服务的Pod对象;

Ingress需要借助Service对象的辅助,完成对后端Pod对象的动态发现;

IngressController也运行为1个Pod对象,这个Pod可以与后端的Pod直接进行通信;

[root@master ~]# kubectl get pods -n ingress-nginx -o wide
NAME                                        READY   STATUS    RESTARTS   AGE     IP               NODE    NOMINATED NODE   READINESS GATES
nginx-ingress-controller-5556bd798f-4xrmf   1/1     Running   0          5m55s   10.244.166.134   node1   <none>           <none>

 

IngressController(1个Pod对象)根据Ingress对象的配置调度流量时,其报文将直接被调度至目的Pod,中间不再经过Service转发;

4.3.部署Ingress-nginx

Ingress-nginx就是基于Nginx原有功能进行扩展开发出来的1款用于实现IngressController功能的软件;

IngressController需要由Kubernetes管理员额外以Addons的形式部署为Pod资源对象;

这个Pod通过kube-api-server获取用户定义在Ingress中的配置信息;

4.3.1.部署nginx-ingress的Pod

apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

---

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: tcp-services
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: udp-services
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx-ingress-serviceaccount
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: nginx-ingress-clusterrole
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - endpoints
      - nodes
      - pods
      - secrets
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - "extensions"
      - "networking.k8s.io"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
      - "networking.k8s.io"
    resources:
      - ingresses/status
    verbs:
      - update

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: nginx-ingress-role
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - pods
      - secrets
      - namespaces
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - configmaps
    resourceNames:
      # Defaults to "<election-id>-<ingress-class>"
      # Here: "<ingress-controller-leader>-<nginx>"
      # This has to be adapted if you change either parameter
      # when launching the nginx-ingress-controller.
      - "ingress-controller-leader-nginx"
    verbs:
      - get
      - update
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - endpoints
    verbs:
      - get

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: nginx-ingress-role-nisa-binding
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nginx-ingress-role
subjects:
  - kind: ServiceAccount
    name: nginx-ingress-serviceaccount
    namespace: ingress-nginx

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: nginx-ingress-clusterrole-nisa-binding
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nginx-ingress-clusterrole
subjects:
  - kind: ServiceAccount
    name: nginx-ingress-serviceaccount
    namespace: ingress-nginx

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/part-of: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/part-of: ingress-nginx
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
    spec:
      # wait up to five minutes for the drain of connections
      terminationGracePeriodSeconds: 300
      serviceAccountName: nginx-ingress-serviceaccount
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.28.0
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
            - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx
            - --annotations-prefix=nginx.ingress.kubernetes.io
          securityContext:
            allowPrivilegeEscalation: true
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
            # www-data -> 101
            runAsUser: 101
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
          lifecycle:
            preStop:
              exec:
                command:
                  - /wait-shutdown

---

apiVersion: v1
kind: LimitRange
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  limits:
  - default:
    min:
      memory: 90Mi
      cpu: 100m
    type: Container
nginx-ingress.yml

4.3.2.为nginx-ingress的Pod添加Service

使IngressController的Pod对象(nginx-ingress)可以通过NodePort的方式引入外网流量;

apiVersion: v1
kind: Service
metadata:
  name: ingress
  namespace: ingress-nginx
spec:
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  ports:
  - name: http
    port: 80
  - name: https
    port: 443
  type: NodePort
ingress-svc.yml

4.4.Ingress资源定义
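
1个Ingress资源的定义示意如下(基于本文实验环境可用的networking.k8s.io/v1beta1版本;myapp.example.com为假设的域名,后端复用上文3.3.1节定义的myapp-service):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: myapp-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "nginx"   # 交由nginx-ingress-controller处理
spec:
  rules:
  - host: myapp.example.com                # 假设的访问域名
    http:
      paths:
      - path: /
        backend:
          serviceName: myapp-service       # 3.3.1节定义的ClusterIP类型Service
          servicePort: 80

IngressController实时Watch到该Ingress后,会把它转换为nginx的7层转发规则。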

 

5.存储资源(Volume)管理

在Docker中使用本地文件系统的目录作为Docker的挂载卷,可以使Docker产生的数据,脱离容器的生命周期存在;

在K8s中Pod是会被动态调度到不同的Node之上的,如果使用本地存储卷,一旦Pod重建到K8s集群中的其他节点,该Pod所依赖的历史数据就会丢失;

为了使Pod产生的数据可以脱离Pod的生命周期保存,就需要1套外置于K8s集群之外、可通过网络访问的分布式存储系统,例如:Ceph、NFS;

5.1.K8s存储资源类型

K8s支持以下几种类型的存储系统,作为K8S的文件存储后端;

[root@master ~]# kubectl explain pod.spec.volumes
KIND:     Pod
VERSION:  v1

RESOURCE: volumes <[]Object>

DESCRIPTION:
     List of volumes that can be mounted by containers belonging to the pod.
     More info: https://kubernetes.io/docs/concepts/storage/volumes

     Volume represents a named volume in a pod that may be accessed by any
     container in the pod.

FIELDS:
   awsElasticBlockStore    <Object>
     AWSElasticBlockStore represents an AWS Disk resource that is attached to a
     kubelet's host machine and then exposed to the pod. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#awselasticblockstore

   azureDisk    <Object>
     AzureDisk represents an Azure Data Disk mount on the host and bind mount to
     the pod.

   azureFile    <Object>
     AzureFile represents an Azure File Service mount on the host and bind mount
     to the pod.

   cephfs    <Object>
     CephFS represents a Ceph FS mount on the host that shares a pod's lifetime

   cinder    <Object>
     Cinder represents a cinder volume attached and mounted on kubelets host
     machine. More info: https://examples.k8s.io/mysql-cinder-pd/README.md

   configMap    <Object>
     ConfigMap represents a configMap that should populate this volume

   csi    <Object>
     CSI (Container Storage Interface) represents storage that is handled by an
     external CSI driver (Alpha feature).

   downwardAPI    <Object>
     DownwardAPI represents downward API about the pod that should populate this
     volume

   emptyDir    <Object>
     EmptyDir represents a temporary directory that shares a pod's lifetime.
     More info: https://kubernetes.io/docs/concepts/storage/volumes#emptydir

   fc    <Object>
     FC represents a Fibre Channel resource that is attached to a kubelet's host
     machine and then exposed to the pod.

   flexVolume    <Object>
     FlexVolume represents a generic volume resource that is
     provisioned/attached using an exec based plugin.

   flocker    <Object>
     Flocker represents a Flocker volume attached to a kubelet's host machine.
     This depends on the Flocker control service being running

   gcePersistentDisk    <Object>
     GCEPersistentDisk represents a GCE Disk resource that is attached to a
     kubelet's host machine and then exposed to the pod. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#gcepersistentdisk

   gitRepo    <Object>
     GitRepo represents a git repository at a particular revision. DEPRECATED:
     GitRepo is deprecated. To provision a container with a git repo, mount an
     EmptyDir into an InitContainer that clones the repo using git, then mount
     the EmptyDir into the Pod's container.

   glusterfs    <Object>
     Glusterfs represents a Glusterfs mount on the host that shares a pod's
     lifetime. More info: https://examples.k8s.io/volumes/glusterfs/README.md

   hostPath    <Object>
     HostPath represents a pre-existing file or directory on the host machine
     that is directly exposed to the container. This is generally used for
     system agents or other privileged things that are allowed to see the host
     machine. Most containers will NOT need this. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#hostpath

   iscsi    <Object>
     ISCSI represents an ISCSI Disk resource that is attached to a kubelet's
     host machine and then exposed to the pod. More info:
     https://examples.k8s.io/volumes/iscsi/README.md

   name    <string> -required-
     Volume's name. Must be a DNS_LABEL and unique within the pod. More info:
     https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names

   nfs    <Object>
     NFS represents an NFS mount on the host that shares a pod's lifetime More
     info: https://kubernetes.io/docs/concepts/storage/volumes#nfs

   persistentVolumeClaim    <Object>
     PersistentVolumeClaimVolumeSource represents a reference to a
     PersistentVolumeClaim in the same namespace. More info:
     https://kubernetes.io/docs/concepts/storage/persistent-volumes#persistentvolumeclaims

   photonPersistentDisk    <Object>
     PhotonPersistentDisk represents a PhotonController persistent disk attached
     and mounted on kubelets host machine

   portworxVolume    <Object>
     PortworxVolume represents a portworx volume attached and mounted on
     kubelets host machine

   projected    <Object>
     Items for all in one resources secrets, configmaps, and downward API

   quobyte    <Object>
     Quobyte represents a Quobyte mount on the host that shares a pod's lifetime

   rbd    <Object>
     RBD represents a Rados Block Device mount on the host that shares a pod's
     lifetime. More info: https://examples.k8s.io/volumes/rbd/README.md

   scaleIO    <Object>
     ScaleIO represents a ScaleIO persistent volume attached and mounted on
     Kubernetes nodes.

   secret    <Object>
     Secret represents a secret that should populate this volume. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#secret

   storageos    <Object>
     StorageOS represents a StorageOS volume attached and mounted on Kubernetes
     nodes.

   vsphereVolume    <Object>
     VsphereVolume represents a vSphere volume attached and mounted on kubelets
     host machine
kubectl explain pod.spec.volumes

Cloud storage

  • awsElasticBlockStore
  • azureDisk
  • azureFile
  • gcePersistentDisk
  • vsphereVolume

Distributed storage

  • cephfs
  • glusterfs
  • rbd

Network storage

  • nfs
  • iscsi
  • fc

Ephemeral storage

Once the Pod disappears, the data stored in it disappears with it.

  • emptyDir
  • gitRepo (deprecated)

Local storage

Data cannot be preserved across K8s Nodes.

  • hostPath: the node's local filesystem; data cannot follow a Pod to another node
  • local: mounts a local disk device directly

Special-purpose storage

  • configMap: plain-text configuration files
  • secret: sensitive configuration, stored base64-encoded
  • downwardAPI: exposes the current Pod's metadata

Custom storage

  • CSI: Container Storage Interface; a standard interface against which custom drivers connecting K8s to a storage system can be developed

Persistent volume claims

  • PersistentVolumeClaim: a unified abstraction layer over the storage backends above

 

5.2.Local storage

Local storage cannot keep a Pod's data available after the Pod is rescheduled to another node, unless the Pod is pinned to a specific Node;

hostPath

Provides only node-level data persistence: once the Pod is scheduled onto another Node, it can no longer access its previous data;

apiVersion: v1
kind: Pod
metadata:
  name: myapp
  namespace: vol
  labels:
    app: myapp
spec:
  nodeName: node1                        # pin the Pod to node1 so the hostPath data stays reachable
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    volumeMounts:
    - name: web-storage
      mountPath: /usr/share/nginx/html   # serve files from the host directory
      readOnly: true
  volumes:
  - name: web-storage
    hostPath:
      path: /volumes/myapp               # directory on the node's filesystem
      type: DirectoryOrCreate            # create the directory if it does not already exist

emptyDir

One Pod runs two containers that exchange data through an emptyDir volume; when the Pod disappears, the data is deleted with it;

apiVersion: v1
kind: Pod
metadata:
  name: vol-emptydir-pod
spec:
  volumes:
  - name: html
    emptyDir: {}                         # shared scratch volume, lives as long as the Pod
  containers:
  - name: nginx
    image: nginx:1.12-alpine
    volumeMounts:
    - name: html
      mountPath: /usr/share/nginx/html   # nginx serves the generated page
  - name: pagegen
    image: alpine
    volumeMounts:
    - name: html
      mountPath: /html                   # sidecar writes into the same volume
    command: ["/bin/sh", "-c"]
    args:
    - while true; do
        echo $(hostname) $(date) >> /html/index.html;
        sleep 10;
      done

5.3.Special-purpose storage

After an application is containerized, its configuration is usually injected into the Pod's containers through environment variables. The drawback is that whenever the configuration changes, the container must be restarted before it picks up the new values, so configuration cannot be hot-reloaded.

K8s treats ConfigMap and Secret as first-class resources that containers can mount as volumes; a ConfigMap/Secret object acts as a centralized configuration store for Pods.

Once a container mounts a ConfigMap/Secret as a volume, the kubelet periodically syncs the mounted files with the latest data, so the application can reload its configuration without a container restart (values injected as environment variables, or files mounted via subPath, are not updated this way).

ConfigMap

[root@kubemaster001.eq-sg-2.aliyun.com zhanggen]# kubectl get configmap -n ingress-nginx
NAME                              DATA   AGE
ingress-controller-leader-nginx   0      4y94d
nginx-configuration               8      4y94d
prometheus-configuration          1      4y94d
[root@kubemaster001.eq-sg-2.aliyun.com zhanggen]# kubectl -n ingress-nginx edit configmap nginx-configuration
Edit cancelled, no changes made.
[root@kubemaster001.eq-sg-2.aliyun.com zhanggen]#
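As a minimal sketch (the names demo-config and configmap-demo and the nginx image tag are illustrative, not taken from the cluster shown above), a ConfigMap can be mounted as a volume so that each key appears as a file inside the container, and the projected files are refreshed after the ConfigMap is edited:

apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config               # illustrative name
data:
  app.properties: |
    log_level=info
    feature_x=enabled
---
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo
spec:
  containers:
  - name: app
    image: nginx:1.12-alpine
    volumeMounts:
    - name: config
      mountPath: /etc/appconfig   # app.properties appears as a file here
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: demo-config           # reference the ConfigMap above

Note that this refresh applies only to volume mounts; environment variables and subPath mounts stay unchanged until the container restarts.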

Secret
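As an illustrative sketch only (all names and values below are made up), a Secret is consumed much like a ConfigMap, except its values are stored base64-encoded and are intended for sensitive data such as credentials:

apiVersion: v1
kind: Secret
metadata:
  name: demo-secret
type: Opaque
stringData:                      # plain text here; the API server stores it base64-encoded under 'data'
  db_user: app
  db_password: changeme
---
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo
spec:
  containers:
  - name: app
    image: nginx:1.12-alpine
    volumeMounts:
    - name: creds
      mountPath: /etc/creds      # each Secret key becomes a file under this path
      readOnly: true
  volumes:
  - name: creds
    secret:
      secretName: demo-secret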

downwardAPI

Exposes the Pod's own metadata (name, namespace, labels, annotations, and so on) to its containers.
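A minimal sketch of a downwardAPI volume (the busybox image and file paths are illustrative): the Pod's name and labels are projected as files that the container can read at runtime:

apiVersion: v1
kind: Pod
metadata:
  name: downwardapi-demo
  labels:
    app: demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["/bin/sh", "-c", "cat /etc/podinfo/labels /etc/podinfo/name && sleep 3600"]
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo    # metadata appears as files under this path
      readOnly: true
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      - path: labels
        fieldRef:
          fieldPath: metadata.labels
      - path: name
        fieldRef:
          fieldPath: metadata.name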

5.4.Persistent volume claims

When defining a Pod, the spec.volumes field can point directly at an external storage system's connection details, but this makes using storage volumes in K8s noticeably more complex.

If the external storage is a complex system such as Ceph, the Pod definition has to specify the RBD monitors, pool, image, user, and so on, which raises the bar considerably for ordinary K8s users;

To simplify storage configuration in K8s, the storage resources offered by the various external storage systems can be abstracted as PVs:

  • Storage team: operates and maintains the external storage systems
  • K8s administrators: define the PVs that connect to the external storage resources
  • Developers: claim PVs through PVCs, i.e. define a PVC in a given namespace that binds one PV (providing per-tenant data isolation);

Persistent volume storage (Persistent Volume Storage) abstracts external storage resources into the PV/PVC model and supplies storage to Pods in the K8s cluster either statically or dynamically;

Unlike local storage, persistent volumes keep a Pod's data available even after the Pod is rescheduled to another node, without requiring the Pod to be pinned to a Node;

PV

A PersistentVolume (PV) is the smallest logical unit of storage that an external storage device offers for PVCs to claim.

A PV is a cluster-scoped resource and does not belong to any namespace;

A PV's reclaim policy answers the question: what happens to the PV after its bound PVC is deleted? Delete (remove the PV and its backing storage) / Recycle (deprecated) / Retain (keep the PV and its data for manual cleanup).
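A minimal PV sketch backed by NFS (the server address and export path are placeholders, not a real environment); the K8s administrator declares the capacity, access modes and reclaim policy:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-demo                       # PVs are cluster-scoped, so there is no namespace field
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain   # keep the data for manual cleanup after the PVC is deleted
  nfs:
    server: 192.168.1.100                 # placeholder NFS server address
    path: /data/nfs/pv-demo               # placeholder export path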

PV access modes

Once a PVC has claimed a PV, the access mode constrains how the volume behind the PV may be mounted and used through that PVC:

  • ReadWriteOnce: read-write, mountable by a single node
  • ReadWriteMany: read-write, mountable by many nodes
  • ReadOnlyMany: read-only, mountable by many nodes

PV lifecycle

  • Provisioning: PVs are supplied statically or dynamically, depending on the storage system's API;
  • Binding: a one-to-one binding between a PVC and a PV is established;
  • Using: the Pod's containers consume the PV's storage through the PVC;
  • Reclaiming: when the PVC occupying the PV is deleted, the PV is released; depending on the reclaim policy it is then deleted, retained for manual cleanup, or made available for re-binding.

PVC

  • A PersistentVolumeClaim (PVC) is a namespaced resource, whereas a PV is cluster-scoped and belongs to no namespace;
  • For a Pod (itself a namespaced resource) to use a PV, a PVC must be created in the Pod's namespace to claim and bind that PV;
  • The Pod then consumes the PV's storage by referencing the PVC (see the sketch after this list).
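Continuing the sketch from the PV above, a PVC in the application's namespace requests storage that matches the PV, and the Pod mounts the storage by claim name (names such as pvc-demo are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
  namespace: vol                 # PVCs are namespaced, unlike PVs
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi               # the control plane binds this claim to a matching PV
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer
  namespace: vol
spec:
  containers:
  - name: app
    image: nginx:1.12-alpine
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc-demo        # consume the PV indirectly, through the claim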

SC (StorageClass)

A StorageClass can define the parameters needed to connect to a complex storage backend (such as Ceph). Based on a PVC's requirements, the StorageClass automatically creates a PV that satisfies the claim, enabling dynamic PV provisioning; this requires support from the storage system's API.
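As an illustrative sketch only (the provisioner name and its parameters depend entirely on which storage driver is installed in the cluster), a StorageClass lets a PVC skip pre-created PVs and have one provisioned on demand:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-rbd
provisioner: rbd.csi.ceph.com      # placeholder: the CSI driver of your storage backend
parameters:
  clusterID: my-ceph-cluster       # placeholder backend-specific parameters
  pool: kubernetes
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-dynamic
spec:
  storageClassName: fast-rbd       # triggers dynamic provisioning via the StorageClass above
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

When such a PVC is created, the provisioner builds a matching PV automatically, which is the dynamic supply mode mentioned under the PV lifecycle above.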

 

 

K8s log collection solutions

Three ways to expose Kubernetes services externally: NodePort, LoadBalancer and Ingress (sohu.com)