Kubernetes Pod controllers and the Pod definition inside a ReplicaSet controller
Why Pods are needed
Why the Kubernetes project introduces Pods:
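Kubernetes grew out of Google's Borg project. Google's engineers found that the applications they deployed often had relationships like those between processes and process groups: the applications cooperate so closely that they must be deployed on the same machine.
Without a notion of a group defined in advance, such operational relationships are hard to handle. For example:
rsyslogd consists of three processes: an imklog module, an imuxsock module, and rsyslogd's own main-function process. These three processes must run on the same machine; otherwise their Socket-based communication and file exchange will break. Now suppose I want to containerize rsyslogd. Because of the container's single-process model, the three modules have to be packaged as three separate containers, each with a memory quota of 1 GB.
Note: the container "single-process model" does not mean a container can run only one process; it means a container has no ability to manage multiple processes. The process with PID=1 in a container is the application itself, and every other process is a child of that PID=1 process. But an application written by a user does not have the process-management features of an init process or systemd in a regular operating system. For example, suppose your application is a Java web program (PID=1), and you then use docker exec to start an nginx process (PID=3) in the background. When that nginx process exits abnormally, how would you find out? And who cleans up (reaps) the process after it exits?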
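The cleanup mentioned above is the "reaping" duty of PID 1: calling waitpid on exited children so they do not linger as zombies. The following is a minimal sketch of that duty, assuming a POSIX system; it is not how tini, dumb-init, or systemd is implemented, and a real PID 1 additionally inherits orphaned grandchildren, which this sketch does not model. The helper names are hypothetical.

```python
import os
import time


def spawn(exit_code: int) -> int:
    """Fork a child that exits immediately with exit_code; return its PID."""
    pid = os.fork()
    if pid == 0:  # in the child process: exit right away
        os._exit(exit_code)
    return pid  # in the parent process


def reap_children() -> dict:
    """Non-blocking reap of every child that has already exited.

    Returns {pid: exit_code}; a child killed by a signal is reported as the
    negative signal number.
    """
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # this process has no children left
            break
        if pid == 0:  # children exist, but none has exited yet
            break
        if os.WIFEXITED(status):
            reaped[pid] = os.WEXITSTATUS(status)
        else:
            reaped[pid] = -os.WTERMSIG(status)
    return reaped


def init_loop(children: set, timeout: float = 5.0) -> dict:
    """Poll like a tiny init until the given children are reaped or time runs out."""
    results = {}
    deadline = time.monotonic() + timeout
    while not children.issubset(results) and time.monotonic() < deadline:
        results.update(reap_children())
        time.sleep(0.01)
    return results


if __name__ == "__main__":
    kids = {spawn(3), spawn(0)}
    print(init_loop(kids))  # maps each child PID to its exit code
```

An ordinary application running as PID 1 usually contains no such loop, which is exactly the gap the note describes.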
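Suppose our cluster has two nodes: node-1 with 3 GB of available memory and node-2 with 2.5 GB.
Now suppose I use Docker Swarm to run this rsyslogd program. To get all three containers onto the same machine, I must set an affinity=main constraint (affinity with the main container) on the other two containers, imklog and imuxsock, meaning both must run on the same machine as the main container.
Then I run "docker run main", "docker run imklog", and "docker run imuxsock" in turn to create the three containers, which all enter Swarm's pending-scheduling queue. Next, the main container and the imklog container are dequeued one after the other and scheduled onto node-2 (this is entirely possible).
But when the imuxsock container is dequeued, Swarm is stuck: node-2 has only 0.5 GB left, which is not enough to run imuxsock, yet the affinity=main constraint says imuxsock can only run on node-2. This is a classic example of gang scheduling not being handled properly.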
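The failure can be reproduced with a toy model that places containers one at a time. This is a sketch, not Docker Swarm's actual scheduler: the best-fit rule (pick the fullest node that still fits) is an assumption chosen because it sends main to node-2, as in the scenario above, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    free_gb: float  # currently available memory on the node


def pick_node(nodes: List[Node], need_gb: float, pinned: Optional[str]) -> Optional[Node]:
    """Choose a node for ONE container.

    If `pinned` is set (an affinity constraint), only that node is eligible.
    Otherwise use best-fit (an assumed policy): the node with the least free
    memory that still fits.
    """
    candidates = [
        n for n in nodes
        if n.free_gb >= need_gb and (pinned is None or n.name == pinned)
    ]
    return min(candidates, key=lambda n: n.free_gb, default=None)


def schedule_one_by_one(
    nodes: List[Node], containers: List[Tuple[str, float, Optional[str]]]
) -> Dict[str, Optional[str]]:
    """Dequeue containers in order and place each one independently.

    `containers` holds (name, memory_gb, affinity_target_or_None).
    Returns {container: node_name}, with None for containers that cannot run.
    """
    placement: Dict[str, Optional[str]] = {}
    for name, need_gb, affinity in containers:
        pinned = None
        if affinity is not None:
            pinned = placement.get(affinity)
            if pinned is None:  # the container we must follow was never placed
                placement[name] = None
                continue
        node = pick_node(nodes, need_gb, pinned)
        if node is None:  # no eligible node has enough memory left
            placement[name] = None
            continue
        node.free_gb -= need_gb
        placement[name] = node.name
    return placement


if __name__ == "__main__":
    cluster = [Node("node-1", 3.0), Node("node-2", 2.5)]
    rsyslog = [("main", 1.0, None), ("imklog", 1.0, "main"), ("imuxsock", 1.0, "main")]
    # main and imklog land on node-2, imuxsock gets None, node-1 stays empty.
    print(schedule_one_by_one(cluster, rsyslog))
```

Each decision is locally valid, yet the group as a whole cannot be placed, even though node-1 alone could have held all three containers.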
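Industry and academia have debated this problem for a long time and produced many candidate solutions.
For example, Mesos has a resource hoarding mechanism: it waits until all tasks that carry an Affinity constraint have arrived, and only then schedules them together. Google's Omega paper instead proposed optimistic scheduling to handle conflicts: ignore conflicts at first, and resolve them after they occur through a carefully designed rollback mechanism.
None of these approaches is perfect. Resource hoarding brings an unavoidable loss of scheduling efficiency and the possibility of deadlock, while optimistic scheduling is more complex than a typical engineering team can handle.
In Kubernetes, however, the problem disappears: the Pod is the smallest scheduling unit in Kubernetes, so the scheduler naturally binds the Pod to node-1, whose 3 GB of available memory fits the whole Pod, and never considers node-2 at all.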
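Treating the Pod as the unit means the feasibility check uses the sum of all container requests. The sketch below illustrates that idea only; kube-scheduler really filters nodes and then scores them with a configurable set of plugins, so the "most free memory" tie-break here is an assumption, and the function name is hypothetical.

```python
from typing import Dict, List, Optional


def schedule_pod(free_gb: Dict[str, float], requests_gb: List[float]) -> Optional[str]:
    """Schedule a whole Pod as one unit.

    A node is feasible only if it can hold the SUM of all container requests,
    so the Pod's containers are placed together or not at all.
    """
    total = sum(requests_gb)
    feasible = [node for node, free in free_gb.items() if free >= total]
    # Among feasible nodes, prefer the one with the most free memory
    # (an assumed scoring rule, for illustration only).
    return max(feasible, key=lambda node: free_gb[node], default=None)


if __name__ == "__main__":
    nodes = {"node-1": 3.0, "node-2": 2.5}
    print(schedule_pod(nodes, [1.0, 1.0, 1.0]))  # prints node-1
```

Because node-2 (2.5 GB) fails the 3 GB check up front, the half-placed state from the Swarm example can never arise.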
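We call such tight cooperation between containers a "super-affinity" relationship. Typical traits of containers with a super-affinity relationship include, but are not limited to: exchanging files directly with each other; communicating locally over localhost or Socket files; making very frequent remote calls to each other; and needing to share certain Linux Namespaces (for example, one container joining another container's Linux Namespace).
This also means that not every pair of related containers belongs in the same Pod. For example, a PHP container and a MySQL container do talk to each other, but they neither need to nor should be deployed in the same Pod; they are better off as two Pods.
If the only goal were to solve the scheduling problem for super-affinity relationships, Kubernetes could certainly have solved it at the scheduler level, with the Borg and Omega papers as excellent precedents.
However, the Pod has a more important meaning in Kubernetes: container design patterns.
To understand this, we first need to look at how Pods are implemented.
The most important fact about a Pod is that it is only a logical concept. What Kubernetes actually manipulates are still the Namespaces and Cgroups of Linux containers on the host operating system; there is no such thing as a Pod boundary or an isolated Pod environment. A Pod is really just a group of containers that share certain resources.
Specifically, all containers in a Pod share the same Network Namespace, and they can declare that they mount the same Volume.
Seen this way, isn't a Pod with two containers A and B just one container (A) sharing the network and Volumes of another container (B)? It seems this could be done with docker run's --net and --volumes-from options: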
docker run --net=container:B --volumes-from=B --name=A image-A ...
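But consider this: if it really worked that way, container B would have to start before container A, so the containers in a Pod would not be peers but would form a topological (dependency) relationship.
So in Kubernetes a Pod is implemented with an intermediate container called the Infra container. In each Pod, the Infra container is always the first container created, and the other user-defined containers are associated with it by joining its Network Namespace.
Take a Pod with two user containers, A and B, plus an Infra container. Understandably, the Infra container in Kubernetes must consume as few resources as possible, so it uses a very special image called k8s.gcr.io/pause. This image is a container written in assembly that stays paused forever, and its uncompressed size is only about 100-200 KB.
Once the Infra container holds the Network Namespace, the user containers can join that Network Namespace.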
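The Infra pattern can be imitated by hand with plain Docker. The sketch below only builds the `docker run` argument lists (it does not execute them); the pod name "demo", the `<pod>-infra` naming scheme, and the `k8s.gcr.io/pause:3.1` tag are assumptions, and image-A / image-B are the placeholders from the text. It is an approximation of the idea, not what kubelet actually runs.

```python
from typing import List, Tuple


def pod_like_docker_commands(
    pod: str, containers: List[Tuple[str, str]], infra_image: str
) -> List[List[str]]:
    """Return `docker run` argv lists that imitate a Pod's Infra-container pattern.

    The infra container is started first and owns the network namespace;
    every user container then joins it via --net=container:<infra>, so user
    containers depend only on the infra container, not on each other.
    `containers` is a list of (name, image) pairs.
    """
    infra = f"{pod}-infra"
    commands = [["docker", "run", "-d", "--name", infra, infra_image]]
    for name, image in containers:
        commands.append([
            "docker", "run", "-d",
            "--name", f"{pod}-{name}",
            f"--net=container:{infra}",  # join the infra container's netns
            image,
        ])
    return commands


if __name__ == "__main__":
    for argv in pod_like_docker_commands(
        "demo", [("A", "image-A"), ("B", "image-B")], "k8s.gcr.io/pause:3.1"
    ):
        print(" ".join(argv))
        # To actually run them: subprocess.run(argv, check=True)
```

Once the infra container exists, A and B can be started in either order, which restores the peer relationship that `--net=container:B` alone could not give.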
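Pod controllers
ReplicaSet: creates the specified number of Pod replicas on the user's behalf and keeps the replica count at the desired number, deleting extras and creating missing ones; it supports scaling out and in. Using it directly is not recommended: rolling updates are provided by a Deployment, which manages ReplicaSets.
Top-level help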
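"Delete extras, create missing ones" is a control loop that compares desired and observed state. Below is a minimal sketch of one reconcile pass, assuming an equality-only selector and a hypothetical tie-break (delete the surplus pods with the highest names); the real ReplicaSet controller ranks pods (for example, preferring not-ready ones) when choosing what to delete.

```python
from typing import Dict, List, Tuple


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Equality-based selector: every selector key/value must appear in labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def reconcile(
    desired: int, selector: Dict[str, str], pods: Dict[str, Dict[str, str]]
) -> Tuple[int, List[str]]:
    """One pass of a ReplicaSet-style control loop.

    `pods` maps pod name -> labels. Returns (how_many_to_create,
    names_to_delete) so that the number of matching pods converges to desired.
    """
    owned = sorted(name for name, labels in pods.items() if matches(selector, labels))
    diff = desired - len(owned)
    if diff > 0:
        return diff, []          # too few: create the missing replicas
    return 0, owned[desired:]    # too many: delete the surplus (empty if exact)


if __name__ == "__main__":
    pods = {"web-a": {"app": "web"}, "web-b": {"app": "web"}, "db-0": {"app": "db"}}
    print(reconcile(3, {"app": "web"}, pods))  # one replica missing
    print(reconcile(1, {"app": "web"}, pods))  # one replica too many
```

A controller runs this comparison repeatedly, so a deleted or crashed pod is simply "one too few" on the next pass.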
[root@master manifests]# kubectl explain rs
KIND:     ReplicaSet
VERSION:  extensions/v1beta1

DESCRIPTION:
     DEPRECATED - This group version of ReplicaSet is deprecated by apps/v1beta2/ReplicaSet. See the release notes for more information. ReplicaSet ensures that a specified number of pod replicas are running at any given time.

FIELDS:
   apiVersion   <string>
     APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources

   kind   <string>
     Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds

   metadata   <Object>   (controller metadata)
     If the Labels of a ReplicaSet are empty, they are defaulted to be the same as the Pod(s) that the ReplicaSet manages. Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec   <Object>   (the controller's definition)
     Spec defines the specification of the desired behavior of the ReplicaSet. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status   <Object>
     Status is the most recently observed status of the ReplicaSet. This data may be out of date by some window of time. Populated by the system. Read-only. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
Controller metadata fields
[root@master manifests]# kubectl explain rs.metadata
KIND:     ReplicaSet
VERSION:  extensions/v1beta1

RESOURCE: metadata <Object>

DESCRIPTION:
     If the Labels of a ReplicaSet are empty, they are defaulted to be the same as the Pod(s) that the ReplicaSet manages. Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

     ObjectMeta is metadata that all persisted resources must have, which includes all objects users must create.

FIELDS:
   annotations   <map[string]string>
     Annotations is an unstructured key value map stored with a resource that may be set by external tools to store and retrieve arbitrary metadata. They are not queryable and should be preserved when modifying objects. More info: http://kubernetes.io/docs/user-guide/annotations

   clusterName   <string>
     The name of the cluster which the object belongs to. This is used to distinguish resources with same name and namespace in different clusters. This field is not set anywhere right now and apiserver is going to ignore it if set in create or update request.

   creationTimestamp   <string>
     CreationTimestamp is a timestamp representing the server time when this object was created. It is not guaranteed to be set in happens-before order across separate operations. Clients may not set this value. It is represented in RFC3339 form and is in UTC. Populated by the system. Read-only. Null for lists. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

   deletionGracePeriodSeconds   <integer>
     Number of seconds allowed for this object to gracefully terminate before it will be removed from the system. Only set when deletionTimestamp is also set. May only be shortened. Read-only.

   deletionTimestamp   <string>
     DeletionTimestamp is RFC 3339 date and time at which this resource will be deleted. This field is set by the server when a graceful deletion is requested by the user, and is not directly settable by a client. The resource is expected to be deleted (no longer visible from resource lists, and not reachable by name) after the time in this field, once the finalizers list is empty. As long as the finalizers list contains items, deletion is blocked. Once the deletionTimestamp is set, this value may not be unset or be set further into the future, although it may be shortened or the resource may be deleted prior to this time. For example, a user may request that a pod is deleted in 30 seconds. The Kubelet will react by sending a graceful termination signal to the containers in the pod. After that 30 seconds, the Kubelet will send a hard termination signal (SIGKILL) to the container and after cleanup, remove the pod from the API. In the presence of network partitions, this object may still exist after this timestamp, until an administrator or automated process can determine the resource is fully terminated. If not set, graceful deletion of the object has not been requested. Populated by the system when a graceful deletion is requested. Read-only. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

   finalizers   <[]string>
     Must be empty before the object is deleted from the registry. Each entry is an identifier for the responsible component that will remove the entry from the list. If the deletionTimestamp of the object is non-nil, entries in this list can only be removed.

   generateName   <string>
     GenerateName is an optional prefix, used by the server, to generate a unique name ONLY IF the Name field has not been provided. If this field is used, the name returned to the client will be different than the name passed. This value will also be combined with a unique suffix. The provided value has the same validation rules as the Name field, and may be truncated by the length of the suffix required to make the value unique on the server. If this field is specified and the generated name exists, the server will NOT return a 409 - instead, it will either return 201 Created or 500 with Reason ServerTimeout indicating a unique name could not be found in the time allotted, and the client should retry (optionally after the time indicated in the Retry-After header). Applied only if Name is not specified. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#idempotency

   generation   <integer>
     A sequence number representing a specific generation of the desired state. Populated by the system. Read-only.

   initializers   <Object>
     An initializer is a controller which enforces some system invariant at object creation time. This field is a list of initializers that have not yet acted on this object. If nil or empty, this object has been completely initialized. Otherwise, the object is considered uninitialized and is hidden (in list/watch and get calls) from clients that haven't explicitly asked to observe uninitialized objects. When an object is created, the system will populate this list with the current set of initializers. Only privileged users may set or modify this list. Once it is empty, it may not be modified further by any user. DEPRECATED - initializers are an alpha field and will be removed in v1.15.

   labels   <map[string]string>
     Map of string keys and values that can be used to organize and categorize (scope and select) objects. May match selectors of replication controllers and services. More info: http://kubernetes.io/docs/user-guide/labels

   managedFields   <[]Object>
     ManagedFields maps workflow-id and version to the set of fields that are managed by that workflow. This is mostly for internal housekeeping, and users typically shouldn't need to set or understand this field. A workflow can be the user's name, a controller's name, or the name of a specific apply path like "ci-cd". The set of fields is always in the version that the workflow used when modifying the object. This field is alpha and can be changed or removed without notice.

   name   <string>   (name)
     Name must be unique within a namespace. Is required when creating resources, although some resources may allow a client to request the generation of an appropriate name automatically. Name is primarily intended for creation idempotence and configuration definition. Cannot be updated. More info: http://kubernetes.io/docs/user-guide/identifiers#names

   namespace   <string>   (the namespace it belongs to)
     Namespace defines the space within each name must be unique. An empty namespace is equivalent to the "default" namespace, but "default" is the canonical representation. Not all objects are required to be scoped to a namespace - the value of this field for those objects will be empty. Must be a DNS_LABEL. Cannot be updated. More info: http://kubernetes.io/docs/user-guide/namespaces

   ownerReferences   <[]Object>
     List of objects depended by this object. If ALL objects in the list have been deleted, this object will be garbage collected. If this object is managed by a controller, then an entry in this list will point to this controller, with the controller field set to true. There cannot be more than one managing controller.

   resourceVersion   <string>
     An opaque value that represents the internal version of this object that can be used by clients to determine when objects have changed. May be used for optimistic concurrency, change detection, and the watch operation on a resource or set of resources. Clients must treat these values as opaque and passed unmodified back to the server. They may only be valid for a particular resource or set of resources. Populated by the system. Read-only. Value must be treated as opaque by clients and . More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#concurrency-control-and-consistency

   selfLink   <string>
     SelfLink is a URL representing this object. Populated by the system. Read-only.

   uid   <string>
     UID is the unique in time and space value for this object. It is typically generated by the server on successful creation of a resource and is not allowed to change on PUT operations. Populated by the system. Read-only. More info: http://kubernetes.io/docs/user-guide/identifiers#uids
Controller spec (desired state) fields
[root@master manifests]# kubectl explain rs.spec
KIND:     ReplicaSet
VERSION:  extensions/v1beta1

RESOURCE: spec <Object>

DESCRIPTION:
     Spec defines the specification of the desired behavior of the ReplicaSet. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

     ReplicaSetSpec is the specification of a ReplicaSet.

FIELDS:
   minReadySeconds   <integer>
     Minimum number of seconds for which a newly created pod should be ready without any of its container crashing, for it to be considered available. Defaults to 0 (pod will be considered available as soon as it is ready)

   replicas   <integer>   (number of replicas)
     Replicas is the number of desired replicas. This is a pointer to distinguish between explicit zero and unspecified. Defaults to 1. More info: https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/#what-is-a-replicationcontroller

   selector   <Object>   (label selector)
     Selector is a label query over pods that should match the replica count. If the selector is empty, it is defaulted to the labels present on the pod template. Label keys and values that must match in order to be controlled by this replica set. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

   template   <Object>   (Pod definition)
     Template is the object that describes the pod that will be created if insufficient replicas are detected. More info: https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller#pod-template
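The `selector` field is a label query: the ReplicaSet counts only the pods whose labels satisfy it. Below is a sketch of how such a LabelSelector is evaluated, assuming the dict shape of the apps/v1 `selector` (`matchLabels` plus `matchExpressions` with the operators In, NotIn, Exists, DoesNotExist); all requirements are ANDed. It is an illustration, not the Kubernetes labels package.

```python
from typing import Dict


def selector_matches(selector: dict, labels: Dict[str, str]) -> bool:
    """Evaluate a LabelSelector-shaped dict against a pod's labels."""
    # matchLabels: every key must be present with exactly this value.
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    # matchExpressions: set-based requirements.
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values  # missing key matches
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


if __name__ == "__main__":
    sel = {
        "matchLabels": {"app": "myapp"},
        "matchExpressions": [
            {"key": "release", "operator": "In", "values": ["canary", "stable"]}
        ],
    }
    print(selector_matches(sel, {"app": "myapp", "release": "canary"}))  # True
    print(selector_matches(sel, {"app": "myapp"}))                       # False
```

The label values "myapp", "canary", and "stable" are hypothetical examples.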
Pod template fields
[root@master manifests]# kubectl explain rs.spec.template
KIND:     ReplicaSet
VERSION:  extensions/v1beta1

RESOURCE: template <Object>

DESCRIPTION:
     Template is the object that describes the pod that will be created if insufficient replicas are detected. More info: https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller#pod-template

     PodTemplateSpec describes the data a pod should have when created from a template

FIELDS:
   metadata   <Object>   (metadata)
     Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

   spec   <Object>   (desired state of the Pod)
     Specification of the desired behavior of the pod. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status
Pod template metadata fields
[root@master manifests]# kubectl explain rs.spec.template.metadata
KIND:     ReplicaSet
VERSION:  extensions/v1beta1

RESOURCE: metadata <Object>

DESCRIPTION:
     Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

     ObjectMeta is metadata that all persisted resources must have, which includes all objects users must create.

FIELDS:
   annotations   <map[string]string>
     Annotations is an unstructured key value map stored with a resource that may be set by external tools to store and retrieve arbitrary metadata. They are not queryable and should be preserved when modifying objects. More info: http://kubernetes.io/docs/user-guide/annotations

   clusterName   <string>
     The name of the cluster which the object belongs to. This is used to distinguish resources with same name and namespace in different clusters. This field is not set anywhere right now and apiserver is going to ignore it if set in create or update request.

   creationTimestamp   <string>
     CreationTimestamp is a timestamp representing the server time when this object was created. It is not guaranteed to be set in happens-before order across separate operations. Clients may not set this value. It is represented in RFC3339 form and is in UTC. Populated by the system. Read-only. Null for lists. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

   deletionGracePeriodSeconds   <integer>
     Number of seconds allowed for this object to gracefully terminate before it will be removed from the system. Only set when deletionTimestamp is also set. May only be shortened. Read-only.

   deletionTimestamp   <string>
     DeletionTimestamp is RFC 3339 date and time at which this resource will be deleted. This field is set by the server when a graceful deletion is requested by the user, and is not directly settable by a client. The resource is expected to be deleted (no longer visible from resource lists, and not reachable by name) after the time in this field, once the finalizers list is empty. As long as the finalizers list contains items, deletion is blocked. Once the deletionTimestamp is set, this value may not be unset or be set further into the future, although it may be shortened or the resource may be deleted prior to this time. For example, a user may request that a pod is deleted in 30 seconds. The Kubelet will react by sending a graceful termination signal to the containers in the pod. After that 30 seconds, the Kubelet will send a hard termination signal (SIGKILL) to the container and after cleanup, remove the pod from the API. In the presence of network partitions, this object may still exist after this timestamp, until an administrator or automated process can determine the resource is fully terminated. If not set, graceful deletion of the object has not been requested. Populated by the system when a graceful deletion is requested. Read-only. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

   finalizers   <[]string>
     Must be empty before the object is deleted from the registry. Each entry is an identifier for the responsible component that will remove the entry from the list. If the deletionTimestamp of the object is non-nil, entries in this list can only be removed.

   generateName   <string>
     GenerateName is an optional prefix, used by the server, to generate a unique name ONLY IF the Name field has not been provided. If this field is used, the name returned to the client will be different than the name passed. This value will also be combined with a unique suffix. The provided value has the same validation rules as the Name field, and may be truncated by the length of the suffix required to make the value unique on the server. If this field is specified and the generated name exists, the server will NOT return a 409 - instead, it will either return 201 Created or 500 with Reason ServerTimeout indicating a unique name could not be found in the time allotted, and the client should retry (optionally after the time indicated in the Retry-After header). Applied only if Name is not specified. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#idempotency

   generation   <integer>
     A sequence number representing a specific generation of the desired state. Populated by the system. Read-only.

   initializers   <Object>
     An initializer is a controller which enforces some system invariant at object creation time. This field is a list of initializers that have not yet acted on this object. If nil or empty, this object has been completely initialized. Otherwise, the object is considered uninitialized and is hidden (in list/watch and get calls) from clients that haven't explicitly asked to observe uninitialized objects. When an object is created, the system will populate this list with the current set of initializers. Only privileged users may set or modify this list. Once it is empty, it may not be modified further by any user. DEPRECATED - initializers are an alpha field and will be removed in v1.15.

   labels   <map[string]string>   (label definitions)
     Map of string keys and values that can be used to organize and categorize (scope and select) objects. May match selectors of replication controllers and services. More info: http://kubernetes.io/docs/user-guide/labels

   managedFields   <[]Object>
     ManagedFields maps workflow-id and version to the set of fields that are managed by that workflow. This is mostly for internal housekeeping, and users typically shouldn't need to set or understand this field. A workflow can be the user's name, a controller's name, or the name of a specific apply path like "ci-cd". The set of fields is always in the version that the workflow used when modifying the object. This field is alpha and can be changed or removed without notice.

   name   <string>
     Name must be unique within a namespace. Is required when creating resources, although some resources may allow a client to request the generation of an appropriate name automatically. Name is primarily intended for creation idempotence and configuration definition. Cannot be updated. More info: http://kubernetes.io/docs/user-guide/identifiers#names

   namespace   <string>
     Namespace defines the space within each name must be unique. An empty namespace is equivalent to the "default" namespace, but "default" is the canonical representation. Not all objects are required to be scoped to a namespace - the value of this field for those objects will be empty. Must be a DNS_LABEL. Cannot be updated. More info: http://kubernetes.io/docs/user-guide/namespaces

   ownerReferences   <[]Object>
     List of objects depended by this object. If ALL objects in the list have been deleted, this object will be garbage collected. If this object is managed by a controller, then an entry in this list will point to this controller, with the controller field set to true. There cannot be more than one managing controller.

   resourceVersion   <string>
     An opaque value that represents the internal version of this object that can be used by clients to determine when objects have changed. May be used for optimistic concurrency, change detection, and the watch operation on a resource or set of resources. Clients must treat these values as opaque and passed unmodified back to the server. They may only be valid for a particular resource or set of resources. Populated by the system. Read-only. Value must be treated as opaque by clients and . More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#concurrency-control-and-consistency

   selfLink   <string>
     SelfLink is a URL representing this object. Populated by the system. Read-only.

   uid   <string>
     UID is the unique in time and space value for this object. It is typically generated by the server on successful creation of a resource and is not allowed to change on PUT operations. Populated by the system. Read-only. More info: http://kubernetes.io/docs/user-guide/identifiers#uids
Pod spec (desired state) fields
[root@master manifests]# kubectl explain rs.spec.template.spec
KIND:     ReplicaSet
VERSION:  extensions/v1beta1

RESOURCE: spec <Object>

DESCRIPTION:
     Specification of the desired behavior of the pod. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status

     PodSpec is a description of a pod.

FIELDS:
   activeDeadlineSeconds   <integer>
     Optional duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers. Value must be a positive integer.

   affinity   <Object>
     If specified, the pod's scheduling constraints

   automountServiceAccountToken   <boolean>
     AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.

   containers   <[]Object> -required-   (container definitions)
     List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.

   dnsConfig   <Object>
     Specifies the DNS parameters of a pod. Parameters specified here will be merged to the generated DNS configuration based on DNSPolicy.

   dnsPolicy   <string>
     Set DNS policy for the pod. Defaults to "ClusterFirst". Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'. DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy. To have DNS options set along with hostNetwork, you have to specify DNS policy explicitly to 'ClusterFirstWithHostNet'.

   enableServiceLinks   <boolean>
     EnableServiceLinks indicates whether information about services should be injected into pod's environment variables, matching the syntax of Docker links. Optional: Defaults to true.

   hostAliases   <[]Object>
     HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts file if specified. This is only valid for non-hostNetwork pods.

   hostIPC   <boolean>
     Use the host's ipc namespace. Optional: Default to false.

   hostNetwork   <boolean>
     Host networking requested for this pod. Use the host's network namespace. If this option is set, the ports that will be used must be specified. Default to false.

   hostPID   <boolean>
     Use the host's pid namespace. Optional: Default to false.

   hostname   <string>
     Specifies the hostname of the Pod If not specified, the pod's hostname will be set to a system-defined value.

   imagePullSecrets   <[]Object>
     ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored. More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod

   initContainers   <[]Object>
     List of initialization containers belonging to the pod. Init containers are executed in order prior to containers being started. If any init container fails, the pod is considered to have failed and is handled according to its restartPolicy. The name for an init container or normal container must be unique among all containers. Init containers may not have Lifecycle actions, Readiness probes, or Liveness probes. The resourceRequirements of an init container are taken into account during scheduling by finding the highest request/limit for each resource type, and then using the max of of that value or the sum of the normal containers. Limits are applied to init containers in a similar fashion. Init containers cannot currently be added or removed. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/

   nodeName   <string>
     NodeName is a request to schedule this pod onto a specific node. If it is non-empty, the scheduler simply schedules this pod onto that node, assuming that it fits resource requirements.

   nodeSelector   <map[string]string>
     NodeSelector is a selector which must be true for the pod to fit on a node. Selector which must match a node's labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/

   preemptionPolicy   <string>
     PreemptionPolicy is the Policy for preempting pods with lower priority. One of Never, PreemptLowerPriority. Defaults to PreemptLowerPriority if unset. This field is alpha-level and is only honored by servers that enable the NonPreemptingPriority feature.

   priority   <integer>
     The priority value. Various system components use this field to find the priority of the pod. When Priority Admission Controller is enabled, it prevents users from setting this field. The admission controller populates this field from PriorityClassName. The higher the value, the higher the priority.

   priorityClassName   <string>
     If specified, indicates the pod's priority. "system-node-critical" and "system-cluster-critical" are two special keywords which indicate the highest priorities with the former being the highest priority. Any other name must be defined by creating a PriorityClass object with that name. If not specified, the pod priority will be default or zero if there is no default.

   readinessGates   <[]Object>
     If specified, all readiness gates will be evaluated for pod readiness. A pod is ready when all its containers are ready AND all conditions specified in the readiness gates have status equal to "True" More info: https://git.k8s.io/enhancements/keps/sig-network/0007-pod-ready%2B%2B.md

   restartPolicy   <string>
     Restart policy for all containers within the pod. One of Always, OnFailure, Never. Default to Always. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy

   runtimeClassName   <string>
     RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run. If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an empty definition that uses the default runtime handler. More info: https://git.k8s.io/enhancements/keps/sig-node/runtime-class.md This is a beta feature as of Kubernetes v1.14.

   schedulerName   <string>
     If specified, the pod will be dispatched by specified scheduler. If not specified, the pod will be dispatched by default scheduler.

   securityContext   <Object>
     SecurityContext holds pod-level security attributes and common container settings. Optional: Defaults to empty. See type description for default values of each field.

   serviceAccount   <string>
     DeprecatedServiceAccount is a depreciated alias for ServiceAccountName. Deprecated: Use serviceAccountName instead.

   serviceAccountName   <string>
     ServiceAccountName is the name of the ServiceAccount to use to run this pod. More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/

   shareProcessNamespace   <boolean>
     Share a single process namespace between all of the containers in a pod. When this is set containers will be able to view and signal processes from other containers in the same pod, and the first process in each container will not be assigned PID 1. HostPID and ShareProcessNamespace cannot both be set. Optional: Default to false. This field is beta-level and may be disabled with the PodShareProcessNamespace feature.

   subdomain   <string>
     If specified, the fully qualified Pod hostname will be "<hostname>.<subdomain>.<pod namespace>.svc.<cluster domain>". If not specified, the pod will not have a domainname at all.

   terminationGracePeriodSeconds   <integer>
     Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request. Value must be non-negative integer. The value zero indicates delete immediately. If this value is nil, the default grace period will be used instead. The grace period is the duration in seconds after the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process. Defaults to 30 seconds.

   tolerations   <[]Object>
     If specified, the pod's tolerations.

   volumes   <[]Object>
     List of volumes that can be mounted by containers belonging to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes
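Putting the fields above together (`replicas`, `selector`, `template.metadata.labels`, and `containers` with `name`/`image`), a ReplicaSet object can be sketched as a plain dict and serialized to JSON, which `kubectl apply -f` accepts. The output above comes from the deprecated extensions/v1beta1 group, so this sketch assumes apps/v1 (available since Kubernetes 1.9), where `selector` is required and must match the template's labels. The names `myapp`, `myapp-container`, the `release` label, and the image `nginx:1.17` are hypothetical, and `validate` is a client-side check of two rules only, not the API server's full validation.

```python
import json
from typing import Dict, Optional


def replicaset_manifest(
    name: str,
    replicas: int,
    labels: Dict[str, str],
    container_name: str,
    image: str,
    port: Optional[int] = None,
    namespace: str = "default",
) -> dict:
    """Build an apps/v1 ReplicaSet as a plain dict (serializable to JSON)."""
    container = {
        "name": container_name,  # containers[].name is required
        "image": image,
        "imagePullPolicy": "IfNotPresent",
    }
    if port is not None:
        container["ports"] = [{"name": "http", "containerPort": port}]
    return {
        "apiVersion": "apps/v1",
        "kind": "ReplicaSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": replicas,
            # Separate copies, so editing one does not silently edit the other.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                # restartPolicy is left unset: it defaults to Always, the only
                # value a ReplicaSet's pod template accepts.
                "spec": {"containers": [container]},
            },
        },
    }


def validate(rs: dict) -> None:
    """Check that matchLabels is satisfied by the template and a container exists."""
    spec = rs["spec"]
    selector = spec["selector"].get("matchLabels", {})
    template_labels = spec["template"]["metadata"].get("labels", {})
    if any(template_labels.get(k) != v for k, v in selector.items()):
        raise ValueError("selector does not match template labels")
    if not spec["template"]["spec"].get("containers"):
        raise ValueError("a Pod needs at least one container")


if __name__ == "__main__":
    rs = replicaset_manifest("myapp", 2, {"app": "myapp", "release": "canary"},
                             "myapp-container", "nginx:1.17", port=80)
    validate(rs)
    print(json.dumps(rs, indent=2))  # save as rs.json, then: kubectl apply -f rs.json
```

Deriving the selector from the same label dict as the template is a design choice that makes the "selector must match template labels" rule hard to violate by accident.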
pod的容器的相关定义
[root@master manifests]# kubectl explain rs.spec.template.spec.containers
KIND:     ReplicaSet
VERSION:  extensions/v1beta1

RESOURCE: containers <[]Object>

DESCRIPTION:
     List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.

     A single application container that you want to run within a pod.

FIELDS:
   args  <[]string>
     Arguments to the entrypoint. The docker image's CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container's environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Cannot be updated. More info: https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#running-a-command-in-a-shell

   command  <[]string>
     Entrypoint array. Not executed within a shell. The docker image's ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container's environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. Cannot be updated. More info: https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#running-a-command-in-a-shell

   env  <[]Object>    # environment variables defined in the container
     List of environment variables to set in the container. Cannot be updated.

   envFrom  <[]Object>
     List of sources to populate environment variables in the container. The keys defined within a source must be a C_IDENTIFIER. All invalid keys will be reported as an event when the container is starting. When a key exists in multiple sources, the value associated with the last source will take precedence. Values defined by an Env with a duplicate key will take precedence. Cannot be updated.

   image  <string>    # container image to use
     Docker image name. More info: https://kubernetes.io/docs/concepts/containers/images This field is optional to allow higher level config management to default or override container images in workload controllers like Deployments and StatefulSets.

   imagePullPolicy  <string>    # policy for pulling the image
     Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always if :latest tag is specified, or IfNotPresent otherwise. Cannot be updated. More info: https://kubernetes.io/docs/concepts/containers/images#updating-images

   lifecycle  <Object>
     Actions that the management system should take in response to container lifecycle events. Cannot be updated.

   livenessProbe  <Object>
     Periodic probe of container liveness. Container will be restarted if the probe fails. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

   name  <string> -required-    # container name
     Name of the container specified as a DNS_LABEL. Each container in a pod must have a unique name (DNS_LABEL). Cannot be updated.

   ports  <[]Object>    # ports to expose
     List of ports to expose from the container. Exposing a port here gives the system additional information about the network connections a container uses, but is primarily informational. Not specifying a port here DOES NOT prevent that port from being exposed. Any port which is listening on the default "0.0.0.0" address inside a container will be accessible from the network. Cannot be updated.

   readinessProbe  <Object>
     Periodic probe of container service readiness. Container will be removed from service endpoints if the probe fails. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

   resources  <Object>
     Compute Resources required by this container. Cannot be updated. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

   securityContext  <Object>
     Security options the pod should run with. More info: https://kubernetes.io/docs/concepts/policy/security-context/ More info: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

   stdin  <boolean>
     Whether this container should allocate a buffer for stdin in the container runtime. If this is not set, reads from stdin in the container will always result in EOF. Default is false.

   stdinOnce  <boolean>
     Whether the container runtime should close the stdin channel after it has been opened by a single attach. When stdin is true the stdin stream will remain open across multiple attach sessions. If stdinOnce is set to true, stdin is opened on container start, is empty until the first client attaches to stdin, and then remains open and accepts data until the client disconnects, at which time stdin is closed and remains closed until the container is restarted. If this flag is false, a container processes that reads from stdin will never receive an EOF. Default is false

   terminationMessagePath  <string>
     Optional: Path at which the file to which the container's termination message will be written is mounted into the container's filesystem. Message written is intended to be brief final status, such as an assertion failure message. Will be truncated by the node if greater than 4096 bytes. The total message length across all containers will be limited to 12kb. Defaults to /dev/termination-log. Cannot be updated.

   terminationMessagePolicy  <string>
     Indicate how the termination message should be populated. File will use the contents of terminationMessagePath to populate the container status message on both success and failure. FallbackToLogsOnError will use the last chunk of container log output if the termination message file is empty and the container exited with an error. The log output is limited to 2048 bytes or 80 lines, whichever is smaller. Defaults to File. Cannot be updated.

   tty  <boolean>
     Whether this container should allocate a TTY for itself, also requires 'stdin' to be true. Default is false.

   volumeDevices  <[]Object>
     volumeDevices is the list of block devices to be used by the container. This is a beta feature.

   volumeMounts  <[]Object>
     Pod volumes to mount into the container's filesystem. Cannot be updated.

   workingDir  <string>
     Container's working directory. If not specified, the container runtime's default will be used, which might be configured in the container image. Cannot be updated.
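The `$(VAR_NAME)` rules described for `args` and `command` can be sketched as a small expander. This is a simplified model written from the description above, not Kubernetes' own code; `expand` is a hypothetical name, and treating every `$$` as an escaped literal `$` is an assumption that generalizes the `$$(VAR_NAME)` case the docs mention.

```python
from typing import Mapping


def expand(text: str, env: Mapping[str, str]) -> str:
    """Expand $(NAME) references using env, following the explain text.

    - $(NAME) becomes env[NAME] when NAME is defined.
    - An unresolvable $(NAME) is left unchanged.
    - $$ is an escape for a literal $, so $$(NAME) yields the text $(NAME).
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "$" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == "$":
                out.append("$")  # escaped dollar sign, never expanded
                i += 2
                continue
            if nxt == "(":
                end = text.find(")", i + 2)
                if end != -1:
                    name = text[i + 2:end]
                    # Substitute the value, or keep the reference verbatim.
                    out.append(env.get(name, text[i:end + 1]))
                    i = end + 1
                    continue
        out.append(ch)
        i += 1
    return "".join(out)
```

With `env = {"GREETING": "hi"}`, the argument `echo $(GREETING) $(MISSING) $$(GREETING)` expands to `echo hi $(MISSING) $(GREETING)`.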
Parameters for the ports exposed by a container in the pod
[root@master manifests]# kubectl explain rs.spec.template.spec.containers.ports
KIND:     ReplicaSet
VERSION:  extensions/v1beta1

RESOURCE: ports <[]Object>

DESCRIPTION:
     List of ports to expose from the container. Exposing a port here gives the system additional information about the network connections a container uses, but is primarily informational. Not specifying a port here DOES NOT prevent that port from being exposed. Any port which is listening on the default "0.0.0.0" address inside a container will be accessible from the network. Cannot be updated.

     ContainerPort represents a network port in a single container.

FIELDS:
   containerPort  <integer> -required-    # port inside the container
     Number of port to expose on the pod's IP address. This must be a valid port number, 0 < x < 65536.

   hostIP  <string>
     What host IP to bind the external port to.

   hostPort  <integer>
     Number of port to expose on the host. If specified, this must be a valid port number, 0 < x < 65536. If HostNetwork is specified, this must match ContainerPort. Most containers do not need this.

   name  <string>    # port name
     If specified, this must be an IANA_SVC_NAME and unique within the pod. Each named port in a pod must have a unique name. Name for the port that can be referred to by services.

   protocol  <string>    # protocol
     Protocol for port. Must be UDP, TCP, or SCTP. Defaults to "TCP".
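The constraints listed above (port range, allowed protocols, hostPort matching containerPort under host networking) can be checked with a short validator. This is a minimal sketch: `validate_port` is a hypothetical helper, uniqueness of names across the pod is not checked, and the IANA_SVC_NAME rule encoded here (at most 15 lowercase alphanumerics or '-', at least one letter, no leading, trailing or doubled '-') is a summary of Kubernetes' validation that should be confirmed against the API reference.

```python
import re
from typing import List, Optional

# Summary of the IANA_SVC_NAME check: 1-15 chars of [a-z0-9-],
# no leading or trailing '-', no "--", and at least one letter.
_SVC_NAME = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def validate_port(container_port: int, protocol: str = "TCP",
                  name: Optional[str] = None,
                  host_port: Optional[int] = None,
                  host_network: bool = False) -> List[str]:
    """Return the problems found in one ports[] entry (empty list = valid)."""
    errors: List[str] = []
    if not 0 < container_port < 65536:
        errors.append("containerPort must satisfy 0 < x < 65536")
    if protocol not in ("TCP", "UDP", "SCTP"):
        errors.append("protocol must be TCP, UDP or SCTP")
    if host_port is not None:
        if not 0 < host_port < 65536:
            errors.append("hostPort must satisfy 0 < x < 65536")
        if host_network and host_port != container_port:
            errors.append("with hostNetwork, hostPort must match containerPort")
    if name is not None:
        valid_name = (len(name) <= 15
                      and _SVC_NAME.match(name) is not None
                      and "--" not in name
                      and re.search(r"[a-z]", name) is not None)
        if not valid_name:
            errors.append(f"port name {name!r} is not a valid IANA_SVC_NAME")
    return errors
```

The port used later in this post (`name: http`, `containerPort: 80`) passes: `validate_port(80, name="http")` returns an empty list.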
Definition of the container health probe (livenessProbe)
[root@master manifests]# kubectl explain rs.spec.template.spec.containers.livenessProbe
KIND:     ReplicaSet
VERSION:  extensions/v1beta1

RESOURCE: livenessProbe <Object>

DESCRIPTION:
     Periodic probe of container liveness. Container will be restarted if the probe fails. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

     Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic.

FIELDS:
   exec  <Object>    # probe by running a command
     One and only one of the following should be specified. Exec specifies the action to take.

   failureThreshold  <integer>
     Minimum consecutive failures for the probe to be considered failed after having succeeded. Defaults to 3. Minimum value is 1.

   httpGet  <Object>    # probe with an HTTP request
     HTTPGet specifies the http request to perform.

   initialDelaySeconds  <integer>
     Number of seconds after the container has started before liveness probes are initiated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

   periodSeconds  <integer>
     How often (in seconds) to perform the probe. Default to 10 seconds. Minimum value is 1.

   successThreshold  <integer>
     Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.

   tcpSocket  <Object>    # probe by opening a TCP connection
     TCPSocket specifies an action involving a TCP port. TCP hooks not yet supported

   timeoutSeconds  <integer>
     Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
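To see how `initialDelaySeconds`, `periodSeconds` and `failureThreshold` interact, here is a simplified model that finds when a container would be restarted for a given sequence of probe results. It assumes probes fire exactly every `periodSeconds` starting at `initialDelaySeconds` and ignores `timeoutSeconds` and scheduling jitter; `restart_time` is a hypothetical helper, and the `initial_delay=0` default is an assumption.

```python
from typing import Optional, Sequence


def restart_time(results: Sequence[bool], initial_delay: int = 0,
                 period: int = 10, failure_threshold: int = 3) -> Optional[int]:
    """Second (since container start) at which a restart would be triggered.

    results[i] is the outcome of the i-th probe, which this model assumes
    runs at initial_delay + i * period. Returns None if the number of
    consecutive failures never reaches failure_threshold.
    """
    consecutive_failures = 0
    for i, ok in enumerate(results):
        if ok:
            consecutive_failures = 0  # successThreshold is 1 for liveness
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            return initial_delay + i * period
    return None
```

With `initialDelaySeconds: 5` and the other defaults, one success followed by three failures (`[True, False, False, False]`) triggers a restart at second 35 in this model; a success in between resets the count.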
Write a YAML manifest for a ReplicaSet (rs) controller and start the pods
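[root@master manifests]# cat rs-01.yaml
apiVersion: apps/v1            # API version
kind: ReplicaSet               # controller object type
metadata:                      # controller metadata
  name: rs-myapp               # controller name
  namespace: default           # namespace of the controller
spec:                          # desired state
  replicas: 3                  # desired number of replicas
  selector:                    # label selector definition
    matchLabels:               # labels a pod must carry to be selected
      app: rs-cx               # label
      rs: cx                   # label
  template:                    # pod template
    metadata:                  # pod metadata
      labels:                  # pod labels
        app: rs-cx
        rs: cx
    spec:                      # desired state of the pod
      containers:              # container definitions
      - name: myapp-rs         # container name
        image: ikubernetes/myapp:v1   # container image
        ports:                 # exposed ports
        - name: http           # port name
          containerPort: 80    # port exposed inside the container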
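In this manifest the template labels must satisfy `spec.selector`; with `apps/v1` the API server rejects a ReplicaSet whose template labels do not match. `matchLabels` is an AND of equality tests, the same kind of filter that `kubectl get pods -l rs=cx` applies. A minimal sketch with hypothetical helper names (`matchExpressions` is not modeled):

```python
from typing import Dict, List


def matches(match_labels: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True if every key/value pair in matchLabels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select(match_labels: Dict[str, str],
           pods: Dict[str, Dict[str, str]]) -> List[str]:
    """Names of the pods whose labels satisfy the selector, sorted."""
    return sorted(name for name, labels in pods.items()
                  if matches(match_labels, labels))
```

Extra labels on a pod do not prevent a match: a pod labeled `app=rs-cx,rs=cx,tier=frontend` still satisfies the selector `{app: rs-cx, rs: cx}`.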
Create the pods managed by this controller
kubectl create -f rs-01.yaml

Check the created pods:

kubectl get pods -o wide -l rs=cx
NAME             READY   STATUS    RESTARTS   AGE    IP            NODE     NOMINATED NODE   READINESS GATES
rs-myapp-f5pg7   1/1     Running   0          168m   10.244.1.43   node01   <none>           <none>
rs-myapp-wzjkz   1/1     Running   0          168m   10.244.1.45   node01   <none>           <none>
rs-myapp-z6kx4   1/1     Running   0          168m   10.244.2.21   node02   <none>           <none>
Delete a pod; the controller automatically creates a replacement
[root@master manifests]# kubectl delete pods rs-myapp-z6kx4
pod "rs-myapp-z6kx4" deleted
[root@master manifests]# kubectl get pods -o wide -l rs=cx
NAME             READY   STATUS    RESTARTS   AGE    IP            NODE     NOMINATED NODE   READINESS GATES
rs-myapp-f5pg7   1/1     Running   0          170m   10.244.1.43   node01   <none>           <none>
rs-myapp-pmvw8   1/1     Running   0          9s     10.244.2.23   node02   <none>           <none>
rs-myapp-wzjkz   1/1     Running   0          170m   10.244.1.45   node01   <none>           <none>
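The replacement pod appears because the ReplicaSet controller keeps comparing the desired replica count with the pods matching its selector. One pass of that loop, heavily simplified (no API calls, placeholder pod names, hypothetical function name), could look like this:

```python
from typing import Dict, List


def reconcile(desired: int, matching_pods: List[str],
              prefix: str = "rs-myapp") -> Dict[str, List[str]]:
    """One pass of a simplified ReplicaSet control loop.

    Compares spec.replicas with the pods that currently match the selector
    and returns which pods to create and which to delete. The generated
    names are placeholders; the real controller lets the API server append
    a random suffix.
    """
    diff = desired - len(matching_pods)
    if diff > 0:
        # Too few pods: create the missing ones from the pod template.
        return {"create": [f"{prefix}-new-{i}" for i in range(diff)], "delete": []}
    if diff < 0:
        # Too many pods: this sketch simply drops the last ones in the list.
        return {"create": [], "delete": matching_pods[diff:]}
    return {"create": [], "delete": []}
```

After `rs-myapp-z6kx4` is deleted, only two matching pods remain, so `reconcile(3, ["rs-myapp-f5pg7", "rs-myapp-wzjkz"])` asks for one new pod.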
View the details of a created pod
[root@master manifests]# kubectl describe pods rs-myapp-wzjkz
Name:               rs-myapp-wzjkz
Namespace:          default
Priority:           0
Node:               node01/192.168.183.12      # node the pod runs on
Start Time:         Sat, 10 Aug 2019 13:01:49 +0800
Labels:             app=rs-cx                  # labels
                    rs=cx
Annotations:        <none>
Status:             Running                    # status
IP:                 10.244.1.45                # pod IP address
Controlled By:      ReplicaSet/rs-myapp
Containers:
  myapp-rs:
    Container ID:   docker://42b4318ab99e8d36aa5716ae8fa459ceb70adf4b68f4d1ebbbbcd79527457175
    Image:          ikubernetes/myapp:v1       # image
    Image ID:       docker-pullable://ikubernetes/myapp@sha256:9c3dc30b5219788b2b8a4b065f548b922a34479577befb54b03330999d30d513
    Port:           80/TCP                     # container port
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 10 Aug 2019 13:04:47 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-2m2ts (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-2m2ts:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-2m2ts
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
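The `QoS Class: BestEffort` line follows from the template setting no requests or limits (`resources: {}`). Below is a simplified sketch of the classification rule. It considers only CPU and memory, compares quantities as plain strings (so `500m` and `0.5` would wrongly differ), and treats an omitted request as equal to its limit; `qos_class` is a hypothetical name.

```python
from typing import Dict, List

# One container's resources block, e.g. {"requests": {...}, "limits": {...}}.
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Simplified pod QoS classification based on CPU and memory only."""
    anything_set = False
    guaranteed = True
    for res in containers:
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            request = requests.get(resource)
            if limit is not None or request is not None:
                anything_set = True
            # Guaranteed needs a limit for both resources in every container,
            # and any explicit request must equal that limit.
            if limit is None or (request is not None and request != limit):
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

The container in this post has an empty resources block, so `qos_class([{}])` gives `"BestEffort"`; setting equal CPU and memory limits on every container would make the pod `Guaranteed`.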
Example of surplus pods being removed
[root@master manifests]# kubectl label pods pod-demo app=rs-cx --overwrite
pod/pod-demo labeled
[root@master manifests]# kubectl label pods pod-demo rs=cx
pod/pod-demo labeled
[root@master manifests]# kubectl get pods --show-labels
NAME                     READY   STATUS        RESTARTS   AGE     LABELS
myapp-84cd4b7f95-g6ldp   1/1     Running       6          16d     pod-template-hash=84cd4b7f95,run=myapp
nginx-5896f46c8-zblcs    1/1     Running       6          16d     chenxi=cx,pod-template-hash=5896f46c8,run=nginx
pod-demo                 2/2     Running       8          5d16h   app=rs-cx,rs=cx,tier=frontend
rs-myapp-f5pg7           1/1     Running       0          3h5m    app=rs-cx,rs=cx
rs-myapp-pmvw8           0/1     Terminating   0          15m     app=rs-cx,rs=cx
rs-myapp-wzjkz           1/1     Running       0          3h5m    app=rs-cx,rs=cx
[root@master manifests]# kubectl get pods --show-labels     # one surplus pod was deleted automatically
NAME                     READY   STATUS        RESTARTS   AGE     LABELS
myapp-84cd4b7f95-g6ldp   1/1     Running       6          16d     pod-template-hash=84cd4b7f95,run=myapp
nginx-5896f46c8-zblcs    1/1     Running       6          16d     chenxi=cx,pod-template-hash=5896f46c8,run=nginx
pod-demo                 2/2     Running       8          5d16h   app=rs-cx,rs=cx,tier=frontend
rs-myapp-f5pg7           1/1     Running       0          3h5m    app=rs-cx,rs=cx
rs-myapp-wzjkz           1/1     Running       0          3h5m    app=rs-cx,rs=cx
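Which surplus pod gets removed is decided by a ranking rather than by chance. The real controller ranks pods by a longer list of criteria that has changed between Kubernetes versions; the sketch below uses only an assumed two-level order (not-ready before ready, then youngest first). That order happens to agree with the outcome above, where the 15-minute-old `rs-myapp-pmvw8` was deleted. `PodInfo` and `pick_victims` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodInfo:
    name: str
    ready: bool
    age_seconds: int  # time since the pod was created


def pick_victims(pods: List[PodInfo], surplus: int) -> List[str]:
    """Choose which pods to delete when there are too many replicas.

    Assumed simplified ranking: not-ready pods first, then the youngest.
    """
    ranked = sorted(pods, key=lambda p: (p.ready, p.age_seconds))
    return [p.name for p in ranked[:surplus]]
```

Feeding in the four pods that matched the selector after the relabel (ages 5d16h, 3h5m, 15m and 3h5m, all ready) selects `rs-myapp-pmvw8`.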
Change the number of pods dynamically
[root@master manifests]# kubectl edit rs rs-myapp
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  creationTimestamp: "2019-08-10T05:01:49Z"
  generation: 1
  name: rs-myapp
  namespace: default
  resourceVersion: "587241"
  selfLink: /apis/extensions/v1beta1/namespaces/default/replicasets/rs-myapp
  uid: c3a57f4b-dde9-4b0c-804e-5026271b70f9
spec:
  replicas: 5              # changed from 3 to 5
  selector:
    matchLabels:
      app: rs-cx
      rs: cx
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rs-cx
        rs: cx
    spec:
      containers:
      - image: ikubernetes/myapp:v1
        imagePullPolicy: IfNotPresent
        name: myapp-rs
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 3
  fullyLabeledReplicas: 3
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
[root@master manifests]# kubectl get pods -o wide -l rs=cx     # scaled out to 5
NAME             READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
rs-myapp-5z6hp   1/1     Running   0          71s     10.244.2.26   node02   <none>           <none>
rs-myapp-f5pg7   1/1     Running   0          3h17m   10.244.1.43   node01   <none>           <none>
rs-myapp-j75tx   1/1     Running   0          8m9s    10.244.2.24   node02   <none>           <none>
rs-myapp-kh659   1/1     Running   0          71s     10.244.2.25   node02   <none>           <none>
rs-myapp-wzjkz   1/1     Running   0          3h17m   10.244.1.45   node01   <none>           <none>
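The `kubectl edit` output contains fields that `rs-01.yaml` never set (`imagePullPolicy`, `protocol`, `dnsPolicy`, `terminationGracePeriodSeconds` and others) because the API server filled in defaults. A sketch of those defaults, limited to the fields visible in the output above; the image rule (an untagged image counts as `:latest`, a digest-pinned image gets `IfNotPresent`) follows the Kubernetes images documentation, and both helper names are hypothetical.

```python
import copy
from typing import Any, Dict


def image_pull_policy_default(image: str) -> str:
    """Always for :latest or untagged images, IfNotPresent otherwise."""
    if "@" in image:
        return "IfNotPresent"  # pinned by digest
    last = image.rsplit("/", 1)[-1]  # strip registry/path, which may hold host:port
    if ":" not in last:
        return "Always"  # no tag means :latest
    return "Always" if last.rsplit(":", 1)[1] == "latest" else "IfNotPresent"


def apply_defaults(pod_spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a pod spec with the defaults seen above filled in."""
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("restartPolicy", "Always")
    spec.setdefault("dnsPolicy", "ClusterFirst")
    spec.setdefault("schedulerName", "default-scheduler")
    spec.setdefault("securityContext", {})
    spec.setdefault("terminationGracePeriodSeconds", 30)
    for container in spec.get("containers", []):
        container.setdefault("imagePullPolicy",
                             image_pull_policy_default(container["image"]))
        container.setdefault("resources", {})
        container.setdefault("terminationMessagePath", "/dev/termination-log")
        container.setdefault("terminationMessagePolicy", "File")
        for port in container.get("ports", []):
            port.setdefault("protocol", "TCP")
    return spec
```

Applied to the container from `rs-01.yaml` (image `ikubernetes/myapp:v1`), this yields `imagePullPolicy: IfNotPresent` and `protocol: TCP`, as in the edited object.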
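Deployment: works on top of ReplicaSet; supports rolling updates and rollbacks, and declarative configuration
DaemonSet: ensures that every node runs one replica of a specific pod, or that every node of a specified type runs one replica
Job: a pod controller for run-once tasks
CronJob: a pod controller for periodic tasks
StatefulSet: a pod controller for stateful applications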