Configure a Security Context for a Pod or Container
A security context defines privilege and access control settings for a Pod or Container. Security context settings include, but are not limited to:
- Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).
- Security Enhanced Linux (SELinux): Objects are assigned security labels.
- Running as privileged or unprivileged.
- Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.
- AppArmor: Use program profiles to restrict the capabilities of individual programs.
- Seccomp: Filter a process's system calls.
- AllowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This bool directly controls whether the
no_new_privs
flag gets set on the container process. AllowPrivilegeEscalation is true always when the container is: 1) run as Privileged OR 2) hasCAP_SYS_ADMIN
. - readOnlyRootFilesystem: Mounts the container's root filesystem as read-only.
The above bullets are not a complete set of security context settings -- please see SecurityContext for a comprehensive list.
For more information about security mechanisms in Linux, see Overview of Linux Kernel Security Features
Set the security context for a Pod
To specify security settings for a Pod, include the securityContext
field in the Pod specification. The securityContext
field is a PodSecurityContext object. The security settings that you specify for a Pod apply to all Containers in the Pod. Here is a configuration file for a Pod that has a securityContext
and an emptyDir
volume:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo
spec:
securityContext:
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
volumes:
- name: sec-ctx-vol
emptyDir: {}
containers:
- name: sec-ctx-demo
image: busybox
command: [ "sh", "-c", "sleep 1h" ]
volumeMounts:
- name: sec-ctx-vol
mountPath: /data/demo
securityContext:
allowPrivilegeEscalation: false
In the configuration file, the runAsUser
field specifies that for any Containers in the Pod, all processes run with user ID 1000. The runAsGroup
field specifies the primary group ID of 3000 for all processes within any containers of the Pod. If this field is omitted, the primary group ID of the containers will be root(0). Any files created will also be owned by user 1000 and group 3000 when runAsGroup
is specified. Since fsGroup
field is specified, all processes of the container are also part of the supplementary group ID 2000. The owner for volume /data/demo
and any files created in that volume will be Group ID 2000.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo
Get a shell to the running Container:
kubectl exec -it security-context-demo -- sh
In your shell, list the running processes:
ps
The output shows that the processes are running as user 1000, which is the value of runAsUser
:
PID USER TIME COMMAND
1 1000 0:00 sleep 1h
6 1000 0:00 sh
...
In your shell, navigate to /data
, and list the one directory:
cd /data
ls -l
The output shows that the /data/demo
directory has group ID 2000, which is the value of fsGroup
.
drwxrwsrwx 2 root 2000 4096 Jun 6 20:08 demo
In your shell, navigate to /data/demo
, and create a file:
cd demo
echo hello > testfile
List the file in the /data/demo
directory:
ls -l
The output shows that testfile
has group ID 2000, which is the value of fsGroup
.
-rw-r--r-- 1 1000 2000 6 Jun 6 20:08 testfile
Run the following command:
$ id
uid=1000 gid=3000 groups=2000
You will see that gid is 3000 which is same as runAsGroup
field. If the runAsGroup
was omitted the gid would remain as 0(root) and the process will be able to interact with files that are owned by root(0) group and that have the required group permissions for root(0) group.
Exit your shell:
exit
Configure volume permission and ownership change policy for Pods
FEATURE STATE: Kubernetes v1.18 [alpha]
By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup
specified in a Pod's securityContext
when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup. You can use the fsGroupChangePolicy
field inside a securityContext
to control the way that Kubernetes checks and manages ownership and permissions for a volume.
fsGroupChangePolicy - fsGroupChangePolicy
defines behavior for changing ownership and permission of the volume before being exposed inside a Pod. This field only applies to volume types that support fsGroup
controlled ownership and permissions. This field has two possible values:
- OnRootMismatch: Only change permissions and ownership if permission and ownership of root directory does not match with expected permissions of the volume. This could help shorten the time it takes to change ownership and permission of a volume.
- Always: Always change permission and ownership of the volume when volume is mounted.
For example:
securityContext:
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
fsGroupChangePolicy: "OnRootMismatch"
This is an alpha feature. To use it, enable the feature gate ConfigurableFSGroupPolicy
for the kube-api-server, the kube-controller-manager, and for the kubelet.
Note: This field has no effect on ephemeral volume types such as
secret
,configMap
, andemptydir
.
Set the security context for a Container
To specify security settings for a Container, include the securityContext
field in the Container manifest. The securityContext
field is a SecurityContext object. Security settings that you specify for a Container apply only to the individual Container, and they override settings made at the Pod level when there is overlap. Container settings do not affect the Pod's Volumes.
Here is the configuration file for a Pod that has one Container. Both the Pod and the Container have a securityContext
field:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo-2
spec:
securityContext:
runAsUser: 1000
containers:
- name: sec-ctx-demo-2
image: gcr.io/google-samples/node-hello:1.0
securityContext:
runAsUser: 2000
allowPrivilegeEscalation: false
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-2.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo-2
Get a shell into the running Container:
kubectl exec -it security-context-demo-2 -- sh
In your shell, list the running processes:
ps aux
The output shows that the processes are running as user 2000. This is the value of runAsUser
specified for the Container. It overrides the value 1000 that is specified for the Pod.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
2000 1 0.0 0.0 4336 764 ? Ss 20:36 0:00 /bin/sh -c node server.js
2000 8 0.1 0.5 772124 22604 ? Sl 20:36 0:00 node server.js
...
Exit your shell:
exit
Set capabilities for a Container
With Linux capabilities, you can grant certain privileges to a process without granting all the privileges of the root user. To add or remove Linux capabilities for a Container, include the capabilities
field in the securityContext
section of the Container manifest.
First, see what happens when you don't include a capabilities
field. Here is configuration file that does not add or remove any Container capabilities:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo-3
spec:
containers:
- name: sec-ctx-3
image: gcr.io/google-samples/node-hello:1.0
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-3.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo-3
Get a shell into the running Container:
kubectl exec -it security-context-demo-3 -- sh
In your shell, list the running processes:
ps aux
The output shows the process IDs (PIDs) for the Container:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 4336 796 ? Ss 18:17 0:00 /bin/sh -c node server.js
root 5 0.1 0.5 772124 22700 ? Sl 18:17 0:00 node server.js
In your shell, view the status for process 1:
cd /proc/1
cat status
The output shows the capabilities bitmap for the process:
...
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
...
Make a note of the capabilities bitmap, and then exit your shell:
exit
Next, run a Container that is the same as the preceding container, except that it has additional capabilities set.
Here is the configuration file for a Pod that runs one Container. The configuration adds the CAP_NET_ADMIN
and CAP_SYS_TIME
capabilities:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo-4
spec:
containers:
- name: sec-ctx-4
image: gcr.io/google-samples/node-hello:1.0
securityContext:
capabilities:
add: ["NET_ADMIN", "SYS_TIME"]
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-4.yaml
Get a shell into the running Container:
kubectl exec -it security-context-demo-4 -- sh
In your shell, view the capabilities for process 1:
cd /proc/1
cat status
The output shows capabilities bitmap for the process:
...
CapPrm: 00000000aa0435fb
CapEff: 00000000aa0435fb
...
Compare the capabilities of the two Containers:
00000000a80425fb
00000000aa0435fb
In the capability bitmap of the first container, bits 12 and 25 are clear. In the second container, bits 12 and 25 are set. Bit 12 is CAP_NET_ADMIN
, and bit 25 is CAP_SYS_TIME
. See capability.h for definitions of the capability constants.
Note: Linux capability constants have the form
CAP_XXX
. But when you list capabilities in your Container manifest, you must omit theCAP_
portion of the constant. For example, to addCAP_SYS_TIME
, includeSYS_TIME
in your list of capabilities.
Assign SELinux labels to a Container
To assign SELinux labels to a Container, include the seLinuxOptions
field in the securityContext
section of your Pod or Container manifest. The seLinuxOptions
field is an SELinuxOptions object. Here's an example that applies an SELinux level:
...
securityContext:
seLinuxOptions:
level: "s0:c123,c456"
Note: To assign SELinux labels, the SELinux security module must be loaded on the host operating system.
Discussion
The security context for a Pod applies to the Pod's Containers and also to the Pod's Volumes when applicable. Specifically fsGroup
and seLinuxOptions
are applied to Volumes as follows:
fsGroup
: Volumes that support ownership management are modified to be owned and writable by the GID specified infsGroup
. See the Ownership Management design document for more details.seLinuxOptions
: Volumes that support SELinux labeling are relabeled to be accessible by the label specified underseLinuxOptions
. Usually you only need to set thelevel
section. This sets the Multi-Category Security (MCS) label given to all Containers in the Pod as well as the Volumes.
Warning: After you specify an MCS label for a Pod, all Pods with the same label can access the Volume. If you need inter-Pod protection, you must assign a unique MCS label to each Pod.
PodSecurityContext v1 core
Group | Version | Kind |
---|---|---|
core |
v1 |
PodSecurityContext |
PodSecurityContext holds pod-level security attributes and common container settings. Some fields are also present in container.securityContext. Field values of container.securityContext take precedence over field values of PodSecurityContext.
Appears In:
- [PodSpec core/v1]
Field | Description |
---|---|
fsGroup integer |
A special supplemental group that applies to all containers in a pod. Some volume types allow the Kubelet to change the ownership of that volume to be owned by the pod: 1. The owning GID will be the FSGroup 2. The setgid bit is set (new files created in the volume will be owned by FSGroup) 3. The permission bits are OR'd with rw-rw---- If unset, the Kubelet will not modify the ownership and permissions of any volume. |
fsGroupChangePolicy string |
fsGroupChangePolicy defines behavior of changing ownership and permission of the volume before being exposed inside Pod. This field will only apply to volume types which support fsGroup based ownership(and permissions). It will have no effect on ephemeral volume types such as: secret, configmaps and emptydir. Valid values are "OnRootMismatch" and "Always". If not specified defaults to "Always". |
runAsGroup integer |
The GID to run the entrypoint of the container process. Uses runtime default if unset. May also be set in SecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence for that container. |
runAsNonRoot boolean |
Indicates that the container must run as a non-root user. If true, the Kubelet will validate the image at runtime to ensure that it does not run as UID 0 (root) and fail to start the container if it does. If unset or false, no such validation will be performed. May also be set in SecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. |
runAsUser integer |
The UID to run the entrypoint of the container process. Defaults to user specified in image metadata if unspecified. May also be set in SecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence for that container. |
seLinuxOptions SELinuxOptions |
The SELinux context to be applied to all containers. If unspecified, the container runtime will allocate a random SELinux context for each container. May also be set in SecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence for that container. |
supplementalGroups integer array |
A list of groups applied to the first process run in each container, in addition to the container's primary GID. If unspecified, no groups will be added to any container. |
sysctls Sysctl array |
Sysctls hold a list of namespaced sysctls used for the pod. Pods with unsupported sysctls (by the container runtime) might fail to launch. |
windowsOptions WindowsSecurityContextOptions |
The Windows specific settings applied to all containers. If unspecified, the options within a container's SecurityContext will be used. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. |
SecurityContext v1 core
Group | Version | Kind |
---|---|---|
core |
v1 |
SecurityContext |
SecurityContext holds security configuration that will be applied to a container. Some fields are present in both SecurityContext and PodSecurityContext. When both are set, the values in SecurityContext take precedence.
Appears In:
Field | Description |
---|---|
allowPrivilegeEscalation boolean |
AllowPrivilegeEscalation controls whether a process can gain more privileges than its parent process. This bool directly controls if the no_new_privs flag will be set on the container process. AllowPrivilegeEscalation is true always when the container is: 1) run as Privileged 2) has CAP_SYS_ADMIN |
capabilities Capabilities |
The capabilities to add/drop when running containers. Defaults to the default set of capabilities granted by the container runtime. |
privileged boolean |
Run container in privileged mode. Processes in privileged containers are essentially equivalent to root on the host. Defaults to false. |
procMount string |
procMount denotes the type of proc mount to use for the containers. The default is DefaultProcMount which uses the container runtime defaults for readonly paths and masked paths. This requires the ProcMountType feature flag to be enabled. |
readOnlyRootFilesystem boolean |
Whether this container has a read-only root filesystem. Default is false. |
runAsGroup integer |
The GID to run the entrypoint of the container process. Uses runtime default if unset. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. |
runAsNonRoot boolean |
Indicates that the container must run as a non-root user. If true, the Kubelet will validate the image at runtime to ensure that it does not run as UID 0 (root) and fail to start the container if it does. If unset or false, no such validation will be performed. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. |
runAsUser integer |
The UID to run the entrypoint of the container process. Defaults to user specified in image metadata if unspecified. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. |
seLinuxOptions SELinuxOptions |
The SELinux context to be applied to the container. If unspecified, the container runtime will allocate a random SELinux context for each container. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. |
windowsOptions WindowsSecurityContextOptions |
The Windows specific settings applied to all containers. If unspecified, the options from the PodSecurityContext will be used. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. |