Limiting disk bandwidth and memory with cgroup v2 on Debian
I recently had a project at work that needed disk rate limiting. Throttling disks through the cgroup v1 interface worked fine on CentOS, but on Debian it simply would not take effect no matter what I tried, so I gave up on cgroup v1 and switched to cgroup v2.
Starting with Linux 4.5, the cgroup v2 interface is officially released: it no longer carries the devel tag and can be mounted as the new cgroup2 filesystem type.
Compared with v1, some of the rules have changed in v2. Every control group has a cgroup.controllers file listing the controllers its child groups may enable, and a cgroup.subtree_control file that enables or disables those controllers for the child groups.
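A quick way to see this in practice (the controller list varies by kernel and systemd configuration; this is typical for a 5.x kernel):

root@node115:~# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma
root@node115:~# echo "+io" > /sys/fs/cgroup/cgroup.subtree_control

After the echo, io appears in the cgroup.controllers file of every child group, e.g. /sys/fs/cgroup/user.slice/cgroup.controllers.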
1. Modify the kernel boot parameters
Edit the kernel command line in /etc/default/grub, appending cgroup_no_v1=all at the end. cgroup_no_v1=all disables all cgroup v1 controllers in the kernel, and systemd.unified_cgroup_hierarchy=1 tells systemd to mount only the unified (v2) hierarchy.
root@node115:~# vim.tiny /etc/default/grub
Add or modify the GRUB_CMDLINE_LINUX line:
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all"
root@node115:~# update-grub
root@node115:~# reboot
2. Check that it took effect
After rebooting, check the kernel log; the messages below show that the cgroup v1 controllers have been disabled.
root@node115:~# dmesg|grep group
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.3.10-1-pve root=/dev/mapper/vcl-root ro systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all quiet
[ 0.108203] Built 1 zonelists, mobility grouping on. Total pages: 2064227
[ 0.108208] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.3.10-1-pve root=/dev/mapper/vcl-root ro systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all quiet
[ 0.290723] Disabling cpuset control group subsystem in v1 mounts
[ 0.290733] Disabling cpu control group subsystem in v1 mounts
[ 0.290737] Disabling cpuacct control group subsystem in v1 mounts
[ 0.290742] Disabling io control group subsystem in v1 mounts
[ 0.290762] Disabling memory control group subsystem in v1 mounts
[ 0.290775] Disabling devices control group subsystem in v1 mounts
[ 0.290786] Disabling freezer control group subsystem in v1 mounts
[ 0.290789] Disabling net_cls control group subsystem in v1 mounts
[ 0.290802] Disabling perf_event control group subsystem in v1 mounts
[ 0.290805] Disabling net_prio control group subsystem in v1 mounts
[ 0.290808] Disabling hugetlb control group subsystem in v1 mounts
[ 0.290810] Disabling pids control group subsystem in v1 mounts
[ 0.290813] Disabling rdma control group subsystem in v1 mounts
[ 0.290816] *** VALIDATE cgroup1 ***
[ 0.290819] *** VALIDATE cgroup2 ***
Check the cgroup2 mount point:
root@node115:~# mount | grep cgroup
cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
3. Limit disk read/write bandwidth
Note that limits set this way are lost on reboot; to persist them, put the commands in a boot-time unit (see the oneshot unit sketch after the dd test below).
root@node115:~# echo "+io +memory" > /sys/fs/cgroup/cgroup.subtree_control
root@node115:~# ls -l /sys/fs/cgroup/user.slice/
root@node115:~# lsblk -d
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    8:0    0   32G  0 disk
sdb    8:16   0  100G  0 disk /mnt/sdb
sdc    8:32   0  100G  0 disk
sdd    8:48   0  100G  0 disk
sde    8:64   0   32G  0 disk
root@node115:~# echo "8:16 wbps=10485760" > /sys/fs/cgroup/user.slice/io.max
root@node115:~# echo "8:16 rbps=10485760" > /sys/fs/cgroup/user.slice/io.max
root@node115:/mnt/sdb# dd if=/dev/zero of=1gfile bs=1M count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 103.251 s, 10.4 MB/s
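As noted above, one way to persist these settings across reboots is a small systemd oneshot unit; a minimal sketch (the unit name io-limit.service is illustrative):

root@node115:~# cat /etc/systemd/system/io-limit.service
[Unit]
Description=Apply cgroup v2 io and memory limits
After=sysinit.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo "+io +memory" > /sys/fs/cgroup/cgroup.subtree_control'
ExecStart=/bin/sh -c 'echo "8:16 rbps=10485760 wbps=10485760" > /sys/fs/cgroup/user.slice/io.max'

[Install]
WantedBy=multi-user.target

root@node115:~# systemctl enable io-limit.service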
To remove the limits (once every key for a device is back at max, its line is dropped from io.max, so the file reads empty):
root@node115:/sys/fs/cgroup/user.slice# echo "8:16 rbps=max" > io.max
root@node115:/sys/fs/cgroup/user.slice# echo "8:16 wbps=max" > io.max
root@node115:/sys/fs/cgroup/user.slice# cat io.max
4. Verify that the disk bandwidth limits work
Test read and write bandwidth with fio; both are capped at the configured limit (~10 MB/s). Note that the interactive root session itself runs under user.slice (systemd places login sessions in user-UID.slice scopes beneath it), which is why fio and dd started from the shell are subject to the limit.
fio sequential read bandwidth:
root@node115:~# fio -filename=/mnt/sdb/testfile -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=64k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest
mytest: (g=0): rw=read, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=1
...
fio-2.16
Starting 10 threads
mytest: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 10 (f=10): [R(10)] [100.0% done] [10250KB/0KB/0KB /s] [160/0/0 iops] [eta 00m:00s]
mytest: (groupid=0, jobs=10): err= 0: pid=19216: Mon Apr 25 16:53:20 2022
read : io=615040KB, bw=10249KB/s, iops=160, runt= 60007msec
slat (usec): min=6, max=458, avg=42.04, stdev=28.85
clat (usec): min=356, max=175157, avg=62390.91, stdev=47134.65
lat (usec): min=402, max=175260, avg=62432.94, stdev=47116.45
clat percentiles (usec):
| 1.00th=[ 644], 5.00th=[ 948], 10.00th=[ 1096], 20.00th=[ 1256],
| 30.00th=[ 1800], 40.00th=[84480], 50.00th=[96768], 60.00th=[98816],
| 70.00th=[98816], 80.00th=[99840], 90.00th=[101888], 95.00th=[105984],
| 99.00th=[116224], 99.50th=[121344], 99.90th=[173056], 99.95th=[175104],
| 99.99th=[175104]
lat (usec) : 500=0.23%, 750=1.71%, 1000=4.46%
lat (msec) : 2=24.34%, 4=2.81%, 10=2.86%, 20=1.04%, 50=0.22%
lat (msec) : 100=48.45%, 250=13.88%
cpu : usr=0.04%, sys=0.11%, ctx=9764, majf=0, minf=170
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=9610/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: io=615040KB, aggrb=10249KB/s, minb=10249KB/s, maxb=10249KB/s, mint=60007msec, maxt=60007msec
Disk stats (read/write):
sdb: ios=8297/3, merge=1359/0, ticks=24863/74, in_queue=10864, util=7.60%
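As a sanity check: 160 IOPS × 64 KiB per request ≈ 10,240 KiB/s, which matches the rbps cap of 10485760 bytes/s set above.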
fio sequential write bandwidth:
root@node115:~# fio -filename=/mnt/sdb/testfile -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=64k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest
mytest: (g=0): rw=write, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=1
...
fio-2.16
Starting 10 threads
Jobs: 10 (f=10): [W(10)] [100.0% done] [0KB/10240KB/0KB /s] [0/160/0 iops] [eta 00m:00s]
mytest: (groupid=0, jobs=10): err= 0: pid=20050: Mon Apr 25 16:55:05 2022
write: io=614720KB, bw=10244KB/s, iops=160, runt= 60007msec
slat (usec): min=10, max=518, avg=54.92, stdev=34.38
clat (usec): min=595, max=219293, avg=62408.97, stdev=47014.26
lat (usec): min=677, max=219329, avg=62463.90, stdev=46990.89
clat percentiles (usec):
| 1.00th=[ 1560], 5.00th=[ 1768], 10.00th=[ 1880], 20.00th=[ 2064],
| 30.00th=[ 2352], 40.00th=[95744], 50.00th=[97792], 60.00th=[98816],
| 70.00th=[98816], 80.00th=[98816], 90.00th=[98816], 95.00th=[99840],
| 99.00th=[102912], 99.50th=[102912], 99.90th=[218112], 99.95th=[218112],
| 99.99th=[220160]
lat (usec) : 750=0.03%, 1000=0.01%
lat (msec) : 2=16.94%, 4=20.32%, 10=0.33%, 100=59.19%, 250=3.18%
cpu : usr=0.06%, sys=0.10%, ctx=9971, majf=0, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=9605/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: io=614720KB, aggrb=10244KB/s, minb=10244KB/s, maxb=10244KB/s, mint=60007msec, maxt=60007msec
Disk stats (read/write):
sdb: ios=72/8951, merge=0/631, ticks=23/25042, in_queue=3808, util=8.21%
5. Set a memory limit
Limit the maximum memory usage of the user.slice unit to 100M.
root@node115:~# systemctl show user.slice | grep MemoryLimit
root@node115:~# systemctl set-property user.slice MemoryLimit=100M
root@node115:~# systemctl daemon-reload
root@node115:~# cd /sys/fs/cgroup/user.slice/
root@node115:/sys/fs/cgroup/user.slice# ls
root@node115:/sys/fs/cgroup/user.slice# cat memory.max
104857600
root@node115:~# cat /etc/systemd/system.control/user.slice.d/50-MemoryLimit.conf
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Slice]
MemoryLimit=104857600
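Note that MemoryLimit= is the legacy cgroup v1 directive; on the unified hierarchy systemd translates it into memory.max, as seen above. The v2-native directive is MemoryMax=, which should behave the same here:

root@node115:~# systemctl set-property user.slice MemoryMax=100M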
View per-unit resource usage:
root@node115:~# systemd-cgtop
6. Verify the process memory limit
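The memtest program used below is not included in the post; a minimal sketch consistent with its output (allocate 100 MB at a time, touch every page, and report progress until the kernel OOM-kills the process) could look like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (100UL * 1024 * 1024)   /* allocate 100 MB per iteration */

int main(void) {
    size_t total_mb = 0;
    for (;;) {
        char *p = malloc(CHUNK);
        if (p == NULL) {
            perror("malloc");
            return 1;
        }
        memset(p, 1, CHUNK);          /* touch every page so the memory is actually charged to the cgroup */
        total_mb += 100;
        printf("malloc memory %zu MB\n", total_mb);
        sleep(1);                     /* pause so the OOM kill point is easy to observe */
    }
}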
Disable swap for the slice:
root@node115:/sys/fs/cgroup/user.slice# echo 0 > memory.swap.max
root@node115:~/inode_test# ./memtest
Killed
root@node115:~/inode_test# systemctl set-property user.slice MemoryLimit=200M
root@node115:/sys/fs/cgroup/user.slice# echo 0 > memory.swap.max
root@node115:/sys/fs/cgroup/user.slice# cat memory.swap.max
0
root@node115:~/inode_test# ./memtest
malloc memory 100 MB
Killed
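With the limit raised to 200M and memory.swap.max at 0, the first 100 MB allocation fits, but the second pushes the group over memory.max and the OOM killer terminates the process.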
Disable swap system-wide:
root@node115:~/inode_test# swapoff -a
root@node115:~/inode_test# free -mh
total used free shared buff/cache available
Mem: 7.8G 1.6G 5.8G 68M 356M 5.8G
Swap: 0B 0B 0B
root@node115:/sys/fs/cgroup/user.slice# echo 314572800 > memory.max
root@node115:/sys/fs/cgroup/user.slice# cat memory.max
314572800
root@node115:~/inode_test# ./memtest
malloc memory 100 MB
malloc memory 200 MB
Killed