sched misc
sched misc
查看sched信息命令
cat /proc/sched_debug
cat /proc/schedstat
sysctl:
/proc/sys/kernel/sched_*
sched_child_runs_first sched_rt_period_us
sched_domain/ sched_rt_runtime_us
sched_energy_aware sched_schedstats
sched_latency_ns sched_tunable_scaling
sched_migration_cost_ns sched_util_clamp_max
sched_min_granularity_ns sched_util_clamp_min
sched_nr_migrate sched_util_clamp_min_rt_default
sched_rr_timeslice_ms sched_wakeup_granularity_ns
/proc/pid/sched
/proc/pid/schedstat
查看一个线程的调度详情(sched count等)
在/proc/pid/sched里的nr_switches反映的是这个线程到目前为止已经经历了多少此的sched switch
cat /proc/pid/sched
console:/ # cat /proc/3201/sched thread1 (3201, #threads: 1) ------------------------------------------------------------------- se.exec_start : 3649568.375043 se.vruntime : 35988.132407 se.sum_exec_runtime : 2257.807624 se.nr_migrations : 262 nr_switches : 7472 nr_voluntary_switches : 7067 nr_involuntary_switches : 405 se.load.weight : 140288 se.runnable_weight : 140288 se.avg.load_sum : 21 se.avg.runnable_load_sum : 21 se.avg.util_sum : 17408 se.avg.load_avg : 0 se.avg.runnable_load_avg : 0 se.avg.util_avg : 0 se.avg.last_update_time : 3649568374784 se.avg.util_est.ewma : 10 se.avg.util_est.enqueued : 0 policy : 0 prio : 129 clock-delta : 250
/proc/pid/sched结果里的policy值的含义
#define SCHED_NORMAL 0 //CFS #define SCHED_FIFO 1 //RT #define SCHED_RR 2 //RT #define SCHED_BATCH 3 //CFS #define SCHED_IDLE 5 //CFS #define SCHED_DEADLINE 6 //DL
sched priority范围
/* * Priority of a process goes from 0..MAX_PRIO-1, valid RT * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority * values are inverted: lower p->prio value means higher priority. * * The MAX_USER_RT_PRIO value allows the actual maximum * RT priority to be separate from the value exported to * user-space. This allows kernel threads to set their * priority to a value higher than any user task. Note: * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO. */ #define MAX_USER_RT_PRIO 100 #define MAX_RT_PRIO MAX_USER_RT_PRIO #define MAX_PRIO (MAX_RT_PRIO + 40) #define DEFAULT_PRIO (MAX_RT_PRIO + 20) // 默认优先级,对应 nice 值为 0 的静态优先级
RT priority范围:0-MAX_RT_PRIO-1,即0-99;
非RT priority范围:MAX_RT_PRIO..MAX_PRIO-1,即100-139
priority值与实际priority是反相的,即priority值越大,priority越低
/proc/sched_debug字段说明
The first being tree-key
column, it indicates the task's virtual runtime, and its name comes from the kernel sorting all runnable tasks by this key in a red-black tree. The switches
column indicates the total number of switches (involuntary or not), and naturally the prio
refers to the process priority. The wait-time
value indicates the amount of time the task waited to be scheduled. Finally both sum-exec
and sum-sleep
account for the total amount of time (in nanoseconds) the task was running on the processor or asleep, respectively.
from:
https://documentation.suse.com/sles/15-SP1/html/SLES-all/cha-tuning-taskscheduler.html
task_struct.thread_info.preempt_count
* PREEMPT_MASK: 0x000000ff * SOFTIRQ_MASK: 0x0000ff00 * HARDIRQ_MASK: 0x000f0000 * NMI_MASK: 0x00100000 * PREEMPT_NEED_RESCHED: 0x80000000 */ #define PREEMPT_BITS 8 #define SOFTIRQ_BITS 8 #define HARDIRQ_BITS 4 #define NMI_BITS 1 #define PREEMPT_SHIFT 0 #define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS) //=8 #define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS) //=16 #define NMI_SHIFT (HARDIRQ_SHIFT + HARDIRQ_BITS) //=20 #define __IRQ_MASK(x) ((1UL << (x))-1) #define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT) //(1<<8) - 1 #define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT) //((1<<8)-1) <<8 #define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT) //((1<<4)-1) <<16 #define NMI_MASK (__IRQ_MASK(NMI_BITS) << NMI_SHIFT) //((1<<1)-1) << 20 #define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT) //1<<0 #define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT) //1<<8 #define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT) //1<<16 #define NMI_OFFSET (1UL << NMI_SHIFT) //1<<20
interrupt处理程序完成后,会去检查是否要执行sched switch
interrupt处理程序完成后,它会去监测preempt_count,如果它不为0,则会直接跳过sched switch;如果它为0,再检查是否TIF_NEED_RESCHED flag被置起来了,如果置起来了再去执行sched switch;
如果preempt_count不为0,关了抢占、关了软中断等均会使preempt_count不为0。所以如果在这之前有关软中断,此时将不会去执行sched switch,所以在(硬)中断处理完后,会直接返回被中断打断的thread继续执行:
arch/arm64/kernel/entry.S
el1_irq: kernel_entry 1 enable_da_f #ifdef CONFIG_TRACE_IRQFLAGS bl trace_hardirqs_off #endif irq_handler #ifdef CONFIG_PREEMPT ldr w24, [tsk, #TSK_TI_PREEMPT] // get preempt count cbnz w24, 1f // preempt count != 0 ldr x0, [tsk, #TSK_TI_FLAGS] // get flags tbz x0, #TIF_NEED_RESCHED, 1f // needs rescheduling? bl el1_preempt 1: #endif #ifdef CONFIG_TRACE_IRQFLAGS bl trace_hardirqs_on #endif kernel_exit 1 ENDPROC(el1_irq) #ifdef CONFIG_PREEMPT el1_preempt: mov x24, lr 1: bl preempt_schedule_irq // irq en/disable is done inside ldr x0, [tsk, #TSK_TI_FLAGS] // get new tasks TI_FLAGS tbnz x0, #TIF_NEED_RESCHED, 1b // needs rescheduling? ret x24 #endif
renice调整已经存在的进程的优先级
renice -p -n -20 $pid #调整进程号为的pid的进程的nice值为-20,-p的含义是以进程为单位进行调整
Preemptible Kernel (Low-Latency Desktop)
这种模式和”No Forced Preemption (Server)”相反,它的目标是提升交互性(即减少内核的latency)而牺牲throughput。它允许在除临界区之外的所有内核code中进行线程切换;临界区也叫做atomic context,主要是:
- a. spinlock保护的区间;
- b. irq-handler,即中断上下文;
除了这两种区间的内核code,其他地方都允许线程切换,所以,低优先级的线程,即使在内核中且非自愿的情况下(例如纯CPU计算)也能被抢占,让CPU去服务优先级更高的线程(例如交互性事件)。用代价a. throughput降低;b. 线程切换更多(消耗CPU),换取更高的交互性,适合于Desktop及嵌入式系统。
from: https://www.yuanguohuo.com/2020/03/31/linux-preemption-mode/
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】博客园社区专享云产品让利特惠,阿里云新客6.5折上折
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 没有源码,如何修改代码逻辑?
· 一个奇形怪状的面试题:Bean中的CHM要不要加volatile?
· [.NET]调用本地 Deepseek 模型
· 一个费力不讨好的项目,让我损失了近一半的绩效!
· 在鹅厂做java开发是什么体验
· 百万级群聊的设计实践
· WPF到Web的无缝过渡:英雄联盟客户端的OpenSilver迁移实战
· 永远不要相信用户的输入:从 SQL 注入攻防看输入验证的重要性
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析