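Process scheduling in the Linux 2.4 kernel
Process scheduling has to serve three kinds of processes: interactive processes, batch processes and real-time processes. Designing a scheduling mechanism means answering the following specific questions.
(1) When does scheduling happen?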
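Answer: A process can give up the CPU voluntarily, either from user space through a system call such as pause(), or inside the kernel by setting its own state to sleeping and then scheduling. Scheduling can also be forced: it happens just before returning to user space from a system call, and likewise just before returning to user space from an interrupt or exception handler. "Returning to user space" matters here: an interrupt or exception leads to scheduling only if it arrived while the CPU was running in user space. One that arrives while the CPU is running in the kernel does not lead to scheduling.
Drawback: for real-time processes, if an interrupt arrives while the CPU is in the kernel and handling it takes a long time, no scheduling happens on return from that interrupt, so scheduling may be delayed too long and the user may notice the lag. Also, returning from the kernel to user space does not always trigger scheduling. It depends on whether need_resched in the PCB (task_struct) is set to 1. That flag is set in three cases: when the current process gives up the CPU voluntarily, when the kernel wakes up a process, and when the timer interrupt handler finds that the current process has run for too long.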
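The rule above (schedule only on the way back to user space, and only if need_resched is set) can be sketched in user-space C. This is a minimal illustration, not kernel code: struct resched_task, request_resched() and should_schedule_on_return() are hypothetical names introduced here.

```c
#include <stdbool.h>

/* Hypothetical, cut-down task descriptor: only the flag discussed above. */
struct resched_task {
    int need_resched;   /* 1 when a reschedule has been requested */
};

/* The three events named above that request a reschedule. */
enum resched_reason { YIELDED_CPU, WOKE_A_PROCESS, RAN_TOO_LONG };

/* Each of the three events ends up setting the same flag. */
void request_resched(struct resched_task *cur, enum resched_reason why)
{
    (void)why;              /* the reason does not change what is recorded */
    cur->need_resched = 1;
}

/*
 * Decision taken at the end of a system call, interrupt or exception.
 * returning_to_user is false when the event interrupted kernel-mode code,
 * in which case no scheduling happens even if the flag is set.
 */
bool should_schedule_on_return(const struct resched_task *cur,
                               bool returning_to_user)
{
    return returning_to_user && cur->need_resched != 0;
}
```

With the flag set, should_schedule_on_return() is true for a return to user space and false for a return into interrupted kernel code, which is exactly the source of the latency described in the drawback.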
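(2) Scheduling policy: by what criterion is the next process chosen?
Answer: The kernel computes a weight for every runnable process and runs the one with the highest weight. While a process runs, its eligibility decreases over time. When the eligibility of all runnable processes has dropped to 0, it is recalculated for all of them at once (improved in the 2.6 kernel, which recalculates per process instead of for all processes together).
To suit different needs there are three policies: SCHED_FIFO (real-time, first in first out), SCHED_RR (real-time round-robin, for real-time processes that run longer on each turn) and SCHED_OTHER (ordinary processes).
(3) Scheduling mode: preemptive or non-preemptive?
Answer: Preemptive in user mode, with the preemption taking effect on the return from kernel mode to user mode. The kernel itself is not preemptible (improved in the 2.6 kernel).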
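The way eligibility decreases while a process runs, as described under (2), can be sketched as a per-tick decrement. This is a simplified illustration under stated assumptions: struct tick_task and timer_tick() are hypothetical names, and the real timer path does more work than this.

```c
/* Hypothetical, cut-down task descriptor for this sketch. */
struct tick_task {
    int counter;        /* remaining time quota, in timer ticks */
    int need_resched;   /* set when the quota is used up */
};

/*
 * Called once per timer tick for the running process.
 * When the quota reaches 0 the process is not switched out on the spot;
 * only the flag is set, and the switch happens later, on the way back
 * to user space.
 */
void timer_tick(struct tick_task *cur)
{
    if (cur->counter > 0)
        cur->counter--;
    if (cur->counter == 0)
        cur->need_resched = 1;
}
```

A process that starts with counter = 2 keeps running after the first tick and is marked for rescheduling after the second.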
/*
 * 'schedule()' is the scheduler function. It's a very simple and nice
 * scheduler: it's not perfect, but certainly works for most things.
 *
 * The goto is "interesting".
 *
 * NOTE!! Task 0 is the 'idle' task, which gets called when no other
 * tasks can run. It can not be killed, and it cannot sleep. The 'state'
 * information in task[0] is never used.
 */
asmlinkage void schedule(void)
{
    struct schedule_data * sched_data;
    struct task_struct *prev, *next, *p;
    struct list_head *tmp;
    int this_cpu, c;

    /* active_mm must not be NULL when scheduling: a kernel thread
     * borrows the address space of the task that ran before it */
    if (!current->active_mm) BUG();
need_resched_back:
    prev = current;                 /* the current task_struct (PCB) */
    this_cpu = prev->processor;

    if (in_interrupt())             /* scheduling from interrupt context is a bug */
        goto scheduling_in_interrupt;

    release_kernel_lock(prev, this_cpu);    /* empty statement on a uniprocessor */

    /* Check whether softirq requests are pending.
     * Do "administrative" work here while we don't hold any locks */
    if (softirq_active(this_cpu) & softirq_mask(this_cpu))
        goto handle_softirq;        /* service them below, then come back */
handle_softirq_back:

    /*
     * sched_data keeps the information needed for the next scheduling pass.
     * 'sched_data' is protected by the fact that we can run
     * only one process per CPU.
     */
    sched_data = & aligned_data[this_cpu].schedule_data;

    spin_lock_irq(&runqueue_lock);  /* lock the run queue */

    /* move an exhausted RR process to be last.. */
    if (prev->policy == SCHED_RR)   /* round-robin policy gets special handling */
        goto move_rr_last;          /* if the quota is used up: move to the queue tail,
                                     * restore the quota, then jump back here */
move_rr_back:

    switch (prev->state) {
        case TASK_INTERRUPTIBLE:
            if (signal_pending(prev)) {     /* a signal is waiting to be handled */
                prev->state = TASK_RUNNING;
                break;
            }
        default:
            del_from_runqueue(prev);        /* remove from the run queue */
        case TASK_RUNNING:
    }
    prev->need_resched = 0;         /* clear the request: the scheduling it asked for is under way */

    /*
     * this is the scheduler proper:
     */

repeat_schedule:                    /* now pick a process to run */
    /*
     * Default process to select..
     */
    next = idle_task(this_cpu);     /* next points to the best candidate so far */
    c = -1000;                      /* start from the lowest weight for the scan below */
    if (prev->state == TASK_RUNNING)    /* the current process can still run */
        goto still_running;         /* start from its weight, so it wins ties
                                     * against processes of equal weight */

still_running_back:
    list_for_each(tmp, &runqueue_head) {    /* walk every process on the run queue */
        p = list_entry(tmp, struct task_struct, run_list);
        if (can_schedule(p, this_cpu)) {
            int weight = goodness(p, this_cpu, prev->active_mm);    /* its current weight */
            if (weight > c)
                c = weight, next = p;
        }
    }

    /* Do we need to re-calculate counters? */
    if (!c)     /* best weight is 0: no real-time process is ready and every runnable
                 * SCHED_OTHER process has used up its quota, so recalculate the quotas */
        goto recalculate;
    /*
     * from this point on nothing can prevent us from
     * switching to the next task, save this fact in
     * sched_data.
     */
    sched_data->curr = next;
#ifdef CONFIG_SMP
    next->has_cpu = 1;
    next->processor = this_cpu;
#endif
    spin_unlock_irq(&runqueue_lock);

    if (prev == next)               /* the chosen process is the current one: nothing to switch */
        goto same_process;

#ifdef CONFIG_SMP
    /*
     * maintain the per-process 'last schedule' value.
     * (this has to be recalculated even if we reschedule to
     * the same process) Currently this is only used on SMP,
     * and it's approximate, so we do not have to maintain
     * it while holding the runqueue spinlock.
     */
    sched_data->last_schedule = get_cycles();

    /*
     * We drop the scheduler lock early (it's a global spinlock),
     * thus we have to lock the previous process from getting
     * rescheduled during switch_to().
     */

#endif /* CONFIG_SMP */

    kstat.context_swtch++;
    /*
     * there are 3 processes which are affected by a context switch:
     *
     * prev == .... ==> (last => next)
     *
     * It's the 'much more previous' 'prev' that is on next's stack,
     * but prev is set to (the just run) 'last' process by switch_to().
     * This might sound slightly confusing but makes tons of sense.
     */
    prepare_to_switch();            /* get ready to switch */
    {
        struct mm_struct *mm = next->mm;            /* user address space of the next task */
        struct mm_struct *oldmm = prev->active_mm;  /* address space the current task is using */
        if (!mm) {                  /* next is a kernel thread without a user space of its own */
            if (next->active_mm) BUG();     /* a kernel thread that is not running
                                             * must not still hold an active_mm */
            next->active_mm = oldmm;        /* borrow the previous task's address space */
            atomic_inc(&oldmm->mm_count);   /* take a reference */
            enter_lazy_tlb(oldmm, next, this_cpu);
        } else {                    /* next is an ordinary process */
            if (next->active_mm != mm) BUG();
            switch_mm(oldmm, mm, next, this_cpu);   /* switch the user address space */
        }

        if (!prev->mm) {            /* prev was a kernel thread */
            prev->active_mm = NULL; /* give back the borrowed address space */
            mmdrop(oldmm);          /* for a kernel thread this only drops the reference */
        }
    }

    /*
     * This just switches the register state and the
     * stack.
     */
    switch_to(prev, next, prev);    /* the actual switch happens here */
    __schedule_tail(prev);          /* a newly created process goes on from here straight to
                                     * ret_from_sys_call and returns to user space */

same_process:
    reacquire_kernel_lock(current); /* empty statement on a uniprocessor */
    if (current->need_resched)      /* cleared above; non-zero now means an interrupt
                                     * changed things in the meantime */
        goto need_resched_back;     /* schedule again */

    return;

recalculate:
    {
        struct task_struct *p;
        spin_unlock_irq(&runqueue_lock);
        read_lock(&tasklist_lock);
        for_each_task(p)            /* every process in the system: halve what is left of the
                                     * quota and add the ticks derived from nice */
            p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
        read_unlock(&tasklist_lock);
        spin_lock_irq(&runqueue_lock);
    }
    goto repeat_schedule;

still_running:
    c = goodness(prev, this_cpu, prev->active_mm);
    next = prev;
    goto still_running_back;

handle_softirq:
    do_softirq();
    goto handle_softirq_back;

move_rr_last:
    if (!prev->counter) {           /* counter == 0: the time quota is used up */
        prev->counter = NICE_TO_TICKS(prev->nice);  /* restore the initial quota, derived from nice */
        move_last_runqueue(prev);   /* move from the current position to the queue tail */
    }
    goto move_rr_back;

scheduling_in_interrupt:            /* a bug: schedule() was called from an interrupt handler */
    printk("Scheduling in interrupt\n");
    BUG();
    return;
}
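The recalculation step, counter = (counter >> 1) + NICE_TO_TICKS(nice), is easier to follow with numbers. The sketch below is a user-space illustration only. The body of nice_to_ticks() is an assumption: it follows the form ((20 - nice) >> 2) + 1 that the macro is believed to take for HZ = 100, and the listing above does not show the macro itself. Both function names are hypothetical.

```c
/*
 * ASSUMPTION: stand-in for NICE_TO_TICKS() in the HZ = 100 form.
 * The real macro is not shown in the listing above.
 */
int nice_to_ticks(int nice)
{
    return ((20 - nice) >> 2) + 1;
}

/*
 * One recalculation pass for a single process: half of the unused quota
 * is carried over, then the base quota derived from nice is added.
 */
int recalc_counter(int counter, int nice)
{
    return (counter >> 1) + nice_to_ticks(nice);
}
```

Under that assumption a nice 0 process that used up its whole quota gets 6 ticks. A process that sleeps through successive recalculations climbs 6, 9, 10, 11 and then stays at 11, so a process that sleeps a lot (typically an interactive one) builds up a higher counter, but the carry-over is bounded at roughly twice the base quota.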
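Analysis of the goodness() function
For a non-real-time process, goodness() returns weight = remaining time quota (counter) + 1 (only if the process shares the current user address space or is a kernel thread) + (20 - nice). If the time quota is already used up, the weight is 0.
nice plays no part in the weight of a real-time process, but it does determine the time quota of a SCHED_RR process.
Weight of a real-time process: weight = 1000 + p->rt_priority, so rt_priority is what decides the order among real-time processes.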
static inline int goodness(struct task_struct * p, int this_cpu, struct mm_struct *this_mm)
{
    int weight;

    /*
     * select the current process after every other
     * runnable process, but before the idle thread.
     * Also, dont trigger a counter recalculation.
     */
    weight = -1;
    if (p->policy & SCHED_YIELD)    /* the process is yielding: weight stays -1, return at once */
        goto out;

    /*
     * Non-RT process - normal case first.
     */
    if (p->policy == SCHED_OTHER) { /* a process without real-time requirements */
        /*
         * Give the process a first-approximation goodness value
         * according to the number of clock-ticks it has left.
         *
         * Don't do any other calculations if the time slice is
         * over..
         */
        weight = p->counter;        /* start from the remaining time quota */
        if (!weight)                /* quota used up: weight is 0, return at once */
            goto out;

#ifdef CONFIG_SMP
        /* Give a largish advantage to the same processor... */
        /* (this is equivalent to penalizing other processors) */
        if (p->processor == this_cpu)
            weight += PROC_CHANGE_PENALTY;
#endif

        /* .. and a slight advantage to the current MM */
        if (p->mm == this_mm || !p->mm) /* kernel thread, or same user space as the current
                                         * process: no address space switch needed, bonus of 1 */
            weight += 1;
        weight += 20 - p->nice;     /* the smaller nice is, the higher the priority; range -20 to 19 */
        goto out;
    }

    /*
     * Realtime process, select the first one on the
     * runqueue (taking priorities within processes
     * into account).
     *
     * nice has nothing to do with the priority of a real-time process,
     * but it does set the time quota of a SCHED_RR process. While a
     * real-time process is ready, non-real-time processes get no chance
     * to run. A real-time process has a positive priority of its own,
     * rt_priority, and because of its timing requirements it is given
     * a very high weight.
     */
    weight = 1000 + p->rt_priority; /* rt_priority decides the order among real-time processes */
out:
    return weight;
}
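To make the weights concrete, here is a user-space re-statement of the uniprocessor path of goodness() with a few worked values. It is a sketch, not the kernel function: struct sk_task, sketch_goodness() and the POLICY_* names are hypothetical stand-ins, and the SMP bonus is left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the scheduling policy values. */
enum { POLICY_OTHER = 0, POLICY_FIFO = 1, POLICY_RR = 2, POLICY_YIELD = 0x10 };

/* Hypothetical, cut-down task descriptor for this sketch. */
struct sk_task {
    int policy;
    int counter;        /* remaining time quota in ticks */
    int nice;           /* -20 .. 19 */
    int rt_priority;    /* used by real-time policies only */
    const void *mm;     /* user address space, NULL for a kernel thread */
};

/* Uniprocessor weight calculation, following the rules described above. */
int sketch_goodness(const struct sk_task *p, const void *this_mm)
{
    if (p->policy & POLICY_YIELD)
        return -1;                      /* yielding: behind everything but idle */
    if (p->policy == POLICY_OTHER) {
        int weight = p->counter;
        if (weight == 0)
            return 0;                   /* quota used up */
        if (p->mm == this_mm || p->mm == NULL)
            weight += 1;                /* no address space switch needed */
        return weight + 20 - p->nice;
    }
    return 1000 + p->rt_priority;       /* real-time: always above SCHED_OTHER */
}
```

A nice 0 process with 6 ticks left scores 6 + 1 + 20 = 27 when it shares the current address space and 26 when it does not. The same process with counter 0 scores 0, a SCHED_FIFO process with rt_priority 50 scores 1050, and a yielding process scores -1.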
Overall flow of schedule():
Preparation:
If schedule() is called from interrupt context, jump straight to BUG() and fail.
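If the current process is SCHED_RR (real-time round-robin), check whether its time slice is used up. If so, move it to the tail of the run queue and restore its time slice.
If the current process is in interruptible sleep and has a signal pending, set it back to the runnable state.
If it is in any other state except running, remove it from the run queue.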
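The state handling in the preparation phase relies on switch fall-through in the kernel source, which is easy to misread. The sketch below spells the same decision out with plain if statements. All names (struct sketch_task, settle_prev(), the ST_* states) are hypothetical and only model the three outcomes described above.

```c
/* Hypothetical state names for this sketch. */
enum sketch_state { ST_RUNNING, ST_INTERRUPTIBLE, ST_UNINTERRUPTIBLE };

/* Hypothetical, cut-down task descriptor. */
struct sketch_task {
    enum sketch_state state;
    int on_runqueue;    /* 1 while the task is on the run queue */
};

/* Decide what happens to the outgoing process before the next one is picked. */
void settle_prev(struct sketch_task *prev, int has_pending_signal)
{
    if (prev->state == ST_INTERRUPTIBLE && has_pending_signal) {
        prev->state = ST_RUNNING;   /* woken by the signal: stays on the queue */
        return;
    }
    if (prev->state != ST_RUNNING)
        prev->on_runqueue = 0;      /* really going to sleep: leave the queue */
    /* ST_RUNNING: nothing to do, the task stays where it is */
}
```

Only a process in interruptible sleep is rescued by a pending signal. A process in uninterruptible sleep leaves the run queue regardless of signals.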
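Selection:
If the current process is still runnable, start the weight comparison from it, so that it takes precedence among processes of equal weight.
Walk every process on the run queue and compute its weight with goodness(). For a non-real-time process, weight = time quota + 1 (if it shares the current user address space or is a kernel thread) + (20 - nice).
nice plays no part in the weight of a real-time process but determines the time slice of a SCHED_RR process. For a real-time process, weight = 1000 + p->rt_priority.
Select the process with the highest weight. If the highest weight is 0, the run queue holds no real-time process and every SCHED_OTHER process on it has used up its time slice,
so the time slices are recalculated, and not only for the runnable processes: the loop covers every process in the system. The queue must have looked like this for a while, otherwise the SCHED_OTHER processes would not have had the chance to run down to 0.
After the recalculation the selection is repeated and the process with the highest weight is scheduled.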
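The selection rule, including the tie-break in favour of the current process, can be sketched over a plain array of precomputed weights. This is an illustration under simplifying assumptions: pick_next() and PICK_IDLE are hypothetical names, and the weights are passed in rather than computed by goodness().

```c
#define PICK_IDLE (-1)  /* hypothetical marker: nothing runnable, run the idle task */

/*
 * weight[i] is the precomputed weight of runnable process i.
 * prev is the index of the current process if it is still runnable,
 * or PICK_IDLE if it is not. The best weight found is stored in *best,
 * so the caller can see whether a recalculation (best == 0) is needed.
 */
int pick_next(const int *weight, int n, int prev, int *best)
{
    int next = PICK_IDLE;
    int c = -1000;                  /* lower than any real weight */

    if (prev != PICK_IDLE) {        /* start from the current process */
        c = weight[prev];
        next = prev;
    }
    for (int i = 0; i < n; i++) {
        if (weight[i] > c) {        /* strictly greater: ties keep the earlier choice */
            c = weight[i];
            next = i;
        }
    }
    *best = c;
    return next;
}
```

Because the comparison is strict, a current process with weight 27 keeps the CPU against another process with weight 27, which avoids a needless context switch. With an empty queue the function returns PICK_IDLE and leaves best at -1000.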
Switching:
Switch the address space, then switch to the next process or kernel thread.
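The address space step is the least obvious part of the switch, because a kernel thread has no user address space and borrows the one that was active before it. The sketch below models only the bookkeeping. struct as_mm, struct as_task and switch_address_space() are hypothetical names, the reference count is a plain int instead of an atomic, and the return value stands in for the page table reload.

```c
#include <stddef.h>

/* Hypothetical stand-in for an address space with a reference count. */
struct as_mm {
    int mm_count;
};

/* Hypothetical, cut-down task descriptor. */
struct as_task {
    struct as_mm *mm;           /* own user address space, NULL for a kernel thread */
    struct as_mm *active_mm;    /* address space in use while the task runs */
};

/*
 * Bookkeeping for a switch from prev to next.
 * Returns 1 if the page tables would have to be reloaded, 0 otherwise.
 */
int switch_address_space(struct as_task *prev, struct as_task *next)
{
    struct as_mm *oldmm = prev->active_mm;
    int reload = 0;

    if (next->mm == NULL) {         /* kernel thread: borrow, no reload */
        next->active_mm = oldmm;
        oldmm->mm_count++;
    } else if (next->mm != oldmm) { /* different user space: reload needed */
        reload = 1;
    }

    if (prev->mm == NULL) {         /* prev was a kernel thread: give it back */
        prev->active_mm = NULL;
        oldmm->mm_count--;
    }
    return reload;
}
```

Switching from a user process to a kernel thread and back to the same user process never reloads the page tables in this model, and the reference count returns to its starting value once the kernel thread is switched out.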