Load Balancing (Part 3)
Load balancing for real-time processes is handled when processes are created and destroyed, so the system's load-balancing machinery is mostly aimed at CFS.
v3.14.25
Load balancing works in two directions, pull and push (a toy sketch contrasting the two follows this list):
1) pull: a lightly loaded CPU pulls tasks over from a heavily loaded CPU and runs them. This should be the primary mode, because a CPU that is already busy should not also be burdened with the balancing work itself. This corresponds to load balance.
2) push: a heavily loaded CPU pushes tasks out to a lightly loaded CPU so that it helps run them. This corresponds to active balance.
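As a rough illustration of the two directions, here is a minimal userspace sketch. It is not kernel code: toy_rq, pull_task and push_task are invented names, and real migration involves runqueue locking, task selection and much more; the only point is who initiates the move.

/*
 * Toy model only: pull is initiated by the light CPU, push (active balance)
 * by the busy CPU. The migration itself is the same either way.
 */
#include <stdio.h>

struct toy_rq {
	int cpu;
	int nr_running;
};

/* pull: the light CPU does the work, taking a task from the busy CPU */
static void pull_task(struct toy_rq *light, struct toy_rq *busy)
{
	if (busy->nr_running > light->nr_running + 1) {
		busy->nr_running--;
		light->nr_running++;
		printf("cpu%d pulled a task from cpu%d\n", light->cpu, busy->cpu);
	}
}

/* push (active balance): the busy CPU initiates, handing a task to the light CPU */
static void push_task(struct toy_rq *busy, struct toy_rq *light)
{
	if (busy->nr_running > light->nr_running + 1) {
		busy->nr_running--;
		light->nr_running++;
		printf("cpu%d pushed a task to cpu%d\n", busy->cpu, light->cpu);
	}
}

int main(void)
{
	struct toy_rq a = { .cpu = 0, .nr_running = 5 };
	struct toy_rq b = { .cpu = 1, .nr_running = 1 };

	pull_task(&b, &a);	/* normal load balance: initiated by the light side */
	push_task(&a, &b);	/* active balance: initiated by the busy side */
	return 0;
}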
The entry points for load balancing are run_rebalance_domains (the SCHED_SOFTIRQ handler) and idle_balance (called from schedule).
start_kernel()
|-->sched_init()
    |-->init_sched_fair_class()
        |-->open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);

run_rebalance_domains
|-->rebalance_domains(this_rq, idle);
|-->nohz_idle_balance(this_rq, idle); //ignore

void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
|-->int continue_balancing = 1;
    int cpu = rq->cpu;
    struct sched_domain *sd;
    unsigned long next_balance = jiffies + 60*HZ;
|-->for_each_domain(cpu, sd) {
        if (!(sd->flags & SD_LOAD_BALANCE))
            continue;
        if (!continue_balancing)
            break; //already balanced at this sched_domain level, no need to go on to the higher levels
        if (time_after_eq(jiffies, sd->last_balance + interval)) { //the balance interval has elapsed
            if (load_balance(cpu, rq, sd, idle, &continue_balancing))
                idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
            sd->last_balance = jiffies;
        }
    }
|-->rq->next_balance = next_balance;

load_balance proceeds in these steps:
a) find_busiest_group
b) find_busiest_queue
c) move_tasks
d) active balance (only if nothing could be moved)

int load_balance(int this_cpu, struct rq *this_rq, struct sched_domain *sd,
                 enum cpu_idle_type idle, int *continue_balancing)
|-->int ld_moved, cur_ld_moved, active_balance = 0;
    struct sched_domain *sd_parent = sd->parent;
    struct sched_group *group;
    struct rq *busiest;
    unsigned long flags;
    struct cpumask *cpus = __get_cpu_var(load_balance_mask);
|-->struct lb_env env = {
        .sd           = sd,
        .dst_cpu      = this_cpu,
        .dst_rq       = this_rq,
        .dst_grpmask  = sched_group_cpus(sd->groups),
        .idle         = idle,
        .loop_break   = sched_nr_migrate_break,
        .cpus         = cpus,
        .fbq_type     = all,
    };
|-->if (!should_we_balance(&env)) {
        *continue_balancing = 0;
        goto out_balanced;
    }
|-->group = find_busiest_group(&env);
|-->busiest = find_busiest_queue(&env, group);
|-->if (busiest->nr_running > 1) {
        env.flags |= LBF_ALL_PINNED;
        env.src_cpu = busiest->cpu;
        env.src_rq = busiest;
        env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
        cur_ld_moved = move_tasks(&env);
        ld_moved += cur_ld_moved;
        if (cur_ld_moved && env.dst_cpu != smp_processor_id())
            resched_cpu(env.dst_cpu);
    }
|-->if (!ld_moved) {
        if (need_active_balance(&env)) { //active balance
            if (active_balance) {
                stop_one_cpu_nowait(cpu_of(busiest),
                    active_load_balance_cpu_stop, busiest,
                    &busiest->active_balance_work);
            }
        }
    }

struct sched_group *find_busiest_group(struct lb_env *env)
|-->struct sg_lb_stats *local, *busiest;
    struct sd_lb_stats sds;
|-->update_sd_lb_stats(env, &sds); //walk the circular list of sched_groups under this sd and gather load statistics
|-->local = &sds.local_stat;
    busiest = &sds.busiest_stat;
|-->...
|-->force_balance:
        calculate_imbalance(env, &sds); //compute the imbalance by comparing this group & the busiest group against avg_load
        return sds.busiest;
|-->out_balanced:
        env->imbalance = 0;
        return NULL;

Understanding:
sgs->avg_load = (sgs->group_load * SCHED_POWER_SCALE) / sgs->group_power;
"CPU power of this group, SCHED_LOAD_SCALE being max power for a single CPU."
SCHED_POWER_SCALE is the maximum load capacity of a single CPU, and sgs->group_power is the combined capacity of all CPUs in the group, so multiplying the group's total load sgs->group_load by SCHED_POWER_SCALE/sgs->group_power gives the average load carried by each CPU in the group.
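To make that last formula concrete, here is a minimal standalone sketch (plain userspace C, not kernel code; the group_load and group_power values are invented for illustration) that plugs numbers into the same expression:

/*
 * Toy calculation of the per-group average load used by find_busiest_group().
 * The constant 1024 stands in for SCHED_POWER_SCALE; the group values are
 * made-up numbers for a hypothetical 2-CPU group.
 */
#include <stdio.h>

#define SCHED_POWER_SCALE 1024UL

int main(void)
{
	unsigned long group_load  = 3072;                   /* total load on the group */
	unsigned long group_power = 2 * SCHED_POWER_SCALE;  /* two full-capacity CPUs */

	/* sgs->avg_load = (sgs->group_load * SCHED_POWER_SCALE) / sgs->group_power */
	unsigned long avg_load = group_load * SCHED_POWER_SCALE / group_power;

	/* 3072 * 1024 / 2048 = 1536 */
	printf("avg_load = %lu\n", avg_load);
	return 0;
}

With group_power = 2048 and group_load = 3072, avg_load comes out as 1536: each CPU in the group is carrying about 1.5 times the 1024 scale of a single full-capacity CPU, which is what the groups are compared on.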
idle balance happens in schedule: when picking the next process to run and finding nothing runnable, the scheduler considers pulling processes over from other CPUs in its sched_domains to execute (the call chain is below, followed by a toy sketch).
static void __sched __schedule(void)
|-->if (unlikely(!rq->nr_running))
        idle_balance(cpu, rq);
        |-->load_balance(this_cpu, this_rq, sd, CPU_NEWLY_IDLE, ...)
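A minimal sketch of the newly-idle shape, as I read it: the local queue is empty, so walk the domain levels and stop at the first level that yields a task. This is a toy model, not the kernel function; toy_rq, toy_load_balance and NR_DOMAIN_LEVELS are invented names.

/*
 * Toy model of newly-idle balancing: try each domain level in turn,
 * stop as soon as one task has been pulled.
 */
#include <stdio.h>

#define NR_DOMAIN_LEVELS 3

struct toy_rq { int nr_running; };

/* stand-in for load_balance(..., CPU_NEWLY_IDLE, ...) at one domain level */
static int toy_load_balance(struct toy_rq *this_rq, int level)
{
	/* pretend only level 1 (say, the package level) has a task to spare */
	if (level == 1) {
		this_rq->nr_running++;
		return 1;
	}
	return 0;
}

static void toy_idle_balance(struct toy_rq *this_rq)
{
	for (int level = 0; level < NR_DOMAIN_LEVELS; level++) {
		if (toy_load_balance(this_rq, level)) {
			printf("pulled a task at domain level %d\n", level);
			break;			/* one runnable task is enough */
		}
	}
}

int main(void)
{
	struct toy_rq rq = { .nr_running = 0 };

	if (rq.nr_running == 0)		/* mirrors if (!rq->nr_running) in __schedule */
		toy_idle_balance(&rq);
	return 0;
}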
When load balancing is triggered (a toy model of the tick path follows this list):
1) scheduler_tick updates the CPU load (update_cpu_load_active), then calls trigger_load_balance to see whether balancing is needed; if it is time, the softirq is raised via raise_softirq(SCHED_SOFTIRQ).
2) In __schedule, idle_balance is called when the runqueue is found to be empty.
3) The softirq handler then carries out the periodic balancing work.
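A toy model of paths 1) and 3), assuming the tick-to-softirq handshake works as sketched here (a per-rq next_balance timestamp checked on every tick, in the spirit of trigger_load_balance); the struct and the pending flag are stand-ins, not the real kernel objects:

/*
 * Toy model: every "tick" checks whether this rq is due for rebalancing;
 * if so, a pending flag stands in for raise_softirq(SCHED_SOFTIRQ), and
 * the "handler" pushes next_balance into the future.
 */
#include <stdbool.h>
#include <stdio.h>

static unsigned long jiffies;			/* toy tick counter */

struct toy_rq {
	unsigned long next_balance;		/* next moment this CPU should rebalance */
	bool sched_softirq_pending;
};

/* called from the toy "scheduler_tick" */
static void toy_trigger_load_balance(struct toy_rq *rq)
{
	if (jiffies >= rq->next_balance)
		rq->sched_softirq_pending = true;	/* raise_softirq(SCHED_SOFTIRQ) */
}

int main(void)
{
	struct toy_rq rq = { .next_balance = 10 };

	for (jiffies = 0; jiffies < 20; jiffies++) {
		toy_trigger_load_balance(&rq);
		if (rq.sched_softirq_pending) {
			printf("tick %lu: softirq raised -> run_rebalance_domains\n", jiffies);
			rq.sched_softirq_pending = false;
			rq.next_balance = jiffies + 4;	/* arbitrary toy interval */
		}
	}
	return 0;
}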
The types of balancing:
1) pull balance, executed by the softirq handler.
2) push balance (active balance), executed on the busiest CPU's stopper thread via stop_one_cpu_nowait() running active_load_balance_cpu_stop.
3) idle_balance, executed directly (inline in __schedule).
The idle types are CPU_NEWLY_IDLE, CPU_NOT_IDLE and CPU_IDLE (a toy sketch of how the type affects the balance interval follows this list):
1) In rebalance_domains: idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE
2) When idle_balance runs from __schedule(), CPU_NEWLY_IDLE is passed in.
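For what it is worth, my understanding of this kernel era is that the idle type mainly changes how often rebalance_domains is willing to balance: a CPU that is not idle stretches sd->balance_interval by sd->busy_factor, while an idle CPU balances at the base interval. A hedged toy calculation (field names follow struct sched_domain, but the numbers are made up and this is not kernel code):

/*
 * Toy calculation only: how the idle type is believed to scale the balance
 * interval in rebalance_domains() -- a non-idle CPU multiplies
 * sd->balance_interval by sd->busy_factor, an idle CPU does not.
 */
#include <stdio.h>

enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

struct toy_sd {
	unsigned long balance_interval;	/* base interval, in ms */
	unsigned int  busy_factor;	/* how much a busy CPU backs off */
};

static unsigned long toy_interval(const struct toy_sd *sd, enum cpu_idle_type idle)
{
	unsigned long interval = sd->balance_interval;

	if (idle != CPU_IDLE)		/* busy CPUs rebalance less often */
		interval *= sd->busy_factor;
	return interval;
}

int main(void)
{
	struct toy_sd sd = { .balance_interval = 8, .busy_factor = 32 };

	/* CPU_NEWLY_IDLE never reaches this path: it goes through idle_balance() */
	printf("CPU_IDLE     -> every %lu ms\n", toy_interval(&sd, CPU_IDLE));
	printf("CPU_NOT_IDLE -> every %lu ms\n", toy_interval(&sd, CPU_NOT_IDLE));
	return 0;
}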
Two days spent walking through the main steps, and I still haven't really got the hang of it... Maybe I should stop here; there are still plenty of datasheets left to read...