Load Balancing (Part 3)
Load balancing for real-time processes is handled when processes are created and destroyed, so the system's load-balancing machinery is mostly aimed at CFS.
v3.14.25
Load balancing works in two directions, pull and push (a toy sketch contrasting the two follows this list):
1) pull: a lightly loaded CPU pulls tasks over from a heavily loaded CPU and runs them. This should be the primary mode, because a CPU that is already busy should not also be burdened with the balancing work itself. This corresponds to load balance.
2) push: a heavily loaded CPU pushes tasks out to a lightly loaded CPU so that it helps run them. This corresponds to active balance.
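As a rough illustration of the two directions, here is a minimal userspace sketch. It is not kernel code: toy_rq, pull_task and push_task are invented names, and real migration involves runqueue locking, task selection and much more; the only point is who initiates the move.

/*
 * Toy model only: pull is initiated by the light CPU, push (active balance)
 * by the busy CPU. The migration itself is the same either way.
 */
#include <stdio.h>

struct toy_rq {
	int cpu;
	int nr_running;
};

/* pull: the light CPU does the work, taking a task from the busy CPU */
static void pull_task(struct toy_rq *light, struct toy_rq *busy)
{
	if (busy->nr_running > light->nr_running + 1) {
		busy->nr_running--;
		light->nr_running++;
		printf("cpu%d pulled a task from cpu%d\n", light->cpu, busy->cpu);
	}
}

/* push (active balance): the busy CPU initiates, handing a task to the light CPU */
static void push_task(struct toy_rq *busy, struct toy_rq *light)
{
	if (busy->nr_running > light->nr_running + 1) {
		busy->nr_running--;
		light->nr_running++;
		printf("cpu%d pushed a task to cpu%d\n", busy->cpu, light->cpu);
	}
}

int main(void)
{
	struct toy_rq a = { .cpu = 0, .nr_running = 5 };
	struct toy_rq b = { .cpu = 1, .nr_running = 1 };

	pull_task(&b, &a);	/* normal load balance: initiated by the light side */
	push_task(&a, &b);	/* active balance: initiated by the busy side */
	return 0;
}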
The entry points for load balancing are run_rebalance_domains (the SCHED_SOFTIRQ handler) and idle_balance (called from schedule).
start_kernel()
|-->sched_init()
    |-->init_sched_fair_class()
        |-->open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);

run_rebalance_domains
|-->rebalance_domains(this_rq, idle);
|-->nohz_idle_balance(this_rq, idle); //ignore

void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
|-->int continue_balancing = 1;
    int cpu = rq->cpu;
    struct sched_domain *sd;
    unsigned long next_balance = jiffies + 60*HZ;
|-->for_each_domain(cpu, sd) {
        if (!(sd->flags & SD_LOAD_BALANCE))
            continue;
        if (!continue_balancing)
            break; //already balanced at this sched_domain level, no need to go on to the higher levels
        if (time_after_eq(jiffies, sd->last_balance + interval)) { //the balance interval has elapsed
            if (load_balance(cpu, rq, sd, idle, &continue_balancing))
                idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
            sd->last_balance = jiffies;
        }
    }
|-->rq->next_balance = next_balance;

load_balance proceeds in these steps:
a) find_busiest_group
b) find_busiest_queue
c) move_tasks
d) active balance (only if nothing could be moved)

int load_balance(int this_cpu, struct rq *this_rq, struct sched_domain *sd,
                 enum cpu_idle_type idle, int *continue_balancing)
|-->int ld_moved, cur_ld_moved, active_balance = 0;
    struct sched_domain *sd_parent = sd->parent;
    struct sched_group *group;
    struct rq *busiest;
    unsigned long flags;
    struct cpumask *cpus = __get_cpu_var(load_balance_mask);
|-->struct lb_env env = {
        .sd           = sd,
        .dst_cpu      = this_cpu,
        .dst_rq       = this_rq,
        .dst_grpmask  = sched_group_cpus(sd->groups),
        .idle         = idle,
        .loop_break   = sched_nr_migrate_break,
        .cpus         = cpus,
        .fbq_type     = all,
    };
|-->if (!should_we_balance(&env)) {
        *continue_balancing = 0;
        goto out_balanced;
    }
|-->group = find_busiest_group(&env);
|-->busiest = find_busiest_queue(&env, group);
|-->if (busiest->nr_running > 1) {
        env.flags |= LBF_ALL_PINNED;
        env.src_cpu = busiest->cpu;
        env.src_rq = busiest;
        env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
        cur_ld_moved = move_tasks(&env);
        ld_moved += cur_ld_moved;
        if (cur_ld_moved && env.dst_cpu != smp_processor_id())
            resched_cpu(env.dst_cpu);
    }
|-->if (!ld_moved) {
        if (need_active_balance(&env)) { //active balance
            if (active_balance) {
                stop_one_cpu_nowait(cpu_of(busiest),
                    active_load_balance_cpu_stop, busiest,
                    &busiest->active_balance_work);
            }
        }
    }

struct sched_group *find_busiest_group(struct lb_env *env)
|-->struct sg_lb_stats *local, *busiest;
    struct sd_lb_stats sds;
|-->update_sd_lb_stats(env, &sds); //walk the circular list of sched_groups under this sd and gather load statistics
|-->local = &sds.local_stat;
    busiest = &sds.busiest_stat;
|-->...
|-->force_balance:
        calculate_imbalance(env, &sds); //compute the imbalance by comparing this group & the busiest group against avg_load
        return sds.busiest;
|-->out_balanced:
        env->imbalance = 0;
        return NULL;

Understanding:
sgs->avg_load = (sgs->group_load * SCHED_POWER_SCALE) / sgs->group_power;
"CPU power of this group, SCHED_LOAD_SCALE being max power for a single CPU."
SCHED_POWER_SCALE is the maximum load capacity of a single CPU, and sgs->group_power is the combined capacity of all CPUs in the group, so multiplying the group's total load sgs->group_load by SCHED_POWER_SCALE/sgs->group_power gives the average load carried by each CPU in the group.
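To make that last formula concrete, here is a minimal standalone sketch (plain userspace C, not kernel code; the group_load and group_power values are invented for illustration) that plugs numbers into the same expression:

/*
 * Toy calculation of the per-group average load used by find_busiest_group().
 * The constant 1024 stands in for SCHED_POWER_SCALE; the group values are
 * made-up numbers for a hypothetical 2-CPU group.
 */
#include <stdio.h>

#define SCHED_POWER_SCALE 1024UL

int main(void)
{
	unsigned long group_load  = 3072;                   /* total load on the group */
	unsigned long group_power = 2 * SCHED_POWER_SCALE;  /* two full-capacity CPUs */

	/* sgs->avg_load = (sgs->group_load * SCHED_POWER_SCALE) / sgs->group_power */
	unsigned long avg_load = group_load * SCHED_POWER_SCALE / group_power;

	/* 3072 * 1024 / 2048 = 1536 */
	printf("avg_load = %lu\n", avg_load);
	return 0;
}

With group_power = 2048 and group_load = 3072, avg_load comes out as 1536: each CPU in the group is carrying about 1.5 times the 1024 scale of a single full-capacity CPU, which is what the groups are compared on.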
idle balance happens in schedule: when picking the next process to run and finding nothing runnable, the scheduler considers pulling processes over from other CPUs in its sched_domains to execute (the call chain is below, followed by a toy sketch).
static void __sched __schedule(void)
|-->if (unlikely(!rq->nr_running))
        idle_balance(cpu, rq);
        |-->load_balance(this_cpu, this_rq, sd, CPU_NEWLY_IDLE, ...)
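A minimal sketch of the newly-idle shape, as I read it: the local queue is empty, so walk the domain levels and stop at the first level that yields a task. This is a toy model, not the kernel function; toy_rq, toy_load_balance and NR_DOMAIN_LEVELS are invented names.

/*
 * Toy model of newly-idle balancing: try each domain level in turn,
 * stop as soon as one task has been pulled.
 */
#include <stdio.h>

#define NR_DOMAIN_LEVELS 3

struct toy_rq { int nr_running; };

/* stand-in for load_balance(..., CPU_NEWLY_IDLE, ...) at one domain level */
static int toy_load_balance(struct toy_rq *this_rq, int level)
{
	/* pretend only level 1 (say, the package level) has a task to spare */
	if (level == 1) {
		this_rq->nr_running++;
		return 1;
	}
	return 0;
}

static void toy_idle_balance(struct toy_rq *this_rq)
{
	for (int level = 0; level < NR_DOMAIN_LEVELS; level++) {
		if (toy_load_balance(this_rq, level)) {
			printf("pulled a task at domain level %d\n", level);
			break;			/* one runnable task is enough */
		}
	}
}

int main(void)
{
	struct toy_rq rq = { .nr_running = 0 };

	if (rq.nr_running == 0)		/* mirrors if (!rq->nr_running) in __schedule */
		toy_idle_balance(&rq);
	return 0;
}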
When load balancing is triggered (a toy model of the tick path follows this list):
1) scheduler_tick updates the CPU load (update_cpu_load_active), then calls trigger_load_balance to see whether balancing is needed; if it is time, the softirq is raised via raise_softirq(SCHED_SOFTIRQ).
2) In __schedule, idle_balance is called when the runqueue is found to be empty.
3) The softirq handler then carries out the periodic balancing work.
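A toy model of paths 1) and 3), assuming the tick-to-softirq handshake works as sketched here (a per-rq next_balance timestamp checked on every tick, in the spirit of trigger_load_balance); the struct and the pending flag are stand-ins, not the real kernel objects:

/*
 * Toy model: every "tick" checks whether this rq is due for rebalancing;
 * if so, a pending flag stands in for raise_softirq(SCHED_SOFTIRQ), and
 * the "handler" pushes next_balance into the future.
 */
#include <stdbool.h>
#include <stdio.h>

static unsigned long jiffies;			/* toy tick counter */

struct toy_rq {
	unsigned long next_balance;		/* next moment this CPU should rebalance */
	bool sched_softirq_pending;
};

/* called from the toy "scheduler_tick" */
static void toy_trigger_load_balance(struct toy_rq *rq)
{
	if (jiffies >= rq->next_balance)
		rq->sched_softirq_pending = true;	/* raise_softirq(SCHED_SOFTIRQ) */
}

int main(void)
{
	struct toy_rq rq = { .next_balance = 10 };

	for (jiffies = 0; jiffies < 20; jiffies++) {
		toy_trigger_load_balance(&rq);
		if (rq.sched_softirq_pending) {
			printf("tick %lu: softirq raised -> run_rebalance_domains\n", jiffies);
			rq.sched_softirq_pending = false;
			rq.next_balance = jiffies + 4;	/* arbitrary toy interval */
		}
	}
	return 0;
}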
The types of balancing:
1) pull balance, executed by the softirq handler.
2) push balance (active balance), executed on the busiest CPU's stopper thread via stop_one_cpu_nowait() running active_load_balance_cpu_stop.
3) idle_balance, executed directly (inline in __schedule).
The idle types are CPU_NEWLY_IDLE, CPU_NOT_IDLE and CPU_IDLE (a toy sketch of how the type affects the balance interval follows this list):
1) In rebalance_domains: idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE
2) When idle_balance runs from __schedule(), CPU_NEWLY_IDLE is passed in.
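For what it is worth, my understanding of this kernel era is that the idle type mainly changes how often rebalance_domains is willing to balance: a CPU that is not idle stretches sd->balance_interval by sd->busy_factor, while an idle CPU balances at the base interval. A hedged toy calculation (field names follow struct sched_domain, but the numbers are made up and this is not kernel code):

/*
 * Toy calculation only: how the idle type is believed to scale the balance
 * interval in rebalance_domains() -- a non-idle CPU multiplies
 * sd->balance_interval by sd->busy_factor, an idle CPU does not.
 */
#include <stdio.h>

enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

struct toy_sd {
	unsigned long balance_interval;	/* base interval, in ms */
	unsigned int  busy_factor;	/* how much a busy CPU backs off */
};

static unsigned long toy_interval(const struct toy_sd *sd, enum cpu_idle_type idle)
{
	unsigned long interval = sd->balance_interval;

	if (idle != CPU_IDLE)		/* busy CPUs rebalance less often */
		interval *= sd->busy_factor;
	return interval;
}

int main(void)
{
	struct toy_sd sd = { .balance_interval = 8, .busy_factor = 32 };

	/* CPU_NEWLY_IDLE never reaches this path: it goes through idle_balance() */
	printf("CPU_IDLE     -> every %lu ms\n", toy_interval(&sd, CPU_IDLE));
	printf("CPU_NOT_IDLE -> every %lu ms\n", toy_interval(&sd, CPU_NOT_IDLE));
	return 0;
}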
Two days spent walking through the main steps, and I still haven't really got the hang of it... Maybe I should stop here; there are still plenty of datasheets left to read...