Load balancing redistributes work across system resources (e.g., CPU, memory) to improve system performance, scalability (no matter how many processes contend) and utilization (idle resources can be put to use immediately). In this article, I mainly present how load balancing for CPU resources works inside an operating system, using the Linux Kernel (v4.7.4) as the example.
Load Balancing Call Stack Inside the Kernel
tick_periodic -> update_process_times -> scheduler_tick -> trigger_load_balance -> raise_softirq
SoftIRQ -> run_rebalance_domains -> rebalance_domains -> load_balance
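As the call chains above show, the kernel checks on every scheduler tick whether load balancing is due and, if so, raises the scheduler softirq (SCHED_SOFTIRQ). The softirq handler then calls load_balance for each scheduling domain whose balancing interval has expired. The tick rate itself (HZ) is chosen through CONFIG_HZ in .config when you compile your kernel.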
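To make the timing concrete, here is a toy, user-space model of the tick/softirq split. It assumes one run queue with two scheduling domains and time counted in ticks; all names (struct toy_rq, toy_scheduler_tick(), toy_rebalance_domains()) are hypothetical. For simplicity the softirq work is called directly from the tick instead of being deferred, and NOHZ idle balancing, serialization and interval scaling are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_DOMAINS 2

/* Hypothetical, simplified scheduling domain: time is in ticks. */
struct toy_domain {
    unsigned long last_balance;     /* tick of the last balance attempt */
    unsigned long balance_interval; /* ticks between attempts */
    int balances;                   /* how many times "load_balance" ran */
};

/* Hypothetical per-CPU run queue with its domains. */
struct toy_rq {
    unsigned long next_balance;     /* earliest tick at which to raise the softirq */
    struct toy_domain sd[TOY_NR_DOMAINS];
};

/* Wrap-safe "a is at or after b", in the spirit of the kernel's time_after_eq(). */
static bool toy_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Softirq side: balance every domain whose interval has expired, then
 * set next_balance to the earliest upcoming deadline. */
static void toy_rebalance_domains(struct toy_rq *rq, unsigned long now)
{
    unsigned long next = now + 60000; /* arbitrary far-future default */

    for (int i = 0; i < TOY_NR_DOMAINS; i++) {
        struct toy_domain *sd = &rq->sd[i];
        assert(sd->balance_interval > 0);
        if (toy_time_after_eq(now, sd->last_balance + sd->balance_interval)) {
            sd->balances++;           /* stands in for load_balance() */
            sd->last_balance = now;
        }
        unsigned long due = sd->last_balance + sd->balance_interval;
        if (!toy_time_after_eq(due, next)) /* due is earlier than next */
            next = due;
    }
    rq->next_balance = next;
}

/* Tick side: returns true when the softirq would be raised on this tick. */
static bool toy_scheduler_tick(struct toy_rq *rq, unsigned long now)
{
    if (!toy_time_after_eq(now, rq->next_balance))
        return false;
    toy_rebalance_domains(rq, now);
    return true;
}
```

Starting from a 4-tick and a 16-tick domain and running ticks 1 to 32, the softirq is raised only 9 times and the outer domain is balanced just twice, which illustrates why the per-tick check is cheap: most ticks do nothing beyond one comparison.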
load_balance (linux/kernel/sched/fair.c) -> find_busiest_group -> find_busiest_queue -> if the busiest run queue has nr_running > 1, detach_tasks and attach_tasks pull tasks over to the current CPU -> if this fails, active_balance is set and the migration thread on the source (busiest) CPU is woken up to push a task.
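The decision flow above can be sketched as a toy model in C. It assumes that "busiest" simply means the CPU with the most runnable tasks (at least two more than the current CPU) and that some queued tasks cannot migrate, tracked by a hypothetical nr_movable count; the real find_busiest_group() compares weighted load across scheduling groups instead. All names are hypothetical, and in this sketch a failed pull immediately sets active_balance, whereas the kernel applies further conditions first.

```c
#include <assert.h>
#include <stdbool.h>

#define LB_NR_CPUS 4

/* Hypothetical per-CPU run queue. */
struct lb_rq {
    int nr_running;      /* runnable tasks, including the one on the CPU */
    int nr_movable;      /* queued (not running) tasks allowed to migrate */
    bool active_balance; /* set when a push by the migration thread is needed */
};

/* Toy stand-in for find_busiest_group() + find_busiest_queue(): the CPU
 * with the most tasks, provided it has at least two more than this_cpu;
 * -1 if no such CPU exists. */
static int lb_find_busiest(const struct lb_rq rqs[], int this_cpu)
{
    int busiest = -1;
    int max = rqs[this_cpu].nr_running + 1;

    for (int cpu = 0; cpu < LB_NR_CPUS; cpu++) {
        if (cpu != this_cpu && rqs[cpu].nr_running > max) {
            max = rqs[cpu].nr_running;
            busiest = cpu;
        }
    }
    return busiest;
}

/* Pull tasks to this_cpu; returns how many tasks were moved. */
static int lb_load_balance(struct lb_rq rqs[], int this_cpu)
{
    assert(this_cpu >= 0 && this_cpu < LB_NR_CPUS);
    int busiest = lb_find_busiest(rqs, this_cpu);
    if (busiest < 0)
        return 0; /* already balanced in this toy model */

    struct lb_rq *src = &rqs[busiest], *dst = &rqs[this_cpu];
    int moved = 0;
    if (src->nr_running > 1) {
        /* detach_tasks() + attach_tasks(): pull movable tasks until the
         * two queues differ by at most one. */
        while (src->nr_running - dst->nr_running > 1 && src->nr_movable > 0) {
            src->nr_running--;
            src->nr_movable--;
            if (dst->nr_running > 0) /* an idle CPU runs the task at once */
                dst->nr_movable++;
            dst->nr_running++;
            moved++;
        }
    }
    if (moved == 0)
        src->active_balance = true; /* ask the busiest CPU's stopper to push */
    return moved;
}
```

For example, with run queues of 0, 5, 1 and 2 tasks (4 of the 5 on CPU 1 movable), balancing from CPU 0 pulls 2 tasks and leaves CPU 0 and CPU 1 with 2 and 3. If every queued task on the busiest CPU is pinned, nothing moves and active_balance is set instead.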
The migration thread has two key tasks:
1. Fulfill migration requests originating from the scheduler class.
Call stack:
sched_setaffinity -> __set_cpus_allowed_ptr -> stop_one_cpu -> migration_cpu_stop -> __migrate_task -> move_queued_task -> enqueue_task (move_queued_task then returns the run queue of the destination CPU)
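From user space, this path is entered simply by calling sched_setaffinity(2). The small C sketch below (Linux and glibc only; pin_to_first_allowed_cpu() is a hypothetical helper name) restricts the calling thread to the first CPU it is currently allowed to use. If the thread happens to be running on another CPU, the kernel must migrate it before the call returns, so sched_getcpu() afterwards should report the chosen CPU.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Restrict the calling thread to the first CPU it is currently allowed
 * to run on and return that CPU number, or -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, target;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        /* If we are running elsewhere, the kernel moves us to `cpu`
         * (via the per-CPU stopper) before this call returns. */
        if (sched_setaffinity(0, sizeof(target), &target) != 0)
            return -1;
        return cpu;
    }
    return -1;
}
```

Picking a CPU from the current affinity mask, rather than hard-coding CPU 0, keeps the sketch working inside containers or cpusets that exclude some CPUs.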
2. Implement active balancing.
Call stack:
load_balance -> stop_one_cpu_nowait (packs the request into a work item for the migration thread) -> active_load_balance_cpu_stop -> detach_one_task -> attach_one_task
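Unlike stop_one_cpu(), stop_one_cpu_nowait() does not wait for the queued work to finish. The toy model below illustrates the resulting two-phase pattern under simplifying assumptions (all names hypothetical, one work slot per CPU, no real threads): load_balance only queues a request on the busiest CPU's stopper, and the task moves later, when that stopper runs and pushes one task to push_cpu.

```c
#include <assert.h>
#include <stdbool.h>

#define AB_NR_CPUS 2

typedef int (*ab_stop_fn)(void *arg);

/* Hypothetical per-CPU stopper with a single pending work slot. */
struct ab_stopper {
    ab_stop_fn fn;
    void *arg;
    bool pending;
};

/* Hypothetical run queue: just enough state for active balancing. */
struct ab_rq {
    int nr_running;      /* tasks on this CPU, not counting the stopper */
    int push_cpu;        /* destination chosen by load_balance() */
    bool active_balance; /* a push request is outstanding */
};

static struct ab_rq ab_rqs[AB_NR_CPUS];
static struct ab_stopper ab_stoppers[AB_NR_CPUS];

/* Like stop_one_cpu_nowait(): queue the work and return immediately. */
static void ab_stop_one_cpu_nowait(int cpu, ab_stop_fn fn, void *arg)
{
    ab_stoppers[cpu] = (struct ab_stopper){ fn, arg, true };
}

/* Like active_load_balance_cpu_stop(): runs on the busiest CPU. The
 * stopper has preempted whatever was running there, so that task is
 * now merely queued and can be detached. */
static int ab_active_balance_stop(void *arg)
{
    struct ab_rq *busiest = arg;
    struct ab_rq *target = &ab_rqs[busiest->push_cpu];

    if (busiest->nr_running > 0) {
        busiest->nr_running--; /* detach_one_task() */
        target->nr_running++;  /* attach_one_task() */
    }
    busiest->active_balance = false;
    return 0;
}

/* load_balance() side: request a push from busiest_cpu to this_cpu. */
static void ab_request_active_balance(int busiest_cpu, int this_cpu)
{
    assert(busiest_cpu != this_cpu);
    struct ab_rq *busiest = &ab_rqs[busiest_cpu];

    if (busiest->active_balance)
        return; /* a request is already in flight */
    busiest->active_balance = true;
    busiest->push_cpu = this_cpu;
    ab_stop_one_cpu_nowait(busiest_cpu, ab_active_balance_stop, busiest);
}

/* The stopper thread on `cpu` finally runs and executes its queued work. */
static void ab_run_stopper(int cpu)
{
    struct ab_stopper *st = &ab_stoppers[cpu];

    if (st->pending) {
        st->pending = false;
        st->fn(st->arg);
    }
}
```

Between ab_request_active_balance(1, 0) and ab_run_stopper(1) nothing has moved yet; once the stopper runs, the single task from CPU 1 is on CPU 0 and the active_balance flag is cleared.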
The following comment from the kernel source describes how migration works. Note that the "stopper" in the kernel is the per-CPU migration thread. You can also look into my previous articles about how the migration thread works.
/*
 * This is how migration works:
 *
 * 1) we invoke migration_cpu_stop() on the target CPU using
 *    stop_one_cpu().
 * 2) stopper starts to run (implicitly forcing the migrated thread
 *    off the CPU)
 * 3) it checks whether the migrated task is still in the wrong runqueue.
 * 4) if it's in the wrong runqueue then the migration thread removes
 *    it and puts it into the right queue.
 * 5) stopper completes and stop_one_cpu() returns and the migration
 *    is done.
 */
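Steps 3 and 4 matter because, by the time the stopper runs, the task may already have moved or stopped being queued, so the stopper re-checks before touching it. Below is a toy C model of that logic. It assumes a task is described only by its current CPU, a queued flag and an affinity bitmask; all names are hypothetical, the destination is simply the lowest allowed CPU, and the synchronous stop_one_cpu() call is collapsed into a direct function call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task: bit n of `allowed` means CPU n is permitted. */
struct mig_task {
    int cpu;               /* CPU whose run queue holds the task */
    unsigned long allowed; /* affinity mask */
    bool queued;           /* on a run queue (runnable) */
};

struct mig_arg {
    struct mig_task *task;
    int dest_cpu;
};

static bool mig_cpu_allowed(const struct mig_task *p, int cpu)
{
    return (p->allowed >> cpu) & 1UL;
}

/* Runs as the stopper on this_cpu (steps 3 and 4). Returns 1 if moved. */
static int mig_migration_cpu_stop(int this_cpu, struct mig_arg *arg)
{
    struct mig_task *p = arg->task;

    assert(arg->dest_cpu >= 0 && arg->dest_cpu < (int)(8 * sizeof p->allowed));
    /* Step 3: act only if the task is still queued on this CPU. */
    if (p->cpu != this_cpu || !p->queued)
        return 0;
    /* Never move a task to a CPU it may not use. */
    if (!mig_cpu_allowed(p, arg->dest_cpu))
        return 0;
    /* Step 4: dequeue from the old queue, enqueue on the new one. */
    p->cpu = arg->dest_cpu;
    return 1;
}

/* Caller side (steps 1, 2 and 5): returns the task's CPU afterwards. */
static int mig_set_cpus_allowed(struct mig_task *p, unsigned long new_mask)
{
    if (new_mask == 0)
        return -1;
    p->allowed = new_mask;
    if (mig_cpu_allowed(p, p->cpu))
        return p->cpu; /* already on an allowed CPU: no migration needed */

    int dest = 0;
    while (!mig_cpu_allowed(p, dest))
        dest++; /* lowest allowed CPU (an assumption of this sketch) */

    struct mig_arg arg = { p, dest };
    /* stop_one_cpu() runs the callback on the task's CPU and waits. */
    mig_migration_cpu_stop(p->cpu, &arg);
    return p->cpu;
}
```

Narrowing a task on CPU 2 to the mask 0x3 moves it to CPU 0, while a stale request delivered to a CPU the task has already left is simply ignored, which is exactly the check in step 3.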
Conclusion and Future Work
This article mainly discussed how load balancing works inside the Linux Kernel. Two questions remain open: (1) Which CPU actually performs the load-balancing work? (2) What are the mechanisms and rules for load balancing in the Linux Kernel? More precisely, in which state does the kernel consider the system balanced, and in which state unbalanced? I will address these two questions in a later article.