How Load Balancing Works Inside Operating Systems: Linux as an Example

Introduction

Load balancing redistributes system resources (e.g., CPU and memory) to improve system performance, scalability (the system keeps behaving well no matter how many processes contend for resources), and usability (idle resources are put to use immediately). In this article, I mainly present how load balancing for CPU resources works inside an operating system, using the Linux kernel (v4.7.4) as the example.

Load Balance Stack Inside Kernel

tick_periodic
->update_process_times
->scheduler_tick
->trigger_load_balance
->raise_softirq
SoftIRQ
->run_rebalance_domains
->rebalance_domains
->load_balance

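As the call stack above shows, the kernel checks on every scheduler tick whether load balancing is due. When the run queue's next balance time has been reached, trigger_load_balance raises the scheduler softirq (SCHED_SOFTIRQ), and the softirq handler run_rebalance_domains later calls load_balance. The tick frequency itself is a build-time setting (CONFIG_HZ, which can be set in .config when you compile your kernel).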
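To make the split between the tick path and the softirq path concrete, here is a minimal user-space sketch in C. It is not kernel code: the tick_rq structure, the toy_ functions, and the fixed balance interval are assumptions made for illustration (the real kernel keeps a balance interval per scheduling domain and adjusts it). The sketch only shows the pattern of a cheap check on every tick that defers the real work to a softirq handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU run queue state used by the tick path. */
struct tick_rq {
    unsigned long next_balance;  /* tick count at which balancing is next due */
    unsigned long interval;      /* ticks between attempts (assumed fixed here) */
    bool softirq_pending;        /* models a raised SCHED_SOFTIRQ */
    unsigned long balance_runs;  /* how many times the handler has run */
};

/* Wrap-safe "a >= b", in the spirit of the kernel's time_after_eq(). */
bool toy_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Tick path: only mark the softirq as pending; no balancing work here. */
void toy_trigger_load_balance(struct tick_rq *rq, unsigned long now)
{
    if (toy_time_after_eq(now, rq->next_balance))
        rq->softirq_pending = true;
}

/* Softirq path: do the (pretend) balancing and schedule the next attempt. */
void toy_run_rebalance_domains(struct tick_rq *rq, unsigned long now)
{
    if (!rq->softirq_pending)
        return;
    rq->softirq_pending = false;
    rq->balance_runs++;
    rq->next_balance = now + rq->interval;
}

/* Drive `ticks` timer ticks and return how many balance passes ran. */
unsigned long toy_simulate_ticks(unsigned long ticks, unsigned long interval)
{
    struct tick_rq rq = { .next_balance = interval, .interval = interval };

    assert(interval > 0);
    for (unsigned long now = 1; now <= ticks; now++) {
        toy_trigger_load_balance(&rq, now);   /* runs on every tick */
        toy_run_rebalance_domains(&rq, now);  /* runs only if raised */
    }
    return rq.balance_runs;
}
```

Under these assumptions, toy_simulate_ticks(100, 10) runs the balance handler 10 times, once at each of ticks 10, 20, and so on up to 100, even though the check itself runs on all 100 ticks.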

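load_balance (linux/kernel/sched/fair.c) performs the following steps in order:
1. find_busiest_group picks the busiest scheduling group in the domain.
2. find_busiest_queue picks the busiest run queue within that group.
3. If the busiest run queue has nr_running > 1, detach_tasks removes tasks from it and attach_tasks adds them to the local run queue.
4. If this attempt fails to move any task, active_balance is set and the migration thread on the source CPU is woken up.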
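The flow above can be sketched in user space as follows. This is a simplified model, not the kernel's algorithm: the lb_rq structure, the lb_ functions, the nr_movable counter, and the "move half of the difference" rule are all assumptions for illustration. The real find_busiest_group works on load statistics per scheduling group rather than on raw task counts, and the kernel applies further conditions before it resorts to active balancing.

```c
#include <assert.h>

/* Hypothetical per-CPU run queue for this sketch. */
struct lb_rq {
    unsigned int nr_running;  /* runnable tasks, including the one on the CPU */
    unsigned int nr_movable;  /* tasks that may be detached (not running, affinity allows) */
    int active_balance;       /* set when pulling failed and the stopper must push */
};

/* Return the CPU with more runnable tasks than this_cpu, or -1 if none. */
int lb_find_busiest(const struct lb_rq *rqs, int nr_cpus, int this_cpu)
{
    int busiest = -1;
    unsigned int max = rqs[this_cpu].nr_running;

    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        if (cpu != this_cpu && rqs[cpu].nr_running > max) {
            max = rqs[cpu].nr_running;
            busiest = cpu;
        }
    }
    return busiest;
}

/* Pull tasks from the busiest CPU to this_cpu; return the number moved. */
unsigned int lb_load_balance(struct lb_rq *rqs, int nr_cpus, int this_cpu)
{
    assert(this_cpu >= 0 && this_cpu < nr_cpus);

    int src = lb_find_busiest(rqs, nr_cpus, this_cpu);
    if (src < 0)
        return 0;  /* nothing busier than us: treat as balanced */

    struct lb_rq *busiest = &rqs[src];
    struct lb_rq *dst = &rqs[this_cpu];
    unsigned int imbalance = (busiest->nr_running - dst->nr_running) / 2;
    unsigned int moved = 0;

    if (busiest->nr_running > 1) {
        while (moved < imbalance && busiest->nr_movable > 0) {
            busiest->nr_movable--;  /* detach from the source queue */
            busiest->nr_running--;
            dst->nr_running++;      /* attach to the destination queue */
            moved++;                /* movability on dst is not modelled */
        }
    }

    /* Pulling failed although an imbalance exists: ask the source CPU's
     * migration thread to push a task instead (active balancing). */
    if (imbalance > 0 && moved == 0)
        busiest->active_balance = 1;

    return moved;
}
```

With four queues holding 0, 4, 1, and 2 runnable tasks, calling lb_load_balance for CPU 0 pulls two tasks from CPU 1. If the busiest queue has two runnable tasks but none of them is movable, nothing is pulled and active_balance is set on that queue.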

The migration thread has two key tasks:
1. Fulfill migration requests originating from the scheduler class.
Call stack:

sched_setaffinity
->__set_cpus_allowed_ptr
->stop_one_cpu
->migration_cpu_stop
->__migrate_task
->move_queued_task
->enqueue_task
->returns the new run queue of the destination CPU

2. Implement active balancing.
Call stack:

load_balance
->stop_one_cpu_nowait (packs the request into a work item and queues it for the migration thread)
->active_load_balance_cpu_stop
->detach_one_task
->attach_one_task
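The first path above starts at the sched_setaffinity system call, so it can be exercised from user space. The following sketch uses the glibc wrappers sched_getaffinity and sched_setaffinity; the helper names first_allowed_cpu and pin_to_cpu are my own and not part of any library. If the calling thread is not already on the chosen CPU, the kernel has to migrate it, which is where the migration thread comes in.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}

/* Restrict the calling thread to `cpu`.
 * Returns 0 on success and -1 on error (errno is set by the system call). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

A typical use is to call first_allowed_cpu() to find a CPU the thread is permitted to use and then pass it to pin_to_cpu(). Picking a CPU outside the allowed set (for example inside a restricted cpuset) makes the call fail with EINVAL.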

In addition, the following comment from the kernel source describes how the migration thread works. Note that the "stopper" in the kernel is the per-CPU migration thread. My previous articles look at how the migration thread works in more detail [5][6].

/*
 * This is how migration works:
 *
 * 1) we invoke migration_cpu_stop() on the target CPU using
 *    stop_one_cpu().
 * 2) stopper starts to run (implicitly forcing the migrated thread
 *    off the CPU)
 * 3) it checks whether the migrated task is still in the wrong runqueue.
 * 4) if it's in the wrong runqueue then the migration thread removes
 *    it and puts it into the right queue.
 * 5) stopper completes and stop_one_cpu() returns and the migration
 *    is done.
 */
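Steps 3 and 4 of the comment can be modelled with a small user-space sketch. The mig_task structure and the toy_migration_cpu_stop function are hypothetical names for illustration, and the affinity check on the destination CPU is my reading of what __migrate_task does before moving a task, so treat it as an assumption rather than a description of the exact kernel logic.

```c
#include <assert.h>

/* Hypothetical task descriptor: current CPU plus a bitmask of allowed CPUs. */
struct mig_task {
    int cpu;
    unsigned long allowed_mask;  /* bit n set means CPU n is allowed */
};

/* Model of the stopper's work on stopper_cpu; returns the CPU the task ends on. */
int toy_migration_cpu_stop(struct mig_task *p, int stopper_cpu, int dest_cpu)
{
    assert(dest_cpu >= 0 && dest_cpu < (int)(sizeof(unsigned long) * 8));

    /* Step 3: the task may already have left this run queue, for example
     * because it slept and woke up elsewhere. Then there is nothing to do. */
    if (p->cpu != stopper_cpu)
        return p->cpu;

    /* Assumed check: never move a task to a CPU its affinity forbids. */
    if (!(p->allowed_mask & (1UL << dest_cpu)))
        return p->cpu;

    /* Step 4: remove it from the wrong queue and put it on the right one. */
    p->cpu = dest_cpu;
    return p->cpu;
}
```

The point of re-checking the task's run queue inside the stopper is that the state may have changed between the moment the request was queued and the moment the stopper actually runs.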

Conclusion and Future Work

This article has described how load balancing works inside the Linux kernel. Two questions remain open: 1. Which CPU actually does the load balancing work? 2. What mechanisms and rules does the Linux kernel use for load balancing? More precisely, which states does the kernel regard as balanced and which as unbalanced? I will address these two questions in a later article.

References

[1] Linux Kernel Development, Robert Love
[2] Understanding the Linux Kernel, Daniel P. Bovet, Marco Cesati
[3] Professional Linux Kernel Architecture, Wolfgang Mauerer
[4] https://elixir.free-electrons.com/linux/v4.7.4/source
[5] https://www.systutorials.com/239971/migration-thread-works-inside-linux-kernel/
[6] https://www.systutorials.com/239953/sched_setaffinity-works-inside-of-linux-kernel/

Weiwei Jia

Weiwei Jia has been a Ph.D. student in the Department of Computer Science at New Jersey Institute of Technology since 2016. His research interests include storage systems, operating systems, and computer systems.
