How sched_setaffinity Works in the Linux Kernel
CPU affinity binds a process or thread to specific CPU cores, reducing cache misses, improving performance isolation, and managing NUMA locality. The sched_setaffinity() system call is the primary mechanism in Linux. Understanding how it works at the kernel level clarifies scheduler internals and helps you use it effectively.
System Call Entry Point
The sched_setaffinity() syscall entry point is defined via the SYSCALL_DEFINE3 macro:
SYSCALL_DEFINE3(sched_setaffinity, pid_t, pid, unsigned int, len,
                unsigned long __user *, user_mask_ptr)
Parameters:
- pid: the process ID to apply affinity to (0 means the calling thread)
- len: the size, in bytes, of the CPU mask
- user_mask_ptr: pointer to the user-space cpumask buffer
The kernel validates these inputs before proceeding with the actual affinity operation. Permission checks ensure the caller can only modify affinity for its own tasks or for tasks it is allowed to affect (the same user, or a caller with the CAP_SYS_NICE capability).
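These checks are easy to probe from user space through the documented error returns of sched_setaffinity(2). A minimal sketch, assuming an unprivileged caller and that PID 1 belongs to root:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

int main(void)
{
    cpu_set_t mask;

    /* An empty mask fails validation: the kernel rejects it with EINVAL */
    CPU_ZERO(&mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
        printf("empty mask: %s\n", strerror(errno));   /* EINVAL */

    /* Unprivileged callers cannot retarget other users' tasks: EPERM */
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (sched_setaffinity(1, sizeof(mask), &mask) == -1)
        printf("pid 1: %s\n", strerror(errno));        /* EPERM */

    return 0;
}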
Call Chain and Core Functions
When you invoke sched_setaffinity(), the kernel follows this execution path:
sched_setaffinity()                          /* syscall entry */
└─ do_sched_setaffinity()
   └─ __set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
      ├─ validate and update the task's affinity mask
      └─ stop_one_cpu()                      [if the task is running]
         └─ migration_cpu_stop()
            └─ __migrate_task()
               └─ move_queued_task()
                  └─ enqueue_task()
The __set_cpus_allowed_ptr Function
After validation, __set_cpus_allowed_ptr() contains the core logic. It updates the task’s CPU affinity mask and determines the appropriate migration strategy based on task state:
Task is running or about to run (TASK_WAKING): The kernel calls stop_one_cpu() to send an inter-processor interrupt (IPI) to the source CPU. This preempts the running task and triggers migration_cpu_stop() with the highest priority. This approach is necessary because you cannot safely dequeue a task from a runqueue while it’s executing on that CPU.
Task is queued on a runqueue but not currently running: The kernel performs a direct dequeue-and-enqueue operation without interrupting the source CPU. This avoids IPI overhead for non-running tasks.
Task is blocked or waiting: Only the affinity mask is updated. No queue migration occurs since the task isn’t executing anywhere yet. When the task wakes, the scheduler respects the new affinity constraints.
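The blocked case is easy to observe from user space: change a sleeping thread's affinity, then check where it runs after waking. A minimal sketch using pthread_setaffinity_np() (compile with -pthread; the 2-second sleep and CPU 0 are arbitrary choices):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void *worker(void *arg)
{
    (void)arg;
    sleep(2);    /* blocked: only the mask is updated, no migration yet */
    printf("worker woke up on CPU %d\n", sched_getcpu());
    return NULL;
}

int main(void)
{
    pthread_t t;
    cpu_set_t set;
    int err;

    pthread_create(&t, NULL, worker, NULL);
    sleep(1);    /* give the worker time to block in sleep() */

    CPU_ZERO(&set);
    CPU_SET(0, &set);    /* restrict the sleeping worker to CPU 0 */
    err = pthread_setaffinity_np(t, sizeof(set), &set);
    if (err)
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));

    pthread_join(t, NULL);    /* the worker should report CPU 0 */
    return 0;
}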
The stop_one_cpu Mechanism
For actively running tasks, stop_one_cpu() is critical:
- Sends an IPI to the CPU where the task is currently running
- Preempts the running task and executes migration_cpu_stop() immediately at the highest priority
- Prevents other context switches until the migration completes, ensuring atomicity
This mechanism guarantees that a task cannot execute on a CPU after its affinity mask excludes that CPU.
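In the kernel source this amounts to packaging the destination CPU into a migration_arg and handing it to the stopper. Roughly, simplified from kernel/sched/core.c (field names and locking details vary across kernel versions):

struct migration_arg arg = { .task = p, .dest_cpu = dest_cpu };

/* Drop the runqueue lock, then ask the stopper thread on the task's
 * current CPU to carry out the migration at the highest priority. */
task_rq_unlock(rq, p, &rf);
stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);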
The migration_cpu_stop Function
Once the IPI is delivered, migration_cpu_stop() executes in the per-CPU stopper (cpu_stop) context. In simplified form:
static int migration_cpu_stop(void *data)
{
    struct migration_arg *arg = data;
    struct task_struct *p = arg->task;
    struct rq *rq = this_rq();

    /* Verify the task is still on this CPU; it may already have moved */
    if (task_cpu(p) != smp_processor_id())
        goto out;

    /* Perform the actual migration (locking elided for clarity) */
    __migrate_task(p, rq, arg->dest_cpu);

out:
    /* Return and let the scheduler pick the task up on the destination CPU */
    return 0;
}
Running in the stopper context guarantees that no other task will preempt migration_cpu_stop() on that CPU, preventing race conditions during the migration window.
Moving Between Runqueues
The actual queue migration happens in move_queued_task() and enqueue_task():
- dequeue_task() removes the task from the source CPU's runqueue, with load-balancing adjustments
- The task structure's CPU assignment is updated
- enqueue_task() adds the task to the destination CPU's runqueue
- The scheduler determines whether to run the migrated task immediately (if the destination CPU is idle) or place it at the appropriate queue position based on scheduling class and priority
The operation maintains per-CPU runqueue integrity and load balancing data structures throughout.
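A condensed view of move_queued_task(), simplified from kernel/sched/core.c (the lock-juggling details vary across versions), shows the dequeue/retarget/enqueue sequence:

static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
                                   struct task_struct *p, int new_cpu)
{
    /* Take the task off the source CPU's runqueue */
    deactivate_task(rq, p, DEQUEUE_NOCLOCK);

    /* Retarget the task structure's CPU assignment */
    set_task_cpu(p, new_cpu);

    /* Switch runqueue locks, then enqueue on the destination */
    rq_unlock(rq, rf);
    rq = cpu_rq(new_cpu);
    rq_lock(rq, rf);
    activate_task(rq, p, 0);

    /* Preempt the destination CPU's current task if appropriate */
    check_preempt_curr(rq, p, 0);
    return rq;
}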
Practical Considerations
Immediate Effect for Running Tasks: For actively running tasks, migration happens almost immediately via IPI. The task will be preempted and moved. For queued tasks, migration happens at the next scheduling opportunity.
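You can observe this from the migrated task itself: by the time sched_setaffinity() returns, sched_getcpu() already reports a CPU from the new mask. A minimal sketch (CPU 1 is an arbitrary choice and must be online):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    printf("before: CPU %d\n", sched_getcpu());

    CPU_ZERO(&set);
    CPU_SET(1, &set);    /* assumes CPU 1 exists and is online */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }

    /* The calling task was running, so it has already been migrated */
    printf("after:  CPU %d\n", sched_getcpu());
    return 0;
}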
Validation: The kernel validates that the new mask is valid and contains at least one online CPU. If the current CPU isn’t in the new mask, immediate migration is forced.
NUMA Considerations: On NUMA systems, affinity changes don’t automatically adjust memory locality. You may also want to use numactl or numa_migrate_pages() to move memory pages after changing CPU affinity.
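For example, with numactl you can bind CPU placement and memory allocation together (node 0 here is arbitrary):

# Inspect the node layout first
numactl --hardware
# Bind both CPUs and memory allocations to NUMA node 0
numactl --cpunodebind=0 --membind=0 ./my_program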
Busy Loops and Polling: Because the stopper runs in the highest-priority scheduling class, even a task spinning in a tight user-space loop is preempted and migrated once the IPI arrives. Migration can be delayed, however, while the task executes a kernel path with preemption or interrupts disabled. If affinity changes appear to stall, check for such paths, and consider adding explicit yield points when debugging affinity-related issues.
Runqueue Lock Contention: For systems with many CPUs, frequent affinity changes can cause runqueue lock contention. Batch affinity changes when possible.
Asymmetric CPU Architectures: On systems with P-cores and E-cores (for example, Intel's hybrid designs since 12th gen, or Arm big.LITTLE), you can use affinity to dedicate processes to specific core types. The utilization-clamping interfaces (the sched_util_clamp_min and sched_util_clamp_max sysctls, or their per-task equivalents via sched_setattr()) provide additional control over CPU selection for tasks even without strict affinity masking.
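On Intel hybrid parts, one way to discover which CPUs are which is through the perf PMU sysfs nodes; the paths below are as exposed on recent kernels, and the CPU ranges are examples:

# P-cores and E-cores as reported by the perf PMU sysfs nodes
cat /sys/devices/cpu_core/cpus    # e.g. 0-7
cat /sys/devices/cpu_atom/cpus    # e.g. 8-15
# Pin a latency-sensitive program to the P-cores
taskset -c 0-7 ./my_program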
Using sched_setaffinity from User Space
From C code, use pthread_setaffinity_np() for threads or call sched_setaffinity() directly:
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(0, &mask);    /* Bind to CPU 0 */
    CPU_SET(2, &mask);    /* And CPU 2 */

    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_setaffinity");
        return 1;
    }

    /* Verify with sched_getaffinity */
    if (sched_getaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_getaffinity");
        return 1;
    }

    /* Print which CPUs are now in the mask */
    for (int i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, &mask))
            printf("CPU %d is in affinity mask\n", i);
    }

    return 0;
}
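Compile and run it as an ordinary C program (the file name set_affinity.c is arbitrary):

gcc -O2 -o set_affinity set_affinity.c
./set_affinity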
From the shell, use taskset:
# Run a process with CPU affinity
taskset -c 0-3 ./my_program
# Change affinity of a running process (PID 1234)
taskset -p -c 0-2 1234
# Check current affinity for a process (printed as a hexadecimal mask)
taskset -p 1234
# Show the affinity as a CPU list instead of a hex mask
taskset -pc 1234
For threads within a process, use taskset with the thread ID (obtained from ps -eLf) or use pthread_setaffinity_np() from within the application.
Performance Implications
Changing affinity on running tasks incurs measurable overhead: the IPI latency (typically 1-10 microseconds), context switch cost, and cold caches on the destination CPU, since the task's working set must be re-fetched (crossing an L3 or NUMA domain makes this worse). For batch workloads, set affinity once at startup rather than dynamically. For real-time applications, set affinity at thread creation time to avoid migration latency during critical sections.
On modern kernels with frequency domains and thermal throttling, affinity changes can also interact with the CPU frequency scaling subsystem. A task migrated to a slower CPU might experience different performance characteristics. Use tools like cpupower to inspect frequency scaling states when tuning affinity for performance-critical workloads.
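Two quick ways to observe these effects from the shell:

# Count how often the kernel migrated the task between CPUs
perf stat -e cpu-migrations,context-switches ./my_program
# Inspect each CPU's frequency-scaling driver, governor, and limits
cpupower frequency-info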
Summary
When you call sched_setaffinity(), the kernel:
- Validates the input mask and caller permissions
- Checks task state to determine the migration strategy
- For running tasks: sends an IPI to the source CPU to trigger atomic migration via migration_cpu_stop()
- For queued tasks: directly dequeues from the source and enqueues onto the destination CPU's runqueue
- For blocked tasks: Updates only the affinity mask; migration happens when the task wakes
- Updates the task structure’s CPU assignment and affinity mask
- Returns to the caller; for running tasks, stop_one_cpu() waits for the stopper to complete the migration before the syscall returns
This design ensures tasks never execute on CPUs outside their affinity mask while minimizing lock contention and IPI overhead for non-running tasks.
