	The RCU-bh update API is now defined in terms of that of RCU-bh and RCU-sched, so this commit updates the documentation accordingly. In addition, although RCU-sched persists in !PREEMPT kernels, in the PREEMPT case its update API is now defined in terms of that of RCU-preempt, so this commit also updates the documentation accordingly. While in the area, this commit removes the documentation for the now-obsolete synchronize_rcu_mult() and clarifies the Tasks RCU documentation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
		
			
				
	
	
		
==========================================
Reducing OS jitter due to per-cpu kthreads
==========================================

This document lists per-CPU kthreads in the Linux kernel and presents
options to control their OS jitter.  Note that non-per-CPU kthreads are
not listed here.  To reduce OS jitter from non-per-CPU kthreads, bind
them to a "housekeeping" CPU dedicated to such work.

References
==========

-	Documentation/IRQ-affinity.txt:  Binding interrupts to sets of CPUs.

-	Documentation/cgroup-v1:  Using cgroups to bind tasks to sets of CPUs.

-	man taskset:  Using the taskset command to bind tasks to sets
	of CPUs.

-	man sched_setaffinity:  Using the sched_setaffinity() system
	call to bind tasks to sets of CPUs.

-	/sys/devices/system/cpu/cpuN/online:  Control CPU N's hotplug state,
	writing "0" to offline and "1" to online.

-	In order to locate kernel-generated OS jitter on CPU N:

		cd /sys/kernel/debug/tracing
		echo 1 > max_graph_depth # Increase the "1" for more detail
		echo function_graph > current_tracer
		# run workload
		cat per_cpu/cpuN/trace

kthreads
========

Name:
  ehca_comp/%u

Purpose:
  Periodically process Infiniband-related work.

To reduce its OS jitter, do any of the following:

1.	Don't use eHCA Infiniband hardware, instead choosing hardware
	that does not require per-CPU kthreads.  This will prevent these
	kthreads from being created in the first place.  (This will
	work for most people, as this hardware, though important, is
	relatively old and is produced in relatively low unit volumes.)
2.	Do all eHCA-Infiniband-related work on other CPUs, including
	interrupts.
3.	Rework the eHCA driver so that its per-CPU kthreads are
	provisioned only on selected CPUs.


Name:
  irq/%d-%s

Purpose:
  Handle threaded interrupts.

To reduce its OS jitter, do the following:

1.	Use irq affinity to force the irq threads to execute on
	some other CPU.

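As an illustration of the irq-affinity approach, the following sketch
steers every IRQ whose affinity can be changed onto a set of housekeeping
CPUs by writing to the /proc/irq/N/smp_affinity_list files.  The function
name, the PROC_ROOT override (present only so that the function can be
tried against a scratch directory), and the CPU list "0-1" are assumptions
made for this example.  Some IRQs reject affinity changes, so failed
writes are counted rather than treated as fatal.

```shell
# Sketch: move all steerable IRQs onto the CPUs in $1 (for example "0-1").
# PROC_ROOT is a hypothetical override for testing; it defaults to /proc.
move_irqs_to_cpus() {
	cpulist=$1
	root=${PROC_ROOT:-/proc}
	moved=0
	failed=0
	for f in "$root"/irq/*/smp_affinity_list; do
		[ -e "$f" ] || continue
		# A write that cannot be opened or is rejected counts as failed.
		if { echo "$cpulist" > "$f"; } 2>/dev/null; then
			moved=$((moved + 1))
		else
			failed=$((failed + 1))
		fi
	done
	echo "moved=$moved failed=$failed"
}

# Example (as root): move_irqs_to_cpus 0-1
```

The threaded handlers normally follow the affinity of their interrupt,
but it is worth confirming the result with taskset on your own kernel.
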
Name:
  kcmtpd_ctr_%d

Purpose:
  Handle Bluetooth work.

To reduce its OS jitter, do one of the following:

1.	Don't use Bluetooth, in which case these kthreads won't be
	created in the first place.
2.	Use irq affinity to force Bluetooth-related interrupts to
	occur on some other CPU and furthermore initiate all
	Bluetooth activity on some other CPU.

Name:
  ksoftirqd/%u

Purpose:
  Execute softirq handlers when threaded or when under heavy load.

To reduce its OS jitter, each softirq vector must be handled
separately as follows:

TIMER_SOFTIRQ
-------------

Do all of the following:

1.	To the extent possible, keep the CPU out of the kernel when it
	is non-idle, for example, by avoiding system calls and by forcing
	both kernel threads and interrupts to execute elsewhere.
2.	Build with CONFIG_HOTPLUG_CPU=y.  After boot completes, force
	the CPU offline, then bring it back online.  This forces
	recurring timers to migrate elsewhere.  If you are concerned
	with multiple CPUs, force them all offline before bringing the
	first one back online.  Once you have onlined the CPUs in question,
	do not offline any other CPUs, because doing so could force the
	timer back onto one of the CPUs in question.

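The offline/online procedure in item 2 can be sketched using the
/sys/devices/system/cpu/cpuN/online files listed under References.
The function name and the SYS_ROOT override (present only so that the
function can be tried against a scratch directory) are assumptions made
for this example, as are the CPU numbers in the usage comment.

```shell
# Sketch: offline every CPU named in "$@", then bring them all back online.
# SYS_ROOT is a hypothetical override for testing; it defaults to /sys.
cycle_cpus() {
	root=${SYS_ROOT:-/sys}
	# Take all of the CPUs offline before onlining the first one,
	# as the text above requires when several CPUs are of interest.
	for cpu in "$@"; do
		echo 0 > "$root/devices/system/cpu/cpu$cpu/online" || return 1
	done
	for cpu in "$@"; do
		echo 1 > "$root/devices/system/cpu/cpu$cpu/online" || return 1
	done
}

# Example (as root, after boot completes): cycle_cpus 2 3
```

Run this before starting the jitter-sensitive application, and avoid
further hotplug operations afterwards.
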
NET_TX_SOFTIRQ and NET_RX_SOFTIRQ
---------------------------------

Do all of the following:

1.	Force networking interrupts onto other CPUs.
2.	Initiate any network I/O on other CPUs.
3.	Once your application has started, prevent CPU-hotplug operations
	from being initiated from tasks that might run on the CPU to
	be de-jittered.  (It is OK to force this CPU offline and then
	bring it back online before you start your application.)

BLOCK_SOFTIRQ
-------------

Do all of the following:

1.	Force block-device interrupts onto some other CPU.
2.	Initiate any block I/O on other CPUs.
3.	Once your application has started, prevent CPU-hotplug operations
	from being initiated from tasks that might run on the CPU to
	be de-jittered.  (It is OK to force this CPU offline and then
	bring it back online before you start your application.)

IRQ_POLL_SOFTIRQ
----------------

Do all of the following:

1.	Force block-device interrupts onto some other CPU.
2.	Initiate any block I/O and block-I/O polling on other CPUs.
3.	Once your application has started, prevent CPU-hotplug operations
	from being initiated from tasks that might run on the CPU to
	be de-jittered.  (It is OK to force this CPU offline and then
	bring it back online before you start your application.)

TASKLET_SOFTIRQ
---------------

Do one or more of the following:

1.	Avoid use of drivers that use tasklets.  (Such drivers will contain
	calls to things like tasklet_schedule().)
2.	Convert all drivers that you must use from tasklets to workqueues.
3.	Force interrupts for drivers using tasklets onto other CPUs,
	and also do I/O involving these drivers on other CPUs.

SCHED_SOFTIRQ
-------------

Do all of the following:

1.	Avoid sending scheduler IPIs to the CPU to be de-jittered,
	for example, by ensuring that at most one runnable kthread is
	present on that CPU.  If a thread that expects to run on the
	de-jittered CPU awakens, the scheduler will send an IPI that
	can result in a subsequent SCHED_SOFTIRQ.
2.	Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
	de-jittered is marked as an adaptive-ticks CPU using the
	"nohz_full=" boot parameter.  This reduces the number of
	scheduler-clock interrupts that the de-jittered CPU receives,
	minimizing its chances of being selected to do the load
	balancing work that runs in SCHED_SOFTIRQ context.
3.	To the extent possible, keep the CPU out of the kernel when it
	is non-idle, for example, by avoiding system calls and by
	forcing both kernel threads and interrupts to execute elsewhere.
	This further reduces the number of scheduler-clock interrupts
	received by the de-jittered CPU.

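When following item 2, it can help to confirm that the running kernel
was actually booted with the intended "nohz_full=" value.  The sketch
below extracts a named parameter from a kernel command line such as the
contents of /proc/cmdline.  The function name and the sample CPU ranges
are assumptions made for this example.

```shell
# Sketch: print the value of boot parameter $1 found in the command
# line $2, returning nonzero if the parameter is absent.
cmdline_param() {
	name=$1
	# $2 is deliberately unquoted so that it splits into words.
	for word in $2; do
		case $word in
		"$name"=*)
			echo "${word#*=}"
			return 0
			;;
		esac
	done
	return 1
}

# Example: cmdline_param nohz_full "$(cat /proc/cmdline)"
```

The same helper applies to other boot parameters mentioned in this
document, for example "rcu_nocbs=".
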
HRTIMER_SOFTIRQ
---------------

Do all of the following:

1.	To the extent possible, keep the CPU out of the kernel when it
	is non-idle.  For example, avoid system calls and force both
	kernel threads and interrupts to execute elsewhere.
2.	Build with CONFIG_HOTPLUG_CPU=y.  Once boot completes, force the
	CPU offline, then bring it back online.  This forces recurring
	timers to migrate elsewhere.  If you are concerned with multiple
	CPUs, force them all offline before bringing the first one
	back online.  Once you have onlined the CPUs in question, do not
	offline any other CPUs, because doing so could force the timer
	back onto one of the CPUs in question.

RCU_SOFTIRQ
-----------

Do at least one of the following:

1.	Offload callbacks and keep the CPU in either dyntick-idle or
	adaptive-ticks state by doing all of the following:

	a.	Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU
		to be de-jittered is marked as an adaptive-ticks CPU
		using the "nohz_full=" boot parameter.  Bind the rcuo
		kthreads to housekeeping CPUs, which can tolerate
		OS jitter.
	b.	To the extent possible, keep the CPU out of the kernel
		when it is non-idle, for example, by avoiding system
		calls and by forcing both kernel threads and interrupts
		to execute elsewhere.

2.	Enable RCU to do its processing remotely via dyntick-idle by
	doing all of the following:

	a.	Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
	b.	Ensure that the CPU goes idle frequently, allowing other
		CPUs to detect that it has passed through an RCU quiescent
		state.  If the kernel is built with CONFIG_NO_HZ_FULL=y,
		userspace execution also allows other CPUs to detect that
		the CPU in question has passed through a quiescent state.
	c.	To the extent possible, keep the CPU out of the kernel
		when it is non-idle, for example, by avoiding system
		calls and by forcing both kernel threads and interrupts
		to execute elsewhere.

Name:
  kworker/%u:%d%s (cpu, id, priority)

Purpose:
  Execute workqueue requests.

To reduce its OS jitter, do any of the following:

1.	Run your workload at a real-time priority, which will allow
	preempting the kworker daemons.
2.	A given workqueue can be made visible in the sysfs filesystem
	by passing the WQ_SYSFS flag to that workqueue's alloc_workqueue().
	Such a workqueue can be confined to a given subset of the
	CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
	files.  The set of WQ_SYSFS workqueues can be displayed using
	"ls /sys/devices/virtual/workqueue".  That said, the workqueues
	maintainer would like to caution people against indiscriminately
	sprinkling WQ_SYSFS across all the workqueues.  The reason for
	caution is that it is easy to add WQ_SYSFS, but because sysfs is
	part of the formal user/kernel API, it can be nearly impossible
	to remove it, even if its addition was a mistake.
3.	Do any of the following needed to avoid jitter that your
	application cannot tolerate:

	a.	Build your kernel with CONFIG_SLUB=y rather than
		CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
		use of each CPU's workqueues to run its cache_reap()
		function.
	b.	Avoid using oprofile, thus avoiding OS jitter from
		wq_sync_buffer().
	c.	Limit your CPU frequency so that a CPU-frequency
		governor is not required, possibly enlisting the aid of
		special heatsinks or other cooling technologies.  If done
		correctly, and if your CPU architecture permits, you should
		be able to build your kernel with CONFIG_CPU_FREQ=n to
		avoid the CPU-frequency governor periodically running
		on each CPU, including cs_dbs_timer() and od_dbs_timer().

		WARNING:  Please check your CPU specifications to
		make sure that this is safe on your particular system.
	d.	As of v3.18, Christoph Lameter's on-demand vmstat workers
		commit prevents OS jitter due to vmstat_update() on
		CONFIG_SMP=y systems.  Before v3.18, it is not possible
		to entirely get rid of the OS jitter, but you can
		decrease its frequency by writing a large value to
		/proc/sys/vm/stat_interval.  The default value is HZ,
		for an interval of one second.  Of course, larger values
		will make your virtual-memory statistics update more
		slowly.  You can also run your workload at a real-time
		priority, thus preempting vmstat_update(), but if your
		workload is CPU-bound, this is a bad idea.
		However, there is an RFC patch from Christoph Lameter
		(based on an earlier one from Gilad Ben-Yossef) that
		reduces or even eliminates vmstat overhead for some
		workloads at https://lkml.org/lkml/2013/9/4/379.
	e.	Boot with "elevator=noop" to avoid workqueue use by
		the block layer.
	f.	If running on high-end powerpc servers, build with
		CONFIG_PPC_RTAS_DAEMON=n.  This prevents the RTAS
		daemon from running on each CPU every second or so.
		(This will require editing Kconfig files and will defeat
		this platform's RAS functionality.)  This avoids jitter
		due to the rtas_event_scan() function.
		WARNING:  Please check your CPU specifications to
		make sure that this is safe on your particular system.
	g.	If running on Cell Processor, build your kernel with
		CONFIG_CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
		spu_gov_work().
		WARNING:  Please check your CPU specifications to
		make sure that this is safe on your particular system.
	h.	If running on PowerMAC, build your kernel with
		CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
		avoiding OS jitter from rackmeter_do_timer().

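The cpumask files described in item 2 can be updated with a loop along
the following lines.  This is only a sketch:  the function name, the
SYS_ROOT override (present only so that the function can be tried
against a scratch directory), and the example mask are assumptions.
The mask is written as a hexadecimal CPU bitmask, so that "3" selects
CPUs 0 and 1.

```shell
# Sketch: confine every WQ_SYSFS workqueue to the CPUs in hex mask $1.
# SYS_ROOT is a hypothetical override for testing; it defaults to /sys.
confine_workqueues() {
	mask=$1
	root=${SYS_ROOT:-/sys}
	for f in "$root"/devices/virtual/workqueue/*/cpumask; do
		[ -e "$f" ] || continue
		# Report a failed update but keep going with the rest.
		echo "$mask" > "$f" || echo "could not update $f" >&2
	done
}

# Example (as root): confine_workqueues 3    # CPUs 0 and 1
```

Only workqueues that were allocated with WQ_SYSFS appear under this
directory, so other workqueues are unaffected.
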
Name:
  rcuc/%u

Purpose:
  Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.

To reduce its OS jitter, do at least one of the following:

1.	Build the kernel with CONFIG_PREEMPT=n.  This prevents these
	kthreads from being created in the first place, and also obviates
	the need for RCU priority boosting.  This approach is feasible
	for workloads that do not require high degrees of responsiveness.
2.	Build the kernel with CONFIG_RCU_BOOST=n.  This prevents these
	kthreads from being created in the first place.  This approach
	is feasible only if your workload never requires RCU priority
	boosting, for example, if you ensure frequent idle time on all
	CPUs that might execute within the kernel.
3.	Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs=
	boot parameter offloading RCU callbacks from all CPUs susceptible
	to OS jitter.  This approach prevents the rcuc/%u kthreads from
	having any work to do, so that they are never awakened.
4.	Ensure that the CPU never enters the kernel, and, in particular,
	avoid initiating any CPU hotplug operations on this CPU.  This is
	another way of preventing any callbacks from being queued on the
	CPU, again preventing the rcuc/%u kthreads from having any work
	to do.

Name:
  rcuop/%d and rcuos/%d

Purpose:
  Offload RCU callbacks from the corresponding CPU.

To reduce its OS jitter, do at least one of the following:

1.	Use affinity, cgroups, or other mechanism to force these kthreads
	to execute on some other CPU.
2.	Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
	kthreads from being created in the first place.  However, please
	note that this will not eliminate OS jitter, but will instead
	shift it to RCU_SOFTIRQ.

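One way to apply item 1 is with the taskset command listed under
References.  In the sketch below, the function name, the RUN variable
(a dry-run hook that prints the commands when set to "echo"), the pgrep
pattern, and the CPU list are all assumptions made for this example.

```shell
# Sketch: bind each PID read from standard input to the CPU list in $1.
# Setting RUN=echo prints the taskset commands instead of running them.
pin_pids() {
	cpulist=$1
	while read -r pid; do
		${RUN:-} taskset -pc "$cpulist" "$pid"
	done
}

# Example (as root): pgrep '^rcuo[ps]/' | pin_pids 0-1
```

Try it with RUN=echo first to see which kthreads would be moved.
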
Name:
  watchdog/%u

Purpose:
  Detect software lockups on each CPU.

To reduce its OS jitter, do at least one of the following:

1.	Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
	kthreads from being created in the first place.
2.	Boot with "nosoftlockup=0", which will also prevent these kthreads
	from being created.  Other related watchdog and softlockup boot
	parameters may be found in Documentation/admin-guide/kernel-parameters.rst
	and Documentation/watchdog/watchdog-parameters.txt.
3.	Echo a zero to /proc/sys/kernel/watchdog to disable the
	watchdog timer.
4.	Echo a large number to /proc/sys/kernel/watchdog_thresh in
	order to reduce the frequency of OS jitter due to the watchdog
	timer down to a level that is acceptable for your workload.
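
Items 3 and 4 are single writes to files under /proc/sys/kernel, as in
the following sketch.  The function names, the PROC_ROOT override
(present only so that the functions can be tried against a scratch
directory), and the threshold value in the usage comment are
assumptions made for this example.  The kernel may reject threshold
values outside the range that it accepts.

```shell
# Sketch: helpers for the watchdog files described in items 3 and 4.
# PROC_ROOT is a hypothetical override for testing; it defaults to /proc.

# Write $1 (0 to disable, 1 to enable) to the watchdog control file.
set_watchdog() {
	echo "$1" > "${PROC_ROOT:-/proc}/sys/kernel/watchdog"
}

# Write $1 to watchdog_thresh; larger values mean less frequent jitter.
set_watchdog_thresh() {
	echo "$1" > "${PROC_ROOT:-/proc}/sys/kernel/watchdog_thresh"
}

# Examples (as root):
#   set_watchdog 0
#   set_watchdog_thresh 60
```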