linux/kernel
Andrea Righi 1626e5ef0b sched_ext: Fix lock imbalance in dispatch_to_local_dsq()
While performing the rq locking dance in dispatch_to_local_dsq(), we may
trigger the following lock imbalance condition, in particular when
multiple tasks are rapidly changing CPU affinity (i.e., running a
`stress-ng --race-sched 0`):

[   13.413579] =====================================
[   13.413660] WARNING: bad unlock balance detected!
[   13.413729] 6.13.0-virtme #15 Not tainted
[   13.413792] -------------------------------------
[   13.413859] kworker/1:1/80 is trying to release lock (&rq->__lock) at:
[   13.413954] [<ffffffff873c6c48>] dispatch_to_local_dsq+0x108/0x1a0
[   13.414111] but there are no more locks to release!
[   13.414176]
[   13.414176] other info that might help us debug this:
[   13.414258] 1 lock held by kworker/1:1/80:
[   13.414318]  #0: ffff8b66feb41698 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0x90
[   13.414612]
[   13.414612] stack backtrace:
[   13.415255] CPU: 1 UID: 0 PID: 80 Comm: kworker/1:1 Not tainted 6.13.0-virtme #15
[   13.415505] Workqueue:  0x0 (events)
[   13.415567] Sched_ext: dsp_local_on (enabled+all), task: runnable_at=-2ms
[   13.415570] Call Trace:
[   13.415700]  <TASK>
[   13.415744]  dump_stack_lvl+0x78/0xe0
[   13.415806]  ? dispatch_to_local_dsq+0x108/0x1a0
[   13.415884]  print_unlock_imbalance_bug+0x11b/0x130
[   13.415965]  ? dispatch_to_local_dsq+0x108/0x1a0
[   13.416226]  lock_release+0x231/0x2c0
[   13.416326]  _raw_spin_unlock+0x1b/0x40
[   13.416422]  dispatch_to_local_dsq+0x108/0x1a0
[   13.416554]  flush_dispatch_buf+0x199/0x1d0
[   13.416652]  balance_one+0x194/0x370
[   13.416751]  balance_scx+0x61/0x1e0
[   13.416848]  prev_balance+0x43/0xb0
[   13.416947]  __pick_next_task+0x6b/0x1b0
[   13.417052]  __schedule+0x20d/0x1740

This happens because dispatch_to_local_dsq() is racing with
dispatch_dequeue() and, when the latter wins, we incorrectly assume that
the task has been moved to dst_rq.

Fix by properly tracking the currently locked rq.

Fixes: 4d3ca89bdd ("sched_ext: Refactor consume_remote_task()")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-01-27 12:41:12 -10:00
..
bpf bpf-next-6.14 2025-01-23 08:04:07 -08:00
cgroup drm next for 6.14-rc1 2025-01-21 16:09:47 -08:00
configs configs/debug: make sure PROVE_RCU_LIST=y takes effect 2024-10-28 10:21:09 -07:00
debug kdb: fix ctrl+e/a/f/b/d/p/n broken in keyboard mode 2024-11-18 15:20:22 +00:00
dma dma-debug: fix physical address calculation for struct dma_debug_entry 2024-11-28 10:19:16 +01:00
entry sched: Add TIF_NEED_RESCHED_LAZY infrastructure 2024-11-05 12:55:37 +01:00
events Performance events changes for v6.14: 2025-01-21 10:52:03 -08:00
futex sched/wake_q: Add helper to call wake_up_q after unlock with preemption disabled 2024-12-20 15:31:21 +01:00
gcov
irq Updates for the interrupt subsystem: 2025-01-21 13:51:07 -08:00
kcsan kcsan: Remove redundant call of kallsyms_lookup_name() 2024-10-14 16:44:56 +02:00
livepatch livepatch: Add stack_order sysfs attribute 2024-12-09 11:44:03 +01:00
locking RCU pull request for v6.14 2025-01-21 14:39:21 -08:00
module module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
power Merge branches 'pm-sleep', 'pm-cpuidle' and 'pm-em' 2025-01-20 19:14:15 +01:00
printk Merge branch 'for-6.14-cpu_sync-fixup' into for-linus 2025-01-20 13:40:52 +01:00
rcu Kthreads affinity follow either of 4 existing different patterns: 2025-01-21 17:10:05 -08:00
sched sched_ext: Fix lock imbalance in dispatch_to_local_dsq() 2025-01-27 12:41:12 -10:00
time Updates for timers and timekeeping: 2025-01-21 13:16:00 -08:00
trace Fix atomic64 operations on some architectures for the tracing ring buffer: 2025-01-23 18:02:55 -08:00
.gitignore
acct.c acct: avoid pointless reference count bump 2024-12-02 11:25:13 +01:00
async.c
audit.c lsm: replace context+len with lsm_context 2024-12-04 14:42:31 -05:00
audit.h audit: change context data from secid to lsm_prop 2024-10-11 14:34:16 -04:00
audit_fsnotify.c
audit_tree.c
audit_watch.c
auditfilter.c audit: fix suffixed '/' filename matching 2024-12-05 19:22:38 -05:00
auditsc.c lsm/stable-6.14 PR 20250121 2025-01-21 20:03:04 -08:00
backtracetest.c
bounds.c
capability.c
cfi.c
compat.c
configs.c
context_tracking.c
cpu.c hrtimers: Handle CPU state correctly on hotplug 2025-01-16 13:06:14 +01:00
cpu_pm.c
crash_core.c kexec/crash: no crash update when kexec in progress 2024-11-05 17:12:27 -08:00
crash_reserve.c
cred.c cred: remove old {override,revert}_creds() helpers 2024-12-02 11:25:09 +01:00
delayacct.c
dma.c
elfcorehdr.c
exec_domain.c
exit.c remove pointless includes of <linux/fdtable.h> 2024-10-07 13:34:41 -04:00
exit.h
extable.c
fail_function.c
fork.c \n 2025-01-23 13:36:06 -08:00
freezer.c sched/fair: Fix external p->on_rq users 2024-10-14 09:14:35 +02:00
gen_kheaders.sh kheaders: Ignore silly-rename files 2024-12-20 22:07:55 +01:00
groups.c
hung_task.c hung_task: add detect count for hung tasks 2024-11-11 17:17:03 -08:00
iomem.c
irq_work.c
jump_label.c jump_label: Fix static_key_slow_dec() yet again 2024-09-10 11:57:27 +02:00
kallsyms.c
kallsyms_internal.h
kallsyms_selftest.c kallsyms: Use kthread_run_on_cpu() 2025-01-02 22:12:12 +01:00
kallsyms_selftest.h
kcmp.c get rid of ...lookup...fdget_rcu() family 2024-10-07 13:34:41 -04:00
Kconfig.freezer
Kconfig.hz
Kconfig.kexec crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32 2024-11-14 22:43:48 -08:00
Kconfig.locks
Kconfig.preempt sched: No PREEMPT_RT=y for all{yes,mod}config 2024-11-07 15:25:05 +01:00
kcov.c kcov: mark in_softirq_really() as __always_inline 2024-12-30 17:59:08 -08:00
kexec.c
kexec_core.c kexec_core: Add and update comments regarding the KEXEC_JUMP flow 2025-01-14 13:03:34 +01:00
kexec_elf.c
kexec_file.c
kexec_internal.h
kheaders.c
kprobes.c kprobes: Remove remaining gotos 2025-01-10 09:00:13 +09:00
ksyms_common.c
ksysfs.c
kthread.c Kthreads affinity follow either of 4 existing different patterns: 2025-01-21 17:10:05 -08:00
latencytop.c
Makefile
module_signature.c
notifier.c reboot: move reboot_notifier_list to kernel/reboot.c 2024-11-05 17:12:31 -08:00
nsproxy.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
padata.c padata: avoid UAF for reorder_work 2025-01-19 12:44:28 +08:00
panic.c drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
params.c
pid.c kernel-6.14-rc1.pid 2025-01-20 10:29:11 -08:00
pid_namespace.c pid: allow pid_max to be set per pid namespace 2024-12-02 11:25:25 +01:00
pid_sysctl.h
profile.c
ptrace.c
range.c
reboot.c kernel/reboot: replace sprintf() with sysfs_emit() 2024-11-11 17:17:05 -08:00
regset.c
relay.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
resource.c module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
resource_kunit.c resource, kunit: fix user-after-free in resource_test_region_intersects() 2024-10-09 12:47:19 -07:00
rseq.c rseq: Fix rseq unregistration regression 2025-01-21 08:10:51 +01:00
scftorture.c scftorture: Handle NULL argument passed to scf_add_to_free_list(). 2024-11-14 16:09:51 -08:00
scs.c
seccomp.c
signal.c signal/posixtimers: Handle ignore/blocked sequences correctly 2025-01-15 18:08:01 +01:00
smp.c smp/scf: Evaluate local cond_func() before IPI side-effects 2024-12-05 14:25:28 +01:00
smpboot.c
smpboot.h
softirq.c softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel 2024-12-02 12:01:27 +01:00
stackleak.c stackleak: Use str_enabled_disabled() helper in stack_erasing_sysctl() 2024-12-22 20:28:11 -08:00
stacktrace.c
static_call.c
static_call_inline.c x86/static-call: provide a way to do very early static-call updates 2024-12-13 09:28:32 +01:00
stop_machine.c
sys.c tracing: Add task_prctl_unknown tracepoint 2024-12-22 20:28:11 -08:00
sys_ni.c
sysctl-test.c
sysctl.c pid: allow pid_max to be set per pid namespace 2024-12-02 11:25:25 +01:00
task_work.c sched/core: Disable page allocation in task_tick_mm_cid() 2024-10-11 10:49:32 +02:00
taskstats.c fdget(), more trivial conversions 2024-11-03 01:28:06 -05:00
torture.c
tracepoint.c tracing: Fix syscall tracepoint use-after-free 2024-11-01 14:37:31 -04:00
tsacct.c
ucount.c Summary 2024-11-22 20:36:11 -08:00
uid16.c
uid16.h
umh.c remove pointless includes of <linux/fdtable.h> 2024-10-07 13:34:41 -04:00
up.c
user-return-notifier.c
user.c uidgid: make sure we fit into one cacheline 2024-09-12 12:16:09 +02:00
user_namespace.c user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation 2024-09-09 16:47:42 -07:00
usermode_driver.c
utsname.c
utsname_sysctl.c
vhost_task.c
vmcore_info.c
watch_queue.c watch_queue: Use page->private instead of page->index 2024-12-22 11:29:51 +01:00
watchdog.c - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
watchdog_buddy.c
watchdog_perf.c
workqueue.c Kthreads affinity follow either of 4 existing different patterns: 2025-01-21 17:10:05 -08:00
workqueue_internal.h