	sched/numa: Fix unsafe get_task_struct() in task_numa_assign()

Unlocked access to dst_rq->curr in task_numa_compare() is racy.
If the task that dst_rq->curr points to is exiting, this can lead to a
use-after-free:

task_numa_compare()                    do_exit()
    ...                                        current->flags |= PF_EXITING;
    ...                                    release_task()
    ...                                        ~~delayed_put_task_struct()~~
    ...                                    schedule()
    rcu_read_lock()                        ...
    cur = ACCESS_ONCE(dst_rq->curr)        ...
        ...                                rq->curr = next;
        ...                                    context_switch()
        ...                                        finish_task_switch()
        ...                                            put_task_struct()
        ...                                                __put_task_struct()
        ...                                                    free_task_struct()
        task_numa_assign()                                     ...
            get_task_struct()                                  ...
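
To make the interleaving concrete, here is a deterministic, single-threaded userspace model of it. It is only a sketch: struct model_task, model_get(), model_put() and replay_unlocked_compare() are hypothetical names, not kernel APIs. The model assumes the reference count starts at 2 (one reference dropped by the RCU-deferred delayed_put_task_struct(), one by finish_task_switch()). It also assumes the deferred callback runs as soon as it is queued, which is allowed because it was queued before the compare side's rcu_read_lock(). Instead of freeing memory, the model sets a freed flag, so it is safe to run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for task_struct; not kernel code. */
struct model_task {
	int usage;      /* reference count, modeled on task_struct::usage */
	bool exiting;   /* stands in for PF_EXITING */
	bool freed;     /* set instead of freeing, so the model is safe to run */
};

/* Model of put_task_struct(): dropping the last reference "frees" it. */
static void model_put(struct model_task *t)
{
	assert(!t->freed && t->usage > 0);
	if (--t->usage == 0)
		t->freed = true;
}

/* Model of get_task_struct(): returns false when the reference would
 * have been taken on already-freed memory (a use-after-free). */
static bool model_get(struct model_task *t)
{
	if (t->freed)
		return false;
	t->usage++;
	return true;
}

/* Replays the diagram in program order. Returns true only if
 * task_numa_assign()'s get_task_struct() would be safe. */
static bool replay_unlocked_compare(void)
{
	/* Assumed initial count of 2, see the lead-in. */
	struct model_task dying = { .usage = 2, .exiting = false, .freed = false };
	struct model_task *rq_curr = &dying;   /* dst_rq->curr */
	struct model_task *cur;

	dying.exiting = true;   /* do_exit(): current->flags |= PF_EXITING */
	model_put(&dying);      /* release_task(): deferred put, already run */
	cur = rq_curr;          /* cur = ACCESS_ONCE(dst_rq->curr), no lock */
	model_put(&dying);      /* finish_task_switch(): final put */
	return model_get(cur);  /* task_numa_assign(): get_task_struct(cur) */
}
```

Calling replay_unlocked_compare() returns false: by the time the unlocked path tries to take its reference, the final put has already released the task.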

As noted by Oleg:

  <<The lockless get_task_struct(tsk) is only safe if tsk == current
    and didn't pass exit_notify(), or if this tsk was found on a rcu
    protected list (say, for_each_process() or find_task_by_vpid()).
    IOW, it is only safe if release_task() was not called before we
    take rcu_read_lock(), in this case we can rely on the fact that
    delayed_put_pid() can not drop the (potentially) last reference
    until rcu_read_unlock().

    And as Kirill pointed out task_numa_compare()->task_numa_assign()
    path does get_task_struct(dst_rq->curr) and this is not safe. The
    task_struct itself can't go away, but rcu_read_lock() can't save
    us from the final put_task_struct() in finish_task_switch(); this
    reference goes away without rcu gp>>
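
The patch adds a simple check of the PF_EXITING flag, which is set in
do_exit() before release_task() can run. If the flag is not set, the
call_rcu() of the delayed_put_task_struct() callback hasn't happened
yet. Because we already hold rcu_read_lock(), that callback cannot run
before our rcu_read_unlock(), so we can safely do get_task_struct() in
task_numa_assign().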
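
The ordering argument can be checked against a toy model. This C sketch uses a deliberately simplified single-reader "RCU": a callback queued while the reader is inside its critical section waits for toy_rcu_read_unlock(), while one queued with no reader active runs at once, which models a grace period that does not wait for later readers. Everything here is hypothetical (struct toy_rcu, toy_call_rcu_put(), replay_patched_compare() and the rest), dst_rq->lock is elided, and the initial reference count of 2 is an assumption. It is not how kernel RCU is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; freed is set instead of freeing. */
struct model_task { int usage; bool exiting; bool freed; };

static void model_put(struct model_task *t)
{
	assert(!t->freed && t->usage > 0);
	if (--t->usage == 0)
		t->freed = true;
}

static bool model_get(struct model_task *t)
{
	if (t->freed)
		return false;   /* would be a use-after-free */
	t->usage++;
	return true;
}

/* Toy RCU: one reader, one callback slot. */
struct toy_rcu {
	bool reader_active;
	struct model_task *pending;   /* deferred model_put() target, or NULL */
};

static void toy_rcu_read_lock(struct toy_rcu *r) { r->reader_active = true; }

static void toy_rcu_read_unlock(struct toy_rcu *r)
{
	r->reader_active = false;
	if (r->pending) {             /* grace period ends: run the callback */
		model_put(r->pending);
		r->pending = NULL;
	}
}

/* Stands in for call_rcu() of delayed_put_task_struct(). */
static void toy_call_rcu_put(struct toy_rcu *r, struct model_task *t)
{
	if (r->reader_active) {
		assert(r->pending == NULL);
		r->pending = t;       /* must wait for the active reader */
	} else {
		model_put(t);         /* no reader to wait for: runs now */
	}
}

/* exit_first: the task exits before the reader's rcu_read_lock();
 * otherwise it exits right after the PF_EXITING check. Returns true
 * if task_numa_assign() took a reference. */
static bool replay_patched_compare(bool exit_first)
{
	struct toy_rcu rcu = { .reader_active = false, .pending = NULL };
	struct model_task task = { .usage = 2, .exiting = false, .freed = false };
	struct model_task *cur;
	bool got = false;

	if (exit_first) {
		task.exiting = true;            /* do_exit(): PF_EXITING */
		toy_call_rcu_put(&rcu, &task);  /* release_task() */
	}
	toy_rcu_read_lock(&rcu);            /* task_numa_compare(): rcu_read_lock() */
	cur = &task;                        /* cur = dst_rq->curr */
	if (cur->exiting)                   /* the patch's PF_EXITING check */
		cur = NULL;
	if (!exit_first) {
		task.exiting = true;            /* do_exit() after the check */
		toy_call_rcu_put(&rcu, &task);  /* release_task(): deferred */
	}
	model_put(&task);                   /* finish_task_switch(): final put */
	if (cur) {
		assert(!cur->freed);            /* the invariant the patch relies on */
		got = model_get(cur);           /* task_numa_assign(): get_task_struct() */
	}
	toy_rcu_read_unlock(&rcu);          /* deferred callback may now run */
	if (got)
		model_put(cur);                 /* the reference is dropped later */
	assert(task.freed);                 /* every reference was released */
	return got;
}
```

With exit_first false, the deferred put is held back by the active reader, so the final put only drops the count to 1 and the later get succeeds. With exit_first true, the PF_EXITING check clears cur, so no reference is ever taken on the dying task.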

Holding dst_rq->lock protects against concurrency with the task's last
schedule(); without it, cur's memory may be freed and then reused or
unmapped before we even read cur->flags.
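
The locking half of the fix can also be sketched in userspace C. In this hypothetical model, a C11 atomic_flag test-and-set spinlock stands in for dst_rq->lock, and the exiting and idle fields stand in for PF_EXITING and is_idle_task(). It assumes that whatever code updates ->curr (the model's counterpart of the last schedule()) takes the same lock. sample_curr() and the model_* names are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel structures. */
struct model_task {
	bool exiting;   /* stands in for PF_EXITING in task->flags */
	bool idle;      /* stands in for is_idle_task() */
};

struct model_rq {
	atomic_flag lock;          /* stands in for dst_rq->lock */
	struct model_task *curr;   /* stands in for dst_rq->curr */
};

/* Minimal test-and-set spinlock. The kernel's raw_spin_lock_irq() also
 * disables local interrupts, which this userspace model does not do. */
static void model_spin_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;   /* busy-wait until the holder releases the lock */
}

static void model_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Mirrors the patched logic: read ->curr with the runqueue lock held
 * (writers of ->curr are assumed to take the same lock), then skip
 * exiting or idle tasks by returning NULL. */
static struct model_task *sample_curr(struct model_rq *rq)
{
	struct model_task *cur;

	model_spin_lock(&rq->lock);
	cur = rq->curr;
	if (cur->exiting || cur->idle)
		cur = NULL;
	model_spin_unlock(&rq->lock);
	return cur;
}
```

With rq.curr pointing at a running, non-exiting task, sample_curr() returns that task; for an exiting or idle task it returns NULL, just as the patched code sets cur = NULL.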

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1413962231.19914.130.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit 1effd9f193
parent aee38ea954
1 changed file with 12 additions and 2 deletions

			@ -1164,9 +1164,19 @@ static void task_numa_compare(struct task_numa_env *env,
 | 
			
		|||
	long moveimp = imp;
 | 
			
		||||
 | 
			
		||||
	rcu_read_lock();
 | 
			
		||||
	cur = ACCESS_ONCE(dst_rq->curr);
 | 
			
		||||
	if (cur->pid == 0) /* idle */
 | 
			
		||||
 | 
			
		||||
	raw_spin_lock_irq(&dst_rq->lock);
 | 
			
		||||
	cur = dst_rq->curr;
 | 
			
		||||
	/*
 | 
			
		||||
	 * No need to move the exiting task, and this ensures that ->curr
 | 
			
		||||
	 * wasn't reaped and thus get_task_struct() in task_numa_assign()
 | 
			
		||||
	 * is safe under RCU read lock.
 | 
			
		||||
	 * Note that rcu_read_lock() itself can't protect from the final
 | 
			
		||||
	 * put_task_struct() after the last schedule().
 | 
			
		||||
	 */
 | 
			
		||||
	if ((cur->flags & PF_EXITING) || is_idle_task(cur))
 | 
			
		||||
		cur = NULL;
 | 
			
		||||
	raw_spin_unlock_irq(&dst_rq->lock);
 | 
			
		||||
 | 
			
		||||
	/*
 | 
			
		||||
	 * "imp" is the fault differential for the source task between the
 | 
			
		||||
| 