smp: Improve locality in smp_call_function_any()

smp_call_function_any() tries to make a local call as it's the cheapest
option, or switches to a CPU in the same node. If it's not possible, the
algorithm gives up and searches for any CPU, in a numerical order.

Instead, it can search for the best CPU based on NUMA locality, including
the 2nd nearest hop (a set of equidistant nodes), and higher.

sched_numa_find_nth_cpu() does exactly that, and also helps to drop most
of the housekeeping code.

Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250623000010.10124-2-yury.norov@gmail.com
This commit is contained in:
Yury Norov [NVIDIA] 2025-06-22 20:00:06 -04:00 committed by Thomas Gleixner
parent 09735f0624
commit 5f295519b4

View file

@ -741,32 +741,19 @@ EXPORT_SYMBOL_GPL(smp_call_function_single_async);
*
* Selection preference:
* 1) current cpu if in @mask
* 2) any cpu of current node if in @mask
* 3) any other online cpu in @mask
* 2) nearest cpu in @mask, based on NUMA topology
*/
int smp_call_function_any(const struct cpumask *mask,
smp_call_func_t func, void *info, int wait)
{
unsigned int cpu;
const struct cpumask *nodemask;
int ret;
/* Try for same CPU (cheapest) */
cpu = get_cpu();
if (cpumask_test_cpu(cpu, mask))
goto call;
if (!cpumask_test_cpu(cpu, mask))
cpu = sched_numa_find_nth_cpu(mask, 0, cpu_to_node(cpu));
/* Try for same node. */
nodemask = cpumask_of_node(cpu_to_node(cpu));
for (cpu = cpumask_first_and(nodemask, mask); cpu < nr_cpu_ids;
cpu = cpumask_next_and(cpu, nodemask, mask)) {
if (cpu_online(cpu))
goto call;
}
/* Any online will do: smp_call_function_single handles nr_cpu_ids. */
cpu = cpumask_any_and(mask, cpu_online_mask);
call:
ret = smp_call_function_single(cpu, func, info, wait);
put_cpu();
return ret;