mirror of https://github.com/torvalds/linux.git (synced 2025-11-04 02:30:34 +02:00)

On graniterapids the cache home agent (CHA) and memory controller
(IMC) PMUs all have their cpumask set to per-socket information. For
per-NUMA-node aggregation to work correctly, each PMU's cpumask needs
to be set to the CPUs of the relevant sub-NUMA grouping.

For example, on a 2 socket graniterapids machine with sub-NUMA
clustering of 3, the cpumask for the uncore_cha and uncore_imc PMUs is
"0,120", leading to aggregation only on NUMA nodes 0 and 3:
```
$ perf stat --per-node -e 'UNC_CHA_CLOCKTICKS,UNC_M_CLOCKTICKS' -a sleep 1
 Performance counter stats for 'system wide':
N0        1    277,835,681,344      UNC_CHA_CLOCKTICKS
N0        1     19,242,894,228      UNC_M_CLOCKTICKS
N3        1    277,803,448,124      UNC_CHA_CLOCKTICKS
N3        1     19,240,741,498      UNC_M_CLOCKTICKS
       1.002113847 seconds time elapsed
```
By updating the PMUs' cpumasks to "0,120", "40,160" and "80,200", the
correct 6 NUMA node aggregations are achieved:
```
$ perf stat --per-node -e 'UNC_CHA_CLOCKTICKS,UNC_M_CLOCKTICKS' -a sleep 1
 Performance counter stats for 'system wide':
N0        1     92,748,667,796      UNC_CHA_CLOCKTICKS
N0        0      6,424,021,142      UNC_M_CLOCKTICKS
N1        0     92,753,504,424      UNC_CHA_CLOCKTICKS
N1        1      6,424,308,338      UNC_M_CLOCKTICKS
N2        0     92,751,170,084      UNC_CHA_CLOCKTICKS
N2        0      6,424,227,402      UNC_M_CLOCKTICKS
N3        1     92,745,944,144      UNC_CHA_CLOCKTICKS
N3        0      6,423,752,086      UNC_M_CLOCKTICKS
N4        0     92,725,793,788      UNC_CHA_CLOCKTICKS
N4        1      6,422,393,266      UNC_M_CLOCKTICKS
N5        0     92,717,504,388      UNC_CHA_CLOCKTICKS
N5        0      6,421,842,618      UNC_M_CLOCKTICKS
       1.003406645 seconds time elapsed
```
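As a rough illustration of where the three cpumask strings above come from, here is a minimal sketch, not the code in this patch. The helper name `snc_cpumask` and its parameters are hypothetical. It assumes CPUs are numbered contiguously within each socket (120 per socket in the example) and split evenly between the sub-NUMA clusters; real topologies may not be laid out this way.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the perf tool): build the cpumask
 * string for the PMUs belonging to sub-NUMA cluster `snc` on every socket.
 *
 * Assumptions: CPUs are numbered contiguously per socket, and each socket's
 * CPUs are divided evenly between `snc_per_socket` clusters. The first CPU
 * of each cluster is used as that cluster's representative.
 *
 * Returns 0 on success, -1 if `buf` is too small.
 */
static int snc_cpumask(char *buf, size_t len, int sockets,
		       int cpus_per_socket, int snc_per_socket, int snc)
{
	int cpus_per_snc = cpus_per_socket / snc_per_socket;
	size_t off = 0;

	for (int s = 0; s < sockets; s++) {
		/* First CPU of cluster `snc` on socket `s`. */
		int cpu = s * cpus_per_socket + snc * cpus_per_snc;
		int n = snprintf(buf + off, len - off, "%s%d",
				 s ? "," : "", cpu);

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return 0;
}
```

Under those assumptions, calling the helper with 2 sockets, 120 CPUs per socket and 3 clusters per socket gives "0,120", "40,160" and "80,200" for clusters 0, 1 and 2, matching the masks quoted above.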
In general, having the perf tool adjust cpumasks isn't desirable as
ideally the PMU driver would be advertising the correct cpumask.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Weilin Wang <weilin.wang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250515181417.491401-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
		
	
| Name |
|---|
| alpha |
| arc/annotate |
| arm |
| arm64 |
| csky |
| loongarch |
| mips |
| parisc |
| powerpc |
| riscv |
| riscv64/annotate |
| s390 |
| sh |
| sparc |
| x86 |
| xtensa |
| Build |
| common.c |
| common.h |