Commit graph

25 commits

Author SHA1 Message Date
Babu Moger
19de7113bf x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in mbm_event mode
The following NULL pointer dereference is encountered on mount of resctrl fs
after booting a system that supports assignable counters with the
"rdt=!mbmtotal,!mbmlocal" kernel parameters:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:mbm_cntr_get
  Call Trace:
  rdtgroup_assign_cntr_event
  rdtgroup_assign_cntrs
  rdt_get_tree

Specifying the kernel parameter "rdt=!mbmtotal,!mbmlocal" effectively disables
the legacy X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features
and the MBM events they represent. This results in the per-domain MBM event
related data structures to not be allocated during early initialization.

resctrl fs initialization follows by implicitly enabling both MBM total and
local events on a system that supports assignable counters (mbm_event mode),
but this enabling occurs after the per-domain data structures have been
created.

After booting, resctrl fs assumes that an enabled event can access all its
state. This results in NULL pointer dereference when resctrl attempts to
access the un-allocated structures of an enabled event.

Remove the late MBM event enabling from resctrl fs.

This leaves a problem where the X86_FEATURE_CQM_MBM_TOTAL and
X86_FEATURE_CQM_MBM_LOCAL features may be disabled while assignable counter
(mbm_event) mode is enabled without any events to support. Switching between
the "default" and "mbm_event" mode without any events is not practical.

Create a dependency between the X86_FEATURE_{CQM_MBM_TOTAL,CQM_MBM_LOCAL} and
X86_FEATURE_ABMC (assignable counter) hardware features. An x86 system that
supports assignable counters now requires support of X86_FEATURE_CQM_MBM_TOTAL
or X86_FEATURE_CQM_MBM_LOCAL.

This ensures all needed MBM related data structures are created before use and
that it is only possible to switch between "default" and "mbm_event" mode when
the same events are available in both modes. This dependency does not exist in
the hardware but this usage of these feature settings work for known systems.

  [ bp: Massage commit message. ]

Fixes: 13390861b4 ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details")
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/a62e6ac063d0693475615edd213d5be5e55443e6.1760560934.git.babu.moger@amd.com
2025-10-20 18:06:31 +02:00
Babu Moger
dd86b69d20 fs/resctrl: Fix counter auto-assignment on mkdir with mbm_event enabled
rdt_resource::resctrl_mon::mbm_assign_on_mkdir determines if a counter will
automatically be assigned to an RMID, MBM event pair when its associated
monitor group is created via mkdir.

Testing shows that counters are always automatically assigned to new monitor
groups, whether mbm_assign_on_mkdir is set or not.

To support automatic counter assignment the check for mbm_assign_on_mkdir
should be in rdtgroup_assign_cntrs() that assigns counters during monitor
group creation. Instead, the check for mbm_assign_on_mkdir is in
rdtgroup_unassign_cntrs() that is called on monitor group deletion from where
counters should always be unassigned, whether mbm_assign_on_mkdir is set or
not.

Fix automatic counter assignment by moving the mbm_assign_on_mkdir check from
rdtgroup_unassign_cntrs() to rdtgroup_assign_cntrs().

  [ bp: Replace commit message with Reinette's version. ]

Fixes: ef712fe97e ("fs/resctrl: Auto assign counters on mkdir and clean up on group removal")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
2025-09-17 11:31:12 +02:00
Babu Moger
8004ea01cf fs/resctrl: Introduce the interface to switch between monitor modes
Resctrl subsystem can support two monitoring modes, "mbm_event" or "default".
In mbm_event mode, monitoring event can only accumulate data while it is
backed by a hardware counter. In "default" mode, resctrl assumes there is
a hardware counter for each event within every CTRL_MON and MON group.

Introduce mbm_assign_mode resctrl file to switch between mbm_event and default
modes.

Example:
To list the MBM monitor modes supported:
  $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
  [mbm_event]
  default

To enable the "mbm_event" counter assignment mode:
  $ echo "mbm_event" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode

To enable the "default" monitoring mode:
  $ echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode

Reset MBM event counters automatically as part of changing the mode.  Clear
both architectural and non-architectural event states to prevent overflow
conditions during the next event read. Clear assignable counter configuration
on all the domains. Also, enable auto assignment when switching to "mbm_event"
mode.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:49:18 +02:00
Babu Moger
88bee79640 fs/resctrl: Introduce the interface to modify assignments in a group
Enable the mbm_l3_assignments resctrl file to be used to modify counter
assignments of CTRL_MON and MON groups when the "mbm_event" counter
assignment mode is enabled.

Process the assignment modifications in the following format:
<Event>:<Domain id>=<Assignment state>;<Domain id>=<Assignment state>

Event: A valid MBM event in the
       /sys/fs/resctrl/info/L3_MON/event_configs directory.

Domain ID: A valid domain ID. When writing, '*' applies the changes
	   to all domains.

Assignment states:

    _ : Unassign a counter.

    e : Assign a counter exclusively.

Examples:

  $ cd /sys/fs/resctrl
  $ cat /sys/fs/resctrl/mbm_L3_assignments
    mbm_total_bytes:0=e;1=e
    mbm_local_bytes:0=e;1=e

To unassign the counter associated with the mbm_total_bytes event on
domain 0:

  $ echo "mbm_total_bytes:0=_" > mbm_L3_assignments
  $ cat /sys/fs/resctrl/mbm_L3_assignments
    mbm_total_bytes:0=_;1=e
    mbm_local_bytes:0=e;1=e

To unassign the counter associated with the mbm_total_bytes event on
all the domains:

  $ echo "mbm_total_bytes:*=_" > mbm_L3_assignments
  $ cat /sys/fs/resctrl/mbm_L3_assignments
    mbm_total_bytes:0=_;1=_
    mbm_local_bytes:0=e;1=e

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:47:17 +02:00
Babu Moger
cba8222880 fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group
Introduce the mbm_L3_assignments resctrl file associated with CTRL_MON and MON
resource groups to display the counter assignment states of the resource group
when "mbm_event" counter assignment mode is enabled.

Display the list in the following format:
<Event>:<Domain id>=<Assignment state>;<Domain id>=<Assignment state>

Event: A valid MBM event listed in
      /sys/fs/resctrl/info/L3_MON/event_configs directory.

Domain ID: A valid domain ID.

The assignment state can be one of the following:

_ : No counter assigned.

e : Counter assigned exclusively.

Example:
To list the assignment states for the default group
  $ cd /sys/fs/resctrl
  $ cat /sys/fs/resctrl/mbm_L3_assignments
  mbm_total_bytes:0=e;1=e
  mbm_local_bytes:0=e;1=e

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:45:34 +02:00
Babu Moger
ef712fe97e fs/resctrl: Auto assign counters on mkdir and clean up on group removal
Resctrl provides a user-configurable option mbm_assign_on_mkdir that
determines if a counter will automatically be assigned to an RMID, event pair
when its associated monitor group is created via mkdir.

Enable mbm_assign_on_mkdir by default to automatically assign counters to
the two default events (MBM total and MBM local) of a new monitoring group
created via mkdir. This maintains backward compatibility with original
resctrl support for these two events.

Unassign and free counters belonging to a monitoring group when the group
is deleted.

Monitor group creation does not fail if a counter cannot be assigned to one or
both events. There may be limited counters and users have the flexibility to
modify counter assignments at a later time. Log the error message "Failed to
allocate counter for <event> in domain <id>" in
/sys/fs/resctrl/info/last_cmd_status when a new monitoring group is created
but counter assignment failed.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:44:04 +02:00
Babu Moger
ac1df9bb0b fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir
The "mbm_event" counter assignment mode allows users to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it is
assigned.

Introduce a user-configurable option that determines if a counter will
automatically be assigned to an RMID, event pair when its associated
monitor group is created via mkdir. Accessible when "mbm_event" counter
assignment mode is enabled.

Suggested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:42:02 +02:00
Babu Moger
f9ae5913d4 fs/resctrl: Provide interface to update the event configurations
When "mbm_event" counter assignment mode is enabled, users can modify the
event configuration by writing to the 'event_filter' resctrl file.  The event
configurations for mbm_event mode are located in
/sys/fs/resctrl/info/L3_MON/event_configs/.

Update the assignments of all CTRL_MON and MON resource groups when the event
configuration is modified.

Example:
  $ mount -t resctrl resctrl /sys/fs/resctrl

  $ cd /sys/fs/resctrl/

  $ cat info/L3_MON/event_configs/mbm_local_bytes/event_filter
    local_reads,local_non_temporal_writes,local_reads_slow_memory

  $ echo "local_reads,local_non_temporal_writes" >
    info/L3_MON/event_configs/mbm_total_bytes/event_filter

  $ cat info/L3_MON/event_configs/mbm_total_bytes/event_filter
    local_reads,local_non_temporal_writes

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:40:38 +02:00
Babu Moger
ea274cbeaf fs/resctrl: Add event configuration directory under info/L3_MON/
The "mbm_event" counter assignment mode allows the user to assign a hardware
counter to an RMID, event pair and monitor the bandwidth as long as it is
assigned. The user can specify the memory transaction(s) for the counter to
track.

When this mode is supported, the /sys/fs/resctrl/info/L3_MON/event_configs
directory contains a sub-directory for each MBM event that can be assigned to
a counter.  The MBM event sub-directory contains a file named "event_filter"
that is used to view and modify which memory transactions the MBM event is
configured with.

Create /sys/fs/resctrl/info/L3_MON/event_configs directory on resctrl mount
and pre-populate it with directories for the two existing MBM events:
mbm_total_bytes and mbm_local_bytes. Create the "event_filter" file within
each MBM event directory with the needed *show() that displays the memory
transactions with which the MBM event is configured.

Example:
  $ mount -t resctrl resctrl /sys/fs/resctrl
  $ cd /sys/fs/resctrl/
  $ cat info/L3_MON/event_configs/mbm_total_bytes/event_filter
    local_reads,remote_reads,local_non_temporal_writes,
    remote_non_temporal_writes,local_reads_slow_memory,
    remote_reads_slow_memory,dirty_victim_writes_all

  $ cat info/L3_MON/event_configs/mbm_local_bytes/event_filter
    local_reads,local_non_temporal_writes,local_reads_slow_memory

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:39:38 +02:00
Babu Moger
159f36cd4d fs/resctrl: Support counter read/reset with mbm_event assignment mode
When "mbm_event" counter assignment mode is enabled, the architecture requires
a counter ID to read the event data.

Introduce an is_mbm_cntr field in struct rmid_read to indicate whether counter
assignment mode is in use.

Update the logic to call resctrl_arch_cntr_read() and resctrl_arch_reset_cntr()
when the assignment mode is active. Report 'Unassigned' in case the user attempts
to read an event without assigning a hardware counter.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:38:58 +02:00
Babu Moger
bc53eea6c2 fs/resctrl: Pass struct rdtgroup instead of individual members
Reading monitoring data for a monitoring group requires both the RMID and
CLOSID. The RMID and CLOSID are members of struct rdtgroup but passed
separately to several functions involved in retrieving event data.

When "mbm_event" counter assignment mode is enabled, a counter ID is required
to read event data. The counter ID is obtained through mbm_cntr_get(), which
expects a struct rdtgroup pointer.

Provide a pointer to the struct rdtgroup as parameter to functions involved in
retrieving event data to simplify access to RMID, CLOSID, and counter ID.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:23:24 +02:00
Babu Moger
aab2c5088c fs/resctrl: Add the functionality to unassign MBM events
The "mbm_event" counter assignment mode offers "num_mbm_cntrs" number of
counters that can be assigned to RMID, event pairs and monitor bandwidth usage
as long as it is assigned. If all the counters are in use, the kernel logs the
error message "Failed to allocate counter for <event> in domain <id>" in
/sys/fs/resctrl/info/last_cmd_status when a new assignment is requested.

To make space for a new assignment, users must unassign an already assigned
counter and retry the assignment again.

Add the functionality to unassign and free the counters in the domain.  Also,
add the helper rdtgroup_unassign_cntrs() to unassign counters in the group.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:22:24 +02:00
Babu Moger
bd85310efd fs/resctrl: Add the functionality to assign MBM events
When supported, "mbm_event" counter assignment mode offers "num_mbm_cntrs"
number of counters that can be assigned to RMID, event pairs and monitor
bandwidth usage as long as it is assigned.

Add the functionality to allocate and assign a counter to an RMID, event
pair in the domain. Also, add the helper rdtgroup_assign_cntrs() to assign
counters in the group.

Log the error message "Failed to allocate counter for <event> in domain
<id>" in /sys/fs/resctrl/info/last_cmd_status if all the counters are in
use. Exit on the first failure when assigning counters across all the
domains.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:21:24 +02:00
Babu Moger
ebebda8536 fs/resctrl: Introduce event configuration field in struct mon_evt
When supported, mbm_event counter assignment mode allows the user to configure
events to track specific types of memory transactions.

Introduce an evt_cfg field in struct mon_evt to define the type of memory
transactions tracked by a monitoring event. Also add a helper function to get
the evt_cfg value.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:18:31 +02:00
Babu Moger
16ff6b038f fs/resctrl: Introduce interface to display number of free MBM counters
Introduce the "available_mbm_cntrs" resctrl file to display the number of
counters available for assignment in each domain when "mbm_event" mode is
enabled.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:16:11 +02:00
Babu Moger
8c793336ea fs/resctrl: Add resctrl file to display number of assignable counters
The "mbm_event" counter assignment mode allows users to assign a hardware
counter to an RMID, event pair and monitor bandwidth usage as long as it is
assigned.  The hardware continues to track the assigned counter until it is
explicitly unassigned by the user.

Create 'num_mbm_cntrs' resctrl file that displays the number of counters
supported in each domain. 'num_mbm_cntrs' is only visible to user space when
the system supports "mbm_event" mode.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:12:09 +02:00
Babu Moger
3b497c3f4f fs/resctrl: Introduce the interface to display monitoring modes
Introduce the resctrl file "mbm_assign_mode" to list the supported counter
assignment modes.

The "mbm_event" counter assignment mode allows users to assign a hardware
counter to an RMID, event pair and monitor bandwidth usage as long as it is
assigned. The hardware continues to track the assigned counter until it is
explicitly unassigned by the user. Each event within a resctrl group can be
assigned independently in this mode.

On AMD systems "mbm_event" mode is backed by the ABMC (Assignable Bandwidth
Monitoring Counters) hardware feature and is enabled by default.

The "default" mode is the existing mode that works without the explicit
counter assignment, instead relying on dynamic counter assignment by hardware
that may result in hardware not dedicating a counter resulting in monitoring
data reads returning "Unavailable".

Provide an interface to display the monitor modes on the system.

  $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
  [mbm_event]
  default

Add IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED) check to support Arm64.

On x86, CONFIG_RESCTRL_ASSIGN_FIXED is not defined. On Arm64, it will be
defined when the "mbm_event" mode is supported.

Add IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED) check early to ensure the user
interface remains compatible with upcoming Arm64 support. IS_ENABLED() safely
evaluates to 0 when the configuration is not defined.

As a result, for MPAM, the display would be either:

  [default]

or

  [mbm_event]

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 12:10:58 +02:00
Babu Moger
13390861b4 x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details
ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
Bits Description
15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
     Monitoring Counter ID + 1

The ABMC feature details are documented in APM [1] available from [2].

  [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
  Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
  Monitoring (ABMC).

Detect the feature and number of assignable counters supported. For backward
compatibility, upon detecting the assignable counter feature, enable the
mbm_total_bytes and mbm_local_bytes events that users are familiar with as
part of original L3 MBM support.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15 12:08:01 +02:00
Tony Luck
83b0398773 x86,fs/resctrl: Prepare for more monitor events
There's a rule in computer programming that objects appear zero, once, or many
times. So code accordingly.

There are two MBM events and resctrl is coded with a lot of

  if (local)
          do one thing
  if (total)
          do a different thing

Change the rdt_mon_domain and rdt_hw_mon_domain structures to hold arrays of
pointers to per event data instead of explicit fields for total and local
bandwidth.

Simplify by coding for many events using loops on which are enabled.

Move resctrl_is_mbm_event() to <linux/resctrl.h> so it can be used more
widely. Also provide a for_each_mbm_event_id() helper macro.

Cleanup variable names in functions touched to consistently use "eventid" for
those with type enum resctrl_event_id.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 11:57:03 +02:00
Tony Luck
d257cc2e5c x86,fs/resctrl: Replace architecture event enabled checks
The resctrl file system now has complete knowledge of the status of every
event. So there is no need for per-event function calls to check.

Replace each of the resctrl_arch_is_{event}enabled() calls with
resctrl_is_mon_event_enabled(QOS_{EVENT}).

No functional change.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 11:54:14 +02:00
Tony Luck
09f3713446 x86,fs/resctrl: Consolidate monitor event descriptions
There are currently only three monitor events, all associated with the
RDT_RESOURCE_L3 resource. Growing support for additional events will be easier
with some restructuring to have a single point in file system code where all
attributes of all events are defined.

Place all event descriptions into an array mon_event_all[]. Doing this has the
beneficial side effect of removing the need for rdt_resource::evt_list.

Add resctrl_event_id::QOS_FIRST_EVENT for a lower bound on range checks for
event ids and as the starting index to scan mon_event_all[].

Drop the code that builds evt_list and change the two places where the list is
scanned to scan mon_event_all[] instead using a new helper macro
for_each_mon_event().

Architecture code now informs file system code which events are available with
resctrl_enable_mon_event().

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15 11:46:08 +02:00
Reinette Chatre
d2e1b84c51 fs/resctrl: Eliminate false positive lockdep warning when reading SNC counters
Running resctrl_tests on an SNC-2 system with lockdep debugging enabled
triggers several warnings with following trace:

  WARNING: CPU: 0 PID: 1914 at kernel/cpu.c:528 lockdep_assert_cpus_held
  ...
  Call Trace:
  __mon_event_count
  ? __lock_acquire
  ? __pfx___mon_event_count
  mon_event_count
  ? __pfx_smp_mon_event_count
  smp_mon_event_count
  smp_call_on_cpu_callback

get_cpu_cacheinfo_level() called from __mon_event_count() requires CPU hotplug
lock to be held. The hotplug lock is indeed held during this time, as
confirmed by the lockdep_assert_cpus_held() within mon_event_read() that calls
mon_event_count() via IPI, but the lockdep tracking is not able to follow the
IPI.

Fresh CPU cache information via get_cpu_cacheinfo_level() from
__mon_event_count() was added to support the fix for the issue where resctrl
inappropriately maintained links to L3 cache information that will be stale in
the case when the associated CPU goes offline.

Keep the cacheinfo ID in struct rdt_mon_domain to ensure that resctrl does not
maintain stale cache information while CPUs can go offline. Return to using
a pointer to the L3 cache information (struct cacheinfo) in struct rmid_read,
rmid_read::ci. Initialize rmid_read::ci before the IPI where it is used. CPU
hotplug lock is held across rmid_read::ci initialization and use to ensure
that it points to accurate cache information.

Fixes: 594902c986 ("x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-09-09 12:43:36 +02:00
Qinyun Tan
594902c986 x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem
In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain
structure representing a NUMA node relies on the cacheinfo interface
(rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map)
for monitoring. The L3 cache information of a SNC NUMA node determines
which domains are summed for the "top level" L3-scoped events.

rdt_mon_domain::ci is initialized using the first online CPU of a NUMA
node. When this CPU goes offline, its shared_cpu_map is cleared to contain
only the offline CPU itself. Subsequently, attempting to read counters
via smp_call_on_cpu(offline_cpu) fails (and error ignored), returning
zero values for "top-level events" without any error indication.

Replace the cacheinfo references in struct rdt_mon_domain and struct
rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).

rdt_domain_hdr::cpu_mask contains the online CPUs associated with that
domain. When reading "top-level events", select a CPU from
rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine
valid CPUs for reading RMID counter via the MSR interface.

Considering all CPUs associated with the L3 cache improves the chances
of picking a housekeeping CPU on which the counter reading work can be
queued, avoiding an unnecessary IPI.

Fixes: 328ea68874 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files")
Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250530182053.37502-2-qinyuntan@linux.alibaba.com
2025-06-16 21:06:12 +02:00
James Morse
7168ae330e x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl
Resctrl is a filesystem interface to hardware that provides cache
allocation policy and bandwidth control for groups of tasks or CPUs.

To support more than one architecture, resctrl needs to live in /fs/.

Move the code that is concerned with the filesystem interface to
/fs/resctrl.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-25-james.morse@arm.com
2025-05-16 14:36:09 +02:00
James Morse
bff70402d6 fs/resctrl: Add boiler plate for external resctrl code
Add Makefile and Kconfig for fs/resctrl. Add ARCH_HAS_CPU_RESCTRL
for the common parts of the resctrl interface and make X86_CPU_RESCTRL
select this.

Adding an include of asm/resctrl.h to linux/resctrl.h allows the
/fs/resctrl files to switch over to using this header instead.

Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-16-james.morse@arm.com
2025-05-16 11:05:40 +02:00