Commit graph

299 commits

Author SHA1 Message Date
Linus Torvalds
abdf766d14 More power management updates for 6.18-rc1
- Make cpufreq drivers setting the default CPU transition latency to
    CPUFREQ_ETERNAL specify a proper default transition latency value
    instead which addresses a regression introduced during the 6.6 cycle
    that broke CPUFREQ_ETERNAL handling (Rafael Wysocki)
 
  - Make the cpufreq CPPC driver use a proper transition delay value
    when CPUFREQ_ETERNAL is returned by cppc_get_transition_latency() to
    indicate an error condition (Rafael Wysocki)
 
  - Make cppc_get_transition_latency() return a negative error code to
    indicate error conditions instead of using CPUFREQ_ETERNAL for this
    purpose and drop CPUFREQ_ETERNAL that has no other users (Rafael
    Wysocki, Gopi Krishna Menon)
 
  - Fix device leak in the mediatek cpufreq driver (Johan Hovold)
 
  - Set target frequency on all CPUs sharing a policy during frequency
    updates in the tegra186 cpufreq driver and make it initialize all
    cores to max frequencies (Aaron Kling)
 
  - Rust cpufreq helper cleanup (Thorsten Blum)
 
  - Make pm_runtime_put*() family of functions return 1 when the
    given device is already suspended which is consistent with the
    documentation (Brian Norris)
 
  - Add basic kunit tests for runtime PM API contracts and update return
    values in kerneldoc comments for the runtime PM API (Brian Norris,
    Dan Carpenter)
 
  - Add auto-cleanup macros for runtime PM "resume and get" and "get
    without resume" operations, use one of them in the PCI core and
    drop the existing "free" macro introduced for similar purpose, but
    somewhat cumbersome to use (Rafael Wysocki)
 
  - Make the core power management code avoid waiting on device links
    marked as SYNC_STATE_ONLY which is consistent with the handling of
    those device links elsewhere (Pin-yen Lin)
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmjk8hUSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1PtgH/0AwdSCX8uI44n/EnyjLlQFWOdNSpXI4
 zAIReNLPP0skWrNwf5ivHYKPEWSeo0o0OjHiCjdTCC3uT8FxCLmjjlPS43zhhHem
 41YQFRJb6GpwD86Vog25q5GSPOQORvRGYV+ZGlMesah0cE4qh4LxgVSNtftm9z7b
 CjMFOeFEAAHFNGEEX4U2/GE+PFdbBGMjSDSfyLazKAKjZS436SOGvpP7NUTt0m9C
 kxO2JLJuQXph5lqyzDRAUu1yOseEwM/f6Y5wWs1z/T5GeIacub/KypZlxhsjWALf
 VxdXfvvLehidS747pxOzXPFhP8sXw1a4XdBNqlVGi/YjF7S9OuAxRMg=
 =RfuO
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These are cpufreq fixes and cleanups on top of the material merged
  previously, a power management core code fix and updates of the
  runtime PM framework including unit tests, documentation updates and
  introduction of auto-cleanup macros for runtime PM "resume and get"
  and "get without resuming" operations.

  Specifics:

   - Make cpufreq drivers setting the default CPU transition latency to
     CPUFREQ_ETERNAL specify a proper default transition latency value
     instead which addresses a regression introduced during the 6.6
     cycle that broke CPUFREQ_ETERNAL handling (Rafael Wysocki)

   - Make the cpufreq CPPC driver use a proper transition delay value
     when CPUFREQ_ETERNAL is returned by cppc_get_transition_latency()
     to indicate an error condition (Rafael Wysocki)

   - Make cppc_get_transition_latency() return a negative error code to
     indicate error conditions instead of using CPUFREQ_ETERNAL for this
     purpose and drop CPUFREQ_ETERNAL that has no other users (Rafael
     Wysocki, Gopi Krishna Menon)

   - Fix device leak in the mediatek cpufreq driver (Johan Hovold)

   - Set target frequency on all CPUs sharing a policy during frequency
     updates in the tegra186 cpufreq driver and make it initialize all
     cores to max frequencies (Aaron Kling)

   - Rust cpufreq helper cleanup (Thorsten Blum)

   - Make pm_runtime_put*() family of functions return 1 when the given
     device is already suspended which is consistent with the
     documentation (Brian Norris)

   - Add basic kunit tests for runtime PM API contracts and update
     return values in kerneldoc comments for the runtime PM API (Brian
     Norris, Dan Carpenter)

   - Add auto-cleanup macros for runtime PM "resume and get" and "get
     without resume" operations, use one of them in the PCI core and
     drop the existing "free" macro introduced for similar purpose, but
     somewhat cumbersome to use (Rafael Wysocki)

   - Make the core power management code avoid waiting on device links
     marked as SYNC_STATE_ONLY which is consistent with the handling of
     those device links elsewhere (Pin-yen Lin)"

* tag 'pm-6.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  docs/zh_CN: Fix malformed table
  docs/zh_TW: Fix malformed table
  PM: runtime: Fix error checking for kunit_device_register()
  PM: runtime: Introduce one more usage counter guard
  cpufreq: Drop unused symbol CPUFREQ_ETERNAL
  ACPI: CPPC: Do not use CPUFREQ_ETERNAL as an error value
  cpufreq: CPPC: Avoid using CPUFREQ_ETERNAL as transition delay
  cpufreq: Make drivers using CPUFREQ_ETERNAL specify transition latency
  PM: runtime: Drop DEFINE_FREE() for pm_runtime_put()
  PCI/sysfs: Use runtime PM guard macro for auto-cleanup
  PM: runtime: Add auto-cleanup macros for "resume and get" operations
  cpufreq: tegra186: Initialize all cores to max frequencies
  cpufreq: tegra186: Set target frequency for all cpus in policy
  rust: cpufreq: streamline find_supply_names
  cpufreq: mediatek: fix device leak on probe failure
  PM: sleep: Do not wait on SYNC_STATE_ONLY device links
  PM: runtime: Update kerneldoc return codes
  PM: runtime: Make put{,_sync}() return 1 when already suspended
  PM: runtime: Add basic kunit tests for API contracts
2025-10-07 09:39:51 -07:00
Bjorn Helgaas
51204faa42 Merge branch 'pci/misc'
- Fix whitespace issues (Li Jun)

- Fix pci_acpi_preserve_config() memory leak (Nirmoy Das)

- Add sysfs 'serial_number' file to expose the Device Serial Number
  (Matthew Wood)

* pci/misc:
  PCI/sysfs: Expose PCI device serial number
  PCI/ACPI: Fix pci_acpi_preserve_config() memory leak
2025-10-03 12:13:25 -05:00
Bjorn Helgaas
fead6a0b15 Merge branch 'pci/resource'
- Ensure relaxed tail alignment does not increase min_align when computing
  bridge window size, to fix a regression (Ilpo Järvinen)

- Fix bridge window size computation to fix a regression for devices with
  undefined PCI class, e.g., Samsung [144d:a5a5] (Ilpo Järvinen)

- Fix error handling during resource resize to fix a regression in amdgpu
  (Ilpo Järvinen)

- Align m68k pcibios_enable_device() with other arches (Ilpo Järvinen)

- Remove several sparc pcibios_enable_device() implementations that don't
  do anything beyond what pci_enable_resources() does (Ilpo Järvinen)

- Remove mips pcibios_enable_resources() and use pci_enable_resources()
  instead (Ilpo Järvinen)

- Refactor and simplify find_bus_resource_of_type() (Ilpo Järvinen)

- Claim bridge windows before setting them up (Ilpo Järvinen)

- Disable non-claimed bridge windows so the kernel's view matches the
  hardware configuration (Ilpo Järvinen)

- Use pci_release_resource() instead of release_resource() to reduce code
  duplication and increase consistency (Ilpo Järvinen)

- Enable bridges even if bridge window assignment fails (Ilpo Järvinen)

- Preserve bridge window resource type flags when assignment fails because
  we may need it later (Ilpo Järvinen)

- Add bridge window selection functions to make the selection consistent
  across the several places that do this (Ilpo Järvinen)

- Warn if bridge window cannot be released when resizing BAR (Ilpo
  Järvinen)

- Set up bridge resources before enumerating children so we can check
  whether child resources are inside bridge windows (Ilpo Järvinen)

* pci/resource:
  PCI: Set up bridge resources earlier
  PCI: Don't print stale information about resource
  PCI: Alter misleading recursion to pci_bus_release_bridge_resources()
  PCI: Pass bridge window to pci_bus_release_bridge_resources()
  PCI: Add pci_setup_one_bridge_window()
  PCI: Refactor remove_dev_resources() to use pbus_select_window()
  PCI: Refactor distributing available memory to use loops
  PCI: Use pbus_select_window_for_type() during mem window sizing
  PCI: Use pbus_select_window() in space available checker
  PCI: Rename resource variable from r to res
  PCI: Use pbus_select_window_for_type() during IO window sizing
  PCI: Use pbus_select_window() during BAR resize
  PCI: Warn if bridge window cannot be released when resizing BAR
  PCI: Fix finding bridge window in pci_reassign_bridge_resources()
  PCI: Add bridge window selection functions
  PCI: Add defines for bridge window indexing
  PCI: Preserve bridge window resource type flags
  PCI: Enable bridge even if bridge window fails to assign
  PCI: Use pci_release_resource() instead of release_resource()
  PCI: Disable non-claimed bridge window
  PCI: Always claim bridge window before its setup
  PCI: Refactor find_bus_resource_of_type() logic checks
  PCI: Move find_bus_resource_of_type() earlier
  MIPS: PCI: Use pci_enable_resources()
  sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
  m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
  PCI: Fix failure detection during resource resize
  PCI: Fix pdev_resources_assignable() disparity
  PCI: Ensure relaxed tail alignment does not increase min_align
2025-10-03 12:13:12 -05:00
Rafael J. Wysocki
8ff5aaa7b8 PCI/sysfs: Use runtime PM guard macro for auto-cleanup
Use the newly introduced pm_runtime_active_try guard to simplify
the code and add the proper error handling for PM runtime resume
errors.

Based on an earlier patch from Takashi Iwai <tiwai@suse.de> [1].

Link: https://patch.msgid.link/20250919163147.4743-3-tiwai@suse.de [1]
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
2025-09-29 17:01:47 +02:00
Brian Norris
48991e4935 PCI/sysfs: Ensure devices are powered for config reads
The "max_link_width", "current_link_speed", "current_link_width",
"secondary_bus_number", and "subordinate_bus_number" sysfs files all access
config registers, but they don't check the runtime PM state. If the device
is in D3cold or a parent bridge is suspended, we may see -EINVAL, bogus
values, or worse, depending on implementation details.

Wrap these access in pci_config_pm_runtime_{get,put}() like most of the
rest of the similar sysfs attributes.

Notably, "max_link_speed" does not access config registers; it returns a
cached value since d2bd39c045 ("PCI: Store all PCIe Supported Link
Speeds").

Fixes: 56c1af4606 ("PCI: Add sysfs max_link_speed/width, current_link_speed/width, etc")
Signed-off-by: Brian Norris <briannorris@google.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250924095711.v2.1.Ibb5b6ca1e2c059e04ec53140cd98a44f2684c668@changeid
2025-09-24 14:02:39 -05:00
Matthew Wood
cf6ee09b09 PCI/sysfs: Expose PCI device serial number
Add a single sysfs read-only interface for reading PCI device serial
numbers from userspace in a programmatic way. This device attribute uses
the same hexadecimal 1-byte dashed formatting as lspci serial number
capability output. If a device doesn't support the serial number
capability, the serial_number sysfs attribute will not be visible.

Signed-off-by: Matthew Wood <thepacketgeek@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mario Limonciello <superm1@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://patch.msgid.link/20250917125815.722952-2-thepacketgeek@gmail.com
2025-09-17 10:53:36 -05:00
Ilpo Järvinen
7dc58aa7f1 PCI: Use pbus_select_window() during BAR resize
Prior to a BAR resize, __resource_resize_store() loops through the normal
resources of the PCI device and releases those that match to the flags of
the BAR to be resized. This is necessary to allow resizing also the
upstream bridge window as only childless bridge windows can be resized.

While the flags check (mostly) works (if corner cases are ignored), the
more straightforward way is to check if the resources share the bridge
window. Change __resource_resize_store() to do the check using
pbus_select_window().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-16-ilpo.jarvinen@linux.intel.com
2025-09-16 11:19:38 -05:00
Ilpo Järvinen
8278c69143 PCI: Preserve bridge window resource type flags
When a bridge window is found unused or fails to assign, the flags of the
associated resource are cleared. Clearing flags is problematic as it also
removes the type information of the resource which is needed later.

Thus, always preserve the bridge window type flags and use IORESOURCE_UNSET
and IORESOURCE_DISABLED to indicate the status of the bridge window. Also,
when initializing resources, make sure all valid bridge windows do get
their type flags set.

Change various places that relied on resource flags being cleared to check
for IORESOURCE_UNSET and IORESOURCE_DISABLED to allow bridge window
resource to retain their type flags. Add pdev_resource_assignable() and
pdev_resource_should_fit() helpers to filter out disabled bridge windows
during resource fitting; the latter combines more common checks into the
helper.

When reading the bridge windows from the registers, instead of leaving the
resource flags cleared for bridge windows that are not enabled, always
set up the flags and set IORESOURCE_UNSET | IORESOURCE_DISABLED as needed.

When resource fitting or assignment fails for a bridge window resource, or
the bridge window is not needed, mark the resource with IORESOURCE_UNSET or
IORESOURCE_DISABLED, respectively.

Use dummy zero resource in resource_show() for backwards compatibility as
lspci will otherwise misrepresent disabled bridge windows.

This change fixes an issue which highlights the importance of keeping the
resource type flags intact:

  At the end of __assign_resources_sorted(), reset_resource() is called,
  previously clearing the flags. Later, pci_prepare_next_assign_round()
  attempted to release bridge resources using
  pci_bus_release_bridge_resources() that calls into
  pci_bridge_release_resources() that assumes type flags are still present.
  As type flags were cleared, IORESOURCE_MEM_64 was not set leading to
  resources under an incorrect bridge window to be released (idx = 1
  instead of idx = 2). While the assignments performed later covered this
  problem so that the wrongly released resources got assigned in the end,
  it was still causing extra release+assign pairs.

There are other reasons why the resource flags should be retained in
upcoming changes too.

Removing the flag reset for non-bridge window resource is left as future
work, in part because it has a much higher regression potential due to
pci_enable_resources() that will start to work also for those resources
then and due to what endpoint drivers might assume about resources.

Despite the Fixes tag, backporting this (at least any time soon) is highly
discouraged. The issue fixed is borderline cosmetic as the later
assignments normally cover the problem entirely. Also there might be
non-obvious dependencies.

Fixes: 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-11-ilpo.jarvinen@linux.intel.com
2025-09-16 11:19:18 -05:00
Thomas Weißschuh
fb506e31b3 sysfs: treewide: switch back to attribute_group::bin_attrs
The normal bin_attrs field can now handle const pointers.
This makes the _new variant unnecessary.
Switch all users back.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20250530-sysfs-const-bin_attr-final-v3-4-724bfcf05b99@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-17 10:44:15 +02:00
Thomas Weißschuh
2fbe82037a sysfs: treewide: switch back to bin_attribute::read()/write()
The bin_attribute argument of bin_attribute::read() is now const.
This makes the _new() callbacks unnecessary. Switch all users back.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20250530-sysfs-const-bin_attr-final-v3-3-724bfcf05b99@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-17 10:44:13 +02:00
Bjorn Helgaas
f377d9cb25 Merge branch 'pci/pm'
- Add pm_runtime_put() cleanup helper for use with __free() to
  automatically drop the device usage count when a pointer goes out of
  scope (Alex Williamson)

- Increment PM usage counter when probing reset methods so we don't try to
  read config space of a powered-off device (Alex Williamson)

- Set all devices to D0 during enumeration to ensure ACPI opregion is
  connected via _REG (Mario Limonciello)

* pci/pm:
  PCI: Explicitly put devices into D0 when initializing
  PCI: Increment PM usage counter when probing reset methods
  PM: runtime: Define pm_runtime_put cleanup helper
2025-06-04 10:50:01 -05:00
Jon Pan-Doh
b4fe7398de PCI/AER: Add sysfs attributes for log ratelimits
Allow userspace to read/write log ratelimits per device (including
enable/disable). Create aer/ sysfs directory to store them and any
future AER configs.

The new sysfs files are:

  /sys/bus/pci/devices/*/aer/correctable_ratelimit_burst
  /sys/bus/pci/devices/*/aer/correctable_ratelimit_interval_ms
  /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_burst
  /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_interval_ms

The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so
if we try to emit more than 10 messages in a 5 second period, some are
suppressed.

Update AER sysfs ABI filename to reflect the broader scope of AER sysfs
attributes (e.g. stats and ratelimits).

  Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats ->
    sysfs-bus-pci-devices-aer

Tested using aer-inject[1]. Configured correctable log ratelimit to 5.
Sent 6 AER errors. Observed 5 errors logged while AER stats
(cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 6.

Disabled ratelimiting and sent 6 more AER errors. Observed all 6 errors
logged and accounted in AER stats (12 total errors).

[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git

[bhelgaas: note fatal errors are not ratelimited, "aer_report" ->
"aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms]

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-21-helgaas@kernel.org
2025-05-23 11:11:45 -05:00
Alex Williamson
0a0829b1fd PCI: Increment PM usage counter when probing reset methods
We can get different results probing reset methods for a device depending
on its power state.  For example, reading the PM control register of a
device in D3cold will always indicate NoSoftRst+ because we get ~0 data
when the config read fails on PCI, preventing us from correctly probing PM
reset support.

Increment the PM usage counter before any probes and use the cleanup __free
facility to automatically drop the usage counter out of scope.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250422230534.2295291-3-alex.williamson@redhat.com
2025-04-23 16:06:45 -05:00
Bjorn Helgaas
38d42a6612 Merge branch 'pci/resource'
- Use pci_resource_n() to simplify BAR/window resource lookup (Ilpo
  Järvinen)

- Fix typo that repeatedly distributed resources to a bridge instead of
  iterating over subordinate bridges, which resulted in too little space to
  assign some BARs (Kai-Heng Feng)

- Relax bridge window tail sizing for optional resources, e.g., IOV BARs,
  to avoid failures when removing and re-adding devices (Ilpo Järvinen)

- Fix a double counting error for I/O resources, as we previously did for
  memory resources (Ilpo Järvinen)

- Use resource_set_{range,size}() helpers in more places (Ilpo Järvinen)

- Add pci_resource_is_iov() to identify IOV resources (Ilpo Järvinen)

- Add pci_resource_num() to look up the BAR number from the resource
  pointer (Ilpo Järvinen)

- Add restore_dev_resource() to simplify code that resources saved device
  resources (Ilpo Järvinen)

- Allow drivers to enable devices even if we haven't assigned optional IOV
  resources to them (Ilpo Järvinen)

- Improve debug output during resource reallocation (Ilpo Järvinen)

- Rework handling of optional resources (IOV BARs, ROMs) to reduce failures
  if we can't allocate them (Ilpo Järvinen)

- Move declarations of pci_rescan_bus_bridge_resize(),
  pci_reassign_bridge_resources(), and CardBus-related sizes from
  include/linux/pci.h to drivers/pci/pci.h since they're not used outside
  the PCI core (Ilpo Järvinen)

- Make pci_setup_bridge() static (Ilpo Järvinen)

- Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory)

- Fix s390 mmio_read/write syscalls, which didn't cause page faults in some
  cases, which broke vfio-pci lazy mapping on first access (Niklas
  Schnelle)

- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was
  disabled only for s390 (Niklas Schnelle)

- Support mmap of PCI resources on s390 except for ISM devices (Niklas
  Schnelle)

* pci/resource:
  s390/pci: Support mmap() of PCI resources except for ISM devices
  s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
  s390/pci: Fix s390_mmio_read/write syscall page fault handling
  PCI: Fix NULL dereference in SR-IOV VF creation error path
  PCI: Move cardbus IO size declarations into pci/pci.h
  PCI: Make pci_setup_bridge() static
  PCI: Move resource reassignment func declarations into pci/pci.h
  PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
  PCI: Fix BAR resizing when VF BARs are assigned
  PCI: Do not claim to release resource falsely
  PCI: Increase Resizable BAR support from 512 GB to 128 TB
  PCI: Rework optional resource handling
  PCI: Perform reset_resource() and build fail list in sync
  PCI: Use res->parent to check if resource is assigned
  PCI: Add debug print when releasing resources before retry
  PCI: Indicate optional resource assignment failures
  PCI: Always have realloc_head in __assign_resources_sorted()
  PCI: Extend enable to check for any optional resource
  PCI: Add restore_dev_resource()
  PCI: Remove incorrect comment from pci_reassign_resource()
  PCI: Consolidate assignment loop next round preparation
  PCI: Rename retval to ret
  PCI: Use while loop and break instead of gotos
  PCI: Refactor pdev_sort_resources() & __dev_sort_resources()
  PCI: Converge return paths in __assign_resources_sorted()
  PCI: Add dev & res local variables to resource assignment funcs
  PCI: Add pci_resource_num() helper
  PCI: Check resource_size() separately
  PCI: Add pci_resource_is_iov() to identify IOV resources
  PCI: Use resource_set_{range,size}() helpers
  PCI: Use SZ_* instead of literals in setup-bus.c
  PCI: Fix old_size lower bound in calculate_iosize() too
  PCI: Allow relaxed bridge window tail sizing for optional resources
  PCI: Simplify size1 assignment logic
  PCI: Use min_align, not unrelated add_align, for size0
  PCI: Remove add_align overwrite unrelated to size0
  PCI: Use downstream bridges for distributing resources
  PCI: Cleanup dev->resource + resno to use pci_resource_n()
2025-03-27 13:14:45 -05:00
Alistair Francis
2311ab1820 PCI/DOE: Expose DOE features via sysfs
PCIe r6.0 added support for Data Object Exchange (DOE).  When DOE is
supported, the DOE Discovery Feature must be implemented per PCIe r6.1, sec
6.30.1.1. DOE allows a requester to obtain information about the other DOE
features supported by the device.

The kernel already queries the DOE features supported and caches the
values.  Expose the values in sysfs to allow user space to determine which
DOE features are supported by the PCIe device.

By exposing the information to userspace, tools like lspci can relay the
information to users. By listing all of the supported features we can allow
userspace to parse the list, which might include vendor specific features
as well as yet to be supported features.

As the DOE Discovery feature must always be supported we treat it as a
special named attribute case. This allows the usual PCI attribute_group
handling to correctly create the doe_features directory when registering
pci_doe_sysfs_group (otherwise it doesn't and sysfs_add_file_to_group()
will seg fault).

After this patch is supported you can see something like this when
attaching a DOE device:

  $ ls /sys/devices/pci0000:00/0000:00:02.0//doe*
  0001:01        0001:02        doe_discovery

Link: https://lore.kernel.org/r/20250306075211.1855177-3-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
[bhelgaas: drop pci_doe_sysfs_init() stub return, make
DEVICE_ATTR_RO(doe_discovery) static]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-21 16:36:01 -05:00
Niklas Schnelle
888bd8322d s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
The ability to map PCI resources to user-space is controlled by global
defines. For vfio there is VFIO_PCI_MMAP which is only disabled on s390 and
controls mapping of PCI resources using vfio-pci with a fallback option via
the pread()/pwrite() interface.

For the PCI core there is ARCH_GENERIC_PCI_MMAP_RESOURCE which enables a
generic implementation for mapping PCI resources plus the newer sysfs
interface. Then there is HAVE_PCI_MMAP which can be used with custom
definitions of pci_mmap_resource_range() and the historical /proc/bus/pci
interface. Both mechanisms are all or nothing.

For s390 mapping PCI resources is possible and useful for testing and
certain applications such as QEMU's vfio-pci based user-space NVMe driver.
For certain devices, however access to PCI resources via mappings to
user-space is not possible and these must be excluded from the general PCI
resource mapping mechanisms.

Introduce pdev->non_mappable_bars to indicate that a PCI device's BARs can
not be accessed via mappings to user-space. In the future this enables
per-device restrictions of PCI resource mapping.

For now, set this flag for all PCI devices on s390 in line with the
existing, general disable of PCI resource mapping. As s390 is the only user
of the VFI_PCI_MMAP Kconfig options this can already be replaced with a
check of this new flag. Also add similar checks in the other code protected
by HAVE_PCI_MMAP respectively ARCH_GENERIC_PCI_MMAP in preparation for
enabling these for supported devices.

Link: https://lore.kernel.org/lkml/20250212132808.08dcf03c.alex.williamson@redhat.com/
Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-2-c5c0f1d26efd@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21 14:54:16 -05:00
Ilpo Järvinen
9ec19bfa78 PCI: Fix BAR resizing when VF BARs are assigned
__resource_resize_store() attempts to release all resources of the device
before attempting the resize. The loop, however, only covers standard BARs
(< PCI_STD_NUM_BARS). If a device has VF BARs that are assigned,
pci_reassign_bridge_resources() finds the bridge window still has some
assigned child resources and returns -NOENT which makes
pci_resize_resource() to detect an error and abort the resize.

Change the release loop to cover all resources up to VF BARs which allows
the resize operation to release the bridge windows and attempt to assigned
them again with the different size.

If SR-IOV is enabled, disallow resize as it requires releasing also IOV
resources.

Link: https://lore.kernel.org/r/20250320142837.8027-1-ilpo.jarvinen@linux.intel.com
Fixes: 91fa127794 ("PCI: Expose PCIe Resizable BAR support via sysfs")
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2025-03-20 16:44:11 -05:00
Bjorn Helgaas
10ff5bbfd4 Merge branch 'pci/misc'
- Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM ACPI
  hotplug driver (Thomas Weißschuh)

- Update PCI_EXP_LNKCAP_SLS comment (Lukas Wunner)

- Drop superfluous pm_wakeup.h include (Wolfram Sang)

- Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong Zhang)

- Correct documentation of the 'config_acs=' kernel parameter (Akihiko
  Odaki)

* pci/misc:
  Documentation: Fix pci=config_acs= example
  PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
  PCI: Don't include 'pm_wakeup.h' directly
  PCI: Update code comment on PCI_EXP_LNKCAP_SLS for PCIe r3.0
  PCI/ACPI: Constify 'struct bin_attribute'
  PCI/P2PDMA: Constify 'struct bin_attribute'
  PCI/VPD: Constify 'struct bin_attribute'
  PCI/sysfs: Constify 'struct bin_attribute'
2025-01-23 13:05:06 -06:00
Ilpo Järvinen
2d54d23c60
PCI/sysfs: Remove unnecessary zero in initializer
Providing empty initializer for an array is enough to set its elements
to zero. Thus, remove the redundant 0 from the initializer.

Link: https://lore.kernel.org/r/20241028174046.1736-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-15 03:54:44 +00:00
Ilpo Järvinen
b6e5c46d83
PCI/sysfs: Use __free() in reset_method_store()
Use __free() from  cleanup.h to handle freeing options in
reset_method_store() as it simplifies the code flow.

Link: https://lore.kernel.org/r/20241028174046.1736-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-15 03:54:38 +00:00
Ilpo Järvinen
10269d5781
PCI/sysfs: Move reset related sysfs code to correct file
Most PCI sysfs code and structs are in a dedicated file but a few reset
related things remain in pci.c. Move also them to pci-sysfs.c and drop
pci_dev_reset_method_attr_is_visible() as it is 100% duplicate of
pci_dev_reset_attr_is_visible().

Link: https://lore.kernel.org/r/20241028174046.1736-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-15 03:54:22 +00:00
Thomas Weißschuh
0530ad489d PCI/sysfs: Constify 'struct bin_attribute'
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.

Link: https://lore.kernel.org/r/20241202-sysfs-const-bin_attr-pci-v1-1-c32360f495a7@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-12-03 15:25:41 -06:00
Linus Torvalds
55cb93fd24 Driver core changes for 6.13-rc1
Here is a small set of driver core changes for 6.13-rc1.
 
 Nothing major for this merge cycle, except for the 2 simple merge
 conflicts are here just to make life interesting.
 
 Included in here are:
   - sysfs core changes and preparations for more sysfs api cleanups that
     can come through all driver trees after -rc1 is out
   - fw_devlink fixes based on many reports and debugging sessions
   - list_for_each_reverse() removal, no one was using it!
   - last-minute seq_printf() format string bug found and fixed in many
     drivers all at once.
   - minor bugfixes and changes full details in the shortlog
 
 As mentioned above, there is 2 merge conflicts with your tree, one is
 where the file is removed (easy enough to resolve), the second is a
 build time error, that has been found in linux-next and the fix can be
 seen here:
 	https://lore.kernel.org/r/20241107212645.41252436@canb.auug.org.au
 
 Other than that, the changes here have been in linux-next with no other
 reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ0lEog8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ym+0ACgw6wN+LkLVIHWhxTq5DYHQ0QCxY8AoJrRIcKe
 78h0+OU3OXhOy8JGz62W
 =oI5S
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is a small set of driver core changes for 6.13-rc1.

  Nothing major for this merge cycle, except for the two simple merge
  conflicts are here just to make life interesting.

  Included in here are:

   - sysfs core changes and preparations for more sysfs api cleanups
     that can come through all driver trees after -rc1 is out

   - fw_devlink fixes based on many reports and debugging sessions

   - list_for_each_reverse() removal, no one was using it!

   - last-minute seq_printf() format string bug found and fixed in many
     drivers all at once.

   - minor bugfixes and changes full details in the shortlog"

* tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits)
  Fix a potential abuse of seq_printf() format string in drivers
  cpu: Remove spurious NULL in attribute_group definition
  s390/con3215: Remove spurious NULL in attribute_group definition
  perf: arm-ni: Remove spurious NULL in attribute_group definition
  driver core: Constify bin_attribute definitions
  sysfs: attribute_group: allow registration of const bin_attribute
  firmware_loader: Fix possible resource leak in fw_log_firmware_info()
  drivers: core: fw_devlink: Fix excess parameter description in docstring
  driver core: class: Correct WARN() message in APIs class_(for_each|find)_device()
  cacheinfo: Use of_property_present() for non-boolean properties
  cdx: Fix cdx_mmap_resource() after constifying attr in ->mmap()
  drivers: core: fw_devlink: Make the error message a bit more useful
  phy: tegra: xusb: Set fwnode for xusb port devices
  drm: display: Set fwnode for aux bus devices
  driver core: fw_devlink: Stop trying to optimize cycle detection logic
  driver core: Constify attribute arguments of binary attributes
  sysfs: bin_attribute: add const read/write callback variants
  sysfs: implement all BIN_ATTR_* macros in terms of __BIN_ATTR()
  sysfs: treewide: constify attribute callback of bin_attribute::llseek()
  sysfs: treewide: constify attribute callback of bin_attribute::mmap()
  ...
2024-11-29 11:43:29 -08:00
Keith Busch
2fa046449a PCI: Add 'reset_subordinate' to reset hierarchy below bridge
The "bus" and "cxl_bus" reset methods reset a device by asserting Secondary
Bus Reset on the bridge leading to the device.  These only work if the
device is the only device below the bridge.

Add a sysfs 'reset_subordinate' attribute on bridges that can assert
Secondary Bus Reset regardless of how many devices are below the bridge.

This resets all the devices below a bridge in a single command, including
the locking and config space save/restore that reset methods normally do.

This may be the only way to reset devices that don't support other reset
methods (ACPI, FLR, PM reset, etc).

Link: https://lore.kernel.org/r/20241025222755.3756162-1-kbusch@meta.com
Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: commit log, add capable(CAP_SYS_ADMIN) check]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Amey Narkhede <ameynarkhede03@gmail.com>
2024-11-13 16:43:34 -06:00
Thomas Weißschuh
699e7b85af sysfs: treewide: constify attribute callback of bin_attribute::llseek()
The llseek() callbacks should not modify the struct
bin_attribute passed as argument.
Enforce this by marking the argument as const.

As there are not many callback implementers perform this change
throughout the tree at once.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-7-71110628844c@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05 14:00:28 +01:00
Thomas Weißschuh
94a20fb9af sysfs: treewide: constify attribute callback of bin_attribute::mmap()
The mmap() callbacks should not modify the struct
bin_attribute passed as argument.
Enforce this by marking the argument as const.

As there are not many callback implementers perform this change
throughout the tree at once.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-6-71110628844c@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05 14:00:28 +01:00
Thomas Weißschuh
b626816fdd sysfs: treewide: constify attribute callback of bin_is_visible()
The is_bin_visible() callbacks should not modify the struct
bin_attribute passed as argument.
Enforce this by marking the argument as const.

As there are not many callback implementers perform this change
throughout the tree at once.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-5-71110628844c@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05 14:00:28 +01:00
Thomas Weißschuh
a1ab720ee5 PCI/sysfs: Calculate bin_attribute size through bin_size()
Stop abusing the is_bin_visible() callback to calculate the attribute
size. Instead use the new, dedicated bin_size() one.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-3-71110628844c@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05 14:00:28 +01:00
Lukas Wunner
265baca69a s390/pci: Stop usurping pdev->dev.groups
Bjorn suggests using pdev->dev.groups for attribute_groups constructed on
PCI device enumeration:

  "Is it feasible to build an attribute group in pci_doe_init() and
   add it to dev->groups so device_add() will automatically add them?"
   https://lore.kernel.org/r/20231019165829.GA1381099@bhelgaas

Unfortunately on s390, pcibios_device_add() usurps pdev->dev.groups for
arch-specific attribute_groups, preventing its use for anything else.

Introduce an ARCH_PCI_DEV_GROUPS macro which arches can define in
<asm/pci.h>.  The macro is visible in drivers/pci/pci-sysfs.c through the
inclusion of <linux/pci.h>, which in turn includes <asm/pci.h>.

On s390, define the macro to the three attribute_groups previously assigned
to pdev->dev.groups.  Thereby pdev->dev.groups is made available for use by
the PCI core.

As a side effect, arch/s390/pci/pci_sysfs.c no longer needs to be compiled
into the kernel if CONFIG_SYSFS=n.

Link: https://lore.kernel.org/r/7b970f7923e373d1b23784721208f93418720485.1722870934.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
2024-08-09 14:58:27 -05:00
Ilpo Järvinen
f6c7399983 PCI/sysfs: Demacrofy pci_dev_resource_resize_attr(n) functions
pci_dev_resource_resize_attr(n) macro is invoked for six resources,
creating a large footprint function for each resource.

Rework the macro to only create a function that calls a helper function so
the compiler can decide if it warrants to inline the function or not.

With x86_64 defconfig, this saves roughly 2.5kB:

  $ scripts/bloat-o-meter drivers/pci/pci-sysfs.o{.old,.new}
  add/remove: 1/0 grow/shrink: 0/6 up/down: 512/-2934 (-2422)
  Function                                     old     new   delta
  __resource_resize_store                        -     512    +512
  resource5_resize_store                       503      14    -489
  resource4_resize_store                       503      14    -489
  resource3_resize_store                       503      14    -489
  resource2_resize_store                       503      14    -489
  resource1_resize_store                       503      14    -489
  resource0_resize_store                       500      11    -489
  Total: Before=13399, After=10977, chg -18.08%

(The compiler seemingly chose to still inline __resource_resize_show()
which is fine, those functions are not very complex/large.)

Link: https://lore.kernel.org/r/20240222114607.1837-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-03-05 16:10:17 -06:00
Lukas Wunner
be9c3a4c8b PCI/sysfs: Compile pci-sysfs.c only if CONFIG_SYSFS=y
It is possible to enable CONFIG_PCI but disable CONFIG_SYSFS and for
space-constrained devices such as routers, such a configuration may
actually make sense.

However pci-sysfs.c is compiled even if CONFIG_SYSFS is disabled,
unnecessarily increasing the kernel's size.

To rectify that:

* Move pci_mmap_fits() to mmap.c.  It is not only needed by
  pci-sysfs.c, but also proc.c.

* Move pci_dev_type to probe.c and make it private.  It references
  pci_dev_attr_groups in pci-sysfs.c.  Make that public instead for
  consistency with pci_dev_groups, pcibus_groups and pci_bus_groups,
  which are likewise public and referenced by struct definitions in
  pci-driver.c and probe.c.

* Define pci_dev_groups, pci_dev_attr_groups, pcibus_groups and
  pci_bus_groups to NULL if CONFIG_SYSFS is disabled.  Provide empty
  static inlines for pci_{create,remove}_legacy_files() and
  pci_{create,remove}_sysfs_dev_files().

Result:

vmlinux size is reduced by 122996 bytes in my arm 32-bit test build.

Link: https://lore.kernel.org/r/85ca95ae8e4d57ccf082c5c069b8b21eb141846e.1698668982.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
2024-03-05 16:08:43 -06:00
Linus Torvalds
b06f58ad8e Driver core changes for 6.7-rc1
Here is the set of driver core updates for 6.7-rc1.  Nothing major in
 here at all, just a small number of changes including:
   - minor cleanups and updates from Andy Shevchenko
   - __counted_by addition
   - firmware_loader update for aborting loads cleaner
   - other minor changes, details in the shortlog
   - documentation update
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZUTe8A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynP0QCfT2jQx3OcL22MoqCvdTuZJKPiHSIAoMxrliJF
 d4cUeICW17ywlTFzsKg8
 =nTeu
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of driver core updates for 6.7-rc1. Nothing major in
  here at all, just a small number of changes including:

   - minor cleanups and updates from Andy Shevchenko

   - __counted_by addition

   - firmware_loader update for aborting loads cleaner

   - other minor changes, details in the shortlog

   - documentation update

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits)
  firmware_loader: Abort all upcoming firmware load request once reboot triggered
  firmware_loader: Refactor kill_pending_fw_fallback_reqs()
  Documentation: security-bugs.rst: linux-distros relaxed their rules
  driver core: Release all resources during unbind before updating device links
  driver core: class: remove boilerplate code
  driver core: platform: Annotate struct irq_affinity_devres with __counted_by
  resource: Constify resource crosscheck APIs
  resource: Unify next_resource() and next_resource_skip_children()
  resource: Reuse for_each_resource() macro
  PCI: Implement custom llseek for sysfs resource entries
  kernfs: sysfs: support custom llseek method for sysfs entries
  debugfs: Fix __rcu type comparison warning
  device property: Replace custom implementation of COUNT_ARGS()
  drivers: base: test: Make property entry API test modular
  driver core: Add missing parameter description to __fwnode_link_add()
  device property: Clarify usage scope of some struct fwnode_handle members
  devres: rename the first parameter of devm_add_action(_or_reset)
  driver core: platform: Unify the firmware node type check
  driver core: platform: Use temporary variable in platform_device_add()
  driver core: platform: Refactor error path in a couple places
  ...
2023-11-03 15:15:47 -10:00
Bjorn Helgaas
5897c17402 Merge branch 'pci/field-get'
- Use FIELD_GET()/FIELD_PREP() when possible throughout drivers/pci/ (Ilpo
  Järvinen, Bjorn Helgaas)

- Rework DPC control programming for clarity (Ilpo Järvinen)

* pci/field-get:
  PCI/portdrv: Use FIELD_GET()
  PCI/VC: Use FIELD_GET()
  PCI/PTM: Use FIELD_GET()
  PCI/PME: Use FIELD_GET()
  PCI/ATS: Use FIELD_GET()
  PCI/ATS: Show PASID Capability register width in bitmasks
  PCI: Use FIELD_GET() in Sapphire RX 5600 XT Pulse quirk
  PCI: Use FIELD_GET()
  PCI/MSI: Use FIELD_GET/PREP()
  PCI/DPC: Use defines with DPC reason fields
  PCI/DPC: Use defined fields with DPC_CTL register
  PCI/DPC: Use FIELD_GET()
  PCI: hotplug: Use FIELD_GET/PREP()
  PCI: dwc: Use FIELD_GET/PREP()
  PCI: cadence: Use FIELD_GET()
  PCI: Use FIELD_GET() to extract Link Width
  PCI: mvebu: Use FIELD_PREP() with Link Width
  PCI: tegra194: Use FIELD_GET()/FIELD_PREP() with Link Width fields

# Conflicts:
#	drivers/pci/controller/dwc/pcie-tegra194.c
2023-10-28 13:31:05 -05:00
Bjorn Helgaas
dbf9527ca1 Merge branch 'pci/vga'
- Add pci_is_vga() helper, which checks for both PCI_CLASS_DISPLAY_VGA and
  PCI_CLASS_NOT_DEFINED_VGA (which catches ancient devices built before
  Class Codes were defined) (Sui Jingfeng)

- Use the new pci_is_vga() to identify devices for the VGA arbiter, the
  sysfs "boot_vga" attribute, and the virtio and qxl drivers (SUi Jingfeng)

* pci/vga:
  drm/qxl: Use pci_is_vga() to identify VGA devices
  drm/virtio: Use pci_is_vga() to identify VGA devices
  PCI/sysfs: Enable 'boot_vga' attribute via pci_is_vga()
  PCI/VGA: Select VGA devices earlier
  PCI/VGA: Use pci_is_vga() to identify VGA devices
  PCI: Add pci_is_vga() helper
2023-10-28 13:31:00 -05:00
Ilpo Järvinen
d1f9b39da4 PCI: Use FIELD_GET() to extract Link Width
Use FIELD_GET() to extract PCIe Negotiated and Maximum Link Width fields
instead of custom masking and shifting.

Link: https://lore.kernel.org/r/20230919125648.1920-7-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: drop duplicate include of <linux/bitfield.h>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2023-10-17 20:10:31 -05:00
Sui Jingfeng
cdd3cecb52 PCI/sysfs: Enable 'boot_vga' attribute via pci_is_vga()
Enable the 'boot_vga' sysfs attribute via pci_is_vga().

This exposes 'boot_vga' for old PCI_CLASS_NOT_DEFINED_VGA (0x0001) devices
as well as for the PCI_CLASS_DISPLAY_VGA (0x0300) devices where it was
previously exposed.

Link: https://lore.kernel.org/r/20230830111532.444535-4-sui.jingfeng@linux.dev
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>
2023-10-06 17:19:01 -05:00
Valentine Sinitsyn
24de09c16f PCI: Implement custom llseek for sysfs resource entries
Since commit 636b21b501 ("PCI: Revoke mappings like devmem"), mmappable
sysfs entries have started to receive their f_mapping from the iomem
pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs
(and procfs) as well as in /dev/[k]mem.

This resulted in a userspace-visible regression:

1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0)
2. Use lseek(fd, 0, SEEK_END) to determine its size

Expected result: a PCI region size is returned.
Actual result: 0 is returned.

The reason is that PCI resource files residing in sysfs use
generic_file_llseek(), which relies on f_mapping->host inode to get the
file size. As f_mapping is now redefined, f_mapping->host points to an
anonymous zero-sized iomem_inode which has nothing to do with sysfs file
in question.

Implement a custom llseek method for sysfs PCI resources, which is
almost the same as proc_bus_pci_lseek() used for procfs entries.

This makes sysfs and procfs entries consistent with regards to seeking,
but also introduces userspace-visible changes to seeking PCI resources
in sysfs:

- SEEK_DATA and SEEK_HOLE are no longer supported;
- Seeking past the end of the file is prohibited while previously
  offsets up to MAX_NON_LFS were accepted (reading from these offsets
  was always invalid).

Signed-off-by: Valentine Sinitsyn <valesini@yandex-team.ru>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20230925084013.309399-2-valesini@yandex-team.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-05 13:42:15 +02:00
Lukas Wunner
70b70a4307 PCI/sysfs: Protect driver's D3cold preference from user space
struct pci_dev contains two flags which govern whether the device may
suspend to D3cold:

* no_d3cold provides an opt-out for drivers (e.g. if a device is known
  to not wake from D3cold)

* d3cold_allowed provides an opt-out for user space (default is true,
  user space may set to false)

Since commit 9d26d3a8f1 ("PCI: Put PCIe ports into D3 during suspend"),
the user space setting overwrites the driver setting.  Essentially user
space is trusted to know better than the driver whether D3cold is
working.

That feels unsafe and wrong.  Assume that the change was introduced
inadvertently and do not overwrite no_d3cold when d3cold_allowed is
modified.  Instead, consider d3cold_allowed in addition to no_d3cold
when choosing a suspend state for the device.

That way, user space may opt out of D3cold if the driver hasn't, but it
may no longer force an opt in if the driver has opted out.

Fixes: 9d26d3a8f1 ("PCI: Put PCIe ports into D3 during suspend")
Link: https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org	# v4.8+
2023-09-29 17:47:50 -05:00
Niklas Schnelle
5da1b58868 PCI/sysfs: Make I/O resource depend on HAS_IOPORT
If legacy I/O spaces are not supported simply return an error when
trying to access them via pci_resource_io(). This allows inb() and
friends to become undefined when they are known at compile time to be
non-functional in a later patch.

Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Link: https://lore.kernel.org/r/20230703135255.2202721-3-schnelle@linux.ibm.com
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-07-18 16:56:58 -05:00
Greg Kroah-Hartman
75cff725d9 driver core: bus: mark the struct bus_type for sysfs callbacks as constant
struct bus_type should never be modified in a sysfs callback as there is
nothing in the structure to modify, and frankly, the structure is almost
never used in a sysfs callback, so mark it as constant to allow struct
bus_type to be moved to read-only memory.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Ben Widawsky <bwidawsk@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hu Haowen <src.res@email.cn>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Stuart Yoder <stuyoder@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Acked-by: Ilya Dryomov <idryomov@gmail.com> # rbd
Acked-by: Ira Weiny <ira.weiny@intel.com> # cxl
Reviewed-by: Alex Shi <alexs@kernel.org>
Acked-by: Iwona Winiarska <iwona.winiarska@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# pci
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi
Link: https://lore.kernel.org/r/20230313182918.1312597-23-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-23 13:20:40 +01:00
Linus Torvalds
c7020e1b34 pci-v6.2-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmOYpTIUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxuZhAAhGjE8voLZeOYwxbvfL69hGTAZ+Me
 x2hqRWVhh/IGWXTTaoSLwSjMMokcmAKN5S/wv8qdCG5sB8EN8FyTBIZDy8PuRRdl
 8UlqlBMSL+d4oSRDCnYLxFNcynLRNnmx2dfcdw9tJ4zjTLN8Y4o8PHFogR6pJ3MT
 sDC8S0myTQKXr4wAGzTZycKsiGManviYtByp6dCcKD3Oy5Q2uZ9OKO2DP2yQpn+F
 c3IJSV9oDz3KR8JVJ5Q1iz9cdMXbGwjkM3JLlHpxhedwjN4ErLumPutKcebtzO5C
 aTqabN7Nnzc4yJusAIfojFCWH7fgaYUyJ3pxcFyJ4tu4m9Last+2I5UB/kV2sYAD
 jWiCYx3sA/mRopNXOnrBGae+Lgy+sQnt8or0grySr0bK+b+ArAGis4uT4A0uASGO
 RUQdIQwz7zhHeQrwAladHWxnx4BEDNCatgfn38p4fklIYKydCY5nfZURMDvHezSR
 G6Nu08hoE9ZXlmkWTFw+5F23wPWKcCpzZj0hf7OroIouXUp8vqSFSqatH5vGkbCl
 bDswck9GdRJ2hl5SvFOeelaXkM42du45TMLU2JmIn6dYYFNrO93JgdvKSU7E2CpG
 AmDIpg1Idxo8fEPPGH1I7RVU5+ilzmmPQQY7poQW+va4/dEd/QVp1+ZZTDnMC1qk
 qi3ck22VdvPU2VU=
 =KULr
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Squash portdrv_{core,pci}.c into portdrv.c to ease maintenance and
     make more things static.

   - Make portdrv bind to Switch Ports that have AER. Previously, if
     these Ports lacked MSI/MSI-X, portdrv failed to bind, which meant
     the Ports couldn't be suspended to low-power states. AER on these
     Ports doesn't use interrupts, and the AER driver doesn't need to
     claim them.

   - Assign PCI domain IDs using ida_alloc(), which makes host bridge
     add/remove work better.

  Resource management:

   - To work better with recent BIOSes that use EfiMemoryMappedIO for
     PCI host bridge apertures, remove those regions from the E820 map
     (E820 entries normally prevent us from allocating BARs). In v5.19,
     we added some quirks to disable E820 checking, but that's not very
     maintainable. EfiMemoryMappedIO means the OS needs to map the
     region for use by EFI runtime services; it shouldn't prevent OS
     from using it.

  PCIe native device hotplug:

   - Build pciehp by default if USB4 is enabled, since Thunderbolt/USB4
     PCIe tunneling depends on native PCIe hotplug.

   - Enable Command Completed Interrupt only if supported to avoid user
     confusion from lspci output that says this is enabled but not
     supported.

   - Prevent pciehp from binding to Switch Upstream Ports; this happened
     because of interaction with acpiphp and caused devices below the
     Upstream Port to disappear.

  Power management:

   - Convert AGP drivers to generic power management. We hope to remove
     legacy power management from the PCI core eventually.

  Virtualization:

   - Fix pci_device_is_present(), which previously always returned
     "false" for VFs, causing virtio hangs when unbinding the driver.

  Miscellaneous:

   - Convert drivers to gpiod API to prepare for dropping some legacy
     code.

   - Fix DOE fencepost error for the maximum data object length.

  Baikal-T1 PCIe controller driver:

   - Add driver and DT bindings.

  Broadcom STB PCIe controller driver:

   - Enable Multi-MSI.

   - Delay 100ms after PERST# deassert to allow power and clocks to
     stabilize.

   - Configure Read Completion Boundary to 64 bytes.

  Freescale i.MX6 PCIe controller driver:

   - Initialize PHY before deasserting core reset to fix a regression in
     v6.0 on boards where the PHY provides the reference.

   - Fix imx6sx and imx8mq clock names in DT schema.

  Intel VMD host bridge driver:

   - Fix Secondary Bus Reset on VMD bridges, which allows reset of NVMe
     SSDs in VT-d pass-through scenarios.

   - Disable MSI remapping, which gets re-enabled by firmware during
     suspend/resume.

  MediaTek PCIe Gen3 controller driver:

   - Add MT7986 and MT8195 support.

  Qualcomm PCIe controller driver:

   - Add SC8280XP/SA8540P basic interconnect support.

  Rockchip DesignWare PCIe controller driver:

   - Base DT schema on common Synopsys schema.

  Synopsys DesignWare PCIe core:

   - Collect DT items shared between Root Port and Endpoint (PERST GPIO,
     PHY info, clocks, resets, link speed, number of lanes, number of
     iATU windows, interrupt info, etc) to snps,dw-pcie-common.yaml.

   - Add dma-ranges support for Root Ports and Endpoints.

   - Consolidate DT resource retrieval for "dbi", "dbi2", "atu", etc. to
     reduce code duplication.

   - Add generic names for clocks and resets to encourage more
     consistent naming across drivers using DesignWare IP.

   - Stop advertising PTM Responder role for Endpoints, which aren't
     allowed to be responders.

  TI J721E PCIe driver:

   - Add j721s2 host mode ID to DT schema.

   - Add interrupt properties to DT schema.

  Toshiba Visconti PCIe controller driver:

   - Fix interrupts array max constraints in DT schema"

* tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (95 commits)
  x86/PCI: Use pr_info() when possible
  x86/PCI: Fix log message typo
  x86/PCI: Tidy E820 removal messages
  PCI: Skip allocate_resource() if too little space available
  efi/x86: Remove EfiMemoryMappedIO from E820 map
  PCI/portdrv: Allow AER service only for Root Ports & RCECs
  PCI: xilinx-nwl: Fix coding style violations
  PCI: mvebu: Switch to using gpiod API
  PCI: pciehp: Enable Command Completed Interrupt only if supported
  PCI: aardvark: Switch to using devm_gpiod_get_optional()
  dt-bindings: PCI: mediatek-gen3: add support for mt7986
  dt-bindings: PCI: mediatek-gen3: add SoC based clock config
  dt-bindings: PCI: qcom: Allow 'dma-coherent' property
  PCI: mt7621: Add sentinel to quirks table
  PCI: vmd: Fix secondary bus reset for Intel bridges
  PCI: endpoint: pci-epf-vntb: Fix sparse ntb->reg build warning
  PCI: endpoint: pci-epf-vntb: Fix sparse build warning for epf_db
  PCI: endpoint: pci-epf-vntb: Replace hardcoded 4 with sizeof(u32)
  PCI: endpoint: pci-epf-vntb: Remove unused epf_db_phy struct member
  PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path
  ...
2022-12-14 09:54:10 -08:00
Ira Weiny
278294798a PCI: Allow drivers to request exclusive config regions
PCI config space access from user space has traditionally been
unrestricted with writes being an understood risk for device operation.

Unfortunately, device breakage or odd behavior from config writes lacks
indicators that can leave driver writers confused when evaluating
failures.  This is especially true with the new PCIe Data Object
Exchange (DOE) mailbox protocol where backdoor shenanigans from user
space through things such as vendor defined protocols may affect device
operation without complete breakage.

A prior proposal restricted read and writes completely.[1]  Greg and
Bjorn pointed out that proposal is flawed for a couple of reasons.
First, lspci should always be allowed and should not interfere with any
device operation.  Second, setpci is a valuable tool that is sometimes
necessary and it should not be completely restricted.[2]  Finally
methods exist for full lock of device access if required.

Even though access should not be restricted it would be nice for driver
writers to be able to flag critical parts of the config space such that
interference from user space can be detected.

Introduce pci_request_config_region_exclusive() to mark exclusive config
regions.  Such regions trigger a warning and kernel taint if accessed
via user space.

Create pci_warn_once() to restrict the user from spamming the log.

[1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
[2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20220926215711.2893286-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-14 10:07:22 -08:00
Sascha Hauer
aa382ffa70 PCI/sysfs: Fix double free in error path
When pci_create_attr() fails, pci_remove_resource_files() is called which
will iterate over the res_attr[_wc] arrays and frees every non NULL entry.
To avoid a double free here set the array entry only after it's clear we
successfully initialized it.

Fixes: b562ec8f74 ("PCI: Don't leak memory if sysfs_create_bin_file() fails")
Link: https://lore.kernel.org/r/20221007070735.GX986@pengutronix.de/
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
2022-11-09 16:57:29 -06:00
Alex Williamson
91fa127794 PCI: Expose PCIe Resizable BAR support via sysfs
Add a simple sysfs interface to Resizable BAR support, largely for the
purposes of assigning such devices to a VM through VFIO.  Resizable BARs
present a difficult feature to expose to a VM through emulation, as
resizing a BAR is done on the host.  It can fail, and often does, but we
have no means via emulation of a PCIe REBAR capability to handle the error
cases.

A vfio-pci specific ioctl interface is also cumbersome as there are often
multiple devices within the same bridge aperture and handling them is a
challenge.  In the interface proposed here, expanding a BAR potentially
requires such devices to be soft-removed during the resize operation and
rescanned after, in order for all the necessary resources to be released.
A pci-sysfs interface is also more universal than a vfio specific
interface.

Please see the ABI documentation update for usage.

Link: https://lore.kernel.org/r/166336088796.3597940.14973499936692558556.stgit@omen
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Krzysztof Wilczyński <kw@linux.com>
2022-10-05 12:21:02 -05:00
Krzysztof Kozlowski
23d99baf9d PCI: Use driver_set_override() instead of open-coding
Use a helper to set driver_override to the reduce amount of duplicated
code.  Make the driver_override field const char, because it is not
modified by the core and it matches other subsystems.

Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-6-krzysztof.kozlowski@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-22 17:13:54 +02:00
Bjorn Helgaas
c50762a85d PCI: Remove unused assignments
Remove variables and assignments that are never used.

Found by Krzysztof using cppcheck, e.g.,

  $ cppcheck --enable=all --force
  uselessAssignmentPtrArg drivers/pci/proc.c:102 Assignment of function parameter has no effect outside the function. Did you forget dereferencing it?
  unreadVariable drivers/pci/setup-bus.c:1528 Variable 'old_flags' is assigned a value that is never used.

Reported-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20220313192933.434746-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-03-22 11:23:53 -05:00
Thomas Gleixner
793c500676 PCI/sysfs: Use pci_irq_vector()
instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211206210224.265589103@linutronix.de
2021-12-09 11:52:21 +01:00
Linus Torvalds
0c5c62ddf8 pci-v5.16-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmGFXBkUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vx6Tg/7BsGWm8f+uw/mr9lLm47q2mc4XyoO
 7bR9KDp5NM84W/8ZOU7dqqqsnY0ddrSOLBRyhJJYMW3SwJd1y1ajTBsL1Ujqv+eN
 z+JUFmhq4Laqm4k6Spc9CEJE+Ol5P6gGUtxLYo6PM2R0VxnSs/rDxctT5i7YOpCi
 COJ+NVT/mc/by2loz1kLTSR9GgtBBgd+Y8UA33GFbHKssROw02L0OI3wffp81Oba
 EhMGPoD+0FndAniDw+vaOSoO+YaBuTfbM92T/O00mND69Fj1PWgmNWZz7gAVgsXb
 3RrNENUFxgw6CDt7LZWB8OyT04iXe0R2kJs+PA9gigFCGbypwbd/Nbz5M7e9HUTR
 ray+1EpZib6+nIksQBL2mX8nmtyHMcLiM57TOEhq0+ECDO640MiRm8t0FIG/1E8v
 3ZYd9w20o/NxlFNXHxxpZ3D/osGH5ocyF5c5m1rfB4RGRwztZGL172LWCB0Ezz9r
 eHB8sWxylxuhrH+hp2BzQjyddg7rbF+RA4AVfcQSxUpyV01hoRocKqknoDATVeLH
 664nJIINFxKJFwfuL3E6OhrInNe1LnAhCZsHHqbS+NNQFgvPRznbixBeLkI9dMf5
 Yf6vpsWO7ur8lHHbRndZubVu8nxklXTU7B/w+C11sq6k9LLRJSHzanr3Fn9WA80x
 sznCxwUvbTCu1r0=
 =nsMh
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:
   - Conserve IRQs by setting up portdrv IRQs only when there are users
     (Jan Kiszka)
   - Rework and simplify _OSC negotiation for control of PCIe features
     (Joerg Roedel)
   - Remove struct pci_dev.driver pointer since it's redundant with the
     struct device.driver pointer (Uwe Kleine-König)

  Resource management:
   - Coalesce contiguous host bridge apertures from _CRS to accommodate
     BARs that cover more than one aperture (Kai-Heng Feng)

  Sysfs:
   - Check CAP_SYS_ADMIN before parsing user input (Krzysztof
     Wilczyński)
   - Return -EINVAL consistently from "store" functions (Krzysztof
     Wilczyński)
   - Use sysfs_emit() in endpoint "show" functions to avoid buffer
     overruns (Kunihiko Hayashi)

  PCIe native device hotplug:
   - Ignore Link Down/Up caused by resets during error recovery so
     endpoint drivers can remain bound to the device (Lukas Wunner)

  Virtualization:
   - Avoid bus resets on Atheros QCA6174, where they hang the device
     (Ingmar Klein)
   - Work around Pericom PI7C9X2G switch packet drop erratum by using
     store and forward mode instead of cut-through (Nathan Rossi)
   - Avoid trying to enable AtomicOps on VFs; the PF setting applies to
     all VFs (Selvin Xavier)

  MSI:
   - Document that /sys/bus/pci/devices/.../irq contains the legacy INTx
     interrupt or the IRQ of the first MSI (not MSI-X) vector (Barry
     Song)

  VPD:
   - Add pci_read_vpd_any() and pci_write_vpd_any() to access anywhere
     in the possible VPD space; use these to simplify the cxgb3 driver
     (Heiner Kallweit)

  Peer-to-peer DMA:
   - Add (not subtract) the bus offset when calculating DMA address
     (Wang Lu)

  ASPM:
   - Re-enable LTR at Downstream Ports so they don't report Unsupported
     Requests when reset or hot-added devices send LTR messages
     (Mingchuang Qiao)

  Apple PCIe controller driver:
   - Add driver for Apple M1 PCIe controller (Alyssa Rosenzweig, Marc
     Zyngier)

  Cadence PCIe controller driver:
   - Return success when probe succeeds instead of falling into error
     path (Li Chen)

  HiSilicon Kirin PCIe controller driver:
   - Reorganize PHY logic and add support for external PHY drivers
     (Mauro Carvalho Chehab)
   - Support PERST# GPIOs for HiKey970 external PEX 8606 bridge (Mauro
     Carvalho Chehab)
   - Add Kirin 970 support (Mauro Carvalho Chehab)
   - Make driver removable (Mauro Carvalho Chehab)

  Intel VMD host bridge driver:
   - If IOMMU supports interrupt remapping, leave VMD MSI-X remapping
     enabled (Adrian Huang)
   - Number each controller so we can tell them apart in
     /proc/interrupts (Chunguang Xu)
   - Avoid building on UML because VMD depends on x86 bare metal APIs
     (Johannes Berg)

  Marvell Aardvark PCIe controller driver:
   - Define macros for PCI_EXP_DEVCTL_PAYLOAD_* (Pali Rohár)
   - Set Max Payload Size to 512 bytes per Marvell spec (Pali Rohár)
   - Downgrade PIO Response Status messages to debug level (Marek Behún)
   - Preserve CRS SV (Config Request Retry Software Visibility) bit in
     emulated Root Control register (Pali Rohár)
   - Fix issue in configuring reference clock (Pali Rohár)
   - Don't clear status bits for masked interrupts (Pali Rohár)
   - Don't mask unused interrupts (Pali Rohár)
   - Avoid code repetition in advk_pcie_rd_conf() (Marek Behún)
   - Retry config accesses on CRS response (Pali Rohár)
   - Simplify emulated Root Capabilities initialization (Pali Rohár)
   - Fix several link training issues (Pali Rohár)
   - Fix link-up checking via LTSSM (Pali Rohár)
   - Fix reporting of Data Link Layer Link Active (Pali Rohár)
   - Fix emulation of W1C bits (Marek Behún)
   - Fix MSI domain .alloc() method to return zero on success (Marek
     Behún)
   - Read entire 16-bit MSI vector in MSI handler, not just low 8 bits
     (Marek Behún)
   - Clear Root Port I/O Space, Memory Space, and Bus Master Enable bits
     at startup; PCI core will set those as necessary (Pali Rohár)
   - When operating as a Root Port, set class code to "PCI Bridge"
     instead of the default "Mass Storage Controller" (Pali Rohár)
   - Add emulation for PCI_BRIDGE_CTL_BUS_RESET since aardvark doesn't
     implement this per spec (Pali Rohár)
   - Add emulation of option ROM BAR since aardvark doesn't implement
     this per spec (Pali Rohár)

  MediaTek MT7621 PCIe controller driver:
   - Add MediaTek MT7621 PCIe host controller driver and DT binding
     (Sergio Paracuellos)

  Qualcomm PCIe controller driver:
   - Add SC8180x compatible string (Bjorn Andersson)
   - Add endpoint controller driver and DT binding (Manivannan
     Sadhasivam)
   - Restructure to use of_device_get_match_data() (Prasad Malisetty)
   - Add SC7280-specific pcie_1_pipe_clk_src handling (Prasad Malisetty)

  Renesas R-Car PCIe controller driver:
   - Remove unnecessary includes (Geert Uytterhoeven)

  Rockchip DesignWare PCIe controller driver:
   - Add DT binding (Simon Xue)

  Socionext UniPhier Pro5 controller driver:
   - Serialize INTx masking/unmasking (Kunihiko Hayashi)

  Synopsys DesignWare PCIe controller driver:
   - Run dwc .host_init() method before registering MSI interrupt
     handler so we can deal with pending interrupts left by bootloader
     (Bjorn Andersson)
   - Clean up Kconfig dependencies (Andy Shevchenko)
   - Export symbols to allow more modular drivers (Luca Ceresoli)

  TI DRA7xx PCIe controller driver:
   - Allow host and endpoint drivers to be modules (Luca Ceresoli)
   - Enable external clock if present (Luca Ceresoli)

  TI J721E PCIe driver:
   - Disable PHY when probe fails after initializing it (Christophe
     JAILLET)

  MicroSemi Switchtec management driver:
   - Return error to application when command execution fails because an
     out-of-band reset has cleared the device BARs, Memory Space Enable,
     etc (Kelvin Cao)
   - Fix MRPC error status handling issue (Kelvin Cao)
   - Mask out other bits when reading of management VEP instance ID
     (Kelvin Cao)
   - Return EOPNOTSUPP instead of ENOTSUPP from sysfs show functions
     (Kelvin Cao)
   - Add check of event support (Logan Gunthorpe)

  Miscellaneous:
   - Remove unused pci_pool wrappers, which have been replaced by
     dma_pool (Cai Huoqing)
   - Use 'unsigned int' instead of bare 'unsigned' (Krzysztof
     Wilczyński)
   - Use kstrtobool() directly, sans strtobool() wrapper (Krzysztof
     Wilczyński)
   - Fix some sscanf(), sprintf() format mismatches (Krzysztof
     Wilczyński)
   - Update PCI subsystem information in MAINTAINERS (Krzysztof
     Wilczyński)
   - Correct some misspellings (Krzysztof Wilczyński)"

* tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (137 commits)
  PCI: Add ACS quirk for Pericom PI7C9X2G switches
  PCI: apple: Configure RID to SID mapper on device addition
  iommu/dart: Exclude MSI doorbell from PCIe device IOVA range
  PCI: apple: Implement MSI support
  PCI: apple: Add INTx and per-port interrupt support
  PCI: kirin: Allow removing the driver
  PCI: kirin: De-init the dwc driver
  PCI: kirin: Disable clkreq during poweroff sequence
  PCI: kirin: Move the power-off code to a common routine
  PCI: kirin: Add power_off support for Kirin 960 PHY
  PCI: kirin: Allow building it as a module
  PCI: kirin: Add MODULE_* macros
  PCI: kirin: Add Kirin 970 compatible
  PCI: kirin: Support PERST# GPIOs for HiKey970 external PEX 8606 bridge
  PCI: apple: Set up reference clocks when probing
  PCI: apple: Add initial hardware bring-up
  PCI: of: Allow matching of an interrupt-map local to a PCI device
  of/irq: Allow matching of an interrupt-map local to an interrupt controller
  irqdomain: Make of_phandle_args_to_fwspec() generally available
  PCI: Do not enable AtomicOps on VFs
  ...
2021-11-06 14:36:12 -07:00
Bjorn Helgaas
ebf275b856 Merge branch 'pci/sysfs'
- Check for CAP_SYS_ADMIN before validating sysfs user input, not after
  (Krzysztof Wilczyński)

- Always return -EINVAL from sysfs "store" functions for invalid user input
  instead of -EINVAL sometimes and -ERANGE others (Krzysztof Wilczyński)

- Use kstrtobool() directly instead of the strtobool() wrapper (Krzysztof
  Wilczyński)

* pci/sysfs:
  PCI: Use kstrtobool() directly, sans strtobool() wrapper
  PCI/sysfs: Return -EINVAL consistently from "store" functions
  PCI/sysfs: Check CAP_SYS_ADMIN before parsing user input

# Conflicts:
#	drivers/pci/iov.c
2021-11-05 11:28:46 -05:00
Barry Song
ac8e3cef58 PCI/sysfs: Explicitly show first MSI IRQ for 'irq'
The sysfs "irq" file contains the legacy INTx IRQ.  Or, if the device has
MSI enabled, it contains the first MSI IRQ instead.

Previously this file showed the pci_dev.irq value directly.  But we'd
prefer to use pci_dev.irq only for the INTx IRQ and decouple that from any
MSI or MSI-X IRQs.

If the device has MSI enabled, explicitly look up and show the first MSI
IRQ in the sysfs "irq" file.  Otherwise, show the INTx IRQ.

This removes the requirement that msi_capability_init() set pci_dev.irq to
the first MSI IRQ when enabling MSI and pci_msi_shutdown() restore the INTx
IRQ when disabling MSI.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20210825102636.52757-3-21cnbao@gmail.com
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-10-18 16:43:04 -05:00