PCI: Provide sensible IRQ vector alloc/free routines

Add a function to allocate and free a range of interrupt vectors, using
MSI-X, MSI or legacy vectors (in that order) based on the capabilities of
the underlying device and PCIe complex.

Additionally a new helper is provided to get the Linux IRQ number for given
device-relative vector so that the drivers don't need to allocate their own
arrays to keep track of the vectors for the multi vector MSI-X case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alexander Gordeev <agordeev@redhat.com>
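
The fallback order described in the commit message (MSI-X first, then MSI, then a single legacy vector) can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the kernel code: `struct mock_dev`, `mock_enable_range()` and `mock_alloc_irq_vectors()` are hypothetical stand-ins, and the error codes returned by the mocked range helper are assumptions. The three flag values are the ones defined in the header hunk of this diff.

```c
#include <errno.h>

/* Flag values as defined in the header hunk of this commit. */
#define PCI_IRQ_NOLEGACY	(1 << 0) /* don't use legacy interrupts */
#define PCI_IRQ_NOMSI		(1 << 1) /* don't use MSI interrupts */
#define PCI_IRQ_NOMSIX		(1 << 2) /* don't use MSI-X interrupts */

/*
 * Hypothetical stand-in for struct pci_dev: the number of vectors each
 * mechanism could provide on this device (0 means "not supported").
 */
struct mock_dev {
	int msix_avail;
	int msi_avail;
};

/*
 * Hypothetical stand-in for pci_enable_msix_range()/pci_enable_msi_range().
 * Assumption: returns the number of vectors granted (capped at max_vecs),
 * -ENOSPC if fewer than min_vecs are available, -EINVAL if unsupported.
 */
static int mock_enable_range(int avail, unsigned int min_vecs,
			     unsigned int max_vecs)
{
	if (avail <= 0)
		return -EINVAL;
	if ((unsigned int)avail < min_vecs)
		return -ENOSPC;
	return (unsigned int)avail < max_vecs ? avail : (int)max_vecs;
}

/* Same control flow as pci_alloc_irq_vectors() in the diff, on mock data. */
int mock_alloc_irq_vectors(const struct mock_dev *dev, unsigned int min_vecs,
			   unsigned int max_vecs, unsigned int flags)
{
	int vecs = -ENOSPC;

	if (!(flags & PCI_IRQ_NOMSIX)) {
		vecs = mock_enable_range(dev->msix_avail, min_vecs, max_vecs);
		if (vecs > 0)
			return vecs;
	}

	if (!(flags & PCI_IRQ_NOMSI)) {
		vecs = mock_enable_range(dev->msi_avail, min_vecs, max_vecs);
		if (vecs > 0)
			return vecs;
	}

	/* A single legacy vector is only acceptable if min_vecs is 1. */
	if (!(flags & PCI_IRQ_NOLEGACY) && min_vecs == 1)
		return 1;
	return vecs;
}
```

In this model a device with neither capability still gets one vector unless the caller passes PCI_IRQ_NOLEGACY or asks for a minimum greater than one, which matches the behaviour the commit message describes.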
parent 3ac020e0ca
commit aff171641d

3 changed files with 192 additions and 391 deletions

|  | @@ -78,422 +78,107 @@ CONFIG_PCI_MSI option. | ||||||
| 
 | 
 | ||||||
| 4.2 Using MSI | 4.2 Using MSI | ||||||
| 
 | 
 | ||||||
| Most of the hard work is done for the driver in the PCI layer.  It simply | Most of the hard work is done for the driver in the PCI layer.  The driver | ||||||
| has to request that the PCI layer set up the MSI capability for this | simply has to request that the PCI layer set up the MSI capability for this | ||||||
| device. | device. | ||||||
| 
 | 
 | ||||||
| 4.2.1 pci_enable_msi | To automatically use MSI or MSI-X interrupt vectors, use the following | ||||||
|  | function: | ||||||
| 
 | 
 | ||||||
| int pci_enable_msi(struct pci_dev *dev) |   int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, | ||||||
|  | 		unsigned int max_vecs, unsigned int flags); | ||||||
| 
 | 
 | ||||||
| A successful call allocates ONE interrupt to the device, regardless | which allocates up to max_vecs interrupt vectors for a PCI device.  It | ||||||
| of how many MSIs the device supports.  The device is switched from | returns the number of vectors allocated or a negative error.  If the device | ||||||
| pin-based interrupt mode to MSI mode.  The dev->irq number is changed | has a requirement for a minimum number of vectors the driver can pass a | ||||||
| to a new number which represents the message signaled interrupt; | min_vecs argument set to this limit, and the PCI core will return -ENOSPC | ||||||
| consequently, this function should be called before the driver calls | if it can't meet the minimum number of vectors. | ||||||
| request_irq(), because an MSI is delivered via a vector that is |  | ||||||
| different from the vector of a pin-based interrupt. |  | ||||||
| 
 | 
 | ||||||
| 4.2.2 pci_enable_msi_range | The flags argument should normally be set to 0, but can be used to pass the | ||||||
|  | PCI_IRQ_NOMSI and PCI_IRQ_NOMSIX flag in case a device claims to support | ||||||
|  | MSI or MSI-X, but the support is broken, or to pass PCI_IRQ_NOLEGACY in | ||||||
|  | case the device does not support legacy interrupt lines. | ||||||
| 
 | 
 | ||||||
| int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec) | To get the Linux IRQ numbers passed to request_irq() and free_irq() and the | ||||||
|  | vectors, use the following function: | ||||||
| 
 | 
 | ||||||
| This function allows a device driver to request any number of MSI |   int pci_irq_vector(struct pci_dev *dev, unsigned int nr); | ||||||
| interrupts within specified range from 'minvec' to 'maxvec'. |  | ||||||
| 
 | 
 | ||||||
| If this function returns a positive number it indicates the number of | Any allocated resources should be freed before removing the device using | ||||||
| MSI interrupts that have been successfully allocated.  In this case | the following function: | ||||||
| the device is switched from pin-based interrupt mode to MSI mode and |  | ||||||
| updates dev->irq to be the lowest of the new interrupts assigned to it. |  | ||||||
| The other interrupts assigned to the device are in the range dev->irq |  | ||||||
| to dev->irq + returned value - 1.  Device driver can use the returned |  | ||||||
| number of successfully allocated MSI interrupts to further allocate |  | ||||||
| and initialize device resources. |  | ||||||
| 
 | 
 | ||||||
| If this function returns a negative number, it indicates an error and |   void pci_free_irq_vectors(struct pci_dev *dev); | ||||||
| the driver should not attempt to request any more MSI interrupts for |  | ||||||
| this device. |  | ||||||
| 
 | 
 | ||||||
| This function should be called before the driver calls request_irq(), | If a device supports both MSI-X and MSI capabilities, this API will use the | ||||||
| because MSI interrupts are delivered via vectors that are different | MSI-X facilities in preference to the MSI facilities.  MSI-X supports any | ||||||
| from the vector of a pin-based interrupt. | number of interrupts between 1 and 2048.  In contrast, MSI is restricted to | ||||||
|  | a maximum of 32 interrupts (and must be a power of two).  In addition, the | ||||||
|  | MSI interrupt vectors must be allocated consecutively, so the system might | ||||||
|  | not be able to allocate as many vectors for MSI as it could for MSI-X.  On | ||||||
|  | some platforms, MSI interrupts must all be targeted at the same set of CPUs | ||||||
|  | whereas MSI-X interrupts can all be targeted at different CPUs. | ||||||
| 
 | 
 | ||||||
| It is ideal if drivers can cope with a variable number of MSI interrupts; | If a device supports neither MSI-X nor MSI it will fall back to a single | ||||||
| there are many reasons why the platform may not be able to provide the | legacy IRQ vector. | ||||||
| exact number that a driver asks for. |  | ||||||
| 
 | 
 | ||||||
| There could be devices that can not operate with just any number of MSI | The typical usage of MSI or MSI-X interrupts is to allocate as many vectors | ||||||
| interrupts within a range.  See chapter 4.3.1.3 to get the idea how to | as possible, likely up to the limit supported by the device.  If nvec is | ||||||
| handle such devices for MSI-X - the same logic applies to MSI. | larger than the number supported by the device it will automatically be | ||||||
|  | capped to the supported limit, so there is no need to query the number of | ||||||
|  | vectors supported beforehand: | ||||||
| 
 | 
 | ||||||
| 4.2.1.1 Maximum possible number of MSI interrupts | 	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, 0); | ||||||
| 
 | 	if (nvec < 0) | ||||||
| The typical usage of MSI interrupts is to allocate as many vectors as | 		goto out_err; | ||||||
| possible, likely up to the limit returned by pci_msi_vec_count() function: |  | ||||||
| 
 |  | ||||||
| static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) |  | ||||||
| { |  | ||||||
| 	return pci_enable_msi_range(pdev, 1, nvec); |  | ||||||
| } |  | ||||||
| 
 |  | ||||||
| Note the value of 'minvec' parameter is 1.  As 'minvec' is inclusive, |  | ||||||
| the value of 0 would be meaningless and could result in error. |  | ||||||
| 
 |  | ||||||
| Some devices have a minimal limit on number of MSI interrupts. |  | ||||||
| In this case the function could look like this: |  | ||||||
| 
 |  | ||||||
| static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) |  | ||||||
| { |  | ||||||
| 	return pci_enable_msi_range(pdev, FOO_DRIVER_MINIMUM_NVEC, nvec); |  | ||||||
| } |  | ||||||
| 
 |  | ||||||
| 4.2.1.2 Exact number of MSI interrupts |  | ||||||
| 
 | 
 | ||||||
| If a driver is unable or unwilling to deal with a variable number of MSI | If a driver is unable or unwilling to deal with a variable number of MSI | ||||||
| interrupts it could request a particular number of interrupts by passing | interrupts it can request a particular number of interrupts by passing that | ||||||
| that number to pci_enable_msi_range() function as both 'minvec' and 'maxvec' | number to pci_alloc_irq_vectors() function as both 'min_vecs' and | ||||||
| parameters: | 'max_vecs' parameters: | ||||||
| 
 | 
 | ||||||
| static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) | 	ret = pci_alloc_irq_vectors(pdev, nvec, nvec, 0); | ||||||
| { | 	if (ret < 0) | ||||||
| 	return pci_enable_msi_range(pdev, nvec, nvec); | 		goto out_err; | ||||||
| } |  | ||||||
| 
 | 
 | ||||||
| Note, unlike pci_enable_msi_exact() function, which could be also used to | The most notorious example of the request type described above is enabling | ||||||
| enable a particular number of MSI-X interrupts, pci_enable_msi_range() | the single MSI mode for a device.  It could be done by passing two 1s as | ||||||
| returns either a negative errno or 'nvec' (not negative errno or 0 - as | 'min_vecs' and 'max_vecs': | ||||||
| pci_enable_msi_exact() does). |  | ||||||
| 
 | 
 | ||||||
| 4.2.1.3 Single MSI mode | 	ret = pci_alloc_irq_vectors(pdev, 1, 1, 0); | ||||||
|  | 	if (ret < 0) | ||||||
|  | 		goto out_err; | ||||||
| 
 | 
 | ||||||
| The most notorious example of the request type described above is | Some devices might not support using legacy line interrupts, in which case | ||||||
| enabling the single MSI mode for a device.  It could be done by passing | the PCI_IRQ_NOLEGACY flag can be used to fail the request if the platform | ||||||
| two 1s as 'minvec' and 'maxvec': | can't provide MSI or MSI-X interrupts: | ||||||
| 
 | 
 | ||||||
| static int foo_driver_enable_single_msi(struct pci_dev *pdev) | 	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_NOLEGACY); | ||||||
| { | 	if (nvec < 0) | ||||||
| 	return pci_enable_msi_range(pdev, 1, 1); | 		goto out_err; | ||||||
| } |  | ||||||
| 
 | 
 | ||||||
| Note, unlike pci_enable_msi() function, which could be also used to | 4.3 Legacy APIs | ||||||
| enable the single MSI mode, pci_enable_msi_range() returns either a |  | ||||||
| negative errno or 1 (not negative errno or 0 - as pci_enable_msi() |  | ||||||
| does). |  | ||||||
| 
 | 
 | ||||||
| 4.2.3 pci_enable_msi_exact | The following old APIs to enable and disable MSI or MSI-X interrupts should | ||||||
|  | not be used in new code: | ||||||
| 
 | 
 | ||||||
| int pci_enable_msi_exact(struct pci_dev *dev, int nvec) |   pci_enable_msi()		/* deprecated */ | ||||||
|  |   pci_enable_msi_range()	/* deprecated */ | ||||||
|  |   pci_enable_msi_exact()	/* deprecated */ | ||||||
|  |   pci_disable_msi()		/* deprecated */ | ||||||
|  |   pci_enable_msix_range()	/* deprecated */ | ||||||
|  |   pci_enable_msix_exact()	/* deprecated */ | ||||||
|  |   pci_disable_msix()		/* deprecated */ | ||||||
| 
 | 
 | ||||||
| This variation on pci_enable_msi_range() call allows a device driver to | Additionally there are APIs to provide the number of supported MSI or MSI-X | ||||||
| request exactly 'nvec' MSIs. | vectors: pci_msi_vec_count() and pci_msix_vec_count().  In general these | ||||||
|  | should be avoided in favor of letting pci_alloc_irq_vectors() cap the | ||||||
|  | number of vectors.  If you have a legitimate special use case for the count | ||||||
|  | of vectors we might have to revisit that decision and add a | ||||||
|  | pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently. | ||||||
| 
 | 
 | ||||||
| If this function returns a negative number, it indicates an error and | 4.4 Considerations when using MSIs | ||||||
| the driver should not attempt to request any more MSI interrupts for |  | ||||||
| this device. |  | ||||||
| 
 | 
 | ||||||
| By contrast with pci_enable_msi_range() function, pci_enable_msi_exact() | 4.4.1 Spinlocks | ||||||
| returns zero in case of success, which indicates MSI interrupts have been |  | ||||||
| successfully allocated. |  | ||||||
| 
 |  | ||||||
| 4.2.4 pci_disable_msi |  | ||||||
| 
 |  | ||||||
| void pci_disable_msi(struct pci_dev *dev) |  | ||||||
| 
 |  | ||||||
| This function should be used to undo the effect of pci_enable_msi_range(). |  | ||||||
| Calling it restores dev->irq to the pin-based interrupt number and frees |  | ||||||
| the previously allocated MSIs.  The interrupts may subsequently be assigned |  | ||||||
| to another device, so drivers should not cache the value of dev->irq. |  | ||||||
| 
 |  | ||||||
| Before calling this function, a device driver must always call free_irq() |  | ||||||
| on any interrupt for which it previously called request_irq(). |  | ||||||
| Failure to do so results in a BUG_ON(), leaving the device with |  | ||||||
| MSI enabled and thus leaking its vector. |  | ||||||
| 
 |  | ||||||
| 4.2.4 pci_msi_vec_count |  | ||||||
| 
 |  | ||||||
| int pci_msi_vec_count(struct pci_dev *dev) |  | ||||||
| 
 |  | ||||||
| This function could be used to retrieve the number of MSI vectors the |  | ||||||
| device requested (via the Multiple Message Capable register). The MSI |  | ||||||
| specification only allows the returned value to be a power of two, |  | ||||||
| up to a maximum of 2^5 (32). |  | ||||||
| 
 |  | ||||||
| If this function returns a negative number, it indicates the device is |  | ||||||
| not capable of sending MSIs. |  | ||||||
| 
 |  | ||||||
| If this function returns a positive number, it indicates the maximum |  | ||||||
| number of MSI interrupt vectors that could be allocated. |  | ||||||
| 
 |  | ||||||
| 4.3 Using MSI-X |  | ||||||
| 
 |  | ||||||
| The MSI-X capability is much more flexible than the MSI capability. |  | ||||||
| It supports up to 2048 interrupts, each of which can be controlled |  | ||||||
| independently.  To support this flexibility, drivers must use an array of |  | ||||||
| `struct msix_entry': |  | ||||||
| 
 |  | ||||||
| struct msix_entry { |  | ||||||
| 	u16 	vector; /* kernel uses to write alloc vector */ |  | ||||||
| 	u16	entry; /* driver uses to specify entry */ |  | ||||||
| }; |  | ||||||
| 
 |  | ||||||
| This allows for the device to use these interrupts in a sparse fashion; |  | ||||||
| for example, it could use interrupts 3 and 1027 and yet allocate only a |  | ||||||
| two-element array.  The driver is expected to fill in the 'entry' value |  | ||||||
| in each element of the array to indicate for which entries the kernel |  | ||||||
| should assign interrupts; it is invalid to fill in two entries with the |  | ||||||
| same number. |  | ||||||
| 
 |  | ||||||
| 4.3.1 pci_enable_msix_range |  | ||||||
| 
 |  | ||||||
| int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, |  | ||||||
| 			  int minvec, int maxvec) |  | ||||||
| 
 |  | ||||||
| Calling this function asks the PCI subsystem to allocate any number of |  | ||||||
| MSI-X interrupts within specified range from 'minvec' to 'maxvec'. |  | ||||||
| The 'entries' argument is a pointer to an array of msix_entry structs |  | ||||||
| which should be at least 'maxvec' entries in size. |  | ||||||
| 
 |  | ||||||
| On success, the device is switched into MSI-X mode and the function |  | ||||||
| returns the number of MSI-X interrupts that have been successfully |  | ||||||
| allocated.  In this case the 'vector' member in entries numbered from |  | ||||||
| 0 to the returned value - 1 is populated with the interrupt number; |  | ||||||
| the driver should then call request_irq() for each 'vector' that it |  | ||||||
| decides to use.  The device driver is responsible for keeping track of the |  | ||||||
| interrupts assigned to the MSI-X vectors so it can free them again later. |  | ||||||
| Device driver can use the returned number of successfully allocated MSI-X |  | ||||||
| interrupts to further allocate and initialize device resources. |  | ||||||
| 
 |  | ||||||
| If this function returns a negative number, it indicates an error and |  | ||||||
| the driver should not attempt to allocate any more MSI-X interrupts for |  | ||||||
| this device. |  | ||||||
| 
 |  | ||||||
| This function, in contrast with pci_enable_msi_range(), does not adjust |  | ||||||
| dev->irq.  The device will not generate interrupts for this interrupt |  | ||||||
| number once MSI-X is enabled. |  | ||||||
| 
 |  | ||||||
| Device drivers should normally call this function once per device |  | ||||||
| during the initialization phase. |  | ||||||
| 
 |  | ||||||
| It is ideal if drivers can cope with a variable number of MSI-X interrupts; |  | ||||||
| there are many reasons why the platform may not be able to provide the |  | ||||||
| exact number that a driver asks for. |  | ||||||
| 
 |  | ||||||
| There could be devices that can not operate with just any number of MSI-X |  | ||||||
| interrupts within a range.  E.g., an network adapter might need let's say |  | ||||||
| four vectors per each queue it provides.  Therefore, a number of MSI-X |  | ||||||
| interrupts allocated should be a multiple of four.  In this case interface |  | ||||||
| pci_enable_msix_range() can not be used alone to request MSI-X interrupts |  | ||||||
| (since it can allocate any number within the range, without any notion of |  | ||||||
| the multiple of four) and the device driver should master a custom logic |  | ||||||
| to request the required number of MSI-X interrupts. |  | ||||||
| 
 |  | ||||||
| 4.3.1.1 Maximum possible number of MSI-X interrupts |  | ||||||
| 
 |  | ||||||
| The typical usage of MSI-X interrupts is to allocate as many vectors as |  | ||||||
| possible, likely up to the limit returned by pci_msix_vec_count() function: |  | ||||||
| 
 |  | ||||||
| static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) |  | ||||||
| { |  | ||||||
| 	return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, |  | ||||||
| 				     1, nvec); |  | ||||||
| } |  | ||||||
| 
 |  | ||||||
| Note the value of 'minvec' parameter is 1.  As 'minvec' is inclusive, |  | ||||||
| the value of 0 would be meaningless and could result in error. |  | ||||||
| 
 |  | ||||||
| Some devices have a minimal limit on number of MSI-X interrupts. |  | ||||||
| In this case the function could look like this: |  | ||||||
| 
 |  | ||||||
| static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) |  | ||||||
| { |  | ||||||
| 	return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, |  | ||||||
| 				     FOO_DRIVER_MINIMUM_NVEC, nvec); |  | ||||||
| } |  | ||||||
| 
 |  | ||||||
| 4.3.1.2 Exact number of MSI-X interrupts |  | ||||||
| 
 |  | ||||||
| If a driver is unable or unwilling to deal with a variable number of MSI-X |  | ||||||
| interrupts it could request a particular number of interrupts by passing |  | ||||||
| that number to pci_enable_msix_range() function as both 'minvec' and 'maxvec' |  | ||||||
| parameters: |  | ||||||
| 
 |  | ||||||
| static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) |  | ||||||
| { |  | ||||||
| 	return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, |  | ||||||
| 				     nvec, nvec); |  | ||||||
| } |  | ||||||
| 
 |  | ||||||
| Note, unlike pci_enable_msix_exact() function, which could be also used to |  | ||||||
| enable a particular number of MSI-X interrupts, pci_enable_msix_range() |  | ||||||
| returns either a negative errno or 'nvec' (not negative errno or 0 - as |  | ||||||
| pci_enable_msix_exact() does). |  | ||||||
| 
 |  | ||||||
| 4.3.1.3 Specific requirements to the number of MSI-X interrupts |  | ||||||
| 
 |  | ||||||
| As noted above, there could be devices that can not operate with just any |  | ||||||
| number of MSI-X interrupts within a range.  E.g., let's assume a device that |  | ||||||
| is only capable sending the number of MSI-X interrupts which is a power of |  | ||||||
| two.  A routine that enables MSI-X mode for such device might look like this: |  | ||||||
| 
 |  | ||||||
| /* |  | ||||||
|  * Assume 'minvec' and 'maxvec' are non-zero |  | ||||||
|  */ |  | ||||||
| static int foo_driver_enable_msix(struct foo_adapter *adapter, |  | ||||||
| 				  int minvec, int maxvec) |  | ||||||
| { |  | ||||||
| 	int rc; |  | ||||||
| 
 |  | ||||||
| 	minvec = roundup_pow_of_two(minvec); |  | ||||||
| 	maxvec = rounddown_pow_of_two(maxvec); |  | ||||||
| 
 |  | ||||||
| 	if (minvec > maxvec) |  | ||||||
| 		return -ERANGE; |  | ||||||
| 
 |  | ||||||
| retry: |  | ||||||
| 	rc = pci_enable_msix_range(adapter->pdev, adapter->msix_entries, |  | ||||||
| 				   maxvec, maxvec); |  | ||||||
| 	/* |  | ||||||
| 	 * -ENOSPC is the only error code allowed to be analyzed |  | ||||||
| 	 */ |  | ||||||
| 	if (rc == -ENOSPC) { |  | ||||||
| 		if (maxvec == 1) |  | ||||||
| 			return -ENOSPC; |  | ||||||
| 
 |  | ||||||
| 		maxvec /= 2; |  | ||||||
| 
 |  | ||||||
| 		if (minvec > maxvec) |  | ||||||
| 			return -ENOSPC; |  | ||||||
| 
 |  | ||||||
| 		goto retry; |  | ||||||
| 	} |  | ||||||
| 
 |  | ||||||
| 	return rc; |  | ||||||
| } |  | ||||||
| 
 |  | ||||||
| Note how pci_enable_msix_range() return value is analyzed for a fallback - |  | ||||||
| any error code other than -ENOSPC indicates a fatal error and should not |  | ||||||
| be retried. |  | ||||||
| 
 |  | ||||||
| 4.3.2 pci_enable_msix_exact |  | ||||||
| 
 |  | ||||||
| int pci_enable_msix_exact(struct pci_dev *dev, |  | ||||||
| 			  struct msix_entry *entries, int nvec) |  | ||||||
| 
 |  | ||||||
| This variation on pci_enable_msix_range() call allows a device driver to |  | ||||||
| request exactly 'nvec' MSI-Xs. |  | ||||||
| 
 |  | ||||||
| If this function returns a negative number, it indicates an error and |  | ||||||
| the driver should not attempt to allocate any more MSI-X interrupts for |  | ||||||
| this device. |  | ||||||
| 
 |  | ||||||
| By contrast with pci_enable_msix_range() function, pci_enable_msix_exact() |  | ||||||
| returns zero in case of success, which indicates MSI-X interrupts have been |  | ||||||
| successfully allocated. |  | ||||||
| 
 |  | ||||||
| Another version of a routine that enables MSI-X mode for a device with |  | ||||||
| specific requirements described in chapter 4.3.1.3 might look like this: |  | ||||||
| 
 |  | ||||||
| /* |  | ||||||
|  * Assume 'minvec' and 'maxvec' are non-zero |  | ||||||
|  */ |  | ||||||
| static int foo_driver_enable_msix(struct foo_adapter *adapter, |  | ||||||
| 				  int minvec, int maxvec) |  | ||||||
| { |  | ||||||
| 	int rc; |  | ||||||
| 
 |  | ||||||
| 	minvec = roundup_pow_of_two(minvec); |  | ||||||
| 	maxvec = rounddown_pow_of_two(maxvec); |  | ||||||
| 
 |  | ||||||
| 	if (minvec > maxvec) |  | ||||||
| 		return -ERANGE; |  | ||||||
| 
 |  | ||||||
| retry: |  | ||||||
| 	rc = pci_enable_msix_exact(adapter->pdev, |  | ||||||
| 				   adapter->msix_entries, maxvec); |  | ||||||
| 
 |  | ||||||
| 	/* |  | ||||||
| 	 * -ENOSPC is the only error code allowed to be analyzed |  | ||||||
| 	 */ |  | ||||||
| 	if (rc == -ENOSPC) { |  | ||||||
| 		if (maxvec == 1) |  | ||||||
| 			return -ENOSPC; |  | ||||||
| 
 |  | ||||||
| 		maxvec /= 2; |  | ||||||
| 
 |  | ||||||
| 		if (minvec > maxvec) |  | ||||||
| 			return -ENOSPC; |  | ||||||
| 
 |  | ||||||
| 		goto retry; |  | ||||||
| 	} else if (rc < 0) { |  | ||||||
| 		return rc; |  | ||||||
| 	} |  | ||||||
| 
 |  | ||||||
| 	return maxvec; |  | ||||||
| } |  | ||||||
| 
 |  | ||||||
| 4.3.3 pci_disable_msix |  | ||||||
| 
 |  | ||||||
| void pci_disable_msix(struct pci_dev *dev) |  | ||||||
| 
 |  | ||||||
| This function should be used to undo the effect of pci_enable_msix_range(). |  | ||||||
| It frees the previously allocated MSI-X interrupts. The interrupts may |  | ||||||
| subsequently be assigned to another device, so drivers should not cache |  | ||||||
| the value of the 'vector' elements over a call to pci_disable_msix(). |  | ||||||
| 
 |  | ||||||
| Before calling this function, a device driver must always call free_irq() |  | ||||||
| on any interrupt for which it previously called request_irq(). |  | ||||||
| Failure to do so results in a BUG_ON(), leaving the device with |  | ||||||
| MSI-X enabled and thus leaking its vector. |  | ||||||
| 
 |  | ||||||
| 4.3.3 The MSI-X Table |  | ||||||
| 
 |  | ||||||
| The MSI-X capability specifies a BAR and offset within that BAR for the |  | ||||||
| MSI-X Table.  This address is mapped by the PCI subsystem, and should not |  | ||||||
| be accessed directly by the device driver.  If the driver wishes to |  | ||||||
| mask or unmask an interrupt, it should call disable_irq() / enable_irq(). |  | ||||||
| 
 |  | ||||||
| 4.3.4 pci_msix_vec_count |  | ||||||
| 
 |  | ||||||
| int pci_msix_vec_count(struct pci_dev *dev) |  | ||||||
| 
 |  | ||||||
| This function could be used to retrieve number of entries in the device |  | ||||||
| MSI-X table. |  | ||||||
| 
 |  | ||||||
| If this function returns a negative number, it indicates the device is |  | ||||||
| not capable of sending MSI-Xs. |  | ||||||
| 
 |  | ||||||
| If this function returns a positive number, it indicates the maximum |  | ||||||
| number of MSI-X interrupt vectors that could be allocated. |  | ||||||
| 
 |  | ||||||
| 4.4 Handling devices implementing both MSI and MSI-X capabilities |  | ||||||
| 
 |  | ||||||
| If a device implements both MSI and MSI-X capabilities, it can |  | ||||||
| run in either MSI mode or MSI-X mode, but not both simultaneously. |  | ||||||
| This is a requirement of the PCI spec, and it is enforced by the |  | ||||||
| PCI layer.  Calling pci_enable_msi_range() when MSI-X is already |  | ||||||
| enabled or pci_enable_msix_range() when MSI is already enabled |  | ||||||
| results in an error.  If a device driver wishes to switch between MSI |  | ||||||
| and MSI-X at runtime, it must first quiesce the device, then switch |  | ||||||
| it back to pin-interrupt mode, before calling pci_enable_msi_range() |  | ||||||
| or pci_enable_msix_range() and resuming operation.  This is not expected |  | ||||||
| to be a common operation but may be useful for debugging or testing |  | ||||||
| during development. |  | ||||||
| 
 |  | ||||||
| 4.5 Considerations when using MSIs |  | ||||||
| 
 |  | ||||||
| 4.5.1 Choosing between MSI-X and MSI |  | ||||||
| 
 |  | ||||||
| If your device supports both MSI-X and MSI capabilities, you should use |  | ||||||
| the MSI-X facilities in preference to the MSI facilities.  As mentioned |  | ||||||
| above, MSI-X supports any number of interrupts between 1 and 2048. |  | ||||||
| In contrast, MSI is restricted to a maximum of 32 interrupts (and |  | ||||||
| must be a power of two).  In addition, the MSI interrupt vectors must |  | ||||||
| be allocated consecutively, so the system might not be able to allocate |  | ||||||
| as many vectors for MSI as it could for MSI-X.  On some platforms, MSI |  | ||||||
| interrupts must all be targeted at the same set of CPUs whereas MSI-X |  | ||||||
| interrupts can all be targeted at different CPUs. |  | ||||||
| 
 |  | ||||||
| 4.5.2 Spinlocks |  | ||||||
| 
 | 
 | ||||||
| Most device drivers have a per-device spinlock which is taken in the | Most device drivers have a per-device spinlock which is taken in the | ||||||
| interrupt handler.  With pin-based interrupts or a single MSI, it is not | interrupt handler.  With pin-based interrupts or a single MSI, it is not | ||||||
|  | @@ -505,7 +190,7 @@ acquire the spinlock.  Such deadlocks can be avoided by using | ||||||
| spin_lock_irqsave() or spin_lock_irq() which disable local interrupts | spin_lock_irqsave() or spin_lock_irq() which disable local interrupts | ||||||
| and acquire the lock (see Documentation/DocBook/kernel-locking). | and acquire the lock (see Documentation/DocBook/kernel-locking). | ||||||
| 
 | 
 | ||||||
| 4.6 How to tell whether MSI/MSI-X is enabled on a device | 4.5 How to tell whether MSI/MSI-X is enabled on a device | ||||||
| 
 | 
 | ||||||
| Using 'lspci -v' (as root) may show some devices with "MSI", "Message | Using 'lspci -v' (as root) may show some devices with "MSI", "Message | ||||||
| Signalled Interrupts" or "MSI-X" capabilities.  Each of these capabilities | Signalled Interrupts" or "MSI-X" capabilities.  Each of these capabilities | ||||||
|  |  | ||||||
|  | @@ -4,6 +4,7 @@ | ||||||
|  * |  * | ||||||
|  * Copyright (C) 2003-2004 Intel |  * Copyright (C) 2003-2004 Intel | ||||||
|  * Copyright (C) Tom Long Nguyen (tom.l.nguyen@intel.com) |  * Copyright (C) Tom Long Nguyen (tom.l.nguyen@intel.com) | ||||||
|  |  * Copyright (C) 2016 Christoph Hellwig. | ||||||
|  */ |  */ | ||||||
| 
 | 
 | ||||||
| #include <linux/err.h> | #include <linux/err.h> | ||||||
|  | @@ -1121,6 +1122,94 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, | ||||||
| } | } | ||||||
| EXPORT_SYMBOL(pci_enable_msix_range); | EXPORT_SYMBOL(pci_enable_msix_range); | ||||||
| 
 | 
 | ||||||
|  | /** | ||||||
|  |  * pci_alloc_irq_vectors - allocate multiple IRQs for a device | ||||||
|  |  * @dev:		PCI device to operate on | ||||||
|  |  * @min_vecs:		minimum number of vectors required (must be >= 1) | ||||||
|  |  * @max_vecs:		maximum (desired) number of vectors | ||||||
|  |  * @flags:		flags or quirks for the allocation | ||||||
|  |  * | ||||||
|  |  * Allocate up to @max_vecs interrupt vectors for @dev, using MSI-X or MSI | ||||||
|  |  * vectors if available, and fall back to a single legacy vector | ||||||
|  |  * if neither is available.  Return the number of vectors allocated, | ||||||
|  |  * (which might be smaller than @max_vecs) if successful, or a negative | ||||||
|  |  * error code on error. If less than @min_vecs interrupt vectors are | ||||||
|  |  * available for @dev the function will fail with -ENOSPC. | ||||||
|  |  * | ||||||
|  |  * To get the Linux IRQ number used for a vector that can be passed to | ||||||
|  |  * request_irq() use the pci_irq_vector() helper. | ||||||
|  |  */ | ||||||
|  | int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, | ||||||
|  | 		unsigned int max_vecs, unsigned int flags) | ||||||
|  | { | ||||||
|  | 	int vecs = -ENOSPC; | ||||||
|  | 
 | ||||||
|  | 	if (!(flags & PCI_IRQ_NOMSIX)) { | ||||||
|  | 		vecs = pci_enable_msix_range(dev, NULL, min_vecs, max_vecs); | ||||||
|  | 		if (vecs > 0) | ||||||
|  | 			return vecs; | ||||||
|  | 	} | ||||||
|  | 
 | ||||||
|  | 	if (!(flags & PCI_IRQ_NOMSI)) { | ||||||
|  | 		vecs = pci_enable_msi_range(dev, min_vecs, max_vecs); | ||||||
|  | 		if (vecs > 0) | ||||||
|  | 			return vecs; | ||||||
|  | 	} | ||||||
|  | 
 | ||||||
|  | 	/* use legacy irq if allowed */ | ||||||
|  | 	if (!(flags & PCI_IRQ_NOLEGACY) && min_vecs == 1) | ||||||
|  | 		return 1; | ||||||
|  | 	return vecs; | ||||||
|  | } | ||||||
|  | EXPORT_SYMBOL(pci_alloc_irq_vectors); | ||||||
|  | 
 | ||||||
|  | /** | ||||||
|  |  * pci_free_irq_vectors - free previously allocated IRQs for a device | ||||||
|  |  * @dev:		PCI device to operate on | ||||||
|  |  * | ||||||
|  |  * Undoes the allocations and enabling in pci_alloc_irq_vectors(). | ||||||
|  |  */ | ||||||
|  | void pci_free_irq_vectors(struct pci_dev *dev) | ||||||
|  | { | ||||||
|  | 	pci_disable_msix(dev); | ||||||
|  | 	pci_disable_msi(dev); | ||||||
|  | } | ||||||
|  | EXPORT_SYMBOL(pci_free_irq_vectors); | ||||||
|  | 
 | ||||||
|  | /** | ||||||
|  |  * pci_irq_vector - return Linux IRQ number of a device vector | ||||||
|  |  * @dev: PCI device to operate on | ||||||
|  |  * @nr: device-relative interrupt vector index (0-based). | ||||||
|  |  */ | ||||||
|  | int pci_irq_vector(struct pci_dev *dev, unsigned int nr) | ||||||
|  | { | ||||||
|  | 	if (dev->msix_enabled) { | ||||||
|  | 		struct msi_desc *entry; | ||||||
|  | 		int i = 0; | ||||||
|  | 
 | ||||||
|  | 		for_each_pci_msi_entry(entry, dev) { | ||||||
|  | 			if (i == nr) | ||||||
|  | 				return entry->irq; | ||||||
|  | 			i++; | ||||||
|  | 		} | ||||||
|  | 		WARN_ON_ONCE(1); | ||||||
|  | 		return -EINVAL; | ||||||
|  | 	} | ||||||
|  | 
 | ||||||
|  | 	if (dev->msi_enabled) { | ||||||
|  | 		struct msi_desc *entry = first_pci_msi_entry(dev); | ||||||
|  | 
 | ||||||
|  | 		if (WARN_ON_ONCE(nr >= entry->nvec_used)) | ||||||
|  | 			return -EINVAL; | ||||||
|  | 	} else { | ||||||
|  | 		if (WARN_ON_ONCE(nr > 0)) | ||||||
|  | 			return -EINVAL; | ||||||
|  | 	} | ||||||
|  | 
 | ||||||
|  | 	return dev->irq + nr; | ||||||
|  | } | ||||||
|  | EXPORT_SYMBOL(pci_irq_vector); | ||||||
|  | 
 | ||||||
| struct pci_dev *msi_desc_to_pci_dev(struct msi_desc *desc) | struct pci_dev *msi_desc_to_pci_dev(struct msi_desc *desc) | ||||||
| { | { | ||||||
| 	return to_pci_dev(desc->dev); | 	return to_pci_dev(desc->dev); | ||||||
|  |  | ||||||
|  | @@ -1237,6 +1237,10 @@ resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno); | ||||||
| int pci_set_vga_state(struct pci_dev *pdev, bool decode, | int pci_set_vga_state(struct pci_dev *pdev, bool decode, | ||||||
| 		      unsigned int command_bits, u32 flags); | 		      unsigned int command_bits, u32 flags); | ||||||
| 
 | 
 | ||||||
|  | #define PCI_IRQ_NOLEGACY	(1 << 0) /* don't use legacy interrupts */ | ||||||
|  | #define PCI_IRQ_NOMSI		(1 << 1) /* don't use MSI interrupts */ | ||||||
|  | #define PCI_IRQ_NOMSIX		(1 << 2) /* don't use MSI-X interrupts */ | ||||||
|  | 
 | ||||||
| /* kmem_cache style wrapper around pci_alloc_consistent() */ | /* kmem_cache style wrapper around pci_alloc_consistent() */ | ||||||
| 
 | 
 | ||||||
| #include <linux/pci-dma.h> | #include <linux/pci-dma.h> | ||||||
|  | @@ -1284,6 +1288,11 @@ static inline int pci_enable_msix_exact(struct pci_dev *dev, | ||||||
| 		return rc; | 		return rc; | ||||||
| 	return 0; | 	return 0; | ||||||
| } | } | ||||||
|  | int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, | ||||||
|  | 		unsigned int max_vecs, unsigned int flags); | ||||||
|  | void pci_free_irq_vectors(struct pci_dev *dev); | ||||||
|  | int pci_irq_vector(struct pci_dev *dev, unsigned int nr); | ||||||
|  | 
 | ||||||
| #else | #else | ||||||
| static inline int pci_msi_vec_count(struct pci_dev *dev) { return -ENOSYS; } | static inline int pci_msi_vec_count(struct pci_dev *dev) { return -ENOSYS; } | ||||||
| static inline void pci_msi_shutdown(struct pci_dev *dev) { } | static inline void pci_msi_shutdown(struct pci_dev *dev) { } | ||||||
|  | @@ -1307,6 +1316,24 @@ static inline int pci_enable_msix_range(struct pci_dev *dev, | ||||||
| static inline int pci_enable_msix_exact(struct pci_dev *dev, | static inline int pci_enable_msix_exact(struct pci_dev *dev, | ||||||
| 		      struct msix_entry *entries, int nvec) | 		      struct msix_entry *entries, int nvec) | ||||||
| { return -ENOSYS; } | { return -ENOSYS; } | ||||||
|  | static inline int pci_alloc_irq_vectors(struct pci_dev *dev, | ||||||
|  | 		unsigned int min_vecs, unsigned int max_vecs, | ||||||
|  | 		unsigned int flags) | ||||||
|  | { | ||||||
|  | 	if (min_vecs > 1) | ||||||
|  | 		return -EINVAL; | ||||||
|  | 	return 1; | ||||||
|  | } | ||||||
|  | static inline void pci_free_irq_vectors(struct pci_dev *dev) | ||||||
|  | { | ||||||
|  | } | ||||||
|  | 
 | ||||||
|  | static inline int pci_irq_vector(struct pci_dev *dev, unsigned int nr) | ||||||
|  | { | ||||||
|  | 	if (WARN_ON_ONCE(nr > 0)) | ||||||
|  | 		return -EINVAL; | ||||||
|  | 	return dev->irq; | ||||||
|  | } | ||||||
| #endif | #endif | ||||||
| 
 | 
 | ||||||
| #ifdef CONFIG_PCIEPORTBUS | #ifdef CONFIG_PCIEPORTBUS | ||||||
|  |  | ||||||
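
The index-to-IRQ mapping that pci_irq_vector() adds in this diff can be illustrated with a small userspace model. It is a sketch, not the kernel implementation: `struct mock_pci_dev` and `mock_irq_vector()` are hypothetical names, the MSI-X descriptor list is modelled as a plain array instead of being walked with for_each_pci_msi_entry(), and the WARN_ON_ONCE() calls are left out. The point it shows is that MSI-X vectors may map to arbitrary Linux IRQ numbers, while MSI and legacy vectors are computed as an offset from the device's base IRQ.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields pci_irq_vector() looks at. */
struct mock_pci_dev {
	int msix_enabled;	/* models dev->msix_enabled */
	int msi_enabled;	/* models dev->msi_enabled */
	int irq;		/* models dev->irq, the base IRQ */
	unsigned int nvec_used;	/* models nvec_used of the MSI descriptor */
	const int *msix_irqs;	/* models the per-entry IRQs of the MSI-X list */
	size_t nr_msix;		/* number of entries in msix_irqs */
};

/* Return the IRQ for device-relative vector nr, or -EINVAL if out of range. */
int mock_irq_vector(const struct mock_pci_dev *dev, unsigned int nr)
{
	if (dev->msix_enabled) {
		/* MSI-X: each vector has its own, possibly unrelated, IRQ. */
		if (nr >= dev->nr_msix)
			return -EINVAL;
		return dev->msix_irqs[nr];
	}

	if (dev->msi_enabled) {
		/* MSI: vectors are consecutive, starting at dev->irq. */
		if (nr >= dev->nvec_used)
			return -EINVAL;
	} else if (nr > 0) {
		/* Legacy: only vector 0 exists. */
		return -EINVAL;
	}

	return dev->irq + (int)nr;
}
```

A driver using the real helper would pass the returned number to request_irq() and later to free_irq(), so it does not need to keep its own array of vectors.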