	PCI: Provide sensible IRQ vector alloc/free routines
Add functions to allocate and free a range of interrupt vectors, using MSI-X, MSI or legacy vectors (in that order) based on the capabilities of the underlying device and PCIe complex.

Additionally, a new helper is provided to get the Linux IRQ number for a given device-relative vector, so that drivers don't need to allocate their own arrays to keep track of the vectors in the multi-vector MSI-X case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alexander Gordeev <agordeev@redhat.com>
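The fallback order described in the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: `struct model_dev`, `model_enable_range()` and `model_alloc_irq_vectors()` are hypothetical names, the per-mode "available" counts stand in for whatever pci_enable_msix_range() and pci_enable_msi_range() would grant, and the MSI power-of-two restriction is ignored. Only the flag values are taken from the patch.

```c
#include <errno.h>

/* Flag values as defined in the include/linux/pci.h hunk of this patch. */
#define PCI_IRQ_NOLEGACY	(1 << 0)
#define PCI_IRQ_NOMSI		(1 << 1)
#define PCI_IRQ_NOMSIX		(1 << 2)

/* Hypothetical stand-in for struct pci_dev: vectors each mode could
 * supply, where 0 means the capability is absent. */
struct model_dev {
	unsigned int msix_avail;
	unsigned int msi_avail;
};

/* Hypothetical stand-in for pci_enable_msi{,x}_range(): grant up to
 * max_vecs, or fail if fewer than min_vecs are available. */
static int model_enable_range(unsigned int avail, unsigned int min_vecs,
			      unsigned int max_vecs)
{
	if (avail == 0)
		return -EINVAL;		/* capability not present */
	if (avail < min_vecs)
		return -ENOSPC;
	return (int)(avail < max_vecs ? avail : max_vecs);
}

/* Mirrors the control flow of pci_alloc_irq_vectors() in the patch:
 * try MSI-X, then MSI, then a single legacy vector. */
static int model_alloc_irq_vectors(const struct model_dev *dev,
				   unsigned int min_vecs,
				   unsigned int max_vecs,
				   unsigned int flags)
{
	int vecs = -ENOSPC;

	if (!(flags & PCI_IRQ_NOMSIX)) {
		vecs = model_enable_range(dev->msix_avail, min_vecs, max_vecs);
		if (vecs > 0)
			return vecs;
	}

	if (!(flags & PCI_IRQ_NOMSI)) {
		vecs = model_enable_range(dev->msi_avail, min_vecs, max_vecs);
		if (vecs > 0)
			return vecs;
	}

	/* A legacy line can only satisfy a request whose minimum is 1. */
	if (!(flags & PCI_IRQ_NOLEGACY) && min_vecs == 1)
		return 1;
	return vecs;
}
```

In this model a device with neither capability still gets one vector unless PCI_IRQ_NOLEGACY is passed or min_vecs is greater than 1, which matches the behaviour the documentation hunk describes.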
parent 3ac020e0ca
commit aff171641d
3 changed files with 192 additions and 391 deletions
		|  | @ -78,422 +78,107 @@ CONFIG_PCI_MSI option. | |||
| 
 | ||||
| 4.2 Using MSI | ||||
| 
 | ||||
| Most of the hard work is done for the driver in the PCI layer.  It simply | ||||
| has to request that the PCI layer set up the MSI capability for this | ||||
| Most of the hard work is done for the driver in the PCI layer.  The driver | ||||
| simply has to request that the PCI layer set up the MSI capability for this | ||||
| device. | ||||
| 
 | ||||
| 4.2.1 pci_enable_msi | ||||
| To automatically use MSI or MSI-X interrupt vectors, use the following | ||||
| function: | ||||
| 
 | ||||
| int pci_enable_msi(struct pci_dev *dev) | ||||
|   int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, | ||||
| 		unsigned int max_vecs, unsigned int flags); | ||||
| 
 | ||||
| A successful call allocates ONE interrupt to the device, regardless | ||||
| of how many MSIs the device supports.  The device is switched from | ||||
| pin-based interrupt mode to MSI mode.  The dev->irq number is changed | ||||
| to a new number which represents the message signaled interrupt; | ||||
| consequently, this function should be called before the driver calls | ||||
| request_irq(), because an MSI is delivered via a vector that is | ||||
| different from the vector of a pin-based interrupt. | ||||
| which allocates up to max_vecs interrupt vectors for a PCI device.  It | ||||
| returns the number of vectors allocated or a negative error.  If the device | ||||
| has a requirement for a minimum number of vectors, the driver can pass a | ||||
| min_vecs argument set to this limit, and the PCI core will return -ENOSPC | ||||
| if it can't meet the minimum number of vectors. | ||||
| 
 | ||||
| 4.2.2 pci_enable_msi_range | ||||
| The flags argument should normally be set to 0, but can be used to pass the | ||||
| PCI_IRQ_NOMSI and PCI_IRQ_NOMSIX flags in case a device claims to support | ||||
| MSI or MSI-X, but the support is broken, or to pass PCI_IRQ_NOLEGACY in | ||||
| case the device does not support legacy interrupt lines. | ||||
| 
 | ||||
| int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec) | ||||
| To get the Linux IRQ number to pass to request_irq() and free_irq() for a | ||||
| given vector, use the following function: | ||||
| 
 | ||||
| This function allows a device driver to request any number of MSI | ||||
| interrupts within specified range from 'minvec' to 'maxvec'. | ||||
|   int pci_irq_vector(struct pci_dev *dev, unsigned int nr); | ||||
| 
 | ||||
| If this function returns a positive number it indicates the number of | ||||
| MSI interrupts that have been successfully allocated.  In this case | ||||
| the device is switched from pin-based interrupt mode to MSI mode and | ||||
| updates dev->irq to be the lowest of the new interrupts assigned to it. | ||||
| The other interrupts assigned to the device are in the range dev->irq | ||||
| to dev->irq + returned value - 1.  Device driver can use the returned | ||||
| number of successfully allocated MSI interrupts to further allocate | ||||
| and initialize device resources. | ||||
| Any allocated resources should be freed before removing the device using | ||||
| the following function: | ||||
| 
 | ||||
| If this function returns a negative number, it indicates an error and | ||||
| the driver should not attempt to request any more MSI interrupts for | ||||
| this device. | ||||
|   void pci_free_irq_vectors(struct pci_dev *dev); | ||||
| 
 | ||||
| This function should be called before the driver calls request_irq(), | ||||
| because MSI interrupts are delivered via vectors that are different | ||||
| from the vector of a pin-based interrupt. | ||||
| If a device supports both MSI-X and MSI capabilities, this API will use the | ||||
| MSI-X facilities in preference to the MSI facilities.  MSI-X supports any | ||||
| number of interrupts between 1 and 2048.  In contrast, MSI is restricted to | ||||
| a maximum of 32 interrupts (and must be a power of two).  In addition, the | ||||
| MSI interrupt vectors must be allocated consecutively, so the system might | ||||
| not be able to allocate as many vectors for MSI as it could for MSI-X.  On | ||||
| some platforms, MSI interrupts must all be targeted at the same set of CPUs | ||||
| whereas MSI-X interrupts can all be targeted at different CPUs. | ||||
| 
 | ||||
| It is ideal if drivers can cope with a variable number of MSI interrupts; | ||||
| there are many reasons why the platform may not be able to provide the | ||||
| exact number that a driver asks for. | ||||
| If a device supports neither MSI-X nor MSI it will fall back to a single | ||||
| legacy IRQ vector. | ||||
| 
 | ||||
| There could be devices that can not operate with just any number of MSI | ||||
| interrupts within a range.  See chapter 4.3.1.3 to get the idea how to | ||||
| handle such devices for MSI-X - the same logic applies to MSI. | ||||
| The typical usage of MSI or MSI-X interrupts is to allocate as many vectors | ||||
| as possible, likely up to the limit supported by the device.  If nvec is | ||||
| larger than the number supported by the device it will automatically be | ||||
| capped to the supported limit, so there is no need to query the number of | ||||
| vectors supported beforehand: | ||||
| 
 | ||||
| 4.2.1.1 Maximum possible number of MSI interrupts | ||||
| 
 | ||||
| The typical usage of MSI interrupts is to allocate as many vectors as | ||||
| possible, likely up to the limit returned by pci_msi_vec_count() function: | ||||
| 
 | ||||
| static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) | ||||
| { | ||||
| 	return pci_enable_msi_range(pdev, 1, nvec); | ||||
| } | ||||
| 
 | ||||
| Note the value of 'minvec' parameter is 1.  As 'minvec' is inclusive, | ||||
| the value of 0 would be meaningless and could result in error. | ||||
| 
 | ||||
| Some devices have a minimal limit on number of MSI interrupts. | ||||
| In this case the function could look like this: | ||||
| 
 | ||||
| static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) | ||||
| { | ||||
| 	return pci_enable_msi_range(pdev, FOO_DRIVER_MINIMUM_NVEC, nvec); | ||||
| } | ||||
| 
 | ||||
| 4.2.1.2 Exact number of MSI interrupts | ||||
| 	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, 0); | ||||
| 	if (nvec < 0) | ||||
| 		goto out_err; | ||||
| 
 | ||||
| If a driver is unable or unwilling to deal with a variable number of MSI | ||||
| interrupts it could request a particular number of interrupts by passing | ||||
| that number to pci_enable_msi_range() function as both 'minvec' and 'maxvec' | ||||
| parameters: | ||||
| interrupts it can request a particular number of interrupts by passing that | ||||
| number to pci_alloc_irq_vectors() function as both 'min_vecs' and | ||||
| 'max_vecs' parameters: | ||||
| 
 | ||||
| static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) | ||||
| { | ||||
| 	return pci_enable_msi_range(pdev, nvec, nvec); | ||||
| } | ||||
| 	ret = pci_alloc_irq_vectors(pdev, nvec, nvec, 0); | ||||
| 	if (ret < 0) | ||||
| 		goto out_err; | ||||
| 
 | ||||
| Note, unlike pci_enable_msi_exact() function, which could be also used to | ||||
| enable a particular number of MSI-X interrupts, pci_enable_msi_range() | ||||
| returns either a negative errno or 'nvec' (not negative errno or 0 - as | ||||
| pci_enable_msi_exact() does). | ||||
| The most notorious example of the request type described above is enabling | ||||
| the single MSI mode for a device.  It could be done by passing two 1s as | ||||
| 'min_vecs' and 'max_vecs': | ||||
| 
 | ||||
| 4.2.1.3 Single MSI mode | ||||
| 	ret = pci_alloc_irq_vectors(pdev, 1, 1, 0); | ||||
| 	if (ret < 0) | ||||
| 		goto out_err; | ||||
| 
 | ||||
| The most notorious example of the request type described above is | ||||
| enabling the single MSI mode for a device.  It could be done by passing | ||||
| two 1s as 'minvec' and 'maxvec': | ||||
| Some devices might not support using legacy line interrupts, in which case | ||||
| the PCI_IRQ_NOLEGACY flag can be used to fail the request if the platform | ||||
| can't provide MSI or MSI-X interrupts: | ||||
| 
 | ||||
| static int foo_driver_enable_single_msi(struct pci_dev *pdev) | ||||
| { | ||||
| 	return pci_enable_msi_range(pdev, 1, 1); | ||||
| } | ||||
| 	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_NOLEGACY); | ||||
| 	if (nvec < 0) | ||||
| 		goto out_err; | ||||
| 
 | ||||
| Note, unlike pci_enable_msi() function, which could be also used to | ||||
| enable the single MSI mode, pci_enable_msi_range() returns either a | ||||
| negative errno or 1 (not negative errno or 0 - as pci_enable_msi() | ||||
| does). | ||||
| 4.3 Legacy APIs | ||||
| 
 | ||||
| 4.2.3 pci_enable_msi_exact | ||||
| The following old APIs to enable and disable MSI or MSI-X interrupts should | ||||
| not be used in new code: | ||||
| 
 | ||||
| int pci_enable_msi_exact(struct pci_dev *dev, int nvec) | ||||
|   pci_enable_msi()		/* deprecated */ | ||||
|   pci_enable_msi_range()	/* deprecated */ | ||||
|   pci_enable_msi_exact()	/* deprecated */ | ||||
|   pci_disable_msi()		/* deprecated */ | ||||
|   pci_enable_msix_range()	/* deprecated */ | ||||
|   pci_enable_msix_exact()	/* deprecated */ | ||||
|   pci_disable_msix()		/* deprecated */ | ||||
| 
 | ||||
| This variation on pci_enable_msi_range() call allows a device driver to | ||||
| request exactly 'nvec' MSIs. | ||||
| Additionally there are APIs to provide the number of supported MSI or MSI-X | ||||
| vectors: pci_msi_vec_count() and pci_msix_vec_count().  In general these | ||||
| should be avoided in favor of letting pci_alloc_irq_vectors() cap the | ||||
| number of vectors.  If you have a legitimate special use case for the count | ||||
| of vectors we might have to revisit that decision and add a | ||||
| pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently. | ||||
| 
 | ||||
| If this function returns a negative number, it indicates an error and | ||||
| the driver should not attempt to request any more MSI interrupts for | ||||
| this device. | ||||
| 4.4 Considerations when using MSIs | ||||
| 
 | ||||
| By contrast with pci_enable_msi_range() function, pci_enable_msi_exact() | ||||
| returns zero in case of success, which indicates MSI interrupts have been | ||||
| successfully allocated. | ||||
| 
 | ||||
| 4.2.4 pci_disable_msi | ||||
| 
 | ||||
| void pci_disable_msi(struct pci_dev *dev) | ||||
| 
 | ||||
| This function should be used to undo the effect of pci_enable_msi_range(). | ||||
| Calling it restores dev->irq to the pin-based interrupt number and frees | ||||
| the previously allocated MSIs.  The interrupts may subsequently be assigned | ||||
| to another device, so drivers should not cache the value of dev->irq. | ||||
| 
 | ||||
| Before calling this function, a device driver must always call free_irq() | ||||
| on any interrupt for which it previously called request_irq(). | ||||
| Failure to do so results in a BUG_ON(), leaving the device with | ||||
| MSI enabled and thus leaking its vector. | ||||
| 
 | ||||
| 4.2.4 pci_msi_vec_count | ||||
| 
 | ||||
| int pci_msi_vec_count(struct pci_dev *dev) | ||||
| 
 | ||||
| This function could be used to retrieve the number of MSI vectors the | ||||
| device requested (via the Multiple Message Capable register). The MSI | ||||
| specification only allows the returned value to be a power of two, | ||||
| up to a maximum of 2^5 (32). | ||||
| 
 | ||||
| If this function returns a negative number, it indicates the device is | ||||
| not capable of sending MSIs. | ||||
| 
 | ||||
| If this function returns a positive number, it indicates the maximum | ||||
| number of MSI interrupt vectors that could be allocated. | ||||
| 
 | ||||
| 4.3 Using MSI-X | ||||
| 
 | ||||
| The MSI-X capability is much more flexible than the MSI capability. | ||||
| It supports up to 2048 interrupts, each of which can be controlled | ||||
| independently.  To support this flexibility, drivers must use an array of | ||||
| `struct msix_entry': | ||||
| 
 | ||||
| struct msix_entry { | ||||
| 	u16 	vector; /* kernel uses to write alloc vector */ | ||||
| 	u16	entry; /* driver uses to specify entry */ | ||||
| }; | ||||
| 
 | ||||
| This allows for the device to use these interrupts in a sparse fashion; | ||||
| for example, it could use interrupts 3 and 1027 and yet allocate only a | ||||
| two-element array.  The driver is expected to fill in the 'entry' value | ||||
| in each element of the array to indicate for which entries the kernel | ||||
| should assign interrupts; it is invalid to fill in two entries with the | ||||
| same number. | ||||
| 
 | ||||
| 4.3.1 pci_enable_msix_range | ||||
| 
 | ||||
| int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, | ||||
| 			  int minvec, int maxvec) | ||||
| 
 | ||||
| Calling this function asks the PCI subsystem to allocate any number of | ||||
| MSI-X interrupts within specified range from 'minvec' to 'maxvec'. | ||||
| The 'entries' argument is a pointer to an array of msix_entry structs | ||||
| which should be at least 'maxvec' entries in size. | ||||
| 
 | ||||
| On success, the device is switched into MSI-X mode and the function | ||||
| returns the number of MSI-X interrupts that have been successfully | ||||
| allocated.  In this case the 'vector' member in entries numbered from | ||||
| 0 to the returned value - 1 is populated with the interrupt number; | ||||
| the driver should then call request_irq() for each 'vector' that it | ||||
| decides to use.  The device driver is responsible for keeping track of the | ||||
| interrupts assigned to the MSI-X vectors so it can free them again later. | ||||
| Device driver can use the returned number of successfully allocated MSI-X | ||||
| interrupts to further allocate and initialize device resources. | ||||
| 
 | ||||
| If this function returns a negative number, it indicates an error and | ||||
| the driver should not attempt to allocate any more MSI-X interrupts for | ||||
| this device. | ||||
| 
 | ||||
| This function, in contrast with pci_enable_msi_range(), does not adjust | ||||
| dev->irq.  The device will not generate interrupts for this interrupt | ||||
| number once MSI-X is enabled. | ||||
| 
 | ||||
| Device drivers should normally call this function once per device | ||||
| during the initialization phase. | ||||
| 
 | ||||
| It is ideal if drivers can cope with a variable number of MSI-X interrupts; | ||||
| there are many reasons why the platform may not be able to provide the | ||||
| exact number that a driver asks for. | ||||
| 
 | ||||
| There could be devices that can not operate with just any number of MSI-X | ||||
| interrupts within a range.  E.g., an network adapter might need let's say | ||||
| four vectors per each queue it provides.  Therefore, a number of MSI-X | ||||
| interrupts allocated should be a multiple of four.  In this case interface | ||||
| pci_enable_msix_range() can not be used alone to request MSI-X interrupts | ||||
| (since it can allocate any number within the range, without any notion of | ||||
| the multiple of four) and the device driver should master a custom logic | ||||
| to request the required number of MSI-X interrupts. | ||||
| 
 | ||||
| 4.3.1.1 Maximum possible number of MSI-X interrupts | ||||
| 
 | ||||
| The typical usage of MSI-X interrupts is to allocate as many vectors as | ||||
| possible, likely up to the limit returned by pci_msix_vec_count() function: | ||||
| 
 | ||||
| static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) | ||||
| { | ||||
| 	return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, | ||||
| 				     1, nvec); | ||||
| } | ||||
| 
 | ||||
| Note the value of 'minvec' parameter is 1.  As 'minvec' is inclusive, | ||||
| the value of 0 would be meaningless and could result in error. | ||||
| 
 | ||||
| Some devices have a minimal limit on number of MSI-X interrupts. | ||||
| In this case the function could look like this: | ||||
| 
 | ||||
| static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) | ||||
| { | ||||
| 	return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, | ||||
| 				     FOO_DRIVER_MINIMUM_NVEC, nvec); | ||||
| } | ||||
| 
 | ||||
| 4.3.1.2 Exact number of MSI-X interrupts | ||||
| 
 | ||||
| If a driver is unable or unwilling to deal with a variable number of MSI-X | ||||
| interrupts it could request a particular number of interrupts by passing | ||||
| that number to pci_enable_msix_range() function as both 'minvec' and 'maxvec' | ||||
| parameters: | ||||
| 
 | ||||
| static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) | ||||
| { | ||||
| 	return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, | ||||
| 				     nvec, nvec); | ||||
| } | ||||
| 
 | ||||
| Note, unlike pci_enable_msix_exact() function, which could be also used to | ||||
| enable a particular number of MSI-X interrupts, pci_enable_msix_range() | ||||
| returns either a negative errno or 'nvec' (not negative errno or 0 - as | ||||
| pci_enable_msix_exact() does). | ||||
| 
 | ||||
| 4.3.1.3 Specific requirements to the number of MSI-X interrupts | ||||
| 
 | ||||
| As noted above, there could be devices that can not operate with just any | ||||
| number of MSI-X interrupts within a range.  E.g., let's assume a device that | ||||
| is only capable sending the number of MSI-X interrupts which is a power of | ||||
| two.  A routine that enables MSI-X mode for such device might look like this: | ||||
| 
 | ||||
| /* | ||||
|  * Assume 'minvec' and 'maxvec' are non-zero | ||||
|  */ | ||||
| static int foo_driver_enable_msix(struct foo_adapter *adapter, | ||||
| 				  int minvec, int maxvec) | ||||
| { | ||||
| 	int rc; | ||||
| 
 | ||||
| 	minvec = roundup_pow_of_two(minvec); | ||||
| 	maxvec = rounddown_pow_of_two(maxvec); | ||||
| 
 | ||||
| 	if (minvec > maxvec) | ||||
| 		return -ERANGE; | ||||
| 
 | ||||
| retry: | ||||
| 	rc = pci_enable_msix_range(adapter->pdev, adapter->msix_entries, | ||||
| 				   maxvec, maxvec); | ||||
| 	/* | ||||
| 	 * -ENOSPC is the only error code allowed to be analyzed | ||||
| 	 */ | ||||
| 	if (rc == -ENOSPC) { | ||||
| 		if (maxvec == 1) | ||||
| 			return -ENOSPC; | ||||
| 
 | ||||
| 		maxvec /= 2; | ||||
| 
 | ||||
| 		if (minvec > maxvec) | ||||
| 			return -ENOSPC; | ||||
| 
 | ||||
| 		goto retry; | ||||
| 	} | ||||
| 
 | ||||
| 	return rc; | ||||
| } | ||||
| 
 | ||||
| Note how pci_enable_msix_range() return value is analyzed for a fallback - | ||||
| any error code other than -ENOSPC indicates a fatal error and should not | ||||
| be retried. | ||||
| 
 | ||||
| 4.3.2 pci_enable_msix_exact | ||||
| 
 | ||||
| int pci_enable_msix_exact(struct pci_dev *dev, | ||||
| 			  struct msix_entry *entries, int nvec) | ||||
| 
 | ||||
| This variation on pci_enable_msix_range() call allows a device driver to | ||||
| request exactly 'nvec' MSI-Xs. | ||||
| 
 | ||||
| If this function returns a negative number, it indicates an error and | ||||
| the driver should not attempt to allocate any more MSI-X interrupts for | ||||
| this device. | ||||
| 
 | ||||
| By contrast with pci_enable_msix_range() function, pci_enable_msix_exact() | ||||
| returns zero in case of success, which indicates MSI-X interrupts have been | ||||
| successfully allocated. | ||||
| 
 | ||||
| Another version of a routine that enables MSI-X mode for a device with | ||||
| specific requirements described in chapter 4.3.1.3 might look like this: | ||||
| 
 | ||||
| /* | ||||
|  * Assume 'minvec' and 'maxvec' are non-zero | ||||
|  */ | ||||
| static int foo_driver_enable_msix(struct foo_adapter *adapter, | ||||
| 				  int minvec, int maxvec) | ||||
| { | ||||
| 	int rc; | ||||
| 
 | ||||
| 	minvec = roundup_pow_of_two(minvec); | ||||
| 	maxvec = rounddown_pow_of_two(maxvec); | ||||
| 
 | ||||
| 	if (minvec > maxvec) | ||||
| 		return -ERANGE; | ||||
| 
 | ||||
| retry: | ||||
| 	rc = pci_enable_msix_exact(adapter->pdev, | ||||
| 				   adapter->msix_entries, maxvec); | ||||
| 
 | ||||
| 	/* | ||||
| 	 * -ENOSPC is the only error code allowed to be analyzed | ||||
| 	 */ | ||||
| 	if (rc == -ENOSPC) { | ||||
| 		if (maxvec == 1) | ||||
| 			return -ENOSPC; | ||||
| 
 | ||||
| 		maxvec /= 2; | ||||
| 
 | ||||
| 		if (minvec > maxvec) | ||||
| 			return -ENOSPC; | ||||
| 
 | ||||
| 		goto retry; | ||||
| 	} else if (rc < 0) { | ||||
| 		return rc; | ||||
| 	} | ||||
| 
 | ||||
| 	return maxvec; | ||||
| } | ||||
| 
 | ||||
| 4.3.3 pci_disable_msix | ||||
| 
 | ||||
| void pci_disable_msix(struct pci_dev *dev) | ||||
| 
 | ||||
| This function should be used to undo the effect of pci_enable_msix_range(). | ||||
| It frees the previously allocated MSI-X interrupts. The interrupts may | ||||
| subsequently be assigned to another device, so drivers should not cache | ||||
| the value of the 'vector' elements over a call to pci_disable_msix(). | ||||
| 
 | ||||
| Before calling this function, a device driver must always call free_irq() | ||||
| on any interrupt for which it previously called request_irq(). | ||||
| Failure to do so results in a BUG_ON(), leaving the device with | ||||
| MSI-X enabled and thus leaking its vector. | ||||
| 
 | ||||
| 4.3.3 The MSI-X Table | ||||
| 
 | ||||
| The MSI-X capability specifies a BAR and offset within that BAR for the | ||||
| MSI-X Table.  This address is mapped by the PCI subsystem, and should not | ||||
| be accessed directly by the device driver.  If the driver wishes to | ||||
| mask or unmask an interrupt, it should call disable_irq() / enable_irq(). | ||||
| 
 | ||||
| 4.3.4 pci_msix_vec_count | ||||
| 
 | ||||
| int pci_msix_vec_count(struct pci_dev *dev) | ||||
| 
 | ||||
| This function could be used to retrieve number of entries in the device | ||||
| MSI-X table. | ||||
| 
 | ||||
| If this function returns a negative number, it indicates the device is | ||||
| not capable of sending MSI-Xs. | ||||
| 
 | ||||
| If this function returns a positive number, it indicates the maximum | ||||
| number of MSI-X interrupt vectors that could be allocated. | ||||
| 
 | ||||
| 4.4 Handling devices implementing both MSI and MSI-X capabilities | ||||
| 
 | ||||
| If a device implements both MSI and MSI-X capabilities, it can | ||||
| run in either MSI mode or MSI-X mode, but not both simultaneously. | ||||
| This is a requirement of the PCI spec, and it is enforced by the | ||||
| PCI layer.  Calling pci_enable_msi_range() when MSI-X is already | ||||
| enabled or pci_enable_msix_range() when MSI is already enabled | ||||
| results in an error.  If a device driver wishes to switch between MSI | ||||
| and MSI-X at runtime, it must first quiesce the device, then switch | ||||
| it back to pin-interrupt mode, before calling pci_enable_msi_range() | ||||
| or pci_enable_msix_range() and resuming operation.  This is not expected | ||||
| to be a common operation but may be useful for debugging or testing | ||||
| during development. | ||||
| 
 | ||||
| 4.5 Considerations when using MSIs | ||||
| 
 | ||||
| 4.5.1 Choosing between MSI-X and MSI | ||||
| 
 | ||||
| If your device supports both MSI-X and MSI capabilities, you should use | ||||
| the MSI-X facilities in preference to the MSI facilities.  As mentioned | ||||
| above, MSI-X supports any number of interrupts between 1 and 2048. | ||||
| In contrast, MSI is restricted to a maximum of 32 interrupts (and | ||||
| must be a power of two).  In addition, the MSI interrupt vectors must | ||||
| be allocated consecutively, so the system might not be able to allocate | ||||
| as many vectors for MSI as it could for MSI-X.  On some platforms, MSI | ||||
| interrupts must all be targeted at the same set of CPUs whereas MSI-X | ||||
| interrupts can all be targeted at different CPUs. | ||||
| 
 | ||||
| 4.5.2 Spinlocks | ||||
| 4.4.1 Spinlocks | ||||
| 
 | ||||
| Most device drivers have a per-device spinlock which is taken in the | ||||
| interrupt handler.  With pin-based interrupts or a single MSI, it is not | ||||
|  | @ -505,7 +190,7 @@ acquire the spinlock.  Such deadlocks can be avoided by using | |||
| spin_lock_irqsave() or spin_lock_irq() which disable local interrupts | ||||
| and acquire the lock (see Documentation/DocBook/kernel-locking). | ||||
| 
 | ||||
| 4.6 How to tell whether MSI/MSI-X is enabled on a device | ||||
| 4.5 How to tell whether MSI/MSI-X is enabled on a device | ||||
| 
 | ||||
| Using 'lspci -v' (as root) may show some devices with "MSI", "Message | ||||
| Signalled Interrupts" or "MSI-X" capabilities.  Each of these capabilities | ||||
|  |  | |||
|  | @ -4,6 +4,7 @@ | |||
|  * | ||||
|  * Copyright (C) 2003-2004 Intel | ||||
|  * Copyright (C) Tom Long Nguyen (tom.l.nguyen@intel.com) | ||||
|  * Copyright (C) 2016 Christoph Hellwig. | ||||
|  */ | ||||
| 
 | ||||
| #include <linux/err.h> | ||||
|  | @ -1121,6 +1122,94 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, | |||
| } | ||||
| EXPORT_SYMBOL(pci_enable_msix_range); | ||||
| 
 | ||||
| /**
 | ||||
|  * pci_alloc_irq_vectors - allocate multiple IRQs for a device | ||||
|  * @dev:		PCI device to operate on | ||||
|  * @min_vecs:		minimum number of vectors required (must be >= 1) | ||||
|  * @max_vecs:		maximum (desired) number of vectors | ||||
|  * @flags:		flags or quirks for the allocation | ||||
|  * | ||||
|  * Allocate up to @max_vecs interrupt vectors for @dev, using MSI-X or MSI | ||||
|  * vectors if available, and fall back to a single legacy vector | ||||
|  * if neither is available.  Return the number of vectors allocated | ||||
|  * (which might be smaller than @max_vecs) if successful, or a negative | ||||
|  * error code on error.  If fewer than @min_vecs interrupt vectors are | ||||
|  * available for @dev the function will fail with -ENOSPC. | ||||
|  * | ||||
|  * To get the Linux IRQ number used for a vector that can be passed to | ||||
|  * request_irq() use the pci_irq_vector() helper. | ||||
|  */ | ||||
| int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, | ||||
| 		unsigned int max_vecs, unsigned int flags) | ||||
| { | ||||
| 	int vecs = -ENOSPC; | ||||
| 
 | ||||
| 	if (!(flags & PCI_IRQ_NOMSIX)) { | ||||
| 		vecs = pci_enable_msix_range(dev, NULL, min_vecs, max_vecs); | ||||
| 		if (vecs > 0) | ||||
| 			return vecs; | ||||
| 	} | ||||
| 
 | ||||
| 	if (!(flags & PCI_IRQ_NOMSI)) { | ||||
| 		vecs = pci_enable_msi_range(dev, min_vecs, max_vecs); | ||||
| 		if (vecs > 0) | ||||
| 			return vecs; | ||||
| 	} | ||||
| 
 | ||||
| 	/* use legacy irq if allowed */ | ||||
| 	if (!(flags & PCI_IRQ_NOLEGACY) && min_vecs == 1) | ||||
| 		return 1; | ||||
| 	return vecs; | ||||
| } | ||||
| EXPORT_SYMBOL(pci_alloc_irq_vectors); | ||||
| 
 | ||||
| /**
 | ||||
|  * pci_free_irq_vectors - free previously allocated IRQs for a device | ||||
|  * @dev:		PCI device to operate on | ||||
|  * | ||||
|  * Undoes the allocations and enabling in pci_alloc_irq_vectors(). | ||||
|  */ | ||||
| void pci_free_irq_vectors(struct pci_dev *dev) | ||||
| { | ||||
| 	pci_disable_msix(dev); | ||||
| 	pci_disable_msi(dev); | ||||
| } | ||||
| EXPORT_SYMBOL(pci_free_irq_vectors); | ||||
| 
 | ||||
| /**
 | ||||
|  * pci_irq_vector - return Linux IRQ number of a device vector | ||||
|  * @dev: PCI device to operate on | ||||
|  * @nr: device-relative interrupt vector index (0-based). | ||||
|  */ | ||||
| int pci_irq_vector(struct pci_dev *dev, unsigned int nr) | ||||
| { | ||||
| 	if (dev->msix_enabled) { | ||||
| 		struct msi_desc *entry; | ||||
| 		int i = 0; | ||||
| 
 | ||||
| 		for_each_pci_msi_entry(entry, dev) { | ||||
| 			if (i == nr) | ||||
| 				return entry->irq; | ||||
| 			i++; | ||||
| 		} | ||||
| 		WARN_ON_ONCE(1); | ||||
| 		return -EINVAL; | ||||
| 	} | ||||
| 
 | ||||
| 	if (dev->msi_enabled) { | ||||
| 		struct msi_desc *entry = first_pci_msi_entry(dev); | ||||
| 
 | ||||
| 		if (WARN_ON_ONCE(nr >= entry->nvec_used)) | ||||
| 			return -EINVAL; | ||||
| 	} else { | ||||
| 		if (WARN_ON_ONCE(nr > 0)) | ||||
| 			return -EINVAL; | ||||
| 	} | ||||
| 
 | ||||
| 	return dev->irq + nr; | ||||
| } | ||||
| EXPORT_SYMBOL(pci_irq_vector); | ||||
| 
 | ||||
| struct pci_dev *msi_desc_to_pci_dev(struct msi_desc *desc) | ||||
| { | ||||
| 	return to_pci_dev(desc->dev); | ||||
|  |  | |||
|  | @ -1237,6 +1237,10 @@ resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno); | |||
| int pci_set_vga_state(struct pci_dev *pdev, bool decode, | ||||
| 		      unsigned int command_bits, u32 flags); | ||||
| 
 | ||||
| #define PCI_IRQ_NOLEGACY	(1 << 0) /* don't use legacy interrupts */ | ||||
| #define PCI_IRQ_NOMSI		(1 << 1) /* don't use MSI interrupts */ | ||||
| #define PCI_IRQ_NOMSIX		(1 << 2) /* don't use MSI-X interrupts */ | ||||
| 
 | ||||
| /* kmem_cache style wrapper around pci_alloc_consistent() */ | ||||
| 
 | ||||
| #include <linux/pci-dma.h> | ||||
|  | @ -1284,6 +1288,11 @@ static inline int pci_enable_msix_exact(struct pci_dev *dev, | |||
| 		return rc; | ||||
| 	return 0; | ||||
| } | ||||
| int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, | ||||
| 		unsigned int max_vecs, unsigned int flags); | ||||
| void pci_free_irq_vectors(struct pci_dev *dev); | ||||
| int pci_irq_vector(struct pci_dev *dev, unsigned int nr); | ||||
| 
 | ||||
| #else | ||||
| static inline int pci_msi_vec_count(struct pci_dev *dev) { return -ENOSYS; } | ||||
| static inline void pci_msi_shutdown(struct pci_dev *dev) { } | ||||
|  | @ -1307,6 +1316,24 @@ static inline int pci_enable_msix_range(struct pci_dev *dev, | |||
| static inline int pci_enable_msix_exact(struct pci_dev *dev, | ||||
| 		      struct msix_entry *entries, int nvec) | ||||
| { return -ENOSYS; } | ||||
| static inline int pci_alloc_irq_vectors(struct pci_dev *dev, | ||||
| 		unsigned int min_vecs, unsigned int max_vecs, | ||||
| 		unsigned int flags) | ||||
| { | ||||
| 	if (min_vecs > 1) | ||||
| 		return -EINVAL; | ||||
| 	return 1; | ||||
| } | ||||
| static inline void pci_free_irq_vectors(struct pci_dev *dev) | ||||
| { | ||||
| } | ||||
| 
 | ||||
| static inline int pci_irq_vector(struct pci_dev *dev, unsigned int nr) | ||||
| { | ||||
| 	if (WARN_ON_ONCE(nr > 0)) | ||||
| 		return -EINVAL; | ||||
| 	return dev->irq; | ||||
| } | ||||
| #endif | ||||
| 
 | ||||
| #ifdef CONFIG_PCIEPORTBUS | ||||
|  |  | |||
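The mapping performed by pci_irq_vector() in the hunk above can be modelled in userspace as follows. This is a simplified sketch, not the kernel implementation: `struct model_irq_dev`, `enum model_mode` and `model_irq_vector()` are hypothetical names, and an array of IRQ numbers replaces the kernel's list of MSI descriptors. It shows why drivers no longer need their own msix_entry array: MSI and legacy IRQ numbers are consecutive from dev->irq, while MSI-X numbers are looked up per entry.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical interrupt modes for the model. */
enum model_mode { MODEL_LEGACY, MODEL_MSI, MODEL_MSIX };

/* Hypothetical stand-in for the parts of struct pci_dev that
 * pci_irq_vector() consults. */
struct model_irq_dev {
	enum model_mode mode;
	int irq;			/* legacy line, or first MSI vector */
	unsigned int nvec_used;		/* MSI: consecutive vectors in use */
	const int *msix_irqs;		/* MSI-X: Linux IRQ per table entry */
	size_t msix_count;		/* MSI-X: number of entries */
};

/* Translate a 0-based device-relative vector index to a Linux IRQ
 * number, or return -EINVAL if the index is out of range. */
static int model_irq_vector(const struct model_irq_dev *dev, unsigned int nr)
{
	switch (dev->mode) {
	case MODEL_MSIX:
		/* MSI-X IRQ numbers need not be consecutive: look them up. */
		if (nr >= dev->msix_count)
			return -EINVAL;
		return dev->msix_irqs[nr];
	case MODEL_MSI:
		/* MSI vectors are allocated consecutively from dev->irq. */
		if (nr >= dev->nvec_used)
			return -EINVAL;
		return dev->irq + (int)nr;
	default:
		/* Legacy mode has exactly one vector, index 0. */
		if (nr > 0)
			return -EINVAL;
		return dev->irq;
	}
}
```

The kernel version additionally emits WARN_ON_ONCE() for an out-of-range index, which the model leaves out.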
	 Christoph Hellwig