.. SPDX-License-Identifier: GPL-2.0+

======================================================
IBM Virtual Management Channel Kernel Driver (IBMVMC)
======================================================

:Authors:
	Dave Engebretsen <engebret@us.ibm.com>,
	Adam Reznechek <adreznec@linux.vnet.ibm.com>,
	Steven Royer <seroyer@linux.vnet.ibm.com>,
	Bryant G. Ly <bryantly@linux.vnet.ibm.com>,

Introduction
============

Note: Knowledge of virtualization technology is required to understand
this document.

A good reference document is:

https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf

The Virtual Management Channel (VMC) is a logical device which provides an
interface between the hypervisor and a management partition. This interface
resembles a message-passing interface. The management partition is intended
to provide an alternative to Hardware Management Console (HMC)-based
system management.

The primary hardware management solution developed by IBM relies on an
appliance server named the Hardware Management Console (HMC), packaged as
an external tower or rack-mounted personal computer. In a Power Systems
environment, a single HMC can manage multiple POWER processor-based
systems.

Management Application
----------------------

In the management partition, a management application exists which enables
a system administrator to configure the system's partitioning
characteristics via a command line interface (CLI) or Representational
State Transfer (REST) APIs.

The management application runs on a Linux logical partition on a
POWER8 or newer processor-based server that is virtualized by PowerVM.
System configuration, maintenance, and control functions which
traditionally require an HMC can be implemented in the management
application using a combination of HMC to hypervisor interfaces and
existing operating system methods. This tool provides a subset of the
functions implemented by the HMC and enables basic partition configuration.
The set of HMC to hypervisor messages supported by the management
application component is passed to the hypervisor over a VMC interface,
which is defined below.

The VMC enables the management partition to provide basic partitioning
functions:

- Logical partitioning configuration
- Start and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual storage
- Basic system management

Virtual Management Channel (VMC)
--------------------------------

A logical device, called the Virtual Management Channel (VMC), is defined
for communicating between the management application and the hypervisor. It
creates the pipes that enable virtualization management software. This
device is presented to a designated management partition as a virtual
device.

This communication device uses the Command/Response Queue (CRQ) and
Remote Direct Memory Access (RDMA) interfaces. A three-way handshake is
defined that must take place to establish that both the hypervisor and
management partition sides of the channel are running prior to
sending/receiving any of the protocol messages.

This driver also utilizes Transport Event CRQs. These messages are sent
when the hypervisor detects that one of the peer partitions has abnormally
terminated, or that one side has called H_FREE_CRQ to close its CRQ.
Two new classes of CRQ messages are introduced for the VMC device. VMC
Administrative messages are used for each partition using the VMC to
communicate capabilities to its partner. HMC Interface messages are used
for the actual flow of HMC messages between the management partition and
the hypervisor. As most HMC messages far exceed the size of a CRQ buffer,
a virtual DMA (RDMA) of the HMC message data is done prior to each HMC
Interface CRQ message. Only the management partition drives RDMA
operations; the hypervisor never directly causes the movement of message
data.


Terminology
-----------

RDMA
        Remote Direct Memory Access is a DMA transfer from the server to its
        client or from the server to its partner partition. DMA refers
        to both physical I/O to and from memory operations and to memory
        to memory move operations.
CRQ
        Command/Response Queue is a facility which is used to communicate
        between partner partitions. Transport events which are signaled
        from the hypervisor to the partition are also reported in this queue.

Example Management Partition VMC Driver Interface
=================================================

This section provides an example for the management application
implementation where a device driver is used to interface to the VMC
device. This driver consists of a new device, for example /dev/ibmvmc,
which provides interfaces to open, close, read, write, and perform
ioctls against the VMC device.

VMC Interface Initialization
----------------------------

The device driver is responsible for initializing the VMC when the driver
is loaded. It first creates and initializes the CRQ. Next, an exchange of
VMC capabilities is performed to indicate the code version and number of
resources available in both the management partition and the hypervisor.
Finally, the hypervisor requests that the management partition create an
initial pool of VMC buffers, one buffer for each possible HMC connection,
which will be used for management application session initialization.
Prior to completion of this initialization sequence, the device returns
EBUSY to open() calls. EIO is returned for all other open() failures.

::

        Management Partition           Hypervisor
                        CRQ INIT
        ---------------------------------------->
                    CRQ INIT COMPLETE
        <----------------------------------------
                      CAPABILITIES
        ---------------------------------------->
                  CAPABILITIES RESPONSE
        <----------------------------------------
               ADD BUFFER (HMC IDX=0,1,..)         _
        <----------------------------------------  |
                   ADD BUFFER RESPONSE             | - Perform # HMCs Iterations
        ---------------------------------------->  -

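As a minimal user-space sketch of the EBUSY behavior described above, an
application might retry open() a bounded number of times while the driver
is still initializing and give up immediately on any other error. The
helper name example_open_vmc, the retry count, and the delay are
illustrative assumptions and are not part of the driver interface; the
device path is supplied by the caller.

.. code-block:: c

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Open the VMC device, retrying while open() fails with EBUSY
     * (initialization not yet complete). Any other failure is returned
     * to the caller at once. Returns a file descriptor, or -1 with
     * errno set (EBUSY if every attempt was busy, EINVAL if max_tries
     * is not positive).
     */
    static int example_open_vmc(const char *path, int max_tries,
                                unsigned int delay_s)
    {
        int i;

        for (i = 0; i < max_tries; i++) {
            int fd = open(path, O_RDWR);

            if (fd >= 0 || errno != EBUSY)
                return fd;
            sleep(delay_s);     /* arbitrary back-off between attempts */
        }
        errno = (max_tries > 0) ? EBUSY : EINVAL;
        return -1;
    }

A caller could, for example, use example_open_vmc("/dev/ibmvmc", 10, 1)
and treat a final EBUSY as "driver not ready" rather than as a device
fault.
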
VMC Interface Open
------------------

After the basic VMC channel has been initialized, an HMC session level
connection can be established. The application layer performs an open() to
the VMC device and executes an ioctl() against it, indicating the HMC ID
(32 bytes of data) for this session. If the VMC device is in an invalid
state, EIO will be returned for the ioctl(). The device driver creates a
new HMC session value (ranging from 1 to 255) and HMC index value (starting
at index 0 and ranging to 254) for this HMC ID. The driver then does an
RDMA of the HMC ID to the hypervisor, and then sends an Interface Open
message to the hypervisor to establish the session over the VMC. After the
hypervisor receives this information, it sends Add Buffer messages to the
management partition to seed an initial pool of buffers for the new HMC
connection. Finally, the hypervisor sends an Interface Open Response
message to indicate that it is ready for normal runtime messaging. The
following illustrates this VMC flow:

::

        Management Partition             Hypervisor
                       RDMA HMC ID
        ---------------------------------------->
                     Interface Open
        ---------------------------------------->
                       Add Buffer                  _
        <----------------------------------------  |
                   Add Buffer Response             | - Perform N Iterations
        ---------------------------------------->  -
                 Interface Open Response
        <----------------------------------------

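The application side of this flow might look roughly like the sketch
below: build a 32-byte HMC ID buffer and hand it to the driver with
ioctl(). Several points are assumptions made for illustration. The helper
names and EXAMPLE_HMC_ID_LEN are hypothetical, zero-padding of short IDs
is assumed rather than specified by this document, and the ioctl request
code is deliberately left as a parameter; take it from the driver's
header in your kernel tree rather than from this example.

.. code-block:: c

    #include <errno.h>
    #include <string.h>
    #include <sys/ioctl.h>

    #define EXAMPLE_HMC_ID_LEN 32   /* "32 bytes of data" per the text */

    /*
     * Copy a NUL-terminated ID into a zero-padded 32-byte buffer.
     * Returns 0 on success, -1 if the ID does not fit.
     */
    static int example_pad_hmc_id(const char *id,
                                  unsigned char out[EXAMPLE_HMC_ID_LEN])
    {
        size_t len = strlen(id);

        if (len > EXAMPLE_HMC_ID_LEN)
            return -1;
        memset(out, 0, EXAMPLE_HMC_ID_LEN);
        memcpy(out, id, len);
        return 0;
    }

    /*
     * Pass the HMC ID for this session to the driver. "request" is the
     * driver's set-HMC-ID ioctl code, supplied by the caller. Returns 0
     * on success, -1 with errno set (EINVAL for an oversized ID,
     * otherwise whatever ioctl() reported, such as EIO).
     */
    static int example_set_hmc_id(int fd, unsigned long request,
                                  const char *id)
    {
        unsigned char hmc_id[EXAMPLE_HMC_ID_LEN];

        if (example_pad_hmc_id(id, hmc_id) < 0) {
            errno = EINVAL;
            return -1;
        }
        return ioctl(fd, request, hmc_id) < 0 ? -1 : 0;
    }

Keeping the padding step separate from the ioctl() call makes the ID
handling easy to check without access to a VMC device.
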
VMC Interface Runtime
---------------------

During normal runtime, the management application and the hypervisor
exchange HMC messages via the Signal VMC message and RDMA operations. When
sending data to the hypervisor, the management application performs a
write() to the VMC device, and the driver RDMAs the data to the hypervisor
and then sends a Signal Message. If a write() is attempted before VMC
device buffers have been made available by the hypervisor, or no buffers
are currently available, EBUSY is returned in response to the write(). A
write() will return EIO for all other errors, such as an invalid device
state. When the hypervisor sends a message to the management partition,
the data is put into a VMC buffer and a Signal Message is sent to the VMC
driver in the management partition. The driver RDMAs the buffer into the
partition and passes the data up to the appropriate management application
via a read() to the VMC device. The read() request blocks if there is no
buffer available to read. The management application may use select() to
wait for the VMC device to become ready with data to read.

::

        Management Partition             Hypervisor
                        MSG RDMA
        ---------------------------------------->
                       SIGNAL MSG
        ---------------------------------------->
                       SIGNAL MSG
        <----------------------------------------
                        MSG RDMA
        <----------------------------------------

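The runtime calls described above can be sketched from the application's
point of view as follows: a bounded retry around write() for the EBUSY
case, and a select() wrapper to wait for readable data before calling
read(). The function names, the retry limit, and the delay are
illustrative assumptions; only write(), read(), select(), and the
EBUSY/EIO behavior come from the description above.

.. code-block:: c

    #include <errno.h>
    #include <sys/select.h>
    #include <unistd.h>

    /*
     * Wait until fd has data to read. Returns 1 if readable, 0 on
     * timeout, -1 on error (errno set by select()).
     */
    static int example_wait_readable(int fd, int timeout_s)
    {
        struct timeval tv = { .tv_sec = timeout_s, .tv_usec = 0 };
        fd_set rfds;
        int rc;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0)
            return -1;
        return rc > 0 && FD_ISSET(fd, &rfds);
    }

    /*
     * Send one message, retrying while write() fails with EBUSY (no
     * VMC buffer available yet). Other errors, such as EIO, are
     * returned immediately. Returns the write() result, or -1 with
     * errno set to EBUSY after max_tries busy attempts (EINVAL if
     * max_tries is not positive).
     */
    static ssize_t example_send(int fd, const void *msg, size_t len,
                                int max_tries, unsigned int delay_s)
    {
        int i;

        for (i = 0; i < max_tries; i++) {
            ssize_t n = write(fd, msg, len);

            if (n >= 0 || errno != EBUSY)
                return n;
            sleep(delay_s);     /* arbitrary back-off between attempts */
        }
        errno = (max_tries > 0) ? EBUSY : EINVAL;
        return -1;
    }

An application would typically call example_wait_readable() in its event
loop and issue read() only when it returns 1, so that the blocking read()
described above never stalls the loop.
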
VMC Interface Close
-------------------

HMC session level connections are closed by the management partition when
the application layer performs a close() against the device. This action
results in an Interface Close message flowing to the hypervisor, which
causes the session to be terminated. The device driver must free any
storage allocated for buffers for this HMC connection.

::

        Management Partition             Hypervisor
                     INTERFACE CLOSE
        ---------------------------------------->
                INTERFACE CLOSE RESPONSE
        <----------------------------------------

Additional Information
======================

For more information on CRQ messages, VMC messages, HMC interface buffers,
and signal messages, please refer to Section F of the Linux on Power
Architecture Platform Reference.