forked from mirrors/linux
		
	ARM: 7825/1: document the use of NEON in kernel mode
Add a file to Documentation/arm explaining how kernel mode NEON is supposed to be used. Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This commit is contained in:
		
							parent
							
								
									83d26d1113
								
							
						
					
					
						commit
						2afd0a0524
					
				
					 1 changed files with 121 additions and 0 deletions
				
			
		
							
								
								
									
										121
									
								
								Documentation/arm/kernel_mode_neon.txt
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										121
									
								
								Documentation/arm/kernel_mode_neon.txt
									
									
									
									
									
										Normal file
									
								
							| 
						 | 
					@ -0,0 +1,121 @@
 | 
				
			||||||
 | 
					Kernel mode NEON
 | 
				
			||||||
 | 
					================
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					TL;DR summary
 | 
				
			||||||
 | 
					-------------
 | 
				
			||||||
 | 
					* Use only NEON instructions, or VFP instructions that don't rely on support
 | 
				
			||||||
 | 
					  code
 | 
				
			||||||
 | 
					* Isolate your NEON code in a separate compilation unit, and compile it with
 | 
				
			||||||
 | 
					  '-mfpu=neon -mfloat-abi=softfp'
 | 
				
			||||||
 | 
					* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
 | 
				
			||||||
 | 
					  NEON code
 | 
				
			||||||
 | 
					* Don't sleep in your NEON code, and be aware that it will be executed with
 | 
				
			||||||
 | 
					  preemption disabled
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Introduction
 | 
				
			||||||
 | 
					------------
 | 
				
			||||||
 | 
					It is possible to use NEON instructions (and in some cases, VFP instructions) in
 | 
				
			||||||
 | 
					code that runs in kernel mode. However, for performance reasons, the NEON/VFP
 | 
				
			||||||
 | 
					register file is not preserved and restored at every context switch or taken
 | 
				
			||||||
 | 
					exception like the normal register file is, so some manual intervention is
 | 
				
			||||||
 | 
					required. Furthermore, special care is required for code that may sleep [i.e.,
 | 
				
			||||||
 | 
					may call schedule()], as NEON or VFP instructions will be executed in a
 | 
				
			||||||
 | 
					non-preemptible section for reasons outlined below.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Lazy preserve and restore
 | 
				
			||||||
 | 
					-------------------------
 | 
				
			||||||
 | 
					The NEON/VFP register file is managed using lazy preserve (on UP systems) and
 | 
				
			||||||
 | 
					lazy restore (on both SMP and UP systems). This means that the register file is
 | 
				
			||||||
 | 
					kept 'live', and is only preserved and restored when multiple tasks are
 | 
				
			||||||
 | 
					contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
 | 
				
			||||||
 | 
					another core). Lazy restore is implemented by disabling the NEON/VFP unit after
 | 
				
			||||||
 | 
					every context switch, resulting in a trap when subsequently a NEON/VFP
 | 
				
			||||||
 | 
					instruction is issued, allowing the kernel to step in and perform the restore if
 | 
				
			||||||
 | 
					necessary.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
 | 
				
			||||||
 | 
					it is required to do an 'eager' preserve of the NEON/VFP register file, and
 | 
				
			||||||
 | 
					enable the NEON/VFP unit explicitly so no exceptions are generated on first
 | 
				
			||||||
 | 
					subsequent use. This is handled by the function kernel_neon_begin(), which
 | 
				
			||||||
 | 
					should be called before any kernel mode NEON or VFP instructions are issued.
 | 
				
			||||||
 | 
					Likewise, the NEON/VFP unit should be disabled again after use to make sure user
 | 
				
			||||||
 | 
					mode will hit the lazy restore trap upon next use. This is handled by the
 | 
				
			||||||
 | 
					function kernel_neon_end().
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Interruptions in kernel mode
 | 
				
			||||||
 | 
					----------------------------
 | 
				
			||||||
 | 
					For reasons of performance and simplicity, it was decided that there shall be no
 | 
				
			||||||
 | 
					preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
 | 
				
			||||||
 | 
					implies that interruptions of a kernel mode NEON section can only be allowed if
 | 
				
			||||||
 | 
					they are guaranteed not to touch the NEON/VFP registers. For this reason, the
 | 
				
			||||||
 | 
					following rules and restrictions apply in the kernel:
 | 
				
			||||||
 | 
					* NEON/VFP code is not allowed in interrupt context;
 | 
				
			||||||
 | 
					* NEON/VFP code is not allowed to sleep;
 | 
				
			||||||
 | 
					* NEON/VFP code is executed with preemption disabled.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If latency is a concern, it is possible to put back to back calls to
 | 
				
			||||||
 | 
					kernel_neon_end() and kernel_neon_begin() in places in your code where none of
 | 
				
			||||||
 | 
					the NEON registers are live. (Additional calls to kernel_neon_begin() should be
 | 
				
			||||||
 | 
					reasonably cheap if no context switch occurred in the meantime)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					VFP and support code
 | 
				
			||||||
 | 
					--------------------
 | 
				
			||||||
 | 
					Earlier versions of VFP (prior to version 3) rely on software support for things
 | 
				
			||||||
 | 
					like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
 | 
				
			||||||
 | 
					software assistance, it signals the kernel by raising an undefined instruction
 | 
				
			||||||
 | 
					exception. The kernel responds by inspecting the VFP control registers and the
 | 
				
			||||||
 | 
					current instruction and arguments, and emulates the instruction in software.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Such software assistance is currently not implemented for VFP instructions
 | 
				
			||||||
 | 
					executed in kernel mode. If such a condition is encountered, the kernel will
 | 
				
			||||||
 | 
					fail and generate an OOPS.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Separating NEON code from ordinary code
 | 
				
			||||||
 | 
					---------------------------------------
 | 
				
			||||||
 | 
					The compiler is not aware of the special significance of kernel_neon_begin() and
 | 
				
			||||||
 | 
					kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
 | 
				
			||||||
 | 
					between calls to these respective functions. Furthermore, GCC may generate NEON
 | 
				
			||||||
 | 
					instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
 | 
				
			||||||
 | 
					kernel is currently compiled at -O2, future changes may result in NEON/VFP
 | 
				
			||||||
 | 
					instructions appearing in unexpected places if no special care is taken.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Therefore, the recommended and only supported way of using NEON/VFP in the
 | 
				
			||||||
 | 
					kernel is by adhering to the following rules:
 | 
				
			||||||
 | 
					* isolate the NEON code in a separate compilation unit and compile it with
 | 
				
			||||||
 | 
					  '-mfpu=neon -mfloat-abi=softfp';
 | 
				
			||||||
 | 
					* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
 | 
				
			||||||
 | 
					  into the unit containing the NEON code from a compilation unit which is *not*
 | 
				
			||||||
 | 
					  built with the GCC flag '-mfpu=neon' set.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					As the kernel is compiled with '-msoft-float', the above will guarantee that
 | 
				
			||||||
 | 
					both NEON and VFP instructions will only ever appear in designated compilation
 | 
				
			||||||
 | 
					units at any optimization level.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					NEON assembler
 | 
				
			||||||
 | 
					--------------
 | 
				
			||||||
 | 
					NEON assembler is supported with no additional caveats as long as the rules
 | 
				
			||||||
 | 
					above are followed.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					NEON code generated by GCC
 | 
				
			||||||
 | 
					--------------------------
 | 
				
			||||||
 | 
					The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
 | 
				
			||||||
 | 
					parallelism, and generates NEON code from ordinary C source code. This is fully
 | 
				
			||||||
 | 
					supported as long as the rules above are followed.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					NEON intrinsics
 | 
				
			||||||
 | 
					---------------
 | 
				
			||||||
 | 
					NEON intrinsics are also supported. However, as code using NEON intrinsics
 | 
				
			||||||
 | 
					relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
 | 
				
			||||||
 | 
					observe the following in addition to the rules above:
 | 
				
			||||||
 | 
					* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
 | 
				
			||||||
 | 
					  uses its builtin version of <stdint.h> (this is a C99 header which the kernel
 | 
				
			||||||
 | 
					  does not supply);
 | 
				
			||||||
 | 
					* Include <arm_neon.h> last, or at least after <linux/types.h>
 | 
				
			||||||
		Loading…
	
		Reference in a new issue