mirror of
				https://github.com/torvalds/linux.git
				synced 2025-11-04 10:40:15 +02:00 
			
		
		
		
	Fix typos in Documentation. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230814212822.193684-4-helgaas@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
		
			
				
	
	
		
			662 lines
		
	
	
	
		
			22 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			662 lines
		
	
	
	
		
			22 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
.. SPDX-License-Identifier: GPL-2.0
 | 
						|
 | 
						|
===================================
 | 
						|
Cache on Already Mounted Filesystem
 | 
						|
===================================
 | 
						|
 | 
						|
.. Contents:
 | 
						|
 | 
						|
 (*) Overview.
 | 
						|
 | 
						|
 (*) Requirements.
 | 
						|
 | 
						|
 (*) Configuration.
 | 
						|
 | 
						|
 (*) Starting the cache.
 | 
						|
 | 
						|
 (*) Things to avoid.
 | 
						|
 | 
						|
 (*) Cache culling.
 | 
						|
 | 
						|
 (*) Cache structure.
 | 
						|
 | 
						|
 (*) Security model and SELinux.
 | 
						|
 | 
						|
 (*) A note on security.
 | 
						|
 | 
						|
 (*) Statistical information.
 | 
						|
 | 
						|
 (*) Debugging.
 | 
						|
 | 
						|
 (*) On-demand Read.
 | 
						|
 | 
						|
 | 
						|
Overview
 | 
						|
========
 | 
						|
 | 
						|
CacheFiles is a caching backend that's meant to use as a cache a directory on
 | 
						|
an already mounted filesystem of a local type (such as Ext3).
 | 
						|
 | 
						|
CacheFiles uses a userspace daemon to do some of the cache management - such as
 | 
						|
reaping stale nodes and culling.  This is called cachefilesd and lives in
 | 
						|
/sbin.
 | 
						|
 | 
						|
The filesystem and data integrity of the cache are only as good as those of the
 | 
						|
filesystem providing the backing services.  Note that CacheFiles does not
 | 
						|
attempt to journal anything since the journalling interfaces of the various
 | 
						|
filesystems are very specific in nature.
 | 
						|
 | 
						|
CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
 | 
						|
to communication with the daemon.  Only one thing may have this open at once,
 | 
						|
and while it is open, a cache is at least partially in existence.  The daemon
 | 
						|
opens this and sends commands down it to control the cache.
 | 
						|
 | 
						|
CacheFiles is currently limited to a single cache.
 | 
						|
 | 
						|
CacheFiles attempts to maintain at least a certain percentage of free space on
 | 
						|
the filesystem, shrinking the cache by culling the objects it contains to make
 | 
						|
space if necessary - see the "Cache Culling" section.  This means it can be
 | 
						|
placed on the same medium as a live set of data, and will expand to make use of
 | 
						|
spare space and automatically contract when the set of data requires more
 | 
						|
space.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
Requirements
 | 
						|
============
 | 
						|
 | 
						|
The use of CacheFiles and its daemon requires the following features to be
 | 
						|
available in the system and in the cache filesystem:
 | 
						|
 | 
						|
	- dnotify.
 | 
						|
 | 
						|
	- extended attributes (xattrs).
 | 
						|
 | 
						|
	- openat() and friends.
 | 
						|
 | 
						|
	- bmap() support on files in the filesystem (FIBMAP ioctl).
 | 
						|
 | 
						|
	- The use of bmap() to detect a partial page at the end of the file.
 | 
						|
 | 
						|
It is strongly recommended that the "dir_index" option is enabled on Ext3
 | 
						|
filesystems being used as a cache.
 | 
						|
 | 
						|
 | 
						|
Configuration
 | 
						|
=============
 | 
						|
 | 
						|
The cache is configured by a script in /etc/cachefilesd.conf.  These commands
 | 
						|
set up cache ready for use.  The following script commands are available:
 | 
						|
 | 
						|
 brun <N>%, bcull <N>%, bstop <N>%, frun <N>%, fcull <N>%, fstop <N>%
 | 
						|
	Configure the culling limits.  Optional.  See the section on culling
 | 
						|
	The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
 | 
						|
 | 
						|
	The commands beginning with a 'b' are file space (block) limits, those
 | 
						|
	beginning with an 'f' are file count limits.
 | 
						|
 | 
						|
 dir <path>
 | 
						|
	Specify the directory containing the root of the cache.  Mandatory.
 | 
						|
 | 
						|
 tag <name>
 | 
						|
	Specify a tag to FS-Cache to use in distinguishing multiple caches.
 | 
						|
	Optional.  The default is "CacheFiles".
 | 
						|
 | 
						|
 debug <mask>
 | 
						|
	Specify a numeric bitmask to control debugging in the kernel module.
 | 
						|
	Optional.  The default is zero (all off).  The following values can be
 | 
						|
	OR'd into the mask to collect various information:
 | 
						|
 | 
						|
		==	=================================================
 | 
						|
		1	Turn on trace of function entry (_enter() macros)
 | 
						|
		2	Turn on trace of function exit (_leave() macros)
 | 
						|
		4	Turn on trace of internal debug points (_debug())
 | 
						|
		==	=================================================
 | 
						|
 | 
						|
	This mask can also be set through sysfs, eg::
 | 
						|
 | 
						|
		echo 5 >/sys/modules/cachefiles/parameters/debug
 | 
						|
 | 
						|
 | 
						|
Starting the Cache
 | 
						|
==================
 | 
						|
 | 
						|
The cache is started by running the daemon.  The daemon opens the cache device,
 | 
						|
configures the cache and tells it to begin caching.  At that point the cache
 | 
						|
binds to fscache and the cache becomes live.
 | 
						|
 | 
						|
The daemon is run as follows::
 | 
						|
 | 
						|
	/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
 | 
						|
 | 
						|
The flags are:
 | 
						|
 | 
						|
 ``-d``
 | 
						|
	Increase the debugging level.  This can be specified multiple times and
 | 
						|
	is cumulative with itself.
 | 
						|
 | 
						|
 ``-s``
 | 
						|
	Send messages to stderr instead of syslog.
 | 
						|
 | 
						|
 ``-n``
 | 
						|
	Don't daemonise and go into background.
 | 
						|
 | 
						|
 ``-f <configfile>``
 | 
						|
	Use an alternative configuration file rather than the default one.
 | 
						|
 | 
						|
 | 
						|
Things to Avoid
 | 
						|
===============
 | 
						|
 | 
						|
Do not mount other things within the cache as this will cause problems.  The
 | 
						|
kernel module contains its own very cut-down path walking facility that ignores
 | 
						|
mountpoints, but the daemon can't avoid them.
 | 
						|
 | 
						|
Do not create, rename or unlink files and directories in the cache while the
 | 
						|
cache is active, as this may cause the state to become uncertain.
 | 
						|
 | 
						|
Renaming files in the cache might make objects appear to be other objects (the
 | 
						|
filename is part of the lookup key).
 | 
						|
 | 
						|
Do not change or remove the extended attributes attached to cache files by the
 | 
						|
cache as this will cause the cache state management to get confused.
 | 
						|
 | 
						|
Do not create files or directories in the cache, lest the cache get confused or
 | 
						|
serve incorrect data.
 | 
						|
 | 
						|
Do not chmod files in the cache.  The module creates things with minimal
 | 
						|
permissions to prevent random users being able to access them directly.
 | 
						|
 | 
						|
 | 
						|
Cache Culling
 | 
						|
=============
 | 
						|
 | 
						|
The cache may need culling occasionally to make space.  This involves
 | 
						|
discarding objects from the cache that have been used less recently than
 | 
						|
anything else.  Culling is based on the access time of data objects.  Empty
 | 
						|
directories are culled if not in use.
 | 
						|
 | 
						|
Cache culling is done on the basis of the percentage of blocks and the
 | 
						|
percentage of files available in the underlying filesystem.  There are six
 | 
						|
"limits":
 | 
						|
 | 
						|
 brun, frun
 | 
						|
     If the amount of free space and the number of available files in the cache
 | 
						|
     rises above both these limits, then culling is turned off.
 | 
						|
 | 
						|
 bcull, fcull
 | 
						|
     If the amount of available space or the number of available files in the
 | 
						|
     cache falls below either of these limits, then culling is started.
 | 
						|
 | 
						|
 bstop, fstop
 | 
						|
     If the amount of available space or the number of available files in the
 | 
						|
     cache falls below either of these limits, then no further allocation of
 | 
						|
     disk space or files is permitted until culling has raised things above
 | 
						|
     these limits again.
 | 
						|
 | 
						|
These must be configured thusly::
 | 
						|
 | 
						|
	0 <= bstop < bcull < brun < 100
 | 
						|
	0 <= fstop < fcull < frun < 100
 | 
						|
 | 
						|
Note that these are percentages of available space and available files, and do
 | 
						|
_not_ appear as 100 minus the percentage displayed by the "df" program.
 | 
						|
 | 
						|
The userspace daemon scans the cache to build up a table of cullable objects.
 | 
						|
These are then culled in least recently used order.  A new scan of the cache is
 | 
						|
started as soon as space is made in the table.  Objects will be skipped if
 | 
						|
their atimes have changed or if the kernel module says it is still using them.
 | 
						|
 | 
						|
 | 
						|
Cache Structure
 | 
						|
===============
 | 
						|
 | 
						|
The CacheFiles module will create two directories in the directory it was
 | 
						|
given:
 | 
						|
 | 
						|
 * cache/
 | 
						|
 * graveyard/
 | 
						|
 | 
						|
The active cache objects all reside in the first directory.  The CacheFiles
 | 
						|
kernel module moves any retired or culled objects that it can't simply unlink
 | 
						|
to the graveyard from which the daemon will actually delete them.
 | 
						|
 | 
						|
The daemon uses dnotify to monitor the graveyard directory, and will delete
 | 
						|
anything that appears therein.
 | 
						|
 | 
						|
 | 
						|
The module represents index objects as directories with the filename "I..." or
 | 
						|
"J...".  Note that the "cache/" directory is itself a special index.
 | 
						|
 | 
						|
Data objects are represented as files if they have no children, or directories
 | 
						|
if they do.  Their filenames all begin "D..." or "E...".  If represented as a
 | 
						|
directory, data objects will have a file in the directory called "data" that
 | 
						|
actually holds the data.
 | 
						|
 | 
						|
Special objects are similar to data objects, except their filenames begin
 | 
						|
"S..." or "T...".
 | 
						|
 | 
						|
 | 
						|
If an object has children, then it will be represented as a directory.
 | 
						|
Immediately in the representative directory are a collection of directories
 | 
						|
named for hash values of the child object keys with an '@' prepended.  Into
 | 
						|
this directory, if possible, will be placed the representations of the child
 | 
						|
objects::
 | 
						|
 | 
						|
	 /INDEX    /INDEX     /INDEX                            /DATA FILES
 | 
						|
	/=========/==========/=================================/================
 | 
						|
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
 | 
						|
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
 | 
						|
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
 | 
						|
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
 | 
						|
 | 
						|
 | 
						|
If the key is so long that it exceeds NAME_MAX with the decorations added on to
 | 
						|
it, then it will be cut into pieces, the first few of which will be used to
 | 
						|
make a nest of directories, and the last one of which will be the objects
 | 
						|
inside the last directory.  The names of the intermediate directories will have
 | 
						|
'+' prepended::
 | 
						|
 | 
						|
	J1223/@23/+xy...z/+kl...m/Epqr
 | 
						|
 | 
						|
 | 
						|
Note that keys are raw data, and not only may they exceed NAME_MAX in size,
 | 
						|
they may also contain things like '/' and NUL characters, and so they may not
 | 
						|
be suitable for turning directly into a filename.
 | 
						|
 | 
						|
To handle this, CacheFiles will use a suitably printable filename directly and
 | 
						|
"base-64" encode ones that aren't directly suitable.  The two versions of
 | 
						|
object filenames indicate the encoding:
 | 
						|
 | 
						|
	===============	===============	===============
 | 
						|
	OBJECT TYPE	PRINTABLE	ENCODED
 | 
						|
	===============	===============	===============
 | 
						|
	Index		"I..."		"J..."
 | 
						|
	Data		"D..."		"E..."
 | 
						|
	Special		"S..."		"T..."
 | 
						|
	===============	===============	===============
 | 
						|
 | 
						|
Intermediate directories are always "@" or "+" as appropriate.
 | 
						|
 | 
						|
 | 
						|
Each object in the cache has an extended attribute label that holds the object
 | 
						|
type ID (required to distinguish special objects) and the auxiliary data from
 | 
						|
the netfs.  The latter is used to detect stale objects in the cache and update
 | 
						|
or retire them.
 | 
						|
 | 
						|
 | 
						|
Note that CacheFiles will erase from the cache any file it doesn't recognise or
 | 
						|
any file of an incorrect type (such as a FIFO file or a device file).
 | 
						|
 | 
						|
 | 
						|
Security Model and SELinux
 | 
						|
==========================
 | 
						|
 | 
						|
CacheFiles is implemented to deal properly with the LSM security features of
 | 
						|
the Linux kernel and the SELinux facility.
 | 
						|
 | 
						|
One of the problems that CacheFiles faces is that it is generally acting on
 | 
						|
behalf of a process, and running in that process's context, and that includes a
 | 
						|
security context that is not appropriate for accessing the cache - either
 | 
						|
because the files in the cache are inaccessible to that process, or because if
 | 
						|
the process creates a file in the cache, that file may be inaccessible to other
 | 
						|
processes.
 | 
						|
 | 
						|
The way CacheFiles works is to temporarily change the security context (fsuid,
 | 
						|
fsgid and actor security label) that the process acts as - without changing the
 | 
						|
security context of the process when it the target of an operation performed by
 | 
						|
some other process (so signalling and suchlike still work correctly).
 | 
						|
 | 
						|
 | 
						|
When the CacheFiles module is asked to bind to its cache, it:
 | 
						|
 | 
						|
 (1) Finds the security label attached to the root cache directory and uses
 | 
						|
     that as the security label with which it will create files.  By default,
 | 
						|
     this is::
 | 
						|
 | 
						|
	cachefiles_var_t
 | 
						|
 | 
						|
 (2) Finds the security label of the process which issued the bind request
 | 
						|
     (presumed to be the cachefilesd daemon), which by default will be::
 | 
						|
 | 
						|
	cachefilesd_t
 | 
						|
 | 
						|
     and asks LSM to supply a security ID as which it should act given the
 | 
						|
     daemon's label.  By default, this will be::
 | 
						|
 | 
						|
	cachefiles_kernel_t
 | 
						|
 | 
						|
     SELinux transitions the daemon's security ID to the module's security ID
 | 
						|
     based on a rule of this form in the policy::
 | 
						|
 | 
						|
	type_transition <daemon's-ID> kernel_t : process <module's-ID>;
 | 
						|
 | 
						|
     For instance::
 | 
						|
 | 
						|
	type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
 | 
						|
 | 
						|
 | 
						|
The module's security ID gives it permission to create, move and remove files
 | 
						|
and directories in the cache, to find and access directories and files in the
 | 
						|
cache, to set and access extended attributes on cache objects, and to read and
 | 
						|
write files in the cache.
 | 
						|
 | 
						|
The daemon's security ID gives it only a very restricted set of permissions: it
 | 
						|
may scan directories, stat files and erase files and directories.  It may
 | 
						|
not read or write files in the cache, and so it is precluded from accessing the
 | 
						|
data cached therein; nor is it permitted to create new files in the cache.
 | 
						|
 | 
						|
 | 
						|
There are policy source files available in:
 | 
						|
 | 
						|
	https://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
 | 
						|
 | 
						|
and later versions.  In that tarball, see the files::
 | 
						|
 | 
						|
	cachefilesd.te
 | 
						|
	cachefilesd.fc
 | 
						|
	cachefilesd.if
 | 
						|
 | 
						|
They are built and installed directly by the RPM.
 | 
						|
 | 
						|
If a non-RPM based system is being used, then copy the above files to their own
 | 
						|
directory and run::
 | 
						|
 | 
						|
	make -f /usr/share/selinux/devel/Makefile
 | 
						|
	semodule -i cachefilesd.pp
 | 
						|
 | 
						|
You will need checkpolicy and selinux-policy-devel installed prior to the
 | 
						|
build.
 | 
						|
 | 
						|
 | 
						|
By default, the cache is located in /var/fscache, but if it is desirable that
 | 
						|
it should be elsewhere, than either the above policy files must be altered, or
 | 
						|
an auxiliary policy must be installed to label the alternate location of the
 | 
						|
cache.
 | 
						|
 | 
						|
For instructions on how to add an auxiliary policy to enable the cache to be
 | 
						|
located elsewhere when SELinux is in enforcing mode, please see::
 | 
						|
 | 
						|
	/usr/share/doc/cachefilesd-*/move-cache.txt
 | 
						|
 | 
						|
When the cachefilesd rpm is installed; alternatively, the document can be found
 | 
						|
in the sources.
 | 
						|
 | 
						|
 | 
						|
A Note on Security
 | 
						|
==================
 | 
						|
 | 
						|
CacheFiles makes use of the split security in the task_struct.  It allocates
 | 
						|
its own task_security structure, and redirects current->cred to point to it
 | 
						|
when it acts on behalf of another process, in that process's context.
 | 
						|
 | 
						|
The reason it does this is that it calls vfs_mkdir() and suchlike rather than
 | 
						|
bypassing security and calling inode ops directly.  Therefore the VFS and LSM
 | 
						|
may deny the CacheFiles access to the cache data because under some
 | 
						|
circumstances the caching code is running in the security context of whatever
 | 
						|
process issued the original syscall on the netfs.
 | 
						|
 | 
						|
Furthermore, should CacheFiles create a file or directory, the security
 | 
						|
parameters with that object is created (UID, GID, security label) would be
 | 
						|
derived from that process that issued the system call, thus potentially
 | 
						|
preventing other processes from accessing the cache - including CacheFiles's
 | 
						|
cache management daemon (cachefilesd).
 | 
						|
 | 
						|
What is required is to temporarily override the security of the process that
 | 
						|
issued the system call.  We can't, however, just do an in-place change of the
 | 
						|
security data as that affects the process as an object, not just as a subject.
 | 
						|
This means it may lose signals or ptrace events for example, and affects what
 | 
						|
the process looks like in /proc.
 | 
						|
 | 
						|
So CacheFiles makes use of a logical split in the security between the
 | 
						|
objective security (task->real_cred) and the subjective security (task->cred).
 | 
						|
The objective security holds the intrinsic security properties of a process and
 | 
						|
is never overridden.  This is what appears in /proc, and is what is used when a
 | 
						|
process is the target of an operation by some other process (SIGKILL for
 | 
						|
example).
 | 
						|
 | 
						|
The subjective security holds the active security properties of a process, and
 | 
						|
may be overridden.  This is not seen externally, and is used when a process
 | 
						|
acts upon another object, for example SIGKILLing another process or opening a
 | 
						|
file.
 | 
						|
 | 
						|
LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
 | 
						|
for CacheFiles to run in a context of a specific security label, or to create
 | 
						|
files and directories with another security label.
 | 
						|
 | 
						|
 | 
						|
Statistical Information
 | 
						|
=======================
 | 
						|
 | 
						|
If FS-Cache is compiled with the following option enabled::
 | 
						|
 | 
						|
	CONFIG_CACHEFILES_HISTOGRAM=y
 | 
						|
 | 
						|
then it will gather certain statistics and display them through a proc file.
 | 
						|
 | 
						|
 /proc/fs/cachefiles/histogram
 | 
						|
 | 
						|
     ::
 | 
						|
 | 
						|
	cat /proc/fs/cachefiles/histogram
 | 
						|
	JIFS  SECS  LOOKUPS   MKDIRS    CREATES
 | 
						|
	===== ===== ========= ========= =========
 | 
						|
 | 
						|
     This shows the breakdown of the number of times each amount of time
 | 
						|
     between 0 jiffies and HZ-1 jiffies a variety of tasks took to run.  The
 | 
						|
     columns are as follows:
 | 
						|
 | 
						|
	=======		=======================================================
 | 
						|
	COLUMN		TIME MEASUREMENT
 | 
						|
	=======		=======================================================
 | 
						|
	LOOKUPS		Length of time to perform a lookup on the backing fs
 | 
						|
	MKDIRS		Length of time to perform a mkdir on the backing fs
 | 
						|
	CREATES		Length of time to perform a create on the backing fs
 | 
						|
	=======		=======================================================
 | 
						|
 | 
						|
     Each row shows the number of events that took a particular range of times.
 | 
						|
     Each step is 1 jiffy in size.  The JIFS column indicates the particular
 | 
						|
     jiffy range covered, and the SECS field the equivalent number of seconds.
 | 
						|
 | 
						|
 | 
						|
Debugging
 | 
						|
=========
 | 
						|
 | 
						|
If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
 | 
						|
debugging enabled by adjusting the value in::
 | 
						|
 | 
						|
	/sys/module/cachefiles/parameters/debug
 | 
						|
 | 
						|
This is a bitmask of debugging streams to enable:
 | 
						|
 | 
						|
	=======	=======	===============================	=======================
 | 
						|
	BIT	VALUE	STREAM				POINT
 | 
						|
	=======	=======	===============================	=======================
 | 
						|
	0	1	General				Function entry trace
 | 
						|
	1	2					Function exit trace
 | 
						|
	2	4					General
 | 
						|
	=======	=======	===============================	=======================
 | 
						|
 | 
						|
The appropriate set of values should be OR'd together and the result written to
 | 
						|
the control file.  For example::
 | 
						|
 | 
						|
	echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
 | 
						|
 | 
						|
will turn on all function entry debugging.
 | 
						|
 | 
						|
 | 
						|
On-demand Read
 | 
						|
==============
 | 
						|
 | 
						|
When working in its original mode, CacheFiles serves as a local cache for a
 | 
						|
remote networking fs - while in on-demand read mode, CacheFiles can boost the
 | 
						|
scenario where on-demand read semantics are needed, e.g. container image
 | 
						|
distribution.
 | 
						|
 | 
						|
The essential difference between these two modes is seen when a cache miss
 | 
						|
occurs: In the original mode, the netfs will fetch the data from the remote
 | 
						|
server and then write it to the cache file; in on-demand read mode, fetching
 | 
						|
the data and writing it into the cache is delegated to a user daemon.
 | 
						|
 | 
						|
``CONFIG_CACHEFILES_ONDEMAND`` should be enabled to support on-demand read mode.
 | 
						|
 | 
						|
 | 
						|
Protocol Communication
 | 
						|
----------------------
 | 
						|
 | 
						|
The on-demand read mode uses a simple protocol for communication between kernel
 | 
						|
and user daemon. The protocol can be modeled as::
 | 
						|
 | 
						|
	kernel --[request]--> user daemon --[reply]--> kernel
 | 
						|
 | 
						|
CacheFiles will send requests to the user daemon when needed.  The user daemon
 | 
						|
should poll the devnode ('/dev/cachefiles') to check if there's a pending
 | 
						|
request to be processed.  A POLLIN event will be returned when there's a pending
 | 
						|
request.
 | 
						|
 | 
						|
The user daemon then reads the devnode to fetch a request to process.  It should
 | 
						|
be noted that each read only gets one request. When it has finished processing
 | 
						|
the request, the user daemon should write the reply to the devnode.
 | 
						|
 | 
						|
Each request starts with a message header of the form::
 | 
						|
 | 
						|
	struct cachefiles_msg {
 | 
						|
		__u32 msg_id;
 | 
						|
		__u32 opcode;
 | 
						|
		__u32 len;
 | 
						|
		__u32 object_id;
 | 
						|
		__u8  data[];
 | 
						|
	};
 | 
						|
 | 
						|
where:
 | 
						|
 | 
						|
	* ``msg_id`` is a unique ID identifying this request among all pending
 | 
						|
	  requests.
 | 
						|
 | 
						|
	* ``opcode`` indicates the type of this request.
 | 
						|
 | 
						|
	* ``object_id`` is a unique ID identifying the cache file operated on.
 | 
						|
 | 
						|
	* ``data`` indicates the payload of this request.
 | 
						|
 | 
						|
	* ``len`` indicates the whole length of this request, including the
 | 
						|
	  header and following type-specific payload.
 | 
						|
 | 
						|
 | 
						|
Turning on On-demand Mode
 | 
						|
-------------------------
 | 
						|
 | 
						|
An optional parameter becomes available to the "bind" command::
 | 
						|
 | 
						|
	bind [ondemand]
 | 
						|
 | 
						|
When the "bind" command is given no argument, it defaults to the original mode.
 | 
						|
When it is given the "ondemand" argument, i.e. "bind ondemand", on-demand read
 | 
						|
mode will be enabled.
 | 
						|
 | 
						|
 | 
						|
The OPEN Request
 | 
						|
----------------
 | 
						|
 | 
						|
When the netfs opens a cache file for the first time, a request with the
 | 
						|
CACHEFILES_OP_OPEN opcode, a.k.a an OPEN request will be sent to the user
 | 
						|
daemon.  The payload format is of the form::
 | 
						|
 | 
						|
	struct cachefiles_open {
 | 
						|
		__u32 volume_key_size;
 | 
						|
		__u32 cookie_key_size;
 | 
						|
		__u32 fd;
 | 
						|
		__u32 flags;
 | 
						|
		__u8  data[];
 | 
						|
	};
 | 
						|
 | 
						|
where:
 | 
						|
 | 
						|
	* ``data`` contains the volume_key followed directly by the cookie_key.
 | 
						|
	  The volume key is a NUL-terminated string; the cookie key is binary
 | 
						|
	  data.
 | 
						|
 | 
						|
	* ``volume_key_size`` indicates the size of the volume key in bytes.
 | 
						|
 | 
						|
	* ``cookie_key_size`` indicates the size of the cookie key in bytes.
 | 
						|
 | 
						|
	* ``fd`` indicates an anonymous fd referring to the cache file, through
 | 
						|
	  which the user daemon can perform write/llseek file operations on the
 | 
						|
	  cache file.
 | 
						|
 | 
						|
 | 
						|
The user daemon can use the given (volume_key, cookie_key) pair to distinguish
 | 
						|
the requested cache file.  With the given anonymous fd, the user daemon can
 | 
						|
fetch the data and write it to the cache file in the background, even when
 | 
						|
kernel has not triggered a cache miss yet.
 | 
						|
 | 
						|
Be noted that each cache file has a unique object_id, while it may have multiple
 | 
						|
anonymous fds.  The user daemon may duplicate anonymous fds from the initial
 | 
						|
anonymous fd indicated by the @fd field through dup().  Thus each object_id can
 | 
						|
be mapped to multiple anonymous fds, while the usr daemon itself needs to
 | 
						|
maintain the mapping.
 | 
						|
 | 
						|
When implementing a user daemon, please be careful of RLIMIT_NOFILE,
 | 
						|
``/proc/sys/fs/nr_open`` and ``/proc/sys/fs/file-max``.  Typically these needn't
 | 
						|
be huge since they're related to the number of open device blobs rather than
 | 
						|
open files of each individual filesystem.
 | 
						|
 | 
						|
The user daemon should reply the OPEN request by issuing a "copen" (complete
 | 
						|
open) command on the devnode::
 | 
						|
 | 
						|
	copen <msg_id>,<cache_size>
 | 
						|
 | 
						|
where:
 | 
						|
 | 
						|
	* ``msg_id`` must match the msg_id field of the OPEN request.
 | 
						|
 | 
						|
	* When >= 0, ``cache_size`` indicates the size of the cache file;
 | 
						|
	  when < 0, ``cache_size`` indicates any error code encountered by the
 | 
						|
	  user daemon.
 | 
						|
 | 
						|
 | 
						|
The CLOSE Request
 | 
						|
-----------------
 | 
						|
 | 
						|
When a cookie withdrawn, a CLOSE request (opcode CACHEFILES_OP_CLOSE) will be
 | 
						|
sent to the user daemon.  This tells the user daemon to close all anonymous fds
 | 
						|
associated with the given object_id.  The CLOSE request has no extra payload,
 | 
						|
and shouldn't be replied.
 | 
						|
 | 
						|
 | 
						|
The READ Request
 | 
						|
----------------
 | 
						|
 | 
						|
When a cache miss is encountered in on-demand read mode, CacheFiles will send a
 | 
						|
READ request (opcode CACHEFILES_OP_READ) to the user daemon. This tells the user
 | 
						|
daemon to fetch the contents of the requested file range.  The payload is of the
 | 
						|
form::
 | 
						|
 | 
						|
	struct cachefiles_read {
 | 
						|
		__u64 off;
 | 
						|
		__u64 len;
 | 
						|
	};
 | 
						|
 | 
						|
where:
 | 
						|
 | 
						|
	* ``off`` indicates the starting offset of the requested file range.
 | 
						|
 | 
						|
	* ``len`` indicates the length of the requested file range.
 | 
						|
 | 
						|
 | 
						|
When it receives a READ request, the user daemon should fetch the requested data
 | 
						|
and write it to the cache file identified by object_id.
 | 
						|
 | 
						|
When it has finished processing the READ request, the user daemon should reply
 | 
						|
by using the CACHEFILES_IOC_READ_COMPLETE ioctl on one of the anonymous fds
 | 
						|
associated with the object_id given in the READ request.  The ioctl is of the
 | 
						|
form::
 | 
						|
 | 
						|
	ioctl(fd, CACHEFILES_IOC_READ_COMPLETE, msg_id);
 | 
						|
 | 
						|
where:
 | 
						|
 | 
						|
	* ``fd`` is one of the anonymous fds associated with the object_id
 | 
						|
	  given.
 | 
						|
 | 
						|
	* ``msg_id`` must match the msg_id field of the READ request.
 |