mirror of
				https://github.com/torvalds/linux.git
				synced 2025-11-04 10:40:15 +02:00 
			
		
		
		
	Update documentation to the current state of affairs. Remove duplicated method descruptions in exportfs.h and point to Documentation/filesystems/ Exporting instead. Add a little file header comment in expfs.c describing what's going on and mentioning Neils and my copyright [1]. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
		
			
				
	
	
		
			147 lines
		
	
	
	
		
			6.5 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			147 lines
		
	
	
	
		
			6.5 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
 | 
						|
Making Filesystems Exportable
 | 
						|
=============================
 | 
						|
 | 
						|
Overview
 | 
						|
--------
 | 
						|
 | 
						|
All filesystem operations require a dentry (or two) as a starting
 | 
						|
point.  Local applications have a reference-counted hold on suitable
 | 
						|
dentries via open file descriptors or cwd/root.  However remote
 | 
						|
applications that access a filesystem via a remote filesystem protocol
 | 
						|
such as NFS may not be able to hold such a reference, and so need a
 | 
						|
different way to refer to a particular dentry.  As the alternative
 | 
						|
form of reference needs to be stable across renames, truncates, and
 | 
						|
server-reboot (among other things, though these tend to be the most
 | 
						|
problematic), there is no simple answer like 'filename'.
 | 
						|
 | 
						|
The mechanism discussed here allows each filesystem implementation to
 | 
						|
specify how to generate an opaque (outside of the filesystem) byte
 | 
						|
string for any dentry, and how to find an appropriate dentry for any
 | 
						|
given opaque byte string.
 | 
						|
This byte string will be called a "filehandle fragment" as it
 | 
						|
corresponds to part of an NFS filehandle.
 | 
						|
 | 
						|
A filesystem which supports the mapping between filehandle fragments
 | 
						|
and dentries will be termed "exportable".
 | 
						|
 | 
						|
 | 
						|
 | 
						|
Dcache Issues
 | 
						|
-------------
 | 
						|
 | 
						|
The dcache normally contains a proper prefix of any given filesystem
 | 
						|
tree.  This means that if any filesystem object is in the dcache, then
 | 
						|
all of the ancestors of that filesystem object are also in the dcache.
 | 
						|
As normal access is by filename this prefix is created naturally and
 | 
						|
maintained easily (by each object maintaining a reference count on
 | 
						|
its parent).
 | 
						|
 | 
						|
However when objects are included into the dcache by interpreting a
 | 
						|
filehandle fragment, there is no automatic creation of a path prefix
 | 
						|
for the object.  This leads to two related but distinct features of
 | 
						|
the dcache that are not needed for normal filesystem access.
 | 
						|
 | 
						|
1/ The dcache must sometimes contain objects that are not part of the
 | 
						|
   proper prefix. i.e that are not connected to the root.
 | 
						|
2/ The dcache must be prepared for a newly found (via ->lookup) directory
 | 
						|
   to already have a (non-connected) dentry, and must be able to move
 | 
						|
   that dentry into place (based on the parent and name in the
 | 
						|
   ->lookup).   This is particularly needed for directories as
 | 
						|
   it is a dcache invariant that directories only have one dentry.
 | 
						|
 | 
						|
To implement these features, the dcache has:
 | 
						|
 | 
						|
a/ A dentry flag DCACHE_DISCONNECTED which is set on
 | 
						|
   any dentry that might not be part of the proper prefix.
 | 
						|
   This is set when anonymous dentries are created, and cleared when a
 | 
						|
   dentry is noticed to be a child of a dentry which is in the proper
 | 
						|
   prefix. 
 | 
						|
 | 
						|
b/ A per-superblock list "s_anon" of dentries which are the roots of
 | 
						|
   subtrees that are not in the proper prefix.  These dentries, as
 | 
						|
   well as the proper prefix, need to be released at unmount time.  As
 | 
						|
   these dentries will not be hashed, they are linked together on the
 | 
						|
   d_hash list_head.
 | 
						|
 | 
						|
c/ Helper routines to allocate anonymous dentries, and to help attach
 | 
						|
   loose directory dentries at lookup time. They are:
 | 
						|
    d_alloc_anon(inode) will return a dentry for the given inode.
 | 
						|
      If the inode already has a dentry, one of those is returned.
 | 
						|
      If it doesn't, a new anonymous (IS_ROOT and
 | 
						|
        DCACHE_DISCONNECTED) dentry is allocated and attached.
 | 
						|
      In the case of a directory, care is taken that only one dentry
 | 
						|
      can ever be attached.
 | 
						|
    d_splice_alias(inode, dentry) will make sure that there is a
 | 
						|
      dentry with the same name and parent as the given dentry, and
 | 
						|
      which refers to the given inode.
 | 
						|
      If the inode is a directory and already has a dentry, then that
 | 
						|
      dentry is d_moved over the given dentry.
 | 
						|
      If the passed dentry gets attached, care is taken that this is
 | 
						|
      mutually exclusive to a d_alloc_anon operation.
 | 
						|
      If the passed dentry is used, NULL is returned, else the used
 | 
						|
      dentry is returned.  This corresponds to the calling pattern of
 | 
						|
      ->lookup.
 | 
						|
  
 | 
						|
 
 | 
						|
Filesystem Issues
 | 
						|
-----------------
 | 
						|
 | 
						|
For a filesystem to be exportable it must:
 | 
						|
 
 | 
						|
   1/ provide the filehandle fragment routines described below.
 | 
						|
   2/ make sure that d_splice_alias is used rather than d_add
 | 
						|
      when ->lookup finds an inode for a given parent and name.
 | 
						|
      Typically the ->lookup routine will end with a:
 | 
						|
 | 
						|
		return d_splice_alias(inode, dentry);
 | 
						|
	}
 | 
						|
 | 
						|
 | 
						|
 | 
						|
  A file system implementation declares that instances of the filesystem
 | 
						|
are exportable by setting the s_export_op field in the struct
 | 
						|
super_block.  This field must point to a "struct export_operations"
 | 
						|
struct which has the following members:
 | 
						|
 | 
						|
 encode_fh  (optional)
 | 
						|
    Takes a dentry and creates a filehandle fragment which can later be used
 | 
						|
    to find or create a dentry for the same object.  The default
 | 
						|
    implementation creates a filehandle fragment that encodes a 32bit inode
 | 
						|
    and generation number for the inode encoded, and if necessary the
 | 
						|
    same information for the parent.
 | 
						|
 | 
						|
  fh_to_dentry (mandatory)
 | 
						|
    Given a filehandle fragment, this should find the implied object and
 | 
						|
    create a dentry for it (possibly with d_alloc_anon).
 | 
						|
 | 
						|
  fh_to_parent (optional but strongly recommended)
 | 
						|
    Given a filehandle fragment, this should find the parent of the
 | 
						|
    implied object and create a dentry for it (possibly with d_alloc_anon).
 | 
						|
    May fail if the filehandle fragment is too small.
 | 
						|
 | 
						|
  get_parent (optional but strongly recommended)
 | 
						|
    When given a dentry for a directory, this should return  a dentry for
 | 
						|
    the parent.  Quite possibly the parent dentry will have been allocated
 | 
						|
    by d_alloc_anon.  The default get_parent function just returns an error
 | 
						|
    so any filehandle lookup that requires finding a parent will fail.
 | 
						|
    ->lookup("..") is *not* used as a default as it can leave ".." entries
 | 
						|
    in the dcache which are too messy to work with.
 | 
						|
 | 
						|
  get_name (optional)
 | 
						|
    When given a parent dentry and a child dentry, this should find a name
 | 
						|
    in the directory identified by the parent dentry, which leads to the
 | 
						|
    object identified by the child dentry.  If no get_name function is
 | 
						|
    supplied, a default implementation is provided which uses vfs_readdir
 | 
						|
    to find potential names, and matches inode numbers to find the correct
 | 
						|
    match.
 | 
						|
 | 
						|
 | 
						|
A filehandle fragment consists of an array of 1 or more 4byte words,
 | 
						|
together with a one byte "type".
 | 
						|
The decode_fh routine should not depend on the stated size that is
 | 
						|
passed to it.  This size may be larger than the original filehandle
 | 
						|
generated by encode_fh, in which case it will have been padded with
 | 
						|
nuls.  Rather, the encode_fh routine should choose a "type" which
 | 
						|
indicates the decode_fh how much of the filehandle is valid, and how
 | 
						|
it should be interpreted.
 |