	The mandatory file locking implementation has long-standing races that probably render it useless. I know of no plans to fix them. Till we do, we should at least warn people. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
	Mandatory File Locking For The Linux Operating System

		Andy Walker <andy@lysaker.kvaerner.no>

			   15 April 1996
		     (Updated September 2007)

0. Why you should avoid mandatory locking
-----------------------------------------

The Linux implementation suffers from a number of difficult-to-fix race
conditions which in practice make it unreliable:

	- The write system call checks for a mandatory lock only once
	  at its start.  It is therefore possible for a lock request to
	  be granted after this check but before the data is modified.
	  A process may then see file data change even while a mandatory
	  lock is held.
	- Similarly, an exclusive lock may be granted on a file after
	  the kernel has decided to proceed with a read, but before the
	  read has actually completed, and the reading process may see
	  the file data in a state which should not have been visible
	  to it.
	- Similar races make the claimed mutual exclusion between locks
	  and mmap() equally unreliable.

1. What is mandatory locking?
-----------------------------

Mandatory locking is kernel-enforced file locking, as opposed to the more usual
cooperative file locking used to guarantee sequential access to files among
processes. File locks are applied using the flock() and fcntl() system calls
(and the lockf() library routine, which is a wrapper around fcntl()). It is
normally a process's responsibility to check for locks on a file it wishes to
update, before applying its own lock, updating the file and unlocking it again.
The most commonly used example of this (and in the case of sendmail, the most
troublesome) is access to a user's mailbox. The mail user agent and the mail
transfer agent must guard against updating the mailbox at the same time, and
prevent reading the mailbox while it is being updated.

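As a minimal sketch of the cooperative scheme described above, the Python
fragment below takes an exclusive advisory lock with fcntl.lockf() (Python's
interface to fcntl() record locking), appends to the file and unlocks it
again. The helper name append_with_advisory_lock() is an assumption made for
illustration. Nothing here is enforced by the kernel: a process that skips
the lock can still write to the file.

```python
import fcntl
import os


def append_with_advisory_lock(path, data):
    """Append data to path while holding an exclusive advisory lock.

    Hypothetical helper for illustration; returns the new file size in bytes.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        # Waits until no other process holds a conflicting fcntl() lock.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            os.write(fd, data)
            return os.fstat(fd).st_size
        finally:
            # Release the lock explicitly before closing the descriptor.
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A mail delivery agent following this convention might call
append_with_advisory_lock("/var/mail/alice", message); the path is
hypothetical, and the scheme only works if every other program touching the
mailbox takes the same lock.
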
In a perfect world all processes would use and honour a cooperative, or
"advisory" locking scheme. However, the world isn't perfect, and there's
a lot of poorly written code out there.

In trying to address this problem, the designers of System V UNIX came up
with a "mandatory" locking scheme, whereby the operating system kernel would
block attempts by a process to write to a file that another process holds a
"read" -or- "shared" lock on, and block attempts to either read from or write
to a file that another process holds a "write" -or- "exclusive" lock on.

The System V mandatory locking scheme was intended to have as little impact as
possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks.

Note 1: In saying "file" in the paragraphs above I am actually not telling
the whole truth. System V locking is based on fcntl(). The granularity of
fcntl() is such that it allows the locking of byte ranges in files, in addition
to entire files, so the mandatory locking rules also have byte level
granularity.

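The byte-range granularity mentioned in Note 1 can be illustrated with
ordinary advisory locks. The sketch below is an assumption-laden example, not
part of the locking implementation: child_can_lock() is a hypothetical helper
that forks a child and reports whether the child could lock a given range.
A separate process is needed because fcntl() locks held by one process never
conflict with that same process's own requests.

```python
import fcntl
import os


def child_can_lock(path, start, length):
    """Fork a child and report whether it could lock bytes [start, start+length).

    Hypothetical helper for illustration only.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            # LOCK_NB makes a conflicting request fail instead of waiting.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start)
            status = 0
        except OSError:
            status = 1
        finally:
            # Leave the child immediately, without running parent cleanup.
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

If the parent holds an exclusive lock on bytes 0 to 9 of a file, a call such
as child_can_lock(path, 0, 10) is expected to fail, while a request for bytes
10 to 19 of the same file should succeed.
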
Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
borrowing the fcntl() locking scheme from System V. The mandatory locking
scheme is defined by the System V Interface Definition (SVID) Version 3.

2. Marking a file for mandatory locking
---------------------------------------

A file is marked as a candidate for mandatory locking by setting the group-id
bit in its file mode but removing the group-execute bit. This is an otherwise
meaningless combination, and was chosen by the System V implementors so as not
to break existing user programs.

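The mode-bit combination can be sketched in a few lines of Python. The
function names is_mandatory_candidate() and mark_mandatory() are assumptions
made for this example. Note that chmod() may silently drop the group-id bit
when the caller is not a member of the file's group, so the returned mode
should be checked rather than assumed.

```python
import os
import stat


def is_mandatory_candidate(mode):
    """True if mode has the group-id bit set and the group-execute bit clear."""
    return bool(mode & stat.S_ISGID) and not (mode & stat.S_IXGRP)


def mark_mandatory(path):
    """Set the group-id bit and clear group-execute; return the resulting mode.

    Hypothetical helper for illustration only.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, (mode | stat.S_ISGID) & ~stat.S_IXGRP)
    # Re-read the mode: the kernel may have refused to set the group-id bit.
    return stat.S_IMODE(os.stat(path).st_mode)
```

For example, is_mandatory_candidate(0o2644) is true, whereas a mode of 0o2654
(group-execute still set) describes an ordinary setgid file.
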
Note that the group-id bit is usually automatically cleared by the kernel when
a setgid file is written to. This is a security measure. The kernel has been
modified to recognize the special case of a mandatory lock candidate and to
refrain from clearing this bit. Similarly the kernel has been modified not
to run mandatory lock candidates with setgid privileges.

3. Available implementations
----------------------------

I have considered the implementations of mandatory locking available with
SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.

Generally I have tried to make the most sense out of the behaviour exhibited
by these three reference systems. There are many anomalies.

All the reference systems reject all calls to open() for a file on which
another process has outstanding mandatory locks. This is in direct
contravention of SVID 3, which states that only calls to open() with the
O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
definition, which is the "Right Thing", since only calls with O_TRUNC can
modify the contents of the file.

HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
just mandatory locks. That would appear to contravene POSIX.1.

mmap() is another interesting case. All the operating systems mentioned
prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
also disallows advisory locks for such a file. SVID actually specifies the
paranoid HP-UX behaviour.

In my opinion only MAP_SHARED mappings should be immune from locking, and then
only from mandatory locks - that is what is currently implemented.

SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
mandatory locks, so reads and writes to locked files always block when they
should return EAGAIN.

I'm afraid that this is such an esoteric area that the semantics described
below are just as valid as any others, so long as the main points seem to
agree.

4. Semantics
------------

1. Mandatory locks can only be applied via the fcntl()/lockf() locking
   interface - in other words the System V/POSIX interface. BSD style
   locks using flock() never result in a mandatory lock.

2. If a process has locked a region of a file with a mandatory read lock, then
   other processes are permitted to read from that region. If any of these
   processes attempts to write to the region it will block until the lock is
   released, unless the process has opened the file with the O_NONBLOCK
   flag in which case the system call will return immediately with the error
   status EAGAIN.

3. If a process has locked a region of a file with a mandatory write lock, all
   attempts to read or write to that region block until the lock is released,
   unless a process has opened the file with the O_NONBLOCK flag in which case
   the system call will return immediately with the error status EAGAIN.

4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
   any mandatory locks owned by other processes will be rejected with the
   error status EAGAIN.

5. Attempts to apply a mandatory lock to a file that is memory mapped and
   shared (via mmap() with MAP_SHARED) will be rejected with the error status
   EAGAIN.

6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED)
   that has any mandatory locks in effect will be rejected with the error status
   EAGAIN.

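Rules 2 and 3 above can be summarised as a small decision function. This is
only a model of the intended semantics written for illustration, not kernel
code; the name access_outcome() and its string return values are assumptions.
It describes what should happen to a read or write by a process other than
the lock holder.

```python
def access_outcome(lock, op, nonblock=False):
    """Model rules 2 and 3 for an access by a process that is not the holder.

    lock:     None, "read" or "write" (mandatory lock held on the region)
    op:       "read" or "write" (the attempted access)
    nonblock: True if the file was opened with O_NONBLOCK
    Returns "proceed", "block" or "EAGAIN".
    """
    if lock not in (None, "read", "write"):
        raise ValueError("unknown lock type: %r" % (lock,))
    if op not in ("read", "write"):
        raise ValueError("unknown operation: %r" % (op,))
    # A write lock conflicts with everything; a read lock only with writes.
    conflict = lock == "write" or (lock == "read" and op == "write")
    if not conflict:
        return "proceed"
    return "EAGAIN" if nonblock else "block"
```

Under this model a read against a read-locked region proceeds, while a
non-blocking write against the same region yields EAGAIN.
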
5. Which system calls are affected?
-----------------------------------

Those which read or modify a file's contents, not just the inode. That gives
read(), write(), readv(), writev(), open(), creat(), mmap(), truncate() and
ftruncate(). truncate() and ftruncate() are considered to be "write" actions
for the purposes of mandatory locking.

The affected region is usually defined as stretching from the current position
for the total number of bytes read or written. For the truncate calls it is
defined as the bytes of a file removed or added (we must also consider bytes
added, as a lock can specify just "the whole file", rather than a specific
range of bytes).

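The definition of the affected region can be made concrete with a little
arithmetic. The sketch below is an illustration under stated assumptions, not
the kernel's code: regions are half-open (start, end) byte intervals, the
function names are hypothetical, and a lock length of 0 is taken to mean "to
the end of the file, however large it grows", as with l_len in fcntl().

```python
def io_region(position, count):
    """Region touched by reading or writing count bytes at position."""
    return (position, position + count)


def truncate_region(old_size, new_size):
    """Bytes removed (shrinking) or added (growing) by a truncate call."""
    return (min(old_size, new_size), max(old_size, new_size))


def overlaps(region, lock_start, lock_len):
    """True if region intersects a lock starting at lock_start.

    lock_len == 0 means the lock extends to the end of the file.
    """
    start, end = region
    if start == end:
        return False  # an empty region touches nothing
    lock_end = float("inf") if lock_len == 0 else lock_start + lock_len
    return start < lock_end and lock_start < end
```

For instance, growing a file from 1024 to 4096 bytes gives the region
(1024, 4096), which overlaps a whole-file lock (start 0, length 0) even
though no existing byte is modified.
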
Note 3: I may have overlooked some system calls that need mandatory lock
checking in my eagerness to get this code out the door. Please let me know, or
better still fix the system calls yourself and submit a patch to me or Linus.

6. Warning!
-----------

Not even root can override a mandatory lock, so runaway processes can wreak
havoc if they lock crucial files. The way around it is to change the file
permissions (remove the setgid bit) before trying to read or write to it.
Of course, that might be a bit tricky if the system is hung :-(

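The workaround of removing the setgid bit might look like the following
sketch. The helper name clear_mandatory() is an assumption made for this
example, and the caller needs permission to change the file's mode (the
owner or root).

```python
import os
import stat


def clear_mandatory(path):
    """Remove the group-id bit from path and return the resulting mode.

    Hypothetical helper for illustration only; all other mode bits are kept.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~stat.S_ISGID)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Once the bit is cleared the file is no longer a mandatory lock candidate, so
any locks still held on it should behave as ordinary advisory locks.
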