mirror of https://github.com/torvalds/linux.git, synced 2025-11-04 10:40:15 +02:00

	ext4: Orphan file documentation
Add documentation about the orphan file feature.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210816095713.16537-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

parent 02f310fcf4
commit 3a6541e97c
5 changed files with 89 additions and 6 deletions

@@ -11,3 +11,4 @@ have static metadata at fixed locations.
 .. include:: bitmaps.rst
 .. include:: mmp.rst
 .. include:: journal.rst
+.. include:: orphan.rst

@@ -498,11 +498,11 @@ structure -- inode change time (ctime), access time (atime), data
 modification time (mtime), and deletion time (dtime). The four fields
 are 32-bit signed integers that represent seconds since the Unix epoch
 (1970-01-01 00:00:00 GMT), which means that the fields will overflow in
-January 2038. For inodes that are not linked from any directory but are
-still open (orphan inodes), the dtime field is overloaded for use with
-the orphan list. The superblock field ``s_last_orphan`` points to the
-first inode in the orphan list; dtime is then the number of the next
-orphaned inode, or zero if there are no more orphans.
+January 2038. If the filesystem does not have the orphan_file feature, inodes
+that are not linked from any directory but are still open (orphan inodes) have
+the dtime field overloaded for use with the orphan list. The superblock field
+``s_last_orphan`` points to the first inode in the orphan list; dtime is then
+the number of the next orphaned inode, or zero if there are no more orphans.
 
 If the inode structure size ``sb->s_inode_size`` is larger than 128
 bytes and the ``i_inode_extra`` field is large enough to encompass the
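
To illustrate the legacy scheme described in this hunk, here is a minimal C sketch that walks the orphan list starting from ``s_last_orphan`` and follows each inode's dtime field until it reaches zero. The ``dtime[]`` array indexed by inode number and the function name ``walk_orphan_list`` are hypothetical stand-ins for reading inodes from disk; they are not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical in-memory model: dtime[ino] holds the dtime field of inode
 * number ino, which for an orphan is the number of the next orphan (0 = end).
 *
 * Stores the visited inode numbers in out[] and returns how many were stored.
 * The walk stops at the end of the list, at an out-of-range inode number, or
 * when out[] is full (which also bounds the walk on a corrupted, cyclic list).
 */
size_t walk_orphan_list(uint32_t s_last_orphan,
                        const uint32_t *dtime, size_t n_inodes,
                        uint32_t *out, size_t out_cap)
{
    size_t count = 0;
    uint32_t ino = s_last_orphan;

    assert(dtime != NULL && out != NULL);

    while (ino != 0 && ino < n_inodes && count < out_cap) {
        out[count++] = ino;
        ino = dtime[ino];   /* next orphan, or 0 if there are no more */
    }
    return count;
}
```

Every addition to or removal from this list has to touch the superblock field, which is the single shared point that the orphan file feature is meant to relieve.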

Documentation/filesystems/ext4/orphan.rst (new file, 52 lines)

@@ -0,0 +1,52 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Orphan file
+-----------
+
+In unix there can be inodes that are unlinked from the directory hierarchy but
+that are still alive because they are open. In case of a crash the filesystem has
+to clean up these inodes as otherwise they (and the blocks referenced from them)
+would leak. Similarly, if we truncate or extend the file, we may not be able
+to perform the operation in a single journalling transaction. In such a case we
+track the inode as an orphan so that in case of a crash extra blocks allocated
+to the file get truncated.
+
+Traditionally ext4 tracks orphan inodes in the form of a singly linked list where
+the superblock contains the inode number of the last orphan inode
+(s\_last\_orphan field) and then each inode contains the inode number of the
+previously orphaned inode (we overload the i\_dtime inode field for this).
+However, this filesystem-wide singly linked list is a scalability bottleneck for
+workloads that result in heavy creation of orphan inodes. When the orphan file
+feature (COMPAT\_ORPHAN\_FILE) is enabled, the filesystem has a special inode
+(referenced from the superblock through s\_orphan\_file\_inum) with several
+blocks. Each of these blocks has the following structure:
+
+.. list-table::
+   :widths: 8 8 24 40
+   :header-rows: 1
+
+   * - Offset
+     - Type
+     - Name
+     - Description
+   * - 0x0
+     - Array of \_\_le32 entries
+     - Orphan inode entries
+     - Each \_\_le32 entry is either empty (0) or it contains the inode number of
+       an orphan inode.
+   * - blocksize - 8
+     - \_\_le32
+     - ob\_magic
+     - Magic value stored in the orphan block tail (0x0b10ca04).
+   * - blocksize - 4
+     - \_\_le32
+     - ob\_checksum
+     - Checksum of the orphan block.
+
+When a filesystem with the orphan file feature is mounted writable, we set the
+RO\_COMPAT\_ORPHAN\_PRESENT feature in the superblock to indicate there may
+be valid orphan entries. In case we see this feature when mounting the
+filesystem, we read the whole orphan file and process all orphan inodes found
+there as usual. When cleanly unmounting the filesystem we remove the
+RO\_COMPAT\_ORPHAN\_PRESENT feature to avoid unnecessary scanning of the orphan
+file and also make the filesystem fully compatible with older kernels.
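
The orphan block layout documented in this file can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the kernel implementation: the function names ``orphan_slots_per_block`` and ``scan_orphan_block`` are hypothetical, the block is assumed to be already read into memory, and ``ob_checksum`` is not verified because the documentation above does not specify how it is computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ORPHAN_BLOCK_MAGIC 0x0b10ca04u /* ob_magic value from the table */
#define ORPHAN_TAIL_SIZE   8u          /* ob_magic + ob_checksum, 4 bytes each */

/* Decode a little-endian 32-bit value independent of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Number of __le32 entry slots that fit in front of the 8-byte tail. */
size_t orphan_slots_per_block(size_t blocksize)
{
    assert(blocksize > ORPHAN_TAIL_SIZE);
    return (blocksize - ORPHAN_TAIL_SIZE) / sizeof(uint32_t);
}

/*
 * Copy the non-zero entries of one orphan block into out[] (at most out_cap).
 * Returns the number of entries copied, or -1 if ob_magic does not match.
 * ob_checksum is deliberately not verified in this sketch.
 */
int scan_orphan_block(const uint8_t *block, size_t blocksize,
                      uint32_t *out, size_t out_cap)
{
    size_t slots = orphan_slots_per_block(blocksize);
    size_t found = 0;

    if (read_le32(block + blocksize - ORPHAN_TAIL_SIZE) != ORPHAN_BLOCK_MAGIC)
        return -1;

    for (size_t i = 0; i < slots && found < out_cap; i++) {
        uint32_t ino = read_le32(block + i * sizeof(uint32_t));
        if (ino != 0)       /* 0 marks an empty slot */
            out[found++] = ino;
    }
    return (int)found;
}
```

With a 4096-byte block this layout leaves (4096 - 8) / 4 = 1022 entry slots per block.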

@@ -36,3 +36,20 @@ ext4 reserves some inode for special features, as follows:
    * - 11
      - Traditional first non-reserved inode. Usually this is the lost+found directory. See s\_first\_ino in the superblock.
 
+Note that there are also some inodes allocated from non-reserved inode numbers
+for other filesystem features which are not referenced from the standard
+directory hierarchy. These are generally referenced from the superblock. They are:
+
+.. list-table::
+   :widths: 20 50
+   :header-rows: 1
+
+   * - Superblock field
+     - Description
+
+   * - s\_lpf\_ino
+     - Inode number of lost+found directory.
+   * - s\_prj\_quota\_inum
+     - Inode number of quota file tracking project quotas.
+   * - s\_orphan\_file\_inum
+     - Inode number of file tracking orphan inodes.

@@ -479,7 +479,11 @@ The ext4 superblock is laid out as follows in
      - Filename charset encoding flags.
    * - 0x280
      - \_\_le32
-     - s\_reserved[95]
+     - s\_orphan\_file\_inum
+     - Orphan file inode number.
+   * - 0x284
+     - \_\_le32
+     - s\_reserved[94]
      - Padding to the end of the block.
    * - 0x3FC
      - \_\_le32

@@ -603,6 +607,11 @@ following:
        the journal, JBD2 incompat feature
        (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets
        set (COMPAT\_FAST\_COMMIT).
+   * - 0x1000
+     - Orphan file allocated. This is the special file for more efficient
+       tracking of unlinked but still open inodes. When there may be any
+       entries in the file, we additionally set the corresponding rocompat
+       feature (RO\_COMPAT\_ORPHAN\_PRESENT).
 
 .. _super_incompat:
 

@@ -713,6 +722,10 @@ the following:
      - Filesystem tracks project quotas. (RO\_COMPAT\_PROJECT)
    * - 0x8000
      - Verity inodes may be present on the filesystem. (RO\_COMPAT\_VERITY)
+   * - 0x10000
+     - Indicates that the orphan file may have valid orphan entries and thus we
+       need to clean them up when mounting the filesystem
+       (RO\_COMPAT\_ORPHAN\_PRESENT).
 
 .. _super_def_hash:
 
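
The interplay of the two feature bits added to the superblock documentation (compat 0x1000 and ro_compat 0x10000) can be modelled with a small C sketch. The ``struct sb_features`` type and the three function names are hypothetical simplifications for illustration only. Requiring both bits before scanning is an assumption of this sketch; the documentation only says the orphan file is read when RO\_COMPAT\_ORPHAN\_PRESENT is seen at mount time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COMPAT_ORPHAN_FILE       0x1000u  /* orphan file allocated */
#define RO_COMPAT_ORPHAN_PRESENT 0x10000u /* orphan file may hold valid entries */

/* Hypothetical, simplified view of the superblock feature words. */
struct sb_features {
    uint32_t compat;
    uint32_t ro_compat;
};

/* At mount: the orphan file has to be scanned only if entries may be present. */
bool needs_orphan_file_scan(const struct sb_features *sb)
{
    assert(sb != NULL);
    return (sb->compat & COMPAT_ORPHAN_FILE) &&
           (sb->ro_compat & RO_COMPAT_ORPHAN_PRESENT);
}

/* Writable mount: from now on the orphan file may contain valid entries. */
void on_writable_mount(struct sb_features *sb)
{
    assert(sb != NULL);
    if (sb->compat & COMPAT_ORPHAN_FILE)
        sb->ro_compat |= RO_COMPAT_ORPHAN_PRESENT;
}

/* Clean unmount: no orphans remain, so the next mount can skip the scan. */
void on_clean_unmount(struct sb_features *sb)
{
    assert(sb != NULL);
    sb->ro_compat &= ~RO_COMPAT_ORPHAN_PRESENT;
}
```

Because the "entries may be present" state lives in a read-only-compatible bit that is cleared on clean unmount, a cleanly unmounted filesystem carries only the compat bit, which is what makes it fully compatible with older kernels.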