xfs: document the motivation for online fsck design

Start the first chapter of the online fsck design documentation.
This covers the motivations for creating this in the first place.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

parent 09a9639e56
commit a8f6c2e54d

2 changed files with 213 additions and 0 deletions

@@ -123,4 +123,5 @@ Documentation for filesystem implementations.
    vfat
    xfs-delayed-logging-design
    xfs-self-describing-metadata
+   xfs-online-fsck-design
    zonefs

212  Documentation/filesystems/xfs-online-fsck-design.rst  (Normal file)
@@ -0,0 +1,212 @@
.. SPDX-License-Identifier: GPL-2.0
.. _xfs_online_fsck_design:

..
        Mapping of heading styles within this document:
        Heading 1 uses "====" above and below
        Heading 2 uses "===="
        Heading 3 uses "----"
        Heading 4 uses "````"
        Heading 5 uses "^^^^"
        Heading 6 uses "~~~~"
        Heading 7 uses "...."

        Sections are manually numbered because apparently that's what everyone
        does in the kernel.

======================
XFS Online Fsck Design
======================
					
This document captures the design of the online filesystem check feature for
XFS.
The purpose of this document is threefold:

- To help kernel distributors understand exactly what the XFS online fsck
  feature is, and the issues of which they should be aware.

- To help people reading the code to familiarize themselves with the relevant
  concepts and design points before they start digging into the code.

- To help developers maintaining the system by capturing the reasons
  supporting higher level decision making.

As the online fsck code is merged, the links in this document to topic branches
will be replaced with links to code.

This document is licensed under the terms of the GNU General Public License,
v2.
The primary author is Darrick J. Wong.
					
This design document is split into seven parts.
Part 1 defines what fsck tools are and the motivations for writing a new one.
Parts 2 and 3 present a high level overview of how the online fsck process
works and how it is tested to ensure correct functionality.
Part 4 discusses the user interface and the intended usage modes of the new
program.
Parts 5 and 6 show off the high level components and how they fit together, and
then present case studies of how each repair function actually works.
Part 7 sums up what has been discussed so far and speculates about what else
might be built atop online fsck.

.. contents:: Table of Contents
   :local:
					
1. What is a Filesystem Check?
==============================

A Unix filesystem has four main responsibilities:

- Provide a hierarchy of names through which application programs can associate
  arbitrary blobs of data for any length of time,

- Virtualize physical storage media across those names,

- Retrieve the named data blobs at any time, and

- Examine resource usage.

Metadata directly supporting these functions (e.g. files, directories, space
mappings) are sometimes called primary metadata.
Secondary metadata (e.g. reverse mapping and directory parent pointers) support
operations internal to the filesystem, such as internal consistency checking
and reorganization.
Summary metadata, as the name implies, condense information contained in
primary metadata for performance reasons.
					
The filesystem check (fsck) tool examines all the metadata in a filesystem
to look for errors.
In addition to looking for obvious metadata corruptions, fsck also
cross-references different types of metadata records with each other to look
for inconsistencies.
People do not like losing data, so most fsck tools also contain some ability
to correct any problems found.
As a word of caution -- the primary goal of most Linux fsck tools is to restore
the filesystem metadata to a consistent state, not to maximize the data
recovered.
That precedent will not be challenged here.
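
The cross-referencing step described above can be illustrated with a minimal
sketch.
It assumes two simplified views of the same space: file mappings (primary
metadata) and reverse mapping records (secondary metadata).
The record layouts and the ``cross_reference`` helper are hypothetical names
invented for this example; they do not mirror the XFS ondisk format or the
actual checking code.

```python
from collections import namedtuple

# Hypothetical record types for illustration only.
Mapping = namedtuple("Mapping", "owner start length")  # file -> space
Rmap = namedtuple("Rmap", "start length owner")        # space -> file


def cross_reference(mappings, rmaps):
    """Return descriptions of records that one view has and the other lacks."""
    forward = {(m.start, m.length, m.owner) for m in mappings}
    reverse = {(r.start, r.length, r.owner) for r in rmaps}
    problems = []
    for start, length, owner in sorted(forward - reverse):
        problems.append(
            f"file {owner} maps [{start}, {start + length}) "
            "but no reverse record agrees")
    for start, length, owner in sorted(reverse - forward):
        problems.append(
            f"reverse record gives [{start}, {start + length}) to file "
            f"{owner} but no file mapping agrees")
    return problems


mappings = [Mapping(owner=128, start=0, length=8),
            Mapping(owner=129, start=8, length=4)]
rmaps = [Rmap(start=0, length=8, owner=128),
         Rmap(start=8, length=4, owner=130)]
report = cross_reference(mappings, rmaps)
```

Neither view is trusted more than the other in this sketch; it only reports
where they disagree, which is the essence of a cross-referencing check.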
					
Filesystems of the 20th century generally lacked any redundancy in the ondisk
format, which meant that fsck could only respond to errors by erasing files
until errors were no longer detected.
More recent filesystem designs contain enough redundancy in their metadata that
it is now possible to regenerate data structures when non-catastrophic errors
occur; this capability aids both restoring consistency and recovering data.

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| System administrators avoid data loss by increasing the number of        |
| separate storage systems through the creation of backups; and they avoid |
| downtime by increasing the redundancy of each storage system through the |
| creation of RAID arrays.                                                 |
| fsck tools address only the first problem.                               |
+--------------------------------------------------------------------------+
					
TLDR; Show Me the Code!
-----------------------

Code is posted to the kernel.org git trees as follows:
`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos.
					
Existing Tools
--------------

The online fsck tool described here will be the third tool in the history of
XFS (on Linux) to check and repair filesystems.
Two programs precede it:

The first program, ``xfs_check``, was created as part of the XFS debugger
(``xfs_db``) and can only be used with unmounted filesystems.
It walks all metadata in the filesystem looking for inconsistencies, though it
lacks any ability to repair what it finds.
Due to its high memory requirements and inability to repair things, this
program is now deprecated and will not be discussed further.
					
The second program, ``xfs_repair``, was created to be faster and more robust
than the first program.
Like its predecessor, it can only be used with unmounted filesystems.
It uses extent-based in-memory data structures to reduce memory consumption,
and tries to schedule readahead IO appropriately to reduce IO waiting time
while it scans the metadata of the entire filesystem.
The most important feature of this tool is its ability to respond to
inconsistencies in file metadata and the directory tree by erasing things as
needed to eliminate problems.
Space usage metadata are rebuilt from the observed file metadata.
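
As a rough illustration of rebuilding space usage metadata from observed file
metadata, the sketch below derives the free extents of a region from the
extents that files were seen to occupy.
The ``rebuild_free_space`` function and the ``(start, length)`` tuples are
assumptions made for this example; this is not the algorithm used by
``xfs_repair``.

```python
def rebuild_free_space(total_blocks, used_extents):
    """Return the free (start, length) extents implied by the used extents."""
    free = []
    cursor = 0  # first block not yet accounted for
    for start, length in sorted(used_extents):
        if start > cursor:
            # The gap between the previous used extent and this one is free.
            free.append((cursor, start - cursor))
        # max() tolerates overlapping observations instead of double counting.
        cursor = max(cursor, start + length)
    if cursor < total_blocks:
        free.append((cursor, total_blocks - cursor))
    return free


# Extents observed while walking file metadata in a 32-block region.
observed = [(0, 4), (10, 6), (12, 2), (30, 2)]
free = rebuild_free_space(32, observed)
```

The point of the sketch is that the free space records need not be trusted at
all: they can be thrown away and recomputed as the complement of what the file
metadata claims.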
					
Problem Statement
-----------------

The current XFS tools leave several problems unsolved:

1. **User programs** suddenly **lose access** to the filesystem when unexpected
   shutdowns occur as a result of silent corruptions in the metadata.
   These occur **unpredictably** and often without warning.

2. **Users** experience a **total loss of service** during the recovery period
   after an **unexpected shutdown** occurs.

3. **Users** experience a **total loss of service** if the filesystem is taken
   offline to **look for problems** proactively.

4. **Data owners** cannot **check the integrity** of their stored data without
   reading all of it.
   This may expose them to substantial billing costs when a linear media scan
   performed by the storage system administrator might suffice.

5. **System administrators** cannot **schedule** a maintenance window to deal
   with corruptions if they **lack the means** to assess filesystem health
   while the filesystem is online.

6. **Fleet monitoring tools** cannot **automate periodic checks** of filesystem
   health when doing so requires **manual intervention** and downtime.

7. **Users** can be tricked into **doing things they do not desire** when
   malicious actors **exploit quirks of Unicode** to place misleading names
   in directories.
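
To make the last item concrete, the following sketch flags directory entry
names that could mislead a reader.
It assumes two simple heuristics: names containing bidirectional control
characters, and distinct names that become identical after Unicode NFKC
normalization.
The ``suspicious_names`` helper and its heuristics are illustrative
assumptions, not a description of how the tool discussed here detects such
names.

```python
import unicodedata

# A small, illustrative subset of characters that alter text direction.
BIDI_CONTROLS = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
                 "\u2066", "\u2067", "\u2068", "\u2069"}


def suspicious_names(names):
    """Return (name, reason) pairs for names that might mislead a user."""
    warnings = []
    seen = {}  # normalized form -> first name that produced it
    for name in names:
        if any(ch in BIDI_CONTROLS for ch in name):
            warnings.append((name, "contains bidirectional control characters"))
        key = unicodedata.normalize("NFKC", name)
        if key in seen and seen[key] != name:
            warnings.append((name, f"normalizes to the same text as {seen[key]!r}"))
        else:
            seen.setdefault(key, name)
    return warnings


# "\ufb01" is the single-character "fi" ligature; "\u202e" reverses the
# display order of the characters that follow it.
names = ["file.txt", "\ufb01le.txt", "photo\u202egnp.exe"]
found = suspicious_names(names)
```

Both flagged names are valid byte sequences as far as the filesystem is
concerned, which is why this class of problem needs a checker that looks at
names the way a person would read them.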
					
Given this definition of the problems to be solved and the actors who would
benefit, the proposed solution is a third fsck tool that acts on a running
filesystem.

This new third program has three components: an in-kernel facility to check
metadata, an in-kernel facility to repair metadata, and a userspace driver
program to drive fsck activity on a live filesystem.
``xfs_scrub`` is the name of the driver program.
The rest of this document presents the goals and use cases of the new fsck
tool, describes its major design points in connection to those goals, and
discusses the similarities and differences with existing tools.

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| Throughout this document, the existing offline fsck tool can also be     |
| referred to by its current name "``xfs_repair``".                        |
| The userspace driver program for the new online fsck tool can be         |
| referred to as "``xfs_scrub``".                                          |
| The kernel portion of online fsck that validates metadata is called      |
| "online scrub", and the portion of the kernel that fixes metadata is     |
| called "online repair".                                                  |
+--------------------------------------------------------------------------+
					
The naming hierarchy is broken up into objects known as directories and files
and the physical space is split into pieces known as allocation groups.
Sharding enables better performance on highly parallel systems and helps to
contain the damage when corruptions occur.
The division of the filesystem into principal objects (allocation groups and
inodes) means that there are ample opportunities to perform targeted checks and
repairs on a subset of the filesystem.

While this is going on, other parts continue processing IO requests.
Even if a piece of filesystem metadata can only be regenerated by scanning the
entire system, the scan can still be done in the background while other file
operations continue.
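
The benefit of sharding for targeted checks can be sketched as follows.
The example assumes each allocation group carries its own lock and its own
list of space records, so that checking one group never blocks work in
another.
The ``AllocationGroup`` class, its fields, and the overlap check are
hypothetical simplifications and do not reflect the kernel's data structures
or locking rules.

```python
import threading


class AllocationGroup:
    """Hypothetical shard: one group's records plus a lock for that group."""

    def __init__(self, agno, records):
        self.agno = agno
        self.records = records        # list of (start, length) tuples
        self.lock = threading.Lock()  # guards only this group


def check_group(ag):
    """Return True if no two records in this group overlap."""
    with ag.lock:  # other groups remain unlocked during the check
        end = 0
        for start, length in sorted(ag.records):
            if start < end:
                return False
            end = start + length
        return True


def check_filesystem(groups):
    """Check every group in turn, holding one group lock at a time."""
    return {ag.agno: check_group(ag) for ag in groups}


groups = [AllocationGroup(0, [(0, 4), (4, 4)]),
          AllocationGroup(1, [(0, 8), (6, 2)])]
health = check_filesystem(groups)
```

Because only one group is locked at any moment, a repair triggered by the
result for group 1 could likewise be confined to that group.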
					
In summary, online fsck takes advantage of resource sharding and redundant
metadata to enable targeted checking and repair operations while the system
is running.
This capability will be coupled to automatic system management so that
autonomous self-healing of XFS maximizes service availability.