mirror of https://github.com/torvalds/linux.git, synced 2025-11-04 10:40:15 +02:00
			
		
		
		
xfs: document the motivation for online fsck design

Start the first chapter of the online fsck design documentation. This covers
the motivations for creating this in the first place.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

parent 09a9639e56
commit a8f6c2e54d

2 changed files with 213 additions and 0 deletions
				
			
@@ -123,4 +123,5 @@ Documentation for filesystem implementations.
   vfat
   xfs-delayed-logging-design
   xfs-self-describing-metadata
   xfs-online-fsck-design
   zonefs
				
			
			
Documentation/filesystems/xfs-online-fsck-design.rst
				
			
			@ -0,0 +1,212 @@
 | 
			
		|||
.. SPDX-License-Identifier: GPL-2.0
 | 
			
		||||
.. _xfs_online_fsck_design:
 | 
			
		||||
 | 
			
		||||
..
 | 
			
		||||
        Mapping of heading styles within this document:
 | 
			
		||||
        Heading 1 uses "====" above and below
 | 
			
		||||
        Heading 2 uses "===="
 | 
			
		||||
        Heading 3 uses "----"
 | 
			
		||||
        Heading 4 uses "````"
 | 
			
		||||
        Heading 5 uses "^^^^"
 | 
			
		||||
        Heading 6 uses "~~~~"
 | 
			
		||||
        Heading 7 uses "...."
 | 
			
		||||
 | 
			
		||||
        Sections are manually numbered because apparently that's what everyone
 | 
			
		||||
        does in the kernel.
 | 
			
		||||
 | 
			
		||||
======================
 | 
			
		||||
XFS Online Fsck Design
 | 
			
		||||
======================
 | 
			
		||||
 | 
			
This document captures the design of the online filesystem check feature for
XFS.
The purpose of this document is threefold:

- To help kernel distributors understand exactly what the XFS online fsck
  feature is, and issues about which they should be aware.

- To help people reading the code to familiarize themselves with the relevant
  concepts and design points before they start digging into the code.

- To help developers maintaining the system by capturing the reasons
  supporting higher level decision making.

As the online fsck code is merged, the links in this document to topic branches
will be replaced with links to code.

This document is licensed under the terms of the GNU General Public License,
v2.
The primary author is Darrick J. Wong.

This design document is split into seven parts.
Part 1 defines what fsck tools are and the motivations for writing a new one.
Parts 2 and 3 present a high level overview of how the online fsck process
works and how it is tested to ensure correct functionality.
Part 4 discusses the user interface and the intended usage modes of the new
program.
Parts 5 and 6 show off the high level components and how they fit together, and
then present case studies of how each repair function actually works.
Part 7 sums up what has been discussed so far and speculates about what else
might be built atop online fsck.

.. contents:: Table of Contents
   :local:
			
1. What is a Filesystem Check?
==============================

A Unix filesystem has four main responsibilities:

- Provide a hierarchy of names through which application programs can associate
  arbitrary blobs of data for any length of time,

- Virtualize physical storage media across those names,

- Retrieve the named data blobs at any time, and

- Examine resource usage.

Metadata directly supporting these functions (e.g. files, directories, space
mappings) are sometimes called primary metadata.
Secondary metadata (e.g. reverse mapping and directory parent pointers) support
operations internal to the filesystem, such as internal consistency checking
and reorganization.
Summary metadata, as the name implies, condense information contained in
primary metadata for performance reasons.

The filesystem check (fsck) tool examines all the metadata in a filesystem
to look for errors.
In addition to looking for obvious metadata corruptions, fsck also
cross-references different types of metadata records with each other to look
for inconsistencies.
People do not like losing data, so most fsck tools also contain some ability
to correct any problems found.
As a word of caution -- the primary goal of most Linux fsck tools is to restore
the filesystem metadata to a consistent state, not to maximize the data
recovered.
That precedent will not be challenged here.
			
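As a rough illustration of the cross-referencing described above, the sketch
below checks that each forward (file to disk) mapping has a matching
reverse-mapping record.
The structure and function names are hypothetical and exist only for this
example; real XFS metadata lives in ondisk btrees rather than flat arrays, and
this is not how the kernel or ``xfs_repair`` implements the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical primary metadata: a file extent mapped to disk blocks. */
struct fwd_mapping {
	uint64_t owner;        /* inode number that owns the extent */
	uint64_t start_block;  /* first disk block of the extent */
	uint64_t nr_blocks;    /* length of the extent in blocks */
};

/* Hypothetical secondary metadata: a disk extent mapped back to its owner. */
struct rmap_record {
	uint64_t start_block;
	uint64_t nr_blocks;
	uint64_t owner;
};

/* Does any reverse-mapping record describe exactly this forward mapping? */
bool has_matching_rmap(const struct fwd_mapping *fwd,
		       const struct rmap_record *rmaps, size_t nr_rmaps)
{
	for (size_t i = 0; i < nr_rmaps; i++) {
		if (rmaps[i].owner == fwd->owner &&
		    rmaps[i].start_block == fwd->start_block &&
		    rmaps[i].nr_blocks == fwd->nr_blocks)
			return true;
	}
	return false;
}

/*
 * Cross-reference every forward mapping against the reverse mappings.
 * Returns the number of forward mappings with no matching reverse record.
 */
size_t xref_forward_against_reverse(const struct fwd_mapping *fwds,
				    size_t nr_fwds,
				    const struct rmap_record *rmaps,
				    size_t nr_rmaps)
{
	size_t nr_bad = 0;

	for (size_t i = 0; i < nr_fwds; i++) {
		if (!has_matching_rmap(&fwds[i], rmaps, nr_rmaps))
			nr_bad++;
	}
	return nr_bad;
}
```

A nonzero result only says that the two kinds of records disagree; it does not
say which of them is wrong.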
			
Filesystems of the 20th century generally lacked any redundancy in the ondisk
format, which means that fsck can only respond to errors by erasing files until
errors are no longer detected.
More recent filesystem designs contain enough redundancy in their metadata that
it is now possible to regenerate data structures when non-catastrophic errors
occur; this capability aids both strategies: restoring metadata consistency and
maximizing the data recovered.

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| System administrators avoid data loss by increasing the number of        |
| separate storage systems through the creation of backups; and they avoid |
| downtime by increasing the redundancy of each storage system through the |
| creation of RAID arrays.                                                 |
| fsck tools address only the first problem.                               |
+--------------------------------------------------------------------------+
			
TLDR; Show Me the Code!
-----------------------

Code is posted to the kernel.org git trees as follows:
`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos.
			
Existing Tools
--------------

The online fsck tool described here will be the third tool in the history of
XFS (on Linux) to check and repair filesystems.
Two programs precede it:

The first program, ``xfs_check``, was created as part of the XFS debugger
(``xfs_db``) and can only be used with unmounted filesystems.
It walks all metadata in the filesystem looking for inconsistencies, though it
lacks any ability to repair what it finds.
Due to its high memory requirements and inability to repair things, this
program is now deprecated and will not be discussed further.

The second program, ``xfs_repair``, was created to be faster and more robust
than the first program.
Like its predecessor, it can only be used with unmounted filesystems.
It uses extent-based in-memory data structures to reduce memory consumption,
and tries to schedule readahead I/O appropriately to reduce I/O waiting time
while it scans the metadata of the entire filesystem.
The most important feature of this tool is its ability to respond to
inconsistencies in file metadata and the directory tree by erasing things as
needed to eliminate problems.
Space usage metadata are rebuilt from the observed file metadata.
			
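The idea of rebuilding space usage metadata from observed file metadata can be
sketched as follows.
This is a minimal illustration under stated assumptions, not code taken from
``xfs_repair``: the ``observed_extent`` structure and the
``rebuild_free_block_count`` function are hypothetical, and a flat array of
flags stands in for the real ondisk free space structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of an extent that some file was observed to own. */
struct observed_extent {
	uint64_t start_block;
	uint64_t nr_blocks;
};

/*
 * Rebuild a free block count for a device of total_blocks blocks by marking
 * every block claimed by an observed extent as in use.
 * Returns the number of free blocks, or UINT64_MAX if memory allocation
 * fails or an extent extends past the end of the device.
 */
uint64_t rebuild_free_block_count(const struct observed_extent *extents,
				  size_t nr_extents, uint64_t total_blocks)
{
	uint64_t nr_free = 0;
	bool *in_use = calloc(total_blocks ? total_blocks : 1, sizeof(*in_use));

	if (!in_use)
		return UINT64_MAX;

	for (size_t i = 0; i < nr_extents; i++) {
		const struct observed_extent *ext = &extents[i];

		/* Reject extents that do not fit on the device. */
		if (ext->start_block > total_blocks ||
		    ext->nr_blocks > total_blocks - ext->start_block) {
			free(in_use);
			return UINT64_MAX;
		}
		for (uint64_t b = 0; b < ext->nr_blocks; b++)
			in_use[ext->start_block + b] = true;
	}

	/* Whatever no file claims is considered free space. */
	for (uint64_t b = 0; b < total_blocks; b++) {
		if (!in_use[b])
			nr_free++;
	}

	free(in_use);
	return nr_free;
}
```

The sketch silently tolerates two extents claiming the same block; a real
checker would have to treat that as an inconsistency in its own right.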
			
Problem Statement
-----------------

The current XFS tools leave several problems unsolved:

1. **User programs** suddenly **lose access** to the filesystem when unexpected
   shutdowns occur as a result of silent corruptions in the metadata.
   These occur **unpredictably** and often without warning.

2. **Users** experience a **total loss of service** during the recovery period
   after an **unexpected shutdown** occurs.

3. **Users** experience a **total loss of service** if the filesystem is taken
   offline to **look for problems** proactively.

4. **Data owners** cannot **check the integrity** of their stored data without
   reading all of it.
   This may expose them to substantial billing costs when a linear media scan
   performed by the storage system administrator might suffice.

5. **System administrators** cannot **schedule** a maintenance window to deal
   with corruptions if they **lack the means** to assess filesystem health
   while the filesystem is online.

6. **Fleet monitoring tools** cannot **automate periodic checks** of filesystem
   health when doing so requires **manual intervention** and downtime.

7. **Users** can be tricked into **doing things they do not desire** when
   malicious actors **exploit quirks of Unicode** to place misleading names
   in directories.

Given this definition of the problems to be solved and the actors who would
benefit, the proposed solution is a third fsck tool that acts on a running
filesystem.

This new third program has three components: an in-kernel facility to check
metadata, an in-kernel facility to repair metadata, and a userspace driver
program to drive fsck activity on a live filesystem.
``xfs_scrub`` is the name of the driver program.
The rest of this document presents the goals and use cases of the new fsck
tool, describes its major design points in connection to those goals, and
discusses the similarities and differences with existing tools.
			
+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| Throughout this document, the existing offline fsck tool can also be     |
| referred to by its current name "``xfs_repair``".                        |
| The userspace driver program for the new online fsck tool can be         |
| referred to as "``xfs_scrub``".                                          |
| The kernel portion of online fsck that validates metadata is called      |
| "online scrub", and the portion of the kernel that fixes metadata is     |
| called "online repair".                                                  |
+--------------------------------------------------------------------------+
			
The naming hierarchy is broken up into objects known as directories and files
and the physical space is split into pieces known as allocation groups.
Sharding enables better performance on highly parallel systems and helps to
contain the damage when corruptions occur.
The division of the filesystem into principal objects (allocation groups and
inodes) means that there are ample opportunities to perform targeted checks and
repairs on a subset of the filesystem.

While a targeted check or repair is going on, other parts of the filesystem
continue processing I/O requests.
Even if a piece of filesystem metadata can only be regenerated by scanning the
entire system, the scan can still be done in the background while other file
operations continue.

In summary, online fsck takes advantage of resource sharding and redundant
metadata to enable targeted checking and repair operations while the system
is running.
This capability will be coupled to automatic system management so that
autonomous self-healing of XFS maximizes service availability.