mirror of
				https://github.com/torvalds/linux.git
				synced 2025-11-04 02:30:34 +02:00 
			
		
		
		
	[PATCH] ramfs, rootfs, and initramfs docs
Docs for ramfs, rootfs, and initramfs. Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This commit is contained in:
		
							parent
							
								
									cbf8f0f36a
								
							
						
					
					
						commit
						7f46a240b0
					
				
					 1 changed files with 195 additions and 0 deletions
				
			
		
							
								
								
									
										195
									
								
								Documentation/filesystems/ramfs-rootfs-initramfs.txt
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										195
									
								
								Documentation/filesystems/ramfs-rootfs-initramfs.txt
									
									
									
									
									
										Normal file
									
								
							| 
						 | 
				
			
			@ -0,0 +1,195 @@
 | 
			
		|||
ramfs, rootfs and initramfs
 | 
			
		||||
October 17, 2005
 | 
			
		||||
Rob Landley <rob@landley.net>
 | 
			
		||||
=============================
 | 
			
		||||
 | 
			
		||||
What is ramfs?
 | 
			
		||||
--------------
 | 
			
		||||
 | 
			
		||||
Ramfs is a very simple filesystem that exports Linux's disk caching
 | 
			
		||||
mechanisms (the page cache and dentry cache) as a dynamically resizable
 | 
			
		||||
ram-based filesystem.
 | 
			
		||||
 | 
			
		||||
Normally all files are cached in memory by Linux.  Pages of data read from
 | 
			
		||||
backing store (usually the block device the filesystem is mounted on) are kept
 | 
			
		||||
around in case it's needed again, but marked as clean (freeable) in case the
 | 
			
		||||
Virtual Memory system needs the memory for something else.  Similarly, data
 | 
			
		||||
written to files is marked clean as soon as it has been written to backing
 | 
			
		||||
store, but kept around for caching purposes until the VM reallocates the
 | 
			
		||||
memory.  A similar mechanism (the dentry cache) greatly speeds up access to
 | 
			
		||||
directories.
 | 
			
		||||
 | 
			
		||||
With ramfs, there is no backing store.  Files written into ramfs allocate
 | 
			
		||||
dentries and page cache as usual, but there's nowhere to write them to.
 | 
			
		||||
This means the pages are never marked clean, so they can't be freed by the
 | 
			
		||||
VM when it's looking to recycle memory.
 | 
			
		||||
 | 
			
		||||
The amount of code required to implement ramfs is tiny, because all the
 | 
			
		||||
work is done by the existing Linux caching infrastructure.  Basically,
 | 
			
		||||
you're mounting the disk cache as a filesystem.  Because of this, ramfs is not
 | 
			
		||||
an optional component removable via menuconfig, since there would be negligible
 | 
			
		||||
space savings.
 | 
			
		||||
 | 
			
		||||
ramfs and ramdisk:
 | 
			
		||||
------------------
 | 
			
		||||
 | 
			
		||||
The older "ram disk" mechanism created a synthetic block device out of
 | 
			
		||||
an area of ram and used it as backing store for a filesystem.  This block
 | 
			
		||||
device was of fixed size, so the filesystem mounted on it was of fixed
 | 
			
		||||
size.  Using a ram disk also required unnecessarily copying memory from the
 | 
			
		||||
fake block device into the page cache (and copying changes back out), as well
 | 
			
		||||
as creating and destroying dentries.  Plus it needed a filesystem driver
 | 
			
		||||
(such as ext2) to format and interpret this data.
 | 
			
		||||
 | 
			
		||||
Compared to ramfs, this wastes memory (and memory bus bandwidth), creates
 | 
			
		||||
unnecessary work for the CPU, and pollutes the CPU caches.  (There are tricks
 | 
			
		||||
to avoid this copying by playing with the page tables, but they're unpleasantly
 | 
			
		||||
complicated and turn out to be about as expensive as the copying anyway.)
 | 
			
		||||
More to the point, all the work ramfs is doing has to happen _anyway_,
 | 
			
		||||
since all file access goes through the page and dentry caches.  The ram
 | 
			
		||||
disk is simply unnecessary, ramfs is internally much simpler.
 | 
			
		||||
 | 
			
		||||
Another reason ramdisks are semi-obsolete is that the introduction of
 | 
			
		||||
loopback devices offered a more flexible and convenient way to create
 | 
			
		||||
synthetic block devices, now from files instead of from chunks of memory.
 | 
			
		||||
See losetup (8) for details.
 | 
			
		||||
 | 
			
		||||
ramfs and tmpfs:
 | 
			
		||||
----------------
 | 
			
		||||
 | 
			
		||||
One downside of ramfs is you can keep writing data into it until you fill
 | 
			
		||||
up all memory, and the VM can't free it because the VM thinks that files
 | 
			
		||||
should get written to backing store (rather than swap space), but ramfs hasn't
 | 
			
		||||
got any backing store.  Because of this, only root (or a trusted user) should
 | 
			
		||||
be allowed write access to a ramfs mount.
 | 
			
		||||
 | 
			
		||||
A ramfs derivative called tmpfs was created to add size limits, and the ability
 | 
			
		||||
to write the data to swap space.  Normal users can be allowed write access to
 | 
			
		||||
tmpfs mounts.  See Documentation/filesystems/tmpfs.txt for more information.
 | 
			
		||||
 | 
			
		||||
What is rootfs?
 | 
			
		||||
---------------
 | 
			
		||||
 | 
			
		||||
Rootfs is a special instance of ramfs, which is always present in 2.6 systems.
 | 
			
		||||
(It's used internally as the starting and stopping point for searches of the
 | 
			
		||||
kernel's doubly-linked list of mount points.)
 | 
			
		||||
 | 
			
		||||
Most systems just mount another filesystem over it and ignore it.  The
 | 
			
		||||
amount of space an empty instance of ramfs takes up is tiny.
 | 
			
		||||
 | 
			
		||||
What is initramfs?
 | 
			
		||||
------------------
 | 
			
		||||
 | 
			
		||||
All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is
 | 
			
		||||
extracted into rootfs when the kernel boots up.  After extracting, the kernel
 | 
			
		||||
checks to see if rootfs contains a file "init", and if so it executes it as PID
 | 
			
		||||
1.  If found, this init process is responsible for bringing the system the
 | 
			
		||||
rest of the way up, including locating and mounting the real root device (if
 | 
			
		||||
any).  If rootfs does not contain an init program after the embedded cpio
 | 
			
		||||
archive is extracted into it, the kernel will fall through to the older code
 | 
			
		||||
to locate and mount a root partition, then exec some variant of /sbin/init
 | 
			
		||||
out of that.
 | 
			
		||||
 | 
			
		||||
All this differs from the old initrd in several ways:
 | 
			
		||||
 | 
			
		||||
  - The old initrd was a separate file, while the initramfs archive is linked
 | 
			
		||||
    into the linux kernel image.  (The directory linux-*/usr is devoted to
 | 
			
		||||
    generating this archive during the build.)
 | 
			
		||||
 | 
			
		||||
  - The old initrd file was a gzipped filesystem image (in some file format,
 | 
			
		||||
    such as ext2, that had to be built into the kernel), while the new
 | 
			
		||||
    initramfs archive is a gzipped cpio archive (like tar only simpler,
 | 
			
		||||
    see cpio(1) and Documentation/early-userspace/buffer-format.txt).
 | 
			
		||||
 | 
			
		||||
  - The program run by the old initrd (which was called /initrd, not /init) did
 | 
			
		||||
    some setup and then returned to the kernel, while the init program from
 | 
			
		||||
    initramfs is not expected to return to the kernel.  (If /init needs to hand
 | 
			
		||||
    off control it can overmount / with a new root device and exec another init
 | 
			
		||||
    program.  See the switch_root utility, below.)
 | 
			
		||||
 | 
			
		||||
  - When switching another root device, initrd would pivot_root and then
 | 
			
		||||
    umount the ramdisk.  But initramfs is rootfs: you can neither pivot_root
 | 
			
		||||
    rootfs, nor unmount it.  Instead delete everything out of rootfs to
 | 
			
		||||
    free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs
 | 
			
		||||
    with the new root (cd /newmount; mount --move . /; chroot .), attach
 | 
			
		||||
    stdin/stdout/stderr to the new /dev/console, and exec the new init.
 | 
			
		||||
 | 
			
		||||
    Since this is a remarkably persnickity process (and involves deleting
 | 
			
		||||
    commands before you can run them), the klibc package introduced a helper
 | 
			
		||||
    program (utils/run_init.c) to do all this for you.  Most other packages
 | 
			
		||||
    (such as busybox) have named this command "switch_root".
 | 
			
		||||
 | 
			
		||||
Populating initramfs:
 | 
			
		||||
---------------------
 | 
			
		||||
 | 
			
		||||
The 2.6 kernel build process always creates a gzipped cpio format initramfs
 | 
			
		||||
archive and links it into the resulting kernel binary.  By default, this
 | 
			
		||||
archive is empty (consuming 134 bytes on x86).  The config option
 | 
			
		||||
CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices
 | 
			
		||||
in menuconfig, and living in usr/Kconfig) can be used to specify a source for
 | 
			
		||||
the initramfs archive, which will automatically be incorporated into the
 | 
			
		||||
resulting binary.  This option can point to an existing gzipped cpio archive, a
 | 
			
		||||
directory containing files to be archived, or a text file specification such
 | 
			
		||||
as the following example:
 | 
			
		||||
 | 
			
		||||
  dir /dev 755 0 0
 | 
			
		||||
  nod /dev/console 644 0 0 c 5 1
 | 
			
		||||
  nod /dev/loop0 644 0 0 b 7 0
 | 
			
		||||
  dir /bin 755 1000 1000
 | 
			
		||||
  slink /bin/sh busybox 777 0 0
 | 
			
		||||
  file /bin/busybox initramfs/busybox 755 0 0
 | 
			
		||||
  dir /proc 755 0 0
 | 
			
		||||
  dir /sys 755 0 0
 | 
			
		||||
  dir /mnt 755 0 0
 | 
			
		||||
  file /init initramfs/init.sh 755 0 0
 | 
			
		||||
 | 
			
		||||
One advantage of the text file is that root access is not required to
 | 
			
		||||
set permissions or create device nodes in the new archive.  (Note that those
 | 
			
		||||
two example "file" entries expect to find files named "init.sh" and "busybox" in
 | 
			
		||||
a directory called "initramfs", under the linux-2.6.* directory.  See
 | 
			
		||||
Documentation/early-userspace/README for more details.)
 | 
			
		||||
 | 
			
		||||
If you don't already understand what shared libraries, devices, and paths
 | 
			
		||||
you need to get a minimal root filesystem up and running, here are some
 | 
			
		||||
references:
 | 
			
		||||
http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
 | 
			
		||||
http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
 | 
			
		||||
http://www.linuxfromscratch.org/lfs/view/stable/
 | 
			
		||||
 | 
			
		||||
The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
 | 
			
		||||
designed to be a tiny C library to statically link early userspace
 | 
			
		||||
code against, along with some related utilities.  It is BSD licensed.
 | 
			
		||||
 | 
			
		||||
I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
 | 
			
		||||
myself.  These are LGPL and GPL, respectively.
 | 
			
		||||
 | 
			
		||||
In theory you could use glibc, but that's not well suited for small embedded
 | 
			
		||||
uses like this.  (A "hello world" program statically linked against glibc is
 | 
			
		||||
over 400k.  With uClibc it's 7k.  Also note that glibc dlopens libnss to do
 | 
			
		||||
name lookups, even when otherwise statically linked.)
 | 
			
		||||
 | 
			
		||||
Future directions:
 | 
			
		||||
------------------
 | 
			
		||||
 | 
			
		||||
Today (2.6.14), initramfs is always compiled in, but not always used.  The
 | 
			
		||||
kernel falls back to legacy boot code that is reached only if initramfs does
 | 
			
		||||
not contain an /init program.  The fallback is legacy code, there to ensure a
 | 
			
		||||
smooth transition and allowing early boot functionality to gradually move to
 | 
			
		||||
"early userspace" (I.E. initramfs).
 | 
			
		||||
 | 
			
		||||
The move to early userspace is necessary because finding and mounting the real
 | 
			
		||||
root device is complex.  Root partitions can span multiple devices (raid or
 | 
			
		||||
separate journal).  They can be out on the network (requiring dhcp, setting a
 | 
			
		||||
specific mac address, logging into a server, etc).  They can live on removable
 | 
			
		||||
media, with dynamically allocated major/minor numbers and persistent naming
 | 
			
		||||
issues requiring a full udev implementation to sort out.  They can be
 | 
			
		||||
compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned,
 | 
			
		||||
and so on.
 | 
			
		||||
 | 
			
		||||
This kind of complexity (which inevitably includes policy) is rightly handled
 | 
			
		||||
in userspace.  Both klibc and busybox/uClibc are working on simple initramfs
 | 
			
		||||
packages to drop into a kernel build, and when standard solutions are ready
 | 
			
		||||
and widely deployed, the kernel's legacy early boot code will become obsolete
 | 
			
		||||
and a candidate for the feature removal schedule.
 | 
			
		||||
 | 
			
		||||
But that's a while off yet.
 | 
			
		||||
		Loading…
	
		Reference in a new issue