.. _build_sparse:

================
Sparse Checkouts
================

The Firefox repository is large: over 230,000 files. That many files
can put a lot of strain on machines, tools, and processes.

Some version control tools can populate a working directory (checkout)
with only a subset of the files in the repository. This is called a
*sparse checkout*.

Various tools in the Firefox repository are configured to work
when a sparse checkout is being used.

Sparse Checkouts in Mercurial
=============================

Mercurial 4.3 introduced **experimental** support for sparse checkouts
in the official distribution. (A Facebook-authored third-party extension
had provided the feature for years before that.)

To enable sparse checkout support in Mercurial, enable the ``sparse``
extension::

   [extensions]
   sparse =

The *sparseness* of the working directory is managed using
``hg debugsparse``. Run ``hg help debugsparse`` and ``hg help -e sparse``
for more info on the feature.

When a *sparse config* is enabled, the working directory only contains
files matching that config. You cannot ``hg add`` or ``hg remove`` files
outside the *sparse config*.

.. warning::

   Sparse support in Mercurial 4.3 does not have any backwards
   compatibility guarantees. Expect things to change. Scripting against
   commands or relying on behavior is strongly discouraged.

In-Tree Sparse Profiles
=======================

Mercurial supports defining the sparse config using files under version
control. These are called *sparse profiles*.

Sparse profiles are managed just like any other file in the repository.
When you ``hg update``, the sparse configuration is evaluated against
the sparse profile at the revision being updated to. From an end-user
perspective, you just need to *activate* a profile once, and files will
be added or removed as appropriate whenever the versioned profile file
changes.

In the Firefox repository, the ``build/sparse-profiles`` directory
contains Mercurial *sparse profile* files.

Each *sparse profile* defines a list of file patterns
(see ``hg help patterns``) to include or exclude. See
``hg help -e sparse`` for more.

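
As a rough illustration of the include/exclude structure of a profile,
the sketch below splits a simplified profile into pattern lists. It
assumes the ``[include]`` / ``[exclude]`` sections and ``%include``
lines described in ``hg help -e sparse``. The function name
``parse_profile`` and the sample profile contents are hypothetical, and
Mercurial's own parser handles more cases than this.

.. code-block:: python

   def parse_profile(text):
       """Split a simplified sparse profile into pattern lists.

       Returns (includes, excludes, nested_profiles). This is a sketch,
       not Mercurial's parser.
       """
       patterns = {"include": [], "exclude": []}
       nested_profiles = []
       section = None
       for raw in text.splitlines():
           line = raw.strip()
           if not line or line.startswith("#"):
               continue  # skip blank lines and comments
           if line.startswith("%include "):
               nested_profiles.append(line[len("%include "):].strip())
           elif line in ("[include]", "[exclude]"):
               section = line[1:-1]
           elif section is None:
               # Sketch choice: reject patterns that precede any section.
               raise ValueError("pattern outside of a section: %s" % line)
           else:
               patterns[section].append(line)
       return patterns["include"], patterns["exclude"], nested_profiles


   # Hypothetical profile contents, for illustration only.
   SAMPLE = """\
   %include build/sparse-profiles/mach

   [include]
   path:taskcluster/
   glob:**/*.yml

   [exclude]
   path:taskcluster/docs/
   """

   includes, excludes, nested = parse_profile(SAMPLE)

Here ``includes`` and ``excludes`` hold the raw pattern strings, and
``nested`` lists other profiles the sample pulls in.
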

Mach Support for Sparse Checkouts
=================================

``mach`` detects when a sparse checkout is being used, and its
behavior may vary to accommodate this.

By default, it is a fatal error if ``mach`` can't load one of the
``mach_commands.py`` files it was told to load. But if a sparse checkout
is being used, ``mach`` assumes the file isn't part of the sparse
checkout and ignores the missing-file error. This means that ``mach``
running inside a sparse checkout only has access to the commands
defined in files that are part of the sparse checkout.

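
The loading rule described above can be sketched as follows. This is
not ``mach``'s actual code: ``load_command_files`` and its
``sparse_checkout`` flag are hypothetical names, and the example only
shows a missing file being fatal in a full checkout but skipped in a
sparse one.

.. code-block:: python

   import os
   import tempfile


   def load_command_files(paths, sparse_checkout):
       """Return the command files that exist.

       A missing file is skipped in a sparse checkout and is an error
       otherwise.
       """
       loaded = []
       for path in paths:
           if os.path.exists(path):
               loaded.append(path)
           elif sparse_checkout:
               # Assume the file was left out of the sparse checkout.
               continue
           else:
               raise FileNotFoundError("missing command file: %s" % path)
       return loaded


   with tempfile.TemporaryDirectory() as topsrcdir:
       present = os.path.join(topsrcdir, "mach_commands.py")
       absent = os.path.join(topsrcdir, "tools", "mach_commands.py")
       open(present, "w").close()

       # Sparse checkout: the missing file is ignored.
       sparse_result = load_command_files([present, absent], sparse_checkout=True)

       # Full checkout: the same missing file is fatal.
       try:
           load_command_files([present, absent], sparse_checkout=False)
           full_checkout_failed = False
       except FileNotFoundError:
           full_checkout_failed = True
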

Sparse Checkouts in Automation
==============================

``hg robustcheckout`` (the extension/command used to perform clones
and working directory operations in automation) supports sparse checkout.
However, it has a number of limitations compared to Mercurial's default
sparse checkout implementation:

* Only supports 1 profile at a time
* Does not support non-profile sparse configs
* Does not allow transitioning from a non-sparse to sparse checkout or
  vice-versa

These restrictions ensure that any sparse working directory populated by
``hg robustcheckout`` is as consistent and robust as possible.

``run-task`` (the low-level script for *bootstrapping* tasks in
automation) has support for sparse checkouts.

TaskGraph tasks using ``run-task`` can specify a ``sparse-profile``
attribute in YAML (or in code) to denote the sparse profile file to
use, e.g.::

   run:
       using: run-command
       command: <command>
       sparse-profile: taskgraph

This automagically results in ``run-task`` and ``hg robustcheckout``
using the sparse profile defined in ``build/sparse-profiles/<value>``.

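
The mapping from the ``sparse-profile`` value to a profile file can be
sketched as below. The helper ``sparse_profile_path`` is a hypothetical
name, not part of TaskGraph or ``run-task``. It only joins the value
onto the ``build/sparse-profiles`` directory named above.

.. code-block:: python

   import posixpath

   # Directory named in the text above.
   SPARSE_PROFILE_DIR = "build/sparse-profiles"


   def sparse_profile_path(run):
       """Return the repository-relative profile path for a ``run`` mapping.

       Returns None when the task does not request a sparse profile.
       """
       profile = run.get("sparse-profile")
       if profile is None:
           return None
       return posixpath.join(SPARSE_PROFILE_DIR, profile)


   run = {
       "using": "run-command",
       "command": "<command>",
       "sparse-profile": "taskgraph",
   }
   profile_path = sparse_profile_path(run)
   print(profile_path)  # build/sparse-profiles/taskgraph
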

Pros and Cons of Sparse Checkouts
=================================

The benefit of a sparse checkout is that it makes the repository appear
smaller. This means:

* Less time performing working directory operations -> faster version
  control operations
* Fewer files to consult -> faster operations
* Working directories only contain what is needed -> easier to understand
  what everything does

Having fewer files in the working directory also has disadvantages:

* Searching may not yield hits because a file isn't in the sparse
  checkout, e.g. a *global* search and replace may not actually be
  *global* after all.
* Tools performing filesystem walking or path globbing (e.g.
  ``**/*.js``) may fail to find files because they don't exist.
* Various tools and processes assume that all files in the
  repository are always available.

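
One possible way for a tool to surface these failures early, rather
than silently finding nothing, is to check the paths it depends on
before walking the tree. The sketch below is only an illustration of
that idea: ``missing_paths`` and the listed paths are hypothetical and
are not an existing Firefox helper.

.. code-block:: python

   import os
   import tempfile


   def missing_paths(root, required):
       """Return the required repository-relative paths absent under root."""
       return [p for p in required if not os.path.exists(os.path.join(root, p))]


   with tempfile.TemporaryDirectory() as checkout:
       # Simulate a sparse checkout that only contains build/moz.build.
       os.makedirs(os.path.join(checkout, "build"))
       open(os.path.join(checkout, "build", "moz.build"), "w").close()

       # Hypothetical inputs that a tool might rely on.
       required = ["build/moz.build", "browser/moz.build"]
       missing = missing_paths(checkout, required)
       print(missing)  # ['browser/moz.build']
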

There can also be problems caused by mixing sparse and non-sparse
checkouts. For example, if a process in automation is using sparse
and a local developer is not, things may work for the local developer
but fail in automation (because a file isn't included in the sparse
configuration and so is not available to automation). Furthermore, if
environments aren't using exactly the same sparse configuration, the
differences can contribute to varying behavior.

When Should Sparse Checkouts Be Used?
=====================================

Developers are discouraged from using sparse checkouts for local work
until tools for handling sparse checkouts have improved. In particular,
Mercurial's support for sparse is still experimental and various Firefox
tools assume that all files are available. Developers should
use sparse checkouts at their own risk.

The use of sparse checkouts in automation is a performance versus
robustness trade-off. Use of sparse checkouts will make automation
faster because machines will only have to manage a few thousand files
in a checkout instead of a few hundred thousand. This can potentially
translate to minutes saved per machine day. At the scale of thousands
of machines, the savings can be significant. But adopting sparse
checkouts will open up new avenues for failures. (See the disadvantages
listed above.) If a process is isolated (in terms of file access) and
well-understood, sparse checkout can likely be leveraged with little
risk. But if a process is doing things like walking the filesystem and
performing lots of wildcard matching, the dangers are higher.