Commit graph

50 commits

Nick Thomas
0c309d25d1 Bug 1630809 - when downloading artifacts using fetch-content, optionally verify hash using chain-of-trust.json r=aki
This improves the integrity of downloads of upstream artifacts when using fetch-content. If `verify-hash: True` is set on the fetch config, then the chain-of-trust.json of the upstream task is used to retrieve the expected sha256 of the artifact, and the downloaded file is checked against it.

Differential Revision: https://phabricator.services.mozilla.com/D87725
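A minimal sketch of the check described above, assuming the upstream chain-of-trust.json exposes an `artifacts` mapping from artifact path to a `sha256` value (the JSON layout, URL, and function name here are illustrative assumptions, not taken from the patch):

```python
import hashlib
import json
import urllib.request

def verify_artifact_hash(root_url, task_id, artifact_path, local_file):
    # Fetch the upstream task's chain-of-trust.json (URL layout assumed).
    cot_url = "{}/api/queue/v1/task/{}/artifacts/public/chain-of-trust.json".format(
        root_url, task_id)
    with urllib.request.urlopen(cot_url) as fh:
        cot = json.load(fh)

    # Assumed layout: {"artifacts": {"<path>": {"sha256": "<hex digest>"}}}.
    expected = cot["artifacts"][artifact_path]["sha256"]

    # Hash the downloaded file in chunks and compare.
    digest = hashlib.sha256()
    with open(local_file, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected:
        raise ValueError("sha256 mismatch for {}".format(artifact_path))
```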
2020-08-27 22:19:46 +00:00
Butkovits Atila
b8629b8d1e Backed out 9 changesets (bug 1630809, bug 1653476) for Gecko Decision failures. CLOSED TREE
Backed out changeset 02a27bfc76dd (bug 1653476)
Backed out changeset afb5df61943a (bug 1630809)
Backed out changeset 04628c1f98e9 (bug 1630809)
Backed out changeset 4b4d50e0b1bf (bug 1630809)
Backed out changeset 2fa2deb5c993 (bug 1630809)
Backed out changeset d6652114cac3 (bug 1630809)
Backed out changeset ad5e4caa3291 (bug 1630809)
Backed out changeset d3d841cd14f3 (bug 1630809)
Backed out changeset b3746502e227 (bug 1630809)
2020-08-28 01:15:03 +03:00
Nick Thomas
a2c4b8f1ea Bug 1630809 - when downloading artifacts using fetch-content, optionally verify hash using chain-of-trust.json r=aki
This improves the integrity of downloads of upstream artifacts when using fetch-content. If `verify-hash: True` is set on the fetch config, then the chain-of-trust.json of the upstream task is used to retrieve the expected sha256 of the artifact, and the downloaded file is checked against it.

Differential Revision: https://phabricator.services.mozilla.com/D87725
2020-08-27 05:28:00 +00:00
Tom Ritter
e6b8454b50 Bug 1616925 - Support a taskcluster-based ssh key for fetch jobs r=tomprince
Differential Revision: https://phabricator.services.mozilla.com/D81448
2020-08-03 15:33:01 +00:00
Noemi Erli
e91002b722 Backed out changeset 359f9a3acc75 (bug 1616925) for causing failures in test_2_conformance2__glsl3__matrix-row-major-dynamic-indexing.html CLOSED TREE 2020-08-03 22:35:34 +03:00
Tom Ritter
58fc2fa062 Bug 1616925 - Support a taskcluster-based ssh key for fetch jobs r=tomprince
Differential Revision: https://phabricator.services.mozilla.com/D81448
2020-08-03 15:33:01 +00:00
Mike Hommey
a760600961 Bug 1634560 - Fix fetch-config for git repos with submodules. r=dmajor
There are cases where --recurse-submodules breaks things (e.g. when
newer versions of the repository remove a submodule). So don't recurse
into submodules at all at clone or checkout time; instead, initialize
and update submodules after the checkout.

Also don't check out at clone time: it's redundant with the explicit
checkout, and that's the only checkout we really trust anyway, so it's
better not to check out during the clone.

Differential Revision: https://phabricator.services.mozilla.com/D73353
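A sketch of the resulting flow, with illustrative subprocess calls rather than the actual fetch-config code:

```python
import subprocess

def git_checkout_with_submodules(repo_url, dest, revision):
    # Clone without checking out and without --recurse-submodules.
    subprocess.run(["git", "clone", "--no-checkout", repo_url, dest], check=True)
    # The one checkout we explicitly trust.
    subprocess.run(["git", "checkout", revision], cwd=dest, check=True)
    # Initialize and update submodules only after the checkout, so a
    # submodule removed in a newer revision can't break the clone itself.
    subprocess.run(["git", "submodule", "init"], cwd=dest, check=True)
    subprocess.run(["git", "submodule", "update"], cwd=dest, check=True)
```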
2020-05-02 06:18:33 +00:00
Mike Hommey
f8e37d2100 Bug 1621845 - Normalize fetch path in fetch-content. r=rstewart
The win64-aarch64 jobs use a kind of nasty trick that makes fetch-content
download artifacts of a dependent task directly as artifacts of the task
itself. For some reason, while this pattern works on native Windows
jobs, it doesn't on Linux. What happens is essentially that:

  `pathlib.Path(path).joinpath('../foo').mkdir(parents=True, exist_ok=True)`

fails when path doesn't exist first. I guess the fetches directory
already exists on the Windows workers or something.

Unfortunately, os.path.normpath doesn't take `pathlib.Path`s in
still-supported Python 3.5, so we have to convert to str first.

Differential Revision: https://phabricator.services.mozilla.com/D66518

--HG--
extra : moz-landing-system : lando
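The workaround, roughly (the function name is hypothetical):

```python
import os
import pathlib

def mkdir_normalized(path, subdir):
    # Normalize "fetches/../foo"-style paths before mkdir(); convert to str
    # first because os.path.normpath didn't accept pathlib.Path on Python 3.5.
    target = pathlib.Path(os.path.normpath(str(pathlib.Path(path) / subdir)))
    target.mkdir(parents=True, exist_ok=True)
    return target
```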
2020-03-19 08:18:37 +00:00
Mike Hommey
7ef39a5fd8 Bug 1617043 - Track the time spent in fetch-content and mach artifact toolchain. r=rstewart
Note: while we can use time.monotonic in fetch-content, we can't in
mach artifact toolchain yet, because the latter is still Python 2.

Differential Revision: https://phabricator.services.mozilla.com/D65690

--HG--
extra : moz-landing-system : lando
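A sketch of the kind of timing this adds on the Python 3 side (names are illustrative):

```python
import time

def timed_download(download, url):
    # time.monotonic() is unaffected by system clock adjustments, which is
    # why it's preferred where Python 3 is available.
    start = time.monotonic()
    try:
        return download(url)
    finally:
        print("Downloaded {} in {:.2f}s".format(url, time.monotonic() - start))
```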
2020-03-07 10:46:14 +00:00
Justin Wood
b627a90bcf No Bug - Remove taskcluster.net references in the tree. r=aki
Differential Revision: https://phabricator.services.mozilla.com/D58297

--HG--
extra : moz-landing-system : lando
2020-01-24 15:52:50 +00:00
Noemi Erli
8c4ff0fb12 Backed out changeset cf3d74d0cf82 per Callek's request DONTBUILD CLOSED TREE 2020-01-24 17:48:10 +02:00
Justin Wood
19e5f06716 No Bug - Remove taskcluster.net references in the tree.
Differential Revision: https://phabricator.services.mozilla.com/D58297
2020-01-24 00:16:37 +02:00
Andreea Pavel
38dd93c9be Backed out changeset c5a138a88095 on request on a CLOSED TREE 2020-01-24 00:29:17 +02:00
Justin Wood
e38c52acbe No Bug - Remove taskcluster.net references in the tree.
Differential Revision: https://phabricator.services.mozilla.com/D58297
2020-01-24 00:16:37 +02:00
Sebastian Hengst
e2dd028d86 Backed out changeset bbd910f6301a because it only landed to build toolchains and docker images. CLOSED TREE DONTBUILD
It will be relanded once these are complete. This prevents those tasks
from getting scheduled for every push until the initial ones have completed.
2020-01-06 17:09:20 +01:00
Justin Wood
3835fde8ca No Bug - Remove taskcluster.net references in the tree. r=aki CLOSED TREE
Differential Revision: https://phabricator.services.mozilla.com/D58297

--HG--
extra : amend_source : 0bcd812ae330be7a69ec60f60034533f15e58769
2020-01-03 20:52:34 +01:00
Mike Shal
9b622424d1 Bug 1582189 - Include submodules in git fetch tasks; r=froydnj
Using git-archive for the fetch task means that we don't get the
submodules of a git repository included in the archive. There isn't a
straightforward way to get submodules from a bare repo included with
git-archive, so instead we can simply clone & checkout with
--recurse-submodules and then use a standard tar command to bundle up
the tree.

Adding --recurse-submodules to the commands has no effect on a repo
without submodules, so we can add it to all invocations for simplicity.

Differential Revision: https://phabricator.services.mozilla.com/D46827

--HG--
extra : moz-landing-system : lando
2019-09-25 20:46:24 +00:00
Rob Thijssen
142d7d3127 Bug 1582726 - use cafile from certifi when available r=dustin
Python's `urllib.request.urlopen(url)` can fail when a system doesn't know how to verify a CA certificate. This patch makes use of the cafile provided by the `certifi` module, if/when it is installed, to verify certificates.

Differential Revision: https://phabricator.services.mozilla.com/D47044

--HG--
extra : source : 92b9ffc8f37ddd16ca3f426d64df059eea38d5fa
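A sketch of the fallback described above; the helper name is hypothetical, but the certifi and ssl calls are standard:

```python
import ssl
import urllib.request

def urlopen_with_certifi(url):
    # Prefer certifi's CA bundle when the module is installed; otherwise
    # fall back to the system's default certificate verification.
    try:
        import certifi
        context = ssl.create_default_context(cafile=certifi.where())
    except ImportError:
        context = None
    return urllib.request.urlopen(url, context=context)
```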
2019-09-26 09:17:15 +00:00
Noemi Erli
5e34ed9990 Backed out changeset 92b9ffc8f37d (bug 1582726) for causing fetch bustages CLOSED TREE 2019-09-26 14:14:17 +03:00
Rob Thijssen
37c23f431d Bug 1582726 - use cafile from certifi when available r=dustin
Python's `urllib.request.urlopen(url)` can fail when a system doesn't know how to verify a CA certificate. This patch makes use of the cafile provided by the `certifi` module, if/when it is installed, to verify certificates.

Differential Revision: https://phabricator.services.mozilla.com/D47044

--HG--
extra : moz-landing-system : lando
2019-09-26 09:17:15 +00:00
Dustin J. Mitchell
b6c8e578bf Bug 1572132 - fix URL generation in fetch-content r=glandium
MANUAL PUSH: to allow docker images to build without closing autoland

Differential Revision: https://phabricator.services.mozilla.com/D41038

--HG--
extra : rebase_source : 60ae00549917411d1839b6e3f8e6ae962d217470
extra : amend_source : a2531b115f5732345f8c34c88669428510d100a4
2019-08-07 15:53:15 +00:00
Mike Hommey
b3c14183b8 Bug 1571589 - Allow simple manipulation of file paths in fetched archives. r=tomprince
Namely:
- adding a prefix,
- stripping path components.

Differential Revision: https://phabricator.services.mozilla.com/D40741
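A rough sketch of what those two manipulations amount to for a tar archive (the function and option names are hypothetical, not the patch's actual interface):

```python
import tarfile

def rewrite_member_names(in_tar, out_tar, strip_components=0, add_prefix=""):
    # Rewrite the paths stored in the archive without touching file
    # contents: drop leading path components and/or prepend a prefix.
    with tarfile.open(in_tar) as src, tarfile.open(out_tar, "w") as dst:
        for member in src.getmembers():
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            name = "/".join(parts)
            if add_prefix:
                name = add_prefix.rstrip("/") + "/" + name
            member.name = name
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
```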
2019-08-07 13:54:26 +09:00
Mike Hommey
890f87dad8 Bug 1571589 - Allow to repack downloaded archives "on the fly". r=tomprince
Bug 1479533 proposed adding similar functionality, but this
iteration avoids actually unpacking anything, and ensures
reproducibility by relying on the reproducible bits from the original
archives: file ordering, flags, etc. (since the archives are checksummed,
those bits are never going to change for a given archive).

Another notable difference is that this applies the repack in the fetch
task itself, rather than creating a separate task to apply the repack. The
latter has advantages, in that it allows changing the repacking without
redownloading the original file from a third-party server, but in
practice, most changes to the repacking would trigger the download tasks
anyway.

This patch only takes care of changing the archive type (zip->tar), and
the compression type (anything->zstandard).

Differential Revision: https://phabricator.services.mozilla.com/D40740
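A rough, illustrative sketch of a zip-to-tar.zst repack that never extracts to disk, using the `zstandard` module (the real code also preserves modes, symlinks, and other metadata):

```python
import tarfile
import zipfile
import zstandard

def repack_zip_to_tar_zst(zip_path, out_path):
    with open(out_path, "wb") as raw, \
            zstandard.ZstdCompressor().stream_writer(raw) as compressed, \
            tarfile.open(mode="w|", fileobj=compressed) as tar, \
            zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            member = tarfile.TarInfo(info.filename)
            member.size = info.file_size
            # Stream each member from the zip straight into the tar.
            with zf.open(info) as fh:
                tar.addfile(member, fh)
```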
2019-08-07 13:54:25 +09:00
Mike Hommey
525bccdd60 Bug 1571589 - Abstract opening a temporary file and renaming it after close. r=tomprince
And use that in git_checkout_archive.

Differential Revision: https://phabricator.services.mozilla.com/D40739
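The pattern, sketched (this is an illustration of the abstraction, not the actual helper):

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def write_then_rename(path, mode="wb"):
    # Write to a temporary file in the same directory and only rename it to
    # its final name after a successful close, so readers never observe a
    # partially-written file.
    dirname = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, mode) as fh:
            yield fh
        os.rename(tmp_path, path)
    except Exception:
        os.unlink(tmp_path)
        raise
```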
2019-08-07 13:54:24 +09:00
Mike Hommey
34a2eebc79 Bug 1571589 - Use urlparse rather than relying on just splitting on / being enough. r=tomprince
Differential Revision: https://phabricator.services.mozilla.com/D40738
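For illustration only (the URL is made up), urlparse keeps the pieces that naive splitting on `/` conflates:

```python
from urllib.parse import urlparse

url = "https://example.com/v1/task/abc123/artifacts/public/build/target.zip?query=1"
parts = urlparse(url)
print(parts.netloc)                # example.com
print(parts.path.split("/")[-1])   # target.zip -- the query string stays out of the path
```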
2019-08-07 13:54:23 +09:00
Mike Hommey
a57cc9b49f Bug 1570541 - Use tarfile in fetch-content on Windows. r=tomprince
Differential Revision: https://phabricator.services.mozilla.com/D40401
2019-08-07 13:54:14 +09:00
Mike Hommey
21665e187c Bug 1569124 - Add git support to fetch tasks. r=tomprince
This is loosely based on what was in bug 1467359, but simplified to
handle git only, simply using git-archive because, at least for now,
it's deterministic (it uses the commit date as the timestamp in tar
archives).

This also adds 4 tasks for some of the things we use for toolchains, but
doesn't hook them up yet.

This also upgrades the fetch docker image to Debian buster, and installs
the required packages in it.

Differential Revision: https://phabricator.services.mozilla.com/D39480
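A sketch of the git-archive approach (commands are illustrative, not the task's exact invocation):

```python
import subprocess

def archive_git_revision(repo_url, revision, out_tar):
    # A bare clone is enough for git-archive to work from.
    subprocess.run(["git", "clone", "--bare", repo_url, "repo.git"], check=True)
    # git-archive stamps entries with the commit date, which is what makes
    # the resulting tar deterministic for a given revision.
    with open(out_tar, "wb") as fh:
        subprocess.run(["git", "archive", "--format=tar", revision],
                       cwd="repo.git", stdout=fh, check=True)
```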
2019-07-30 14:43:31 +09:00
Dustin J. Mitchell
1fded4473e Bug 1508381 - use rootUrl style with taskcluster-proxy r=tomprince
Differential Revision: https://phabricator.services.mozilla.com/D18023

--HG--
extra : moz-landing-system : lando
2019-03-12 20:38:42 +00:00
arthur.iakab
c152ccec1d Backed out 4 changesets (bug 1508381) for multiple Windows build bustages CLOSED TREE
Backed out changeset f01cec6f712e (bug 1508381)
Backed out changeset ba69e59924de (bug 1508381)
Backed out changeset 97fe4e5a665e (bug 1508381)
Backed out changeset 0c3065c12bef (bug 1508381)
2019-01-31 23:14:11 +02:00
Dustin J. Mitchell
22fcbfc133 Bug 1508381 - use rootUrl style with taskcluster-proxy r=tomprince
Differential Revision: https://phabricator.services.mozilla.com/D18023

--HG--
extra : moz-landing-system : lando
2019-01-30 18:58:09 +00:00
Dustin J. Mitchell
60d0a26a65 Bug 1492664 - update fetch-content to use TASKCLUSTER_ROOT_URL; r=tomprince
--HG--
extra : rebase_source : ae5064b8cf13ee50b4db0299e4b7fad215902af1
extra : source : 742b038bb1dd39c029afce73eb3f5b683cb590f2
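A hedged sketch of building an artifact URL from the root URL; the path layout follows the standard Taskcluster queue conventions and is assumed here, not taken from this patch:

```python
import os

def artifact_url(task_id, artifact_name):
    # Assumed layout: <rootUrl>/api/queue/v1/task/<taskId>/artifacts/<name>.
    root_url = os.environ["TASKCLUSTER_ROOT_URL"].rstrip("/")
    return "{}/api/queue/v1/task/{}/artifacts/{}".format(
        root_url, task_id, artifact_name)
```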
2018-10-02 14:40:39 +00:00
Sebastian Hengst
767c971623 Backed out 21 changesets (bug 1492664) for breaking cron task for nightlies. a=backout
Backed out changeset a7d50dbb2c8e (bug 1492664)
Backed out changeset 2d876c4ece8b (bug 1492664)
Backed out changeset c82285d253de (bug 1492664)
Backed out changeset bf6d089640eb (bug 1492664)
Backed out changeset d9a7f2ce49c3 (bug 1492664)
Backed out changeset 06c466ab4323 (bug 1492664)
Backed out changeset c1ea4a10cc8d (bug 1492664)
Backed out changeset 4c63a04fdd47 (bug 1492664)
Backed out changeset 742b038bb1dd (bug 1492664)
Backed out changeset 911b4b0fb683 (bug 1492664)
Backed out changeset 870c8cec99e5 (bug 1492664)
Backed out changeset 77699b51336b (bug 1492664)
Backed out changeset 29f33f22fd8b (bug 1492664)
Backed out changeset e7f305408708 (bug 1492664)
Backed out changeset 335a92b1f424 (bug 1492664)
Backed out changeset c566f1c8dcdf (bug 1492664)
Backed out changeset c77ae59aba41 (bug 1492664)
Backed out changeset 9c35dd209c6b (bug 1492664)
Backed out changeset a972d6b4434e (bug 1492664)
Backed out changeset 5ea6f03f845e (bug 1492664)
Backed out changeset 0699d3873e44 (bug 1492664)

--HG--
extra : histedit_source : 5cb1f7e50f25d4a875c1a58c86b7dce902e1a89c%2C20f1ab1a843b612cfcc67cf5c6ff745d65abf076
2018-12-20 12:43:22 +02:00
Dustin J. Mitchell
26dce736fb Bug 1492664 - update fetch-content to use TASKCLUSTER_ROOT_URL; r=tomprince
--HG--
extra : rebase_source : 1cb8dcaf83ffd97088b35d68420b506cc650f197
2018-10-02 14:40:39 +00:00
Margareta Eliza Balazs
2e5e28f518 Backed out 16 changesets (bug 1492664) for breaking developer artifact builds, requested by standard8 a=backout
Backed out changeset 31e500489665 (bug 1492664)
Backed out changeset f4945658d45f (bug 1492664)
Backed out changeset 6d17291b8b92 (bug 1492664)
Backed out changeset 90f3faa36137 (bug 1492664)
Backed out changeset 0b229b00818a (bug 1492664)
Backed out changeset 5eb2c77d70a9 (bug 1492664)
Backed out changeset e1ebad5d89c5 (bug 1492664)
Backed out changeset 3017e5890739 (bug 1492664)
Backed out changeset c8b7e620eabf (bug 1492664)
Backed out changeset d3dfbd848236 (bug 1492664)
Backed out changeset 5c92bb5ac895 (bug 1492664)
Backed out changeset fb7cfca6ebc3 (bug 1492664)
Backed out changeset 0c4101230d4d (bug 1492664)
Backed out changeset b93a0fcc86f3 (bug 1492664)
Backed out changeset 6dc9522ee0bf (bug 1492664)
Backed out changeset 85d7f8b330eb (bug 1492664)
2018-12-19 11:45:29 +02:00
Dustin J. Mitchell
015e1e8538 Bug 1492664 - update fetch-content to use TASKCLUSTER_ROOT_URL; r=tomprince
Differential Revision: https://phabricator.services.mozilla.com/D14207

--HG--
extra : moz-landing-system : lando
2018-12-18 17:26:43 +00:00
Justin Wood
d83c794486 Bug 1475512 - Fix .zip fetch tasks on windows. r=tomprince
Differential Revision: https://phabricator.services.mozilla.com/D9329

--HG--
extra : moz-landing-system : lando
2018-10-22 18:23:05 +00:00
Tom Prince
14cf8b64d6 Bug 1486224: [fetch-content] Retry downloads when fetching content; r=gps
Differential Revision: https://phabricator.services.mozilla.com/D6686

--HG--
extra : moz-landing-system : lando
2018-09-25 16:40:42 +00:00
Nick Thomas
64b1b8b4a0 Bug 1493056 - fetch-content tries to use https for private urls with the proxy, should use http, r=tomprince
Differential Revision: https://phabricator.services.mozilla.com/D6454

--HG--
extra : moz-landing-system : lando
2018-09-21 03:14:27 +00:00
Tom Prince
83067e2603 Bug 1484012: [fetch-content] Add support for downloading private artifacts; r=gps
Differential Revision: https://phabricator.services.mozilla.com/D3556

--HG--
extra : rebase_source : 8207be2e99ee8fdc75209f62c4a357b5c827edce
2018-08-16 15:13:02 -06:00
Tom Prince
9fd238c7e3 Bug 1484012: [fetch-content] Transparently decompress artifacts; r=gps
generic-worker transparently compresses uncompressed artifacts. Teach
fetch-content to decompress those artifacts.

Differential Revision: https://phabricator.services.mozilla.com/D3555

--HG--
extra : rebase_source : 3e1847b545de5443fd4349f75acc605ea5a46701
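A sketch of one way to do this, assuming the transparent compression is gzip (detected by magic bytes); the real artifact handling may differ:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"

def maybe_decompress(path):
    # If the downloaded artifact starts with the gzip magic bytes, rewrite
    # it decompressed in place; otherwise leave it untouched.
    with open(path, "rb") as fh:
        if fh.read(2) != GZIP_MAGIC:
            return
    with gzip.open(path, "rb") as fh:
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(data)
```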
2018-08-15 15:53:27 -06:00
Tom Prince
6701e41a4c Bug 1484012: [fetch-content] Add an option to not unpack downloaded artifacts; r=gps
Differential Revision: https://phabricator.services.mozilla.com/D3554

--HG--
extra : rebase_source : 58bba31bd921d29d4a40ad8d9ba09c4c7ac1f8dc
2018-08-15 15:16:49 -06:00
Tom Prince
43c8cdcaae Bug 1484012: [fetch-content] Pass MOZ_FETCHES as json; r=gps,ahal
Rather than trying to parse strings, just pass a json blob. This will allow us
to easily do things like mark artifacts to be left unextracted.

Differential Revision: https://phabricator.services.mozilla.com/D3553

--HG--
extra : rebase_source : 4e762c65d1c9f13361d5bae2e4608ba09bb39a91
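A sketch of consuming the JSON form; the per-entry keys shown are assumptions for illustration:

```python
import json
import os

# MOZ_FETCHES is now a JSON array instead of a hand-parsed string.
fetches = json.loads(os.environ.get("MOZ_FETCHES", "[]"))
for fetch in fetches:
    # Assumed keys: "task", "artifact", and an optional "extract" flag.
    print(fetch["task"], fetch["artifact"], fetch.get("extract", True))
```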
2018-08-17 10:37:21 -06:00
Andrew Halberstadt
321a8788f2 Bug 1484790 - [fetches] Overwrite without prompting when unzipping an artifact with fetch-content, r=gps
This also moves the call to 'fetch_artifacts' in run-task down inside the
try/finally block. This way, if something goes wrong, we'll still clean up
MOZ_FETCHES_DIR.

Differential Revision: https://phabricator.services.mozilla.com/D4152

--HG--
extra : moz-landing-system : lando
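The ordering change, in miniature (names are illustrative):

```python
import shutil

def run_with_fetches(fetch_artifacts, run_command, fetches_dir):
    try:
        # Fetching inside the try means a failed download still hits the
        # cleanup of MOZ_FETCHES_DIR in the finally block.
        fetch_artifacts()
        run_command()
    finally:
        shutil.rmtree(fetches_dir, ignore_errors=True)
```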
2018-08-24 16:04:59 +00:00
Gregory Szorc
71e90e5309 Bug 1480431 - Make ifh a file object; r=tomprince
Otherwise it can't be used as a context manager since it
doesn't have __enter__ or __exit__.

Differential Revision: https://phabricator.services.mozilla.com/D2672

--HG--
extra : moz-landing-system : lando
2018-08-02 16:22:46 +00:00
Gregory Szorc
3b427569ba Bug 1479533 - Log to stderr, capitalize messages; r=tomprince
This is what a lot of programs do.

We do logging in a helper function so we can flush after every write.

Differential Revision: https://phabricator.services.mozilla.com/D2526

--HG--
extra : rebase_source : 98563aee129c16662a783122241623b8ed2fe457
2018-07-31 15:39:10 -07:00
Gregory Szorc
2207dd7026 Bug 1479533 - Refactor archive decompression; r=tomprince
Previously, we told `tar` or `unzip` to operate on an explicit file.
This worked when `tar` understood the compression format of the file.
And this worked in the majority of cases.

But `tar` does not support zstandard compression (at least not outside
extremely new versions, which aren't yet widely deployed). And not all
versions of `tar` support the `-a` argument.

This commit changes our invocation of `tar` so input data is piped
to it from Python. In the case of `tar`, we perform decompression in
Python, if possible. This allows us to support zstandard, as well as `tar`
binaries that don't support `-a` to auto-detect the compression format.

I wanted to be consistent and always pipe the raw data via stdin.
But `unzip` doesn't appear to like this. Oh well.

We also refactor the logic around detecting archives. We have a
function to identify the archive type based on a filename. We then
pass the archive type to the extraction function and key off that
logic within. We also conditionally call extract_archive() and
fail hard in extract_archive() when things fail. This will make
future archive code easier to reason about.

Differential Revision: https://phabricator.services.mozilla.com/D1576

--HG--
extra : rebase_source : 1c66396cced1b2a94a959386eecc3f512b033308
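A sketch of the piping arrangement (the helper name is made up): already-decompressed tar data is written to `tar` on stdin, so `tar` never has to understand zstandard or support `-a`:

```python
import subprocess

def extract_tar_stream(decompressed_stream, dest_dir):
    proc = subprocess.Popen(["tar", "-xf", "-"], cwd=dest_dir,
                            stdin=subprocess.PIPE)
    # Feed the decompressed bytes to tar in chunks.
    for chunk in iter(lambda: decompressed_stream.read(1 << 20), b""):
        proc.stdin.write(chunk)
    proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError("tar failed")
```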
2018-08-01 09:00:58 -07:00
Andrew Halberstadt
e8a36b30d0 Bug 1468812 - [fetch-content] Implement ability to specify a per-fetch subdirectory to extract into r=gps
Currently, 'fetch' artifacts are all extracted into the same directory. This
could make the extdir messy or, in the worst case, cause file name collisions.

Some artifacts are ok to extract into the same directory as they're already
bundled within the archive. But other artifacts are not. This patch keeps the
default behaviour (extracting everything into the same directory), but allows
task authors to specify per-artifact directories to extract into.

The syntax is:
path[>dest]@<task>

The 'dest' value will be a subdirectory of the MOZ_FETCHES_DIR environment
variable.

Depends on D2102.

Differential Revision: https://phabricator.services.mozilla.com/D2166

--HG--
extra : moz-landing-system : lando
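A small sketch of parsing the path[>dest]@<task> syntax described above (the function name is hypothetical):

```python
def parse_fetch_spec(spec):
    path, _, task = spec.rpartition("@")
    if ">" in path:
        path, dest = path.split(">", 1)
    else:
        dest = ""  # default: extract directly into MOZ_FETCHES_DIR
    return path, dest, task

print(parse_fetch_spec("public/build/target.zip>tools@abc123"))
# ('public/build/target.zip', 'tools', 'abc123')
```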
2018-07-18 17:52:43 +00:00
Gregory Szorc
8922082362 Bug 1460777 - Taskgraph tasks for retrieving remote content; r=dustin, glandium
Currently, many tasks fetch content from the Internets. A problem with
that is fetching from the Internets is unreliable: servers may have
outages or be slow; content may disappear or change out from under us.

The unreliability of 3rd party services poses a risk to Firefox CI.
If services aren't available, we could potentially not run some CI tasks.
In the worst case, we might not be able to release Firefox. That would
be bad. In fact, as I write this, gmplib.org has been unavailable for
~24 hours and Firefox CI is unable to retrieve the GMP source code.
As a result, building GCC toolchains is failing.

A solution to this is to make tasks more hermetic by depending on
fewer network services (which by definition aren't reliable over time
and therefore introduce instability).

This commit attempts to mitigate some external service dependencies
by introducing the *fetch* task kind.

The primary goal of the *fetch* kind is to obtain remote content and
re-expose it as a task artifact. By making external content available
as a cached task artifact, we allow dependent tasks to consume this
content without touching the service originally providing that
content, thus eliminating a run-time dependency and making tasks more
hermetic and reproducible over time.

We introduce a single "fetch-url" "using" flavor to define tasks that
fetch single URLs and then re-expose that URL as an artifact. Powering
this is a new, minimal "fetch" Docker image that contains a
"fetch-content" Python script that does the work for us.

We have added tasks to fetch source archives used to build the GCC
toolchains.

Fetching remote content and re-exposing it as an artifact is not
very useful by itself: the value is in having tasks use those
artifacts.

We introduce a taskgraph transform that allows tasks to define an
array of "fetches." Each entry corresponds to the name of a "fetch"
task kind. When present, the corresponding "fetch" task is added as a
dependency. And the task ID and artifact path from that "fetch" task
is added to the MOZ_FETCHES environment variable of the task depending
on it. Our "fetch-content" script has a "task-artifacts"
sub-command that tasks can execute to perform retrieval of all
artifacts listed in MOZ_FETCHES.

To prove all of this works, the code for fetching dependencies when
building GCC toolchains has been updated to use `fetch-content`. The
now-unused legacy code has been deleted.

This commit improves the reliability and efficiency of GCC toolchain
tasks. Dependencies now all come from task artifacts and should always
be available in the common case. In addition, `fetch-content` downloads
and extracts files concurrently. This makes it faster than the serial
application which we were previously using.

There are some things I don't like about this commit.

First, a new Docker image and Python script for downloading URLs feels
a bit heavyweight. The Docker image is definitely overkill as things
stand. I can eventually justify it because I want to implement support
for fetching and repackaging VCS repositories and for caching Debian
packages. These will require more packages than what I'm comfortable
installing on the base Debian image, therefore justifying a dedicated
image.

The `fetch-content static-url` sub-command could definitely be
implemented as a shell script. But Python is readily available and
is more pleasant to maintain than shell, so I wrote it in Python.

`fetch-content task-artifacts` is more advanced and writing it in
Python is more justified, IMO. FWIW, the script is Python 3 only,
which conveniently gives us access to `concurrent.futures`, which
facilitates concurrent download.

`fetch-content` also duplicates functionality found elsewhere.
generic-worker's task payload supports a "mounts" feature which
facilitates downloading remote content, including from a task
artifact. However, this feature doesn't exist on docker-worker.
So we have to implement downloading inside the task rather than
at the worker level. I concede that if all workers had generic-worker's
"mounts" feature and supported concurrent download, `fetch-content`
wouldn't need to exist.

`fetch-content` also duplicates functionality of
`mach artifact toolchain`. I probably could have used
`mach artifact toolchain` instead of writing
`fetch-content task-artifacts`. However, I didn't want to introduce
the requirement of a VCS checkout. `mach artifact toolchain` has its
origins in providing a feature to the build system. And "fetching
artifacts from tasks" is a more generic feature than that. I think
it should be implemented as a generic feature and not something that is
"toolchain" specific.

I think the best place for a generic "fetch content" feature is in
the worker, where content can be defined in the task payload. But as
explained above, that feature isn't universally available. The next
best place is probably run-task. run-task already performs generic,
very-early task preparation steps, such as performing a VCS checkout.
I would like to fold `fetch-content` into run-task and make it all
driven by environment variables. But run-task is currently Python 2
and achieving concurrency would involve a bit of programming (or
adding package dependencies). I may very well port run-task to Python
3 and then fold fetch-content into it. Or maybe we leave
`fetch-content` as a standalone script.

MozReview-Commit-ID: AGuTcwNcNJR

--HG--
extra : source : 0b941cbdca76fb2fbb98dc5bbc1a0237c69954d0
extra : histedit_source : a3e43bdd8a9a58550bef02fec3be832ca304ea93
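The concurrency mentioned above, sketched with `concurrent.futures` (the `download` callable is a stand-in for the script's real per-artifact logic):

```python
import concurrent.futures

def fetch_all(urls, download):
    # Download the artifacts listed in MOZ_FETCHES in parallel rather than
    # one at a time.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        futures = {executor.submit(download, url): url for url in urls}
        for future in concurrent.futures.as_completed(futures):
            future.result()  # re-raise any download error
            print("fetched", futures[future])
```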
2018-06-06 14:37:49 -07:00
Gurzau Raul
53a10471cf Backed out 2 changesets (bug 1460777) for Toolchains failure on a CLOSED TREE
Backed out changeset 52ef9348401d (bug 1460777)
Backed out changeset 60ed097650b8 (bug 1460777)
2018-06-06 20:57:29 +03:00
Gregory Szorc
2f189264b9 Bug 1460777 - Taskgraph tasks for retrieving remote content; r=dustin,glandium
Currently, many tasks fetch content from the Internets. A problem with
that is fetching from the Internets is unreliable: servers may have
outages or be slow; content may disappear or change out from under us.

The unreliability of 3rd party services poses a risk to Firefox CI.
If services aren't available, we could potentially not run some CI tasks.
In the worst case, we might not be able to release Firefox. That would
be bad. In fact, as I write this, gmplib.org has been unavailable for
~24 hours and Firefox CI is unable to retrieve the GMP source code.
As a result, building GCC toolchains is failing.

A solution to this is to make tasks more hermetic by depending on
fewer network services (which by definition aren't reliable over time
and therefore introduce instability).

This commit attempts to mitigate some external service dependencies
by introducing the *fetch* task kind.

The primary goal of the *fetch* kind is to obtain remote content and
re-expose it as a task artifact. By making external content available
as a cached task artifact, we allow dependent tasks to consume this
content without touching the service originally providing that
content, thus eliminating a run-time dependency and making tasks more
hermetic and reproducible over time.

We introduce a single "fetch-url" "using" flavor to define tasks that
fetch single URLs and then re-expose that URL as an artifact. Powering
this is a new, minimal "fetch" Docker image that contains a
"fetch-content" Python script that does the work for us.

We have added tasks to fetch source archives used to build the GCC
toolchains.

Fetching remote content and re-exposing it as an artifact is not
very useful by itself: the value is in having tasks use those
artifacts.

We introduce a taskgraph transform that allows tasks to define an
array of "fetches." Each entry corresponds to the name of a "fetch"
task kind. When present, the corresponding "fetch" task is added as a
dependency. And the task ID and artifact path from that "fetch" task
is added to the MOZ_FETCHES environment variable of the task depending
on it. Our "fetch-content" script has a "task-artifacts"
sub-command that tasks can execute to perform retrieval of all
artifacts listed in MOZ_FETCHES.

To prove all of this works, the code for fetching dependencies when
building GCC toolchains has been updated to use `fetch-content`. The
now-unused legacy code has been deleted.

This commit improves the reliability and efficiency of GCC toolchain
tasks. Dependencies now all come from task artifacts and should always
be available in the common case. In addition, `fetch-content` downloads
and extracts files concurrently. This makes it faster than the serial
application which we were previously using.

There are some things I don't like about this commit.

First, a new Docker image and Python script for downloading URLs feels
a bit heavyweight. The Docker image is definitely overkill as things
stand. I can eventually justify it because I want to implement support
for fetching and repackaging VCS repositories and for caching Debian
packages. These will require more packages than what I'm comfortable
installing on the base Debian image, therefore justifying a dedicated
image.

The `fetch-content static-url` sub-command could definitely be
implemented as a shell script. But Python is readily available and
is more pleasant to maintain than shell, so I wrote it in Python.

`fetch-content task-artifacts` is more advanced and writing it in
Python is more justified, IMO. FWIW, the script is Python 3 only,
which conveniently gives us access to `concurrent.futures`, which
facilitates concurrent download.

`fetch-content` also duplicates functionality found elsewhere.
generic-worker's task payload supports a "mounts" feature which
facilitates downloading remote content, including from a task
artifact. However, this feature doesn't exist on docker-worker.
So we have to implement downloading inside the task rather than
at the worker level. I concede that if all workers had generic-worker's
"mounts" feature and supported concurrent download, `fetch-content`
wouldn't need to exist.

`fetch-content` also duplicates functionality of
`mach artifact toolchain`. I probably could have used
`mach artifact toolchain` instead of writing
`fetch-content task-artifacts`. However, I didn't want to introduce
the requirement of a VCS checkout. `mach artifact toolchain` has its
origins in providing a feature to the build system. And "fetching
artifacts from tasks" is a more generic feature than that. I think
it should be implemented as a generic feature and not something that is
"toolchain" specific.

I think the best place for a generic "fetch content" feature is in
the worker, where content can be defined in the task payload. But as
explained above, that feature isn't universally available. The next
best place is probably run-task. run-task already performs generic,
very-early task preparation steps, such as performing a VCS checkout.
I would like to fold `fetch-content` into run-task and make it all
driven by environment variables. But run-task is currently Python 2
and achieving concurrency would involve a bit of programming (or
adding package dependencies). I may very well port run-task to Python
3 and then fold fetch-content into it. Or maybe we leave
`fetch-content` as a standalone script.

MozReview-Commit-ID: AGuTcwNcNJR

--HG--
extra : rebase_source : 4918b8c3bac53d63665006802054038bfbca0314
2018-06-06 09:37:38 -07:00