Improve CI.rst document clarity. (apache#36181)
* Improve document clarity.

* Remove whitespace at end of lines.

* Use double-backticks for code.
UncertainBadg3r authored Dec 13, 2023
1 parent 6ee4d40 commit 82b3c11
Showing 1 changed file with 69 additions and 56 deletions.
125 changes: 69 additions & 56 deletions CI.rst
@@ -38,20 +38,27 @@ of how Airflow CI works.
GitHub Actions runs
-------------------

Our builds on CI are highly optimized. They utilise some of the latest features provided by GitHub Actions
environment that make it possible to reuse parts of the build process across different Jobs.

Big part of our CI runs use Container Images. Airflow has a lot of dependencies and in order to make
sure that we are running tests in a well configured and repeatable environment, most of the tests,
documentation building, and some more sophisticated static checks are run inside a docker container
environment. This environment consist of two types of images: CI images and PROD images. CI Images
are used for most of the tests and checks where PROD images are used in the Kubernetes tests.

In order to run the tests, we need to make sure that the images are built using latest sources and that it
is done quickly (full rebuild of such image from scratch might take ~15 minutes). Therefore optimisation
techniques have been implemented that use efficiently cache from the GitHub Docker registry - in most cases
this brings down the time needed to rebuild the image to ~4 minutes. In some cases (when dependencies change)
it can be ~6-7 minutes and in case base image of Python releases new patch-level, it can be ~12 minutes.
Our CI builds are highly optimized, leveraging the latest features provided
by the GitHub Actions environment to reuse parts of the build process across
different jobs.

A significant portion of our CI runs uses container images. Given that
Airflow has numerous dependencies, we use Docker containers to ensure tests
run in a well-configured and consistent environment. This approach is used
for most tests, documentation building, and some advanced static checks.
The environment comprises two types of images: CI images and PROD images.
CI images are used for most tests and checks, while PROD images are used for
Kubernetes tests.

To run the tests, we need to ensure that the images are built using the
latest sources and that the build process is efficient. A full rebuild of
such an image from scratch might take approximately 15 minutes. Therefore,
we've implemented optimization techniques that efficiently use the cache
from the GitHub Docker registry. In most cases, this reduces the time
needed to rebuild the image to about 4 minutes. However, when
dependencies change, it can take around 6-7 minutes, and if the Python
base image is updated to a new patch-level release, it can take
approximately 12 minutes.

Container Registry used as cache
--------------------------------
@@ -105,7 +112,7 @@ The image names follow the patterns (except the Python image, all the images are
https://ghcr.io/ in ``apache`` organization.

The packages are available under (CONTAINER_NAME is url-encoded name of the image). Note that "/" are
supported now in the ``ghcr.io`` as apart of the image name within ``apache`` organization, but they
supported now in the ``ghcr.io`` as a part of the image name within the ``apache`` organization, but they
have to be percent-encoded when you access them via UI (/ = %2F)

``https://github.com/apache/airflow/pkgs/container/<CONTAINER_NAME>``
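As a minimal sketch of the percent-encoding described above, the snippet below builds
the package page URL from an image name using Python's standard library. The image name
``airflow/main/ci/python3.8`` is a hypothetical value chosen for illustration, and
``package_url`` is a helper name invented here. It is not part of Airflow's tooling.

```python
from urllib.parse import quote

# URL prefix taken from the pattern shown above.
PACKAGE_URL_PREFIX = "https://github.com/apache/airflow/pkgs/container/"


def package_url(container_name: str) -> str:
    """Return the package page URL for an image name (hypothetical helper)."""
    # safe="" makes quote() encode "/" as %2F; by default "/" is left as-is.
    return PACKAGE_URL_PREFIX + quote(container_name, safe="")


if __name__ == "__main__":
    # Hypothetical image name, used only to show the encoding.
    print(package_url("airflow/main/ci/python3.8"))
```

Passing ``safe=""`` matters here because ``quote`` leaves "/" unencoded by default.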
@@ -192,29 +199,33 @@ When you are running the CI jobs in GitHub Actions, GITHUB_TOKEN is set automati
CI run types
============

The following CI Job run types are currently run for Apache Airflow (run by ci.yaml workflow)
and each of the run types has different purpose and context.
The Apache Airflow project utilizes several types of Continuous Integration (CI)
jobs, each with a distinct purpose and context. These jobs are executed by the
``ci.yaml`` workflow.

Besides the regular "PR" runs we also have "Canary" runs that are able to detect most of the
problems that might impact regular PRs early, without necessarily failing all PRs when those
problems happen. This allows to provide much more stable environment for contributors, who
contribute their PR, while giving a chance to maintainers to react early on problems that
need reaction, when the "canary" builds fail.
In addition to the standard "PR" runs, we also execute "Canary" runs.
These runs are designed to detect potential issues that could affect
regular PRs early on, without causing all PRs to fail when such problems
arise. This strategy ensures a more stable environment for contributors
submitting their PRs. At the same time, it allows maintainers to proactively
address issues highlighted by the "Canary" builds.

Pull request run
----------------

Those runs are results of PR from the forks made by contributors. Most builds for Apache Airflow fall
into this category. They are executed in the context of the "Fork", not main
Airflow Code Repository which means that they have only "read" permission to all the GitHub resources
(container registry, code repository). This is necessary as the code in those PRs (including CI job
definition) might be modified by people who are not committers for the Apache Airflow Code Repository.
These runs are triggered by pull requests from contributors' forks. The majority of
Apache Airflow builds fall into this category. They are executed in the context of
the contributor's "Fork", not the main Airflow Code Repository, meaning they only have
"read" access to all GitHub resources, such as the container registry and code repository.
This is necessary because the code in these PRs, including the CI job definition,
might be modified by individuals who are not committers to the Apache Airflow Code Repository.

The main purpose of those jobs is to check if PR builds cleanly, if the test run properly and if
the PR is ready to review and merge. The runs are using cached images from the Private GitHub registry -
CI, Production Images as well as base Python images that are also cached in the Private GitHub registry.
Also for those builds we only execute Python tests if important files changed (so for example if it is
"no-code" change, no tests will be executed.
The primary purpose of these jobs is to verify if the PR builds cleanly, if the tests
run correctly, and if the PR is ready for review and merge. These runs utilize cached
images from the Private GitHub registry, including CI, Production Images, and base
Python images. Furthermore, for these builds, we only execute Python tests if
significant files have changed. For instance, if the PR involves a "no-code" change,
no tests will be executed.
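The idea of skipping Python tests for "no-code" changes can be illustrated with a small
sketch. This is not Airflow's actual selection logic, which this document does not show.
The suffix list and the ``should_run_python_tests`` helper are assumptions made purely
for illustration.

```python
from pathlib import PurePosixPath

# Assumption: suffixes treated as "no-code" in this illustration only.
NO_CODE_SUFFIXES = {".rst", ".md", ".txt"}


def should_run_python_tests(changed_files):
    """Return True if at least one changed file is not a "no-code" file."""
    return any(
        PurePosixPath(path).suffix not in NO_CODE_SUFFIXES
        for path in changed_files
    )


if __name__ == "__main__":
    # A documentation-only change would skip the Python tests.
    print(should_run_python_tests(["CI.rst", "README.md"]))
    # A change that touches Python code would run them.
    print(should_run_python_tests(["CI.rst", "airflow/models/dag.py"]))
```

A real implementation would likely look at directories and file groups rather than
suffixes alone. The sketch only shows the shape of the decision.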

Regular PR builds run in a "stable" environment:

@@ -232,30 +243,32 @@ and has WRITE access to the GitHub Container Registry.
Canary run
----------

This is the flow that happens when a pull request is merged to the "main" branch or pushed to any of
the "v2-*-test" branches. The "Canary" run attempts to upgrade dependencies to the latest versions
and quickly pushes a preview of cache the CI/PROD images to the GitHub Registry - so that pull requests
can quickly use the new cache - this is useful when Dockerfile or installation scripts change because such
cache will already have the latest Dockerfile and scripts pushed even if some tests will fail.
When successful, the run updates the constraints files in the "constraints-main" branch with the latest
constraints and pushes both cache and latest CI/PROD images to the GitHub Registry.
When "Canary" build fails, it's often a sign that some of our dependencies released a new version that
is not compatible with current tests or Airflow code, Also it might mean that a breaking change has been
merged to "main". Both cases should be addressed quickly by the maintainers. The "broken main" by our code
should be fixed quickly, while the "broken dependencies" can take a bit of time to fix as until the tests
succeeds, constraints will not be updated, which means that regular PRs will continue using the old version
of dependencies that already passed one of the previous "Canary" runs.

This workflow is triggered when a pull request is merged into the "main" branch or pushed to any of
the "v2-*-test" branches. The "Canary" run aims to upgrade dependencies to their latest versions
and promptly pushes a preview of the CI/PROD image cache to the GitHub Registry. This allows pull
requests to quickly utilize the new cache, which is particularly beneficial when the Dockerfile or
installation scripts have been modified. Even if some tests fail, this cache will already include the
latest Dockerfile and scripts. Upon successful execution, the run updates the constraint files in the
"constraints-main" branch with the latest constraints and pushes both the cache and the latest CI/PROD
images to the GitHub Registry.
If the "Canary" build fails, it often indicates that a new version of our dependencies is incompatible
with the current tests or Airflow code. Alternatively, it could mean that a breaking change has been
merged into "main". Both scenarios require prompt attention from the maintainers. While a "broken main"
due to our code should be fixed quickly, "broken dependencies" may take longer to resolve. Until the tests
pass, the constraints will not be updated, meaning that regular PRs will continue using the older version
of dependencies that passed one of the previous "Canary" runs.
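The trigger and constraint-update rules described above can be sketched as two small
predicates. The function names are hypothetical, and the branch matching uses a
shell-style wildcard from Python's standard library as a stand-in for whatever
matching the real workflow configuration uses.

```python
from fnmatch import fnmatchcase

# Branch patterns named in the text above.
CANARY_BRANCH_PATTERNS = ("main", "v2-*-test")


def is_canary_branch(branch: str) -> bool:
    """Return True if a merge or push to this branch triggers a "Canary" run."""
    return any(fnmatchcase(branch, pattern) for pattern in CANARY_BRANCH_PATTERNS)


def should_update_constraints(branch: str, tests_passed: bool) -> bool:
    """Constraints are refreshed only after a successful "Canary" run."""
    return is_canary_branch(branch) and tests_passed


if __name__ == "__main__":
    print(is_canary_branch("v2-8-test"))
    print(should_update_constraints("main", tests_passed=False))
```

The second predicate captures why regular PRs keep using the older dependency versions
while a "Canary" build is failing: the constraints are left untouched until the tests pass.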

Scheduled runs
--------------

This is the flow that happens when a scheduled run is triggered. The "scheduled" workflow is aimed to
run regularly (overnight). Scheduled run is generally the same as "Canary" run, with the difference
that the image is build always from the scratch and not from the cache. This way we can check that no
"system" dependencies in debian base image have changed and that the build is still reproducible.
No separate diagram is needed for scheduled run as it is identical to that of "Canary" run.
The "scheduled" workflow runs on a regular schedule (typically overnight).
This workflow is largely identical to the
"Canary" run, with one key difference: the image is always built from scratch, not
from a cache. This approach ensures that we can verify whether any "system" dependencies
in the Debian base image have changed, and confirm that the build process remains reproducible.
Since the process for a scheduled run mirrors that of a "Canary" run, no separate diagram is
necessary to illustrate it.
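The difference between a cached build and a from-scratch build can be sketched with
plain ``docker build`` arguments. This is only an illustration. Airflow's CI builds its
images through its own tooling, and the exact commands are not shown in this document.
The ``docker_build_args`` helper and the image reference are hypothetical.

```python
def docker_build_args(image: str, from_scratch: bool) -> list[str]:
    """Assemble a ``docker build`` command line (illustrative only)."""
    args = ["docker", "build", "--tag", image]
    if from_scratch:
        # Scheduled-run style: ignore every cached layer.
        args.append("--no-cache")
    else:
        # Canary/PR style: reuse layers from a previously pushed image.
        args += ["--cache-from", image]
    args.append(".")  # build context: current directory
    return args


if __name__ == "__main__":
    # Hypothetical image reference, used only for illustration.
    image = "ghcr.io/apache/airflow/main/ci/python3.8"
    print(" ".join(docker_build_args(image, from_scratch=True)))
```

Building without the cache forces every layer to be rebuilt, which is what makes
changes in the base image's "system" dependencies visible.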

Workflows
=========
@@ -310,10 +323,10 @@ that triggered it.
Differences for main and release branches
-----------------------------------------

There are a few differences of what kind of tests are run, depending on which version/branch the tests are executed for.
While all our tests run for the "main" development branch to keep Airflow in check, only a subset of those tests is run
in older branches when we are releasing patch-level releases. This is because we never use old branches to release
providers and helm charts, we only use them to release Airflow and Airflow image.
The type of tests executed varies depending on the version or branch under test. For the "main" development branch,
we run all tests to maintain the quality of Airflow. However, when releasing patch-level updates on older
branches, we only run a subset of these tests. This is because older branches are exclusively used for releasing
Airflow and its corresponding image, not for releasing providers or helm charts.

This behaviour is controlled by ``default-branch`` output of the build-info job. Whenever we create a branch for old version
we update the ``AIRFLOW_BRANCH`` in ``airflow_breeze/branch_defaults.py`` to point to the new branch and there are a few