docs: Add Kubernetes executor S3 logging docs (#576)
* add kubernetes executor s3 logging docs

* attempt to fix linter

* attempt to fix linter 2

* attempt to fix linter

* Update docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc

Co-authored-by: Sebastian Bernauer <[email protected]>

* Update docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc

Co-authored-by: Sebastian Bernauer <[email protected]>

* Apply suggestions from code review

Co-authored-by: Sebastian Bernauer <[email protected]>

* dont use env abbreviation

* use anchors in yaml

* Update docs/modules/airflow/examples/example-airflow-kubernetes-executor-s3-logging.yaml

Co-authored-by: Sebastian Bernauer <[email protected]>

* improve language

* linter...

---------

Co-authored-by: Sebastian Bernauer <[email protected]>
maltesander and sbernauer authored Jan 27, 2025
1 parent 43038de commit 9992513
Showing 4 changed files with 63 additions and 0 deletions.
docs/modules/airflow/examples/example-airflow-kubernetes-executor-s3-logging.yaml
@@ -0,0 +1,24 @@
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  image:
    productVersion: 2.9.3
  clusterConfig: {}
  webservers:
    envOverrides: &s3-logging-env-overrides
      AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
      AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: s3://<bucket-name>/airflow-task-logs/
      # The name / connection ID created in the Airflow Web UI
      AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: minio
    roleGroups:
      default:
        replicas: 1
  schedulers:
    envOverrides: *s3-logging-env-overrides
    roleGroups:
      default:
        replicas: 1
  kubernetesExecutors:
    envOverrides: *s3-logging-env-overrides
(Two binary image files added, airflow_edit_s3_connection.png and airflow_dag_s3_logs.png; not rendered in the diff view.)
docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc
@@ -3,6 +3,8 @@

Instead of using the Celery workers, you can let Airflow run tasks using Kubernetes executors, where Pods are created dynamically as needed, without jobs being routed through a Redis queue to the workers.

== Kubernetes Executor configuration

To achieve this, replace `spec.celeryExecutors` with `spec.kubernetesExecutors`.
For example, you would change the following definition

@@ -28,3 +30,40 @@ spec:
resources:
# ...
----
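
The full before/after example is collapsed in this diff view; a minimal sketch of what the swapped definition can look like (the resource values below are purely illustrative) is:

[source,yaml]
----
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  # celeryExecutors removed; executor Pods are created on demand instead
  kubernetesExecutors:
    config:
      resources:
        cpu:
          min: 500m  # illustrative values, size for your workload
          max: "1"
        memory:
          limit: 1Gi
----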

== Logging

Kubernetes Executors and their respective Pods only live as long as the task they are executing.
Afterwards, the Pod is terminated immediately and its output, such as console logs, is lost.

To persist task logs, Airflow can be configured to store its https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/kubernetes_executor.html#managing-dags-and-logs[executor logs on disk (PersistentVolume)] or, as described in the following section, on S3.

=== Airflow Web UI

In the Airflow Web UI, click `Admin` -> `Connections` -> `Add a new record` (the plus button).
Then enter your MinIO host and credentials as shown.

image::airflow_edit_s3_connection.png[Airflow connection menu]

The connection ID is `minio`, the connection type is `Amazon Web Services`, and the `AWS Access Key ID` and `AWS Secret Access Key` fields are filled with the S3 credentials.
The `Extra` field contains the endpoint URL:

[source,json]
----
{
"endpoint_url": "http://minio.default.svc.cluster.local:9000"
}
----
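
The connection does not have to be created in the Web UI: Airflow also resolves connections from `AIRFLOW_CONN_<CONN_ID>` environment variables (the JSON form shown below requires Airflow 2.3 or newer). A sketch of supplying the same `minio` connection declaratively, with placeholder credentials, could look like:

[source,yaml]
----
envOverrides:
  # Hypothetical alternative to the Web UI: Airflow parses this JSON
  # into the "minio" connection at runtime.
  AIRFLOW_CONN_MINIO: >-
    {"conn_type": "aws",
     "login": "<aws-access-key-id>",
     "password": "<aws-secret-access-key>",
     "extra": {"endpoint_url": "http://minio.default.svc.cluster.local:9000"}}
----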

=== Executor configuration

To configure S3 logging, add the following environment variables to the Airflow cluster definition:

[source,yaml]
----
include::example$example-airflow-kubernetes-executor-s3-logging.yaml[]
----

You should now be able to fetch and inspect the logs for each DAG run from S3 in the Airflow Web UI.

image::airflow_dag_s3_logs.png[Airflow DAG S3 logs]
