[airflow] Add lint rule to show error for removed context variables in airflow #15144

Open
sunank200 wants to merge 5 commits into base: main

@sunank200 sunank200 commented Dec 26, 2024

Summary

Airflow 3.0 removes the following deprecated Airflow context variables:

conf
execution_date
next_ds
next_ds_nodash
next_execution_date
prev_ds
prev_ds_nodash
prev_execution_date
prev_execution_date_success
tomorrow_ds
yesterday_ds
yesterday_ds_nodash

They were deprecated in 2.x, and their removal causes incompatibilities that we want to detect.

related: apache/airflow#44409, apache/airflow#41641

Test Plan

A test fixture is included in the PR.

Contributor

github-actions bot commented Dec 26, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+107 -0 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

apache/airflow (+107 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

+ airflow/api_connexion/security.py:148:45: AIR302 `conf` is removed in Airflow 3.0
+ airflow/api_connexion/security.py:148:69: AIR302 `conf` is removed in Airflow 3.0
+ airflow/api_connexion/security.py:165:42: AIR302 `conf` is removed in Airflow 3.0
+ airflow/api_connexion/security.py:184:48: AIR302 `conf` is removed in Airflow 3.0
+ airflow/api_connexion/security.py:203:51: AIR302 `conf` is removed in Airflow 3.0
+ airflow/api_connexion/security.py:78:46: AIR302 `conf` is removed in Airflow 3.0
+ airflow/api_connexion/security.py:97:52: AIR302 `conf` is removed in Airflow 3.0
+ airflow/auth/managers/simple/simple_auth_manager.py:131:86: AIR302 `conf` is removed in Airflow 3.0
+ airflow/cli/simple_table.py:131:37: AIR302 `conf` is removed in Airflow 3.0
+ airflow/cli/simple_table.py:132:36: AIR302 `conf` is removed in Airflow 3.0
+ airflow/cli/simple_table.py:133:31: AIR302 `conf` is removed in Airflow 3.0
+ airflow/cli/simple_table.py:134:39: AIR302 `conf` is removed in Airflow 3.0
+ airflow/cli/simple_table.py:135:39: AIR302 `conf` is removed in Airflow 3.0
+ airflow/cli/simple_table.py:136:41: AIR302 `conf` is removed in Airflow 3.0
+ airflow/cli/simple_table.py:137:35: AIR302 `conf` is removed in Airflow 3.0
+ airflow/cli/simple_table.py:141:41: AIR302 `conf` is removed in Airflow 3.0
+ airflow/decorators/base.py:187:58: AIR302 `conf` is removed in Airflow 3.0
+ airflow/decorators/base.py:187:77: AIR302 `conf` is removed in Airflow 3.0
+ airflow/decorators/base.py:237:23: AIR302 `conf` is removed in Airflow 3.0
+ airflow/decorators/base.py:278:35: AIR302 `conf` is removed in Airflow 3.0
+ airflow/decorators/base.py:279:32: AIR302 `conf` is removed in Airflow 3.0
+ airflow/decorators/sensor.py:60:68: AIR302 `conf` is removed in Airflow 3.0
+ airflow/decorators/sensor.py:60:87: AIR302 `conf` is removed in Airflow 3.0
+ airflow/models/baseoperator.py:628:37: AIR302 `conf` is removed in Airflow 3.0
+ airflow/models/baseoperator.py:631:35: AIR302 `conf` is removed in Airflow 3.0
+ airflow/models/log.py:88:23: AIR302 `conf` is removed in Airflow 3.0
+ airflow/models/log.py:90:23: AIR302 `conf` is removed in Airflow 3.0
+ airflow/models/taskinstance.py:2915:66: AIR302 `conf` is removed in Airflow 3.0
+ airflow/sentry.py:172:42: AIR302 `conf` is removed in Airflow 3.0
+ airflow/utils/json.py:112:27: AIR302 `conf` is removed in Airflow 3.0
+ airflow/utils/log/colored_log.py:55:53: AIR302 `conf` is removed in Airflow 3.0
+ airflow/utils/operator_helpers.py:91:24: AIR302 `conf` is removed in Airflow 3.0
+ airflow/utils/operator_helpers.py:92:33: AIR302 `conf` is removed in Airflow 3.0
+ airflow/utils/operator_helpers.py:93:27: AIR302 `conf` is removed in Airflow 3.0
+ airflow/www/auth.py:193:40: AIR302 `conf` is removed in Airflow 3.0
+ airflow/www/auth.py:90:36: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/alibaba/cloud/log/oss_task_handler.py:48:13: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/amazon/aws/hooks/sagemaker.py:933:92: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/amazon/aws/links/emr.py:126:36: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/amazon/aws/links/emr.py:144:36: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/amazon/aws/links/emr.py:48:27: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/amazon/aws/log/s3_task_handler.py:54:13: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/amazon/aws/secrets/secrets_manager.py:149:40: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/amazon/aws/secrets/systems_manager.py:107:40: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/apache/hdfs/log/hdfs_task_handler.py:43:13: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/common/sql/hooks/sql.py:188:13: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/common/sql/hooks/sql.py:191:13: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/common/sql/hooks/sql.py:193:59: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/elasticsearch/log/es_task_handler.py:182:13: AIR302 `conf` is removed in Airflow 3.0
+ providers/src/airflow/providers/fab/auth_manager/fab_auth_manager.py:454:35: AIR302 `conf` is removed in Airflow 3.0
... 57 additional changes omitted for project

Changes by rule (1 rules affected)

code    total  + violation  - violation  + fix  - fix
AIR302  107    107          0            0      0

@task
def print_config(**context):
    # This should not throw an error as logical_date is part of airflow context.
    logical_date = context["logical_date"]
Contributor

@sunank200 and I discussed this earlier. What we're trying to check is whether a function has a variable named context (most commonly seen in taskflow and python operator) and whether it is accessed like a dict with the keys we want to check. I think it's unlikely that users use something like this outside of the airflow context, but I would like to know whether there are any concerns.

@MichaReiser @uranusjr

Author

I have added logic for other ways to access context values as well. They are covered in the tests.

Contributor

It’s probably better to detect

  1. Arguments of a function decorated with @task (either ** or simple named arguments). (As a follow-up, any functions called by such a function)
  2. The execute function of a BaseOperator subclass (As a follow-up, any functions called by execute)
  3. The dict returned by get_current_context.

This should be better than detecting by variable name.

Contributor

What about python_callable?

Contributor

I don’t think python_callable takes the context though? It only accepts things you provide in self.op_args and self.op_kwargs.

Member

I agree that it'll be useful to guard this check by first verifying that the parameter comes from a function decorated with @task.

I think this can be done as a pre-check for context variables by using the checker.semantic().current_statements() method to traverse up the AST to find the function definition node and checking whether the function has a @task decorator that originates from the airflow module.

/// Returns an [`Iterator`] over the current statement hierarchy, from the current [`Stmt`]
/// through to any parents.
pub fn current_statements(&self) -> impl Iterator<Item = &'a Stmt> + '_ {
    let id = self.node_id.expect("No current node");
    self.nodes
        .ancestor_ids(id)
        .filter_map(move |id| self.nodes[id].as_statement())
}

Contributor

> I don’t think python_callable takes the context though? It only accepts things you provide in self.op_args and self.op_kwargs.

I thought we could still get it in the python_callable? https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html#pythonoperator

Contributor

Hmm OK I didn’t even realise you can do that… yeah in that case it’s probably a good idea to also detect python_callable arguments.
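
To make the python_callable point concrete, here is a simplified stand-in for signature-based keyword passing. It is not Airflow's actual code: `call_with_context` is a hypothetical helper, and the context dict is plain data. It shows why both named parameters and `**kwargs` on a python_callable can reach removed keys.

```python
import inspect


def call_with_context(python_callable, context: dict):
    """Hypothetical helper: pass only the context entries the callable accepts."""
    params = inspect.signature(python_callable).parameters
    # A `**kwargs` parameter accepts every context entry.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return python_callable(**context)
    # Otherwise pass only the entries that match a named parameter.
    return python_callable(**{k: v for k, v in context.items() if k in params})


def named(execution_date):
    # A removed key requested as a named argument.
    return execution_date


def catch_all(**kwargs):
    # A removed key reachable through `kwargs["execution_date"]`.
    return sorted(kwargs)
```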

Author

I have updated the logic for named arguments and functions decorated with @task.

@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from 5c96f89 to 5103ef7 Compare December 27, 2024 04:52
@sunank200 sunank200 requested review from Lee-W and uranusjr December 27, 2024 04:53
@dhruvmanila dhruvmanila added rule Implementing or modifying a lint rule preview Related to preview mode features labels Dec 30, 2024
@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch 3 times, most recently from d580a4b to c0a34d3 Compare January 2, 2025 08:03
pub(crate) fn removed_context_variable(checker: &mut Checker, expr: &Expr) {
    const REMOVED_CONTEXT_KEYS: [&str; 12] = [
        "conf",
        "execution_date",

For execution_date there is actually a replacement - in the docs: https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html#deprecated-variables - can you add this?

(Same for next_execution_date, prev_execution_date)

Author

We have renamed execution_date to logical_date in places, but we have also removed them: apache/airflow#42404
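
A diagnostic that carries an optional replacement could look roughly like this. It is a sketch under assumptions: only the `execution_date` to `logical_date` rename mentioned in this thread is listed, other keys are deliberately left without a suggestion rather than guessed, and the "use ... instead" wording and the names `REPLACEMENTS` and `format_message` are hypothetical.

```python
from typing import Optional

# Only the rename mentioned in this thread is filled in; every other key is
# left without a suggestion here rather than guessed.
REPLACEMENTS: dict[str, Optional[str]] = {
    "execution_date": "logical_date",
    "conf": None,
}


def format_message(key: str) -> str:
    """Build the base message and append a hint when a replacement is known."""
    message = f"`{key}` is removed in Airflow 3.0"
    replacement = REPLACEMENTS.get(key)
    if replacement is not None:
        message += f"; use `{replacement}` instead"
    return message
```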

add lint rule to show error for removed context variables in airflow
@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from c0a34d3 to 72a9dd3 Compare January 6, 2025 05:14
@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from 62de813 to fb8b300 Compare January 6, 2025 15:32
@sunank200 sunank200 requested review from uranusjr and Lee-W January 6, 2025 15:32
pub(crate) fn removed_context_variable(checker: &mut Checker, expr: &Expr) {
    if let Expr::Subscript(ExprSubscript { value, slice, .. }) = expr {
        if let Expr::Name(ExprName { id, .. }) = &**value {
            if id.as_str() == "context" {
Contributor

Is it possible to support code like this?

c = get_current_context()
c["execution_date"]

We should probably not hard code the variable name.
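
One way to avoid hard-coding the name is to record which variables are bound from a `get_current_context()` call. A minimal Python `ast` sketch follows, assuming only direct assignments are tracked and the callee is matched by name; `context_names` is a hypothetical helper, not part of ruff.

```python
import ast


def context_names(source: str) -> set[str]:
    """Names assigned directly from a `get_current_context()` call."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)):
            continue
        func = node.value.func
        # Accept both `get_current_context()` and `module.get_current_context()`.
        callee = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if callee == "get_current_context":
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
    return names
```

Subscript or `.get` accesses can then be checked against this set instead of against a fixed name such as `context`.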

Contributor

I think we'll need to check 2 cases here.

  1. the context is from a function argument (e.g., def func(**context):)
  2. the variable is assigned from get_current_context. we probably could do something like
    let Some(qualname) = typing::resolve_assignment(value, checker.semantic()) else {
        return;
    };


if !value
    .as_name_expr()
    .is_some_and(|name| matches!(name.id.as_str(), "context" | "kwargs"))
Contributor

Same, I think we should just check for all names (as long as it’s **).

Author

added the logic

@dhruvmanila
Member

Going to focus on reviewing this PR instead of #15240 for now as I think this one supersedes the other one but please correct me if I'm wrong.

Member

@dhruvmanila dhruvmanila left a comment

Thank you for working on this. I've a couple of doubts which I've highlighted in the review comments and #15240 (comment).

Comment on lines +959 to +961
let parents: Vec<_> = checker.semantic().current_statements().collect();

for stmt in parents {
Member

We should avoid allocating a vector here as we can directly iterate over the statement tree like so:

for stmt in checker.semantic().current_statements() {
	// ...
}

Comment on lines +985 to +986
fn extract_task_function_arguments(stmt: &StmtFunctionDef) -> Vec<String> {
    let mut arguments = Vec::new();
Member

I think we could avoid this allocation, but I would first like to understand what the is_task_context_referenced function is supposed to do.

/// @task
/// def access_invalid_key_task_out_of_dag(**context):
///     print("access invalid key", context.get("conf"))
/// ```
Contributor

We will probably also need to check:

c = get_current_context()
c.get("execution_date")
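
Handling the subscript form and the `.get(...)` form together could be sketched like this in Python. Assumptions: only string-literal keys are recognised, the receiver is not checked (a real rule would first confirm it is an Airflow context), and `accessed_key`, `removed_accesses`, and the two-key subset are hypothetical.

```python
import ast
from typing import Optional

REMOVED_KEYS = {"conf", "execution_date"}  # subset, for illustration only


def accessed_key(node: ast.AST) -> Optional[str]:
    """Return the string key for `x["key"]` or `x.get("key", ...)`, else None."""
    if isinstance(node, ast.Subscript):
        key = node.slice
    elif (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "get"
        and node.args
    ):
        key = node.args[0]
    else:
        return None
    if isinstance(key, ast.Constant) and isinstance(key.value, str):
        return key.value
    return None


def removed_accesses(source: str) -> list[str]:
    """Sorted removed keys read anywhere in `source`, by either access form."""
    keys = (accessed_key(n) for n in ast.walk(ast.parse(source)))
    return sorted(k for k in keys if k in REMOVED_KEYS)
```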

@dhruvmanila
Member

Thank you for updating the PR, I plan on looking at it later today.

@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from 4975421 to ffee139 Compare January 15, 2025 14:54
    false
}

fn extract_task_function_arguments(stmt: &StmtFunctionDef) -> Vec<String> {
Member

I think we should just inline this function, as it's only used once, and avoid allocating a vector here by chaining the arguments that need to be checked:

for param in stmt
    .parameters
    .args
    .iter()
    .map(|param| param.parameter.name.as_str())
    .chain(
        stmt.parameters
            .kwarg
            .as_ref()
            .map(|vararg| vararg.name.as_str()),
    )
{
    // ...
}
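
For readers more familiar with Python, the chaining idea above corresponds roughly to `itertools.chain` over the positional parameters and the optional `**kwargs` name. This is an analogue built on Python's `ast`, not ruff code, and `candidate_param_names` is a hypothetical name.

```python
import ast
from itertools import chain
from typing import Iterator


def candidate_param_names(func: ast.FunctionDef) -> Iterator[str]:
    """Lazily yield positional parameter names, then the `**kwargs` name if present."""
    kwarg = func.args.kwarg
    return chain(
        (arg.arg for arg in func.args.args),
        # Zero or one extra name, depending on whether `**kwargs` exists.
        (kwarg.arg,) if kwarg is not None else (),
    )
```

The caller can loop over the result directly, so no intermediate list of parameter names is built.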

Member

@dhruvmanila dhruvmanila left a comment

This is looking good. Can you update the PR description to include all the checks that are being done? I'm mainly looking for all the structural matching that's being done here, not the specific symbols or variables that are being checked. I'm having a hard time keeping track of them :)

@dhruvmanila
Member

Please re-request for review when it's ready :)

6 participants