[airflow] Add lint rule to show error for removed context variables in airflow #15144
base: main
@sunank200 and I discussed this earlier. What we're trying to check is whether there's a variable named `context` in a function (most commonly seen in TaskFlow and `PythonOperator` callables) and whether it is accessed like a dict with one of the keys we want to check. I think it's unlikely that users use something like this outside of the Airflow context, but I'd like to know whether there are any concerns, @MichaReiser @uranusjr.
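A minimal Python `ast` sketch of the check described above (an analogue, not ruff's Rust implementation): it flags `context["<key>"]` subscripts whose key is in a removed set. `REMOVED_KEYS` is an illustrative, assumed subset of the removed keys, and `find_removed_subscripts` is a hypothetical helper name.

```python
import ast

# Assumed, illustrative subset of context keys removed in Airflow 3;
# the authoritative list belongs to the rule itself, not to this sketch.
REMOVED_KEYS = {"execution_date", "next_ds", "prev_ds", "tomorrow_ds", "yesterday_ds"}


def find_removed_subscripts(source: str, var_name: str = "context") -> list[tuple[int, str]]:
    """Return (line, key) for every `var_name["key"]` whose key is in REMOVED_KEYS."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id == var_name
            and isinstance(node.slice, ast.Constant)  # Python 3.9+: slice is the expression
            and node.slice.value in REMOVED_KEYS
        ):
            hits.append((node.lineno, node.slice.value))
    return hits


example = '''
def my_task(**context):
    ds = context["ds"]                  # not in the assumed removed set
    run = context["execution_date"]     # in the assumed removed set
'''
print(find_removed_subscripts(example))  # [(4, 'execution_date')]
```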
I have also added logic for other ways of accessing context values; these are covered in the tests.
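The comment doesn't say which other access patterns were added. Assuming `.get("<key>")` is one of them, a rough Python `ast` analogue of that check could look like this; `removed_key_from_get` and the key subset are hypothetical.

```python
import ast
from typing import Optional

# Assumed, illustrative subset of removed context keys.
REMOVED_KEYS = {"execution_date", "next_ds", "prev_ds", "tomorrow_ds", "yesterday_ds"}


def removed_key_from_get(node: ast.AST, var_name: str = "context") -> Optional[str]:
    """Return the key if `node` is `var_name.get("key", ...)` with a removed key, else None."""
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "get"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == var_name
        and node.args
        and isinstance(node.args[0], ast.Constant)  # only literal keys can be checked
        and node.args[0].value in REMOVED_KEYS
    ):
        return node.args[0].value
    return None


source = 'run = context.get("execution_date")\nds = context.get("ds")'
found = [k for n in ast.walk(ast.parse(source)) if (k := removed_key_from_get(n))]
print(found)  # ['execution_date']
```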
It’s probably better to detect:

1. `@task`-decorated functions (receiving the context via either `**` or simple named arguments). (As a follow-up, any functions called by such a function.)
2. The `execute` function of a `BaseOperator` subclass. (As a follow-up, any functions called by `execute`.)
3. `get_current_context`.

This should be better than detecting by variable name.
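As a rough illustration of the second item, here is a Python `ast` sketch that finds `execute` methods of classes that directly subclass `BaseOperator`. It matches the base class by name only (an assumption: it does not resolve imports or indirect subclasses), and `base_names` and `operator_execute_methods` are hypothetical helpers.

```python
import ast


def base_names(cls: ast.ClassDef) -> set[str]:
    """Names of the direct bases, e.g. `BaseOperator` or `models.BaseOperator`."""
    names = set()
    for base in cls.bases:
        if isinstance(base, ast.Name):
            names.add(base.id)
        elif isinstance(base, ast.Attribute):
            names.add(base.attr)
    return names


def operator_execute_methods(source: str) -> list[str]:
    """Return `Class.execute` for classes that directly subclass BaseOperator."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and "BaseOperator" in base_names(node):
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name == "execute":
                    found.append(f"{node.name}.execute")
    return found


example = '''
from airflow.models import BaseOperator

class MyOperator(BaseOperator):
    def execute(self, context):
        return context["ds"]

class NotAnOperator:
    def execute(self, context):
        pass
'''
print(operator_execute_methods(example))  # ['MyOperator.execute']
```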
What about `python_callable`?
I don’t think `python_callable` takes the context, though? It only accepts what you provide in `self.op_args` and `self.op_kwargs`.
I agree that it'll be useful to guard this check by first verifying that the parameter comes from a function decorated with `@task`.

I think this can be done as a pre-check for context variables: use the `checker.semantic().current_statements()` method to traverse up the AST, find the function definition node, and check whether the function has a `@task` decorator that originates from the `airflow` module.

ruff/crates/ruff_python_semantic/src/model.rs, lines 1232 to 1239 in 9fd4eb8
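The decorator-origin check could be sketched with Python's `ast` module as below. This is an analogue, not ruff's `SemanticModel` API: it assumes `task` is imported via `from airflow... import task` (possibly aliased) and accepts the `@task`, `@task()`, `@task.<name>` and `@task.<name>(...)` forms. Both helper names are hypothetical.

```python
import ast


def airflow_imports(tree: ast.Module) -> dict[str, str]:
    """Map local name -> imported name for `from airflow[.x] import Name [as alias]`."""
    imported = {}
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.ImportFrom)
            and node.module is not None
            and (node.module == "airflow" or node.module.startswith("airflow."))
        ):
            for alias in node.names:
                imported[alias.asname or alias.name] = alias.name
    return imported


def is_airflow_task(func: ast.FunctionDef, imported: dict[str, str]) -> bool:
    """True if a decorator resolves to a `task` name imported from airflow."""
    for dec in func.decorator_list:
        if isinstance(dec, ast.Call):       # @task(...) or @task.python(...)
            dec = dec.func
        if isinstance(dec, ast.Attribute):  # @task.python -> inspect `task`
            dec = dec.value
        if isinstance(dec, ast.Name) and imported.get(dec.id) == "task":
            return True
    return False


example = '''
from airflow.decorators import task
from mylib import task as other_task

@task
def a(**context): ...

@task.python()
def b(**context): ...

@other_task
def c(**context): ...
'''
tree = ast.parse(example)
imported = airflow_imports(tree)
tasks = [f.name for f in ast.walk(tree) if isinstance(f, ast.FunctionDef) and is_airflow_task(f, imported)]
print(tasks)  # ['a', 'b']
```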
I thought we could still get it in the `python_callable`? https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html#pythonoperator
Hmm, OK, I didn’t even realise you could do that… Yeah, in that case it’s probably a good idea to also detect `python_callable` arguments.
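Per the linked docs, a `python_callable` can receive context values through keyword arguments. A minimal Python `ast` sketch for collecting such callables, assuming they are passed by bare name (lambdas and attribute references are skipped); `python_callable_names` is hypothetical, and the example source is only parsed, never executed.

```python
import ast


def python_callable_names(source: str) -> set[str]:
    """Names passed as `python_callable=<name>` to any call, e.g. PythonOperator(...)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:  # kw.arg is None for `**mapping` in a call
                if kw.arg == "python_callable" and isinstance(kw.value, ast.Name):
                    names.add(kw.value.id)
    return names


example = '''
def print_context(ds=None, **kwargs):
    print(kwargs["execution_date"])

run_this = PythonOperator(task_id="print_the_context", python_callable=print_context)
'''
print(python_callable_names(example))  # {'print_context'}
```

The collected names could then feed the same `**`/named-parameter check used for `@task` functions.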
I have updated the logic for named arguments and for functions decorated with `@task`.
Is it possible to support code like this?
We should probably not hard-code the variable name.
I think we'll need to check 2 cases here:

1. `context` comes from a function argument (e.g., `def func(**context):`)
2. `get_current_context`. We could probably do something like

ruff/crates/ruff_linter/src/rules/airflow/rules/removal_in_3.rs, lines 265 to 267 in bec8441
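For the second case, a Python `ast` sketch that records which names are bound to the result of `get_current_context()`, so that later subscripts on those names could be checked. It only handles plain `name = get_current_context()` assignments (an assumption; annotated assignments are skipped), and `names_bound_to_current_context` is hypothetical.

```python
import ast


def names_bound_to_current_context(source: str) -> set[str]:
    """Names assigned from a bare or attribute-qualified `get_current_context()` call."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            func = node.value.func
            if isinstance(func, ast.Name):
                called = func.id
            elif isinstance(func, ast.Attribute):
                called = func.attr
            else:
                called = None
            if called == "get_current_context":
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        names.add(target.id)
    return names


example = '''
from airflow.operators.python import get_current_context

def helper():
    ctx = get_current_context()
    return ctx["execution_date"]
'''
print(names_bound_to_current_context(example))  # {'ctx'}
```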
We will probably need to check
Same, I think we should just check for all names (as long as it’s `**`).
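A minimal Python `ast` sketch of checking the `**` parameter whatever it is named, rather than hard-coding `context`. `kwargs_param_name`, `removed_kwargs_accesses`, and the key subset are hypothetical.

```python
import ast
from typing import Optional

# Assumed, illustrative subset of removed context keys.
REMOVED_KEYS = {"execution_date", "next_ds", "prev_ds", "tomorrow_ds", "yesterday_ds"}


def kwargs_param_name(func: ast.FunctionDef) -> Optional[str]:
    """Name of the `**` parameter, whatever the user called it (context, kwargs, ...)."""
    return func.args.kwarg.arg if func.args.kwarg else None


def removed_kwargs_accesses(func: ast.FunctionDef) -> list[str]:
    """Removed keys read via `<kwarg_name>["key"]` inside `func`."""
    name = kwargs_param_name(func)
    if name is None:
        return []
    return [
        node.slice.value
        for node in ast.walk(func)
        if isinstance(node, ast.Subscript)
        and isinstance(node.value, ast.Name)
        and node.value.id == name
        and isinstance(node.slice, ast.Constant)
        and node.slice.value in REMOVED_KEYS
    ]


src = '''
def a(**context):
    return context["execution_date"]

def b(**kwargs):
    return kwargs["next_ds"], kwargs["ds"]
'''
results = [removed_kwargs_accesses(f) for f in ast.parse(src).body]
print(results)  # [['execution_date'], ['next_ds']]
```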
Added the logic.
We should avoid allocating a vector here, as we can iterate over the statement tree directly, like so:
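The reviewer's snippet isn't preserved here. As a Python analogue of iterating over the statement hierarchy lazily instead of collecting it first, the sketch below walks ancestors with a generator and stops at the first enclosing function. Python's `ast` has no parent pointers, so `build_parent_map` (hypothetical, like the other helpers) stands in for the semantic model.

```python
import ast
from typing import Dict, Iterator, Optional


def build_parent_map(tree: ast.AST) -> Dict[ast.AST, ast.AST]:
    """Map each node to its parent, since Python's ast has no parent pointers."""
    return {child: parent for parent in ast.walk(tree) for child in ast.iter_child_nodes(parent)}


def ancestors(node: ast.AST, parents: Dict[ast.AST, ast.AST]) -> Iterator[ast.AST]:
    """Lazily yield ancestors from innermost to outermost; nothing is collected into a list."""
    while node in parents:
        node = parents[node]
        yield node


def enclosing_function(node: ast.AST, parents: Dict[ast.AST, ast.AST]) -> Optional[ast.FunctionDef]:
    # `next` stops at the first match, so outer ancestors are never visited.
    return next((a for a in ancestors(node, parents) if isinstance(a, ast.FunctionDef)), None)


tree = ast.parse("def outer():\n    def inner(**context):\n        return context['ds']\n")
parents = build_parent_map(tree)
sub = next(n for n in ast.walk(tree) if isinstance(n, ast.Subscript))
print(enclosing_function(sub, parents).name)  # inner
```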
I think we should just inline this function, as it's only used once, and avoid allocating a vector here by chaining the arguments that need to be checked:
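The suggested change isn't preserved either. In Python terms, chaining a function's parameter groups lazily, rather than concatenating them into a new list, looks roughly like this; `all_parameters` is a hypothetical helper, and `itertools.chain` plays the role an iterator chain would play in Rust.

```python
import ast
from itertools import chain
from typing import Iterator


def all_parameters(args: ast.arguments) -> Iterator[ast.arg]:
    """Every parameter of a function, in signature order, chained lazily."""
    return chain(
        args.posonlyargs,
        args.args,
        (a for a in (args.vararg,) if a is not None),  # *args, if present
        args.kwonlyargs,
        (a for a in (args.kwarg,) if a is not None),   # **kwargs, if present
    )


func = ast.parse("def f(a, /, b, *rest, c, **context): ...").body[0]
print([p.arg for p in all_parameters(func.args)])  # ['a', 'b', 'rest', 'c', 'context']
```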
I think we could avoid this allocation, but I would first like to understand what the `is_task_context_referenced` function is supposed to do.