-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create rule S7196: Complex logic provided to PySpark withColumn
method should be refactored into a separate expression
#4642
base: master
Are you sure you want to change the base?
Conversation
withColumn
method should be refactored into a separate expression
…d should be refactored into a separate expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! I have added a small comment on formatting and maybe a bit of clarification on what the remediation would be. And I think we could add when
and filter
as well, as you mentioned in the ticket.
rules/S7196/python/rule.adoc
Outdated
`withColumn` method is commonly used to add or modify columns in a DataFrame. When complex functions or expressions are directly passed to withColumn, it can lead to code that is difficult to read, understand, and maintain. Also, it will become easier to write unit tests for these functions, ensuring that the logic is correct and behaves as expected. This leads to more robust and reliable code. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would leave out complex function. As it is ok to pass a single complex function to a withColumn.
I would also add that refactoring theses expressions is a good thing.
This is just a suggestion feel free to rework it.
`withColumn` method is commonly used to add or modify columns in a DataFrame. When complex functions or expressions are directly passed to withColumn, it can lead to code that is difficult to read, understand, and maintain. Also, it will become easier to write unit tests for these functions, ensuring that the logic is correct and behaves as expected. This leads to more robust and reliable code. | |
`withColumn` method is commonly used to add or modify columns in a DataFrame. When long or complex expressions are directly passed to `withColumn`, it can lead to code that is difficult to read, understand, and maintain. Refactoring such expressions into functions or variables will help with readability. Also, it will become easier to write unit tests for these functions, ensuring that the logic is correct and behaves as expected. This leads to more robust and reliable code. | |
rules/S7196/python/rule.adoc
Outdated
@@ -0,0 +1,42 @@ | |||
This rule raises an issue when complex functions or expressions are directly passed to withColumn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here withColumn should be in backticks.
rules/S7196/python/metadata.json
Outdated
@@ -0,0 +1,26 @@ | |||
{ | |||
"title": "Complex logic provided to PySpark withColumn method should be refactored into a separate expression", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think withColumn should be in quotes here (I think in the title we cannot use backticks, but I am not sure)
rules/S7196/python/rule.adoc
Outdated
[source,python,diff-id=1,diff-type=noncompliant] | ||
---- | ||
from pyspark.sql.functions import * | ||
df = df.withColumn('Revenue', col('fare_amount').substr(0, 10).cast("float") + col('extra').substr(0, 5).cast("float") + col('tax').substr(0, 3).cast("float")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The # NonCompliant
comment is missing here. I don;t think this is mandatory, it is just a convention that we have used so far.
dda3503
to
a5843c4
Compare
a5843c4
to
7ede043
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks super good. The example really highlight the strength of this rule! There is only one thing I would add, which I forgot Yesterday, is that we should also detect where
which is an alias for filter
but it is a small detail. We should at least put it in the implementation ticket.
7ede043
to
2ea372b
Compare
Quality Gate passed for 'rspec-tools'Issues Measures |
Quality Gate passed for 'rspec-frontend'Issues Measures |
You can preview this rule here (updated a few minutes after each push).
Review
A dedicated reviewer checked the rule description successfully for: