-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries #31886
Conversation
@@ -405,7 +382,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper | |||
val goldenOutput = { | |||
s"-- Automatically generated by ${getClass.getSimpleName}\n" + | |||
s"-- Number of queries: ${outputs.size}\n\n\n" + | |||
outputs.zipWithIndex.map{case (qr, i) => qr.toString}.mkString("\n\n\n") + "\n" | |||
outputs.mkString("\n\n\n") + "\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just removed the unnecessary code here.
NOTE: I saw the Xiao's comment in the recent release process and then I made this PR. |
Kubernetes integration test starting |
cc @wangyum , @HyukjinKwon because @wangyum made this attempt already before. |
Kubernetes integration test status success |
Yea, but this PR is not for the benchmark purpose. |
I prefer to have this in GA ... cc @wangyum who should be the best person to review. |
+1 to have this check! |
sql/core/src/test/resources/tpcds-query-results/v1_4/q1.sql.out
Outdated
Show resolved
Hide resolved
NOTE: I'm checking if the result in the golden files is correct query-by-query. |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #136296 has finished for PR 31886 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #136307 has finished for PR 31886 at commit
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #136323 has finished for PR 31886 at commit
|
-- !query schema | ||
struct<i_item_id:string,i_item_desc:string,s_state:string,store_sales_quantitycount:bigint,store_sales_quantityave:double,store_sales_quantitystdev:double,store_sales_quantitycov:double,as_store_returns_quantitycount:bigint,as_store_returns_quantityave:double,as_store_returns_quantitystdev:double,store_returns_quantitycov:double,catalog_sales_quantitycount:bigint,catalog_sales_quantityave:double,catalog_sales_quantitystdev:double,catalog_sales_quantitycov:double> | ||
-- !query output | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be
+AAAAAAAAKPFEAAAA Recently right TN 1 99.0 NULL NULL 1 66.0 NULL NULL 1 32.0 NULL NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be caused by the wrong schema of date_dim
: https://github.com/apache/spark/pull/31012/files#r599447123
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #31012
-- !query schema | ||
struct<i_product_name:string> | ||
-- !query output | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be:
ableationableought
anticallyeingese
callycallyeingese
oughtationableought
A PR to fix this is: #31940
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR seems to have two independent themes. Could you split this PR into two?
- One is having
build_and_test.yml
change only. - One is the other test suite change and golden-file fix.
okay, @dongjoon-hyun ~ After I finish checking all the TPCDS query results, I'll do that (I'm still checking them) |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #136403 has finished for PR 31886 at commit
|
2d891e9
to
0a19585
Compare
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #136675 has finished for PR 31886 at commit
|
Please wait me a few hours. I will verify the result first. |
Test build #136678 has finished for PR 31886 at commit
|
@wangyum, don't worry. There are few more issues to address (#31886 (comment)). It will take more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine. I prefer to switch maropu/spark-tpcds-sf-1
to databricks/spark-sql-perf
. I roughly checked it internally, and it seems fine to open source this repo in the future as well.
We could merge this one, and maybe @wangyum can follow-up to switch to databricks/spark-sql-perf
? I will leave it to both of you to merge and how to proceed.
LGTM (pending #31886 (comment)) in any event.
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #136715 has finished for PR 31886 at commit
|
Thanks, @HyukjinKwon @wangyum . Merged to master. |
If someone finds the failure of the new GA job, please let me know. cc: @dongjoon-hyun @cloud-fan @viirya @gatorsmile |
After we resolve the pending issues on databricks/spark-sql-perf, I'm planning to add this job into branch-3.1./3.0, too. |
Thanks @maropu. |
Thanks @maropu |
This is awesome! We should do it 5 years ago. :-) |
… Adds a new job in GitHub Actions to check the output of TPC-DS queries ### What changes were proposed in this pull request? This PR proposes to add a new job in GitHub Actions to check the output of TPC-DS queries. NOTE: To generate TPC-DS table data in GA jobs, this PR includes generator code implemented in #32243 and #32460. This is the backport PR of #31886. ### Why are the changes needed? There are some cases where we noticed runtime-realted bugs after merging commits (e.g. .SPARK-33822). Therefore, I think it is worth adding a new job in GitHub Actions to check query output of TPC-DS (sf=1). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? The new test added. Closes #32462 from maropu/TPCDSQUeryTestSuite-Branch3.1. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
… Adds a new job in GitHub Actions to check the output of TPC-DS queries ### What changes were proposed in this pull request? This PR proposes to add a new job in GitHub Actions to check the output of TPC-DS queries. NOTE: To generate TPC-DS table data in GA jobs, this PR includes generator code implemented in #32243 and #32460. This is the backport PR of #31886. ### Why are the changes needed? There are some cases where we noticed runtime-realted bugs after merging commits (e.g. .SPARK-33822). Therefore, I think it is worth adding a new job in GitHub Actions to check query output of TPC-DS (sf=1). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? The new test added. Closes #32479 from maropu/TPCDSQueryTestSuite-Branch3.0. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
… Adds a new job in GitHub Actions to check the output of TPC-DS queries ### What changes were proposed in this pull request? This PR proposes to add a new job in GitHub Actions to check the output of TPC-DS queries. NOTE: To generate TPC-DS table data in GA jobs, this PR includes generator code implemented in apache#32243 and apache#32460. This is the backport PR of apache#31886. ### Why are the changes needed? There are some cases where we noticed runtime-realted bugs after merging commits (e.g. .SPARK-33822). Therefore, I think it is worth adding a new job in GitHub Actions to check query output of TPC-DS (sf=1). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? The new test added. Closes apache#32462 from maropu/TPCDSQUeryTestSuite-Branch3.1. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR proposes to add a new job in GitHub Actions to check the output of TPC-DS queries.
NOTE: I've checked that the new job took 17m 35s in the GitHub Actions env.
Why are the changes needed?
There are some cases where we noticed runtime-realted bugs after merging commits (e.g. .SPARK-33822). Therefore, I think it is worth adding a new job in GitHub Actions to check query output of TPC-DS (sf=1).
Does this PR introduce any user-facing change?
No.
How was this patch tested?
The new test added.