[SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries #31886

Closed
wants to merge 15 commits into from
48 changes: 48 additions & 0 deletions .github/workflows/build_and_test.yml
@@ -428,3 +428,51 @@ jobs:
    - name: Build with SBT
      run: |
        ./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Phadoop-2.7 compile test:compile

  tpcds-1g:
    name: Run TPC-DS queries with SF=1
    runs-on: ubuntu-20.04
    steps:
    - name: Checkout Spark repository
      uses: actions/checkout@v2
    - name: Cache TPC-DS generated data
      id: cache-tpcds-sf-1
      uses: actions/cache@v2
      with:
        path: ./tpcds-sf-1
        key: tpcds-${{ hashFiles('tpcds-sf-1/.spark-tpcds-sf-1.md5') }}
        restore-keys: |
          tpcds-
    - name: Checkout TPC-DS (SF=1) generated data repository
      if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
      uses: actions/checkout@v2
      with:
        repository: maropu/spark-tpcds-sf-1
@HyukjinKwon (Member), Mar 29, 2021

Member Author:

Ah, I see. To generate table data for TPCDSQueryTestSuite, I think we need more fixes in the original repo because the two define different TPC-DS schemas (char/varchar vs. string):
spark-sql-perf: https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/sql/perf/tpcds/TPCDSTables.scala#L57-L542
spark-master/branch-3.1: https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala#L51-L548

I've filed a ticket for this: databricks/spark-sql-perf#198. I'll make a PR for it when I have time.

Member:

Thanks @maropu. Just to clarify, do you need databricks/spark-sql-perf#196 too?

@maropu (Member Author), Mar 30, 2021:

yea, @wangyum's PR looks useful for generating data.

Member:

Sure, I will ping him offline to review

Member Author:

Thanks, @HyukjinKwon ~

        ref: 6b660a53091bd6d23cbe58b0f09aae08e71cc667
        path: ./tpcds-sf-1
    - name: Cache Coursier local repository
      uses: actions/cache@v2
      with:
        path: ~/.cache/coursier
        key: tpcds-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
        restore-keys: |
          tpcds-coursier-
    - name: Install Java 8
      uses: actions/setup-java@v1
      with:
        java-version: 8
    - name: Run TPC-DS queries
      run: |
        SPARK_TPCDS_DATA=`pwd`/tpcds-sf-1 build/sbt "sql/testOnly org.apache.spark.sql.TPCDSQueryTestSuite"
    - name: Upload test results to report
      if: always()
      uses: actions/upload-artifact@v2
      with:
        name: test-results-tpcds--8-hadoop3.2-hive2.3
        path: "**/target/test-reports/*.xml"
    - name: Upload unit tests log files
      if: failure()
      uses: actions/upload-artifact@v2
      with:
        name: unit-tests-log-tpcds--8-hadoop3.2-hive2.3
        path: "**/target/unit-tests.log"
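
For readers following the `tpcds-1g` job above, here is a rough Python model of how the cache step and the conditional data checkout interact. It is only a sketch of the documented `key` / `restore-keys` / `cache-hit` behavior as I understand it, not the implementation of `actions/cache`. The function names `resolve_cache` and `should_checkout_data` are hypothetical, and `stored_keys` is assumed to be ordered newest first.

```python
from typing import List, Optional, Tuple


def resolve_cache(
    key: str, restore_keys: List[str], stored_keys: List[str]
) -> Tuple[Optional[str], bool]:
    """Return (restored_key, cache_hit) for a simplified cache lookup.

    Assumption: stored_keys lists existing cache entries, newest first.
    """
    # Only an exact match on the primary key counts as a cache hit.
    if key in stored_keys:
        return key, True
    # Otherwise fall back to the first stored key that starts with one of
    # the restore-key prefixes; this restores data but is not a hit.
    for prefix in restore_keys:
        for stored in stored_keys:
            if stored.startswith(prefix):
                return stored, False
    # Nothing matched, so nothing is restored.
    return None, False


def should_checkout_data(cache_hit: bool) -> bool:
    # Mirrors `if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'`.
    return not cache_hit


if __name__ == "__main__":
    restored, hit = resolve_cache("tpcds-abc", ["tpcds-"], ["tpcds-old"])
    print(restored, hit, should_checkout_data(hit))
```

Under this model, a restore through the `tpcds-` prefix still leaves `cache-hit` unset, so the data repository checkout step runs again on top of the restored directory.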
105 changes: 105 additions & 0 deletions sql/core/src/test/resources/tpcds-query-results/v1_4/q1.sql.out
@@ -0,0 +1,105 @@
-- Automatically generated by TPCDSQueryTestSuite

-- !query schema
struct<c_customer_id:string>
-- !query output
AAAAAAAAAAABBAAA
AAAAAAAAAAADBAAA
AAAAAAAAAAADBAAA
AAAAAAAAAAAKAAAA
AAAAAAAAAABDAAAA
AAAAAAAAAABHBAAA
AAAAAAAAAABLAAAA
AAAAAAAAAABMAAAA
AAAAAAAAAACHAAAA
AAAAAAAAAACMAAAA
AAAAAAAAAADDAAAA
AAAAAAAAAADGAAAA
AAAAAAAAAADGBAAA
AAAAAAAAAADGBAAA
AAAAAAAAAADPAAAA
AAAAAAAAAAEBAAAA
AAAAAAAAAAEFBAAA
AAAAAAAAAAEGBAAA
AAAAAAAAAAEIAAAA
AAAAAAAAAAEMAAAA
AAAAAAAAAAFAAAAA
AAAAAAAAAAFPAAAA
AAAAAAAAAAGGBAAA
AAAAAAAAAAGHBAAA
AAAAAAAAAAGJAAAA
AAAAAAAAAAGMAAAA
AAAAAAAAAAHEBAAA
AAAAAAAAAAHFBAAA
AAAAAAAAAAIEBAAA
AAAAAAAAAAJGBAAA
AAAAAAAAAAJHBAAA
AAAAAAAAAAKCAAAA
AAAAAAAAAAKCAAAA
AAAAAAAAAAKJAAAA
AAAAAAAAAAKMAAAA
AAAAAAAAAAKMAAAA
AAAAAAAAAALAAAAA
AAAAAAAAAALABAAA
AAAAAAAAAALGAAAA
AAAAAAAAAALHBAAA
AAAAAAAAAALJAAAA
AAAAAAAAAANHAAAA
AAAAAAAAAANHBAAA
AAAAAAAAAANJAAAA
AAAAAAAAAANMAAAA
AAAAAAAAAANMAAAA
AAAAAAAAAANNAAAA
AAAAAAAAAAOBBAAA
AAAAAAAAAAODBAAA
AAAAAAAAAAOLAAAA
AAAAAAAAAAPGBAAA
AAAAAAAAABAAAAAA
AAAAAAAAABAEAAAA
AAAAAAAAABAEBAAA
AAAAAAAAABAFBAAA
AAAAAAAAABAIAAAA
AAAAAAAAABAOAAAA
AAAAAAAAABBDBAAA
AAAAAAAAABCFAAAA
AAAAAAAAABCHBAAA
AAAAAAAAABDHAAAA
AAAAAAAAABENAAAA
AAAAAAAAABFEBAAA
AAAAAAAAABFGAAAA
AAAAAAAAABFMAAAA
AAAAAAAAABFPAAAA
AAAAAAAAABGFAAAA
AAAAAAAAABGFBAAA
AAAAAAAAABGJAAAA
AAAAAAAAABIBBAAA
AAAAAAAAABICBAAA
AAAAAAAAABIIAAAA
AAAAAAAAABJNAAAA
AAAAAAAAABKGBAAA
AAAAAAAAABLOAAAA
AAAAAAAAABLPAAAA
AAAAAAAAABMABAAA
AAAAAAAAABMPAAAA
AAAAAAAAABNAAAAA
AAAAAAAAABNCBAAA
AAAAAAAAABNEBAAA
AAAAAAAAABNLAAAA
AAAAAAAAABNOAAAA
AAAAAAAAABNPAAAA
AAAAAAAAABOAAAAA
AAAAAAAAABOFBAAA
AAAAAAAAABOOAAAA
AAAAAAAAABOPAAAA
AAAAAAAAABPEAAAA
AAAAAAAAACADAAAA
AAAAAAAAACAFAAAA
AAAAAAAAACAFAAAA
AAAAAAAAACAHBAAA
AAAAAAAAACAJAAAA
AAAAAAAAACBDAAAA
AAAAAAAAACBDAAAA
AAAAAAAAACBEBAAA
AAAAAAAAACBNAAAA
AAAAAAAAACBPAAAA
AAAAAAAAACCHAAAA
10 changes: 10 additions & 0 deletions sql/core/src/test/resources/tpcds-query-results/v1_4/q10.sql.out
@@ -0,0 +1,10 @@
-- Automatically generated by TPCDSQueryTestSuite

-- !query schema
struct<cd_gender:string,cd_marital_status:string,cd_education_status:string,cnt1:bigint,cd_purchase_estimate:int,cnt2:bigint,cd_credit_rating:string,cnt3:bigint,cd_dep_count:int,cnt4:bigint,cd_dep_employed_count:int,cnt5:bigint,cd_dep_college_count:int,cnt6:bigint>
-- !query output
F D Advanced Degree 1 3000 1 High Risk 1 2 1 4 1 5 1
F D Unknown 1 1500 1 Good 1 6 1 5 1 4 1
M D College 1 8500 1 Low Risk 1 3 1 0 1 1 1
M D Primary 1 7000 1 Unknown 1 2 1 1 1 1 1
M W Unknown 1 4500 1 Good 1 5 1 0 1 1 1
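
The golden files added in this PR share a simple layout: a generated-by header, a `-- !query schema` marker followed by the schema, and a `-- !query output` marker followed by one row per line. The Python sketch below shows one way such a file could be read and compared against actual results. It is an illustration only, not how TPCDSQueryTestSuite does it. The helper names `parse_golden_file` and `matches_golden` are hypothetical, and rows are treated as opaque strings because the column separator is not visible here.

```python
from typing import Iterable, List, Tuple

SCHEMA_MARKER = "-- !query schema"
OUTPUT_MARKER = "-- !query output"


def parse_golden_file(text: str) -> Tuple[str, List[str]]:
    """Split a golden file into (schema, rows).

    Assumption: each marker appears exactly once, on a line of its own.
    """
    lines = text.splitlines()
    schema_idx = lines.index(SCHEMA_MARKER)
    output_idx = lines.index(OUTPUT_MARKER, schema_idx + 1)
    # Everything between the two markers is the schema string.
    schema = "\n".join(lines[schema_idx + 1:output_idx]).strip()
    # Everything after the output marker is one row per line; blank
    # lines are skipped in this simplified reader.
    rows = [line for line in lines[output_idx + 1:] if line.strip()]
    return schema, rows


def matches_golden(text: str, actual_schema: str, actual_rows: Iterable[str]) -> bool:
    """Compare actual results with the expected schema and rows, in order."""
    expected_schema, expected_rows = parse_golden_file(text)
    return expected_schema == actual_schema and expected_rows == list(actual_rows)


SAMPLE = (
    "-- Automatically generated by TPCDSQueryTestSuite\n"
    "\n"
    "-- !query schema\n"
    "struct<c_customer_id:string>\n"
    "-- !query output\n"
    "AAAAAAAAAAABBAAA\n"
    "AAAAAAAAAAADBAAA\n"
)

if __name__ == "__main__":
    print(parse_golden_file(SAMPLE))
```

Comparing rows in order is a deliberate simplification; a real check may need to sort rows for queries without a total ordering.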
93 changes: 93 additions & 0 deletions sql/core/src/test/resources/tpcds-query-results/v1_4/q11.sql.out
@@ -0,0 +1,93 @@
-- Automatically generated by TPCDSQueryTestSuite

-- !query schema
struct<customer_preferred_cust_flag:string>
-- !query output
NULL
NULL
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y