Commit

Merge branch 'master' into apply-in-arrow-input
Kimahriman committed Feb 8, 2025
2 parents 2d99a20 + ba7849e commit 7766b1d
Showing 2,517 changed files with 79,385 additions and 17,251 deletions.
3 changes: 2 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE
@@ -33,7 +33,8 @@ Please clarify why the changes are needed. For instance,

### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such as the documentation fix.
Note that it means *any* user-facing change including all aspects such as new features, bug fixes, or other behavior changes. Documentation-only updates are not considered user-facing changes.

If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
If no, write 'No'.
1 change: 0 additions & 1 deletion .github/labeler.yml
@@ -223,7 +223,6 @@ CONNECT:
- changed-files:
- any-glob-to-any-file: [
'sql/connect/**/*',
'connector/connect/**/*',
'python/**/connect/**/*'
]

31 changes: 17 additions & 14 deletions .github/workflows/build_and_test.yml
@@ -49,6 +49,10 @@ on:
required: false
type: string
default: ''
secrets:
codecov_token:
description: The upload token of codecov.
required: false
jobs:
precondition:
name: Check changes
@@ -223,7 +227,7 @@ jobs:
needs: precondition
if: fromJson(needs.precondition.outputs.required).build == 'true'
runs-on: ubuntu-latest
timeout-minutes: 180
timeout-minutes: 120
strategy:
fail-fast: false
matrix:
@@ -354,7 +358,7 @@ jobs:
python-version: '3.11'
architecture: x64
- name: Install Python packages (Python 3.11)
if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) || contains(matrix.modules, 'connect')
if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) || contains(matrix.modules, 'connect') || contains(matrix.modules, 'yarn')
run: |
python3.11 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.67.0' 'grpcio-status==1.67.0' 'protobuf==5.29.1'
python3.11 -m pip list
@@ -487,7 +491,7 @@ jobs:
if: (!cancelled()) && (fromJson(needs.precondition.outputs.required).pyspark == 'true' || fromJson(needs.precondition.outputs.required).pyspark-pandas == 'true')
name: "Build modules: ${{ matrix.modules }}"
runs-on: ubuntu-latest
timeout-minutes: 180
timeout-minutes: 120
container:
image: ${{ needs.precondition.outputs.image_pyspark_url_link }}
strategy:
@@ -623,7 +627,7 @@ jobs:
if: fromJSON(inputs.envs).PYSPARK_CODECOV == 'true'
uses: codecov/codecov-action@v5
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
CODECOV_TOKEN: ${{ secrets.codecov_token }}
with:
files: ./python/coverage.xml
flags: unittests
@@ -650,7 +654,7 @@ jobs:
if: (!cancelled()) && fromJson(needs.precondition.outputs.required).sparkr == 'true'
name: "Build modules: sparkr"
runs-on: ubuntu-latest
timeout-minutes: 180
timeout-minutes: 120
container:
image: ${{ needs.precondition.outputs.image_sparkr_url_link }}
env:
@@ -745,12 +749,11 @@ jobs:
uses: bufbuild/buf-lint-action@v1
with:
input: core/src/main/protobuf
# Change 'branch-3.5' to 'branch-4.0' in master branch after cutting branch-4.0 branch.
- name: Breaking change detection against branch-3.5
- name: Breaking change detection against branch-4.0
uses: bufbuild/buf-breaking-action@v1
with:
input: sql/connect/common/src/main
against: 'https://github.com/apache/spark.git#branch=branch-3.5,subdir=connector/connect/common/src/main'
against: 'https://github.com/apache/spark.git#branch=branch-4.0,subdir=sql/connect/common/src/main'
- name: Install Python 3.11
uses: actions/setup-python@v5
with:
@@ -773,7 +776,7 @@ jobs:
if: (!cancelled()) && fromJson(needs.precondition.outputs.required).lint == 'true'
name: Linters, licenses, and dependencies
runs-on: ubuntu-latest
timeout-minutes: 180
timeout-minutes: 120
env:
LC_ALL: C.UTF-8
LANG: C.UTF-8
@@ -904,7 +907,7 @@ jobs:
if: (!cancelled()) && fromJson(needs.precondition.outputs.required).docs == 'true'
name: Documentation generation
runs-on: ubuntu-latest
timeout-minutes: 180
timeout-minutes: 120
env:
LC_ALL: C.UTF-8
LANG: C.UTF-8
@@ -1027,7 +1030,7 @@ jobs:
name: Run TPC-DS queries with SF=1
# Pin to 'Ubuntu 20.04' due to 'databricks/tpcds-kit' compilation
runs-on: ubuntu-20.04
timeout-minutes: 180
timeout-minutes: 120
env:
SPARK_LOCAL_IP: localhost
steps:
@@ -1129,7 +1132,7 @@ jobs:
if: fromJson(needs.precondition.outputs.required).docker-integration-tests == 'true'
name: Run Docker integration tests
runs-on: ubuntu-latest
timeout-minutes: 180
timeout-minutes: 120
env:
HADOOP_PROFILE: ${{ inputs.hadoop }}
HIVE_PROFILE: hive2.3
@@ -1196,7 +1199,7 @@ jobs:
if: fromJson(needs.precondition.outputs.required).k8s-integration-tests == 'true'
name: Run Spark on Kubernetes Integration test
runs-on: ubuntu-latest
timeout-minutes: 180
timeout-minutes: 120
steps:
- name: Checkout Spark repository
uses: actions/checkout@v4
@@ -1275,7 +1278,7 @@ jobs:
if: fromJson(needs.precondition.outputs.required).ui == 'true'
name: Run Spark UI tests
runs-on: ubuntu-latest
timeout-minutes: 180
timeout-minutes: 120
steps:
- uses: actions/checkout@v4
- name: Use Node.js
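
The `secrets:` block added to `build_and_test.yml` declares `codecov_token` as an optional secret of the reusable workflow, and the Codecov step now reads `secrets.codecov_token`. Secrets are not forwarded to a reusable workflow automatically, so a caller has to pass the value explicitly (or use `secrets: inherit`). A minimal sketch of such a caller is shown here; the repository secret name `CODECOV_TOKEN` on the caller side is an assumption, and this commit does not show which workflows actually pass the secret.

```yaml
# Hypothetical caller job; only the `codecov_token` name comes from the diff above.
jobs:
  run-build:
    name: Run
    uses: ./.github/workflows/build_and_test.yml
    secrets:
      # Maps a repository-level secret (name assumed) onto the secret
      # declared under `on.workflow_call.secrets` in the called workflow.
      codecov_token: ${{ secrets.CODECOV_TOKEN }}
```

Because the secret is declared with `required: false`, callers that omit it should still run; the called workflow would simply see an empty value.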
53 changes: 53 additions & 0 deletions .github/workflows/build_branch40.yml
@@ -0,0 +1,53 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: "Build (branch-4.0, Scala 2.13, Hadoop 3, JDK 17)"

on:
schedule:
- cron: '0 12 * * *'
workflow_dispatch:

jobs:
run-build:
permissions:
packages: write
name: Run
uses: ./.github/workflows/build_and_test.yml
if: github.repository == 'apache/spark'
with:
java: 17
branch: branch-4.0
hadoop: hadoop3
envs: >-
{
"SCALA_PROFILE": "scala2.13",
"PYSPARK_IMAGE_TO_TEST": "",
"PYTHON_TO_TEST": "",
"ORACLE_DOCKER_IMAGE_NAME": "gvenzl/oracle-free:23.6-slim"
}
jobs: >-
{
"build": "true",
"sparkr": "true",
"tpcds-1g": "true",
"docker-integration-tests": "true",
"k8s-integration-tests": "true",
"lint" : "true"
}
57 changes: 57 additions & 0 deletions .github/workflows/build_branch40_java21.yml
@@ -0,0 +1,57 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: "Build (branch-4.0, Scala 2.13, Hadoop 3, JDK 21)"

on:
schedule:
- cron: '0 5 * * *'
workflow_dispatch:

jobs:
run-build:
permissions:
packages: write
name: Run
uses: ./.github/workflows/build_and_test.yml
if: github.repository == 'apache/spark'
with:
java: 21
branch: branch-4.0
hadoop: hadoop3
envs: >-
{
"PYSPARK_IMAGE_TO_TEST": "python-311",
"PYTHON_TO_TEST": "python3.11",
"SKIP_MIMA": "true",
"SKIP_UNIDOC": "true",
"DEDICATED_JVM_SBT_TESTS": "org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormatV1Suite,org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormatV2Suite,org.apache.spark.sql.execution.datasources.orc.OrcSourceV1Suite,org.apache.spark.sql.execution.datasources.orc.OrcSourceV2Suite"
}
jobs: >-
{
"build": "true",
"pyspark": "true",
"sparkr": "true",
"tpcds-1g": "true",
"docker-integration-tests": "true",
"yarn": "true",
"k8s-integration-tests": "true",
"buf": "true",
"ui": "true"
}
35 changes: 35 additions & 0 deletions .github/workflows/build_branch40_maven.yml
@@ -0,0 +1,35 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: "Build / Maven (branch-4.0, Scala 2.13, Hadoop 3, JDK 17)"

on:
schedule:
- cron: '0 14 * * *'
workflow_dispatch:

jobs:
run-build:
permissions:
packages: write
name: Run
uses: ./.github/workflows/maven_test.yml
if: github.repository == 'apache/spark'
with:
branch: branch-4.0
36 changes: 36 additions & 0 deletions .github/workflows/build_branch40_maven_java21.yml
@@ -0,0 +1,36 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: "Build / Maven (branch-4.0, Scala 2.13, Hadoop 3, JDK 21)"

on:
schedule:
- cron: '0 14 * * *'
workflow_dispatch:

jobs:
run-build:
permissions:
packages: write
name: Run
uses: ./.github/workflows/maven_test.yml
if: github.repository == 'apache/spark'
with:
branch: branch-4.0
java: 21
53 changes: 53 additions & 0 deletions .github/workflows/build_branch40_non_ansi.yml
@@ -0,0 +1,53 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: "Build / Non-ANSI (branch-4.0, Hadoop 3, JDK 17, Scala 2.13)"

on:
schedule:
- cron: '0 2 * * *'
workflow_dispatch:

jobs:
run-build:
permissions:
packages: write
name: Run
uses: ./.github/workflows/build_and_test.yml
if: github.repository == 'apache/spark'
with:
java: 17
branch: branch-4.0
hadoop: hadoop3
envs: >-
{
"PYSPARK_IMAGE_TO_TEST": "python-311",
"PYTHON_TO_TEST": "python3.11",
"SPARK_ANSI_SQL_MODE": "false"
}
jobs: >-
{
"build": "true",
"docs": "true",
"pyspark": "true",
"sparkr": "true",
"tpcds-1g": "true",
"docker-integration-tests": "true",
"yarn": "true"
}
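
The `envs` and `jobs` inputs in these scheduled workflows are JSON documents passed as plain strings; elsewhere in this commit the called workflow reads them with expressions such as `fromJSON(inputs.envs).PYSPARK_CODECOV == 'true'`. As a rough sketch of the same pattern applied to the `SPARK_ANSI_SQL_MODE` key set above, a step in the called workflow could be gated as follows. The step itself is hypothetical and is not part of `build_and_test.yml` as shown in this diff.

```yaml
# Hypothetical step in the called workflow; illustrative only.
steps:
  - name: Report ANSI mode (illustrative)
    # `inputs.envs` is a JSON string, so it is parsed with fromJSON first;
    # values are compared as strings because the caller passes "false".
    if: fromJSON(inputs.envs).SPARK_ANSI_SQL_MODE == 'false'
    run: echo "ANSI SQL mode is disabled for this run"
```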