[SPARK-49428][SQL] Move Connect Scala Client from Connector to SQL
### What changes were proposed in this pull request?
This PR moves the Connect Scala JVM client project to sql. It also moves the connect/bin and connect/docs directories to sql.
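
Concretely, the move amounts to the following relocations (new paths taken from the diffs below; a sketch of the equivalent git operations, while the commit itself also updates every build file that referenced the old paths):

```bash
git mv connector/connect/client/jvm sql/connect/client/jvm
git mv connector/connect/bin        sql/connect/bin
git mv connector/connect/docs       sql/connect/docs
```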

### Why are the changes needed?
Connect is now part of the sql project, so it no longer makes sense to keep these directories separate.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.
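
For reference, the relocated modules still run through the standard Maven test path; this is the invocation CI uses after the change (from the .github/workflows/maven_test.yml diff below, with environment-specific flags omitted):

```bash
./build/mvn -pl sql/connect/client/jvm,sql/connect/common,sql/connect/server test -fae
```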

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49695 from hvanhovell/SPARK-49428.

Authored-by: Herman van Hovell <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
hvanhovell committed Jan 31, 2025
1 parent ecf6851 commit ece1470
Showing 69 changed files with 21 additions and 25 deletions.
1 change: 0 additions & 1 deletion .github/labeler.yml
@@ -223,7 +223,6 @@ CONNECT:
- changed-files:
- any-glob-to-any-file: [
'sql/connect/**/*',
'connector/connect/**/*',
'python/**/connect/**/*'
]

2 changes: 1 addition & 1 deletion .github/workflows/maven_test.yml
@@ -194,7 +194,7 @@ jobs:
if [[ "$INCLUDED_TAGS" != "" ]]; then
./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pjvm-profiler -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} -Dtest.include.tags="$INCLUDED_TAGS" test -fae
elif [[ "$MODULES_TO_TEST" == "connect" ]]; then
./build/mvn $MAVEN_CLI_OPTS -Dtest.exclude.tags="$EXCLUDED_TAGS" -Djava.version=${JAVA_VERSION/-ea} -pl connector/connect/client/jvm,sql/connect/common,sql/connect/server test -fae
./build/mvn $MAVEN_CLI_OPTS -Dtest.exclude.tags="$EXCLUDED_TAGS" -Djava.version=${JAVA_VERSION/-ea} -pl sql/connect/client/jvm,sql/connect/common,sql/connect/server test -fae
elif [[ "$EXCLUDED_TAGS" != "" ]]; then
./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pjvm-profiler -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} -Dtest.exclude.tags="$EXCLUDED_TAGS" test -fae
elif [[ "$MODULES_TO_TEST" == *"sql#hive-thriftserver"* ]]; then
4 changes: 2 additions & 2 deletions assembly/pom.xml
@@ -192,7 +192,7 @@
<executable>cp</executable>
<arguments>
<argument>-r</argument>
<argument>${basedir}/../connector/connect/client/jvm/target/connect-repl</argument>
<argument>${basedir}/../sql/connect/client/jvm/target/connect-repl</argument>
<argument>${basedir}/target/scala-${scala.binary.version}/jars/</argument>
</arguments>
</configuration>
@@ -206,7 +206,7 @@
<configuration>
<executable>cp</executable>
<arguments>
<argument>${basedir}/../connector/connect/client/jvm/target/spark-connect-client-jvm_${scala.binary.version}-${project.version}.jar</argument>
<argument>${basedir}/../sql/connect/client/jvm/target/spark-connect-client-jvm_${scala.binary.version}-${project.version}.jar</argument>
<argument>${basedir}/target/scala-${scala.binary.version}/jars/connect-repl</argument>
</arguments>
</configuration>
6 changes: 3 additions & 3 deletions dev/lint-scala
@@ -34,14 +34,14 @@ ERRORS=$(./build/mvn \
-pl sql/api \
-pl sql/connect/common \
-pl sql/connect/server \
-pl connector/connect/client/jvm \
-pl sql/connect/client/jvm \
2>&1 | grep -e "Unformatted files found" \
)

if test ! -z "$ERRORS"; then
echo -e "The scalafmt check failed on sql/connect or connector/connect at following occurrences:\n\n$ERRORS\n"
echo -e "The scalafmt check failed on sql/connect or sql/connect at following occurrences:\n\n$ERRORS\n"
echo "Before submitting your change, please make sure to format your code using the following command:"
echo "./build/mvn scalafmt:format -Dscalafmt.skip=false -Dscalafmt.validateOnly=false -Dscalafmt.changedOnly=false -pl sql/api -pl sql/connect/common -pl sql/connect/server -pl connector/connect/client/jvm"
echo "./build/mvn scalafmt:format -Dscalafmt.skip=false -Dscalafmt.validateOnly=false -Dscalafmt.changedOnly=false -pl sql/api -pl sql/connect/common -pl sql/connect/server -pl sql/connect/client/jvm"
exit 1
else
echo -e "Scalafmt checks passed."
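
After the move, the same two commands check and fix formatting, now pointing at sql/connect/client/jvm:

```bash
# Run the scalafmt check across the Connect modules
./dev/lint-scala

# Reformat in place (the command the script prints on failure)
./build/mvn scalafmt:format -Dscalafmt.skip=false -Dscalafmt.validateOnly=false \
  -Dscalafmt.changedOnly=false \
  -pl sql/api -pl sql/connect/common -pl sql/connect/server -pl sql/connect/client/jvm
```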
2 changes: 1 addition & 1 deletion dev/protobuf-breaking-changes-check.sh
@@ -35,7 +35,7 @@ fi

pushd sql/connect/common/src/main &&
echo "Start protobuf breaking changes checking against $BRANCH" &&
buf breaking --against "https://github.com/apache/spark.git#branch=$BRANCH,subdir=connector/connect/common/src/main" &&
buf breaking --against "https://github.com/apache/spark.git#branch=$BRANCH,subdir=sql/connect/common/src/main" &&
echo "Finsh protobuf breaking changes checking: SUCCESS"

if [[ $? -ne 0 ]]; then
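
A hedged usage sketch, assuming the script's argument handling (outside this hunk) takes the baseline branch as its first argument:

```bash
./dev/protobuf-breaking-changes-check.sh master
```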
1 change: 0 additions & 1 deletion dev/sparktestsupport/modules.py
@@ -334,7 +334,6 @@ def __hash__(self):
dependencies=[hive, avro, protobuf],
source_file_regexes=[
"sql/connect",
"connector/connect",
],
sbt_test_goals=[
"connect/test",
4 changes: 2 additions & 2 deletions docs/_plugins/build_api_docs.rb
@@ -149,11 +149,11 @@ def build_scala_and_java_docs
# Copy over the unified ScalaDoc for all projects to api/scala.
# This directory will be copied over to _site when `jekyll` command is run.
copy_and_update_scala_docs("../target/scala-2.13/unidoc", "api/scala")
# copy_and_update_scala_docs("../connector/connect/client/jvm/target/scala-2.13/unidoc", "api/connect/scala")
# copy_and_update_scala_docs("../sql/connect/client/jvm/target/scala-2.13/unidoc", "api/connect/scala")

# Copy over the unified JavaDoc for all projects to api/java.
copy_and_update_java_docs("../target/javaunidoc", "api/java", "api/scala")
# copy_and_update_java_docs("../connector/connect/client/jvm/target/javaunidoc", "api/connect/java", "api/connect/scala")
# copy_and_update_java_docs("../sql/connect/client/jvm/target/javaunidoc", "api/connect/java", "api/connect/scala")
end

def build_python_docs
2 changes: 1 addition & 1 deletion docs/spark-connect-overview.md
@@ -247,7 +247,7 @@ res0: Long = 10L

By default, the REPL will attempt to connect to a local Spark Server on port 15002.
The connection, however, may be configured in several ways as described in this configuration
[reference](https://github.com/apache/spark/blob/master/connector/connect/docs/client-connection-string.md).
[reference](https://github.com/apache/spark/blob/master/sql/connect/docs/client-connection-string.md).

#### Set SPARK_REMOTE environment variable

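
As a sketch of the configuration the linked reference describes, assuming the standard sc:// connection-string scheme and a hypothetical host:

```bash
# Override the default localhost:15002 endpoint before starting the REPL
export SPARK_REMOTE="sc://my-connect-host:15002"
```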
2 changes: 1 addition & 1 deletion pom.xml
@@ -97,6 +97,7 @@
<module>sql/hive</module>
<module>sql/connect/server</module>
<module>sql/connect/common</module>
<module>sql/connect/client/jvm</module>
<module>assembly</module>
<module>examples</module>
<module>repl</module>
@@ -106,7 +107,6 @@
<module>connector/kafka-0-10-assembly</module>
<module>connector/kafka-0-10-sql</module>
<module>connector/avro</module>
<module>connector/connect/client/jvm</module>
<module>connector/protobuf</module>
<!-- See additional modules enabled by profiles below -->
</modules>
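
With the client now registered under sql/, Maven can build it in isolation; a minimal sketch (-am also builds the upstream modules it depends on):

```bash
./build/mvn -pl sql/connect/client/jvm -am -DskipTests package
```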
2 changes: 1 addition & 1 deletion project/SparkBuild.scala
@@ -1573,7 +1573,7 @@ object CopyDependencies {
Files.createDirectories(destDir)

val sourceAssemblyJar = Paths.get(
BuildCommons.sparkHome.getAbsolutePath, "connector", "connect", "client",
BuildCommons.sparkHome.getAbsolutePath, "sql", "connect", "client",
"jvm", "target", s"scala-$scalaBinaryVer", s"spark-connect-client-jvm-assembly-$sparkVer.jar")
val destAssemblyJar = Paths.get(destDir.toString, s"spark-connect-client-jvm-assembly-$sparkVer.jar")
Files.copy(sourceAssemblyJar, destAssemblyJar, StandardCopyOption.REPLACE_EXISTING)
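
The sbt build resolves the assembly jar by filesystem path as shown above, while tests are addressed by project id; a sketch, assuming the project name connect-client-jvm is unchanged by the move:

```bash
./build/sbt "connect-client-jvm/test"
```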
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -21,7 +21,7 @@
#
# Start a local server:
# A local spark-connect server with default settings can be started using the following command:
# `connector/connect/bin/spark-connect`
# `sql/connect/bin/spark-connect`
# The client should be able to connect to this server directly with the default client settings.
#
# Connect to a remote server:
@@ -49,7 +49,7 @@ if [ "$SCBUILD" -eq "1" ]; then
fi

if [ -z "$SCCLASSPATH" ]; then
SCCLASSPATH=$(connector/connect/bin/spark-connect-scala-client-classpath)
SCCLASSPATH=$(sql/connect/bin/spark-connect-scala-client-classpath)
fi

JVM_ARGS="-XX:+IgnoreUnrecognizedVMOptions \
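
Putting the relocated scripts together, a minimal local round trip might look like this (a sketch; spark-connect-scala-client is the launcher these helpers back):

```bash
# Terminal 1: start a local Spark Connect server with default settings
sql/connect/bin/spark-connect

# Terminal 2: launch the Scala client REPL against it
sql/connect/bin/spark-connect-scala-client
```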
File renamed without changes.
File renamed without changes.
@@ -173,8 +173,7 @@ class ReplE2ESuite extends ConnectFunSuite with RemoteSparkSession with BeforeAn
// scalastyle:off classforname line.size.limit
val sparkHome = IntegrationTestUtils.sparkHome
val testJar = Paths
.get(
s"$sparkHome/connector/connect/client/jvm/src/test/resources/TestHelloV2_$scalaVersion.jar")
.get(s"$sparkHome/sql/connect/client/jvm/src/test/resources/TestHelloV2_$scalaVersion.jar")
.toFile

assert(testJar.exists(), "Missing TestHelloV2 jar!")
@@ -64,7 +64,7 @@ object CheckConnectJvmClientCompatibility {
private val clientJar = {
val path = Paths.get(
sparkHome,
"connector",
"sql",
"connect",
"client",
"jvm",
@@ -53,7 +53,7 @@ object IntegrationTestUtils {
sys.props.getOrElse("spark.test.home", sys.env("SPARK_HOME"))
}

private[sql] lazy val connectClientHomeDir = s"$sparkHome/connector/connect/client/jvm"
private[sql] lazy val connectClientHomeDir = s"$sparkHome/sql/connect/client/jvm"

private[sql] lazy val connectClientTestClassDir = {
s"$connectClientHomeDir/target/$scalaDir/test-classes"
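
For local runs of these integration tests, the Spark home resolved above can be supplied either way; a sketch with a hypothetical checkout path (project id as hedged above):

```bash
# Picked up via sys.env("SPARK_HOME") ...
export SPARK_HOME=/path/to/spark
# ... or via the spark.test.home system property on the test JVM
./build/sbt -Dspark.test.home=/path/to/spark "connect-client-jvm/test"
```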
@@ -23,17 +23,16 @@ import java.util.concurrent.TimeUnit

import scala.concurrent.duration.FiniteDuration

import IntegrationTestUtils._
import org.scalatest.{BeforeAndAfterAll, Suite}
import org.scalatest.concurrent.Eventually.eventually
import org.scalatest.concurrent.Futures.timeout
import org.scalatest.time.SpanSugar._

import org.apache.spark.SparkBuildInfo
import org.apache.spark.sql.connect.SparkSession
import org.apache.spark.sql.connect.client.RetryPolicy
import org.apache.spark.sql.connect.client.SparkConnectClient
import org.apache.spark.sql.connect.client.{RetryPolicy, SparkConnectClient}
import org.apache.spark.sql.connect.common.config.ConnectCommon
import org.apache.spark.sql.connect.test.IntegrationTestUtils._
import org.apache.spark.util.ArrayImplicits._

/**
2 changes: 1 addition & 1 deletion sql/connect/common/README.md
@@ -1,5 +1,5 @@
Spark Common
============

See [Spark Connect Client](https://github.com/apache/spark/tree/master/connector/connect) directory
See [Spark Connect Client](https://github.com/apache/spark/tree/master/sql/connect) directory
for more information and scripts for development.
File renamed without changes.
2 changes: 1 addition & 1 deletion sql/connect/server/README.md
@@ -1,5 +1,5 @@
Spark Connect Server
====================

See [Spark Connect Client](https://github.com/apache/spark/tree/master/connector/connect) directory
See [Spark Connect Client](https://github.com/apache/spark/tree/master/sql/connect) directory
for more information and scripts for development.
