Metaclient: deprecate Hadoop 2 (#6399)
johnnyaug authored Aug 15, 2023
1 parent c5be8c1 commit 6919547
Showing 6 changed files with 35 additions and 28 deletions.
11 changes: 7 additions & 4 deletions .github/workflows/esti.yaml
@@ -645,11 +645,14 @@ jobs:
       matrix:
         spark:
           - project-variable: core3
-            project-suffix: 301
+            project-suffix: "-301"
             artifact-suffix: ""
           - project-variable: core312
-            project-suffix: 312-hadoop3
+            project-suffix: "-312-hadoop3"
             artifact-suffix: "-hadoop3"
+          - project-variable: core
+            project-suffix: ""
+            artifact-suffix: ""
     env:
       TAG: ${{ needs.deploy-image.outputs.tag }}
       REPO: ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.us-east-1.amazonaws.com
@@ -674,7 +677,7 @@ jobs:
         if: steps.restore-cache.outputs.cache-hit != 'true'
         working-directory: clients/spark
         run: |
-          sbt 'set ${{ matrix.spark.project-variable }} / assembly / test := {}' lakefs-spark-client-${{ matrix.spark.project-suffix }}/assembly
+          sbt 'set ${{ matrix.spark.project-variable }} / assembly / test := {}' lakefs-spark-client${{ matrix.spark.project-suffix }}/assembly
       - name: Prepare Metaclient location for export
         if: steps.restore-cache.outputs.cache-hit != 'true'
@@ -684,7 +687,7 @@
         working-directory: clients/spark
         run: |
           mkdir -p ${{ github.workspace }}/test/spark/metaclient
-          cp target/core-${{ matrix.spark.project-suffix }}/scala-2.12/lakefs-spark-client-${{ matrix.spark.project-suffix }}-assembly*.jar ${{ github.workspace }}/test/spark/metaclient/spark-assembly${{ matrix.spark.artifact-suffix }}.jar
+          cp target/core${{ matrix.spark.project-suffix }}/scala-2.12/lakefs-spark-client${{ matrix.spark.project-suffix }}-assembly*.jar ${{ github.workspace }}/test/spark/metaclient/spark-assembly${{ matrix.spark.artifact-suffix }}.jar

   metadata-client-export-spark3:
     name: Test lakeFS metadata client export with Spark 3.x
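For illustration, here is roughly what those templated steps resolve to for the new `core` matrix entry (empty project and artifact suffixes). This is a sketch, not part of the diff; `$GITHUB_WORKSPACE` stands in for `${{ github.workspace }}`:

```
# Hypothetical expansion of the build-and-copy steps for the `core` entry
sbt 'set core / assembly / test := {}' lakefs-spark-client/assembly
mkdir -p "$GITHUB_WORKSPACE/test/spark/metaclient"
cp target/core/scala-2.12/lakefs-spark-client-assembly*.jar \
  "$GITHUB_WORKSPACE/test/spark/metaclient/spark-assembly.jar"
```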
12 changes: 6 additions & 6 deletions clients/spark/README.md
@@ -16,23 +16,23 @@ The Uber-Jar can be found on a public S3 location:
 
 It should be used when running into conflicting dependencies on environments like EMR, Databricks, etc.
 
-For Spark 3.1.2+:
+For Spark for Hadoop 3:
 http://treeverse-clients-us-east.s3-website-us-east-1.amazonaws.com/lakefs-spark-client-312-hadoop3/${CLIENT_VERSION}/lakefs-spark-client-312-hadoop3-assembly-${CLIENT_VERSION}.jar
 
-For Spark 3.0.1:
+For Spark for Hadoop 2 (deprecated):
 http://treeverse-clients-us-east.s3-website-us-east-1.amazonaws.com/lakefs-spark-client-301/${CLIENT_VERSION}/lakefs-spark-client-301-assembly-${CLIENT_VERSION}.jar
 
 
 ### Maven
 Otherwise, the client can be included using Maven coordinates:
 
-For Spark 3.1.2+:
+For Spark for Hadoop 3:
 ```
 io.lakefs:lakefs-spark-client-312-hadoop3_2.12:<version>
 ```
 [See available versions](https://mvnrepository.com/artifact/io.lakefs/lakefs-spark-client-312-hadoop3_2.12).
 
-For Spark 3.0.1:
+For Spark for Hadoop 2 (deprecated):
 ```
 io.lakefs:lakefs-spark-client-301_2.12:<version>
 ```
@@ -41,7 +41,7 @@ io.lakefs:lakefs-spark-client-301_2.12:<version>
 
 ## Usage Examples
 ### Export using spark-submit
 
-Replace `<version>` below with the latest version available. See available versions for [Spark 3.1.2+](https://mvnrepository.com/artifact/io.lakefs/lakefs-spark-client-312-hadoop3_2.12) or [Spark 3.0.1](https://mvnrepository.com/artifact/io.lakefs/lakefs-spark-client-301_2.12).
+Replace `<version>` below with the latest version available. See available versions for [Spark for Hadoop 3](https://mvnrepository.com/artifact/io.lakefs/lakefs-spark-client-312-hadoop3_2.12) or [Spark for Hadoop 2](https://mvnrepository.com/artifact/io.lakefs/lakefs-spark-client-301_2.12) (deprecated).
 
 ```
 CLIENT_VERSION=0.8.1
@@ -60,7 +60,7 @@ spark-submit --conf spark.hadoop.lakefs.api.url=https://lakefs.example.com/api/v
 
 ### Export using spark-submit (uber-jar)
 
-Replace `<version>` below with the latest version available. See available versions for [Spark 3.1.2+](https://mvnrepository.com/artifact/io.lakefs/lakefs-spark-client-312-hadoop3_2.12) or [Spark 3.0.1](https://mvnrepository.com/artifact/io.lakefs/lakefs-spark-client-301_2.12).
+Replace `<version>` below with the latest version available. See available versions for [Spark for Hadoop 3](https://mvnrepository.com/artifact/io.lakefs/lakefs-spark-client-312-hadoop3_2.12) or [Spark for Hadoop 2](https://mvnrepository.com/artifact/io.lakefs/lakefs-spark-client-301_2.12) (deprecated).
 ```
 CLIENT_VERSION=0.8.1
 SPARK_VERSION=301 # or 312-hadoop3
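For context, the Hadoop 3 uber-jar above can be fetched directly from the S3 website URL template in this README; a sketch, with the version pinned purely for illustration:

```
# Sketch: download the Hadoop 3 uber-jar using the README's URL template
CLIENT_VERSION=0.8.1   # illustrative; use the latest published client version
curl -O "http://treeverse-clients-us-east.s3-website-us-east-1.amazonaws.com/lakefs-spark-client-312-hadoop3/${CLIENT_VERSION}/lakefs-spark-client-312-hadoop3-assembly-${CLIENT_VERSION}.jar"
```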
29 changes: 17 additions & 12 deletions clients/spark/build.sbt
@@ -2,6 +2,7 @@ import build.BuildType
 
 lazy val baseName = "lakefs-spark"
 lazy val projectVersion = "0.9.1"
+lazy val hadoopVersion = "3.2.1"
 
 ThisBuild / isSnapshot := false
 ThisBuild / scalaVersion := "2.12.12"
@@ -24,8 +25,8 @@ def settingsToCompileIn(dir: String, flavour: String = "") = {
   allSettings ++ flavourSettings
 }
 
-def generateCoreProject(buildType: BuildType) =
-  Project(s"${baseName}-client-${buildType.name}", file(s"core"))
+def generateCoreProject(buildType: BuildType) = {
+  Project(s"${baseName}-client${buildType.suffix}", file("core"))
     .settings(
       sharedSettings,
       if (buildType.hadoopFlavour == "hadoop2") hadoop2ShadingSettings
@@ -51,14 +52,14 @@
 
       // Uncomment to get (very) full stacktraces in test:
       // Test / testOptions += Tests.Argument("-oF"),
-      target := file(s"target/core-${buildType.name}/"),
+      target := file(s"target/core${buildType.suffix}/"),
       buildInfoKeys := Seq[BuildInfoKey](name, version, scalaVersion, sbtVersion),
       buildInfoPackage := "io.treeverse.clients"
     )
     .enablePlugins(S3Plugin, BuildInfoPlugin)
-
+}
 def generateExamplesProject(buildType: BuildType) =
-  Project(s"${baseName}-examples-${buildType.name}", file(s"examples"))
+  Project(s"${baseName}-examples${buildType.suffix}", file(s"examples"))
     .settings(
       sharedSettings,
       settingsToCompileIn("examples", buildType.hadoopFlavour),
@@ -70,32 +71,36 @@ def generateExamplesProject(buildType: BuildType) =
 
       "com.amazonaws" % "aws-java-sdk-bundle" % "1.12.194"
     ),
     assembly / mainClass := Some("io.treeverse.examples.List"),
-    target := file(s"target/examples-${buildType.name}/"),
+    target := file(s"target/examples${buildType.suffix}/"),
     run / fork := false // https://stackoverflow.com/questions/44298847/sbt-spark-fork-in-run
   )
 
 lazy val spark3Type =
-  new BuildType("301", "3.0.1", "0.10.11", "2.7.7", "hadoop2", "hadoop2-2.0.1")
+  new BuildType("-301", "3.0.1", "0.10.11", "hadoop2", "hadoop2-2.0.1")
 
 // EMR-6.5.0 beta, managed GC
 lazy val spark312Type =
-  new BuildType("312-hadoop3", "3.1.2", "0.10.11", "3.2.1", "hadoop3", "hadoop3-2.0.1")
+  new BuildType("-312-hadoop3", "3.1.2", "0.10.11", "hadoop3", "hadoop3-2.0.1")
 
+lazy val coreType =
+  new BuildType("", "3.1.2", "0.10.11", "hadoop3", "hadoop3-2.0.1")
+lazy val core = generateCoreProject(coreType)
 lazy val core3 = generateCoreProject(spark3Type)
 lazy val core312 = generateCoreProject(spark312Type)
 lazy val examples3 = generateExamplesProject(spark3Type).dependsOn(core3)
 lazy val examples312 = generateExamplesProject(spark312Type).dependsOn(core312)
 
 lazy val root =
-  (project in file(".")).aggregate(core3, core312, examples3, examples312)
+  (project in file(".")).aggregate(core, core3, core312, examples3, examples312)
 
 def getSharedLibraryDependencies(buildType: BuildType): Seq[ModuleID] = {
   Seq(
     "io.lakefs" % "api-client" % "0.91.0",
     "org.apache.spark" %% "spark-sql" % buildType.sparkVersion % "provided",
     "com.thesamet.scalapb" %% "scalapb-runtime" % scalapb.compiler.Version.scalapbVersion % "protobuf",
-    "org.apache.hadoop" % "hadoop-aws" % buildType.hadoopVersion % "provided",
-    "org.apache.hadoop" % "hadoop-common" % buildType.hadoopVersion % "provided",
-    "org.apache.hadoop" % "hadoop-azure" % buildType.hadoopVersion % "provided",
+    "org.apache.hadoop" % "hadoop-aws" % hadoopVersion % "provided",
+    "org.apache.hadoop" % "hadoop-common" % hadoopVersion % "provided",
+    "org.apache.hadoop" % "hadoop-azure" % hadoopVersion % "provided",
     "org.json4s" %% "json4s-native" % "3.6.12",
     "org.rogach" %% "scallop" % "4.0.3",
     "com.azure" % "azure-core" % "1.10.0",
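With these changes, an unsuffixed `core` project (Hadoop 3, Spark 3.1.2) joins the aggregate alongside the suffixed flavours. As a sketch, assuming the project names produced by `generateCoreProject` above, each client could be assembled locally with:

```
# Sketch: assemble each flavour via its generated sbt project name
sbt lakefs-spark-client/assembly              # coreType, suffix ""
sbt lakefs-spark-client-312-hadoop3/assembly  # spark312Type, suffix "-312-hadoop3"
sbt lakefs-spark-client-301/assembly          # spark3Type, suffix "-301" (Hadoop 2, deprecated)
```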
3 changes: 1 addition & 2 deletions clients/spark/project/types.scala
@@ -1,10 +1,9 @@
 package build
 
 class BuildType(
-    val name: String,
+    val suffix: String,
     val sparkVersion: String,
     val scalapbVersion: String,
-    val hadoopVersion: String,
     val hadoopFlavour: String, // If set, a directory of additional sources to compile
     val gcpConnectorVersion: String
 )
4 changes: 2 additions & 2 deletions docs/howto/garbage-collection/index.md
@@ -102,8 +102,8 @@ To run the job, use the following `spark-submit` command (or using your preferre
 
 <div class="tabs">
 <ul>
-  <li><a href="#aws-option">On AWS (Spark 3.1.2 and higher)</a></li>
-  <li><a href="#aws-301-option">On AWS (Spark 3.0.1)</a></li>
+  <li><a href="#aws-option">On AWS</a></li>
+  <li><a href="#aws-301-option">On AWS (Hadoop 2 - deprecated)</a></li>
   <li><a href="#azure-option">On Azure</a></li>
   <li><a href="#gcp-option">On GCP</a></li>
 </ul>
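As a rough usage sketch for the tabs above (abridged; the entry-point class and trailing arguments here are assumptions, not taken from this diff):

```
# Sketch only: shape of a GC spark-submit invocation; class name and args are placeholders
spark-submit --class io.treeverse.clients.GarbageCollector \
  --packages io.lakefs:lakefs-spark-client-312-hadoop3_2.12:<version> \
  --conf spark.hadoop.lakefs.api.url=https://lakefs.example.com/api/v1 \
  <repository> <region>
```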
4 changes: 2 additions & 2 deletions docs/reference/spark-client.md
@@ -25,8 +25,8 @@ Start Spark Shell / PySpark with the `--packages` flag:
 
 <div class="tabs">
 <ul>
-  <li><a href="#packages-3-hadoop2">Spark 3.x</a></li>
-  <li><a href="#packages-3-hadoop3">Spark 3.x on Hadoop 3.x</a></li>
+  <li><a href="#packages-3-hadoop2">Spark 3.x for Hadoop 2 (deprecated)</a></li>
+  <li><a href="#packages-3-hadoop3">Spark 3.x for Hadoop 3</a></li>
 </ul>
 <div markdown="1" id="packages-3-hadoop2">
 This client is compiled for Spark 3.0.1 with Hadoop 2 and tested with it, but can work for
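As a usage sketch for the tabs above, launching a shell with the Hadoop 3 flavour of the client via `--packages` (with `<version>` as listed on Maven Central):

```
spark-shell --packages io.lakefs:lakefs-spark-client-312-hadoop3_2.12:<version>
```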
