[docs] Replace examples of Hadoop catalog with JDBC & REST catalog #11845
base: main
Conversation
Force-pushed from 00ca569 to 6fe50e1.
Force-pushed from 496e51e to 63c9a1a.
Note: there are two "getting started" docs, this one and `site/docs/spark-quickstart.md`.
@@ -269,42 +273,104 @@ To read a table, simply use the Iceberg table's name.

### Adding A Catalog

Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue.
Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`. In this guide,
we use JDBC, but you can follow these instructions to configure other catalog types. To learn more, check out
Weird that the guide already mentions JDBC here, but the example is still Hadoop.
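To make the property pattern in that paragraph concrete, here is a minimal sketch of how any catalog gets wired up; the name `my_catalog` and the thrift URI are hypothetical, and `type=hive` is just one of the supported back-ends:

```sh
# spark.sql.catalog.<name> selects the catalog implementation class;
# spark.sql.catalog.<name>.* keys configure that catalog instance.
# "my_catalog" and the URI below are made-up examples.
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }} \
    --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.my_catalog.type=hive \
    --conf spark.sql.catalog.my_catalog.uri=thrift://localhost:9083
```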
site/docs/spark-quickstart.md (outdated):

- [Configuring JDBC Catalog](#configuring-jdbc-catalog)
- [Configuring REST Catalog](#configuring-rest-catalog)
- [Next steps](#next-steps)
- [Adding Iceberg to Spark](#adding-iceberg-to-spark)
- [Learn More](#learn-more)
```sh
--conf spark.sql.catalog.local.warehouse=$PWD/warehouse
```

For an example of configuring a REST-based catalog, see [Configuring REST Catalog](/spark-quickstart#configuring-rest-catalog).
Instead of repeating the REST catalog configuration here, just link to `site/docs/spark-quickstart.md`. I double-checked the link locally.
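For reference, a minimal sketch of what that linked REST section configures; the catalog name `rest` is arbitrary and the `http://localhost:8181` endpoint is a placeholder for whichever REST catalog service you run:

```sh
# type=rest selects Iceberg's REST catalog implementation.
# The endpoint is a placeholder; point it at your REST catalog service.
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }} \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.rest.type=rest \
    --conf spark.sql.catalog.rest.uri=http://localhost:8181 \
    --conf spark.sql.defaultCatalog=rest
```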
    --conf spark.sql.catalog.local.type=jdbc \
    --conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
    --conf spark.sql.defaultCatalog=local
Add `defaultCatalog` to match other pages.
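Putting the flags together, a sketch of the full CLI invocation they belong to, assembled from the diffs in this thread (not verbatim from the final docs):

```sh
# JDBC catalog backed by a local SQLite file; flags mirror the PR's diffs.
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=jdbc \
    --conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
    --conf spark.sql.defaultCatalog=local
```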
    spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
    spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
    spark.sql.catalog.spark_catalog.type hive
    spark.sql.catalog.local org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.local.type hadoop
    spark.sql.catalog.local.warehouse $PWD/warehouse
`$PWD` does not expand in `spark-defaults.conf`; keeping it here will create a folder literally named `$PWD`.
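A sketch of what the corrected `spark-defaults.conf` could look like, using the JDBC settings from this PR; `/tmp/iceberg` is a made-up absolute path, substitute your own:

```
# spark-defaults.conf does not expand shell variables, so use absolute paths.
spark.sql.extensions                  org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.spark_catalog       org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type  hive
spark.sql.catalog.local               org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type          jdbc
# /tmp/iceberg is a hypothetical location.
spark.sql.catalog.local.uri           jdbc:sqlite:/tmp/iceberg/iceberg_catalog_db.sqlite
spark.sql.catalog.local.warehouse     /tmp/iceberg/warehouse
spark.sql.defaultCatalog              local
```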
site/docs/spark-quickstart.md (outdated):
=== "CLI" | ||
|
||
```sh | ||
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\ | ||
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \ |
Taking on this extra dependency since I don't see any Iceberg-specific package I can use. There is a `hive-jdbc` package.
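For what it's worth, any JDBC driver on the classpath should work here, not just SQLite; a hypothetical PostgreSQL variant (driver coordinates, connection URI, and credentials are all made up for illustration):

```sh
# Swap the SQLite driver for PostgreSQL; the coordinates, URI, and
# credentials below are hypothetical examples, not from this PR.
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.postgresql:postgresql:42.7.3 \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=jdbc \
    --conf spark.sql.catalog.local.uri=jdbc:postgresql://localhost:5432/iceberg_catalog \
    --conf spark.sql.catalog.local.jdbc.user=iceberg \
    --conf spark.sql.catalog.local.jdbc.password=changeme \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
    --conf spark.sql.defaultCatalog=local
```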
LGTM 👍 Thanks for improving this!
Old: This command creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog:
New: This command creates a JDBC-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
Suggested change:

> This command creates a JDBC-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in `spark_catalog` using the Hive connector.
Maybe something like this:

> This command creates a JDBC-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog (`spark_catalog`) using the Hive connector.
Is it the "Hive connector" or the "Hive Metastore"? But I'm also inclined not to add this; I feel like it's too detailed for a "getting started" page.
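To illustrate what the sentence describes, a hypothetical pair of queries (namespace and table names are made up); `local` is the separate JDBC catalog, while `SparkSessionCatalog` lets Iceberg tables live alongside non-Iceberg ones in the built-in `spark_catalog`:

```sql
-- Hypothetical names, for illustration only.
-- A table tracked by the JDBC catalog:
SELECT * FROM local.db.events;
-- An Iceberg table in Spark's built-in catalog, resolved via SparkSessionCatalog:
SELECT * FROM spark_catalog.default.events;
```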
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
Closes #11284
devlist discussion
This PR replaces examples of the Hadoop catalog with examples of the JDBC catalog and adds examples of setting up a REST catalog.
Testing

- `spark-quickstart.md` using JDBC catalog
  - Using `spark-sql` CLI config
  - Using `spark-defaults.conf` file
- `spark-quickstart.md` using REST catalog
  - With `spark-sql` CLI config
  - With `spark-defaults.conf` file
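A minimal smoke test of the kind presumably run against each configuration above (namespace and table names are hypothetical):

```sql
-- Hypothetical smoke test; run in a spark-sql shell started with either
-- the JDBC or the REST catalog configuration (defaultCatalog set).
CREATE NAMESPACE IF NOT EXISTS db;
CREATE TABLE db.names (name STRING) USING iceberg;
INSERT INTO db.names VALUES ('sample');
SELECT * FROM db.names;
DROP TABLE db.names;
```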
Rendered Docs

- `site/docs/spark-quickstart.md` (http://127.0.0.1:8000/spark-quickstart/#adding-catalogs)
- `docs/docs/spark-getting-started.md` (http://127.0.0.1:8000/docs/nightly/spark-getting-started/#adding-catalogs)
- `site/docs/how-to-release.md` (http://127.0.0.1:8000/how-to-release/#verifying-with-spark)