diff --git a/rpc/spark_erdos_setup.md b/rpc/spark_erdos_setup.md
index c77d2b31..0a6775fc 100644
--- a/rpc/spark_erdos_setup.md
+++ b/rpc/spark_erdos_setup.md
@@ -36,7 +36,7 @@ make
 Running `./dbgen` above creates a dataset of scale factor `s` of `1` (default) i.e. 1GB.
 
-> NOTE: Had updated the scala version to 2.13.0 in tpch.sbt
+> NOTE: The Scala version in `tpch.sbt` was updated to 2.13.0. The sbt version used was `1.9.7`.
 
 Next, we build the target for `tpch-spark`:
 
 ```bash
@@ -211,6 +211,9 @@ The above job submission is parameterized by `(DEADLINE, QUERY_NUM, DATASET_SIZE
 `(120, 4, 50, 50)`.
 
 > Refer to `launch_expt_script.py` in `tpch-spark` for more details on eligible values for these parameters and how they are used.
+> NOTE: If unset, the `TPCH_INPUT_DATA_DIR` environment variable defaults to looking for `dbgen` in the current working directory. This works when
+> `spark-submit` is issued from inside the `tpch-spark` repository; otherwise, the variable must be set explicitly.
+
 Once submitted, review the application's runtime status on the Spark Web UI.
 
 ### Shutdown cluster
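
The added note about `TPCH_INPUT_DATA_DIR` can be illustrated with a minimal sketch; the path below is hypothetical and only shows the shape of an explicit override before submitting from outside the `tpch-spark` checkout:

```shell
# Sketch: when spark-submit runs outside the tpch-spark repository,
# point TPCH_INPUT_DATA_DIR at the dbgen output directory explicitly.
# The path below is illustrative, not from the repository docs.
export TPCH_INPUT_DATA_DIR="$HOME/tpch-spark/dbgen"

# Confirm what the job will read before submitting.
echo "Using TPC-H input data from: $TPCH_INPUT_DATA_DIR"
```

When `spark-submit` is issued from inside the repository itself, this export can be omitted and the default (current working directory) lookup applies.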