TWSG Data Engineering Bootcamp

If you encounter any problems during setup or during the labs, check our troubleshooting guide (troubleshooting-faq.md).

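To run a bash shell in the Hadoop container (the local data directory is mounted at /usr/local/data/ inside it), run: docker run -v "$(pwd)/data":/usr/local/data/ -it sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash
Inside the container, change to the Hadoop installation directory: cd $HADOOP_PREFIX
Then execute the commands listed in the lab.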

Define SPARK_HOME (run this from the directory that contains spark-2.3.1-bin-hadoop2.7): export SPARK_HOME=$(pwd)/spark-2.3.1-bin-hadoop2.7
Activate the virtual environment: source .venv_data_eng_bootcamp/bin/activate
To start Spark in the PySpark shell, run: $SPARK_HOME/bin/pyspark --master local
To start Spark in a Jupyter notebook, run: PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook $SPARK_HOME/bin/pyspark --master local
To deactivate the virtual environment, run: deactivate
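Before launching pyspark, it can help to confirm that SPARK_HOME really points at the unpacked Spark distribution. The following is a minimal bash sketch, not part of the bootcamp's own scripts; the function name check_spark_home is hypothetical, and it only checks for an executable bin/pyspark under the given directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of this repository's scripts).
# Usage: check_spark_home "$SPARK_HOME"
# Prints "OK: <dir>" and returns 0 if <dir>/bin/pyspark exists and is executable;
# otherwise prints a message to stderr and returns 1.
check_spark_home() {
  local dir="$1"
  # An empty argument usually means SPARK_HOME was never exported.
  if [ -z "$dir" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # The pyspark launcher lives in bin/ of a Spark binary distribution.
  if [ ! -x "$dir/bin/pyspark" ]; then
    echo "No executable bin/pyspark under: $dir" >&2
    return 1
  fi
  echo "OK: $dir"
}
```

For example, check_spark_home "$SPARK_HOME" && $SPARK_HOME/bin/pyspark --master local only starts the shell when the check passes.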
