Welcome to the Hours With Experts Labs Repo

This repository will be the central location for the hands-on programming component of the course.

Course Work Overview

The goal of the course is to build an end-to-end data pipeline that processes Amazon reviews.

You will construct the data pipeline week by week, as outlined below:

Week 1 - Environment Setup - configure your environment to begin the programming coursework
Week 2 - Spark SQL - write a Python Spark application to analyze local Amazon review data
Week 3 - Write to Amazon S3 - extend the application to connect to Amazon S3 and write data to it
Week 4 - Kafka + Bronze layer - read from Kafka instead of the local file, and use Spark Structured Streaming to write the output to Amazon S3, creating the Bronze layer
Week 5 - Silver layer - transform and enrich data from the Bronze layer, creating the Silver layer
Week 6 - Gold layer - define a schema for the Silver layer, stream the data from the Silver layer, transform it, and establish the Gold layer
TODO: Week 7 BI
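
To make the Bronze, Silver, and Gold layering concrete, here is a minimal sketch in plain Python, with no Spark, Kafka, or S3 involved. It is not the course solution. The sample messages, the field names (review_id, product_id, star_rating), and the helper functions to_bronze, to_silver, and to_gold are all hypothetical, and the specific transformations are assumptions chosen only to show how each layer refines the one before it.

```python
import json
from collections import defaultdict

# Hypothetical raw messages, standing in for records read from Kafka.
RAW_MESSAGES = [
    '{"review_id": "r1", "product_id": "p1", "star_rating": "5"}',
    '{"review_id": "r2", "product_id": "p1", "star_rating": "4"}',
    'not valid json',
    '{"review_id": "r3", "product_id": "p2", "star_rating": "2"}',
]


def to_bronze(messages):
    """Bronze: keep every raw message unchanged, tagged with its position."""
    return [{"offset": i, "value": m} for i, m in enumerate(messages)]


def to_silver(bronze_rows):
    """Silver: parse and type-cast each record, dropping rows that fail."""
    silver_rows = []
    for row in bronze_rows:
        try:
            record = json.loads(row["value"])
            silver_rows.append({
                "review_id": record["review_id"],
                "product_id": record["product_id"],
                "star_rating": int(record["star_rating"]),
            })
        except (ValueError, KeyError, TypeError):
            # Unparseable or incomplete records stay in Bronze only.
            continue
    return silver_rows


def to_gold(silver_rows):
    """Gold: aggregate Silver rows into one summary per product."""
    totals = defaultdict(lambda: [0, 0])  # product_id -> [rating sum, count]
    for row in silver_rows:
        entry = totals[row["product_id"]]
        entry[0] += row["star_rating"]
        entry[1] += 1
    return {
        product_id: {"review_count": count, "avg_rating": rating_sum / count}
        for product_id, (rating_sum, count) in totals.items()
    }


bronze = to_bronze(RAW_MESSAGES)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

In this sketch the Bronze layer keeps all four messages, including the malformed one, so nothing from the source is lost. Only the three valid reviews reach Silver, and Gold holds one summary row per product.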
