
mharris1904/hwe-labs

 
 


Hours with Experts logo

Welcome to the Hours With Experts Labs Repo

This repository will be the central location for the hands-on programming component of the course.

Course Work Overview

The goal of the course is to build an end-to-end data pipeline processing Amazon reviews.

The data pipeline you construct will look like the diagram below:

[Image: Hours with Experts logo]

Repo Overview

  • Week 1 - Environment Setup - configure your environment so you can begin the programming coursework
  • Week 2 - Spark SQL - write a Python Spark application to analyze local Amazon review data
  • Week 3 - Write to Amazon S3 - extend the program to connect to Amazon S3 and write data to it
  • Week 4 - Kafka + Bronze layer - read from Kafka instead of the local file, and use Spark Structured Streaming to write the output to Amazon S3, creating the Bronze layer
  • Week 5 - Silver layer - transform and enrich data from the Bronze layer, creating the Silver layer
  • Week 6 - Gold layer - define a schema for the Silver layer, stream the data from the Silver layer, transform it, and establish the Gold layer
  • TODO: Week 7 BI
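To make the Bronze, Silver, and Gold layers above more concrete, here is a minimal sketch in plain Python. It is not the lab code: the labs use Spark, Kafka, and Amazon S3, while this sketch uses in-memory lists so it runs anywhere. The message contents, the field names (`review_id`, `product_id`, `star_rating`), and the function names are all hypothetical, and the split of responsibilities between layers follows the common convention (raw, cleaned, aggregated), which the actual labs may refine.

```python
import json
from collections import defaultdict

# Hypothetical raw messages, standing in for what a Kafka topic might carry.
RAW_MESSAGES = [
    '{"review_id": "r1", "product_id": "p1", "star_rating": "5"}',
    '{"review_id": "r2", "product_id": "p1", "star_rating": "3"}',
    'not valid json',
    '{"review_id": "r3", "product_id": "p2", "star_rating": "4"}',
]


def to_bronze(messages):
    """Bronze: store every message exactly as received, with no parsing."""
    return [{"raw": message} for message in messages]


def to_silver(bronze_rows):
    """Silver: parse and type each record, skipping rows that cannot be parsed."""
    silver = []
    for row in bronze_rows:
        try:
            record = json.loads(row["raw"])
            silver.append({
                "review_id": record["review_id"],
                "product_id": record["product_id"],
                "star_rating": int(record["star_rating"]),
            })
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing fields, or a non-numeric rating.
            continue
    return silver


def to_gold(silver_rows):
    """Gold: aggregate to one row per product (review count and average rating)."""
    totals = defaultdict(lambda: [0, 0])  # product_id -> [count, rating sum]
    for row in silver_rows:
        totals[row["product_id"]][0] += 1
        totals[row["product_id"]][1] += row["star_rating"]
    return {
        product_id: {"review_count": count, "avg_rating": rating_sum / count}
        for product_id, (count, rating_sum) in totals.items()
    }


if __name__ == "__main__":
    bronze = to_bronze(RAW_MESSAGES)
    silver = to_silver(bronze)
    gold = to_gold(silver)
    print(len(bronze), len(silver), gold)
```

Keeping the Bronze layer untouched means a bug in the Silver or Gold logic can be fixed and the later layers rebuilt without re-reading the source.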

Important Course Resources

