This repository contains homework submissions for the Data Engineering Zoomcamp 2025 course by DataTalks.Club.

jschuller/dezoomcamp-2025-hw

Data Engineering Zoomcamp 2025 - Homework Repository

This repository contains homework submissions for the Data Engineering Zoomcamp 2025 course by DataTalks.Club.

Project Overview

The homework follows the course's progression through the data engineering stack, from containerization and infrastructure as code through orchestration, warehousing, analytics, and batch processing to streaming:

graph LR
    B1[Containerization<br>& IaC] -->|Docker<br>Terraform| B2[Orchestration]
    B2 -->|Mage| B3[Data Warehouse]
    B3 -->|BigQuery| B4[Analytics]
    B4 -->|dbt| B5[Processing]
    B5 -->|Spark| B6[Streaming]
    B6 -->|Kafka| B7[End State]

Module Structure

Each module demonstrates the practical application of data engineering principles in an enterprise context:

  • module1/: Docker, Terraform, and Infrastructure as Code
    • Container orchestration
    • Infrastructure automation
    • Local and cloud deployment patterns

Future modules will include:

  • Workflow Orchestration with Mage
  • Data Warehouse Implementation
  • Analytics Engineering with dbt
  • Batch Processing with Spark
  • Stream Processing with Kafka

Architecture Highlights

  • Modular Design: Each component is containerized and independently scalable
  • Infrastructure as Code: Automated deployment and configuration
  • Data Governance: Implemented through clear documentation and version control
  • Enterprise Integration: Focus on maintainable, production-ready solutions

Technical Implementation

Key aspects of the implementation:

flowchart TD
    subgraph infra[Infrastructure Layer]
        docker[Docker Containers]
        iac[Infrastructure as Code]
    end
    
    subgraph pipeline[Data Pipeline]
        ingest[Data Ingestion]
        transform[Transformation]
        load[Data Loading]
    end
    
    subgraph analytics[Analytics Layer]
        dw[Data Warehouse]
        bi[Business Intelligence]
    end
    
    infra --> pipeline
    pipeline --> analytics
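
As a rough illustration of the Data Pipeline layer in the diagram above, the three stages can be sketched as plain Python functions. This is a minimal sketch, not the repository's code: the function names, the sample columns (`trip_id`, `distance_km`), and the in-memory SQLite database standing in for the data warehouse are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the column names are assumptions for illustration.
SAMPLE_CSV = """trip_id,distance_km
1,2.5
2,
3,7.0
"""


def ingest(source):
    """Data Ingestion: read raw rows from a CSV text stream."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transformation: drop rows without a distance and cast types."""
    cleaned = []
    for row in rows:
        if not row["distance_km"]:
            continue  # skip incomplete records
        cleaned.append((int(row["trip_id"]), float(row["distance_km"])))
    return cleaned


def load(records, conn):
    """Data Loading: write cleaned records to a table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips (trip_id INTEGER, distance_km REAL)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?)", records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]


def run_pipeline(source, conn):
    """Chain the three stages in the order shown in the diagram."""
    return load(transform(ingest(source)), conn)


if __name__ == "__main__":
    # SQLite in memory stands in for the warehouse; this is an assumption.
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(io.StringIO(SAMPLE_CSV), connection))
```

Keeping each stage as a separate function mirrors the boxes in the diagram and lets each one be tested or swapped on its own, for example replacing the SQLite connection with a real warehouse client.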

Getting Started

  1. Clone the repository:
git clone https://github.com/jschuller/dezoomcamp-2025-hw.git
  2. Each module contains:
    • Detailed documentation
    • Implementation code
    • Infrastructure configurations
    • Test cases and examples
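
After cloning, the module directories can be enumerated with a few lines of Python. This is a minimal sketch, assuming homework folders follow the `module<N>/` naming seen with `module1/`; the `find_modules` helper is hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Assumed naming convention, based on the existing module1/ folder.
MODULE_PATTERN = re.compile(r"^module(\d+)$")


def find_modules(repo_root):
    """Return module directory names under repo_root, sorted by module number."""
    found = []
    for entry in Path(repo_root).iterdir():
        match = MODULE_PATTERN.match(entry.name)
        if entry.is_dir() and match:
            found.append((int(match.group(1)), entry.name))
    return [name for _, name in sorted(found)]


if __name__ == "__main__":
    # Assumes the script runs from the root of the cloned repository.
    print(find_modules("."))
```

Sorting by the numeric suffix rather than by name keeps a later `module10` after `module2`, should the repository grow that far.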

Acknowledgments

This work is based on the excellent curriculum provided by DataTalks.Club, which publishes the original course material for the Data Engineering Zoomcamp.
