README Update
inesmcm26 committed Jun 28, 2024
1 parent 85e181f commit f416016
Showing 3 changed files with 35 additions and 6 deletions.
6 changes: 3 additions & 3 deletions 01_spark_intro/README.md
@@ -7,15 +7,15 @@ This module contains a series of notebooks that introduce and explore key concepts

## Notebooks

-1. Introduction to Big Data and Spark:
+1. **Introduction to Big Data and Spark**

Overview of Big Data concepts and introduction of Apache Spark, highlighting its architecture, key features, and components.

-2. Introduction to Databricks Environment:
+2. **Introduction to Databricks Environment**

Introduction to the Databricks environment, including how to run shell commands, interact with the Databricks Filesystem, and execute SQL within Databricks cells.

-3. Pyspark RDDs:
+3. **PySpark RDDs**

Explores Resilient Distributed Datasets (RDDs) in PySpark, covering their creation, transformations, and actions, with hands-on examples to demonstrate these concepts.
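
As a taste of what that notebook covers, here is a minimal sketch of the RDD lifecycle: create, transform lazily, then trigger execution with an action. The data and app name are illustrative, not taken from the notebook itself.

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession named `spark` already exists;
# this builder call is only needed when running locally.
spark = SparkSession.builder.appName("rdd-intro").getOrCreate()
sc = spark.sparkContext

# Create an RDD from a local Python collection.
numbers = sc.parallelize([1, 2, 3, 4, 5])

# Transformations are lazy: nothing executes yet.
squares = numbers.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)

# Actions trigger evaluation of the whole lineage.
print(evens.collect())                     # [4, 16]
print(squares.reduce(lambda a, b: a + b))  # 55
```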

33 changes: 31 additions & 2 deletions 02_pyspark_dataframes/README.md
@@ -1,3 +1,32 @@
-# TODO
+# PySpark DataFrames Basics

-Explain import to DataBricks
+Welcome to the PySpark DataFrames Basics module! ✴️

This module introduces the basics of working with DataFrames in PySpark. DataFrames are a key abstraction in PySpark that provides a more user-friendly interface than RDDs for working with structured data.

In this module, you will learn how to create DataFrames, perform basic operations on them, and understand the underlying concepts that drive PySpark's DataFrame API.

To do so, you'll work with orders data from an e-commerce platform, using PySpark to load it into a DataFrame (sketched below) and perform various operations on it to answer business questions.
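
As a hedged illustration of the loading step (the file name, path, and options below are assumptions; the notebooks define the real dataset):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders").getOrCreate()

# Hypothetical DBFS path; adjust to the dataset used in the notebooks.
orders = (
    spark.read
    .option("header", True)        # first row holds column names
    .option("inferSchema", True)   # let Spark infer column types
    .csv("/FileStore/tables/orders.csv")
)

orders.printSchema()
orders.show(5)
```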

## Notebooks

1. **PySpark DataFrames Part 1**

Introduction to PySpark DataFrames, covering their key features, advantages over RDDs, how to create DataFrames from different data sources, and basic operations such as selecting, filtering, and creating new columns.

Also explores many of the PySpark SQL functions that can be used to manipulate DataFrames (see the sketch after this list).

2. **PySpark DataFrames Part 2**

Explores more advanced operations on PySpark DataFrames, including grouping and aggregation, sorting, and joining, also covered in the sketch below.
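
The sketch referenced in both notebook summaries above might look like the following; the `orders` and `customers` DataFrames, their columns, and the 23% tax rate are invented for illustration.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("df-ops").getOrCreate()

# Illustrative data; the notebooks use a real e-commerce dataset.
orders = spark.createDataFrame(
    [(1, 101, 20.0), (2, 102, 35.5), (3, 101, 12.0)],
    ["order_id", "customer_id", "amount"],
)
customers = spark.createDataFrame(
    [(101, "Alice"), (102, "Bob")], ["customer_id", "name"]
)

# Part 1: selecting, filtering, and deriving new columns with SQL functions.
big_orders = (
    orders
    .select("order_id", "customer_id", "amount")
    .filter(F.col("amount") > 15)
    .withColumn("amount_with_tax", F.round(F.col("amount") * 1.23, 2))
)

# Part 2: joining, grouping/aggregating, and sorting.
revenue_per_customer = (
    big_orders
    .join(customers, on="customer_id", how="inner")
    .groupBy("name")
    .agg(F.sum("amount_with_tax").alias("total_spent"))
    .orderBy(F.col("total_spent").desc())
)

revenue_per_customer.show()
```

Note that, like RDD transformations, these operations build a lazy plan that only runs when an action such as `show()` is called.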


## Running the Notebooks

All notebooks in this module are designed to be run in the **Databricks Community Edition**. Detailed steps to set up and configure your environment are provided in the first module.

If needed, go back to the `2-Databricks-Environment` notebook in module `01_spark_intro` and follow the instructions there to ensure you have the necessary setup to run these notebooks successfully.

---

Happy Learning!
2 changes: 1 addition & 1 deletion README.md
@@ -19,7 +19,7 @@ This collection of modules is designed to help you learn how to work with Big Data

Introduces advanced PySpark topics such as User-Defined Functions (UDFs), window functions, and working with complex data structures like arrays and structs (see the sketch after this list).

-4. Final Project
+4. **Final Project**

A final project that brings together the concepts covered in the previous modules. You will work on a real-world dataset, applying your knowledge of Spark to analyze and derive insights from the data.
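
To give a flavor of the advanced topics mentioned in item 3 above, here is a minimal, self-contained sketch of a UDF, a window function, and an array-typed column; all data and names are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.window import Window
import pyspark.sql.functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("advanced").getOrCreate()

sales = spark.createDataFrame(
    [("north", "2024-01", 100), ("north", "2024-02", 150),
     ("south", "2024-01", 80), ("south", "2024-02", 60)],
    ["region", "month", "revenue"],
)

# A simple UDF; built-in functions are usually faster because UDFs
# are opaque to the Catalyst optimizer.
tier = F.udf(lambda r: "high" if r >= 100 else "low", StringType())

# Window function: running total of revenue within each region.
w = Window.partitionBy("region").orderBy("month")

result = (
    sales
    .withColumn("tier", tier(F.col("revenue")))
    .withColumn("running_total", F.sum("revenue").over(w))
)
result.show()

# A complex (array) column: collect each region's months into a list.
sales.groupBy("region").agg(
    F.collect_list("month").alias("months")
).show(truncate=False)
```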
