The efficient analysis of large, complex, and rapidly changing data sets is fast becoming a necessity in nearly every large industry. This creates a need for new algorithms that give sensible solutions even when the data are messy and incomplete. This course is a first introduction to popular techniques, algorithms, and tools for analyzing real-world data sets.
- Course Content
- Syllabus
- Discussion on MSU Discord (see D2L for the invite, or email me if the link has expired)
Optional Readings Key:
Date | Topics | Due | Reading |
---|---|---|---|
01/17 | Introduction | | |
01/25 | Stats review | | DM - chapter 1 |
01/25 | Linear Algebra review | | DM - chapter 1 |
02/01 | Tools | quiz 1 & quiz 2 (see D2L) | |
02/03 | Tools and Data wrangling | hw 1 sol | |
02/08 | Data Viz | quiz 3 (see D2L) | |
02/10 | Cat Vars | | |
02/15 | Missing Vals | | |
02/17 | Graph Centrality | hw 2 | DM - chapter 4 |
02/22 | Prestige | | DM - chapter 4 |
02/24 | Clust Coeff | proj 1, quiz 4 (see D2L) | DM - chapter 4 |
03/01 | NetworkX | quiz 5 (see D2L) | |
03/03 | Dim Reduction | | DM - chapter 7 |
03/08 | Lin Alg 4 PCA | quiz 6 (see D2L) | DM - chapter 7 |
03/10 | PCA | hw 3 | DM - chapter 7 |
03/15 | SPRING BREAK (NO CLASS) | | |
03/17 | SPRING BREAK (NO CLASS) | | |
03/22 | PCA code | quiz 7 (see D2L) | DM - chapter 7 |
03/24 | PCA code cont | proj 2 | DM - chapter 7 |
03/29 | Clustering, K-Means | quiz 8 (see D2L) | DM - chapter 13 |
03/31 | DBSCAN - clean | hw 4 | DM - chapter 15 |
04/05 | Clust + PCA Code | quiz 9 (see D2L) | |
04/07 | F-Score | | DM - chapter 22 |
04/12 | Silhouette Coeff | quiz 10 (see D2L), hw 5 | DM - chapter 17 |
04/14 | Hierarchical Clustering | proj 4.1 | DM - chapter 14 |
04/19 | Classification | proj 3 | DM - chapters 18 & 21 |
04/21 | Itemset Mining | | DM - chapter 8 |
04/26 | Exam Review | quiz 11 (see D2L) | |
04/28 | Exam | | |
05/03 | BotNot | | |
05/05 | Data Ethics | | |
05/12 | Meeting 14:00-15:50 | proj 4.2-4 | |
The lecture schedule is subject to change throughout the semester, but here is the current plan.
Week | Topics | Due |
---|---|---|
12: 04/11 | Classification | quiz, homework |
13: 04/18 | Classification | quiz, proj 3: dim reduction & clustering |
14: 04/25 | Adv topic (TBD) | quiz, exam |
15: 05/02 | Ethics | quiz, homework |
16: 05/09 | Final Project Presentations | final project |
Potential advanced topics:
- itemset mining
- text mining
- recommender systems
- massive data processing
- topological and geometric data analysis
This course was originally developed by Veronika Strnadova-Neeley.