NOTE: This is a live document and is subject to change throughout the semester.
Clustering, classification and pattern recognition; performing automated discovery of knowledge from a data set.
Mon, Wed, Fri 09:00-09:50, Online
David L. Millman, Ph.D.
Email: [email protected]
Office hours: Mon 14:00-14:50 or Fri 12:00-12:50 on Webex
Office: Off campus until further notice
Github: dlm
After successfully completing this course, students will be able to:
- Create a system for extracting information from a large dataset
- Evaluate the performance of that system
- Analyze the ethical and security implications of the system
- Zoom
- Miro
- Collaboration device (e.g. Wacom one pen tablet)
Required:
- Data Mining and Machine Learning: Fundamental Concepts and Algorithms, 2nd Edition by Mohammed J. Zaki and Wagner Meira, Jr. (DM below)
Optional and highly recommended:
- Mathematical Foundations for Data Analysis by Jeff Phillips (M4D below)
- Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman (MoMD below)
- Lecture Notes from Modern Algorithmic Toolbox by Tim Roughgarden and Greg Valiant (MAT below)
- Dimension Reduction: A Guided Tour by Chris J.C. Burges (DIM below)
- Foundations of Data Science by Avrim Blum, John Hopcroft, and Ravindran Kannan (FoDS below)
Others will be added as relevant.
- M 221 -- Introduction to Linear Algebra: Matrix algebra, systems of linear equations, determinants, vector algebra and geometry in Euclidean 3-space, eigenvalues, eigenvectors.
- A course in probability or statistics
- Willingness to get your hands dirty writing code and doing math
The lecture schedule is subject to change throughout the semester, but here is the current plan. Assignments and due dates will be updated as they're assigned in class.
You can access the folders for Videos and Notes. File names are the date of the lecture. Videos are hosted on MSU's Box and notes are in this repository. See below for specific links.
Date | Description | Assigned | Due | Reading | Links | Video |
---|---|---|---|---|---|---|
08/17 | Intro | miro | ||||
08/19 | Data | DM Ch 1.1 | miro | 1 of 1 | ||
08/21 | Sim, LA Review | DM Ch 1.2 | miro | 1 of 1 | ||
08/24 | Data: Prob View 1 | DM Ch 1.3 | 1 of 1 | |||
08/26 | Data: Prob View 2 | DM Ch 1.4 | 1 of 1 | |||
08/28 | Data: Prob View 3 | HW 00 | DM Ch 1.4 | 1 of 1 | ||
08/31 | Frequent Itemset 1 | DM Ch 8 | miro | 1 of 1 |
Date | Description | Assigned | Due | Reading | Links | Video |
---|---|---|---|---|---|---|
09/02 | Frequent Itemset 2 | DM Ch 8 | 1 of 1 | |||
09/04 | Rule Mining | HW 00 | DM Ch 8 | 1 of 1 | ||
09/07 | NO CLASS (Labor Day) | |||||
09/09 | Rule Assessment | DM Ch 8 | 1 of 1 | |||
09/11 | Assessment (cont) | HW 01 | DM Ch 8 | 1 of 1 | ||
09/14 | Recommender Systems | MoMD Ch 6 | 1 of 1 | |||
09/16 | Content Based | MoMD Ch 6 | 1 of 1 | |||
09/18 | Collab Filtering | MoMD Ch 6 | 1 of 1 | |||
09/21 | Rep Clust | HW 01 | DM Ch 13 | 1 of 1 | ||
09/23 | _k_-Means & Hier Clust | DM Ch 13 | 1 of 1 | |||
09/25 | Hier Clust | DM Ch 14 | 1 of 1 | |||
09/28 | Density Clust | DM Ch 15 | 1 of 1 | |||
09/30 | DB Scan | DM Ch 15 | 1 of 1 |
Date | Description | Assigned | Due | Reading | Links | Video |
---|---|---|---|---|---|---|
10/02 | Cluster assess pt 1 | DM Ch 17 | 1 of 1 | |||
10/05 | Cluster assess pt 2 | HW 02 | DM Ch 17 | 1 of 1 | ||
10/07 | Classification Tasks | DM Ch 22 | 1 of 1 | |||
10/09 | Decision Trees pt 1 | DM Ch 22 | 1 of 1 | |||
10/12 | Decision Trees pt 2 | DM Ch 22 | 1 of 1 | |||
10/14 | k-Nearest Neighbors | DM Ch 22 | 1 of 1 | |||
10/16 | Classify assess pt 1 | DM Ch 22 | 1 of 1 | |||
10/19 | Classify assess pt 2 | HW 02 | DM Ch 22 | 1 of 1 | ||
10/21 | Compare classifiers | DM Ch 22 | 1 of 1 | |||
10/23 | Dim reduction intro | DM Ch 7 | 1 of 1 | |||
10/26 | NO CLASS - TECHNICAL ISSUES | |||||
10/28 | Proj, Presentation, Hw 3 | HW 03 | 1 of 1 | |||
10/30 | Projections | Present | DM Ch 7 | 1 of 1 |
Date | Description | Assigned | Due | Reading | Links | Video |
---|---|---|---|---|---|---|
11/02 | PCA Pt 1 | Proj | DM Ch 7 | 1 of 1 | ||
11/04 | PCA Pt 2 | DM Ch 7 | 1 of 1 | |||
11/06 | Graph Data | Proj | DM Ch 4 | 1 of 1 | ||
11/09 | Centrality | HW 03 | DM Ch 4 | 1 of 1 | ||
11/11 | NO CLASS (Veterans Day) | |||||
11/13 | Intrusion detection, Fake Data Detection | |||||
11/16 | Stan, Regression | |||||
11/18 | Frequent Subgraph, Timeseries and TDA | |||||
11/20 | Implementing DBScan, Streaming with Kafka | |||||
11/23 | Project Presentations | Proj | ||||
11/25 | Project Presentations | Proj | ||||
video | Decision making | |||||
video | (AWAITING CONTENT) Outlier detection | |||||
video | Time series | |||||
video | Subspace Clustering | |||||
video | Locality Sensitive Hashing | |||||
video | Linear Discriminant Analysis |
- topological data analysis
- data viz
- differential privacy / ethics
- compressed sensing
- map reduce
- page rank
- approx-nearest neighbors
- core sets
- curves and surfaces
- locality sensitive hashing
Your grade for this class will be determined by:
- 10% Quizzes (lowest quiz is dropped) (removed due to technical constraints)
- 50% Homework (lowest homework is dropped)
- 15% Group Presentation
- 25% Group Project
Attendance in class will not be taken, but students are responsible for all material covered in class. To accommodate the challenges of remote delivery, this semester (barring technical difficulty), I will record and post lectures after class.
There will be regular homework assignments (about every week or every other week depending on the difficulty of the assignment) consisting of written problems and coding exercises. Homeworks will be posted in the schedule. If not specified, solutions should be submitted as a PDF on Brightspace. (The tool that I use for grading documents only works with PDFs, so any file format other than PDF will receive a 0.) Homework is due at 23:59 on the due date. Late homework will not be accepted.
You do NOT need to write up your solutions with LaTeX, but I highly encourage you to do so. You can find some resources for getting started with LaTeX (and for making figures, and keeping all those files safe with git) in the student resources repo.
I encourage collaboration; see the collaboration section for details.
Group discussions, questions, and announcements will take place on the Brightspace message board. It is okay to send me a direct message or email if you have a question that you feel is not appropriate to share with the class. If, however, you send me a message with a question for which the response would be useful to the rest of the class, I will likely ask you to post publicly.
Collaboration IS encouraged; however, all submitted individual work must be your own and you must acknowledge your collaborators at the beginning of the submission.
On any group project, every team member is expected to make a substantial contribution. The distribution of the work, however, is up to the team.
A few specifics for the assignments. You may:
- Work with anyone in the course.
- Share ideas with others in the course.
- Help other teams debug their code or proofs.
You may NOT:
- Submit a proof or code that you did not write.
- Modify another's proof or code and claim it as your own.
Using resources in addition to the course materials is encouraged, but be sure to properly cite additional resources. Remember, it is NEVER acceptable to pass others' work off as your own.
Paraphrasing or quoting another's work without citing the source is a form of academic misconduct. Even inadvertent or unintentional misuse or appropriation of another's work (such as relying heavily on source material that is not acknowledged) is considered plagiarism. If you have any questions about using and citing sources, you are expected to ask for clarification. My rule of thumb is if I am in doubt, I cite.
By participating in this class, you agree to abide by the student code of conduct. Please review the policy.
Please keep your mics muted when you are not speaking. Background noise from your surroundings can be distracting to other learners. Disruptions to the class will result in you being asked to leave the lecture and will negatively impact your grade.
Please evaluate your own health status regularly and refrain from attending class and other on-campus events if you are ill. MSU students who miss class due to illness will be given opportunities to access course materials online. You are encouraged to seek appropriate medical attention for treatment of illness. In the event of contagious illness, please do not come to class or to campus to turn in work or attend class. Instead, notify me by email about your absence as soon as practical, so that accommodations can be made. Please note that documentation (a doctor's note) for medical excuses is not required. MSU University Health Partners - as part of their commitment to maintain patient confidentiality, to encourage more appropriate use of healthcare resources, and to support meaningful dialogue between instructors and students - does not provide such documentation.
If you are a student with a disability and wish to use your approved accommodations for this course, please contact me during my office hours to discuss. Please have your Accommodation Notification or Blue Card available for verification of accommodations. Accommodations are approved through the Office of Disability Services located in SUB 174. www.montana.edu/disabilityservices