msu/csci-550-fall2020

CSCI 550: Advanced Data Mining

NOTE: This is a live document and is subject to change throughout the semester.

Clustering, classification and pattern recognition; performing automated discovery of knowledge from a data set.

Meeting Times

Mon, Wed, Fri 09:00-09:50, Online

Instructor

David L. Millman, Ph.D.

Email: [email protected]

Office hours: Mon 14:00-14:50 or Fri 12:00-12:50 on Webex

Office: Off campus until further notice

Github: dlm

Learning Outcomes

After successfully completing this course, students will be able to:

  • Create a system for extracting information from a large dataset
  • Evaluate the performance of that system
  • Analyze the ethical and security implications of the system

Technology

Textbook

Required:

Optional and highly recommended:

Others will be added as relevant.

(Recommended) Background

  • M 221 -- Introduction to Linear Algebra: matrix algebra, systems of linear equations, determinants, vector algebra and geometry in Euclidean 3-space, eigenvalues, eigenvectors.

  • A course in probability or statistics

  • Willingness to get your hands dirty writing code and doing math

Class schedule

The lecture schedule is subject to change throughout the semester, but here is the current plan. Assignments and due dates will be updated as they're assigned in class.

You can access the folders for Videos and Notes. File names are the date of the lecture. Videos are hosted on MSU's Box and notes are in this repository. See below for specific links.

Aug

Date Description Assigned Due Reading Links Video
08/17 Intro miro
08/19 Data DM Ch 1.1 miro 1 of 1
08/21 Sim, LA Review DM Ch 1.2 miro 1 of 1
08/24 Data: Prob View 1 DM Ch 1.3 1 of 1
08/26 Data: Prob View 2 DM Ch 1.4 1 of 1
08/28 Data: Prob View 3 HW 00 DM Ch 1.4 1 of 1
08/31 Frequent Itemset 1 DM Ch 8 miro 1 of 1

Sept

Date Description Assigned Due Reading Links Video
09/02 Frequent Itemset 2 DM Ch 8 1 of 1
09/04 Rule Mining HW 00 DM Ch 8 1 of 1
09/07 NO CLASS (Labor Day)
09/09 Rule Assessment DM Ch 8 1 of 1
09/11 Assessment (cont) HW 01 DM Ch 8 1 of 1
09/14 Recommender Systems MoMD Ch 6 1 of 1
09/16 Content Based MoMD Ch 6 1 of 1
09/18 Collab Filtering MoMD Ch 6 1 of 1
09/21 Rep Clust HW 01 DM Ch 13 1 of 1
09/23 k-Means & Hier Clust DM Ch 13 1 of 1
09/25 Hier Clust DM Ch 14 1 of 1
09/28 Density Clust DM Ch 15 1 of 1
09/30 DB Scan DM Ch 15 1 of 1

Oct

Date Description Assigned Due Reading Links Video
10/02 Cluster assess pt 1 DM Ch 17 1 of 1
10/05 Cluster assess pt 2 HW 02 DM Ch 17 1 of 1
10/07 Classification Tasks DM Ch 22 1 of 1
10/09 Decision Trees pt 1 DM Ch 22 1 of 1
10/12 Decision Trees pt 2 DM Ch 22 1 of 1
10/14 k-Nearest Neighbors DM Ch 22 1 of 1
10/16 Classify assess pt 1 DM Ch 22 1 of 1
10/19 Classify assess pt 2 HW 02 DM Ch 22 1 of 1
10/21 Compare classifiers DM Ch 22 1 of 1
10/23 Dim reduction intro DM Ch 7 1 of 1
10/26 NO CLASS - TECHNICAL ISSUES
10/28 Proj, Presentation, Hw 3 HW 03 1 of 1
10/30 Projections Present DM Ch 7 1 of 1

Nov

Date Description Assigned Due Reading Links Video
11/02 PCA Pt 1 Proj DM Ch 7 1 of 1
11/04 PCA Pt 2 DM Ch 7 1 of 1
11/06 Graph Data Proj DM Ch 4 1 of 1
11/09 Centrality HW 03 DM Ch 4 1 of 1
11/11 NO CLASS (Veterans Day)
11/13 Intrusion detection, Fake Data Detection
11/16 Stan, Regression
11/18 Frequent Subgraph, Timeseries and TDA
11/20 Implementing DBScan, Streaming with Kafka
11/23 Project Presentations Proj
11/25 Project Presentations Proj
video Decision making
video (AWAITING CONTENT) Outlier detection
video Time series
video Subspace Clustering
video Locality Sensitive Hashing
video Linear Discriminant Analysis


(Potential) Upcoming Topics:

  • topological data analysis
  • data viz
  • differential privacy / ethics
  • compressed sensing
  • map reduce
  • page rank
  • approx-nearest neighbors
  • core sets
  • curves and surfaces
  • locality sensitive hashing

Evaluation

Your grade for this class will be determined by:

  • 10% Quizzes (lowest quiz is dropped) (Removed due to technical constraints)
  • 50% Homework (lowest homework is dropped)
  • 15% Group Presentation
  • 25% Group Project
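
As a rough illustration of how these weights could combine, here is a minimal sketch. It rests on assumptions that are not stated in this syllabus: all scores are percentages, every homework counts equally, and the remaining weights (50/15/25) are rescaled to sum to 100% because the quiz component was removed. The function name `course_grade` is hypothetical, and the actual grade calculation may differ.

```python
def course_grade(homework, presentation, project):
    """Weighted course grade on a 0-100 scale.

    Assumptions (NOT course policy): scores are percentages, homeworks are
    equally weighted, and the 50/15/25 weights are rescaled to sum to 100%
    because the 10% quiz component was removed.
    """
    if len(homework) < 2:
        raise ValueError("need at least two homework scores to drop the lowest")

    kept = sorted(homework)[1:]      # drop the single lowest homework score
    hw_avg = sum(kept) / len(kept)

    weights = {"homework": 50, "presentation": 15, "project": 25}
    total = sum(weights.values())    # 90 once quizzes are removed

    weighted = (weights["homework"] * hw_avg
                + weights["presentation"] * presentation
                + weights["project"] * project)
    return weighted / total


# Example: four homeworks (the 60 is dropped), presentation 80, project 90.
grade = course_grade([100, 80, 60, 90], presentation=80, project=90)
print(round(grade, 2))
```

Rescaling by the sum of the remaining weights is only one possible reading of the removed quiz component; ask the instructor if the exact calculation matters.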

Policies

Attendance

Attendance in class will not be taken, but students are responsible for all material covered in class. To accommodate the challenges of remote delivery, this semester (barring technical difficulty), I will record and post lectures after class.

Assignments

There will be regular homework assignments (about every week or every other week depending on the difficulty of the assignment) consisting of written problems and coding exercises. Homeworks will be posted in the schedule. If not specified, solutions should be submitted as a PDF on Brightspace. (The tool that I use for grading documents only works with PDFs, so any file format other than PDF will receive a 0.) Homework is due at 23:59 on the due date. Late homework will not be accepted.

You do NOT need to write up your solutions with LaTeX, but I highly encourage you to do so. You can find some resources for getting started with LaTeX (and for making figures, and keeping all those files safe with git) in the student resources repo.

I encourage collaboration; see the Collaboration section for details.

Discussion

Group discussions, questions, and announcements will take place on the Brightspace message board. It is okay to send me a direct message or email if you have a question that you feel is not appropriate to share with the class. If, however, you send me a message with a question for which the response would be useful to the rest of the class, I will likely ask you to post publicly.

Collaboration

Collaboration IS encouraged; however, all submitted individual work must be your own, and you must acknowledge your collaborators at the beginning of the submission.

On any group project, every team member is expected to make a substantial contribution. The distribution of the work, however, is up to the team.

A few specifics for the assignments. You may:

  • Work with anyone in the course.
  • Share ideas with others in the course.
  • Help other teams debug their code or proofs.

You may NOT:

  • Submit a proof or code that you did not write.
  • Modify another's proof or code and claim it as your own.

Using resources in addition to the course materials is encouraged, but be sure to properly cite any additional resources. Remember, it is NEVER acceptable to pass others' work off as your own.

Paraphrasing or quoting another's work without citing the source is a form of academic misconduct. Even inadvertent or unintentional misuse or appropriation of another's work (such as relying heavily on source material that is not acknowledged) is considered plagiarism. If you have any questions about using and citing sources, you are expected to ask for clarification. My rule of thumb is if I am in doubt, I cite.

By participating in this class, you agree to abide by the student code of conduct. Please review the policy.

Classroom Etiquette

Please keep your mics muted when you are not speaking. Background noise from your surroundings can be distracting to other learners. Disruptions to the class will result in you being asked to leave the lecture and will negatively impact your grade.

Health-Related Class Absence

Please evaluate your own health status regularly and refrain from attending class and other on-campus events if you are ill. MSU students who miss class due to illness will be given opportunities to access course materials online. You are encouraged to seek appropriate medical attention for treatment of illness. In the event of contagious illness, please do not come to class or to campus to turn in work or attend class. Instead, notify me by email about your absence as soon as practical, so that accommodations can be made. Please note that documentation (a doctor's note) for medical excuses is not required. MSU University Health Partners - as part of their commitment to maintain patient confidentiality, to encourage more appropriate use of healthcare resources, and to support meaningful dialogue between instructors and students - does not provide such documentation.

Disabilities

If you are a student with a disability and wish to use your approved accommodations for this course, please contact me during my office hours to discuss. Please have your Accommodation Notification or Blue Card available for verification of accommodations. Accommodations are approved through the Office of Disability Services located in SUB 174. www.montana.edu/disabilityservices

About

Course info for Fall 2020 Data Mining
