#Applied Machine Learning course
Tracks all resources / assignments / notes etc. for the course Modern Analytics.
##Assignments
- Assignment0: setup and Iris dataset
- Assignment1:
  - Digit recognizer: k-Nearest Neighbors (scikit-learn/own implementation) and a ConvNet (TensorFlow). Compare the two techniques. Also apply both techniques to the CIFAR-10 dataset and the Street View House Numbers (SVHN) dataset.
  - Titanic: Kaggle machine learning contest. Logistic regression, SVM, etc.
- Assignment2:
  - Eigenfaces: SVD/PCA techniques for face recognition on the Yale Face Dataset. Compare the principal components with CNN visualizations to look for any match.
  - What's Cooking: Kaggle competition used to test different classification techniques. Logistic regression (discriminative) vs. Naive Bayes (generative) with Gaussian/Bernoulli priors.
- Assignment3:
  - Sentiment analysis: Simple NLP processing pipeline. Comparison of Bag of Words, 2-gram (N-gram), and PCA applied to Bag of Words models against an RNN deep learning model. The datasets used were IMDB_labelled, Yelp_labelled, and Amazon_labelled. The same techniques were also applied to a Twitter dataset.
  - EM algorithm: Implementation of the EM algorithm for Gaussian Mixture Model parameter estimation. The dataset used is Old Faithful Geyser. The Gaussian parameters are initialized using K-means clustering.
- Assignment4:
  - Association rule learning: Implement the algorithm and apply it to the Project Vote Smart dataset.
  - Random forests: Implement the algorithm and apply it to image approximation.
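
As an illustration of the EM item in Assignment3, here is a minimal sketch of EM for a Gaussian mixture with K-means initialization. It is not the assignment's code: it works on 1-D data only, uses synthetic samples instead of the Old Faithful dataset, and the function names (`kmeans_1d`, `normal_pdf`, `fit_gmm_1d`) and all parameter values are assumptions made for this example.

```python
import math
import random


def kmeans_1d(data, k, iters=20):
    """Plain Lloyd's k-means on scalars; returns k cluster centers."""
    # Deterministic start: spread the initial centers over the sorted data.
    srt = sorted(data)
    centers = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=50):
    """EM for a 1-D Gaussian mixture; means are initialized with k-means."""
    n = len(data)
    means = kmeans_1d(data, k)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    log_likelihoods = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        ll = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        log_likelihoods.append(ll)
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances, log_likelihoods


# Synthetic two-cluster data (NOT the Old Faithful measurements).
rng = random.Random(0)
data = ([rng.gauss(2.0, 0.3) for _ in range(100)]
        + [rng.gauss(4.5, 0.4) for _ in range(150)])
weights, means, variances, log_likelihoods = fit_gmm_1d(data, k=2)
```

The log-likelihood recorded at each iteration should not decrease, which is a convenient sanity check when debugging an EM implementation.
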
##Written Exercises
Written exercises, consisting of 3 questions, were part of every assignment. They are written up in TeX and attached as PDFs along with the .tex files. The questions covered topics such as:
- Linear algebra
  - Eigenvalue problem, SVD of a rank-deficient matrix
  - LDA and least-squares correspondence
- Basic probability and statistics
  - Gradient and Hessian of the log-likelihood of logistic regression
  - Application of Bayes' rule
- Unsupervised learning
  - GMM and EM algorithm details
  - Procrustes algorithm
  - Multidimensional scaling
- Applications
  - Association rule learning
  - Neural networks as function approximators
  - ConvnetJS
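
For reference, the gradient and Hessian topic listed above has the following standard form in the binary case. This is a sketch under assumed notation, not a copy of the written solutions: labels are $y_i \in \{0, 1\}$, $x_i$ is the feature vector of sample $i$ (with any intercept term included), $X$ stacks the $x_i^T$ as rows, and $p_i$ is the predicted probability of class 1.

```latex
\begin{align}
\ell(\beta) &= \sum_{i=1}^{N} \left[ y_i \, \beta^T x_i - \log\left(1 + e^{\beta^T x_i}\right) \right],
\qquad p_i = \frac{1}{1 + e^{-\beta^T x_i}} \\
\nabla_\beta \ell(\beta) &= \sum_{i=1}^{N} x_i \, (y_i - p_i) = X^T (y - p) \\
\nabla^2_\beta \ell(\beta) &= -\sum_{i=1}^{N} p_i (1 - p_i) \, x_i x_i^T = -X^T W X,
\qquad W = \operatorname{diag}\big(p_i (1 - p_i)\big)
\end{align}
```

Because every $p_i (1 - p_i)$ is positive, the Hessian is negative semidefinite, so the log-likelihood is concave in $\beta$.
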
##Course Final/Project
Scene classification with the following data provided:
- Supervised dataset:
  - AlexNet CNN codes for the images
  - SIFT image feature vectors for a Bag of Visual Words model
  - Attribute data about the images, in the form of binary attribute vectors that indicate the presence or absence of certain key aspects of the image ("symmetrical", "open area", "horizon", etc.)
- Unsupervised dataset:
  - 10K similar images with 5 captions.

Finally, a report was written in the NIPS paper template describing the approaches and the results achieved.

Approaches evaluated to solve the problem:
- Simple supervised techniques:
  - Softmax on the AlexNet CNN feature vectors
  - K-means on SIFT descriptors to build the visual dictionary. Histograms of these visual words can then be used in a Bag of Words model. Another idea is to use the Pyramid Match kernel scheme.
  - Train your own CNN feature vectors using the VGG model.
  - Transfer learning
- Semi-supervised learning techniques:
  - Use the 10K image dataset with captions to train an RNN. Use the trained model to generate captions for the training and test images. From these captions, build a Bag of Words dictionary and train a model that can classify the images.
  - Use the attribute data as additional metadata when training the CNN feature vectors. This can be used to generate more training data.
  - Use the 10K SIFT features
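
The Bag of Visual Words step mentioned above can be sketched as follows. This is an illustrative example, not the project code: it assumes the visual dictionary (codebook) has already been computed, for instance by K-means, and it uses toy 2-D vectors where real SIFT descriptors are 128-dimensional. The names `nearest_word` and `bovw_histogram` are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda j: sq_dist(descriptor, codebook[j]))


def bovw_histogram(descriptors, codebook):
    """L1-normalized histogram of visual-word counts for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    # An image with no descriptors gets an all-zero histogram.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy codebook with three "visual words" and four descriptors from one image.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
image_descriptors = [[0.1, -0.2], [0.9, 1.2], [1.1, 0.8], [4.7, 5.3]]
histogram = bovw_histogram(image_descriptors, codebook)
```

Each image is reduced to a fixed-length histogram regardless of how many descriptors it has, which is what lets a standard classifier such as softmax be trained on top.
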
##Books used
Required:
- T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition), Springer-Verlag, 2008.

Recommended:
- P. Harrington, Machine Learning in Action, Manning, 2012.
- A. Rajaraman, J. Leskovec and J. Ullman, Mining of Massive Datasets, v1.1.
- H. Daumé III, A Course in Machine Learning, v0.8.