This repo contains implementations of several machine learning algorithms:
- Regression:
  - Simple Linear Regression: We will build a simple linear regression model to estimate a car's CO$_2$ emissions from its fuel consumption.
  - Multiple Linear Regression: We will build a multiple linear regression model to estimate a car's CO$_2$ emissions from its fuel consumption, number of cylinders, and engine size.
- Classification:
  - Decision Trees: We will build a decision tree model to predict which drug a new patient is most likely to respond to.
  - K Nearest Neighbours: We will build a KNN model for customer segmentation at a telecom company.
  - Logistic Regression: We will build a logistic regression model to predict whether a customer will leave a company for a competitor.
  - Support Vector Machines: We will build an SVM model to detect whether a new patient would be diagnosed with cancer.
  - We will also build and compare all of the classifiers above to predict whether a customer will default on a loan. The comparison uses the Jaccard similarity score, the F1 score, and log loss (where applicable) as evaluation metrics.
- Clustering:
  - K-Means Clustering: We will build a k-means clustering model to group customers with similar characteristics.
  - Hierarchical Clustering: We will build a hierarchical clustering model to group similar vehicles so that competing models fall into the same cluster. A new car can then be matched with its closest competitors.
  - Density-Based Clustering: We will use a density-based clustering model to group weather stations that show similar weather conditions while excluding outliers and noise points.
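
The classifier comparison mentioned under Classification could look roughly like the sketch below. It is only an illustration: the data is synthetic (generated with `make_classification` as a stand-in for the loan dataset, which is not shown here), and the hyperparameters are arbitrary assumptions rather than the values used in the notebooks. Log loss needs predicted class probabilities, so this sketch reports it only for logistic regression.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, jaccard_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary data standing in for the loan dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Hyperparameters below are illustrative guesses, not tuned values.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "Decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "Logistic regression": LogisticRegression(C=1.0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = {
        "jaccard": jaccard_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, average="weighted"),
    }
    # Log loss is computed from probabilities, not hard labels.
    if name == "Logistic regression":
        scores["log_loss"] = log_loss(y_test, model.predict_proba(X_test))
    results[name] = scores

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Higher Jaccard and F1 scores are better, while a lower log loss is better, so the metrics should be read side by side rather than combined into one number.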

Finally, we will build a recommender system that uses collaborative filtering and content-based filtering to recommend movies to a user.
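
As a rough illustration of the collaborative filtering idea, the sketch below scores unseen movies for a user by taking a similarity-weighted average of other users' ratings. The ratings matrix, the movie indices, and the `recommend` helper are all hypothetical and made up for this example. It shows one common user-based variant with cosine similarity, which is not necessarily the approach taken in the notebooks, and it does not cover content-based filtering.

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are movies, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 2],
    [1, 0, 5, 4, 5],
    [0, 1, 4, 5, 4],
], dtype=float)


def recommend(ratings, user, top_n=2):
    """Return indices of the top_n unseen movies for `user` (user-based CF)."""
    # Cosine similarity between the target user and every other user.
    norms = np.linalg.norm(ratings, axis=1)
    sims = ratings @ ratings[user] / (norms * norms[user])
    sims[user] = 0.0  # a user should not vote on their own recommendations

    # Similarity-weighted average rating, counting only users who rated each movie.
    rated = (ratings > 0).astype(float)
    weight_sum = sims @ rated
    scores = np.divide(
        sims @ ratings, weight_sum,
        out=np.zeros_like(weight_sum), where=weight_sum > 0,
    )

    # Exclude movies the user has already rated.
    scores[ratings[user] > 0] = -np.inf
    order = np.argsort(scores)[::-1][:top_n]
    return [int(i) for i in order if np.isfinite(scores[i])]


print(recommend(ratings, user=0))
```

For user 0, the most similar user is user 1, so the movies that user 1 and the other raters scored highly, and that user 0 has not yet seen, rise to the top of the list.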

Modules used: NumPy, scikit-learn (sklearn), pandas, Matplotlib, SciPy.