Homework repository for ML_Zoomcamp 2024
Sep 2024 - Jan 2025
This folder contains the homework and projects for the course. Each week has its own folder. The homework is in the form of Jupyter notebooks.
I also keep track of my progress in this README.md file. I will update it every week.
Homework 1: Introduction to Machine Learning due 1 October 2024 01:00
The extra material in the subfolder covers the environment setup, the portfolio page, and other things I learned during the first week of the course.
- Set up the environment
- locally on Mac
- on AWS
- on GCP
- on Azure
- Kaggle
- Colab
- Finished the homework (scored 6/7)
- Q7. Sum of weights was wrong
- Learn in public 1: Setting up Portfolio homepage medium published on 23 September 2024
- Learn in public 2 & 3: Setting up the environments
Homework 2: Machine Learning for Regression due 8 October 2024 01:00, extended to 10 October 2024 01:00
- Finished the homework (scored 6/6)
- Learn in public 1: weekly learning LinkedIn on 30 September 2024
- Learn in public 2 & 3: Explore more California housing dataset - EDA & predict the price of a house
- Learn in public 4 & 5: Setup iTerm2 for ML Zoomcamp
- medium.com on 9 October 2024
- LinkedIn on 9 October 2024
- Learn in public 6: Importance of cell order execution Slack on 8 October 2024
- Learn in public 7: Nobel Prize for AI in physics and chemistry Slack on 9 October 2024
- Explore more: Student Performance Data Set
- Explore more: UCI ML datasets
Homework 3: Machine Learning for Classification due 15 October 2024 01:00, extended to 17 October 2024 01:00
- Finished the homework (scored 5/6)
- Learn in public 1: weekly learning LinkedIn on 14 October 2024
- Learn in public 2, 3 & 4: Keep the pace
- Learn in public 5 & 6: Subtitle adventure
- Learn in public 7: Gemini is taking project lead Slack on 16 October 2024
- Explore more: exclude least useful features
- Explore more: scikit-learn in project of last week
- Explore more: use `LinearRegression` (not regularized) and `RidgeRegression` (regularized)
- Explore more: Find the best regularization parameter for Ridge
- Explore more: using the `OneHotEncoding` class
- Explore more: Lead scoring
- Explore more: Default prediction
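
For the two regression items above, a minimal sketch with a single feature, written in plain Python rather than scikit-learn, may help show what the regularization parameter does. The function names, the toy data, and the candidate alphas are all assumptions made for illustration. With `alpha = 0` the fit reduces to ordinary least squares.

```python
import math


def fit_ridge_1d(xs, ys, alpha):
    """Return (slope, intercept) minimizing sum((y - w*x - b)^2) + alpha * w^2.

    The intercept is not penalized. Raises ZeroDivisionError if all xs are
    equal and alpha is 0.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope towards zero
    intercept = y_mean - slope * x_mean
    return slope, intercept


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the line on the given points."""
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errors) / len(errors))


def best_alpha(x_train, y_train, x_val, y_val, alphas):
    """Pick the alpha with the lowest RMSE on the validation set."""
    return min(
        alphas,
        key=lambda a: rmse(x_val, y_val, *fit_ridge_1d(x_train, y_train, a)),
    )


# Toy data (hypothetical): y = 2x + 1 without noise.
x_train, y_train = [0, 1, 2, 3], [1, 3, 5, 7]
x_val, y_val = [4, 5], [9, 11]

print(fit_ridge_1d(x_train, y_train, alpha=0))   # unregularized fit
print(fit_ridge_1d(x_train, y_train, alpha=5))   # slope is shrunk
print(best_alpha(x_train, y_train, x_val, y_val, [0, 0.1, 1, 10]))
```

On noise-free data like this the unregularized fit wins. Regularization is expected to pay off mainly when the features are noisy or correlated.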
Homework 4: Evaluation Metrics for Classification due 22 October 2024 01:00, extended to 23 October 2024 23:00
- Finished the homework (scored 6/6)
- Learn in public 1: weekly learning LinkedIn on 21 October 2024
- Learn in public 2: help with MacBook Telegram on 21 October 2024
- Learn in public 3: library HH LinkedIn on 22 October 2024
- Learn in public 4: promotion for Humble Bundle for Data Visualisation Slack on 23 October 2024
- Learn in public 5 & 6: Working with VSCode
- Learn in public 7: comment LinkedIn on 23 October 2024
- Explore more: Check the precision and recall of the dummy classifier that always predicts "FALSE"
- Explore more: F1 score = 2 P R / (P + R)
- Explore more: Evaluate precision and recall at different thresholds, plot P vs R - this way you'll get the precision/recall curve (similar to ROC curve)
- Explore more: Area under the PR curve is also a useful metric
- Explore more: Calculate the metrics for the suggested datasets from the previous week
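
The metric items above can be sketched in plain Python, assuming binary labels and predicted scores. The helper names, the toy data, and the thresholds are hypothetical. Precision is undefined when nothing is predicted positive (as with the dummy classifier), and this sketch returns 1.0 in that case by convention so the curve ends at recall 0.

```python
def precision_recall(y_true, y_score, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    tp = fp = fn = 0
    for actual, score in zip(y_true, y_score):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
    # Assumed convention: precision is 1.0 when there are no positive predictions.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """F1 = 2 P R / (P + R), taken as 0.0 when P + R is zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pr_curve(y_true, y_score, thresholds):
    """Return one (precision, recall) point per threshold."""
    return [precision_recall(y_true, y_score, t) for t in thresholds]


def pr_auc(points):
    """Trapezoidal area under the PR curve, with points ordered by recall."""
    pts = sorted((recall, precision) for precision, recall in points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


# Toy data (hypothetical): 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

p, r = precision_recall(y_true, y_score, 0.5)
print(p, r, f1_score(p, r))

# A threshold above every score behaves like the "always FALSE" dummy classifier.
print(precision_recall(y_true, y_score, 1.1))

points = pr_curve(y_true, y_score, [0.0, 0.5, 1.1])
print(pr_auc(points))
```

With only three thresholds the area is a coarse approximation. A finer threshold grid gives a smoother curve.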
Homework 5: Deploying Machine Learning Models due 29 October 2024 00:00, extended to 31 October 2024 morning
- Finished the homework (scored 6/6)
- Learn in public 1: weekly learning LinkedIn on 28 October 2024
- Learn in public 2: python in zoo LinkedIn on 29 October 2024
- Learn in public 3: deployment LinkedIn on 29 October 2024
- Learn in public 4: deployment flask LinkedIn on 29 October 2024
- Learn in public 5: course leaderboard top10 LinkedIn on 30 October 2024
- Learn in public 6: docker LinkedIn on 30 October 2024
- Learn in public 7: beanstalk LinkedIn on 30 October 2024
- Explore more: Flask is not the only framework for creating web services. Try others, e.g. FastAPI
- Explore more: Experiment with other ways of managing environments, e.g. virtual env, conda, poetry
- Explore more: Explore other ways of deploying web services, e.g. GCP, Azure, Heroku, Python Anywhere, etc.
Homework 6: Decision Trees and Ensemble Learning due 5 November 2024 00:00
- Finished the homework (scored 6/6)
- Learn in public 1: leaderboard week5 LinkedIn on 4 November 2024
- Learn in public 2: weekly learning LinkedIn on 4 November 2024
- Learn in public 3: Decision Tree LinkedIn on 4 November 2024
- Learn in public 4: Decision Tree Parameter Tuning LinkedIn on 4 November 2024
- Learn in public 5: Random Forests LinkedIn on 4 November 2024
- Learn in public 6: Boosting LinkedIn on 4 November 2024
- Learn in public 7: Wrapping up week 6 LinkedIn on 4 November 2024
- Explore more: do EDA or feature engineering for this dataset to get more insights into the problem
- Explore more: For random forest, there are more parameters that we can tune. Check max_features and bootstrap.
- Explore more: Try ExtraTreesClassifier
- Explore more: Check whether not filling NAs helps improve performance (XGBoost).
- Explore more: Experiment with other XGBoost parameters: subsample and colsample_bytree.
- Explore more: When selecting the best split, decision trees find the most useful features. This information can be used for understanding which features are more important than others. See example here (?) for random forest (it's the same for plain decision trees) and for xgboost
- Explore more: using trees solving regression problems: check DecisionTreeRegressor, RandomForestRegressor and the objective=reg:squarederror parameter for XGBoost
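
The link between split selection and feature importance mentioned above can be illustrated with a small plain-Python sketch. It scores each feature by the largest Gini impurity decrease that a single threshold split achieves at the root. This is a simplification of what tree libraries report (they accumulate decreases over all nodes), and the feature names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, impurity_decrease) for the best 'value <= threshold' split.

    Returns (None, 0.0) when no split reduces the impurity.
    """
    n = len(labels)
    parent = gini(labels)
    best = (None, 0.0)
    # The largest value is skipped: it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        decrease = parent - weighted
        if decrease > best[1]:
            best = (threshold, decrease)
    return best


# Toy data (hypothetical): "income" separates the classes, "age" barely does.
features = {
    "income": [10, 20, 30, 40, 50, 60],
    "age": [25, 47, 31, 52, 29, 44],
}
labels = [0, 0, 0, 1, 1, 1]

ranking = sorted(
    ((name, *best_split(values, labels)) for name, values in features.items()),
    key=lambda item: item[2],
    reverse=True,
)
for name, threshold, decrease in ranking:
    print(name, threshold, round(decrease, 3))
```

The feature that yields the larger impurity decrease is ranked first, which is the intuition behind impurity-based importance scores.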
Homework 8: Neural Networks and Deep Learning due 3 December 2024 01:00, extended to 5 December 2024
- Finished the homework (scored 5/6)
- Learn in public 1: Wrapping up week 8 LinkedIn on 04 December 2024
- Learn in public 2: CNNs LinkedIn on 04 December 2024
- Learn in public 3: Transfer learning LinkedIn on 04 December 2024
- Learn in public 4: Regularization and Dropout LinkedIn on 04 December 2024
- Learn in public 5: Augmentation LinkedIn on 04 December 2024
- Learn in public 6: Training LinkedIn on 04 December 2024
- Learn in public 7: Env / GPU wrap up LinkedIn on 04 December 2024
- Explore more: Add more data, e.g. Zalando
- Explore more: Albumentations - another way of generating augmentations
- Explore more: Use PyTorch or MXNet instead of TensorFlow/Keras
- Explore more: In addition to Xception, there are other architectures - try them
- Explore more: Project: cats vs dogs
- Explore more: Project: Hotdog vs not hotdog
- Explore more: Project: Category of images
Homework 9: Serverless Deep Learning due 10 December 2024 00:00
- Finished the homework (scored 6/6)
- Learn in public 1: leaderboard week 9 LinkedIn on 09 December 2024
- Learn in public 2: Wrapping up week 9 LinkedIn on 09 December 2024
- Learn in public 3: TF lite LinkedIn on 09 December 2024
- Learn in public 4: Lambda function 1 LinkedIn on 09 December 2024
- Learn in public 5: Docker for Lambda LinkedIn on 09 December 2024
- Learn in public 6: AWS API Gateway LinkedIn on 10 December 2024
- Learn in public 7: wrap up week 9 LinkedIn on 10 December 2024
- Explore more: Try similar serverless services from Google Cloud and Microsoft Azure
- Explore more: Deploy cats vs dogs and other Keras models with AWS Lambda
- Explore more: AWS Lambda is also good for other libraries, not just TensorFlow. You can deploy Scikit-Learn and XGBoost models with it as well
Homework 10: Kubernetes and TensorFlow Serving due 17 December 2024 00:00
- Finished the homework (scored x/7)
- Learn in public 1: docker-compose LinkedIn on 20 December 2024
- Learn in public 2: leaderboard week 10 LinkedIn on 20 December 2024
- Learn in public 3: overview kubernetes picture LinkedIn on 21 December 2024
- Learn in public 4: overview kubernetes LinkedIn on 21 December 2024
- Learn in public 5: pingpong LinkedIn on 21 December 2024
- Learn in public 6: local kubernetes LinkedIn on 21 December 2024
- Learn in public 7: aws eks LinkedIn on 21 December 2024
- Explore more: Other local Kubernetes: minikube, k3d, k3s, microk8s, EKS Anywhere
- Explore more: Rancher desktop
- Explore more: Docker desktop
- Explore more: Lens
- Explore more: Many cloud providers have Kubernetes: GCP, Azure, DigitalOcean and others. Look for "Managed Kubernetes" in your favourite search engine
- Explore more: Deploy the model from previous modules and from your project with Kubernetes
- Explore more: Learn about Kubernetes namespaces. Here we used the default namespace
Midterm Project: due 26 November 2024 00:00
scored 16/16
- Start: 4 November 2024
- Problem description: README
- context
- problem
- use of solution
- Find a dataset: Classification Mushroom Data 2020
- Do EDA
- ranges of values
- missing values
- analysis of target variable
- feature importance analysis
- Model training
- Train multiple models
- tune hyperparameters
- Export the best model to script and save model
- Reproducibility
- re-execute the notebook without errors
- execute the training script without errors
- submit the dataset
- Deploy the model - Flask / Streamlit
- Flask
- Streamlit
- Dependency and environment management
- conda environment
- Makefile
- environment.yml
- requirements.txt
- Pipfile
- Readme
- installation
- activation
- Containerization
- Dockerfile
- README
- build a container
- run
- Cloud deployment
- AWS
- docker - beanstalk
- README
- deployment described with code
- cloud
- URL for testing / video or screenshot of testing
- deployment described with code
- AWS
- Streamlit app
- search field
- generate random mushroom
- prettify table
- Write a report / make a presentation / video
- Learn in public 1: Leaderboard LinkedIn on 7 November 2024
- Learn in public 2: Prediction working LinkedIn on 25 November 2024
- Learn in public 3: Picture of shroom LinkedIn on 25 November 2024
- Learn in public 4: 3D Scatter plot numerical fungi features LinkedIn on 25 November 2024
- Learn in public 5: 3D Scatter plot interactive LinkedIn on 25 November 2024
- Learn in public 6: Largest cap-diameter LinkedIn on 25 November 2024
- Learn in public 7: Correlation matrix LinkedIn on 26 November 2024
- Learn in public 8: Long tails LinkedIn on 26 November 2024
- Learn in public 9: Update Setup Saturn Cloud GitHub on 25 November 2024
- Learn in public 10: Fungi without stem LinkedIn on 26 November 2024
- Learn in public 11: EB error LinkedIn on 26 November 2024
- Learn in public 12: Harvest time LinkedIn on 26 November 2024
- Learn in public 13: Slack feedback LinkedIn on 26 November 2024
- Learn in public 14: Pictures from epub LinkedIn on 26 November 2024
- Project 1 - Laptop price prediction, see evaluation done on 29 November 2024
- Project 2 - loan approval, see evaluation done on 29 November 2024
- Project 3 - diabetes, see evaluation done on 29 November 2024
scored xx/16
- Start: 13 December 2024
- Problem description: README
- context
- problem
- use of solution
- Find a dataset: Dog Breed
- Do Image EDA for object detection
- dataset overview
- visual inspection
- image properties
- outliers
- Model training
- Train multiple models
- tune hyperparameters
- Export the best model to script and save model
- Reproducibility
- re-execute the notebook without errors
- execute the training script without errors
- submit the dataset
- Deploy the model - Flask / Streamlit
- Flask
- Streamlit
- Dependency and environment management
- conda environment
- Makefile
- environment.yml
- requirements.txt
- Pipfile
- Readme
- installation
- activation
- Containerization
- Dockerfile
- README
- build a container
- run
- Cloud deployment
- AWS
- docker - beanstalk
- README
- deployment described with code
- cloud
- URL for testing / video or screenshot of testing
- deployment described with code
- AWS
- Web app (optional)
- Write a report / make a presentation