EMLO4 - Session 03

Docker Compose for MNIST Training, Evaluation, and Inference

In this assignment, you will create a Docker Compose configuration to perform training, evaluation, and inference on the MNIST dataset.

Requirements:

You’ll need to use this model and training technique (MNIST Hogwild): https://github.com/pytorch/examples/tree/main/mnist_hogwild
Set the number of processes to 2 for MNIST Hogwild.
Create three services in the Docker Compose file: train, evaluate, and infer.
Use a shared volume called mnist for sharing data between the services.
The train service should:
- Look for a checkpoint file in the volume. If found, resume training from that checkpoint. Train for ONLY 1 epoch and save the final checkpoint. Once done, exit.
The evaluate service should:
- Look for the final checkpoint file in the volume. Evaluate the model using the checkpoint and save the evaluation metrics in a json file. Once done, exit.
- Share the model code by importing the model in eval.py instead of copy-pasting it.
The infer service should:
- Run inference on any 5 random MNIST images and save the results (images with file name as predicted number) in the results folder in the volume. Then exit.
After running all the services, ensure that the model checkpoint and the results are available in the mnist volume.
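The three services only cooperate if they agree on where files live in the shared volume. The following is a minimal sketch of that contract, not the repository's actual code. The file names mnist_cnn.pt and eval_results.json and the results folder come from this README. The helper names (find_checkpoint, save_eval_results, result_image_path), the volume_dir parameter, and the .png extension are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Optional

CHECKPOINT_NAME = "mnist_cnn.pt"         # checkpoint file name used in this README
EVAL_RESULTS_NAME = "eval_results.json"  # metrics file name used in this README
RESULTS_DIR_NAME = "results"             # inference output folder in the volume


def find_checkpoint(volume_dir: Path) -> Optional[Path]:
    """Return the checkpoint path if it exists in the volume, else None.

    The train service can use this to decide whether to resume or start fresh.
    """
    path = volume_dir / CHECKPOINT_NAME
    return path if path.is_file() else None


def save_eval_results(volume_dir: Path, test_loss: float, accuracy: float) -> Path:
    """Write the evaluation metrics as JSON and return the file path."""
    out = volume_dir / EVAL_RESULTS_NAME
    out.write_text(json.dumps({"Test loss": test_loss, "Accuracy": accuracy}))
    return out


def result_image_path(volume_dir: Path, predicted_digit: int) -> Path:
    """Create the results folder if needed and return a path named after the prediction.

    Assumption: images are stored as PNG files.
    """
    results_dir = volume_dir / RESULTS_DIR_NAME
    results_dir.mkdir(parents=True, exist_ok=True)
    return results_dir / f"{predicted_digit}.png"
```

Note that two of the five random images may receive the same predicted digit, in which case the second file would overwrite the first. Whether to add a suffix to avoid that is a design choice this sketch leaves open.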

Detailed Instructions:

Build all the Docker images using docker compose build.
Run the Docker Compose services using docker compose run train, docker compose run evaluate, and docker compose run infer. Verify that all services have completed successfully.
Check if the checkpoint file (mnist_cnn.pt) is saved in the mnist volume. If found, display "Checkpoint file found." If not found, display "Checkpoint file not found!" and exit with an error.
Check if the evaluation results file (eval_results.json) is saved in the mnist volume.
- Example: {"Test loss": 0.0890245330810547, "Accuracy": 97.12}
Check the contents of the results folder in the mnist volume to see whether the inference results are saved.
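The checks above can be sketched as a small Python function. This is an illustrative sketch under stated assumptions, not the provided grading script. The name verify_volume and the volume_dir parameter are hypothetical. Only the two checkpoint messages and the error exit for a missing checkpoint are specified in this README; the other messages and the non-zero return for a missing metrics file or empty results folder are assumptions.

```python
import json
from pathlib import Path


def verify_volume(volume_dir: Path) -> int:
    """Return 0 if the expected artifacts exist in the volume, 1 otherwise."""
    checkpoint = volume_dir / "mnist_cnn.pt"
    if not checkpoint.is_file():
        print("Checkpoint file not found!")
        return 1
    print("Checkpoint file found.")

    eval_results = volume_dir / "eval_results.json"
    if not eval_results.is_file():
        # Assumption: a missing metrics file is treated as a failure.
        print("Evaluation results file not found!")
        return 1
    print("Evaluation results:", json.loads(eval_results.read_text()))

    results_dir = volume_dir / "results"
    saved = sorted(p.name for p in results_dir.iterdir()) if results_dir.is_dir() else []
    print("Inference results:", saved)
    # Assumption: an empty or missing results folder is treated as a failure.
    return 0 if saved else 1
```

To use it as a script, one option is to call sys.exit(verify_volume(Path(some_mount_path))), where some_mount_path is wherever the mnist volume is mounted on your machine or in a helper container.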

The provided grading script will run the Docker Compose configuration, check for the required files, display the results, and perform size and version checks.

You can run it yourself before pushing the code to your repo.

The-School-of-AI/emlo4-session-03-abhiyagupta

Folders and files

Latest commit

History

Repository files navigation

EMLO4 - Session 03

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages