Docker Compose for MNIST Training, Evaluation, and Inference
In this assignment, you will create a Docker Compose configuration to perform training, evaluation, and inference on the MNIST dataset.
Requirements:
- You’ll need to use this model and training technique (MNIST Hogwild): https://github.com/pytorch/examples/tree/main/mnist_hogwild
- Set the number of processes to 2 for MNIST Hogwild (the example's `--num-processes` argument).
- Create three services in the Docker Compose file: `train`, `evaluate`, and `infer`.
- Use a shared volume called `mnist` for sharing data between the services.
- The `train` service should:
  - Look for a checkpoint file in the volume. If found, resume training from that checkpoint. Train for ONLY 1 epoch and save the final checkpoint. Once done, exit.
- The `evaluate` service should:
  - Look for the final checkpoint file in the volume. Evaluate the model using the checkpoint and save the evaluation metrics in a JSON file. Once done, exit.
  - Share the model code by importing the model instead of copy-pasting it in `eval.py`.
- The `infer` service should:
  - Run inference on any 5 random MNIST images and save the results (images whose file name is the predicted number) in the `results` folder in the volume. Then exit.
- After running all the services, ensure that the model and results are available in the `mnist` volume.
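
The checkpoint lookup, metrics file, and result naming described above can be sketched with a few standard-library helpers. This is a minimal sketch, not a reference solution: the mount point `/opt/mount`, the helper names `find_checkpoint`, `save_metrics`, and `result_image_path`, and the `.png` extension are all assumptions. The file names `mnist_cnn.pt` and `eval_results.json` and the JSON keys come from the checks listed under Detailed Instructions.

```python
import json
from pathlib import Path
from typing import Optional

# Assumption: the `mnist` volume is mounted at this path in every service.
VOLUME_DIR = Path("/opt/mount")
CHECKPOINT_NAME = "mnist_cnn.pt"    # checkpoint name used in the grading checks
METRICS_NAME = "eval_results.json"  # metrics file name used in the grading checks


def find_checkpoint(volume_dir: Path = VOLUME_DIR) -> Optional[Path]:
    """Return the checkpoint path if it exists in the volume, else None."""
    path = volume_dir / CHECKPOINT_NAME
    return path if path.is_file() else None


def save_metrics(test_loss: float, accuracy: float,
                 volume_dir: Path = VOLUME_DIR) -> Path:
    """Write evaluation metrics as JSON, using the keys from the example output."""
    volume_dir.mkdir(parents=True, exist_ok=True)
    path = volume_dir / METRICS_NAME
    path.write_text(json.dumps({"Test loss": test_loss, "Accuracy": accuracy}))
    return path


def result_image_path(predicted_digit: int,
                      volume_dir: Path = VOLUME_DIR) -> Path:
    """Build the output path for one inference image, named after the prediction."""
    results_dir = volume_dir / "results"
    results_dir.mkdir(parents=True, exist_ok=True)
    return results_dir / f"{predicted_digit}.png"
```

In a training script, a non-`None` result from `find_checkpoint()` would typically be loaded with something like `model.load_state_dict(torch.load(path))` before the single epoch runs. Note that two images predicted as the same digit would map to the same path here, so a real inference script may need to disambiguate the names.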
Detailed Instructions:
- Build all the Docker images using `docker compose build`.
- Run the Docker Compose services using `docker compose run train`, `docker compose run evaluate`, and `docker compose run infer`. Verify that all services have completed successfully.
- Check if the checkpoint file (`mnist_cnn.pt`) is saved in the `mnist` volume. If found, display "Checkpoint file found." If not found, display "Checkpoint file not found!" and exit with an error.
- Check if the evaluation results file (`eval_results.json`) is saved in the `mnist` volume.
  - Example: `{"Test loss": 0.0890245330810547, "Accuracy": 97.12}`
- Check the contents of the `results` folder in the `mnist` volume to see if the inference results are saved.
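
The file checks above can be approximated locally with a short script. This is a hedged sketch and not the provided grading script: the function name `check_volume`, the directory argument, and every message other than the two checkpoint messages quoted above are assumptions.

```python
from pathlib import Path


def check_volume(volume_dir: Path) -> int:
    """Run the file checks and return a process exit code (0 means success)."""
    # Checkpoint check, using the messages quoted in the instructions.
    if (volume_dir / "mnist_cnn.pt").is_file():
        print("Checkpoint file found.")
    else:
        print("Checkpoint file not found!")
        return 1

    # Evaluation metrics check: print the JSON so it can be inspected.
    metrics = volume_dir / "eval_results.json"
    if metrics.is_file():
        print(metrics.read_text())
    else:
        print("Evaluation results file not found!")  # assumed message
        return 1

    # Inference results check: list whatever was written to results/.
    results_dir = volume_dir / "results"
    files = sorted(p.name for p in results_dir.iterdir()) if results_dir.is_dir() else []
    print("Inference results:", files)
    return 0 if files else 1
```

The return value can be passed to `sys.exit`, with `volume_dir` pointing at wherever the `mnist` volume's contents are reachable, for example a path inside a container that mounts the volume.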
The provided grading script will run the Docker Compose configuration, check for the required files, display the results, and perform size and version checks.
You can run it yourself before pushing the code to your repo.