You will work with the Fashion MNIST dataset, aiming to implement a Conditional Generative Adversarial Network (Conditional GAN) to generate images from this dataset. Start by splitting the dataset into training and test sets. While maintaining simplicity, focus on devising an innovative approach to analyze the test dataset using the Conditional GAN model. Employ data science techniques to explore and gain valuable insights from the generated data.
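For context, the train/test split can be obtained directly from torchvision (a minimal sketch, assuming a PyTorch-based pipeline; the batch size and data path are illustrative, not the repository's actual settings):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # scales pixels to [0, 1]

# torchvision ships Fashion MNIST pre-split into train and test sets.
train_set = datasets.FashionMNIST(root="data", train=True, download=True, transform=transform)
test_set = datasets.FashionMNIST(root="data", train=False, download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)
```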
The architecture and implementation wholly follow the CGAN paper [1], including the choice of optimizers, schedulers, hyperparameters, etc. As described in the paper, the Maxout activation [2] is used in the discriminator.
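As a reference point, a minimal Maxout layer could be implemented in PyTorch as follows (an illustrative sketch, assuming a PyTorch implementation; this is not the repository's exact module):

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    """Maxout activation [2]: the max over `num_pieces` parallel linear maps."""

    def __init__(self, in_features, out_features, num_pieces):
        super().__init__()
        self.out_features = out_features
        self.num_pieces = num_pieces
        self.linear = nn.Linear(in_features, out_features * num_pieces)

    def forward(self, x):
        out = self.linear(x)                                   # (B, out * pieces)
        out = out.view(-1, self.out_features, self.num_pieces) # (B, out, pieces)
        return out.max(dim=2).values                           # (B, out)
```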
During training, the following files are written to the auto-generated `runs` folder: the best generator and discriminator models; a checkpoint from the final epoch; samples generated during training; a loss plot; an FID score plot.
During evaluation, the files created in the `eval` folder are: a batch of generated images with their labels; a batch of real data used in the evaluation; and dimensionality-reduction plots (PCA, t-SNE) for the real and generated batches.
For logging, I used Weights & Biases, which indeed produces nice plots of the training metrics and logs.
To evaluate the quality of the generated data, I used the FID metric and dimensionality-reduction analysis. The models were trained for 100 epochs, and the checkpoints corresponding to the best models were selected for evaluation.
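For reference, the FID can be computed with `torchmetrics` along these lines (a sketch, not necessarily the repository's exact evaluation code; `real_batch` and `fake_batch` below are random placeholder tensors):

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# normalize=True expects float images in [0, 1].
fid = FrechetInceptionDistance(feature=2048, normalize=True)

# Placeholder batches of shape (N, 1, 28, 28). Fashion MNIST is grayscale,
# while InceptionV3 expects 3 channels, so the single channel is repeated.
real_batch = torch.rand(1000, 1, 28, 28)
fake_batch = torch.rand(1000, 1, 28, 28)

fid.update(real_batch.repeat(1, 3, 1, 1), real=True)
fid.update(fake_batch.repeat(1, 3, 1, 1), real=False)
print(f"FID: {fid.compute().item():.2f}")
```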
The following plot shows the loss values for both the discriminator and the generator during training.
The plot below shows the FID score computed at each epoch on 1000 samples.
The FID score is currently quite high, but the model was only trained for 100 epochs; I believe it could decrease further with longer training.
The image below presents 196 generated samples. The quality is visibly far from ideal; the generated images appear noisy, so there is room for improvement.
To compare the feature distributions of real and generated data, I used two algorithms: PCA and t-SNE. PCA is quite straightforward, but at first I didn't find its clusters very meaningful, even for the real data. t-SNE, in my view, provides better clustering, but it is stochastic (and it can even cluster random Gaussian noise, as noted in [4]). Hence, I kept both methods.
To extract features from the images, I used a pretrained InceptionV3 model, the same network used to compute the FID score.
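A minimal sketch of the dimensionality-reduction step with scikit-learn is shown below (assuming `features` is an (N, 2048) array of InceptionV3 activations and `labels` the corresponding class ids; both are random placeholders here):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Placeholder features and labels; in practice these come from InceptionV3.
features = np.random.rand(1000, 2048)
labels = np.random.randint(0, 10, size=1000)

for name, reducer in [("PCA", PCA(n_components=2)),
                      ("t-SNE", TSNE(n_components=2, perplexity=30))]:
    embedding = reducer.fit_transform(features)
    plt.figure()
    plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab10", s=5)
    plt.title(name)
    plt.colorbar(label="class")
plt.show()
```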
Real data
For both methods, the images corresponding to label 9 (ankle boot) and label 8 (bag) are well separated, while labels 0 (t-shirt), 2 (pullover), and 3 (dress) tend to mix, likely because they share similar visual concepts.
Generated data
The PCA clusters for the generated images are more mixed, though still visible. Surprisingly, the t-SNE clusters for the generated images are better separated. I suspect that the lower quality and more simplified nature of these images makes them easier to distinguish.
To improve the quality of the generated images, I applied some of the techniques proposed in [3] (see the code sketch after the list below):
- One-sided label smoothing (instead of labeling real data as 1, label it with a value slightly less than 1, e.g., 0.9)
- Xavier initialization for both models
- Use of LeakyReLU
I also tried normalising the images to [-1, 1] and using tanh at the generator output, but this did not work well for me.
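As an illustration, here is a minimal sketch of these techniques in PyTorch (assuming a PyTorch implementation; `generator`, `discriminator`, and the batch size are placeholders, not the repository's actual names):

```python
import torch
import torch.nn as nn

# One-sided label smoothing: real samples are labeled 0.9 instead of 1.0,
# while fake samples keep the hard label 0.
batch_size = 128  # placeholder value
real_targets = torch.full((batch_size, 1), 0.9)
fake_targets = torch.zeros(batch_size, 1)

# Xavier initialization for the linear layers of both models.
def init_weights(module):
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

# generator.apply(init_weights)      # `generator` and `discriminator` stand in
# discriminator.apply(init_weights)  # for the CGAN models (placeholders here)

# LeakyReLU in place of ReLU for the hidden layers.
activation = nn.LeakyReLU(0.2)
```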
The following plot shows the loss values for both the discriminator and the generator during training.
The minimum value of the generator loss is smaller than in the previous run.
The FID score starts at a lower value in the first epoch; apart from that, it does not seem to improve much over these 100 epochs.
The generated images seem a bit less noisy (or perhaps I am trying to convince myself).
Real data
Generated data
The clusters obtained for the generated data seem slightly improved for both methods. For a more thorough comparison, longer training is necessary.
```
root/
├── eval/                      # Folder generated during evaluation for each run
│   ├── 2025-01-08_01-55-51/   # Evaluation results for CGAN based on [1]
│   └── 2025-01-09_01-00-08/   # Evaluation results for CGAN based on [1] with improved techniques from [3]
├── plots/                     # Plots for README.md taken from Weights & Biases
├── runs/                      # Folder generated during training for each run
│   ├── 2025-01-08_00-59-57/   # Training results for CGAN based on [1]
│   └── 2025-01-09_00-33-46/   # Training results for CGAN based on [1] with improved techniques from [3]
└── src/                       # Source code with implementation
```
- Install the packages and go to the `src` folder:

```bash
pip install -r requirements.txt
cd src
```
- Run the script (1) to train a model, or (2) to run the evaluation:

```bash
### (1) train
python main.py --params params.yaml --train
### or
python main.py --train
```

```bash
### (2) evaluate
python main.py --params params.yaml --eval
### or
python main.py --eval
```
The argument `--params` is a path to the YAML file with hyperparameters; the default is `params.yaml`.
The results aren't as promising as I had hoped 😂: the quality of the generated images is currently not that high. Upon further reflection, I suppose that using convolutional layers instead of fully connected layers would likely have improved performance... Anyway, it was still a nice task for me 😇
[1] Mirza, M., & Osindero, S. (2014). Conditional Generative Adversarial Nets. https://arxiv.org/abs/1411.1784
[2] Goodfellow, I. J., et al. (2013). Maxout Networks. https://arxiv.org/abs/1302.4389
[3] Salimans, T., et al. (2016). Improved techniques for training GANs. https://arxiv.org/abs/1606.03498
[4] Wattenberg, M., et al. (2016). How to Use t-SNE Effectively. Distill. http://doi.org/10.23915/distill.00002