The goal of this project is the development of a fully automated optimization algorithm for (convolutional) neural networks. The algorithm optimizes both the neural architecture and all hyperparameters of the network. The following pipeline for the (nested) algorithm is proposed:
- A Nelder-Mead algorithm first optimizes the neural architecture and the parameters of the ensuing genetic algorithm used for training the network.
- A genetic algorithm then optimizes the hyperparameters of the fixed network architecture proposed by the Nelder-Mead algorithm.
As an exemplary problem, the MNIST image classification dataset is used in this project.
The genetic algorithm optimizes all hyperparameters of a (convolutional) neural network, i.e.:
- Weights of the fully connected layers
- Biases of the fully connected layers
- Filters of the convolutional layers
- Activation functions of all layers (chosen from a fixed list of candidate functions)
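A minimal sketch of how such a genome could be encoded is shown below. The layer sizes, the initialization scale, and the candidate activation list are illustrative assumptions, not the project's actual configuration: each individual holds the weights and biases of a small MLP plus one integer index per layer selecting its activation function.

```python
import numpy as np

# Hypothetical candidate activations the GA can choose between per layer.
ACTIVATIONS = [np.tanh, lambda x: np.maximum(x, 0.0)]  # tanh, ReLU

rng = np.random.default_rng(0)
layer_sizes = [784, 16, 10]  # MNIST input -> hidden -> 10 classes (assumed sizes)

def random_individual():
    # One GA individual: all trainable parameters plus activation choices.
    return {
        "weights": [rng.normal(0.0, 0.1, (m, n))
                    for m, n in zip(layer_sizes[:-1], layer_sizes[1:])],
        "biases": [np.zeros(n) for n in layer_sizes[1:]],
        "activations": rng.integers(0, len(ACTIVATIONS), len(layer_sizes) - 1),
    }

def forward(genome, x):
    # Evaluate the MLP encoded by the genome on a batch of inputs.
    for W, b, a in zip(genome["weights"], genome["biases"], genome["activations"]):
        x = ACTIVATIONS[a](x @ W + b)
    return x

batch = rng.normal(size=(4, 784))
out = forward(random_individual(), batch)
print(out.shape)  # (4, 10)
```

The fitness of an individual would then be the classification accuracy of `forward` on a validation batch.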
The genetic algorithm itself uses the following operations:
- Initialization: Fixed population size N
- Fitness function: Accuracy of the neural network
- Selection: Roulette-wheel selection with elitism
- Crossover: Neuron-wise crossover with a fixed crossover probability
- Mutation: Additive Gaussian noise with fixed mutation probabilities
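The operations above can be sketched as a single generation step. For brevity each individual is represented here as one weight matrix whose rows are neurons (so "neuron-wise crossover" swaps whole rows); the parameter defaults are illustrative assumptions, not the project's tuned values.

```python
import numpy as np

rng = np.random.default_rng(1)

def step(population, fitness, n_elite=2, p_cross=0.7, p_mut=0.05, sigma=0.1):
    """One GA generation: elitism, roulette-wheel selection,
    neuron-wise crossover, and additive Gaussian mutation."""
    N = len(population)
    order = np.argsort(fitness)[::-1]
    # Elitism: carry the best individuals over unchanged.
    new_pop = [population[i].copy() for i in order[:n_elite]]
    # Roulette wheel: selection probability proportional to fitness
    # (assumes non-negative fitness, e.g. accuracy).
    probs = fitness / fitness.sum()
    while len(new_pop) < N:
        a = population[rng.choice(N, p=probs)]
        b = population[rng.choice(N, p=probs)]
        child = a.copy()
        # Neuron-wise crossover: each row (neuron) comes from parent b
        # with probability p_cross, otherwise from parent a.
        swap = rng.random(child.shape[0]) < p_cross
        child[swap] = b[swap]
        # Mutation: perturb each weight with probability p_mut.
        mask = rng.random(child.shape) < p_mut
        child[mask] += rng.normal(0.0, sigma, mask.sum())
        new_pop.append(child)
    return new_pop

# Toy usage with a stand-in fitness (the real fitness is MNIST accuracy).
pop = [rng.normal(size=(8, 5)) for _ in range(10)]
fit = np.array([float(np.abs(ind).sum()) for ind in pop])
pop = step(pop, fit)
```

In the full algorithm this step would be repeated for a fixed number of generations, re-evaluating the accuracy-based fitness each time.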
The Nelder-Mead algorithm optimizes the neural architecture as well as the parameters of the genetic algorithm, i.e.:
- Population size
- Selection size
- Mutation probabilities
- Crossover probability
- Gaussian noise variance
- Convolutional filter dimensions
- Convolutional filter strides
- Convolutional pooling strides
- Fully connected layer widths
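The outer loop can be sketched with SciPy's Nelder-Mead implementation. The objective below is a smooth toy surrogate standing in for "negative best accuracy of a full GA run", so the example runs instantly; the parameter subset, starting point, and optimum location are all illustrative assumptions.

```python
from scipy.optimize import minimize

def negative_accuracy(x):
    # Stand-in for the real objective: run the GA with population size
    # x[0], mutation probability x[1], and noise std x[2], then return
    # the negative of the best accuracy found. This toy surrogate has
    # its minimum at (60, 0.05, 0.1).
    pop_size, p_mut, sigma = x
    return ((pop_size - 60.0) ** 2 / 1e3
            + (p_mut - 0.05) ** 2
            + (sigma - 0.1) ** 2)

result = minimize(
    negative_accuracy,
    x0=[20.0, 0.3, 0.5],
    method="Nelder-Mead",
    options={"xatol": 1e-5, "fatol": 1e-8, "maxiter": 2000},
)
print(result.x)  # close to [60, 0.05, 0.1]
```

Note that integer-valued parameters such as the population size or layer widths would have to be rounded inside the objective, since Nelder-Mead operates on a continuous simplex.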
The developed optimization algorithm achieved 87% accuracy on the MNIST dataset with a multi-layer perceptron and 89% accuracy with a convolutional neural network during testing. However, the current algorithm struggles with larger models due to the large number of optimization parameters and consistently converges to relatively simple models. Further work is required before the algorithm can explore deeper models and benefit from the increased performance they can provide.