Input file: fer2013.csv
Output file: fer2013_augmented.csv
From the analysis of the dataset it emerges that it is:
- small: it has only 35887 images
- not uniformly distributed: happiness has 8989 samples, while disgust has only 547.
The analysis and the augmentation are done in the fer2013_augmenter.ipynb file.
The transformation class is implemented in fer2013_augmenter.py and is invoked by the notebook.
We decided to enlarge the dataset by applying some transformations to the images.
The original dataset is almost 290 MB, and each transformation adds roughly another 290 MB.
The transformations applied are the following, defined in the class Filters(Enum):
- SOBEL
- VERTICAL
- HORIZONTAL
- CONTRAST_LOW
- CONTRAST_HIGH
- CONTRAST_VERY_HIGH
- FLIP_HORIZONTAL
- ROT_LEFT_60_DEGREES
- ROT_LEFT_40_DEGREES
- ROT_LEFT_20_DEGREES
- ROT_RIGHT_20_DEGREES
- ROT_RIGHT_40_DEGREES
- ROT_RIGHT_60_DEGREES
The final size will be almost 14 * 290 MB ≈ 4 GB (original images included).
If this is too large, it is easy to create a smaller dataset by editing the notebook and selecting only some of the filters.
The filters are currently implemented in three possible ways:
- using a lambda function: (x, y, pixel) -> pixel
  - used for altering the contrast
- using a filter matrix in a function: (image, filter) -> image
  - used for the vertical and horizontal filters
- using a custom function: (image, custom_parameter) -> image
  - used for the Sobel filter, flipping, and rotations
Before applying a filter-based transformation, the image is padded so that the output does not have lower dimensions.
Applying a stride is currently not supported.
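The three styles above can be sketched in pure Python; the function names below are illustrative stand-ins, not the actual fer2013_augmenter API, and a grayscale image is assumed to be a list of rows of pixel values.

```python
def apply_lambda(image, fn):
    """Per-pixel style: fn(x, y, pixel) -> pixel (used for contrast)."""
    return [[fn(x, y, p) for x, p in enumerate(row)]
            for y, row in enumerate(image)]

def apply_matrix(image, kernel):
    """Matrix style: zero-pad the image so the output keeps its size."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    padded = [[0] * (w + 2 * pad) for _ in range(pad)] + \
             [[0] * pad + row + [0] * pad for row in image] + \
             [[0] * (w + 2 * pad) for _ in range(pad)]
    # no stride support: the kernel is applied at every pixel position
    return [[sum(kernel[i][j] * padded[y + i][x + j]
                 for i in range(k) for j in range(k))
             for x in range(w)] for y in range(h)]

def flip_horizontal(image, _custom_parameter=None):
    """Custom-function style: (image, custom_parameter) -> image."""
    return [row[::-1] for row in image]
```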
If you want to add any type of transformation you just need to:
- add the transformation name to Filters(Enum)
- for lambda/matrix filters: add the lambda/matrix to the list of lambdas/matrices
- for custom functions: add the function and edit the generate_all_filters function, adding the function call
- initialize and execute the class
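As a hypothetical sketch of the first two steps (the enum members and the lambda registry below are stand-ins, not the real fer2013_augmenter code):

```python
from enum import Enum, auto

class Filters(Enum):
    SOBEL = auto()
    CONTRAST_LOW = auto()
    # step 1: add the new transformation name to the enum
    CONTRAST_EXTREME = auto()

# step 2 (lambda-style filter): register the per-pixel lambda
contrast_lambdas = {
    Filters.CONTRAST_LOW: lambda x, y, p: int(p * 0.5),
    Filters.CONTRAST_EXTREME: lambda x, y, p: min(255, int(p * 3)),
}

def apply_to_image(image, filt):
    """The runner looks the lambda up and applies it to every pixel."""
    fn = contrast_lambdas[filt]
    return [[fn(x, y, p) for x, p in enumerate(row)]
            for y, row in enumerate(image)]
```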
The image recognition is done by a Convolutional Neural Network, using PyTorch.
The CNN is created by the classes DynamicNetBasic and DynamicNetInceptions, which are subclasses of torch.nn.Module.
Both classes allow creating dynamic nets (with a variable number of layers), and their constructors make it easy to try many different nets by simply changing a few parameters.
Structure of the CNNs: DynamicNetBasic vs DynamicNetInceptions (original comparison diagram omitted).
The C-Block (convolutional block) is formed by a Conv2D, optionally a Dropout, and a ReLU: Conv-Drop-ReLU with dropout, or plain Conv-ReLU without it.
So, for both classes, the full view of the first part of the structure is the following:
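A C-Block could be sketched in PyTorch as follows (a minimal sketch: the 3x3 kernel and padding of 1 are assumptions, not stated in this document):

```python
import torch
import torch.nn as nn

def c_block(in_ch, out_ch, drop_prob):
    """Conv-Drop-ReLU block; a drop_prob <= 0 yields a plain Conv-ReLU."""
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)]
    if drop_prob > 0:
        layers.append(nn.Dropout2d(drop_prob))
    layers.append(nn.ReLU())
    return nn.Sequential(*layers)
```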
The class DynamicNetBasic has a linear structure and has the following parameters (divided by the step in which they are used):
- List( List( C-Block ), MaxPool2D ):
  - double `drop__before_relu`: percentage of dropout probability after each Conv2D.
    - NB: to use a Conv-ReLU without dropout, pass a value $\le 0$.
  - integer `conv__in_channels`: number of channels in input (the number of filters used).
  - tuple of integer `conv__out_channels`: each element represents the number of channels in output for all the Conv2Ds inside the inner list.
    - NB: typically you want to increase the number of channels in the convolutional part.
  - tuple of integer `conv__layer_repetitions`: each element represents the number of times each inner list must be repeated before the MaxPool2D.
    - NB: the first Conv2D has shape $in\_chan \rightarrow out\_chan$, the others $out\_chan \rightarrow out\_chan$.
    - NB2: since the class is dynamic, the two tuples can have any length, but it must be the same for both.
- DropOut:
  - double `drop__before_linear`: percentage of dropout probability
- List( Linear ):
  - tuple of integer `lin__out_dimension`: each element represents the number of features in output. The last element must have value $7 = len(emotions)$, so that each value of the final array represents the probability of being the i-th emotion.
    - NB: typically you want to decrease the number of features in the linear part.
- SoftMax: no parameters
So, for example, this would produce a well-performing (but huge) model:
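The original example is not reproduced here. As an illustration, the following is a minimal sketch of such a dynamic net built from the parameters listed above, assuming 48x48 grayscale inputs as in fer2013; the real DynamicNetBasic implementation may differ in details such as kernel sizes and activation placement.

```python
import torch
import torch.nn as nn

class DynamicNetBasicSketch(nn.Module):
    """Minimal sketch of a dynamic linear CNN for 48x48 grayscale images."""

    def __init__(self, drop__before_relu, conv__in_channels,
                 conv__out_channels, conv__layer_repetitions,
                 drop__before_linear, lin__out_dimension):
        super().__init__()
        # the two tuples must have the same length
        assert len(conv__out_channels) == len(conv__layer_repetitions)
        layers = []
        in_ch = conv__in_channels
        for out_ch, reps in zip(conv__out_channels, conv__layer_repetitions):
            for _ in range(reps):  # first conv: in->out, the others: out->out
                layers.append(nn.Conv2d(in_ch, out_ch, 3, padding=1))
                if drop__before_relu > 0:
                    layers.append(nn.Dropout2d(drop__before_relu))
                layers.append(nn.ReLU())
                in_ch = out_ch
            layers.append(nn.MaxPool2d(2))  # one MaxPool2D per inner list
        self.conv = nn.Sequential(*layers)
        side = 48 // (2 ** len(conv__out_channels))  # halved by each MaxPool2D
        self.drop = nn.Dropout(drop__before_linear)
        linear = []
        in_feat = in_ch * side * side
        for out_feat in lin__out_dimension:  # last element must be 7
            linear.append(nn.Linear(in_feat, out_feat))
            in_feat = out_feat
        self.lin = nn.Sequential(*linear)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.conv(x)
        x = self.drop(x.flatten(1))
        return self.softmax(self.lin(x))

# hypothetical large configuration:
# net = DynamicNetBasicSketch(0.2, 1, (128, 256, 512), (3, 3, 3), 0.5, (512, 128, 7))
```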
As a reminder, the structure of an Inception-Block (developed by Google) is the following:
The class DynamicNetInceptions doesn't have a linear structure, for two reasons:
- each inception module internally diverges and converges
- each inception module has a skip connection: $x = run\_inception(x, inception) + nn.Identity(x)$
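The skip connection can be sketched as follows (a hypothetical helper, assuming the inception module's input and output channel counts match so the sum is well defined):

```python
import torch
import torch.nn as nn

def run_inception(x, inception):
    """Apply an inception module and add the identity shortcut."""
    return inception(x) + nn.Identity()(x)  # nn.Identity() returns x unchanged
```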
The class has the following parameters (divided by the step in which they are used):
- List( List( C-Block ), MaxPool2D ): all the same as the Basic class
- DropOut:
  - double `dropout_prob__before_incep`: percentage of dropout probability used before the inceptions
- List( Inception-Block ):
  - integer `incep__num_layers`: number of inception modules to execute.
    - NB: the first has shape $N \rightarrow 256 \cdot mul$, the others $256 \cdot mul \rightarrow 256 \cdot mul$.
  - integer `incep__multiplier`: multiplier applied to the default output dimensions of the Inception block ($64$ for 1x1, $128$ for 3x3, $32$ for 5x5, $32$ for maxpool); for example, if set to $2$ it will have $2 \cdot 64$ for 1x1, $2 \cdot 128$ for 3x3, etc.
- DropOut:
  - double `dropout_prob__before_linear`: percentage of dropout probability used after the inceptions
- List( Linear ): all the same as the Basic class
- SoftMax: no parameters
So, for example, this would produce a well-performing (but huge) model:
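The original example is not reproduced here. As an illustration, an inception block with the default branch widths listed above can be sketched as follows; this is not the actual DynamicNetInceptions code, and the real implementation likely adds 1x1 reduction convolutions before the larger kernels as in GoogLeNet.

```python
import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    """Four parallel branches concatenated on the channel axis.
    Default widths scaled by `mul`: 64 (1x1), 128 (3x3), 32 (5x5),
    32 (maxpool), so the output always has 256 * mul channels."""

    def __init__(self, in_ch, mul=1):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64 * mul, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, 128 * mul, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, 32 * mul, kernel_size=5, padding=2)
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32 * mul, kernel_size=1))

    def forward(self, x):
        # concatenate the branch outputs along the channel dimension
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)],
                         dim=1)
```

With `in_ch == 256 * mul` the skip connection `x = block(x) + x` keeps the shape, matching the formula above.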