
Emotion Recognition of fer2013

Dataset Augmentation

Input file: fer2013.csv
Output file: fer2013_augmented.csv

Analysis of the dataset shows that it is:

  • small: it contains only 35887 images
  • not uniformly distributed: happiness has 8989 samples, while disgust has only 547.

The analysis and the augmentation are performed in the fer2013_augmenter.ipynb notebook.
The transformation class is implemented in fer2013_augmenter.py and is invoked by the notebook.

We decided to enlarge the dataset by applying some transformations to the images.
The original dataset is almost 290 MB, and each transformation adds roughly another 290 MB.
The transformations applied are the following, defined in the class Filters(Enum):

  1. SOBEL
  2. VERTICAL
  3. HORIZONTAL
  4. CONTRAST_LOW
  5. CONTRAST_HIGH
  6. CONTRAST_VERY_HIGH
  7. FLIP_HORIZONTAL
  8. ROT_LEFT_60_DEGREES
  9. ROT_LEFT_40_DEGREES
  10. ROT_LEFT_20_DEGREES
  11. ROT_RIGHT_20_DEGREES
  12. ROT_RIGHT_40_DEGREES
  13. ROT_RIGHT_60_DEGREES

The final size will be almost 14 * 290 MB ≈ 4 GB (original images included).
If this is too large, it is easy to create a smaller dataset by editing the notebook and selecting only some of the filters.

The filters are currently implemented in three possible ways:

  • using a lambda function: (x, y, pixel) -> pixel
    • used for altering the contrast
  • using a filter matrix in a function: (image, filter) -> image
    • used for the vertical and horizontal filters
  • using a custom function: (image, custom_parameter) -> image
    • used for the Sobel filter, flipping, and rotations

Before applying a filter-based transformation, the image is padded so that the output does not have smaller dimensions.
Applying a stride is currently not supported.
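
A minimal sketch of the three styles, assuming 48x48 grayscale images as in fer2013 (names and signatures here are illustrative; the actual code lives in fer2013_augmenter.py and may differ):

```python
import numpy as np
from scipy.ndimage import rotate

# Style 1: per-pixel lambda, (x, y, pixel) -> pixel
# (hypothetical contrast factor)
contrast_high = lambda x, y, pixel: min(int(pixel * 1.5), 255)

# Style 2: filter matrix applied by convolution, (image, filter) -> image
VERTICAL = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]])

def apply_matrix(image, kernel):
    """Convolve a 2D grayscale image with a kernel; the image is
    padded first so the output keeps the original dimensions."""
    k = kernel.shape[0] // 2
    padded = np.pad(image, k, mode="edge")
    out = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            out[i, j] = (window * kernel).sum()
    return np.clip(out, 0, 255).astype(np.uint8)

# Style 3: custom function, (image, custom_parameter) -> image
def rot(image, degrees):
    # reshape=False keeps the original 48x48 shape
    return rotate(image, degrees, reshape=False, mode="nearest")
```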

If you want to add any type of transformation you just need to:

  1. add the transformation name to Filters(Enum)
    • (for lambda/matrix): add the lambda/matrix to the list of lambdas/matrices
    • (for custom functions): add the function and also edit the generate_all_filters function to add the function call
  2. initialize and execute the class
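
For example, adding a hypothetical 10-degree rotation might look roughly like this (a sketch only; the exact hooks in fer2013_augmenter.py may be organized differently):

```python
from enum import Enum, auto

class Filters(Enum):
    # ... existing members ...
    ROT_LEFT_10_DEGREES = auto()  # step 1: add the new transformation name

# for custom functions: define the function, then call it from
# generate_all_filters alongside the existing ones
def rot_left_10(image):
    return rot(image, 10)  # rot() as sketched above
```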

Emotion Recognition

The image recognition is done by a Convolutional Neural Network, using PyTorch. The CNN is created by the classes DynamicNetBasic and DynamicNetInceptions, which are subclasses of torch.nn.Module. Both classes allow creating dynamic nets (with a variable number of layers), and their constructors make it possible to try many different nets by simply changing a few parameters.

Structure of the CNNs:

DynamicNetBasic:

  1. List( List( C-Block ), MaxPool2D )
  2. DropOut
  3. List( Linear )
  4. SoftMax

DynamicNetInceptions:

  1. List( List( C-Block ), MaxPool2D )
  2. DropOut
  3. List( Inception-Block )
  4. DropOut
  5. List( Linear )
  6. SoftMax

The C-Block (convolutional block) consists of a Conv2D, an optional DropOut, and a ReLU:

[Diagrams: the Conv-Drop-ReLU and Conv-ReLU variants of the C-Block]
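
A minimal PyTorch sketch of this pattern (the helper name c_block is ours, not the repository's):

```python
import torch.nn as nn

def c_block(in_channels, out_channels, drop_prob):
    """Conv2D -> optional DropOut -> ReLU."""
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)]
    if drop_prob > 0:  # a value <= 0 yields a plain Conv-ReLU
        layers.append(nn.Dropout2d(drop_prob))
    layers.append(nn.ReLU())
    return nn.Sequential(*layers)
```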

So, for both classes, the full view of the first point of the structure is the following:

[Diagram: sequence of C-Blocks]

Class DynamicNetBasic

The class DynamicNetBasic has a linear structure and takes the following parameters (grouped by the step in which they are used):

  1. List( List( C-Block ), MaxPool2D ):
    • double drop__before_relu: dropout probability applied after each Conv2D.
      • NB: to use a Conv-ReLU without dropout, pass a value $\le 0$.
    • integer conv__in_channels: number of channels in input (the number of filters used).
    • tuple of integer conv__out_channels: each element represents the number of output channels for all the Conv2D layers inside the inner list.
      • NB: typically you want to increase the number of channels in the convolutional part.
    • tuple of integer conv__layer_repetitions: each element represents the number of times each inner list must be repeated before the MaxPool2D.
      • NB: the first Conv2D has shape $in\_chan \rightarrow out\_chan$, the others $out\_chan \rightarrow out\_chan$.
      • NB2: since the class is dynamic the two tuples can have any length, but it must be the same for both.
  2. DropOut:
    • double drop__before_linear: dropout probability.
  3. List( Linear ):
    • tuple of integer lin__out_dimension: each element represents the number of output features. The last element must be $7 = len(emotions)$, so that each value of the final array represents the probability of the i-th emotion.
      • NB: typically you want to decrease the number of features in the linear part.
  4. SoftMax: no parameters.

So, for example, these parameters would produce a well performing (but huge) model:

drop__before_relu = 0
conv__in_channels = len(filters_used)
conv__out_channels = (200, 400, 600, 800)
conv__layer_repetitions = (2, 2, 2, 1)
drop__before_linear = 0.35
lin__out_dimension = (432, 108, 27, len(emotions))
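
To make the role of the two tuples concrete, here is a sketch of how the convolutional part could be assembled from them, reusing the c_block helper sketched earlier (an assumption about the internals, not the actual constructor code):

```python
import torch.nn as nn

def build_conv_part(in_ch, out_channels, layer_repetitions, drop_prob):
    blocks = []
    for out_ch, reps in zip(out_channels, layer_repetitions):
        for r in range(reps):
            # first C-Block: in_ch -> out_ch, the others: out_ch -> out_ch
            blocks.append(c_block(in_ch if r == 0 else out_ch, out_ch, drop_prob))
        blocks.append(nn.MaxPool2d(kernel_size=2))  # one MaxPool2D per inner list
        in_ch = out_ch
    return nn.Sequential(*blocks)

# e.g., with the example parameters above:
# conv = build_conv_part(len(filters_used), (200, 400, 600, 800), (2, 2, 2, 1), 0)
```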

Class DynamicNetInceptions

As a reminder, the structure of an Inception-Block (developed by Google) is the following:

[Diagram: Inception-Block]

The class DynamicNetInceptions doesn't have a linear structure, for two reasons:

  • each inception module internally diverges and converges
  • each inception module has a skip connection: $x = run\_inception(x, inception) + nn.Identity()(x)$, as sketched below
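
In code, the skip connection amounts to something like the following sketch (run_all_inceptions is our name; each inception stands for whatever applies one inception module):

```python
def run_all_inceptions(x, inceptions):
    for inception in inceptions:
        # skip connection: add the module's input back to its output
        x = inception(x) + x  # equivalent to ... + nn.Identity()(x)
    return x
```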

The class has the following parameters (grouped by the step in which they are used):

  1. List( List( C-Block ), MaxPool2D ): all the same as the Basic class
  2. DropOut:
    • double dropout_prob__before_incep: dropout probability used before the inceptions.
  3. List( Inception-Block ):
    • integer incep__num_layers: number of inception modules to execute.
      • NB: the first has shape $N \rightarrow 256 \cdot mul$, the others $256 \cdot mul \rightarrow 256 \cdot mul$.
    • integer incep__multiplier: multiplier applied to the default output dimensions of the Inception block ($64$ for 1x1, $128$ for 3x3, $32$ for 5x5, $32$ for maxpool); for example, if set to $2$ you will have $2 \cdot 64$ for 1x1, $2 \cdot 128$ for 3x3, etc. (see the worked example after this list).
  4. DropOut:
    • double dropout_prob__before_linear: dropout probability used after the inceptions.
  5. List( Linear ): all the same as the Basic class
  6. SoftMax: no parameters.
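
To see where the $256 \cdot mul$ figure comes from: the four branches of the block are concatenated, so the total output channels are the multiplier times the sum of the default branch dimensions:

```python
branch_out = {"1x1": 64, "3x3": 128, "5x5": 32, "maxpool": 32}
mul = 3  # incep__multiplier from the example below
# branches are concatenated, so total = mul * (64 + 128 + 32 + 32)
total = mul * sum(branch_out.values())  # 3 * 256 = 768 channels
```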

So, for example, these parameters would produce a well performing (but huge) model:

drop__before_relu = 0
drop__before_incep = 0.35
conv__in_channels = len(filters_used)
conv__out_channels = (288, 566, 1122, 2244)
conv__layer_repetitions = (4, 3, 2, 1)
incep__num_layers = 35
incep__multiplier = 3
drop__before_linear = 0.50
lin__out_dimension = (1024, 356, 158, 64, len(emotions))

Class Optimizer

About

A CNN to recognise the emotions from the dataset fer2013.csv
