This is a quick evaluation of how different data augmentation strategies affect performance on ImageNet-2012.
The architecture is similar to CaffeNet, with the following differences:
- Images are resized to small side = 128 for speed reasons.
- fc6 and fc7 layers have 2048 neurons instead of 4096.
- Networks are initialized with LSUV-init (sketched in code after this list).
- No LRN layers.
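
LSUV-init first gives each layer an orthonormal initialization, then iteratively rescales the weights so that each layer's output variance is close to 1. Below is a minimal numpy sketch of that idea for a stack of fully connected ReLU layers; it is an illustration, not the Caffe implementation actually used for these runs.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthonormal(rows, cols):
    # Orthonormal initialization (Saxe et al.) via QR decomposition.
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, _ = np.linalg.qr(a)
    return q if rows >= cols else q.T

def lsuv(weights, x, tol=0.1, max_iter=10):
    """Rescale each layer so its pre-activation variance is close to 1."""
    for w in weights:
        for _ in range(max_iter):
            var = (x @ w).var()
            if abs(var - 1.0) < tol:
                break
            w /= np.sqrt(var)          # rescale weights toward unit output variance
        x = np.maximum(x @ w, 0.0)     # ReLU output feeds the next layer
    return weights

dims = [256, 512, 512, 128]            # toy layer sizes, for illustration only
weights = [orthonormal(m, n) for m, n in zip(dims[:-1], dims[1:])]
batch = rng.standard_normal((64, dims[0]))
weights = lsuv(weights, batch)
```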
Default augmentation: random 128x128 crop from a 144xN image, 50% random horizontal flip. Additional augmentations tested (sketched in code after this list):
- dropout on the input data, dropout_ratio = 0.1
- multiscale training: crops taken from images resized to several scales
- random rotation by up to 5 degrees
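
The helpers below sketch these variants for numpy images in HxWxC layout. Function names are illustrative, not the actual Caffe data-layer code used for the benchmark.

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)

def random_crop_flip(img, size=128):
    """Default augmentation: random size x size crop, 50% horizontal flip."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    crop = img[y:y + size, x:x + size]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]                      # horizontal flip
    return crop

def input_dropout(img, ratio=0.1):
    """'Drop 0.1' variant: zero a fraction of pixels, rescale the rest."""
    mask = rng.random(img.shape[:2]) >= ratio
    return img * mask[..., None] / (1.0 - ratio)  # Caffe-style train-time scaling

def random_rotation(img, max_deg=5):
    """'5 deg rot' variant: rotate by a random angle in [0, 5] degrees."""
    return rotate(img, rng.uniform(0, max_deg), reshape=False, mode='nearest')
```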
Name | Accuracy | LogLoss | Comments |
---|---|---|---|
Default | 0.471 | 2.36 | Random flip, random crop 128x128 from 144xN, N > 144 |
Drop 0.1 | 0.306 | 3.56 | Default + input dropout 10%; not finished, result at 186K iterations |
Multiscale | 0.462 | 2.40 | Random flip, random crop 128x128 from an image with small side 144 (50% of crops), 188 (20%), 256 (20%), or 130 (10%) |
5 deg rot | 0.448 | 2.47 | Random rotation by an angle in [0, 5] degrees |
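
For the multiscale row, the crop is taken after resizing the image so that its small side is drawn from {144, 188, 256, 130} with the probabilities listed in the table. A rough sketch, using cv2.resize as a stand-in for whatever resizer the actual data layer uses:

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
SIDES = [144, 188, 256, 130]       # small-side targets from the table
PROBS = [0.5, 0.2, 0.2, 0.1]       # their sampling probabilities

def multiscale_resize(img):
    """Resize so the small side matches a randomly drawn target."""
    side = rng.choice(SIDES, p=PROBS)
    h, w = img.shape[:2]
    s = side / min(h, w)
    return cv2.resize(img, (int(round(w * s)), int(round(h * s))))
```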