This is a quick evaluation of how different pooling functions perform on ImageNet-2012.
The architecture is similar to CaffeNet, but with the following differences:
- Images are resized so that the smaller side = 128, for speed.
- fc6 and fc7 layers have 2048 neurons instead of 4096.
- Networks are initialized with LSUV-init (sketched below).
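
For reference, here is a minimal sketch of the LSUV procedure (orthonormal init followed by iterative rescaling to unit output variance). It assumes a generic `forward(weights, batch)` callback and is illustrative only, not the actual script used for these runs:

```python
import numpy as np

def orthonormal(shape, rng=np.random):
    # Orthonormal initialization (Saxe et al.), the starting point of LSUV
    flat = (shape[0], int(np.prod(shape[1:])))
    a = rng.standard_normal(flat)
    u, _, vh = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == flat else vh
    return q.reshape(shape)

def lsuv_scale(weights, forward, batch, tol=0.05, max_iter=10):
    # LSUV step for one layer: rescale the weights until the layer output
    # on a minibatch has unit variance. 'forward(weights, batch)' is an
    # assumed callback returning that layer's output.
    for _ in range(max_iter):
        var = forward(weights, batch).var()
        if abs(var - 1.0) < tol:
            break
        weights = weights / np.sqrt(var)
    return weights
```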
Name | Accuracy | LogLoss | Comments |
---|---|---|---|
MaxPool | 0.471 | 2.36 | Stopped at 290K iters |
Stochastic | 0.438 | 2.54 | Underfitting; maybe try without dropout |
Stochastic, no dropout | 0.429 | 2.96 | Stochastic pooling does not prevent overfitting without dropout :( Good start, bad finish |
AvgPool | 0.435 | 2.56 | |
Max+AvgPool | 0.483 | 2.29 | Element-wise sum of max- and average-pooled maps (see sketch below) |
NoPool | 0.472 | 2.35 | Strided conv2, conv3, conv4 instead of pooling |
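
To make the Max+AvgPool row concrete: the two pooled maps are summed element-wise (in Caffe this would presumably be two pooling layers feeding an Eltwise SUM layer). A minimal numpy sketch of the computation for a single 2D feature map, no padding:

```python
import numpy as np

def _pool(x, kernel=3, stride=2, op=np.max):
    # Plain 2D pooling over one feature map (no padding), helper for the sum below
    out_h = (x.shape[0] - kernel) // stride + 1
    out_w = (x.shape[1] - kernel) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = op(x[i*stride:i*stride+kernel, j*stride:j*stride+kernel])
    return out

def max_plus_avg_pool(x, kernel=3, stride=2):
    # "Max+AvgPool": element-wise sum of the max-pooled and average-pooled maps
    return _pool(x, kernel, stride, np.max) + _pool(x, kernel, stride, np.mean)
```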
Name | Accuracy | LogLoss | Comments |
---|---|---|---|
MaxPool 3x3/2 | 0.471 | 2.36 | Default AlexNet pooling |
MaxPool 2x2/2 | 0.484 | 2.29 | Leads to a larger feature map: Pool5 = 4x4 instead of 3x3 |
MaxPool 3x3/2 pad 1 | 0.488 | 2.25 | Leads to an even larger feature map: Pool5 = 5x5 instead of 3x3 (see size arithmetic below) |
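
The Pool5 sizes above follow from Caffe's ceil-mode pooling arithmetic, assuming the changed kernel/stride/pad is applied to all three pooling layers (pool1, pool2, pool5) of the 128px net. A small sketch that reproduces the numbers:

```python
import math

def caffe_pool_out(size, kernel, stride, pad=0):
    # Caffe PoolingLayer output size (ceil mode)
    out = int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1
    # Caffe drops the last window if it would start entirely in the padding
    if pad > 0 and (out - 1) * stride >= size + pad:
        out -= 1
    return out

def caffe_conv_out(size, kernel, stride=1, pad=0):
    # Caffe ConvolutionLayer output size (floor mode)
    return (size + 2 * pad - kernel) // stride + 1

def pool5_size(pk, ps, pp=0, crop=128):
    # Spatial size of pool5 for a CaffeNet-like net at a 128x128 crop,
    # assuming pool1, pool2 and pool5 all use the same kernel/stride/pad
    s = caffe_conv_out(crop, 11, 4)      # conv1: 11x11/4
    s = caffe_pool_out(s, pk, ps, pp)    # pool1
    s = caffe_conv_out(s, 5, 1, 2)       # conv2: 5x5 pad 2
    s = caffe_pool_out(s, pk, ps, pp)    # pool2
    # conv3-conv5 are 3x3 pad 1 and keep the spatial size
    return caffe_pool_out(s, pk, ps, pp) # pool5

print(pool5_size(3, 2))     # 3 -> default 3x3/2
print(pool5_size(2, 2))     # 4 -> 2x2/2
print(pool5_size(3, 2, 1))  # 5 -> 3x3/2 pad 1
```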
The authors of "Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree" kindly provided a reference implementation for testing. Unfortunately, it is under patent: UCSD Docket No. SD2016-053, "Generalizing Pooling Functions in Convolutional Neural Network", filed on Sept 23, 201
The performance is good, but seems to depend on other design choices (i.e. it beats MaxPool in one setup and loses in another) and also on initialization.
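
For orientation, the gated max-average idea from that paper combines max and average pooling with a learned, region-dependent gate. A hedged numpy sketch for a single pooling region (this is my reading of the paper, not the authors' patented implementation; here `w` is simply passed in rather than learned):

```python
import numpy as np

def gated_max_avg(region, w):
    # Gated max-average pooling for one region:
    # a learned gate a = sigmoid(<w, region>) mixes max and average responses
    a = 1.0 / (1.0 + np.exp(-np.dot(w.ravel(), region.ravel())))
    return a * region.max() + (1.0 - a) * region.mean()
```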
Name | Accuracy | LogLoss | Comments |
---|---|---|---|
MaxPool128-2048 | 0.470 | 2.36 | My reference caffenet128 |
GatedAveMaxPool128-2048 | 0.471 | 2.36 | |
GeneralPool128-2048 | 0.464* | 2.46* | Unfinished, 227K iters |
MaxPool128-4096 | 0.497 | 2.24 | fc6,fc7 = 4096 |
GeneralPool128-4096 | 0.494 | 2.25 | fc6,fc7 = 4096 |
MaxPool227-4096 | 0.565 | 1.87 | My reference caffenet227 |
GeneralPool227-4096 | 0.570 | 1.86 | |
Authors GeneralPool227-4096 | 0.585 | 1.78 | Different lr_policy: each step is longer |
Previous results on small datasets like CIFAR (see LSUV-init, Table 3) look somewhat contradictory to the ImageNet ones so far.
P.S. The logs are merged from many "save-resume" runs, because the networks were trained at night, so a plot of "Accuracy vs. seconds" would give weird results.