This is a quick evaluation of the performance of different pooling functions on ImageNet-2012.

The architecture is similar to CaffeNet, but with the following differences:

  1. Images are resized so that the smaller side = 128, for speed reasons.
  2. The fc6 and fc7 layers have 2048 neurons instead of 4096 (see the sketch after this list).
  3. Networks are initialized with LSUV-init.
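For concreteness, here is a minimal Caffe prototxt sketch of difference 2; the layer and blob names, and the omission of weight fillers, are assumptions rather than excerpts from the actual prototxt:

```
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 2048  # 2048 neurons instead of CaffeNet's usual 4096
  }
}
```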

### Pooling type

| Name | Accuracy | LogLoss | Comments |
|------|----------|---------|----------|
| MaxPool | 0.471 | 2.36 | stopped at 290K iterations |
| Stochastic | 0.438 | 2.54 | underfitting; maybe try it without Dropout |
| Stochastic, no dropout | 0.429 | 2.96 | stochastic pooling does not prevent overfitting without Dropout :( Good start, bad finish |
| AvgPool | 0.435 | 2.56 | |
| Max+AvgPool | 0.483 | 2.29 | element-wise sum (see the sketch below this table) |
| NoPool | 0.472 | 2.35 | strided conv2, conv3, conv4 instead of pooling |
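The Max+AvgPool row can be expressed in Caffe as two parallel pooling layers over the same blob, merged by an element-wise sum. A minimal sketch, assuming layer/blob names (the actual prototxt may differ):

```
layer {
  name: "pool1_max"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1_max"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "pool1_ave"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1_ave"
  pooling_param { pool: AVE kernel_size: 3 stride: 2 }
}
# Element-wise sum of the two pooling branches
layer {
  name: "pool1"
  type: "Eltwise"
  bottom: "pool1_max"
  bottom: "pool1_ave"
  top: "pool1"
  eltwise_param { operation: SUM }
}
```

Both branches see the same input and produce maps of the same size, so the Eltwise SUM is well-defined; the same pattern would be repeated at each pooling stage.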

### Pooling window/stride

| Name | Accuracy | LogLoss | Comments |
|------|----------|---------|----------|
| MaxPool 3x3/2 | 0.471 | 2.36 | default AlexNet |
| MaxPool 2x2/2 | 0.484 | 2.29 | leads to a larger feature map: Pool5 = 4x4 instead of 3x3 |
| MaxPool 3x3/2 pad 1 | 0.488 | 2.25 | leads to an even larger feature map: Pool5 = 5x5 instead of 3x3 (see the sketch below) |
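Between these rows only the `pooling_param` changes. A hedged sketch of the padded variant in Caffe prototxt (layer and blob names are assumed):

```
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3  # 3x3 window
    stride: 2
    pad: 1          # zero-padding enlarges the pooled feature map
  }
}
```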

Prototxt, logs

[Plots: CaffeNet128 test accuracy, test loss, and train loss]

### General pooling testing

The authors of Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree kindly provided a reference implementation for testing. Unfortunately, it is under patent: UCSD Docket No. SD2016-053, "Generalizing Pooling Functions in Convolutional Neural Network", filed on Sept 23, 2015.

The performance is good, but it seems to depend on other design choices (i.e. it beats MaxPool in one setup and loses in another) and also on initialization.

| Name | Accuracy | LogLoss | Comments |
|------|----------|---------|----------|
| MaxPool128-2048 | 0.470 | 2.36 | my reference CaffeNet128 |
| GatedAveMaxPool128-2048 | 0.471 | 2.36 | |
| GeneralPool128-2048 | 0.464* | 2.46* | unfinished, 227K iters |
| MaxPool128-4096 | 0.497 | 2.24 | fc6, fc7 = 4096 |
| GeneralPool128-4096 | 0.494 | 2.25 | fc6, fc7 = 4096 |
| MaxPool227-4096 | 0.565 | 1.87 | my reference CaffeNet227 |
| GeneralPool227-4096 | 0.570 | 1.86 | |
| Authors' GeneralPool227-4096 | 0.585 | 1.78 | different lr_policy: each step is longer (see the solver sketch below) |
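In Caffe, such a schedule lives in the solver prototxt; a minimal sketch of a "step" lr_policy, where a larger stepsize makes each step longer (all values here are illustrative assumptions, not the authors' settings):

```
# solver.prototxt fragment
base_lr: 0.01
lr_policy: "step"
gamma: 0.1        # multiply the learning rate by 0.1 at every step
stepsize: 100000  # iterations per step; increasing this lengthens each step
```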

[Plots: CaffeNet128 test accuracy, test loss, and train loss]

Prototxt, logs

Previous results on small datasets like CIFAR (see LSUV-init, Table 3) look a bit contradictory to the ImageNet ones so far.

P.S. The logs are merged from many "save-resume" sessions, because the networks were trained at night, so a plot of "Accuracy vs. seconds" will give weird results.