Performance
I ran some performance tests on the MNIST data set using my ASUS X55V laptop, which has:
- Intel® Core™ i3-2370M CPU @ 2.40GHz × 4
- nVidia GeForce 610M/PCIe/SSE2
- 3.7 GiB of RAM
The network structure is as follows:
conv_layer: 1 x 28 x 28 => 32 x 28 x 28, kernel 5 x 5 + 1, padding 2, params 832
relu_layer: in 25088, out 25088, param 0
pooling_layer: 32 x 28 x 28 => 32 x 14 x 14, kernel 2 x 2 + 2, padding 0, params 0
conv_layer: 32 x 14 x 14 => 64 x 14 x 14, kernel 5 x 5 + 1, padding 2, params 51264
relu_layer: in 12544, out 12544, param 0
pooling_layer: 64 x 14 x 14 => 64 x 7 x 7, kernel 2 x 2 + 2, padding 0, params 0
fc_layer: in 3136, out 1024, param 3212288
relu_layer: in 1024, out 1024, param 0
dropout_layer: in 1024, out 1024, param 1024
fc_layer: in 1024, out 10, param 10250
softmax_layer: in 10, out 10, param 0
cee_layer: in 10, out 1, param 10
total: layers 12, params 3275668, heap size 54685584
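As a sanity check on the dump above, the output shapes and parameter counts can be re-derived by hand. The sketch below is a minimal Python version. It assumes the usual conventions: each convolution has out × in × k × k weights plus one bias per output channel, each fully connected layer has in × out weights plus one bias per output, and the `+ 1` / `+ 2` after the kernel size is the stride. The dropout and cee values are not derived; they are copied from the dump as reported. The helper names (`conv_out`, `conv_params`, `fc_params`) are hypothetical and not part of the library.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


def conv_params(in_ch, out_ch, kernel):
    """Weights (out_ch * in_ch * k * k) plus one bias per output channel."""
    return out_ch * in_ch * kernel * kernel + out_ch


def fc_params(n_in, n_out):
    """Dense weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


layers = []  # (layer name, parameter count)

# conv 1: 1 x 28 x 28 -> 32 x 28 x 28, 5x5 kernel, stride 1, padding 2
s = conv_out(28, 5, 1, 2)
layers.append(("conv_layer", conv_params(1, 32, 5)))
layers.append(("relu_layer", 0))
s = conv_out(s, 2, 2, 0)  # 2x2 pooling, stride 2: 28 -> 14
layers.append(("pooling_layer", 0))

# conv 2: 32 x 14 x 14 -> 64 x 14 x 14, 5x5 kernel, stride 1, padding 2
s = conv_out(s, 5, 1, 2)
layers.append(("conv_layer", conv_params(32, 64, 5)))
layers.append(("relu_layer", 0))
s = conv_out(s, 2, 2, 0)  # 2x2 pooling, stride 2: 14 -> 7
layers.append(("pooling_layer", 0))

flat = 64 * s * s  # flattened input to the first fc layer
layers.append(("fc_layer", fc_params(flat, 1024)))
layers.append(("relu_layer", 0))
layers.append(("dropout_layer", 1024))  # copied from the dump, not derived
layers.append(("fc_layer", fc_params(1024, 10)))
layers.append(("softmax_layer", 0))
layers.append(("cee_layer", 10))  # copied from the dump, not derived

total = sum(p for _, p in layers)
print(f"spatial size before fc: {s}, flattened: {flat}")
print(f"layers {len(layers)}, params {total}")
```

Running it gives a flattened size of 3136 and a total of 3275668 parameters across 12 layers, which matches the `total:` line of the dump. Almost all of the parameters (3212288) sit in the first fully connected layer.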
Here is the time per batch, with a batch size of 600:
configuration | time (s) |
---|---|
clblas@gpu | 28 |
clblast@gpu | 23 |
@cpu | 22 |
openmp@cpu | 15 |
cublas@gpu | 15 |
openblas@cpu | 12 |
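To compare the configurations in terms of throughput, the per-batch times can be converted to images per second (batch size / time) and to a speedup ratio. The sketch below assumes the plain `@cpu` row is the unaccelerated baseline, which the table itself does not state; the `timings` dictionary simply restates the table, and `throughput` is a hypothetical helper name.

```python
BATCH_SIZE = 600

# Per-batch times in seconds, restated from the table above.
timings = {
    "clblas@gpu": 28,
    "clblast@gpu": 23,
    "@cpu": 22,
    "openmp@cpu": 15,
    "cublas@gpu": 15,
    "openblas@cpu": 12,
}

# Assumption: "@cpu" is the configuration without any BLAS/OpenMP acceleration.
baseline = timings["@cpu"]


def throughput(seconds, batch=BATCH_SIZE):
    """Images processed per second, given the time for one batch."""
    return batch / seconds


# Fastest configuration first.
for name, t in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>14}  {t:>3} s  {throughput(t):6.1f} img/s  "
          f"{baseline / t:4.2f}x vs @cpu")
```

For example, `openblas@cpu` works out to 600 / 12 = 50 images per second, roughly 1.83 times the speed of `@cpu`, while `clblas@gpu` is slower than the baseline (about 21.4 images per second).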