
Performance

Yang Le edited this page Mar 3, 2018 · 3 revisions

I have run some performance tests using the MNIST data set on my ASUS X55V laptop, which has

  • Intel® Core™ i3-2370M CPU @ 2.40GHz × 4
  • nVidia GeForce 610M/PCIe/SSE2
  • 3.7 GiB RAM.

The net structure is as follows:

```
conv_layer:    1 x 28 x 28 => 32 x 28 x 28, kernel 5 x 5 + 1, padding 2, params 832
relu_layer:    in 25088, out 25088, param 0
pooling_layer: 32 x 28 x 28 => 32 x 14 x 14, kernel 2 x 2 + 2, padding 0, params 0
conv_layer:    32 x 14 x 14 => 64 x 14 x 14, kernel 5 x 5 + 1, padding 2, params 51264
relu_layer:    in 12544, out 12544, param 0
pooling_layer: 64 x 14 x 14 => 64 x 7 x 7, kernel 2 x 2 + 2, padding 0, params 0
fc_layer:      in 3136, out 1024, param 3212288
relu_layer:    in 1024, out 1024, param 0
dropout_layer: in 1024, out 1024, param 1024
fc_layer:      in 1024, out 10, param 10250
softmax_layer: in 10, out 10, param 0
cee_layer:     in 10, out 1, param 10

total: layers 12, params 3275668, heap size 54685584
```
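As a sanity check on the printout above, the parameter counts can be recomputed from the layer shapes alone. The sketch below (plain Python, not part of the library) assumes the usual weights-plus-biases convention for conv and fc layers; the small `dropout_layer` and `cee_layer` counts are simply taken from the printout.

```python
# Recompute the per-layer parameter counts reported in the net structure above.

def conv_params(in_ch, out_ch, k):
    """Conv layer: one k x k kernel per (in, out) channel pair, plus a bias per output channel."""
    return out_ch * in_ch * k * k + out_ch

def fc_params(n_in, n_out):
    """Fully connected layer: weight matrix plus a bias per output unit."""
    return n_in * n_out + n_out

layers = [
    conv_params(1, 32, 5),   # first conv:  832
    conv_params(32, 64, 5),  # second conv: 51264
    fc_params(3136, 1024),   # first fc:    3212288
    1024,                    # dropout_layer, as printed
    fc_params(1024, 10),     # second fc:   10250
    10,                      # cee_layer, as printed
]

print(sum(layers))  # 3275668, matching the reported total
```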

And here is the per-batch time with batch size = 600:

| configuration | time (s) |
| ------------- | -------- |
| clblas@gpu    | 28       |
| clblast@gpu   | 23       |
| @cpu          | 22       |
| openmp@cpu    | 15       |
| cublas@gpu    | 15       |
| openblas@cpu  | 12       |
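Since all configurations process the same 600-image batch, the times convert directly into throughput. A small sketch of that arithmetic (numbers taken from the table above):

```python
# Convert per-batch time into images/second for each configuration
# (batch size 600, times in seconds from the table above).
batch = 600
times = {
    "clblas@gpu": 28, "clblast@gpu": 23, "@cpu": 22,
    "openmp@cpu": 15, "cublas@gpu": 15, "openblas@cpu": 12,
}

for cfg, t in sorted(times.items(), key=lambda kv: kv[1]):
    print(f"{cfg:14s} {batch / t:6.1f} images/s")
# The fastest configuration, openblas@cpu at 12 s/batch, works out to 50 images/s.
```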