Multi-core CPU implementation of deep learning for 2D and 3D sliding window convolutional networks (ConvNets).

ZNN

Most current deep learning implementations use GPUs, but GPUs have some limitations:

  • SIMD (Single Instruction, Multiple Data): a single instruction decoder, so all cores must do the same work
    • Divergence kills performance
  • Parallelization is done per convolution
    • Direct convolution is computationally expensive
    • FFT-based convolution can't efficiently utilize all cores
  • Memory limitations
    • Can't cache FFT transforms for reuse
    • Limits the dense output size (few alternatives offer this feature)

ZNN shines when filter sizes are large enough that FFT-based convolution is used:

  • Wide and deep networks
  • Bigger output patches, for which ZNN is the only (reasonable) open-source solution
  • Very deep networks with large filters
  • FFTs of the feature maps and gradients can fit in RAM, but not in GPU memory
  • Runs out of the box on future many-core machines

Resources

Publications

  • Zlateski, A., Lee, K. & Seung, H. S. (2015) ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines. (arXiv link)
  • Lee, K., Zlateski, A., Vishwanathan, A. & Seung, H. S. (2015) Recursive Training of 2D-3D Convolutional Networks for Neuronal Boundary Detection. (arXiv link)

Contact

C++ core

Python Interface

