Repository to track the progress in model compression and acceleration
- T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor (CVPR 2019) paper
- MUSCO: Multi-Stage COmpression of neural networks (ICCVW 2019) paper | code (PyTorch)
- Efficient Neural Network Compression (CVPR 2019) paper | code (Caffe)
- Adaptive Mixture of Low-Rank Factorizations for Compact Neural Modeling (ICLR 2019) paper | code (PyTorch)
- Extreme Network Compression via Filter Group Approximation (ECCV 2018) paper
- Ultimate tensorization: compressing convolutional and FC layers alike (NIPS 2016 workshop) paper | code (TensorFlow) | code (MATLAB, Theano + Lasagne)
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications (ICLR 2016) paper
- Accelerating Very Deep Convolutional Networks for Classification and Detection (IEEE TPAMI 2016) paper
- Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition (ICLR 2015) paper | code (Caffe)
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation (NIPS 2014) paper
- Speeding up Convolutional Neural Networks with Low Rank Expansions (2014) paper
- Rethinking the Value of Network Pruning (ICLR 2019, NIPS 2018 workshop) paper | code (PyTorch)
- Dynamic Channel Pruning: Feature Boosting and Suppression (ICLR 2019) paper | code
- AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference (2019) paper
- CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization (CVPR 2018) paper
- Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks (IJCAI 2018) paper | code and models (PyTorch)
- Discrimination-aware Channel Pruning for Deep Neural Networks (NIPS 2018) paper | code and pretrained models (PyTorch)
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV 2018) paper | code (PyTorch) | pretrained models (PyTorch, TensorFlow, TensorFlow Lite)
- Channel Gating Neural Networks (2018) paper
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks (ICLR 2017) paper | pretrained models (Caffe)
- Channel Pruning for Accelerating Very Deep Neural Networks (ICCV 2017) paper | code and pretrained models (Caffe) | code (PyTorch)
- Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017) paper | code (Torch, PyTorch)
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression (ICCV 2017) paper | pretrained model (Caffe) | code (PyTorch)
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise (NIPS 2017) paper | code (TensorFlow, Theano + Lasagne)
- SphereFace: Deep Hypersphere Embedding for Face Recognition (CVPR 2017) paper | code and pretrained models (Caffe)
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (ICLR 2016) paper
- Fast ConvNets Using Group-wise Brain Damage (CVPR 2016) paper
- Pruning + quantization code and pretrained models (TensorFlow, TensorFlow Lite). Examples for CIFAR.
- Learning Efficient Detector with Semi-supervised Adaptive Distillation (arxiv 2019) paper | code (Caffe)
- Model compression via distillation and quantization (ICLR 2018) paper | code (PyTorch)
- Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks (ICLR 2018 workshop) paper
- Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks (BMVC 2018) paper
- Net2Net: Accelerating Learning via Knowledge Transfer (ICLR 2016) paper
- Distilling the Knowledge in a Neural Network (NIPS 2014) paper
- FitNets: Hints for Thin Deep Nets (2014) paper | code (Theano + Pylearn2)
TensorFlow implementation of three papers, with results for CIFAR-10: https://github.com/chengshengchan/model_compression
- Bayesian Bits: Unifying Quantization and Pruning (2020) paper
- Up or Down? Adaptive Rounding for Post-Training Quantization (2020) paper
- Gradient $\ell_1$ Regularization for Quantization Robustness (ICLR 2020) paper
- Training Binary Neural Networks with Real-to-Binary Convolutions (ICLR 2020) paper | code (coming soon)
- Data-Free Quantization Through Weight Equalization and Bias Correction (ICCV 2019) paper | code (PyTorch)
- XNOR-Net++ (2019) paper
- Matrix and tensor decompositions for training binary neural networks (2019) paper
- XNOR-Net (ECCV 2016) paper | code (PyTorch)
- Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks (2019) paper | code (TensorFlow)
- Relaxed Quantization for Discretized Neural Networks (ICLR 2019) paper
- Training and Inference with Integers in Deep Neural Networks (ICLR 2018) paper | code (TensorFlow)
- Training Quantized Nets: A Deeper Understanding (NIPS 2017) paper
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (2017) paper
- Deep Learning with Limited Numerical Precision (2015) paper
- Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation (2013) paper
- MobileNets
  - Searching for MobileNetV3 (ICCV 2019) paper
  - MobileNetV2: Inverted Residuals and Linear Bottlenecks (CVPR 2018) paper | code and pretrained models (TensorFlow)
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (ICML 2019) paper | code and pretrained models (TensorFlow)
- MnasNet: Platform-Aware Neural Architecture Search for Mobile (CVPR 2019) paper | code (TensorFlow)
- MorphNet: Fast & Simple Resource-Constrained Learning of Deep Network Structure (CVPR 2018) paper | code (TensorFlow)
- ShuffleNets
- Multi-Fiber Networks for Video Recognition (ECCV 2018) paper | code (PyTorch)
- IGCVs
  - IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks (BMVC 2018) paper | code and pretrained models (MXNet)
  - IGCV2: Interleaved Structured Sparse Convolutional Neural Networks (CVPR 2018) paper
  - Interleaved Group Convolutions for Deep Neural Networks (ICCV 2017) paper
- Quantizing deep convolutional networks for efficient inference: A whitepaper (2018) paper
- Algorithms for speeding up convolutional neural networks (2018) thesis
- Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges (2018) paper
- Efficient methods and hardware for deep learning (2017) thesis
- MUSCO - framework for model compression using tensor decompositions (PyTorch, TensorFlow)
- AIMET - AI Model Efficiency Toolkit (PyTorch, TensorFlow)
- Distiller - package for compression using pruning and low-precision arithmetic (PyTorch)
- MorphNet - framework for neural networks architecture learning (TensorFlow)
- Mayo - deep learning framework with fine- and coarse-grained pruning, network slimming, and quantization methods
- PocketFlow - framework for model pruning, sparsification, quantization (TensorFlow implementation)
- Keras compressor - compression using low-rank approximations: SVD for matrices, Tucker for tensors
- Caffe compressor - K-means based quantization
- gemmlowp - Building a quantization paradigm from first principles (C++)
- NNI - framework for feature engineering, NAS, hyperparameter tuning, and model compression
Please see comparative_results.pdf.
- https://github.com/ZhishengWang/Embedded-Neural-Network
- https://github.com/memoiry/Awesome-model-compression-and-acceleration
- https://github.com/sun254/awesome-model-compression-and-acceleration
- https://github.com/guan-yuan/awesome-AutoML-and-Lightweight-Models
- https://github.com/chester256/Model-Compression-Papers
- https://github.com/mapleam/model-compression-and-acceleration-4-DNN
- https://github.com/cedrickchee/awesome-ml-model-compression
- https://github.com/jnjaby/Model-Compression-Acceleration
- https://github.com/he-y/Awesome-Pruning