This is a TensorFlow implementation of Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates, designed to be compatible with the TensorFlow-Slim image classification model library.
The paper proposes a learning rate policy called the one-cycle policy, which lets networks train significantly faster than with standard schedules; the authors name this phenomenon super-convergence.
Cyclical Learning Rate (CLR) is earlier work by the same author, which suggests training a network with a cyclical learning rate to get better classification performance.
The super-convergence paper proposes the one-cycle policy, a small modification of CLR. It always uses exactly one cycle, which is shorter than the total number of iterations/epochs, and then lets the learning rate decay several orders of magnitude below the initial learning rate for the remaining iterations. When using the one-cycle schedule, it is better to also use cyclical momentum (CM), which starts at the maximum momentum value and decreases as the learning rate increases.
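A minimal sketch of both schedules, assuming linear ramps as in the paper (the function below is illustrative, not this repository's implementation):

```python
# Illustrative one-cycle schedule with cyclical momentum (linear ramps).
# Not this repository's implementation; final_lr_scale is an assumed knob
# for how far the LR decays in the final phase.

def one_cycle_schedule(step, step_size, max_steps, lr_min, lr_max,
                       mom_min=0.85, mom_max=0.95, final_lr_scale=0.01):
    """Return (learning_rate, momentum) for a given global step."""
    if step < step_size:                      # phase 1: LR up, momentum down
        frac = step / step_size
        return (lr_min + (lr_max - lr_min) * frac,
                mom_max - (mom_max - mom_min) * frac)
    if step < 2 * step_size:                  # phase 2: LR down, momentum up
        frac = (step - step_size) / step_size
        return (lr_max - (lr_max - lr_min) * frac,
                mom_min + (mom_max - mom_min) * frac)
    # phase 3 ("annihilation"): decay the LR well below lr_min
    frac = (step - 2 * step_size) / max(1, max_steps - 2 * step_size)
    return lr_min * (1.0 - frac * (1.0 - final_lr_scale)), mom_max
```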
To find a proper learning rate range for CLR and the one-cycle policy, you can run a learning rate range test: train while the learning rate increases until it eventually becomes too large, causing the test/validation loss to increase and the accuracy to decrease. That point, or a slightly smaller value, can be used as the maximum bound. The minimum bound can then be found in one of the following ways (a short sketch follows the list):
- a factor of 3 or 4 less than the maximum bound
- a factor of 10 or 20 less than the maximum bound if only one cycle is used
- by running a short test of a few hundred iterations with a few initial learning rates and picking the largest one that lets convergence begin without signs of overfitting
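Once the range test has run, the bounds can be read off the logged results, e.g. as in this hypothetical helper (`pick_lr_bounds` and the `history` format are assumptions for illustration):

```python
# Hypothetical helper: choose CLR / one-cycle bounds from range-test logs.
# `history` is a list of (learning_rate, validation_loss) pairs recorded
# while the learning rate was increased.

def pick_lr_bounds(history, one_cycle=True):
    # Maximum bound: the LR at the lowest validation loss, i.e. just
    # before the loss starts to climb.
    best = min(range(len(history)), key=lambda i: history[i][1])
    lr_max = history[best][0]
    # Minimum bound: a factor of 3-4 below lr_max for CLR, or a factor
    # of 10-20 below if only one cycle is used.
    lr_min = lr_max / 10 if one_cycle else lr_max / 4
    return lr_min, lr_max
```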
- Python 3.x
- TensorFlow 1.x
- TF-slim
You should prepare your own dataset or an open dataset (CIFAR-10, Flowers, MNIST, ImageNet). To prepare a dataset, you can follow the 'Preparing the datasets' part of the TF-Slim image models README.
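For the open datasets, TF-Slim provides a `download_and_convert_data.py` script; for example, for the Flowers dataset (script and flags as documented in the TF-Slim README):

```bash
# Download the Flowers dataset and convert it to TFRecord format.
DATASET_DIR=/DIRECTORY/TO/DATASET
python download_and_convert_data.py \
    --dataset_name=flowers \
    --dataset_dir="${DATASET_DIR}"
```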
The script below gives an example of a learning rate range test.
To perform a learning rate range test, the `--learning_rate_decay_type` argument should be `lr_range_test`.
The minimum and maximum learning rates should be set with the `--learning_rate` and `--max_learning_rate` arguments.
```bash
DATASET_DIR=/DIRECTORY/TO/DATASET
TRAIN_DIR=/DIRECTORY/TO/TRAIN

CUDA_VISIBLE_DEVICES=0 python ../train_image_classifier.py \
    --train_dir $TRAIN_DIR \
    --dataset_dir $DATASET_DIR \
    --dataset_name imagenet \
    --dataset_split_name train \
    --model_name resnet_v1_50 \
    --learning_rate_decay_type lr_range_test \
    --optimizer momentum \
    --momentum 0.9 \
    --weight_decay 0.00001 \
    --learning_rate 0.00001 \
    --max_learning_rate 3.0 \
    --step_size 50000 \
    --max_number_of_steps 100000 \
    --train_image_size 224 \
    --batch_size 64
```
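Reading the flags above (an interpretation, not verified against the code): the learning rate should sweep linearly from `--learning_rate` (1e-5) up to `--max_learning_rate` (3.0); plot the test/validation loss and accuracy against the learning rate and choose the bounds as described earlier.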
The script below gives an example of training a model with the one-cycle policy.
To use the one-cycle policy, the `--learning_rate_decay_type` argument should be `one_cycle`.
For the cyclical learning rate, the minimum and maximum learning rates should be set with the `--learning_rate` and `--max_learning_rate` arguments.
For cyclical momentum, the minimum and maximum momentum should be set with the `--min_momentum` and `--momentum` arguments.
```bash
DATASET_DIR=/DIRECTORY/TO/DATASET
TRAIN_DIR=/DIRECTORY/TO/TRAIN

CUDA_VISIBLE_DEVICES=0 python ../train_image_classifier.py \
    --train_dir $TRAIN_DIR \
    --dataset_dir $DATASET_DIR \
    --dataset_name imagenet \
    --dataset_split_name train \
    --model_name resnet_v1_50 \
    --learning_rate_decay_type one_cycle \
    --optimizer momentum \
    --momentum 0.95 \
    --min_momentum 0.85 \
    --weight_decay 0.00001 \
    --learning_rate 0.5 \
    --max_learning_rate 1.0 \
    --step_size 50000 \
    --max_number_of_steps 150000 \
    --train_image_size 224 \
    --batch_size 64
```
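Reading the flags above (an interpretation, not verified against the code): with `--step_size 50000` and `--max_number_of_steps 150000`, the single cycle occupies the first 100,000 steps (50,000 ramping the learning rate from 0.5 up to 1.0 and 50,000 back down), and the remaining 50,000 steps let the learning rate decay well below 0.5; momentum meanwhile moves in the opposite direction, from 0.95 down to 0.85 and back.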
The script below gives an example of training a model with a cyclical learning rate.
To use a cyclical learning rate, the `--learning_rate_decay_type` argument should be `CLR`.
The minimum and maximum learning rates should be set with the `--learning_rate` and `--max_learning_rate` arguments.
```bash
DATASET_DIR=/DIRECTORY/TO/DATASET
TRAIN_DIR=/DIRECTORY/TO/TRAIN

CUDA_VISIBLE_DEVICES=0 python ../train_image_classifier.py \
    --train_dir $TRAIN_DIR \
    --dataset_dir $DATASET_DIR \
    --dataset_name imagenet \
    --dataset_split_name train \
    --model_name resnet_v1_50 \
    --learning_rate_decay_type CLR \
    --optimizer momentum \
    --momentum 0.95 \
    --min_momentum 0.85 \
    --weight_decay 0.00001 \
    --learning_rate 0.1 \
    --max_learning_rate 0.5 \
    --step_size 1000 \
    --max_number_of_steps 10000 \
    --train_image_size 224 \
    --batch_size 64
```
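For reference, the triangular policy from the CLR paper computes the learning rate at a given step as below; this is a sketch of the formula from the paper, not necessarily this repository's exact code.

```python
import math

def triangular_clr(step, step_size, lr_min, lr_max):
    # Triangular cyclical learning rate (Smith, "Cyclical Learning Rates
    # for Training Neural Networks"). One full cycle = 2 * step_size steps.
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return lr_min + (lr_max - lr_min) * max(0.0, 1 - x)
```

With `--step_size 1000` and `--max_number_of_steps 10000` as above, training covers five full triangular cycles.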
To keep track of validation accuracy while training, you can use `eval_image_classifier_loop.py`, which evaluates performance at multiple checkpoints during training. If you only want to evaluate a model once, use `eval_image_classifier.py`.
The script below gives an example of evaluating a model repeatedly while it trains.
```bash
DATASET_DIR=/DIRECTORY/TO/DATASET
CHECKPOINT_FILE=/DIRECTORY/TO/CHECKPOINT
EVAL_DIR=/DIRECTORY/TO/EVAL

CUDA_VISIBLE_DEVICES=0 python eval_image_classifier_loop.py \
    --alsologtostderr \
    --checkpoint_path=${CHECKPOINT_FILE} \
    --dataset_dir=${DATASET_DIR} \
    --eval_dir=${EVAL_DIR} \
    --dataset_name=imagenet \
    --dataset_split_name=validation \
    --model_name=resnet_v1_50 \
    --batch_size=100
```
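`CHECKPOINT_FILE` would typically point at the `TRAIN_DIR` used above, so that the loop picks up each new checkpoint as training writes it.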
LR_schedule | model | data | max_steps | step_size | batch_size | optimizer | lr | weight_decay | momentum | acc | training_time |
---|---|---|---|---|---|---|---|---|---|---|---|
constant LR | resnet50_v1 | imagenet | 1000k | - | 128 | rmsprop | 0.01 | 0.00004 | 0.9 | 0.6923 | 7d 19h |
one-cycle policy | resnet50_v1 | imagenet | 250k | 100k | 128 | momentum | 0.05-1.0 | 0.00001 | 0.95-0.85 | 0.7075 | 2d 16h |
- Blog: Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
- Repository: CBAM-TensorFlow-Slim
- Repository: SENet-TensorFlow-Slim
- Paper: Cyclical Learning Rates for Training Neural Networks
- Paper: Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
- Paper: A Disciplined Approach to Neural Network Hyper-Parameters: Part 1 -- Learning Rate, Batch Size, Momentum, and Weight Decay
- GitHub: Caffe files of NRL Technical Report
- GitHub: keras-one-cycle
Byung Soo Ko / [email protected]