super-convergence

This repository contains the Caffe files for our recent work: Smith, Leslie N. and Nicholay Topin, "Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates," arXiv preprint arXiv:1708.07120 (2017). Please read the paper for details. For instructions on implementing cyclical learning rates in Caffe, see the paper "Cyclical Learning Rates for Training Neural Networks" at https://arxiv.org/pdf/1506.01186.pdf.

Note: if you have a better theoretical understanding of the cause of super-convergence than those described in the paper, please contact [email protected] about collaborating on a follow-up paper.

Instructions:

To simplify the replication of the figures in the paper, a shell script x.sh is included, which we used to replicate our experiments and create the figures in the paper. This execution script shows the changes to each file needed for each run. Below we spell out these changes.

From the Caffe home directory, run: ./build/tools/caffe train --solver=$SOLVER -gpu=all

As provided, this solver file trains the CLR network from Figure 1a. Changes must be made to reproduce other experiments, as listed below.
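
Since each run below differs from the provided solver files only in a few field values, the edits can be scripted rather than made by hand. The following is a minimal sketch, not part of this repository: the helper name set_solver_fields is hypothetical, and it assumes each field sits on its own "name: value" line. It handles only single-valued fields, so repeated fields such as stepvalue still need manual editing, and any trailing comment on a rewritten line is dropped.

```python
import re


def set_solver_fields(solver_text, **fields):
    """Return solver_text with each single-valued `name: value` line updated.

    Hypothetical helper: fields that are missing are appended at the end.
    """
    for name, value in fields.items():
        # Strings are quoted as in prototxt; numbers are written bare.
        rendered = '"%s"' % value if isinstance(value, str) else repr(value)
        pattern = re.compile(
            r"^([ \t]*%s[ \t]*:[ \t]*).*$" % re.escape(name), re.MULTILINE
        )
        if pattern.search(solver_text):
            solver_text = pattern.sub(
                lambda m: m.group(1) + rendered, solver_text, count=1
            )
        else:
            solver_text = solver_text.rstrip("\n") + "\n%s: %s\n" % (name, rendered)
    return solver_text
```

For example, set_solver_fields(text, stepsize=10000, max_iter=20000) turns the Stepsize=5k settings of Fig. 1b into the Stepsize=10k settings.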

Fig. 1a:
	LR=0.35:
	$SOLVER should be the provided "solver.prototxt". 	
		net: ".../Resnet56Cifar.prototxt"
		test_iter: 200
		test_interval: 100
		display: 100
		lr_policy: "multistep"
		stepvalue: 50000
		stepvalue: 70000
		base_lr: 0.35
		gamma: 0.1
		max_iter: 80000
		weight_decay: 1e-4
		momentum: 0.9
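
To make the baseline schedule concrete, here is a small sketch of the learning rate that the "multistep" settings above should produce. It assumes the usual Caffe behavior, in which the rate is multiplied by gamma once the iteration count reaches each stepvalue; the function name multistep_lr is hypothetical and is not part of Caffe.

```python
def multistep_lr(iteration, base_lr, gamma, stepvalues):
    """Learning rate under a multistep policy (sketch, assumed semantics)."""
    # Count how many step boundaries have been reached so far.
    passed = sum(1 for step in stepvalues if iteration >= step)
    return base_lr * gamma ** passed
```

With the settings above, the rate would be 0.35 until iteration 50000, about 0.035 until iteration 70000, and about 0.0035 for the remaining iterations up to max_iter.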

	CLR=0.1-3.0:
	$SOLVER should be the provided "clrsolver.prototxt". 
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 5000
		max_iter: 10000
		weight_decay: 1e-4
		momentum: 0.9
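
For reference, the "triangular" policy can be sketched as follows, using the formula given in the cyclical learning rates paper cited above. This is an illustration of the schedule only, not the Caffe implementation, and the function name triangular_lr is hypothetical.

```python
import math


def triangular_lr(iteration, base_lr, max_lr, stepsize):
    """Triangular cyclical learning rate (sketch of the CLR paper formula)."""
    # Each cycle is 2 * stepsize iterations long; cycles are numbered from 1.
    cycle = math.floor(1 + iteration / (2.0 * stepsize))
    # x runs 1 -> 0 -> 1 across one cycle, so the rate rises and then falls.
    x = abs(iteration / float(stepsize) - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

With base_lr 0.1, max_lr 3.0 and stepsize 5000, the rate rises linearly from 0.1 to 3.0 over the first 5000 iterations and returns to 0.1 at iteration 10000, which is why max_iter is twice the stepsize: the run covers exactly one cycle.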

Fig. 1b:
$SOLVER should be the provided "clrsolver.prototxt". 
	Stepsize=10k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 10000
		max_iter: 20000
	Stepsize=5k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 5000
		max_iter: 10000
	Stepsize=3k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 3000
		max_iter: 6000
	Stepsize=1k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 1000
		max_iter: 2000
		
Fig. 2a:
	Figure reproduced from Smith [2017] with permission.

Fig. 2b:
$SOLVER should be the provided "lrRangeSolver.prototxt". 
	Max Iter=5k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 3.0
		stepsize: 5000
		max_iter: 5000
	Max Iter=20k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 3.0
		stepsize: 20000
		max_iter: 20000
	Max Iter=100k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 3.0
		stepsize: 100000
		max_iter: 100000
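
Because base_lr is 0 and stepsize equals max_iter in these runs, the triangular policy never reaches its descending half, so the learning rate is a straight ramp from 0 to max_lr. The sketch below maps each test point to the learning rate in effect at that iteration, which is what is needed to plot accuracy against learning rate. The function name range_test_lrs is hypothetical, and the test_interval value is an assumption that should be read from the solver file actually used.

```python
def range_test_lrs(max_lr, max_iter, test_interval):
    """(iteration, learning rate) at each test point of a 0 -> max_lr ramp."""
    iterations = range(0, max_iter + 1, test_interval)
    return [(it, max_lr * it / float(max_iter)) for it in iterations]
```

For the Max Iter=5k run, range_test_lrs(3.0, 5000, 100) assumes a test every 100 iterations and pairs iteration 2500 with a learning rate of 1.5.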

Fig. 3a:
	Figure reproduced from Goodfellow et al. [2014] with permission.
	
Fig. 3b:
	Figure reproduced from Goodfellow et al. [2014] with permission.

Fig. 4a:
$SOLVER should be the provided "lrRangeSolver.prototxt". 
	Single network:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 1.5
		stepsize: 20000
		max_iter: 20000

Fig. 4b:
	Resnet-20:
		net: ".../Resnet20Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 1.5
		stepsize: 20000
		max_iter: 20000
	Resnet-110:
		net: ".../Resnet110Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 1.5
		stepsize: 20000
		max_iter: 20000

Fig. 5a:
	Need to write out snapshots every iteration for 300 iterations
	Modify L2TermComputation3.py:
		netProto = 
		presnapshot = 
		postsnapshot = 
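
The three variables above are left blank and must be filled in with local paths. As an aid, the sketch below lists consecutive snapshot file pairs, assuming Caffe's usual snapshot naming of <snapshot_prefix>_iter_<N>.caffemodel. It also assumes that presnapshot and postsnapshot refer to the snapshots before and after an update, which is inferred from the variable names rather than confirmed by the script. The function name snapshot_pairs and the prefix are hypothetical.

```python
def snapshot_pairs(prefix, first_iter, last_iter, step=1):
    """List (pre, post) snapshot paths for consecutive saved iterations."""
    pairs = []
    for it in range(first_iter, last_iter, step):
        # Assumed Caffe naming: <prefix>_iter_<N>.caffemodel
        pre = "%s_iter_%d.caffemodel" % (prefix, it)
        post = "%s_iter_%d.caffemodel" % (prefix, it + step)
        pairs.append((pre, post))
    return pairs
```

For snapshots written every iteration up to 300, snapshot_pairs(prefix, 1, 300) gives 299 pairs; for snapshots written every 10th iteration, pass step=10.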


Fig. 5b:
	Need to write out snapshots every 10th iteration for 10000 iterations
	Modify L2TermComputation4.py:
		netProto = 
		presnapshot = 
		postsnapshot = 

Fig. 6a:
	Same solver settings as Fig. 1a. 
	Training LMDB (or other source) listed within architecture must be re-made with fewer samples.

Fig. 6b:
	$SOLVER should be the provided "solver.prototxt". 	
	Resnet-110 LR=0.35:
		net: ".../Resnet110Cifar.prototxt"
		lr_policy: "multistep"
		stepvalue: 50000
		stepvalue: 70000
		base_lr: 0.35
		gamma: 0.1
		max_iter: 80000
	$SOLVER should be the provided "clrsolver.prototxt". 
	Resnet-110 CLR=0.1-3 SS=10k:
		net: ".../Resnet110Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3
		stepsize: 10000
		max_iter: 20000
	$SOLVER should be the provided "solver.prototxt". 	
	Resnet-20 LR=0.35:
		net: ".../Resnet20Cifar.prototxt"
		lr_policy: "multistep"
		stepvalue: 50000
		stepvalue: 70000
		base_lr: 0.35
		gamma: 0.1
		max_iter: 80000
	$SOLVER should be the provided "clrsolver.prototxt". 
	Resnet-20 CLR=0.1-3 SS=10k:
		net: ".../Resnet20Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3
		stepsize: 10000
		max_iter: 20000

Fig. 7a:
The dataset used by the network must be changed to CIFAR-100.
$SOLVER should be the provided "lrRangeSolver.prototxt". 
	Single network:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 3.0
		stepsize: 20000
		max_iter: 20000

Fig. 7b:
The dataset used by the network must be changed to CIFAR-100.
	LR=0.35:
	$SOLVER should be the provided "solver.prototxt". 
		net: ".../Resnet56Cifar.prototxt" 
		lr_policy: "multistep"
		stepvalue: 50000
		stepvalue: 70000
		base_lr: 0.35
		gamma: 0.1
		max_iter: 80000
	CLR=0.1-3 SS=5k:
	$SOLVER should be the provided "clrsolver.prototxt". 
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 5000
		max_iter: 10000
	
Fig. 8a:
$SOLVER should be the provided "solver.prototxt". 
	LR=0.35:
	All use solver settings from LR=0.35 in Fig. 1a, but with solver type changed.
		type: "Nesterov"
		type: "AdaDelta"
		type: "AdaGrad"  and remove momentum
		type: "Adam"     and base_lr:  0.0035

Fig. 8b:
	LR=0.35:
	$SOLVER should be the provided "solver.prototxt". 
	Same solver settings as LR=0.35 in Fig. 1a, but with:
		type: "Nesterov"
	CLR=0.1-3 SS=5k:
	$SOLVER should be the provided "clrsolver.prototxt". 
	Same solver settings as CLR=0.1-3.0 in Fig. 1a, but with:
		type: "Nesterov"

Fig. 9a:
	Same solver settings as CLR=0.1-3.0 in Fig. 1a, but with batch_size changed within the architecture.
	
Fig. 9b:
	Same solver settings as CLR=0.1-3.0 in Fig. 1a, but with dropout ratio changed within architecture.

Fig. 10a:
	Same solver settings as CLR=0.1-3.0 in Fig. 1a, but with momentum changed.
	
Fig. 10b:
	Same solver settings as CLR=0.1-3.0 in Fig. 1a, but with weight_decay changed.
	
Fig. 11a:
$SOLVER should be the provided "clrsolver.prototxt". 
	Single network:
		net: ".../bottleneckResnet56.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 0.61
		stepsize: 50000
		max_iter: 50000

Fig. 11b:
$SOLVER should be the provided "clrsolver.prototxt". 
	Single network:
		net: ".../ResNeXt56.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 0.7
		stepsize: 50000
		max_iter: 50000
