
One Cycle Learning Rate Policy for Keras

Implementation of the One-Cycle learning rate policy from the papers by Leslie N. Smith.

Contains two Keras callbacks, LRFinder and OneCycleLR, which are ported from the PyTorch Fast.ai library.

What is One Cycle Learning Rate

One-cycle training gradually increases the learning rate (and, optionally, gradually decreases the momentum) during the first half of the cycle, then gradually decreases the learning rate (and, optionally, increases the momentum) during the second half.

Finally, during the last few percent of the cycle (set by end_percentage), the learning rate is sharply reduced every epoch.

[Figure: learning rate schedule]

[Figure: optional momentum schedule]
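The schedule described above can be sketched as a pure function of the training step. This is a simplified, linear approximation for illustration only: the function name, the `div_factor` parameter (ratio between max_lr and the starting learning rate) and the exact interpolation are assumptions, not the formula used inside clr.py. The 1/1000th floor for the final learning rate follows the description of end_percentage in the Training with OneCycleLR section.

```python
def one_cycle_schedule(step, total_steps, max_lr, end_percentage=0.1,
                       div_factor=10.0, maximum_momentum=0.95,
                       minimum_momentum=0.85):
    """Return (learning_rate, momentum) for a given training step.

    Simplified linear one-cycle sketch; `div_factor` is a hypothetical
    parameter and the interpolation is not taken from clr.py.
    """
    start_lr = max_lr / div_factor
    final_lr = max_lr / 1000.0          # lowest value reached at the very end
    cycle_steps = int(total_steps * (1.0 - end_percentage))
    mid = max(1, cycle_steps // 2)

    if step <= mid:
        # First half: learning rate rises, momentum falls.
        frac = step / mid
        lr = start_lr + frac * (max_lr - start_lr)
        momentum = maximum_momentum - frac * (maximum_momentum - minimum_momentum)
    elif step <= cycle_steps:
        # Second half: learning rate falls, momentum rises again.
        frac = (step - mid) / max(1, cycle_steps - mid)
        lr = max_lr - frac * (max_lr - start_lr)
        momentum = minimum_momentum + frac * (maximum_momentum - minimum_momentum)
    else:
        # Final end_percentage of training: sharp annealing, momentum held high.
        frac = (step - cycle_steps) / max(1, total_steps - cycle_steps)
        lr = start_lr - frac * (start_lr - final_lr)
        momentum = maximum_momentum
    return lr, momentum
```

Here `step` counts optimizer updates (batches), so `total_steps` would be roughly steps per epoch times the number of epochs.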

Usage

Finding a good learning rate

Use LRFinder to obtain a loss plot, and visually inspect it to determine a good maximum learning rate. Below is an example, used for the MiniMobileNetV2 model.

An example script is provided in models/mobilenet/find_lr_schedule.py.

Essentially,

from clr import LRFinder

lr_callback = LRFinder(num_samples, batch_size,
                       minimum_lr, maximum_lr,
                       # validation_data=(X_val, Y_val),
                       lr_scale='exp', save_dir='path/to/save/directory')

# Ensure that number of epochs = 1 when calling fit()
model.fit(X, Y, epochs=1, batch_size=batch_size, callbacks=[lr_callback])

The callback above needs a few things:

  • The number of samples in the dataset (here, 50k for CIFAR-10) and the batch size that will be used during training must be supplied.
  • lr_scale is set to 'exp', which is useful when searching over a large range of learning rates. Set it to 'linear' to search a smaller range.
  • save_dir: automatically saves the results of LRFinder to the specified directory path. This is highly encouraged.
  • validation_data: provide the validation data as a tuple to use it for the loss plot instead of the training batch loss. Since the validation set can be very large, k random batches (k * batch_size samples) are drawn from it to give a quick estimate of the validation loss. The default value of k can be changed through validation_sample_rate.
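The validation sampling described in the last bullet can be sketched with NumPy as below. `sample_validation_batches` is a hypothetical helper written to illustrate the idea (draw k * batch_size random rows, capped at the size of the validation set); it is not the code inside LRFinder, and it assumes NumPy arrays rather than generators.

```python
import numpy as np

def sample_validation_batches(X_val, Y_val, batch_size, validation_sample_rate,
                              rng=None):
    """Randomly draw k * batch_size validation samples, k = validation_sample_rate.

    Hypothetical helper illustrating the sampling described above; it is not
    the implementation used by LRFinder.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Never ask for more samples than the validation set holds.
    num_samples = min(batch_size * validation_sample_rate, X_val.shape[0])
    # Sample row indices without replacement and keep X/Y pairs aligned.
    idx = rng.choice(X_val.shape[0], size=num_samples, replace=False)
    return X_val[idx], Y_val[idx]
```

Evaluating on such a subset after each batch, for example with `model.evaluate(X_sub, Y_sub, verbose=0)`, gives a cheap but noisy validation loss, which is why the resulting plots are more erratic.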

Note: When using validation_data, be careful when setting the learning rate, momentum and weight decay schedules. The loss plots will be more erratic because of the sampling of the validation set.

Note 2:

  • It is faster to find the learning rate without validation_data, and then find the weight decay and momentum for that learning rate with validation_data.
  • You can also use LRFinder to find the optimal weight decay and momentum values; see the example scripts find_momentum_schedule.py and find_weight_decay_schedule.py in the models/mobilenet/ folder.

There are two ways to visualize the plot:

  • Use lr_callback.plot_schedule() after the fit() call. This uses the current training session results.
  • Use class method LRFinder.plot_schedule_from_dir('path/to/save/directory') to visualize the plot separately from the training session. This only works if you used the save_dir argument to save the results of the search to some location.
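The plot is a learning rate range test: within a single epoch, the learning rate is raised after every batch from minimum_lr to maximum_lr, and the loss at each step is recorded, so the x-axis of the plot is the learning rate used at each batch. The per-batch learning rates for the two lr_scale modes can be sketched as follows. `lr_sweep` is a hypothetical helper, and counting a partial last batch (the ceiling division) is an assumption; see the issue about the last batch further below.

```python
import numpy as np

def lr_sweep(num_samples, batch_size, minimum_lr, maximum_lr, lr_scale="exp"):
    """Return one learning rate per batch of a single epoch (hypothetical helper)."""
    # Ceiling division: a partial last batch still counts as a step.
    num_batches = -(-num_samples // batch_size)
    if lr_scale == "exp":
        # Evenly spaced in log space: suited to wide searches.
        return np.geomspace(minimum_lr, maximum_lr, num=num_batches)
    if lr_scale == "linear":
        # Evenly spaced: suited to narrow searches.
        return np.linspace(minimum_lr, maximum_lr, num=num_batches)
    raise ValueError("lr_scale must be 'exp' or 'linear'")
```

In a Keras callback, each value would typically be applied before its batch (for example with `K.set_value(model.optimizer.lr, value)`), and the batch loss recorded afterwards.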

Finding the optimal Momentum

Use the find_momentum_schedule.py script inside models/mobilenet/ for an example.

Some notes :

  • Use a grid search over a few possible momentum values, such as [0.8, 0.85, 0.9, 0.95, 0.99]. Use 'linear' as the lr_scale argument value.

  • Set the momentum value manually on the SGD optimizer before compiling the model.

  • Plot the curves at the end and visually check which momentum value yields the least noisy / lowest losses overall. The absolute value of the loss matters less than the shape of the curve.

  • It is better to supply validation_data here.

  • The plot will be very noisy, so you may want to use a larger value of loss_smoothing_beta (such as 0.99 or 0.995).

  • The actual loss values don't matter as much as the overall movement of the curve. Choose the value whose curve is steadier and still reaches the lowest loss even at large learning rates.
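The smoothing controlled by loss_smoothing_beta and the "steadier and lowest at large learning rates" rule above can be approximated in code. The sketch below uses the common bias-corrected exponential moving average (as popularized by Fast.ai); it is not copied from clr.py. `rank_candidates` and its tail-mean score are a hypothetical heuristic for comparing grid-search runs without eyeballing plots, and assume each candidate's losses come from the same learning rate sweep.

```python
import numpy as np

def smooth_losses(losses, beta=0.98):
    """Exponential moving average of the loss, with bias correction."""
    smoothed = []
    running = 0.0
    for t, loss in enumerate(losses, start=1):
        running = beta * running + (1.0 - beta) * loss
        # Divide by (1 - beta^t) so early values are not biased toward zero.
        smoothed.append(running / (1.0 - beta ** t))
    return np.array(smoothed)

def rank_candidates(curves, beta=0.98, tail_fraction=0.5):
    """Order candidate values (e.g. momentum) from best to worst.

    `curves` maps each candidate to the raw losses of one LRFinder sweep.
    Heuristic score: mean smoothed loss over the last `tail_fraction` of the
    sweep, i.e. the large-learning-rate end of the plot (lower is better).
    """
    scores = {}
    for value, losses in curves.items():
        smoothed = smooth_losses(losses, beta)
        start = int(len(smoothed) * (1.0 - tail_fraction))
        scores[value] = float(np.mean(smoothed[start:]))
    return sorted(scores, key=scores.get)
```

A larger beta averages over more batches, which is why 0.99 or 0.995 gives visibly smoother curves at the cost of some lag.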

Finding the optimal Weight Decay

Use the find_weight_decay_schedule.py script inside models/mobilenet/ for an example

Some notes :

  • Use a grid search over a few weight decay values, such as [1e-3, 1e-4, 1e-5, 1e-6, 1e-7]. Call this the "coarse search" and use 'linear' for the lr_scale argument.

  • Then use a grid search over a select few weight decay values, such as [3e-7, 1e-7, 3e-6]. Call this the "fine search" and again use 'linear' for the lr_scale argument.

  • Set the weight decay value manually when building the model.

  • Plot the curves at the end and visually check which weight decay value yields the least noisy / lowest losses overall. The absolute value of the loss matters less than the shape of the curve.

  • It is better to supply validation_data here.

  • The plot will be very noisy, so you may want to use a larger value of loss_smoothing_beta (such as 0.99 or 0.995).

  • The actual loss values don't matter as much as the overall movement of the curve. Choose the value whose curve is steadier and still reaches the lowest loss even at large learning rates.
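The coarse-then-fine procedure above can be sketched as a two-stage grid search. `evaluate` is a hypothetical callable: in practice it would build the model with the given weight decay, run LRFinder, and turn the resulting curve into a single score (lower is better). Using the two half-decade neighbours (factor 3) of the best coarse value for the fine stage is an assumption that roughly matches the example values above.

```python
def coarse_to_fine(evaluate, coarse=(1e-3, 1e-4, 1e-5, 1e-6, 1e-7)):
    """Two-stage grid search over weight decay values.

    `evaluate` is a hypothetical callable mapping a weight decay value to a
    score where lower is better (e.g. a mean smoothed LRFinder loss).
    """
    # Coarse search over whole decades.
    scores = {wd: evaluate(wd) for wd in coarse}
    best = min(scores, key=scores.get)
    # Fine search: roughly half a decade on either side of the best value.
    for wd in (best / 3.0, best * 3.0):
        scores[wd] = evaluate(wd)
    return min(scores, key=scores.get)
```

With five coarse and two fine values this costs seven LRFinder runs, instead of the many more a single dense grid over 1e-3 to 1e-7 would need.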

Interpreting the plot

Learning Rate

Consider the above plot, obtained by using LRFinder on the MiniMobileNetV2 model. In particular, there are a few regions that need careful interpretation.

Note: The values are in log10 scale (since 'exp' was used for lr_scale); all values discussed refer to positions on the x-axis (learning rate):

  • After the -1.5 point on the graph, the loss becomes erratic.
  • After the 0.5 point on the graph, the loss is noisy but doesn't decrease any further.
  • -1.7 is the last relatively smooth portion before the -1.5 region. To be safe, we could move a little further left, closer to -1.8, but this will reduce performance.
  • It is usually important to visualize the first 2-3 epochs of OneCycleLR training with values close to these edges to determine which works best.
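For readers who cannot or prefer not to inspect the plot, a rough number can also be read off the recorded curve programmatically. The sketch below uses a different, commonly used rule of thumb (the Fast.ai "minimum loss divided by 10" suggestion), not the edge-of-smooth-region reading described above, so treat its output only as a starting point for the 2-3 epoch checks. `suggest_max_lr` is hypothetical and assumes you already have the learning rates and smoothed losses of one sweep as arrays.

```python
import numpy as np

def suggest_max_lr(lrs, smoothed_losses, backoff=10.0):
    """Suggest a maximum learning rate from an LR range test curve.

    Rule of thumb (not this README's procedure): take the learning rate at
    the minimum smoothed loss and divide it by `backoff`.
    Returns (learning_rate, log10_learning_rate).
    """
    lrs = np.asarray(lrs, dtype=float)
    smoothed_losses = np.asarray(smoothed_losses, dtype=float)
    best = int(np.argmin(smoothed_losses))
    lr = lrs[best] / backoff
    # The log10 value corresponds to the x-axis position on an 'exp' plot.
    return float(lr), float(np.log10(lr))
```

Going the other way, an x-axis position x read off the plot corresponds to a learning rate of 10 ** x; for example, 10 ** -1.7 is about 0.01995.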

Momentum

Using the learning rate found above, next find the optimal momentum (find_momentum_schedule.py).

See the notes in the Finding the optimal momentum section on how to interpret the plot.

Weight Decay

Similarly, it is possible to use the above learning rate and momentum values to calculate the optimal weight decay (find_weight_decay_schedule.py).

Note: Because large learning rates act as a strong regularizer, other regularization techniques such as weight decay and dropout should be reduced significantly to train the model properly.

It is best to first search regularization strengths between 1e-3 and 1e-7, and then fine-search the region that gave the best overall plot.

See the notes in the Finding the optimal weight decay section on how to interpret the plot.

Training with OneCycleLR

Once we have found the maximum learning rate, we can move on to using the OneCycleLR callback with SGD to train our model.

from clr import OneCycleLR

# The accepted positional arguments depend on the version of clr.py (see the issues below)
lr_manager = OneCycleLR(num_samples, num_epoch, batch_size, max_lr,
                        end_percentage=0.1, scale_percentage=None,
                        maximum_momentum=0.95, minimum_momentum=0.85)
                        
model.fit(X, Y, epochs=EPOCHS, batch_size=batch_size, callbacks=[model_checkpoint, lr_manager], 
          ...)

There are many parameters; the most important ones are:

  • You must provide a lot of training information: number of samples, number of epochs, batch size and max learning rate.
  • end_percentage determines what percentage of the training epochs is used for the steep reduction of the learning rate. At its minimum, the learning rate is 1/1000th of the provided max_lr.
  • scale_percentage is a confusing parameter. It dictates the scaling factor of the learning rate in the second half of the training cycle. It is best to test it visually using the plot_clr.py script to ensure there are no mistakes. Leaving it as None defaults to the same percentage as the provided end_percentage.
  • maximum_momentum / minimum_momentum are preset according to the paper and Fast.ai. If you don't want to cycle the momentum, set both to the same value; 0.9 is generally preferred as the momentum value for SGD. If you don't want to update the momentum at all, or are not using SGD (not advisable), set both to None to ignore the momentum updates.
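To get a feel for what these parameters mean in practice, the arithmetic below estimates how many optimizer steps a run has, how many of them fall in the final end_percentage phase, and the lowest learning rate reached. `one_cycle_budget` is a hypothetical helper, counting a partial last batch as a step is an assumption, and the batch size of 128 in the example is illustrative (the README does not state the one used).

```python
import math

def one_cycle_budget(num_samples, batch_size, num_epochs, max_lr,
                     end_percentage=0.1):
    """Back-of-the-envelope numbers for a one-cycle run (illustrative only)."""
    # Assumption: a partial last batch counts as one step.
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    # Steps spent in the final, sharp learning rate reduction.
    end_steps = int(round(total_steps * end_percentage))
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "end_steps": end_steps,
        "min_lr": max_lr / 1000.0,  # 1/1000th of max_lr, as described above
    }

budget = one_cycle_budget(50000, 128, 40, 0.02)
```

For 50k CIFAR-10 samples, a batch size of 128 and 40 epochs, this gives 391 steps per epoch and 15640 steps in total, of which 1564 are spent in the final annealing phase, ending at a learning rate of about 2e-5.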

Results

  • -1.7 is chosen as the maximum learning rate (in log10 space) for the OneCycleLR schedule. Since this is in log10 scale, we use 10 ^ (x) to get the actual maximum learning rate. Here, 10 ^ -1.7 ≈ 0.01995, so we round up to a maximum learning rate of 0.02.
  • 0.9 is chosen as the maximum momentum from the momentum plot. Since cyclic momentum updates are used, a slightly lower value (0.85) is chosen as the minimum for faster training.
  • 3e-6 is chosen as the weight decay factor.

For the MiniMobileNetV2 model, 2 passes of OneCycleLR with SGD (40 epochs with max lr = 0.02; 30 epochs with max lr = 0.005) obtained 90.33%. This may not seem like much, but the model has only 650k parameters; in comparison, the same model trained with Adam at an initial learning rate of 2e-3 did not reach the same score in over 100 epochs (89.14%).

Requirements

  • Keras 2.1.6+
  • Tensorflow (tested) / Theano / CNTK for the backend
  • matplotlib to visualize the plots.


keras-one-cycle's Issues

Don't support tensorflow Datasets when use validation

Currently, it seems this tool assumes the validation data to be NumPy arrays or a Keras generator, while tensorflow.data.Dataset, one of the most frequently used input data structures, is not supported. I notice that the raised error comes from indexing, counting and sampling on the validation dataset, which is a tf.data.Dataset. This should be easy to solve. Could you please add this feature?

LRFinder smoothing

The LR smoothing in line 336 isn't done right.
running_loss = self.loss_smoothing_beta * loss + (1. - self.loss_smoothing_beta) * loss
The first loss should be self.running_loss_, and self.running_loss_ should be updated right after.

Request for an interpretation about what these callbacks did

I know there is a "plot interpretation" section already. But I still cannot get the idea about what your callback did and what is the meaning of the plot. Why there will be a curve for "loss versus learning rate"? Does your LrFinder change the learning rate among steps? Since we know the best learning rate will change as training goes on, what is the meaning of the found learning rate by LrFinder()? what about the learning rate set in the optimizer? .... It would be very helpful for us to understand your tool if you introduce briefly what those callbacks did in the training. Just a simple introduction without details will be very very useful. Thanks

CLR failing with KeyError: 'batch_size' during model.fit

Have defined my lr callback as follows

lr_manager = OneCycleLR(max_lr,end_percentage,
scale_percentage,maximum_momentum,
minimum_momentum,verbose=True)
Then have further created the callback

callbackscustomx = [lr_manager,
ModelCheckpoint(filepathx, monitor='val_acc', verbose=1, save_best_only=True)]

But then model.fit is failing ....
modelCLR = model1.fit_generator(Customdatagen.flow(train_features, train_labels, batch_size=64),
steps_per_epoch=len(train_features) / 32,
epochs=nb_epoch,
verbose=1,
shuffle=False,
callbacks=callbackscustomx,
validation_data=(test_features, test_labels)
)

It is failing with the following error. It looks like CLR expects a batch size, but then we can't pass the value?
147
148 self.epochs = self.params['epochs']
--> 149 self.batch_size = self.params['batch_size']
150 self.samples = self.params['samples']
151 self.steps = self.params['steps']

KeyError: 'batch_size'

strange LR vs loss plot

Hi, thanks for creating this repo! This is likely not a bug, but I'm curious if you would be able to interpret for me what this lr vs. loss plot means? It looks quite different from the plots I've gotten in the past from the Fastai library.

Automatically Finding the Values Without Plots?

Hi,
I'm visually impaired, so I can't look at the plots.
I'm wondering if it's possible to implement a feature for the script to find the right values, so users don't have to look at the plots and decide.
If not, do you have some suggestions on how I can interpret the data without visualization?
Thanks,

validation data generator

Can I use ImageDataGenerator.flow_from_directory for the validation set when finding a momentum value?

Do I need weight decay?

My model isn't using any regularizers. Do I need weight decay? Or should I add L2?
(The model in question is a mix of 3D and 2D convolutions.)

tensorflow 2.0

File "F:/workspace/python/tf-fit/keras_clr/clr.py", line 603, in
inp = Input(shape=(10,))
File "E:\Anaconda3_5_0_0\envs\DeepLearning\lib\site-packages\keras\engine\input_layer.py", line 178, in Input
input_tensor=tensor)
File "E:\Anaconda3_5_0_0\envs\DeepLearning\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "E:\Anaconda3_5_0_0\envs\DeepLearning\lib\site-packages\keras\engine\input_layer.py", line 39, in init
name = prefix + '_' + str(K.get_uid(prefix))
File "E:\Anaconda3_5_0_0\envs\DeepLearning\lib\site-packages\keras\backend\tensorflow_backend.py", line 74, in get_uid
graph = tf.get_default_graph()
AttributeError: module 'tensorflow' has no attribute 'get_default_graph'

AttributeError: 'LRFinder' object has no attribute '_implements_train_batch_hooks'

I received this output when using data generators:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-1aa4d91b8f8d> in <module>
      1 model.compile(optimizer='adam', loss=losses.CategoricalCrossentropy(),metrics=['accuracy'])
      2 lr_callback = LRFinder(train_generator.n, train_generator.batch_size)
----> 3 history = model.fit_generator(train_generator, validation_data=val_generator, epochs=10, callbacks = [lr_callback])

/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py in new_func(*args, **kwargs)
    322               'in a future version' if date is None else ('after %s' % date),
    323               instructions)
--> 324       return func(*args, **kwargs)
    325     return tf_decorator.make_decorator(
    326         func, new_func, 'deprecated',

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1477         use_multiprocessing=use_multiprocessing,
   1478         shuffle=shuffle,
-> 1479         initial_epoch=initial_epoch)
   1480 
   1481   @deprecation.deprecated(

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
     64   def _method_wrapper(self, *args, **kwargs):
     65     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
---> 66       return method(self, *args, **kwargs)
     67 
     68     # Running inside `run_distribute_coordinator` already.

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
    824             verbose=verbose,
    825             epochs=epochs,
--> 826             steps=data_handler.inferred_steps)
    827 
    828       self.stop_training = False

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py in __init__(self, callbacks, add_history, add_progbar, model, **params)
    229     # pylint: disable=protected-access
    230     self._should_call_train_batch_hooks = any(
--> 231         cb._implements_train_batch_hooks() for cb in self.callbacks)
    232     self._should_call_test_batch_hooks = any(
    233         cb._implements_test_batch_hooks() for cb in self.callbacks)

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py in <genexpr>(.0)
    229     # pylint: disable=protected-access
    230     self._should_call_train_batch_hooks = any(
--> 231         cb._implements_train_batch_hooks() for cb in self.callbacks)
    232     self._should_call_test_batch_hooks = any(
    233         cb._implements_test_batch_hooks() for cb in self.callbacks)

AttributeError: 'LRFinder' object has no attribute '_implements_train_batch_hooks'

no find_schedule_cifar_10.py

Just an aside: there is no "find_schedule_cifar_10.py" file in the repo.

Did you mean "./models/mobilenet/find_lr_schedule.py" ?

__init__() got multiple values for argument 'end_percentage'

When I use
clr_triangular = OneCycleLR(NUM_SAMPLES, NUM_EPOCHS, BATCH_SIZE, MAX_LR, end_percentage=0.2, scale_percentage=0.2)
in your plot_clr.py,
I get the following error:
__init__() got multiple values for argument 'end_percentage'

But if I use
clr_triangular = OneCycleLR(max_lr=0.02, maximum_momentum=0.9, verbose=True)
it works well.

The last batch not considered when lr_scale="linear"

self.num_batches_ = num_samples // batch_size - 1

When using the LRFinder for searching linearly (lr_scale="linear") over some sample space, it raises a "list out of bounds" error. The problem seems to be in the line above.

It should be the following instead, I think.

self.num_batches_ = num_samples // batch_size

Now, after adding the extra_batch, things work perfectly.

keras-one-cycle/clr.py

Lines 282 to 284 in ec71a67

else:
    extra_batch = int((num_samples % batch_size) != 0)
    self.lr_multiplier_ = np.linspace(minimum_lr, maximum_lr, num=self.num_batches_ + extra_batch)

Not working for ImageDataGenerator.flow_from_directory()

Have used ImageDataGenerator.flow_from_directory() and while trying to fit using

fit_generator(generator=train_gen,steps_per_epoch=steps_train,
validation_data=valid_gen,epochs=1, verbose=1,
validation_steps=steps_valid,
callbacks=[lr_finder, model_checkpoint])

getting the error :

clr.py in on_batch_end(self, batch, logs)
    353             num_samples = self.batch_size * self.validation_sample_rate
    354 
--> 355             if num_samples > X.shape[0]:
    356                 num_samples = X.shape[0]
    357 

AttributeError: 'tuple' object has no attribute 'shape'

I think it's because of using train and validation_datagen instead of (X_train, Y_train), (X_test, Y_test) in fit_generator() as done in the provided example.
How could I make it work for train_generator and valid_generator ( data iterators) instead of (X_train, Y_train)?

LRFinder verbose mode prints progress on multiple lines

When verbose is used, LRFinder prints progress on multiple lines, which is ugly.
The progress bar should be printed on the same line.

256/298 [========================>.....] - ETA: 38s - loss: 3.3550 - accuracy: 0.4697 - LRFinder: lr = 1.49450212
257/298 [========================>.....] - ETA: 37s - loss: 3.3483 - accuracy: 0.4698 - LRFinder: lr = 1.56541953
258/298 [========================>.....] - ETA: 36s - loss: 3.3515 - accuracy: 0.4680 - LRFinder: lr = 1.63970221
259/298 [=========================>....] - ETA: 35s - loss: 3.3479 - accuracy: 0.4662 - LRFinder: lr = 1.71750974
260/298 [=========================>....] - ETA: 34s - loss: 3.3386 - accuracy: 0.4673 - LRFinder: lr = 1.79900942
261/298 [=========================>....] - ETA: 33s - loss: 3.3326 - accuracy: 0.4655 - LRFinder: lr = 1.88437646
262

Documentation has different initialization

The initialization in clr.py is:
def __init__(self, num_samples, batch_size, max_lr, end_percentage=0.1, scale_percentage=None, maximum_momentum=0.95, minimum_momentum=0.85, verbose=True)

The initialization in the documentation is:
lr_manager = OneCycleLR(num_samples, num_epoch, batch_size, max_lr end_percentage=0.1, scale_percentage=None, maximum_momentum=0.95, minimum_momentum=0.85)

This causes unexpected behaviour in the function compute_lr.

How do you install keras-one-cycle on Anaconda

I went to PyPI and there they say you could install using

pip install keras-one-cycle-lr

which I did but then when I execute

from clr import LRFinder

I get

ModuleNotFoundError: No module named 'clr'
