
tony-y / pytorch_warmup

363 stars · 5 watchers · 25 forks · 6.02 MB

Learning Rate Warmup in PyTorch

Home Page: https://tony-y.github.io/pytorch_warmup/

License: MIT License

Python 100.00%
pytorch warmup adam learning-rate-scheduling deep-learning

pytorch_warmup's People

Contributors

tony-y


pytorch_warmup's Issues

Why is warmup better than RAdam?

I've argued in LiyuanLucasLiu/RAdam#62 that if warmup and RAdam are equivalent, using RAdam might be simpler. However, I'd be curious about the arguments in favour of warmup presented in this repo and the related paper.

What are the reasons to choose warmup instead of RAdam?

Can the warmup_scheduler update the learning rate every epoch and not every batch?

Hi,
If I want the warmup_scheduler to update the learning rate after every epoch rather than after every batch, should I just do the following (i.e. use dampening() after every epoch)?
for epoch in range(1, num_epochs + 1):
    for idx, batch in enumerate(dataloader):
        optimizer.zero_grad()
        loss = ...
        loss.backward()
        optimizer.step()
    with warmup_scheduler.dampening():
        lr_scheduler.step(epoch + idx / iters)

Thanks!
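
For comparison, the pattern this repo's README suggests for combining per-step warmup with an epoch-wise LR scheduler looks roughly like the sketch below. It is only a sketch: names such as model, criterion, optimizer, dataloader and num_epochs are assumed to exist rather than taken from the question above.

import torch
import pytorch_warmup as warmup

# Assumed setup: model, criterion, optimizer, dataloader, num_epochs defined elsewhere.
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 20], gamma=0.1)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)

for epoch in range(1, num_epochs + 1):
    for batch, target in dataloader:
        optimizer.zero_grad()
        loss = criterion(model(batch), target)
        loss.backward()
        optimizer.step()
        # Enter dampening() once per optimizer step so the warmup factor keeps advancing,
        # but leave the body empty: the epoch-wise scheduler is not stepped here.
        with warmup_scheduler.dampening():
            pass
    # Step the epoch-wise LR scheduler once per epoch, still inside dampening().
    with warmup_scheduler.dampening():
        lr_scheduler.step()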

no attribute named dampening

I tried to run your emnist example and got this error:
Traceback (most recent call last):
  File "main.py", line 163, in <module>
    main()
  File "main.py", line 152, in main
    warmup_scheduler, epoch, history)
  File "main.py", line 42, in train
    with warmup_scheduler.dampening():
AttributeError: 'UntunedLinearWarmup' object has no attribute 'dampening'

I can't seem to get any of your warmups to work. Do you have any idea why that might be?

Thanks so much!

Unexpected keyword argument `warmup_period`

Hi,

I just installed your library through pip install -U pytorch_warmup

I tried the following

a = warmup.UntunedLinearWarmup(optimizer, warmup_period=500)

This gives me the following error message: TypeError: __init__() got an unexpected keyword argument 'warmup_period'

If I try

a = warmup.UntunedLinearWarmup(optimizer)

I get the following for a.warmup_params:

[{'warmup_period': 1999}]
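
For context, a hedged sketch of the difference between the two classes, assuming the pip-released API: LinearWarmup takes an explicit warmup_period, while UntunedLinearWarmup takes no period and derives one from the optimizer's beta2 (roughly 2 / (1 - beta2) optimizer steps), which is where the ~2000-step value above comes from.

import torch
import pytorch_warmup as warmup

model = torch.nn.Linear(10, 2)  # dummy model, only used to give the optimizer parameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Explicit warmup period (counted in optimizer steps):
tuned = warmup.LinearWarmup(optimizer, warmup_period=500)

# No warmup_period argument; the period is derived from beta2 instead:
untuned = warmup.UntunedLinearWarmup(optimizer)
print(untuned.warmup_params)  # something like [{'warmup_period': 1999}], as reported above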

About the learning rate in scheduler

I followed the tutorial to implement the warmup_scheduler, but the learning rate I get from get_last_lr() of torch.optim.lr_scheduler.MultiStepLR is the same as the initial learning rate. How should I get the learning rate after the warmup process?
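
If I read the dampening mechanism correctly, the inner scheduler's get_last_lr() reports the undampened schedule, while the value the optimizer actually uses sits in its param groups. A small sketch of reading it (an illustration, not code from the issue):

# Inside the training loop, after the `with warmup_scheduler.dampening():` block:
current_lrs = [group['lr'] for group in optimizer.param_groups]
print(current_lrs)  # the warmup-dampened learning rate(s) currently applied by the optimizer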

My lr jumped from 0.01 to 0.0498 without any linear signs.

Hello! I'm currently using your LinearWarmup, and somehow my lr started at 0.1 and then stayed at 0.0498 until the warmup period was over. I couldn't figure out why; here's part of my code.

model = torch.nn.DataParallel(model).cuda()
# args.lr * args.lrf = 0.05
optimizer = torch.optim.SGD(model.parameters(), args.lr * args.lrf,
                            momentum=args.momentum, weight_decay=args.weight_decay * args.wdf)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=23)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=5)

# Inside training, stepping by epoch, not by iteration
for i, (input, target) in enumerate(train_loader):
    # measure data loading time
    data_time.update(time.time() - end)

    if args.gpu is not None:
        input = input.cuda(args.gpu, non_blocking=True)
    target = target.cuda(args.gpu, non_blocking=True)

    # compute output
    output = model(input)
    loss = criterion(output, target)

    # measure accuracy and record loss
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
    losses.update(loss.item(), input.size(0))
    top1.update(acc1[0], input.size(0))
    top5.update(acc5[0], input.size(0))

    # compute gradient and do SGD step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # notice: pypi warmup project
    if i < len(train_loader) - 1 and warmup_scheduler is not None:
        with warmup_scheduler.dampening():
            pass

# when the epoch ends...
with warmup_scheduler.dampening():
    lr_scheduler.step()

Looking forward to your reply!

UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible.

Hi Tony,
I am using torch 1.9 and lr_scheduler.step(lr_scheduler.last_epoch + 1), but I got this UserWarning:
UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available.

Will this lead to some learning rate bugs?
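
For reference, the newer dampening-based usage does not pass an epoch to step(), which should sidestep this warning. A minimal sketch, assuming a pytorch_warmup release that provides dampening():

optimizer.step()
# No epoch argument is passed, so the chainable form of the scheduler is used.
with warmup_scheduler.dampening():
    lr_scheduler.step()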

UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`

Hi Tony, I got a warning similar to #5 when using warmup.UntunedLinearWarmup after I upgraded my PyTorch to 1.12.1:

UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

Would you help double-check whether I can still ignore the warning with my PyTorch version?
