tony-y / pytorch_warmup
Learning Rate Warmup in PyTorch
Home Page: https://tony-y.github.io/pytorch_warmup/
License: MIT License
I've argued in LiyuanLucasLiu/RAdam#62 that if warmup and RAdam are equivalent, using RAdam might be simpler. However, I'd be curious about the arguments in favour of warmup presented in this repo and the related paper.
What are the reasons to choose warmup instead of RAdam?
I see in the original paper https://arxiv.org/abs/1910.04209 there is a rule of thumb of
2 * (1 - beta_2)^(-1)
which seems to be for the warmup period (see the sketch below). But what about the decay rate?
related: LiyuanLucasLiu/RAdam#66
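For concreteness, this rule of thumb turns the optimizer's beta_2 directly into a warmup period; a minimal sketch of the arithmetic (this appears to be the quantity UntunedLinearWarmup uses):

beta2 = 0.999  # Adam's default
# Rule of thumb from arXiv:1910.04209: warmup_period = 2 * (1 - beta_2)^(-1),
# i.e. about 2000 steps for beta_2 = 0.999.
warmup_period = int(2 / (1 - beta2))
print(warmup_period)  # 1999 after floating-point truncation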
In every batch, I execute:

loss.backward()
optimizer.zero_grad()
optimizer.step()
with warmup_scheduler.dampening():
    lr_scheduler.step()

It doesn't have a warmup process.
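For comparison, the usage shown in the project README zeroes the gradients before the backward pass, so optimizer.step() acts on fresh gradients and the dampened scheduler step comes last; a minimal sketch, where compute_loss is a hypothetical stand-in for the forward pass:

for batch in dataloader:
    optimizer.zero_grad()
    loss = compute_loss(batch)  # hypothetical helper: forward pass + loss
    loss.backward()
    optimizer.step()
    with warmup_scheduler.dampening():
        lr_scheduler.step()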
The sdist on PyPI does not include the license file; it would be preferable to have the license file included with the distributions.
Hi,
If I want the 'warmup_scheduler' to update the learning rate after every epoch and not after every batch, should I just do as follows (using dampening() after every epoch)?

for epoch in range(1, num_epochs + 1):
    for idx, batch in enumerate(dataloader):
        optimizer.zero_grad()
        loss = ...
        loss.backward()
        optimizer.step()
    with warmup_scheduler.dampening():
        lr_scheduler.step(epoch + idx / iters)
Thanks!
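For reference, the pattern used in the project's examples for epoch-wise schedulers calls dampening() with pass on every iteration, so the warmup factor is still applied per step, and steps the LR scheduler only once per epoch. A self-contained sketch with toy stand-in data (the model, loss, and data are placeholders):

import torch
import pytorch_warmup as warmup

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
num_epochs = 10
dataloader = [torch.randn(4, 10) for _ in range(100)]  # toy stand-in data
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)

for epoch in range(num_epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean()  # placeholder loss
        loss.backward()
        optimizer.step()
        with warmup_scheduler.dampening():
            pass  # apply the warmup factor without stepping the LR scheduler
    with warmup_scheduler.dampening():
        lr_scheduler.step()  # the epoch-wise scheduler steps once per epoch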
Thank you for a great implementation.
What do you think is the most appropriate way to use this library inside pytorch-lightning?
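One possibility (not an official integration) is Lightning's manual optimization mode, so the dampening context can wrap the scheduler step exactly as in plain PyTorch; a minimal sketch, assuming a single optimizer and scheduler:

import pytorch_lightning as pl
import torch
import pytorch_warmup as warmup

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # control the optimizer/scheduler order ourselves
        self.net = torch.nn.Linear(10, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        opt = self.optimizers()
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(self.net(x), y)
        self.manual_backward(loss)
        opt.step()
        with self.warmup_scheduler.dampening():  # wrap the scheduler step as usual
            self.lr_schedulers().step()
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.net.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
        self.warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
        return [optimizer], [scheduler]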
I tried to run your emnist example and got this error:
Traceback (most recent call last):
  File "main.py", line 163, in <module>
    main()
  File "main.py", line 152, in main
    warmup_scheduler, epoch, history)
  File "main.py", line 42, in train
    with warmup_scheduler.dampening():
AttributeError: 'UntunedLinearWarmup' object has no attribute 'dampening'
I can't seem to get any of your warmups to work. Do you have any idea why that might be?
Thanks so much!
Hi,
I just installed your library through pip install -U pytorch_warmup and tried the following:

a = warmup.UntunedLinearWarmup(optimizer, warmup_period=500)

This gives me the following error message: TypeError: __init__() got an unexpected keyword argument 'warmup_period'
If I instead construct it without that argument,

a = warmup.UntunedLinearWarmup(optimizer)

then a.warmup_params gives:

[{'warmup_period': 1999}]
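For context, UntunedLinearWarmup computes the warmup period from the optimizer's beta_2 itself (the 2 * (1 - beta_2)^(-1) rule of thumb, hence the 1999 above for beta_2 = 0.999), so it takes no warmup_period argument; LinearWarmup is the class that accepts an explicit period. A minimal sketch:

import torch
import pytorch_warmup as warmup

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# UntunedLinearWarmup derives the period from beta_2, so no warmup_period argument:
auto = warmup.UntunedLinearWarmup(optimizer)

# LinearWarmup accepts an explicit period:
manual = warmup.LinearWarmup(optimizer, warmup_period=500)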
I followed the tutorial to implement the warmup_scheduler, but the learning rate I get from get_last_lr() of torch.optim.lr_scheduler.MultiStepLR is the same as the initial learning rate. How should I get the learning rate after the warmup process?
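If I read the chaining mechanics right, the warmup factor is applied to the optimizer's param groups after the scheduler steps, so get_last_lr() reports the undampened value; the effective rate can be read from the optimizer directly:

# get_last_lr() reflects the scheduler's own (undampened) value; the dampened,
# effective learning rate lives on the optimizer's param groups:
current_lr = optimizer.param_groups[0]['lr']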
Can I implement what you did here using Hugging Face? What is the difference between what you did and what is provided by Hugging Face?
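For comparison, the Hugging Face transformers library bakes warmup into a single scheduler (linear warmup, then linear decay to zero) rather than chaining a dampening factor onto an arbitrary scheduler; a minimal sketch of that counterpart, assuming the transformers package is installed:

import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# One combined scheduler: 500 warmup steps, then linear decay over 10000 steps total.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=10000)
# scheduler.step() is then called once per batch, with no dampening context needed.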
Hi, Tony,
I have a request: during warmup training in the first epoch, the warmup scheduler should adjust the learning rate every step (or every N steps), and after the warmup stage, a regular LR scheduler should adjust the learning rate every epoch. Is there any example code for this?
Hello! I'm currently using your LinearWarmup, and somehow my lr started at 0.1 and then stayed at 0.0498 until the warmup period was over. I couldn't find out why; here's part of my code.
model = torch.nn.DataParallel(model).cuda()
# args.lr * args.lrf = 0.05
optimizer = torch.optim.SGD(model.parameters(), args.lr * args.lrf,
                            momentum=args.momentum, weight_decay=args.weight_decay * args.wdf)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=23)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=5)

# Inside the training loop, stepping by epoch, not by iteration
for i, (input, target) in enumerate(train_loader):
    # measure data loading time
    data_time.update(time.time() - end)
    if args.gpu is not None:
        input = input.cuda(args.gpu, non_blocking=True)
        target = target.cuda(args.gpu, non_blocking=True)
    # compute output
    output = model(input)
    loss = criterion(output, target)
    # measure accuracy and record loss
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
    losses.update(loss.item(), input.size(0))
    top1.update(acc1[0], input.size(0))
    top5.update(acc5[0], input.size(0))
    # compute gradient and do SGD step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # notice: pypi warmup project
    if i < len(train_loader) - 1 and warmup_scheduler is not None:
        with warmup_scheduler.dampening():
            pass
# when the epoch ends...
with warmup_scheduler.dampening():
    lr_scheduler.step()
Looking forward to your reply!
Hi Tony,
I am using torch 1.9 and lr_scheduler.step(lr_scheduler.last_epoch + 1), but I got this UserWarning:

UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available.

Will this lead to some learning rate bugs?
Hi Tony, I got a similar warning as #5 on using warmup.UntunedLinearWarmup after I upgraded my PyTorch to 1.12.1:

UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

Would you help double-check whether I can still ignore the warning in my PyTorch version?