tony-y / pytorch_warmup
Learning Rate Warmup in PyTorch
Home Page: https://tony-y.github.io/pytorch_warmup/
License: MIT License
I've argued in LiyuanLucasLiu/RAdam#62 that if warmup and RAdam are equivalent, using RAdam might be simpler. However, I'd be curious about the arguments in favour of warmup presented in this repo and the related paper.
What are the reasons to choose warmup instead of RAdam?
I see in the original paper https://arxiv.org/abs/1910.04209 there is a rule of thumb of
2 * (1 - beta_2)^(-1)
which seems to be for the warmup period (see the sketch below). But what about the decay rate?
related: LiyuanLucasLiu/RAdam#66
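For concreteness, this rule of thumb turns the optimizer's beta_2 directly into a warmup period; a minimal sketch of the arithmetic (this appears to be the quantity UntunedLinearWarmup uses):

beta2 = 0.999  # Adam's default
# Rule of thumb from arXiv:1910.04209: warmup_period = 2 * (1 - beta_2)^(-1),
# i.e. about 2000 steps for beta_2 = 0.999.
warmup_period = int(2 / (1 - beta2))
print(warmup_period)  # 1999 after floating-point truncation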
In every batch, I execute:

loss.backward()
optimizer.zero_grad()
optimizer.step()
with warmup_scheduler.dampening():
    lr_scheduler.step()

It doesn't have a warmup process.
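For comparison, the usage shown in the project README zeroes the gradients before the backward pass, so optimizer.step() acts on fresh gradients and the dampened scheduler step comes last; a minimal sketch, where compute_loss is a hypothetical stand-in for the forward pass:

for batch in dataloader:
    optimizer.zero_grad()
    loss = compute_loss(batch)  # hypothetical helper: forward pass + loss
    loss.backward()
    optimizer.step()
    with warmup_scheduler.dampening():
        lr_scheduler.step()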
The sdist on PyPI does not include the license file; it would be preferable to have the license file included with the distributions.
Hi,
If I want the 'warmup_scheduler' to update the learning rate after every epoch and not after every batch, should I just do as follows (using dampening() after every epoch)?

for epoch in range(1, num_epochs + 1):
    for idx, batch in enumerate(dataloader):
        optimizer.zero_grad()
        loss = ...
        loss.backward()
        optimizer.step()
    with warmup_scheduler.dampening():
        lr_scheduler.step(epoch + idx / iters)
Thanks!
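For reference, the pattern used in the project's examples for epoch-wise schedulers calls dampening() with pass on every iteration, so the warmup factor is still applied per step, and steps the LR scheduler only once per epoch. A self-contained sketch with toy stand-in data (the model, loss, and data are placeholders):

import torch
import pytorch_warmup as warmup

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
num_epochs = 10
dataloader = [torch.randn(4, 10) for _ in range(100)]  # toy stand-in data
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)

for epoch in range(num_epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean()  # placeholder loss
        loss.backward()
        optimizer.step()
        with warmup_scheduler.dampening():
            pass  # apply the warmup factor without stepping the LR scheduler
    with warmup_scheduler.dampening():
        lr_scheduler.step()  # the epoch-wise scheduler steps once per epoch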
Thank you for a great implementation.
What do you think is the most appropriate way to use this library inside pytorch-lightning?
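One possibility (not an official integration) is Lightning's manual optimization mode, so the dampening context can wrap the scheduler step exactly as in plain PyTorch; a minimal sketch, assuming a single optimizer and scheduler:

import pytorch_lightning as pl
import torch
import pytorch_warmup as warmup

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # control the optimizer/scheduler order ourselves
        self.net = torch.nn.Linear(10, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        opt = self.optimizers()
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(self.net(x), y)
        self.manual_backward(loss)
        opt.step()
        with self.warmup_scheduler.dampening():  # wrap the scheduler step as usual
            self.lr_schedulers().step()
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.net.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
        self.warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
        return [optimizer], [scheduler]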
I tried to run your emnist example and got this error:
Traceback (most recent call last):
  File "main.py", line 163, in <module>
    main()
  File "main.py", line 152, in main
    warmup_scheduler, epoch, history)
  File "main.py", line 42, in train
    with warmup_scheduler.dampening():
AttributeError: 'UntunedLinearWarmup' object has no attribute 'dampening'
I can't seem to get any of your warmups to work. Do you have any idea why that might be?
Thanks so much!
Hi,
I just installed your library through pip install -U pytorch_warmup and tried the following:

a = warmup.UntunedLinearWarmup(optimizer, warmup_period=500)

This gives me the following error message: TypeError: __init__() got an unexpected keyword argument 'warmup_period'
If I instead construct it without that argument,

a = warmup.UntunedLinearWarmup(optimizer)

then a.warmup_params gives:

[{'warmup_period': 1999}]
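For context, UntunedLinearWarmup computes the warmup period from the optimizer's beta_2 itself (the 2 * (1 - beta_2)^(-1) rule of thumb, hence the 1999 above for beta_2 = 0.999), so it takes no warmup_period argument; LinearWarmup is the class that accepts an explicit period. A minimal sketch:

import torch
import pytorch_warmup as warmup

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# UntunedLinearWarmup derives the period from beta_2, so no warmup_period argument:
auto = warmup.UntunedLinearWarmup(optimizer)

# LinearWarmup accepts an explicit period:
manual = warmup.LinearWarmup(optimizer, warmup_period=500)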
I followed the tutorial to implement the warmup_scheduler, but the learning rate I get from get_last_lr() of torch.optim.lr_scheduler.MultiStepLR is the same as the initial learning rate. How should I get the learning rate after the warmup process?
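If I read the chaining mechanics right, the warmup factor is applied to the optimizer's param groups after the scheduler steps, so get_last_lr() reports the undampened value; the effective rate can be read from the optimizer directly:

# get_last_lr() reflects the scheduler's own (undampened) value; the dampened,
# effective learning rate lives on the optimizer's param groups:
current_lr = optimizer.param_groups[0]['lr']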
Can I implement what you did here using Hugging Face? What is the difference between what you did and what is provided by Hugging Face?
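For comparison, the Hugging Face transformers library bakes warmup into a single scheduler (linear warmup, then linear decay to zero) rather than chaining a dampening factor onto an arbitrary scheduler; a minimal sketch of that counterpart, assuming the transformers package is installed:

import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# One combined scheduler: 500 warmup steps, then linear decay over 10000 steps total.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=10000)
# scheduler.step() is then called once per batch, with no dampening context needed.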
Hi, Tony,
I have a request: during warmup training in the first epoch, the warmup scheduler should adjust the learning rate every step (or every N steps), and after the warmup stage, a regular LR scheduler should adjust the learning rate every epoch. Is there any example code for this?
Hello! I'm currently using your LinearWarmup, and somehow my lr started at 0.1 and then stayed at 0.0498 until the warmup period was over. I couldn't find out why; here's part of my code.
model = torch.nn.DataParallel(model).cuda()
# args.lr * args.lrf = 0.05
optimizer = torch.optim.SGD(model.parameters(), args.lr * args.lrf,
                            momentum=args.momentum, weight_decay=args.weight_decay * args.wdf)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=23)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=5)

# Inside the training loop, stepping by epoch, not by iteration
for i, (input, target) in enumerate(train_loader):
    # measure data loading time
    data_time.update(time.time() - end)
    if args.gpu is not None:
        input = input.cuda(args.gpu, non_blocking=True)
        target = target.cuda(args.gpu, non_blocking=True)
    # compute output
    output = model(input)
    loss = criterion(output, target)
    # measure accuracy and record loss
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
    losses.update(loss.item(), input.size(0))
    top1.update(acc1[0], input.size(0))
    top5.update(acc5[0], input.size(0))
    # compute gradient and do SGD step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # notice: pypi warmup project
    if i < len(train_loader) - 1 and warmup_scheduler is not None:
        with warmup_scheduler.dampening():
            pass
# when the epoch ends...
with warmup_scheduler.dampening():
    lr_scheduler.step()
Looking forward to your reply!
Hi Tony,
I am using torch 1.9 and lr_scheduler.step(lr_scheduler.last_epoch + 1), but I got this UserWarning:

UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available.

Will this lead to some learning rate bugs?
Hi Tony, I got a similar warning as #5 on using warmup.UntunedLinearWarmup after I upgraded my PyTorch to 1.12.1:

UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

Would you help double-check whether I can still ignore the warning in my PyTorch version?