mikoto10032 / automaticweightedloss
Multi-task learning using uncertainty to weigh losses for scene geometry and semantics; Auxiliary Tasks in Multi-task Learning
License: Apache License 2.0
Hello author:
I have two tasks, both using cross-entropy loss. During training I found that the awl parameters are not being updated. Have you run into this problem?
Hello, is there any specific TensorFlow implementation of this, or an annotated explanation? We want to implement it in our own model. Thank you.
Hey everyone,
first of all, thanks for your implementation.
The following formula from the paper "Auxiliary Tasks in Multi-task Learning"
was implemented as:
loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2)
but I'm not sure this is exactly right: is 0.5 / sigma^2 really equivalent to the paper's 1/(2*sigma^2) here, or am I overlooking something?
Thanks for any feedback.
So in your example, AutomaticWeightedLoss is a separate module from Model: another optimizer entry should be used to update its params explicitly, otherwise the params would simply follow the direction of the model gradient.
I am going to implement it in my work; I hope it brings an effective improvement.
I set up multiple optimizers to optimize the different tasks. How should the learning rate and optimizer for awl be chosen? Thanks.
TypeError: optimizer can only optimize Tensors, but one of the params is list
My optimizer code is:
```python
def get_optimizer(self):
    lr = opt.lr
    params = []
    # my understanding: biases get twice the learning rate and no weight decay
    # (i.e. no penalty term); non-bias parameters keep the base learning rate
    # and use weight decay
    for key, value in dict(self.named_parameters()).items():
        if value.requires_grad:
            if 'bias' in key:
                params += [{'params': [value], 'lr': lr * 2, 'weight_decay': 0}]
            else:
                params += [{'params': [value], 'lr': lr, 'weight_decay': opt.weight_decay}]
    if opt.use_adam:
        self.optimizer = t.optim.Adam([params, {'params': self.awl.parameters(), 'weight_decay': 0}])
    else:
        self.optimizer = t.optim.SGD([params, {'params': self.awl.parameters(), 'weight_decay': 0}], momentum=0.9)
    return self.optimizer
```
Is there a problem with passing the parameters in this way? If so, do you have a suggested fix?
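The TypeError above is consistent with `params` (already a list of dicts) being nested inside another list when it is handed to the optimizer. A minimal sketch of a fix, under that assumption, is to concatenate the group lists instead of nesting them; the variables below are hypothetical stand-ins for the model's groups and awl:

```python
import torch as t

# hypothetical stand-ins for the model's parameter groups and awl's parameters
params = [{'params': [t.nn.Parameter(t.zeros(3))], 'lr': 1e-3, 'weight_decay': 0}]
awl_params = [t.nn.Parameter(t.ones(2))]

# wrong: t.optim.Adam([params, {...}]) puts a list inside the group list, which
# triggers "optimizer can only optimize Tensors, but one of the params is list"
# fix: concatenate so every element of the outer list is a dict of Tensors
optimizer = t.optim.Adam(params + [{'params': awl_params, 'weight_decay': 0}])
```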
Hi, thanks for sharing the code. It's really helpful to me. I have two questions.
In another implementation, sigma^2 is used as the parameter. I tried learning both sigma and sigma^2; they show close but different performance. Do you think this implementation difference may have a significant impact?
I'm using a hinge loss with uncertainty weighting. For some batches the loss value may be zero, and in that case the params have a chance to become zero or very small. Do you have suggestions on this?
DDP means DistributedDataParallel
People usually use multiple modules & optimizers in a GAN model, for example:
```python
moduleA = Generator()
moduleB = Discriminator()
moduleC = Predictor()
```
so the corresponding optimizers are:
```python
optG = optim.Adam(moduleA.parameters(), ...)
optD = optim.Adam(moduleB.parameters(), ...)
optP = optim.Adam(moduleC.parameters(), ...)
```
For a single module, the example shows:
```python
model = Model()
optimizer = optim.Adam([
    {'params': model.parameters()},
    {'params': awl.parameters(), 'weight_decay': 0}
])
```
For the multiple modules above, how should the parameters be set in the optimizers? I can guess two options, but they might be wrong:
option 1:
```python
optG = optim.Adam(list(moduleA.parameters()), ...)
optD = optim.Adam(list(moduleB.parameters()), ...)
optP = optim.Adam(list(moduleC.parameters()) + list(awl.parameters()), ...)
```
option 2:
```python
optG = optim.Adam(list(moduleA.parameters()) + list(awl.parameters()), ...)
optD = optim.Adam(list(moduleB.parameters()) + list(awl.parameters()), ...)
optP = optim.Adam(list(moduleC.parameters()) + list(awl.parameters()), ...)
```
@Mikoto10032
Which one is correct?
Thanks!
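One thing worth checking before picking option 2 (this is a general PyTorch behaviour, not the author's answer): a parameter registered in several optimizers is stepped once per optimizer call, so awl would be updated up to three times per iteration. A tiny sketch with made-up values shows the effect:

```python
import torch

p = torch.nn.Parameter(torch.tensor(1.0))
opt1 = torch.optim.SGD([p], lr=0.1)
opt2 = torch.optim.SGD([p], lr=0.1)

loss = 2 * p          # d(loss)/dp = 2
loss.backward()
opt1.step()           # p: 1.0 -> 1.0 - 0.1 * 2 = 0.8
opt2.step()           # the same stored gradient is applied again: 0.8 -> 0.6
```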
Thanks for your work; I have a question.
Why can't the loss be negative? It seems to me that the value of the loss does not affect the training of the network.
As an example, say my loss is the cross-entropy loss, which lies in (0, 1) most of the time, and the optimization goal is to minimize it.
Now suppose I add a constant of -100 to the loss: loss = loss - 100. The loss will lie in (-100, -99), and the optimization goal remains the same: reduce the loss.
The way to reduce the loss is gradient descent. Obviously, the constant -100 does not affect the gradients of the network parameters; that is, the loss value itself does not seem to affect the training process. What matters is the gradient of this value with respect to the network parameters.
Now back to the original question: why is it necessary to avoid negative losses?
Please let me know.
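For what it's worth, one way to see why a negative task loss is problematic under this particular weighting (my reading, not the author's answer): the learnable weight 0.5/sigma^2 multiplies the loss, so with a negative loss the objective can be driven arbitrarily low just by shrinking sigma, and the log(1 + sigma^2) regularizer cannot compensate. A quick numeric check:

```python
import math

def term(loss, sigma):
    # per-task contribution: 0.5 / sigma^2 * loss + log(1 + sigma^2)
    return 0.5 / sigma**2 * loss + math.log(1 + sigma**2)

# with a negative loss, shrinking sigma keeps lowering the objective without
# bound, so the weighting degenerates instead of balancing the tasks
vals = [term(-1.0, s) for s in (1.0, 0.1, 0.01)]
```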
Do model.parameters() and awl.parameters() need the same lr and weight_decay settings?
I have 3 losses, such as:
```
tensor([0.9926, 0.9926, 0.9927], requires_grad=True)
tensor([0.9908, 0.9908, 0.9909], requires_grad=True)
tensor([0.9873, 0.9873, 0.9873], requires_grad=True)
```