mikoto10032 / automaticweightedloss
Multi-task learning using uncertainty to weigh losses for scene geometry and semantics; Auxiliary Tasks in Multi-task Learning
License: Apache License 2.0
Hello author:
I have two tasks, both using cross-entropy loss. During training I found that the awl parameters are not being updated. Have you run into this problem?
Hello, is there any specific TensorFlow implementation of this, or an annotated explanation? We want to implement it in our own model. Thank you.
Hey everyone,
first of all, thanks for your implementation.
The following formula from the paper "Auxiliary Tasks in Multi-task Learning"
was implemented as:
loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2)
but I'm not sure this is exactly right: is 0.5 / sigma^2 really equivalent to the paper's 1/(2*sigma^2) here, or am I overlooking something?
Thanks for any feedback.
So in your example, AutomaticWeightedLoss is a separate module from Model: another optimizer entry should be used to update its params explicitly, otherwise the params would simply follow the direction of the model gradient.
I am going to implement it in my work; I hope it brings an effective improvement.
I set up multiple optimizers to optimize the different tasks. How should the learning rate and optimizer for awl be chosen? Thanks.
TypeError: optimizer can only optimize Tensors, but one of the params is list
My optimizer code is:
```python
def get_optimizer(self):
    lr = opt.lr
    params = []
    # my understanding: biases get twice the learning rate and no weight decay
    # (i.e. no penalty term); non-bias parameters keep the base learning rate
    # and use weight decay
    for key, value in dict(self.named_parameters()).items():
        if value.requires_grad:
            if 'bias' in key:
                params += [{'params': [value], 'lr': lr * 2, 'weight_decay': 0}]
            else:
                params += [{'params': [value], 'lr': lr, 'weight_decay': opt.weight_decay}]
    if opt.use_adam:
        self.optimizer = t.optim.Adam([params, {'params': self.awl.parameters(), 'weight_decay': 0}])
    else:
        self.optimizer = t.optim.SGD([params, {'params': self.awl.parameters(), 'weight_decay': 0}], momentum=0.9)
    return self.optimizer
```
Is there a problem with passing the parameters in this way? If so, do you have a suggested fix?
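The TypeError above is consistent with `params` (already a list of dicts) being nested inside another list when it is handed to the optimizer. A minimal sketch of a fix, under that assumption, is to concatenate the group lists instead of nesting them; the variables below are hypothetical stand-ins for the model's groups and awl:

```python
import torch as t

# hypothetical stand-ins for the model's parameter groups and awl's parameters
params = [{'params': [t.nn.Parameter(t.zeros(3))], 'lr': 1e-3, 'weight_decay': 0}]
awl_params = [t.nn.Parameter(t.ones(2))]

# wrong: t.optim.Adam([params, {...}]) puts a list inside the group list, which
# triggers "optimizer can only optimize Tensors, but one of the params is list"
# fix: concatenate so every element of the outer list is a dict of Tensors
optimizer = t.optim.Adam(params + [{'params': awl_params, 'weight_decay': 0}])
```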
Hi, thanks for sharing the code. It's really helpful to me. I have two questions.
In another implementation, sigma^2 is used as the parameter. I tried learning both sigma and sigma^2; they show close but different performance. Do you think this implementation difference may have a significant impact?
I'm using a hinge loss with uncertainty weighting. For some batches the loss value may be zero, and in that case the params have a chance to become zero or very small. Do you have suggestions on this?
DDP means DistributedDataParallel
People usually use multiple modules & optimizers in a GAN model, for example:
```python
moduleA = Generator()
moduleB = Discriminator()
moduleC = Predictor()
```
so the corresponding optimizers are:
```python
optG = optim.Adam(moduleA.parameters(), ...)
optD = optim.Adam(moduleB.parameters(), ...)
optP = optim.Adam(moduleC.parameters(), ...)
```
For a single module, the example shows:
```python
model = Model()
optimizer = optim.Adam([
    {'params': model.parameters()},
    {'params': awl.parameters(), 'weight_decay': 0}
])
```
For the multiple modules above, how should the parameters be set in the optimizers? I can guess two options, but they might be wrong:
option 1:
```python
optG = optim.Adam(list(moduleA.parameters()), ...)
optD = optim.Adam(list(moduleB.parameters()), ...)
optP = optim.Adam(list(moduleC.parameters()) + list(awl.parameters()), ...)
```
option 2:
```python
optG = optim.Adam(list(moduleA.parameters()) + list(awl.parameters()), ...)
optD = optim.Adam(list(moduleB.parameters()) + list(awl.parameters()), ...)
optP = optim.Adam(list(moduleC.parameters()) + list(awl.parameters()), ...)
```
@Mikoto10032
Which one is correct?
Thanks!
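One thing worth checking before picking option 2 (this is a general PyTorch behaviour, not the author's answer): a parameter registered in several optimizers is stepped once per optimizer call, so awl would be updated up to three times per iteration. A tiny sketch with made-up values shows the effect:

```python
import torch

p = torch.nn.Parameter(torch.tensor(1.0))
opt1 = torch.optim.SGD([p], lr=0.1)
opt2 = torch.optim.SGD([p], lr=0.1)

loss = 2 * p          # d(loss)/dp = 2
loss.backward()
opt1.step()           # p: 1.0 -> 1.0 - 0.1 * 2 = 0.8
opt2.step()           # the same stored gradient is applied again: 0.8 -> 0.6
```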
Thanks for your work; I have a question.
Why can't the loss be negative? It seems to me that the value of the loss does not affect the training of the network.
As an example, say my loss is the cross-entropy loss, which lies in (0, 1) most of the time, and the optimization goal is to minimize it.
Now suppose I add a constant of -100 to the loss: loss = loss - 100. The loss will lie in (-100, -99), and the optimization goal remains the same: reduce the loss.
The way to reduce the loss is gradient descent. Obviously, the constant -100 does not affect the gradients of the network parameters; that is, the loss value itself does not seem to affect the training process. What matters is the gradient of this value with respect to the network parameters.
Now back to the original question: why is it necessary to avoid negative losses?
Please let me know.
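For what it's worth, one way to see why a negative task loss is problematic under this particular weighting (my reading, not the author's answer): the learnable weight 0.5/sigma^2 multiplies the loss, so with a negative loss the objective can be driven arbitrarily low just by shrinking sigma, and the log(1 + sigma^2) regularizer cannot compensate. A quick numeric check:

```python
import math

def term(loss, sigma):
    # per-task contribution: 0.5 / sigma^2 * loss + log(1 + sigma^2)
    return 0.5 / sigma**2 * loss + math.log(1 + sigma**2)

# with a negative loss, shrinking sigma keeps lowering the objective without
# bound, so the weighting degenerates instead of balancing the tasks
vals = [term(-1.0, s) for s in (1.0, 0.1, 0.01)]
```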
Do model.parameters() and awl.parameters() need the same lr and weight_decay settings?
I have 3 losses, such as:
```
tensor([0.9926, 0.9926, 0.9927], requires_grad=True)
tensor([0.9908, 0.9908, 0.9909], requires_grad=True)
tensor([0.9873, 0.9873, 0.9873], requires_grad=True)
```