
Comments (5)

ZifengLiu98 commented on June 5, 2024

Hi! Thanks for your great work; I have learned a lot from it. However, I am still confused about the loss.
I want to apply your flatclr in my supervised contrastive learning project. Since the value of the loss is always 0, how can gradient descent proceed? I added the cross-entropy loss and the flatclr loss together, but flatclr seems to contribute nothing to the total loss.
Could you please help me understand your flatclr? In my project it is always equal to 0.


lxysl commented on June 5, 2024

I am impressed by this brilliant work, but I have the same confusion about the code implementation. The code does not seem to match the paper exactly; I wonder if I got something wrong or missed something.

In the paper, the pseudocode of FlatNCE is as follows:

[image: FlatNCE pseudocode from the paper]
In this repository, it is implemented as:

_, features = self.model(images)
logits, labels = self.flat_loss(features)

# v_i = logsumexp over the negative logits of sample i; shape (batch_size, 1), e.g. (512, 1)
v = torch.logsumexp(logits, dim=1, keepdim=True)
# exp(v - v.detach()) is identically 1 in value, but its gradient w.r.t. v is exp(0) * dv = dv
loss_vec = torch.exp(v - v.detach())

assert loss_vec.shape == (len(logits), 1)
# prepend a zero column so the positive sits at index 0, matching `labels`
dummy_logits = torch.cat([torch.zeros(logits.size(0), 1).to(self.args.device), logits], 1)
# loss_vec.mean() - 1 is 0 in value but carries the FlatNCE gradient; the detached
# cross-entropy term only reports the InfoNCE value for monitoring and adds no gradient
loss = loss_vec.mean() - 1 + self.criterion(dummy_logits, labels).detach()

What confuses me is: isn't loss_vec here just the $l_{FlatNCE}$ in the pseudocode? What are the next few lines doing? Could you help me with some explanation, please?

Besides, I am working on integrating your FlatNCE loss into Supervised Contrastive Learning. It is tricky because SupCon uses a mask of positive samples rather than a single positive per anchor, and there is no standalone InfoNCE function in that codebase. If you could give me some guidance on this, I would be very grateful! One possible adaptation is sketched below.
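Here is a minimal sketch of one way this might be done, assuming L2-normalized features and integer class labels; the helper flat_supcon_loss and its one-FlatNCE-term-per-positive treatment are my own assumptions, not the authors' implementation:

import torch

def flat_supcon_loss(features, labels, temperature=0.1):
    """FlatNCE-style loss with a SupCon positive mask (sketch).

    features: (N, D) L2-normalized embeddings
    labels:   (N,) integer class labels
    """
    device = features.device
    n = features.size(0)
    sim = features @ features.t() / temperature              # (N, N), already divided by tau

    labels = labels.view(-1, 1)
    same = torch.eq(labels, labels.t()).float().to(device)   # 1 where labels match
    pos_mask = same - torch.eye(n, device=device)            # positives, excluding self
    neg_mask = 1.0 - same                                    # negatives

    losses = []
    for i in range(n):
        neg = sim[i][neg_mask[i].bool()]                     # negative similarities for anchor i
        if neg.numel() == 0:
            continue                                         # no negatives for this anchor
        for p in sim[i][pos_mask[i].bool()]:                 # one FlatNCE term per positive
            v = torch.logsumexp(neg - p, dim=0)              # logsumexp of (s_neg - s_pos) / tau
            losses.append(torch.exp(v - v.detach()))         # value 1, gradient of v
    return torch.stack(losses).mean() - 1.0                  # value 0, nonzero gradient

As with the original flatclr code, the returned value is always 0 by construction, so training progress would have to be monitored through a separate detached quantity such as the SupCon loss itself.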


LanXiaoPang613 commented on June 5, 2024

(Quoting ZifengLiu98's comment above.)

In my understanding, the first term in FlatNCE stays at zero in value, but it still generates a different (nonzero) gradient at each iteration, while the second, detached term contributes no gradient; a short derivation is sketched below.
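Concretely (my own sketch, writing $\operatorname{sg}(\cdot)$ for the stop-gradient/.detach() operator and $v = \log\sum_j e^{\ell_j}$ as in the code above):

$$ e^{\,v - \operatorname{sg}(v)} = e^{0} = 1 \quad\text{in value, yet}\quad \nabla_\theta\, e^{\,v - \operatorname{sg}(v)} = e^{\,v - \operatorname{sg}(v)}\,\nabla_\theta v = \nabla_\theta v. $$

So minimizing $e^{v - \operatorname{sg}(v)} - 1$, which is identically $0$, applies exactly the gradient of $v$, and $v$ changes every iteration as the logits change.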


Junya-Chen commented on June 5, 2024

loss_vec.mean()-1 is equal to zero, but its gradient is not zero. Thus, by minimizing this loss with gradient descent, we still learn a contrastive representation. The cross_entropy.detach() term is only used to show the progress of learning; it is not included in the actual (differentiated) loss function. A quick numerical check is below.
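A minimal self-contained check of this (my own sketch; the tensor shapes are illustrative):

import torch

# random logits standing in for (s_neg - s_pos) / tau; shape is illustrative
logits = torch.randn(4, 7, requires_grad=True)

v = torch.logsumexp(logits, dim=1, keepdim=True)
loss = torch.exp(v - v.detach()).mean() - 1  # exp(0) = 1 for every row, so the value is exactly 0

loss.backward()
print(loss.item())               # 0.0
print(logits.grad.abs().sum())   # > 0: the gradient flows despite the zero value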


ZifengLiu98 commented on June 5, 2024

(Quoting lxysl's comment above.)

I used flatclr in a facial expression recognition task, but it performed worse than the cross-entropy loss, and only slightly better than the contrastive loss provided by [Supervised Contrastive Learning]. I am not sure whether I used it incorrectly; flatclr was not better even with a larger batch size.


