Thank you for sharing the code. Over the past few days I have been testing RelaHash's performance on another dataset, and I found that the `labels_scaled` computation easily produces a NaN loss:
```python
def forward(self, logits, z, labels):
    if self.multiclass:
        if not self.one_hot:
            labels = F.one_hot(labels, logits.size(1))
        labels = labels.float()
        margin_logits = self.compute_margin_logits(logits, labels)
        # label smoothing
        log_logits = F.log_softmax(margin_logits, dim=1)
        labels_scaled = labels / labels.sum(dim=1, keepdim=True)  # <- this is where the NaN appears
        loss = -(labels_scaled * log_logits).sum(dim=1)
        loss = loss.mean()
```
I ran into this while trying to track down the root cause of the NaN loss. During debugging, I logged every intermediate variable, such as `log_logits` and `labels_scaled`, and on closer inspection I found that `labels_scaled` itself contains NaN values. I suspect this discrepancy comes from some incompatibility between my model and yours.
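If it helps, here is a minimal sketch of how I believe the NaN arises, assuming the batch can contain a sample whose label row is all zeros (the `labels` tensor below is a hypothetical example, not taken from my actual data):

```python
import torch

# Hypothetical batch of 2 samples over 4 classes; the second sample
# has no positive label, i.e. an all-zero row.
labels = torch.tensor([[0., 1., 0., 0.],
                       [0., 0., 0., 0.]])

# labels.sum(dim=1) is 0 for the second row, so the division is 0/0 = NaN.
labels_scaled = labels / labels.sum(dim=1, keepdim=True)
print(labels_scaled)
# tensor([[0., 1., 0., 0.],
#         [nan, nan, nan, nan]])
```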
To elaborate, if a larger loss value is not itself a problem, the most straightforward workaround I can think of is to remove the `labels_scaled` normalization entirely, which does not seem to hurt the model's performance significantly.
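Alternatively, rather than removing the normalization, perhaps guarding the denominator would keep the scaling while avoiding the 0/0 division. This is only a sketch of my idea; the `scale_labels` helper and the `eps` value are my own inventions, not part of your code:

```python
import torch

def scale_labels(labels: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Normalize each label row to sum to 1, guarding against zero-sum rows."""
    denom = labels.sum(dim=1, keepdim=True).clamp_min(eps)
    return labels / denom

# Drop-in replacement for the problematic line:
# labels_scaled = scale_labels(labels)
```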
To mitigate any negative effects, I have also tried loss scaling using the apex package from https://github.com/NVIDIA/apex. However, I am not sure whether this approach is correct or entirely misguided, so I would greatly appreciate your feedback or any other solutions you could suggest.
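For reference, my attempt roughly followed the standard apex amp pattern below. Please treat this as a sketch of what I tried rather than a confirmed fix; `model`, `optimizer`, `criterion`, and `loader` are placeholders for my own setup, and I am not sure loss scaling is the right tool here at all:

```python
from apex import amp  # https://github.com/NVIDIA/apex

# model, optimizer, criterion, and loader are placeholders for my setup.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for images, labels in loader:
    optimizer.zero_grad()
    logits, z = model(images)            # assuming the model returns (logits, codes)
    loss = criterion(logits, z, labels)  # the forward() quoted above
    # apex scales the loss before backward() so fp16 gradients do not
    # underflow, then unscales them before the optimizer step.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```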
Thank you sincerely,
An individual from China with limited proficiency in English. 🙏