Comments (3)
Normalization on the test data brings ConvBN to 80%, but the label issue is still happening: with my hyperparameters, a plain identity matrix works far better than any of the provided labels.
from frepo.
Hi Luke,
Thanks for reaching out. It sounds like there may be a data-preprocessing issue in your implementation, but I cannot tell from the description. Have you tried evaluating with the provided script? I ran some sanity checks in the past and they worked out fine. Maybe you can compare your code against mine to see the difference. The labels are right in my case, since all my networks are trained with MSE loss rather than standard CE loss, to keep training consistent with the distillation objective. As for the image input, I always apply the same normalization as during training (subtract the mean and divide by the standard deviation). The mean and standard deviation are here.
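As a rough sketch of what that evaluation setup looks like (the statistics below are placeholder CIFAR-10-style values, not necessarily the repo's actual numbers, and the function names are illustrative):

```python
import numpy as np

# Placeholder per-channel statistics; the actual values are in the repo.
MEAN = np.array([0.4914, 0.4822, 0.4465])
STD = np.array([0.2470, 0.2435, 0.2616])

def normalize(images):
    """Apply the same normalization as training: (x - mean) / std, per channel."""
    return (images - MEAN) / STD

def mse_loss(outputs, targets):
    """Mean-squared error between network outputs and label targets,
    matching the distillation objective (instead of cross-entropy)."""
    return np.mean((outputs - targets) ** 2)
```

The key point is that the test images must go through the exact same `normalize` used during distillation, and the evaluation network must be trained with `mse_loss` against the distilled labels, not softmax cross-entropy.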
Generally speaking, I suggest first working with the Conv architecture in the repo; once you get the expected performance, you can evaluate on LeNet.
Best,
Yongchao
Thanks Yongchao, I just saw the PyTorch branch too, so now I can properly compare the code. As for running your code, I ran into environment issues on my M1 Mac. After building my own environment on a Kubernetes cluster, JAX does not see my GPU but PyTorch does, so it's a mess for me, and I have never used JAX before. I need to use the checkpoints on my own models, so I am trying to get this working on general implementations. I imagined these distilled images would work pretty much out of the box like standard images with a general hyperparameter search, so I see that you have to be very careful with what you pick. I will have to switch over to your code.
When you say the "label is right", do you mean your implementation uses a standard identity matrix, or that identity matrix − 0.1 is correct when learned_label is false? If the latter, why should all the other classes have negative values?
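For context, identity − 0.1 is exactly what falls out of a mean-centering convention sometimes used with MSE/kernel-regression objectives (this is an assumption about the repo's convention, not confirmed by the thread): subtracting 1/num_classes from the one-hot labels makes each row zero-mean, which for 10 classes puts 0.9 on the true class and −0.1 everywhere else.

```python
import numpy as np

num_classes = 10  # e.g. CIFAR-10

# Plain one-hot labels: the identity matrix over classes.
one_hot = np.eye(num_classes)

# Mean-centered variant: subtract 1/num_classes so each row sums to zero.
# For 10 classes this is identity - 0.1: 0.9 on the diagonal, -0.1 elsewhere.
centered = one_hot - 1.0 / num_classes
```

Under this convention the negative values on the other classes are not a bug; they are just the centering offset.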
Also, for larger distillation architectures like ResNet-50/18, you do not show results in the paper but comment that they lead to poor cross-architecture generalization. When evaluating on ResNet, how well do they do as distillers compared to ConvBN?
e.g., ResNet-18-distilled images -> evaluate on a fresh ResNet-18, vs. ConvBN-distilled images -> evaluate on ResNet-18.
If I did not care about cross-architecture generalization or the size of the distillation model(s), what would be the best distillation method for self-data-distillation on larger models?
Thanks for reaching back, Luke