
Comments (3)

luke-mcdermott-mi commented on May 18, 2024

Normalizing the test data brings ConvBN to 80%, but the label issue persists. With my hyperparameters, the identity matrix performs far better than any of the provided labels.

from frepo.

yongchaoz commented on May 18, 2024

Hi Luke,

Thanks for reaching out. It sounds like there may be a data preprocessing issue in your implementation, but I cannot tell from the description. Have you tried evaluating with the provided script? I have run some sanity checks in the past and they worked out fine. Maybe you can compare your code with mine to spot the difference? The labels are right in my case, since all my networks are trained with MSE loss rather than the standard CE loss, to stay consistent with the distillation objective. As for the image input, I always apply the same normalization as during training (subtract the mean and divide by the standard deviation). The mean and standard deviation are here.
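For concreteness, the evaluation recipe described above (train-time normalization plus an MSE objective against one-hot targets) can be sketched as follows. This is an illustrative sketch in plain numpy, not the repo's code; the `MEAN`/`STD` values below are the commonly quoted CIFAR-10 statistics and stand in for the actual values linked above.

```python
import numpy as np

# Hypothetical per-channel statistics (CIFAR-10-style); the actual
# values used by the repo are the ones linked in the comment above.
MEAN = np.array([0.4914, 0.4822, 0.4465])
STD = np.array([0.2470, 0.2435, 0.2616])

def normalize(images):
    """Apply the same per-channel normalization used at training time.

    images: float array of shape (N, H, W, 3), values in [0, 1].
    """
    return (images - MEAN) / STD

def mse_loss(outputs, targets):
    """MSE between network outputs and label targets, matching the
    distillation objective instead of the usual cross-entropy."""
    return np.mean((outputs - targets) ** 2)

# Toy usage: a batch of 4 random "images" and one-hot targets.
rng = np.random.default_rng(0)
x = normalize(rng.random((4, 32, 32, 3)))
y = np.eye(10)[[0, 3, 3, 7]]   # one-hot rows of the identity matrix
outputs = rng.random((4, 10))  # stand-in for network outputs
print(mse_loss(outputs, y))
```

The point of the sketch is only that evaluation must mirror training on both axes: the same normalization on inputs and the same (MSE) loss family on targets, otherwise the distilled labels look wrong.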

Generally speaking, I suggest first working with the Conv architecture in the repo; once you get the expected performance there, you can evaluate on LeNet.

Best,
Yongchao


luke-mcdermott-mi commented on May 18, 2024

Thanks Yongchao, I just saw the PyTorch branch too, so now I can properly compare code efficiently. As for running your code, I was running into environment issues on an M1 Mac. After building my own environment on a Kubernetes cluster, JAX does not see my GPU but PyTorch does, so it's a mess for me, and I have never used JAX before. I need to use the checkpoints on my own models, so I am trying to get this working on general implementations. I imagined these distilled images would work pretty much out of the box like standard images with a general hyperparameter search, so I see that you have to be very careful with what you pick. I will have to switch over to your code.

When you say the "labels are right", do you mean your implementation uses a standard identity matrix, or that the identity matrix minus 0.1 is correct for learned_label=False? If the latter, why is it correct to have negative values on all the other classes?
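The two label conventions being asked about can be written out in a few lines. This is only arithmetic on the question itself (for C = 10 classes, "identity matrix - .1" means subtracting 1/C from every entry), not a claim about which convention the repo actually uses or why:

```python
import numpy as np

C = 10  # number of classes

plain_onehot = np.eye(C)         # standard identity-matrix labels
centered = np.eye(C) - 1.0 / C   # "identity matrix minus 0.1" for C = 10

# Each centered row is zero-mean: the true class gets 0.9, every
# other class gets -0.1, and the row sums to zero.
print(centered[0])
```

One possible motivation for the centered form, under an MSE/regression objective, is that zero-mean targets remove the constant offset from the regression; whether that is the rationale here is exactly what the question asks.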

Also, for larger distillation architectures like ResNet-50/18, you do not show results in the paper but comment that they lead to poor cross-architecture generalization. When evaluating on a ResNet, how well do they do as distillers compared to ConvBN?
e.g., ResNet-18-distilled images evaluated on a fresh ResNet-18 vs. ConvBN-distilled images evaluated on ResNet-18

If I did not care about cross-architecture generalization or the size of the distillation model(s), what would be the best distillation method for self-data-distillation on larger models?

Thanks for reaching back, Luke

