Comments (3)
Normalization on the test data brings ConvBN to 80%, but the label issue is still happening: with my hyperparameters, a plain identity matrix works far better than any of the provided labels.
from frepo.
Hi Luke,
Thanks for reaching out. It sounds like there may be a data-preprocessing issue in your implementation, but I cannot tell from the description. Have you tried evaluating with the provided script? I ran some sanity checks in the past and they worked out fine. Maybe you can compare your code against mine to see the difference. The labels are right in my case, since all my networks are trained with MSE loss rather than standard CE loss, to keep training consistent with the distillation objective. As for the image input, I always apply the same normalization as during training (subtract the mean and divide by the standard deviation). The mean and standard deviation are here.
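As a rough sketch of what that evaluation setup looks like (the statistics below are placeholder CIFAR-10-style values, not necessarily the repo's actual numbers, and the function names are illustrative):

```python
import numpy as np

# Placeholder per-channel statistics; the actual values are in the repo.
MEAN = np.array([0.4914, 0.4822, 0.4465])
STD = np.array([0.2470, 0.2435, 0.2616])

def normalize(images):
    """Apply the same normalization as training: (x - mean) / std, per channel."""
    return (images - MEAN) / STD

def mse_loss(outputs, targets):
    """Mean-squared error between network outputs and label targets,
    matching the distillation objective (instead of cross-entropy)."""
    return np.mean((outputs - targets) ** 2)
```

The key point is that the test images must go through the exact same `normalize` used during distillation, and the evaluation network must be trained with `mse_loss` against the distilled labels, not softmax cross-entropy.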
Generally speaking, I suggest first working with the Conv architecture in the repo; once you get the expected performance, you can evaluate on LeNet.
Best,
Yongchao
Thanks Yongchao, I just saw the PyTorch branch too, so now I can properly compare the code. As for running your code, I ran into environment issues on my M1 Mac. After building my own environment on a Kubernetes cluster, JAX does not see my GPU but PyTorch does, so it's a mess for me, and I have never used JAX before. I need to use the checkpoints on my own models, so I am trying to get this working on general implementations. I imagined these distilled images would work pretty much out of the box like standard images with a general hyperparameter search, so I see that you have to be very careful with what you pick. I will have to switch over to your code.
When you say the "label is right", do you mean your implementation uses a standard identity matrix, or that identity matrix − 0.1 is correct when learned_label is false? If the latter, why should all the other classes have negative values?
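For context, identity − 0.1 is exactly what falls out of a mean-centering convention sometimes used with MSE/kernel-regression objectives (this is an assumption about the repo's convention, not confirmed by the thread): subtracting 1/num_classes from the one-hot labels makes each row zero-mean, which for 10 classes puts 0.9 on the true class and −0.1 everywhere else.

```python
import numpy as np

num_classes = 10  # e.g. CIFAR-10

# Plain one-hot labels: the identity matrix over classes.
one_hot = np.eye(num_classes)

# Mean-centered variant: subtract 1/num_classes so each row sums to zero.
# For 10 classes this is identity - 0.1: 0.9 on the diagonal, -0.1 elsewhere.
centered = one_hot - 1.0 / num_classes
```

Under this convention the negative values on the other classes are not a bug; they are just the centering offset.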
Also, for larger distillation architectures like ResNet-50/18, you do not show results in the paper but comment that they lead to poor cross-architecture generalization. When evaluating on ResNet, how well do they do as distillers compared to ConvBN?
e.g., ResNet-18-distilled images -> evaluate on a fresh ResNet-18, vs. ConvBN-distilled images -> evaluate on ResNet-18.
If I did not care about cross-architecture generalization or the size of the distillation model(s), what would be the best distillation method for self-data-distillation on larger models?
Thanks for reaching back, Luke