hendrycks / ss-ood
Self-Supervised Learning for OOD Detection (NeurIPS 2019)
License: MIT License
Hi.
The repository does not seem to list the parameters needed to reproduce your results and the baseline for the multiclass_ood setting.
Can you provide the specific number of epochs and the other parameters needed to reproduce the results?
The default is 5 epochs, but that does not seem right.
Hi, hendrycks:
Your paper is interesting. Would you be willing to share a trained model?
Thanks.
Before being fed into the network, the input bx becomes bx * 2 - 1. I would like to know why. Thanks!
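For reference, the expression maps pixel values from [0, 1] to [-1, 1]. Why the authors chose that range is not stated on this page. A minimal sketch of the arithmetic, where the function name is hypothetical and not taken from the repository:

```python
def to_signed_range(pixels):
    """Map values in [0, 1] to [-1, 1], mirroring the expression `bx * 2 - 1`."""
    return [2.0 * p - 1.0 for p in pixels]


# 0 maps to -1, mid-gray maps to 0, and 1 stays at 1.
print(to_signed_range([0.0, 0.5, 1.0]))  # [-1.0, 0.0, 1.0]
```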
Hi,
Thanks for providing this great paper and code. I am trying to use the method you proposed in a normal classification task with a dataset such as CIFAR-10. To make sure that I have understood the paper correctly, I would like to ask you for some guidance:
Suppose my baseline CIFAR-10 classification model is WideResnet28-1, and I use a batch size of 256 with a cosine annealing lr scheduler. The initial learning rate is thus 0.2. The augmentation is horizontal flip and random cropping after padding by 4 pixels. Apart from these normal settings, I also use mixup to train the model.
The question is: what is the most suitable way to add self-supervision to the above training procedure? Here is my assumption: I should add a new 4-way classification fc head in parallel with the 10-way classification head. The total loss should thus become L_10 + 0.5*L_4 according to the paper. As for the dataset, I first apply the normal h-flip and random-cropping augmentation, and then rotate the cropped and flipped image by (0, 90, 180, 270) degrees, which makes the batch size 256*4=1024. Since the batch size is amplified, I should also amplify the learning rate to 0.2*4=0.8. As for the mixup part, I should mix the 10-way classification labels as well as the 4-way rotation labels and then use cross entropy to compute each loss.
Is this the correct way to use your method in normal classification?
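The arithmetic in this question can be sketched as follows. This only illustrates the setup as the poster describes it: the 0.5 weight and the linear learning-rate scaling are the poster's assumptions, not confirmed settings, and all function names are hypothetical. Real code would more likely use np.rot90 or torch.rot90 on tensors.

```python
def rot90(img):
    """Rotate a 2-D image (list of rows) by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotations_with_labels(img):
    """Return [(rotated_img, label)] for 0, 90, 180 and 270 degrees (labels 0..3)."""
    out, cur = [], [list(r) for r in img]
    for label in range(4):
        out.append((cur, label))
        cur = rot90(cur)
    return out


def total_loss(loss_cls, loss_rot, rot_weight=0.5):
    """Combined objective L_10 + 0.5 * L_4 as written in the question."""
    return loss_cls + rot_weight * loss_rot


base_batch, base_lr = 256, 0.2
effective_batch = base_batch * 4   # every image appears in four rotations
scaled_lr = base_lr * 4            # linear scaling rule, the poster's assumption

rots = rotations_with_labels([[1, 2], [3, 4]])
print([label for _, label in rots], effective_batch, scaled_lr)
```

Note that a fourth rotation of 360 degrees would duplicate the unrotated image, which is why the four classes are 0, 90, 180 and 270 degrees.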
Thank you for sharing your work. I have two questions.
First, I found an undefined class 'BAM' and an undefined type 'BAM' in lines 118-121 of \models\cbam\model_resnet.py. Do you mean 'CBAM' here?
Second, when I tried to train one-class-ood on ResidualNet with depths 18 and 34, which use the BasicBlock defined in your code, the model ran fine. However, with depths 50 and 101, which are based on the Bottleneck, the model fails: "out" and "residual" in line 96 of \models\cbam\model_resnet.py do not have the same shape. Here is the traceback:
Traceback (most recent call last):
  File "train.py", line 224, in <module>
    train()
  File "train.py", line 164, in train
    x = net(2 * data - 1)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/workplace/ss-ood-master/models/cbam/model_resnet.py", line 169, in forward
    x = self.layer1(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/workplace/ss-ood-master/models/cbam/model_resnet.py", line 96, in forward
    out += residual
RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1
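The 256-versus-64 mismatch is consistent with a Bottleneck block, which in the standard ResNet design outputs 4 times as many channels as its nominal width, being added to a shortcut that was not projected. The sketch below shows the usual rule for when a projection (downsample) layer is needed. It assumes the common torchvision-style convention and has not been checked against models/cbam/model_resnet.py; the names are hypothetical.

```python
# Channel expansion factors in the standard ResNet design.
EXPANSION = {"BasicBlock": 1, "Bottleneck": 4}


def needs_downsample(inplanes, planes, block, stride=1):
    """True when the shortcut must be projected so `out += residual` has matching shapes."""
    return stride != 1 or inplanes != planes * EXPANSION[block]


# First stage of the network: 64 input channels, nominal width 64, stride 1.
print(needs_downsample(64, 64, "BasicBlock"))  # False
print(needs_downsample(64, 64, "Bottleneck"))  # True
```

Under this rule, depths 18 and 34 need no projection in the first stage, while depths 50 and 101 do, which matches the observation that only the Bottleneck variants fail.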
Firstly, thank you for releasing your code. It's very helpful for my research :)
I wonder whether the objective function in train.py (line 168) covers only the rotation and translation classes, because in your paper the highest score on ImageNet came from training with RotNet + Translation + Self-attention + Resize.
I hope you can answer my question soon!
Hi,
Thanks for your paper and code. I got an error while running adversary = attacks.PGD(epsilon=8./255, num_steps=10, step_size=2./255).cpu() in the adversarial folder.
The error was:
logits, pen = model(adv_bx * 2 - 1)
ValueError: too many values to unpack (expected 2)
I don't know whether I did something wrong in the code. I only changed .cuda() to .cpu() because I am using the CPU version.
Can anyone help me solve this problem?
Thanks so much.
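One way this error can arise: the statement logits, pen = model(...) requires the model to return exactly two values. If the model in use returns a single batch of logits instead, Python iterates over the batch dimension and fails as soon as the batch holds more than two rows. The sketch below reproduces that mechanism with stand-in functions. Both "models" are hypothetical, and this is a possible cause rather than a confirmed diagnosis.

```python
def single_output_model(batch):
    """Stand-in for a model whose forward returns only logits (one row per input)."""
    return [[0.1, 0.9] for _ in batch]


def two_output_model(batch):
    """Stand-in for a model that returns (logits, penultimate features)."""
    logits = [[0.1, 0.9] for _ in batch]
    pen = [[0.0] * 4 for _ in batch]
    return logits, pen


batch = [0, 1, 2]

try:
    logits, pen = single_output_model(batch)  # three rows cannot fill two names
except ValueError as err:
    message = str(err)
    print("failed:", message)

logits, pen = two_output_model(batch)  # unpacks cleanly
print(len(logits), len(pen))
```

If this is the cause, switching from .cuda() to .cpu() would be unrelated to the failure.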
Hi Hendrycks!
May I ask one more thing?
Your reply to Adversarial helped me a lot.
Can you tell me how to change the corruption strength for CIFAR-10-C?
I downloaded CIFAR-10-C from the link provided in https://github.com/hendrycks/robustness.
However, I could not find anything related to corruption strengths.
Can you provide your exact performance in the common corruption setting?
Also, did you report the best accuracy or the accuracy from the last epoch?
If the CIFAR-10-C dataset contains all five corruption strengths, then epoch 86 seems to be closest to your results. However, clean accuracy does not seem to reach 95.5% at all with WRN40-2. Can you explain how to reproduce your settings?
corrupt acc : [53.946000000000005, 64.344, 57.306000000000004, 67.526, 58.382, 84.642, 79.41, 79.276, 71.054, 79.024, 83.966, 83.89, 93.55799999999999, 91.96600000000001, 82.48, 73.512, 83.636, 87.90599999999999, 81.658]
Epoch 86 |Time 150 |Tr Loss 0.0477 |Te Loss 0.884 |Test acc |corrupt mean acc 76.71
corrupt acc : [51.488, 62.674, 57.668, 66.06, 57.348, 84.75399999999999, 78.518, 78.544, 71.41199999999999, 78.74799999999999, 84.206, 85.49600000000001, 93.95599999999999, 92.78999999999999, 84.552, 72.418, 83.418, 87.768, 80.138]
Epoch 100 |Time 150 |Tr Loss 0.0261 |Te Loss 1.010 |Test acc |corrupt mean acc 76.42
Thanks a lot, again!
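On the corruption-strength question: as far as the dataset description goes, each CIFAR-10-C .npy file holds 50,000 images, namely the 10,000 CIFAR-10 test images repeated at severities 1 through 5 in order. Treat that layout as an assumption and verify it against the download. Under that assumption, one severity level can be selected by slicing, as in this sketch (the helper name is hypothetical):

```python
IMAGES_PER_SEVERITY = 10_000  # size of the CIFAR-10 test set (assumed layout)


def severity_slice(severity, n=IMAGES_PER_SEVERITY):
    """Slice selecting one severity level (1..5) from a stacked corruption array."""
    if not 1 <= severity <= 5:
        raise ValueError("severity must be in 1..5")
    start = (severity - 1) * n
    return slice(start, start + n)


# Example with a stand-in index list; with real data this would be
# images[severity_slice(3)] after loading one corruption file.
indices = list(range(50_000))
subset = indices[severity_slice(3)]
print(subset[0], subset[-1], len(subset))  # 20000 29999 10000
```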
Hi,
I found your research paper very interesting.
However, when I was implementing your paper, I was unable to reproduce the results for CIFAR-10 with the following configs:
Network: wrn 40-2
Training loss = c.e(adv) + 0.5(Loss_rotation)
Adv. perturbation creation loss = cross-entropy(x,y) + Loss_rotation
SGD, learning rate = 0.1, momentum = 0.9, nesterov = True, batch size = 128, with cosine annealing for 205 epochs.
i.e.
optimizer = torch.optim.SGD(
    [{'params': model.parameters()}, {'params': rotate_classifier.parameters()}],
    lr=0.1, nesterov=True, momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: cosine_annealing(
        step,
        205 * len(base_loader),
        1,  # since lr_lambda computes multiplicative factor
        1e-6 / 0.1))
I am getting the following result:
Test Accuracy: 72.4425 : Rotation Accuracy 80.5675 : Adversarial Accuracy(pgd-10 only on cross-entropy loss ): 10.42
Can you please mention the hyper-parameters again for learning rate scheduler and the number of epochs of training you used for getting the results?
Thanks
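The post calls cosine_annealing without showing its definition. The sketch below assumes the argument order (step, total_steps, lr_max, lr_min) and the usual half-cosine formula, and it assumes 391 batches per epoch (50,000 CIFAR-10 training images at batch size 128). These are illustrative assumptions, not the confirmed training setup.

```python
import math


def cosine_annealing(step, total_steps, lr_max, lr_min):
    """Half-cosine decay from lr_max at step 0 to lr_min at step total_steps."""
    return lr_min + (lr_max - lr_min) * 0.5 * (1 + math.cos(step / total_steps * math.pi))


base_lr, final_lr = 0.1, 1e-6
total_steps = 205 * 391  # epochs * assumed batches per epoch


def lr_factor(step):
    """Multiplicative factor that LambdaLR would apply to base_lr."""
    return cosine_annealing(step, total_steps, 1, final_lr / base_lr)


for step in (0, total_steps // 2, total_steps):
    print(step, base_lr * lr_factor(step))
```

Because the schedule length is expressed in batches, a scheduler built this way has to be stepped once per batch. Stepping it once per epoch would leave the learning rate close to its initial value for the whole run.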
I was wondering how do you know how many epochs to use during training? It seems like increasing the epochs actually worsens the performance. However, looking at the training loss and testing loss, it does not seem to really correlate with the AUC performance during testing. Any advice? Thank you!!
Hi, I want to detect OOD samples while performing one-class classification. However, my data does not consist of images but of signals such as sound. Is this method applicable to my scenario, or do you have any suggestions for it? I am not sure how I could rotate the signals. Thanks in advance.
Hi,
I tried your code at https://github.com/hendrycks/ss-ood/blob/master/adversarial/train.py on 2 and 3 GPUs,
but the speed is the same as with the single-GPU version.
Could the problem be that the PGD adversary class also needs to be wrapped in DataParallel?
As the title
First and foremost, I really appreciate you sharing the code.
I am trying to reproduce the ImageNet AUROC of 85.7%, but I don't know the exact parameter settings (e.g. learning rate, number of epochs).
Also, is the ImageNet dataset downloaded from your GitHub enough for training? Please let me know. Thanks!
I found the link to download the test data, but the training data is not provided. Does that mean I should download the ImageNet dataset and use 'symlink_to_data.py' to create the training data?
Thanks.
Dan,
I think I know the answer to this question but I will ask it anyway. Do you compute the adversarial examples on the batch that contains all of the rotated versions of the data OR do you compute the adversarial examples on the 0deg rotation batch then compute the rotated versions? If it is the former, do you incur a ~4x increase in training time because the effective batch size is 4x bigger?
Thanks in advance,
Nate
Hi, really interesting publication!
I tried to reproduce the results of your Multi-Class OoD Detector with rotation head compared to the vanilla MSP baseline.
The AUROC scores of the rotation network were quite similar in my self-trained implementation: Gaussian OoD 99.38%, Cifar-100 OoD 90.65%.
My issue is with the vanilla MSP baseline, where my AUROC scores deviate from your baseline by more than 30% (Gaussian OoD: 65.41%, Cifar-100 OoD: 52.38%).
I am trying to figure out what is wrong with my implementation and would like to ask you for more details about the (training) setup of your vanilla MSP baseline.
Specifically: what is the exact model architecture, what training data (incl. perturbations) and loss function do you use, and what hyperparameters did you have?
Best Regards and already thank you in advance!
Marc Alexander
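For readers comparing numbers, the sketch below shows one common way to compute an MSP (maximum softmax probability) anomaly score and an AUROC from it. It is a generic illustration with made-up logits, not the repository's evaluation code, and the convention that OOD is the positive class is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_ood_score(logits):
    """Anomaly score: negative maximum softmax probability (higher = more anomalous)."""
    return -max(softmax(logits))


def auroc(ood_scores, in_scores):
    """Probability that a random OOD sample scores above a random in-distribution one."""
    wins = 0.0
    for o in ood_scores:
        for i in in_scores:
            wins += 1.0 if o > i else 0.5 if o == i else 0.0
    return wins / (len(ood_scores) * len(in_scores))


# Made-up logits: confident in-distribution predictions, flatter OOD predictions.
in_logits = [[5.0, 0.0, 0.0], [4.0, 1.0, 0.0]]
ood_logits = [[1.0, 1.0, 1.0], [1.5, 1.0, 0.5]]

in_scores = [msp_ood_score(z) for z in in_logits]
ood_scores = [msp_ood_score(z) for z in ood_logits]
print(auroc(ood_scores, in_scores))
```

When two implementations disagree, the sign of the score and the choice of positive class are worth checking first, since flipping either one turns an AUROC of a into 1 - a.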
Hi,
Thanks for the wonderful paper and open-sourcing the code.
However, I ran into two issues in the adversarial robustness code.
First, the attack fails when unpacking the model output:
File "/ss-ood/adversarial/attacks.py", line 62, in forward
logits, pen = model(adv_bx*2-1)
ValueError: too many values to unpack (expected 2)
Second, the import "from models.wrn_prime import WideResNet" fails because wrn_prime is not found.
I guess the code for that model is missing from the repository.
Hi,
Thanks for making the code public, which helps a lot. I'm just wondering why there is a -1 for the classification loss when doing OOD detection?
ss-ood/multiclass_ood/test_auxiliary_ood.py
Lines 182 to 185 in 2a284be
Thanks!
Firstly many thanks for sharing your work.
I found the paper very interesting and wanted to see if I could reproduce some of the results, especially regarding the one-class OOD detector on CIFAR-10 using OE. I was just wondering if this would be made available at any point.
Kind regards,
Se
Hi,
in the scoring formula on page 7 in the paper, shouldn't the KL divergence of the classifier prediction from uniform be small for OOD inputs, and the rotation CE be large on OOD since the rotation head has not been trained to predict the original rotation on OOD inputs? I.e. one of the terms should have a minus sign, right?
If I read it correctly, the code uses different signs for those terms:
ss-ood/multiclass_ood/test_auxiliary_ood.py
Lines 182 to 185 in 2a284be
where KL is positive CE minus the constant entropy of U.
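The identity mentioned in the last line, KL[U || p] = CE(U, p) - H(U) with H(U) = log K for K classes, can be checked numerically. The sketch below uses made-up probability vectors and hypothetical function names. It only illustrates the identity and the direction of the effect. It does not settle which sign the paper intended.

```python
import math


def cross_entropy_from_uniform(probs):
    """CE(U, p) = -(1/K) * sum_k log p_k."""
    k = len(probs)
    return -sum(math.log(p) for p in probs) / k


def kl_uniform_to(probs):
    """KL[U || p] = sum_k (1/K) * log((1/K) / p_k)."""
    k = len(probs)
    return sum((1.0 / k) * math.log((1.0 / k) / p) for p in probs)


confident = [0.97, 0.01, 0.01, 0.01]  # typical of an in-distribution input
flat = [0.25, 0.25, 0.25, 0.25]       # typical of an input the classifier is unsure about

for name, p in (("confident", confident), ("flat", flat)):
    kl = kl_uniform_to(p)
    ce_minus_entropy = cross_entropy_from_uniform(p) - math.log(len(p))
    print(name, round(kl, 6), round(ce_minus_entropy, 6))
```

The two columns agree, and the KL term is larger for the confident prediction than for the flat one, which is the behaviour the question describes.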
Hi Hendrycks.
It's a pleasure to read your work on OOD detection.
Meanwhile, I have a question about the adversarial attack.
adversary = attacks.PGD(epsilon=8./255, num_steps=20, step_size=2./255).cuda()
This code, at line 125 of ss-ood/adversarial/train.py, seems to train with 20-step PGD for adversarial training + Auxiliary Rotations. (I changed num_steps to 20 to match your reported setting.)
However, the training log looks odd.
Epoch 57 | Time 607 | Train Loss 1.6344 | Test Loss 0.843 | Test Error 27.96
Epoch 58 | Time 605 | Train Loss 1.6214 | Test Loss 0.829 | Test Error 26.39
Epoch 59 | Time 594 | Train Loss 1.5835 | Test Loss 0.808 | Test Error 25.44
Epoch 60 | Time 591 | Train Loss 1.6008 | Test Loss 0.802 | Test Error 24.35
Epoch 61 | Time 602 | Train Loss 1.6263 | Test Loss 0.796 | Test Error 26.33
Epoch 62 | Time 611 | Train Loss 1.5923 | Test Loss 0.790 | Test Error 24.62
Epoch 63 | Time 597 | Train Loss 1.5906 | Test Loss 0.783 | Test Error 25.35
Epoch 64 | Time 607 | Train Loss 1.6116 | Test Loss 0.811 | Test Error 25.18
Your reported accuracy is 50.4% for 20-step PGD but the test error is too low and it seems to be similar to Clean.
Can you please provide how you ran the code for generating 20-step PGD and 100-step PGD?
Thanks for your great work!
please, one-class-train.tar can't be downloaded.
In the adversarial training code, the input to the model is in range (-1, 1). However, in the attack code the pixel values are clipped in range (0,1). Seems like a bug to me, unless I am missing something.
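One reading that would make the two ranges consistent: the tracebacks quoted in other issues on this page show the attack calling model(adv_bx * 2 - 1), which suggests the attack keeps images in pixel space [0, 1] and rescales only at the model call. The sketch below illustrates that reading with a single L-infinity PGD update on plain lists. The helper names are hypothetical, the gradient sign is supplied by hand, and this is not the repository's attack code.

```python
def clip(value, low, high):
    """Clamp a scalar to the interval [low, high]."""
    return max(low, min(high, value))


def pgd_step(adv, clean, grad_sign, step_size=2 / 255, epsilon=8 / 255):
    """One L-infinity PGD update on pixels stored in [0, 1]."""
    updated = []
    for a, c, g in zip(adv, clean, grad_sign):
        a = a + step_size * g                  # move along the gradient sign
        a = clip(a, c - epsilon, c + epsilon)  # stay inside the epsilon ball
        updated.append(clip(a, 0.0, 1.0))      # stay a valid pixel value
    return updated


def model_input(pixels):
    """Rescale [0, 1] pixels to [-1, 1] right before the forward pass."""
    return [2.0 * p - 1.0 for p in pixels]


clean = [0.0, 0.5, 1.0]
adv = list(clean)
for _ in range(10):
    adv = pgd_step(adv, clean, grad_sign=[1.0, 1.0, 1.0])

print(adv)
print(model_input(adv))
```

Under this reading, clipping to (0, 1) inside the attack and feeding (-1, 1) to the network are compatible, provided every model call in the attack applies the same rescaling.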