vikasverma1077 / manifold_mixup

Code for reproducing Manifold Mixup results (ICML 2019)

Python 99.43% Shell 0.57%
deep-learning deep-neural-networks regularization data-augumentation supervised-machine-learning pytorch supervised-learning icml2019

manifold_mixup's Introduction

Manifold_mixup (ICML 2019)

This repo contains PyTorch code for the ICML 2019 paper Manifold Mixup: Better Representations by Interpolating Hidden States (arXiv: https://arxiv.org/abs/1806.05236; ICML version: http://proceedings.mlr.press/v97/verma19a.html)

The goal of our proposed algorithm, Manifold Mixup, is to learn robust features by interpolating the hidden states of examples. The representations learned by our method are more discriminative and compact, as shown in the figure below. Please refer to Figures 1 and 2 of our paper for more details.
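The core interpolation step can be sketched as follows — a minimal version of the `mixup_data` routine this repo uses. Applied to the raw input it gives Input Mixup; applied to a hidden layer's activations it gives Manifold Mixup. This is an illustrative sketch, not the repo's exact implementation:

```python
import numpy as np
import torch

def mixup_data(x, y, alpha=2.0):
    """Mix a batch with a shuffled copy of itself.

    Returns the mixed batch, both sets of targets, and the
    interpolation coefficient lam ~ Beta(alpha, alpha).
    """
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    # Pair each example with a random partner from the same batch
    index = torch.randperm(x.size(0))
    mixed_x = lam * x + (1 - lam) * x[index]
    return mixed_x, y, y[index], lam
```

The loss is then computed as the same lam-weighted combination of the losses against the two returned target sets.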

The repo consists of two subfolders, for Supervised Learning and GAN experiments. Each subfolder is self-contained (it can be used independently of the other) and has its own "How to run" instructions in its README.md file.

If you find this work useful and use it in your own research, please consider citing our paper.

@InProceedings{pmlr-v97-verma19a,
  title     = {Manifold Mixup: Better Representations by Interpolating Hidden States},
  author    = {Verma, Vikas and Lamb, Alex and Beckham, Christopher and Najafi, Amir and Mitliagkas, Ioannis and Lopez-Paz, David and Bengio, Yoshua},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {6438--6447},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  address   = {Long Beach, California, USA},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/verma19a/verma19a.pdf},
  url       = {http://proceedings.mlr.press/v97/verma19a.html}
}


Note: for interpolation-based semi-supervised learning, please refer to our new repo: https://github.com/vikasverma1077/ICT

manifold_mixup's People

Contributors

alexmlamb, christopher-beckham, vikasverma1077


manifold_mixup's Issues

Error in line 109 of models/resnet.py

Hi, the code in your file (models/resnet.py) reads:

        out = x

        if layer_mix == 0:
            #out = lam * out + (1 - lam) * out[index,:]
            out, y_a, y_b, lam = mixup_data(out, target, mixup_alpha)

        out = F.relu(self.bn1(self.conv1(x)))

        out = self.layer1(out)

At line 109, after the hidden mixup at layer 0, you use x as the input to the self.conv1() layer, which discards the mixed activations.
Shouldn't that be changed to self.conv1(out)?
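For illustration, here is a self-contained toy module showing the suggested fix. TinyNet is hypothetical (not the repo's ResNet); the point is that the mixed tensor out, not the raw input x, must be fed onward:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Toy module (hypothetical, not the repo's ResNet) illustrating
    the fix: after mixing at layer 0, feed the mixed tensor `out`
    onward, not the raw input `x`."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(8)

    def forward(self, x, lam=0.5):
        index = torch.randperm(x.size(0))
        out = lam * x + (1 - lam) * x[index]   # mixup at layer 0
        # Fix: conv1 receives `out`; passing `x` would silently drop the mixup
        return F.relu(self.bn1(self.conv1(out)))
```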

@vikasverma1077

More on Reproducing CIFAR10 supervised results

[This is similar to #5, but with the current code base and more networks.]

I am trying to recreate the Manifold Mixup CIFAR10 results; Manifold Mixup seems to be a very promising development! I'm using the command lines from the project's README.md, on Windows 10 with a Titan XP, Python 3.7, PyTorch nightly (1.2, 7/6/2019), torchvision 0.3, and other packages the same or (mostly) slightly newer. My manifold_mixup version is from 10/16/2019.

I only had to make one slight change for torchvision 0.3: get_sampler(train_data.targets, ...) instead of get_sampler(train_data.train_labels, ...).

Below, I show the test results from your paper alongside the results that I got. "End" is the final test error; "Best" is the best test error during the run. The column "z" is a z-score based on the mean μ and stdev σ from the arXiv paper and my results: a negative z-score indicates that my results had a lower test error, a positive z-score a higher one. CLFR = "Command Line From README.md".

The results are mixed, and I'm not sure why; I thought you might have some thoughts. I'm seeing:

  • PreActResNet18: much better NoMixup and Input Mixup, about the same Manifold Mixup
  • PreActResNet34: somewhat worse Input Mixup, similar Manifold Mixup
  • WRN28-10: about the same NoMixup and Input Mixup, much worse Manifold Mixup

I accidentally tried Manifold Mixup without mixup_hidden for WRN28-10 (i.e. mixup, alpha=2.0), and actually got the mean result reported in the paper.

Any ideas? Some questions:

  • Are the results in the arXiv paper the "Best" value, or the "End" value?
  • I assume the results in the paper use {mixup_hidden, alpha=2} for mixup?
  • Is the current github software different from that used in the paper, in any substantial way?
  • Curious, are my run-times in the same ballpark as yours?

| CIFAR 10 | Err μ | Err σ | Tm [hrs] | End Err | End z | Best Iter | Best Err | Best z | CLFR |
|---|---|---|---|---|---|---|---|---|---|
| **PreActResNet18** | | | | | | | | | |
| No Mixup | 4.83 | .066 | 28.5 | 4.59 | -3.6 | 642 | 4.4 | -6.5 | Y |
| AdaMix (Guo) | 3.52 | | | | | | | | |
| Input Mixup (Zhang) | 4.2 | | | | | | | | |
| Input Mixup (α = 1) | 3.82 | 0.048 | 30 | 3.43 | -8.1 | 1687 | 3.15 | -14.0 | Y |
| Manifold Mixup (α = 2) | 2.95 | 0.046 | 32 | 3.18 | 5.0 | 1640 | 3.01 | 1.3 | Y |
| **PreActResNet34** | | | | | | | | | |
| No Mixup | 4.64 | .072 | | | | | | | |
| Input Mixup (α = 1) | 2.88 | 0.043 | 44 | 3.21 | 7.7 | 1159 | 2.99 | 2.6 | Y |
| Manifold Mixup (α = 2) | 2.54 | 0.047 | 45 | 2.7 | 3.4 | 1230 | 2.47 | -1.5 | Y |
| **Wide-Resnet-28-10** | | | | | | | | | |
| No Mixup | 3.99 | .118 | 19 | 4.12 | 1.1 | 299 | 3.89 | -0.8 | Y |
| Input Mixup (α = 1) | 2.92 | .088 | 20.5 | 2.79 | -1.5 | 367 | 2.76 | -1.8 | Y |
| Manifold Mixup (α = 2) | 2.55 | .024 | 19 | 2.97 | 17.5 | 353 | 2.82 | 11.3 | Y |
| Manifold Mixup (α = 2), without mixup_hidden | 2.55 | .024 | 18.5 | 2.73 | 7.5 | 391 | 2.55 | 0.0 | N |

Also, here is a plot of the test error, for each of the scenarios above. (The pink wrn28_10_mixup_alpha=0 is shortened / offset to the left, because it's from a restart.) Notably:

  • the 'best' error (marked by the bold 'x') is often a momentary low spike during the training session, not near the final test error.
  • the blow-up behavior of the green plot (preactresnet18, vanilla) at iterations 701 and 919 is strange

[Plot: ManifoldMixupTestErr — test error vs. iteration for each scenario above]

Reproducing CIFAR10 supervised results

I am attempting to reproduce your CIFAR10 supervised results from Table 1 (https://arxiv.org/pdf/1806.05236.pdf) using code from this repository. I cannot get within 0.5% of the following results:

  • PreActResNet18, Manifold Mixup, 2.89% error
  • PreActResNet152, Manifold Mixup, 2.76% error
  • PreActResNet152, Manifold Mixup All Layers, 2.38% error

The paper is vague on details such as the initial learning rate, batch size, whether Nesterov momentum is used, and other settings. Could you kindly provide command-line invocations to reproduce those results?

Also, when running your training code I see test-error variation of around 0.3-0.5% from epoch to epoch. Do you report results over multiple seeds, or are these figures single-seed estimates of the test error?

Question about accuracy during Training

I was trying to implement mixup_hidden for PreActResNet18. I found that, when calculating accuracy, you compare the output on mixed inputs with the original target instead of the reweighted one, which makes the accuracy low during training. I don't understand what that accuracy signifies — could you clarify?
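One common convention — sketched here as an assumption, not the repo's code — is to report a lam-weighted accuracy against both targets of the mixed pair, mirroring the lam-weighted mixup loss:

```python
import torch

def mixup_accuracy(pred, y_a, y_b, lam):
    """Hypothetical helper: lam-weighted top-1 accuracy against both
    targets of a mixed pair, mirroring the lam-weighted mixup loss."""
    top1 = pred.argmax(dim=1)
    acc_a = (top1 == y_a).float().mean()
    acc_b = (top1 == y_b).float().mean()
    return (lam * acc_a + (1 - lam) * acc_b).item()
```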

Question about BCE

Hi,

I would like to know the reason for using BCE instead of CrossEntropy. Is this critical to Manifold Mixup? Is it also the reason you train for 2000 epochs, which is much longer than common training schedules?

Plot Code

Thanks for sharing your work for all to reproduce!!

I was wondering if you had the extra plotting code to reproduce figure 1a, 1b from the paper. It would be very much appreciated.
Thanks in advance!!

Making the training process generic for custom models

Hi! Great paper! I implemented manifold mixup, with support for interpolated adversarial training (https://github.com/shivamsaboo17/ManifoldMixup), for any custom user-defined model, using PyTorch's forward-hook functionality:

  1. Select a random layer index and apply a forward hook to that layer
  2. Run a forward pass with input x_0 and record the output at the hooked layer
  3. Use this recorded output, together with a new input x_1 and a new hook at the same layer, to perform the mixup operation

For now I am selecting the layer randomly without considering its type (batchnorm, relu etc. are counted as separate layers). Should there be a layer-selection rule, such as 'mixup should be done only after a conv block in resnet', and if so, how can such a rule be extended to arbitrary custom models that users might build?
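A single-pass variant of this idea can be sketched with a forward hook that mixes the hooked layer's activations with a shuffled copy of the same batch. This is a simplified sketch of the technique, not the linked repo's exact two-pass API:

```python
import torch
import torch.nn as nn

def hidden_mixup_forward(model, layer, x, lam):
    """Run one forward pass, mixing `layer`'s output within the batch."""
    index = torch.randperm(x.size(0))

    def mix_hook(module, inputs, output):
        # A value returned from a forward hook replaces the layer's output
        return lam * output + (1 - lam) * output[index]

    handle = layer.register_forward_hook(mix_hook)
    try:
        out = model(x)
    finally:
        handle.remove()   # always detach the hook, even on error
    return out, index
```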

DataParallel Usage

When I try to use the supervised PreActResNets with Manifold Mixup under torch.nn.DataParallel, it only returns the data from one of my GPUs. Is there a known lack of integration with the DataParallel module, or am I likely doing something wrong?

Cannot reproduce CIFAR10 semi-supervised results

Hi there

I cloned the repo, pre-computed the ZCA matrix, and ran your command as follows on a single GPU using Python 2.7, torch 0.3.1 and torchvision 0.2.0:
python main_mixup_hidden_ssl.py --dataset cifar10 --optimizer sgd --lr 0.1 --l2 0.0005 --nesterov --epochs 1000 --batch_size 100 --mixup_sup 1 --mixup_usup 1 --mixup_sup_hidden --mixup_usup_hidden --mixup_alpha_sup 0.1 --mixup_alpha_usup 2.0 --alpha_max 1.0 --alpha_max_at_factor 0.4 --net_type WRN28_2 --schedule 500 750 875 --gammas 0.1 0.1 0.1 --exp_dir exp1 --data_dir ../data/cifar10/

However I get a final test error of around 18%, whereas in your paper you report around 10%.

Is there something I'm missing here?

Thanks in advance
Liam

Question about training epoch?

I have a question about the number of training epochs. Why do you use 600-2000 epochs to validate the superiority of your method? That number seems very large; I usually train on these tiny datasets for only 200 epochs. Is there a reason for this setting?

Best

Dataparallel issue

Hi,
in manifold_mixup_hidden_ssl.py, line 215:
when using two GPUs with DataParallel, lam has dimension two, so .item() cannot convert it to a single scalar.

Error running Semi-supervised Manifold mixup for Cifar10

Hi,

I run the Semi-supervised Manifold mixup for Cifar10 and the following error appears, I'm using python 2.7, torch 0.3.1 and torchvision 0.2.0

Traceback (most recent call last):
  File "main_mixup_hidden_ssl.py", line 357, in <module>
    train(epoch)
  File "main_mixup_hidden_ssl.py", line 216, in train
    lam = lam.data.cpu().numpy().item()
ValueError: can only convert an array of size 1 to a Python scalar

I saw that lam here is an array of size 2, not 1, so I replaced item() with item(0) in line 216; then a similar error appears:

  File "main_mixup_hidden_ssl.py", line 357, in <module>
    train(epoch)
  File "main_mixup_hidden_ssl.py", line 241, in train
    mixedup_target = target_a*lam.expand_as(target_a) + target_b*(1-lam.expand_as(target_b))
  File "/home/wei.z/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 433, in expand_as
    return self.expand(tensor.size())
RuntimeError: The expanded size of the tensor (10) must match the existing size (2) at non-singleton dimension 1. at /pytorch/torch/lib/THC/generic/THCTensor.c:340
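A common workaround for the DataParallel reports above — a sketch of an assumption, not a verified fix for this repo — is to collapse the per-replica lam values to one scalar before using them:

```python
import torch

def scalar_lam(lam):
    """Collapse lam to a single Python float. Under DataParallel each
    replica can return its own value, so .item() on a multi-element
    tensor fails; averaging assumes the replicas' sampled lam values
    are close enough to combine (an assumption, not repo behavior)."""
    if torch.is_tensor(lam) and lam.numel() > 1:
        return lam.float().mean().item()
    if torch.is_tensor(lam):
        return lam.item()
    return float(lam)
```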
