vikasverma1077 / manifold_mixup

Code for reproducing Manifold Mixup results (ICML 2019)

Python 99.43% Shell 0.57%
deep-learning deep-neural-networks regularization data-augumentation supervised-machine-learning pytorch supervised-learning icml2019

manifold_mixup's Introduction

Manifold_mixup (ICML 2019)

This repo contains PyTorch code for the ICML 2019 paper Manifold Mixup: Better Representations by Interpolating Hidden States (arXiv: https://arxiv.org/abs/1806.05236; ICML version: http://proceedings.mlr.press/v97/verma19a.html)

The goal of our proposed algorithm, Manifold Mixup, is to learn robust features by interpolating the hidden states of examples. The representations learned by our method are more discriminative and compact, as shown in the figure below. Please refer to Figures 1 and 2 of our paper for more details.
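The core interpolation step can be sketched as follows — a minimal version of the `mixup_data` routine this repo uses. Applied to the raw input it gives Input Mixup; applied to a hidden layer's activations it gives Manifold Mixup. This is an illustrative sketch, not the repo's exact implementation:

```python
import numpy as np
import torch

def mixup_data(x, y, alpha=2.0):
    """Mix a batch with a shuffled copy of itself.

    Returns the mixed batch, both sets of targets, and the
    interpolation coefficient lam ~ Beta(alpha, alpha).
    """
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    # Pair each example with a random partner from the same batch
    index = torch.randperm(x.size(0))
    mixed_x = lam * x + (1 - lam) * x[index]
    return mixed_x, y, y[index], lam
```

The loss is then computed as the same lam-weighted combination of the losses against the two returned target sets.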

The repo consists of two subfolders, for Supervised Learning and GAN experiments. Each subfolder is self-contained (it can be used independently of the other) and has its own "How to run" instructions in its README.md file.

If you find this work useful and use it in your own research, please consider citing our paper.

@InProceedings{pmlr-v97-verma19a,
  title     = {Manifold Mixup: Better Representations by Interpolating Hidden States},
  author    = {Verma, Vikas and Lamb, Alex and Beckham, Christopher and Najafi, Amir and Mitliagkas, Ioannis and Lopez-Paz, David and Bengio, Yoshua},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {6438--6447},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  address   = {Long Beach, California, USA},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/verma19a/verma19a.pdf},
  url       = {http://proceedings.mlr.press/v97/verma19a.html}
}


Note: for interpolation-based semi-supervised learning, please refer to our new repo: https://github.com/vikasverma1077/ICT

manifold_mixup's People

Contributors

alexmlamb, christopher-beckham, vikasverma1077


manifold_mixup's Issues

Error in line 109 of models/resnet.py

Hi, the code in your file (models/resnet.py) reads:

        out = x

        if layer_mix == 0:
            #out = lam * out + (1 - lam) * out[index,:]
            out, y_a, y_b, lam = mixup_data(out, target, mixup_alpha)

        out = F.relu(self.bn1(self.conv1(x)))

        out = self.layer1(out)

At line 109, after the hidden mixup at layer 0, you use x as the input to the self.conv1() layer, which discards the mixed activations.
Shouldn't that be changed to self.conv1(out)?
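For illustration, here is a self-contained toy module showing the suggested fix. TinyNet is hypothetical (not the repo's ResNet); the point is that the mixed tensor out, not the raw input x, must be fed onward:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Toy module (hypothetical, not the repo's ResNet) illustrating
    the fix: after mixing at layer 0, feed the mixed tensor `out`
    onward, not the raw input `x`."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(8)

    def forward(self, x, lam=0.5):
        index = torch.randperm(x.size(0))
        out = lam * x + (1 - lam) * x[index]   # mixup at layer 0
        # Fix: conv1 receives `out`; passing `x` would silently drop the mixup
        return F.relu(self.bn1(self.conv1(out)))
```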

@vikasverma1077

More on Reproducing CIFAR10 supervised results

[This is similar to #5, but with the current code base and more networks.]

I am trying to recreate the Manifold Mixup CIFAR10 results; Manifold Mixup seems to be a very promising development! I'm using the command lines from the project's README.md, on Windows 10 with a Titan XP, Python 3.7, PyTorch nightly (1.2, 7/6/2019), torchvision 0.3, and other packages the same or (mostly) slightly newer. My manifold_mixup version is from 10/16/2019.

I only had to make one slight change for torchvision 0.3: get_sampler(train_data.targets, ...) instead of get_sampler(train_data.train_labels, ...).

Below, I show the test results from your paper alongside the results that I got. "End" is the final test error; "Best" is the best test error during the run. The column "z" is a z-score based on the mean μ and stdev σ from the arXiv paper and my results: a negative z-score indicates that my results had a lower test error, a positive z-score a higher one. CLFR = "Command Line From README.md".

The results are mixed, and I'm not sure why; I thought you might have some thoughts. I'm seeing:

  • PreActResNet18: much better NoMixup and Input Mixup, about the same Manifold Mixup
  • PreActResNet34: somewhat worse Input Mixup, similar Manifold Mixup
  • WRN28-10: about the same NoMixup and Input Mixup, much worse Manifold Mixup

I accidentally tried Manifold Mixup without mixup_hidden for WRN28-10 (i.e. mixup, alpha=2.0), and actually got the mean result reported in the paper.

Any ideas? Some questions:

  • Are the results in the arXiv paper the "Best" value, or the "End" value?
  • I assume the results in the paper use {mixup_hidden, alpha=2} for mixup?
  • Is the current github software different from that used in the paper, in any substantial way?
  • Curious, are my run-times in the same ballpark as yours?

| CIFAR 10 | Err μ | Err σ | Tm [hrs] | End Err | End z | Best Iter | Best Err | Best z | CLFR |
|---|---|---|---|---|---|---|---|---|---|
| **PreActResNet18** | | | | | | | | | |
| No Mixup | 4.83 | .066 | 28.5 | 4.59 | -3.6 | 642 | 4.4 | -6.5 | Y |
| AdaMix (Guo) | 3.52 | | | | | | | | |
| Input Mixup (Zhang) | 4.2 | | | | | | | | |
| Input Mixup (α = 1) | 3.82 | 0.048 | 30 | 3.43 | -8.1 | 1687 | 3.15 | -14.0 | Y |
| Manifold Mixup (α = 2) | 2.95 | 0.046 | 32 | 3.18 | 5.0 | 1640 | 3.01 | 1.3 | Y |
| **PreActResNet34** | | | | | | | | | |
| No Mixup | 4.64 | .072 | | | | | | | |
| Input Mixup (α = 1) | 2.88 | 0.043 | 44 | 3.21 | 7.7 | 1159 | 2.99 | 2.6 | Y |
| Manifold Mixup (α = 2) | 2.54 | 0.047 | 45 | 2.7 | 3.4 | 1230 | 2.47 | -1.5 | Y |
| **Wide-Resnet-28-10** | | | | | | | | | |
| No Mixup | 3.99 | .118 | 19 | 4.12 | 1.1 | 299 | 3.89 | -0.8 | Y |
| Input Mixup (α = 1) | 2.92 | .088 | 20.5 | 2.79 | -1.5 | 367 | 2.76 | -1.8 | Y |
| Manifold Mixup (α = 2) | 2.55 | .024 | 19 | 2.97 | 17.5 | 353 | 2.82 | 11.3 | Y |
| Manifold Mixup (α = 2), without mixup_hidden | 2.55 | .024 | 18.5 | 2.73 | 7.5 | 391 | 2.55 | 0.0 | N |

Also, here is a plot of the test error, for each of the scenarios above. (The pink wrn28_10_mixup_alpha=0 is shortened / offset to the left, because it's from a restart.) Notably:

  • the 'best' error (marked by the bold 'x') is often a momentary low spike during the training session, not near the final test error.
  • the blow-up behavior of the green plot (preactresnet18, vanilla) at iterations 701 and 919 is strange

[Plot: ManifoldMixupTestErr — test error vs. iteration for each scenario above]

Reproducing CIFAR10 supervised results

I am attempting to reproduce your CIFAR10 supervised results from Table 1 (https://arxiv.org/pdf/1806.05236.pdf) using code from this repository. I cannot get within 0.5% of the following results:

  • PreActResNet18, Manifold Mixup, 2.89% error
  • PreActResNet152, Manifold Mixup, 2.76% error
  • PreActResNet152, Manifold Mixup All Layers, 2.38% error

The paper is vague on details such as the initial learning rate, batch size, whether Nesterov momentum is used, and other settings. Could you kindly provide command-line invocations to reproduce those results?

Also, when running your training code I see test-error variation of around 0.3-0.5% from epoch to epoch. Do you report results over multiple seeds, or are these figures single-seed estimates of the test error?

Question about accuracy during Training

I was trying to implement mixup_hidden for PreActResNet18. I found that, when calculating accuracy, you compare the output on mixed inputs with the original target instead of the reweighted one, which makes the accuracy low during training. I don't understand what that accuracy signifies — could you clarify?
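One common convention — sketched here as an assumption, not the repo's code — is to report a lam-weighted accuracy against both targets of the mixed pair, mirroring the lam-weighted mixup loss:

```python
import torch

def mixup_accuracy(pred, y_a, y_b, lam):
    """Hypothetical helper: lam-weighted top-1 accuracy against both
    targets of a mixed pair, mirroring the lam-weighted mixup loss."""
    top1 = pred.argmax(dim=1)
    acc_a = (top1 == y_a).float().mean()
    acc_b = (top1 == y_b).float().mean()
    return (lam * acc_a + (1 - lam) * acc_b).item()
```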

Question about BCE

Hi,

I would like to know the reason for using BCE instead of CrossEntropy. Is this critical to Manifold Mixup? Is it also the reason you train for 2000 epochs, which is much longer than common training schedules?

Plot Code

Thanks for sharing your work for all to reproduce!!

I was wondering if you had the extra plotting code to reproduce figure 1a, 1b from the paper. It would be very much appreciated.
Thanks in advance!!

Making the training process generic for custom models

Hi! Great paper! I implemented manifold mixup, with support for interpolated adversarial training (https://github.com/shivamsaboo17/ManifoldMixup), for any custom user-defined model, using PyTorch's forward-hook functionality:

  1. Select a random layer index and apply a forward hook to that layer
  2. Run a forward pass with input x_0 and record the output at the hooked layer
  3. Use this recorded output, together with a new input x_1 and a new hook at the same layer, to perform the mixup operation

For now I am selecting the layer randomly without considering its type (batchnorm, relu etc. are counted as separate layers). Should there be a layer-selection rule, such as 'mixup should be done only after a conv block in resnet', and if so, how can such a rule be extended to arbitrary custom models that users might build?
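A single-pass variant of this idea can be sketched with a forward hook that mixes the hooked layer's activations with a shuffled copy of the same batch. This is a simplified sketch of the technique, not the linked repo's exact two-pass API:

```python
import torch
import torch.nn as nn

def hidden_mixup_forward(model, layer, x, lam):
    """Run one forward pass, mixing `layer`'s output within the batch."""
    index = torch.randperm(x.size(0))

    def mix_hook(module, inputs, output):
        # A value returned from a forward hook replaces the layer's output
        return lam * output + (1 - lam) * output[index]

    handle = layer.register_forward_hook(mix_hook)
    try:
        out = model(x)
    finally:
        handle.remove()   # always detach the hook, even on error
    return out, index
```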

DataParallel Usage

When I try to use the supervised PreActResNets with Manifold Mixup under torch.nn.DataParallel, it only returns the data from one of my GPUs. Is there a known lack of integration with the DataParallel module, or am I likely doing something wrong?

Cannot reproduce CIFAR10 semi-supervised results

Hi there

I cloned the repo, pre-computed the ZCA matrix, and ran your command as follows on a single GPU using Python 2.7, torch 0.3.1 and torchvision 0.2.0:
python main_mixup_hidden_ssl.py --dataset cifar10 --optimizer sgd --lr 0.1 --l2 0.0005 --nesterov --epochs 1000 --batch_size 100 --mixup_sup 1 --mixup_usup 1 --mixup_sup_hidden --mixup_usup_hidden --mixup_alpha_sup 0.1 --mixup_alpha_usup 2.0 --alpha_max 1.0 --alpha_max_at_factor 0.4 --net_type WRN28_2 --schedule 500 750 875 --gammas 0.1 0.1 0.1 --exp_dir exp1 --data_dir ../data/cifar10/

However I get a final test error of around 18%, whereas in your paper you report around 10%.

Is there something I'm missing here?

Thanks in advance
Liam

Question about training epoch?

I have a question about the number of training epochs. Why do you use 600-2000 epochs to validate the superiority of your method? That number seems very large; I usually train on these tiny datasets for only 200 epochs. Is there a reason for this setting?

Best

Dataparallel issue

Hi,
in manifold_mixup_hidden_ssl.py, line 215:
when using two GPUs with DataParallel, lam has dimension two, so .item() cannot convert it to a single scalar.

Error running Semi-supervised Manifold mixup for Cifar10

Hi,

I run the Semi-supervised Manifold mixup for Cifar10 and the following error appears, I'm using python 2.7, torch 0.3.1 and torchvision 0.2.0

Traceback (most recent call last):
  File "main_mixup_hidden_ssl.py", line 357, in <module>
    train(epoch)
  File "main_mixup_hidden_ssl.py", line 216, in train
    lam = lam.data.cpu().numpy().item()
ValueError: can only convert an array of size 1 to a Python scalar

I saw that lam here is an array of size 2, not 1, so I replaced item() with item(0) in line 216; then a similar error appears:

  File "main_mixup_hidden_ssl.py", line 357, in <module>
    train(epoch)
  File "main_mixup_hidden_ssl.py", line 241, in train
    mixedup_target = target_a*lam.expand_as(target_a) + target_b*(1-lam.expand_as(target_b))
  File "/home/wei.z/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 433, in expand_as
    return self.expand(tensor.size())
RuntimeError: The expanded size of the tensor (10) must match the existing size (2) at non-singleton dimension 1. at /pytorch/torch/lib/THC/generic/THCTensor.c:340
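A common workaround for the DataParallel reports above — a sketch of an assumption, not a verified fix for this repo — is to collapse the per-replica lam values to one scalar before using them:

```python
import torch

def scalar_lam(lam):
    """Collapse lam to a single Python float. Under DataParallel each
    replica can return its own value, so .item() on a multi-element
    tensor fails; averaging assumes the replicas' sampled lam values
    are close enough to combine (an assumption, not repo behavior)."""
    if torch.is_tensor(lam) and lam.numel() > 1:
        return lam.float().mean().item()
    if torch.is_tensor(lam):
        return lam.item()
    return float(lam)
```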
