kuangliu / pytorch-cifar

95.47% on CIFAR10 with PyTorch

License: MIT License

Python 100.00%

pytorch

pytorch-cifar's Introduction

Train CIFAR10 with PyTorch

I'm playing with PyTorch on the CIFAR10 dataset.

Prerequisites

Python 3.6+
PyTorch 1.0+

Training

# Start training with: 
python main.py

# You can manually resume the training with: 
python main.py --resume --lr=0.01

Accuracy

Model	Acc.
VGG16	92.64%
ResNet18	93.02%
ResNet50	93.62%
ResNet101	93.75%
RegNetX_200MF	94.24%
RegNetY_400MF	94.29%
MobileNetV2	94.43%
ResNeXt29(32x4d)	94.73%
ResNeXt29(2x64d)	94.82%
SimpleDLA	94.89%
DenseNet121	95.04%
PreActResNet18	95.11%
DPN92	95.16%
DLA	95.47%

pytorch-cifar's Issues

problem with code at main.py line 126

what does net.module mean?
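A likely answer, assuming `main.py` wraps the model in `torch.nn.DataParallel` when running on CUDA (check the version you are using): the wrapper keeps the original network in its `module` attribute, so `net.module` is the unwrapped model, and the wrapper's `state_dict()` keys gain a `module.` prefix. A minimal sketch with a stand-in `nn.Linear` model:

```python
import torch
import torch.nn as nn

# Stand-in for the real network (hypothetical; any nn.Module behaves the same way).
model = nn.Linear(4, 2)

# DataParallel stores the wrapped model as `wrapped.module`.
# Without a GPU it simply keeps the model and forwards calls to it.
wrapped = nn.DataParallel(model)

print(wrapped.module is model)            # the original model object
print(list(model.state_dict().keys()))    # keys of the bare model
print(list(wrapped.state_dict().keys()))  # same keys with a "module." prefix

# Saving `wrapped.module.state_dict()` gives prefix-free keys that load
# directly into an unwrapped model.
plain = nn.Linear(4, 2)
plain.load_state_dict(wrapped.module.state_dict())
```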

Issue about shortcut.

In your ResNet model, you apply a ReLU activation after bn(conv2) and before adding the shortcut. I have seen other implementations add the shortcut directly to the output of BN without a ReLU. Which one is better? Did you try them separately before?

Anyway, your work is great for a new PyTorch user like me ^_^. I hope to see Wide ResNet and DeepLab in the future.

Best wishes.

What is the right hyper parameter for mobilenet V2 to get 94.47% acc on cifar10?

The first time I ran the MobileNetV2 experiment, it gave me an error in the forward function, so I changed the kernel size of the average pooling layer from 2 to 4. I trained the model with batch size 128 and lr 0.1 (multiplied by 0.1 at epochs 150 and 250). It gives me around 91.0%. What could be the problem? Thanks!

Performance regarding inference and backward time

Hi Kuangliu,
Nice job achieving good results on CIFAR10. Have you evaluated the speed of all the models in the experiments? It seems MobileNetV2 does not achieve a significant speed improvement, since the architecture was originally designed for the ImageNet classification task.

Regards

Standard deviation for transforms.Normalize

Hey,

How did you calculate the standard deviation values for transforms.Normalize? I am getting the same means, but different standard deviations:

import numpy as np
import torch

from torchvision import datasets
from torchvision import transforms

transform_train = transforms.Compose([
#     transforms.RandomCrop(32, padding=4),
#     transforms.RandomHorizontalFlip(),
    transforms.ToTensor()
])

trainset = datasets.CIFAR10(root='data', train=True, download=True, transform=transform_train)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=50_000, shuffle=True)

# Load the whole training set as a single batch. next(iter(...)) replaces the
# Python 2 style .next() call, which recent DataLoader iterators no longer provide.
train = next(iter(train_loader))[0]

print('Mean: {}'.format(np.mean(train.numpy(), axis=(0, 2, 3))))
# Mean: [ 0.49139765  0.48215759  0.44653141]
print('STD: {}'.format(np.std(train.numpy(), axis=(0, 2, 3))))
# STD: [ 0.24703199  0.24348481  0.26158789]
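One commonly suggested explanation for the smaller values (0.2023, 0.1994, 0.2010) used in the repository, stated here as an assumption rather than something the issue confirms, is that they average the standard deviation of each image separately. That average is always smaller than the standard deviation over all pixels of the dataset, because the latter also includes the variation between image means. The sketch uses random stand-in data instead of CIFAR10 so it runs without a download:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for N images in NCHW layout (not real CIFAR10 data): per-pixel
# noise plus a per-image brightness offset, so image means differ.
images = 0.5 * rng.random((500, 3, 32, 32)) + 0.5 * rng.random((500, 1, 1, 1))

# Std over all images and pixels at once (what the snippet above computes).
global_std = images.std(axis=(0, 2, 3))

# Std of each image separately, then averaged over images.
mean_per_image_std = images.std(axis=(2, 3)).mean(axis=0)

print('global std:           ', global_std)
print('mean of per-image std:', mean_per_image_std)
```

Passing the full-dataset values to `transforms.Normalize` gives the normalized training data unit variance per channel.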

py3 error

Please add a better error message to indicate that Python 3 is not supported.

'ResNet' object has no attribute 'to'

Hi,
I'm a total newbie in machine learning, and I am trying out the transfer learning tutorial on the PyTorch website https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html. When I run the file, this error pops up: AttributeError: 'ResNet' object has no attribute 'to'. Can anyone help me?
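`nn.Module.to()` and `torch.device` were introduced in PyTorch 0.4, so this error usually points to an older installation (the README asks for PyTorch 1.0+). On a current version, the device-agnostic idiom looks like this; the small model is a stand-in:

```python
import torch
import torch.nn as nn

print(torch.__version__)  # .to() requires PyTorch 0.4 or later

# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in model
net = net.to(device)  # moves parameters and buffers, returns the module

inputs = torch.randn(2, 3, 32, 32, device=device)
outputs = net(inputs)
```

On PyTorch 0.3 and earlier the closest equivalent is `net.cuda()`, but upgrading is the simpler fix.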

How can I test the model after training finishes?

Let's say I have finished training.

What code should I run to see the results of testing only?
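A test-only run amounts to loading the saved weights and making one pass over the test loader without gradients. The sketch assumes the checkpoint is a dict holding the weights under a `'net'` key (path and key are assumptions; inspect your own checkpoint), and that a checkpoint saved from a `DataParallel`-wrapped model is loaded into a model wrapped the same way. The `evaluate` helper and the synthetic data are hypothetical stand-ins:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()  # no gradients are needed for evaluation
def evaluate(net, loader, device='cpu'):
    """Return top-1 accuracy in percent over every batch in `loader`."""
    net.eval()  # use running BatchNorm statistics, disable dropout
    correct, total = 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        predicted = net(inputs).argmax(dim=1)
        correct += (predicted == targets).sum().item()
        total += targets.size(0)
    return 100.0 * correct / total


# With a trained CIFAR10 model, something like (assumed checkpoint layout):
#   checkpoint = torch.load('./checkpoint/ckpt.pth', map_location=device)
#   net.load_state_dict(checkpoint['net'])
#   print(evaluate(net, testloader, device))

# Self-contained demo: a linear "model" that always predicts class 0.
net = nn.Linear(4, 3)
with torch.no_grad():
    net.weight.zero_()
    net.bias.copy_(torch.tensor([1.0, 0.0, 0.0]))

targets = torch.tensor([0, 0, 1, 2])
loader = DataLoader(TensorDataset(torch.randn(4, 4), targets), batch_size=2)
acc = evaluate(net, loader)
```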

Slower than TF?

I'm testing the speed-up of ResNet on TF and PyTorch.

In TF, it typically converges within 80k steps, i.e. 80k batches; with batch size 128, that corresponds to about 205 epochs in PyTorch.

Interestingly, in TF I can finish 80k steps in about 6 hours, but in PyTorch, running 200 epochs took me around 13 hours, and this would grow to around 20 hours if I wanted to test 300 epochs.
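The step-to-epoch conversion and the resulting per-epoch times can be checked directly. The 50,000-image training-set size is CIFAR10's; the timings are the ones quoted in this issue:

```python
CIFAR10_TRAIN_IMAGES = 50_000


def steps_to_epochs(steps, batch_size, num_images=CIFAR10_TRAIN_IMAGES):
    """Number of passes over the data covered by `steps` batches."""
    return steps * batch_size / num_images


def minutes_per_epoch(hours, epochs):
    return hours * 60 / epochs


tf_epochs = steps_to_epochs(80_000, 128)    # about 205 epochs
tf_speed = minutes_per_epoch(6, tf_epochs)  # TF run
pt_speed = minutes_per_epoch(13, 200)       # PyTorch run

print(f'{tf_epochs:.1f} epochs, TF {tf_speed:.2f} min/epoch, PyTorch {pt_speed:.2f} min/epoch')
```

By these numbers the PyTorch run is roughly 2.2x slower per epoch than the TF run.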

I thought PyTorch should be much faster than TF. Does anyone know the solution to this? BTW, I'm using an EC2 g2.2xlarge instance.

Here is the ResNet20 TF implementation

The problem with 'shufflenet.py'

Hello, when I run 'shufflenet.py', I get this error:

Traceback (most recent call last):
  File "/home/w/Documents/code/cifar10/practice/models/shufflenet.py", line 218, in <module>
    test()
  File "/home/w/Documents/code/cifar10/practice/models/shufflenet.py", line 214, in test
    net = ShuffleNetG2()
  File "/home/w/Documents/code/cifar10/practice/models/shufflenet.py", line 202, in ShuffleNetG2
    return ShuffleNet(cfg)
  File "/home/w/Documents/code/cifar10/practice/models/shufflenet.py", line 171, in __init__
    self.layer1 = self._make_layer(out_planes[0], num_blocks[0], groups)
  File "/home/w/Documents/code/cifar10/practice/models/shufflenet.py", line 181, in _make_layer
    layers.append(Bottleneck(self.in_planes, out_planes-cat_planes, stride=stride, groups=groups))
  File "/home/w/Documents/code/cifar10/practice/models/shufflenet.py", line 139, in __init__
    self.conv1 = nn.Conv2d(in_planes, mid_planes, kernel_size=1, groups=g, bias=False)
  File "/home/w/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 297, in __init__
    False, _pair(0), groups, bias)
  File "/home/w/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 33, in __init__
    out_channels, in_channels // groups, *kernel_size))
TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:
  (torch.device device)
  (tuple of ints size, torch.device device)
  (torch.Storage storage)
  (Tensor other)
  (object data, torch.device device)

Have you run into the same problem? Could you help me fix this bug? Thanks a lot!
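The `got (float, int, int, int)` part says `nn.Conv2d` received a float channel count, and the traceback points at `mid_planes`. A likely cause, assuming `mid_planes` is computed by dividing `out_planes` by 4 (the traceback does not show that line), is that `/` is true division in Python 3 and always returns a float, while floor division `//` keeps an integer. The channel numbers below are illustrative:

```python
import torch.nn as nn

in_planes, out_planes, groups = 24, 176, 2  # illustrative values

mid_planes_float = out_planes / 4  # true division: 44.0 (float) in Python 3
mid_planes = out_planes // 4       # floor division: 44 (int)
print(type(mid_planes_float).__name__, type(mid_planes).__name__)

# With an integer channel count the grouped 1x1 convolution builds fine.
conv1 = nn.Conv2d(in_planes, mid_planes, kernel_size=1, groups=groups, bias=False)
print(tuple(conv1.weight.shape))  # (out_channels, in_channels // groups, 1, 1)
```

`int(out_planes / 4)` works as well for positive channel counts.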

Pre-activation ResNet

Line 134 may not be necessary if pre-activation is used. In the current implementation, if pre-activation is used, there will be two consecutive BN layers after the first 3x3 conv.

pytorch-cifar/models/resnet.py

Line 134 in 2ded131

self.bn1 = nn.BatchNorm2d(64)

VGG16 Accuracy

For VGG16, can you tell me the training specifications?
I think you use SGD with momentum=0.9, weight_decay=5e-4, and this learning rate schedule:
lr = 0.1 for epochs [0, 150)
lr = 0.01 for epochs [150, 250)
lr = 0.001 for epochs [250, 350)
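That schedule maps onto PyTorch's `MultiStepLR`, which multiplies the learning rate by `gamma` at each milestone epoch. A sketch with a stand-in model; whether the repository itself used this scheduler is not stated in the issue:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # stand-in for VGG16
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# lr = 0.1 for epochs [0, 150), 0.01 for [150, 250), 0.001 for [250, 350)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)

lr_at_epoch = []
for epoch in range(350):
    lr_at_epoch.append(optimizer.param_groups[0]['lr'])
    # ... one epoch of training (forward, backward, optimizer.step() per batch) ...
    optimizer.step()  # placeholder; parameters without gradients are skipped
    scheduler.step()  # advance the schedule once per epoch, after the epoch's updates
```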

Can I use this script to create ResNet-56 and ResNet-110 models?

I am wondering whether I could create ResNet-56 and ResNet-110 architectures using these modules.

Can't achieve the reported accuracy on VGG19

Has anyone trained a VGG19 network using this code? I used exactly the same code as in the repository, except that I used an lr_scheduler to change the learning rate automatically, but I can only achieve around 88% accuracy. Can anyone tell me what's wrong?

How to extract relu layers from the forward function

Hi,

I want to extract the two ReLU layers in the forward function of BasicBlock.

When I look at model.modules(),
I don't see the ReLU layers; I only see:
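`model.modules()` lists only `nn.Module` objects registered as attributes. If `forward` calls the functional `F.relu` (an assumption about this `BasicBlock`, consistent with no ReLU appearing in the listing), there is no module to find or hook. One option, sketched with hypothetical block classes and the residual shortcut omitted for brevity, is to register a separate `nn.ReLU` for each activation and attach forward hooks:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FunctionalBlock(nn.Module):
    """ReLU applied via F.relu: it never shows up in .modules()."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        return F.relu(self.bn2(self.conv2(out)))


class ModuleBlock(FunctionalBlock):
    """Same layers, but each ReLU is its own registered submodule."""
    def __init__(self, channels):
        super().__init__(channels)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()

    def forward(self, x):
        out = self.relu1(self.bn1(self.conv1(x)))
        return self.relu2(self.bn2(self.conv2(out)))


def relu_modules(model):
    return [name for name, m in model.named_modules() if isinstance(m, nn.ReLU)]


activations = {}


def save_output(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook


block = ModuleBlock(8)
for name, module in block.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(save_output(name))

out = block(torch.randn(2, 8, 16, 16))
print(relu_modules(FunctionalBlock(8)), sorted(activations))
```

Two separate `nn.ReLU` instances keep the activations apart; reusing one instance would fire its hook twice per forward pass.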

Why track the loss and accuracy in test mode?

What is the point of tracking the loss and accuracy by iterating over the test set?

If your model is already trained, why don't you simply compute the accuracy over the entire test set and be done with it?

Some of the Results don't match the Paper

The paper reports a 3.46% error on CIFAR10 with DenseNet, i.e. 96.54% accuracy.

Paper: https://arxiv.org/pdf/1608.06993.pdf

But here the reported accuracy is 95.04%. Why is this happening?

Small mistake(?) in the Inception class code

Hello @kuangliu, I would like to thank you for your work 👍. I think there is a small mistake in the GoogLeNet model: in the Inception class, you added one more Conv2d to the 5x5 branch. Shouldn't it be only one 1x1 conv followed by one 5x5 conv, rather than two?
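If the extra convolution is meant to replace the 5x5 kernel with two stacked 3x3 kernels (an assumption; the issue does not show the code), the branch keeps a 5x5 receptive field while using fewer weights. A quick check of both claims for stride-1 convolutions, with an arbitrary channel count:

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def conv_weights(kernel_sizes, channels):
    """Weights in a stack of k x k convs with `channels` in and out (no bias)."""
    return sum(k * k * channels * channels for k in kernel_sizes)


C = 32  # arbitrary channel count
print(receptive_field([5]), conv_weights([5], C))        # one 5x5 conv
print(receptive_field([3, 3]), conv_weights([3, 3], C))  # two 3x3 convs
```

Two 3x3 layers need 18/25 = 72% of the weights of one 5x5 layer and allow an extra nonlinearity in between.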

How long to train the pre-trained ResNet?

Hi! Thanks for the repository :)

How long did you train your pre-trained ResNet18?

I tried 150, 100, and 100 epochs with the learning rates decaying as you specified on pre-resnet-50, but I only got ~94.6% accuracy.

What does the 1x1 average pooling layer do in VGG models?

In this line:
https://github.com/kuangliu/pytorch-cifar/blob/master/models/vgg.py#L37
There is a 1x1 average pooling layer with stride 1. As far as I understand pooling, doesn't this layer basically do nothing?
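A quick numerical check supports that: with `kernel_size=1, stride=1`, each output is the average of a single input value, so `nn.AvgPool2d` returns its input unchanged. The tensor shape is illustrative:

```python
import torch
import torch.nn as nn

pool = nn.AvgPool2d(kernel_size=1, stride=1)
x = torch.randn(2, 8, 5, 5)  # illustrative feature-map shape
y = pool(x)
print(torch.equal(x, y))
```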

AttributeError: 'ResNet' object has no attribute 'to'

  File "main.py", line 69, in <module>
    net = net.to(device)
AttributeError: 'ResNet' object has no attribute 'to'

Is there anybody who got the same error?

Module Not Found Error: No module named 'vgg'

I am having a problem and I am not able to run the code:

File "/Desktop/Another Cifar/pytorch-cifar-master/main.py", line 16, in <module>
  from models import *
File "/Desktop/Another Cifar/pytorch-cifar-master/models/__init__.py", line 1, in <module>
  from vgg import *
ModuleNotFoundError: No module named 'vgg'

Is it because I am using Python 3.6?
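Quite possibly. Python 3 dropped implicit relative imports (PEP 328), so inside a package `from vgg import *` looks for a top-level `vgg` module on `sys.path` instead of the sibling `models/vgg.py`; the explicit relative form `from .vgg import *` works in Python 3. The sketch builds two throwaway packages with hypothetical names in a temporary directory to show both forms:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_package(root, name, init_source):
    """Create package `name` under `root` with a vgg.py submodule, then import it."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / 'vgg.py').write_text("def VGG(cfg):\n    return 'VGG(' + cfg + ')'\n")
    (pkg / '__init__.py').write_text(init_source)
    importlib.invalidate_caches()
    return importlib.import_module(name)


with tempfile.TemporaryDirectory() as root:
    sys.path.insert(0, root)
    try:
        # Implicit relative import (Python 2 style): fails in Python 3.
        try:
            import_package(root, 'models_implicit', 'from vgg import *\n')
            implicit_error = None
        except ModuleNotFoundError as exc:
            implicit_error = str(exc)

        # Explicit relative import: works in Python 3.
        models_explicit = import_package(root, 'models_explicit', 'from .vgg import *\n')
        result = models_explicit.VGG('VGG16')
    finally:
        sys.path.remove(root)

print(implicit_error)
print(result)
```

Assuming the other lines in `models/__init__.py` follow the same pattern, each `from xxx import *` would become `from .xxx import *`.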

Error shown when I tried to run with --resume

Hi, I was trying to resume from a checkpoint and ran into an error.

==> Resuming from checkpoint..
Traceback (most recent call last):
  File "cifar10.py", line 59, in <module>
    checkpoint = torch.load('./checkpoint/ckpt.t7')
  File "/home/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 231, in load
    return _load(f, map_location, pickle_module)
  File "/home/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 379, in _load
    result = unpickler.load()
AttributeError: Can't get attribute 'Net' on <module '__main__' from 'cifar10.py'>

Can anyone help me?
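`Can't get attribute 'Net' on <module '__main__'>` means the checkpoint pickled a whole model object. `torch.load` therefore needs a class `Net` importable from the module it was saved from, and `cifar10.py`, running as `__main__`, does not define one. Saving and loading only the `state_dict` avoids the dependency. A minimal round trip with a hypothetical `Net` and an in-memory buffer in place of `./checkpoint/ckpt.t7`:

```python
import io

import torch
import torch.nn as nn


class Net(nn.Module):  # hypothetical stand-in for the real model class
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        return self.fc(x)


net = Net()

# Save tensors and plain metadata only, not the pickled module object.
buffer = io.BytesIO()
torch.save({'net': net.state_dict(), 'acc': 91.5, 'epoch': 42}, buffer)  # illustrative values

# Resume: build the model from code first, then copy the weights in.
buffer.seek(0)
checkpoint = torch.load(buffer)
restored = Net()
restored.load_state_dict(checkpoint['net'])
start_epoch = checkpoint['epoch'] + 1
```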

Why is conv1's stride changed from 2 to 1 for the CIFAR10 dataset in the MobileNetV2 architecture?

Can you tell me why, guys?

When I changed conv1's stride from 1 to 2, an error occurred.
In the models/mobilenetv2.py file, I changed

self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, stride=1, padding=0, bias=False)
to
self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, stride=2, padding=0, bias=False)

and
cfg = [(1,  16, 1, 1),
       (6,  24, 2, 1),  # NOTE: change stride 2 -> 1 for CIFAR10
       (6,  32, 3, 2),
       (6,  64, 4, 2),
       (6,  96, 3, 1),
       (6, 160, 3, 2),
       (6, 320, 1, 1)]
to
cfg = [(1,  16, 1, 1),
       (6,  24, 2, 2),  # NOTE: change stride 2 -> 1 for CIFAR10
       (6,  32, 3, 2),
       (6,  64, 4, 2),
       (6,  96, 3, 1),
       (6, 160, 3, 2),
       (6, 320, 1, 1)]

Then I ran python main.py and got this error:
RuntimeError: The size of tensor a (8) must match the size of tensor b (16) at non-singleton dimension 3

So is it just for matching the tensor sizes?
Only for that??

Then isn't it different from the structure in the MobileNetV2 paper?

Help me, guys :)
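Part of the answer is arithmetic on the spatial size. The paper's MobileNetV2 downsamples by 32 in total, which turns a 224x224 ImageNet image into a 7x7 final map but would shrink a 32x32 CIFAR10 image to 1x1. The sketch assumes every strided layer behaves like a 3x3, padding-1 convolution (output size `(n - 1) // s + 1`), and compares a stride-1 stem with the CIFAR10 `cfg` above against a stride-2 stem with the paper's stride-2 second stage:

```python
# (expansion, out_planes, num_blocks, stride) per stage
CIFAR_CFG = [(1, 16, 1, 1), (6, 24, 2, 1), (6, 32, 3, 2), (6, 64, 4, 2),
             (6, 96, 3, 1), (6, 160, 3, 2), (6, 320, 1, 1)]
# Same stages with the paper's stride of 2 in the second stage.
PAPER_CFG = [(1, 16, 1, 1), (6, 24, 2, 2), (6, 32, 3, 2), (6, 64, 4, 2),
             (6, 96, 3, 1), (6, 160, 3, 2), (6, 320, 1, 1)]


def downsample(size, stride):
    """Output size of a 3x3, padding-1 convolution with the given stride."""
    return (size - 1) // stride + 1


def final_feature_size(input_size, stem_stride, cfg):
    size = downsample(input_size, stem_stride)
    for _expansion, _planes, _num_blocks, stride in cfg:
        size = downsample(size, stride)  # only the first block of a stage strides
    return size


print(final_feature_size(224, 2, PAPER_CFG))  # paper setup on ImageNet
print(final_feature_size(32, 2, PAPER_CFG))   # paper setup on CIFAR10
print(final_feature_size(32, 1, CIFAR_CFG))   # CIFAR10 strides
```

A 4x4 final map matches an average-pooling kernel of 4. The RuntimeError itself is consistent with the residual addition: if a block's 1x1 expansion conv strides by 2 while its shortcut does not, the two branches end up 8 and 16 pixels wide.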

ShuffleNet difference with paper at first convolution

Hi @kuangliu,
Thank you for these great models!
For ShuffleNet, I noticed a difference between your implementation and the original paper. Before the ShuffleNet stages, you do a 1x1 convolution, followed by batch normalization and ReLU: out = F.relu(self.bn1(self.conv1(x))), but in the original paper they perform a 3x3 convolution with stride 2 followed by a 3x3 maxpool with stride 2:

Was this done intentionally? Thank you in advance
TheFlux7

Why isn't ReLU used in ResNet??

Great work, thanks for sharing your code. I've noticed that your ResNet implementation does not use an activation function. Could you please tell me why?
The official PyTorch ResNet code and the original paper both use the ReLU activation function.

Did you guys test the accuracy on CIFAR100?

Did you guys test the accuracy on CIFAR100?

What is the configuration of the model?

Hello, what is the configuration of the model, such as the Python version, PyTorch version, etc.? Thank you.

This architecture is not following the paper for CIFAR-10

Hi,

In the ResNet publication they propose a different architecture when it comes to CIFAR-10, right?

The plain/residual architectures follow the form in Fig. 3 (middle/right). The network inputs are 32×32 images, with the per-pixel mean subtracted. The first layer is 3×3 convolutions. Then we use a stack of 6n layers with 3×3 convolutions on the feature maps of sizes {32, 16, 8} respectively, with 2n layers for each feature map size. The numbers of filters are {16, 32, 64} respectively. The subsampling is performed by convolutions with a stride of 2. The network ends with a global average pooling, a 10-way fully-connected layer, and softmax. There are totally 6n+2 stacked weighted layers. The following table summarizes the architecture:

There are only 3 stages, and the numbers of filters are [16, 32, 64], not [64, 128, 256, 512] as for ImageNet.
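For comparison, here is a compact sketch of the 6n+2 CIFAR-10 ResNet described in the quoted passage (3x3 stem, three stages of n basic blocks with 16/32/64 filters on 32/16/8 maps, global average pooling, 10-way linear layer). It is not the repository's code, and it deviates from the paper in one flagged way: it uses 1x1 projection shortcuts where shapes change, whereas the paper's CIFAR experiments use parameter-free identity shortcuts with zero-padding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.shortcut = nn.Sequential()  # identity when shapes match
        if stride != 1 or in_planes != planes:
            # 1x1 projection, NOT the paper's zero-padding identity shortcut
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False),
                nn.BatchNorm2d(planes))

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # ReLU after the addition


class CifarResNet(nn.Module):
    """Depth 6n+2: one stem conv, 3 stages x n blocks x 2 convs, one linear layer."""
    def __init__(self, n, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(16)
        layers, in_planes = [], 16
        for planes, stride in [(16, 1), (32, 2), (64, 2)]:  # maps: 32, 16, 8
            for i in range(n):
                layers.append(BasicBlock(in_planes, planes, stride if i == 0 else 1))
                in_planes = planes
        self.layers = nn.Sequential(*layers)
        self.linear = nn.Linear(64, num_classes)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layers(out)
        out = F.adaptive_avg_pool2d(out, 1).flatten(1)  # global average pooling
        return self.linear(out)  # softmax is left to the cross-entropy loss


def weighted_depth(model):
    """Count 3x3 convs and linear layers (shortcut projections excluded)."""
    return sum(1 for m in model.modules()
               if isinstance(m, nn.Linear)
               or (isinstance(m, nn.Conv2d) and m.kernel_size == (3, 3)))


net = CifarResNet(n=3)  # ResNet-20
logits = net(torch.randn(2, 3, 32, 32))
```

`CifarResNet(n=9)` and `CifarResNet(n=18)` give ResNet-56 and ResNet-110.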

Accuracy of Resnet50 is much higher than reported!

EDIT: Originally I titled this issue "What epoch are reported results from?", but after further results came to light I've renamed it to: "Accuracy of Resnet50 is much higher than reported!".

I've been reproducing some of these experiments and my output numbers don't exactly line up. For now I'm assuming it's due to different random number seeds for the Kaiming Normal initialization.

Is the accuracy you report always the accuracy of the test set on the 350th epoch? Or are you reporting the accuracy of the best epoch?

In my reproduction of the DPN92 experiment, the accuracy at the last epoch is 94.92%, but the highest overall accuracy was 95.10%, at epoch 275.

Surprisingly, when I ran the Resnet50 example I got an accuracy of 95.72%! (but this is likely some issue on my end) (Edit: Actually it doesn't seem to be; see below)
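Two points from this issue can be made concrete. Seeding the random number generators makes weight initialization (Kaiming or otherwise) repeatable from run to run, although GPU kernels can still add nondeterminism. Logging every epoch's test accuracy lets you state explicitly whether you report the last-epoch or the best-epoch number. A sketch with hypothetical helper names and made-up accuracies:

```python
import random

import numpy as np
import torch
import torch.nn as nn


def set_seed(seed):
    """Seed Python, NumPy and PyTorch (all devices) RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def kaiming_conv_weight(seed):
    set_seed(seed)
    conv = nn.Conv2d(3, 64, kernel_size=3, bias=False)
    nn.init.kaiming_normal_(conv.weight, mode='fan_out', nonlinearity='relu')
    return conv.weight.detach().clone()


def summarize(test_accs):
    """Last-epoch and best-epoch test accuracy (epochs counted from 0)."""
    best_epoch = max(range(len(test_accs)), key=lambda e: test_accs[e])
    return {'last': test_accs[-1], 'best': test_accs[best_epoch], 'best_epoch': best_epoch}


same = torch.equal(kaiming_conv_weight(0), kaiming_conv_weight(0))
different = not torch.equal(kaiming_conv_weight(0), kaiming_conv_weight(1))
summary = summarize([93.8, 95.1, 94.9])  # made-up accuracies
```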

'VGG' object has no attribute 'to'

Hi~ My torch version is 0.3, and I have used VGG before, so this is strange. Does this attribute really exist?

PreAct ResNet Models NOT following paper nomenclature.

The PreAct ResNet models do NOT follow the paper's nomenclature.

https://arxiv.org/pdf/1603.05027.pdf

For example, there is no PreActResNet50... What exists is PreActResNet56...

It should have the format 9n+2...

Please see: https://github.com/szagoruyko/wide-residual-networks/blob/master/models/resnet-pre-act.lua
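The two naming schemes count depth differently. ImageNet-style names use four stages, for example [2, 2, 2, 2] basic blocks for 18 layers and [3, 4, 6, 3] bottleneck blocks for 50; the repository's names are assumed here to follow that convention. The CIFAR models in the papers use three equal stages instead: 6n+2 with basic blocks and 9n+2 with bottleneck blocks. The arithmetic:

```python
def imagenet_style_depth(blocks_per_stage, convs_per_block):
    """Stem conv + convs in all residual blocks + final linear layer."""
    return 1 + convs_per_block * sum(blocks_per_stage) + 1


def cifar_style_depth(n, convs_per_block):
    """Three stages of n blocks: 6n+2 (basic, 2 convs) or 9n+2 (bottleneck, 3 convs)."""
    return 1 + 3 * n * convs_per_block + 1


print(imagenet_style_depth([2, 2, 2, 2], 2))  # 18
print(imagenet_style_depth([3, 4, 6, 3], 3))  # 50
print(cifar_style_depth(9, 2))                # 56  (6n+2, n=9)
print(cifar_style_depth(6, 3))                # 56  (9n+2, n=6)
print(cifar_style_depth(18, 3))               # 164 (9n+2, n=18)
```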

Question regarding epoch vs. accuracy

Hi,

I have a simple question regarding the epoch vs. accuracy.

For example:
VGG16 | 92.64%
ResNet18 | 93.02%
ResNet50 | 93.62%

Suppose someone made his own network, trained it for 100,000 epochs, and it reached a maximum accuracy of 95% at epoch 100,000, with the accuracy declining after that.
And suppose ResNet18 reached its maximum of 93.02% at epoch 200 and declined after that.
Can we say his custom network is better than ResNet18?

Do we have to use the same number of epochs to compare the accuracy of two networks, or only the maximum accuracy (regardless of the number of epochs)?

Let's say that at epoch 200, ResNet18 has 93.02% and his custom network has around 80%. Is it possible to say ResNet18 is better?

Thanks,

preactivation resnet

Hi --

I was looking at the implementation of the preactivation ResNet and had a question -- in the paper, they show the preactivation block below (on the right):

which AFAICT should look like:

    def forward(self, x):
        shortcut = self.shortcut(x)
        out = self.bn1(x)
        out = F.relu(out)
        out = self.conv1(out)
        
        out = self.bn2(out)
        out = F.relu(out)
        out = self.conv2(out)
        
        out += shortcut
        return out

which is slightly different from your implementation:

    def forward(self, x):
        out = self.bn1(x)
        out = F.relu(out)
        shortcut = self.shortcut(out)
        out = self.conv1(out)
        
        out = self.bn2(out)
        out = F.relu(out)
        out = self.conv2(out)
        
        out += shortcut
        return out

In the picture, I think your implementation would look like shifting the first bn and relu into the grey band, then splitting into two branches.

Any thoughts? Am I misinterpreting, or is there a reason for the difference?
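The two orderings compute different functions whenever the shortcut sees a different tensor. The sketch writes both `forward` variants from this issue as standalone functions over the same layers, with a 1x1 projection shortcut and illustrative channel sizes, and checks that the outputs differ. It does not settle which ordering the paper intends; it only shows the difference is real.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
in_planes, planes = 16, 32  # illustrative sizes; they differ, so a projection is needed
bn1 = nn.BatchNorm2d(in_planes)
conv1 = nn.Conv2d(in_planes, planes, 3, padding=1, bias=False)
bn2 = nn.BatchNorm2d(planes)
conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
shortcut = nn.Conv2d(in_planes, planes, 1, bias=False)


def forward_as_drawn(x):
    """First snippet: shortcut taken from the raw input x."""
    sc = shortcut(x)
    out = conv1(F.relu(bn1(x)))
    out = conv2(F.relu(bn2(out)))
    return out + sc


def forward_as_implemented(x):
    """Second snippet: shortcut taken after the shared BN-ReLU."""
    out = F.relu(bn1(x))
    sc = shortcut(out)
    out = conv1(out)
    out = conv2(F.relu(bn2(out)))
    return out + sc


x = torch.randn(4, in_planes, 8, 8)
with torch.no_grad():
    a, b = forward_as_drawn(x), forward_as_implemented(x)
print(torch.allclose(a, b))  # False: the shortcut branches see different inputs
```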

Do you have any pre-trained model file?

Have you uploaded the pre-trained model files? Then I could save a lot of time training them. Thanks very much!

Why don't we have validation data during CIFAR training?

Please correct me if I'm wrong, but shouldn't we have a validation dataset while we train?
I am looking at CIFAR examples and I noticed that we don't have a validation dataset; we just have train and test.
Should we skip validation? Why?
Can you please explain it to me?

Thanks

Some questions on resnet performance

Hi,
I downloaded your code and tested variations of ResNet using the same settings,
but I can't seem to get anywhere near the reported accuracy (regardless of the model, all of my tests plateau at around 88~89%). Did you apply any additional measures, such as weight decay or lowering the learning rate?
Neat implementation, by the way.

License?

Could you please add a LICENSE file to this repo? What license are you making the code available under?

Many thanks!

Minor Typo

Hi, @kuangliu, nice work!

There is one closing bracket missing here: https://github.com/kuangliu/pytorch-cifar/blob/master/models/preact_resnet.py#L115

Cheers

Uses test data to select models during training

Based on my reading of this code, there is currently a split into only train and test datasets, and test-set accuracy is used to select the best model during training. This seems to give a rather overoptimistic estimate of the model's test performance (especially because of the 3 learning stages with different learning rates), since test accuracies were already used for model selection during training (the model is "cheating" by learning on the test data). In particular, for the VGG16 model I noticed the test accuracy jumps by around +10% between the end of the first training stage and the beginning of the second (i.e. between lr=0.1 and lr=0.01).

In theory, if you want to select models without this "cheating" effect, you can split the data into 3 subsets (train, validation, and test), use the validation set to pick the best model, evaluate only that final model on the test set, and never choose between models based on test-set accuracy. See this StackExchange post for details:

https://stats.stackexchange.com/questions/152907/how-do-you-use-test-data-set-after-cross-validation
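Carving a validation set out of the training data takes only a few lines with `torch.utils.data.random_split`. In this sketch a `TensorDataset` of tiny random tensors stands in for the 50,000-image CIFAR10 training set, and the 45,000/5,000 split is an illustrative choice; the official test set stays untouched until the final evaluation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for torchvision.datasets.CIFAR10(train=True): 50,000 labeled samples
# (tiny feature tensors to keep memory low).
full_train = TensorDataset(torch.randn(50_000, 1), torch.randint(0, 10, (50_000,)))

generator = torch.Generator().manual_seed(0)  # fixed seed -> same split every run
train_set, val_set = random_split(full_train, [45_000, 5_000], generator=generator)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=100, shuffle=False)
# Pick epochs and hyperparameters on val_loader; evaluate on the test set once.
```

With the real CIFAR10 dataset both subsets share the training transform, random augmentation included, so a clean validation pass needs a second dataset instance without augmentation indexed by `val_set.indices`.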

However, in practice it seems that for CIFAR10 the models simply need more regularization, using both batch normalization and dropout, and it suffices to always report the test accuracy at the latest epoch (so in practice there is no need for the three-way train/validation/test split). See e.g. the repo and paper below:

https://github.com/chengyangfu/pytorch-vgg-cifar10
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7486599

How come transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))?

Forgive my ignorance, I am wondering about the following code:
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
Why these numbers for (mean, std)? Are they computed for some mini-batch (and how large is the batch) or for the whole dataset?

Thanks!

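How is the 18 in SENet18 obtained?

Each block has 4 parameterized layers (two convolutions and two fully connected layers), and there are 8 blocks in total, so there should be 32 parameterized layers; adding the conv1 layer and the final linear layer gives 34 in total, right?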
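The count of 34 is right if the SE layers are included. The name most likely follows the convention of keeping the base network's depth: only the stem conv, the main-path convs and the final linear layer are counted, and the SE layers and shortcut projections are not, just as SE-ResNet-50 in the SENet paper keeps ResNet-50's name. The arithmetic, assuming 8 blocks with two 3x3 convs and a two-layer SE module each:

```python
NUM_BLOCKS = 8           # [2, 2, 2, 2] blocks, as in ResNet18
CONVS_PER_BLOCK = 2      # two 3x3 convolutions on the main path
SE_LAYERS_PER_BLOCK = 2  # squeeze (fc1) and excitation (fc2)

base_depth = 1 + NUM_BLOCKS * CONVS_PER_BLOCK + 1
with_se = 1 + NUM_BLOCKS * (CONVS_PER_BLOCK + SE_LAYERS_PER_BLOCK) + 1
print(base_depth, with_se)  # 18 34
```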

LeNet performance request

Firstly, thank you for a comprehensive evaluation of all these models on CIFAR using PyTorch.

I noticed you have a LeNet network defined in your models, which is more-or-less identical to the one used in the PyTorch CIFAR example. Do you have the performance of this model on CIFAR-10?

It is a simple baseline, and I want to make sure my numbers are matching (getting only 65% test accuracy, which seems a bit suspicious).

Correct Normalization Values for CIFAR-10

Correct normalization values for CIFAR-10: (0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261)

Progress_bar value unpack error

    from utils import progress_bar
  File "C:\Users\Debadri\pytorch-cifar\utils.py", line 45, in <module>
    _, term_width = os.popen('stty size', 'r').read().split()
ValueError: not enough values to unpack (expected 2, got 0)

I'm getting this error when I run this command:
python main.py

Is there any argument I need to pass?
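No argument should be needed. The path shows Windows, where `stty` is not available in a standard console, so `os.popen('stty size')` returns an empty string and unpacking `''.split()` into two names fails. A portable replacement (a sketch, not the repository's code) is `shutil.get_terminal_size`, which falls back to a default width when no terminal is attached:

```python
import shutil


def terminal_width(default=80):
    """Terminal width in columns; uses `default` when no terminal is attached."""
    return shutil.get_terminal_size(fallback=(default, 24)).columns


# What the original line does when `stty size` prints nothing:
try:
    _, width = ''.split()  # os.popen('stty size', 'r').read() == '' on Windows
except ValueError as exc:
    unpack_error = str(exc)
    print(unpack_error)

term_width = terminal_width()
```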

Pre-trained resnet50-101-152 models on Cifar 10 or 100

Hi Kuangliu,

Do you have pre-trained ResNet (50, 101, 152) model files (weights and biases) for CIFAR10 or CIFAR100? I want to make some modifications and train on these datasets.
Thanks.

What does the accuracy refer to?

What does the accuracy refer to:
the train score or the test score?

I have tested with ResNet34. The test accuracy isn't anywhere near the reported one.

Can someone please clarify ?

VGG16 Model and Performance on CIFAR-10

I ran the VGG16 model on CIFAR-10, but the test accuracy is only 89%.
I set epochs = 350 and changed the learning rate automatically as you described.
I also want to know why you set classifier = nn.Linear(512, 10). How is the number 512 determined?
Thanks!
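The 512 follows from the layer configuration and the input size. The list below is the standard VGG16 layout, written out here rather than copied from `vgg.py`: 13 conv layers ending in 512 channels and five 2x2 max-pools. Under that assumption, a 32x32 image is halved five times to 1x1, so the flattened feature vector has 512 * 1 * 1 = 512 entries:

```python
# VGG16 layout: numbers are conv output channels, 'M' is a 2x2 max-pool (stride 2).
VGG16 = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']


def flattened_features(cfg, input_size=32, in_channels=3):
    """Size of the flattened conv output, assuming 3x3 convs with padding 1."""
    size, channels = input_size, in_channels
    for layer in cfg:
        if layer == 'M':
            size //= 2        # each max-pool halves height and width
        else:
            channels = layer  # padded 3x3 convs keep the spatial size
    return channels * size * size


print(flattened_features(VGG16))                  # 512 for 32x32 CIFAR10 images
print(flattened_features(VGG16, input_size=224))  # 25088 = 512 * 7 * 7 on ImageNet
```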

Why is bias=False in every nn.Conv2d?

As the title says, I don't understand why the conv bias is set to False in all of the CNNs, such as ResNet, SENet...
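A common reason: each of these convolutions feeds straight into BatchNorm, which subtracts the per-channel batch mean, so a constant per-channel conv bias is cancelled, and BatchNorm's own learnable shift takes over its role. A quick check in training mode (batch statistics), with illustrative sizes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv_bias = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=True)
conv_nobias = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    conv_nobias.weight.copy_(conv_bias.weight)  # identical weights, only the bias differs
    conv_bias.bias.uniform_(-5.0, 5.0)          # deliberately large bias

bn = nn.BatchNorm2d(8)  # training mode: normalizes with the current batch statistics
x = torch.randn(16, 3, 16, 16)
with torch.no_grad():
    y_bias = bn(conv_bias(x))
    y_nobias = bn(conv_nobias(x))

print(torch.allclose(y_bias, y_nobias, atol=1e-4))  # True: the bias is subtracted out
```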

Why do you crop the data?

While downloading and transforming the dataset, I saw this transform:
transforms.RandomCrop(32, padding=4)
But the original images are already 32x32, so why do we need to crop?
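With `padding=4`, `RandomCrop` first pads the 32x32 image to 40x40 (with zeros by default) and then cuts out a random 32x32 window. The effect is a random shift of up to 4 pixels in each direction, not a reduction in size. A NumPy sketch of the same idea, using HWC layout and zero padding; it is not torchvision's actual implementation:

```python
import numpy as np


def random_crop_with_padding(img, size=32, padding=4, rng=None, offset=None):
    """Zero-pad an HxWxC image by `padding` pixels, then crop a size x size window."""
    padded = np.pad(img, ((padding, padding), (padding, padding), (0, 0)))
    max_offset = padded.shape[0] - size  # offsets 0..8 for 32 + 2*4 (square images)
    if offset is None:
        rng = rng or np.random.default_rng()
        offset = (rng.integers(0, max_offset + 1), rng.integers(0, max_offset + 1))
    top, left = offset
    return padded[top:top + size, left:left + size]


img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3) + 1  # no zeros in the image
centered = random_crop_with_padding(img, offset=(4, 4))  # reproduces the original
shifted = random_crop_with_padding(img, offset=(0, 0))   # content moved down-right by 4 pixels
random_view = random_crop_with_padding(img, rng=np.random.default_rng(0))
```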