jiecaoyu / xnor-net-pytorch

PyTorch Implementation of XNOR-Net

Python 92.72% TeX 7.26% Shell 0.03%

xnor-net-pytorch's Issues

About the dataset

You reimplemented the torchvision.datasets loader in your data.dataset. Why not use torchvision.datasets.CIFAR10 directly? What are 'train_data' and 'train_labels', and where do I get them? My CIFAR10 dataset does not look like that.

XNOR-net for mnist

Hi, first of all, thanks for your great work! After reading your PyTorch code, I see that you did not implement XNOR-Net for MNIST, only BWN. Is that right?

what is your version of pytorch

I just got the error "ImportError: No module named distributed" when I ran import torch.utils.data.distributed

Question about mul(1.0-1.0/s[1]).mul(n)

Hi, when I read the last line of the function 'updateBinaryGradWeight', I was confused by 'mul(1.0-1.0/s[1]).mul(n)'. I don't understand where this coefficient comes from, because the gradient computation has already been done at that point. Could you tell me the reason? Thanks.

self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)

Can we modify the code to train for binary weighted network and not XNOR?

Hey @jiecaoyu , first of all appreciate the work. Great job! 👍
When I check the code I only saw binarization of convolution parameters (weights). Is it XNOR or BWN?
Please let me know what I might be missing here.
If it binarizes the input also, can you point to where it does, so that I can edit and try training BWN?

Edit:
Hey @jiecaoyu , got to know where you binarize the input to the layer! It is using the function
BinActive()
Now if I just remove the function, will it be BWN or is there something else I need to do?
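
A minimal sketch of the distinction asked about here, under stated assumptions: `SignSTE` and `BinBlock` are hypothetical names, not the repository's classes. With `binarize_input=False` the layer input stays in floating point (BWN-style, with weight binarization handled elsewhere); with `True` the input is also passed through a sign function (XNOR-style). Whether removing `BinActive()` alone is sufficient in the actual repository is not confirmed in this thread.

```python
import torch
import torch.nn as nn


class SignSTE(torch.autograd.Function):
    """sign() forward, straight-through gradient clipped to |x| < 1 (illustrative)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.sign()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[x.abs() >= 1] = 0  # no gradient where the input saturates
        return grad_input


class BinBlock(nn.Module):
    """Hypothetical block: BatchNorm -> optional input binarization -> Conv."""

    def __init__(self, in_ch, out_ch, binarize_input=True):
        super().__init__()
        self.binarize_input = binarize_input
        self.bn = nn.BatchNorm2d(in_ch)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.bn(x)
        if self.binarize_input:  # XNOR-style: activations become -1/0/+1
            x = SignSTE.apply(x)
        return self.conv(x)
```

Constructing `BinBlock(20, 50, binarize_input=False)` would give the floating-point-input variant of the same block.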

Scaling binarized activations.

Hi, I could not find where you scale the binarized input activations by their L1 norm. In the BinActive function you just call the sign function, and its output is used directly in the forward pass. Could you please point me to where the scaling happens?
If you felt it wasn't necessary, could you please explain why?

Thanks!

The hyperparameters for training AlexNet on ImageNet

Thanks for this useful implementation.
Do you mind sharing the working hyperparameters (e.g., optimizer, learning rate schedule, momentum, etc.) for training AlexNet on ImageNet? I used the default values with Adam. After training the model for several epochs, the validation accuracy is still 0.

Question about 1e+9

Hi,

I found that in some networks the gradients are multiplied by 1e+9, and in others they are not.
Do you have any idea why?
I assume the reason is tiny gradients in the earlier layers. However, I don't understand how the value 1e+9 was chosen, and as it stands all layers are multiplied by this value. Do you have any intuition about what is actually happening and how this number is obtained?

Best,
Fayez
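
One possible reading, not confirmed anywhere in this thread: if the optimizer is Adam (which another issue on this page reports using), the very first update has magnitude lr * |g| / (|g| + eps). Gradients far below eps (1e-8 by default in PyTorch's Adam) then produce almost no movement, and a large constant factor would counteract that. The arithmetic below only illustrates this effect; the gradient magnitude 1e-10 is an invented example value.

```python
def adam_first_step(grad, lr=1e-3, eps=1e-8):
    """Size of Adam's first update for a scalar gradient.

    After bias correction the first-step moments are m_hat = grad and
    v_hat = grad**2, so the update is lr * grad / (|grad| + eps).
    """
    return lr * grad / (abs(grad) + eps)


tiny = 1e-10                      # hypothetical gradient magnitude
step_raw = adam_first_step(tiny)
step_scaled = adam_first_step(tiny * 1e9)

print(step_raw / 1e-3)     # fraction of lr actually applied without scaling
print(step_scaled / 1e-3)  # fraction of lr applied after scaling by 1e+9
```

Under these assumptions the unscaled gradient moves the weight by about 1% of the learning rate, while the scaled one moves it by almost the full learning rate.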

Questions about gradient propagation in the code (I am a beginner)

I am a novice and I am confused about gradient propagation in the training part of this code.
So I have some questions, which may be "stupid".

Can you tell me how the gradient propagation works? I don't understand the meaning of:
# restore weights
bin_op.restore()
bin_op.updateBinaryGradWeight()
It seems that after backward we update the weights again? But why is restore needed?
The original code is shown below.

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()

        # process the weights including binarization
        bin_op.binarization()

        output = model(data)
        loss = criterion(output, target)
    
        loss.backward()

        # restore weights
        bin_op.restore()
        bin_op.updateBinaryGradWeight()

        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
    return
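
The ordering in `train` can be read as: keep a full-precision copy, overwrite the weights with their binarized version for the forward and backward pass, then put the full-precision values back so that `optimizer.step()` updates them. The sketch below reproduces only that pattern. `TinyBinOp` is a hypothetical stand-in, and it omits the gradient correction done by `updateBinaryGradWeight()`.

```python
import torch
import torch.nn as nn


class TinyBinOp:
    """Illustrative save / binarize / restore helper for one weight tensor."""

    def __init__(self, param):
        self.param = param
        self.saved = param.data.clone()

    def binarization(self):
        self.saved.copy_(self.param.data)            # keep full-precision copy
        w = self.param.data
        alpha = w.abs().mean(dim=1, keepdim=True)    # per-output-row scale
        self.param.data.copy_(w.sign() * alpha)      # forward sees binary weights

    def restore(self):
        self.param.data.copy_(self.saved)            # back to full precision


torch.manual_seed(0)
layer = nn.Linear(8, 4)
bin_op = TinyBinOp(layer.weight)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

before = layer.weight.data.clone()
optimizer.zero_grad()
bin_op.binarization()
loss = layer(torch.randn(16, 8)).pow(2).mean()
loss.backward()            # gradient is computed with the binarized weights
bin_op.restore()           # full-precision weights return before the update
optimizer.step()           # the update is applied to the full-precision values
```

Without the `restore()` call, the optimizer would update the binarized values and the full-precision information would be lost after the first step.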

My other question concerns the code below. I do not understand these two parts:
m.weight.data.zero_().add_(1.0) and m.weight.data.clamp_(min=0.01)
It seems the weight of a BatchNorm1d (or 2d) layer needs some special handling during initialization and in forward?


class LeNet_5(nn.Module):
    def __init__(self):
        super(LeNet_5, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=5, stride=1)
        self.bn_conv1 = nn.BatchNorm2d(20, eps=1e-4, momentum=0.1, affine=False)
        self.relu_conv1 = nn.ReLU(inplace=True)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bin_conv2 = BinConv2d(20, 50, kernel_size=5, stride=1, padding=0)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bin_ip1 = BinConv2d(50*4*4, 500, Linear=True,
                previous_conv=True, size=4*4)
        self.ip2 = nn.Linear(500, 10)

        for m in self.modules():
            if isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d):
                if hasattr(m.weight, 'data'):
                    m.weight.data.zero_().add_(1.0)
        return

    def forward(self, x):
        for m in self.modules():
            if isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d):
                if hasattr(m.weight, 'data'):
                    m.weight.data.clamp_(min=0.01)
        x = self.conv1(x)
        x = self.bn_conv1(x)
        x = self.relu_conv1(x)
        x = self.pool1(x)
        x = self.bin_conv2(x)
        x = self.pool2(x)

        # x = x.view(x.size(0), 50*4*4)

        x = self.bin_ip1(x)
        x = self.ip2(x)
        return x

Sincerely
THANK YOU VERY MUCH.
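
As a hedged illustration of what those two in-place calls do on their own: the first sets every BatchNorm scale (`weight`) to 1, and the second keeps each scale at or above 0.01. Note that a layer created with `affine=False`, like `bn_conv1` in the snippet above, has `weight` set to `None`, so the `hasattr(m.weight, 'data')` guard skips it. The values written into `bn.weight` below are made up for the demonstration.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(4, affine=True)

# Initialisation step: every scale factor becomes exactly 1.0.
bn.weight.data.zero_().add_(1.0)
after_init = bn.weight.data.clone()

# Pretend training pushed some scales to small or negative values.
bn.weight.data.copy_(torch.tensor([1.0, 0.005, -0.3, 0.5]))
bn.weight.data.clamp_(min=0.01)     # scales below 0.01 are raised to 0.01
after_clamp = bn.weight.data.clone()

# With affine=False there is no learnable scale, so the guard is False.
bn_plain = nn.BatchNorm2d(4, affine=False)
guard = hasattr(bn_plain.weight, 'data')
```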

Why shuffle=False for train_loader?

Normally, shuffle should be set to True for train_loader. Why do you set it to False instead? And why use the Caffe normalization rather than the default PyTorch transformation? Does it have a big influence on the final performance?

A typo in README

The second term of the correct backward gradient in Notes should be [equation not preserved].
Also, since [equation not preserved], why not further simplify the equations? Did I miss something?

Documentation

First of all, thank you for your effort in implementing XNOR-Net. I have just one suggestion: it would be amazing if you could write documentation for your code and explain the intuition behind binarization in a simpler way than the original paper. :)

If I have the ImageNet data from Caffe, how can I use it with PyTorch?

When I run the AlexNet code, I get the following error. My directories are ilsvrc12_train_lmdb and ilsvrc12_val_lmdb, and each contains a data.mdb and a lock.mdb. Can I convert them for use with PyTorch?

Traceback (most recent call last):
  File "main.py", line 27, in <module>
    import datasets as datasets
  File "/home/yy/XNOR-Net-PyTorch-master/ImageNet/networks/../datasets/__init__.py", line 1, in <module>
    from .folder import ImageFolder
  File "/home/yy/XNOR-Net-PyTorch-master/ImageNet/networks/../datasets/folder.py", line 6, in <module>
    import lmdb
ImportError: No module named lmdb

Little confused with BinActive function

As I understand BWN, in the forward pass we only need to compute alpha and a binary filter B from its float counterpart, as in the XNOR-Net paper, and we do not binarize a filter's output, which is also the input of the next layer. So why do we need the BinActive function? After reading your code, I think your MNIST implementation is really a BWN version rather than an XNOR version. Or maybe I am missing some details.
Another question: the datatype used to represent 0/1 is still float, which cannot reduce memory or time during training. I am very confused.
Thanks for your help.

What is the point of this implementation?

Is it just a proof of concept for the accuracy?
It's not even using bitwise ops to get the 58x speedup and 32x less memory...
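
For context, the speedup in the XNOR-Net paper relies on replacing multiply-accumulate with bitwise operations on packed bits. A minimal pure-Python sketch of that idea follows. It is not code from this repository, and `pack_signs` and `xnor_dot` are hypothetical helper names. For two vectors in {-1, +1}^n encoded one bit per element, the dot product equals n minus twice the number of differing bits.

```python
import random


def pack_signs(values):
    """Pack a list of -1/+1 values into one integer, bit i set when values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1,+1} vectors of length n."""
    differing = bin(a_bits ^ b_bits).count("1")   # popcount of XOR
    return n - 2 * differing                      # matches minus mismatches


random.seed(0)
n = 64
a = [random.choice((-1, 1)) for _ in range(n)]
b = [random.choice((-1, 1)) for _ in range(n)]

reference = sum(x * y for x, y in zip(a, b))
fast = xnor_dot(pack_signs(a), pack_signs(b), n)
```

A training-oriented implementation that keeps weights and activations as float tensors computes the same values as `reference` but does not use this packed representation.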

LeNet arch in models/LeNet_5.py

Hi, the number of output channels in conv1 for LeNet-5 in models/LeNet_5.py is 20, but I think conv1 in the original LeNet-5 has 6.
self.conv1 = nn.Conv2d(1, 20, kernel_size=5, stride=1)

So why is the number of channels in conv1 20?

Can you give me some advice?

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

Hi,

I have run into a rather strange error.
When running your project I get the following error while attempting to read the data:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte
This gets triggered here: https://github.com/jiecaoyu/XNOR-Net-PyTorch/blob/master/CIFAR_10/main.py#L118

The weird thing is that when I tried to load the data manually through ipython, I got the same error with python3, but it worked fine with python2.

Any suggestions? Is the dataset perhaps encoded with a Chinese character set?
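
One plausible cause, offered as an assumption rather than a diagnosis: byte 0x93 is the first byte of NumPy's `.npy` file format, so this is the error Python 3 raises when such a file is opened in text mode (`'r'`) instead of binary mode (`'rb'`). Python 2 does not decode on read, which would explain why it works there. The sketch below reproduces the effect with a temporary file; the file name is invented.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "train_data.npy")  # hypothetical file
np.save(path, np.arange(6, dtype=np.uint8).reshape(2, 3))

with open(path, "rb") as f:
    first_byte = f.read(1)          # b'\x93', start of the .npy magic string

# Text mode tries to decode the bytes as UTF-8 and fails on 0x93.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read(1)
    text_mode_error = None
except UnicodeDecodeError as exc:
    text_mode_error = exc

# Binary mode (or passing the path directly) loads the array.
with open(path, "rb") as f:
    data = np.load(f)
```

If the loader in question opens the data file with `open(path, 'r')`, switching to `'rb'` would be the corresponding change; this has not been verified against the repository.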

AutoGrad (gradcheck) on BinActive

class BinActive(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        size = input.size()
        input = input.sign()
        mean = abs(input).mean()
        return mean*input
    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input.ge(1)] = 0
        grad_input[input.le(-1)] = 0
        p2_f = grad_input.abs().mean()*grad_input
        p1_f = grad_input.sign()*(1/grad_input.nelement())
        grad_input = p1_f + p2_f
        return grad_input

x = torch.randn(52, 24, requires_grad=True)
binactive = BinActive.apply
res = torch.autograd.gradcheck(binactive, (x,), atol=1e-2)

After running the code above, I get res as True.
However, the BinActive code in this implementation does not pass gradcheck.
Am I correct in saying that the BinActive backward propagation in this implementation is an approximation, and that this is why it does not pass gradcheck?
I tried to re-derive the backward propagation from your notes and code the backward method myself.
This is not specifically an issue, but a general question.

Would appreciate your comments on this matter!

P.S. I understand that you haven't multiplied the mean in the forward, but I just wanted to get it working with gradcheck for my own satisfaction.

Thanks.
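
A small numeric illustration of why a straight-through style backward pass and `gradcheck` tend to disagree (a sketch, not a statement about either implementation above): `gradcheck` compares against finite differences, and the finite-difference slope of `sign` is zero everywhere except at the jump, whereas a clipped straight-through estimator reports 1 for |x| < 1. The function names are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)


def numeric_grad(f, x, h=1e-6):
    """Central finite difference, the kind of estimate gradcheck compares against."""
    return (f(x + h) - f(x - h)) / (2 * h)


def ste_grad(x):
    """Clipped straight-through estimate: pass gradient through where |x| < 1."""
    return 1.0 if abs(x) < 1 else 0.0


points = [-1.5, -0.5, 0.5, 1.5]
numeric = [numeric_grad(sign, x) for x in points]
surrogate = [ste_grad(x) for x in points]
```

At -0.5 and 0.5 the two estimates differ (0 versus 1), which is the kind of mismatch a numerical gradient check would flag.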

ImportError: No module named distributed

When I use "import torch.utils.data.distributed", I get the error "No module named distributed".
What is your version of PyTorch?

updateBinaryGradWeight-factor

Thanks for your codes.
I have a quick question about line 82 of 'CIFAR_10/util.py'. Why do we need 'mul(1.0-1.0/s[1]).mul(n)'?

xnor on tensorflow

Thanks for sharing. I found some bugs in cifar-10/main.py.
Line 23: in Python 3, keys() is lazy and raises an exception if you change the keys while iterating. Wrapping it in list() fixes this.
Line 123: the test batch size should be 128 instead of 100, as it corresponds to the number in Line 78.
Line 165: +1 tab?

I also implemented your code in TensorFlow, but it did not work at first. After I reduced the lr to 1e-4, it reached 80.9% accuracy.
I ran your code and got a model with 86.17% accuracy. I then loaded that model in my code and got 86.2% accuracy. Amazing!!!
I found the problem: the BN after the third binconv has a small computational error, which can only be detected by summing the BN output, and it is amplified by the layers after that BN.
Maybe it is a CUDA or cuDNN problem. T_T
My code is at https://github.com/ljhandlwt/xnor-net-tf
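
Regarding the Line 23 remark in this issue, here is a generic illustration of the Python 3 behaviour (the dictionary is a toy stand-in, and the loop is not the repository's code): adding keys while iterating over the live `keys()` view raises `RuntimeError`, whereas iterating over a `list()` snapshot is safe.

```python
state = {"module.conv.weight": 1, "module.conv.bias": 2}   # toy stand-in

# Adding keys while iterating over the live view fails in Python 3.
try:
    for key in state.keys():
        state[key.replace("module.", "")] = state[key]
    failed = False
except RuntimeError:
    failed = True

# Iterating over a snapshot of the keys is safe, even when renaming.
state = {"module.conv.weight": 1, "module.conv.bias": 2}
for key in list(state.keys()):
    state[key.replace("module.", "")] = state.pop(key)
```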

Not quite get updateBinaryGradWeight()

Thanks so much for sharing. I am new to deep learning, and I learned a lot from your code.
I have a question. I more or less understand that
m = weight.norm(1, 3, keepdim=True).sum(2, keepdim=True).sum(1, keepdim=True).div(n).expand(s)
is used to calculate

In the next step, why do you use m * self.target_modules[index].grad.data?

Finally, how do you get the new weight grad data?
self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)

Could you explain more? for example which part is the
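
A hedged sketch of one way to read these terms, derived from the paper's definition W_tilde = alpha * sign(W) with alpha = ||W||_1 / n per filter rather than from the repository's notes (which are not reproduced in this thread). By the chain rule, the gradient for element i is alpha * g_i * 1[|W_i| < 1] + sign(W_i) * (1/n) * sum_j g_j * sign(W_j), where g is the gradient with respect to the binarized weights. The code checks that closed form against autograd using a straight-through sign. It deliberately leaves out the extra `mul(1.0-1.0/s[1]).mul(n)` factor, which this sketch does not attempt to explain.

```python
import torch

torch.manual_seed(0)
W = torch.randn(4, 3, 5, 5, requires_grad=True)   # (out, in, kh, kw), toy sizes
g = torch.randn(4, 3, 5, 5)                       # stand-in for dLoss/dW_tilde
n = W[0].nelement()

# Autograd route: sign() forward, clipped identity backward (straight-through).
clipped = W.clamp(-1.0, 1.0)
sign_ste = W.sign().detach() + clipped - clipped.detach()
alpha = W.abs().sum(dim=(1, 2, 3), keepdim=True) / n
W_tilde = alpha * sign_ste
(W_tilde * g).sum().backward()

# Closed-form route, term by term.
with torch.no_grad():
    inside = (W.abs() < 1.0).float()
    term_scale = alpha * g * inside
    term_alpha = W.sign() * (g * W.sign()).sum(dim=(1, 2, 3), keepdim=True) / n
    manual = term_scale + term_alpha
```

In this reading, `term_scale` plays the role of `m` multiplied by the incoming gradient, and `term_alpha` is the contribution that flows through the scaling factor alpha.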

TypeError: clamp() got an unexpected keyword argument 'out'

When I run this code, the following problem occurs. How can I solve it?
My Python version is 2.7. Thank you!

cd <Repository Root>/MNIST/
python main.py

File "/pytorch_test/XNOR-Net-PyTorch/MNIST/util.py", line 46, in clampConvParams
out = self.target_modules[index].data)
TypeError: clamp() got an unexpected keyword argument 'out'
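
A possible workaround, assuming the failing line is meant to clamp the weights to [-1, 1] in place (only the tail of that line appears in the traceback above, so this is a guess at its intent): use the in-place `clamp_` method, which does not need an `out` argument.

```python
import torch

weight = torch.tensor([-2.5, -0.3, 0.0, 0.7, 1.8])

# In-place clamp: values outside [-1, 1] are moved to the nearest bound.
weight.clamp_(-1.0, 1.0)
```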

details in L1 norm

In line 57 of util.py in MNIST, you take the L1 norm along one dimension, sum over the other dimensions, and then divide by n. This does not seem consistent with the paper, which treats W as one c×h×w vector and takes the L1 norm of the whole vector. I am curious why you do it the former way. Is it for performance or implementation reasons?

Thanks, real great work
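
A quick numerical check suggests the two formulations agree: taking the L1 norm along the last dimension and then summing over the remaining two is the same as taking the L1 norm of the whole flattened c×h×w filter, because both simply add up all absolute values. The tensor sizes below are arbitrary, and `n` is assumed to be the number of elements in one filter.

```python
import torch

torch.manual_seed(0)
weight = torch.randn(50, 20, 5, 5)     # (out_channels, c, h, w), arbitrary sizes
n = weight[0].nelement()               # assumed: elements per filter, c*h*w

# Step-wise form described in the question.
stepwise = weight.norm(1, 3, keepdim=True) \
    .sum(2, keepdim=True).sum(1, keepdim=True).div(n)

# Paper-style: L1 norm of each filter viewed as one c*h*w vector.
flat = weight.view(weight.size(0), -1).abs().sum(dim=1).div(n)
```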

train on custom data

Hey,

I was wondering if it is possible to train XNOR-Net-PyTorch on custom data. I would appreciate your help.

Best regards

TypeError: expected a readable buffer object

File "pytorch-master/XNOR-Net-PyTorch-master/ImageNet/networks/../datasets/folder.py", line 125, in __getitem__
    datum.ParseFromString(value)

What causes this error?
P.S. I use the ImageNet dataset from the Caffe ilsvrc12_train_lmdb and ilsvrc12_val_lmdb.

./XNOR-Net-PyTorch/ImageNet/networks/main.py line 165

In python 3
line 165: print model
should be:
line 165: print (model)
the same as line 329

Will I get ~32x speedup on your XNOR implementation?

Sorry for a potentially stupid question, but I could not find an explicit answer to this vital question.

L1 conv issues

Hey @jiecaoyu ,
Thanks for your great work and sharing it.
I have a question about the LeNet_5 implementation. At line 12 in your util.py code, you set start_range = 1 and end_range = count_targets-2. If my understanding is right, this means that the second conv layer (size: [20, 50, 5, 5]) and the first fc layer (size: [500, 800]) are binarized. When I set end_range = count_targets-1, I can binarize the second fc layer, and the accuracy degrades only a little (<1%). However, when I set start_range = 0, which means the first conv layer is binarized, the accuracy stays low.
Is it possible to binarize the first conv layer using your code? If so, could you tell me how? Thanks again for sharing.

Question about the LeNet_5 used in the MNIST training

Hello, I wonder if this network implementation is correct, since the block structure for the XNOR-Net in the original paper is like BNorm -> BinActive -> BinConv -> Pool. However, I couldn't find such sequences.

Thanks,
S.

Why the gradient in backward after the scaled sign function looks unlike the gradient in the paper?

Is there any trick here?

About the BinOp&Training

BinOp helps to binarize the weights and stores them in self.saved_params. But the weights in the model remain the same, so 'output = model(data)' during training still uses the floating-point weights. So where do we perform the bitcount or use the binary weights? @jiecaoyu I would really appreciate your help!
optimizer.zero_grad()
bin_op.binarization()
output = model(data)
loss = criterion(output, target)
loss.backward()
bin_op.restore()
bin_op.updateBinaryGradWeight()
optimizer.step()

NIN structure cifar10 data

Hi, I am wondering how I can convert the normal Python CIFAR-10 dataset into the train_data and label format used here (for the NIN structure?). If you have code for this, please send it to me.

Is this the implementation of BNN or XNOR ?

From your README, the accuracy looks similar to XNOR-Net.
In the XNOR forward path, the conv should be $(sign(I)\circledast sign(W)) \odot K \alpha$
But in your code I cannot find the calculation of $K$, and the input to the conv layer is a binary tensor.
Am I misunderstanding something?
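
For reference, a sketch of how $K$ could be computed following the paper's description (A is the channel-wise mean of |I|, and K is A convolved with a uniform w×h kernel). This is an illustration written for this thread, not code from the repository; `input_scale_map` is a hypothetical name, and stride 1 with no padding is assumed.

```python
import torch
import torch.nn.functional as F


def input_scale_map(x, kernel_size):
    """K = A * k, with A = mean_c |x| and k a uniform kernel summing to 1."""
    kh, kw = kernel_size
    A = x.abs().mean(dim=1, keepdim=True)              # (N, 1, H, W)
    k = torch.full((1, 1, kh, kw), 1.0 / (kh * kw))    # uniform averaging kernel
    return F.conv2d(A, k)                              # (N, 1, H-kh+1, W-kw+1)


torch.manual_seed(0)
x = torch.randn(2, 16, 8, 8)
K = input_scale_map(x, (3, 3))
```

The resulting map has one scale per output position and would be multiplied element-wise with the output of the binary convolution.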

Scale gradient of ImageNet by 1e+9

XNOR-Net-PyTorch/ImageNet/networks/util.py

Line 89 in 26cca8c

self.target_modules[index].grad.data = self.target_modules[index].grad.data.mul(1e+9)

I notice that CIFAR-10 does not have this line. Is there any difference between ImageNet and CIFAR here? Could you please give some hints as to why this line is needed?

what's the difference between your dataset and the offical(CIFAR)

As I didn't see you use Normalize in your code, can you explain the details?

Pre-trained model cannot reproduce the expected accuracy.

Hi!
I tested the pre-trained model on ImageNet but only got 4% Top-1 and 12% Top-5 accuracy, which is far from the reported accuracy.
I think the gap may partly be caused by different image pre-processing procedures.
Here is my transform:

normalize = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

train_transform = Compose([
    RandomResizedCrop(224),
    RandomHorizontalFlip(),
    ToTensor(),
    normalize,
])

val_transform = Compose([
    Resize(256),
    CenterCrop(224),
    ToTensor(),
    normalize,
])

But I think the difference in pre-processing alone is not enough to cause such a huge gap.
Do you have any ideas?
Thank you!
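
One thing that can be checked with simple arithmetic is the input scale. With the `Normalize` constants above, inputs fall roughly in [-2.1, 2.6], whereas Caffe-style preprocessing (mean subtraction on 0 to 255 pixel values, as mentioned in other issues on this page) gives values roughly 50 times larger. Whether the pre-trained model expects the latter is an assumption here, and the mean pixel value 120 is a made-up placeholder.

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# torchvision-style: ToTensor() maps pixels to [0, 1], then (x - mean) / std.
tv_min = min((0.0 - m) / s for m, s in zip(mean, std))
tv_max = max((1.0 - m) / s for m, s in zip(mean, std))

# Caffe-style (assumed): raw 0..255 pixels minus a mean image.
pixel_mean = 120.0                      # hypothetical mean pixel value
caffe_min = 0.0 - pixel_mean
caffe_max = 255.0 - pixel_mean

print(round(tv_min, 2), round(tv_max, 2))
print(caffe_min, caffe_max)
```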

Why not binarize the parameters of the first and the last conv layer?

XNOR-Net-PyTorch/CIFAR_10/util.py

Lines 12 to 28 in 26cca8c

        start_range = 1
        end_range = count_Conv2d-2
        self.bin_range = numpy.linspace(start_range,
                end_range, end_range-start_range+1)\
                        .astype('int').tolist()
        self.num_of_params = len(self.bin_range)
        self.saved_params = []
        self.target_params = []
        self.target_modules = []
        index = -1
        for m in model.modules():
            if isinstance(m, nn.Conv2d):
                index = index + 1
                if index in self.bin_range:
                    tmp = m.weight.data.clone()
                    self.saved_params.append(tmp)
                    self.target_modules.append(m.weight)
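
To make the effect of `start_range` and `end_range` concrete, the sketch below evaluates the same `linspace` expression for a hypothetical network with 9 `Conv2d` layers (the count is an assumption for illustration). The result is every index except the first and the last, and it matches a plain `range`.

```python
import numpy

count_Conv2d = 9                 # hypothetical number of Conv2d layers
start_range = 1
end_range = count_Conv2d - 2

bin_range = numpy.linspace(start_range, end_range,
                           end_range - start_range + 1).astype('int').tolist()

# Equivalent and arguably clearer form.
simple_range = list(range(start_range, end_range + 1))
```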

LeNet-5 - MNIST structure is unlike the paper?

Hello,
I am confused, is the LeNet-5 implementation the same as those available online?
https://cdn-images-1.medium.com/max/800/1*yXjgC7PFTxb-Oi_L_hoFXA.png
This is different from the LeNet-5 seen in your repository.
Kindly clarify.

P.S. By 'the paper', I refer to this http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
Thanks!

m[weight.lt(-1.0)] = 0 and m[weight.gt(1.0)] = 0 in updateBinaryGradWeight

Thanks for your code, but I feel this statement does not seem to correspond to the formula in your note.

Bias terms are not binarized

It seems that bias terms in Conv2d and Linear layers are not binarized.

XNOR for ResNet

I was wondering whether there is XNOR for ResNet. Or can you give some hints about how to implement it?

No speedup and memory saving on CIFAR10

I have played around with CIFAR10 and also done a bit benchmark. It seems BinOp does not have noticeable effect on model size and inference speed compared to NIN model without BinOp. I have tested both on CPU and GPU. I thought the saved model nin.pth.tar would shrink, and the inference would speed up significantly. Do I miss something? Does anyone have this issue? Thanks.
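
A likely explanation, stated as an assumption about any implementation that keeps binarized weights in ordinary float tensors: each weight still occupies 32 bits, so neither the checkpoint size nor the arithmetic changes. Storage savings only appear once the signs are packed into bits, as in this NumPy sketch (array size chosen arbitrarily).

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)

signs = weights >= 0                      # True for +1, False for -1
packed = np.packbits(signs)               # 8 signs per byte

# Recover the +/-1 values from the packed representation.
restored = np.unpackbits(packed)[:weights.size].astype(np.float32) * 2 - 1
```

Here `weights.nbytes` is 32 times `packed.nbytes`, which is where the often-quoted memory figure comes from; a per-filter scaling factor would add a small overhead on top.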

you did not use the mean

Cannot load alexnet.baseline.pth.tar

I use my own code to load the 'alexnet.baseline.pth.tar' file.
However, the keys in the file are not consistent with those in the model, as shown in the picture.

And the model's code is

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.num_classes = num_classes
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.BatchNorm2d(96, eps=1e-4, momentum=0.1, affine=True),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(96, 256, kernel_size=5, stride=1, padding=2, groups=1),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(256, 384, kernel_size=3, stride=1, padding=1),
            BinConv2d(384, 384, kernel_size=3, stride=1, padding=1, groups=1),
            BinConv2d(384, 256, kernel_size=3, stride=1, padding=1, groups=1),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            BinConv2d(256 * 6 * 6, 4096, Linear=True),
            BinConv2d(4096, 4096, dropout=0.5, Linear=True),
            nn.BatchNorm1d(4096, eps=1e-3, momentum=0.1, affine=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

Shouldn't the key be 'features.0.weight' rather than 'features.module.0.weight'?
Or am I using the wrong pretrained model?
BTW, I'm really curious how you attached the '.module'. LOL
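
Keys containing `.module` are what PyTorch produces when a sub-network is wrapped in `torch.nn.DataParallel`, which stores the wrapped network under an attribute named `module`. Assuming that is how the checkpoint was saved (not confirmed in this thread), the keys can be remapped before loading. The dictionary below is a toy stand-in for the real `state_dict`, and `strip_module` is a hypothetical helper.

```python
def strip_module(state_dict):
    """Return a copy of state_dict with the first '.module.' path component collapsed."""
    return {key.replace(".module.", ".", 1): value
            for key, value in state_dict.items()}


checkpoint = {                       # toy stand-in for the loaded checkpoint
    "features.module.0.weight": "w0",
    "features.module.0.bias": "b0",
    "classifier.4.weight": "w4",
}
remapped = strip_module(checkpoint)
```

The alternative is to wrap the same sub-network in `DataParallel` before calling `load_state_dict`, so that the model's own keys carry the `.module` component.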

What is the purpose of lines 88 and 89 in /XNOR-Net-PyTorch/ImageNet/networks/util.py?

First, thanks for your effort of re-implementing the XNOR-Net.

When I read the code, I was confused about the purpose of line 88 and line 89 in /XNOR-Net-PyTorch/ImageNet/networks/util.py.

self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)
self.target_modules[index].grad.data = self.target_modules[index].grad.data.mul(1e+9)

I have read your notes about the gradient of the binary weight filter, it is very elegant.
But I do not know what these two lines do.

And I think 1.0-1.0/s[1] is nearly equal to 1, so you just multiply the gradient by n * (1e+9)?
Is it because the gradient is too small, and why did you choose the number 1e+9?

Training logs for the networks

Could you kindly share the training logs for the NIN CIFAR-10 model? Maybe others as well. I would really appreciate that.
Thank you.

Confusion in notes

Hello!
I was attempting to understand your notes on Backward Gradient of Scaled Sign Function.
What is the bold W? Are those all the parameters of the network? (Assuming a fully connected network.)
W = [W1, W2,..., Wn]
Here, does the Wk denote a specific layer (kth layer) from the network? (k being any number from 1 to n)
Equation 4 seems to lead me to think that W is the weight for a specific layer, and Wi is the ith element from the W matrix.
I am pretty sure the W consists of all the layers, but honestly I want to be certain.
I ask because it is causing some ambiguity in the later equations.

Would really appreciate the help!
Thanks.

Mismatch of equation(12) and code

Thanks for your great work.
But I am confused about the code that implements equation (12).
In your code, sign(Wi)'/Wi'=0 if |Wi|>1 and sign(Wi)'/Wi'=1 if |Wi|<1. This doesn't match the function sign(Wi)'/Wi'=0 if Wi !=0. Is it just an approximation?

Running pretrained model in pytorch

Hi jiecaoyu,

I was trying to run the pretrained AlexNet model you provided but failed. I noticed that you used an older version of pytorch which makes the network arch incompatible with 3.0. So I switched to 2.0.

I also noticed that you read all the images from caffe generated lmdb files. I instead use pytorch ImageFolder to read raw images and resize them to 256 by 256. I followed the same normalization transformation as subtracting per-pixel imagemean. So after transformation data lies roughly between -128 and +128.

Is there anything else that I missed? I really appreciate your help on reproducing your results!