xnor-net-pytorch's People

Contributors

jiecaoyu

xnor-net-pytorch's Issues

About the dataset

You reimplement torchvision.datasets in your data.dataset module. Why not use torchvision.datasets.CIFAR10 directly? What are 'train_data' and 'train_labels', and where do I get them? My CIFAR-10 dataset does not look like that.
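For reference, loading CIFAR-10 with the stock torchvision class might look roughly like this (a minimal sketch, not the data pipeline used in this repository; the normalization constants are the commonly used CIFAR-10 statistics and are an assumption here):

import torch
import torchvision
import torchvision.transforms as transforms

# Standard CIFAR-10 normalization values; treat them as illustrative defaults.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)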

XNOR-net for mnist

Hi, thanks for your great work! After reading your PyTorch code, I find that you did not implement XNOR-Net for MNIST; you only implemented BWN for MNIST. Is that right?

Question about mul(1.0-1.0/s[1]).mul(n)

Hi, when I read the last line of the function 'updateBinaryGradWeight', I am confused by 'mul(1.0-1.0/s[1]).mul(n)'. I don't understand where this coefficient comes from, since the gradient computation has already been done. Could you explain the reason? Thanks.

self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)
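For context, with $\widetilde{W} = \alpha\,\mathrm{sign}(W)$ and $\alpha = \frac{1}{n}\lVert W \rVert_1$, the original XNOR-Net paper approximates the weight gradient as

$\frac{\partial C}{\partial W_i} \approx \frac{\partial C}{\partial \widetilde{W}_i}\left(\frac{1}{n} + \alpha\,\frac{\partial\,\mathrm{sign}(W_i)}{\partial W_i}\right), \qquad \frac{\partial\,\mathrm{sign}(W_i)}{\partial W_i} \approx \mathbf{1}_{|W_i|\le 1}.$

The repository's Notes derive a refined version of this gradient (see the "A typo in README" issue below), which is presumably where the extra coefficient in updateBinaryGradWeight originates.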

Can we modify the code to train for binary weighted network and not XNOR?

Hey @jiecaoyu, first of all, I appreciate the work. Great job! 👍
When I checked the code, I only saw binarization of the convolution parameters (weights). Is it XNOR or BWN?
Please let me know what I might be missing here.
If it also binarizes the input, can you point me to where it does so, so that I can edit the code and try training BWN?

Edit:
Hey @jiecaoyu, I found where you binarize the input to a layer: it is done by the function BinActive().
Now, if I just remove that function, will it be BWN, or is there something else I need to do?

Scaling binarized activations.

Hi, I could not find where you scale the binarized input activations by their L1 norm. In the BinActive function you only call the sign function, whose output is used directly in the forward pass. Could you please point me to where the scaling happens?
In case you decided it was not necessary, could you please explain why?

Thanks!
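For comparison, a scaled variant of the forward pass (computing the mean absolute value of the input as the activation scale before taking the sign) might look roughly like this; it is a sketch with a straight-through backward, not the code in this repository:

import torch

class ScaledBinActive(torch.autograd.Function):
    """Binarize the input and rescale it by the mean absolute value (a sketch)."""
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        alpha = input.abs().mean()          # scalar scaling factor
        return alpha * input.sign()

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        # Straight-through estimator: pass gradients only where |input| <= 1.
        grad_input[input.abs() > 1] = 0
        return grad_input

out = ScaledBinActive.apply(torch.randn(4, 8))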

The hyperparameters for training AlexNet on ImageNet

Thanks for this useful implementation.
Do you mind sharing the working hyperparameters (e.g., optimizer, learning rate schedule, momentum, etc.) for training AlexNet on ImageNet? I used the default values with Adam, and after training the model for several epochs the validation accuracy is still 0.
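Purely as an illustration of the kind of configuration being asked about (every value below is an assumption, not the maintainer's recipe, which is exactly what this issue requests), a setup of this shape might look like:

import torch.optim as optim
from torchvision.models import alexnet

# Illustrative values only; they are placeholders, not an answer to the question.
model = alexnet()
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # train_one_epoch(model, optimizer)   # hypothetical training routine
    scheduler.step()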

Question about 1e+9

Hi,

I found that in some networks the gradients are multiplied by 1e+9 and in others they are not.
Do you have any idea why?
I know the reason must be the tiny gradients in the earlier layers. However, I don't understand how the value 1e+9 was chosen, and as written it multiplies every layer's gradients by this value. Do you have any intuition about what is actually happening and how this number was obtained?

Best,
Fayez

Questions about gradient propagation in the code (I am a beginner)

I am a novice and I am confused about how gradient propagation works in the training part of this code, so I have some questions that may be "stupid".

  1. Can you explain how the gradient propagation works? I don't understand the meaning of:
    # restore weights
    bin_op.restore()
    bin_op.updateBinaryGradWeight()

    It seems that after backward() we update the weights again, but why is the restore needed? (A sketch of this binarize/restore flow is included at the end of this issue.)
    The original code is shown below.
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()

        # process the weights including binarization
        bin_op.binarization()

        output = model(data)
        loss = criterion(output, target)
    
        loss.backward()

        # restore weights
        bin_op.restore()
        bin_op.updateBinaryGradWeight()

        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
    return

  2. Another problem is shown in the code below. I do not understand these two parts:
    m.weight.data.zero_().add_(1.0) and m.weight.data.clamp_(min=0.01)
    Does the weight of a BatchNorm1d (or BatchNorm2d) layer need special handling during initialization and in the forward pass?

class LeNet_5(nn.Module):
    def __init__(self):
        super(LeNet_5, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=5, stride=1)
        self.bn_conv1 = nn.BatchNorm2d(20, eps=1e-4, momentum=0.1, affine=False)
        self.relu_conv1 = nn.ReLU(inplace=True)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bin_conv2 = BinConv2d(20, 50, kernel_size=5, stride=1, padding=0)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bin_ip1 = BinConv2d(50*4*4, 500, Linear=True,
                previous_conv=True, size=4*4)
        self.ip2 = nn.Linear(500, 10)

        for m in self.modules():
            if isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d):
                if hasattr(m.weight, 'data'):
                    # initialize every BatchNorm scale (gamma) parameter to 1.0
                    m.weight.data.zero_().add_(1.0)
        return

    def forward(self, x):
        for m in self.modules():
            if isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d):
                if hasattr(m.weight, 'data'):
                    # keep the BatchNorm scale (gamma) away from zero before each forward pass
                    m.weight.data.clamp_(min=0.01)
        x = self.conv1(x)
        x = self.bn_conv1(x)
        x = self.relu_conv1(x)
        x = self.pool1(x)
        x = self.bin_conv2(x)
        x = self.pool2(x)

        # x = x.view(x.size(0), 50*4*4)

        x = self.bin_ip1(x)
        x = self.ip2(x)
        return x

Sincerely,
Thank you very much.
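As promised in the first question, here is a minimal sketch of what the binarize/restore flow does, based on the training loop above and the BinOp discussion elsewhere in these issues (class and method bodies are illustrative; the real implementation lives in util.py):

import torch
import torch.nn as nn

class BinOpSketch(object):
    """Minimal sketch of the binarize/restore flow (names are illustrative)."""
    def __init__(self, modules):
        # modules: the conv layers whose weights get binarized
        self.target_modules = [m.weight for m in modules]
        self.saved_params = [w.data.clone() for w in self.target_modules]

    def binarization(self):
        # Save the full-precision weights, then overwrite them with alpha * sign(W)
        # (assumes 4-D conv weights here).
        for idx, w in enumerate(self.target_modules):
            self.saved_params[idx].copy_(w.data)
            n = w.data[0].nelement()
            alpha = w.data.abs().sum(dim=(1, 2, 3), keepdim=True).div(n)
            w.data.copy_(w.data.sign().mul(alpha))

    def restore(self):
        # Put the full-precision weights back so optimizer.step() updates them
        # with the gradients that were computed using the binarized weights.
        for idx, w in enumerate(self.target_modules):
            w.data.copy_(self.saved_params[idx])

This is consistent with the training loop above: binarization() before the forward pass, restore() plus updateBinaryGradWeight() after loss.backward(), and then optimizer.step() applied to the full-precision weights.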

Why shuffle=False for train_loader?

Normally, shuffle should be set to True for the train_loader, but you set it to False instead. Why? And why use the Caffe normalization rather than the default PyTorch transform? Does it have a big influence on the final performance?

A typo in README

The second term of the correct backward gradient in the Notes should be [equation image].
Also, since [equation image], why not simplify the equations further? Did I miss something?

Documentation

First of all, thank you for your effort in implementing XNOR-Net. I have just one suggestion: it would be great if you wrote documentation for your code and explained the intuition behind binarization in a simpler way than the original paper. :)

If I have the ImageNet data generated by Caffe, how can I use it with PyTorch?

When I run the AlexNet code, I get the following error. My directories are ilsvrc12_train_lmdb and ilsvrc12_val_lmdb, and each contains a data.mdb and a lock.mdb. Can I convert them for PyTorch?

Traceback (most recent call last):
  File "main.py", line 27, in <module>
    import datasets as datasets
  File "/home/yy/XNOR-Net-PyTorch-master/ImageNet/networks/../datasets/__init__.py", line 1, in <module>
    from .folder import ImageFolder
  File "/home/yy/XNOR-Net-PyTorch-master/ImageNet/networks/../datasets/folder.py", line 6, in <module>
    import lmdb
ImportError: No module named lmdb

Little confused with BinActive function

In my view of BWN, in the forward pass we only need to compute alpha and a binary filter B for each float filter, as in the XNOR-Net paper, not to binarize a filter's output, which is also the input of the next layer. So why do we need the BinActive function? After reading your code, I also think your MNIST implementation is really a BWN version instead of an XNOR version, or maybe I am missing some details.
Another question: the datatype used to represent 0/1 is still float, which cannot reduce memory or time during training. I am very confused.
Thanks for your help.

LeNet arch in models/LeNet_5.py

Hi, the number of output channels of conv1 for LeNet-5 in models/LeNet_5.py is 20, but I think conv1 in the original LeNet-5 has 6.
self.conv1 = nn.Conv2d(1, 20, kernel_size=5, stride=1)

So why is the number of channels in conv1 set to 20?

Can you give me some advice?

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

Hi,

I have run into a rather strange error.
When running your project, I get the following error while attempting to read the data:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte
This gets triggered here: https://github.com/jiecaoyu/XNOR-Net-PyTorch/blob/master/CIFAR_10/main.py#L118

Now the weird thing is that when I manually try to load the data through IPython, I get the same error under Python 3, but it works fine under Python 2.

Any suggestions? Is the dataset encoded with a Chinese character set, for example?
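For what it's worth, this is the classic symptom of unpickling a Python 2 pickle (such as the CIFAR-10 batch files) under Python 3; a common workaround, sketched here rather than taken from the repository's loader, is to pass an explicit encoding:

import pickle

def load_cifar_batch(path):
    # The CIFAR-10 batches were pickled under Python 2; 'latin1' (or 'bytes')
    # lets Python 3 decode the raw byte strings without a UnicodeDecodeError.
    with open(path, 'rb') as f:
        return pickle.load(f, encoding='latin1')

batch = load_cifar_batch('./data/cifar-10-batches-py/data_batch_1')  # hypothetical path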

AutoGrad (gradcheck) on BinActive

import torch

class BinActive(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        size = input.size()
        input = input.sign()
        mean = abs(input).mean()
        return mean*input
    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input.ge(1)] = 0
        grad_input[input.le(-1)] = 0
        p2_f = grad_input.abs().mean()*grad_input
        p1_f = grad_input.sign()*(1/grad_input.nelement())
        grad_input = p1_f + p2_f
        return grad_input

x = torch.randn(52, 24, requires_grad=True)
binactive = BinActive.apply
res = torch.autograd.gradcheck(binactive, (x,), atol=1e-2)

After running the code above, I get res as True,
whereas the BinActive code in this implementation does not work with gradcheck.
Am I correct in saying that the BinActive backward propagation in this implementation is an approximation, and that this is why it does not work with gradcheck?
I tried to re-derive the backward propagation from your notes and implement the backward method myself.
This is not really an issue, just a general doubt.

I would appreciate your comments on this matter!

P.S. I understand that you don't multiply by the mean in the forward pass; I just wanted to get it working with gradcheck for my own satisfaction.

Thanks.

updateBinaryGradWeight-factor

Thanks for your codes.
I have a quick question about line 82 of CIFAR_10/util.py: why do we have to 'mul(1.0-1.0/s[1]).mul(n)'?

xnor on tensorflow

Thanks for sharing. I found some bugs in CIFAR_10/main.py:
Line 23: in Python 3, keys() is lazy, so it raises an exception if you change the keys while iterating; wrapping it in list() fixes this.
Line 123: the test batch size should be 128 instead of 100, to match the number on line 78.
Line 165: an extra tab of indentation?
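A sketch of the first fix; the dictionary contents and the key renaming here only illustrate mutating a dict while iterating, they are not the exact code at line 23:

# Snapshot the keys with list() before mutating the dictionary (required in Python 3).
params = {'module.conv1.weight': 0, 'module.conv1.bias': 1}   # stand-in dict
for key in list(params.keys()):
    params[key.replace('module.', '')] = params.pop(key)
print(params)   # {'conv1.weight': 0, 'conv1.bias': 1}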

Actually, I implemented your approach in TensorFlow, but it did not work at first; after reducing the learning rate to 1e-4, it reaches 80.9% accuracy.
I ran your code and got a model with 86.17% accuracy, and when I load that model into my code I get 86.2% accuracy. Amazing!
I found that the batch norm of the third BinConv layer produces a small numerical error, which can only be detected in the sum of the BN output and is amplified by the layers after it.
Maybe it's a problem with CUDA or cuDNN. T_T
My code is at https://github.com/ljhandlwt/xnor-net-tf

Not quite get updateBinaryGradWeight()

Thanks so much for sharing; I am new to deep learning and I have learned a lot from your code.
I have a question. I understand that
m = weight.norm(1, 3, keepdim=True).sum(2, keepdim=True).sum(1, keepdim=True).div(n).expand(s)
calculates
[screenshot of an equation]
But why do you then use m * self.target_modules[index].grad.data?

And finally, how do you get the new weight gradient?
self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)

Could you explain more, for example which part corresponds to these terms?
[screenshots of three equations]

TypeError: clamp() got an unexpected keyword argument 'out'

When I run this code, the following problem happens. How can I solve it?
My Python version is 2.7. Thank you!

cd <Repository Root>/MNIST/
python main.py
File "/pytorch_test/XNOR-Net-PyTorch/MNIST/util.py", line 46, in clampConvParams
out = self.target_modules[index].data)
TypeError: clamp() got an unexpected keyword argument 'out'

details in L1 norm

In line 57 of util.py in MNIST, you take the L1 norm over one dimension, sum over the other dimensions, and then divide by n. This seems inconsistent with the paper, which treats W as a single c x h x w vector and takes the L1 norm of the whole vector. I am curious why you chose the former approach; is it for performance or implementation reasons?
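For what it's worth, if the reductions run over every non-output dimension, the two formulations give the same per-filter scalar:

$\alpha_k \;=\; \frac{1}{n}\sum_{c,h,w}\bigl|W_{k,c,h,w}\bigr| \;=\; \frac{\lVert \mathrm{vec}(W_k) \rVert_1}{n}, \qquad n = c \cdot h \cdot w,$

so chaining norm(1, 3) with sums over the remaining dimensions and a division by n yields the paper's per-filter L1 scale; whether the code on that line does precisely this is what the question is about.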

Thanks, really great work.

train on custom data

Hey,

I was wondering if it is possible to train XNOR-Net-PyTorch on custom data? I would appreciate your help.

Best regards

TypeError: expected a readable buffer object

File "pytorch-master/XNOR-Net-PyTorch-master/ImageNet/networks/../datasets/folder.py", line 125, in getitem datum.ParseFromString(value)

what occurs this error?
ps: I use the imagenet dataset which comes from the caffe ilsvrc12_train_lmdb and ilsvrc12_val_lmdb

L1 conv issues

Hey @jiecaoyu ,
Thanks for your great work and sharing it.
I have a question about the LeNet_5 implementation. At line 12 of your util.py, you set start_range = 1 and end_range = count_targets - 2. If my understanding is right, this means the second conv layer (size [20, 50, 5, 5]) and the first fc layer (size [500, 800]) are binarized. When I set end_range = count_targets - 1, the second fc layer is also binarized and the accuracy degrades only a little (<1%). However, when I set start_range = 0, so that the first conv layer is binarized, the accuracy stays at a low level.
Is it possible to binarize the first conv layer using your code? If so, could you tell me how? Thanks again for sharing.

Question about the LeNet_5 used in the MNIST training

Hello, I wonder whether this network implementation is correct, since the block structure for XNOR-Net in the original paper is BNorm -> BinActive -> BinConv -> Pool, but I could not find such a sequence here.

Thanks,
S.

About the BinOp&Training

The BinOp helps binarize the weights and store them in self.saved_params, but the weights in the model seem to remain the same, so 'output = model(data)' in training still uses the floating-point weights. Where do we actually perform the bit-count or use binary weights? @jiecaoyu I would really appreciate your help!
" optimizer.zero_grad()
bin_op.binarization()
output = model(data)
loss = criterion(output, target)
loss.backward()
bin_op.restore()
bin_op.updateBinaryGradWeight()
optimizer.step()"

NIN structure cifar10 data

Hi, I just wonder how I can convert the normal Python CIFAR-10 dataset into the train_data / train_labels format used by the NIN structure. If you have code for this, please send it to me.

Is this the implementation of BNN or XNOR ?

From your README, the accuracy is similar to XNOR-Net.
In the XNOR forward path, the convolution should be $(\mathrm{sign}(I)\circledast \mathrm{sign}(W)) \odot K \alpha$.
But in your code I don't find the calculation of $K$, and the input to the conv layer is a binary tensor.
Am I misunderstanding something?
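For reference, the $K$ described in the paper is a spatial average of the per-pixel channel mean of $|I|$; a sketch of that computation (not code from this repository) is:

import torch
import torch.nn.functional as F

def compute_K(inp, kernel_size, stride=1, padding=0):
    # inp: real-valued activations of shape (N, C, H, W).
    A = inp.abs().mean(dim=1, keepdim=True)                  # per-pixel mean of |I| across channels
    box = torch.full((1, 1, kernel_size, kernel_size),
                     1.0 / (kernel_size * kernel_size))      # averaging filter k from the paper
    return F.conv2d(A, box, stride=stride, padding=padding)  # K: one scale per output location

K = compute_K(torch.randn(2, 64, 32, 32), kernel_size=3, padding=1)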

Scale gradient of ImageNet by 1e+9

self.target_modules[index].grad.data = self.target_modules[index].grad.data.mul(1e+9)

I notice that CIFAR-10 does not have this line. Is there a difference between ImageNet and CIFAR-10 here? Could you please give some hints about why this line is needed?

Pre-trained model cannot reproduce the expected accuracy.

Hi!
I tested the pre-trained model on ImageNet but only got a Top-1 accuracy of 4% and a Top-5 accuracy of 12%, which is far from the reported accuracy.
I think the gap may be partly caused by different image pre-processing procedures.
Here is my transform:

from torchvision.transforms import Compose, Normalize, RandomResizedCrop, RandomHorizontalFlip, ToTensor, Resize, CenterCrop

normalize = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

train_transform = Compose([
    RandomResizedCrop(224),
    RandomHorizontalFlip(),
    ToTensor(),
    normalize,
])

val_transform = Compose([
    Resize(256),
    CenterCrop(224),
    ToTensor(),
    normalize,
])

But I don't think the pre-processing difference alone is enough to cause such a huge gap.
Do you have any ideas?
Thank you!
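For comparison, other issues here mention that the repository reads Caffe LMDB data and subtracts a per-pixel image mean, so the network sees inputs roughly in [-128, 128] rather than unit-variance tensors. A rough approximation of that style (per-channel constants standing in for the per-pixel mean file; Caffe's BGR channel order is not handled here) would be:

from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Lambda

# Keep the 0-255 range and subtract a mean, instead of normalizing to unit variance.
# The per-channel values are illustrative stand-ins for the per-pixel mean file.
caffe_like_val_transform = Compose([
    Resize(256),
    CenterCrop(224),
    ToTensor(),                                   # scales pixels to [0, 1]
    Lambda(lambda x: x * 255.0),                  # back to [0, 255]
    Lambda(lambda x: x - x.new_tensor([104.0, 117.0, 123.0]).view(3, 1, 1)),
])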

Why not binarize the parameters of the first and the last conv layers?

start_range = 1
end_range = count_Conv2d-2
self.bin_range = numpy.linspace(start_range,
        end_range, end_range-start_range+1)\
        .astype('int').tolist()
self.num_of_params = len(self.bin_range)
self.saved_params = []
self.target_params = []
self.target_modules = []
index = -1
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        index = index + 1
        if index in self.bin_range:
            tmp = m.weight.data.clone()
            self.saved_params.append(tmp)
            self.target_modules.append(m.weight)

XNOR for ResNet

I was wondering whether there is an XNOR version for ResNet, or could you give some hints about how to implement one?

No speedup and memory saving on CIFAR10

I have played around with CIFAR-10 and also done a bit of benchmarking. BinOp seems to have no noticeable effect on model size or inference speed compared to the NIN model without BinOp; I have tested on both CPU and GPU. I thought the saved model nin.pth.tar would shrink and inference would speed up significantly. Am I missing something? Does anyone else have this issue? Thanks.

Cannot load alexnet.baseline.pth.tar

I use my own code to load the 'alexnet.baseline.pth.tar' file.
However, the keys in the file are not consistent with those of the model, as shown in the screenshot below.

[screenshot of the mismatched state_dict keys]

And the model's code is

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.num_classes = num_classes
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.BatchNorm2d(96, eps=1e-4, momentum=0.1, affine=True),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(96, 256, kernel_size=5, stride=1, padding=2, groups=1),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(256, 384, kernel_size=3, stride=1, padding=1),
            BinConv2d(384, 384, kernel_size=3, stride=1, padding=1, groups=1),
            BinConv2d(384, 256, kernel_size=3, stride=1, padding=1, groups=1),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            BinConv2d(256 * 6 * 6, 4096, Linear=True),
            BinConv2d(4096, 4096, dropout=0.5, Linear=True),
            nn.BatchNorm1d(4096, eps=1e-3, momentum=0.1, affine=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

Shouldn't the key be 'features.0.weight' rather than 'features.module.0.weight'?
Or am I using the wrong pretrained model?
BTW, I'm really curious how the '.module' got attached. LOL
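The '.module' segment is what nn.DataParallel inserts into state_dict keys when a (sub)module, e.g. model.features, is wrapped in it before saving. A sketch of stripping it so the keys match a plain model definition (the checkpoint layout and path handling below are assumptions):

import torch

checkpoint = torch.load('alexnet.baseline.pth.tar', map_location='cpu')
state_dict = checkpoint.get('state_dict', checkpoint)   # layout of the file is an assumption
cleaned = {k.replace('.module', '', 1): v for k, v in state_dict.items()}
# model.load_state_dict(cleaned)   # `model` being the AlexNet defined above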

What is the purpose of lines 88 and 89 in /XNOR-Net-PyTorch/ImageNet/networks/util.py?

First, thanks for your effort in re-implementing XNOR-Net.

While reading the code, I am confused about the purpose of lines 88 and 89 in /XNOR-Net-PyTorch/ImageNet/networks/util.py:

self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)
self.target_modules[index].grad.data = self.target_modules[index].grad.data.mul(1e+9)

I have read your notes about the gradient of the binary weight filter; they are very elegant.
But I do not know what these two lines do.

Since 1.0-1.0/s[1] is nearly equal to 1, do you effectively just multiply the gradient by n * 1e+9?
Is it because the gradient is too small, and why did you choose the number 1e+9?

Training logs for the networks

Could you kindly share the training logs for the NIN CIFAR-10 model, and maybe for the other networks as well? I would really appreciate it.
Thank you.

Confusion in notes

Hello!
I was attempting to understand your notes on the backward gradient of the scaled sign function.
What is the bold W? Is it all the parameters of the network (assuming a fully connected network)?
W = [W1, W2, ..., Wn]
Here, does Wk denote a specific (k-th) layer of the network, with k being any number from 1 to n?
Equation 4 seems to suggest that W is the weight of a specific layer and Wi is the i-th element of that weight matrix.
I am fairly sure that W consists of all the layers, but honestly I want to be certain.
I ask because it causes some ambiguity in the later equations.

Would really appreciate the help!
Thanks.

Mismatch of equation(12) and code

Thanks for your great work.
But I am confused by the code implementing equation (12).
In your code, $\partial\,\mathrm{sign}(W_i)/\partial W_i = 0$ if $|W_i| > 1$ and $\partial\,\mathrm{sign}(W_i)/\partial W_i = 1$ if $|W_i| < 1$, which does not match the function $\partial\,\mathrm{sign}(W_i)/\partial W_i = 0$ for $W_i \ne 0$. Is it just an approximation?
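For context, this is the standard straight-through estimator from the binary-network literature: since the true derivative of sign is zero almost everywhere, it is replaced by the derivative of hard-tanh,

$\frac{\partial\,\mathrm{sign}(W_i)}{\partial W_i} \;\approx\; \mathbf{1}_{|W_i|\le 1} \;=\; \begin{cases}1, & |W_i|\le 1\\ 0, & |W_i|>1,\end{cases}$

which matches the clipping in the code, so it is indeed an approximation rather than the exact derivative.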

Running pretrained model in pytorch

Hi jiecaoyu,

I was trying to run the pretrained AlexNet model you provided but failed. I noticed that you used an older version of PyTorch, which makes the network architecture incompatible with 3.0, so I switched to 2.0.

I also noticed that you read all the images from Caffe-generated LMDB files. I instead use the PyTorch ImageFolder to read raw images and resize them to 256x256. I follow the same normalization (subtracting the per-pixel image mean), so after the transformation the data lies roughly between -128 and +128.

Is there anything else that I missed? I really appreciate your help in reproducing your results!

Thanks.
