jiecaoyu / xnor-net-pytorch
PyTorch Implementation of XNOR-Net
You reimplemented torchvision.datasets in your data.dataset. Why don't you use torchvision.datasets.CIFAR10 directly? What are 'train_data' and 'train_labels', and where do I get them? My CIFAR10 dataset does not look like that.
Hi, thanks for your great work! After reading your PyTorch code, I found that you didn't implement XNOR-Net for MNIST, only BWN. Is that right?
I just got the error "ImportError: No module named distributed" when I
import torch.utils.data.distributed
Hi, when I read the last line of the function 'updateBinaryGradWeight', I am confused by 'mul(1.0-1.0/s[1]).mul(n)'. I don't understand where this coefficient comes from, because the gradient computation has already been done by that point. Could you tell me the reason? Thanks.
self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)
Hey @jiecaoyu , first of all appreciate the work. Great job! 👍
When I check the code I only saw binarization of convolution parameters (weights). Is it XNOR or BWN?
Please let me know what I might be missing here.
If it binarizes the input also, can you point to where it does, so that I can edit and try training BWN?
Edit:
Hey @jiecaoyu , got to know where you binarize the input to the layer! It is using the function
BinActive()
Now if I just remove the function, will it be BWN or is there something else I need to do?
Hi, I could not find where you scale the binarized input activations by their L1 norm. In the BinActive function you just call the sign function, and its output is used directly in the forward pass. Could you please point me to where the scaling happens?
If you felt it wasn't necessary, could you please explain why?
Thanks!
Thanks for this useful implementation.
Do you mind sharing the working hyperparameters (e.g., optimizer, learning rate schedule, momentum, etc.) for training AlexNet on ImageNet? I used the default values with Adam. After training the model for several epochs, the validation accuracy is still 0.
Hi,
I found that in some networks the gradients are multiplied by 1e+9 and in others they are not.
Do you have any idea why they do that?
I know the reason must be tiny gradients in the earlier layers. However, I don't get how they arrived at the value 1e+9, and as it stands they multiply all the layers by this value. Do you have any intuition about what is actually happening and how this number was chosen?
Best,
Fayez
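One possible reading of the 1e+9 factor, assuming the optimizer is Adam (an assumption; the thread only mentions Adam in passing): Adam divides each step by the root of the running squared gradient plus a small eps, so a constant factor mostly cancels out unless the raw gradients are comparable to or smaller than eps. The sketch below is a hand-written first Adam step in plain Python, not the PyTorch optimizer. The name adam_first_step, its defaults, and the gradient value are illustrative.

```python
import math

def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the parameter change produced by the very first Adam step."""
    m = (1 - beta1) * grad            # first-moment estimate
    v = (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1)           # bias correction for t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

tiny = 1e-10                          # hypothetical raw gradient, smaller than eps
step_raw = adam_first_step(tiny)
step_scaled = adam_first_step(tiny * 1e9)
```

With these numbers the unscaled step is about 1% of the learning rate, while the scaled step is almost exactly the learning rate. Under this reading the factor only needs to be large enough to lift the gradients well above eps, and its exact value matters little.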
I am a novice and am confused about gradient propagation in the training part of this code.
So I have some questions, which may be "stupid".
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        # process the weights including binarization
        bin_op.binarization()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        # restore weights
        bin_op.restore()
        bin_op.updateBinaryGradWeight()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
    return
class LeNet_5(nn.Module):
    def __init__(self):
        super(LeNet_5, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=5, stride=1)
        self.bn_conv1 = nn.BatchNorm2d(20, eps=1e-4, momentum=0.1, affine=False)
        self.relu_conv1 = nn.ReLU(inplace=True)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bin_conv2 = BinConv2d(20, 50, kernel_size=5, stride=1, padding=0)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bin_ip1 = BinConv2d(50*4*4, 500, Linear=True,
                previous_conv=True, size=4*4)
        self.ip2 = nn.Linear(500, 10)
        for m in self.modules():
            if isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d):
                if hasattr(m.weight, 'data'):
                    m.weight.data.zero_().add_(1.0)
        return
    def forward(self, x):
        for m in self.modules():
            if isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d):
                if hasattr(m.weight, 'data'):
                    m.weight.data.clamp_(min=0.01)
        x = self.conv1(x)
        x = self.bn_conv1(x)
        x = self.relu_conv1(x)
        x = self.pool1(x)
        x = self.bin_conv2(x)
        x = self.pool2(x)
        # x = x.view(x.size(0), 50*4*4)
        x = self.bin_ip1(x)
        x = self.ip2(x)
        return x
Sincerely
THANK YOU VERY MUCH.
Normally, shuffle should be set to True for train_loader. Why did you set it to False instead? And why use the Caffe normalization rather than the default PyTorch transformation? Will it have a big influence on the final performance?
First of all, thank you for your effort in implementing XNOR-Net. I just have one suggestion: it would be amazing if you documented your code and explained the intuition behind binarization in a simpler way than the original paper. :)
When I run the AlexNet code, I get the following error. My directories are ilsvrc12_train_lmdb and ilsvrc12_val_lmdb, and both contain a data.mdb and a lock.mdb. Can I adapt them for PyTorch?
Traceback (most recent call last):
  File "main.py", line 27, in <module>
    import datasets as datasets
  File "/home/yy/XNOR-Net-PyTorch-master/ImageNet/networks/../datasets/__init__.py", line 1, in <module>
    from .folder import ImageFolder
  File "/home/yy/XNOR-Net-PyTorch-master/ImageNet/networks/../datasets/folder.py", line 6, in <module>
    import lmdb
ImportError: No module named lmdb
As I understand BWN, in the forward pass we only need to compute alpha and a binary filter B from the float filter, as in the XNOR-Net paper. We do not binarize a filter's output, which is also the input of the next layer. So why do we need the BinActive function? After reading your code, I think your MNIST implementation is really a BWN version rather than an XNOR version. Or maybe I am missing some details.
Another question: the datatype used to represent 0/1 is still float, which can't reduce memory or time during training. I'm very confused.
Thanks for your help.
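The distinction asked about here can be sketched on a single dot product. Following the XNOR-Net paper, a Binary-Weight-Network approximates a filter W by alpha * sign(W) with alpha = mean(|W|) and keeps real-valued inputs, while XNOR-Net also passes the inputs through sign. This is plain Python with hypothetical helper names, not the repository's code. The paper's input scaling factor is left out, and sign(0) is treated as +1 by assumption.

```python
def sign(x):
    """Sign with sign(0) treated as +1 (an assumed convention)."""
    return 1.0 if x >= 0 else -1.0

def binarize_weights(w):
    """Return (alpha, B) with alpha = mean(|w|) and B = sign(w)."""
    alpha = sum(abs(v) for v in w) / len(w)
    return alpha, [sign(v) for v in w]

def bwn_dot(x, w):
    """Binary weights, real-valued inputs."""
    alpha, b = binarize_weights(w)
    return alpha * sum(xi * bi for xi, bi in zip(x, b))

def xnor_dot(x, w):
    """Binary weights and binary inputs (input scaling omitted)."""
    alpha, b = binarize_weights(w)
    return alpha * sum(sign(xi) * bi for xi, bi in zip(x, b))

w = [0.5, -1.5, 1.0]    # hypothetical filter
x = [2.0, 0.5, -3.0]    # hypothetical input patch
bwn_result = bwn_dot(x, w)
xnor_result = xnor_dot(x, w)
```

In this sketch the only difference between the two functions is the sign applied to the inputs, which is the role BinActive appears to play in the thread.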
Is it just a proof of concept for the accuracy?
It's not even using bitwise ops to get the 58x speedup and 32x less memory...
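For reference, the bitwise trick behind the speedup claim can be sketched in a few lines. When both vectors contain only +1 and -1, their dot product equals the number of matching positions minus the number of mismatching ones, which XNOR plus a popcount computes directly on packed bits. This is a plain-Python illustration with hypothetical helper names. It says nothing about how a real kernel would be written.

```python
def pack(signs):
    """Pack a list of +1/-1 values into an int, one bit per value (+1 -> 1)."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits

def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask      # XNOR: 1 where the signs match
    matches = bin(agree).count("1")        # popcount
    return 2 * matches - n                 # matches minus mismatches

a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(p * q for p, q in zip(a, b))
fast = xnor_popcount_dot(pack(a), pack(b), len(a))
```

Training with float tensors that merely hold binary values, as the thread describes, reproduces the accuracy behaviour but not this memory or speed benefit.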
Hi, the number of channels in conv1 for LeNet-5 in models/LeNet_5.py is 20. But I think conv1 in the original LeNet-5 is 6.
self.conv1 = nn.Conv2d(1, 20, kernel_size=5, stride=1)
So why is the number of channels in conv1 20?
Can you give me some advice?
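Channel counts aside, the layer sizes in the LeNet_5 listing earlier in this thread can be sanity-checked with the usual output-size formula. Assuming 28x28 MNIST inputs and no padding, the two 5x5 convolutions and two 2x2 poolings leave a 4x4 map, which matches the 50*4*4 passed to bin_ip1. conv_out is a hypothetical helper, not part of the repository.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                              # MNIST images are 28x28
size = conv_out(size, kernel=5)        # conv1     -> 24
size = conv_out(size, 2, stride=2)     # pool1     -> 12
size = conv_out(size, kernel=5)        # bin_conv2 -> 8
size = conv_out(size, 2, stride=2)     # pool2     -> 4
flat_features = 50 * size * size       # inputs to bin_ip1
```

The check is independent of the number of conv1 channels, so changing 20 to 6 would only require changing the input channel count of bin_conv2.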
Hi,
I have run in a rather strange error.
When running your project I get the following error while attempting to read the data:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte
This gets triggered here: https://github.com/jiecaoyu/XNOR-Net-PyTorch/blob/master/CIFAR_10/main.py#L118
The weird thing is that when I manually tried to load the data through ipython, I got the same error with python3, but it works fine with python2.
Any suggestions? Is the dataset perhaps encoded with a Chinese character set?
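One thing worth checking, offered as an assumption rather than a diagnosis: byte 0x93 is the first byte of the NumPy .npy magic string (\x93NUMPY), so the error may simply mean that a binary file is being opened in text mode. Python 2 tolerates that and Python 3 does not. The stdlib-only sketch below uses a temporary stand-in file, not the real dataset.

```python
import os
import tempfile

# Write a stand-in file that starts with the .npy magic bytes.
path = os.path.join(tempfile.mkdtemp(), "train_data")
with open(path, "wb") as f:
    f.write(b"\x93NUMPY" + b"\x00" * 8)

# Text mode: Python 3 tries to decode the bytes and fails on 0x93.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    text_mode_error = None
except UnicodeDecodeError as exc:
    text_mode_error = exc

# Binary mode: the bytes come back untouched.
with open(path, "rb") as f:
    header = f.read(6)
```

If that is the cause, opening the file with mode 'rb' (or passing the path itself to the loader) should make the same code work under both interpreters.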
class BinActive(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        size = input.size()
        input = input.sign()
        mean = abs(input).mean()
        return mean*input
    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input.ge(1)] = 0
        grad_input[input.le(-1)] = 0
        p2_f = grad_input.abs().mean()*grad_input
        p1_f = grad_input.sign()*(1/grad_input.nelement())
        grad_input = p1_f + p2_f
        return grad_input
x = torch.randn(52, 24, requires_grad=True)
binactive = BinActive.apply
res = torch.autograd.gradcheck(binactive, (x,), atol=1e-2)
After running the code above, I get res as True.
In contrast, the BinActive code in this implementation does not work with gradcheck.
Am I correct to say that the BinActive backward propagation in this implementation is an approximation, and that this is why it does not work with gradcheck?
I tried to re-derive the backward propagation from your notes and code the backward method myself.
This is not specifically an issue, but a general doubt.
Would appreciate your comments on this matter!
P.S. I understand that you haven't multiplied the mean in the forward, but I just wanted to get it working with gradcheck for my own satisfaction.
Thanks.
When I use "import torch.utils.data.distributed", I get the error "No module named distributed".
What is your version of PyTorch?
Thanks for your code.
I have a quick question about line 82 of 'CIFAR_10/util.py'. Why do we need 'mul(1.0-1.0/s[1]).mul(n)'?
Thanks for sharing. I found some bugs in CIFAR_10/main.py.
Line 23: in Python 3, keys() returns a lazy view, so changing the keys while iterating raises an exception. Wrapping it in list() fixes this.
Line 123: the test batch size should be 128 instead of 100, to match the value in Line 78.
Line 165: +1 tab?
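The keys() point can be reproduced without the repository. In Python 3, dict.keys() is a live view, so adding entries during iteration raises a RuntimeError, while iterating over list(d.keys()) works on a snapshot. The dictionaries and function names below are hypothetical stand-ins, not the code at Line 23.

```python
def add_aliases_unsafe(d):
    """Mutates d while iterating over the live key view."""
    for key in d.keys():
        d["alias_" + key] = d[key]

def add_aliases_safe(d):
    """Iterates over a snapshot of the keys, so mutation is fine."""
    for key in list(d.keys()):
        d["alias_" + key] = d[key]

try:
    add_aliases_unsafe({"weight": 1})
    unsafe_error = None
except RuntimeError as exc:
    unsafe_error = exc

safe = {"weight": 1}
add_aliases_safe(safe)
```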
I actually reimplemented your code in TensorFlow, but it doesn't work. When I reduce the learning rate to 1e-4, it gets 80.9% accuracy.
I ran your code and got a model with 86.17% accuracy. When I load that model in my code, I get 86.2% accuracy. Amazing!!!
I found that the BN of the third BinConv has a small numerical error, which can only be detected by summing the BN output, and which is amplified by the layers after that BN.
Maybe it's a CUDA or cuDNN problem. T_T
My code is on https://github.com/ljhandlwt/xnor-net-tf
Thanks so much for sharing, I am new to deep learning, and I learned a lot from your code.
I have a question. I kind of get that
m = weight.norm(1, 3, keepdim=True).sum(2, keepdim=True).sum(1, keepdim=True).div(n).expand(s)
is to calculate
Next, why do you use m * self.target_modules[index].grad.data?
Finally, how do you get the new weight grad data?
self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)
When I run this code, the following problem occurs. How can I solve it?
My Python version is 2.7, thank you!
cd <Repository Root>/MNIST/
python main.py
File "/pytorch_test/XNOR-Net-PyTorch/MNIST/util.py", line 46, in clampConvParams
out = self.target_modules[index].data)
TypeError: clamp() got an unexpected keyword argument 'out'
In line 57 of util.py in MNIST, you take the L1 norm along one dimension, sum over the other dimensions, and then divide by n. This does not seem consistent with the paper, which treats W as one c x h x w vector and takes the L1 norm of the whole vector. I am just curious why you do it the former way. Is it for performance or implementation reasons?
Thanks, real great work
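As far as the arithmetic goes, the two computations agree: an L1 norm is a sum of absolute values, so taking it along the last axis and then summing over the remaining axes gives the same number as flattening the filter first. The sketch below checks this on one hypothetical 2 x 2 x 2 filter in plain Python. The function names are illustrative, and n is assumed to be the number of elements in one filter.

```python
def l1_whole(filt):
    """L1 norm of a c x h x w filter flattened into one vector, divided by n."""
    flat = [v for channel in filt for row in channel for v in row]
    return sum(abs(v) for v in flat) / len(flat)

def l1_staged(filt):
    """L1 norm along the last axis, then sums over the other axes, divided by n."""
    n = sum(len(row) for channel in filt for row in channel)
    row_norms = [[sum(abs(v) for v in row) for row in channel] for channel in filt]
    return sum(sum(channel) for channel in row_norms) / n

filt = [[[0.5, -1.0], [2.0, -0.5]],
        [[-1.5, 0.25], [0.75, -0.5]]]   # one hypothetical filter
whole = l1_whole(filt)
staged = l1_staged(filt)
```

So the staged form looks like an implementation convenience for computing one value per filter, not a different definition of the scaling factor.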
Hey,
I was wondering if it's possible to train XNOR-Net-PyTorch on custom data? I would appreciate your help.
Best regards
File "pytorch-master/XNOR-Net-PyTorch-master/ImageNet/networks/../datasets/folder.py", line 125, in __getitem__
    datum.ParseFromString(value)
What causes this error?
P.S. I use the ImageNet dataset from the Caffe ilsvrc12_train_lmdb and ilsvrc12_val_lmdb.
In Python 3,
line 165: print model
should be:
line 165: print(model)
The same applies to line 329.
Sorry for a potentially stupid question, but I failed to find an explicit answer to this vital question.
Hey @jiecaoyu ,
Thanks for your great work and sharing it.
I have a question about the LeNet_5 implementation. At line 12 in your util.py, you set start_range = 1 and end_range = count_targets-2. If my understanding is right, this means that the second conv layer (size: [20, 50, 5, 5]) and the first fc layer (size: [500, 800]) are binarized. When I set end_range = count_targets-1, I can also binarize the second fc layer, and the accuracy degrades only a little (<1%). However, when I set start_range = 0, which means the first conv layer is binarized, the accuracy stays at a low level.
Is it possible to binarize the first conv layer using your code? If so, could you tell me how? Thanks again for sharing.
Hello, I wonder if this network implementation is correct, since the block structure for the XNOR-Net in the original paper is like BNorm -> BinActive -> BinConv -> Pool. However, I couldn't find such sequences.
Thanks,
S.
Is there any trick here?
The BinOp helps to binarize the weights and stores them in self.saved_params. But the weights in the model remain the same. 'output = model(data)' in training still uses the floating-point weights. So where do we perform the bitcount, or use the binary weights? @jiecaoyu I really appreciate your help!
optimizer.zero_grad()
bin_op.binarization()
output = model(data)
loss = criterion(output, target)
loss.backward()
bin_op.restore()
bin_op.updateBinaryGradWeight()
optimizer.step()
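One way to read this loop, judging from the "process the weights including binarization" and "restore weights" comments in the train() listing earlier in the thread: binarization() saves a full-precision copy and then overwrites the module weights in place, so model(data) runs on binarized values, and restore() copies the full-precision values back before optimizer.step(). The toy class below illustrates that pattern on a plain list. TinyBinOp is a hypothetical stand-in, not the repository's BinOp, and treating sign(0) as +1 is an assumption.

```python
class TinyBinOp:
    """Toy stand-in for a save / binarize / restore helper on one weight list."""

    def __init__(self, weights):
        self.weights = weights        # the list the "model" reads from
        self.saved = None

    def binarization(self):
        self.saved = list(self.weights)                  # keep full precision
        alpha = sum(abs(w) for w in self.weights) / len(self.weights)
        # Overwrite in place, so the forward pass sees alpha * sign(w).
        self.weights[:] = [alpha if w >= 0 else -alpha for w in self.weights]

    def restore(self):
        self.weights[:] = self.saved                     # back to full precision

weights = [0.25, -0.75, 0.5]
op = TinyBinOp(weights)
op.binarization()
during_forward = list(weights)     # what a forward pass would see
op.restore()
after_restore = list(weights)      # what the optimizer would update
```

The values stay floats throughout, so this pattern simulates binary weights for training and does not involve any bitcount operation.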
Hi, I just wonder how I can convert the normal Python CIFAR-10 dataset to train_data and label (for the NIN structure?). If you have code for this, please send it to me.
From your README, the accuracy looks similar to XNOR-Net.
In the XNOR forward pass, the conv should be
But in your code, I don't find the calculation of
Is this my misunderstanding?
self.target_modules[index].grad.data = self.target_modules[index].grad.data.mul(1e+9)
I notice the CIFAR-10 code does not have this line. Is there any difference between ImageNet and CIFAR here? Could you please give some hints about why we want this line?
As I didn't see you use Normalize in your code, can you explain the details?
Hi!
I tested the pre-trained model on ImageNet but only got 4% top-1 and 12% top-5 accuracy, which is far from the reported accuracy.
I think the gap may partly be caused by different image pre-processing procedures.
Here is my transform:
normalize = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_transform = Compose([
    RandomResizedCrop(224),
    RandomHorizontalFlip(),
    ToTensor(),
    normalize,
])
val_transform = Compose([
    Resize(256),
    CenterCrop(224),
    ToTensor(),
    normalize,
])
But I don't think the difference in pre-processing alone is enough to cause such a huge gap.
Do you have any ideas?
Thank you!
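A rough range check suggests the pre-processing difference may be larger than it looks. Another post in this thread describes the repository's Caffe-style pipeline as subtracting a per-pixel image mean, leaving values roughly between -128 and +128, whereas ToTensor followed by Normalize yields values within a few units of zero. The sketch below compares the two ranges in plain Python. The mean of 128 is a placeholder, since the actual mean image is not given in the thread.

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# ToTensor yields values in [0, 1]; Normalize then maps them per channel.
torchvision_range = [((0.0 - m) / s, (1.0 - m) / s) for m, s in zip(mean, std)]

# Caffe-style mean subtraction on 0..255 pixels (placeholder mean of 128).
caffe_range = (0.0 - 128.0, 255.0 - 128.0)

widest_torchvision = max(hi - lo for lo, hi in torchvision_range)
widest_caffe = caffe_range[1] - caffe_range[0]
scale_ratio = widest_caffe / widest_torchvision
```

Inputs that are smaller by a factor of about 50 could plausibly push a pretrained network far from its operating range, so matching the original scaling seems worth trying before looking for other causes.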
XNOR-Net-PyTorch/CIFAR_10/util.py
Lines 12 to 28 in 26cca8c
Hello,
I am confused, is the LeNet-5 implementation the same as those available online?
https://cdn-images-1.medium.com/max/800/1*yXjgC7PFTxb-Oi_L_hoFXA.png
This is different from the LeNet-5 seen in your repository.
Kindly clarify.
P.S. By 'the paper', I refer to this http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
Thanks!
Thanks for your code, but I feel this statement does not seem to correspond to the formula in your note.
It seems that bias terms in Conv2d and Linear layers are not binarized.
I was wondering whether there is XNOR for ResNet. Or can you give some hints about how to implement it?
I have played around with CIFAR10 and also done a bit of benchmarking. It seems BinOp does not have a noticeable effect on model size and inference speed compared to the NIN model without BinOp. I have tested on both CPU and GPU. I thought the saved model nin.pth.tar would shrink and inference would speed up significantly. Am I missing something? Does anyone else have this issue? Thanks.
I use my own code to load the 'alexnet.baseline.pth.tar' file.
However, the keys in the file are not consistent with those in the model which are shown in the pic.
And the model's code is
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.num_classes = num_classes
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.BatchNorm2d(96, eps=1e-4, momentum=0.1, affine=True),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(96, 256, kernel_size=5, stride=1, padding=2, groups=1),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(256, 384, kernel_size=3, stride=1, padding=1),
            BinConv2d(384, 384, kernel_size=3, stride=1, padding=1, groups=1),
            BinConv2d(384, 256, kernel_size=3, stride=1, padding=1, groups=1),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            BinConv2d(256 * 6 * 6, 4096, Linear=True),
            BinConv2d(4096, 4096, dropout=0.5, Linear=True),
            nn.BatchNorm1d(4096, eps=1e-3, momentum=0.1, affine=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
Shouldn't it be 'features.0.weight' rather than 'features.module.0.weight'?
Or am I using the wrong pretrained model?
BTW I'm really curious how you attached the '.module'. LOL
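A 'module' segment in checkpoint keys is what torch.nn.DataParallel produces, because it stores the wrapped network in an attribute named module. If the checkpoint was saved from a model whose features block was wrapped that way (an assumption about how this file was created), the keys can be remapped before loading into an unwrapped model. The sketch below works on a plain dict with hypothetical keys and values. strip_module is an illustrative helper.

```python
def strip_module(state_dict):
    """Return a copy of state_dict with every 'module' path segment removed."""
    cleaned = {}
    for key, value in state_dict.items():
        parts = [p for p in key.split(".") if p != "module"]
        cleaned[".".join(parts)] = value
    return cleaned

checkpoint = {
    "features.module.0.weight": "w0",   # hypothetical entries
    "features.module.0.bias": "b0",
    "classifier.4.weight": "w1",
}
remapped = strip_module(checkpoint)
```

The alternative is to wrap self.features in DataParallel in the model definition so that the key names match without remapping.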
First, thanks for your effort in re-implementing XNOR-Net.
When I read the code, I was confused by the purpose of line 88 and line 89 in /XNOR-Net-PyTorch/ImageNet/networks/util.py.
self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)
self.target_modules[index].grad.data = self.target_modules[index].grad.data.mul(1e+9)
I have read your notes about the gradient of the binary weight filter, and they are very elegant.
But I do not know what these two lines do.
I think 1.0-1.0/s[1] is nearly equal to 1, so you just multiply the gradient by n * (1e+9)?
Is it because the gradient is too small, and why did you choose the number 1e+9?
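To get a feel for the size of the first factor, here is a small arithmetic check. It assumes that s is the weight shape and that n is the number of elements in one filter (the product of s[1:]). Both are assumptions about the surrounding code, which is not quoted in full here. The example shape is the one nn.Conv2d(20, 50, kernel_size=5) uses for its weight, [50, 20, 5, 5]. grad_factor is a hypothetical helper.

```python
def grad_factor(shape):
    """Value of (1 - 1/s[1]) * n for a weight of the given shape."""
    n = 1
    for dim in shape[1:]:
        n *= dim                      # elements per filter
    return (1.0 - 1.0 / shape[1]) * n

conv2_factor = grad_factor([50, 20, 5, 5])
near_one_part = 1.0 - 1.0 / 20
```

For this shape, (1 - 1/s[1]) is 0.95 and the whole factor is about 475, so the product is dominated by n and the first part is only a small correction. This shows the magnitude only, not the reason the author chose this form.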
Could you kindly share the training logs for the NIN CIFAR-10 model? Maybe others as well. I would really appreciate that.
Thank you.
Hello!
I was attempting to understand your notes on Backward Gradient of Scaled Sign Function.
What is the bold W? Are those all the parameters of the network? (Assuming a fully connected network.)
W = [W1, W2,..., Wn]
Here, does the Wk denote a specific layer (kth layer) from the network? (k being any number from 1 to n)
Equation 4 seems to lead me to think that W is the weight for a specific layer, and Wi is the ith element from the W matrix.
I am pretty sure the W consists of all the layers, but honestly I want to be certain.
I ask because it is causing some ambiguity in the later equations.
Would really appreciate the help!
Thanks.
Thanks for your great job.
But I am confused by the code that implements equation (12).
In your code, ∂sign(Wi)/∂Wi = 0 if |Wi| > 1 and ∂sign(Wi)/∂Wi = 1 if |Wi| < 1. That doesn't match the true derivative, ∂sign(Wi)/∂Wi = 0 if Wi != 0. Is it just an approximation?
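The behaviour described here matches what is commonly called a straight-through estimator: since the true derivative of sign is zero almost everywhere, the backward pass substitutes a surrogate that passes the incoming gradient where the value lies in [-1, 1] and blocks it elsewhere. A minimal sketch in plain Python follows. ste_grad is a hypothetical helper, and the handling of the boundary |w| = 1 is an assumption.

```python
def ste_grad(weights, grad_output):
    """Pass the gradient where |w| <= 1 and block it elsewhere."""
    return [g if abs(w) <= 1.0 else 0.0 for w, g in zip(weights, grad_output)]

weights = [0.3, -1.7, 0.9, 2.0]          # hypothetical values
grad_output = [0.5, 0.5, -0.2, 0.1]      # hypothetical upstream gradient
grads = ste_grad(weights, grad_output)
```

So yes, under this reading it is a deliberate approximation, which is also why a numerical check such as gradcheck would not be expected to agree with it.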
Hi jiecaoyu,
I was trying to run the pretrained AlexNet model you provided but failed. I noticed that you used an older version of pytorch which makes the network arch incompatible with 3.0. So I switched to 2.0.
I also noticed that you read all the images from Caffe-generated LMDB files. I instead use PyTorch ImageFolder to read raw images and resize them to 256 by 256. I followed the same normalization, subtracting the per-pixel image mean. So after the transformation the data lies roughly between -128 and +128.
Is there anything else that I missed? I really appreciate your help on reproducing your results!
Thanks.