Comments (10)
You will need to make some changes in BayesianLayers.py and in the get_masks() function to prune the conv layers. With the current code, you can only prune linear layers.
def compute_posterior_params(self):
    weight_var, z_var = self.weight_logvar.exp(), self.z_logvar.exp()
    part1 = self.z_mu.pow(2)[:, None, None, None] * weight_var
    part2 = z_var[:, None, None, None] * self.weight_mu.pow(2)
    part3 = z_var[:, None, None, None] * weight_var
    self.post_weight_var = part1 + part2 + part3
    self.post_weight_mu = self.z_mu[:, None, None, None] * self.weight_mu
    return self.post_weight_mu, self.post_weight_var
To explain this in a bit more detail: z_mu and weight_var for LeNet-5's first conv layer have sizes (20) and (20,1,5,5) respectively, so multiplying them directly raises a broadcasting error. Indexing z_mu with [:, None, None, None] reshapes it to (20,1,1,1), which broadcasts correctly.
You will also need to change the get_masks() function to create masks for conv weights.
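The shape problem described above can be reproduced without the tutorial code. The following is a minimal sketch that uses NumPy arrays as stand-ins for the PyTorch tensors (NumPy and PyTorch follow the same broadcasting rules); the array contents are placeholders, and only the shapes (20) and (20,1,5,5) come from the comment.

```python
import numpy as np

out_channels, in_channels, kernel = 20, 1, 5
z_mu = np.ones(out_channels)                                       # shape (20,)
weight_var = np.ones((out_channels, in_channels, kernel, kernel))  # shape (20, 1, 5, 5)

# Broadcasting aligns trailing dimensions first, so (20,) is compared
# against the last axis of size 5 and the product fails.
try:
    z_mu ** 2 * weight_var
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

# Adding three singleton axes turns z_mu into (20, 1, 1, 1), which lines
# up with the output-channel axis of the weights.
part1 = (z_mu ** 2)[:, None, None, None] * weight_var

print(broadcast_ok, part1.shape)
```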
from tutorial_bayesiancompressionfordl.
Sure. This is the get_masks() function I am using. It accounts for the difference in weight shapes between conv layers and linear layers, as done in compute_posterior_params. It also flattens the mask of the last conv layer so that it can be combined with the mask of the first linear layer. The code should be mostly self-explanatory.
I think this should work with both CNNs and fully connected neural networks, although it could be simplified a bit more.
def get_masks(self, thresholds):
    weight_masks = []
    mask = None
    for i, (layer, threshold) in enumerate(zip(self.kl_list, thresholds)):
        # compute dropout mask
        if len(layer.weight_mu.shape) > 2:
            # conv layer: the dropout rates are per output channel
            if mask is None:
                mask = [True] * layer.in_channels
            else:
                mask = np.copy(next_mask)
            log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
            next_mask = log_alpha < threshold
            weight_mask = np.expand_dims(mask, axis=0) * np.expand_dims(next_mask, axis=1)
            weight_mask = weight_mask[:, :, None, None]
        else:
            # linear layer: the dropout rates are per input feature
            if mask is None:
                log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                mask = log_alpha < threshold
            elif len(weight_mask.shape) > 2:
                # first linear layer after a conv layer: expand the channel mask
                # to the flattened features (integer division, so repeat() gets an int)
                temp = next_mask.repeat(layer.in_features // next_mask.shape[0])
                log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                mask = log_alpha < threshold
                # mask = mask | temp  # upper bound for number of weights at first fully connected layer
                mask = mask & temp  # lower bound for number of weights at fully connected layer
            else:
                mask = np.copy(next_mask)
            try:
                log_alpha = self.kl_list[i + 1].get_log_dropout_rates().cpu().data.numpy()
                next_mask = log_alpha < thresholds[i + 1]
            except IndexError:
                # must be the last mask (assumes 10 output classes)
                next_mask = np.ones(10)
            weight_mask = np.expand_dims(mask, axis=0) * np.expand_dims(next_mask, axis=1)
        # np.float was removed from recent NumPy releases; the builtin float is equivalent
        weight_masks.append(weight_mask.astype(float))
    return weight_masks
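The two ideas in the function above (an outer product of channel masks for conv weights, and repeating the channel mask to match the flattened feature vector) can be tried in isolation. This is a minimal NumPy sketch under stated assumptions: the helper names conv_weight_mask and flatten_channel_mask are hypothetical and not part of the tutorial, the toy sizes are made up, and the flattening is assumed to be channel-major, as produced by out.view(out.size(0), -1).

```python
import numpy as np

def conv_weight_mask(in_mask, out_mask):
    """Hypothetical helper: outer product of channel masks, shaped (out, in, 1, 1)."""
    m = np.expand_dims(in_mask, axis=0) * np.expand_dims(out_mask, axis=1)
    return m[:, :, None, None].astype(float)

def flatten_channel_mask(channel_mask, in_features):
    """Hypothetical helper: repeat each channel flag once per spatial position."""
    return np.repeat(channel_mask, in_features // channel_mask.shape[0])

in_mask = np.array([True])                 # 1 input channel, kept
out_mask = np.array([True, False, True])   # 3 output channels, middle one pruned

wm = conv_weight_mask(in_mask, out_mask)   # broadcasts against (3, 1, 5, 5) weights
weights = np.ones((3, 1, 5, 5))
pruned = weights * wm                      # the filter of channel 1 becomes all zeros

# A 3-channel, 2x2 feature map flattens to 12 features, 4 per channel.
flat = flatten_channel_mask(out_mask, 3 * 2 * 2)

print(wm.shape, pruned[1].sum(), flat.astype(int))
```

Keeping the mask at (out, in, 1, 1) rather than the full kernel shape relies on broadcasting, so the same mask prunes whole filters regardless of kernel size.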
Hi Lyken17,
sorry for coming back to you so late. Notifications are activated now ;).
The first thing that pops into my mind is a pytorch version issue. Could you provide me a conda list or equivalent?
A complete example is included and you should be able to run it simply. What exactly are you missing in our tutorial?
Best,
Karen
Hi Karen,
The output of conda list is:
(test) ➜ Tutorial_BayesianCompressionForDL git:(master) ✗ conda list
# packages in environment at /home/ligeng/anaconda3/envs/test:
#
# Name Version Build Channel
ca-certificates 2018.03.07 0
certifi 2018.1.18 py36_0
cycler 0.10.0 <pip>
imageio 2.3.0 <pip>
kiwisolver 1.0.1 <pip>
libedit 3.1 heed3624_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 7.2.0 hdf63c60_3
libstdcxx-ng 7.2.0 hdf63c60_3
matplotlib 2.2.2 <pip>
ncurses 6.0 h9df7e31_2
numpy 1.14.2 <pip>
openssl 1.0.2o h20670df_0
pandas 0.22.0 <pip>
Pillow 5.1.0 <pip>
pip 9.0.3 py36_0
pyparsing 2.2.0 <pip>
python 3.6.5 hc3d631a_0
python-dateutil 2.7.2 <pip>
pytz 2018.4 <pip>
PyYAML 3.12 <pip>
readline 7.0 ha6073c6_4
scipy 1.0.1 <pip>
seaborn 0.8.1 <pip>
setuptools 39.0.1 py36_0
six 1.11.0 <pip>
sqlite 3.22.0 h1bed415_0
tk 8.6.7 hc745277_3
torch 0.3.1 <pip>
torchvision 0.2.0 <pip>
wheel 0.31.0 py36_0
xz 5.2.3 h55aa19d_2
zlib 1.2.11 ha838bed_2
When I try to run the LeNet example with python example.py, I get the following error:
(test) ➜ Tutorial_BayesianCompressionForDL git:(master) ✗ python example.py
Traceback (most recent call last):
  File "example.py", line 193, in <module>
    main()
  File "example.py", line 37, in main
    transforms.ToTensor(),lambda x: 2 * (x - 0.5),
  File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 53, in __init__
    os.path.join(self.root, self.processed_folder, self.training_file))
  File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/serialization.py", line 267, in load
    return _load(f, map_location, pickle_module)
  File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/serialization.py", line 420, in _load
    result = unpickler.load()
AttributeError: Can't get attribute '_rebuild_tensor_v2' on <module 'torch._utils' from '/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/_utils.py'>
Oops, the error is different from what I saw two months ago. I guess there has been some API update in Torch.
After solving some compatibility issues, I modified the network to LeNet and re-ran python example.py.
The network structure is:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = BayesianLayers.Conv2dGroupNJ(1, 6, 5)
        self.conv2 = BayesianLayers.Conv2dGroupNJ(6, 16, 5)
        # activation
        self.relu = nn.ReLU()
        # layers
        self.fc1 = BayesianLayers.LinearGroupNJ(16*5*5, 120, clip_var=0.04, cuda=FLAGS.cuda)
        self.fc2 = BayesianLayers.LinearGroupNJ(120, 84, cuda=FLAGS.cuda)
        self.fc3 = BayesianLayers.LinearGroupNJ(84, 10, cuda=FLAGS.cuda)
        # layers including kl_divergence
        self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2, self.fc3]

    def forward(self, x):
        # x = x.view(-1, 28 * 28)
        # x = self.relu(self.fc1(x))
        # x = self.relu(self.fc2(x))
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out
The command line output is:
(test) ➜ Tutorial_BayesianCompressionForDL git:(master) ✗ python example.py
Traceback (most recent call last):
  File "example.py", line 217, in <module>
    main()
  File "example.py", line 176, in main
    train(epoch)
  File "example.py", line 147, in train
    output = model(data)
  File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "example.py", line 90, in forward
    out = F.relu(self.fc1(out))
  File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ligeng/Public/Developing/Tutorial_BayesianCompressionForDL/BayesianLayers.py", line 126, in forward
    xz = x * z
RuntimeError: The size of tensor a (256) must match the size of tensor b (400) at non-singleton dimension 1
The modified example.py is uploaded as a gist: https://gist.github.com/Lyken17/8e0cae9a9aa6911190fd1b580ca75296
I can run the original example without problems, but I cannot figure out the proper way to run it with convolutional layers. Could you show an example of pruning LeNet?
Hi Lyken17,
the problem you are experiencing has little to do with the Bayesian layers; it is a shape mismatch. For 28x28 MNIST inputs, the feature map that reaches fc1 (the output of 'conv2' after max pooling) is 16x4x4, not 16x5x5. If you change the input size of fc1 accordingly, it should run. Additionally, I recommend passing the cuda status to all layers.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # activation
        self.relu = nn.ReLU()
        # layers
        self.conv1 = BayesianLayers.Conv2dGroupNJ(1, 6, 5, cuda=FLAGS.cuda)
        self.conv2 = BayesianLayers.Conv2dGroupNJ(6, 16, 5, cuda=FLAGS.cuda)
        self.fc1 = BayesianLayers.LinearGroupNJ(16*4*4, 120, clip_var=0.04, cuda=FLAGS.cuda)
        self.fc2 = BayesianLayers.LinearGroupNJ(120, 84, cuda=FLAGS.cuda)
        self.fc3 = BayesianLayers.LinearGroupNJ(84, 10, cuda=FLAGS.cuda)
        # layers including kl_divergence
        self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2, self.fc3]

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out
Runs for me!
I will also add a requirements file so that we do not run into trouble with pytorch's API changes.
Cheers,
Karen
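For anyone checking the fc1 input size by hand, the spatial size can be traced through the layers with the usual conv and pooling formulas. This is a small sketch, assuming 28x28 MNIST inputs, stride-1 convolutions, and max pooling whose stride equals its kernel size (the default for F.max_pool2d); the helper names conv_out and pool_out are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Output width/height of max pooling with stride equal to the kernel size."""
    return size // kernel

size = 28
size = pool_out(conv_out(size, 5), 2)   # conv1: 28 -> 24, pool: 24 -> 12
size = pool_out(conv_out(size, 5), 2)   # conv2: 12 -> 8,  pool: 8 -> 4
fc1_in = 16 * size * size               # 16 channels from conv2

# Same trace with padding=2, as in the 16/36-channel network posted in this thread.
padded = pool_out(conv_out(pool_out(conv_out(28, 5, padding=2), 2), 5, padding=2), 2)

print(size, fc1_in, padded)
```

The value 256 matches "tensor a (256)" in the traceback, while 400 is the 16*5*5 that fc1 was declared with.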
@KarenUllrich
The network trains fine with convolutional layers, but the compression.py functions do not work for convolutional weights/filters. I have made some changes in compute_posterior_params to compute post_weight_mu and post_weight_var correctly for convolutional layers.
I still get an error in extract_pruned_params because the mask and post_weight_mu for conv layer 1 have different sizes. To be specific, in the above example post_weight_mu has size (6,1,5,5) whereas the mask has size (16,6). It looks like get_masks() needs to be changed as well to produce the correct masks for convolutional filters.
Is that right?
Hi,
I am having the same issue. The conv network trains but I am unable to get compression rates - same error as above. Here is a snippet to reproduce - part of example.py.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # activation
        self.relu = nn.ReLU()
        # layers
        self.conv1 = BayesianLayers.Conv2dGroupNJ(1, 16, 5, cuda=FLAGS.cuda, padding=2)
        self.conv2 = BayesianLayers.Conv2dGroupNJ(16, 36, 5, cuda=FLAGS.cuda, padding=2)
        self.fc1 = BayesianLayers.LinearGroupNJ(36 * 7 * 7, 128, clip_var=0.04, cuda=FLAGS.cuda)
        self.fc2 = BayesianLayers.LinearGroupNJ(128, 10, cuda=FLAGS.cuda)
        # pool
        self.pool = nn.MaxPool2d((2, 2))
        # layers including kl_divergence
        self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2]

    def forward(self, x):
        x = x.view(-1, 1, 28, 28)
        x = self.conv1(x)
        x = self.pool(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.pool(x)
        x = self.relu(x)
        x = x.view(-1, 36*7*7)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x
I run python convMLP.py --batchsize 64 --epochs 1
I get
Epoch: 1 Train loss: 15.456320
Test loss: 0.0380, Accuracy: 9883/10000 (98.83%)
Traceback (most recent call last):
  File "convMLP.py", line 204, in <module>
    main()
  File "convMLP.py", line 181, in main
    compute_compression_rate(layers, model.get_masks(thresholds))
  File "compression.py", line 119, in compute_compression_rate
    weight_mus, weight_vars = extract_pruned_params(layers, masks)
  File "compression.py", line 83, in extract_pruned_params
    post_weight_mu, post_weight_var = layer.compute_posterior_params()
  File "BayesianLayers.py", line 251, in compute_posterior_params
    self.post_weight_var = self.z_mu.pow(2) * weight_var + z_var * self.weight_mu.pow(2) + z_var * weight_var
RuntimeError: The size of tensor a (16) must match the size of tensor b (5) at non-singleton dimension 3
In your paper you show compression rates for VGG and other convolutional architectures, and that is what I am trying to reproduce. Help!
Aswin
Thank you @gullalc for your answer. Do you know what the changed get_masks() would be?
EDIT: It would be great if you issue a PR with those changes to conv and hopefully the authors will merge the changes.
EDIT2: Thank you for adding an explanation. It would be great if @KarenUllrich can comment.