karenullrich / tutorial_bayesiancompressionfordl
A tutorial on "Bayesian Compression for Deep Learning" published at NIPS (2017).
License: MIT License
After the model has been trained with the group horseshoe and half-Cauchy scale priors, I want to prune the parameters, but I find it hard to choose the pruning threshold.
Do you have any trick for picking the threshold? Thanks very much.
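One common approach (a sketch, not an official answer from the repo) is to sweep candidate cutoffs on the log dropout rates log(alpha) = log(sigma^2 / mu^2) that the repo's layers expose via get_log_dropout_rates(), and pick the most aggressive threshold that keeps validation accuracy acceptable:

```python
# A minimal sketch, assuming the repo's get_log_dropout_rates() API.
# Units with log(alpha) <= threshold are kept; the candidate values
# below are illustrative, not recommendations from the paper.
import numpy as np

def sweep_thresholds(layer, candidates=(-5., -4., -3., -2., -1., 0.)):
    """Report how many units each candidate log-alpha threshold keeps."""
    log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
    for t in candidates:
        kept = int((log_alpha <= t).sum())
        print("threshold %+.1f: keep %d/%d units" % (t, kept, log_alpha.size))
```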
@KarenUllrich
The Linear layer uses this:
KLD_element = -0.5 * self.weight_logvar + 0.5 * (self.weight_logvar.exp() + self.weight_mu.pow(2)) - 0.5
But the convolutional layers use this:
KLD_element = -self.weight_logvar + 0.5 * (self.weight_logvar.exp().pow(2) + self.weight_mu.pow(2)) - 0.5
The second appears to match equation 8 of the paper, so is it just a mistake in the Linear layer?
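For reference, the standard closed form for the KL divergence between a Gaussian posterior and a standard-normal prior, written in terms of the stored log variance, is:

```latex
% KL between q(w) = N(mu, sigma^2) and p(w) = N(0, 1),
% with logvar = log(sigma^2):
D_{KL}\big(\mathcal{N}(\mu,\sigma^2)\,\|\,\mathcal{N}(0,1)\big)
  = -\tfrac{1}{2}\log\sigma^2 + \tfrac{1}{2}\left(\sigma^2 + \mu^2\right) - \tfrac{1}{2}
```

With logvar = log(sigma^2), this matches the Linear-layer expression; the convolutional expression only matches if logvar is interpreted as log(sigma).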
Hi author,
Thanks for sharing the code. I am very interested in this work. When I test compression on LeNet, it raises a "dimension not match" error. Could you share an example of compressing a neural network with convolutional layers?
Unless I am missing something, there is a slight error in the kl_divergence() definition in the _ConvNdGroupNJ class.
KLD_element = -self.weight_logvar + 0.5 * (self.weight_logvar.exp().pow(2) + self.weight_mu.pow(2)) - 0.5
treats self.weight_logvar as if it were the log std instead of the log variance. The correct expression should be (as in LinearGroupNJ):
KLD_element = -0.5 * self.weight_logvar + 0.5 * (self.weight_logvar.exp() + self.weight_mu.pow(2)) - 0.5
The convolutional layer seems to be missing a factor of 0.5 in front of the log-variance term of the KL divergence; the dense layer does have this factor.
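A quick numerical sanity check (my own sketch, not code from the repo) confirms which expression is correct:

```python
# Compare both KLD_element variants against the analytic
# KL( N(mu, sigma^2) || N(0, 1) ) for a single weight.
import torch

mu = torch.tensor(0.3)
sigma = torch.tensor(0.5)
logvar = torch.log(sigma ** 2)  # log variance, as stored by the layers

analytic = -torch.log(sigma) + 0.5 * (sigma ** 2 + mu ** 2) - 0.5
linear_style = -0.5 * logvar + 0.5 * (logvar.exp() + mu ** 2) - 0.5
conv_style = -logvar + 0.5 * (logvar.exp().pow(2) + mu ** 2) - 0.5

print(analytic.item(), linear_style.item(), conv_style.item())
# linear_style agrees with the analytic value; conv_style only agrees
# if logvar is interpreted as log(sigma) rather than log(sigma^2).
```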
Hi @KarenUllrich ,
I find that your code does not eliminate the corresponding bias when pruning weights, which I think may lead to high reported performance that is not necessarily real.
Maybe I have missed some important part of your code, or perhaps the influence of the bias is not that significant? A sketch of one way to handle it follows below.
Looking forward to discussing this with you!
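For concreteness, here is one way biases could be pruned consistently with the weight masks (a hypothetical helper, not part of the repo):

```python
# A hedged sketch: derive bias masks from the weight masks returned by
# get_masks(), so that a fully pruned output unit also loses its bias.
import numpy as np

def bias_masks_from_weight_masks(weight_masks):
    """A unit's bias survives only if at least one of its incoming
    weights survives; otherwise the bias is pruned with the unit."""
    bias_masks = []
    for wm in weight_masks:
        # weight mask shape: (out, in) or (out, in, kh, kw)
        flat = wm.reshape(wm.shape[0], -1)
        bias_masks.append((flat.sum(axis=1) > 0).astype(float))
    return bias_masks
```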
It works fine for small-scale models like an MLP or LeNet-5. However, for VGG-16/ResNet-18 it always produces a NaN loss.
The model structure configuration is below:
```python
import numpy as np
import torch.nn as nn
import BayesianLayer  # the repo's LinearGroupNJ / Conv2dGroupNJ layers

cfg = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

class VGG_CIFAR10_BAY(nn.Module):
    def __init__(self, vgg_name):
        super(VGG_CIFAR10_BAY, self).__init__()
        self.kl_list = []  # instance attribute, so layers are not shared across models
        self.features = self._make_layers(cfg[vgg_name])
        linear_index = BayesianLayer.LinearGroupNJ(512, 10, clip_var=0.04, cuda=True)
        self.classifier = linear_index
        self.kl_list.append(linear_index)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                conv_index = BayesianLayer.Conv2dGroupNJ(in_channels, x, kernel_size=3,
                                                         padding=1, clip_var=0.04, cuda=True)
                layers += [conv_index,
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                self.kl_list.append(conv_index)
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

    def get_masks(self, thresholds):
        weight_masks = []
        mask = None
        layers = self.kl_list
        for i, (layer, threshold) in enumerate(zip(layers, thresholds)):
            # compute the dropout mask per layer from the log dropout rates
            if len(layer.weight_mu.shape) > 2:  # convolutional layer
                if mask is None:
                    mask = [True] * layer.in_channels
                else:
                    mask = np.copy(next_mask)
                log_alpha = layers[i].get_log_dropout_rates().cpu().data.numpy()
                next_mask = log_alpha <= thresholds[i]
                weight_mask = np.expand_dims(mask, axis=0) * np.expand_dims(next_mask, axis=1)
                weight_mask = weight_mask[:, :, None, None]
            else:  # fully connected layer
                if mask is None:
                    log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                    mask = log_alpha <= threshold
                elif len(weight_mask.shape) > 2:
                    # first FC layer after a conv layer: tile the channel
                    # mask over the flattened spatial positions
                    temp = next_mask.repeat(layer.in_features // next_mask.shape[0])
                    log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                    mask = log_alpha <= threshold
                    # mask = mask | temp  # upper bound for number of weights at first FC layer
                    mask = mask & temp    # lower bound for number of weights at FC layer
                else:
                    mask = np.copy(next_mask)
                try:
                    log_alpha = layers[i + 1].get_log_dropout_rates().cpu().data.numpy()
                    next_mask = log_alpha <= thresholds[i + 1]
                except IndexError:
                    # must be the last layer, so keep all output units
                    next_mask = np.ones(10)
                weight_mask = np.expand_dims(mask, axis=0) * np.expand_dims(next_mask, axis=1)
            weight_masks.append(weight_mask.astype(float))
        return weight_masks

    def model_kl_div(self):
        KLD = 0
        for layer in self.kl_list:
            KLD += layer.layer_kl_div()
        return KLD
```
Is it caused by high variance? I have tried clipping the variance, but it doesn't help...
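One stabilization trick worth trying (my own sketch, not something from the repo): warm up the KL term, since NaNs in deeper nets often appear when the KL term dominates early and inflates the variances before the likelihood has shaped the means. The function name, warm-up schedule, and N (training-set size) below are assumptions for illustration:

```python
import torch.nn.functional as F

def elbo_loss(output, target, model, epoch, N, warmup_epochs=10):
    """Discrimination loss plus a linearly warmed-up KL term.
    model.model_kl_div() is the method from the snippet above; N is
    the training-set size used to weight the KL contribution."""
    beta = min(1.0, epoch / warmup_epochs)  # anneal KL weight from 0 to 1
    discrimination = F.cross_entropy(output, target)
    return discrimination + beta * model.model_kl_div() / N
```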
@clouizos @KarenUllrich
How is the equation derived according to
Can you show an example of bayesianLayers.Conv2dGroupNJ()? I want to use this conv layer in my network, but I get an error.
Thanks for your help.
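Here is a minimal usage sketch, modeled on the repo's LeNet-5 MNIST example; the module name BayesianLayers and the exact hyperparameters are assumptions, so treat this as an illustration rather than a verified script:

```python
import torch.nn as nn
import torch.nn.functional as F
from BayesianLayers import Conv2dGroupNJ, LinearGroupNJ

class SmallConvNet(nn.Module):
    def __init__(self):
        super(SmallConvNet, self).__init__()
        self.conv1 = Conv2dGroupNJ(1, 20, kernel_size=5)
        self.conv2 = Conv2dGroupNJ(20, 50, kernel_size=5)
        self.fc1 = LinearGroupNJ(50 * 4 * 4, 500, clip_var=0.04)
        self.fc2 = LinearGroupNJ(500, 10)
        # layers whose KL terms enter the variational bound
        self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2]

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # flatten: this is where dimension mismatches usually surface;
        # 50 * 4 * 4 is correct for 28x28 inputs with these conv layers
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```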