karenullrich / tutorial_bayesiancompressionfordl
A tutorial on "Bayesian Compression for Deep Learning" published at NIPS (2017).
License: MIT License
After the model has been trained with the group horseshoe and half-Cauchy scale priors, I want to prune the parameters, but I find it hard to choose the pruning threshold.
Do you have any trick for picking the threshold? Thanks very much.
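One common approach (a sketch, not an official answer from the repo) is to sweep candidate cutoffs on the log dropout rates log(alpha) = log(sigma^2 / mu^2) that the repo's layers expose via get_log_dropout_rates(), and pick the most aggressive threshold that keeps validation accuracy acceptable:

```python
# A minimal sketch, assuming the repo's get_log_dropout_rates() API.
# Units with log(alpha) <= threshold are kept; the candidate values
# below are illustrative, not recommendations from the paper.
import numpy as np

def sweep_thresholds(layer, candidates=(-5., -4., -3., -2., -1., 0.)):
    """Report how many units each candidate log-alpha threshold keeps."""
    log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
    for t in candidates:
        kept = int((log_alpha <= t).sum())
        print("threshold %+.1f: keep %d/%d units" % (t, kept, log_alpha.size))
```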
@KarenUllrich
The Linear layer uses this:
KLD_element = -0.5 * self.weight_logvar + 0.5 * (self.weight_logvar.exp() + self.weight_mu.pow(2)) - 0.5
But the convolutional layers use this:
KLD_element = -self.weight_logvar + 0.5 * (self.weight_logvar.exp().pow(2) + self.weight_mu.pow(2)) - 0.5
The second appears to match equation 8 of the paper, so is it just a mistake in the Linear layer?
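For reference, the standard closed form for the KL divergence between a Gaussian posterior and a standard-normal prior, written in terms of the stored log variance, is:

```latex
% KL between q(w) = N(mu, sigma^2) and p(w) = N(0, 1),
% with logvar = log(sigma^2):
D_{KL}\big(\mathcal{N}(\mu,\sigma^2)\,\|\,\mathcal{N}(0,1)\big)
  = -\tfrac{1}{2}\log\sigma^2 + \tfrac{1}{2}\left(\sigma^2 + \mu^2\right) - \tfrac{1}{2}
```

With logvar = log(sigma^2), this matches the Linear-layer expression; the convolutional expression only matches if logvar is interpreted as log(sigma).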
Hi author,
Thanks for sharing the code. I am very interested in this work. When I test compression on LeNet, it raises a "dimension not match" error. Could you share an example of compressing a neural network with convolutional layers?
Unless I am missing something, there is a slight error in the kl_divergence() definition in the _ConvNdGroupNJ class.
KLD_element = -self.weight_logvar + 0.5 * (self.weight_logvar.exp().pow(2) + self.weight_mu.pow(2)) - 0.5
treats self.weight_logvar as if it were the log std instead of the log variance. The correct expression should be (as in LinearGroupNJ):
KLD_element = -0.5 * self.weight_logvar + 0.5 * (self.weight_logvar.exp() + self.weight_mu.pow(2)) - 0.5
The convolutional layer seems to be missing a factor of 0.5 in front of the log-variance term of the KL divergence; the dense layer does have this factor.
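A quick numerical sanity check (my own sketch, not code from the repo) confirms which expression is correct:

```python
# Compare both KLD_element variants against the analytic
# KL( N(mu, sigma^2) || N(0, 1) ) for a single weight.
import torch

mu = torch.tensor(0.3)
sigma = torch.tensor(0.5)
logvar = torch.log(sigma ** 2)  # log variance, as stored by the layers

analytic = -torch.log(sigma) + 0.5 * (sigma ** 2 + mu ** 2) - 0.5
linear_style = -0.5 * logvar + 0.5 * (logvar.exp() + mu ** 2) - 0.5
conv_style = -logvar + 0.5 * (logvar.exp().pow(2) + mu ** 2) - 0.5

print(analytic.item(), linear_style.item(), conv_style.item())
# linear_style agrees with the analytic value; conv_style only agrees
# if logvar is interpreted as log(sigma) rather than log(sigma^2).
```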
Hi @KarenUllrich ,
I find that your code does not eliminate the corresponding bias when pruning weights, which I think may lead to high reported performance that is not necessarily real.
Maybe I have missed some important part of your code, or perhaps the influence of the bias is not that significant? A sketch of one way to handle it follows below.
Looking forward to discussing this with you!
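For concreteness, here is one way biases could be pruned consistently with the weight masks (a hypothetical helper, not part of the repo):

```python
# A hedged sketch: derive bias masks from the weight masks returned by
# get_masks(), so that a fully pruned output unit also loses its bias.
import numpy as np

def bias_masks_from_weight_masks(weight_masks):
    """A unit's bias survives only if at least one of its incoming
    weights survives; otherwise the bias is pruned with the unit."""
    bias_masks = []
    for wm in weight_masks:
        # weight mask shape: (out, in) or (out, in, kh, kw)
        flat = wm.reshape(wm.shape[0], -1)
        bias_masks.append((flat.sum(axis=1) > 0).astype(float))
    return bias_masks
```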
It works fine for small-scale models like an MLP or LeNet-5. However, for VGG-16/ResNet-18 it always produces a NaN loss.
The model structure configuration is below:
```python
import numpy as np
import torch.nn as nn
import BayesianLayer  # the repo's LinearGroupNJ / Conv2dGroupNJ layers

cfg = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

class VGG_CIFAR10_BAY(nn.Module):
    def __init__(self, vgg_name):
        super(VGG_CIFAR10_BAY, self).__init__()
        self.kl_list = []  # instance attribute, so layers are not shared across models
        self.features = self._make_layers(cfg[vgg_name])
        linear_index = BayesianLayer.LinearGroupNJ(512, 10, clip_var=0.04, cuda=True)
        self.classifier = linear_index
        self.kl_list.append(linear_index)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                conv_index = BayesianLayer.Conv2dGroupNJ(in_channels, x, kernel_size=3,
                                                         padding=1, clip_var=0.04, cuda=True)
                layers += [conv_index,
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                self.kl_list.append(conv_index)
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

    def get_masks(self, thresholds):
        weight_masks = []
        mask = None
        layers = self.kl_list
        for i, (layer, threshold) in enumerate(zip(layers, thresholds)):
            # compute the dropout mask per layer from the log dropout rates
            if len(layer.weight_mu.shape) > 2:  # convolutional layer
                if mask is None:
                    mask = [True] * layer.in_channels
                else:
                    mask = np.copy(next_mask)
                log_alpha = layers[i].get_log_dropout_rates().cpu().data.numpy()
                next_mask = log_alpha <= thresholds[i]
                weight_mask = np.expand_dims(mask, axis=0) * np.expand_dims(next_mask, axis=1)
                weight_mask = weight_mask[:, :, None, None]
            else:  # fully connected layer
                if mask is None:
                    log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                    mask = log_alpha <= threshold
                elif len(weight_mask.shape) > 2:
                    # first FC layer after a conv layer: tile the channel
                    # mask over the flattened spatial positions
                    temp = next_mask.repeat(layer.in_features // next_mask.shape[0])
                    log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                    mask = log_alpha <= threshold
                    # mask = mask | temp  # upper bound for number of weights at first FC layer
                    mask = mask & temp    # lower bound for number of weights at FC layer
                else:
                    mask = np.copy(next_mask)
                try:
                    log_alpha = layers[i + 1].get_log_dropout_rates().cpu().data.numpy()
                    next_mask = log_alpha <= thresholds[i + 1]
                except IndexError:
                    # must be the last layer, so keep all output units
                    next_mask = np.ones(10)
                weight_mask = np.expand_dims(mask, axis=0) * np.expand_dims(next_mask, axis=1)
            weight_masks.append(weight_mask.astype(float))
        return weight_masks

    def model_kl_div(self):
        KLD = 0
        for layer in self.kl_list:
            KLD += layer.layer_kl_div()
        return KLD
```
Is it caused by high variance? I have tried clipping the variance, but it doesn't help...
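One stabilization trick worth trying (my own sketch, not something from the repo): warm up the KL term, since NaNs in deeper nets often appear when the KL term dominates early and inflates the variances before the likelihood has shaped the means. The function name, warm-up schedule, and N (training-set size) below are assumptions for illustration:

```python
import torch.nn.functional as F

def elbo_loss(output, target, model, epoch, N, warmup_epochs=10):
    """Discrimination loss plus a linearly warmed-up KL term.
    model.model_kl_div() is the method from the snippet above; N is
    the training-set size used to weight the KL contribution."""
    beta = min(1.0, epoch / warmup_epochs)  # anneal KL weight from 0 to 1
    discrimination = F.cross_entropy(output, target)
    return discrimination + beta * model.model_kl_div() / N
```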
@clouizos @KarenUllrich
How is the equation derived according to
Can you show an example of bayesianLayers.Conv2dGroupNJ()? I want to use this conv layer in my network, but I get an error.
Thanks for your help.
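Here is a minimal usage sketch, modeled on the repo's LeNet-5 MNIST example; the module name BayesianLayers and the exact hyperparameters are assumptions, so treat this as an illustration rather than a verified script:

```python
import torch.nn as nn
import torch.nn.functional as F
from BayesianLayers import Conv2dGroupNJ, LinearGroupNJ

class SmallConvNet(nn.Module):
    def __init__(self):
        super(SmallConvNet, self).__init__()
        self.conv1 = Conv2dGroupNJ(1, 20, kernel_size=5)
        self.conv2 = Conv2dGroupNJ(20, 50, kernel_size=5)
        self.fc1 = LinearGroupNJ(50 * 4 * 4, 500, clip_var=0.04)
        self.fc2 = LinearGroupNJ(500, 10)
        # layers whose KL terms enter the variational bound
        self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2]

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # flatten: this is where dimension mismatches usually surface;
        # 50 * 4 * 4 is correct for 28x28 inputs with these conv layers
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```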