
synchronized-batchnorm-pytorch's Introduction

Synchronized-BatchNorm-PyTorch

IMPORTANT: Please read the "Implementation details and highlights" section before use.

Synchronized Batch Normalization implementation in PyTorch.

This module differs from the built-in PyTorch BatchNorm as the mean and standard-deviation are reduced across all devices during training.

For example, when one uses nn.DataParallel to wrap the network during training, PyTorch's implementation normalizes the tensor on each device using only the statistics from that device, which speeds up computation and is easy to implement, but the statistics might be inaccurate. Instead, in this synchronized version, the statistics are computed over all training samples distributed across the devices.

Note that, for the one-GPU or CPU-only case, this module behaves exactly the same as the built-in PyTorch implementation.

This module is currently only a prototype version for research use. As mentioned below, it has its limitations and may even suffer from some design problems. If you have any questions or suggestions, please feel free to open an issue or submit a pull request.

Why Synchronized BatchNorm?

Although the typical implementation of BatchNorm working on multiple devices (GPUs) is fast (with no communication overhead), it inevitably reduces the effective batch size per device, which can degrade performance. This is not a significant issue in some standard vision tasks such as ImageNet classification (as the batch size per device is usually large enough to obtain good statistics). However, it hurts performance in tasks where the batch size per device is very small (e.g., 1 per GPU).

For example, the importance of synchronized batch normalization in object detection has recently been demonstrated with an extensive analysis in the paper MegDet: A Large Mini-Batch Object Detector.

Usage

To use Synchronized Batch Normalization, we add a data parallel replication callback. This introduces a slight difference from the typical usage of nn.DataParallel.

Use it with a provided, customized data parallel wrapper:

from sync_batchnorm import SynchronizedBatchNorm1d, DataParallelWithCallback

sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)
sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1])

Or, if you are using a customized data parallel module, you can use this library via monkey patching.

from torch.nn import DataParallel  # or your customized DataParallel module
from sync_batchnorm import SynchronizedBatchNorm1d, patch_replication_callback

sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)
sync_bn = DataParallel(sync_bn, device_ids=[0, 1])
patch_replication_callback(sync_bn)  # monkey-patching

You can use convert_model to convert your model to use Synchronized BatchNorm easily.

import torch.nn as nn
from torchvision import models
from sync_batchnorm import convert_model
# m is a standard pytorch model
m = models.resnet18(True)
m = nn.DataParallel(m)
# after convert, m is using SyncBN
m = convert_model(m)

See also tests/test_sync_batchnorm.py for numeric result comparison.

Implementation details and highlights

If you are interested in how batch statistics are reduced and broadcasted among multiple devices, please take a look at the code with detailed comments. Here we only emphasize some highlights of the implementation:

  • This implementation is in pure Python; no extra C++ extension libraries are required.
  • Easy to use, as demonstrated above.
  • It uses the unbiased variance to update the moving average, and uses sqrt(max(var, eps)) instead of sqrt(var + eps); see the sketch after this list.
  • The implementation requires that each module on the different devices invoke the batchnorm exactly the SAME number of times in each forward pass. For example, you cannot call batchnorm only on GPU 0 but not on GPU 1. The i-th (i = 1, 2, 3, ...) call of the batchnorm on each device is treated as a whole, and the statistics are reduced over it. This is tricky but is a good way to handle PyTorch's dynamic computation graph. Although it sounds complicated, this is usually not an issue for most models.
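
To make the third bullet concrete, here is a minimal sketch (illustrative only, not the library's actual code; names are made up) of the two numeric choices: unbiased variance for the running statistics, and sqrt(max(var, eps)) for the normalization denominator.

import torch

def sketch_batchnorm_step(x, weight, bias, running_mean, running_var,
                          momentum=0.1, eps=1e-5):
    # x: a batch flattened to shape (num_elements, channels)
    mean = x.mean(dim=0)
    biased_var = x.var(dim=0, unbiased=False)    # used to normalize the current batch
    unbiased_var = x.var(dim=0, unbiased=True)   # used to update the moving average

    running_mean.mul_(1 - momentum).add_(momentum * mean.detach())
    running_var.mul_(1 - momentum).add_(momentum * unbiased_var.detach())

    # sqrt(max(var, eps)) rather than sqrt(var + eps)
    inv_std = 1.0 / biased_var.clamp(min=eps).sqrt()
    return (x - mean) * inv_std * weight + bias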

Known issues

Runtime error on backward pass.

Due to a PyTorch bug, using old PyTorch releases will trigger a RuntimeError with messages like:

Assertion `pos >= 0 && pos < buffer.size()` failed.

This has already been fixed in the newest PyTorch repo, which, unfortunately, has not yet been included in the official or Anaconda binary releases. Thus, you need to build the PyTorch package from source according to the instructions here.

Numeric error.

Because this library does not fuse the normalization and statistics operations in C++ (nor CUDA), it is less numerically stable compared to the original PyTorch implementation. Detailed analysis can be found in tests/test_sync_batchnorm.py.

Authors and License:

Copyright (c) 2018-, Jiayuan Mao.

Contributors: Tete Xiao, DTennant.

Distributed under MIT License (See LICENSE)


synchronized-batchnorm-pytorch's Issues

Training stuck

Hi ~
I use

if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
    model = bnconvert(model)
    model.cuda()

to use sync-bn during multi-GPU training, but the training procedure seems to get stuck at the final batch of each epoch.


Shall I use your `DataParallelWithCallback` instead of PyTorch's `nn.DataParallel`?

Hi,

I do not quite understand the examples you released. What if nn.BatchNorm2d is part of a module?
For example, if I use your SynchronizedBatchNorm2d to replace the nn.BatchNorm2d in the resnet50 model, shall I use net = DataParallelWithCallback(net) instead of net = nn.DataParallel(net)? Or do I just need to use DataParallelWithCallback to wrap your implemented SynchronizedBatchNorm2d?
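
For reference, the README usage above suggests wrapping the whole network, not just the BN layer; a minimal sketch of the convert_model route (resnet50 is just an example model here):

import torch.nn as nn
from torchvision import models
from sync_batchnorm import convert_model

net = models.resnet50()                  # any model that contains nn.BatchNorm2d
net = nn.DataParallel(net, device_ids=[0, 1])
net = convert_model(net)                 # replaces the BN layers and re-wraps with DataParallelWithCallback
net = net.cuda()

Alternatively, if the BN layers are replaced by hand with SynchronizedBatchNorm2d, the whole net should then be wrapped with DataParallelWithCallback instead of nn.DataParallel.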

Is it more difficult to train the model?

When I use this sync-BN in my model, training becomes slower and even stops. I don't know why this happens. Is it I/O load, or some other reason? Can you help me?

Unstable performance between adjacent epochs during test phase

Hi, thanks very much for your code.

However, trained with the sync-bn layer, my model seems unstable between adjacent epochs during the test phase, i.e., I tested my model on the test dataset every two epochs, and the performance curve oscillates severely.

PS. I used the default momentum of 0.1; does it seem too large?
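
Not an official answer, but since the layer mirrors nn.BatchNorm2d's interface, one option worth trying is a smaller momentum so the running statistics change more slowly between evaluations; a minimal sketch:

from sync_batchnorm import SynchronizedBatchNorm2d

# momentum follows the same convention as nn.BatchNorm2d:
# running_stat = (1 - momentum) * running_stat + momentum * batch_stat
sync_bn = SynchronizedBatchNorm2d(64, eps=1e-5, momentum=0.01)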

RuntimeError with convert_model - "found one of them on device: cpu"

Hi!

I tried to use convert_model to convert my model to use synchronized BatchNorm, but I got the following error.

File "/home/xxx/anaconda3/envs/pt1.2/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
    "them on device: {}".format(self.src_device_obj, t.device))

RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

Here are the codes:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = net()  # my model

if args.use_gpu:
    model = torch.nn.DataParallel(model, device_ids=gpus).to(device)

model = convert_model(model)

Have I done something wrong? Also, the code runs fine when convert_model is not used.
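
Only a guess, not a confirmed diagnosis: convert_model creates fresh SynchronizedBatchNorm modules whose buffers may start out on the CPU, so one thing worth trying (reusing the names net, args, and gpus from the snippet above) is to move the model to the device again after conversion:

import torch
import torch.nn as nn
from sync_batchnorm import convert_model

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = net()  # my model, as in the snippet above

if args.use_gpu:
    model = torch.nn.DataParallel(model, device_ids=gpus)

model = convert_model(model)
model = model.to(device)  # move *after* conversion so the new BN buffers land on the GPU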

Weird things happen when applied to FPN

Hi, thanks a lot for your code.

But when I apply this code to my implemented end-to-end version of FPN, some weird things happen.

If I use 8 cards, the GPU memory keeps increasing until one card occupies all of its memory. Then FPN gets stuck and the GPU utilization of the other, not-full cards stays at 100%.

If I use 4 cards, the memory keeps increasing and then FPN may get stuck with all GPU utilization at 0.

I used PyTorch 0.4.0 with your DataParallelWithCallback, and the input image size is different on different cards. If I use the BN from official PyTorch, my code works well.

Could you please give me any hints to help me find the reason?

_SynchronizedBatchNorm's backward is redefined by its forward, right?

I am reading the source code of Synced BN in fyu/drn, zhanghang1989/PyTorch-SyncBatchNorm, and your program at the same time.

I find that your program is different from the others: they rewrite the BN function so that the global mean and std can be passed in (the official BN function computes the local mean and std internally), whereas you just use a simple expression like output = (input - _unsqueeze_ft(mean)) * _unsqueeze_ft(inv_std * self.weight) + _unsqueeze_ft(self.bias) in the _SynchronizedBatchNorm module, which inherits from torch.nn.modules.batchnorm.

So I think this means that _SynchronizedBatchNorm's backward is derived automatically from that expression, so you do not need to rewrite the backward function like the other programs do, right?
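
The expression quoted above uses only ordinary differentiable tensor operations, so autograd can derive the backward pass for it automatically; a tiny standalone illustration of that principle (not the library's code):

import torch

x = torch.randn(8, 4, requires_grad=True)
weight = torch.ones(4, requires_grad=True)
bias = torch.zeros(4, requires_grad=True)

mean = x.mean(dim=0)
inv_std = 1.0 / x.var(dim=0, unbiased=False).clamp(min=1e-5).sqrt()

# A plain tensor expression, analogous to the one quoted above;
# no hand-written backward is needed.
out = (x - mean) * (inv_std * weight) + bias
out.sum().backward()
print(x.grad.shape, weight.grad.shape, bias.grad.shape)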

When I use the sync bn on 8 GPUs, training stops sometimes.

GPU benchmark: 8 x 1080 Ti
Cuda version: 9.0
Pytorch version: 0.4.1

Experiment config:
batch size: 16
num workers: 16
input size: 480x480

When I use the sync bn on the ADE20K dataset, my experiment stops at a certain iteration without any further output, and the GPU utilization drops to 0. Have you had a similar experience?

About python test scripts and SyncBN usage

Hi,

Nice work! But when I run the test file https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/blob/master/tests/test_numeric_batchnorm.py , it gives me the following output.

F
======================================================================
FAIL: testNumericBatchNorm (__main__.NumericTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_numeric_batchnorm.py", line 51, in testNumericBatchNorm
    self.assertTensorClose(b_var1.data, b_var2.data)
  File "/home/cws/Download/Synchronized-BatchNorm-PyTorch/sync_batchnorm/unittest.py", line 28, in assertTensorClose
    self.assertTrue(torch.allclose(x, y), message)
AssertionError: False is not true : Tensor close check failed
adiff=0.0003001689910888672
rdiff=0.02449820376932621

----------------------------------------------------------------------
Ran 1 test in 0.157s

FAILED (failures=1)

The other two test Python files give me similar output. I'm not sure whether this is expected.
Version: PyTorch 1.0, CUDA 9.0

========
Then I used SyncBN in my code, and everything seems fine (at least judging from the output, e.g., loss, accuracy, etc.). Specifically, I did the following:

  • replace all nn.BatchNorm2d in model definition with SynchronizedBatchNorm2d
  • replace torch.nn.DataParallel with DataParallelWithCallback
  • no change to the loss function, the same as when training the model with 1 GPU

Is this the correct way to use SyncBN here? Is there some way to confirm that SyncBN works?
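
One possible sanity check (a sketch in the spirit of tests/test_sync_batchnorm.py, not an official recipe; it assumes at least two visible GPUs) is to feed the same batch to a plain nn.BatchNorm2d and to a wrapped SynchronizedBatchNorm2d, then compare the resulting running statistics:

import torch
import torch.nn as nn
from sync_batchnorm import SynchronizedBatchNorm2d, DataParallelWithCallback

data = torch.rand(16, 10, 8, 8).cuda()

bn = nn.BatchNorm2d(10).cuda()
sync_bn = DataParallelWithCallback(SynchronizedBatchNorm2d(10).cuda(), device_ids=[0, 1])

bn(data)
sync_bn(data)

# With working synchronization, the running mean of the synchronized layer
# should closely match the single-device reference.
print(torch.allclose(bn.running_mean, sync_bn.module.running_mean, atol=1e-5))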

Thanks

about convert_model

There is a problem when I run the example code for convert_model: the variable 'mod' is not assigned; it seems something is wrong with the recursion.

Performance gap between training and test

Hi~
Thanks for your code first of all!

I use SyncBatchNorm for training SSD; during training I can get 46.81% mAP after 10 epochs finish. However, when I use the saved model, I only get 36.7% mAP.

Here is how I convert model
ssd_student = convert_model(ssd_student)
net_student = ssd_student

This is code for saving model
torch.save(net_student.state_dict(), 'weights/' + model_name + '/ssd300_COCO_' + repr(iteration) + '_%.2f.pth' % mAP_student)

When loading the saved model, I didn't use convert_model to convert to SyncBN, since I think there is no need to convert the model when the keys are actually the same.

I couldn't find where I'm wrong; please give me some kind advice, thank you!!
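
One thing worth checking, stated only as an assumption about the cause: if the checkpoint was saved from a DataParallel/DataParallelWithCallback wrapper, its keys carry a leading "module." prefix that a bare, unwrapped model will not have, and a strict load will surface that. A minimal sketch (checkpoint_path and model are hypothetical placeholders for the saved file and the bare SSD model above):

import torch

state = torch.load(checkpoint_path)  # the .pth file saved above

# Strip the DataParallel prefix if it is present.
state = {k[len('module.'):] if k.startswith('module.') else k: v
         for k, v in state.items()}

model.load_state_dict(state)  # strict by default; any remaining key mismatch will raise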

a question about the highlight "use sqrt(max(var, eps)) instead of sqrt(var + eps)"

I have checked that, in PyTorch, the std is computed as sqrt(var + eps). But Synchronized-BatchNorm uses sqrt(max(var, eps)) instead. In the file README.md, it says that "It is completely compatible with PyTorch's implementation. Specifically, it uses unbiased variance to update the moving average, and use sqrt(max(var, eps)) instead of sqrt(var + eps).".

Is there something wrong with my understanding? Please help.

Thanks very much.

How to load bias weights

Hi,

I have merged the implementation into my code.
But when loading the bias weights of this layer, it keeps raising this kind of error:

AttributeError: 'DataParallelWithCallback' object has no attribute 'bias'

I wonder, if I want to set affine=True, as in

sync_bn = SynchronizedBatchNorm1d(filters, eps=1e-5, affine=True)

How should I modify my code?

Thanks
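
For what it's worth, DataParallelWithCallback subclasses nn.DataParallel, so the wrapped layer's parameters are reached through .module rather than on the wrapper itself; a minimal sketch of that access pattern:

from sync_batchnorm import SynchronizedBatchNorm1d, DataParallelWithCallback

filters = 64  # example channel count
sync_bn = SynchronizedBatchNorm1d(filters, eps=1e-5, affine=True)
sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1])

# The wrapper itself has no .bias attribute; the underlying layer does.
bias = sync_bn.module.bias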

Where is the "BatchNormReimpl"?

sync_batchnorm/batchnorm_reimpl.py
__all__ = ['BatchNormReimpl']

There is no BatchNormReimpl in batchnorm_reimpl.py.
Maybe BatchNorm2dReimpl instead?

Another test script for numerical stability

Create a virtual environment, install torch and pytest, and then run pytest on the script below:

import torch
from torch import nn
from torch.nn import init

import pytest


def allclose(x, y):
    adiff = (x - y).abs().max()
    if (y == 0).all():
        rdiff = 'NaN'
    else:
        rdiff = (adiff / y).abs().max()
    message = (
            'Tensor close check failed\n'
            'adiff={}\n'
            'rdiff={}\n'
    ).format(adiff, rdiff)
    assert torch.allclose(x, y), message


class TBatchNorm(nn.Module):

    def __init__(
            self,
            num_features,
            eps=1e-5,
            momentum=0.1):
        super(TBatchNorm, self).__init__()
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        self.weight = nn.Parameter(torch.Tensor(num_features))
        self.bias = nn.Parameter(torch.Tensor(num_features))
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))
        self.reset_parameters()

    def reset_running_stats(self):
        self.running_mean.zero_()
        self.running_var.fill_(1)

    def reset_parameters(self):
        self.reset_running_stats()
        nn.init.uniform_(self.weight)
        nn.init.zeros_(self.bias)

    def forward(self, input_):
        batchsize, channels, height, width = input_.size()
        numel = batchsize * height * width
        input_ = input_.permute(1, 0, 2, 3).contiguous().view(channels, numel)
        sum_ = input_.sum(1)
        sum_of_square = input_.pow(2).sum(1)
        mean = sum_ / numel
        sumvar = sum_of_square - sum_ * mean
        unbias_var = sumvar / (numel - 1)
        bias_var = sumvar / numel
        std = bias_var.clamp(self.eps) ** 0.5

        self.running_mean = (
                (1 - self.momentum) * self.running_mean
                + self.momentum * mean.detach())
        self.running_var = (
                (1 - self.momentum) * self.running_var
                + self.momentum * unbias_var.detach())

        output = (
                (input_ - mean.unsqueeze(1)) / std.unsqueeze(1) *
                self.weight.unsqueeze(1) + self.bias.unsqueeze(1))

        return output.permute(1, 0).contiguous().view(
                batchsize, channels, height, width)


@pytest.fixture(scope='module')
def instance():
    CHANNELS = 16
    batchnorm1 = nn.BatchNorm2d(CHANNELS, momentum=1)
    optimizer1 = torch.optim.SGD(batchnorm1.parameters(), lr=0.01)
    batchnorm2 = TBatchNorm(CHANNELS, momentum=1)
    with torch.no_grad():
        batchnorm2.weight = nn.Parameter(batchnorm1.weight.clone())
        batchnorm2.bias = nn.Parameter(batchnorm1.bias.clone())
    optimizer2 = torch.optim.SGD(batchnorm2.parameters(), lr=0.01)

    for _ in range(100):
        input_ = torch.rand(16, CHANNELS, 16, 16)

        input1 = input_.clone().requires_grad_(True)
        output1 = batchnorm1(input1)
        output1.sum().backward()
        optimizer1.step()

        input2 = input_.clone().requires_grad_(True)
        output2 = batchnorm2(input2)
        output2.sum().backward()
        optimizer2.step()

    return input1, batchnorm1, output1, input2, batchnorm2, output2


def test_input(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(input1, input2)


def test_output(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(output1, output2)


def test_input_grad(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(input1.grad, input2.grad)


def test_weight_grad(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(batchnorm1.weight.grad, batchnorm2.weight.grad)


def test_bias_grad(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(batchnorm1.bias.grad, batchnorm2.bias.grad)

Is this script reasonable?

Training couldn't start

Hi,

I tried to use it as

model = Model(...)
model = nn.DataParallel(model)
model = convert_model(model)
model = model.cuda()

However, it got stuck and training couldn't start.
Have you seen similar problems before?

How to use it when testing

I'm not sure whether I need to use the convert_model function when testing. Besides, can model.eval() switch it to the test mode?
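
Not an authoritative answer, but since the README states the module behaves like the built-in BatchNorm on a single GPU or CPU, the usual evaluation pattern should apply; a brief sketch (model and test_loader are placeholders for your trained model and test data):

import torch

model.eval()                      # BN layers switch to their accumulated running statistics
with torch.no_grad():
    for inputs in test_loader:
        outputs = model(inputs.cuda())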

How to solve the numeric error?

What is the numeric error? Overflow? Or behavior that differs from the original batchnorm? And could you give some hints on how to solve it? Thank you very much!

Can't test successfully using the scripts in ./tests

Thanks for this excellent project, but I have problems testing it successfully.

1. I first ran the scripts in ./tests; the errors are as follows:

(1) test_numeric_batchnorm.py

ERROR: testNumericBatchNorm (__main__.NumericTestCase)
Traceback (most recent call last):
File "test_numeric_batchnorm.py", line 48, in testNumericBatchNorm
self.assertTensorClose(bn.running_mean, a.mean(dim=0))
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'
Ran 1 test in 0.192s
FAILED (errors=1)

(2) test_numeric_batchnorm_v2.py

ERROR: testNumericBatchNorm (__main__.NumericTestCasev2)
Traceback (most recent call last):
File "test_numeric_batchnorm_v2.py", line 33, in testNumericBatchNorm
batchnorm2 = BatchNorm2dReimpl(CHANNELS, momentum=1)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/batchnorm_reimpl.py", line 33, in __init__
self.weight = nn.Parameter(torch.empty(num_features))
AttributeError: module 'torch' has no attribute 'empty'
Ran 1 test in 0.001s
FAILED (errors=1)

(3) test_sync_batchnorm.py

ERROR: testSyncBatchNorm2DSyncTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 107, in testSyncBatchNorm2DSyncTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10, 16, 16), True, cuda=True)
File "test_sync_batchnorm.py", line 59, in _checkBatchNormResult
output2.sum().backward()
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size() failed.

ERROR: testSyncBatchNormNormalEval (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 77, in testSyncBatchNormNormalEval
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), False)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'

ERROR: testSyncBatchNormNormalTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 71, in testSyncBatchNormNormalTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), True)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'

ERROR: testSyncBatchNormSyncEval (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 97, in testSyncBatchNormSyncEval
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), False, cuda=True)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'

ERROR: testSyncBatchNormSyncTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 87, in testSyncBatchNormSyncTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), True, cuda=True)
File "test_sync_batchnorm.py", line 59, in _checkBatchNormResult
output2.sum().backward()
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size() failed.
Ran 5 tests in 6.546s
FAILED (errors=5)

2. I then ran my own scripts using net = DataParallelWithCallback(net, device_ids=[0, 1]) with two GPUs (a single GPU is fine); the error is:

Traceback (most recent call last):
File "train_scan_em13.py", line 190, in
loss.backward()
File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size() failed.

Ubuntu 14.04

cuda8.0 & cudnn5.1

python3.6

pytorch 0.3.1

Do you have any suggestions? Thanks.

Training cannot start

Hi,

Good job! I tried to use it as

device=torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = Model(...)
model = nn.DataParallel(model, device_ids=[0, 1])
model = convert_model(model).to(device)

However, it got stuck and training couldn't start.
Have you seen similar problems before?

Is it a bug that the code still runs successfully when the channels of the input tensor and the sync batchnorm do not match?

I wrote code like this in my net:

net = nn.Sequential(
    nn.Conv2d(18*2, 18, kernel_size=3, padding=1, bias=True),
    SynchronizedBatchNorm2d(18*2),
    nn.ReLU(inplace=True),
)

and I found that the channels of the conv layer output and the sync batchnorm input do not match (18 vs. 18*2), but the code runs successfully without warning or error. After some debugging, I found it only works like this when using multiple GPUs in training, which means adding these lines to the code:

net = nn.DataParallel(net, device_ids=[0, 1])
patch_replication_callback(net)
net = net.cuda()

If I use this sync batch norm on one GPU, it reports the error: RuntimeError: running_mean should contain 18 elements not 36.
I don't know whether this is a bug. I found that in the source code of the sync batch norm, the input tensor is reshaped like
input = input.view(input.size(0), self.num_features, -1).
Does this mean that, as long as the tensor can be reshaped by this line, the batch norm still runs without error even when the channels do not match?
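
The reshape in question only needs the per-sample element count to be divisible by num_features, so a channel mismatch can indeed slip through silently on that path; a small standalone illustration of the view behaviour (not the library's code):

import torch

x = torch.randn(4, 18, 32, 32)      # conv output with 18 channels, as in the issue
num_features = 36                   # the BN layer was constructed with 18 * 2 features

# 18 * 32 * 32 = 18432 is divisible by 36, so the view succeeds even though
# the resulting "channel" dimension no longer corresponds to the real channels.
reshaped = x.view(x.size(0), num_features, -1)
print(reshaped.shape)               # torch.Size([4, 36, 512])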

How to export with ONNX

How do I export the model as ONNX?

I get an error during export like this:

RuntimeError: Unsupported: ONNX export of batch_norm for unknown channel size.

Module problem

When running the tests I get the error: No module named 'models.networks.sync_batchnorm'. I have checked normalization.py and generator.py and found that these are the only two files in the whole project that mention models.networks.sync_batchnorm, yet there is no such interface and it is never called. My environment also meets the stated version requirements, and I really cannot figure out where the problem is. Any advice would be appreciated.

Question on `sqrt(max(var, eps))`

Hi, really appreciate the great code.

I just got one question on computing the invstd (#L150), which uses sqrt(max(var, eps)) instead of sqrt(var+eps).
For example, in Normalization.cuh in PyTorch, it seems that they compute invstd with sqrt(var + eps), as below.

template<typename T>
struct InvStd {
  __device__ __forceinline__ T operator()(T var, double epsilon) const {
    T invstd = 0;
    if (var != static_cast<T>(0) || epsilon != static_cast<T>(0)) {
      invstd = static_cast<T>(1) / device_sqrt(var + epsilon);
    }
    return invstd;
  }
};

Or is there a reason for computing invstd the way it is done now?
Thanks!

Thinking about 'sync_batchnorm.batchnorm.convert_model(module)'..

I've taken a close look at the source code in this method, and I have thought about the code below:

line 1: if isinstance(module, torch.nn.DataParallel):
line 2: mod = module.module
line 3: mod = convert_model(mod)
line 4: mod = DataParallelWithCallback(mod)
line 5: return mod

Shouldn't line 4 be 'DataParallelWithCallback(mod, device_ids=module.device_ids)'?
This would keep the same CUDA device ids; otherwise, it will use all available CUDA devices.

Please feel free to comment and share your thoughts, thanks!

Can one pass the tests?

Hello. Thanks for the awesome code.
I read this code base and the adaptation from zhanghang1989 at the same time. His code customizes two operators to save GPU memory. I've just combined the CUDA extension for PyTorch 0.4.1. Although I have passed the gradcheck of the bn operator and the sum_square operator, and for each operator I've compared its output with the output from an imperative PyTorch implementation, I cannot pass the test cases provided here...
Do you have any suggestions about the numeric stability?

.

Sorry, confused with the wrong repo.

Network performance is getting worse

Hi,
Thank you for your work and sharing.
I tried to use the convert_model function in my own code, for example:

    cudnn.benchmark = True
    net = Network()
    net.cuda()
    net = nn.DataParallel(net, device_ids=args.gpus)

    net = convert_model(net)

However, after training, I found that the result is far from my expectations, even worse than using the nn.BatchNorm2d that comes with PyTorch. Am I using the convert_model function wrongly? Or are there some points to note? Thank you very much!

about fp16

When I use fp16 (16-bit float) and multi-GPU training, the code hangs in SyncBN (comm.py).

Training stuck with multiple calls of the forward function

Hi,

Thank you for the great code. I have looked at the related issues, but it turns out they don't help in my case. I have a network using your Sync BN. I try to call the forward pass of the model 4 times and sum over all 4 outputs, and it gets stuck in the last forward call. If I reduce the number of calls to 3, everything works fine. I am sure that I do the same thing on the different GPUs.

Besides, if I don't do the sum, my code also works well. It is really weird, so I would like to ask if you have any suggestions. Thanks!
