synchronized-batchnorm-pytorch's Issues

RuntimeError with convert_model - "found one of them on device: cpu"

Hi!

I tried to use convert_model to convert my model to use synchronized BatchNorm, but I got the following error.

File "/home/xxx/anaconda3/envs/pt1.2/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
    "them on device: {}".format(self.src_device_obj, t.device))

RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

Here is the code:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = net()  # my model

if args.use_gpu:
    model = torch.nn.DataParallel(model, device_ids=gpus).to(device)

model = convert_model(model)

Have I done something wrong? The code runs fine when convert_model is not used.

Is it a bug that the code still runs when the channel counts of the input tensor and sync batchnorm do not match?

I wrote code like this in my net:
net = nn.Sequential(
    nn.Conv2d(18*2, 18, kernel_size=3, padding=1, bias=True),
    SynchronizedBatchNorm2d(18*2),
    nn.ReLU(inplace=True),
)

I found that the number of channels output by the conv layer (18) does not match the number expected by the sync batchnorm layer (18*2), yet the code runs without any warning or error. After some debugging, I found that this happens when training on multiple GPUs, that is, when these lines are added:
net = nn.DataParallel(net, device_ids=[0,1])
patch_replication_callback(net)
net = net.cuda()

On a single GPU, the same sync batchnorm reports an error: RuntimeError: running_mean should contain 18 elements not 36
I don't know whether this is a bug. In the source code of sync batchnorm, I found that the input tensor is reshaped like this:
input = input.view(input.size(0), self.num_features, -1)
Does that mean that as long as the tensor can be reshaped by this line, the batchnorm still runs without error even when the channels do not match?
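
To make the reshape question concrete, here is a minimal sketch in plain Python (no PyTorch needed) of the shape arithmetic behind input.view(input.size(0), self.num_features, -1). The helper infer_view_shape is hypothetical and only mimics how the -1 dimension is inferred; it is not part of the library, and the 4 x 18 x 8 x 8 input shape is an assumption chosen for illustration.

```python
from math import prod


def infer_view_shape(input_shape, num_features):
    """Mimic the shape inference of view(N, num_features, -1).

    Hypothetical helper for illustration only: returns the resulting
    3-D shape, or raises ValueError when the per-sample element count
    is not divisible by num_features (where a real view would fail).
    """
    batch = input_shape[0]
    per_sample = prod(input_shape) // batch
    if per_sample % num_features != 0:
        raise ValueError(
            "cannot view {} as ({}, {}, -1)".format(
                input_shape, batch, num_features))
    return (batch, num_features, per_sample // num_features)


# The conv output has 18 channels, but the norm layer was built with 36.
mismatched = infer_view_shape((4, 18, 8, 8), num_features=36)
matched = infer_view_shape((4, 18, 8, 8), num_features=18)
print(mismatched)  # (4, 36, 32): the reshape goes through without error
print(matched)     # (4, 18, 64)
```

Because 18 x 8 x 8 elements per sample divide evenly by 36, the reshape alone does not detect the mismatch; each of the 36 "features" would then cover half of a real channel. Whether the multi-GPU path relies only on this reshape is for the maintainers to confirm, but an explicit channel check before the reshape would surface the error in both cases.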

Network performance is getting worse

Hi,
Thank you for your work and sharing.
I tried to use the convert_model function in my own code, for example:

    cudnn.benchmark = True
    net = Network()
    net.cuda()
    net = nn.DataParallel(net, device_ids=args.gpus)

    net = convert_model(net)

However, after training, I found that the result is far from my expectations, and even worse than with the nn.BatchNorm2d that ships with PyTorch. Am I using the convert_model function incorrectly? Or is there anything I should pay attention to? Thank you very much!

How to load bias weights

Hi,

I have merged the implementation into my code.
But when loading the bias weights of this layer, it keeps raising this error:

AttributeError: 'DataParallelWithCallback' object has no attribute 'bias'

If I want to set affine=True, as in
sync_bn = SynchronizedBatchNorm1d(filters, eps=1e-5, affine=True)
how should I modify my code?

Thanks

Training stuck with multiple call of forward function

Hi,

Thank you for the great code. I have looked at the related issues, but they do not help in my case. My network uses your sync BN. I call the model's forward pass 4 times and sum the 4 outputs, and it gets stuck in the last forward call. If I reduce the number of calls to 3, everything works fine. I am sure that I do the same thing on every GPU.

Also, if I do not compute the sum, my code works fine too. This is really weird, so I would like to ask whether you have any suggestions. Thanks!

Weird things happen when applied to FPN

Hi, thanks a lot for your code.

But when I apply this code to my end-to-end implementation of FPN, some weird things happen.

If I use 8 cards, GPU memory keeps increasing until one card runs out of memory. Then FPN gets stuck, and GPU utilization on the other, non-full cards is 100%.

If I use 4 cards, memory keeps increasing and then FPN may get stuck with GPU utilization at 0 on all cards.

I used PyTorch 0.4.0 with your DataParallelWithCallback, and the input image size differs between cards. If I use the official PyTorch BN, my code works well.

Could you please give me any hints to help me find the cause?

Another test script for numerical stability

Create a virtual environment, install torch and pytest, then run pytest on the script below:

import torch
from torch import nn
from torch.nn import init

import pytest


def allclose(x, y):
    adiff = (x - y).abs().max()
    if (y == 0).all():
        rdiff = 'NaN'
    else:
        rdiff = (adiff / y).abs().max()
    message = (
            'Tensor close check failed\n'
            'adiff={}\n'
            'rdiff={}\n'
    ).format(adiff, rdiff)
    assert torch.allclose(x, y), message


class TBatchNorm(nn.Module):

    def __init__(
            self,
            num_features,
            eps=1e-5,
            momentum=0.1):
        super(TBatchNorm, self).__init__()
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        self.weight = nn.Parameter(torch.Tensor(num_features))
        self.bias = nn.Parameter(torch.Tensor(num_features))
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))
        self.reset_parameters()

    def reset_running_stats(self):
        self.running_mean.zero_()
        self.running_var.fill_(1)

    def reset_parameters(self):
        self.reset_running_stats()
        nn.init.uniform_(self.weight)
        nn.init.zeros_(self.bias)

    def forward(self, input_):
        batchsize, channels, height, width = input_.size()
        numel = batchsize * height * width
        input_ = input_.permute(1, 0, 2, 3).contiguous().view(channels, numel)
        sum_ = input_.sum(1)
        sum_of_square = input_.pow(2).sum(1)
        mean = sum_ / numel
        sumvar = sum_of_square - sum_ * mean
        unbias_var = sumvar / (numel - 1)
        bias_var = sumvar / numel
        std = bias_var.clamp(self.eps) ** 0.5

        self.running_mean = (
                (1 - self.momentum) * self.running_mean
                + self.momentum * mean.detach())
        self.running_var = (
                (1 - self.momentum) * self.running_var
                + self.momentum * unbias_var.detach())

        output = (
                (input_ - mean.unsqueeze(1)) / std.unsqueeze(1) *
                self.weight.unsqueeze(1) + self.bias.unsqueeze(1))

        return output.permute(1, 0).contiguous().view(
                batchsize, channels, height, width)


@pytest.fixture(scope='module')
def instance():
    CHANNELS = 16
    batchnorm1 = nn.BatchNorm2d(CHANNELS, momentum=1)
    optimizer1 = torch.optim.SGD(batchnorm1.parameters(), lr=0.01)
    batchnorm2 = TBatchNorm(CHANNELS, momentum=1)
    with torch.no_grad():
        batchnorm2.weight = nn.Parameter(batchnorm1.weight.clone())
        batchnorm2.bias = nn.Parameter(batchnorm1.bias.clone())
    optimizer2 = torch.optim.SGD(batchnorm2.parameters(), lr=0.01)

    for _ in range(100):
        input_ = torch.rand(16, CHANNELS, 16, 16)

        input1 = input_.clone().requires_grad_(True)
        output1 = batchnorm1(input1)
        output1.sum().backward()
        optimizer1.step()

        input2 = input_.clone().requires_grad_(True)
        output2 = batchnorm2(input2)
        output2.sum().backward()
        optimizer2.step()

    return input1, batchnorm1, output1, input2, batchnorm2, output2


def test_input(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(input1, input2)


def test_output(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(output1, output2)


def test_input_grad(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(input1.grad, input2.grad)


def test_weight_grad(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(batchnorm1.weight.grad, batchnorm2.weight.grad)


def test_bias_grad(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(batchnorm1.bias.grad, batchnorm2.bias.grad)

Is this script reasonable?

How to export to ONNX

How do I export the model to ONNX?

The export fails with an error like this:

RuntimeError: Unsupported: ONNX export of batch_norm for unknown channel size.

Can't test successfully using the scripts in ./tests

Thanks for this excellent project, but I cannot get the tests to pass.

1. I first ran the scripts in ./tests and got the following errors:

(1) test_numeric_batchnorm.py

ERROR: testNumericBatchNorm (__main__.NumericTestCase)
Traceback (most recent call last):
File "test_numeric_batchnorm.py", line 48, in testNumericBatchNorm
self.assertTensorClose(bn.running_mean, a.mean(dim=0))
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'
Ran 1 test in 0.192s
FAILED (errors=1)

(2) test_numeric_batchnorm_v2.py

ERROR: testNumericBatchNorm (__main__.NumericTestCasev2)
Traceback (most recent call last):
File "test_numeric_batchnorm_v2.py", line 33, in testNumericBatchNorm
batchnorm2 = BatchNorm2dReimpl(CHANNELS, momentum=1)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/batchnorm_reimpl.py", line 33, in __init__
self.weight = nn.Parameter(torch.empty(num_features))
AttributeError: module 'torch' has no attribute 'empty'
Ran 1 test in 0.001s
FAILED (errors=1)

(3) test_sync_batchnorm.py

ERROR: testSyncBatchNorm2DSyncTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 107, in testSyncBatchNorm2DSyncTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10, 16, 16), True, cuda=True)
File "test_sync_batchnorm.py", line 59, in _checkBatchNormResult
output2.sum().backward()
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size() failed.

ERROR: testSyncBatchNormNormalEval (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 77, in testSyncBatchNormNormalEval
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), False)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'

ERROR: testSyncBatchNormNormalTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 71, in testSyncBatchNormNormalTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), True)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'

ERROR: testSyncBatchNormSyncEval (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 97, in testSyncBatchNormSyncEval
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), False, cuda=True)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'

ERROR: testSyncBatchNormSyncTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 87, in testSyncBatchNormSyncTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), True, cuda=True)
File "test_sync_batchnorm.py", line 59, in _checkBatchNormResult
output2.sum().backward()
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size() failed.
Ran 5 tests in 6.546s
FAILED (errors=5)

2. I then ran my own script using net = DataParallelWithCallback(net, device_ids=[0, 1]) with two GPUs (a single GPU works fine) and got this error:

Traceback (most recent call last):
File "train_scan_em13.py", line 190, in <module>
loss.backward()
File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size() failed.

Ubuntu 14.04

cuda8.0 & cudnn5.1

python3.6

pytorch 0.3.1

Do you have any suggestions? Thanks.

Training couldn't start

Hi,

I tried to use it as follows:

model = Model(...)
model = nn.DataParallel(model)
model = convert_model(model)
model = model.cuda()

However, it got stuck and training couldn't start.
Have you seen similar problems before?

Is it more difficult to train the model?

When I use this sync BN in my model, training becomes slower and sometimes even stops. I do not know why this happens. Is it I/O load, or something else? Can you help me?

Should I use your `DataParallelWithCallback` instead of PyTorch's `nn.DataParallel`?

Hi,

I do not quite understand the examples you released. What if nn.BatchNorm2d is part of a module?
For example, if I use your SynchronizedBatchNorm2d to replace nn.BatchNorm2d in the resnet50 model, should I use net = DataParallelWithCallback(net) instead of net = nn.DataParallel(net)? Or do I only need to wrap your SynchronizedBatchNorm2d layers with DataParallelWithCallback?

How to use it when testing

I'm not sure whether I need to use the function convert_to when testing. Also, does model.eval() switch it to test mode?

Training sometimes stops when I use sync BN on 8 GPUs

GPU benchmark: 8 x 1080 Ti
Cuda version: 9.0
Pytorch version: 0.4.1

Experiment config:
batch size: 16
num workers: 16
input size: 480x480

When I use sync BN on the ADE20K dataset, my experiment stops at some iteration without any further output, and GPU utilization drops to 0. Have you had a similar experience?

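Module problem

Running the test fails with: No module named 'models.networks.sync_batchnorm'. I have checked normalization.py and generator.py and found that these are the only two files in the whole project that mention models.networks.sync_batchnorm, but there seems to be no interface for it and it is never called? My versions also meet the requirements, and I really cannot work out where the problem is. Any help would be appreciated.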

Training stuck

Hi ~
I use
if torch.cuda.device_count() > 1: model = torch.nn.DataParallel(model) model = bnconvert(model) model.cuda()
to enable sync BN during multi-GPU training, but when training the network, the training procedure appears to get stuck at the final batch of an epoch.

Unstable performance between adjacent epochs during test phase

Hi, thanks very much for your code.

However, when trained with the sync BN layer, my model seems unstable between adjacent epochs during the test phase: I test the model on the test dataset every two epochs, and the performance curve oscillates heavily.

P.S. I used the default momentum of 0.1. Is that too large?
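
As a rough illustration of what the momentum controls, the sketch below applies the running-statistic update in the PyTorch BatchNorm convention, new = (1 - momentum) * old + momentum * batch_stat, to a sequence of per-batch means. The function names and the numbers are made up for illustration; they are not measurements from any real model.

```python
def update_running(running, batch_stat, momentum=0.1):
    """One running-statistic update in the PyTorch BatchNorm convention:
    new = (1 - momentum) * old + momentum * batch_stat."""
    return (1.0 - momentum) * running + momentum * batch_stat


def track(batch_stats, momentum, start=0.0):
    """Apply the update over a sequence of per-batch statistics."""
    running = start
    history = []
    for stat in batch_stats:
        running = update_running(running, stat, momentum)
        history.append(running)
    return history


# Made-up noisy per-batch means around 1.0 (illustrative data only).
noisy_means = [1.4, 0.6, 1.3, 0.7, 1.5, 0.5, 1.2, 0.8]

for m in (0.1, 0.01):
    hist = track(noisy_means, momentum=m, start=1.0)
    spread = max(hist) - min(hist)
    print("momentum={}: spread of running mean = {:.4f}".format(m, spread))
```

A smaller momentum makes the running statistics follow each batch less closely, so the values used at test time fluctuate less from one checkpoint to the next, at the cost of adapting more slowly. Whether that is the cause of the oscillation described here is only a hypothesis.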

About python test scripts and SyncBN usage

Hi,

Nice work! But when I run this test file https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/blob/master/tests/test_numeric_batchnorm.py , it gives me the following output.

F
======================================================================
FAIL: testNumericBatchNorm (__main__.NumericTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_numeric_batchnorm.py", line 51, in testNumericBatchNorm
    self.assertTensorClose(b_var1.data, b_var2.data)
  File "/home/cws/Download/Synchronized-BatchNorm-PyTorch/sync_batchnorm/unittest.py", line 28, in assertTensorClose
    self.assertTrue(torch.allclose(x, y), message)
AssertionError: False is not true : Tensor close check failed
adiff=0.0003001689910888672
rdiff=0.02449820376932621

----------------------------------------------------------------------
Ran 1 test in 0.157s

FAILED (failures=1)

The other two test files give similar output. I'm not sure whether this is expected.
Versions: PyTorch 1.0, CUDA 9.0
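
For context on why a difference of this size fails the check: torch.allclose tests |x - y| <= atol + rtol * |y| elementwise, with defaults rtol=1e-05 and atol=1e-08. The sketch below is a scalar re-implementation of that formula in plain Python; the function name is_close and the reference value y = 1.0 are assumptions for illustration.

```python
def is_close(x, y, rtol=1e-5, atol=1e-8):
    """Scalar version of the elementwise test used by torch.allclose:
    |x - y| <= atol + rtol * |y|."""
    return abs(x - y) <= atol + rtol * abs(y)


adiff = 0.0003001689910888672  # absolute difference reported above

# Treat the reported maximum as a single element pair with reference y = 1.0.
y = 1.0
print(is_close(y + adiff, y))             # False with the default tolerances
print(is_close(y + adiff, y, atol=1e-3))  # True with a looser atol
```

With the default tolerances, an absolute difference of about 3e-4 only passes when |y| is roughly 30 or larger, so the assertion is very strict for values of ordinary magnitude. Whether a discrepancy of this size is acceptable for these tests is a question for the maintainers.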

========
Then I used SyncBN in my code, and everything seems fine (at least judging from the output, such as loss and accuracy). Specifically, I did the following:

  • replace all nn.BatchNorm2d in model definition with SynchronizedBatchNorm2d
  • replace torch.nn.DataParallel with DataParallelWithCallback
  • no change to loss function, just the same as the situation when training the model with 1 GPU

Is this the correct way to use SyncBN? Is there a way to confirm that SyncBN works?

Thanks

Performance gap between training and test

Hi~
Thanks for your code!

I use sync batchnorm to train SSD. During training I get 46.81% mAP after 10 epochs. However, when I evaluate the saved model, I only get 36.7% mAP.

Here is how I convert the model:
ssd_student = convert_model(ssd_student)
net_student = ssd_student

This is the code for saving the model:
torch.save(net_student.state_dict(), 'weights/' + model_name + '/ssd300_COCO_' + repr(iteration) + '_%.2f.pth' % mAP_student)

When loading the saved model, I did not use convert_model to convert to SyncBN, since I thought there was no need to convert the model when the keys are the same.

I can't find what I'm doing wrong. Any advice would be appreciated, thank you!

Can anyone pass the test?

Hello. Thanks for the awesome code.
I read this code base and the adaptation from zhanghang1989. His code defines two custom operators to save GPU memory. I have just combined the CUDA extension for PyTorch 0.4.1. Although the bn operator and the sum_square operator both pass gradcheck, and I have compared the output of each operator with the output of an imperative PyTorch implementation, I cannot pass the test cases provided here...
Do you have any suggestions about numeric stability?

Question on `sqrt(max(var, eps))`

Hi, really appreciate the great code.

I have one question about computing invstd (#L150), which uses sqrt(max(var, eps)) instead of sqrt(var+eps).
For example, Normalization.cuh in PyTorch seems to compute invstd with sqrt(var+eps), as below.

template<typename T>
struct InvStd {
  __device__ __forceinline__ T operator()(T var, double epsilon) const {
    T invstd = 0;
    if (var != static_cast<T>(0) || epsilon != static_cast<T>(0)) {
      invstd = static_cast<T>(1) / device_sqrt(var + epsilon);
    }
    return invstd;
  }
};

Is there a reason for calculating invstd the way it is done now?
Thanks!
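
To see how far apart the two formulas are, here is a small plain-Python comparison. The function names invstd_max and invstd_add are hypothetical labels for the two expressions in the question, and eps=1e-5 is assumed as the default; this is a numeric illustration, not the library's or PyTorch's actual code.

```python
import math


def invstd_max(var, eps=1e-5):
    """Inverse std using sqrt(max(var, eps)), as described in the question."""
    return 1.0 / math.sqrt(max(var, eps))


def invstd_add(var, eps=1e-5):
    """Inverse std using sqrt(var + eps), as in the quoted CUDA snippet."""
    return 1.0 / math.sqrt(var + eps)


for var in (1.0, 1e-2, 1e-5, 0.0):
    a = invstd_max(var)
    b = invstd_add(var)
    rel = abs(a - b) / b
    print("var={:<8g} max-form={:.4f} add-form={:.4f} rel.diff={:.2e}".format(
        var, a, b, rel))
```

The two forms agree closely when var is much larger than eps and coincide exactly at var = 0; they differ most when var is close to eps, where the ratio reaches sqrt(2).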

About fp16

When I use fp16 (16-bit float) with multi-GPU training, the code hangs in SyncBN (comm.py).

How to solve the numeric error?

What is the numeric error? Is it overflow, or behavior that differs from the original batchnorm? Could you give some hints on how to solve it? Thank you very much!

Training cannot start

Hi,

Good job! I tried to use it as follows:

device=torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = Model(...)
model = nn.DataParallel(model, device_ids=[0, 1])
model = convert_model(model).to(device)

However, it got stuck and training couldn't start.
Have you seen similar problems before?

Thinking about 'sync_batchnorm.batchnorm.convert_model(module)'..

I've taken a close look at the source code of this method and have a question about the code below:

line 1: if isinstance(module, torch.nn.DataParallel):
line 2: mod = module.module
line 3: mod = convert_model(mod)
line 4: mod = DataParallelWithCallback(mod)
line 5: return mod

Should line 4 be 'DataParallelWithCallback(mod, device_ids=module.device_ids)'?
This would keep the same CUDA device ids; otherwise it will use all available CUDA devices.

Please feel free to comment, and show your thoughts, thanks!

Where is the "BatchNormReimpl"?

sync_batchnorm/batchnorm_reimpl.py
__all__ = ['BatchNormReimpl']

There is no BatchNormReimpl in batchnorm_reimpl.py.
Maybe BatchNorm2dReimpl instead?

_SynchronizedBatchNorm's backward is redefined by its forward, right?

I am reading the source code of synced BN in fyu/drn, zhanghang1989/PyTorch-SyncBatchNorm, and your project at the same time.

I find that your project differs from the others. The others rewrite the BN function so that the global mean and std can be passed to it, whereas the official BN function calculates the local mean and std internally. You simply use an expression like output = (input - _unsqueeze_ft(mean)) * _unsqueeze_ft(inv_std * self.weight) + _unsqueeze_ft(self.bias) in the _SynchronizedBatchNorm module, which inherits from torch.nn.modules.batchnorm.

So I think this means that the backward of _SynchronizedBatchNorm is derived automatically from this expression, so you do not need to rewrite the function as the other projects do. Is that right?
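
The "global mean and std" mentioned here can be illustrated without any GPU code. The sketch below pools per-device sums, sums of squares, and counts into one mean and biased variance; the function names and the two shards are hypothetical, and this shows only the arithmetic, not the library's actual communication code.

```python
def local_stats(values):
    """Per-device partial statistics: (sum, sum of squares, count)."""
    return sum(values), sum(v * v for v in values), len(values)


def pooled_mean_var(partials):
    """Combine per-device partials into a global mean and biased variance."""
    total = sum(p[0] for p in partials)
    total_sq = sum(p[1] for p in partials)
    count = sum(p[2] for p in partials)
    mean = total / count
    var = total_sq / count - mean * mean
    return mean, var


# Two hypothetical device shards of one channel's activations.
shard_a = [1.0, 2.0, 3.0]
shard_b = [7.0, 8.0, 9.0]

mean, var = pooled_mean_var([local_stats(shard_a), local_stats(shard_b)])
print(mean, var)
```

Each shard on its own has mean 2 or 8 and a variance of 2/3, while the pooled statistics match those of the full six-element batch (mean 5, variance 58/6). Comparing per-device and pooled statistics in this way is one possible sanity check that synchronization is taking effect.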

About convert_model

There is a problem when I run the example code for convert_model: the variable 'mod' was not assigned. Something seems wrong with the recursion.

a question about the highlight "use sqrt(max(var, eps)) instead of sqrt(var + eps)"

I have checked that, in pytorch, the std is computed as sqrt(var + eps). But the Synchronized-BatchNorm uses the equation sqrt(max(var, eps)) instead. In the file README.md, it says that "It is completely compatible with PyTorch's implementation. Specifically, it uses unbiased variance to update the moving average, and use sqrt(max(var, eps)) instead of sqrt(var + eps).".

Is there something wrong with my understanding? Please help.

Thanks very much.
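
The README sentence quoted above also mentions using the unbiased variance for the moving average. As a small illustration of that distinction, the sketch below computes both the biased and the unbiased variance from a sum and a sum of squares. The function name and the sample values are made up for illustration.

```python
def batch_variances(values):
    """Return (biased, unbiased) variance from a sum and sum of squares."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    mean = total / n
    sumvar = total_sq - total * mean  # sum of squared deviations
    return sumvar / n, sumvar / (n - 1)


# Illustrative activations for one channel (made-up numbers).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
biased, unbiased = batch_variances(values)
print(biased)    # 4.0
print(unbiased)  # 32/7, slightly larger than the biased value
```

In PyTorch's own BatchNorm, the biased variance normalizes the current batch during training, while the unbiased one feeds the running_var buffer; the two differ by a factor of n / (n - 1), which becomes negligible for large batches.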
