vacancy / synchronized-batchnorm-pytorch
Synchronized Batch Normalization implementation in PyTorch.
License: MIT License
Hi!
I tried to use convert_model to convert my model to use synchronized BatchNorm, but I got the following error.
File "/home/xxx/anaconda3/envs/pt1.2/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
"them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu
Here is the code:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = net() # my model
if args.use_gpu:
    model = torch.nn.DataParallel(model, device_ids=gpus).to(device)
    model = convert_model(model)
Have I done something wrong? Also, the code runs fine when convert_model is not used.
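For what it's worth, a minimal reordering sketch that avoids the device mismatch, on the assumption that convert_model rebuilds the BatchNorm layers as fresh CPU modules, so the move to the GPU has to happen after the conversion:

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = net()
    if args.use_gpu:
        model = torch.nn.DataParallel(model, device_ids=gpus)
        model = convert_model(model)  # swap the BN layers first...
        model = model.to(device)      # ...then move everything to cuda:0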
I wrote code like this in my net:
net = nn.Sequential(
    nn.Conv2d(18*2, 18, kernel_size=3, padding=1, bias=True),
    SynchronizedBatchNorm2d(18*2),
    nn.ReLU(inplace=True),
)
I found that the conv layer's output channels and the sync batchnorm's input channels do not match (18 vs. 18*2), yet the code runs without any warning or error. After some debugging, I found it only runs when training on multiple GPUs, i.e., after adding these lines:
net = nn.DataParallel(net, device_ids=[0, 1])
patch_replication_callback(net)
net = net.cuda()
If I use this sync batch norm on a single GPU, it reports an error: RuntimeError: running_mean should contain 18 elements not 36
I don't know whether this is a bug. In the source code of sync batch norm, the input tensor is reshaped like this:
input = input.view(input.size(0), self.num_features, -1)
Does that mean that as long as the tensor can be reshaped by this line, batch norm still runs without error even when the channels mismatch?
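That reading appears correct: view() only checks that the total element count is divisible, not that the channel dimension actually matches. A small standalone demonstration, with shapes mirroring the 18-vs-36 example above:

    import torch
    x = torch.randn(2, 18, 4, 4)   # conv output with 18 channels
    # 2*18*4*4 == 576 elements, and 576 is divisible by 2*36,
    # so declaring num_features = 36 silently "works":
    y = x.view(x.size(0), 36, -1)
    print(y.shape)                 # torch.Size([2, 36, 8])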
Hi,
Thank you for your work and sharing.
I tried to use the convert_model function in my own code, for example:
cudnn.benchmark = True
net = Network()
net.cuda()
net = nn.DataParallel(net, device_ids=args.gpus)
net = convert_model(net)
However, after training, I found that the result is far from my expectations, even worse than using the nn.BatchNorm2d that comes with PyTorch. Am I using the convert_model function wrongly? Or are there some points to note? Thank you very much!
Does this module work if I use distributed data parallel as described here - https://github.com/dougsouza/pytorch-sync-batchnorm-example
Hi,
I have merged the implementation into my code.
But when loading the bias weights of this layer, it keeps raising this kind of error:
AttributeError: 'DataParallelWithCallback' object has no attribute 'bias'
I want to set affine=True, as in
sync_bn = SynchronizedBatchNorm1d(filters, eps=1e-5, affine=True)
How should I modify my code?
Thanks
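A guess, since the full traceback isn't shown: DataParallelWithCallback, like nn.DataParallel, wraps the network, so layer attributes have to be reached through .module rather than on the wrapper itself. A hypothetical sketch (the sync_bn attribute name is assumed):

    wrapped = DataParallelWithCallback(net)
    bias = wrapped.module.sync_bn.bias   # not wrapped.bias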
1240
Sorry, I confused this with the wrong repo.
Hi,
Thank you for the great code. I have looked at the related issues, but it turns out they don't help in my case. I have a network using your Sync BN. I call the forward pass of the model 4 times and sum over all 4 outputs, and it gets stuck in the last forward call. If I reduce the number of calls to 3, everything works fine. I am sure that I do the same thing on the different GPUs.
Besides, if I don't do the sum, my code also works well. It is really weird, so I would like to ask if you have any suggestions. Thanks!
Hi, thanks a lot for your code.
But when I apply this code to my end-to-end implementation of FPN, some weird things happen.
If I use 8 cards, the GPU memory keeps increasing until one card occupies all of its memory. Then FPN gets stuck, and the GPU utilization of the other, not-full cards is 100%.
If I use 4 cards, the memory keeps increasing and then FPN may get stuck with all GPU utilization at 0.
I used PyTorch 0.4.0 with your DataParallelWithCallback, and the input image size differs across cards. If I use the official PyTorch BN, my code works well.
Could you please give me any hints to help me find the reason?
Create a virtual environment, install torch and pytest, and run pytest on the script below:
import torch
from torch import nn
from torch.nn import init
import pytest


def allclose(x, y):
    adiff = (x - y).abs().max()
    if (y == 0).all():
        rdiff = 'NaN'
    else:
        rdiff = (adiff / y).abs().max()
    message = (
        'Tensor close check failed\n'
        'adiff={}\n'
        'rdiff={}\n'
    ).format(adiff, rdiff)
    assert torch.allclose(x, y), message


class TBatchNorm(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super(TBatchNorm, self).__init__()
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        self.weight = nn.Parameter(torch.Tensor(num_features))
        self.bias = nn.Parameter(torch.Tensor(num_features))
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))
        self.reset_parameters()

    def reset_running_stats(self):
        self.running_mean.zero_()
        self.running_var.fill_(1)

    def reset_parameters(self):
        self.reset_running_stats()
        nn.init.uniform_(self.weight)
        nn.init.zeros_(self.bias)

    def forward(self, input_):
        batchsize, channels, height, width = input_.size()
        numel = batchsize * height * width
        # fold (B, C, H, W) into (C, B*H*W) for per-channel statistics
        input_ = input_.permute(1, 0, 2, 3).contiguous().view(channels, numel)
        sum_ = input_.sum(1)
        sum_of_square = input_.pow(2).sum(1)
        mean = sum_ / numel
        sumvar = sum_of_square - sum_ * mean
        unbias_var = sumvar / (numel - 1)
        bias_var = sumvar / numel
        std = bias_var.clamp(self.eps) ** 0.5
        self.running_mean = (
            (1 - self.momentum) * self.running_mean
            + self.momentum * mean.detach())
        self.running_var = (
            (1 - self.momentum) * self.running_var
            + self.momentum * unbias_var.detach())
        output = (
            (input_ - mean.unsqueeze(1)) / std.unsqueeze(1) *
            self.weight.unsqueeze(1) + self.bias.unsqueeze(1))
        # invert the (C, B*H*W) layout back to (B, C, H, W); a plain
        # permute(1, 0) followed by view() would scramble the elements
        return output.view(
            channels, batchsize, height, width).permute(1, 0, 2, 3).contiguous()


@pytest.fixture(scope='module')
def instance():
    CHANNELS = 16
    batchnorm1 = nn.BatchNorm2d(CHANNELS, momentum=1)
    optimizer1 = torch.optim.SGD(batchnorm1.parameters(), lr=0.01)
    batchnorm2 = TBatchNorm(CHANNELS, momentum=1)
    with torch.no_grad():
        batchnorm2.weight = nn.Parameter(batchnorm1.weight.clone())
        batchnorm2.bias = nn.Parameter(batchnorm1.bias.clone())
    optimizer2 = torch.optim.SGD(batchnorm2.parameters(), lr=0.01)
    for _ in range(100):
        input_ = torch.rand(16, CHANNELS, 16, 16)
        input1 = input_.clone().requires_grad_(True)
        output1 = batchnorm1(input1)
        output1.sum().backward()
        optimizer1.step()
        input2 = input_.clone().requires_grad_(True)
        output2 = batchnorm2(input2)
        output2.sum().backward()
        optimizer2.step()
    return input1, batchnorm1, output1, input2, batchnorm2, output2


def test_input(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(input1, input2)


def test_output(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(output1, output2)


def test_input_grad(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(input1.grad, input2.grad)


def test_weight_grad(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(batchnorm1.weight.grad, batchnorm2.weight.grad)


def test_bias_grad(instance):
    input1, batchnorm1, output1, input2, batchnorm2, output2 = instance
    allclose(batchnorm1.bias.grad, batchnorm2.bias.grad)
Is this script reasonable?
How do I export the model to ONNX? I get an error on export like this:
RuntimeError: Unsupported: ONNX export of batch_norm for unknown channel size.
Thanks for this excellent project, but I have problems testing it successfully.
ERROR: testNumericBatchNorm (__main__.NumericTestCase)
Traceback (most recent call last):
File "test_numeric_batchnorm.py", line 48, in testNumericBatchNorm
self.assertTensorClose(bn.running_mean, a.mean(dim=0))
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'
Ran 1 test in 0.192s
FAILED (errors=1)
ERROR: testNumericBatchNorm (__main__.NumericTestCasev2)
Traceback (most recent call last):
File "test_numeric_batchnorm_v2.py", line 33, in testNumericBatchNorm
batchnorm2 = BatchNorm2dReimpl(CHANNELS, momentum=1)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/batchnorm_reimpl.py", line 33, in init
self.weight = nn.Parameter(torch.empty(num_features))
AttributeError: module 'torch' has no attribute 'empty'
Ran 1 test in 0.001s
FAILED (errors=1)
ERROR: testSyncBatchNorm2DSyncTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 107, in testSyncBatchNorm2DSyncTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10, 16, 16), True, cuda=True)
File "test_sync_batchnorm.py", line 59, in _checkBatchNormResult
output2.sum().backward()
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/init.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size()
failed.
ERROR: testSyncBatchNormNormalEval (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 77, in testSyncBatchNormNormalEval
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), False)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'
ERROR: testSyncBatchNormNormalTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 71, in testSyncBatchNormNormalTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), True)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'
ERROR: testSyncBatchNormSyncEval (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 97, in testSyncBatchNormSyncEval
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), False, cuda=True)
File "test_sync_batchnorm.py", line 61, in _checkBatchNormResult
self.assertTensorClose(input1.data, input2.data)
File "/home/liuyongcheng/3dcls/scannet/embed/scancls_embed13/syncbn/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AttributeError: module 'torch' has no attribute 'allclose'
ERROR: testSyncBatchNormSyncTrain (__main__.SyncTestCase)
Traceback (most recent call last):
File "test_sync_batchnorm.py", line 87, in testSyncBatchNormSyncTrain
self._checkBatchNormResult(bn, sync_bn, torch.rand(16, 10), True, cuda=True)
File "test_sync_batchnorm.py", line 59, in _checkBatchNormResult
output2.sum().backward()
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/liuyongcheng/anaconda3/lib/python3.6/site-packages/torch/autograd/init.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size()
failed.
Ran 5 tests in 6.546s
FAILED (errors=5)
Traceback (most recent call last):
File "train_scan_em13.py", line 190, in
loss.backward()
File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/init.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: torch/csrc/autograd/input_buffer.cpp:14: add: Assertion pos >= 0 && pos < buffer.size()
failed.
Do you have any suggestions? Thanks.
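For what it's worth, a quick self-check under the assumption that these failures come from an old PyTorch build: torch.allclose and torch.empty were both added in PyTorch 0.4.0, so any older version raises exactly these AttributeErrors.

    import torch
    print(torch.__version__)
    # both exist from PyTorch 0.4.0 onward; older builds raise AttributeError
    assert hasattr(torch, 'allclose') and hasattr(torch, 'empty')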
Hi,
I tried to use it as
model = Model(...)
model = nn.DataParallel(model)
model = convert_model(model)
model = model.cuda()
However, it got stuck and training couldn't start.
Have you seen similar problems before?
Hi!
I looked for the "track_running_stats" implementation code but could not find it.
Since SyncBN inherits from nn.modules.batchnorm, does it automatically track running statistics?
Thank you for your nice code.
When I use this sync-BN in my model, it becomes much slower and even stops. I don't know why this happens; is it IO load, or something else? Can you help me?
First of all, thank you for the implementation. It's very helpful.
I have one question.
After sync batch norm is applied, it consumes more GPU memory than normal batch norm.
Is it right?
If I want to use this module on the base model, how can I load the pre-trained model offered by the official PyTorch?
Or should I train the base model on ImageNet from scratch with this sync-bn?
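If it helps, a sketch of the pretrained route, under the assumption that convert_model copies each nn.BatchNorm2d's weights and running statistics into its SynchronizedBatchNorm2d replacement, so the ImageNet-pretrained weights should carry over without retraining:

    import torchvision
    from sync_batchnorm import convert_model  # import path assumed

    net = torchvision.models.resnet50(pretrained=True)
    net = convert_model(net)  # BN layers swapped, pretrained stats kept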
Hi,
I do not quite understand the examples you released. What if nn.BatchNorm2d is part of a module?
For example, if I use your SynchronizedBatchNorm2d to replace the nn.BatchNorm2d in the resnet50 model, shall I use net = DataParallelWithCallback(net) instead of net = nn.DataParallel(net)? Or do I just need to use DataParallelWithCallback to wrap your implemented SynchronizedBatchNorm2d?
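For context, a minimal sketch of the whole-network pattern the examples seem to intend: wrap the full model once rather than the individual BN layers (import names assumed):

    import torchvision
    from sync_batchnorm import convert_model, DataParallelWithCallback

    net = torchvision.models.resnet50()
    net = convert_model(net)  # replaces every nn.BatchNorm2d inside
    net = DataParallelWithCallback(net, device_ids=[0, 1]).cuda()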
I'm not sure whether I need to use the function convert_to when testing. Besides, can model.eval() switch it to test mode?
GPU benchmark: 8 x 1080 Ti
Cuda version: 9.0
Pytorch version: 0.4.1
Experiment config:
batch size: 16
num workers: 16
input size: 480x480
When I use sync bn on the ADE20K dataset, my experiment stops at a certain iteration without any further output, and GPU utilization drops to 0. Have you had a similar experience?
See https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/blob/master/sync_batchnorm/unittest.py#L28
If npa is negative, then np.fmax will return 1e-5, even when the absolute value of npa is greater than 1e-5.
I use PyTorch 1.2.0. I first initialize the model, then use model.cuda() to put the model on the GPU, and then I call
model = nn.DataParallel(model)
model = convert_model(model)
Training the model then gives me such an error. Could you give me some information on how to avoid it? Thank you.
Running the test reports an error: No module named 'models.networks.sync_batchnorm'. I have already checked normalization.py and generator.py, and found that these are the only two files in the whole project that mention models.networks.sync_batchnorm, but there is no interface or call? My versions also meet the requirements. I really cannot figure out where the problem is. Please advise.
Hi, thanks very much for your code.
But when trained with the sync-bn layer, my model seems unstable between adjacent epochs during the test phase: I tested my model on the test dataset every two epochs, and the performance curve oscillates severely.
PS. I used the default momentum of 0.1; does it seem too large?
Hi,
Nice work! But when I run this test file https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/blob/master/tests/test_numeric_batchnorm.py , it gives me the following output.
F
======================================================================
FAIL: testNumericBatchNorm (__main__.NumericTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_numeric_batchnorm.py", line 51, in testNumericBatchNorm
self.assertTensorClose(b_var1.data, b_var2.data)
File "/home/cws/Download/Synchronized-BatchNorm-PyTorch/sync_batchnorm/unittest.py", line 28, in assertTensorClose
self.assertTrue(torch.allclose(x, y), message)
AssertionError: False is not true : Tensor close check failed
adiff=0.0003001689910888672
rdiff=0.02449820376932621
----------------------------------------------------------------------
Ran 1 test in 0.157s
FAILED (failures=1)
And the other two test files give me similar output. I'm not sure whether this is expected.
Version: PyTorch 1.0, CUDA 9.0
========
Then I use SyncBN in my code, and everything seems fine (at least judging from the output: loss, accuracy, etc.). Specifically, I do it in the following way:
replace nn.BatchNorm2d in the model definition with SynchronizedBatchNorm2d
replace torch.nn.DataParallel with DataParallelWithCallback
Is this the correct way to use SyncBN here? Is there some way to confirm that SyncBN works?
Thanks
I noticed that you divide the sum by its sum_size; however, sum_size is not multiplied by the number of devices:
Hi~
Thanks for your code!
I use SyncBN for training SSD. During training I get 46.81% mAP after 10 epochs finish. However, when I use the saved model, I only get 36.7% mAP.
Here is how I convert the model:
ssd_student = convert_model(ssd_student)
net_student = ssd_student
This is the code for saving the model:
torch.save(net_student.state_dict(), 'weights/' + model_name + '/ssd300_COCO_' + repr(iteration) + '_%.2f.pth' % mAP_student)
When loading the saved model, I didn't use convert_model to convert to SyncBN, since I think there is no need to convert the model when the keys are actually the same.
I couldn't find where I'm wrong; please give some kind advice, thank you!!
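For reference, a minimal load-time sketch that at least rules out silent key mismatches (e.g. a 'module.' prefix left over from a DataParallel wrapper) and a missing eval() call; build_ssd and the checkpoint path are hypothetical:

    ssd_eval = build_ssd('test', 300, num_classes)   # hypothetical constructor
    state = torch.load('checkpoint.pth', map_location='cpu')
    ssd_eval.load_state_dict(state, strict=True)     # fails loudly on key mismatch
    ssd_eval.eval()                                  # use running stats, not batch stats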
Hello. Thanks for the awesome code.
I read this code base and the adaptation from zhanghang1989. His code customizes two operators to save GPU memory. I have combined the CUDA extensions for PyTorch 0.4.1. Although I have passed gradcheck for the operator bn and the operator sum_square, and for each operator I have compared its output with the output of an imperative PyTorch implementation, I cannot pass the test cases provided here...
Do you have any suggestions about the numeric stability?
Hi, really appreciate the great code.
I just have one question about computing the invstd (#L150), which uses sqrt(max(var, eps)) instead of sqrt(var + eps).
For example, Normalization.cuh in PyTorch seems to compute invstd with sqrt(var + eps), as below.
template<typename T>
struct InvStd {
    __device__ __forceinline__ T operator()(T var, double epsilon) const {
        T invstd = 0;
        if (var != static_cast<T>(0) || epsilon != static_cast<T>(0)) {
            invstd = static_cast<T>(1) / device_sqrt(var + epsilon);
        }
        return invstd;
    }
};
Or is there a reason for calculating invstd the way it is now?
Thanks!
What is the numeric error? Overflow? Or behavior that is not the same as the original batch norm? And could you give some hints on how to solve it? Thank you very much!
Hi,
Good job! I tried to use it as
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = Model(...)
model = nn.DataParallel(model, device_ids=[0, 1])
model = convert_model(model).to(device)
However, it got stuck and training couldn't start.
Have you seen similar problems before?
I've taken a close look at the source code of this method, and I have a thought about the code below:
line 1: if isinstance(module, torch.nn.DataParallel):
line 2:     mod = module.module
line 3:     mod = convert_model(mod)
line 4:     mod = DataParallelWithCallback(mod)
line 5:     return mod
Shouldn't line 4 be 'DataParallelWithCallback(mod, device_ids=module.device_ids)'?
This would keep the same CUDA device ids; otherwise, it will use all available CUDA devices.
Please feel free to comment and share your thoughts, thanks!
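The suggestion as a concrete patch, assuming the rest of convert_model's recursion over child modules stays as in the repo:

    def convert_model(module):
        if isinstance(module, torch.nn.DataParallel):
            mod = module.module
            mod = convert_model(mod)
            # propagate the original wrapper's device list
            mod = DataParallelWithCallback(mod, device_ids=module.device_ids)
            return mod
        ...  # otherwise recurse into child modules as before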
sync_batchnorm/batchnorm_reimpl.py
__all__ = ['BatchNormReimpl']
There is no BatchNormReimpl in batchnorm_reimpl.py.
Maybe BatchNorm2dReimpl instead?
I am reading the source code of Synced BN in fyu/drn, zhanghang1989/PyTorch-SyncBatchNorm, and your program at the same time.
I find that your program differs from the others: they rewrite the BN function so the global mean and std can be passed in (the official BN function computes the local mean and std internally), while you just use a simple expression like output = (input - _unsqueeze_ft(mean)) * _unsqueeze_ft(inv_std * self.weight) + _unsqueeze_ft(self.bias) in the _SynchronizedBatchNorm module, which inherits from torch.nn.modules.batchnorm.
So I think this means that _SynchronizedBatchNorm's backward is derived automatically by autograd from that expression, so you do not need to rewrite the backward function like the other programs do, right?
There are some problems when I run the example code for convert_model: the variable 'mod' is not assigned; it seems something is wrong with the recursion.
I have checked that, in PyTorch, the std is computed as sqrt(var + eps), but Synchronized-BatchNorm uses sqrt(max(var, eps)) instead. The README.md says: "It is completely compatible with PyTorch's implementation. Specifically, it uses unbiased variance to update the moving average, and use sqrt(max(var, eps)) instead of sqrt(var + eps)."
Is there something wrong with my understanding? Please help.
Thanks very much.
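For illustration, a quick numeric comparison of the two conventions with eps = 1e-5: they agree whenever var is much larger than eps and differ only for near-zero variance.

    import torch
    var = torch.tensor([0.0, 1e-6, 1.0])
    print((var + 1e-5).sqrt())          # PyTorch convention: sqrt(var + eps)
    print(var.clamp(min=1e-5).sqrt())   # this repo: sqrt(max(var, eps))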