Comments (9)
Hi @shachoi, thank you for your interest!
I haven't tested the memory usage carefully, but the quick answer is yes, mainly on the master GPU card, because it needs to collect the statistics from the other cards.
But I don't think the difference should be large. Could you please share exactly how much extra memory is used (as a percentage, for example)?
from synchronized-batchnorm-pytorch.
Hi @vacancy,
Thanks a lot for your reply :)
I have tested sync batch norm on a DeepLab-ResNet-based segmentation task.
When I applied sync batch norm, it consumed about 30-40% more GPU memory. The detailed memory consumption is as follows.
- sync batch norm : GPU1 - 8769 / GPU2 - 7125 / GPU3 - 7125
- pytorch typical batch norm : GPU1 - 6687 / GPU2 - 5039 / GPU3 - 5039
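As a quick check of the "30-40%" estimate, the figures above can be turned into per-GPU overheads. This is only a sketch: the dictionary and function names are made up for illustration, and the unit of the reported numbers is not stated in the thread (the arithmetic does not depend on it).

```python
# Per-GPU memory figures exactly as reported in the comment above.
# The unit is not given in the thread; the ratio is unit-independent.
sync_bn = {"GPU1": 8769, "GPU2": 7125, "GPU3": 7125}
plain_bn = {"GPU1": 6687, "GPU2": 5039, "GPU3": 5039}


def overhead_percent(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Relative overhead per GPU, rounded to one decimal place.
overheads = {
    gpu: round(overhead_percent(plain_bn[gpu], sync_bn[gpu]), 1)
    for gpu in plain_bn
}
# Absolute overhead per GPU, in the same unit as the input.
extra = {gpu: sync_bn[gpu] - plain_bn[gpu] for gpu in plain_bn}

print(overheads)  # {'GPU1': 31.1, 'GPU2': 41.4, 'GPU3': 41.4}
print(extra)      # {'GPU1': 2082, 'GPU2': 2086, 'GPU3': 2086}
```

Note that the absolute increase is nearly identical on all three cards; the percentage is lower on GPU1 only because its baseline is higher.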
Hi,
I currently have little idea about the exact cause of the memory consumption. I will probably revisit this issue next week.
Just for your reference, here is another project using this SyncBN: https://github.com/CSAILVision/semantic-segmentation-pytorch
@Tete-Xiao, do you have any comment on this?
@vacancy I did notice that the segmentation framework consumes more GPU memory than the normal one.
@shachoi Thank you for posting this issue! I think the memory consumption issue is confirmed. I will get back to this next week.
Hi @vacancy, thanks for your great work! Do you have a solution to the memory consumption issue now?
@Tete-Xiao If you have some spare time, could you help me with this issue?
@Hellomodo Here is my quick reply. There are two major reasons.
- We use the NCCL backend provided by PyTorch to sync the feature statistics across GPUs. This requires a certain amount of extra memory. Although it shouldn't be this much in theory, in practice, PyTorch/NCCL might allocate more memory than required, depending on the implementation.
- We implemented batchnorm using primitive PyTorch APIs, which requires extra memory to store intermediate variables. One way to reduce this cost is to optimize the code in https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/blob/master/sync_batchnorm/batchnorm.py.
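To picture what "syncing the feature statistics across GPUs" involves, here is a minimal plain-Python sketch. It is not the repository's code: the class and function names are hypothetical, and Python lists stand in for per-GPU tensors. It assumes each device sends a sum, a sum of squares, and a count to the master, which combines them into a global mean and (biased) variance.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DeviceStats:
    """Hypothetical message one device sends to the master."""
    total: float     # sum of activations on this device
    total_sq: float  # sum of squared activations on this device
    count: int       # number of activations on this device


def local_stats(values: List[float]) -> DeviceStats:
    """Computed independently on each device from its own mini-batch slice."""
    return DeviceStats(sum(values), sum(v * v for v in values), len(values))


def reduce_on_master(stats: List[DeviceStats]) -> Tuple[float, float]:
    """Combine per-device statistics into a global mean and biased variance."""
    total = sum(s.total for s in stats)
    total_sq = sum(s.total_sq for s in stats)
    count = sum(s.count for s in stats)
    mean = total / count
    var = total_sq / count - mean * mean  # E[x^2] - (E[x])^2
    return mean, var


# Three "devices", each holding a slice of the batch.
per_device = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mean, var = reduce_on_master([local_stats(v) for v in per_device])
print(mean, var)
```

Only three numbers per channel travel between devices in this scheme, which matches the remark above that the communication itself should not need much memory in theory; the intermediate tensors created by an implementation built from primitive operations are a separate cost.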
I have faced the same issue. Any progress so far?
I have faced the same issue, too.
Before using convert_model to replace the typical batch norm with SynchronizedBatchNorm2d:
GPU_1: 7520, GPU_2: 6756.
After that:
GPU_1: 9760, GPU_2: 8796.
Can you help me? @vacancy