Comments (9)

vacancy avatar vacancy commented on August 18, 2024

Hi @shachoi, thank you for your interest!

I haven't tested the memory usage carefully, but the quick answer is yes, mainly on the master GPU card, because it needs to collect the statistics from the other cards.

But I don't think the difference will be large. Could you please share how much extra memory is used (as a percentage, for example)?

from synchronized-batchnorm-pytorch.

shachoi avatar shachoi commented on August 18, 2024

Hi @vacancy,
Thanks a lot for your reply :)

I have tested sync batch norm on a DeepLab-ResNet based segmentation task.
When I apply sync batch norm, it consumes about 30-40% more GPU memory. The detailed memory consumption is as follows.

  • sync batch norm : GPU1 - 8769 / GPU2 - 7125 / GPU3 - 7125
  • pytorch typical batch norm : GPU1 - 6687 / GPU2 - 5039 / GPU3 - 5039
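
For reference, the relative overhead implied by these readings can be worked out with a few lines of Python. This is only a sketch: the dictionary and function names are made up for illustration, and the readings are copied from the list above in whatever unit they were reported in.

```python
# Memory readings copied from the list above (unit as reported in the post).
sync_bn = {"GPU1": 8769, "GPU2": 7125, "GPU3": 7125}
plain_bn = {"GPU1": 6687, "GPU2": 5039, "GPU3": 5039}


def overhead_percent(with_sync, without_sync):
    """Relative increase of with_sync over without_sync, in percent."""
    return 100.0 * (with_sync - without_sync) / without_sync


# Per-GPU relative overhead of sync batch norm over the typical batch norm.
overheads = {gpu: overhead_percent(sync_bn[gpu], plain_bn[gpu]) for gpu in sync_bn}

for gpu, pct in overheads.items():
    extra = sync_bn[gpu] - plain_bn[gpu]
    print(f"{gpu}: +{extra} ({pct:.1f}% more)")
```

With these numbers the absolute increase is almost the same on every card (2082 on GPU1, 2086 on GPU2 and GPU3), so the relative overhead is larger on the cards that started with less memory in use.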

vacancy avatar vacancy commented on August 18, 2024

Hi,

I don't yet know the exact cause of the extra memory consumption. I will probably revisit this issue next week.

Just for your reference, here is another project using this SyncBN: https://github.com/CSAILVision/semantic-segmentation-pytorch

@Tete-Xiao, do you have any comment on this?

Tete-Xiao avatar Tete-Xiao commented on August 18, 2024

@vacancy I did notice that the segmentation framework consumes more GPU memory than the normal one.

vacancy avatar vacancy commented on August 18, 2024

@shachoi Thank you for posting this issue! I think the memory consumption issue is confirmed. I will get back to this next week.

Hellomodo avatar Hellomodo commented on August 18, 2024

Hi @vacancy. Thanks for your great work! Do you have any solution to the memory consumption issue now?

vacancy avatar vacancy commented on August 18, 2024

@Tete-Xiao If you have some spare time, could you help me with this issue?

@Hellomodo Here is my quick reply. There are two major reasons.

  1. We use the NCCL backend provided by PyTorch to sync the feature statistics across GPUs. This requires a certain amount of extra memory. Although it shouldn't be this much in theory, in practice PyTorch/NCCL might allocate more memory than required, depending on the implementation.
  2. We implemented batch norm using primitive PyTorch APIs, which requires extra memory to store intermediate variables. One way to reduce this cost is to optimize the code in https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/blob/master/sync_batchnorm/batchnorm.py.
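
To make "syncing the feature statistics" more concrete, here is a rough pure-Python sketch of the idea. It is not the code in `batchnorm.py` and it uses neither PyTorch nor NCCL. It assumes each device reports a sum, a sum of squares, and an element count for a channel, and that the master combines them into a global mean and (biased) variance. The function names and the sample values are hypothetical.

```python
from typing import List, Tuple

Partial = Tuple[float, float, int]  # (sum, sum of squares, element count)


def local_stats(values: List[float]) -> Partial:
    """Partial statistics computed independently on one device."""
    return sum(values), sum(v * v for v in values), len(values)


def reduce_stats(partials: List[Partial]) -> Tuple[float, float]:
    """Master-side reduction into a global mean and biased variance."""
    total = sum(p[0] for p in partials)
    total_sq = sum(p[1] for p in partials)
    count = sum(p[2] for p in partials)
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2, computed over all devices together.
    var = total_sq / count - mean * mean
    return mean, var


# Hypothetical activations of one channel, split across three devices.
shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mean, var = reduce_stats([local_stats(s) for s in shards])
print(mean, var)
```

In this sketch the master has to hold every device's partials before it can reduce them, and each step creates temporary values. In a real multi-GPU implementation those temporaries are tensors, which is one plausible place for the extra memory described in the two points above to come from.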

yelantf avatar yelantf commented on August 18, 2024

I have faced the same issue. Any progress so far?

CarpeDiemly avatar CarpeDiemly commented on August 18, 2024

I have faced the same issue, too.
Before using convert_model to replace the typical batch norm with SynchronizedBatchNorm2d:
GPU_1: 7520, GPU_2: 6756.
After that:
GPU_1: 9760, GPU_2: 8796.
Can you help me, @vacancy?
