Comments (4)
The running_mean and running_var are not used during training; they are only used at inference. The motivation for keeping running_mean and running_var is to maintain an approximate estimate of the statistics of the training samples. This estimate can be accumulated over any sequence of random samples, and once the model converges it should tend to the same values regardless of the sequence. So it is not necessary to sync running_mean and running_var; it is sufficient to take them from any one of the GPUs.
from moco.
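To illustrate the point above (a minimal PyTorch sketch, not code from the MoCo repo): in train() mode BatchNorm normalizes with the current batch's statistics and only updates the running estimates as a side effect; the running estimates are consulted only in eval() mode.

```python
# Minimal PyTorch sketch (not from the MoCo repo): BatchNorm uses batch
# statistics in train() mode and the running estimates only in eval() mode.
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4) * 3 + 5  # batch with non-zero mean, non-unit std

bn.train()
y_train = bn(x)  # normalized with this batch's mean/var;
                 # running_mean/running_var updated as a side effect

bn.eval()
y_eval = bn(x)   # normalized with running_mean/running_var instead

# The running estimates have moved toward the batch statistics
# (default momentum 0.1) but were not used to compute y_train itself.
print(bn.running_mean)
```

After one training step the running estimates differ from the batch statistics, so the same input normalizes differently in eval mode, which is exactly why the two modes must not be conflated.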
What you described is sync SGD, which has nothing to do with SyncBN. Plain BN can be used with DDP training, and DDP training is sync SGD training by default.
As far as I know, the original BN computes the mean and variance within each process (GPU) separately and does not communicate across GPUs. That is why SyncBN was introduced: to compute the mean and variance over the batch samples on all GPUs.
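For reference, PyTorch exposes this as nn.SyncBatchNorm, and a model with plain BN layers can be converted with the stock helper (sketch below; an initialized process group is only needed when actually running the converted model under DDP):

```python
# Sketch: replace per-GPU BatchNorm layers with SyncBatchNorm so that the
# mean/variance are computed over the global batch across all processes.
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3),
    nn.BatchNorm2d(16),  # statistics computed per GPU
    nn.ReLU(),
)

# Standard PyTorch helper; recursively swaps every BatchNorm*d layer.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(type(sync_model[1]).__name__)
```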
In detectron2, defaults.py L457-L458 allow us to choose the normalization from (FrozenBN, GN, "SyncBN", "BN"). Here I focus on the "BN" case. As you mentioned, sync SGD ensures that the gradients of the affine weight and bias are synchronized across GPUs. But what about running_mean and running_var? Note that they are not model parameters and have nothing to do with SGD. Are they also synchronized in the original "BN"? Thanks.
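To make the distinction in the question concrete: in PyTorch, running_mean and running_var are registered as buffers, not parameters, so they receive no gradients and SGD never updates them. DDP handles buffers separately through its broadcast_buffers option (True by default), which copies rank 0's buffers to the other ranks at each forward pass rather than averaging them. A small sketch:

```python
# Sketch: BN's affine weight/bias are parameters (synced via gradient
# averaging in DDP), while running_mean/running_var are buffers that the
# optimizer never touches.
import torch.nn as nn

bn = nn.BatchNorm2d(8)
param_names = {name for name, _ in bn.named_parameters()}
buffer_names = {name for name, _ in bn.named_buffers()}

print(param_names)    # weight and bias only
print(buffer_names)   # running_mean, running_var, num_batches_tracked
```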
Thanks a lot for correcting my misunderstanding of BN. I originally thought that running_mean and running_var were also used in the training phase, which confused me a lot.
Related Issues (20)
- Issue about dequeue_and_enqueue
- Question about the queue for key encoder
- Why labels are all zeros, should first columns of labels be ones?
- Issue with batch size
- Low Accuracy
- One question about single GPU
- How to load the Hyperparameters without command line code Argument Parser?
- About training
- what information is leaked due to intra-batch communication?
- What is the label format of the cifar-10 dataset?
- Concerns about feature dimensionality in MoCo self-training
- Can you tell me dataset structure and how images are named in the dataset
- why pretrain from encoder_q?
- Question about queue dimension
- How is BN in key-encoder updated (in Moco v1)?
- Why is labels = zeros(N) set to zero?
- The size of the dictionary
- About License
- About using the model dict
- Unable to open the compressed file