Hi @FateScript, yeah, good point. We never published this. It's some math I sketched out. I'll share it below with the caveat that I haven't carefully verified its full correctness, and it may lack context (e.g., variable meanings). I sketched the math and implemented it, and it works empirically. So I'm sharing it in the hope that there are no embarrassing mistakes; if you find one, please lmk! Since the math lacks context, I probably won't have time to go into more detail, but I figured I'd share it in the hope that you can decode it and find it somewhat useful :)
Momentum formulation [α = 0.999]:
v = α · v + (1 - α) · u
Update formulation [α = 0.001]:
v = (1 - α) · v + α · u
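A minimal sketch of the equivalence of the two formulations (names are mine, not from pycls): they are the same recurrence with α re-parameterized as 1 − α.

```python
def ema_momentum(v, u, alpha=0.999):
    """Momentum form: v <- alpha * v + (1 - alpha) * u."""
    return alpha * v + (1 - alpha) * u

def ema_update(v, u, alpha=0.001):
    """Update form: v <- (1 - alpha) * v + alpha * u."""
    return (1 - alpha) * v + alpha * u

# Identical trajectories when alpha_update = 1 - alpha_momentum
# (0.75 and 0.25 are exact in binary, so equality is exact):
assert ema_momentum(1.0, 5.0, 0.75) == ema_update(1.0, 5.0, 0.25)
```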
Two-step update rolled into one, assuming α² ≈ 0 and setting u = (u₀ + u₁)/2:
v₁ = (1 - α) · v₀ + α · u₀
v₂ = (1 - α) · v₁ + α · u₁
v₂ = (1 - α) · ((1 - α) · v₀ + α · u₀) + α · u₁
v₂ ≈ (1 - α) · ((1 - α) · v₀) + α · u₀ + α · u₁   [dropping the α² · u₀ term]
v₂ = (1 - 2α + α²) · v₀ + α · u₀ + α · u₁
v₂ ≈ (1 - 2α) · v₀ + 2α · u
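A quick numeric check of the rolled-up step (a sketch I added, not from the original note): two small-α updates agree with one update using 2α and the averaged input, up to O(α²).

```python
alpha = 0.001
v0, u0, u1 = 1.0, 3.0, 5.0

# Two sequential update-form steps.
v1 = (1 - alpha) * v0 + alpha * u0
v2 = (1 - alpha) * v1 + alpha * u1

# One rolled-up step with u = (u0 + u1) / 2.
u = (u0 + u1) / 2
v2_rolled = (1 - 2 * alpha) * v0 + 2 * alpha * u

# The discrepancy is alpha^2 * (v0 - u0), i.e., O(alpha^2).
assert abs(v2 - v2_rolled) < 10 * alpha ** 2
```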
The same holds for n ≫ 1 updates, not just 2, since for small α and nα ≪ 1 the following holds:
(1 - α)ⁿ = 1 - nα + n(n-1)α²/2! - n(n-1)(n-2)α³/3! + ...   [binomial expansion]
(1 - α)ⁿ ≈ 1 - nα + n²α²/2! - n³α³/3! + ...   [n ≫ 1]
(1 - α)ⁿ ≈ 1 - nα   [nα ≪ 1]
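A one-line numeric sanity check of the approximation (my addition): for nα ≪ 1 the error of 1 − nα is of order (nα)².

```python
alpha, n = 0.001, 50          # n * alpha = 0.05 << 1
exact = (1 - alpha) ** n      # true n-step decay factor
approx = 1 - n * alpha        # first-order binomial approximation
assert abs(exact - approx) < (n * alpha) ** 2
```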
Thus, to make the update independent of the batch size n, we specify α* (independent of batch size) and use α in the update step, where:
α = α* · n
This will make the EMA behavior roughly independent of the batch size n. Furthermore, it is not necessary to perform an update at every iteration. If we perform the update every k iterations, we effectively do an update after seeing n·k examples, and thus can use:
α = α* · n · k
Finally, to normalize by schedule length, we set:
α = α* · n · k / m
where m = #epochs. Empirically, we find that setting α this way allows using a fairly constant α* across schedule lengths without needing to carefully tune it for each schedule length. The logic isn't exactly equivalent for this step; it's more that your "history" is proportional across runs with different epoch lengths. [Note: need to make this explanation more precise.]
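The full rule can be sketched as follows (a sketch under my own naming assumptions — `effective_alpha`, `ema_step`, and the parameter names are mine, not the pycls API):

```python
def effective_alpha(alpha_star, batch_size, update_period, max_epochs):
    """alpha = alpha_star * n * k / m, clamped to a valid decay in (0, 1]."""
    alpha = alpha_star * batch_size * update_period / max_epochs
    return min(alpha, 1.0)

def ema_step(v_ema, v_model, alpha):
    """Update-form EMA step: v <- (1 - alpha) * v + alpha * u."""
    return (1 - alpha) * v_ema + alpha * v_model

# Example: alpha* = 1e-5, batch size 256, update every 32 iters, 100 epochs.
alpha = effective_alpha(1e-5, 256, 32, 100)   # 8.192e-4
```

Doubling the batch size (or the update period) doubles α, so the EMA "sees" roughly the same effective history per epoch regardless of those settings.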
from pycls.
Formatting got lost in my previous post. Here's a screenshot of my note with the formatting preserved.
Thanks @pdollar, I understand how the magic code works now. It's soooooo kind of you :)
BTW, I want to discuss this issue a bit more.
In my opinion, if the total number of images seen during training is unchanged, then
#iters = #epochs · #images per epoch / batch_size
so the value of batch_size / #epochs could be treated as k / #iters, where k is a constant that depends on your dataset, and k could be absorbed into α.
Maybe adjust = update_period / total_iters is more intuitive? WDYT?
Hey, thanks for digging in deeper! I don't think I have time to adjust this or think about it more deeply, and we're already using this way of defining the EMA for many models we have trained. I find it works really well, but more importantly, I wouldn't want to break backward compatibility at this stage even if the result were more intuitive! Thanks for the discussion/suggestions tho.