Hi, thanks for your wonderful repo. In your code of upda

question about ema alpha setting about pycls HOT 4 CLOSED

FateScript commented on July 29, 2024

question about ema alpha setting

from pycls.

Comments (4)

pdollar commented on July 29, 2024 4

Hi @FateScript , yeah good point. We never published this. It's some math I sketched out. I'll share my math below with the caveat that I haven't been very careful to verify its full correctness, and the math may lack context (e.g., variable meanings). I sketched the math and implemented it, and it works empirically. So I'm sharing the math, and hopefully there's no embarrassing mistakes in it. If you find an embarrassing mistake in the math below, please lmk! Now, the math is lacking context, I hope you'll be able to understand it. I probably won't have time to go into more detail, but I figured I'd share in the hope that you can try to decode it and it's somewhat useful :)

Momentum formulation [α=.999]
v = α · v + (1 - α) · u

Update formulation [α=.001]:
v = (1 - α) · v + α · u

Two-step update rolled into one, assuming α² ≈ 0 and setting u = (u0 + u1)/2:
v1 = (1 - α) · v0 + α · u0
v2 = (1 - α) · v1 + α · u1
v2 = (1 - α) · ((1 - α) · v0 + α · u0) + α · u1
v2 = (1 - α)² · v0 + (1 - α) · α · u0 + α · u1
v2 = (1 - 2α + α²) · v0 + (α - α²) · u0 + α · u1
v2 ≈ (1 - 2α) · v0 + 2α · u
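
A quick numeric check of the two-step merge above, as a minimal sketch. The helper name `ema_step` and the values of v0, u0, u1 are made up for illustration and are not taken from pycls. The gap between the exact and merged results should be on the order of α².

```python
def ema_step(v, u, alpha):
    """One EMA step in the update formulation: v = (1 - alpha) * v + alpha * u."""
    return (1 - alpha) * v + alpha * u


alpha = 0.001
v0, u0, u1 = 1.0, 2.0, 4.0  # arbitrary illustrative values

# Two exact steps, each with alpha.
v2_exact = ema_step(ema_step(v0, u0, alpha), u1, alpha)

# One merged step with 2 * alpha and the mean of the two inputs.
v2_merged = ema_step(v0, (u0 + u1) / 2, 2 * alpha)

# The dropped terms are alpha**2 * (v0 - u0), so the gap is of order alpha**2.
gap = abs(v2_exact - v2_merged)
print(v2_exact, v2_merged, gap)
```

With these values the only terms dropped are α² · v0 and α² · u0, which is why the gap scales with α² rather than α.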

The same holds for n >> 1 updates, not just 2, since for small α and αn << 1 the following holds:
(1 - α)ⁿ = 1 - αn + n(n-1)α²/2! - n(n-1)(n-2)α³/3! + ... [binomial expansion]
(1 - α)ⁿ ≈ 1 - αn + n²α²/2! - n³α³/3! + ... [n >> 1]
(1 - α)ⁿ ≈ 1 - αn [αn << 1]
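
To get a feel for when the first-order approximation is reasonable, here is a small sketch that compares (1 - α)ⁿ with 1 - αn for a few values of n. The function name `linearization_error` and the chosen values are illustrative only.

```python
def linearization_error(alpha, n):
    """Gap between (1 - alpha)**n and its first-order approximation 1 - alpha * n."""
    return (1 - alpha) ** n - (1 - alpha * n)


alpha = 0.001
for n in (2, 8, 32, 128):
    err = linearization_error(alpha, n)
    print(f"n={n:4d}  alpha*n={alpha * n:.3f}  error={err:.2e}")
```

The error grows roughly like n(n-1)α²/2, so the approximation degrades as αn moves away from zero.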

Thus, to make the update independent of batch size n, we specify α* (independent of batch size) and use α in the update step, where:
α = α* · n
This makes the EMA behavior roughly independent of the batch size n. Furthermore, it is not necessary to perform an update at every iteration. If we perform the update every k iterations, we effectively do an update after seeing n·k examples, and thus can use:
α = α* · n · k

Finally, to normalize by schedule length, we set:
α = α* · n · k / m
Where m = #epochs. Empirically, we find that setting α this way allows a fairly constant α* across schedule lengths, without needing to carefully tune it for each one. The logic isn't exactly equivalent for this step; it is more that your "history" is proportional across runs with different epoch lengths. [Note: need to make this explanation more precise.]
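
The scaling rule above can be sketched as a small helper. This is not the pycls implementation: the function name, argument names, and example values are hypothetical, and the clamp to 1.0 is an added assumption so the update stays a convex combination.

```python
def scaled_ema_alpha(alpha_star, batch_size, update_period, num_epochs):
    """Compute alpha = alpha* * n * k / m from the note above.

    alpha_star:    base rate, independent of batch size and schedule (alpha*)
    batch_size:    examples per iteration (n)
    update_period: iterations between EMA updates (k)
    num_epochs:    schedule length in epochs (m)
    """
    alpha = alpha_star * batch_size * update_period / num_epochs
    # Assumption: clamp so that v = (1 - alpha) * v + alpha * u never overshoots.
    return min(1.0, alpha)


base = scaled_ema_alpha(1e-5, batch_size=256, update_period=4, num_epochs=100)
doubled_batch = scaled_ema_alpha(1e-5, batch_size=512, update_period=4, num_epochs=100)
print(base, doubled_batch)
```

Doubling the batch size doubles α, which matches the idea that one update then stands in for twice as many examples.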

pdollar commented on July 29, 2024 4

formatting got lost in my previous post. here's a screenshot of my note with formatting preserved.

FateScript commented on July 29, 2024

Thanks @pdollar , I understand how the magic code works now. It's soooooo kind of you : )

BTW, I want to discuss this issue a bit more.
In my opinion, if the total number of images in the training process does not change, then
#iters = #epochs * #images per epoch / batch_size
so batch_size / #epochs can be written as k / #iters, where k is a constant that depends only on your dataset (here, #images per epoch) and can be absorbed into alpha.
Maybe adjust = update_period / total_iters is more intuitive? WDYT?
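
A small sketch of the comparison being suggested here. Both function names and all numbers are hypothetical. If total_iters is derived from a fixed dataset size, the two adjustment factors differ only by the constant #images per epoch.

```python
def adjust_batch_epoch(batch_size, num_epochs, update_period):
    """Adjustment in the n * k / m form from the note above."""
    return batch_size * update_period / num_epochs


def adjust_total_iters(total_iters, update_period):
    """Proposed alternative: update_period / total_iters."""
    return update_period / total_iters


images_per_epoch = 1_280_000  # illustrative dataset size
batch_size, num_epochs, update_period = 256, 100, 4
total_iters = num_epochs * images_per_epoch // batch_size

ratio = adjust_batch_epoch(batch_size, num_epochs, update_period) / adjust_total_iters(
    total_iters, update_period
)
print(total_iters, ratio)
```

The ratio equals images_per_epoch, so for a fixed dataset the two forms differ only by a constant rescaling of alpha.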

pdollar commented on July 29, 2024

Hey thanks for digging in deeper! I don't think I have time to adjust this or think more deeply, and we're already using this way of defining EMA for many models we have trained. I find it works really well, but more importantly, I wouldn't want to break backward compatibility at this stage even if the result was more intuitive! Thanks for the discussion/suggestions tho.
