Hi! Thanks for sharing your inspiring work. I have some questions about the scale factors for the losses in the paper.
In your paper, you set $\lambda_{lce}^i=\frac{1+1/\alpha}{\beta} \lambda_{lce}$, $\lambda_{hkd}^i=\alpha \beta \lambda_{hkd}$, and $\lambda_{rkd}^i=\alpha \beta \lambda_{rkd}$. This is a little confusing to me. As I understand it, $\alpha$ grows as the number of new classes increases, so I would expect more weight to go to the $L_{lce}$ loss. Instead, the factor on $L_{lce}$ shrinks as $\alpha$ grows, while the factors on the distillation losses grow, so more weight goes to maintaining the old representation space. Am I misunderstanding something? If not, could you share more of the intuition behind these settings?
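To make the question concrete, here is a small sketch of how I read the three formulas. It is only my interpretation, not the code from the repository: `scaled_weights` is a hypothetical helper, $\alpha$ and $\beta$ are treated as plain positive numbers (I am not reproducing their definitions from the paper), and the base weights default to 1.0 purely as placeholders.

```python
def scaled_weights(alpha, beta, lam_lce=1.0, lam_hkd=1.0, lam_rkd=1.0):
    """Per-step loss weights as written in the formulas quoted above.

    alpha, beta: positive scalars (their definitions are NOT reproduced here).
    lam_*: base weights; the 1.0 defaults are placeholders, not paper values.
    """
    return {
        "lce": (1.0 + 1.0 / alpha) / beta * lam_lce,  # shrinks as alpha grows
        "hkd": alpha * beta * lam_hkd,                # grows linearly in alpha
        "rkd": alpha * beta * lam_rkd,                # grows linearly in alpha
    }


# Sweep alpha with beta held fixed to see which loss gains weight.
beta = 1.0
for alpha in (0.5, 1.0, 2.0, 4.0):
    w = scaled_weights(alpha, beta)
    print(f"alpha={alpha}: lce={w['lce']:.3f}, hkd={w['hkd']:.3f}, rkd={w['rkd']:.3f}")
```

With $\beta$ fixed, the $L_{lce}$ factor decreases toward $1/\beta$ while the two distillation factors increase without bound, which is the opposite of what I expected.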