The sa_ppo from huanzhang12

sa_ppo's Issues

Version mismatch?

When I run the code, I get an error.
In your code, env.reset() is expected to return a single array, but it actually returns a tuple. Is this a gym version mismatch?
Which gym version were you using?
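
For anyone hitting the same error: gym 0.26 and later (and gymnasium) changed env.reset() to return an (observation, info) tuple and env.step() to return five values instead of four, which would explain the tuple. Below is a minimal sketch of a compatibility shim, assuming that is the cause here. The helper names reset_compat and step_compat are hypothetical and not part of this repository or of gym.

```python
def reset_compat(env, **kwargs):
    """Return only the observation from env.reset(), for old and new gym APIs."""
    result = env.reset(**kwargs)
    # Newer API: reset() returns (observation, info) where info is a dict.
    # Assumption: an old-API observation is never itself a 2-tuple ending in a dict.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    # Older API: reset() returns the observation directly.
    return result


def step_compat(env, action):
    """Return the old-style 4-tuple (obs, reward, done, info) from env.step()."""
    result = env.step(action)
    if len(result) == 5:
        # Newer API: (obs, reward, terminated, truncated, info).
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    # Older API already returns (obs, reward, done, info).
    return result
```

Calling reset_compat(env) wherever the code calls env.reset() would keep the rest of the training loop unchanged. Pinning gym to an older release is the other option, but which version the authors used is exactly the open question in this issue.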

Robust deep reinforcement learning with adversarial attacks.

Hi, in your paper you compare your results with 'Robust deep reinforcement learning with adversarial attacks.', but I can't find the code for that baseline. Could you provide it? Thanks.

Undocumented implementation of SGLD

You provide an implementation of SGLD that is substantially different from what you describe in the article. The main differences are:

The scaling factor of the random noise is not consistent with the article. To my understanding, it should be np.sqrt(2 * beta / step_eps) instead of np.sqrt(2 * step_eps * beta).
The scheduling of the temperature by a factor 1/(i+2)^2 looks arbitrary and is not explained anywhere in the code or the article.
The sign of the gradient is used instead of its actual value. I guess this design choice is inspired by PGD (a.k.a. multi-iteration FGSM), but it is not motivated anywhere. Your algorithm should rather be called PGD-SGLD than plain SGLD.
The initialization of the perturbation is not zero, which is critical because it avoids the saddle point issue completely. The critical advantage of doing so has been discussed here. So I think the SGLD noise on the gradient is not even relevant anymore. Sadly, you do not motivate the choice of the initialization either.
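
To make the four points concrete, here is a rough NumPy sketch of the kind of update being described: a signed, noisy gradient step with a decaying temperature, a non-zero start, and projection onto the eps-ball. It is my reading of the description above, not the repository's code. The names noise_scales, sign_sgld_perturb and grad_fn are hypothetical, and the random signed first step is only one assumed way to get a non-zero initialization.

```python
import numpy as np


def noise_scales(step_eps, beta):
    """Return the two noise scalings contrasted in this issue."""
    as_reported = np.sqrt(2 * step_eps * beta)   # scaling reported for the code
    as_proposed = np.sqrt(2 * beta / step_eps)   # scaling suggested in this issue
    return as_reported, as_proposed


def sign_sgld_perturb(grad_fn, x, eps, steps=10, beta=1e-5, seed=0):
    """Sign-gradient, Langevin-style search inside the L-infinity ball of radius eps.

    grad_fn(z) must return the gradient of the objective to maximize at z.
    """
    rng = np.random.default_rng(seed)
    step_eps = eps / steps
    # Non-zero start (assumed form): one signed step in a random direction.
    delta = np.sign(rng.standard_normal(x.shape)) * step_eps
    for i in range(steps):
        # Temperature decayed by 1/(i+2)^2, as described in the issue.
        beta_i = beta / (i + 2) ** 2
        noise = np.sqrt(2 * step_eps * beta_i) * rng.standard_normal(x.shape)
        # Only the sign of the noisy gradient is used, as in PGD.
        delta = delta + step_eps * np.sign(grad_fn(x + delta) + noise)
        # Project back onto the eps-ball around x.
        delta = np.clip(delta, -eps, eps)
    return x + delta
```

With step_eps = 0.01 and beta = 1e-5, the two scalings returned by noise_scales differ by a factor of 1/step_eps = 100, so the choice matters in practice. Because only the sign of the noisy gradient is kept, the noise changes the update only when it is large enough to flip a gradient component's sign.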
