The goal of training an agent is for it to play the game by itself until it finds an appropriate way, i.e. a good policy, to play the game well.
The agent must find a good policy to survive, using some algorithm or model. Basically, this is a control and optimization problem: we control the agent so that it keeps improving its policy, episode after episode, until it obtains the best returns.
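
The "play by itself until it finds a good policy" loop can be sketched on a toy problem. The example below is only an illustration and is not taken from the references: the five-cell corridor environment, the `step`, `train` and `greedy_policy` helpers, and all hyperparameter values are assumptions made for this sketch. It uses tabular Q-learning, a temporal-difference method, as one possible way to improve the policy from experience.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # move one cell left or one cell right


def step(state, action):
    """Toy environment (assumed for this sketch): returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only when the goal is reached
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Let the agent play by itself and update a Q-table after every move."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes (and on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Read off the learned policy: the best action in each non-terminal state."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the learned policy moves right (`+1`) in every non-terminal cell, which is the shortest path to the goal. A real game would replace `step` with an actual environment and usually needs a function approximator instead of a table.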
- Environments (envs)
- Policy
- Reward signal
- Value function
- Model
- ...
- Dynamic Programming
- Monte Carlo Methods
- Temporal-Difference learning
- ...
- Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA.
- Mnih, V., Kavukcuoglu, K., Silver, D. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M.A. (2013). Playing Atari with Deep Reinforcement Learning. ArXiv, abs/1312.5602.