- Progress & Compress: A scalable framework for continual learning [arxiv] & [notes]
- Playing hard exploration games by watching YouTube [arxiv] & [notes]
- DORA The Explorer: Directed Outreaching Reinforcement Action-Selection [arxiv] & [notes]
- Gotta Learn Fast: A New Benchmark for Generalization in RL [arxiv] & [notes]
- An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling [arxiv] & [notes]
- Generative Multi-Agent Behavioral Cloning [arxiv] & [notes]
- World Models [arxiv] & [notes]
- Semi-parametric Topological Memory for Navigation [arxiv] & [notes]
- A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay [arxiv] & [notes]
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor [arxiv] & [notes]