A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.
lagom is a 'magic' word in Swedish, inte för mycket och inte för lite, enkelhet är bäst (not too much and not too little, simplicity is often the best). It is the philosophy on which this library was designed.
lagom balances flexibility and usability when developing reinforcement learning (RL) algorithms. The library is built on top of PyTorch and provides modular tools to quickly prototype RL algorithms. However, it does not go overboard: code that is too low level is often time consuming and prone to bugs, while code that is too high level loses flexibility, which makes it difficult to try out crazy ideas quickly.
We are continuously making lagom more 'self-contained' so that experiments can be set up and run quickly. It provides base classes for multiprocessing (a master-worker framework) for parallelization (e.g. experiments and evolution strategies). It also supports hyperparameter search by defining configurations as either grid search or random search.
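To make the difference between grid search and random search concrete, here is a minimal standard-library sketch. The helper names `grid_configs` and `random_configs` are hypothetical and are not lagom's configuration API; they only illustrate how each strategy turns a search space into a list of configuration dictionaries.

```python
import itertools
import random


def grid_configs(space):
    """Expand {name: list of choices} into every combination (grid search)."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in itertools.product(*(space[k] for k in keys))]


def random_configs(samplers, num_sample, seed=0):
    """Draw num_sample configurations; each sampler maps an RNG to one value."""
    rng = random.Random(seed)
    return [{name: sample(rng) for name, sample in samplers.items()}
            for _ in range(num_sample)]


# Grid search: 2 learning rates x 2 hidden sizes = 4 configurations.
grid = grid_configs({'lr': [1e-3, 1e-4], 'hidden_size': [32, 64]})

# Random search: log-uniform learning rate, categorical hidden size.
rand = random_configs({'lr': lambda rng: 10 ** rng.uniform(-5, -2),
                       'hidden_size': lambda rng: rng.choice([32, 64])},
                      num_sample=3)
```

Grid search grows multiplicatively with each added hyperparameter, whereas the number of random-search samples is chosen independently of the number of hyperparameters.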
We highly recommend using a Miniconda environment:
conda create -n lagom python=3.7
pip install -r requirements.txt
We also provide some bash scripts in the scripts/ directory to automatically set up the system configurations, conda environment and dependencies.
git clone https://github.com/zuoxingdong/lagom.git
cd lagom
pip install -e .
Installing from source allows you to flexibly modify and adapt the code as you please, which is very convenient for research purposes.
The documentation hosted by ReadTheDocs is available online at http://lagom.readthedocs.io
We implemented a collection of standard reinforcement learning algorithms at baselines using lagom.
A common pipeline for using lagom is as follows:
- Define your RL agent
- Define your environment
- Define your engine for training and evaluating the agent in the environment.
- Define your Configurations for hyperparameter search
- Define run(config, seed, device) for your experiment pipeline
- Call run_experiment(run, config, seeds, num_worker) to parallelize your experiments
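The last two steps can be sketched as follows. This is an illustration only: `run_experiment_sketch` is a hypothetical stand-in written with the standard library, not lagom's `run_experiment`, and it uses a thread pool for simplicity where the library describes a process-based master-worker framework. The body of `run` is a placeholder for building the agent, environment and engine.

```python
from concurrent.futures import ThreadPoolExecutor


def run(config, seed, device):
    """One experiment: a real run would build agent/env/engine and train here."""
    # Placeholder result so the sketch is runnable without any RL code.
    return {'lr': config['lr'], 'seed': seed, 'device': device}


def run_experiment_sketch(run, configs, seeds, num_worker, device='cpu'):
    """Call run once per (config, seed) pair using a pool of workers."""
    jobs = [(config, seed) for config in configs for seed in seeds]
    with ThreadPoolExecutor(max_workers=num_worker) as pool:
        futures = [pool.submit(run, config, seed, device) for config, seed in jobs]
        # Collect results in submission order, not completion order.
        return [future.result() for future in futures]


results = run_experiment_sketch(run, [{'lr': 1e-3}, {'lr': 1e-4}],
                                seeds=[0, 1], num_worker=2)
```

Keeping `run` a plain function of `(config, seed, device)` is what makes each job independent and therefore easy to hand to a worker.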
A graphical illustration is coming soon.
We provide a few simple examples.
We are using pytest for tests. Feel free to run via
pytest test -v
- 2019-03-04 (v0.0.3)
  - Much easier and cleaner APIs
- 2018-11-04 (v0.0.2)
  - More high-level API designs
  - More unit tests
- 2018-09-20 (v0.0.1)
  - Initial release
This repo is inspired by OpenAI Gym, OpenAI baselines, and OpenAI Spinning Up.
Please use this bibtex if you want to cite this repository in your publications:
@misc{lagom,
author = {Zuo, Xingdong},
title = {lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/zuoxingdong/lagom}},
}
lagom's Issues
Refactoring: top priority
- agents
- engine
- envs
- es
- experiment
- multiprocessing
- networks
- policies
- runner
- transform
- utils
- vis
- lagom
- example/es
- example/mdn
- example/vae
- example/pg
Move all subfolders within core under lagom directly
Update LinearSchedule
Support get_current
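One possible shape for this, as a minimal sketch: the constructor arguments (`initial`, `final`, `N`, `start`) are assumptions made for illustration and may not match lagom's actual `LinearSchedule`. The idea is that `__call__` computes the value for a step and caches it, and `get_current` returns the cached value without advancing anything.

```python
class LinearSchedule:
    """Linearly interpolate from `initial` to `final` over `N` steps, then hold `final`."""

    def __init__(self, initial, final, N, start=0):
        assert N > 0 and start >= 0
        self.initial = initial
        self.final = final
        self.N = N
        self.start = start
        self.x = None  # most recently computed value

    def __call__(self, step):
        # Fraction of the schedule completed, clipped to [0, 1].
        frac = min(max(step - self.start, 0) / self.N, 1.0)
        self.x = self.initial + frac * (self.final - self.initial)
        return self.x

    def get_current(self):
        """Return the last value produced by __call__ (None before the first call)."""
        return self.x


schedule = LinearSchedule(initial=1.0, final=0.0, N=10)
halfway = schedule(5)
```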
Add tests for VecStandardizeObservation and VecStandardizeReward
Update example: logging add config ID and seed, easier for DataFrame conversion in plotting; Also add number of params (trainable, untrainable)
[Examples/MDN]: refactor code to standard pipeline with .py files
Refactor MDN code to use lagom pipelines with .py files with Experiment, Engine etc.
- Change last_dim to last_feature_dim to be consistent with other networks and policies.
Update BaseAgent: add save/load
Support state-dependent DiagGaussianHead ?
Add MPI master/worker: consistent API with BaseMaster and BaseWorker
HistoryMetric
For both trajectory and segment, computing TD, GAE or something, extendable
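As a reference for what such a metric would compute, here is a minimal pure-Python sketch of generalized advantage estimation (GAE). The function name and signature are hypothetical, not lagom's API. It assumes the sequence contains no terminal state before its end; `last_value` is the bootstrap value for the state after the final step (use 0 if that state is terminal).

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Return GAE advantages A_t = delta_t + gamma * lam * A_{t+1}.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) is the one-step TD error.
    """
    advantages = [0.0] * len(rewards)
    next_value, next_adv = last_value, 0.0
    # Walk backwards so each step can reuse the advantage of its successor.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        next_adv = delta + gamma * lam * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages
```

With `lam=0` this reduces to the one-step TD errors, and with `lam=1` to discounted returns minus the value baseline, which is why a single extendable routine can cover both cases mentioned above.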
put value_functions and policies to networks folder
Use Miniconda, change scripts adapted from CI
Plotter: add `tight_layout`
Refactor codes to minimize obvious comments (distraction) complying with PEP8
Add master/worker for PyTorch multiprocessing
TODO: PyTorch variant, maybe unnecessary for RL ?
from torch.multiprocessing import Process
from torch.multiprocessing import Queue
# SimpleQueue sometimes better, it does not use additional threads
from torch.multiprocessing import SimpleQueue
Simplify PG code
- experiment.py: Only support train.timestep, remove count(), use while loop
Update logger: put '-'*50 into arg, maybe called header ?
Put ask_yes_or_no from experiment to utils
Remove Intel Coach in README
Remove BaseLogger, just Logger
Put env and vec_env in same place, and wrapper in envs/wrappers
Polish code in lagom.vis and write tests
Update Engine: Merge train and log_train, eval and log_eval into just train and eval
All lagom.transform: vectorized code to support batch
Add tqdm bar to parallelized experiment
The Experiment master creates a tqdm bar whose total is the number of configurations, and calls update for each process_algo_result after it finishes its job.
Remove A2C and replace with IMPALA
Add OpenAI-ES
import numpy as np

import torch
import torch.optim as optim

from lagom.es import BaseES
from lagom.transform import RankTransform


class OpenAIES(BaseES):
    r"""Implements OpenAI evolution strategies.

    .. note::

        In practice, the learning rate is better to be proportional to the batch size.
        i.e. for larger batch size, use larger learning rate and vice versa.

    """
    def __init__(self,
                 mu0,
                 std0,
                 popsize,
                 std_decay=0.999,
                 min_std=0.01,
                 lr=1e-3,
                 lr_decay=0.9999,
                 min_lr=1e-2,
                 antithetic=False,
                 rank_transform=True):
        r"""Initialize OpenAI-ES.

        Args:
            mu0 (ndarray): initial mean
            std0 (float): initial standard deviation
            popsize (int): population size
            std_decay (float): standard deviation decay
            min_std (float): minimum of standard deviation
            lr (float): learning rate
            lr_decay (float): learning rate decay
            min_lr (float): minimum of learning rate
            antithetic (bool): If True, then use antithetic sampling to generate population.
            rank_transform (bool): If True, then use rank transformation of fitness (combat with outliers).
        """
        self.mu0 = np.array(mu0)
        self.std0 = std0
        self.popsize = popsize
        self.std_decay = std_decay
        self.min_std = min_std
        self.lr = lr
        self.lr_decay = lr_decay
        # NOTE: min_lr is stored but not enforced by the scheduler below
        self.min_lr = min_lr
        self.antithetic = antithetic
        if self.antithetic:
            assert self.popsize % 2 == 0, 'popsize must be even for antithetic sampling. '
        self.rank_transform = rank_transform
        if self.rank_transform:
            self.rank_transformer = RankTransform()

        self.num_params = self.mu0.size
        self.mu = torch.from_numpy(self.mu0).float()
        self.mu.requires_grad = True  # requires gradient for optimizer to update
        self.std = self.std0

        self.optimizer = optim.Adam([self.mu], lr=self.lr)
        self.lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer=self.optimizer,
                                                             gamma=self.lr_decay)

        self.solutions = None
        self.best_param = None
        self.best_f_val = None
        self.hist_best_param = None
        self.hist_best_f_val = None

    def ask(self):
        # Generate standard Gaussian noise for perturbing model parameters.
        if self.antithetic:  # antithetic sampling
            eps = np.random.randn(self.popsize//2, self.num_params)
            eps = np.concatenate([eps, -eps], axis=0)
        else:
            eps = np.random.randn(self.popsize, self.num_params)
        # Record the noise for gradient computation in tell()
        self.eps = eps

        # Perturb the parameters
        self.solutions = self.mu.detach().numpy() + self.eps*self.std

        return list(self.solutions)

    def tell(self, solutions, function_values):
        # Enforce ndarray of function values
        function_values = np.array(function_values)
        if self.rank_transform:
            # Make a copy of original function values, for recording true values
            original_function_values = np.copy(function_values)
            # Use centered ranks instead of raw values, combat with outliers.
            function_values = self.rank_transformer(function_values, centered=True)

        # Sort function values and select the minimum, since we are minimizing the objective.
        idx = np.argsort(function_values)[0]  # argsort is in ascending order
        self.best_param = solutions[idx]
        if self.rank_transform:  # use rank transform, we should record the original function values
            self.best_f_val = original_function_values[idx]
        else:
            self.best_f_val = function_values[idx]

        # Update the historical best result
        first_iteration = self.hist_best_param is None or self.hist_best_f_val is None
        if first_iteration or self.best_f_val < self.hist_best_f_val:
            self.hist_best_f_val = self.best_f_val
            self.hist_best_param = self.best_param

        # Compute gradient from original paper
        # Enforce fitness as Gaussian distributed, here we use centered ranks
        F = (function_values - function_values.mean(-1))/(function_values.std(-1) + 1e-8)
        # Compute gradient, F:[popsize], eps: [popsize, num_params]
        grad = (1/self.std)*np.mean(np.expand_dims(F, 1)*self.eps, axis=0)
        grad = torch.from_numpy(grad).float()
        self.mu.grad = grad
        # Step the optimizer before the scheduler (the order expected by PyTorch >= 1.1)
        self.optimizer.step()
        self.lr_scheduler.step()

        if self.std > self.min_std:
            self.std = self.std_decay*self.std

    @property
    def result(self):
        results = {'best_param': self.best_param,
                   'best_f_val': self.best_f_val,
                   'hist_best_param': self.hist_best_param,
                   'hist_best_f_val': self.hist_best_f_val,
                   'stds': self.std}
        return results
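The centered rank transform that `tell()` relies on can be sketched in pure Python as below. This is an illustration of the general technique, not lagom's `RankTransform` implementation: `centered_ranks` is a hypothetical helper, and ties are broken by original position here, which may differ from the library.

```python
def centered_ranks(values):
    """Map values to ranks scaled into [-0.5, 0.5]; the smallest value gets -0.5."""
    n = len(values)
    if n == 1:
        return [0.0]
    # Indices of the values in ascending order (stable, so ties keep input order).
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order):
        ranks[i] = rank
    return [r / (n - 1) - 0.5 for r in ranks]


# The outlier 1000.0 no longer dominates: only its rank matters.
shaped = centered_ranks([10.0, -3.0, 1000.0])  # [0.0, -0.5, 0.5]
```

Because only the ordering of the fitness values survives, a single extreme value cannot dominate the gradient estimate, which is the outlier robustness mentioned in the docstring.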
Separate calculations e.g. TD, returns from Trajectory/Segment
Exhaustive unit test with pytest.mark.parametrize
ortho_init doesn't work with PyTorch 0.4.1
When trying to run VAE example got this:
Traceback (most recent call last):
File "/Users/dkorduban/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/Users/dkorduban/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/Users/dkorduban/workspace/sc2/lagom/lagom/core/multiprocessing/base_worker.py", line 47, in __call__
task_id, result = self.work(master_cmd)
File "/Users/dkorduban/workspace/sc2/lagom/lagom/experiment/base_experiment_worker.py", line 55, in work
result = algo(config, seed, device_str=device_str)
File "/Users/dkorduban/workspace/sc2/lagom/examples/vae/algo.py", line 58, in __call__
model = ConvVAE(config=config)
File "/Users/dkorduban/workspace/sc2/lagom/lagom/core/networks/base_network.py", line 66, in __init__
self.init_params(self.config)
File "/Users/dkorduban/workspace/sc2/lagom/examples/vae/network.py", line 107, in init_params
ortho_init(layer, nonlinearity='relu', constant_bias=0.0)
File "/Users/dkorduban/workspace/sc2/lagom/lagom/core/networks/init.py", line 34, in ortho_init
if isinstance(module, (nn.RNNBase, nn.RNNCellBase)): # RNN
AttributeError: module 'torch.nn' has no attribute 'RNNCellBase'
Dmytros-MacBook-Pro:vae dkorduban$ pip freeze | grep torch
torch==0.4.1
torchvision==0.2.1
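One possible workaround, sketched here under assumptions, is to look the RNN base classes up by name and keep only those that the installed version actually defines, instead of referencing `nn.RNNCellBase` directly. The helper `available_types` is hypothetical, and a stand-in namespace replaces `torch.nn` so the example runs without PyTorch; in `ortho_init` the namespace would be `torch.nn`.

```python
import types


def available_types(namespace, names):
    """Return the classes among `names` that `namespace` actually defines."""
    found = (getattr(namespace, name, None) for name in names)
    return tuple(t for t in found if isinstance(t, type))


# Stand-in for an older torch.nn that lacks RNNCellBase (for illustration only).
class RNNBase:
    pass


fake_nn = types.SimpleNamespace(RNNBase=RNNBase)

# Only RNNBase is found; the missing RNNCellBase is skipped instead of raising.
rnn_types = available_types(fake_nn, ['RNNBase', 'RNNCellBase'])
is_rnn = isinstance(RNNBase(), rnn_types)
```

`isinstance` accepts a tuple of classes, including an empty one, so the check degrades gracefully when none of the names exist.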
Update GaussianPolicy: support min/max bound of std, use trick from
Use tricks from https://arxiv.org/pdf/1805.12114.pdf
Protect master branch
Add CEM
Same API style as CMAES, e.g. {'popsize': 32, 'seed': 1}, and also cem.result as a namedtuple.
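A minimal pure-Python sketch of what this could look like, assuming an ask/tell interface like the one used for OpenAI-ES above. The class layout, the `num_elite` and `min_std` options, and the `CEMResult` fields are all assumptions for illustration, not lagom's eventual API. Each iteration samples a Gaussian population, keeps the elites with the lowest objective values, and refits the mean and standard deviation to them.

```python
import random
import statistics
from collections import namedtuple

CEMResult = namedtuple('CEMResult', ['best_param', 'best_f_val'])


class CEM:
    """Cross-entropy method with a diagonal Gaussian, minimizing the objective."""

    def __init__(self, mu0, std0, opts):
        self.mu = list(mu0)
        self.std = [std0] * len(self.mu)
        self.popsize = opts['popsize']
        self.num_elite = opts.get('num_elite', max(2, self.popsize // 4))
        self.min_std = opts.get('min_std', 1e-3)  # floor to avoid total collapse
        self.rng = random.Random(opts['seed'])
        self.result = None

    def ask(self):
        return [[self.rng.gauss(m, s) for m, s in zip(self.mu, self.std)]
                for _ in range(self.popsize)]

    def tell(self, solutions, function_values):
        ranked = sorted(zip(function_values, solutions), key=lambda pair: pair[0])
        elites = [solution for _, solution in ranked[:self.num_elite]]
        # Refit the sampling distribution to the elite set, one dimension at a time.
        self.mu = [statistics.mean(dim) for dim in zip(*elites)]
        self.std = [max(statistics.pstdev(dim), self.min_std) for dim in zip(*elites)]
        best_f, best_solution = ranked[0]
        if self.result is None or best_f < self.result.best_f_val:
            self.result = CEMResult(best_param=best_solution, best_f_val=best_f)


def shifted_sphere(x, target=(1.0, -1.0)):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


cem = CEM(mu0=[0.0, 0.0], std0=2.0, opts={'popsize': 32, 'seed': 1})
for _ in range(30):
    solutions = cem.ask()
    cem.tell(solutions, [shifted_sphere(s) for s in solutions])
```

Returning a namedtuple keeps `cem.result.best_param` readable while still allowing tuple unpacking.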
Refactor Network & Module: maybe remove make_params/init_params/reset (?) maybe remove BasePolicy ?
Use ABC for all base classes
This enforces strict API inheritance for subclasses, better OOP management.
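A short sketch of the effect, using Python's standard `abc` module. The `BaseAgent` method names here are chosen for illustration and are not necessarily those of lagom's base classes.

```python
from abc import ABC, abstractmethod


class BaseAgent(ABC):
    """Hypothetical base class: subclasses must implement every abstract method."""

    def __init__(self, config, device):
        self.config = config
        self.device = device

    @abstractmethod
    def choose_action(self, obs):
        ...

    @abstractmethod
    def learn(self, batch):
        ...


class RandomAgent(BaseAgent):
    def choose_action(self, obs):
        return 0

    def learn(self, batch):
        return {}


class IncompleteAgent(BaseAgent):
    # learn() is missing, so instantiating this class raises TypeError.
    def choose_action(self, obs):
        return 0


agent = RandomAgent(config={}, device='cpu')
```

The error surfaces at instantiation time rather than when the missing method is first called, which is what makes the API contract strict.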
CNN use new `torch.no_grad` to save memory for flatten dimension calculation
Merge Experiment and Algorithm classes and remove the Algo class.
Put this API in Experiment:
def __call__(self, config, seed, device):
    r"""Run the algorithm with a configuration, a random seed and a PyTorch device.

    Args:
        config (dict): a dictionary of configuration items
        seed (int): a random seed to run the algorithm
        device (torch.device): a PyTorch device.

    Returns
    -------
    out : object
        output of the algorithm execution. If there is nothing to return, then ``None`` should be returned.
    """
    pass
Put `device` as argument to many classes
Require device as an argument to the classes that might use GPU/CPU (e.g. the Experiment worker). This makes it safer and easier to ensure everything is on CUDA or CPU together, without having to worry about writing .to(device) in many places.
Update Configurator: remove dataframe_groupview & dataframe_subset
Re-implement with new TimeLimit concerns ?
According to the paper "Time Limits in Reinforcement Learning", maybe it's better to adapt the implementation of metrics and PG algorithms?
Take out agent & env out of Runner, put agent and env into __call__ argument
Policy class inherit from nn.Module
Swedish
Have you lived in Sweden? Do you speak Swedish?
[Runner]: separate running and converting
Decouple the sequential running from the conversion: first collect a list of data, then use a separate function to convert everything to a batch of Trajectory or Segment. This might make the API more generic and flexible.
TODO files:
Add TensorEnvWrapper
- Output Observation: convert to tensor and device
- Input Action: convert to numpy array from tensor
Agent handles optimizer and lr_scheduler inside class
Add __getitem__ to Trajectory
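A minimal sketch of how indexing could behave, assuming a Trajectory stores its transitions in a list. The `Transition` fields and the `add` method are hypothetical stand-ins for lagom's own classes. Integer indices return a single transition, and slices return a new Trajectory.

```python
from collections import namedtuple

Transition = namedtuple('Transition', ['s', 'a', 'r', 's_next', 'done'])


class Trajectory:
    def __init__(self):
        self.transitions = []

    def add(self, transition):
        self.transitions.append(transition)

    def __len__(self):
        return len(self.transitions)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing keeps the container type so downstream code can reuse it.
            sub = Trajectory()
            sub.transitions = self.transitions[index]
            return sub
        return self.transitions[index]


traj = Trajectory()
for t in range(3):
    traj.add(Transition(s=t, a=0, r=float(t), s_next=t + 1, done=(t == 2)))

last = traj[-1]   # a single Transition
tail = traj[1:]   # a Trajectory holding the last two transitions
```

Defining `__getitem__` with integer indices also makes the object iterable through Python's legacy sequence protocol, so `for transition in traj` works without a separate `__iter__`.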
Upgrade ReadTheDocs YAML
# .readthedocs.yml
version: 2
python:
  version: 3.7
  install:
    - method: pip
      path: .
Roadmap 0.0.2
Here we list some todos for the next release, 0.0.2:
- Global make_env and return EnvSpec. Similar to OpenAI baselines, handle all kinds of environments and make use of functools.partial to return argument-free functions.
- Support most of the functions in Network in the Policy/Agent classes:
  - to(device)
  - num_params
  - train()/eval()
  - There might be more than one network in one policy.
  - How to group all networks so they are trackable with internal methods (ModuleList?), such as num_params with all networks together.
- New logger: avoid a hierarchical structure that mixes lists, dictionaries and ndarrays, because pickling it will be extremely slow. Keep only the top level as a dictionary. Add a function similar to add_tabular.
- Where to handle dtype conversion from numpy to Tensor. Suggested in Agent.choose_action.
- Support VecEnv
  - StackObservation
  - VecWrapper
  - VecNormalize
- Adapt all standard Agents to both single Env and VecEnv
- Write a function to automatically split config IDs with a key
- Write __repr__ for string representation, e.g. Transition/Segment, EnvSpec...
- Add GAE to Trajectory and Segment
- Add non-rolling VecEnv, returning zero for terminated sub-environments, and update TrajectoryRunner to make it more efficient: remove argument N, keep only T.