zuoxingdong / lagom

lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.

License: MIT License

Languages: Python 15.07%, Shell 0.17%, Jupyter Notebook 84.76%
Topics: reinforcement-learning, pytorch, machine-learning, python, research, deep-learning, artificial-intelligence, policy-gradient, evolution-strategies, deep-reinforcement-learning, deep-deterministic-policy-gradient, ddpg, td3, soft-actor-critic, mujoco, proximal-policy-optimization, ppo, cem, cmaes, sac

lagom's Introduction

lagom

A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.

lagom is a 'magic' word in Swedish, "inte för mycket och inte för lite, enkelhet är bäst" (not too much and not too little, simplicity is often best). This is the philosophy on which the library was designed.

Why use lagom?

lagom balances flexibility and usability when developing reinforcement learning (RL) algorithms. The library is built on top of PyTorch and provides modular tools to quickly prototype RL algorithms. However, it does not go overboard: working at too low a level is time consuming and prone to bugs, while working at too high a level sacrifices the flexibility needed to try out unconventional ideas quickly.

We are continuously making lagom more 'self-contained' so that experiments can be set up and run quickly. It provides base classes for multiprocessing (a master-worker framework) to parallelize work such as experiments and evolution strategies. It also supports hyperparameter search, with configurations defined either as a grid search or a random search.
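For instance, a grid-search configuration could look like the sketch below, assuming the Config and Grid helpers in lagom.experiment (exact names and APIs may differ between versions; a similar wrapper is used for random search):

from lagom.experiment import Config, Grid

# Hypothetical configuration: every value wrapped in Grid() is expanded
# into one experiment per combination (grid search).
config = Config(
    {'log.freq': 10,
     'agent.lr': Grid([1e-3, 3e-4]),
     'nn.sizes': Grid([[64, 64], [128, 128]]),
     'train.timestep': int(1e6)})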

Table of Contents

  • Installation
  • Documentation
  • RL Baselines
  • How to use lagom
  • Examples
  • Test
  • What's new
  • Reference

Installation

We highly recommend using a Miniconda environment:

conda create -n lagom python=3.7

Install dependencies

pip install -r requirements.txt

We also provide bash scripts in the scripts/ directory to automatically set up the system configuration, conda environment and dependencies.

Install lagom from source

git clone https://github.com/zuoxingdong/lagom.git
cd lagom
pip install -e .

Installing from source allows you to modify and adapt the code as you please, which is very convenient for research purposes.

Documentation

The documentation hosted by ReadTheDocs is available online at http://lagom.readthedocs.io

RL Baselines

We have implemented a collection of standard reinforcement learning algorithms with lagom in baselines.

How to use lagom

A typical lagom pipeline is as follows:

  1. Define your RL agent
  2. Define your environment
  3. Define your engine for training and evaluating the agent in the environment
  4. Define your configurations for hyperparameter search
  5. Define run(config, seed, device) for your experiment pipeline
  6. Call run_experiment(run, config, seeds, num_worker) to parallelize your experiments

A graphical illustration is coming soon.
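A minimal sketch of steps 5 and 6, assuming the Config/Grid helpers and the run_experiment signature from the step list above (exact APIs may differ between versions; the run body is a placeholder where you would build your env, agent and engine):

from lagom.experiment import Config, Grid, run_experiment

def run(config, seed, device):
    # Normally: build the environment, agent and engine here, train for the
    # configured number of timesteps, and return any logs to be saved.
    # This placeholder just echoes the inputs to show the pipeline shape.
    print(f'running config={config}, seed={seed}, device={device}')
    return None

if __name__ == '__main__':
    config = Config({'agent.lr': 1e-3,
                     'train.timestep': Grid([int(1e5), int(1e6)])})
    run_experiment(run=run, config=config, seeds=[1, 2, 3], num_worker=4)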

Examples

We provide a few simple examples.

Test

We use pytest for tests. Feel free to run them via:

pytest test -v

What's new

  • 2019-03-04 (v0.0.3)

    • Much easier and cleaner APIs
  • 2018-11-04 (v0.0.2)

    • More high-level API designs
    • More unit tests
  • 2018-09-20 (v0.0.1)

    • Initial release

Reference

This repo is inspired by OpenAI Gym, OpenAI baselines and OpenAI Spinning Up.

Please use this bibtex if you want to cite this repository in your publications:

@misc{lagom,
  author = {Zuo, Xingdong},
  title = {lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/zuoxingdong/lagom}},
}

lagom's People

Contributors

zuoxingdong, mitch354, lkylych


lagom's Issues

Refactoring: top priority

  • agents
  • engine
  • envs
  • es
  • experiment
  • multiprocessing
  • networks
  • policies
  • runner
  • transform
  • utils
  • vis
  • lagom
  • example/es
  • example/mdn
  • example/vae
  • example/pg

HistoryMetric

For both Trajectory and Segment, compute metrics such as TD errors and GAE; the design should be extendable.
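As a reference for what such a metric might compute, here is a minimal NumPy sketch of GAE over one trajectory (rewards, values and a bootstrap value are assumed given; this is illustrative, not lagom's actual implementation):

import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation over a single trajectory.
    # rewards: [T], values: [T], last_value: bootstrap V(s_T) (0 if terminal).
    values = np.append(values, last_value)
    deltas = rewards + gamma * values[1:] - values[:-1]  # one-step TD errors
    advantages = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages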

Add master/worker for PyTorch multiprocessing

TODO: a PyTorch variant; maybe unnecessary for RL?

from torch.multiprocessing import Process
from torch.multiprocessing import Queue
# SimpleQueue is sometimes better; it does not use additional threads
from torch.multiprocessing import SimpleQueue
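A minimal sketch of what such a master/worker pair could look like with these primitives (illustrative only, not the library's actual base classes):

from torch.multiprocessing import Process, SimpleQueue

def worker(task_queue, result_queue):
    # Pull tasks until the master sends the 'close' sentinel.
    while True:
        task = task_queue.get()
        if task == 'close':
            break
        task_id, x = task
        result_queue.put((task_id, x * x))  # placeholder "work"

if __name__ == '__main__':
    task_queue, result_queue = SimpleQueue(), SimpleQueue()
    p = Process(target=worker, args=(task_queue, result_queue))
    p.start()
    for i in range(4):
        task_queue.put((i, i))
    results = [result_queue.get() for _ in range(4)]
    task_queue.put('close')
    p.join()
    print(results)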

Simplify PG code

  • experiment.py: only support train.timestep, remove count(), and use a while loop

Add OpenAI-ES

import numpy as np

import torch
import torch.optim as optim

from lagom.es import BaseES

from lagom.transform import RankTransform


class OpenAIES(BaseES):
    r"""Implements OpenAI evolution strategies.
    
    .. note::
    
        In practice, the learning rate should be roughly proportional to the batch size,
        i.e. for a larger batch size use a larger learning rate, and vice versa. 
        
    """
    def __init__(self, 
                 mu0, 
                 std0, 
                 popsize,
                 std_decay=0.999, 
                 min_std=0.01,
                 lr=1e-3, 
                 lr_decay=0.9999, 
                 min_lr=1e-2, 
                 antithetic=False,
                 rank_transform=True):
        r"""Initialize OpenAI-ES. 
        
        Args:
            mu0 (ndarray): initial mean
            std0 (float): initial standard deviation
            popsize (int): population size
            std_decay (float): standard deviation decay
            min_std (float): minimum of standard deviation
            lr (float): learning rate
            lr_decay (float): learning rate decay
            min_lr (float): minimum of learning rate
            antithetic (bool): If True, then use antithetic sampling to generate population.
            rank_transform (bool): If True, then use rank transformation of fitness (combat with outliers). 
        """
        self.mu0 = np.array(mu0)
        self.std0 = std0
        self.popsize = popsize
        self.std_decay = std_decay
        self.min_std = min_std
        self.lr = lr
        self.lr_decay = lr_decay
        self.min_lr = min_lr
        self.antithetic = antithetic
        if self.antithetic:
            assert self.popsize % 2 == 0, 'popsize must be even for antithetic sampling. '
        self.rank_transform = rank_transform
        if self.rank_transform:
            self.rank_transformer = RankTransform()
        
        self.num_params = self.mu0.size
        self.mu = torch.from_numpy(self.mu0).float()
        self.mu.requires_grad = True  # requires gradient for optimizer to update
        self.std = self.std0
        self.optimizer = optim.Adam([self.mu], lr=self.lr)
        self.lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer=self.optimizer, 
                                                             gamma=self.lr_decay)
        
        self.solutions = None
        self.best_param = None
        self.best_f_val = None
        self.hist_best_param = None
        self.hist_best_f_val = None
    
    def ask(self):
        # Generate standard Gaussian noise for perturbing the model parameters. 
        if self.antithetic:  # antithetic sampling
            eps = np.random.randn(self.popsize//2, self.num_params)
            eps = np.concatenate([eps, -eps], axis=0)
        else:
            eps = np.random.randn(self.popsize, self.num_params)
        # Record the noise for gradient computation in tell()
        self.eps = eps
        
        # Perturb the parameters
        self.solutions = self.mu.detach().numpy() + self.eps*self.std
        
        return list(self.solutions)
        
    def tell(self, solutions, function_values):
        # Enforce ndarray of function values
        function_values = np.array(function_values)
        if self.rank_transform:
            # Make a copy of original function values, for recording true values
            original_function_values = np.copy(function_values)
            # Use centered ranks instead of raw values to combat outliers. 
            function_values = self.rank_transformer(function_values, centered=True)
            
        # Sort function values and select the minimum, since we are minimizing the objective. 
        idx = np.argsort(function_values)[0]  # argsort is in ascending order
        self.best_param = solutions[idx]
        if self.rank_transform:  # use rank transform, we should record the original function values
            self.best_f_val = original_function_values[idx]
        else:
            self.best_f_val = function_values[idx]
        # Update the historical best result
        first_iteration = self.hist_best_param is None or self.hist_best_f_val is None
        if first_iteration or self.best_f_val < self.hist_best_f_val:
            self.hist_best_f_val = self.best_f_val
            self.hist_best_param = self.best_param
            
        # Compute the gradient estimator from the original paper
        # Standardize the fitness values (here: centered ranks) to zero mean and unit variance
        F = (function_values - function_values.mean(-1))/(function_values.std(-1) + 1e-8)
        # Compute gradient, F:[popsize], eps: [popsize, num_params]
        grad = (1/self.std)*np.mean(np.expand_dims(F, 1)*self.eps, axis=0)
        grad = torch.from_numpy(grad).float()
        self.mu.grad = grad
        self.lr_scheduler.step()
        self.optimizer.step()
        
        if self.std > self.min_std:
            self.std = self.std_decay*self.std
        
    @property
    def result(self):
        results = {'best_param': self.best_param, 
                   'best_f_val': self.best_f_val, 
                   'hist_best_param': self.hist_best_param, 
                   'hist_best_f_val': self.hist_best_f_val,
                   'stds': self.std}
        
        return results
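For illustration, an ask/tell loop on a toy quadratic (the sphere function is just an example objective; note the class minimizes) might look like this:

import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

es = OpenAIES(mu0=np.full(10, 3.0), std0=1.0, popsize=64,
              lr=1e-2, antithetic=True, rank_transform=True)
for generation in range(100):
    solutions = es.ask()
    function_values = [sphere(s) for s in solutions]
    es.tell(solutions, function_values)
print(es.result['hist_best_f_val'])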

ortho_init doesn't work with PyTorch 0.4.1

When trying to run the VAE example, I got this:

Traceback (most recent call last):
  File "/Users/dkorduban/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/Users/dkorduban/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/dkorduban/workspace/sc2/lagom/lagom/core/multiprocessing/base_worker.py", line 47, in __call__
    task_id, result = self.work(master_cmd)
  File "/Users/dkorduban/workspace/sc2/lagom/lagom/experiment/base_experiment_worker.py", line 55, in work
    result = algo(config, seed, device_str=device_str)
  File "/Users/dkorduban/workspace/sc2/lagom/examples/vae/algo.py", line 58, in __call__
    model = ConvVAE(config=config)
  File "/Users/dkorduban/workspace/sc2/lagom/lagom/core/networks/base_network.py", line 66, in __init__
    self.init_params(self.config)
  File "/Users/dkorduban/workspace/sc2/lagom/examples/vae/network.py", line 107, in init_params
    ortho_init(layer, nonlinearity='relu', constant_bias=0.0)
  File "/Users/dkorduban/workspace/sc2/lagom/lagom/core/networks/init.py", line 34, in ortho_init
    if isinstance(module, (nn.RNNBase, nn.RNNCellBase)):  # RNN
AttributeError: module 'torch.nn' has no attribute 'RNNCellBase'
Dmytros-MacBook-Pro:vae dkorduban$ pip freeze | grep torch
torch==0.4.1
torchvision==0.2.1

Add CEM

Same API style as CMAES, e.g. {'popsize': 32, 'seed': 1}, and cem.result as a namedtuple
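A minimal sketch of what a CEM with the same ask/tell style could look like (illustrative only; the real implementation would subclass BaseES and return result as a namedtuple as described above):

import numpy as np
from collections import namedtuple

CEMResult = namedtuple('CEMResult', ['xbest', 'fbest'])

class CEM:
    def __init__(self, mu0, std0, popsize, elite_ratio=0.2, seed=1):
        self.mu = np.array(mu0, dtype=np.float64)
        self.std = np.full_like(self.mu, std0)
        self.popsize = popsize
        self.num_elite = max(1, int(popsize * elite_ratio))
        self.rng = np.random.RandomState(seed)
        self.xbest, self.fbest = None, np.inf

    def ask(self):
        # Sample a population around the current mean.
        self.solutions = self.mu + self.std * self.rng.randn(self.popsize, self.mu.size)
        return list(self.solutions)

    def tell(self, solutions, function_values):
        # Keep the elite (lowest function values, i.e. minimization)
        # and refit the Gaussian to them.
        idx = np.argsort(function_values)[:self.num_elite]
        elite = np.asarray(solutions)[idx]
        self.mu, self.std = elite.mean(axis=0), elite.std(axis=0) + 1e-8
        if function_values[idx[0]] < self.fbest:
            self.fbest = function_values[idx[0]]
            self.xbest = np.asarray(solutions)[idx[0]]

    @property
    def result(self):
        return CEMResult(xbest=self.xbest, fbest=self.fbest)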

Merge Experiment and Algorithm classes, remove Algo class

Put this API in Experiment:

    def __call__(self, config, seed, device):
        r"""Run the algorithm with a configuration, a random seed and a PyTorch device.
        
        Args:
            config (dict): a dictionary of configuration items
            seed (int): a random seed to run the algorithm
            device (torch.device): a PyTorch device. 
            
        Returns:
            out (object): output of the algorithm execution. If there is nothing to return, ``None`` should be returned. 
        """
        pass

Put `device` as argument to many classes

Pass device to the classes that might use GPU/CPU. This makes it safer and easier to keep everything on CUDA or CPU together, without having to write .to(device) in many places.

e.g. Experiment worker
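A small sketch of the intended pattern (hypothetical class, not lagom's API):

import torch

class RolloutBuffer:
    def __init__(self, device):
        # The device is fixed once at construction, so every tensor the
        # class creates already lives on the right device.
        self.device = device
        self.observations = []

    def add(self, obs):
        self.observations.append(
            torch.as_tensor(obs, dtype=torch.float32, device=self.device))

buffer = RolloutBuffer(device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'))
buffer.add([0.1, 0.2, 0.3])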

Svenska

Have you lived in Sweden? Do you speak Swedish?

[Runner]: separate running and converting

Decouple the sequential running from the conversion: first collect a list of raw data, then use a separate function to convert everything into a batch of Trajectory or Segment objects. This might make the API more generic and flexible.
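A rough sketch of the decoupling (the Trajectory/Segment names follow the issue text; everything else, including the old-style gym step signature, is a hypothetical illustration):

def rollout(env, agent, T):
    # Step 1: sequentially collect raw (obs, action, reward, done) tuples.
    data = []
    obs = env.reset()
    for _ in range(T):
        action = agent.choose_action(obs)
        next_obs, reward, done, info = env.step(action)
        data.append((obs, action, reward, done))
        obs = env.reset() if done else next_obs
    return data

def to_trajectories(data):
    # Step 2: convert the flat list into per-episode chunks (Trajectory-like).
    trajectories, current = [], []
    for transition in data:
        current.append(transition)
        if transition[-1]:  # done flag
            trajectories.append(current)
            current = []
    if current:
        trajectories.append(current)
    return trajectories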

Add TensorEnvWrapper

  • Output observation: convert to a tensor and move it to the device
  • Input action: convert from a tensor to a numpy array
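A sketch of such a wrapper built on gym.Wrapper (the class name and the old-style 4-tuple step signature are assumptions, not the final API):

import gym
import torch

class TensorEnvWrapper(gym.Wrapper):
    def __init__(self, env, device):
        super().__init__(env)
        self.device = device

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        return torch.as_tensor(obs, dtype=torch.float32, device=self.device)

    def step(self, action):
        # Input action: convert from a tensor to a numpy array for the wrapped env.
        if torch.is_tensor(action):
            action = action.detach().cpu().numpy()
        obs, reward, done, info = self.env.step(action)
        # Output observation: convert to a tensor and move it to the device.
        obs = torch.as_tensor(obs, dtype=torch.float32, device=self.device)
        return obs, reward, done, info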

Roadmap 0.0.2

Here we list some todos for next release of 0.0.2

  • Global make_env that returns an EnvSpec. Similar to OpenAI baselines, handle all kinds of environments and use functools.partial to return argument-free functions

  • Move most of the Network functionality into the Policy/Agent class:

    1. to(device)
    2. num_params
    3. train()/eval()
    4. There might be more than one network in a policy.
    5. How to group all networks so they are trackable with internal methods (ModuleList?), e.g. num_params over all networks together.
  • New logger: avoid a hierarchical structure mixing lists, dictionaries and ndarrays; pickling it will be extremely slow. Keep only the top level as a dictionary. Add a function similar to add_tabular.

  • Decide where to handle the dtype conversion from numpy to Tensor; Agent.choose_action is suggested (see the sketch after this list)

  • Support VecEnv

    1. StackObservation
    2. VecWrapper
    3. VecNormalize
  • Adapt all standard Agents to both a single Env and VecEnv

  • Write a function to automatically split config IDs with a key

  • Write __repr__ for string representation, e.g. Transition/Segment, EnvSpec...

  • Add GAE to Trajectory and Segment

  • Add a non-rolling VecEnv that returns zeros for terminated sub-environments, and update TrajectoryRunner to make it more efficient: remove the argument N and keep only T.
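Regarding the numpy-to-Tensor conversion item above, a sketch of handling it inside choose_action might look like this (hypothetical agent with a placeholder policy, not the final API):

import numpy as np
import torch

class RandomAgent:
    def __init__(self, action_dim, device):
        self.action_dim = action_dim
        self.device = device

    def choose_action(self, obs):
        # Convert the numpy observation to a float Tensor on the agent's device
        # here, so callers never have to worry about dtypes or devices.
        obs = torch.as_tensor(np.asarray(obs), dtype=torch.float32, device=self.device)
        action = torch.randn(self.action_dim, device=self.device)  # placeholder policy
        return action.cpu().numpy()

agent = RandomAgent(action_dim=2, device=torch.device('cpu'))
print(agent.choose_action(np.zeros(4)))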
