diddiz / donk.ai

Reinforcement Learning Toolbox

License: MIT License

Python 99.09% Dockerfile 0.91%
reinforcement-learning reinforcement-learning-algorithms machine-learning industrial industry-40 guided-policy-search

donk.ai's Introduction

Guided Policy Search Toolbox

This repository provides an implementation of the Guided Policy Search (GPS) algorithm for trajectory-centric Reinforcement Learning.

Guided Policy Search

TODO

Features

LQR

The linear–quadratic regulator (LQR) computes cost-optimal linear controllers.

import numpy as np

from donk.dynamics import linear_dynamics
from donk.samples import StateDistribution
from donk.traj_opt import ILQR

X, U = ... # Samples
cost_function = ... # Define cost function
prev_pol = ... # Previous policy, reference for the KL constraint

# Fit dynamics
dyn = linear_dynamics.fit_lr(X, U)
# Quadratic costs
costs = cost_function.quadratic_approximation(np.mean(X, axis=0), np.mean(U, axis=0))
# Initial state distribution
x0 = StateDistribution.fit(X[:, 0])

# Perform LQR trajectory optimization
ilqr = ILQR(dyn, prev_pol, costs, x0)
pol = ilqr.optimize(kl_step=1).policy # Use a KL constraint of 1
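For intuition about what the trajectory optimization computes, here is a minimal sketch of a plain finite-horizon LQR backward pass in NumPy. It is not donk.ai's ILQR: it assumes known, time-invariant, noise-free dynamics and has no KL constraint, and the function name `lqr_backward_pass` and the toy system are hypothetical.

```python
import numpy as np


def lqr_backward_pass(A, B, Q, R, T):
    """Finite-horizon discrete-time LQR via the Riccati recursion.

    Dynamics: x_{t+1} = A x_t + B u_t
    Cost:     sum over t of (x_t' Q x_t + u_t' R u_t) / 2, plus x_T' Q x_T / 2
    Returns a list of gains K_t such that u_t = K_t x_t.
    """
    P = Q.copy()  # Hessian of the value function at the final timestep
    gains = []
    for _ in range(T):
        # Minimizing the Q-function over u gives K = -(R + B'PB)^-1 B'PA
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update of the value function Hessian
        P = Q + A.T @ P @ (A + B @ K)
        gains.append(K)
    return gains[::-1]  # Reverse so that gains[t] belongs to timestep t


# Hypothetical toy system: double integrator (position, velocity), force input
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)

gains = lqr_backward_pass(A, B, Q, R, T=50)

# Roll out the closed loop from an initial offset
x0 = np.array([1.0, 0.0])
x = x0.copy()
for K in gains:
    x = A @ x + B @ (K @ x)
```

The resulting controller is linear in the state at every timestep, which is the sense in which LQR yields linear controllers.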

Quadratic Cost Approximation

LQR trajectory optimization requires a quadratic cost function. Donk.ai can automatically compute quadratic approximations using symbolic differentiation via sympy.

import numpy as np

from donk.costs import SymbolicCostFunction

def cost_function_(X, U):
    """Define cost function.

    Args:
        X: (T+1, dX) ndarray of symbols for states
        U: (T, dU) ndarray of symbols for action
    """

    # Cost depends on the last three states
    costs = np.sum(X[:, -3:]**2, axis=-1) / 2

    # And some small action costs
    costs[:-1] += 1e-3 * np.sum(U**2, axis=-1) / 2
    return costs

# Wrap into symbolic cost function, which handles symbolic differentiation.
cost_function = SymbolicCostFunction(cost_function_, T, dX, dU)

# Quadratic approximation around a specific trajectory.
cost_approx = cost_function.quadratic_approximation(X, U)

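As cost_function_ is called with sympy symbols, it should only use operations that also work on symbolic expressions, such as arithmetic and np.sum.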
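To illustrate the idea behind a symbolic quadratic approximation, the following sketch uses sympy directly rather than donk.ai's SymbolicCostFunction. The toy cost, the expansion point `z0` and the helper `quadratic_approximation` are assumptions made for this example only.

```python
import numpy as np
import sympy as sp

# Symbols for a 2D state and a 1D action (hypothetical toy problem)
x = sp.symbols("x0:2")
u = sp.symbols("u0:1")
args = [*x, *u]

# A non-quadratic cost, so the approximation is not trivially exact
cost = x[0] ** 2 / 2 + sp.cosh(x[1]) + 1e-3 * u[0] ** 2 / 2

# Symbolic gradient (1 x 3) and Hessian (3 x 3) w.r.t. the stacked vector
grad = sp.Matrix([cost]).jacobian(args)
hess = sp.hessian(cost, args)

# Compile the symbolic expressions into numeric functions
cost_fn = sp.lambdify(args, cost, "numpy")
grad_fn = sp.lambdify(args, grad, "numpy")
hess_fn = sp.lambdify(args, hess, "numpy")

# Expand around a specific state-action point
z0 = np.array([1.0, 0.0, 2.0])
c0 = float(cost_fn(*z0))
g = np.asarray(grad_fn(*z0), dtype=float).ravel()
H = np.asarray(hess_fn(*z0), dtype=float)


def quadratic_approximation(z):
    """Second-order Taylor expansion of the cost around z0."""
    d = np.asarray(z) - z0
    return c0 + g @ d + d @ H @ d / 2
```

Close to `z0` the quadratic model and the true cost agree up to higher-order terms; further away the approximation degrades, which is one reason trajectory optimization steps are usually kept small.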

TVLG Dynamics

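A time-varying linear Gaussian (TVLG) dynamics model dyn: LinearDynamics consists of a linear term dyn.K, a constant term dyn.k and a covariance dyn.covar. The transitions are modelled as a series of Gaussian distributions

$$p(x_{t+1} \mid x_t, u_t) = \mathcal{N}\left(K_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + k_t,\; \Sigma_t\right)$$

where the distribution of the next state depends on the current state and action. Here $K_t$, $k_t$ and $\Sigma_t$ denote the linear term, constant term and covariance at timestep $t$.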

Linear dynamics models can be fitted with or without a prior distribution.

Linear Regression

from donk.dynamics import linear_dynamics

X, U = ... # Samples


dyn = linear_dynamics.fit_lr(X, U, regularization=1e-6)

It is important to employ a regularization term when using no prior, as otherwise the resulting covariances might not be symmetric positive definite.
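The role of the regularization term can be sketched for a single timestep as follows. This is one common way to fit a linear Gaussian model, by conditioning an empirical joint Gaussian, and is not necessarily how fit_lr is implemented; the function `fit_linear_gaussian` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear_gaussian(xu, x_next, regularization=1e-6):
    """Fit x_next ~ N(F @ xu + f, covar) from N samples of one timestep.

    xu:     (N, dX + dU) stacked states and actions
    x_next: (N, dX) next states
    """
    d_in = xu.shape[1]
    joint = np.hstack([xu, x_next])
    mean = joint.mean(axis=0)
    sigma = np.cov(joint, rowvar=False)

    # Regularize the input block so it stays invertible even with few
    # or strongly correlated samples
    sigma[:d_in, :d_in] += regularization * np.eye(d_in)

    # Condition the joint Gaussian on the inputs
    F = np.linalg.solve(sigma[:d_in, :d_in], sigma[:d_in, d_in:]).T
    f = mean[d_in:] - F @ mean[:d_in]
    covar = sigma[d_in:, d_in:] - F @ sigma[:d_in, :d_in] @ F.T
    covar = (covar + covar.T) / 2  # Remove numerical asymmetry
    return F, f, covar


# Synthetic data from a known linear system with Gaussian noise
rng = np.random.default_rng(0)
F_true = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.1]])
f_true = np.array([0.0, -0.05])
xu = rng.normal(size=(200, 3))
x_next = xu @ F_true.T + f_true + 0.1 * rng.normal(size=(200, 2))

F, f, covar = fit_linear_gaussian(xu, x_next)
np.linalg.cholesky(covar)  # Succeeds only for a symmetric positive definite matrix
```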

Linear Regression with NIW-Prior

from donk.dynamics import linear_dynamics
from donk.dynamics.prior import GMMPrior
from donk.samples import TransitionPool

X, U = ... # Samples
X_prior, U_prior = ... # Data for prior

# Initialize transition pool
transitions_pool = TransitionPool()
transitions_pool.add(X_prior, U_prior)
transitions_pool.add(X, U) # Also add current sample data

# Update prior
prior = GMMPrior(n_clusters=4)
prior.update(transitions_pool.get_transitions())

# Fit dynamics
dyn = linear_dynamics.fit_lr(X, U, prior=prior)

Provided the prior is built from sufficient data, regularization of the dynamics is not required.

Datalogging

Insight into internals and interim results of sub-algorithms is often crucial for understanding and debugging. Therefore, donk.ai provides a context-based datalogging functionality, using Python shelves.

Example

import donk.datalogging as datalogging


def inner_fun(a, b):
    "Inner function with datalogging"
    c = a + b
    datalogging.log(c=c) # Log interim result
    return c**2

with datalogging.ShelveDataLogger("logfile"):
    a = 1
    b = 2

    datalogging.log(a=a, b=b)

    # Call function within context
    with datalogging.Context("inner_fun"):
        d = inner_fun(a, b)
    datalogging.log(d=d)

    # Call function without context
    inner_fun(a, b)

Read shelves

The resulting shelf contains the logged data and can be opened via the shelve module.

import shelve

with shelve.open('logfile') as data:
    for key, value in data.items():
        print(f"{key}: {value}")

results in:

a: 1
b: 2
inner_fun/c: 3
d: 9
c: 3
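Since keys logged inside a context appear with a `context/` prefix in the output above, entries can be grouped by context when reading a shelf. The sketch below assumes that "/" is the separator, as the output suggests; `group_by_context` is a hypothetical helper, and the shelf is a stand-in written with the standard library so that the example is self-contained.

```python
import os
import shelve
import tempfile
from collections import defaultdict


def group_by_context(path):
    """Group shelf entries by the context prefix of their key."""
    grouped = defaultdict(dict)
    with shelve.open(path) as data:
        for key, value in data.items():
            # "inner_fun/c" -> context "inner_fun", name "c"; top level -> ""
            context, _, name = key.rpartition("/")
            grouped[context][name] = value
    return dict(grouped)


# Stand-in shelf holding the same keys as the example output
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logfile")
    with shelve.open(path) as data:
        data.update({"a": 1, "b": 2, "inner_fun/c": 3, "d": 9, "c": 3})
    grouped = group_by_context(path)
```

Top-level entries end up under the empty string key, and entries from the `inner_fun` context under `"inner_fun"`.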

donk.ai's Issues

Write Readme

Flesh out readme.

  • GPS
  • LQR
  • Quadratic Cost Approximation
  • TVLG Dynamics
  • Datalogging

Abstract GPS algorithm class

Introduce abstract class for GPS algorithm.
Imperative algorithm definition works, but is not beginner friendly.

Is an abstract class the most flexible approach?

Add automatic sample normalization

Manual state normalization is ugly at the moment.
State normalization is highly desired, though.
Therefore implement functionality for automatic sample normalization.

Refactor timesteps in cost functions

Cost functions should only have T timesteps.
The initial state should not incur costs, as it can't be influenced.

Therefore, the cost at timestep t should depend on the action at t and the next state at t+1.
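A sketch of what a cost function could look like under this proposed convention, written in plain NumPy. This illustrates the proposal only and is not existing donk.ai behaviour; the weights mirror the README example and the dimensions are arbitrary.

```python
import numpy as np


def cost_function_(X, U):
    """Return T costs, one per action and the state it leads to.

    Args:
        X: (T+1, dX) ndarray of states
        U: (T, dU) ndarray of actions
    """
    # The cost at timestep t uses the next state X[t+1]; X[0] is skipped
    state_costs = np.sum(X[1:, -3:] ** 2, axis=-1) / 2
    # ... and the action U[t] taken at timestep t
    action_costs = 1e-3 * np.sum(U ** 2, axis=-1) / 2
    return state_costs + action_costs


T, dX, dU = 5, 4, 2
X = np.ones((T + 1, dX))
U = np.ones((T, dU))
costs = cost_function_(X, U)

# Changing the initial state leaves the costs untouched
X_shifted = X.copy()
X_shifted[0] += 10.0
costs_shifted = cost_function_(X_shifted, U)
```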
