Difference Critic

Motivation

The goal is to train an RL system that learns a difference of value functions in order to perform effectively under simulation and approximation errors; in other words, when there is a mismatch between the simulated and target domains. This addresses the OpenAI Request for Research problem "Difference of Value Functions".

What's New

The main idea comes from a 1997 paper, Differential Training of Rollout Policies by Bertsekas. The paper introduces a technique called differential training and argues that, under simulation and approximation error, learning a difference of value functions can do better than learning vanilla value functions.

Instead of learning a difference of value functions as suggested by Bertsekas, in this work I introduce a variant of DDPG (Deep Deterministic Policy Gradients) which, instead of learning a Q(state, action) function, learns a difference-of-Q function Q(state1, action1, state2, action2) that approximates the difference of expected Q-values between two (state, action) pairs under the current policy. We use the gradient from this function to train the policy network in DDPG.
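As a minimal sketch of the idea (not the repository's actual code — the linear critic, dimensions, and helper names below are illustrative assumptions), the difference critic can be regressed toward a TD-style target built from the *difference* of rewards along two trajectories:

```python
import numpy as np

STATE_DIM, ACTION_DIM = 4, 2
GAMMA = 0.99
rng = np.random.default_rng(0)

# Hypothetical linear difference critic: Qdiff(s1, a1, s2, a2) ≈ w · [s1, a1, s2, a2]
w = np.zeros(2 * (STATE_DIM + ACTION_DIM))

def qdiff(s1, a1, s2, a2):
    """Approximate Q(s1, a1) - Q(s2, a2) under the current policy."""
    return w @ np.concatenate([s1, a1, s2, a2])

def td_target(r1, r2, s1_next, s2_next, policy):
    # Differential Bellman target: the reward difference plus the
    # discounted difference critic evaluated at the next state pair.
    return (r1 - r2) + GAMMA * qdiff(s1_next, policy(s1_next),
                                     s2_next, policy(s2_next))

def critic_update(batch, policy, lr=1e-2):
    """One squared-error gradient step toward the TD target for each
    pair of transitions (s, a, r, s_next) sampled from replay."""
    global w
    for (s1, a1, r1, s1n), (s2, a2, r2, s2n) in batch:
        x = np.concatenate([s1, a1, s2, a2])
        error = td_target(r1, r2, s1n, s2n, policy) - qdiff(s1, a1, s2, a2)
        w += lr * error * x
```

In the DDPG variant described above, `qdiff` would be a neural network and the actor would be updated along the gradient of `qdiff` with respect to its first action argument.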

Implementation Details

The mismatch between simulated and target domains is modeled using Mujoco agents with varying torso masses, similar to EPOpt. As in EPOpt, we train on an ensemble of robot models.
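The ensemble can be sketched as sampling a torso-mass multiplier per model, EPOpt-style. The nominal mass, multiplier range, and function names here are illustrative assumptions, not values from the paper or this repository:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_torso_mass(nominal_mass=6.25, low=0.5, high=1.5):
    """Perturb the nominal torso mass by a uniform multiplier."""
    return nominal_mass * rng.uniform(low, high)

def make_model_ensemble(n_models, nominal_mass=6.25):
    """One perturbed torso mass per model; at training time each
    episode would run in a simulator configured with one of these."""
    return [sample_torso_mass(nominal_mass) for _ in range(n_models)]
```

Training episodes are then drawn across the ensemble, so the policy never overfits to a single body configuration.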

We use the Mujoco physics simulator for training on the HalfCheetah-v1 environment.

We use a Tensorflow Eager adaptation of OpenAI Baselines for Deep Deterministic Policy Gradients (DDPG) as the baseline.

This model has been ported to Tensorflow Eager, which gives us a more Pythonic expression of the model (define-by-run as opposed to define-and-run) and makes it easier to debug in many cases.

Installation instructions

  1. Install OpenAI Gym and Mujoco (needs a software license).
  2. Install Tensorflow from the nightly build (TF Eager requires a nightly build unless you have Tensorflow >= 1.5)
  3. Install pybullet
  4. Install numpy
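Assuming a standard pip environment, the steps above correspond roughly to the following (Mujoco itself must be installed separately with its license key):

```shell
# 1. Gym (Mujoco support additionally needs mujoco-py and a license)
pip install gym
# 2. TensorFlow nightly (or any release >= 1.5 for Eager support)
pip install tf-nightly
# 3-4. Remaining dependencies
pip install pybullet numpy
```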

Future work

Apply the concept of differential training to other Deep RL methods and see if this gives us benefits in the presence of simulation error.
