aidynamicaction / rcognita

rcognita is a flexibly configurable framework for agent-environment simulation with a menu of predictive and safe reinforcement learning controllers

License: MIT License

Python 99.31% Makefile 0.33% Batchfile 0.36%
reinforcement-learning prediction-model python simulation control-systems

rcognita's Introduction

About

rcognita is a flexibly configurable framework for agent-environment simulation with a menu of predictive and safe reinforcement learning controllers. Detailed documentation is available here.

Example run with a mobile robot simulation

https://raw.githubusercontent.com/AIDynamicAction/rcognita/d0c7d1173b51e0ed5df044cf1fb1c92eca53d819/gfx/demo/3wheel_robot_exm_run.gif

Installation

Basic

Run in terminal:

pip3 install rcognita

Alternatively, one can install the package directly from the master branch. The following instructions are for Unix-based systems, assuming a terminal and a Python 3 interpreter.

git clone https://github.com/AIDynamicAction/rcognita
cd rcognita
python3 setup.py install

Note that your Python 3 interpreter might be called something else, say, just python.

With model estimation tools

The package was tested with online model estimation using SIPPY. The respective functionality is implemented and enabled via is_est_model. Related parameters can be found in the documentation of the CtrlOptPred class.

Installing dependencies

To install SIPPY, first take care of the dependencies:

Ubuntu/Debian:
sudo apt-get install -y build-essential gfortran cmake libopenblas-dev
Arch:
pacman -Sy gcc gcc-fortran cmake base-devel openblas

Installing scikit-build

pip install scikit-build

or, using Anaconda,

conda install scikit-build

Installing rcognita with SIPPY

pip3 install rcognita[SIPPY]

General description

The rcognita Python package is designed for hybrid simulation of agents and environments (generally speaking, not necessarily reinforcement learning agents). Its main idea is to have an explicit implementation of sampled controls with a user-defined sampling time. The package consists of several modules, namely controllers, loggers, models, simulator, systems, utilities and visuals, plus a collection of main modules (presets), one for each agent-environment configuration.

This flowchart shows interaction of the core rcognita classes contained in the said modules (the latter are not shown on the diagram).

The main module is a preset, on the flowchart a 3-wheel robot. It initializes the system (the environment), the controllers (the agents, e.g., a safe agent, a benchmarking agent, a reinforcement learning agent etc.), the visualization engine called animator, the logger and the simulator. The simulator is a multi-purpose device for simulating agent-environment loops of different types (specified by sys_type).

Depending on sys_type, the environment can either be described by a differential equation (including stochastic ones), a difference equation (for discrete-time systems), or by a probability distribution (for, e.g., Markov decision processes).

For differential equations, the parameter dt determines the maximal step size of the numerical solver. The main method of the Simulator class is sim_step, which performs one solver step, whereas reset re-initializes the simulator after an episode.
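The roles of sim_step and reset can be illustrated with a minimal sketch. The class below is a hypothetical toy, not rcognita's Simulator: the name EulerSimulator, its constructor arguments and the fixed-step explicit Euler scheme are all assumptions made for illustration (a real solver would treat dt only as an upper bound on the step size).

```python
class EulerSimulator:
    """Toy fixed-step simulator (hypothetical, NOT rcognita's Simulator class)."""

    def __init__(self, rhs, state_init, t0=0.0, dt=0.01):
        self.rhs = rhs                      # right-hand side: (t, state) -> derivative
        self.state_init = list(state_init)  # kept so that reset() can restore it
        self.t0 = t0
        self.dt = dt                        # step size used by this toy solver
        self.reset()

    def sim_step(self):
        """Advance the state by one explicit Euler step of size dt."""
        deriv = self.rhs(self.t, self.state)
        self.state = [x + self.dt * dx for x, dx in zip(self.state, deriv)]
        self.t += self.dt

    def reset(self):
        """Re-initialize time and state, e.g., after an episode."""
        self.t = self.t0
        self.state = list(self.state_init)


# Usage: exponential decay dx/dt = -x, simulated for 100 steps.
sim = EulerSimulator(lambda t, x: [-x[0]], state_init=[1.0], dt=0.01)
for _ in range(100):
    sim.sim_step()
```

Calling sim.reset() afterwards restores the initial time and state, which is what an episode loop would do between runs.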

The Logger class is an interface that defines stubs of a print-to-console method, print_sim_step, and a print-to-file method, log_data_row. Concrete loggers implement these methods.
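A sketch of this interface pattern, assuming the two methods are named print_sim_step and log_data_row, might look as follows. The argument lists (stream, t, state, action) and the CsvLogger subclass are hypothetical and chosen only to keep the example self-contained.

```python
import csv
import io


class Logger:
    """Interface with stubs; concrete loggers override both methods."""

    def print_sim_step(self, t, state, action):
        raise NotImplementedError

    def log_data_row(self, stream, t, state, action):
        raise NotImplementedError


class CsvLogger(Logger):
    """Hypothetical concrete logger writing comma-separated rows."""

    def print_sim_step(self, t, state, action):
        # Print-to-console method
        print(f"t = {t:.2f}, state = {state}, action = {action}")

    def log_data_row(self, stream, t, state, action):
        # Print-to-file method: one row per simulation step
        csv.writer(stream).writerow([t, *state, *action])


# Usage: log one row into an in-memory buffer instead of a real file.
buffer = io.StringIO()
logger = CsvLogger()
logger.log_data_row(buffer, 0.1, [1.0, 2.0], [0.5])
```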

A similar class inheritance scheme is used in Animator and System. The core data of Animator’s subclasses are objects, which include the entities to be updated on the screen, and their parameters, stored in pars.

A concrete realization of the system interface must implement sys_dyn, which is the “right-hand side” of the environment description, optionally disturbance dynamics via disturb_dyn, optionally controller dynamics (if the controller is, e.g., time-varying), and the output function out. The method receive_action gets a control action and stores it. Everything is packed together in closed_loop_rhs for use in Simulator.
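As a rough illustration of this interface, the sketch below implements sys_dyn, out, receive_action and closed_loop_rhs for a simple kinematic model. The method signatures, the constructor and the Unicycle class are assumptions for illustration and may differ from the actual systems module; disturbance and controller dynamics are left out.

```python
import math


class System:
    """Hypothetical system interface; subclasses provide sys_dyn and out."""

    def __init__(self, action_init):
        self.action = list(action_init)

    def sys_dyn(self, t, state, action):
        raise NotImplementedError

    def out(self, state):
        raise NotImplementedError

    def receive_action(self, action):
        # Store the control action until the next one arrives
        self.action = list(action)

    def closed_loop_rhs(self, t, state):
        # Right-hand side seen by the simulator: dynamics under the stored action
        return self.sys_dyn(t, state, self.action)


class Unicycle(System):
    """Kinematic model with state (x, y, theta) and action (v, omega)."""

    def sys_dyn(self, t, state, action):
        _, _, theta = state
        v, omega = action
        return [v * math.cos(theta), v * math.sin(theta), omega]

    def out(self, state):
        # Full-state observation
        return list(state)


# Usage: drive straight along the x-axis while turning slowly.
plant = Unicycle(action_init=[0.0, 0.0])
plant.receive_action([1.0, 0.5])
derivative = plant.closed_loop_rhs(0.0, [0.0, 0.0, 0.0])
```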

Finally, the controllers module contains various agent types. One of them is CtrlOptPred – the class of predictive objective-optimizing agents (model-predictive control and predictive reinforcement learning) as shown in this flowchart. Notice it contains an explicit specification of the sampling time dt.

The method _critic computes a model of something related to the value, e.g., the value function, Q-function or advantage. In turn, _critic_cost defines a cost (loss) function to fit the critic (commonly based on temporal difference errors). The method _critic_optimizer actually optimizes the critic cost. The principle is analogous for the actor, except that it optimizes an objective along a prediction horizon. The details can be found in the code documentation. The method compute_action essentially watches the internal clock and performs an action update when a time sample has elapsed.
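The clock-watching behavior of compute_action can be sketched as below. This is a simplified illustration, not the CtrlOptPred implementation: the class name SampledController, the attribute ctrl_clock and the policy callable are hypothetical.

```python
class SampledController:
    """Holds the last action and recomputes it only once per sampling period."""

    def __init__(self, policy, dt, t0=0.0):
        self.policy = policy    # callable: observation -> action
        self.dt = dt            # controller sampling time
        self.ctrl_clock = t0    # time of the last action update
        self.action = None

    def compute_action(self, t, observation):
        time_in_sample = t - self.ctrl_clock
        if self.action is None or time_in_sample >= self.dt:
            # A time sample has elapsed: restart the clock and update the action
            self.ctrl_clock = t
            self.action = self.policy(observation)
        # Otherwise the previously computed action is held
        return self.action


# Usage: the simulator may call the controller more often than every dt.
ctrl = SampledController(policy=lambda obs: -obs, dt=0.1)
a0 = ctrl.compute_action(0.0, 1.0)    # first call, action computed
a1 = ctrl.compute_action(0.05, 2.0)   # within the sample, action held
a2 = ctrl.compute_action(0.1, 3.0)    # sample elapsed, action updated
```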

Auxiliary modules of the package are models and utilities which provide auxiliary functions and data structures, such as neural networks.

Usage

After the package is installed, you may simply run one of the presets found here with Python, say,

python3 main_3wrobot_NI.py

This will call the preset with default settings, which are described in the preset itself.

The naming convention is main_ACRONYM, where ACRONYM is actually related to the system (environment). You may create your own by analogy.

For configuration of hyper-parameters, just call help on the required preset, say,

python3 main_3wrobot_NI.py -h

Settings

Some key settings are described below (a full description is available via the -h option).

Parameter          Type     Description
ctrl_mode          string   Controller mode
dt                 number   Controller sampling time
t1                 number   Final time
state_init         list     Initial state
is_log_data        binary   Flag to log data
is_visualization   binary   Flag to produce graphical output
is_print_sim_step  binary   Flag to print simulation step data
is_est_model       binary   If a model of the system is to be estimated online
Nactor             integer  Horizon length (in steps) for predictive controllers
stage_obj_struct   string   Structure of running objective function
Ncritic            integer  Critic stack size (number of TDs)
gamma              number   Discount factor
critic_struct      string   Structure of critic features
actor_struct       string   Structure of actor features

Advanced customization

  • Custom environments: implement the system interface in the systems module. You might need nominal controllers for that, as well as an animator, a logger etc.
  • Custom running cost: adjust rcost in controllers
  • Custom AC method: the simplest way is to add a new mode and update _actor_cost, _critic_cost and, possibly, _actor, _critic. For deep net AC structures, use, say, PyTorch
  • Custom model estimator: so far, the framework offers a state-space model structure. You may use any other one. In case of neural nets, use, e.g., PyTorch

Experimental things

An interface for dynamical controllers, which can be considered as extensions of the system state vector, is provided in _ctrl_dyn of the systems module. RL is usually understood as a static controller, i.e., one that assigns actions directly to outputs. A dynamical controller does this indirectly, via an internal state as an intermediate link. Dynamical controllers can overcome some limitations of static controllers.

Closing remarks

Please contact me for any inquiries and don't forget to give me credit for usage of this code. If you are interested in stacked Q-learning, kindly read the paper.

Original author: P. Osinenko, 2020

Bibtex reference

@misc{rcognita2020,
author =   {Pavel Osinenko},
title =    {Rcognita: a framework for hybrid agent-environment simulation},
howpublished = {\url{https://github.com/AIDynamicAction/rcognita}},
year = {2020}
}

rcognita's People

Contributors

ebolotin6, kefir8888, kompaso, osinenkop, yaremenko8

rcognita's Issues

Sort old data

Check everything for correctness and necessity.
Split into different readable folders.
All data on kompaso's ssd.

Disturbance dynamics

Right now, the full state vector in the closed-loop function of the system interface contains components related to the disturbance, even if the latter is switched off. A case distinction is needed:

  1. is_disturb => dim_full_state = dim_state + dim_disturb
  2. not is_disturb => dim_full_state = dim_state
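The case distinction above could be sketched as two small helpers. The function names full_state_dim and split_full_state are hypothetical, and the assumption that disturbance components are appended after the state components is made only for illustration.

```python
def full_state_dim(dim_state, dim_disturb, is_disturb):
    """Dimension of the full state vector used in the closed-loop function."""
    return dim_state + dim_disturb if is_disturb else dim_state


def split_full_state(full_state, dim_state, is_disturb):
    """Split the full state into (state, disturbance); empty if switched off."""
    state = list(full_state[:dim_state])
    disturb = list(full_state[dim_state:]) if is_disturb else []
    return state, disturb


# Usage: a system with 3 state components and 2 disturbance components.
dim_on = full_state_dim(3, 2, is_disturb=True)
dim_off = full_state_dim(3, 2, is_disturb=False)
state, disturb = split_full_state([1.0, 2.0, 3.0, 0.1, 0.2], 3, is_disturb=True)
```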

Inline RST documentation does not follow conventional practices. Info field lists should be utilized.

A great deal of code is currently documented in an unconventional fashion. To be more precise, there seems to be a tendency to use headings to describe attributes, parameters and returned values, as opposed to using info field lists.

For instance, consider the docstring for rcognita.controllers.ctrl_selector:

    Main interface for various controllers.
        Parameters
        ----------
        mode : : string
            Controller mode as acronym of the respective control method.
        Returns
        -------
        action : : array of shape ``[dim_input, ]``.
            Control action.

The conventional way to produce a docstring bearing such information would be:

    Main interface for various controllers.

    :param str mode: Controller mode as acronym of the respective control method.
    :return: Control action
    :rtype: array of shape ``[dim_input, ]``

Constructor arguments recorded as "attributes" in class docstrings.

__init__ should have a docstring of its own. The "Attributes" section in class docstrings is reserved for attributes. Violating this convention really messes up the wiki. We should fix that (preferably by next release I think) and from now on proceed to document new classes conventionally.
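A small sketch of the proposed layout is given below. The class ExampleController and its members are hypothetical. The class docstring describes attributes only (here with Sphinx :ivar: fields), while the constructor arguments are documented in the __init__ docstring with :param: fields.

```python
class ExampleController:
    """Toy controller used only to illustrate the docstring layout.

    :ivar dt: Controller sampling time.
    :ivar ctrl_clock: Time of the last action update.
    """

    def __init__(self, dt, t0=0.0):
        """Set up the controller.

        :param float dt: Controller sampling time.
        :param float t0: Initial time of the internal clock.
        """
        self.dt = dt
        self.ctrl_clock = t0


# Usage: both docstrings are available to documentation tools separately.
controller = ExampleController(dt=0.1)
init_doc = ExampleController.__init__.__doc__
class_doc = ExampleController.__doc__
```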

CASADI integration

  • Completely refactor the code: separate the symbolic case and numerical case and move the code out of the controller.py
  • Create tests for CASADI integration
  • Make some benchmarks and create a table of comparison

Refactor presets in a pipeline style

A framework for testing and reference data generation has now been implemented. To make presets testable and to improve code readability, the presets in that framework were implemented with a pipeline approach, which proved to be a good pattern for presets. To move Rcognita to the new preset implementation pattern, it is necessary to:

  • Implement a CLI-interface consistent with the pipeline
  • Refactor and implement pipelines for all presets

ROS harnesses

Create a ROS_harnesses.py module to separate a ROS preset and a ROS setting utility

Architecture refactoring

  • no switch cases inside classes, only on pipeline or configuration level
  • move self.critic_clock into the Critic class and, in general, put all class-related fields into the respective classes (with corresponding renaming)

Add parsing of command line arguments in presets

We need a call capability like:

python main_3wrobot_NI.py -ctrl_mode JACS -dt 0.01 ...

Required parameters:

Parameter name  Values        Notes
ctrl_mode       string        see description of methods in preset
dt              number        controller sampling time
t1              number        final time
x0              numpy vector  initial state, dimension preset-specific!

Optional parameters, set to default values unless specified otherwise:

Parameter name     Values        Default          Description
is_log_data        binary        0
is_visualization   binary        1
is_print_sim_step  binary        1
is_est_model       binary        0                if a model of the env. is to be estimated online
model_est_stage    number        1                seconds to learn model until benchmarking controller kicks in
model_est_period   number        1*dt             model is updated every model_est_period seconds
model_order        integer       5                order of state-space estimation model
prob_noise_pow     number        8                power of probing noise
uMan               numpy vector  zeros            manual control action to be fed constant, system-specific!
Nactor             integer       3                horizon length (in steps) for predictive controllers
pred_step_size     number        dt
buffer_size        integer       10
rcost_struct       string        quadratic        structure of running cost function
R1                 numpy matrix  identity matrix  must have proper dimension
R2                 numpy matrix  identity matrix  must have proper dimension
Ncritic            integer       4                critic stack size (number of TDs)
gamma              number        1                discount factor
critic_period      number        dt               critic is updated every critic_period seconds
critic_struct      string        quad-nomix       structure of critic features
actor_struct       string        quad-nomix       structure of actor features

This needs to be reflected in the README as an example call of an example preset. It could probably be translated from this text.
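One possible way to obtain such a call capability is Python's standard argparse module. The sketch below covers the required parameters and only a few of the optional ones, with defaults taken from the tables above. The function names are hypothetical, and x0 is parsed into a plain list of floats rather than a numpy vector to keep the example dependency-free.

```python
import argparse


def build_parser():
    """Build a parser for a subset of the preset parameters (illustrative only)."""
    parser = argparse.ArgumentParser(description="Preset command-line sketch")
    # Required parameters
    parser.add_argument("-ctrl_mode", type=str, required=True,
                        help="controller mode")
    parser.add_argument("-dt", type=float, required=True,
                        help="controller sampling time")
    parser.add_argument("-t1", type=float, required=True,
                        help="final time")
    parser.add_argument("-x0", type=float, nargs="+", required=True,
                        help="initial state, dimension preset-specific")
    # A few optional parameters with the defaults listed above
    parser.add_argument("-is_log_data", type=int, choices=[0, 1], default=0)
    parser.add_argument("-is_visualization", type=int, choices=[0, 1], default=1)
    parser.add_argument("-Nactor", type=int, default=3,
                        help="horizon length (in steps)")
    parser.add_argument("-gamma", type=float, default=1.0,
                        help="discount factor")
    return parser


def parse_preset_args(argv=None):
    """Parse argv (defaults to sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)


# Usage: equivalent to
#   python main_3wrobot_NI.py -ctrl_mode JACS -dt 0.01 -t1 10 -x0 5 5 0.5
args = parse_preset_args(
    ["-ctrl_mode", "JACS", "-dt", "0.01", "-t1", "10", "-x0", "5", "5", "0.5"]
)
```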

Tests framework implementation

Here are some requirements for the implementation of a test framework for Rcognita.

  • The framework should be easy to use

  • It should be provided with comprehensive and clear instructions on how to create tests using this framework

  • It should cover all currently implemented presets

  • It should prevent code duplication

  • There should be an out-of-the-box possibility to generate reference data for unit tests

Implement Monte-Carlo method and pipeline

Need:

  1. System: pendulum
  2. Scenario for Monte-Carlo learning
  3. REINFORCE

Visualizer: as always (like 3wrobot), but upper left screen: pendulum and its trajectory (dotted line like 3wrobot)

Monte-Carlo scenario:

  1. loop over policy gradient updates
  2. each such update needs several episodes (former runs), so loop over episodes
  3. each episode is like the current main loop, i.e., it iterates over steps
  4. when all episodes are done, experience is used to update policy parameters

The policy must be a PDF (probability distribution function). For useful policy parametrizations, see the S&B book, p. 322.
The REINFORCE algorithm can also be found there.
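The nested loop structure of the Monte-Carlo scenario can be sketched on a toy problem. The example below is not the pendulum and not rcognita code: it assumes a stateless task in which a Gaussian policy with mean mu and fixed standard deviation sigma is rewarded for choosing actions close to a target value. The function name, the hyper-parameters and the batch-mean baseline are all illustrative choices.

```python
import random


def reinforce(n_updates=200, n_episodes=20, n_steps=5,
              target=2.0, sigma=0.5, learning_rate=0.01, seed=0):
    """REINFORCE with a Gaussian policy N(mu, sigma^2) on a toy stateless task."""
    rng = random.Random(seed)
    mu = 0.0  # policy parameter: mean of the Gaussian policy
    for _ in range(n_updates):                 # 1. loop over policy gradient updates
        returns, scores = [], []
        for _ in range(n_episodes):            # 2. loop over episodes
            episode_return, score = 0.0, 0.0
            for _ in range(n_steps):           # 3. loop over steps of one episode
                action = rng.gauss(mu, sigma)  # sample an action from the policy
                episode_return += -(action - target) ** 2
                # Gradient of log N(action; mu, sigma^2) with respect to mu
                score += (action - mu) / sigma ** 2
            returns.append(episode_return)
            scores.append(score)
        # 4. all episodes done: use the experience to update the policy parameter
        baseline = sum(returns) / n_episodes   # reduces variance of the estimate
        grad = sum((g - baseline) * s
                   for g, s in zip(returns, scores)) / n_episodes
        mu += learning_rate * grad
    return mu


learned_mu = reinforce()
```

With these settings the learned mean should end up close to the target; a state-dependent policy and a real environment would replace the innermost loop body.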

Environment configuration is very inconvenient

The solution here is a class which has the following structure:

class abstract_config:
    def __init__(self):
        self.name = "some_agent"
    def argument_parser(self):
        pass
    def post_processing(self):
        pass
    def get_env(self):
        pass

It's a very intuitive separation of command-line arguments from other arguments, together with their post-processing.
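A concrete configuration following this structure might look like the sketch below. The method bodies, the changed signatures (post_processing takes the parsed settings, get_env takes an optional argv) and the Config3WRobot class are assumptions for illustration. The dt-dependent defaults of pred_step_size and critic_period are borrowed from the command-line arguments issue.

```python
import argparse


class AbstractConfig:
    """Base class following the proposed structure (bodies are assumptions)."""

    def __init__(self):
        self.name = "some_agent"

    def argument_parser(self):
        """Return an argparse parser holding the command-line arguments."""
        raise NotImplementedError

    def post_processing(self, settings):
        """Derive the remaining settings from the parsed ones."""
        return settings

    def get_env(self, argv=None):
        """Parse argv, post-process the result and return it as a dict."""
        args = self.argument_parser().parse_args(argv)
        return self.post_processing(vars(args))


class Config3WRobot(AbstractConfig):
    """Hypothetical configuration of a 3-wheel robot preset."""

    def __init__(self):
        super().__init__()
        self.name = "3wrobot"

    def argument_parser(self):
        parser = argparse.ArgumentParser(prog=self.name)
        parser.add_argument("--dt", type=float, default=0.01)
        parser.add_argument("--pred_step_size", type=float, default=None)
        parser.add_argument("--critic_period", type=float, default=None)
        return parser

    def post_processing(self, settings):
        # Settings left unspecified default to the controller sampling time
        for key in ("pred_step_size", "critic_period"):
            if settings[key] is None:
                settings[key] = settings["dt"]
        return settings


# Usage: only dt and critic_period are given on the command line.
env = Config3WRobot().get_env(["--dt", "0.05", "--critic_period", "0.1"])
```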

trust-constr clean up

@osinenkop
Is it possible to move it

    critic_opt_method = 'SLSQP'
    if critic_opt_method == 'trust-constr':
        critic_opt_options = {'maxiter': 200, 'disp': False} #'disp': True, 'verbose': 2}
    else:
        critic_opt_options = {'maxiter': 200, 'maxfev': 1500, 'disp': False, 'adaptive': True, 'xatol': 1e-7, 'fatol': 1e-7} # 'disp': True, 'verbose': 2}

outside the module?

Fix animation

Fix the simulation animation so that it works in the default Python interpreter instead of only in IPython

Multiple modifications

Code cleaning and refactoring needed for the modifications done throughout Q1, Q2 of 2021, including those done for education.

This concerns:

  • actor constraints
  • new methods: JAC-stab, SQL, SQL-stab, SQL-V etc.
  • critic constraints for the respective methods, e.g., JAC-stab, SQL-stab etc.
  • new systems
  • generic main module (make to a class)
  • model estimation
  • loggers
  • ROS integration

and so on

NN model

  • Make an NN model
  • Make a torch optimizer for NN model
