About

rcognita is a flexibly configurable framework for agent-environment simulation with a menu of predictive and safe reinforcement learning controllers. Detailed documentation is available here.

Example run with a mobile robot simulation

Table of contents

Installation
- Basic
- With model estimation tools
General description
Usage
Related literature
Closing remarks

Installation

Basic

Run in terminal:

pip3 install rcognita

Alternatively, you can install the package directly from the master branch. The following instructions are for Unix-based systems and assume a terminal and a Python 3 interpreter.

git clone https://github.com/AIDynamicAction/rcognita
cd rcognita
python3 setup.py install

Note that your Python 3 interpreter might be called something else, e.g., just python.

With model estimation tools

The package was tested with online model estimation using SIPPY. This functionality is implemented and enabled via the is_est_model flag. Related parameters are described in the documentation of the CtrlOptPred class.

Installing dependencies

To install SIPPY, first take care of the dependencies:

Ubuntu/Debian:

sudo apt-get install -y build-essential gfortran cmake libopenblas-dev

Arch:

pacman -Sy gcc gcc-fortran cmake base-devel openblas

Installing `scikit-build`

pip install scikit-build

or, using Anaconda,

conda install scikit-build

Installing `rcognita` with `SIPPY`

pip3 install "rcognita[SIPPY]"
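Online model estimation in rcognita relies on SIPPY. The sketch below illustrates only the underlying idea. It fits a discrete-time linear state-space model x[k+1] = A x[k] + B u[k] to logged data by ordinary least squares. This is a swapped-in textbook technique, not SIPPY's API and not rcognita's estimator. The helper `fit_linear_model` is hypothetical, and the sketch assumes the full state is measured.

```python
import numpy as np


def fit_linear_model(states, actions):
    """Least-squares fit of x[k+1] = A x[k] + B u[k] (hypothetical helper).

    states:  array of shape (N + 1, n), consecutive state samples
    actions: array of shape (N, m), actions applied between the samples
    """
    x_now = states[:-1]                         # x[0] .. x[N-1]
    x_next = states[1:]                         # x[1] .. x[N]
    regressors = np.hstack([x_now, actions])    # each row is [x[k], u[k]]
    # Solve regressors @ theta ≈ x_next, where theta stacks A^T on top of B^T
    theta, *_ = np.linalg.lstsq(regressors, x_next, rcond=None)
    n = states.shape[1]
    return theta[:n].T, theta[n:].T             # A, B


# Usage: recover a known system from simulated data excited by random inputs
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 0.9]])
B_true = np.array([[0.0], [0.1]])
x = np.zeros(2)
states, actions = [x], []
for _ in range(200):
    u = rng.normal(size=1)                      # probing input keeps the data informative
    x = A_true @ x + B_true @ u
    states.append(x)
    actions.append(u)

A_est, B_est = fit_linear_model(np.array(states), np.array(actions))
```

With noise-free data, the estimate matches `A_true` and `B_true` up to rounding. Measurement noise or too little excitation would degrade it. This is one reason dedicated identification tools such as SIPPY exist.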

General description

The rcognita Python package is designed for hybrid simulation of agents and environments (generally speaking, not necessarily reinforcement learning agents). Its main idea is an explicit implementation of sampled controls with a user-defined sampling time. The package consists of several modules, namely controllers, loggers, models, simulator, systems, utilities and visuals, plus a collection of main modules (presets), one for each agent-environment configuration.
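To make "sampled controls" concrete, here is a minimal plain-Python sketch. It is not rcognita code. The environment is integrated with a fine solver step, while the controller recomputes its action only once per sampling time `dt` and holds it constant in between (zero-order hold). The scalar system x' = x + u, the proportional gain and the explicit Euler solver are illustrative assumptions.

```python
def simulate_sampled_control(dt=0.1, solver_step=0.01, t1=2.0, x0=1.0, gain=2.0):
    """Scalar system x' = x + u under a sampled proportional controller (illustrative)."""
    x, t, u = x0, 0.0, 0.0
    t_last_update = None          # time of the last controller update
    n_updates = 0
    while t < t1 - 1e-12:
        # Controller: recompute the action only when a sampling period has elapsed
        if t_last_update is None or t - t_last_update >= dt - 1e-12:
            u = -gain * x
            t_last_update = t
            n_updates += 1
        # Environment: one explicit Euler step with the action held constant
        x += solver_step * (x + u)
        t += solver_step
    return x, n_updates


x_final, n_updates = simulate_sampled_control()
print(n_updates)  # 20
```

The environment takes 200 solver steps, but the controller acts only 20 times. Between updates, the open-loop instability x' = x acts unopposed except by the held action. Choosing `dt` therefore matters for stability, independently of the solver accuracy.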

This flowchart shows the interaction of the core rcognita classes contained in these modules (the modules themselves are not shown in the diagram).

The main module is a preset; in the flowchart, it is a 3-wheel robot. It initializes the system (the environment), the controllers (the agents, e.g., a safe agent, a benchmarking agent, a reinforcement learning agent, etc.), the visualization engine called animator, the logger, and the simulator. The latter is a multi-purpose device for simulating agent-environment loops of different types (specified by sys_type).

Depending on sys_type, the environment can be described by a differential equation (including stochastic ones), a difference equation (for discrete-time systems), or a probability distribution (for, e.g., Markov decision processes).

In the simulator, the parameter dt determines the maximal step size of the numerical solver when the environment is described by differential equations. The main method of the Simulator class is sim_step, which performs one solver step, whereas reset re-initializes the simulator after an episode.
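The following toy class sketches the simulator's role: dispatch on `sys_type`, advance one step with `sim_step`, and re-initialize with `reset`. The class name `SimpleSimulator`, the `sys_type` strings, the constructor arguments and the use of explicit Euler are all assumptions for illustration. This is not rcognita's Simulator, which uses proper ODE solvers.

```python
import random


class SimpleSimulator:
    """Toy simulator in the spirit of the description above (hypothetical).

    Assumed sys_type values:
      'diff_eqn':   closed_loop_rhs(t, x) returns dx/dt, integrated by explicit Euler
      'discr_fnc':  closed_loop_rhs(t, x) returns the next state directly
      'discr_prob': closed_loop_rhs(t, x, rng) samples the next state
    """

    def __init__(self, sys_type, closed_loop_rhs, state_init, t0=0.0, dt=0.01, seed=0):
        if sys_type not in ('diff_eqn', 'discr_fnc', 'discr_prob'):
            raise ValueError(f"unknown sys_type: {sys_type}")
        self.sys_type = sys_type
        self.closed_loop_rhs = closed_loop_rhs
        self.state_init = list(state_init)
        self.t0 = t0
        self.dt = dt          # solver step for 'diff_eqn', sampling period otherwise
        self.seed = seed
        self.reset()

    def reset(self):
        """Re-initialize time, state and random generator after an episode."""
        self.t = self.t0
        self.state = list(self.state_init)
        self.rng = random.Random(self.seed)

    def sim_step(self):
        """Advance the closed loop by one step and return (t, state)."""
        if self.sys_type == 'diff_eqn':
            rhs = self.closed_loop_rhs(self.t, self.state)
            self.state = [x + self.dt * dx for x, dx in zip(self.state, rhs)]
        elif self.sys_type == 'discr_fnc':
            self.state = list(self.closed_loop_rhs(self.t, self.state))
        else:  # 'discr_prob'
            self.state = list(self.closed_loop_rhs(self.t, self.state, self.rng))
        self.t += self.dt
        return self.t, self.state


# Usage: exponential decay dx/dt = -x, one solver step at a time
sim = SimpleSimulator('diff_eqn', lambda t, x: [-x[0]], state_init=[1.0], dt=0.1)
for _ in range(10):
    t, state = sim.sim_step()
sim.reset()  # back to t = 0 and state = [1.0]
```

Seeding the random generator in `reset` makes probabilistic episodes reproducible after a reset. This is a design choice of the sketch, not a documented rcognita behavior.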

The Logger class is an interface that defines stubs for a print-to-console method, print_sim_step, and a print-to-file method, log_data_row. Concrete loggers implement these methods.
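The interface/implementation split can be sketched with Python's `abc` module. The method names `print_sim_step` and `log_data_row` come from the description above. Their arguments and the concrete `CSVLogger` class are hypothetical.

```python
import abc
import csv
import io


class Logger(abc.ABC):
    """Logger interface: stubs for console and file output (signatures assumed)."""

    @abc.abstractmethod
    def print_sim_step(self, t, state, action):
        """Print one simulation step to the console."""

    @abc.abstractmethod
    def log_data_row(self, datafile, t, state, action):
        """Write one row of simulation data to an open file-like object."""


class CSVLogger(Logger):
    """Hypothetical concrete logger for list-valued states and actions."""

    def print_sim_step(self, t, state, action):
        print(f"t={t:7.3f}  state={state}  action={action}")

    def log_data_row(self, datafile, t, state, action):
        csv.writer(datafile).writerow([t, *state, *action])


# Usage: log to an in-memory buffer instead of a file on disk
logger = CSVLogger()
buffer = io.StringIO()
logger.print_sim_step(0.1, [1.0, 2.0], [0.5])
logger.log_data_row(buffer, 0.1, [1.0, 2.0], [0.5])
print(buffer.getvalue().strip())  # 0.1,1.0,2.0,0.5
```

Because the stubs are abstract, `Logger()` itself cannot be instantiated. A concrete logger that forgets to implement one of the methods fails early, at construction time.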

A similar inheritance scheme is used for Animator and System. The core data of Animator’s subclasses are objects, which include the entities to be updated on the screen, and their parameters, stored in pars.

A concrete realization of the system interface must implement sys_dyn, which is the “right-hand side” of the environment description, and the output function out. Optionally, it also implements disturbance dynamics via disturb_dyn and controller dynamics (if the controller is, e.g., time-varying). The method receive_action gets a control action and stores it. Everything is packed together in closed_loop_rhs for use in Simulator.
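A minimal sketch of such an interface follows, using the method names from the text (`sys_dyn`, `disturb_dyn`, `out`, `receive_action`, `closed_loop_rhs`). All signatures are assumptions, and controller dynamics are omitted. The concrete class `Robot3WheelKinematic` is hypothetical. It uses the common kinematic unicycle model (x' = v cos θ, y' = v sin θ, θ' = ω), which is not necessarily the model used in rcognita's 3-wheel robot preset.

```python
import abc
import math


class System(abc.ABC):
    """Sketch of the system interface described above (signatures assumed)."""

    def __init__(self, dim_state, dim_input):
        self.dim_state = dim_state
        self.action = [0.0] * dim_input

    @abc.abstractmethod
    def sys_dyn(self, t, state, action, disturb):
        """Right-hand side of the environment description: dx/dt."""

    def disturb_dyn(self, t, disturb):
        """Optional disturbance dynamics; none by default."""
        return []

    @abc.abstractmethod
    def out(self, state):
        """Output (observation) function."""

    def receive_action(self, action):
        """Store the latest control action; it is held until the next one arrives."""
        self.action = list(action)

    def closed_loop_rhs(self, t, full_state):
        """Pack system and disturbance dynamics into one right-hand side."""
        state = full_state[:self.dim_state]
        disturb = full_state[self.dim_state:]
        return self.sys_dyn(t, state, self.action, disturb) + self.disturb_dyn(t, disturb)


class Robot3WheelKinematic(System):
    """Hypothetical kinematic robot: state [x, y, heading], action [v, omega]."""

    def __init__(self):
        super().__init__(dim_state=3, dim_input=2)

    def sys_dyn(self, t, state, action, disturb):
        _, _, heading = state
        v, omega = action
        return [v * math.cos(heading), v * math.sin(heading), omega]

    def out(self, state):
        return list(state)  # full-state measurement (assumption)


robot = Robot3WheelKinematic()
robot.receive_action([1.0, 0.0])                    # drive straight ahead
print(robot.closed_loop_rhs(0.0, [0.0, 0.0, 0.0]))  # [1.0, 0.0, 0.0]
```

Storing the action inside the system lets the simulator call `closed_loop_rhs(t, state)` with no controller argument. The sampled, held action then enters the dynamics automatically.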

Finally, the controllers module contains various agent types. One of them is CtrlOptPred, the class of predictive objective-optimizing agents (model-predictive control and predictive reinforcement learning), as shown in this flowchart. Note that it contains an explicit specification of the sampling time dt.
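The core of a predictive agent can be sketched as follows. The agent rolls a model forward over a horizon of `n_actor` steps, sums the running cost, optimizes the whole action sequence, and applies only the first action. The function `mpc_action`, its arguments, the Euler rollout and the use of SciPy's `minimize` with SLSQP are assumptions for illustration. This is not CtrlOptPred's actual implementation.

```python
import numpy as np
from scipy.optimize import minimize


def mpc_action(state, model, running_cost, n_actor=3, pred_step=0.1,
               action_bounds=(-1.0, 1.0), dim_action=1):
    """Return the first action of an optimized action sequence (MPC-style sketch).

    model(x, u) -> dx/dt is an assumed continuous-time model, rolled out with
    explicit Euler steps of size pred_step over n_actor steps.
    """

    def horizon_cost(flat_actions):
        actions = flat_actions.reshape(n_actor, dim_action)
        x = np.array(state, dtype=float)
        cost = 0.0
        for u in actions:
            cost += running_cost(x, u)
            x = x + pred_step * model(x, u)    # one prediction step
        return cost

    initial_guess = np.zeros(n_actor * dim_action)
    bounds = [action_bounds] * (n_actor * dim_action)
    result = minimize(horizon_cost, initial_guess, method='SLSQP', bounds=bounds)
    return result.x[:dim_action]               # apply only the first action


def integrator(x, u):
    return u                                   # scalar integrator: dx/dt = u


def quadratic_cost(x, u):
    return float(x @ x + 0.1 * u @ u)


u0 = mpc_action(np.array([1.0]), integrator, quadratic_cost)
```

For the state 1.0, the optimizer chooses a negative first action that pushes the state toward zero. Re-solving at every sampling instant and discarding the rest of the sequence is what makes the scheme a feedback controller rather than an open-loop plan.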

The method _critic computes a model of a value-related quantity, e.g., the value function, Q-function, or advantage. In turn, _critic_cost defines a cost (loss) function to fit the critic (commonly based on temporal-difference errors). The method _critic_optimizer actually optimizes the critic cost. The actor works analogously, except that it optimizes an objective along a prediction horizon. The details can be found in the code documentation. The method compute_action essentially watches the internal clock and performs an action update when a sampling period has elapsed.
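As an illustration of a temporal-difference-based critic cost, the sketch below uses a linear critic V(x) = w · φ(x). Two choices here are our own assumptions: reading 'quad-nomix' as squared state components without cross terms, and using the cost-to-go TD error δ = V(x[k]) - r[k] - γ V(x[k+1]). Because this cost is quadratic in the weights, the sketch minimizes it by linear least squares instead of a general optimizer. All function names are hypothetical.

```python
import numpy as np


def quad_nomix_features(x):
    """Squared state components without cross terms (our reading of 'quad-nomix')."""
    return np.asarray(x, dtype=float) ** 2


def critic_cost(weights, states, running_costs, gamma):
    """Half the sum of squared temporal differences over a stack of transitions.

    states has len(running_costs) + 1 entries; running_costs[k] is incurred
    on the transition from states[k] to states[k + 1].
    """
    cost = 0.0
    for k, r in enumerate(running_costs):
        v_now = weights @ quad_nomix_features(states[k])
        v_next = weights @ quad_nomix_features(states[k + 1])
        td = v_now - r - gamma * v_next
        cost += 0.5 * td ** 2
    return cost


def fit_critic(states, running_costs, gamma):
    """Minimize critic_cost; it is quadratic in the weights, so least squares suffices."""
    phi_now = np.array([quad_nomix_features(x) for x in states[:-1]])
    phi_next = np.array([quad_nomix_features(x) for x in states[1:]])
    design = phi_now - gamma * phi_next        # td = design @ w - r
    targets = np.asarray(running_costs, dtype=float)
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


# Stack of 4 transitions (cf. Ncritic) of x[k+1] = 0.5 x[k] with running cost x^2
gamma = 0.9
states = [np.array([2.0 * 0.5 ** k]) for k in range(5)]
running_costs = [float(x @ x) for x in states[:-1]]
w = fit_critic(states, running_costs, gamma)
```

For this system, the exact discounted cost-to-go is x² / (1 - 0.9 · 0.25), so the fitted weight is close to 1.29, and the TD cost at that weight is essentially zero.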

The auxiliary modules models and utilities provide helper functions and data structures, such as neural networks.

Usage

After the package is installed, you can simply run one of the presets found here with Python, e.g.,

python3 main_3wrobot_NI.py

This runs the preset with its default settings, which are described in the preset itself.

The naming convention is main_ACRONYM, where ACRONYM refers to the system (environment). You may create your own presets by analogy.

To configure hyperparameters, call help on the desired preset, e.g.,

python3 main_3wrobot_NI.py -h

Settings

Some key settings are described below (full description is available via -h option).

Parameter	Type	Description
`ctrl_mode`	string	Controller mode
`dt`	number	Controller sampling time
`t1`	number	Final time
`state_init`	list	Initial state
`is_log_data`	binary	Flag to log data
`is_visualization`	binary	Flag to produce graphical output
`is_print_sim_step`	binary	Flag to print simulation step data
`is_est_model`	binary	If a model of the system is to be estimated online
`Nactor`	integer	Horizon length (in steps) for predictive controllers
`stage_obj_struct`	string	Structure of running objective function
`Ncritic`	integer	Critic stack size (number of TDs)
`gamma`	number	Discount factor
`critic_struct`	string	Structure of critic features
`actor_struct`	string	Structure of actor features

Advanced customization

Custom environments: implement the system interface in the systems module. You might also need nominal controllers, an animator, a logger, etc.
Custom running cost: adjust rcost in controllers
Custom AC method: the simplest way is to add a new mode and update _actor_cost, _critic_cost and, possibly, _actor and _critic. For deep-net AC structures, use, e.g., PyTorch
Custom model estimator: so far, the framework offers a state-space model structure. You may use any other one. In the case of neural nets, use, e.g., PyTorch

Experimental things

An interface for dynamical controllers, which can be considered as extensions of the system state vector, is provided in _ctrl_dyn of the systems module. RL is usually understood as a static controller, i.e., one that maps outputs directly to actions. A dynamical controller does this indirectly, via an internal state that serves as an intermediate link. Dynamical controllers can overcome some limitations of static controllers.
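A classic illustration of such a limitation is rejection of a constant disturbance. In the sketch below, a static proportional controller leaves a steady-state offset. A dynamical controller with an integrator state z removes it, and z simply extends the closed-loop state to [x, z]. The plant, the gains and the Euler simulation are illustrative assumptions. The PI controller is a textbook example, not rcognita's _ctrl_dyn.

```python
def simulate(controller, t1=20.0, dt=0.001, disturbance=1.0):
    """Plant x' = -x + u + d under a controller; returns the final plant state.

    controller(x, z) -> (u, dz/dt), where z is the controller's internal state,
    i.e., an extension of the closed-loop state vector [x, z].
    """
    x, z = 1.0, 0.0
    for _ in range(int(round(t1 / dt))):
        u, dz = controller(x, z)
        x += dt * (-x + u + disturbance)
        z += dt * dz
    return x


def static_p(x, z):
    """Static controller: the action depends on the current output only."""
    return -2.0 * x, 0.0


def dynamic_pi(x, z):
    """Dynamical controller: the integrator state z accumulates the output."""
    return -2.0 * x - 1.0 * z, x


print(round(simulate(static_p), 3))       # 0.333: a steady-state offset remains
print(abs(simulate(dynamic_pi)) < 1e-3)   # True: the internal state removes it
```

At equilibrium, z' = x forces x = 0, whatever the value of the constant disturbance. No static map from x to u can guarantee this without knowing the disturbance.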

Related literature

Closing remarks

Please contact me with any inquiries, and don't forget to give me credit if you use this code. If you are interested in stacked Q-learning, kindly read the paper.

Original author: P. Osinenko, 2020

Bibtex reference

@misc{rcognita2020,
author =   {Pavel Osinenko},
title =    {Rcognita: a framework for hybrid agent-environment simulation},
howpublished = {\url{https://github.com/AIDynamicAction/rcognita}},
year = {2020}
}

Parameter name	Type	Notes
`ctrl_mode`	string	see description of methods in preset
`dt`	number	controller sampling time
`t1`	number	final time
`x0`	numpy vector	initial state, dimension preset-specific!

Parameter name	Type	Default	Description
`is_log_data`	binary	0	flag to log data
`is_visualization`	binary	1	flag to produce graphical output
`is_print_sim_step`	binary	1	flag to print simulation step data
`is_est_model`	binary	0	if a model of the env. is to be estimated online
`model_est_stage`	number	1	seconds to learn model until benchmarking controller kicks in
`model_est_period`	number	1*`dt`	model is updated every `model_est_period` seconds
`model_order`	integer	5	order of state-space estimation model
`prob_noise_pow`	number	8	power of probing noise
`uMan`	numpy vector	zeros	constant manual control action, system-specific!
`Nactor`	integer	3	horizon length (in steps) for predictive controllers
`pred_step_size`	number	`dt`
`buffer_size`	integer	10
`rcost_struct`	string	`quadratic`	structure of running cost function
`R1`	numpy matrix	identity matrix	must have proper dimension
`R2`	numpy matrix	identity matrix	must have proper dimension
`Ncritic`	integer	4	critic stack size (number of TDs)
`gamma`	number	1	discount factor
`critic_period`	number	`dt`	critic is updated every `critic_period` seconds
`critic_struct`	string	`quad-nomix`	structure of critic features
`actor_struct`	string	`quad-nomix`	structure of actor features
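A sketch of the 'quadratic' running cost follows. It assumes that R1 weights the concatenation χ = [y, u] of output and action, so that r = χᵀ R1 χ. This would explain why R1 "must have proper dimension": it must be square of size dim(y) + dim(u). That reading, and the helper name, are assumptions. R2 and other cost structures are not covered here.

```python
import numpy as np


def running_cost_quadratic(y, u, R1):
    """Quadratic running cost r = chi^T R1 chi with chi = [y, u] (assumed structure)."""
    chi = np.concatenate([np.asarray(y, dtype=float), np.asarray(u, dtype=float)])
    if R1.shape != (chi.size, chi.size):
        raise ValueError(f"R1 must be {chi.size}x{chi.size}, got {R1.shape}")
    return float(chi @ R1 @ chi)


# Usage: 3-dimensional output and 2-dimensional action with the default identity R1
y = np.array([1.0, 2.0, 0.0])
u = np.array([0.5, -0.5])
print(running_cost_quadratic(y, u, np.eye(5)))  # 1 + 4 + 0 + 0.25 + 0.25 = 5.5
```

Checking the shape up front turns a dimension mismatch between R1 and the preset's system into an explicit error rather than a silent broadcasting surprise.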

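	critic_opt_method = 'SLSQP'
	if critic_opt_method == 'trust-constr':
		critic_opt_options = {'maxiter': 200, 'disp': False}  # for debugging: 'disp': True, 'verbose': 2
	elif critic_opt_method == 'SLSQP':
		# SLSQP does not recognize 'maxfev', 'adaptive', 'xatol' or 'fatol'
		critic_opt_options = {'maxiter': 200, 'disp': False}
	else:
		# Nelder-Mead options
		critic_opt_options = {'maxiter': 200, 'maxfev': 1500, 'disp': False, 'adaptive': True, 'xatol': 1e-7, 'fatol': 1e-7}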
