whitemech / temprl

Reinforcement Learning framework for Temporal Goals

Home Page: https://whitemech.github.io/temprl

License: GNU Lesser General Public License v3.0

Makefile 6.54% Python 93.46%

reinforcement-learning temporal-goals temporal-logic temporal-constraints automata

temprl's Introduction

temprl

Framework for Reinforcement Learning with Temporal Goals defined by LTLf/LDLf formulas.

Status: development.

Install

Install the package:

from PyPI:
```
  pip3 install temprl
```

with pip from GitHub:

  pip3 install git+https://github.com/whitemech/temprl.git

or, clone the repository and install:

  git clone https://github.com/whitemech/temprl.git
  cd temprl
  pip install .

Tests

To run tests: tox

To run only the code tests: tox -e py3.7

To run only the linters:

tox -e flake8
tox -e mypy
tox -e black-check
tox -e isort-check

Please look at the tox.ini file for the full list of supported commands.

Docs

To build the docs: mkdocs build

To view documentation in a browser: mkdocs serve and then go to http://localhost:8000

License

temprl is released under the GNU Lesser General Public License v3.0 or later (LGPLv3+).

Authors

Marco Favorito

temprl's People

Contributors

Stargazers

Watchers

Forkers

gitter-badger gallorob salvatorecognetta sjtuguofei

temprl's Issues

Generate model from discrete env

Is your feature request related to a problem? Please describe.

In case the wrapped environment is DiscreteEnv, the wrapper forgets the model.

Describe the solution you'd like
Make TemporalGoalWrapper able to detect if the wrapped environment is an instance of DiscreteEnv, and in that case, extend the model such that the automata transitions are included.
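The requested extension amounts to a product construction between the environment model and the automaton. The following is a minimal sketch, not temprl's implementation. It assumes the model follows the DiscreteEnv convention `P[state][action] = [(prob, next_state, reward, done), ...]`, that the DFA is a plain dict, and that a hypothetical `label` function maps an environment state to the symbol read by the automaton. The function name `extend_model`, its parameters, and the choice to pay `goal_reward` on entering an accepting state are all illustrative assumptions.

```python
from itertools import product


def extend_model(P, dfa_states, delta, label, accepting, goal_reward=1.0):
    """Build the model of the product between an environment and a DFA.

    P          -- DiscreteEnv-style model: P[s][a] = [(prob, s2, reward, done), ...]
    dfa_states -- iterable of automaton states
    delta      -- dict mapping (q, symbol) to the next automaton state
    label      -- hypothetical function mapping an env state to a symbol
    accepting  -- set of accepting automaton states
    """
    new_P = {}
    for s, q in product(P, dfa_states):
        new_P[(s, q)] = {}
        for a, transitions in P[s].items():
            new_P[(s, q)][a] = []
            for prob, s2, reward, done in transitions:
                q2 = delta[(q, label(s2))]
                # Assumption: pay the goal reward when the automaton
                # enters an accepting state from a non-accepting one.
                bonus = goal_reward if q2 in accepting and q not in accepting else 0.0
                new_P[(s, q)][a].append((prob, (s2, q2), reward + bonus, done))
    return new_P


# Toy example: two env states, one action, goal "eventually reach state 1".
P = {0: {0: [(1.0, 1, 0.0, False)]}, 1: {0: [(1.0, 1, 0.0, True)]}}
delta = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "a"): "q1", ("q1", "b"): "q1"}
label = lambda s: "b" if s == 1 else "a"
model = extend_model(P, ["q0", "q1"], delta, label, accepting={"q1"})
```

The extended model is keyed by `(env_state, automaton_state)` pairs, so a planner that reads `P` would see the automaton transitions as part of the dynamics.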

Describe alternatives you've considered
n/a

Additional context
n/a

Let the `feature_extractor` and `extract_fluents` callables be resettable.

Is your feature request related to a problem? Please describe.

TemporalGoalWrapper.feature_extractor and TemporalGoal.extract_fluents are both callables.

It is possible to create an extractor that depends on more than one state by keeping the past states in memory:

class my_feature_extractor:

    def __init__(self, *args, **kwargs):
        # set up any memory of past states here
        ...

    # this method makes the class callable
    def __call__(self, obs, action):
        ...

wrapper = TemporalGoalWrapper(env, feature_extractor=my_feature_extractor(), ...)

However, the extractor's state is then kept across episodes, because nothing clears it when the environment is reset.

Describe the solution you'd like

When the environment is reset, TemporalGoalWrapper should call the reset() method of feature_extractor and of every extract_fluents. Callables that do not define reset() should simply be skipped.
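The proposed behaviour can be sketched as follows. This is only an illustration of the idea, not the wrapper's actual code: `HistoryExtractor` and `reset_if_possible` are hypothetical names, and the only assumption is that a resettable extractor exposes a method called `reset()`.

```python
class HistoryExtractor:
    """Hypothetical stateful extractor that remembers past observations."""

    def __init__(self):
        self.history = []

    def __call__(self, obs, action):
        self.history.append(obs)
        return len(self.history)

    def reset(self):
        self.history.clear()


def reset_if_possible(extractor):
    """Call extractor.reset() if it exists; plain functions are skipped."""
    reset = getattr(extractor, "reset", None)
    if callable(reset):
        reset()
        return True
    return False


stateful = HistoryExtractor()
stateful("obs0", 0)
stateful("obs1", 1)
stateless = lambda obs, action: obs

# What a wrapper's reset() could do at the start of every episode.
results = [reset_if_possible(e) for e in (stateful, stateless)]
```

Using `getattr` with a default keeps plain functions and lambdas working unchanged, so existing extractors would not need to be modified.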

Describe alternatives you've considered

Additional context

Purpose of sink state

Subject of the issue

The RewardDFA always adds a sink state. Is this intended behaviour? Some formulae can never fail, so for them the extra state is unreachable. And when a formula can fail, its sink state is already part of the automaton.

If we want to be able to distinguish such a state, we could detect a sink by traversing the graph instead of adding one.
What do you think?
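One way to detect such states by traversal is to walk backwards from the accepting states and collect everything that cannot reach them. The sketch below assumes the DFA is given as a plain dict mapping `(state, symbol)` to the next state; the function name `failure_states` and the toy automaton are illustrative, not part of temprl.

```python
from collections import deque


def failure_states(states, transitions, accepting):
    """Return the states from which no accepting state is reachable.

    transitions -- dict mapping (state, symbol) to the next state
    """
    # Reverse the edges, then walk backwards from the accepting states.
    predecessors = {q: set() for q in states}
    for (source, _symbol), target in transitions.items():
        predecessors[target].add(source)

    can_accept = set(accepting)
    queue = deque(accepting)
    while queue:
        q = queue.popleft()
        for p in predecessors[q]:
            if p not in can_accept:
                can_accept.add(p)
                queue.append(p)
    return set(states) - can_accept


# DFA for "a holds until b": reading neither a nor b in state 0 fails forever.
states = {0, 1, 2}
transitions = {
    (0, "a"): 0, (0, "b"): 1, (0, "none"): 2,
    (1, "a"): 1, (1, "b"): 1, (1, "none"): 1,
    (2, "a"): 2, (2, "b"): 2, (2, "none"): 2,
}
sinks = failure_states(states, transitions, accepting={1})
```

For an automaton whose formula can never fail, the function returns an empty set, so no extra state would have to be added.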

Implement deterministic serialization of temporal goals.

Is your feature request related to a problem? Please describe.

The DFA generated from the same formula is not identical across runs, which changes the observation space the agent learns on.

Describe the solution you'd like

A way to recover the previous DFA by implementing a proper serialization method.
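A possible approach, sketched under the assumption that the automata from different runs differ only in how their states are named or ordered: renumber the states in breadth-first order from the initial state and dump the result as JSON. The function `canonical_json` and the dict-based DFA representation are hypothetical. If two runs could yield automata with different structure, a minimization step would be needed first.

```python
import json
from collections import deque


def canonical_json(initial, transitions, accepting):
    """Serialize a DFA so that isomorphic automata give the same string.

    transitions -- dict mapping (state, symbol) to the next state
    """
    symbols = sorted({symbol for _state, symbol in transitions})
    # Renumber states in BFS order from the initial state, visiting
    # symbols in sorted order, so the numbering ignores the original names.
    index = {initial: 0}
    queue = deque([initial])
    table = []
    while queue:
        q = queue.popleft()
        for symbol in symbols:
            target = transitions.get((q, symbol))
            if target is None:
                continue
            if target not in index:
                index[target] = len(index)
                queue.append(target)
            table.append([index[q], symbol, index[target]])
    data = {
        "initial": 0,
        "accepting": sorted(index[q] for q in accepting if q in index),
        "transitions": table,
    }
    return json.dumps(data, sort_keys=True)


# Same automaton with different state names, as could happen across runs.
run1 = canonical_json(
    "x", {("x", "a"): "y", ("x", "b"): "x", ("y", "a"): "y", ("y", "b"): "y"}, {"y"}
)
run2 = canonical_json(7, {(7, "b"): 7, (3, "b"): 3, (7, "a"): 3, (3, "a"): 3}, {3})
```

Both calls produce the same string, which could be written to disk and reloaded to recover the state numbering of a previous run.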

Describe alternatives you've considered

Additional context

Use the strategy pattern for reward shaping in `TemporalGoal` class.

Is your feature request related to a problem? Please describe.

The TemporalGoal class hard-codes the reward shaping behaviour, which can only be switched on or off through the reward_shaping flag passed to the constructor.

Describe the solution you'd like

Make the approach more modular and customizable by introducing a RewardShaper class, so that a developer can change the default behaviour more easily.
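A rough sketch of how the strategy pattern could look here. Only the name RewardShaper comes from the request; the `shape` method, its signature, the two concrete strategies and the `Goal` stand-in class are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class RewardShaper(ABC):
    """Strategy interface: turns an automaton transition into a reward."""

    @abstractmethod
    def shape(self, prev_state, next_state, goal_reached):
        ...


class SparseReward(RewardShaper):
    """No shaping: pay the full reward only when the goal is reached."""

    def __init__(self, reward):
        self.reward = reward

    def shape(self, prev_state, next_state, goal_reached):
        return self.reward if goal_reached else 0.0


class PotentialBasedShaping(RewardShaper):
    """Potential-based shaping: gamma * phi(next) - phi(prev)."""

    def __init__(self, potential, gamma=1.0):
        self.potential = potential  # dict: automaton state -> potential
        self.gamma = gamma

    def shape(self, prev_state, next_state, goal_reached):
        return self.gamma * self.potential[next_state] - self.potential[prev_state]


class Goal:
    """Stand-in for a temporal goal that delegates to its shaper."""

    def __init__(self, shaper):
        self.shaper = shaper

    def reward(self, prev_state, next_state, goal_reached):
        return self.shaper.shape(prev_state, next_state, goal_reached)


sparse = Goal(SparseReward(reward=10.0))
shaped = Goal(PotentialBasedShaping(potential={0: 0.0, 1: 5.0, 2: 10.0}))
```

With this structure, a new shaping scheme is a new subclass passed to the constructor, and the goal class itself needs no boolean flag.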

Describe alternatives you've considered
None.

Additional context
None.

Call 'extract_fluents' with the episode number and the step number.

Is your feature request related to a problem? Please describe.

When working with multiple temporal goals, it might happen that the fluents are extracted multiple times from the same state.

Describe the solution you'd like

Pass the episode number and the step number to extract_fluents, so that an implementation can cache the fluents already computed for that step and reuse them across temporal goals.
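The caching this would enable can be sketched as below. The class `CachedFluentExtractor`, the argument order `(obs, action, episode, step)` and the toy fluent are hypothetical; the actual signature would be whatever temprl adopts.

```python
class CachedFluentExtractor:
    """Caches the fluents computed for the current (episode, step) pair."""

    def __init__(self, extract):
        self.extract = extract  # the underlying, possibly expensive, function
        self.key = None
        self.fluents = None
        self.calls = 0  # counts real extractions, for illustration only

    def __call__(self, obs, action, episode, step):
        key = (episode, step)
        if key != self.key:
            self.key = key
            self.fluents = self.extract(obs, action)
            self.calls += 1
        return self.fluents


def extract(obs, action):
    # Hypothetical fluent: is the agent on an even-numbered cell?
    return {"even"} if obs % 2 == 0 else set()


shared = CachedFluentExtractor(extract)
# Two temporal goals query the same extractor at the same step.
first = shared(obs=4, action=0, episode=0, step=0)
second = shared(obs=4, action=0, episode=0, step=0)
third = shared(obs=5, action=1, episode=0, step=1)
```

Only the latest step is kept, so the cache stays constant in size however long training runs.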

Describe alternatives you've considered

Additional context

Make it possible to create a temporal goal from an automaton.

Is your feature request related to a problem? Please describe.

A temporal goal cannot be initialized directly from an automaton, only from a temporal logic formula (LTLf or LDLf).

Describe the solution you'd like

Make it possible to provide just a DFA.
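One possible shape for such an interface, as a sketch only: the primary constructor takes the automaton, and building from a formula becomes an alternative constructor. `DFAGoal`, its parameters, and the `translate` callable are hypothetical and do not reflect temprl's real classes or any formula-to-automaton library.

```python
class DFAGoal:
    """Stand-in for a temporal goal built from an automaton."""

    def __init__(self, initial, transitions, accepting, reward=1.0):
        self.initial = initial
        self.transitions = transitions  # dict: (state, symbol) -> next state
        self.accepting = accepting
        self.reward = reward
        self.state = initial

    @classmethod
    def from_formula(cls, formula, translate, reward=1.0):
        """Alternative constructor; `translate` is a hypothetical function
        returning (initial, transitions, accepting) for a formula."""
        return cls(*translate(formula), reward=reward)

    def reset(self):
        self.state = self.initial

    def step(self, symbol):
        """Advance the automaton and return the reward for this step."""
        self.state = self.transitions[(self.state, symbol)]
        return self.reward if self.state in self.accepting else 0.0


# "Eventually goal", written directly as a two-state DFA.
transitions = {(0, "other"): 0, (0, "goal"): 1, (1, "other"): 1, (1, "goal"): 1}
goal = DFAGoal(initial=0, transitions=transitions, accepting={1}, reward=5.0)
rewards = [goal.step(s) for s in ["other", "other", "goal"]]
```

Keeping the formula path as a separate constructor means users with a hand-written or precomputed DFA never need a formula at all.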

Describe alternatives you've considered

Additional context
