
bettermdptools's People

Contributors

audols, cbhyphen, dcangulo, fidmor89, gagancodes, jlm429, leeykang, mattteal, tim-k-dfw, zbalda

bettermdptools's Issues

README - minor correction

2nd paragraph says (emphasis added):

Here, blackjack.convert_state_obs changes the 3-tuple into a discrete space with 290 states by concatenating player states 0-28 (hard 4-21 & soft 12-21) with dealer states 0-9 (2-9, ten, ace).

This appears to be a typo. There are 280 total states, because player states range from 0 to 27 (not 0 to 28): 18 states for hard 4-21 (indices 0, 1, ..., 17) and 10 states for soft 12-21 (indices 18, 19, ..., 27), giving 28 player states x 10 dealer states = 280.
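A minimal sketch of the counting (the actual convert_state_obs implementation may differ; this only illustrates why the total is 280 rather than 290):

n_hard = len(range(4, 22))     # hard 4-21  -> 18 player states (indices 0-17)
n_soft = len(range(12, 22))    # soft 12-21 -> 10 player states (indices 18-27)
n_player = n_hard + n_soft     # 28 player states, indexed 0-27
n_dealer = 10                  # dealer 2-9, ten, ace -> indices 0-9

print(n_player * n_dealer)     # 280, not 290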

Taxi-v3 infinite loop in policy_iteration

The "value_iteration()" works perfectly.

However, "policy_iteration()" appears to get stuck on line 135 of planner.py in "policy_evaluation()" as the break statement isn't reached.

if np.max(np.abs(prev_V - V)) < theta:
    break

This appears to happen because the Taxi-v3 environment (link) has a -1 reward for each step taken, which prevents the convergence condition on line 135 from ever being satisfied.
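A toy illustration of why the delta never shrinks (purely hypothetical numbers, assuming an undiscounted evaluation, i.e. gamma = 1.0, which I haven't confirmed is the Planner default): under a policy that never reaches the terminal state, every evaluation sweep adds another -1 step reward, so the value keeps dropping by 1 and the absolute difference never falls below theta.

import numpy as np

gamma, theta = 1.0, 1e-10
V = 0.0
for sweep in range(5):
    prev_V = V
    V = -1.0 + gamma * prev_V          # Bellman backup for the looping state
    print(sweep, np.abs(prev_V - V))   # prints 1.0 every sweep, so the break is never reached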

Code to reproduce:

import gymnasium as gym
from bettermdptools.algorithms.planner import Planner

large_mdp = gym.make('Taxi-v3', render_mode=None)
observation, info = large_mdp.reset(seed=555) # passenger at green, destination at yellow
V, V_track, pi = Planner(large_mdp.unwrapped.P).policy_iteration()

To bypass the infinite loop and reach a result identical to value iteration, I increased theta and n_iters. This works, but it seems against the spirit of policy iteration:

import gymnasium as gym
from bettermdptools.algorithms.planner import Planner

large_mdp = gym.make('Taxi-v3', render_mode=None)
observation, info = large_mdp.reset(seed=555) # passenger at green, destination at yellow
V, V_track, pi = Planner(large_mdp.unwrapped.P).policy_iteration(n_iters=10000, theta=10000)

Reading up on policy evaluation, it seems that convergence requires the value function (within policy_evaluation()) to be monotonically increasing, and the Taxi-v3 environment doesn't satisfy that requirement because of its -1 per-step rewards.

I wonder if a different convergence criterion could be used in "policy_evaluation()", perhaps np.isclose with a relative difference?
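For example, a criterion along these lines, using np.allclose (the array-wide form of np.isclose) — just a sketch of the idea, not a tested change to planner.py:

# sketch: stop when successive value estimates are relatively close,
# instead of requiring the absolute difference to drop below theta
if np.allclose(prev_V, V, rtol=1e-6, atol=theta):
    break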

plots Q-learning wrapper

Run Q-learning on the wrapped env in the plot examples instead of the .env attribute, which is the environment underneath the first wrapper.
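Roughly something like the following in the plot examples (a sketch only; the RL class and the q_learning() return signature from bettermdptools.algorithms.rl are assumed and may differ):

import gymnasium as gym
from bettermdptools.algorithms.rl import RL  # import path assumed

frozen_lake = gym.make('FrozenLake8x8-v1', render_mode=None)

# pass the wrapped environment itself instead of frozen_lake.env
Q, V, pi, Q_track, pi_track = RL(frozen_lake).q_learning()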

examples/blackjack-envP doesn't install by default due to lack of .py extension

When I try to instantiate the Blackjack class, I receive the following error:

AttributeError: 'Blackjack' object has no attribute '_P'

This appears to stem from the missing file blackjack-envP:

Pickle load failed. Check path C:\Users\diuan.conda\envs\py3.8\lib\site-packages\examples\blackjack-envP

The suspicion is that pip install doesn't include this file alongside the others in the examples folder, since this file in particular lacks a .py extension.
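If that's the cause, one possible fix is declaring the file as package data so pip ships non-.py files too; a sketch under that assumption (the project's actual packaging configuration may look different):

# setup.py sketch: include the non-.py pickle file in the examples package
from setuptools import setup, find_packages

setup(
    name="bettermdptools",
    packages=find_packages(),
    include_package_data=True,
    package_data={"examples": ["blackjack-envP"]},  # ship the pickled env P used by Blackjack
)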

WARN: env.P to get variables from other wrappers is deprecated

Using the example tutorial code from the README:

import gymnasium as gym


from bettermdptools.algorithms.planner import Planner
from bettermdptools.utils.plots import Plots

# make gym environment 
frozen_lake = gym.make('FrozenLake8x8-v1', render_mode=None)

# run VI
V, V_track, pi = Planner(frozen_lake.P).value_iteration()

#plot state values
size=(8,8)
Plots.values_heat_map(V, "Frozen Lake\nValue Iteration State Values", size)

I get the following warning:

UserWarning: WARN: env.P to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.P` for environment variables or `env.get_wrapper_attr('P')` that will search the reminding wrappers.

I know it's just a warning, but the example should probably be updated to use the modern wrapper technique (env.unwrapped.P).
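Per the warning text, the README call would presumably become:

# use the unwrapped environment's transition matrix
V, V_track, pi = Planner(frozen_lake.unwrapped.P).value_iteration()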

Fix README.md example - add needed imports

Add the imports to the README:

from examples.plots import Plots
from examples.grid_search import GridSearch

# gym and Planner are assumed to be imported earlier in the README example
frozen_lake = gym.make('FrozenLake8x8-v1', render_mode=None)

epsilon_decay = [.4, .7, .9]
iters = [500, 5000, 50000]
GridSearch.Q_learning_grid_search(frozen_lake.env, epsilon_decay, iters)

# plot state values
V, V_track, pi = Planner(frozen_lake.env.P).value_iteration()
Plots.grid_values_heat_map(V, "State Values")
