jlm429 / bettermdptools
2nd paragraph says (emphasis added):
Here, blackjack.convert_state_obs changes the 3-tuple into a discrete space with 290 states by concatenating player states 0-28 (hard 4-21 & soft 12-21) with dealer states 0-9 (2-9, ten, ace).
This appears to be a typo: the total is 280 states, not 290, because player states range from 0 to 27 (not 0 to 28): 18 states for hard 4-21 (indices 0-17) plus 10 states for soft 12-21 (indices 18-27), giving 28 player states × 10 dealer states = 280.
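The arithmetic above can be checked with a short sketch. Note that `player_index` and `state_index` are illustrative names for the counting described here, not the library's actual `convert_state_obs` implementation:

```python
# Sketch of the indexing arithmetic described above; these helper
# names are illustrative, not bettermdptools' convert_state_obs.
def player_index(total, usable_ace):
    if usable_ace:                # soft 12-21 -> indices 18-27
        return 18 + (total - 12)
    return total - 4              # hard 4-21 -> indices 0-17

def state_index(player_idx, dealer_idx):
    # dealer showing 2-9, ten, ace -> indices 0-9
    return player_idx * 10 + dealer_idx

# highest state: soft 21 vs. ace -> 27*10 + 9 = 279, so 280 states total
print(state_index(player_index(21, True), 9) + 1)  # 280
```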
bettermdptools/algorithms/planner.py
Line 127 in 12163cf
The "value_iteration()" works perfectly.
However, "policy_iteration()" appears to get stuck on line 135 of planner.py, in "policy_evaluation()", because the break statement is never reached:
if np.max(np.abs(prev_V - V)) < theta:
break
This appears to happen because the Taxi-v3 environment (link) gives a -1 reward for each step taken, which prevents the convergence condition on line 135 from being satisfied.
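A toy sweep shows how per-step -1 rewards can keep the condition from triggering. This is a minimal sketch, not the planner's code: a single non-terminal state with a -1 self-loop and gamma=1, where each Bellman backup subtracts 1, so max|prev_V - V| stays at 1.0 forever and any small theta is never satisfied:

```python
import numpy as np

# One non-terminal state, reward -1 per step, gamma = 1, under a policy
# that never terminates. Each sweep lowers V by exactly 1, so the
# max-abs-difference convergence test never fires.
gamma = 1.0
V = np.zeros(1)
for sweep in range(5):
    prev_V = V.copy()
    V = -1.0 + gamma * prev_V            # Bellman backup for the -1 self-loop
    print(sweep, np.max(np.abs(prev_V - V)))  # always 1.0
```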
Code to reproduce:
import gymnasium as gym
from bettermdptools.algorithms.planner import Planner
large_mdp = gym.make('Taxi-v3', render_mode=None)
observation, info = large_mdp.reset(seed=555) # passenger at green, destination at yellow
V, V_track, pi = Planner(large_mdp.unwrapped.P).policy_iteration()
To bypass the infinite loop and reach a result identical to value iteration, I increased theta and n_iters. This works, but it seems against the spirit of policy iteration:
import gymnasium as gym
from bettermdptools.algorithms.planner import Planner
large_mdp = gym.make('Taxi-v3', render_mode=None)
observation, info = large_mdp.reset(seed=555) # passenger at green, destination at yellow
V, V_track, pi = Planner(large_mdp.unwrapped.P).policy_iteration(n_iters=10000, theta=10000)
Reading up on policy evaluation, it seems that convergence requires the value function (within policy_evaluation()) to be monotonically increasing, and the Taxi-v3 environment doesn't satisfy that requirement because of its -1 step rewards.
I wonder if a different convergence criterion could be used in "policy_evaluation()", perhaps np.isclose with a relative difference?
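One possible shape for that criterion, as a sketch only (the function name and tolerances here are illustrative, not the library's API): use np.allclose, whose relative term scales with the magnitude of V, so large negative values accumulated from -1 step rewards can still pass the test:

```python
import numpy as np

# Sketch of a relative-tolerance convergence test for policy_evaluation().
# np.allclose checks |prev_V - V| <= atol + rtol * |V| elementwise, so the
# tolerance grows with the magnitude of the values.
def converged(prev_V, V, rtol=1e-8, atol=1e-10):
    return np.allclose(prev_V, V, rtol=rtol, atol=atol)

prev_V = np.array([-1000.0, -500.0])
V = prev_V * (1 + 1e-9)          # tiny *relative* change
print(converged(prev_V, V))      # True, though the absolute diff is ~1e-6
```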
Pygame errors out on linux ("libGL error: MESA-LOADER: failed to open iris").
See: https://stackoverflow.com/questions/72110384/libgl-error-mesa-loader-failed-to-open-iris
Solution 2 from that answer works after editing its first line to:
cd /home/$USER/anaconda3/envs/$ENV/lib
where:
$USER is your username
$ENV is the name of your conda env.
See the deprecation warnings. Run Q-learning on the wrapped env in the plot examples instead of on the .env attribute, which is the environment underneath the first wrapper.
... but the current implementation starts every evaluation with zero utility values.
A trivial code update sped things up by 1.5x in several experiments.
# replace 8x8 with sqrt of data length
https://github.com/jlm429/bettermdptools/blob/master/examples/plots.py#L39
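The suggested change could look something like the sketch below (a hypothetical helper, not the code at the link above): derive the heat-map grid shape from the length of V instead of hard-coding 8x8, assuming a square map:

```python
import numpy as np

# Sketch: infer the plot grid shape from the state-value array length
# rather than hard-coding (8, 8). Assumes the map is square.
def grid_shape(V):
    side = int(np.sqrt(len(V)))
    assert side * side == len(V), "state space is not a square grid"
    return (side, side)

print(grid_shape(np.zeros(64)))   # (8, 8) for FrozenLake8x8
print(grid_shape(np.zeros(16)))   # (4, 4) for the 4x4 map
```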
When I try to instantiate the Blackjack class, I receive the following error:
AttributeError: 'Blackjack' object has no attribute '_P'
Which appears to stem from the missing file blackjack-envP:
Pickle load failed. Check path C:\Users\diuan.conda\envs\py3.8\lib\site-packages\examples\blackjack-envP
The suspicion is that the pip install doesn't ship this file alongside the others in the examples folder, since this file in particular does not have a .py extension.
Make sure the site-packages directory has the correct name. Run the following command to locate the site-packages folder.
python -c "import site; print(site.getsitepackages())"
Also see:
https://stackoverflow.com/questions/43485569/installed-a-package-with-anaconda-cant-import-in-python
Using the example tutorial code from the README:
import gymnasium as gym
from bettermdptools.algorithms.planner import Planner
from bettermdptools.utils.plots import Plots
# make gym environment
frozen_lake = gym.make('FrozenLake8x8-v1', render_mode=None)
# run VI
V, V_track, pi = Planner(frozen_lake.P).value_iteration()
#plot state values
size=(8,8)
Plots.values_heat_map(V, "Frozen Lake\nValue Iteration State Values", size)
I get the following warning:
UserWarning: WARN: env.P to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.P` for environment variables or `env.get_wrapper_attr('P')` that will search the reminding wrappers.
I know it's just a warning, but the README example should probably be updated to use the modern wrapper access pattern.
Add the imports to the README:
from examples.plots import Plots
from examples.grid_search import GridSearch
epsilon_decay = [.4, .7, .9]
iters = [500, 5000, 50000]
GridSearch.Q_learning_grid_search(frozen_lake.env, epsilon_decay, iters)
#plot state values
frozen_lake = gym.make('FrozenLake8x8-v1', render_mode=None)
V, V_track, pi = Planner(frozen_lake.env.P).value_iteration()
Plots.grid_values_heat_map(V, "State Values")