Dyna with Options


Installing Jax on MacOS

Currently (2022-01-18), the only published wheel for MacOS is for Python 3.9/3.10. The wheel for Python 3.8 has not yet been published but should be buildable. For more details, follow this thread.

pip install -U pip
pip install -U https://storage.googleapis.com/jax-releases/mac/jaxlib-0.1.74-cp39-none-macosx_11_0_arm64.whl
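
Once the jaxlib wheel is installed, a quick sanity check (assuming the jax package itself is also installed, e.g. via requirements.txt):

# Minimal sanity check that jax imports and can see a device.
# Assumes the jax package is installed alongside the jaxlib wheel above.
import jax

print(jax.__version__)
print(jax.devices())  # should list at least one CPU device on an M1 Mac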

Description of the GrazingWorld Environment

Grazing World ASCII art

10x10 matrix, where

  • each timestep incurs -0.1 reward
  • each episode ends when the agent takes any action in any of the three goal states, at which point it is teleported back to S
  • if the agent takes an action that would lead it into a [W]all, it remains in the state where it selected that action; the same applies to actions that would take it off the grid
  • G1 starts off giving a reward of 0 for a number of timesteps sampled from a Poisson distribution (mean=50 by default for now). After that many timesteps, the reward switches to +50 for a number of timesteps sampled from the same distribution (see the sketch after this list).
  • The same is true for G2, except its non-zero reward is +40.
  • G3, the third goal, always gives a reward of +1.
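
As a rough illustration of the switching rewards, here is a minimal Python sketch (not the actual environment code; the class and attribute names are made up for illustration):

import numpy as np

class SwitchingGoal:
    # Illustrative only: a goal whose reward alternates between 0 and a high value,
    # with each phase length drawn from a Poisson distribution (mean 50 by default).
    def __init__(self, high_reward, mean=50, seed=None):
        self.high_reward = high_reward
        self.mean = mean
        self.rng = np.random.default_rng(seed)
        self.active = False  # start in the 0-reward phase
        self.steps_left = self.rng.poisson(self.mean)

    def step(self):
        # Flip phases when the sampled duration runs out, then pay out for this step.
        if self.steps_left <= 0:
            self.active = not self.active
            self.steps_left = self.rng.poisson(self.mean)
        self.steps_left -= 1
        return float(self.high_reward) if self.active else 0.0

# G1 alternates between 0 and +50, G2 between 0 and +40; the third goal pays a constant +1.
g1, g2 = SwitchingGoal(50), SwitchingGoal(40)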

Setting up repo

This codebase only works with Python 3.6 and above.

Packages are stored in a requirements.txt file. To install:

pip install -r requirements.txt

On machines that you do not have root access to (like compute canada machines), you will need to install in the user directory. You can do this with:

pip install --user -r requirements.txt

Or you can set up a virtual environment:

virtualenv -p python3 env

Step-by-step example

To get set-up on compute canada:

ssh $cedar
# You should replace this github url with your repo that is a copy of the template repo
git clone git@github.com:GaborMihucz/dyna-options.git
cd dyna-options

# build virtual environment
virtualenv -p python3 env
. env/bin/activate

# install dependencies
pip install -r requirements.txt

Here is a quick guide to running an existing experiment on compute canada. This should run a parameter sweep over alpha and epsilon for e-greedy SARSA on MountainCar, then plot the learning curve of the parameter setting that achieves the highest return, averaged over 10 runs with standard error bars.

ssh $cedar
cd rl-control-template
git pull # make sure you are up to date

# remove any old results that you might have lying around
# that way you don't accidentally zip them up and re-download them after the experiment
rm -rf results &

# check the cluster parameters
# make sure to balance using many parallel cpu cores
# while also being a good citizen of the resources (e.g. don't schedule 1000s of 2m jobs)
nano clusters/cedar.json

# run the experiment
python scripts/slurm.py clusters/cedar.json src/main.py ./ 10 experiments/example/*.json

# wait for a while
# then zip and download results
tar -cavf results.tar.bz2 results

# go back to your laptop
exit
scp $cedar:~/rl-control-template/results.tar.bz2 ./
tar -xvf results.tar.bz2

# plot your results
python analysis/learning_curve.py experiments/example/*.json

Getting a Compute Canada Node

Drop this shell script into your project folder

#!/bin/sh
#SBATCH --time=23:58:00
#SBATCH --mem=186G
#SBATCH --cpus-per-task=48
#SBATCH --account=rrg-whitem
#SBATCH --mail-type=ALL
#SBATCH --nodes=1
python sleep.py

where sleep.py is

import time

while True:
    time.sleep(60)

Then run sbatch <script>.sh.

Run sq. It will list the job that has been queued; once it is running, ssh into the node it is running on (the hostname starts with cdr): ssh cdr<numbers>


Dependencies

This template repo depends on a few other shared libraries to make code-splitting and sharing a little easier (for me). The documentation and source code can be found at the following links.

  • RLGlue - my own minimal implementation of RLGlue
  • PyExpUtils - a library containing the experiment running framework
  • PyFixedReps - a few fixed representation algorithms implemented in python (e.g. tile-coding, rbfs, etc.)

Organization Patterns

Experiments

All experiments are described as completely as possible within static data files. I choose to use .json files for human readability and because I am most comfortable with them. These are stored in the experiments folder, usually in a subdirectory with a short name for the experiment being run (e.g. experiments/idealH would specify an experiment that tests the effects of using h*).

Experiment .json files look something like:

{
    "agent": "gtd2", // <-- name of your agent. these names are defined in agents/registry.py
    "problem": "randomwalk", // <-- name of the problem you're solving. these are defined in problems/registry.py
    "metaParameters": { // <-- a dictionary containing all of the meta-parameters for this particular algorithm
        "alpha": [1, 0.5, 0.25], // <-- sweep over these 3 values of alpha
        "beta": 1.0, // <-- don't sweep over beta, always use 1.0
        "use_ideal_h": true,
        "lambda": [0.0, 0.1]
    }
}

Problems

I define a problem as a combination of the following (a rough sketch is given after the list):

  1. environment
  2. representation
  3. target/behavior policies
  4. number of steps
  5. gamma
  6. starting conditions for the agent (like in Baird's)
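
For intuition, such a bundle might look roughly like the sketch below (illustrative only; the real definitions live in problems/ and the field names here are made up):

from dataclasses import dataclass
from typing import Any

@dataclass
class ProblemSketch:
    # Illustrative bundle of the pieces listed above, not the actual problems/ code.
    environment: Any         # e.g. the GrazingWorld dynamics
    representation: Any      # e.g. a tile-coder from PyFixedReps
    behavior_policy: Any     # policy used to collect data
    target_policy: Any       # policy being learned/evaluated (may equal the behavior policy)
    steps: int               # total number of interaction steps
    gamma: float             # discount factor
    start_state: Any = None  # fixed starting conditions (like in Baird's counterexample)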

results

The results are saved in a path that is defined by the experiment definition used. The configuration for the results is specified in config.json. Using the current config.json yields results paths that look like:

<base_path>/results/<experiment short name>/<agent name>/<parameter values>/errors_summary.npy

Where <base_path> is defined when you run an experiment.

src

This is where the source code is stored. The only .py files it contains are "top-level" scripts that actually run an experiment. No utility files or shared logic at the top-level.

agents: contains each of the agents. Preferably, these would be one agent per file.

analysis: contains shared utility code for analysing the results. This does not contain scripts for analysing results, only shared logic (e.g. plotting code or results filtering).

environments: contains minimal implementations of just the environment dynamics.

utils: various utility code snippets for doing things like manipulating file paths or getting the last element of an array. These are just reusable code chunks that have no other clear home. I try to sort them into files that roughly name how/when they will be used (e.g. things that manipulate file paths go in paths.py, things that manipulate arrays go in arrays.py, etc.).

clusters

This folder contains the job submission information that is needed to run on a cluster. These are also .json files that look like:

{
    "account": "which compute canada account to use",
    "time": "how much time the job is expected to take",
    "nodes": "the number of cpu cores to use",
    "memPerCpu": "how much memory one parameter setting requires", // doesn't need to change
    "tasksPerNode": "how many parameter settings to run in serial on each cpu core"
}

Some quick terminology (that I made up and is kinda bad):

  • node: a CPU core
  • task: a single call to the experiment entry file (e.g. src/main.py). Generally only runs one parameter setting for a single run.
  • job: a compute canada job (contains many tasks and runs across multiple nodes).

The nodes setting determines the number of CPU cores for the job to request. These CPU cores may not all be on the same server node and most likely will be split across several server nodes. The job scheduling script bundled with this template repo will handle distributing jobs across multiple server nodes in the way recommended by compute canada support.

The tasksPerNode setting determines the number of processes (calls to the experiment entry file) to be lined up per node requested. If you request nodes=16, then 16 tasks will run in parallel (one per core). If you request tasksPerNode=4, then each node will run 4 tasks in serial. In total, 64 tasks will be scheduled for one compute canada job with this configuration. If there are 256 total tasks that need to be run, then 4 compute canada jobs will be scheduled.
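
To make that arithmetic concrete, here is a small sketch of the same calculation (illustration only; the actual bundling is handled by the scheduling script):

import math

def num_cc_jobs(total_tasks, nodes, tasks_per_node):
    # Each compute canada job covers nodes * tasks_per_node tasks;
    # anything left over spills into an additional job.
    return math.ceil(total_tasks / (nodes * tasks_per_node))

# Matches the example above: nodes=16 and tasksPerNode=4 give 64 tasks per job,
# so 256 total tasks require 4 compute canada jobs.
print(num_cc_jobs(256, nodes=16, tasks_per_node=4))  # -> 4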


Running the code

There are a few layers for running the code. The simplest layer directly runs a single experiment for a single parameter setting. The highest layer schedules jobs on a cluster (or on a local computer) that sweep over all of the parameter settings.

The higher layers work by figuring out how to call the simplest layer many times, then generating a script that calls that layer once for each parameter setting.

Everything should be run from the root directory of the repo!

Directly run experiment

Let's say you want to generate a learning curve over N runs of an algorithm.

python src/main.py <N> <path/to/experiment.json> <parameter_setting_idx>

It isn't super easy to know which parameter_setting_idx to use. It is simpler to make an experiment description .json that contains only one possible parameter permutation (i.e. has no arrays in it). This will save the results in the results folder as specified above.
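
For example, a single-permutation version of the earlier experiment file would simply drop the arrays (illustrative values only):

{
    "agent": "gtd2",
    "problem": "randomwalk",
    "metaParameters": {
        "alpha": 0.5,
        "beta": 1.0,
        "use_ideal_h": true,
        "lambda": 0.0
    }
}

With a file like this there is only one permutation, so the choice of parameter_setting_idx no longer matters.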

Run parameter sweeps

Note: When you are pulling from latest, make sure to regenerate the options with

python src/utils/create_options.py <environment-name>

The pulled options will likely not work because of differences in library versions, and will cause an error when running.

To run an entire sweep locally, you'll want to do the following:

python run/local.py -p src/main.py -j <path>/<to>/<experiment>/<json>

To run it on a cluster (like cedar), run the following

python run/slurm.py -p src/main.py -j <path>/<to>/<experiment>/<json> --email=<notification-email-address> -m=<minutes-to-run>

Check out run/local.py and run/slurm.py for more details on what arguments they take!

I'm not sure if the section below still works:

If you want to run a larger experiment (i.e. a parameter sweep), you'll want to run these on a cluster (like cedar).

python scripts/slurm.py ./clusters/cedar.json src/main.py <path/where/results/are/saved> <num runs> <path/to/experiment.json>

example: if I want to run an experiment called ./experiments/idealH/gtd2_not.json

python scripts/slurm.py ./clusters/cedar.json src/main.py ./ 100 ./experiments/idealH/gtd2_not.json

To run multiple experiments at once, you can specify several .json files.

python scripts/slurm.py ./clusters/cedar.json src/main.py ./ 100 ./experiments/idealH/*.json

or

python scripts/slurm.py ./clusters/cedar.json src/main.py ./ 100 ./experiments/idealH/gtd2.json ./experiments/idealH/gtd2_not.json

Generate learning curves

The top-level analysis folder contains the scripts for generating learning curves.

python analysis/learning_curve.py <path/to/experiments.json>

example: One algorithm (one line)

python analysis/learning_curve.py ./experiments/idealH/gtd2_not.json

example: compare algorithms (multiple lines)

python analysis/learning_curve.py ./experiments/idealH/gtd2_not.json ./experiments/idealH/gtd2.json

FAQs

  • What are the best settings for clusters/cedar.json?

    As per the best practices document from compute canada, I make sure my CC jobs always take at least one hour to complete. Because many of my tasks take about 5 minutes, I generally set the tasksPerNode parameter to ~16 to accomplish this (16*5m = 1h20m). I also try to make sure my jobs take no longer than 12hrs to complete (if I can help it). The optimal---if I can wait---is to make the jobs take just under 3hrs so that my jobs are in the highest priority queue, but put the least strain on the scheduler. Always leave a bit of wiggle room.

    There is a fine balance between CC job size and the number of CC jobs scheduled. Large CC jobs take longer to be scheduled, but a large number of small jobs put unnecessary strain on the scheduler. I try to limit my number of scheduled jobs to ~100 (we have a max of 2000 per person). To figure out how many tasks will be scheduled for an experiment, you can run:

import src.experiment.ExperimentModel as Experiment

exp = Experiment.load('experiments/path/to.json')
print(exp.numPermutations())
  • How do you get your code from your laptop to the compute canada server?

    Git is your friend. All of my code is always checked in to git, and I have my experiment code cloned on my laptop and on the CC server. I use GitHub (or sometimes Bitbucket) private repos to house the code remotely. I make liberal use of git tags to mark checkpoints in the repo's lifespan (e.g. before I add a new contributor: git tag before-aj-messed-things-up, or when I submit a paper: git tag icml-2020). This helps maintain my sanity as the code changes and evolves over time, because all codebase states remain accessible.

  • What if one of my jobs fails or some of the tasks did not finish in time?

    One of the major advantages to the way this experiment framework is set up is that you can trivially determine exactly which results are missing after scheduling a job. In fact, the job scheduling script in this template repo already handles this issue by default. If you have results that are missing, simply run the scheduling script again with no changes and it will schedule only the missing tasks.

  • I'm running the scheduling script, but it exits immediately and no jobs are scheduled?

    See the above. Chances are, your results/ folder is not empty so there are no "missing results" to be scheduled. If you want to force the scheduling script to go forward anyways, either run mv results results.old or rm -rf results/ to get rid of the results (or some other less aggressive strategy).

  • Can your code use GPUs?

    Yup! Just change the bash script that is generated in scripts/slurm.py to request GPUs from compute canada.

  • Can your code use multi-threading?

    Currently the scheduling script is not designed to handle multi-threading. Because my tasks tend to be relatively short (a few hours at most), and because it is generally better to have many single-threaded processes than one multi-threaded process, I have had no need to design a script to handle multi-threading. However, the underlying experiment framework, PyExpUtils, does have support for handling multi-threaded tasks. You will need to make a few modifications to scripts/slurm.py to change how many tasks are bundled into each job to account for using multiple threads. Talk to Andy if you need help!

Operational Things:

I have committed/uncommitted local changes and I want to pull in the latest remote changes and have it be the base

  1. Suppose your repo is in one of these states:
    • A-B-(unfinished work)
    • A-B-C
  2. The master branch is at the state A-B-D
  3. And you want the end result to be, respectively, either
    • A-B-D-(unfinished work)
    • A-B-D-C-(unfinished work)

Steps:

  1. Commit all unfinished work, name the commit message however you want
  2. git pull --rebase (this will pull the latest and rebase)
  3. Resolve any conflicts that might have occurred from the rebase
    1. Resolve merge conflicts within your editor of choice
    2. git add the file after you resolved the conflicts
    3. git rebase --continue to finish rebasing
  4. git reset HEAD~ to uncommit your unfinished work

Now, you should have your repo in a state where you have rebased onto the latest with your changes applied! (Hopefully)
