
The Research Tree - A playground for research at the intersection of Continual, Reinforcement, and Self-Supervised Learning.

License: GNU General Public License v3.0

Python 99.27% Shell 0.53% Dockerfile 0.20%

sequoia's Introduction

Sequoia - The Research Tree

A Playground for research at the intersection of Continual, Reinforcement, and Self-Supervised Learning.

Note: This project is not being actively developed at the moment. If you encounter any difficulties, please create an issue and I'll help you out.

If you have any questions or comments, please make an issue!

Motivation:

Most applied ML research generally either proposes new Settings (research problems), new Methods (solutions to such problems), or both.

  • When proposing new Settings, researchers almost always have to reimplement or heavily modify existing solutions before they can be applied to their new problem.

  • Likewise, when creating new Methods, it's often necessary to first re-create the experimental setting of other baseline papers, or even the baseline methods themselves, as experimental conditions may be slightly different between papers!

The goal of this repo is to:

  • Organize various research Settings into an inheritance hierarchy (a tree!), with more general, challenging settings with few assumptions at the top, and more constrained problems at the bottom.

  • Provide a mechanism for easily reusing existing solutions (Methods) on new Settings through Polymorphism!

  • Allow researchers to easily create new, general Methods and quickly gather results on a multitude of Settings, ranging from Supervised to Reinforcement Learning!

Installation

Requires python >= 3.7

Basic installation:

$ git clone https://www.github.com/lebrice/Sequoia.git
$ pip install -e Sequoia

Optional Addons

You can also install optional "addons" for Sequoia, each of which adds new Methods, new environments/datasets, or both. Addons can be installed either through the usual extras_require feature of setuptools, or by pip-installing other repositories that register Methods for Sequoia using an entry_point in their setup.py file.

pip install -e Sequoia[all|<plugin name>]
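
For reference, here is a rough sketch of how a third-party package could register a Method through an entry point in its setup.py. The entry-point group name "sequoia.methods", the package name, and the class path are assumptions for illustration; check Sequoia's own setup.py for the exact group it scans.

from setuptools import setup, find_packages

setup(
    name="my_sequoia_addon",          # hypothetical addon package
    version="0.0.1",
    packages=find_packages(),
    install_requires=["sequoia"],
    entry_points={
        # The group name below is an assumption; see Sequoia's setup.py for the real one.
        "sequoia.methods": [
            "my_method = my_sequoia_addon.method:MyMethod",
        ],
    },
)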

Here are some of the optional addons:

  • avalanche:

    Continual Supervised Learning methods, provided by the Avalanche library:

    $ pip install -e Sequoia[avalanche]
  • CN-DPM: Continual Neural Dirichlet Process Mixture model:

    $ cd Sequoia
    $ git submodule init  # to setup the submodules
    $ pip install -e sequoia/methods/cn_dpm    
  • orion:

    Hyper-parameter optimization using Orion

    $ pip install -e Sequoia[orion]
  • metaworld:

    Continual / Multi-Task Reinforcement Learning environments, thanks to the metaworld package. The usual mujoco setup needs to be done manually; Sequoia unfortunately can't do it for you.

    $ pip install -e Sequoia[metaworld]
  • monsterkong:

    Continual Reinforcement Learning environment from the Meta-MonsterKong repo.

    $ pip install -e Sequoia[monsterkong]
  • continual_world: The Continual World benchmark for Continual Reinforcement learning. Adds 6 different Continual RL Methods to Sequoia.

    $ cd Sequoia
    $ git submodule init  # to setup the submodules
    $ pip install -e sequoia/methods/continual_world   

See the setup.py file for all the optional extras.

Additional Installation Steps for Mac

Install the latest XQuartz app from here: https://www.xquartz.org/releases/index.html

Then run the following commands on the terminal:

mkdir /tmp/.X11-unix 
sudo chmod 1777 /tmp/.X11-unix 
sudo chown root /tmp/.X11-unix/

Documentation overview:

Current Settings & Assumptions:

Setting | RL vs SL | Clear task boundaries? | Task boundaries given? | Task labels at training time? | Task labels at test time? | Stationary context? | Fixed action space?
--- | --- | --- | --- | --- | --- | --- | ---
Continual RL | RL | no | no | no | no | no | no(?)
Discrete Task-Agnostic RL | RL | yes | yes | no | no | no | no(?)
Incremental RL | RL | yes | yes | yes | no | no | no(?)
Task-Incremental RL | RL | yes | yes | yes | yes | no | no(?)
Traditional RL | RL | yes | yes | yes | no | yes | no(?)
Multi-Task RL | RL | yes | yes | yes | yes | yes | no(?)
Continual SL | SL | no | no | no | no | no | no
Discrete Task-Agnostic SL | SL | yes | no | no | no | no | no
(Class) Incremental SL | SL | yes | yes | no | no | no | no
Domain-Incremental SL | SL | yes | yes | yes | no | no | yes
Task-Incremental SL | SL | yes | yes | yes | yes | no | no
Traditional SL | SL | yes | yes | yes | no | yes | no
Multi-Task SL | SL | yes | yes | yes | yes | yes | no

Notes

  • Active / Passive: Active settings are Settings where the next observation depends on the current action, i.e. where actions influence future observations (e.g. Reinforcement Learning). Passive settings are Settings where the current actions don't influence the next observations (e.g. Supervised Learning).

  • Bold entries in the table mark constant attributes which cannot be changed from their default value.

  • *: The environment is changing constantly over time in ContinualRLSetting, so there aren't really "tasks" to speak of.

Running experiments

--> (Reminder) First, take a look at the Examples <--

Directly in code:

from sequoia.settings import TaskIncrementalSLSetting
from sequoia.methods import BaseMethod
# Create the setting
setting = TaskIncrementalSLSetting(dataset="mnist")
# Create the method
method = BaseMethod(max_epochs=1)
# Apply the setting to the method to generate results.
results = setting.apply(method)
print(results.summary())

Command-line:

$ sequoia --help
usage: sequoia [-h] [--version] {run,sweep,info} ...

Sequoia - The Research Tree 

Used to run experiments, which consist in applying a Method to a Setting.

optional arguments:
  -h, --help        show this help message and exit
  --version         Displays the installed version of Sequoia and exits.

command:
  Command to execute

  {run,sweep,info}
    run             Run an experiment on a given setting.
    sweep           Run a hyper-parameter optimization sweep.
    info            Displays some information about a Setting or Method.

For example:

$ sequoia run [--debug] <setting> (setting arguments) <method> (method arguments)
$ sequoia sweep [--debug] <setting> (setting arguments) <method> (method arguments)
$ sequoia info [setting or method]

For a detailed description of all the arguments, use the --help flag with any of the commands:

$ sequoia --help
$ sequoia run --help
$ sequoia run <some_setting> --help
$ sequoia run <some_setting> <some_method> --help
$ sequoia sweep --help
$ sequoia sweep <some_setting> --help
$ sequoia sweep <some_setting> <some_method> --help

For example:

$ sequoia run --debug task_incremental_sl --dataset mnist random_baseline

For example:

  • Run the BaseMethod on task-incremental MNIST, with one epoch per task, and without wandb:
    $ sequoia run task_incremental_sl --dataset mnist base --max_epochs 1
  • Run the PPO Method from stable-baselines3 on an incremental RL setting, with the default dataset (CartPole) and 5 tasks:
    $ sequoia --setting incremental_rl --nb_tasks 5 --method sb3.ppo --steps_per_task 10_000

More questions? Please let us know by creating an issue or posting in the discussions!

sequoia's People

Contributors

digantamisra98, fgolemo, jeromepl, julioushurtado, lebrice, lucasc-99, mostafaelaraby, oleksost, optimass, pclucas14, prlz77, rrwiyatn, ryanlindeborg


sequoia's Issues

Encoder doesn't get gradients

Take, for example:

likelihood = torch.log_softmax(self._model(observation).logits, dim=1)[0][0]
grad = torch.autograd.grad(
    likelihood, self._model.parameters(),
    retain_graph=True, allow_unused=True,
)
for (name, p), g in zip(self._model.named_parameters(), grad):
    print(name, g)

for the encoder parameters, the gradient is None.
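
For what it's worth, the same check can be reproduced on a tiny standalone model (hypothetical, not Sequoia-specific): with allow_unused=True, any parameter that wasn't part of the graph that produced the loss comes back as None.

import torch
from torch import nn

# Toy stand-in for an encoder + output head; not Sequoia's actual model.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(1, 4)
likelihood = torch.log_softmax(model(x), dim=1)[0, 0]
grads = torch.autograd.grad(likelihood, model.parameters(),
                            retain_graph=True, allow_unused=True)
for (name, _), g in zip(model.named_parameters(), grads):
    # g is None means the parameter did not contribute to `likelihood`
    # (e.g. a detached or bypassed encoder).
    print(name, "None" if g is None else tuple(g.shape))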

This can be reproduced on the 'reg_method' branch with these parameters for example:

"--setting", "task_incremental_rl", "--method", "baseline", "--ewc.coef", "1", "--verbose", "1", "--max_epochs", "1", "--batch_size", "32", "--max_steps", "1000", "--dataset", "cartpole", "--wandb", "test", "--no_wandb", "--FIM_representation", "block_diagonal", "--ewc_critic", "1", "--rl_algo", "DQN", "--ewc_coefficient", "100", "--nb_tasks", "5", "--train_task_schedule_path", "/Users/oleksostapenko/Projects/SSCL/sequoia/schedules/train_schedule.json"].

Train schedule is something like this:

{              
    "0":   {"gravity": 5, "length": 0.8},
    "200":   {"gravity": 20, "length": 0.2}
}

Validate the RL "baseline" Methods

Related to #12 and #22

While #21 added the PPO and A2C methods, they haven't been properly evaluated or trained long enough on the different settings for us to tell whether they actually work the way they should.

For instance, here are a few things that I think we should have before we can safely say that the RL methods like A2C or PPO truly work:

  • Results with average / total reward matching or somewhat close to an expected value from the literature.
  • Videos of some episodes (maybe through using the wandb.Monitor feature).
  • Plots of the evolution of the performance over time?
  • Etc.

Add interchangeable Task-inference modules

TODO: We should design the API for some kind of 'task inference' module, to be used by the ClassIncrementalModel to predict the task labels for the given samples, when none are available.

This could either have "hard" predictions (i.e. task inference as a simple classification task), or "soft" predictions (some kind of weighted average / mixture of experts).

As far as I can currently tell, this would be implemented as some type of Auxiliary task, trained alongside the model during training (like all auxiliary tasks), but used by the ClassIncrementalModel during testing to choose the appropriate output head in its get_actions method (during 'inference').
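
Here is a very rough interface sketch of what such a module could look like (names and shapes are placeholders, not an actual Sequoia API):

from abc import ABC, abstractmethod
from torch import Tensor, nn

class TaskInferenceModule(nn.Module, ABC):
    """Placeholder interface for task inference, trained as an auxiliary task."""

    @abstractmethod
    def forward(self, representations: Tensor) -> Tensor:
        """Return task probabilities of shape [batch_size, nb_tasks].

        "Hard" inference would return (close to) one-hot rows, which select a
        single output head; "soft" inference would return a mixture used to
        weight the predictions of all output heads.
        """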

Add Support for Semi-Supervision

Need to add semi-supervision, probably as an "innate" property of all settings, unless specified otherwise.

What we want for sure is at least a way to control what fraction of the dataset should be labeled, or the number of labeled samples.

I have a vague idea of how this could be implemented:
We could use some kind of wrapper around the Dataset, before or after the DataLoader is created, which would render it partially unlabeled. In the case of a gym environment, we could also add a Wrapper that gives back a given percentage of None rewards.

However, this wouldn't necessarily fit the "traditional semi-supervised setup" where (I think, but correct me if I'm wrong @oleksost ) you get distinct dataloaders for both the labeled data and the unlabeled data.
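
As a minimal sketch of the gym-Wrapper idea above (the names and exact API are assumptions, not Sequoia's implementation), something like this would drop a given fraction of the rewards:

import random
import gym

class PartialRewardWrapper(gym.Wrapper):
    """Replaces the reward with None for a given fraction of the steps."""

    def __init__(self, env: gym.Env, unlabeled_fraction: float = 0.5):
        super().__init__(env)
        self.unlabeled_fraction = unlabeled_fraction

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        if random.random() < self.unlabeled_fraction:
            reward = None  # "unlabeled" transition: no reward signal is given back.
        return observation, reward, done, info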

Add Smooth Task Boundaries in Supervised CL

Need to add some 'smooth' task boundaries in the Supervised CL branch.
Also need to figure out where/how exactly this will fit in that branch, as it might break the API of the ClassIncrementalSetting.

add support for Synbols

Let's have the SL part of the competition on Synbols https://github.com/ElementAI/synbols

@prlz77 (sorry in advance, but you are the Synbols whisperer), could you add the classification dataset to the repository?
I'm not sure what the best way to do this is. Maybe you just drop it somewhere and Fabrice takes care of connecting the APIs?

Clean up the 'Setting' API

Need to clean-up the 'Setting' class:

  • Move some specialized stuff down into the subclasses
  • Remove some of the extraneous/over-engineered methods
  • Add more comments / documentation
  • Clean up the interaction with the LightningDataModule methods, especially when it comes to CL.

Add Meta-World as a source of datasets/tasks for Continual RL settings

Use Meta-World to create the sequence of tasks to be used as part of the ClassIncrementalRLSetting (task_labels_at_test_time=False) or the TaskIncrementalRLSetting (task_labels_at_test_time=True).

Taken from their page:

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

training_envs = []
for name, env_cls in ml10.train_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.train_tasks
                        if task.env_name == name])
  env.set_task(task)
  training_envs.append(env)

for env in training_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

We could pretty easily adapt the ClassIncrementalRLSetting class so it calls set_task() on the environment when a given step is reached!
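
A minimal sketch of that adaptation (a hypothetical wrapper, not Sequoia's actual code; the {step: task} schedule format mirrors the train-schedule dicts used elsewhere in the repo):

import gym

class MetaWorldTaskScheduleWrapper(gym.Wrapper):
    """Calls env.set_task(task) once the corresponding step count is reached."""

    def __init__(self, env, task_schedule: dict):
        super().__init__(env)
        self.task_schedule = dict(sorted(task_schedule.items()))
        self.total_steps = 0

    def step(self, action):
        task = self.task_schedule.get(self.total_steps)
        if task is not None:
            self.env.set_task(task)  # switch to the scheduled Meta-World task
        self.total_steps += 1
        return self.env.step(action)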

Refactor the training/evaluation loops to produce entire transfer matrix

As of right now, all the settings do something a bit like this:

for task_id in range(self.nb_tasks):
    method.fit(train_envs[task_id], val_envs[task_id])

results = []
for task_id in range(self.nb_tasks):
    method.test(test_envs[task_id])
    results += test_envs[task_id].get_results()

return results

But we should instead do something a bit more like this, which would produce the entire transfer matrix, rather than only its bottom row:

results = [[] for _ in range(self.nb_tasks)]
for task_id in range(self.nb_tasks):
    method.fit(train_envs[task_id], val_envs[task_id])
    for test_task_id in range(self.nb_tasks):
        method.test(test_envs[test_task_id])
        results[task_id] += test_envs[test_task_id].get_results()
return results

Each Setting could then use that transfer matrix to inform their "objective".
(Currently, all settings only care about the last row, i.e. the performance on all tasks at the end of training, but some newer settings like OSAKA could instead measure only online performance, or only the diagonal entries of the transfer matrix.)
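
To illustrate the kind of objectives this enables, assuming R[i, j] holds the performance on task j after training on tasks 0..i (the matrix below is a random placeholder, not real results):

import numpy as np

nb_tasks = 5
R = np.random.rand(nb_tasks, nb_tasks)  # placeholder transfer matrix

final_performance = R[-1].mean()         # last row: performance on all tasks at the end
online_performance = np.diag(R).mean()   # diagonal: performance right after learning each task
forgetting = (np.diag(R)[:-1] - R[-1, :-1]).mean()  # drop from the diagonal to the last row
print(final_performance, online_performance, forgetting)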

Imagenet Training

Continuum already has the dataloaders for ImageNet, and we were able to get it to work in the old version of the repo. We need to get it to work now, and also make it fast, if possible. Getting this to even work (no matter how slow) would be the first step. Then, #9 and #8 should hopefully make it faster!

Make it possible to use existing models from pl_bolts as Methods and datamodules as IIDSettings in the tree with minimal overhead.

We would ideally like to consider any existing LightningDataModule classes in the pytorch-lightning-bolts package as particular instances of the IIDSetting class, so that we could just easily reuse them and evaluate our methods on them!

Similarly (and even more importantly), we would need to figure out a way to have the existing LightningModules from pl_bolts be reusable as methods targeting such settings! For instance, it would be fantastic if we could just plug and play some existing models, without requiring too much overhead in terms of code!

Add Class diagram for the BaselineMethod

Need to create a UML-style class diagram of the BaselineMethod and its hierarchy (BaseModel, SelfSupervisedModel, and ClassIncrementalModel, which will probably be renamed to MultiHeadModel), as well as their hyper-parameters.

Design a mechanism for auto-generating citations

Would be nice if we could generate a list of citations for everything that a given Method is using, or for all the other Methods it is comparing itself to!

One idea could be to add a __citations__ attribute to the Methods / Settings / models etc. which are based on a work from the literature; when running these methods, we could easily pull all of those attributes and aggregate them into a neatly formatted .bib file of some sort!
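
A rough sketch of how that aggregation could work (the __citations__ attribute and the helper below are hypothetical):

def gather_citations(*objects) -> str:
    """Collect the (hypothetical) __citations__ BibTeX entries of some objects.

    Walks the class hierarchy of each object, so that e.g. a Method inheriting
    from another Method also picks up its parent's citations.
    """
    entries = []
    for obj in objects:
        for cls in type(obj).__mro__:
            entries.extend(getattr(cls, "__citations__", []))
    # De-duplicate while preserving order, then join into one .bib-style string.
    return "\n\n".join(dict.fromkeys(entries))

# Usage sketch:
# with open("citations.bib", "w") as f:
#     f.write(gather_citations(setting, method))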

Make the BaselineMethod truly applicable on RL Settings

The baseline method isn't currently applicable to "off-policy"-biased RL settings like CartPole, because in that environment the reward is always 1, while our model tries to predict the reward (using a Critic) and choose the action to take (with an Actor).

What I'm thinking of doing is adding something like the Intrinsic Curiosity Unit (Forward and Inverse models) from Curiosity-driven Exploration by Self-supervised Prediction as an Auxiliary task of some sort, and then we would be able to use the Forward model to perform rollouts and get a "Return", rather than being limited to observing/predicting the immediate reward.

MonsterKong Environment

We would like to add support for the MonsterKong environment to the repo, following the code from https://github.com/mattriemer/monsterkong_examples. See the following Google doc for thoughts on different tracks and evaluation procedures: https://docs.google.com/document/d/16M5J8DIfRjNwuthm8q_k-eUFpJ9tq7ZdnMtoMpPVT7c/edit?usp=sharing.

A single environment for training an agent is made from a combination of a map and texture specification. Example maps can be found in https://github.com/mattriemer/monsterkong_examples/tree/master/envs/meta_monsterkong/monsterkong_randomensemble/maps20x20 and example “textures” can be found in https://github.com/mattriemer/monsterkong_examples/tree/master/envs/meta_monsterkong/monsterkong_randomensemble/assets.

We need to match the tasks considered to the difficulty and hardware requirements we desire for the competition. An easy thing to try first would be to move the goal location (originally specified as the "princess") while keeping the rest of the map constant. Additionally, a curriculum can be designed where the goal is gradually moved further and further from the agent's initial location, progressing through subgoals toward a more challenging final task where the agent has to travel a significant distance.

If we have extra time, we would also like to play with changing textures to test generalization within a single map configuration (i.e. task). Additionally, changing the map configuration itself could also be an interesting direction that makes the game more challenging.

Add the Continual RL Branch to the tree

Add four different settings for Continual RL:

  1. Continual RL: Task changes smoothly over time
  2. "Class-Incremental RL": Task changes suddenly at given steps. Task labels given at train time.
  3. "Task-Incremental RL": Task labels are given at both train and test time
  4. "IID/Standard" RL: Only one task.

"proper" logging to wandb

Need to make sure that the components of the repo that explicitly log to wandb (the BaselineMethod), as well as the components that could be logged to wandb (the TestEnvironments / gym Monitors, the Results objects, etc.), generate a clean, crisp, minimal summary of the performance of a given method, one that is ideally already clearly understandable on wandb without needing to customize or filter through a bunch of stuff.

There are basically two ways I know of to use wandb:

  1. Log everything, and clean it up later
  2. Keep it simple, only log what you need.

While I'd naturally be inclined to opt for the first option, I think it might be best to stick to something like the second option, unless something like the --verbose flag is passed, in which case we could just log everything.

I don't know all the tips and tricks of wandb yet, so I'd welcome any suggestions of things / plots / stuff we could generate "for free" for the users of the repo.

BaselineModel doesn't have any shared weights (no encoder) in cartpole-state

Currently, if the observations aren't pixels, (Setting has observe_state_directly=True), then the BaselineModel doesn't use an encoder, and the inputs are passed directly to the output head. As a result, there is no forgetting in multi-head RL (i.e. when applying the BaselineMethod to the TaskIncrementalRLSetting).

Need to add a dense encoder and shorten the output heads back to being only a single dense layer.
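
Something along these lines, where the hidden sizes are placeholders (CartPole's state is 4-dimensional and it has 2 actions):

from torch import nn

# Small dense encoder for state-based inputs, so that each output head can go
# back to being a single dense layer. Hidden sizes are placeholders.
dense_encoder = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
)
output_head = nn.Linear(64, 2)  # one dense layer per output head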

possible bug in ewc_in_rl

Possible bug: in ewc_in_rl.py, even though I set max_steps=100 (line 303), it still runs for many more steps.

IID RL should sample new tasks after each episode

Currently, the "IID" RL setting (RLSetting) creates a task schedule with only one task.
Would be interesting to allow an 'iid' distribution of tasks (as in supervised learning), rather than just having one task!

TODOs:

  • Sample a new task on each reset (possibly from the values of the task schedule dict; see the sketch below)
  • (Not sure): Change the test loop, so that it doesn't do one task at a time as in task-incremental?
  • Change the Results object so it shows the performance overall, rather than on each task?
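
A minimal sketch of the first item, under the assumption that a "task" is a dict of environment attributes (as in the gravity/length schedules used for CartPole); this is illustrative only, not Sequoia's code:

import random
import gym

class IIDTaskWrapper(gym.Wrapper):
    """Samples a new task from the task schedule's values on every reset."""

    def __init__(self, env, task_schedule: dict):
        super().__init__(env)
        self.tasks = list(task_schedule.values())

    def reset(self, **kwargs):
        task = random.choice(self.tasks)
        # Assumes a task is a dict of attributes to set on the unwrapped env,
        # e.g. {"gravity": 5.0, "length": 0.8} for CartPole.
        for attribute, value in task.items():
            setattr(self.env.unwrapped, attribute, value)
        return self.env.reset(**kwargs)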

More General CL Framework

Right now, to perform e.g., class-incremental learning, the user has to choose from a hardcoded list of benchmarks. It would be more interesting if, instead, the user provided a list of {X,Y} pairs of samples, where X could be images and Y could be attributes. For instance, for Synbols X would be an image and Y would contain information about rotation, character, font, etc. Then, Sequoia would create tasks from that list. For instance:

class SequoiaDataset:
    def __init__(self, x, y):
        # Right now I don't know what else the base class should do.
        self.x = x
        self.y = y

class TaskIncrementalDataset(SequoiaDataset):
    def __init__(self, x, y, attribute_name=None):
        super().__init__(x, y if attribute_name is None else y[attribute_name])
        self.create_task_partitions()

    def create_task_partitions(self):
        # Partition y into multiple tasks.
        ...

    def __getitem__(self, item):
        # Get item from task.
        ...

class DomainIncrementalDataset(SequoiaDataset):
    def __init__(self, x, y, attribute_class_name, attribute_domain_name):
        super().__init__(x, y)
        # Use attribute_class_name and attribute_domain_name to partition by domain.
        self.create_task_partitions(attribute_class_name, attribute_domain_name)

    def create_task_partitions(self, attribute_class_name, attribute_domain_name):
        # Partition y into multiple domains, using attribute_class_name and
        # attribute_domain_name to split by domain.
        ...

    def __getitem__(self, item):
        # Get item from task.
        ...

monitoring: Add average performance over time

What I mean is, at task t, we should monitor the performance on tasks 1:t.

In the end, you end up with a lower triangular matrix of results.

This should be done on the validation set for SL and on the training env for RL (maybe with a different random seed).

Installing other repos (e.g. nngeometry) by git url as dependencies through `pip install -e .`

I run into a problem when trying to install nngeometry and the gym fork I've created by running pip install -r requirements.txt.

I guess it might be intentional (in order to prevent people from installing stuff that isn't on PyPI or a submodule of the repo), but it seems very difficult to add a requirement for a package from a git URL or a path, either through requirements.txt or through "extras_require" of setup.py.
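
For what it's worth, PEP 508 "direct references" do make this possible in setup.py, although PyPI rejects uploads whose dependencies use direct URLs, which may be part of why it feels awkward. A sketch, with a placeholder URL and branch:

from setuptools import setup

setup(
    name="example-with-git-dependency",
    version="0.0.1",
    install_requires=[
        # PEP 508 direct reference; the repo URL and branch are placeholders.
        "nngeometry @ git+https://github.com/<user>/nngeometry.git@<branch>",
    ],
)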

Improved command-line API

Need to design and implement a better command-line API, with subcommands, etc.

Hey @oleksost , @optimass , @prlz77, @pclucas14, please let me know what you think:

Users should be able to easily do these from the command-line

  • Evaluate a single Method on a single Setting
  • Evaluate a single method on a "benchmark" (pre-configured Setting or multiple settings?)
  • Evaluate a single Method on all its applicable Settings
  • Evaluate a single Method on a single Setting, but multiple datasets?
  • Evaluate all applicable Methods for a given Setting

Here are some ideas:

  • sequoia list: lists out all the available settings / methods
  • sequoia help <some_method_or_setting>: prints out available command-line arguments / class docstring / some useful documentation about a Method or Setting
  • sequoia --benchmark incremental_cartpole --method baseline ?
  • sequoia --setting task_incremental --method baseline ? or sequoia task_incremental_rl baseline?
  • sequoia --setting task_incremental --method ALL ?
  • sequoia --method baseline --setting ALL ?

Please chip in if you have any other ideas, the goal is for this to be simple, intuitive, and flexible.

"ClassIncrementalSetting" isn't really Class-Incremental, more like Domain-Incremental

It seems I am not immune to confusion w.r.t naming of CL settings! :)

Turns out, what I have implemented in the repo as the ClassIncrementalSetting is actually more related to Domain Incremental Learning than it is to Class-Incremental Learning as it is described in the iCaRL paper:

"Formally, we demand the following three properties of an algorithm to qualify
as class-incremental:
i) it should be trainable from a stream of data in which examples of
different classes occur at different times
ii) it should at any time provide a competitive multi-class classifier for
the classes observed so far,
iii) its computational requirements and memory footprint should remain
bounded, or at least grow very slowly, with respect to the number of classes
seen so far."

Currently, the ClassIncrementalSetting (with dataset="mnist" for example) creates 5 binary classification tasks and provides task labels to the Method at training time, but not at test time. Methods could be either single-head or multi-head if they want, but the problem remains the same: they need to be able to determine which task they are currently in at test time.

Potential TODOs:

  • Rename ClassIncrementalSetting to something like DomainIncrementalSetting ? (@optimass other name suggestions are welcome!)

  • Add a "true" Class-Incremental Setting, where:

    • there are no task labels, at training or at test time
    • classes are introduced progressively (without relabelling)
    • the total number of classes is unknown to the Method (might be hard to enforce!)
  • Decide where to place this Class-Incremental Setting in relation to Domain and Task-Incremental in the tree
