allenai / allenact Goto Github PK

View Code? Open in Web Editor NEW

296.0 12.0 46.0 25.65 MB

An open source framework for research in Embodied-AI from AI2.

Home Page: https://www.allenact.org

License: Other

Python 99.66% Shell 0.31% HTML 0.03%

reinforcement-learning embodied-agent ai research python deep-learning ai2 computer-vision

allenact's People

Contributors

Stargazers

Watchers

Forkers

drschwenk mattdeitke admariner codeaudit abdulrahim-a prithv1 andrlima klemenkotar architectureofthings jvrana colinmatthewgeorge87 marlohmann kolbytn tzadouri gianscarpe qiwu57kevin ehsanik jws39 trendingtechnology elliotthwang gracefulman x796c16 stonegiggity chandanpanda mengdi-li twni2016 kshitijd20 zcczhang running-mars brandontrabucco checkjunghyeon sjingwen xiaobaishu0097 microsoft-fevieira ram81 shijiel2 uyeongkim ngys321 lgtm-migrator hyunsulee wzdthu dtch1997 surper518 richienb mister-jp

allenact's Issues

Multi-agent furniture moving/lifting support

Problem

We would like to support the multi-agent FurnMove and FurnLift tasks we have introduced in prior work.

Solution

We will need to implement the Task, TaskSampler, and models from https://github.com/allenai/cordial-sync.

Dependencies

This should only be undertaken after multi-agent support is in (#109).

Errors when running the BabyAI baseline

I got this error when running the babyAI baseline

CUDA_VISIBLE_DEVICES=0 python ddmain.py go_to_local.ppo --experiment_base projects/babyai_baselines/experiments

08/11 15:58:26: INFO: Running with args Namespace(checkpoint=None, deterministic_cudnn=False, experiment='go_to_local.ppo', experiment_base='projects/babyai_baselines/experiments', extra_tag='', gp=None, max_sampler_processes_per_worker=None, output_dir='experiment_output', restart_pipeline=False, seed=None, skip_checkpoints=0, test_date=None)	[ddmain.py: 175]
08/11 15:58:26: INFO: Git diff saved to experiment_output/used_configs/BabyAIGoToLocalPPO/2020-08-11_15-58-26[runner.py: 436]
08/11 15:58:26: INFO: Config files saved to experiment_output/used_configs/BabyAIGoToLocalPPO/2020-08-11_15-58-26	[runner.py: 453]
08/11 15:58:26: INFO: Using 1 train workers on devices [device(type='cpu')]	[runner.py: 138]
08/11 15:58:26: INFO: Started 1 train processes	[runner.py: 268]
08/11 15:58:26: INFO: No processes allocated to validation, no validation will be run.	[runner.py: 297]
08/11 15:58:28: INFO: train 0 args {'experiment_name': 'BabyAIGoToLocalPPO', 'config': <experiments.go_to_local.ppo.PPOBabyAIGoToLocalExperimentConfig object at 0x7f16e44a5da0>, 'results_queue': <multiprocessing.queues.Queue object at 0x7f16e44a57b8>, 'checkpoints_queue': None, 'checkpoints_dir': 'experiment_output/checkpoints/BabyAIGoToLocalPPO/2020-08-11_15-58-26', 'seed': None, 'deterministic_cudnn': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f16e4263390>, 'num_workers': 1, 'device': device(type='cpu'), 'distributed_port': 0, 'distributed_barrier': None, 'max_sampler_processes_per_worker': None, 'mode': 'train', 'worker_id': 0}	[runner.py: 185]
08/11 15:58:30: ERROR: Encountered Exception. Terminating train worker 0	[light_engine.py: 1123]
08/11 15:58:30: ERROR: Traceback (most recent call last):
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/light_engine.py", line 1109, in train
    else typing.cast(ActorCriticModel, self.actor_critic.module),
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/light_engine.py", line 972, in run_pipeline
    self.initialize_rollouts(rollouts)
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/light_engine.py", line 402, in initialize_rollouts
    observations = self.vector_tasks.get_observations()
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/light_engine.py", line 235, in vector_tasks
    max_processes=self.max_sampler_processes_per_worker,
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 154, in __init__
    args_list for args_list in self._partition_to_processes(sampler_fn_args)
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 325, in _spawn_workers
    *[self._mp_ctx.Pipe(duplex=True) for _ in range(self._num_processes)]
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 325, in <listcomp>
    *[self._mp_ctx.Pipe(duplex=True) for _ in range(self._num_processes)]
  File "/usr/lib/python3.6/multiprocessing/context.py", line 62, in Pipe
    return Pipe(duplex)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 506, in Pipe
    s1, s2 = socket.socketpair()
  File "/usr/lib/python3.6/socket.py", line 488, in socketpair
    a, b = _socket.socketpair(family, type, proto)
OSError: [Errno 24] Too many open files
	[light_engine.py: 1126]
Traceback (most recent call last):
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/light_engine.py", line 1109, in train
    else typing.cast(ActorCriticModel, self.actor_critic.module),
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/light_engine.py", line 972, in run_pipeline
    self.initialize_rollouts(rollouts)
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/light_engine.py", line 402, in initialize_rollouts
    observations = self.vector_tasks.get_observations()
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/light_engine.py", line 235, in vector_tasks
    max_processes=self.max_sampler_processes_per_worker,
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 154, in __init__
    args_list for args_list in self._partition_to_processes(sampler_fn_args)
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 325, in _spawn_workers
    *[self._mp_ctx.Pipe(duplex=True) for _ in range(self._num_processes)]
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 325, in <listcomp>
    *[self._mp_ctx.Pipe(duplex=True) for _ in range(self._num_processes)]
  File "/usr/lib/python3.6/multiprocessing/context.py", line 62, in Pipe
    return Pipe(duplex)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 506, in Pipe
    s1, s2 = socket.socketpair()
  File "/usr/lib/python3.6/socket.py", line 488, in socketpair
    a, b = _socket.socketpair(family, type, proto)
OSError: [Errno 24] Too many open files
08/11 15:58:30: ERROR: Encountered Exception. Terminating runner	[runner.py: 715]
08/11 15:58:30: ERROR: Traceback (most recent call last):
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/runner.py", line 680, in log
    package[1] - 1
Exception: Train worker 0 abnormally terminated
	[runner.py: 716]
Traceback (most recent call last):
  File "/home/jiasen/Code/embodied-rl/core/algorithms/onpolicy_sync/runner.py", line 680, in log
    package[1] - 1
Exception: Train worker 0 abnormally terminated
08/11 15:58:30: INFO: Joining train 0	[runner.py: 762]
08/11 15:58:30: INFO: Closed train 0	[runner.py: 762]

Off-policy training

Problem

The ADVISOR code-base has support for interleaving off-policy updates (from an arbitrary pytorch dataset and with arbitrary losses) with on-policy updates. It would be great to have similar capabilities here. In particular, we should be able to:

Define a pipeline stage so that it performs some fixed number of off-policy updates.
Allow for off-policy updates to be interleaved with on-policy updates.

Solution

This requires:

A way to specify how this training will occur (e.g. defining off-policy losses and updating the pipeline stage to allow for these types of losses + dataset).
Updating the runner/light_engine to implement these types of updates.

Possible issues:

Currently we log based on the number of rollout steps. This will not be reasonable if we allow a stage to do purely off-policy updates.

Dependencies

None

Tutorial: Warm-starting with Imitation Learning

Problem

We would like a tutorial which demonstrates how to use multiple PipelineStages. To this end lets make a tutorial showing how to warm start with imitation learning. As @unnat as already implemented such experiment configurations for the LC Corrupt S15N7 task (and these are in the paper) we should probably just use these.

Solution

This requires:

Merging the relevant ADVISOR configuration.
Writing the tutorial.

Possible issues:

The ADVISOR configurations make some about how gin-config works that are no longer true after this update. Namely, since the light_engine is initialized on a separate process (created by "forkserver") any parameters set by gin-config will not propagate to this newly spawned process. To get around this we'll have to change how we're using gin-config a bit. Ask me about this.

Dependencies

None

Update package version to 0.1 in setup.py and create release.

Tutorial: Using an Arbitrary OpenAI Gym Environment

Problem

We would like a tutorial showing, from start to finish, how an arbitrary OpenAI Gym environment can be used with our framework.

Solution

This requires:

Creating generic GymTask and GymTaskSampler classes which just wrap a given gym environment (or gym environment string identifier) for our use.
Writing the tutorial using the above.

Possible issues:

I'm not entirely sure how we should handle Sensors in this case. Since the Gym just returns observations itself we could just have a GymSensor that wraps these?

Dependencies

None

Simply running babyai example after installation (Ubuntu 20.04)

Problem

After installing AllenAct with either pip or pipenv, simply running the basic babyai experiment

python main.py minigrid_tutorial -b projects/tutorials -m 8 -o minigrid_output -s 12345

fails with the following output details:

10/08 11:23:49 INFO: Running with args Namespace(checkpoint=None, deterministic_agents=False, deterministic_cudnn=False, experiment='minigrid_tutorial', experiment_base='projects/tutorials', extra_tag='', gp=None, log_level='info', max_sampler_processes_per_worker=8, output_dir='minigrid_output', restart_pipeline=False, seed=12345, skip_checkpoints=0, test_date=None)     [main.py: 242]
10/08 11:23:49 ERROR: Uncaught exception:       [system.py: 137]
Traceback (most recent call last):
  File "main.py", line 205, in load_config
    module = importlib.import_module(module_path)
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "./projects/tutorials/minigrid_tutorial.py", line 12, in <module>
    from plugins.minigrid_plugin.minigrid_sensors import EgocentricMiniGridSensor
  File "./plugins/minigrid_plugin/minigrid_sensors.py", line 8, in <module>
    from babyai.utils.format import InstructionsPreprocessor
ModuleNotFoundError: No module named 'babyai'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "main.py", line 282, in <module>
    main()
  File "main.py", line 246, in main
    cfg, srcs = load_config(args)
  File "main.py", line 219, in load_config
    ) from e
ModuleNotFoundError: Could not import experiment 'projects.tutorials.minigrid_tutorial', are you sure this is the right path? Possibly relevant files include ['./allenact/projects/tutorials/minigrid_tutorial.py'].

Desktop

Please add the following information:

OS: Ubuntu 20.04
AllenAct Version: current HEAD of master

Additional context

Perhaps it's a paths issue, but I'm not sure of an immediate fix.

Matt :)

Possible bug in how max-tasks is being set in objectnav dataset task samplers

Problem

I think the max-tasks parameter is being set incorrectly in the ObjectNavDatasetTaskSampler definition. Here's the line:

allenact/plugins/robothor_plugin/robothor_task_samplers.py

Line 338 in ee736e6

self.max_tasks = sum(

Right now, validation (I think) and test evaluation runs only over a subset of all the provided episode configs -- 240 to be precise on val. This is happening since
self.max_tasks = sum(len(scene_episodes) for scene_episodes in self.episodes)
only considers 16 episodes (length of the keys in self.episodes) per-scene.

Steps to reproduce

The metric json stored after running evaluation with an objectnav checkpoint trained for a few steps contains fine-grained results for only 240 episodes.

Expected behavior

I think self.max_tasks = sum(len(scene_episodes) for scene_episodes in self.episodes) should be replaced with self.max_tasks = sum(len(self.episodes[scene]) for scene in self.episodes) akin to

allenact/plugins/robothor_plugin/robothor_task_samplers.py

Line 767 in ee736e6

self.max_tasks = sum(len(self.episodes[scene]) for scene in self.episodes)

Allow specification of machine params from the command line

Problem

Currently it can be a bit of struggle to specify machine parameters as you have to manually change parameters in the configuration.

Solution

Let's have a command-line means by which to change machine parameters. Some things that would be nice to be able to change:

Workers and their devices
Samplers and their devices

One possible problem is that we might need a way to change the above for training/validation/testing separately. This might be a little messy.

Dependencies

None.

Add histogram of actions as a metric

Very useful to track progress and understand if the agent gets stuck in some degenerate solution space.

Freeze class-level experiment attributes

Problem

Currently there is nothing preventing a user from changing the values of class-level attributes of ExperimentConfigs at runtime. This was ok in the old implementation but now, as each light_engine is spawned using "forkserver", we run into a problem, the code:

import torch.multiprocessing as mp
import time

class BaseBabyAIGoToObjExperimentConfig:
    MY_GREAT_VARIABLE = 3

def my_func(config):
    print(config.MY_GREAT_VARIABLE)

if __name__ == "__main__":
    mp = mp.get_context("forkserver") # Broken
    # mp = mp.get_context("fork") # Works
    BaseBabyAIGoToObjExperimentConfig.MY_GREAT_VARIABLE = 5
    cfg = BaseBabyAIGoToObjExperimentConfig()
    p = mp.Process(target=my_func, kwargs=dict(config=cfg))
    print("main", cfg.MY_GREAT_VARIABLE)
    p.start()
    p.join()

will print 5 and then 3 when mp = mp.get_context("forkserver") but 5 then 5 when mp = mp.get_context("fork"). This means that a user might change the value of a class-level attribute before running training but this change will not propagate to the training process. We seem to need "forkserver" for some CUDA reasons (?) so it's probably best to:

Disallow the user from changing class-level variables, or
Detect any such change and throw an error if the runner is called with such a, modified, config.

Solution

This requires:

Having some means by which to stop changes to class-level variables. One approach that goes part of the way there is the following pattern:

class FrozenClassVariables(type):
    def __setattr__(cls, attr, value):
        raise RuntimeError("Cannot edit class-level attributes.")

class SomeClass(object, metaclass=FrozenClassVariables):
    yar = 3
    

if __name__ == "__main__":
    
    try:
        SomeClass.yar = 6 # Error
    except Exception as _:
        print("Threw exception")
    
    SomeClass().bar = 12 # No error

I'm not sure how to make ExperimentConfigs automatically have this metaclass but I presume it's possible.

Dependencies

None

Runner doesn't terminate

e.g. python main.py go_to_local.distributed_pure_offpolicy --experiment_base projects/babyai_baselines/experiments -m 8

Testing is surprisingly slow

Problem

Running testing is surprisingly slow: often as slow or slower (in fps) than training.

Solution

While I am not certain, I believe this has to do with redundant/unnecessary copying of tensors around in the rollout storage. This will require some debugging and profiling to get to the root problem.

Dependencies

None

Incorrect import for resnet based actor-critic model

The following line https://github.com/allenai/embodied-rl/blob/c299cc9e8a1075c801f7e7b4d9c21264c401f71d/experiments/objectnav_robothor_rgb_ddppo.py#L14 importing the ResNet backbone based actor critic model should be changed to
from models.resnet_tensor_object_nav_models import ResnetTensorObjectNavActorCritic

Train and test in ALFRED

Integrate all ADVISOR code

Problem

We should integrate the remaining ADVISOR experiments (@unnat).

Solution

This requires:

This is relatively straightforward but we'll need to adjust the hyperparameter searching code, and our experimental configs, to account for the issues presented in #116 which make using gin-config more difficult.

Dependencies

Issues #116, #113, #112, #111

Environment for honey bee foraging and dance

Problem

I'm interested in porting a simulation of honey bee foraging behavior, including their unique dance that informs the colony of the whereabouts of nectar, to the allenact platform.

Desired solution

The current simulation is in a grid world, so it would probably be a variation of the minigrid world. There would be a variable number of nectar-containing flowers in the world, each uniquely identified, along with a central hive. A run would consist of randomly placing a bee on a flower with a random orientation. This simulates arriving from a foraging mission. The bee would extract the flower's nectar and observe if there is surplus nectar in the flower that the colony should be made aware of (through the bee dance). Knowing the flower id, the bee then flies to the hive where is deposits the nectar. If surplus nectar was observed, it then "dances" to indicate the direction and distance to the flower (or maybe just the id of the flower).

Fix imitation loss

Broken after #141 due to hardcoded tensor shapes.

Rename code and docs to use AllenAct

Tutorial: Creating a new loss (e.g. curiosity-based-exploration?)

Problem

We would like a tutorial on creating new losses. Perhaps this loss could be of the curiosity-based-exploration variety (e.g. forward/inverse dynamics prediction)?

Solution

This requires:

Writing the tutorial and defining the new losses.

Dependencies

None

PILLOW_VERSION depracated in 7.0.0

pipenv run python main.py object_nav_thor_ppo gives an error:

  File "/Users/unnatj/.local/share/virtualenvs/embodied-rl-2Q4uo4QS/lib/python3.6/site-packages/torchvision/transforms/functional.py", line 5, in <module>
    from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
ImportError: cannot import name 'PILLOW_VERSION'

This suggests to either downgrade to pillow==6.1 or to change the torchvision source code. The former edit does work.

Add docstrings to all classes

Problem

Some of our classes do not have docstrings. But we refer to them within the documentation, assuming that they will be documented within the API folder. This leads to dead links.

Solution

Go through all the classes in the repository.
Add meaningful docstrings to the important ones.
Add placeholder docstrings to the less important ones.

Dependencies

None

Tutorial: running inference on the habitat 75M and RoboTHOR 200M models.

Problem

We would like a tutorial on running inference with the above models.

Solution

This requires:

Writing the tutorial.
Writing the code to be able to download the appropriate model weights.

Dependencies

None

Improve inline documentation for primary abstractions

Problem

All of our primary abstractions should have good documentation and be fully typed.

Solution

Go through all of these critical files and ensure that:

All classes/methods/modules have doc strings and these doc strings enumerate all the parameters if relevant.
All (except in rare cases where doing so would be highly unpractical) parameters are typed.

Dependencies

None

RoboTHOR PointNav SuperSmall dataset

Create a subset of the RoboTHOR dataset, that trains to a high success value in a few minutes on CPU.
This dataset will be useful for debugging.

Objectnav tutorial

Or PointNav to ObjectNav

Unflatten tensors in RolloutStorage

Problem

In the rollout storage, we currently flatten tensors along some dimensions (combining the rollout index dim and the time dim into one). This is awkward and means that every actor critic model needs to remember this arbitrary ordering.

Solution

Let's stop flattening and fix existing models to expect these unflattened tensors (fixing the RNNStateEncoder should go a long way towards this).

debug option for objectnav

It would be great if objectnav also had a debug dataset like pointnav does.

generally, I wonder if contributors should be encouraged to implement a debug option if writing a new task. If the task's debug mode is on, only simple, hand-picked, instantiations of the task are sampled for training, and testing is on the same scenes (or very similar scenes, for which generalization should be much easier than in the full setting).

Remove Distance Caching for RoboTHOR and iTHOR Navigation Tasks and Rely Solely on the Built in Navmeshes

Problem

Currently AllenAct is using pre-computed distance caches as the default way of finding the distance between any point in the scene and the target. It has been shown that these caches offer little in the way of FPS improvements and makes the model train slower (in terms of number of steps). They are also tied to a specific release of ai2thor and awkward to use.

Desired solution

iTHOR has now built in navmeshes for local path finding, and RoboTHOR has had this feature for some time. We can now use these native Unity methods to efficiently find a good distance to the target that can be used in the reward function and the SPL calculation.

What This Requires

Updating the task sampler and task code for all the navigation tasks
Removing the distance caches from the online dataset files
Re-Generating the PointNav and ObjectNav datasets for iTHOR and RoboTHOR using the new distance + path finding function

Viz support for multi-agent

For multi-agent viz It would make more sense to name the current "groups" by an episode id and include visualizations for each agent in the episode.

Disable saving (tensorboard) log files

Problem

We should be able to disable tensorboard logging from the command line. We also need the option to skip other type of logging (e.g. saving to the used_configs folder).

Solution

Hopefully this is as easy as passing a few additional parameters to the runner.

Dependencies

None

Logging losses on the validation / test split

Problem

I think right now logging losses (PPO or otherwise) as defined under core/algorithms/onpolicy_sync/losses/ is only supported on the train split. If I understand correctly, right now on the train split, it's possible to log losses (and other quantities defined in the loss definitions -- policy entropy, etc.) as well as metrics (and rewards) defined in the task definitions (success, SPL, etc. under plugins/robothor_plugin/robothor_tasks.py for instance). However, on val, only logging the latter is supported. While logging loss values (and other quantities defined in loss definitions) on val may not be super-useful in case of PPO, but may be worthwile for debugging experiments that have additional custom defined losses (action-prediction, etc.).

Desired solution

To be able to observe loss (and other quantities defined under loss definitions) on the val split in the tensorboard logs.

Tutorial: Mastering decentralized distributed training

Problem

We should have a tutorial that discusses how the OnPolicyRunner and OnPolicyRLEngine work together.

Solution

This requires:

Writing the tutorial.

Dependencies

None

Allow specifying the use of determistic validation/testing agents from the command line.

Problem

Currently we have the functionality to allow for deterministic agents (i.e. always taking the mode action) during validation/testing but there is no way to specify this from the command line.

Solution

Lets add a command line argument for this. I think we should make deterministic inference off by default.

Dependencies

None.

Control over logging level

Problem

There should be a command line argument allowing us to control the logging level (i.e. DEBUG, ERROR, etc). This is especially necessary when running many experiments at the same time (e.g. in hyperparameter search) in which training outputs is mostly useless.

Solution

This requires:

A better logging strategy that propagates the settings to forked processes.

Dependencies

None

Projects for objectnav and pointnav baselines need improvement

If we are reproducing results from a paper, we should cite the paper.
If we are providing a baseline model, ideally we need a simple model diagram for the experiment. So that the reader can understand more details about what this baseline is.
In addition, we need to provide metrics via tensorboard screenshots for all runs in the project

Incorrect link to experiments doc in README

The link to the doc run your first experiment seems incorrect in the README. The path to the markdown file needs to be changed from ./overview/running-your-first-experiment.md to ./docs/overview/running-your-first-experiment.md

Documenting all the command line arguments

Problem

[Creating an issue here following Luca's suggestion]
It might be beneficial for users if all the command line arguments in the main.py script were documented in the AllenAct docs. I ran into an issue (fixed now) where I accidentally started training from the latest checkpoint with the restart_pipeline flag for a run that had crashed prematurely -- my goal was to resume training as opposed starting an entirely new training run from a pre-trained checkpoint. Clarifying when to use different command line arguments (beyond the ones which are already documented) will be helpful.

Desired solution

Explicitly documenting all the command line arguments.

Possible bug in how recurrent models are storing their hidden state

I think (but am not 100% sure) that there's a bug here:

https://github.com/allenai/embodied-rl/blob/eb330eff7187e9b1beb77c3d2a67a6370211b822/onpolicy_sync/storage.py#L199
self.recurrent_states is shape [T, num_layers, num_processors, h].

If my understanding is correct, then this line seems to take the cached recurrent state at timestep 0 only, which later gets used to do a training update. though I could be wrong 😅

Tutorial: Arbitrary Memory Types

Problem

We would like a tutorial showing how to use custom memory types, as well as multiple losses.

Solution

This requires:

Creating a Memory Map class, an ANM model and a supervised loss to train it
Writing a tutorial on how to implement a custom memory type and use it

Option to allow changing / overriding experiment config parameters from command line arguments

Problem

In addition to worker_devices, it'd be great if there was an option to modify / override other parameters in the experiment configs without rewriting a new experiment config. Handling multiple configs for changes in learning rate, number of rollout steps, new datasets (new episode configs) or sensor specifications can become unweildy while running multiple controlled sweeps for instance.

Desired solution

Override experiment config parameters if specified in command line arguments and ignore if not specified. Monkey patching the experiment configs in the load_config() function in main.py is one possible solution but not sure if that's the right way to go about this.

Multi-agent support

Problem

We do not currently support multiple agents.

Solution

To support multiple agent's we'll need to:

Extend the Task abstraction (and also VectorSampledTasks) to:
- Accept multiple actions at once, e.g. the step should change from action: int to Union[int, Sequence[int]].
- Return a sequence of rewards and observations (one for each agent). I don't know if we'd rather return Sequence[RLStepResult] or, instead, update RLStepResult to return sequences of values when appropriate. I have a slight preference for the second variant (as it would make it easier in the future to return values that are common to all agents, e.g. a joint reward) but could be convinced otherwise.
Update the RolloutStorage class so that one dimension is dedicated to different agents.
Other changes to handle the above changes multiple in the light_engine

I'm sure I'm missing some additional places that will need to be updated.

Dependencies

This should be completed after #108.

Distributed training problems with more complicated losses

Problem

The DistributedDataParallel wrapper that we use to enable distributed training has some limitations in that it assumes all model parameters will be used when losses are computed and, otherwise, complains.

Solution

We should either find a way to fix this problem (e.g. using dummy losses which send 0-value gradients) or document this behavior very well so that people are not confused when their code fails to train.

Dependencies

None

Note

I do not have time to provide a minimal working example but am available to discuss how to reproduce this with anyone undertaking this issue.

Tutorial: Off-policy training

Problem

We would like a tutorial on training with off-policy. A good candidate for this seems to me the BabyAI GoToLocal imitation learning experiment we have.

Solution

This requires:

Writing the tutorial.

Dependencies

None

Rename "gpu_ids" parameter to "worker_devices".

Problem

Currently the devices workers use is set by a "gpu_ids" dictionary entry in the machine_params method of a config file. This is a bit strange as these devices can (e.g. for debugging) be cpus.

Solution

Let's rename "gpu_ids" to something more evocative (e.g. "worker_devices" or "distributed_devices").

Dependencies

None.

Warnings

This will be a breaking change.

Remove Gibson assets from the Docker container

New simplified directory structures

Problem

The codebase is currently a bit fragmented and not standardized.

Solution

Modify the current code base to reflect this structure (agreed upon by multiple team members):

Tutorial: Creating a new TaskSampler (for curriculum learning?)

Problem

We need a tutorial that guides someone through creating a more complicated task sampler. I would suggest creating a task sampler that does curriculum learning (e.g. PointNav but where you start near the goal at the beginning of training and only later see further and further initial starting positions).

Solution

This requires:

Writing the tutorial and creating the task sampler.

Dependencies

None

Properly resume training with offpolicy losses

Currently, a new epoch will be started when resuming training (new Iterator will be instantiated). We should save the random seed used to shuffle the datasets (for all workers?) and the length of the remaining data, besides enforcing a resume API for iterators.

Fix all mypy type errors

Problem

mypy currently generates a long list of type errors which need to be fixed. After this release, we should not merge PRs unless mypy returns no errors.

Solution

These just have to be dealt with one my one. In some, reasonably rare, cases mypy is wrong in which case we can use # type: ignore to ignore the error (this should be avoided whenever possible).

Dependencies

None

allenai / allenact Goto Github PK

allenact's People

Contributors

Stargazers

Watchers

Forkers

allenact's Issues

Problem

Solution

Dependencies

Problem

Solution

Dependencies

Problem

Solution

Dependencies

Problem

Solution

Dependencies

Problem

Desktop

Additional context

Problem

Steps to reproduce

Expected behavior

Problem

Solution

Samplers and their devices

Dependencies

Problem

Solution

Dependencies

Problem

Solution

Dependencies

Problem

Solution

Dependencies

Problem

Desired solution

Problem

Solution

Dependencies

Problem

Solution

Dependencies

Problem

Solution

Dependencies

Problem

Solution

Dependencies

Problem

Solution

Problem

Desired solution

What This Requires

Problem

Solution

Dependencies

Problem

Desired solution

Problem

Solution

Dependencies

Problem

Solution

Dependencies

Problem

Solution

Dependencies

Problem

Desired solution

Problem

We would like a tutorial showing how to use custom memory types, as well as multiple losses.

Solution

Problem

Desired solution

Problem

Solution