lb-foraging's Introduction

Level Based Foraging (LBF)

A multi-agent reinforcement learning environment

Caution

The LBF environment was updated to support the new Gymnasium interface, replacing the deprecated gym==0.21 dependency (many thanks @LukasSchaefer). For backwards compatibility, please see the Gymnasium compatibility documentation or use version v1.1.1 of the repository. The main changes to the interface are as follows:

  • obss = env.reset() --> obss, info = env.reset()
  • obss, rewards, dones, info = env.step(actions) --> obss, rewards, done, truncated, info = env.step(actions)
  • The done flag is now given as a single boolean value instead of a list of booleans.
  • You can give the reset function a particular seed with obss, info = env.reset(seed=42) to initialise a particular episode.
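Putting these changes together, a minimal episode loop under the new interface might look like this (a sketch; the environment id follows the template described under Usage below):

import gymnasium as gym
import lbforaging  # registers the Foraging-* environments on import

env = gym.make("Foraging-8x8-2p-1f-v3")
obss, info = env.reset(seed=42)

done = truncated = False
while not (done or truncated):
    actions = env.action_space.sample()  # one action per agent
    obss, rewards, done, truncated, info = env.step(actions)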

About The Project

This environment is a mixed cooperative-competitive game, which focuses on the coordination of the agents involved. Agents navigate a grid world and collect food by cooperating with other agents if needed.

Level Based Foraging (LBF) illustration

More specifically, agents are placed in the grid world, and each is assigned a level. Food is also randomly scattered, each item with a level of its own. Agents can navigate the environment and attempt to collect food placed next to them. A collection attempt succeeds only if the sum of the levels of the agents involved in loading is equal to or higher than the level of the food. Finally, agents are awarded points for the food they help collect, weighted by their contribution (their level; see the Rewards section below). The figures below show two states of the game: one that requires cooperation, and one that is more competitive.

While it may appear simple, this is a very challenging environment, requiring cooperation among multiple agents while remaining competitive at the same time. In addition, the discount factor necessitates speed in order to maximise rewards. Each agent is only awarded points if it participates in the collection of food, and it has to balance between collecting low-level food on its own and cooperating to acquire higher rewards. In situations with three or more agents, highly strategic decisions can be required, with agents needing to choose whom to cooperate with. Another significant difficulty for RL algorithms is the sparsity of rewards, which causes slower learning.

This is a Python simulator for level-based foraging. It is based on OpenAI's gym framework, with modifications for the multi-agent domain. The efficient implementation allows for thousands of simulation steps per second on a single thread, while the rendering capabilities allow humans to visualise agent actions. Our implementation can support different grid sizes and agent/food counts. Game variants are also implemented, such as a cooperative mode (agents always need to cooperate) and shared rewards (all agents always get the same reward), which makes for an attractive credit-assignment problem.

Getting Started

Installation

Install using pip

pip install lbforaging

Or to ensure that you have the latest version:

git clone https://github.com/semitable/lb-foraging.git
cd lb-foraging
pip install -e .

Usage

Create environments with the Gymnasium framework. First import the packages:

import gymnasium as gym
import lbforaging

Then create an environment:

env = gym.make("Foraging-8x8-2p-1f-v3")

We offer a variety of environments using this template:

"Foraging-{GRID_SIZE}x{GRID_SIZE}-{PLAYER COUNT}p-{FOOD LOCATIONS}f{-coop IF COOPERATIVE MODE}-v0"

But you can register your own variation using (change parameters as needed):

from gymnasium.envs.registration import register

register(
    id="Foraging-{0}x{0}-{1}p-{2}f{3}-v3".format(s, p, f, "-coop" if c else ""),
    entry_point="lbforaging.foraging:ForagingEnv",
    kwargs={
        "players": p,
        "max_player_level": 3,
        "field_size": (s, s),
        "max_food": f,
        "sight": s,
        "max_episode_steps": 50,
        "force_coop": c,
    },
)

Similarly to Gym, but adapted to multi-agent settings, the step() function is defined as

nobs, nreward, done, truncated, ninfo = env.step(actions)

where nobs, nreward, and ninfo are LISTS of N items (N being the number of agents), and the i'th element of each list should be assigned to the i'th agent; done and truncated are single booleans, as described in the Caution section above.

Observation Space
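Judging from the observation-space construction quoted in the issues below (a reading of the code, not official documentation), each agent observes a flat vector with one triplet (two coordinates and a level) per food location, followed by one such triplet per player, padded with (-1, -1, 0) for entries the agent cannot see. A grid-based observation mode (grid_observation) also appears in the registration kwargs.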

Action space

actions is a LIST of N INTEGERS (one for each agent) that should be executed in that step. Each integer should correspond to the Enum below:

class Action(Enum):
    NONE = 0
    NORTH = 1
    SOUTH = 2
    WEST = 3
    EAST = 4
    LOAD = 5

Valid actions can always be sampled like in a gym environment, using:

env.action_space.sample() # [2, 3, 0, 1]

Also, ALL actions are valid at every step. If an agent cannot move to a location or cannot load, its action will be replaced with NONE automatically.

Rewards

The rewards are calculated as follows. When one or more agents load a food item, the food's level is distributed among the participating agents, weighted by the level of each agent. The reward is then normalised so that, if all foods are collected, the rewards over the episode sum to one. If you prefer code:

for a in adj_players: # the players that participated in loading the food
    a.reward = float(a.level * food) # higher-leveled agents contribute more and are rewarded more. 
    if self._normalize_reward:
        a.reward = a.reward / float(
            adj_player_level * self._food_spawned
        )  # normalize reward so that the final sum of rewards is one.
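For example (a worked instance of the code above, assuming normalisation is on): two agents of levels 1 and 2 load a level-3 food together, and that food is the only one spawned (so self._food_spawned, the total level of spawned food, is 3). Before normalisation the rewards are 1 × 3 = 3 and 2 × 3 = 6; dividing each by adj_player_level * self._food_spawned = 3 × 3 = 9 gives 1/3 and 2/3, which sum to one.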

Please Cite

  1. The paper that first uses this implementation of Level-based Foraging (LBF) and achieves state-of-the-art results:
@inproceedings{christianos2020shared,
  title={Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning},
  author={Christianos, Filippos and Schäfer, Lukas and Albrecht, Stefano V},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year={2020}
}
  2. A comparative evaluation of cooperative MARL algorithms, which includes an introduction to this environment:
@inproceedings{papoudakis2021benchmarking,
   title={Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks},
   author={Georgios Papoudakis and Filippos Christianos and Lukas Schäfer and Stefano V. Albrecht},
   booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS)},
   year={2021},
   openreview = {https://openreview.net/forum?id=cIrPX-Sn5n},
}

Contributing

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Contact

Filippos Christianos - [email protected]

Project Link: https://github.com/semitable/lb-foraging

lb-foraging's People

Contributors

goingmyway, lukasschaefer, semitable


lb-foraging's Issues

Import hangs with Gym >0.21

When installing lb-foraging with certain versions of gym, the import hangs for a (very) long time.

Consider a simple script, import_tests.py:

from time import time
start = time()
import lbforaging
print(f"time = {time() - start:.3f}s")

With gym==0.21.*:

$ python import_tests.py
time = 0.650s

vs. with gym==0.22.*:

$ python import_tests.py
time = 1188.446s

This massive bottleneck is caused by the way gym registers environments from v0.22 onwards. Consider the gym/envs/registration.py::EnvRegistry::register() method in v0.21:

https://github.com/openai/gym/blob/c755d5c35a25ab118746e2ba885894ff66fb8c43/gym/envs/registration.py#L205-L217

    def register(self, id, **kwargs):
        if self._ns is not None:
            if "/" in id:
                namespace, id = id.split("/")
                logger.warn(
                    f"Custom namespace '{namespace}' is being overrode by namespace '{self._ns}'. "
                    "If you are developing a plugin you shouldn't specify a namespace in `register` calls. "
                    "The namespace is specified through the entry point key."
                )
            id = f"{self._ns}/{id}"
        if id in self.env_specs:
            logger.warn("Overriding environment {}".format(id))
        self.env_specs[id] = EnvSpec(id, **kwargs)

See how this is different in v0.22:

https://github.com/openai/gym/blob/95063a08943e1f587c58be7435d94f81ccac8fd9/gym/envs/registration.py#L542-L596

    def register(self, id: str, **kwargs) -> None:
        spec = EnvSpec(id, **kwargs)


        if self._ns is not None:
            if spec.namespace is not None:
                logger.warn(
                    f"Custom namespace `{spec.namespace}` is being overridden "
                    f"by namespace `{self._ns}`. If you are developing a "
                    "plugin you shouldn't specify a namespace in `register` "
                    "calls. The namespace is specified through the "
                    "entry point package metadata."
                )
            # Replace namespace
            spec.namespace = self._ns


        try:
            # Get all versions of this spec.
            versions = self.env_specs.versions(spec.namespace, spec.name)


            # We raise an error if the user is attempting to initialize an
            # unversioned environment when a versioned one already exists.
            latest_versioned_spec = max(
                filter(lambda spec: isinstance(spec.version, int), versions),
                key=lambda spec: cast(int, spec.version),
                default=None,
            )
            unversioned_spec = next(
                filter(lambda spec: spec.version is None, versions), None
            )


            # Trying to register an unversioned spec when versioned spec exists
            if unversioned_spec and spec.version is not None:
                message = (
                    "Can't register the versioned environment "
                    f"`{spec.id}` when the unversioned environment "
                    f"`{unversioned_spec.id}` of the same name already exists."
                )
                raise error.RegistrationError(message)
            elif latest_versioned_spec and spec.version is None:
                message = (
                    f"Can't register the unversioned environment `{spec.id}` "
                    f"when version `{latest_versioned_spec.version}` "
                    "of the same name already exists. Note: the default "
                    "behavior is that the `gym.make` with the unversioned "
                    "environment will return the latest versioned environment."
                )
                raise error.RegistrationError(message)
        # We might not find this namespace or name in which case
        # we should continue to register the environment.
        except (error.NamespaceNotFound, error.NameNotFound):
            pass
        finally:
            if spec.id in self.env_specs:
                logger.warn(f"Overriding environment {id}")
            self.env_specs[spec.id] = spec

Specifically, the issue here is the addition of:
https://github.com/openai/gym/blob/95063a08943e1f587c58be7435d94f81ccac8fd9/gym/envs/registration.py#L559

            versions = self.env_specs.versions(spec.namespace, spec.name)

which calls
https://github.com/openai/gym/blob/95063a08943e1f587c58be7435d94f81ccac8fd9/gym/envs/registration.py#L220

        self._assert_name_exists(namespace, name)

which eventually hits:
https://github.com/openai/gym/blob/95063a08943e1f587c58be7435d94f81ccac8fd9/gym/envs/registration.py#L293

            suggestions = difflib.get_close_matches(name, self.names(namespace), n=1)

Herein lies the problem: lbforaging attempts to register a large number of environments (9720 by default). With each new environment, gym checks whether any environment has already been registered with a similar name. Naturally, this fuzzy match scales really badly with a large number of environments being registered.

v0.23 of gym takes the same approach to registration as v0.22, and thus faces the same issue. In v0.24-v0.26, the registration no longer looks for a fuzzy match (with difflib), but there is still a bottleneck due to an iteration over all registered envs:
https://github.com/openai/gym/blob/dcd185843a62953e27c2d54dc8c2d647d604b635/gym/envs/registration.py#L379-L403

def _check_spec_register(spec: EnvSpec):
    """Checks whether the spec is valid to be registered. Helper function for `register`."""
    global registry
    latest_versioned_spec = max(
        (
            spec_
            for spec_ in registry.values()
            if spec_.namespace == spec.namespace
            and spec_.name == spec.name
            and spec_.version is not None
        ),
        key=lambda spec_: int(spec_.version),  # type: ignore
        default=None,
    )


    unversioned_spec = next(
        (
            spec_
            for spec_ in registry.values()
            if spec_.namespace == spec.namespace
            and spec_.name == spec.name
            and spec_.version is None
        ),
        None,
    )

thus still yielding bad performance (definitely not as bad as v0.22 & v0.23 though):

$ python import_tests.py
time = 8.575s

I'm not sure what the solution for lbforaging is here. Two options:

  • Advise users that gym v0.22 & v0.23 are incompatible.
  • Don't pre-register loads of environment configs; instead, register lazily, only when the user asks for a given set-up (see the sketch after the note below). I'm not sure if/why this isn't the preferred approach.

Note: I am aware that gym has officially moved to https://github.com/Farama-Foundation/Gymnasium, but the issue remains there with gymnasium v0.27:
https://github.com/Farama-Foundation/Gymnasium/blob/6f35e7f87fc5b455b8cc70e366016c463fa52850/gymnasium/envs/registration.py#L298-L321
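As a rough illustration of the lazy-registration option above (a sketch only; make_lbf is a hypothetical helper, not part of lbforaging's API, written against gymnasium, where the registry is a plain dict):

import re

import gymnasium as gym
from gymnasium.envs.registration import register

# Matches ids like "Foraging-8x8-2p-1f-v3" or "Foraging-8x8-2p-1f-coop-v3".
_ID = re.compile(r"Foraging-(\d+)x\1-(\d+)p-(\d+)f(-coop)?-v3")

def make_lbf(env_id, **kwargs):
    """Register the requested LBF configuration on demand, then build it."""
    if env_id not in gym.registry:
        m = _ID.fullmatch(env_id)
        if m is None:
            raise ValueError(f"{env_id!r} is not an LBF environment id")
        s, p, f, coop = int(m[1]), int(m[2]), int(m[3]), bool(m[4])
        register(
            id=env_id,
            entry_point="lbforaging.foraging:ForagingEnv",
            kwargs={
                "players": p,
                "max_player_level": 3,
                "field_size": (s, s),
                "max_food": f,
                "sight": s,
                "max_episode_steps": 50,
                "force_coop": coop,
            },
        )
    return gym.make(env_id, **kwargs)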

[Bug] Observation out of the low/high range of the obs space

Dear authors, thanks for developing such an awesome library for MARL research. I tried to run the env Foraging-4s-8x8-2p-2f-v2 and found that the observations seem to fall outside the low/high range of the observation space.

In RLlib, it will raise errors. Is it a severe issue? Can I clip the obs after getting the observations from env.step?

You can put the following before returning the nobs for checking.

for obs in nobs:
    assert self.observation_space[0].contains(obs)

Please correct me if I am wrong. Thanks in advance.
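As a stopgap for the clipping question above (a sketch, not an official fix), clipping each observation to its Box bounds after stepping would look like:

import numpy as np

nobs, nreward, ndone, ninfo = env.step(actions)
# LBF's observation_space is a Tuple of per-agent Box spaces, so clip each
# agent's observation against its own bounds.
nobs = [
    np.clip(obs, env.observation_space[i].low, env.observation_space[i].high)
    for i, obs in enumerate(nobs)
]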

How could I render the test result?

When I render the view of the experiment, I hit the following problem (a screenshot was attached to the original issue). Has anyone else met the same problem before? How could I render the test result?

types of agents

Hello, if I want to start the simulation with a q_agent or heuristic_agent, how do I do that?

Failed to import lbforaging

Hi,

I am trying to import the lbforaging module, both pip-installed and git-cloned, without success. The import never finishes and has to be interrupted:

(aasma) $ python lbforaging.py
^CTraceback (most recent call last):
  File "lbforaging.py", line 7, in <module>
    import lbforaging
  File "/home/gsavarela/Repos/lb-foraging/lbforaging/__init__.py", line 23, in <module>
    "grid_observation": grid_obs,
  File "/home/gsavarela/.local/share/.pyenv/versions/aasma/lib/python3.7/site-packages/gym/envs/registration.py", line 613, in register
    return registry.register(id, **kwargs)
  File "/home/gsavarela/.local/share/.pyenv/versions/aasma/lib/python3.7/site-packages/gym/envs/registration.py", line 559, in register
    versions = self.env_specs.versions(spec.namespace, spec.name)
  File "/home/gsavarela/.local/share/.pyenv/versions/aasma/lib/python3.7/site-packages/gym/envs/registration.py", line 220, in versions
    self._assert_name_exists(namespace, name)
  File "/home/gsavarela/.local/share/.pyenv/versions/aasma/lib/python3.7/site-packages/gym/envs/registration.py", line 293, in _assert_name_exists
    suggestions = difflib.get_close_matches(name, self.names(namespace), n=1)
  File "/home/gsavarela/.local/share/.pyenv/versions/3.7.5/lib/python3.7/difflib.py", line 726, in get_close_matches
    if s.real_quick_ratio() >= cutoff and \
  File "/home/gsavarela/.local/share/.pyenv/versions/3.7.5/lib/python3.7/difflib.py", line 686, in real_quick_ratio
    return _calculate_ratio(min(la, lb), la + lb)

The offending line 220 seems to be a failed assertion. Upon closer inspection I got namespace=None and name='CartPole':

ipdb> c 2020
> /home/gsavarela/.local/share/.pyenv/versions/aasma/lib/python3.7/site-packages/gym/envs/registration.py(221)versions()
    220         import ipdb; ipdb.set_trace()
--> 221         self._assert_name_exists(namespace, name)
    222

ipdb> namespace
ipdb> name
'CartPole'
ipdb>


My system is Arch Linux and my Python version is 3.7.5.

Could you point me in the right direction? Your project seems very promising. Thanks in advance.

Error when resetting environment

Hi. I just cloned the repo and tried to run the example in the readme file.

env = gym.make("Foraging-8x8-2p-1f-v2")
env.reset()

But then I got an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/camaral/miniconda3/envs/lbforaging/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 42, in reset
    return self.env.reset(**kwargs)
  File "/home/camaral/miniconda3/envs/lbforaging/lib/python3.7/site-packages/gym/wrappers/env_checker.py", line 45, in reset
    return env_reset_passive_checker(self.env, **kwargs)
  File "/home/camaral/miniconda3/envs/lbforaging/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 192, in env_reset_passive_checker
    result = env.reset(**kwargs)
  File "/home/camaral/code/lb-foraging/lbforaging/foraging/environment.py", line 470, in reset
    self.spawn_players(self.max_player_level)
  File "/home/camaral/code/lb-foraging/lbforaging/foraging/environment.py", line 291, in spawn_players
    row = self.np_random.randint(0, self.rows)
AttributeError: 'numpy.random._generator.Generator' object has no attribute 'randint'

This is probably related to the NumPy version, but it isn't documented anywhere which version to use.

Dependencies:

python=3.7.13
gym=0.26.2
numpy=1.21.6

What is the version of pyglet?

File "/home/lenovo/anaconda3/envs/env_rl_mujoco/lib/python3.9/site-packages/lbforaging/foraging/rendering.py", line 134, in _draw_grid
batch.add(
AttributeError: 'Batch' object has no attribute 'add'

batch is an instance of pyglet.graphics.Batch, used as follows:

batch = pyglet.graphics.Batch()
for r in range(self.rows + 1):
    batch.add(
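For context (an educated guess, not a confirmed answer): pyglet 2.0 removed Batch.add as part of its graphics API rewrite, so this error suggests a pyglet >= 2.0 install. Pinning an older release is a plausible workaround:

pip install "pyglet<2"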

What does the obs of each agent contain?

Hi, what does the observation of each agent contain? There seems to be no description in the docs. From reading the code, I guess each agent's observation contains the positions and levels of the agents, etc. I also found that the environment can be made fully observable. It would be great if you could update the docs.

Wonder whether step() and reset() only return observation

Thank you very much for such great work!

I would like to know whether step() and reset() in environment.py return any information about the absolute positions of the agents and the food, or only the observations.

I am looking forward to hearing from you! Thank you! (I'm sorry if I missed something important.)

Are cooperative tasks hard to train with QMIX?

Dear authors, in cooperative tasks (-coop), it seems hard to train policies that converge with QMIX (the episode rewards are nearly zero). I used the default settings provided by PyMARL and also used RLlib to train LBF with QMIX. In your paper, you trained Foraging-2s-8x8-2p-2f-coop-v2 and Foraging-8x8-2p-2f-coop-v2 with QMIX and the performance converged. It would be great if you could provide some suggestions.

No attribute 'randint'

Hello, I am trying to study lbforaging but I am facing this error when calling env.reset():
AttributeError: 'numpy.random._generator.Generator' object has no attribute 'randint'
The solution may be to replace randint with integers (a sketch follows below).

Has anyone experienced this error?

Thanks
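A sketch of that substitution (NumPy's Generator API, which newer gym seeding returns, renamed randint to integers):

# lbforaging/foraging/environment.py, e.g. in spawn_players
row = self.np_random.randint(0, self.rows)    # old RandomState API
row = self.np_random.integers(0, self.rows)   # new Generator API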

Multi-objective setting

In the current version of LBF, an agent gets only one value as a reward after each step. If agents had several resources they could collect in the environment, could we change the reward from a scalar to a vector, where each component represents the reward on the corresponding objective (i.e. the amount of that resource collected after executing an action)? A sketch of the idea follows.
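A rough sketch of the proposal (a hypothetical wrapper, not part of LBF, shown with the old 4-tuple interface used elsewhere in these issues):

import numpy as np
import gym

class VectorRewardWrapper(gym.Wrapper):
    """Turn each agent's scalar reward into a per-objective vector."""

    def __init__(self, env, n_objectives):
        super().__init__(env)
        self.n_objectives = n_objectives

    def step(self, actions):
        nobs, nreward, ndone, ninfo = self.env.step(actions)
        # With a single resource type, the scalar reward fills component 0;
        # a true multi-resource env would fill one component per resource.
        nreward = [np.array([r] + [0.0] * (self.n_objectives - 1)) for r in nreward]
        return nobs, nreward, ndone, ninfo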

The max_obs of the observation range may not be correct

Dear authors, after checking the code, I found that in https://github.com/semitable/lb-foraging/blob/master/lbforaging/foraging/environment.py#L125

            field_x = self.field.shape[1]
            field_y = self.field.shape[0]
            # field_size = field_x * field_y

            max_food = self.max_food
            max_food_level = self.max_player_level * len(self.players)

            min_obs = [-1, -1, 0] * max_food + [-1, -1, 0] * len(self.players)
            max_obs = [field_x, field_y, max_food_level] * max_food + [
                field_x,
                field_y,
                self.max_player_level,
            ] * len(self.players)

and

return gym.spaces.Box(np.array(min_obs), np.array(max_obs), dtype=np.float32)

For an 8x8 grid, field_x and field_y are 8. However, valid grid positions are within the range [0, 7], while the gym.spaces.Box above (https://github.com/openai/gym/blob/master/gym/spaces/box.py#L26) declares inclusive bounds of [0, 8], so the declared range is one larger than the positions can actually reach. It will not cause any error as long as you do not predict or validate the obs against the space.

Could you please check it? If so, I will make a PR.
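If the report is right, the fix would simply tighten the upper bounds (a sketch of the proposed change):

max_obs = [field_x - 1, field_y - 1, max_food_level] * max_food + [
    field_x - 1,
    field_y - 1,
    self.max_player_level,
] * len(self.players)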

Error , not valid action

Hello, I am trying to execute the code, but when I run it I get this error:

raise ValueError("%r is not a valid %s" % (value, cls.__name__))
ValueError: (5, 4) is not a valid Action

My env is env = gym.make("Foraging-10x10-2p-5f-v1").

Did I set it wrong?

Thank you for your help.
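For what it's worth (a guess based on the Action enum above, not a confirmed diagnosis): that error usually means a tuple was passed where a single agent's action was expected. step() wants a flat sequence with one integer per agent:

nobs, nreward, ndone, ninfo = env.step([5, 4])  # agent 0: LOAD, agent 1: EAST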
