tinyzqh / light_mappo

Lightweight version of MAPPO to help you quickly migrate to your local environment.

Python 99.57% Shell 0.43%

light_mappo's Introduction

light_mappo

Video (in Chinese)
This is a translated English version. Please click here for the original Chinese README.

Background
Installation
Usage

Background

The original MAPPO code was too complex in terms of environment encapsulation, so this project directly extracts and encapsulates the environment. This makes it easier to transfer the MAPPO code to your own project.

Installation

Download the code, create a Conda environment, and run the code, installing any missing packages as they are reported. A list of specific packages will be added later.

Usage

The environment is left as a placeholder implementation. It lives in light_mappo/envs/env_core.py: [Code](https://github.com/tinyzqh/light_mappo/blob/main/envs/env_core.py)

import numpy as np
class EnvCore(object):
    """
    # Environment Agent
    """
    def __init__(self):
        self.agent_num = 2  # number of agents (aircraft); two here
        self.obs_dim = 14  # observation dimension of each agent
        self.action_dim = 5  # action dimension of each agent; five here

    def reset(self):
        """
        Returns a list with one entry per agent (two when self.agent_num is 2).
        Each entry is an observation of shape (self.obs_dim,).
        """
        sub_agent_obs = []
        for i in range(self.agent_num):
            sub_obs = np.random.random(size=(self.obs_dim,))
            sub_agent_obs.append(sub_obs)
        return sub_agent_obs

    def step(self, actions):
        """
        actions is a list with one entry per agent (two when self.agent_num is 2).
        Each entry is an action of shape (self.action_dim,), i.e. (5,) with the defaults above.
        """
        sub_agent_obs = []
        sub_agent_reward = []
        sub_agent_done = []
        sub_agent_info = []
        for i in range(self.agent_num):
            sub_agent_obs.append(np.random.random(size=(self.obs_dim,)))
            sub_agent_reward.append([np.random.rand()])
            sub_agent_done.append(False)
            sub_agent_info.append({})

        return [sub_agent_obs, sub_agent_reward, sub_agent_done, sub_agent_info]

Just write this part of the code, and you can connect it to MAPPO directly. On top of env_core.py, two files, env_discrete.py and env_continuous.py, were split out to wrap the discrete and continuous action spaces respectively. The elif self.continuous_action: branch in algorithms/utils/act.py handles continuous action spaces, as does the code marked # TODO in runner/shared/env_runner.py.
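Before plugging a custom environment into training, it can help to confirm that it follows the per-agent list layout described above. The following is a minimal sketch, not part of the repository: check_env_contract is a hypothetical helper, and the EnvCore class here is a stand-in with the same interface as the placeholder in env_core.py.

```python
import numpy as np


class EnvCore(object):
    """Stand-in with the same interface as the placeholder environment above."""

    def __init__(self):
        self.agent_num = 2
        self.obs_dim = 14
        self.action_dim = 5

    def reset(self):
        return [np.random.random(size=(self.obs_dim,)) for _ in range(self.agent_num)]

    def step(self, actions):
        obs = [np.random.random(size=(self.obs_dim,)) for _ in range(self.agent_num)]
        rewards = [[np.random.rand()] for _ in range(self.agent_num)]
        dones = [False for _ in range(self.agent_num)]
        infos = [{} for _ in range(self.agent_num)]
        return [obs, rewards, dones, infos]


def check_env_contract(env):
    """Hypothetical helper: assert the per-agent list layout described above."""
    obs = env.reset()
    assert len(obs) == env.agent_num
    assert all(np.shape(o) == (env.obs_dim,) for o in obs)

    # One action vector of shape (action_dim,) per agent, as step() expects.
    actions = [np.zeros(env.action_dim) for _ in range(env.agent_num)]
    obs, rewards, dones, infos = env.step(actions)
    for per_agent in (obs, rewards, dones, infos):
        assert len(per_agent) == env.agent_num
    assert all(np.shape(o) == (env.obs_dim,) for o in obs)
    assert all(np.shape(r) == (1,) for r in rewards)
    return True


print(check_env_contract(EnvCore()))
```

Running the same check against your own EnvCore subclass or replacement catches shape mismatches before they surface as harder-to-read errors inside the runner.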

To switch the demo environment, comment out either the continuous environment or the discrete environment in train.py.

Related Efforts

on-policy - 💌 Learn from the authors' implementation of MAPPO.

Maintainers

@tinyzqh.

Translator

@tianyu-z

License

MIT © tinyzqh

light_mappo's Issues

How to set continuous action

I want to use continuous actions, but an error is reported after setting self.discrete_action_space in the environment to False.

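How do I switch to a different policy?

Using VecEnvWrapper

There is a mistake in env_wrappers.py: VecEnvWrapper is an unresolved reference.

Where is the env_continuous file?

Does simply setting self.discrete_action_input = False in the env_discrete file enable continuous actions?

After modifying the environment, how exactly do I run it?

Viewing training results

After training finishes, how do I view and use the resulting logs and models? Are the logs viewed with TensorBoard, and how do I load a model and test it to see how it performs?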

ModuleNotFoundError: No module named 'envs.env_core'

from envs.env_core import EnvCore

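With a continuous action space, upper and lower bounds cannot be set on the output actions

Hello, after changing line 30 of env_continuous.py to u_action_space = spaces.Box(low=0.0, high=90.0, shape=(self.signal_action_dim,), dtype=np.float32), the action values are still not limited to between 0 and 90. Why is that? Thank you.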
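A possible workaround, sketched under assumptions rather than taken from the repository: a Box space declares a range but does not by itself clamp the values a policy emits, so the environment can bound the raw action inside its own step(). Both helpers below are hypothetical names; the bounds 0 and 90 come from the question above.

```python
import numpy as np

LOW, HIGH = 0.0, 90.0  # bounds taken from the question above


def clip_action(raw_action, low=LOW, high=HIGH):
    """Hard-limit each action component to [low, high]."""
    return np.clip(np.asarray(raw_action, dtype=np.float32), low, high)


def squash_action(raw_action, low=LOW, high=HIGH):
    """Map any real-valued action smoothly into [low, high] using tanh."""
    unit = np.tanh(np.asarray(raw_action, dtype=np.float32))  # in [-1, 1]
    return low + 0.5 * (unit + 1.0) * (high - low)


raw = [-3.2, 45.0, 120.0]
print(clip_action(raw))
print(squash_action(raw))
```

Clipping is the simpler option but gives the policy no gradient signal about how far out of range it was. Squashing keeps the mapping smooth, at the cost of changing what the raw action values mean.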

How to render if using customized environment?

I am using my customized env and want to render in 'human' and 'rgb_array' mode. Could you please give some examples or implementation?

MAPPO-L

Thanks very much for your code.

Have you considered extending it to other variants of MAPPO, such as MAPPO-L?

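Resetting the environment at the end of an episode changes obs

In env_wrappers.py, step_wait() checks whether the episode has ended and runs "obs[i] = self.envs[i].reset()", which assigns the post-reset observation to obs[i] and overwrites the observation from the moment the episode ended. Is this assignment a problem? Since the observation after reset can be regarded as random, it should not be assigned to obs[i]; shouldn't the code just call "self.envs[i].reset()" instead?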
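One way to keep both observations, shown as a minimal sketch and not as the repository's behavior: return the post-reset observation so the next step starts from a valid state, and stash the final observation of the finished episode in info. ToyEnv and step_with_auto_reset are hypothetical, the sketch assumes a single boolean done, and the key name terminal_observation mirrors the convention used by Stable-Baselines3.

```python
import random


class ToyEnv:
    """Hypothetical single environment that ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [random.random()]

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return [float(self.t)], 0.0, done, {}


def step_with_auto_reset(env, action):
    """Step once; on episode end, keep the final obs in info, then reset."""
    obs, reward, done, info = env.step(action)
    if done:
        info = dict(info, terminal_observation=obs)  # last obs of the old episode
        obs = env.reset()  # first obs of the new episode
    return obs, reward, done, info


env = ToyEnv(horizon=2)
env.reset()
step_with_auto_reset(env, None)
obs, _, done, info = step_with_auto_reset(env, None)
print(done, info["terminal_observation"])
```

With this layout the rollout loop never steps a finished environment, while code that needs the true final observation (for example, to bootstrap a value estimate) can still read it from info.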

Action mask?

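Hello! When agents have different action dimensions, how does light-mappo apply an action mask?

How can the visualizations from the paper be reproduced?

How should the code be modified when my environment's observation space is a Box?

Each agent's observation space is a Box of shape 3 x width x height. How should self.obs_dim be set in env_core.py, and how should observation_space be set in env_discrete.py? Is there anything else that needs special modification?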
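Since the placeholder environment above returns flat observations of shape (self.obs_dim,), one simple option is to flatten the image-like observation and set obs_dim to the product of its dimensions. This is a sketch under that assumption only: the sizes and the helper names flatten_obs and unflatten_obs are hypothetical, and it does not cover switching the policy to a convolutional network.

```python
import numpy as np

CHANNELS, WIDTH, HEIGHT = 3, 8, 8  # hypothetical sizes for illustration
OBS_DIM = CHANNELS * WIDTH * HEIGHT  # value to use for self.obs_dim


def flatten_obs(image_obs):
    """Turn a (3, W, H) observation into a flat (OBS_DIM,) vector."""
    image_obs = np.asarray(image_obs, dtype=np.float32)
    assert image_obs.shape == (CHANNELS, WIDTH, HEIGHT)
    return image_obs.reshape(-1)


def unflatten_obs(flat_obs):
    """Inverse of flatten_obs, e.g. for rendering or debugging."""
    return np.asarray(flat_obs).reshape(CHANNELS, WIDTH, HEIGHT)


image = np.random.random(size=(CHANNELS, WIDTH, HEIGHT))
flat = flatten_obs(image)
print(flat.shape, OBS_DIM)
```

Flattening discards the spatial layout from the network's point of view, so it is a reasonable first step for small grids but may underperform on larger images.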

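Problems encountered when adding my own environment and using env_continuous

After modifying the code myself, I chose the continuous env, with agents using separated policies to update actions. However, the collect function in env_runner.py only has the MultiDiscrete and Discrete options and no Box option. How should this case be handled? Thanks!

env

The example only provides sub_agent_obs. Does that mean observations and global state are not distinguished here? Is sub_agent_obs simply the list of each agent's partial observations? If so, how is the global state handled: is it just the agents' partial observations merged together?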
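A minimal sketch of the idea raised in the question, assuming the global state is formed by concatenating every agent's observation and handing that same joint vector to each agent's critic. Whether light_mappo does exactly this should be checked against its runner code; build_share_obs is a hypothetical name.

```python
import numpy as np


def build_share_obs(obs):
    """Build a joint observation for a centralized critic.

    obs has shape (n_threads, n_agents, obs_dim); the result has shape
    (n_threads, n_agents, n_agents * obs_dim).
    """
    obs = np.asarray(obs)
    n_threads, n_agents, _ = obs.shape
    joint = obs.reshape(n_threads, -1)  # concatenate all agents' observations
    # Every agent receives the same joint vector.
    return np.repeat(joint[:, None, :], n_agents, axis=1)


obs = np.random.random(size=(4, 2, 14))  # 4 rollout threads, 2 agents, obs_dim 14
print(build_share_obs(obs).shape)
```

If the environment has a true global state that is richer than the concatenated observations, it would have to be returned by the environment separately; the placeholder interface above does not expose one.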

NotImplementedError when running with use_eval enabled

Is it that environments with continuous action spaces cannot use eval?

Traceback (most recent call last):
  File "G:\lcz\mappo\train\train.py", line 149, in <module>
    main(sys.argv[1:])
  File "G:\lcz\mappo\train\train.py", line 137, in main
    runner.run()
  File "G:\lcz\mappo\runner\shared\env_runner.py", line 88, in run
    self.eval(total_num_steps)
  File "C:\Users\ljh99\anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "G:\lcz\mappo\runner\shared\env_runner.py", line 183, in eval
    raise NotImplementedError
NotImplementedError