idsia / hhmarl_2d

51 stars · 10 forks · 1.72 MB

Heterogeneous Hierarchical Multi Agent Reinforcement Learning for Air Combat

Python 100.00%
air-combat heterogeneous-agents hierarchical-reinforcement-learning multi-agent-reinforcement-learning

hhmarl_2d's People

Contributors

ardian-selmonaj


hhmarl_2d's Issues

License?

I stumbled across your project while researching possible search-and-rescue reinforcement learning environments. It looks interesting. Under what license is this project released? Thank you.

An error is raised: "Could not find L3 Fight Policy. Store in E:\hhmarl_2D\results\Level3_fight_2vs2\checkpoint"

After training the hetero agents in mode=fight and mode=escape on level 3, I tried to train them with "python train_hetero.py --epochs=10000 --restore=True --agent_mode=escape --level=4", but it failed. The error "Could not find L3 Fight Policy. Store in E:\hhmarl_2D\results\Level3_fight_2vs2\checkpoint" is raised, even though the folder exists and contains both 'E:\hhmarl_2D\results\Level3_fight_2vs2\checkpoint\policies\ac1_policy' and 'E:\hhmarl_2D\results\Level3_fight_2vs2\checkpoint\policies\ac2_policy'. Looking forward to your reply.

Adjustment to accelerate the training process

Hello, I'm trying to run the project, but the training process is too slow:
level 1 took about 5 days on my PC.
I made no changes and just followed the instructions.
(Windows 11 Pro
CPU: Intel(R) Core(TM) i7-9750H @ 2.60GHz, 6 cores
Memory: 16.0 GB)

  1. I want to know whether this training rate is normal.
  2. If I need to complete the whole training in 2 weeks, what are my options? Can I stop in the middle of training and use the intermediate checkpoint to continue training the next level?
    I appreciate your response.
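A hedged sketch for question 2: the "--restore=True" flag used elsewhere in this thread suggests checkpoint-based resuming is supported; whether an intermediate (mid-run) checkpoint is sufficient to start the next level is an assumption to verify against the repo's checkpointing logic, not a confirmed workflow.

```shell
# Resume a previously started run, then continue at the next level.
# Flags are those that appear elsewhere in this thread; the effect of
# restoring from a mid-run checkpoint is an assumption.
python train_hetero.py --epochs=10000 --restore=True --agent_mode=fight --level=2
python train_hetero.py --epochs=10000 --restore=True --agent_mode=fight --level=3
```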

FileNotFoundError when first running train_hetero.py

When I first ran train_hetero.py as the README describes, a FileNotFoundError occurred, so I debugged the code. It seems there are no "events" files in the Ray log directories, so the logic in the update_logs() function inside train_hetero.py fails.
I want to figure out what is wrong with my setup and am looking forward to an appropriate solution.
Any suggestions will be appreciated!
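One plausible mitigation, sketched below under the assumption that update_logs() crashes because it reads TensorBoard event files that Ray has not yet written on the first iteration. The function name and return convention here are ours, not the repo's:

```python
import glob
import os

def latest_event_file(log_dir: str):
    """Hypothetical guard for the FileNotFoundError above: look for Ray's
    TensorBoard event files and return None when none exist yet, so the
    caller can skip the logging step instead of crashing."""
    event_files = glob.glob(
        os.path.join(log_dir, "**", "events.out.tfevents.*"), recursive=True
    )
    if not event_files:
        return None  # nothing written yet -> skip this logging step
    return max(event_files, key=os.path.getmtime)  # newest event file
```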

How to write an evaluation environment for 3v3, 4v4, 5v5, etc.?

Hello, I apologize for the interruption. I have been studying your method and have finished training the agents. I would like to evaluate how the "fight" strategy performs at different scales, such as 3v3 and 4v4. I made an attempt but was not successful. How should evaluation environments for 3v3 and 4v4 be written in both the LowLevel and HighLevel environments? If possible, could you provide an example for 3v3 or 4v4? I look forward to your reply. Thank you very much!

How to set args when running train_hetero.py

1. Is it essential to train the hetero agents at every "--level" from 1 to 5?
2. Is it essential to set the argument "--restore=True" when "--level" >= 2?
3. If a ValueError "[[nan,nan,nan,...,nan,nan],...,[nan,nan,...,nan,nan]]" is raised and the logger warns "NaN or Inf in input data", should I restore the model from a lower level or retrain the model from level 1?

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'policies/model.pt' -> 'policies/Esc_AC1.pt'

Why did this error occur while training the low-level policy?

Traceback (most recent call last):
  File "D:\Py_project\hhmarl_2D-main\hhmarl_2D-main\train_hetero.py", line 292, in <module>
    make_checkpoint(args, algo, log_dir, i, args.level, test_env)
  File "D:\Py_project\hhmarl_2D-main\hhmarl_2D-main\train_hetero.py", line 105, in make_checkpoint
    os.rename('policies/model.pt', f'policies/{policy_name}.pt')
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'policies/model.pt' -> 'policies/Esc_AC1.pt'
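A sketch of a possible fix, not the repo's actual code: os.rename raises FileExistsError on Windows when the destination already exists (for example when a level is re-trained), whereas os.replace overwrites the destination on both Windows and POSIX.

```python
import os

def finalize_checkpoint(policy_name: str) -> None:
    # Mirrors the paths in the traceback above; os.replace silently
    # overwrites an existing destination, unlike os.rename on Windows.
    os.replace('policies/model.pt', f'policies/{policy_name}.pt')
```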

An error occurred while running train_hetero.py

Hello. When I try to run "train_hetero.py" with level=5 and agent_mode="fight", I get the error below:

2024-08-05 11:12:33,753 ERROR actor_manager.py:507 -- Ray error, taking actor 1 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=29278, ip=121.48.165.149, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f9e141f9fa0>)
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 609, in __init__
    self.env = env_creator(copy.deepcopy(self.env_context))
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/env/utils.py", line 133, in _gym_env_creator
    env = env_descriptor(env_context)
  File "/media/sda2/codes/zk/hhmarl_2D-main/envs/env_hetero.py", line 50, in __init__
    self._get_policies("LowLevel")
  File "/media/sda2/codes/zk/hhmarl_2D-main/envs/env_base.py", line 326, in _get_policies
    self.policies[i] = {"fight_1": torch.load(os.path.join(policy_dir, f'L{i}_AC1_fight.pt')), "fight_2": torch.load(os.path.join(policy_dir, f'L{i}_AC2_fight.pt'))}
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/torch/serialization.py", line 1065, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/torch/serialization.py", line 468, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/torch/serialization.py", line 449, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/media/sda2/codes/zk/hhmarl_2D-main/policies/L4_AC1_fight.pt'

But when I try to run "train_hetero.py" with level=4 and agent_mode="fight", I get the error below:

2024-08-05 11:14:47,704 ERROR actor_manager.py:507 -- Ray error, taking actor 1 out of service. ray::RolloutWorker.apply() (pid=31499, ip=121.48.165.149, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f83843c7070>)
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/utils/actor_manager.py", line 185, in apply
    raise e
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/utils/actor_manager.py", line 176, in apply
    return func(self, *args, **kwargs)
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 915, in sample
    batches = [self.input_reader.next()]
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 277, in get_data
    item = next(self._env_runner)
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/evaluation/sampler.py", line 664, in _env_runner
    base_env.send_actions(actions_to_send)
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 656, in send_actions
    raise e
  File "/home/ps/anaconda3/envs/tzb/lib/python3.8/site-packages/ray/rllib/env/multi_agent_env.py", line 645, in send_actions
    obs, rewards, terminateds, truncateds, infos = env.step(agent_dict)
  File "/media/sda2/codes/zk/hhmarl_2D-main/envs/env_base.py", line 88, in step
    self._take_action(action)
  File "/media/sda2/codes/zk/hhmarl_2D-main/envs/env_hetero.py", line 165, in _take_action
    actions = self._policy_actions(policy_type=self.opp_mode, agent_id=i, unit=u)
  File "/media/sda2/codes/zk/hhmarl_2D-main/envs/env_base.py", line 392, in _policy_actions
    out = self.policy[policy_str](
KeyError: 'fight_1_opp'
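The level=5 failure above boils down to a frozen low-level policy file that was never produced. A hypothetical pre-flight check is sketched below; the file-name pattern is taken from the traceback (e.g. 'L4_AC1_fight.pt'), but exactly which levels and modes _get_policies() loads is an assumption, so adjust the ranges to the repo's actual logic:

```python
import os

def missing_low_level_policies(policy_dir: str, level: int):
    """List low-level policy files that training at `level` would expect
    but that are absent from `policy_dir`. Assumes levels 3..level-1 are
    loaded, for both aircraft and both modes."""
    missing = []
    for lvl in range(3, level):            # previously trained levels (assumed)
        for ac in (1, 2):                  # the two heterogeneous agents
            for mode in ("fight", "escape"):
                name = f"L{lvl}_AC{ac}_{mode}.pt"
                if not os.path.isfile(os.path.join(policy_dir, name)):
                    missing.append(name)
    return missing
```

Running this before launching train_hetero.py would surface "L4_AC1_fight.pt is missing" immediately, instead of deep inside a Ray rollout worker.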

Failure to restore the commander policy when evaluating with the commander

Hello, when I try to evaluate with the commander, I get the error below. It looks like restoring the commander policy failed. Do I need to change some configuration before evaluating with the commander? I used the pre-trained files you provide and carefully read the 'Procedure' section.

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\hhmarl\hhmarl_2D\evaluation.py", line 99, in <module>
    policy = Policy.from_checkpoint(check, ["commander_policy"])["commander_policy"] if args.eval_hl else None
  File "G:\python_venvs\python310\lib\site-packages\ray\rllib\policy\policy.py", line 335, in from_checkpoint
    policies[policy_id] = Policy.from_state(policy_state)
  File "G:\python_venvs\python310\lib\site-packages\ray\rllib\policy\policy.py", line 378, in from_state
    new_policy = actual_class(
  File "G:\python_venvs\python310\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 67, in __init__
    self._initialize_loss_from_dummy_batch()
  File "G:\python_venvs\python310\lib\site-packages\ray\rllib\policy\policy.py", line 1405, in _initialize_loss_from_dummy_batch
    actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
  File "G:\python_venvs\python310\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 522, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "G:\python_venvs\python310\lib\site-packages\ray\rllib\utils\threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "G:\python_venvs\python310\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1141, in _compute_action_helper
    dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
  File "G:\python_venvs\python310\lib\site-packages\ray\rllib\models\modelv2.py", line 259, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "C:\Users\Administrator\Desktop\hhmarl\hhmarl_2D\models\ac_models_hier.py", line 85, in forward
    output, new_state = self.forward_rnn(input_dict, state, seq_lens)
  File "C:\Users\Administrator\Desktop\hhmarl\hhmarl_2D\models\ac_models_hier.py", line 91, in forward_rnn
    x = torch.cat((self.inp1(self._inp1), self.inp2(self._inp2), self.inp3(self._inp3)), dim=1)
  File "G:\python_venvs\python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\python_venvs\python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\python_venvs\python310\lib\site-packages\ray\rllib\models\torch\misc.py", line 169, in forward
    return self._model(x)
  File "G:\python_venvs\python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\python_venvs\python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\python_venvs\python310\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)
  File "G:\python_venvs\python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\python_venvs\python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\python_venvs\python310\lib\site-packages\torch\nn\modules\linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x20 and 10x50)

Process finished with exit code 1
Looking forward to your reply. Thanks for your patience.
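For what it's worth, the final RuntimeError pins down the mismatch: mat1 (32x20) is the observation batch (32 samples, 20 features each) and mat2 (10x50) is the transposed weight of a Linear(10, 50) layer, so the restored commander network expects 10-dimensional inputs where the evaluation setup supplies 20. A minimal, framework-free sketch of that shape check (only the dimensions come from the error message; the function is illustrative):

```python
def linear_shapes_compatible(input_shape, weight_shape):
    """F.linear computes input @ weight.T, so the input's feature count
    must equal the layer's in_features (weight is stored (out, in))."""
    _batch, in_given = input_shape
    _out, in_expected = weight_shape
    return in_given == in_expected

# Batch of 32 obs with 20 features vs a Linear(10, 50) weight of shape (50, 10):
assert not linear_shapes_compatible((32, 20), (50, 10))
# The same batch with 10 features per observation would pass:
assert linear_shapes_compatible((32, 10), (50, 10))
```

In other words, the checkpoint and the evaluation configuration appear to disagree about the commander's observation size, which is usually a config mismatch rather than a corrupted checkpoint.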
