philtabor / multi-agent-deep-deterministic-policy-gradients Goto Github PK

A Pytorch implementation of the multi agent deep deterministic policy gradients (MADDPG) algorithm

Python 100.00%

reinforcement-learning actor-critic-methods maddpg multi-agent-reinforcement-learning deep-reinforcement-learning actor-critic-algorithm

multi-agent-deep-deterministic-policy-gradients's People

Contributors

Stargazers

Watchers

Forkers

ahmadsuleman haozhougt z7076 fb1n15 shreyadate05 yiishen arthurcarvalhoc sumantasunny suhailsama aicools scorpio-h fylxzq shamim237 raylrayl cove9988 dnovischi danielkboyer woosuk1103 terrisgo tianhao-peng 1184022880com stha-prashant vananle pandiarajang ffxu1024 zhili-zh ouat-in-github zijiwang zshiningstar jnbai517 moonmaxwell-ycy miguelangelmy heikolee elizabeth-palacios wanghailin2019 evilne projetsplusia ywang-nssc xiangzhang-122 zzsong1023 xueliu8617112 pvpadrao ballball-yy alanw-git juanxincai1 mmehedin oudeng prorates dnlam aliirshayyid abdarabi kdkhanmir joshwuuu projecttopstep j-degooijer ljx-hwj jerry-jwz mrahman2-vt bucoolsir hongdazhang qst75693 lianz-lit ccv7 004001c xu1115 shenjiede

multi-agent-deep-deterministic-policy-gradients's Issues

ValueError: could not broadcast input array from shape (8) into shape (28)

File "maddpg_torch.py", line 345, in
memory.store_transition(obs, state, actions, reward, obs_, state_, done)
File "maddpg_torch.py", line 51, in store_transition
self.state_memory[index] = state
ValueError: could not broadcast input array from shape (8) into shape (28)

This error occurs when scenario='simple_adversary' in main
when scenario='simple ', this error will not occur

I just fixed the problem about backward

here is the solution
#2 (comment)

Found dtype Float but expected Double

Hello dear,
i'm trying to run the code (after your correction on backward) but i'm getting the following error:
I also tried with python 3.6 (numpy 1.14.5 , torch 1.10.1, gym 0.10.5), i'm still getting the same error

critic_loss.backward(retain_graph=True)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Found dtype Float but expected Double

it seems like the error is being raised from the method MADDPG.learn at :
critic_loss.backward(retain_graph=True)
i checked the variable of "target" variable and has dtype=torch.float64
Any idea ? thanks

Access to the video on Youtube

Hi, Phil. I wanted to access the video on Youtube, but it said that this is a private video.

Question about backward

Hi Phil,

I have watched your video on Youtube. There's still a question about the critic_loss.backward(retain_graph=True). In your solution, you just turn the torch version from 1.8.1 to 1.4, I think it's a bug in version 1.4 and so that you are running bug-free in version 1.4.

I have checked a lot of information but I still don't know how to solve it. So here I am to turn to you.
Here is my Traceback:

File "main.py", line 101, in <module>
    maddpg_agent.learn(memory)
  File "maddpg.py", line 99, in learn
    critic_loss.backward(retain_graph=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 8]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

usage of critic_value_new[dones[:, 0.0]] = 0.0 in learn()

Multi-Agent-Deep-Deterministic-Policy-Gradients/maddpg.py

Line 74 in a3c294a

critic_value_[dones[:,0]] = 0.0

Hi Phill,

Could you please help me to understand what's this line is for?
critic_value_new[dones[:, 0.0]] = 0.0
Since critic_value_new float variable it cannot be used as array. Should we set just dones[agent_idx] to 0?

Thanks and Regards
Viji

AttributeError: 'MultiDiscrete' object has no attribute 'n'

I tried to run simply_reference scenario but this error appeared.
AttributeError: 'MultiDiscrete' object has no attribute 'n'
How to fix it please

ModuleNotFoundError: No module named 'make_env'

When I run the code, the following error appears, showing that the make_env module is missing，how to solve this problem?
Traceback (most recent call last):
File "F:\project\pytorch\edge computing\Multi-Agent-Deep-Deterministic-Policy-Gradients-master\main.py", line 4, in
from make_env import make_env
ModuleNotFoundError: No module named 'make_env'

Shouldn't it be agent.actor.forward() and calculate actor_loss?

Multi-Agent-Deep-Deterministic-Policy-Gradients/maddpg.py

Line 83 in a3c294a

actor_loss = agent.critic.forward(states, mu).flatten()

Dear Phill,

First of all plenty of thanks and gratitude for your lessens, I've learned a lot from your lectures.
I've noticed a difference in the code at line 83 in MADDPG class while calculating actor-loss. It's running forward propagation of critic network instead of actor network. I believe this is typo, please correct me if I'm wrong.