
puyuan1996 commented on August 22, 2024

Hello, thanks for your question.

I replaced the policy-related parameters of our hopper_onppo_config with the settings you gave and ran for 3M env steps. Here is the raw result:
[image: training result curve]
Although the performance is poor with these hyperparameters, the error you mentioned does not appear in this setting.

I suspect there may be abnormal data in the ppo_batch in your setting. The NaN values are possibly caused by abnormally large or abnormally small values of pi_new/pi_old in the PPO update formula.
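
For reference, here is a rough sketch of where pi_new/pi_old enters the loss (illustrative only, not the exact ding.rl_utils implementation):

import torch

# Illustrative clipped-surrogate sketch (not the exact DI-engine code).
# ratio = pi_new(a|s) / pi_old(a|s), computed in log space; extreme values of
# logp_new - logp_old make exp() overflow and propagate inf/nan into the loss.
def ppo_clip_loss(logp_new, logp_old, adv, clip_ratio=0.2):
    ratio = torch.exp(logp_new - logp_old)
    surr1 = ratio * adv
    surr2 = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * adv
    return -torch.min(surr1, surr2).mean()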

You can instrument the code as below, then run cityflow_ppo_continuous_train.py in debug mode to save and analyze the ppo_batch when the error occurs on this line:

try:
    ppo_loss, ppo_info = ppo_error_continuous(ppo_batch, self._clip_ratio)
except Exception as error:
    # Dump the offending batch to disk so it can be inspected offline.
    print(error, ppo_batch)
    torch.save(ppo_batch, 'ppo_batch.pt')
    raise

After ppo_batch.pt has been saved when the error occurs, you can inspect the ppo_batch carefully.
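
For example, a rough check over the saved batch could look like this (ppo_batch is assumed to be the ppo_data namedtuple used by ppo_error_continuous):

import torch

batch = torch.load('ppo_batch.pt')
# Walk over the fields of the saved namedtuple and report NaN and value range;
# dict fields such as logit_new/logit_old hold 'mu' and 'sigma' tensors.
for name, value in zip(batch._fields, batch):
    items = value.items() if isinstance(value, dict) else [('', value)]
    for sub_name, t in items:
        if isinstance(t, torch.Tensor):
            print(name, sub_name, 'nan:', torch.isnan(t).any().item(),
                  'min:', t.min().item(), 'max:', t.max().item())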

Thanks a lot.

zyz109429 commented on August 22, 2024

First of all, thank you for your reply.
I did the debugging as you suggested.
The caught exception is as follows.

[image: exception screenshot]

The printed information is as follows.

ppo_data(logit_new={'mu': tensor([[nan],[nan],[nan],...,[nan]], grad_fn=), 'sigma': tensor([[nan], [nan], [nan],...,[nan]], grad_fn=)}, logit_old={'mu': tensor([-0.6285, -0.5731, -0.6455, 0.8511, -0.6911, 0.6962, -0.4182, -0.5283,
-0.6198, 0.9046, -0.1926, -0.5818, 0.9112, -0.5774, 0.0645, 0.8799,
0.8856, -0.6692, -0.5422, 0.2381, 0.9047, 0.8866, -0.5679, 0.5986,
-0.3940, -0.6046, -0.5193, -0.6381, 0.9134, -0.3108, 0.9075, 0.9092,
-0.6426, -0.5828, 0.9051, -0.6339, -0.6349, -0.6433, -0.6034, -0.5630,
-0.5207, -0.6074, -0.6682, -0.6652, -0.7154, 0.7598, -0.1082, -0.0081,
-0.5290, 0.8938, 0.9072, -0.6852, 0.7760, -0.2126, 0.8408, -0.6748,
-0.6174, 0.6603, 0.8826, -0.6609, -0.3863, -0.4872, -0.6193, 0.8938]), 'sigma': tensor([1.4908, 1.3997, 1.5247, 0.3084, 1.5925, 0.4297, 1.2325, 1.3328, 1.4403,
0.2572, 0.9868, 1.3993, 0.2489, 1.3731, 0.7672, 0.2819, 0.2811, 1.5416,
1.3511, 0.6743, 0.2589, 0.2757, 1.3921, 0.4840, 1.1544, 1.4665, 1.3378,
1.4687, 0.2475, 1.0555, 0.2545, 0.2508, 1.5072, 1.3612, 0.2557, 1.4653,
1.4874, 1.4856, 1.4224, 1.3854, 1.2687, 1.4344, 1.5471, 1.5305, 1.6533,
0.3765, 0.8946, 0.8425, 1.3026, 0.2663, 0.2548, 1.5716, 0.3727, 0.9790,
0.3151, 1.5678, 1.4493, 0.4493, 0.2809, 1.5410, 1.1080, 1.2272, 1.4482,
0.2673])}, action=tensor([ 1.3408, 0.1602, -3.9134, 0.9856, -0.2720, 1.4314, -0.4599, -1.1067,
0.9704, 1.3853, -2.0676, -1.4295, 1.0197, 1.2636, -0.8234, 0.8861,
0.2818, -3.0889, 0.1927, -0.4374, 0.8012, 0.8854, -2.3425, -0.1133,
2.1956, -2.5130, -0.3467, -1.0016, 1.2977, 0.3969, 0.5102, 0.9858,
-2.2762, -3.7371, 0.8547, 0.5875, 1.0124, 1.4402, 1.2027, 0.0245,
-2.7108, -1.3852, -0.1114, 0.8502, -0.8981, 0.6419, 0.6401, -0.3337,
-2.1801, 1.1617, 0.7595, -2.1792, 0.2434, 0.3868, 0.2773, -0.6104,
1.0241, 0.6774, 0.3616, 0.9796, 0.3465, 0.0860, 0.4484, 0.5196]), value_new=tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
grad_fn=), value_old=tensor([-0.9685, -0.4812, -0.6246, -2.7388, -2.0899, -1.3862, -0.4165, -0.4572,
-1.7662, -2.7320, -0.8122, -0.5687, -2.7295, -1.9923, -2.6184, -2.8475,
-2.3100, -2.1565, -0.6518, -2.5187, -2.7433, -2.8006, -0.4143, -2.5838,
-1.4480, -0.4806, -0.3680, -2.8206, -2.7685, -2.2257, -2.7590, -2.5800,
-0.9456, -1.8938, -2.8282, -1.5848, -0.9681, -1.6505, -0.7479, -0.4279,
-2.8415, -0.8131, -1.6537, -2.5903, -2.4278, -2.6943, -2.6982, -1.3150,
-1.6957, -2.8198, -2.7288, -2.2993, -2.0520, -1.0157, -2.7231, -1.8825,
-0.7513, -2.7403, -2.8357, -1.6993, -1.9129, -1.6852, -2.2528, -2.6801]), adv=tensor([ 0.6305, 0.2643, 0.6371, 0.6466, 1.9546, -0.4648, -1.2747, -0.9353,
-1.0359, -0.5286, -1.3012, 0.3276, -0.2845, -0.3729, -1.1896, 0.9560,
2.1039, 0.2346, -1.3772, 1.6108, -0.5274, 1.1824, -0.1134, -0.4184,
-1.9192, -1.6429, -0.5152, 2.0295, -0.7275, 0.6804, -0.5729, -0.9296,
1.2292, -0.2330, 0.1495, -1.8284, -0.4702, -1.7449, 1.9015, 0.8298,
1.9774, -0.3908, 0.4063, 1.0299, -0.0164, 0.3812, -0.1739, -0.8045,
0.4500, -0.5867, 0.4400, 0.1079, 1.4858, -1.0783, 0.8624, -1.0333,
-0.1328, 0.1841, -0.0777, 0.5778, -0.2049, -0.6412, 0.4944, -0.2176]), return_=tensor([-0.6951, -0.3260, -0.3491, -2.4602, -1.3890, -1.4665, -0.7583, -0.6894,
-2.0309, -2.8329, -1.1625, -0.3931, -2.7516, -2.0430, -2.9327, -2.4690,
-1.5608, -2.0109, -1.0267, -1.9288, -2.8437, -2.3490, -0.3811, -2.6490,
-1.9979, -0.9413, -0.4646, -2.0955, -2.9336, -1.9362, -2.8741, -2.8103,
-0.4789, -1.8992, -2.7101, -2.1054, -1.0501, -2.1441, -0.0641, -0.0902,
-2.1332, -0.8695, -1.4527, -2.1879, -2.3633, -2.5014, -2.6846, -1.5049,
-1.4806, -2.9394, -2.5169, -2.1947, -1.5024, -1.2941, -2.3748, -2.1463,
-0.7243, -2.6110, -2.7910, -1.4429, -1.9093, -1.8224, -2.0234, -2.6806]), weight=None).

According to the data and the error message, it is suggested that I modify the backward call to total_loss.backward(retain_graph=True).

I hope to get your further guidance. Thank you very much!

puyuan1996 commented on August 22, 2024

Hello,

Could you save the model self._learn_model and the input batch data when the error first occurs after this line:

 output = self._learn_model.forward(batch['obs'], mode='compute_actor_critic')

something like this:

try:
    ppo_loss, ppo_info = ppo_error_continuous(ppo_batch, self._clip_ratio)
except Exception as error:
    torch.save(ppo_batch, 'ppo_batch.pt')                # save the ppo batch
    torch.save(batch, 'input_batch.pt')                  # save the input batch
    torch.save(self._state_dict_learn(), 'ckpt.pt')      # save the model parameters
    raise

Then we can analyze whether the model parameters or the input data contain NaN values.
If the input has NaN values, we should check the environment.
If the model parameters have NaN values, the error may be due to abnormal gradients.
If neither has NaN, we can load the model and pass the input batch in again to verify whether the output produces NaN values.
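
A rough standalone check could look like the following (the 'model' key inside ckpt.pt and the dict layout of input_batch.pt are assumptions based on the snippet above):

import torch

ckpt = torch.load('ckpt.pt')
input_batch = torch.load('input_batch.pt')

# 1. Check the saved model parameters (assuming self._state_dict_learn()
#    stores the network weights under the 'model' key).
for key, param in ckpt['model'].items():
    if param.is_floating_point() and torch.isnan(param).any():
        print('nan in parameter:', key)

# 2. Check the input observations for nan values.
print('nan in obs:', torch.isnan(input_batch['obs']).any().item())

# 3. If both are clean, load the weights back and run the forward pass again
#    inside the policy to see whether the output itself becomes nan, e.g.:
# self._learn_model.load_state_dict(ckpt['model'])
# output = self._learn_model.forward(input_batch['obs'], mode='compute_actor_critic')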

Thanks.

zyz109429 commented on August 22, 2024

Using the method you provided, I found the following: input_batch has no NaN values.
In ppo_batch, logit_new is all NaN while logit_old is not, and the model parameters are all NaN.
So the error may indeed be due to abnormal gradients.

I've tried other activation functions and gradient clipping, and it didn't work. In addition, I have also tried the DDPG and SAC algorithms, and their rewards gradually converge as expected. But during training, the reward of the PPO algorithm stays around its initial value, and it seems that nothing is being learned.
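
The gradient clipping I tried was roughly along these lines (the hook point in the learner and the max_norm value here are only illustrative):

import torch

# Illustrative only: clip the global gradient norm before the optimizer step.
self._optimizer.zero_grad()
total_loss.backward()
torch.nn.utils.clip_grad_norm_(self._learn_model.parameters(), max_norm=0.5)
self._optimizer.step()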

I hope to get your further guidance again. Thanks.

puyuan1996 commented on August 22, 2024

Hello,

Have you normalized the observations given by this cityflow environment? What are the current maximum and minimum values of the obs and the reward?
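
If not, one common fix is to rescale obs and reward with running statistics before they reach the policy; a generic sketch (not cityflow-specific or taken from DI-engine) is:

import numpy as np

# Generic running mean/std normalizer for observations or rewards
# (illustrative; update() expects a batch of shape (N, dim)).
class RunningNorm:
    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps

    def update(self, x):
        batch_mean, batch_var, n = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = batch_mean - self.mean
        tot = self.count + n
        self.mean = self.mean + delta * n / tot
        m_a = self.var * self.count
        m_b = batch_var * n
        self.var = (m_a + m_b + delta ** 2 * self.count * n / tot) / tot
        self.count = tot

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)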

In order to reproduce your error on my side, could you provide the complete main function file cityflow_ppo_continuous_train.py?

To confirm that everything other than the env is the same: have you made any changes to the original PPO algorithm to adapt it to your environment? Or did you only specify the relevant hyperparameters in this file, with the PPO code itself being the original DI-engine version from the latest branch?

Does the error occur at a fixed number of iterations? If so, after how many iterations?

Thanks a lot.
