
Comments (6)

mitchellspryn avatar mitchellspryn commented on June 20, 2024

Actually, I'm fixing some other stuff right now. I'll add that portion in; you can find it on the branch mspryn_bugfixes shortly.

from autonomousdrivingcookbook.

mitchellspryn avatar mitchellspryn commented on June 20, 2024

I checked the code; those are intentional.

When training an RL model, you need to balance exploration (trying new strategies) and exploitation (incrementally improving the best known strategy). The way we do this in our code is via a strategy called linear epsilon annealing. With this method, we start by making decisions completely at random, as our model is initially meaningless. Even with transfer learning, we have dense layers that are initialized at random. Over time, as our model learns to better predict the Q values, we decrease the percentage of time that we explore (i.e. take random actions) and increase the percentage of time that we exploit (i.e. follow the model's predictions). If you look around line 348 in distributed_agent.py, you can see where we decrease epsilon after each successful iteration.

However, during training we never want to stop exploring entirely, as this can cause the model to get stuck in a local minimum. This has the consequence that even a perfect model will crash during training, because we are still making random decisions some of the time. The final convergence value is set by the parameter min_epsilon, which defaults to 0.1, meaning 10% random actions.
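To illustrate the exploration/exploitation balance described above, here is a minimal sketch of epsilon-greedy action selection (this is not the code from distributed_agent.py; the names `select_action` and the `model.predict` interface are assumptions for illustration):

```python
import random

def select_action(model, state, epsilon, num_actions):
    """Epsilon-greedy: with probability epsilon, explore by taking a
    random action; otherwise exploit the model's Q-value predictions."""
    if random.random() < epsilon:
        # Explore: pick an action uniformly at random
        return random.randrange(num_actions)
    # Exploit: pick the action with the highest predicted Q value
    q_values = model.predict(state)
    return max(range(num_actions), key=lambda a: q_values[a])
```

With epsilon clamped at min_epsilon = 0.1, roughly one decision in ten remains random, which is why even a well-trained model will still crash occasionally during training.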

A few more comments / questions:

  • RL models are prone to overfitting like any model. You need to be really careful if you resume training of an already trained model. I'm not surprised that running hours of training on the already trained model leads to garbage.
  • Note that always_random is only set to True when filling the replay memory for the first time. After that, it's false, which means linear epsilon annealing happens (check around line 160).
  • We don't want to overwhelm the training machine with data, so we always stop and perform a training iteration after 30 seconds. In this case, AirSim will keep the last control signal that is being sent, meaning that the car will most likely crash.
  • You can try playing around with the min_epsilon and per_iter_epsilon_reduction parameters to modify the training schedule. The former will control the minimum percentage of time that we explore, and the latter will control how quickly we move from a full-explore to mostly-exploit strategy. The parameters provided in the notebooks have been shown to work well for most cases, but there may be a better combination that will allow for faster training.
  • How are you determining "convergence?"
  • Are you running the final models using the RunModel.ipynb?
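To make the training schedule concrete, here is a minimal sketch of the linear annealing that min_epsilon and per_iter_epsilon_reduction control (the function name and the parameter values in the loop below are illustrative, not taken from the notebooks):

```python
def anneal_epsilon(epsilon, per_iter_epsilon_reduction, min_epsilon):
    """Linearly decay epsilon after each training iteration, clamping
    at min_epsilon so the agent never stops exploring entirely."""
    return max(min_epsilon, epsilon - per_iter_epsilon_reduction)

# Starting from full exploration, epsilon walks down toward min_epsilon
epsilon = 1.0
for _ in range(500):
    epsilon = anneal_epsilon(epsilon,
                             per_iter_epsilon_reduction=0.003,
                             min_epsilon=0.1)
```

A larger per_iter_epsilon_reduction moves the agent from full-explore to mostly-exploit faster; a larger min_epsilon keeps more random actions in the mix indefinitely.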


rafi-vivanti avatar rafi-vivanti commented on June 20, 2024

Thanks!
Yes, I'm checking using RunModel. I have no good definition of convergence; I simply run RunModel 5 times, and if all of them crash within 5 seconds I assume the model is bad. This happens even when I follow the suggestion and use "pretrained_model_weights.h5" to initialize the transfer learning.


rafi-vivanti avatar rafi-vivanti commented on June 20, 2024

I think I found another related bug:

In distributed_agent.py, line 40:
self.__train_conv_layers = bool(parameters['train_conv_layers'])
should be:
self.__train_conv_layers = bool(int(parameters['train_conv_layers']))

Otherwise it's always True, which might explain why my Transfer Learning didn't work.


mitchellspryn avatar mitchellspryn commented on June 20, 2024

I see. That value will be the string "True" or "False", so the fix should be

self.__train_conv_layers = (parameters['train_conv_layers'].lower().strip() == 'true')

Can you submit a PR with that?
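For reference, a quick runnable check of the pitfall and this fix (the `parse_bool` wrapper is just an illustrative name for the inline expression above):

```python
# bool() on any non-empty string is True, regardless of its content
assert bool('False') is True
assert bool('True') is True

# Note: int('False') raises ValueError, so bool(int(...)) only works
# when the parameter is a '0'/'1' string, not 'True'/'False'.

def parse_bool(value):
    """Parse a 'True'/'False' string into an actual boolean."""
    return value.lower().strip() == 'true'

assert parse_bool('False') is False
assert parse_bool(' True ') is True
```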


mitchellspryn avatar mitchellspryn commented on June 20, 2024

The PR has been merged, so I'll close this issue.

