
Comments (6)

mitchellspryn avatar mitchellspryn commented on June 20, 2024

Actually, I'm fixing some other stuff right now. I'll add that portion in; you can find it on the branch mspryn_bugfixes shortly.

from autonomousdrivingcookbook.

mitchellspryn avatar mitchellspryn commented on June 20, 2024

I checked the code; those are intentional.

When training an RL model, you need to balance exploration (trying new strategies) and exploitation (incrementally improving the best known strategy). The way we do this in our code is via a strategy called linear epsilon annealing. With this method, we start by making decisions completely at random, as our model is initially meaningless. Even with transfer learning, we have dense layers that are initialized at random. Over time, as our model learns to better predict the Q values, we decrease the percentage of time that we explore (i.e. take random actions) and increase the percentage of time that we exploit (i.e. follow the model's predictions). If you look around line 348 in distributed_agent.py, you can see where we decrease epsilon after each successful iteration.

However, during training we never want to stop exploring entirely, as this can cause the model to get stuck in a local minimum. This has the consequence that even a perfect model will crash during training, because we are still making random decisions some of the time. The final convergence value is set by the parameter min_epsilon, which defaults to 0.1, meaning 10% random actions.
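To illustrate the exploration/exploitation balance described above, here is a minimal sketch of epsilon-greedy action selection (this is not the code from distributed_agent.py; the names `select_action` and the `model.predict` interface are assumptions for illustration):

```python
import random

def select_action(model, state, epsilon, num_actions):
    """Epsilon-greedy: with probability epsilon, explore by taking a
    random action; otherwise exploit the model's Q-value predictions."""
    if random.random() < epsilon:
        # Explore: pick an action uniformly at random
        return random.randrange(num_actions)
    # Exploit: pick the action with the highest predicted Q value
    q_values = model.predict(state)
    return max(range(num_actions), key=lambda a: q_values[a])
```

With epsilon clamped at min_epsilon = 0.1, roughly one decision in ten remains random, which is why even a well-trained model will still crash occasionally during training.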

A few more comments / questions:

  • RL models are prone to overfitting like any model. You need to be really careful if you resume training of an already trained model. I'm not surprised that running hours of training on the already trained model leads to garbage.
  • Note that always_random is only set to True when filling the replay memory for the first time. After that, it's false, which means linear epsilon annealing happens (check around line 160).
  • We don't want to overwhelm the training machine with data, so we always stop and perform a training iteration after 30 seconds. In this case, AirSim will keep the last control signal that is being sent, meaning that the car will most likely crash.
  • You can try playing around with the min_epsilon and per_iter_epsilon_reduction parameters to modify the training schedule. The former will control the minimum percentage of time that we explore, and the latter will control how quickly we move from a full-explore to mostly-exploit strategy. The parameters provided in the notebooks have been shown to work well for most cases, but there may be a better combination that will allow for faster training.
  • How are you determining "convergence?"
  • Are you running the final models using the RunModel.ipynb?
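To make the training schedule concrete, here is a minimal sketch of the linear annealing that min_epsilon and per_iter_epsilon_reduction control (the function name and the parameter values in the loop below are illustrative, not taken from the notebooks):

```python
def anneal_epsilon(epsilon, per_iter_epsilon_reduction, min_epsilon):
    """Linearly decay epsilon after each training iteration, clamping
    at min_epsilon so the agent never stops exploring entirely."""
    return max(min_epsilon, epsilon - per_iter_epsilon_reduction)

# Starting from full exploration, epsilon walks down toward min_epsilon
epsilon = 1.0
for _ in range(500):
    epsilon = anneal_epsilon(epsilon,
                             per_iter_epsilon_reduction=0.003,
                             min_epsilon=0.1)
```

A larger per_iter_epsilon_reduction moves the agent from full-explore to mostly-exploit faster; a larger min_epsilon keeps more random actions in the mix indefinitely.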


rafi-vivanti avatar rafi-vivanti commented on June 20, 2024

Thanks!
Yes, I'm checking using RunModel. I have no good definition of convergence; I simply run RunModel 5 times, and if all of them crash within 5 seconds I assume the model is bad. This happens even when I follow the suggestion and use "pretrained_model_weights.h5" to initialize the transfer learning.


rafi-vivanti avatar rafi-vivanti commented on June 20, 2024

I think I found another related bug:

In distributed_agent.py, line 40:
self.__train_conv_layers = bool(parameters['train_conv_layers'])
should be:
self.__train_conv_layers = bool(int(parameters['train_conv_layers']))

Otherwise it's always True, which might explain why my Transfer Learning didn't work.


mitchellspryn avatar mitchellspryn commented on June 20, 2024

I see. That value will be the string "True" or "False", so the fix should be

self.__train_conv_layers = (parameters['train_conv_layers'].lower().strip() == 'true')

Can you submit a PR with that?
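For reference, a quick runnable check of the pitfall and this fix (the `parse_bool` wrapper is just an illustrative name for the inline expression above):

```python
# bool() on any non-empty string is True, regardless of its content
assert bool('False') is True
assert bool('True') is True

# Note: int('False') raises ValueError, so bool(int(...)) only works
# when the parameter is a '0'/'1' string, not 'True'/'False'.

def parse_bool(value):
    """Parse a 'True'/'False' string into an actual boolean."""
    return value.lower().strip() == 'true'

assert parse_bool('False') is False
assert parse_bool(' True ') is True
```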


mitchellspryn avatar mitchellspryn commented on June 20, 2024

The PR has been merged, so I'll close this issue.

