TMRL_MOSEAC

This project applies the idea of an elastic time step to reinforcement learning within the tmrl framework.

This repo accompanies the paper Reinforcement Learning with Elastic Time Step.

The tmrl framework establishes a communication link between the Trackmania game and your local machine through Openplanet.

img5

The tmrl framework is a powerful multi-threaded data transfer framework that supports multi-threaded training. However, Trackmania is not meant to run multi-threaded: one machine can only run one instance of the game at a time.

Therefore, our code is intended for single-threaded training on a Windows machine.

Prerequisites

  • Windows 10 / 11
  • Python >= 3.7
  • An NVIDIA GPU (required only if you plan to train your own AI; preferably of the Ampere architecture or newer, so that the latest driver and CUDA versions are supported)

If using Anaconda on Windows:

We recommend installing the conda version of pywin32:

conda install pywin32

Installation

The following instructions are for installing tmrl with support for the TrackMania 2023 video game.

You will first need to install TrackMania 2023 and also a small community-supported utility called Openplanet for TrackMania (the Gymnasium environment needs this utility to compute the reward).

Make sure you install the Visual C++ runtime before installing Openplanet. You can download it here for 64-bit versions of Windows.

Then you can install the tmrl framework with:

pip install tmrl
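To confirm that the installation landed in the Python environment you intend to train in, a small check like the following may help. It is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of tmrl.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("tmrl")
    if version is None:
        print("tmrl is not installed in this environment")
    else:
        print(f"tmrl {version} is installed")
```

If the script reports that tmrl is missing, the pip command above was probably run in a different environment (for example, outside the active conda environment).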

Deploy the code

Preparing

Once you have installed the tmrl framework, set up your own CUDA environment.

Install the latest NVIDIA driver from the official website. Then install the CUDA toolkit and the matching cuDNN version.

After that, install PyTorch following the instructions on its official website.

We tested our code with the following hardware and software:

  • NVIDIA RTX 4070
  • Driver version: 535.98
  • CUDA version: 11.8
  • cuDNN version: 8.7.0
  • PyTorch version: 2.1.0+cu118

Our code should also work with newer versions. This configuration is given for reference only.
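To compare your own setup against the versions listed above, you can query PyTorch directly. The snippet below is a minimal sketch; the function name `describe_cuda_env` is hypothetical, and it returns empty fields when PyTorch is not installed yet or no CUDA device is visible.

```python
def describe_cuda_env() -> dict:
    """Collect the PyTorch, CUDA, cuDNN, and GPU details visible to this interpreter."""
    info = {"torch": None, "cuda_available": False, "cuda": None, "cudnn": None, "gpu": None}
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed in this environment

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["cudnn"] = torch.backends.cudnn.version()  # integer, e.g. 8700 for 8.7.0
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_env().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False even though a GPU is present, the installed PyTorch build is likely CPU-only or does not match the installed driver.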

Install the requirements

Once your development environment is ready, you can download our code with:

git clone https://github.com/alpaficia/TMRL_MOSEAC

Then copy the config.json file to the tmrl config folder. It should be located at C:\Users\your_name\TmrlData\config.
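The copy can also be scripted. The following is a minimal sketch, assuming the TmrlData folder sits in your home directory as described above; the helper `install_config` and its backup behaviour are illustrative and not part of this repo or of tmrl.

```python
import shutil
from pathlib import Path


def install_config(src: Path, home: Path = Path.home()) -> Path:
    """Copy `src` to <home>/TmrlData/config/config.json and return the destination."""
    config_dir = home / "TmrlData" / "config"
    if not config_dir.is_dir():
        raise FileNotFoundError(f"tmrl config folder not found: {config_dir}")

    dest = config_dir / "config.json"
    if dest.exists():
        # Keep the previous configuration so it can be restored later.
        shutil.copy2(dest, config_dir / "config.json.bak")
    shutil.copy2(src, dest)
    return dest
```

For example, calling `install_config(Path("config.json"))` from the cloned repository folder copies the file into place and keeps the previous configuration as config.json.bak.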

Install the map and dependencies

Load the tmrl-test track into your TrackMania game:

  • Navigate to your home folder (C:\Users\username\), and open TmrlData\resources
  • Copy the tmrl-test.Map.Gbx file into ...\Documents\Trackmania\Maps\My Maps.

Next, install the required packages with:

cd Path_To_Your_Code
pip install -r requirement.txt

Launch the training code

Game preparation

  • Launch TrackMania 2023

  • In case the Openplanet menu is showing in the top part of the screen, hide it using the F3 key

  • Launch the tmrl-test track. This can be done by selecting create > map editor > edit a map > tmrl-test, selecting a map, and hitting the green flag.

  • Set the game to windowed mode. To do this, bring the cursor to the top of the screen, and a drop-down menu will show. Hit the window icon.

  • Bring the TrackMania window to the top-left corner of the screen. On Windows 10 / 11, it should automatically fit
    a quarter of the screen (NB: the window will automatically snap to the top-left corner and get sized properly when you start the training).

  • Hide the ghost by pressing the g key.

Start Training

Once the game is running, open a new terminal, cd to your code path, and run:

python MOSEAC_main.py

The training process is based on real-time data, so performance depends heavily on your PC's hardware.

If the log does not show many "time out" warnings, your PC is fast enough for the current time parameter settings.

If you see many "time out" warnings during training, your CPU IPC or RAM is not good enough for this training. By default, our control rate is set within [5, 30] Hz. You can change it with:

python MOSEAC_main.py --min_time=YOUR_VALUES --max_time=YOUR_VALUES

to match your hardware's performance.
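The two flags are durations rather than frequencies, so a control-rate range in Hz has to be inverted before it is passed in. The sketch below shows that arithmetic, assuming both flags are expressed in seconds (as the hyperparameter sheet in this README states); the helper name `rate_range_to_durations` is hypothetical.

```python
from typing import Tuple


def rate_range_to_durations(min_hz: float, max_hz: float) -> Tuple[float, float]:
    """Convert a control-rate range in Hz to (min_time, max_time) in seconds."""
    if not 0 < min_hz <= max_hz:
        raise ValueError("expected 0 < min_hz <= max_hz")
    min_time = 1.0 / max_hz  # the fastest rate gives the shortest step
    max_time = 1.0 / min_hz  # the slowest rate gives the longest step
    return min_time, max_time


min_time, max_time = rate_range_to_durations(5.0, 30.0)
print(f"--min_time={min_time:.4f} --max_time={max_time:.4f}")
# prints: --min_time=0.0333 --max_time=0.2000
```

If "time out" warnings persist, lowering the maximum rate (that is, raising min_time) gives each control step more time to complete.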

Our code was tested with an i5-13600K CPU, DDR4 3200 MHz RAM, and an NVIDIA RTX 4070 GPU. With the default parameters, the agent should be able to pass the first curve after around 8 hours of training.

More details and settings

The Trackmania game also provides lidar data. If you would like to do more, please refer to the repo here.

MOSEAC Hyperparameter Sheet for the above result

| Name | Value | Annotation |
| --- | --- | --- |
| Total steps | 2e7 | |
| $\gamma$ | 0.95 | Discount factor |
| Net shape | (256, 256) | |
| batch_size | 256 | |
| a_lr | 2e-4 | Learning rate of Actor Network |
| c_lr | 3e-4 | Learning rate of Critic Network |
| max_steps | 2000 | Maximum steps for one episode |
| $\alpha$ | 0.01 | |
| $\eta$ | -3 | Refer to SAC |
| min_time | 0.02 | Minimum control duration, in seconds |
| max_time | 0.5 | Maximum control duration, in seconds |
| $\alpha_{m}-max$ | 10.0 | Maximum value for $\alpha_{m}$ |
| $\alpha_{m}$ | 1.0 | Init value of $\alpha_{m}$ |
| $\psi$ | 1e-4 | Monotonically increasing H-parameter |
| Optimizer | Adam | Refer to Adam |
| environment steps | 1 | |
| Replaybuffer size | 1e5 | |
| Number of samples before training start | 5 * max_steps | |
| Number of critics | 2 | |
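For experiments that vary these settings, it can help to keep the sheet in one typed object. The following is a minimal sketch that mirrors the values above; the class and field names are hypothetical and do not necessarily match the names used in MOSEAC_main.py.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MoseacHyperparams:
    """Values copied from the hyperparameter sheet; names are illustrative only."""
    total_steps: int = int(2e7)
    gamma: float = 0.95               # discount factor
    net_shape: Tuple[int, int] = (256, 256)
    batch_size: int = 256
    a_lr: float = 2e-4                # actor learning rate
    c_lr: float = 3e-4                # critic learning rate
    max_steps: int = 2000             # maximum steps for one episode
    alpha: float = 0.01
    eta: float = -3.0
    min_time: float = 0.02            # minimum control duration, in seconds
    max_time: float = 0.5             # maximum control duration, in seconds
    alpha_m_max: float = 10.0         # maximum value for alpha_m
    alpha_m: float = 1.0              # initial value of alpha_m
    psi: float = 1e-4
    replay_buffer_size: int = int(1e5)
    num_critics: int = 2

    @property
    def warmup_samples(self) -> int:
        """Number of samples collected before training starts (5 * max_steps)."""
        return 5 * self.max_steps


hp = MoseacHyperparams()
print(hp.warmup_samples)  # prints: 10000
```

Making the dataclass frozen keeps a run's settings immutable; a variant can be created with `dataclasses.replace(hp, max_steps=1000)`, and `warmup_samples` follows automatically.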

Results

The training graphs are here:

img1

img2

The graph of consumption of steps is here:

img3

The graph of consumption of time is here:

img4

The video is available here.

License

MIT

Contact Information

Authors: Dong Wang, Giovanni Beltrame

You are also welcome to contact MISTLAB for more fun and practical robotics and AI-related projects and collaborations. :)

image6
