
Code repository for the paper "Reinforcement Learning based Process Optimization and Strategy Development in Conventional Tunneling" by G.H. Erharter, T.F. Hansen, Z. Liu and T. Marcher

License: MIT License

Python 99.83% Shell 0.17%
tunnelling reinforcement-learning tunnel-excavation machine-learning excavation-sequence

tunnel-automation-with-reinforcement-learning-tunnrl-'s Introduction

Tunnel automation with Reinforcement Learning - TunnRL

This repository contains the code for the paper:

Reinforcement learning based process optimization and strategy development in conventional tunneling

by Georg H. Erharter, Tom F. Hansen, Zhongqiang Liu and Thomas Marcher

published in Automation in Construction (Vol. 127; July 2021)

DOI: https://doi.org/10.1016/j.autcon.2021.103701

The paper was published as part of a collaboration on Machine Learning between the Institute of Rock Mechanics and Tunnelling (Graz University of Technology) and the Norwegian Geotechnical Institute (NGI) in Oslo.

Requirements and folder structure

Use the requirements.txt file to install the packages required to run the code. We recommend using a package management system such as conda for this purpose.

Code and folder structure set up

The code framework depends on a certain folder structure. The Python files should be placed in the main directory. The setup should look as follows:

Reinforcement_Learning_for_Geotechnics
├── 02_plots
│   └── tmp
├── 04_checkpoints
├── 06_results
│   └── tmp
├── 00_main.py
├── 02_model_tester.py
├── 04_analyzer.py
├── A_utilities.py
├── B_generator.py
├── C_geotechnician.py
├── D_tunnel.py
└── E_plotter.py

Either set up the folder structure manually or on Linux run:

bash folder_structure.sh
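Alternatively, the folders can be created with a few lines of Python (a minimal sketch that mirrors the structure listed above; the file name make_folders.py is hypothetical and not part of the repository):

# make_folders.py -- hypothetical helper, equivalent in effect to folder_structure.sh
from pathlib import Path

for folder in ("02_plots/tmp", "04_checkpoints", "06_results/tmp"):
    Path(folder).mkdir(parents=True, exist_ok=True)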

Code description

  • 00_main.py ... the main execution script
  • 02_model_tester.py ... runs and tests individual checkpoints of an already trained model for further analysis
  • 04_analyzer.py ... file that analyzes and visualizes the performance of agents tested with 02_model_tester.py
  • A_utilities.py ... is a library containing useful functions that do not directly belong to the environment or the agent
  • B_generator.py ... part of the environment that generates a new geology for every episode
  • C_geotechnician.py ... part of the environment that evaluates the stability and also contains the RL agent itself
  • D_tunnel.py ... part of the environment that handles the rewards and updates the progress of the excavation
  • E_plotter.py ... plotting functionalities to visualize the training progress or render episodes

Pseudo-code for the utilized DQN algorithm

(inspired by Deeplizard)

  • A. Initialize the replay memory capacity (the replay memory de-correlates the otherwise sequentially correlated input)
  • B. Initialize the policy-ANN (which approximates the optimal Q-function) with random weights
  • C. Clone the policy-ANN to a second target-ANN that is used for computing $Q^*$ in $loss = Q^*(s,a) - Q(s,a)$
  • D. For each episode:
    1. Initialize the starting state (not resetting the weights)
    2. For each time step:
      • Select an action following an epsilon-greedy strategy (exploration vs. exploitation)
      • Execute the selected action in an emulator
      • Observe reward and next state
      • Store experience (a tuple of old-state, action, reward, new-state) in replay memory
      • Sample a random batch from replay memory
      • Preprocess all states (an array of values) from batch
      • Pass the batch of preprocessed states and next-states to the policy-ANN and the target-ANN and predict Q-values with both ANNs
      • Calculate the loss between the output Q-values of the policy-ANN and the target-ANN
      • Gradient descent with backpropagation updates the weights in the policy-ANN to minimize the loss; every xxx time steps, the weights in the target-ANN are updated with the weights from the policy-ANN (a minimal code sketch follows below)
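The following is a minimal, illustrative Python sketch of steps A-D. It is not the implementation used in C_geotechnician.py: the PyTorch networks, the DummyEnv stand-in environment and all hyperparameter values are assumptions made purely for illustration.

# Illustrative DQN skeleton following steps A-D above (assumptions: PyTorch is available;
# DummyEnv stands in for the actual tunnelling environment; hyperparameters are arbitrary).
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 3            # toy dimensions, chosen for illustration
GAMMA, EPSILON, BATCH_SIZE = 0.95, 0.1, 32
TARGET_SYNC = 100                      # "every xxx time steps"

class DummyEnv:
    # Stand-in environment so the sketch is runnable.
    def reset(self):
        self.t = 0
        return [0.0] * STATE_DIM
    def step(self, action):
        self.t += 1
        next_state = [random.random() for _ in range(STATE_DIM)]
        reward = 1.0 if action == 0 else 0.0
        return next_state, reward, self.t >= 20

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

env = DummyEnv()
replay_memory = deque(maxlen=10_000)   # A. initialize replay memory capacity
policy_net = make_net()                # B. policy-ANN with random weights
target_net = make_net()                # C. clone of the policy-ANN
target_net.load_state_dict(policy_net.state_dict())

optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
step = 0

for episode in range(10):              # D. for each episode
    state, done = env.reset(), False   # 1. initialize the starting state
    while not done:                     # 2. for each time step
        # epsilon-greedy action selection (exploration vs. exploitation)
        if random.random() < EPSILON:
            action = random.randrange(N_ACTIONS)
        else:
            with torch.no_grad():
                q_values = policy_net(torch.tensor(state, dtype=torch.float32))
            action = int(q_values.argmax())
        next_state, reward, done = env.step(action)                       # execute action, observe reward
        replay_memory.append((state, action, reward, next_state, done))   # store experience
        state = next_state

        if len(replay_memory) >= BATCH_SIZE:
            batch = random.sample(replay_memory, BATCH_SIZE)               # sample a random batch
            s, a, r, s2, d = (torch.tensor(x, dtype=torch.float32) for x in zip(*batch))
            q_policy = policy_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
            with torch.no_grad():                                          # Q* from the target-ANN
                q_target = r + GAMMA * target_net(s2).max(1).values * (1 - d)
            loss = loss_fn(q_policy, q_target)                             # loss between the two Q-values
            optimizer.zero_grad()
            loss.backward()                                                # backpropagation
            optimizer.step()

        step += 1
        if step % TARGET_SYNC == 0:                                        # periodic target-ANN update
            target_net.load_state_dict(policy_net.state_dict())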

References

Besides other references given in the paper, we especially want to highlight the Reinforcement Learning with Python tutorial series by Sentdex, which served as a basis for the agent in C_geotechnician.py.

tunnel-automation-with-reinforcement-learning-tunnrl-'s People

Contributors

geograz, tfha


tunnel-automation-with-reinforcement-learning-tunnrl-'s Issues

problems with videocapture

I have solved the path problem for Linux, but I now get this error message with the video capture. I couldn't find the cause of the error. It seems like there could be something wrong with the cv2 import. In addition to the error message below, I get the error shown in the attached image:

File "/home/tfha/projects/Reinforcement_Learning_for_Geotechnics/00_main.py", line 231, in
pltr.render_episode(Path('02_plots/tmp'), fps=2, x_pix=1680, y_pix=480,
File "/home/tfha/projects/Reinforcement_Learning_for_Geotechnics/E_plotter.py", line 294, in render_episode
out = cv2.VideoWriter(savepath, fourcc, fps, (x_pix, y_pix))
TypeError: VideoWriter() missing required argument 'frameSize' (pos 5)
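A possible cause (an assumption, not confirmed in this thread) is that the Python bindings of cv2.VideoWriter do not accept a pathlib.Path as the filename; OpenCV then tries to match the five-argument overload and complains about a missing frameSize. Converting the path to a string keeps the intended four-argument call, roughly as in this sketch (the output file name and codec are hypothetical; fps and frame size are taken from the traceback):

import cv2
from pathlib import Path

savepath = Path('02_plots/tmp') / 'render.avi'    # hypothetical output file
fourcc = cv2.VideoWriter_fourcc(*'XVID')          # codec choice is illustrative
out = cv2.VideoWriter(str(savepath), fourcc, 2, (1680, 480))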


code for use of the model

Maybe it's a silly question, but is 02_model_tester.py the code that should be used to run the model on new, unseen data, i.e. to predict the next action at each time step?

new rockmass in each episode?

What is the idea behind generating a new version of the rock mass each time? Is it to expose the agent to as many rock masses as possible? What would be the effect of initializing the same rock mass in each episode? Less generalization? Intuitively I thought that we should initialize the environment to the same start each time. But if this way works well (and it seems to :-)), I guess this is the best way.

Geology in figure 10

Have you tried adding R1 and R2 as a background in figure 10? This could help illustrate the actions chosen by the agent.


Code in fit method

Is there a reason why you haven't added the if-else part in the last section of the fit method, i.e. like Sentdex has done?


update_replay_memory - above maxsize?

What happens if you try to append to the replay memory but the memory is full? Could we get errors or start overwriting old values? More info: https://deeplizard.com/learn/video/PyQNfsGUnQA

def update_replay_memory(self, transition):
    self.replay_memory.append(transition) #TODO: check for size

Alternative:

def push(self, experience):
    if len(self.memory) < self.capacity:
        self.memory.append(experience)
    else:
        self.memory[self.push_count % self.capacity] = experience
    self.push_count += 1
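A third option (a sketch, not taken from the repository) is to let collections.deque enforce the capacity, so the oldest transition is discarded automatically once the buffer is full:

from collections import deque

class ReplayMemory:
    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry once the capacity is reached
        self.replay_memory = deque(maxlen=capacity)

    def update_replay_memory(self, transition):
        # transition = (old_state, action, reward, new_state)
        self.replay_memory.append(transition)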
