
Developed various model-based and model-free Intelligent and Naive algorithms for the beam balance environment in OpenAI Gym.


pole-balance-control-algorithms's Introduction

Balancing a pole on a dynamically moving cart.

Figure: Control system representation.

The task of balancing a pole on a cart is a classic problem in control engineering. The original model deals with a variety of constraints, including physical parameters like friction and centre of mass, but for the purpose of simulation we have to work with certain assumptions. We assume 1-D motion in which a pole is attached by an un-actuated joint to a cart that moves along a frictionless track. The system is controlled by applying a force of +1 or -1 units to the cart, i.e. pushing it either right or left. The pendulum starts upright, and the goal is to prevent it from falling over. The state of the system is represented by four values: cart position, cart velocity, pole angle, and the velocity of the tip of the pole. The episode ends when the pole is more than 15 degrees from vertical or the cart moves more than 2.4 units from the centre. Further, a reward of +1 is provided for every timestep that the pole remains upright.

In this repository, I have implemented this problem using various model-based and model-free algorithms. The simulation environment used is OpenAI Gym, which provides the data for the feedback loop and also aids in obtaining a rendered animation output. The different algorithms used are summarized below.

  1. Gaussian sampling - This method initializes the policy with completely random values drawn from a Gaussian. It is clearly observed that the average episode length is no longer than 7 steps. (A sketch of this baseline, together with method 2, follows this list.)

  2. Naive-Greedy Brute Force - The naive weight-allocation method uses a brute-force algorithm to find the best weights. The average episode length was found to be around 200. This approach iterates through 100 random weight vectors, which determine the action taken at each step, and saves the best weights obtained. It is basically a single-layer neural network without backpropagation, hence "naive".

  3. Further improvements were made by using a PID-controlled on-off controller with fixed weights. The graph obtained is oscillatory, as the control problem is regulatory in nature. The control block and the output obtained are shown below. The pole never falls! But it is quite obvious that this overfits the model and thus would not be able to control the pole in a dynamic environment. (A PID sketch also follows this list.)
    Figure: PID control block.

  4. In the fourth trial, I used a simple Q-learning model with an epsilon-greedy matrix for navigating the exploration-exploitation trade-off. (See the tabular Q-learning sketch after this list.)

  5. Finally, I also developed a Deep Q-Network (DQN) based system for balancing the pole, using a Boltzmann policy for action selection. I made use of the baselines library. (A sketch of the Boltzmann rule closes the examples below.)
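
A minimal sketch of methods 1 and 2, assuming the classic Gym API (step() returning (obs, reward, done, info)) and a linear policy over the four-element observation; the helper name run_episode and the environment id 'CartPole-v0' are illustrative, not necessarily the repository's code:

    import gym
    import numpy as np

    env = gym.make('CartPole-v0')

    def run_episode(env, weights):
        # One episode with a fixed linear policy: push right if weights . obs > 0.
        obs = env.reset()
        total_reward, done = 0, False
        while not done:
            action = 1 if np.dot(weights, obs) > 0 else 0
            obs, reward, done, _ = env.step(action)
            total_reward += reward
        return total_reward

    # Method 1 (Gaussian sampling): a single randomly drawn weight vector.
    print('random weights:', run_episode(env, np.random.randn(4)))

    # Method 2 (naive brute force): try 100 random weight vectors, keep the best.
    best_reward, best_weights = -1, None
    for _ in range(100):
        weights = np.random.randn(4)
        reward = run_episode(env, weights)
        if reward > best_reward:
            best_reward, best_weights = reward, weights
    print('best of 100 random weight vectors:', best_reward)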
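
A sketch of the PID-based on-off controller from method 3; the gains below are illustrative placeholders, not the fixed weights used in the notebooks:

    import gym

    env = gym.make('CartPole-v0')
    Kp, Ki, Kd = 1.0, 0.01, 0.5     # illustrative gains, not the repo's values

    obs = env.reset()
    integral, prev_error = 0.0, 0.0
    done, steps = False, 0
    while not done:
        error = obs[2]              # pole angle; the setpoint is 0 (upright)
        integral += error
        derivative = error - prev_error
        prev_error = error
        u = Kp * error + Ki * integral + Kd * derivative
        obs, reward, done, _ = env.step(1 if u > 0 else 0)  # on-off output
        steps += 1
    print('PID episode lasted', steps, 'steps')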
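
For method 4, a minimal tabular Q-learning sketch with epsilon-greedy exploration; the bucket edges and hyperparameters below are assumptions for illustration only:

    import gym
    import numpy as np

    env = gym.make('CartPole-v0')

    # Discretize the continuous 4-D observation into buckets for a Q-table.
    bins = [np.linspace(-2.4, 2.4, 9),    # cart position
            np.linspace(-3.0, 3.0, 9),    # cart velocity
            np.linspace(-0.21, 0.21, 9),  # pole angle
            np.linspace(-2.0, 2.0, 9)]    # pole tip velocity

    def discretize(obs):
        return tuple(int(np.digitize(x, b)) for x, b in zip(obs, bins))

    Q = {}
    def q_values(state):
        return Q.setdefault(state, np.zeros(env.action_space.n))

    alpha, gamma, epsilon = 0.1, 0.99, 0.1
    for episode in range(500):
        state, done = discretize(env.reset()), False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_values(state)))
            obs, reward, done, _ = env.step(action)
            next_state = discretize(obs)
            # Standard one-step Q-learning update.
            target = reward + gamma * np.max(q_values(next_state)) * (not done)
            q_values(state)[action] += alpha * (target - q_values(state)[action])
            state = next_state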
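
The DQN itself comes from the baselines library, but the Boltzmann (softmax) action selection used in method 5 can be sketched on its own; q_values here stands in for the network's output for one state:

    import numpy as np

    def boltzmann_action(q_values, temperature=1.0):
        # Sample an action with probability proportional to exp(Q / T).
        prefs = q_values / temperature
        prefs = prefs - prefs.max()          # shift for numerical stability
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return np.random.choice(len(q_values), p=probs)

    print(boltzmann_action(np.array([1.2, 0.8])))

Higher temperatures make the choice more uniform (more exploration); lower temperatures make it nearly greedy.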


To run the files

  • Clone the repo or download it and extract the files.
  • Install OpenAI Gym using the following command:
  pip install gym

Gym contains a collection of pre-defined environments, where we can run our algorithms.
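
For example, the environment used throughout this repository is created as follows (the exact version id, e.g. 'CartPole-v0', may differ in the notebooks):

    import gym
    env = gym.make('CartPole-v0')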

  • Training the network may take some time, so it is better to comment out env.render() before starting the training.
  • Gym also allows you to save the final rendered output files using its wrapper functions. The saved video files, weights, and other logs end up in your working directory. To wrap the environment so that the output files are saved:
 from gym import wrappers
 env = wrappers.Monitor(env, '..\Desktop\RL\MovieFiles', force=True)

Debug the environment

The environments that we import from Gym have user-friendly APIs. The following section is specific to Gym environments.

  • Getting the largest possible values in the state space
   print(env.observation_space.high)
  • Getting the smallest possible values in the state space
   print(env.observation_space.low)
  • Getting the number of possible actions
   print(env.action_space.n)
  • Viewing the animation
   env.render()
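
Putting these together, a quick sanity check of the environment might look like the following sketch (again assuming 'CartPole-v0' and the classic Gym API):

    import gym

    env = gym.make('CartPole-v0')
    print(env.observation_space.high)  # largest possible state values
    print(env.observation_space.low)   # smallest possible state values
    print(env.action_space.n)          # number of discrete actions

    obs = env.reset()
    for _ in range(100):
        env.render()                   # view the animation
        obs, reward, done, _ = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    env.close()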

