masoudslipknot / reinforcment_learning_valueiteration

Reinforcement learning project: value iteration implementation.

Java 100.00%
reinforcement-learning java value-iteration artificial-intelligence

reinforcment_learning_valueiteration's Introduction

Reinforcment_Learning_ValueIteration

Project Description:

The map below is a discretized world for an agent, which starts at random in any state except A and B. In each state the agent can make 4 moves: North, East, West, and South. Every move is stochastic: when the agent decides to go North, it actually goes North with probability 0.6, and each of the two moves perpendicular to North is taken with probability 0.2. If an action would take the agent off the map, the agent is punished with a reward of -1. If the agent reaches A or B, it receives a reward of 10 or 5, respectively, and is then directed to C and D.
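The movement rule above can be sketched as a small outcome table. This is a minimal illustration, not the repository's code: the class name `TransitionSketch`, the `Move` enum, and the `outcomes` helper are hypothetical names introduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TransitionSketch {
    public enum Move { NORTH, EAST, SOUTH, WEST }

    static final double P_INTENDED = 0.6; // probability that the chosen move happens
    static final double P_SIDE = 0.2;     // probability of each perpendicular move

    /** Returns the two moves perpendicular to the given one. */
    public static Move[] perpendicular(Move m) {
        return (m == Move.NORTH || m == Move.SOUTH)
                ? new Move[] { Move.EAST, Move.WEST }
                : new Move[] { Move.NORTH, Move.SOUTH };
    }

    /** Maps each move that can actually occur to its probability. */
    public static Map<Move, Double> outcomes(Move intended) {
        Map<Move, Double> dist = new LinkedHashMap<>();
        dist.put(intended, P_INTENDED);
        for (Move side : perpendicular(intended)) {
            dist.put(side, P_SIDE);
        }
        return dist;
    }

    public static void main(String[] args) {
        System.out.println(outcomes(Move.NORTH)); // prints {NORTH=0.6, EAST=0.2, WEST=0.2}
    }
}
```

The opposite move never occurs under this rule, so the three probabilities sum to 1.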

Map:

[Map](https://www.dropbox.com/s/lfr8508m4429zc8/Map.png?dl=0)

Result:

The result is a 2D array that states the policy for each state: if the agent is in that state, it is recommended to take the listed action.

Policy Map

This is the policy map when the reward for going out of the map is -1 and the gamma parameter is 0.1. According to the Bellman equation, increasing gamma makes the algorithm take longer to converge and yields larger utility values. Gamma is the discount factor: it shrinks the weight of future rewards, which also limits how much computation is needed.
-----------
|E|D|W|C|W|
-----------
|E|N|N|N|W|
-----------
|E|N|N|N|W|
-----------
|E|N|N|N|W|
-----------
|N|N|N|N|N|
-----------
Next, the policy map when the reward value for going out of the map is 10:
-----------
|E|D|W|C|W|
-----------
|E|N|W|N|W|
-----------
|E|N|N|N|W|
-----------
|E|N|N|N|W|
-----------
|N|N|N|N|N|
-----------
And when the reward value for going out of the map is 0, the policy map is as below:
-----------
|E|D|W|C|W|
-----------
|N|N|N|N|N|
-----------
|N|N|N|N|N|
-----------
|N|N|N|N|N|
-----------
|N|N|N|N|N|
-----------
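The remark about gamma (slower convergence and larger utilities as it grows) can be illustrated with two standard bounds for discounted value iteration. This is a sketch, not output from the project: `DiscountSketch`, `utilityBound`, and `sweepsNeeded` are hypothetical names, and the numbers are worst-case bounds rather than measurements of the author's code.

```java
public class DiscountSketch {
    /**
     * Upper bound on the magnitude of any utility when every one-step reward
     * has magnitude at most rMax. Requires 0 <= gamma < 1.
     */
    public static double utilityBound(double rMax, double gamma) {
        return rMax / (1.0 - gamma);
    }

    /**
     * Number of sweeps that guarantees the initial error shrinks by the
     * factor eps, because each sweep scales the error by at most gamma.
     * Requires 0 < gamma < 1 and 0 < eps < 1.
     */
    public static int sweepsNeeded(double gamma, double eps) {
        return (int) Math.ceil(Math.log(eps) / Math.log(gamma));
    }

    public static void main(String[] args) {
        for (double gamma : new double[] { 0.1, 0.5, 0.9 }) {
            System.out.printf("gamma=%.1f bound=%.2f sweeps=%d%n",
                    gamma, utilityBound(10.0, gamma), sweepsNeeded(gamma, 1e-6));
        }
    }
}
```

Both quantities grow as gamma approaches 1, which is consistent with the behaviour described for the policy maps.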

Algorithm :

This project uses Reinforcement Learning; specifically, it implements Dynamic Programming, one family of RL methods. The policy is derived with Value Iteration.
The code has the following methods:
1 - public void intialMap(int PosX, int PosY):
This method receives two integers giving the agent's current position and initializes the map.
2 - public void UpdateUtility(int Row,int column):
This method updates the utility of each position based on Value Iteration and the 4 possible moves.
3 - public double Northmove(int Row, int column) and 3 other actions:
These methods check whether the action is applicable in the given position and set the utility for the position.
4 - public void printmap():
This method prints the map.
5 - public void ValueIteration():
This method implements the Value Iteration algorithm.
6 - public static void agentrun():
Finally, this method first generates a random position for the agent and then runs the algorithm.
7 - The value of the discount factor can easily be changed in the code.
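
The methods listed above can be sketched compactly as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the class and method names are hypothetical, the grid is assumed to be 5x5 (as in the policy maps), the coordinates of the special cells are placeholders because the map image is not reproduced here, and an off-map move is assumed to leave the agent in place.

```java
public class ValueIterationSketch {
    static final int SIZE = 5; // assumed 5x5 grid, matching the policy maps
    static final int[][] DELTA = {{-1, 0}, {0, 1}, {1, 0}, {0, -1}}; // N, E, S, W
    static final char[] NAME = {'N', 'E', 'S', 'W'};
    static final double[] PROB = {0.6, 0.2, 0.2}; // intended move, then the two perpendicular moves

    // HYPOTHETICAL special cells: {row, col, reward, targetRow, targetCol}.
    static final int[][] SPECIAL = {{0, 1, 10, 4, 1}, {0, 3, 5, 2, 3}};

    static int[] special(int r, int c) {
        for (int[] s : SPECIAL) {
            if (s[0] == r && s[1] == c) return s;
        }
        return null;
    }

    /** Expected return of choosing action a in cell (r, c) under utilities u. */
    public static double qValue(double[][] u, int r, int c, int a, double gamma, double offMapReward) {
        int[] s = special(r, c);
        if (s != null) return s[2] + gamma * u[s[3]][s[4]]; // any action earns the reward and jumps
        int[] moves = {a, (a + 1) % 4, (a + 3) % 4}; // chosen move and its two perpendicular moves
        double q = 0.0;
        for (int i = 0; i < moves.length; i++) {
            int nr = r + DELTA[moves[i]][0];
            int nc = c + DELTA[moves[i]][1];
            boolean off = nr < 0 || nr >= SIZE || nc < 0 || nc >= SIZE;
            // Assumption: an off-map move is penalized and leaves the agent where it was.
            double reward = off ? offMapReward : 0.0;
            double next = off ? u[r][c] : u[nr][nc];
            q += PROB[i] * (reward + gamma * next);
        }
        return q;
    }

    /** Repeats Bellman backups until no utility changes by more than tol (needs gamma < 1). */
    public static double[][] run(double gamma, double offMapReward, double tol) {
        double[][] u = new double[SIZE][SIZE];
        double delta;
        do {
            delta = 0.0;
            double[][] next = new double[SIZE][SIZE];
            for (int r = 0; r < SIZE; r++) {
                for (int c = 0; c < SIZE; c++) {
                    double best = Double.NEGATIVE_INFINITY;
                    for (int a = 0; a < 4; a++) {
                        best = Math.max(best, qValue(u, r, c, a, gamma, offMapReward));
                    }
                    next[r][c] = best;
                    delta = Math.max(delta, Math.abs(best - u[r][c]));
                }
            }
            u = next;
        } while (delta > tol);
        return u;
    }

    /** Picks, for every cell, the action with the highest expected return (ties go to the first). */
    public static char[][] greedyPolicy(double[][] u, double gamma, double offMapReward) {
        char[][] policy = new char[SIZE][SIZE];
        for (int r = 0; r < SIZE; r++) {
            for (int c = 0; c < SIZE; c++) {
                int bestA = 0;
                for (int a = 1; a < 4; a++) {
                    if (qValue(u, r, c, a, gamma, offMapReward) > qValue(u, r, c, bestA, gamma, offMapReward)) {
                        bestA = a;
                    }
                }
                policy[r][c] = NAME[bestA];
            }
        }
        return policy;
    }

    public static void main(String[] args) {
        double gamma = 0.1; // discount factor; change it here to experiment
        double[][] u = run(gamma, -1.0, 1e-9);
        for (char[] row : greedyPolicy(u, gamma, -1.0)) {
            System.out.println(new String(row));
        }
    }
}
```

Because the special-cell coordinates are placeholders, the printed policy need not match the maps shown in the Result section. In the special cells all four actions have the same value, so the sketch prints N there.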

Author:

Masoud Erfani
Hope you enjoy it.
