
Introduction

This project was created from scratch using Python and NumPy for the game logic, Pygame to render the states, and PyTorch to train the AI. It has given satisfactory results with the now dated DQN algorithm.

Details:

These notes only give a rough overview of the project; more details about functions and variables can be found in the comments in the code.

Requirements

pytorch
numpy
matplotlib
pygame

Result:


white = zeros
dark gray = unexplored
Note: The game has been slowed down to 2 decisions per second

Game

Created the Minesweeper game from scratch using Python and NumPy. It can take up to 7,000 decisions per second when the renderer is off. The game component supports any grid size with any number of bombs.
Logic for the game is written in game.py
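
A minimal usage sketch is shown below. The class and method names (Game, reset, step) are hypothetical, chosen purely for illustration; see game.py for the actual interface.

```python
# Hypothetical usage sketch -- the class/method names below are illustrative
# only; the real interface lives in game.py.
from game import Game  # assumed class name

game = Game(width=6, height=6, num_bombs=6)   # any grid size / bomb count
state = game.reset()                          # fresh board state (numpy array)
state, reward, done = game.step(action=17)    # reveal the tile at index 17
```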

Renderer

The backend of the game is connected to a Pygame renderer. If the renderer is turned on, training slows down drastically. The color palette is taken from colorhunt.
Logic for the renderer is written in renderer.py

Playable version

I have implemented a clickable, playable version that incorporates both of the above in an object-oriented fashion, with a clicker and a solution display in the command prompt for debugging. Width, height, and bomb count can be modified in the class variables.
Logic for the playable version is written in playable.py
Just execute it from the command line to play it:

python playable.py


Playable version on a 20x20 grid with 20 mines

Reinforcement Learning

1. Deep Q learning

Deep Q learning takes in the state and tries to predict the action that will maximize the reward of the game. As opposed to traditional Q learning, where all possible states are stored in memory, Deep Q learning tries to learn a relation between the input state and the action, relieving the memory requirements of the algorithm.

I have implemented Dueling DQN with help from the RL Adventure repo

It takes roughly 20,000 x 4096 decisions to reach a win rate of roughly 60 percent on a 6x6 board with 6 mines.
Training on an i5-6200U took roughly 18 hours.

Logic for training is written in train_ddqn.py

python train_ddqn.py
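
For reference, the sketch below shows the general shape of a Dueling DQN in PyTorch: a shared feature extractor feeding separate value and advantage streams that are recombined into Q-values. The layer sizes and names are assumptions for illustration, not the exact architecture used in train_ddqn.py.

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Generic Dueling DQN sketch; layer sizes are illustrative only."""

    def __init__(self, n_inputs, n_actions, hidden=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Two streams: state value V(s) and per-action advantage A(s, a)
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, x):
        h = self.features(x)
        v = self.value(h)                        # (batch, 1)
        a = self.advantage(h)                    # (batch, n_actions)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)

# For a 6x6 board: 36 input cells and 36 possible tile actions
net = DuelingDQN(n_inputs=36, n_actions=36)
q_values = net(torch.zeros(1, 36))               # (1, 36) Q-values for a blank state
```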

2. Action Masking

Implemented action masking in the softmax layer of the model to prevent it from choosing invalid actions, such as tiles that have already been selected.
This modification helped the model learn faster compared to giving it a negative reward for choosing invalid actions.
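
A minimal sketch of the idea (not the exact code in this repo): the outputs for already-revealed tiles are pushed to negative infinity before the softmax/argmax, so they can never be selected.

```python
import torch

def masked_action(q_values, valid_mask):
    """Pick the best action among valid ones only.

    q_values:   (batch, n_actions) raw network outputs
    valid_mask: (batch, n_actions) bool, True for tiles not yet revealed
    """
    masked = q_values.masked_fill(~valid_mask, float("-inf"))
    probs = torch.softmax(masked, dim=1)   # invalid actions get probability 0
    return probs.argmax(dim=1)

# Example: 4 actions, tiles 0 and 2 already revealed
q = torch.tensor([[0.3, 0.1, 0.9, 0.2]])
mask = torch.tensor([[False, True, False, True]])
print(masked_action(q, mask))  # tensor([3]) -- the best *valid* action
```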

3. Gradient Clipping (Deprecated)

Added a clip to the gradients because vanishing and exploding gradients were producing NaN values after a few epochs of training.
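
The usual PyTorch pattern looks like the following; the model, loss, and clip value here are stand-ins for illustration, not the ones in train_ddqn.py.

```python
import torch
import torch.nn as nn

model = nn.Linear(36, 36)                                  # stand-in for the DQN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loss = model(torch.randn(8, 36)).pow(2).mean()             # placeholder loss

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm before stepping; 1.0 is an illustrative threshold
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```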

4. Soft Target Network update

Because of the soft target network update, gradient clipping is no longer required; the soft update resolved the loss jumping to very high values.
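
A minimal sketch of a soft (Polyak) target update; tau is a small blending factor and the value used here is illustrative.

```python
def soft_update(target_net, online_net, tau=0.005):
    """Blend online weights into the target network:
    theta_target <- tau * theta_online + (1 - tau) * theta_target
    Called after every training step instead of copying the whole network every N steps.
    """
    for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
        t_param.data.copy_(tau * o_param.data + (1.0 - tau) * t_param.data)
```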

5. Reward Based Epsilon Decay (RBED)

This implementation decays epsilon based on the reward of the model. Epsilon is decreased to a factor of itself (0.9x here) whenever a reward threshold is met, and the reward threshold is then increased by a certain amount. The process repeats, with a minimum cap of 0.01 on epsilon.
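
A minimal sketch of the decay rule described above; the threshold increment is an assumed example value.

```python
def rbed_update(epsilon, avg_reward, reward_threshold,
                decay=0.9, threshold_step=1.0, epsilon_min=0.01):
    """Reward Based Epsilon Decay: shrink epsilon only when the reward threshold is met."""
    if avg_reward >= reward_threshold:
        epsilon = max(epsilon * decay, epsilon_min)   # decay epsilon to 0.9x, floored at 0.01
        reward_threshold += threshold_step            # raise the bar for the next decay
    return epsilon, reward_threshold
```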

Testing

A simple Python file is provided for loading the pre-trained model, testing its win rate, and running a visualized test.
Logic for testing is written in tester.py

python tester.py

win_tester: checks the number of wins over 1000 games
slow_tester: visualizes a single game played by the AI while printing rewards

Plotting

A module is provided for reading the log file and plotting a smoothed, all-in-one normalized graph.
Note: the plotter can run in real time while the model is being trained, since it reads from "./Logs/ddqn_log.txt", which train_ddqn.py writes to in real time.

Static Plotter:

cd Logs
python plotter.py

Dynamic Plotter (refreshes every second):

cd Logs
python dynamic_plotter.py

Note:

  1. For both plotters, let at least 150 training steps run first, because the smoother needs a minimum of 150 steps (see the sketch below).
  2. The edge of the graph always falls off as a straight line because of the smoothing window. Don't panic.
  3. Everything is normalized to 0-1 just to make it easy to see at a glance whether the model is learning.
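
A minimal sketch of the kind of smoothing and 0-1 normalization described above (this is not the code in plotter.py; the 150-sample window matches the minimum mentioned in note 1, and the data here is synthetic):

```python
import numpy as np
import matplotlib.pyplot as plt

def smooth(values, window=150):
    """Simple moving average; needs at least `window` samples (see note 1)."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def normalize(values):
    """Scale to the 0-1 range so different metrics fit on one axis."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min() + 1e-8)

# Synthetic stand-in for one column of ./Logs/ddqn_log.txt
rewards = np.cumsum(np.random.randn(2000)) + np.linspace(0, 50, 2000)
plt.plot(normalize(smooth(rewards)), label="reward (smoothed, normalized)")
plt.legend()
plt.show()
```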


(Above) Normalized graph giving a rough overview of how the parameters increase and decrease.
(Below) Isolated graphs of the parameters on their original scales.



