alpha-go-reversi

AlphaReversi implementation based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.

The algorithm learns to play games such as chess and Go purely through self-play, without any human knowledge. It combines Monte Carlo Tree Search (MCTS) with a deep residual network that evaluates the board state, and it plays the most promising move found by the search.
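As a rough illustration of how the search and the network work together, the sketch below shows the PUCT selection rule described in the AlphaGo Zero paper: each candidate move is scored by its mean value Q plus an exploration bonus U that is weighted by the network's prior. The function name `puct_select` and the flat-array layout are assumptions made for this example and are not taken from this repository's code.

```python
import numpy as np

def puct_select(priors, visit_counts, total_values, c_puct=1.0):
    """Return the index of the child that maximizes Q + U (hypothetical helper)."""
    priors = np.asarray(priors, dtype=float)
    visit_counts = np.asarray(visit_counts, dtype=float)
    total_values = np.asarray(total_values, dtype=float)

    # Mean action value Q; children that were never visited get Q = 0.
    q = np.divide(total_values, visit_counts,
                  out=np.zeros_like(total_values), where=visit_counts > 0)
    # Exploration bonus U: grows with the prior, shrinks as a child is visited.
    u = c_puct * priors * np.sqrt(visit_counts.sum()) / (1.0 + visit_counts)
    return int(np.argmax(q + u))
```

With a larger `c_puct` the bonus term dominates and rarely visited moves with a high prior are tried sooner; with `c_puct=0` the rule reduces to picking the move with the best mean value so far.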

Requirements

  • TensorFlow (Tested on 1.4.0)
  • NumPy
  • Python 3

Options:

  • --num_iterations: Number of iterations.
  • --num_games: Number of self play games played during each iteration.
  • --num_mcts_sims: Number of MCTS simulations per game.
  • --c_puct: The level of exploration used in MCTS.
  • --l2_val: The level of L2 weight regularization used during training.
  • --momentum: Momentum parameter for the momentum optimizer.
  • --learning_rate: Learning rate for the momentum optimizer.
  • --t_policy_val: Value for policy prediction.
  • --temp_init: Initial Temperature parameter to control exploration.
  • --temp_final: Final Temperature parameter to control exploration.
  • --temp_thresh: Threshold at which the temperature switches from --temp_init to --temp_final.
  • --epochs: Number of epochs during training.
  • --batch_size: Batch size for training.
  • --dirichlet_alpha: Alpha value for Dirichlet noise.
  • --epsilon: Value of epsilon for calculating Dirichlet noise.
  • --model_directory: Name of the directory to store models.
  • --num_eval_games: Number of self-play games to play for evaluation.
  • --eval_win_rate: Win rate a new model needs in evaluation to become the best model.
  • --load_model: Binary flag; when set, the network is initialized with the best model.
  • --human_play: Binary flag; when set, a human plays against the AI.
  • --resnet_blocks: Number of residual blocks in the ResNet.
  • --record_loss: Binary flag; when set, policy and value loss are recorded to a file.
  • --loss_file: Name of the file to record loss.
  • --game: Game to play, selected by number (0: Tic Tac Toe, 1: Othello).
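The exploration options above (--dirichlet_alpha, --epsilon, --temp_init, --temp_final, --temp_thresh) can be sketched as follows. This is a minimal sketch of the scheme described in the AlphaGo Zero paper, not this repository's code: the function names are hypothetical, and it assumes the temperature threshold is counted in moves and that the switch happens once the move number reaches the threshold.

```python
import numpy as np

def add_dirichlet_noise(priors, alpha, epsilon, rng):
    """Mix root priors with Dirichlet noise: (1 - eps) * p + eps * noise."""
    priors = np.asarray(priors, dtype=float)
    noise = rng.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * priors + epsilon * noise

def pick_temperature(move_number, temp_init, temp_final, temp_thresh):
    """Assumed schedule: temp_init before the threshold move, temp_final after."""
    return temp_init if move_number < temp_thresh else temp_final

def move_probabilities(visit_counts, temperature):
    """Turn MCTS visit counts N into move probabilities proportional to N^(1/T)."""
    counts = np.asarray(visit_counts, dtype=float)
    if temperature < 1e-3:
        # Near-zero temperature: put all probability on the most visited move.
        probs = np.zeros_like(counts)
        probs[np.argmax(counts)] = 1.0
        return probs
    scaled = counts ** (1.0 / temperature)
    return scaled / scaled.sum()

# Example: noisy root priors, then a move sampled from the visit counts.
rng = np.random.default_rng(0)
noisy_priors = add_dirichlet_noise([0.5, 0.3, 0.2], alpha=0.3, epsilon=0.25, rng=rng)
temperature = pick_temperature(move_number=3, temp_init=1.0, temp_final=0.0, temp_thresh=10)
probs = move_probabilities([12, 30, 8], temperature)
move = rng.choice(len(probs), p=probs)
```

Setting `epsilon=0` leaves the priors untouched, which matches a run with no root noise; a temperature near zero makes move selection deterministic.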

Saved models for Othello

  • models3a : n=10
  • models3b : n=30, epoch=50
  • models3c : n=100
  • models4a : epoch=10
  • models4b : epoch=30, cpuct=5, epsilon=0.25
  • models5a : cpuct=1
  • models5b : cpuct=20
  • models6a : epsilon=0
  • models6c : epsilon=0.5

Best parameters: n=100, cpuct=5, epoch=50, epsilon=0.25
