frankiegu / alphagozero-python-tensorflow

This project forked from yhyu13/alphagozero-python-tensorflow

Congratulations to DeepMind! This is a reimplementation (drawing on many other git repos in /support/) of DeepMind's Oct 19th publication: [Mastering the Game of Go without Human Knowledge]. The supervised learning approach is more practical for individuals.

License: MIT License

Python 99.38% Shell 0.62%

alphagozero-python-tensorflow's Introduction

AlphaGOZero

This is a trial implementation of DeepMind's Oct 19th publication: Mastering the Game of Go without Human Knowledge.


Useful links:

All DeepMind’s AlphaGO games

GoGOD dataset, $15

KGS >=4dan, FREE

Youtube: Learn to play GO

repo: MuGo

repo: ROCAlphaGO

repo: miniAlphaGO

repo: resnet-tensorflow

repo: leela-zero (c++ AlphaGo Zero replica)

repo: reversi-alpha-zero (if you like reversi(黑白棋))

From Paper:

Our program, AlphaGo Zero, differs from AlphaGo Fan and AlphaGo Lee [12] in several important aspects. First and foremost, it is trained solely by self-play reinforcement learning, starting from random play, without any supervision or use of human data. Second, it only uses the black and white stones from the board as input features. Third, it uses a single neural network, rather than separate policy and value networks. Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte-Carlo rollouts. To achieve these results, we introduce a new reinforcement learning algorithm that incorporates lookahead search inside the training loop, resulting in rapid improvement and precise and stable learning.

Congratulations to DeepMind for pushing the frontier once again with AlphaGo Zero (trained entirely by self-play reinforcement learning, with no human game examples).

I started by reading the paper Mastering the Game of Go without Human Knowledge, only to find that I lacked background in Monte Carlo Tree Search (MCTS). I have tried my best to highlight what is interesting.

This version of AlphaGo uses a combined policy and value network (the final layers diverge into two branches) to improve training stability.
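For reference, the paper trains the two-headed network with a single loss that sums a value term, a policy term, and L2 regularisation. In the notation below, (p, v) are the policy and value outputs of the network with parameters θ, π is the search probability vector produced by MCTS, z is the game outcome from the current player's perspective, and c is the regularisation weight:

```latex
l = (z - v)^2 - \boldsymbol{\pi}^{\top} \log \mathbf{p} + c \, \lVert \theta \rVert^2
```

Because both heads share the residual tower, one backward pass through this loss updates the shared features for both objectives.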

Innovations in MCTS (temperature annealing and Dirichlet noise) enable exploration.
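A minimal sketch of these two exploration mechanisms, in plain Python and not taken from this repo. The function names are hypothetical. The defaults (epsilon = 0.25, Dirichlet alpha = 0.03) are the values reported in the paper, where the noise is added to the root priors as (1 - epsilon) * p + epsilon * eta and moves are sampled in proportion to N^(1/temperature).

```python
import random


def dirichlet_sample(alpha, n, rng=random):
    """Sample a symmetric Dirichlet(alpha) vector of length n via normalised Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total == 0.0:  # guard against floating-point underflow for tiny alpha
        return [1.0 / n] * n
    return [d / total for d in draws]


def add_root_noise(priors, epsilon=0.25, alpha=0.03, rng=random):
    """Mix the network priors at the root with Dirichlet noise."""
    noise = dirichlet_sample(alpha, len(priors), rng)
    return [(1 - epsilon) * p + epsilon * eta for p, eta in zip(priors, noise)]


def search_policy(visit_counts, temperature):
    """Convert MCTS visit counts into move probabilities.

    Assumes at least one move has been visited. A temperature near zero
    is treated as the greedy limit (all mass on the most visited move).
    """
    if temperature < 1e-3:
        best = max(range(len(visit_counts)), key=visit_counts.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(visit_counts))]
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]
```

In the paper the temperature is 1 for the first 30 moves of each self-play game and is then lowered towards zero, so early moves are sampled broadly and later moves are played almost greedily.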

Exploration in turn leads to learning increasingly complex moves, making games at the end of training (~70 h) both competitive and balanced.

The input is still raw stone positions, but the plain CNN has been replaced by a residual network.

Finally, pure RL has outperformed the supervised learning + RL agent.

AlphaGo Zero Architecture:

  • Input 19 x 19 x 17: 7 previous states + current state of the player's stones, 7 previous states + current state of the opponent's stones, and the player's colour

Convolutional Block

    1. A convolution of 256 filters of kernel size 3 x 3 with stride 1
    2. Batch normalisation
    3. A rectifier non-linearity
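The 17 input planes described above can be sketched as follows. This is an illustration under stated assumptions, not the repo's preprocessing code: `encode_planes` is a hypothetical helper, boards are plain nested lists holding 'B', 'W', or None, the planes are grouped as in the list above (8 for the player, 8 for the opponent, 1 for colour), and the result is channel-first (17 x 19 x 19) rather than 19 x 19 x 17. As in the paper, missing history is zero-padded and the colour plane is 1 when black is to play.

```python
BOARD_SIZE = 19
HISTORY = 8  # current position plus 7 previous ones


def encode_planes(history, to_play):
    """Build 17 binary planes from a list of boards, newest first.

    history: list of 19x19 nested lists with 'B', 'W', or None per point.
    to_play: 'B' or 'W', the colour of the player about to move.
    """
    opponent = 'W' if to_play == 'B' else 'B'

    def stones_of(board, colour):
        return [[1 if cell == colour else 0 for cell in row] for row in board]

    def empty_plane():
        return [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]

    planes = []
    for colour in (to_play, opponent):
        for t in range(HISTORY):
            if t < len(history):
                planes.append(stones_of(history[t], colour))
            else:
                planes.append(empty_plane())  # zero-pad before the game started

    # Constant colour plane: all ones if black is to play, all zeros otherwise.
    colour_value = 1 if to_play == 'B' else 0
    planes.append([[colour_value] * BOARD_SIZE for _ in range(BOARD_SIZE)])
    return planes
```

Encoding from the point of view of the player to move lets the same network weights serve both colours; the colour plane is still needed because komi makes the game asymmetric.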

Residual Blocks

    1. A convolution of 256 filters of kernel size 3 x 3 with stride 1
    2. Batch normalisation
    3. A rectifier non-linearity
    4. A convolution of 256 filters of kernel size 3 x 3 with stride 1
    5. Batch normalisation
    6. A skip connection that adds the input to the block
    7. A rectifier non-linearity
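The order of operations in steps 1 to 7 matters: the skip connection is added before the final rectifier. A minimal sketch on flat lists of floats, where the hypothetical `branch` callable stands in for steps 1 to 5 (conv, batch norm, rectifier, conv, batch norm) and is not a real convolution:

```python
def relu(values):
    """Element-wise rectifier."""
    return [max(0.0, v) for v in values]


def residual_block(x, branch):
    """Compute relu(x + branch(x)).

    branch: stand-in for conv -> batch norm -> relu -> conv -> batch norm.
    It must return a list with the same length as x.
    """
    out = branch(x)
    if len(out) != len(x):
        raise ValueError("branch must preserve the input shape")
    return relu([a + b for a, b in zip(x, out)])
```

If the branch outputs zeros, the block reduces to `relu(x)`, which is one common intuition for why deep stacks of such blocks remain trainable.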

Policy Head

    1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
    2. Batch normalisation
    3. A rectifier non-linearity
    4. A fully connected linear layer that outputs a vector of size 19^2 + 1 = 362 corresponding to logit probabilities for all intersections and the pass move
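To make the 362 outputs concrete, here is a small sketch of turning policy logits into probabilities and mapping an output index back to a move. The row-major layout with the pass move as the last index is an assumption for illustration; the paper does not fix an ordering, and the helper names are hypothetical.

```python
import math

BOARD_SIZE = 19
PASS_INDEX = BOARD_SIZE * BOARD_SIZE  # 361, the last of the 362 outputs


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def index_to_move(index):
    """Map a policy index to (row, col), or None for the pass move."""
    if index == PASS_INDEX:
        return None
    if not 0 <= index < PASS_INDEX:
        raise ValueError("index out of range for a 19 x 19 board")
    return divmod(index, BOARD_SIZE)
```

For example, `index_to_move(0)` is the corner `(0, 0)` and `index_to_move(361)` is the pass move under this layout.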

Value Head

    1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
    2. Batch normalisation
    3. A rectifier non-linearity
    4. A fully connected linear layer to a hidden layer of size 256
    5. A rectifier non-linearity
    6. A fully connected linear layer to a scalar
    7. A tanh non-linearity outputting a scalar in the range [-1, 1]
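As a sanity check on the layer sizes listed above, the trainable parameters can be counted with a few lines of arithmetic. The sketch rests on assumptions the text does not state: convolutions have no bias, each batch normalisation layer has a scale and an offset per channel, and fully connected layers have a bias. All function names are hypothetical.

```python
BOARD = 19
FILTERS = 256
INPUT_PLANES = 17


def conv_params(kernel, in_ch, out_ch):
    return kernel * kernel * in_ch * out_ch  # assumption: no bias term


def batch_norm_params(channels):
    return 2 * channels  # assumption: learned scale and offset


def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights plus bias


def conv_block_params():
    return conv_params(3, INPUT_PLANES, FILTERS) + batch_norm_params(FILTERS)


def residual_block_params():
    return 2 * (conv_params(3, FILTERS, FILTERS) + batch_norm_params(FILTERS))


def policy_head_params():
    flat = 2 * BOARD * BOARD  # two 19 x 19 feature maps, flattened
    return (conv_params(1, FILTERS, 2) + batch_norm_params(2)
            + dense_params(flat, BOARD * BOARD + 1))


def value_head_params():
    flat = BOARD * BOARD  # one 19 x 19 feature map, flattened
    return (conv_params(1, FILTERS, 1) + batch_norm_params(1)
            + dense_params(flat, 256) + dense_params(256, 1))


def total_params(n_blocks):
    return (conv_block_params() + n_blocks * residual_block_params()
            + policy_head_params() + value_head_params())
```

With 20 residual blocks, the value used by `--n_resid_units=20` in the training command in this README, `total_params(20)` comes to roughly 24 million under these assumptions, almost all of it in the residual tower.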

Set up

Install requirements

Python 3.6

pip install -r requirement.txt

Download Dataset (kgs 4dan)

Under repo's root dir

cd data/download
chmod +x download.sh
./download.sh

Preprocess Data

This is only an example; feel free to point it at your local dataset directory

python preprocess.py preprocess ./data/SGFs/kgs-*

Train A Model

python main.py --mode=train --force_save --n_resid_units=20

Play Against An A.I. (currently only random A.I. is available)

python main.py --mode=gtp --policy=random --model_path='./savedmodels/model--0.0.ckpt'

Basic Self-play

Under repo’s root dir

python utils/selfplay.py

Credit:

  • Brian Lee
  • Ritchie Ng

