yhyu13 / alphagozero-python-tensorflow

Congratulations to DeepMind! This is a re-engineered implementation (drawing on several other repositories credited in /support/) of DeepMind's Oct 19th publication: [Mastering the Game of Go without Human Knowledge]. The supervised learning approach is more practical for individuals. (This repository is for educational purposes only.)

License: MIT License

Python 99.41% Shell 0.59%
deepmind supervised-learning python-tensorflow alphago-zero

alphagozero-python-tensorflow's Introduction

AlphaGOZero (python tensorflow implementation)

This is a trial implementation of DeepMind's Oct 19th publication: Mastering the Game of Go without Human Knowledge.

DeepMind has released AlphaZero Teaching Go. It's a lot of fun!


From Paper

Pure RL has outperformed the supervised-learning + RL agent

SL evaluation

Download trained model

  1. https://drive.google.com/drive/folders/1Xs8Ly3wjMmXjH2agrz25Zv2e5-yqQKaP?usp=sharing

  2. Place it under ./savedmodels/large20/


Set up

Install requirements

Python 3.6 and tensorflow/tensorflow-gpu version 1.4 (versions >= 1.5 cannot load the trained models)

pip install -r requirements.txt
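Because models trained here reportedly cannot be loaded by TensorFlow >= 1.5, it may help to pin the version explicitly. A hypothetical fragment (the actual package set in the repo's requirements file may differ):

```text
# requirements.txt — pin TensorFlow to the 1.4 line (hypothetical)
tensorflow==1.4.0
# or, for GPU support:
# tensorflow-gpu==1.4.0
```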

Download Dataset (kgs 4dan)

From the repo's root dir:

cd data/download
chmod +x download.sh
./download.sh

Preprocess Data

This is only an example; feel free to point it at your local dataset directory

python preprocess.py preprocess ./data/SGFs/kgs-*

Train A Model

python main.py --mode=train

Play Against An A.I.

python main.py --mode=gtp --gtp_policy=greedypolicy --model_path='./savedmodels/your_model.ckpt'

Play in Sabaki

  1. In console:
which python

     Add the result as the first line of main.py with a #! prefix (a shebang).

  2. Add the path of main.py to Sabaki's Manage Engines with the argument --mode=gtp
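The shebang step above can be scripted; a sketch (the interpreter path is an example — substitute your own `which python` output):

```shell
# Prepend a shebang pointing at the current Python interpreter,
# then mark main.py executable so Sabaki can launch it directly.
PY="$(which python)"                                   # e.g. /home/user/anaconda3/bin/python
printf '#!%s\n' "$PY" | cat - main.py > main.py.tmp && mv main.py.tmp main.py
chmod +x main.py
```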

TODO:

  • AlphaGo Zero Architecture
  • Supervised Training
  • Self Play pipeline
  • Go Text Protocol
  • Sabaki Engine enabled
  • Tabula rasa (failed)
  • Distributed learning

Credit (in no particular order):

  • Brian Lee
  • Ritchie Ng
  • Samuel Graván
  • 森下 健 (Ken Morishita)
  • yuanfengpang

alphagozero-python-tensorflow's People

Contributors

yhyu13


alphagozero-python-tensorflow's Issues

What does random AI mean?

Mentioned in the README:

Play Against An A.I. (currently only random A.I. is available)

What does random A.I. mean?

Go on!!

What you have done so far looks pretty good. Please keep going.

The uvloop module does not work on Windows

Collecting uvloop (from -r .\requirements.txt (line 6))
Using cached uvloop-0.8.1.tar.gz
RuntimeError: uvloop does not support Windows at the moment

Can uvloop be replaced?
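A common workaround (a sketch, not part of this repo) is to make the uvloop import optional, falling back to the default asyncio event loop on Windows or when uvloop is missing:

```python
import asyncio
import sys


def install_uvloop_if_available():
    """Install uvloop as the event loop policy where supported.

    Returns True if uvloop was installed, False on Windows or when
    uvloop is not importable (the stock asyncio loop is used instead).
    """
    if sys.platform == "win32":
        return False  # uvloop does not support Windows
    try:
        import uvloop
    except ImportError:
        return False
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return True
```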

ValueError: At least two variables have the same name: init/initial_bn/beta

Downloaded the trained model and ran:
python main.py --mode=gtp --model_path='./savedmodels/model-0.4114.ckpt'
which gives the error:

CRITICAL root: Traceback (most recent call last):
  File "main.py", line 231, in <module>
    fn[FLAGS.MODE]()
  File "main.py", line 226, in <lambda>
    'gtp': lambda: gtp(),
  File "main.py", line 69, in gtp
    engine = make_gtp_instance(flags=flags, hps=hps)
  File "/home/ly/src/lib/alphazero/AlphaGOZero-python-tensorflow/utils/gtp_wrapper.py", line 110, in make_gtp_instance
    n = Network(flags, hps)
  File "/home/ly/src/lib/alphazero/AlphaGOZero-python-tensorflow/Network.py", line 85, in __init__
    self.saver = tf.train.Saver(var_list=var_to_save, max_to_keep=10)
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1311, in __init__
    self.build()
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1320, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1357, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
    saveables = self._ValidateAndSliceInputs(names_to_saveables)
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 635, in _ValidateAndSliceInputs
    names_to_saveables = BaseSaverBuilder.OpListToDict(names_to_saveables)
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 612, in OpListToDict
    name)
ValueError: At least two variables have the same name: init/initial_bn/beta
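The error means two graph variables map to the same checkpoint name. One generic workaround sketch (hypothetical — not the repo's actual fix) is to deduplicate the variable list by name before constructing the saver:

```python
def dedup_by_name(variables):
    """Keep only the first variable seen for each name, so a name-keyed
    saver (e.g. tf.train.Saver(var_list=...)) receives no duplicates."""
    seen = {}
    for v in variables:
        # v.name is e.g. 'init/initial_bn/beta:0' for a TF variable
        seen.setdefault(v.name, v)
    return list(seen.values())
```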

ChessAlpha Zero development

Hello, @mokemokechicken and @Yuhang.

As I promised, I've done (in just one day; I had no more time) an adaptation of the reversi-zero project @mokemokechicken did into a chess version: https://github.com/Zeta36/chess-alpha-zero

The project is already functional (in the sense that it doesn't fail and the three workers do their jobs), but unfortunately I have no GPU (just an Intel i5 CPU), nor money to spend on an AWS server or similar.
So I could only check the self-play with the toy config "--type mini". Moreover, I had to lower self.simulation_num_per_move = 2 and self.parallel_search_num = 2.

In this way I was able to generate the 100 games needed for the optimization worker to start. The optimization process seemed to work perfectly, and the model was able to reach a loss of ~0.6 after 1000 steps. I guess the model was able to overfit the 100 games from the earlier self-play.

Then I executed the evaluation process and it worked fine. The overfitted model was able to defeat the original random model 100% of the time (coincidence??).
Finally I checked the ASCII way to play against the best model. It worked as expected. To indicate our moves we have to use UCI notation: a1a2, b3b8, etc. More info here: https://chessprogramming.wikispaces.com/Algebraic+Chess+Notation

By the way, the model output is now of size 8128 (instead of the 64 of Reversi and the 362 of Go), and it corresponds to all possible legal UCI moves in a chess game. I generate these new labels in the config.py file.
I should also note that the board state (and the player turn) is tracked using FEN chess notation: https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation

Here is for example the FEN for the starting position: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 (w is for white to move).

I also changed the resign function a little. Chess is not like Go or Reversi, where you always finish the game in more or less the same number of moves. In chess the game can end in many ways (checkmate, stalemate, etc.), and a self-play game could run for more than 200 moves before reaching an ending position (usually a draw). So I decided to cut off play once one player has more than 13 points of material advantage (this score is computed as usual from the value of the pieces: the queen is worth 10, rooks 5.5, etc.).
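The 13-point cutoff described above could be computed straight from the FEN board field. A minimal sketch (queen = 10 and rook = 5.5 come from the text above; the pawn/knight/bishop values here are my assumption):

```python
# Queen and rook values as stated above; pawn/knight/bishop values
# are assumed for illustration.
PIECE_VALUES = {"p": 1.0, "n": 3.0, "b": 3.0, "r": 5.5, "q": 10.0, "k": 0.0}


def material_advantage(fen):
    """Return White's material advantage (positive = White ahead)
    computed from the piece-placement field of a FEN string."""
    board = fen.split()[0]  # e.g. 'rnbqkbnr/pppppppp/8/.../RNBQKBNR'
    score = 0.0
    for ch in board:
        if ch.isalpha():
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value  # uppercase = White
    return score
```

A self-play loop could then resign once `abs(material_advantage(fen)) > 13`.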

As you can imagine, with my poor machine I could not fully test the project beyond these tiny functionality tests. So I'd really appreciate it if you could take some free time on your GPUs to test this implementation more seriously. Both of you can of course be collaborators on the project if you wish.

Also, I don't know if I committed some theoretical bugs in this adaptation to chess, and I'd appreciate any comments from you in this sense too.

Best regards!!

Project Update

Hi,

My name is Yohan YU, the developer of this repository. Right now I'm able to run a Volta instance on the cloud; let's see how it goes. I will keep updating over the coming week.
