yhyu13 / alphagozero-python-tensorflow

Congratulations to DeepMind! This is a re-engineered implementation (drawing on several other repositories credited in /support/) of DeepMind's Oct 19th publication: [Mastering the Game of Go without Human Knowledge]. The supervised learning approach is more practical for individuals. (This repository is for educational purposes only.)

License: MIT License

Python 99.41% Shell 0.59%
deepmind supervised-learning python-tensorflow alphago-zero

alphagozero-python-tensorflow's Introduction

AlphaGOZero (python tensorflow implementation)

This is a trial implementation of DeepMind's Oct 19th publication: Mastering the Game of Go without Human Knowledge.

DeepMind has released AlphaZero Teaching Go. It's a lot of fun!


From Paper

Pure RL has outperformed the supervised-learning + RL agent

SL evaluation

Download trained model

  1. https://drive.google.com/drive/folders/1Xs8Ly3wjMmXjH2agrz25Zv2e5-yqQKaP?usp=sharing

  2. Place it under ./savedmodels/large20/


Set up

Install requirements

Python 3.6 and tensorflow/tensorflow-gpu version 1.4 (versions >= 1.5 cannot load the trained models)

pip install -r requirements.txt
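Because models trained here reportedly cannot be loaded by TensorFlow >= 1.5, it may help to pin the version explicitly. A hypothetical fragment (the actual package set in the repo's requirements file may differ):

```text
# requirements.txt — pin TensorFlow to the 1.4 line (hypothetical)
tensorflow==1.4.0
# or, for GPU support:
# tensorflow-gpu==1.4.0
```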

Download Dataset (kgs 4dan)

From the repo's root dir:

cd data/download
chmod +x download.sh
./download.sh

Preprocess Data

This is only an example; feel free to point it at your local dataset directory

python preprocess.py preprocess ./data/SGFs/kgs-*

Train A Model

python main.py --mode=train

Play Against An A.I.

python main.py --mode=gtp --gtp_policy=greedypolicy --model_path='./savedmodels/your_model.ckpt'

Play in Sabaki

  1. In console:
which python

     Add the result as the first line of main.py with a #! prefix (a shebang).

  2. Add the path of main.py to Sabaki's Manage Engines with the argument --mode=gtp
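The shebang step above can be scripted; a sketch (the interpreter path is an example — substitute your own `which python` output):

```shell
# Prepend a shebang pointing at the current Python interpreter,
# then mark main.py executable so Sabaki can launch it directly.
PY="$(which python)"                                   # e.g. /home/user/anaconda3/bin/python
printf '#!%s\n' "$PY" | cat - main.py > main.py.tmp && mv main.py.tmp main.py
chmod +x main.py
```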

TODO:

  • AlphaGo Zero Architecture
  • Supervised Training
  • Self Play pipeline
  • Go Text Protocol
  • Sabaki Engine enabled
  • Tabula rasa (failed)
  • Distributed learning

Credit (in no particular order):

  • Brian Lee
  • Ritchie Ng
  • Samuel Graván
  • 森下 健 (Ken Morishita)
  • yuanfengpang

alphagozero-python-tensorflow's People

Contributors

yhyu13


alphagozero-python-tensorflow's Issues

What does random AI mean?

Mentioned in the README:

Play Against An A.I. (currently only random A.I. is available)

What does random A.I. mean?

Go on!!

What you have done so far looks pretty good. Please keep going.

The uvloop module does not work on Windows

Collecting uvloop (from -r .\requirements.txt (line 6))
Using cached uvloop-0.8.1.tar.gz
RuntimeError: uvloop does not support Windows at the moment

Can uvloop be replaced?
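A common workaround (a sketch, not part of this repo) is to make the uvloop import optional, falling back to the default asyncio event loop on Windows or when uvloop is missing:

```python
import asyncio
import sys


def install_uvloop_if_available():
    """Install uvloop as the event loop policy where supported.

    Returns True if uvloop was installed, False on Windows or when
    uvloop is not importable (the stock asyncio loop is used instead).
    """
    if sys.platform == "win32":
        return False  # uvloop does not support Windows
    try:
        import uvloop
    except ImportError:
        return False
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return True
```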

ValueError: At least two variables have the same name: init/initial_bn/beta

Downloaded the trained model and ran:
python main.py --mode=gtp --model_path='./savedmodels/model-0.4114.ckpt'
which gives the error:

CRITICAL root: Traceback (most recent call last):
  File "main.py", line 231, in <module>
    fn[FLAGS.MODE]()
  File "main.py", line 226, in <lambda>
    'gtp': lambda: gtp(),
  File "main.py", line 69, in gtp
    engine = make_gtp_instance(flags=flags, hps=hps)
  File "/home/ly/src/lib/alphazero/AlphaGOZero-python-tensorflow/utils/gtp_wrapper.py", line 110, in make_gtp_instance
    n = Network(flags, hps)
  File "/home/ly/src/lib/alphazero/AlphaGOZero-python-tensorflow/Network.py", line 85, in __init__
    self.saver = tf.train.Saver(var_list=var_to_save, max_to_keep=10)
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1311, in __init__
    self.build()
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1320, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1357, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
    saveables = self._ValidateAndSliceInputs(names_to_saveables)
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 635, in _ValidateAndSliceInputs
    names_to_saveables = BaseSaverBuilder.OpListToDict(names_to_saveables)
  File "/home/ly/anaconda3/envs/learning/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 612, in OpListToDict
    name)
ValueError: At least two variables have the same name: init/initial_bn/beta
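The error means two graph variables map to the same checkpoint name. One generic workaround sketch (hypothetical — not the repo's actual fix) is to deduplicate the variable list by name before constructing the saver:

```python
def dedup_by_name(variables):
    """Keep only the first variable seen for each name, so a name-keyed
    saver (e.g. tf.train.Saver(var_list=...)) receives no duplicates."""
    seen = {}
    for v in variables:
        # v.name is e.g. 'init/initial_bn/beta:0' for a TF variable
        seen.setdefault(v.name, v)
    return list(seen.values())
```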

ChessAlpha Zero development

Hello, @mokemokechicken and @Yuhang.

As I promised, I've done (in just one day; I had no more time) an adaptation of the reversi-zero project @mokemokechicken did into a chess version: https://github.com/Zeta36/chess-alpha-zero

The project is already functional (in the sense that it doesn't fail and the three workers do their jobs), but unfortunately I have no GPU (just an Intel i5 CPU), nor money to spend on an AWS server or similar.
So I could only check the self-play with the toy config "--type mini". Moreover, I had to lower self.simulation_num_per_move = 2 and self.parallel_search_num = 2.

In this way I was able to generate the 100 games needed for the optimization worker to start. The optimization process seemed to work perfectly, and the model was able to reach a loss of ~0.6 after 1000 steps. I guess the model was able to overfit the 100 games from the earlier self-play.

Then I executed the evaluation process and it worked fine. The overfitted model was able to defeat the original random model 100% of the time (coincidence??).
Finally I checked the ASCII way to play against the best model. It worked as expected. To indicate our moves we have to use UCI notation: a1a2, b3b8, etc. More info here: https://chessprogramming.wikispaces.com/Algebraic+Chess+Notation

By the way, the model output is now of size 8128 (instead of the 64 of Reversi and the 362 of Go), and it corresponds to all possible legal UCI moves in a chess game. I generate these new labels in the config.py file.
I should also note that the board state (and the player turn) is tracked using FEN chess notation: https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation

Here is for example the FEN for the starting position: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 (w is for white to move).

I also changed the resign function a little. Chess is not like Go or Reversi, where you always finish the game in more or less the same number of moves. In chess the game can end in many ways (checkmate, stalemate, etc.), and a self-play game could run for more than 200 moves before reaching an ending position (usually a draw). So I decided to cut off play once one player has more than 13 points of material advantage (this score is computed as usual from the value of the pieces: the queen is worth 10, rooks 5.5, etc.).
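The 13-point cutoff described above could be computed straight from the FEN board field. A minimal sketch (queen = 10 and rook = 5.5 come from the text above; the pawn/knight/bishop values here are my assumption):

```python
# Queen and rook values as stated above; pawn/knight/bishop values
# are assumed for illustration.
PIECE_VALUES = {"p": 1.0, "n": 3.0, "b": 3.0, "r": 5.5, "q": 10.0, "k": 0.0}


def material_advantage(fen):
    """Return White's material advantage (positive = White ahead)
    computed from the piece-placement field of a FEN string."""
    board = fen.split()[0]  # e.g. 'rnbqkbnr/pppppppp/8/.../RNBQKBNR'
    score = 0.0
    for ch in board:
        if ch.isalpha():
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value  # uppercase = White
    return score
```

A self-play loop could then resign once `abs(material_advantage(fen)) > 13`.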

As you can imagine, with my poor machine I could not fully test the project beyond these tiny functionality tests. So I'd really appreciate it if you could take some free time on your GPUs to test this implementation more seriously. Both of you can of course be collaborators on the project if you wish.

Also, I don't know if I committed some theoretical bugs in this adaptation to chess, and I'd appreciate any comments from you in this sense too.

Best regards!!

Project Update

Hi,

My name is Yohan YU, the developer of this repository. Right now I'm able to run a Volta instance on the cloud; let's see how it goes. I will keep updating over the coming week.
