
fairy-stockfish / variant-nnue-pytorch


chess variant NNUE training code for Fairy-Stockfish

Home Page: https://github.com/fairy-stockfish/variant-nnue-pytorch/wiki/Introduction

CMake 0.29% Batchfile 0.07% Python 73.77% C++ 25.22% Shell 0.57% C 0.08%
board-game chess-variants nnue xiangqi shogi crazyhouse fairy-stockfish

variant-nnue-pytorch's Introduction

Fairy-Stockfish

Overview


Fairy-Stockfish is a chess variant engine derived from Stockfish, designed to support fairy chess variants and to be easily extensible with more games. It can play various regional, historical, and modern chess variants as well as games with user-defined rules. For compatibility with graphical user interfaces it supports the UCI, UCCI, USI, UCI-cyclone, and CECP/XBoard protocols.

The goal of the project is to create an engine supporting a large variety of chess-like games, equipped with the powerful search of Stockfish. Despite this generality, its playing strength is very high in almost all supported variants, and thanks to its multi-protocol support it works with almost any chess variant GUI.

Installation

You can download the Windows executable or Linux binary from the latest release or compile the program from source. The program comes without a graphical user interface, so you will probably want to use it together with a compatible GUI, or play against it online at pychess, lishogi, or lichess. Read more about how to use Fairy-Stockfish in the wiki.

If you want to preview the functionality of Fairy-Stockfish before downloading, you can try it out on the Fairy-Stockfish playground in the browser.

Optional NNUE evaluation parameter files that improve playing strength are available for many variants in the list of NNUE networks. For the regional variants Xiangqi, Janggi, and Makruk, dedicated releases with built-in NNUE networks are available. See the wiki for more details on NNUE.

Contributing

If you like this project, please support its development via patreon or paypal, by contributing CPU time to the framework for testing of code improvements, or by contributing to the code or documentation. An introduction to the code base can be found in the wiki.

Supported games

The games currently supported besides chess are listed below. Fairy-Stockfish can also play user-defined variants loaded via a variant configuration file, see the file src/variants.ini and the wiki.

Regional and historical games

Chess variants

Shogi variants

Related games

Help

See the Fairy-Stockfish Wiki for more info, or if the required information is not available, open an issue or join our discord server.

Bindings

Besides the C++ engine, this project also includes bindings for other programming languages, so that it can be used as a library for chess variants. The bindings support move, SAN, and FEN generation, as well as checking of game end conditions, for all variants supported by Fairy-Stockfish. Since they wrap the C++ code, they are much faster than libraries written directly in the respective target language.

Python

The Python binding pyffish, contributed by @gbtami, is implemented in pyffish.cpp. It is used, for example, in the backend of the pychess server.
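
For a flavor of the binding's API, here is a short, hedged sketch; the function names below are from memory of the pyffish interface and may differ between versions, so consult the pyffish documentation for the authoritative signatures:

    # Hedged sketch of typical pyffish usage; the functions used here
    # (start_fen, legal_moves, get_san, get_fen) are believed to match the
    # pyffish API but should be checked against its documentation.
    import pyffish as sf

    variant = "xiangqi"
    fen = sf.start_fen(variant)          # starting FEN for the variant
    moves = []                           # moves played so far, in UCI notation

    legal = sf.legal_moves(variant, fen, moves)     # all legal moves
    san = sf.get_san(variant, fen, legal[0])        # SAN of the first move
    new_fen = sf.get_fen(variant, fen, [legal[0]])  # FEN after playing it
    print(san, new_fen)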

Javascript

The JavaScript binding ffish.js, contributed by @QueensGambit, is implemented in ffishjs.cpp. The compilation/binding to JavaScript is done using emscripten; see the readme.

Ports

WebAssembly

For in-browser use, a port of Fairy-Stockfish to WebAssembly is available on npm. It is used, for example, for local analysis on pychess.org. Also see the Fairy-Stockfish WASM demo available at https://fairy-stockfish-nnue-wasm.vercel.app/.

Stockfish

Overview


Stockfish is a free, powerful UCI chess engine derived from Glaurung 2.1. Stockfish is not a complete chess program and requires a UCI-compatible graphical user interface (GUI) (e.g. XBoard with PolyGlot, Scid, Cute Chess, eboard, Arena, Sigma Chess, Shredder, Chess Partner or Fritz) in order to be used comfortably. Read the documentation for your GUI of choice for information about how to use Stockfish with it.

The Stockfish engine features two evaluation functions for chess, the classical evaluation based on handcrafted terms, and the NNUE evaluation based on efficiently updatable neural networks. The classical evaluation runs efficiently on almost all CPU architectures, while the NNUE evaluation benefits from the vector intrinsics available on most CPUs (sse2, avx2, neon, or similar).

Files

This distribution of Stockfish consists of the following files:

  • Readme.md, the file you are currently reading.

  • Copying.txt, a text file containing the GNU General Public License version 3.

  • AUTHORS, a text file with the list of authors for the project.

  • src, a subdirectory containing the full source code, including a Makefile that can be used to compile Stockfish on Unix-like systems.

  • a file with the .nnue extension, storing the neural network for the NNUE evaluation. Binary distributions will have this file embedded.

The UCI protocol and available options

The Universal Chess Interface (UCI) is a standard protocol used to communicate with a chess engine, and is the recommended way to do so for typical graphical user interfaces (GUIs) or chess tools. Stockfish implements the majority of its options as described in the UCI protocol.

Developers can see the default values for UCI options available in Stockfish by typing ./stockfish uci in a terminal, but the majority of users will typically see them and change them via a chess GUI. This is a list of available UCI options in Stockfish:

  • Threads

    The number of CPU threads used for searching a position. For best performance, set this equal to the number of CPU cores available.

  • Hash

    The size of the hash table in MB. It is recommended to set Hash after setting Threads.

  • Clear Hash

    Clear the hash table.

  • Ponder

    Let Stockfish ponder its next move while the opponent is thinking.

  • MultiPV

    Output the N best lines (principal variations, PVs) when searching. Leave at 1 for best performance.

  • Use NNUE

    Toggle between the NNUE and classical evaluation functions. If set to "true", the network parameters must be available to load from file (see also EvalFile), if they are not embedded in the binary.

  • EvalFile

    The name of the file of the NNUE evaluation parameters. Depending on the GUI the filename might have to include the full path to the folder/directory that contains the file. Other locations, such as the directory that contains the binary and the working directory, are also searched.

  • UCI_AnalyseMode

    An option handled by your GUI.

  • UCI_Chess960

    An option handled by your GUI. If true, Stockfish will play Chess960.

  • UCI_ShowWDL

    If enabled, show approximate WDL statistics as part of the engine output. These WDL numbers model expected game outcomes for a given evaluation and game ply for engine self-play at fishtest LTC conditions (60+0.6s per game).

  • UCI_LimitStrength

    Enable weaker play aiming for an Elo rating as set by UCI_Elo. This option overrides Skill Level.

  • UCI_Elo

    If enabled by UCI_LimitStrength, aim for an engine strength of the given Elo. This Elo rating has been calibrated at a time control of 60s+0.6s and anchored to CCRL 40/4.

  • Skill Level

    Lower the Skill Level in order to make Stockfish play weaker (see also UCI_LimitStrength). Internally, MultiPV is enabled, and with a certain probability depending on the Skill Level a weaker move will be played.

  • SyzygyPath

    Path to the folders/directories storing the Syzygy tablebase files. Multiple directories are to be separated by ";" on Windows and by ":" on Unix-based operating systems. Do not use spaces around the ";" or ":".

    Example: C:\tablebases\wdl345;C:\tablebases\wdl6;D:\tablebases\dtz345;D:\tablebases\dtz6

    It is recommended to store .rtbw files on an SSD. There is no loss in storing the .rtbz files on a regular HD. It is recommended to verify all md5 checksums of the downloaded tablebase files (md5sum -c checksum.md5) as corruption will lead to engine crashes.

  • SyzygyProbeDepth

    Minimum remaining search depth for which a position is probed. Set this option to a higher value to probe less aggressively if you experience too much slowdown (in terms of nps) due to tablebase probing.

  • Syzygy50MoveRule

    Disable to let fifty-move rule draws detected by Syzygy tablebase probes count as wins or losses. This is useful for ICCF correspondence games.

  • SyzygyProbeLimit

    Limit Syzygy tablebase probing to positions with at most this many pieces left (including kings and pawns).

  • Move Overhead

    Assume a time delay of x ms due to network and GUI overheads. This is useful to avoid losses on time in those cases.

  • Slow Mover

    Lower values will make Stockfish take less time in games, higher values will make it think longer.

  • nodestime

    Tells the engine to use nodes searched instead of wall time to account for elapsed time. Useful for engine testing.

  • Debug Log File

    Write all communication to and from the engine into a text file.
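
As an illustration of the UCI dialog described above, the following hedged Python sketch launches the engine in a subprocess, prints the options it reports, and sets two of them. The engine path is a placeholder, and only standard UCI commands quoted elsewhere on this page (uci, setoption, quit) are used:

    # Minimal sketch of driving the engine over UCI through a pipe.
    # "./stockfish" is a placeholder path to the engine binary.
    import subprocess

    engine = subprocess.Popen(
        ["./stockfish"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )

    def send(cmd):
        engine.stdin.write(cmd + "\n")
        engine.stdin.flush()

    send("uci")                          # ask the engine to identify itself
    while True:
        line = engine.stdout.readline().strip()
        if line.startswith("option"):
            print(line)                  # one line per UCI option and default
        if line == "uciok":
            break

    send("setoption name Threads value 4")
    send("setoption name Hash value 1024")
    send("quit")
    engine.wait()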

For developers the following non-standard commands might be of interest, mainly useful for debugging:

  • bench ttSize threads limit fenFile limitType evalType

    Performs a standard benchmark using various options. The signature of a version (standard node count) is obtained using all defaults. The current default is equivalent to bench 16 1 13 default depth mixed.

  • compiler

    Give information about the compiler and environment used for building a binary.

  • d

    Display the current position, with ASCII art and FEN.

  • eval

    Return the evaluation of the current position.

  • export_net [filename]

    Exports the currently loaded network to a file. If the currently loaded network is the embedded network and the filename is not specified then the network is saved to the file matching the name of the embedded network, as defined in evaluate.h. If the currently loaded network is not the embedded network (some net set through the UCI setoption) then the filename parameter is required and the network is saved into that file.

  • flip

    Flips the side to move.

A note on classical evaluation versus NNUE evaluation

Both approaches assign a value to a position that is used in alpha-beta (PVS) search to find the best move. The classical evaluation computes this value as a function of various chess concepts, handcrafted by experts, tested and tuned using fishtest. The NNUE evaluation computes this value with a neural network based on basic inputs (e.g. piece positions only). The network is optimized and trained on the evaluations of millions of positions at moderate search depth.

The NNUE evaluation was first introduced in shogi, and ported to Stockfish afterward. It can be evaluated efficiently on CPUs, and exploits the fact that only parts of the neural network need to be updated after a typical chess move. The nodchip repository provides additional tools to train and develop the NNUE networks. On CPUs supporting modern vector instructions (avx2 and similar), the NNUE evaluation results in much stronger playing strength, even if the nodes per second computed by the engine is somewhat lower (roughly 80% of nps is typical).
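
To illustrate the "efficiently updatable" idea: the first-layer accumulator is a sum of weight rows for the currently active features, so a typical move only needs to subtract the vacated feature and add the new one instead of recomputing the whole sum. The sketch below uses invented toy dimensions purely for clarity; it is not Stockfish's actual network layout:

    # Toy illustration of the incremental accumulator update behind NNUE.
    # Feature indices and sizes are made up; only the update pattern matters.
    import numpy as np

    NUM_FEATURES, ACC_SIZE = 45056, 256
    W = np.random.randn(NUM_FEATURES, ACC_SIZE)   # first-layer weights

    def full_refresh(active):
        # O(pieces): sum the weight rows of all active features
        return W[sorted(active)].sum(axis=0)

    def incremental_update(acc, removed, added):
        # O(1) per quiet move: subtract old feature rows, add new ones
        for f in removed:
            acc = acc - W[f]
        for f in added:
            acc = acc + W[f]
        return acc

    acc = full_refresh({10, 20, 30})
    acc = incremental_update(acc, removed=[20], added=[21])  # a piece moved
    assert np.allclose(acc, full_refresh({10, 21, 30}))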

Notes:

  1. the NNUE evaluation depends on the Stockfish binary and the network parameter file (see the EvalFile UCI option). Not every parameter file is compatible with a given Stockfish binary, but the default value of the EvalFile UCI option is the name of a network that is guaranteed to be compatible with that binary.

  2. to use the NNUE evaluation, the additional data file with neural network parameters needs to be available. Normally, this file is already embedded in the binary or it can be downloaded. The filename for the default (recommended) net can be found as the default value of the EvalFile UCI option, with the format nn-[SHA256 first 12 digits].nnue (for instance, nn-c157e0a5755b.nnue). This file can be downloaded from

https://tests.stockfishchess.org/api/nn/[filename]

replacing [filename] as needed.
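
For instance, a small Python one-off to fetch a net using this URL scheme; the filename below is just the example quoted above, so substitute the EvalFile default of your own binary:

    # Sketch: download a NNUE net via the Fishtest URL scheme given above.
    import urllib.request

    filename = "nn-c157e0a5755b.nnue"   # example name quoted in the text
    url = f"https://tests.stockfishchess.org/api/nn/{filename}"
    urllib.request.urlretrieve(url, filename)
    print("saved", filename)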

What to expect from the Syzygy tablebases?

If the engine is searching a position that is not in the tablebases (e.g. a position with 8 pieces), it will access the tablebases during the search. If the engine reports a very large score (typically 153.xx), this means it has found a winning line into a tablebase position.

If the engine is given a position to search that is in the tablebases, it will use the tablebases at the beginning of the search to preselect all good moves, i.e. all moves that preserve the win or preserve the draw while taking into account the 50-move rule. It will then perform a search only on those moves. The engine will not move immediately, unless there is only a single good move. The engine likely will not report a mate score, even if the position is known to be won.

It is therefore clear that this behaviour is not identical to what one might be used to with Nalimov tablebases. There are technical reasons for this difference, the main technical reason being that Nalimov tablebases use the DTM metric (distance-to-mate), while the Syzygy tablebases use a variation of the DTZ metric (distance-to-zero, zero meaning any move that resets the 50-move counter). This special metric is one of the reasons that the Syzygy tablebases are more compact than Nalimov tablebases, while still storing all information needed for optimal play and in addition being able to take into account the 50-move rule.

Large Pages

Stockfish supports large pages on Linux and Windows. Large pages make the hash access more efficient, improving the engine speed, especially on large hash sizes. Typical increases are 5..10% in terms of nodes per second, but speed increases up to 30% have been measured. The support is automatic. Stockfish attempts to use large pages when available and will fall back to regular memory allocation when this is not the case.

Support on Linux

Large page support on Linux is obtained by the Linux kernel transparent huge pages functionality. Typically, transparent huge pages are already enabled, and no configuration is needed.
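
A quick way to check the kernel setting is to read the standard sysfs file, as in this small sketch; the path is the usual Linux location, so adjust if your distribution differs:

    # Print the transparent huge page mode, e.g. "always [madvise] never".
    from pathlib import Path

    thp = Path("/sys/kernel/mm/transparent_hugepage/enabled")
    if thp.exists():
        print(thp.read_text().strip())
    else:
        print("transparent huge pages not available on this kernel")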

Support on Windows

The use of large pages requires "Lock Pages in Memory" privilege. See Enable the Lock Pages in Memory Option (Windows) on how to enable this privilege, then run RAMMap to double-check that large pages are used. We suggest that you reboot your computer after you have enabled large pages, because long Windows sessions suffer from memory fragmentation, which may prevent Stockfish from getting large pages: a fresh session is better in this regard.

Compiling Stockfish yourself from the sources

Stockfish has support for 32- or 64-bit CPUs, certain hardware instructions, big-endian machines such as PowerPC, and other platforms.

On Unix-like systems, it should be easy to compile Stockfish directly from the source code with the included Makefile in the folder src. In general it is recommended to run make help to see a list of make targets with corresponding descriptions.

    cd src
    make help
    make net
    make build ARCH=x86-64-modern

When not using the Makefile to compile (for instance, with Microsoft MSVC) you need to manually set/unset some switches in the compiler command line; see file types.h for a quick reference.

When reporting an issue or a bug, please tell us which Stockfish version and which compiler you used to create your executable. This information can be found by typing the following command in a console:

    ./stockfish compiler

Understanding the code base and participating in the project

Stockfish's improvement over the last decade has been a great community effort. There are a few ways to help contribute to its growth.

Donating hardware

Improving Stockfish requires a massive amount of testing. You can donate your hardware resources by installing the Fishtest Worker and viewing the current tests on Fishtest.

Improving the code

If you want to help improve the code, there are several valuable resources:

  • In this wiki, many techniques used in Stockfish are explained with a lot of background information.

  • The section on Stockfish in the Chess Programming Wiki describes many features and techniques used by Stockfish. However, it is generic rather than being focused on Stockfish's precise implementation. Nevertheless, it is a helpful resource.

  • The latest source can always be found on GitHub. Discussions about Stockfish take place these days mainly in the FishCooking group and on the Stockfish Discord channel. The engine testing is done on Fishtest. If you want to help improve Stockfish, please read this guideline first, where the basics of Stockfish development are explained.

Terms of use

Stockfish is free, and distributed under the GNU General Public License version 3 (GPL v3). Essentially, this means you are free to do almost exactly what you want with the program, including distributing it among your friends, making it available for download from your website, selling it (either by itself or as part of some bigger software package), or using it as the starting point for a software project of your own.

The only real limitation is that whenever you distribute Stockfish in some way, you MUST always include the full source code, or a pointer to where the source code can be found, to generate the exact binary you are distributing. If you make any changes to the source code, these changes must also be made available under the GPL.

For full details, read the copy of the GPL v3 found in the file named Copying.txt.

variant-nnue-pytorch's People

Contributors

ddobbelaere, dpldgr, glinscott, ianfab, kennyfrc, nelloho, niklasf, nodchip, nomoras, rainrat, sergiovieri, sopel97, vondele


variant-nnue-pytorch's Issues

Training Stops After Epoch 0

Hi, I am trying to get the NNUE trainer to work on an Ubuntu 20.04 cloud computer. To exclude variant-specific issues, I went for normal chess first.

However, the training always stops after epoch 0 without saving any checkpoint in the specified directory "variant-nnue-pytorch-master/save_230724/default/version_0/checkpoints". Any idea what's going on here?

(env) us@nvdiat4-1vcpu-3-75gb-15cent-ubuntu2004:~/variant-nnue-pytorch-master$ python3.6 train.py --default_root_dir "save_230724" --threads 1 --num-workers 1 --gpus 1 --max_epochs 10 chess_depth5_100mio.bin chess_depth5_100mio.bin
Feature set: HalfKAv2^
Num real features: 45056
Num virtual features: 768
Num features: 45824
Training with chess_depth5_100mio.bin validating with chess_depth5_100mio.bin
Global seed set to 42
Seed 42
Using batch size 16384
Smart fen skipping: True
Random fen skipping: 3
limiting torch to 1 threads.
Using log dir save_230724
/home/us/env/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:488: LightningDeprecationWarning: Argument period in ModelCheckpoint is deprecated in v1.3 and will be removed in v1.5. Please use every_n_epochs instead.
"Argument period in ModelCheckpoint is deprecated in v1.3 and will be removed in v1.5."
/home/us/env/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:433: UserWarning: ModelCheckpoint(save_last=True, save_top_k=None, monitor=None) is a redundant configuration. You can save the last checkpoint with ModelCheckpoint(save_top_k=None, monitor=None).
"ModelCheckpoint(save_last=True, save_top_k=None, monitor=None) is a redundant configuration."
ModelCheckpoint(save_last=True, save_top_k=-1, monitor=None) will duplicate the last checkpoint saved.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Using c++ data loader
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Ranger optimizer loaded.
Gradient Centralization usage = False

| Name | Type | Params

0 | input | DoubleFeatureTransformerSlice | 23.8 M
1 | layer_stacks | LayerStacks | 152 K

24.0 M Trainable params
0 Non-trainable params
24.0 M Total params
95.925 Total estimated model params size (MB)
Global seed set to 42
Epoch 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1283/1283 [02:02<00:00, 10.44it/s, loss=0.0129, v_num=0Killed


This is my installed python environment:

(env) us@nvdiat4-1vcpu-3-75gb-15cent-ubuntu2004:~$ python3.6 -m pip freeze
absl-py==1.4.0
aiohttp==3.8.4
aiosignal==1.2.0
async-timeout==4.0.2
asynctest==0.13.0
attrs==22.2.0
cachetools==4.2.4
certifi==2023.5.7
charset-normalizer==2.0.12
cmake==3.26.4
cupy-cuda101==9.6.0
cycler==0.11.0
dataclasses==0.8
fastrlock==0.8.1
frozenlist==1.2.0
fsspec==2022.1.0
future==0.18.3
google-auth==2.22.0
google-auth-oauthlib==0.4.6
grpcio==1.48.2
idna==3.4
idna-ssl==1.1.0
importlib-metadata==4.8.3
importlib-resources==5.4.0
install==1.3.5
kiwisolver==1.3.1
Markdown==3.3.7
matplotlib==3.3.4
multidict==5.2.0
numpy==1.19.5
oauthlib==3.2.2
packaging==21.3
Pillow==8.4.0
protobuf==3.19.6
pyasn1==0.5.0
pyasn1-modules==0.3.0
pyDeprecate==0.3.1
pyparsing==3.1.0
python-chess==0.31.4
python-dateutil==2.8.2
pytorch-lightning==1.4.9
PyYAML==6.0
requests==2.27.1
requests-oauthlib==1.3.1
rsa==4.9
six==1.16.0
tensorboard==2.10.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
torch==1.8.1+cu101
torchmetrics==0.7.3
tqdm==4.64.1
typing_extensions==4.1.1
urllib3==1.26.16
Werkzeug==2.0.3
yarl==1.7.2
zipp==3.6.0

(env) us@nvdiat4-1vcpu-3-75gb-15cent-ubuntu2004:~$ python3.6
Python 3.6.15 (default, Apr 25 2022, 01:55:53)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> torch.version.cuda
'10.1'

(env) us@nvdiat4-1vcpu-3-75gb-15cent-ubuntu2004:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243


The training data file was generated on another, cpu-only cloud machine:

us@c2-16vcpu-120gb-ubuntu2004-9cent:~$ ./fairy-stockfish-tools_x86-64-bmi2
Fairy-Stockfish 260422 by Fabian Fichter
uci
...
uciok
setoption name Use NNUE value false
setoption name Threads value 8
setoption name Hash value 2048
setoption name UCI_Variant value chess
lib/nnue_training_data_formats.h:
#define FILES 8
#define RANKS 8
#define PIECE_TYPES 6
#define PIECE_COUNT 32
#define POCKETS false
#define KING_SQUARES 64

variant.py:
RANKS = 8
FILES = 8
SQUARES = RANKS * FILES
KING_SQUARES = 64
PIECE_TYPES = 6
PIECES = 2 * PIECE_TYPES
USE_POCKETS = False
POCKETS = 2 * FILES if USE_POCKETS else 0

PIECE_VALUES = {
    1: 126,
    2: 781,
    3: 825,
    4: 1276,
    5: 2538,
}
info string variant chess files 8 ranks 8 pocket 0 template fairy startpos rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
isready
readyok
generate_training_data depth 5 count 100000000 random_multi_pv 4 random_multi_pv_diff 100 random_move_count 8 random_move_max_ply 20 write_min_ply 5 eval_limit 10000 set_recommended_uci_options data_format bin output_file_name chess_depth5_100mio.bin
INFO: Executing generate_training_data command
INFO: Parameters:

  • search_depth_min = 5
  • search_depth_max = 5
  • nodes = 0
  • count = 100000000
  • eval_limit = 10000
  • eval_diff_limit = 500
  • num threads (UCI) = 8
  • random_move_min_ply = 1
  • random_move_max_ply = 20
  • random_move_count = 8
  • random_move_like_apery = 0
  • random_multi_pv = 4
  • random_multi_pv_diff = 100
  • random_multi_pv_depth = 5
  • write_min_ply = 5
  • write_max_ply = 400
  • book =
  • output_file_name = chess_depth5_100mio.bin
  • save_every = 18446744073709551615
  • random_file_name = 0
  • write_drawn_games = 1
  • draw by low score = 1
  • draw by insuff. mat. = 1
info string classical evaluation enabled
INFO (sfen_writer): Creating new data file at chess_depth5_100mio.bin
PRNG::seed = cbbe2308f46191ed
........................................
200000 sfens, 12277 sfens/second, at Sat Jul 22 03:47:07 2023
...
100000000 sfens, 10813 sfens/second, at Sat Jul 22 06:18:04 2023

INFO: generate_training_data finished.
quit
us@c2-16vcpu-120gb-ubuntu2004-9cent:~$

Bug in pseudo-royal + custom piece variants training

As discussed in the Discord channel, there is a bug somewhere: the pseudo-royal commoner is not the last piece in the ordering whenever there is a custom piece on top.

Example of variant with the bug:

[allexplodeatomic:nocheckatomic]
pawn = -
customPiece1 = p:fmWfceFifmnD
pawnTypes = p

Example of the faulty behaviour (the custom piece [pawn] gets negative values):

NNUE derived piece values:
+-------+-------+-------+-------+-------+-------+-------+-------+
| r | n | b | q | k | b | n | r |
| -4.27 | -1.43 | -2.09 | -4.62 | | -1.87 | -1.69 | -4.34 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| p | p | p | p | p | p | p | p |
| +2.16 | +2.47 | +2.49 | +2.70 | +2.28 | +2.02 | +2.87 | +2.16 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| | | | | | | | |
| | | | | | | | |
+-------+-------+-------+-------+-------+-------+-------+-------+
| | | | | | | | |
| | | | | | | | |
+-------+-------+-------+-------+-------+-------+-------+-------+
| | | | | | | | |
| | | | | | | | |
+-------+-------+-------+-------+-------+-------+-------+-------+
| | | | | | | | |
| | | | | | | | |
+-------+-------+-------+-------+-------+-------+-------+-------+
| P | P | P | P | P | P | P | P |
| -2.01 | -2.69 | -2.53 | -2.80 | -2.20 | -1.65 | -2.93 | -2.22 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| R | N | B | Q | K | B | N | R |
| +3.99 | +1.97 | +2.61 | +4.24 | | +2.44 | +1.97 | +4.40 |
+-------+-------+-------+-------+-------+-------+-------+-------+

Sample variant.py for xiangqi

The variant.py in the master branch is for chess only.
Could you provide a sample for xiangqi training?
My guess is something like this (please correct any mistakes):

RANKS = 10
FILES = 9
SQUARES = RANKS * FILES
KING_SQUARES = RANKS * FILES
PIECE_TYPES = 7
PIECES = 2 * PIECE_TYPES
USE_POCKETS = False
POCKETS = 2 * FILES if USE_POCKETS else 0

PIECE_VALUES = {
    1: 200,   # copied from types.h in Fairy-Stockfish, SoldierValueMg
    2: 420,   # FersValueMg
    3: 300,   # ElephantValueMg
    4: 520,   # HorseValueMg
    5: 800,   # CannonPieceValueMg
    6: 1276,  # RookValueMg
}
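
One way to sanity-check such a config is to reproduce the "Num real/virtual features" lines the trainer prints at startup. Judging from the chess numbers quoted elsewhere on this page (45056 real and 768 virtual features for HalfKAv2^), the counts appear to follow KING_SQUARES * (PIECES - 1) * SQUARES and PIECES * SQUARES; the formulas below are inferred from those numbers, not taken from the trainer source, and ignore pockets:

    # Hedged sanity check for variant.py settings: reproduce the feature
    # counts the trainer prints. Formulas inferred from the chess log on
    # this page (45056 / 768); pockets are ignored.
    def feature_counts(ranks, files, king_squares, piece_types):
        squares = ranks * files
        pieces = 2 * piece_types
        real = king_squares * (pieces - 1) * squares  # kings share one plane
        virtual = pieces * squares
        return real, virtual

    print(feature_counts(8, 8, 64, 6))    # chess: (45056, 768)
    print(feature_counts(10, 9, 90, 7))   # xiangqi guess: (105300, 1260)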

Variant training support

  • Remove chess-specific logic in data loader
  • Test compatibility after simplifications
  • Generalize data loader (extended bin)
    • other board sizes
    • Pieces in hand
    • >5 non-king piece types
    • variants without kings
  • Generalize network architecture and training
    • Board sizes <8x8
    • Board sizes >8x8
    • Fairy pieces
    • >5 non-king piece types
    • Piece counts > 32
    • variants with pseudo-kings (atomic, extinction)
    • variants without kings
    • Pieces in hand

Segmentation Fault While Training For a Variant

The variant I'm using is:

[chessGC]
customPiece1 = p:mfFcfW
knight = n
bishop = b
rook = r
queen = q
king = k
promotedPieceType = p:q
promotionRank = 8
mandatoryPiecePromotion = true
pieceDrops = true
castling = false
startFen = rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w - - 3+3 0 1
checkCounting = true

This variant has the pawns moving diagonally and capturing forward, and it also has the three-check rule.
To generate data for this variant, I used the following commands:

uci
setoption name UCI_Variant value chessGC
setoption name PruneAtShallowDepth value false
setoption name Use NNUE value false
setoption name Threads value 6
setoption name Hash value 1024
isready
generate_training_data nodes 10 set_recommended_uci_options depth 8 keep_draws 1 eval_limit 3000 count 1000 output_file_name ./training_data/data.binpack

But it is showing the following error:

INFO: Executing generate_training_data command
INFO: Parameters:
  - search_depth_min       = 8
  - search_depth_max       = 8
  - nodes                  = 10
  - count                  = 1000
  - eval_limit             = 3000
  - eval_diff_limit        = 500
  - num threads (UCI)      = 1
  - random_move_min_ply    = 1
  - random_move_max_ply    = 24
  - random_move_count      = 5
  - random_move_like_apery = 0
  - random_multi_pv        = 0
  - random_multi_pv_diff   = 32000
  - random_multi_pv_depth  = 8
  - write_min_ply          = 16
  - write_max_ply          = 400
  - book                   = 
  - output_file_name       = ./training_data/data.binpack
  - save_every             = 18446744073709551615
  - random_file_name       = 0
  - write_drawn_games      = 1
  - draw by low score      = 1
  - draw by insuff. mat.   = 1
info string classical evaluation enabled
INFO (sfen_writer): Creating new data file at ./training_data/data.binpack
PRNG::seed = 1f4585ea784a7e06

INFO: generate_training_data finished.
Segmentation fault (core dumped)

How should I go about resolving this?

Add compiled file for Windows users

To ubdip:
Compiling the fast data loader on Windows is slow and complex.
If you like, I can upload my compiled file so that others can use it as well.

Unknown error when trying to train

Hi,

For some reason I cannot get the code to work, and it is not showing any clear errors. Could you help me figure out what is going on?

I am trying to create an NNUE file for a 10x10 variant.

Could it be that pytorch-lightning and pytorch are incompatible? Not sure why the assertion would fail :(

There are two extra pieces; would I have to code them into the code manually?

Sorry if these questions are simple, I'm trying my best to learn.

Thank you so much for your dedication, the chess engine world is thankful for all this amazing work.

ERROR

(siege) C:\Users\Kosmic\Desktop\variant-nnue-pytorch-master>python train.py --smart-fen-skipping --random-fen-skipping 3 --batch-size 16384 --threads 20 --num-workers 20 --gpus 1 C:\Users\Kosmic\Desktop\Variant\Validation-data\1mil9depth.bin C:\Users\Kosmic\Desktop\Variant\Validation-data\1mil12depth.bin
Feature set: HalfKAv2^
Num real features: 150000
Num virtual features: 1600
Num features: 151600
Training with C:\Users\Kosmic\Desktop\Variant\Validation-data\1mil9depth.bin validating with C:\Users\Kosmic\Desktop\Variant\Validation-data\1mil12depth.bin
Global seed set to 42
Seed 42
Using batch size 16384
Smart fen skipping: True
Random fen skipping: 3
limiting torch to 20 threads.
Using log dir logs/
C:\Users\Kosmic\anaconda3\envs\siege\Lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py:487: LightningDeprecationWarning: Argument period in ModelCheckpoint is deprecated in v1.3 and will be removed in v1.5. Please use every_n_epochs instead.
rank_zero_deprecation(
C:\Users\Kosmic\anaconda3\envs\siege\Lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py:432: UserWarning: ModelCheckpoint(save_last=True, save_top_k=None, monitor=None) is a redundant configuration. You can save the last checkpoint with ModelCheckpoint(save_top_k=None, monitor=None).
rank_zero_warn(
ModelCheckpoint(save_last=True, save_top_k=-1, monitor=None) will duplicate the last checkpoint saved.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Using c++ data loader
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Ranger optimizer loaded.
Gradient Centralization usage = False
Assertion failed: bits <= 6, file C:/Users/Kosmic/Desktop/variant-nnue-pytorch-master/lib/nnue_training_data_formats.h, line 662
Assertion failed: bits <= 6, file C:/Users/Kosmic/Desktop/variant-nnue-pytorch-master/lib/nnue_training_data_formats.h, line 662

| Name | Type | Params

0 | input | DoubleFeatureTransformerSlice | 78.8 M
1 | layer_stacks | LayerStacks | 152 K

79.0 M Trainable params
0 Non-trainable params
79.0 M Total params
315.939 Total estimated model params size (MB)
Validation sanity check: 0it [00:00, ?it/s]C:\Users\Kosmic\anaconda3\envs\siege\Lib\site-packages\pytorch_lightning\trainer\data_loading.py:105: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 20 which is the number of cpus on this machine) in the DataLoader init to improve performance.
rank_zero_warn(
Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]
(siege) C:\Users\Kosmic\Desktop\variant-nnue-pytorch-master>

Environment packages:

absl-py 1.4.0 pypi_0 pypi
aiohttp 3.8.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
annotated-types 0.5.0 pypi_0 pypi
ansicon 1.89.0 pypi_0 pypi
anyio 3.7.1 pypi_0 pypi
arrow 1.2.3 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
backoff 2.2.1 pypi_0 pypi
beautifulsoup4 4.12.2 pypi_0 pypi
blessed 1.20.0 pypi_0 pypi
bzip2 1.0.8 he774522_0
ca-certificates 2023.7.22 h56e8100_0 conda-forge
cachetools 5.3.1 pypi_0 pypi
certifi 2022.12.7 pypi_0 pypi
charset-normalizer 2.1.1 pypi_0 pypi
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
contourpy 1.1.0 pypi_0 pypi
croniter 1.4.1 pypi_0 pypi
cuda-version 11.8 h70ddcb2_2 conda-forge
cudatoolkit 11.8.0 h09e9e62_12 conda-forge
cupy 12.2.0 py311h77068d7_0 conda-forge
cycler 0.11.0 pypi_0 pypi
dateutils 0.6.12 pypi_0 pypi
deepdiff 6.3.1 pypi_0 pypi
fastapi 0.103.0 pypi_0 pypi
fastrlock 0.8.2 py311h12c1d0e_0 conda-forge
filelock 3.9.0 pypi_0 pypi
fonttools 4.42.1 pypi_0 pypi
frozenlist 1.4.0 pypi_0 pypi
fsspec 2023.6.0 pypi_0 pypi
future 0.18.3 pypi_0 pypi
google-auth 2.22.0 pypi_0 pypi
google-auth-oauthlib 1.0.0 pypi_0 pypi
grpcio 1.57.0 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
idna 3.4 pypi_0 pypi
inquirer 3.1.3 pypi_0 pypi
intel-openmp 2023.2.0 h57928b3_49496 conda-forge
itsdangerous 2.1.2 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
jinxed 1.2.0 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
libblas 3.9.0 17_win64_mkl conda-forge
libcblas 3.9.0 17_win64_mkl conda-forge
libffi 3.4.4 hd77b12b_0
libhwloc 2.9.1 h51c2c0f_0 conda-forge
libiconv 1.17 h8ffe710_0 conda-forge
liblapack 3.9.0 17_win64_mkl conda-forge
libxml2 2.10.4 h0ad7f3c_1
lightning 2.0.7 pypi_0 pypi
lightning-cloud 0.5.37 pypi_0 pypi
lightning-utilities 0.9.0 pypi_0 pypi
markdown 3.4.4 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.2 pypi_0 pypi
matplotlib 3.7.2 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mkl 2022.1.0 h6a75c08_874 conda-forge
mpmath 1.2.1 pypi_0 pypi
multidict 6.0.4 pypi_0 pypi
networkx 3.0 pypi_0 pypi
numpy 1.24.1 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
openssl 3.1.2 hcfcfb64_0 conda-forge
ordered-set 4.1.0 pypi_0 pypi
packaging 23.1 pypi_0 pypi
pillow 9.3.0 pypi_0 pypi
pip 23.2.1 py311haa95532_0
protobuf 4.24.2 pypi_0 pypi
psutil 5.9.5 pypi_0 pypi
pthreads-win32 2.9.1 hfa6e2cd_3 conda-forge
pyasn1 0.5.0 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
pydantic 2.1.1 pypi_0 pypi
pydantic-core 2.4.0 pypi_0 pypi
pydeprecate 0.3.1 pypi_0 pypi
pygments 2.16.1 pypi_0 pypi
pyjwt 2.8.0 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
python 3.11.4 he1021f5_0
python-chess 0.31.4 pypi_0 pypi
python-dateutil 2.8.2 pypi_0 pypi
python-editor 1.0.4 pypi_0 pypi
python-multipart 0.0.6 pypi_0 pypi
python_abi 3.11 2_cp311 conda-forge
pytorch-lightning 1.4.9 pypi_0 pypi
pytz 2023.3 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readchar 4.0.5 pypi_0 pypi
requests 2.28.1 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rich 13.5.2 pypi_0 pypi
rsa 4.9 pypi_0 pypi
setuptools 68.0.0 py311haa95532_0
six 1.16.0 pypi_0 pypi
sniffio 1.3.0 pypi_0 pypi
soupsieve 2.4.1 pypi_0 pypi
sqlite 3.41.2 h2bbff1b_0
starlette 0.27.0 pypi_0 pypi
starsessions 1.3.0 pypi_0 pypi
sympy 1.11.1 pypi_0 pypi
tbb 2021.9.0 h91493d7_0 conda-forge
tensorboard 2.14.0 pypi_0 pypi
tensorboard-data-server 0.7.1 pypi_0 pypi
tk 8.6.12 h2bbff1b_0
torch 2.0.1+cu118 pypi_0 pypi
torchaudio 2.0.2+cu118 pypi_0 pypi
torchmetrics 0.7.0 pypi_0 pypi
torchvision 0.15.2+cu118 pypi_0 pypi
tqdm 4.66.1 pypi_0 pypi
traitlets 5.9.0 pypi_0 pypi
typing-extensions 4.7.1 pypi_0 pypi
tzdata 2023c h04d1e81_0
ucrt 10.0.22621.0 h57928b3_0 conda-forge
urllib3 1.26.13 pypi_0 pypi
uvicorn 0.23.2 pypi_0 pypi
vc 14.2 h21ff451_1
vc14_runtime 14.36.32532 hfdfe4a8_17 conda-forge
vs2015_runtime 14.36.32532 h05e6639_17 conda-forge
wcwidth 0.2.6 pypi_0 pypi
websocket-client 1.6.2 pypi_0 pypi
websockets 11.0.3 pypi_0 pypi
werkzeug 2.3.7 pypi_0 pypi
wheel 0.38.4 py311haa95532_0
xz 5.4.2 h8cc25b3_0
yarl 1.9.2 pypi_0 pypi
zlib 1.2.13 h8cc25b3_0

Reinforcement learning on endgames results in bad opening and middlegame performance

Suppose I trained the model with some high-quality endgame training_data.bin using the following commands:

python serialize.py --features='HalfKAv2' xiangqi-6f64c55fcb28.nnue startingpointfortraining.pt 
python train.py --resume-from-model startingpointfortraining.pt --batch-size 8000 --threads 12 --num-workers 88 --gpus 1 --max_epochs 520 --lambda 0 training_data.bin validation_data.bin

After some tests, I found that the endgame performance has significantly improved, but at the same time the opening and middlegame performance has dropped dramatically.

Theoretically, the HalfKAv2 model has 8 buckets, so the training may significantly affect the buckets for the endgame but not the buckets for the opening and middlegame. The only possible explanation I could come up with is that the endgame training changes the weights and biases of the feature transformer layer, which leads to this bad result.

If my assumption is correct, what should I do to fix this issue? I.e., how do I improve the endgame performance without greatly affecting the opening and middlegame performance?

BTW, is there anything wrong with the command I'm using?

Add wall squares as an input

Just for documentation purposes.

"Perhaps they would be just one additional plane (not 1 per color as for piece types) in the NNUE"

Some questions about training NNUE.

May I ask how the following data generation processes differ in terms of the final NNUE playing strength?
A. 100M positions at depth 10
B. 100M positions at depth 20
C. 1B positions at depth 10
D. 1B positions at depth 20
Let me guess: technically speaking, D > B > C > A?

Using the latest NNUE file to generate new training data is like a bootstrapping process; will it cause a bootstrapping error problem? I.e., if the latest NNUE weights and biases have weaknesses in some situations, will they get amplified in later nets?

BTW, from your point of view, how much stronger is a final NNUE file trained at depth 20 than one trained at depth 10? What is the playing strength difference between 1B positions of training data generated with the classical evaluation at depth 20 and 1B positions generated with the latest NNUE net evaluation at depth 10?

Furthermore, suppose I can't hold 1B positions in RAM at a time, so I split them into 3 parts: 400M, 300M, 300M. How much playing strength difference do you think there will be between the following training processes?
A. Training on all 1B at a time for 400 epochs.
B. Training on 400M at a time for 400 epochs, then continuing with the next 300M positions for 400 epochs, and then with the remaining 300M for 400 epochs.

Finally, if I have caused you any inconvenience by taking your precious time to answer these questions, I am sorry to bother you these days. ( >︿< )
