leelachesszero / lc0 Goto Github PK

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.

License: GNU General Public License v3.0

Batchfile 0.41% Shell 0.16% Meson 1.45% C++ 83.02% C 2.54% Cuda 3.11% Python 2.72% Dockerfile 0.03% Pawn 0.01% HLSL 1.63% NASL 0.99% Objective-C++ 3.47% Objective-C 0.46%

lc0's People

Contributors

Stargazers

Watchers

Forkers

borg323 cmcanavessi mooskagh gsobala frpays dubslow tilps killerducky exoticorn sashamn gonzalezjo fli ppdixit evalon32 prcuvu jnewlin12345 gbeauregard ron-wolf luiesa leedavid jffw ra1u reddenver cyanogenoid ganeshkrishnan1 pathosethoslogos cout-hello-world eraoul danieluranga cwbriscoe leela-xiangqi rudipo videodr0me ankan-ban rbatterman roy7 mardak jackthomson2 aysz88 larsjbro kostya potato-chess oscardssmith verccety zefresk uyhcire kiilas theanswer include-josh suzunohara ttl synthetica9 rohandalvi lucabrivio walidhenine2 martinre63 awesome-archive carlowood jjoshua2 kwccoin potrik98 adriod kmcrage almaudoh dantesenior gcp neiljain ghotrix msebi nblaxall shinbet hal2001 john45678 e00e wendazhou frelindb curicm molodiuc vochicong chessai kevinmel2000 szacer ovicenc snifferchess tjwebb jkormu tomaz2k vongolaximan putztzu monomial malcolmlewis dpedley mellekoning ftiannew wakamex cn4750 extrememoves deepankarsharma remiomosowon krishx

lc0's Issues

lc0 doesn't manages time well

There were reports that lc0 doesn't manage time well.

Solution will be for lc0 to look at it's misevaluations for previous move times and adjust (instead of having static slowmover).

--backend-opts parsing incorrectly

When --backend=multiplexing, the '-' in 'tensorflow-cpu' confuses the parser and it is trying to parse it as a number

e.g.

Georges-iMac-Pro-2:pgn george$ lc0-tensorflow --backend=multiplexing "--backend-opts=a(backend=tensorflow-cpu),b(backend=opencl)"
       _
|   _ | |
|_ |_ |_| built Jun  6 2018
isready
Found network file: ./weights.txt
error Unable to parse config at offset 20: a(backend=tensorflow-cpu),b(backend=opencl) (Unable to parse number)

Implement resign in lc0

From @mooskagh on May 27, 2018 16:13

Resign is not yet implemented in lc0. Implement that.

Copied from original issue: glinscott/leela-chess#674

Lc0 Wiki home page still has links to old glinscott wiki pages

Hi @killerducky

Thanks for moving the wiki to the new site. However there are still some remaining links on the new wiki that point to the (now empty glinscott wiki). For example, the new lc0 wiki Home page needs to link to the new lc0 Google Colab and lc0 Google Cloud pages instead of the old glinscott Google Colab and glinscott Google Cloud. Even better would be if the new wiki was available for anyone to edit (the old glinscott wiki anyone could edit)

Verify all backends support biases.

BN beta enabling in training (LeelaChessZero/lczero-training#4) will produce weight files that have non-zero biases. (As an alternative to requiring a weights file versioning. This is the same strategy that leela-zero go used.)

I've taken a look at the code and can see cuda backend merges biases into the BN means. Open Cl and Blas backends also appear to do the same thing. Cuda backend also sometimes folds the BN means back in to biases after that, but the transform appears sound at first glance.

I am however having a bit of difficulty finding where biases are handled in the tf backend.

Reduce memory footprint (Node structure is too large)

From @mooskagh on April 16, 2018 9:33

Plan is:

Remove ChessBoard from Node structure
Make leaves take less space
Consider swtiching to stockfish movegen, if result of two previous steps is slow.

Copied from original issue: glinscott/leela-chess#331

Remove the BLAS dependency from OpenCL backend.

This is to be investigated as it would make the build easier for AMD users.

There are 2 BLAS dependencies:

sgemm reference timing in the OpenCL tuning. Can it be removed ?
sgemv at the end of the inference. It can definitely be replaced by a for loop that the compiler will happily vectorize. This implementation will be marginally slower than the blas one, and is not critical for overall backend performance.

Implement power decay averaging

https://github.com/Videodr0me/leela-chess-experimental/wiki#power-decay-averaging-tree-search---backpropagate-gamma075

Very simple idea, easy to implement, shown to increase strength.

Not to be implemented immediately, but I'm filing this issue in order it not to be forgotten.

Disable tree reuse in training

lc0 backends scale unfavourably at low movetimes compared to lczero

I have performed a preliminary analysis of how various Leela backends perform with different NN sizes and at different movetimes.

None of the lc0 backends tested perform well at very short movetimes such as those which might be used for rapid comparative engine testing (i.e. fractions of a second per move). The original opencl backend for lczero does far better in this regard even though it is running on an AMD rather than an nVidia GPU.

One exception to poor performance at short movetimes with lc0 backends is the tensorflow-cpu lc0 backend which is exceptionally fast with 64x6 networks, but deteriorates very rapidly with increasing network size.

The lc0 opencl backend is slower at short movetimes than the lczero opencl backend although it ultimately overtakes it at longer movetimes.

The opencl backends (both lczero and lc0), are much slower than cudnn but scale much better with increasing network size.

It is possible that some of the observed effects are specific to Mac drivers, but the results suggest that there may be considerable room for optimisation of early search speed.

Complete the development of network_check

network_check is an indispensable tool for backend development as it checks that non regression in backends. But currently, it does the bare minimum. It checks opencl vs blas, at a given tolerance and a given rate.

add the ability to chose the backends through options,
add the ability to pass options to individual backends,
add the ability to select the tolerance,
add the ability to select the rate of checking,
output the max difference seen (whatever it breached the max tolerance or not).

Document build instructions for visual studio 2017 more completely.

Current instructions for visual studio 2017 based build appear to miss a few details that need changing in the build_cuda.cmd.

The following is my build_cuda.cmd. (Having installed visual studio 2017 community edition with the 14.11 compiler version option installed and using Cuda 9.1 since I don't have 9.2 installed yet and having installed cudnn into the same directory as cuda 9.1. I hear cuda 9.2 supports newer compiler versions, so it might be a bit simpler.)

rd /s build

call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64 -vcvars_ver=14.11
python "C:\Users<UsernameHere>\AppData\Local\Programs\Python\Python36\Scripts\meson.py" build --backend vs2017 --buildtype release ^
-Dcudnn_libdirs="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\lib\x64" ^
-Dcudnn_include="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\include" ^
-Ddefault_library=static

pause

cd build

"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\MSBuild.exe" ^
/p:Configuration=Release ^
/p:Platform=x64 ^
/p:PreferredToolArchitecture=x64 subprojects\zlib-1.2.11\[email protected]@@[email protected] ^
/filelogger

Pack more than one game per training games files

While building NNs, one significant issue is that each training game is gzipped into a single file. That means that 1 million training games to build an NN means 1 million files in the same folder. That is incredibly impractical. I would request this be changed to either 1000 or 10,000 games per file.

lc0 needs a version number so server can perform limiting to reject versions with known bugs or lacking features.

Current output includes the build date, but this doesn't tell us much about the code.

My thought is to add a version number to the id name string like lczero does. Whenever there are training affecting changes committed to the source control everyone uses to build it, this version number should be increased.

Fix memory leak

There are reports that the recent lc0 version leaks memory.

That's the issue for myself so that I don't forget about this issue.

Training game PGN bug

Summary: Some training games PGNs are wrong.

Example found here: http://testserver.lczero.org/game/5311612

Reproducible? With luck, I suppose.

Testserver links: Github link incorrect

http://testserver.lczero.org/ still references https://github.com/glinscott/leela-chess.

Continuing integration

Enable some sort of continuing integration, both testing PRs and generating new builds as needed.

lc0: print uci info more often

From @mooskagh on May 27, 2018 5:26

Currently lc0 prints updated uci info when bestmove/depth/seldepth is changed.
At higher depth it may take a while until something from that list is changed.
It makes sense to print uci info also on timer (every 27 seconds, for example).

Copied from original issue: glinscott/leela-chess#668

lc0 exceeds time limit

There are reports that lc0 exceeds time limit:

when go infinite, stop doesn't immediately stop
in time budget mode, it exceeds time budget.

Solution would be to stop in calling thread (in use case 1), and have a watchdog thread (in use case 2)

Implement benchmarks in lc0

From @mooskagh on May 4, 2018 11:38

Implement the following benchmarks in "new" lc0:

Perft (already implemented in tests)
Running backend computation in loop without MCTS
"nodes 130000" from starting position
Something which evaluates precision of NN evaluation by backend, but I don't know where to get ground truth from.

Copied from original issue: glinscott/leela-chess#527

Train value head against q at root after search and game result

Just an idea: Why not train the value head not only against the game result but against root q after search averaged with game result.

Shouldn't this alleviate somewhat the problems of value head learning? Or has this already been tested?

Informative Tournament Info

I made the tournament info more readable and included win%, Elo and LOS.

before:

tournamentstatus win 3 3 lose 5 3 draw 8 10

after - obviously not same tournament as above ;-):

tournamentstatus P1: +815 -825 =1213 Win: 49.82% Elo: -1.22 LOS: 40.25% P1-W: +473 -354 =600 P1-B: +342 -471 =613

code change in loop.cc:

void SelfPlayLoop::SendTournament(const TournamentInfo& info) {
  int winp1 = info.results[0][0] + info.results[0][1];
  int loosep1 = info.results[2][0] + info.results[2][1]; 
  int draws = info.results[1][0] + info.results[1][1];
  float perct=-1, elo=99999;
  float los = 99999; 
  if ((winp1+loosep1+draws)>0) perct = (((float)draws) / 2 + winp1) / (winp1 + loosep1 + draws);
  if ((perct < 1) && (perct > 0)) elo = -400 * log(1 / perct - 1) / log(10);
  if ((winp1 + loosep1)>0) los = .5 + .5 * std::erf((winp1 - loosep1) / std::sqrt(2.0*(winp1 + loosep1)));

  std::string res = "tournamentstatus";
  if (info.finished) res += " final";
  res += " P1: +" + std::to_string(winp1) + " -" + std::to_string(loosep1) + " =" + std::to_string(draws);
  if (perct > 0) {
 	  std::ostringstream oss;
	  oss << std::fixed<<std::setw(5) << std::setprecision(2) << (perct * 100) <<"%";
	  res += " Win: " + oss.str();
  }
  if (elo < 99998) {
	  std::ostringstream oss;
	  oss << std::fixed<< std::setw(5)<<std::setprecision(2) << (elo);
	  res += " Elo: " + oss.str();
  }
  if (los < 99998) {
  std::ostringstream oss;
  oss << std::fixed << std::setw(5) << std::setprecision(2) << (los * 100) << "%";
  res += " LOS: " + oss.str();
  }
  res += " P1-W: +" + std::to_string(info.results[0][0]) + " -" + std::to_string(info.results[2][0]) + " =" + std::to_string(info.results[1][0]);
  res += " P1-B: +" + std::to_string(info.results[0][1]) + " -" + std::to_string(info.results[2][1]) + " =" + std::to_string(info.results[1][1]);
  SendResponse(res);
}

Backpropagating game-theoretical values

Has it been considered to (recursively) determine whether nodes have been "solved" (as a natural by-product of the search as is)? See http://www.ru.is/~yngvi/pdf/WinandsB10a.pdf for an example.

Code assumes P=0 is invalid?

https://github.com/LeelaChessZero/lc0/blob/master/src/mcts/search.cc#L718
if (child->GetP() == 0.0f) continue;

I think this code is using P=0 to say that child is not valid/initialized? I don't think that's a safe assumption. There is another PR to handle the case when sum of all children P=0, but there can still be cases where individual children's P=0.

Edit: Now I think the point is to not waste CPU time on nodes that we never want to prefetch? Is that right?

Crash in multiplexed setup with blas, opencl and heavy threading

The testing of PR #87 did bring out a crash on MacOS (and MacOS only) under Multiplexed condition with OpenCL and BLAS, and heavy threading.

Only command line was causing the crash (between a few seconds and a few minutes) is the following:
./lc0 --backend=multiplexing "--backend-opts=a(backend=blas,threads=8),b(backend=opencl,threads=3)" -t 11 --minibatch-size=64

The crash was not happening every time immediately. You had to run up to 100k nodes (possibly in different runs) to experience it. Sometimes it went it in less a 1K nodes. But it was very replicable, but very random.

Today Gabby experienced same problem happened on master:

Here's my crash with the latest master, run with ./lc0 weights.txt --backend=multiplexing '--backend-opts=a(backend=blas,threads=4),b(backend=opencl,threads=3)' -t 7 --minibatch-size=64. This is not using the PR. https://ghostbin.com/paste/5vxbg

[He] was also running the 256x20 net by @logosgg at the time

Bootstraps or building NNs: gradual feeding or all at once?

I had a test to suggest, which I think might be very important for any future bootstraps and increase in net size. My question is this:

Suppose I have 1 million games. Is it better to feed the NN with all 1 million at once and then just adjust LR when needed?

Is it better to split those million into 10 blocks of increasing strength, and train the NN on each block exclusively and successively?

Leela Zero works with gradual increases, we know. And all bootstraps, as far as I know, work by feeding all the games at once. I have not seen anyone tell me, or even know, whether this might not make a profound difference.

Consider this after all: The games are not of uniform strength. There are the near random ones from the beginning, and the advanced ones much much later. By mixing them all the entire way, you are feeding the NN, even when it is mature, with near random games still. To my mind there is just no way that can be ideal.

Therefore my test is this: take that second test run that has already started at 10x128, and all the games that led to it, but rebuild the 10x128 bootstrap by feeding the games gradually, not all at once. See if the end result is the same strength (or similar) or it is stronger.

If feeding gradually has a strong impact on strength in a positive sense, this might be an important guide towards bootstraps in general.

Make parameter parsers validate input

From @mooskagh on April 29, 2018 20:6

(pretty straightforward, so if someone wants an easy task to start contributing, that's a good candidate)

Currently while command line / uci parameters have restrictions (min/max for numeric, list of choices for choice params), they are not checked. Validate them.

Copied from original issue: glinscott/leela-chess#470

Unexpected Behavior Change With Recent lc0 Versions

Using main net ID 421, with the latest versions of lc0 master (Mon, Jun 18), and the latest crem builds, run the following test position, which is a mate in 1.

1. b3 g6 2. Bb2 Nf6 3. g3 Bg7 4. Nf3 c5 5. c4 Nc6 6. Bg2 d6 7. O-O e5 8. Nc3 O-O 9. d3 Rb8 10. Qd2 a6 11. Ng5 h6 12. Nge4 Nh5 13. Nd5 f5 14. Nec3 f4 15. Kh1 Be6 16. Bf3 Nd4 17. Bxh5 gxh5 18. gxf4 Qh4 19. Ne4 exf4 20. Rg1 Kh7 21. Qa5 b6 22. Qxa6 Rf7 23. Nxd6 Rd7 24. Nxb6 Rxd6 25. Rxg7+ Kxg7 26. Qa7+ Bf7 27. Bxd4+ Kh7

At low node counts ~(0-1000), ID 421 correctly finds the mate in 1 (Qxf7#)
At higher node counts, ~>1000 nodes, ID 421 unfinds the mate in 1 (eventually settles on Qxb8)

On older versions of lc0 (~a few weeks old; I haven't tracked down the exact change at this time), ID 421 finds the mate in 1 and keeps it. This suggests recent changes to lc0 are causing the change.

LCZero v0.10 also finds the mate in 1 and keeps it fine.

I found this position because the incorrect analysis caused Leela to lose a machine match I was running (it moved into the mate-in-1 by playing 27. ...Kh7???).

Lc0 can't reuse the tree after castling moves

From @SashaMN on May 24, 2018 23:41

Using the latest version from next branch lc0 client on ubuntu 16.04 with cudnn on 5 gpus.
It looks like lc0 always drops the tree after every castling move. I observe this problem every game. Here is one log example.
Launch command:

./engines/lc0_new --slowmover=2.8 --tempdecay-moves=5 --threads=20 --weights=engines/weights_kb1-256x20-2100000.txt --cpuct=3.16836 --fpu-reduction=-0.0683163 --backend=multiplexing --nncache=20000000 "--backend-opts=a(backend=cudnn,gpu=0,max_batch=512),b(backend=cudnn,gpu=1,max_batch=512),c(backend=cudnn,gpu=2,max_batch=512),d(backend=cudnn,gpu=3,max_batch=512),e(backend=cudnn,gpu=4,max_batch=512)"

game_log.txt

Copied from original issue: glinscott/leela-chess#658

Make lc0 work efficiently on multi-core CPU (for NN inferrence)

That's planned to be hopefully done before TCEC.
Tracking issue so that I don't forget it.

Create a PR585 type variant for lc0

Either as a configuration parameter, or just as a never-to-submit PR which has it hard coded, implement a search option which gives every root child exactly one visit.
This should be able to be batched very efficiently, which might allow Aloril's spreadsheet to update PR585 columns much faster.

(As a stretch goal, support arbitrary depth alpha-beta search using value head, with policy head suggesting which child to search first to optimize the alpha-beta. This however requires a different get_best_child implementation and probably doesn't provide a massive increase in value over the simple depth1 case for testing purposes.)

Tablebase support

Perhaps the biggest feature remaining that lczero has that lc0 doesn't. Syzygy in particular

Class Boardsquare wrong squarenumbers in comment / Fen parser accepts illegal moves

Hi, when debugging some search mods and looking at the node structures and the moves, I noticed that this comment below is plain wrong:

class BoardSquare {
 public:
  constexpr BoardSquare() {}
  // As a single number, 0 to 63, bottom to top, left to right.
// 0 is a1, 8 is b1, 63 is h7.

actually 1 is b1 and 63 is h8. 55 would be h7

Also the fen parser does accept illegal moves:

position fen 6Q1/8/8/7k/8/8/3p1pp1/3Kbrrb w - - 0 1 g8g7 h5h4 g7g6 h5h3

but maybe its normal as stockfish also accepts this.

Improve training data for learning tactics

Porting to lc0 of lczero issues glinscott/leela-chess#698 and glinscott/leela-chess#699 using the same game for analysis:

CCLS SCTR vs id359 game 1

Trying to find Rxh4 https://clips.twitch.tv/NimbleLazyNewtPRChase:

position startpos moves d2d4 d7d5 c1f4 g7g6 e2e3 g8f6 c2c4 c7c5 d4c5 f8g7 b1c3 d8a5 c4d5 f6d5 d1d5 g7c3 b2c3 a5c3 e1e2 c3a1 f4e5 a1b1 e5h8 c8e6 d5d3 b1a2 e2f3 f7f6 h8g7 b8d7 f3g3 a8c8 c5c6 c8c6 d3d4 c6d6 d4b4 d6b6 b4h4 d7c5 h2h3 b6b2 g1e2 a2d5 g3h2 d5e5 e2g3 h7h5 h4d4 e5d4 e3d4 c5b3 g7h6 h5h4 g3e4 g6g5 f1d3 b3d4 h1a1 a7a6 e4c5 b2f2 d3e4 e6f5 e4b7 f2c2 a1a4 d4e2 c5e4 f5e4 b7e4 c2c1 e4d3 e2f4 d3a6 f4h5

Here's the history of networks from 364 going back 10 at a time and what they thought of the winning move Rxh4 / a4h4 (focus on V and P for now):

id364 a4h4  (666 ) N:     759 (+ 0) (V: -36.58%) (P:  0.47%) (Q:  0.76777) (U: 0.00021) (Q+U:  0.76798) 
id354 a4h4  (666 ) N:     801 (+ 0) (V: -41.09%) (P:  0.23%) (Q:  0.78342) (U: 0.00010) (Q+U:  0.78353) 
id344 a4h4  (666 ) N:     747 (+ 0) (V: -43.49%) (P:  0.15%) (Q:  0.76876) (U: 0.00007) (Q+U:  0.76883) 
id334 a4h4  (666 ) N:     760 (+ 0) (V: -39.09%) (P:  0.11%) (Q:  0.77163) (U: 0.00005) (Q+U:  0.77168) 
id324 a4h4  (666 ) N:     752 (+ 0) (V: -40.29%) (P:  0.21%) (Q:  0.76674) (U: 0.00010) (Q+U:  0.76683) 
id314 a4h4  (666 ) N:     725 (+ 0) (V: -40.58%) (P:  0.18%) (Q:  0.76648) (U: 0.00009) (Q+U:  0.76656) 
id304 a4h4  (666 ) N:     779 (+ 0) (V: -46.39%) (P:  0.18%) (Q:  0.76410) (U: 0.00008) (Q+U:  0.76418) 
id294 a4h4  (666 ) N:     812 (+ 0) (V: -44.82%) (P:  0.17%) (Q:  0.76923) (U: 0.00007) (Q+U:  0.76930) 
id284 a4h4  (666 ) N:     756 (+ 0) (V:  -3.65%) (P:  0.29%) (Q:  0.76365) (U: 0.00013) (Q+U:  0.76379) 
id274 a4h4  (666 ) N:     775 (+ 0) (V:  16.01%) (P:  0.25%) (Q:  0.78128) (U: 0.00011) (Q+U:  0.78139) 
id264 a4h4  (666 ) N:     708 (+ 0) (V: -47.48%) (P:  0.17%) (Q:  0.74895) (U: 0.00008) (Q+U:  0.74903) 
id254 a4h4  (666 ) N:     721 (+ 0) (V: -19.61%) (P:  0.25%) (Q:  0.73499) (U: 0.00012) (Q+U:  0.73511) 
id244 a4h4  (666 ) N:     718 (+ 0) (V: -34.57%) (P:  0.23%) (Q:  0.69746) (U: 0.00011) (Q+U:  0.69756) 
id234 a4h4  (666 ) N:     750 (+ 0) (V: -21.15%) (P:  0.58%) (Q:  0.68519) (U: 0.00027) (Q+U:  0.68546) 
id224 a4h4  (666 ) N:     762 (+ 0) (V: -16.51%) (P:  0.38%) (Q:  0.73801) (U: 0.00017) (Q+U:  0.73818) 
id214 a4h4  (666 ) N:     745 (+ 0) (V: -13.26%) (P:  0.40%) (Q:  0.74920) (U: 0.00018) (Q+U:  0.74938) 
id204 a4h4  (666 ) N:     729 (+ 0) (V: -28.40%) (P:  0.31%) (Q:  0.53719) (U: 0.00015) (Q+U:  0.53734) 
id194 a4h4  (666 ) N:     741 (+ 0) (V:  -3.47%) (P:  0.51%) (Q:  0.74044) (U: 0.00024) (Q+U:  0.74068) 
id184 a4h4  (666 ) N:     745 (+ 0) (V: -23.81%) (P:  0.44%) (Q:  0.72124) (U: 0.00020) (Q+U:  0.72144) 
id174 a4h4  (666 ) N:     724 (+ 0) (V:  16.63%) (P:  0.25%) (Q:  0.66271) (U: 0.00012) (Q+U:  0.66283) 
id164 a4h4  (666 ) N:     715 (+ 0) (V:  -5.59%) (P:  0.91%) (Q:  0.65085) (U: 0.00043) (Q+U:  0.65129) 
id154 a4h4  (666 ) N:     711 (+ 0) (V: -21.75%) (P:  0.57%) (Q:  0.62195) (U: 0.00027) (Q+U:  0.62222) 
id144 a4h4  (666 ) N:     723 (+ 0) (V: -28.62%) (P:  0.59%) (Q:  0.62137) (U: 0.00028) (Q+U:  0.62165)
id134 a4h4  (666 ) N:     492 (+ 0) (V: -47.86%) (P:  0.61%) (Q:  0.53013) (U: 0.00042) (Q+U:  0.53055) 
id124 a4h4  (666 ) N:     636 (+ 0) (V: -37.09%) (P:  0.39%) (Q:  0.62892) (U: 0.00021) (Q+U:  0.62912)

Generally, the prior for this winning move is very low at under 1%, and the value is also unfavorable for white, so search will normally avoid it. This is tricky for tactics to be learned where playing an initially bad move opens up a better outcome.

That's where noise comes in to trick search into visiting more, and here's 50 runs of ./lc0 --weights=id359 --verbose-move-stats --noise --no-smart-pruning with go nodes 800 from the above position startpos …:

info string a4h4  (666 ) N:       0 (+ 0) (V:   0.00%) (P:  0.25%) (Q: -0.29468) (U: 0.08479) (Q+U: -0.20989) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.25%) (Q: -0.38945) (U: 0.04206) (Q+U: -0.34739) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.25%) (Q: -0.38945) (U: 0.04234) (Q+U: -0.34711) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.25%) (Q: -0.38945) (U: 0.04248) (Q+U: -0.34696) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.25%) (Q: -0.38945) (U: 0.04262) (Q+U: -0.34683) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.25%) (Q: -0.38945) (U: 0.04310) (Q+U: -0.34634) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.25%) (Q: -0.38945) (U: 0.04329) (Q+U: -0.34616) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.26%) (Q: -0.38945) (U: 0.04338) (Q+U: -0.34607) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.26%) (Q: -0.38945) (U: 0.04345) (Q+U: -0.34600) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.27%) (Q: -0.38945) (U: 0.04560) (Q+U: -0.34384) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.27%) (Q: -0.38945) (U: 0.04596) (Q+U: -0.34348) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.30%) (Q: -0.38945) (U: 0.05126) (Q+U: -0.33818) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.33%) (Q: -0.38945) (U: 0.05541) (Q+U: -0.33404) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.34%) (Q: -0.38945) (U: 0.05772) (Q+U: -0.33173) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.36%) (Q: -0.38945) (U: 0.06209) (Q+U: -0.32735) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.38%) (Q: -0.38945) (U: 0.06505) (Q+U: -0.32440) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.38%) (Q: -0.38945) (U: 0.06522) (Q+U: -0.32422) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.39%) (Q: -0.38945) (U: 0.06538) (Q+U: -0.32406) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.40%) (Q: -0.38945) (U: 0.06868) (Q+U: -0.32077) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.40%) (Q: -0.38945) (U: 0.06874) (Q+U: -0.32070) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.41%) (Q: -0.38945) (U: 0.06946) (Q+U: -0.31998) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.41%) (Q: -0.38945) (U: 0.06971) (Q+U: -0.31973) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.43%) (Q: -0.38945) (U: 0.07282) (Q+U: -0.31662) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.47%) (Q: -0.38945) (U: 0.07992) (Q+U: -0.30952) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.48%) (Q: -0.38945) (U: 0.08057) (Q+U: -0.30888) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.48%) (Q: -0.38945) (U: 0.08158) (Q+U: -0.30786) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.48%) (Q: -0.38945) (U: 0.08243) (Q+U: -0.30702) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.53%) (Q: -0.38945) (U: 0.08946) (Q+U: -0.29998) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.54%) (Q: -0.38945) (U: 0.09278) (Q+U: -0.29667) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.60%) (Q: -0.38945) (U: 0.10182) (Q+U: -0.28763) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.61%) (Q: -0.38945) (U: 0.10405) (Q+U: -0.28540) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.64%) (Q: -0.38945) (U: 0.10879) (Q+U: -0.28066) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.83%) (Q: -0.38945) (U: 0.14161) (Q+U: -0.24783) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.88%) (Q: -0.38945) (U: 0.14977) (Q+U: -0.23967) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.90%) (Q: -0.38945) (U: 0.15223) (Q+U: -0.23722) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  0.96%) (Q: -0.38945) (U: 0.16305) (Q+U: -0.22639) 
info string a4h4  (666 ) N:       1 (+ 0) (V: -38.94%) (P:  1.00%) (Q: -0.38945) (U: 0.17040) (Q+U: -0.21904) 
info string a4h4  (666 ) N:     132 (+ 0) (V: -38.94%) (P:  1.29%) (Q:  0.76707) (U: 0.00331) (Q+U:  0.77039) 
info string a4h4  (666 ) N:     132 (+ 0) (V: -38.94%) (P:  1.30%) (Q:  0.76707) (U: 0.00336) (Q+U:  0.77043) 
info string a4h4  (666 ) N:     527 (+ 0) (V: -38.94%) (P:  1.39%) (Q:  0.77280) (U: 0.00092) (Q+U:  0.77373) 
info string a4h4  (666 ) N:     215 (+ 0) (V: -38.94%) (P:  1.48%) (Q:  0.77114) (U: 0.00236) (Q+U:  0.77350) 
info string a4h4  (666 ) N:     484 (+ 0) (V: -38.94%) (P:  1.50%) (Q:  0.77353) (U: 0.00107) (Q+U:  0.77460) 
info string a4h4  (666 ) N:     289 (+ 0) (V: -38.94%) (P:  1.71%) (Q:  0.77238) (U: 0.00202) (Q+U:  0.77440) 
info string a4h4  (666 ) N:     536 (+ 0) (V: -38.94%) (P:  1.81%) (Q:  0.77391) (U: 0.00117) (Q+U:  0.77507) 
info string a4h4  (666 ) N:     714 (+ 0) (V: -38.94%) (P:  2.39%) (Q:  0.77523) (U: 0.00123) (Q+U:  0.77645) 
info string a4h4  (666 ) N:     649 (+ 0) (V: -38.94%) (P:  2.97%) (Q:  0.77412) (U: 0.00158) (Q+U:  0.77570) 
info string a4h4  (666 ) N:     643 (+65) (V: -38.94%) (P:  3.22%) (Q:  0.77441) (U: 0.00156) (Q+U:  0.77597) 
info string a4h4  (666 ) N:     714 (+ 0) (V: -38.94%) (P:  3.66%) (Q:  0.77523) (U: 0.00181) (Q+U:  0.77703) 
info string a4h4  (666 ) N:     714 (+ 0) (V: -38.94%) (P:  5.44%) (Q:  0.77523) (U: 0.00263) (Q+U:  0.77786) 
info string a4h4  (666 ) N:     738 (+ 0) (V: -38.94%) (P:  8.65%) (Q:  0.77528) (U: 0.00404) (Q+U:  0.77931)

Here, 13 of 50 games would have produced valuable training data, so noise is indeed working, but the majority is training to avoid the correct move. Averaging this training data for the move across 50 games should cause P to move towards 16.3% (= 6523 / ~800 / 50). But then combined with training data from other games, the networks have learned to keep avoiding this move.

As from the other issue: The premise is that for a self-play to end up in a learnable board state, it seems unfortunate that it misses the opportunity to generate valuable training data for the correct move more often than not. Clearly, AZ's numbers are good enough to eventually generate strong networks, but perhaps training search could be better optimized?

I've rerun the analysis with lc0 and 50 games each configuration from the above board state to measure the average training data for the expected tactic:

Testing patches for visit twice and negative fpu

diff --git a/src/mcts/search.cc b/src/mcts/search.cc
--- a/src/mcts/search.cc
+++ b/src/mcts/search.cc
@@ -650,4 +650,9 @@ Node* Search::PickNodeToExtend(Node* node, PositionHistory* history) {
     for (Node* iter : node->Children()) {
       if (is_root_node) {
+        if (kNoise && iter->GetN() < 2) {
+          node = iter;
+          possible_moves = 2; // avoid "only one possible move" short circuit
+          break;
+        }
         // If there's no chance to catch up the currently best node with
         // remaining playouts, not consider it.

diff --git a/src/mcts/search.cc b/src/mcts/search.cc
--- a/src/mcts/search.cc
+++ b/src/mcts/search.cc
@@ -645,5 +645,5 @@ Node* Search::PickNodeToExtend(Node* node, PositionHistory* history) {
     float parent_q =
         (is_root_node && kNoise)
-            ? -node->GetQ(0, kExtraVirtualLoss)
+            ? -node->GetQ(0, kExtraVirtualLoss) + kFpuReduction
             : -node->GetQ(0, kExtraVirtualLoss) -
                   kFpuReduction * std::sqrt(node->GetVisitedPolicy());

epsilon	alpha	fpu	twice	average tactic training
0.25	0.3	0.0	no	16.3%
0.25	3.0	0.0	no	27.6%
0.5	0.3	0.0	no	33.1%
0.5	3.0	0.0	no	60.0%
0.25	0.3	-0.2	no	18.5%
0.25	3.0	-0.2	no	28.3%
0.5	0.3	-0.2	no	34.8%
0.5	3.0	-0.2	no	59.6%
0.25	0.3	0.0	yes	92.1%

I only ran one "visit each root move twice" as even with the default search parameters, it generally searches much deeper after being nudged over with the forced breadth exploration. This is true across all the previously listed networks from id364 to id124 above, and the output with high Ns are with "visit twice."

Is there an appropriate level of average tactic training? It looks like the current 16.3% is too low to outweigh the other training data. A related question is how often are self-play games getting into learnable states, but I don't have a good way to answer that.

lc0: Don't do GC during "position" uci command

From @mooskagh on May 25, 2018 14:39

I incorrectly assumed that position uci command is where all GC/long operations should happen.
Turns out it's not so. Have to change that.

Copied from original issue: glinscott/leela-chess#660

Rebase and merging instead of creating a merge commit

Hi all maintainers!
Sorry for spamming the issues but...
It is so much annoying that when you merge a pull request you select Merge Pull Request and create a merge commit which usually includes Merge pull request 'number' from 'source'! Why not selecting Rebase and merge? It would be nice.
Entirely yours.
@double-beep

Have a separate timewatch thread

Currently Search checks whether time is up at every iteration.
But as ComputeBlocking may be long, search may take significant time overhead with that approach.

Instead of that, we need a separate thread which would watch the time and output bestmove as soon as time is up.

Possible regression in lc0

Reported by MTGOStark on discord. Seeing 40 elo drop in testing between the 0604 experimental and 0619 with default parameters using net test1_27. (Also mainline 315 - but the position below was with test1_27)
Still trying to find a great example of where the two differ, but the following position does at least seem to be converging on different results.

position startpos moves e2e4 c7c5 g1f3 d7d6 f1b5 c8d7 b5d7 d8d7 e1g1 g8f6 f1e1 b8c6 c2c3 e7e6 h2h3 f8e7 d2d4 c5d4 c3d4 d6d5 e4e5 f6e4 a2a3 e8g8 b1d2 e4d2 d1d2 a7a5 d2f4 a5a4 f4g4 g8h8 c1e3 b7b5 g4h5 b5b4 a3b4 c6b4 e1e2 h8g8 f3g5 h7h6 g5f3 d7b5 e2d2 f8c8 e3h6 g7h6 h5h6 b4c2

Invalid en passant captures in certain positions

I was looking at an entirely closed position:

k2r4/2b1p3/3pPp2/r1pP1Pp1/1pP3Pp/pP5P/P7/5K2 b - - 27 9

I see analysis like the below snippit in a few networks I tried, including 387 and 390. We can see moves like Ra5-b5 Kf1-f2 Ka8-b7 c4xb5, which is clearly an invalid move in this position, or in any position. Network 397 suggested some similar captures early in its evaluation but ditched them in later lines. It seems like the fact that any network would be able suggest such a move indicates that there is some engine bug regarding what constitutes valid moves.

FEN: k2r4/2b1p3/3pPp2/r1pP1Pp1/1pP3Pp/pP5P/P7/5K2 b - - 27 9

Lczero:
 5	00:00	 2	15	+8.45	Rd8-e8 Kf1-e2
 6	00:00	 6	26	+6.50	Rd8-e8 Kf1-e2 Re8-d8
 8	00:00	 13	29	+8.21	Rd8-e8 Kf1-e2 Re8-d8 Ke2-d1
 8	00:00	 19	30	+8.83	Rd8-e8 Kf1-e1 Re8-d8 Ke1-d1
 8	00:00	 20	30	+9.16	Rd8-e8 Kf1-e1 Re8-d8 Ke1-d1
 12	00:03	 127	32	+11.22	Ra5-b5 c4xb5 c5-c4 b3xc4 b4-b3 a2xb3 a3-a2
 12	00:05	 190	32	+8.85	Ra5-a4 b3xa4 b4-b3 a2xb3 a3-a2 b3-b4
 13	00:10	 336	32	+9.32	Ra5-b5 c4xb5 c5-c4 b3xc4 b4-b3 a2xb3 a3-a2 b3-b4 a2-a1Q+
 14	00:15	 512	32	+11.34	Ra5-b5 c4xb5 c5-c4 b3xc4 b4-b3 a2xb3 a3-a2 b3-b4 a2-a1Q+ Kf1-e2
 17	01:20	 3k	32	+11.57	Ra5-b5 Kf1-f2 Ka8-b7 c4xb5 c5-c4 b3xc4 b4-b3 b5-b6 b3xa2 b6xc7
 Rh8 -> 89 (V: 90.25%) (N: 3.87%) PV: Rh8 Ke1 Ra7 Kd1 Rg8 Ke1
 Kb7 -> 203 (V: 90.32%) (N: 8.86%) PV: Kb7 Ke1 Rb5 Kd1 Rb6 Ke2 Re8
 Rd7 -> 209 (V: 92.70%) (N: 1.06%) PV: Rd7 exd7 Kb7 Ke2 Ra8 Kd1 Rd8 Kc1
 Ra7 -> 344 (V: 91.97%) (N: 5.86%) PV: Ra7 Ke1 Ra4 bxa4 b3 axb3 a2 b4
 Ka7 -> 374 (V: 91.61%) (N: 8.57%) PV: Ka7 Ke1 Ra4 bxa4 b3 axb3 a2 b4
 Kb8 -> 382 (V: 91.71%) (N: 8.21%) PV: Kb8 Ke1 Rb5 cxb5 c4 bxc4 b3 axb3 a2 b4
 Bb8 -> 398 (V: 92.12%) (N: 5.74%) PV: Bb8 Ke1 Rb5 cxb5 c4 bxc4 b3 axb3 a2 b4
 Ra6 -> 419 (V: 92.00%) (N: 6.99%) PV: Ra6 Ke1 Rc6 dxc6 d5 cxd5 Rxd5 Ke2 c4 bxc4
 Rf8 -> 426 (V: 92.16%) (N: 6.01%) PV: Rf8 Ke1 Rb5 cxb5 c4 bxc4 b3 axb3 a2 b4
 Re8 -> 434 (V: 91.75%) (N: 8.93%) PV: Re8 Ke1 Rb5 cxb5 c4 bxc4 b3 axb3 a2 b4
 Rc8 -> 440 (V: 92.05%) (N: 7.01%) PV: Rc8 Ke1 Rb5 cxb5 c4 bxc4 b3 axb3 a2 c5
 Bb6 -> 693 (V: 92.55%) (N: 5.54%) PV: Bb6 Ke1 Rb5 cxb5 c4 bxc4 b3 axb3 a2 Ke2
 Rb8 -> 834 (V: 92.57%) (N: 6.28%) PV: Rb8 Ke1 Rbb5 cxb5 c4 bxc4 b3 Kd2 bxa2 c5
 Rg8 -> 926 (V: 92.65%) (N: 5.85%) PV: Rg8 Ke1 Ra4 bxa4 b3 Kd2 bxa2 a5 a1=Q a6
 Ra4 -> 2184 (V: 92.90%) (N: 5.02%) PV: Ra4 Kg1 Ka7 Kh1 Re8 Kg2 Ra5 Kh2
 Rb5 -> 2567 (V: 92.89%) (N: 6.21%) PV: Rb5 Kf2 Kb7 cxb5 c4 bxc4 b3 b6 bxa2 bxc7
stm Black winrate 92.47%
 19	05:41	 11k	32	+11.11	Ra5-b5 Kf1-f2 Ka8-b7 c4xb5 c5-c4 b3xc4 b4-b3 b5-b6 b3xa2 b6xc7

Support Polyglot Opening Book

Can maybe use code from brainfish/cfish which both support polyglot (and feature 2).

ASMFish actually has the best implementation of this support following three features:

Ponders while in book, especially helpful if opponent gets out of book before you.
Has option to specify get out of book after at most X moves (default 20 or so)
Option to say get out of book when X moves deep remaining (say moves are calculated to depth 28 and this TC allows reaching depth 30, then you would want to clip the last 2 moves of the book)

lc0: Support configuration files

From @mooskagh on May 27, 2018 6:11

Add configuration file as yet another way to change lc0 parameters (in addition to command line flags and UCI parameters).

It's convenient to e.g. setup backend/hardware configuration only once, or some people like --verbose-move-stats always enabled.
Also some GUIs have no easy way to either supply command line flags or uci params. That would be helpful for them too.

Copied from original issue: glinscott/leela-chess#669

Comprehensive test suite

Enlarge the test suite substantially to generate confidence in the health of the codebase.

Depth tracking needs re-design and re-implementation

The current depth output is totally useless for people and protocol alike, since it never goes above 2, ever.

PR #74 would address that particular issue, replacing the current depth with something that's approximately right, and at least somewhat useful, but it isn't necessarily ideal.

So, even if/when 74 is merged, we need a better design for depth. Doing a logarithmic approximation with calculated branching factor is considered suboptimal; perhaps better is a linear average of the depth of all nodes. There are other choices as well.

And, even aside from design choices and issues, the current implementations, either of them, suffer the drawback of using storage inside the Node structure, which is a precious resource per issue #13. No matter which design is chosen, it should not require any storage inside each Node. This shouldn't be too hard; it should be relatively straightforward to maintain a variable or two at the top level Search , which is updated during node eval backpropagation. This would be effective with the caveat that it is stateful: tree reuse would require complete recursive calculation of the top level variables for the subtree in question. (One potential workaround is to have Search do a different depth calculation for each potential first and second ply move; then, when the tree is pruned for these two moves, we already have the requisite calculation done, and save the full recursive recalculation only for more extreme (i.e. rarer) tree pruning scenarios.)

Make training games deterministically reproducible

Probably best done by printing and/or storing the Random seed used to generate that game.

Crash with OpenCL backend with cpu on macos

Trying to use opencl backend with cpu rather than gpu causes a crash after lc0 tries to tune the cpu:

Georges-iMac-Pro-2:pgn george$ DYLD_LIBRARY_PATH=/opt/intel/mkl/lib/ lc0 --backend=opencl "--backend-opts=gpu=0" -t 8
       _
|   _ | |
|_ |_ |_| built Jun 15 2018
isready
Found network file: ./338.txt
Creating backend [opencl]...
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 (Mar 15 2018 15:35:11)
Platform profile: FULL_PROFILE
Platform name:    Apple
Platform vendor:  Apple
Device ID:     0
Device name:   Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
Device type:   CPU
Device vendor: Intel
Device driver: 1.1
Device speed:  3200 MHz
Device cores:  16 CU
Device score:  512
Device ID:     1
Device name:   AMD Radeon Pro Vega 56 Compute Engine
Device type:   GPU
Device vendor: AMD
Device driver: 1.2 (May  8 2018 15:49:10)
Device speed:  1250 MHz
Device cores:  56 CU
Device score:  1112
Selected platform: Apple
Selected device: Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
Will try 578 valid configurations.
Failed to find a working configuration.
Check your OpenCL drivers.
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Tuner failed to find working configuration.
Abort trap: 6

The early versions of lczero did not attempt to tune when cpu was the detected device. Later versions did and crashed in the same way.

Have separate probabilities for win/draw/lose

From @mooskagh on April 4, 2018 13:57

That's a pretty speculative idea, but it's very straightforward to implement and it gives more control on playing style which may be useful for later experiments.

So, the idea is instead of having one value out of value head (from -1 to 1), to have three values (passed through softmax layer):

Win probability
Lose probability
Draw probability

During the MCTS, this will be translated into one value:
If we scale Q from 0 to 1:

win + 0.5 draw  <- exactly what we have now
win + draw  <- the program tries not to lose, winning is not important
win  <-  the program wants to win and plays aggressively, draw is not acceptable.

Also other levels of coefficients between draw and lose are possible to tweak aggresiveness more gradually (in this example it's 0.5*draw, 1*draw and 0*draw).

The same if we scale Q from -1 to 1:

win - lose  <- that's what we have now
win + draw - lose  <- win or draw is fine
win - draw - lose  <-  the program wants to win and plays aggressively, draw is not acceptable.

(again it allows more gradual tweaks of draw from -1 to 1)

Copied from original issue: glinscott/leela-chess#241

Black moves pieces towards h1 corner

http://testserver.lczero.org/match/28
http://testserver.lczero.org/match_game/27988 (white is test31, black is test27).
This match was probably played with the T=1 bug, which is why it lasted so long in the first place. But another question is why does black tend to move its pieces towards h1? It's probably because moves towards h1 for black (h8 for white) have the largest encoded values. So possibly something is overflowing, and all results are tied, and the final tie-break is the size of the move encoding.

So far though I'm not able to reproduce that, search looks normal, it wants to play 150.Bb7.

position startpos moves g2g3 e7e5 h2h4 b8c6 d2d3 f8c5 c1h6 g8h6 e1d2 h6g4 d1c1 g4f2 c2c4 d7d5 b2b3 f2h1 c1a3 c5d4 b3b4 d4a1 c4d5 d8d5 e2e4 d5d6 h4h5 d6b4 a3b4 c6b4 d3d4 h1g3 b1a3 e5d4 h5h6 g7h6 g1e2 g3f1 d2c1 d4d3 c1d1 d3e2 d1e2 f7f5 e4f5 c8f5 e2f1 e8c8 a3c4 b4a2 c4e3 f5h3 f1e2 h6h5 e3f5 h3f5 e2e3 h5h4 e3f4 h4h3 f4f5 h3h2 f5e6 h2h1q e6f7 h7h5 f7g6 h5h4 g6f7 h4h3 f7g6 h3h2 g6f5 b7b6 f5e6 a7a5 e6f5 h1g1 f5f4 b6b5 f4f3 a2c3 f3f4 h2h1q f4f5 b5b4 f5f4 b4b3 f4f5 b3b2 f5f4 b2b1q f4e5 c3e4 e5f4 a1b2 f4f5 a5a4 f5e6 a4a3 e6f5 a3a2 f5f4 a2a1q f4f5 d8d1 f5f4 b2c1 f4f5 c1d2 f5e6 d2e1 e6e7 e1d2 e7e6 d2e1 e6e7 e1g3 e7e6 g3f4 e6f7 h1g2 f7e6 f4e5 e6f5 g2h1 f5e6 h1g2 e6e7 e4f2 e7e6 d1f1 e6f7 g2h1 f7e6 e5h2 e6e7 c8b8 e7d7 h1g2 d7e6 g2h1 e6d7 f2g4 d7e6 h8h6 e6d7 g1f2 d7d8 h6h5 d8e8 h5h4 e8d8 f2g3 d8e8 f1f2 e8d7 b1g1 d7d8 a1f1 d8e8 f1g2 e8d7 f2f1 d7d8 g3f3 d8d7 g4f2 d7d8 h4h3 d8e7 f1e1 e7d7 e1f1 d7d8 f1e1 d8d7 e1f1 d7d8 b8a7 d8d7 a7a6 d7c8 f1e1 c8d8 e1f1 d8e8 f1e1 e8d7 e1f1 d7d8 c7c6 d8c8 c6c5 c8d8 a6b7 d8e8 c5c4 e8d8 c4c3 d8e8 c3c2 e8d8 c2c1q d8e7 c1b2 e7e6 b2c1 e6e7 b7a8 e7e8 c1b2 e8d7 b2c1 d7d8 c1b2 d8c8 f1e1 c8d8 b2c1 d8d7 e1f1 d7d8 c1g5 d8d7 g5h6 d7c8 h6h5 c8d8 h5h6 d8e8 h6h5 e8d7 h5h4 d7c8 h4g3 c8d8 f1e1 d8d7 e1f1 d7d8 a8b7 d8e8 b7a6 e8d8 f1e1 d8d7 e1f1 d7d8 a6b5 d8d7 b5c4 d7d8 f1e1 d8d7 g1f1 d7d8 f2g4 d8d7 g4f2 d7d8 e1a1 d8c8 g2g1 c8d7 g3g2 d7d8 a1a2 d8d7 a2c2 d7c8 c4c5 c8d8 c5d4 d8e8 d4d3 e8d7 c2e2 d7d8 e2e1 d8d7 d3e2 d7d8 e2d1 d8d7 d1d2 d7d8 f1e2 d8d7 g1f1 d7d8 h2g1 d8d7 h3h2 d7d8 f2d1 d8d7 g1f2 d7d8 g2g1 d8c8 
go nodes 800

info string d1b2  (1683) N:      22 (+ 0) (V:  99.95%) (P:  6.97%) (Q:  0.22714) (U: 0.26501) (Q+U:  0.49215)
info string d1e3  (1681) N:      30 (+ 0) (V:  99.96%) (P:  8.88%) (Q:  0.23324) (U: 0.25028) (Q+U:  0.48353)
info string f3b7  (1295) N:     293 (+ 0) (V:  99.97%) (P:  0.48%) (Q:  0.85665) (U: 0.00141) (Q+U:  0.85807)

Random backend only includes mask in hash.

It should also include value so that rule 50 plane affects the random output.

Random backend isn't random enough.

I think that for generating a large initial sets of positions each game should be played between random backends each of which have a different randomly selected seed to their hash.
(Can still have a default of 0 seed for producing predictable output.)