mcostalba / scoutfish

Chess Query Engine

License: GNU General Public License v3.0

Makefile 1.62% C++ 97.09% Python 1.29%

scoutfish's Issues

"x86_64" & "x86-64" problem with Makefile

Hi,

When I try to compile it on my Ubuntu 16.04/x86_64 system, I get make errors.
I had to change the Makefile a bit (see the "-" vs "_" in the title) to make it work.
See here:
Makefile.txt

Can you take a look?
Jürgen

Support backend pagination of scoutfish results

Right now, we can limit the number of matches, which is great. Ideally, we need pagination of results too. This allows natural exploration with the UI.

Typically, backend code can support a combination of results per page and page number.

Example: suppose you have 100 results. If I ask for 20 results per page and page 5, I should get matches 80-100. A query of 10 results per page for page 3 should return matches 20-30.

More details are here: https://www.dynatable.com/#paginating
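A minimal sketch of the requested behaviour, done on the client side over an already returned list of matches. The function name `paginate` and its parameters are hypothetical and not part of scoutfish; pages are numbered from 1, and the slice is half-open so that it reproduces the numbers in the example above.

```python
def paginate(matches, per_page, page):
    """Return the matches that belong to a 1-based page number.

    Hypothetical helper: with per_page=20 and page=5 it returns the
    matches at indices 80..99, as in the example in this issue.
    """
    if per_page <= 0 or page <= 0:
        raise ValueError("per_page and page must be positive")
    start = (page - 1) * per_page
    return matches[start:start + per_page]


# Toy data shaped like the "matches" list of the scout JSON output.
matches = [{"ofs": i, "ply": [0]} for i in range(100)]
page5 = paginate(matches, per_page=20, page=5)
page3 = paginate(matches, per_page=10, page=3)
```

Doing this inside the engine would avoid producing the full match list at all, which is what the issue actually asks for; the sketch only pins down the expected indexing.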

False match returned by scout

[Event "Local Event"]
[Site "Local Site"]
[Date "2017.02.06"]
[Round "1"]
[White "bt"]
[Black "Guest"]
[Result "1/2-1/2"]
[ECO "C23"]
[TimeControl "300+0"]
[WhiteClock "0:04:17.098"]
[BlackClock "0:04:44.391"]
[PlyCount "20"]

1. e4 e5 2. Bc4 Bc5 3. Nf3 Nf6 4. d3 d6 5. Be3 Be6 6. Nc3 Nc6 7. Qd2 Qd7 8. Ke2
Ke7 9. Rhe1 Rhe8 10. h3 h6 1/2-1/2

tamas@tami:~/scoutfish$ src/scoutfish scout ../white-move-Rae1.scout '{"white-move": "Rae1"}'
{
    "moves": 20,
    "match count": 1,
    "moves/second": 10000,
    "processing time (ms)": 2,
    "matches":
    [
        { "ofs": 0, "ply": [17] }
    ]
}

The game contains 9. Rhe1, not Rae1, yet the query matches. I suggest using long algebraic notation in the "white-move" and "black-move" rules instead of SAN to prevent such false matches.

Max Matches not working in python api

I tried
p.setoption('Max Matches', 1)

in front of


    for e in QUERY_DB:
        sys.stdout.write('Query ' + str(cnt) + '...')
        sys.stdout.flush()
        result = p.scout(e['q'])
        if (result['match count'] == e['matches']):
            print('OK')
        else:
            print('FAIL')
        p.before = ''
        cnt += 1

in test.py, and the tests still pass, so the option seems to have no effect. It's possible I have something else wrong in my environment.

Scoutfish raises EOF exception

I get an EOF exception with the attached pychess.pgn

tamas@tami:~$ PYTHONPATH=pychess/lib python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from pychess.external.scoutfish import Scoutfish
>>> s=Scoutfish(engine="pychess/lib/pychess/external/scoutfish")
>>> s.open("./pychess.pgn")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tamas/pychess/lib/pychess/external/scoutfish.py", line 32, in open
    self.db = self.make()
  File "/home/tamas/pychess/lib/pychess/external/scoutfish.py", line 48, in make
    self.wait_ready()
  File "/home/tamas/pychess/lib/pychess/external/scoutfish.py", line 23, in wait_ready
    self.p.expect(u'readyok')
  File "/usr/local/lib/python2.7/dist-packages/pexpect/spawnbase.py", line 321, in expect
    timeout, searchwindowsize, async)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/spawnbase.py", line 345, in expect_list
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/expect.py", line 105, in expect_loop
    return self.eof(e)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/expect.py", line 50, in eof
    raise EOF(msg)
pexpect.exceptions.EOF: End Of File (EOF).
<pexpect.popen_spawn.PopenSpawn object at 0x7fd774d5d1d0>
searcher: searcher_re:
0: re.compile("readyok")

pychess.zip

Give chess_db compatible offset back in scout json output

Maybe sending an UCI option to scoutfish can change it to create 8byte aligned offsets on output.
See mcostalba/chess_db#28

Sync scoutfish.py with chess_db.py

In particular, let make() use self.pgn and check for self.p where needed

Code does not compile

This might be something easy to fix. I can fix it locally by adding #include <memory.h> to position.h. However, that is not the right fix. I wonder why this is happening in this repo.

shiv@shiv-Inspiron-3847:~/chess/scoutfish/src$ make build ARCH=x86-64-modern

Config:
debug: 'no'
sanitize: 'no'
optimize: 'yes'
arch: 'x86_64'
bits: '64'
kernel: 'Linux'
os: 'GNU/Linux'
prefetch: 'yes'
popcnt: 'yes'
sse: 'yes'
pext: 'no'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11  -pedantic -Wextra -Wshadow -m64 -DNDEBUG -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT -flto
LDFLAGS:  -Wl,--no-as-needed -lpthread -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11  -pedantic -Wextra -Wshadow -m64 -DNDEBUG -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT -flto

Testing config sanity. If this fails, try 'make help' ...

make ARCH=x86-64-modern COMP=gcc all
make[1]: Entering directory '/home/shiv/chess/scoutfish/src'
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11  -pedantic -Wextra -Wshadow -m64 -DNDEBUG -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT -flto   -c -o benchmark.o benchmark.cpp
In file included from benchmark.cpp:27:0:
position.h: In member function ‘Position& Position::operator=(const Position&)’:
position.h:76:7: error: ‘::memcpy’ has not been declared
       ::memcpy(this, &pos, sizeof(Position));
       ^
<builtin>: recipe for target 'benchmark.o' failed
make[1]: *** [benchmark.o] Error 1
make[1]: Leaving directory '/home/shiv/chess/scoutfish/src'
Makefile:426: recipe for target 'build' failed
make: *** [build] Error 2

Incorporate chess_db functionality into scoutfish

From a GUI development point of view, it would be convenient to have the book/find functionality of chess_db in scoutfish/scoutfish.py. Both use the same parser, so this seems reasonable anyhow.

Invalid query JSON makes the binary abort with SIGABRT

Hi,

I sent invalid JSON while trying to use scoutfish, and the binary received a SIGABRT. It happened in the JSON parsing code; the libc throws an exception due to an incorrect token or something similar:
frame #8: 0x0000000000291eaf scoutfish`Scout::parse_query(data=0x00007fffffffd900, is=0x00007fffffffdc80) at scout.cpp:634:12
   631
   632  */
   633
-> 634      json j = json::parse(is);

Tested string is ../scoutfish/src/scoutfish scout test.scout "{ "sub-fen": "8/8/p7/8/8/1B3N2/8/8" }"
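In the tested command the JSON is wrapped in double quotes that also appear inside it, so a POSIX shell strips the inner quotes and the engine receives `{ sub-fen: 8/8/p7/8/8/1B3N2/8/8 }`, which is not valid JSON. Independently of fixing the crash in the binary, a caller can validate and quote the query before running it. The sketch below uses only the Python standard library; `build_scout_command` is a hypothetical helper, not part of scoutfish.py.

```python
import json
import shlex


def build_scout_command(engine, db, query):
    """Return a shell-safe 'scout' command line for a query.

    Hypothetical helper. A query given as a string is parsed first, so
    malformed JSON raises ValueError here instead of reaching the engine.
    """
    if isinstance(query, str):
        query = json.loads(query)  # json.JSONDecodeError is a ValueError
    text = json.dumps(query)
    parts = (engine, "scout", db, text)
    return " ".join(shlex.quote(part) for part in parts)


cmd = build_scout_command(
    "./scoutfish", "test.scout", {"sub-fen": "8/8/p7/8/8/1B3N2/8/8"}
)
print(cmd)
```

The printed command wraps the JSON in single quotes, which is the form used in the other examples on this page.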

Does imbalance support multiple queens?

I want to find games where one side has 2 Q's and the other side has a Q and 2 R's.

I tried this:
./scoutfish scout ./mega-filtered.scout '{ "imbalance": "QQvQRR" }'

and spot checked one result and the game had a Q vs 2 R's, but not what I wanted
(QQ vs QRR).

New search rules

Currently scoutfish can search for positions where given pieces are on given squares using "sub-fen", but not for patterns where a given piece is somewhere on a given rank/file, or is not there at all. A simple example is "white has an isolated a-pawn", meaning white has a pawn somewhere on a2-a7 but none on b2-b7. It would be nice to have some syntax/rule to search for this kind of pattern.
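To make the requested pattern concrete, here is a small sketch that tests a single position for the "isolated a-pawn" example. It works on a plain FEN string and is only meant to pin down the semantics; all function names are hypothetical, and it says nothing about how such a rule would be implemented or spelled inside scoutfish.

```python
def expand_rows(fen):
    """Expand the board field of a FEN into eight 8-character rows.

    Empty squares become '.', and rows[0] is rank 8 as in standard FEN.
    """
    rows = []
    for row in fen.split()[0].split("/"):
        expanded = "".join("." * int(c) if c.isdigit() else c for c in row)
        if len(expanded) != 8:
            raise ValueError("bad FEN rank: %r" % row)
        rows.append(expanded)
    if len(rows) != 8:
        raise ValueError("a FEN board needs 8 ranks")
    return rows


def has_pawn_on_file(rows, file_letter, pawn="P"):
    """True if the given pawn letter appears anywhere on the file."""
    col = "abcdefgh".index(file_letter)
    return any(row[col] == pawn for row in rows)


def white_has_isolated_a_pawn(fen):
    """White pawn somewhere on the a-file and no white pawn on the b-file."""
    rows = expand_rows(fen)
    return has_pawn_on_file(rows, "a") and not has_pawn_on_file(rows, "b")
```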

Encode Result value

It might make sense to encode the result value so that we can answer questions like: what's the value of 2 bishops vs bishop and knight in open positions, and how about closed positions? (It's an interesting sub-topic to think about how to represent open and closed positions; I will start a separate issue on that.)

If we encode the result, we can answer those queries easily. For example, 2 bishops vs bishop and knight is worth half a pawn in the middlegame/endgame (Kaufman thinks it's half a pawn in his books).

Stockfish code

More of a question than an issue: why does scoutfish embed the whole Stockfish code? Based on the README and the Python wrapper, I don't understand why you need all the search and syzygy code. Is there some kind of undocumented feature where this is useful?

Use different file name extension for index file than chess_db

Add support for moved-piece and captured-piece

Something like

{ 'moved': 'BN', 'captured': 'R' }

To detect all the bishop and knight captures of an opponent rook

{ 'moved': 'K', 'captured': '', 'stm': 'BLACK' }

To detect quiet moves of the black king (you may want to make it part of a bigger query or of a streak).

Support "add" another .pgn to existing .pgn/.scout database

In #8 you wrote "in Scoutfish format you don't need to rewrite the bin if you add/delete/modify some game. In case of add some new games you can just append related info to the bin (almost instantaneous)."
It would be good to have some form of "add" or "append" command besides "make" in scoutfish to append new games to the database.

Proposal for UI for scoutfish

This might be off-topic for this project, but I think it's very useful to think about integration even on the backend.

I now think a board editor like https://en.lichess.org/editor is handy. It's not too hard to integrate, but I was thinking of a good user experience.

If you right-click on a square, a circle is drawn, and if you drag while right-clicking you can draw arrows. The board is also editable. I wonder if queries should be written leveraging this interface code, with text boxes for more input such as result, side to move, and material, all being dropdowns.

We can use multiple editable boards for sequence search. Wondering if this makes sense.

Scoutfish crashes when the database name contains whitespace

Hi,

I tried several times, but Scoutfish crashes when the database name contains whitespace.
Do you have a suggestion for fixing this problem without having to rename the database?

Thanks and regards,
Giordano

Search for material imbalance

It may be useful to search for all the games where one side is 2 pawns down and won, or where we have a BB vs NN endgame.

Create initial release

It would be good to have some binary downloads too.

Make a python extension for scoutfish

While this is not strictly necessary, it makes it easy to integrate into python apps. I have contributed to a python module of stockfish at https://github.com/jromang/Stockfish/blob/pyfish/src/pyfish.cpp (the work was started by Jean-Francois).

I tried to create one for scoutfish, but it turns out that starting the multi-threaded scout search with listeners is a little trickier than I expected. However, I think it's quite doable.

Mobility support in queries

Does it make sense to add mobility support? This would allow us to query for "open" or "closed" positions. One simple metric: if there are many legal moves, it's an open position; in a closed position, the number of legal moves is restricted.

What's the best construct to reuse from Stockfish? Human players typically differentiate between open, closed, semi-open, and semi-closed positions. Semi-open and semi-closed are usually only relevant in the opening.

Can we have something like 1-5 for mobility, where 5 means very open positions and 1 means very closed positions?

Query API

Database queries should be flexible, powerful and at the same time very general: with the same form template we would like to cover many different query scenarios.

One approach is that of a partial FEN string. For instance, if we want to retrieve all the positions with a white rook on a1 and a black bishop on c3, we can build the following query:

{
   "sub-fen": [
                   "8/8/8/8/8/2b5/8/R7"
              ]
}

In case we also want to retrieve the positions where there is a queen on a1, then:

{
   "sub-fen": [
                   "8/8/8/8/8/2b5/8/R7",
                   "8/8/8/8/8/2b5/8/Q7"
              ]
}

This kind of composition is simple but covers a lot of cases. We can do more: suppose we want to retrieve positions with a passed white pawn on a5, then

{
   "sub-fen": [
                   "8/8/8/P7/8/8/8/8"
              ],
   "not-fen": [
                   "8/8/p7/8/8/8/8/8",
                   "8/p7/8/8/8/8/8/8",
                   "8/8/1p6/8/8/8/8/8",
                   "8/1p6/8/8/8/8/8/8"
              ]
}

This is just an illustrative example to present the idea of a query based on a list of very simple conditions that we can use to build arbitrarily complex queries.
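The semantics sketched in this proposal can be written down in a few lines of Python. This is only an illustration under stated assumptions: patterns use standard FEN rank order (rank 8 first), a position matches if it contains at least one "sub-fen" pattern and none of the "not-fen" patterns, and the helper names `squares` and `matches` are made up for the example. It is not how scoutfish evaluates queries internally.

```python
def squares(board_field):
    """Expand a FEN board field into {square: piece}, e.g. {'a1': 'R'}."""
    rows = board_field.split("/")
    if len(rows) != 8:
        raise ValueError("expected 8 ranks")
    placed = {}
    for rank, row in zip(range(8, 0, -1), rows):  # FEN lists rank 8 first
        col = 0
        for ch in row:
            if ch.isdigit():
                col += int(ch)  # a digit skips that many empty squares
            else:
                placed["abcdefgh"[col] + str(rank)] = ch
                col += 1
        if col != 8:
            raise ValueError("rank %r does not cover 8 files" % row)
    return placed


def matches(position, query):
    """Assumed semantics: any sub-fen is contained, and no not-fen is."""
    pos = squares(position)

    def contained(pattern):
        return all(pos.get(sq) == pc for sq, pc in squares(pattern).items())

    subs = query.get("sub-fen", [])
    nots = query.get("not-fen", [])
    found = not subs or any(contained(p) for p in subs)
    return found and not any(contained(p) for p in nots)


rook_or_queen = {"sub-fen": ["8/8/8/8/8/2b5/8/R7", "8/8/8/8/8/2b5/8/Q7"]}
print(matches("4k3/8/8/8/8/2b5/8/Q3K3", rook_or_queen))
```

Because each condition is just a containment test, lists of conditions compose without any extra machinery, which is the point the proposal makes.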

Bug in streaks embedded in sequences

Unless I've misunderstood how streaks and sequences are supposed to work together, a streak that matches partially before failing backtracks too far in the list of conditions.

Consider the following database and query (txt extensions to placate GitHub's extension allowlist):

test.pgn.txt
query.json.txt

Scoutfish will only match the first game, not the second, which I think is wrong. The cause is scout.cpp:168-169, which sets condIdx to zero instead of 1, which means that in addition to discarding the failed partial streak match it also discards the matched e4 move which should not be discarded.

Query containing "stm": "white" and "stm": "black" returns same plies

These two results can't both be correct:

tamas@tami:~/scoutfish$ src/scoutfish scout pgn/famous_games.scout '{"stm":"white", "captured": "Q"}'

{
    "moves": 28873,
    "match count": 148,
    "moves/second": 4124714,
    "processing time (ms)": 7,
    "matches":
    [
        { "ofs": 0, "ply": [53] }, 
        { "ofs": 666, "ply": [45] }, 
        { "ofs": 5989, "ply": [43] },

tamas@tami:~/scoutfish$ src/scoutfish scout pgn/famous_games.scout '{"stm":"black", "captured": "Q"}'

{
    "moves": 28873,
    "match count": 148,
    "moves/second": 5774600,
    "processing time (ms)": 5,
    "matches":
    [
        { "ofs": 0, "ply": [53] }, 
        { "ofs": 666, "ply": [45] }, 
        { "ofs": 5989, "ply": [43] },

Get games PGN through Python

Add to scoutfish.py a function to get the games descriptions out of the offsets in the list of matches.

This makes sense to do from Python, it should be fast enough.
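A rough sketch of what such a function could look like. It assumes that each "ofs" value is the byte offset of the first header line of a game in the PGN file, and that a new game starts at the first line beginning with "[" after some movetext has been seen. Both are assumptions of this example, and the name `get_games` is only a suggestion.

```python
def get_games(pgn_path, matches):
    """Return the PGN text of every match in the scout output.

    Assumes 'ofs' is the byte offset of the game's first header line.
    """
    games = []
    with open(pgn_path, "rb") as f:
        for match in matches:
            f.seek(match["ofs"])
            lines = []
            seen_moves = False
            for raw in f:
                line = raw.decode("utf-8", errors="replace")
                if line.startswith("[") and seen_moves:
                    break  # header of the next game
                if line.strip() and not line.startswith("["):
                    seen_moves = True
                lines.append(line)
            games.append("".join(lines).strip())
    return games
```

A call would look like `get_games("famous_games.pgn", result["matches"])`, where `result` is the dictionary returned by a scout query. Movetext lines that happen to start with "[" would confuse this simple splitter, so a real implementation may want a stricter end-of-game test.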

"pass" rule is not documented in readme

Mate, stalemate

I don't know stockfish internals but new position rules like "mate" and "stalemate" can be useful too if it's possible. Maybe the best would be to expose all chess knowledge stockfish has in a particular position.

Differences to MongoDB

Could you explain why scoutfish is better than, or different from, storing PGN files in e.g. MongoDB? A small comparison in the README would be great.

Consecutive sequences: a streak

This is another powerful concept that merges the sequence concept with the persistence concept.

Essentially, a streak is like a sequence but with the added constraint that the conditions must be satisfied one by one along consecutive moves.

You may want to use a streak, for instance, to look for a pawn-down imbalance: we need the imbalance to persist for at least a few moves to be sure we are not in the middle of a capture-recapture combination.

Indeed the concept is much more powerful and general than simple imbalance, so I'd expect to find other uses in the future.
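The pawn-down example can be illustrated with a toy model in which a game is a list of per-ply values and each condition is a predicate on one value. This is a sketch of the concept only; `find_streak` and the data are invented for the example and do not reflect the engine's code.

```python
def find_streak(states, conditions):
    """Return the first ply from which every condition holds on
    consecutive plies (condition k on ply start + k), or None."""
    n = len(conditions)
    for start in range(len(states) - n + 1):
        if all(cond(states[start + k]) for k, cond in enumerate(conditions)):
            return start
    return None


def pawn_down(balance):
    return balance == -1


# Toy input: white's material balance in pawns after each ply.
# Ply 2 is a capture that is recaptured at once; from ply 5 the
# deficit persists.
balance = [0, 0, -1, 0, 0, -1, -1, -1, -1, 0]
print(find_streak(balance, [pawn_down] * 3))
```

A single "pawn down" condition would already fire at ply 2, in the middle of the exchange; requiring it three plies in a row skips that and reports ply 5.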

Allow a list of material distribution

Like we have lists of sub-fen, we may want to have conditions like:

{ "material": ["KBNKN", "KBNKB"] }

With obvious meaning...

Allow material distribution with more than 8 pieces

It was originally developed for endgames and crashes with more than 8 pieces

Result rule should support list

For example, to query for only the decided games of a tournament/player: {"result": ["1-0", "0-1"]}

Scoutfish crash on .pgn file containing variant games

I've tried to find something in my own (lichess) games .pgn, downloaded earlier, but failed because it contained some crazyhouse games. It would be good to support chess variants played on FICS/lichess. In theory this would be doable using the lichess Stockfish variant fork https://github.com/niklasf/Stockfish. If this needs too much work, maybe just ignore games containing a "Variant" tag in the .pgn headers.
(Same issue stands for chess_db.)
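Until the engine handles this, the fallback suggested above can be approximated by filtering the PGN before indexing it. The sketch below assumes every game starts with an `[Event ...]` header line and keeps games that either have no Variant tag or have it set to "Standard"; treating "Standard" as safe is an assumption of this example. `drop_variant_games` is a hypothetical helper name.

```python
import re

EVENT_START = re.compile(r'^\[Event\s', re.MULTILINE)
VARIANT_TAG = re.compile(r'^\[Variant\s+"([^"]*)"\]', re.MULTILINE)


def drop_variant_games(pgn_text):
    """Return pgn_text without games whose Variant tag is not 'Standard'."""
    starts = [m.start() for m in EVENT_START.finditer(pgn_text)]
    ends = starts[1:] + [len(pgn_text)]
    kept = []
    for begin, end in zip(starts, ends):
        game = pgn_text[begin:end]
        tag = VARIANT_TAG.search(game)
        if tag is None or tag.group(1).lower() == "standard":
            kept.append(game)
    return "".join(kept)
```

The filtered text can then be written to a new file and passed to the usual make step.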

Search for exact FEN

This is not a book, so the search cannot be immediate, but it is a nice-to-have anyway, for instance to search for all the games with a specific ECO opening (the ECO-to-FEN mapping is up to the UI tool).

Sequential conditions

This is a very powerful concept!

The idea is to allow more than one condition to be valid at different points in the game. For instance we can search for all the games with a given opening (passing the corresponding sub-fen) and that end up with a given material distribution like KNNKB. Note that the 2 conditions are not required to be true at the same move, as is the normal case, just at least once per game, for the game to be selected.
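As a toy illustration of the idea, a game can be modelled as a list of per-ply states and each condition as a predicate. The sketch below walks the game once and records the ply at which each condition is first met, in order. The state fields and the function name are invented for the example, and letting one ply satisfy at most one condition is a simplification chosen here, not necessarily the engine's behaviour.

```python
def find_sequence(states, conditions):
    """Return the plies where the conditions are met in order, or None.

    The conditions need not hold at the same ply, only one after the
    other somewhere in the game.
    """
    plies = []
    idx = 0
    for ply, state in enumerate(states):
        if idx < len(conditions) and conditions[idx](state):
            plies.append(ply)
            idx += 1
    return plies if idx == len(conditions) else None


# Toy game: each state carries a position label and a material string.
game = [
    {"position": "start", "material": "full"},
    {"position": "opening", "material": "full"},
    {"position": "middlegame", "material": "KRNKRB"},
    {"position": "endgame", "material": "KNNKB"},
]
conditions = [
    lambda s: s["position"] == "opening",   # stands in for a sub-fen rule
    lambda s: s["material"] == "KNNKB",     # stands in for a material rule
]
print(find_sequence(game, conditions))
```

Swapping the two conditions makes the search fail, because the opening position never occurs after the KNNKB ending.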

Create csv from PGN

Is there a way to create a CSV file from PGN file?
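Scoutfish itself does not appear to offer this in the issues collected here, but a basic conversion can be done with the Python standard library. The sketch below writes one row per game with a few header tags and the raw movetext; the choice of columns and the function name `pgn_to_csv` are arbitrary, and comments or variations in the movetext are kept as plain text.

```python
import csv
import io
import re

TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def pgn_to_csv(pgn_text, columns=("Event", "White", "Black", "Result")):
    """Return CSV text with one row per game: chosen tags plus movetext."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(list(columns) + ["Moves"])

    def write_game(tags, moves):
        if tags or moves:
            row = [tags.get(c, "") for c in columns]
            writer.writerow(row + [" ".join(moves)])

    tags, moves = {}, []
    for line in pgn_text.splitlines():
        line = line.strip()
        m = TAG.match(line)
        if m:
            if moves:  # a header after movetext starts a new game
                write_game(tags, moves)
                tags, moves = {}, []
            tags[m.group(1)] = m.group(2)
        elif line:
            moves.append(line)
    write_game(tags, moves)
    return out.getvalue()
```

Writing the returned text to a file gives a CSV that spreadsheet tools can open directly.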

"exact-fen"

README.md says that a full FEN is just a special sub-fen. That's only true in the early phase of games. Later, especially in the endgame, a given sub-fen can match several game positions that have additional pieces on squares where the sub-fen contains "8". Of course an exact FEN can be expressed using one "sub-fen" and one "material" rule, but it would be more convenient to use "exact-fen" only. What do you think?
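The workaround mentioned above can be automated on the client side. The sketch below derives a material string from the board field of a FEN and combines it with the board as a "sub-fen". It assumes the material string lists white's pieces first and then black's, each in K Q R B N P order, as in strings like "KBNKN" used elsewhere in these issues; that ordering and both function names are assumptions of this example.

```python
def material_signature(board_field):
    """Build a string such as 'KBNKN' from a FEN board field.

    Assumed format: white pieces first, then black, each sorted as
    K, Q, R, B, N, P.
    """
    order = "KQRBNP"
    pieces = [c for c in board_field if c.isalpha()]
    white = sorted((c for c in pieces if c.isupper()), key=order.index)
    black = sorted((c.upper() for c in pieces if c.islower()), key=order.index)
    return "".join(white) + "".join(black)


def exact_fen_query(fen):
    """Query matching the exact piece placement of a FEN (sketch)."""
    board = fen.split()[0]
    return {"sub-fen": board, "material": material_signature(board)}


query = exact_fen_query("8/8/8/4k3/8/2b5/8/R3K3 w - - 0 1")
```

The "material" rule excludes positions that have extra pieces on the squares the sub-fen leaves empty, which is exactly the gap described in this issue.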

Inconsistent ply returned by scout

I tested scoutfish to filter games after 1.e4 using two different scoutfish rules. I found that the "sub-fen" rule gives correct plies:

tamas@tami:~/pychess/scoutfish$ src/scoutfish scout pgn/famous_games.scout '{ "sub-fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR" }'

{
    "moves": 28873,
    "match count": 229,
    "moves/second": 4812166,
    "processing time (ms)": 6,
    "matches":
    [
        { "ofs": 666, "ply": [1] }, 
        { "ofs": 2008, "ply": [1] }, 
        { "ofs": 3313, "ply": [1] },

but "white-move" rule gives incorrect plies (ply - 1):

tamas@tami:~/pychess/scoutfish$ src/scoutfish scout pgn/famous_games.scout '{ "white-move": "e4" }'

{
    "moves": 28873,
    "match count": 390,
    "moves/second": 7218250,
    "processing time (ms)": 4,
    "matches":
    [
        { "ofs": 0, "ply": [10] }, 
        { "ofs": 666, "ply": [0] }, 
        { "ofs": 2008, "ply": [0] }, 
        { "ofs": 3313, "ply": [0] },

Add max_game_offsets support to scoutfish

The support is there in chess_db and the python client needs a bound otherwise python will take a while to process the gigantic JSON.

Support capture notation in SAN move

Allow writing something like:

{"white-move": ["Nxe1", "Raxe1"]}

Currently the capture notation 'x' is not supported and should not be used in the SAN move.

Test case question

What is https://github.com/mcostalba/scoutfish/blob/master/src/test.py#L48 test good for?
Why is it a "streak", and why does it contain 2 "result" rules?

Code does not compile on Mac OS X

parser.cpp:550:5: error: no matching function for call to 'mem_map'
    mem_map(dbName.c_str(), &baseAddress, &mapping, &size);
    ^~~~~~~
./misc.h:111:6: note: candidate function not viable: no known conversion from 'uint64_t *' (aka 'unsigned long long *') to 'size_t *' (aka 'unsigned long *') for 4th argument
void mem_map(const char* fname, void** baseAddress, uint64_t* mapping, size_t* size);
     ^
46 warnings and 1 error generated.
make[1]: *** [parser.o] Error 1
make: *** [build] Error 2

The root cause is that with Mac OS X's clang, uint64_t is unsigned long long while size_t is unsigned long, so a uint64_t* cannot be passed where a size_t* is expected; with gcc on 64-bit Linux both types are unsigned long. Should we cast explicitly to solve the issue?

Different ply returned by same position

As a test case, choose the position after 1. e4

q = {'stm': 'black', 'sub-fen': 'rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR'}
s = {'sequence': [{'white-move': 'e4'}, {'stm': 'black', 'sub-fen': 'rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR'}]}

m1 = p.scout(q)['matches']
m2 = p.scout(s)['matches']

print(m1[0])  #   {u'ofs': 666, u'ply': [1]}
print(m2[0])  #   {u'ofs': 666, u'ply': [1, 2]}

m2[0] is wrong: it should be ply: [1], since the position is the same, and in any case the sub-fen is matched at ply 1, not at 2.

The error arises because move_rule is not reset after a single condition matches, but only at the end of the game. Moreover, even if we correct the reset, we would end up with repeated plies in matches output, something like:

{u'ofs': 666, u'ply': [1, 1]}

So we would need to add additional logic to remove duplicates before outputting the result.
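The de-duplication step is simple enough to sketch. Written in Python for illustration (the engine itself is C++), it keeps the first occurrence of each ply and preserves order; `unique_plies` is a hypothetical name.

```python
def unique_plies(match):
    """Return a copy of a match with repeated plies removed,
    keeping the order in which they first appear."""
    seen = set()
    plies = []
    for ply in match["ply"]:
        if ply not in seen:
            seen.add(ply)
            plies.append(ply)
    return {"ofs": match["ofs"], "ply": plies}


cleaned = unique_plies({"ofs": 666, "ply": [1, 1]})
```

The same filter could also be applied on the client side to the "matches" list as a stopgap.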