
ericsteinberger / pokerrl

387 stars · 86 forks · 308 KB

Framework for Multi-Agent Deep Reinforcement Learning in Poker

License: MIT License

Python 100.00%
Topics: deep-learning, framework, gym-environment, poker, ray, reinforcement-learning, reinforcement-learning-algorithms, research

pokerrl's People

Contributors: diditforlulz273, ericsteinberger, geblanco

pokerrl's Issues

Cannot use BR evaluation when game_cls == LimitHoldem

Hi Eric,

I have tried to use the hyperparameters from your paper's experiments to train a model with game_cls = LimitHoldem.

However, if I set the evaluation method to 'BR' (best response), the program aborts at the stage of "Creating BR Mode Evaluator...". Is there anything different between game_cls=LimitHoldem and game_cls=StandardLeduc? If so, how could I modify the code accordingly in order to train the Deep CFR model?
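For reference, roughly how I configured it (the TrainingProfile/Driver/eval_methods names below follow my reading of the DeepCFR example scripts; treat the exact arguments as assumptions):

    # Sketch of the setup that triggers the hang -- argument names are my best guess
    # from the DeepCFR example scripts and may not match exactly.
    from DeepCFR.TrainingProfile import TrainingProfile
    from DeepCFR.workers.driver.Driver import Driver
    from PokerRL.game.games import LimitHoldem

    if __name__ == '__main__':
        ctrl = Driver(t_prof=TrainingProfile(name="LimitHoldem_BR_debug",
                                             game_cls=LimitHoldem),
                      eval_methods={"br": 1},  # request exact best-response evaluation
                      n_iterations=None)       # run until stopped manually
        ctrl.run()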

Thanks,
Joena

Cppwrapper source

Hi, thanks in advance for making this public.

The Flop and Hold'em games use the C++ libraries in
PokerRL/game/_/cpp_wrappers. Is it possible to have the source code for these classes published?

BR doesn't work with FlopHoldem

I tried to evaluate my agent's exploitability with Flop5Holdem, so I changed eval_methods to br, but when I launched it, the program froze at "Creating BR Evaluator..."

DCFR_NN_Losses and Exploitability

Hi Eric!

Thank you for making this public!

I have some general questions about the correlation between DCFR_NN_Losses and the exploitability of the agent in big games.
Would you please give a hint:

  1. Will the exploitability of the agent keep decreasing with global iterations if DCFR_NN_Losses does not fall below a certain value? In other words, does it make sense to keep running global iterations if DCFR_NN_Losses are stuck at, for example, 0.2?

  2. Which of the parameters for training the AdvantageNet (n_batches_adv_training, mini_batch_size_adv, max_buffer_size) has the greatest impact on reducing DCFR_NN_Losses? (A rough sketch of where these sit follows below.)
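A minimal sketch of where those three knobs would live, assuming they are plain TrainingProfile keyword arguments under exactly these names (which I have not verified):

    # Sketch only -- parameter names are taken from the question above and assumed to be
    # TrainingProfile keyword arguments; the values here are arbitrary placeholders.
    from DeepCFR.TrainingProfile import TrainingProfile

    t_prof = TrainingProfile(name="advantage_net_tuning",
                             n_batches_adv_training=4000,  # SGD steps per CFR iteration
                             mini_batch_size_adv=2048,     # batch size for the advantage net
                             max_buffer_size=int(2e6))     # reservoir buffer capacity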

Could you add a simple example of "interactive_user_v_agent"?

Eric Steinberger,
Thank you for sharing such a great poker framework. Even though I know little about machine learning, I still find this project a lot of fun. I am trying to have some fun playing poker against an agent that uses your algorithm; how can that work? Could you give an example of how to run interactive_user_v_agent.py?

Starting from fixed board

Thank you for the really great work, it is extremely useful for the community!
I tried to evaluate my algorithm on a small Texas Hold'em game, starting from the flop with fixed initial board cards. In my opinion, such small games are very much needed: they allow evaluating an algorithm with a big deck size while still measuring exploitability explicitly. So I changed the poker environment functions that deal cards on new rounds and added a starting deck size, but then I found the same variables used in 'get_n_cards_out_at_LUT' and realized that there could be many unseen dependencies. Could you give a little advice on how to implement a fixed starting board? (A toy sketch of what I mean follows below.)
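For illustration, a toy, self-contained sketch of the "fixed flop" idea (this is not PokerRL code; the point is that any separate bookkeeping, such as the LUT mentioned above, has to stay consistent with whatever dealing logic is changed):

    import random

    # Toy illustration only -- card strings and dealing logic are invented here and do
    # not reuse PokerRL's internal representation.
    FIXED_FLOP = ["2c", "2h", "2d"]
    FULL_DECK = [r + s for r in "23456789TJQKA" for s in "shdc"]

    def deal_board(round_name, deck, board):
        """Deal community cards for one round, always using FIXED_FLOP on the flop."""
        n_new = {"flop": 3, "turn": 1, "river": 1}[round_name]
        new_cards = list(FIXED_FLOP) if round_name == "flop" else random.sample(deck, n_new)
        for c in new_cards:
            deck.remove(c)  # keep the remaining deck consistent with the dealt board
        board.extend(new_cards)

    deck, board = list(FULL_DECK), []
    for rnd in ("flop", "turn", "river"):
        deal_board(rnd, deck, board)
    print(board)  # always starts with ['2c', '2h', '2d']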

More examples

Great project, but it is hard to figure out the code.
Could you please add some more examples and documentation?
The examples in this repository do not match the documentation. Where are Driver and Evaluator? The examples from the DeepCFR project are too complicated.
How is each component of the framework meant to be used?
How can they be used in interactive mode?

Using a GPU with Ray locally

Hello, I'm trying to understand how to use my NVIDIA GPU locally with the "distributed" version and Ray.

torch.cuda.get_device_name(0) returns the name of my 1070
torch.cuda.is_available() returns True

I tried modifying "num_gpus" both in "mayberay" and "dist".
Also, as suggested in the Ray documentation, I added ray.get_gpu_ids(), which correctly reflects the num_gpus value passed to ray.init(num_gpus=...) in mayberay.py.

The program apparently works fine, but when I check with "watch -n 2 nvidia-smi" it does not seem to use the GTX at all.

I'm using the DeepCFR example that runs Leduc with a lower number of workers.

I can't find any solution in Ray's documentation.
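A minimal, self-contained Ray check (independent of PokerRL) that shows the usual cause of this: a remote worker only gets a GPU if its own @ray.remote declaration asks for one, regardless of what ray.init was given. Whether that matches how PokerRL's workers declare their resources is an assumption on my part:

    import ray
    import torch

    ray.init(num_gpus=1)  # declare how many GPUs the local "cluster" has

    @ray.remote(num_gpus=1)  # without num_gpus here, the worker is scheduled with 0 GPUs
    def gpu_check():
        # Inside the worker: which GPUs did Ray assign, and can torch actually use them?
        return ray.get_gpu_ids(), torch.cuda.is_available()

    print(ray.get(gpu_check.remote()))
    # e.g. ([0], True). If this prints ([], False), the worker was not granted a GPU,
    # and nvidia-smi will show no usage even though the driver process can see the card.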

Test your Poker Agents on pokerwars.io

Hi Guys,

just wanted to let you know that there is a free poker bot platform to test your agents in a more heterogeneous environment. I think the current bots are already pretty competitive, although it would be nice to compete against more ML experts. There are around 20-40 bots online almost 24/7.

Jump to the pokerwars leaderboard or check out the clients in several API languages on the pokerwars GitHub.

Hope to see some of you there and exchange some insights.

Cheers,
Simon

Observation space / infostate

Hey Eric,

thanks for making this public; I haven't found a good env so far that implements multiplayer NL. Am I understanding the code right that the observation space isn't actually perfect information, e.g. that it only contains the last couple of actions? Do you have any research on how this affects convergence?
I had a bit of trouble understanding the code, so I apologize if I just didn't read it right.

MCCFR addition?

It would be great to test MCCFR in these settings. Any idea if this might be included in the future?

BR Texas Holdem

Re-open from #6

Thank you again for the response)
I just proposed a modification of the Deep CFR algorithm which works really well on the small Leduc poker game, but as you know it is an extreme toy game, and to achieve any academic results I should measure exploitability in a Texas Hold'em game. Maybe, if you have access to proprietary tools for exploitability measurement, you could test my bot, and in the case of success and subsequent publication, it could be a collaboration.
Another question: you previously mentioned that local best response is supported for large games, but as far as I know it is a modified version of best response, so it is quite strange that vanilla BR is supported only for toy games).

Originally posted by @SavvaI in #6 (comment)

eval_agent.pkl not found

Hi, Eric!
In /PokerRL/examples/interactive_user_v_agent.py it says we should give a path to eval_agent.pkl.
But I searched the project and couldn't find it, nor have I found a way to generate it.

So, can you tell me where I can get the eval_agent.pkl file?

Thanks a lot!
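For anyone hitting the same question: the .pkl is not shipped with the repo; your own training run has to export it. A minimal sketch, assuming the eval agent exposes a store_to_disk counterpart to the load_from_disk call used in the examples (the store method name and arguments are my assumption):

    # Sketch only -- store_to_disk is an assumption on my part; load_from_disk with
    # path_to_eval_agent is the call used in the DeepCFR examples.
    from DeepCFR.EvalAgentDeepCFR import EvalAgentDeepCFR

    # After training, serialise the eval agent held by the training run, e.g.:
    # eval_agent.store_to_disk(path="trained_agents", file_name="eval_agent")

    # interactive_user_v_agent.py then only needs the resulting path:
    eval_agent = EvalAgentDeepCFR.load_from_disk(
        path_to_eval_agent="trained_agents/eval_agent.pkl")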

Sub tree solving

Hey,

I was wondering how one would go about solving a game of NLTH on only a few subsets of the full game tree.
For example, how could I limit the poker engine used when training CFR so that, let's say, the flop 2c2h2d always gets dealt? Another example would be to have a fixed starting hand.

What I am trying to do is this: find flop textures that are similar and group them together. For that I would like to see how each hand acts on a particular board, and so on.

Another completely separate thing I noticed: you have a DLL for evaluating hands; is it possible to get the source code for that? Some variants of poker have smaller decks and different rules.
For example, in 6+ No Limit Hold'em players play with a 36-card deck, and because of that the hand rankings are different; a flush beats a full house, etc.

Thank you,
Jonas

EvalAgentBase class errors

Hi @TinkeringCode,

First of all, great work you have here! Really useful.

I am trying to replicate some of your experiments (playing against DeepCFR, for example), but I am getting two errors.

When trying to play interactively against the algorithm (a modified examples/interactive_v_agent.py), if the human plays first, the notify_of_processed_tuple_action method gets executed (line 103) and errors with: TypeError: notify_of_processed_tuple_action() got an unexpected keyword argument 'action_tuple'. I think the arguments are reversed and misnamed in the EvalAgentBase class; it seems pretty straightforward (I can submit a PR fixing it).

The second error comes when the algorithm plays: the get_action_frac_tuple method gets called, but it does not exist in the base class: AttributeError: 'EvalAgentDeepCFR' object has no attribute 'get_action_frac_tuple'. This seems more difficult to solve. I haven't studied the code in depth; I assume the action could be obtained from the overridden get_action method, but it's not clear to me how to get the fraction or bet size.

MWE (inside the DeepCFR repo)

from os.path import dirname, abspath

from DeepCFR.EvalAgentDeepCFR import EvalAgentDeepCFR
from PokerRL.game.InteractiveGame import InteractiveGame

path_to_sdcfr_eval_agent = dirname(abspath(__file__)) + "/trained_agents/Example_FHP_SINGLE.pkl"

if __name__ == '__main__':
    eval_agent = EvalAgentDeepCFR.load_from_disk(path_to_eval_agent=path_to_sdcfr_eval_agent)

    # to replicate error 1, when prompted choose any action
    plays_first = [0]
    # then, to replicate error 2, change 
    # plays_first = [1]
    # so that the algorithm starts and the second error is triggered
    game = InteractiveGame(env_cls=eval_agent.env_bldr.env_cls,
                           env_args=eval_agent.env_bldr.env_args,
                           seats_human_plays_list=plays_first,
                           eval_agent=eval_agent,
                           )

    game.start_to_play()

Kind Regards,
Guillermo.

What is the lookup table used for?

Hole cards are encoded by their index in a lookup table before being saved to the buffer, and are finally decoded and fed into the neural networks.
Why don't we just save an array that represents the private observation? Is this just to save memory?
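A rough, self-contained illustration of the memory argument (my own sketch, not the repo's actual encoding): storing a small integer index per buffered sample versus the full per-card observation vector.

    import numpy as np

    # Illustrative sizes only -- PokerRL's real observation layout may differ.
    # A 2-card holding can be referenced by a single index into a lookup table of all
    # C(52, 2) = 1326 combinations, instead of storing the decoded card representation.
    index_per_sample = np.empty(1, dtype=np.int16)            # 2 bytes per sample
    one_hot_per_sample = np.zeros(2 * 52, dtype=np.float32)   # 2 cards x 52-dim one-hot = 416 bytes

    print(index_per_sample.nbytes, "vs.", one_hot_per_sample.nbytes, "bytes per sample")
    # Over the millions of samples kept in a reservoir buffer, the index encoding saves
    # roughly two orders of magnitude of memory; decoding back to the network input
    # only happens batch by batch at training time.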
