
ericsteinberger / pokerrl

387 stars · 86 forks · 308 KB

Framework for Multi-Agent Deep Reinforcement Learning in Poker

License: MIT License

Python 100.00%
Topics: deep-learning, framework, gym-environment, poker, ray, reinforcement-learning, reinforcement-learning-algorithms, research

pokerrl's People

Contributors: diditforlulz273, ericsteinberger, geblanco

pokerrl's Issues

Cannot use BR evaluation when game_cls == LimitHoldem

Hi Eric,

I have tried to use the hyperparameters from your paper's experiments to train a model with game_cls = LimitHoldem.

However, if I set the evaluation method to 'BR' (best response), the program aborts at the stage of "Creating BR Mode Evaluator...". Is there anything different between game_cls=LimitHoldem and game_cls=StandardLeduc? If so, how could I modify the code accordingly in order to train the Deep CFR model?
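For reference, roughly how I configured it (the TrainingProfile/Driver/eval_methods names below follow my reading of the DeepCFR example scripts; treat the exact arguments as assumptions):

    # Sketch of the setup that triggers the hang -- argument names are my best guess
    # from the DeepCFR example scripts and may not match exactly.
    from DeepCFR.TrainingProfile import TrainingProfile
    from DeepCFR.workers.driver.Driver import Driver
    from PokerRL.game.games import LimitHoldem

    if __name__ == '__main__':
        ctrl = Driver(t_prof=TrainingProfile(name="LimitHoldem_BR_debug",
                                             game_cls=LimitHoldem),
                      eval_methods={"br": 1},  # request exact best-response evaluation
                      n_iterations=None)       # run until stopped manually
        ctrl.run()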

Thanks,
Joena

Cppwrapper source

Hi, thanks in advance for making this public.

The Flop and Hold'em games use the C++ libraries in
PokerRL/game/_/cpp_wrappers. Is it possible to have the source code for these classes published?

BR doesn't work with FlopHoldem

I tried to evaluate my agent's exploitability with Flop5Holdem, so I changed eval_methods to br, but when I launched it, the program froze at "Creating BR Evaluator..."

DCFR_NN_Losses and Exploitability

Hi Eric!

Thank you for making this public!

I have some general questions about the correlation between DCFR_NN_Losses and the exploitability of the agent in big games.
Would you please give a hint:

  1. Will the exploitability of the agent keep decreasing with global iterations if DCFR_NN_Losses does not fall below a certain value? In other words, does it make sense to keep running global iterations if DCFR_NN_Losses are stuck at, for example, 0.2?

  2. Which of the parameters for training the AdvantageNet (n_batches_adv_training, mini_batch_size_adv, max_buffer_size) has the greatest impact on reducing DCFR_NN_Losses? (A rough sketch of where these sit follows below.)
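A minimal sketch of where those three knobs would live, assuming they are plain TrainingProfile keyword arguments under exactly these names (which I have not verified):

    # Sketch only -- parameter names are taken from the question above and assumed to be
    # TrainingProfile keyword arguments; the values here are arbitrary placeholders.
    from DeepCFR.TrainingProfile import TrainingProfile

    t_prof = TrainingProfile(name="advantage_net_tuning",
                             n_batches_adv_training=4000,  # SGD steps per CFR iteration
                             mini_batch_size_adv=2048,     # batch size for the advantage net
                             max_buffer_size=int(2e6))     # reservoir buffer capacity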

Could you add a simple example of "interactive_user_v_agent"?

Eric Steinberger,
Thank you for sharing such a great poker framework. Even though I know little about machine learning, I still find this project a lot of fun. I am trying to have some fun playing poker against an agent that uses your algorithm; how can that work? Could you give an example of how to run interactive_user_v_agent.py?

Starting from fixed board

Thank you for the really great work, it is extremely useful for the community!
I tried to evaluate my algorithm on a small Texas Hold'em game, starting from the flop with fixed initial board cards. In my opinion, such small games are very much needed: they allow evaluating an algorithm with a big deck size while still measuring exploitability explicitly. So I changed the poker environment functions that deal cards on new rounds and added a starting deck size, but then I found the same variables used in 'get_n_cards_out_at_LUT' and realized that there could be many unseen dependencies. Could you give a little advice on how to implement a fixed starting board? (A toy sketch of what I mean follows below.)
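For illustration, a toy, self-contained sketch of the "fixed flop" idea (this is not PokerRL code; the point is that any separate bookkeeping, such as the LUT mentioned above, has to stay consistent with whatever dealing logic is changed):

    import random

    # Toy illustration only -- card strings and dealing logic are invented here and do
    # not reuse PokerRL's internal representation.
    FIXED_FLOP = ["2c", "2h", "2d"]
    FULL_DECK = [r + s for r in "23456789TJQKA" for s in "shdc"]

    def deal_board(round_name, deck, board):
        """Deal community cards for one round, always using FIXED_FLOP on the flop."""
        n_new = {"flop": 3, "turn": 1, "river": 1}[round_name]
        new_cards = list(FIXED_FLOP) if round_name == "flop" else random.sample(deck, n_new)
        for c in new_cards:
            deck.remove(c)  # keep the remaining deck consistent with the dealt board
        board.extend(new_cards)

    deck, board = list(FULL_DECK), []
    for rnd in ("flop", "turn", "river"):
        deal_board(rnd, deck, board)
    print(board)  # always starts with ['2c', '2h', '2d']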

More examples

Great project, but it is hard to figure out the code.
Could you please add some more examples and documentation?
The examples in this repository do not match the documentation. Where are Driver and Evaluator? The examples from the DeepCFR project are too complicated.
How is each component of the framework meant to be used?
How can they be used in interactive mode?

Using a GPU with Ray locally

Hello, I'm trying to understand how to use my NVIDIA GPU locally with the "distributed" version and Ray.

torch.cuda.get_device_name(0) returns the name of my 1070
torch.cuda.is_available() returns True

I tried modifying "num_gpus" both in "mayberay" and "dist".
Also, as suggested in the Ray documentation, I added ray.get_gpu_ids(), which correctly reflects the num_gpus value passed to ray.init(num_gpus=...) in mayberay.py.

The program apparently works fine, but when I check with "watch -n 2 nvidia-smi" it does not seem to use the GTX at all.

I'm using the DeepCFR example that runs Leduc with a lower number of workers.

I can't find any solution in Ray's documentation.
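A minimal, self-contained Ray check (independent of PokerRL) that shows the usual cause of this: a remote worker only gets a GPU if its own @ray.remote declaration asks for one, regardless of what ray.init was given. Whether that matches how PokerRL's workers declare their resources is an assumption on my part:

    import ray
    import torch

    ray.init(num_gpus=1)  # declare how many GPUs the local "cluster" has

    @ray.remote(num_gpus=1)  # without num_gpus here, the worker is scheduled with 0 GPUs
    def gpu_check():
        # Inside the worker: which GPUs did Ray assign, and can torch actually use them?
        return ray.get_gpu_ids(), torch.cuda.is_available()

    print(ray.get(gpu_check.remote()))
    # e.g. ([0], True). If this prints ([], False), the worker was not granted a GPU,
    # and nvidia-smi will show no usage even though the driver process can see the card.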

Test your Poker Agents on pokerwars.io

Hi Guys,

just wanted to let you know that there is a free poker bot platform to test your agents in a more heterogeneous environment. I think the current bots are already pretty competitive, although it would be nice to compete against more ML experts. There are around 20-40 bots online almost 24/7.

Jump to the pokerwars leaderboard or check out the clients in several API languages on the pokerwars GitHub.

Hope to see some of you there and exchange some insights.

Cheers,
Simon

Observation space / infostate

Hey Eric,

thanks for making this public; I haven't found a good env so far that implements multiplayer NL. Am I understanding the code right that the observation space isn't actually perfect information, e.g. that it only contains the last couple of actions? Do you have any research on how this affects convergence?
I had a bit of trouble understanding the code, so I apologize if I just didn't read it right.

MCCFR addition?

It would be great to test MCCFR in these settings. Any idea if this might be included in the future?

BR Texas Holdem

Re-open from #6

Thank you again for the response)
I just proposed a modification of the Deep CFR algorithm which works really well on the small Leduc poker game, but as you know it is an extreme toy game, and to achieve any academic results I should measure exploitability in a Texas Hold'em game. Maybe, if you have access to proprietary tools for exploitability measurement, you could test my bot, and in the case of success and subsequent publication, it could be a collaboration.
Another question: you previously mentioned that local best response is supported for large games, but as far as I know it is a modified version of best response, so it is quite strange that vanilla BR is supported only for toy games).

Originally posted by @SavvaI in #6 (comment)

eval_agent.pkl not found

Hi, Eric!
In /PokerRL/examples/interactive_user_v_agent.py it says we should give a path to eval_agent.pkl.
But I searched the project and couldn't find it, nor have I found a way to generate it.

So, can you tell me where I can get the eval_agent.pkl file?

Thanks a lot!
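For anyone hitting the same question: the .pkl is not shipped with the repo; your own training run has to export it. A minimal sketch, assuming the eval agent exposes a store_to_disk counterpart to the load_from_disk call used in the examples (the store method name and arguments are my assumption):

    # Sketch only -- store_to_disk is an assumption on my part; load_from_disk with
    # path_to_eval_agent is the call used in the DeepCFR examples.
    from DeepCFR.EvalAgentDeepCFR import EvalAgentDeepCFR

    # After training, serialise the eval agent held by the training run, e.g.:
    # eval_agent.store_to_disk(path="trained_agents", file_name="eval_agent")

    # interactive_user_v_agent.py then only needs the resulting path:
    eval_agent = EvalAgentDeepCFR.load_from_disk(
        path_to_eval_agent="trained_agents/eval_agent.pkl")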

Sub tree solving

Hey,

I was wondering how one would go about solving a game of NLTH on only a few subsets of the full game tree.
For example, how could I limit the poker engine used when training CFR so that, let's say, the flop 2c2h2d always gets dealt? Another example would be to have a fixed starting hand.

What I am trying to do is this: find flop textures that are similar and group them together. For that I would like to see how each hand acts on a particular board, and so on.

Another completely separate thing I noticed: you have a DLL for evaluating hands; is it possible to get the source code for that? Some variants of poker have smaller decks and different rules.
For example, in 6+ No Limit Hold'em players play with a 36-card deck, and because of that the hand rankings are different; a flush beats a full house, etc.

Thank you,
Jonas

EvalAgentBase class errors

Hi @TinkeringCode,

First of all, great work you have here! Really useful.

I am trying to replicate some of your experiments (playing against DeepCFR, for example), but I am getting two errors.

When trying to play interactively against the algorithm (a modified examples/interactive_v_agent.py), if the human plays first, the notify_of_processed_tuple_action method gets executed (line 103) and errors with: TypeError: notify_of_processed_tuple_action() got an unexpected keyword argument 'action_tuple'. I think the arguments are reversed and misnamed in the EvalAgentBase class; it seems pretty straightforward (I can submit a PR fixing it).

The second error comes when the algorithm plays: the get_action_frac_tuple method gets called, but it does not exist in the base class: AttributeError: 'EvalAgentDeepCFR' object has no attribute 'get_action_frac_tuple'. This seems more difficult to solve. I haven't studied the code in depth; I assume the action could be obtained from the overridden get_action method, but it's not clear to me how to get the fraction or bet size.

MWE (inside the DeepCFR repo)

from os.path import dirname, abspath

from DeepCFR.EvalAgentDeepCFR import EvalAgentDeepCFR
from PokerRL.game.InteractiveGame import InteractiveGame

path_to_sdcfr_eval_agent = dirname(abspath(__file__)) + "/trained_agents/Example_FHP_SINGLE.pkl"

if __name__ == '__main__':
    eval_agent = EvalAgentDeepCFR.load_from_disk(path_to_eval_agent=path_to_sdcfr_eval_agent)

    # to replicate error 1, when prompted choose any action
    plays_first = [0]
    # then, to replicate error 2, change 
    # plays_first = [1]
    # so that the algorithm starts and the second error is triggered
    game = InteractiveGame(env_cls=eval_agent.env_bldr.env_cls,
                           env_args=eval_agent.env_bldr.env_args,
                           seats_human_plays_list=plays_first,
                           eval_agent=eval_agent,
                           )

    game.start_to_play()

Kind Regards,
Guillermo.

What is the lookup table used for?

Hole cards are encoded by their index in a lookup table before being saved to the buffer, and are finally decoded and fed into the neural networks.
Why don't we just save an array that represents the private observation? Is this just to save memory?
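A rough, self-contained illustration of the memory argument (my own sketch, not the repo's actual encoding): storing a small integer index per buffered sample versus the full per-card observation vector.

    import numpy as np

    # Illustrative sizes only -- PokerRL's real observation layout may differ.
    # A 2-card holding can be referenced by a single index into a lookup table of all
    # C(52, 2) = 1326 combinations, instead of storing the decoded card representation.
    index_per_sample = np.empty(1, dtype=np.int16)            # 2 bytes per sample
    one_hot_per_sample = np.zeros(2 * 52, dtype=np.float32)   # 2 cards x 52-dim one-hot = 416 bytes

    print(index_per_sample.nbytes, "vs.", one_hot_per_sample.nbytes, "bytes per sample")
    # Over the millions of samples kept in a reservoir buffer, the index encoding saves
    # roughly two orders of magnitude of memory; decoding back to the network input
    # only happens batch by batch at training time.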
