ericsteinberger / PokerRL
Framework for Multi-Agent Deep Reinforcement Learning in Poker
License: MIT License
Hi Eric,
I have tried to use the hyperparameters from your paper's experiments to train a model with game_cls=LimitHoldem.
However, if I set the evaluation method to 'BR' (best response), the program aborts at the stage "Creating BR Mode Evaluator...". Is there anything different between game_cls=LimitHoldem and game_cls=StandardLeduc? If so, how should I modify the code so I can train the Deep CFR model?
Thanks,
Joena
Could you please compile the hand evaluation library for macOS?
Hi, In advance, thanks for making this public.
The Flop and Hold'em games use the C++ libraries in
PokerRL/game/_/cpp_wrappers. Would it be possible to publish the source code for these classes?
I tried to evaluate my agent's exploitability with Flop5Holdem, so I changed eval_methods to 'br', but when I launched it, the program froze on "Creating BR Evaluator...".
Hi Eric!
Thank you for making this public!
I have some general questions about the correlation between DCFR_NN_Losses and the agent's exploitability in big games.
Could you give a hint on the following:
Will the agent's exploitability keep decreasing with global iterations if DCFR_NN_Losses does not fall below a certain value? In other words, does it make sense to keep running global iterations if DCFR_NN_Losses is stuck at, say, 0.2?
Which of the AdvantageNet training parameters (n_batches_adv_training, mini_batch_size_adv, max_buffer_size) has the greatest impact on reducing DCFR_NN_Losses?
Eric Steinberger,
Thank you for sharing such a great poker framework. Even though I know little about machine learning, I've still had a lot of fun with this project. I am trying to play poker against an agent that uses your algorithm; how does that work? Could you give an example of how to run interactive_user_v_agent.py?
Thank you for really great job, extremely useful for community!
I tried to evaluate my algorithm on a small Texas Hold'em game starting from the flop with fixed initial board cards. In my opinion, such small games are very useful for evaluating algorithms with a big deck while still allowing exploitability to be measured explicitly. So I changed the poker environment's card-dealing functions for the new rounds and added a starting deck size, but then I found the same variables used in 'get_n_cards_out_at_LUT' and realized there could therefore be many hidden dependencies. Could you give some advice on implementing a fixed starting board?
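For what it's worth, the core idea can be sketched independently of PokerRL's internals: remove the fixed board cards from the deck before every deal, then reveal them deterministically. The names below (FULL_DECK, FIXED_FLOP, deal_holes) are illustrative, not part of PokerRL's API; the LUT dependencies mentioned above would still need to be adjusted separately.

```python
import random

# Illustrative sketch, not PokerRL code: deal hole cards from a deck that
# excludes a fixed flop, so the flop can later be revealed without collisions.
FULL_DECK = [(rank, suit) for rank in range(13) for suit in range(4)]
FIXED_FLOP = [(0, 0), (0, 1), (0, 2)]  # e.g. 2c 2d 2h, always dealt as the board

def deal_holes(n_players, rng=random):
    # Every deal starts from the full deck minus the fixed board cards.
    deck = [c for c in FULL_DECK if c not in FIXED_FLOP]
    rng.shuffle(deck)
    return [(deck[2 * i], deck[2 * i + 1]) for i in range(n_players)]

holes = deal_holes(2)
```

Any lookup tables keyed by "cards out" would then have to treat the fixed board as permanently removed as well, which is where the hidden dependencies in 'get_n_cards_out_at_LUT' come in.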
Great project, but the code is hard to figure out.
Could you please add some more examples and documentation?
The examples in this repository do not match the documentation. Where are Driver and Evaluator? The examples in the DeepCFR project are too complicated.
How is each component of the framework used?
How can they be used in interactive mode?
Hello, I'm trying to understand how to use my NVIDIA GPU locally with the "distributed" version and ray.
torch.cuda.get_device_name(0) returns the name of my 1070, and
torch.cuda.is_available() returns True.
I tried modifying "num_gpus" both in "mayberay" and "dist".
Also, as suggested in the ray documentation, I added ray.get_gpu_ids(), which correctly returns the number set in ray.init(num_gpus=...) in mayberay.py.
The program apparently works fine, but when I check with "watch -n 2 nvidia-smi", it does not seem to use the GTX at all.
I'm using the DeepCFR example, which uses Leduc with a lower number of workers.
I can't find any solution in ray's documentation.
Hi Guys,
just wanted to let you know that there is a free poker bot platform where you can test your agents in a more heterogeneous environment. I think the current bots are already pretty competitive, although it would be nice to compete against more ML experts. There are around 20-40 bots online almost 24/7.
Jump to the pokerwars leaderboard, or check out the API clients in several languages on the pokerwars GitHub.
Hope to see some of you there and exchange some insights.
Cheers,
Simon
Hey Eric,
thanks for making this public; I haven't found a good env so far that implements multiplayer NL. Am I understanding the code correctly that the observation space isn't actually perfect information, e.g. it contains only the last couple of actions? Do you have any research on how this affects convergence?
I had a bit of trouble understanding the code, so I apologize if I just misread it.
Can something like Pluribus be implemented with this?
It would be great to test MCCFR in these settings. Any idea if this might be included in the future?
Re-open from #6
Thank you again for the response :)
I just proposed a modification of the Deep CFR algorithm which works really well on the small Leduc poker game, but as you know that is an extremely toy game, and to achieve any academic result I should measure exploitability in a Texas Hold'em game. If you have access to proprietary tools for exploitability measurement, maybe you could test my bot, and in case of success, and therefore publication, it could be a collaboration.
Another question: you previously mentioned that local best response is supported for large games, but as far as I know it is a modified version of best response, so it is quite strange that vanilla BR is supported only for toy games :)
Originally posted by @SavvaI in #6 (comment)
Hi, Eric!
In /PokerRL/examples/interactive_user_v_agent.py, it says we should give a path to eval_agent.pkl.
But I searched the project and couldn't find it, or a way to generate it.
Could you tell me where I can get the eval_agent.pkl file?
Thanks a lot!
Hey,
I was wondering how one would go about solving a game of NLTH on only a few subsets of the full game tree.
For example, how could I constrain the poker engine used when training CFR so that, let's say, the flop 2c2h2d is always dealt? Another example would be a fixed starting hand.
What I am trying to do is this: find flop textures that are similar and group them together. For that, I would like to see how each hand acts on a particular board, and so on.
Another, completely separate thing I noticed: you have a DLL for evaluating hands; would it be possible to get the source code for it? Some poker variants have smaller decks and different rules.
For example, in 6+ No Limit Hold'em players use a 36-card deck, and because of that the hand rankings are different; a flush beats a full house, etc.
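The ranking change is just a reordering of the hand-category hierarchy, which a hand evaluator would need to expose as configurable. A toy illustration (category names are mine; only the flush/full-house swap mentioned above is encoded, since other 6+ rule variations differ between rooms):

```python
# Hand-category hierarchy, weakest to strongest, for a standard 52-card deck
# versus a 36-card short deck, where flushes are rarer (9 cards per suit)
# and therefore outrank full houses.
STANDARD_ORDER = [
    "high_card", "pair", "two_pair", "trips",
    "straight", "flush", "full_house", "quads", "straight_flush",
]
SHORT_DECK_ORDER = [
    "high_card", "pair", "two_pair", "trips",
    "straight", "full_house", "flush", "quads", "straight_flush",
]

def beats(category_a, category_b, order):
    # True if category_a outranks category_b under the given hierarchy.
    return order.index(category_a) > order.index(category_b)
```

A hard-coded evaluator (like a compiled DLL with baked-in lookup tables) can't express this swap, which is one reason having the C++ source would help.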
Thank you,
Jonas
Hi @TinkeringCode,
First of all, great work you have here! Really useful.
I am trying to replicate some of your experiments (playing against Deep CFR, for example), but I am getting two errors.
First, trying to play interactively against the algorithm (modified examples/interactive_v_agent.py), when the human plays first, the notify_of_processed_tuple_action method gets executed (line 103) and errors with: TypeError: notify_of_processed_tuple_action() got an unexpected keyword argument 'action_tuple'. I think the arguments are reversed and misnamed in the EvalAgentBase class; it seems pretty straightforward (I can submit a PR fixing it).
The second error comes when the algorithm plays: the get_action_frac_tuple method gets called, but it does not exist in the base class: AttributeError: 'EvalAgentDeepCFR' object has no attribute 'get_action_frac_tuple'. This seems more difficult to solve. I haven't studied the code in depth; I assume the action could be obtained from the overridden get_action method, but it's not clear to me how to get the fraction or bet size.
MWE (inside the DeepCFR repo):

from os.path import dirname, abspath

from DeepCFR.EvalAgentDeepCFR import EvalAgentDeepCFR
from PokerRL.game.InteractiveGame import InteractiveGame

path_to_sdcfr_eval_agent = dirname(abspath(__file__)) + "/trained_agents/Example_FHP_SINGLE.pkl"

if __name__ == '__main__':
    eval_agent = EvalAgentDeepCFR.load_from_disk(path_to_eval_agent=path_to_sdcfr_eval_agent)

    # to replicate error 1, when prompted choose any action
    plays_first = [0]
    # then, to replicate error 2, change to
    # plays_first = [1]
    # so that the algorithm starts and the second error is triggered

    game = InteractiveGame(env_cls=eval_agent.env_bldr.env_cls,
                           env_args=eval_agent.env_bldr.env_args,
                           seats_human_plays_list=plays_first,
                           eval_agent=eval_agent,
                           )
    game.start_to_play()
Kind Regards,
Guillermo.
Hole cards are encoded as indices into a lookup table before being saved in the buffer, and finally decoded and fed into the neural networks.
Why don't we just save an array representing the private observation directly? Is it just to save memory?