ericsteinberger / deep-cfr
Scalable Implementation of Deep CFR and Single Deep CFR
License: MIT License
This was the first result when I looked for a Deep CFR implementation, and I was wondering how feasible you think it would be to adapt it to a different hidden-information game. Would supplying a new game_cls
with alternate implementations just work? Or is there more poker knowledge/assumptions embedded in the code?
Since 'aprx_imm_reg' here is computed for every action and put into the buffer without being summed up, I have no idea why
'aprx_imm_reg *= legal_action_mask / n_actions_to_smpl'
I think it is because I could not understand the formula here (v~(I|a) * p(a) * |A(I)|), and I failed to find the corresponding part in your paper:
"""
Last state values are the average, not the sum of all samples of that state since we add
v~(I) = * p(a) * |A(I)|. Since we sample multiple actions on each traverser node, we have to average over
their returns like: v~(I) * Sum_a=0_N (v~(I|a) * p(a) * ||A(I)|| / N).
"""
Is there any reference for it?
Thanks a lot.
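The averaging in the comment quoted above can be sketched numerically. This is a minimal illustration (not the repository's code): it assumes `p[a]` is the policy probability of action `a` at infoset I, `v_child[a]` is the sampled counterfactual value after taking `a`, and `N` of the `|A(I)|` legal actions were sampled uniformly. Each sample is importance-weighted by `|A(I)| / N`, so when all actions are sampled the estimator reduces to the exact expectation.

```python
def sampled_infoset_value(p, v_child, sampled_actions, n_legal):
    """Estimate v~(I) from a uniformly sampled subset of actions.

    Each sampled action contributes v~(I|a) * p(a) * |A(I)| / N, which
    makes the estimator unbiased under uniform sampling of N actions.
    """
    N = len(sampled_actions)
    return sum(v_child[a] * p[a] * n_legal / N for a in sampled_actions)

# Example with 3 legal actions; sampling all of them (N == |A(I)|)
# reduces to the exact sum over actions.
p = {0: 0.2, 1: 0.5, 2: 0.3}
v = {0: 1.0, 1: -2.0, 2: 4.0}
exact = sum(v[a] * p[a] for a in p)                     # 0.2 - 1.0 + 1.2 = 0.4
est = sampled_infoset_value(p, v, [0, 1, 2], n_legal=3)
assert abs(est - exact) < 1e-9
```

With fewer sampled actions the estimate is noisy but unbiased, which is why the buffer stores per-action values scaled by `legal_action_mask / n_actions_to_smpl` rather than a summed value.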
2019-12-10 14:47:13,655 ERROR worker.py:1559 -- Possible unhandled error from worker: ray_LearnerActor:update() (pid=93368, host=iMac-adminposh-6.local)
File "Deep-CFR/DeepCFR/workers/la/dist.py", line 11, in init
LocalLearnerActor.init(self, t_prof=t_prof, worker_id=worker_id, chief_handle=chief_handle)
File "Deep-CFR/DeepCFR/workers/la/local.py", line 24, in init
self.env_bldr = rl_util.get_env_builder(t_prof=t_prof)
File "python3.7/site-packages/PokerRL/rl/rl_util.py", line 84, in get_env_builder
return ENV_BUILDER(env_cls=get_env_cls_from_str(t_prof.game_cls_str), env_args=t_prof.module_args["env"])
File "python3.7/site-packages/PokerRL/game/wrappers.py", line 49, in init
super().init(env_cls=env_cls, env_args=env_args)
File "python3.7/site-packages/PokerRL/game//EnvWrapperBuilderBase.py", line 25, in init
self.lut_holder = env_cls.get_lut_holder()
File "python3.7/site-packages/PokerRL/game//rl_env/game_rules.py", line 312, in get_lut_holder
return LutHolderHoldem(cls)
File "python3.7/site-packages/PokerRL/game//look_up_table.py", line 303, in init
super().init(lut_getter=LutGetterHoldem(env_cls=env_cls))
File "python3.7/site-packages/PokerRL/game//look_up_table.py", line 100, in init
n_cards_out_lut=self.get_n_cards_out_at_LUT())
File "python3.7/site-packages/PokerRL/game//cpp_wrappers/CppLUT.py", line 18, in init
"lib_luts." + self.CPP_LIB_FILE_ENDING))
File "python3.7/site-packages/PokerRL//CppWrapper.py", line 18, in init
self._clib = ctypes.cdll.LoadLibrary(path_to_dll)
File "python3.7/ctypes/init.py", line 434, in LoadLibrary
return self._dlltype(name)
File "python3.7/ctypes/init.py", line 356, in init
self.handle = dlopen(self.name, mode)
OSError: dlopen(python3.7/site-packages/PokerRL/game//cpp_wrappers/lib_luts.so, 6): no suitable image found. Did find:
python3.7/site-packages/PokerRL/game//cpp_wrappers/lib_luts.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
python3.7/site-packages/PokerRL/game//cpp_wrappers/lib_luts.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
2019-12-10 14:47:14,423 WARNING worker.py:1619 -- The actor or task with ID ffffffffffff26acd28101000000 is pending and cannot currently be scheduled. It requires {CPU: 1.000000} for execution and {CPU: 1.000000} for placement, but this node only has remaining {object_store_memory: 1.562500 GiB}, {memory: 2.832031 GiB}. In total there are 0 pending tasks and 16 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster.
python paper_experiment_sdcfr_vs_deepcfr_h2h.py
************************** Initing args for: EXPERIMENT_SD-CFR_vs_Deep-CFR_FHP **************************
Running with 20 LearnerActor Workers.
Traceback (most recent call last):
  File "paper_experiment_sdcfr_vs_deepcfr_h2h.py", line 75, in <module>
    n_iterations=150,
  File "/Users/d.saienko/Data Science/Deep-CFR/DeepCFR/workers/driver/Driver.py", line 21, in __init__
    chief_cls=Chief, eval_agent_cls=EvalAgentDeepCFR)
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/PokerRL/rl/base_cls/workers/DriverBase.py", line 35, in __init__
    self._ray.init_local()
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/PokerRL/rl/MaybeRay.py", line 44, in init_local
    object_store_memory=min(2 * (10 ** 10), int(psutil.virtual_memory().total * 0.4)),
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1323, in init
    head=True, shutdown_at_exit=False, ray_params=ray_params)
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 149, in __init__
    self.start_head_processes()
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 571, in start_head_processes
    self.start_redis()
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 419, in start_redis
    self.get_resource_spec(),
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 240, in get_resource_spec
    self._ray_params.redis_max_memory).resolve(is_head=self.head)
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/resource_spec.py", line 195, in resolve
    int(100 * (memory / system_memory))))
ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors (-2.96 GB) is less than -17% of total. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
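Per the traceback, PokerRL's MaybeRay.init_local() reserves min(20 GB, 40% of RAM) for Ray's object store, which on a small machine can leave a negative budget for tasks and actors (hence the -2.96 GB). A sketch of computing smaller caps to pass via the `memory` and `object_store_memory` kwargs the error message itself names (the fractions here are assumptions to tune for your machine):

```python
def ray_memory_limits(total_ram_bytes, store_frac=0.1, task_frac=0.2):
    """Return (object_store_memory, memory) caps for ray.init().

    Uses much smaller fractions than PokerRL's default 40% object store,
    so tasks and actors keep a positive memory budget on small machines.
    """
    object_store = min(2 * 10 ** 9, int(total_ram_bytes * store_frac))
    memory = min(4 * 10 ** 9, int(total_ram_bytes * task_frac))
    return object_store, memory

# e.g. on an 8 GiB Mac:
store, mem = ray_memory_limits(8 * 1024 ** 3)
# then: ray.init(memory=mem, object_store_memory=store)
```

Since PokerRL hard-codes its values inside MaybeRay.init_local(), applying this likely means editing that call (or initializing Ray yourself before the Driver does).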
Hi Eric,
I have tried to use your paper-experiment hyperparameters to train a model with game_cls=LimitHoldem.
However, if I set the evaluation method to 'BR' (best response), the program aborts at the stage of "Creating BR Mode Evaluator...". I wonder whether there is any difference between game_cls=LimitHoldem and game_cls=StandardLeduc? If so, how could I modify the code accordingly in order to train the Deep CFR model?
Thanks,
Joena
I tried to evaluate the sdcfr_vs_deepcfr_h2h.py agent's exploitability with Flop5Holdem, so I changed eval_methods to 'br', but when I launched it, the program froze at "Creating BR Evaluator".
I'm getting this error when running the code with: python paper_experiment_bigleduc_exploitability.py
    return _env.observation_space.shape[0] + self.action_vector_size
TypeError: 'NoneType' object is not subscriptable
Did I miss something when running it?
I have been experimenting with the pretrained agents in this repository. Using the example code from PokerRL, I've loaded a trained agent and tried to play against it:
from PokerRL.game.InteractiveGame import InteractiveGame
from DeepCFR.EvalAgentDeepCFR import EvalAgentDeepCFR

if __name__ == '__main__':
    eval_agent = EvalAgentDeepCFR.load_from_disk(
        path_to_eval_agent="/home/vishvananda/Deep-CFR/trained_agents/Example_FHP_SINGLE.pkl")

    game = InteractiveGame(env_cls=eval_agent.env_bldr.env_cls,
                           env_args=eval_agent.env_bldr.env_args,
                           seats_human_plays_list=[0],
                           eval_agent=eval_agent,
                           )

    game.start_to_play()
I have noticed that the bet/raise sizes in this form are always pot-sized. The preflop raise is always to 300 (which would be a pot-size raise over the big blind of 100). Post-flop the first bet is 600 (pot-sized), and a raise over the 600 will be to 2400 (once again pot-sized: after calling the 600, the pot contains 1800, which is the amount of the raise). According to the paper from Brown, Flop Hold'em is a limit game, which should mean the preflop and flop bet sizes should always be 100 chips. I'm not sure if it was actually trained with pot-size raises and that just wasn't mentioned. If it is intended to be limit raising, is there some way to set up the interactive game to use the proper bet sizes?
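The sizes described above are consistent with standard pot-size-raise arithmetic: call first, then raise by the pot that results. A small check of that reasoning (the helper name is hypothetical, not from the repository):

```python
def pot_size_raise_to(pot_before, to_call, current_bet):
    """Total amount a pot-size raise puts in: complete the call, then
    raise by the size of the pot after the call."""
    pot_after_call = pot_before + to_call
    return current_bet + to_call + pot_after_call

# Preflop: blinds 50/100, SB has 50 in and needs 50 more to call a pot of 150.
assert pot_size_raise_to(pot_before=150, to_call=50, current_bet=50) == 300
# Flop: pot 600, no bet yet, so the first pot-sized bet is 600.
assert pot_size_raise_to(pot_before=600, to_call=0, current_bet=0) == 600
# Facing the 600 bet (pot now 1200): call 600 (pot 1800), raise 1800 -> 2400.
assert pot_size_raise_to(pot_before=1200, to_call=600, current_bet=0) == 2400
```

All three values match the log below, which supports the observation that the interactive game is configured with pot-size raises rather than fixed-limit 100-chip bets.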
Here is a log of the interactive game showing the large bet sizing:
$ python3 interactive_user_v_agent.py
_____
_____ _____ |6 |
|2 | _____ |5 || & & |
| & ||3 | _____ | & & || & & | _____
| || & & ||4 || & || & & ||7 |
| & || || & & || & & ||____9|| & & | _____
|____Z|| & || ||____S| |& & &||8 | _____
|____E|| & & | | & & ||& & &||9 |
|____h| |____L|| & & ||& & &|
|& & &||& & &|
|____8||& & &|
|____6|
____________________________________________ TUTORIAL ____________________________________________
Actions:
0 Fold
1 Call
2 Raise according to current fixed limit
****************************
* GAME START *
****************************
___________________________________ preflop - 0 acts ___________________________________
Board:
Last Action: player_None: None None | Main_pot: 0
Player_0:stack: 19950 current_bet: 50 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19900 current_bet: 100 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 1
What action do you want to take as player 0?2
___________________________________ preflop - 1 acts ___________________________________
Board:
Last Action: player_0: 2 300 | Main_pot: 0
Player_0:stack: 19700 current_bet: 300 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19900 current_bet: 100 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 2
___________________________________ flop - 1 acts ___________________________________
Board: Qh, Td, 9s, 7d, Jd,
Last Action: player_1: 1 300 | Main_pot: 600
Player_0:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 0
___________________________________ flop - 0 acts ___________________________________
Board: Qh, Td, 9s, 7d, Jd,
Last Action: player_1: 1 0 | Main_pot: 600
Player_0:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 0
What action do you want to take as player 0?2
___________________________________ flop - 1 acts ___________________________________
Board: Qh, Td, 9s, 7d, Jd,
Last Action: player_0: 2 600 | Main_pot: 600
Player_0:stack: 19100 current_bet: 600 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 1
___________________________________ flop - 1 acts ___________________________________
Board: Qh, Td, 9s, 7d, Jd,
Last Action: player_1: 0 -1 | Main_pot: 0
Player_0:stack: 20300 current_bet: 0 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 1
Current Winnings per player: [300.0, -300.0]