ericsteinberger / deep-cfr
Scalable Implementation of Deep CFR and Single Deep CFR
License: MIT License
This was the first result when I looked for a Deep CFR implementation, and I was wondering how feasible you think it would be to adapt it to a different hidden-information game. Would supplying a new game_cls
with alternate implementations just work? Or is there more poker knowledge/assumptions embedded in the code?
Since 'aprx_imm_reg' here is computed for every action and put into the buffer without being summed up, I have no idea why
'aprx_imm_reg *= legal_action_mask / n_actions_to_smpl'
I think it is because I could not understand the formula here (v~(I|a) * p(a) * |A(I)|), and I failed to find the corresponding part in your paper:
"""
Last state values are the average, not the sum of all samples of that state since we add
v~(I) = * p(a) * |A(I)|. Since we sample multiple actions on each traverser node, we have to average over
their returns like: v~(I) * Sum_a=0_N (v~(I|a) * p(a) * ||A(I)|| / N).
"""
Is there any reference for it?
Thanks a lot.
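The averaging in the comment quoted above can be sketched numerically. This is a minimal illustration (not the repository's code): it assumes `p[a]` is the policy probability of action `a` at infoset I, `v_child[a]` is the sampled counterfactual value after taking `a`, and `N` of the `|A(I)|` legal actions were sampled uniformly. Each sample is importance-weighted by `|A(I)| / N`, so when all actions are sampled the estimator reduces to the exact expectation.

```python
def sampled_infoset_value(p, v_child, sampled_actions, n_legal):
    """Estimate v~(I) from a uniformly sampled subset of actions.

    Each sampled action contributes v~(I|a) * p(a) * |A(I)| / N, which
    makes the estimator unbiased under uniform sampling of N actions.
    """
    N = len(sampled_actions)
    return sum(v_child[a] * p[a] * n_legal / N for a in sampled_actions)

# Example with 3 legal actions; sampling all of them (N == |A(I)|)
# reduces to the exact sum over actions.
p = {0: 0.2, 1: 0.5, 2: 0.3}
v = {0: 1.0, 1: -2.0, 2: 4.0}
exact = sum(v[a] * p[a] for a in p)                     # 0.2 - 1.0 + 1.2 = 0.4
est = sampled_infoset_value(p, v, [0, 1, 2], n_legal=3)
assert abs(est - exact) < 1e-9
```

With fewer sampled actions the estimate is noisy but unbiased, which is why the buffer stores per-action values scaled by `legal_action_mask / n_actions_to_smpl` rather than a summed value.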
2019-12-10 14:47:13,655 ERROR worker.py:1559 -- Possible unhandled error from worker: ray_LearnerActor:update() (pid=93368, host=iMac-adminposh-6.local)
File "Deep-CFR/DeepCFR/workers/la/dist.py", line 11, in init
LocalLearnerActor.init(self, t_prof=t_prof, worker_id=worker_id, chief_handle=chief_handle)
File "Deep-CFR/DeepCFR/workers/la/local.py", line 24, in init
self.env_bldr = rl_util.get_env_builder(t_prof=t_prof)
File "python3.7/site-packages/PokerRL/rl/rl_util.py", line 84, in get_env_builder
return ENV_BUILDER(env_cls=get_env_cls_from_str(t_prof.game_cls_str), env_args=t_prof.module_args["env"])
File "python3.7/site-packages/PokerRL/game/wrappers.py", line 49, in init
super().init(env_cls=env_cls, env_args=env_args)
File "python3.7/site-packages/PokerRL/game//EnvWrapperBuilderBase.py", line 25, in init
self.lut_holder = env_cls.get_lut_holder()
File "python3.7/site-packages/PokerRL/game//rl_env/game_rules.py", line 312, in get_lut_holder
return LutHolderHoldem(cls)
File "python3.7/site-packages/PokerRL/game//look_up_table.py", line 303, in init
super().init(lut_getter=LutGetterHoldem(env_cls=env_cls))
File "python3.7/site-packages/PokerRL/game//look_up_table.py", line 100, in init
n_cards_out_lut=self.get_n_cards_out_at_LUT())
File "python3.7/site-packages/PokerRL/game//cpp_wrappers/CppLUT.py", line 18, in init
"lib_luts." + self.CPP_LIB_FILE_ENDING))
File "python3.7/site-packages/PokerRL//CppWrapper.py", line 18, in init
self._clib = ctypes.cdll.LoadLibrary(path_to_dll)
File "python3.7/ctypes/init.py", line 434, in LoadLibrary
return self._dlltype(name)
File "python3.7/ctypes/init.py", line 356, in init
self.handle = dlopen(self.name, mode)
OSError: dlopen(python3.7/site-packages/PokerRL/game//cpp_wrappers/lib_luts.so, 6): no suitable image found. Did find:
python3.7/site-packages/PokerRL/game//cpp_wrappers/lib_luts.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
python3.7/site-packages/PokerRL/game//cpp_wrappers/lib_luts.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
2019-12-10 14:47:14,423 WARNING worker.py:1619 -- The actor or task with ID ffffffffffff26acd28101000000 is pending and cannot currently be scheduled. It requires {CPU: 1.000000} for execution and {CPU: 1.000000} for placement, but this node only has remaining {object_store_memory: 1.562500 GiB}, {memory: 2.832031 GiB}. In total there are 0 pending tasks and 16 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster.
python paper_experiment_sdcfr_vs_deepcfr_h2h.py
************************** Initing args for: EXPERIMENT_SD-CFR_vs_Deep-CFR_FHP **************************
Running with 20 LearnerActor Workers.
Traceback (most recent call last):
  File "paper_experiment_sdcfr_vs_deepcfr_h2h.py", line 75, in <module>
    n_iterations=150,
  File "/Users/d.saienko/Data Science/Deep-CFR/DeepCFR/workers/driver/Driver.py", line 21, in __init__
    chief_cls=Chief, eval_agent_cls=EvalAgentDeepCFR)
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/PokerRL/rl/base_cls/workers/DriverBase.py", line 35, in __init__
    self._ray.init_local()
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/PokerRL/rl/MaybeRay.py", line 44, in init_local
    object_store_memory=min(2 * (10 ** 10), int(psutil.virtual_memory().total * 0.4)),
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1323, in init
    head=True, shutdown_at_exit=False, ray_params=ray_params)
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 149, in __init__
    self.start_head_processes()
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 571, in start_head_processes
    self.start_redis()
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 419, in start_redis
    self.get_resource_spec(),
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 240, in get_resource_spec
    self._ray_params.redis_max_memory).resolve(is_head=self.head)
  File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/resource_spec.py", line 195, in resolve
    int(100 * (memory / system_memory))))
ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors (-2.96 GB) is less than -17% of total. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
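Per the traceback, PokerRL's MaybeRay.init_local() reserves min(20 GB, 40% of RAM) for Ray's object store, which on a small machine can leave a negative budget for tasks and actors (hence the -2.96 GB). A sketch of computing smaller caps to pass via the `memory` and `object_store_memory` kwargs the error message itself names (the fractions here are assumptions to tune for your machine):

```python
def ray_memory_limits(total_ram_bytes, store_frac=0.1, task_frac=0.2):
    """Return (object_store_memory, memory) caps for ray.init().

    Uses much smaller fractions than PokerRL's default 40% object store,
    so tasks and actors keep a positive memory budget on small machines.
    """
    object_store = min(2 * 10 ** 9, int(total_ram_bytes * store_frac))
    memory = min(4 * 10 ** 9, int(total_ram_bytes * task_frac))
    return object_store, memory

# e.g. on an 8 GiB Mac:
store, mem = ray_memory_limits(8 * 1024 ** 3)
# then: ray.init(memory=mem, object_store_memory=store)
```

Since PokerRL hard-codes its values inside MaybeRay.init_local(), applying this likely means editing that call (or initializing Ray yourself before the Driver does).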
Hi Eric,
I have tried to use your paper-experiment hyperparameters to train a model with game_cls=LimitHoldem.
However, if I set the evaluation method to 'BR' (best response), the program aborts at the stage of "Creating BR Mode Evaluator...". I wonder whether there is any difference between game_cls=LimitHoldem and game_cls=StandardLeduc? If so, how could I modify the code accordingly in order to train the Deep CFR model?
Thanks,
Joena
I tried to evaluate the sdcfr_vs_deepcfr_h2h.py agent's exploitability with Flop5Holdem, so I changed eval_methods to 'br', but when I launched it, the program froze at "Creating BR Evaluator".
I'm getting this error when running the code with: python paper_experiment_bigleduc_exploitability.py
    return _env.observation_space.shape[0] + self.action_vector_size
TypeError: 'NoneType' object is not subscriptable
Did I miss something when running it?
I have been experimenting with the pretrained agents in this repository. Using the example code from PokerRL, I've loaded a trained agent and tried to play against it:
from PokerRL.game.InteractiveGame import InteractiveGame
from DeepCFR.EvalAgentDeepCFR import EvalAgentDeepCFR

if __name__ == '__main__':
    eval_agent = EvalAgentDeepCFR.load_from_disk(
        path_to_eval_agent="/home/vishvananda/Deep-CFR/trained_agents/Example_FHP_SINGLE.pkl")

    game = InteractiveGame(env_cls=eval_agent.env_bldr.env_cls,
                           env_args=eval_agent.env_bldr.env_args,
                           seats_human_plays_list=[0],
                           eval_agent=eval_agent,
                           )

    game.start_to_play()
I have noticed that the bet/raise sizes in this form are always pot-sized. The preflop raise is always to 300 (which would be a pot-size raise over the big blind of 100). Post-flop the first bet is 600 (pot-sized), and a raise over the 600 will be to 2400 (once again pot-sized: after calling the 600, the pot contains 1800, which is the amount of the raise). According to the paper from Brown, Flop Hold'em is a limit game, which should mean the preflop and flop bet sizes should always be 100 chips. I'm not sure if it was actually trained with pot-size raises and that just wasn't mentioned. If it is intended to be limit raising, is there some way to set up the interactive game to use the proper bet sizes?
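The sizes described above are consistent with standard pot-size-raise arithmetic: call first, then raise by the pot that results. A small check of that reasoning (the helper name is hypothetical, not from the repository):

```python
def pot_size_raise_to(pot_before, to_call, current_bet):
    """Total amount a pot-size raise puts in: complete the call, then
    raise by the size of the pot after the call."""
    pot_after_call = pot_before + to_call
    return current_bet + to_call + pot_after_call

# Preflop: blinds 50/100, SB has 50 in and needs 50 more to call a pot of 150.
assert pot_size_raise_to(pot_before=150, to_call=50, current_bet=50) == 300
# Flop: pot 600, no bet yet, so the first pot-sized bet is 600.
assert pot_size_raise_to(pot_before=600, to_call=0, current_bet=0) == 600
# Facing the 600 bet (pot now 1200): call 600 (pot 1800), raise 1800 -> 2400.
assert pot_size_raise_to(pot_before=1200, to_call=600, current_bet=0) == 2400
```

All three values match the log below, which supports the observation that the interactive game is configured with pot-size raises rather than fixed-limit 100-chip bets.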
Here is a log of the interactive game showing the large bet sizing:
$ python3 interactive_user_v_agent.py
_____
_____ _____ |6 |
|2 | _____ |5 || & & |
| & ||3 | _____ | & & || & & | _____
| || & & ||4 || & || & & ||7 |
| & || || & & || & & ||____9|| & & | _____
|____Z|| & || ||____S| |& & &||8 | _____
|____E|| & & | | & & ||& & &||9 |
|____h| |____L|| & & ||& & &|
|& & &||& & &|
|____8||& & &|
|____6|
____________________________________________ TUTORIAL ____________________________________________
Actions:
0 Fold
1 Call
2 Raise according to current fixed limit
****************************
* GAME START *
****************************
___________________________________ preflop - 0 acts ___________________________________
Board:
Last Action: player_None: None None | Main_pot: 0
Player_0:stack: 19950 current_bet: 50 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19900 current_bet: 100 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 1
What action do you want to take as player 0?2
___________________________________ preflop - 1 acts ___________________________________
Board:
Last Action: player_0: 2 300 | Main_pot: 0
Player_0:stack: 19700 current_bet: 300 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19900 current_bet: 100 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 2
___________________________________ flop - 1 acts ___________________________________
Board: Qh, Td, 9s, 7d, Jd,
Last Action: player_1: 1 300 | Main_pot: 600
Player_0:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 0
___________________________________ flop - 0 acts ___________________________________
Board: Qh, Td, 9s, 7d, Jd,
Last Action: player_1: 1 0 | Main_pot: 600
Player_0:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 0
What action do you want to take as player 0?2
___________________________________ flop - 1 acts ___________________________________
Board: Qh, Td, 9s, 7d, Jd,
Last Action: player_0: 2 600 | Main_pot: 600
Player_0:stack: 19100 current_bet: 600 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 1
___________________________________ flop - 1 acts ___________________________________
Board: Qh, Td, 9s, 7d, Jd,
Last Action: player_1: 0 -1 | Main_pot: 0
Player_0:stack: 20300 current_bet: 0 side_pot_rank: -1 hand: Ad, 4s, | Side_pot0: 0
Player_1:stack: 19700 current_bet: 0 side_pot_rank: -1 hand: 2d, As, | Side_pot1: 0
Num raises this round: 1
Current Winnings per player: [300.0, -300.0]