
ericsteinberger / deep-cfr

265 stars · 16 watchers · 60 forks · 71.58 MB

Scalable Implementation of Deep CFR and Single Deep CFR

License: MIT License

Python 100.00%
deep-learning reinforcement-learning reinforcement-learning-algorithms neural-networks poker imperfect-information-games research cfr counterfactual-regret-minimization

deep-cfr's People

Contributors: ericsteinberger

deep-cfr's Issues

Adapting to non-poker games?

This was the first result when I looked for a Deep CFR implementation, and I was wondering how feasible you think it would be to adapt it to a different hidden-information game. Would supplying a new game_cls with alternate implementations just work, or is there more poker knowledge/assumptions embedded in the code?
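For context, the repo's example scripts select the game solely through the game_cls argument of the TrainingProfile. A minimal sketch along those lines (the imports match the paths visible in this repo's tracebacks, but the exact TrainingProfile arguments vary per experiment and are assumptions here, not the definitive setup):

# Sketch of swapping in a game class; most hyperparameters omitted.
from DeepCFR.TrainingProfile import TrainingProfile
from DeepCFR.workers.driver.Driver import Driver
from PokerRL.game.games import StandardLeduc  # a custom game class would plug in here

if __name__ == '__main__':
    ctrl = Driver(
        t_prof=TrainingProfile(
            name="LEDUC_EXAMPLE",
            game_cls=StandardLeduc,  # replace with your own game_cls
        ),
        eval_methods={"br": 1},  # best-response exploitability evaluation
    )
    ctrl.run()

Whether a non-poker game "just works" would then hinge on how much of the PokerRL environment interface (and its poker-specific lookup tables) the new game_cls can satisfy.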

Why take the mean over all actions sampled in multi-outcome sampling?

https://github.com/EricSteinberger/Deep-CFR/blob/master/DeepCFR/workers/la/sampling_algorithms/MultiOutcomeSampler.py

Since 'aprx_imm_reg' here is computed for every sampled action and pushed to the buffer without being summed up, I do not understand why it is scaled as
'aprx_imm_reg *= legal_action_mask / n_actions_to_smpl'.

I think it is because I could not understand the formula in the docstring (the v~(I|a) * p(a) * |A(I)| term), and I failed to find the corresponding part in your paper:
"""
Last state values are the average, not the sum, of all samples of that state, since we add
v~(I|a) * p(a) * |A(I)| for each sampled action. Since we sample multiple actions at each traverser node,
we have to average over their returns: v~(I) = Sum_{a=1}^{N} v~(I|a) * p(a) * |A(I)| / N.
"""

Is there any reference for this?

Thanks a lot!
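For what it's worth, here is a minimal sketch (not the repo's code) of why weighting each of the N sampled actions by p(a) * |A(I)| / N yields an unbiased estimate of v~(I) = Sum_a p(a) * v~(I|a):

import random

def estimate_v(action_values, reach_probs, n_actions_to_smpl):
    """Monte Carlo estimate of v(I) = sum_a p(a) * v(I|a) when only
    n_actions_to_smpl of the |A(I)| legal actions are traversed.

    With uniform sampling without replacement, each action is included
    with probability n / |A(I)|, so up-weighting its term by |A(I)| / n
    cancels the sampling probability: the expectation of the estimator
    equals the full sum, i.e. we average rather than sum.
    """
    actions = list(action_values)
    n_a = len(actions)
    sampled = random.sample(actions, n_actions_to_smpl)
    return sum(action_values[a] * reach_probs[a] * n_a / n_actions_to_smpl
               for a in sampled)

# Sanity check: the mean of many estimates approaches the exact value.
vals = {0: 1.0, 1: -2.0, 2: 0.5}
probs = {0: 0.2, 1: 0.5, 2: 0.3}
exact = sum(probs[a] * vals[a] for a in vals)  # -0.65
est = sum(estimate_v(vals, probs, 2) for _ in range(100_000)) / 100_000
print(exact, est)  # the two should be close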

Do you have any idea why it happens? Thanks a lot!

2019-12-10 14:47:13,655 ERROR worker.py:1559 -- Possible unhandled error from worker: ray_LearnerActor:update() (pid=93368, host=iMac-adminposh-6.local)
File "Deep-CFR/DeepCFR/workers/la/dist.py", line 11, in __init__
LocalLearnerActor.__init__(self, t_prof=t_prof, worker_id=worker_id, chief_handle=chief_handle)
File "Deep-CFR/DeepCFR/workers/la/local.py", line 24, in __init__
self.env_bldr = rl_util.get_env_builder(t_prof=t_prof)
File "python3.7/site-packages/PokerRL/rl/rl_util.py", line 84, in get_env_builder
return ENV_BUILDER(env_cls=get_env_cls_from_str(t_prof.game_cls_str), env_args=t_prof.module_args["env"])
File "python3.7/site-packages/PokerRL/game/wrappers.py", line 49, in __init__
super().__init__(env_cls=env_cls, env_args=env_args)
File "python3.7/site-packages/PokerRL/game/_/EnvWrapperBuilderBase.py", line 25, in __init__
self.lut_holder = env_cls.get_lut_holder()
File "python3.7/site-packages/PokerRL/game/_/rl_env/game_rules.py", line 312, in get_lut_holder
return LutHolderHoldem(cls)
File "python3.7/site-packages/PokerRL/game/_/look_up_table.py", line 303, in __init__
super().__init__(lut_getter=LutGetterHoldem(env_cls=env_cls))
File "python3.7/site-packages/PokerRL/game/_/look_up_table.py", line 100, in __init__
n_cards_out_lut=self.get_n_cards_out_at_LUT())
File "python3.7/site-packages/PokerRL/game/_/cpp_wrappers/CppLUT.py", line 18, in __init__
"lib_luts." + self.CPP_LIB_FILE_ENDING))
File "python3.7/site-packages/PokerRL/_/CppWrapper.py", line 18, in __init__
self._clib = ctypes.cdll.LoadLibrary(path_to_dll)
File "python3.7/ctypes/__init__.py", line 434, in LoadLibrary
return self._dlltype(name)
File "python3.7/ctypes/__init__.py", line 356, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(python3.7/site-packages/PokerRL/game/_/cpp_wrappers/lib_luts.so, 6): no suitable image found. Did find:
python3.7/site-packages/PokerRL/game/_/cpp_wrappers/lib_luts.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
python3.7/site-packages/PokerRL/game/_/cpp_wrappers/lib_luts.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
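For reference, the first eight bytes in that message (0x7F 'E' 'L' 'F' ...) are the magic number of a Linux ELF binary, while the dlopen error format and the iMac hostname suggest the library is being loaded on macOS, which expects Mach-O. A quick way to confirm (a sketch, not repo code; the path is whatever lib_luts.so your install resolved to):

# Check the shared library's format by reading its magic bytes.
with open("lib_luts.so", "rb") as f:
    magic = f.read(4)
# b"\x7fELF" marks a Linux ELF binary, which macOS's dlopen cannot load;
# rebuilding the C++ LUT library on macOS would be the usual remedy.
print("Linux ELF binary" if magic == b"\x7fELF" else f"not ELF: {magic}")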
2019-12-10 14:47:14,423 WARNING worker.py:1619 -- The actor or task with ID ffffffffffff26acd28101000000 is pending and cannot currently be scheduled. It requires {CPU: 1.000000} for execution and {CPU: 1.000000} for placement, but this node only has remaining {object_store_memory: 1.562500 GiB}, {memory: 2.832031 GiB}. In total there are 0 pending tasks and 16 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster.

ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors (-2.96 GB) is less than -17% of total. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).

python paper_experiment_sdcfr_vs_deepcfr_h2h.py
************************** Initing args for: EXPERIMENT_SD-CFR_vs_Deep-CFR_FHP **************************
Running with 20 LearnerActor Workers.
Traceback (most recent call last):
File "paper_experiment_sdcfr_vs_deepcfr_h2h.py", line 75, in
n_iterations=150,
File "/Users/d.saienko/Data Science/Deep-CFR/DeepCFR/workers/driver/Driver.py", line 21, in init
chief_cls=Chief, eval_agent_cls=EvalAgentDeepCFR)
File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/PokerRL/rl/base_cls/workers/DriverBase.py", line 35, in init
self._ray.init_local()
File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/PokerRL/rl/MaybeRay.py", line 44, in init_local
object_store_memory=min(2 * (10 ** 10), int(psutil.virtual_memory().total * 0.4)),
File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1323, in init
head=True, shutdown_at_exit=False, ray_params=ray_params)
File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 149, in init
self.start_head_processes()
File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 571, in start_head_processes
self.start_redis()
File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 419, in start_redis
self.get_resource_spec(),
File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/node.py", line 240, in get_resource_spec
self._ray_params.redis_max_memory).resolve(is_head=self.head)
File "/Users/d.saienko/anaconda3/lib/python3.7/site-packages/ray/resource_spec.py", line 195, in resolve
int(100 * (memory / system_memory))))
ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors (-2.96 GB) is less than -17% of total. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
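The error message itself names the knobs. A sketch of the adjustment it suggests, capping both memory pools so Ray's head node can start on a small machine (the byte values below are placeholder assumptions, not recommendations):

# Cap Ray's memory pools explicitly; pick sizes that fit your machine.
import ray

ray.init(
    memory=4 * 1024 ** 3,               # 4 GiB for tasks and actors
    object_store_memory=2 * 1024 ** 3,  # 2 GiB for the object store
)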

Program aborts at "Creating BR Mode Evaluator" when game_cls=LimitHoldem

Hi Eric,

I have tried to use your paper-experiment hyperparameters to train a model with game_cls = LimitHoldem.

However, if I set the evaluation method to 'BR' (best response), the program aborts at the stage of "Creating BR Mode Evaluator...". I wonder whether there is anything different between game_cls = LimitHoldem and game_cls = StandardLeduc. If so, how could I modify the code accordingly in order to train the Deep CFR model?

Thanks,
Joena

BR evaluation doesn't work with Flop5Holdem

I tried to evaluate the exploitability of the sdcfr_vs_deepcfr_h2h.py agent with Flop5Holdem, so I changed eval_methods to 'br', but when I launched it, the program froze at "Creating BR Evaluator".

Error on running the code

I'm getting this error when running the code: python paper_experiment_bigleduc_exploitability.py

return _env.observation_space.shape[0] + self.action_vector_size
TypeError: 'NoneType' object is not subscriptable

Did I miss something when running it?
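For clarity, that TypeError means _env.observation_space.shape is None at the failing line; indexing None raises exactly this error. A minimal reproduction (not the repo's code):

# Reproduce the failure mode: an observation space whose shape was never set.
class FakeSpace:
    shape = None

obs_space = FakeSpace()
obs_space.shape[0]  # TypeError: 'NoneType' object is not subscriptable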

Raise Sizes for Flop5Holdem

I have been experimenting with the pretrained agents in this repository. Using the example code from PokerRL, I've loaded a trained agent and tried to play against it:

from PokerRL.game.InteractiveGame import InteractiveGame
from DeepCFR.EvalAgentDeepCFR import EvalAgentDeepCFR

if __name__ == '__main__':
    eval_agent = EvalAgentDeepCFR.load_from_disk(
        path_to_eval_agent="/home/vishvananda/Deep-CFR/trained_agents/Example_FHP_SINGLE.pkl")

    game = InteractiveGame(env_cls=eval_agent.env_bldr.env_cls,
                           env_args=eval_agent.env_bldr.env_args,
                           seats_human_plays_list=[0],
                           eval_agent=eval_agent,
                           )

    game.start_to_play()

I have noticed that the bet/raise sizes in this form are always pot sized. The preflop raise is always to 300 (a pot-size raise over the big blind of 100). Post-flop the first bet is 600 (pot sized), and a raise over the 600 will be to 2400 (once again pot sized: after calling the 600, the pot contains 1800, which is the amount of the raise). According to the paper from Brown, Flop Hold'em is a limit game, which should mean the preflop and flop bet sizes are always 100 chips. I'm not sure if it was actually trained with pot-size raises and it just wasn't mentioned. If it is intended to be limit raising, is there some way to set up the interactive game to use the proper bet sizes?
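The pot-size arithmetic visible in the log below checks out (a quick calculation, not repo code):

# Pot-size raise: call first, then raise by the size of the resulting pot.
def pot_size_raise_to(pot, current_bet, to_call):
    pot_after_call = pot + to_call
    return current_bet + pot_after_call

# Preflop: blinds 50 + 100 = 150 in the pot, SB calls 50 more.
print(pot_size_raise_to(pot=150, current_bet=100, to_call=50))    # 300
# Flop: 600 carried over plus a pot-sized bet of 600; facing it,
# calling 600 makes the pot 1800, so the raise is to 600 + 1800.
print(pot_size_raise_to(pot=1200, current_bet=600, to_call=600))  # 2400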

Here is a log of the interactive game showing the large bet sizing:

$ python3 interactive_user_v_agent.py

                                                _____
                    _____                _____ |6    |
                   |2    | _____        |5    || & & |
                   |  &  ||3    | _____ | & & || & & | _____
                   |     || & & ||4    ||  &  || & & ||7    |
                   |  &  ||     || & & || & & ||____9|| & & | _____
                   |____Z||  &  ||     ||____S|       |& & &||8    | _____
                          |____E|| & & |              | & & ||& & &||9    |
                                 |____h|              |____L|| & & ||& & &|
                                                             |& & &||& & &|
                                                             |____8||& & &|
                                                                    |____6|

____________________________________________ TUTORIAL ____________________________________________
Actions:
0       Fold
1       Call
2       Raise according to current fixed limit

****************************
*        GAME START        *
****************************




___________________________________ preflop - 0 acts ___________________________________
Board:
Last Action:   player_None: None                                                                                                              None |   Main_pot:        0
     Player_0:stack:     19950 current_bet:        50 side_pot_rank:        -1 hand:  Ad, 4s,                     |   Side_pot0:      0
     Player_1:stack:     19900 current_bet:       100 side_pot_rank:        -1 hand:  2d, As,                     |   Side_pot1:      0
Num raises this round:  1


What action do you want to take as player 0?2



___________________________________ preflop - 1 acts ___________________________________
Board:
Last Action:   player_0: 2                                                                                                               300 |   Main_pot:        0
     Player_0:stack:     19700 current_bet:       300 side_pot_rank:        -1 hand:  Ad, 4s,                     |   Side_pot0:      0
     Player_1:stack:     19900 current_bet:       100 side_pot_rank:        -1 hand:  2d, As,                     |   Side_pot1:      0
Num raises this round:  2




___________________________________  flop - 1 acts  ___________________________________
Board:  Qh, Td, 9s, 7d, Jd,
Last Action:   player_1: 1                                                                                                               300 |   Main_pot:      600
     Player_0:stack:     19700 current_bet:         0 side_pot_rank:        -1 hand:  Ad, 4s,                     |   Side_pot0:      0
     Player_1:stack:     19700 current_bet:         0 side_pot_rank:        -1 hand:  2d, As,                     |   Side_pot1:      0
Num raises this round:  0





___________________________________  flop - 0 acts  ___________________________________
Board:  Qh, Td, 9s, 7d, Jd,
Last Action:   player_1: 1                                                                                                                 0 |   Main_pot:      600
     Player_0:stack:     19700 current_bet:         0 side_pot_rank:        -1 hand:  Ad, 4s,                     |   Side_pot0:      0
     Player_1:stack:     19700 current_bet:         0 side_pot_rank:        -1 hand:  2d, As,                     |   Side_pot1:      0
Num raises this round:  0


What action do you want to take as player 0?2



___________________________________  flop - 1 acts  ___________________________________
Board:  Qh, Td, 9s, 7d, Jd,
Last Action:   player_0: 2                                                                                                               600 |   Main_pot:      600
     Player_0:stack:     19100 current_bet:       600 side_pot_rank:        -1 hand:  Ad, 4s,                     |   Side_pot0:      0
     Player_1:stack:     19700 current_bet:         0 side_pot_rank:        -1 hand:  2d, As,                     |   Side_pot1:      0
Num raises this round:  1





___________________________________  flop - 1 acts  ___________________________________
Board:  Qh, Td, 9s, 7d, Jd,
Last Action:   player_1: 0                                                                                                                -1 |   Main_pot:        0
     Player_0:stack:     20300 current_bet:         0 side_pot_rank:        -1 hand:  Ad, 4s,                     |   Side_pot0:      0
    -Player_1:stack:     19700 current_bet:         0 side_pot_rank:        -1 hand:  2d, As,                     |   Side_pot1:      0
Num raises this round:  1



Current Winnings per player: [300.0, -300.0]
