
gym-coinche's Introduction

Coinche R.O.

Coinche is a card game.

Here are the rules. Coinche is played between two teams of two players, using a deck of 32 classic cards (8 cards of each suit: diamonds, clubs, hearts and spades). Coinche is a strategic game similar to Bridge. In Coinche, players must evaluate their hands and estimate the number of points they think they can reach in the game. Players make contracts, and the bidding plays a major role in the game.

A round of Coinche can be divided into two distinct parts:

  • First, given their hands, players propose a contract (a number of points to reach plus a trump suit)
  • Second, the players play eight tricks

Objective of the project

We want to apply Reinforcement Learning to Coinche. To begin, we'll focus only on the second phase of a round. Players will be provided with their eight-card hands and a contract. The AI player will then have to learn both the rules and some strategies.

One of the issues we face is that the AI could play a card that is not allowed in the current context. To prevent this, we chose to have the AI output a probability vector rather than a single card. If the AI favors a card that cannot be played, the gym environment masks the possible moves and selects the legal card with the highest probability.
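As an illustration, here is a minimal sketch of that masking step (the function and argument names are ours, not the environment's actual API):

import numpy as np

def select_legal_card(action_probs, legal_mask):
    """Return the index of the legal card the policy favors most.

    action_probs: (32,) probability vector output by the AI (one entry per card).
    legal_mask:   (32,) boolean array, True where the card may be played.
    Illustrative sketch only; the actual masking lives in the gym environment.
    """
    masked = np.where(legal_mask, action_probs, -np.inf)
    return int(np.argmax(masked))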

Installation

We use the gym framework together with RL Coach, which provides deep reinforcement learning algorithms.

The gym environment takes care of the game itself (rules, rounds) and provides standard step-based outputs to reinforcement learning algorithms.

$ pip install -r requirements.txt

Random score prediction and contract selection

In order to simulate the contract selection phase, we train a Machine Learning model on random games to predict the expected value of a game. More precisely, given the players' hands, we aim at predicting the number of points the team would win playing randomly against a random team. This lets us predict whether a game is "interesting" or not (regardless of the strategy).

The contract selection phase is done as follows:

For each team:
    for each possible trump (atout) suit:
        compute the expected value of the game
    choose the best expected value and the trump suit that comes with it

Select the attacking team that offers the best expected value, together with its trump suit
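A minimal sketch of this selection loop, assuming a Keras reward model like the one described below; the hand encoding and names are illustrative, not the repo's actual API:

import numpy as np

SUITS = ["spades", "hearts", "diamonds", "clubs"]
RANKS = ["7", "8", "9", "10", "J", "Q", "K", "A"]

def encode_hands(hands, trump):
    """Toy encoding: 32-dim card indicator for the team's cards plus a
    4-dim one-hot of the candidate trump suit (illustrative only)."""
    cards = np.zeros(32)
    for rank, suit in hands:
        cards[SUITS.index(suit) * 8 + RANKS.index(rank)] = 1.0
    trump_onehot = np.eye(4)[SUITS.index(trump)]
    return np.concatenate([cards, trump_onehot])

def select_contract(reward_model, team_hands):
    """Pick the attacking team and trump suit with the best predicted score.
    team_hands: e.g. {"NS": [...16 cards...], "EW": [...16 cards...]}."""
    best = None
    for team, hands in team_hands.items():
        for trump in SUITS:
            features = encode_hands(hands, trump)[None, :]
            expected = float(reward_model.predict(features, verbose=0)[0, 0])
            if best is None or expected > best[2]:
                best = (team, trump, expected)
    return best  # (attacking_team, trump_suit, expected_value)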

To train the reward model, run reward_prediction/train_reward_prediction_model.py. It will generate a TensorFlow model whose weights are saved in h5 format.

Be sure to generate the random data first, using random_games_and_rewards_analysis.ipynb.

NB: This approach could also be reused with policies other than random, and could then even be more precise.
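For orientation, here is a minimal sketch of what such a training script could look like, assuming the notebook has produced feature and score arrays on disk; the file names, encoding and network sizes are illustrative, not the actual train_reward_prediction_model.py:

import numpy as np
import tensorflow as tf

# Hypothetical arrays produced by random_games_and_rewards_analysis.ipynb:
# X encodes the hands (and trump suit), y is the team's score in a random game.
X = np.load("reward_prediction/random_games_features.npy")  # assumed file name
y = np.load("reward_prediction/random_games_scores.npy")    # assumed file name

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                 # predicted number of points
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, validation_split=0.1)
model.save_weights("reward_prediction/reward_model.h5")  # weights in h5 format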

Training

To launch a training, you'll need a preset.py file, and be sure to have the correct registers in __init__.py.

A register should look like:

register(
    id='coinche-v1',
    entry_point='coinche.gym.env:GymCoinche',
    kwargs={
        'players': [
            AIPlayer("./path/to/checkpoint/chckp_name.ckpt", 0, "N"),
            RandomPlayer(1, "E"),
            GymPlayer(2, "S"),
            RandomPlayer(3, "W")
        ],
        'contrat_model_path': './reward_prediction/reward_model.h5'
    }
)

It contains:

  • An id (mandatory! It is the id that must be given in preset.py)
  • The entry point (do not change it)
  • kwargs
    • If you want to use specific policy agents (either pretrained policies or custom deterministic policies), fill in the players list. BE CAREFUL: we advise keeping the GymPlayer at index 2
    • If you want to use a model to predict the random score (see above) in order to simulate the contract selection phase, just give its path (don't change the kwarg name)
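As a quick sanity check that the register entry is picked up, you can build the environment directly; this sketch assumes that importing the coinche package runs the register() calls and that the paths in the kwargs point to real files:

import gym
import coinche  # importing the package triggers the register() calls

env = gym.make('coinche-v1')
observation = env.reset()
print(env.action_space)  # should be a Box bounded in [0, 1] with shape (32,)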

Then, once you have changed the level name in the preset:

###############
# Environment #
###############
env_params = GymVectorEnvironment(level='coinche-v1')  # <-- change the level name to match the register id

you are ready to launch RL Coach from the command line:

# To run a coach preset

$ coach -p ./preset.py -e coinche -ep ./experiments/ -s 270 -c

# To run a coach preset, restarting training from a checkpoint

$ coach -p ./preset.py -e coinche -ep ./experiments/ -s 270 -c -crd ./path/to/checkpoint/

The training output will be saved in the ./experiments directory, and the TensorFlow checkpoints will be saved every 270 seconds.

You can then use the checkpointed policies while playing by calling:

  • AIPlayer("./path/to/checkpoint/chckp_name.ckpt", 0, "N")

Policy comparison

An important thing is to determine whether one policy is better than another. To do so, it is possible to have two policies compete against each other and use a statistical approach.

Please refer to policy_comparaison.ipynb. It contains:

  • the brick to simulate the games
  • a quick description of the distribution of the number of points won by each team
  • a statistical test brick where the H0 hypothesis is that there is no significant difference between the two policies
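A minimal sketch of such a test, using Welch's t-test from scipy; the notebook's actual test may differ, and the score values below are placeholders for illustration only:

import numpy as np
from scipy import stats

# scores_a / scores_b: points won per game by each policy's team, as collected
# with the simulation brick of policy_comparaison.ipynb (placeholder values here).
scores_a = np.array([82, 110, 95, 130, 70])
scores_b = np.array([90, 120, 105, 140, 88])

# H0: no significant difference between the two policies' mean scores.
t_stat, p_value = stats.ttest_ind(scores_a, scores_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the two policies differ significantly.")
else:
    print("Cannot reject H0 at the 5% level.")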

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

C.R.O.

gym-coinche's People

Contributors

ericpicot, clemri94, thommei, nlaille, mtlouis


gym-coinche's Issues

[Minor Issue] Deck Composition

The deck is reshuffled after each round.
This should only happen at the beginning of a new game (i.e. before the first round of a game).
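A minimal sketch of the intended behaviour (the class and method names are hypothetical, not the actual env code):

import random

RANKS = ["7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]

class Game:
    def __init__(self):
        self.deck = [(rank, suit) for suit in SUITS for rank in RANKS]
        random.shuffle(self.deck)  # shuffle only here, when a new game starts

    def deal_round(self):
        # Deal 4 hands of 8 cards for the next round, without reshuffling;
        # between rounds the cards would be gathered and cut, not shuffled.
        return [self.deck[i * 8:(i + 1) * 8] for i in range(4)]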

Investigate how to get correct outputs when calling the checkpoint graph

When initializing an environment, we declare that the output format is (32,):
self.action_space = spaces.Box(low=0, high=1, shape=(32,))

The output is bounded in [0, 1], and during training (even when relaunching a training from a checkpoint), it is correctly bounded (the next screenshot shows the min and max of the 32-dim output while training):

[Screenshot: min and max of the output during training]

Now, if we load a checkpoint to use this policy by calling (master/coinche/player.py):

def get_action(self, observation):
    with self.sess.as_default():
        with self.graph.as_default():
            actions = self.sess.run(self.output_tensor, {self.input_tensor: observation})
    action = np.squeeze(actions, axis=0)
    return action

the output can be negative:
[Screenshot: min and max of the output when loading the checkpoint]

We need to investigate in order to have an output that is correctly bounded.
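One possible stopgap while investigating (a workaround sketch, not the underlying fix the issue asks for) would be to clip the loaded graph's output back into the declared bounds before the environment applies its mask:

import numpy as np

def get_bounded_action(raw_action, low=0.0, high=1.0):
    # Clip the raw checkpoint output into the declared action space bounds.
    return np.clip(raw_action, low, high)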
