marcpaulo15 / rl-connect4

Deep Reinforcement Learning algorithms to play Connect4 using a combination of Supervised Learning and Reinforcement Learning

License: MIT License

Python 56.26% Jupyter Notebook 43.74%

RL-connect4

The main objective of this project is to compare how different Reinforcement Learning algorithms learn to play Connect4.

We propose a training pipeline that combines Supervised Learning and self-play Reinforcement Learning.

  • Part 1) A convolutional neural network learns to mimic the actions of a mid-level player
    • supervised learning, multi-class classification (one class per column)
    • our mid-level player is a hand-crafted heuristic (1-Step minimax search)
  • Part 2) Starting from the pre-trained network, which already has some knowledge of the game, we apply several Deep Reinforcement Learning algorithms separately to improve its performance.

We transfer the learning from Part 1 to Part 2. The pre-trained convolutional block from Part 1 serves as a feature extractor and is frozen in Part 2, while the remaining fully connected layers are trained on the RL task of each Deep RL algorithm. With this approach, the Reinforcement Learning algorithms do not have to learn from scratch, and training becomes more stable because the first self-play games are not random.

Our training pipeline (transfer learning)

To evaluate the trained agents, we let them compete against each other and compare the results to determine which algorithm reaches the highest level of play.

Finally, we present a simple User Interface to let the user play Connect 4 against all the agents trained in this project.

KEY WORDS: connect4, zero-sum games, deep learning, supervised learning, transfer learning, reinforcement learning, self-play, Proximal Policy Optimization, PPO, REINFORCE, Deep Q-Network, DQN, Dueling Deep Q-Network, Dueling DQN.

Agents

Implementation of the agents. There are two types of agents: Baseline Agents, and Trainable Agents.

  • Baseline Agents: implement a non-trainable heuristic to play the game.
    • Random Agent: selects columns at random.
    • Leftmost Agent: selects the leftmost column.
    • N-Step Lookahead Agent: simulates N turns ahead and runs a minimax search to select actions.
  • Trainable Agents: implement a trainable model (neural network) to play the game.
    • Vanilla and Dueling DQN Agents: use a model to estimate the optimal Q-values.
    • REINFORCE and PPO Agents: use a model to estimate the optimal policy.
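The baseline agents listed above can be sketched in a few lines. This is a minimal illustration and not the repository's code: the board encoding (a 6x7 list of lists with 0 for empty and 1 / -1 for the two players), the function names, and the win/loss-only evaluation are all assumptions made for this example. The project's N-Step Lookahead Agent may use a richer heuristic to score non-terminal positions.

```python
import random
from typing import List

Board = List[List[int]]  # assumed encoding: 0 = empty, 1 / -1 = the two players
ROWS, COLS = 6, 7


def legal_columns(board: Board) -> List[int]:
    # A column is playable while its top cell (row 0) is empty.
    return [c for c in range(COLS) if board[0][c] == 0]


def drop(board: Board, col: int, piece: int) -> Board:
    """Return a copy of the board after dropping `piece` into `col`."""
    new = [row[:] for row in board]
    row = max(r for r in range(ROWS) if new[r][col] == 0)
    new[row][col] = piece
    return new


def has_four(board: Board, piece: int) -> bool:
    """True if `piece` has four in a row in any direction."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS
                       and board[rr][cc] == piece for rr, cc in cells):
                    return True
    return False


def random_agent(board: Board, piece: int) -> int:
    return random.choice(legal_columns(board))


def leftmost_agent(board: Board, piece: int) -> int:
    return legal_columns(board)[0]


def negamax(board: Board, piece: int, depth: int) -> float:
    """Minimax value of `board` for `piece`, who is about to move."""
    if has_four(board, -piece):  # the opponent's last move won the game
        return -1.0
    cols = legal_columns(board)
    if depth == 0 or not cols:
        return 0.0  # assumed evaluation: unknown or drawn positions score 0
    return max(-negamax(drop(board, c, piece), -piece, depth - 1) for c in cols)


def lookahead_agent(board: Board, piece: int, n_steps: int = 1) -> int:
    """Pick a column with the best minimax score after `n_steps` plies."""
    cols = legal_columns(board)
    scores = {c: -negamax(drop(board, c, piece), -piece, n_steps - 1)
              for c in cols}
    best = max(scores.values())
    return random.choice([c for c in cols if scores[c] == best])
```

With `n_steps=1` the agent only takes an immediate win when one exists; with `n_steps=2` it also avoids moves that hand the opponent an immediate win.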

Data

Implementation of the classes to store and process training data (games).

  • part1 data: synthetic dataset used in the Supervised Learning task.
  • part1 dataset generator: notebook to generate the synthetic dataset used in the Supervised Learning task.
    • 200k (state, action) pairs played by the 1-Step Lookahead Agent (baseline, mid-level player).
    • Supervised Learning task (classification): predict the action of the 1-Step Lookahead Agent at each turn.
  • replay memory: a class that serves as an Experience Replay Memory or as an Episode Buffer.
    • propagates the final reward of each game back to the intermediate steps.
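One way to picture how a final reward reaches the intermediate steps is an episode buffer that assigns every stored move a discounted share of the game outcome. The sketch below is an assumption-laden illustration: the class name `EpisodeBuffer`, its methods, and the use of a discount factor `gamma` are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class EpisodeBuffer:
    """Hypothetical buffer holding the moves of one player in one game."""
    gamma: float = 0.95  # assumed discount factor
    steps: List[Tuple[Any, int]] = field(default_factory=list)

    def push(self, state: Any, action: int) -> None:
        self.steps.append((state, action))

    def finish(self, final_reward: float) -> List[Tuple[Any, int, float]]:
        """Give each step the final reward, discounted by its distance
        from the end of the game, then empty the buffer."""
        n = len(self.steps)
        labelled = [
            (state, action, final_reward * self.gamma ** (n - 1 - t))
            for t, (state, action) in enumerate(self.steps)
        ]
        self.steps.clear()
        return labelled
```

For example, with `gamma=0.5` and a win worth `1.0`, three stored moves receive the returns `0.25`, `0.5`, and `1.0`, so moves closer to the end of the game get more credit.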

Environment

Implementation of the Connect4 game as a Reinforcement Learning environment.

  • connect game env: implements the environment (OpenAI gym structure).
  • env utils: some auxiliary functions used by the environment and the agents.
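To make the gym-style structure concrete, here is a minimal sketch of a Connect4 environment with `reset` and `step` methods. It follows the classic gym convention of returning `(observation, reward, done, info)` from `step`. The class name `Connect4Env`, the board encoding, and the reward values are assumptions for illustration; the project's environment may differ in all of them.

```python
from typing import List, Tuple

ROWS, COLS = 6, 7  # standard Connect4 board size


class Connect4Env:
    """Hypothetical minimal Connect4 environment with a gym-like API."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> List[List[int]]:
        # Assumed encoding: 0 = empty, 1 = first player, -1 = second player.
        # Row 0 is the top of the board.
        self.board = [[0] * COLS for _ in range(ROWS)]
        self.player = 1
        self.done = False
        return self.observation()

    def observation(self) -> List[List[int]]:
        return [row[:] for row in self.board]

    def legal_actions(self) -> List[int]:
        return [c for c in range(COLS) if self.board[0][c] == 0]

    def step(self, col: int) -> Tuple[List[List[int]], float, bool, dict]:
        if self.done or col not in self.legal_actions():
            raise ValueError(f"illegal move: {col}")
        # Drop the piece into the lowest empty cell of the column.
        row = max(r for r in range(ROWS) if self.board[r][col] == 0)
        self.board[row][col] = self.player
        won = self._wins(row, col)
        self.done = won or not self.legal_actions()
        reward = 1.0 if won else 0.0  # reward for the player who just moved
        self.player = -self.player
        return self.observation(), reward, self.done, {}

    def _wins(self, row: int, col: int) -> bool:
        """Check whether the piece just placed completes four in a row."""
        piece = self.board[row][col]
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            count = 1
            for sign in (1, -1):
                r, c = row + sign * dr, col + sign * dc
                while (0 <= r < ROWS and 0 <= c < COLS
                       and self.board[r][c] == piece):
                    count += 1
                    r, c = r + sign * dr, c + sign * dc
            if count >= 4:
                return True
        return False
```

Checking only the lines through the last placed piece keeps `step` cheap, which matters when many self-play games are generated.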

Eval

Implementation of the competition system to evaluate the agents.

  • run episode: implements the logic to let two agents play a Connect4 game.
  • competition: implements the competition system to let two agents play several Connect4 games.

[Figure: ranking]
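A competition between two agents can be sketched as a loop that plays several games and tallies the outcomes. The function below is a hypothetical illustration, not the repository's implementation: `run_competition` and the `play_game` callback (which returns 1 if the first mover wins, -1 if the second mover wins, and 0 for a draw) are names assumed for this example. Alternating the starting player is also an assumption, made here to offset the first-move advantage.

```python
from typing import Any, Callable, Dict


def run_competition(
    agent_a: Any,
    agent_b: Any,
    play_game: Callable[[Any, Any], int],
    n_games: int = 100,
) -> Dict[str, int]:
    """Play `n_games` between two agents and count wins, losses, and draws.

    `play_game(first, second)` is assumed to return 1 if `first` wins,
    -1 if `second` wins, and 0 for a draw.
    """
    results = {"a_wins": 0, "b_wins": 0, "draws": 0}
    for game in range(n_games):
        a_starts = game % 2 == 0  # alternate who moves first
        if a_starts:
            outcome = play_game(agent_a, agent_b)
        else:
            # Flip the sign so the outcome is from agent_a's point of view.
            outcome = -play_game(agent_b, agent_a)
        if outcome > 0:
            results["a_wins"] += 1
        elif outcome < 0:
            results["b_wins"] += 1
        else:
            results["draws"] += 1
    return results
```

Running this function for every pair of agents yields the win counts from which a ranking can be built.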

Game

Implementation of a simple User Interface (using Pygame) to let users play against the best agents defined in this project.

  • game config: configuration files to customize the application.
  • game logic: classes to implement the logic of the application.
  • connect game main: run this Python script to launch the application and play the game.

[Figure: game menu]

[Figure: example of an ongoing game]

Models

Implementation of the neural network architecture. Contains the best models of each agent.

  • architectures: list of predefined neural network architectures.
  • saved models: weights and training hyper-parameters of the best models trained in this project.
  • custom network: a class to implement a wide range of different neural network architectures.

Train

Implements the training pipeline to train each agent (Supervised Learning, self-play Reinforcement Learning).

  • part 1 supervised learning: a policy network learns to predict the actions of a mid-level player.
    • see also: Data/part1 data; Data/part1 dataset generator; Agents/1-Step Lookahead Agent.
  • ppo training: a PPO Agent learns a policy to maximize the expected return.
    • depends on: Train/part 1 supervised learning.
  • reinforce training: a REINFORCE Agent learns a policy to maximize the expected return.
    • depends on: Train/part 1 supervised learning.
  • vanilla and dueling dqn training: a Vanilla DQN or Dueling DQN Agent learns the Q-values.
    • depends on: Train/part 1 supervised learning.
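For reference, the textbook forms of the objectives behind these training scripts are summarized below. They are the standard definitions of each algorithm and are not taken from the repository, whose implementation may add variations such as a baseline, an entropy bonus, or a different target computation. The symbols are assumptions of this summary: $\theta$ are the trainable weights, $\theta^-$ the target-network weights, $\gamma$ the discount factor, $G_t$ the return from step $t$, $\hat{A}_t$ an advantage estimate, and $\epsilon$ the PPO clipping range.

```latex
\begin{align*}
\text{DQN:} \quad
  & L(\theta) = \mathbb{E}\Big[\big(r + \gamma \max_{a'} Q_{\theta^-}(s', a')
    - Q_\theta(s, a)\big)^2\Big] \\
\text{Dueling DQN:} \quad
  & Q_\theta(s, a) = V_\theta(s) + A_\theta(s, a)
    - \frac{1}{|\mathcal{A}|} \sum_{a'} A_\theta(s, a') \\
\text{REINFORCE:} \quad
  & \nabla_\theta J(\theta) = \mathbb{E}\Big[\sum_t G_t \,
    \nabla_\theta \log \pi_\theta(a_t \mid s_t)\Big] \\
\text{PPO:} \quad
  & L^{\text{CLIP}}(\theta) = \mathbb{E}\Big[\min\big(r_t(\theta)\,\hat{A}_t,\;
    \operatorname{clip}(r_t(\theta),\, 1 - \epsilon,\, 1 + \epsilon)\,\hat{A}_t\big)\Big],
    \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}
\end{align*}
```

The first two objectives learn Q-values, while the last two optimize the policy directly, which matches the split between the DQN agents and the REINFORCE and PPO agents.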
