agogo

A reimplementation of AlphaGo in Go (specifically AlphaZero)

About

The algorithm is composed of:

  • a Monte-Carlo Tree Search (MCTS) implemented in the mcts package;
  • a Dual Neural Network (DNN) implemented in the dualnet package.

The algorithm is wrapped into a top-level structure (AZ for AlphaZero). The algorithm applies to any game able to fulfill a specified contract.
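To make the MCTS side more concrete, here is a minimal sketch of the PUCT selection rule commonly used in AlphaZero-style searches. It is not taken from the mcts package: the `child` type and the `puctScore` and `selectChild` functions are hypothetical names introduced only for illustration, and the exploration constant plays the role that a setting such as `PUCT` plays in a search configuration.

```go
package main

import (
	"fmt"
	"math"
)

// child holds the per-move statistics a PUCT-style search typically tracks.
// All names here are illustrative and are not taken from the mcts package.
type child struct {
	prior  float64 // P(s,a): policy prior produced by the network
	visits int     // N(s,a): number of times the move was explored
	valSum float64 // W(s,a): sum of the values backed up through the move
}

// puctScore returns Q + U, where Q is the mean value of the move and
// U = c * P * sqrt(parentVisits) / (1 + N) is the exploration bonus.
func puctScore(c float64, parentVisits int, ch child) float64 {
	q := 0.0
	if ch.visits > 0 {
		q = ch.valSum / float64(ch.visits)
	}
	u := c * ch.prior * math.Sqrt(float64(parentVisits)) / float64(1+ch.visits)
	return q + u
}

// selectChild returns the index of the child with the highest PUCT score,
// or -1 if there are no children.
func selectChild(c float64, children []child) int {
	parentVisits := 0
	for _, ch := range children {
		parentVisits += ch.visits
	}
	best, bestScore := -1, math.Inf(-1)
	for i, ch := range children {
		if s := puctScore(c, parentVisits, ch); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	children := []child{
		{prior: 0.6, visits: 10, valSum: 2.0},
		{prior: 0.3, visits: 2, valSum: 1.0},
		{prior: 0.1, visits: 0, valSum: 0.0},
	}
	// The second move wins here: it has the best mean value and few visits.
	fmt.Println(selectChild(1.0, children))
}
```

A larger constant favors moves with a high prior and few visits; a smaller one favors moves whose mean value is already high.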

The contract specifies the description of a game state.

In this package, the contract is a Go interface declared in the game package: State.

Description of some concepts/ubiquitous language

  • In the agogo package, each player of the game is an Agent, and in a game, two Agents play in an Arena.

  • The game package is loosely coupled with the AlphaZero algorithm and describes a game's behavior (and not what a game is). The behavior is expressed as a set of functions that operate on a State of the game. A State is an interface that represents the current game state as well as the allowed interactions. An interaction is made by a Player, who performs a PlayerMove. The implementer's responsibility is to code the game's rules by creating an object that fulfills the State contract and implements the allowed moves.
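The contract idea described above can be sketched with a reduced interface. The code below is an assumption-laden illustration, not the real definition from the game package: `MiniState` is a hypothetical interface, the `Player` and `PlayerMove` types are simplified stand-ins that reuse the names from the text, and `line` is a toy game invented for the example.

```go
package main

import "fmt"

// Player and PlayerMove reuse the names from the text; their definitions
// here are simplified stand-ins, not those of the game package.
type Player int

const (
	None Player = iota
	Cross
	Nought
)

type PlayerMove struct {
	Player Player
	Single int // index of the target cell
}

// MiniState is a hypothetical, reduced version of the State contract.
type MiniState interface {
	Board() []Player
	ToMove() Player
	Check(m PlayerMove) bool
	Apply(m PlayerMove) MiniState
}

// line is a toy 1x3 game: players alternate filling empty cells.
type line struct {
	cells [3]Player
	next  Player
}

func (l *line) Board() []Player { return l.cells[:] }
func (l *line) ToMove() Player  { return l.next }

// Check reports whether m is legal: right player, in range, empty cell.
func (l *line) Check(m PlayerMove) bool {
	return m.Player == l.next &&
		m.Single >= 0 && m.Single < len(l.cells) &&
		l.cells[m.Single] == None
}

// Apply plays m if it is legal and returns the (mutated) state.
// Illegal moves leave the state unchanged.
func (l *line) Apply(m PlayerMove) MiniState {
	if !l.Check(m) {
		return l
	}
	l.cells[m.Single] = m.Player
	if l.next == Cross {
		l.next = Nought
	} else {
		l.next = Cross
	}
	return l
}

func main() {
	var s MiniState = &line{next: Cross}
	s = s.Apply(PlayerMove{Player: Cross, Single: 1})
	fmt.Println(s.Board(), s.ToMove())
}
```

The point of the sketch is the split of responsibilities: the rules live entirely in the type that implements the interface, so a search algorithm only ever talks to the interface.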

Training process

Applying the algorithm to a game

This package is designed to be extensible: you can train AlphaZero on any board game that respects the contract of the game package. The trained model can then be saved and used as a player.

The steps to train the algorithm are:

  • Creating a structure that fulfills the State interface (i.e. a game).
  • Creating a configuration for the MCTS and the NN used internally by AZ.
  • Creating an AZ structure based on the game and the configuration.
  • Executing the learning process (by calling the Learn method).
  • Saving the trained model (by calling the Save method).

The steps to play against the algorithm are:

  • Creating an AZ object.
  • Loading the trained model (by calling the Load method, as in the inference example below).
  • Switching the agent to inference mode via the SwitchToInference method.
  • Getting the AI move by calling the Search method, then applying the move to the game manually.

Examples

Four board games are implemented so far. Each of them is defined as a subpackage of game:

tic-tac-toe

Tic-tac-toe is an m,n,k game where m=n=k=3.
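In an m,n,k game, two players alternate placing marks on an m×n board, and the first to get k marks in a row wins. As a rough illustration, the win condition can be checked as follows. The `hasWon` function is a hypothetical helper written for this example, not part of the mnk subpackage, and the row-major layout with m columns and n rows is an assumption.

```go
package main

import "fmt"

// hasWon reports whether player p has k marks in a row (horizontal,
// vertical, or diagonal) on a board of m columns and n rows stored
// row-major in a flat slice. Illustrative helper, not from the mnk package.
func hasWon(board []int, m, n, k, p int) bool {
	// Directions as (column step, row step): right, down, and both diagonals.
	dirs := [][2]int{{1, 0}, {0, 1}, {1, 1}, {1, -1}}
	for r := 0; r < n; r++ {
		for c := 0; c < m; c++ {
			for _, d := range dirs {
				count := 0
				for i := 0; i < k; i++ {
					cc, rr := c+i*d[0], r+i*d[1]
					if cc < 0 || cc >= m || rr < 0 || rr >= n || board[rr*m+cc] != p {
						break
					}
					count++
				}
				if count == k {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	const X, O = 1, 2
	board := []int{
		X, O, 0,
		O, X, 0,
		0, 0, X,
	}
	// X holds the main diagonal, O has only two marks.
	fmt.Println(hasWon(board, 3, 3, 3, X), hasWon(board, 3, 3, 3, O))
}
```

With m=n=k=3 this is exactly the tic-tac-toe rule; other values of m, n, and k give other games of the same family.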

Training

Here is sample code that trains the AlphaZero agent to play the game. The result is saved in the file example.model.

package main

// Import paths below follow the package names used in this README.
import (
    "log"
    "time"

    "github.com/gorgonia/agogo"
    dual "github.com/gorgonia/agogo/dualnet"
    "github.com/gorgonia/agogo/game"
    "github.com/gorgonia/agogo/game/mnk"
    "github.com/gorgonia/agogo/mcts"
)

// encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe.
// It returns two layers: the board itself, then a layer marking the player to move.
func encodeBoard(a game.State) []float32 {
    board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
    // Nudge empty cells away from zero.
    for i := range board {
        if board[i] == 0 {
            board[i] = 0.001
        }
    }
    playerLayer := make([]float32, len(a.Board()))
    next := a.ToMove()
    if next == game.Player(game.Black) {
        for i := range playerLayer {
            playerLayer[i] = 1
        }
    } else if next == game.Player(game.White) {
        // vecf32.Scale(board, -1)
        for i := range playerLayer {
            playerLayer[i] = -1
        }
    }
    retVal := append(board, playerLayer...)
    return retVal
}

func main() {
    // Create the configuration for the neural network and the MCTS
    conf := agogo.Config{
        Name:            "Tic Tac Toe",
        NNConf:          dual.DefaultConf(3, 3, 10),
        MCTSConf:        mcts.DefaultConfig(3),
        UpdateThreshold: 0.52,
    }
    conf.NNConf.BatchSize = 100
    conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
    conf.NNConf.K = 3
    conf.NNConf.SharedLayers = 3
    // Override the default MCTS configuration set above
    conf.MCTSConf = mcts.Config{
        PUCT:           1.0,
        M:              3,
        N:              3,
        Timeout:        100 * time.Millisecond,
        PassPreference: mcts.DontPreferPass,
        Budget:         1000,
        DumbPass:       true,
        RandomCount:    0,
    }

    conf.Encoder = encodeBoard

    // Create a new game
    g := mnk.TicTacToe()
    // Create the AlphaZero structure
    a := agogo.New(g, conf)
    // Launch the learning process
    err := a.Learn(5, 50, 100, 100) // 5 epochs, 50 episodes, 100 NN iters, 100 games.
    if err != nil {
        log.Println(err)
    }
    // Save the model
    a.Save("example.model")
}
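To see the shape of the vector that an encoder like encodeBoard produces, here is a self-contained sketch that does not depend on agogo. The `encodeSketch` function is a hypothetical stand-in, and the stone values (+1 for the first player, -1 for the second) are an assumption about what EncodeTwoPlayerBoard returns, not something confirmed here. What it does mirror from the example is the layout: one board layer with empty cells set to 0.001, followed by one layer filled with +1 or -1 for the player to move.

```go
package main

import "fmt"

// encodeSketch mimics the layout of the encoder in the training example:
// a board layer followed by a "player to move" layer.
// Cell codes: 0 = empty, 1 = first player, 2 = second player.
// The +1/-1 stone values are an assumption, not taken from agogo.
func encodeSketch(cells []int, firstToMove bool) []float32 {
	out := make([]float32, 0, 2*len(cells))
	for _, c := range cells {
		switch c {
		case 1:
			out = append(out, 1)
		case 2:
			out = append(out, -1)
		default:
			out = append(out, 0.001) // empty cells nudged away from zero
		}
	}
	side := float32(-1)
	if firstToMove {
		side = 1
	}
	for range cells {
		out = append(out, side)
	}
	return out
}

func main() {
	// First player in the center, second player to move.
	cells := []int{0, 0, 0, 0, 1, 0, 0, 0, 0}
	enc := encodeSketch(cells, false)
	// Length, center cell of the board layer, first cell of the player layer.
	fmt.Println(len(enc), enc[4], enc[9])
}
```

For a 3×3 board this yields 18 values, i.e. two layers of 9 cells, which is consistent with the example setting conf.NNConf.Features to 2.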

Inference

package main

// Import paths below follow the package names used in this README.
import (
    "fmt"

    "github.com/gorgonia/agogo"
    dual "github.com/gorgonia/agogo/dualnet"
    "github.com/gorgonia/agogo/game"
    "github.com/gorgonia/agogo/game/mnk"
    "github.com/gorgonia/agogo/mcts"
)

// encodeBoard must match the encoder used during training.
func encodeBoard(a game.State) []float32 {
    board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
    for i := range board {
        if board[i] == 0 {
            board[i] = 0.001
        }
    }
    playerLayer := make([]float32, len(a.Board()))
    next := a.ToMove()
    if next == game.Player(game.Black) {
        for i := range playerLayer {
            playerLayer[i] = 1
        }
    } else if next == game.Player(game.White) {
        // vecf32.Scale(board, -1)
        for i := range playerLayer {
            playerLayer[i] = -1
        }
    }
    retVal := append(board, playerLayer...)
    return retVal
}

func main() {
    conf := agogo.Config{
        Name:     "Tic Tac Toe",
        NNConf:   dual.DefaultConf(3, 3, 10),
        MCTSConf: mcts.DefaultConfig(3),
    }
    conf.Encoder = encodeBoard

    // Create the game and load the trained model
    g := mnk.TicTacToe()
    a := agogo.New(g, conf)
    a.Load("example.model")
    // Assign a side to each agent and switch both to inference mode
    a.A.Player = mnk.Cross
    a.B.Player = mnk.Nought
    a.B.SwitchToInference(g)
    a.A.SwitchToInference(g)
    // Put X in the center
    stateAfterFirstPlay := g.Apply(game.PlayerMove{
        Player: mnk.Cross,
        Single: 4,
    })
    fmt.Println(stateAfterFirstPlay)
    // ⎢ · · · ⎥
    // ⎢ · X · ⎥
    // ⎢ · · · ⎥

    // Ask agent B what to do next
    move := a.B.Search(stateAfterFirstPlay)
    fmt.Println(move)
    // 1
    // Apply the AI move to the game manually
    stateAfterSecondPlay := g.Apply(game.PlayerMove{
        Player: mnk.Nought,
        Single: move,
    })
    fmt.Println(stateAfterSecondPlay)
    // ⎢ · O · ⎥
    // ⎢ · X · ⎥
    // ⎢ · · · ⎥
}

Misc

A Funny Thing Happened On The Way To Reimplementing AlphaGo - A talk by @chewxy (one of the authors) about this specific implementation

Credits

Original implementation credits to

Contributors

carleeto, chewxy, owulveryck
