Coder Social home page Coder Social logo

akkordeon's Introduction

"What I cannot create, I do not understand." - Richard Feynman.

Akkordeon: Training a neural net with Akka

The world is asynchronous.

This project shows how to train a neural net with Akka.

The mechanics are as follows:

A layer is embedded in a gate. A gate is an actor. The results of the forward and backward pass are passed as messages from one gate to the next. Calculations inside a layer are performed asynchronously from other layers. Thus, a layer does not have to wait for the backward pass in order to perform the forward pass of the next batch.

Every gate has its optimizer. Optimization on a layer thus runs asynchronously from other layers. To alleviate the 'delayed gradient' problem, I use an implementation of the 'Asynchronous Stochastic Gradient Descent with Delay Compensation' optimizer.

Data providers are embedded in sentinels and implemented as actors. You can have mutiple sentinels running at the same time, each with a subset of the training data for example. This also allows us to run the training and validation phases concurrently.

All actors can be deployed on a single machine or in a cluster of machines, thus leveraging both horizontal and vertical computing power.

Components

components

Gate

A gate is similar to a layer. Every gate is an actor. Whereas in a traditional network there is only one optimizer for the complete network, here every gate has its optimizer. There is however no difference in functionality, since optimizers do not share data between layers.

A gate can consist of an arbitrarily complex network in itself. You can put multiple convolutional, pooling, batchnorm, dropouts, ... and so on in one gate. Or you can assign them to different gates, thus distributing the work over multiple actors.

Network

A network is a sequence of gates. The sequence is open. You can attach multiple sentinels, each with its data provider, to the network.

Sentinel

The sentinel is an actor, and does a couple of things:

  • provide data, through the data provider, for training, validation and test
  • calculate and report loss and accuracy during training and validation
  • trigger the forward pass for each batch during training, validation and test
  • trigger the backward pass for each batch when training

You can attach multiple sentinels to a network. Typically, one or more sentinels are provided for training, and one for validation. The latter runs every 20 seconds for example, whereas the training sentinels run continuously.

Prepare

After having cloned/downloaded the source code of this project, get the MNIST dataset by executing the script scripts/download_mnist.sh or by manually downloading the files from the URLs in the script, and putting them in a folder data/mnist.

You will need sbt to build the project.

Build and run

Single JVM

sbt 'runMain botkop.akkordeon.SimpleAkkordeon'

This will produce output similar to this:

[info] tdp        epoch:     1 loss:  2.939994 duration: 7105.075212ms scores: (0.22618558114035087)
[info] tdp        epoch:     2 loss:  1.848889 duration: 2339.476822ms scores: (0.4044360040590681)
[info] tdp        epoch:     3 loss:  1.463448 duration: 2278.748975ms scores: (0.5158070709745762)
[info] tdp        epoch:     4 loss:  1.136699 duration: 2245.955278ms scores: (0.6229231711161338)
[info] tdp        epoch:     5 loss:  0.968350 duration: 2309.301106ms scores: (0.6776098002821712)
[info] tdp        epoch:     6 loss:  0.880695 duration: 2259.42184ms scores: (0.7060564301781735)
[info] tdp        epoch:     7 loss:  0.892328 duration: 2856.552759ms scores: (0.7027704402551813)
[info] vdp        epoch:     1 loss:  0.866831 duration: 1768.835725ms scores: (0.7107204861111112)

Multiple JVMs

In this scenario, I show how to deploy the neural net on one JVM, and the sentinels on other JVMs. The JVMs can be deployed on the same machine, or on different machines. Note that when deploying the sentinels on separate machines, you will need to make the data accessible on those machines.

Another scenario that comes to mind is to split the network itself in separate entities, and deploy those on different JVMs. Let's say that for now, I leave this as an exercise for the reader.

Obtain the IP address of the machine on which you want to run the neural net. If you run all JVMs on the same machine, then you can use 127.0.0.1. Append a free port number separated by colon:

export NNADDR=192.168.1.23:25520

Start the neural net in a terminal window:

sbt "runMain botkop.akkordeon.examples.NetworkApp $NNADDR"

Obtain the IP address of a machine on which you want to run a sentinel. If you run all JVMs on the same machine, then you can use 127.0.0.1. The parameter 60000 is the number of samples from the data set you want to use. Start a training sentinel in another terminal:

MY_IP=192.168.0.158
sbt "runMain botkop.akkordeon.examples.SentinelApp $MY_IP train 60000 $NNADDR"

And another one:

MY_IP=192.168.0.159
sbt "runMain botkop.akkordeon.examples.SentinelApp $MY_IP train 3000 $NNADDR"

Also start a validation sentinel.

MY_IP=192.168.0.160
sbt "runMain botkop.akkordeon.examples.SentinelApp $MY_IP validate 10000 $NNADDR"

References

akkordeon's People

Contributors

koen-dejonghe avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.