
mlgpu's Introduction

##Algorithms in this library have not been extensively tested; some likely have bugs

##William Agnew's Machine Learning Library

This library was implemented largely for learning purposes; I do not have the time to implement the optimizations and features necessary to compete with the likes of TensorFlow or Theano. However, implementing everything from scratch, including GPU acceleration, was a fun programming challenge that gave me a much deeper intuition for both machine learning algorithms and the structure of popular machine learning libraries. I talk about what I learned in the last section.

I learned many of the basics of reinforcement learning from https://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.html, and of supervised machine learning from http://neuralnetworksanddeeplearning.com/

##Requirements

  • JCUDA >=0.6 (and necessary GPU and CUDA setup)
  • Apache Commons Math 3.x

##Easy Things to Run

  • Checkers via Monte Carlo Value Iteration Reinforcement Learning + Neural Network State Valuation: src/checkers/testCheckers.java
  • MNIST: src/test/MNISTNumbers.java

##Reinforcement Learning

##Supervised Learning

  • Feedforward Neural Networks
  • Fully Connected Layers
  • Convolution and Pooling Layers
  • Fully Connected Layers Optimized for Sparse Inputs
  • Convolution Layers Optimized for Sparse Inputs
  • Sigmoid, TanH, ReLU, SoftMax
  • Euclidean and Cross Entropy Loss
  • L2 Regularization
  • SGD, Multithreaded (Multiple GPU workers) SGD
  • RProp, Multithreaded (Multiple GPU workers) RProp
  • Nesterov Momentum
  • Unsupervised Pretraining
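
To make one of the listed optimizers concrete, here is a minimal CPU-only sketch of gradient descent with Nesterov momentum and L2 regularization on a toy quadratic loss. It is not code from this library: the class `NesterovSketch`, its method names, and the toy loss are all hypothetical and chosen only for illustration.

```java
/** Hypothetical illustration of Nesterov momentum with L2 regularization; not code from this library. */
final class NesterovSketch {
    /** Gradient of the toy loss 0.5 * sum((w_i - target_i)^2) + 0.5 * lambda * sum(w_i^2). */
    static double[] gradient(double[] w, double[] target, double lambda) {
        double[] g = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            g[i] = (w[i] - target[i]) + lambda * w[i];
        }
        return g;
    }

    /** Runs `steps` Nesterov updates in place on w and returns w. */
    static double[] train(double[] w, double[] target, double lr, double mu, double lambda, int steps) {
        double[] v = new double[w.length];          // velocity, starts at zero
        double[] lookahead = new double[w.length];
        for (int s = 0; s < steps; s++) {
            for (int i = 0; i < w.length; i++) {
                lookahead[i] = w[i] + mu * v[i];    // the gradient is evaluated at the look-ahead point
            }
            double[] g = gradient(lookahead, target, lambda);
            for (int i = 0; i < w.length; i++) {
                v[i] = mu * v[i] - lr * g[i];       // update velocity
                w[i] += v[i];                       // then move the weights
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[] w = train(new double[]{0.0, 0.0}, new double[]{1.0, -2.0}, 0.1, 0.9, 0.0, 300);
        System.out.println(java.util.Arrays.toString(w));
    }
}
```

With `lambda = 0` the weights approach `target`; with `lambda > 0` the minimum of this toy loss moves to `target / (1 + lambda)`, which is the shrinkage effect of L2 regularization.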

##GPU Interface

  • n-Dimensional FP32 GPU Accelerated Matrices with Wrappers to Many BLAS Calls (used JBLAS Library)
  • n-Dimensional Sparse FP32 GPU Accelerated Matrices with Wrappers to Many BLAS Calls (used JBLAS Library)
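
As a rough picture of what an n-dimensional FP32 matrix with a BLAS-style wrapper can look like, the sketch below stores the values in one flat `float[]` and maps an index to an offset using row-major strides. The row-major layout, the class `NdFloatArray`, and its methods are assumptions for illustration, not this library's API, and the `saxpy` here is a plain CPU loop standing in for the GPU call.

```java
import java.util.Arrays;

/** Hypothetical sketch: dense n-dimensional FP32 array over one flat, row-major buffer. */
final class NdFloatArray {
    final int[] shape;
    final int[] strides;   // number of elements skipped per step along each dimension
    final float[] data;    // flat storage, the form BLAS-style routines operate on

    NdFloatArray(int... shape) {
        this.shape = shape.clone();
        this.strides = new int[shape.length];
        int size = 1;
        for (int d = shape.length - 1; d >= 0; d--) {
            strides[d] = size;
            size *= shape[d];
        }
        this.data = new float[size];
    }

    /** Maps an n-dimensional index to a position in the flat buffer. */
    int offset(int... index) {
        if (index.length != shape.length) throw new IllegalArgumentException("rank mismatch");
        int off = 0;
        for (int d = 0; d < index.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) throw new IndexOutOfBoundsException("dimension " + d);
            off += index[d] * strides[d];
        }
        return off;
    }

    float get(int... index) { return data[offset(index)]; }

    void set(float value, int... index) { data[offset(index)] = value; }

    /** CPU reference for BLAS saxpy: y = alpha * x + y over the flat buffers. */
    static void saxpy(float alpha, NdFloatArray x, NdFloatArray y) {
        if (!Arrays.equals(x.shape, y.shape)) throw new IllegalArgumentException("shape mismatch");
        for (int i = 0; i < x.data.length; i++) {
            y.data[i] += alpha * x.data[i];
        }
    }
}
```

Because every array is one contiguous buffer, an element-wise routine such as saxpy does not need to know the rank at all; only indexing does.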

##Lessons Learned

  • Moving data between CPU and GPU memory is slow; transfers must be all but eliminated to keep them from becoming the bottleneck.
  • CUDA memory management is weird. When transferring a matrix from CPU to GPU memory, profiling revealed that it is faster to zero already malloc'ed (but no longer needed) GPU memory and copy the matrix from CPU memory into it than to malloc new GPU memory and copy the matrix there. However, I suspect there is much I do not understand about GPU memory (e.g. asynchronous transfers).
  • Actually relevant to major ML libraries: although I did not know it when I wrote this library, ML libraries like Theano and TensorFlow copy large chunks of memory to the GPU, e.g. a minibatch of training examples. This is a problem if the minibatch size exceeds the GPU memory size. While there are of course programmatic ways around this, my library completely avoids the problem by copying matrices to the GPU only when a GPU function (e.g. saxpy) is called on them, and copying them back to the CPU only when a CPU-only function is called on them. It would be a simple matter to combine this with making space on the GPU when necessary by sending matrices in GPU memory back to CPU memory, drawing on the rich caching literature and the predictable nature of most training algorithms to minimize the number of GPU-CPU matrix transfers. This would give the ML training machine memory equal to CPU memory + GPU memory at no effort to the user-programmer.
  • While I got away with not doing it for feedforward neural nets, it would be conceptually and programmatically easier to think of neural nets as graphs where edges represent data flow and vertices are differentiable functions: see my next machine learning library, https://github.com/wagnew/ComputeGraph, for an implementation of this concept.
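
The copy-on-demand idea from the lessons above can be sketched without any GPU at all. In the simulation below, a plain `float[]` stands in for the GPU buffer, and a counter records how many simulated transfers occur. The class `LazyMatrix` and its methods are hypothetical, not this library's API; a real version would replace the `clone()` calls with host-to-device and device-to-host copies.

```java
/** Hypothetical sketch: a matrix that tracks whether its valid copy is on the CPU or the simulated GPU. */
final class LazyMatrix {
    enum Location { CPU, GPU }

    static int transfers = 0;            // counts simulated CPU<->GPU copies

    private float[] host;                // CPU copy
    private float[] device;              // stand-in for a GPU buffer (a plain array here)
    private Location location = Location.CPU;

    LazyMatrix(float[] values) {
        this.host = values.clone();
    }

    /** Copies to the simulated GPU only if the valid copy is not already there. */
    private float[] onGpu() {
        if (location == Location.CPU) {
            device = host.clone();       // a real version would do a host-to-device copy
            location = Location.GPU;
            transfers++;
        }
        return device;
    }

    /** Copies back to the CPU only if the valid copy currently lives on the simulated GPU. */
    private float[] onCpu() {
        if (location == Location.GPU) {
            host = device.clone();       // a real version would do a device-to-host copy
            location = Location.CPU;
            transfers++;
        }
        return host;
    }

    /** "GPU" operation: y = alpha * x + y. Pulls both operands onto the simulated GPU if needed. */
    static void saxpy(float alpha, LazyMatrix x, LazyMatrix y) {
        float[] xd = x.onGpu();
        float[] yd = y.onGpu();
        for (int i = 0; i < yd.length; i++) {
            yd[i] += alpha * xd[i];
        }
    }

    /** CPU-only operation: forces the matrix back to CPU memory before reading it. */
    float sum() {
        float s = 0f;
        for (float v : onCpu()) {
            s += v;
        }
        return s;
    }
}
```

Calling `saxpy` repeatedly on the same pair of matrices costs two transfers in total, not two per call, because both operands stay resident until a CPU-only function such as `sum` asks for them. An eviction policy for a full GPU would sit inside `onGpu`, which is the natural place to decide which resident matrix to send back.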
