singularity-deep-learning's Introduction

A Singularity Container Recipe for Deep Learning

Intended Purpose

This repository contains the recipe for a Singularity container equipped with Scikit-Learn, TensorFlow, and Keras. With it, you can harness the power of GPUs on remote servers where you have limited permissions!

It came about because I wanted to do some deep learning tasks on GPU-equipped servers with inconsistent environments.

It took me quite a bit of time to figure out how to do this. I had to dig through repositories, blog posts, and web forums. I hope this repository helps other people figure things out faster than I did. The build script hasn't been tested on a range of systems, but it may be useful regardless.

Usage

Follow these steps to build the Singularity image:

  • Go to a host where you have root access (e.g., your personal computer). Clone this repository and cd into it.

  • Make sure you have Singularity installed. If you're on a Mac, this may entail setting up a Vagrant virtual machine.

  • Download the following NVIDIA requirements and move them to this directory: the CUDA installer (cuda_<version>_linux.run) and the cuDNN archive (cudnn-<version>.tgz).

  • Set the variables in the build_image.sh script.

    • NVIDIA driver version. 384.111 may be a safe choice. It needs to be compatible with the CUDA installer, and with the GPUs you'll be using. These drivers will be installed via apt-get from within the container during construction.
    • CUDA version (get it from the name of the installer: cuda_<version>_linux.run )
    • cuDNN version (get it from the name of the installer: cudnn-<version>.tgz )
  • Run the script as root: sudo bash build_image.sh. Then relax. It takes a little bit of time to build the image.

  • scp the resulting image file to the host you want to work on. That host needs Singularity installed and TensorFlow-capable GPUs.

  • Test your image, e.g., by running singularity exec --nv <your image> python3 keras_mnist_test.py
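For the variable-setting step above, the CUDA and cuDNN version strings can be read off the installer filenames rather than typed by hand. The following is a minimal sketch, assuming the filenames follow the cuda_<version>_linux.run and cudnn-<version>.tgz patterns given in the steps. The helper name `installer_version` and the example filenames are hypothetical, and whether build_image.sh expects the full string between the prefix and the suffix is an assumption worth checking against the script itself.

```python
import re

# Filename patterns taken from the steps above:
#   cuda_<version>_linux.run  and  cudnn-<version>.tgz
CUDA_PATTERN = re.compile(r"^cuda_(?P<version>.+)_linux\.run$")
CUDNN_PATTERN = re.compile(r"^cudnn-(?P<version>.+)\.tgz$")


def installer_version(filename, pattern):
    """Return the <version> part of an installer filename, or None if it does not match."""
    match = pattern.match(filename)
    return match.group("version") if match else None


if __name__ == "__main__":
    # Illustrative filenames only; substitute the files you actually downloaded.
    print(installer_version("cuda_9.0.176_384.81_linux.run", CUDA_PATTERN))
    print(installer_version("cudnn-9.0-linux-x64-v7.tgz", CUDNN_PATTERN))
```

A filename that does not match the expected pattern returns None, which is a cheap way to catch a wrong or renamed download before starting a long build.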

If the test runs correctly, the output should look something like this:

$ singularity exec --nv tf_keras.simg python3 keras_mnist_test.py 
Using TensorFlow backend.
60000 train samples
10000 test samples
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130      
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
2018-07-17 17:55:10.972714: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2018-07-17 17:55:12.698975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: Quadro P5000 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:0b:00.0
totalMemory: 15.90GiB freeMemory: 15.78GiB
2018-07-17 17:55:13.019224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 1 with properties: 
name: Quadro P5000 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:11:00.0
totalMemory: 15.90GiB freeMemory: 15.78GiB
2018-07-17 17:55:13.019478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Device peer to peer matrix
2018-07-17 17:55:13.019769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126] DMA: 0 1 
2018-07-17 17:55:13.019843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 0:   Y N 
2018-07-17 17:55:13.019893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 1:   N Y 
2018-07-17 17:55:13.019981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Quadro P5000, pci bus id: 0000:0b:00.0, compute capability: 6.1)
2018-07-17 17:55:13.020032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Quadro P5000, pci bus id: 0000:11:00.0, compute capability: 6.1)
60000/60000 [==============================] - 20s 333us/step - loss: 0.2449 - acc: 0.9251 - val_loss: 0.1060 - val_acc: 0.9658
Epoch 2/20
60000/60000 [==============================] - 4s 68us/step - loss: 0.1024 - acc: 0.9684 - val_loss: 0.0777 - val_acc: 0.9763
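One quick sanity check on this output is to confirm the parameter counts in the model summary by hand. The sketch below assumes the network takes 784 input features (28 x 28 MNIST images, flattened), which is consistent with the 401920 parameters reported for the first layer. The function name `dense_params` is just for illustration.

```python
def dense_params(n_inputs, n_units):
    """Parameter count of a fully connected layer: one weight per input-unit pair, plus one bias per unit."""
    return n_inputs * n_units + n_units


# Assumed input size: 784 features (28 x 28 MNIST images, flattened).
# The hidden and output sizes come from the summary: 512, 512, 10.
layer_sizes = [784, 512, 512, 10]

# Pair each layer's input size with its output size.
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)  # [401920, 262656, 5130]
print(total)   # 669706
```

The dropout layers contribute no parameters, so the three dense layers account for the full total of 669,706 shown in the summary.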

Other Useful Resources:
