Coder Social home page Coder Social logo

nagyistge / convnet-benchmarks Goto Github PK

View Code? Open in Web Editor NEW

This project forked from soumith/convnet-benchmarks

0.0 1.0 0.0 764 KB

Easy benchmarking of all public open-source implementations of convnets

Shell 6.21% Cuda 8.30% C 3.64% Makefile 0.52% Python 62.65% Lua 17.31% JavaScript 0.53% C++ 0.83%

convnet-benchmarks's Introduction

convnet-benchmarks

Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below.

Machine: 6-core Intel Core i7-5930K CPU @ 3.50GHz + NVIDIA Titan X + Ubuntu 14.04 x86_64

##Imagenet Winners Benchmarking I pick some popular imagenet models, and I clock the time for a full forward + backward pass. I average my times over 10 runs. I ignored dropout and softmax layers.

######One small note: The CuDNN benchmarks are done using Torch bindings. One can also do the same via Caffe bindings or bindings of any other library. This note is here to clarify that Caffe (native) and Torch (native) are the convolution kernels which are present as a default fallback. Some of the frameworks like TensorFlow and Chainer are benchmarked with CuDNN, but it is not explicitly mentioned, and hence one might think that these frameworks as a whole are faster, than for example Caffe, which might not be the case.

AlexNet (One Weird Trick paper) - Input 128x3x224x224

Library Class Time (ms) forward (ms) backward (ms)
Nervana-fp16 ConvLayer 92 29 62
CuDNN[R3]-fp16 (Torch) cudnn.SpatialConvolution 96 30 66
CuDNN[R3]-fp32 (Torch) cudnn.SpatialConvolution 96 32 64
Nervana-fp32 ConvLayer 101 32 69
fbfft (Torch) fbnn.SpatialConvolution 104 31 72
Chainer Convolution2D 177 40 136
cudaconvnet2* ConvLayer 177 42 135
CuDNN[R2] * cudnn.SpatialConvolution 231 70 161
TensorFlow conv2d 277 70 207
Caffe (native) ConvolutionLayer 324 121 203
Torch-7 (native) SpatialConvolutionMM 342 132 210
CL-nn (Torch) SpatialConvolutionMM 963 388 574
Caffe-CLGreenTea ConvolutionLayer 1442 210 1232

Overfeat [fast] - Input 128x3x231x231

Library Class Time (ms) forward (ms) backward (ms)
CuDNN[R3]-fp16 (Torch) cudnn.SpatialConvolution 313 107 206
CuDNN[R3]-fp32 (Torch) cudnn.SpatialConvolution 326 113 213
fbfft (Torch) SpatialConvolutionCuFFT 342 114 227
Nervana-fp16 ConvLayer 355 112 242
Nervana-fp32 ConvLayer 398 124 273
Chainer Convolution2D 620 135 484
cudaconvnet2* ConvLayer 723 176 547
CuDNN[R2] * cudnn.SpatialConvolution 810 234 576
Caffe ConvolutionLayer 823 355 468
TensorFlow conv2d 842 216 626
Torch-7 (native) SpatialConvolutionMM 878 379 499
CL-nn (Torch) SpatialConvolutionMM 963 388 574
Caffe-CLGreenTea ConvolutionLayer 2857 616 2240

OxfordNet [Model-A] - Input 64x3x224x224

Library Class Time (ms) forward (ms) backward (ms)
Nervana-fp16 ConvLayer 529 167 362
Nervana-fp32 ConvLayer 590 180 410
CuDNN[R3]-fp16 (Torch) cudnn.SpatialConvolution 615 179 436
CuDNN[R3]-fp32 (Torch) cudnn.SpatialConvolution 615 196 418
Chainer Convolution2D 885 251 632
fbfft (Torch) SpatialConvolutionCuFFT 1092 355 737
cudaconvnet2* ConvLayer 1229 408 821
CuDNN[R2] * cudnn.SpatialConvolution 1099 342 757
Caffe ConvolutionLayer 1068 323 745
Torch-7 (native) SpatialConvolutionMM 1105 350 755
TensorFlow conv2d 1510 315 1195
CL-nn (Torch) SpatialConvolutionMM 3437 875 2562
Caffe-CLGreenTea ConvolutionLayer 5620 988 4632

GoogleNet V1 - Input 128x3x224x224

Library Class Time (ms) forward (ms) backward (ms)
Nervana-fp16 ConvLayer 283 85 197
Nervana-fp32 ConvLayer 322 90 232
CuDNN[R3]-fp32 (Torch) cudnn.SpatialConvolution 431 117 313
CuDNN[R3]-fp16 (Torch) cudnn.SpatialConvolution 501 109 392
Chainer Convolution2D 687 189 497
TensorFlow conv2d 1084 246 838
Caffe ConvolutionLayer 1935 786 1148
CL-nn (Torch) SpatialConvolutionMM 7016 3027 3988
Caffe-CLGreenTea ConvolutionLayer 9462 746 8716

Layer-wise Benchmarking (Last Updated April 2015)

###Spatial Convolution layer (3D input 3D output, densely connected)

forward + backprop (wrt input and weights)
Original Library Class/Function Benchmarked Time (ms) forward (ms) backward (ms)
fbfft SpatialConvolutionCuFFT 256 101 155
cuda-convnet2 * ConvLayer 977 201 776
cuda-convnet** pylearn2.cuda_convnet 1077 312 765
CuDNN R2 * cudnn.SpatialConvolution 1019 269 750
Theano CorrMM 1225 407 818
Caffe ConvolutionLayer 1231 396 835
Torch-7 SpatialConvolutionMM 1265 418 877
DeepCL ConvolutionLayer 6280 2648 3632
cherry-picking**** best per layer 235 79 155

This table is NOT UPDATED For TITAN-X. These numbers below were on Titan Black and are here only for informational and legacy purposes.

Original Library Class/Function Benchmarked Time (ms) forward (ms) backward (ms)
Theano (experimental)*** conv2d_fft 1178 304 874
Torch-7 nn.SpatialConvolutionBHWD 1892 581 1311
ccv ccv_convnet_layer 809+bw 809
Theano (legacy) conv2d 70774 3833 66941
  • * indicates that the library was tested with Torch bindings of the specific kernels.
  • ** indicates that the library was tested with Pylearn2 bindings.
  • *** This is an experimental module which used FFT to calculate convolutions. It uses a lot of memory according to @benanne
  • **** The last row shows results obtainable when choosing the best-performing library for each layer.
  • L1 - Input: 128x128 Batch-size 128, Feature maps: 3->96, Kernel Size: 11x11, Stride: 1x1
  • L2 - Input: 64x64 Batch-size 128, Feature maps: 64->128, Kernel Size: 9x9, Stride: 1x1
  • L3 - Input: 32x32 Batch-size 128, Feature maps: 128->128, Kernel Size: 9x9, Stride: 1x1
  • L4 - Input: 16x16 Batch-size 128, Feature maps: 128->128, Kernel Size: 7x7, Stride: 1x1
  • L5 - Input: 13x13 Batch-size 128, Feature maps: 384->384, Kernel Size: 3x3, Stride: 1x1
  • The table is ranked according to the total time forward+backward calls for layers (L1 + L2 + L3 + L4 + L5)

#####Breakdown

forward

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

Original Library Class/Function Benchmarked L1 L2 L3 L4 L5 Total
fbfft SpatialConvolutionCuFFT 57 27 6 2 9 101
cuda-convnet2 * ConvLayer 36 113 40 4 8 201
cuda-convnet** pylearn2.cuda_convnet 38 183 68 7 16 312
CuDNN R2 cudnn.SpatialConvolution 56 143 53 6 11 269
Theano CorrMM 91 143 121 24 28 407
Caffe ConvolutionLayer<Dtype> 93 136 116 24 27 396
Torch-7 nn.SpatialConvolutionMM 94 149 123 24 28 418
DeepCL ConvolutionLayer 738 1241 518 47 104 2648
cherry-picking**** best per layer 36 27 6 2 8 79
backward (gradInput + gradWeight)

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

Original Library Class/Function Benchmarked L1 L2 L3 L4 L5 Total
fbfft SpatialConvolutionCuFFT 76 45 12 4 18 155
cuda-convnet2 * ConvLayer 103 467 162 15 29 776
cuda-convnet** pylearn2.cuda_convnet 136 433 147 15 34 765
CuDNN R2 cudnn.SpatialConvolution 139 401 159 19 32 750
Theano CorrMM 179 405 174 29 31 818
Caffe ConvolutionLayer<Dtype> 200 405 172 28 30 835
Torch-7 nn.SpatialConvolutionMM 206 432 178 29 32 877
DeepCL ConvolutionLayer 484 2144 747 59 198 3632
cherry-picking**** best per layer 76 45 12 4 18 155

convnet-benchmarks's People

Contributors

bbagcilar avatar dadaism avatar f0k avatar hughperkins avatar liuliu avatar naibaf7 avatar nicholas-leonard avatar nouiz avatar ozabluda avatar soumith avatar yangqing avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.