Comments (1)
An answer - with a huge delay, for which I don't have an excuse. There are several reasons why this library is significantly slower than other libraries (including Theano), and most of them are related to the use of Aparapi:
- Memory management in Aparapi is very limited. Most other libraries use highly optimized GPU kernels, usually implemented with CUDA (cuDNN being by far the most popular). CUDA makes it possible to rearrange the GPU arrays into a layout that greatly increases computational speed; this is especially true for convolutional operations, which can be rewritten as a single large matrix multiplication. Aparapi simply doesn't offer that.
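To illustrate the kind of rearrangement CUDA-based libraries rely on, here is a minimal plain-Java sketch of the classic "im2col" trick (this is not code from this library; the class and method names are invented for illustration). Patches of the image are unrolled into columns so that the whole convolution becomes one matrix multiplication, which GPUs execute very efficiently:

```java
// Hypothetical sketch (not from this library): the "im2col" rearrangement
// that cuDNN-style implementations use to turn a convolution into one
// big, GPU-friendly matrix multiplication.
public class Im2ColSketch {

    // Unroll every k x k patch of an h x w single-channel image into a
    // column; result is (k*k) rows x (outH*outW) columns (stride 1, no padding).
    static double[][] im2col(double[] img, int h, int w, int k) {
        int outH = h - k + 1, outW = w - k + 1;
        double[][] cols = new double[k * k][outH * outW];
        for (int oi = 0; oi < outH; oi++)
            for (int oj = 0; oj < outW; oj++)
                for (int ki = 0; ki < k; ki++)
                    for (int kj = 0; kj < k; kj++)
                        cols[ki * k + kj][oi * outW + oj] =
                            img[(oi + ki) * w + (oj + kj)];
        return cols;
    }

    // One row-vector x matrix product: the whole convolution in one pass.
    static double[] matVec(double[] filter, double[][] cols) {
        double[] out = new double[cols[0].length];
        for (int j = 0; j < out.length; j++)
            for (int i = 0; i < filter.length; i++)
                out[j] += filter[i] * cols[i][j];
        return out;
    }

    public static void main(String[] args) {
        double[] img = {1, 2, 3, 4, 5, 6, 7, 8, 9};  // 3x3 image
        double[] sumFilter = {1, 1, 1, 1};            // 2x2 all-ones filter
        double[] out = matVec(sumFilter, im2col(img, 3, 3, 2));
        // each entry is the sum of one 2x2 patch
        System.out.println(java.util.Arrays.toString(out));
        // prints [12.0, 16.0, 24.0, 28.0]
    }
}
```

In Aparapi there is no comparable control over data layout, so each operation works on the arrays as-is.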
- Another serious limitation of Aparapi is that when there is a chain of GPU kernels (operations) in which the input of one operation is the output of the previous one (which is the case with most neural networks), it is not possible to keep this communication within the GPU. The output of each operation is first transferred from GPU memory to main RAM, and then transferred back from RAM to GPU memory to serve as input to the next operation. Unfortunately, this greatly reduces performance.
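The round-trip cost can be sketched in plain Java (no GPU involved; the class and counter are invented for illustration). Each simulated "kernel" copies its buffers across the bus before and after executing, which mirrors the implicit transfers Aparapi performs around each kernel execution, whereas a fused on-device chain would need only one upload and one download:

```java
// Hypothetical plain-Java model of the data flow a two-kernel chain has
// under Aparapi: the intermediate result makes a full round trip through
// RAM instead of staying in GPU memory.
public class KernelChainSketch {
    static int transfers = 0;  // counts simulated host<->device copies

    // Simulated bus transfer; on real hardware this is a costly PCIe copy.
    static double[] copyAcrossBus(double[] a) {
        transfers++;
        return a.clone();
    }

    // "Kernel" 1: elementwise ReLU.
    static double[] reluKernel(double[] hostIn) {
        double[] devIn = copyAcrossBus(hostIn);   // host -> device
        double[] devOut = new double[devIn.length];
        for (int i = 0; i < devIn.length; i++)
            devOut[i] = Math.max(0, devIn[i]);
        return copyAcrossBus(devOut);             // device -> host
    }

    // "Kernel" 2: elementwise scaling.
    static double[] scaleKernel(double[] hostIn, double s) {
        double[] devIn = copyAcrossBus(hostIn);   // host -> device, again
        double[] devOut = new double[devIn.length];
        for (int i = 0; i < devIn.length; i++)
            devOut[i] = devIn[i] * s;
        return copyAcrossBus(devOut);             // device -> host
    }

    public static void main(String[] args) {
        double[] x = {-1.0, 2.0, -3.0, 4.0};
        double[] y = scaleKernel(reluKernel(x), 10.0);
        // 4 bus crossings for 2 ops; a chain kept on-device would need 2
        System.out.println(transfers + " transfers, y[1] = " + y[1]);
        // prints 4 transfers, y[1] = 20.0
    }
}
```

With dozens of layers per forward pass, these redundant crossings dominate the runtime, which is why the library cannot match CUDA-based implementations.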
In conclusion, I would say that when I started working on the library I was not aware of any of these limitations; my goal was to introduce myself to the deep learning field and to produce something meaningful at the same time. Additionally, I tried to create something that could run on any hardware, hence Java and OpenCL. At that time cuDNN didn't exist, and the only other deep learning library I was aware of was cuda-convnet, so I didn't have much of a choice anyway. I hope that someday in the future I will be able to port the library to use cuDNN.
from neuralnetworks.
Related Issues (20)
- Is it possible to implement sparse autoencoder with your lib? HOT 1
- Simple forward example? HOT 1
- Test case MnistTest fail on every tests HOT 2
- Index out of range: 0 -> 89+0 to 0
- Differences between Java 1.7 and Java 1.8 neural net libraries
- MnistTest.testLeNetSmall fails with 90% error rate HOT 1
- Could you please let me know if only changing the following line is enough to run the code in GPU. Environment.getInstance().setExecutionMode(EXECUTION_MODE.SEQ); to Environment.getInstance().setExecutionMode(EXECUTION_MODE.GPU); HOT 4
- XOR test fails
- Train method fails in MultipleNeuronsOutputError.getTotalErrorSamples
- Saving/Loading networks
- DBN with softmaxlayer on top
- How to run the project in Eclipse ?
- Can the examples run without opencl.so ?
- How can I train my net with deep learning ?
- OS Differences
- Strange behavior when calculating Layers (probably Aparapi related) HOT 2
- Execution mode GPU failed: OpenCL execution seems to have failed (runKernelJNI returned -51) com.aparapi.internal.exception.AparapiException: OpenCL execution seems to have failed (runKernelJNI returned -51)
- OpenCl problem
- Is this repo dead ? HOT 3