Coder Social home page Coder Social logo

GPU compatibility about molly.jl HOT 5 CLOSED

juliamolsim avatar juliamolsim commented on May 27, 2024
GPU compatibility

from molly.jl.

Comments (5)

jgreener64 avatar jgreener64 commented on May 27, 2024

Yeah it's a good time to discuss it, and in the context of the package more broadly. We want an API where users can plug and play their own components, yet is:

  • fast on CPU
  • works CPU parallel
  • fast on GPU
  • differentiable.

This is clearly a big ask, and under the hood we will likely have to have multiple implementations. I think the DiffEq ecosystem has two versions, in-place and out-of-place, of some algorithms.

Personally, I am willing to sacrifice some speed to achieve the above aims. I doubt we will ever be able to match the likes of GROMACS for speed, but if we are in the same ballpark and can offer a powerful and flexible API then we have a useful contribution.

My current thinking is to have two implementations, which would be used like this:

Standard Differentiable
CPU A B
CPU parallel A -
GPU B B

A is the current implementation on master, which is already 2 implementations in the sense that it branches based on parallel, but we'll ignore that. B would be a broadcasted, out-of-place version that allocates more memory but works with Zygote and makes use of GPU broadcasting without custom kernels. Of course we could write GPU kernels for everything, but that would be another implementation and if we can get decent performance with broadcasting I'd prefer to do that.

I have some rough work on the gpu branch that runs but is slower than the main CPU implementation, currently by a factor of 5x on CPU and 2.5x on GPU. Profiling suggests the time is in broadcasting but I'll have to dig further and probably ask some people who know more about Julia on GPUs. It is nearly differentiable, I think there is one adjoint missing. Consequently, this could be the foundation for implementation B.

There are some changes to the force functions to make broadcasting work. We would probably have to make the corresponding changes to implementation A to make the APIs match, but I think it's possible without sacrificing performance.

KernelAbstractions.jl is a possibility though I don't know too much about it, or how it works with autodiff.

Let me know what you think of the above thoughts.

from molly.jl.

SebastianM-C avatar SebastianM-C commented on May 27, 2024

I am in favor of the in-place and out of place APIs. I think this would go well with a future DiffEq integration (i.e. replace the simulators with ODEProblems).

I'm still not sure why the out of place version would allocate more in the CPU if we use StaticArrays. I will take a closer look to that branch.

Regarding differentiability, I think it would be simpler to do it after the DiffEq integration (I'll use a separate issue for that discussion).

from molly.jl.

SebastianM-C avatar SebastianM-C commented on May 27, 2024

Regarding performance and broadcasting, one of the optimizations in the energy function PR was about specializing the dr computation for SVectors by manually unrolling the broadcast. I'm not sure how this would work in the GPU context, but that computation is one of the most performance sensitive parts of the code (as almost anything needs that distance).

Also, regarding benchmarks, I would be very curious to see how we compare to GROMACS. Do you have a test problem for that?

from molly.jl.

jgreener64 avatar jgreener64 commented on May 27, 2024

Sounds sensible. I have made progress on fixing the broadcasting issues, due to GPU array views in a new broadcast I introduced to loop over ij pairs. The GPU implementation now looks faster than the CPU implementation by >10x. It's still rough and ready, and calculates each force twice, but I'll have some time to work on it in the next week.

Also, regarding benchmarks, I would be very curious to see how we compare to GROMACS. Do you have a test problem for that?

I used to have GROMACS timings for the peptide test case, which is nice in that it combines bonded and non-bonded forces, but I think they are at work and I am now working from home. It shouldn't be too hard to run a short simulation from the top file in data/5XER to get a rough time per timestep, I'll try and do that at some point. We are a way off in terms of implemented algorithms though: we would need to implement cell-based neighbour lists, Ewalds summation and SHAKE-like hydrogen algorithms for that case to be a fair comparison.

from molly.jl.

jgreener64 avatar jgreener64 commented on May 27, 2024

The GPU path now uses CUDA.jl kernels and works with AD.

from molly.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.