Comments (5)
Yeah, it's a good time to discuss it, in the context of the package more broadly. We want an API where users can plug and play their own components, yet one that is:
- fast on CPU
- able to run in parallel on CPU
- fast on GPU
- differentiable.
This is clearly a big ask, and under the hood we will likely have to maintain multiple implementations. I think the DiffEq ecosystem has in-place and out-of-place versions of some algorithms.
Personally, I am willing to sacrifice some speed to achieve the above aims. I doubt we will ever be able to match the likes of GROMACS for speed, but if we are in the same ballpark and can offer a powerful and flexible API then we have a useful contribution.
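To make the in-place/out-of-place distinction concrete, here is a minimal sketch with a toy pair force; the names (`pair_force`, `forces!`, `forces`) are illustrative, not Molly's API:

```julia
using StaticArrays

# Toy softened repulsion between point particles; stands in for a real
# interatomic potential. Illustrative only, not Molly's force functions.
pair_force(ri, rj) = (dr = ri - rj; dr / (sum(abs2, dr)^2 + 1e-6))

# In-place style: mutate a preallocated buffer. Allocation-free and fast in
# serial CPU code, but mutation is incompatible with Zygote's reverse mode.
function forces!(fs, coords)
    fill!(fs, zero(eltype(fs)))
    for i in eachindex(coords), j in (i + 1):length(coords)
        f = pair_force(coords[i], coords[j])
        fs[i] += f  # Newton's third law: apply each pair force to both atoms
        fs[j] -= f
    end
    return fs
end

# Out-of-place style: build a fresh array each call. Allocates more and
# computes each pair force twice, but is pure, so it composes with Zygote
# and with GPU array abstractions.
forces(coords) = [sum(pair_force(coords[i], coords[j])
                      for j in eachindex(coords) if j != i)
                  for i in eachindex(coords)]

coords = [rand(SVector{3, Float64}) for _ in 1:10]
fs = forces!(similar(coords), coords)
@assert fs ≈ forces(coords)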
My current thinking is to have two implementations, which would be used like this:
| | Standard | Differentiable |
|---|---|---|
| CPU | A | B |
| CPU parallel | A | - |
| GPU | B | B |
A is the current implementation on master, which is already two implementations in the sense that it branches based on `parallel`, but we'll ignore that here. B would be a broadcasted, out-of-place version that allocates more memory but works with Zygote and makes use of GPU broadcasting without custom kernels. Of course we could write GPU kernels for everything, but that would be yet another implementation; if we can get decent performance with broadcasting I'd prefer to do that.
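For flavour, a toy sketch of what B might look like (again not Molly's code): all-pairs forces via a single broadcast plus a reduction, with no loops or mutation, so the same code should run on `Array` or `CuArray` and be visible to Zygote, missing adjoints permitting:

```julia
using StaticArrays
# using CUDA  # with CUDA.jl loaded, `cu(coords)` runs the same code on GPU

# Toy pair force; it returns a zero vector when ri == rj thanks to the
# softening term, so the diagonal of the all-pairs matrix needs no special case.
pair_force(ri, rj) = (dr = ri - rj; dr / (sum(abs2, dr)^2 + 1e-6))

function forces_broadcast(coords)
    # n×n matrix of pairwise forces via broadcasting; each (i, j) force is
    # computed twice, trading FLOPs for purity and GPU-friendliness.
    fmat = pair_force.(coords, permutedims(coords))
    return vec(sum(fmat; dims = 2))
end

coords = [rand(SVector{3, Float64}) for _ in 1:10]
forces_broadcast(coords)
```

Whether this formulation is actually fast, on CPU or GPU, is exactly the open performance question here.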
I have some rough work on the `gpu` branch that runs but is slower than the main CPU implementation, currently by a factor of 5x when run on CPU and 2.5x when run on GPU. Profiling suggests the time is spent in broadcasting, but I'll have to dig further and probably ask some people who know more about Julia on GPUs. It is nearly differentiable; I think there is one adjoint missing. Consequently, this could be the foundation for implementation B.
There are some changes to the force functions to make broadcasting work. We would probably have to make the corresponding changes to implementation A so the APIs match, but I think that is possible without sacrificing performance.
KernelAbstractions.jl is a possibility, though I don't know much about it or how it works with autodiff.
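For reference, a minimal KernelAbstractions.jl sketch, assuming the newer `get_backend`/`synchronize` launch API (how well this composes with autodiff is exactly the open question):

```julia
using KernelAbstractions

# One kernel source that runs on CPU or GPU backends; `saxpy!` is just a
# placeholder computation, not anything from Molly.
@kernel function saxpy!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] += a * x[i]
end

x = ones(Float32, 1024)
y = zeros(Float32, 1024)
backend = get_backend(y)        # CPU() here; a GPU backend for device arrays
saxpy!(backend)(y, 2f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)
```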
Let me know what you think of the above thoughts.
---
I am in favor of the in-place and out-of-place APIs. I think this would go well with a future DiffEq integration (i.e. replacing the simulators with ODEProblems).
I'm still not sure why the out-of-place version would allocate more on the CPU if we use StaticArrays. I will take a closer look at that branch.
Regarding differentiability, I think it would be simpler to do it after the DiffEq integration (I'll use a separate issue for that discussion).
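A hypothetical sketch of that direction, using a toy force and unit masses: a simulation becomes a `SecondOrderODEProblem` handed to a symplectic integrator:

```julia
using OrdinaryDiffEq, StaticArrays

# Toy pair force standing in for Molly's force functions.
pair_force(ri, rj) = (dr = ri - rj; dr / (sum(abs2, dr)^2 + 1e-6))

# Out-of-place acceleration: f(v, u, p, t) returns d²u/dt² (unit masses).
accel(v, u, p, t) = [sum(pair_force(u[i], u[j]) for j in eachindex(u) if j != i)
                     for i in eachindex(u)]

u0 = [rand(SVector{3, Float64}) for _ in 1:8]   # positions
v0 = [zero(SVector{3, Float64}) for _ in 1:8]   # velocities
prob = SecondOrderODEProblem(accel, v0, u0, (0.0, 1.0))
sol = solve(prob, VelocityVerlet(); dt = 1e-3)
```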
---
Regarding performance and broadcasting, one of the optimizations in the energy function PR was specializing the `dr` computation for `SVector`s by manually unrolling the broadcast. I'm not sure how this would work in the GPU context, but that computation is one of the most performance-sensitive parts of the code (almost everything needs that distance).
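Roughly what that unrolling looks like (my reconstruction, not the PR's exact code), for the minimum-image displacement in a cubic periodic box:

```julia
using StaticArrays

# Minimum-image displacement along one dimension of a periodic box.
vector_1d(c1, c2, side) = (d = c2 - c1; d - side * round(d / side))

# Broadcast version: generic over dimensions.
vector_bc(c1, c2, side) = vector_1d.(c1, c2, side)

# Manually unrolled 3D specialisation: three scalar calls the compiler can
# trivially inline, with no broadcast machinery in the hot path.
vector_3d(c1::SVector{3}, c2::SVector{3}, side) =
    SVector(vector_1d(c1[1], c2[1], side),
            vector_1d(c1[2], c2[2], side),
            vector_1d(c1[3], c2[3], side))
```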
Also, regarding benchmarks, I would be very curious to see how we compare to GROMACS. Do you have a test problem for that?
---
Sounds sensible. I have made progress on fixing the broadcasting issues, which were due to GPU array views in a new broadcast I introduced to loop over ij pairs. The GPU implementation now looks more than 10x faster than the CPU implementation. It's still rough and ready, and calculates each force twice, but I'll have some time to work on it in the next week.
> Also, regarding benchmarks, I would be very curious to see how we compare to GROMACS. Do you have a test problem for that?
I used to have GROMACS timings for the peptide test case, which is nice in that it combines bonded and non-bonded forces, but I think they are at work and I am now working from home. It shouldn't be too hard to run a short simulation from the `top` file in `data/5XER` to get a rough time per timestep; I'll try to do that at some point. We are a way off in terms of implemented algorithms though: we would need to implement cell-based neighbour lists (sketched below), Ewald summation and SHAKE-like hydrogen constraint algorithms for that case to be a fair comparison.
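For context, the cell-list idea in toy form (not a Molly API; assumes a cubic box at least three cutoffs wide): bin atoms into cells no smaller than the cutoff, so only same-cell or adjacent-cell pairs can be neighbours, taking the search from O(n²) towards O(n).

```julia
using StaticArrays

# Toy cell-list neighbour search in a cubic periodic box of side `side`.
# Returns candidate pairs; actual distances still need checking downstream.
function cell_pairs(coords, side, cutoff)
    ncell = floor(Int, side / cutoff)          # assumes ncell >= 3
    cells = Dict{NTuple{3, Int}, Vector{Int}}()
    for (i, c) in enumerate(coords)            # bin atoms into cells
        key = ntuple(d -> mod(floor(Int, c[d] / side * ncell), ncell), 3)
        push!(get!(cells, key, Int[]), i)
    end
    pairs = Tuple{Int, Int}[]
    for (key, is) in cells, off in Iterators.product(-1:1, -1:1, -1:1)
        neigh = get(cells, mod.(key .+ off, ncell), nothing)
        neigh === nothing && continue
        for i in is, j in neigh
            i < j && push!(pairs, (i, j))      # each pair recorded once
        end
    end
    return pairs
end

coords = [rand(SVector{3, Float64}) for _ in 1:100]
cell_pairs(coords, 1.0, 0.3)
```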
---
The GPU path now uses CUDA.jl kernels and works with AD.
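For reference, the general shape of a CUDA.jl pairwise-force kernel (an illustrative toy with a softened repulsion, not Molly's kernel; the AD side is not shown):

```julia
using CUDA, StaticArrays

# One thread per atom; each thread accumulates the force on its atom from
# all others.
function forces_kernel!(fs, coords)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(coords)
        f = zero(eltype(fs))
        for j in eachindex(coords)
            if j != i
                dr = coords[i] - coords[j]
                f += dr / (sum(abs2, dr)^2 + 1f-6)
            end
        end
        @inbounds fs[i] = f
    end
    return nothing
end

coords = cu([rand(SVector{3, Float32}) for _ in 1:1024])
fs = similar(coords)
@cuda threads = 256 blocks = cld(length(coords), 256) forces_kernel!(fs, coords)
```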