Comments (10)
Hi @anjohan!
Wow, 250 k atoms :) Are you actually working with systems of this magnitude?
The linear scaling in classical MD is largely possible if there is no need to calculate neighbour lists. This is typically the case with fixed bonds. As soon as you have to start figuring out which atoms are within the radial cutoff, there is a need to calculate the pairwise distances between atoms.
SOAP is quite interesting in this case. In theory, there is no need to ever calculate pairwise distances. One could just form a gaussian atomic density based on all atoms. This would make the algorithm scale linearly. In practise however, this linear calculation will become quite expensive in terms of time and memory if the atomic density consists of thousands of atoms.
So instead, in our current implementation, only atoms within the cutoff are included in the atomic density when calculating the expansion on individual sites. This requires that we calculate the pairwise distances once. With the current implementation this results in the O(n^2) scaling you here demonstrate. The current distance calculation is quite naive and I can improve it to scale as O(n log n). This faster distance calculation is actually already implemented for many other descriptors in the package. It was not implemented for SOAP as with typical system sizes this distance calculation is negligible, and only starts to dominate the SOAP calculation at very large system sizes.
So in summary, I will implement the faster pairwise distance calculation, which will make it scale O(n log n). I will also see if the fully linear calculation is feasible, but I fear it has such a big constant prefactor that it will not be usable in practice.
from dscribe.
With 250 000 atoms, each frame takes 1.5 hours.
With what SOAP settings do you get 1.5 h/configuration?
from dscribe.
Thanks for the quick response!
Wow, 250 k atoms :) Are you actually working with systems of this magnitude?
Yes (unfortunately). I'm using the SOAP vectors to visualise phase transitions for fairly large systems. This may not have been your intended purpose, but it's working superbly!
The linear scaling in classical MD is largely possible if there is no need to calculate neighbour lists. This is typically the case with fixed bonds. As soon as you have to start figuring out which atoms are within the radial cutoff, there is a need to calculate the pairwise distances between atoms.
SOAP is quite interesting in this case. In theory, there is no need to ever calculate pairwise distances. One could just form a gaussian atomic density based on all atoms. This would make the algorithm scale linearly. In practise however, this linear calculation will become quite expensive in terms of time and memory if the atomic density consists of thousands of atoms.
So instead, in our current implementation, only atoms within the cutoff are included in the atomic density when calculating the expansion on individual sites. This requires that we calculate the pairwise distances once. With the current implementation this results in the O(n^2) scaling you here demonstrate. The current distance calculation is quite naive and I can improve it to scale as O(n log n). This faster distance calculation is actually already implemented for many other descriptors in the package. It was not implemented for SOAP as with typical system sizes this distance calculation is negligible, and only starts to dominate the SOAP calculation at very large system sizes.
So in summary, I will implement the faster pairwise distance calculation, which will make it scale O(n log n). I will also see if the fully linear calculation is feasible, but I fear it has such a big constant prefactor that it will not be usable in practice.
Hm. Thanks to dscribe, I haven't had to dive too deeply into the descriptor calculations, but I was hoping that it should be linearly scaling due to the cutoff. From what I understand, SOAP vectors rely on summing Gaussians centred at each neighbour and then projecting this onto some basis. This is conceptually similar to computing short-range forces in MD, so I was hoping the spatial binning trick would work, i.e.
- Divide space into regions of size rcut x rcut x rcut.
- For each atom, its neighbours are found either inside the same region or inside the nearest 26 regions.
The time thus ends up being proportional to (number of atoms) x (number of neighbours), where the number of neighbours is approximately rho x (27rcut^3), rho being the density. While the number of neighbours is a pretty large prefactor, it's still overall linear, and with N=250k, the number of neighbours per atom is very small compared to the total number of atoms.
Would this not work for SOAP vectors?
I did look around inside the libsoap directory, but the code is (understandably) quite involved.
From what I gather, the getFilteredPos
function is called inside to soap
function to collect the neighbours of a given atom (or position) by looping over all the atoms and determining the ones which are inside the cutoff sphere.
If you create a spatial binning of the atoms, the getFilteredPos
function would only need to loop over a small subset of the atoms to find the neighbours.
Sorry if my explanation is rambly, here's some pseudo-code:
def soap(...):
atomlist = way of storing atoms
L = system size (assuming cubic)
rcut = cutoff
Nbins = ceil(L/rcut)
binsize = L/Nbins
all_atoms = atomlist with all the atoms
soap_positions = where SOAPs should be calculated
atomlists = atomlist[Nbins][Nbins][Nbins] # array of empty atom lists
# do spatial binning, fill atomlists
for atom in all_atoms:
i = atom.x/binsize
j = atom.y/binsize
k = atom.z/binsize
atomlists[i][j][k].append(atom)
for position in soap_positions:
neighbours = getFilteredPos(position, atomlists)
calculate SOAPs based on neighbours
def getFilteredPos(position, atomlists):
# Determine which bin
this_i = position.x/binsize
this_j = position.y/binsize
this_k = position.z/binsize
neighbours = []
# Loop over neighbouring bins
for i in this_i+[-1,0,1]:
for j in this_j+[-1,0,1]:
for k in this_j+[-1,0,1]:
for atom in atomlists[i][j][k]:
if norm(atom.position - position) < rcut:
neighbours.append(atom)
return neighbours
With what SOAP settings do you get 1.5 h/configuration?
rcut = 6.0, nmax=lmax=3. I'm essentially running the system in my example above with 50x50x50 unit cells. With 18x18x18 unit cells (11664 atoms), each frame took 10-20 seconds. These runs are on fairly standard intel nodes on the cluster. Since I have to calculate SOAP vectors for 100-1000 of these frames, I've parallelised it with mpi4py.
from dscribe.
Thanks for the detailed answer! You're application sounds very interesting and I would be very interested to see what comes out of such visualization.
I think you are right and the binning might be the best solution since there is only one global cutoff. Should have thought about that earlier, thanks for pointing it out!
I will try my hand at implementing and testing it. Unfortunately I cannot promise any schedule, as I don't right now have too much time to spend on this.
from dscribe.
No worries, this isn't urgent. The SOAP vector calculation still takes much less time than the MD simulation, since I'm only calculating descriptors for a small subset of the frames and using mpi4py.
I should also be able to add a naive (but linearly scaling) binning myself, now that I understand more of libsoap. I may have some time later this week.
from dscribe.
I've started working on an implementation.
from dscribe.
That's great! External contributors are very welcome.
Will you be doing this on the python side or in C/C++? For now this doesn't really matter, but eventually, I would want this functionality in the C/C++ side and available to all the other descriptors as well.
from dscribe.
In C. As a result, I'm currently enjoying some segfaults.
from dscribe.
Have fun :>
from dscribe.
Thanks @anjohan for contributing the spatial binning algorithm! It is now fully tested and integrated into SOAP in version 0.3.2.
from dscribe.
Related Issues (20)
- Is it possible to parallelize `lmbtr.create` when working on one `ase.Atoms` object? HOT 3
- Error with np.str (NumPy >= 1.24) HOT 1
- Descriptor that recognizes each atom of the same species differently HOT 1
- The example in README.md is not correct HOT 1
- [Bug] Error in SOAP derivatives when using weighting. HOT 2
- API compatibility is broken since 0696656 HOT 1
- GNU compiler warnings HOT 1
- Issue with the Coulomb matrix descriptor HOT 4
- dscribe setup broken with py 3.10.1? HOT 2
- Please make dscribe available at conda-forge/osx-arm64 channel HOT 3
- SOAP computation hangs when `calc` is not `None` in ASE trajectory HOT 2
- Can dscribe encode the atom type not in the list as "unkown type" in acsf calculation? HOT 2
- Descsize of ASCF HOT 1
- conda channel has no function for features' derivatives
- `CoulombMatrix(permutation="sorted_l2")` is not symmetric HOT 5
- Naming incosistency of rcut in SOAP and MBTR HOT 2
- Potential memory leak in MBTR HOT 2
- Analytical derivatives of SOAP HOT 4
- Identical geometry but similarity < 1 HOT 4
- Numerical SOAP derivatives for periodic systems HOT 6
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dscribe.