juliamolsim / dftk.jl
Density-functional toolkit
Home Page: https://docs.dftk.org
License: MIT License
Opening this issue to track ideas on that front. I think we are already pretty good at multithreading. Do we want DFTK to work on GPUs and clusters?
GPU is probably the easiest to get working with minimal changes in the code. We can probably get the entire iterative eigensolver to work on the GPU quite easily. I'm not 100% sure of what the speedup would be though; we should maybe try to benchmark FFTs. I don't think FFTs are something GPUs are good at. Even Nvidia, who usually enjoy throwing around unrealistic speedups, give a relatively modest "up to 10x" on their website, so I'd be surprised if we gained even 5x; nice but not worth investing huge effort into.
Distributed can be done with MPI, and would enable us to do pretty big systems. But that has to be done with explicit MPI calls and it increases the complexity of the code, so I'm a bit reluctant. Possibly things like https://github.com/JuliaParallel/DistributedArrays.jl and https://github.com/JuliaParallel/Elemental.jl might make it easier, but I don't know how ready these are.
The ETSF nanoquanta parsers are now available in the utils and they could be used to simplify some test code.
Implement what is mentioned in https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/bzmesh.jl#L133.
The PSP8 pseudopotential format seems to be worth supporting for experimentation.
And simplify things like line 72 in 553db8b: we pass around composition a lot and end up documenting it in several places. Should we just give it a struct?
FFTW supports planning multiple FFTs at once, and the manual (http://www.fftw.org/fftw3.pdf, p. 32) says:
"Plans obtained in this way can often be faster than calling FFTW multiple times for the individual transforms."
We should benchmark to see if there's really an improvement on timing (this doesn't seem to be a BLAS3 situation where there should be a large impact, but performance is weird, so I don't know...)
This would also nicely let us use FFTW's multithreading optimally, and automatically.
One tricky complication is that with locking, the number of vectors we apply the Hamiltonian on changes from iteration to iteration. Maybe have one plan for the full nband (which is always needed at the first iteration anyway), and then fall back to the 1-by-1 version if fewer are required. We should look at how other codes do it.
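For the benchmarking question, batched planning in FFTW.jl can be sketched like this (a sketch assuming FFTW.jl; treating the last axis of a 4D array as the band index is just an illustration, not DFTK's actual layout):

```julia
# Sketch of a batched FFT plan, assuming FFTW.jl; the band axis as trailing
# dimension is an assumption made for illustration only.
using FFTW

n, nband = 8, 4
A = randn(ComplexF64, n, n, n, nband)

# One plan covering all nband transforms at once (transform dims 1:3 only):
p = plan_fft(A, 1:3)
B = p * A

# Equivalent to transforming each band separately:
B_ref = similar(A)
for ib in 1:nband
    B_ref[:, :, :, ib] = fft(A[:, :, :, ib])
end
```

Timing `p * A` against the per-band loop (and against nband one-band plans) would answer whether the batched plan actually buys us anything.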
Now that JuliaFormatter is around and reasonably stable, we could easily set something up and integrate it with our workflows using a git pre-commit hook.
It's a bit annoying to have to wait for CI to be green before merging. I've seen people use https://github.com/bors-ng/bors-ng, where you just leave a comment that it looks good to merge, and it'll merge automatically once CI is OK (if I understand correctly). That could be nice, and it looks easy enough to set up. @mfherbst I'll do it if you agree.
Eg in update_hamiltonian!. See discussion in #45 (comment).
Psi is the worst offender, because there are three things there: a psi (array of size Nb), a block of psis (array Nb x N), and a block of blocks of psis (array Nk of arrays Nb x N). We sometimes use Psi, psi, or various unicode names.
For densities there is the RFA and the real and Fourier parts; we sometimes use unicode, sometimes not.
pw, sometimes basis.
kpoint, sometimes kpt, singular or plural.
occ, sometimes occupation, singular or plural.
Probably others...
As discussed in #104 it would be nice to have a separate energies datastructure with a nicer show method. Should essentially behave like a dict.
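A minimal sketch of what such a datastructure could look like (all names here are made up for illustration, not a proposed API):

```julia
# Hypothetical sketch of a dict-like Energies container with a nicer show
# method; Energies, contributions, and total are made-up names.
struct Energies
    contributions::Dict{Symbol,Float64}
end

Base.getindex(E::Energies, k::Symbol) = E.contributions[k]
Base.keys(E::Energies) = keys(E.contributions)
total(E::Energies) = sum(values(E.contributions))

function Base.show(io::IO, E::Energies)
    for (k, v) in E.contributions
        println(io, rpad(string(k), 12), v)
    end
    println(io, rpad("total", 12), total(E))
end
```

`E = Energies(Dict(:Kinetic => 3.1, :Hartree => 0.4))` then indexes like a Dict, while `show` prints one labelled line per contribution plus the total.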
The Logging stdlib is very nicely done. The main design point is separating production of log messages from processing: we can define several log levels, and then decide what we want to do with them - ignore some, print some, write some to an output file, etc. In particular we can just pepper the code liberally with @info and @debug (or similar) and worry later about whether we want to display them or not. An added benefit is that we can disable some, and the corresponding macro arguments will not get evaluated, which might be a way to do expensive checks.
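For illustration, the pattern looks like this (a sketch using only the Logging stdlib; scf_step is a made-up example function):

```julia
# Sketch of the production/processing split in the Logging stdlib;
# scf_step is a made-up function standing in for real DFTK code.
using Logging

function scf_step(n)
    @info "SCF iteration" n
    # Arguments of filtered-out messages are not evaluated, so expensive
    # diagnostics can hide behind @debug at no cost:
    @debug "diagnostics" residual = 10.0^(-n)
    return n + 1
end

# @debug is ignored by the default logger; opt in explicitly when needed:
with_logger(ConsoleLogger(stderr, Logging.Debug)) do
    scf_step(1)
end
```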
Any chance I can convince you to move over to JuliaMolSim and register the package in the MolSim registry? Once we start using it, it would be very helpful to have some minimal versioning information. Even without any expectation of backward compatibility - I generally refuse to guarantee that.
It would be nice to explicitly test the examples and expensive tests from time to time, perhaps using a different CI provider.
Based on the current callback functionality, one could implement a nicer print function, more specific to the problem we solve than the default NLSolve output.
Test with silicon.jl and
kgrid = [1, 4, 4] # k-Point grid
supercell = [2, 1, 1] # Lattice supercell
So, if we omit any kinetic or nonlocal terms, we get Thomas-Fermi-like models where the energy is a functional of the density only. That's quite fun for educational purposes, and maybe also useful for people working on orbital-free DFT. The diagonalization approach could maybe work but is stupid here. We should have it as an example just for the fun of it.
The following are non-FFT hotspots when running silicon with a large-ish ecut:
Also we do a lot more iterations than necessary of the eigensolver; setting maxiter to 4 gave a good speedup.
This is representative of problems with a small unit cell. Should benchmark large unit cells now.
One option to better embed examples into the documentation without code duplication could be Weave.jl. For their own documentation they use Documenter, so it could at least be compatible.
We call objects of type PlaneWaveModel basis, so let's just rename to PlaneWaveBasis? PWBasis for short?
kpoints is sometimes used for coordinates, sometimes for KPoint, which confused me. Rename the first use to kcoords?
They are bearable but still annoying when developing. Empirically (meaning when I ctrl-C precompilation) it seems quite a bit of time is spent in FourierTransforms.jl; can we maybe load it conditionally, or even better, upstream our changes?
In case of computations at finite temperature, the energy is not variational wrt the orbitals, which makes it harder to compute derivatives such as forces. We should instead compute the free energy, which has an entropic contribution.
So this has been grating on me for a while. There is an analogy between different basis sets (and common functionality) that we fail to exploit as much as we maybe could. Actually "basis set" is not really the proper name; "representation" would probably be better. So right now in the code we have
We want to add symmetry-adapted sets when k=0 (which generalizes the first two items to real functions). There could also be an xc grid, or a PAW grid, or even whole other basis sets. All these are representations of subsets of periodic functions. We have RFA that lazily converts between two of these. Should we have something like an AbstractFunction with subtypes, and (more or less lossy) convert methods? That'd also be a natural place for #43. Or are we just going to end up losing ourselves in abstract nonsense for minimal gain? An issue is that, for a number of reasons, I think we don't want Psi[ik] to be a Vector{TF} where {TF <: AbstractFunction}, but rather a plain Matrix of real/complex numbers.
Something that would be quite useful is to be able to convert quantities (densities, wavefunctions) between grids (different lattices, different ecuts, different fft_sizes), eg a more generalized version of interpolate_at_kpoint. Doesn't need to be accurate.
This may be far outside your interest, so feel free to just close this, but have you considered allowing other basis sets, eg atomic orbitals, ...
Currently the documentation is very outdated and should be adapted to the new code structure.
Move everything to toplevel, and then make folders for specific things, like
New developments:
https://julialang.org/blog/2019/07/multithreading
JuliaMath/FFTW.jl#105
JuliaLang/julia#32786
There's also the StridedArrays package that automatically parallelizes broadcasts.
Note that there's a significant overhead for now: https://discourse.julialang.org/t/multithreaded-broadcast/26786, which appears to be a known issue that will get better at some point in the future JuliaLang/julia#32701 (comment)
So it looks like the preferred model will be that julia's scheduler handles all the threading, and the underlying libraries use julia's threads. Essentially this means that we will be able to just set JULIA_NUM_THREADS and get threaded FFT/BLAS from there. If we find out that this is too fine-grained to yield good speedup, we can add explicit annotations (eg @threads on the loop over bands for the Hamiltonian application, or @strided on selected time-intensive broadcasts), and that should work fine.
Similar to the ABINIT interface, an interface to GPAW would be nice for doing reference calculations.
For pretty picture purposes. We need to be a bit careful about how to impose the gauge condition at the discrete level (to get a hermitian hamiltonian): if N is the discrete nabla operator and B the magnetic field, then N and B do not necessarily commute at the discrete level even if divB = 0 at the continuous level, and so BN is not necessarily hermitian. Possibly a good way to do it is to allow arbitrary fields, and impose the divB=0 condition at the discrete level by adding an extra term to the local potential, ie BN-NB should be a local potential and we just calculate that? Or maybe we just do x dy - y dx for now, since the commutation is easier to see there.
Things I like about the current Density
Things I don't like:
rho_real = real(rho) is pretty confusing (possibly fixed by the previous thing).
real is bad; eg imag(real(rho)) is very weird.
Suggestions for names: I'm leaning towards FourierArray.
To replace real I suggest we use the getproperty overload in julia to transparently intercept calls to arr.real (and compute it lazily if we need to).
We can parametrize by a type that tells us whether the array in real space is real (in which case the fourier representation might be a compressed one, at least in the future).
We could also think about moving basis_Crho (which we should maybe rename) closer to that array in the code.
We can easily support broadcast in a trivial way, which would simplify some things, eg
density_from_real(basis, real(ρin) .+ m.α .* (real(ρout) .- real(ρin)))
becomes way nicer. Con: I don't know how easy it is to support broadcast on custom types. See julia issue 33453.
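A toy version of the getproperty idea (FourierArray is the name suggested above; everything else is made up for illustration, and a real version would cache the result rather than recompute it):

```julia
# Toy sketch: intercept arr.real via Base.getproperty and compute the
# real-space representation lazily from the stored Fourier coefficients.
using FFTW

struct FourierArray{T}
    fourier::Array{Complex{T},3}   # Fourier-space coefficients
end

function Base.getproperty(arr::FourierArray, s::Symbol)
    if s === :real
        # computed on demand; no caching in this sketch
        return real(ifft(getfield(arr, :fourier)))
    end
    return getfield(arr, s)
end
```

For a real array x, `ρ = FourierArray(fft(complex.(x)))` then gives `ρ.real ≈ x` transparently, with the inverse FFT happening behind the property access.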
Self-explanatory :-)
As discussed in #59, our code is still full of expressions where a Float64 result is implicitly generated. This should be carefully investigated and made properly type-stable.
The usual way of doing preconditioning with the kinetic energy is essentially 1/(&lt;K&gt; + K), where K is the kinetic energy and &lt;K&gt; = &lt;psi, K psi&gt;, psi being the vector under consideration (right now we take &lt;K&gt; constant). This means that PR = P\R does not work, and one needs a more general mechanism PR = P(X,R). There are even more general kinds of preconditioning that do not even fit in P(X,R) but instead require P(X,G), where G is the unpreconditioned gradient, which we might want to play with together with @benjaminstamm. We should figure out a general mechanism for doing that.
So what's going on now is a bit weird. The assume_band_gap thing switches between the two functions we have in occupation.jl, but the first one is sort of redundant: it's a particular case of the second one when there's a gap. The rational thing to do here is to remove any mention of the band gap and use the second function all the time, even at 0K. However, I kinda like the first function because it's simpler and does not need a bisection. Maybe we disable the codepaths that lead to it, mark it visually as inactive, but leave it in the source and still test it?
Just to keep track of that. It'd be nice to have a struct that contains ham, psi, rho, occupations, fermi level, orbital energies, etc. and that is strongly immutable (in the sense that its bit contents don't change after creation). I think the major question is: does psi belong with the hamiltonian it creates or with the density that created it? The first one seems more natural, but is trickier to express mixing algorithms in (because a hamiltonian is created by a density, without reference to a psi). This is specific to DFT, as opposed to eg hybrid functionals where you need a psi to make the hamiltonian also.
Related to #2
In some gross-pitaevskii problems, the potential energy rather than the kinetic energy "dominates", and it is good to precondition with the potential energy. This is an interesting one because the preconditioning has to be done in real space, so good to keep that in mind for API design.
Just opening this issue to collect thoughts on how to exploit AD. A good first step to explore how feasible this is would be energy_ewald(), which is completely self-contained and should be very AD-friendly, both forward and backward.
The current design has served us well but:
I think we should come up with a design for the most general situation possible, where we want to do direct minimization wrt the psi_nk and their occupation for eg Hartree-Fock. Then we try to join with the current design.
Mathematically, a term is a function E(D) of the density matrix (in our case represented as Psi and their occupations for all kpoints). From that we want to compute E(D) and H = nabla E(D). H is block diagonal in the sense that it maps Bloch waves to Bloch waves, which leads to Hk. Hk is something that takes as input a psi (represented by a list of coordinates in the G basis), and outputs Hk psi.
Performance-wise, I think we can do pretty much what we want (closures left and right, allocations of whatever), provided we don't screw up on the application of Hk to vectors.
So, just to get the ball rolling, how about Model contains a list (ie tuple) of terms. Each term is something that can be called like term(basis, Psi, occupation), and returns an energy as well as a HamiltonianTerm object, which can store whatever it wants inside and can be called like ham_term(kpoint, Psi). HamiltonianBlock would then contain a list of these things and wrap mul! around that.
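A stripped-down sketch of that shape (all names and the placeholder numerics are made up; only the call structure - term returns (energy, ham_term), and a block accumulates ham_terms - matters):

```julia
# Sketch of the proposed term interface: a callable term returns
# (energy, ham_term), and a HamiltonianBlock-like sum applies all ham_terms.
abstract type Term end

struct ToyKinetic <: Term end   # placeholder for any actual energy term

function (t::ToyKinetic)(basis, Psi, occupation)
    energy = 0.5 * sum(abs2, Psi)            # placeholder energy expression
    ham_term = (kpoint, psi) -> 0.5 .* psi   # plays the role of ham_term(kpoint, Psi)
    return energy, ham_term
end

# HamiltonianBlock-style application: sum the action of all collected ham_terms.
apply_block(ham_terms, kpoint, psi) = sum(h(kpoint, psi) for h in ham_terms)
```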
Ignoring optimizations, I think this is a nice design. Now of course the big problem with this is that terms need psi in some space and output it in some other space (eg the magnetic term needs psi in reciprocal space and outputs it in real space; not sure about the Fock term, should check that). We handle this currently in mul! in HamiltonianBlock by knowing what the terms are and how they like to be called. It'd be great if we could invert the control somehow. Traits could be one answer (ie terms could implement IsRealSpace(Term)). Another answer could be RealFourierArray. But maybe the simplest is simply to have them work as ham_term(kpoint, Psi_real, Psi_fourier) and output (HPsi_real, HPsi_fourier) (which are then accumulated and then Fourier transformed)?
We miss the current optimization that we accumulate all local potentials into one, but that logic we can potentially implement by term fusion (eg if multiple terms declare -eg through traits- that they're local potentials, we can fuse them).
(I promise, this is my last "hey, let's redesign DFTK completely" :-p If we manage to do something like the above, we should be good for a while - at least, until we get to response properties...)
On #40 with supercell = [3,1,1] and display_progress=true in the eigensolver, the eigensolver fails to converge. Not sure if there are not enough extra bands or if we're doing something wrong. Need to investigate this to do David's thing. Also not sure what convergence criteria people typically take for the iterative eigensolver, but I'm pretty sure it's way less tight than what we do now.
@mfherbst I hope you don't mind, I'm kinda using the issue tracker as a TODO list...
Add an option to perform real-to-complex and complex-to-real FFTs with a kwarg, and use that to handle the potentials as real arrays. We could also dispatch on the eltype of f_real, but that's not compatible with G_to_r(f_fourier).
Ref #30
Something like
struct Eigensolver
    tol
    maxiter
    precon   # Promises to implement precon(ham) -> P
    solver   # Promises to implement solver(A, X0, P; tol=tol, maxiter=maxiter)
end
default_eigensolver(tol=1e-6, maxiter=100) =
    Eigensolver(tol, maxiter, default_precon, lobpcg_hyper)
(where default_precon stands for whatever default preconditioner constructor we settle on; the original sketch forgot to pass the precon field.)
It'd be nice to have something to just run a subset of tests. JuliaLang/Pkg.jl#1226 was recently merged, so we could have a @tset macro that expands to a check on ARGS and only runs the test if specified. Or just wait for someone to come up with a package that does it...
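Until such a package exists, the ARGS check could look something like this (runtest is a made-up helper, not the proposed macro; the macro version would wrap the same logic):

```julia
# Sketch of ARGS-based test selection: run a named group of tests only if it
# was requested, or if nothing specific was requested at all.
function runtest(f, name; args=ARGS)
    if isempty(args) || name in args
        f()
        return true    # ran
    end
    return false       # skipped
end

# Usage with a do-block (the testset body goes inside):
runtest("silicon"; args=String[]) do
    # ... @testset goes here ...
end
```

With Pkg.jl#1226, `Pkg.test("DFTK", test_args=["silicon"])` would populate ARGS so only the requested groups run.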