juliamolsim / dftk.jl Goto Github PK

Density-functional toolkit

License: MIT License

Julia 98.59% Shell 1.41%

electronic-structure density-functional-theory kohn-sham toolkit mathematical-analysis julia plane-wave ab-initio numerical-analysis computational-chemistry

dftk.jl's People

Contributors

Stargazers

Watchers

Forkers

arbrefleur gitter-badger icamps ssirajdine louisponet caioleitao obaica rashidrafeek nunofernandes-plight haidi-ustc berquist ssthurai wangvei silentforest89 umbriquse jsjie dsantra92 liublgit ddffttd niklasschmitz harrisonlabollita standardgalactic computational-chemistry-research shahramyalameha cocteautwins beruy gkemlin chemicalfiend gauravmauryaios jaemolihm cnliao bkaperick paulxshen playfloor thazhemadam stevengj leticia-maria sailfish009 azadoks haampie mtowara stjordanis laurentvidal95 epolack gdufenshuoo kvnoct jrdegreeff hamletwanttocode killah-t-cell qiaojunfeng gvigne crazy-rookie msdupuy unkcpz blondegeek chronum94 berger-stefan kunyuan wwjcmp carstenbauer szkered compphysguy zhubonan jerjorg raolixiang-up bashancode bs544 xuechuang1999 a-azzali andrea-bordignon technici4n clementinebarat cedrictravelletti xuequan818 lucifer1004 wuyihanvictor mkrack dynamic-queries abussy tyjerome jaydevsr bonans

dftk.jl's Issues

HPC

Opening this issue to track ideas on that front. I think we are already pretty good at multithreading. Do we want DFTK to work on GPUs and clusters?

GPU is probably the easiest to get working with minimal changes in the code. We can probably get the entire iterative eigensolver to work on the GPU quite easily. I'm not 100% sure of what the speedup would be though; we should maybe try to benchmark FFTs. I don't think FFTs are something GPUs are good at. Even nvidia, who usually enjoy throwing around irrealistic speedups, give a relatively modest "up to 10x" on their website, so I'd be surprised if we gained even 5x; nice but not worth investing huge effort into.
Distributed can be done with MPI, and would enable us to do pretty big systems. But that has to be done with explicit MPI calls and it increases the complexity of the code, so I'm a bit reluctant. Possibly things like https://github.com/JuliaParallel/DistributedArrays.jl and https://github.com/JuliaParallel/Elemental.jl might make it easier, but I don't know how ready these are.

Use ETSF nanoquanta file structure for silicon_runners.jl

The ETSF nanoquanta parsers are now available in the utils and they could be used to simplify some test code.

Optimise fallback kpoint symmetry search

Implement what is mentioned in https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/bzmesh.jl#L133.

Implement PSP8 Pseudopotential format

The PSP8 pseudopotential format seems to be worth supporting for experimentation.

Struct for atoms

And simplify things like

DFTK.jl/src/forces.jl

Line 72 in 553db8b

count = 1

Struct for `composition`?

We pass around composition a lot and end up documenting it in several places. Should we just give it a struct?

Multiple FFTs at once

FFTW supports multiple FFTs, and it says in the manual (http://www.fftw.org/fftw3.pdf p. 32)

Plans obtained in this way can often be faster than calling FFTW multiple times for the individual transforms

We should benchmark to see if there's really an improvement on timing (this doesn't seem to be a BLAS3 situation where there should be a large impact, but performance is weird, so I don't know...)

This would also nicely let us use FFTW multithreading optimally automatically.

One tricky complication is that with locking, the number of vectors we apply the Hamiltonian on changes from iteration to iteration. Maybe have one plan for the full nband (which is always needed at the first iteration anyway), and then fall back to the 1-by-1 version if less are required. We should look at how other codes do it.

Automatic source-code formatting

Now that JuliaFormatter is around and reasonably stable, we could easily set something up and integrate it with our workflows using a git pre-commit hook.

Bors

It's a bit annoying to have to wait for CI to be green before merging. I've seen people use https://github.com/bors-ng/bors-ng, where you just leave a comment that it looks good to merge, and it'll merge it automatically when CI is OK (if I understand correctly). That could be nice, and it looks easy enough to setup. @mfherbst I'll do it if you agree.

Use less mutation

Eg in update_hamiltonian!. See discussion in #45 (comment)

Consistent names

Psi is the worst offender, because there is three things there: a psi (array of size Nb), a block of psis (array Nb x N), and a block of block of psis (array Nk of arrays Nb x N). We sometimes use Psi, psi, or various unicode names

For densities there is the RFA and the real and fourier parts. We sometimes use unicode, sometimes not

PWBasis is sometimes called pw, sometimes basis
Kpoints are sometimes kpoint, sometimes kpt, singular or plural
Occupations are sometimes occ, sometimes occupation, singular or plural

Probably others...

Energies datastructure

As discussed in #104 it would be nice to have a separate energies datastructure with a nicer show method. Should essentially behave like a dict.

Use Logging

The Logging stdlib is very nicely done. The main design point is separating production of log messages from processing : we can define several log levels, and then decide what we want to do with them - ignore some, print some, write some to an output file, etc. In particular we can just pepper the code liberally with @info and @debug (or similar) and worry later about if we want to display it or not. Added benefit is we can disable some and the corresponding macro will not get evaluated, which might be a way to do expensive checks.

Move to JuliaMolSim

Any chance I can convince you to move over to JuliaMolSim and register the package in the MolSim registry? Once we start using it, it would be very helpful to have some minimal versioning information. Even without any expectation of backward compatibility - I generally refuse to guarantee that.

Test examples and expensive tests

It would nice to explicitly test the examples and expensive tests from time to time. Perhaps using a different CI provider.

Pretty traces

Eg there's https://discourse.julialang.org/t/ann-solvertraces-jl-v0-1-0/34400

Improve convergence printing during SCF iterations.

Based on the current callback functionality, one could implement a nicer print function, which is more specific to the problem we solve that the default NLSolve output.

No corresponding symop found

Test with silicon.jl and

kgrid = [1, 4, 4]        # k-Point grid
supercell = [2, 1, 1]    # Lattice supercell

Thomas-Fermi

So, if we omit any kinetic or nonlocal terms, we get thomas-fermi like models where the density is a functional of the density only. That's quite fun for educational purposes, and also maybe useful for people working on orbital-free DFT. The diagonalization approach could maybe work but is stupid here. We should have it as an example just for the fun of it.

Performance audit

The following are non-FFT hotspots when running silicon with a large-ish ecut:

compute_density
the logic around mul!(Y,block,X) which should allocate less and fuse more. one alloc is easily dealt with, but we might want to represent the kinetic as simply an array to get it to also fuse some of the linear algebra in lobpcg

Also we do a lot more iterations than necessary of the eigensolver; setting maxiter to 4 gave a good speedup.

This is representative of problems with a small unit cell. Should benchmark large unit cells now.

Explained examples and tutorial-style documentation

One option to better embed examples into the documentation without code duplication could be Weave.jl. For their own documentation they use Documenter, so it could at least be compatible.

PlaneWaveModel and kpoints names

We call objects of type PlaneWaveModel basis, so let's just rename to PlaneWaveBasis? PWBasis for short?
kpoints is sometimes used for coordinates, sometimes for KPoint, which confused me. Rename the first use to kcoords?

Precompilation times

They are bearable but still annoying when developing. Empirically (meaning when I ctrl-C precompilation) it seems quite a bit of time is spent in FourierTransforms.jl; can we maybe load it conditionally, or even better upstream our changes?

Free energy for computations with temperature

In case of computations at finite temperature, the energy is not variational wrt the orbitals, which makes it harder to compute derivatives such as forces. We should instead compute the free energy, which has an entropic contribution.

Bases and representations

So this has been grating me for a while. There is an analogy between different basis sets (and common functionality) that we fail to exploit as much as we maybe could. Actually basis set is not really the proper name, "representation" would probably be better. So right now in the code we have

The set of G such that 1/2 |k+G|^2 <= Ecut, complex functions only
The rectangular G grid, complex functions only
The rectangular r grid, in both real (densities and potentials) and complex (everything else) versions

We want to add symmetry-adapted sets when k=0, which generalizes the first two items to real functions). There could also be a xc grid, or a PAW grid, or even whole other basis sets. All these are representations of subsets of periodic functions. We have RFA that lazily converts between two of these. Should we have something like a AbstractFunction with subtypes, and (more or less lossy) convert methods? That'd also be a natural place for #43. Or are we just going to end up losing ourselves in abstract nonsense for minimal gain? An issue is that for a number of reasons I think we don't want to have Psi[ik] be a Vector{TF} where {TF <: AbstractionFunction} but rather a plain Matrix of real/complex numbers.

Implement conversions between grids

Something that would be quite useful is to be able to convert quantities (densities, wavefunctions) between grids (different lattices, different ecuts, different fft_sizes). Eg a more generalized version of interpolate_at_kpoint. Doesn't need to be accurate.

Other Basis Sets

This may be far outside your interest so feel free to just close this but have you considered allowing other basis sets, eg atomic orbitals, ...

Remove outdated documentation

Currently the documentation is very outdated and should be adapted to the new code structure.

File structure

Move everything to toplevel, and then make folders for specific things, like

eigensolvers
scf
utils
pseudo
brillouin
terms

Test that hamiltonians and energies are consistent

Threading

New developments:
https://julialang.org/blog/2019/07/multithreading
JuliaMath/FFTW.jl#105
JuliaLang/julia#32786
There's also the StridedArrays package that automatically parallelizes broadcasts.
Note that there's a significant overhead for now: https://discourse.julialang.org/t/multithreaded-broadcast/26786, which appears to be a known issue that will get better at some point in the future JuliaLang/julia#32701 (comment)

So it looks like the preferred model will be that julia's scheduler handles all the threading, and the underlying libraries use julia's threads. Essentially this means that we will be able to just set JULIA_NUM_THREADS and get threaded FFT/BLAS from there. If we find out that this is too fine-grained to yield good speedup, we can add explicit annotations (eg @threads on the loop over bands for the Hamiltonian application, or @strided on selected time-intensive broadcasts), and that should work fine.

Add interface to GPAW

Similar to the ABINIT interface, an iterface to GPAW would be nice to do reference calculations.

Magnetic fields

For pretty picture purposes. We need to be a bit careful about how to impose the gauge condition at the discrete level (to get a hermitian hamiltonian): if N is the discrete nabla operator and B the magnetic field, then N and B do not necessarily commute at the discrete level even if divB = 0 at the continuous level, and so BN is not necessarily hermitian. Possibly a good way to do it is to allow arbitrary fields, and impose the divB=0 condition at the discrete level by adding an extra term to the local potential, ie BN-NB should be a local potential and we just calculate that? Or maybe we just do x dy - y dx for now, since the commutation is easier to see there.

Replace Density

Things I like about the current Density

Frees us from having to think about whether we have computed the Fourier or not, which is extremely nice
We can do it lazily

Things I don't like

It's not specific to density, we can do potentials in the same way.
It's not clear whether it should be a real thing or not
It doesn't support a lot of functionality, it's mainly a container type
We spend a lot of time converting back and forth; eg rho_real = real(rho) which is pretty confusing. Possibly fixed by the previous thing
real is bad; eg imag(real(rho)) is very weird

Suggestions for names:

GridArray (it's an array, it's on a grid)
CrhoArray. Consistent with our notations (once we figure out what they are), but idiosyncrasic => bad
FourierArray (it's an array with its own Fourier inside)

I'm leaning towards FourierArray.

To replace real I suggest we use the getproperty overload in julia to transparently intercept calls to arr.real (and compute it lazily if we need to).

We can parametrize by a type that tells us whether the array in real space is real (in which case the fourier representation might be a compressed one, at least in the future).

We could also think about moving basis_Crho (which we should maybe rename) closer to that array in the code.

We can easily support broadcast in a trivial way, which would simplify some things, eg
density_from_real(basis, real(ρin) .+ m.α .* (real(ρout) .- real(ρin))) becomes way nicer. Con: I don't know how easy it is to support broadcast on custom types. See julia issue 33453

Switch memoization backend

https://discourse.julialang.org/t/ann-memoization-jl/31556

Forces and stresses

Self-explanatory :-)

forces
stresses

Add tests for variational character

Without XC, the result should not depend on the fft grid size.
Without XC, the energy of a given psi should always decrease when Ecut is increased and psi is extended with zeros on the new grid
The result should not depend on Vhat(G) for |G| <= 2 Gmax, with 1/2 |Gmax|^2 = Ecut

Proper type stability

As discussed #59 our code is still full of expressions, where a Float64 result implicitly generated. This should be carefully investigated and made properly type-stable.

Adaptive kinetic preconditioning

The usual way of doing preconditioning with the kinetic energy is essentially 1/(<K> + K), where K is the kinetic energy, and = <psi, K psi> where psi is the vector under consideration (right now we take constant). This means that PR = P\R does not work, and one needs a more general mechanism PR = P(X,R). There are even more general kinds of preconditioning that do not even fit in P(X,R) but instead require P(X,G) where G is the unpreconditioned gradient, with which we might want to play with @benjaminstamm. We should figure out a general mechanism for doing that.

Simplify occupation

So what's going on now is a bit weird. The assume_band_gap thing switches between the two functions we have in occupation.jl, but the first one is sort of redundant : it's a particular case of the second one when there's a gap. The rational thing to do here is remove any mention of bandgap, and use the second function all the time, even at 0K. However I kinda like the first function because it's simpler and does not need a bisection. Maybe we disable the codepaths that lead to and mark it visually as inactive but leave it in the source and still test it?

Struct for psi and its friends

Just to keep track of that. It'd be nice to have a struct that contains ham, psi, rho, occupations, fermi level, orbital energies, etc. and that is strongly immutable (in the sense that its bit contents don't change after creation). I think the major question is: does psi belong with the hamiltonian it creates or with the density that created it? The first one seems more natural, but is trickier to express mixing algorithms in (because a hamiltonian is created by a density, without reference to a psi). This is specific to DFT, as opposed to eg hybrid functionals where you need a psi to make the hamiltonian also.

Implement ONCVPSP (Norm-converving Vanderbilt PSPs)

Related to #2

Potential energy preconditioning

In some gross-pitaevskii problems, the potential energy rather than the kinetic energy "dominates", and it is good to precondition with the potential energy. This is an interesting one because the preconditioning has to be done in real space, so good to keep that in mind for API design.

Autodiff

Just opening this issue to collect thoughts on how to exploit AD. A good first step to explore how feasible this is is energy_ewald(), which is completely self-contained and should be very AD-friendly, both for forward and backward.

Terms interface

The current design has served us well but:

it's specialized to "basic" DFT, doesn't really support eg Fock terms (where the hamiltonian depends on the psi and the occupations, not just the density) and magnetic fields (where the hamiltonian application needs gradients of psi)
it hardcodes a list of predefined terms (kinetic, external, hartree, xc, nonlocal). It'd be nice to have an interface that lets the terms pick and choose what information they need
energies sort of come as an afterthought. Eg the psp and ewald energies should be terms like the others (they just happen not to depend on the psi)
After thinking about it a lot, I think this interface where the terms are called separately for the energy and potential, and have to ask for if things are nothing or not and fill them is annoying. A term, when called, should output its energy, and a lazy object that is able to compute gradients if they are needed. That frees the caller from having to figure out exactly what temporaries the term needs.

I think we should come up with a design for the most general situation possible, where we want to do direct minimization wrt the psi_nk and their occupation for eg Hartree-Fock. Then we try to join with the current design.

Mathematically, a term is a function E(D) of the density matrix (in our case represented as Psi and their occupations for all kpoints). From that we want to compute E(D) and H = nabla E(D). H is block diagonal in the sense that it maps Bloch waves to Bloch waves, which leads to Hk. Hk is something that takes as input a psi (represented by a list of coordinates in the G basis), and outputs Hk psi.

Performance-wise, I think we can do pretty much what we want (closures left and right, allocations of whatever), provided we don't screw up on the application of Hk to vectors.

So, just to get the ball rolling, how about Model contains a list (ie tuple) of terms. Each term is something that can be called like term(basis, Psi, occupation) and returns an energy, as well as an HamiltonianTerm object, that can store whatever it wants inside, and can be called like ham_term(kpoint, Psi). HamiltonianBlock would then contain a list of these things and wrap mul! around that.

Ignoring optimizations, I think this is a nice design. Now of course the big problem with this is that terms need psi in some space and output it in some other space (eg the magnetic term needs psi in reciprocal space and outputs it in real space; not sure about the fock term, should check that). We handle this currently in mul! in HamiltonianBlock by knowing what the terms are and how they like to be called. It'd be great if we could invert the control somehow. Traits could be one answer (ie terms could implement IsRealSpace(Term)). Another answer could be RealFourierArray. But maybe the simplest is simply to have them work as ham_term(kpoint, Psi_real, Psi_fourier) and output (HPsi_real, HPsi_fourier) (which are then accumulated then Fourier transformed)?

We miss the current optimization that we accumulate all local potentials into one, but that logic we can potentially implement by term fusion (eg if multiple terms declare -eg through traits- that they're local potentials, we can fuse them).

(I promise, this is my last "hey, let's redesign DFTK completely" :-p If we manage to do something like the above, we should be good for a while - at least, until we get to response properties...)

Investigate LOBPCG convergence in silicon supercells

On #40 with supercell = [3,1,1] and display_progress=true in the eigensolver, the eigensolver fails to converge. Not sure if there are not enough extra bands or if we're doing something wrong. Need to investigate this to do David's thing. Also not sure what convergence criteria people typically take for the iterative eigensolver, but I'm pretty sure it's way less tight than what we do now.

@mfherbst I hope you don't mind, I'm kinda using the issue tracker as a TODO list...

Use real FFTs

Add an option to perform real-to-complex and complex-to-real FFTs with a kwarg, and use that to do the potentials as real arrays. Could also do dispatch on the eltype of f_real, but it's not compatible with G_to_r(f_fourier).

Figure out an API for eigensolvers/SCF/solvers

Ref #30
Something like

struct Eigensolver
    tol
    maxiter
    precon # Promises to implement precon(ham) -> P
    solver # Promises to implement solver(A,X0,P,tol=tol,maxiter=maxiter)
end

default_eigensolver(tol=1e-6, maxiter=100) = Eigensolver(tol, maxiter, lobpcg_hyper)

Only run a subset of tests

It'd be nice to have something to just run a subset of tests
JuliaLang/Pkg.jl#1226 was recently merged, so we could have a @tset macro that expands to a check on ARGS and only runs the test if specified. Or just wait for someone to come up with a package that does it...