ACEsuit / ACE.jl
Parameterisation of Equivariant Properties of Particle Systems
We should probably redesign differentiation in a more organised way, likely following the ChainRules.jl ideas or even using ChainRules.jl directly. In particular, this should enable us to leverage AD tools when needed.
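To make the idea concrete, here is a minimal sketch of the ChainRules.jl pattern for a toy evaluation function; the function and its rule are purely illustrative, not the ACE API:

```julia
using ChainRulesCore

# Toy "model evaluation": a linear combination of squared inputs.
myeval(c, x) = sum(c .* x .^ 2)

# A hand-written reverse rule in the ChainRules.jl style; AD tools that
# consume ChainRules (e.g. Zygote) pick such rules up automatically.
function ChainRulesCore.rrule(::typeof(myeval), c, x)
    y = myeval(c, x)
    function myeval_pullback(ȳ)
        # gradients w.r.t. the function itself, c, and x
        return NoTangent(), ȳ .* x .^ 2, ȳ .* 2 .* c .* x
    end
    return y, myeval_pullback
end
```

Custom rules like this would let us keep hand-optimised derivatives for the bottleneck kernels while still composing with generic AD.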
We have had problems setting up the stable ACE1 with Pyjulip. One problem was the Python version (it needs to be 3.8 or older, not 3.9). The other problem is potentially with JuLIP v0.12; we need v0.11, but this latter one is still to be confirmed.
Adding a constant term (SphericalMatrix(L,L)) to the corresponding ACE basis would not destroy the symmetry of the basis and may help improve the fitting result as well as stability (maybe?). We could add it directly to the basis. Another thought is to add a mean field to ACELinearModels, containing the mean value of the data. This would mean we don't fit a coefficient for the constant term but instead give it an expected value.
Currently the Project.toml says ACE is not compatible with JuLIP version 0.11.2, which is the default JuLIP. This causes conflicts and prevents installation. When I add this version of JuLIP, everything seems to work fine.
The A2B map has many zero columns for all of the symmetries we are considering: pibasis -> symmbasis could be restructured as coupling -> pibasis -> symmbasis.
It would be nice to have the option of implementing custom distance transforms based on simple formulae.
Cleaned up my Julia installation so I can use a binary that works on all my nodes, and when I took the opportunity to clean up everything and start over, the instructions in the quickstart https://github.com/ACEsuit/ACE.jl/tree/dev-v0.8.x#quickstart don't work. Linux, stock Julia 1.7 binary download, and I get:
(@v1.7) pkg> registry add https://github.com/JuliaMolSim/MolSim.git
Cloning registry from "https://github.com/JuliaMolSim/MolSim.git"
Added registry `MolSim` to `~/.julia/registries/MolSim`
(@v1.7) pkg> add JuLIP, ACE, IPFitting
Updating registry at `~/.julia/registries/MolSim`
Updating git-repo `https://github.com/JuliaMolSim/MolSim.git`
Resolving package versions...
ERROR: cannot find name corresponding to UUID 49dc2e85-a5d0-5ad3-a950-438e2897f1b9 in a registry
(@v1.7) pkg> add JuLIP, ACE
Resolving package versions...
ERROR: cannot find name corresponding to UUID 49dc2e85-a5d0-5ad3-a950-438e2897f1b9 in a registry
I am thinking of retiring the BasisSelectors interface in favour of a BasisConstraints interface. E.g., I could imagine things like the following:
Bsel = Order(3) ∩ TotalDegree(10)
Bsel = Order(5) ∩ WTotalDegree(Dict(...)) ∩ Order(:bond, 1)
etc... no rush, but to me this feels much more intuitive than the fiddling around we are doing right now.
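A minimal sketch of how such a constraints interface might compose, assuming a basis-function spec is a vector of (n, l) named tuples; all names here are illustrative, not an existing ACE API:

```julia
# Constraints are predicates on a basis-function spec; ∩ composes them.
abstract type BasisConstraint end

struct Order <: BasisConstraint
    maxorder::Int
end
struct TotalDegree <: BasisConstraint
    maxdeg::Int
end
struct And <: BasisConstraint
    parts::Tuple{Vararg{BasisConstraint}}
end
Base.intersect(a::BasisConstraint, b::BasisConstraint) = And((a, b))

# Does a spec bb (vector of (n, l) named tuples) satisfy a constraint?
admits(c::Order, bb) = length(bb) <= c.maxorder
admits(c::TotalDegree, bb) = sum(b.n + b.l for b in bb; init = 0) <= c.maxdeg
admits(c::And, bb) = all(admits(p, bb) for p in c.parts)

Bsel = Order(3) ∩ TotalDegree(10)
admits(Bsel, [(n = 2, l = 1), (n = 3, l = 0)])   # order 2 ≤ 3, degree 6 ≤ 10
```

Since ∩ is just Base.intersect, constraints compose with ordinary set-intersection syntax, which is what makes the notation above read naturally.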
We need to revise the interface for basis selectors. At the moment it acts as if the "degree" is not needed, but this is not true. The one-particle basis must be sorted by degree, and the "isadmissible" function must be monotone. This all suggests that we should maybe revert to the "degree" implementation and drop the "isadmissible" version altogether.
When the object pools in ACEbase.jl are implemented using a Vector, the strangest thing happens. After pop!-ing the last element, the pool gets reduced by length one, but then when I acquire again the same element is returned. This doesn't always happen and I can't consistently reproduce it; most likely some very strange memory management bug. For now I've switched from Vectors to Sets as the pools. This fixes the bug, but the push! (release) operation is now 10 times slower, ca. 1 µs instead of < 100 ns.
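For reference, a minimal sketch of the Vector-backed acquire/release pattern in question (not the actual ACEbase.jl implementation); a correctly behaving pool should hand back the released buffer object itself:

```julia
# Illustrative fixed-length buffer pool backed by a Vector (LIFO, O(1) ops).
struct BufferPool{T}
    buflen::Int
    free::Vector{Vector{T}}
end
BufferPool{T}(n::Int) where {T} = BufferPool{T}(n, Vector{Vector{T}}())

# Hand out a free buffer, or allocate a fresh one if the pool is empty.
acquire!(p::BufferPool{T}) where {T} =
    isempty(p.free) ? Vector{T}(undef, p.buflen) : pop!(p.free)

# Return a buffer to the pool for reuse.
release!(p::BufferPool, buf) = push!(p.free, buf)
```

The push!/pop! pair on a Vector is O(1) with no hashing, which is why the Set-based workaround costs roughly an order of magnitude more per release.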
Should isadmissible in line 76 of sparsegrids.jl (part of function _gensparse(...)),
if all(minvv .== 0) && isadmissible(vv) && filter(vv)
be admissible?
Doing a multi-species fit generally results in a large condition number (e.g. 1e164), for which a regular QR decomposition results in a LAPACK error. Performing the fit with RRQR does work.
There seems to be a typo in norm(X::XState); the MWE looks like:
julia> Xs = PositionState([1.0,1.0,1.0])
julia> @show norm(Xs)
norm(Xs) = 2.9999999999999996
julia> @show norm(Xs.rr)
norm(Xs.rr) = 1.7320508075688772
and also it won't work for States that have non-number fields, e.g.,
julia> X = ACE.State(rr = [1.0,1.0,1.0], be = :env);
julia> norm(X)
ERROR: MethodError: no method matching iterate(::Symbol)
(A very similar issue occurs in scaling(basis::PIBasis), I guess?)
I am trying to settle on a file format for v1.x and would appreciate feedback on some thoughts:
there are essentially two perspectives:
There are benefits to both, I think. E.g., 1 will lead to MUCH smaller files, but on the other hand it can only be read and understood with the code that wrote it. 2, on the other hand, will have lots of meta-data-type information that is not needed to reconstruct the types, but it will make it easy to write a parser at some point that can read it even if the original code is lost or cannot be made to run for whatever reason...
As I'm writing this I wonder whether there is a third way:
@gabor1 your perspective would be particularly appreciated here.
I would like to be able to adapt the radial basis depending on both atomic species of the atoms involved. More specifically, I would like to be able to set the radial transform (r0) based on the pair of involved elements.
Why does this make sense? Why is this important?
CC @zhanglw0521 @MatthiasSachs
What if we created the following type hierarchy:
abstract type Property{...} <: StaticArray{...} end
struct MyProp1 <: Property
val ...
end
struct MyProp2 <: Property
val ...
end
then we have the advantage that we can leverage all of StaticArrays.jl to do arithmetic on these properties.
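A minimal sketch of what such a subtype could look like, following the StaticArrays.jl custom-array recipe (size encoded in the supertype parameters, plus getindex and a tuple constructor); the name EuclideanProp is purely illustrative:

```julia
using StaticArrays, LinearAlgebra

# An illustrative 3-vector property; StaticArrays infers the size from Tuple{3}.
struct EuclideanProp{T} <: StaticArray{Tuple{3}, T, 1}
    val::SVector{3, T}
end
# Tuple constructor, as required by the StaticArrays interface.
EuclideanProp{T}(t::NTuple{3, Any}) where {T} = EuclideanProp{T}(SVector{3, T}(t))
Base.getindex(p::EuclideanProp, i::Int) = p.val[i]

p = EuclideanProp(SVector(3.0, 4.0, 0.0))
norm(p)   # generic LinearAlgebra.norm works through the AbstractArray interface
2 * p     # scalar arithmetic via the StaticArrays machinery
```

With just these two methods defined, indexing, norms, and broadcasting all come for free, which is exactly the leverage argued for above.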
See also this part of the documentation
One reason I started thinking about this is I am hoping that this might help with derivatives - taking derivatives increases the dimensionality. How best to handle this? Should we take vectors of properties, or should we create a higher-dimensional array?
This question is more about "selling" than anything else. So far we've somewhat, but not strictly, followed semver. @gabor1 @casv2 @dussong How do you feel about tagging v0.8.x as 1.0.0, so we have a record, with enforced backward compatibility, of the version that we've been using up to now? In the meantime we continue to develop ACE as versions v0.9.x, v0.10.x, etc., and eventually tag a new branch as 2.0.0?
Implement an interface for accessing the 1p basis components via a symbol. E.g.
Rn = B1p[:Rn]
Ylm = B1p[:Ylm]
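One possible shape for this, sketched with a stand-in product-basis type whose components are stored in a NamedTuple; the names are illustrative, not the existing ACE internals:

```julia
# Stand-in for a product 1p basis holding labelled components.
struct Product1pBasis{TB <: NamedTuple}
    bases::TB
end

# Symbol-based component access, as proposed above.
Base.getindex(b::Product1pBasis, sym::Symbol) = b.bases[sym]

B1p = Product1pBasis((Rn = "radial basis", Ylm = "spherical harmonics"))
B1p[:Rn]    # returns the Rn component
B1p[:Ylm]   # returns the Ylm component
```

NamedTuple-backed lookup keeps the access type-stable when the symbol is a compile-time constant.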
This relates to issue #6: there is a remaining step which works around that bug, but it should not be required. Specifically, the line
fill!(dA, zero(eltype(dA)))
in oneparticlebasis.jl should not be required, since the 1p basis doesn't add into the storage dA but writes into it. So somewhere we are reading lots of zeros. This also suggests that there are some operations we shouldn't be performing.
test_admodel.jl has a segfault when Zygoting the loss w.r.t. the model (indicated in comments on the main branch).
In order to add a scaling(::PolyPairPots) function, we need to get a second derivative estimate of the OrthPolyBasis first. Would I need to write an evaluate_2d!(.., J::OrthPolyBasis, ...) function and calculate the second derivatives over the transformed t (given [r_in, r_cut] and a transform)? Or is there a simpler estimate for the Laplacian, like for the RPIBasis?
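If an analytic second derivative is not yet implemented, one stopgap (a generic sketch, not ACE API) is a central finite-difference estimate of the second derivative of any scalar basis function:

```julia
# Central finite-difference estimate of f''(r); O(h^2) accurate.
fd_second_derivative(f, r; h = 1e-4) = (f(r + h) - 2f(r) + f(r - h)) / h^2

fd_second_derivative(sin, 0.3)   # ≈ -sin(0.3)
```

This could be applied per basis function over the transformed variable t to get a rough Laplacian estimate before committing to an evaluate_2d! implementation.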
Currently, ACE.EuclideanVector can only be of complex type, but I think it would make sense to have a real version too.
I think this function takes the PI basis functions, finds the orders, and then maps to some order_weights which we can control using a dictionary D. Does this make sense? (To be placed in /src/rpi/rpi_basis.jl.)
function scaling_N(basis::RPIBasis, p, D; a2b = abs)
   wwpi = scaling(basis.pibasis, p)
   wwrpi = zeros(Float64, length(basis))
   for iz0 = 1:numz(basis)
      pibasis_fncs = collect(keys(basis.pibasis.inner[iz0].b2iAA))
      orders = order.(pibasis_fncs)
      order_weights = [D[order] for order in orders]
      wwpi_iz0 = order_weights .* wwpi[basis.pibasis.inner[iz0].AAindices]
      wwrpi[basis.Bz0inds[iz0]] = a2b.(basis.A2Bmaps[iz0]) * wwpi_iz0
   end
   return wwrpi
end
with usage being
D = Dict(1 => 0.1,
2 => 0.5,
3 => 0.1,
4 => 100.00)
ACE.RPI.scaling_N(rpi_basis, 2, D)
I think this returns a scaling vector the size of the rpi_basis, but with the Laplacian coefficients scaled according to the interaction order.
This appears to go quite deep into an inability to determine the type of the inner basis. It is amazing that the Julia compiler can work with that anyhow, but for sure the ::Int hack needs to be removed and this type instability fixed properly.
In JuLIP.MLIPs.forces, the call to evaluate_d! appears to be type-unstable as well. This has to do with the compiler not being able to infer the type of the tmp variable. Very strange. Maybe the idea of temporary variable structures needs to be revisited; maybe they need to be stored in structs rather than named tuples.
At the moment, the 1p basis components all have upper bounds on the degree. This restricts the product 1p basis and, further, the many-body basis. In particular, it makes it impossible to grow the basis in a systematic way.
So the goal is to remove this and other restrictions to allow the following reverse procedure:
All this requires a more flexible framework to set or adjust the basis spec. What makes it difficult is how the basis spec propagates the other way. This needs to be thought through carefully.
ACE v0.8 needs an upper bound for JuLIP, likely 0.10.x, but should be tested carefully. Problem is that ACE tests pass but IPFitting tests fail. Not clear why, some strange import problems that I couldn't track down likely caused by the introduction of ACEbase.
Lines like this one:
if length(bb) == 1 #MS: Not sure if this should be here
seem wrong. We need to investigate.
In the function indexrange we have the following:
# HACK: fix the m range based on the maximal l-range
# this needs to be suitably generalised if we have multiple
# (l, m) pairs, e.g. (l1, m1), (l2, m2)
if haskey(rg, :m)
maxl = maximum(rg[:l])
rg[:m] = collect(-maxl:maxl)
end
says it all ...
For every model that we test, we manually write directional derivative tests like the one shown below. This can most likely be unified in a little test suite implemented within ACE.Testing.
using StaticArrays
import Base: *

struct __TestSVec{T}
    val::T
end
*(a::Number, u::__TestSVec) = a * u.val
*(a::SMatrix, u::__TestSVec) = a * u.val
*(a::SArray{Tuple{N1, N2, N3}}, u::__TestSVec) where {N1, N2, N3} =
    reshape(reshape(a, Size(N1 * N2, N3)) * u.val, Size(N1, N2))

# Xs, φ, basis as in the model under test; fdtest from the testing utilities
Us = __TestSVec.(randn(SVector{3, Float64}, length(Xs)))
C = randn(typeof(φ.val), length(basis))
F = t -> sum( sum(c .* b.val)
              for (c, b) in zip(C, ACE.evaluate(basis, ACEConfig(Xs + t[1] * Us))) )
dF = t -> [ sum( sum(c .* db)
              for (c, db) in zip(C, ACE.evaluate_d(basis, ACEConfig(Xs + t[1] * Us)) * Us) ) ]
fdtest(F, dF, [0.0], verbose = true)
Revisit the idea of switching entirely to allocating operations and employing object pools for bottlenecks; cf. the branch co/pool for initial experiments with ObjectPools.jl.
@casv2 if you have some estimates for the larger basis sets?
there are some big files that need to be purged from the git history before 1.0
Add the option to create a small test set, stored with the potentials, to check upon loading that the potential evaluates correctly. This is intended to ensure compatibility across versions of ACE.jl.
In order to avoid clashes with Base.eltype we may wish to use:
fltype - floating point type
rfltype - real float type
cfltype - complex float type
When the maxdeg parameter is changed from 1.0 even a bit (to 0.95) to decrease the overall size of the basis set, it causes large errors in the forces (but not in the energies).
It'd be nice to obtain the species and interaction order of a basis function. Also, it'd be great to go the other way, e.g. having a rough idea of an important basis function, and retrieve higher n/l/m basis functions "in that direction".
This issue is to summarise the next steps in the ACE + IPFitting game, comments and discussion are very welcome. This roadmap will be adjusted according to discussions.
Overall Goal: rock-solid and usable linear regression, some basic hyper parameter optimisation
IPFitting (@casv2)
ACE 2.x will focus on improved composability and nonlinear models.
For testing purposes we want to compare whether or not two states are equivalent. We then want to overload isapprox such that for two States or DStates A and B we can simply test A ≈ B.
Please add further items in a separate post
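For the isapprox item above, a minimal sketch with a stand-in NamedTuple-backed state; this is illustrative only, as the real State/DState types carry more structure:

```julia
# Stand-in state type wrapping a NamedTuple of numeric fields.
struct TestState{NT <: NamedTuple}
    x::NT
end

# Two states are ≈ if they have the same fields and each field is ≈.
function Base.isapprox(A::TestState, B::TestState; kwargs...)
    keys(A.x) == keys(B.x) || return false
    return all(isapprox(a, b; kwargs...) for (a, b) in zip(values(A.x), values(B.x)))
end

A = TestState((rr = [1.0, 1.0, 1.0],))
B = TestState((rr = [1.0, 1.0, 1.0 + 1e-12],))
A ≈ B   # true
```

Forwarding kwargs... lets callers control rtol/atol exactly as with Base.isapprox on arrays.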
n, l, m
Is an input variable fundamentally different from an output variable? As we move towards layers / message-passing kinds of constructions, this question becomes very relevant. E.g., a position is just an EuclideanVector with a symmetry attached to it. A charge (input) is just an invariant scalar.
When reading a coefficient vector c from disk, it should be converted from Any to Float64 before calling combine(IP, c).
The SparseBasis basis selector does not filter the basis functions based on the maxlevels dictionary.
need to test more broadly where they pass/fail.
There are a few (related) decisions to be made: (Xi, Xj) rather than (Xi, Xij). But is this a restriction of generality? To make an informed decision on this, I think we need to benchmark the cost of converting (Xi, Xj) to a single Xij state that combines information from both.