Coder Social home page Coder Social logo

ace.jl's People

Contributors

andresrossb avatar casv2 avatar cortner avatar matthiassachs avatar zhanglw0521 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

ace.jl's Issues

Differentiation via ChainRules.jl

We should probably redesign differentiation in a more organised way, likely following the ChainRules.jl ideas or even using ChainRules.jl directly. In particular this should enable us to leverage AD tools when needed.

Pyjulip problems

We have had problems setting up the stable ACE1 with Pyjulip. One of the problems was with Python version (need to have Python 3.8 or older, not 3.9.) The other problem is potentially with JuLIP v0.12, need 0.11. But this latter one is to be confirmed.

Constant term in ACE basis

Adding a constant term ($c\in\mathbb{C}$ for invariant basis and $$cI(2L+1)$$ for SphericalMatrix(L,L) ) to the corresponding ACE basis would not destroy the symmetry of the basis and may help improve the fitting result as well as stability (Maybe?).

We could add it directly to the basis while another thought could be that we add a mean field for ACELinearModels, which contains the mean value of data. This means that we didn't fit the coefficient for the constant term but give an expected value to it.

JuLIP compatibility

currently the Project.toml says ACE is not compatible with JuLIP version 0.11.2, which is the default JuLIP. This causes conflicts and prevent installation. When I add this version of JuLIP everything seems to work fine.

Many Zero-Columns in A2B

the A2B map has many zero columns for all of the symmetries we are considering:

  • introduce a filter mechanism
  • possibly modify the construction pibasis -> symmbasis to `coupling -> pibasis -> symmbasis\
  • is there an analytic reduction that we've missed?

Custom distance transform

It would be nice to have the option for implementing custom distance transforms based on simple formulae.

quickstart not working

Cleaned up my julia installation so I can use a binary that works on all my nodes, and when I took the opportunity to clean up everything and start over, the instructions in the quickstart https://github.com/ACEsuit/ACE.jl/tree/dev-v0.8.x#quickstart don't work. Linux, stock julia 1.7 binary download, and I get

(@v1.7) pkg> registry add https://github.com/JuliaMolSim/MolSim.git 
     Cloning registry from "https://github.com/JuliaMolSim/MolSim.git"
       Added registry `MolSim` to `~/.julia/registries/MolSim`

(@v1.7) pkg> add JuLIP, ACE, IPFitting
    Updating registry at `~/.julia/registries/MolSim`
    Updating git-repo `https://github.com/JuliaMolSim/MolSim.git`
   Resolving package versions...
ERROR: cannot find name corresponding to UUID 49dc2e85-a5d0-5ad3-a950-438e2897f1b9 in a registry

(@v1.7) pkg> add JuLIP, ACE
   Resolving package versions...
ERROR: cannot find name corresponding to UUID 49dc2e85-a5d0-5ad3-a950-438e2897f1b9 in a registry

Basis Selectors and Basis Constraints

I am thinking of retiring the BasisSelectors interface in favour of a BasisConstraints interface. E.g., I could imagine things like the following:

Bsel = Order(3) ∩ TotalDegree(10)
Bsel = Order(5) ∩ WTotalDegree(Dict(...)) ∩ Order(:bond, 1)

etc... no rush, but to me this feels much more intuitive than the fiddling around we are doing right now.

Degree and Basis Selector

We need to revise the interface for basis selectors. At the moment it is acting as if the "degree" is not needed, but this is not true. The one-particle basis must be sorted by degree, and the "isadmissable" function must be monotone. This all suggests that we should maybe revert to the "degree" implementation and drop the "isadmissible" version altogether.

Bug in ObjectPools

When the object pools in ACEbase.jl are implemented using a Vector, then the strangest thing happens. After pop! ing the last element, the pool gets reduced by length one but then when I acquire again the same element is returned. This doesn't always happen and I can't really consistently reproduce it. Most likely some very strange memory management bug. For now I've switched from Vectors to Sets as the pools. This fixes the bug for now, but the push! (release) operation is now 10 times slower, ca 1 us instead of < 100 ns.

Typo in `function _gensparse(...)`

Should isadmissible in line 76 (part of function _gensparse(...)) in sparsegrids.jl

if all(minvv .== 0) && isadmissible(vv) && filter(vv)

be admissible ?

Multispecies fitting

Doing a multi species fit generally results in a large condition number R (e.g. 1e164) for which doing a regular QR decomposition results in a LAPACK error. Performing the fit with RRQR does work.

Fix and generalise the function `norm(X::XState)`

There seems to be a typo in norm(X::XState) associated with which the MWE looks like:

julia> Xs = PositionState([1.0,1.0,1.0])

julia> @show norm(Xs)
norm(Xs) = 2.9999999999999996

julia> @show norm(Xs.rr)
norm(Xs.rr) = 1.7320508075688772

and also it won't work for States that have non-number fields, e.g.,

julia> X = ACE.State(rr = [1.0,1.0,1.0], be = :env);
julia> norm(X)

ERROR: MethodError: no method matching iterate(::Symbol)

(A very similar issue occurs in scaling(basis::PIBasis) I guess?)

File format

I am trying to settle on a file format for v1.x and would appreciate feedback on some thoughts:

there are essentially two perspectives:

  1. Use a slim format that contains the bare minimum of information to reconstruct the bases and potentials; or
  2. Dump the entire Julia type into a file including everything that could be recomputed at runtime.

There are some benefits to both I think. E.g., 1 will lead to MUCH smaller files, but on the other hand it can only be read and understood with the code that wrote it. 2 on the other hand will have lots of "meta-data" type information that is not needed to reconstruct the types but it will make it easy to write a parser at some point that can read it even if the original code is lost or cannot be made to run for whatever reason...

As I'm writing this I wonder whether there is a third way:

  1. have a minimal "slim" file format as default, but provide the option of saving also this meta-data mentioned above which will be a "human-readable" specification of potential or basis.

@gabor1 your perspective would be particularly appreciated here.

Element-pair dependant radial transform (r0)

I would like to be able to adapt the radial basis depending on both atomic species of the atoms involved. More specifically, I would like to be able to set the radial transform (r0) based on the pair of involved elements.

Why does this make sense? Why is this important?

  • General: In a given training set the covalent bond radii can vary significantly. Meaning that the distance of interest varies drastically based on the elements invovled. At the most extreem case, the C-H distance is much smaller than a typical C-large atom distance.
  • Specific Case (Metal organic frameworks): particularly bad, in this training set, the smallest C-H distance is 0.98A , while the smallest C-Mg distane is 3A.

Should States and Properties be StaticArrays?

CC @zhanglw0521 @MatthiasSachs

What if we created the following type hierarchy:

abstract type Property{...} <: StaticArray{...} end 

struct MyProp1 <: Property
val ...
end

struct MyProp2 <: Property
val ...
end

then we have the advantage that we can leverage all of StaticArrays.jl to do arithmetic on these properties.

See also this part of the documentation

One reason I started thinking about this is I am hoping that this might help with derivatives - taking derivatives increases the dimensionality. How best to handle this? Should we take vectors of properties, or should we create a higher-dimensional array?

Tag 1.0

This question is more about "selling" than anything else. so far we've somewhat but not strictly followed semver.

@gabor1 @casv2 @dussong How do you feel about tagging v0.8.x at 1.0.0 so we have a record with enforced backward compatibility of the version that we've been using up to now?

In the meantime we continue to develop ACE as versions v0.9.x, 0.10.x, etc. and eventually tag new branch as 2.0.0?

Unexplained usage of dA for multi-species

This relates to issue #6 - there is a remaining step which works around that bug, but it should not be required. Specifically the line

   fill!(dA, zero(eltype(dA)))

in oneparticlebasis.jl should not be required since the 1p basis doesn't add into the storage dA but writes into it. So somewhere we are reading lots of zeros. This suggests also that there are some operations that we shouldn't be performing.

Segfaults with ACE + J1.7

test_admodel.jl has a segfault when Zygoting the loss w.r.t. the model. (indicated in comments on main branch)

PolyPairPots scaling function

In order to add a scaling(::PolyPairPots) function, we need to get a second derivative estimate of the OrthPolyBasis initially. Would I need to write a evaluate_2d!(.. J::OrthPolyBasis, ...) function and calculate the second derivatives over the transformed t (given [r_in,r_cut] and a transform)? Or is there a more simple estimate for the laplacian like for the RPIBasis?

Real EuclideanVector

Currently the ACE.EuclideanVector can only be of type complex, but I think it would make sense to have a real version too.

scaling_N for RPI basis

I think this function takes the PI basis functions, finds the orders and then maps to some order_weights which we can control using a dictionary D. Does this make sense? (to be placed in /src/rpi/rpi_basis.jl)

function scaling_N(basis::RPIBasis, p, D; a2b = abs)
   wwpi = scaling(basis.pibasis, p)
   wwrpi = zeros(Float64, length(basis))
   for iz0 = 1:numz(basis)
      pibasis_fncs = collect(keys(basis.pibasis.inner[iz0].b2iAA))
      orders = order.(pibasis_fncs)
      order_weights = [D[order] for order in orders]
      wwpi_iz0 = order_weights .* wwpi[basis.pibasis.inner[iz0].AAindices]
      wwrpi[basis.Bz0inds[iz0]] = a2b.(basis.A2Bmaps[iz0]) * wwpi_iz0
   end
   return wwrpi
end

with usage being

D = Dict(1 => 0.1,
    2 => 0.5,
     3 => 0.1,
     4 => 100.00)

ACE.RPI.scaling_N(rpi_basis, 2, D)

I think this returns a scaling vector the size of the rpi_basis, but with scaled laplacian coefficients according to the interaction order.

Type Instabilities in basis assembly

This appears to go quite deep into an inability to determine the type of the inner basis. It is amazing that the Julia compiler can work with that anyhow, but for sure the ::Int hack needs to be removed and this type instability fixed properly.

  • In JuLIP.MLIPs.forces the call to evaluate_d! appears to be type-unstable as well. This has to do with the compiler not being able to infer the type of the tmp variable. Very strange. Maybe the idea of temporary variable structures need to be revisited; maybe they need to be stored in structs rather than named tuples.

Interface for Basis Specs

At the moment, the 1p basis components all have upper bounds on the degree. This restricts the product 1p basis and further then the many-body basis. In particular it makes it impossible to grow the basis in a systematic way.

So the goal is to remove this and other restrictions to allow the following reverse procedure:

  1. choose the symmetric basis
  2. determine the resulting product basis
  3. determine the resulting product-1p-basis
  4. determine the resulting 1p-basis components Rn, Ylm, etc.

All this requires a more flexible framework to set or adjust the basis spec. What makes it difficult is how the basis spec propagates the other way. This needs to be thought through carefully.

JuLIP Version bounds

ACE v0.8 needs an upper bound for JuLIP, likely 0.10.x, but should be tested carefully. Problem is that ACE tests pass but IPFitting tests fail. Not clear why, some strange import problems that I couldn't track down likely caused by the introduction of ACEbase.

Ways to specify the basis

  • give # of 1-p basis as a parameter instead of degree
  • can we also give # basis fcns overall instead of a degree?

m range in product basis

In function indexrange we have the following

   # HACK: fix the m range based on the maximal l-range
   #       this needs to be suitably generalised if we have multiple
   #       (l, m) pairs, e.g. (l1, m1), (l2, m2)
   if haskey(rg, :m)
      maxl = maximum(rg[:l])
      rg[:m] = collect(-maxl:maxl)
   end

says it all ...

Unify derivative testing

For every model that we test we manually write directional derivative tests like shown below. This can most likely be unified in a little test suite implemented within ACE.Testing.

import Base.*

struct __TestSVec{T}
   val::T
end

*(a::Number, u::__TestSVec) = a * u.val
*(a::SMatrix, u::__TestSVec) = a * u.val
*(a::SArray{Tuple{N1,N2,N3}}, u::__TestSVec) where {N1, N2,N3} =
      reshape(reshape(a, Size(N1*N2, N3)) * u.val, Size(N1, N2))

Us = __TestSVec.(randn(SVector{3, Float64}, length(Xs)))
C = randn(typeof.val), length(basis))
F = t -> sum( sum(c .* b.val)
                    for (c, b) in zip(C, ACE.evaluate(basis, ACEConfig(Xs + t[1] * Us))) )
dF = t -> [ sum( sum(c .* db)
                  for (c, db) in zip(C, ACE.evaluate_d(basis, ACEConfig(Xs + t[1] * Us)) * Us) ) ]
fdtest(F, dF, [0.0], verbose=true)

Object Pools

revisit the idea if switching entirely to allocating operations and employ object pools for bottlenecks. cf. branch co/pool for initial experiments with ObjectPools.jl

Save potentials with test set

add the option to create a small test set stored with the potentials to test upon loading that the potential evaluates correctly. This would be intended to ensure compatibility across versions of ACE.jl

Element Type

In order to avoid clashes with Base.eltype we may wish to use

  • fltype - floating point type
  • rfltype - real float type
  • cfltype - complex float type

Basis functions info

It'd be nice to obtain the species and interaction order of a basis function. Also, it'd be great to go the other way, e.g. having a rough idea of an important basis function, and retrieve higher n/l/m basis functions "in that direction".

Roadmap for 1.x and 2.x

This issue is to summarise the next steps in the ACE + IPFitting game, comments and discussion are very welcome. This roadmap will be adjusted according to discussions.

ACE 1.x

Overall Goal: rock-solid and usable linear regression, some basic hyper parameter optimisation

High Priority:

  • fix slow basis assembly since v0.6 (cf. #10 for further work)
  • fix crashing multi-threaded code in IPFitting
  • shift to DAG evaluation code as default (cf v0.7.0)
  • general repulsive core (cf v0.7.0)
  • tests for cross-version compatibility and performance regression (will enforce compatibility only from v1.0.0 onwards) (cf. v0.7.1)
  • #52
  • general cleanup and revamp of IPFitting (@casv2)

Next steps, could be 1.x or beyond

  • Create a "wrapper" module that gives a nicer user-interface that will ideally remain independent of developments on the ACE backend (to discuss with Cas)
  • Generate basis for fixed cost rather than for fixed degree; cf. #2
  • general sure-fire way to generate repulsive cores (Gabor, Cas, CO to discuss)
  • real basis (CO); this should give an extra factor 2
  • optimisation of distance transform (Andres)
  • optimisation of radial basis (Andres)
  • Sparsification of basis (IPFitting, sparse priors, Matthias, ...)
  • automated weight selection through iteratively reweighed least squares (IPFitting)
  • Orthogonal basis (= pure basis; to be discussed with Simon)
  • Revisit regularisation with different metrics
  • kernel and GP type regression using ACE basis as feature vectors
  • Incorporate aPIPs via code-generated DAG evaluation

ACE 2.x

ACE 2.x will focus on improved composability and nonlinear models

  • arbitrary densities
  • bond model (cylindrical symmetry) (CO, best to move to 2.x)
  • neighbour counting models
  • rewrite multi-species via discrete density
  • nonlinear combination (e.g. Finnis Sinclair) - this could be shifted to 1.x
  • multi-scale basis, medium-range interaction
  • composition via general densities
  • self-consistency and Coulomb

cc @gabor1 @casv2 @andresrossb

isapprox for State and DState

For testing purposes we want to compare weather or not two states are equivalent. Then we want to overload isapprox such that for two states or DStates A and B we could simply test them by using: A ≈ B.

ACE2 TODO List

Please add further items in a separate post

  • Merge the different coupling coefficient structures
  • merge the different coupling coefficient implementations
  • gradients (@cortner)
  • Spherical tensors (@zhanglw0521)
  • Euclidean tensors (@MatthiasSachs)
  • multiple n, l, m
  • multiplying and concatenating properties

State vs Property

Is an input variable fundamentally different from an output variable? As we are moving towards layers / message passing kind of constructions this question becomes very relevant. E.g. a position is just a EuclideanVector with a symmetry attached to it. A charge (input) is just an invariant scalar.

Decision on ACE Arguments

There are a few (related) decisions to be made:

  • ACEConfig vs just anything iterable
  • do we distinguish site state from (i,j) pair state? The distinction is very natural in the context of interatomic potentials but not so natural in other contexts.
  • If we do go for 2 arguments, then it feel much more natural to choose (Xi, Xj) rather than (Xi, Xij). But is this a restriction of generality?

To make an informed decision on this, I think we need to benchmark the cost of converting (Xi, Xj) to a single Xij state that combines information from both.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.