odinn-sciml / odinn.jl Goto Github PK

View Code? Open in Web Editor NEW

68.0 68.0 10.0 64.28 MB

Global glacier model using Universal Differential Equations for climate-glacier interactions

License: MIT License

Julia 98.06% Jupyter Notebook 1.94%

climate cryosphere differential-equations glaciers julia scientific-machine-learning sciml

odinn.jl's People

Contributors

Stargazers

Watchers

Forkers

fperez climatepals varunkhajuria10 jordibolibar facusapienza21 udemirezen leoyouli vivekag7 linkryu chrisrackauckas

odinn.jl's Issues

Make `generate_batches` function general by using `varargs`

Right now the generate_batches function is more or less hardcoded to a given number of input features. Although not straightforward, since the inputs are a combination of Vectors and scalars, a variable number of input features should be automatically assimilating by using varargs:

function generate_batches(batch_size, θ, UD, features...; shuffle=true)

end

Bug with integration Fiona + OGGM + ODINN

When running toy_model.jl and having the OGGM environment correctly working with PyCall, I receive the following error coming from Fiona when executing an entity task to download and process glacier data:

in expression starting at /Users/facundosapienza/Dropbox/ODINN-dev/ODINN.jl/scripts/toy_model.jl:149

caused by: PyError ($(Expr(:escape, :(ccall(#= /Users/facundosapienza/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'AttributeError'>
AttributeError("partially initialized module 'fiona' has no attribute '_loading' (most likely due to a circular import)")
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 558, in init_glacier_directories
    gdirs = execute_entity_task(utils.GlacierDirectory, entities,
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 191, in execute_entity_task
    out = [pc(gdir) for gdir in gdirs]
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 191, in <listcomp>
    out = [pc(gdir) for gdir in gdirs]
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 108, in __call__
    res = self._call_internal(func, arg, kwargs)
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 102, in _call_internal
    return call_func(gdir, **kwargs)
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/utils/_workflow.py", line 2468, in __init__
    rgi_entity = self._read_shapefile_from_path(_shp)
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/utils/_workflow.py", line 3082, in _read_shapefile_from_path
    shp = gpd.read_file(fp)
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/geopandas/io/file.py", line 242, in _read_file
    engine = _check_engine(engine, "'read_file' function")
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/geopandas/io/file.py", line 97, in _check_engine
    _import_fiona()
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/geopandas/io/file.py", line 40, in _import_fiona
    import fiona
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/fiona/__init__.py", line 85, in <module>
    with fiona._loading.add_gdal_dll_directories():

Do you have any idea of how to fix this @JordiBolibar @fmaussion ? I am able to run the entity task and download the data, but the error happen immediately after. I observed this error just when I run ODINN in my local computer (I never observed this problem when working in the JupyterHub).

Investigate use of batches for UDE training

Right now we are training the UDE based on the full H matrix. This seems reasonable for this toy model, but we should investigate potentially more efficient ways to do so (i.e. using batches).

New API

We should improve the interface of how we launch simulations. In order to have everything in a more tidy way. We should create data structures that gather multiple parameters driving simulations.

Here's a draft of the prototype I have in mind:

Model parameters

PhysicalParameters struct to hold all parameters related to model physics.
Hyperparameters struct holding all the hyperparameters from the neural network.
SolverParameters struct holding all parameters to configure the solver.
UDEparameters struct holding all parameters for the training of the UDE. It's a child of SolverParameters, since it also includes a solver.
OGGMparameters struct wrapping OGGM's parameters
SimulationParameters struct holding the parameters defining a simulation
Parameters: struct holding all the parameter types.

Models

Machine struct holding an ML model (e.g. a neural network) + a Hyperparameters struct. Inspired by MLJ.jl. When initialized inside a Model, it automatically fetches the Hyperparameters to generate the NN.
MBmodel struct defining a mass balance model. It can have children like TImodel.
IceflowModel struct defining an ice flow model. It can have children, like SIAmodel
Model struct, contains all information about the PhysicalParameters, SolverParameters, UDEparameters, OGGMparameters, Machine, MBmodel, IceflowModel

Glaciers, intial conditions and climate data

Climate struct containing climate data series for a Glacier.
Glaciers struct containing OGGM's gdirs and additional information on the Julia side, for example its Climate.

Simulations

Simulation (inmutable?) struct holding all the data necessary for a simulation: Model, Glaciers, its type (functional inversion, inversion or forward).

To be run such as:

# Produce everything needed for a simulation
# First we create the model parameters
parameters = Parameters(; physics = PhysicsParameters(),
                                      hyper = Hyperparameters(),
                                      solver = SolverParameters(),
                                      UDE = UDEparameters(),
                                      OGGM = OGGMparameters(workspace_path),
                                      simulation = SimulationParameters())

# Then we generate the models to be run
model = Model(IceflowModel(), MBmodel(), Machine(), params)
glaciers = Glaciers(rgi_ids)
add_climate!(Glaciers, time_period)

# We create a simulation based on everything we generated before
simulation = Simulation(model, glaciers, parameters)

# We finally run the simulation
run!(simulation)

Beyond this, we should also accommodate an API similar to OGGM's, where everything is based on entity_tasks around glacier directories. Since Julia is not OO, entity tasks would just be functions with a common interface, run on Glaciers. Something like:

entity_tasks = [ODINN.tasks.DoSomethingToGlaciers(), ODINN.tasks.DoSomethingElseToGlaciers()]
apply_entity_tasks!(glaciers, entity_tasks)

Properly optimize Julia code

After some discussions in the Julia Discourse regarding Float64 vs Float32, some adjustments should be made to the code in order to avoid some performance pitfalls. Those are:

Move all code back to Float64. Apparently, unless one is using GPUs, there's no benefit on that for CPUs.
Use Int in exponents (e.g. in the flow law). See explanations in Discourse.
Avoid memory allocation in util functions following example in iceflow_sandbox model.

Clean up package dependencies

We need to double check that all the packages we're using are still relevant. This slows down precompilation time a lot.

Use Sparse Arrays for glacier matrices

Glacier matrix data includes a great number of grid cells with zeros. Once the model will be upscaled to multiple glaciers, using Sparse Arrays will help save a lot of memory.

https://docs.julialang.org/en/v1/stdlib/SparseArrays/

Error in getting climate data in parallel

get_climate returns a serialization error when run in parallel with a pmap. For now the workaround is just to run it in serial with a map, but we should investigate what is the issue behind this.

Formalize Python and OGGM installation and folder structure

We should clearly specify in the README how to and where to install Python, OGGM and PyCall. I'm not sure that these should go in this repo, but we could perhaps create a script to install everything in a correct relative path from this repo. At the very least, we should give a clear indication on what paths and folder structures are expected in order to correctly replicate what ODINN is expecting.

Make Glathida functions work in a `pmap`

Right now, calling get_glathida in a pmap results in:

ERROR: LoadError: PyError ($(Expr(:escape, :(ccall(#= /home/jovyan/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'AttributeError'>
AttributeError("Can't pickle local object 'WeakValueDictionary.__init__.<locals>.remove'")

Though not essential, since it is reasonably fast with a serial execution, it would be nice to fix this and make it work in parallel.

Initialize UA with stable value

UA, the NN to optimize A is currently initialized with random weights. The initial state should produce a stable and realistic A value in order to run the forward model with stable conditions.

Physical constrains in the NN

We know we want to impose the range of the NN that learns the value of A to cover a limited range of plausible values. The current implementation of the loss function involves the loss of misfit between the model and the ground truth and some regularization term l_A that penalizes values outside the plausible range of values for A.

I think this could lead to some optimization problems in cases where the value of A is outside this range, including suboptimal solutions or really slow convergence. For some reason I am not 100% convinced yet, the standard way of doing this with NN is using a bounded activation function in the last layer (sigmoid, tanh) and scale it to be in the desirable range of values (see for example here and here). This requires further investigation.

Potential improvements on `Parameters` subtypes

Instead of having them as a stack of global parameters, gather them in a data structure (struct or Dict) in order to tidy things up.

A as a function of long-term temperature

In order to improve the function of A, it should depend on the long-term surface temperature instead of the MB. We should take the projected CPDDs the climate data from OGGM and run similar tests to the ones we've done so far with Argentière.

Optimize code with `SparseArrays`

Right now the code is quite computationally expensive due to the use of matrices. Since a lot of values of the glacier matrix are 0, we should use the library SparseArrays to reduce the data overhead and see if this optimizes the forward model and consequently the AD with Zygote.

Speed up loss function based on `V`

I have run some benchmarks on the different loss functions based on H, V and HV. The results are the following:

H: 203.710 s (3655336127 allocations: 188.45 GiB)

V: 519.855 s (8589977971 allocations: 471.25 GiB)

HV: 480.228 s (7931932251 allocations: 434.23 GiB)

This clearly indicates that H is the fastest solution. My intuition tells me that including V is slower, possibly due to the fact that it involves dependencies on 2 matrices: Vx and Vy. Since using V should be our preferred option, we should investigate ways to speed this up. Efforts in trying to reduce the number of sampling points to compute the loss (#22) don't seem to yield any advantages. @facusapienza21 to be discussed in today's meeting.

Migrate `SnoopPrecompile.jl` to `PrecompileTools.jl`

Investigate the use of SnoopPrecompile.jl for the ODINN.jl package. So far I have already implemented it in the #invert branch, but I still haven't properly assessed if there is any benefit in terms of performance. I should also check more in detail how to correctly use, since it was just a quick test.

Make parameters local

Right now we still have some parameters as const global variables. This is OK for simplicity's sake for now, but it is annoying when re-running code, as the variables cannot be redefined.

This is also problematic for testing, since we want to make sure that we control the local environment of the simulation.

For all these reasons, we need to pass explicitly all the necessary variables (e.g. B or H₀) in context, in order to avoid having them as const.

Merge common code into functions between `iceflow.jl` and `inversions.jl`

There is some common code between the icefow and inversions workflows, which could be put in common.

This is not a priority, but for the future it would be nice to move more code infrastructure into common functions called by both workflows in order to avoid code redundancy.

Huge RAM usage when using multiprocessing - PyCall and OGGM suspected

I suspect the huge memory usage right now comes from xarray and the opening of huge .nc files with the climate data for each gdir. We should find a way to make this more memory efficient and avoid having too much stuff open at the same time. This might be worth a discussion with @fmaussion to see how it's done in OGGM.

Update format of `gdirs_climate` throughout ODINN to new batched format

The current format of gdirs_climate as a tuple is not compatible with batching. The new format needs to be an Array of tuples, instead of a Tuple of arrays.

So far I have updated the functions in climate.jl in order to support both formats. The new batched format should be extended everywhere and it should become the new norm.

Test for the forward-backward method based on analytical solution

Write a test function for both the forward and backward method(s) using the Halfar solution (see equation 39).

Create `Glacier` or `Gdir` type to store glacier state and initial conditions

Same as for the case of the Simulation type. We have too many basic types going around, and we need to tidy things up. Instead of passing so many matrices and scalars to the solver, we should gather everything in a type Glacier to be passed to simulations.

We have two options here:

Investigate if we can directly use the gdirs, and if this is technically feasible both with OrdinaryDiffEq.jl solvers and with AD in UDEs. The fact that it will be a PyObject might make things complicated.
Otherwise we could just create a Gdir type in Julia to translate it, or just call it Glacier. We should also make sure that types such as mutable structs or Dicts work correctly with solvers and AD. To be thought.

Parallelize ice flow model

After some initial tests with ParallelStencil.jl, it seems that it might be trickier than I thought to apply it with Zygote.jl, since the mutation limitation makes it not trivial.

After a discussion with Ludovic and Samuel, Samuel suggested the following:

Should I initialize the matrices with @ZeroS at each iteration in the loop and then apply the parallelized functions by reference as you did?
Allocations, in particular of big arrays, are something very costly. Thus, you should avoid that whenever possible inside performance critical code. This is why we allocate all needed arrays for computation at the beginning of our applications - outside of any loop. Thus, you should not initialize the matrices at each iteration in the loop but even before the loop.

Is it possible to apply those macros to a function WITH return values? This would allow me to add the matrix initialization within the function in order to keep things tidy, e.g.
No, this is not possible for two reasons: 1. CUDA does not allow kernels to return values; 2. we do not see any interest in coming up with a workaround to support that given that allocations are expensive and we therefore do not see any interest to have them inside kernels.
That said, you can always wrap a @parallel kernel into a normal CPU functions, where you can do whatever you would like to, e.g., to take up your example:

@parallel function _compute_dS!(S, dSdx, dSdy, ∇S²)
    @all(dSdx) = @d_xa(S)/Δx
    @all(dSdy) = @d_ya(S)/Δy
    @all(∇S²) = @av_ya(dSdx).^2 .+ @av_xa(dSdy).^2
end

function compute_dS!(S)
    ref_size = size(S)
    dSdx = @zeros(ref_size .- (1,0))
    dSdy = @zeros(ref_size .- (0,1))
    ∇S² = @zeros(ref_size .- 2)
    @parallel _compute_dS!(S, dSdx, dSdy, ∇S²)
    return dSdx, dSdy, ∇S²
end

However, as noted above, you should pre-allocate these arrays in any case for performance reasons. If you would like to avoid passing these pre-allocated arrays to the functions/kernels, then you have also the possibility to allocate what is needed at first call or when the size changes.

The function gather from ImplicitGlobalGrid allocates for example some internal buffer at first call and reuses it as long as it is big enough: https://github.com/eth-cscs/ImplicitGlobalGrid.jl/blob/master/src/gather.jl

This remains one option, and for what I understood, this would serve to parallelize operations in the PDE solver. However, there is also the possibility to parallelize the simulations for each glacier themselves. I have started working on this using the Distributed.jl library and using pmap. I've started an implementation with a parallel file in order to avoid breaking the main workflow, and I have deployed this in the JupyterHub.

Let's use this thread to keep track of the developments on this side.

Use initial glacier conditions from OGGM

A necessary feature for the toy model will be to use initial glacier conditions from OGGM. So far we've been working with Harry's files, but we need to finally move to the real framework. I have been working on and off on this for a while, so it will be my next big objective.

Update tests and implement CI

Due to all the crazy new developments we haven't updated the tests to keep up with everything. Once we settle on a new stable codebase, we should make new tests to cover the code and implement CI on GitHub.

Initialize NN weights in a more favourable space

We need to find a way to initialize the NN weights in a more favourable space. This would most likely speed up the training. A linear function with a slight positive incline could be a sensitive way to start.

Maybe @redouanelg would like to try this? This could be an easy first issue, very ML-focused :)

Get Millan et al. (2022) ice velocity data from OGGM

A new entity task is available at OGGM to download ice surface velocity data for each gdir. Learn how to use it and add it to the ice rheology inversions pipeline.

Include different numerical methods in the DE solvers and add them as parameter options.

In the current version of the iceflow_*!() functions, we are using an explicit methods where the temporal step $\Delta t$ is choosen in order to be smaller that some upper bound determined by the maximum diffusivity. Then, the diffusivity and consequently the temporal step are dependent of the parameters of the model and this is going to be backpropagated when we compute the gradients. This complicates the differenciation of the method. In order to avoid this, it is important to include the option of an explicit method with fixed (but small) temporal step that is independent of the rest of the parameters.

It remains to include this as a parameter of the model. For a first analysis, I suggest to use explicit instead of explicit-adaptive for differentiation.

Implicit method needed to solve the SIA

The manually coded explicit forward scheme is super unstable and cannot solve the SIA, as it is a stiff PDE. An implicit method is needed, so ideally we should try:

To develop a simple implicit scheme in order to correctly understand how it works.
To use DifferentialEquations.jl and DiffEqFlux.jl in order to solve the stiff SIA PDE.

Currently being investigated with @facusapienza21 .

Create workflow to download massive `gdir` data using `wget`

In order to speed up the download of large amounts of data related to gdirs it is better to use a routine based on wget, pulling from this url: https://cluster.klima.uni-bremen.de/~oggm/gdirs/oggm_v1.6/L1-L2_files/elev_bands/RGI62/b_010/L2/

The command should be the following one:

wget --recursive --no-parent --cut-dirs=3 -nH -R "index.html*" https://cluster.klima.uni-bremen.de/~oggm/path/to/your/prepro/

Global Parameter setting file

Eventually, we want to have all the global parameters of the model (A, n, rho, g) defined in a file that we can read from any of our simulations. This can also be useful to set some of the libraries we want to import.

Investigate loss function with fewer random sampling points

There's the hypothesis that computing the loss with fewer points, instead of using the full H or V matrix can result in an improved performance during backpropagation. We should test this and see if this also can act as regularization. It would be nice to come up with a sweet spot between performance and accuracy.

Create fake A(ELA) function

Create a fake function to generate A with respect to ELAs (proxy of mass balance and temperature).

Avoid instantiating packages each time we start the server

Right now we have to instantiate all the packages each time we start a new JupyterHub server. We should find a way to keep that stored in the server somehow.

LoadError: Mutating arrays is not supported using Zygote

Find a way to update H in the forward model using Zygote. Zygote cannot calculate multiple gradients if the H matrix is overwritten at each timestep.

A very similar issue is discussed here: https://discourse.julialang.org/t/mutating-arrays-not-supported/42123/10

And potential solutions are discussed here: https://github.com/rakeshvar/Zygote-Mutating-Arrays-WorkAround.jl

Train on multiple glaciers

For the preprint, it would be nice to perform the training using multiple glaciers and with real climate series for temperature data. Using around 10 different glaciers would be enough, in order to reproduce the current fake dataset of 9 different temperature series ranging -20:0 ºC.

Loss function using surface velocities

Using a loss function based on the ice surface velocities gives the following error:

Backpropagation
ERROR: LoadError: MethodError: no method matching zero(::Type{Any})
Closest candidates are:
  zero(::Type{Union{Missing, T}}) where T at missing.jl:105
  zero(::Union{Type{P}, P}) where P<:Dates.Period at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Dates/src/periods.jl:53
  zero(::FillArrays.Ones{T, N, Axes} where Axes) where {T, N} at /Users/Bolib001/.julia/packages/FillArrays/VLeUk/src/FillArrays.jl:540
  ...
Stacktrace:
  [1] zero(#unused#::Type{Any})
    @ Base ./missing.jl:106
  [2] zero(x::Vector{Any})
    @ Base ./abstractarray.jl:1085
  [3] _backmean(xs::Vector{Any}, Δ::Matrix{Float64}, #unused#::Colon)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/lib/array.jl:311
  [4] (::Zygote.var"#649#650"{Colon, Vector{Any}})(Δ::Matrix{Float64})
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/lib/array.jl:309
  [5] (::Zygote.var"#2763#back#651"{Zygote.var"#649#650"{Colon, Vector{Any}}})(Δ::Matrix{Float64})
    @ Zygote ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67
  [6] Pullback
    @ ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow.jl:87 [inlined]
  [7] (::typeof(∂(loss)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:0
  [8] Pullback
    @ ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow.jl:43 [inlined]
  [9] (::typeof(∂(λ)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:0
 [10] (::Zygote.var"#84#85"{Params, typeof(∂(λ)), Zygote.Context})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface.jl:343
 [11] hybrid_train!(loss::typeof(loss), glacier_ref::Dict{String, Vector{T} where T}, UA::Chain{Tuple{Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, Dense{var"#530#537", Matrix{Float64}, Vector{Float64}}, Dense{var"#531#538", Matrix{Float64}, Vector{Float64}}, Dense{var"#sigmoid_A#534"{Int64, Float64}, Matrix{Float32}, Vector{Float32}}}}, opt::ADAM, H::Matrix{Float32}, p::Tuple{Int64, Int64, Float64, Float64, Matrix{Float32}, Vector{Float64}, Float64, Int64}, t::Int64, t₁::Int64)
    @ Main ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow.jl:53
 [12] macro expansion

We should investigate this after AGU, as it makes way more sense to train the model based on surface velocities rather than the final ice thickness state.

Restart Julia for PyCall to work

Right now, in the current implementation of initialize_ODINN(), the configuration of PyCall doesn't work from scratch. After building some packages, the code doesn't find the right Pythong path, and it is necessary to close the Julia session and run the code again in order to correctly access it. We should find a workaround for this, and see where the issue comes from.

Merge redundant code related to `DiscreteCallback`

A first raw implementation of a DiscreteCallback to add the monthly MB has been added. Right now, the code is pretty much repeated between the PDE and the UDE for practical purposes. Moreover, some functions are declared within functions.

See how to properly code this in a clean way to avoid duplication and enhance modularity.

Generate training dataset with fake A function

Generate forward simulation of ice thickness and velocity fields, with around 4 snapshots for the whole simulation to train the UDEs.

Stop using random mass balances and use sequential MBs and ELAs in order to have a reproducible behaviour.

A as a function of the average of the ELAs of the last 5 years -> account for delays and slow transitions in ice temperature.

Mass balance needs to be added in small timesteps in forward model

The current raw model includes Harry's annual MB in annual timesteps, so the full annual MB is added in a single dt. The annual MB should be interpolated in smaller time steps (e.g. daily, weekly or monthly), and then be added in the forward model in a similar way that the one I coded for annual MB.

Move all dates to calendar years

So far some code is implemented using hydrological years. Following the recent migration from OGGM to calendar years, we should really follow along in order to avoid future issues.

Create ODINN package

Create modules for each file and generate package in order to be able to easily call ODINN from other repos (e.g. notebook).

The value of A in Glen's Law

The value of the diffusivity $\Gamma$ in the SIA equation is proportional to A, the constant in Glen's Law.

In our simulations we are using a different value (1.3e-24) of the one recommended in Cuffey (2010) (Chapter 3) they recommend do use A = 2.4e-24 [1 / s Pa^3].

Create Julia data structure for `gdir`

We need a Julia data structure to store some of the information from gdirs. To be seen if should be a mutable struct or what.

Simulation high level functions should take those for the simulations, instead of just more basic data structures.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

The NN only optimizes the value of A
The NN is set to generate the whole diffusivity component. This should possibly be left for later stages.