odinn-sciml / odinn.jl Goto Github PK
View Code? Open in Web Editor NEWGlobal glacier model using Universal Differential Equations for climate-glacier interactions
License: MIT License
Global glacier model using Universal Differential Equations for climate-glacier interactions
License: MIT License
Right now the generate_batches
function is more or less hardcoded to a given number of input features. Although not straightforward, since the inputs are a combination of Vector
s and scalars, a variable number of input features should be automatically assimilating by using varargs
:
function generate_batches(batch_size, θ, UD, features...; shuffle=true)
end
When running toy_model.jl
and having the OGGM environment correctly working with PyCall, I receive the following error coming from Fiona when executing an entity task to download and process glacier data:
in expression starting at /Users/facundosapienza/Dropbox/ODINN-dev/ODINN.jl/scripts/toy_model.jl:149
caused by: PyError ($(Expr(:escape, :(ccall(#= /Users/facundosapienza/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'AttributeError'>
AttributeError("partially initialized module 'fiona' has no attribute '_loading' (most likely due to a circular import)")
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 558, in init_glacier_directories
gdirs = execute_entity_task(utils.GlacierDirectory, entities,
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 191, in execute_entity_task
out = [pc(gdir) for gdir in gdirs]
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 191, in <listcomp>
out = [pc(gdir) for gdir in gdirs]
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 108, in __call__
res = self._call_internal(func, arg, kwargs)
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 102, in _call_internal
return call_func(gdir, **kwargs)
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/utils/_workflow.py", line 2468, in __init__
rgi_entity = self._read_shapefile_from_path(_shp)
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/utils/_workflow.py", line 3082, in _read_shapefile_from_path
shp = gpd.read_file(fp)
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/geopandas/io/file.py", line 242, in _read_file
engine = _check_engine(engine, "'read_file' function")
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/geopandas/io/file.py", line 97, in _check_engine
_import_fiona()
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/geopandas/io/file.py", line 40, in _import_fiona
import fiona
File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/fiona/__init__.py", line 85, in <module>
with fiona._loading.add_gdal_dll_directories():
Do you have any idea of how to fix this @JordiBolibar @fmaussion ? I am able to run the entity task and download the data, but the error happen immediately after. I observed this error just when I run ODINN in my local computer (I never observed this problem when working in the JupyterHub).
Right now we are training the UDE based on the full H
matrix. This seems reasonable for this toy model, but we should investigate potentially more efficient ways to do so (i.e. using batches).
We should improve the interface of how we launch simulations. In order to have everything in a more tidy way. We should create data structures that gather multiple parameters driving simulations.
Here's a draft of the prototype I have in mind:
Model parameters
PhysicalParameters
struct to hold all parameters related to model physics.
Hyperparameters
struct holding all the hyperparameters from the neural network.
SolverParameters
struct holding all parameters to configure the solver.
UDEparameters
struct holding all parameters for the training of the UDE. It's a child of SolverParameters
, since it also includes a solver.
OGGMparameters
struct wrapping OGGM's parameters
SimulationParameters
struct holding the parameters defining a simulation
Parameters
: struct holding all the parameter types.
Models
Machine
struct holding an ML model (e.g. a neural network) + a Hyperparameters
struct. Inspired by MLJ.jl
. When initialized inside a Model
, it automatically fetches the Hyperparameters
to generate the NN.
MBmodel
struct defining a mass balance model. It can have children like TImodel
.
IceflowModel
struct defining an ice flow model. It can have children, like SIAmodel
Model
struct, contains all information about the PhysicalParameters
, SolverParameters
, UDEparameters
, OGGMparameters
, Machine
, MBmodel
, IceflowModel
Glaciers, intial conditions and climate data
Climate
struct containing climate data series for a Glacier
.
Glaciers
struct containing OGGM's gdir
s and additional information on the Julia side, for example its Climate
.
Simulations
Simulation
(inmutable?) struct holding all the data necessary for a simulation: Model
, Glacier
s, its type (functional inversion
, inversion
or forward
).To be run such as:
# Produce everything needed for a simulation
# First we create the model parameters
parameters = Parameters(; physics = PhysicsParameters(),
hyper = Hyperparameters(),
solver = SolverParameters(),
UDE = UDEparameters(),
OGGM = OGGMparameters(workspace_path),
simulation = SimulationParameters())
# Then we generate the models to be run
model = Model(IceflowModel(), MBmodel(), Machine(), params)
glaciers = Glaciers(rgi_ids)
add_climate!(Glaciers, time_period)
# We create a simulation based on everything we generated before
simulation = Simulation(model, glaciers, parameters)
# We finally run the simulation
run!(simulation)
Beyond this, we should also accommodate an API similar to OGGM's, where everything is based on entity_tasks
around glacier directories. Since Julia is not OO, entity tasks would just be functions with a common interface, run on Glaciers
. Something like:
entity_tasks = [ODINN.tasks.DoSomethingToGlaciers(), ODINN.tasks.DoSomethingElseToGlaciers()]
apply_entity_tasks!(glaciers, entity_tasks)
After some discussions in the Julia Discourse regarding Float64
vs Float32
, some adjustments should be made to the code in order to avoid some performance pitfalls. Those are:
Float64
. Apparently, unless one is using GPUs, there's no benefit on that for CPUs.Int
in exponents (e.g. in the flow law). See explanations in Discourse.iceflow_sandbox
model.We need to double check that all the packages we're using are still relevant. This slows down precompilation time a lot.
Glacier matrix data includes a great number of grid cells with zeros. Once the model will be upscaled to multiple glaciers, using Sparse Arrays will help save a lot of memory.
get_climate
returns a serialization error when run in parallel with a pmap
. For now the workaround is just to run it in serial with a map
, but we should investigate what is the issue behind this.
We should clearly specify in the README how to and where to install Python, OGGM and PyCall. I'm not sure that these should go in this repo, but we could perhaps create a script to install everything in a correct relative path from this repo. At the very least, we should give a clear indication on what paths and folder structures are expected in order to correctly replicate what ODINN is expecting.
Right now, calling get_glathida
in a pmap
results in:
ERROR: LoadError: PyError ($(Expr(:escape, :(ccall(#= /home/jovyan/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'AttributeError'>
AttributeError("Can't pickle local object 'WeakValueDictionary.__init__.<locals>.remove'")
Though not essential, since it is reasonably fast with a serial execution, it would be nice to fix this and make it work in parallel.
UA
, the NN to optimize A
is currently initialized with random weights. The initial state should produce a stable and realistic A
value in order to run the forward model with stable conditions.
We know we want to impose the range of the NN that learns the value of A to cover a limited range of plausible values. The current implementation of the loss function involves the loss of misfit between the model and the ground truth and some regularization term l_A
that penalizes values outside the plausible range of values for A.
I think this could lead to some optimization problems in cases where the value of A is outside this range, including suboptimal solutions or really slow convergence. For some reason I am not 100% convinced yet, the standard way of doing this with NN is using a bounded activation function in the last layer (sigmoid, tanh) and scale it to be in the desirable range of values (see for example here and here). This requires further investigation.
Instead of having them as a stack of global parameters, gather them in a data structure (struct
or Dict
) in order to tidy things up.
In order to improve the function of A
, it should depend on the long-term surface temperature instead of the MB. We should take the projected CPDDs the climate data from OGGM and run similar tests to the ones we've done so far with Argentière.
Right now the code is quite computationally expensive due to the use of matrices. Since a lot of values of the glacier matrix are 0, we should use the library SparseArrays
to reduce the data overhead and see if this optimizes the forward model and consequently the AD with Zygote.
I have run some benchmarks on the different loss functions based on H
, V
and HV
. The results are the following:
H
: 203.710 s (3655336127 allocations: 188.45 GiB)
V
: 519.855 s (8589977971 allocations: 471.25 GiB)
HV
: 480.228 s (7931932251 allocations: 434.23 GiB)
This clearly indicates that H
is the fastest solution. My intuition tells me that including V
is slower, possibly due to the fact that it involves dependencies on 2 matrices: Vx
and Vy
. Since using V
should be our preferred option, we should investigate ways to speed this up. Efforts in trying to reduce the number of sampling points to compute the loss (#22) don't seem to yield any advantages. @facusapienza21 to be discussed in today's meeting.
Investigate the use of SnoopPrecompile.jl
for the ODINN.jl
package. So far I have already implemented it in the #invert branch, but I still haven't properly assessed if there is any benefit in terms of performance. I should also check more in detail how to correctly use, since it was just a quick test.
Right now we still have some parameters as const
global variables. This is OK for simplicity's sake for now, but it is annoying when re-running code, as the variables cannot be redefined.
This is also problematic for testing, since we want to make sure that we control the local environment of the simulation.
For all these reasons, we need to pass explicitly all the necessary variables (e.g. B
or H₀
) in context
, in order to avoid having them as const
.
There is some common code between the icefow
and inversions
workflows, which could be put in common.
This is not a priority, but for the future it would be nice to move more code infrastructure into common functions called by both workflows in order to avoid code redundancy.
I suspect the huge memory usage right now comes from xarray and the opening of huge .nc files with the climate data for each gdir. We should find a way to make this more memory efficient and avoid having too much stuff open at the same time. This might be worth a discussion with @fmaussion to see how it's done in OGGM.
The current format of gdirs_climate
as a tuple is not compatible with batching. The new format needs to be an Array of tuples, instead of a Tuple of arrays.
So far I have updated the functions in climate.jl
in order to support both formats. The new batched format should be extended everywhere and it should become the new norm.
Write a test function for both the forward and backward method(s) using the Halfar solution (see equation 39).
Same as for the case of the Simulation
type. We have too many basic types going around, and we need to tidy things up. Instead of passing so many matrices and scalars to the solver, we should gather everything in a type Glacier
to be passed to simulations.
We have two options here:
gdirs
, and if this is technically feasible both with OrdinaryDiffEq.jl solvers and with AD in UDEs. The fact that it will be a PyObject
might make things complicated.Gdir
type in Julia to translate it, or just call it Glacier
. We should also make sure that types such as mutable structs or Dicts work correctly with solvers and AD. To be thought.After some initial tests with ParallelStencil.jl, it seems that it might be trickier than I thought to apply it with Zygote.jl, since the mutation limitation makes it not trivial.
After a discussion with Ludovic and Samuel, Samuel suggested the following:
- Should I initialize the matrices with @ZeroS at each iteration in the loop and then apply the parallelized functions by reference as you did?
Allocations, in particular of big arrays, are something very costly. Thus, you should avoid that whenever possible inside performance critical code. This is why we allocate all needed arrays for computation at the beginning of our applications - outside of any loop. Thus, you should not initialize the matrices at each iteration in the loop but even before the loop.
- Is it possible to apply those macros to a function WITH return values? This would allow me to add the matrix initialization within the function in order to keep things tidy, e.g.
No, this is not possible for two reasons: 1. CUDA does not allow kernels to return values; 2. we do not see any interest in coming up with a workaround to support that given that allocations are expensive and we therefore do not see any interest to have them inside kernels.
That said, you can always wrap a @parallel kernel into a normal CPU functions, where you can do whatever you would like to, e.g., to take up your example:
@parallel function _compute_dS!(S, dSdx, dSdy, ∇S²)
@all(dSdx) = @d_xa(S)/Δx
@all(dSdy) = @d_ya(S)/Δy
@all(∇S²) = @av_ya(dSdx).^2 .+ @av_xa(dSdy).^2
end
function compute_dS!(S)
ref_size = size(S)
dSdx = @zeros(ref_size .- (1,0))
dSdy = @zeros(ref_size .- (0,1))
∇S² = @zeros(ref_size .- 2)
@parallel _compute_dS!(S, dSdx, dSdy, ∇S²)
return dSdx, dSdy, ∇S²
end
However, as noted above, you should pre-allocate these arrays in any case for performance reasons. If you would like to avoid passing these pre-allocated arrays to the functions/kernels, then you have also the possibility to allocate what is needed at first call or when the size changes.
The function gather from ImplicitGlobalGrid allocates for example some internal buffer at first call and reuses it as long as it is big enough: https://github.com/eth-cscs/ImplicitGlobalGrid.jl/blob/master/src/gather.jl
This remains one option, and for what I understood, this would serve to parallelize operations in the PDE solver. However, there is also the possibility to parallelize the simulations for each glacier themselves. I have started working on this using the Distributed.jl library and using pmap
. I've started an implementation with a parallel file in order to avoid breaking the main workflow, and I have deployed this in the JupyterHub.
Let's use this thread to keep track of the developments on this side.
A necessary feature for the toy model will be to use initial glacier conditions from OGGM. So far we've been working with Harry's files, but we need to finally move to the real framework. I have been working on and off on this for a while, so it will be my next big objective.
Due to all the crazy new developments we haven't updated the tests to keep up with everything. Once we settle on a new stable codebase, we should make new tests to cover the code and implement CI on GitHub.
We need to find a way to initialize the NN weights in a more favourable space. This would most likely speed up the training. A linear function with a slight positive incline could be a sensitive way to start.
Maybe @redouanelg would like to try this? This could be an easy first issue, very ML-focused :)
A new entity task is available at OGGM to download ice surface velocity data for each gdir
. Learn how to use it and add it to the ice rheology inversions pipeline.
In the current version of the iceflow_*!()
functions, we are using an explicit methods where the temporal step $\Delta t$
is choosen in order to be smaller that some upper bound determined by the maximum diffusivity. Then, the diffusivity and consequently the temporal step are dependent of the parameters of the model and this is going to be backpropagated when we compute the gradients. This complicates the differenciation of the method. In order to avoid this, it is important to include the option of an explicit method with fixed (but small) temporal step that is independent of the rest of the parameters.
It remains to include this as a parameter of the model. For a first analysis, I suggest to use explicit
instead of explicit-adaptive
for differentiation.
The manually coded explicit forward scheme is super unstable and cannot solve the SIA, as it is a stiff PDE. An implicit method is needed, so ideally we should try:
Currently being investigated with @facusapienza21 .
In order to speed up the download of large amounts of data related to gdirs
it is better to use a routine based on wget
, pulling from this url: https://cluster.klima.uni-bremen.de/~oggm/gdirs/oggm_v1.6/L1-L2_files/elev_bands/RGI62/b_010/L2/
The command should be the following one:
wget --recursive --no-parent --cut-dirs=3 -nH -R "index.html*" https://cluster.klima.uni-bremen.de/~oggm/path/to/your/prepro/
Eventually, we want to have all the global parameters of the model (A, n, rho, g) defined in a file that we can read from any of our simulations. This can also be useful to set some of the libraries we want to import.
There's the hypothesis that computing the loss with fewer points, instead of using the full H
or V
matrix can result in an improved performance during backpropagation. We should test this and see if this also can act as regularization. It would be nice to come up with a sweet spot between performance and accuracy.
Create a fake function to generate A with respect to ELAs (proxy of mass balance and temperature).
Right now we have to instantiate all the packages each time we start a new JupyterHub server. We should find a way to keep that stored in the server somehow.
Find a way to update H in the forward model using Zygote. Zygote cannot calculate multiple gradients if the H matrix is overwritten at each timestep.
A very similar issue is discussed here: https://discourse.julialang.org/t/mutating-arrays-not-supported/42123/10
And potential solutions are discussed here: https://github.com/rakeshvar/Zygote-Mutating-Arrays-WorkAround.jl
For the preprint, it would be nice to perform the training using multiple glaciers and with real climate series for temperature data. Using around 10 different glaciers would be enough, in order to reproduce the current fake dataset of 9 different temperature series ranging -20:0
ºC.
Using a loss function based on the ice surface velocities gives the following error:
Backpropagation
ERROR: LoadError: MethodError: no method matching zero(::Type{Any})
Closest candidates are:
zero(::Type{Union{Missing, T}}) where T at missing.jl:105
zero(::Union{Type{P}, P}) where P<:Dates.Period at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Dates/src/periods.jl:53
zero(::FillArrays.Ones{T, N, Axes} where Axes) where {T, N} at /Users/Bolib001/.julia/packages/FillArrays/VLeUk/src/FillArrays.jl:540
...
Stacktrace:
[1] zero(#unused#::Type{Any})
@ Base ./missing.jl:106
[2] zero(x::Vector{Any})
@ Base ./abstractarray.jl:1085
[3] _backmean(xs::Vector{Any}, Δ::Matrix{Float64}, #unused#::Colon)
@ Zygote ~/.julia/packages/Zygote/rv6db/src/lib/array.jl:311
[4] (::Zygote.var"#649#650"{Colon, Vector{Any}})(Δ::Matrix{Float64})
@ Zygote ~/.julia/packages/Zygote/rv6db/src/lib/array.jl:309
[5] (::Zygote.var"#2763#back#651"{Zygote.var"#649#650"{Colon, Vector{Any}}})(Δ::Matrix{Float64})
@ Zygote ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67
[6] Pullback
@ ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow.jl:87 [inlined]
[7] (::typeof(∂(loss)))(Δ::Float64)
@ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:0
[8] Pullback
@ ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow.jl:43 [inlined]
[9] (::typeof(∂(λ)))(Δ::Float64)
@ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:0
[10] (::Zygote.var"#84#85"{Params, typeof(∂(λ)), Zygote.Context})(Δ::Float64)
@ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface.jl:343
[11] hybrid_train!(loss::typeof(loss), glacier_ref::Dict{String, Vector{T} where T}, UA::Chain{Tuple{Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, Dense{var"#530#537", Matrix{Float64}, Vector{Float64}}, Dense{var"#531#538", Matrix{Float64}, Vector{Float64}}, Dense{var"#sigmoid_A#534"{Int64, Float64}, Matrix{Float32}, Vector{Float32}}}}, opt::ADAM, H::Matrix{Float32}, p::Tuple{Int64, Int64, Float64, Float64, Matrix{Float32}, Vector{Float64}, Float64, Int64}, t::Int64, t₁::Int64)
@ Main ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow.jl:53
[12] macro expansion
We should investigate this after AGU, as it makes way more sense to train the model based on surface velocities rather than the final ice thickness state.
Right now, in the current implementation of initialize_ODINN()
, the configuration of PyCall doesn't work from scratch. After building some packages, the code doesn't find the right Pythong path, and it is necessary to close the Julia session and run the code again in order to correctly access it. We should find a workaround for this, and see where the issue comes from.
A first raw implementation of a DiscreteCallback
to add the monthly MB has been added. Right now, the code is pretty much repeated between the PDE and the UDE for practical purposes. Moreover, some functions are declared within functions.
See how to properly code this in a clean way to avoid duplication and enhance modularity.
Generate forward simulation of ice thickness and velocity fields, with around 4 snapshots for the whole simulation to train the UDEs.
Stop using random mass balances and use sequential MBs and ELAs in order to have a reproducible behaviour.
A
as a function of the average of the ELAs of the last 5 years -> account for delays and slow transitions in ice temperature.
The current raw model includes Harry's annual MB in annual timesteps, so the full annual MB is added in a single dt. The annual MB should be interpolated in smaller time steps (e.g. daily, weekly or monthly), and then be added in the forward model in a similar way that the one I coded for annual MB.
So far some code is implemented using hydrological years. Following the recent migration from OGGM to calendar years, we should really follow along in order to avoid future issues.
Create modules for each file and generate package in order to be able to easily call ODINN from other repos (e.g. notebook).
The value of the diffusivity
In our simulations we are using a different value (1.3e-24) of the one recommended in Cuffey (2010) (Chapter 3) they recommend do use A = 2.4e-24 [1 / s Pa^3].
We need a Julia data structure to store some of the information from gdirs
. To be seen if should be a mutable struct or what.
Simulation high level functions should take those for the simulations, instead of just more basic data structures.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
The training seems to be struggling with OGGM initial glacier states. It might be either due to data resolution, different glacier topography or other numerical problems.
I'm investigating ways to normalize the loss function by dividing the magnitude of the computed and reference H
and Vx
and Vy
by themselves. This will give equal weight to large and small values.
To accompany the preprint and the first release, we should create a nice Jupyter notebook based on ODINN.jl
(i.e. a high-level interface), showcasing how ODINN works and how it can train with multiple glaciers at the same time.
Train an UDE to optimize the value of A based on ELA values. Customize the loss function in order to produce results which are physically plausible.
Two options can be investigated:
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.