odinn-sciml / odinn.jl

Global glacier model using Universal Differential Equations for climate-glacier interactions

License: MIT License

Languages: Julia 98.06%, Jupyter Notebook 1.94%
Topics: climate, cryosphere, differential-equations, glaciers, julia, scientific-machine-learning, sciml

odinn.jl's People

Contributors: facusapienza21, github-actions[bot], jordibolibar

odinn.jl's Issues

Make `generate_batches` function general by using `varargs`

Right now the generate_batches function is more or less hardcoded to a given number of input features. Although not straightforward, since the inputs are a combination of Vectors and scalars, a variable number of input features could be accommodated automatically by using varargs:

function generate_batches(batch_size, θ, UD, features...; shuffle=true)

end
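
A minimal sketch of how this could look, assuming Flux's DataLoader handles the batching (the scalar-expansion logic here is hypothetical, not the actual ODINN implementation):

using Flux: DataLoader

function generate_batches(batch_size, θ, UD, features...; shuffle=true)
    # Number of samples, inferred from the Vector features
    n = maximum(length(f) for f in features if f isa AbstractVector)
    # Repeat scalars so every feature has one entry per sample
    expanded = map(f -> f isa AbstractVector ? f : fill(f, n), features)
    # DataLoader batches along the last dimension of each element
    return DataLoader((fill(θ, n), fill(UD, n), expanded...);
                      batchsize = batch_size, shuffle = shuffle)
end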

Bug with integration Fiona + OGGM + ODINN

When running toy_model.jl and having the OGGM environment correctly working with PyCall, I receive the following error coming from Fiona when executing an entity task to download and process glacier data:

in expression starting at /Users/facundosapienza/Dropbox/ODINN-dev/ODINN.jl/scripts/toy_model.jl:149

caused by: PyError ($(Expr(:escape, :(ccall(#= /Users/facundosapienza/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'AttributeError'>
AttributeError("partially initialized module 'fiona' has no attribute '_loading' (most likely due to a circular import)")
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 558, in init_glacier_directories
    gdirs = execute_entity_task(utils.GlacierDirectory, entities,
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 191, in execute_entity_task
    out = [pc(gdir) for gdir in gdirs]
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 191, in <listcomp>
    out = [pc(gdir) for gdir in gdirs]
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 108, in __call__
    res = self._call_internal(func, arg, kwargs)
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/workflow.py", line 102, in _call_internal
    return call_func(gdir, **kwargs)
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/utils/_workflow.py", line 2468, in __init__
    rgi_entity = self._read_shapefile_from_path(_shp)
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/oggm/utils/_workflow.py", line 3082, in _read_shapefile_from_path
    shp = gpd.read_file(fp)
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/geopandas/io/file.py", line 242, in _read_file
    engine = _check_engine(engine, "'read_file' function")
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/geopandas/io/file.py", line 97, in _check_engine
    _import_fiona()
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/geopandas/io/file.py", line 40, in _import_fiona
    import fiona
  File "/Users/facundosapienza/envs/oggm_env/lib/python3.9/site-packages/fiona/__init__.py", line 85, in <module>
    with fiona._loading.add_gdal_dll_directories():

Do you have any idea how to fix this, @JordiBolibar @fmaussion? I am able to run the entity task and download the data, but the error happens immediately after. I have only observed this error when running ODINN on my local computer (never when working on the JupyterHub).

Investigate use of batches for UDE training

Right now we are training the UDE based on the full H matrix. This seems reasonable for this toy model, but we should investigate potentially more efficient ways to do so (e.g. using batches).

New API

We should improve the interface used to launch simulations in order to keep everything tidier. We should create data structures that gather the multiple parameters driving simulations.

Here's a draft of the prototype I have in mind:

Model parameters

  • PhysicalParameters struct to hold all parameters related to model physics.

  • Hyperparameters struct holding all the hyperparameters from the neural network.

  • SolverParameters struct holding all parameters to configure the solver.

  • UDEparameters struct holding all parameters for the training of the UDE. It's a child of SolverParameters, since it also includes a solver.

  • OGGMparameters struct wrapping OGGM's parameters

  • SimulationParameters struct holding the parameters defining a simulation

  • Parameters: struct holding all the parameter types.

Models

  • Machine struct holding an ML model (e.g. a neural network) + a Hyperparameters struct. Inspired by MLJ.jl. When initialized inside a Model, it automatically fetches the Hyperparameters to generate the NN.

  • MBmodel struct defining a mass balance model. It can have children like TImodel.

  • IceflowModel struct defining an ice flow model. It can have children, like SIAmodel.

  • Model struct containing all the information about the PhysicalParameters, SolverParameters, UDEparameters, OGGMparameters, Machine, MBmodel and IceflowModel.

Glaciers, initial conditions and climate data

  • Climate struct containing climate data series for a Glacier.

  • Glaciers struct containing OGGM's gdirs and additional information on the Julia side, for example its Climate.

Simulations

  • Simulation (immutable?) struct holding all the data necessary for a simulation: Model, Glaciers, its type (functional inversion, inversion or forward).

To be run as follows:

# Produce everything needed for a simulation
# First we create the model parameters
parameters = Parameters(; physics = PhysicalParameters(),
                          hyper = Hyperparameters(),
                          solver = SolverParameters(),
                          UDE = UDEparameters(),
                          OGGM = OGGMparameters(workspace_path),
                          simulation = SimulationParameters())

# Then we generate the models to be run
model = Model(IceflowModel(), MBmodel(), Machine(), parameters)
glaciers = Glaciers(rgi_ids)
add_climate!(glaciers, time_period)

# We create a simulation based on everything we generated before
simulation = Simulation(model, glaciers, parameters)

# We finally run the simulation
run!(simulation)

Beyond this, we should also accommodate an API similar to OGGM's, where everything is based on entity tasks run on glacier directories. Since Julia is not object-oriented, entity tasks would just be functions with a common interface, run on Glaciers. Something like:

entity_tasks = [ODINN.tasks.DoSomethingToGlaciers(), ODINN.tasks.DoSomethingElseToGlaciers()]
apply_entity_tasks!(glaciers, entity_tasks)

Properly optimize Julia code

After some discussions in the Julia Discourse regarding Float64 vs Float32, some adjustments should be made to the code in order to avoid some performance pitfalls. Those are:

  • Move all code back to Float64. Apparently, unless one is using GPUs, Float32 brings no performance benefit on CPUs.
  • Use Int exponents (e.g. in the flow law), as in the sketch below this list. See the explanations on Discourse.
  • Avoid memory allocations in util functions, following the example in the iceflow_sandbox model.
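
A minimal sketch of the Int-exponent point (values illustrative): with an Int exponent Julia can lower the power to repeated multiplications, while a Float64 exponent goes through the slower generic `^`.

A = 2.4e-24                          # Glen's law coefficient [1/(s Pa^3)]
n = 3                                # flow-law exponent as an Int, not 3.0
ρ, g = 900.0, 9.81                   # ice density and gravity
Γ = 2A * (ρ * g)^n / (n + 2)         # SIA diffusivity prefactor
H = 100 .* rand(100, 100)            # ice thickness
∇S = rand(100, 100)                  # surface slope magnitude
D = Γ .* H.^(n + 2) .* ∇S.^(n - 1)   # Int exponents throughout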

Clean up package dependencies

We need to double-check that all the packages we're using are still relevant. Unused dependencies slow down precompilation a lot.

Error in getting climate data in parallel

get_climate returns a serialization error when run in parallel with a pmap. For now the workaround is just to run it serially with a map, but we should investigate the issue behind this.

Formalize Python and OGGM installation and folder structure

We should clearly specify in the README how and where to install Python, OGGM and PyCall. I'm not sure these instructions should go in this repo, but we could perhaps create a script to install everything in a correct relative path from this repo. At the very least, we should give a clear indication of the paths and folder structures ODINN expects, so the setup can be correctly replicated.

Make Glathida functions work in a `pmap`

Right now, calling get_glathida in a pmap results in:

ERROR: LoadError: PyError ($(Expr(:escape, :(ccall(#= /home/jovyan/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'AttributeError'>
AttributeError("Can't pickle local object 'WeakValueDictionary.__init__.<locals>.remove'")

Though not essential, since it is reasonably fast with a serial execution, it would be nice to fix this and make it work in parallel.

Initialize UA with stable value

UA, the NN used to optimize A, is currently initialized with random weights. The initial state should produce a stable and realistic A value in order to run the forward model under stable conditions.

Physical constraints in the NN

We want to constrain the output of the NN that learns the value of A to a limited range of plausible values. The current implementation of the loss function combines a misfit term between the model and the ground truth with a regularization term l_A that penalizes values outside the plausible range for A.

I think this could lead to optimization problems in cases where the value of A is outside this range, including suboptimal solutions or really slow convergence. Although for some reason I am not 100% convinced yet, the standard way of doing this with NNs is to use a bounded activation function in the last layer (sigmoid, tanh) and scale it to the desired range of values (see for example here and here). This requires further investigation.
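
A minimal sketch of the bounded-output idea in Flux (the bounds are illustrative, not the values used in ODINN):

using Flux

minA, maxA = 8e-26, 8e-24                  # illustrative plausible bounds for A
scaled_sigmoid(x) = minA + (maxA - minA) * Flux.sigmoid(x)

UA = Chain(
    Dense(1 => 10, relu),
    Dense(10 => 10, relu),
    Dense(10 => 1, scaled_sigmoid),        # output constrained to [minA, maxA]
)

UA([0.5f0])   # always returns a value inside the plausible range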

A as a function of long-term temperature

In order to improve the function of A, it should depend on the long-term surface temperature instead of the MB. We should take the projected CPDDs from the OGGM climate data and run tests similar to the ones we've done so far with Argentière.

Optimize code with `SparseArrays`

Right now the code is quite computationally expensive due to the use of matrices. Since a lot of the values in the glacier matrix are 0, we should use the SparseArrays library to reduce the data overhead and see if this optimizes the forward model and, consequently, the AD with Zygote.
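
A minimal sketch (sizes and values illustrative): only a small region of the domain holds ice, so a sparse representation stores just the non-zero entries.

using SparseArrays

H = zeros(200, 200)
H[80:120, 80:120] .= 50.0    # ice only inside the glacier outline
Hs = sparse(H)               # compressed sparse column storage
nnz(Hs)                      # number of stored non-zeros: 41 * 41 = 1681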

Speed up loss function based on `V`

I have run some benchmarks on the different loss functions based on H, V and HV. The results are the following:

H: 203.710 s (3655336127 allocations: 188.45 GiB)

V: 519.855 s (8589977971 allocations: 471.25 GiB)

HV: 480.228 s (7931932251 allocations: 434.23 GiB)

This clearly indicates that H is the fastest option. My intuition tells me that including V is slower, possibly because it involves dependencies on two matrices: Vx and Vy. Since using V should be our preferred option, we should investigate ways to speed this up. Efforts to reduce the number of sampling points used to compute the loss (#22) don't seem to yield any advantages. @facusapienza21 to be discussed in today's meeting.

Migrate `SnoopPrecompile.jl` to `PrecompileTools.jl`

Investigate the use of SnoopPrecompile.jl for the ODINN.jl package. So far I have already implemented it in the #invert branch, but I still haven't properly assessed whether there is any benefit in terms of performance. I should also check in more detail how to use it correctly, since it was just a quick test.
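
A minimal sketch of the PrecompileTools.jl pattern we would migrate to (the workload shown is hypothetical; it should exercise ODINN's real hot paths):

module ODINN

using PrecompileTools

@setup_workload begin
    # setup code runs at precompile time but is not itself cached
    @compile_workload begin
        # calls placed here are compiled and cached in the package image,
        # e.g. a tiny forward simulation on a dummy glacier
    end
end

end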

Make parameters local

Right now we still have some parameters as const global variables. This is OK for simplicity's sake for now, but it is annoying when re-running code, as the variables cannot be redefined.

This is also problematic for testing, since we want to make sure that we control the local environment of the simulation.

For all these reasons, we need to explicitly pass all the necessary variables (e.g. B or H₀) in a context, in order to avoid having them as const. A sketch of the idea follows.
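
A minimal sketch (names hypothetical): bundle what used to be const globals into a context that is passed explicitly to the solver.

B = zeros(100, 100)                  # bedrock elevation
H₀ = 100 .* rand(100, 100)           # initial ice thickness
context = (B = B, H₀ = H₀, Δx = 50.0, Δy = 50.0)

# Hypothetical RHS signature: the context travels with the problem
# instead of being read from global scope.
# iceflow!(dH, H, context, t)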

Update format of `gdirs_climate` throughout ODINN to new batched format

The current format of gdirs_climate as a tuple is not compatible with batching. The new format needs to be an Array of tuples, instead of a Tuple of arrays.

So far I have updated the functions in climate.jl in order to support both formats. The new batched format should be extended everywhere and it should become the new norm.
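
A minimal sketch of the format change (contents illustrative):

gdirs = ["gdir_A", "gdir_B"]         # hypothetical glacier directories
climates = [rand(12), rand(12)]      # hypothetical climate series

# Old format: a Tuple of Arrays
gdirs_climate_old = (gdirs, climates)

# New batched format: an Array of Tuples, one entry per glacier/batch element
gdirs_climate_new = collect(zip(gdirs, climates))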

Create `Glacier` or `Gdir` type to store glacier state and initial conditions

Same as for the case of the Simulation type. We have too many basic types going around, and we need to tidy things up. Instead of passing so many matrices and scalars to the solver, we should gather everything in a type Glacier to be passed to simulations.

We have two options here:

  • Investigate if we can directly use the gdirs, and whether this is technically feasible both with OrdinaryDiffEq.jl solvers and with AD in UDEs. The fact that it would be a PyObject might make things complicated.
  • Otherwise we could just create a Gdir type in Julia to translate it, or simply call it Glacier (a rough sketch follows this list). We should also make sure that types such as mutable structs or Dicts work correctly with solvers and AD. To be considered.
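
A rough sketch with hypothetical fields, gathering the matrices and scalars currently passed loose to the solver:

mutable struct Glacier{F <: AbstractFloat}
    rgi_id::String
    H₀::Matrix{F}    # initial ice thickness
    B::Matrix{F}     # bedrock elevation
    Δx::F            # grid spacing in x
    Δy::F            # grid spacing in y
end

glacier = Glacier("RGI60-11.03638", 100 .* rand(100, 100), zeros(100, 100), 50.0, 50.0)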

Parallelize ice flow model

After some initial tests with ParallelStencil.jl, it seems that it might be trickier than I thought to apply it with Zygote.jl, since Zygote's limitation on mutation makes it non-trivial.

After a discussion with Ludovic and Samuel, Samuel suggested the following:


  1. Should I initialize the matrices with @zeros at each iteration in the loop and then apply the parallelized functions by reference, as you did?
    Allocations, in particular of big arrays, are very costly. Thus, you should avoid them whenever possible inside performance-critical code. This is why we allocate all the arrays needed for computation at the beginning of our applications, outside of any loop. So you should not initialize the matrices at each iteration in the loop, but before the loop.
  2. Is it possible to apply those macros to a function WITH return values? This would allow me to add the matrix initialization within the function in order to keep things tidy, e.g.
    No, this is not possible for two reasons: 1. CUDA does not allow kernels to return values; 2. we do not see any interest in coming up with a workaround to support that, given that allocations are expensive and we therefore see no interest in having them inside kernels.
    That said, you can always wrap a @parallel kernel in a normal CPU function, where you can do whatever you would like to, e.g., to take up your example:
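# Δx and Δy are the grid spacings, assumed to be defined in the enclosing scope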
@parallel function _compute_dS!(S, dSdx, dSdy, ∇S²)
    @all(dSdx) = @d_xa(S)/Δx
    @all(dSdy) = @d_ya(S)/Δy
    @all(∇S²) = @av_ya(dSdx).^2 .+ @av_xa(dSdy).^2
end

function compute_dS!(S)
    ref_size = size(S)
    dSdx = @zeros(ref_size .- (1,0))
    dSdy = @zeros(ref_size .- (0,1))
    ∇S² = @zeros(ref_size .- 1)   # averaging both derivative fields reduces each dimension by 1
    @parallel _compute_dS!(S, dSdx, dSdy, ∇S²)
    return dSdx, dSdy, ∇S²
end

However, as noted above, you should pre-allocate these arrays in any case for performance reasons. If you would like to avoid passing these pre-allocated arrays to the functions/kernels, then you have also the possibility to allocate what is needed at first call or when the size changes.

The function gather from ImplicitGlobalGrid allocates for example some internal buffer at first call and reuses it as long as it is big enough: https://github.com/eth-cscs/ImplicitGlobalGrid.jl/blob/master/src/gather.jl


This remains one option, and from what I understood, it would serve to parallelize operations within the PDE solver. However, there is also the possibility of parallelizing the per-glacier simulations themselves. I have started working on this using the Distributed.jl library and pmap (see the sketch below). I've started an implementation in a separate parallel file in order to avoid breaking the main workflow, and I have deployed it on the JupyterHub.

Let's use this thread to keep track of the developments on this side.
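
A minimal sketch of the per-glacier approach (simulate_glacier is a hypothetical stand-in for a forward run; the glacier IDs are illustrative):

using Distributed
addprocs(4)                                               # illustrative worker count

@everywhere simulate_glacier(rgi_id) = (rgi_id, "done")   # placeholder per-glacier run

rgi_ids = ["RGI60-11.03638", "RGI60-11.01450"]
results = pmap(simulate_glacier, rgi_ids)                 # one glacier per worker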

Use initial glacier conditions from OGGM

A necessary feature for the toy model will be to use initial glacier conditions from OGGM. So far we've been working with Harry's files, but we need to finally move to the real framework. I have been working on and off on this for a while, so it will be my next big objective.

Initialize NN weights in a more favourable space

We need to find a way to initialize the NN weights in a more favourable space. This would most likely speed up the training. A linear function with a slight positive incline could be a sensible way to start.

Maybe @redouanelg would like to try this? This could be an easy first issue, very ML-focused :)

Include different numerical methods in the DE solvers and add them as parameter options.

In the current version of the iceflow_*!() functions, we are using an explicit method where the temporal step $\Delta t$ is chosen to be smaller than an upper bound determined by the maximum diffusivity. The diffusivity, and consequently the temporal step, therefore depend on the parameters of the model, and this dependency is backpropagated through when we compute the gradients. This complicates the differentiation of the method. In order to avoid this, it is important to include the option of an explicit method with a fixed (but small) temporal step that is independent of the rest of the parameters.


It remains to include this as a parameter of the model. For a first analysis, I suggest using a fixed-step explicit method instead of an explicit-adaptive one for differentiation.
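
A minimal sketch on a toy ODE (not the actual SIA code): a fixed-step explicit solve, so $\Delta t$ does not depend on the model parameters.

using OrdinaryDiffEq

f(u, p, t) = -p .* u                         # toy stand-in for the SIA RHS
prob = ODEProblem(f, [1.0], (0.0, 1.0), 0.5)
sol = solve(prob, Euler(); dt = 1e-3)        # fixed Δt, non-adaptive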

Implicit method needed to solve the SIA

The manually coded explicit forward scheme is super unstable and cannot solve the SIA, as it is a stiff PDE. An implicit method is needed, so ideally we should try:

  • To develop a simple implicit scheme in order to correctly understand how it works.
  • To use DifferentialEquations.jl and DiffEqFlux.jl in order to solve the stiff SIA PDE.

Currently being investigated with @facusapienza21.
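
A toy sketch of the second option (a stiff test problem, not the SIA itself), using an implicit solver from OrdinaryDiffEq.jl:

using OrdinaryDiffEq

f(u, p, t) = -1000.0 .* (u .- cos(t))   # toy stiff ODE
prob = ODEProblem(f, [0.0], (0.0, 1.0))
sol = solve(prob, Rodas4())             # implicit Rosenbrock method for stiff problems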

Global Parameter setting file

Eventually, we want to have all the global parameters of the model (A, n, rho, g) defined in a file that we can read from any of our simulations. This could also be useful for setting some of the libraries we want to import.
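
A minimal sketch using Julia's TOML standard library (file contents illustrative; parsed from a string here so the snippet is self-contained):

using TOML

toml_str = """
A = 2.4e-24
n = 3
rho = 900.0
g = 9.81
"""
params = TOML.parse(toml_str)
A, n, ρ, g = params["A"], params["n"], params["rho"], params["g"]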

Investigate loss function with fewer random sampling points

There's the hypothesis that computing the loss with fewer points, instead of using the full H or V matrix, can result in improved performance during backpropagation. We should test this and see if it can also act as regularization. It would be nice to come up with a sweet spot between performance and accuracy.
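
A minimal sketch of the sampling idea (H_pred and H_ref are hypothetical matrices):

using Random

function sampled_loss(H_pred, H_ref; fraction = 0.1, rng = Random.default_rng())
    n = ceil(Int, fraction * length(H_ref))
    idx = randperm(rng, length(H_ref))[1:n]     # random linear indices
    return sum(abs2, H_pred[idx] .- H_ref[idx]) / n
end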

Train on multiple glaciers

For the preprint, it would be nice to perform the training using multiple glaciers and with real climate series for temperature data. Using around 10 different glaciers would be enough, in order to reproduce the current fake dataset of 9 different temperature series ranging from -20 to 0 ºC.

Loss function using surface velocities

Using a loss function based on the ice surface velocities gives the following error:

Backpropagation
ERROR: LoadError: MethodError: no method matching zero(::Type{Any})
Closest candidates are:
  zero(::Type{Union{Missing, T}}) where T at missing.jl:105
  zero(::Union{Type{P}, P}) where P<:Dates.Period at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Dates/src/periods.jl:53
  zero(::FillArrays.Ones{T, N, Axes} where Axes) where {T, N} at /Users/Bolib001/.julia/packages/FillArrays/VLeUk/src/FillArrays.jl:540
  ...
Stacktrace:
  [1] zero(#unused#::Type{Any})
    @ Base ./missing.jl:106
  [2] zero(x::Vector{Any})
    @ Base ./abstractarray.jl:1085
  [3] _backmean(xs::Vector{Any}, Δ::Matrix{Float64}, #unused#::Colon)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/lib/array.jl:311
  [4] (::Zygote.var"#649#650"{Colon, Vector{Any}})(Δ::Matrix{Float64})
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/lib/array.jl:309
  [5] (::Zygote.var"#2763#back#651"{Zygote.var"#649#650"{Colon, Vector{Any}}})(Δ::Matrix{Float64})
    @ Zygote ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67
  [6] Pullback
    @ ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow.jl:87 [inlined]
  [7] (::typeof((loss)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:0
  [8] Pullback
    @ ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow.jl:43 [inlined]
  [9] (::typeof((λ)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:0
 [10] (::Zygote.var"#84#85"{Params, typeof((λ)), Zygote.Context})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface.jl:343
 [11] hybrid_train!(loss::typeof(loss), glacier_ref::Dict{String, Vector{T} where T}, UA::Chain{Tuple{Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, Dense{var"#530#537", Matrix{Float64}, Vector{Float64}}, Dense{var"#531#538", Matrix{Float64}, Vector{Float64}}, Dense{var"#sigmoid_A#534"{Int64, Float64}, Matrix{Float32}, Vector{Float32}}}}, opt::ADAM, H::Matrix{Float32}, p::Tuple{Int64, Int64, Float64, Float64, Matrix{Float32}, Vector{Float64}, Float64, Int64}, t::Int64, t₁::Int64)
    @ Main ~/Desktop/Jordi/Julia/odinn_toy_model/scripts/helpers/iceflow.jl:53
 [12] macro expansion

We should investigate this after AGU, as it makes way more sense to train the model based on surface velocities rather than the final ice thickness state.

Restart Julia for PyCall to work

Right now, in the current implementation of initialize_ODINN(), the configuration of PyCall doesn't work from scratch. After building some packages, the code doesn't find the right Python path, and it is necessary to close the Julia session and run the code again in order to access it correctly. We should find a workaround for this, and see where the issue comes from.

Merge redundant code related to `DiscreteCallback`

A first raw implementation of a DiscreteCallback to add the monthly MB has been added. Right now, the code is pretty much repeated between the PDE and the UDE for practical purposes. Moreover, some functions are declared within functions.

See how to properly code this in a clean way to avoid duplication and enhance modularity.
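
A minimal sketch of a shared monthly-MB DiscreteCallback (the MB update is a placeholder), defined once so both the PDE and the UDE solves can reuse it:

using OrdinaryDiffEq

stops = collect(0.0:1/12:1.0)                     # monthly stop times
condition(u, t, integrator) = t in stops
apply_MB!(integrator) = (integrator.u .+= -0.1)   # placeholder monthly MB update
cb = DiscreteCallback(condition, apply_MB!)
# solve(prob, solver; callback = cb, tstops = stops)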

Generate training dataset with fake A function

Generate forward simulation of ice thickness and velocity fields, with around 4 snapshots for the whole simulation to train the UDEs.

Stop using random mass balances and use sequential MBs and ELAs in order to have a reproducible behaviour.

A should be a function of the average of the ELAs of the last 5 years, to account for delays and slow transitions in ice temperature.

Mass balance needs to be added in small timesteps in forward model

The current raw model includes Harry's annual MB in annual timesteps, so the full annual MB is added in a single dt. The annual MB should be interpolated to smaller time steps (e.g. daily, weekly or monthly), and then added in the forward model in a similar way to the one I coded for annual MB.
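
A minimal sketch of the idea (values illustrative): split the annual MB into monthly increments instead of adding it in a single dt.

annual_MB = -1.2                      # m w.e. per year
monthly_MB = annual_MB / 12           # increment applied at each monthly step
H = 100 .* rand(100, 100)             # hypothetical ice thickness field
H .= max.(H .+ monthly_MB, 0.0)       # one monthly update, clamped at zero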

Move all dates to calendar years

So far some code is implemented using hydrological years. Following OGGM's recent migration to calendar years, we should follow along in order to avoid future issues.

Create ODINN package

Create modules for each file and generate package in order to be able to easily call ODINN from other repos (e.g. notebook).

The value of A in Glen's Law

The value of the diffusivity $\Gamma$ in the SIA equation is proportional to A, the constant in Glen's Law.

In our simulations we are using a different value (1.3e-24) from the one recommended in Cuffey (2010) (Chapter 3), which is A = 2.4e-24 [1/(s Pa³)].

Create Julia data structure for `gdir`

We need a Julia data structure to store some of the information from gdirs. To be seen whether it should be a mutable struct or something else.

High-level simulation functions should take these structures for the simulations, instead of just more basic data structures.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

UDE not training with OGGM initial state

The training seems to be struggling with OGGM initial glacier states. It might be either due to data resolution, different glacier topography or other numerical problems.

I'm investigating ways to normalize the loss function by dividing the computed and reference H, Vx and Vy by their own magnitudes. This would give equal weight to large and small values.
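
A minimal sketch of the normalization idea (names hypothetical): each misfit term is divided by the magnitude of its reference field, so large and small values contribute comparably to the loss.

function normalized_loss(H_pred, H_ref, V_pred, V_ref)
    nH = sum(abs2, (H_pred .- H_ref) ./ (maximum(abs.(H_ref)) + eps()))
    nV = sum(abs2, (V_pred .- V_ref) ./ (maximum(abs.(V_ref)) + eps()))
    return nH + nV
end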

Prepare notebook with a demo

To accompany the preprint and the first release, we should create a nice Jupyter notebook based on ODINN.jl (i.e. a high-level interface), showcasing how ODINN works and how it can train with multiple glaciers at the same time.

Train UDE on A for the SIA PDE

Train a UDE to optimize the value of A based on ELA values. Customize the loss function in order to produce results which are physically plausible.

Two options can be investigated:

  • The NN only optimizes the value of A
  • The NN is set to generate the whole diffusivity component. This should possibly be left for later stages.
