
Thoughts about multiscalearrays.jl

sciml commented on May 30, 2024
Thoughts


Comments (5)

zahachtah commented on May 30, 2024

As soon as this works I'll try it on a couple of classical multi-scale models in the eco-evo field, e.g. evolutionary food webs, spatial evolutionary food webs, spatial vegetation models.


ChrisRackauckas commented on May 30, 2024

Hey,

Sorry this turned into a novel.

Now that I know it works (I tried it in DifferentialEquations.jl, yay, it's working on my
local branch and everything seems to have zero overhead as expected!), let me explain this
in full.

The Structure - why it's fast, and the constraints

Let me first explain the constraint that this has. If you take a look at an arbitrary part of a loop in DifferentialEquations.jl, you will see it's littered with these:

      for i in uidx
        tmp[i] = u[i]+Δt*(a71*k1[i]+a72*k2[i]+a73*k3[i]+a74*k4[i]+a75*k5[i]+a76*k6[i])
      end

So anything that

  • is linearly indexable, and
  • defines a few basic arithmetic rules on the lowest index

will work, and will actually work without allocating (i.e. it will be a very fast loop). Here is the main caveat:

  • getfield is not a type-stable operation. So if you provide an indexing scheme that can use multiple different fields from a type, it won't be fast.

So that's the main constraint: the ODE/SDE/etc. solvers only do the continuous updates on the terms top.x.x.x.x.x...., so everything has to be in the .x for the ODE, specifically the .x at the leaf level (since the .x at higher levels is just another level in the hierarchy).

Now that's the only real speed constraint. By design, since .x is a vector, the higher level is looping on a contiguous vector, so that's fast. So in my example, if you take a population and have it loop through a large list of cells, it can do so quickly because that would just be looping along one vector.

Lastly, talking about performance: if you have your type as immutable, then carnaval's PR (Github is down for me while writing this, mention this to me and I'll replace this with the link to the PR) will turn this into essentially a zero-overhead loop. So this is designed with all of the enhancements of v0.6 in mind so that it boils down to something with no overhead in the most constrained case (using immutable types, always strict typing). The beauty of Julia is that, at any spot, we can "break" these stricter rules at the cost of some performance for some generality (and the further discussion will give some examples).
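To make the structure concrete, here is a minimal sketch of what such a hierarchy could look like (the names Cell and Population follow the examples in this thread; the exact field layout and abstract type names are assumptions, not the package's definitions):

# A minimal sketch of the hierarchy, assuming abstract types named
# AbstractMultiScaleModel / MultiScaleModelLeaf as in the examples further down.
immutable Cell <: MultiScaleModelLeaf
  x::Vector{Float64}      # the actual continuous values (e.g. protein concentrations)
end

immutable Population{T<:AbstractMultiScaleModel} <: AbstractMultiScaleModel
  x::Vector{T}            # the children in the hierarchy (here: cells)
  end_idxs::Vector{Int}   # cumulative child lengths, so linear indexing can find a leaf quickly
end

Everything the solver touches lives somewhere down an .x chain, e.g. population.x[i].x[j] is the j-th value of the i-th cell.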

How to add generality, and the costs

So there's speed in every place you look, with the generality of the hierarchy, but with the .x constraint. Knowing that, let's brainstorm sensible programming practices to expand the generality of the idea with minimal losses to performance.

  1. parameters. I used parametrised functions to give parameters (e.g. traits) to the ode function. Will these AbstractMultiScaleModel types now hold these parameters?

Yes, you can use any f(t,u,du); it's just that what "shows up" as u and du are MultiScaleModels (of the same size). Thus your f can be any function which performs continuous reactions (even a ParameterizedFunction). Continuous is a key point: the solver is not accurate if the reactions here are not defined as continuous. Discontinuous events (adding predators/prey, depending on the level being modeled at the bottom) are handled in callbacks or events.
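As a minimal sketch of what such an f could look like (the in-place f(t,u,du) signature follows this thread; the growth rate r is just a stand-in parameter):

# Minimal sketch of a continuous-update function. `r` is a stand-in parameter;
# `u` and `du` are MultiScaleModels of the same size.
r = 0.5
f = function (t,u,du)
  for i in eachindex(u)   # linear index over every leaf value
    du[i] = r*u[i]        # e.g. simple exponential growth at each leaf value
  end
end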

  1. can you mix leaf types? for example have predator and prey types that dispatch the growth dynamics function based on type?

You can. Vector{T} can have T as a Union type if you want, i.e. T<:Union{Predator,Prey}. Note that, as of right now, Julia will not loop as quickly over this because that's looping over an array of an abstract type (Jeff Bezanson's coming improvement to the type system in v0.6 may fix this, in which case we are in business!). The other way to handle this is to use the generality of the type itself. Instead of having a Predator and a Prey, you can have an Organism and then define

immutable Organism{T} <: MultiScaleModelLeaf
  x::Vector{T}
  predator::Bool
end

Now your leaf type will automatically update both predators and prey in the ODE, and you can distinguish between them inside of f using this boolean (or you can make it a Symbol to have many types of organisms). This keeps the typing strict in the x above it (so still fast loops). To write functions which dispatch using this, you can use the symbol however you want, with the choice of how to use it depending on "how much of the problem is handled within these parts" (see Stefan's mention here).
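For example, a per-organism growth rule could branch on the flag (a minimal sketch; the rates are stand-ins, and dorg is assumed to mirror org's structure):

# Minimal sketch: branch on the `predator` flag inside the update for one organism.
# The rates are stand-ins; `dorg` is assumed to mirror `org`.
function growth!(t, org::Organism, dorg)
  rate = org.predator ? -0.1 : 0.3   # hypothetical decay vs. growth rates
  for i in eachindex(org.x)
    dorg.x[i] = rate*org.x[i]
  end
end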

So those are two tools to use this for more generality.

  1. when you apply the function to the leaf vector (Proteins in your example) would parameters in the parent node be available? More curious than any direct implementation in mind... Actually, if space is the head node you could have environmental parameters such as temperature or nutrients there that determine growth rates of organisms living in that space-node. Actually, nutrients would also have to be dynamic, but the value is not in a leaf node, unless you can have a nutrient leaf node mixed with others such as organisms... is this too confusing?

You can already do this with Vector{Union{...}}, but this is a compelling enough case to extend a little bit. How about this: for non-leaf types have the following:

immutable Population{T<:AbstractMultiScaleModel} <: AbstractMultiScaleModel
  x::Vector{T}             # the children in the hierarchy
  y::Vector{BottomType}    # continuous values that live at this (non-leaf) level
  end_idxs::Vector{Int}    # cumulative lengths for fast linear indexing
end

where BottomType is the same type as the values in the leaf type (most likely Float64; you want these to all be the same for type stability). Then the looping is extended to the following:

  • Loop through y
  • Loop through x

y would be the "continuous values inherent to the non-leaf type". With a small change to end_idxs and the looping structure, this can be done with minimal to no loss of performance. This generality sounds necessary, so it should probably be my next goal (after propagating all of the changes I have done to get it working).
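Written out by hand, that extended traversal would look something like this (a sketch only; apply! is a hypothetical helper, not package API, and the real implementation would fold this into the linear index):

# Sketch of the extended traversal: a node's own `y` values come first,
# then the children in `x` are visited recursively.
function apply!(g, leaf::MultiScaleModelLeaf)
  for v in leaf.x          # a leaf only has its values
    g(v)
  end
end

function apply!(g, node::AbstractMultiScaleModel)
  for v in node.y          # continuous values held at this (non-leaf) level
    g(v)
  end
  for child in node.x      # then recurse into each child
    apply!(g, child)
  end
end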

  1. assume I have a species (population of) and I track abundance of individuals but I also want to use quantitative genetics to have a trait, e.g. mean size, evolve. I guess that's related to 3) as you'd want one abundance type and one size type as leaf for parent species and then have one function describing change in abundance and one describing change in mean size

Would size need to be a type, or just a value in y of Species which is non-leaf? Or you could use a Vector{Union{...}}

Space

And for the last one, there's a lot to unfold.

  1. I assume one can use e.g. space as Head node?

Depends on the model. Remember that, other than this x (and maybe y) business, you can do whatever you want at the discrete level. So if you want to do discrete updates for space, you can always handle that in callbacks. The callback function serves two purposes:

  1. You can choose to do an update right before or right after a save. So you can add discrete updates here (i.e. each timestep, do something). The stepsize and timepoint are available so if you use an adaptive ODE method, you can make this apply more evenly.
  2. You can declare events. Events allow the solver to handle discontinuous changes in the continuous components.

The first way has one caveat: the changes are only applied after a step. So, for example, if you're taking really large steps, particles/organisms/etc. will only move once per step even though they should actually "be moving in between the steps". You can constrain the stepsize or use a fixed-timestep ODE method (you can switch any method to fixed timesteps with adaptive=false) to handle this, but it will always come with some loss of accuracy.

Events are nice because they handle the discontinuity well. For example, your event could be "every minute, update the positions". So you could do what you wanted to do with 1, but instead check whether the time has passed another full minute, and put an event there if it has.

Side-note: ODE Callback/Event Extension

Using the notation of the events, here's how I am extending events to handle this. Instead of having callback as a function, have it as an overloaded type with a cache. I can slightly extend the macro to the following:

default_callback = @ode_callback begin
  @ode_savevalues
end cache=MyCache

where MyCache<:CallbackCache is any type you choose, and

default_callback = @ode_callback begin
  @ode_savevalues
end

just defaults to having the cache be some empty immutable cache (0 overhead). Then extend the event interface to be:

function event_f(t,u,f,user_cache)
  u[1]   # an event is triggered when this returned value crosses zero
end

and

function apply_event!(u,f,user_cache,solver_cache)
  u[2] = -u[2]   # e.g. flip the sign of the second component when the event occurs
end

(solver_cache is still necessary since that's how the internal solver parts get
resized when additions and deletions occur). That's a very conservative extension
which will even let you do events on values of the derivative (using f inside the event;
I don't have interpolations for the derivative, nor a compelling use case for going through
all that extra work when, in the end, it would add a whole lot of complexity).

This will take less than an hour to do, and it's a breaking change so I'm doing this
before the next release (i.e. before the first release with events). This should give
a lot of generality and speed (you can use the caching for speed!) while not bloating
the user-burden (it's there, but you can just ignore it if you want).

Back to the MultiScaleModels

Now here's how you can use this in a multi-scale model to do discrete timestep updates
with an adaptive timestep solver. You can have a CallbackCache which has a value for
next_minute, starting at 1. Then you can make the event

function event_f(t,u,f,user_cache)
  60*user_cache.next_minute - t   # positive until the next full minute is reached
end

Notice that this is positive until t=60, at which point it goes negative so an
event will occur. The solver will step adaptively until it goes past 60, then the
event framework will pull it back to exactly t=60 and apply the event (the discontinuity there).

So then the apply_event! function will allow for updating at every minute. You can use this
for safe fixed timestep updates while keeping continuous timestep updates for the ODE.
So if you're doing discrete positions, you can update the positions of organisms every hour, where
between each hour each individual organism is running an ODE.
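To make that concrete, here is a sketch of what such an apply_event! could look like (the next_minute cache field is a stand-in, not package API, and the cache would need to be mutable or hold the counter in a container):

# Sketch: advance the discrete state once per minute and schedule the next event.
function apply_event!(u,f,user_cache,solver_cache)
  # ... do the discrete update here, e.g. move each organism's stored position ...
  user_cache.next_minute += 1   # the next event fires at the following full minute
end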

[Note that this is general
enough that you can actually do a Gillespie simulation on the discrete level: to do that, you just
choose the next time for an event to be an exponentially distributed random variable
(with rate based on the number of organisms you have, and the rate constants), and then
apply an event when you hit that time value.]
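As a quick sketch of drawing that next event time (the total rate here is a stand-in for the summed propensities):

# Sketch: the waiting time to the next reaction in a Gillespie scheme is
# exponentially distributed with the total propensity as its rate.
t = 0.0                                      # current time (stand-in)
total_rate = 2.5                             # stand-in: sum of rate constants × counts
next_event_time = t + randexp()/total_rate   # randexp() is Base's standard exponential sampler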

What about continuous updates to space? Actually, diffusion of discrete particles
is just a stochastic differential equation, so there's no need for a PDE if this is
solved using an SDE. With the y extension, you can have y hold the spatial information
of an organism and if this is solved using SDEs, that will update continuously
(solving the SDE you specify). For people who aren't familiar with stochastic equations,
I will have an example for this.

However, if events happen too often, they constrain the solver's timesteps to be
small. This is exactly the problem with Gillespie simulation, and it would apply here.
While some things help (tau-leaping), it's generally just a fact that discrete
simulations have to handle things discretely... so too many discrete changes will bog
the solver down. See the last part of this response for my "philosophy" on this.

(makes me think if one could combine e.g. an ODE for e.g. cell growth/mobility
with a PDE for diffusion of chemicals and have them interact....i.e. cells react
to and affect chemical gradients)

Beautiful idea that I hadn't thought of, but the y extension actually handles this
perfectly (this is actually by accident! I was brainstorming everything thinking this
wasn't possible to do efficiently, but realized it "just works"). Say for your cell model
you want a background protein. The background protein concentrations in space can be
held as the y vector (or matrix), and then you can just have your f apply the
diffusion PDE to them. Since the discretized diffusion PDE is just a matrix-vector multiplication,
inside of f you would just write du.y = A*u.y and that would make the proteins diffuse.

If there is more than one background protein, you can make y a MultiScaleModel
where the next level is a leaf type SpatialProtein, and then, since that will update
continuously, the following would do a diffusion update on all of them:

f = function(t,u,du)
  ### Diffuse
  for i in eachindex(u.y)
    du.y[i] = A[i]*u.y[i]
  end
  ### Now do stuff on `u.x`
end

where now A is a vector of matrices, each matrix having a different diffusion
constant. Using what I show below, you can then have cells (or whatever is below)
add to du.y to influence it as well, so the influence is bi-directional.
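For completeness, here is one way such an A could be built (a sketch; the grid size, spacing, diffusion constants, and the simple boundary handling are all stand-ins):

# Sketch: one discretized-diffusion operator per background protein, each a
# second-difference (Laplacian) matrix scaled by that protein's diffusion constant.
n  = 100                 # number of spatial points (stand-in)
dx = 0.1                 # grid spacing (stand-in)
D  = [1.0, 0.3]          # one diffusion constant per background protein (stand-ins)

L = zeros(n, n)
for i in 1:n
  L[i, i] = -2.0
  i > 1 && (L[i, i-1] = 1.0)
  i < n && (L[i, i+1] = 1.0)
end

A = [d/dx^2 * L for d in D]   # A[i] diffuses protein i via du.y[i] = A[i]*u.y[i]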

Defining the ODE, the internal looping structure

  1. when you apply the function to the leaf vector (Proteins in your example) would parameters in the parent node be available?

Yes, I think this is a good lead-in to describing the looping structure for f.

At the highest level and lowest level this is easy. To loop over everything in the lowest
level, you just

for i in eachindex(u)
  u[i] # Will get each value that is at a leaf, or in a `y` (when that's implemented)
end

At the highest level, you can grab each child by

for i in eachindex(u.x)
  u.x[i] # So if the top is population, this is a cell
end

I want to extend the indexing so that, if you have Tissue -> Population -> Cells -> Protein concentrations (which are the numbers at the bottom, so the leaf type is Cell), then you can write:

for i in eachindex(u,2)
  u[i] # u is a tissue, so level 2 is cells, so this loops through each cell
  # i = (i1,i2), so i[1] would tell you which population, i[2] which cell
  # together, i just ends up giving the cell for convenience.
  # Using this, you can change `u.y[i]` from the cells, and so have the cells
  # influence the PDE!
end

This is actually a simple fix from what I have, and would handle these constraints.
Notice that the ODE solver just uses the first:

for i in eachindex(u)
  u[i] # Will get each value that is at a leaf, or in a `y` (when that's implemented)
end

so this generality is simply added as a convenience for users (i.e. you can already
do this by hand, as in the sketch below, but it would be a real pain to write out).
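For reference, the "by hand" version for the Tissue -> Population -> Cell hierarchy would look something like this (a sketch; the variable names are stand-ins):

# Sketch: manually looping over level 2 (the cells) of a Tissue -> Population -> Cell
# hierarchy, where `u` is the tissue. The pair (pop_i, cell_i) plays the role of i.
for (pop_i, pop) in enumerate(u.x)        # each population in the tissue
  for (cell_i, cell) in enumerate(pop.x)  # each cell in that population
    # `cell` is a leaf; e.g. add its contribution to du.y here
  end
end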

Overarching Philosophy: Mixing Discrete and the Continuous Blur

Here's the main idea: this lets you add structures, but also "blur" whenever necessary.
The big problem that I have with most common biological models (which handle multiple scales)
is that "the blur is only at the top": they use a PDE at the top for "concentrations
when there's enough that a continuous approximation makes sense", but then there are
too many discrete things being modeled and the numerical methods typically grind to
a halt. In this framework, you can do exactly this kind of model with events, but as noted before,
if your events are too frequent the solver cannot take large steps and will slow
down. What this means is that you don't just want to blur at the top level, you want
to blur at the bottom level as well.

You want to pick a middle, say cells, and then have the protein concentrations of the
morphogen gradients which affect the migration be a continuous PDE, have the
gene regulatory network inside of the cell be a continuous SDE, but model the cells
themselves as discrete entities. By blurring below as continuous with stochasticity,
you can use adaptive solvers (indeed, my latest paper in review is a fast algorithm
for adaptive solving of SDEs) to avoid having too many events and still model the
stochastic nature of it, without having to hit every discrete moment when events
occur. Mixing this with events, you can do tau-leaping on the cells influenced by
a grander (S)PDE, where each cell holds an SDE.

That's what this framework should
make "trivial" to implement: the implementation has nearly zero overhead
(after the v0.6 changes), so all of the speed of the well-optimized DifferentialEquations.jl
solvers applies directly to the model (alleviating the need to write
special-purpose solvers for these kinds of models, which also slows things down
since you're not using a "tried-and-true" optimized solving package). As you can
see, it's already almost there; in fact, on my local branch it already works,
but some usability enhancements (i.e. the special looping) will be necessary
to satisfy the "ease-of-use" requirement.

One final point: Linear indexability

One point to note is that x can actually be anything which is linearly indexable
(preferably, something with fast linear indexing). So while in the examples I always
have x as a vector, you can have x as a matrix if that organization makes it
easier for you to write your f. For example:
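A minimal sketch (the name GridLeaf is a stand-in, not a package type):

# Sketch: a leaf whose continuous values are naturally laid out on a grid.
# Matrices are linearly indexable, so the solver's loops still work on x[i].
immutable GridLeaf <: MultiScaleModelLeaf
  x::Matrix{Float64}    # e.g. a concentration field over a 2D grid
end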


ChrisRackauckas commented on May 30, 2024

Update on the y idea. When I wrote the first part Github was down, so this is
a separate idea that I'm posting separately. The y and the Vector{Union{...}}
parts are necessary because this framework cannot "arbitrarily loop through all fields".
You cannot just define any subtype of AbstractMultiScaleModel and have all of its fields
indexed, because then the indexing would have to use getfield(), which is slow and not
type-inferrable (even if "we know" all of the leaves actually just hold Float64 values!).
So I cannot make it so that you have arbitrarily many fields that are considered for continuous
update (where "considered for continuous update" means "they are part of the linear index").
However, if you look at indexing.jl, you will see that I simply use the fact that I know the data is
in x. If I know there's a y, I can easily extend this to use both x and y.
Now, if I knew in advance which fields you want to have as part of the linear index,
I could make it work on each of those. So is there a way to do this? Yes:
if the type is defined by a macro, I could use metaprogramming to define the indexing
dispatch directly for that type, knowing the fields, and thus get efficient indexing.

Is this still type-inferrable? I wonder if the king of Julia arrays, @timholy, would
be able to chime in on the design. That's a lot to ask given the amount that's written above, but I assume you'd "get it" really fast: it's a tree which uses recursive indexing on one field to give it a linear index. Is there a way to generalize this to "any field" in a way that's type-inferrable, given that the leaves are all the same type?

Here's the indexing code. It's actually really succinct (other than my 4 AM bisection where I gave up and made it a vectorized call... I'll fix that soon hahaha).

https://github.com/JuliaDiffEq/MultiScaleModels.jl/blob/master/src/indexing.jl#L42


zahachtah commented on May 30, 2024

Great read and very promising! Looking forward to testing


ChrisRackauckas commented on May 30, 2024

So yeah, I'm closing this in favor of more specific issues. It works for ODEs and SDEs. Changing the size is possible, but currently you just have to know which cache variables to change. I'll make the cache iterators soon; it's just that making them will take quite a bit more typing than just using them directly, so behind the scenes I've been avoiding it! The big change from this idea, adding y and eltype, is its own separate issue #7, and I'm hoping to finish that off today.

There are still some performance improvements that could be made (especially for SDEs), but the full functionality should be available very soon (it really needed the new integrator + callback interfaces!).
