bodkan / demografr

Fast and simple simulation-based population genetic inference in R

Home Page: https://bodkan.net/demografr

License: Other

R 96.07% Makefile 1.62% Python 0.33% Slim 0.22% Dockerfile 1.77%

demografr's Issues

Missing Functions - expand_priors, get_prior_names

Hi @bodkan,

I have been working through demografr this week and found that two functions are called in the package but do not exist in it: expand_priors() and get_prior_names(). get_prior_names() seemed to be a simple name-extraction function, so I wrote my own, but I'm not sure what expand_priors() is used for. The only information I found about it is this comment in the validate_abc() function:

# first expand any generic "..." prior sampling expressions (if needed)

Is this only used for non-uniform priors? Could you explain in a bit more detail what expanding a prior means in this context?

Thanks,
GK
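For reference, a name-extraction helper of the kind described above might look like the following. This is only a minimal sketch, assuming priors are passed as a list of two-sided R formulas (parameter name on the left, sampling expression on the right). The name prior_names() is hypothetical and this is not the package's own get_prior_names(); it also makes no attempt to reproduce whatever expand_priors() does.

```r
# Hypothetical stand-in for the missing get_prior_names(); assumes every
# prior is a two-sided formula whose left-hand side is the parameter name.
prior_names <- function(priors) {
  vapply(priors, function(p) {
    stopifnot(inherits(p, "formula"), length(p) == 3)
    # for a two-sided formula, p[[2]] is the left-hand side
    deparse(p[[2]])
  }, character(1))
}

# Formulas are not evaluated when created, so the right-hand sides are
# only placeholders here.
priors <- list(
  Ne_A ~ runif(100, 10000),
  T_AB ~ runif(1, 3000)
)

prior_names(priors)
```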

Add an "intelligent" model plotting function?

demografr uses slendr to specify the demographic models whose parameters are to be inferred. A model takes the form of a plain R function which builds the populations and compiles them into a single slendr model object. In its simplest form, it can look something like this:

library(slendr)

# Ne_* are population sizes, T_* are split times
model <- function(Ne_A, Ne_B, Ne_C, Ne_D, T_AB, T_BC, T_CD) {
  # each population splits from the one defined on the previous line
  popA <- population("popA", time = 1,    N = Ne_A)
  popB <- population("popB", time = T_AB, N = Ne_B, parent = popA)
  popC <- population("popC", time = T_BC, N = Ne_C, parent = popB)
  popD <- population("popD", time = T_CD, N = Ne_D, parent = popC)

  # compile all populations into a single slendr model object
  model <- compile_model(
    populations = list(popA, popB, popC, popD),
    generation_time = 1, simulation_length = 10000
  )

  return(model)
}

The parameters of this function (Ne_A, Ne_B, ...) are the parameters which demografr then infers automatically by fitting the model against observed data.

To catch easy-to-miss bugs in model specification, it would be nice to have a way to visualize demografr model functions without having to enter dummy values by hand (annoying).

However, one can't just plug in random values because (most?) parameter combinations would lead to an invalid, uncompilable model.

Thoughts/options:

  • Figure out some iterative approach which would test different values one by one and see which model compiles?
  • Do a simple static analysis of the slendr function code -- build an AST and then:
    • determine which function parameters are Ne -- those can be set to whatever value (like 1).
    • determine which parameters are split times, and use the parent = information to enforce their ordering (for either backward or forward times)
    • based on the split times, pick consistent random values for gene flow

The final dummy model would be plotted as a simple tree, with the dummy Ne values and times not drawn to any scale at all (similarly, gene flow events would be plotted just as arrows).
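The static-analysis option can be sketched in base R, because the body of a function is already available as an AST. The helper below is hypothetical (dummy_parameters() is not part of demografr or slendr) and rests on several assumptions: populations are created by top-level statements of the form `x <- population(...)`, the time, N and parent arguments are named, parents are defined before their children, and times run forward, as in the example above. It ignores gene flow entirely.

```r
# The model function from above. It is only inspected here, never called,
# so slendr does not need to be loaded for this sketch.
model <- function(Ne_A, Ne_B, Ne_C, Ne_D, T_AB, T_BC, T_CD) {
  popA <- population("popA", time = 1,    N = Ne_A)
  popB <- population("popB", time = T_AB, N = Ne_B, parent = popA)
  popC <- population("popC", time = T_BC, N = Ne_C, parent = popB)
  popD <- population("popD", time = T_CD, N = Ne_D, parent = popC)

  model <- compile_model(
    populations = list(popA, popB, popC, popD),
    generation_time = 1, simulation_length = 10000
  )
  return(model)
}

# Hypothetical helper: derive dummy values for Ne and split-time parameters
# by walking the top-level statements of the model function.
dummy_parameters <- function(model_fun, step = 100, dummy_ne = 1) {
  split_times <- list()  # dummy time of each population variable
  values <- list()       # dummy value of each function parameter

  # the body of a braced function is a call to `{`; drop the `{` itself
  for (stmt in as.list(body(model_fun))[-1]) {
    is_population <- is.call(stmt) &&
      identical(stmt[[1]], as.name("<-")) &&
      is.call(stmt[[3]]) &&
      identical(stmt[[3]][[1]], as.name("population"))
    if (!is_population) next

    pop_args <- as.list(stmt[[3]])[-1]
    time <- pop_args[["time"]]
    parent <- pop_args[["parent"]]
    size <- pop_args[["N"]]

    if (is.null(parent)) {
      # root population: keep a literal time, otherwise start at 1
      dummy_time <- if (is.numeric(time)) time else 1
    } else {
      # child population: always later than its parent (forward time)
      dummy_time <- split_times[[as.character(parent)]] + step
    }
    split_times[[as.character(stmt[[2]])]] <- dummy_time

    if (is.symbol(time)) values[[as.character(time)]] <- dummy_time
    if (is.symbol(size)) values[[as.character(size)]] <- dummy_ne
  }
  values
}

dummy <- dummy_parameters(model)
str(dummy)
```

With slendr loaded, `do.call(model, dummy)` would then call the model function with these values. Whether the result compiles still depends on details this sketch does not check, such as the dummy times fitting within simulation_length.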

I love messing with ASTs but I have no time for this. :( Maybe a project for a computer sciency student?

How to handle sequencing error and ancient DNA damage?

One major selling point of this R package is efficient, fast ABC inference using tree sequences and slendr.

For lots of data sets, in particular low-coverage and/or ancient DNA data, erroneous SNP calls present an issue.

In the context of ABC, it is generally assumed that the summary statistics come from data which is reasonably clean and where various errors (especially aDNA damage) have been taken care of via filtering.

For many (?) analyses and summary statistics, aDNA errors would add noise around the true values of summary statistics. In cases like this, simulating summary statistics from perfect, clean tree sequences would not be a problem.

Still, it might be interesting to add an option to sprinkle artificial mutations that correspond to damage or sequencing errors on top of standard tree sequences. Then, summary statistics computation would proceed as normal, with the exception that it would be mutation-based rather than branch-based (which would normally be the mode of operation).

Although this would be useful more generally, it doesn't make sense to suggest adding this functionality to tskit or msprime. Perhaps this package could have a tiny built-in Python submodule which would add damage or errors on top of a mutated tree sequence.
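To make the idea concrete, here is a minimal sketch of sprinkling errors onto simulated data. It is an illustration under simplifying assumptions, not the proposed submodule: it works in R on a plain 0/1 genotype matrix rather than on a tree sequence, it flips calls at one uniform error rate rather than modelling aDNA-specific damage patterns, and the function name add_genotype_errors() is hypothetical.

```r
# Flip each 0/1 genotype call independently with probability `error_rate`.
add_genotype_errors <- function(gt, error_rate) {
  stopifnot(is.matrix(gt), all(gt %in% c(0, 1)),
            error_rate >= 0, error_rate <= 1)
  # TRUE marks the calls that will be flipped
  flips <- matrix(runif(length(gt)) < error_rate, nrow = nrow(gt))
  # |gt - flips| acts as XOR: a flipped 0 becomes 1, a flipped 1 becomes 0
  abs(gt - flips)
}

set.seed(42)
# toy data: 20 sites (rows) x 5 haplotypes (columns)
gt <- matrix(rbinom(20 * 5, size = 1, prob = 0.3), nrow = 20)
noisy <- add_genotype_errors(gt, error_rate = 0.01)

# number of calls changed by the simulated errors
sum(noisy != gt)
```

Mutation-based summary statistics would then be computed on the noisy matrix in the same way as on the clean one.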
