bodkan / demografr

Fast and simple simulation-based population genetic inference in R

Home Page: https://bodkan.net/demografr

License: Other

R 96.07% Makefile 1.62% Python 0.33% Slim 0.22% Dockerfile 1.77%

demografr's Issues

Missing Functions - expand_priors, get_prior_names

Hi @bodkan,

I have been working through demografr this week and found that the package calls two functions that are not defined anywhere in it: expand_priors() and get_prior_names(). get_prior_names() looked like a simple name-extraction function, so I wrote my own, but I'm not sure what expand_priors() is used for. The only information I found about it is this comment in the validate_abc() function: # first expand any generic "..." prior sampling expressions (if needed). Is this only used for non-uniform priors? Could you explain in a bit more detail what expanding a prior means in this context?

Thanks,
GK
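
For illustration, a name-extraction helper of the kind described in this issue could look roughly like the sketch below. It assumes that priors are given as a list of two-sided R formulas (for example Ne_A ~ runif(100, 10000)); the function name prior_names() and the example priors are hypothetical, and this is not demografr's own implementation.

```r
# Hypothetical helper (not part of demografr): extract parameter names
# from a list of prior formulas such as `Ne_A ~ runif(100, 10000)`.
# Assumes every prior is a two-sided formula with a name on its left side.
prior_names <- function(priors) {
  vapply(priors, function(p) {
    stopifnot(inherits(p, "formula"), length(p) == 3)
    # For a two-sided formula, p[[2]] is the left-hand side
    paste(deparse(p[[2]]), collapse = "")
  }, character(1))
}

# Made-up example priors, for illustration only
priors <- list(
  Ne_A ~ runif(100, 10000),
  T_AB ~ runif(1, 5000)
)

prior_names(priors)
```

The right-hand side of each formula is never evaluated here, so the helper works regardless of which sampling function the prior uses.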

Add an "intelligent" model plotting function?

demografr uses slendr to specify the demographic models whose parameters are to be inferred. A model takes the form of a plain R function that builds and compiles a slendr model and returns it as a single object. In its simplest form, it can look something like this:

model <- function(Ne_A, Ne_B, Ne_C, Ne_D, T_AB, T_BC, T_CD) {
  # popA is the ancestral population; every later population
  # splits from the one created just before it
  popA <- population("popA", time = 1,    N = Ne_A)
  popB <- population("popB", time = T_AB, N = Ne_B, parent = popA)
  popC <- population("popC", time = T_BC, N = Ne_C, parent = popB)
  popD <- population("popD", time = T_CD, N = Ne_D, parent = popC)

  # compile all populations into a single slendr model object
  model <- compile_model(
    populations = list(popA, popB, popC, popD),
    generation_time = 1, simulation_length = 10000
  )

  return(model)
}

The parameters of this function (Ne_A, Ne_B, ...) are the parameters that demografr will infer automatically by fitting the model against observed data.

To avoid easy-to-miss bugs caused by misspecified models, it would be nice to have a way to visualize demografr model functions without having to enter dummy values by hand (annoying).

However, one can't just plug in random values because (most?) parameter combinations would lead to an invalid, uncompilable model.

Thoughts/options:

- Figure out some iterative approach which would test different values one by one and see which model compiles?
- Do a simple static analysis of the slendr function code -- build an AST and then:
  - determine which function parameters are Ne -- those can be set to whatever value (like 1).
  - determine which values are split times, and use the parent = information to enforce the ordering (either backward or forward times)
  - based on split times, pick consistent random values for gene flow

The final dummy model would be plotted as a simple tree, without showing the dummy Ne values and times at any scale (similarly, gene flow events would be plotted just as arrows).
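
A minimal sketch of the static-analysis idea, using only base R: it walks the body of a model function without evaluating it, finds assignments of the form x <- population(...), and records which expression is used for time = and which variable is used for parent =. The helper name find_population_assignments() is hypothetical, the sketch only handles the simple pattern shown in the example model (assignments with <-, named time and parent arguments, no empty arguments), and it ignores gene flow entirely.

```r
# The model function is only inspected, never called, so slendr
# does not need to be installed for this sketch to run.
model <- function(Ne_A, Ne_B, Ne_C, Ne_D, T_AB, T_BC, T_CD) {
  popA <- population("popA", time = 1,    N = Ne_A)
  popB <- population("popB", time = T_AB, N = Ne_B, parent = popA)
  popC <- population("popC", time = T_BC, N = Ne_C, parent = popB)
  popD <- population("popD", time = T_CD, N = Ne_D, parent = popC)
  compile_model(
    populations = list(popA, popB, popC, popD),
    generation_time = 1, simulation_length = 10000
  )
}

# Hypothetical helper: recursively collect `x <- population(...)` assignments
# from an unevaluated expression, keyed by the assigned variable name.
find_population_assignments <- function(expr) {
  found <- list()
  if (is.call(expr)) {
    is_assign <- identical(expr[[1]], as.name("<-")) &&
      is.name(expr[[2]]) && is.call(expr[[3]]) &&
      identical(expr[[3]][[1]], as.name("population"))
    if (is_assign) {
      args <- as.list(expr[[3]])[-1]
      found[[as.character(expr[[2]])]] <- list(
        time   = paste(deparse(args$time), collapse = ""),
        parent = if (is.null(args$parent)) NA_character_
                 else paste(deparse(args$parent), collapse = "")
      )
    }
    # Descend into the arguments of the call (e.g. statements of a `{` block)
    for (arg in as.list(expr)[-1]) {
      found <- c(found, find_population_assignments(arg))
    }
  }
  found
}

pops    <- find_population_assignments(body(model))
parents <- vapply(pops, function(p) p$parent, character(1))
times   <- vapply(pops, function(p) p$time, character(1))

# One row per population: its split-time expression and that of its parent
tree <- data.frame(
  population        = names(pops),
  parent            = unname(parents),
  split_time        = unname(times),
  parent_split_time = unname(times[parents]),
  stringsAsFactors  = FALSE
)
tree
```

Each row with a non-missing parent gives one ordering constraint between two time expressions (here 1 vs T_AB, T_AB vs T_BC, T_BC vs T_CD), which is the information needed to pick consistent dummy values or to draw the tree without any scale.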

I love messing with ASTs but I have no time for this. :( Maybe a project for a computer sciency student?

How to handle sequencing error and ancient DNA damage?

One major selling point of this R package is efficient, fast ABC inference using tree sequences and slendr.

For lots of data sets, in particular low-coverage and/or ancient DNA data, erroneous SNP calls present an issue.

In the context of ABC, it is generally assumed that the summary statistics come from data which is reasonably clean and where various errors (especially aDNA damage) have been taken care of via filtering.

For many (?) analyses and summary statistics, aDNA errors would add noise around the true values of summary statistics. In cases like this, simulating summary statistics from perfect, clean tree sequences would not be a problem.

Still, it might be interesting to add an option to sprinkle artificial mutations that correspond to damage or sequencing errors on top of standard tree sequences. Summary statistics computation would then proceed as normal, except that it would be mutation-based rather than branch-based (which would normally be the mode of operation).

Although this would be useful more generally, it doesn't make sense to suggest adding this functionality to tskit or msprime. Perhaps this package could have a tiny built-in Python submodule which would add damage or errors on top of a mutated tree sequence.
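
To make the idea concrete, here is a deliberately simplified sketch in base R. It does not operate on tree sequences and is not the Python submodule proposed above; it assumes genotypes have already been extracted into a 0/1 matrix (sites in rows, haploid samples in columns) and flips each call independently with a fixed probability. The function name add_genotype_errors() and the symmetric error model are assumptions made for illustration; real ancient DNA damage is not symmetric (it is dominated by C-to-T deamination), so a realistic model would need to be allele-aware.

```r
# Hypothetical sketch: flip every 0/1 genotype call independently
# with probability `error_rate` (a symmetric sequencing-error model).
add_genotype_errors <- function(gt, error_rate) {
  stopifnot(is.matrix(gt), all(gt %in% c(0L, 1L)),
            error_rate >= 0, error_rate <= 1)
  # Logical mask of the calls that will be flipped
  flip <- matrix(runif(length(gt)) < error_rate, nrow = nrow(gt))
  gt[flip] <- 1L - gt[flip]
  gt
}

set.seed(42)
# Made-up data: 10 sites x 5 haploid samples
gt    <- matrix(rbinom(50, 1, 0.3), nrow = 10)
noisy <- add_genotype_errors(gt, error_rate = 0.05)

# Number of calls changed by the simulated errors
sum(noisy != gt)
```

Summary statistics computed from the noisy matrix can then be compared with those from the clean one to see how sensitive a given statistic is to the error rate.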
