pmhn's Introduction

Project Status: WIP - Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Personalised Mutual Hazard Networks

Principled probabilistic modelling with Mutual Hazard Networks.

Running the workflows

To facilitate reproducibility we use Snakemake. We recommend creating a new virtual environment (e.g., using Micromamba) and installing Snakemake as described in their documentation.

Once the environment is set, the package can be installed using

$ pip install -e .  # -e installs in editable mode, so changes to the code take effect without reinstalling

Contributing

We use Poetry to control dependencies.

When Poetry is installed, clone the repository and type

$ poetry install --with dev

to install the package with the dependencies used for development.

At this stage you should be able to use Pytest to run unit tests:

$ poetry run pytest

Alternatively, you may want to work inside the Poetry environment:

$ poetry shell
$ pytest

When you submit a pull request, automated continuous integration tests will be run. They include unit tests as well as code quality checks. To run the code quality checks automatically at each commit, we use pre-commit. To activate it, run:

$ poetry shell  # If it is not already active
$ pre-commit install

Acknowledgements

This package is built around LearnMHN (the backend for Mutual Hazard Networks) and PyMC (a probabilistic programming framework).

pmhn's People

Contributors

laurenzkeller, pawel-czyz

pmhn's Issues

Backend for loglikelihood calculation

Overview

Backend for calculating the loglikelihood and its derivatives.
It can be based, e.g., on the LearnMHN package and joblib parallelisation.

Tasks:

  • Loglikelihood and gradient for a single genotype and MHN.
  • Loglikelihood and gradient for several genotypes and shared MHN.
  • Loglikelihood and gradient for several genotypes and several MHNs.

Description

We want to have functions of signatures:

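# Annotations follow the jaxtyping style (assumed from the notation)
from jaxtyping import Array, Bool, Float

def loglikelihood(genotype: Bool[Array, " M"], theta: Float[Array, "M M"]) -> float:
    ...

def gradient(genotype: Bool[Array, " M"], theta: Float[Array, "M M"]) -> Float[Array, "M M"]:
    ...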

implementing the loglikelihood and the gradient for a particular genotype.

Apart from that we want to have vectorized versions as described above.
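As a reference for unit tests on tiny problems, the loglikelihood can be computed by brute force. The sketch below is not LearnMHN's API. It assumes the usual MHN parameterisation, in which the rate of acquiring mutation i in a genotype is exp(theta[i][i] + sum of theta[i][j] over mutations j already present), and an observation time drawn from an Exp(1) distribution. The helper names `shared_loglikelihood` and `personalised_loglikelihood` are hypothetical, and plain nested lists stand in for arrays.

```python
import math
from functools import lru_cache
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def loglikelihood(genotype: Sequence[bool], theta: Matrix) -> float:
    """Brute-force log P(genotype | theta); cost is exponential in the mutation count."""
    n = len(theta)
    if len(genotype) != n:
        raise ValueError("genotype length must match the size of theta")

    def rate(state: frozenset, i: int) -> float:
        # Assumed MHN rate of acquiring mutation i when `state` is already present.
        return math.exp(theta[i][i] + sum(theta[i][j] for j in state))

    @lru_cache(maxsize=None)
    def prob(state: frozenset) -> float:
        # Solves (I - Q) p = p0 by forward substitution over sub-genotypes.
        outflow = sum(rate(state, i) for i in range(n) if i not in state)
        if not state:
            inflow = 1.0  # all probability mass starts in the empty genotype
        else:
            inflow = sum(rate(state - {i}, i) * prob(state - {i}) for i in state)
        return inflow / (1.0 + outflow)

    target = frozenset(i for i, present in enumerate(genotype) if present)
    return math.log(prob(target))


def shared_loglikelihood(genotypes: Sequence[Sequence[bool]], theta: Matrix) -> float:
    """Total loglikelihood of several genotypes under one shared MHN."""
    return sum(loglikelihood(g, theta) for g in genotypes)


def personalised_loglikelihood(
    genotypes: Sequence[Sequence[bool]], thetas: Sequence[Matrix]
) -> float:
    """Total loglikelihood when genotype k has its own MHN thetas[k]."""
    if len(genotypes) != len(thetas):
        raise ValueError("need exactly one theta per genotype")
    return sum(loglikelihood(g, t) for g, t in zip(genotypes, thetas))
```

Because each genotype contributes an independent term, the two sums are natural candidates for parallelisation (e.g., with joblib) once the single-genotype function is fast enough.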

Implement tree validation

As Xiang has noticed:

We may also want to add functions to check if the trees contain the following cases
A -> B -> B
B <- A -> B
because TreeMHN does not allow repeated mutations in the same lineage or identical siblings.
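A minimal sketch of such a check is given below. The `Node` class is a hypothetical stand-in for whatever tree structure the package ends up using; only the two forbidden patterns come from the description above.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Node:
    """Hypothetical tree node: a mutation label plus its child subtrees."""

    mutation: str
    children: List["Node"] = field(default_factory=list)


def is_valid_tree(root: Node) -> bool:
    """Return False if a lineage repeats a mutation or two siblings are identical."""

    def visit(node: Node, ancestors: FrozenSet[str]) -> bool:
        if node.mutation in ancestors:
            return False  # repeated mutation in the same lineage: A -> B -> B
        labels = [child.mutation for child in node.children]
        if len(labels) != len(set(labels)):
            return False  # identical siblings: B <- A -> B
        lineage = ancestors | {node.mutation}
        return all(visit(child, lineage) for child in node.children)

    return visit(root, frozenset())
```

The same mutation may still appear in different lineages, e.g., in `Node("root", [Node("A", [Node("B")]), Node("B")])`, which this check accepts.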

Implement TreeMHN likelihood

We want to have a backend implementing a function of essentially the following signature:

import numpy as np

def loglikelihood(tree: Tree, theta: np.ndarray) -> float:
    """Calculates the loglikelihood of a tree.

    Args:
        tree: tree
        theta: unconstrained (log-) theta matrix of shape `(n_genes, n_genes)`

    Returns:
        loglikelihood, `log P(tree | theta)`
    """
    raise NotImplementedError

The implementation should be accompanied by unit tests in which the answer is known (either calculated manually or obtained from the original implementation).

Note that this task may be already too large to be accomplished in one sprint. After we discuss it, we can split it into several smaller subtasks.

Implement tree simulator

Implement the generative process for trees used in TreeMHN. It should be based on Algorithm 1 (pseudocode) from the Supplementary Information.

Simulation framework

We want to be able to simulate a discrete Markov chain from the Markov process given by an MHN.

Utilities:

  • Sampling a trajectory from the initial state at time $t_0=0$ to $t_\mathrm{max}$.
  • Sampling the time $t_\mathrm{max}$ from an exponential distribution.
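The two utilities could look roughly as follows. This is a sketch under stated assumptions rather than the package's implementation: it uses the usual MHN rate exp(theta[i][i] + sum of theta[i][j] over mutations j already present), a Gillespie-style loop, and an exponential distribution with a hypothetical `rate` parameter (default 1.0) for the observation time. The function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def sample_trajectory(
    theta: Matrix, t_max: float, rng: random.Random
) -> List[Tuple[float, int]]:
    """Sample (time, gene) mutation events from the empty genotype up to t_max."""
    n = len(theta)
    state: set = set()
    t = 0.0
    events: List[Tuple[float, int]] = []
    while len(state) < n:
        candidates = [i for i in range(n) if i not in state]
        # Assumed MHN rates for each mutation that has not occurred yet.
        rates = [
            math.exp(theta[i][i] + sum(theta[i][j] for j in state))
            for i in candidates
        ]
        total = sum(rates)
        if total <= 0.0:
            break  # all rates underflowed to zero; nothing more can happen
        t += rng.expovariate(total)  # waiting time until the next event
        if t > t_max:
            break
        gene = rng.choices(candidates, weights=rates, k=1)[0]
        state.add(gene)
        events.append((t, gene))
    return events


def sample_genotype(
    theta: Matrix, rng: random.Random, rate: float = 1.0
) -> List[bool]:
    """Draw t_max ~ Exp(rate), simulate a trajectory, and return the final genotype."""
    t_max = rng.expovariate(rate)
    mutated = {gene for _, gene in sample_trajectory(theta, t_max, rng)}
    return [i in mutated for i in range(len(theta))]
```

The sequence of genes in the returned events is the discrete jump chain; the event times carry the continuous-time information. Passing an explicit `random.Random` instance keeps simulations reproducible.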

TreeMHN: implement gradient of the loglikelihood

We want to add the utility of calculating the gradient to the backend.

def gradient(tree: Tree, theta: np.ndarray) -> np.ndarray:
    ...

Note that it is not the priority at the start of the project: we will try to do the modelling as soon as possible, starting with a sequential Monte Carlo sampler and Metropolis transitions using small simulated data.

Only after initial experiments (we will probably see that it's not scalable), we'll consider switching to Hamiltonian Monte Carlo (which requires gradients).
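Once an analytic gradient exists, a finite-difference approximation is a convenient reference for unit tests. The helper below is a hypothetical sketch: it takes any function mapping a theta matrix (nested lists here) to a float, so the tree can be bound beforehand, e.g., with `functools.partial`.

```python
from typing import Callable, List

Matrix = List[List[float]]


def finite_difference_gradient(
    f: Callable[[Matrix], float], theta: Matrix, eps: float = 1e-6
) -> Matrix:
    """Approximate d f / d theta[i][j] with central differences."""
    grad = [[0.0] * len(row) for row in theta]
    for i, row in enumerate(theta):
        for j in range(len(row)):
            # Perturb a single entry in copies so that `theta` stays untouched.
            plus = [list(r) for r in theta]
            minus = [list(r) for r in theta]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (f(plus) - f(minus)) / (2.0 * eps)
    return grad
```

Each entry needs two function evaluations, so this is only practical for small matrices, which is enough for checking a gradient implementation in tests.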

PyMC Op for vanilla MHN

Create a PyMC Op object which can be used to calculate the loglikelihood of an MHN and its gradient.

Note: this issue depends on #3.

PyMC Op for personalised MHNs

Add a PyMC Op which takes the full genotype matrix as well as an array of shape (patients, genes, genes) representing the MHNs and evaluates:

  • total loglikelihood
  • the derivative of the total loglikelihood with respect to the MHNs (which will again be of shape (patients, genes, genes))
