hmyh1202 / cassiopeia

This project forked from yoseflab/cassiopeia

A Package for Cas9-Enabled Single Cell Lineage Tracing Tree Reconstruction

License: MIT License

Python 44.71% R 1.14% Perl 4.74% HTML 11.43% Shell 0.02% Jupyter Notebook 37.96%

cassiopeia's Introduction

Updates (April 16, 2020)

We have added the FitchCount algorithm to Cassiopeia's Analysis module. Briefly, FitchCount is an efficient algorithm for aggregating the number of state transitions across all optimal evolutionary histories (under the maximum parsimony criterion), given that the states of the leaves are known. It builds on the Fitch-Hartigan algorithm for ancestral state assignment (i.e. the Small Parsimony Problem; Fitch 1971 & Hartigan 1973).

You can access the algorithm through the fitch_count function in the cassiopeia.Analysis.reconstruct_states module. The function takes a NetworkX tree and a pandas Series mapping each leaf to a given state, and returns a square count matrix M that summarizes the number of times each state transitioned to every other state across all optimal solutions to the small parsimony problem, as given by the Fitch-Hartigan algorithm.

You can invoke the algorithm as follows:

from cassiopeia.Analysis.reconstruct_states import fitch_count

# tree is a networkx object over Cassiopeia Nodes
# meta['tissue_sample'] is a pandas Series mapping each leaf to its state
M = fitch_count(tree, meta['tissue_sample'])
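For background, the bottom-up pass of Fitch's algorithm, which FitchCount builds on, can be sketched in plain Python. This is an illustrative sketch only, not Cassiopeia's implementation: it assumes a strictly binary tree, and the dictionary-based tree representation, the function name fitch_bottom_up, and the node and tissue names are all hypothetical. It computes the candidate state set at each node and the minimum number of state changes; it does not count transitions across all optimal solutions the way FitchCount does.

```python
def fitch_bottom_up(children, leaf_states, root):
    """Bottom-up pass of Fitch's algorithm on a binary tree.

    children:    dict mapping each internal node to its two children
    leaf_states: dict mapping each leaf to its observed state
    Returns (candidate state set per node, minimum number of state changes).
    """
    state_sets = {}
    changes = 0
    # Iterative post-order traversal: a node is resolved after its children.
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        kids = children.get(node, [])
        if not kids:
            # Leaf: the only candidate is the observed state.
            state_sets[node] = {leaf_states[node]}
        elif not children_done:
            stack.append((node, True))
            stack.extend((kid, False) for kid in kids)
        else:
            left, right = (state_sets[kid] for kid in kids)
            common = left & right
            if common:
                state_sets[node] = common
            else:
                # No shared state: take the union and pay for one change.
                state_sets[node] = left | right
                changes += 1
    return state_sets, changes


# Hypothetical four-leaf tree with made-up tissue labels.
children = {"root": ["a", "b"], "a": ["c1", "c2"], "b": ["c3", "c4"]}
leaf_states = {"c1": "lung", "c2": "lung", "c3": "liver", "c4": "lung"}

state_sets, changes = fitch_bottom_up(children, leaf_states, "root")
print(state_sets["root"], changes)
```

In this toy tree a single change (somewhere below node b) explains the leaf states, and the root is unambiguously assigned the majority tissue.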

We are in the process of putting together a notebook tutorial, so stay tuned!

Updates (Feb. 9, 2020)

We have some updated features in our most current release:

  • LCA-based Hybrid Switching: we've found mixed results when using cell-number-based cutoffs in Cassiopeia-Hybrid, so we have started using the distance to the latest common ancestor (LCA) of a given group of cells to decide when to transition between Greedy and ILP. We recommend using values between 10 and 20. You can enable this behavior with the hybrid_lca_mode parameter, which makes the cutoff parameter be interpreted as an LCA distance.
  • Additional approaches for missing data handling: in our Cassiopeia-Greedy approach (which Hybrid also uses), we now support different modes for missing data handling: (1) a K-nearest-neighbor approach, which classifies each cell with missing data based on where its K closest 'friends' were assigned; and (2) a lookahead approach, which uses future Greedy splits to assign cells with missing data. You can specify which mode you'd like to use with the greedy_missing_data_mode parameter, which can be knn, avg, or lookahead.
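To make the K-nearest-neighbor idea concrete, here is a minimal sketch of how a cell with missing data could be placed on one side of a split by a vote among its closest already-assigned cells. This is not Cassiopeia's code: the "-" missing-data symbol, the mismatch-fraction distance, the function names, and the example cells are all assumptions made for illustration.

```python
from collections import Counter

MISSING = "-"  # assumed missing-data symbol for this sketch


def mismatch_fraction(a, b):
    """Fraction of differing states over characters observed in both cells."""
    shared = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if not shared:
        return 1.0  # nothing to compare: treat as maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def knn_assign(cell, assigned, k=3):
    """Assign `cell` to the group most common among its k nearest neighbors.

    assigned: list of (character_vector, group_label) for cells that the
    split has already placed.
    """
    nearest = sorted(assigned, key=lambda item: mismatch_fraction(cell, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical cells already placed on either side of a Greedy split.
assigned = [
    (["1", "2", "0"], "left"),
    (["1", "2", "3"], "left"),
    (["0", "0", "3"], "right"),
    (["0", "4", "3"], "right"),
]
ambiguous_cell = ["1", MISSING, "0"]  # the splitting character is missing

group = knn_assign(ambiguous_cell, assigned, k=3)
print(group)
```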

As a reminder, you can look at all parameters that reconstruct-lineage and stress-test allow by using the -h flag.

Cassiopeia

This is a software suite for processing data from single cell lineage tracing experiments. The suite comes equipped with three main modules:

  • Target Site Sequencing Pipeline: a pipeline for extracting lineage information from raw fastqs produced from a lineage tracing experiment.
  • Phylogeny Reconstruction: a collection of tools for constructing phylogenies. We support 5 algorithms currently: a greedy algorithm based on multi-state compatibility, an exact Steiner-Tree solver, Cassiopeia (the combination of these two), Neighbor-Joining, and Camin-Sokal Maximum Parsimony.
  • Benchmarking: a set of tools for benchmarking, comprising a simulation framework and tree comparison tools.

You can find all documentation here

You can also find example notebooks in this repository:

Free Software: MIT License

Installation

  1. Clone the package as so: git clone https://github.com/YosefLab/Cassiopeia.git

  2. Ensure that you have Python 3.6 installed.

  3. Make sure that Gurobi is installed. You can follow the instructions listed here. To verify that it's working correctly, use the following tests:

    • Run the command gurobi.sh from a terminal window
    • From the Gurobi installation directory (where there is a setup.py file), use python setup.py install --user
  4. Make sure that Emboss is properly configured and installed; users often see a "command not found" error when attempting to align with the align_sequences function we have provided. This most likely means that the Emboss binary has not been added to your PATH variable. For details on how to download, configure, and install the Emboss package, refer to this tutorial.

  5. One of Cassiopeia's dependencies, pysam, requires HTSLib to be installed. You can read about pysam's requirements here.

  6. Ensure that Cython is installed. You can do this via python3.6 -m pip install --user cython.

  7. While we get pip working, it's best to first clone the package and then follow these instructions:

    • python3.6 setup.py build
    • python3.6 setup.py build_ext --inplace
    • python3.6 -m pip install . --user

To verify that it installed correctly, try using the package in a python session: import cassiopeia. Then, to make sure that the command-line tools work, try reconstruct-lineage -h and confirm that you get the usage details.
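The same check can be scripted. The sketch below uses only the standard library to report whether the cassiopeia package is importable and whether the reconstruct-lineage command is on your PATH; the helper name check_install is hypothetical and not part of Cassiopeia.

```python
import importlib.util
import shutil


def check_install(package="cassiopeia", tools=("reconstruct-lineage",)):
    """Report whether the package and its command-line tools can be found."""
    # find_spec locates a top-level package without importing it.
    report = {package: importlib.util.find_spec(package) is not None}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


report = check_install()
for name, found in report.items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```

A "NOT found" entry for a command-line tool usually points to a PATH problem rather than a failed build.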

Command Line Tools

In addition to allowing users to use Cassiopeia from a python session, we provide five unique command line tools for common pipeline procedures:

  • reconstruct-lineage: Reconstructs a lineage from a provided character matrix (consisting of cells x characters where each element is the observed state of that character in that cell).
  • post-process-tree: Post-processes trees after reconstruction to assign sample identities back to the leaves of the tree and to remove any leaves that don't correspond to a sample in the character matrix.
  • stress-test: Conduct stress testing on a given simulated tree. Writes out a new tree file after inferring a tree from the unique leaves of the "true", simulated tree.
  • call-lineages: Perform lineage group calling from a molecule table.
  • filter-molecule-table: Perform molecule table filtering.

All usage details can be found by using the -h flag.
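As an illustration of the "cells x characters" layout described for reconstruct-lineage, the sketch below builds a tiny character matrix and renders it as tab-separated text. The cell names, the r1..r3 column names, the "-" missing-data symbol, and the tab-separated layout are assumptions for this example; check reconstruct-lineage -h for the input format the tool actually expects.

```python
import csv
import io

# Hypothetical data: 3 cells x 3 characters. Each entry is the observed
# state of that character in that cell; "-" marks missing data (assumed).
character_matrix = {
    "cell_1": ["1", "0", "2"],
    "cell_2": ["1", "-", "2"],
    "cell_3": ["0", "3", "2"],
}


def to_tsv(matrix, n_chars):
    """Render the matrix with one row per cell and one column per character."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["cell"] + [f"r{i + 1}" for i in range(n_chars)])
    for cell, states in matrix.items():
        writer.writerow([cell] + states)
    return buffer.getvalue()


tsv_text = to_tsv(character_matrix, 3)
print(tsv_text)
```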

cassiopeia's People

Contributors

mattjones315, alexkhodaverdian
