mtdna-dlp

Scripts to generate the figures and tables for the mtDNA DLP+ paper.

Installation (in the command line)

Clone this repository (git clone https://github.com/reznik-lab/mtdna-dlp.git)

Install miniconda: https://docs.conda.io/en/latest/miniconda.html

Create a new conda environment and activate it (make sure it is activated before installing the dependencies):

conda create -n [environment]
conda activate [environment]

Install the following:

conda install -c bioconda ensembl-vep=95.3
conda install -c bioconda vt
conda install -c bioconda pybedtools
conda install -c bioconda bcftools
conda install -c bioconda samtools
conda install -c bioconda vcf2maf
conda install -c bioconda pysam
conda install -c bioconda gatk4
conda install -c conda-forge biopython
conda install numpy
conda install pandas
conda install -c conda-forge matplotlib
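
After installing, it can help to confirm that the environment actually exposes everything the pipeline needs. The following is a minimal sketch, not part of the repository: the executable names (`vep`, `vt`, `bcftools`, `samtools`, `gatk`, `vcf2maf.pl`) and import names are assumptions about what the conda packages above provide, so adjust them if your installation differs.

```python
import importlib.util
import shutil

# Assumed executable names provided by the conda packages listed above.
EXECUTABLES = ["vep", "vt", "bcftools", "samtools", "gatk", "vcf2maf.pl"]
# Assumed Python import names (biopython is imported as "Bio").
MODULES = ["pybedtools", "pysam", "Bio", "numpy", "pandas", "matplotlib"]


def find_missing(executables, modules):
    """Return (missing_executables, missing_modules) for the active environment."""
    # shutil.which returns None when a program is not found on PATH.
    missing_exe = [name for name in executables if shutil.which(name) is None]
    # find_spec returns None when a top-level module cannot be located.
    missing_mod = [name for name in modules if importlib.util.find_spec(name) is None]
    return missing_exe, missing_mod


if __name__ == "__main__":
    missing_exe, missing_mod = find_missing(EXECUTABLES, MODULES)
    print("Missing executables:", ", ".join(missing_exe) or "none")
    print("Missing Python modules:", ", ".join(missing_mod) or "none")
```

Run it with the conda environment activated; anything reported as missing points to an install step that needs repeating.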

Additionally, a VEP offline cache must be installed. NOTE: the cache release must match the installed VEP release (release 95 for the ensembl-vep=95.3 package listed above). Please refer to https://uswest.ensembl.org/info/docs/tools/vep/script/vep_cache.html for instructions on how to install a VEP cache. Because the caches are large, installation will likely take several hours.
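
The version requirement can be checked before starting a long run. The sketch below assumes the usual VEP cache layout, `<cache_dir>/homo_sapiens/<release>_<assembly>` (for example `95_GRCh37`); the function names are hypothetical and not part of this repository.

```python
from pathlib import Path


def installed_caches(cache_dir, species="homo_sapiens"):
    """List (release, assembly) pairs found under <cache_dir>/<species>/."""
    species_dir = Path(cache_dir) / species
    if not species_dir.is_dir():
        return []
    found = []
    for entry in sorted(species_dir.iterdir()):
        # Assumed directory naming: "<release>_<assembly>", e.g. "95_GRCh37".
        release, sep, assembly = entry.name.partition("_")
        if entry.is_dir() and sep and release.isdigit():
            found.append((int(release), assembly))
    return found


def cache_matches(cache_dir, vep_version, species="homo_sapiens"):
    """Return True if a cache with the same release as the VEP major version exists."""
    major = int(str(vep_version).split(".")[0])
    return any(release == major for release, _ in installed_caches(cache_dir, species))
```

For example, `cache_matches(Path.home() / ".vep", "95.3")` checks VEP's default cache location for a release 95 cache.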

Running the single cell pipeline (in the command line)

Navigate to the directory with the scMTpipeline.py file and run the following (replace all brackets):

conda activate [environment]
python3 scMTpipeline.py -d [data_directory] -r [reference_fasta] -w [working_directory] -l [library_id] -re [results_directory] -vc [optional_vep_cache_directory] -q [optional_mapping_quality] -Q [optional_base_quality] -s [optional_strand] -p [optional_patternlist] -t [optional_threshold]

Parameter descriptions:

  • Data directory: path to directory with input .bam files
  • Reference fasta: path to fasta file (recommended to use [working_directory]/reference/b37/b37_MT.fa for GRCh37 or [working_directory]/reference/GRCh38/genome_MT.fa for GRCh38)
  • Working directory: path to directory with scMTpipeline.py file in it
  • Library ID: name of .bam file to use as input
  • Results directory: path to directory where results will be stored
  • (OPTIONAL) VEP cache directory: path to directory with VEP cache
  • (OPTIONAL) Mapping quality: minimum mapping quality (default=20)
  • (OPTIONAL) Base quality: minimum base quality (default=20)
  • (OPTIONAL) Strand: minimum number of reads mapping to forward and reverse strand to call mutation (default=2)
  • (OPTIONAL) Patternlist: file containing a list of filenames to process at a time (to be used when there are many files to process)
  • (OPTIONAL) Threshold: critical threshold for calling a cell wild-type (default=0.1)
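
When the pipeline is launched from another script, the flags above can be assembled programmatically. This is a rough sketch: the flags are taken from the usage line, while the function name and the keyword names for the optional parameters are hypothetical choices made for illustration.

```python
# Optional flags from the usage line, keyed by illustrative (hypothetical) names.
OPTIONAL_FLAGS = {
    "vep_cache": "-vc",
    "mapping_quality": "-q",
    "base_quality": "-Q",
    "strand": "-s",
    "patternlist": "-p",
    "threshold": "-t",
}


def build_command(data_dir, reference, working_dir, library_id, results_dir, **optional):
    """Build the scMTpipeline.py argument list; optional values of None are skipped."""
    cmd = [
        "python3", "scMTpipeline.py",
        "-d", str(data_dir),
        "-r", str(reference),
        "-w", str(working_dir),
        "-l", str(library_id),
        "-re", str(results_dir),
    ]
    for name, value in optional.items():
        if name not in OPTIONAL_FLAGS:
            raise ValueError(f"unknown optional parameter: {name}")
        if value is not None:
            cmd += [OPTIONAL_FLAGS[name], str(value)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, cwd=working_dir, check=True)`, which avoids shell quoting problems with paths that contain spaces.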

For example, a call to run the single cell pipeline with the minimum parameters could look like this:

python3 scMTpipeline.py -d /my_data/ -r /my_home/mtdna-dlp/python/reference/b37/b37_MT.fa -w /my_home/mtdna-dlp/python/ -l my_file -re /my_home/mtdna-dlp/results/

To run scMTpipeline.py on the provided example data:

python3 scMTpipeline.py -d /mtdna-dlp/python/example_data/ -r /mtdna-dlp/python/reference/b37/b37_MT.fa -w /mtdna-dlp/python/ -l SA1101-A96155C -re /mtdna-dlp/python/results
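
To process several cells, the library IDs can be collected from the data directory and passed to `-l` one at a time. The sketch below assumes that a library ID is the .bam file name without its extension, as the `SA1101-A96155C` example suggests; verify this against your own data before relying on it. The function name is hypothetical.

```python
from pathlib import Path


def library_ids(data_dir):
    """Return the sorted names of the .bam files in data_dir, without the extension."""
    # Only files ending in ".bam" match, so index files such as "x.bam.bai" are skipped.
    return sorted(path.stem for path in Path(data_dir).glob("*.bam"))


if __name__ == "__main__":
    for library_id in library_ids("/mtdna-dlp/python/example_data/"):
        print(library_id)
```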
