diffExprProject

Description

  • This is a project for Loyola University Chicago's computational biology class (COMP 383). I chose track 1, which is differential expression.
  • This project presents a pipeline that lets a user determine the most differentially expressed genes between samples of sequencing reads, then search a genetic database for other species of a specified family that express those genes. The pipeline is configurable through command line flags, so it can be adapted to other inputs.
  • The project requires the following to be preinstalled:
  1. kallisto - command line application that quantifies transcript abundance in transcripts per million (TPM) by pseudoaligning a set of reads to a reference transcriptome index
  2. ncbi-blast+ - command line application that runs a BLAST search of a query against a locally downloaded genome database
  3. fastq-dump - command line application (part of the SRA Toolkit) that converts SRA objects to FASTQ and can split them into forward and reverse reads
  4. R packages - sleuth and dplyr, for statistical analysis of TPMs to find the most differentially expressed transcripts between the samples
  5. Python packages - listed in requirements.txt; see the Set-up section below
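Before running the pipeline, it can help to confirm that the command line tools above are on your PATH. The following is a minimal sketch and not part of this repository: the function name `missing_tools` is hypothetical, and `blastn` is assumed as a representative executable for ncbi-blast+, which installs several binaries.

```python
import shutil

# Executable names are assumptions: "blastn" stands in for the ncbi-blast+
# suite, which installs several binaries.
REQUIRED_TOOLS = ("kallisto", "blastn", "fastq-dump")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing command line tools:", ", ".join(missing))
    else:
        print("All command line tools found.")
```

This only checks that the executables can be found; it does not check versions or the R packages.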

Set-up

  1. First clone this repository locally with the following command:
    • git clone [HTTPS]
  2. Once cloned, move into the main project folder such that your current working directory is the diffExprProject folder
    • cd diffExprProject
  3. Install python requirements
    • create a virtual environment or conda environment, or install locally; below is an example using Python virtual environments
      • python3 -m venv venv -> this creates a virtual environment
      • source venv/bin/activate -> this activates it; you should see (venv) in your prompt
    • install the requirements from requirements.txt
      • pip3 install -r requirements.txt

Test Run

RUN

  1. To run the test
    • python3 run.py -e [EMAIL] -t testData # pass the testData folder name exactly as shown, without any slashes or relative paths (i.e. . or ..)
  2. For more information on flags and arguments
    • python3 run.py --help
  • Make sure the email is one that can be used to access NCBI sequences, and run the command from the main project directory.
  • The testData folder contains sample fastq files, a Betaherpesvirinae genome database folder to blast the most significantly expressed genes against, and a metatable that stores information about the sample data.
  • The links folder contains the SRA links from which the data comes.
  • If you rename the testData folder or supply a different one, update the metatable (-m) and links (-s) flags to match; they default to paths inside the test folder but can be adjusted.
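The restriction on the -t argument (a bare folder name, no slashes, no relative paths) can be illustrated with a small check. This is a sketch of the rule as described above, not the validation run.py actually performs; the function name `is_plain_folder_name` is hypothetical.

```python
def is_plain_folder_name(name):
    """Return True if `name` is a bare folder name: no slashes, not '.' or '..'."""
    if name in ("", ".", ".."):
        return False
    # Reject both forward and backward slashes so "testData/" and "./testData" fail.
    return "/" not in name and "\\" not in name


if __name__ == "__main__":
    for candidate in ("testData", "testData/", "./testData", ".."):
        print(candidate, is_plain_folder_name(candidate))
```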

Output

  • The outputs are saved in the PipelineProject_Rohan_Sethi directory: results are saved in results, and data used as input to the various functions is saved in data
  • The log file contains information from the test run
  • The log file already in this repo is from the full run, not from the test sample data

More Complicated Run

  • You can run run.py as a whole, or run each script separately to suit your needs; use the --help flag on any script you want to run separately to get more information

  • flags shown in brackets in the usage line are optional and default to the following (you may want to change them if not running the test data):

    • -s = testData/links/fileLinks.txt
    • -i = NC_006273.2
    • -e = no default; must be specified
    • -m = testData/metatable.tsv
    • -l = ./PipelineProject_Rohan_Sethi/PipelineProject.log
    • -n = Betaherpesvirinae
    • -b = None
    • -u = 10
    • -t = None
  • usage: python3 run.py [-h] [-s INPUT] [-i INDEX] -e EMAIL [-m METATABLE] [-l LOGFILE] [-n NAME] [-b BLASTDB] [-u NUMSELECT] [-t TESTDATA]

  • options:

    • -h, --help show this help message and exit
    • -s INPUT, --input INPUT input file with NCBI links for the SRA sample data to download from
    • -i INDEX, --index INDEX input accession id for index to assemble the reads
    • -e EMAIL, --email EMAIL input email for NCBI access via biopython
    • -m METATABLE, --metatable METATABLE tab-delimited metatable containing information about the samples; see the example in testData
    • -l LOGFILE, --logfile LOGFILE name/path of the tab-delimited log file that stores important output information
    • -n NAME, --name NAME name of species to blast against to see what other species the most differentially expressed genes are expressed in
    • -b BLASTDB, --blastdb BLASTDB path to an existing blast genome fasta file, if you already have one and want to skip the download step; must match the name parameter if used; if not given, the genome passed to the name parameter is downloaded
    • -u NUMSELECT, --numSelect NUMSELECT number of blast results to store from the blast search
    • -t TESTDATA, --testData TESTDATA input test data folder name; only if wanting to run test run, else ignore
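The usage line and defaults above are consistent with a standard argparse parser. The following is a sketch reconstructed from the documented flags and defaults, not the parser in run.py itself; the function name `build_parser` and the `type=int` on -u are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags and defaults documented above."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("-s", "--input", default="testData/links/fileLinks.txt",
                        help="input file with NCBI links for the SRA sample data")
    parser.add_argument("-i", "--index", default="NC_006273.2",
                        help="accession id for the index")
    # The only required flag: no default is documented for the email.
    parser.add_argument("-e", "--email", required=True,
                        help="email for NCBI access via biopython")
    parser.add_argument("-m", "--metatable", default="testData/metatable.tsv",
                        help="tab-delimited metatable describing the samples")
    parser.add_argument("-l", "--logfile",
                        default="./PipelineProject_Rohan_Sethi/PipelineProject.log",
                        help="name/path of the log file")
    parser.add_argument("-n", "--name", default="Betaherpesvirinae",
                        help="name to blast against")
    parser.add_argument("-b", "--blastdb", default=None,
                        help="path to an existing blast genome fasta file")
    # type=int is an assumption; the README only documents the default of 10.
    parser.add_argument("-u", "--numSelect", type=int, default=10,
                        help="number of blast results to store")
    parser.add_argument("-t", "--testData", default=None,
                        help="test data folder name, only for a test run")
    return parser


if __name__ == "__main__":
    # Parse the same arguments as the test run, with a placeholder email.
    args = build_parser().parse_args(["-e", "user@example.com", "-t", "testData"])
    print(args)
```

Parsing only `-e` and `-t`, as in the test run, leaves every other flag at its documented default.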
