
eorgekit / bacterial_variant_calling


This project forked from uct-cbio/bacterial_variant_calling


A pipeline for variant calling on bacterial genomes created with Nextflow and singularity / docker

License: MIT License

Python 27.98% Perl 4.32% Groovy 4.56% R 11.28% CSS 1.02% Nextflow 50.84%

bacterial_variant_calling's Introduction

uct-cbio/bacterial_variant_calling

Bacterial variant calling and phylogenetics

A pipeline for variant calling on bacterial genomes created with Nextflow and singularity / docker

This pipeline is currently under development. If you wish to use it in future, please feel free to watch the repository.

Quickstart

nextflow run uct-cbio/bacterial_variant_calling --reads sample_sheet.csv --genome H37Rv.fa -with-docker bacterial_env

This assumes you have a sample sheet formatted as described below, and a Docker image called 'bacterial_env' created with VarDock.

Basic usage:

The typical command for running the pipeline is as follows:

nextflow run uct-cbio/bacterial_variant_calling --reads sample_sheet.csv --genome refgenome.fa -profile uct_hex
Mandatory arguments:
  --reads                       Path to input data (must be surrounded with quotes)
  --genome                      Path to the reference genome against which the reads will be aligned (in FASTA format).
  -profile                      Hardware config to use. Currently a profile is available for UCT's HPC ('uct_hex'); create your own if necessary

Other arguments:
  --outdir                      The output directory where the results will be saved
  --SRAdir                      The directory where reads downloaded from the SRA will be stored
  --email                       Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits
  -name                     

Example run: To run on UCT hex

  1. Start a 'screen' session from the headnode

  2. Start an interactive job using: qsub -I -q UCTlong -l nodes=1:series600:ppn=1 -d $(pwd)

  3. A typical command would look something like:

    nextflow run uct-cbio/bacterial_variant_calling --reads sample_sheet.csv --genome refgenome.fa -profile uct_hex --SRAdir /path/to/writable/dir/

If you are using reads from the SRA, these will be downloaded using the SRA toolkit and deposited in the specified --SRAdir. Please make sure that this directory is writable.
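Because a non-writable --SRAdir would only surface once the download step runs, it can be worth checking the directory beforehand. The following is a minimal sketch of such a check; the helper name `is_writable_dir` is hypothetical and is not part of the pipeline.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if `path` is an existing directory in which a file can be created.

    Hypothetical helper for illustration; the pipeline itself does not ship it.
    """
    if not os.path.isdir(path):
        return False
    try:
        # Creating (and automatically deleting) a temporary file is a more
        # reliable test than inspecting permission bits.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder path taken from the example command above.
    print(is_writable_dir("/path/to/writable/dir/"))
```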

Sample file

To allow both local reads and reads from the SRA to be used, the pipeline can pull reads from the SRA based on the accession number (e.g., SRR5989977).

Each row must have a unique value in the 'number' column.

number origin replicate isolate R1 R2
1 genomic 1 wgs_sample_1 path/to/reads/reads_R1.fq path/to/reads/reads_R2.fq
2 genomic 2 wgs_sample_1 path/to/reads/reads_R1.fq path/to/reads/reads_R2.fq
3 genomic 3 wgs_sample_1 path/to/reads/reads_R1.fq path/to/reads/reads_R2.fq
4 genomic 1 wgs_sample_2 path/to/reads/reads_R1.fq path/to/reads/reads_R2.fq
5 genomic 2 wgs_sample_2 path/to/reads/reads_R1.fq path/to/reads/reads_R2.fq
6 genomic 3 wgs_sample_2 path/to/reads/reads_R1.fq path/to/reads/reads_R2.fq
7 genomic 1 wgs_sample_3 path/to/reads/reads_R1.fq path/to/reads/reads_R2.fq
8 genomic 2 wgs_sample_3 path/to/reads/reads_R1.fq path/to/reads/reads_R2.fq
9 genomic 3 wgs_sample_3 path/to/reads/reads_R1.fq path/to/reads/reads_R2.fq
10 genomic 1 H37Rv SRR5989977

In the above example, samples 1-9 are stored locally, whereas sample 10 is a control sample from the SRA. Including the accession number in the R1 column causes the reads to be downloaded from the SRA and used in the analysis. The sample sheet must be exported to a CSV file, with a comma ',' separating the columns:

number,origin,replicate,isolate,R1,R2
1,genomic,1,wgs_sample_1,path/to/reads/reads_R1.fq,path/to/reads/reads_R2.fq
2,genomic,2,wgs_sample_1,path/to/reads/reads_R1.fq,path/to/reads/reads_R2.fq
...
10,genomic,1,H37Rv,SRR5989977
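The rules above (a fixed header, unique 'number' values, and an SRA accession allowed in the R1 column) can be checked before launching a run. The following is a rough sketch, not the pipeline's own parser: the function names are hypothetical, and the accession pattern (SRR, ERR or DRR followed by digits) is an assumption about what counts as an SRA run accession.

```python
import csv
import io
import re

COLUMNS = ["number", "origin", "replicate", "isolate", "R1", "R2"]
# Assumption: run accessions look like SRR/ERR/DRR followed by digits.
SRA_RUN = re.compile(r"^[SED]RR\d+$")


def is_sra_accession(value):
    """Return True if `value` looks like an SRA run accession."""
    return bool(SRA_RUN.match(value.strip()))


def check_sample_sheet(text):
    """Return a list of problems found in sample-sheet CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    seen = set()
    for row in reader:
        number = row["number"]
        if number in seen:
            problems.append(f"duplicate number: {number}")
        seen.add(number)
        # Short rows (e.g. an SRA row without R2) yield None for missing cells.
        if not (row["R1"] or "").strip():
            problems.append(f"row {number}: R1 is empty")
    return problems


if __name__ == "__main__":
    sheet = (
        "number,origin,replicate,isolate,R1,R2\n"
        "1,genomic,1,wgs_sample_1,path/to/reads/reads_R1.fq,path/to/reads/reads_R2.fq\n"
        "10,genomic,1,H37Rv,SRR5989977\n"
    )
    print(check_sample_sheet(sheet))
    print(is_sra_accession("SRR5989977"))
```

The check only reports problems and does not modify the sheet, so it can be run on the same file that is passed to --reads.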

Prerequisites

Nextflow, Docker. All other dependencies are found in the included Docker recipe (VarDock).

Note: if you are working on UCT hex you can simply use the singularity image specified in the uct_hex profile.

Documentation

Read mapping: BWA

Variant caller used: Freebayes

Phylogenetic analysis: RAxML

Other useful info

To create a Singularity image from a Docker image, please make use of Docker to singularity. This is needed to run the pipeline on the UCT cluster.

Known issues

GFF vs GTF format

Some tools require GFF and others GTF, and converting between the formats is often impossible due to poor standard adoption. If the run fails, 90% of the time it is a format issue. The annotation files from NCBI are often the only ones that will work.
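When a run fails, a quick way to see which format an annotation file actually uses is to inspect the ninth (attributes) column: GFF3 writes attributes as key=value pairs, whereas GTF writes them as key "value"; pairs. The sketch below is a heuristic based on that difference only; the function name is hypothetical and it does not validate the file against either specification.

```python
import re

# GTF attributes look like: gene_id "Rv0001"; gene_name "dnaA";
GTF_ATTR = re.compile(r'^\s*\w+\s+"[^"]*"\s*;')
# GFF3 attributes look like: ID=gene0;Name=dnaA
GFF3_ATTR = re.compile(r"^\s*[^=;\s]+=[^;]*")


def guess_annotation_format(lines):
    """Guess 'gff3', 'gtf' or 'unknown' from the first informative line."""
    for line in lines:
        if line.startswith("##gff-version 3"):
            return "gff3"
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and other comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            return "unknown"  # neither format has a different column count
        attributes = fields[8]
        if GTF_ATTR.match(attributes):
            return "gtf"
        if GFF3_ATTR.match(attributes):
            return "gff3"
        return "unknown"
    return "unknown"


if __name__ == "__main__":
    example = ["NC_000962.3\tRefSeq\tgene\t1\t1524\t.\t+\t.\tID=gene0;Name=dnaA\n"]
    print(guess_annotation_format(example))
```

To check a real file, pass an open file handle, for example `guess_annotation_format(open("annotation.gff"))`, where the file name is a placeholder.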

Naming of fastq files

Built With

Credits

This pipeline was developed by members of the Bioinformatics Support Team (BST) at the University of Cape Town. Dr Jon Ambler, a member of CIDRI-Africa, is the main developer of this pipeline, which uses the layout and documentation outlined by Dr Katie Lennard and Gerrit Botha. It was adapted from the nf-core/rnaseq pipeline.

Additional thanks to Paolo Di Tommaso, the developer of Nextflow, for their help troubleshooting.

License

This project is licensed under the MIT License; see the LICENSE file for details.
