
mitochondrial-genome-scripts's Introduction

Mitochondrial-Genome-Scripts: The code associated with this repository was used to assemble and analyze the primary genome discussed in Kovar et al., in review with Genome Biology and Evolution.

There are two shell scripts included herein: one counts mononucleotide repeats and the other is an iterative genome assembler.

MONONUCLEOTIDE COUNTING: The first tool (Mononucleotide_repeat_calc.sh) is a simple script to count mononucleotide repeats in a genome (a fasta file). The script includes comments explaining each step and the input requirements.
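As a rough illustration of what counting mononucleotide repeats in a fasta file can look like, here is a minimal sketch. It is not the code from Mononucleotide_repeat_calc.sh. The function name count_mono_repeats, the default minimum run length of 5, and the three-column output (base, run length, number of runs) are all assumptions made for this example.

```shell
# Hypothetical helper, not taken from the repository's script.
# Usage: count_mono_repeats genome.fasta [min_run_length]
# Prints one tab-separated line per (base, run length): base, length, count.
count_mono_repeats() {
  local fasta="$1" min_len="${2:-5}"
  # Join each fasta record into a single upper-case line so that runs
  # spanning line breaks are seen whole, but never span two records.
  awk '/^>/ { if (seq != "") print seq; seq = ""; next }
       { seq = seq toupper($0) }
       END { if (seq != "") print seq }' "$fasta" \
    | grep -oE "A{${min_len},}|C{${min_len},}|G{${min_len},}|T{${min_len},}" \
    | awk '{ print substr($0, 1, 1) "\t" length($0) }' \
    | sort -k1,1 -k2,2n | uniq -c \
    | awk '{ print $2 "\t" $3 "\t" $1 }'
}
```

Calling count_mono_repeats genome.fasta 8 would report only runs of eight or more identical bases. Ambiguous bases such as N are ignored in this sketch.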

ITERATIVE GENOME ASSEMBLER: The second tool (Iternative_assembly_script.sh) is a pipeline for the iterative assembly of mitochondrial (or similar) genomes from long-read data (PacBio data). The software requirements are described below. The script also includes comments explaining each step.

Requirements: Ubuntu Linux Server (or similar), run from the command line.

You must have the following non-standard software installed and available on your PATH to run the script:

CANU (Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research 2017; doi: https://doi.org/10.1101/gr.215087.116)

Blasr (https://github.com/PacificBiosciences/blasr)

Seqtk (https://github.com/lh3/seqtk)
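Before starting a run, it may help to confirm that the three tools can be found on your PATH. The sketch below is one way to do that. The function name check_tools is hypothetical, and the executable names canu, blasr and seqtk in the commented usage line are assumed to match how the tools are installed on your system.

```shell
# Hypothetical helper: report any tool that is not on the PATH.
# Returns 0 if every tool was found, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing from PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (left commented so that defining the function has no side effects):
# check_tools canu blasr seqtk || exit 1
```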

You must have the following files, under exactly these names, in a single input/output folder that will be used for the run:

“seed0.fasta” – this is the initial reference genome(s) needed to start the first round of assembly. If you use multiple starting references they should be in this single file as separate fasta sequences.

“round0.fastq” – this file must exist but be completely empty.

“seed0.blasr.sort.list.out.fixed” – this file must exist but be completely empty.
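The three starting files can be put in place with a few commands. The sketch below assumes a hypothetical helper named prepare_run_dir that takes the run folder and the path to your own reference fasta; only the three file names come from the description above.

```shell
# Hypothetical helper: create the run folder with the three required files.
# Usage: prepare_run_dir /path/to/run_folder /path/to/reference.fasta
prepare_run_dir() {
  local run_dir="$1" seed_fasta="$2"
  mkdir -p "$run_dir" || return 1
  # Initial reference genome(s); multiple references go in this one file.
  cp "$seed_fasta" "$run_dir/seed0.fasta" || return 1
  # These two files must exist and be completely empty.
  : > "$run_dir/round0.fastq"
  : > "$run_dir/seed0.blasr.sort.list.out.fixed"
}
```

Using ": >" rather than touch truncates the file if it already exists, so a leftover file from an earlier run is emptied as well.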

The for loop in this bash script runs 10 cycles of assembly. Each round identifies the reads matching the starting genome and runs an assembly on the available data. See the full manuscript for more information.
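To make the round structure easier to picture, here is a dry-run sketch that only prints a plan and runs no mapping or assembly. It assumes that each round uses the previous round's output as its reference and that files are named by extending the seed0/round0 pattern (seedN.fasta, roundN.fastq). Neither assumption is confirmed here, so check the script's own comments for the actual names. The function name plan_rounds is hypothetical.

```shell
# Hypothetical dry run: print the assumed input and output files of each round.
# Usage: plan_rounds [number_of_rounds]   (defaults to 10)
plan_rounds() {
  local n_rounds="${1:-10}" i prev
  for i in $(seq 1 "$n_rounds"); do
    prev=$(( i - 1 ))
    # Assumed flow: reference from the previous round -> matching reads -> new assembly.
    echo "round ${i}: seed${prev}.fasta -> round${i}.fastq -> seed${i}.fasta"
  done
}
```

In a real run, each printed line would correspond to a read-mapping step, a read-extraction step and an assembly step inside the loop body.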

If you want to use this script, you will need to substitute all of your own appropriate paths to folders and starting files.


Contributors: cdb3ny
