mariachiaragrieco / prunus-cp-genome-assembly


Assembly of chloroplast genome using long-reads and short-reads for Genomics Course 2021 (MSc Bioinformatics for Computational Genomics)

Shell 100.00%
genome-assembly chloroplast pacbio nanopore illumina genomics long-reads

prunus-cp-genome-assembly

Project for the Genomics Course 2021 (MSc Bioinformatics for Computational Genomics), held by Prof. Aureliano Gomez Bombarely at Università degli Studi di Milano.

Aim

The project aims to investigate and compare de novo assemblies of the chloroplast genome built from Illumina short reads and from PacBio and Oxford Nanopore long reads.

The chloroplast genome of Prunus avium was chosen for the assembly.

Description

1. Retrieving raw data

The raw sequencing data were retrieved from the NCBI SRA repository under the accession number:

The reference chloroplast genomes of Prunus avium (MK622380.1) and Prunus apetala (NC_053693.1) were taken from the NCBI Nucleotide database.

The raw data were downloaded and converted into FASTQ files using fastq-dump.

Statistics for the FASTQ files were obtained using fastq-stats.
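The retrieval step could be wrapped in a small function like the one below. This is only a sketch: the function name `fetch_run`, the default output directory `raw_data`, and the `.stats.txt` file naming are illustrative assumptions, and the run accession has to be supplied by the caller. It assumes `fastq-dump` (SRA Toolkit) and `fastq-stats` (ea-utils) are on the `PATH`.

```shell
# Hypothetical helper: download one SRA run as FASTQ and summarise each file.
# Usage: fetch_run <SRA_RUN_ACCESSION> [output_dir]
fetch_run() {
    local run="$1"
    local outdir="${2:-raw_data}"   # assumed default directory name
    mkdir -p "$outdir"

    # --split-files writes the reads of a pair to separate _1/_2 files
    fastq-dump --split-files --outdir "$outdir" "$run"

    # fastq-stats prints read count, length and quality summaries to stdout
    local fq
    for fq in "$outdir/$run"*.fastq; do
        fastq-stats "$fq" > "${fq%.fastq}.stats.txt"
    done
}
```

Calling `fetch_run <accession> raw_data` once per dataset would leave one FASTQ file (or one pair) and one statistics file per run in `raw_data`.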

2. Mapping

The reads were mapped to both the Prunus avium and the Prunus apetala reference using:

Illumina reads were pre-processed with fastq-mcf to remove adapters before mapping.

The reads mapping to the chloroplast genome were extracted using samtools view.

The mapping stats were evaluated with samtools stats.
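A possible shape for the trimming, extraction and statistics commands is sketched below. The mapper itself is left out, so `extract_mapped` takes an existing SAM/BAM file as input. All function names and output file names are hypothetical; the `-F 4` filter (drop records flagged as unmapped) is an assumption about how the mapped reads were selected.

```shell
# Hypothetical helper: trim adapters from a read pair with fastq-mcf.
# Usage: trim_pair <adapters.fa> <reads_1.fastq> <reads_2.fastq>
trim_pair() {
    local adapters="$1" r1="$2" r2="$3"
    fastq-mcf "$adapters" "$r1" "$r2" \
        -o "${r1%.fastq}.trimmed.fastq" \
        -o "${r2%.fastq}.trimmed.fastq"
}

# Keep only mapped reads and write samtools stats for them.
# Usage: extract_mapped <alignments.sam|bam> <mapped.bam>
extract_mapped() {
    local in_aln="$1" out_bam="$2"
    # -b: BAM output; -F 4: exclude records with the "unmapped" flag set
    samtools view -b -F 4 -o "$out_bam" "$in_aln"
    samtools stats "$out_bam" > "${out_bam%.bam}.stats.txt"
}

# Print only the summary-number (SN) section of a samtools stats report.
summary_numbers() {
    grep '^SN' "$1" | cut -f 2-
}
```

`summary_numbers mapped.stats.txt` gives a compact view (reads mapped, error rate, average length, and so on) that is convenient for comparing the two references.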

3. Coverage evaluation

The mapped reads were sorted by position using samtools sort.

bedtools genomecov with the option -d was used to obtain the depth at each genome position.
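The coverage step might look roughly like this. The function names and the `.sorted.bam` / `.depth.tsv` suffixes are assumptions for illustration; `mean_depth` is an extra convenience that relies only on the three-column output of `bedtools genomecov -d` (sequence name, 1-based position, depth).

```shell
# Sort a BAM by position and write per-base depth.
# Usage: depth_per_base <mapped.bam> <output_prefix>
depth_per_base() {
    local in_bam="$1" prefix="$2"
    samtools sort -o "$prefix.sorted.bam" "$in_bam"
    # -d reports the depth at every position, including zero-coverage ones
    bedtools genomecov -d -ibam "$prefix.sorted.bam" > "$prefix.depth.tsv"
}

# Average the third column (depth) of a genomecov -d table.
mean_depth() {
    awk -F '\t' '{ sum += $3 } END { if (NR > 0) printf "%.2f\n", sum / NR }' "$1"
}
```

Because zero-depth positions are included in the table, the mean is taken over the whole reference rather than only over covered bases.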

4. Assembly

The BAM files obtained from the mapping to Prunus avium were converted into FASTQ with bedtools bamtofastq.

In order to find the best assembly, several subsets of reads were generated with seqtk sample.
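As a sketch of the conversion and subsampling, assuming single-file FASTQ output: the function name, the seed value 11 and the `.subN.fastq` naming are all illustrative, and the subset sizes are passed by the caller. For paired-end data, `bedtools bamtofastq` would additionally need `-fq2` and a name-sorted BAM, which this sketch does not handle.

```shell
# Convert a BAM to FASTQ and draw random subsets of the given sizes.
# Usage: subsample_reads <mapped.bam> <output_prefix> <n_reads> [<n_reads> ...]
subsample_reads() {
    local in_bam="$1" prefix="$2"
    shift 2
    bedtools bamtofastq -i "$in_bam" -fq "$prefix.fastq"

    local n
    for n in "$@"; do
        # -s fixes the random seed so that a subset can be regenerated
        seqtk sample -s 11 "$prefix.fastq" "$n" > "$prefix.sub$n.fastq"
    done
}
```

Using the same seed on both files of a read pair keeps the mates in sync, which is the usual reason to set `-s` explicitly.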

The assemblies were generated with:

  • canu for long reads
  • ABySS for short reads, comparing different k-mer sizes
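The two assemblers could be driven along these lines. This is a sketch under several assumptions: `genomeSize=160k` is a rough chloroplast size chosen for illustration, the directory and assembly names are hypothetical, and the k-mer values are supplied by the caller. The read-type flag is `-nanopore` or `-pacbio` in canu 2.x (older releases use `-nanopore-raw` / `-pacbio-raw`), and some ABySS versions may need further parameters beyond `name`, `k` and `in`.

```shell
# Long-read assembly with canu.
# Usage: assemble_long <reads.fastq> <prefix> [-nanopore|-pacbio]
assemble_long() {
    local reads="$1" prefix="$2" tech="${3:--nanopore}"
    # genomeSize is an assumed approximate value, not taken from the project
    canu -p "$prefix" -d "$prefix.canu" genomeSize=160k "$tech" "$reads"
}

# Short-read assembly with ABySS, one run per k-mer size.
# Usage: assemble_short <reads_1.fastq> <reads_2.fastq> <k> [<k> ...]
assemble_short() {
    local r1 r2 k
    # absolute paths, because abyss-pe -C changes into the output directory
    r1=$(realpath "$1")
    r2=$(realpath "$2")
    shift 2
    for k in "$@"; do
        mkdir -p "abyss_k$k"
        abyss-pe -C "abyss_k$k" name="k$k" k="$k" in="$r1 $r2"
    done
}
```

Keeping each k-mer size in its own directory makes it straightforward to compare the resulting contig sets afterwards.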

The statistics of each assembly were obtained with FastaSeqStats.

5. Annotation of the longest contigs reconstructed

The longest contigs for each dataset were selected using FastaExtract and aligned to the reference using BLASTN.
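For readers without FastaExtract at hand, the selection of the longest contig can be approximated with plain awk, followed by a BLASTN alignment against the reference. This is a stand-in, not the project's actual commands: the function names, the tabular output format (`-outfmt 6`) and the e-value cutoff are assumptions.

```shell
# Print the ID and length (tab-separated) of the longest sequence in a FASTA file.
longest_contig() {
    awk '/^>/ { id = substr($1, 2); next }
         { len[id] += length($0) }
         END {
             for (i in len) if (len[i] > max) { max = len[i]; best = i }
             print best "\t" max
         }' "$1"
}

# Align a contig to a reference sequence without building a BLAST database.
# Usage: align_to_reference <contig.fasta> <reference.fasta> <hits.tsv>
align_to_reference() {
    # -outfmt 6: tab-separated hits; the e-value threshold is an assumed choice
    blastn -query "$1" -subject "$2" -outfmt 6 -evalue 1e-10 > "$3"
}
```

If two contigs share the maximum length, `longest_contig` reports only one of them, in no guaranteed order.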

The annotation was performed using GeSeq.
