Coder Social home page Coder Social logo

alginment2vcf's Introduction

ALGINMENT2VCF

First Download Brachypodium Genome reference file

https://www.ncbi.nlm.nih.gov/guide/howto/dwn-genome/

Select Genomes FTP site .....

Screenshot from 2022-04-07 14-42-21

Select /refseq index

Screenshot from 2022-04-07 14-43-21

Select /plant Index

Screenshot from 2022-04-07 14-44-11

Select /Brachypodium _distachyon

Screenshot from 2022-04-07 14-44-43

Select /all_assembly_versions

Screenshot from 2022-04-07 14-45-53

I choose version one v1

Screenshot from 2022-04-07 14-46-34

I choose an ascension run ERR4835478 and used fastq-dump ERR4835478 to generate a fastq file

fastq-dump ERR4835478

I use BWA to align Fasta with FASTQ and generate BAM file

Screenshot from 2022-04-08 08-29-47

~9 Hours @ 18.1 GB

Take a look at the HEader of The FASTQ file

Screenshot from 2022-04-08 08-32-01

We run fastqc R package on our FASTQ file with the following R script

setwd("/home/michael/Desktop/Alignment")
install.packages("fastqcr")
library(fastqcr)
fastqc_install()
fastqc(fq.dir = "/home/michael/Desktop/Alignment/FASTQ",threads = 4)
qc.file <- "/home/michael/Desktop/Alignment/FASTQ/FASTQC/ERR4835478_fastqc.zip"
qc <- qc_read(qc.file)
names(qc)
#These commands are only necessary to plot in RStudio Viewer, otherwise the most important plots are automatically genearted to HTML file in FASTQC directory

qc_plot(qc, "summary")
qc_plot(qc, "Basic statistics")
qc_plot(qc, "Per base sequence quality")
qc_plot(qc, "Per base sequence content")
qc_plot(qc, "Per sequence GC content")
qc_plot(qc, "Per sequence quality scores")
qc_plot(qc, "Sequence duplication levels")
qc_plot(qc, "Per base N content")
qc_plot(qc, "Sequence length distribution")
qc_plot(qc, "Sequence duplication levels")
qc_plot(qc, "Overrepresented sequences")
qc_plot(qc, "Adapter content")
qc_plot(qc, "Kmer content")

Open the HTML file in the FASTQC directory Generated by fastqc function

I have taken several screenshots

Screenshot from 2022-04-08 08-54-24 Screenshot from 2022-04-08 08-54-35 Screenshot from 2022-04-08 08-54-46 Screenshot from 2022-04-08 08-54-58 Screenshot from 2022-04-08 08-55-08 Screenshot from 2022-04-08 08-55-18 Screenshot from 2022-04-08 08-55-28 Screenshot from 2022-04-08 08-55-38 Screenshot from 2022-04-08 08-55-49 Screenshot from 2022-04-08 08-55-59 Screenshot from 2022-04-08 08-56-07

Check the Data Structure

Screenshot from 2022-04-08 09-01-13

Index your reference genome FASTA file first

Screenshot from 2022-04-07 14-23-55

Use samtools to sort BAM file

Screenshot from 2022-04-08 09-36-32

Index the Sorted Reads

Screenshot from 2022-04-08 09-40-36

Make a VCF file with BAM, Indexed Bam, and Reference Genome FASTA

Screenshot from 2022-04-08 12-58-22

At this point you should have 6 directories and 17 files

Screenshot from 2022-04-08 13-05-08

Rename the sample to match Ascension Identifier

Screenshot from 2022-04-08 13-30-06

VCF Header and Info Fields

Screenshot from 2022-04-08 13-30-48

Use ChromQual to plot important figures with R Script

setwd("/home/michael/Desktop/Alignment/plots")
library(QTLseqr)
library(tinytex)
library(vcfR)
library(tidyr)
library(ggplot2)
library(dplyr)
library(ggrepel)
library(ggpubr)
library(data.table)
# Define a vector of Chromosomes
Chroms <- c("NC_016131.3","NC_016132.3","NC_016133.3","NC_016134.3","NC_016135.3")
file <- "vcfnewsamplename.vcf.gz"
QTLseqr::ChromQual(file = file,chromlist = Chroms, windowSize = 1e+05, HighLimQuality = 15000, scalar = 1, ncol = 5, binwidth1 = 10, binwidth2 = 100, p1 = TRUE, p2 = TRUE, p3 = TRUE, p4 = TRUE, p5 = TRUE)

Standard R Console Output

Screenshot from 2022-04-08 15-07-32

And Plots p1, p2, and p5 as Generated in View Panel

Screenshot from 2022-04-08 15-21-54

Screenshot from 2022-04-08 15-26-48

Screenshot from 2022-04-11 08-54-33

Using RMVP Package plot SNP densities per chromosome/contig

library(rMVP)
sample<-"ERR4835478"
pathtosample <- "/home/michael/Desktop/Alignment/plots/vcfnewsamplename.vcf"
out<- paste0("mvp.",sample,".vcf")
memo<-paste0(sample)
dffile<-paste0("mvp.",sample,".vcf.geno.map")

message("Making MVP data S1")
MVP.Data(fileVCF=pathtosample,
         #filePhe="Phenotype.txt",
         fileKin=FALSE,
         filePC=FALSE,
         out=out
)
message("Reading MVP Data S1")
df <- read.table(file = dffile, header=TRUE)
message("Making SNP Density Plots")
MVP.Report.Density(df[,c(1:3)], bin.size = 100000, col = c("blue", "yellow", "red"), memo = memo, file.type = "jpg", dpi=300)

Screenshot from 2022-04-08 15-54-00

The IGV can be used to view Variants as well

A Big Picture Overview

Screenshot from 2022-04-11 17-33-26

And the first two varaints

Screenshot from 2022-04-11 14-09-00

Can also see particular statistics on each variant by double clicking

Screenshot from 2022-04-11 14-14-49

The Reference Genome makes the Protein Asparagine from the DNA sequence AAT

So does the SNP change the protein that is made, yes in some cases it does including this one.

Here is the molecular structure of Asparagine

Asparagine

The SNP changes AAT to GAT which makes the protein Aspartic Acid

AsparticAcid

So similar proteins but NOT THE SAME!

A Good reference for AMino Acids

https://www.aatbio.com/data-sets/amino-acid-reference-chart-table

#We can also use Qualimap for more analysis. I made an environment for this purpose. qualimapenvironment

#And then I ran this command qualimap

#Select file drop down to New Analysis and finally BAMQC. Browse your directory for the sorted BAM file and >>>>> Start Analysis aag

99

100

101

102

103

104

105

106

107

108

110

111

112

alginment2vcf's People

Contributors

pbglmichaelhall avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.