
hutaobo/AluScanCNV2


AluScanCNV2 is an R package that integrates cross-platform copy number variation (CNV) detection and tumor prediction, building on the earlier AluScanCNV software.


AluScanCNV2: an R package for copy number variation-based cancer risk prediction

Installation

AluScanCNV2 can be installed using the install_github function in the devtools package.

library(devtools)
install_github('hutaobo/AluScanCNV2')

CNV calling

The coverageBed tool of the BEDTools suite (Quinlan and Hall, 2010) is used to calculate the sequencing read depth over 5-kb bins (5k.bin), using the reads of the analysis-ready BAM file, here provided in BED format as output.bed:

coverageBed -hist -a 5k.bin -b output.bed > output.5k.doc

The resulting coverage file, output.5k.doc, is the input for CNV calling in the AluScanCNV2 package.
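To inspect a coverage file by hand, the sketch below collapses `coverageBed -hist` output into one mean read depth per bin. It assumes that 5k.bin has three columns (chromosome, start, end), so every per-bin line has seven tab-separated fields (the bin, a depth, the number of bases at that depth, the bin length and the fraction of the bin), and that the genome-wide summary lines at the end start with "all". The function name `mean_depth_from_hist` is hypothetical; AluScanCNV2 reads the .doc file itself, so this is only a convenience for checking the input.

```r
# Collapse `coverageBed -hist` output into one mean read depth per bin.
# Assumes a 3-column bin file, i.e. 7 fields per line:
# chrom, start, end, depth, bases_at_depth, bin_length, fraction.
mean_depth_from_hist <- function(lines) {
  # Drop the genome-wide summary histogram (lines starting with "all")
  lines <- lines[!startsWith(lines, "all\t")]
  fields <- do.call(rbind, strsplit(lines, "\t", fixed = TRUE))
  hist <- data.frame(
    chrom   = fields[, 1],
    start   = as.integer(fields[, 2]),
    end     = as.integer(fields[, 3]),
    depth   = as.numeric(fields[, 4]),
    n_bases = as.numeric(fields[, 5]),
    bin_len = as.numeric(fields[, 6]),
    stringsAsFactors = FALSE
  )
  # Each histogram row contributes depth * (share of the bin at that depth)
  hist$contrib <- hist$depth * hist$n_bases / hist$bin_len
  per_bin <- aggregate(contrib ~ chrom + start + end, data = hist, FUN = sum)
  names(per_bin)[names(per_bin) == "contrib"] <- "mean_depth"
  per_bin <- per_bin[order(per_bin$chrom, per_bin$start), ]
  rownames(per_bin) <- NULL
  per_bin
}

# Usage with the file produced by the command above:
# doc <- mean_depth_from_hist(readLines("output.5k.doc"))
```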

AluScanCNV2 calls CNVs by comparing the read depth of each sequence window in the test sample, using a test based on the Geary-Hinkley transformation (GHT), with either a paired control sample ('paired CNV' analysis) or a reference template built from pooled reference samples ('unpaired CNV' analysis) (Yang et al., 2014).
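The Geary-Hinkley transformation turns a ratio of two approximately normal quantities into an approximately standard normal statistic. For an observed ratio w = x/y, where x and y have means mu_x and mu_y, standard deviations sd_x and sd_y, and correlation rho, the statistic is t = (mu_y * w - mu_x) / sqrt(sd_y^2 * w^2 - 2 * rho * sd_x * sd_y * w + sd_x^2), and the normal approximation is good when mu_y / sd_y is large. The sketch below implements only this textbook formula with a two-sided normal p-value; the function name `ght_test` is hypothetical, and how AluScanCNV2 estimates the means and variances for each window is not described here, so treat it as an illustration rather than the package's internals.

```r
# Geary-Hinkley transformation of an observed ratio w = x / y.
# mu_x, mu_y, sd_x, sd_y: means and SDs of numerator and denominator;
# rho: their correlation. The statistic is approximately N(0, 1)
# when mu_y / sd_y is large (the denominator is rarely near zero).
ght_test <- function(w, mu_x, mu_y, sd_x, sd_y, rho = 0) {
  stat <- (mu_y * w - mu_x) /
    sqrt(sd_y^2 * w^2 - 2 * rho * sd_x * sd_y * w + sd_x^2)
  p <- 2 * pnorm(-abs(stat))  # two-sided p-value
  list(statistic = stat, p.value = p)
}

# Hypothetical window whose observed ratio is twice the expected ratio of 1
res <- ght_test(w = 2, mu_x = 10, mu_y = 10, sd_x = 1, sd_y = 1)
res$statistic  # 10 / sqrt(5), about 4.47
```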

library(AluScanCNV2)

# Paired CNV calling: the tumor sample is compared with its matched control
control_doc_path <- system.file("extdata/Breast1_b.5k.doc", package = "AluScanCNV2")
tumor_doc_path <- system.file("extdata/Breast1_1.5k.doc", package = "AluScanCNV2")
pairedCNV(control.5k.doc = control_doc_path, sample.5k.doc = tumor_doc_path, window.size = "500k", output.path = "./")

# Unpaired CNV calling: the sample is compared with a pooled reference template
sample_doc_path <- system.file("extdata/Breast1_b.5k.doc", package = "AluScanCNV2")
unpairedCNV(sample.5k.doc = sample_doc_path, window.size = "500k", seq.method = "AluScan", output.path = "./")

Plot CNV frequency

path_to_file_1 <- system.file("extdata/Breast1_b.local.500k.unpaired.seg", package = "AluScanCNV2")
path_to_file_2 <- system.file("extdata/Breast1_1.local.500k.unpaired.seg", package = "AluScanCNV2")
p <- plotFrequency(input = c(path_to_file_1, path_to_file_2))
plot(p)
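plotFrequency() presumably summarizes how often each genomic region is gained or lost across the input samples. As a rough illustration of that kind of summary (not the package's code), the sketch below takes a hypothetical window-by-sample matrix of CNV states coded -1 (loss), 0 (neutral) and 1 (gain) and returns the fraction of samples with a gain or a loss in each window. The state coding and the name `cnv_frequency` are assumptions; the .seg files used above hold segments rather than such a matrix.

```r
# Fraction of samples with a gain (state 1) or loss (state -1) per window.
# `states`: numeric matrix, rows = genomic windows, columns = samples.
cnv_frequency <- function(states) {
  data.frame(
    window    = rownames(states),
    gain_freq = rowMeans(states == 1),
    loss_freq = rowMeans(states == -1),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy data: three windows in four samples
states <- matrix(c( 1,  1, 0, -1,
                    0,  0, 0,  0,
                   -1, -1, 1, -1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("w1", "w2", "w3"), paste0("s", 1:4)))
cnv_frequency(states)  # w1: gain 0.50, loss 0.25; w3: gain 0.25, loss 0.75
```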

Identification of recurrent CNVs

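library(AluScanCNV2)
# Collect the .seg files of one sample group (here, AluScan controls)
seg_files <- list.files(path = 'path_to_folder_of_seg_files', pattern = "\\.seg$", full.names = TRUE)
alu_control <- seg2CNV(seg_files)
# Convert the recurrence count to a fraction of samples; the 4 non-sample
# columns are excluded, so ncol - 4 is the number of samples
alu_control$recurrence <- alu_control$recurrence / (ncol(alu_control) - 4)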
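The snippet above turns a recurrence count into the fraction of samples carrying a CNV. A minimal sketch of the same idea on toy data, assuming a hypothetical window-by-sample matrix of CNV states (non-zero = CNV present) and a hypothetical cut-off `min_fraction`, keeps only the windows whose CNV recurs in at least that share of samples. This is an illustration of recurrence filtering, not the internals of seg2CNV().

```r
# Keep windows where a CNV (any non-zero state) recurs in at least
# `min_fraction` of samples. `states`: rows = windows, columns = samples.
recurrent_windows <- function(states, min_fraction = 0.5) {
  recurrence <- rowSums(states != 0) / ncol(states)  # fraction of samples
  names(recurrence)[recurrence >= min_fraction]
}

states <- matrix(c( 1,  1,  0,
                    0,  0, -1,
                   -1, -1, -1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("w1", "w2", "w3"), c("s1", "s2", "s3")))
recurrent_windows(states, min_fraction = 0.5)  # "w1" "w3"
```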

Selection of informative CNVs

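# alu_cancer, wgs_control and wgs_cancer are prepared with seg2CNV() in the same way as alu_control;
# lists A and B come from two platforms (AluScan and WGS)
recurr_cnv <- featureSelection2(nonCancerListA = alu_control, CancerListA = alu_cancer, nonCancerListB = wgs_control, CancerListB = wgs_cancer, Cri = 0.33)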
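The caret code in the next section expects a data frame named `dataset` with one row per sample, one column per selected CNV feature, and a factor column `type` holding the class label. The output format of featureSelection2() is not documented here, so the sketch below starts from a hypothetical numeric sample-by-feature matrix and a vector of labels; the helper name `make_dataset`, the feature names and the labels "cancer"/"control" are assumptions.

```r
# Assemble a caret-ready data frame: CNV features as predictors plus a
# factor outcome `type`. `features`: numeric matrix, rows = samples.
make_dataset <- function(features, labels) {
  stopifnot(nrow(features) == length(labels))
  dataset <- as.data.frame(features)
  # make.names() keeps feature names valid inside the `type ~ .` formula
  names(dataset) <- make.names(names(dataset), unique = TRUE)
  dataset$type <- factor(labels)
  dataset
}

features <- matrix(c(0, 1, 1, 0, -1, 0),
                   nrow = 3,
                   dimnames = list(paste0("sample", 1:3),
                                   c("chr1:0-500000", "chr2:0-500000")))
dataset <- make_dataset(features, c("cancer", "control", "cancer"))
str(dataset)
```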

Prediction of cancer-predisposition

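library(caret)
# `dataset`: one row per sample, the selected CNV features as columns and a
# factor column `type` holding the class label
metric <- "Accuracy"
# 10-fold cross-validation; fixing the folds once lets every model be
# evaluated on exactly the same resamples
set.seed(7)
control <- trainControl(method = "cv", number = 10,
                        index = createFolds(dataset$type, k = 10, returnTrain = TRUE))

# a) linear algorithms
# LDA
fit.lda <- train(type ~ ., data = dataset, method = "lda", metric = metric, trControl = control)
# b) nonlinear algorithms
# CART
fit.cart <- train(type ~ ., data = dataset, method = "rpart", metric = metric, trControl = control)
# kNN
fit.knn <- train(type ~ ., data = dataset, method = "knn", metric = metric, trControl = control)
# c) advanced algorithms (svmRadial, rf and naive_bayes need the kernlab,
#    randomForest and naivebayes packages)
# SVM with radial kernel
fit.svm <- train(type ~ ., data = dataset, method = "svmRadial", metric = metric, trControl = control)
# Random forest
fit.rf <- train(type ~ ., data = dataset, method = "rf", metric = metric, trControl = control)
# d) others
# Naive Bayes
fit.nb <- train(type ~ ., data = dataset, method = "naive_bayes", metric = metric, trControl = control)

# Compare the cross-validated accuracy of all models
results <- resamples(list(lda = fit.lda, cart = fit.cart, knn = fit.knn, svm = fit.svm, rf = fit.rf, nb = fit.nb))
# Alternatively, compare the models without LDA:
# results <- resamples(list(cart = fit.cart, knn = fit.knn, svm = fit.svm, rf = fit.rf, nb = fit.nb))
summary(results)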
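trainControl(method = "cv", number = 10) estimates each model's accuracy by 10-fold cross-validation: the samples are split into 10 folds, each model is trained on nine folds and scored on the held-out one, and the 10 scores are summarized. caret builds the folds itself (sampling within each class when the outcome is a factor); the base-R sketch below only illustrates such a stratified assignment of samples to folds, so that every fold keeps roughly the class balance of the full dataset. The name `stratified_folds` and the toy class sizes are hypothetical.

```r
# Assign each sample to one of k folds, sampling within each class so every
# fold keeps roughly the overall class proportions.
stratified_folds <- function(y, k = 10) {
  y <- as.character(y)
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Balanced fold labels for this class, shuffled into random order
    labels <- rep_len(seq_len(k), length(idx))
    folds[idx] <- labels[sample.int(length(labels))]
  }
  folds
}

set.seed(1)
y <- factor(rep(c("cancer", "control"), times = c(20, 30)))
folds <- stratified_folds(y, k = 10)
table(folds, y)  # each fold gets 2 cancer and 3 control samples
```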

Validation functions

library(caret)
library(ggplot2)
# Calibration plot: observed vs. predicted probability
# (model, data and class are placeholders for your own objects)
p <- calPlot(model, data, class)
p
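A calibration plot compares the predicted probability of the positive class with the share of samples that actually belong to it. The base-R sketch below computes the points such a plot is usually drawn from: predictions are grouped into equal-width probability bins, and each bin's mean predicted probability is paired with its observed event rate. It is not the implementation of calPlot(); the function name `calibration_table` and the 0/1 coding of the outcome are assumptions.

```r
# Bin predicted probabilities and compare them with observed event rates.
# prob: predicted probability of the positive class; obs: 1 = positive, 0 = not.
calibration_table <- function(prob, obs, n_bins = 5) {
  bins <- cut(prob, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  data.frame(
    bin       = levels(bins),
    n         = as.vector(table(bins)),
    predicted = as.vector(tapply(prob, bins, mean)),
    observed  = as.vector(tapply(obs, bins, mean)),
    stringsAsFactors = FALSE
  )
}

prob <- c(0.05, 0.15, 0.35, 0.45, 0.65, 0.75, 0.85, 0.95)
obs  <- c(0,    0,    0,    1,    1,    1,    1,    1)
cal <- calibration_table(prob, obs, n_bins = 5)
cal
# A well-calibrated model keeps `observed` close to `predicted` in every bin;
# empty bins show NA.
```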

References

Quinlan, A. R. and I. M. Hall (2010). "BEDTools: a flexible suite of utilities for comparing genomic features." Bioinformatics 26(6): 841-842.

Yang, J. F., X. F. Ding, L. Chen, W. K. Mat, M. Z. Xu, J. F. Chen, J. M. Wang, L. Xu, W. S. Poon, A. Kwong, G. K. Leung, T. C. Tan, C. H. Yu, Y. B. Ke, X. Y. Xu, X. Y. Ke, R. C. Ma, J. C. Chan, W. Q. Wan, L. W. Zhang, Y. Kumar, S. Y. Tsang, S. Li, H. Y. Wang and H. Xue (2014). "Copy number variation analysis based on AluScan sequences." J. Clin. Bioinforma. 4(1): 15.
