Hello,
First of all, thank you very much for developing panelcn.mops. It is a really helpful tool.
I am having a problem when plotting all exons of a candidate gene from a panel. The problem is basically that if the gene is in the forward strand, the first exon does not show up in the plot and if the gene is in the reverse strand, the last exon of the gene is not plotted. My bed file is sorted by chr, position, so basically the plotBoxplot() command is not plotting the first exon that appears in the bed file.
However, the results table created with the createResultTable() command shows all the exons from each candidate gene.
Here you have the code that I have used to analyze my gene panel so that you can figure out if I have made any errors:
# Load library
library(panelcn.mops)
# Read input files
bed <- "Genes_with_exon_coordinates_sorted_31bp_flanking_sides.bed"
countWindows <- getWindows(bed)
# Sample to analyze
Sample_X_bam <- "BAMs/Sample_X.bam"
Sample_X <- countBamListInGRanges(countWindows = countWindows,
bam.files = Sample_X_bam, read.width = 150)
# Control samples
Sample_A_bam <- "BAMs/Sample_A.bam"
Sample_B_bam <- "BAMs/Sample_B.bam"
Sample_C_bam <- "BAMs/Sample_C.bam"
Sample_D_bam <- "BAMs/Sample_D.bam"
Sample_E_bam <- "BAMs/Sample_E.bam"
Sample_A <- countBamListInGRanges(countWindows = countWindows,
bam.files = Sample_A_bam, read.width = 150)
Sample_B <- countBamListInGRanges(countWindows = countWindows,
bam.files = Sample_B_bam, read.width = 150)
Sample_C <- countBamListInGRanges(countWindows = countWindows,
bam.files = Sample_C_bam, read.width = 150)
Sample_D <- countBamListInGRanges(countWindows = countWindows,
bam.files = Sample_D_bam, read.width = 150)
Sample_E <- countBamListInGRanges(countWindows = countWindows,
bam.files = Sample_E_bam, read.width = 150)
# [...] I have 109 controls in my dataset
controls <- countBamListInGRanges(countWindows = countWindows,
bam.files = c(Sample_A_bam, Sample_B_bam, Sample_C_bam, Sample_D_bam,
Sample_E_bam),
read.width = 150)
# The gene panel contains 26 genes, but I am only analyzing 4 of them
selectedGenes <- c("GENE_1", "GENE_2", "GENE_3", "GENE_4")
XandCB <- Sample_X
elementMetadata(XandCB) <- cbind(elementMetadata(XandCB),
elementMetadata(controls))
resultlist <- runPanelcnMops(XandCB, countWindows = countWindows,
selectedGenes = selectedGenes)
# Although the countWindows contains 548 obs. of 6 variables, the resultlist identifies only 495 Iranges
(str(resultlist[[1]]))
integerCopyNumber(resultlist[[1]])[1:5]
sampleNames <- colnames(elementMetadata(Sample_X))
resulttable <- createResultTable(resultlist = resultlist, XandCB = XandCB,
countWindows = countWindows,
selectedGenes = selectedGenes,
sampleNames = sampleNames)
# This table shows all exons from the 4 genes
resulttable[[1]]
# GENE_1 - Located in the reverse strand - Does not plot last exon
plotBoxplot(result = resultlist[[1]], sampleName = sampleNames[1],
countWindows = countWindows,
selectedGenes = selectedGenes, showGene = 1)
# GENE_2 - Located in the forward strand - Does not plot first exon
plotBoxplot(result = resultlist[[1]], sampleName = sampleNames[1],
countWindows = countWindows,
selectedGenes = selectedGenes, showGene = 2)
# GENE_3 - Located in the forward strand - Does not plot first exon
plotBoxplot(result = resultlist[[1]], sampleName = sampleNames[1],
countWindows = countWindows,
selectedGenes = selectedGenes, showGene = 3)
# GENE_4 - Located in the reverse strand - Does not plot last exon
plotBoxplot(result = resultlist[[1]], sampleName = sampleNames[1],
countWindows = countWindows,
selectedGenes = selectedGenes, showGene = 4)
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] panelcn.mops_1.9.0 cn.mops_1.36.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.2 IRanges_2.24.0
[6] S4Vectors_0.28.1 BiocGenerics_0.36.0
loaded via a namespace (and not attached):
[1] compiler_4.0.3 BiocManager_1.30.10 XVector_0.30.0 remotes_2.2.0
[5] prettyunits_1.1.1 bitops_1.0-6 tools_4.0.3 zlibbioc_1.36.0
[9] testthat_3.0.0 digest_0.6.27 pkgbuild_1.1.0 pkgload_1.1.0
[13] lifecycle_0.2.0 memoise_1.1.0 rlang_0.4.9 cli_2.2.0
[17] rstudioapi_0.13 curl_4.3 GenomeInfoDbData_1.2.4 withr_2.3.0
[21] fs_1.5.0 Biostrings_2.58.0 desc_1.2.0 devtools_2.3.2
[25] rprojroot_2.0.2 glue_1.4.2 Biobase_2.50.0 R6_2.5.0
[29] processx_3.4.5 fansi_0.4.1 exomeCopy_1.36.0 BiocParallel_1.24.1
[33] sessioninfo_1.1.1 purrr_0.3.4 magrittr_2.0.1 callr_3.5.1
[37] usethis_2.0.0 Rsamtools_2.6.0 ps_1.5.0 ellipsis_0.3.1
[41] assertthat_0.2.1 RCurl_1.98-1.2 crayon_1.3.4
Graphical example:
Analysis performed with +/- 31 bp flanking region for each exon. In this case Ex12 is missing from the plot:
![image](https://user-images.githubusercontent.com/15621626/103579380-12efe900-4e8d-11eb-8fbb-c7b9e7936267.png)
![image](https://user-images.githubusercontent.com/15621626/103578976-5bf36d80-4e8c-11eb-8f73-31cba294c194.png)
I have continued doing tests and I have tried to sort the bed file according to transcriptional order (Ex1, Ex2, Ex3, ...) to see if the function plotBoxplot() would plot the Ex1 for the genes from the reverse strand. The analysis is performed in the right way when you visualize it with the resulttable <- createResultTable() function. This analysis was performed without the +/- 31 bp flanking region for each exon. Therefore the exon sizes and plots are slightly different, but the concept is the same one:
![image](https://user-images.githubusercontent.com/15621626/103611015-368b5180-4ed6-11eb-8ee7-8b07ba8fcad9.png)
The plotBoxplot() function does not plot the Ex1 for the reordered exons, as expected.
Additionally, I have found a potential "bug" in the plotBoxplot() function. It reverts the X-axis labels according to the new bed file (Ex1, Ex2, Ex3, Ex4,....) , but the boxplots stay in the same order, that is in chromosomal order (Ex12, Ex11, Ex10, Ex9, ...), which would generate an erroneous plot:
![image](https://user-images.githubusercontent.com/15621626/103611881-165c9200-4ed8-11eb-8490-9db1c7b088b0.png)
Now the sample would have an Ex7-Ex10 deletion, instead of the real Ex3-Ex6 deletion that is shown in both tables.
In order for the plotBoxplot() function to work properly, the bed file needs to be sorted in chromosomal order. If not, you can get an erroneous plot, even when the analysis and the table show the correct results.
Am I doing something wrong? How can I plot the first exon of each gene?
I have even tried to add the UTR region to the bed file to have a first record for each gene before the first exon, but that does not work either.
Does anybody have had these issues before?
Thank you very much,
Best Regards,