morinlab / gamblr.viz
Collection of functions to make plots for Genomic Analysis of Mature B-cell Lymphomas in R
Home Page: https://morinlab.github.io/GAMBLR.viz/
License: MIT License
We need GitHub Actions set up for this repo, in the same way as for the other GAMBLR packages, so that we can track proper installation and configuration.
Since the user can supply a MAF, regions, or both, ideally we should be able to specify the projection.
Error in (function (maf_df, onco_matrix_path, genes, include_noncoding = NULL, :
formal argument "maf_df" matched by multiple actual arguments
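One common way this R error arises is when a wrapper passes an argument by name and the same name is also forwarded through `...`. The sketch below reproduces that situation and shows one possible guard. `inner_plot`, `outer_plot_broken` and `outer_plot_fixed` are hypothetical stand-ins, not GAMBLR.viz code, and whether this is the actual cause of the report above is an assumption.

```r
# Hypothetical stand-ins, not GAMBLR.viz functions.
inner_plot <- function(maf_df, genes = NULL) {
  nrow(maf_df)
}

# Broken wrapper: maf_df is passed explicitly AND may also arrive via `...`.
outer_plot_broken <- function(maf, ...) {
  inner_plot(maf_df = maf, ...)
}

# Guarded wrapper: drop any maf_df forwarded through `...` before the call.
outer_plot_fixed <- function(maf, ...) {
  dots <- list(...)
  dots[["maf_df"]] <- NULL
  do.call(inner_plot, c(list(maf_df = maf), dots))
}

toy <- data.frame(Tumor_Sample_Barcode = c("S1", "S2"))

# Captures the "matched by multiple actual arguments" error message.
broken <- tryCatch(outer_plot_broken(toy, maf_df = toy),
                   error = function(e) conditionMessage(e))

# The guarded wrapper runs and returns the number of rows.
fixed <- outer_plot_fixed(toy, maf_df = toy)
```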
Instead, calc_mutation_frequency_bin_by_regions is used, but it's not present in any child repo.
The following code was run after changing the argument maftools_obj of the internal call of prettyOncoplot to maf_df.
#get data for plotting
meta = get_gambl_metadata()
meta = dplyr::filter(meta, cohort %in% c("dlbcl_reddy", "dlbcl_schmitz"))
ssm = dplyr::filter(GAMBLR.data::sample_data$grch37$maf,
Tumor_Sample_Barcode %in% meta$sample_id)
ssm = maftools::read.maf(ssm)
#build plot
prettyCoOncoplot(maf = ssm,
metadata = meta,
comparison_column = "cohort",
include_noncoding = NULL,
minMutationPercent = 0,
genes = c("MYC",
"TET2",
"TP53",
"DDX3X",
"ID3"),
metadataColumns = c("pathology",
"EBV_status_inf",
"pairing_status",
"cohort"),
splitColumnName = "EBV_status_inf",
metadataBarHeight = 10,
fontSizeGene = 12,
metadataBarFontsize = 10,
legend_row = 2,
label1 = "Adult",
label2 = "Pediatric")
# Error: $ operator not defined for this S4 class
This is the object given to prettyOncoplot's maf_df argument:
> ssm1
An object of class MAF
ID summary Mean Median
1: NCBI_Build GRCh37 NA NA
2: Center . NA NA
3: Samples 468 NA NA
4: nGenes 234 NA NA
5: Frame_Shift_Del 441 0.942 1
6: Frame_Shift_Ins 201 0.429 0
7: In_Frame_Del 125 0.267 0
8: In_Frame_Ins 20 0.043 0
9: Missense_Mutation 5680 12.137 10
10: Nonsense_Mutation 805 1.720 1
11: Nonstop_Mutation 19 0.041 0
12: Splice_Site 394 0.842 1
13: Translation_Start_Site 85 0.182 0
14: total 7770 16.603 15
> class(ssm1)
[1] "MAF"
attr(,"package")
[1] "maftools"
The error in prettyOncoplot probably comes from this operation:
> maf_df <- ssm1
> maf_df$Tumor_Sample_Barcode
Error in ssm1$Tumor_Sample_Barcode :
$ operator not defined for this S4 class
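A minimal sketch of one possible guard is shown below, using a toy S4 class so it runs without maftools. `ToyMAF` and `as_maf_df` are hypothetical names. The sketch assumes that, as in maftools' MAF class, the variant table lives in a slot named `data`; in maftools the silent variants are kept separately in `maf.silent`, so `data` alone may not be the full table.

```r
library(methods)

# Toy S4 class standing in for maftools' MAF class (assumption: like MAF,
# it keeps the variant table in a slot named `data`).
setClass("ToyMAF", representation(data = "data.frame"))

toy_maf <- new("ToyMAF",
               data = data.frame(Tumor_Sample_Barcode = c("S1", "S2", "S2"),
                                 Hugo_Symbol = c("MYC", "TP53", "ID3"),
                                 stringsAsFactors = FALSE))

# `$` is not defined for S4 objects, so this fails like the report above.
dollar_error <- tryCatch(toy_maf$Tumor_Sample_Barcode,
                         error = function(e) conditionMessage(e))

# Hypothetical helper: accept either a data frame or an S4 object with a
# `data` slot, and always return a data frame.
as_maf_df <- function(x) {
  if (isS4(x) && "data" %in% slotNames(x)) {
    return(as.data.frame(x@data))
  }
  x
}

samples <- unique(as_maf_df(toy_maf)$Tumor_Sample_Barcode)
```

Converting once at the top of the function would let the rest of the body keep using `$` on a plain data frame.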
When returning SVs to label, this function always calls the variants with default settings, disregarding what the user provides in the main function call. This results in absent labels for the SVs present there, since the coordinates and chr prefixing do not match. It should be addressed here:
GAMBLR.viz/R/prettyRainfallPlot.R
Lines 213 to 221 in 9f3528f
Refer to this thread with sample plots and calls to test the functionality
The seq_type parameter in pretty_rainfall_plot should be renamed to this_seq_type for consistency. Self-assigning.
GAMBLR.viz functions that currently rely on core GAMBLR functions (GSC access) to retrieve data when MAF/SEG/BEDPE data is not provided need to be updated so that they work out-of-the-box in this package, without any dependencies on GAMBLR functions that require GSC access.
It is only used in fancy_qc_plot and is not a consistent feature throughout. It also requires HTML outputs that are not publication-friendly. It should be dropped at this point to decrease the dependency burden.
Ensure function import tags and in-code package prefixing are consistent and up to date. Issue added for tracking purposes.
> library(GAMBLR.results)
> fancy_v_count(this_sample_id = "DOHH-2")
This function only works with matched samples for now
trying to find output from: battenberg
looking for flatfile:
Error in if (!file.exists(battenberg_file)) { :
argument is of length zero
This is because assign_cn_to_ssm is used internally, which calls get_sample_wildcards to try to find the battenberg file path. However, get_sample_wildcards doesn’t work with unmatched samples.
fancy_v_count works fine if using the bundled data (GAMBLR.data).
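The `argument is of length zero` error suggests the resolved file path was empty when it reached `file.exists()`. The sketch below shows one way to fail earlier with a clearer message. `check_flatfile` is a hypothetical helper and the wording of the messages is an assumption, not existing GAMBLR behaviour.

```r
# Hypothetical helper: fail early with a clear message when no file path
# could be resolved, instead of letting `if (!file.exists(...))` crash.
check_flatfile <- function(path, tool = "battenberg") {
  if (length(path) != 1 || is.na(path) || !nzchar(path)) {
    stop("No ", tool, " output path could be resolved for this sample ",
         "(unmatched samples are not supported yet).", call. = FALSE)
  }
  if (!file.exists(path)) {
    stop("Missing ", tool, " file: ", path, call. = FALSE)
  }
  invisible(path)
}

# A zero-length path now yields an informative error.
msg <- tryCatch(check_flatfile(character(0)),
                error = function(e) conditionMessage(e))
```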
prettyOncoplot ignores the keepSampleOrder argument and does nothing with it. This should be fixed so that, when this argument is provided, the order of patients in the supplied metadata is respected.
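As a rough sketch of the requested behaviour, the sample order can be taken from the metadata rows and applied to the columns of the mutation matrix. The data and the `order_samples` helper below are hypothetical; how prettyOncoplot builds its matrix internally is not shown here.

```r
# Toy stand-ins for the metadata and the gene-by-sample mutation matrix.
metadata <- data.frame(sample_id = c("S3", "S1", "S2"),
                       pathology = c("BL", "DLBCL", "BL"),
                       stringsAsFactors = FALSE)
mat <- matrix(c("Missense_Mutation", "", "", "", "Nonsense_Mutation", ""),
              nrow = 2,
              dimnames = list(c("MYC", "TP53"), c("S1", "S2", "S3")))

# Hypothetical helper: when keepSampleOrder is TRUE, reorder the matrix
# columns to follow the row order of the supplied metadata.
order_samples <- function(mat, metadata, keepSampleOrder = TRUE) {
  if (!keepSampleOrder) {
    return(mat)
  }
  wanted <- metadata$sample_id[metadata$sample_id %in% colnames(mat)]
  mat[, wanted, drop = FALSE]
}

ordered <- order_samples(mat, metadata)
colnames(ordered)
```

The resulting column names could then be handed to the `column_order` argument of `ComplexHeatmap::oncoPrint` so that the plot does not re-sort them.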
I wonder if someone can reproduce/check if this is a bug.
genome_ssm_data = get_coding_ssm()
ssm_data_capture = get_coding_ssm(this_seq_type = "capture")
all_ssm = bind_rows(ssm_data_capture,genome_ssm_data) #combine all ssm into one large data frame
#Problem:
#this makes a plot with every cohort in the legend even if no samples from the cohort exist in the MAF
prettyOncoplot(maf_df = all_ssm,these_samples_metadata =
all_sample_meta,
minMutationPercent = 5,metadataColumns = c("pathology","cohort"),
sortByColumns = c("pathology","cohort"))
# Expected functionality (requires an explicit filter)
# this only shows the relevant cohorts in the legend (i.e. those in the oncoplot)
prettyOncoplot(maf_df = all_ssm,these_samples_metadata =
filter(all_sample_meta,sample_id %in% all_ssm$Tumor_Sample_Barcode),
minMutationPercent = 5,metadataColumns = c("pathology","cohort"),
sortByColumns = c("pathology","cohort"))
[Slack Message](https://morinlabsfu.slack.com/archives/C0224H120CU/p1706573886956999?thread_ts=1706573886.956999&cid=C0224H120CU)
It is missing the github actions status badge and should contain the link to the website
ERROR: dependencies ‘ComplexHeatmap’, ‘g3viz’, ‘GAMBLR.utils’, ‘maftools’ are not available for package ‘GAMBLR.viz’
The current implementation of prettyRainfallPlot uses annotated SVs for labelling. This should become optional and user-configurable, so that the user can look at raw (not annotated) SVs.
As part of this update, sv_data should also become an optional argument so that user-provided data can be visualized.
I think this happens because, when custom colors are not supplied, this is handled internally, but when the user provides them we trust that no modifications are necessary. We can probably add an internal call to convert NAs to a literal string anyway.
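The conversion mentioned above could look roughly like this. `na_to_string` is a hypothetical helper, and the metadata and colours are toy values chosen for illustration.

```r
# Hypothetical helper: turn missing values in the chosen metadata columns
# into the literal string "NA" so they can be matched to a named colour.
na_to_string <- function(metadata, columns) {
  for (col in columns) {
    values <- as.character(metadata[[col]])
    values[is.na(values)] <- "NA"
    metadata[[col]] <- values
  }
  metadata
}

meta <- data.frame(sample_id = c("S1", "S2", "S3"),
                   bcl2_ba = c("POS", NA, "NEG"),
                   stringsAsFactors = FALSE)
custom_colours <- list(bcl2_ba = c(POS = "#c41230", NEG = "#E88873", "NA" = "white"))

meta <- na_to_string(meta, "bcl2_ba")

# Check that every value now has a matching entry in the supplied colours.
all_matched <- all(meta$bcl2_ba %in% names(custom_colours$bcl2_ba))
```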
theme_cowplot is used in a few functions, but this goes against the consistency we're after in the plots generated by this package. They should be swapped for theme_Morons() throughout.
prettyForestPlot uses cowplot to arrange figures, but it should be possible to swap it for the ggpubr approach, which is already a dependency, so that we decrease the overall dependency burden.
maf_metadata <- get_gambl_metadata(seq_type_filter = "genome") %>%
dplyr::filter(pathology %in% c("FL", "DLBCL"))
maf_data <- get_ssm_by_samples(
these_samples_metadata = maf_metadata
)
fl_genes = c("RRAGC", "CREBBP", "VMA21", "ATP6V1B2")
dlbcl_genes = c("EZH2", "KMT2D", "MEF2B", "CD79B", "MYD88", "TP53")
genes = c(fl_genes, dlbcl_genes)
gene_groups = c(rep("FL", length(fl_genes)), rep("DLBCL", length(dlbcl_genes)))
names(gene_groups) = genes
prettyOncoplot(
maf_df = maf_data,
genes = genes,
these_samples_metadata = maf_metadata %>%
arrange(patient_id),
splitGeneGroups = gene_groups,
keepGeneOrder = TRUE,
splitColumnName = "pathology",
metadataBarHeight = 5,
metadataBarFontsize = 8,
legend_row = 2,
fontSizeGene = 11,
metadataColumns = c("pathology", "lymphgen", "sex"),
sortByColumns = c("pathology", "lymphgen", "sex")
)
All mutation types: Missense_Mutation, Multi_Hit, Frame_Shift_Del, Splice_Site,
Nonsense_Mutation, Frame_Shift_Ins, In_Frame_Del, Translation_Start_Site.
`alter_fun` is assumed vectorizable. If it does not generate correct plot, please set
`alter_fun_is_vectorized = FALSE` in `oncoPrint()`.
Error: elements in `col` should be named vectors.
The error happens in the ComplexHeatmap::oncoPrint call. Here is the object that is given to the col parameter of oncoPrint:
> col
Nonsense_Mutation Missense_Mutation Multi_Hit Frame_Shift_Ins
"#c41230" "#39b54b" "#455564" "#e90c8b"
Frame_Shift_Del In_Frame_Ins In_Frame_Del Nonstop_Mutation
"#e90c8b" "#5f3a17" "#5f3a17" "#2cace3"
Translation_Start_Site Splice_Site Splice_Region 3'UTR
"#8781bd" "#fe9003" "#fe9003" "#f9bd1f"
Silent
"#A020F0"
Only used in prettyRainfallPlot and should be reconfigured with dplyr so we decrease the dependency burden on viz package
The warning message:
The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
ℹ Please use the `fun` argument instead.
The issue can be reproduced by running the examples:
#load packages
library(dplyr)
#get sample IDs for available genome samples
genome_collated = collate_results(seq_type_filter = "genome") %>%
pull(sample_id)
#subset the collated samples on BL samples
my_samples = get_gambl_metadata() %>%
dplyr::filter(sample_id %in% genome_collated) %>%
dplyr::filter(pathology == "BL") %>% pull(sample_id)
fancy_propcov_plot(these_sample_ids = my_samples)
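The warning goes away once `fun.y` is renamed to `fun` in the `stat_summary()` call. The internal call in fancy_propcov_plot is not shown in this report, so the sketch below illustrates the rename on the built-in `mtcars` data instead.

```r
library(ggplot2)

# Deprecated since ggplot2 3.3.0:
#   stat_summary(fun.y = mean, geom = "point")
# Current spelling uses `fun`:
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)
```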
When prettyOncoplot is supplied with hide_annotations, the color legend is hidden but the track itself is still shown. This should become optional: if the user wants, the metadata track should be hidden as well.
This is important when you want to force ordering on some column but not show it in the track below the oncoplot.
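One way to express this is to keep the sorting columns as they are and compute the set of drawn tracks separately. In the sketch below, `tracks_to_draw` and the `hide_annotations_tracks` argument are hypothetical names introduced for illustration, not part of the current prettyOncoplot signature.

```r
# Hypothetical helper: decide which metadata columns get a track.
# `hide_annotations_tracks` is an assumed new argument, not an existing one.
tracks_to_draw <- function(metadataColumns, hide_annotations = NULL,
                           hide_annotations_tracks = FALSE) {
  if (hide_annotations_tracks) {
    return(setdiff(metadataColumns, hide_annotations))
  }
  metadataColumns
}

sort_cols <- c("pathology", "cohort")  # both still used for ordering

# With the flag on, "cohort" orders the samples but gets no track.
shown <- tracks_to_draw(sort_cols, hide_annotations = "cohort",
                        hide_annotations_tracks = TRUE)
```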
Functionality of other viz functions returns pathology only if no metadata column is selected; this function shows pathology as metadata even if it is not specified (and other metadata is specified).
Example:
aSHM.regions.hg38 <- data.frame("chrom" = c("chr22","chr2", "chr11", "chr14"),
start = c(22026076,88857361, 69641155, 105586437) ,
end = c(22922913, 90235368, 69654474, 106879844),
name = c("IGL", "IGK", "CCND1", "IGH"))
heatmap_mutation_frequency_bin(regions_bed = aSHM.regions.hg38,
these_samples_metadata = get_gambl_metadata() %>% filter(pathology == "MCL"),
projection="hg38",
metadataColumns = c("cohort"))
heatmap_mutation_frequency_bin used to plot the column titles with rotation 90, but this feature was lost at some point. See this parameter:
GAMBLR.viz/R/heatmap_mutation_frequency_bin.R
Line 347 in 6563146
In earlier versions of this function the default was 90, so that the gene names do not overlap when single regions are plotted. This was probably changed intentionally because it made some version of the resulting plot look better. So why don't we convert this to a user-configurable argument that can be set based on the user's needs? The default can be 0 to respect current behaviour.
I removed the ggsci-related functions from this package but neglected to remove the line under the Remotes section. This line should be safe to delete and should be done to prevent the package from unnecessarily installing/upgrading.
Self-explanatory
When making the tutorial, I noticed that this info, while useful, is printed every time the oncoplot is drawn, so after a while it gets annoying and pollutes the document when the oncoplot is generated and displayed in the same chunk:
[1] "numcases: 441"
[1] "numgenes: 10"
This should only be displayed when verbose is set to TRUE.
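A minimal sketch of gating the two status lines behind a verbose flag is given below. `report_dimensions` is a hypothetical helper, and using `message()` rather than `print()` is a suggestion, not the current behaviour.

```r
# Hypothetical helper mirroring the two status lines above.
report_dimensions <- function(mat, verbose = FALSE) {
  if (verbose) {
    message("numcases: ", ncol(mat))
    message("numgenes: ", nrow(mat))
  }
  invisible(NULL)
}

mat <- matrix(0, nrow = 10, ncol = 441)
report_dimensions(mat)                  # silent by default
report_dimensions(mat, verbose = TRUE)  # emits both lines as messages
```

Because `message()` writes to the message stream, the lines can also be silenced per chunk in R Markdown with `message = FALSE`, or with `suppressMessages()`.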
In addition, when the forest object is provided for the gene annotations, the genes that are not statistically different between the groups are shown as Both in the Enriched in legend; this should be changed to Neither.
prettyOncoplot is missing the line that assigns POS/NEG colours for columns when custom colours aren't specified.
When I run it in verbose mode it does indicate that it found the colours:
[1] ">>>>>>>"
[1] "POS" "NEG"
[1] "<<<<<<<"
found colours for bcl2_ba here
[1] ">>>>>>>"
[1] "POS" "NEG"
[1] "<<<<<<<"
found colours for bcl6_ba here
$lymphgen
EZB-MYC EZB EZB-COMP ST2 ST2-COMP MCD MCD-COMP BN2 BN2-COMP N1 N1-COMP A53 Other COMPOSITE NA
"#52000F" "#721F0F" "#C7371A" "#C41230" "#EC3251" "#3B5FAC" "#6787CB" "#7F3293" "#A949C1" "#55B55E" "#7FC787" "#5b6d8a" "#ACADAF" "#ACADAF" "white"
$bcl2_ba
POS NEG NA
"#c41230" "#E88873" "white"
$bcl6_ba
POS NEG NA
"#c41230" "#E88873" "white"
But it then returns the error
Error: elements in `col` should be named vectors.
This is using the bcl2_ba and bcl6_ba columns returned by get_gambl_metadata.
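The report does not establish the cause of this error. As a diagnostic aid only, the sketch below checks one precondition the message points at: that every colour vector in the list is fully named. `find_unnamed_colours` is a hypothetical helper and the colour values are toy examples.

```r
# Hypothetical diagnostic: report which colour vectors are not fully named
# (no names at all, a missing name, or an empty-string name).
find_unnamed_colours <- function(col_list) {
  bad <- vapply(col_list, function(x) {
    nms <- names(x)
    is.null(nms) || anyNA(nms) || any(!nzchar(nms))
  }, logical(1))
  names(col_list)[bad]
}

ok <- c(POS = "#c41230", NEG = "#E88873", "NA" = "white")

# Names built from data that contains a real NA end up with a missing name.
lvl <- c("POS", "NEG", NA)
bad <- stats::setNames(c("#c41230", "#E88873", "white"), lvl)

flagged <- find_unnamed_colours(list(lymphgen = ok, bcl2_ba = bad))
```

Running such a check just before the colours are passed on would at least show which metadata column triggers the failure.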
Currently the prettylollipoplot uses g3viz under the hood to generate nice lollipop plots. However, the outputs are HTML-based, and g3viz itself has a cBioPortal API dependency, which makes GAMBLR.viz very bulky, prone to installation errors, and slow to install.
ggplot2, on the other hand, is already a dependency, and implementing the lollipop functionality through ggplot2 would address all of these issues.
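To give a rough idea of the ggplot2 route, a lollipop plot can be built from one segment layer for the stems and one point layer for the heads. The data below is made up for illustration, and the column names are assumptions rather than the eventual function's interface.

```r
library(ggplot2)

# Toy data: amino-acid position and mutation count (made up for illustration).
muts <- data.frame(
  aa_position = c(12, 45, 45, 103, 210),
  count = c(3, 7, 2, 1, 5),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "Nonsense_Mutation", "Frame_Shift_Del",
                             "Missense_Mutation")
)

lollipop <- ggplot(muts, aes(x = aa_position, y = count,
                             colour = Variant_Classification)) +
  geom_segment(aes(xend = aa_position, yend = 0)) +  # stems
  geom_point(size = 3) +                             # heads
  labs(x = "Amino acid position", y = "Mutation count") +
  theme_bw()
```

The result is a static ggplot object, so it can be saved with `ggsave()` and themed like the other plots in the package.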
This bug happens when GAMBLR.data is loaded.
library(GAMBLR.data)
> comp_report(this_sample = "HTMCP-01-06-00422-01A-01D",
+ out = "reports/",
+ export_individual_plots = TRUE)
Using the bundled CN segments (.seg) calls in GAMBLR.data...
Error: You have given one or more unsupported or deprecated argument to assign_cn_to_ssm . Please check the documentation and spelling of your arguments.
Offending argument(s): from_flatfile,use_augmented_maf
heatmap_mutation_frequency_bin internally calls calc_mutation_frequency_bin_regions. However, when using the bundled data, GAMBLR.data::calc_mutation_frequency_bin_regions is missing the parameters from_indexed_flatfile and mode, which are required in heatmap_mutation_frequency_bin.
dlbcl_bl_meta = get_gambl_metadata() %>%
dplyr::filter(pathology %in% c("DLBCL", "BL"))
some_regions = GAMBLR.data::somatic_hypermutation_locations_GRCh37_v_latest %>%
dplyr::filter(!gene %in% c("BTG2", "CXCR4", "ST6GAL1", "BCL6", "LPP", "RHOH", "CD83",
"PIM1", "BACH2", "SGK1", "MYC", "PAX5", "GRHPR", "FANK1", "BIRC3",
"BTG1", "DTX1", "BCL7A", "ZFP36L1", "SERPINA9", "TCL1A", "CIITA",
"IRF8", "S1PR2", "MEF2B", "IGLL5", "TMSB4X", "PIM2")) %>%
select(chr_name, hg19_start, hg19_end, gene) %>%
rename( "chrom"="chr_name", "start"="hg19_start", "end"="hg19_end", "name"="gene") %>%
mutate( chrom = str_remove(chrom, "chr") )
heatmap_mutation_frequency_bin(these_samples_metadata = dlbcl_bl_meta,
regions_bed = some_regions,
from_indexed_flatfile = TRUE,
mode = "slms-3")
# Warning: Multiple regions in the provided data frame have the same name. Merging these entries based on min(start) and max(end) per name value.
# Error: You have given one or more unsupported or deprecated argument to calc_mutation_frequency_bin_regions . Please check the documentation and spelling of your arguments.
# Offending argument(s): from_indexed_flatfile,mode
Should we remove both the from_indexed_flatfile and mode parameters from heatmap_mutation_frequency_bin and not specify them in the internal call of calc_mutation_frequency_bin_regions? Then the GAMBLR.results version of calc_mutation_frequency_bin_regions would always use the default values for these parameters.
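An alternative to removing the parameters, sketched here under assumptions, is to forward only those arguments that the available version of the function actually accepts. `call_with_supported_args`, `calc_bundled` and `calc_results` are hypothetical stand-ins; the real functions have different bodies and possibly different signatures.

```r
# Hypothetical helper: forward only the arguments the target function accepts.
call_with_supported_args <- function(fn, args) {
  supported <- names(formals(fn))
  if (!"..." %in% supported) {
    args <- args[names(args) %in% supported]
  }
  do.call(fn, args)
}

# Toy stand-ins for the two versions of calc_mutation_frequency_bin_regions.
calc_bundled <- function(regions_bed) {
  paste("bundled:", nrow(regions_bed), "regions")
}
calc_results <- function(regions_bed, from_indexed_flatfile = TRUE,
                         mode = "slms-3") {
  paste("results:", nrow(regions_bed), "regions, mode", mode)
}

args <- list(regions_bed = data.frame(chrom = "8", start = 1, end = 2,
                                      name = "MYC"),
             from_indexed_flatfile = TRUE,
             mode = "slms-3")

res_bundled <- call_with_supported_args(calc_bundled, args)
res_results <- call_with_supported_args(calc_results, args)
```

The trade-off is that unsupported arguments are dropped silently, so a warning listing the ignored names may be worth adding if this route is taken.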