morinlab / gamblr.viz

Collection of functions to make plots for Genomic Analysis of Mature B-cell Lymphomas in R

Home Page: https://morinlab.github.io/GAMBLR.viz/

License: MIT License

R 100.00%

gamblr.viz's Introduction

GAMBLR.viz

Collection of functions to make plots for Genomic Analysis of Mature B-cell Lymphomas in R.

For installation instructions, bundled resources, a functionality overview, tutorials, frequently asked questions and much more, please visit the website morinlab.github.io/GAMBLR.viz

Contributing

Cloning repo for the code development

The easiest way to obtain and contribute to GAMBLR.viz is to clone the repository:

cd
git clone git@github.com:morinlab/GAMBLR.viz.git

In your R editor of choice, set your working directory to the location where you just cloned the repo.

setwd("~/GAMBLR.viz")

Install the package in R by running the following command (requires the devtools package)

devtools::install()

As GAMBL users (GAMBLRs, so to speak) rely on the functionality of this package, the Master branch is protected. All commits must be submitted via pull request on a branch. Please refer to the GAMBL documentation for details on how to do this.

Function conflicts

This package relies on the use of some functions (e.g. get_gambl_metadata(), get_coding_ssm() etc) that exist in 2 different versions: GAMBLR.data for the users who do not have access to GSC and GAMBLR.results for the Morin Lab users with access to GSC. If your contribution relies on the use of such functions, please follow these 2 steps:

DO NOT prepend the function use with <package>:: (for example, <package>::function()), and
DO NOT add the corresponding package to the @import section of the function

Following these steps will ensure correct usage of the proper function depending on which package is loaded in the session and will avoid functionality conflicts.
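The effect of leaving out the prefix can be sketched in base R. This is a toy illustration, not GAMBLR code: the two environments stand in for GAMBLR.data and GAMBLR.results, and get_metadata_demo() and describe_source_demo() are hypothetical names invented for the example.

```r
# Hypothetical stand-ins for GAMBLR.data and GAMBLR.results: two environments
# that each provide a function with the same name.
demo_data <- new.env()
demo_results <- new.env()
assign("get_metadata_demo", function() "bundled data", envir = demo_data)
assign("get_metadata_demo", function() "GSC results", envir = demo_results)

# A helper written the recommended way: no <package>:: prefix on the call.
describe_source_demo <- function() {
  get_metadata_demo()
}

# Whichever "package" is attached decides which version is found.
attach(demo_data, name = "demo_data", warn.conflicts = FALSE)
from_data <- describe_source_demo()
detach("demo_data")

attach(demo_results, name = "demo_results", warn.conflicts = FALSE)
from_results <- describe_source_demo()
detach("demo_results")

print(c(from_data, from_results))
```

Had the helper been written with an explicit prefix, it would always call one specific version regardless of what the user loaded.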

gamblr.viz's Issues

Website/tutorials for GAMBLR.viz

setup github actions for this package

We need github actions for this repo to be set up in a similar way to other GAMBLR packages so we can track proper installation and configuration

`heatmap_mutation_frequency_bin` shows pathology even when not specified

Functionality of other viz functions returns pathology only if no metadata column is selected; this function shows pathology as metadata even if it is not specified (and other metadata is specified).

Example:

aSHM.regions.hg38 <- data.frame(chrom = c("chr22", "chr2", "chr11", "chr14"),
                                start = c(22026076, 88857361, 69641155, 105586437),
                                end = c(22922913, 90235368, 69654474, 106879844),
                                name = c("IGL", "IGK", "CCND1", "IGH"))

heatmap_mutation_frequency_bin(regions_bed = aSHM.regions.hg38, 
                               these_samples_metadata = get_gambl_metadata() %>% filter(pathology == "MCL"), 
                               projection="hg38", 
                               metadataColumns = c("cohort"))

error in `prettyCoOncoplot`

Error in (function (maf_df, onco_matrix_path, genes, include_noncoding = NULL, :
formal argument "maf_df" matched by multiple actual arguments

`comp_report` finds errors to be fixed in GAMBLR.data and GAMBLR.results

This bug happens when GAMBLR.data is loaded.

> library(GAMBLR.data)
> comp_report(this_sample = "HTMCP-01-06-00422-01A-01D",
+                   out = "reports/",
+                   export_individual_plots = TRUE)
Using the bundled CN segments (.seg) calls in GAMBLR.data...
Error:  You have given one or more unsupported or deprecated argument to  assign_cn_to_ssm . Please check the documentation and spelling of your arguments.
Offending argument(s): from_flatfile,use_augmented_maf

Ensure viz passes the devtools check

Self-explanatory

`fancy_v_count` doesn't work when GAMBLR.results is loaded

> library(GAMBLR.results)
> fancy_v_count(this_sample_id = "DOHH-2")
This function only works with matched samples for now
trying to find output from: battenberg
looking for flatfile: 
Error in if (!file.exists(battenberg_file)) { : 
  argument is of length zero

This is because assign_cn_to_ssm is used internally, which calls get_sample_wildcards to try to find the battenberg file path. However, get_sample_wildcards doesn’t work with unmatched samples.

fancy_v_count works fine if using the bundled data (GAMBLR.data).

`heatmap_mutation_frequency_bin` lost rotation of the column title

The heatmap_mutation_frequency_bin used to plot the column titles with rotation 90. But this feature was lost at some point. See this parameter:

GAMBLR.viz/R/heatmap_mutation_frequency_bin.R

Line 347 in 6563146

column_title_rot = 0,

In earlier versions of this function the default was 90, so that gene names do not overlap when single regions are plotted. This was probably changed intentionally because it made some version of the resulting plot look better. So why don't we convert this to a user-configurable argument that can be set based on the user's needs? The default can be 0 to respect the current behaviour.
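A minimal sketch of the idea, assuming the rotation is simply forwarded to the heatmap call. heatmap_settings_demo() is a hypothetical helper and not part of GAMBLR.viz; the commented line shows how the settings might be passed on if ComplexHeatmap is installed.

```r
# Hypothetical helper: collects heatmap settings and exposes the rotation
# as an argument instead of hard-coding it.
heatmap_settings_demo <- function(column_title_rot = 0) {
  stopifnot(is.numeric(column_title_rot), length(column_title_rot) == 1)
  list(
    cluster_columns = FALSE,
    column_title_rot = column_title_rot
  )
}

# The default of 0 keeps the current behaviour; 90 restores the older look.
default_settings <- heatmap_settings_demo()
rotated_settings <- heatmap_settings_demo(column_title_rot = 90)

# With ComplexHeatmap installed, the settings could be forwarded like this
# (some_matrix is a placeholder for the binned mutation matrix):
# do.call(ComplexHeatmap::Heatmap, c(list(matrix = some_matrix), rotated_settings))
```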

remove ggsci dependency from Description

I removed the ggsci-related functions from this package but neglected to remove the line under the Remotes section. This line should be safe to delete and should be done to prevent the package from unnecessarily installing/upgrading.

Add internal call to prettyOncoplot to convert NA metadata values to literal "NA" string

I think this happens because NA values are handled internally when custom colors are not supplied, but when the user provides them, the function trusts that no modifications are necessary. We can probably add an internal call that converts NAs to the literal string "NA" in either case.
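The conversion itself could look roughly like the base R sketch below. na_to_string_demo() and the toy metadata are hypothetical; where such a call would sit inside prettyOncoplot is not specified here.

```r
# Hypothetical helper: replace missing values in the chosen metadata columns
# with the literal string "NA" so they can be matched to a colour entry.
na_to_string_demo <- function(metadata, columns) {
  for (column in columns) {
    values <- as.character(metadata[[column]])
    values[is.na(values)] <- "NA"
    metadata[[column]] <- values
  }
  metadata
}

toy_na_meta <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  lymphgen = c("EZB", NA, "MCD"),
  stringsAsFactors = FALSE
)

fixed_meta <- na_to_string_demo(toy_na_meta, columns = "lymphgen")
print(fixed_meta$lymphgen)
```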

Slack Message

`heatmap_mutation_frequency_bin` is not used in its examples

Instead, calc_mutation_frequency_bin_by_regions is used, but it's not present in any child repo.

README needs update

It is missing the github actions status badge and should contain the link to the website

ashm_multi_rainbow_plot doesn't have option for genome projection

Since the user can supply a maf, regions, or both, ideally we should be able to specify the projection.

viz installation fails in a fresh environment

ERROR: dependencies ‘ComplexHeatmap’, ‘g3viz’, ‘GAMBLR.utils’, ‘maftools’ are not available for package ‘GAMBLR.viz’

Function import tags vs. in-code package prefixing

Ensure function import tags and in-code package prefixing are consistent and up to date. Issue added for tracking purposes.

`prettyOncoplot` bugs

While making the tutorial, I noticed that the following information, while useful, is printed every time the oncoplot is generated. This gets annoying after a while and clutters the document when the oncoplot is generated and displayed in the same chunk:

[1] "numcases: 441"
[1] "numgenes: 10"

This should only be displayed when the verbose is set to TRUE.

In addition, when the forest object is provided for the gene annotations, the genes that are not statistically different between the groups are shown as Both in the Enriched in legend - and this should be fixed to become Neither.
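For the first point, gating the output on a verbose flag might look like the sketch below. report_dimensions_demo() is a hypothetical stand-in for the relevant part of prettyOncoplot, and the toy matrix only mimics the dimensions quoted above.

```r
# Hypothetical helper: only print the summary when verbose = TRUE.
report_dimensions_demo <- function(mat, verbose = FALSE) {
  if (verbose) {
    print(paste0("numcases: ", ncol(mat)))
    print(paste0("numgenes: ", nrow(mat)))
  }
  invisible(dim(mat))
}

# Toy matrix with genes in rows and cases in columns.
toy_matrix <- matrix(0, nrow = 10, ncol = 441)

quiet_output <- capture.output(report_dimensions_demo(toy_matrix))
verbose_output <- capture.output(report_dimensions_demo(toy_matrix, verbose = TRUE))
```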

The annotation shows values not present in the data

I wonder if someone can reproduce/check if this is a bug.

genome_ssm_data = get_coding_ssm()
ssm_data_capture = get_coding_ssm(this_seq_type = "capture")
all_ssm = bind_rows(ssm_data_capture,genome_ssm_data) #combine all ssm into one large data frame

#Problem:
#this makes a plot with every cohort in the legend even if no samples from the cohort exist in the MAF
prettyOncoplot(maf_df = all_ssm,these_samples_metadata = 
                 all_sample_meta,
               minMutationPercent = 5,metadataColumns = c("pathology","cohort"),
               sortByColumns = c("pathology","cohort"))

# Expected functionality (requires an explicit filter)
# this only shows the relevant cohorts in the legend (i.e. those in the oncoplot)
prettyOncoplot(maf_df = all_ssm,these_samples_metadata = 
                 filter(all_sample_meta,sample_id %in% all_ssm$Tumor_Sample_Barcode),
               minMutationPercent = 5,metadataColumns = c("pathology","cohort"),
               sortByColumns = c("pathology","cohort"))

[Slack Message](https://morinlabsfu.slack.com/archives/C0224H120CU/p1706573886956999?thread_ts=1706573886.956999&cid=C0224H120CU)
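If the explicit filter were to be applied internally, it could be sketched in base R as follows. restrict_metadata_demo() and the toy tables are hypothetical; the column names sample_id and Tumor_Sample_Barcode follow the example above. Dropping unused factor levels is included on the assumption that stale levels are one way absent cohorts can end up in a legend.

```r
# Hypothetical helper: keep only metadata rows whose sample appears in the
# MAF, and drop factor levels that no longer occur.
restrict_metadata_demo <- function(metadata, maf_df) {
  keep <- metadata$sample_id %in% maf_df$Tumor_Sample_Barcode
  droplevels(metadata[keep, , drop = FALSE])
}

toy_cohort_meta <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  cohort = factor(c("A", "B", "C")),
  stringsAsFactors = FALSE
)
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S3"),
  Hugo_Symbol = c("MYC", "TP53", "MYC"),
  stringsAsFactors = FALSE
)

shown_meta <- restrict_metadata_demo(toy_cohort_meta, toy_maf)
print(levels(shown_meta$cohort))
```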

`prettyCoOncoplot` isn't supported by `prettyOncoplot` after maftools dependency has been removed from the latter

The following code was run after changing the argument maftools_obj of the internal call of prettyOncoplot to maf_df.

#get data for plotting
meta = get_gambl_metadata()
meta = dplyr::filter(meta, cohort %in% c("dlbcl_reddy", "dlbcl_schmitz"))

ssm = dplyr::filter(GAMBLR.data::sample_data$grch37$maf,
                    Tumor_Sample_Barcode %in% meta$sample_id)
ssm = maftools::read.maf(ssm)

#build plot
prettyCoOncoplot(maf = ssm,
                 metadata = meta,
                 comparison_column = "cohort",
                 include_noncoding = NULL,
                 minMutationPercent = 0,
                 genes = c("MYC",
                           "TET2",
                           "TP53",
                           "DDX3X",
                           "ID3"),
                 metadataColumns = c("pathology",
                                     "EBV_status_inf",
                                     "pairing_status",
                                     "cohort"),
                 splitColumnName = "EBV_status_inf",
                 metadataBarHeight = 10,
                 fontSizeGene = 12,
                 metadataBarFontsize = 10,
                 legend_row = 2,
                 label1 = "Adult",
                 label2 = "Pediatric")

# Error: $ operator not defined for this S4 class

This is the object given to prettyOncoplot's maf_df argument:

> ssm1
An object of class  MAF 
                        ID summary   Mean Median
 1:             NCBI_Build  GRCh37     NA     NA
 2:                 Center       .     NA     NA
 3:                Samples     468     NA     NA
 4:                 nGenes     234     NA     NA
 5:        Frame_Shift_Del     441  0.942      1
 6:        Frame_Shift_Ins     201  0.429      0
 7:           In_Frame_Del     125  0.267      0
 8:           In_Frame_Ins      20  0.043      0
 9:      Missense_Mutation    5680 12.137     10
10:      Nonsense_Mutation     805  1.720      1
11:       Nonstop_Mutation      19  0.041      0
12:            Splice_Site     394  0.842      1
13: Translation_Start_Site      85  0.182      0
14:                  total    7770 16.603     15

> class(ssm1)
[1] "MAF"
attr(,"package")
[1] "maftools"

The error in prettyOncoplot probably comes from this operation:

> maf_df <- ssm1
> maf_df$Tumor_Sample_Barcode
Error in ssm1$Tumor_Sample_Barcode : 
  $ operator not defined for this S4 class
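One possible guard is to convert an S4 object to a plain data frame before using $. The sketch below uses a toy S4 class with a data slot, mirroring how maftools MAF objects keep their main mutation table in a slot named data. ToyMAF and as_maf_df_demo() are hypothetical, and a real fix might also need the silent mutations that maftools stores separately.

```r
library(methods)

# Toy S4 class standing in for the maftools MAF class.
setClass("ToyMAF", representation(data = "data.frame"))

# Hypothetical helper: return a data frame whether the input is already
# tabular or an S4 object carrying its table in a `data` slot.
as_maf_df_demo <- function(x) {
  if (isS4(x) && "data" %in% slotNames(x)) {
    return(as.data.frame(slot(x, "data")))
  }
  as.data.frame(x)
}

toy_obj <- new(
  "ToyMAF",
  data = data.frame(Tumor_Sample_Barcode = c("S1", "S2"), stringsAsFactors = FALSE)
)

converted_maf <- as_maf_df_demo(toy_obj)
print(converted_maf$Tumor_Sample_Barcode)
```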

drop reshape2 dependency

Only used in prettyRainfallPlot and should be reconfigured with dplyr so we decrease the dependency burden on viz package

`fancy_propcov_plot` returns warning message

The warning message:

The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
ℹ Please use the `fun` argument instead.

The issue can be reproduced by running the examples:

#load packages
library(dplyr)

#get sample IDs for available genome samples
genome_collated = collate_results(seq_type_filter = "genome") %>% 
  pull(sample_id)

#subset the collated samples on BL samples
my_samples = get_gambl_metadata() %>% 
  dplyr::filter(sample_id %in% genome_collated) %>% 
  dplyr::filter(pathology == "BL") %>% pull(sample_id)

fancy_propcov_plot(these_sample_ids = my_samples)

Plotting functions should work without GSC access.

GAMBLR.viz functions that currently rely on core GAMBLR functions (GSC access) to retrieve data when MAF/SEG/BEDPE data is not provided need to be updated so that they work out-of-the-box in this package, without any dependency on GAMBLR functions that require GSC access.

Parameters missing in `GAMBLR.data::calc_mutation_frequency_bin_regions`

heatmap_mutation_frequency_bin internally calls calc_mutation_frequency_bin_regions. However, when using the bundled data, GAMBLR.data::calc_mutation_frequency_bin_regions is missing the parameters from_indexed_flatfile and mode, which are required in heatmap_mutation_frequency_bin.

dlbcl_bl_meta = get_gambl_metadata() %>%
    dplyr::filter(pathology %in% c("DLBCL", "BL"))

some_regions = GAMBLR.data::somatic_hypermutation_locations_GRCh37_v_latest %>% 
  dplyr::filter(!gene %in% c("BTG2", "CXCR4", "ST6GAL1", "BCL6", "LPP", "RHOH", "CD83", 
                             "PIM1", "BACH2", "SGK1", "MYC", "PAX5", "GRHPR", "FANK1", "BIRC3", 
                             "BTG1", "DTX1", "BCL7A", "ZFP36L1", "SERPINA9", "TCL1A", "CIITA", 
                             "IRF8", "S1PR2", "MEF2B", "IGLL5", "TMSB4X", "PIM2")) %>% 
  select(chr_name, hg19_start, hg19_end, gene) %>% 
  rename( "chrom"="chr_name", "start"="hg19_start", "end"="hg19_end", "name"="gene") %>% 
  mutate( chrom = str_remove(chrom, "chr") )

heatmap_mutation_frequency_bin(these_samples_metadata = dlbcl_bl_meta,
                               regions_bed = some_regions,
                               from_indexed_flatfile = TRUE,
                               mode = "slms-3")
# Warning: Multiple regions in the provided data frame have the same name. Merging these entries based on min(start) and max(end) per name value. 
#  Error:  You have given one or more unsupported or deprecated argument to  calc_mutation_frequency_bin_regions . Please check the documentation and spelling of your arguments.
# Offending argument(s): from_indexed_flatfile,mode

Should we remove both the from_indexed_flatfile and mode parameters from heatmap_mutation_frequency_bin and stop specifying them in the internal call to calc_mutation_frequency_bin_regions? The GAMBLR.results version of calc_mutation_frequency_bin_regions would then always use its default values for these parameters.
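A different approach from the one proposed above, shown only as a sketch, would be to forward just the arguments that the loaded version of the function accepts. All function names below are hypothetical stand-ins; neither calc_bundled_demo() nor calc_results_demo() reflects the real signatures.

```r
# Hypothetical helper: drop arguments the target function does not declare,
# unless it accepts `...`.
call_with_supported_args_demo <- function(fun, args) {
  supported <- names(formals(fun))
  if (!"..." %in% supported) {
    args <- args[names(args) %in% supported]
  }
  do.call(fun, args)
}

# Stand-ins for the bundled-data and GSC versions of the same function.
calc_bundled_demo <- function(regions, projection = "grch37") {
  paste("bundled:", regions, projection)
}
calc_results_demo <- function(regions, projection = "grch37", mode = "slms-3") {
  paste("results:", regions, projection, mode)
}

shared_args <- list(regions = "MYC", mode = "slms-3")
bundled_result <- call_with_supported_args_demo(calc_bundled_demo, shared_args)
results_result <- call_with_supported_args_demo(calc_results_demo, shared_args)
print(c(bundled_result, results_result))
```

The trade-off is that unsupported arguments are dropped silently, so removing the parameters outright, as suggested above, may be the more transparent choice.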

drop cowplot dependency

The theme_cowplot is used in a few functions, but this goes against the consistency we're after in the plots generated by this package. These uses should be swapped for theme_Morons() throughout.

The prettyForestPlot uses cowplot to arrange figures but it should be possible to swap it with ggpubr approach which is already used as dependency too, so we decrease overall dependency burden.

Develop ggplot2-implemented lollipop plots

Currently the prettylollipoplot uses g3viz under the hood to generate nice lollipop plots. However, the outputs are html-based, and the package itself depends on the cBioPortal API, which makes GAMBLR.viz very bulky, prone to installation errors, and very slow to install.
ggplot2, on the other hand, is already a dependency, and implementing the lollipop functionality through ggplot2 will address all of these issues.

prettyRainfallPlot does not respect the projection when returning SVs

When returning SVs to label, this function always retrieves the variants with the default settings, disregarding the projection the user provides in the main function call. As a result, the labels for the SVs present in the region are missing, because the coordinates and chr prefixing do not match. It should be addressed here:

GAMBLR.viz/R/prettyRainfallPlot.R

Lines 213 to 221 in 9f3528f

    
if (label_sv) {
  message("Getting combined manta + GRIDSS SVs using GAMBLR ...")
  these_sv = get_manta_sv(these_sample_ids = this_sample_id)
  if ("SCORE" %in% colnames(these_sv)) {
    these_sv = these_sv %>%
      rename("SOMATIC_SCORE" = "SCORE")
  }
  # annotate SV
  these_sv = annotate_sv(these_sv)

Refer to this thread with sample plots and calls to test the functionality

keepSampleOrder in prettyOncoplot is ignored

The prettyOncoplot ignores the argument keepSampleOrder and does nothing with it. This should be fixed in a way that when this argument is provided, the order of patients in the supplied metadata is respected.
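The intended ordering can be sketched in base R. order_samples_demo() is a hypothetical helper, and the alphabetical fallback is only a placeholder for whatever ordering the function applies otherwise.

```r
# Hypothetical helper: when keepSampleOrder is TRUE, return the samples in
# the order in which they appear in the supplied metadata.
order_samples_demo <- function(sample_ids, metadata, keepSampleOrder = FALSE) {
  if (keepSampleOrder) {
    ordered <- metadata$sample_id[metadata$sample_id %in% sample_ids]
    return(as.character(ordered))
  }
  sort(sample_ids)
}

toy_order_meta <- data.frame(
  sample_id = c("S3", "S1", "S2"),
  stringsAsFactors = FALSE
)

kept_order <- order_samples_demo(c("S1", "S2", "S3"), toy_order_meta,
                                 keepSampleOrder = TRUE)
print(kept_order)

# The result could then be handed to the heatmap call, for example through
# the column_order argument of ComplexHeatmap::oncoPrint().
```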

Harmonize parameter names

seq_type parameter in pretty_rainfall_plot should be renamed to this_seq_type for consistency reasons. Self-assigning.

Optionally use raw SVs in prettyRainfallPlot

The current implementation of prettyRainfallPlot uses annotated SVs for labelling. This should become optional and user-configurable, so user can look at raw (not annotated) SVs.

When working on this update, the sv_data also needs to be optionally passed as argument to allow for visualization of user-provided data

`prettyOncoplot` is missing a step to assign POS/NEG colours

prettyOncoplot is missing the line that assigns POS/NEG colours for columns when custom colours aren't specified.

When I run it in verbose mode it does indicate that it found the colours:

[1] ">>>>>>>"
[1] "POS" "NEG"
[1] "<<<<<<<"
found colours for bcl2_ba here
[1] ">>>>>>>"
[1] "POS" "NEG"
[1] "<<<<<<<"
found colours for bcl6_ba here
$lymphgen
  EZB-MYC       EZB  EZB-COMP       ST2  ST2-COMP       MCD  MCD-COMP       BN2  BN2-COMP        N1   N1-COMP       A53     Other COMPOSITE        NA 
"#52000F" "#721F0F" "#C7371A" "#C41230" "#EC3251" "#3B5FAC" "#6787CB" "#7F3293" "#A949C1" "#55B55E" "#7FC787" "#5b6d8a" "#ACADAF" "#ACADAF"   "white" 

$bcl2_ba
      POS       NEG        NA 
"#c41230" "#E88873"   "white" 

$bcl6_ba
      POS       NEG        NA 
"#c41230" "#E88873"   "white"

But it then returns the error

Error: elements in `col` should be named vectors.

This is using the bcl2_ba and bcl6_ba columns returned by get_gambl_metadata.
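When chasing this error, it may help to check which entries of a colour list are not named vectors before the list reaches the heatmap call. The sketch below is a generic base R diagnostic; unnamed_colour_entries_demo() and the toy list are hypothetical, and it does not claim to identify the actual cause in prettyOncoplot.

```r
# Hypothetical diagnostic: return the names of colour-list entries that are
# not fully named atomic vectors.
unnamed_colour_entries_demo <- function(colour_list) {
  is_named <- vapply(colour_list, function(colours) {
    is.atomic(colours) &&
      !is.null(names(colours)) &&
      all(nzchar(names(colours)))
  }, logical(1))
  names(colour_list)[!is_named]
}

toy_colours <- list(
  bcl2_ba = c(POS = "#c41230", NEG = "#E88873"),
  bcl6_ba = c("#c41230", "#E88873")  # deliberately unnamed
)

problem_entries <- unnamed_colour_entries_demo(toy_colours)
print(problem_entries)
```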

Error in `prettyOncoplot`

maf_metadata <- get_gambl_metadata(seq_type_filter = "genome") %>%
  dplyr::filter(pathology %in% c("FL", "DLBCL"))

maf_data <- get_ssm_by_samples(
  these_samples_metadata = maf_metadata
)

fl_genes = c("RRAGC", "CREBBP", "VMA21", "ATP6V1B2")

dlbcl_genes = c("EZH2", "KMT2D", "MEF2B", "CD79B", "MYD88", "TP53")

genes = c(fl_genes, dlbcl_genes)

gene_groups = c(rep("FL", length(fl_genes)), rep("DLBCL", length(dlbcl_genes)))
names(gene_groups) = genes

prettyOncoplot(
  maf_df = maf_data,
  genes = genes,
  these_samples_metadata = maf_metadata %>%
    arrange(patient_id),
  splitGeneGroups = gene_groups,
  keepGeneOrder = TRUE,
  splitColumnName = "pathology",
  metadataBarHeight = 5,
  metadataBarFontsize = 8,
  legend_row = 2,
  fontSizeGene = 11,
  metadataColumns = c("pathology", "lymphgen", "sex"),
  sortByColumns = c("pathology", "lymphgen", "sex")
)

All mutation types: Missense_Mutation, Multi_Hit, Frame_Shift_Del, Splice_Site,
Nonsense_Mutation, Frame_Shift_Ins, In_Frame_Del, Translation_Start_Site.
`alter_fun` is assumed vectorizable. If it does not generate correct plot, please set
`alter_fun_is_vectorized = FALSE` in `oncoPrint()`.
Error: elements in `col` should be named vectors.

The error happens in the ComplexHeatmap::oncoPrint call. Here is the object that is given to the col parameter of oncoPrint:

> col
     Nonsense_Mutation      Missense_Mutation              Multi_Hit        Frame_Shift_Ins 
             "#c41230"              "#39b54b"              "#455564"              "#e90c8b" 
       Frame_Shift_Del           In_Frame_Ins           In_Frame_Del       Nonstop_Mutation 
             "#e90c8b"              "#5f3a17"              "#5f3a17"              "#2cace3" 
Translation_Start_Site            Splice_Site          Splice_Region                  3'UTR 
             "#8781bd"              "#fe9003"              "#fe9003"              "#f9bd1f" 
                Silent 
             "#A020F0"

drop plotly dependency

It is only used in fancy_qc_plot and is not a consistent feature throughout. It also requires html outputs that are not publication-friendly. It should be dropped at this point to decrease the dependency burden.

hide_annotations still shows metadata track in prettyOncoplot

When prettyOncoplot is supplied with hide_annotations, the color legend is hidden but the track itself is still shown. Hiding the track should become an option, so that the metadata track can be hidden as well if the user wants.
This is important when you want to force ordering on some column but not show it in the track below the oncoplot.
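A minimal sketch of how the displayed tracks could be separated from the columns used for sorting. visible_tracks_demo() and its hide_tracks flag are hypothetical names for illustration, not existing prettyOncoplot arguments.

```r
# Hypothetical helper: decide which metadata columns get a track. Columns in
# hide_annotations are dropped only when hide_tracks is TRUE.
visible_tracks_demo <- function(metadataColumns,
                                hide_annotations = NULL,
                                hide_tracks = FALSE) {
  if (hide_tracks) {
    return(setdiff(metadataColumns, hide_annotations))
  }
  metadataColumns
}

sort_columns <- c("pathology", "cohort")

# Sort on both columns, but only draw the pathology track.
shown_tracks <- visible_tracks_demo(sort_columns,
                                    hide_annotations = "cohort",
                                    hide_tracks = TRUE)
print(shown_tracks)
```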
