dtm2451 / dittoseq

Color blindness friendly visualization of single-cell and bulk RNA-sequencing data

License: MIT License

R 99.68% TeX 0.32%
rna-seq single-cell single-cell-rna-seq visualization color-blindness color-blind

dittoseq's Introduction

dittoSeq

A set of functions built to enable analysis and visualization of single-cell and bulk RNA-sequencing data by novice, experienced, and color blind coders

dittoSeq includes universal plotting and helper functions for working with (sc)RNAseq data processed in these packages:

  • single-cell:
    • Seurat (versions 2+), Seurat data structure
    • scran / scater / other Bioconductor packages that utilize the SingleCellExperiment data structure
  • bulk:
    • edgeR, DGEList data structure
    • DESeq2 / other Bioconductor packages that utilize the SummarizedExperiment data structure

All plotting functions spit out easy-to-read, color blind friendly, plots (ggplot2, plotly, or pheatmap/ComplexHeatmap) upon minimal coding input for your daily analysis needs, yet also allow sufficient manipulations to provide for out-of-the-box submission-quality figures!

dittoSeq also makes access of underlying data easy, for submitting to journals or for adding extra layers to the plot, with data.out = TRUE inputs!

Overview

News:

Major functionality updates are coming in the next release!

Updates in dittoSeq v1.16 (Bioconductor 3.19):

  • Feature Extensions:

    1. Multi-modality functionality: To support visualization of markers from multiple modalities within a single plot, the way that assay and slot inputs can be provided for Seurat objects, and that assay and swap.rownames can be provided for SingleCellExperiment and SummarizedExperiment objects, has been overhauled. A new documentation page, ?GeneTargeting, describes the new methodologies.
      • Note: Single assay mode currently remains the default for all plotters. In the current implementation, you need to explicitly set assay whenever aiming to target multiple modalities, but I will be considering defaulting to e.g. assay = c("RNA", "ADT") for Seurat objects in a future dittoSeq-v2.0.
    2. 'dittoDotPlot()' vars-categories: Added support for categorization of vars/markers shown in 'dittoDotPlot()'s and also for axes swapping.
      • Added vars.dir input to give internal control over whether markers are shown on the x-axis (default) or the y-axis (vars.dir = "y").
      • vars input can be given as a named list to group markers (list element values) into categories (list element names).
      • Added automatic addition of display-style adjustments that make category labels appear more like category labels.
        • New inputs categories.split.adjust and categories.theme.adjust were added to let users turn off these display adjustments. Elements which are added to split.adjust (and then ultimately given to ggplot2::facet_grid()) can be turned off by setting categories.split.adjust = FALSE and elements which are added to theme (and applied via ggplot2::theme()) can be turned off by setting categories.theme.adjust = FALSE.
    3. 'dittoDotPlot()' 3-color scale: Added support for adding a midpoint color to the color scale used for 'dittoDotPlot()'s.
      • New input mid.color controls the switch:
        • Left as the default, NULL, a 2-color scale is used (from 'min.color' to 'max.color')
        • Given mid.color = "<color>", a 3-color scale is used (from 'min.color' to <color> to 'max.color')
        • Giving mid.color = "ryb" or "rgb" or "rwb" allows single-point quick update to all of 'min.color', 'mid.color', and 'max.color' for use of one of three standard 3-color scales inspired by ColorBrewer ("ryb": from blue to yellow to red; "rgb": from blue to "gray97" to red; "rwb": from blue to "white" to red).
      • New input mid controls the data value at which 'mid.color' will be used in the scale, and receives intuitive defaulting so users generally don't need to provide it.
      • This mechanism is being rolled out and tested with dittoDotPlot() first, but users can expect extension of this functionality to other visualizations in an upcoming release!
    4. 'dittoDimPlot()', 'dittoScatterPlot()', 'dittoDimHex()', and 'dittoScatterHex()':
      • Added 'labels.repel.adjust' input, which provides additional control of labeling via input pass-through to the ggrepel::geom_label_repel() (labels.highlight = TRUE, the default) and ggrepel::geom_text_repel() (labels.highlight = FALSE) functions which underlie do.label = TRUE labeling (when labels.repel is left as the default, TRUE).
    5. 'dittoPlot()', 'dittoFreqPlot()', and 'dittoPlotVarsAcrossGroups()':
      • Added 'boxplot.outlier.size' input to allow control of the outlier shape's size
      • Added 'vlnplot.quantiles' input to allow drawing of lines, within violin plot data representations, at requested data quantiles.
    6. 'dittoScatterHex()' and 'dittoDimHex()':
      • Added a new 'color.method' style for discrete 'color.var'-data. Users can give color.method = "prop.<value>", where <value> is an actual data level of 'color.var'-data, to have color represent the proportion of 'color.var'-data == <value> for all bins.
  • Bug Fixes:

    1. 'dittoHeatmap()': Fixed a bug which blocked provision of 'annotation_row' and 'annotation_colors' inputs to dittoHeatmap without also generating column annotations via either 'annot.by' or direct 'annotation_col' provision.
  • Upkeep with ggplot-v3 & Seurat-v5, details here are generally invisible to the user:

    1. ggplot-v3: Switched from making use of do.call() with the deprecated aes_string(), and simple list management for successively built setups, to mostly direct aes(.data[[<col>]]) calls, and use of modifyList for additions in successively built setups. This methodology should be backwards compatible to earlier ggplot versions, but that has not been officially tested.
    2. Seurat-v5: Added conditional code that switched to the newly supported SeuratObj[[<assay>]][<slot>] syntax for expression data retrieval when the user's Seurat package version is 5.0 or higher.
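
A minimal sketch of the 'dittoDotPlot()' additions listed under Feature Extensions above, assuming dittoSeq v1.16 or later is installed. The toy object and its gene, category, and metadata names are invented for illustration; only the vars, vars.dir, and mid.color inputs come from the notes above.

```r
library(dittoSeq)
library(SingleCellExperiment)

# Toy data (hypothetical): 6 genes x 30 cells, with 3 made-up clusters
set.seed(42)
counts <- matrix(
    rpois(6 * 30, lambda = 5),
    nrow = 6,
    dimnames = list(paste0("gene", 1:6), paste0("cell", 1:30)))
toy <- SingleCellExperiment(
    assays = list(counts = counts, logcounts = log2(counts + 1)),
    colData = DataFrame(cluster = rep(c("c1", "c2", "c3"), each = 10)))

# Named list 'vars': list names become categories, values are the markers
p_categories <- dittoDotPlot(
    toy,
    vars = list(setA = c("gene1", "gene2"), setB = c("gene3", "gene4")),
    group.by = "cluster")

# Show markers on the y-axis instead of the default x-axis
p_flipped <- dittoDotPlot(
    toy, vars = c("gene1", "gene2", "gene3"),
    group.by = "cluster",
    vars.dir = "y")

# 3-color scale via the "ryb" shortcut (blue to yellow to red)
p_three_color <- dittoDotPlot(
    toy, vars = c("gene1", "gene2", "gene3"),
    group.by = "cluster",
    mid.color = "ryb")
```

Each call returns a ggplot object, so the usual ggplot2 additions can still be layered on afterwards.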

Previous updates:

Updates in dittoSeq v1.14 (Bioconductor 3.18)
  • Feature Extensions:

    1. 'dittoDotPlot()' & 'dittoPlotVarsAcrossGroups()': Improved 'group.by' ordering control via retention of factor levels and addition of a new 'groupings.drop.unused' input to control retention of empty levels.
    2. 'dittoHeatmap()': Targeting Seurat clusters with the "ident" shortcut now works for the 'annot.by' input of 'dittoHeatmap()'.
  • Bug Fixes:

    1. 'dittoHeatmap()': Fixed a feature exclusion check in 'dittoHeatmap()' meant to remove features without any non-zero values. Previously, it removed all features with a mean of zero, which blocked plotting from pre-scaled data.
    2. 'dittoHeatmap()': (New in dittoSeq-v1.15-devel, but also pushed to the released v1.14.2) Fixed a bug which blocked provision of 'annotation_row' and 'annotation_colors' inputs to dittoHeatmap without also generating column annotations via either 'annot.by' or direct 'annotation_col' provision.
    3. 'dittoDimPlot()' & 'getReductions()': Eliminated cases where 'getReductions()' did not return NULL for 'object's containing zero dimensionality reductions. This fix also improves associated error messaging of 'dittoDimPlot()' where such cases were missed.
  • Upkeep with ggplot-v3 & Seurat-v5, details here are generally invisible to the user (New in dittoSeq-v1.15-devel, but also pushed to the released v1.14.1):

    1. ggplot-v3: Switched from making use of do.call() with the deprecated aes_string(), and simple list management for successively built setups, to mostly direct aes(.data[[<col>]]) calls, and use of modifyList for additions in successively built setups. This methodology should be backwards compatible to earlier ggplot versions, but that has not been officially tested.
    2. Seurat-v5: Added conditional code that switched to the newly supported SeuratObj[[<assay>]][<slot>] syntax for expression data retrieval when the user's Seurat package version is 5.0 or higher.
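
For readers unfamiliar with the ggplot-v3 pattern named above, here is a small standalone sketch of building a mapping from column-name strings with aes(.data[[<col>]]) and extending it with modifyList(). It is an illustration of the general technique only, not dittoSeq's internal code, and the data frame and column names are made up.

```r
library(ggplot2)

# Made-up data standing in for a plotting data frame
df <- data.frame(
    umap_1 = c(0.5, 1.2, -0.3, 2.1),
    umap_2 = c(1.0, -0.7, 0.4, 0.9),
    cluster = c("a", "b", "a", "b"))

# Column names held as strings, as they would be inside a plotting function
x_col <- "umap_1"
y_col <- "umap_2"
color_col <- "cluster"

# Deprecated style was: aes_string(x = x_col, y = y_col)
# Current style: index the .data pronoun with the string
mapping <- aes(x = .data[[x_col]], y = .data[[y_col]])

# Add to the mapping in a later, separate step
mapping <- modifyList(mapping, aes(colour = .data[[color_col]]))

p <- ggplot(df, mapping) + geom_point()
```
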
No code updates in dittoSeq v1.10 & v1.12 (Bioconductor 3.16 & 3.17)
  • Bioconductor-maintained version number updates only
Updates in dittoSeq v1.8 (Bioconductor 3.15)
  • Added 'randomize' option for 'order' input of 'dittoDimPlot()' and 'dittoScatterPlot()'
Updates in dittoSeq v1.6 (Bioconductor 3.14)
  • Vignette Update: Added a 'Quick-Reference: Seurat<=>dittoSeq' section.

  • Build & Test Infrastructure Update: Removed Seurat dependency from all build and test materials by removing Seurat code from the vignette and making all unit-testing of Seurat interactions conditional on both presence of Seurat and successful SCE to Seurat conversion.

  • Bug Fixes:

    1. Fixed dittoFreqPlot calculation machinery to properly target all cell types but only necessary groupings for every sample. Removed the 'retain.factor.levels' input because proper calculations treat 'var'-data as a factor, and groupings data as non-factor.
    2. Allowed dittoHeatmap() to properly 'drop_levels' of annotations by ensuring 'annotation_colors' is not populated with colors for empty levels which would be dropped.
    3. Made 'do.label' machinery of scatter plots robust to NAs.
Updates in dittoSeq v1.4 (Bioconductor 3.13)
  • Added 1 New Visualization Function: dittoFreqPlot():
    • Combines the population frequency summarization of dittoBarPlot() with the plotting style of dittoPlot() to enable per-population, per-sample, per-group frequency comparisons which focus on individual cell types / clusters!
  • Improved & expanded faceting capabilities with split.by inputs:
    • Added split.by to functions which did not have it: dittoBarPlot(), dittoDotPlot(), and dittoPlotVarsAcrossGroups()
    • Added split.adjust input to allow tweaks to the underlying facet_grid() and facet_wrap() calls.
    • Better compatibility with other features
      • works with labeling of Dim/Scatter plots
      • new split.show.all.others input now controls whether the full spectrum of points, versus just points excluded with cells.use, will be shown as light gray in the background of Dim/Scatter facets.
  • Improved dittoPlot()-plotting engine:
    • y-axis plotting:
      • geom dodging when color.by is used to add subgroupings now works for jitters too.
      • added a boxplot.lineweight control option.
    • x-axis / ridge-plotting:
      • Added an alternative histogram-shaping option (Try adding ridgeplot.shape = "hist"!)
      • Better use of white space (via adjustments to default plot grid expansion & exposure of a ridgeplot.ymax.expansion input to allow user override.)
  • Improved ordering capability for dittoHeatmap() & dittoBarPlot():
    • dittoHeatmap(): You can now give many metadata to order.by and it will use them all, prioritizing earliest items
    • dittoBarPlot(): Factor-level ordering can now be retained in dittoBarPlot for var and group.by data, a typically expected behavior, by setting a new input retain.factor.levels = TRUE.
  • Added interaction with rowData of SE and SCEs:
    • swap.rownames input allows indication of genes/rows by non-default rownames. E.g. for an object with Ensembl_IDs as the default and a rowData column named 'symbol' that contains gene symbols, those symbols can be used via dittoFunction(..., var = "<gene_symbol>", swap.rownames = "symbol").
  • Quality of Life improvements:
    • Standardized data.out & do.hover interplay to allow both plotly conversion and data output.
    • Documentation Updates
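
A small sketch of the swap.rownames mechanism described above, assuming a toy SingleCellExperiment whose default rownames are Ensembl-style IDs. The IDs, the symbols, and the 'cluster' metadata are all invented for illustration.

```r
library(dittoSeq)
library(SingleCellExperiment)

# Toy data (hypothetical): default rownames are made-up Ensembl-style IDs
set.seed(1)
ids <- paste0("ENSG_toy_", 1:5)
counts <- matrix(
    rpois(5 * 24, lambda = 8),
    nrow = 5,
    dimnames = list(ids, paste0("cell", 1:24)))
toy <- SingleCellExperiment(
    assays = list(counts = counts, logcounts = log2(counts + 1)),
    colData = DataFrame(cluster = rep(c("c1", "c2"), each = 12)))

# A rowData column holding alternative (made-up) gene symbols
rowData(toy)$symbol <- paste0("SYM", 1:5)

# Target a gene by its symbol rather than by its default rowname
p <- dittoPlot(
    toy, "SYM1",
    group.by = "cluster",
    swap.rownames = "symbol")
```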
Updates in dittoSeq v1.2 (Bioconductor 3.12)
  • Added 3 New Visualization Functions, dittoDotPlot(), dittoDimHex() & dittoScatterHex().
  • Expanded SummarizedExperiment compatibility across the entire toolset.
  • Added ComplexHeatmap integration to dittoHeatmap(), controlled by a new input, complex.
  • Added Rasterization for improved image editor compatibility of complex plots. (See the dedicated section in the vignette for details.)
  • Added labels.split.by input & do.contour, contour.color, and contour.linetype inputs to scatter/dim-plots.
  • Added order input to scatter/dim-plots for control of plotting order.
  • Added metas input for displaying such data with dittoHeatmap().
  • Added adjustment input to meta(), which works exactly as in gene() (but this is not yet implemented within data grab of visualization functions).
  • Added adj.fxn input to meta() and gene() for added control of how data might be adjusted (but this is not yet implemented within data grab of visualization functions).
  • Replaced (deprecated) highlight.genes input with highlight.features in dittoHeatmap().
  • Replaced (deprecated) OUT.List input with list.out for all multi_* plotters.
Updates in dittoSeq v1.0 (Bioconductor 3.11)
  • Submitted to Bioconductor

Color Blindness Compatibility:

The default colors of this package are meant to be color blind friendly. To make it so, I used the suggested colors from this source: Wong B, "Points of view: Color blindness." Nature Methods, 2011, and adapted them slightly by appending darker and lighter versions to create a 24 color vector. All plotting functions use these colors, stored in dittoColors(), by default. Also included is a Simulate() function that allows you to see what your plot might look like to a colorblind individual. For more info on that, see the Color-blindness Friendliness section below.

Demuxlet Tools

Included in this package currently are a set of functions to facilitate Mux-seq applications. For information about how to use these tools, see the Demuxlet section down below. For more information on Demuxlet and Mux-sequencing, see the Demuxlet GitHub Page. (Impetus: Many Mux-seq experiments will involve generating the side-by-side bulk and single-cell RNAseq data like the rest of the package is built for.)

Installation:

### For R-4.0 users:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("dittoSeq")

### For users with older versions of R:
# BiocManager will not let you install the pre-compiled version, but you can
# install directly from this GitHub via:
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("dtm2451/dittoSeq")

Quick Reference: Seurat <=> dittoSeq

Because users will often already be familiar with Seurat, this may be 90% of what you need!


As of May 25th, 2021, Seurat-v4.0.2 & dittoSeq v1.4.1

Functions

Seurat Viz Function(s) | dittoSeq Equivalent(s)
DimPlot / (I)FeaturePlot / UMAPPlot / etc. | dittoDimPlot / multi_dittoDimPlot
VlnPlot / RidgePlot | dittoPlot / multi_dittoPlot
DotPlot | dittoDotPlot
FeatureScatter / GenePlot | dittoScatterPlot
DoHeatmap | dittoHeatmap*
[No Seurat Equivalent] | dittoBarPlot / dittoFreqPlot
[No Seurat Equivalent] | dittoDimHex / dittoScatterHex
[No Seurat Equivalent] | dittoPlotVarsAcrossGroups
SpatialDimPlot, SpatialFeaturePlot, etc. | dittoSpatial (coming soon!)

*Not all dittoSeq features exist in Seurat counterparts, and occasionally the same is true in the reverse.

Inputs

See reference below for the equivalent names of major inputs

Seurat has had inconsistency in input names from version to version. dittoSeq drew some of its parameter names from previous Seurat-equivalents to ease cross-conversion, but continuing to blindly copy their parameter standards will break people's already existing code. Instead, dittoSeq input names are guaranteed to remain consistent across versions, unless a change is required for useful feature additions.

Seurat Viz Input(s) | dittoSeq Equivalents
object | SAME
features | var / vars (generally the 2nd input, so name not needed!) OR genes & metas for dittoHeatmap()
cells (cell subsetting is not always available) | cells.use (consistently available)
reduction & dims | reduction.use & dim.1, dim.2
pt.size | size (or jitter.size)
group.by | SAME
split.by | SAME
shape.by | SAME, and also available in dittoPlot()
fill.by | color.by (can be used to subset group.by further!)
assay / slot | SAME
order = logical | order, but = "unordered" (default), "increasing", or "decreasing"
cols | color.panel for discrete OR min.color, max.color for continuous
label & label.size & repel | do.label & labels.size & labels.repel
interactive | do.hover (via plotly conversion)
[Not in Seurat] | data.out, do.raster, do.letter, do.ellipse, add.trajectory.lineages and others!

Quick Start Guide:

Load in your data, then go!:

library(dittoSeq)

# dittoSeq works natively with Seurat, SingleCellExperiment (SCE),
#   & SummarizedExperiment (SE) objects

# Seurat
seurat <- Seurat::pbmc_small
dittoPlot(seurat, "CD14", group.by = "ident")

# SingleCellExperiment
sce <- Seurat::as.SingleCellExperiment(seurat)
dittoDimPlot(sce, "CD14")

# SummarizedExperiment
# (Please excuse the janky setup code for this quick example.)
library(SummarizedExperiment)
# Reuse the 'sce' object made above, and plot from the resulting 'se'
se <- as(sce, "SummarizedExperiment")
rownames(se) <- rownames(sce)
dittoBarPlot(se, "ident", group.by = "RNA_snn_res.0.8")

# For working with non-SE bulk RNAseq data, first import your data into a
#   SingleCellExperiment structure, (which is essentially a SummarizedExperiment
#   structure just with an added space for holding dimensionality reductions).
# myRNA <- importDittoBulk(dds) # DESeq2
# myRNA <- importDittoBulk(dgelist) # edgeR
# Then add dimensionality reductions
# myRNA <- addDimReduction(myRNA, embeddings, "pca")
#   above, embeddings = the dim-reduction matrix
myRNA <- example("importDittoBulk")

# You're ready!
dittoDimPlot(myRNA, "gene1", size = 3)

Quickly determine the metadata and gene options for plotting with universal helper functions:

getMetas(seurat)
isMeta("nCount_RNA", seurat)

getGenes(myRNA)
isGene("CD3E", myRNA)

getReductions(sce)

# View them with these:
gene("CD3E", seurat, assay = "RNA", slot = "counts")
meta("groups", seurat)
metaLevels("groups", seurat)

There are many dittoSeq Plot Types

Intuitive default adjustments generally allow creation of immediately useable plots.

# dittoDimPlot
dittoDimPlot(seurat, "ident", size = 3)
dittoDimPlot(seurat, "CD3E", size = 3)

# dittoBarPlot
dittoBarPlot(seurat, "ident", group.by = "RNA_snn_res.0.8")
dittoBarPlot(seurat, "ident", group.by = "RNA_snn_res.0.8",
    scale = "count")

# dittoPlot
dittoPlot(seurat, "CD3E", group.by = "ident")
dittoPlot(seurat, "CD3E", group.by = "ident",
    plots = c("boxplot", "jitter"))
dittoPlot(seurat, "CD3E", group.by = "ident",
    plots = c("ridgeplot", "jitter"))

# dittoHeatmap
dittoHeatmap(seurat, genes = getGenes(seurat)[1:20])
dittoHeatmap(seurat, genes = getGenes(seurat)[1:20],
    annot.by = c("groups", "nFeature_RNA"),
    scaled.to.max = TRUE,
    treeheight_row = 10)
# Turning off cell clustering can be necessary for large scRNAseq data
# Thus, clustering is turned off by default for single-cell data, but not for
# bulk RNAseq data.
# To control ordering/clustering separately, use 'order.by' or 'cluster_cols'
## (Not shown) ##
dittoHeatmap(seurat, genes = getGenes(seurat)[1:20],
    order.by = "groups")
dittoHeatmap(seurat, genes = getGenes(seurat)[1:20],
    cluster_cols = FALSE)

# dittoScatterPlot
dittoScatterPlot(
    object = seurat,
    x.var = "CD3E", y.var = "nCount_RNA",
    color.var = "ident", shape.by = "RNA_snn_res.0.8",
    size = 3)
dittoScatterPlot(
    object = seurat,
    x.var = "nCount_RNA", y.var = "nFeature_RNA",
    color.var = "CD3E",
    size = 1.5)

# Also multi-plotters:
    # multi_dittoDimPlot (multiple, in an array)
    # multi_dittoDimPlotVaryCells (multiple, in an array, but showing only
    #     certain cells in each plot)
    # multi_dittoPlot (multiple, in an array)
    # dittoPlotVarsAcrossGroups (multiple genes or metadata as the jitter
    #     points (and other representations), summarized across groups by
    #     z-score, or mean, or median, or any function that outputs a
    #     single numeric value from a numeric vector input.)

Many adjustments can be made with simple additional inputs:

dittoSeq allows many adjustments to how data is represented via inputs directly within dittoSeq functions. Adjustments that are common across functions are briefly described below. Some others are within the examples above.

For more details, review the full vignette (vignette("dittoSeq") after installation via Bioconductor) and/or the documentation of individual functions (example: ?dittoDimPlot).

Common Adjustments:

  • All Titles are adjustable.
  • Easily subset the cells shown with cells.use
  • Colors can be adjusted easily.
  • Underlying data can be output.
  • plotly hovering can be added.
  • Many more! (Legends removal, label rotation, labels' and groupings' names, ...)
# Adjust titles
dittoBarPlot(seurat, "ident", group.by = "RNA_snn_res.0.8",
    main = "Starters",
    sub = "By Type",
    xlab = NULL,
    ylab = "Generation 1",
    x.labels = c("Ash", "Misty"),
    legend.title = "Types",
    var.labels.rename = c("Fire", "Water", "Grass"),
    x.labels.rotate = FALSE)

# Subset cells / samples
dittoBarPlot(seurat, "ident", group.by = "RNA_snn_res.0.8",
    cells.use = meta("ident", seurat)!=1)

# Adjust colors
dittoBarPlot(seurat, "ident", group.by = "RNA_snn_res.0.8",
    colors = c(3,1,2)) #Just changes the color order, probably most useful for dittoDimPlots
dittoBarPlot(seurat, "ident", group.by = "RNA_snn_res.0.8",
    color.panel = c("red", "orange", "purple"))

# Output data
dittoBarPlot(seurat, "ident", group.by = "RNA_snn_res.0.8",
    data.out = TRUE)

# Add plotly hovering
dittoBarPlot(seurat, "ident", group.by = "RNA_snn_res.0.8",
    do.hover = TRUE)
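
As a hedged sketch of what can be done with the data.out = TRUE output: for dittoBarPlot(), the return value becomes a list holding the plot ("p") and a data frame of the underlying data ("data"). The toy object and its 'cluster' and 'batch' metadata below are invented for illustration.

```r
library(dittoSeq)
library(SingleCellExperiment)

# Toy data (hypothetical): 4 genes x 40 cells with two made-up metadata
set.seed(7)
counts <- matrix(
    rpois(4 * 40, lambda = 5),
    nrow = 4,
    dimnames = list(paste0("gene", 1:4), paste0("cell", 1:40)))
toy <- SingleCellExperiment(
    assays = list(counts = counts, logcounts = log2(counts + 1)),
    colData = DataFrame(
        cluster = sample(c("c1", "c2", "c3"), 40, replace = TRUE),
        batch = rep(c("b1", "b2"), each = 20)))

# Request both the plot and the data that went into it
out <- dittoBarPlot(toy, "cluster", group.by = "batch", data.out = TRUE)

# The summarized data, e.g. for submitting alongside a figure
head(out$data)

# The plot is a regular ggplot, so extra layers can still be added
p_titled <- out$p + ggplot2::ggtitle("Cluster makeup per batch")
```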

Color-blindness Friendliness

dittoSeq has many methods to make its plots color-blindness friendly:

1. The default color palette is built to work for the most common forms of colorblindness.

I am protanomalous myself (meaning I am red-green impaired, but more red than green impaired), so I chose colors for dittoSeq that I could tell apart. These colors also work for deuteranomalies (red-green, but more green than red), the most common form of color-blindness.

Note: There are still other forms of colorblindness, tritanomaly (blue deficiency), and complete monochromacy. These are more rare. dittoSeq's default colors are not great for these, but 2 & 3 below can still help!

2. Color legend point-sizing is large by default

No color panel can be perfect, but when there are issues, being able to at least establish some of the color differences from the legend helps. For this goal, having the legend examples be large enough is SUPER helpful.

3. Lettering overlay

Once the number of colors being used for discrete plotting in dittoDimPlot gets too high for even a careful color panel to compensate, letters can be added by setting do.letter = TRUE.

4. Shape.by

As an alternative to lettering (do.letter & shape.by are incompatible with each other), distinct groups can be displayed using different shapes as well.

5. Interactive Plots

Many dittoSeq visualizations offer plotly conversion when a do.hover input is set to TRUE. Making plots interactive is another great way to make them accessible to individuals with vision impairments. I plan to build such plotly conversion into more functions in the future.

6. The Simulate function

This function allows a cone-typical individual to see what their dittoSeq plot might look like to a colorblind individual. This function works for all dittoSeq visualizations currently, except for dittoHeatmap.

Note: there are varying degrees of colorblindness. Simulate simulates for the most severe cases.

Say this is the code you would use to generate your plot:

dittoDimPlot("CD3E", object = seurat, do.letter=F)

The code to visualize this as if you were a deuteranope like me is:

Simulate(type = "deutan", plot.function=dittoDimPlot, "CD3E", object = seurat, do.letter=F)

The Simulate() function's inputs are:

  • type = "deutan", "protan", or "tritan" = the type of colorblindness that you want to simulate. Deuteranopia is the most common, and involves primarily green color deficiency, and generally also a bit of red. Protanopia involves primarily red color deficiency, and generally also a bit of green. Tritanopia involves primarily blue color deficiency.
  • plot.function = the function you want to use. R may try to add (), but delete that if it does.
  • ... = any and all inputs that go into the plotting function you want to use.
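
To compare all three simulation types side by side, the inputs above can be looped over. This is only a sketch on an invented toy object; the object, its 'cluster' metadata, and the "PCA" reduction name are assumptions made for the example.

```r
library(dittoSeq)
library(SingleCellExperiment)

# Toy data (hypothetical): 4 genes x 30 cells, 3 made-up clusters
set.seed(3)
counts <- matrix(
    rpois(4 * 30, lambda = 5),
    nrow = 4,
    dimnames = list(paste0("gene", 1:4), paste0("cell", 1:30)))
toy <- SingleCellExperiment(
    assays = list(counts = counts, logcounts = log2(counts + 1)),
    colData = DataFrame(cluster = rep(c("c1", "c2", "c3"), each = 10)))

# A random 2-column embedding standing in for a real dimensionality reduction
reducedDim(toy, "PCA") <- matrix(rnorm(30 * 2), ncol = 2)

# One simulated plot per colorblindness type;
# note 'plot.function' is given without parentheses
cb_types <- c("deutan", "protan", "tritan")
simulated <- lapply(cb_types, function(cb_type) {
    Simulate(
        type = cb_type,
        plot.function = dittoDimPlot,
        object = toy, var = "cluster",
        reduction.use = "PCA", size = 3)
})
names(simulated) <- cb_types
```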

Demuxlet tools

Included in this package are a set of functions to facilitate Mux-seq applications. For more information on Demuxlet and Mux-sequencing, see the Demuxlet GitHub Page. (Impetus: Many Mux-seq experiments will involve generating the side-by-side bulk and single-cell RNAseq data like the rest of the package is built for.)

  • importDemux() - imports Demuxlet info into a pre-made Seurat or SingleCellExperiment object. For more info on its use, see below and ?importDemux within R.

  • demux.calls.summary() - Makes a plot of how many calls were made per sample, separated by lane. This is very useful for checking the potential accuracy of sample calls when only certain samples went into certain lanes/pools/sequencing runs/etc. (Note: the default setting is to only show Singlet calls. Use singlets.only = FALSE to include one of the sample calls for any doublets.)

demux.calls.summary(object)
  • demux.SNP.summary() - Useful for checking if you have a lot of cells with very few SNPs. Creates a plot of the number of SNPs per cell that is grouped by individual lane by default. This function is a simple wrapper for the dittoPlot() function with var = "demux.N.SNP" and with a number of input defaults adjusted (such as group.by and color.by = "Lane", so that the grouping is done according to 'Lane' metadata).
demux.SNP.summary(object)

importDemux() Function:

You will need to point the function to:

  • object = the target Seurat/SCE object
  • demuxlet.best = the location(s) of your Demuxlet .best output files.

If your data comes from multiple droplet-gen lanes, then there are two main distinct ways to use the function.

They differ because of specifics of how the data from distinct lanes may have been combined. See ?importDemux in R for suggested usage.

Metadata created by importDemux:

Metadata slot name | Description OR the Demuxlet .best column name if directly carried over
Lane | guided by lane.names input, represents the separate droplet-generation lane, pool, sequencing lane, etc.
Sample | The sample call, from the BEST column
demux.doublet.call | whether the sample was a singlet (SNG), doublet (DBL), or ambiguous (AMB), from the BEST column
demux.RD.TOTL | RD.TOTL
demux.RD.PASS | RD.PASS
demux.RD.UNIQ | RD.UNIQ
demux.N.SNP | N.SNP
demux.PRB.DBL | PRB.DBL
demux.barcode.dup | (Only generated when TRUEs will exist, indicative of a technical issue in the bioinformatics pipeline) whether a cell's barcode referred to only 1 row of the .best file, but multiple distinct cells in the dataset.

Summary output:

The import function spits out a quick summary of what was done, which will look something like this:

Adding 'Lane' information as meta.data
Extracting the Demuxlet calls
Matching barcodes
Adding Demuxlet info as metadata
Checking for barcode duplicates across lanes...
  No barcode duplicates were found.

SUMMARY:
2 lanes were identified and named:
  Lane1, Lane2
The average number of SNPs per cell for all lanes was: 505.3
Out of 80 cells in the Seurat object, Demuxlet assigned:
    75 cells or 93.8% as singlets
    4 cells or 5% as doublets
    and 1 cells as too ambiguous to call.
0 cells were not annotated in the demuxlet.best file.

dittoseq's People

Contributors

atpoint, dtm2451, j-andrews7, jwokaty, kant, nturaga


dittoseq's Issues

dittobarplot split.by

Split.by is really helpful in the graph tools dittoSeq provides. Maybe I'm missing something, but I noticed dittoBarPlot doesn't work with split.by.
Currently I use cells.use and generate multiple graphs, do you have a suggestion?
Thanks!

Choose assay for Seurat objects?

Is there any way we can specify the assay to pull data from? This matters for Seurat objects because integrated assays don't typically contain all genes, and SCT assay normalized counts often have noise regressed out that makes the counts dependent in some way on each other.

Typically, the "RNA" assay is what is wanted, but it'd be nice to be able to specify this directly. I have not dug through the source code to see how the current assay is chosen, but I assume it just uses the default assay of the object?

dittoBoxPlot() ....for multiple genes???

Hello. I'm currently applying your wonderful tool for my project. I tried to use dittoBoxPlot() and it appears it can only be used for 1 gene at a time per plot.

I want to compare/display the expression levels of several DEGs across conditions (healthy and disease) in one plot such that it looks like the below figure:

...where y-axis is the expression levels, and the x-axis should be the genes (and each gene has 2 barplot of healthy and disease). Hence, similar to this boxplot, generating a pattern where the blue boxplots of healthy states are low, and the red boxplots of upregulated DEG genes in diseased are high.

Could you please help me? Thank you very much in advance!

image

multi_dittoPlot() then ggsave() only writes out last plot

I just started using multi_dittoPlot() and it is working very well. However, when I try to use ggsave() afterwards, only the last gene given is saved in the resulting .jpeg. Example code from the vignette:

multi_dittoPlot(sce, delta.genes[1:6], group.by = "label",
    vlnplot.lineweight = 0.2, jitter.size = 0.3)
ggsave("onlyCBLN4plot.jpeg")

However, the old-fashioned calling jpeg() then dev.off() does give me the plot I want:

jpeg("all6genesplot.jpeg", width = 7, height = 7, unit = "in", res = 300, quality = 100)
multi_dittoPlot(sce, delta.genes[1:6], group.by = "label",
    vlnplot.lineweight = 0.2, jitter.size = 0.3)
dev.off()

So there is a work around but it is cumbersome. Just wanted to point this out - hopefully there is a quick fix.

dittoFreqPlot variable outputs

Hi,
thanks for this really nice package! I came across the dittoFreqPlot function and I am not sure I understand what is going on behind the scenes. I have a SingleCellExperiment object and ran the following code and got the subsequent plot:
dittoFreqPlot(sce,var="knn_10_spatial_clusters", group.by = "Therapy",sample.by = "ImageNumber")
image

when I run the functions with "retain.factor.levels = TRUE" the plot looks different and I checked and the numbers are correct in this version of the function:
dittoFreqPlot(sce,var="knn_10_spatial_clusters", group.by = "Therapy",sample.by = "ImageNumber", retain.factor.levels = TRUE)
image

Could you explain what exactly "retain.factor.levels = TRUE" does? tricky to know which is the "correct" plot.
thanks and best,
daniel

Ridgeplot shape

Is there a way to plot histogram instead of density? Sometimes it is needed for raw counts.

Error when colnames are not set

Hi @dtm2451

I noticed the following error when calling dittoRidgePlot(sce, var = "MPO"):

Error in .var_OR_get_meta_or_gene(main.var, object, assay, slot, adjustment,  : 
  'var' is not a metadata or gene nor equal in length to ncol('object')

After a bit of digging I noticed that the .all_cells function extracts the colnames of a SummarizedExperiment object. If colnames(object) returns NULL, I guess most plotting functions break.
Cheers,

Nils

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dittoSeq_1.7.0              ggplot2_3.3.5               SpatialExperiment_1.4.0     SingleCellExperiment_1.16.0
 [5] SummarizedExperiment_1.24.0 Biobase_2.54.0              GenomicRanges_1.46.0        GenomeInfoDb_1.30.0        
 [9] IRanges_2.28.0              S4Vectors_0.32.2            BiocGenerics_0.40.0         MatrixGenerics_1.6.0       
[13] matrixStats_0.61.0         

loaded via a namespace (and not attached):
 [1] edgeR_3.36.0              DelayedMatrixStats_1.16.0 scuttle_1.4.0             R.utils_2.11.0           
 [5] assertthat_0.2.1          dqrng_0.3.0               GenomeInfoDbData_1.2.7    yaml_2.2.1               
 [9] ggrepel_0.9.1             pillar_1.6.4              lattice_0.20-45           glue_1.6.0               
[13] limma_3.50.0              beachmat_2.10.0           digest_0.6.29             RColorBrewer_1.1-2       
[17] XVector_0.34.0            colorspace_2.0-2          cowplot_1.1.1             htmltools_0.5.2          
[21] Matrix_1.3-4              R.oo_1.24.0               plyr_1.8.6                pkgconfig_2.0.3          
[25] pheatmap_1.0.12           magick_2.7.3              bookdown_0.24             zlibbioc_1.40.0          
[29] purrr_0.3.4               scales_1.1.1              HDF5Array_1.22.1          BiocParallel_1.28.0      
[33] tibble_3.1.6              generics_0.1.1            ellipsis_0.3.2            withr_2.4.3              
[37] magrittr_2.0.1            crayon_1.4.2              evaluate_0.14             R.methodsS3_1.8.1        
[41] fansi_0.5.0               tools_4.1.2               lifecycle_1.0.1           Rhdf5lib_1.16.0          
[45] DropletUtils_1.14.1       munsell_0.5.0             locfit_1.5-9.4            DelayedArray_0.20.0      
[49] compiler_4.1.2            rlang_0.4.12              rhdf5_2.38.0              grid_4.1.2               
[53] RCurl_1.98-1.5            ggridges_0.5.3            rhdf5filters_1.6.0        rjson_0.2.20             
[57] bitops_1.0-7              rmarkdown_2.11            gtable_0.3.0              DBI_1.1.2                
[61] R6_2.5.1                  gridExtra_2.3             knitr_1.36                dplyr_1.0.7              
[65] fastmap_1.1.0             utf8_1.2.2                parallel_4.1.2            Rcpp_1.0.7               
[69] vctrs_0.3.8               tidyselect_1.1.1          xfun_0.28                 sparseMatrixStats_1.6.0  
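A possible user-side workaround for the missing-colnames case is to assign placeholder names before plotting. The sketch below uses a plain matrix so that it runs without Bioconductor; the same `colnames(x) <- ...` replacement form is available for SummarizedExperiment-derived objects. The "cell_" prefix is an arbitrary choice.

```r
# A toy expression matrix without column names (stand-in for assay data)
mat <- matrix(rpois(12, lambda = 5), nrow = 3,
              dimnames = list(c("MPO", "CD3E", "CD19"), NULL))
is.null(colnames(mat))  # TRUE

# Assign unique placeholder names only when none are set
if (is.null(colnames(mat))) {
  colnames(mat) <- paste0("cell_", seq_len(ncol(mat)))
}
```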

Code organization suggestions

From a quick scan of the code, here are some immediate structural suggestions:

  • I would strongly consider breaking up the DittoSeq.R file into smaller chunks. I know it's convenient to have all the functions in a single file, but this becomes unsustainable as your package gets larger and you have to keep on jumping up and down in the same file, e.g., to edit a function that gets called by another function. I usually have a single function definition per file, or at least a single category of function definitions; this makes it easier to navigate.
  • Your functions are designed to work with a range of input classes - this is prime territory for S4 dispatch. In some sense, you're already doing that in functions like get.genes, so it would be less effort to just let the dispatch system do that for you automatically. More generally, it's good to learn, because S4 is a powerful system that allows you to do pretty awesome things. (Not in this particular use case, but it really gets going when you deal with data infrastructure.)
  • Keep an eye on the complexity of your function interfaces. Developers (i.e., yourself) will find the code difficult to reason about if they have to worry about the combinatorial effects of multiple arguments. If there are no combinatorial effects, that should be enforced by splitting the code into multiple subfunctions that do separate parts. As a rough guideline, you should aim to keep functions below 50 lines of code; this is a very loose guideline, but ~250 for DBDimPlot is probably too long. It's not the worst I've seen, though; our iSEE function was several thousand lines long to handle the observer code... and it sucked to work on.
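To make the S4 dispatch suggestion concrete, here is a minimal sketch. The generic name `get_genes` and the chosen classes are illustrative assumptions only and do not reflect the package's actual functions.

```r
library(methods)

# Hypothetical generic; the name get_genes is illustrative, not dittoSeq's API
setGeneric("get_genes", function(object) standardGeneric("get_genes"))

# One method per input class replaces manual if/else checks on class(object)
setMethod("get_genes", "matrix", function(object) rownames(object))
setMethod("get_genes", "data.frame", function(object) rownames(object))
setMethod("get_genes", "list", function(object) names(object))

m <- matrix(0, nrow = 2, ncol = 2, dimnames = list(c("g1", "g2"), NULL))
get_genes(m)                      # "g1" "g2"
get_genes(list(g3 = 1, g4 = 2))   # "g3" "g4"
```

Supporting a new input class then means adding one setMethod() call rather than editing every function that branches on class.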

Some more specific coding suggestions, taken pretty much from BiocCheck:

  • Don't access slots directly, i.e., don't use @ or slot(). This is a pretty common problem that causes havoc with user code when the internal representation of the class changes. (Case in point, Seurat v2 to v3.) Rather, use the accessor functions if they have been provided - if not, write your own so that you quarantine any @ calls to a few functions that are easy to change.
  • Don't use F or T. T <- FALSE works, for example - and it's not even that uncommon (e.g., if you're working with Brachyury, or Hotelling's T).
  • Avoid 1:n, use seq_len(n). Similarly, use seq_along(vec) instead of 1:length(vec). The replacements have an advantage in that they behave sensibly when n=0.
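The last two points can be checked directly in base R. A short sketch:

```r
n <- 0

# 1:n counts down when n is 0, so a loop over it would run twice
length(1:n)         # 2 (the values 1 and 0)
length(seq_len(n))  # 0

vec <- character(0)
length(1:length(vec))   # 2
length(seq_along(vec))  # 0

# T is an ordinary variable and can be reassigned; TRUE cannot
T <- FALSE
isTRUE(T)     # FALSE
isTRUE(TRUE)  # TRUE
rm(T)         # remove the override so T resolves to the base value again
```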

That should probably be enough to get you going.

Density plot functionality

Density plots would be a nice addition via stat_binhex() and/or stat_density_2d(). Simplest implementation would likely be a new function built off of dittoScatterPlot, similar to how dittoDimPlot currently functions.

An alternative would be to add a few additional parameters to dittoScatterPlot and build it out there. I think the former is probably less of a headache, but probably worth discussing.
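As a rough illustration of what such a layer looks like in plain ggplot2, here is a sketch with stat_density_2d() on toy coordinates. The data frame is hypothetical and this is not a proposed dittoSeq implementation; stat_binhex() would work similarly but additionally needs the hexbin package.

```r
library(ggplot2)

set.seed(1)
# Toy embedding coordinates (hypothetical stand-in for a dimensionality reduction)
df <- data.frame(x = rnorm(500), y = rnorm(500))

# Points underneath, 2D kernel density contours on top
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(size = 0.5, color = "grey60") +
  stat_density_2d(color = "black")
```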

Additional updates before Bioconductor release

  • update all docs
  • Build out split.by functionality with ncol, nrow, and grid control.
  • extend the dittoColors() vector (#28)
  • expose a variation of the .is_bulk() util function
  • Build out unit tests a bit more
  • Fix singlets.only input of demux.calls.summary()
  • Maybe? reverse the order of object and test inputs in getter functions

dittoHeatmap Error in .which_data(assay, slot, object)[genes, cells.use] : invalid or not-yet-implemented 'Matrix' subsetting

Hello,

When running this
dittoHeatmap(sce.symbol, genes = genes, annot.by = "label", cluster_rows = FALSE)

I get the error:

Error in .which_data(assay, slot, object)[genes, cells.use] : 
  invalid or not-yet-implemented 'Matrix' subsetting

Here is a look at my objects:

> sce.symbol
class: SingleCellExperiment 
dim: 12212 1866 
metadata(1): Samples
assays(2): counts logcounts
rownames(12212): AL627309.1 RP11-206L10.9 ... AL354822.1 SRSF10
rowData names(6): ID Symbol ... discard hvgs
colnames: NULL
colData names(16): Sample Barcode ... label elbowlabel
reducedDimNames(4): PCA TSNE PCA.elbow TSNE.elbow
mainExpName: NULL
altExpNames(0):

> head(colData(sce.symbol),n=2)
DataFrame with 2 rows and 16 columns
                  Sample          Barcode       sum  detected subsets_Mito_sum subsets_Mito_detected
             <character>      <character> <numeric> <integer>        <numeric>             <integer>
1 ~/OneDrive - Univers.. AAACATACGGTACT-1      2400       637               78                    11
2 ~/OneDrive - Univers.. AAACATTGCTCGCT-1      1374       428               47                     9
  subsets_Mito_percent     total   discard sizeFactor       Phase  G1.score G2M.score   S.score    label elbowlabel
             <numeric> <numeric> <logical>  <numeric> <character> <numeric> <numeric> <numeric> <factor>   <factor>
1              3.25000      2400     FALSE   0.612746          G1     0.821     0.337     0.165        7          9
2              3.42067      1374     FALSE   0.392857          G1     0.626     0.526     0.151        9          6

> str(genes)
 chr [1:50] "LYZ" "CST3" "LGALS1" "TYROBP" "S100A4" "IGLL1" "VPREB1" "CD79B" "PTMA" "STMN1" "SPINK2" "RPS24" ...

and the versions I'm using:

> packageVersion('dittoSeq')
[1] ‘1.6.0’
> BiocManager::version()
[1] ‘3.14’
> 

I've gotten the heatmap to work just a few days ago. I don't think I've updated any packages or versions. Not sure what to try next.

Thank you,

Anaconda recipe request

Could you make a recipe through Anaconda to install dittoSeq? Or will this be available once the package is on Bioconductor?

Final update goals prior to upcoming Bioconductor release

  • Make dittoDotPlot vars ordering based on input order rather than alphabetical (Should just require adding factor levels.)
  • Fix centers of boxplots and violinplots to overlap in dittoPlots when group.by != color.by, even when their widths are different (a.k.a. the default setting). (Should be as simple as adding position = position_dodge(width = 1))
  • Update the vignette to match with the new native handling of SummarizedExperiment objects
  • Update object documentation to refer to "Seurat, SingleCellExperiment, or SummarizedExperiment objects"
  • Collapse NEWs pieces to refer to v1.0 -> v1.2

Separate from the package build but still...

  • Update README news section to be more current

dittoBarPlot scale = 'percent' gives the fraction rather than the percentage. Same with dittoDotPlot.

Hello,

This is more of a nitpicking issue, but I wanted to hear your reasoning behind this decision.
While the x axis label on a barplot where scale = 'percent' has been passed says by default "percent of cells", the values are [0,1] instead of [0,100], technically making it the fraction of cells. A visual example:

dittoBarPlot(
seurat.df, 'labels', group.by = 'Condition', main = NULL, sub = NULL, xlab = NULL,
scale = 'percent', x.reorder = c(1,8,3,2,5,4,7,6),
retain.factor.levels = TRUE, x.labels.rotate = FALSE) + theme(axis.title = element_text(size = 12),
axis.text = element_text(size = 12))

Gives this figure:
Percentisfraction

Whereas to get what I want I have to pass something along these lines:

dittoBarPlot(
seurat.df, 'labels', group.by = 'Condition', main = NULL, sub = NULL, xlab = NULL,
scale = 'percent', x.reorder = c(1,8,3,2,5,4,7,6),
retain.factor.levels = TRUE, x.labels.rotate = FALSE) + theme(axis.title = element_text(size = 12),
axis.text = element_text(size = 12)) +
scale_y_continuous(breaks = seq(0, 1, by = 0.25), labels = c('0', '25', '50', '75', '100'))

actualpercent

While this might not be the prettiest solution, it works fairly well for me. However I cannot figure out for the life of me how to do the same in dittoDotPlot:
Here is another example (everything default):
dittoDotPlot(
seurat.df, features, group.by = 'labels')

dottestditto

Whereas Seurat gives the actual percentages:
DotPlot(
seurat.df, features = features, group.by = 'labels')

seuratdot

By messing around, I found out that it is related to scale_size from ggplot, but I was not able to simply change the legend text by passing new labels, without changing the size of the plot's points, making them all tiny...

Do you have any suggestions?
Thanks in advance for the brilliant package!
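One general ggplot2 approach is to change only the labels through a labelling function, leaving the underlying fractions untouched. The sketch below uses toy data; the size `range` shown is an assumption and would need to match whatever range the original plot uses, since adding a new size scale replaces the existing one along with its range.

```r
library(ggplot2)

# Toy fractions in [0, 1] (hypothetical values)
df <- data.frame(group = c("A", "B", "C"), frac = c(0.25, 0.5, 1))

# Label function turns fractions into percentages without touching the data
as_percent <- function(x) as.character(x * 100)

bar <- ggplot(df, aes(x = group, y = frac)) +
  geom_col() +
  scale_y_continuous(labels = as_percent)

# For a size legend, restating `range` keeps the points from shrinking
# (the range used here is an assumption; match it to the original plot)
dots <- ggplot(df, aes(x = group, y = 1, size = frac)) +
  geom_point() +
  scale_size(range = c(1, 8), labels = as_percent)
```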

labeling in dittoDimPlot overrides factor order during faceting

Hi Dan - I've been enjoying your package and putting it through its paces. I came across a bug in an admittedly unusual situation. I was trying to use both split.by and do.label = TRUE, and found that adding the labels somehow caused the original factor order of the split.by category to be ignored. An example using the data set in your vignette:

library(dittoSeq)
library(scRNAseq)
library(SingleCellExperiment)
library(Seurat)
# Download data
sce <- BaronPancreasData()
# Trim to only 5 of the cell types for simplicity of vignette
sce <- sce[,meta("label",sce) %in% c(
  "acinar", "beta", "gamma", "delta", "ductal")]
## -----------------------------------------------------------------------------
# Make Seurat and grab metadata
seurat <- CreateSeuratObject(counts(sce))
seurat <- AddMetaData(seurat, sce$label, col.name = "celltype")
seurat <- AddMetaData(seurat, sce$donor, col.name = "Sample")
seurat <- AddMetaData(seurat,
                      PercentageFeatureSet(seurat, pattern = "^MT"),
                      col.name = "percent.mt")
# Basic Seurat workflow (possibly outdated, but fine for this vignette)
seurat <- NormalizeData(seurat, verbose = FALSE)
seurat <- FindVariableFeatures(object = seurat, verbose = FALSE)
seurat <- ScaleData(object = seurat, verbose = FALSE)
seurat <- RunPCA(object = seurat, verbose = FALSE)
seurat <- RunTSNE(object = seurat)
seurat <- FindNeighbors(object = seurat, verbose = FALSE)
seurat <- FindClusters(object = seurat, verbose = FALSE)

#Make split plot
dittoDimPlot(seurat, "celltype", split.by = "Sample")
# Note GSM*57 is first and GSM*60 is last

#Change celltype and Sample to factors not in alphabetical order
seurat$celltype <- factor(seurat$celltype, levels = c("ductal","gamma","acinar","beta","delta"))
seurat$Sample <- factor(seurat$Sample, levels = c("GSM2230760", "GSM2230757","GSM2230759","GSM2230758"))

#Split plot again
dittoDimPlot(seurat, "celltype", split.by = "Sample")
#GSM*60 is now first plot, as it should be

#Now also add labels:
dittoDimPlot(seurat, "celltype", split.by = "Sample", do.label = TRUE)
#GSM*60 is back to being last, but celltype factor ordering is preserved

Not sure what is going on (ggplot2 is an admitted weak spot for me) but thought I would bring it to your attention. Thanks!

Is it possible to get separate graphs for each split.by of dittoDimPlot?

Hello and thanks for developing this great visualization package!

I had a few questions about faceting with split.by in dittoDimPlot.

Take a look at this example:
test

I have split my data based on a metadata column, but since this column has 8 conditions, one spot remains empty in the 3x3 grid, which I would like to fill with another dimplot.

My questions are the following:

  1. Is there a way to move the empty spot from the right of the grid to the left? I have already reordered the graphs the way I wanted them by changing the factors in the order I wanted them to appear.
  2. I tried merging this plot and the one that I would like to add to it by using ggarrange or similar methods but I could not get them to align. Is it even possible to align a single plot with a plot that is already on a grid?
  3. I was thinking that if I had each plot produced by the split.by argument as a single object, it would be much easier to make my own grid, but I cannot find a way to do so. Is it possible to do that?
  4. If not, I could make a separate plot for each of the 8 conditions, but I would miss the cool grey box with the legend. Is there an argument to put that grey legend box in a standard dittoDimPlot without using the split.by argument? For example I would prefer to have that as my title instead of using main = 'Title', especially if later they would be arranged in a grid with its own title.

Sorry if these questions are trivial, I am still very new at this, and would like to utilize the package to its full extent!
I understand that the questions might be more ggplot2 based rather than dittoSeq, but I would like to hear your opinion.
Thanks in advance.

Error when running dittoHeatmap

Hello,

I encountered an error when running the following script:

dittoHeatmap(cells, genes = NULL, metas = names(ES.cells),
annot.by = "groups",
fontsize = 7,
cluster_cols = TRUE,
heatmap.colors = colors(50))

Here is the error message it generated:

Error in .which_data(assay, slot, object)[genes, cells.use] :
invalid or not-yet-implemented 'Matrix' subsetting

Can you help me figure this out?

Thanks!
Hous

drop_levels is ignored in dittoHeatmap

First of all, thank you for this lifesaver of a package. I could make this plot manually, but it was getting to be a real pain.

Minor issue, but pheatmap seems to ignore the drop_levels parameter (TRUE by default), even if I specify it manually, e.g.

dittoHeatmap(object = miniseq,
             genes = markers,
             annot.by = "type_state",
             scaled.to.max = TRUE,
             complex = FALSE,
             drop_levels = TRUE)

However if I manually refactor first miniseq$group = factor(as.character(miniseq$group)), the unused level disappears. Is drop_levels set to FALSE within the wrapper somewhere? FWIW, other parameters passed to pheatmap work just fine, e.g. labels_row.

Seurat 3.0

This is great - I can't wait to start using this to make better plots. Do you plan to release a version that is compatible with Seurat 3.0.x?

Space in ridge plot

Hi Daniel,

first of all: thanks for this great package! It really saves so much time plotting things!
I'm a huge fan of the multi_dittoPlot function but always notice a gap at the upper part of the y-axis (see attached).
I generated the plot with the following call:

multi_dittoPlot(sce, vars = rownames(sce)[1:2], group.by = "ImageNumber", plots = "ridgeplot", assay = "exprs") 

Here, exprs contains arcsinh-transformed counts (not working with scRNAseq data anymore ;) ) and ImageNumber can be seen as a patient identifier.
Do you have a suggestion how to get rid of this gap?

image

plotting multiple smoothers in the scatter plot

Dear dittoSeq team,

Thanks for this very useful package.
I have a question regarding plotting smoothers in the scatter plot.

I used

dittoScatterPlot(epi.integrated,x.var = "pseudotime",'AT1G61760') and got a plot

image

This object (epi.integrated) is an integrated Seurat object of five samples. I would like smoothers like the one shown below, but with a separate smoother for each of the five samples: something like plotSmoothers() from tradeSeq, grouped by "orig.ident" instead of by lineage. Is it possible to obtain this directly from dittoScatterPlot for a model like "loess"?

image

Thanks,
Rahul
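In plain ggplot2, per-group smoothers come from mapping a grouping variable to an aesthetic and adding geom_smooth(). The sketch below uses a made-up data frame; since dittoSeq plotters are described as producing ggplot2 output, a similar layer could plausibly be added to the returned plot, but the column names inside that plot's data are not known here.

```r
library(ggplot2)

set.seed(1)
# Toy data: pseudotime, expression, and a sample label (all hypothetical)
df <- data.frame(
  pseudotime = runif(300, 0, 10),
  sample = rep(paste0("sample", 1:5), each = 60)
)
df$expression <- sin(df$pseudotime / 2) + rnorm(300, sd = 0.3)

# One loess curve per sample, drawn over the points
p <- ggplot(df, aes(x = pseudotime, y = expression, color = sample)) +
  geom_point(size = 0.5, alpha = 0.4) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
```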

Pass `width` and `height` arguments to ggplotly when `do.hover = TRUE`

Controlling the size of interactive figures can currently only be done by passing the plot to layout() and setting the width and height there. This is being deprecated by plotly, so these arguments should be passed directly to the ggplotly call when do.hover = TRUE. Otherwise, they can just be ignored.
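For reference, the direct form looks like this in plotly. This is a sketch on a built-in dataset, and the pixel sizes are arbitrary.

```r
library(ggplot2)
library(plotly)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Size is set at conversion time rather than through layout()
widget <- ggplotly(p, width = 600, height = 400)
```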

color separation in multi_dittoDimPlotVaryCells

Hi, Daniel:

I tried to plot logged clone size in my Seurat object by patient, but the colors always seem to be the default blue-yellow ditto.colors, even when I set color.panel = c("blue", "orange", "purple", "yellow", "red").
Does dittoSeq treat logged clone size as discrete or continuous? Do I need to set the color levels?
Can I do something like
scale_color_gradientn(colors=c("lightgrey","#2c7bb6","yellow","#d7191c"), values=rescale(c(-0.1, 0, lowerbound, upperbound)), guide="colorbar", limits=c(-0.1, ub), name='CloneSize')?

thanks for your time
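A runnable version of the scale proposed above, sketched on toy data with assumed cut points, may help when testing the idea. Whether it can be added to a given dittoSeq output depends on that function returning a single ggplot object, which is not confirmed here for multi_dittoDimPlotVaryCells.

```r
library(ggplot2)
library(scales)

set.seed(1)
# Hypothetical log clone sizes on toy coordinates
df <- data.frame(x = rnorm(200), y = rnorm(200), clone = runif(200, 0, 5))

lowerbound <- 1  # assumed cut points, chosen only for illustration
upperbound <- 5

p <- ggplot(df, aes(x = x, y = y, color = clone)) +
  geom_point() +
  scale_color_gradientn(
    colors = c("lightgrey", "#2c7bb6", "yellow", "#d7191c"),
    # `values` must lie in [0, 1], hence rescale() over the full limits
    values = rescale(c(-0.1, 0, lowerbound, upperbound)),
    limits = c(-0.1, upperbound),
    guide = "colorbar",
    name = "CloneSize"
  )
```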

Make updates for bioconductor review

Review checklist:

DESCRIPTION

  • Please format the description field to fit within the 80 character width.

Vignettes

  • Please update the formatting to fit the 80 character width.
  • Update the Bulk RNAseq section to remove all @s. Use extractor functions or include those data as separate objects.

Additional files / folders

  • Please remove / move the extra folders in the package.
  • Consider merging the README.md files into one.
  • Image assets can be placed in the vignettes or other folder or in a separate GitHub repository for linking (as it seems like they are only used for README files). ((placed in vignettes/))

R Code

  • Recode how importDemux2Seurat messages are generated using a wrapper for message() like .msg(verbose, text).
  • Check for package availability before using Seurat functions with requireNamespace and creating conditional code that will error when the package isn't available. See our guidelines for more information. ((Did this for both Seurat and plotly))
  • Remove defaulting T_T
    • Avoid the use of eval(parse(...)) and typeof(x) == 'character'
    • Avoid using character input and instead use only the object.
    • Test class membership with is(x, "classname") avoid using == in getReductions.
  • Change RNAseq implementation to utilize SingleCellExperiment which extends SummarizedExperiment with convenient dimensionality reduction storage.
    • Replace all current RNAseq methods with methods that work for this new class.
    • Please use endomorphic operations to resolve issues with addPrcomp and related functions. A more straightforward approach would be to use object as the first argument and require it to not be missing. Your return value would be your data class with additional data.
    • For metadata, you can make use of the metadata<- generic and create a method for your class. It appears that you are using S3 methods on an S4 class with addMeta<-.RNAseq. Please see point above. ((This exists for SingleCellExperiment objects already.))
    • Create accessor and setter functions for your class. There should be zero to minimal use of the @ function. ((These exist for SingleCellExperiment objects already.))
    • Combine the import* functions into one function & add conditional checks for suggested packages as described above. (for condition checks, moved edgeR and SummarizedExperiment to Imports:, DESeqDataSets are SummarizedExperiments, so I left DESeq2 itself in Suggests: to keep the package slimmer.)
    • RNAseq_mock data can be the output of importDESeq2 or prcomp. ((output of example("addDimReduction") is not a fully usable mock_bulk object called myRNA))
      • You can then use this data to provide the addDimReduction example in the documentation.
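Several of the R Code items above follow common patterns that can be sketched in base R. The helper names `.msg` and `.require_pkg` below are hypothetical; only the `.msg(verbose, text)` shape comes from the checklist itself.

```r
# Hypothetical helper, following the `.msg(verbose, text)` shape suggested above
.msg <- function(verbose, ...) {
  if (verbose) message(...)
  invisible(NULL)
}

# Fail early with a clear error when a suggested package is missing
.require_pkg <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is required for this function.", call. = FALSE)
  }
  invisible(TRUE)
}

# is() respects inheritance, unlike comparing class(x) with ==
x <- data.frame(a = 1)
methods::is(x, "data.frame")  # TRUE

.msg(FALSE, "not shown")
.require_pkg("stats")
```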

Creating dittoDimPlots with larger color vectors?

I recently encountered an instance where I was trying to make a dittoDimPlot which required 78 separate colors. I got the error Error: Insufficient values in manual scale. 78 needed but only 24 provided. I was able to manually concatenate some other color hexes with dittoColors(), but I'm wondering if larger color vectors might be possible in the future?

In case anyone runs into this, here's the code I used.

library(RColorBrewer)
qual_cols = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_cols$maxcolors, rownames(qual_cols)))
col_vector = c(dittoColors(), col_vector)

You can then pass col_vector to dittoDimPlot. Disclaimer, I don't think these colors are color-blind friendly anymore ):

Feature request: capacity to drop elements from the legend directly

The "proper" way to achieve this is to muck with the factor levels of the variable in question directly, which is at best annoying and time-consuming.

The ability to drop legend elements directly with functionality similar to the current rename.groups implementation in DBBarPlot would be extremely convenient, particularly when subsetting.

Repel labels do not work in dittoDimPlot function

I tried using the dittoDimPlot function with the parameter labels.repel = TRUE. However, the labels are not repelled in the plot.

Screenshot 2564-07-25 at 19 39 20

R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin20.4.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /usr/local/Cellar/r/4.0.5_2/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] devtools_2.4.2 usethis_2.0.1 dittoSeq_1.5.1 reshape_0.8.8
[5] RColorBrewer_1.1-2 dplyr_1.0.7 CytoTree_1.0.3 FlowSOM_2.1.26
[9] igraph_1.2.6 cytofCore_0.4 PCAtools_2.2.0 ggrepel_0.9.1
[13] scDataviz_1.0.0 readxl_1.3.1 cowplot_1.1.1 ggplot2_3.3.5
[17] flowCore_2.2.0 CATALYST_1.14.1 SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
[21] Biobase_2.50.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.7 IRanges_2.24.1
[25] S4Vectors_0.28.1 BiocGenerics_0.36.1 MatrixGenerics_1.2.1 matrixStats_0.59.0

loaded via a namespace (and not attached):
[1] rsvd_1.0.5 vcd_1.4-8 ica_1.0-2 corpcor_1.6.9
[5] ps_1.6.0 class_7.3-19 rprojroot_2.0.2 lmtest_0.9-38
[9] crayon_1.4.1 laeken_0.5.1 spatstat.core_2.3-0 MASS_7.3-54
[13] nlme_3.1-152 backports_1.2.1 sva_3.38.0 rlang_0.4.11
[17] XVector_0.30.0 ROCR_1.0-11 irlba_2.3.3 callr_3.7.0
[21] limma_3.46.0 scater_1.18.6 smoother_1.1 BiocParallel_1.24.1
[25] rjson_0.2.20 bit64_4.0.5 glue_1.4.2 pheatmap_1.0.12
[29] sctransform_0.3.2 processx_3.5.2 vipor_0.4.5 spatstat.sparse_2.0-0
[33] AnnotationDbi_1.52.0 isoband_0.2.5 spatstat.geom_2.2-2 haven_2.4.1
[37] tidyselect_1.1.1 SeuratObject_4.0.2 rio_0.5.27 fitdistrplus_1.1-5
[41] XML_3.99-0.6 tidyr_1.1.3 zoo_1.8-9 ggpubr_0.4.0
[45] nnls_1.4 xtable_1.8-4 RcppHNSW_0.3.0 magrittr_2.0.1
[49] evaluate_0.14 cli_3.0.1 scuttle_1.0.4 zlibbioc_1.36.0
[53] rstudioapi_0.13 miniUI_0.1.1.1 sp_1.4-5 rpart_4.1-15
[57] RcppEigen_0.3.3.9.1 tinytex_0.32 shiny_1.6.0 prettydoc_0.4.1
[61] BiocSingular_1.6.0 xfun_0.24 askpass_1.1 clue_0.3-59
[65] pkgbuild_1.2.0 cluster_2.1.2 pcaMethods_1.82.0 tibble_3.1.3
[69] listenv_0.8.0 png_0.1-7 future_1.21.0 withr_2.4.2
[73] bitops_1.0-7 aws.signature_0.6.0 ggforce_0.3.3 RBGL_1.66.0
[77] ranger_0.13.1 plyr_1.8.6 cellranger_1.1.0 ncdfFlow_2.36.0
[81] dqrng_0.3.0 e1071_1.7-7 pillar_1.6.1 RcppParallel_5.1.4
[85] GlobalOptions_0.1.2 cachem_1.0.5 multcomp_1.4-17 fs_1.5.0
[89] scatterplot3d_0.3-41 CytoML_2.2.2 TTR_0.24.2 GetoptLong_1.0.5
[93] gmodels_2.18.1 RUnit_0.4.32 DelayedMatrixStats_1.12.3 xts_0.12.1
[97] vctrs_0.3.8 ellipsis_0.3.2 generics_0.1.0 tools_4.0.5
[101] foreign_0.8-81 beeswarm_0.4.0 munsell_0.5.0 tweenr_1.0.2
[105] aws.s3_0.3.21 proxy_0.4-26 DelayedArray_0.16.3 pkgload_1.2.1
[109] fastmap_1.1.0 compiler_4.0.5 abind_1.4-5 httpuv_1.6.1
[113] sessioninfo_1.1.1 plotly_4.9.4.1 GenomeInfoDbData_1.2.4 gridExtra_2.3
[117] edgeR_3.32.1 lattice_0.20-44 ggnewscale_0.4.5 ggpointdensity_0.1.0
[121] deldir_0.2-10 utf8_1.2.2 later_1.2.0 jsonlite_1.7.2
[125] ggplot.multistats_1.0.0 scales_1.1.1 graph_1.68.0 pbapply_1.4-3
[129] carData_3.0-4 sparseMatrixStats_1.2.1 genefilter_1.72.1 lazyeval_0.2.2
[133] promises_1.2.0.1 car_3.0-11 latticeExtra_0.6-29 goftest_1.2-2
[137] spatstat.utils_2.2-0 reticulate_1.20 flowUtils_1.54.0 rmarkdown_2.9
[141] openxlsx_4.2.4 sandwich_3.0-1 Rtsne_0.15 forcats_0.5.1
[145] uwot_0.1.10 survival_3.2-11 yaml_2.2.1 plotrix_3.8-1
[149] cytolib_2.2.1 flowWorkspace_4.2.0 htmltools_0.5.1.1 memoise_2.0.0
[153] Seurat_4.0.3 locfit_1.5-9.4 destiny_3.4.0 viridisLite_0.4.0
[157] digest_0.6.27 assertthat_0.2.1 mime_0.11 RSQLite_2.2.7
[161] future.apply_1.7.0 remotes_2.4.0 data.table_1.14.0 blob_1.2.1
[165] drc_3.0-1 labeling_0.4.2 splines_4.0.5 Cairo_1.5-12.2
[169] RCurl_1.98-1.3 broom_0.7.8 hms_1.1.0 colorspace_2.0-2
[173] ConsensusClusterPlus_1.54.0 base64enc_0.1-3 BiocManager_1.30.16 ggbeeswarm_0.6.0
[177] shape_1.4.6 nnet_7.3-16 Rcpp_1.0.7 mclust_5.4.7
[181] RANN_2.6.1 mvtnorm_1.1-2 circlize_0.4.13 RProtoBufLib_2.2.0
[185] fansi_0.5.0 VIM_6.1.0 parallelly_1.26.1 R6_2.5.0
[189] grid_4.0.5 ggridges_0.5.3 lifecycle_1.0.0 zip_2.2.0
[193] curl_4.3.2 ggsignif_0.6.2 gdata_2.18.0 testthat_3.0.4
[197] leiden_0.3.8 robustbase_0.93-8 Matrix_1.3-4 desc_1.3.0
[201] RcppAnnoy_0.0.18 TH.data_1.0-10 stringr_1.4.0 htmlwidgets_1.5.3
[205] umap_0.2.7.0 beachmat_2.6.4 polyclip_1.10-0 purrr_0.3.4
[209] crosstalk_1.1.1 ComplexHeatmap_2.6.2 mgcv_1.8-36 globals_0.14.0
[213] openssl_1.4.4 patchwork_1.1.1 codetools_0.2-18 prettyunits_1.1.1
[217] gtools_3.9.2 RSpectra_0.16-0 gtable_0.3.0 DBI_1.1.1
[221] tensor_1.5 httr_1.4.2 KernSmooth_2.23-20 stringi_1.7.3
[225] reshape2_1.4.4 farver_2.1.0 annotate_1.68.0 viridis_0.6.1
[229] ggthemes_4.2.4 hexbin_1.28.2 Rgraphviz_2.34.0 xml2_1.3.2
[233] colorRamps_2.3 rvcheck_0.1.8 ggcyto_1.18.0 boot_1.3-28
[237] BiocNeighbors_1.8.2 scattermore_0.7 DEoptimR_1.0-9 bit_4.0.4
[241] spatstat.data_2.1-0 scatterpie_0.1.6 jpeg_0.1-8.1 pkgconfig_2.0.3
[245] corrplot_0.90 rstatix_0.7.0 knitr_1.33

Support for ridge plots

As we previously discussed, support for ridge plots (see Seurat's RidgePlot function) in DBPlot would be nice.

Violin, jitter, and boxplot don't align when using `color.by` in dittoPlot.

Reprex for dittoSeq v1.6.0:

library(recount3)
library(DESeq2)
library(SummarizedExperiment)

se <- recount3::create_rse_manual(
  project = "GBM",
  project_home = "data_sources/tcga",
  organism = "human",
  annotation = "gencode_v26",
  type = "gene"
)

assay(se, "counts") <- transform_counts(se)
assay(se, "raw_counts") <- NULL

rd <- rowData(se)
rownames(se) <- rd$gene_name

dds <- DESeqDataSet(se, design = ~tcga.gdc_cases.samples.sample_type)

assay(dds, "lognorm") <- assay(normTransform(dds))
assay(dds, "vst") <- assay(vst(dds))

dittoPlot(dds, "GFAP", group.by = "tcga.gdc_cases.samples.sample_type", 
  color.by = "tcga.gdc_cases.demographic.gender", assay = "vst", plots = c("vlnplot", "boxplot", "jitter"), 
  jitter.size = 1.5)

Results in:
image

I can't remember if we ran into this one previously or not. I'm not sure why the recurrent tumor group looks fine but the primaries don't.
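For comparison, here is how aligned layers are usually obtained in plain ggplot2, where every layer is given the same dodge width. This is a sketch on toy data and not the dittoPlot internals; the group and sex labels are made up.

```r
library(ggplot2)

set.seed(1)
# Toy data with a sub-grouping inside each x-axis group (hypothetical)
df <- data.frame(
  group = rep(c("primary", "recurrent"), each = 40),
  sex = rep(c("female", "male"), times = 40),
  value = rnorm(80)
)

# Using the same dodge width for every layer keeps violins, boxes,
# and jittered points on shared centers
dodge <- position_dodge(width = 1)
p <- ggplot(df, aes(x = group, y = value, fill = sex)) +
  geom_violin(position = dodge) +
  geom_boxplot(width = 0.2, position = dodge) +
  geom_point(position = position_jitterdodge(jitter.width = 0.1, dodge.width = 1),
             size = 0.8)
```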

DimPlot can't do both split.by and do.label

I was running through your vignette and tried to both split.by = "Sample" and then add do.label = TRUE but got this error:

dittoDimPlot(seurat, "celltype", split.by = "Sample")
#this works fine

dittoDimPlot(seurat, "celltype", split.by = "Sample", do.label = TRUE)
Error: Aesthetics must be either length 1 or the same as the data (20): label
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Aesthetics must be either length 1 or the same as the data (20): label
Backtrace:
  1. (function (x, ...) ...
  2. ggplot2:::print.ggplot(x)
  4. ggplot2:::ggplot_build.ggplot(x)
  5. ggplot2:::by_layer(function(l, d) l$compute_geom_2(d))
  6. ggplot2:::f(l = layers[[i]], d = data[[i]])
  7. l$compute_geom_2(d)
  8. ggplot2:::f(..., self = self)
  9. self$geom$use_defaults(data, self$aes_params, modifiers)
 10. ggplot2:::f(..., self = self)
 11. ggplot2:::check_aesthetics(params[aes_params], nrow(data))
Run `rlang::last_trace()` to see the full context.

The error comes when trying to plot the ggplot object created by dittoDimPlot(), because dittoDimPlot() itself doesn't trigger the error:

> p <- dittoDimPlot(seurat, "celltype", split.by = "Sample", do.label = TRUE)
> p
Error: Aesthetics must be either length 1 or the same as the data (20): label
Run `rlang::last_error()` to see where the error occurred.

It must be possible, because Seurat's DimPlot can both split and label:

DimPlot(seurat, label = TRUE, split.by = "Sample", ncol = 2)

I've just started trying out your package and really like it! I personally do not need it for the color-blindness aspect, but it does a much better job at selecting contrasting colors for close clusters.

dittoHeatmap scale colour display limits

Hi,

I first want to say what a great utility box! It really saved a lot of time making figures, thank you so much, I'll be sure to acknowledge wherever I can :)

I wanted to enquire a bit more about the color gradient scale breaks for dittoHeatmap. It would be nice if it had something similar to the display limit function of Seurat, DoHeatmap(seurat_obj, disp.min = -2.5, disp.max = 2.5), to adjust for colours washing out when you plot lowly divergent genes together with highly divergent genes, instead of having to go through the pheatmap or ComplexHeatmap functions directly. Maybe that exists already and I missed it? My bad if that's the case!
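The effect of display limits can be reproduced by clamping the matrix before it is drawn. The base R sketch below uses hypothetical values. An alternative, if extra arguments are forwarded to pheatmap, would be that function's `breaks` argument, though that forwarding is not verified here for this case.

```r
# Toy scaled expression matrix (hypothetical values)
set.seed(1)
mat <- matrix(rnorm(20, sd = 3), nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5)))

# Clamp values to the display limits; dimensions and names are preserved
disp_min <- -2.5
disp_max <- 2.5
clamped <- pmin(pmax(mat, disp_min), disp_max)

range(clamped)  # within [-2.5, 2.5]
```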

Allow gene-based grouping for dittoPlot

It is sometimes useful to group samples by gene along the x-axis, like so:

image

For small numbers of genes, this can provide a more compact figure than faceting as multi_dittoPlot does.

Repel labels and ensure they are fully in the plot space

The labels for dittoDimPlot are really nice, but they can sometimes crowd each other or get cut off:
image

This can be somewhat alleviated by adjusting label.size, but overlapping still occurs. Labels can't be moved in post (Illustrator), as the cells they cover seemingly aren't plotted. Adding a "repel" option and a check to ensure all labels are fully within the plotting area would be nice quality of life changes.

`dittoPlot()`s no longer render as intended when `do.hover = TRUE`

There have always been some differences after plotly conversion, but it's definitely gotten worse than before. (Before: seemed to just be that boxplots always showed outliers in the past, and jitter points have fills. Not too bad!)

library(dittoSeq)
example("importDittoBulk", echo = FALSE)

# do.hover = FALSE
dittoPlot(myRNA, "gene1", "SNP", "timepoint", shape.by = "timepoint",
          plots = c("vlnplot", "boxplot", "jitter"), jitter.size = 3)

image

# do.hover = TRUE
dittoPlot(myRNA, "gene1", "SNP", "timepoint", shape.by = "timepoint",
          plots = c("vlnplot", "boxplot", "jitter"), jitter.size = 3,
          do.hover = TRUE)

newplot

Multiple bits here...

  • Diagonal jitter banding: seems to be related to aes(text) not being successfully ignored by geom_jitter(). Potentially can fix by using plotly machinery to adjust hover info after ggplotly conversion.
  • violin and boxplot width/position issues only come into play when color.by is used to create sub-groupings:
    • boxplots don't dodge at all. need to investigate cause/fix
    • violin plot widths are not reduced accordingly. need to investigate cause/fix

Getting a permanent error

Hello

I have a Seurat object made directly from log2 normalised data (I don't have access to the raw counts).

I get this error:

> dittoDimPlot(pbmc2)
Error in .leave_default_or_null(main, var, length(var) != 1) : 
  argument "var" is missing, with no default
> pbmc2
An object of class Seurat 
11548 features across 6651 samples within 1 assay 
Active assay: RNA (11548 features, 2000 variable features)
 3 dimensional reductions calculated: tsne, pca, umap
> 

What is the solution, please?
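The error message says that `var` was not supplied, so the call probably just needs to be told what to colour the cells by. A minimal sketch, assuming the example object `myRNA` built by `example("importDittoBulk")` (used elsewhere on this page) carries a "timepoint" metadata column and a stored dimensionality reduction:

```r
library(dittoSeq)
example("importDittoBulk", echo = FALSE)

# Name the gene or metadata column to colour by; "timepoint" is metadata of myRNA
p <- dittoDimPlot(myRNA, var = "timepoint")
p
```

For the object in the report, the equivalent would be something like dittoDimPlot(pbmc2, var = "orig.ident"), with any gene or metadata name of that object in place of "orig.ident".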

How to make heatmap legend smaller or multiple lines to include all?

Hi dittoSeq,

Thanks for such a great object-based plotting package.
I am using your dittoHeatmap() with 33 samples, but the legend is too big to show them all (only 22 are kept). Also, I annotated two factors (donor and dis), but only one legend (donor) is retained. I tried the ggplot theme() and it doesn't work. Is there a way to make the legend smaller so that it includes all the sample information?
image

One more question: is there a function to get the z-scores used to make the heatmap?

Thanks,
Yale
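On the z-score question: row-wise z-scores can be computed in base R from any expression matrix, independent of the plotting package. The sketch below uses a made-up toy matrix. It is an assumption, not something confirmed in this thread, that the matrix returned by dittoHeatmap(..., data.out = TRUE) is the unscaled input that such a calculation would start from.

```r
# Toy expression matrix: 3 genes (rows) x 4 samples (columns), made-up values
expr <- matrix(c( 1,  2,  3,  4,
                 10, 10, 20, 20,
                  5,  7,  9, 11),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))

# scale() standardises columns, so transpose, scale, and transpose back
zscores <- t(scale(t(expr)))
zscores
```

Each row of `zscores` then has mean 0 and standard deviation 1, which is the same kind of row scaling that pheatmap applies with scale = "row".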

boxplot p-value significance indicator?

Hello. May I ask if there's a functionality for p-value significance indicator between boxplots?

I made a split boxplot of gene signature split by Healthy, Disease, and Recovery. I wonder if this is something possible. Thank you!
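Whatever is used to draw the indicator, the p-values themselves can be computed in base R. The sketch below runs pairwise Wilcoxon rank-sum tests on made-up scores for the three groups named above. The data, the choice of test, and the adjustment method are all illustrative assumptions.

```r
set.seed(1)

# Toy signature scores for three groups (made-up data, for illustration only)
scores <- data.frame(
  group = factor(rep(c("Healthy", "Disease", "Recovery"), each = 20),
                 levels = c("Healthy", "Disease", "Recovery")),
  score = c(rnorm(20, mean = 0), rnorm(20, mean = 1.5), rnorm(20, mean = 0.5))
)

# Pairwise Wilcoxon rank-sum tests with Benjamini-Hochberg adjustment
pw <- pairwise.wilcox.test(scores$score, scores$group, p.adjust.method = "BH")
pw$p.value
```

Since dittoSeq plotters return ggplot objects, the resulting values could then be drawn onto the plot with extra ggplot2 layers or an add-on package. Whether such layers combine cleanly with a split dittoPlot has not been checked here.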

Add warning for providing a single variable to multi-plotters

As it stands, providing a single variable to the multi-plotters results in a somewhat cryptic error and occasionally causes R to hang or crash. A more explicit warning would be helpful. Falling back to the appropriate single-variable function would be an elegant, though more obfuscated, way to get around this as well.
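The fallback idea could look roughly like the sketch below. `plot_one_or_many()` is a hypothetical helper, not part of dittoSeq; it takes the single-variable and multi-variable plotters as function arguments, so it makes no assumption about their internals.

```r
# Hypothetical helper, not part of dittoSeq: pick a plotter based on how many
# variables were supplied, warning when falling back to the single-variable one.
plot_one_or_many <- function(object, vars, single_plotter, multi_plotter, ...) {
  if (length(vars) == 0) {
    stop("No variables provided.")
  }
  if (length(vars) == 1) {
    warning("Only one variable provided; using the single-variable plotter.")
    return(single_plotter(object, vars, ...))
  }
  multi_plotter(object, vars, ...)
}

# Assumed usage (not run here):
# plot_one_or_many(myRNA, "gene1", dittoPlot, multi_dittoPlot, group.by = "timepoint")
```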

Drop unused factor levels in dittoHeatmap

Currently, using cells.use to limit to cells of certain factors of a metadata variable in dittoHeatmap doesn't drop unused factor levels from the resulting legend. Passing drop_levels = TRUE to pheatmap also does not resolve this.
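The underlying behaviour is standard for R factors: subsetting keeps every original level unless the levels are dropped explicitly. A base R illustration with made-up cell type labels:

```r
celltype <- factor(c("B", "T", "NK", "T", "B"))
keep <- celltype %in% c("B", "T")

subset_kept <- celltype[keep]
levels(subset_kept)              # still lists "NK"
levels(droplevels(subset_kept))  # "NK" is gone
```

A possible workaround until this is handled internally is to apply droplevels() to the relevant metadata of a subsetted object before plotting, though that has not been tested against dittoHeatmap here.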

Order column names in dittoHeatmap

Thanks for this useful package.
I have a question about heatmap column ordering.
I am using a Seurat object where the cell type names as well as the orig.ident (sample) names are not in order.
I wanted to order them as
ct<-c("Epidermal cells", "Mesophyll cells", "Xylem cells" , "Phloem cells" ,"Companion cells", "Tracheary element")
orig.ident <- c('S4','S2')

I used the following code (to order the cell types only, as I don't know how to order both the cell types and samples simultaneously)

dittoHeatmap(s.integrated,genes = c('AT4G38130', 'AT5G63110'),
annot.by = c( "CellType","orig.ident"),col_order=ct)

which runs, but nothing changes.

I would be thankful if anyone could suggest how to order both the cell types and samples.

Regards,
Rahul
Rplot04
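A two-level ordering like this can be expressed in base R by turning both columns into factors with the desired level order and sorting on both. The sketch below uses a small made-up metadata table in place of the Seurat object's metadata. It is an assumption that, on the real object, setting these factor levels and then using order.by = c("CellType", "orig.ident") would give the same column order; I am not certain that dittoHeatmap() accepts a col_order input at all.

```r
ct <- c("Epidermal cells", "Mesophyll cells", "Xylem cells",
        "Phloem cells", "Companion cells", "Tracheary element")
samples <- c("S4", "S2")

# Toy metadata standing in for the Seurat object's metadata
meta <- data.frame(
  CellType   = c("Xylem cells", "Epidermal cells", "Epidermal cells", "Phloem cells"),
  orig.ident = c("S2", "S2", "S4", "S4"),
  row.names  = paste0("cell", 1:4)
)

# Factor levels define the sort order
meta$CellType   <- factor(meta$CellType, levels = ct)
meta$orig.ident <- factor(meta$orig.ident, levels = samples)

# Order by cell type first, then by sample within each cell type
cell_order <- rownames(meta)[order(meta$CellType, meta$orig.ident)]
cell_order
```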

Multi-gene visualization in a single DimPlot with blended color

Greetings,
really nice work, thank you very much for sharing it with the community.

I would like to ask the developers about a particular functionality of DimPlots, that could be very useful to add to your features.

In particular, the combined visualization of two genes in a single DimPlot, with blended color in cells with overlapping expression from both genes, would be very useful. An example of such a visualization is illustrated in Seurat's visualization vignette: https://satijalab.org/seurat/articles/visualization_vignette.html , and in particular in the FeaturePlot(pbmc3k.final, features = c("MS4A1", "CD79A"), blend = TRUE) code example.

Cheers

Reorder levels from DimPlot to dittoBarPlot

I have a very naive question regarding the dittoBarPlot representation. I want to use the same order and colours in the UMAP and in dittoBarPlot (figure below), but I can't. Do you know how I can apply the order and colours of the UMAP figure to the cell type proportion figure?
figure

Someone recommended that I change the order in DimPlot to match the one in dittoBarPlot, but I would like to use the specific order from DimPlot. Hence, I tried to change the level order of the cell types in dittoSeq, but it is not clear to me how to do it. Can you help me, please?

Many thanks
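One general way to keep colours consistent across plots is to tie each colour to a cell type by name rather than by position. The sketch below uses made-up cell types and colours. How the resulting levels and colours are handed to DimPlot and dittoBarPlot (for example via a colour input) is left open, since those inputs are not shown in this thread.

```r
# Hypothetical cell types in the order used on the UMAP, with hypothetical colours
umap_levels <- c("T cells", "B cells", "Monocytes")
umap_colors <- c("#E69F00", "#56B4E9", "#009E73")

# Store the cell type labels as a factor that uses the UMAP order
celltype <- factor(c("B cells", "Monocytes", "T cells", "B cells"),
                   levels = umap_levels)

# Name the colours by level so a colour follows its cell type, whatever the order
names(umap_colors) <- umap_levels
umap_colors[levels(celltype)]
```

In ggplot2, a named vector passed as `values` to scale_fill_manual() or scale_color_manual() is matched to the data by name, which is what makes this approach robust to reordering.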

Renaming gene IDs into gene names in dittoHeatmap

Again thanks for this tool!

I am trying to replace the gene IDs with gene names. This is possible with labels_row = gene_names, but the gene IDs in the plot and the replacement gene names do not seem to be in the same order (wrong gene names are assigned).
Is there any way (like an easy "dittoSeq" way) to do this?

Regards,
Rahul

my code
filter_c_S2S4 <- c('AT1G52560', 'AT5G02500', 'AT5G02490', 'AT3G09440', 'AT3G12580')
cls="Epidermal cells"
pg<-dittoHeatmap(s.integrated,genes = intersect(filter_c_S2S4,rownames(s.integrated)),annot.by = c( "orig.ident","CellType"),
order.by = c( "orig.ident","CellType"),assay = 'SCT',
cells.use = colnames(subset(s.integrated,ident=cls)))

gene_labels <- pg$tree_row$labels
gene_order <- pg$tree_row$order

gene_labels_ord <- gene_labels[gene_order]

dicn <- read.csv('gene_ID_to_name.csv',header = T)
dicn_names<-dicn$Name
names(dicn_names)<-dicn$ID

gl <- c()
for (g in gene_labels_ord){
gl<-append(gl,paste(dicn_names[g]))
}

dittoHeatmap(s.integrated,genes = intersect(filter_c_S2S4,rownames(s.integrated)),annot.by = c( "orig.ident","CellType"),
order.by = c( "orig.ident","CellType"),assay = 'SCT',
cells.use = colnames(subset(s.integrated,ident=cls)),labels_row= gl)

I also tried
pg+scale_y_discrete(labels = dicn_names)

which also doesn't work
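A likely cause of the mismatch: as far as I understand pheatmap, labels_row is expected in the order of the input rows, and the heatmap reorders the labels itself when it clusters. Reordering the labels by tree_row$order beforehand would then shuffle them a second time. Looking names up by ID avoids depending on any ordering. The sketch below uses a made-up ID-to-name table with placeholder names in place of gene_ID_to_name.csv, and assumes the heatmap rows follow the order of the `genes` input.

```r
# Gene IDs in the order they are passed to the heatmap
ids <- c("AT1G52560", "AT5G02500", "AT5G02490")

# Hypothetical ID-to-name table standing in for gene_ID_to_name.csv
dicn <- data.frame(ID   = c("AT5G02490", "AT1G52560", "AT5G02500"),
                   Name = c("nameC", "nameA", "nameB"),
                   stringsAsFactors = FALSE)

# Named vector lookup: each ID pulls its own name, regardless of table order
dicn_names <- setNames(dicn$Name, dicn$ID)
gene_names <- unname(dicn_names[ids])

# Fall back to the ID where the table has no name
gene_names[is.na(gene_names)] <- ids[is.na(gene_names)]
gene_names
```

`gene_names` could then be supplied as labels_row in the same call that receives `ids` as genes, without reordering by the clustering tree.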
