satijalab / azimuth

A Shiny web app for mapping datasets using Seurat v4

Home Page: https://satijalab.org/azimuth

License: GNU General Public License v3.0

Languages: Dockerfile 0.05%, R 7.30%, CSS 0.02%, C++ 0.09%, HTML 92.54%
Topics: single-cell-rna-seq, single-cell-genomics, shiny-app

azimuth's Introduction

Azimuth v0.5.0


Azimuth is a Shiny app demonstrating a query-reference mapping algorithm for single-cell data. The reference data accompanying the app and the algorithms used are described in the publication “Integrated analysis of multimodal single-cell data” (Y. Hao, S. Hao, et al., Cell 2021).

We have made instances of the app available for public use, described here.

All the analysis and visualization functionality available in the app - and much more - is available in version 5 of the Seurat R package.

Installation

Note: you may need to update some packages prior to installing Azimuth; from a fresh R session run:

update.packages(oldPkgs = c("withr", "rlang"))

You can install Azimuth from GitHub with:

if (!requireNamespace('remotes', quietly = TRUE)) {
  install.packages('remotes')
}
remotes::install_github('satijalab/azimuth', ref = 'master')
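
If you want to confirm that the installation worked before launching the app, a quick illustrative check from R is to load the package and print its version:

library(Azimuth)
packageVersion("Azimuth")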

Running the app

The app is launched as:

Azimuth::AzimuthApp()

By default, the appropriate reference files are loaded into memory by accessing a web URL. If you instead have a directory containing reference files at /path/to/reference (directory must contain files named ref.Rds and idx.annoy), specify it as:

Azimuth::AzimuthApp(reference = '/path/to/reference')

Downloading the app reference files

You can download the reference files that would be automatically loaded by default from Zenodo. Links are available on the Azimuth website here.

Specifying options

You can set options by passing a parameter to the AzimuthApp function. If you would like to run the Azimuth ATAC workflow with a sc/snATAC-seq query, specify "Azimuth.app.do_bridge": "TRUE". Options in the Azimuth.app namespace (e.g. max_cells, as shown in the example below) can omit the “Azimuth.app.” prefix. Options in other namespaces (e.g. Azimuth.de.digits, as shown in the example below), including non-Azimuth namespaces, must be specified using their full name.

Azimuth::AzimuthApp(max_cells = 100000)


Azimuth::AzimuthApp('Azimuth.de.digits' = 5)

We also support reading options from a JSON-formatted config file. Provide the path to the config file as the parameter config to AzimuthApp. Example config file. As described above regarding setting options through parameters, the “Azimuth.app.” prefix may be omitted.

Azimuth::AzimuthApp(config = 'config.json')
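
For illustration only, a minimal config.json in this format might look like the following; the two options shown are taken from the examples above, and this is an assumed sketch rather than the repository's example file:

{
  "max_cells": 100000,
  "Azimuth.de.digits": 5
}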

You can also set Azimuth or other options in R. (The full name must always be specified, even for options in the Azimuth.app namespace.)

options('Azimuth.de.digits' = 5)
Azimuth::AzimuthApp()

Options can be set in any of these three ways simultaneously. Please note that an option set in R will be overwritten by the same option specified in a config file, both of which are overwritten by the same option provided as a parameter to the AzimuthApp function.
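
As a hypothetical illustration of that precedence, assuming a config.json that sets max_cells to 100000 as sketched above:

options('Azimuth.app.max_cells' = 25000)                        # lowest precedence
Azimuth::AzimuthApp(config = 'config.json', max_cells = 50000)  # the function parameter wins, so max_cells is 50000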

Docker

First, build the Docker image. Clone the repository and run the following while in the root of the repository to build the image and tag it with the name “azimuth”:

docker build -t azimuth .

Next, launch a container based on the image azimuth with a bind mount mounting the directory on the host containing the reference files (e.g. /path/to/reference) as /reference-data in the container.

docker run -it -p 3838:3838 -v /path/to/reference:/reference-data:ro azimuth

If port 3838 is already in use on the host or you wish to use a different port, use -p NNNN:3838 in the run command instead, to bind port NNNN on the host to port 3838 on the container. The container runs the command R -e "Azimuth::AzimuthApp(reference = '/reference-data')" by default.
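
For example, to serve the app on host port 8787 instead (the port number here is arbitrary):

docker run -it -p 8787:3838 -v /path/to/reference:/reference-data:ro azimuth
# then browse to http://localhost:8787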

Rebuilding the Docker image more quickly in certain cases

The docker image takes about 20 minutes to build from scratch. To save time, adding the argument --build-arg SEURAT_VER=$(date +%s) to the docker build command will use cached layers of the image (if available) and only reinstall Seurat and Azimuth (and not any of the dependencies), which takes less than a minute. Alternatively, to only reinstall Azimuth (and not Seurat or other dependencies) use the argument --build-arg AZIMUTH_VER=$(date +%s).
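
For example, both commands below mirror the build command above and assume you are in the root of the repository:

# Reinstall Seurat and Azimuth only, reusing cached dependency layers
docker build -t azimuth --build-arg SEURAT_VER=$(date +%s) .

# Reinstall Azimuth only
docker build -t azimuth --build-arg AZIMUTH_VER=$(date +%s) .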

Specifying options

You can set options by passing a parameter to the AzimuthApp function:

docker run -it -p 3838:3838 -v /path/to/reference:/reference-data:ro azimuth R -e "Azimuth::AzimuthApp(reference = '/reference-data', max_cells = 100000)"

or providing the path to a config file (in this example, for convenience, the config file is assumed to be in the reference directory that is bind mounted to the container):

docker run -it -p 3838:3838 -v /path/to/reference:/reference-data:ro azimuth R -e "Azimuth::AzimuthApp(config = '/reference-data/config.json', max_cells = 100000)"

or setting the option in R:

docker run -it -p 3838:3838 -v /path/to/reference:/reference-data:ro azimuth R -e "options('Azimuth.map.pbcorthresh' = 0.5)" -e "Azimuth::AzimuthApp(reference = '/reference-data')"

or just starting a shell in the container, from which you can launch an interactive R session and set options as desired:

docker run -it -p 3838:3838 -v /path/to/reference:/reference-data:ro azimuth /bin/bash

Support

Azimuth annotation is currently supported in two ways: via the Azimuth app or via the RunAzimuth() function. Both methods accept the same kinds of files and run the same annotation workflow. To run the app, please visit the website here. To use RunAzimuth(), please see this tutorial.
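
As a minimal sketch of the programmatic route (assuming a Seurat object named obj and a downloaded reference directory as described above; see the linked tutorial for the authoritative usage):

library(Azimuth)
obj <- RunAzimuth(obj, reference = '/path/to/reference')
# transferred labels land in the object metadata, e.g. obj$predicted.celltype.l2 for the PBMC reference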

If you use the instance of the app we are hosting on the web, you can download a Seurat R script once your analysis is complete that will guide you in reproducing the analysis. You do not need Azimuth to reproduce the analysis.

If the app doesn’t work on a dataset that you believe meets the requirements, and the dataset is publicly available for us to use for debugging, please help us improve the app by filing a GitHub issue on the issues page that links to the dataset and describes the problem.

azimuth's People

Contributors

andrewwbutler, austinhartman, cdarby, gesmira, jaisonj708, keller-mark, mojaveazure, rsatija, saketkc, yuhanh


azimuth's Issues

How to extract the annotated cell type for each cell from Azimuth results

Hi Admin,

I have Azimuth running locally and would like to extract the annotated cell type for each cell from the Azimuth results, but I could not:

query
An object of class Seurat
46330 features across 11769 samples within 4 assays
Active assay: refAssay (23036 features, 4999 variable features)
3 other assays present: RNA, prediction.score.celltype.l2, impADT
2 dimensional reductions calculated: integrated_dr, proj.umap
rownames(query[['prediction.score.celltype.l2']])
[1] "gdT" "CD8 TEM" "CD8 TCM"
[4] "dnT" "B intermediate" "CD4 TCM"
[7] "pDC" "NK" "B naive"
[10] "CD14 Mono" "Plasmablast" "CD4 Naive"
[13] "Treg" "CD16 Mono" "HSPC"
[16] "NK Proliferating" "CD4 TEM" "CD8 Naive"
[19] "MAIT" "B memory" "NK-CD56bright"
[22] "cDC2" "Platelet" "Eryth"
[25] "CD4 CTL" "cDC1" "CD4 Proliferating"
[28] "ILC" "ASDC" "CD8 Proliferating"
colnames(query[['prediction.score.celltype.l2']])[1:5]
[1] "AAACCCAAGCGCCCAT-1" "AAACCCAAGGTTCCGC-1" "AAACCCACAGAGTTGG-1"
[4] "AAACCCACAGGTATGG-1" "AAACCCACATAGTCAC-1"
df <- as.data.frame (query[['prediction.score.celltype.l2']])
Error in as.data.frame.default(query[["prediction.score.celltype.l2"]]) :
cannot coerce class ‘structure("Assay", package = "SeuratObject")’ to a data.frame

Could you please tell me how to extract the annotated cell type for each cell from Azimuth results ?

Thanks,
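
A minimal sketch of how the per-cell labels and scores are usually pulled out, assuming the standard column and assay names produced by the PBMC workflow shown elsewhere on this page; this is an illustrative sketch, not an official answer:

# Per-cell predicted labels live in the query metadata, not in the prediction.score assay
labels <- query$predicted.celltype.l2
head(labels)

# Per-class prediction scores can be extracted from the assay as a cells-by-celltypes matrix
scores <- t(as.matrix(GetAssayData(query, assay = "prediction.score.celltype.l2", slot = "data")))
scores[1:5, 1:5]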

all(dims >= dims.min) is not TRUE

Hi,
I got this error after I uploaded my h5ad file for mapping:
all(dims >= dims.min) is not TRUE

Second, how do I download the reference dataset for the kidney? It seems my query RDS object is very big.

Thank you

FindTransferAnchors error in R script

Hello,
Thank you for this great app.
I'm using Azimuth 0.4.3 and Seurat 4.0.4.

I ran Azimuth online using the PBMC reference with my dataset of ~6k NK/T cells.
Even though I got 12.74% of query cells with anchors and a 2.63/5 cluster preservation score, which I assume is due to the use of a homogeneous group of cells, I had coherent results.

After downloading the analysis script template, I tried to replicate the online analysis, but at the FindTransferAnchors step I get this error:
Error in idx[i, ] <- res[[i]][[1]] :
argument is of length zero

I used the default arguments in the script, which were the following:

FindTransferAnchors(
  reference = reference$map,
  query = query,
  k.filter = NA,
  reference.neighbors = "refdr.annoy.neighbors",
  reference.assay = "refAssay",
  query.assay = "refAssay",
  reference.reduction = "refDR",
  normalization.method = "SCT",
  features = intersect(rownames(x = reference$map), VariableFeatures(object = query)),
  dims = 1:50,
  n.trees = 20,
  mapping.score.k = 100
)

Any idea what is causing this issue?

Regards,
Yannick

Deploying strategy

Hi,

I am a PhD student learning how to deploy Shiny apps. Would you mind describing how you deployed this Azimuth app at
https://azimuth.hubmapconsortium.org/ ?
Also, how do you handle concurrent users for this online Azimuth Shiny app?

Thank you!

azimuth_analysis.R code for human-kidney error message

Dear Azimuth team,

I got the following error message from the azimuth_analysis.R code, which I downloaded from https://app.azimuth.hubmapconsortium.org/app/human-kidney.

It fails even with "features = intersect(rownames(x = reference$map), VariableFeatures(object = query))":

"Error in index$getNNsByVectorList(query[x, ], k, search.k, include.distance) :
fv.size() != vector_size
Calls: AddMetaData ... resolve.list -> signalConditionsASAP -> signalConditions
Execution halted"

But I can run it through https://app.azimuth.hubmapconsortium.org/app/human-kidney; that is how I got the azimuth_analysis.R code.

I appreciate your help!

if (packageVersion(pkg = "Seurat") < package_version(x = "4.0.0")) {

  • stop("Mapping datasets requires Seurat v4 or higher.", call. = FALSE)
  • }

Ensure glmGamPoi is installed

if (!requireNamespace("glmGamPoi", quietly = TRUE)) {

  • BiocManager::install("glmGamPoi")
    
  • }
  • }

Ensure Azimuth is installed

if (packageVersion(pkg = "Azimuth") < package_version(x = "0.4.0")) {

  • stop("Please install azimuth - remotes::install_github('satijalab/azimuth')", call. = FALSE)
  • }

library(Seurat)
Attaching SeuratObject
library(Azimuth)
Registered S3 method overwritten by 'SeuratDisk':
method from
as.sparse.H5Group Seurat
Attaching shinyBS

# Download the Azimuth reference and extract the archive

# Load the reference
# Change the file path based on where the reference is located on your system.
#reference <- LoadReference(path = "/reference-data/human_kidney/")
#reference <- LoadReference(path = "https://seurat.nygenome.org/azimuth/references/v1.0.0/human_pbmc")
reference <- LoadReference(path = "https://seurat.nygenome.org/azimuth/references/v1.0.0/human_kidney")

# Load the query object for mapping
# Change the file path based on where the query file is located on your system.
#query <- LoadFileInput(path = "character(0)")
query <- LoadFileInput(path = args[3])
query <- ConvertGeneNames(

# Calculate nCount_RNA and nFeature_RNA if the query does not
# contain them already
if (!all(c("nCount_RNA", "nFeature_RNA") %in% c(colnames(x = query[[]])))) {
  calcn <- as.data.frame(x = Seurat:::CalcN(object = query))
  colnames(x = calcn) <- paste(
    colnames(x = calcn),
    "RNA",
    sep = '_'
  )
  query <- AddMetaData(
    object = query,
    metadata = calcn
  )
  rm(calcn)
}

# Calculate percent mitochondrial genes if the query contains genes
# matching the regular expression "^MT-"
if (any(grepl(pattern = '^MT-', x = rownames(x = query)))) {
  query <- PercentageFeatureSet(
    object = query,
    pattern = '^MT-',
    col.name = 'percent.mt',
    assay = "RNA"
  )
}

# Filter cells based on the thresholds for nCount_RNA and nFeature_RNA
# you set in the app
cells.use <- query[["nCount_RNA", drop = TRUE]] <= 261023 &
  query[["nCount_RNA", drop = TRUE]] >= 254 &
  query[["nFeature_RNA", drop = TRUE]] <= 11796 &
  query[["nFeature_RNA", drop = TRUE]] >= 203

# If the query contains mitochondrial genes, filter cells based on the
# thresholds for percent.mt you set in the app
if ("percent.mt" %in% c(colnames(x = query[[]]))) {
  cells.use <- cells.use & (query[["percent.mt", drop = TRUE]] <= 31 &
    query[["percent.mt", drop = TRUE]] >= 0)
}

# Remove filtered cells from the query
query <- query[, cells.use]

# Preprocess with SCTransform
query <- SCTransform(
  object = query,
  assay = "RNA",
  new.assay.name = "refAssay",
  residual.features = rownames(x = reference$map),
  reference.SCT.model = reference$map[["refAssay"]]@SCTModel.list$refmodel,
  method = 'glmGamPoi',
  ncells = 2000,
  n_genes = 2000,
  do.correct.umi = FALSE,
  do.scale = FALSE,
  do.center = TRUE
)
Using reference SCTModel to calculate pearson residuals
Determine variable features
Setting min_variance to: -Inf
Calculating residuals of type pearson for 2755 genes
  |======================================================================| 100%
  |======================================================================| 100%
Set default assay to refAssay

# Find anchors between query and reference
anchors <- FindTransferAnchors(
  reference = reference$map,
  query = query,
  k.filter = NA,
  reference.neighbors = "refdr.annoy.neighbors",
  reference.assay = "refAssay",
  query.assay = "refAssay",
  reference.reduction = "refDR",
  normalization.method = "SCT",
  features = intersect(rownames(x = reference$map), VariableFeatures(object = query)),
  dims = 1:100,
  n.trees = 20,
  mapping.score.k = 100
)
Normalizing query using reference SCT model
Projecting cell embeddings
Finding query neighbors
Finding neighborhoods
Finding anchors
	Found 905 anchors

# Transfer cell type labels and impute protein expression
#
# Transferred labels are in metadata columns named "predicted.*"
# The maximum prediction score is in a metadata column named "predicted.*.score"
# The prediction scores for each class are in an assay named "prediction.score.*"
# The imputed assay is named "impADT" if computed

refdata <- lapply(X = "annotation.l2", function(x) {
  reference$map[[x, drop = TRUE]]
})
names(x = refdata) <- "annotation.l2"
if (FALSE) {
  refdata[["impADT"]] <- GetAssayData(
    object = reference$map[['ADT']],
    slot = 'data'
  )
}

query <- TransferData(
  reference = reference$map,
  query = query,
  dims = 1:100,
  anchorset = anchors,
  refdata = refdata,
  n.trees = 20,
  store.weights = TRUE
)
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Predicting cell labels
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from predictionscoreannotation.l2_ to predictionscoreannotationl2_

# Calculate the embeddings of the query data on the reference SPCA
query <- IntegrateEmbeddings(
  anchorset = anchors,
  reference = reference$map,
  query = query,
  reductions = "pcaproject",
  reuse.weights.matrix = TRUE
)
Integrating dataset 2 with reference dataset
Finding integration vectors
Integrating data
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from integrated_dr_ to integrateddr_
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from integrated_dr_ to integrateddr_
Warning: All keys should be one or more alphanumeric characters followed by an underscore, setting key to integrateddr_

# Calculate the query neighbors in the reference
# with respect to the integrated embeddings
query[["query_ref.nn"]] <- FindNeighbors(
  object = Embeddings(reference$map[["refDR"]]),
  query = Embeddings(query[["integrated_dr"]]),
  return.neighbor = TRUE,
  l2.norm = TRUE
)
Computing nearest neighbors

# The reference used in the app is downsampled compared to the reference on which
# the UMAP model was computed. This step, using the helper function NNTransform,
# corrects the Neighbors to account for the downsampling.
query <- Azimuth:::NNTransform(
  object = query,
  meta.data = reference$map[[]]
)

# Project the query to the reference UMAP.
query[["proj.umap"]] <- RunUMAP(
  object = query[["query_ref.nn"]],
  reduction.model = reference$map[["refUMAP"]],
  reduction.key = 'UMAP_'
)
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
Running UMAP projection
02:12:52 Read 1020 rows
02:12:52 Processing block 1 of 1
02:12:52 Commencing smooth kNN distance calibration using 1 thread
02:12:52 Initializing by weighted average of neighbor coordinates using 1 thread
02:12:52 Commencing optimization for 67 epochs, with 20400 positive edges
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
02:12:52 Finished
Warning: No assay specified, setting assay as RNA by default.
Warning message:
In RunUMAP.default(object = neighborlist, reduction.model = reduction.model, :
  Number of neighbors between query and reference is not equal to the number of neighbros within reference

# Calculate mapping score and add to metadata
query <- AddMetaData(
  object = query,
  metadata = MappingScore(anchors = anchors),
  col.name = "mapping.score"
)
Projecting reference PCA onto query
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Projecting back the query cells into original PCA space
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Computing scores:
    Finding neighbors of original query cells
    Finding neighbors of transformed query cells
Error in index$getNNsByVectorList(query[x, ], k, search.k, include.distance) :
  fv.size() != vector_size
Calls: AddMetaData ... resolve.list -> signalConditionsASAP -> signalConditions
Execution halted

Human kidney reference

Hi,

I have seen that you have recently added human kidney reference from the recent biorxiv by Lake et al and that's great ! I have a couple of questions about this reference.

The pipeline used in the publication to produce the reference is based on custom normalization, custom batch-effect correction, and analysis with Pagoda2. I was wondering whether that is compatible with the map/query method from Seurat. Will the fact that the reference was produced with a tool other than Seurat impact the map/query result?

If I follow the map/query instructions, do I need to preprocess my data with the same normalization method as the reference?

Best,
WesDe

Cannot find directory

Hi,

When providing a path to a locally downloaded reference file, I keep getting the following error:

Warning: Error in : Cannot find directory /home/monib/R/cellxgene/data/lung_ref.Rds
55: stop
54: LoadReference
50: server
Error : Cannot find directory /home/monib/R/cellxgene/data/lung_ref.Rds

I tried providing a config file with a path to the local file; however, I get the same error.

Any suggestions as to why this is happening or how to fix it?

Thank you
Monib

Cannot add ADT to seurat object

Hi,

When loading a demo dataset, I keep getting the following error:

Listening on http://127.0.0.1:4276
[1] "resetting..."
detected inputs from HUMAN with id type Gene.name
reference rownames detected HUMAN with id type Gene.name
Warning: Removed 1 rows containing missing values (geom_hline).
Warning: Removed 1 rows containing missing values (geom_rect).
Warning: Removed 1 rows containing missing values (geom_hline).
Warning: Removed 1 rows containing missing values (geom_rect).
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 1 rows containing missing values (geom_hline).
Warning: Removed 1 rows containing missing values (geom_rect).
Warning: Removed 1 rows containing missing values (geom_hline).
Warning: Removed 1 rows containing missing values (geom_rect).
Using reference SCTModel to calculate pearson residuals
Determine variable features
Calculating residuals of type pearson for 2301 genes
|===================================================================================================================================================================| 100%
|===================================================================================================================================================================| 100%
Set default assay to refAssay
Normalizing query using reference SCT model
Projecting cell embeddings
Finding query neighbors
Finding neighborhoods
Finding anchors
Found 7346 anchors
Warning: Error in : Cannot find 'ADT' in this Seurat object
4: runApp
1: Azimuth::AzimuthApp

traceback()
14: execCallbacks(timeoutSecs, all, loop$id)
13: run_now(timeoutMs/1000, all = FALSE)
12: service(timeout)
11: serviceApp()
10: ..stacktracefloor..(serviceApp())
9: withCallingHandlers(expr, error = doCaptureStack)
8: domain$wrapSync(expr)
7: promises::with_promise_domain(createStackTracePromiseDomain(),
expr)
6: captureStackTraces({
while (!.globals$stopped) {
..stacktracefloor..(serviceApp())
}
})
5: ..stacktraceoff..(captureStackTraces({
while (!.globals$stopped) {
..stacktracefloor..(serviceApp())
}
}))
4: runApp(appDir = shinyApp(ui = AzimuthUI, server = AzimuthServer))
3: force(code)
2: with_options(new = opts, code = runApp(appDir = shinyApp(ui = AzimuthUI,
server = AzimuthServer)))
1: Azimuth::AzimuthApp(config = "./data/config.json")

Not sure what is happening here.

Best regards
Monib

Big Input file error reading from connection

Hi,

I am using the local version of Azimuth and trying to input a big Seurat RDS file (size = 5 GB) for analysis. I have adjusted the config file by setting shiny.maxRequestSize = 1048576000, which is 10 times the default. However, I always see a message from the web interface like:
[screenshot of the error message]

Is there any way to avoid this issue? Or is it not recommended to use large files with Azimuth?

Thank you!

Reference dataset objects with all genes

First of all, thank you for this great app. I was wondering if it's possible to download the Seurat objects of the annotated reference datasets with all genes. I have only checked the lung reference ref.Rds (which I am particularly interested in), so I'm not entirely sure if it's the same for all references, but as far as I can see there are only ~3k genes. I noticed that the demo lung dataset object includes all genes, which is also great, but I am asking about the reference object in particular.

Thanks.

Is there a limit to how many times I can use AZIMUTH WEB?

Hello. I have used Azimuth continuously now for about 8 PBMC datasets, and they all went smoothly. On my next run, the Azimuth PBMC web app turned white every time I visited it, whether I refreshed or closed it and visited it again.

But this only happens with the PBMC app. Other apps like Pancreas, Lung, Kidney, etc. all function normally.

Retrieving cluster markers?

After we run seurat <- RunAzimuth(obj, reference = "./Data/Azimuth/"), is there any way to extract the cluster markers from the clusters themselves, or am I only able to use the Annotation Details -> Label/Markers from the website and check the intersection of that markers array with the genes found in my dataset?
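
A hedged sketch of one way to compute markers locally with standard Seurat functions, assuming the transferred labels ended up in a metadata column named predicted.celltype.l2 (the column name varies by reference):

# Use the transferred labels as identities, then test one group against another
Idents(seurat) <- "predicted.celltype.l2"
mono.markers <- FindMarkers(seurat, ident.1 = "CD14 Mono", ident.2 = "CD16 Mono")
head(mono.markers)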

Cannot map cells to reference for human kidney

Hi

I am using the Docker version (0.4.3) of Azimuth to run my data against the human kidney reference. However, when I try to map cells to the reference, errors always appear:

[screenshot of the error message]

I can run my data using the online version of Azimuth without issue. Would you mind helping me with this? Thank you!

Does the Azimuth app require an internet connection?

Hello, I'm getting an error:

Timeout was reached: [seurat.nygenome.org] Connection timeout after 10001 ms

when running the Azimuth app from my own installation. Does this mean Azimuth requires internet access even when using my local reference files? Is there a way I can run this without internet access on computational clusters?

Thanks!

Error when running docker container

Hi there,

Thanks for sharing azimuth.

I am trying to run Azimuth on an AWS EC2 instance using a Docker container built from the Dockerfile in the repo. The Shiny server runs successfully and I am able to connect via the website, but as soon as I click on the Azimuth page it goes grey and I see the following error messages in the terminal:
[screenshot of the terminal error messages]

I see that this error comes from line 208 in the server.R file and mentions demos. Do I need to include a demo file alongside the reference files or is there a way to disable the demo so that the app doesn't crash?

Thank you in advance.

A Seurat object for Azimuth PBMC reference?

Hi,

We're testing a new clustering algorithm and would love to have the Azimuth reference to work with, since it's so wonderfully annotated. However, the link provided (https://zenodo.org/record/4546839) has an object that is stripped of all gene expression data, from what I can tell. Is it possible to get the Seurat object for the reference 36k cells with the RNA assay and all? Otherwise, I would be grateful if you could point me to the notebooks/code used to integrate all the raw data that went into the Azimuth PBMC reference.

Thank you in advance!

issue with port to launch a container

Hello, I finished building the Docker image, but the following command never gives a response:

docker run -it -p 3838:3838 -v pbmc:/reference-data:ro azimuth

Tried the following as suggested and got invalid hostPort

docker run -it -p NNNN:3838 -v pbmc:/reference-data:ro azimuth
docker: Invalid hostPort: NNNN.

Any other suggestions? Thanks.

Error in curl::curl_fetch_memory: Could not resolve host: E

Thanks for the super quick response.
This solved the demo dataset problem. Thanks for that.
But now I get another problem:
I provided the reference data path and a demo dataset:

Azimuth::AzimuthApp(reference = "E:/project/2021_04_PBMC_5mRNA/analysis/mRNA/Integrate/azimuth_ref/" ,demodataset = 'demo.rds')

Listening on http://127.0.0.1:7817
Warning: Error in curl::curl_fetch_memory: Could not resolve host: E
64: curl::curl_fetch_memory
63: request_fetch.write_memory
61: request_perform
60: httr::GET
56: FUN
55: vapply
54: LoadReference
50: server
Error in curl::curl_fetch_memory(url, handle = handle) :
Could not resolve host: E
Is there any problem with my network? Thanks again for helping with this.

Originally posted by @Totoro-chen in #68 (comment)

Error in run Azimuth::AzimuthApp()

I'm trying to use Azimuth in R (RStudio). I encountered the following problem while following the official tutorial:

Azimuth::AzimuthApp()

Listening on http://127.0.0.1:6914
Warning: Error in :: Parameter length is zero
51: paste0
50: server
Error in 1:nrow(x = demos) : Parameter length is zero

I didn't find a parameter corresponding to 50 or 51. How do I deal with this problem?
Looking forward to your answer.

Error message in RunUMAP when I run it locally

Dear Admin,

I got an error message from RunUMAP when I ran it locally; please help.

"Error in check_graph(graph, n_vertices, n_neighbors) :
ncol(idx) == expected_cols is not TRUE
Calls: RunUMAP ... RunUMAP.default -> -> check_graph -> stopifnot
Execution halted"


ARGUMENT 'Seurat-Azimuth/Seurat-pbmc3k/pbmc_10k_v3_filtered_feature_bc_matrix.h5' ignored

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

#!/usr/bin/env Rscript
args <- commandArgs()

# Ensure Seurat v4.0 or higher is installed
if (packageVersion(pkg = "Seurat") < package_version(x = "4.0.0")) {
  stop("Mapping datasets requires Seurat v4 or higher.", call. = FALSE)
}

# Ensure glmGamPoi is installed
if (!requireNamespace("glmGamPoi", quietly = TRUE)) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    BiocManager::install("glmGamPoi")
  }
}

# Ensure Azimuth is installed
if (packageVersion(pkg = "Azimuth") < package_version(x = "0.3.1")) {
  stop("Please install azimuth - remotes::install_github('satijalab/azimuth')", call. = FALSE)
}

library(Seurat)
Attaching SeuratObject
library(SeuratDisk)
Registered S3 method overwritten by 'cli':
method from
print.boxx spatstat.geom
Registered S3 method overwritten by 'SeuratDisk':
method from
as.sparse.H5Group Seurat
library(Azimuth)
Attaching shinyBS

# Download the Azimuth reference and extract the archive

# Load the reference
# Change the file path based on where the reference is located on your system.
reference <- LoadReference(path = "https://seurat.nygenome.org/azimuth/references/v1.0.0/human_pbmc")

# Load the query object for mapping
# Change the file path based on where the query file is located on your system.
#query <- LoadFileInput(path = "character(0)")
query <- LoadFileInput(args[3])
Warning message:
In sparseMatrix(i = indices[] + 1, p = indptr[], x = as.numeric(x = counts[]), :
'giveCsparse' has been deprecated; setting 'repr = "T"' for you

# Calculate nCount_RNA and nFeature_RNA if the query does not
# contain them already
if (!all(c("nCount_RNA", "nFeature_RNA") %in% c(colnames(x = query[[]])))) {
  calcn <- as.data.frame(x = Seurat:::CalcN(object = query))
  colnames(x = calcn) <- paste(
    colnames(x = calcn),
    "RNA",
    sep = '_'
  )
  query <- AddMetaData(
    object = query,
    metadata = calcn
  )
  rm(calcn)
}

# Calculate percent mitochondrial genes if the query contains genes
# matching the regular expression "^MT-"
if (any(grepl(pattern = '^MT-', x = rownames(x = query)))) {
  query <- PercentageFeatureSet(
    object = query,
    pattern = '^MT-',
    col.name = 'percent.mt',
    assay = "RNA"
  )
}

# Filter cells based on the thresholds for nCount_RNA and nFeature_RNA
# you set in the app
cells.use <- query[["nCount_RNA", drop = TRUE]] <= 79534 &
  query[["nCount_RNA", drop = TRUE]] >= 501 &
  query[["nFeature_RNA", drop = TRUE]] <= 7211 &
  query[["nFeature_RNA", drop = TRUE]] >= 54

# If the query contains mitochondrial genes, filter cells based on the
# thresholds for percent.mt you set in the app
if ("percent.mt" %in% c(colnames(x = query[[]]))) {
  cells.use <- cells.use & (query[["percent.mt", drop = TRUE]] <= 97 &
    query[["percent.mt", drop = TRUE]] >= 0)
}

# Remove filtered cells from the query
query <- query[, cells.use]

# Preprocess with SCTransform
query <- SCTransform(
  object = query,
  assay = "RNA",
  new.assay.name = "refAssay",
  residual.features = rownames(x = reference$map),
  reference.SCT.model = reference$map[["refAssay"]]@SCTModel.list$refmodel,
  method = 'glmGamPoi',
  ncells = 2000,
  n_genes = 2000,
  do.correct.umi = FALSE,
  do.scale = FALSE,
  do.center = TRUE
)
Using reference SCTModel to calculate pearson residuals
Determine variable features
Calculating residuals of type pearson for 4999 genes
  |======================================================================| 100%
  |======================================================================| 100%
Set default assay to refAssay

# Find anchors between query and reference
anchors <- FindTransferAnchors(
  reference = reference$map,
  query = query,
  k.filter = NA,
  reference.neighbors = "refdr.annoy.neighbors",
  reference.assay = "refAssay",
  query.assay = "refAssay",
  reference.reduction = "refDR",
  normalization.method = "SCT",
  features = intersect(rownames(x = reference$map), VariableFeatures(object = query)),
  dims = 1:50,
  n.trees = 20,
  mapping.score.k = 100
)
Normalizing query using reference SCT model
Projecting cell embeddings
Finding query neighbors
Finding neighborhoods
Finding anchors
	Found 11291 anchors

# Transfer cell type labels and impute protein expression
#
# Transferred labels are in metadata columns named "predicted.*"
# The maximum prediction score is in a metadata column named "predicted.*.score"
# The prediction scores for each class are in an assay named "prediction.score.*"
# The imputed assay is named "impADT" if computed

refdata <- lapply(X = "celltype.l2", function(x) {
  reference$map[[x, drop = TRUE]]
})
names(x = refdata) <- "celltype.l2"
if (TRUE) {
  refdata[["impADT"]] <- GetAssayData(
    object = reference$map[['ADT']],
    slot = 'data'
  )
}

query <- TransferData(
  reference = reference$map,
  query = query,
  dims = 1:50,
  anchorset = anchors,
  refdata = refdata,
  n.trees = 20,
  store.weights = TRUE
)
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Predicting cell labels
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from predictionscorecelltype.l2_ to predictionscorecelltypel2_
Transfering 228 features onto reference data

# Calculate the embeddings of the query data on the reference SPCA
query <- IntegrateEmbeddings(
  anchorset = anchors,
  reference = reference$map,
  query = query,
  reductions = "pcaproject",
  reuse.weights.matrix = TRUE
)
Integrating dataset 2 with reference dataset
Finding integration vectors
Integrating data

# Calculate the query neighbors in the reference
# with respect to the integrated embeddings
query[["query_ref.nn"]] <- FindNeighbors(
  object = Embeddings(reference$map[["refDR"]]),
  query = Embeddings(query[["integrated_dr"]]),
  return.neighbor = TRUE,
  l2.norm = TRUE
)
Computing nearest neighbors

# The reference used in the app is downsampled compared to the reference on which
# the UMAP model was computed. This step, using the helper function NNTransform,
# corrects the Neighbors to account for the downsampling.
query <- Azimuth:::NNTransform(
  object = query,
  meta.data = reference$map[[]]
)

# Project the query to the reference UMAP.
query[["proj.umap"]] <- RunUMAP(
  object = query[["query_ref.nn"]],
  reduction.model = reference$map[["refUMAP"]],
  reduction.key = 'UMAP_'
)
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
Running UMAP projection
Error in check_graph(graph, n_vertices, n_neighbors) :
  ncol(idx) == expected_cols is not TRUE
Calls: RunUMAP ... RunUMAP.default -> -> check_graph -> stopifnot
Execution halted

Deploying to the web

Hi again,

I was wondering how I could deploy a version of the app using my own reference to the web. Is there any particular way that you recommend I do so? I tried to just deploy from RStudio using the publish button, but that didn't work out for me.

Thanks

Cross-species mapping

Hi Developers,

Thank you for creating an amazing tool Azimuth!

Azimuth enables reference-based mapping. Has it been shown to be valid for cross-platform and cross-species mapping at the same time?

Thank you for your time.
Best,
Yahui

Mapping score about Human Kidney Reference

Dear Azimuth Dev. Team,

We have been using Azimuth as the primary choice for cell type annotation. As our server does not support a GUI, we used the analysis script template for Azimuth analysis instead. Unfortunately, we recently found there is a problem when fetching the mapping score from the Human Kidney reference.

More exactly, when running
query[["proj.umap"]] <- RunUMAP( object = query[["query_ref.nn"]], reduction.model = reference$map[["refUMAP"]], reduction.key = 'UMAP')_

it returned a warning message:
In RunUMAP.default(object = neighborlist, reduction.model = reduction.model, :
Number of neighbors between query and reference is not equal to the number of neighbros within reference

Then, for the next step
query <- AddMetaData( object = query, metadata = MappingScore(anchors = anchors), col.name = "mapping.score")

it further returned an error
Error in index$getNNsByVectorList(query[x, ], k, search.k, include.distance) :
fv.size() != vector_size

This problem is NOT observed with our previously used references like Human PBMC, Human Fetus, and Human Lung v1. So perhaps there is some special setting for the Human Kidney reference that has not been specified in the corresponding analysis script template?

Please tell us how to modify the script at your convenience.

Thank you!

What happens to data uploaded to Azimuth

Hi,

I have been going through the Azimuth tool documentation, but I haven't seen information about what happens to the data uploaded to Azimuth after the analysis. Do you save the gene expression count data so that it can be used in improving the service, the method or something like that?

Retrieving RNA raw counts

I have questions about retrieving RNA count data from the ref.Rds file.

When I try to retrieve count data with
data <- GetAssayData(object = pbmc, slot = "data")
it returns a matrix of all zeroes.

And there are 2 assays available: refAssay and ADT.
ADT data seems to be available. However, refAssay does not contain any non-zeroes. Is this just an empty matrix?

How can I extract the RNA count matrix, not ADT?

Thank you.

How to do subset analysis from AZIMUTH-identified cell types?

Hello, I'm just new to R programming and single-cell analysis. May I ask for your guidance please?

I queried a number of PBMC samples (healthy and diseased states) against the Azimuth web app. Then I subsetted the "CD14 Mono" and "CD16 Mono" cells because I would like to do reclustering and subset analysis. May I ask how I should do this?

Should I follow these steps:
subset > RunPCA > RunUMAP > FindNeighbors > FindClusters > UMAPPlot > NormalizeData

Are these steps correct or could you please let me know the correct way? And after NormalizeData, is this ready for downstream analysis like DEG and GSEA?

Thank you!!!
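
A rough sketch of a standard Seurat reclustering workflow on such a subset, assuming the merged query object is called obj and carries a predicted.celltype.l2 column; the dims and resolution values are placeholders, and this is not an official Azimuth recommendation:

mono <- subset(obj, subset = predicted.celltype.l2 %in% c("CD14 Mono", "CD16 Mono"))
mono <- NormalizeData(mono)
mono <- FindVariableFeatures(mono)
mono <- ScaleData(mono)
mono <- RunPCA(mono)
mono <- FindNeighbors(mono, dims = 1:30)
mono <- FindClusters(mono, resolution = 0.5)
mono <- RunUMAP(mono, dims = 1:30)
DimPlot(mono, label = TRUE)
# after this, FindMarkers/FindAllMarkers can be used for DEG-style comparisons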

Label my own UMAP with Azimuth?

I was wondering if Azimuth can automatically label my own UMAP instead of the reference UMAP provided, as, for example, the SignacX package does. Thank you.

General question regarding how different batches samples are dealt with in Azimuth

Dear Developers,

Thank you for creating an amazing tool Azimuth!

I just had a naive question regarding how Azimuth deals with scRNA-seq data that may come from different batches and samples. Is the user expected to upload a count matrix that has been subsetted to contain cells from individual batches/samples?

Thank you for your time.

Best regards,

Hani

Mapping tumor sample single cells onto the normal based reference

Dear Azimuth,

We mapped human lung cancer (NSCLC) single-cell RNA-seq data with Azimuth onto its default reference (Travaglini et al. 2020). There are some concerns among us about using a reference built from normal samples to map tumor-sample single cells. I wonder whether these concerns are reasonable and whether there is a way to address them.

Thank you so much for creating Azimuth.
Luke Heo

Azimuth output R code does not give same UMAP plot/values as web app

Hello,

I was using the Azimuth web app to test some data but wanted reproducible code, so I tried downloading the R code and running through it myself. However, I am not getting the same UMAP plot for the query dataset as I see in the web app. For a reproducible example, I downloaded the human PBMC reference files from Zenodo (https://zenodo.org/record/4546839#.YSZIgo5KguU) and the example dataset from 10X (https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_filtered_feature_bc_matrix.h5). I uploaded this .h5 file to the web app and got the following UMAP plots:
[screenshots of the web app UMAP plots]

I then downloaded the Analysis script template and tried running it on my computer:

> #!/usr/bin/env Rscript
> 
> # Ensure Seurat v4.0 or higher is installed
> if (packageVersion(pkg = "Seurat") < package_version(x = "4.0.0")) {
+   stop("Mapping datasets requires Seurat v4 or higher.", call. = FALSE)
+ }
> 
> # Ensure glmGamPoi is installed
> if (!requireNamespace("glmGamPoi", quietly = TRUE)) {
+   if (!requireNamespace("BiocManager", quietly = TRUE)) {
+     BiocManager::install("glmGamPoi")
+   }
+ }
> 
> # Ensure Azimuth is installed
> if (packageVersion(pkg = "Azimuth") < package_version(x = "0.3.1")) {
+   stop("Please install azimuth - remotes::install_github('satijalab/azimuth')", call. = FALSE)
+ }
> 
> library(Seurat)
Attaching SeuratObject
> library(Azimuth)
Registered S3 method overwritten by 'cli':
  method     from         
  print.boxx spatstat.geom
Registered S3 method overwritten by 'SeuratDisk':
  method            from  
  as.sparse.H5Group Seurat
Attaching shinyBS
> 
> 
> setwd("projects/ascoli/2021Jul-scRNASeq/")
> dir("data/genome/HumanPBMC/")
 [1] "azimuth_pred._both_webapp.tsv"             "azimuth_pred.tsv"                         
 [3] "azimuth_umap.Rds"                          "azimuth_umap_both_webapp.Rds"             
 [5] "idx.annoy"                                 "object_umap.jpeg"                         
 [7] "pbmc_10k_v3_filtered_feature_bc_matrix.h5" "query_umap.jpeg"                          
 [9] "ref.Rds"                                   "reference_umap.jpeg"                      
> 
> # Download the Azimuth reference and extract the archive
> 
> # Load the reference
> # Change the file path based on where the reference is located on your system.
> reference <- LoadReference(path = "data/genome/HumanPBMC/")
> 
> # Load the query object for mapping
> # Change the file path based on where the query file is located on your system.
> query <- LoadFileInput(path = "data/genome/HumanPBMC/pbmc_10k_v3_filtered_feature_bc_matrix.h5")
Warning message:
In sparseMatrix(i = indices[] + 1, p = indptr[], x = as.numeric(x = counts[]),  :
  'giveCsparse' has been deprecated; setting 'repr = "T"' for you
> 
> # Calculate nCount_RNA and nFeature_RNA if the query does not
> # contain them already
> if (!all(c("nCount_RNA", "nFeature_RNA") %in% c(colnames(x = query[[]])))) {
+     calcn <- as.data.frame(x = Seurat:::CalcN(object = query))
+     colnames(x = calcn) <- paste(
+       colnames(x = calcn),
+       "RNA",
+       sep = '_'
+     )
+     query <- AddMetaData(
+       object = query,
+       metadata = calcn
+     )
+     rm(calcn)
+ }
> 
> # Calculate percent mitochondrial genes if the query contains genes
> # matching the regular expression "^MT-"
> if (any(grepl(pattern = '^MT-', x = rownames(x = query)))) {
+   query <- PercentageFeatureSet(
+     object = query,
+     pattern = '^MT-',
+     col.name = 'percent.mt',
+     assay = "RNA"
+   )
+ }
> 
> # Filter cells based on the thresholds for nCount_RNA and nFeature_RNA
> # you set in the app
> cells.use <- query[["nCount_RNA", drop = TRUE]] <= 79534 &
+   query[["nCount_RNA", drop = TRUE]] >= 501 &
+   query[["nFeature_RNA", drop = TRUE]] <= 4000 &
+   query[["nFeature_RNA", drop = TRUE]] >= 54
> 
> # If the query contains mitochondrial genes, filter cells based on the
> # thresholds for percent.mt you set in the app
> if ("percent.mt" %in% c(colnames(x = query[[]]))) {
+   cells.use <- cells.use & (query[["percent.mt", drop = TRUE]] <= 20 &
+     query[["percent.mt", drop = TRUE]] >= 0)
+ }
> 
> # Remove filtered cells from the query
> query <- query[, cells.use]
> 
> # Preprocess with SCTransform
> query <- SCTransform(
+   object = query,
+   assay = "RNA",
+   new.assay.name = "refAssay",
+   residual.features = rownames(x = reference$map),
+   reference.SCT.model = reference$map[["refAssay"]]@SCTModel.list$refmodel,
+   method = 'glmGamPoi',
+   ncells = 2000,
+   n_genes = 2000,
+   do.correct.umi = FALSE,
+   do.scale = FALSE,
+   do.center = TRUE
+ )
Using reference SCTModel to calculate pearson residuals
Determine variable features
Calculating residuals of type pearson for 4999 genes
  |==============================================================================================| 100%
  |==============================================================================================| 100%
Set default assay to refAssay
> 
> # Find anchors between query and reference
> anchors <- FindTransferAnchors(
+   reference = reference$map,
+   query = query,
+   k.filter = NA,
+   reference.neighbors = "refdr.annoy.neighbors",
+   reference.assay = "refAssay",
+   query.assay = "refAssay",
+   reference.reduction = "refDR",
+   normalization.method = "SCT",
+   features = intersect(rownames(x = reference$map), VariableFeatures(object = query)),
+   dims = 1:50,
+   n.trees = 20,
+   mapping.score.k = 100
+ )
Normalizing query using reference SCT model
Projecting cell embeddings
Finding query neighbors
Finding neighborhoods
Finding anchors
	Found 10893 anchors
> 
> # Transfer cell type labels and impute protein expression
> #
> # Transferred labels are in metadata columns named "predicted.*"
> # The maximum prediction score is in a metadata column named "predicted.*.score"
> # The prediction scores for each class are in an assay named "prediction.score.*"
> # The imputed assay is named "impADT" if computed
> 
> refdata <- lapply(X = "celltype.l1", function(x) {
+   reference$map[[x, drop = TRUE]]
+ })
> names(x = refdata) <- "celltype.l1"
> if (TRUE) {
+   refdata[["impADT"]] <- GetAssayData(
+     object = reference$map[['ADT']],
+     slot = 'data'
+   )
+ }
> query <- TransferData(
+   reference = reference$map,
+   query = query,
+   dims = 1:50,
+   anchorset = anchors,
+   refdata = refdata,
+   n.trees = 20,
+   store.weights = TRUE
+ )
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Predicting cell labels
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from predictionscorecelltype.l1_ to predictionscorecelltypel1_
Transfering 228 features onto reference data
> 
> # Calculate the embeddings of the query data on the reference SPCA
> query <- IntegrateEmbeddings(
+   anchorset = anchors,
+   reference = reference$map,
+   query = query,
+   reductions = "pcaproject",
+   reuse.weights.matrix = TRUE
+ )
  |                                                  | 0 % ~calculating  Integrating dataset 2 with reference dataset
Finding integration vectors
Integrating data
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
> 
> # Calculate the query neighbors in the reference
> # with respect to the integrated embeddings
> query[["query_ref.nn"]] <- FindNeighbors(
+   object = Embeddings(reference$map[["refDR"]]),
+   query = Embeddings(query[["integrated_dr"]]),
+   return.neighbor = TRUE,
+   l2.norm = TRUE
+ )
Computing nearest neighbors
> 
> # The reference used in the app is downsampled compared to the reference on which
> # the UMAP model was computed. This step, using the helper function NNTransform,
> # corrects the Neighbors to account for the downsampling.
> query <- NNTransform(
+   object = query,
+   meta.data = reference$map[[]]
+ )
Error in NNTransform(object = query, meta.data = reference$map[[]]) : 
  could not find function "NNTransform"
> 
> # Project the query to the reference UMAP.
> query[["proj.umap"]] <- RunUMAP(
+   object = query[["query_ref.nn"]],
+   reduction.model = reference$map[["refUMAP"]],
+   reduction.key = 'UMAP_'
+ )
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
Running UMAP projection
08:59:47 Read 10637 rows and found  numeric columns
08:59:47 Processing block 1 of 1
08:59:47 Commencing smooth kNN distance calibration using 1 thread
08:59:47 Initializing by weighted average of neighbor coordinates using 1 thread
08:59:47 Commencing optimization for 67 epochs, with 212740 positive edges
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
08:59:50 Finished
Warning: No assay specified, setting assay as RNA by default.
> 
> 
> # Calculate mapping score and add to metadata
> query <- AddMetaData(
+   object = query,
+   metadata = MappingScore(anchors = anchors),
+   col.name = "mapping.score"
+ )
Projecting reference PCA onto query
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Projecting back the query cells into original PCA space
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Computing scores:
    Finding neighbors of original query cells
    Finding neighbors of transformed query cells
    Computing query SNN
    Determining bandwidth and computing transition probabilities
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Total elapsed time: 10.7010440826416
> 
> # VISUALIZATIONS
> 
> # First predicted metadata field, change to visualize other predicted metadata
> id <- "celltype.l1"[1]
> predicted.id <- paste0("predicted.", id)
> 
> 
> library(ggplot2)
> 
> # DimPlot of the reference
> x11(width = 10, height = 7)
> DimPlot(object = reference$plot, reduction = "refUMAP", group.by = id, label = TRUE) + NoLegend()
> ggsave("data/genome/HumanPBMC/reference_umap.jpeg")
Saving 9.92 x 6.92 in image
> 
> # DimPlot of the query, colored by predicted cell type
> x11(width = 10, height = 7)
> DimPlot(object = query, reduction = "proj.umap", group.by = predicted.id, label = TRUE) + NoLegend()
> ggsave("data/genome/HumanPBMC/query_umap.jpeg")
Saving 9.92 x 6.92 in image

[reference_umap and query_umap plots]

I then tried downloading the UMAP (Seurat Reduction RDS) and this did give me the same plot (although I cannot get the same celltypes to have the same color). Also, the predictions are the same:

> # Read in downloaded umap values;
> # First re-read in the object and filter:
> 
> object <- LoadFileInput(path = "data/genome/HumanPBMC/pbmc_10k_v3_filtered_feature_bc_matrix.h5")
Warning message:
In sparseMatrix(i = indices[] + 1, p = indptr[], x = as.numeric(x = counts[]),  :
  'giveCsparse' has been deprecated; setting 'repr = "T"' for you
> 
> # Calculate nCount_RNA and nFeature_RNA if the object does not
> # contain them already
> if (!all(c("nCount_RNA", "nFeature_RNA") %in% c(colnames(x = object[[]])))) {
+   calcn <- as.data.frame(x = Seurat:::CalcN(object = object))
+   colnames(x = calcn) <- paste(
+     colnames(x = calcn),
+     "RNA",
+     sep = '_'
+   )
+   object <- AddMetaData(
+     object = object,
+     metadata = calcn
+   )
+   rm(calcn)
+ }
> 
> # Calculate percent mitochondrial genes if the object contains genes
> # matching the regular expression "^MT-"
> if (any(grepl(pattern = '^MT-', x = rownames(x = object)))) {
+   object <- PercentageFeatureSet(
+     object = object,
+     pattern = '^MT-',
+     col.name = 'percent.mt',
+     assay = "RNA"
+   )
+ }
> 
> # Filter cells based on the thresholds for nCount_RNA and nFeature_RNA
> # you set in the app
> cells.use <- object[["nCount_RNA", drop = TRUE]] <= 79534 &
+   object[["nCount_RNA", drop = TRUE]] >= 501 &
+   object[["nFeature_RNA", drop = TRUE]] <= 4000 &
+   object[["nFeature_RNA", drop = TRUE]] >= 54
> 
> # If the object contains mitochondrial genes, filter cells based on the
> # thresholds for percent.mt you set in the app
> if ("percent.mt" %in% c(colnames(x = object[[]]))) {
+   cells.use <- cells.use & (object[["percent.mt", drop = TRUE]] <= 20 &
+                               object[["percent.mt", drop = TRUE]] >= 0)
+ }
> 
> # Remove filtered cells from the object
> object <- object[, cells.use]
> 
> #Run code from download page:
> 
> projected.umap <- readRDS('data/genome/HumanPBMC/azimuth_umap.Rds')
> object <- object[, Cells(projected.umap)]
> object[['umap.proj']] <- projected.umap
> 
> 
> #Add in downloaded predictions:
> 
> predictions <- read.delim('data/genome/HumanPBMC/azimuth_pred.tsv', row.names = 1)
> object <- AddMetaData(
+   object = object,
+   metadata = predictions)
> 
> #Do plot:
> 
> 
> x11(width = 10, height = 7)
> DimPlot(object = object, reduction = "umap.proj", group.by = predicted.id, label = TRUE) + NoLegend()
> ggsave("data/genome/HumanPBMC/object_umap.jpeg")
Saving 9.92 x 6.92 in image
> 
> 
> all.equal(query$predicted.celltype.l1, object$predicted.celltype.l1)
[1] TRUE

[object_umap plot]

So what in the output code is incorrect such that I am not getting the same UMAP plot? Thanks!

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.3.5      Azimuth_0.4.3      shinyBS_0.61       SeuratObject_4.0.2 Seurat_4.0.3      

loaded via a namespace (and not attached):
  [1] systemfonts_1.0.2           plyr_1.8.6                  igraph_1.2.6               
  [4] lazyeval_0.2.2              shinydashboard_0.7.1        splines_4.1.0              
  [7] listenv_0.8.0               scattermore_0.7             GenomeInfoDb_1.28.1        
 [10] digest_0.6.27               htmltools_0.5.1.1           fansi_0.5.0                
 [13] magrittr_2.0.1              tensor_1.5                  googlesheets4_0.3.0        
 [16] cluster_2.1.2               ROCR_1.0-11                 globals_0.14.0             
 [19] matrixStats_0.59.0          spatstat.sparse_2.0-0       colorspace_2.0-2           
 [22] ggrepel_0.9.1               textshaping_0.3.5           xfun_0.24                  
 [25] dplyr_1.0.7                 crayon_1.4.1                RCurl_1.98-1.3             
 [28] jsonlite_1.7.2              spatstat.data_2.1-0         survival_3.2-11            
 [31] zoo_1.8-9                   glue_1.4.2                  polyclip_1.10-0            
 [34] gtable_0.3.0                gargle_1.2.0                zlibbioc_1.38.0            
 [37] XVector_0.32.0              leiden_0.3.8                DelayedArray_0.18.0        
 [40] future.apply_1.7.0          BiocGenerics_0.38.0         abind_1.4-5                
 [43] scales_1.1.1                DBI_1.1.1                   miniUI_0.1.1.1             
 [46] Rcpp_1.0.7                  viridisLite_0.4.0           xtable_1.8-4               
 [49] reticulate_1.20             spatstat.core_2.3-0         bit_4.0.4                  
 [52] stats4_4.1.0                DT_0.18                     htmlwidgets_1.5.3          
 [55] httr_1.4.2                  RColorBrewer_1.1-2          ellipsis_0.3.2             
 [58] ica_1.0-2                   pkgconfig_2.0.3             farver_2.1.0               
 [61] uwot_0.1.10                 deldir_0.2-10               utf8_1.2.1                 
 [64] tidyselect_1.1.1            labeling_0.4.2              rlang_0.4.11               
 [67] reshape2_1.4.4              later_1.2.0                 munsell_0.5.0              
 [70] cellranger_1.1.0            tools_4.1.0                 cli_3.0.1                  
 [73] generics_0.1.0              ggridges_0.5.3              stringr_1.4.0              
 [76] fastmap_1.1.0               ragg_1.1.3                  goftest_1.2-2              
 [79] bit64_4.0.5                 fs_1.5.0                    fitdistrplus_1.1-5         
 [82] purrr_0.3.4                 RANN_2.6.1                  pbapply_1.4-3              
 [85] future_1.21.0               nlme_3.1-152                mime_0.11                  
 [88] hdf5r_1.3.3                 compiler_4.1.0              plotly_4.9.4.1             
 [91] png_0.1-7                   spatstat.utils_2.2-0        tibble_3.1.2               
 [94] glmGamPoi_1.4.0             stringi_1.7.3               lattice_0.20-44            
 [97] Matrix_1.3-3                SeuratDisk_0.0.0.9019       shinyjs_2.0.0              
[100] vctrs_0.3.8                 pillar_1.6.1                lifecycle_1.0.0            
[103] BiocManager_1.30.16         spatstat.geom_2.2-2         lmtest_0.9-38              
[106] RcppAnnoy_0.0.18            data.table_1.14.0           cowplot_1.1.1              
[109] bitops_1.0-7                irlba_2.3.3                 httpuv_1.6.1               
[112] patchwork_1.1.1             GenomicRanges_1.44.0        R6_2.5.0                   
[115] promises_1.2.0.1            KernSmooth_2.23-20          gridExtra_2.3              
[118] IRanges_2.26.0              parallelly_1.27.0           codetools_0.2-18           
[121] MASS_7.3-54                 assertthat_0.2.1            SummarizedExperiment_1.22.0
[124] withr_2.4.2                 presto_1.0.0                sctransform_0.3.2          
[127] S4Vectors_0.30.0            GenomeInfoDbData_1.2.6      mgcv_1.8-35                
[130] parallel_4.1.0              grid_4.1.0                  rpart_4.1-15               
[133] tidyr_1.1.3                 MatrixGenerics_1.4.0        googledrive_2.0.0          
[136] Rtsne_0.15                  Biobase_2.52.0              shiny_1.6.0                
[139] tinytex_0.32 

Using Personal Reference

When I use my own reference and query datasets with Azimuth, as soon as I load them and map cells to the reference, the app immediately crashes and I get this error:
Warning: Error in [[: subscript out of bounds
78: eval
77: eval
76: withProgress
75: observeEventHandler
4: runApp
1: Azimuth::AzimuthApp

I'm not sure why this happens, as there doesn't seem to be any place where I would be indexing out of bounds.
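
For anyone hitting the same error, a rough sanity check (a guess at the likely cause, not a confirmed fix) is to confirm that the custom reference actually contains the components the app indexes into before launching it:

# Sketch: inspect a custom reference built for Azimuth (assumes the directory
# contains ref.Rds and idx.annoy; the path is a placeholder).
library(Seurat)
library(Azimuth)
reference <- LoadReference(path = "/path/to/my_reference")
Reductions(object = reference$map)  # expect reductions such as "refDR" and "refUMAP"
Assays(object = reference$map)      # expect a "refAssay" assay
colnames(x = reference$map[[]])     # metadata columns available for annotation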

Human Bone Marrow Reference

Hi all,

First of all, thanks for building this web tool! It's been a useful resource for my lab.

I'm mapping my single-cell datasets to several of your hematopoietic reference datasets. I used the web tool to download the Azimuth analysis scripts and was able to download the reference dataset for "Human - PBMC" through a web link. However, there doesn't seem to be a similar web link for the "Human - Bone Marrow" reference dataset.

I saw the paper references that you included on the Azimuth page for the bone marrow reference dataset. However, I was wondering whether there is a pre-compiled reference for the BM dataset like the one for the PBMC dataset.

Thank you!
Will

do.correct.umi set to FALSE by default

Dear Azimuth team,

Thanks for this awesome tool! Looking at the template R code that the online app generates after each run, it appears that the app runs SCTransform on the input query with do.correct.umi set to FALSE. Can you please confirm that this is by design, and shed light on any theoretical or pragmatic reasons behind this choice?

[Attached file: azimuth_analysis.txt]
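
For context, my reading (not an official answer) is that do.correct.umi only controls whether SCTransform writes a corrected UMI count matrix into the new assay; the Pearson residuals used for anchor finding are produced either way. A minimal sketch, assuming a Seurat object named query with raw counts in its RNA assay:

# Sketch: with do.correct.umi = FALSE no corrected count matrix is stored;
# the residuals still land in the scale.data slot of the new SCT assay.
library(Seurat)
query <- SCTransform(object = query, assay = "RNA", do.correct.umi = FALSE)
dim(GetAssayData(object = query, assay = "SCT", slot = "scale.data"))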

Many thanks,
Sina

Batch effect correction

Thank you for creating Azimuth.
It is amazing!

According to the guide, the input single-cell RNA-seq data must be unnormalized.
I wonder whether a batch effect correction step can be applied before or during the Azimuth workflow?

Luke Heo

Filtering threshold score

Dear Azimuth,

There are two scores, predicted.annotation.score and mapping.score, in the Azimuth result query.
Is there a recommended threshold for selecting cells with confident annotations?
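
For example (the cutoffs below are placeholders, not official recommendations), one can subset the mapped query on both scores:

# Sketch: keep cells whose prediction score and mapping score both pass an
# arbitrary cutoff; 0.75 is a placeholder, not a recommended value.
confident <- subset(
  x = query,
  subset = predicted.annotation.score > 0.75 & mapping.score > 0.75
)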

Many thanks for your wonderful work.
Luke Heo

Human Fetus reference did not work for anchor finding by R script template

Hello there,

I have tried to work on a public PBMC dataset (https://cellexalvr.med.lu.se/downloadable/pbmc3k.zip) by downloading the reference data from Zenodo (ref.Rds and idx.annoy) and applying Azimuth's R script template.

It worked well with the Human PBMC and Human Lung references, returning 2754 and 2460 anchors, respectively. But with the Human Fetus reference it got stuck at the anchor-finding step, reporting the following error:

Error in index$getNNsByVectorList(query[x, ], k, search.k, include.distance) :
fv.size() != vector_size

Here is the code (same for Human PBMC and Human Lung references except the reference path):

library("Seurat")
library("Azimuth")
query <- LoadFileInput(path = "./pbmc3k.h5ad")
reference <- LoadReference(path = "./azimuthRefs/humanFetus/")
query <- ConvertGeneNames(object = query, reference.names = rownames(x = reference$map), homolog.table = "./azimuthRefs/homologs.rds")
query <- SCTransform(object = query, assay = "RNA", new.assay.name = "refAssay", residual.features = rownames(x = reference$map),
reference.SCT.model = reference$map[["refAssay"]]@SCTModel.list$refmodel, method = 'glmGamPoi', ncells = 2000, n_genes = 2000,
do.correct.umi = FALSE, do.scale = FALSE, do.center = TRUE)
anchors <- FindTransferAnchors( reference = reference$map, query = query, k.filter = NA, reference.neighbors = "refdr.annoy.neighbors",
reference.assay = "refAssay", query.assay = "refAssay", reference.reduction = "refDR", normalization.method = "SCT",
features = intersect(rownames(x = reference$map), VariableFeatures(object = query)), dims = 1:50,
n.trees = 20, mapping.score.k = 100)

On the other hand, this public PBMC dataset could be safely analyzed by the Azimuth online Web App using the Human Fetus reference. So I am really confused about what is wrong.
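
One possible diagnostic (only a guess at the cause): fv.size() != vector_size usually means the vectors being queried against the Annoy index have a different dimensionality than the index was built with, so it may be worth comparing the two sides before rerunning:

# Sketch: compare the dimensionality used on each side of the neighbor search.
length(x = intersect(rownames(x = reference$map), VariableFeatures(object = query)))  # shared features
ncol(x = Embeddings(object = reference$map, reduction = "refDR"))                      # reference PCs (should cover dims = 1:50)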

Thank you in advance!

Neutrophils?

Hi,
I found Azimuth very useful. However, there are no Neutrophils or Megakaryocytes (though there are Platelets).
Why are neutrophils excluded? Are there any plans to add them back?

Thanks a lot,

Bug in cells.use in template.R

Hi folks, thanks for the great app. I was looking at the download-template feature in the app, which is a great idea. Looking at the code, I noticed that the template has a small bug in how cells.use is set for the mitochondrial-gene filter: the mito-filter section reassigns cells.use rather than combining it (with &) with the preceding feature and count filters. The app itself runs the correct code, but the template contains the error:

# If the query contains mitochondrial genes, filter cells based on the
# thresholds for ${mito.key} you set in the app
if ("${mito.key}" %in% c(colnames(x = query[[]]))) {
  cells.use <- query[["${mito.key}", drop = TRUE]] <= ${mito.max} &
    query[["${mito.key}", drop = TRUE]] >= ${mito.min}
}
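
For reference, a corrected version of that section (combining the mito filter with the existing cells.use via &, as the app itself does) would look something like:

# Corrected sketch: AND the mito filter onto the existing cells.use instead of
# overwriting it.
if ("${mito.key}" %in% c(colnames(x = query[[]]))) {
  cells.use <- cells.use &
    query[["${mito.key}", drop = TRUE]] <= ${mito.max} &
    query[["${mito.key}", drop = TRUE]] >= ${mito.min}
}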

Thought I'd mention it.

Thanks,

Tim

Conflicting Level1 and Level2 labels

Dear Azimuth team,

Thanks for this great resource.

I have been exploring the PBMC reference to annotate some internal query samples. Closer inspection of the level 1 and level 2 labels in the query data revealed some conflicts: for example, a cell called "CD4 T" at level 1 but "CD14 Mono" or "CD8 TCM" at level 2. While these cases are typically rare (less than 1%), I was wondering whether you have encountered conflicting level 1 and level 2 labels, and what your thoughts are on the possible explanation(s).
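
One quick way to quantify these conflicts (assuming the standard predicted.celltype.l1 and predicted.celltype.l2 metadata columns) is a simple cross-tabulation:

# Sketch: cross-tabulate level 1 vs level 2 calls; off-lineage pairs such as
# "CD4 T" / "CD14 Mono" show up as small non-zero counts.
table(query$predicted.celltype.l1, query$predicted.celltype.l2)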

Many thanks,
Sina

Warning: Error in -: invalid argument to unary operator

When we run our reference through a config file, all of the data processing finishes, but before we can use the interactive features of the app we get this error: Warning: Error in -: invalid argument to unary operator. Why would this occur only after all of the processing has finished?

Thanks
