yezhengstat / adtnorm

ADTnorm normalizes the cell surface protein measurements of CITE-seq data, facilitating data integration across batches and across studies.

Home Page: https://yezhengstat.github.io/ADTnorm/articles/ADTnorm-tutorial.html

License: GNU General Public License v3.0

R 99.86% Dockerfile 0.14%
cite-seq data-integration single-cell surface-protein-normalization

adtnorm's People

Contributors

daniel-caron, helenlindsay, juyeongkim, yezhengstat

adtnorm's Issues

What is ADTseqDepth?

I am a bit unclear what is 'ADTseqDepth' column in the 'cell_x_feature' demo data.
From the documentation, ADTseqDepth is referred to as 'total UMI per cell', but does this mean it is simply a sum of all antibody reads for each cell? For instance, if I ran a 5-antibody panel sequencing experiment, and obtained a row (representing one cell) with raw reads as following:
CD3 CD4 CD8 CD14 CD19
18 138 13 491 3

Then, is the 'ADTseqDepth' for this cell simply 18+138+13+491+3= 663?

Currently, I don't think I have separate 'ADTseqDepth' data. The only output I receive from the experiment is the raw read matrix for each antibody, just like the 'cell_x_adt' demo data. (I also get a column of the exact unique barcodes used for the cells (e.g. AAGTTGTCTAC for row 1, ATTCTTTCGTTT for row 2, etc.), but I don't think this information is relevant.) So, I was wondering how I should fill in 'ADTseqDepth' in the cell_x_feature parameter.

Furthermore, what can be done if I don't have 'sample_status' or 'cell_type_l1' data? I could use surrogates for these, but the distinction between healthy and tumor cells would not be clear. Are all 7 columns in the demo 'cell_x_feature' necessary (and equally important) when ADTnorm is run? Or is it okay if I provide only the 'sample' and 'batch' columns in 'cell_x_feature'?

Thank you!
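
The arithmetic in the question can be reproduced in R. This is a minimal sketch, assuming that the documentation's "total UMI per cell" means the per-cell sum of raw ADT counts, which this thread does not confirm. The one-row matrix, the placeholder labels "sample_1" and "batch_1", and the name `adt_seq_depth` are illustrative only.

```r
# One cell (row) by five ADT markers (columns), using the counts from the question.
cell_x_adt <- matrix(
  c(18, 138, 13, 491, 3),
  nrow = 1,
  dimnames = list("cell_1", c("CD3", "CD4", "CD8", "CD14", "CD19"))
)

# Assumption: ADTseqDepth is the per-cell total of raw ADT counts.
adt_seq_depth <- rowSums(cell_x_adt)

# Hypothetical minimal feature table; the sample and batch labels are placeholders.
cell_x_feature <- data.frame(
  sample = factor("sample_1"),
  batch = factor("batch_1"),
  ADTseqDepth = adt_seq_depth,
  row.names = rownames(cell_x_adt)
)
print(cell_x_feature)
```

For the example row this gives 663, matching the sum written out in the question.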

Error in (function (unregfd, ximarks, x0marks, x0lim = NULL, WfdPar = NULL,

Hi Ye,

I was trying to run your ADTnorm with a dataset of 15820 cells and 207 features (198 proteins + 9 isotype controls).

The function for normalisation:

  adt206_adt_adtnorm <- ADTnorm(
    cell_x_adt = adt206_raw_adt_ctl,  #Matrix of ADT raw counts in cells (rows) by ADT markers (columns) format.
    cell_x_feature = cell_x_feature, #Matrix of cells (rows) by cell features (columns) such as sample, batch, or other cell-type related information.
    save_outpath = outpath, 
    study_name = "adt198", 
    marker_to_process = NULL, 
    save_fig = TRUE
  )

However, I ran into the below error:

Progress:  Each dot is a curve
..
Progress:  Each dot is a curve
..
Progress:  Each dot is a curve
..
Error in (function (unregfd, ximarks, x0marks, x0lim = NULL, WfdPar = NULL,  : 
  Argument ximarks has values outside of range of unregfd.
In addition: Warning messages:
1: Removed 1 rows containing missing values (`geom_segment()`). 
2: Removed 1 rows containing missing values (`geom_segment()`). 

Below I'm showing the input data (hope it will help).
image

image

Thank you and I'm looking forward to hearing from you.

Warm regards,
Hsiao-Chi

missing value where TRUE/FALSE needed

Dear Ye Zheng,

First of all, thank you for creating this amazing package.
I have ADT data from single-cell DNA-antibody sequencing experiments (rather than CITE-seq), for which I want to (1) correct the batch effects across timepoints (within the same patient) and (2) correct for the mouse background signals (the IgGs) for better accuracy.

I have two questions:
(1) I keep getting the error message

Error in if (length(y_valley[x_valley > real_peak[1]]) == 0 || (y_valley[x_valley > :
missing value where TRUE/FALSE needed

after I run the following:
cell_x_adt_norm <- ADTnorm(
cell_x_adt = my_adt,
cell_x_feature = my_feat,
save_outpath = save_outpath,
study_name = run_name2,
marker_to_process = c("CD3", "CD4", "CD8"),
save_intermediate_fig = TRUE
)

Where my_adt is a matrix of 48 antibodies (including three IgG isotypes), and my_feat is a matrix where 'sample', 'batch', 'study_names' are the timepoints. Since I didn't have all the information in the demo data 'cell_x_feature' given, I had to arbitrarily make up some of the variables in my_feat.

(2) Since we have the exact IgG reads as part of our ADT data, I was wondering whether it is compatible to run dsb (denoised and scaled by background) first and then run ADTnorm on the same dataset (so the data fed into ADTnorm would already be normalized). I'm not too familiar with the math behind it, so if this practice is not recommended, please let me know!
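
Rather than making up placeholder columns, a feature table can be built from the timepoint labels alone. This is a minimal sketch, not a fix for the error above: it assumes that 'sample' and 'batch' are the columns ADTnorm needs, which this thread does not confirm, and the toy matrix and the labels "T1" and "T2" are hypothetical stand-ins for the real data.

```r
# Toy stand-in for the real data: 6 cells by 3 markers, measured at 2 timepoints.
set.seed(1)
my_adt <- matrix(
  rpois(18, lambda = 50),
  nrow = 6,
  dimnames = list(paste0("cell_", 1:6), c("CD3", "CD4", "CD8"))
)
timepoint <- c("T1", "T1", "T1", "T2", "T2", "T2")  # hypothetical labels

# One row per cell, in the same order as the rows of my_adt.
my_feat <- data.frame(
  sample = factor(timepoint),
  batch = factor(timepoint),
  row.names = rownames(my_adt)
)

# Basic checks worth running before calling ADTnorm().
stopifnot(nrow(my_feat) == nrow(my_adt))
stopifnot(!anyNA(my_adt), all(my_adt >= 0))
```

Keeping the row order of `my_feat` aligned with `my_adt` matters because the two objects are matched by position.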

Error in lnsrch_morph

Hello,

I am testing ADTnorm with only one CITE-seq PBMC sample that has 138 ADTs and ~17K cells. I am using default params. See code below. However, I am getting this error: "Error in lnsrch_morph(bvecold, fold, grad, pvec, fngrad_morph, morphList, :
Initial slope not negative." Any ideas? Thank you very much in advance!

cell_x_adt_norm = ADTnorm(
  cell_x_adt = cell_x_adt,
  cell_x_feature = cells_meta,
  save_outpath = output_path,
  study_name = run_name,
  marker_to_process = NULL, ## setting it to NULL by default will process all available markers in cell_x_adt.
  bimodal_marker = NULL, ## setting it to NULL will trigger ADTnorm to try different settings to find bimodal peaks for all the markers.
  brewer_palettes = "Dark2", ## color brewer palettes setting for the density plot
  save_fig = TRUE
)

attempting batch correction with ADTnorm

Hello,

I installed ADTnorm from github using remotes::install_github("yezhengSTAT/ADTnorm", build_vignettes = FALSE)

I have an ADT-seq dataset that was generated in 4 different batches/runs with multiple samples in each batch/run. I am seeing an effect of the run even after normalizing with the DSB method. For example:
image

I have tried using ADTnorm with various parameters, but still the data is very separated by run. Below is my code along with the resulting UMAP plots:

Option 1:
`#### option 1
A316.VDJ <- readRDS(file = here::here("A316_final_vdj_all.rds"))
save_outpath <- "/Users/spanglerab"
run_name <- "ADTnorm_demoRun"

cell_x_adt <- t(as.data.frame(GetAssayData(A316.VDJ, assay = "Prot", slot = "counts")))
cell_x_feature <- A316.VDJ@meta.data

cell_x_feature$sample = factor(cell_x_feature$run)
cell_x_feature$batch = factor(cell_x_feature$run)

cell_x_adt_norm <- ADTnorm(
cell_x_adt = cell_x_adt,
cell_x_feature = cell_x_feature,
save_outpath = save_outpath,
study_name = run_name,
save_intermediate_fig = TRUE
)

A316.VDJ <- SetAssayData(A316.VDJ, assay="Prot",slot = "data", new.data=t(cell_x_adt_norm))

DefaultAssay(A316.VDJ) <- "Prot"
A316.VDJ <- ScaleData(A316.VDJ, features = rownames(A316.VDJ))
A316.VDJ <- RunPCA(A316.VDJ, assay = "Prot", slot = "data", features = rownames(A316.VDJ), reduction.name = "apca")
A316.VDJ <- RunUMAP(A316.VDJ, reduction = "apca", dims = 1:18, assay = "Prot", reduction.name = "prot.umap", reduction.key = "protUMAP_", n.neighbors = 40, min.dist = 0.3, local.connectivity = 3, spread = 3)
pdf(file = here::here("DimPlot_prot_UMAP_all_adt_norm.pdf"))
DimPlot(A316.VDJ, reduction = "prot.umap", label = TRUE, group.by = "run")
dev.off()`
image

Thanks for your help,

Abby

Missionbio tapestri : error

Hello,

Thank you for your package!

I'm trying to develop an analysis pipeline for Mission Bio multi-omics data. They have something similar to CITE-seq, but with DNA-seq and protein, so I thought about using your package to normalize my protein read counts.

I used your Docker image and tried to make it work with my data, but I'm getting this error:

Error in lnsrch_morph(bvecold, fold, grad, pvec, fngrad_morph, morphList,  : 
  Initial slope not negative.

Perhaps you have an idea about what I did wrong?

Thank you for your time!

Error in chol.default(temp): the leading minor of order 11 is not positive definite

Hello!!

Thank you for this tool! I am currently trying to run ADTnorm on a dataset with 45 antibodies and 97K cells. I am using the following code:

cell_x_feature$sample = factor(unlist(cell_x_feature$sample))
cell_x_feature$batch = factor(unlist(cell_x_feature$sample))

cell_x_adt_norm = ADTnorm(
  cell_x_adt = cell_x_adt, 
  cell_x_feature = cell_x_feature, 
  study_name = study_name,
  marker_to_process = NULL,
  bimodal_marker = NULL)

It seems to run smoothly until reaching the normalization of a specific antibody, when I get this error:

........Error in chol.default(temp) : 
  the leading minor of order 11 is not positive definite
In addition: There were 40 warnings (use warnings() to see them)

Do you know what could be causing it?
Thank you!
Marta

Error in landmarkRegion[[i]]

Thank you for this tool! I'm excited to see how it works.

I'm currently trying to run it on a dataset with 207K cells and 46 CITE-Seq antibodies. I have 18 10X lanes merged into one object. From there, I created a dataframe with the raw ADT counts and exported the batch information as a matrix. I named those objects cell_x_adt (for the counts) and cell_x_feature (for the batch information). I then ran the following code:

cell_x_adt_norm <- ADTnorm(cell_x_adt = cell_x_adt, cell_x_feature = cell_x_feature,
save_outpath = save_outpath, study_name = study_name)

And got the following error:

[1] "ADTnorm will process all the ADT markers from the ADT matrix:hash1, hash2, hash3, hash4, CD69.1, CD107a, CD154, HLA-DR, CD20, CD16, CD14.1, CD123, CD8, CD11b, CD11c, CD3, NKG2A, CD8B, CD4.1, CD27, CD1c, CD2.1, CD66b, CD56, CD206, CD163.1, TCR-Vdelta2, CD127, CD25, CD28.1, CD196-aka-CCR6, CD95, TCR-Va24-Ja18, CD183-aka-CXCR3, CD161, TCR-Va-7point2, TCR-gd-Vd1, CD279-aka-PD1, CD278-aka-ICOS, CX3CR1.1, CD194-aka-CCR4, TCR-Vg9, mIgG1-K-Isotype-Control, mIgG2b-K-Isotype-Control, mIgG2a-K-Isotype-Control, Armenian-Hasmter-IgG-Isotype-Control"
[1] "hash1"
Error in landmarkRegion[[i]] <- matrix(NA, ncol = 2, nrow = length(sample_name_list)) :
attempt to select less than one element in integerOneIndex

Any idea what could be causing this error or what could alleviate it? Thank you!

Edit: I also named save_outpath and study_name earlier in the script.
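
The thread does not establish the cause of this error, but the description above (batch information exported as a matrix) suggests checking the shape of the inputs first. Below is a minimal sketch of such a check. `check_adtnorm_inputs` is a hypothetical helper, not part of ADTnorm, and the expectation of a 'sample' column is an assumption based on the other posts here that set `cell_x_feature$sample`.

```r
# Hypothetical helper: returns a character vector describing any problems found.
check_adtnorm_inputs <- function(cell_x_adt, cell_x_feature) {
  problems <- character(0)
  if (!is.data.frame(cell_x_feature)) {
    problems <- c(problems, "cell_x_feature is not a data.frame")
  }
  # Assumption: ADTnorm looks for a column named 'sample'.
  if (!"sample" %in% colnames(cell_x_feature)) {
    problems <- c(problems, "cell_x_feature has no 'sample' column")
  }
  if (!identical(nrow(cell_x_adt), nrow(cell_x_feature))) {
    problems <- c(problems, "cell_x_adt and cell_x_feature differ in row count")
  }
  problems
}

# Toy example: batch information stored as a one-column character matrix.
toy_adt <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("CD3", "CD4")))
toy_batch <- matrix(c("lane1", "lane1", "lane2"), ncol = 1,
                    dimnames = list(NULL, "batch"))
found <- check_adtnorm_inputs(toy_adt, toy_batch)
print(found)
```

In the toy case the helper reports that the feature table is not a data.frame and that it lacks a 'sample' column.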

Unable to download ADTnorm

Hello,

I have been trying to download ADTnorm and use it and I keep running into this error:
ERROR: dependency ‘flowStats’ is not available for package ‘ADTnorm’
I followed the instructions and ran remotes::install_github("yezhengSTAT/ADTnorm", build_vignettes = FALSE, lib=libraryPath). I also tried installing flowWorkspace and flowCore individually, but that didn't fix the issue. This is my session info:

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] remotes_2.5.0

loaded via a namespace (and not attached):
[1] ps_1.7.6 prettyunits_1.2.0 crayon_1.5.2 withr_3.0.0 rprojroot_2.0.4 R6_2.5.1 rlang_1.1.3
[8] cli_3.6.2 curl_5.2.0 rstudioapi_0.15.0 callr_3.7.3 tools_4.1.2 compiler_4.1.2 processx_3.8.3
[15] pkgbuild_1.3.1 sessioninfo_1.2.2 tcltk_4.1.2
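
flowStats and flowCore are distributed through Bioconductor rather than CRAN, so one thing to try is installing them with BiocManager before installing ADTnorm. This is a minimal sketch, not a confirmed fix for this session. `missing_packages` and `install_adtnorm` are hypothetical helper names, and the install step is wrapped in a function because it needs network access.

```r
# Return the subset of `pkgs` that cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Bioconductor packages named in this thread.
bioc_deps <- c("flowStats", "flowCore")
to_install <- missing_packages(bioc_deps)

# Not called here: installs the missing Bioconductor packages, then ADTnorm.
install_adtnorm <- function(missing_bioc) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  if (length(missing_bioc) > 0) {
    BiocManager::install(missing_bioc)
  }
  remotes::install_github("yezhengSTAT/ADTnorm", build_vignettes = FALSE)
}
```

Calling `install_adtnorm(to_install)` in an interactive session would then run the installation.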
