immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires

License: Apache License 2.0

R 97.96% CSS 1.35% C++ 0.53% Dockerfile 0.16%

immunology tcr tcr-repertoire immunoinformatics immune-repertoire rep-seq bcr-repertoire bcr ig ig-repertoire bioinformatics immune-repertoire-data immune-repertoire-analysis t-cell-receptor b-cell-receptor immunoglobulin repertoire-analysis airr-analysis single-cell single-cell-analysis

immunarch's Introduction

`immunarch` --- Fast and Seamless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires in R

Why `immunarch`?

Work with any type of data: single-cell, bulk, data tables, databases --- you name it.
Community at the heart: ask questions, share knowledge and thrive in the community of almost 30,000 researchers and medical scientists worldwide. Pfizer, Novartis, Regeneron, Stanford, UCSF and MIT trust us.
One plot --- one line: write a whole PhD thesis in 8 lines of code or reproduce almost any publication in 5-10 lines of immunarch code.
Be on the bleeding edge of science: we regularly update immunarch with the latest methods. Let us know what you need!
Automatic format detection and parsing for all popular immunosequencing formats: from MiXCR and ImmunoSEQ to 10XGenomics and ArcherDX.

Lightning-fast Start

install.packages("immunarch")           # Install the package
library(immunarch); data(immdata)       # Load the package and the test dataset
repOverlap(immdata$data) %>% vis()      # Compute and visualise the most important statistics:
geneUsage(immdata$data[[1]]) %>% vis()  #     public clonotypes, gene usage, sample diversity
repDiversity(immdata$data) %>% vis(.by = "Status", .meta = immdata$meta)      # Group samples

From Berkeley with devotion

immunarch is brought to you by ImmunoMind --- a UC Berkeley SkyDeck startup. ImmunoMind improves the design of adoptive T-cell therapies such as CAR-T by precisely identifying T-cell subpopulations and their immune profile. ImmunoMind's tools are trusted by researchers from top pharma companies and universities, including 10X Genomics, Pfizer, Regeneron, UCSF, MIT, Stanford, Johns Hopkins School of Medicine and Vanderbilt University.

Introduction
Contact
Installation
Features
Quick Start
Bugs and Issues
Contribution
Citation

Introduction

immunarch is an R package designed to analyse T-cell receptor (TCR) and B-cell receptor (BCR) repertoires, mainly tailored to medical scientists and bioinformaticians. The mission of immunarch is to make immune sequencing data analysis as effortless as possible and help you focus on research instead of coding.

Contact

Create a ticket with a bug or question on GitHub Issues to get help from the community and enrich it with your experience. If you need to send us sensitive data, feel free to contact us via [email protected].

Installation

Latest release on CRAN

In order to install immunarch execute the following command:

install.packages("immunarch")

That's it, you can start using immunarch now! See the Quick Start section below to dive into immune repertoire data analysis. If you run into any trouble during installation, take a look at the Installation Troubleshooting section.

Note: there are quite a lot of dependencies to install with the package because it installs all the widely used packages for data analysis and visualisation. You get both the AIRR data analysis framework and the full Data Science package ecosystem with only one command, making immunarch the entry point for single-cell & immune repertoire Data Science.

Latest release on GitHub

If the above command doesn't work for any reason, try installing immunarch directly from its repository:

install.packages(c("devtools", "pkgload")) # skip this if you already installed these packages
devtools::install_github("immunomind/immunarch")
devtools::reload(pkgload::inst("immunarch"))

Latest pre-release on GitHub

Since releasing on CRAN is limited to one release per one or two months, you can install the latest pre-release version with all the bleeding edge and optimised features directly from the code repository. In order to install the latest pre-release version, you need to execute the following commands:

install.packages(c("devtools", "pkgload")) # skip this if you already installed these packages
devtools::install_github("immunomind/immunarch", ref="dev")
devtools::reload(pkgload::inst("immunarch"))

You can find the list of releases of immunarch here: https://github.com/immunomind/immunarch/releases

Key Features

Data agnostic. Fast and easy manipulation of immune repertoire data:
- The package automatically detects the format of your files---no more guessing what format a file is in, just pass the files to the package;
- Supports all popular TCR and BCR analysis and post-analysis formats, including single-cell data: ImmunoSEQ, IMGT, MiTCR, MiXCR, MiGEC, MigMap, VDJtools, tcR, AIRR, 10XGenomics, ArcherDX. More coming in the future;
- Works on any data source you are comfortable with: R data frames, data tables from data.table, databases like MonetDB, Apache Spark data frames via sparklyr;
- Tutorial is available here.
Beginner-friendly. Immune repertoire analysis made simple:
- Most methods are incorporated in a couple of main functions with clear naming---no more remembering dozens and dozens of functions with obscure names. For details see link;
- Repertoire overlap analysis (common indices including overlap coefficient, Jaccard index and Morisita's overlap index). Tutorial is available here;
- Gene usage estimation (correlation, Jensen-Shannon Divergence, clustering). Tutorial is available here;
- Diversity evaluation (ecological diversity index, Gini index, inverse Simpson index, rarefaction analysis). Tutorial is available here;
- Tracking of clonotypes across time points, widely used in vaccination and cancer immunology domains. Tutorial is available here;
- K-mer distribution measures and statistics. Tutorial is available here;
- Coming in the next releases: CDR3 amino acid physical and chemical properties assessment, mutation networks.
Seamless publication-ready plots with a built-in tool for visualisation manipulation:
- Rich visualisation procedures with ggplot2;
- Built-in tool FixVis makes your plots publication-ready: easily change font sizes, text angles, titles, legends and many more with clear-cut GUI;
- Tutorial is available here.

Quick Start

The gist of a typical TCR or BCR data analysis workflow can be reduced to the following few lines of code.

Use `immunarch` data

1) Load the package and the data

library(immunarch)  # Load the package into R
data(immdata)  # Load the test dataset

2) Calculate and visualise basic statistics

repExplore(immdata$data, "lens") %>% vis()  # Visualise the length distribution of CDR3
repClonality(immdata$data, "homeo") %>% vis()  # Visualise the relative abundance of clonotypes

3) Explore and compare T-cell and B-cell repertoires

repOverlap(immdata$data) %>% vis()  # Build the heatmap of public clonotypes shared between repertoires
geneUsage(immdata$data[[1]]) %>% vis()  # Visualise the V-gene distribution for the first repertoire
repDiversity(immdata$data) %>% vis(.by = "Status", .meta = immdata$meta)  # Visualise the Chao1 diversity of repertoires, grouped by the patient status

Use your own data

library(immunarch)  # Load the package into R
immdata <- repLoad("path/to/your/data")  # Replace it with the path to your data. Immunarch automatically detects the file format.
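The issue reports further down this page mention a tab-separated metadata.txt file, with a Sample column first, placed in the same folder as the repertoire files. Below is a minimal sketch of preparing such a file with base R. The sample names, the Status column and the temporary folder are hypothetical placeholders, and the assumption that Sample values must match the repertoire file names without extensions should be checked against the official tutorial.

```r
# Hypothetical sample names; assumed to match the repertoire file names
# without their extensions.
meta <- data.frame(
  Sample = c("sample_A", "sample_B", "sample_C"),
  Status = c("C", "MS", "C"),  # illustrative grouping column
  stringsAsFactors = FALSE
)

# Stand-in for "path/to/your/data"; a temporary folder is used for the demo.
data_dir <- file.path(tempdir(), "immunarch_demo")
dir.create(data_dir, showWarnings = FALSE)
meta_path <- file.path(data_dir, "metadata.txt")

# Tab-separated, with a header row, no row names and no quoting.
write.table(meta, meta_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the layout.
meta_check <- read.delim(meta_path, stringsAsFactors = FALSE)
print(meta_check)
```

Once the repertoire files sit next to this metadata.txt, the folder can be passed to repLoad as in the snippet above.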

Advanced methods

For advanced methods such as clonotype annotation, clonotype tracking, k-mer analysis and public repertoire analysis see "Tutorials".

Bugs and Issues

The mission of immunarch is to make bulk and single-cell immune repertoire analysis painless. All bug reports, documentation improvements, enhancements and ideas are appreciated. Just let us know via GitHub (preferably) or [email protected] (in case of private data).

Bug reports must:

Include a short, self-contained R snippet reproducing the problem.
Add a minimal data sample for us to reproduce the problem. In case of sensitive data you can send it to [email protected] instead of GitHub issues.
Explain why the current behavior is wrong/not desired and what you expect instead.
If the issue is about visualisations, please attach a picture to the issue. Otherwise we won't be able to reproduce the bug and fix it.

Help the community

Aspiring to help the community build the ecosystem of scRNAseq & AIRR analysis tools? Found a bug? A typo? Would like to improve documentation, add a method or optimise an algorithm?

We are always open to contributions. There are two ways to contribute:

Create an issue here and describe what you would like to improve or discuss.
Create an issue or find one here, fork the repository and make a pull request with the bugfix or improvement.

Citation

ImmunoMind Team. (2019). immunarch: An R Package for Painless Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires. Zenodo. http://doi.org/10.5281/zenodo.3367200

BibTeX:

@misc{immunomind_team_2019_3367200,
  author       = {{ImmunoMind Team}},
  title        = {{immunarch: An R Package for Painless Bioinformatics Analysis 
                    of T-Cell and B-Cell Immune Repertoires}},
  month        = aug,
  year         = 2019,
  doi          = {10.5281/zenodo.3367200},
  url          = {https://doi.org/10.5281/zenodo.3367200}
}

For EndNote citation import the immunarch-citation.xml file.

Preprint on bioRxiv is coming soon.

License

The package is freely distributed under the Apache-2.0 license. You can read more about it here.

For commercial or server use, please contact ImmunoMind via [email protected] about solutions for biomarker data science of single-cell immune repertoires.

Commercial Support

immunarch's Issues

Wrong formula for Jaccard index?

🐛 Bug

When using the repOverlap function using the Jaccard index, I think it calculates wrong numbers. When looking at the source code in overlap.R, I saw that the Jaccard index is calculated as follows:

jaccard_index.default <- function(.x, .y) {
  .x = collect(.x, n = Inf)
  .y = collect(.y, n = Inf)
  intersection = nrow(dplyr::intersect(.x, .y))
  intersection/(nrow(.x) + nrow(.y) + intersection)
}

However, doesn't the intersection need to be subtracted:
intersection/(nrow(.x) + nrow(.y) - intersection)
?

I'm using Immunarch v. 0.5.5.

Thank you.
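For reference, the standard Jaccard index is |A ∩ B| / (|A| + |B| - |A ∩ B|), which matches the subtraction suggested above. The following is a small base R sketch on plain character vectors, not immunarch's own implementation; the CDR3 strings are made up for illustration.

```r
# Jaccard index on two sets of clonotype identifiers (illustrative helper,
# not the function from overlap.R).
jaccard <- function(x, y) {
  x <- unique(x)
  y <- unique(y)
  inter <- length(intersect(x, y))
  inter / (length(x) + length(y) - inter)  # denominator is the size of the union
}

# Hypothetical CDR3 amino acid sequences.
a <- c("CASSLGQ", "CASSPTG", "CASSQDR")
b <- c("CASSPTG", "CASSQDR", "CASSFGT")

jaccard(a, b)  # 2 shared / 4 in the union = 0.5
jaccard(a, a)  # identical sets give 1

# With "+ intersection" in the denominator, identical sets would give
# 3 / (3 + 3 + 3) = 1/3 instead of 1.
```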

Suggestions: provide a detailed description about clonotype definition model

📚 Documentation

Hi, I think it is necessary to have a doc for models of clonotype definitions, like the one in scirpy.

Trouble loading an entire directory

Hi,
When I use repLoad(/path/to/mixcrclonesoutputfile.txt), I can do so without any problems.

However, when I use repLoad(/path/to/foldercontainingseveralmixcrclones.txtfiles/), I get this error-
Error in strsplit(df[[.dalignments]], "|", T, F, T) :
non-character argument

The folder contains a metadata.txt file with 3 tab-separated columns:
Sample Column1 Column2

Wondering if anyone knows what the issue seems to be?
Thanks.
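One thing that can be checked independently of immunarch: strsplit raises "non-character argument" whenever its input is not a character vector. A column that is entirely empty is typically read as logical NA, which is enough to trigger it. Whether that is what happens with these particular files is an assumption; the sketch below only reproduces the base R behaviour.

```r
# An all-empty column usually ends up as a logical vector of NAs after reading.
alignments <- c(NA, NA, NA)
class(alignments)  # "logical"

# Same strsplit arguments as in the error above (fixed = TRUE, useBytes = TRUE).
attempt <- tryCatch(
  strsplit(alignments, "|", fixed = TRUE, perl = FALSE, useBytes = TRUE),
  error = function(e) e
)
failed <- inherits(attempt, "error")
failed  # TRUE; in an English locale the message is "non-character argument"

# Converting to character first avoids the error; NA entries stay NA.
res <- strsplit(as.character(alignments), "|", fixed = TRUE)
length(res)
```

If this is the cause, checking the classes of the columns in the file that fails (for example with sapply(df, class) after reading it manually) should show a logical column where text is expected.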

Problems with metadata upload

Hi
i have been trying for a long time now to import metadata with my data.
the data is read fine, but the metadata not. I always get the following error:

metadata.txt
”[!] Samples found in the dataset, but not in the metadata: 18_5 22_5 24_5 Did you add all the necessary samples to the metadata file with correct names?”

Please find attached the metadata that was parsed using the following command, together with the files 18_5, 22_5, 24_5:

immdata = repLoad("/Users/anner/AnneDoc/Results_2019/TCRa_Sequencing/Piotr_sequencing /bfx1062/files_grouped_forTcR/Group2_Teff")

Could you please have a look and let me know what is wrong?

Thanks a lot !!!
annecar
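A quick way to see which names do not line up is to compare the sample names of the loaded data with the Sample column of the metadata. The sketch below uses the three sample names from the error message; the metadata contents are hypothetical (the attachment is not shown here) and only illustrate typical mismatches such as a file extension, a trailing space or a different separator.

```r
# Sample names as reported in the error message.
data_names <- c("18_5", "22_5", "24_5")

# Hypothetical metadata contents with three typical problems.
meta <- data.frame(
  Sample = c("18_5.txt", "22_5 ", "24-5"),
  stringsAsFactors = FALSE
)

# Names present in the data but absent from the metadata.
missing_in_meta <- setdiff(data_names, meta$Sample)
missing_in_meta

# Strip a ".txt" extension and surrounding whitespace, then compare again.
cleaned <- trimws(sub("\\.txt$", "", meta$Sample))
still_missing <- setdiff(data_names, cleaned)
still_missing  # only the name with the different separator remains
```

With real data, data_names would be names(immdata$data) and meta would be the table read from metadata.txt.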

Third test issue.

Body of the third test issue.

Number of clonotypes (or clones) plot y axis labels

I have 3434 cells (clones) in my sample, but the clonotype plot obtained by

exp_vol = repExplore(immdata$data, .method = "volume")
vis(exp_vol)

shows well over 5000 clonotypes.
The clone plot shows over 25000 clones.

See also https://immunarch.com/articles/3_basic_analysis.html where the clonotype numbers are similar.

I believe there is a bug with y axis labels: the ticks are labelled with 2000, 4000 etc instead of the correct numbers.

See also plots of diversity estimate (e.g. Chao1) where the ticks are also labelled with 2000, 4000 etc.

Test Issues

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

Expected behavior

Additional context

RepLoad issue with MiXCR files

Hello,
I just started using Immunarch today but when trying to parse a folder with different .txt clonoset files that I obtained using mixcr it gives me the following error

TCRAs <- repLoad("Users/abolivar1/Documents/TCRseq Project/Bioinformatics/TCRAs/", .format = "mixcr")
== Step 1/3: loading repertoire files... ==
Error in if (file.info(path)$isdir) { :
missing value where TRUE/FALSE needed

I am not sure what would be wrong with the files, as I am using them straight from Mixcr.

Any help would be appreciated

Thanks,
Ana
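A note on the error itself: in base R, file.info returns NA for a path that does not exist, and if (NA) fails with exactly "missing value where TRUE/FALSE needed". The path in the call above starts with "Users/" rather than "/Users/", so it may be resolved relative to the working directory; that is a guess based on the snippet, not a confirmed diagnosis. The sketch below reproduces the behaviour and shows a hypothetical pre-check helper.

```r
# A path that certainly does not exist.
path <- file.path(tempdir(), "no_such_folder_for_this_demo")

# For a missing path, isdir is NA, so `if (file.info(path)$isdir)` cannot be evaluated.
isdir <- file.info(path)$isdir
isdir

# Hypothetical helper: fail early with a readable message before loading.
check_input_path <- function(path) {
  if (!file.exists(path)) {
    stop("Path does not exist: ", normalizePath(path, mustWork = FALSE))
  }
  invisible(path)
}

result <- tryCatch(check_input_path(path), error = function(e) conditionMessage(e))
result
```

Running file.exists on the exact string passed to repLoad, or checking getwd(), is a quick way to rule this out.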

Can't upload immunarch formatted table.

🐛 Bug

immunarch formatted files are not being imported

To Reproduce

Steps to reproduce the behavior:

1.immdata = repLoad(.path = "AF.tsv", .format = 'immunarch')
2.
3.

Error in `[[<-.data.frame`(`*tmp*`, IMMCOL$cdr3nt, value = logical(0)) : 
  replacement has 0 rows, data has 3

AF.tsv.zip

Expected behavior

I should be able to import it into an immdata object

Additional context

Ordering clonotypes per sample, and colour scheme using trackClonotypes

Hi!

I just had a couple of questions regarding the latest version, and the function 'trackClonotypes'.

If I run trackClonotypes on more than 1 sample, I get the stacked bar plots, with the same clonotypes across different samples, coloured the same.

When I tried to order the input data for trackClonotypes, it didn't change the order of the clonotypes (I want them in ascending or descending order of number), and instead I got this:
Warning:
In melt.data.table(.data) :
To be consistent with reshape2's melt, id.vars and measure.vars are internally guessed when both are 'NULL'. All non-numeric/integer/logical type columns are considered id.vars, which in this case are columns [CDR3.aa]. Consider providing at least one of 'id' or 'measure' vars in future.

Is there any way I can change this?

Also, is there any way to change the colour scheme of the output for trackClonotypes?

Thanks a lot!

Problem generating mds plot after repOverlap

🐛 Bug

Hi There, I am running into an error while trying to graph with vis after repOverlap, exactly as detailed in the manual

To Reproduce

Steps to reproduce the behavior:

imm_ov1 = repOverlap(immdata.opc$data, .method = "public", .verbose = T)
vis(repOverlapAnalysis(imm_ov1, "mds+kmeans"))

it gives an error
.by="cluster", .meta=immdata.opc$meta

However, when I run
vis(repOverlapAnalysis(imm_ov1, "tsne"))

there is no problem.

This is what the beginning of print(imm_ov1) looks like

I think it might be due to the NAs in the middle. Thoughts?

Thanks in advance for your help

Expected behavior

Additional context

Error with repLoad function

Hi,

When using repLoad function I am getting the following error:

immdata <- repLoad("MIXCR/ExportedClones")
Parsing MIXCR/ExportedClones ...
Parsing MIXCR/ExportedClones/metadata.txt -- metadata
Parsing MIXCR/ExportedClones/SLX-17310.i701_i502_tcr.clonotypes.ALL.txt -- mixcr
Error in `[[<-.data.frame`(`*tmp*`, .aa.seq, value = list()) :
replacement has 0 rows, data has 218
I am using immunarch version 0.3.3
Output files are from mixcr in .txt format.

I would appreciate it if you could help me to resolve this issue.

Kind regards
Pani

Replace "develop" with "dev" in README

🚀 Feature

Replace "develop" branch with "dev" in README and close the "develop" branch

Error: All columns in a tibble must be 1d or 2d objects: * Column `Sample` is NULL

🐛 Bug

Important issue from emails.

I have tried to use immunarch for analyzing my data and encountered an error.

All columns in a tibble must be 1d or 2d objects:

Column Sample is NULL

An example of my input data file is attached. Would you please help me figure out how to prepare it for using in immunarch?

Difficulties parsing .txt files from MiXCR using repLoad

Hi Vadim,

I have used Immunarch previously, but can't get it to load data from my current study.

I have used MiXCR to align, assemble and export clones and alignments on bulk RNA-seq data, and have output files as .txt. Unfortunately repLoad will not parse my data either as the clones (from exportClones) or alignments (exportAlignments).

When defining the data format (.format = "mixcr") I receive the following error message: "Error in strsplit(df[[.dalignments]], "|", T, F, T) : non-character argument".

When leaving the data format undefined I receive this message: "-- unsupported format, skipping".

For simplicity, I haven't included my metadata file in the folder of data; and have limited the contents of the folder to a single text file each time (either clones, or alignments). I have attached an example of these data in a separate email.

Your help would be much appreciated.

Thanks, Michael

Error installing on windows

After running the following code from documentation in Windows10:

install.packages("devtools", dependencies = T)
devtools::install_local("path/to/your/folder/with/immunarch.tar.gz", dependencies=T)

I'm receiving the following error:

The downloaded binary packages are in
C:\Users\João Drama\AppData\Local\Temp\Rtmp8a5Pdw\downloaded_packages

checking for file 'C:\Users\João Drama\AppData\Local\Temp\Rtmp8a5Pdw\remotes1bc85014173\immunarch/DESCRIPTION' ...

√ checking for file 'C:\Users\João Drama\AppData\Local\Temp\Rtmp8a5Pdw\remotes1bc85014173\immunarch/DESCRIPTION' (1.9s)

preparing 'immunarch': (2.8s)
checking DESCRIPTION meta-information ...

checking DESCRIPTION meta-information ...

√ checking DESCRIPTION meta-information

cleaning src

checking vignette meta-information ...

√ checking vignette meta-information

excluding invalid files (1.1s)

Subdirectory 'inst/doc' contains invalid file names:
'1_introduction.Rmd' '2_data.Rmd' '3_basic_analysis.Rmd'
'4_overlap.Rmd' '5_gene_usage.Rmd' '6_diversity.Rmd' '7_fixvis.Rmd'
'1_introduction.html' '2_data.html' '3_basic_analysis.html'
'4_overlap.html' '5_gene_usage.html' '6_diversity.html'
'7_fixvis.html'

Warning in .write_description(db, ldpath) :

Warning in .write_description(db, ldpath) :
Unknown encoding with non-ASCII data: converting to ASCII

checking for LF line-endings in source and make files and shell scripts
checking for empty or unneeded directories
looking to see if a 'data/datalist' file should be added
building 'immunarch_0.3.2.tar.gz'

Installing package into ‘C:/Users/João Drama/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)

installing source package 'immunarch' ...
Warning in file(file, if (append) "a" else "w") :
cannot open file 'C:/Users/Joco Drama/Documents/R/win-library/3.5/immunarch/DESCRIPTION': No such file or directory
Error in file(file, if (append) "a" else "w") :
nÃ£o Ã© possÃvel abrir a conexÃ£o
ERROR: installing package DESCRIPTION failed for package 'immunarch'
removing 'C:/Users/João Drama/Documents/R/win-library/3.5/immunarch'
In R CMD INSTALL
Error in i.p(...) :
(convertido do aviso) installation of package ‘C:/Users/JOODRA~1/AppData/Local/Temp/Rtmp8a5Pdw/file1bc81dbd528c/immunarch_0.3.2.tar.gz’ had non-zero exit status

l´
Erro: unexpected input in "l´"

Thanks to anyone who can provide some insight!

Joao

Gene usage visualisation grouped by metadata

Hi,
I am trying to graph gene usage in samples as grouped by status. I am using my own dataset and I can't see any grouping happening. I have also tried using the test dataset and I also don't see that the samples get grouped. This is the code I am using, as provided in the Quick Start:

data(immdata) 
gu = geneUsage(immdata$data)
vis(gu, .by="Status", .meta=immdata$meta)

I don't see any difference by using the .by="Status" argument or not.

GitlabExodus

Task - GitlabExodus

Transfer Immunarch code from Gitlab to Github

repDiversity .method="dXX" not working

❓ Questions and Help

The following code gives an error:
div_d10 = repDiversity(.data = coding(immdata$data), .method = "dXX", .perc = 10)
Error in FUN(X[[i]], ...) :
You entered the wrong method! Please, try again.

Thank you in advance for your assistance!

Datatable error when dividing public repertoire

Hi, I get the following error when running the code below; it complains about the syntax ":=". I'm using R 3.6.3, immunarch 0.5.5 and dplyr 0.8.5 on Windows 10. I assume it is a bug related to which versions of the packages are used...

library(immunarch)
data("immdata")
immdata <- immdata 

pr = pubRep(immdata$data, "aa", .coding = T, .verbose = F)
pr1 = pubRepFilter(pr, immdata$meta, c(Status = "C"))
pr2 = pubRepFilter(pr, immdata$meta, c(Status = "MS"))
pr3 = pubRepApply(pr1, pr2,)

Error in `:=`(Samples.y, NULL) :
Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").

I also get a warning when running pubRepFilter that:

You are using a dplyr method on a raw data.table, which will call the data frame
implementation, and is likely to be inefficient.

To suppress this message, either generate a data.table translation with lazy_dt()
or convert to a data frame or tibble with as.data.frame()/as_tibble().

You are using a dplyr method on a raw data.table, which will call the data frame
implementation, and is likely to be inefficient.

To suppress this message, either generate a data.table translation with lazy_dt()
or convert to a data frame or tibble with as.data.frame()/as_tibble().

2x2 matrix output in repOverlap when there are only two samples to analyse

🚀 Feature

Important request from emails.

Motivation

I am using your immunarch package’s repOverlap function.

However, when there are only two samples in the input directory, the 2 x 2 matrix for the repOverlap can not be generated correctly, and the expected matrix is dropped to a numeric value. Thereby, the heatmap graph is not drawn correctly, it is not 2 x 2 layout, and the axis tick labels are wrong.

Could you please fix the issue and produce the 2 x 2 matrix correctly when there is only 2 samples as inputs. In my opinion, the consistency is important for users when they have many different sample sets to be
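To illustrate the requested behaviour, here is a base R sketch that builds a pairwise overlap matrix which keeps its 2 x 2 shape and sample labels even with only two samples. It is not immunarch's implementation: the sample names and CDR3 strings are made up, the overlap measure is simply the number of shared clonotypes, and leaving the diagonal as NA is an illustrative choice.

```r
# Two hypothetical repertoires given as vectors of clonotype identifiers.
repertoires <- list(
  S1 = c("CASSLGQ", "CASSPTG", "CASSQDR"),
  S2 = c("CASSPTG", "CASSQDR", "CASSFGT")
)

n <- length(repertoires)

# Allocate the full matrix up front so its shape does not depend on n.
overlap <- matrix(
  NA_real_, nrow = n, ncol = n,
  dimnames = list(names(repertoires), names(repertoires))
)

# Fill the off-diagonal cells with the number of shared clonotypes.
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      overlap[i, j] <- length(intersect(repertoires[[i]], repertoires[[j]]))
    }
  }
}

overlap
dim(overlap)

# When subsetting, drop = FALSE keeps the matrix structure for a single row.
one_row <- overlap[1, , drop = FALSE]
dim(one_row)
```

The general point is that R silently simplifies matrices to vectors in several places (single-row subsetting, sapply over one pair), so allocating the matrix explicitly and using drop = FALSE keeps the output consistent across sample counts.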

The picture does not match the data

🐛 Bug

I think the .by argument in vis is not fully working at least when a vector is provided (Version : immunarch_0.3.3.9005)

To Reproduce

This is what I get when looking at exp_vol

exp_vol$Volume
[1] 213 5647 1333 2658 528 1326
by_vec = factor(c("D","S","D","S","L","N"))
by_vec
[1] D S D S L N
Levels: D L N S
p = vis(exp_vol, .by = by_vec)
Warning: Ignoring unknown aesthetics: y
Warning: Ignoring unknown aesthetics: xmin, xmax, annotations, y_position
p

Expected behavior

As one can see first D value is 213 second D value is 1333 and not 5647 as in the picture.

Thank you for your help!

Suggestions: metadata for each cell and integrate with Seurat in single-cell VDJ scenario

Hi immunarch developers,

Great thanks for this cool tool!

I would like to suggest two features which I think would improve the usability of immunarch when the input data is from single-cell.

1) Add metadata for each cell.

Now, immunarch reads data from samples and metadata is for each sample. In single-cell VDJ, if users want to compare clones of sub-clusters of cells, they have to create another set of input files. If each cell had its own metadata, it would be much easier to compare different groups of cells.

2) Provide a small vignette or a set of functions to integrate VDJ data into a Seurat object.

Nowadays, lots of researchers combine single-cell VDJ and single-cell RNA-seq analysis in their studies, and tools like Seurat are popular in single-cell RNA-seq data analysis. It is useful to map VDJ data to the cell clusters defined by the transcriptome. There are already such attempts, like this and this. Since immunarch has high-level interfaces to manipulate VDJ data from several different platforms, it would be great if immunarch could provide a more elegant way to do this.

Please ignore me if you think it is trivial.

rarefaction - seq.default error

Hello,

When I try to run rarefaction on my data it returns the following error. Can you please help?

imm_raref = repDiversity(mydata$data, "raref", .extrapolation = 200000, .verbose = F)
Error in seq.default(tail(seq(.step, sum(.data[[i]]), .step), 1) + .step, :
wrong sign in 'by' argument

I am not sure if the error is because of huge datasets.
mydata is a list of 3 dataframes.

dim(mydata[["data"]][["R_1"]])
[1] 143866 11
dim(mydata[["data"]][["R_2"]])
[1] 823220 11
dim(mydata[["data"]][["R_3"]])
[1] 980159 11

Thanks!
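For context, "wrong sign in 'by' argument" is what base R's seq reports when the start of the range lies beyond the end while the step is positive. One possible reading, not confirmed by anything in this report, is that the start computed from the sample size exceeds the end of the extrapolation range for these large samples. The sketch below only demonstrates the base R behaviour with small made-up numbers.

```r
# A valid increasing sequence: start below end, positive step.
ok <- seq(5, 20, by = 5)
ok

# Start above end with a positive step: seq() stops with an error.
attempt <- tryCatch(seq(20, 5, by = 5), error = function(e) e)
failed <- inherits(attempt, "error")
failed  # TRUE; in an English locale: "wrong sign in 'by' argument"

# A hypothetical guard: only build the sequence when the range is non-empty.
safe_seq <- function(from, to, by) {
  if (from > to) numeric(0) else seq(from, to, by = by)
}
safe_seq(20, 5, 5)
```

Comparing the total clone counts of each sample with the value passed as .extrapolation would show whether this situation applies.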

Is it possible to compare immunarch objects with repOverlap?

Hello,

Thanks a lot for the great tool.
As the title says, Is it possible to compare immunarch objects with repOverlap?
I created several immunarch objects from a 10x dataset using the "filter_barcode" function based on clusters I got from UMAP clustering of the cells by Seurat.

I can now look at the repertoire diversity per cluster, but I want to check the overlap between clusters too. Is there a way to do this?

Thanks a lot.

Error using repDiversity module

Hey, I had an error message using repDiversity module and I don't know how to solve the problem. The test data is using a result from Mixcr. Is there some way to pass it ? Many thanks.

Some questions about the visualization by group

Hi,

Thanks for this cool tool!

I have two questions about the visualization of .by group data.

Here is my data and metadata.

> names(tcr$data)
 [1] "0619_LN1" "0619_LN2" "0619_LN3" "0619_LN4" "0619_LN5" "0619_LN6" "0619_N1"  "0619_N2"  "0619_N3"  "0619_N4"  "0619_N5" 
[12] "0619_P1"

> tcr$meta
# A tibble: 12 x 4
   Sample   patient source tissue  
   <chr>    <chr>   <chr>  <chr>   
 1 0619_LN1 S0619   LN1    LN      
 2 0619_LN2 S0619   LN2    LN      
 3 0619_LN3 S0619   LN3    LN      
 4 0619_LN4 S0619   LN4    LN      
 5 0619_LN5 S0619   LN5    LN      
 6 0619_LN6 S0619   LN6    LN      
 7 0619_N1  S0619   N1     normal  
 8 0619_N2  S0619   N2     adjacent
 9 0619_N3  S0619   N3     tumor   
10 0619_N4  S0619   N4     tumor   
11 0619_N5  S0619   N5     tumor   
12 0619_P1  S0619   P1     PBMC

First, I visualized the clonality by proportion.

tcr.imm_pr = repClonality(tcr$data, .method = "clonal.prop")
vis(tcr.imm_pr)

vis(tcr.imm_pr, .by = 'tissue', .meta = tcr$meta, .test = F)

From the individual samples. "0619_N3", "0619_N4" and "0619_N5" (from tumor) all have higher values than that of "0619_P1" (from PBMC). Why does PBMC have higher value than that of tumor after using .by = 'tissue'?

Second, visualized the clonal space homeostasis.

tcr.imm_hom = repClonality(tcr$data, .method = "homeo", .clone.types = c(Small = .0001, Medium = .001, Large = .01, Hyperexpanded = 1))
vis(tcr.imm_hom)

vis(tcr.imm_hom, .by = c('tissue'), .meta = tcr$meta, .test = F)

In the figure of individual samples, "0619_N2" does not have "Small" clones while all others have. But in the group view, "adjacent" (from "0619_N2") has over 20% "Small" clones. Also, nearly all samples have high proportion of "Medium" clones, but only tissue "LN" shows in the group view. So, how to understand this?

I am looking forward to hearing from you.

Bests,
Yiwei Niu

fail to reload the data

I just followed the instructions and created a metadata.txt in the folder containing the VDJtools output (TCR clonotype data), but when I try to load the data with immdata <- repLoad("data_vdjtool","vdjtools"), this error comes up:
Parsing data_vdjtool/c.1.txt -- vdjtools
Error in `$<-.data.frame`(`*tmp*`, "Proportion", value = numeric(0)) :
replacement has 0 rows, data has 143053.
But when I try the tcR package (immdata <- parse.folder()), it works.

geneUsage and Diversity function

Hi, I would like to see the frequency of the gene usage. Is there an option of telling .quant to use the column "proportions".
Moreover, I am interested in the Shannon-Wiener Index for a cloneset (list) and the entropy function only allows choosing 1 column from 1 data.frame. Is there a way around that?

Thanks a lot!
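As a possible workaround for the second question, the Shannon index can be computed per sample by applying a small function over the list of repertoires. This is a base R sketch, not a built-in immunarch function; it assumes each repertoire is a data frame with a Proportion column, and the toy values are invented.

```r
# Shannon entropy (natural log) of a vector of clonotype proportions.
shannon <- function(p) {
  p <- p[p > 0]      # zero-frequency entries contribute nothing
  p <- p / sum(p)    # renormalise in case the values do not sum to 1
  -sum(p * log(p))
}

# Toy list of repertoires; the column name Proportion is an assumption.
repertoires <- list(
  S1 = data.frame(Proportion = c(0.5, 0.25, 0.25)),
  S2 = data.frame(Proportion = c(0.25, 0.25, 0.25, 0.25))
)

# One value per sample, named by the list names.
entropy_by_sample <- sapply(repertoires, function(df) shannon(df$Proportion))
entropy_by_sample
```

With real data, the list would be immdata$data; the uniform sample S2 reaches the maximum log(4) for four clonotypes, while the skewed S1 is lower.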

Visualisation issues (vis)

🐛 Bug

Attempts to visualise several analysis either result in warning messages with no output, or just no output. I'm guessing it may be due to the structure of my data so I'll include my repLoad procedure below.

To Reproduce

Data is the filtered annotation files for two 10x samples. Samples seemed to read in ok, although got warning message (The following named parsers don't match the column names: barcode...). Read in using:

immdata <- repLoad("./immunarch")

metadata.txt content:

Sample
wt_UT
wt_aCT

Example of vis command with warning and no output:

> exp_vol = repExplore(immdata$data, .method = "volume")
> vis(exp_vol, .by = c("Sample"), .meta = immdata$meta)
Warning: Ignoring unknown aesthetics: y
Warning: Ignoring unknown aesthetics: xmin, xmax, annotations, y_position

Example of vis command with no output (or warning):

> exp_len = repExplore(immdata$data, .method = "len", .col = "aa")
> vis(exp_len)

Expected behavior

Graphical output of data

Additional comments

Several other vis inputs result in the same issue; I'm happy to list them if that's of use, and also happy to provide a sample dataset. I suspect the issue may lie in the "Source" section of the metadata, which is just my sample names in triplicate (see below) and may confuse vis.

> immdata[["meta"]][["Source"]]
[1] "wt_aCT" "wt_aCT" "wt_aCT" "wt_UT"  "wt_UT"  "wt_UT"

repLoad error for 10X

I encountered the error when trying to repLoad a 10X VDJ library. Do you know why?

> repLoad(
+   "Fresh1_Tcell_vdj/outs/", 
+   .format = "10x")
Parsing Fresh1_Tcell_vdj/outs/ ...
Parsing Fresh1_Tcell_vdj/outs//all_contig_annotations.bed -- 10x
Error in `[[<-.data.frame`(`*tmp*`, .nuc.seq, value = character(0)) : 
  replacement has 0 rows, data has 7715
In addition: Warning messages:
1: The following named parsers don't match the column names: AGTAGTCTCGTTTAGG-1_contig_1, 22, 356, TRBV13-1_L-REGION+V-REGION 
2: In .which_recomb_type(df[[.vgenes]]) :
  Can't determine the type of V(D)J recombination. No insertions will be presented in the resulting data table.

I used cellranger v3.0.2 to produce the input repertoire files.

Thanks in advance!

Install issue

After running the following code from documentation:

> devtools::install_local("C:/immunarch/immunarch.tar.gz", dependencies = T)

I'm receiving the following error:

Skipping 1 packages not available: MonetDBLite
Installing 40 packages: airr, circlize, config, cowplot, dbplyr, dendextend, diptest, dtplyr, ellipse, factoextra, FactoMineR, fastcluster, flashClust, flexmix, forge, fpc, generics, ggpubr, ggrepel, ggsci, ggsignif, GlobalOptions, gridBase, heatmap3, kernlab, leaps, mclust, modeltools, MonetDBLite, polynom, prabclus, r2d3, Rtsne, scatterplot3d, shape, shinythemes, sparklyr, treemap, trimcluster, viridis
Installing packages into ‘C:/Users/rgorsuch/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
Error: (converted from warning) package ‘MonetDBLite’ is not available (for R version 3.5.1)

immunarch is not showing up in my Packages library and the repLoad function is not present when trying to import data.

Thanks to anyone who can provide some insight!

Ryne

Test issue #2.

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

Expected behavior

Additional context

test.txt

Update the "Introduction" vignette

📚 Documentation

Update the Introduction vignette https://immunarch.com/articles/v1_introduction.html with correct installation routines

Second test issue.

Test issue.

test.txt
Testing youtrack helpdesk integration.

10x parser wrongly uses UMI as clones / proportion

🐛 Bug

As titled, the 10x parser wrongly uses the UMI column as the clone count. However, 10x uses the barcode to count cells and the UMI to count transcripts.

To Reproduce

Steps to reproduce the behavior:

Read 10x consensus annotation.csv with repLoad
immdata <- repLoad("/path/to/consensus_annotation.csv", .format = "10x")
View the data in immdata
head(immdata$data)

Expected behavior

Count the number of barcodes that share the same VDJ (perhaps using just the CDR3 at the amino acid level) and use that as the clone count.

Additional context

Since consensus annotation.csv contains no barcode information, filtered_contig_annotations.csv probably needs to be used instead.
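The expected behavior above can be sketched in base R. This is only an illustration of counting cells per clonotype, not the parser's actual code: the toy data frame is made up, and it assumes the contig table has barcode, chain, cdr3 and umis columns.

```r
# Toy stand-in for rows of a 10x contig annotation table (values are made up).
contigs <- data.frame(
  barcode = c("AAAC-1", "AAAC-1", "AAAG-1", "AAAT-1", "AACA-1"),
  chain   = c("TRB", "TRA", "TRB", "TRB", "TRB"),
  cdr3    = c("CASSLG", "CAVRDS", "CASSLG", "CASSPT", "CASSLG"),
  umis    = c(12, 3, 7, 5, 1),
  stringsAsFactors = FALSE
)

# Keep a single chain so alpha and beta contigs are not mixed.
trb <- contigs[contigs$chain == "TRB", ]

# Drop repeated (barcode, cdr3) pairs so each cell is counted once per CDR3.
trb_unique <- unique(trb[, c("barcode", "cdr3")])

# Number of distinct barcodes (cells) per CDR3 amino acid sequence.
cell_counts <- as.data.frame(
  table(cdr3 = trb_unique$cdr3),
  responseName = "cells",
  stringsAsFactors = FALSE
)

# Sum of UMIs per CDR3, shown only for contrast with the cell counts.
umi_counts <- aggregate(umis ~ cdr3, data = trb, FUN = sum)

print(cell_counts)
print(umi_counts)
```

In this toy table CASSLG is carried by three cells but accounts for 20 UMIs, which is the kind of gap the report describes.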

NA group appears in group drawing

Hi,
I checked the group information and no data is missing, but I can't solve this problem. Can you help me?
My code:
vis(imm_tail, .by="Type3", .meta=BJclinical)

Error with geneUsage on IGH

🐛 Bug

I am trying to run geneUsage on a MiXCR dataset that I imported, consisting only of IGH chains. I can run it to generate a histogram of V gene usage (using "hs.ighv"), but when I try to look at other genes, it can only pull the IGHV genes.

To Reproduce

The following will produce a histogram of v genes.
v_usage = geneUsage(BCRdata.coding$data, "hs.ighv", .norm = TRUE, .ambig = "mag")
vis(v_usage, .plot = "hist")

However, the following attempts to generate d and j gene histograms just return the same v gene histogram.
d_usage = geneUsage(BCRdata.coding$data, "hs.ighd", .norm = TRUE, .ambig = "mag")
vis(d_usage, .plot = "hist")
or
j_usage = geneUsage(BCRdata.coding$data, "hs.ighj", .norm = TRUE, .ambig = "mag")
vis(j_usage, .plot = "hist")

Looking at d_usage or j_usage themselves, they are actually just the same as v_usage, a table of the v genes in my dataset. So it seems like geneUsage is having trouble pulling the correct genes from my dataset.

Additional context

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

other attached packages:
[1] cowplot_1.0.0 openxlsx_4.1.3 immunarch_0.5.2 gridExtra_2.3
[5] data.table_1.12.6 dtplyr_1.0.0 dplyr_0.8.3 ggplot2_3.2.1

Including BCR constant region information

🚀 Feature

Most BCR analysis programs (e.g. MiXCR and Immcantation) also output a field with information about the constant chain (e.g. IGHG1, IGHA1). It would be nice to load that information into immunarch as well.

Motivation

The BCR constant region is an important characteristic of the repertoire: for example, the proportion of clones in IGHM vs IGHA tells us a lot about which stage of affinity maturation/somatic hypermutation the clones are in. So far, whenever I have needed this kind of analysis, I have just written a custom script to reload the data from MiXCR.

Pitch

Add an extra field for CREGION

Alternatives

N/A

Additional context

repExplore .col="aa" and .col="nt"

When using repExplore, I am unable to select the amino acid or nucleotide CDR3 columns: the output is always based on the "Sequence" column (total rows in the dataset).

So, for example, these three output the same number of clonotypes:
exp_vol <- repExplore(immdata$data, .method = "volume")
exp_vol <- repExplore(immdata$data, .method = "volume", .col = "nt")
exp_vol <- repExplore(immdata$data, .method = "volume", .col = "aa")
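For reference, the difference the reporter expects between the three calls can be sketched in base R. This is not how repExplore works internally; the toy repertoire is made up, and it assumes the CDR3 columns are named CDR3.nt and CDR3.aa.

```r
# Toy repertoire table (values are made up for illustration).
rep1 <- data.frame(
  Clones  = c(10, 6, 4, 2, 1),
  CDR3.nt = c("TGTGCCAGC", "TGTGCCAGC", "TGCGCCAGC", "TGTGCTAGT", "TGTGCCTGG"),
  CDR3.aa = c("CAS", "CAS", "CAS", "CAS", "CAW"),
  V.name  = c("TRBV5-1", "TRBV6-2", "TRBV5-1", "TRBV5-1", "TRBV7-9"),
  stringsAsFactors = FALSE
)

# Three ways of counting "clonotypes" in the same table.
volume <- c(
  rows = nrow(rep1),                    # every row counts
  nt   = length(unique(rep1$CDR3.nt)),  # unique nucleotide CDR3s
  aa   = length(unique(rep1$CDR3.aa))   # unique amino acid CDR3s
)

print(volume)
```

If the three counts differ on a given sample when computed this way, identical repExplore output for all three calls would point to the column argument being ignored.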

Thanks in advance for the assistance!

This note is converted to an issue.

Task - Example

Todo - x

Empty R data frames after parsing MiXCR files

🐛 Bug

Important issue from our support email:

Email 1

I managed to repLoad my files, but with these warnings:

Warning messages:
1: In readLines(f, 1) : line 1 appears to contain an embedded nul
2: In readLines(f, 1) : line 1 appears to contain an embedded nul
3: In readLines(f, 1) : line 1 appears to contain an embedded nul

And when I try to load the data (immdata2) I get this:

> immdata2
$data
named list()

$meta
# A tibble: 0 x 0

So I wonder if I used the right file from MiXCR (clna)?

Email 2

I used the "all" file and it works!!!
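The "embedded nul" warnings suggest the file was binary rather than a text table. A minimal base-R check for that, run before calling repLoad, might look like the sketch below. The helper name has_nul_bytes is hypothetical and the two temporary files are stand-ins, not real MiXCR output.

```r
# Returns TRUE when the first `n` bytes of a file contain a NUL byte,
# which a plain-text table should never have.
has_nul_bytes <- function(path, n = 4096L) {
  bytes <- readBin(path, what = "raw", n = n)
  any(bytes == as.raw(0))
}

# A small tab-separated text file.
text_file <- tempfile(fileext = ".txt")
writeLines(c("cloneId\tcloneCount", "0\t15"), text_file)

# A small binary file containing a NUL byte.
binary_file <- tempfile(fileext = ".clna")
writeBin(as.raw(c(0x4d, 0x69, 0x00, 0x58)), binary_file)

print(has_nul_bytes(text_file))
print(has_nul_bytes(binary_file))
```

A file that fails this check is unlikely to parse as a clonotype table, whatever format is requested.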

mixcr clna file loading fails

Hello,
Loading data from MiXCR generates a bunch of errors. I attach the MiXCR report, which looks quite normal.

> library("immunarch")
> results <- "mixcr"
> rep <- repLoad(results, .format = "mixcr")
== Step 1/3: loading repertoire files... ==
Processing "mixcr" ...
  -- Parsing "mixcr/99753.clna" -- mixcr
Error: can't find a column with V genes
Error: can't find a column with J genes
Error: can't find a column with D genes
|=================================================================| 100%  481 MB
Warning: 14752614 parsing failures.
row            col  expected        actual                                                                        file
  1 NA             2 columns 1 columns     'mixcr/99753.clna'
  2 NA             2 columns 1 columns     'mixcr/99753.clna'
  3 MiXCR.CLNA.V04           embedded null 'mixcr/99753.clna'
  3 NA             2 columns 1 columns     'mixcr/99753.clna'
  4 MiXCR.CLNA.V04           embedded null 'mixcr/99753.clna'
... .............. ......... ............. ...........................................................................
See problems(...) for more details.

Error in `[[.tbl_df`(head(df), .vgenes) : object '.vgenes' not found
In addition: Warning messages:
1: In read.table(.filename, sep = .sep, skip = 0, nrows = 1, stringsAsFactors = F,  :
  line 1 appears to contain embedded nulls
2: Missing column names filled in: 'X2' [2]

99753.report.txt

Issues loading 10x data

❓ Questions and Help

We have a set of listed tutorials available on the website.

Hello, I am trying to load 10x data into R using repLoad and am getting the following errors. Am I using the correct 10x file? Any ideas for how to resolve my issue?

> immdata <- repLoad("filtered_contig_annotations.csv",.format = "10x")
== Step 1/3: loading repertoire files... ==
Processing "<initial>" ...
  -- Parsing "filtered_contig_annotations.csv" -- 10x
unknown format, skipping

== Step 2/3: checking metadata files and merging... ==
Processing "<initial>" ...
  -- Metadata file not found; creating a dummy metadata...
Dropping  1 column(s) from .metadata.txt. Do you have spaces or tabs after the name of the last column? Remove them to ensure everything works correctly.

== Step 3/3: splitting data by barcodes and chain types... ==
Done!
> str(immdata)
List of 2
 $ data: Named list()
 $ meta: tibble [0 × 0] (S3: tbl_df/tbl/data.frame)
 Named list()
>

Error with installing and loading IMGT output

Hi,

I’m trying to install immunarch in R with:

install.packages("immunarch")

But I get the following error:
Warning message:
package ‘immunarch’ is not available (for R version 3.6.1)

I also tried installing it in R version 3.2.3 and it gave the same error.

Because this didn’t work I installed the pre-release version and tried to load IMGT output data. But then I also get an error:

install.packages("devtools")
devtools::install_url("https://github.com/immunomind/immunarch/raw/master/immunarch.tar.gz")
imgtdata=repLoad("path_to_file", .format="imgt")
== Step 1/3: loading repertoire files... ==
Processing "path_to_file" ...
-- Parsing "path_to_file/1_Summary.txt" -- imgt
Warning: 2 parsing failures.
row col expected actual file
1 -- 34 columns 4 columns 'path_to_file/1_Summary.txt'
1 -- 34 columns 4 columns 'path_to_file/1_Summary.txt'

Warning: 12330 parsing failures.
row col expected actual file
1 -- 34 columns 4 columns 'path_to_file/1_Summary.txt'
2 -- 34 columns 4 columns 'path_to_file/1_Summary.txt'
3 -- 34 columns 4 columns 'path_to_file/1_Summary.txt'
4 -- 34 columns 4 columns 'path_to_file/1_Summary.txt'
5 -- 34 columns 4 columns 'path_to_file/1_Summary.txt'
... ... .......... ......... .............................................................................
See problems(...) for more details.

Error in if (any(str_detect(.name[i], c("TCRA", "TRAV", "TCRG", "TRGV", :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: Missing column names filled in: 'X34' [34]
2: Missing column names filled in: 'X34' [34]

Is there a way to order my samples in a specific order?

For example, if I visualize my data with vis(.by = cluster, .meta=immdata.opc.tcr$meta), the groups are ordered alphabetically. However, I would like to put the groups in a specific order (Immune Rich, Mixture, Immune Desert). Is there a way I could do this in R? I have tried rearranging the order of the factor levels in immdata.opc.tcr$meta, but it doesn't seem to work.

Thanks!

The problem of multiple chain types

Hi,

I have a question about the chain type and the clonotype definition in the tool.

It seems that each chain (whether TCR α or β) in the same data file is treated as a separate clonotype, rather than the α/β combination being treated as one, is that right?

Does this mean that different kinds of chains should not be put in the same input file?

Is there any reference for this?

Many thanks,
Meng

Installation issues

Hello,

I'm getting an error when installing the package using:

devtools::install_url("https://github.com/immunomind/immunarch/raw/master/immunarch.tar.gz")

Here's the error:

Error: (converted from warning) package 'dtplyr' was built under R version 3.6.2
Execution halted
ERROR: lazy loading failed for package 'immunarch'
Error: Failed to install 'unknown package' from URL:
(converted from warning) installation of package ‘C:/Users/X/AppData/Local/Temp/RtmpQ1Pgsx/file3b747cab2090/immunarch_0.5.4.tar.gz’ had non-zero exit status

I also tried installing it locally and got the same message. How can I fix this?

Thanks

Install failed with "Error: object ‘tbl_dt’ is not exported by 'namespace:dtplyr'"

🐛 Bug

Dear all,

Installing immunarch always fails for me with this error:

Error: object ‘tbl_dt’ is not exported by 'namespace:dtplyr'
Execution halted
ERROR: lazy loading failed for package ‘immunarch’

I tried installing automatically:
devtools::install_url("https://github.com/immunomind/immunarch/raw/master/immunarch.tar.gz")
or manually:

install.packages(c("BiocManager", "covr", "dbscan", "doFuture", "DT", "future", "glmnet", "hdf5r", "hexbin", "Hmisc", "plotly", "prodlim", "R.oo", "RcppAnnoy", "RcppArmadillo", "RcppParallel", "RJSONIO", "rvest", "Seurat", "slam"))
devtools::install_local("~/Downloads/immunarch.tar.gz", dependencies=T)

I tried remove.packages(dtplyr) and then reinstalling it, but that didn't help.
Here's the traceback():

8: stop(remote_install_error(remotes[[i]], e))
7: value[[3L]](cond)
6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
5: tryCatchList(expr, classes, parentenv, handlers)
4: tryCatch(res[[i]] <- install_remote(remotes[[i]], ...), error = function(e) {
       stop(remote_install_error(remotes[[i]], e))
   })
3: install_remotes(remotes, dependencies = dependencies, upgrade = upgrade, 
       force = force, quiet = quiet, build = build, build_opts = build_opts, 
       build_manual = build_manual, build_vignettes = build_vignettes, 
       repos = repos, type = type, ...)
2: pkgbuild::with_build_tools({
       ellipsis::check_dots_used(action = getOption("devtools.ellipsis_action", 
           rlang::warn))
       {
           remotes <- lapply(path, local_remote, subdir = subdir)
           install_remotes(remotes, dependencies = dependencies, 
               upgrade = upgrade, force = force, quiet = quiet, 
               build = build, build_opts = build_opts, build_manual = build_manual, 
               build_vignettes = build_vignettes, repos = repos, 
               type = type, ...)
       }
   }, required = FALSE)
1: devtools::install_local("~/Downloads/immunarch.tar.gz", dependencies = T)

here's sessionInfo():

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3        rstudioapi_0.10   magrittr_1.5      usethis_1.5.1     devtools_2.2.1    pkgload_1.0.2    
 [7] R6_2.4.1          rlang_0.4.1       tools_3.6.1       pkgbuild_1.0.6    sessioninfo_1.1.1 cli_1.1.0        
[13] withr_2.1.2       ellipsis_0.3.0    remotes_2.1.0     assertthat_0.2.1  digest_0.6.22     rprojroot_1.3-2  
[19] crayon_1.3.4      processx_3.4.1    callr_3.3.2       fs_1.3.1          ps_1.3.0          testthat_2.3.0   
[25] memoise_1.1.0     glue_1.3.1        compiler_3.6.1    desc_1.2.0        backports_1.1.5   prettyunits_1.0.2

I think this should not be a big issue, but I cannot get past it. I hope someone can help me. Thanks a lot to the community!
