ajitjohnson / imsig

Immune Cell Gene Signatures for Profiling the Microenvironment of Solid Tumours

R 100.00%

cancer deconvolution immune transcriptomics

imsig's Issues

Error in exp[as.character(g), ] : incorrect number of dimensions

I am trying to run imsig on some mouse gene expression datasets I have. I get this error:

Error in exp[as.character(g), ] : incorrect number of dimensions

Although these are mouse datasets, the gene names are in HGNC format. I ran them through the multi-symbol checker at https://www.genenames.org/tools/multi-symbol-checker/ to confirm that my genes are represented in the HGNC database: 88.6% match approved symbols. There are no duplicate gene names and no missing values. Maybe I do not have enough overlap with the ImSig genes? If so, how can I check the overlap between the expression data and ImSig? I do not see where the signature genes are stored. I have attached the expression dataset I am trying to run imsig on, in case that is helpful.
exp_data_for_imsig.txt
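
One way to check the overlap is to compare the row names of the expression table with the signature gene list. The sketch below is a minimal base R illustration and not part of the imsig package: the `signature` table, its `gene` and `cell` columns, and the helper `check_overlap` are all hypothetical stand-ins. It also checks that the input is two-dimensional, since indexing of the form `x[rows, ]` fails with "incorrect number of dimensions" on a plain vector.

```r
# Hypothetical signature table; replace with the real ImSig gene list.
signature <- data.frame(
  gene = c("CD19", "CD79A", "CD3E", "CD2", "CD68"),
  cell = c("B cells", "B cells", "T cells", "T cells", "Macrophages"),
  stringsAsFactors = FALSE
)

# Toy expression table: genes in rows, samples in columns.
expr <- matrix(
  c(5, 7, 6, 1, 2, 3, 9, 8, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD19", "CD3E", "ACTB"), c("s1", "s2", "s3"))
)

check_overlap <- function(expr, signature) {
  # Row indexing with two subscripts needs a 2-D object.
  stopifnot(is.matrix(expr) || is.data.frame(expr))
  present <- signature$gene %in% rownames(expr)
  # Count matched genes per cell type, keeping types with zero matches.
  cell <- factor(signature$cell, levels = unique(signature$cell))
  table(cell[present])
}

overlap <- check_overlap(expr, signature)
print(overlap)
```

A cell type with zero or one matched gene is a likely candidate for trouble in any step that correlates genes within a signature.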

How can plot_abundance() results be arranged by sample name?

Hello, it seems that the bars produced by plot_abundance() are sorted by some overall average score, from low to high. My samples are from a timecourse, so I would like to see them in their original order. I tried renaming the samples by number, in alphabetical order, and so on, but they are always sorted from lowest to highest average score. Is there an option to change this within the plot_abundance() function?
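
If the plotting function offers no ordering option, one workaround is to plot the abundance table directly. The sketch below assumes the table returned by imsig() has one row per sample and one column per cell type; the `abundance` data frame here is toy data standing in for that output, and the plot uses base graphics rather than anything from the package.

```r
# Toy stand-in for an abundance table (rows = samples, columns = cell types).
abundance <- data.frame(
  "T cells" = c(3.1, 1.2, 2.4),
  "B cells" = c(0.8, 1.9, 1.1),
  row.names = c("day7", "day0", "day3"),
  check.names = FALSE
)

# Desired timecourse order, given explicitly.
sample_order <- c("day0", "day3", "day7")
ordered <- abundance[sample_order, , drop = FALSE]

# One bar chart per cell type, samples in the chosen order.
# pdf(NULL) keeps the sketch runnable without a display; drop it interactively.
pdf(NULL)
old_par <- par(mfrow = c(1, ncol(ordered)))
for (cell in colnames(ordered)) {
  barplot(ordered[[cell]], names.arg = rownames(ordered), main = cell, las = 2)
}
par(old_par)
dev.off()
```

Giving the order explicitly avoids relying on alphabetical sorting of sample names.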

issue with basicstats function

Hello, firstly - thanks a lot for your work on this pipeline, I generated very interesting results from the imsig function. But I am running into an issue with the basicstats and plot_network functions requiring too much RAM:

Basicstats <- gene_stat(exp, r = 0.6)
---> Checking zero-variance data...
---> Total number of variables: 56505
---> WARNING: 16011 variables found with zero variance
Error: cannot allocate vector of size 12.2 Gb

This was with a data frame with just 5 columns. Row names are gene symbols and column names are sample IDs.

Is this typical for this function, or could I be doing something wrong? I suspect the latter, since all the other functions in your package work just fine on my machine.

Thanks again
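
The size in the error is consistent with a dense gene-by-gene matrix being allocated over every variable that survives the zero-variance check, which would make the memory cost depend on the number of genes and not on the number of samples. This is an inference from the log, not something confirmed from the package source. The arithmetic can be checked as follows; `dense_matrix_gib` is a hypothetical helper.

```r
# Memory needed for a dense n x n matrix of doubles (8 bytes each), in GiB.
dense_matrix_gib <- function(n) {
  n^2 * 8 / 1024^3
}

n_total <- 56505      # variables reported in the log
n_zero_var <- 16011   # zero-variance variables reported in the log
n_kept <- n_total - n_zero_var

size_gib <- dense_matrix_gib(n_kept)
print(round(size_gib, 1))
```

The result rounds to 12.2, matching the allocation in the error message. If this reading is right, restricting the input to the genes of interest before calling the function would shrink the matrix quadratically.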

which data does plot_network rely on?

Greetings,

I wanted to use all relevant information to do the plotting outside of R.
So I thought the relevant data would be:

1. The matrix returned by the imsig function, to assign the group to the nodes.
2. The corr_matrix, for the edges between the nodes (genes).

Having found that corr_matrix can only be used by the other functions inside the imsig package, I am wondering whether my thinking about network building is correct (that exactly these two tables are sufficient), and whether there is any other way to extract the data needed to compute the edges.

Best regards,
Nicolas
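
For plotting outside R, a node table and an edge table can be built with base R alone. The sketch below is an independent illustration and is not confirmed to match what plot_network does internally: it assumes edges are Pearson correlations between genes at or above a threshold `r`, and the `nodes` table and `build_edges` helper are hypothetical.

```r
# Toy expression matrix: genes in rows, samples in columns.
expr <- rbind(
  G1 = 1:10,
  G2 = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 21),  # nearly proportional to G1
  G3 = c(5, 1, 4, 2, 5, 1, 4, 2, 5, 1)         # weakly related to G1
)

# Hypothetical node table assigning each gene to a signature.
nodes <- data.frame(gene = rownames(expr),
                    group = c("T cells", "T cells", "B cells"),
                    stringsAsFactors = FALSE)

build_edges <- function(expr, r = 0.6) {
  cm <- cor(t(expr))  # gene-by-gene Pearson correlation
  # Keep each pair once (upper triangle) when it meets the threshold.
  keep <- which(upper.tri(cm) & cm >= r, arr.ind = TRUE)
  data.frame(from = rownames(cm)[keep[, "row"]],
             to = colnames(cm)[keep[, "col"]],
             weight = cm[keep],
             stringsAsFactors = FALSE)
}

edges <- build_edges(expr, r = 0.6)
print(edges)
# write.csv(edges, "edges.csv", row.names = FALSE)
# write.csv(nodes, "nodes.csv", row.names = FALSE)
```

The two CSV files can then be loaded into an external network tool, with `group` used to colour the nodes.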

error in running the example_data

Hi,

I'm getting an error in running the 'imsig' package as below:

cell_abundance = imsig(exp = example_data, r = 0.7)
Error in sig[sig$gene %in% as.character(g), ] :
incorrect number of dimensions

The example_data looks fine though:

dim(example_data)
[1] 568 60
head(example_data, n=6)
      GSM512479.CEL GSM512480.CEL GSM512481.CEL GSM512482.CEL GSM512483.CEL
AFF3       342.4006      52.88693      78.28706      101.1696      43.63739
BANK1     2535.4058     280.72550     498.01262      390.2110     697.87245
BLK        814.3128     132.36280     304.39403      304.8484     264.01830
BTLA       391.7088     152.33024     119.84629      202.0018     255.36544
CCR6       537.1195     153.71538     154.45685      254.8858     175.81826
CD180      821.2340     296.36957     234.71300      273.3661     353.07089

Any idea?

Thanks,
natlasy

What does the time clarification add?

Hi,

I'm using the imsig_survival and plot_survival functions, but the addition of the time variable doesn't change the outcome of the test.

Is this how it is supposed to be?

Thanks,
Willem

Error in fastCor(t(exp)) : invalid nSplit: 0

I have a data frame of 54 samples and 10715 genes with batch corrected FPKM values like this:

> dim(countDataPrim)
[1] 10715    54

> head(countDataPrim)[1:3]
        PD471_1_PPR_210 PD471_3_PPR_300 PD471_4_PPC_100
MT-CO1         12.37570        12.72105        13.23066
CLU            13.64955        13.12052        13.54428
MT-ATP8        12.02654        12.57541        13.37396
MT-CO2         11.70563        11.74762        12.72256
RPLP1          12.14725        12.45726        11.60549
RPL18A         12.32321        12.13765        12.57799

> summary(countDataPrim) # fpkm data, corrected by batch
 PD471_1_PPR_210  PD471_3_PPR_300  PD471_4_PPC_100  PD832_10_PPC_110 PD832_11_PPY_200 PD836_22_PPY_110 
 Min.   :-1.603   Min.   :-1.447   Min.   :-1.536   Min.   :-1.779   Min.   :-2.162   Min.   :-0.7232  
 1st Qu.: 3.293   1st Qu.: 3.453   1st Qu.: 3.353   1st Qu.: 3.353   1st Qu.: 3.379   1st Qu.: 3.4273  
 Median : 4.450   Median : 4.484   Median : 4.492   Median : 4.424   Median : 4.449   Median : 4.4664  
 Mean   : 4.419   Mean   : 4.532   Mean   : 4.488   Mean   : 4.450   Mean   : 4.493   Mean   : 4.5265  
 3rd Qu.: 5.604   3rd Qu.: 5.585   3rd Qu.: 5.638   3rd Qu.: 5.557   3rd Qu.: 5.592   3rd Qu.: 5.5214  
 Max.   :13.650   Max.   :13.121   Max.   :13.544   Max.   :13.172   Max.   :14.401   Max.   :12.9651  

# read genesets
geneset <- read.delim('../../data/immune_profiling/imsig.genes.txt', header = F)
> geneset %>%
+     filter(V1 %in% rownames(countDataPrim)) %>%
+     group_by(V2) %>%
+     summarise(n = n())
# A tibble: 9 x 2
  V2                n
  <fct>         <int>
1 B cells           4
2 Interferon       44
3 Macrophages      64
4 Monocytes        31
5 Neutrophils      36
6 Plasma cells      2
7 Proliferation    84
8 T cells          37
9 Translation      62

As you can see, every gene set has at least 2 genes that overlap with the FPKM matrix, but I am still getting the error:

> cell_abundance <- imsig(exp = countDataPrim)
---> Maximum number of splits: floor(n/2) = 0
---> WARNING: number of splits nSplit > 0
---> WARNING: using maximum number of splits: nSplit = 0
Error in fastCor(t(exp)) : invalid nSplit: 0

Please let me know how to debug this.
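
The log line "floor(n/2) = 0" implies that fewer than two variables reached the correlation step. Being present in the matrix is not the only requirement for a gene to be usable: it must also be numeric and non-constant. The sketch below counts, per gene set, the genes that are both present and have non-zero variance. It uses the two-column layout from the post (V1 = gene, V2 = set), but the gene names, the toy data and the `usable_genes` helper are hypothetical, and whether this is the cause of the error in this dataset is an assumption.

```r
# Hypothetical gene set table with the same two columns as the post.
geneset <- data.frame(
  V1 = c("CD19", "CD79A", "CD3E", "CD2", "MZB1", "JCHAIN"),
  V2 = c("B cells", "B cells", "T cells", "T cells",
         "Plasma cells", "Plasma cells"),
  stringsAsFactors = FALSE
)

# Toy expression data frame; JCHAIN is constant and CD79A is absent.
expr <- data.frame(
  s1 = c(1, 4, 2, 7, 3), s2 = c(2, 6, 1, 9, 3), s3 = c(3, 5, 4, 8, 3),
  row.names = c("CD19", "CD3E", "CD2", "MZB1", "JCHAIN")
)

usable_genes <- function(expr, geneset) {
  m <- as.matrix(expr)
  stopifnot(is.numeric(m))              # non-numeric columns break correlation
  v <- apply(m, 1, var, na.rm = TRUE)
  ok <- rownames(m)[!is.na(v) & v > 0]  # present and not constant
  sets <- unique(geneset$V2)
  vapply(sets,
         function(s) sum(geneset$V1[geneset$V2 == s] %in% ok),
         integer(1))
}

counts <- usable_genes(expr, geneset)
print(counts[counts < 2])  # sets with too few genes to correlate
```

Any set listed by the last line has fewer than two usable genes and cannot produce a gene-by-gene correlation.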

Error in colMeans(xt) : 'x' must be an array of at least two dimensions

Hi,

I am using the imsig function on a data frame with the dimensions: 3073x74 but I am getting an error:

cell_abundance <- imsig(exp = dat)
Error in colMeans(xt) : 'x' must be an array of at least two dimensions

The input matrix looks like this:

head(dat)
            850_1   850_12   850_15   850_16   850_19
HSP90AB1 45692.00 41446.00 41257.00 44824.00 45158.00
PKM      47462.64 25530.66 12935.48 13589.50 18485.20
RPSA     16776.18 37067.86 17202.98 17110.53 26381.38
ANXA2    14280.00 17008.00  8279.00  9320.00  8841.00
RPL4     13009.00 27260.00 13832.00 14174.00 20250.00
PTMA     19532.58 18214.23 19824.99 17947.67 18417.27

Let me know if you need more information.
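
This is the message base R raises when colMeans() receives a plain vector instead of a matrix or data frame. That typically happens when a single row or column is selected without `drop = FALSE`, for example when only one signature gene is found. Whether that is what happens inside imsig() for this dataset is an assumption; the sketch below only demonstrates the base R behaviour.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))

one_row_dropped <- m["g1", ]              # plain vector, dimensions lost
one_row_kept <- m["g1", , drop = FALSE]   # still a 1 x 3 matrix

# colMeans() needs a 2-D object, so the dropped version fails.
dropped_result <- tryCatch(colMeans(one_row_dropped),
                           error = function(e) conditionMessage(e))
kept_result <- colMeans(one_row_kept)

print(dropped_result)
print(kept_result)
```

Checking how many signature genes are present in the row names of the input is therefore a reasonable first diagnostic.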

"plot_network" error

Hey,
I'm getting the error "Error in cor_data[fg, fg] : subscript out of bounds" when using plot_network. Please help!

could not find function "corr_matrix"

Hi Ajit,

All functions work fine using example_data, and I am able to use the package (excellent package, btw). However, the corr_matrix function doesn't work.

corr_matrix(example_data, r = 0.6)
Error in corr_matrix(example_data, r = 0.6) :
could not find function "corr_matrix"

Any idea what I'm doing wrong?

(Windows 7, R 3.5.1, imsig package ver 1.0)
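
"could not find function" means the name is not visible after the package is attached, for instance because the function is internal or does not exist in the installed version. The exported names of a package can be listed with base R. In the sketch below, `is_exported` is a hypothetical helper, demonstrated on the `stats` package that ships with R; the imsig lines are left commented out because they need the package installed.

```r
# Hypothetical helper: TRUE if `fn` is exported by an installed package `pkg`.
is_exported <- function(fn, pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("package not installed: ", pkg)
  }
  fn %in% getNamespaceExports(pkg)
}

# Demonstrated on a package that ships with R.
print(is_exported("sd", "stats"))

# For the case in the post (requires imsig to be installed):
# is_exported("corr_matrix", "imsig")
# sort(getNamespaceExports("imsig"))
```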

Error in fastCor(t(exp)) : invalid nSplit: 0

Hi Ajit,

when using your package with your example data, everything works fine.

When applying it to a subset of our own data, there are several problems.

I removed rows with no variance using
data <- data[apply(data, 1, var, na.rm=TRUE)!= 0.0, ]

I also removed all GeneSymbols containing "-" to make sure that no symbols like Gnai3-201 remain in the dataset.

Applying imsig(data, r=0.0) gives:

---> Maximum number of splits: floor(n/2) = 0
---> WARNING: number of splits nSplit > 0
---> WARNING: using maximum number of splits: nSplit = 0
Error in fastCor(t(exp)) : invalid nSplit: 0

My data in the data.frame look like this:

;1; 2; 3; 4;
GDF15 ; 2.252020e-01; 2.139837e+02; 6.835993e+00; 4.944126e+01

I currently use a subset of about 200 genes with 30 columns.
I also made sure that there are no duplicate GeneSymbols in the data.

Thanks in advance,
Nicolas

Question regarding imsig() output

Hello!

I was able to successfully run ImSig on a set of transcriptomes and am now interpreting the results (the output for the imsig() function). How were the scores for each gene signature calculated? Scores across samples ranged from 0.2 to 100 in one signature and 900 - 3,000 in another. I assume that this is different from the r score?

Thank you very much!
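
Scores in the hundreds or thousands cannot be correlation coefficients, which are bounded by -1 and 1, so they are on a different scale from r. One reading, which is an assumption here and not confirmed by this thread, is that each score is an average of the expression values of the signature genes in a sample. Under that assumption, signatures made of highly expressed genes naturally score higher than signatures made of weakly expressed ones, as the toy example shows.

```r
# Toy expression matrix with two hypothetical signatures on different scales.
expr <- matrix(
  c(10, 20, 30,
    14, 22, 36,
    900, 1500, 3000,
    1100, 1700, 2800),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A1", "A2", "B1", "B2"), c("s1", "s2", "s3"))
)
signatures <- list(sigA = c("A1", "A2"), sigB = c("B1", "B2"))

# Mean expression of each signature's genes, per sample.
mean_scores <- sapply(signatures,
                      function(genes) colMeans(expr[genes, , drop = FALSE]))
print(mean_scores)
```

If this is how the scores arise, they are comparable across samples within one signature but not across signatures.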

Type of input data

Hi,

Just curious - what kind of input data is expected by imsig: TPM/FPKM/log2(counts)/raw counts?

Thanks
Komal

Error in fastCor(t(exp)) : invalid nSplit: 0

When I run gene_stat or imsig on my dataset, I see the following:

---> Maximum number of splits: floor(n/2) = 0
---> WARNING: number of splits nSplit > 0
---> WARNING: using maximum number of splits: nSplit = 0
Error in fastCor(t(exp)) : invalid nSplit: 0

Example of my data

head(exp)[1:6]
          sample.39954 sample.39549 sample.39340 sample.39335 sample.39782 sample.39476
Gnai3-201   1834.61000    774.60800   1965.65000    2226.2400    293.46900       230.23
Pbsn-201       0.00000      0.00000      0.00000       0.0000      0.00000         0.00
Hoxb9-201      6.10689      0.00000      7.74459       0.0000      1.24906         0.00
Cdc45-201     40.13370      8.42056     26.81490      25.5792      7.60989         0.00
Igf2-201     244.81500     15.69920    403.01100     316.1610     15.89030         0.00
Apoh-201       0.00000    115.75100    114.91700       0.0000      0.00000         0.00

Is this due to low read values? Any help would be appreciated.
Thanks,
s7
<R 3.5.1 GUI 1.70 El Capitan build (7543)>
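
The row names above look like transcript-style names (a mouse gene symbol plus a numeric suffix), which would not match a signature keyed on human gene symbols. If no signature gene is found, fewer than two variables reach the correlation step, which fits the "nSplit: 0" message. The sketch below strips the suffix, upper-cases the symbol and sums rows that collapse onto the same name. Upper-casing is only a crude heuristic and not a real mouse-to-human ortholog mapping; summing suits linear-scale values, not log-transformed ones. `clean_symbols` and the toy rows are hypothetical.

```r
# Toy data in the same style as the post ("Gnai3-202" is made up).
expr <- data.frame(
  sample.1 = c(1834.6, 10.0, 6.1),
  sample.2 = c(774.6, 5.0, 0.0),
  row.names = c("Gnai3-201", "Gnai3-202", "Hoxb9-201")
)

clean_symbols <- function(expr) {
  # Strip a trailing "-<digits>" suffix, e.g. "Gnai3-201" -> "Gnai3".
  symbols <- sub("-[0-9]+$", "", rownames(expr))
  # Crude heuristic: upper-case to resemble human symbols.
  symbols <- toupper(symbols)
  # Sum rows that collapse onto the same symbol.
  summed <- rowsum(as.matrix(expr), group = symbols)
  as.data.frame(summed)
}

cleaned <- clean_symbols(expr)
print(cleaned)
```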

Error in cor_data[fg, fg] : subscript out of bounds

I am using imsig to validate my findings.

This is my code:
exp <- readRDS('raw_counts.Rds')
exp <- log2(exp + 1)
gene_stat(exp = exp, r = 0.7)
imsig_main <- imsig(exp = exp, r = 0.7)
# The error occurs here:
plot_abundance(exp = exp, r = 0.7)

#---> Checking zero-variance data...
#---> Total number of variables: 77
#---> WARNING: 2 variables found with zero variance
#---> Checking zero-variance data...
#---> Total number of variables: 83
#---> WARNING: 1 variables found with zero variance
#---> Checking zero-variance data...
#---> Total number of variables: 520
#---> WARNING: 3 variables found with zero variance
#Error in cor_data[fg, fg] : subscript out of bounds
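
In base R, "subscript out of bounds" on a named matrix means at least one requested name is missing from its dimnames. Since the log reports variables being removed for zero variance, one possible cause, stated here as an assumption, is that a gene dropped at that step is still requested later. The sketch below reproduces the message on toy data and shows how restricting to existing names avoids it; `cor_data` and `fg` only mirror the names in the error and are not the package's objects.

```r
cor_data <- matrix(c(1, 0.8, 0.8, 1), nrow = 2,
                   dimnames = list(c("CD3E", "CD2"), c("CD3E", "CD2")))
fg <- c("CD3E", "CD2", "CD19")  # CD19 is not in the matrix

# Indexing by a missing name reproduces the error message.
msg <- tryCatch(cor_data[fg, fg], error = function(e) conditionMessage(e))
print(msg)

# Restricting to names that exist avoids it.
fg_present <- intersect(fg, rownames(cor_data))
sub_cor <- cor_data[fg_present, fg_present, drop = FALSE]
print(setdiff(fg, rownames(cor_data)))  # genes that were dropped
```

Removing zero-variance rows from the input before calling any package function is a cheap thing to try first.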

how to add gene symbols on network?

Hi Ajith,

Thanks for the great tool. Could you explain how to show gene symbols on the network plot?

Cheers,
Venkat

Showing NA in certain cell types while applying ImSig on scRNA-seq data

Hi Ajit,

I ran into a problem while applying ImSig to scRNA-seq data with the ImSig R package.
The input data is PBMC scRNA-seq data (scran-normalized, ~10K cells).

I don't know why the T cell column is all NA. There should be T cells in PBMC data.

Here is a screenshot of the result:

Thank you for your time!

Error in Surv(time, status) : Time variable is not numeric

I'm trying to perform the survival analysis but I keep getting the following error message:

Error in Surv(time, status) : Time variable is not numeric

I have a survival df which says the values in the two columns are both numeric so I'm not sure what the issue is!

str(BCON_os)
'data.frame': 234 obs. of 2 variables:
$ os.time : num 27 57.4 77.5 32.5 89.6 16 4.5 13.8 18.6 73.5 ...
$ os.status: num 1 1 0 1 0 1 1 1 1 1 ...
imsig_survival (exp = BCON_expr, cli = BCON_os, r = 0.6, time = 'os.time', status= 'os.status')
Error in Surv(time, status) : Time variable is not numeric

Please can anyone advise?
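
The message shows that Surv() received a non-numeric time argument even though the column itself is numeric. One possibility, which is a guess and not confirmed from the package source, is that the column name string reaches Surv() instead of the column values when the names differ from `time` and `status`. A workaround to try under that assumption is to copy the columns to those names first; `standardise_cli` is a hypothetical helper.

```r
# Toy clinical table with the column names from the post.
cli <- data.frame(os.time = c(27, 57.4, 77.5), os.status = c(1, 1, 0))

# Hypothetical helper: copy the chosen columns to `time` and `status`.
standardise_cli <- function(cli, time_col, status_col) {
  out <- cli
  out$time <- as.numeric(cli[[time_col]])
  out$status <- as.numeric(cli[[status_col]])
  out
}

cli2 <- standardise_cli(cli, "os.time", "os.status")
str(cli2)

# Then call with those names (requires imsig; untested here):
# imsig_survival(exp = BCON_expr, cli = cli2, r = 0.6,
#                time = "time", status = "status")
```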

Question regarding plot_survival interpretation

EDIT: answered my own question!
