endoPredict, gene70, gene76 and possibly oncotypedx functions don't appear to work

Hi there,

This is my first ever Issue submission for a github project, so please bear with me.

I am trying to run the functions listed above (endoPredict, gene70, gene76 and oncotypedx), and they all produce NULL output or, in some cases, exactly the same risk/score using my data. When I tried to run the example script (usually something along the lines of
rs.vdxs <- function(data=data.vdxs, annot=annot.vdxs, do.mapping=TRUE)), either with or without the mapping, I also just got NULL results back, which leads me to believe that something might be off with the functions.

Running the latest version of R (4.0.3) and Genefu (2.22.0).

Any help would be much appreciated, thanks!

Centroids of PAM50 and Different result for TCGA-BRCA RNA-seq data

I am using the genefu package to predict subtypes for my breast cancer RNA-seq data, using the pam50.robust model. However, I noticed that the centroid values have changed: I ran the same analysis about a month ago and saved the centroids, and those values differ from the current ones. I do not understand why, since as far as I know the model should use the official centroids from Parker, 2009.

I also have another question. I subtyped TCGA-BRCA data with pam50.robust, and the result did not match the one from TCGAbiolinks, which was also subtyped with PAM50 (without the genefu package). Does the prediction depend on the data? Or does the genefu package introduce differences compared to PAM50 alone?

package not available for R version 3.1.2

Hi!

I am trying to install genefu and it doesn't seem to work with the new version of R, 3.2.2.
I also tried on another machine, running R 3.1.2.
I also tried installing from the URL:
> install.packages(url="http://bioconductor.org/packages/genefu/", 'genefu')
Warning in install.packages : package ‘genefu’ is not available (for R version 3.1.2)

Thanks in advance!

Subtyping mouse samples

Hello,

I'm trying to subtype (PAM50) mouse breast cancer samples (RNA-seq) and I was wondering whether genefu accounts for the fact that mouse genes have different gene IDs from their human orthologs. Please advise.
Thanks.
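
One way to approach the identifier question, sketched under the assumption that the PAM50 model is keyed on human gene identifiers, is to translate the mouse genes to their human orthologs before subtyping. The ortholog table and the toy matrix below are hypothetical and only illustrate the renaming step; a real table would come from a curated ortholog resource.

```r
# Hypothetical ortholog table: in practice this would come from a curated
# resource; these four rows are only for illustration.
orthologs <- data.frame(
  mouse = c("Esr1", "Erbb2", "Mki67", "Krt5"),
  human = c("ESR1", "ERBB2", "MKI67", "KRT5"),
  stringsAsFactors = FALSE
)

# Toy expression matrix: samples in rows, mouse gene symbols in columns.
expr_mouse <- matrix(
  c(5.1, 7.3, 2.2, 4.0, 1.5,
    6.0, 3.1, 8.4, 2.9, 0.7),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"),
                  c("Esr1", "Erbb2", "Mki67", "Krt5", "Gm1234"))
)

# Keep only genes with a known human ortholog and rename the columns.
idx <- match(colnames(expr_mouse), orthologs$mouse)
expr_human <- expr_mouse[, !is.na(idx), drop = FALSE]
colnames(expr_human) <- orthologs$human[idx[!is.na(idx)]]

print(colnames(expr_human))
```

Genes without a one-to-one ortholog are simply dropped here; how to handle one-to-many orthologs is a separate decision that this sketch does not address.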

About claudinLow function to analyze three groups of patients

Hi "genefu" package team:

I have three groups of patients and plan to run the claudinLow analysis on them, but the following error is reported:
Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent.

When I use only two groups of patients, it works perfectly.

I found that the error is caused by these two lines of code of the claudinLow function

colnames(distances) <- c("euclidian distance to Claudin-low", "euclidian distance to Others")

What I want to know is whether there is a way to perform the claudinLow analysis on three groups of patients.

Thank you !!

oncotypedx risk score NA

I am trying to use the powerful R package "genefu" to compute the risk score for my dataset.
However, following the user's guide and vignettes provided on the Bioconductor platform, all the results I get are NA.
My R code is as follows:

##ddata has been prepared as a matrix of RNA-seq gene expression values with samples in rows and probes in columns
##dannot has been prepared as a matrix of annotations; first column: gene symbol (colname probe), second: gene.symbol, third: EntrezGene.ID

rs.exp<-oncotypedx(data=ddata,annot=dannot,mapping=TRUE)
table(rs.exp$score)
##< table of extent 0 > is returned, that is, all scores are NA.
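
As a side note on the diagnostic itself: table() drops NA values by default, so an all-NA score vector prints as a table of extent 0. The sketch below uses a toy vector as a stand-in for rs.exp$score (an assumption, not the real output) and shows how to make the missing values visible.

```r
# Toy stand-in for rs.exp$score: every value is NA.
score <- c(s1 = NA_real_, s2 = NA_real_, s3 = NA_real_)

# table() drops NA by default, which yields a table of extent 0.
t_default <- table(score)

# useNA = "ifany" keeps the NA count visible.
t_with_na <- table(score, useNA = "ifany")

# Direct count of missing scores.
n_missing <- sum(is.na(score))

print(t_with_na)
print(n_missing)
```

It may also be worth comparing the call with ?oncotypedx. As far as I recall, the flag that switches the mapping on is named do.mapping, while mapping is a separate argument, but that should be verified against the installed version.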

PAM50 classification

Hello,

I am using your package genefu for PAM50 classification of tumors. I have observed that samples analyzed with HGU133A do not contain 6 of the 50 PAM50 genes: ANLN, CDC20, NUF2, CXXC5, GPR160, KNTC2AP and PHGDH. In your example you used 2 datasets that were analyzed with HGU133A (GSE11121 and GSE7390), and you give a PAM50 classification of the samples. I would like to know how you perform it. Did you check beforehand that the result is the same with or without these 6 genes? Do you have any bibliographic support for this decision?

Thank you very much for your help in advance.

Best regards

molecular.subtyping error

Hello!
I am trying to analyze a dataset with the genefu package, but I have a problem that I could not solve.
I have prepared a data matrix with the gene expression (ddata) and a matrix with the annotations (dannot).
When I run the function "molecular.subtyping" or "intrinsic.cluster.predict" I always get the same result:

Error in rep(NA, nrow(data)) : invalid 'times' argument
In addition: Warning message:
In geneid.map(geneid1 = gid, data1 = data, geneid2 = centroids.gid, :
no gene ids in common!

I have checked whether I have duplicated data, samples or genes, and I still get the same problem.
Could you help me with this question?

Thank you very much in advance
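
A "no gene ids in common" warning usually points at the identifiers rather than at the expression values, so a quick overlap check can help narrow it down. The sketch below uses toy identifiers; the "geneid." prefix and the three vectors are illustrative assumptions, not the contents of ddata, dannot or the model.

```r
# Toy inputs; the identifiers and the "geneid." prefix are illustrative.
data_ids  <- c("geneid.2099", "geneid.2064", "geneid.4288")  # e.g. colnames(ddata)
annot_ids <- c(2099, 2064, 4288)                             # e.g. dannot$EntrezGene.ID
model_ids <- c("2099", "2064", "7153")                       # IDs a model might expect

# Normalise everything to plain character IDs before comparing.
strip <- function(x) sub("^geneid\\.", "", as.character(x))

# Overlap before and after normalisation.
common_raw   <- intersect(data_ids, model_ids)
common_clean <- intersect(strip(data_ids), strip(model_ids))

# Every annotated ID should correspond to a column of the data.
annot_ok <- all(strip(annot_ids) %in% strip(data_ids))

print(length(common_raw))
print(common_clean)
print(annot_ok)
```

If the raw overlap is empty but the normalised one is not, the mismatch is a formatting issue (prefixes, numeric versus character IDs, symbols versus Entrez IDs) rather than missing genes.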

Potential mapping issue

https://www.biostars.org/p/313540/

molecular.subtyping

Dear Haibe-Kains:
When we used the molecular.subtyping function, we got the following message: Error in intrinsic.cluster.predict(sbt.model = pam50.robust, data = data, : object 'pam50.robust' not found. However, our input was in the correct format: a matrix of samples (rows) x genes (cols). The data was as follows:
AL627309.5 LINC01409 FAM87B
TCGA.97.7938 0.58079576 0.8061310 0.6265260
TCGA.J2.A4AG 0.31885458 0.8467873 0.7131940
The same error did not appear in the genefu introduction or in the "sbt.vdx.SCMGENE" example on rdrr.io (https://rdrr.io/bioc/genefu/man/molecular.subtyping.html). Thank you very much.

preprocessing of test data for genefu

Dear genefu team,

Thank you for providing this great resource! I have a few questions regarding the preprocessing of test data for use with genefu, in particular for PAM50 classification. I understand that there's no 'one-fits-all' approach, but would appreciate your input/recommendations, please.

Based on the examples in the vignette, I assume the test data matrix is expected to contain all genes/probes on the array and the 'molecular.subtyping' function then extracts the 50 PAM50 genes based on the provided annotation? How is the data from multiple probes collapsed to gene level? What if the input data matrix only contained those 50 genes, would that affect the classification results?
Since PAM50 is based on microarray data, I assume the test data is expected to be log2intensities. Is there any normalisation of the test data expected/recommended before input into genefu eg. quantile normalisation across samples and/or gene-wise scaling? Or does the 'molecular.subtyping' function do any required normalisation 'under the hood'?
In Cascianelli et al 2020, the authors state that "before calculating distances from subtype centroids, gene expression values for each sample must be transformed into Log2ratios against a reference sample, to be defined for each dataset. Typically, to avoid representation bias, such reference is constructed within the dataset by calculating for each gene the median across a subset of samples with a fixed proportion (60/40) of Estrogen Receptor-positive (ER+) and -negative (ER−) cases, as done for the original PAM50 training." Is this something genefu does in the background or is the user expected to provide the input data matrix as log2ratios to a 'reference' as described above?
I have seen the other entries regarding RNAseq data input and appreciate that PAM50 has not been designed for rnaseq data. However, in your Fumagalli et al 2014 publication, you have used log2(FPKM+1) values and thus, would you think this is a good place to start for classification on rnaseq data using genefu or has your view on this changed since then?
What's the difference between molecular.subtyping(), intrinsic.cluster.predict() and subtype.cluster.predict() ... are the latter ones just older/deprecated versions of the first function?
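
To make the reference construction quoted from Cascianelli et al. concrete, here is a minimal sketch of gene-wise centring against the median of an ER-balanced subset. The toy data, the ER labels and the way the 60/40 subset is drawn are all assumptions for illustration; whether genefu does anything like this internally is exactly the open question above.

```r
set.seed(1)

# Toy log2 expression: 12 samples (rows) x 4 genes (columns).
expr <- matrix(rnorm(48, mean = 8, sd = 1), nrow = 12,
               dimnames = list(paste0("s", 1:12), paste0("g", 1:4)))

# Hypothetical ER status: 8 ER+ and 4 ER- samples.
er_pos <- rep(c(TRUE, FALSE), times = c(8, 4))

# Build a reference subset with a 60/40 ER+/ER- proportion:
# 4 ER- samples imply 4 * 60 / 40 = 6 ER+ samples.
n_neg <- sum(!er_pos)
n_pos <- min(sum(er_pos), round(n_neg * 60 / 40))
ref_idx <- c(sample(which(er_pos), n_pos), which(!er_pos))

# Gene-wise median over the reference subset, subtracted from every sample.
# On the log2 scale, this subtraction is the log2 ratio against the reference.
ref_median <- apply(expr[ref_idx, , drop = FALSE], 2, median)
expr_centered <- sweep(expr, 2, ref_median, FUN = "-")

print(round(ref_median, 2))
```

For RNA-seq input, the same centring could be applied after a log2(FPKM + 1) transform, as mentioned in the question about Fumagalli et al. 2014.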

Kind regards, rocanja

Different results when subtyping with AIMS within genefu and using AIMS package

Hello. I ran AIMS classification using the genefu package. To cross-check my results I did the same using the AIMS package. The conclusion is that the results differ. I get different subtypes assigned to the same samples, and also the probability scores are much lower when using genefu. Why is this happening? Note, I have used exactly the same gene expression data for both packages.

Can this package be applied to RNA-seq data?

Hi.
Can this package be applied to RNA-seq data, for example the oncotypedx and gene70 functions? Or is only Affymetrix data allowed?

RORS vs RORP

Currently: PAM50 risk of relapse score based on subtype (RORS)
Need to investigate addition/modification of ROR based on subtype and proliferation (RORP)

question about samples treatment conditions when using molecular.subtyping

Hi, hope this message finds you well. I found this tool super helpful, and I can run the code successfully, which I am very happy about. But I have a question about the proper way to use it. The RNA-seq samples I am testing come from breast cancer patients who are either treatment-naive or have received chemo/radiotherapy. Does this subtyping tool have any requirement regarding the patients' treatment condition? One more question: does this tool only work on primary tumors, or can it also work on metastatic tumors? Many thanks for your time and help.

AIMS classification fails under windows while works perfectly under linux

Hi, sorry to bother you, but I have been using the genefu package under Linux for a while, and now I am testing the exact same code under Windows and getting the following problem:

library(MetaGxBreast)
library(genefu)

ALL= loadBreastEsets(loadString = c("UPP","TRANSBIG"), removeDuplicates = TRUE,
                     quantileCutoff = 0, rescale = FALSE, minNumberGenes = 0,
                     minNumberEvents = 0, minSampleSize = 0, removeRetracted = TRUE,
                     removeSubsets = TRUE, keepCommonOnly = FALSE, imputeMissing = FALSE)

esets=ALL$esets

i="UPP"
#For each duplicated EntrezGene.ID, keep only the probe with the highest variance
Dup=unique(fData(esets[[i]])[which(duplicated(fData(esets[[i]])$EntrezGene.ID)),"EntrezGene.ID"])
Var= apply(exprs(esets[[i]]),1,var)
drop=NULL
for(j in Dup){
  pos=which(fData(esets[[i]])$EntrezGene.ID==j)
  drop= c(drop,pos[-which.max(Var[pos])])
}
esets[[i]]=esets[[i]][-drop,]

featureNames(esets[[i]]) <- fData(esets[[i]])$EntrezGene.ID

annot= fData(esets[[i]])
colnames(annot)[2]="Gene.Symbol"
annot$probe=annot$EntrezGene.ID

#Perform molecular subtyping
AIMS<-  molecular.subtyping(sbt.model = "AIMS",data = t(exprs(esets[[i]])), annot = annot,do.mapping = TRUE)
pData(esets[[i]])$AIMS=AIMS$subtype

#WINDOWS OUTPUT:
#You are missing the pair or have more than one 11004<25759 in 
#Current k = 20
#Error in if (object$isnumeric[i] != is.numeric(newdata[[i]])) warning(paste0("Type mismatch #between training and new data for variable '",  :  argument is of length zero

#LINUX OUTPUT (No error):
#You are missing the pair or have more than one 11004<25759 in 
#Current k = 20

WINDOWS sessionInfo()

sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Argentina.1252 LC_CTYPE=Spanish_Argentina.1252 LC_MONETARY=Spanish_Argentina.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Argentina.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] genefu_2.14.0 AIMS_1.14.1 BiocVersion_3.8.0 e1071_1.7-1
[5] iC10_1.5 iC10TrainingData_1.3.1 pamr_1.56.1 cluster_2.0.9
[9] biomaRt_2.38.0 limma_3.38.3 mclust_5.4.3 survcomp_1.32.0
[13] prodlim_2018.04.18 survival_2.44-1.1 MetaGxBreast_1.2.0 ExperimentHub_1.8.0
[17] AnnotationHub_2.14.5 impute_1.56.0 lattice_0.20-38 Biobase_2.42.0
[21] BiocGenerics_0.28.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 prettyunits_1.0.2 class_7.3-15
[4] assertthat_0.2.1 digest_0.6.18 mime_0.6
[7] R6_2.4.0 stats4_3.5.2 RSQLite_2.1.1
[10] BiocInstaller_1.32.1 httr_1.4.0 rlang_0.3.4
[13] progress_1.2.2 curl_3.3 rstudioapi_0.10
[16] blob_1.1.1 S4Vectors_0.20.1 Matrix_1.2-17
[19] splines_3.5.2 stringr_1.4.0 RCurl_1.95-4.12
[22] bit_1.1-14 shiny_1.3.2 compiler_3.5.2
[25] httpuv_1.5.1 pkgconfig_2.0.2 htmltools_0.3.6
[28] interactiveDisplayBase_1.20.0 IRanges_2.16.0 XML_3.98-1.19
[31] crayon_1.3.4 later_0.8.0 bitops_1.0-6
[34] SuppDists_1.1-9.4 grid_3.5.2 xtable_1.8-4
[37] DBI_1.0.0 magrittr_1.5 amap_0.8-16
[40] KernSmooth_2.23-15 stringi_1.4.3 promises_1.0.1
[43] rmeta_3.0 lava_1.6.5 tools_3.5.2
[46] bit64_0.9-7 hms_0.4.2 yaml_2.2.0
[49] AnnotationDbi_1.44.0 BiocManager_1.30.4 survivalROC_1.0.3
[52] bootstrap_2017.2 memoise_1.1.0

LINUX sessionInfo()

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8
[10] LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods
[10] base

other attached packages:
[1] reshape2_1.4.3 dplyr_0.7.8 gageData_2.20.0
[4] gage_2.32.1 RColorBrewer_1.1-2 bindrcpp_0.2.2
[7] consensusOV_1.4.1 MetaGxOvarian_1.2.0 hgu133plus2.db_3.2.3
[10] edgeR_3.24.3 TCGAbiolinks_2.10.3 xlsx_0.6.1
[13] a4Base_1.30.0 a4Core_1.30.0 a4Preproc_1.30.0
[16] glmnet_2.0-16 Matrix_1.2-15 multtest_2.38.0
[19] genefilter_1.64.0 mpm_1.0-22 KernSmooth_2.23-15
[22] annaffy_1.54.0 KEGG.db_3.2.3 GO.db_3.7.0
[25] GEOquery_2.50.5 MetaGxBreast_1.2.0 ExperimentHub_1.8.0
[28] AnnotationHub_2.14.2 impute_1.56.0 nsga2R_1.0
[31] mco_1.0-15.1 annotate_1.60.0 XML_3.98-1.16
[34] org.Hs.eg.db_3.7.0 AnnotationDbi_1.44.0 WGCNA_1.66
[37] fastcluster_1.1.25 dynamicTreeCut_1.63-1 biclust_2.0.1
[40] colorspace_1.3-2 MASS_7.3-51.1 shiny_1.2.0
[43] subspace_1.0.4 survminer_0.4.3 ggpubr_0.2
[46] magrittr_1.5 genefu_2.14.0 AIMS_1.14.0
[49] e1071_1.7-0 iC10_1.4.2 iC10TrainingData_1.3.1
[52] pamr_1.55 biomaRt_2.38.0 limma_3.38.3
[55] mclust_5.4.2 illuminaio_0.24.0 caret_6.0-81
[58] ggplot2_3.1.0 lattice_0.20-38 matchingR_1.3.0
[61] Rcpp_1.0.0 gpuR_2.0.0 cluster_2.0.7-1
[64] cba_0.2-19 proxy_0.4-22 doParallel_1.0.14
[67] iterators_1.0.10 foreach_1.4.4 gplots_3.0.1
[70] survcomp_1.32.0 prodlim_2018.04.18 survival_2.43-1
[73] SummarizedExperiment_1.12.0 DelayedArray_0.8.0 BiocParallel_1.16.2
[76] matrixStats_0.54.0 Biobase_2.42.0 GenomicRanges_1.34.0
[79] GenomeInfoDb_1.18.1 IRanges_2.16.0 S4Vectors_0.20.1
[82] BiocGenerics_0.28.0 igraph_1.2.4

loaded via a namespace (and not attached):
[1] Hmisc_4.1-1 class_7.3-14
[3] assertive.properties_0.0-4 Rsamtools_1.34.1
[5] crayon_1.3.4 nlme_3.1-137
[7] backports_1.1.3 sva_3.30.0
[9] rlang_0.3.0.1 XVector_0.22.0
[11] rjson_0.2.20 cmprsk_2.2-7
[13] bit64_0.9-7 glue_1.3.0
[15] tidyselect_0.2.5 km.ci_0.5-2
[17] tidyr_0.8.2 assertive.types_0.0-3
[19] zoo_1.8-4 SuppDists_1.1-9.4
[21] GenomicAlignments_1.18.1 xtable_1.8-3
[23] zlibbioc_1.28.0 hwriter_1.3.2
[25] rstudioapi_0.8 rpart_4.1-13
[27] GSVA_1.30.0 xfun_0.4
[29] caTools_1.17.1.1 KEGGREST_1.22.0
[31] tibble_1.4.2 interactiveDisplayBase_1.20.0
[33] flexclust_1.4-0 ggrepel_0.8.0
[35] base64_2.0 assertive.sets_0.0-3
[37] xlsxjars_0.6.1 Biostrings_2.50.1
[39] png_0.1-7 ipred_0.9-8
[41] withr_2.1.2 bitops_1.0-6
[43] plyr_1.8.4 assertive.base_0.0-7
[45] GSEABase_1.44.0 pcaPP_1.9-73
[47] assertive.models_0.0-2 ggvis_0.4.4
[49] pillar_1.3.1 GlobalOptions_0.1.0
[51] GenomicFeatures_1.34.3 assertive.matrices_0.0-2
[53] GetoptLong_0.1.7 assertive.reflection_0.0-4
[55] generics_0.0.2 lava_1.6.4
[57] tools_3.5.1 foreign_0.8-71
[59] munsell_0.5.0 fit.models_0.5-14
[61] compiler_3.5.1 httpuv_1.4.5.1
[63] rtracklayer_1.42.1 assertive.data.uk_0.0-2
[65] rJava_0.9-10 GenomeInfoDbData_1.2.0
[67] gridExtra_2.3 assertive.data.us_0.0-2
[69] later_0.7.5 recipes_0.1.4
[71] jsonlite_1.6 scales_1.0.0
[73] graph_1.60.0 lazyeval_0.2.1
[75] promises_1.0.1 latticeExtra_0.6-28
[77] R.utils_2.7.0 checkmate_1.8.5
[79] downloader_0.4 selectr_0.4-1
[81] yaml_2.2.0 survivalROC_1.0.3
[83] htmltools_0.3.6 memoise_1.1.0
[85] modeltools_0.2-22 locfit_1.5-9.1
[87] digest_0.6.18 rrcov_1.4-7
[89] assertthat_0.2.0 mime_0.6
[91] KMsurv_0.1-5 DESeq_1.34.1
[93] assertive.code_0.0-3 RSQLite_2.1.1
[95] amap_0.8-16 assertive.strings_0.0-3
[97] data.table_1.11.8 blob_1.1.1
[99] R.oo_1.22.0 survMisc_0.5.5
[101] preprocessCore_1.44.0 shinythemes_1.1.2
[103] splines_3.5.1 Formula_1.2-3
[105] RCurl_1.95-4.11 broom_0.5.1
[107] assertive.numbers_0.0-2 hms_0.4.2
[109] ConsensusClusterPlus_1.46.0 base64enc_0.1-3
[111] BiocManager_1.30.4 shape_1.4.4
[113] EDASeq_2.16.3 assertive.files_0.0-2
[115] nnet_7.3-12 matlab_1.0.2
[117] mvtnorm_1.0-8 circlize_0.4.5
[119] ModelMetrics_1.2.2 R6_2.3.0
[121] acepack_1.4.1 ShortRead_1.40.0
[123] curl_3.2 gdata_2.18.0
[125] robustbase_0.93-4 assertive.data_0.0-3
[127] stringr_1.3.1 gower_0.1.2
[129] htmlwidgets_1.3 purrr_0.2.5
[131] rvest_0.3.2 ComplexHeatmap_1.20.0
[133] mgcv_1.8-25 openssl_1.1
[135] htmlTable_1.12 robust_0.4-18
[137] codetools_0.2-15 lubridate_1.7.4
[139] randomForest_4.6-14 gtools_3.8.1
[141] prettyunits_1.0.2 R.methodsS3_1.7.1
[143] gtable_0.2.0 DBI_1.0.0
[145] aroma.light_3.12.0 httr_1.4.0
[147] stringi_1.2.4 progress_1.2.0
[149] ggthemes_4.1.0 timeDate_3043.102
[151] xml2_1.2.0 assertive_0.3-5
[153] assertive.datetimes_0.0-2 rmeta_3.0
[155] additivityTests_1.1-4 readr_1.3.1
[157] geneplotter_1.60.0 DEoptimR_1.0-8
[159] bit_1.1-14 pkgconfig_2.0.2
[161] bootstrap_2017.2 bindr_0.1.1
[163] knitr_1.21

Pam50 and Pam50.robust models are identical

Hi,

there might be a mistake related to pam50 and pam50.robust models.
The help page states:

pam50
Use of the official centroids without scaling of the gene expressions.

pam50.scale
Use of the official centroids with traditional scaling of the gene expressions (see scale).

pam50.robust
Use of the official centroids with robust scaling of the gene expressions (see rescale).

However, the models differ from each other only regarding the attribute standardization (std).
The following code

sapply(names(genefu::pam50),
            function(x) identical(genefu::pam50[x],genefu::pam50.robust[x]))

results in

      method.cor method.centroids              std        rescale.q             mins 
            TRUE             TRUE            FALSE             TRUE             TRUE 
       centroids    centroids.map 
            TRUE             TRUE

The question is: these centroids match the ones found at https://genome.unc.edu/pubsup/breastGEO/pam50_centroids.txt, but are these the scaled or not scaled version?

Having both exactly the same (i.e. assuming they are supposed to be the same), except for the standardization parameter, is misleading because people might attempt to use the non-standardized/scaled model with their standardized/scaled data (or vice-versa) and get completely wrong results.
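
To illustrate why the std setting matters even when the centroids are identical, the sketch below contrasts the three kinds of preprocessing named on the help page on a toy vector. The robust_rescale helper is a hypothetical stand-in written for this example (a quantile-based rescaling to roughly [-1, 1]); it is an assumption about what "robust scaling" means here, not genefu's actual implementation.

```r
# Toy expression values for one gene across six samples (one outlier).
x <- c(2.0, 3.0, 4.0, 5.0, 6.0, 40.0)

# No scaling: values are used as they are.
x_none <- x

# Traditional scaling: centre to mean 0 and scale to unit standard deviation.
x_scale <- as.numeric(scale(x))

# Quantile-based rescaling (assumed form): map the q and 1 - q quantiles
# to -1 and 1, so extreme values influence the result less than with scale().
robust_rescale <- function(v, q = 0.05) {
  lo <- quantile(v, q, na.rm = TRUE, names = FALSE)
  hi <- quantile(v, 1 - q, na.rm = TRUE, names = FALSE)
  ((v - lo) / (hi - lo) - 0.5) * 2
}
x_robust <- robust_rescale(x)

print(round(cbind(none = x_none, scale = x_scale, robust = x_robust), 2))
```

The same input ends up on three different scales, which is why pairing already-scaled data with a model that scales again (or unscaled data with a model that does not) can change the nearest centroid.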

I hope it will help!

Regards

Renato

Update:

The same happens with pam50.scale indicating all of them are identical except for the std variable.

Seurat and genefu

Hello!

Thank you for the excellent package! I would like to use genefu's molecular.subtyping() function (using the pam50.robust model) on my Seurat object, and was wondering whether the Seurat object should be

only normalized beforehand with NormalizeData()
additionally scaled after normalization using ScaleData()

Thank you for reading!

PAM50

Consider the differing PAM50 calls (Genefu vs. Curtis vs. Parker) - should we add another 'official' PAM50?

Eric Paquet and Michael Hallett’s paper from last year (“Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype”, JNCI 2014) assessed PAM50 calls across the genefu, Parker, and Curtis methods and reported differences in the calls across the three.

How to use an in-house-dataset to evaluate centroids for PAM50 method

Dear developer,
I was looking for a way to use my own dataset to evaluate centroids for the intrinsic.cluster.predict function.
I managed to compute my own centroids using intrinsic.cluster, but then I was not able to recover the five different subtypes (the clusters are named cluster.1, cluster.2, cluster.3, etc.).
Is it possible to fix this problem?

Problem with subtype and crisp classification with AIMS

As I understand it what you call the subtype.crisp should be a binary matrix with 1 in the subtype where the sample is classified and 0 otherwise. If you run your example:

data(vdxs)
sbt.vdxs.AIMS <- molecular.subtyping(sbt.model="AIMS",data=data.vdxs, annot=annot.vdxs,do.mapping=FALSE)

You can see that some samples are classified differently in sbt.vdxs.AIMS$subtype and sbt.vdxs.AIMS$subtype.crisp. For example, VDX_78 is classified as HER2 but in the crisp matrix it has a 1 in LumA (other problems are VDX_93, VDX_612, VDX_9 etc.). I went through your raw script (molecular.subtyping.R) and I think the problem is when you build the probability matrix. You do it by row but actually it should be by column.
Instead of (line 126 in molecular.subtyping.R):

sbts$subtype.proba <- matrix(unlist(sbts$all.probs$`20`), ncol = 5, byrow = TRUE)

it should be

sbts$subtype.proba <- matrix(unlist(sbts$all.probs$`20`), ncol = 5)

I tried to run it like that and I have the impression that it’s working.
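
The effect of byrow can be reproduced on a toy example. The sketch assumes the unlisted object holds a samples x subtypes matrix, which unlist() flattens column by column; whether sbts$all.probs$`20` really has that layout is an assumption here.

```r
# Toy probability matrix: 3 samples (rows) x 5 subtypes (columns).
subtypes <- c("Basal", "Her2", "LumA", "LumB", "Normal")
probs <- matrix(
  c(0.70, 0.10, 0.10, 0.05, 0.05,
    0.05, 0.80, 0.05, 0.05, 0.05,
    0.10, 0.10, 0.60, 0.10, 0.10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), subtypes)
)

# Stand-in for the list element being unlisted: unlist() on a list holding
# a matrix flattens it column by column.
flat <- unlist(list(probs), use.names = FALSE)

rebuilt_by_col <- matrix(flat, ncol = 5)                # column-major fill
rebuilt_by_row <- matrix(flat, ncol = 5, byrow = TRUE)  # row-major fill

# Subtype call per sample from each rebuilt matrix.
call_by_col <- subtypes[apply(rebuilt_by_col, 1, which.max)]
call_by_row <- subtypes[apply(rebuilt_by_row, 1, which.max)]

print(call_by_col)
print(call_by_row)
```

Under that assumption, only the column-major fill recovers the original matrix, so the calls derived from the byrow = TRUE version no longer match the samples.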
Best,
Laurence

Genefu package

Click preview tab ^^^ above!

By continuing to file this new issue / feature request, I confirm I have :

Searched open and closed issues on this repository to ensure I am not duplicating an issue for a previously resolved problem, known bug or existing feature request.
Read all package documentation relevant to the functions or classes causing the problem, including the help pages. For example, by running ?functionName in the R terminal or using the help() function as well as consulting with relevant package vignettes.

Thanks! Please remove the text above and include the two items below.

# Minimal reproducible example; please be sure to set verbose=TRUE where possible!

# Output of sessionInfo()

Need for a column called "probe" in "annot" for "molecular.subtyping" and "intrinsic.cluster.predict" but not "oncotypedx" and "gene70"

Hey y'all,

It would be great if the help of "molecular.subtyping" and "intrinsic.cluster.predict" would indicate the need for a column called "probe" in "annot".

Right now it says only that at least one column named "EntrezGene.ID" (for ssp, scm, AIMS, and claudinLow models) or "Gene.Symbol" (for the intClust model) is needed, but a column called "probe" is also needed.

This is problematic because other functions such as "oncotypedx" and "gene70" require the rownames of "annot" to be the probe names, instead of a column called "probe" with the probe names.

It would be even better if this was unified (e.g., either "annot" always needs a "probe" column or it never does).
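
Until that is unified, one workaround that seems to cover both conventions described above is to carry the probe names twice, as a "probe" column and as the row names of "annot". The toy objects below are illustrative, and the sketch assumes that a function ignores whichever of the two it does not need.

```r
# Toy expression matrix: samples in rows, probes in columns.
expr <- matrix(
  c(7.1, 5.2, 9.3,
    6.8, 4.9, 8.7),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"), c("p1", "p2", "p3"))
)

# Annotation with the probe names both as a "probe" column and as row names.
annot <- data.frame(
  probe = colnames(expr),
  Gene.Symbol = c("ESR1", "ERBB2", "MKI67"),
  EntrezGene.ID = c(2099, 2064, 4288),
  stringsAsFactors = FALSE
)
rownames(annot) <- annot$probe

# Sanity checks before calling any signature function.
stopifnot(
  identical(rownames(annot), colnames(expr)),
  identical(annot$probe, colnames(expr)),
  !anyNA(annot$EntrezGene.ID)
)
```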

Cheers and thanks!

ggi() and gene70() commands input files

Hello,

I'm working on different bulk RNA-seq datasets and I've tried to compute GENE70, GGI and PAM50 classifications with "genefu", but I was only able to obtain the PAM50 classification. I could not get either GENE70 or GGI to work.
If I've understood correctly, the commands to compute PAM50, GGI and GENE70 scores are (respectively) the following:

molecular.subtyping(sbt.model = "pam50", data=matrix, annot=annotation, do.mapping=F)
ggi(data=matrix, annot=annotation)
gene70(data=matrix, annot=annotation)

where "matrix" is my expression matrix with rownames as sample names and colnames as gene names (in my case NCBI gene symbols), and "annotation" is a dataframe with a column containing the NCBI gene symbols and a column containing the respective EntrezGene.ID.

As I said, the PAM50 classification works, but the other two commands do not.

In particular, the "ggi" command runs, but I guess it is not able to map any of the genes (all the GGI scores are "NA"). If I set "do.mapping = T", I get an error saying:
"Error in data1[, gg.uniq, drop = FALSE] : subscript out of bounds".

For "gene70", if I don't specify "do.mapping = T" I obtain the error:
"Error in gene70(data = t(as.matrix(visium_brain@norm_expr)), annot = sig.ggi) :
object 'res' not found
In addition: Warning message:
In gene70(data = t(as.matrix(visium_brain@norm_expr)), annot = sig.ggi) :
No overalp between the gene signature EntrezGene.IDsand the colnames of your data... Returning all NAs."
If I put "do.mapping = T", I obtain the error:
"Error in data1[, gg.uniq, drop = FALSE] : subscript out of bounds".

The "matrix" and the "annotation" that I've used for PAM50 are exactly the same as the ones used for ggi and gene70.

Could someone help me to solve this issue? I guess there could be something wrong in the "annotation" file, but the weird thing is that it works well with the PAM50 command.
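
The gene70 warning quoted above mentions EntrezGene.IDs and the column names of the data, which suggests the signature may be matched on Entrez IDs rather than on symbols. A minimal sketch of renaming symbol-named columns to Entrez IDs is shown below; the toy matrix and annotation are assumptions, and whether this is sufficient for ggi and gene70 would need to be checked against their help pages.

```r
# Toy matrix with gene symbols as column names (samples in rows).
expr <- matrix(
  c(1, 2, 3, 4,
    5, 6, 7, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"), c("ESR1", "ERBB2", "MKI67", "NOTAGENE"))
)

# Annotation mapping symbols to Entrez IDs; NOTAGENE has no ID.
annot <- data.frame(
  Gene.Symbol = c("ESR1", "ERBB2", "MKI67", "NOTAGENE"),
  EntrezGene.ID = c("2099", "2064", "4288", NA),
  stringsAsFactors = FALSE
)

# Look up the Entrez ID for every column, drop unmapped or duplicated IDs,
# and rename the remaining columns.
ids <- annot$EntrezGene.ID[match(colnames(expr), annot$Gene.Symbol)]
keep <- !is.na(ids) & !duplicated(ids)
expr_entrez <- expr[, keep, drop = FALSE]
colnames(expr_entrez) <- ids[keep]

# Matching annotation, with row names equal to the new column names.
annot_entrez <- annot[match(ids[keep], annot$EntrezGene.ID), ]
rownames(annot_entrez) <- annot_entrez$EntrezGene.ID

print(colnames(expr_entrez))
```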

Thank you in advance

Subtype.Proba for CL

In the molecular subtyping function, need someone to revisit the subtype.proba output that is generated when using the Claudin low classifier.

To prevent ambiguity, the subtype.crisp calls for the CL classifier are generated based on the original classifier calls (and not using the subtype.proba)

Using genefu with tumors other than breast cancer

@bhklab

I am new to the concept of classification and am working through a tutorial of sorts with a solid tumor other than breast cancer. Would you please guide me on how to proceed?

I have my own subtype signature with 3 subtypes and about 500 genes
I have gene expression data for 20000 genes across 700 samples

The current status:
clast<-intrinsic.cluster(data=data, annot=annot, do.mapping=TRUE, std="robust", intrinsicg=intr, number.cluster=3, mins=3, method.cor="spearman", method.centroids="mean", verbose=TRUE)

489/500 probes are used in clustering

Note:

dim(data)
#700 20000
dim(annot)
#20000 3
dim(intr)
# 500 3

I got the "clast" object as a list, but I don't know how to assign the clusters to my signature list with three subtypes. I don't know how this signature was generated, but I have to use the 500 genes and 3 subtypes to subtype the 700 samples.

names(clast)
# "model" "subtype" "subtype.proba" "cor"

"subtype" and "subtype.proba" have names as cluster.1, cluster.2 and cluster.3

How should I associate the 3 subtype names in my signature gene list with the above cluster?

Note: the signature subtyper is a matrix of the following format
...
,Subtype1,subtype2,subtype 3
Gene1,2.000,-1.000 , 0.000
Gene500,3.000, 2.000,0.7222
...
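
One possible way to attach the signature's subtype names to cluster.1, cluster.2 and cluster.3 is to correlate each cluster centroid with each signature column and take the best match. The sketch below uses simulated data; the centroid matrix is a hypothetical stand-in, and where the centroids live inside the "clast" object is left as an assumption to check.

```r
set.seed(42)
genes <- paste0("Gene", 1:20)

# Hypothetical signature: genes x 3 named subtypes (stand-in for the 500-gene table).
signature <- matrix(rnorm(60), nrow = 20,
                    dimnames = list(genes, c("Subtype1", "Subtype2", "Subtype3")))

# Hypothetical cluster centroids, as if estimated from the data: noisy copies
# of the signature columns in a shuffled order (cluster.1 resembles Subtype3, etc.).
centroids <- signature[, c(3, 1, 2)] + matrix(rnorm(60, sd = 0.2), nrow = 20)
colnames(centroids) <- paste0("cluster.", 1:3)

# Correlate each cluster centroid with each signature subtype over shared genes
# and label every cluster with its best-correlated subtype.
shared <- intersect(rownames(centroids), rownames(signature))
cors <- cor(centroids[shared, ], signature[shared, ], method = "spearman")
cluster_label <- colnames(signature)[apply(cors, 1, which.max)]
names(cluster_label) <- rownames(cors)

print(round(cors, 2))
print(cluster_label)
```

If two clusters map to the same subtype, or the best correlations are weak, that is a sign that the unsupervised clusters do not correspond cleanly to the signature, and a nearest-centroid assignment against the signature itself may be the more direct route.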
Thank you for your guidance!

Sorry it is my first time to try it
