
TCGAbiolinks

Home Page: http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html


tcgabiolinks's Introduction

TCGAbiolinks - An R/Bioconductor package for integrative analysis with TCGA data

TCGAbiolinks accesses the National Cancer Institute (NCI) Genomic Data Commons (GDC) through its GDC Application Programming Interface (API) to search, download and prepare relevant data for analysis in R.
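
The search, download and prepare steps correspond to the three functions that recur in the issues further down: GDCquery, GDCdownload and GDCprepare. The following is a minimal sketch, assuming the package is installed and the GDC API is reachable. The wrapper name `fetch_tcga` is hypothetical, and the argument values are illustrative: valid values depend on the package version and the GDC release, and some releases may require extra filters such as `workflow.type`.

```r
# Hypothetical wrapper (not part of TCGAbiolinks) around the three-step workflow.
fetch_tcga <- function(project = "TCGA-BRCA") {
  if (!requireNamespace("TCGAbiolinks", quietly = TRUE)) {
    stop("TCGAbiolinks is not installed")
  }
  # 1. Search: describe which files are wanted (values are illustrative).
  query <- TCGAbiolinks::GDCquery(
    project       = project,
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification"
  )
  # 2. Download: fetch the files matched by the query.
  TCGAbiolinks::GDCdownload(query)
  # 3. Prepare: read the downloaded files into an R object.
  TCGAbiolinks::GDCprepare(query)
}

# Not run here because it needs network access:
# se <- fetch_tcga("TCGA-BRCA")
```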

Installation from GitHub

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("BioinformaticsFMRP/TCGAbiolinksGUI.data")
BiocManager::install("BioinformaticsFMRP/TCGAbiolinks")

Installation from Bioconductor

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("TCGAbiolinks")

Docker image

TCGAbiolinks is available as a Docker image (a self-contained environment that contains everything needed to run the software), which can be run on macOS, Windows and Linux systems.

This PDF shows how to install and run the image.

The image can be obtained from Docker Hub: https://hub.docker.com/r/tiagochst/tcgabiolinksgui/

For more information please check: https://docs.docker.com/ and https://www.bioconductor.org/help/docker/

Manual

http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Citation

Please cite the TCGAbiolinks package:

Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, Sabedot T, Malta TM, Pagnotta SM, Castiglioni I, Ceccarelli M, Bontempi G and Noushmehr H. "TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data." Nucleic acids research (2015): gkv1507.
Mounir, Mohamed, Lucchetta, Marta, Silva, C T, Olsen, Catharina, Bontempi, Gianluca, Chen, Xi, Noushmehr, Houtan, Colaprico, Antonio, Papaleo, Elena (2019). “New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx.” PLoS computational biology, 15(3), e1006701.
Silva TC, Colaprico A, Olsen C et al. TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages [version 2; peer review: 1 approved, 2 approved with reservations]. F1000Research 2016, 5:1542 (https://doi.org/10.12688/f1000research.8923.2)

Also, if you have used ELMER analysis please cite:

Yao, L., Shen, H., Laird, P. W., Farnham, P. J., & Berman, B. P. "Inferring regulatory element landscapes and transcription factor networks from cancer methylomes." Genome Biol 16 (2015): 105.
Yao, Lijing, Benjamin P. Berman, and Peggy J. Farnham. "Demystifying the secret mission of enhancers: linking distal regulatory elements to target genes." Critical reviews in biochemistry and molecular biology 50.6 (2015): 550-573.
Tiago C Silva, Simon G Coetzee, Nicole Gull, Lijing Yao, Dennis J Hazelett, Houtan Noushmehr, De-Chen Lin, Benjamin P Berman, ELMER v.2: an R/Bioconductor package to reconstruct gene regulatory networks from DNA methylation and transcriptome profiles, Bioinformatics, Volume 35, Issue 11, 1 June 2019, Pages 1974–1977, https://doi.org/10.1093/bioinformatics/bty902


tcgabiolinks's Issues

Error: could not find function "assay" and "assays"

I have downloaded TCGAbiolinks, but these GDC... functions do not work.
What am I doing wrong? For example:

BRCARnaseq_assay <- GDCprepare(query)
BRCAMatrix <- assay(BRCARnaseq_assay,"raw_counts")

Best regards

error in loading library

library(TCGAbiolinks)
Error in namespaceImportFrom(ns, loadNamespace(j <- i[[1L]], c(lib.loc, :
lazy-load database 'C:/Program Files/R/R-3.3.1/library/stringr/R/stringr.rdb' is corrupt
In addition: Warning messages:
1: In namespaceImportFrom(ns, loadNamespace(j <- i[[1L]], c(lib.loc, :
restarting interrupted promise evaluation
2: In namespaceImportFrom(ns, loadNamespace(j <- i[[1L]], c(lib.loc, :
internal error -3 in R_decompress1
Error: package or namespace load failed for ‘TCGAbiolinks’

GDCquery_Maf for mutect2

Hi,

I am using the GDCquery_Maf function in TCGAbiolinks v2.3.5 to retrieve MAF files from GDC. It works fine when pipeline is any of 'muse','somaticsniper', or 'varscan2', but it fails when I try 'mutect2' (Error message: 'cannot download all files / URL 400 Bad Request')

The same error happens for any tumor type. Would you be able to enable download of these MAF files as well?

cheers,
Sigve

problems getting data II

I execute

query2a <- GDCquery(project = "TCGA-COAD",
                    data.category = "Gene expression",
                    data.type = "Exon quantification",
                    legacy = TRUE,
                    sample.type = c("Solid Tissue Normal"))
GDCdownload(query2a)
data2a <- GDCprepare(query2a)

and I get

data2a
function (..., list = character(), package = NULL, lib.loc = NULL,
    verbose = getOption("verbose"), envir = .GlobalEnv)
{
    fileExt <- function(x) {
        db <- grepl("\\.[^.]+\\.(gz|bz2|xz)$", x)
        ans <- sub(".*\\.", "", x)
        ans[db] <- sub(".*\\.([^.]+\\.)(gz|bz2|xz)$", "\\1\\2",
            x[db])

.....

Confused vital status

Hi, I just found a result that confuses me.

> clin.query <- GDCquery(project = "TCGA-READ", data.category = "Clinical", barcode = "TCGA-F5-6702")
Accessing GDC. This might take a while...
> json  <- tryCatch(GDCdownload(clin.query), 
+                   error = function(e) GDCdownload(clin.query, method = "client"))
Of the 1 files for download 1 already exist.
All samples have been already downloaded
> clinical.patient <- GDCprepare_clinic(clin.query, clinical.info = "patient")
  |========================================================================================================================================================================================================| 100%
To get the following information please change the clinical.info argument
=> new_tumor_events: new_tumor_event 
=> drugs: drug 
=> follow_ups: follow_up 
=> radiations: radiation
Adding stage event information
  |========================================================================================================================================================================================================| 100%
> clinical.patient.followup <- GDCprepare_clinic(clin.query, clinical.info = "follow_up")
  |========================================================================================================================================================================================================| 100%
> clinical.index <- GDCquery_clinic("TCGA-READ")
> clinical.patient[,c("vital_status","days_to_death","days_to_last_followup")]
  vital_status days_to_death days_to_last_followup
1        Alive            NA                    66
> clinical.patient.followup[,c("vital_status","days_to_death","days_to_last_followup")]
  vital_status days_to_death days_to_last_followup
1         Dead           869                    NA
2        Alive            NA                   452
> clinical.index[clinical.index$submitter_id=="TCGA-F5-6702",
+                c("vital_status","days_to_death","days_to_last_follow_up")]
    vital_status days_to_death days_to_last_follow_up
159        alive           869                    452

The vital status should be "dead".
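
The three outputs above disagree because the patient record and the follow-up records were taken at different times. One way to reconcile them is to treat a patient as dead if any record reports a death. The sketch below applies that rule in base R to the values printed above; the rule itself is an assumption for illustration, not the logic used by GDCquery_clinic.

```r
# Values copied from the console output above for TCGA-F5-6702.
patient <- data.frame(
  vital_status          = "Alive",
  days_to_death         = NA_real_,
  days_to_last_followup = 66
)
followup <- data.frame(
  vital_status          = c("Dead", "Alive"),
  days_to_death         = c(869, NA),
  days_to_last_followup = c(NA, 452)
)

# Stack every record available for the patient.
records <- rbind(patient, followup)

# Assumed rule: a death reported in any record wins over "Alive".
is_dead <- tolower(records$vital_status) == "dead"
resolved <- data.frame(
  vital_status = if (any(is_dead)) "Dead" else "Alive",
  days_to_death = if (any(is_dead)) {
    max(records$days_to_death, na.rm = TRUE)
  } else {
    NA_real_
  },
  days_to_last_followup = max(records$days_to_last_followup, na.rm = TRUE)
)
print(resolved)
```

Under this rule the patient resolves to "Dead" with 869 days to death, which matches the expectation stated in the report.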

Problem Downloading Clinical Data

When I ran the following:
query <- GDCquery(project = "TCGA-OV", data.category = "Clinical")
GDCdownload(query, directory = "~/Projects/GDC/Clinical")

I got the following error:

Accessing GDC. This might take a while...
GDCdownload will download: 31.370846 MB compressed in a tar.gz file
Downloading as: Mon_Aug_22_09_22_25_2016.tar.gz
|======================================================================================| 100%
/bin/tar: This does not look like a tar archive

gzip: stdin: not in gzip format
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
[1] 2
Error in GDCdownload(query, directory = "~/Projects/GDC/Clinical") :
There was an error in the download process, please execute it again
In addition: Warning message:
In untar(name) :
‘/bin/tar -xf 'Mon_Aug_22_09_22_25_2016.tar.gz'’ returned error code 2
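
The tar and gzip messages suggest that the downloaded file was not a gzip archive at all, for example an error response saved under a .tar.gz name (that cause is an assumption, not confirmed in this report). A quick way to check is to read the first two bytes of the file, which are 0x1f 0x8b for any gzip stream. The helper name `is_gzip` and the demo files below are hypothetical.

```r
# Hypothetical helper: TRUE when the file starts with the gzip magic bytes.
is_gzip <- function(path) {
  magic <- readBin(path, what = "raw", n = 2L)
  length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Demo file 1: a real gzip-compressed file.
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
writeLines("hello", con)
close(con)

# Demo file 2: plain text saved under a misleading .tar.gz name.
fake_path <- tempfile(fileext = ".tar.gz")
writeLines("{\"error\": \"bad request\"}", fake_path)

print(is_gzip(gz_path))
print(is_gzip(fake_path))
```

If the check fails on a downloaded file, opening the file in a text editor usually shows the server's actual response.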

The same error is also generated with:
"TCGA-LGG" & "TCGA-LIHC"

problems getting data

I have executed this

library(TCGAbiolinks)

query1 <- GDCquery(project = "TCGA-BRCA",
                   data.category = "Transcriptome Profiling",
                   data.type = "Isoform Expression Quantification",
                   sample.type = c("Primary solid Tumor"))

GDCdownload(query1)

data1 <- GDCprepare(query1)

and 'data1' contains this:

data1
function (x, df1, df2, ncp, log = FALSE)
{
if (missing(ncp))
.Call(C_df, x, df1, df2, log)
else .Call(C_dnf, x, df1, df2, ncp, log)
}

The object query1 contains this information:

       results   project           data.category
1 c("Isofo.... TCGA-BRCA Transcriptome Profiling
                          data.type legacy access experimental.strategy
1 Isoform Expression Quantification  FALSE     NA                    NA
  file.type platform sample.type barcode workflow.type
1        NA       NA Primary ....      NA            NA

The folder GDCdata contains the downloaded files. This is what I got when executing GDCdownload(query1):

SUMMARY:
Successfully downloaded: 1043

Do you know why I'm not getting the SummarizedExperiment object?

Thanks
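
The function body printed for 'data1' is that of `stats::df`. One plausible explanation, stated here as an assumption and not confirmed by the report, is that a code path returned a variable named `df` that was never assigned for this data type, so R's name lookup fell through to the function in the stats package. The sketch below reproduces that behaviour in plain R; `prepare_sketch` is a hypothetical stand-in, not TCGAbiolinks code.

```r
# Hypothetical stand-in for a prepare step; not TCGAbiolinks code.
prepare_sketch <- function(supported) {
  if (supported) {
    df <- data.frame(value = 1)  # a local `df` exists only on this branch
  }
  df  # on the other branch, lookup falls through to stats::df
}

result <- prepare_sketch(supported = FALSE)

# A cheap sanity check before using a prepared object.
print(is.function(result))
print(identical(result, stats::df))
```

Checking `class()` or `is.function()` on the returned object is a quick way to detect this situation before passing it to downstream code.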

Error report - GDCdownload

Hello.

I ran into some errors using the GDCdownload function.

GDCdownload(basalQuery)
basalData <- GDCprepare(basalQuery)

Downloading as: Tue_Aug_02_17_11_55_2016.tar.gz
curl: (1) Protocol "'https" not supported or disabled in libcurl
Error in gzfile(path.expand(tarfile), "rb") : cannot open the connection
In addition: Warning messages:
1: running command 'curl -o Tue_Aug_02_17_11_55_2016.tar.gz --remote-header-name --request POST 'https://gdc-api.nci.nih.gov/legacy/data' --data @payload' had status 1
2: In gzfile(path.expand(tarfile), "rb") :
cannot open compressed file 'Tue_Aug_02_17_11_55_2016.tar.gz', probable reason 'No such file or directory'

I fixed this error with

trace("GDCdownload", edit = T)

changing
'https://gdc-api.nci.nih.gov/legacy/data' into \"https://gdc-api.nci.nih.gov/legacy/data\"

This resolved the first error.

However, another error then appeared.

GDCdownload(luminalQuery)
Downloading as: Tue_Aug_02_17_25_03_2016.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 23563 100 44 100 23519 23 12443 0:00:01 0:00:01 --:--:-- 12443
Error in untar2(tarfile, files, list, exdir) : incomplete block on file

I tried several times. The name of the .tar.gz file changes on every attempt, but the same error message appears in my IDE.

I need someone's help.
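
The curl message `Protocol "'https" not supported` is consistent with the single quote being passed to curl as part of the URL, which is what happens on shells that do not treat single quotes as quoting characters (the Windows command prompt is one such shell; that the reporter was on Windows is an assumption). Base R's `shQuote()` can produce the quoting style a given shell expects, as this short sketch shows.

```r
url <- "https://gdc-api.nci.nih.gov/legacy/data"

# POSIX-style shells: single quotes.
posix_quoted <- shQuote(url, type = "sh")

# Windows command prompt: double quotes.
windows_quoted <- shQuote(url, type = "cmd")

cat(posix_quoted, "\n")
cat(windows_quoted, "\n")
```

Building the command with the quoting style that matches the shell avoids hand-editing the function through `trace()`.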

maybe a problem to download

Hi all !

Hope everybody is ok, thanks for the guys who are active here :)
There might be a problem with the TCGAdownload function at the moment.
Code that used to work no longer works on my machine.
For example :

query <- TCGAquery(tumor="BRCA", level=3, platform="MDA_RPPA_Core")
TCGAdownload(query, path="rppa")

or :

TCGAdownload(query, path="rppa", samples=c("TCGA-E2-A14V-01A-21-A13E-20"))

gives me :

-=-=-=-=-=-= | Downloading:5 folders | Path:rppa -=-=-=-=-=-=
| | 0%
Downloading:mdanderson.org_BRCA.MDA_RPPA_Core.Level_3.1.3.0.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 240 100 240 0 0 40 0 0:00:06 0:00:05 0:00:01 54
100 2060 100 2060 0 0 343 0 0:00:06 0:00:05 0:00:01 343
Error in fread(paste0(root, data[i, "deployLocation"], ".md5"), header = F, :
Expected sep (',') but new line or EOF ends field 1 on line 33 when reading data: -->
In addition: Warning messages:
1: In fread(paste0(root, data[i, "deployLocation"], ".md5"), header = F, :
Unable to find 5 lines with expected number of columns (+ middle)
2: In fread(paste0(root, data[i, "deployLocation"], ".md5"), header = F, :
Unable to find 5 lines with expected number of columns (+ last)

Can somebody try to see if I'm dumb or it's not me.
Thanks :)

about the p-value in "TCGAanalyze_survival"

Hi,
Many thanks for the versatile and regularly updated "TCGAbiolinks".
I have two questions:

1. Can the log-rank p-value from "TCGAanalyze_survival" be used directly in downstream analysis? That is, if the p-value in the figure is < 0.05, does that mean "clusterCol" is associated with survival?
2. Regarding "TCGAanalyze_survival" and "TCGAanalyze_SurvivalKM": I used "TCGAanalyze_survival" a few days ago, and more recently I came across "TCGAanalyze_SurvivalKM", which can be used to find survival-associated genes. What is the difference between them, briefly, or are they similar?
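
For readers unsure what a log-rank p-value measures, the sketch below runs the test directly with the `survival` package on its bundled `lung` dataset. It illustrates the statistic only; whether TCGAanalyze_survival computes its p-value in exactly this way is an assumption, and the grouping variable `sex` is just a stand-in for a column such as "clusterCol".

```r
library(survival)

# Log-rank test: do survival curves differ between the two groups?
fit <- survdiff(Surv(time, status) ~ sex, data = lung)

# The test statistic is chi-squared with (number of groups - 1) degrees of freedom.
p_value <- pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)

print(fit)
print(p_value < 0.05)
```

A p-value below 0.05 indicates that survival differs between the groups. It does not adjust for other covariates, and it needs multiple-testing correction when many groupings or genes are screened.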

Stratify and compare analysis between groups

How do I stratify the analysis and compare groups by one or more parameters? Thank you.

Error: could not find function "assay"

BRCAMatrix <- assay(BRCARnaseq_assay,"raw_counts")
Error: could not find function "assay"
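
`assay()` is defined in the SummarizedExperiment package, so the error typically means that package has not been attached in the current session. The sketch below builds a tiny SummarizedExperiment by hand (the gene and sample names are made up) to show the call working once the package is loaded; it assumes SummarizedExperiment is installed.

```r
library(SummarizedExperiment)

# Toy count matrix: 3 made-up genes by 2 made-up samples.
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Store it under the same assay name used in the report.
se <- SummarizedExperiment(assays = list(raw_counts = counts))

# With the package attached, assay() is found and returns the matrix.
m <- assay(se, "raw_counts")
print(dim(m))
```

An alternative that avoids attaching the package is the fully qualified call `SummarizedExperiment::assay(se, "raw_counts")`.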

Problems downloading GDC data

Hi TCGAbiolinks!

One of our users has tried to download GDC data but keeps getting the same error message:

Error in curl::curl_fetch_memory(url, handle = handle) :
SSL connect error
Calls: GDCquery ... request_fetch -> request_fetch.write_memory -> <Anonymous> -> .Call

Among other things the user has tried:

query.exp <- GDCquery(project = "TCGA-BRCA",
                      legacy = TRUE,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq",
                      file.type = "results",
                      experimental.strategy = "RNA-Seq",
                      sample.type = c("Primary solid Tumor","Solid Tissue Normal"))

but the same error keeps occurring. The user says she updated TCGAbiolinks last Saturday.

Do you have any idea why this error occurs?

Thanks in advance.

Best,
André

Problems with downloading

@torongs82
@tiagochst

Hi Antonio and Tiago!

The problem goes like this: the script downloadDataGDC_miRNA_.R (used to download data from GDC) works in my R installation, but it does not work in Marika's. She has the newest version of TCGAbiolinks installed, and sessionInfo() after loading TCGAbiolinks outputs this:

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] TCGAbiolinks_2.1.6

loaded via a namespace (and not attached):
[1] circlize_0.3.8
[2] aroma.light_3.0.0
[3] plyr_1.8.4
[4] igraph_1.0.1
[5] lazyeval_0.2.0
[6] ConsensusClusterPlus_1.24.0
[7] splines_3.3.1
[8] BiocParallel_1.4.3
[9] GenomeInfoDb_1.6.3
[10] ggplot2_2.1.0
[11] TH.data_1.0-7
[12] digest_0.6.10
[13] foreach_1.4.3
[14] BiocInstaller_1.20.3
[15] gdata_2.17.0
[16] magrittr_1.5
[17] cluster_2.0.4
[18] doParallel_1.0.10
[19] limma_3.26.9
[20] ComplexHeatmap_1.11.6
[21] Biostrings_2.41.4
[22] readr_1.0.0
[23] annotate_1.48.0
[24] matrixStats_0.50.2
[25] R.utils_2.3.0
[26] sandwich_2.3-4
[27] colorspace_1.2-6
[28] rvest_0.3.2
[29] ggrepel_0.5
[30] haven_0.2.1
[31] dplyr_0.5.0
[32] jsonlite_1.0
[33] RCurl_1.95-4.8
[34] hexbin_1.27.1
[35] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[36] graph_1.48.0
[37] genefilter_1.52.1
[38] lme4_1.1-12
[39] supraHex_1.8.0
[40] survival_2.39-5
[41] zoo_1.7-13
[42] iterators_1.0.8
[43] ape_3.5
[44] gtable_0.2.0
[45] zlibbioc_1.16.0
[46] XVector_0.13.7
[47] sjstats_0.4.0
[48] GetoptLong_0.1.4
[49] sjmisc_1.8
[50] kernlab_0.9-24
[51] Rgraphviz_2.14.0
[52] shape_1.4.2
[53] prabclus_2.2-6
[54] BiocGenerics_0.19.2
[55] DEoptimR_1.0-6
[56] scales_0.4.0
[57] DESeq_1.22.1
[58] futile.options_1.0.0
[59] mvtnorm_1.0-5
[60] DBI_0.4-1
[61] GGally_1.2.0
[62] edgeR_3.12.1
[63] ggthemes_3.2.0
[64] Rcpp_0.12.7
[65] xtable_1.8-2
[66] matlab_1.0.2
[67] mclust_5.2
[68] preprocessCore_1.32.0
[69] stats4_3.3.1
[70] httr_1.2.1
[71] gplots_3.0.1
[72] RColorBrewer_1.1-2
[73] fpc_2.1-10
[74] modeltools_0.2-21
[75] reshape_0.8.5
[76] XML_3.98-1.4
[77] R.methodsS3_1.7.1
[78] flexmix_2.3-13
[79] nnet_7.3-12
[80] reshape2_1.4.1
[81] AnnotationDbi_1.32.3
[82] munsell_0.4.3
[83] tools_3.3.1
[84] downloader_0.4
[85] RSQLite_1.0.0
[86] broom_0.4.1
[87] stringr_1.1.0
[88] knitr_1.14
[89] robustbase_0.92-6
[90] caTools_1.17.1
[91] dendextend_1.3.0
[92] coin_1.1-2
[93] EDASeq_2.4.1
[94] nlme_3.1-128
[95] whisker_0.3-2
[96] R.oo_1.20.0
[97] xml2_1.0.0
[98] biomaRt_2.29.2
[99] affyio_1.40.0
[100] tibble_1.2
[101] geneplotter_1.48.0
[102] stringi_1.1.1
[103] futile.logger_1.4.3
[104] GenomicFeatures_1.22.13
[105] lattice_0.20-33
[106] trimcluster_0.1-2
[107] Matrix_1.2-6
[108] psych_1.6.6
[109] nloptr_1.0.4
[110] effects_3.1-1
[111] stringdist_0.9.4.2
[112] GlobalOptions_0.0.10
[113] parmigene_1.0.2
[114] data.table_1.9.6
[115] cowplot_0.6.2
[116] bitops_1.0-6
[117] dnet_1.0.9
[118] rtracklayer_1.30.4
[119] GenomicRanges_1.25.1
[120] R6_2.1.3
[121] latticeExtra_0.6-28
[122] affy_1.48.0
[123] hwriter_1.3.2
[124] ShortRead_1.28.0
[125] KernSmooth_2.23-15
[126] IRanges_2.7.15
[127] codetools_0.2-14
[128] lambda.r_1.1.9
[129] MASS_7.3-45
[130] gtools_3.5.0
[131] assertthat_0.1
[132] chron_2.3-47
[133] SummarizedExperiment_1.3.82
[134] rjson_0.2.15
[135] mnormt_1.5-4
[136] GenomicAlignments_1.6.3
[137] Rsamtools_1.22.0
[138] multcomp_1.4-6
[139] S4Vectors_0.11.14
[140] diptest_0.75-7
[141] parallel_3.3.1
[142] sjPlot_2.0.2
[143] grid_3.3.1
[144] tidyr_0.6.0
[145] class_7.3-14
[146] minqa_1.2.4
[147] Biobase_2.30.0

The error message she got goes like this:
"Failure when receiving data from the peer."

downloadDataGDC_miRNA_.R.zip

Hope you can help me on this.

Best,
André

Bug: GDCprepare(query) fails adding metadata for TCGA-COAD gene expression

query <- GDCquery(project = "TCGA-COAD",
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq", file.type  = "normalized_results",
                      legacy = TRUE)
GDCdownload(query)
z <- GDCprepare(query)

Results in:

Parsed with column specification:
cols(
  released = col_character(),
  state = col_character(),
  dbgap_accession_number = col_character(),
  primary_site = col_character(),
  disease_type = col_character(),
  project_id = col_character(),
  name = col_character()
)
Accessing GDC. This might take a while...
All samples have been already downloded

Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)
Starting to add information to samples
 => Add clinical information to samples
 => Adding subtype information to samples
Subtype information from:doi:10.1038/nature11252
Error in fix.by(by.y, y): 'by' must specify a uniquely valid column
Traceback:

1. GDCprepare(query)
2. readGeneExpressionQuantification(files, query$results[[1]]$cases, 
 .     summarizedExperiment, unique(query$platform))
3. makeSEfromGeneExpressionQuantification(df, assay.list)
4. colDataPrepare(samples)
5. merge(ret, subtype, by = "sample", all.x = TRUE)
6. merge.data.frame(ret, subtype, by = "sample", all.x = TRUE)
7. fix.by(by.y, y)
8. stop(ngettext(sum(bad), "'by' must specify a uniquely valid column", 
 .     "'by' must specify uniquely valid columns"), domain = NA)
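
The traceback ends in `merge(ret, subtype, by = "sample", all.x = TRUE)`, and base R raises this error when the `by` column is missing from one of the two data frames. The sketch below reproduces that situation with made-up data and shows one possible guard; `safe_merge` is a hypothetical helper, not the fix adopted by the package, and the assumption that the subtype table lacked a `sample` column is not confirmed by the report.

```r
# Made-up stand-ins for the two tables in the traceback.
ret     <- data.frame(sample = c("S1", "S2"), value = c(1.2, 3.4))
subtype <- data.frame(patient = c("S1", "S2"), subtype = c("A", "B"))

# Reproduce the failure: `subtype` has no column named "sample".
failed <- tryCatch(
  merge(ret, subtype, by = "sample", all.x = TRUE),
  error = function(e) e
)
print(inherits(failed, "error"))

# Hypothetical guard: only merge when the key exists on both sides.
safe_merge <- function(x, y, key) {
  if (!(key %in% names(x)) || !(key %in% names(y))) {
    return(x)  # skip the annotation step and keep the original rows
  }
  merge(x, y, by = key, all.x = TRUE)
}

print(safe_merge(ret, subtype, key = "sample"))
```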

sessionInfo():

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] SummarizedExperiment_1.2.3 Biobase_2.32.0            
[3] GenomicRanges_1.24.2       GenomeInfoDb_1.8.3        
[5] IRanges_2.6.1              S4Vectors_0.10.3          
[7] BiocGenerics_0.18.0        TCGAbiolinks_2.0.3        

loaded via a namespace (and not attached):
  [1] uuid_0.1-2                             
  [2] circlize_0.3.8                         
  [3] aroma.light_3.2.0                      
  [4] plyr_1.8.4                             
  [5] igraph_1.0.1                           
  [6] ConsensusClusterPlus_1.36.0            
  [7] repr_0.9                               
  [8] splines_3.3.1                          
  [9] BiocParallel_1.6.6                     
 [10] ggplot2_2.1.0                          
 [11] TH.data_1.0-7                          
 [12] digest_0.6.10                          
 [13] foreach_1.4.3                          
 [14] BiocInstaller_1.22.3                   
 [15] gdata_2.17.0                           
 [16] magrittr_1.5                           
 [17] cluster_2.0.4                          
 [18] doParallel_1.0.10                      
 [19] limma_3.28.18                          
 [20] ComplexHeatmap_1.10.2                  
 [21] Biostrings_2.40.2                      
 [22] readr_1.0.0                            
 [23] annotate_1.50.0                        
 [24] matrixStats_0.50.2                     
 [25] R.utils_2.3.0                          
 [26] sandwich_2.3-4                         
 [27] colorspace_1.2-6                       
 [28] rvest_0.3.2                            
 [29] ggrepel_0.5                            
 [30] haven_0.2.1                            
 [31] dplyr_0.5.0                            
 [32] crayon_1.3.2                           
 [33] RCurl_1.95-4.8                         
 [34] jsonlite_1.0                           
 [35] hexbin_1.27.1                          
 [36] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [37] graph_1.50.0                           
 [38] genefilter_1.54.2                      
 [39] lme4_1.1-12                            
 [40] supraHex_1.10.0                        
 [41] survival_2.39-5                        
 [42] zoo_1.7-13                             
 [43] iterators_1.0.8                        
 [44] ape_3.5                                
 [45] gtable_0.2.0                           
 [46] zlibbioc_1.18.0                        
 [47] XVector_0.12.1                         
 [48] sjstats_0.3.0                          
 [49] GetoptLong_0.1.4                       
 [50] sjmisc_1.8                             
 [51] kernlab_0.9-24                         
 [52] Rgraphviz_2.16.0                       
 [53] shape_1.4.2                            
 [54] prabclus_2.2-6                         
 [55] DEoptimR_1.0-6                         
 [56] scales_0.4.0                           
 [57] DESeq_1.24.0                           
 [58] mvtnorm_1.0-5                          
 [59] DBI_0.5                                
 [60] GGally_1.2.0                           
 [61] edgeR_3.14.0                           
 [62] ggthemes_3.2.0                         
 [63] Rcpp_0.12.6                            
 [64] xtable_1.8-2                           
 [65] matlab_1.0.2                           
 [66] mclust_5.2                             
 [67] preprocessCore_1.34.0                  
 [68] httr_1.2.1                             
 [69] gplots_3.0.1                           
 [70] RColorBrewer_1.1-2                     
 [71] fpc_2.1-10                             
 [72] modeltools_0.2-21                      
 [73] reshape_0.8.5                          
 [74] XML_3.98-1.4                           
 [75] R.methodsS3_1.7.1                      
 [76] flexmix_2.3-13                         
 [77] nnet_7.3-12                            
 [78] reshape2_1.4.1                         
 [79] AnnotationDbi_1.34.4                   
 [80] munsell_0.4.3                          
 [81] tools_3.3.1                            
 [82] downloader_0.4                         
 [83] RSQLite_1.0.0                          
 [84] broom_0.4.1                            
 [85] evaluate_0.9                           
 [86] stringr_1.0.0                          
 [87] knitr_1.14                             
 [88] robustbase_0.92-6                      
 [89] caTools_1.17.1                         
 [90] dendextend_1.2.0                       
 [91] coin_1.1-2                             
 [92] EDASeq_2.6.2                           
 [93] nlme_3.1-128                           
 [94] whisker_0.3-2                          
 [95] R.oo_1.20.0                            
 [96] xml2_1.0.0                             
 [97] biomaRt_2.28.0                         
 [98] curl_1.2                               
 [99] affyio_1.42.0                          
[100] tibble_1.1                             
[101] geneplotter_1.50.0                     
[102] stringi_1.1.1                          
[103] GenomicFeatures_1.24.5                 
[104] lattice_0.20-33                        
[105] trimcluster_0.1-2                      
[106] IRdisplay_0.4.9000                     
[107] Matrix_1.2-6                           
[108] psych_1.6.6                            
[109] nloptr_1.0.4                           
[110] effects_3.1-1                          
[111] stringdist_0.9.4.1                     
[112] GlobalOptions_0.0.10                   
[113] data.table_1.9.6                       
[114] cowplot_0.6.2                          
[115] bitops_1.0-6                           
[116] dnet_1.0.9                             
[117] rtracklayer_1.32.2                     
[118] R6_2.1.2                               
[119] latticeExtra_0.6-28                    
[120] affy_1.50.0                            
[121] hwriter_1.3.2                          
[122] ShortRead_1.30.0                       
[123] KernSmooth_2.23-15                     
[124] codetools_0.2-14                       
[125] MASS_7.3-45                            
[126] gtools_3.5.0                           
[127] assertthat_0.1                         
[128] chron_2.3-47                           
[129] rjson_0.2.15                           
[130] mnormt_1.5-4                           
[131] GenomicAlignments_1.8.4                
[132] Rsamtools_1.24.0                       
[133] multcomp_1.4-6                         
[134] diptest_0.75-7                         
[135] sjPlot_2.0.2                           
[136] grid_3.3.1                             
[137] IRkernel_0.6                           
[138] tidyr_0.6.0                            
[139] class_7.3-14                           
[140] minqa_1.2.4                            
[141] pbdZMQ_0.2-3

Vignette Should Use Consistent Casing

In the vignette there is a table which has headings like "Data.type" and entries like "Copy number variation". Because R is case-sensitive, this causes errors if used. The vignette should be updated to use upper case for the entries and lower case for the parameters, to conform with the software.
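
The effect of copying a capitalised heading such as "Data.type" into a call can be seen with any function that has a `data.type` argument. In this sketch, `describe_query` is a hypothetical function used only to show R's case-sensitive argument matching.

```r
# Hypothetical function with a lower-case parameter name.
describe_query <- function(data.type) {
  paste("querying", data.type)
}

# Correct casing works.
ok <- describe_query(data.type = "Copy number variation")

# The capitalised name from the table heading is rejected as an unused argument.
bad <- tryCatch(
  describe_query(Data.type = "Copy number variation"),
  error = function(e) e
)

print(ok)
print(inherits(bad, "error"))
```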

TCGAquery_clinic error

Hi.
For the past several days, TCGAquery has not worked and throws the error message below:

clinical_brca_data <- TCGAquery_clinic("brca","clinical_patient")
Error in fread(paste0(root, url, "/", files[grep("MANIFEST", files)]), :
Expected sep (',') but new line or EOF ends field 1 on line 33 when reading data: -->
In addition: Warning messages:
1: In fread(paste0(root, url, "/", files[grep("MANIFEST", files)]), :
Unable to find 5 lines with expected number of columns (+ middle)
2: In fread(paste0(root, url, "/", files[grep("MANIFEST", files)]), :
Unable to find 5 lines with expected number of columns (+ last)

What should I do to fix this problem?

Error in TCGAvisualize_oncoprint()

Hi,

While running TCGAvisualize_oncoprint() function for visualizing oncoprint, as given in the tutorial at https://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/tcgaBiolinks.html#installation, the following error is given

TCGAvisualize_oncoprint(mut = mut, genes = mut$Hugo_Symbol[1:20],
                        filename = NULL, annotation = clin,
                        color = c("background"="#CCCCCC","DEL"="purple","INS"="yellow","SNP"="brown"),
                        rows.font.size = 10, heatmap.legend.side = "right", dist.col = 0, label.font.size = 10)

Aggregate function missing, defaulting to 'length'
Error in names(column_order) = as.character(column_order) :
attempt to set an attribute on NULL

Problem Parsing Patient info

I ran the following commands:

query <- GDCquery(project = "TCGA-BLCA", data.category = "Clinical")
GDCdownload(query, directory = "~/Projects/GDC/Clinical")
clinical <- GDCprepare_clinic(query, "patient", directory = "~/Projects/GDC/Clinical")

and I get the following error:

Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match

Problem downloading TCGA-LAML clinical data

Upon running the following simple query:

library(TCGAbiolinks)
q<- GDCquery(project = "TCGA-LAML", data.category = "Clinical")
GDCdownload(q)
GDCprepare_clinic(q, clinical.info = "patient")

The output in R returns the following error:

Error in `[.data.frame`(clin, , i) : undefined columns selected

This appears to be a bug in GDCprepare_clinic due to the data frame clin having no column named new_tumor_events in the following segment of code:

for (i in c("new_tumor_events", "drugs", "follow_ups",
            "radiations")) {
    clin[, i] <- as.character(clin[, i])
    clin[which(clin[, i] != ""), i] <- "YES"
    clin[which(clin[, i] == ""), i] <- "NO"
    colnames(clin)[which(colnames(clin) == i)] <- paste0("has_",
        i, "_information")
}
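
One possible guard, sketched here under the assumption that the missing column is the only problem, is to loop over just the columns that actually exist in `clin`. This is an illustration with a made-up data frame, not the maintainers' fix.

```r
# Made-up clinical table that lacks new_tumor_events and radiations.
clin <- data.frame(
  drugs      = c("x", ""),
  follow_ups = c("", "y"),
  stringsAsFactors = FALSE
)

optional <- c("new_tumor_events", "drugs", "follow_ups", "radiations")

# Only touch the optional columns that are present.
for (i in intersect(optional, colnames(clin))) {
  has_value <- !is.na(clin[[i]]) & as.character(clin[[i]]) != ""
  clin[[i]] <- ifelse(has_value, "YES", "NO")
  colnames(clin)[colnames(clin) == i] <- paste0("has_", i, "_information")
}

print(clin)
```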

TCGAbiolinks-installation problems

Hi everybody!
I had been using the TCGAbiolinks library in RStudio until two days ago, when I ran into problems downloading miRNA data.
So I decided to reinstall the TCGAbiolinks library.
I typed:
install.packages("devtools")
devtools::install_github("BioinformaticsFMRP/TCGAbiolinks")

and I received this error:

`Downloading GitHub repo BioinformaticsFMRP/TCGAbiolinks@master
from URL https://api.github.com/repos/BioinformaticsFMRP/TCGAbiolinks/zipball/master
Installing TCGAbiolinks
"C:/PROGRA~~1/R/R-33~~1.1/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD
INSTALL
"C:/Users/marta/AppData/Local/Temp/RtmpuYhYhj/devtoolsa0c2f737579/BioinformaticsFMRP-TCGAbiolinks-2f2d79c"
--library="C:/Users/marta/Documents/R/win-library/3.3" --install-tests

installing source package 'TCGAbiolinks' ...
** R
** data
*** moving datasets to lazyload DB
** inst
** tests
** preparing package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
namespace 'SummarizedExperiment' 1.2.3 is already loaded, but >= 1.4.0 is required
ERROR: lazy loading failed for package 'TCGAbiolinks'
removing 'C:/Users/marta/Documents/R/win-library/3.3/TCGAbiolinks'
Error: Command failed (1)

I thought the problem might be the SummarizedExperiment library, but I have no idea how to fix it.
Moreover, sessionInfo() returns:

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252
[4] LC_NUMERIC=C LC_TIME=Italian_Italy.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.2.0 BiocInstaller_1.22.3 tools_3.3.1
[5] withr_1.0.2 curl_2.2 memoise_1.0.0 knitr_1.15
[9] git2r_0.15.0 digest_0.6.10 devtools_1.12.0

How can I solve the problem?
Thanks in advance
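
The log says an older SummarizedExperiment (1.2.3) is loaded while 1.4.0 or newer is required. A minimal way to check an installed version before reinstalling could look like the sketch below. The helper name meets_min_version is hypothetical and not part of TCGAbiolinks; only base R calls are used.

```r
# Hypothetical helper (not part of TCGAbiolinks): TRUE when `pkg` is
# installed at version `min_version` or newer, FALSE otherwise.
meets_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# Best run in a fresh R session, so that no old namespace is already loaded.
if (!meets_min_version("SummarizedExperiment", "1.4.0")) {
  message("SummarizedExperiment is missing or older than 1.4.0; ",
          "update it before reinstalling TCGAbiolinks.")
}
```

If the check fails, updating the dependency first and restarting R before the reinstall avoids the "already loaded" conflict shown in the log.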

HTSeq <<losing 5% of information when mapping to genomic regions>>

I want to know why the HTSeq data loses 5% of its information when mapped to genomic regions.
Is it because of the reference version, or some other problem?

Installation problem

Hi, I'm new to R programming and RStudio, and I need to use TCGAbiolinks for my thesis.
However, I have some installation problems: when I type devtools::install_github("BioinformaticsFMRP/TCGAbiolinks")
in the RStudio console, I receive this error:

Downloading GitHub repo BioinformaticsFMRP/TCGAbiolinks@master
from URL https://api.github.com/repos/BioinformaticsFMRP/TCGAbiolinks/zipball/master
Installing TCGAbiolinks
"C:/PROGRA~1/R/R-33~1.1/bin/x64/R" --no-site-file --no-environ --no-save --no-restore
--quiet CMD INSTALL
"C:/Users/marta/AppData/Local/Temp/RtmpAFVBV9/devtools1e5024642378/BioinformaticsFMRP-TCGAbiolinks-4c9059d"
--library="C:/Users/marta/Documents/R/win-library/3.3" --install-tests

installing source package 'TCGAbiolinks' ...
** R
** data
*** moving datasets to lazyload DB
** inst
** tests
** preparing package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called 'shape'
ERROR: lazy loading failed for package 'TCGAbiolinks'
removing 'C:/Users/marta/Documents/R/win-library/3.3/TCGAbiolinks'
restoring previous 'C:/Users/marta/Documents/R/win-library/3.3/TCGAbiolinks'
Error: Command failed (1)

Moreover, sessionInfo() is

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.1.3 BiocInstaller_1.22.3 tools_3.3.1
[5] withr_1.0.2 curl_1.2 memoise_1.0.0 knitr_1.14
[9] git2r_0.15.0 digest_0.6.10 devtools_1.12.0

Does anybody know what I'm doing wrong?
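
The log above stops at "there is no package called 'shape'", which means a dependency is not installed. A small sketch for listing missing packages before retrying is shown below. The helper missing_packages is hypothetical; it only wraps base R's installed.packages().

```r
# Hypothetical helper: which of `pkgs` are not installed in any library path?
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

# "shape" is the package named in the error message above.
to_install <- missing_packages(c("shape"))
if (length(to_install) > 0) {
  message("Missing: ", paste(to_install, collapse = ", "),
          " (install with install.packages() and retry)")
}
```

The same check can be repeated with any other package name that shows up in a later "there is no package called" error.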

Conflict between old functions and the 'new' ones

@lucgar @torongs82 You added back the old functions TCGAQuery and TCGADownload, and the shared names now break everything. The documentation is also destroyed when you roxygenize, and the R file is a mess with everything you added again.
I recall that we decided to use the new functions because they worked, were simpler to use, and were easier to understand. Is that right? If so, your downstream analysis code should be adapted to the new methods, and the things we deleted and you re-added should be deleted again. If we keep them, nothing will work.
Otherwise, you could make the old functions work and make them easy to use. A demo file that needs more than 50 lines to download something is not useful, and no one will use it.

Htseq-count, normalization, miRNA analysis

I want to know why the HTSeq normalization doesn't seem to work.
The code below was found in the manual. Do you think it's an error in the manual? Is it the matrix data that should be normalized?

save(data2, geneInfo , file = "CESERNAseqExpression3.rda")
dataNorm2 <- TCGAanalyze_Normalization(tabDF = data2, geneInfo = geneInfo)

I am able to download the miRNA data, but I can't move on to the comparison study.

TCGAanalyze_SurvivalKM

Hi,

I was wondering if there are any plans to update the TCGAanalyze_SurvivalKM function? It doesn't seem to be compatible with GDCquery.

Thanks,
Regina

Incorrect reading of MAF-files

Example code:
query.mut <- TCGAquery(tumor='COAD',level=2, platform="IlluminaGA_DNASeq");
TCGAdownload(query.mut, path='data/COAD');
mut.data <- TCGAprepare(query.mut, dir='data/COAD', save = TRUE, filename = 'COAD_IlluminaGA.rda',summarizedExperiment = FALSE);

This returns a data frame with 58995 rows, while the downloaded MAF file contains 114596 rows.
A warning message is generated:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string

The same happens for
TCGAquery_maf("COAD",archive.name="hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0")

Solution: both functions (TCGAprepare and TCGAquery_maf) should pass the additional argument quote = '' when calling read.table(), i.e. read.table(..., quote = '').
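
To illustrate the proposed fix, here is a small self-contained sketch. The file content is made up; the only assumption is that the real MAF contains a field with an apostrophe (for example 5'UTR), which read.table() treats as an opening quote by default.

```r
# Made-up tab-separated file. The apostrophe in "5'UTR" is the kind of
# character that the default quote setting treats as an opening quote.
maf_like <- tempfile(fileext = ".maf")
writeLines(c("Hugo_Symbol\tVariant_Classification",
             "GENE1\t5'UTR",
             "GENE2\tMissense_Mutation",
             "GENE3\tSilent"),
           maf_like)

# quote = "" disables quote handling, so every line stays its own row.
mut <- read.table(maf_like, header = TRUE, sep = "\t", quote = "",
                  comment.char = "", stringsAsFactors = FALSE)
print(nrow(mut))
```

Reading the same file without quote = "" lets the apostrophe swallow the following lines, which is consistent with the "EOF within quoted string" warning and the missing rows reported above.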

TCGAprepare() fails when 2 files are downloaded

tumor='GBM'
query.mut <- TCGAquery(tumor=tumor,level=2, platform="IlluminaGA_DNASeq");
TCGAdownload(query.mut, path=paste('data2',tumor,sep='/'));
mut.data <- TCGAprepare(query.mut, dir=paste('data2',tumor,sep='/'),save=TRUE,filename=paste(tumor,'mut','rda',sep='.'),summarizedExperiment = FALSE);

TCGAprepare() fails
"Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match"

This query returns two files with different numbers of columns, so rbind() fails.
Maybe consider one of the solutions here:
http://stackoverflow.com/questions/3402371/combine-two-data-frames-by-rows-rbind-when-they-have-different-sets-of-columns
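
One of the approaches discussed there can be sketched in base R as follows. The helper rbind_fill and the two small data frames are hypothetical and only illustrate the idea of filling missing columns with NA before binding.

```r
# Hypothetical helper: row-bind two data frames whose column sets differ,
# filling the columns that are missing on either side with NA.
rbind_fill <- function(a, b) {
  all_cols <- union(colnames(a), colnames(b))
  for (col in setdiff(all_cols, colnames(a))) a[[col]] <- NA
  for (col in setdiff(all_cols, colnames(b))) b[[col]] <- NA
  rbind(a[, all_cols, drop = FALSE], b[, all_cols, drop = FALSE])
}

# Made-up inputs: the second file has one extra column.
first  <- data.frame(gene = c("G1", "G2"), start = c(10, 20),
                     stringsAsFactors = FALSE)
second <- data.frame(gene = "G3", start = 30, extra = "x",
                     stringsAsFactors = FALSE)

combined <- rbind_fill(first, second)
print(combined)
```

Rows from the file without the extra column get NA there, so no information from either file is dropped.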

Error running GDCprepare

In running GDCprepare on a query obtaining 450K methylation data from TCGA-READ (although I've seen the same message from other projects/data_types), I got the following:

*** Preparing R object
|===============================================================================================| 100%
Downloading genome information. Using: Homo sapiens genes (GRCh37.p13)
Starting to add information to samples
=> Add clinical information to samples
Error: lexical error: invalid char in json text.
<respons
(right here) ------^ (*** caret points at the '?' in '?xml')

The exact query is a bit complicated, but the traceback was:

19: .Call(R_parse, txt, bigint_as_char)
18: parse_string(txt, bigint_as_char)
17: parseJSON(txt, bigint_as_char)
16: fromJSON_string(txt = txt, simplifyVector = simplifyVector, simplifyDataFrame = simplifyDataFrame,
simplifyMatrix = simplifyMatrix, flatten = flatten, ...)
15: fromJSON(content(GET(url), as = "text", encoding = "UTF-8"),
simplifyDataFrame = TRUE)
14: value[3L]
13: tryCatchOne(expr, names, parentenv, handlers[[1L]])
12: tryCatchList(expr, classes, parentenv, handlers)
11: tryCatch(fromJSON(url, simplifyDataFrame = TRUE), error = function(e) {
fromJSON(content(GET(url), as = "text", encoding = "UTF-8"),
simplifyDataFrame = TRUE)
})
10: getBarcodeInfo(ret$patient[start:end])
9: colDataPrepare(colnames(df)[5:ncol(df)])
8: makeSEfromDNAmethylation(df)
7: readDNAmethylation(files, query$results[[1]]$cases, summarizedExperiment,
unique(query$platform))
6: GDCprepare(query, save = FALSE, directory = TCGA.downloads) at TCGAget.r#242
5: eval(expr, envir, enclos)
4: eval(ei, envir)
3: withVisible(eval(ei, envir))
2: source("TCGAget.r") at TCGAget.r#1
1: doit()
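
The parser error suggests the server answered with XML instead of JSON. A defensive check before parsing could look like the sketch below. The helper looks_like_json and both example bodies are made up for illustration; this is not how the package currently handles the response.

```r
# Hypothetical helper: does the response body start like a JSON object
# or array? Anything else (for example an XML error page) returns FALSE.
looks_like_json <- function(txt) {
  first_char <- substr(trimws(txt), 1, 1)
  first_char %in% c("{", "[")
}

# Made-up response bodies.
xml_body  <- "<?xml version=\"1.0\"?><response>error</response>"
json_body <- "{\"data\": []}"

print(looks_like_json(xml_body))
print(looks_like_json(json_body))
```

A caller could retry the request or stop with a clearer message when the check returns FALSE, instead of passing the text on to the JSON parser.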

GDCdownload error

I'm getting an inexplicable error when I try to download data. See code and output below.

> query <- GDCquery(project = "TCGA-OV",data.category = "Gene expression",platform = "Illumina HiSeq",legacy = TRUE)

Accessing GDC. This might take a while...

> GDCdownload(query)

trying URL 'https://gdc.nci.nih.gov//files/public/file/gdc-client_v1.0.1-windows-x64.zip'
Content type 'application/zip' length 8592060 bytes (8.2 MB)
downloaded 8.2 MB

Error in if (bytes < unit) return(paste0(bytes + " B")) :
missing value where TRUE/FALSE needed

Any help is much appreciated.
Thanks,
Josh
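
The message "missing value where TRUE/FALSE needed" is what R reports when the condition of an if () is NA, so bytes was presumably NA at that point. The sketch below is a hypothetical re-implementation of a byte formatter with an NA guard. It is not the package's code, and the function name format_bytes is an assumption.

```r
# Hypothetical byte formatter with an NA guard (not the package's code).
format_bytes <- function(bytes, unit = 1024) {
  # Without this guard, `if (bytes < unit)` fails when bytes is NA.
  if (length(bytes) != 1 || is.na(bytes)) {
    return("unknown size")
  }
  if (bytes < unit) {
    return(paste0(bytes, " B"))
  }
  exponent <- floor(log(bytes, base = unit))
  prefix <- c("K", "M", "G", "T", "P", "E")[exponent]
  sprintf("%.1f %sB", bytes / unit^exponent, prefix)
}

print(format_bytes(NA))
print(format_bytes(512))
print(format_bytes(8592060))  # the zip size reported in the log above
```

Note that paste0(bytes, " B") is used here rather than paste0(bytes + " B"), since adding a string to a number is itself an error in R.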

Code from vignettes does not work

Hello,

I am running the code from here http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/tcgaBiolinks.html#tcga-downstream-analysis-case-studies, example 1. However, the first code chunk already fails on my machine; the error message is

Error in GDCquery(project = "TCGA-BRCA", legacy = TRUE, data.category = "Gene expression", : Please set a valid data.type argument from the list below: =>

The output of sessionInfo()

R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux release 6.6 (Carbon)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] TCGAbiolinks_2.1.8 SummarizedExperiment_1.2.3
[3] Biobase_2.32.0 GenomicRanges_1.24.3
[5] GenomeInfoDb_1.8.7 IRanges_2.6.1
[7] S4Vectors_0.10.3 BiocGenerics_0.18.0

loaded via a namespace (and not attached):
[1] circlize_0.3.7
[2] aroma.light_3.2.0
[3] plyr_1.8.4
[4] igraph_1.0.1
[5] R.rsp_0.30.0
[6] lazyeval_0.2.0
[7] ConsensusClusterPlus_1.36.0
[8] splines_3.3.0
[9] BiocParallel_1.6.2
[10] ggplot2_2.1.0
[11] TH.data_1.0-7
[12] digest_0.6.10
[13] foreach_1.4.3
[14] BiocInstaller_1.22.3
[15] gdata_2.17.0
[16] memoise_1.0.0
[17] magrittr_1.5
[18] cluster_2.0.4
[19] doParallel_1.0.10
[20] limma_3.28.19
[21] ComplexHeatmap_1.10.2
[22] Biostrings_2.40.2
[23] readr_1.0.0
[24] annotate_1.50.0
[25] matrixStats_0.50.2
[26] R.utils_2.3.0
[27] sandwich_2.3-4
[28] colorspace_1.2-6
[29] rvest_0.3.2
[30] ggrepel_0.5
[31] haven_0.2.1
[32] dplyr_0.5.0
[33] RCurl_1.95-4.8
[34] jsonlite_1.0
[35] hexbin_1.27.1
[36] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[37] graph_1.50.0
[38] genefilter_1.54.2
[39] lme4_1.1-12
[40] supraHex_1.10.0
[41] survival_2.39-4
[42] zoo_1.7-13
[43] iterators_1.0.8
[44] ape_3.5
[45] gtable_0.2.0
[46] zlibbioc_1.18.0
[47] XVector_0.12.0
[48] sjstats_0.4.0
[49] GetoptLong_0.1.3
[50] sjmisc_1.8
[51] R.cache_0.12.0
[52] Rgraphviz_2.16.0
[53] shape_1.4.2
[54] scales_0.4.0
[55] DESeq_1.24.0
[56] mvtnorm_1.0-5
[57] DBI_0.5-1
[58] GGally_1.1.0
[59] edgeR_3.14.0
[60] ggthemes_3.2.0
[61] Rcpp_0.12.7
[62] xtable_1.8-2
[63] matlab_1.0.2
[64] preprocessCore_1.34.0
[65] httr_1.2.1
[66] gplots_3.0.1
[67] RColorBrewer_1.1-2
[68] modeltools_0.2-21
[69] reshape_0.8.5
[70] XML_3.98-1.4
[71] R.methodsS3_1.7.1
[72] nnet_7.3-12
[73] reshape2_1.4.1
[74] AnnotationDbi_1.34.3
[75] munsell_0.4.3
[76] tools_3.3.0
[77] downloader_0.4
[78] RSQLite_1.0.0
[79] devtools_1.12.0
[80] broom_0.4.1
[81] evaluate_0.9
[82] stringr_1.1.0
[83] knitr_1.13
[84] caTools_1.17.1
[85] dendextend_1.1.8
[86] coin_1.1-2
[87] EDASeq_2.6.2
[88] nlme_3.1-128
[89] whisker_0.3-2
[90] formatR_1.4
[91] R.oo_1.20.0
[92] xml2_1.0.0
[93] biomaRt_2.28.0
[94] curl_1.2
[95] affyio_1.42.0
[96] tibble_1.2
[97] geneplotter_1.50.0
[98] stringi_1.1.1
[99] highr_0.6
[100] GenomicFeatures_1.24.3
[101] lattice_0.20-33
[102] Matrix_1.2-6
[103] psych_1.6.4
[104] nloptr_1.0.4
[105] effects_3.1-1
[106] stringdist_0.9.4.1
[107] GlobalOptions_0.0.10
[108] parmigene_1.0.2
[109] data.table_1.9.6
[110] cowplot_0.6.2
[111] bitops_1.0-6
[112] dnet_1.0.9
[113] rtracklayer_1.32.1
[114] R6_2.1.3
[115] latticeExtra_0.6-28
[116] affy_1.50.0
[117] hwriter_1.3.2
[118] ShortRead_1.30.0
[119] KernSmooth_2.23-15
[120] codetools_0.2-14
[121] MASS_7.3-45
[122] gtools_3.5.0
[123] assertthat_0.1
[124] chron_2.3-47
[125] rjson_0.2.15
[126] withr_1.0.2
[127] GenomicAlignments_1.8.3
[128] Rsamtools_1.24.0
[129] mnormt_1.5-4
[130] multcomp_1.4-5
[131] grid_3.3.0
[132] sjPlot_2.0.2
[133] tidyr_0.6.0
[134] minqa_1.2.4
[135] git2r_0.15.0

Where can I find geneInfo?

For example here:
dataNorm <- TCGAanalyze_Normalization(tabDF = dataBRCA, geneInfo = geneInfo)

I just can't figure out where geneInfo comes from. Thanks.

TCGAanalyze_Normalization

Hi!
I would like your help. I have used GDCquery and then GDCdownload to download the TCGA data. After that I used data <- GDCprepare(query) successfully but I get error messages when I try TCGAanalyze_Normalization. I use TCGA_Normalization(tabDF=data, geneInfo=geneInfo). Why is that? Could you please provide me with the missing steps?
Best wishes

httr GET() still failing connections

From yesterday I get:

>TCGAUpdate()
[1] "Reconnection attempt #10"
[1] "Reconnection attempt #20"
[1] "Reconnection attempt #30"
[1] "Reconnection attempt #40"
[1] "Reconnection attempt #50"
[1] "Reconnection attempt #60"
[1] "Reconnection attempt #70"
[1] "Reconnection attempt #80"
[1] "Reconnection attempt #90"
[1] "Reconnection attempt #100"
Error in DownloadHTML(siteTCGA) : 
  Connetion limit exceded. Check your internet connection and your proxy settings.
             If you are downloading very big files (proteins for example) you should add the proper variable.
             Take a look to the documentation. If the problem persists please contact the mantainers.

Also for TCGAVersion:

> version <- TCGAVersion(tumor = tumor,
+ centerType = centerType,
+ platform = platform,
+ level = level,
+ barcode = T)
Found 3 Version of genome_wide_snp_6
Looking for metadata...
Version 1 of 3 broad.mit.edu_ACC.Genome_Wide_SNP_6.Level_3.304.2002.0
[1] "Reconnection attempt #10"
[1] "Reconnection attempt #20"
[1] "Reconnection attempt #30"
[1] "Reconnection attempt #40"
[1] "Reconnection attempt #50"
[1] "Reconnection attempt #60"
[1] "Reconnection attempt #70"
[1] "Reconnection attempt #80"
[1] "Reconnection attempt #90"
[1] "Reconnection attempt #100"
Error in DownloadHTML(platform.url[j]) : 
  Connetion limit exceded. Check your internet connection and your proxy settings.
             If you are downloading very big files (proteins for example) you should add the proper variable.
             Take a look to the documentation. If the problem persists please contact the mantainers.

Error report - GDCprepare

Hi,
I've encountered an error when using the GDCprepare function.

library(TCGAbiolinks)

query = GDCquery(project = 'TCGA-HNSC',
                 data.category = 'Transcriptome Profiling',
                 data.type = 'Gene Expression Quantification',
                 workflow.type = 'HTSeq - FPKM'
                 )

GDCdownload(query, method = "client", directory = 'GDCdata')

HNSCdata = GDCprepare(query)

And the error came:

Starting to add information to samples
=> Add clinical information to samples
=> Adding subtype information to samples
Subtype information from:doi:10.1038/nature14129
Extra content at the end of the document
Error: 1: Extra content at the end of the document

traceback:

10: stop(e)
9: (function (msg, ...)
{
if (length(grep("\n$", msg)) == 0)
paste(msg, "\n", sep = "")
if (immediate)
cat(msg)
if (length(msg) == 0) {
e = simpleError(paste(1:length(messages), messages, sep = ": ",
collapse = ""))
class(e) = c(class, class(e))
stop(e)
}
messages <<- c(messages, msg)
})(character(0))
8: .Call("RS_XML_ParseTree", as.character(file), handlers, as.logical(ignoreBlanks),
as.logical(replaceEntities), as.logical(asText), as.logical(trim),
as.logical(validate), as.logical(getDTD), as.logical(isURL),
as.logical(addAttributeNamespaces), as.logical(useInternalNodes),
as.logical(isHTML), as.logical(isSchema), as.logical(fullNamespaceInfo),
as.character(encoding), as.logical(useDotNames), xinclude,
error, addFinalizer, as.integer(options), as.logical(parentFirst),
PACKAGE = "XML")
7: xmlTreeParse(registry, asText = TRUE)
6: listMarts(host = host, path = path, port = port, includeHosts = TRUE,
archive = archive, ssl.verifypeer = ssl.verifypeer)
5: useMart("ensembl", dataset = "hsapiens_gene_ensembl")
4: get.GRCh.bioMart("hg38")
3: makeSEfromTranscriptomeProfiling(df, cases, workflow.type)
2: readTranscriptomeProfiling(files = files, data.type = query$data.type,
workflow.type = unique(query$results[[1]]$analysis$workflow_type),
cases = query$results[[1]]$cases, summarizedExperiment)
1: GDCprepare(query)

I have little coding experience and would appreciate some help.

Function suggestion - Ability to download data without project

For example, this has everything else but a project, and you can neither query nor download it due to that reason.

Thanks.

Question in using GDCdownload

Hi, I have a question about using the TCGAbiolinks package. I get an error like this:

query<- GDCquery(project = "TCGA-STAD",
              data.category = "Transcriptome Profiling",
              data.type = "miRNA Expression Quantification" 
              )
Accessing GDC. This might take a while...
GDCdownload(query)
trying URL 'https://gdc.cancer.gov/files/public/file/gdc-client_v1.1.0_Windows_x64.zip'
Error in download.file(url, method = method, ...) :
cannot open URL 'https://gdc.cancer.gov/files/public/file/gdc-client_v1.1.0_Windows_x64.zip'

GDCquery SSL issue

Dear all,

When I tried to use GDCquery() following your manual, I got an SSL error:
Error in open.connection(con, "rb") : SSL connect error

Thanks a lot

Best

Vignette fixes

While working through some parts of http://www.bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/tcgaBiolinks.html, I noted some inconsistencies. Fixing them may help others.

LGG_clinic <- TCGAquery_clinic(cancer = "LGG",
clinical_data_type = "clinical_patient")
should be
LGG_clinic <- TCGAquery_clinic(tumor = "LGG",
clinical_data_type = "clinical_patient")

Help page for ?TCGAquery_integrate has a typo: "commun" should be "common"

Section "TCGAquery_investigate: Find most studied TFs in pubmed"

Select only transcription factors (TFs) from DEGs

TFs <- EAGenes[EAGenes$Family =="transcription regulator",]

Where is the "EAGenes" object?

Section "Case study n. 1: Pan Cancer downstream analysis BRCA"

dataClin <- TCGAquery_clinic(tumor = cancer,
clinical_data_type = "clinical_patient") # time = 2.606s
should be
dataClin <- TCGAquery_clinic(tumor = cancer,
clinical_data_type = "clinical_patient") # time = 2.606s

dataPrep <- TCGAanalyze_Preprocessing(object = dataAssy,
cor.cut = 0.6,
path = pathCancer,
cancer = cancer ) #time = 50.372s
should be
dataPrep <- TCGAanalyze_Preprocessing(object = dataAssy,
cor.cut = 0.6) #time = 50.372s

Problems with TCGAanalyze_Preprocessing()

Hi TCGAbiolinks!

I have encountered problems with TCGAanalyze_Preprocessing() for a particular user on our server. The person is using TCGAbiolinks_2.1.13. When running the function we get the following error message:

Error in assay(object, datatype) :
'assay(, i="character", ...)' invalid subscript 'i'
'i' not in names(assays())
In addition: Warning message:
In data.row.names(row.names, rowsi, i) :
some row.names duplicated: 11,78,80,84,111,119 --> row.names NOT used

Have you experienced this error before? The error does not occur when I'm running TCGAanalyze_Preprocessing() on my user-account. I have an older version of TCGAbiolinks (2.1.6), so maybe there is something wrong with the function in the new version of TCGAbiolinks? (We use the same version of SummarizedExperiment: 1.3.82)

Best,
André

Error while querying Illumina HiSeq

I got an error while running the command below. Please help me fix it.

query <- GDCquery(project = "TCGA-BRCA",
data.category = "Gene expression",
data.type = "Gene Expression Quantification",
experimental.strategy = "RNA-seq",
platform = "Illumina HiSeq",
file.type = "results",
legacy = TRUE)
Error in GDCquery(project = "TCGA-BRCA", data.category = "Gene expression", :
Please set a valid data.type argument from the list below:
=>

miRNA DEA

How do I perform miRNA differential expression analysis?
Please help and give an example.

Thanks very much.

XML parsing problem

Hello, I think TCGAbiolinks is a marvellous R package.
However, I ran into a problem when trying to get clinical information such as radiation.

query1 <- GDCquery(project = "TCGA-GBM",
legacy = TRUE,
data.category = "Gene expression",
data.type = "Gene expression quantification",
platform = "AgilentG4502A_07_1",
sample.type = "Primary solid Tumor")

GDCdownload(query1)

(Up to here everything is OK, but...)

clinical <- GDCprepare_clinic(query, clinical.info = "radiation")

(After the command above, I get the following error message.)

exception: Start tag expected, '<' not found [4]

Your example in the tutorial works fine.
But my case does not work, I think because of an XML parsing problem.
How can I solve this?

It would be really helpful if this problem were solved.
Thank you in advance.

Different pvalues in TCGAanalyze_survival

Thank you so much for your versatile and up-to-date "TCGAbiolinks".
I have successfully installed it on our Linux system. But when I used "TCGAanalyze_survival", I ran into a problem.

code1.
library(TCGAbiolinks)
clin.data <- GDCquery_clinic("TCGA-ACC", "clinical", save.csv=TRUE)
TCGAanalyze_survival(clin.data,clusterCol = "gender", filename = "test.pdf")

# The pvalue is 6.5e−01 in test.pdf.

Because I need to output the p-value, I copied the code from "TCGAanalyze_survival", like this:
code2:
data=clin.data
clusterCol = "gender"
data$s <- grepl("dead", data$vital_status, ignore.case = TRUE)
data$type <- as.factor(data[, clusterCol])
f.m <- formula(Surv(as.numeric(data$days_to_death), event = data$s) ~ data$type)
fit <- survfit(f.m, data = data)
pvalue <- summary(coxph(Surv(as.numeric(data$days_to_death), event = data$s) ~ data$type))$logtest[3]

# The pvalue is 0.7388095.

I have tested several conditions, and the p-values differ every time.

So what is going on? And how can I get the exact survival p-value?
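
One thing worth checking, offered as a possible explanation and not verified against the package source: summary(coxph(...))$logtest is the likelihood ratio test of the Cox model, which is a different statistic from the log-rank test usually printed on Kaplan-Meier plots. The sketch below uses the lung example data shipped with the survival package, not TCGA data, to show that the two generally give different p-values.

```r
library(survival)

# Built-in example data from the survival package, for illustration only.
# Log-rank test via survdiff():
fit_logrank <- survdiff(Surv(time, status) ~ sex, data = lung)
p_logrank <- 1 - pchisq(fit_logrank$chisq, df = length(fit_logrank$n) - 1)

# Likelihood ratio test of a Cox model (the "logtest" element):
fit_cox <- coxph(Surv(time, status) ~ sex, data = lung)
p_lrt <- summary(fit_cox)$logtest[["pvalue"]]

print(c(logrank = p_logrank, likelihood_ratio = p_lrt))
```

A second difference to rule out is the time variable: code2 uses only days_to_death, so patients without a recorded death have a missing time and drop out of the fit unless a follow-up time is filled in for them first.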

Error: could not find function "GDCquery"

Hi!
I have downloaded TCGAbiolinks, but the GDC... functions do not work. What am I doing wrong?
Thanks in advance.

Update TCGAbiolinks to GDC

TCGA data is now in the GDC portal.
As the old API will no longer be supported, the GDC API should be used.

Extra content at the end of the document Error: 1: Extra content at the end of the document

I've encountered an error when using the GDCprepare function.

query.exp <- GDCquery(project = "TCGA-HNSC",
legacy = TRUE,
data.category = "Gene expression",
data.type = "Gene expression quantification",
file.type = "results",
platform = "Illumina HiSeq",
barcode = listSamples)

GDCdownload(query.exp)

hnsc.exp <- GDCprepare(query = query.exp, save = TRUE, save.filename = "Laryn.exp.rda")

The error:
Extra content at the end of the document
Error: 1: Extra content at the end of the document
However, running
library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
on its own gave no errors.

Installation Error

Hi. I'm having a problem installing the TCGAbiolinks package.

Warning messages:
1: running command '"C:/PROGRA~1/R/R-33~1.1/bin/x64/R" CMD INSTALL -l "C:\Users...\Documents\R\win-library\3.3" C:\Users...AppData\Local\Temp\RtmpULAp33/downloaded_packages/TCGAbiolinks_2.0.0.tar.gz' had status 1
2: In install.packages(pkgs = doing, lib = lib, ...) :
installation of package ‘TCGAbiolinks’ had non-zero exit status

I tried to install it from the .tar.gz file, but it also shows a message like this:

ERROR: dependencies 'survival', 'limma' are not available for package 'TCGAbiolinks'

removing 'C:/Users/.../Documents/R/win-library/3.3/TCGAbiolinks'
Warning in install.packages :
running command '"C:/PROGRA~1/R/R-33~1.1/bin/x64/R" CMD INSTALL -l "C:\Users...\Documents\R\win-library\3.3" "C://users/.../documents/TCGAbiolinks_2.0.0.tar.gz"' had status 1
Warning in install.packages :
installation of package ‘C://users/.../documents/TCGAbiolinks_2.0.0.tar.gz’ had non-zero exit status

Do you have any idea what I should do?

GDCDownload doesn't work when project non ASCII

I have an R project using TCGAbiolinks version 2.2.6 in RStudio.

I tried to download data using GDCdownload, using both client and api methods but kept getting 500 INTERNAL SERVER ERROR response from GDC. After looking around the web I found out that GDC returns a 500 error when you send a non ASCII encoded id to the server (see Known Issues and Workarounds https://gdc-docs.nci.nih.gov/Data_Transfer_Tool/Release_Notes/DTT_Release_Notes/). In my case, my project was encoded in UTF-8. When I changed the encoding to ASCII, everything worked.

Perhaps the GDCdownload.aux function could convert all ids to ASCII before it sends a request to the server?
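
A rough sketch of what such a conversion could look like in base R is below. The helpers has_non_ascii and to_ascii are hypothetical and not part of TCGAbiolinks, and the ids are made up. Whether the package should drop or reject non-ASCII characters is a design decision left open here.

```r
# Hypothetical helpers, not part of TCGAbiolinks.

# TRUE for every string that contains a code point outside ASCII.
has_non_ascii <- function(x) {
  vapply(x, function(s) any(utf8ToInt(enc2utf8(s)) > 127L), logical(1),
         USE.NAMES = FALSE)
}

# Convert to ASCII; bytes with no ASCII equivalent are replaced by
# sub = "", i.e. dropped.
to_ascii <- function(x) {
  iconv(enc2utf8(x), from = "UTF-8", to = "ASCII", sub = "")
}

# Made-up ids: the second one contains a non-ASCII character.
ids <- c("plain-id-0001", "caf\u00e9-id-0002")
print(has_non_ascii(ids))
print(to_ascii(ids))
```

Checking with has_non_ascii() first and stopping with a clear message may be safer than silently altering an id, since a changed id would no longer match anything on the server.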

Many thanks, your project has saved me a lot of effort!

Question install and use TCGAbiolinks

Hi, I have a question about using the TCGAbiolinks package. I get errors like these:

When I use the function GDCquery_Maf(), I get: cannot open URL 'https://gdc-api.nci.nih.gov/data//abbe72a5-cb39-48e4-8df5-5fd2349f2bb2'
When I use TCGAquery_investigate(), I get: could not find function "TCGAquery_investigate"
I installed the package with the command biocLite("TCGAbiolinks"). Are there any other packages I need to install?

Thanks very much!
Yang
