labs's Introduction

Data Analysis for the Life Sciences

NEWS:

September 16, 2015: We are reorganizing the labs here for the new courses launching this Fall. We have decided to drop the course1 style of directory structure, as the number of courses is still in flux. We are now using a modular structure. See renaming_map.md for how courses were remapped to new names.

Book versions

Compiled versions of this document as HTML can be found here:

http://genomicsclass.github.io/book/

The ePub version of this document can be found on Leanpub:

https://leanpub.com/dataanalysisforthelifesciences/

Pull requests and issues

We greatly appreciate all of our readers who contribute pull requests!

If you want to contribute through pull request, please first clone a new version of the repo. If you have a version of the repo from 2014, it will contain some large data objects, which accidentally snuck in, and we won't be able to accept your pull request.

Please do not add an issue which says "I couldn't knit the Rmd". This is nearly always because users are missing one or more of the libraries and datasets used within (we do not re-install libraries in each Rmd script as this would slow down our compilation of the book material). You will find the missing library if you step through the Rmd one chunk at a time.
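
One way to find the missing libraries before knitting, sketched here with base R only: scan the Rmd text for library() calls and check which of those packages cannot be loaded. The helper names find_library_calls and missing_packages are hypothetical (they are not part of this repo), and the regular expression only catches simple library(pkg) calls, not require() or pkg::fun usage.

```r
# Hypothetical helpers, not part of this repo.

# Extract package names from simple library(pkg) calls in a vector of text lines.
find_library_calls <- function(lines) {
  pattern <- "(?<=library\\()[A-Za-z0-9.]+(?=\\))"
  hits <- regmatches(lines, gregexpr(pattern, lines, perl = TRUE))
  unique(unlist(hits))
}

# Return the packages whose namespace cannot be loaded (typically: not installed).
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Example on an in-memory snippet; for a real file, pass readLines() output instead.
rmd_lines <- c("library(stats)", "library(utils)", "x <- 1")
pkgs <- find_library_calls(rmd_lines)
missing_packages(pkgs)
```

Anything returned by missing_packages() would need to be installed before the Rmd can knit.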

labs's People

Contributors

aleksandradabrowska, alexnones, eronisko, gillsignals, hcorrada, jmgore75, josemrecio, kern3020, lwaldron, lzamparo, massie, mikelove, molecules, molx, neerajt, obicke, pkimes, rafalab, ririzarr, schifferl, setgree, ste-fan, stephaniehicks, tomsing1, vjcitn, yeredh

labs's Issues

Permission to port over dplyr tutorial to DataFramesMeta.jl

Hello,

I am the maintainer of the Julia package DataFramesMeta.jl. It is a data manipulation package for the Julia language and it is very similar to dplyr. I would like permission to port your dplyr tutorial to DataFramesMeta.jl and host it on our documentation website.

We are getting very close to releasing version 1.0 of DataFramesMeta and as a result I'm working on tutorials to help new users get on board.

Because so many of our users will be coming from dplyr, it makes sense to not try and re-invent the wheel when it comes to tutorials and instead port over existing tutorials. Your dplyr tutorial ranks pretty high on Google search and is a nice introduction.

Can I modify your tutorial to be a tutorial for DataFramesMeta.jl and host it on our website? This pretty much just involves surface-level syntax changes, but most of the text will remain intact.

Thank you!

Dplyr select() returning error

Hi all,

Running the below code causes this error:
Error in select(., Bodyweight) : unused argument (Bodyweight)

Here is the full code:

library(rafalib)
library(downloader)
library(devtools)
library(dplyr)

install_github("genomicsclass/dagdata")
dir <- system.file(package="dagdata")
filename <- file.path(dir,"extdata/mice_pheno.csv")
dat <- read.csv(filename)

controlPopulation <- filter(dat,Sex == "F" & Diet == "chow") %>% select(Bodyweight) %>% unlist

I am running the following R installation through RStudio:

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

I attach a screenshot from RStudio too.
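
A hedged note for readers hitting the same error: "unused argument" from select() often means that a different function named select is being called, for example one from another attached package that masks dplyr's version. Whether that is the cause here is an assumption. Calling dplyr::select explicitly is one way around it; the base R sketch below sidesteps select() altogether. The data frame is toy data invented for illustration, not the real mice_pheno.csv.

```r
# Toy stand-in for mice_pheno.csv (values invented for illustration).
dat <- data.frame(
  Sex = c("F", "F", "M", "F"),
  Diet = c("chow", "hf", "chow", "chow"),
  Bodyweight = c(21.5, 25.7, 30.1, 19.8)
)

# Base R counterpart of filter(...) %>% select(Bodyweight) %>% unlist
keep <- dat$Sex == "F" & dat$Diet == "chow"
controlPopulation <- dat$Bodyweight[keep]
controlPopulation
```

The result is an unnamed numeric vector, whereas the dplyr pipeline followed by unlist returns a named one; the values are the same.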

make an ePub, please

Could you release an ePub version? When I compiled it by myself, there were lots of errors. Maybe because some files are deprecated. Thank you.

Error using minfi::plotSex in methyl/minfi.Rmd

The second to last line of methyl/minfi.Rmd generates an error:

> plotSex(sex)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'colData' for signature '"DataFrame"'

I implemented some small fixes to this document in a PR to fix some deprecated functions which you may wish to apply first. I do not know where this DataFrame error comes from.

The session info is below. Thanks!

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 IlluminaHumanMethylation450kmanifest_0.4.0        
 [3] minfi_1.30.0                                       HistData_0.8-4                                    
 [5] broom_0.5.2                                        Lahman_7.0-1                                      
 [7] tidytext_0.2.2                                     gutenbergr_0.1.4                                  
 [9] rvest_0.3.4                                        xml2_1.2.2                                        
[11] bumphunter_1.26.0                                  locfit_1.5-9.1                                    
[13] iterators_1.0.12                                   foreach_1.4.7                                     
[15] limma_3.40.6                                       coloncancermeth_1.0                               
[17] cummeRbund_2.26.0                                  Gviz_1.28.1                                       
[19] rtracklayer_1.44.2                                 fastcluster_1.1.25                                
[21] reshape2_1.4.3                                     RSQLite_2.1.2                                     
[23] DEXSeq_1.30.0                                      RColorBrewer_1.1-2                                
[25] pasilla_1.12.0                                     sva_3.32.1                                        
[27] genefilter_1.66.0                                  mgcv_1.8-28                                       
[29] nlme_3.1-141                                       org.Hs.eg.db_3.8.2                                
[31] pheatmap_1.0.12                                    vsn_3.52.0                                        
[33] DESeq2_1.24.0                                      rafalib_1.0.0                                     
[35] GenomicAlignments_1.20.1                           Rsamtools_2.0.0                                   
[37] Biostrings_2.52.0                                  XVector_0.24.0                                    
[39] airway_1.4.0                                       SummarizedExperiment_1.14.1                       
[41] DelayedArray_0.10.0                                BiocParallel_1.18.1                               
[43] matrixStats_0.54.0                                 forcats_0.4.0                                     
[45] stringr_1.4.0                                      dplyr_0.8.3                                       
[47] purrr_0.3.2                                        readr_1.3.1                                       
[49] tidyr_0.8.3                                        tibble_2.1.3                                      
[51] tidyverse_1.2.1                                    dslabs_0.7.1                                      
[53] Cen.ele6_1.0.0                                     TxDb.Celegans.UCSC.ce6.ensGene_3.2.2              
[55] org.Ce.eg.db_3.8.2                                 GO.db_3.8.2                                       
[57] OrganismDbi_1.26.0                                 GenomicFeatures_1.36.4                            
[59] AnnotationDbi_1.46.1                               Biobase_2.44.0                                    
[61] GenomicRanges_1.36.0                               GenomeInfoDb_1.20.0                               
[63] IRanges_2.18.1                                     S4Vectors_0.22.0                                  
[65] ERBS_1.0                                           erbsViz_0.0.0.9000                                
[67] juxtaPack_0.0.0.9000                               ggbio_1.32.0                                      
[69] ggplot2_3.2.1                                      BiocGenerics_0.30.0                               
[71] usethis_1.5.1                                     

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.1           SnowballC_0.6.0          GGally_1.4.0             pkgmaker_0.27            acepack_1.4.1           
  [6] bit64_0.9-7              knitr_1.24               data.table_1.12.2        rpart_4.1-15             hwriter_1.3.2           
 [11] GEOquery_2.52.0          RCurl_1.95-4.12          AnnotationFilter_1.8.0   generics_0.0.2           snow_0.4-3              
 [16] preprocessCore_1.46.0    callr_3.3.1              commonmark_1.7           bit_1.1-14               tokenizers_0.2.1        
 [21] lubridate_1.7.4          assertthat_0.2.1         xfun_0.9                 hms_0.5.1                scrime_1.3.5            
 [26] fansi_0.4.0              progress_1.2.2           readxl_1.3.1             DBI_1.0.0                geneplotter_1.62.0      
 [31] htmlwidgets_1.3          reshape_0.8.8            selectr_0.4-1            backports_1.1.4          annotate_1.62.0         
 [36] textdata_0.3.0           biomaRt_2.40.4           vctrs_0.2.0              remotes_2.1.0            ensembldb_2.8.0         
 [41] withr_2.1.2              triebeard_0.3.0          BSgenome_1.52.0          checkmate_1.9.4          prettyunits_1.0.2       
 [46] mclust_5.4.5             cluster_2.1.0            lazyeval_0.2.2           crayon_1.3.4             pkgconfig_2.0.2         
 [51] labeling_0.3             pkgload_1.0.2            ProtGenerics_1.16.0      nnet_7.3-12              devtools_2.1.0          
 [56] rlang_0.4.0              registry_0.5-1           affyio_1.54.0            modelr_0.1.5             dichromat_2.0-0         
 [61] cellranger_1.1.0         rprojroot_1.3-2          graph_1.62.0             rngtools_1.4             base64_2.0              
 [66] Matrix_1.2-17            urltools_1.7.3           Rhdf5lib_1.6.0           base64enc_0.1-3          whisker_0.4             
 [71] processx_3.4.1           clisymbols_1.2.0         bitops_1.0-6             DelayedMatrixStats_1.6.0 blob_1.2.0              
 [76] doRNG_1.7.1              nor1mix_1.3-0            scales_1.0.0             memoise_1.1.0            magrittr_1.5            
 [81] plyr_1.8.4               hexbin_1.27.3            bibtex_0.4.2             zlibbioc_1.30.0          compiler_3.6.1          
 [86] illuminaio_0.26.0        cli_1.1.0                affy_1.62.0              janeaustenr_0.1.5        ps_1.3.0                
 [91] htmlTable_1.13.1         Formula_1.2-3            MASS_7.3-51.4            tidyselect_0.2.5         stringi_1.4.3           
 [96] askpass_1.1              latticeExtra_0.6-28      VariantAnnotation_1.30.1 tools_3.6.1              rstudioapi_0.10         
[101] foreign_0.8-71           git2r_0.26.1             gridExtra_2.3            digest_0.6.20            BiocManager_1.30.4      
[106] quadprog_1.5-7           Rcpp_1.0.1               siggenes_1.58.0          httr_1.4.1               biovizBase_1.32.0       
[111] colorspace_1.4-1         XML_3.98-1.20            fs_1.3.1                 splines_3.6.1            RBGL_1.60.0             
[116] statmod_1.4.32           multtest_2.40.0          sessioninfo_1.1.1        xtable_1.8-4             jsonlite_1.6            
[121] zeallot_0.1.0            testthat_2.2.1           R6_2.4.0                 Hmisc_4.2-0              pillar_1.4.2            
[126] htmltools_0.3.6          glue_1.3.1               beanplot_1.2             codetools_0.2-16         pkgbuild_1.0.5          
[131] utf8_1.1.4               lattice_0.20-38          curl_4.0                 openssl_1.4.1            survival_2.44-1.1       
[136] roxygen2_6.1.1           desc_1.2.0               munsell_0.5.0            rhdf5_2.28.0             GenomeInfoDbData_1.2.1  
[141] HDF5Array_1.12.2         haven_2.1.1              gtable_0.3.0 

Issues in section "NGS experiments and the Poisson distribution"

I see several issues in this section:

  1. The section (in the printed book p. 285) states "Assuming most genes are differentially expressed across individuals, then, if the Poisson model is appropriate, there should be a linear relationship in this plot." It is not explained why this should be so. Is it referring to the mean and standard deviation in the Poisson distribution, both being lambda? But that wasn't covered in the course.
  2. The plot which is then generated (in chunk "var_vs_mean") displays variance against means. I think this should be standard deviation? Indeed, plotting sd against means shows a pretty linear picture, with the diagonal cutting through the middle.
  3. In the paragraph following the figure it is unclear what the "this" refers to: "The reason for this is that the variability plotted here includes biological variability [...]." Does the "this" refer to linearity or the absence of linearity?
  4. That paragraph introduces the concepts of "biological variability" and "sampling variability" as if they had been discussed previously. I don't think they are defined at any earlier point in the book. Also, given the seemingly quite linear relation, does this point still hold water?
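
For reference on points 1 and 2: in a Poisson distribution it is the mean and the variance that both equal lambda, while the standard deviation is the square root of lambda. That is presumably why the book plots variance against mean. The simulation below is a minimal sketch with made-up lambda values and only sampling variability (no biological variability), so it illustrates the pure Poisson case rather than the book's data.

```r
set.seed(1)
lambdas <- seq(1, 100, length.out = 50)   # assumed "true" mean counts per gene
n <- 1000                                 # simulated replicates per gene

# One row per gene, n Poisson draws each.
counts <- t(sapply(lambdas, function(l) rpois(n, l)))
m <- rowMeans(counts)
v <- apply(counts, 1, var)
s <- sqrt(v)

# Under a pure Poisson model the variance tracks the mean (ratio near 1),
# while the standard deviation tracks sqrt(mean), not the mean itself.
median(v / m)
median(s / sqrt(m))
```

Plotting v against m for these simulated counts should fall along the identity line; real count data with extra biological variability would be expected to sit above it.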

PS: thanks for the great course and the book!

Trouble building biocintro_5x / bioc1_LiftOver.Rmd

@vjcitn
Building the biocintro_5x / bioc1_LiftOver.Rmd throws an error:

Quitting from lines 66-70 (bioc1_liftOver.Rmd)
Error in seqlevels<-(*tmp*, force = TRUE, value = "chr1") :
unused argument (force = TRUE)
Calls: ... handle -> withCallingHandlers -> withVisible -> eval -> eval

Error in knitr of all RMD files

I'm trying to use the RMD files associated with the Statistics & R course, and I keep getting an error on the first chunk of code:

opts_chunk$set(fig.path=paste0("figure/", sub("(.*).Rmd","\\1",basename(knitr:::knit_concord$get('infile'))), "-"))

Error says:
Error in basename(knitr:::knit_concord$get("infile")) : a character vector argument expected

Any suggestions?
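
A possible explanation, offered as an assumption rather than a confirmed diagnosis: knitr:::knit_concord$get('infile') appears to be set only while a document is being knitted, so running the chunk interactively passes NULL to basename(), which raises exactly this message. The sketch below uses a hypothetical helper, fig_prefix, that falls back to a fixed name when no input file is known. The example path is invented.

```r
# Hypothetical helper: build the figure path prefix, tolerating a missing input file.
fig_prefix <- function(infile, fallback = "interactive") {
  stem <- if (is.null(infile)) {
    fallback
  } else {
    sub("(.*)\\.Rmd$", "\\1", basename(infile))
  }
  paste0("figure/", stem, "-")
}

fig_prefix("some_dir/example.Rmd")  # while knitting: infile is a path
fig_prefix(NULL)                    # interactive use: infile is NULL
```

In the Rmd, the result could then be passed to opts_chunk$set(fig.path = ...). Knitting the whole file, instead of running the first chunk by hand, should also avoid the error.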

Different t.test result in type II error simulation

Hi, Prof. Rafa!
I'm using R version 3.6.3 and working through the false negative demonstration from the edX PH525x course.
I'm using exactly the same code as the lecture video and the book. Here it is:

dat <- read.csv("mice_pheno.csv")

controlPopulation <- filter(dat,Sex == "F" & Diet == "chow") %>%
select(Bodyweight) %>% unlist

hfPopulation <- filter(dat,Sex == "F" & Diet == "hf") %>%
select(Bodyweight) %>% unlist

mu_hf <- mean(hfPopulation)
mu_control <- mean(controlPopulation)

mu_hf - mu_control
[1] 2.375517
(mu_hf - mu_control)/mu_control * 100 # percent increase
[1] 9.942157

So far the results are the same as in the video.
After that:

set.seed(1)
N <- 5
hf <- sample(hfPopulation,N)
control <- sample(controlPopulation,N)
t.test(hf,control)$p.value

The result is supposed to be 0.1410204, but my result is 0.5806661. I retried several times and with several ways of generating the values, but the result hasn't changed.

Seeing that this material was last edited 4 years ago, I think there is an algorithmic difference in the set.seed() function.

I would be glad if you could help me.
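
A likely explanation: R 3.6.0 changed the default method used by sample() from "Rounding" to "Rejection", so set.seed(1) followed by sample() no longer reproduces draws made under older versions. The sketch below (it requires R 3.6.0 or later) shows the two samplers giving different draws from the same seed. Whether switching back recovers the exact p-value from the video is an assumption that has not been checked here.

```r
# Draw with the pre-3.6.0 sampler. R warns that "Rounding" is non-uniform,
# so the warning is silenced for this demonstration.
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
old_style <- sample(100, 5)

# Draw with the sampler that has been the default since R 3.6.0.
# This also leaves the session on the current default afterwards.
set.seed(1, sample.kind = "Rejection")
new_style <- sample(100, 5)

# Same seed, different sampler: the draws differ.
identical(old_style, new_style)
```

To try this on the code in the issue, replace set.seed(1) with set.seed(1, sample.kind = "Rounding") before the two sample() calls.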

GSE5859 available for R version 3.3.1

Is there a GSE5859 package available for R version 3.3.1?
When I run the following command to install the package,

biocLite('GSE5859')
I get the following error:
Warning message:
package ‘GSE5859’ is not available (for R version 3.3.1)
thanks!

Trouble building biocintro_5x / bioc1_summex.Rmd

@vjcitn

Building the biocintro_5x / bioc1_summex.Rmd throws an error:

Quitting from lines 109-115 (bioc1_summex.Rmd)
Error in .local(x, ...) :
unused argument (vals = list(tx_chrom = "chr14"))
Calls: ... withCallingHandlers -> withVisible -> eval -> eval -> genes -> genes

404 not found error in dataman_2019.Rmd

Lines 875-877 of dataman_2019.Rmd generate a 404 not found error after authenticating with Google BigQuery and returning to R as directed by the browser:

tcgaCon %>% tbl("Somatic_Mutation") %>% dplyr::filter(project_short_name=="TCGA-GBM") %>% 
       dplyr::select(Variant_Classification, Hugo_Symbol) %>% group_by(Variant_Classification) %>%
       summarise(n=n())
Error: HTTP error [404] Not Found

Is this the appropriate workflow? If so, what do learners need to know or do in order to not encounter this 404 error?

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] magrittr_1.5                                       dplyr_0.8.3                                       
 [3] bigrquery_1.2.0                                    RaggedExperiment_1.8.0                            
 [5] curatedTCGAData_1.6.0                              MultiAssayExperiment_1.10.4                       
 [7] VariantTools_1.26.0                                VariantAnnotation_1.30.1                          
 [9] ph525x_0.0.48                                      png_0.1-7                                         
[11] ldblock_1.14.2                                     erma_1.0.0                                        
[13] Homo.sapiens_1.3.1                                 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2           
[15] OrganismDbi_1.26.0                                 GenomicFeatures_1.36.4                            
[17] GenomicAlignments_1.20.1                           GenomicFiles_1.20.0                               
[19] rtracklayer_1.44.2                                 Rsamtools_2.0.0                                   
[21] RNAseqData.HNRNPC.bam.chr14_0.22.0                 IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[23] IlluminaHumanMethylation450kmanifest_0.4.0         minfi_1.30.0                                      
[25] bumphunter_1.26.0                                  locfit_1.5-9.1                                    
[27] iterators_1.0.12                                   foreach_1.4.7                                     
[29] Biostrings_2.52.0                                  XVector_0.24.0                                    
[31] data.table_1.12.2                                  GO.db_3.8.2                                       
[33] org.Hs.eg.db_3.8.2                                 airway_1.4.0                                      
[35] SummarizedExperiment_1.14.1                        DelayedArray_0.10.0                               
[37] BiocParallel_1.18.1                                matrixStats_0.54.0                                
[39] GenomicRanges_1.36.0                               GenomeInfoDb_1.20.0                               
[41] ArrayExpress_1.44.0                                GEOquery_2.52.0                                   
[43] annotate_1.62.0                                    XML_3.98-1.20                                     
[45] AnnotationDbi_1.46.1                               IRanges_2.18.1                                    
[47] S4Vectors_0.22.0                                   Biobase_2.44.0                                    
[49] BiocGenerics_0.30.0                                GSE5859Subset_1.0                                 

loaded via a namespace (and not attached):
  [1] tidyselect_0.2.5              RSQLite_2.1.2                 munsell_0.5.0                 codetools_0.2-16             
  [5] preprocessCore_1.46.0         withr_2.1.2                   colorspace_1.4-1              knitr_1.24                   
  [9] rstudioapi_0.10               labeling_0.3                  GenomeInfoDbData_1.2.1        bit64_0.9-7                  
 [13] rhdf5_2.28.0                  vctrs_0.2.0                   xfun_0.9                      BiocFileCache_1.8.0          
 [17] affxparser_1.56.0             R6_2.4.0                      illuminaio_0.26.0             AnnotationFilter_1.8.0       
 [21] bitops_1.0-6                  reshape_0.8.8                 assertthat_0.2.1              promises_1.0.1               
 [25] scales_1.0.0                  gtable_0.3.0                  ensembldb_2.8.0               rlang_0.4.0                  
 [29] zeallot_0.1.0                 genefilter_1.66.0             splines_3.6.1                 lazyeval_0.2.2               
 [33] gargle_0.3.1                  BiocManager_1.30.4            yaml_2.2.0                    reshape2_1.4.3               
 [37] snpStats_1.34.0               backports_1.1.4               httpuv_1.5.1                  RBGL_1.60.0                  
 [41] tools_3.6.1                   nor1mix_1.3-0                 ggplot2_3.2.1                 affyio_1.54.0                
 [45] ff_2.2-14                     RColorBrewer_1.1-2            siggenes_1.58.0               Rcpp_1.0.1                   
 [49] plyr_1.8.4                    progress_1.2.2                zlibbioc_1.30.0               purrr_0.3.2                  
 [53] RCurl_1.95-4.12               prettyunits_1.0.2             openssl_1.4.1                 fs_1.3.1                     
 [57] ProtGenerics_1.16.0           hms_0.5.1                     mime_0.7                      xtable_1.8-4                 
 [61] mclust_5.4.5                  gridExtra_2.3                 compiler_3.6.1                biomaRt_2.40.4               
 [65] tibble_2.1.3                  crayon_1.3.4                  htmltools_0.3.6               later_0.8.0                  
 [69] snow_0.4-3                    tidyr_0.8.3                   oligo_1.48.0                  DBI_1.0.0                    
 [73] ExperimentHub_1.10.0          dbplyr_1.4.2                  MASS_7.3-51.4                 rappdirs_0.3.1               
 [77] EnsDb.Hsapiens.v75_2.99.0     Matrix_1.2-17                 readr_1.3.1                   quadprog_1.5-7               
 [81] pkgconfig_2.0.2               registry_0.5-1                xml2_1.2.2                    rngtools_1.4                 
 [85] pkgmaker_0.27                 multtest_2.40.0               beanplot_1.2                  bibtex_0.4.2                 
 [89] doRNG_1.7.1                   scrime_1.3.5                  stringr_1.4.0                 digest_0.6.20                
 [93] graph_1.62.0                  base64_2.0                    DelayedMatrixStats_1.6.0      curl_4.0                     
 [97] shiny_1.3.2                   jsonlite_1.6                  nlme_3.1-141                  Rhdf5lib_1.6.0               
[101] askpass_1.1                   limma_3.40.6                  BSgenome_1.52.0               pillar_1.4.2                 
[105] lattice_0.20-38               httr_1.4.1                    survival_2.44-1.1             interactiveDisplayBase_1.22.0
[109] glue_1.3.1                    UpSetR_1.4.0                  bit_1.1-14                    stringi_1.4.3                
[113] HDF5Array_1.12.2              blob_1.2.0                    oligoClasses_1.46.0           AnnotationHub_2.16.1         
[117] memoise_1.1.0     

Thanks!

Typos in labs/course3/machine_learning.Rmd

Update for the new version (3/15/2015):
line 131: It should have len=80 -- fixed
line 295: "should" should be "shade" -- now line 314
line 313: "specif" should be "specific" -- now line 332
line 317: There's something missing. Right now it looks like this:
"In the code above you will notice that we created two sets data" -- now line 336
line 426: \mobx should be \mbox -- now line 445

line 82: Not a typo, but there's a huge output from loading the SpikeIn library. You can suppress it by adding the chunk option message=FALSE. -- fixed

(There are some others, but I have to stop here.)

Trouble building biocintro_5x / bioc1_iranges.Rmd

@vjcitn
Building the biocintro_5x / bioc1_iranges.Rmd throws an error:

Error in elementLengths(grl) : could not find function "elementLengths"
Calls: ... handle -> withCallingHandlers -> withVisible -> eval -> eval
Execution halted

elementLengths() has been replaced with elementNROWS() in the latest IRanges package.

one probeID to multi gene symbols

In mapping_features.Rmd
idx <- match(rownames(e), res$PROBEID)

When a probe ID maps to more than one gene symbol, match() automatically returns only the first one, so the other symbols are silently dropped.
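
The behaviour can be reproduced with base R alone. The table below is toy data (the probe IDs and symbols are invented); res and its PROBEID column follow the naming in the issue, while the SYMBOL column, probe_ids and all_symbols are names assumed for this sketch.

```r
# Toy annotation table: probe "p2" maps to two gene symbols.
res <- data.frame(
  PROBEID = c("p1", "p2", "p2", "p3"),
  SYMBOL = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  stringsAsFactors = FALSE
)
probe_ids <- c("p1", "p2", "p3")   # stands in for rownames(e)

# match() returns only the first position of each ID, so GENE_C never appears.
idx <- match(probe_ids, res$PROBEID)
res$SYMBOL[idx]

# To see every mapping, split the symbols by probe ID instead.
all_symbols <- split(res$SYMBOL, res$PROBEID)
all_symbols[["p2"]]
```

Whether keeping the first symbol is acceptable depends on the analysis; the split() version at least makes the one-to-many probes visible.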

gbm, assayNames errors in dataman2017.Rmd

Two errors are present that break the code in the "Working with TCGA mutation data" section.

When defining the gbm object in dataman2017.Rmd, there are errors. The gbm object is still defined, but I am not sure it is successfully updated.

>gbm = updateObject(gbm)
>gbm
A MultiAssayExperiment object of 12 listed
 experiments with user-defined names and respective classes. 
 Containing an Error in vapply(object, FUN = function(obj) { : values must be length 1,
 but FUN(X[[3]]) result is length 0
Error during wrapup: cannot get a slot ("slots") from an object of type "NULL"

This may be related to a downstream error in assayNames:

> mut = experiments(gbm)[["Mutations"]]
> head(assayNames(mut))
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'assayNames' for signature '"RangedRaggedAssay"'

Later components of the TCGA code rely on mut and cannot be performed due to current errors.

Installing Bioconductor

  • installing_Bioconductor_finding_help.Rmd:77 with -> without
  • installing_Bioconductor_finding_help.Rmd:79-80 +library(geneplotter)

    this time evaluated so that plotMA works

Can't install the packages from github

Hi, I am using R 3.3.3 on Windows. I was trying to install "genomicsclass/GSE5859Subset", but it always fails with this error:

* installing *source* package 'GSE5859Subset' ...
** data
** help
No man pages found in package 'GSE5859Subset'
*** installing help indices
** building package indices
** testing if installed package can be loaded
Warning in library(pkg_name, lib.loc = lib, character.only = TRUE, logical.return = TRUE) :
no library trees found in 'lib.loc'
Error: loading failed
Execution halted
ERROR: loading failed
* removing '\dtu-storage/lumye/Documents/R/win-library/3.3/GSE5859Subset'
Installation failed: Command failed (1)

Can anyone help me solve this problem? Many thanks!
Lumeng

Ambiguous sentence in advinference/multiple_testing.Rmd

(Sorry if this is not the appropriate channel for reporting issues regarding potential typos etc.)

I got confused by the wording "This implies that with a 0.05 p-value cut-off, out of the 100 tests we incorrectly call between 4 and 5 significant on average. " in this line:

The FDR is relatively high here. This is because for 90% of the tests, the null hypotheses is true. This implies that with a 0.05 p-value cut-off, out of the 100 tests we incorrectly call between 4 and 5 significant on average. This combined with the fact that we don't "catch" all the cases where the alternative is true, gives us a relatively high FDR. So how can we control this? What if we want lower FDR, say 5%?

Stating "the 100 tests" made me initially think that the number 100 was supposed to refer to the number of experiments in the Monte Carlo simulation, which was obviously wrong since there are 10,000 experiments/tests for each replication. Did the authors mean something to this effect?

Since there are 9000 tests where the null hypothesis is true and the chosen significance level is 0.05, it follows that we incorrectly call between 400 and 500 tests significant on average (5% of 9000 equals 450).
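
The arithmetic behind the proposed wording can be checked directly. This is a minimal sketch using the numbers quoted in this issue (10,000 tests, 90% true nulls, cut-off 0.05); the variable names are chosen for illustration and the simulation relies only on the fact that p-values are uniform on [0, 1] when the null hypothesis is true.

```r
n_tests <- 10000        # tests per replication, as stated in the issue
prop_null <- 0.90       # fraction of tests where the null hypothesis is true
alpha <- 0.05           # p-value cut-off

n_null <- n_tests * prop_null
expected_false_positives <- alpha * n_null
expected_false_positives

# Quick Monte Carlo check: count null p-values below the cut-off, repeated 200 times.
set.seed(1)
sim <- replicate(200, sum(runif(n_null) < alpha))
mean(sim)
```

The simulated average should land close to the expected count, in line with the "between 400 and 500" reading.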

Getting Started Exercises question #1 sentence malformed

From http://genomicsclass.github.io/book/pages/getting_started_exercises.html - question 1 doesn't make sense:

"Read in the file femaleMiceWeights.csv and report the body weight of the mouse in the exact name of the column containing the weights."

Perhaps it meant:

"Read in the file femaleMiceWeights.csv and report a) the body weights of all the mice, and b) the exact name of the column containing the weights."

Also, is the source for the exercises in this repo? I couldn't find the getting started exercises.

R.4.3.3 package installation

Greetings,

Is there any known problem installing the package under R version 4.3.3?
I am having trouble doing this.

I am running R version 4.3.3 and RStudio version 1.1.419.

Regards,

Issue with devtools and genomicsclass/GSE5859Subset

Hello,
I have been stuck for the past two days. I need to use genomicsclass/GSE5859Subset and
for that I have installed "devtools". I have also installed Rtools (Rtools34) from CRAN. I am running version 3.6.0 of RStudio.

I get these warning and error messages. I cannot use the GSE5859Subset dataset. Any help would be greatly appreciated.

library(devtools)
Loading required package: usethis
Warning messages:
1: package ‘devtools’ was built under R version 3.6.3
2: package ‘usethis’ was built under R version 3.6.3
install_github("genomicsclass/GSE5859Subset")
Error: Failed to install 'unknown package' from GitHub:
HTTP error 403.
API rate limit exceeded for 157.32.239.55. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)

Rate limit remaining: 0/60
Rate limit reset at: 2020-05-26 15:42:00 UTC

To increase your GitHub API rate limit

  • Use usethis::browse_github_pat() to create a Personal Access Token.
  • Use usethis::edit_r_environ() and add the token as GITHUB_PAT.

library(GSE5859Subset)
Error in library(GSE5859Subset) :
there is no package called ‘GSE5859Subset’
data(GSE5859Subset)
Warning message:
In data(GSE5859Subset) : data set ‘GSE5859Subset’ not found
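
Following the advice in the message above, a small check like the one below can confirm whether a token is visible to the R session before retrying install_github(). The environment variable name GITHUB_PAT is taken from that message; the helper has_github_pat is hypothetical.

```r
# Hypothetical helper: report whether a GitHub token is visible to this R session.
has_github_pat <- function() nzchar(Sys.getenv("GITHUB_PAT"))

if (!has_github_pat()) {
  message("GITHUB_PAT is not set; unauthenticated GitHub API requests are rate limited.")
}
```

After adding the token to .Renviron, R has to be restarted for the variable to be picked up. The later library() and data() errors are most likely just a consequence of the failed installation.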

GRanges objects don't support lapply, unlist at the moment

Line 181 of bioc2_integExamps.Rmd generates the following error:

> phset = lapply( ovrngs, function(x)
+   unique( gwrngs19[ which(gwrngs19 %over% x) ]$Disease.Trait ) )
Error in getListElement(x, i, ...) : 
  GRanges objects don't support [[, as.list(), lapply(), or
  unlist() at the moment

Thanks! Here is the sessionInfo:

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] tools     grid      stats4    parallel  stats     graphics  grDevices utils    
 [9] datasets  methods   base     

other attached packages:
 [1] curatedTCGAData_1.6.0                             
 [2] harbChIP_1.22.0                                   
 [3] yeastCC_1.24.0                                    
 [4] gwascat_2.16.0                                    
 [5] ERBS_1.0                                          
 [6] magrittr_1.5                                      
 [7] dplyr_0.8.3                                       
 [8] bigrquery_1.2.0                                   
 [9] VariantTools_1.26.0                               
[10] VariantAnnotation_1.30.1                          
[11] RaggedExperiment_1.8.0                            
[12] MultiAssayExperiment_1.10.4                       
[13] GenomicAlignments_1.20.1                          
[14] BiocStyle_2.12.0                                  
[15] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[16] IlluminaHumanMethylation450kmanifest_0.4.0        
[17] minfi_1.30.0                                      
[18] bumphunter_1.26.0                                 
[19] locfit_1.5-9.1                                    
[20] iterators_1.0.12                                  
[21] foreach_1.4.7                                     
[22] annotate_1.62.0                                   
[23] XML_3.98-1.20                                     
[24] GSE5859Subset_1.0                                 
[25] airway_1.4.0                                      
[26] ph525x_0.0.48                                     
[27] png_0.1-7                                         
[28] RNAseqData.HNRNPC.bam.chr14_0.22.0                
[29] erma_1.0.0                                        
[30] GenomicFiles_1.20.0                               
[31] rtracklayer_1.44.2                                
[32] Rsamtools_2.0.0                                   
[33] Biostrings_2.52.0                                 
[34] XVector_0.24.0                                    
[35] SummarizedExperiment_1.14.1                       
[36] DelayedArray_0.10.0                               
[37] BiocParallel_1.18.1                               
[38] matrixStats_0.54.0                                
[39] Homo.sapiens_1.3.1                                
[40] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2           
[41] org.Hs.eg.db_3.8.2                                
[42] GO.db_3.8.2                                       
[43] OrganismDbi_1.26.0                                
[44] GenomicFeatures_1.36.4                            
[45] GenomicRanges_1.36.0                              
[46] GenomeInfoDb_1.20.0                               
[47] AnnotationDbi_1.46.1                              
[48] IRanges_2.18.1                                    
[49] S4Vectors_0.22.0                                  
[50] GEOquery_2.52.0                                   
[51] data.table_1.12.2                                 
[52] Biobase_2.44.0                                    
[53] BiocGenerics_0.30.0                               

loaded via a namespace (and not attached):
  [1] tidyselect_0.2.5              RSQLite_2.1.2                
  [3] devtools_2.1.0                munsell_0.5.0                
  [5] codetools_0.2-16              preprocessCore_1.46.0        
  [7] withr_2.1.2                   colorspace_1.4-1             
  [9] knitr_1.24                    rstudioapi_0.10              
 [11] labeling_0.3                  GenomeInfoDbData_1.2.1       
 [13] bit64_0.9-7                   rhdf5_2.28.0                 
 [15] rprojroot_1.3-2               vctrs_0.2.0                  
 [17] xfun_0.9                      BiocFileCache_1.8.0          
 [19] R6_2.4.0                      illuminaio_0.26.0            
 [21] AnnotationFilter_1.8.0        bitops_1.0-6                 
 [23] reshape_0.8.8                 assertthat_0.2.1             
 [25] promises_1.0.1                scales_1.0.0                 
 [27] gtable_0.3.0                  processx_3.4.1               
 [29] ensembldb_2.8.0               rlang_0.4.0                  
 [31] zeallot_0.1.0                 genefilter_1.66.0            
 [33] splines_3.6.1                 lazyeval_0.2.2               
 [35] gargle_0.3.1                  BiocManager_1.30.4           
 [37] yaml_2.2.0                    snpStats_1.34.0              
 [39] backports_1.1.4               httpuv_1.5.1                 
 [41] RBGL_1.60.0                   usethis_1.5.1                
 [43] nor1mix_1.3-0                 ggplot2_3.2.1                
 [45] RColorBrewer_1.1-2            siggenes_1.58.0              
 [47] sessioninfo_1.1.1             Rcpp_1.0.1                   
 [49] plyr_1.8.4                    progress_1.2.2               
 [51] zlibbioc_1.30.0               purrr_0.3.2                  
 [53] RCurl_1.95-4.12               ps_1.3.0                     
 [55] prettyunits_1.0.2             openssl_1.4.1                
 [57] fs_1.3.1                      ProtGenerics_1.16.0          
 [59] pkgload_1.0.2                 hms_0.5.1                    
 [61] mime_0.7                      evaluate_0.14                
 [63] xtable_1.8-4                  mclust_5.4.5                 
 [65] gridExtra_2.3                 testthat_2.2.1               
 [67] compiler_3.6.1                biomaRt_2.40.4               
 [69] tibble_2.1.3                  crayon_1.3.4                 
 [71] htmltools_0.3.6               later_0.8.0                  
 [73] tidyr_0.8.3                   ldblock_1.14.2               
 [75] DBI_1.0.0                     ExperimentHub_1.10.0         
 [77] dbplyr_1.4.2                  rappdirs_0.3.1               
 [79] MASS_7.3-51.4                 EnsDb.Hsapiens.v75_2.99.0    
 [81] Matrix_1.2-17                 readr_1.3.1                  
 [83] cli_1.1.0                     quadprog_1.5-7               
 [85] pkgconfig_2.0.2               registry_0.5-1               
 [87] xml2_1.2.2                    rngtools_1.4                 
 [89] pkgmaker_0.27                 multtest_2.40.0              
 [91] beanplot_1.2                  bibtex_0.4.2                 
 [93] doRNG_1.7.1                   scrime_1.3.5                 
 [95] stringr_1.4.0                 callr_3.3.1                  
 [97] digest_0.6.20                 graph_1.62.0                 
 [99] rmarkdown_1.15                base64_2.0                   
[101] DelayedMatrixStats_1.6.0      curl_4.0                     
[103] shiny_1.3.2                   nlme_3.1-141                 
[105] jsonlite_1.6                  Rhdf5lib_1.6.0               
[107] desc_1.2.0                    askpass_1.1                  
[109] limma_3.40.6                  BSgenome_1.52.0              
[111] pillar_1.4.2                  lattice_0.20-38              
[113] httr_1.4.1                    pkgbuild_1.0.4               
[115] survival_2.44-1.1             interactiveDisplayBase_1.22.0
[117] glue_1.3.1                    remotes_2.1.0                
[119] UpSetR_1.4.0                  bit_1.1-14                   
[121] stringi_1.4.3                 HDF5Array_1.12.2             
[123] blob_1.2.0                    AnnotationHub_2.16.0         
[125] memoise_1.1.0 
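
A possible workaround for the `lapply()` error at the top of this report is to iterate over positions and subset with `[`, which the error message does not list as unsupported. This is a sketch only: the toy ranges stand in for `ovrngs`, and the block does nothing when GenomicRanges is not installed.

```r
n_pieces <- NA_integer_  # stays NA when Bioconductor is not installed

if (requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {
  # Toy ranges standing in for `ovrngs`
  gr <- GenomicRanges::GRanges(
    seqnames = "chr1",
    ranges = IRanges::IRanges(start = c(1, 50, 200), width = 10)
  )
  # Iterate over indices rather than over the GRanges object itself
  pieces <- lapply(seq_len(length(gr)), function(i) gr[i])
  n_pieces <- length(pieces)
}

# Untested adaptation of the failing line, assuming the objects from the Rmd:
# phset <- lapply(seq_len(length(ovrngs)), function(i)
#   unique(gwrngs19[which(gwrngs19 %over% ovrngs[i])]$Disease.Trait))
```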

Missing RData File for the quiz

Hi! It is impossible to download the RData file for the QQ Plot Exercise. Could you please share it here or share the link? I need it to complete the quiz.
Thank you for your help !

Inquiries about solutions for the book's exercise answer

Hello! Thanks so much for the fantastic materials here. I am wondering if there are solutions for the exercises in the book Data Analysis for the Life Sciences? I think it will be helpful for the readers to verify the answers.

stack1kg error in dataman2017.Rmd

The stack1kg function does not run successfully:

>library(ldblock)
>sta = stack1kg()
Error in validObject(.Object) : 
  invalid class “VcfStack” object: all rownames(object) must be in seqlevels(object)

The content in the textbook section "1000 Genomes VCF in the cloud" depends on the sta object produced by running this function with no arguments.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ldblock_1.14.0                                     HDF5Array_1.12.2                                  
 [3] rhdf5_2.28.0                                       ArrayExpress_1.44.0                               
 [5] magrittr_1.5                                       dplyr_0.8.3                                       
 [7] bigrquery_1.2.0                                    VariantTools_1.26.0                               
 [9] VariantAnnotation_1.30.1                           RaggedExperiment_1.8.0                            
[11] MultiAssayExperiment_1.10.4                        GenomicAlignments_1.20.1                          
[13] BiocStyle_2.12.0                                   IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[15] IlluminaHumanMethylation450kmanifest_0.4.0         minfi_1.30.0                                      
[17] bumphunter_1.26.0                                  locfit_1.5-9.1                                    
[19] iterators_1.0.12                                   foreach_1.4.7                                     
[21] GSE5859Subset_1.0                                  airway_1.4.0                                      
[23] ph525x_0.0.48                                      png_0.1-7                                         
[25] RNAseqData.HNRNPC.bam.chr14_0.22.0                 erma_1.0.0                                        
[27] GenomicFiles_1.20.0                                rtracklayer_1.44.2                                
[29] Rsamtools_2.0.0                                    Biostrings_2.52.0                                 
[31] XVector_0.24.0                                     SummarizedExperiment_1.14.1                       
[33] DelayedArray_0.10.0                                BiocParallel_1.18.0                               
[35] matrixStats_0.54.0                                 Homo.sapiens_1.3.1                                
[37] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2            org.Hs.eg.db_3.8.2                                
[39] GO.db_3.8.2                                        OrganismDbi_1.26.0                                
[41] GenomicFeatures_1.36.4                             GenomicRanges_1.36.0                              
[43] GenomeInfoDb_1.20.0                                GEOquery_2.52.0                                   
[45] data.table_1.12.2                                  knitr_1.24                                        
[47] geneplotter_1.62.0                                 annotate_1.62.0                                   
[49] XML_3.98-1.20                                      AnnotationDbi_1.46.0                              
[51] IRanges_2.18.1                                     S4Vectors_0.22.0                                  
[53] lattice_0.20-38                                    Biobase_2.44.0                                    
[55] BiocGenerics_0.30.0                               

loaded via a namespace (and not attached):
  [1] snow_0.4-3               backports_1.1.4          plyr_1.8.4               lazyeval_0.2.2          
  [5] oligo_1.48.0             splines_3.6.1            ggplot2_3.2.1            digest_0.6.20           
  [9] htmltools_0.3.6          memoise_1.1.0            BSgenome_1.52.0          limma_3.40.6            
 [13] readr_1.3.1              askpass_1.1              siggenes_1.58.0          prettyunits_1.0.2       
 [17] colorspace_1.4-1         blob_1.2.0               xfun_0.8                 jsonlite_1.6            
 [21] crayon_1.3.4             RCurl_1.95-4.12          graph_1.62.0             genefilter_1.66.0       
 [25] zeallot_0.1.0            survival_2.44-1.1        glue_1.3.1               registry_0.5-1          
 [29] gtable_0.3.0             zlibbioc_1.30.0          Rhdf5lib_1.6.0           scales_1.0.0            
 [33] DBI_1.0.0                rngtools_1.4             bibtex_0.4.2             Rcpp_1.0.1              
 [37] xtable_1.8-4             progress_1.2.2           bit_1.1-14               mclust_5.4.5            
 [41] preprocessCore_1.46.0    httr_1.4.1               RColorBrewer_1.1-2       ff_2.2-14               
 [45] pkgconfig_2.0.2          reshape_0.8.8            labeling_0.3             reshape2_1.4.3          
 [49] tidyselect_0.2.5         rlang_0.4.0              later_0.8.0              munsell_0.5.0           
 [53] tools_3.6.1              RSQLite_2.1.2            evaluate_0.14            stringr_1.4.0           
 [57] yaml_2.2.0               bit64_0.9-7              oligoClasses_1.46.0      beanplot_1.2            
 [61] scrime_1.3.5             purrr_0.3.2              RBGL_1.60.0              nlme_3.1-141            
 [65] doRNG_1.7.1              mime_0.7                 nor1mix_1.3-0            xml2_1.2.2              
 [69] biomaRt_2.40.3           compiler_3.6.1           rstudioapi_0.10          curl_4.0                
 [73] affyio_1.54.0            tibble_2.1.3             stringi_1.4.3            Matrix_1.2-17           
 [77] multtest_2.40.0          vctrs_0.2.0              pillar_1.4.2             BiocManager_1.30.4      
 [81] snpStats_1.34.0          bitops_1.0-6             httpuv_1.5.1             R6_2.4.0                
 [85] promises_1.0.1           affxparser_1.56.0        codetools_0.2-16         MASS_7.3-51.4           
 [89] assertthat_0.2.1         openssl_1.4.1            pkgmaker_0.27            withr_2.1.2             
 [93] GenomeInfoDbData_1.2.1   hms_0.5.0                quadprog_1.5-7           tidyr_0.8.3             
 [97] base64_2.0               rmarkdown_1.14           DelayedMatrixStats_1.6.0 illuminaio_0.26.0       
[101] shiny_1.3.2             

error in Exercise 13, "Inference for High Dimensional Data"

Two small errors:

Create a Monte Carlo Simulation in which you simulate measurements from 8,793 genes for 24 samples, 12 cases and 12 controls. The for 100 genes create a difference of 1 between cases and

Change "The" to "Then"

n <- 24
m <- 8793
mat <- matrix(rnorm(n*m),m,n)
delta <- 1
positives <- 500   ###SHOULD BE 100
mat[1:positives,1:(n/2)] <- mat[1:positives,1:(n/2)]+delta

Either positives should be 100, or the number of genes stated in the text above should be 500.
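
For reference, here is the first of the two fixes (keeping the text at 100 genes and changing the code), with a quick check that the shift landed where intended. The seed and the check are additions for illustration, and treating the first 12 columns as the cases is an assumption carried over from the indexing in the snippet.

```r
set.seed(1)          # arbitrary seed, only to make this sketch reproducible
n <- 24              # samples
m <- 8793            # genes
mat <- matrix(rnorm(n * m), m, n)
delta <- 1
positives <- 100     # matches the 100 genes named in the exercise text
cases <- 1:(n / 2)   # first 12 columns, assumed to be the cases
mat[1:positives, cases] <- mat[1:positives, cases] + delta

# Mean difference between the two groups, per gene
diffs <- rowMeans(mat[, cases]) - rowMeans(mat[, -cases])
mean(diffs[1:positives])     # should be close to delta
mean(diffs[-(1:positives)])  # should be close to 0
```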

New run of the courses?

Hi. I am not sure whether this is a good place to ask, but I am wondering whether there will be a new run of the PH525 courses next year. I finished 3 courses in the Data Analysis for Genomics Certificate this year and I am interested in taking the other 4 courses in the coming year if possible. Thanks.

different result of t.test in type II error simulation

Hi, Prof. Rafa!
I'm using R version 3.6.3 and working through a false negative (type II error) demonstration based on the edX PH525x course.
I'm using exactly the same code as the lecture video and the book. Here it is:
library(dplyr)  # provides filter(), select() and %>%
dat <- read.csv("mice_pheno.csv")

controlPopulation <- filter(dat,Sex == "F" & Diet == "chow") %>%
select(Bodyweight) %>% unlist

hfPopulation <- filter(dat,Sex == "F" & Diet == "hf") %>%
select(Bodyweight) %>% unlist

mu_hf <- mean(hfPopulation)
mu_control <- mean(controlPopulation)

mu_hf - mu_control
[1] 2.375517
(mu_hf - mu_control)/mu_control * 100 # percent increase
[1] 9.942157
So far the result is the same as in the video.
After that:
set.seed(1)
N <- 5
hf <- sample(hfPopulation,N)
control <- sample(controlPopulation,N)
t.test(hf,control)$p.value
The result is supposed to be 0.1410204, but my result is 0.5806661. I retried several times and with several ways of generating the values, but the result did not change.

Since this material was last edited 4 years ago, I think there may be an algorithmic difference in the 'set.seed()' function.

I would be glad for your help.
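
A likely explanation is that R 3.6.0 changed the default algorithm behind `sample()` from "Rounding" to "Rejection", so the same seed gives different draws than it did when the material was written. The sketch below uses a stand-in vector rather than the mouse data and compares the two samplers. The `sample.kind` argument only exists in R 3.6.0 and later.

```r
x <- 1:100  # stand-in population; the real data are not needed for this check

# Sampler used by default before R 3.6.0 (R warns that it is non-uniform)
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
old_draw <- sample(x, 5)

# Sampler used by default from R 3.6.0 onward
set.seed(1, sample.kind = "Rejection")
new_draw <- sample(x, 5)

print(old_draw)
print(new_draw)
identical(old_draw, new_draw)
```

If this is the cause, replacing `set.seed(1)` with `set.seed(1, sample.kind = "Rounding")` in the posted code should reproduce the older draws, and with them the p-value from the video.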

Can't make progress in the swirl package due to an error

When I encounter this error I cannot make progress. I also cannot go back to the menu to select another topic or skip the question. Can somebody help me?

Error in TRUE && c(TRUE, FALSE, FALSE) :
'length = 3' in coercion to 'logical(1)'
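
This message appears when `&&` is given an operand of length greater than one, which became an error in R 4.3.0. Earlier versions used only the first element. The swirl lesson code that triggers it is not shown here, so the sketch below only illustrates the behaviour and the usual length-one rewrites. The variable names are made up.

```r
flags <- c(TRUE, FALSE, FALSE)

# On R >= 4.3.0 this is an error; older versions used only flags[1]
attempt <- tryCatch(TRUE && flags, error = function(e) conditionMessage(e))

# Explicit reductions give a length-one logical on any R version
all_true   <- TRUE && all(flags)  # FALSE: not every element is TRUE
any_true   <- TRUE && any(flags)  # TRUE: at least one element is TRUE
first_only <- TRUE && flags[1]    # TRUE: mimics the pre-4.3.0 behaviour

print(attempt)
```

Which of the three rewrites is appropriate depends on what the original condition was meant to test.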

Incomplete sentence in Factor Analysis chapter

In the first paragraph:

Before we introduce the next type of statistical method for batch effect correction, we introduce the statistical idea that motivates the main idea: Factor Analysis. Factor Analysis was first developed over a century ago. Karl Pearson noted that correlation between different subjects when the correlation was computed across students. To explain this, he posed a model having one factor that was common across subjects for each student that explained this correlation:

The incomplete sentence is: "Karl Pearson noted that correlation between different subjects when the correlation was computed across students."

I don't know what this sentence is supposed to say, so I will not attempt to fix it.

Incidentally, there is also a typo in the following equation:

Y_ij = \alpha_i W_1 + \varepsilon_{ij}

Here Y_ij should be Y_{ij}.
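
With the subscript braced as suggested, the equation would read as follows. Only the left-hand side changes; the other symbols are copied from the quoted line.

```latex
Y_{ij} = \alpha_i W_1 + \varepsilon_{ij}
```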

How can I get the answers to the exercises in the PH525x series?

Sorry to trouble you. I'm a beginner in bioinformatics and have recently been reading your book "PH525x series - Biomedical Data Science". I followed the chapters and did the exercises, but I can't find the answers, so I came here for help. Could you give me a link to the answers? Thank you.

getGEO commands in multiple Rmds give HTTP 404 error

I'm from HarvardX and have been assigned to test and update these courses for rerelease. I'm having trouble running several of the Rmd files and the associated code in the videos because of getGEO issues. Downloading files gives HTTP 404 errors:

For example, this code from "biocintro_5x/dataman2017.Rmd" gives such an error:

library(GEOquery)
glioMA <- getGEO("GSE78703")[[1]]
Error in open.connection(x, "rb") : HTTP error 404.

Here's my session info if needed:

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] AnnotationDbi_1.46.0 IRanges_2.18.1       S4Vectors_0.22.0     GEOquery_2.52.0      data.table_1.12.2   
[6] Biobase_2.44.0       BiocGenerics_0.30.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1         pillar_1.4.2       compiler_3.6.1     BiocManager_1.30.4 bitops_1.0-6      
 [6] tools_3.6.1        digest_0.6.20      zeallot_0.1.0      bit_1.1-14         memoise_1.1.0     
[11] RSQLite_2.1.2      tibble_2.1.3       pkgconfig_2.0.2    rlang_0.4.0        DBI_1.0.0         
[16] rstudioapi_0.10    yaml_2.2.0         curl_4.0           xfun_0.8           dplyr_0.8.3       
[21] knitr_1.23         xml2_1.2.1         vctrs_0.2.0        hms_0.5.0          bit64_0.9-7       
[26] tidyselect_0.2.5   glue_1.3.1         R6_2.4.0           limma_3.40.6       tidyr_0.8.3       
[31] readr_1.3.1        purrr_0.3.2        blob_1.2.0         magrittr_1.5       backports_1.1.4   
[36] assertthat_0.2.1   RCurl_1.95-4.12    crayon_1.3.4

getFirehoseData fails in tcga.Rmd: cannot open the connection

The TCGA firehose data download on tcga.Rmd line 49 throws an error stating the connection cannot be opened:

> library(ph525x)
> firehose()
> library(RTCGAToolbox)
> readData = getFirehoseData (dataset="READ", runDate="20150402",forceDownload = TRUE,
+     Clinic=TRUE, Mutation=TRUE, Methylation=TRUE, RNASeq2GeneNorm=TRUE)
gdac.broadinstitute.org_READ.Clinical_Pick_Tier1.Level_4.2015040200.0.0.tar.gz
trying URL 'http://gdac.broadinstitute.org/runs/stddata__2015_04_02/data/READ/20150402/gdac.broadinstitute.org_READ.Clinical_Pick_Tier1.Level_4.2015040200.0.0.tar.gz'
Content type 'application/x-gzip' length 4754 bytes
downloaded 4754 bytes

gdac.broadinstitute.org_READ.Clinical_Pick_Tier1.Level_4.2015040200.0.0
gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0.tar.gzgdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0.tar.gz
trying URL 'http://gdac.broadinstitute.org/runs/stddata__2015_04_02/data/READ/20150402/gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0.tar.gz'
Content type 'application/x-gzip' length 5917492 bytes (5.6 MB)
downloaded 5.6 MB

gdac.broadinstitute.org_READ.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2015040200.0.0
cannot open file './20150402-READ-RNAseq2GeneNorm.txt': No such file or directory
Error in file(file, "rt") : cannot open the connection

Much of the following code in the section and the related course videos depends on the output of this command.

In addition, the following code block on line 53 appears to read a local path on your machine.

Here is the sessionInfo:

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      tools     parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RTCGAToolbox_2.14.0                     ph525x_0.0.48                           png_0.1-7                              
 [4] yeastCC_1.24.0                          harbChIP_1.22.0                         Biostrings_2.52.0                      
 [7] XVector_0.24.0                          ERBS_1.0                                gwascat_2.16.0                         
[10] Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 org.Hs.eg.db_3.8.2                     
[13] GO.db_3.8.2                             OrganismDbi_1.26.0                      GenomicFeatures_1.36.4                 
[16] GenomicRanges_1.36.0                    GenomeInfoDb_1.20.0                     ggbio_1.32.0                           
[19] ggplot2_3.2.1                           AnnotationDbi_1.46.1                    IRanges_2.18.1                         
[22] S4Vectors_0.22.0                        Biobase_2.44.0                          BiocGenerics_0.30.0                    

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.16.0         bitops_1.0-6                matrixStats_0.54.0          bit64_0.9-7                
 [5] RColorBrewer_1.1-2          progress_1.2.2              httr_1.4.1                  backports_1.1.4            
 [9] R6_2.4.0                    rpart_4.1-15                Hmisc_4.2-0                 DBI_1.0.0                  
[13] lazyeval_0.2.2              colorspace_1.4-1            nnet_7.3-12                 withr_2.1.2                
[17] tidyselect_0.2.5            gridExtra_2.3               prettyunits_1.0.2           GGally_1.4.0               
[21] bit_1.1-14                  curl_4.0                    compiler_3.6.1              graph_1.62.0               
[25] htmlTable_1.13.1            DelayedArray_0.10.0         rtracklayer_1.44.2          scales_1.0.0               
[29] checkmate_1.9.4             RBGL_1.60.0                 RCircos_1.2.1               stringr_1.4.0              
[33] digest_0.6.20               Rsamtools_2.0.0             foreign_0.8-71              base64enc_0.1-3            
[37] dichromat_2.0-0             pkgconfig_2.0.2             htmltools_0.3.6             limma_3.40.6               
[41] ensembldb_2.8.0             BSgenome_1.52.0             htmlwidgets_1.3             rlang_0.4.0                
[45] rstudioapi_0.10             RSQLite_2.1.2               BiocParallel_1.18.1         acepack_1.4.1              
[49] dplyr_0.8.3                 VariantAnnotation_1.30.1    RCurl_1.95-4.12             magrittr_1.5               
[53] GenomeInfoDbData_1.2.1      Formula_1.2-3               Matrix_1.2-17               Rcpp_1.0.1                 
[57] munsell_0.5.0               stringi_1.4.3               RaggedExperiment_1.8.0      RJSONIO_1.3-1.2            
[61] SummarizedExperiment_1.14.1 zlibbioc_1.30.0             plyr_1.8.4                  blob_1.2.0                 
[65] crayon_1.3.4                lattice_0.20-38             splines_3.6.1               hms_0.5.1                  
[69] zeallot_0.1.0               knitr_1.24                  pillar_1.4.2                reshape2_1.4.3             
[73] biomaRt_2.40.4              XML_3.98-1.20               glue_1.3.1                  biovizBase_1.32.0          
[77] latticeExtra_0.6-28         BiocManager_1.30.4          data.table_1.12.2           vctrs_0.2.0                
[81] gtable_0.3.0                purrr_0.3.2                 reshape_0.8.8               assertthat_0.2.1           
[85] xfun_0.9                    AnnotationFilter_1.8.0      survival_2.44-1.1           tibble_2.1.3               
[89] GenomicAlignments_1.20.1    memoise_1.1.0               cluster_2.1.0 

Thanks!
