uclouvain-cbio / depmap

Cancer Dependency Map package

Home Page: https://uclouvain-cbio.github.io/depmap/

R 100.00%

bioconductor bioconductor-package depmap

depmap's Issues

depmap licence

I saw that you chose GPL-2 | GPL-3 as the licence. Any reason why both?
Is there a specific license for the data? Does the depmap data portal say anything?

Installation Failure

I installed the package about two days ago with no major issues. Today, however, a colleague I was helping hit the error below, and I then got the same error myself when repeating the install:

Error: Failed to install 'depmap' from GitHub:
(converted from warning) installation of package ‘C:/Users/username/AppData/Local/Temp/Rtmp4uHAOP/file2b249ff41c2/depmap_0.99.5.tar.gz’ had non-zero exit status

Ensembl_id

It would be great to include a column of Ensembl gene identifiers in the rnai and crispr datasets, so that they can be merged in a straightforward manner with the tpm dataset (where genes are identified by Ensembl IDs).
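Until such a column exists, one possible workaround is to go through a gene-identifier lookup table. The sketch below uses toy data frames in base R; the column names (`gene_name`, `ensembl_id`, `dependency`, `expression`), the numeric values, and the `gene_map` table are assumptions for illustration only. A real lookup would come from an annotation resource.

```r
# Toy stand-ins for the crispr and tpm tables; column names are assumptions.
crispr <- data.frame(
  depmap_id  = c("ACH-000001", "ACH-000001"),
  gene_name  = c("TP53", "KRAS"),
  dependency = c(-0.2, -1.1),
  stringsAsFactors = FALSE
)
tpm <- data.frame(
  depmap_id  = c("ACH-000001", "ACH-000001"),
  ensembl_id = c("ENSG00000141510", "ENSG00000133703"),
  expression = c(5.3, 4.8),
  stringsAsFactors = FALSE
)

# Hypothetical symbol-to-Ensembl lookup table.
gene_map <- data.frame(
  gene_name  = c("TP53", "KRAS"),
  ensembl_id = c("ENSG00000141510", "ENSG00000133703"),
  stringsAsFactors = FALSE
)

# Step 1: attach Ensembl IDs to the crispr rows.
crispr_ens <- merge(crispr, gene_map, by = "gene_name")

# Step 2: join with tpm on cell line and Ensembl ID.
combined <- merge(crispr_ens, tpm, by = c("depmap_id", "ensembl_id"))
```

With an `ensembl_id` column shipped in the datasets, step 1 would not be needed.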

Update depmap for new dplyr release

This is an automated email to let you know that:

A new version of dplyr is ready to go to CRAN. dplyr is currently at version 0.8.99.9002 and will become 1.0.0 upon release.
depmap uses dplyr and has problems with the new version.
We plan to submit dplyr to CRAN on May 1.

This is a major release. See https://www.tidyverse.org/blog/2020/03/dplyr-1-0-0-is-coming-soon/ for a detailed article about what's changed.

I need your help to keep depmap and dplyr working together smoothly. In the next weeks, can you please:

Read about the changes to dplyr at https://github.com/tidyverse/dplyr/blob/master/NEWS.md. This page includes a list of breaking changes, the reasoning behind them, and how to update your code.
Carefully inspect the failing checks listed at the bottom of this email.
For each failing check, either update your package, or tell me that I have a bug. If you have made changes to your package, please submit an update to CRAN before May 1.

If you have discovered a bug in dplyr, please file an issue (ideally with a small reprex that illustrates the problem) at https://github.com/tidyverse/dplyr/issues. If you're not sure whether or not you've found a bug, please file an issue at https://github.com/tidyverse/dplyr/issues for discussion. Breaking changes that are not listed qualify as bugs.

Please respond to this message if you have any questions.

Add a tissue column for cell lines

Suggestion from @aloriot

serve data as MultiAssayExperiment?

Hi,

Thank you for this very useful package. As many of the CCLE lines have now been screened in Achilles experiments, I'm wondering whether any thought has been given to harmonizing and serving this data as a MultiAssayExperiment. This would enable the user to subset the MAE by a set of cell lines of interest, and easily retrieve all omics data pertaining to that subset of lines.

If you think this is a good idea, I am also happy to contribute an initial PR if more hands would be appreciated.

Best,
Allison

Install Error

Hello,

I have an error trying to install depmap:

Error: package or namespace load failed for 'depmap':
 .onLoad failed in loadNamespace() for 'depmap', details:
  call: h(simpleError(msg, call))
  error: error in evaluating the argument 'x' in selecting a method for function 'query': DEFUNCT: As of ExperimentHub (>1.17.2), default caching location has changed.
  Problematic cache: C:\Users\ADMINI~1\AppData\Local/ExperimentHub/ExperimentHub/Cache

Error: loading failed
Execution halted
ERROR: loading failed
* removing 'F:/software/R-4.2.2/library/depmap'

Thanks
Best
Mirrersan

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] ps_1.7.2            prettyunits_1.1.1   crayon_1.5.2        withr_2.5.0         rprojroot_2.0.3    
 [6] R6_2.5.1            cli_3.6.0           curl_5.0.0          rstudioapi_0.14     remotes_2.4.2      
[11] callr_3.7.3         desc_1.4.2          tools_4.2.2         compiler_4.2.2      processx_3.8.0     
[16] pkgbuild_1.4.0      BiocManager_1.30.20

Enhance drug sensitivity data with compound metadata

@tfkillian, @lgatto, the depmap_drug_sensitivity() data is very useful, but it currently omits the compound metadata: the compound column holds only an internal Broad identifier.

> dr <- depmap::depmap_drug_sensitivity()
snapshotDate(): 2021-02-07
see ?depmap and browseVignettes('depmap') for documentation
loading from cache
> dr
# A tibble: 2,708,508 x 4
   depmap_id  cell_line             compound                         dependency
   <chr>      <chr>                 <chr>                                 <dbl>
 1 ACH-000001 NIHOVCAR3_OVARY       BRD-A00077618-236-07-6::2.5::HTS    -0.0156
 2 ACH-000007 LS513_LARGE_INTESTINE BRD-A00077618-236-07-6::2.5::HTS    -0.0957
 3 ACH-000008 A101D_SKIN            BRD-A00077618-236-07-6::2.5::HTS     0.379
 4 ACH-000010 NCIH2077_LUNG         BRD-A00077618-236-07-6::2.5::HTS     0.119
 5 ACH-000011 253J_URINARY_TRACT    BRD-A00077618-236-07-6::2.5::HTS     0.145
 6 ACH-000012 HCC827_LUNG           BRD-A00077618-236-07-6::2.5::HTS     0.103
 7 ACH-000013 ONCODG1_OVARY         BRD-A00077618-236-07-6::2.5::HTS     0.353
 8 ACH-000014 HS294T_SKIN           BRD-A00077618-236-07-6::2.5::HTS     0.128
 9 ACH-000015 NCIH1581_LUNG         BRD-A00077618-236-07-6::2.5::HTS     0.167
10 ACH-000018 T24_URINARY_TRACT     BRD-A00077618-236-07-6::2.5::HTS     0.832

The relevant associated metadata appears to be in file: primary-screen-replicate-collapsed-treatment-info.csv. Is this something you would consider merging and serving?

> md <- read.table("primary-screen-replicate-collapsed-treatment-info.csv", sep=",", quote='\"', header=TRUE, comment.char="")
> head(md)
                                  column_name               broad_id       name
1 BRD-A00055058-001-01-0::2.325889319::MTS004 BRD-A00055058-001-01-0    RS-0481
2         BRD-A00842753-001-01-9::2.5::MTS004 BRD-A00842753-001-01-9 oleuropein
3         BRD-A02232681-001-01-8::2.5::MTS004 BRD-A02232681-001-01-8 isoleucine
4         BRD-A04447196-001-01-8::2.5::MTS004 BRD-A04447196-001-01-8  gepefrine
5  BRD-A04971881-003-01-3::2.65294603::MTS004 BRD-A04971881-003-01-3 cloranolol
6         BRD-A08316590-001-01-3::2.5::MTS004 BRD-A08316590-001-01-3 broxaterol
      dose screen_id                            moa
1 2.325889    MTS004                immunostimulant
2 2.500000    MTS004      estrogen receptor agonist
3 2.500000    MTS004                           <NA>
4 2.500000    MTS004    adrenergic receptor agonist
5 2.652946    MTS004 adrenergic receptor antagonist
6 2.500000    MTS004    adrenergic receptor agonist
                             target disease.area  indication
1                              <NA>         <NA>        <NA>
2                             GPER1         <NA>        <NA>
3 ACADSB, BCAT1, BCAT2, IARS, IARS2         <NA>        <NA>
4                              <NA>   cardiology hypotension
5               ADRB1, ADRB2, ADRB3         <NA>        <NA>
6                             ADRB2         <NA>        <NA>
                                                                 smiles
1                                CC(NC(=O)C1CSCN1C(=O)c1ccccc1)c1ccccc1
2 COC(=O)C1=COC(OC2OC(CO)C(O)C(O)C2O)\\C(=C/C)C1CC(=O)OCCc1ccc(O)c(O)c1
3                                                      CCC(C)C(N)C(O)=O
4                                                     CC(N)Cc1cccc(O)c1
5                                        CC(C)(C)NCC(O)COc1cc(Cl)ccc1Cl
6                                             CC(C)(C)NCC(O)c1cc(Br)no1
     phase
1  Phase 2
2  Phase 2
3 Launched
4 Launched
5 Launched
6  Phase 3
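As a rough illustration of how the two tables could be linked, the `compound` strings above appear to follow a `broad_id::dose::screen_id` pattern. The sketch below splits them in base R on that assumption; the three-field layout is inferred from the printed examples only, not from any documentation.

```r
# Two compound strings copied from the outputs above.
compound <- c("BRD-A00077618-236-07-6::2.5::HTS",
              "BRD-A00055058-001-01-0::2.325889319::MTS004")

# Split on the literal "::" separator (assumed to yield exactly three fields).
parts <- do.call(rbind, strsplit(compound, "::", fixed = TRUE))

compound_info <- data.frame(
  broad_id  = parts[, 1],
  dose      = as.numeric(parts[, 2]),
  screen_id = parts[, 3],
  stringsAsFactors = FALSE
)
```

The resulting `broad_id` column could then serve as a key for merging with the treatment-info table.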

Error installing depmap

Hello,

I have an error trying to install depmap:

Error: package or namespace load failed for ‘depmap’:
.onLoad failed in loadNamespace() for 'depmap', details:
call: FUN(X[[i]], ...)
error: ‘rnai_19Q2’ not found in ExperimentHub
Error: loading failed
Execution halted

Thanks
Best
M

error in evaluating the argument 'x' in selecting a method for function 'get': argument is of length zero

Hi,

Many thanks for this useful package. I have installed it successfully; however, when I try to download and cache datasets using ExperimentHub, I get the following error:

crispr <- eh[["EH2261"]]
see ?depmap and browseVignettes('depmap') for documentation
downloading 1 resources
retrieving 1 resource
  |=================================================================================================================================================================================| 100%

loading from cache
Error: failed to load resource
  name: EH2261
  title: crispr_19Q1
  reason: error in evaluating the argument 'x' in selecting a method for function 'get': argument is of length zero

Could you please advise?

Many thanks,
Dimitris

character-type column for entrez_id

In the rnai and crispr datasets, the entrez_id column should be of type integer instead of character. The current type is a source of bugs when merging these datasets, by Entrez ID, with our own gene datasets.
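On the user side, a conversion before merging could look like the sketch below. The data frame is a toy stand-in, not the real rnai table.

```r
# Toy table with entrez_id stored as character, including a missing value.
rnai <- data.frame(entrez_id = c("7157", "3845", NA),
                   stringsAsFactors = FALSE)

# Convert to integer; NA stays NA, and any non-numeric string would
# become NA with a warning.
rnai$entrez_id <- as.integer(rnai$entrez_id)
```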

Consider unit test

@tfkillian - It might be worth considering adding simple unit tests to test for the small issues in the latest releases, for example checking that the accessor functions return the latest data, that the old data are available, ... assuming these don't take too much time.

21Q3

Hi @lgatto , @tfkillian , is 21_Q3 coming any time soon? :D
FYI this has been a huge help for my organization.

Also FYI although we have experimented with the MAE, our visualization applications currently support SEs, so we concatenated multiple different molecular assays into a single SE assay, where each row corresponds to a single feature of a single molecular assay.

ignore.case=TRUE leads to incorrect 20Q4 copyNumber data

Hi,

I think there may be a bug in the copy number data for the 20Q4 data (Bioc-devel depmap_1.5.1). It looks like one of the ExperimentHub depmap tags is CopyNumberVariationData.

> tbl <- AnnotationHub:::.db_index_load(ExperimentHub::ExperimentHub())
> tbl[3413]
EH3964
"metadata_20Q4\rBroad Institute\rHomo sapiens\r9606\r\rMetadata for cell lines in the 20Q4 DepMap release, for 0 genes, 1812 cell lines, 35 primary diseases and 39 lineages.\r1\rTheo Killian <[email protected]>\r2020-11-25\rdepmap\rc(\"ExperimentHub\", \"ExperimentData\", \"ReproducibleResearch\", \"RepositoryData\", \"AssayDomainData\", \"CopyNumberVariationData\", \"DiseaseModel\", \"CancerData\", \"BreastCancerData\", \"ColonCancerData\", \"KidneyCancerData\", \"LeukemiaCancerData\", \"LungCancerData\", \"OvarianCancerData\", \"ProstateCancerData\", \"OrganismData\", \"Homo_sapiens_Data\", \"PackageTypeData\", \"SpecimenSource\", \"CellCulture\", \"Genome\", \"Proteome\", \"StemCell\", \"Tissue\")\rtibble\rdepmap/metadata_20Q4.rda\rhttps://ndownloader.figshare.com/files/25494443\rCSV"

Thus, when AnnotationHub::query tries to grepl for copyNumber within depmap_data_loading, it picks up all depmap entries because the search is case-insensitive. Then, because the last result happens to be the metadata, when depmap::depmap_copyNumber is called, the user is accidentally returned the metadata instead of the copy number data.

MRE:

name <- "copyNumber"
eh <- ExperimentHub::ExperimentHub()
eh1 <- AnnotationHub::query(eh, c("depmap", name), ignore.case=TRUE) # Default ignore.case=TRUE; 48 records
eh2 <- AnnotationHub::query(eh, c("depmap", name), ignore.case=FALSE) # 8 records
depmap::depmap_copyNumber()
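The matching behaviour can be reproduced without the hub, using plain `grepl()` on strings that mimic a record's title plus tags. The record strings below are simplified stand-ins and the `copyNumber_20Q4` title is assumed by analogy with the other titles.

```r
# Every depmap record carries the same tag set, including this one.
tags <- c("CopyNumberVariationData", "CancerData")

# Simplified searchable text for two records: title followed by tags.
rec_cn   <- paste(c("copyNumber_20Q4", tags), collapse = " ")
rec_meta <- paste(c("metadata_20Q4", tags), collapse = " ")
records  <- c(copyNumber = rec_cn, metadata = rec_meta)

# Case-insensitive: "copyNumber" also matches the tag "CopyNumberVariationData",
# so the metadata record is picked up too.
insensitive <- grepl("copyNumber", records, ignore.case = TRUE)

# Case-sensitive: only the record whose title contains "copyNumber" matches.
sensitive <- grepl("copyNumber", records, ignore.case = FALSE)
```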

Best,
Allison

Depmap Linux download without GUI

Hi Sir/Madam,

Is there any way to download DepMap data from a Linux terminal? I want to download the data to a Linux server that has no GUI, so I cannot click the download button.

https://ndownloader.figshare.com/files/27902091

Thanks.

Shicheng
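One option that needs no GUI is base R's `download.file()`, which can be run from an `Rscript` session on the server. The sketch below wraps it in a small helper. `fetch_file` and the destination file name are hypothetical, and whether the figshare link can be fetched without authentication is an assumption.

```r
# Hypothetical helper around base R's download.file().
fetch_file <- function(url, destfile) {
  # mode = "wb" writes the file in binary mode so it is not altered.
  status <- download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  invisible(destfile)
}

# Example call (not run here); the destination name is an arbitrary choice:
# fetch_file("https://ndownloader.figshare.com/files/27902091", "depmap_download")
```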

Accessing latest 21Q3 data from Depmap package

I've installed depmap version 1.7.1, but I can only view data releases up to 21Q1.

BiocManager::install(version='devel')
BiocManager::install("depmap")

Is there a step I'm missing to access the latest datasets?

library("depmap")
library("ExperimentHub")
eh <- ExperimentHub()  # create the hub object before querying it
query(eh, "depmap")
depmap_crispr()

Thank you for your work on this package; it's fantastically useful.

Bioconductor submission

Submitted to Bioconductor.

Consider adding "essential" gene designations data

The CRISPR_common_essentials.csv list is very helpful for CRISPR dependency screens to remove pan-lethal hits. Inclusion of this list in future releases would be very welcome and a relatively simple addition.
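For context, the filtering this list enables is roughly the following. The gene names, scores, and list membership below are toy values, and the actual layout of CRISPR_common_essentials.csv is not shown here.

```r
# Hypothetical common-essential genes (toy membership).
common_essentials <- c("RPL3", "POLR2A")

# Toy screen hits.
hits <- data.frame(
  gene_name  = c("RPL3", "KRAS", "POLR2A", "BRAF"),
  dependency = c(-1.8, -1.2, -2.1, -0.9),
  stringsAsFactors = FALSE
)

# Keep only hits that are not pan-lethal.
selective_hits <- hits[!hits$gene_name %in% common_essentials, ]
```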

installation and function

BiocManager::install("uclouvain-cbio/depmap") failed.

remotes::install_github("UCLouvain-CBIO/depmap", ref = "4a9f52ed6cf9c3821891ebdd9db317194d6518c9") installs the 19Q1 release. How can I get 19Q3?

Also, the depmap_copyNumber() function is not available.

Thanks.

depmap_mutationCalls() returning metadata table

I am using the Bioconductor depmap package and I find it very useful and easy to use.

It seems like the depmap_mutationCalls() function is returning the metadata table.
mutationCalls_19Q3() returns the correct mutation data.
Could you take a look at it?

definition of "co-dependent genes"

I am wondering whether expression and CNV are included in the estimation of "co-dependent genes".

The DepMap portal website (https://depmap.org/portal) provides a range of information for each gene, cell lines and lineage dependent on the gene, and co-dependent genes (i.e., other genes whose dependency scores are highly correlated with the gene) as well as basal transcriptome, copy numbers, and mutations of the gene.

Thanks.
Shicheng

Read about ExperimentHub

The ExperimentHub package provides a client for the Bioconductor ExperimentHub web resource. ExperimentHub provides a central location where curated data from experiments, publications or training courses can be accessed. Each resource has associated metadata, tags and date of modification. The client creates and manages a local cache of files retrieved, enabling quick and reproducible access.

accessor functions not exported

@lgatto I noticed an error: all of the accessor functions except one are missing from the NAMESPACE exports.

2022 Version Problem

Hello-
I used BiocManager to install the package, but the versions installed are depmap 1.6.0 and ExperimentHub 2.0.0. Any help on how to install the 2022 dataset?

Thanks!

Loading data using named functions

I can't remember if there are named functions to load data, something like

rnai <- depmap_rnai()

that would automatically load the latest RNAi data from ExperimentHub.

Installation instructions

The installation instructions in the vignette need to describe how to install from Bioconductor, and not from GitHub. There can of course be a mention of the GitHub repo.

Can't get data from ExperimentHub

While the depmap data is on ExperimentHub, I can't seem to actually download it.

> library(ExperimentHub)
> eh <- ExperimentHub()
snapshotDate(): 2019-04-16
> query(eh, "depmap")
ExperimentHub with 7 records
# snapshotDate(): 2019-04-16 
# $dataprovider: Broad Institute
# $species: Homo sapiens
# $rdataclass: tibble
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["EH2260"]]' 

           title             
  EH2260 | rnai_19Q1         
  EH2261 | crispr_19Q1       
  EH2262 | copyNumber_19Q1   
  EH2263 | RPPA_19Q1         
  EH2264 | TPM_19Q1          
  EH2265 | mutationCalls_19Q1
  EH2266 | metadata_19Q1     
> eh["EH2260"]
ExperimentHub with 1 record
# snapshotDate(): 2019-04-16 
# names(): EH2260
# package(): depmap
# $dataprovider: Broad Institute
# $species: Homo sapiens
# $rdataclass: tibble
# $rdatadateadded: 2019-04-15
# $title: rnai_19Q1
# $description: (DEMETER2) Batch and off-target corrected RNAi gene knockdow...
# $taxonomyid: 9606
# $genome: 
# $sourcetype: CSV
# $sourceurl: https://ndownloader.figshare.com/files/13515395
# $sourcesize: NA
# $tags: c("ExperimentHub", "ExperimentData", "ReproducibleResearch",
#   "RepositoryData", "AssayDomainData", "CopyNumberVariationData",
#   "DiseaseModel", "CancerData", "BreastCancerData", "ColonCancerData",
#   "KidneyCancerData", "LeukemiaCancerData", "LungCancerData",
#   "OvarianCancerData", "ProstateCancerData", "OrganismData",
#   "Homo_sapiens_Data", "PackageTypeData", "SpecimenSource",
#   "CellCulture", "Genome", "Proteome", "StemCell", "Tissue") 
# retrieve record with 'object[["EH2260"]]' 
> eh[["EH2260"]]
snapshotDate(): 2019-04-16
see ?depmap and browseVignettes('depmap') for documentation
downloading 1 resources
retrieving 1 resource
Downloading: 240 B     
Error: failed to load resource
  name: EH2260
  title: rnai_19Q1
  reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
  web resource path: ‘https://experimenthub.bioconductor.org/fetch/2260’
  local file path: ‘/home/lgatto/.cache/ExperimentHub/66b14bbce11c_2260’
  reason: Forbidden (HTTP 403). 
2: bfcadd() failed; resource removed
  rid: BFC30
  fpath: ‘https://experimenthub.bioconductor.org/fetch/2260’
  reason: download failed 
3: download failed
  hub path: ‘https://experimenthub.bioconductor.org/fetch/2260’
  cache resource: ‘EH2260 : 2260’
  reason: bfcadd() failed; see warnings()

@lshep - is this an issue with the depmap package that is still missing something?

This is with

> sessionInfo()
R version 3.6.0 beta (2019-04-15 r76395)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Manjaro Linux

Matrix products: default
BLAS:   /usr/lib/libblas.so.3.8.0
LAPACK: /usr/lib/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] depmap_0.2.0          dplyr_0.8.0.1         ExperimentHub_1.9.3  
[4] AnnotationHub_2.15.12 BiocFileCache_1.7.9   dbplyr_1.3.0         
[7] BiocGenerics_0.29.2  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1                    later_0.8.0                  
 [3] compiler_3.6.0                pillar_1.3.1                 
 [5] BiocManager_1.30.4            prettyunits_1.0.2            
 [7] remotes_2.0.4                 tools_3.6.0                  
 [9] digest_0.6.18                 pkgbuild_1.0.3               
[11] bit_1.1-14                    RSQLite_2.1.1                
[13] memoise_1.1.0                 tibble_2.1.1                 
[15] pkgconfig_2.0.2               rlang_0.3.4                  
[17] shiny_1.3.1                   DBI_1.0.0                    
[19] cli_1.1.0                     yaml_2.2.0                   
[21] curl_3.3                      withr_2.1.2                  
[23] httr_1.4.0                    IRanges_2.17.5               
[25] S4Vectors_0.21.23             rappdirs_0.3.1               
[27] stats4_3.6.0                  rprojroot_1.3-2              
[29] bit64_0.9-7                   tidyselect_0.2.5             
[31] Biobase_2.43.1                glue_1.3.1                   
[33] R6_2.4.0                      processx_3.3.0               
[35] AnnotationDbi_1.45.1          callr_3.2.0                  
[37] purrr_0.3.2                   blob_1.1.1                   
[39] magrittr_1.5                  promises_1.0.1               
[41] htmltools_0.3.6               backports_1.1.4              
[43] ps_1.3.0                      assertthat_0.2.1             
[45] xtable_1.8-3                  mime_0.6                     
[47] interactiveDisplayBase_1.21.0 httpuv_1.5.1                 
[49] crayon_1.3.4

"Dependency" in Drug Sensitivity Data

The fourth column in the table downloaded by depmap_drug_sensitivity() is "dependency." Should that be "sensitivity?"

How does this column relate to the GDSC measures of ic50, auc, etc.?

depmap_copyNumber() function does not return CN data; returns metadata table

Hi,

I just wanted to bring this to your attention; in the vignette it says:

The most recent copyNumber dataset can be automatically loaded into R by using the depmap_copyNumber function.

But it just returns a metadata table. The output is formally identical to that of depmap_metadata.

This can even be seen in the vignette output for depmap_copyNumber()

Thanks a lot for the very useful package,

MFP

depmap_drug_sensitivity function does not load all metadata

In previous versions, depmap_drug_sensitivity() loaded a 4-column dataset, with limited metadata, but I see in the updated documentation that extended metadata should now be accessible. However, the function still returns the original 4 columns. Has this update been rolled out yet?

Next steps

add remaining data creation code chunks in inst/scripts/make_data.R
use long format for data
review data manual pages
build/check/test package

From there on, we will look at the depmap.Rmd vignette.
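As a reference point for the long-format step, a wide cell-line-by-gene matrix can be reshaped in base R as sketched below. The matrix values and the column names `depmap_id`, `gene_name`, and `dependency` are illustrative assumptions, not the code in inst/scripts/make_data.R.

```r
# Toy wide matrix: rows are cell lines, columns are genes.
wide <- matrix(
  c(-0.1, 0.3, -1.2, 0.0),
  nrow = 2,
  dimnames = list(depmap_id = c("ACH-000001", "ACH-000007"),
                  gene_name = c("TP53", "KRAS"))
)

# One row per (cell line, gene) pair; the names of the dimnames become
# the identifier columns and responseName labels the value column.
long <- as.data.frame(as.table(wide),
                      responseName = "dependency",
                      stringsAsFactors = FALSE)
```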

Removing large files from git history

The repository is large due to the large data files that used to live in the package. I'll clean the git history from these large files using the procedure described here.

`depmap_drug_sensitivity()` throwing a 404

Seems this data was removed?

depmap::depmap_drug_sensitivity()
snapshotDate(): 2022-04-26
see ?depmap and browseVignettes('depmap') for documentation
downloading 1 resources
retrieving 1 resource
  |=============================================================================================================| 100%

Error: failed to load resource
  name: EH7530
  title: drug_sensitivity_21Q2
  reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
  web resource path: ‘https://experimenthub.bioconductor.org/fetch/7580’
  local file path: ‘C:\Users\jandrews\AppData\Local/R/cache/R/ExperimentHub/589c4ad59f5_7580’
  reason: Not Found (HTTP 404). 
2: bfcadd() failed; resource removed
  rid: BFC33
  fpath: ‘https://experimenthub.bioconductor.org/fetch/7580’
  reason: download failed 
3: download failed
  hub path: ‘https://experimenthub.bioconductor.org/fetch/7580’
  cache resource: ‘EH7530 : 7580’
  reason: bfcadd() failed; see warnings()

Change `depmap_crispr()` to pull from combined dataset rather than Achilles only

The main data releases from 21Q2 onwards switched from using the Achilles tables to the CRISPR ones, which combined the Broad and Sanger datasets.

This is apparent from looking at the downloads page and selecting older releases versus new ones.

Either adding parameters to distinguish between the Achilles and CRISPR files or making explicit functions to retrieve each may reduce confusion.

20Q3

Hi, I'm trying out the depmap package and really like the integration with ExperimentHub. One issue is that the stable version of the package is currently two releases (20Q1) behind the latest quarterly release available on the DepMap website (20Q3). Is there a mechanism to push 20Q3 to ExperimentHub and then query using the depmap package?

Updating NEWS file at every change

A reminder to update the NEWS.md file every time there's a user-visible change.

can not install

I tried to install the package and got the following error: