lieberinstitute / recount-brain

Code and analyses for the recount-brain project

Home Page: http://LieberInstitute.github.io/recount-brain/

License: MIT License

HTML 99.77% TeX 0.17% R 0.07%
recount rnaseq metadata public human rstats

recount-brain's People

Contributors

ashkaunr, djsokolowski, lcolladotor, nick-eagles, seandavi, shanellis

recount-brain's Issues

SRP032798 - "disease_status" is inverted

Hi!

I was looking at "SRP032798" in recount-brain and would like to report that "disease_status" is inverted for its 16 samples. In other words, the 8 control samples are marked as "Disease" and the 8 disease samples are marked as "Control".

meta_v2 = get(load("recount_brain_v2.Rdata"))
want_cols = c("sra_study_s", "Study_full", "development", "tissue_site_1", "tissue", "brodmann_area", "tumor_type",
              "disease", "disease_status", "sample_origin", "cell_line", "sex", "run_s")
as.data.frame(meta_v2[which(meta_v2$Study_full == "SRP032798"), want_cols])
           sra_study_s Study_full development tissue_site_1 tissue
SRR1027591   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027592   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027593   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027594   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027595   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027596   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027597   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027598   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027599   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027600   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027601   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027602   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027603   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027604   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027605   SRP032798  SRP032798        <NA>          <NA>   <NA>
SRR1027606   SRP032798  SRP032798        <NA>          <NA>   <NA>
           brodmann_area tumor_type                       disease
SRR1027591            NA       <NA>                          <NA>
SRR1027592            NA       <NA>                          <NA>
SRR1027593            NA       <NA>                          <NA>
SRR1027594            NA       <NA>                          <NA>
SRR1027595            NA       <NA>                          <NA>
SRR1027596            NA       <NA>                          <NA>
SRR1027597            NA       <NA>                          <NA>
SRR1027598            NA       <NA>                          <NA>
SRR1027599            NA       <NA> Amyotrophic lateral sclerosis
SRR1027600            NA       <NA> Amyotrophic lateral sclerosis
SRR1027601            NA       <NA> Amyotrophic lateral sclerosis
SRR1027602            NA       <NA> Amyotrophic lateral sclerosis
SRR1027603            NA       <NA> Amyotrophic lateral sclerosis
SRR1027604            NA       <NA> Amyotrophic lateral sclerosis
SRR1027605            NA       <NA> Amyotrophic lateral sclerosis
SRR1027606            NA       <NA> Amyotrophic lateral sclerosis
           disease_status sample_origin                  cell_line  sex
SRR1027591        Disease          iPSC iPSC-derived motor neurons <NA>
SRR1027592        Disease          iPSC iPSC-derived motor neurons <NA>
SRR1027593        Disease          iPSC iPSC-derived motor neurons <NA>
SRR1027594        Disease          iPSC iPSC-derived motor neurons <NA>
SRR1027595        Disease          iPSC iPSC-derived motor neurons <NA>
SRR1027596        Disease          iPSC iPSC-derived motor neurons <NA>
SRR1027597        Disease          iPSC iPSC-derived motor neurons <NA>
SRR1027598        Disease          iPSC iPSC-derived motor neurons <NA>
SRR1027599        Control          iPSC iPSC-derived motor neurons <NA>
SRR1027600        Control          iPSC iPSC-derived motor neurons <NA>
SRR1027601        Control          iPSC iPSC-derived motor neurons <NA>
SRR1027602        Control          iPSC iPSC-derived motor neurons <NA>
SRR1027603        Control          iPSC iPSC-derived motor neurons <NA>
SRR1027604        Control          iPSC iPSC-derived motor neurons <NA>
SRR1027605        Control          iPSC iPSC-derived motor neurons <NA>
SRR1027606        Control          iPSC iPSC-derived motor neurons <NA>

You can see from the SRA Run Selector (https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP032798&o=acc_s%3Aa) that SRR1027591 to SRR1027598 are "Normal Control" and SRR1027599 to SRR1027606 are ALS samples.

Thanks,
Sonali.
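
The mismatch reported above can be checked mechanically. The following is a minimal sketch, not the project's own code: it rebuilds only the "disease" and "disease_status" columns exactly as printed in the issue, and it assumes that a missing "disease" value means a control sample. The column name disease_status_fixed is hypothetical.

```r
# Toy copy of the two relevant columns as printed in the issue above.
runs <- paste0("SRR", 1027591:1027606)
meta <- data.frame(
  disease = c(rep(NA_character_, 8), rep("Amyotrophic lateral sclerosis", 8)),
  disease_status = c(rep("Disease", 8), rep("Control", 8)),
  row.names = runs,
  stringsAsFactors = FALSE
)

# A row is inconsistent when a missing disease is paired with "Disease",
# or a named disease is paired with "Control".
inconsistent <- (is.na(meta$disease) & meta$disease_status == "Disease") |
  (!is.na(meta$disease) & meta$disease_status == "Control")
n_inconsistent <- sum(inconsistent)

# Local workaround only (assumption: NA disease means control):
# derive the status from the disease column instead of trusting disease_status.
meta$disease_status_fixed <- ifelse(is.na(meta$disease), "Control", "Disease")
print(n_inconsistent)
print(table(meta$disease_status_fixed))
```

On this toy copy every row is flagged, which matches the report that all 16 labels are swapped. The same two logical tests could be run across the full metadata table to look for other studies with the same pattern.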

Check into querying by ontology term for disease, location, etc.

Thanks for pulling in the ontology stuff. If someone on the team has time, it would be interesting to allow queries of the type "location" = "supratentorial" (or whatever the correct ontology term is for that) and find all child terms that match. This would involve loading the full ontologies as DAGs and then querying for a term and all of its "more specific" children.
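
The idea of querying a term together with all of its more specific children can be sketched in base R. This is only an illustration under stated assumptions: the edge list, the term names, and the function get_descendants() are hypothetical and do not come from recount-brain or from any real ontology file.

```r
# Hypothetical toy ontology stored as a child -> parent edge list.
# In a DAG a child may have several parents, which this layout allows.
edges <- data.frame(
  child  = c("supratentorial", "infratentorial", "cerebral cortex",
             "frontal cortex", "cerebellum"),
  parent = c("brain", "brain", "supratentorial",
             "cerebral cortex", "infratentorial"),
  stringsAsFactors = FALSE
)

# Breadth-first walk from `term` down to every more specific term.
get_descendants <- function(term, edges) {
  found <- character(0)
  frontier <- term
  while (length(frontier) > 0) {
    children <- unique(edges$child[edges$parent %in% frontier])
    frontier <- setdiff(children, found)  # skip terms already visited
    found <- c(found, frontier)
  }
  found
}

# Query = the term itself plus all of its descendants.
query_terms <- c("supratentorial", get_descendants("supratentorial", edges))

# Hypothetical metadata with one location term per sample.
samples <- data.frame(
  run = c("run1", "run2", "run3"),
  location = c("frontal cortex", "cerebellum", "supratentorial"),
  stringsAsFactors = FALSE
)
hits <- samples[samples$location %in% query_terms, ]
print(hits)
```

With a real ontology the edge list would be read from the ontology file rather than typed in, and a dedicated ontology package may already provide an equivalent descendant lookup.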

Different Number of samples in metadata file and rse object

Hi!

I was looking at the gene expression of the SEQC samples from study SRP025982. As seen in the SRA Run Selector, there are a total of 2898 samples (https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP025982).

The metadata file also has 2898 samples.

> meta_v2 = read.csv("recount_brain_v2.csv", header=T, stringsAsFactors=FALSE)
> dim(meta_v2)
[1] 6547   65
> seqc_meta = meta_v2[which(meta_v2[,"Study_full"]=="SRP025982"), ]
> dim(seqc_meta)
[1] 2898   65

And it appears that all of them are unique:

> length(unique(seqc_meta$run_s))
[1] 2898
> length(unique(seqc_meta$count_file_identifier))
[1] 2898
> length(unique(seqc_meta$Dataset))
[1] 1
> length(unique(seqc_meta$sra_sample_s))
[1] 2898
> length(unique(seqc_meta$sample_name_s))
[1] 2898
> length(unique(seqc_meta$loaddate_s))
[1] 5
> table(seqc_meta$loaddate_s)

2013-06-12 2013-07-08 2014-04-03 2015-10-29 2015-10-30 
      2245          1        140         46        466 
> length(unique(seqc_meta$experiment_s))
[1] 2898

But the recount website says there are only ~1720 samples, and the loaded rse has only 1712 samples:

> load("rse_gene.Rdata")
> dim(rse_gene)
[1] 58037  1712

I was wondering if you could explain why there is a difference in the number of samples?
Thanks and Regards,
Sonali
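
One way to narrow down a count difference like this is to compare the run IDs on both sides directly. The sketch below uses hypothetical toy IDs and load dates rather than the real data. With the actual objects, meta_runs would be seqc_meta$run_s, and rse_runs is assumed (not confirmed by the issue) to be available as the column names of rse_gene.

```r
# Hypothetical toy metadata standing in for seqc_meta.
meta <- data.frame(
  run_s = c("SRR0000001", "SRR0000002", "SRR0000003", "SRR0000004", "SRR0000005"),
  loaddate_s = c("2013-06-12", "2013-06-12", "2014-04-03", "2015-10-30", "2015-10-30"),
  stringsAsFactors = FALSE
)

# Hypothetical run IDs standing in for those present in rse_gene.
rse_runs <- c("SRR0000001", "SRR0000002", "SRR0000003")

# Runs listed in the metadata but absent from the expression object, and vice versa.
missing_from_rse <- setdiff(meta$run_s, rse_runs)
missing_from_meta <- setdiff(rse_runs, meta$run_s)

# Tabulate the absent runs by load date to see whether they cluster in time.
missing_by_date <- table(meta$loaddate_s[meta$run_s %in% missing_from_rse])
print(missing_from_rse)
print(missing_by_date)
```

Tabulating the absent runs against "loaddate_s" (or any other metadata column) shows whether they share a property, which would help explain why they are not in the rse object.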

SRP066009 samples are actually reference samples, but marked as "Disease = Brain Tumor unspecified"

Hi!
One of the datasets in recount-brain comes from SRP066009.
The publication for this dataset is: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4793214/

As you can see from Fig. 1, these are all reference samples, but several of them are marked as "brain tumor unspecified" in the "disease" column, which is incorrect.

Thanks,
Sonali.

meta_v2 = get(load("recount_brain_v2.Rdata"))
meta_v2[which(meta_v2$sra_study_s=="SRP066009"), c("disease", "disease_status", "run_s", "sample_name_s", "sample_origin", "sex") ]
                             disease disease_status      run_s sample_name_s
SRP066009.1  brain tumor unspecified        Disease SRR2912443        MAQC A
SRP066009.2  brain tumor unspecified        Disease SRR2912444        MAQC A
SRP066009.3  brain tumor unspecified        Disease SRR2912446        MAQC A
SRP066009.4                     <NA>        Control SRR2912479        MAQC B
SRP066009.5                     <NA>        Control SRR2912481        MAQC B
SRP066009.6                     <NA>        Control SRR2912483        MAQC B
SRP066009.7  brain tumor unspecified        Disease SRR2912487        MAQC C
SRP066009.8  brain tumor unspecified        Disease SRR2912489        MAQC C
SRP066009.9  brain tumor unspecified        Disease SRR2912490        MAQC C
SRP066009.10 brain tumor unspecified        Disease SRR2912491        MAQC D
SRP066009.11 brain tumor unspecified        Disease SRR2912492        MAQC D
SRP066009.12 brain tumor unspecified        Disease SRR2912494        MAQC D
             sample_origin    sex
SRP066009.1          Brain pooled
SRP066009.2          Brain pooled
SRP066009.3          Brain pooled
SRP066009.4          Brain pooled
SRP066009.5          Brain pooled
SRP066009.6          Brain pooled
SRP066009.7          Brain pooled
SRP066009.8          Brain pooled
SRP066009.9          Brain pooled
SRP066009.10         Brain pooled
SRP066009.11         Brain pooled
SRP066009.12         Brain pooled
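
The rows in question can be flagged by sample name. This is a minimal sketch on a toy copy of the columns printed above. It assumes, as the issue argues, that the names "MAQC A" to "MAQC D" identify reference samples. It only flags rows and does not decide what the correct annotation should be.

```r
# Toy copy of three columns as printed in the issue above.
meta <- data.frame(
  disease = c(rep("brain tumor unspecified", 3), rep(NA_character_, 3),
              rep("brain tumor unspecified", 6)),
  disease_status = c(rep("Disease", 3), rep("Control", 3), rep("Disease", 6)),
  sample_name_s = rep(paste("MAQC", c("A", "B", "C", "D")), each = 3),
  stringsAsFactors = FALSE
)

# Assumption: MAQC A-D sample names mark reference samples.
is_reference <- grepl("^MAQC [A-D]$", meta$sample_name_s)

# Reference samples that nevertheless carry a disease annotation.
flagged <- is_reference & !is.na(meta$disease)
n_flagged <- sum(flagged)
print(n_flagged)
print(table(meta$sample_name_s[flagged]))
```

On the toy copy this flags the MAQC A, C and D rows and leaves the three MAQC B rows alone, which matches the table in the issue.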
