leekgroup / recount-contributions

Contribute your human Illumina RNA-seq data to recount2

Home Page: https://jhubiostatistics.shinyapps.io/recount/

License: MIT License

rstats bioconductor recount rnaseq annotation-agnostic r

recount-contributions's Introduction

Overview

If you are interested in contributing your human RNA-seq data sequenced on the Illumina platform to recount2, you will need to modify and submit this form, in which you describe your data and tell us how to access the required files. We'll respond as soon as we can and add your data to recount2. For more information, read the recount2 paper in Nature Biotechnology.

Getting help

If you're confused by any of the instructions below or are having trouble with your submission, feel free to chat with us in the recount2 contributions Gitter or create an issue.

Generating recount2-like files

Before we can add your RNA-seq data to recount2, you'll need to generate files similar to the ones we already provide for each project on SRA. This means you'll have to run Rail-RNA on your dataset against the human reference genome hg38. Rail-RNA will generate a set of deliverables. For recount2, we'll need the cross-sample tables, coverage vectors in bigWig format, and the exon-exon junction files. After the Rail-RNA run is complete, you'll need to execute this set of R scripts, which create the remaining files we need. Once they are generated, you may want to add more phenotype information for your samples. Then you can modify and submit this form, and we'll take it from there. Details follow.

The first step involves running Rail-RNA with your data. You will find more information about how to do this at the Rail-RNA documentation website. You can run it on the cloud using Amazon Web Services Elastic MapReduce. Note that it's crucial that the Rail-RNA deliverables include:

  • cross-sample tables: tsv
  • coverage vectors: bw
  • junction files: jx

and also that reads are aligned to the hg38 assembly. If you perform the alignment locally, please run Rail-RNA using the Bowtie indexes from the hg38 Illumina iGenome. If you perform the alignment using Amazon Web Services Elastic MapReduce, make sure to use the command-line parameter -a hg38. So in local mode, a single command to preprocess and align your RNA-seq data should look like this:

    rail-rna go local -x /path/to/hg38/Bowtie/basename /path/to/hg38/Bowtie2/basename \
    -m /path/to/Rail-RNA/manifest/file -o /path/to/output/dir -d tsv,bw,jx

while in elastic (cloud) mode, the command should look like this:

    rail-rna go elastic -a hg38 -m /path/to/Rail-RNA/manifest/file -o s3://bucket-name/output-dir \
    -d tsv,bw,jx -c <number of core instances>
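Because a run that omits one of the three deliverables has to be repeated, it can help to check the value passed to -d before launching. The following is a minimal sketch in POSIX shell; has_required_deliverables is a hypothetical helper written for this illustration and is not part of Rail-RNA. It only prints the local-mode command shown above, with the same placeholder paths, rather than running it.

```shell
# Sketch only: has_required_deliverables is a hypothetical helper, not part of Rail-RNA.
# Succeeds if the comma-separated list in $1 contains tsv, bw and jx.
has_required_deliverables() {
    for needed in tsv bw jx; do
        case ",$1," in
            *",$needed,"*) ;;
            *) echo "missing deliverable: $needed" >&2; return 1 ;;
        esac
    done
    return 0
}

DELIVERABLES="tsv,bw,jx"
if has_required_deliverables "$DELIVERABLES"; then
    # Print the local-mode command (placeholder paths) instead of running it.
    echo "rail-rna go local -x /path/to/hg38/Bowtie/basename /path/to/hg38/Bowtie2/basename" \
         "-m /path/to/Rail-RNA/manifest/file -o /path/to/output/dir -d $DELIVERABLES"
fi
```

The list is wrapped in commas before matching so that an entry such as bw is only accepted as a whole item, not as a substring of another name.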

Create recount2 objects

Once you have the output from Rail-RNA, you will need to run the recount-prep R scripts. If you've run Rail-RNA in the cloud, you'll have to download its output to your local system first. To run the R scripts, you'll also need to install some dependencies. These include the command-line tools named in the variable list further down (bwtool v1.0, wiggletools v1.1, and the UCSC wigToBigWig executable), as well as the following R/Bioconductor packages, which can be installed with this R command:

    install.packages("BiocManager")
    BiocManager::install(c("recount", "devtools", "getopt", "downloader",
        "SummarizedExperiment", "Hmisc"))

Now run prep_setup.R which downloads some files that will be needed in the other scripts. Next, run prep_sample.R for each sample in your data set. You can perform this step in parallel if you like. Finally, run prep_merge.R to create the final recount2 objects. A bash script example that runs all three scripts is available as example_prep.sh. If you choose to model your script after this one, make sure to change the variable definitions made in it as follows.

    DATADIR: (local) path to Rail-RNA output directory
    BWTOOL: path to bwtool v1.0 executable
    WIGGLE: path to wiggletools v1.1 executable
    WIGTOBIGWIG: path to UCSC wigToBigWig executable
    MANIFEST: path to Rail-RNA manifest file that was used in the `rail-rna` command invocation
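Before starting a long preparation run, it may be worth confirming that each of these variables points at something usable. The sketch below is one way to do that in POSIX shell. check_prep_paths is a hypothetical helper that is not part of example_prep.sh, and every path shown is a placeholder. It checks only that the paths exist with the right type and permissions, not the tool versions.

```shell
# Placeholder values; point these at your own installation.
DATADIR=/path/to/rail-rna/output
BWTOOL=/path/to/bwtool
WIGGLE=/path/to/wiggletools
WIGTOBIGWIG=/path/to/wigToBigWig
MANIFEST=/path/to/Rail-RNA/manifest/file

# Hypothetical helper: check_prep_paths DATADIR BWTOOL WIGGLE WIGTOBIGWIG MANIFEST
# Reports every problem it finds and returns non-zero if there was at least one.
check_prep_paths() {
    status=0
    [ -d "$1" ] || { echo "DATADIR is not a directory: $1" >&2; status=1; }
    for exe in "$2" "$3" "$4"; do
        if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
            echo "not an executable file: $exe" >&2
            status=1
        fi
    done
    [ -r "$5" ] || { echo "MANIFEST is not readable: $5" >&2; status=1; }
    return "$status"
}

# Usage:
#   check_prep_paths "$DATADIR" "$BWTOOL" "$WIGGLE" "$WIGTOBIGWIG" "$MANIFEST" || exit 1
```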

If you have more metadata (phenotype information for your samples) than what is included by default in the recount2 objects, you can add it to the RangedSummarizedExperiment objects once they are created, or modify the preparation R scripts accordingly. For example, tissue, cell line, age, sex and other demographic variables can be of great use to other researchers.

Submit files

Once you have created all the recount2 objects, please modify and submit this form. In it, we ask you for information on how to contact you, information about your dataset, and instructions on how to access the recount files you created. We will download and check your files. If they're approved, we'll upload them to recount2. The files we'll need access to are:

  • The bigWig coverage files for each sample created by Rail-RNA: coverage_bigwigs/*.bw
  • The junction files created by Rail-RNA
  • The normalized mean coverage file: bw/mean.bw
  • counts_exon.tsv.gz
  • counts_gene.tsv.gz
  • rse_exon.Rdata
  • rse_gene.Rdata
  • rse_jx.Rdata
  • Log files created by the R scripts for reproducibility purposes

The RangedSummarizedExperiment objects contain the sample metadata that we'll use. You should make sure that all three objects have the same metadata.
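A quick pre-submission check can catch a missing file before we try to download it. The sketch below assumes, purely for illustration, that all of the named files sit under one directory with the relative paths listed above. check_submission is a hypothetical helper. The junction files and log files are not checked because their names are not given here.

```shell
# Hypothetical helper: check_submission DIR
# Reports which of the named files are missing under DIR; returns non-zero if any are.
check_submission() {
    root=$1
    missing=0
    for f in bw/mean.bw counts_exon.tsv.gz counts_gene.tsv.gz \
             rse_exon.Rdata rse_gene.Rdata rse_jx.Rdata; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Expect at least one per-sample bigWig in coverage_bigwigs/.
    # If the glob matches nothing, $1 stays a literal pattern and the -f test fails.
    set -- "$root"/coverage_bigwigs/*.bw
    if [ ! -f "$1" ]; then
        echo "missing: coverage_bigwigs/*.bw" >&2
        missing=1
    fi
    return "$missing"
}

# Usage:
#   check_submission /path/to/recount/files && echo "all named files present"
```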

Summary

  • Run Rail-RNA on your data against the human hg38 genome reference.
  • Install dependencies for the recount2 R scripts.
  • Download the deliverables from Rail-RNA using the same file structure.
  • Run the prep_setup.R R script.
  • Run the R script prep_sample.R for each sample.
  • Run the R script prep_merge.R.
  • Optionally add more metadata to your phenotype information.
  • Make the files accessible to us.
  • Modify and submit this form.

recount-contributions's People

Contributors: caleblareau, lcolladotor, nellore

recount-contributions's Issues

Docker container (request)

I've been trying to follow the instructions for preparing my data after running Rail-RNA, and I'm having problems with the number of dependencies these additional tools need and with getting them to run. Some do not appear to be maintained.

Releasing a Docker container would help greatly, if that is possible.

Testing rail-RNA on a project uploaded in Recount2 - different results

To whom it may concern,
I would like to run the same pipeline used to generate recount2 on my dataset.
To check that the Rail-RNA pipeline worked correctly, I ran it on a small project already in recount2 (SRP047399).

Rail-RNA was run using this command:
    rail-rna go local -x dir_bowtie_hg38 dir_bowtie2_hg38 -m manifest_file --log dir_log_file -d tsv,bw,jx -f --scratch temp_folder_dir

The run used Bowtie and Bowtie2 indexes for the hg38 reference genome, downloaded from Illumina iGenomes:

  • ftp://ussd-ftp-illumina.com/Homo_sapiens/UCSC/hg38/Homo_sapiens/sequences/BowtieIndex
  • ftp://ussd-ftp-illumina.com/Homo_sapiens/UCSC/hg38/Homo_sapiens/sequences/Bowtie2Index

I used the following data to run the R scripts after the rail-RNA pipeline.

However, when I compare the gene or exon counts that I obtained against those in recount2, I find some differences. In some cases, the difference is more than 100,000 read counts.

Moreover, I get an error when I run the prep_merge.R script, at these lines:

    oo <- findOverlaps(jx_gr, introns_unique, type = 'equal')
    stopifnot(length(unique(queryHits(oo))) == length(oo))

Could this be due to using the wrong annotation?

I read that the tool is deterministic, so I expected to obtain the same results. I checked the number of input reads and it is the same. However, the number of mapped reads is different.

Do you have any idea why this happens?
Since I would like to compare my samples with the data in recount2, could you please confirm that the pipeline, the reference and the annotation files are correct?

Thank you.
Best Regards,
Gianluca

Please find below an example for one of the samples.

    SRR1583592 (my results)   SRR1583592 (recount2)
    4398                      1345
    108156                    108153
    15685                     15685
    1                         0
    47805                     115970
    195                       195
