julianneyang / transcriptomicsonhoffman

Walks through installation and usage of FastQC, MultiQC, Trimmomatic, and Salmon for transcriptomic data preprocessing. Includes Grid Engine shell scripts that can be looped over many files in a directory.

License: MIT License

Shell 34.58% R 65.42%
fastqc hpc multiqc rnaseq salmon transcriptomics trimmomatic

transcriptomicsonhoffman

After you log in to Hoffman2 and request a computational node:

Preprocessing the data

Assuming your fastq files are in your current working directory:

  1. Install FastQC (here we create a new conda env to install fastqc)
conda create -n fastqc fastqc
conda activate fastqc
  2. Inside the directory containing the raw data files, run FastQC.

Interactive: creates a directory called FastQC_output and stores the FastQC reports there.

mkdir FastQC_output/
fastqc *.fastq.gz -o FastQC_output/

Job submission (recommended). Note: you may need to provide the full filepath to the FastQC script (shown here as FastQC.sh; it may be named 1-FastQC.sh in your copy of the scripts).

qsub ../rna_scripts/FastQC.sh
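
The contents of FastQC.sh are not shown in this walkthrough. As a rough sketch only, a Grid Engine job script for this step could look like the one generated below. The helper name `write_fastqc_job`, the resource requests (`h_rt`, `h_data`, four slots in the `shared` parallel environment), and the log file name are all assumptions for illustration, not the repository's actual script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes an example Grid Engine job script for FastQC.
# Every resource value below is an illustrative assumption; adjust for your data.
write_fastqc_job() {
    local out_script="$1"
    # Quoted 'EOF' keeps $JOB_ID literal so the scheduler expands it, not this shell.
    cat > "$out_script" <<'EOF'
#!/bin/bash
#$ -cwd
#$ -j y
#$ -o fastqc_joblog.$JOB_ID
#$ -l h_rt=4:00:00,h_data=4G
#$ -pe shared 4

# Activate the fastqc conda env here; how conda is initialized in a
# batch shell is site-specific, so that line is left to the reader.

mkdir -p FastQC_output
fastqc --threads 4 -o FastQC_output *.fastq.gz
EOF
}

# Write the example script into the current directory for inspection.
write_fastqc_job example_FastQC.sh
```

Inspect the generated file before submitting anything; it is meant as a starting template, and the `--threads` value should match the number of slots requested with `-pe`.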
  3. Aggregate the quality reports for all samples with MultiQC. MultiQC takes as input a directory containing the FastQC results; it parses the _fastqc.zip archives that FastQC writes next to each .html report. (Note: I had issues forcing MultiQC to use Python 3.10, so I used the workaround below.)

Create a new conda environment and deactivate the old:

conda deactivate
conda create -n multiqc

Do not install MultiQC with conda, which pulled an outdated version for me. Instead, I used pip to install the development version, and I forced the installation into $PROJECT, which has enough space, rather than the default location under $HOME.

pip install --upgrade --force-reinstall git+https://github.com/MultiQC/MultiQC.git -t /u/project/jpjacobs/jpjacobs/rna_seq/

You may need to find the exact filepath to multiqc via the following command:

which multiqc

To run interactively, replace ~/.local/bin/multiqc with the exact filepath:

python ~/.local/bin/multiqc ./

Job submission (recommended). Do this within the directory where your outputs from FastQC are located.

cd FastQC_output
qsub multiqc.sh
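
Before submitting the MultiQC job, it can help to confirm that FastQC actually produced results for every sample. The sketch below counts the `_fastqc.zip` archives in a directory; the function name `count_fastqc_zips` is made up for this example, and it assumes FastQC was run with its default output naming.

```shell
#!/usr/bin/env bash
# Count FastQC result archives (*_fastqc.zip) directly inside a directory.
count_fastqc_zips() {
    local dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f -name '*_fastqc.zip' | wc -l | tr -d ' '
}

n=$(count_fastqc_zips .)
if [ "$n" -eq 0 ]; then
    echo "No *_fastqc.zip files found here; run FastQC first." >&2
else
    echo "Found $n FastQC archives in the current directory."
fi
```

For paired-end data the count should be twice the number of samples, since FastQC reports on R1 and R2 separately.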
  4. Copy the .html report to your local machine with scp, or push it to GitHub from Hoffman2, then open it in a browser (MultiQC names it multiqc_report.html by default). For help interpreting MultiQC results, see the MultiQC documentation.

  5. Trim adapters and low-quality bases with Trimmomatic. Since Trimmomatic is already installed in the kneaddata env, activate that env. You can append additional Trimmomatic parameters; the command embedded in 3-trimmomatic.sh uses very gentle trimming parameters and removes adapters assuming the sequencer was an Illumina HiSeq.

conda activate kneaddata

Interactive:

trimmomatic PE JJ1715_393_S43_R1_001.fastq.gz JJ1715_393_S43_R2_001.fastq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:/u/home/j/jpjacobs/project-jpjacobs/software_rna_seq/Trimmomatic/trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36
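
To make the trimming steps in the command above easier to read and extend, the sketch below collects them in a bash array, one step per line, and prints the resulting command instead of running it. The relative adapter path, the function name `build_trim_steps`, and the extra `SLIDINGWINDOW:4:15` step are assumptions added for illustration; the sliding-window step is not part of the original command and makes trimming less gentle.

```shell
#!/usr/bin/env bash
# Assumed adapter location; replace with the path to TruSeq3-PE.fa on your system.
ADAPTERS="adapters/TruSeq3-PE.fa"

# Fill the global array TRIM_STEPS; Trimmomatic applies steps in the order given.
build_trim_steps() {
    local adapters="$1"
    TRIM_STEPS=()
    # fields: seed mismatches, palindrome clip threshold, simple clip threshold,
    #         minimum adapter length, keep both reads
    TRIM_STEPS+=("ILLUMINACLIP:${adapters}:2:30:10:2:True")
    # drop leading and trailing bases with quality below 3
    TRIM_STEPS+=("LEADING:3")
    TRIM_STEPS+=("TRAILING:3")
    # optional addition (not in the original command): cut once the mean
    # quality in a 4-base window falls below 15
    TRIM_STEPS+=("SLIDINGWINDOW:4:15")
    # discard reads shorter than 36 bases after trimming
    TRIM_STEPS+=("MINLEN:36")
}

build_trim_steps "$ADAPTERS"

# Dry run: print the command rather than executing it.
echo trimmomatic PE JJ1715_393_S43_R1_001.fastq.gz JJ1715_393_S43_R2_001.fastq.gz \
    output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
    output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
    "${TRIM_STEPS[@]}"
```

Removing the leading `echo` would run the command; dropping the `SLIDINGWINDOW` line restores the original step list.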

Job submission for many files (assumes you are in the directory where your raw fastq files are located). You may need to change the filepath so that it points to 3-trimmomatic.sh:

for f in *R1_001.fastq.gz; do name=$(basename "$f" R1_001.fastq.gz); qsub ../../../software_rna_seq/rna_scripts/3-trimmomatic.sh "${name}R1_001.fastq.gz" "${name}R2_001.fastq.gz"; done
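
The submission loop derives each R2 filename from its R1 filename and does not check that the mate exists. A minimal dry-run sketch that lists the pairs the loop would submit and flags missing mates is shown below; the function name `list_read_pairs` is hypothetical, and the filename suffixes are taken from the loop above.

```shell
#!/usr/bin/env bash
# Print the R1/R2 pairs the submission loop would use, and warn about missing mates.
# Returns 1 if any R1 file lacks its R2 mate, 0 otherwise.
list_read_pairs() {
    local dir="${1:-.}"
    local r1 prefix r2 status=0
    for r1 in "$dir"/*R1_001.fastq.gz; do
        [ -e "$r1" ] || continue   # the glob matched nothing
        prefix=$(basename "$r1" R1_001.fastq.gz)
        r2="$dir/${prefix}R2_001.fastq.gz"
        if [ -f "$r2" ]; then
            printf '%s\t%s\n' "$(basename "$r1")" "$(basename "$r2")"
        else
            echo "WARNING: no R2 mate for $(basename "$r1")" >&2
            status=1
        fi
    done
    return "$status"
}

list_read_pairs . || echo "Fix the missing mates before submitting jobs." >&2
```

Running this first avoids queueing jobs that would fail immediately on a missing input file.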
  6. Install Salmon. I downloaded salmon-1.10.0_linux_x86_64.tar.gz from https://github.com/COMBINE-lab/salmon/releases to the software_rna_seq folder, then unpacked it with tar:
tar xzvf salmon-1.10.0_linux_x86_64.tar.gz
  7. Use Salmon to index the mouse transcriptome

Download the transcriptome file (I tried GENCODE first but got a lot of warnings, so I switched to Ensembl). Note that I've provided these files in this repo:

wget http://ftp.ensembl.org/pub/release-111/fasta/mus_musculus_c57bl6nj/cdna/Mus_musculus_c57bl6nj.C57BL_6NJ_v1.cdna.all.fa.gz

Download annotation file. Note I've provided this in the repo:

wget http://ftp.ensembl.org/pub/release-111/gtf/mus_musculus_c57bl6nj/Mus_musculus_c57bl6nj.C57BL_6NJ_v1.111.gtf.gz

Index transcriptome file. Note, I've provided it in this repo but feel free to build your own or update as new releases come out:

cd /u/home/j/jpjacobs/project-jpjacobs/software_rna_seq/salmon/salmon-latest_linux_x86_64
bin/salmon index -t Mus_musculus_c57bl6nj.C57BL_6NJ_v1.cdna.all.fa.gz -i Mus_musculus_c57bl6nj_index -p 8
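
A truncated download is an easy way to end up with an incomplete index. As a quick sanity check on the transcriptome file, the sketch below tests the gzip archive and counts the FASTA records in it. The function name `count_fasta_records` is made up for this example, and no expected transcript count is implied.

```shell
#!/usr/bin/env bash
# Count FASTA records (header lines starting with '>') in a gzipped FASTA file.
count_fasta_records() {
    gzip -dc "$1" | grep -c '^>'
}

fasta="Mus_musculus_c57bl6nj.C57BL_6NJ_v1.cdna.all.fa.gz"
if [ -f "$fasta" ]; then
    # gzip -t exits non-zero if the archive is corrupt or truncated.
    if gzip -t "$fasta"; then
        echo "Transcripts in $fasta: $(count_fasta_records "$fasta")"
    else
        echo "$fasta failed the gzip integrity test; download it again." >&2
    fi
else
    echo "$fasta not found in the current directory." >&2
fi
```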
  8. Run Salmon on the trimmed fastq files:
../salmon/salmon-latest_linux_x86_64/bin/salmon quant -i ../salmon/salmon-latest_linux_x86_64/Mus_musculus_c57bl6nj_index -l A -1 output_JJ1715_393_S43_R1_001.fastq_paired.fq.gz -2 output_JJ1715_393_S43_R2_001.fastq_paired.fq.gz -p 8 --gcBias --validateMappings -o JJ1715_393_quant

Job submission (recommended):

for f in *R1_001.fastq_paired.fq.gz; do name=$(basename "$f" R1_001.fastq_paired.fq.gz); qsub ../rna_scripts/salmon.sh "${name}R1_001.fastq_paired.fq.gz" "${name}R2_001.fastq_paired.fq.gz"; done
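
Once the jobs finish, the mapping rate of each sample is worth a glance before moving on. Salmon records it as `percent_mapped` in `aux_info/meta_info.json` inside each output directory. The sketch below prints one line per sample; it assumes the output directories end in `_quant`, as in the interactive example (`-o JJ1715_393_quant`), which may not match what salmon.sh uses, and the function name `report_mapping_rates` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "<output dir><TAB><percent_mapped>" for every Salmon output directory
# matching *_quant under the given directory (default: current directory).
report_mapping_rates() {
    local dir="${1:-.}"
    local meta pct sample
    for meta in "$dir"/*_quant/aux_info/meta_info.json; do
        [ -f "$meta" ] || continue   # the glob matched nothing
        # Pull the numeric value out of the "percent_mapped" line without needing jq.
        pct=$(sed -n 's/.*"percent_mapped"[[:space:]]*:[[:space:]]*\([0-9.eE+-]*\).*/\1/p' "$meta")
        sample=$(basename "$(dirname "$(dirname "$meta")")")
        printf '%s\t%s\n' "$sample" "${pct:-NA}"
    done
}

report_mapping_rates .
```

Samples with a much lower rate than the rest are the ones to inspect first, for example by reading `logs/salmon_quant.log` in that sample's output directory.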

Generating a count matrix

  1. Follow the instructions in tximport.R and txmeta.R to generate TPM/count matrices and gene-level annotations.
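
Gene-level summarization with tximport needs a two-column table that maps transcript IDs to gene IDs. The R scripts may already build one; as an alternative sketch only, the mapping can be pulled from the annotation GTF with awk. The function name `make_tx2gene` and the output file name are assumptions. Note that Ensembl cDNA FASTA headers usually carry a version suffix while the GTF `transcript_id` attribute does not, so the IDs may need harmonizing (tximport's `ignoreTxVersion` argument is one option).

```shell
#!/usr/bin/env bash
# Print "<transcript_id><TAB><gene_id>" for every transcript feature in a gzipped GTF.
make_tx2gene() {
    gzip -dc "$1" | awk -F'\t' '
        $3 == "transcript" {
            tx = ""; gene = ""
            # Offsets skip the attribute name, the space, and the opening quote;
            # the lengths also drop the closing quote.
            if (match($9, /transcript_id "[^"]+"/)) tx = substr($9, RSTART + 15, RLENGTH - 16)
            if (match($9, /gene_id "[^"]+"/)) gene = substr($9, RSTART + 9, RLENGTH - 10)
            if (tx != "" && gene != "") print tx "\t" gene
        }'
}

gtf="Mus_musculus_c57bl6nj.C57BL_6NJ_v1.111.gtf.gz"
if [ -f "$gtf" ]; then
    make_tx2gene "$gtf" > tx2gene.tsv
    echo "Wrote $(wc -l < tx2gene.tsv | tr -d ' ') transcript-to-gene rows to tx2gene.tsv"
fi
```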
