Coder Social home page Coder Social logo

hypertribe_pipeline's Introduction

Instructions to Run HyperTribe Pipeline

Prerequisites:

  • Anaconda or Miniconda installed on your system.
  • R
  • All the scripts are written to run on LSF environment

1. Installation of Conda Environments

install the conda/mamba environment

Create the Conda environment named "hypertribe" using the provided environment.yml file:

conda env create -f environment.yml 

2. Downloading Genome and GTF Annotation Files

Steps:

Navigate to the "download_scripts" directory. Run the provided scripts to download the genome and GTF annotation files.

./download_genome.sh
./download_gtf.sh

3. Place the fastq files under input_data folder

4. HyperTRIBE Psudogene Files

we provided a sample sequence and gtf annotation of a unique sequence of the HyperTRIBE construct that can be used to quantify its expression level. modify the two files to include your specific experiment construct, such as the name of the file and pseudo gene name. also set L in the gtf file to the lenght of the unique sequence.

after that, concatenate the .fa files from the geneme and the construct pseudo gene into a single .fa file

Example:

    cat hg38.fa RBP_ADAR.fa > hg38_RBP_ADAR.fa

do the same thing with the gtf files.

5. Star Indexer Scripts

Steps:

  1. Navigate to the "genome_data" folder.
  2. Modify the paths of the files in the star_index_genome.sh and picard_dictionary_genome.sh scripts
  3. Run the star indexer script to generate necessary index file.
bsub < star_index_genome.sh 

then

bsub < picard_dictionary_genome.sh 

5. Running the Pipeline

Steps:

  1. Navigate to the "software_scripts/hypertribe" folder.
  2. Execute the pipeline scripts in the given order:

Step 1: Alignemnt

a. modify the samples' file name inside 1_star_align_genome.sh script

samples=("Sample_1" "Sample_1" "Sample_3" \
         "Sample_4" "Sample_5" "Sample_6")

also if you want to update the filenames to a more descriptive filenames, modify the following line accordingly, otherwise provide the same input as for samples

new_names=("Sample_New_Name_1" "Sample_New_Name_1" "Sample_New_Name_3" \
           "Sample_New_Name_4" "Sample_New_Name_5" "Sample_New_Name_6")

b. Set the path to Star Index folder

index_folder=../../genome_data/STAR_INDEX_OUTPUT

c. run the alignemnt step

bsub < 1_star_align_genome.sh

Step 2: MultiQC

bsub < 2_multiqc.sh

Step 3: Variant Calling

a. modify the following lines as in previous steps:

samples=("Sample_New_Name_1" "Sample_New_Name_1" "Sample_New_Name_3" \
           "Sample_New_Name_4" "Sample_New_Name_5" "Sample_New_Name_6")           

and

genome_file=../../genome_data/hg38_RBP_ADAR.fa
genome_dict_file=../../genome_data/hg38_RBP_ADAR.dict
dbsnp_file=../../genome_data/dbsnp/dbsnp.vcf.gz

and finally:

bsub < 3_variant_calling.sh

Step 4: Format Output

This step aims at aggregating the results of the variant calling for each sample into a single file that can be used to exploration or downstream analysis.

This step requires R.

a. modify the following lines in the 4_format_output.R script:

sample_list <- c(
  "Sample_New_Name_1", "Sample_New_Name_1", "Sample_New_Name_3",
  "Sample_New_Name_4", "Sample_New_Name_5", "Sample_New_Name_6"
  )

gtf_path <- "../../genome_data/hg38_RBP_ADAR.gtf"

b. run the script

bsub < 4_format_output.sh

Step 5: Differential Analysis

This step runs differential analysis between the control and HyperTRIBE samples to identify significant edited sites. it applies the same filtering and processing steps as described in the original HyperTRIBE paper.

There are various part of the script that need to be modified depending on the number of samples in each group. the current script assumes 3 samples per group.

The following lines need to be modified accordingly:

fit1 <- mle_custom_h1(ref_list[(1:3)], alt_list[(1:3)], ref_list[-c(1:3)], alt_list[-c(1:3)])
ctrl_freq_list <- rowMeans(alt_freq_df[, c(1:3)], na.rm = TRUE)
test_freq_list <- rowMeans(alt_freq_df[, c(4:6)], na.rm = TRUE)
stats_df <- data.frame(
  diff_mean = test_freq_list - ctrl_freq_list,
  ctrl_mean = ctrl_freq_list,
  test_mean = test_freq_list,
  pval = res_df$pval,
  Control_Sample_1_freq =  alt_freq_df[, 1],
  Control_Sample_2_freq =  alt_freq_df[, 2],
  Control_Sample_3_freq =  alt_freq_df[, 3],
  Treatment_Sample_1_freq =  alt_freq_df[, 4],
  Treatment_Sample_2_freq =  alt_freq_df[, 5], 
  Treatment_Sample_3_freq =  alt_freq_df[, 6]
)
stats_df <- stats_df[c(
  "diff_mean", "ctrl_mean", "test_mean",
  "pval", "padj",
  "Control_Sample_1_freq", 
  "Control_Sample_2_freq",
  "Control_Sample_3_freq",
  "Treatment_Sample_1_freq",
  "Treatment_Sample_2_freq",
  "Treatment_Sample_3_freq" 
)]

Rename the ouput filename:

write.csv(stats_df,
  paste0(output_folder, "5_CELL_LINE_Control_Treatment.csv"),
  row.names = FALSE
)

Finally, run the script

bsub < 5_test_differential.sh

hypertribe_pipeline's People

Contributors

ilyes495 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.