Neural Plate Border Analysis

This repository contains all the code used for analysis pertaining to the Neural Plate Border project. The project focused on understanding how cell fate decisions are made in the developing ectoderm of the chick embryo, and involved analysis of scRNA-seq and scATAC-seq data generated by the Streit lab, together with publicly available CUT&RUN and HiChip data.

Repository Organisation

Each folder beginning with 'NF-' contains an independent subset of the analysis. Some of the folders run NF-core pipelines (click here to see all NF-core pipelines), some run our shared Streit-lab Nextflow pipelines (click here to see all Streit-lab repositories), and two folders (NF-hichip_downstream and NF-downstream_analysis) contain custom Nextflow pipelines.

| Folder | Pipeline Origin | Description | README |
| --- | --- | --- | --- |
| NF-cellranger_align | Streit-lab pipeline | Aligns scRNA-seq and scATAC-seq data | README file |
| NF-cut_and_run | NF-core pipeline | Downloads and aligns CUT&RUN data | README file |
| NF-downstream_analysis | Custom pipeline | Analyses scATAC-seq data and integrates it with previously published scRNA-seq data | README file |
| NF-enhancer_annotation_and_motif_analysis | Streit-lab pipeline | Annotates genomic regions to genes and runs motif scanning | README file |
| NF-hic | NF-core pipeline | Aligns HiChip data | README file |
| NF-hichip_downstream | Custom pipeline | Analyses HiChip data | README file |

Reproducing the analysis of this project

As stated above, different subsets of this project's analysis have been run in different ways.

To re-run the analyses that use the Streit-lab or NF-core pipelines, please see the README files linked in the table above and consult the documentation of the original pipelines.

To reproduce the analysis performed in the two custom pipelines, NF-downstream_analysis and NF-hichip_downstream, perform the following steps:

  1. Clone this repository from GitHub.
  2. Ensure Nextflow is installed on the computer or HPC on which the pipeline will be run (an HPC or cloud server is recommended, as the computing requirements are high).
  3. Download and process the raw data (using the NF-core hic pipeline for the HiChip data or Cell Ranger for the scATAC-seq data) and edit the samplesheets to point to the aligned data. Samplesheet for NF-hichip_downstream input / Samplesheet for NF-downstream_analysis input.
  4. The Nextflow pipelines are triggered by the bash scripts in the 'run_scripts' folder. These may need to be edited depending on how modules are loaded on the HPC, and to set cache directories for Singularity containers and Nextflow (a sketch of such a run script follows this list). Run script for NF-hichip_downstream pipeline / Run script for NF-downstream_analysis pipeline.
  5. The run script also specifies a 'profile', which includes platform-specific parameters and paths to reference genomes. A new profile can be created, named and specified in the run script. This example profile can be used as a template.
  6. Once the pipeline has been adapted to run on a different computer or HPC system, it can be executed by running the run script.
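
For illustration, a minimal run script might look like the sketch below. The module names, cache paths and profile name are placeholders rather than values from this repository; consult the actual scripts in each pipeline's 'run_scripts' folder.

#!/bin/bash
# Minimal sketch of a run script; module names and paths are placeholders.

# Load whatever modules your HPC provides for Nextflow and Singularity
module load Nextflow
module load Singularity

# Cache directories so container images and intermediate files persist between runs
export NXF_SINGULARITY_CACHEDIR=/path/to/singularity_cache
export NXF_WORK=/path/to/nextflow_work

# Launch the pipeline with a platform-specific profile (see step 5)
nextflow run ./NF-hichip_downstream/main.nf -profile my_profile -resume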

Reusing code from this repository for other projects

The two custom Nextflow pipelines, NF-downstream_analysis and NF-hichip_downstream, can be reused in whole or in part for other projects and analyses. Below are instructions on how the whole pipelines, individual modules, or individual scripts can be re-used and re-purposed. For more information on what the different steps of these pipelines do, and which might be useful for re-purposing, see their README docs (NF-downstream_analysis README, NF-hichip_downstream README).

Reusing a whole Nextflow pipeline with different input data - NF-hichip_downstream

Although it is built from some generalised components, the NF-downstream_analysis pipeline as a whole is specific to the data and questions of the NPB project, so it is not recommended to run this pipeline in its entirety on different data.

In contrast, the NF-hichip_downstream pipeline can be used to predict enhancer-gene pairs in any HiChip dataset. To run the pipeline on different data, three inputs need to be changed (a sketch of the resulting samplesheet and profile follows this list):

  1. The samplesheet should be modified to point to the new HiChip data in the form of Valid Pairs; this data must have been pre-processed using the NF-core hic pipeline.
  2. The genome should be adapted to the species of interest by changing the paths to the gtf file and the genome index file in the profile config.
  3. The 'peaks' path should point to a bed file of predicted enhancer locations. In this project the peaks from the scATAC-seq data were used, but bulk ATAC-seq peaks, ChIP-seq peaks or equivalent can be used instead. This path is also specified in the profile config.
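
As a sketch, these inputs might look like the following; the column names, parameter names and paths are illustrative assumptions, so check the samplesheet and example profile linked above for the exact format the pipeline expects. A hypothetical samplesheet pointing at Valid Pairs files:

sample,validpairs
HiChip_r1,/path/to/HiChip_r1.allValidPairs
HiChip_r2,/path/to/HiChip_r2.allValidPairs

And a hypothetical profile config defining the genome and peak paths:

// Parameter names here are illustrative, not taken from the repository's config
profiles {
    my_profile {
        params {
            gtf          = "/path/to/annotation.gtf"          // gene annotation for the species of interest
            genome_index = "/path/to/genome_index"            // genome index file
            peaks        = "/path/to/predicted_enhancers.bed" // bed file of putative enhancer locations
        }
    }
}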

For more information on what the NF-hichip_downstream pipeline does, see the pipeline's documentation.

Reusing Nextflow modules

Nextflow modules are called into a pipeline as processes. The modules used for the NF-hichip_downstream pipeline can be found here, and those for the NF-downstream_analysis pipeline here. These modules can be re-used in other Nextflow pipelines by copying their main.nf file and including them in a pipeline as a process, e.g.

include {EXTRACT_PROMOTERS} from "$baseDir/modules/local/extract_promoters/main"

Reading the module's main.nf script reveals what the module does and what it expects as input and output; for example, see the module that extracts promoters from a gtf file.
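
As a rough guide, a DSL2 module's main.nf has the shape sketched below. This skeleton is illustrative only: the container and the awk one-liner (which naively takes 500 bp around each gene's TSS) are stand-ins, not the actual contents of the extract_promoters module.

// Illustrative module skeleton; container and script contents are placeholders
process EXTRACT_PROMOTERS {
    container 'ubuntu:20.04' // placeholder; the real module pins its own container

    input:
    tuple val(meta), path(gtf) // metadata map plus the input gtf file

    output:
    tuple val(meta), path('promoters.bed')

    script:
    """
    awk -F'\\t' -v OFS='\\t' '\$3 == "gene" {
        tss = (\$7 == "+") ? \$4 : \$5
        print \$1, (tss > 500 ? tss - 500 : 0), tss + 500
    }' ${gtf} > promoters.bed
    """
}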

Two of the Nextflow modules used in these pipelines are generic, in that they run an arbitrary R script or Python script. Please note that both modules expect a tuple as input, i.e. the data together with its associated metadata.

Note that each Nextflow module is associated with a Docker container; this can be seen in the module's main.nf script. These containers overcome dependency issues and make the Nextflow pipeline portable. Some of the containers come from Docker Hub, whilst others are custom-made; the Dockerfiles for the custom containers can be found in the same folder as the module's main.nf script.

Reusing R and python scripts

The generic R and Python modules mentioned above can be included in a Nextflow pipeline together with a specific script like so:

include {R as PEAK_CALL} from "$baseDir/modules/local/r/main" addParams(script: file("$baseDir/bin/ArchR_utilities/ArchR_peak_calling.R", checkIfExists: true))
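
Once included under a new name, the process can be invoked in a workflow on a channel of (metadata, file) tuples. A minimal sketch, in which the metadata map and the file path are placeholders and the exact inputs expected are defined by the generic module's main.nf:

workflow {
    // Hypothetical [metadata, data] tuple, as the generic R module expects
    input_ch = Channel.of( [ [sample_id: 'ss8'], file('/path/to/ArchR_object.rds') ] )
    PEAK_CALL( input_ch )
}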

The custom pipelines in this repository include many R and Python scripts, which can be found in the 'bin' folder of each pipeline. Most of these scripts are highly generalisable and can be used to analyse any scATAC-seq, scRNA-seq or HiChip data. Most of the scripts expose a list of options at the top, which can be overwritten to adjust parameters or to skip parts of the script; these options look like this:

# Read in command line opts
library(optparse)

option_list <- list(
    make_option(c("-r", "--runtype"), action = "store", type = "character", help = "Specify whether running through 'nextflow' in order to switch paths"),
    make_option(c("-c", "--cores"), action = "store", type = "integer", help = "Number of CPUs"),
    make_option(c("", "--stage_clust_res"), action = "store", type = "double", help = "Clustering resolution for stage data", default = 1),
    make_option(c("", "--full_clust_res"), action = "store", type = "double", help = "Clustering resolution for full data", default = 2),
    make_option(c("", "--clustree_stage"), action = "store", type = "logical", help = "Whether to run clustree plot on stage data", default = FALSE),
    make_option(c("", "--clustree_full"), action = "store", type = "logical", help = "Whether to run clustree plot on full data", default = FALSE),
    make_option(c("-v", "--verbose"), action = "store", type = "logical", help = "Print the parsed options", default = FALSE) # defined so opt$verbose below works
)

opt_parser <- OptionParser(option_list = option_list)
opt <- parse_args(opt_parser)
if (opt$verbose) print(opt)

These options can be overwritten in a config file if the script is run via the R/Python modules in a Nextflow pipeline, or passed as command-line arguments if the script is run directly.
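
For example, a script exposing the options shown above could be run directly like so (the script path and option values are arbitrary placeholders):

# The script path below is a placeholder; substitute a script from the pipeline's 'bin' folder
Rscript ./bin/my_script.R --cores 8 --stage_clust_res 1.5 --clustree_stage TRUE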

For more information on each R and python script and what they do see the pipeline's README docs (NF-downstream_analysis README, NF-hichip_downstream README).
