RE-GOA

A tool for functional enrichment analysis of non-coding genomic regions, based on Regulatory Elements Gene Ontology Annotation (RE-GOA). RE-GOA resources are available for mouse and human.

Installation

  1. Run the following commands for installation:
wget https://github.com/AMSSwanglab/RE-GOA/archive/master.zip  
unzip master.zip
cd RE-GOA-master
  2. Download the data files required by RE-GOA into RE-GOA-master from https://drive.google.com/file/d/17nYBAGKc2ZZK06mY-9RQPZVSWgAoHxgu/view?usp=sharing and run the following command:
tar -zxvf REGOA_Data.tar.gz

Run RE-GOA

  1. Prepare the input file: RE-GOA takes a .bed file as input, in which each line should have at least 3 columns: chrom, chromStart, and chromEnd. .bed files for mm9 or hg19 are accepted. The first few lines of an example input file, exampleinput.bed, are shown below:
chr1	3389104	3389281
chr1	3991781	3992030
chr1	4333622	4333830
chr1	4402701	4402873
chr1	4424781	4425411
chr1	4463463	4465117
chr1	4469419	4470362
chr1	4527428	4529092
  2. Edit configures.txt, setting:
  • path: the directory containing the input file;
  • datapath: ./datamgi/ for an mm9 .bed file or ./datahg/ for an hg19 .bed file;
  • resultpath: the directory for the output files;
  • filename: the name of the input file.

For example:

path	./inputfiledict/
datapath	./datamgi/
resultpath	./outputfiledict/
filename	exampleinput.bed
  3. After preparing the input files as above, run the following command:
python peaksanalysis.py
  4. The resultpath folder will contain four output files: exampleinput_BP.txt, exampleinput_CC.txt, exampleinput_MF.txt, and exampleinput_genes.txt.
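Before running the tool, it can help to sanity-check the input and generate configures.txt programmatically. The following is a minimal sketch, not part of RE-GOA itself: the function names check_bed_line and render_config are hypothetical, and the tab-separated key/value layout is assumed from the example configuration shown above.

```python
def check_bed_line(line):
    """Return (chrom, start, end) if the line has at least 3 usable BED columns, else None."""
    fields = line.split()  # BED columns are whitespace/tab separated
    if len(fields) < 3:
        return None
    chrom, start, end = fields[0], fields[1], fields[2]
    if not (start.isdigit() and end.isdigit()):
        return None
    start, end = int(start), int(end)
    if start > end:
        return None
    return chrom, start, end


def render_config(path, datapath, resultpath, filename):
    """Build the text of configures.txt.

    Assumption: one tab-separated key/value pair per line, as in the example above.
    """
    pairs = [
        ("path", path),
        ("datapath", datapath),
        ("resultpath", resultpath),
        ("filename", filename),
    ]
    return "".join(f"{key}\t{value}\n" for key, value in pairs)


# Example: check one line of exampleinput.bed and build the mm9 configuration text.
print(check_bed_line("chr1\t3389104\t3389281"))
print(render_config("./inputfiledict/", "./datamgi/", "./outputfiledict/", "exampleinput.bed"), end="")
```

The returned text can be written to configures.txt in RE-GOA-master before running peaksanalysis.py.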

exampleinput_BP.txt, exampleinput_CC.txt, and exampleinput_MF.txt list the enriched terms in Biological Process, Cellular Component, and Molecular Function, respectively. Each line has 9 columns, described as follows:

1.  GO id
2.  GO term name
3.  m (Num of peaks annotated with the term)
4.  N (Num of peaks located in defined REs)
5.  p (background probability of the term)
6.  raw p-value=Binom(m,N,p)
7.  -log p-value
8.  B-H corrected Q-value
9.  -log Q-value

Terms are sorted by p-value.
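The statistics in columns 6 and 8 can be illustrated with a short sketch. It rests on two assumptions that the text above does not state: that Binom(m,N,p) denotes the upper-tail binomial probability P(X >= m) for X ~ Binomial(N, p), and that the Q-value follows the standard Benjamini-Hochberg step-up procedure. The function names are hypothetical, and this is not the code used in peaksanalysis.py. The base of the logarithm in columns 7 and 9 is not specified, so those columns are left out.

```python
import math


def binom_upper_tail(m, n, p):
    """P(X >= m) for X ~ Binomial(n, p), with each term computed in log space.

    Assumption: this is what 'Binom(m,N,p)' means in the output description.
    """
    if m <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(m, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * log_p + (n - k) * log_q
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Example: 1 annotated peak out of 2 peaks in REs, background probability 0.5.
print(binom_upper_tail(1, 2, 0.5))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Since scipy is already a requirement, scipy.stats.binom.sf(m - 1, N, p) would give the same upper-tail probability; the standard-library version is shown only to keep the sketch dependency-free.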

exampleinput_genes.txt lists the enriched genes and has 2 columns, described as follows:

1.  gene
2.  n (Num of peaks located in an RE which regulates the gene)

Genes are sorted by n.
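For downstream use, the two-column gene file can be loaded with a few lines of Python. This is a sketch under the assumption that the columns are tab-separated; the function names and the gene names in the example are hypothetical placeholders, not RE-GOA output.

```python
def parse_gene_lines(lines):
    """Turn 'gene<TAB>n' lines into (gene, n) tuples, skipping blank or malformed lines.

    Assumption: the two columns are separated by a tab.
    """
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].strip().isdigit():
            continue
        records.append((fields[0], int(fields[1])))
    return records


def top_genes(records, k):
    """Return the k genes with the largest peak counts."""
    return sorted(records, key=lambda record: record[1], reverse=True)[:k]


# Example with placeholder gene names.
records = parse_gene_lines(["GeneA\t5\n", "GeneB\t12\n", "\n"])
print(top_genes(records, 1))
```

With a real result file, the lines would come from something like open("./outputfiledict/exampleinput_genes.txt"), following the example configuration.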

Requirements

Python environment: python 3

Python packages: pickle and math (both in the standard library), and scipy

Memory: >= 3.0 Gb

Processing a .bed file and writing all text output files usually takes about one minute.

REs Annotations

The RE annotations can be browsed after unzipping REs Annotations.zip. The directories ./hg19/ and ./mm9/ contain the RE annotations for human and mouse in BP, MF, and CC. Each line of an XX_REGOA.txt file contains several columns: the first column identifies the RE (chrom, chromStart, and chromEnd, joined by underscores), and the remaining columns are the ids of the GO terms annotated to that RE. A few example lines are shown below:

chr15_73319767_73323849	GO:0003674	GO:0005515	GO:0005488
chr15_90046441_90046590	GO:0042802	GO:0003824	GO:0005488	GO:0005515	GO:0003674
chr15_98418432_98419329	GO:0005488	GO:0060090	GO:0005515	GO:0030674	GO:0003674
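A line in this format can be split into its coordinates and GO term ids as sketched below. The function name parse_regoa_line is hypothetical; splitting the RE identifier from the right is a design choice that keeps chromosome names containing underscores intact.

```python
def parse_regoa_line(line):
    """Split one XX_REGOA.txt line into (chrom, start, end, [GO ids]).

    Returns None for blank lines.
    """
    fields = line.split()
    if not fields:
        return None
    # Split from the right so a chromosome name with underscores stays whole.
    chrom, start, end = fields[0].rsplit("_", 2)
    return chrom, int(start), int(end), fields[1:]


# Example using the first line shown above.
print(parse_regoa_line("chr15_73319767_73323849\tGO:0003674\tGO:0005515\tGO:0005488"))
```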

Code for generating RE-GOA

The code for training and annotating is available at generate RE-GOA, and the associated data are available at https://drive.google.com/file/d/1jIU_DtBSQXID65Ky2HpTt5Z3wFojG4rh/view?usp=sharing

Citation

If you use RE-GOA or RE-GOA associated resources, please cite

Yurun, et al. Annotating regulatory elements by heterogeneous network embedding. 2021.
