
RE-GOA

A tool for functional enrichment analysis of non-coding genomic regions, based on Regulatory Elements Gene Ontology Annotation (RE-GOA). Resources for RE-GOA in mouse and human are available.

Installation

  1. Run the following commands for installation:
wget https://github.com/AMSSwanglab/RE-GOA/archive/master.zip  
unzip master.zip
cd RE-GOA-master
  2. Download the necessary files for RE-GOA into RE-GOA-master from https://drive.google.com/file/d/17nYBAGKc2ZZK06mY-9RQPZVSWgAoHxgu/view?usp=sharing and run the following command:
tar -zxvf REGOA_Data.tar.gz

Run RE-GOA

  1. Prepare the input file: RE-GOA takes a .bed file as input, in which each line should have at least 3 columns: chrom, chromStart, chromEnd. .bed files for mm9 or hg19 are accepted. The first few lines of an example input file, exampleinput.bed, are shown below:
chr1	3389104	3389281
chr1	3991781	3992030
chr1	4333622	4333830
chr1	4402701	4402873
chr1	4424781	4425411
chr1	4463463	4465117
chr1	4469419	4470362
chr1	4527428	4529092
  2. Edit configures.txt, including:
  • path: the directory containing the input file;
  • datapath: ./datamgi/ for an mm9 bed file or ./datahg/ for an hg19 bed file;
  • resultpath: the directory for the output files;
  • filename: the input file name.

For example:

path	./inputfiledict/
datapath	./datamgi/
resultpath	./outputfiledict/
filename	exampleinput.bed
  3. After preparing the input files as above, run the following command:
python peaksanalysis.py
  4. The resultpath folder will contain four output files: exampleinput_BP.txt, exampleinput_CC.txt, exampleinput_MF.txt, and exampleinput_genes.txt.
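
Steps 1 and 2 above can be checked programmatically before running peaksanalysis.py. The sketch below is only an illustration and is not part of RE-GOA: the helper names check_bed and write_config are hypothetical, and it assumes that the .bed file is tab-separated and that configures.txt holds one tab-separated key/value pair per line, as in the example above.

```python
def check_bed(bed_path):
    """Count records in a .bed file, raising ValueError on the first bad line.

    Assumption: columns are tab-separated and only the first three
    (chrom, chromStart, chromEnd) are checked.
    """
    count = 0
    with open(bed_path) as handle:
        for lineno, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore blank lines
            fields = line.split("\t")
            if len(fields) < 3:
                raise ValueError(f"line {lineno}: fewer than 3 columns")
            # int() raises ValueError itself if a coordinate is not numeric
            start, end = int(fields[1]), int(fields[2])
            if start < 0 or end <= start:
                raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
            count += 1
    return count


def write_config(config_path, path, datapath, resultpath, filename):
    """Write the four settings in the key<TAB>value layout of the example."""
    settings = [
        ("path", path),
        ("datapath", datapath),
        ("resultpath", resultpath),
        ("filename", filename),
    ]
    with open(config_path, "w") as handle:
        for key, value in settings:
            handle.write(f"{key}\t{value}\n")
```

For the example above, this would be called as check_bed("./inputfiledict/exampleinput.bed") followed by write_config("configures.txt", "./inputfiledict/", "./datamgi/", "./outputfiledict/", "exampleinput.bed").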

exampleinput_BP.txt, exampleinput_CC.txt, and exampleinput_MF.txt list the enriched terms in Biological Process, Cellular Component, and Molecular Function, respectively. Each line has 9 columns, described as follows:

1.  GO id
2.  GO term name
3.  m (Num of peaks annotated with the term)
4.  N (Num of peaks located in defined REs)
5.  p (background probability of the term)
6.  raw p-value=Binom(m,N,p)
7.  -log p-value
8.  B-H corrected Q-value
9.  -log Q-value

Terms are sorted by p-value.
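
To make columns 6 to 9 concrete, here is a minimal sketch of a binomial enrichment test followed by Benjamini-Hochberg correction. It is not taken from the RE-GOA source. It assumes that Binom(m,N,p) means the upper-tail probability P(X >= m) for X ~ Binomial(N, p), which is the usual choice for enrichment tests, and that the logarithm is base 10; neither is stated above. The function names and the input values are hypothetical.

```python
import math

from scipy.stats import binom


def enrichment_p(m, N, p):
    """Upper-tail probability P(X >= m) for X ~ Binomial(N, p) (assumed tail)."""
    # sf(k) is P(X > k), so P(X >= m) is sf(m - 1)
    return float(binom.sf(m - 1, N, p))


def benjamini_hochberg(pvalues):
    """Return B-H corrected Q-values, in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that
    # Q-values never decrease as p-values increase.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


def neg_log10(x):
    """-log10(x), with infinity for x == 0 (base 10 is an assumption)."""
    return -math.log10(x) if x > 0 else math.inf


if __name__ == "__main__":
    # Hypothetical (term, m, N, p) rows, for illustration only
    rows = [("term_a", 12, 200, 0.01), ("term_b", 5, 200, 0.02)]
    raw = [enrichment_p(m, N, p) for _, m, N, p in rows]
    for (term, *_), pval, q in zip(rows, raw, benjamini_hochberg(raw)):
        print(term, pval, neg_log10(pval), q, neg_log10(q))
```

scipy is already listed under Requirements, so the sketch needs no additional packages.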

exampleinput_genes.txt lists the enriched genes. Each line has 2 columns, described as follows:

1.  gene
2.  n (Num of peaks located in an RE that regulates the gene)

Genes are sorted by n.
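
The definition of n can be illustrated with a small counting sketch. This is not the RE-GOA implementation, which relies on the precomputed resources in datapath. The function name, the half-open overlap rule, and the rule that a peak counts once per gene are all assumptions made for illustration, and the coordinates and gene names in the usage note are made up.

```python
from collections import Counter


def count_peaks_per_gene(peaks, regulatory_elements):
    """Count, for each gene, the peaks that overlap an RE regulating it.

    peaks: iterable of (chrom, start, end)
    regulatory_elements: iterable of (chrom, start, end, gene)
    Intervals are treated as half-open [start, end), as in the BED format.
    """
    regulatory_elements = list(regulatory_elements)
    counts = Counter()
    for p_chrom, p_start, p_end in peaks:
        # Collect genes in a set so a peak counts at most once per gene
        genes = {
            gene
            for r_chrom, r_start, r_end, gene in regulatory_elements
            if p_chrom == r_chrom and p_start < r_end and r_start < p_end
        }
        counts.update(genes)
    # Sort by n, largest first, mirroring the ordering of the output file
    return counts.most_common()
```

With peaks [("chr1", 100, 200), ("chr1", 150, 300)] and regulatory elements [("chr1", 120, 180, "GeneA"), ("chr1", 250, 400, "GeneB")], both peaks overlap the GeneA element and only the second overlaps the GeneB element, so GeneA is listed first.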

Requirements

Python environment: Python 3

Python packages: pickle and math (both part of the standard library), and scipy

Memory: >= 3.0 GB

It usually takes about one minute to process a .bed file and write all text output files.

Code for generating RE-GOA

Code for training and annotating is available at generate RE-GOA, and the associated data are available at https://drive.google.com/file/d/1jIU_DtBSQXID65Ky2HpTt5Z3wFojG4rh/view?usp=sharing

Citation

If you use RE-GOA or RE-GOA associated resources, please cite

Yurun, et al. Annotating regulatory elements by heterogeneous network embedding. 2021.
