GSEAPY

GSEAPY: Gene Set Enrichment Analysis in Python.

The main documentation for GSEAPY can be found at https://pythonhosted.org/gseapy

For an example of how to use gseapy, see: Example

Release notes: https://github.com/BioNinja/gseapy/releases

GSEAPY is a Python wrapper for GSEA and Enrichr.

It is used for convenient GO enrichment analysis and produces publication-quality figures in Python.

GSEAPY can be used with RNA-seq, ChIP-seq, and microarray data.

Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).

The full GSEA is far too extensive to describe here; see GSEA documentation for more information.
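
As a rough illustration of the core idea only, the running-sum enrichment score used by GSEA can be sketched in plain Python. This is a simplified sketch and not GSEAPY's implementation: the function name `enrichment_score`, the toy gene names, and the default weight `p=1` are assumptions made for illustration, and permutation testing and normalization are left out.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Return the running-sum enrichment score (ES) for one gene set.

    ranked_genes: gene names sorted by correlation, highest first.
    scores: correlation values in the same order as ranked_genes.
    gene_set: collection of gene names to test.
    p: weight applied to the correlation of each hit (illustrative default).
    """
    gene_set = set(gene_set)
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    # Total weight of all hits, used to normalize each hit increment.
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up at a hit (weighted by its correlation), down at a miss.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        # ES is the largest deviation of the running sum from zero.
        if abs(running) > abs(best):
            best = running
    return best


# Toy data (hypothetical): the set members sit at the top of the ranking.
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
corr = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es = enrichment_score(genes, corr, {"G1", "G2"})
print(round(es, 3))  # prints 1.0
```

A set whose members cluster at the top of the ranked list gets a positive score, and one whose members cluster at the bottom gets a negative score.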

Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr

Why GSEAPY

I wanted to use pandas to explore my data, but I could not find a convenient tool for gene set enrichment analysis in Python. So here are my reasons for writing GSEAPY:

  • Runs inside the Python console, with no need to switch to R.
  • User friendly for both wet-lab and dry-lab users.
  • Produces publishable figures.
  • Runs many jobs at once, without using the mouse to repeatedly select different data tables and gene sets.
  • Easy to use from the Bash shell.

GSEA Java version output:

This is an example of GSEA desktop application output

docs/GSEA_OCT4_KD.png

GSEAPY Prerank module output

Using the same data as the GSEA example, GSEAPY reproduces the figure above.

docs/gseapy_OCT4_KD.png

Generated by GSEAPY

GSEAPY figures can be saved in any figure format supported by matplotlib.

You can easily modify the GSEA plots when they are saved as .pdf files. Enjoy.

GSEAPY enrichr module

This powerful module lets you perform gene set enrichment analysis with very little effort.

docs/enrichr.PNG

The only thing you need to prepare is a gene list file in txt format (one gene ID per row).

Note: Enrichr uses a list of Entrez gene symbols as input.

For example, both a list object and a txt file are supported by the enrichr API:

# If you prefer to run gseapy.enrichr() inside the Python console, you can assign a list object
# to gseapy like this.
gene_list = ['SCARA3', 'LOC100044683', 'CMBL', 'CLIC6', 'IL13RA1', 'TACSTD2', 'DKKL1',
                'CSF1', 'CITED1', 'SYNPO2L']
# Alternatively, you can provide a gene list txt file that looks like this:
with open('data/gene_list.txt') as genes:
    print(genes.read())


CTLA2B
SCARA3
LOC100044683
CMBL
CLIC6
IL13RA1
TACSTD2
DKKL1
CSF1
CITED1
SYNPO2L
TINAGL1
PTX3
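
If the gene list lives in a text file like the one above, it can be loaded into a Python list before being passed on. The helper name `read_gene_list` is hypothetical and not part of gseapy; this is a minimal sketch that strips whitespace and skips blank lines, and it writes its own small example file to a temporary directory.

```python
import os
import tempfile


def read_gene_list(path):
    """Read one gene ID per row, ignoring blank lines and surrounding spaces."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Write a small example file (hypothetical location) and read it back.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "gene_list.txt")
with open(path, "w") as handle:
    handle.write("CTLA2B\nSCARA3\n\nLOC100044683\n")

gene_list = read_gene_list(path)
print(gene_list)  # ['CTLA2B', 'SCARA3', 'LOC100044683']
```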

Installation

Install the gseapy package with conda or from PyPI.
# if you have conda (the recommended way)
$ conda install -c bioconda gseapy
# or
$ conda install -c bioninja gseapy

# or use pip
$ pip install gseapy
You may instead want to use the development version from GitHub, by running
$ pip install git+https://github.com/BioNinja/gseapy.git#egg=gseapy

Dependency

  • Python 2.7 or 3.3+

Mandatory

  • NumPy
  • pandas
  • Matplotlib
  • BeautifulSoup4
  • Requests (for the Enrichr API)

You may also need to install lxml and html5lib if XML files cannot be parsed.
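
To see which of these packages are already available, a quick check along the following lines may help. It is a minimal sketch that assumes Python 3.4 or later (for `importlib.util.find_spec`); the helper name `missing_modules` is made up for illustration, and `bs4` is the import name of BeautifulSoup4.

```python
import importlib.util

# Import names for the packages listed above; lxml and html5lib are the
# optional parsers mentioned in the note.
REQUIRED = ["numpy", "pandas", "matplotlib", "bs4", "requests"]
OPTIONAL = ["lxml", "html5lib"]


def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing required:", missing_modules(REQUIRED))
print("missing optional:", missing_modules(OPTIONAL))
```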

Run GSEAPY

GSEAPY has four subcommands: replot, call, prerank, enrichr.

The replot module reproduces GSEA desktop version results. Its only input is the path to a GSEA results directory.

The call module produces GSEAPY results. It requires an expression table in txt format (FPKM, expected counts, TPM, etc.), a cls file, and a gene_sets file in gmt format.

The prerank module produces GSEAPY results. It expects a pre-ranked gene list with correlation values in .rnk format, and a gene_sets file in gmt format. The prerank module is an API to the GSEA pre-rank tool.

All input file formats are identical to those of the GSEA desktop version. See the GSEA documentation for more information.
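
For readers unfamiliar with these formats, the sketch below writes tiny examples of the three input types: a .cls phenotype file, a .gmt gene set file, and a .rnk ranked list. The layouts follow the usual GSEA format descriptions; the set names, descriptions, and scores are made up, and the GSEA documentation remains the authoritative reference.

```python
import os
import tempfile

out_dir = tempfile.mkdtemp()

# .cls: line 1 = <samples> <classes> 1, line 2 = '#' followed by class names,
# line 3 = one class label per sample, in column order.
cls_text = "6 2 1\n# A B\nA A A B B B\n"

# .gmt: tab-separated; set name, description, then the member genes.
gmt_rows = [
    ["SET_ONE", "hypothetical set", "SCARA3", "CMBL", "CLIC6"],
    ["SET_TWO", "hypothetical set", "CSF1", "CITED1"],
]
gmt_text = "".join("\t".join(row) + "\n" for row in gmt_rows)

# .rnk: tab-separated gene and score, listed here from high to low.
ranked = [("SCARA3", 2.5), ("CSF1", 0.3), ("CMBL", -1.7)]
rnk_text = "".join("{}\t{}\n".format(gene, score) for gene, score in ranked)

for name, text in [("test.cls", cls_text), ("gene_sets.gmt", gmt_text),
                   ("gsea_data.rnk", rnk_text)]:
    with open(os.path.join(out_dir, name), "w") as handle:
        handle.write(text)

print(sorted(os.listdir(out_dir)))
```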

The enrichr module uses the Enrichr online tool. It generates results in txt format.

For command line usage:

# An example to reproduce figures using replot module.
$ gseapy replot -i ./Gsea.reports -o test


# An example to compute using gseapy call module
$ gseapy call -d exptable.txt -c test.cls -g gene_sets.gmt -o test

# An example to compute using gseapy prerank module
$ gseapy prerank -r gsea_data.rnk -g gene_sets.gmt -o test

# An example to use enrichr api
# see details of the -g parameter below; the -d parameter is optional
$ gseapy enrichr -i gene_list.txt -g KEGG_2016 -d pathway_enrichment -o test
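
Because every subcommand is an ordinary shell command, several jobs can be scripted in one go. The sketch below only builds the command lines for running `gseapy enrichr` against several libraries, using the flags shown above. The helper name `enrichr_commands`, the input file, the chosen library names, and the output directory layout are illustrative assumptions.

```python
import shlex


def enrichr_commands(gene_list_file, libraries, outdir):
    """Build one `gseapy enrichr` argument list per library name."""
    commands = []
    for library in libraries:
        commands.append([
            "gseapy", "enrichr",
            "-i", gene_list_file,
            "-g", library,
            "-d", library.lower(),  # optional description
            "-o", "{}/{}".format(outdir, library),
        ])
    return commands


commands = enrichr_commands("gene_list.txt", ["KEGG_2016", "Reactome_2013"], "test")
for argv in commands:
    # Print the shell form of each command.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Once gseapy is installed, each argument list could be executed with `subprocess.run(argv, check=True)`.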

Run gseapy inside the Python console:

  1. Prepare expression.txt, gene_sets.gmt and test.cls as required by GSEA; then you can do this:
import gseapy
# An example to reproduce figures using replot module.
gseapy.replot(indir='./Gsea.reports',outdir='test')

# calculate es, nes, pval,fdrs, and produce figures using gseapy.
gseapy.call(data='expression.txt', gene_sets='gene_sets.gmt', cls='test.cls', outdir='test')

# using prerank tool
gseapy.prerank(rnk='gsea_data.rnk', gene_sets='gene_sets.gmt', outdir='test')
  2. If you prefer to pass a DataFrame, dict, or list to gseapy, you can do this:
import pandas as pd

# assign a DataFrame, and use the Enrichr library data set 'KEGG_2016'
expression_dataframe = pd.DataFrame()  # placeholder: replace with your expression table

sample_names = ['A','A','A','B','B','B']

# assigning an Enrichr library name to the gene_sets parameter is supported.
gseapy.call(data=expression_dataframe, gene_sets='KEGG_2016', cls=sample_names, outdir='test')

# using the prerank tool
gene_ranked_dataframe = pd.DataFrame()  # placeholder: replace with your ranked gene table
gseapy.prerank(rnk=gene_ranked_dataframe, gene_sets='KEGG_2016', outdir='test')
  3. For enrichr, you can pass a list object or a txt file:
# assign a list object to enrichr
l = ['SCARA3', 'LOC100044683', 'CMBL', 'CLIC6', 'IL13RA1', 'TACSTD2', 'DKKL1', 'CSF1',
     'SYNPO2L', 'TINAGL1', 'PTX3', 'BGN', 'HERC1', 'EFNA1', 'CIB2', 'PMP22', 'TMEM173']

gseapy.enrichr(gene_list=l, description='pathway', gene_sets='KEGG_2016', outdir='test')

# or a txt file path.
gseapy.enrichr(gene_list='gene_list.txt', description='pathway', gene_sets='KEGG_2016',
               outdir='test', cutoff=0.05, format='png' )

For a full list of Enrichr library names:

# see the full list of the latest Enrichr library names, which can be passed to the -g parameter:
names = gseapy.get_library_name()
print(names[:20])


['Genome_Browser_PWMs',
'TRANSFAC_and_JASPAR_PWMs',
'ChEA_2013',
'Drug_Perturbations_from_GEO_2014',
'ENCODE_TF_ChIP-seq_2014',
'BioCarta_2013',
'Reactome_2013',
'WikiPathways_2013',
'Disease_Signatures_from_GEO_up_2014',
'KEGG_2016',
'TF-LOF_Expression_from_GEO',
'TargetScan_microRNA',
'PPI_Hub_Proteins',
'GO_Molecular_Function_2015',
'GeneSigDB',
'Chromosome_Location',
'Human_Gene_Atlas',
'Mouse_Gene_Atlas',
'GO_Cellular_Component_2015',
'GO_Biological_Process_2015',
'Human_Phenotype_Ontology',]
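
Since the result is a plain Python list of strings, it can be filtered with ordinary list operations before choosing a value for gene_sets. A small sketch, using a few of the names printed above in place of a live call to `gseapy.get_library_name()`:

```python
# A few library names copied from the output above (not a live query).
names = [
    "KEGG_2016",
    "Reactome_2013",
    "GO_Molecular_Function_2015",
    "GO_Cellular_Component_2015",
    "GO_Biological_Process_2015",
    "Human_Phenotype_Ontology",
]

# Keep only the Gene Ontology libraries.
go_libraries = [name for name in names if name.startswith("GO_")]
print(go_libraries)
```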

Bug Report

If you would like to report a bug found while running gseapy, don't hesitate to create an issue on GitHub here, or email me: [email protected]

Getting help with GSEAPY

Visit the documentation site at https://pythonhosted.org/gseapy
