PANTHER_API

Use the PANTHER system to run statistical overrepresentation tests from Python.

See http://pantherdb.org/help/PANTHERhelp.jsp#V. for the PANTHER webservices instructions.

Documentation

Command-line usage:

Requires:

pandas
requests
bs4
html5lib

usage: panther_api.py [-h] [--organism ORGANISM] [--test_type TEST_TYPE]
                      [--annotation_option ANNOTATION_OPTION]
                      inputfile outputfile

PANTHER overrepresentation test of a gene list.

positional arguments:
  inputfile             Gene list file. One ID per line.
  outputfile            File to save results

optional arguments:
  -h, --help            show this help message and exit
  --organism ORGANISM   Organism for reference/background
  --test_type TEST_TYPE
                        One of FISHER or BINOMIAL
  --annotation_option ANNOTATION_OPTION
                        Annotation option, see table below

Annotation options

Option	Annotation Dataset
`pathway`	PANTHER Pathways
`panther_mf`	PANTHER GO-Slim Molecular Function
`panther_bp`	PANTHER GO-Slim Biological Process
`panther_cc`	PANTHER GO-Slim Cellular Component
`panther_pc`	PANTHER Protein Class
`fullgo_mf_comp`	GO molecular function complete
`fullgo_bp_comp`	GO biological process complete
`fullgo_cc_comp`	GO cellular component complete
`reactome`	Reactome pathways
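
The option names in the table can be checked before a request is sent. The following is a minimal sketch, not part of panther_api.py: the dictionary is copied from the table above, and the helper name `describe_annotation_option` is hypothetical.

```python
# Option names and dataset labels copied from the table above.
ANNOTATION_OPTIONS = {
    "pathway": "PANTHER Pathways",
    "panther_mf": "PANTHER GO-Slim Molecular Function",
    "panther_bp": "PANTHER GO-Slim Biological Process",
    "panther_cc": "PANTHER GO-Slim Cellular Component",
    "panther_pc": "PANTHER Protein Class",
    "fullgo_mf_comp": "GO molecular function complete",
    "fullgo_bp_comp": "GO biological process complete",
    "fullgo_cc_comp": "GO cellular component complete",
    "reactome": "Reactome pathways",
}


def describe_annotation_option(option):
    """Hypothetical helper: return the dataset label or raise ValueError."""
    try:
        return ANNOTATION_OPTIONS[option]
    except KeyError:
        valid = ", ".join(sorted(ANNOTATION_OPTIONS))
        raise ValueError(
            f"Unknown annotation option {option!r}; expected one of: {valid}"
        ) from None


print(describe_annotation_option("fullgo_bp_comp"))
```

Failing early on a misspelled option avoids spending a web service call on a request that cannot succeed.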

Examples

./panther_api.py example_0.txt example_0_panther.txt

Reference size:         21042
Number IDs mapped:      66
Number IDs not mapped:  4

Output:

            name                            # in reference  # in list   # expected in list  fold_enrichment direction   pvalue      FDR
GO:0000902  cell morphogenesis              696             11          2.22                4.95            +           1.31E-05    8.60E-03
GO:0001701  in utero embryonic development  347             7           1.11                6.32            +           1.31E-04    4.58E-02
GO:0001890  placenta development            151             5           .48                 10.38           +           1.39E-04    4.78E-02
GO:0001892  embryonic placenta development  87              5           .28                 18.01           +           1.11E-05    7.63E-03
GO:0002009  morphogenesis of an epithelium  420             11          1.34                8.21            +           1.06E-07    2.80E-04
GO:0002064  epithelial cell development     186             6           .59                 10.11           +           3.34E-05    1.76E-02
GO:0003382  epithelial cell morphogenesis   35              4           .11                 35.81           +           7.07E-06    6.57E-03
GO:0007043  cell-cell junction assembly     104             5           .33                 15.07           +           2.53E-05    1.48E-02
GO:0007155  cell adhesion                   916             13          2.92                4.45            +           6.10E-06    6.42E-03
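
A saved results file can be post-processed with a few lines of Python. The sketch below assumes the output file is tab-delimited with a header row whose first (term ID) column is unnamed; that layout is an assumption and is not confirmed by this README. The sample text re-types two rows from the example above, and the function names `load_results` and `top_terms` are hypothetical.

```python
import csv
import io

# Two rows from the example above, re-typed as tab-delimited text.
# The tab-delimited layout with an unnamed first column is an assumption.
SAMPLE = (
    "\tname\t# in reference\t# in list\t# expected in list"
    "\tfold_enrichment\tdirection\tpvalue\tFDR\n"
    "GO:0000902\tcell morphogenesis\t696\t11\t2.22\t4.95\t+\t1.31E-05\t8.60E-03\n"
    "GO:0001890\tplacenta development\t151\t5\t.48\t10.38\t+\t1.39E-04\t4.78E-02\n"
)


def load_results(handle):
    """Read result rows into dicts, converting the numeric score columns."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    header[0] = "term_id"  # name the (assumed unnamed) first column
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        for key in ("fold_enrichment", "pvalue", "FDR"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


def top_terms(rows, max_fdr=0.05):
    """Keep rows with FDR <= max_fdr, most significant first."""
    kept = [row for row in rows if row["FDR"] <= max_fdr]
    return sorted(kept, key=lambda row: row["FDR"])


results = load_results(io.StringIO(SAMPLE))
for row in top_terms(results, max_fdr=0.01):
    print(row["term_id"], row["name"], row["FDR"])
```

To read a real file, pass `open("example_0_panther.txt", newline="")` instead of the in-memory sample, provided the file matches the assumed layout.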

./panther_api.py example_1.txt example_1_panther.txt

Reference size:         21042
Number IDs mapped:      21
Number IDs not mapped:  2
No statistically significant results.

No output.

clusters_to_panther.py

This script takes as its argument an output file generated by a clustering algorithm (one line per cluster, tab-delimited gene IDs), generates the necessary gene list files, and calls the main panther_api.py function on each of them.

Instructions

usage: clusters_to_panther.py [-h] [--remove_version] [--organism ORGANISM]
                                [--test_type TEST_TYPE]
                                [--annotation_option ANNOTATION_OPTION]
                                [--min_size MIN_SIZE]
                                [--start_cluster START_CLUSTER]
                                inputfile outputprefix

PANTHER overenrichment tests on the output of a clustering algorithm.

positional arguments:
  inputfile             Paraclique file. One cluster per line, tab separated
                        gene IDs.
  outputprefix          Prefix to give to output gene lists and enrichment
                        result files. Output will be saved to
                        OUTPUTPREFIX_cluster_<num>.txt and
                        OUTPUTPREFIX_cluster_<num>.panther

optional arguments:
  -h, --help            show this help message and exit
  --remove_version      Whether to remove version number from gene IDs
  --organism ORGANISM   Organism for reference/background.
  --test_type TEST_TYPE
                        One of FISHER or BINOMIAL
  --annotation_option ANNOTATION_OPTION
                        Annotation option, see table in code
  --min_size MIN_SIZE   Minimum cluster size to test
  --start_cluster START_CLUSTER
                        Line in file to start at (from 0)
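
The way the options above interact can be illustrated with a short sketch. This is not the code in clusters_to_panther.py. It assumes that version numbers are a trailing `.<digits>` suffix on the gene ID, that `--min_size` is an inclusive lower bound, and that skipped clusters keep their original line number. The names `strip_version` and `iter_clusters` are hypothetical.

```python
def strip_version(gene_id):
    """Drop a trailing '.<digits>' version suffix (assumed ID format)."""
    base, dot, version = gene_id.rpartition(".")
    return base if dot and version.isdigit() else gene_id


def iter_clusters(lines, remove_version=False, min_size=0, start_cluster=0):
    """Yield (cluster_number, gene_ids) for each cluster line to be tested.

    Cluster numbers are line numbers counted from 0. Clusters before
    start_cluster or smaller than min_size are skipped.
    """
    for number, line in enumerate(lines):
        genes = [gene for gene in line.rstrip("\n").split("\t") if gene]
        if number < start_cluster or len(genes) < min_size:
            continue
        if remove_version:
            genes = [strip_version(gene) for gene in genes]
        yield number, genes


# Made-up placeholder IDs, used only to show the behaviour.
demo = ["A.1\tB.2\tC.3\n", "D.1\n", "E.4\tF\n"]
for number, genes in iter_clusters(demo, remove_version=True, min_size=2):
    print(f"OUTPUTPREFIX_cluster_{number}.txt", genes)
```

Keeping the original line number for each cluster means a number printed in the log can be passed back as the starting line when a run is resumed.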

Examples

./clusters_to_panther.py --remove_version --min_size 8 cluster_example.txt cluster_example.out

Cluster 0
Reference size:         21042
Number IDs mapped:      66
Number IDs not mapped:  4
-----------------------------

Cluster 1
Reference size:         21042
Number IDs mapped:      21
Number IDs not mapped:  2
No statistically significant results.
No results to save
-----------------------------

Cluster 2
Reference size:         21042
Number IDs mapped:      20
Number IDs not mapped:  1
-----------------------------

If a "Session Exceeded" error occurs, restart from the last cluster tested by passing its line number (counted from 0) to the optional --start_cluster argument.
