Use the PANTHER system to run statistical overrepresentation tests from Python.
See http://pantherdb.org/help/PANTHERhelp.jsp#V. for the PANTHER webservices instructions.
Requires:
- pandas
- requests
- bs4
- html5lib
usage: panther_api.py [-h] [--organism ORGANISM] [--test_type TEST_TYPE]
[--annotation_option ANNOTATION_OPTION]
inputfile outputfile
PANTHER overrepresentation test of a gene list.
positional arguments:
inputfile Gene list file. One ID per line.
outputfile File to save results
optional arguments:
-h, --help show this help message and exit
--organism ORGANISM Organism for reference/background
--test_type TEST_TYPE
One of FISHER or BINOMIAL
--annotation_option ANNOTATION_OPTION
Annotation option, see table below
Option | Annotation Dataset |
---|---|
pathway | PANTHER Pathways |
panther_mf | PANTHER GO-Slim Molecular Function |
panther_bp | PANTHER GO-Slim Biological Process |
panther_cc | PANTHER GO-Slim Cellular Component |
panther_pc | PANTHER Protein Class |
fullgo_mf_comp | GO molecular function complete |
fullgo_bp_comp | GO biological process complete |
fullgo_cc_comp | GO cellular component complete |
reactome | Reactome pathways |
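
If you call the script from your own code, it can help to check the annotation option before sending a request. The following is a minimal sketch, not part of panther_api.py: the names `ANNOTATION_OPTIONS` and `describe_annotation_option` are hypothetical, and only the keys and dataset names come from the table above.

```python
# Option keys and dataset names copied from the annotation option table.
ANNOTATION_OPTIONS = {
    "pathway": "PANTHER Pathways",
    "panther_mf": "PANTHER GO-Slim Molecular Function",
    "panther_bp": "PANTHER GO-Slim Biological Process",
    "panther_cc": "PANTHER GO-Slim Cellular Component",
    "panther_pc": "PANTHER Protein Class",
    "fullgo_mf_comp": "GO molecular function complete",
    "fullgo_bp_comp": "GO biological process complete",
    "fullgo_cc_comp": "GO cellular component complete",
    "reactome": "Reactome pathways",
}


def describe_annotation_option(option):
    """Return the dataset name for an option key (hypothetical helper).

    Raises ValueError listing the valid keys if the option is unknown.
    """
    try:
        return ANNOTATION_OPTIONS[option]
    except KeyError:
        valid = ", ".join(sorted(ANNOTATION_OPTIONS))
        raise ValueError(
            f"Unknown annotation option {option!r}; expected one of: {valid}"
        ) from None


print(describe_annotation_option("fullgo_bp_comp"))
```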
./panther_api.py example_0.txt example_0_panther.txt
Reference size: 21042
Number IDs mapped: 66
Number IDs not mapped: 4
Output:
name # in reference # in list # expected in list fold_enrichment direction pvalue FDR
GO:0000902 cell morphogenesis 696 11 2.22 4.95 + 1.31E-05 8.60E-03
GO:0001701 in utero embryonic development 347 7 1.11 6.32 + 1.31E-04 4.58E-02
GO:0001890 placenta development 151 5 .48 10.38 + 1.39E-04 4.78E-02
GO:0001892 embryonic placenta development 87 5 .28 18.01 + 1.11E-05 7.63E-03
GO:0002009 morphogenesis of an epithelium 420 11 1.34 8.21 + 1.06E-07 2.80E-04
GO:0002064 epithelial cell development 186 6 .59 10.11 + 3.34E-05 1.76E-02
GO:0003382 epithelial cell morphogenesis 35 4 .11 35.81 + 7.07E-06 6.57E-03
GO:0007043 cell-cell junction assembly 104 5 .33 15.07 + 2.53E-05 1.48E-02
GO:0007155 cell adhesion 916 13 2.92 4.45 + 6.10E-06 6.42E-03
./panther_api.py example_1.txt example_1_panther.txt
Reference size: 21042
Number IDs mapped: 21
Number IDs not mapped: 2
No statistically significant results.
No output.
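
The saved results can be post-processed with a few lines of Python. The sketch below filters rows by FDR. It assumes the output file is tab-delimited, with a header row as shown above and an unnamed first column holding the term ID; check this against your own output file before relying on it. The function name `read_panther_results` and the `id` column label are hypothetical, and the sample rows are taken from the example output above.

```python
import csv
import io


def read_panther_results(handle, max_fdr=0.05):
    """Yield result rows as dicts, keeping only rows with FDR <= max_fdr."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Assumption: the term ID column has an empty header cell.
    if not header[0]:
        header[0] = "id"
    for fields in reader:
        row = dict(zip(header, fields))
        row["pvalue"] = float(row["pvalue"])
        row["FDR"] = float(row["FDR"])
        if row["FDR"] <= max_fdr:
            yield row


# Two rows from the example output, laid out under the assumed format.
sample = (
    "\tname\t# in reference\t# in list\t# expected in list"
    "\tfold_enrichment\tdirection\tpvalue\tFDR\n"
    "GO:0000902\tcell morphogenesis\t696\t11\t2.22\t4.95\t+\t1.31E-05\t8.60E-03\n"
    "GO:0001890\tplacenta development\t151\t5\t.48\t10.38\t+\t1.39E-04\t4.78E-02\n"
)

strict = list(read_panther_results(io.StringIO(sample), max_fdr=0.01))
print([row["id"] for row in strict])
```

To read a real file, pass an open file handle instead of the `io.StringIO` sample.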
This script takes the output file of a clustering algorithm (one cluster per line, tab-delimited gene IDs), generates the necessary gene list files, and calls the main panther_api.py function on each of them.
usage: clusters_to_panther.py [-h] [--remove_version] [--organism ORGANISM]
[--test_type TEST_TYPE]
[--annotation_option ANNOTATION_OPTION]
[--min_size MIN_SIZE]
[--start_cluster START_CLUSTER]
inputfile outputprefix
PANTHER overenrichment tests on the output of a clustering algorithm.
positional arguments:
inputfile Paraclique file. One cluster per line, tab seperated
gene IDs.
outputprefix Prefix to give to output gene lists and enrichment
result files. Output will be saved to
OUTPUTPREFIX_cluster_<num>.txt and
OUTPUTPREFIX_cluster_<num>.panther
optional arguments:
-h, --help show this help message and exit
--remove_version Whether to remove version number from gene IDs
--organism ORGANISM Organism for reference/background.
--test_type TEST_TYPE
One of FISHER or BINOMIAL
--annotation_option ANNOTATION_OPTION
Annotation option, see table in code
--min_size MIN_SIZE Minimum cluster size to test
--start_cluster START_CLUSTER
Line in file to start at (from 0)
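
The cluster handling described above can be sketched roughly as follows. This is an illustration, not the code in clusters_to_panther.py: the function names are hypothetical, the default values are placeholders, and the version-stripping rule (drop a trailing `.<digits>` suffix) is an assumption about what --remove_version does. The gene IDs in the example are made up.

```python
import re


def iter_cluster_gene_lists(lines, min_size=1, start_cluster=0, remove_version=False):
    """Yield (cluster_number, gene_ids) for each cluster line that passes the filters.

    cluster_number is the 0-based line index in the cluster file.
    """
    for number, line in enumerate(lines):
        if number < start_cluster:
            continue
        genes = [gene for gene in line.rstrip("\n").split("\t") if gene]
        if remove_version:
            # Assumption: a version is a trailing ".<digits>" suffix.
            genes = [re.sub(r"\.\d+$", "", gene) for gene in genes]
        if len(genes) < min_size:
            continue
        yield number, genes


def write_gene_list(prefix, number, genes):
    """Write one gene ID per line to PREFIX_cluster_<num>.txt and return the path."""
    path = f"{prefix}_cluster_{number}.txt"
    with open(path, "w") as handle:
        handle.write("\n".join(genes) + "\n")
    return path


# Made-up cluster lines: three clusters, the second too small for min_size=2.
example_lines = ["geneA.1\tgeneB.2\tgeneC\n", "geneD\n", "geneE.3\tgeneF\n"]
for number, genes in iter_cluster_gene_lists(example_lines, min_size=2, remove_version=True):
    print(number, genes)
```

Each gene list written this way has the one-ID-per-line layout that panther_api.py expects as its input file.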
./clusters_to_panther.py --remove_version --min_size 8 cluster_example.txt cluster_example.out
Cluster 0
Reference size: 21042
Number IDs mapped: 66
Number IDs not mapped: 4
-----------------------------
Cluster 1
Reference size: 21042
Number IDs mapped: 21
Number IDs not mapped: 2
No statistically significant results.
No results to save
-----------------------------
Cluster 2
Reference size: 21042
Number IDs mapped: 20
Number IDs not mapped: 1
-----------------------------
If a "Session Exceeded" error occurs, you can restart from the last cluster tested by passing its number to the optional --start_cluster argument.
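
If you saved the script's printed output to a log file, the restart point can be recovered from it. The sketch below assumes the printed `Cluster <num>` headers use the same 0-based numbering as --start_cluster, which you should verify for your own run. The helper names `last_cluster_in_log` and `restart_command` are hypothetical.

```python
import re


def last_cluster_in_log(log_text):
    """Return the number of the last 'Cluster <num>' header in the log, or None."""
    numbers = re.findall(r"^Cluster (\d+)[ \t]*$", log_text, flags=re.MULTILINE)
    return int(numbers[-1]) if numbers else None


def restart_command(log_text, inputfile, outputprefix):
    """Build an argument list that reruns from the last cluster seen in the log."""
    last = last_cluster_in_log(log_text)
    start = 0 if last is None else last
    return [
        "./clusters_to_panther.py",
        "--start_cluster", str(start),
        inputfile,
        outputprefix,
    ]


log = (
    "Cluster 0\nReference size: 21042\n-----------------------------\n"
    "Cluster 1\nReference size: 21042\n"
)
print(restart_command(log, "cluster_example.txt", "cluster_example.out"))
```

The returned list can be passed to `subprocess.run`. Add back any other options from the original run, such as --remove_version or --min_size.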