This project forked from cwhelan/permuting-feature-enrichment-test

A tool to test the enrichment of genomic features by permutations

License: MIT License

permuting-feature-enrichment-test

This tool attempts to help determine the relationship of one set of genomic features to another, by executing a set of permutation tests, and by producing a set of graphs to help visualize the results. The basic idea of the permutation testing is to randomly change the location of the regions of interest in the genome, counting the number of overlaps with the feature of interest, to determine the distribution of overlap counts you would expect to see between the features and the regions by chance. This tool does permutation tests on three different, but related, measures: the number of regions that contain a feature; the number of features that overlap a region; and the number of base pairs of overlap between regions and features.
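The idea can be sketched in a few lines of Python. This is a minimal illustration for a single chromosome, not the code used by the script: the function names are hypothetical, intervals are assumed to be half-open `(start, end)` tuples that have already been merged, permuted regions are allowed to land on top of each other, and the quantile is assumed to be the fraction of permuted values strictly below the observed one.

```python
import random

def overlap_measures(regions, features):
    """Return (region_hits, feature_hits, overlap_bp) for half-open intervals."""
    region_hits = sum(
        1 for r_start, r_end in regions
        if any(f_start < r_end and r_start < f_end for f_start, f_end in features)
    )
    feature_hits = sum(
        1 for f_start, f_end in features
        if any(r_start < f_end and f_start < r_end for r_start, r_end in regions)
    )
    overlap_bp = sum(
        max(0, min(r_end, f_end) - max(r_start, f_start))
        for r_start, r_end in regions
        for f_start, f_end in features
    )
    return region_hits, feature_hits, overlap_bp

def permute_regions(regions, chrom_length, rng):
    """Move each region to a uniformly random start, keeping its length."""
    permuted = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(0, chrom_length - length + 1)
        permuted.append((new_start, new_start + length))
    return permuted

def permutation_quantiles(regions, features, chrom_length, permutations=1000, seed=0):
    """Compare the observed measures with a permutation distribution."""
    rng = random.Random(seed)
    observed = overlap_measures(regions, features)
    null = [
        overlap_measures(permute_regions(regions, chrom_length, rng), features)
        for _ in range(permutations)
    ]
    # Assumed definition: fraction of permuted values strictly below the observed one.
    quantiles = tuple(
        sum(1 for draw in null if draw[i] < observed[i]) / permutations
        for i in range(3)
    )
    return observed, quantiles
```

A quadratic all-against-all comparison like this is only practical for small inputs; it is meant to make the three measures concrete rather than to be fast.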

Example Usage and Output

To test the relationship of regions A to features B using 1000 permutations run on 4 cores, run the following command:

python overlap_bps_with_feature.py A.bed "Region A" B.bed "Feature B" genome.fa.fai --permutations 1000 --cores 4

The results will be placed in the directory A_to_B and contain a file results.txt that has the results of the permutation testing:

    REGION_HITS     REAL        Region A    Feature B    46
    REGION_HITS     QUANTILE    Region A    Feature B    0.823
    FEATURE_HITS    REAL        Region A    Feature B    110
    FEATURE_HITS    QUANTILE    Region A    Feature B    0.999
    OVERLAP         REAL        Region A    Feature B    753149
    OVERLAP         QUANTILE    Region A    Feature B    0.933

The lines marked "REAL" give the observed value of each measure for the real region and feature sets. The lines marked "QUANTILE" give the quantile of that observed value within the distribution produced by the permutations; this can be used to estimate a p-value for enrichment (if the quantile is high) or depletion (if it is low).
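As a rough sketch of how these values might be consumed, the snippet below reads results.txt-style text and turns a quantile into an upper-tail estimate. The helper names are hypothetical, the parser assumes whitespace-delimited columns with the measure first, the row type second, and the value last, and `1 - quantile` is only one common convention: how the script handles ties is not documented here. With N permutations the estimate cannot be resolved more finely than 1/N.

```python
def parse_results(text):
    """Map each measure to its REAL and QUANTILE values."""
    results = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        measure, kind, value = fields[0], fields[1], fields[-1]
        results.setdefault(measure, {})[kind] = float(value)
    return results

def upper_tail_estimate(quantile):
    """Approximate enrichment p-value, assuming p = 1 - quantile."""
    return 1.0 - quantile

example = """
REGION_HITS     REAL    Region A      Feature B        46
REGION_HITS     QUANTILE        Region A      Feature B        0.823
FEATURE_HITS    REAL    Region A      Feature B        110
FEATURE_HITS    QUANTILE        Region A      Feature B        0.999
"""
parsed = parse_results(example)
feature_p = upper_tail_estimate(parsed["FEATURE_HITS"]["QUANTILE"])
```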

There will also be a pdf file, A_to_B.pdf, which contains plots of the permutation distributions of each of the three measures described above, as well as a series of plots of the three measures when the regions are shifted fixed amounts to the left and to the right of their current locations. You can see an example plot here.
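The shift analysis can be pictured with a small sketch like the one below. It is an illustration under assumptions rather than the script's own logic: the function names are hypothetical, intervals are half-open `(start, end)` tuples on one chromosome, shifted regions are clipped at the chromosome ends, and only the base-pair overlap measure is shown.

```python
def shift_regions(regions, offset, chrom_length):
    """Shift every region by offset bp, clipping at the chromosome ends."""
    shifted = []
    for start, end in regions:
        new_start = max(0, start + offset)
        new_end = min(chrom_length, end + offset)
        if new_start < new_end:  # drop regions pushed entirely off the chromosome
            shifted.append((new_start, new_end))
    return shifted

def overlap_bp(regions, features):
    """Total base pairs shared between two sets of merged intervals."""
    return sum(
        max(0, min(r_end, f_end) - max(r_start, f_start))
        for r_start, r_end in regions
        for f_start, f_end in features
    )

def shift_profile(regions, features, chrom_length, offsets):
    """Overlap in bp for each shift; negative offsets move regions left."""
    return {
        offset: overlap_bp(shift_regions(regions, offset, chrom_length), features)
        for offset in offsets
    }
```

If the overlap peaks at an offset of zero and falls away on both sides, that suggests the association depends on the regions' actual positions rather than on their general neighbourhood.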

Prerequisites

The following programs need to be installed and in the path of the user executing the script:

The Python script requires the argparse and multiprocessing libraries. The R script requires the ggplot2 and grid libraries.

Input Files

The script requires three input files:

  • Region BED file
  • Feature BED file

These are BED files with the locations of your regions and features. For proper results, these should be sorted and merged (with bedtools mergeBed) ahead of time.
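To make "sorted and merged" concrete, here is a small Python sketch of the end result. It is not a replacement for bedtools: the function name is hypothetical, and treating directly adjacent intervals as mergeable is an assumption of this sketch.

```python
def sort_and_merge(intervals):
    """Sort (chrom, start, end) tuples and merge overlapping or adjacent ones."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Merging matters because overlapping input intervals would otherwise be counted more than once in the base-pair overlap measure.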

  • Genome index file

This is a list of the chromosome names and their lengths in the genome of interest, of the type generated by samtools faidx.
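A .fai index is a tab-separated text file whose first two columns are the sequence name and its length. The sketch below shows one way to read those two columns; the function name is hypothetical, and the script itself may read the file differently.

```python
import io

def read_chrom_lengths(handle):
    """Read sequence names and lengths from the first two columns of a .fai file."""
    lengths = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        lengths[fields[0]] = int(fields[1])
    return lengths

# Made-up example content; real files come from `samtools faidx genome.fa`.
example_fai = io.StringIO("chrA\t1000\t6\t60\t61\nchrB\t250\t1030\t60\t61\n")
chrom_lengths = read_chrom_lengths(example_fai)
```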

Optionally you can also use a gaps file. This is a BED file with the locations of gaps in the genome reference, or other regions you want to be excluded from the possible locations used in the location permutations.
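One simple way to honour excluded intervals during permutation is rejection sampling: draw a random position and retry if it touches a gap. The sketch below shows that idea for a single chromosome. The function names and the retry limit are hypothetical, and the script may exclude gaps in a different way.

```python
import random

def overlaps_any(start, end, intervals):
    """True if the half-open interval [start, end) overlaps any given interval."""
    return any(s < end and start < e for s, e in intervals)

def place_outside_gaps(length, chrom_length, gaps, rng, max_tries=1000):
    """Pick a random placement of the given length that avoids all gaps."""
    for _ in range(max_tries):
        start = rng.randrange(0, chrom_length - length + 1)
        if not overlaps_any(start, start + length, gaps):
            return start, start + length
    raise RuntimeError("could not place region outside the excluded intervals")
```

Rejection sampling is easy to reason about but becomes slow when the excluded intervals cover most of the chromosome, which is why the sketch gives up after a fixed number of attempts.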

Options

Usage for the script is:

  usage: overlap_bps_with_feature.py [-h] [--gaps GAPS]
                               [--permutations PERMUTATIONS]
                               [--cores CORES] [--replot] [--shift_only]
                               [--process_iteration_chunk PROCESS_ITERATION_CHUNK]
                               [--iteration_number ITERATION_NUMBER]
                               [--iteration_chunk_size ITERATION_CHUNK_SIZE]
                               [--use_condor] [--verbose]
                               bp_file bp_name feature_file feature_name
                               genome

The options are:

  • gaps: A BED file containing regions of the genome to exclude from location permutation
  • permutations: The number of random location permutations to compute
  • cores: The number of simultaneous processes to launch to run permutations
  • replot: if this option is used, the script will not do any new testing but will use the data files stored in the results directory to redraw the plots
  • shift_only: skip the permutation testing and only generate the shift figures
  • use_condor: execute the permutations by spawning new tasks to be run on a compute grid using the condor scheduling engine
  • iteration_chunk_size: number of permutations to compute in a single process (only useful if you are experimenting with distributing processes on condor)
  • verbose: print extra debug/logging information
  • process_iteration_chunk/iteration_number/iteration_chunk_size: do not use; these parameters are used when distributing tasks to condor
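To illustrate what a chunk size means for the workload, the sketch below splits a total permutation count into per-process chunks. The function is hypothetical, and how the script actually divides work between processes or condor tasks is not described here.

```python
def chunk_sizes(permutations, chunk_size):
    """Split a permutation count into chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full_chunks, remainder = divmod(permutations, chunk_size)
    sizes = [chunk_size] * full_chunks
    if remainder:
        sizes.append(remainder)  # final, smaller chunk
    return sizes
```

For example, 1000 permutations with a chunk size of 300 would give three chunks of 300 and one of 100.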

Other Tools

There are a variety of other techniques for computing the significance of enrichment between a set of regions and a set of features. Most of them are more sophisticated than this script. Here are a few to investigate:

Contributors

cwhelan
