
ma-compbio / spin


nuclear compartmentalization, 3D genome, nuclear bodies, MRF

License: MIT License

Python 98.44% Shell 1.56%
3d-genome markov-random-field genome-compartmentalization genome-segmentation nuclear-compartmentalization nuclear-genome

spin's Introduction

SPIN

SPIN (Spatial Position Inference of the Nuclear genome) is an integrative computational method to identify genome-wide chromosome localization patterns relative to multiple nuclear compartments using TSA-seq, DamID, and Hi-C data.

Required Packages

SPIN requires the following Python packages to be installed:

  • Python (tested on version 3.6)
  • scikit-learn (tested on version 0.22.2)
  • NumPy
  • SciPy
  • pandas
  • pickle (included in the Python standard library)

Juicer Tools is required to extract Hi-C data from .hic files. It requires a Java Runtime Environment (JRE) to be installed on your system.

Usage

After installing all dependencies, run the following Python command:

python main.py -i <input_signal> --hic <hic_interactions> -w <window_size> -n <number_of_states> -o <output_path> -g <genome_bin> [--prev <previous_model>] [--save]

The options:

  • -i <input_signal> : 1D genomic measurements of nuclear organization. input_signal file should be a tab-separated text file where each line is a vector of 1D genomic measurements. For example (header not included):
signal1 signal2 signal3 signal4
0.4209 -0.3468 -0.0405 0.01026
0.9098 -0.7316 -0.0961 0.0224
1.3589 -1.0421 -0.2001 0.0229
1.4688 -1.1082 -0.2716 0.0105
1.4552 -1.1599 -0.3605 -0.0345
1.3385 -1.1504 -0.3953 -0.0727
... ... ... ...
  • --hic <hic_interactions> : List of Hi-C interactions added as edges in the model. hic_interactions file should be a tab-separated text file where the first two columns are the index numbers of genomic bins (these must be consistent with genome_bin) and the third column is the edge weight (optional). For example (header not included):
bin1 bin2 weight
1 7 1.0
3 5 1.0
4 20 1.0
1 299 1.0
7 312 1.0
... ... ...
  • -w <window_size> : Window size of each genome bin. Default = 100000

  • -n <number_of_states> : Number of states to estimate for SPIN. Default = 5

  • -o <output_path> : Output path.

  • -g <genome_bin>: Genomic coordinates of each bin. genome_bin file should be a tab-separated text file where the first three columns are the genomic coordinates of each bin and the fourth column is the index number (starting from zero). For example (header not included):

chr start end index number
chr1 0 25000 0
chr1 25000 50000 1
chr1 50000 75000 2
chr1 75000 100000 3
chr1 100000 125000 4
... ... ... ...
  • --prev <previous_model>: (optional) Load a previously saved model.

  • --save : (optional) Save the current model to a .pkl file.
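
The three input files have to agree with one another before a run. The following is a minimal sketch of such a check, not part of SPIN itself. It assumes that input_signal contains exactly one row per row of genome_bin (an assumption, not confirmed by the SPIN source) and that Hi-C bin indices refer to the fourth column of genome_bin. The names `read_table` and `check_inputs` are hypothetical.

```python
import io


def read_table(handle):
    """Read a headerless tab-separated file into a list of field lists."""
    return [line.rstrip("\n").split("\t") for line in handle if line.strip()]


def check_inputs(signal_handle, hic_handle, bin_handle):
    """Return (n_bins, n_signals, n_edges) after basic consistency checks."""
    signal = read_table(signal_handle)
    hic = read_table(hic_handle)
    bins = read_table(bin_handle)

    # Assumption: one row of 1D signals per genomic bin.
    if len(signal) != len(bins):
        raise ValueError(
            f"{len(signal)} signal rows but {len(bins)} genome bins"
        )

    # Every Hi-C edge must connect two indices listed in genome_bin.
    known = {int(row[3]) for row in bins}
    used = {int(row[i]) for row in hic for i in (0, 1)}
    missing = sorted(used - known)
    if missing:
        raise ValueError(f"Hi-C bins not found in genome_bin: {missing[:5]}")

    n_signals = len(signal[0]) if signal else 0
    return len(bins), n_signals, len(hic)


# Tiny in-memory demo; real files would be opened with open(path).
demo_signal = io.StringIO("0.42\t-0.35\n0.91\t-0.73\n1.36\t-1.04\n")
demo_hic = io.StringIO("0\t2\t1.0\n")
demo_bins = io.StringIO(
    "chr1\t0\t25000\t0\nchr1\t25000\t50000\t1\nchr1\t50000\t75000\t2\n"
)
print(check_inputs(demo_signal, demo_hic, demo_bins))
```

Running a check like this first gives a readable error instead of a failure deep inside model fitting.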

Example:

python main.py -i input_chr1.txt --hic Hi-C_chr1.txt -w 25000 -n 5 -m <mode> -o example_chr1 -g bin_chr1.bed --save

The predicted states file, state_n, is written to the output_path folder. To convert the results to BED format:

merge2bed.sh genome_bin_file state_n output.bed
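
For readers who want to see what this conversion amounts to, here is a rough Python sketch of the idea. It is not the implementation of merge2bed.sh. It assumes that state_n holds one state label per line, in the same order as the rows of genome_bin (an assumption about the file layout), and the function name `merge_states` is hypothetical. Adjacent bins are merged when they share a chromosome and a state and are contiguous.

```python
def merge_states(bins, states):
    """Merge adjacent bins that share a chromosome and a state.

    bins:   list of (chrom, start, end) tuples in genomic order.
    states: list of state labels, one per bin (assumed layout of state_n).
    Returns a list of (chrom, start, end, state) intervals.
    """
    if len(bins) != len(states):
        raise ValueError("bins and states must have the same length")

    merged = []
    for (chrom, start, end), state in zip(bins, states):
        if merged:
            last_chrom, last_start, last_end, last_state = merged[-1]
            # Extend the previous interval only if it is contiguous
            # and carries the same state on the same chromosome.
            if (last_chrom, last_end, last_state) == (chrom, start, state):
                merged[-1] = (chrom, last_start, end, state)
                continue
        merged.append((chrom, start, end, state))
    return merged


bins = [("chr1", 0, 25000), ("chr1", 25000, 50000), ("chr1", 50000, 75000)]
states = ["State_1", "State_1", "State_2"]
for interval in merge_states(bins, states):
    print("\t".join(str(field) for field in interval))
```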

Visualization

The bed file can be manually uploaded as custom annotation tracks to UCSC genome browser for visualization. An additional header must be added as the first line of the file in the following format:

track name='<track_name>' description='<description>' itemRgb='On'

Then go to a UCSC genome browser view, click the "add custom tracks" or "manage custom tracks" button below the tracks window. On the Add Custom Tracks page, load the annotation track data or URL for your custom track into the upper text box and the track documentation (optional) into the lower text box, then click the "Submit" button.
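
The header line can also be added with a short script rather than by hand. A minimal sketch follows; the function name `with_track_header` is hypothetical, and the track name and description used in the demo are placeholders.

```python
def with_track_header(bed_lines, name, description):
    """Return the BED lines with a UCSC track line prepended."""
    header = f"track name='{name}' description='{description}' itemRgb='On'"
    return [header] + [line.rstrip("\n") for line in bed_lines]


demo = ["chr1\t0\t725000\tState_1\n", "chr1\t725000\t875000\tState_2\n"]
print("\n".join(with_track_header(demo, "SPIN", "SPIN states")))
```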

It is recommended to use distinguishable colors for different states. For example:

State itemRgb
State_1 228,26,28
State_2 55,126,184
State_3 77,175,74
State_4 152,78,163
State_5 255,127,0
... ...

Additional BED fields can be added (with 9th column showing the color in RGB value). For example (header not included):

chrom chromStart chromEnd name score strand thickStart thickEnd itemRgb
chr1 0 725000 State_1 0 . 0 725000 102,194,165
chr1 725000 875000 State_2 0 . 725000 875000 252,141,98
chr1 875000 2550000 State_3 0 . 875000 2550000 141,160,203
chr1 2550000 2650000 State_2 0 . 2550000 2650000 252,141,98
chr1 2650000 2800000 State_1 0 . 2650000 2800000 102,194,165
chr1 2800000 2925000 State_2 0 . 2800000 2925000 252,141,98
... ... ... ... ... ... ... ... ...
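
A 9-column record of this shape can be built from a 4-column interval (chrom, start, end, state) by filling in the fixed fields and looking up the color. The sketch below assumes the five-color palette suggested earlier in this section; `STATE_COLORS` and `to_bed9` are hypothetical names.

```python
# Palette taken from the state/itemRgb table above.
STATE_COLORS = {
    "State_1": "228,26,28",
    "State_2": "55,126,184",
    "State_3": "77,175,74",
    "State_4": "152,78,163",
    "State_5": "255,127,0",
}


def to_bed9(chrom, start, end, state, colors=STATE_COLORS):
    """Format one interval as a tab-separated 9-column BED line."""
    fields = [
        chrom, start, end,
        state,        # name
        0,            # score
        ".",          # strand (not applicable to states)
        start, end,   # thickStart, thickEnd span the whole interval
        colors[state],
    ]
    return "\t".join(str(field) for field in fields)


print(to_bed9("chr1", 0, 725000, "State_1"))
```

An unknown state name raises a `KeyError`, which makes a missing palette entry visible immediately.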

Use the bedToBigBed utility to create a bigBed file.

bedToBigBed input.bed chrom.sizes myBigBed.bb
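
bedToBigBed expects its input to be sorted by chromosome and start position and to contain no track or browser lines, so a BED file prepared for custom-track upload needs a small cleanup first. A minimal sketch, with the hypothetical function name `prepare_for_bigbed`:

```python
def prepare_for_bigbed(lines):
    """Drop track/browser/comment lines and sort by (chrom, start)."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("track", "browser", "#")):
            continue
        records.append(line.split("\t"))
    # Chromosome names sort as plain strings; starts sort numerically.
    records.sort(key=lambda fields: (fields[0], int(fields[1])))
    return ["\t".join(fields) for fields in records]


demo = [
    "track name='SPIN' description='SPIN states' itemRgb='On'\n",
    "chr2\t0\t10\tState_1\n",
    "chr1\t50\t60\tState_2\n",
    "chr1\t0\t10\tState_3\n",
]
print("\n".join(prepare_for_bigbed(demo)))
```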

It is also recommended to use track hubs for more configurable custom annotations. See the track hub help page for more information.

In addition, tracks can be visualized on Nucleome Browser. See the Nucleome Browser help page for more instructions.

Data Availability

SPIN annotations for K562 can be found here.

Citation

If you use SPIN in your work, please cite:

@article{wang2020spin,
  title={SPIN reveals genome-wide landscape of nuclear compartmentalization},
  author={Wang, Yuchuan and Zhang, Yang and Zhang, Ruochi and van Schaik, Tom and Zhang, Liguo and Sasaki, Takayo and Peric-Hupkes, Daniel and Chen, Yu and Gilbert, David M and van Steensel, Bas and others},
  journal={bioRxiv},
  year={2020},
  publisher={Cold Spring Harbor Laboratory}
}

spin's People

Contributors

ma-compbio, wxx0316

spin's Issues

Questions about parameters of SPIN

Hi,
Thanks for developing such a powerful tool. I'm very interested in it, but I have some questions about SPIN.

The first question is about the format of the input_signal file. In the sample data, I found four columns. I guess they include TSA-seq and DamID signals (LMNB1 or SON), but if I only have DamID signals (and only for LMNB1), can I run SPIN successfully? And what is the order of the TSA-seq and DamID signals?
The second question is about the hic_interactions file. How should I calculate the edge weight column, or is it just the KR-normalized interactions extracted by juicer_tools?
The third question is whether SPIN can be run on single chromosomes or on the whole genome. In the example, SPIN is run only for chr1, but I thought that inter-chromosomal interactions are more important than intra-chromosomal interactions for inferring the spatial position of the nuclear genome.

Looking forward to your reply, and thanks in advance!

Best,
Qianzhao
