
zdf1122 / similarityregression


This project forked from smlmbrt/similarityregression


Predicting TF sequence-specificity similarity with weighted alignments

License: GNU General Public License v3.0

Python 9.82% R 11.99% Jupyter Notebook 78.19%


SimilarityRegression

This is the code repository for Similarity Regression (SR), a method to predict motif similarity using weighted alignments. Description of the directories:

  • ConstructSimilarityModels/: Jupyter notebooks and R scripts used to train and select SR models. This directory contains a README that describes the notebooks in greater detail.
  • Examples/: example data and a Jupyter notebook with code to read TF gene/protein information from Cis-BP and parse it into formats that can be used to train SR models or to score sequences with existing SR models.
  • Scripts/: Python scripts and R code for aligning sequences to a Pfam HMM.
  • similarityregression/: Python module containing code to align DBDs and score alignments using SR models.
  • CisBP/: scripts to calculate E-score overlaps from data present in Cis-BP flat files.
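The sketch below illustrates, under stated assumptions, the two quantities the directories above revolve around: a weighted score over an alignment of two DNA-binding domains (DBDs), and an E-score overlap between two motifs. It is not the repository's code. The function names `per_position_identity`, `weighted_similarity` and `escore_overlap` are hypothetical. The sketch also assumes three simplifications: features are per-position amino-acid identity only, the score is a simple linear combination, and "overlap" is the Jaccard index of 8-mers whose E-score exceeds a threshold (0.45 is used here as an assumed default). The trained SR models and their exact features should be taken from `similarityregression/` and `ConstructSimilarityModels/`.

```python
import numpy as np


def per_position_identity(seq_a: str, seq_b: str) -> np.ndarray:
    """Return 1.0 where two aligned DBD sequences share a residue, else 0.0.

    Assumption: both sequences are already aligned (e.g. to the same Pfam HMM),
    so they have equal length; gap characters ('-') always count as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return np.array(
        [1.0 if a == b and a != "-" else 0.0 for a, b in zip(seq_a, seq_b)]
    )


def weighted_similarity(seq_a: str, seq_b: str, weights, intercept: float = 0.0) -> float:
    """Hypothetical SR-style score: intercept + sum_i w_i * identity_i.

    Each alignment position gets its own weight, so residues that matter for
    DNA contact can count more than others. The weights would come from a
    fitted model; here they are supplied by the caller.
    """
    features = per_position_identity(seq_a, seq_b)
    w = np.asarray(weights, dtype=float)
    if w.shape != features.shape:
        raise ValueError("need exactly one weight per alignment position")
    return float(intercept + features @ w)


def escore_overlap(escores_a: dict, escores_b: dict, threshold: float = 0.45) -> float:
    """Jaccard overlap of the 8-mers scoring above `threshold` in each motif.

    Assumption: `escores_*` map 8-mer strings to PBM E-scores; the threshold
    and the Jaccard form are illustrative choices, not the paper's definition.
    """
    top_a = {kmer for kmer, e in escores_a.items() if e > threshold}
    top_b = {kmer for kmer, e in escores_b.items() if e > threshold}
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy aligned DBD fragments (the gap at position 3 counts as a mismatch).
    print(per_position_identity("AC-T", "ACGT"))                      # [1. 1. 0. 1.]
    print(weighted_similarity("AC-T", "ACGT", [0.5, 0.5, 1.0, 1.0]))  # 2.0
```

In this framing, a regression model learns the position weights so that the alignment score predicts the motif-overlap measure. That is the general idea behind predicting motif similarity "using weighted alignments"; the actual model form and training procedure are in the notebooks and R scripts.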

Python dependencies: numpy, pandas, biopython, scikit-learn (sklearn)

R dependencies: caret, glmnet, PRROC, aphid, seqinr

Citation

Samuel A. Lambert, Ally Yang, Alexander Sasse, Gwendolyn Cowley, Mark X. Caddick, Quaid D. Morris, Matthew T. Weirauch, and Timothy R. Hughes (2019). Similarity Regression predicts evolution of transcription factor sequence specificity. Nature Genetics. 51:981–989.

Abstract

Transcription factor (TF) binding specificities (motifs) are essential for the analysis of gene regulation. Accurate prediction of TF motifs is critical, because it is infeasible to assay all TFs in all sequenced eukaryotic genomes. There is ongoing controversy regarding the degree of motif diversification among related species that is, in part, because of uncertainty in motif prediction methods. Here we describe Similarity Regression, a significantly improved method for predicting motifs, which we use to update and expand the Cis-BP database. Similarity regression inherently quantifies TF motif evolution, and shows that previous claims of near-complete conservation of motifs between human and Drosophila are inflated, with nearly half of the motifs in each species absent from the other, largely due to extensive divergence in C2H2 zinc finger proteins. We conclude that diversification in DNA-binding motifs is pervasive, and present a new tool and updated resource to study TF diversity and gene regulation across eukaryotes.
