License: MIT

Project TBD [Tau Be Damned]


The project TBD [Tau Be Damned] aims to use the amino acid sequence of a protein to identify whether it is disordered.

Project Objective


Our goal is to build a tool to identify whether a protein is disordered based on its amino acid sequence. We have collected amino acid sequences for ordered and disordered proteins from publicly available datasets to train a machine learning model to perform the classification task.

Mission


We share an interest in proteins. While many proteins fold into regular conformations that can be readily analyzed on a structural basis, intrinsically disordered proteins (IDPs) do not. IDPs such as tau are implicated in Alzheimer's and other neurodegenerative diseases. We aim to employ machine learning tools to improve the study of IDPs for scientific researchers and citizen scientists alike.

Requirements


Package TBD has the following major dependencies:

  1. python = 3.6
  2. tensorflow = 2.4
  3. scikit-learn = 0.23
  4. scipy = 1.5
  5. pandas = 1.1

The detailed list of dependencies can be found in the environment.yml file.
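
As a quick sanity check on the versions listed above, the following is a minimal sketch and not part of the TBD package. The helper names version_matches and check_environment are hypothetical, and it assumes that "=" in the list means a match on the major.minor components; environment.yml remains the authoritative source.

```python
import importlib
import sys

# Versions copied from the list above. Keys are import names, which is an
# assumption for scikit-learn (imported as "sklearn").
REQUIRED = {"tensorflow": "2.4", "sklearn": "0.23", "scipy": "1.5", "pandas": "1.1"}


def version_matches(installed, required):
    """Return True when `installed` starts with the components of `required`."""
    required_parts = required.split(".")
    return installed.split(".")[:len(required_parts)] == required_parts


def check_environment():
    """Return a dict mapping each dependency to True/False (hypothetical helper)."""
    python_version = "{}.{}".format(*sys.version_info[:2])
    report = {"python": version_matches(python_version, "3.6")}
    for module_name, required in REQUIRED.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # A missing package counts as a failed check.
            report[module_name] = False
            continue
        installed = getattr(module, "__version__", "")
        report[module_name] = version_matches(installed, required)
    return report
```

Calling check_environment() inside the activated environment returns one boolean per dependency, which may help narrow down installation problems.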

Installation


The package TBD can be installed with the following steps:

  1. Download the repository: git clone https://github.com/Intrinsically-Disordered/TBD.git
  2. Go to the root directory: cd TBD
  3. Create a virtual environment: conda env create --name tbdenv -f environment.yml
  4. Activate the environment: conda activate tbdenv
  5. Install the package: python setup.py install
  6. Check the installation: python -c "import tbd"

Usage


An example that runs the whole pipeline of data processing, modeling, and prediction from a single script can be found in run_tbd.py.

An example of predicting with the pretrained model can be found in the example notebook.

Use Cases


This project aims to be useful to three groups: members of the general public interested in learning about protein classification, scientists who need to determine whether a protein they are studying or have designed is disordered, and people with machine learning experience.

Modules Overview


  • preprocessing.py : Functions for cleaning and processing the data so it is ready for modeling.
  • model.py : Functions for building and training the convolutional neural network (CNN).
  • predict.py : Functions for predicting whether protein sequences are ordered or disordered using the trained model.
  • evaluate.py : Functions for evaluating the trained model.
  • utils.py : Utility functions that can be used by other modules.
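
To illustrate the kind of transformation a preprocessing step might apply before a CNN sees a sequence, here is a minimal sketch of one-hot encoding an amino acid sequence. It is not the TBD implementation: the function name one_hot_encode, the max_length parameter, and the zero-padding and truncation scheme are all assumptions made for illustration.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_length):
    """Encode a sequence as a max_length x 20 matrix of 0s and 1s.

    Hypothetical helper: shorter sequences are padded with all-zero rows,
    longer ones are truncated, and residues outside the 20 standard codes
    (for example "X") are left as all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_length)]
    for position, residue in enumerate(sequence.upper()[:max_length]):
        index = AA_INDEX.get(residue)
        if index is not None:
            matrix[position][index] = 1
    return matrix


if __name__ == "__main__":
    encoded = one_hot_encode("MAEPRQ", max_length=10)
    print(len(encoded), len(encoded[0]))
```

A fixed max_length gives every sequence the same input shape, which is what a convolutional model typically expects; the actual encoding used by TBD may differ.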

Community Guidelines


We welcome members of the open-source community to extend the functionality of TBD, submit feature requests, and report bugs.

Feature Request:

If you would like to suggest a feature or start a discussion on possible extension of TBD, please feel free to raise an issue.

Bug Report:

If you would like to report a bug, please follow this link.

Contributions:

If you would like to contribute to TBD, you can fork the repository, add your contribution, and open a pull request. The complete guide to making contributions can be found at this link.

Contributors


jiaweih, rjpecoraro
