afex.ai's Introduction

[Image: Screenshot 2023-06-10 at 13 04 46]

Amino Acid Functions Explained

The Problem - Inferring Protein Functionality from Amino Acid sequences

Initially, BLAST was the tool of choice for classifying proteins. However, it is slow and not the right tool for the job. Scientists then came up with another method: classify proteins into groups and train an HMM on each group. To classify a new protein, its sequence is scored against every HMM, and the HMM giving the highest probability decides the classification. This is also very slow, given the n^2 time complexity of getting a probability from an HMM. The next solution relies on a neural network alone, which is a complete black box and cannot explain its predictions. Our solution takes the best of both worlds: it is fast and fully explainable.
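The HMM-based approach described above can be sketched as follows. This is a minimal illustration, not the pipeline of any particular tool: the two-letter alphabet (`H` for hydrophobic, `P` for polar), the family names, and all probabilities are made up for the example. The inner sum over previous states for every current state is where the quadratic cost in the number of states comes from, and the whole computation is repeated for every family HMM.

```python
import math

def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | HMM) using the scaled forward algorithm.

    Cost is O(N^2 * T) for N hidden states and a sequence of length T.
    Assumes seq is non-empty.
    """
    n = len(start)
    alpha = [start[s] * emit[s][seq[0]] for s in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for symbol in seq[1:]:
        # For every current state, sum over every previous state: N^2 work.
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
            for s in range(n)
        ]
        scale = sum(alpha)  # rescale to avoid underflow on long sequences
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def classify(seq, family_hmms):
    """Score seq against every family HMM and return the best family."""
    scores = {
        name: forward_log_likelihood(seq, **hmm)
        for name, hmm in family_hmms.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical two-state HMMs, one per (made-up) protein family.
FAMILY_HMMS = {
    "hydrophobic_rich": {
        "start": [0.5, 0.5],
        "trans": [[0.9, 0.1], [0.1, 0.9]],
        "emit": [{"H": 0.9, "P": 0.1}, {"H": 0.6, "P": 0.4}],
    },
    "polar_rich": {
        "start": [0.5, 0.5],
        "trans": [[0.9, 0.1], [0.1, 0.9]],
        "emit": [{"H": 0.1, "P": 0.9}, {"H": 0.4, "P": 0.6}],
    },
}

best, scores = classify("HHHHPHHH", FAMILY_HMMS)
print(best)  # hydrophobic_rich
```

With many families and long sequences, this per-family scoring loop is the part that dominates the running time.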

The Process

[Image: Screenshot 2023-06-10 at 15 06 51]

Clustering

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html
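As a rough sketch of how the linked `SpectralClustering` class could be applied to protein sequences, the example below clusters a handful of toy sequences. The features (amino acid composition), the cosine-similarity affinity, the sequences themselves, and the choice of two clusters are all assumptions made for illustration; the README does not state which features or parameters the project uses.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = np.array([seq.count(aa) for aa in AMINO_ACIDS], dtype=float)
    return counts / counts.sum()

# Toy sequences (made up): three rich in A/K, three rich in G/S.
sequences = [
    "AKAKAAKKAAGA",
    "KAAKKAAAKAKG",
    "AAKKAKAKAAAK",
    "GSGSGGSSGGAG",
    "SGGSSGGGSGSA",
    "GGSSGSGSGGGS",
]

features = np.array([composition(s) for s in sequences])

# Cosine similarity between composition vectors; entries are non-negative,
# so the matrix can be passed directly as a precomputed affinity.
unit = features / np.linalg.norm(features, axis=1, keepdims=True)
affinity = unit @ unit.T

model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(affinity)
print(labels)  # one cluster label per sequence
```

Passing `affinity="precomputed"` keeps the similarity measure separate from the clustering step, so a sequence-alignment score or any other non-negative similarity could be swapped in without changing the rest of the code.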

What is the problem

Understanding how cells work is hard, and it is important for biological research and medicine. Lots of high-quality data can be obtained from RNAseq or proteomics, but interpreting it is hard.

RNAseq and proteomics provide only protein sequences, at least in the first step.

Understanding protein interactions and their roles is important so that we understand how cells work and how we can treat diseases.

We're building an explainable ML model to tell us what individual proteins do and how we know.

How are we different

People are already trying to use large models to predict protein function - but these models are unexplainable black boxes.

Our model could be improved by using 3D structural data from AlphaFold, to understand spatial interactions - this would be the next iteration of our model.

Identifying important sequences could also give us insight into how proteins work and how we can engineer them.

Example application

Treating cancer.

Take RNAseq data from tumour cells and from the cells of a healthy patient. We want to use the data to rationalize what caused the cancer and which targets (proteins) we can use to treat it.

The data contains lots of noise (protein levels depend heavily on sex, age, diet, and tissue type), so it's hard to find which bits of data are relevant to cancer.

By understanding the protein functions we can rationalize the differences in RNAseq and how they relate to cancer.

Understanding how the specific proteins contribute to cancer helps us decide which proteins are good targets.
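The comparison step in this example can be sketched as a log2 fold change between tumour and healthy expression levels. This is only an illustration under simple assumptions: the protein names and expression values are made up, the pseudocount and threshold are arbitrary choices, and a real analysis would use replicates and a statistical test to deal with the noise sources mentioned above.

```python
import math

def log2_fold_changes(tumour, healthy, pseudocount=1.0):
    """log2(tumour / healthy) for every protein present in both samples.

    The pseudocount avoids division by zero for unexpressed proteins.
    """
    shared = tumour.keys() & healthy.keys()
    return {
        protein: math.log2(
            (tumour[protein] + pseudocount) / (healthy[protein] + pseudocount)
        )
        for protein in shared
    }

def rank_candidates(tumour, healthy, min_abs_lfc=1.0):
    """Proteins whose expression changed at least 2-fold, largest change first."""
    lfc = log2_fold_changes(tumour, healthy)
    hits = [(p, v) for p, v in lfc.items() if abs(v) >= min_abs_lfc]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)

# Made-up expression levels for three hypothetical proteins.
tumour = {"protein_A": 799.0, "protein_B": 49.0, "protein_C": 101.0}
healthy = {"protein_A": 99.0, "protein_B": 199.0, "protein_C": 99.0}

for protein, change in rank_candidates(tumour, healthy):
    print(protein, round(change, 2))
```

The ranked list is where predicted protein functions would come in: knowing what each strongly changed protein does helps separate changes that plausibly relate to cancer from changes caused by unrelated factors.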

afex.ai's People

Contributors

oboril, dvlasits, majkimge
