duolinwang / acm-bcb-2019-tutorial

This project forked from riti4538/acm-bcb-2019-tutorial

Slides, code, and examples for the Low-dimensional Representation of Biological Sequence Data tutorial at the ACM-BCB 2019 conference

ACM-BCB-2019-Tutorial

Slides, code, and examples for the Low-dimensional Representation of Biological Sequence Data tutorial at the ACM-BCB 2019 conference. These materials have been partially funded by the NSF ISS BIGDATA grant No. 1836914.

  • ACM-BCB-2019_embedding_tutorial: slides for the Low-dimensional Representation of Biological Sequence Data tutorial at the ACM-BCB 2019 conference.
  • word2vec-examples.ipynb: a Jupyter notebook with two simple examples using Word2Vec, one on English words and one on biological sequence data. Requires sequences.txt, questions-words.txt, and a pretrained Word2Vec embedding (e.g. https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit).
  • multilateration-examples.ipynb: a Jupyter notebook with two examples using multilateration: one generates embeddings for DNA 3-mers, the other generates an embedding for the Hamming graph over amino acid sequences of length 8. Requires multilateration.py.
  • multilateration.py: code implementing the Information Content Heuristic (ICH) algorithm for approximating the metric dimension of graphs.
  • questions-words.txt: analogies for testing Word2Vec embeddings (from https://github.com/nicholas-leonard/word2vec/blob/master/questions-words.txt).
  • human_cds.txt: human coding sequences (Human genes, GRCh38.p12) obtained from Ensembl BioMart on 8/13/2019 (https://www.ensembl.org/biomart/martview/).
  • sequences.txt: each sequence from human_cds.txt on a separate line.
  • parseData.py: generates sequences.txt from human_cds.txt.
  • .loc files: Word2Vec embeddings based on sequences.txt using different dimensions and reading frames.
  • Multilateration
    • Code to generate resolving sets for Hamming graphs.
    • Embeddings of Hamming graphs and codons generated using multilateration.
  • Word2Vec
    • Code to generate embeddings of k-mers based on the data in sequences.txt.
    • Embeddings of k-mers generated using Word2Vec.
    • An NLP example of Word2Vec.
  • TSNE
    • Code to visualize embeddings using t-SNE.
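
To make the multilateration idea above concrete, here is a minimal sketch in plain Python. It is not the Information Content Heuristic from multilateration.py: it swaps in a simple greedy heuristic, and the names hamming and greedy_resolving_set are hypothetical, chosen only for this illustration. It picks landmark 3-mers until every DNA 3-mer has a distinct vector of Hamming distances to the landmarks; those distance vectors then serve as the embedding.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_resolving_set(nodes):
    """Greedy stand-in for ICH (an assumption, not the repository's code).

    Repeatedly adds the landmark that splits the nodes into the most
    distinct distance vectors, until all vectors are unique.
    """
    landmarks = []
    # Distance vector of each node to the landmarks chosen so far.
    signatures = {v: () for v in nodes}
    while len(set(signatures.values())) < len(nodes):
        best, best_count = None, -1
        for cand in nodes:
            if cand in landmarks:
                continue
            # How many distinct vectors would exist if cand were added?
            count = len({signatures[v] + (hamming(v, cand),) for v in nodes})
            if count > best_count:
                best, best_count = cand, count
        landmarks.append(best)
        signatures = {v: signatures[v] + (hamming(v, best),) for v in nodes}
    return landmarks, signatures


# All 64 DNA 3-mers, i.e. the vertices of the Hamming graph H(3, 4).
kmers = ["".join(p) for p in product("ACGT", repeat=3)]
landmarks, embedding = greedy_resolving_set(kmers)

print("landmarks:", landmarks)
print("embedding of ACG:", embedding["ACG"])
```

The greedy choice always makes progress, because adding any node whose vector is still shared with another node separates those two (a node is at distance 0 only from itself). The resulting set is a resolving set, though not necessarily a minimum one, so its size is only an upper bound on the metric dimension.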
