DeepGene: An Efficient Foundation Model for Genomics based on Pan-genome Graph Transformer

We introduce DeepGene, a model that leverages pan-genome and Minigraph representations to capture the broad diversity of genetic language. DeepGene employs rotary position embedding to improve length extrapolation across genetic analysis tasks. On the 28 tasks of the Genome Understanding Evaluation benchmark, DeepGene ranks first on 9 tasks and second on 5, and achieves the best overall score. DeepGene outperforms other cutting-edge models in compactness and in efficiency when processing sequences of varying lengths.
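For readers unfamiliar with rotary position embedding, the sketch below shows the general idea in plain Python. It is not taken from the DeepGene code: the function names `rotary_embed` and `dot` and the default `base` of 10000 are assumptions made for illustration. Each consecutive pair of vector components is rotated by an angle proportional to the token position, so the dot product of two rotated vectors depends only on the distance between their positions.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Illustrative helper (hypothetical name), not the DeepGene implementation.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Pair i//2 uses frequency base^(-i/dim); lower pairs rotate faster.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.0]
    # Both pairs of positions are 2 apart, so the two scores should agree
    # up to floating-point error.
    near = dot(rotary_embed(q, 3), rotary_embed(k, 1))
    far = dot(rotary_embed(q, 12), rotary_embed(k, 10))
    print(near, far)
```

Because the score depends on relative rather than absolute position, the same rotation rule applies unchanged to positions beyond those seen in training, which is the property usually cited when rotary embeddings are used for length extrapolation.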

Preprint available at bioRxiv.

1. Environment setup

Please see PanGeneGraphTrans/requirements.txt.

2. Pan-genome Dataset

2.1 Download data

Download the Minigraph file (.rgfa) and place it in the dataPretreatment folder.
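To check a downloaded .rgfa file before processing, a minimal reader such as the one below may help. It is a sketch under stated assumptions, not the code in dataPretreatment: the names `Segment`, `parse_tags`, and `parse_rgfa` are hypothetical, and it reads only the tab-separated `S` (segment) and `L` (link) lines together with the `SN`, `SO`, and `SR` tags that rGFA attaches to segments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    name: str
    sequence: str
    stable_name: Optional[str] = None    # SN:Z tag: stable sequence name
    stable_offset: Optional[int] = None  # SO:i tag: offset on that sequence
    rank: Optional[int] = None           # SR:i tag: 0 for the reference


def parse_tags(fields):
    """Turn ['SN:Z:chr1', 'SO:i:0'] into {'SN': 'chr1', 'SO': 0}."""
    tags = {}
    for field in fields:
        parts = field.split(":", 2)
        if len(parts) != 3:
            continue  # skip anything that is not TAG:TYPE:VALUE
        key, kind, value = parts
        tags[key] = int(value) if kind == "i" else value
    return tags


def parse_rgfa(lines):
    """Collect segments and links from an iterable of rGFA lines."""
    segments, links = {}, []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            tags = parse_tags(fields[3:])
            segments[fields[1]] = Segment(
                fields[1], fields[2],
                tags.get("SN"), tags.get("SO"), tags.get("SR"),
            )
        elif fields[0] == "L" and len(fields) >= 5:
            # (from segment, from strand, to segment, to strand)
            links.append((fields[1], fields[2], fields[3], fields[4]))
    return segments, links


if __name__ == "__main__":
    # Made-up lines for illustration; pass an open file handle instead.
    sample = [
        "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0\n",
        "S\ts2\tGG\tSN:Z:chr1\tSO:i:4\tSR:i:0\n",
        "L\ts1\t+\ts2\t+\t0M\tSR:i:0\n",
    ]
    segments, links = parse_rgfa(sample)
    print(len(segments), "segments,", len(links), "links")
```

The function accepts any iterable of lines, so an open file handle can be passed directly and large graphs are read one line at a time.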

2.2 Data processing

Please see dataPretreatment and PanGeneGraphTrans/dataset.py.

3. Model Pre-training

Please see PanGeneGraphTrans/pretrain.py.

4. Model Fine-tuning

4.1 Download pre-trained model

Download the pre-trained model.

4.2 Fine-tune with pre-trained model

Please see PanGeneGraphTrans/finetune.py.

Download the prom_5000 data and place it in the \data\LPD\promoter_prediction\prom_5000 folder.
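The folder above is written with Windows-style separators. The sketch below builds the same location in a platform-neutral way and checks that the data is in place before fine-tuning. It assumes the path is relative to a root directory that you supply; the helper names `prom_5000_dir` and `has_data` are hypothetical and do not come from finetune.py.

```python
from pathlib import Path

# Folder components taken from the path given in this README.
PROM_5000_PARTS = ("data", "LPD", "promoter_prediction", "prom_5000")


def prom_5000_dir(root):
    """Return the prom_5000 folder under `root` (assumed root directory)."""
    return Path(root).joinpath(*PROM_5000_PARTS)


def has_data(root):
    """True if the folder exists and contains at least one entry."""
    folder = prom_5000_dir(root)
    return folder.is_dir() and any(folder.iterdir())


if __name__ == "__main__":
    root = Path(".")  # assumption: run from the directory that holds `data`
    if not has_data(root):
        print("prom_5000 data not found in", prom_5000_dir(root))
```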

Contributors

yummyjay