ncRNABert: Deciphering the landscape of non-coding RNA using language model

Model details

| Model | # of parameters | Hidden size | Pretraining dataset | # of ncRNAs | Model download |
|---|---|---|---|---|---|
| ncRNABert | 303M | 1024 | RNAcentral | 26M | Download |
| ncRNABert | 303M | 1024 | RNAcentral + nt | - | Download |

Install

As a prerequisite, you must have PyTorch installed to use this repository.

Install either the latest development version directly from GitHub or the stable release from PyPI:

# latest version
pip install git+https://github.com/wangleiofficial/ncRNABert

# stable version
pip install ncRNABert
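
After installing, it can help to confirm that both the PyTorch prerequisite and the package itself are importable before running the usage example. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of ncRNABert, and the import names `torch` and `ncRNABert` are taken from the usage example in this README.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Import names as they appear in the usage example of this README.
missing = missing_packages(["torch", "ncRNABert"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("PyTorch and ncRNABert are both importable.")
```

`find_spec` only looks the package up and does not import it, so the check is quick even for a large package such as PyTorch.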

Usage

ncRNA sequence embedding

from ncRNABert.pretrain import load_ncRNABert, load_ncRNABert_ex
from ncRNABert.utils import BatchConverter
import torch

data = [
    ("ncRNA1", "ACGGAGGATGCGAGCGTTATCCGGATTTACTGGGCG"),
    ("ncRNA2", "AGGTTTTTAATCTAATTAAGATAGTTGA"),
]

# Convert (id, sequence) pairs into token tensors plus per-sequence lengths
ids, batch_token, lengths = BatchConverter(data)

# Load the two pretrained models
model = load_ncRNABert()
model_ex = load_ncRNABert_ex()

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    results = model(batch_token, lengths, repr_layers=[24])
    results_ex = model_ex(batch_token, lengths, repr_layers=[24])

# Per-token representations from layer 24
token_representations = results["representations"][24]
token_representations_ex = results_ex["representations"][24]

# Generate per-sequence representations via averaging
sequence_representations = []
sequence_representations_ex = []
batch_lens = [len(item[1]) for item in data]
for i, tokens_len in enumerate(batch_lens):
    # Skip position 0 and average over the slice 1 : tokens_len - 1
    sequence_representations.append(token_representations[i, 1 : tokens_len - 1].mean(0))
    sequence_representations_ex.append(token_representations_ex[i, 1 : tokens_len - 1].mean(0))
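
The averaging step above takes a slice of token positions and averages it dimension by dimension. The sketch below reproduces that arithmetic on a toy nested list, without PyTorch, so the effect of the slice bounds is easy to see. The helper `mean_pool`, the toy values, and the assumption that the tokenizer adds one start and one end token are all illustrative; this README does not state how `BatchConverter` wraps or pads sequences.

```python
def mean_pool(token_vectors, start, stop):
    """Average the vectors at positions start..stop-1, one dimension at a time."""
    selected = token_vectors[start:stop]
    if not selected:
        raise ValueError("empty slice: nothing to average")
    dim = len(selected[0])
    return [sum(vec[d] for vec in selected) / len(selected) for d in range(dim)]


# Toy "embedding" of a 4-nucleotide sequence with 2 dimensions per token.
# ASSUMPTION: one start token and one end token surround the sequence.
toy = [
    [9.0, 9.0],  # start token (assumed)
    [1.0, 2.0],  # nucleotide 1
    [3.0, 4.0],  # nucleotide 2
    [5.0, 6.0],  # nucleotide 3
    [7.0, 8.0],  # nucleotide 4
    [9.0, 9.0],  # end token (assumed)
]
seq_len = 4  # raw sequence length, as in batch_lens above

# Same bounds as the README loop, with tokens_len = len(sequence).
readme_style = mean_pool(toy, 1, seq_len - 1)   # averages positions 1 and 2

# Bounds that cover every nucleotide under the wrapping assumption.
all_residues = mean_pool(toy, 1, seq_len + 1)   # averages positions 1 to 4

print(readme_style)   # [2.0, 3.0]
print(all_residues)   # [4.0, 5.0]
```

If the tokenizer does wrap sequences this way and `tokens_len` is the raw sequence length, the slice `1 : tokens_len - 1` leaves out the last two nucleotides. Comparing `batch_token.shape` with `lengths` is a quick way to see which bounds match the actual token layout.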

License

This source code is licensed under the Apache-2.0 license found in the LICENSE file in the root directory of this source tree.
