
Awesome Transformers


A curated list of awesome transformer models.

If you want to contribute to this list, send a pull request or reach out to me on Twitter: @abacaj. Let's make this list useful.

A number of the models available are not entirely open source (non-commercial licenses, etc.); this repository should also make you aware of that. Tracking the original source/company of each model helps with this.

I would also eventually like to add model use cases, so it is easier for others to find the right model to fine-tune.

Format:

  • Model name: short description, usually from paper
    • Model link (usually huggingface or github)
    • Paper link
    • Source as company or group
    • Model license

Table of Contents

Encoder models

  • ALBERT: "A Lite" version of BERT
  • BERT: Bidirectional Encoder Representations from Transformers
  • DistilBERT: a distilled version of BERT that is smaller, faster, cheaper and lighter
  • DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
  • RoBERTa: Robustly Optimized BERT Pretraining Approach
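
Most of the encoder models above are published on the Hugging Face Hub. As a quick way to try one, here is a minimal masked-token prediction sketch; it assumes the `transformers` library and uses the public `bert-base-uncased` checkpoint as an example, but any encoder checkpoint from the list can be swapped in (licenses permitting).

```python
# Minimal sketch: masked-token prediction with an encoder model.
# "bert-base-uncased" is one public checkpoint; substitute any encoder
# checkpoint from the list above, subject to its license.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("Transformers are a [MASK] architecture for NLP."):
    print(prediction["token_str"], round(prediction["score"], 3))
```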

Decoder models

  • BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
  • CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
  • LLaMA: Open and Efficient Foundation Language Models
  • GPT: Improving Language Understanding by Generative Pre-Training
  • GPT-2: Language Models are Unsupervised Multitask Learners
  • GPT-J: A 6 Billion Parameter Autoregressive Language Model
  • GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-TensorFlow
  • GPT-NeoX-20B: An Open-Source Autoregressive Language Model
  • NeMo Megatron-GPT 20B: a transformer-based language model
  • OPT: Open Pre-trained Transformer Language Models
  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  • GLM: An Open Bilingual Pre-Trained Model
  • YaLM: Pretrained language model with 100B parameters
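
Decoder models are used for autoregressive text generation. A minimal sketch, assuming `transformers` and the permissively licensed public `gpt2` checkpoint; the larger models in the list follow the same pattern but have very different license terms, so check each one before swapping it in.

```python
# Minimal sketch: autoregressive text generation with a decoder model.
# "gpt2" is one permissively licensed public checkpoint; verify the license
# of any other model from the list before substituting it here.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The transformer architecture", max_new_tokens=30)
print(result[0]["generated_text"])
```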

Encoder+decoder (seq2seq) models

  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  • FLAN-T5: Scaling Instruction-Finetuned Language Models
  • CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  • Pegasus: Pre-training with Extracted Gap-sentences for Abstractive Summarization
  • mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
  • UL2: Unifying Language Learning Paradigms
  • FLAN-UL2: A New Open Source Flan 20B with UL2
  • EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
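
Seq2seq models pair an encoder with a decoder and are driven through `generate()` on a conditioning input. A minimal sketch, assuming `transformers` and the public `google/flan-t5-small` checkpoint as the example model:

```python
# Minimal sketch: conditional generation with an encoder+decoder model.
# "google/flan-t5-small" is one public instruction-tuned checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

inputs = tokenizer("Translate English to German: Where is the library?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```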

Multimodal models

  • Donut: OCR-free Document Understanding Transformer
  • LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
    • Model
    • Paper
    • Microsoft
    • CC BY-NC-SA 4.0 (non-commercial)
  • TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
  • CLIP: Learning Transferable Visual Models From Natural Language Supervision
  • Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
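
For the image-text models, inputs go through a processor that handles both modalities. A minimal zero-shot image classification sketch with CLIP, assuming `transformers`, `Pillow`, `requests`, and the public `openai/clip-vit-base-patch32` checkpoint; the image URL is just an example image and can be replaced with any picture.

```python
# Minimal sketch: zero-shot image classification with CLIP.
# The image URL below is a public example image; point it at any image.
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)  # image-text similarity as probabilities
for label, prob in zip(labels, probs[0]):
    print(label, round(prob.item(), 3))
```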

Vision models

  • DiT: Self-supervised Pre-training for Document Image Transformer
  • DETR: End-to-End Object Detection with Transformers
  • EfficientFormer: Vision Transformers at MobileNet Speed
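
Object detection with DETR follows a similar load-and-process pattern. A minimal sketch, assuming `transformers`, `torch`, `Pillow`, `requests`, and the public `facebook/detr-resnet-50` checkpoint; the image URL is a public example image.

```python
# Minimal sketch: object detection with DETR.
# The image URL below is a public example image; point it at any image.
import torch
import requests
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Keep detections above a confidence threshold and map label ids to names.
target_sizes = torch.tensor([image.size[::-1]])
detections = processor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]
for score, label in zip(detections["scores"], detections["labels"]):
    print(model.config.id2label[label.item()], round(score.item(), 3))
```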

Audio models

  • Whisper: Robust Speech Recognition via Large-Scale Weak Supervision
  • VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
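
Whisper is exposed through the speech-recognition pipeline. A minimal sketch, assuming `transformers` (with `ffmpeg` available for audio decoding) and the public `openai/whisper-base` checkpoint; the audio path is a placeholder for a local recording.

```python
# Minimal sketch: speech-to-text with Whisper.
# "audio.wav" is a placeholder path to a local recording.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
result = asr("audio.wav")
print(result["text"])
```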

Recommendation models

  • Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
