Awesome-LLM-System-Papers

This is a (non-comprehensive) list of LLM system papers maintained by ALCHEM Lab. Feel free to open a pull request or an issue if we have missed any interesting papers!

Algorithm-System Co-Design

  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (JMLR'21) link to paper
  • Scalable and Efficient MoE Training for Multitask Multilingual Models (arXiv'21) link to paper
  • DeepSpeed-MOE: Advancing Mixture of Experts Inference and Training to Power Next-Generation AI Scale (ICML'22) link to paper
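The MoE systems above all revolve around routing each token to a small subset of experts. A minimal top-1 routing sketch (plain NumPy, toy shapes; scaling the expert output by the gate probability mirrors the Switch Transformer trick, and is not any specific library's API):

```python
import numpy as np

def top1_moe(tokens, gate_w, experts):
    # tokens: (n, d); gate_w: (d, n_experts); experts: list of (d, d) matrices
    logits = tokens @ gate_w                       # router scores per token
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)      # softmax over experts
    choice = probs.argmax(axis=1)                  # top-1 expert per token
    out = np.empty_like(tokens)
    for e, w in enumerate(experts):
        mask = choice == e
        # weight by the gate prob so routing stays differentiable in training
        out[mask] = (tokens[mask] @ w) * probs[mask, e:e + 1]
    return out

rng = np.random.default_rng(0)
toks = rng.standard_normal((8, 4))
gw = rng.standard_normal((4, 2))
ex = [rng.standard_normal((4, 4)) for _ in range(2)]
out = top1_moe(toks, gw, ex)
print(out.shape)  # (8, 4)
```

The system challenge these papers tackle is that the per-expert gather/scatter above becomes an all-to-all exchange once experts live on different devices.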

LLM Inference (Serving) Systems

Single-GPU Systems

  • TurboTransformers: An Efficient GPU Serving System For Transformer Models (PPoPP'21) link to paper
  • PetS: A Unified Framework for Parameter-Efficient Transformers Serving (ATC'22) link to paper

Distributed Systems

  • Orca: A Distributed Serving System for Transformer-Based Generative Models (OSDI'22) link to paper
  • DeepSpeed-inference: enabling efficient inference of transformer models at unprecedented scale (SC'22) link to paper
  • EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models (arXiv'22) link to paper
  • PETALS: Collaborative Inference and Fine-tuning of Large Models (NeurIPS'22 Workshop WBRC) link to paper
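Orca's key idea is iteration-level scheduling: requests enter and leave the running batch after every decoding step instead of waiting for the whole batch to drain. A toy sketch of that loop (request lengths and the scheduler are invented for illustration):

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Run (request_id, tokens_to_generate) pairs with iteration-level batching."""
    pending = deque(requests)
    running, finished, steps = [], [], 0
    while pending or running:
        while pending and len(running) < max_batch:   # admit new requests
            running.append(list(pending.popleft()))
        steps += 1                                    # one model iteration
        for req in running:
            req[1] -= 1                               # each request emits one token
        finished += [r[0] for r in running if r[1] == 0]
        running = [r for r in running if r[1] > 0]
    return finished, steps

done, steps = continuous_batching([("a", 3), ("b", 1), ("c", 2)])
print(done, steps)  # ['b', 'a', 'c'] 3
```

With classic request-level batching the same three requests would take 5 iterations (3 for {a, b}, then 2 for c); here c immediately fills the slot b frees.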

LLM Training Systems

Single-GPU Systems

  • CRAMMING: Training a Language Model on a Single GPU in One Day (arXiv'22) link to paper
  • Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model (arXiv'22) link to paper
  • High-throughput Generative Inference of Large Language Models with a Single GPU (arXiv'23) link to paper

Distributed Systems

  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (SC'20) link to paper
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (arXiv'20) link to paper
  • PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models (ICML'21) link to paper
  • Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM (SC'21) link to paper
  • TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models (ICML'21) link to paper
  • Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model (arXiv'22) link to paper
  • Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning (OSDI'22) link to paper
  • LightSeq2: Accelerated Training for Transformer-Based Models on GPUs (SC'22) link to paper
  • PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing (arXiv'23) link to paper
  • Mobius: Fine Tuning Large-Scale Models on Commodity GPU Servers (ASPLOS'23) link to paper
  • Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression (ASPLOS'23) link to paper
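ZeRO's core observation is that optimizer state (and, in later stages, gradients and parameters) can be partitioned across data-parallel ranks and re-gathered on demand instead of being replicated on every GPU. A toy single-process sketch, with NumPy arrays standing in for the actual NCCL collectives:

```python
import numpy as np

def shard(params, world):
    # split the flat parameter vector evenly across `world` ranks
    # (the ZeRO stage 1/2 partitioning idea)
    return np.array_split(params, world)

def allgather(shards):
    # each rank reconstructs the full vector from every rank's shard
    return np.concatenate(shards)

params = np.arange(10.0)
shards = shard(params, world=4)
restored = allgather(shards)
print([len(s) for s in shards])  # [3, 3, 2, 2]
```

Per-rank memory for the partitioned state drops from O(N) to O(N / world), which is what lets these systems push model size with fixed GPU memory.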

General MLSys-Related Techniques (Incomplete)

  • Efficient GPU Spatial-Temporal Multitasking (TPDS'14) link to paper
  • Enabling preemptive multiprogramming on GPUs (ISCA'14) link to paper
  • Chimera: Collaborative Preemption for Multitasking on a Shared GPU (ASPLOS'15) link to paper
  • Simultaneous Multikernel GPU: Multi-tasking Throughput Processors via Fine-Grained Sharing (HPCA'16) link to paper
  • FLEP: Enabling Flexible and Efficient Preemption on GPUs (ASPLOS'17) link to paper
  • Dynamic Resource Management for Efficient Utilization of Multitasking GPUs (ASPLOS'17) link to paper
  • Mesh-TensorFlow: Deep Learning for Supercomputers (NeurIPS'18) link to paper
  • PipeDream: Fast and Efficient Pipeline Parallel DNN Training (SOSP'19) link to paper
  • GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism (NeurIPS'19) link to paper
  • PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications (OSDI'20) link to paper
  • Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences (OSDI'22) link to paper
  • Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models (ASPLOS'23) link to paper
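Several entries above (GPipe, PipeDream, TeraPipe) are pipeline-parallel schedules, where the central cost model is the pipeline "bubble": with p stages and m micro-batches, a GPipe-style forward pass occupies m + p - 1 ticks, so the idle fraction is (p - 1) / (m + p - 1). A quick sketch of that arithmetic:

```python
def pipeline_ticks(p, m):
    # a micro-batch enters the pipeline each tick; the last of the m
    # micro-batches still needs p - 1 more ticks to reach the final stage
    return m + p - 1

def bubble_fraction(p, m):
    # share of stage-ticks spent idle while the pipeline fills and drains
    return (p - 1) / pipeline_ticks(p, m)

print(pipeline_ticks(4, 8))               # 11
print(round(bubble_fraction(4, 8), 3))    # 0.273
```

This is why these papers push m well above p (or overlap work across the bubble): at m = 8, p = 4 about 27% of the schedule is idle, but the fraction shrinks toward zero as m grows.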

LLM Algorithm Papers Recommended for System Researchers

  • Attention is all you need (NeurIPS'17) link to paper
  • Language Models are Unsupervised Multitask Learners (preprint from OpenAI) link to paper
  • Improving Language Understanding by Generative Pre-Training (preprint from OpenAI) link to paper
  • Language Models are Few-Shot Learners (NeurIPS'20) link to paper
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (JMLR'20) link to paper
  • Multitask Prompted Training Enables Zero-Shot Task Generalization (ICLR'22) link to paper
  • Finetuned Language Models are Zero-Shot Learners (ICLR'22) link to paper
  • GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (ICML'22) link to paper
  • LaMDA: Language Models for Dialog Applications (arXiv'22) link to paper
  • PaLM: Scaling Language Modeling with Pathways (arXiv'22) link to paper
  • OPT: Open Pre-trained Transformer Language Models (arXiv'22) link to paper
  • Holistic Evaluation of Language Models (arXiv'22) link to paper
  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv'23) link to paper
  • LLaMA: Open and Efficient Foundation Language Models (arXiv'23) link to paper
  • Training Compute-Optimal Large Language Models (preprint from DeepMind) link to paper
  • Scaling Laws for Neural Language Models (preprint) link to paper
  • Scaling Language Models: Methods, Analysis & Insights from Training Gopher (preprint from DeepMind) link to paper
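The scaling-law papers above lead to a handy back-of-envelope for system sizing: training compute is roughly C ≈ 6·N·D FLOPs for N parameters and D tokens, and the Chinchilla result suggests a compute-optimal token budget of roughly D ≈ 20·N. A sketch of that arithmetic (the 20x ratio is an approximation of the paper's fitted result, not an exact constant):

```python
def train_flops(n_params, n_tokens):
    # the common ~6 FLOPs per parameter per token estimate for training
    return 6 * n_params * n_tokens

def chinchilla_tokens(n_params, ratio=20):
    # rough compute-optimal token budget from the Chinchilla scaling fit
    return ratio * n_params

n = 70e9                     # a Chinchilla-scale model
d = chinchilla_tokens(n)     # ~1.4e12 tokens
print(f"{d:.1e} tokens, {train_flops(n, d):.2e} FLOPs")
```

For a 70B-parameter model this gives about 1.4T tokens and roughly 6e23 training FLOPs, in the ballpark of the budget reported for Chinchilla itself.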

Other Useful Resources
