
Awesome-LLM-Compression Awesome

Awesome LLM compression research papers and tools to accelerate LLM training and inference.

Contents

Papers

Quantization

  • ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
    NeurIPS 2022 [Paper] [Code]

  • LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
    NeurIPS 2022 [Paper] [Code]

  • LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
    arXiv 2022 [Paper]

  • Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
    arXiv 2023 [Paper]

  • Quantized Distributed Training of Large Models with Convergence Guarantees
    arXiv 2023 [Paper]

  • SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
    ICML 2023 [Paper] [Code]

  • GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
    ICLR 2023 [Paper] [Code]

  • RPTQ: Reorder-based Post-training Quantization for Large Language Models
    arXiv 2023 [Paper] [Code]

  • ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
    arXiv 2023 [Paper] [Code]

  • QLoRA: Efficient Finetuning of Quantized LLMs
    arXiv 2023 [Paper] [Code]

  • Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
    arXiv 2023 [Paper]

  • The Quantization Model of Neural Scaling
    arXiv 2023 [Paper]

  • Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
    arXiv 2023 [Paper]

  • AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
    arXiv 2023 [Paper] [Code]

  • LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
    arXiv 2023 [Paper]

  • SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
    arXiv 2023 [Paper] [Code]

  • OWQ: Lessons learned from activation outliers for weight quantization in large language models
    arXiv 2023 [Paper]
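Most of the post-training quantization papers above can be understood as refinements of a simple round-to-nearest baseline. A minimal NumPy sketch of symmetric per-tensor absmax int8 weight quantization (a toy illustration, not any listed paper's exact method):

```python
import numpy as np

def quantize_absmax_int8(w: np.ndarray):
    """Symmetric absmax quantization to int8.

    Round-to-nearest baseline that methods like GPTQ, AWQ, and
    SmoothQuant improve upon (e.g. with Hessian-aware rounding,
    activation-aware scaling, or outlier migration).
    """
    scale = np.max(np.abs(w)) / 127.0          # map max |w| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

# Toy example: weights quantize with small reconstruction error.
w = np.array([0.1, -0.5, 0.25, 1.27], dtype=np.float32)
q, s = quantize_absmax_int8(w)
w_hat = dequantize(q, s)
```

Per-channel or per-group scales (as used by GPTQ-style 4-bit methods) follow the same pattern with one `scale` per row or group instead of per tensor.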

Pruning/Sparsity

  • The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
    ICLR 2023 [Paper]

  • SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
    arXiv 2023 [Paper] [Code]

  • LLM-Pruner: On the Structural Pruning of Large Language Models
    arXiv 2023 [Paper] [Code]

  • Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models
    ICLR 2023 Tiny Papers [Paper]

  • Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering
    arXiv 2023 [Paper] [Code]

  • Learning to Compress Prompts with Gist Tokens
    arXiv 2023 [Paper] [Code]

  • Efficient Prompting via Dynamic In-Context Learning
    arXiv 2023 [Paper]
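The simplest baseline the pruning papers above compare against is one-shot unstructured magnitude pruning. A toy NumPy sketch (SparseGPT and LLM-Pruner use far more sophisticated saliency criteria; this only illustrates the idea):

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights.

    One-shot unstructured pruning: rank weights by |w| and drop
    the bottom `sparsity` fraction, keeping the array shape.
    """
    k = int(w.size * sparsity)                 # number of weights to remove
    if k == 0:
        return w.copy()
    order = np.argsort(np.abs(w).ravel())      # indices, ascending by |w|
    mask = np.ones(w.size, dtype=bool)
    mask[order[:k]] = False                    # drop the k smallest
    return (w.ravel() * mask).reshape(w.shape)

# Toy example: at 50% sparsity, the two smallest weights are zeroed.
w = np.array([0.1, -2.0, 0.05, 3.0])
w_sparse = magnitude_prune(w, 0.5)
```

Structured variants (as in LLM-Pruner) remove whole rows, heads, or channels instead of individual weights, which is what actually yields speedups on dense hardware.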

Distillation

  • Lifting the Curse of Capacity Gap in Distilling Language Models
    ACL 2023 [Paper] [Code]

  • Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
    ACL 2023 [Paper]

  • LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
    arXiv 2023 [Paper] [Code]

  • Large Language Model Distillation Doesn't Need a Teacher
    arXiv 2023 [Paper] [Code]

  • The False Promise of Imitating Proprietary LLMs
    arXiv 2023 [Paper]

  • GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo
    arXiv 2023 [Paper] [Code]

  • PaD: Program-aided Distillation Specializes Large Models in Reasoning
    arXiv 2023 [Paper]
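The classic objective underlying logit-based distillation (which several of the papers above extend or replace with instruction/output imitation) is the temperature-softened KL divergence between teacher and student. A minimal NumPy sketch of the standard Hinton-style loss, shown for illustration only:

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits: np.ndarray,
            teacher_logits: np.ndarray,
            T: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)             # soft teacher targets
    q = softmax(student_logits, T)             # student predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float((T ** 2) * kl.mean())

# The loss is zero when the student matches the teacher exactly,
# and positive otherwise.
t = np.array([[2.0, 0.5, -1.0]])
loss = kd_loss(t, t)
```

In practice this term is mixed with the hard-label cross-entropy, and sequence-level distillation (training on teacher-generated text, as in LaMini-LM or GPT4All) avoids needing teacher logits at all.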

Tools

  • BMCook: Model Compression for Big Models [Code]

  • llama.cpp: Inference of LLaMA model in pure C/C++ [Code]

  • LangChain: Building applications with LLMs through composability [Code]

  • GPTQ-for-LLaMA: 4-bit quantization of LLaMA using GPTQ [Code]

  • Alpaca-CoT: An Instruction Fine-Tuning Platform with Instruction Data Collection and Unified Large Language Models Interface [Code]
