
Awesome-LLM-Compression Awesome

Awesome LLM compression research papers and tools to accelerate LLM training and inference.

Contents

Papers

Quantization

  • ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
    NeurIPS 2022 [Paper] [Code]

  • LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
    NeurIPS 2022 [Paper] [Code]

  • LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
    arXiv 2022 [Paper]

  • Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
    arXiv 2023 [Paper]

  • Quantized Distributed Training of Large Models with Convergence Guarantees
    arXiv 2023 [Paper]

  • SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
    ICML 2023 [Paper] [Code]

  • GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
    ICLR 2023 [Paper] [Code]

  • RPTQ: Reorder-based Post-training Quantization for Large Language Models
    arXiv 2023 [Paper] [Code]

  • ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
    arXiv 2023 [Paper] [Code]

  • QLoRA: Efficient Finetuning of Quantized LLMs
    arXiv 2023 [Paper] [Code]

  • Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
    arXiv 2023 [Paper]

  • The Quantization Model of Neural Scaling
    arXiv 2023 [Paper]

  • Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
    arXiv 2023 [Paper]

  • AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
    arXiv 2023 [Paper] [Code]

  • LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
    arXiv 2023 [Paper]

  • SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
    arXiv 2023 [Paper] [Code]

  • OWQ: Lessons learned from activation outliers for weight quantization in large language models
    arXiv 2023 [Paper]
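Most of the post-training quantization papers above can be understood as refinements of a simple round-to-nearest baseline. A minimal NumPy sketch of symmetric per-tensor absmax int8 weight quantization (a toy illustration, not any listed paper's exact method):

```python
import numpy as np

def quantize_absmax_int8(w: np.ndarray):
    """Symmetric absmax quantization to int8.

    Round-to-nearest baseline that methods like GPTQ, AWQ, and
    SmoothQuant improve upon (e.g. with Hessian-aware rounding,
    activation-aware scaling, or outlier migration).
    """
    scale = np.max(np.abs(w)) / 127.0          # map max |w| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

# Toy example: weights quantize with small reconstruction error.
w = np.array([0.1, -0.5, 0.25, 1.27], dtype=np.float32)
q, s = quantize_absmax_int8(w)
w_hat = dequantize(q, s)
```

Per-channel or per-group scales (as used by GPTQ-style 4-bit methods) follow the same pattern with one `scale` per row or group instead of per tensor.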

Pruning/Sparsity

  • The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
    ICLR 2023 [Paper]

  • SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
    arXiv 2023 [Paper] [Code]

  • LLM-Pruner: On the Structural Pruning of Large Language Models
    arXiv 2023 [Paper] [Code]

  • Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models
    ICLR 2023 Tiny Papers [Paper]

  • Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering
    arXiv 2023 [Paper] [Code]

  • Learning to Compress Prompts with Gist Tokens
    arXiv 2023 [Paper] [Code]

  • Efficient Prompting via Dynamic In-Context Learning
    arXiv 2023 [Paper]
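The simplest baseline the pruning papers above compare against is one-shot unstructured magnitude pruning. A toy NumPy sketch (SparseGPT and LLM-Pruner use far more sophisticated saliency criteria; this only illustrates the idea):

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights.

    One-shot unstructured pruning: rank weights by |w| and drop
    the bottom `sparsity` fraction, keeping the array shape.
    """
    k = int(w.size * sparsity)                 # number of weights to remove
    if k == 0:
        return w.copy()
    order = np.argsort(np.abs(w).ravel())      # indices, ascending by |w|
    mask = np.ones(w.size, dtype=bool)
    mask[order[:k]] = False                    # drop the k smallest
    return (w.ravel() * mask).reshape(w.shape)

# Toy example: at 50% sparsity, the two smallest weights are zeroed.
w = np.array([0.1, -2.0, 0.05, 3.0])
w_sparse = magnitude_prune(w, 0.5)
```

Structured variants (as in LLM-Pruner) remove whole rows, heads, or channels instead of individual weights, which is what actually yields speedups on dense hardware.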

Distillation

  • Lifting the Curse of Capacity Gap in Distilling Language Models
    ACL 2023 [Paper] [Code]

  • Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
    ACL 2023 [Paper]

  • LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
    arXiv 2023 [Paper] [Code]

  • Large Language Model Distillation Doesn't Need a Teacher
    arXiv 2023 [Paper] [Code]

  • The False Promise of Imitating Proprietary LLMs
    arXiv 2023 [Paper]

  • GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo
    arXiv 2023 [Paper] [Code]

  • PaD: Program-aided Distillation Specializes Large Models in Reasoning
    arXiv 2023 [Paper]
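The classic objective underlying logit-based distillation (which several of the papers above extend or replace with instruction/output imitation) is the temperature-softened KL divergence between teacher and student. A minimal NumPy sketch of the standard Hinton-style loss, shown for illustration only:

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits: np.ndarray,
            teacher_logits: np.ndarray,
            T: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)             # soft teacher targets
    q = softmax(student_logits, T)             # student predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float((T ** 2) * kl.mean())

# The loss is zero when the student matches the teacher exactly,
# and positive otherwise.
t = np.array([[2.0, 0.5, -1.0]])
loss = kd_loss(t, t)
```

In practice this term is mixed with the hard-label cross-entropy, and sequence-level distillation (training on teacher-generated text, as in LaMini-LM or GPT4All) avoids needing teacher logits at all.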

Tools

  • BMCook: Model Compression for Big Models [Code]

  • llama.cpp: Inference of LLaMA model in pure C/C++ [Code]

  • LangChain: Building applications with LLMs through composability [Code]

  • GPTQ-for-LLaMA: 4-bit quantization of LLaMA using GPTQ [Code]

  • Alpaca-CoT: An Instruction Fine-Tuning Platform with Instruction Data Collection and Unified Large Language Models Interface [Code]
