MLC LLM

Documentation | Blog | WebLLM | WebStableDiffusion | Discord

Machine Learning Compilation for Large Language Models (MLC LLM) is a high-performance universal deployment solution that allows native deployment of any large language models with native APIs with compiler acceleration. The mission of this project is to enable everyone to develop, optimize and deploy AI models natively on everyone's devices with ML compilation techniques.

MLC LLM supports the following platforms and hardware:

	AMD GPU	NVIDIA GPU	Apple M1/M2 GPU	Intel GPU
Linux / Win	✅ Vulkan, ROCm	✅ Vulkan, CUDA	N/A	✅ Vulkan
macOS	✅ Metal	N/A	✅ Metal	✅ Metal
Web Browser	✅ WebGPU	✅ WebGPU	✅ WebGPU	✅ WebGPU
iOS / iPadOS	✅ Metal on Apple M1/M2 GPU
Android	✅ OpenCL on Adreno GPU		✅ OpenCL on Mali GPU

News

[08/02/2023] Dockerfile released for CUDA performance benchmarking
[07/19/2023] Supports 7B/13B/70B Llama-2

Getting Started

Check out our instruction page to try out!

Universal Deployment APIs

MLC LLM provides multiple sets of APIs across platforms and environments. These include

Citation

Please consider citing our project if you find it useful:

@software{mlc-llm,
    author = {MLC team},
    title = {{MLC-LLM}},
    url = {https://github.com/mlc-ai/mlc-llm},
    year = {2023}
}

The underlying techniques of MLC LLM include:

References (Click to expand)

@inproceedings{tensorir,
    author = {Feng, Siyuan and Hou, Bohan and Jin, Hongyi and Lin, Wuwei and Shao, Junru and Lai, Ruihang and Ye, Zihao and Zheng, Lianmin and Yu, Cody Hao and Yu, Yong and Chen, Tianqi},
    title = {TensorIR: An Abstraction for Automatic Tensorized Program Optimization},
    year = {2023},
    isbn = {9781450399166},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3575693.3576933},
    doi = {10.1145/3575693.3576933},
    booktitle = {Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
    pages = {804–817},
    numpages = {14},
    keywords = {Tensor Computation, Machine Learning Compiler, Deep Neural Network},
    location = {Vancouver, BC, Canada},
    series = {ASPLOS 2023}
}

@inproceedings{metaschedule,
    author = {Shao, Junru and Zhou, Xiyou and Feng, Siyuan and Hou, Bohan and Lai, Ruihang and Jin, Hongyi and Lin, Wuwei and Masuda, Masahiro and Yu, Cody Hao and Chen, Tianqi},
    booktitle = {Advances in Neural Information Processing Systems},
    editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
    pages = {35783--35796},
    publisher = {Curran Associates, Inc.},
    title = {Tensor Program Optimization with Probabilistic Programs},
    url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/e894eafae43e68b4c8dfdacf742bcbf3-Paper-Conference.pdf},
    volume = {35},
    year = {2022}
}

@inproceedings{tvm,
    author = {Tianqi Chen and Thierry Moreau and Ziheng Jiang and Lianmin Zheng and Eddie Yan and Haichen Shen and Meghan Cowan and Leyuan Wang and Yuwei Hu and Luis Ceze and Carlos Guestrin and Arvind Krishnamurthy},
    title = {{TVM}: An Automated {End-to-End} Optimizing Compiler for Deep Learning},
    booktitle = {13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)},
    year = {2018},
    isbn = {978-1-939133-08-3},
    address = {Carlsbad, CA},
    pages = {578--594},
    url = {https://www.usenix.org/conference/osdi18/presentation/chen},
    publisher = {USENIX Association},
    month = oct,
}

Acknowledgements

This project is initiated by members from CMU catalyst, UW SAMPL, SJTU, OctoML and the MLC community. We would love to continue developing and supporting the open-source ML community.

This project is only possible thanks to the shoulders open-source ecosystems that we stand on. We want to thank the Apache TVM community and developers of the TVM Unity effort. The open-source ML community members made these models publicly available. PyTorch and Hugging Face communities that make these models accessible. We would like to thank the teams behind Vicuna, SentencePiece, LLaMA, Alpaca, MOSS and RWKV. We also would like to thank the Vulkan, Swift, C++, Python Rust communities that enables this project.

shawnlu25 / mlc-llm Goto Github PK

mlc-llm's Introduction

MLC LLM

News

Getting Started

Universal Deployment APIs

Citation

Links

Acknowledgements

mlc-llm's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent