hemingkx / spec-bench

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding

Home Page: https://sites.google.com/view/spec-bench

License: Apache License 2.0

Languages: Python 65.40%, C 32.65%, Rust 1.72%, Shell 0.23%


Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding

| Paper | Blog | Leaderboard | Roadmap |

[Figure: timeline of Speculative Decoding methods]

Speedup comparison of Speculative Decoding methods on Spec-Bench, evaluated on Vicuna-7B-v1.3.

Introduction

Spec-Bench is a comprehensive benchmark designed for assessing Speculative Decoding methods across diverse scenarios. Based on Spec-Bench, we aim to establish and maintain a unified evaluation platform for open-source Speculative Decoding approaches. This platform enables systematic assessment of existing methods on the same device and in the same testing environment, thereby ensuring fair comparisons.

Currently, Spec-Bench supports the evaluation of open-source Speculative Decoding methods including EAGLE, Hydra, Medusa, REST, and Lookahead; see the Leaderboard for the full list.

Update

2024.3.12: We now support statistics for #Mean accepted tokens.

2024.3.11: We have integrated Hydra into Spec-Bench, check it out!
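For reference, the #Mean accepted tokens statistic can be reproduced from a generated answer file. The sketch below assumes an MT-bench-style record layout in which each choice carries a per-step accept_lengths list; these field names are assumptions about the output schema, not a verified specification.

import json

def mean_accepted_tokens(path: str) -> float:
    # Average the number of draft tokens accepted per decoding step,
    # pooled across all answers in the file ("accept_lengths" is assumed).
    lengths = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            for choice in record.get("choices", []):
                lengths.extend(choice.get("accept_lengths", []))
    return sum(lengths) / max(len(lengths), 1)

print(mean_accepted_tokens("data/spec_bench/model_answer/eagle.jsonl"))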

Installation

conda create -n specbench python=3.9
conda activate specbench
cd Spec-Bench
pip install -r requirements.txt

Model Weights

Download the corresponding model weights (if required) and modify the checkpoint path in eval.sh.

Additional Setup

REST (Optional)

Build DraftRetriever from source
cd model/rest/DraftRetriever
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
maturin build --release --strip -i python3.9 # will produce a .whl file
pip3 install ./target/wheels/draftretriever-0.1.0-cp39-cp39-linux_x86_64.whl
Create a datastore
cd model/rest/datastore
./datastore.sh # modify your own path

Inference

Select the desired command line in eval.sh; the results will be stored in data/spec_bench/model_answer/.

./eval.sh
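The answer files are JSON Lines. A quick way to sanity-check a run is to peek at the first record; the question_id, choices, and turns fields below assume an MT-bench-style layout and are assumptions, not a verified schema.

import json

# Load the first generated answer from a run (file name taken from the
# examples below; the record fields are assumptions).
with open("data/spec_bench/model_answer/eagle.jsonl") as f:
    first = json.loads(next(f))

print(first.get("question_id"))
print(first.get("choices", [{}])[0].get("turns", [""])[0][:200])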

Speedup Report

Obtain the corresponding speedup compared to vanilla autoregressive decoding.

python evaluation/speed.py --file-path /your_own_path/eagle.jsonl --base-path /your_own_path/vicuna.jsonl
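Conceptually, the reported speedup is the ratio of decoding throughput (generated tokens per second) between the evaluated method and the vanilla baseline. A minimal sketch, assuming each choice records new_tokens and wall_time lists (these field names are assumptions, not the verified speed.py logic):

import json

def throughput(path: str) -> float:
    # Total generated tokens divided by total decoding wall time
    # ("new_tokens" and "wall_time" are assumed field names).
    tokens, seconds = 0.0, 0.0
    with open(path) as f:
        for line in f:
            for choice in json.loads(line).get("choices", []):
                tokens += sum(choice.get("new_tokens", []))
                seconds += sum(choice.get("wall_time", []))
    return tokens / seconds

speedup = throughput("eagle.jsonl") / throughput("vicuna.jsonl")
print(f"speedup: {speedup:.2f}x")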

Result Comparison

Check whether the generated results are identical to those of vanilla autoregressive decoding.

python evaluation/equal.py --file-path /your_own_path/model_answer/ --jsonfile1 vicuna.jsonl --jsonfile2 eagle.jsonl
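In spirit, the check verifies that, under greedy settings, each question's generated text matches vanilla autoregressive decoding exactly. A minimal sketch under the same schema assumptions as above (not the verified equal.py logic):

import json

def load_answers(path: str) -> dict:
    # Map each question_id to its generated turns
    # ("question_id", "choices", and "turns" are assumed field names).
    answers = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            answers[record["question_id"]] = [
                turn
                for choice in record.get("choices", [])
                for turn in choice.get("turns", [])
            ]
    return answers

base = load_answers("model_answer/vicuna.jsonl")
spec = load_answers("model_answer/eagle.jsonl")
mismatched = [qid for qid in base if base[qid] != spec.get(qid)]
print("mismatched questions:", mismatched or "none")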

Contributing

We warmly welcome contributions and discussions related to Spec-Bench! If you have any suggestions for improvements or ideas you'd like to discuss, please don't hesitate to open an issue. This will allow us to collaborate and discuss your ideas in detail.

More models are welcome! If you're aware of any open-source Speculative Decoding methods not currently included in Spec-Bench, we encourage you to contribute by submitting a pull request. This helps ensure that Spec-Bench remains a comprehensive and fair benchmarking platform for comparing existing methods. Please ensure that your changes are well-tested before submission.

Acknowledgments

This codebase is built on Medusa and EAGLE. We integrated the code implementations of multiple open-source Speculative Decoding methods to facilitate unified evaluation.

Citation

If you find the resources in this repository useful, please cite our paper:

@misc{xia2024unlocking,
      title={Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding}, 
      author={Heming Xia and Zhe Yang and Qingxiu Dong and Peiyi Wang and Yongqi Li and Tao Ge and Tianyu Liu and Wenjie Li and Zhifang Sui},
      year={2024},
      eprint={2401.07851},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

spec-bench's Issues

REST methodology verification process

I noticed that in Table 3 you present a summary of different decoding methodologies.

In particular, you present REST as a methodology amenable to nucleus sampling, but after reading the REST paper, it is not clear how the authors establish that their method preserves the LLM's original output distribution.

How can the comparison be considered fair?

Thanks for your work! I would like to ask how the comparison results shown by Spec-Bench can be considered fair. For example, REST can vary the size of the datastore it maintains, and Lookahead can vary the N-gram length and the size of its pool. Given these method-specific hyperparameters, how do the results provided by Spec-Bench remain fair? Further explanation would be greatly appreciated.

Add Hydra

Maybe consider adding Hydra to these? It's a very similar approach to EAGLE, so I'd be curious to see what the performance difference is.

PaSS methodology.

Thank you so much for your work in establishing this benchmark!

I wanted to ask if you would consider adding PaSS (https://arxiv.org/pdf/2311.13581.pdf) to your benchmark analysis, since it is included in your references. It allows both nucleus and greedy sampling, though it does require training extra token embeddings.
