Thanks for your work! I would like to ask how fair the comparisons shown by Spec-Bench are. For example, REST can control the size of the datastore it maintains, and Lookahead can control the N-gram length and the size of the pool. Given these tunable hyperparameters, how do you ensure that the reported results constitute a fair comparison? I'm not quite sure, and it would be greatly appreciated if you could provide further explanation.
Thank you so much for your work in establishing this benchmark!
I wanted to ask whether you would consider adding PaSS (https://arxiv.org/pdf/2311.13581.pdf) to your benchmark analysis, since it is included in your references. It supports both nucleus sampling and greedy sampling, though it does require training extra token embeddings.
I noticed that in Table 3 you present a summary of different decoding methodologies.
In particular, you present REST as a methodology amenable to nucleus sampling, but after reading the REST paper, it is not clear how the authors establish that their method preserves the original output distribution of the LLM.