Comments (19)
Stop token: generation should stop when the model emits a stop token. I haven't found that option in the CLI. I suppose you have it built in for each supported model.
from llama.cpp.
I was surprised that there is no such setting listed in --help. Is it implemented, or is it considered out of scope for this project?
from llama.cpp.
The [end of text] output corresponds to a special token (number 2) in the LLaMA vocabulary. As for stopping on other token strings, the "reverse prompt" parameter does that in interactive mode now, with exactly the opening post's use case in mind. Is there a use case for something like it in non-interactive mode?
from llama.cpp.
[end of text] is 5 tokens:
518 -> ' ['
355 -> 'end'
310 -> ' of'
1426 -> ' text'
29962 -> ']'
I looked in the vocab file to see if there are any uncommon long tokens that would be cheaper stop tokens, and I found ' arquitect' to be a single token that I don't expect to show up in the dialogue:
28827 -> ' arquitect'
from llama.cpp.
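Because a stop string like "[end of text]" tokenizes into several tokens, a robust check has to run on the detokenized output rather than on individual token IDs. A minimal sketch of that buffering idea, with hypothetical helpers and no llama.cpp API calls:

    // Sketch: match a multi-token stop string against the detokenized output.
    // `piece` is assumed to be the text of the newly generated token; the
    // names here are illustrative, not llama.cpp API.
    #include <string>

    static bool ends_with(const std::string & text, const std::string & suffix) {
        return text.size() >= suffix.size() &&
               text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
    }

    // Call once per generated token; returns true when generation should stop.
    bool check_stop(std::string & generated, const std::string & piece,
                    const std::string & stop_string) {
        generated += piece;
        return ends_with(generated, stop_string);
    }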
Yeah, it would just be useful to have more control over that in cases where the model itself doesn't want to stop.
from llama.cpp.
It could be useful for cases where you want to pull structured data out of the model (for example, asking for a city’s population, then reading tokens up until the next whitespace to get the number out).
from llama.cpp.
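As a toy version of that structured-output idea, one could keep appending detokenized pieces and cut the value off at the first whitespace; the helper below is hypothetical and independent of the llama.cpp API:

    // Sketch: accumulate output until the next whitespace to pull a value
    // (e.g. a population number) out of the generated text.
    #include <cctype>
    #include <string>

    // Feed each detokenized piece; returns true once `value` is complete.
    bool take_until_whitespace(std::string & value, const std::string & piece) {
        for (char c : piece) {
            if (std::isspace(static_cast<unsigned char>(c))) {
                if (!value.empty()) {
                    return true;        // value ends at the first whitespace
                }
            } else {
                value += c;
            }
        }
        return false;                   // keep reading tokens
    }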
It is absolutely useful in non-interactive mode. In any "conversation"-style input it prevents the model from talking to itself. To really make this useful you would also need a switch that stops the program from re-printing the prompt, so it prints only the newly generated output.
from llama.cpp.
[end of text] is actually a single token (sometimes represented as </s>, but llama.cpp translates it to the empty string by default) that we have special behavior for.
from llama.cpp.
Ah, I see. I guess #365 doesn't work because you can't encode the stop token as a string literal, so you have to use another set of tokens, which doesn't always work.
from llama.cpp.
I believe there already are stop keywords. At least some of my responses end with [end of text] before the character limit.
from llama.cpp.
Yes, seconding this. It's sometimes very important to set a name prefix or even a newline character as the stop keyword.
from llama.cpp.
These stop keywords would have to be recorded in token space, and at each generated token a check for a possible match made. Seems like the right way to do that would be a state machine.
There may be other uses down the line where a callback is called every time a match is made, which could be useful for implementing "actions", although that may be outside the scope here.
from llama.cpp.
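A sketch of that idea, tracking several stop keywords at once over the detokenized stream and firing a callback (the proposed "action") on a match. It uses a simple suffix check rather than a full state machine, and all names are hypothetical rather than llama.cpp API:

    // Sketch: watch the generated text for any of several stop keywords and
    // invoke a callback when one is hit. Names are illustrative only.
    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <string>
    #include <vector>

    struct StopMatcher {
        std::vector<std::string> keywords;
        std::function<void(const std::string &)> on_match;  // optional "action" hook
        std::string tail;                                    // recent output only

        // Feed each detokenized piece; returns true if generation should halt.
        bool feed(const std::string & piece) {
            tail += piece;
            for (const auto & kw : keywords) {
                if (tail.size() >= kw.size() &&
                    tail.compare(tail.size() - kw.size(), kw.size(), kw) == 0) {
                    if (on_match) on_match(kw);
                    return true;
                }
            }
            // Keep only as much history as the longest keyword could span.
            std::size_t keep = 0;
            for (const auto & kw : keywords) keep = std::max(keep, kw.size());
            if (tail.size() > keep) tail.erase(0, tail.size() - keep);
            return false;
        }
    };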
@j-f1 Why do my llama.cpp logs show 5 tokens (see above)? I am using the stop-keywords code.
from llama.cpp.
That's because you're trying to tokenize that literal string; if you search the source code for "[end of text]" you'll see where it gets printed out.
from llama.cpp.
Should this be considered resolved by #1032? The chain of closed-in-favor-of issues led me there, but it doesn't actually refer back to this issue.
from llama.cpp.
Seems reasonable to me.
from llama.cpp.
@Bec-k can you elaborate on what you think is not implemented?
from llama.cpp.
This is the -r option at the command line.
-r, --reverse-prompt PROMPT   halt generation at PROMPT, return control in interactive mode
from llama.cpp.
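For reference, a typical invocation looks something like the line below (the example binary has been renamed across versions, so ./main vs. llama-cli depends on your build, and the model path is a placeholder):

    ./llama-cli -m model.gguf -i -p "User: Hi there. Assistant:" -r "User:"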
If I do -r "<|im_end|>", it does not work and generation continues.
from llama.cpp.