Comments (19)
Stop token: generation should stop when the model emits a stop token. I haven't found that option in the CLI. I suppose you have it built in for each supported model.
from llama.cpp.
I was surprised that there is no such setting listed in --help. Is it implemented, or is it considered out of scope for this project?
from llama.cpp.
The [end of text] output corresponds to a special token (number 2) in the LLaMA vocabulary. As for stopping on other token strings, the "reverse prompt" parameter does that in interactive mode now, with exactly the opening post's use case in mind. Is there a use case for something like it in non-interactive mode?
from llama.cpp.
[end of text] is 5 tokens:
518 -> ' ['
355 -> 'end'
310 -> ' of'
1426 -> ' text'
29962 -> ']'
I looked in the vocab file to see if there are any uncommon long tokens that would be cheaper stop tokens, and I found ' arquitect' to be a single token that I don't expect to show up in the dialogue:
28827 -> ' arquitect'
from llama.cpp.
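Because a stop string like "[end of text]" tokenizes into several tokens, a robust check has to run on the detokenized output rather than on individual token IDs. A minimal sketch of that buffering idea, with hypothetical helpers and no llama.cpp API calls:

    // Sketch: match a multi-token stop string against the detokenized output.
    // `piece` is assumed to be the text of the newly generated token; the
    // names here are illustrative, not llama.cpp API.
    #include <string>

    static bool ends_with(const std::string & text, const std::string & suffix) {
        return text.size() >= suffix.size() &&
               text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
    }

    // Call once per generated token; returns true when generation should stop.
    bool check_stop(std::string & generated, const std::string & piece,
                    const std::string & stop_string) {
        generated += piece;
        return ends_with(generated, stop_string);
    }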
Yeah, it would just be useful to have more control over that in cases where the model itself doesn't want to stop.
from llama.cpp.
It could be useful for cases where you want to pull structured data out of the model (for example, asking for a city’s population, then reading tokens up until the next whitespace to get the number out).
from llama.cpp.
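As a toy version of that structured-output idea, one could keep appending detokenized pieces and cut the value off at the first whitespace; the helper below is hypothetical and independent of the llama.cpp API:

    // Sketch: accumulate output until the next whitespace to pull a value
    // (e.g. a population number) out of the generated text.
    #include <cctype>
    #include <string>

    // Feed each detokenized piece; returns true once `value` is complete.
    bool take_until_whitespace(std::string & value, const std::string & piece) {
        for (char c : piece) {
            if (std::isspace(static_cast<unsigned char>(c))) {
                if (!value.empty()) {
                    return true;        // value ends at the first whitespace
                }
            } else {
                value += c;
            }
        }
        return false;                   // keep reading tokens
    }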
It is absolutely useful in non-interactive mode. In any "conversation"-style input it prevents the model from talking to itself. To really make this useful you would also need a switch that stops the program from re-printing the prompt, so it prints only the newly generated output.
from llama.cpp.
[end of text] is actually a single token (sometimes represented as </s>, but llama.cpp translates it to the empty string by default) that we have special behavior for.
from llama.cpp.
Ah, I see. I guess #365 doesn't work because you can't encode the stop token as a string literal, so you have to use another set of tokens, which doesn't always work.
from llama.cpp.
I believe there already are stop keywords. At least some of my responses end with [end of text] before the character limit.
from llama.cpp.
Yes, seconding this. It's sometimes very important to set a name prefix or even a newline character as the stop keyword.
from llama.cpp.
These stop keywords would have to be recorded in token space, and at each generated token a check for a possible match made. Seems like the right way to do that would be a state machine.
There may be other uses down the line where a callback is called every time a match is made, which could be useful for implementing "actions", although that may be outside the scope here.
from llama.cpp.
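A sketch of that idea, tracking several stop keywords at once over the detokenized stream and firing a callback (the proposed "action") on a match. It uses a simple suffix check rather than a full state machine, and all names are hypothetical rather than llama.cpp API:

    // Sketch: watch the generated text for any of several stop keywords and
    // invoke a callback when one is hit. Names are illustrative only.
    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <string>
    #include <vector>

    struct StopMatcher {
        std::vector<std::string> keywords;
        std::function<void(const std::string &)> on_match;  // optional "action" hook
        std::string tail;                                    // recent output only

        // Feed each detokenized piece; returns true if generation should halt.
        bool feed(const std::string & piece) {
            tail += piece;
            for (const auto & kw : keywords) {
                if (tail.size() >= kw.size() &&
                    tail.compare(tail.size() - kw.size(), kw.size(), kw) == 0) {
                    if (on_match) on_match(kw);
                    return true;
                }
            }
            // Keep only as much history as the longest keyword could span.
            std::size_t keep = 0;
            for (const auto & kw : keywords) keep = std::max(keep, kw.size());
            if (tail.size() > keep) tail.erase(0, tail.size() - keep);
            return false;
        }
    };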
@j-f1 Why do my llama.cpp logs show 5 tokens (see above)? I am using the stop-keywords code.
from llama.cpp.
That's because you're trying to tokenize that literal string; if you search the source code for "[end of text]" you'll see where it gets printed out.
from llama.cpp.
Should this be considered resolved by #1032? The chain of closed-in-favor-of issues led me there, but it doesn't actually refer back to this issue.
from llama.cpp.
Seems reasonable to me.
from llama.cpp.
@Bec-k can you elaborate on what you think is not implemented?
from llama.cpp.
This is the -r option at the command line.
-r, --reverse-prompt PROMPT   halt generation at PROMPT, return control in interactive mode
from llama.cpp.
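For reference, a typical invocation looks something like the line below (the example binary has been renamed across versions, so ./main vs. llama-cli depends on your build, and the model path is a placeholder):

    ./llama-cli -m model.gguf -i -p "User: Hi there. Assistant:" -r "User:"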
If I do -r "<|im_end|>", it does not work and generation continues.
from llama.cpp.