Comments (2)
Hey, thanks for reporting this! Just noticed now that we don't wire up loaded sessions to the REPL - that should be easy enough to fix.
As for switching out the cached sessions per prompt, I think that's out of scope for the CLI, but it should be easy enough to implement yourself. You can see that session loading is done here:
https://github.com/rustformers/llama-rs/blob/main/llama-cli/src/main.rs#L224-L244
and the REPL just runs in a loop, feeding the user prompt to the session:
https://github.com/rustformers/llama-rs/blob/main/llama-cli/src/main.rs#L21-L63
So all you'd have to do is load the model once, and for each session: load an existing session if required, run inference, and then persist the session back to disk.
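A minimal sketch of that flow, using only the standard library. `Model`, `Session`, `load_or_new`, `persist`, and `run_prompt` are hypothetical stand-ins invented for illustration and are not the llama-rs API; the "session" here is plain text on disk rather than real model state, and "inference" is a placeholder.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for a loaded model (not the llama-rs type).
struct Model;

/// Hypothetical stand-in for an inference session. It only stores the
/// accumulated prompt text instead of real cached model state.
struct Session {
    context: String,
}

impl Model {
    /// Loading is the expensive step, so the caller does it once.
    fn load() -> Model {
        Model
    }

    /// Placeholder "inference": appends the prompt to the session context
    /// and returns a dummy reply describing the context size.
    fn infer(&self, session: &mut Session, prompt: &str) -> String {
        session.context.push_str(prompt);
        session.context.push('\n');
        format!("(reply after {} bytes of context)", session.context.len())
    }
}

impl Session {
    /// Loads a persisted session if the file exists, otherwise starts fresh.
    fn load_or_new(path: &Path) -> io::Result<Session> {
        match fs::read_to_string(path) {
            Ok(context) => Ok(Session { context }),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Session {
                context: String::new(),
            }),
            Err(e) => Err(e),
        }
    }

    /// Writes the session back to disk so the next prompt can resume it.
    fn persist(&self, path: &Path) -> io::Result<()> {
        fs::write(path, &self.context)
    }
}

/// Runs one prompt against the session cached at `path`:
/// load (or create) the session, run inference, persist the result.
fn run_prompt(model: &Model, path: &Path, prompt: &str) -> io::Result<String> {
    let mut session = Session::load_or_new(path)?;
    let reply = model.infer(&mut session, prompt);
    session.persist(path)?;
    Ok(reply)
}
```

The caller would invoke `Model::load()` once and then call `run_prompt` with a different path for each cached session; in a real implementation the load and persist steps would go through whatever session serialization the library provides.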
from llm.
Thanks for the quick update!
I'll try to implement the cache switching myself. I've never worked with Rust, but I'll give it a go; it should be fun. Thanks for pointing me in the right direction.