
Comments (4)

mudler commented on June 18, 2024

@netandreus this most likely happens when the prompt exhausts the context size. Can you check whether that is the cause in your case by bumping the context size?

In any case, it sounds reasonable to bail out early instead of trying to free space in the KV cache. This also seems related to #2258. Can you also try setting batch to 1 in the model configuration and see if it keeps happening?

parameters:
  batch: 1

from localai.

DavidGOrtega commented on June 18, 2024

Related to #2258.


netandreus commented on June 18, 2024

Thank you for the assistance, I will check.


imihic commented on June 18, 2024

It seems that this error also happens if we enable parallel llama.cpp processing. For example, with the context size set to 8192 and the number of parallel processes set to 20, token stream generation always stops at around 410 characters, which is roughly 8192 divided by 20.

So, instead of each process allocating a context window of 8192, as specified in the .env/yaml file, the backend takes this value and splits it between all the processes.

Is this a bug or expected behaviour? If it's expected, it might not be a bad idea to clarify this behaviour in the documentation.
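
To make the arithmetic concrete, here is a minimal Python sketch. It assumes the backend divides the configured context evenly across the parallel processes, which is what the observation above suggests but has not been confirmed. The function names are hypothetical and are not part of LocalAI or llama.cpp.

```python
def per_slot_context(context_size: int, parallel_slots: int) -> int:
    """Context available to each slot, ASSUMING an even split of the total."""
    if parallel_slots < 1:
        raise ValueError("parallel_slots must be at least 1")
    return context_size // parallel_slots


def context_needed(per_slot_target: int, parallel_slots: int) -> int:
    """Total context to configure so each slot gets per_slot_target (same assumption)."""
    if parallel_slots < 1:
        raise ValueError("parallel_slots must be at least 1")
    return per_slot_target * parallel_slots


if __name__ == "__main__":
    # Values from the report: context size 8192, 20 parallel processes.
    print(per_slot_context(8192, 20))  # 409
    # Total context required if every process should really get 8192.
    print(context_needed(8192, 20))    # 163840
```

Under that assumption, giving each of the 20 processes a full 8192 would mean configuring a total context of 163840, which may not fit in memory for a given model.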

