
Comments (6)

j-loquat commented on June 2, 2024

One idea for this is that you could allow up to two local models to be loaded and assigned to one or more agents. We could load one model into the GPU and the other into the CPU with some RAM allocated to it: say, Llama 2 on the GPU for most of the agents, and a smaller Python-optimized model on the CPU for the engineer agent.
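
A minimal sketch of how that split could look with llama-cpp-python (hypothetical, not ChatDev code; the model paths and role names are placeholders):

```python
# Hypothetical sketch: two local models via llama-cpp-python, one fully
# offloaded to the GPU and one kept on the CPU, routed per agent role.
from llama_cpp import Llama

# Main model: offload every layer to VRAM (path is a placeholder).
gpu_model = Llama(
    model_path="models/llama-2-13b-chat.Q4_K_M.gguf",
    n_gpu_layers=-1,  # -1 = offload all layers to the GPU
    n_ctx=4096,
)

# Smaller code-focused model: n_gpu_layers=0 keeps it entirely in system RAM.
cpu_model = Llama(
    model_path="models/codellama-7b-instruct.Q4_K_M.gguf",
    n_gpu_layers=0,
    n_ctx=4096,
)

# Route each agent role to a model (role names are illustrative).
AGENT_MODELS = {
    "ceo": gpu_model,
    "cto": gpu_model,
    "reviewer": gpu_model,
    "engineer": cpu_model,  # the Python-optimized model from the comment
}

def ask(role: str, prompt: str) -> str:
    out = AGENT_MODELS[role](prompt, max_tokens=512, temperature=0.3)
    return out["choices"][0]["text"]
```

Note that n_gpu_layers also allows a partial split within a single model, but keeping the assignment per-agent keeps the routing simple.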

andraz commented on June 2, 2024

In theory, a long-running process could:

  1. accumulate a queue of prompts/tasks for a specific agent
  2. swap the engine to that agent and load it in 10-30 s
  3. perform the queued actions and save the agent's results
  4. prepare queues for other agents from those results
  5. repeat from step 1.

This would let us use big models more efficiently, without accumulating a large time penalty from VRAM loading on every swap.
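
A minimal sketch of that loop (hypothetical, again assuming llama-cpp-python; the model paths and the step-4 hand-off rule are placeholders):

```python
# Hypothetical sketch of the batched queue-and-swap loop described above.
from collections import defaultdict, deque
from llama_cpp import Llama

MODEL_PATHS = {  # placeholder paths: one local model per agent role
    "engineer": "models/codellama-7b-instruct.Q4_K_M.gguf",
    "reviewer": "models/llama-2-13b-chat.Q4_K_M.gguf",
}

queues: dict[str, deque] = defaultdict(deque)   # role -> pending prompts
results: dict[str, list] = defaultdict(list)    # role -> finished outputs

def run_round() -> None:
    for role, queue in queues.items():
        if not queue:
            continue
        # Step 2: one model swap per agent per round (the 10-30 s cost).
        model = Llama(model_path=MODEL_PATHS[role], n_gpu_layers=-1, n_ctx=4096)
        # Step 3: drain the whole queue while the model is resident in VRAM.
        while queue:
            out = model(queue.popleft(), max_tokens=512)
            results[role].append(out["choices"][0]["text"])
        del model  # release VRAM before loading the next agent's model
    # Step 4: turning results into new tasks is application-specific;
    # e.g., feed the engineer's output to the reviewer's queue.
    for code in results.pop("engineer", []):
        queues["reviewer"].append(f"Review this code:\n{code}")
```

The key point is that the swap cost is paid once per agent per round instead of once per prompt.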

hemangjoshi37a commented on June 2, 2024

@andraz @j-loquat your solutions and suggestions look good to implement.

Alphamasterliu commented on June 2, 2024

Hello, regarding the use of other GPT models or local models, you can refer to the discussion on our GitHub page: #27. Some of these models have corresponding configurations in this Pull Request: #53. You may consider forking the project and giving them a try. While our team currently lacks the time to test every model, it's worth noting that they have received positive feedback and reviews. If you have any other questions, please don't hesitate to ask. We truly appreciate your support and suggestions. We are continuously working on more significant features, so please stay tuned.😊

hemangjoshi37a commented on June 2, 2024

This has been referenced in #27

j-loquat commented on June 2, 2024

One thing to consider with local LLM agents is that we should keep the prompts shorter than for OpenAI and reduce the temperature, perhaps to below 0.5. Lower temperature and shorter prompts make a huge difference in local response times, as per the GPT4All project.
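
A minimal sketch of what those settings could look like with the GPT4All Python bindings (the model filename is a placeholder):

```python
# Hypothetical sketch: short prompt plus temperature below 0.5, per the
# comment above, using the GPT4All Python bindings.
from gpt4all import GPT4All

model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")  # placeholder model

# Keep the prompt terse; long system prompts slow local inference noticeably.
prompt = "Write a Python function that reverses a string."
with model.chat_session():
    reply = model.generate(prompt, max_tokens=256, temp=0.3)
print(reply)
```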
