c0sogi / llmchat
A full-stack WebUI implementation of Large Language Models, such as ChatGPT or LLaMA.
License: MIT License
`QDRANT_COLLECTION: str = environ["QDRANT_COLLECTION"]`
and
`shared_vectorestore_name: str = QDRANT_COLLECTION`
This makes a single Qdrant collection shareable across instances.
"WARNING You Probably Don't Need this Docker Image: " we should follow that advice and remove Gunicorn and the requirement for forwarded-ip. You can't used fixed ip for traefik in swarm mode or Kubernetes. Also better to let swarm manage replication instead of Gunicorn workers
Why so many prompts for a simple one-sentence chat? The embeddings are fine, but I think the app is sending the same chat message to OpenAI in a loop.
The usage dashboard shows 16 requests in the 6:00 AM window:

| Dashboard time | Local time (May 21, 2023) | Model | Requests | Tokens (prompt + completion) |
|---|---|---|---|---|
| 6:05 AM | 2:05 AM | gpt-4-0314 | 1 | 271 + 6 = 277 |
| 6:05 AM | 2:05 AM | text-embedding-ada-002-v2 | 4 | 600 + 0 = 600 |
| 6:10 AM | 2:10 AM | gpt-4-0314 | 3 | 1,498 + 155 = 1,653 |
| 6:10 AM | 2:10 AM | text-embedding-ada-002-v2 | 2 | 37 + 0 = 37 |
| 6:15 AM | 2:15 AM | gpt-4-0314 | 4 | 5,737 + 148 = 5,885 |
| 6:15 AM | 2:15 AM | text-embedding-ada-002-v2 | 2 | 14 + 0 = 14 |
I have an idea for company/product. I think it's winner take all situation but it requires a first mover advantage. Btw I live in Canada
The chat box should increase in height as you add more content. A fixed height makes it hard to use long prompts.
Admin should be able to examine and delete redundant vectors by means of stored metadata. Date of insert, author, title, etc. should be saved in MySQL. Saving them in Redis may pollute the vectors?
@c0sogi Hi, thanks for working on this; I was creating something using it.
Could you help me out please? At `item: MessageFromWebsocket | str = await buffer.queue.get()`
my sender gets stuck for some reason.
I'm using a Svelte frontend client.
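A minimal sketch for narrowing this down, assuming an asyncio.Queue like the one in the snippet (the timeout wrapper is a debugging suggestion, not repo code): if the timeout fires, the websocket receiver never enqueued anything, so the client-side message format is the first thing to check.

```python
# Sketch: wrap queue.get() in a timeout to see whether the producer side
# (the websocket receiver) ever enqueues a message.
import asyncio

async def get_with_timeout(queue: asyncio.Queue, seconds: float = 5.0):
    try:
        return await asyncio.wait_for(queue.get(), timeout=seconds)
    except asyncio.TimeoutError:
        print("queue.get() timed out: nothing was enqueued by the receiver")
        return None
```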
Trying to install it on a LAN so everyone within the network can use it. I added the IP as HOST_MAIN="192.168.2.202". The chat interface is reachable, but authentication is denied.
Navigating to sub.domain.tld/chatgpt returns a 404 page not found.
The connection is secure and Let's Encrypt appears to be working.
Has anyone used it?
I launch it through the docker command, and there is a user/password login page in the left sidebar. I can't register a new user; it always reports an XMLHttpRequest error.
Then I went into the MySQL docker container, looked at the table info, and found the users/api_keys tables, so I inserted one row into the users table:
```sql
INSERT INTO users (status, email, password, marketing_agree, created_at, updated_at)
VALUES ('admin', '[email protected]', '12341234', 1, now(), now());
```
I saw the new user exists, but I still can't log in from the UI. Are there any detailed instructions?
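A likely cause: the app almost certainly stores password hashes, so a plaintext '12341234' in the password column will never match at login. A minimal sketch of generating a hash to insert instead; bcrypt as the scheme is an assumption, so check the registration code for the actual algorithm:

```python
# Sketch: produce a bcrypt hash for manual insertion into users.password.
# Whether LLMChat actually uses bcrypt is an assumption.
import bcrypt

hashed = bcrypt.hashpw(b"12341234", bcrypt.gensalt())
print(hashed.decode())  # use this value in the INSERT above
```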
In llms.py your example model reference location is model_path="./llama_models/ggml/Wizard-Vicuna-7B-Uncensored.ggmlv2.q4_1.bin".
This leads me to believe the model should be at LLMChat/app/llama_models/ggml/Wizard-Vicuna-7B-Uncensored.ggmlv2.q4_1.bin,
but the app is not able to load the model.
I created the llama_models and ggml directories, as there were none.
Thank you
We should use /imped (notice the p) to embed text into a private in-browser db. This project makes it possible: Vector Storage.
For production, which value is correct: API_ENV="prod" or API_ENV="production"?
So far we have achieved a lot with this codebase. Perhaps it's time we pause new features and refactor the code base to allow for a core/plugin structure. Adopting a modular architecture will give this project incredible flexibility. I want us to be the "Drupal" of this space, but we must structure it now before the code base grows too big.
Let's standardize what a module is and push:
This allows developers to add and maintain modules without interfering with core. The project is usable as is. I will send you an email shortly. Your thoughts?
If the email has a `.` the registration fails, e.g. [email protected] fails.
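This smells like an over-strict validator that rejects dots in the local part. A sketch of a permissive check; the pattern is illustrative only, not the repo's actual validator:

```python
# Sketch: email check that accepts dots in the local part.
import re

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

assert EMAIL_RE.match("first.last@example.com")  # dotted local part passes
assert EMAIL_RE.match("plain@example.com")
```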
Integrate FastAPI Admin or Amis version
Dear LLMChat developer,
Greetings! I am vansinhu, a community developer and volunteer at InternLM. Your work has been immensely beneficial to me, and I believe it can be effectively utilized in InternLM as well. Feel free to join our Discord: https://discord.gg/gF9ezcmtM3. I hope to get in touch with you.
Best regards,
vansinhu
`chunk_overlap: int = 0,`
The overlap should be about 100 to preserve meanings cut off by chunk_size; see the sketch below.
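A minimal pure-Python sketch of why overlap matters (the repo's splitter may differ): with overlap > 0, text straddling a chunk boundary reappears at the start of the next chunk instead of being cut in half.

```python
# Sketch: fixed-size chunking with overlap between consecutive chunks.
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```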
Let's implement Qdrant for embeddings. Use Redis for what it's good at: caching chats. Qdrant is fast and stable and excellent at search and filtering; see the benchmark.
Redis's single-threaded execution is bad for vertical scaling. Down the road we should allow BYOD (bring your own db).
Please implement it as a separate docker container for independent scaling:
```
docker run -p 6333:6333 qdrant/qdrant
```
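A minimal sketch of talking to that container from the app with the qdrant-client package; the collection name and vector size (1536 for ada-002) are placeholders:

```python
# Sketch: connect to the standalone Qdrant container and create a collection.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(host="localhost", port=6333)
client.recreate_collection(
    collection_name="shared_vectorstore",  # placeholder name
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
```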
Courtesy of chatpad
OpenAI says to use "text-embedding-ada-002" for all text embeddings. It's very cheap; gpt-3.5/4 are 1000x more expensive. `tokenizer_model: str = "text-embedding-ada-002"`
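For reference, a sketch of the embedding call with the 2023-era openai package (model name taken from the comment above):

```python
# Sketch: fetch an ada-002 embedding (openai-python < 1.0 API).
import openai

resp = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="What is LLMChat?",
)
vector = resp["data"][0]["embedding"]  # 1536 floats
```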
Possible abuse of shared memory if all authenticated users can embed text. Restrict embedding to admin/editor roles; a sketch follows.
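A minimal sketch of such a role gate as a FastAPI dependency; the role names and the get_current_user stand-in are hypothetical, not repo code:

```python
# Sketch: restrict an endpoint to given roles via a FastAPI dependency.
from dataclasses import dataclass
from fastapi import Depends, HTTPException

@dataclass
class User:
    status: str  # e.g. "admin" or "editor"; role names are assumptions

async def get_current_user() -> User:
    """Stand-in for the app's real auth dependency."""
    raise NotImplementedError

def require_role(*roles: str):
    async def checker(user: User = Depends(get_current_user)) -> User:
        if user.status not in roles:
            raise HTTPException(status_code=403, detail="Insufficient role")
        return user
    return checker

# usage: @app.post("/embed", dependencies=[Depends(require_role("admin", "editor"))])
```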
The chatroom title should be editable and default to a sensible summary of the initial prompt.
Implement /bypass to query the LLM without hitting the vector store. Otherwise chats must check the vector store for embeddings before interacting with the LLM. The purpose of this app, I believe, is to grant longer context memory. Being forced to add /query in front of every chat is tiresome. Embeddings can prompt the LLM how to behave on chat initiation.
There has to be a way to stop response generation. This is best practice.
In the chat UI, there is a long list of LLM models. The default one is GPT 3.5 Turbo, which is OpenAI's, I guess.
I configured the OpenAI API key in .env, so it should be used, as the answer is very fast.
When I try to switch to Llama 7B, it reports:
An error occurred while generating text: Model llama-7b-GGML is currently booting.
I set up another LLM engine, "vllm", based on the llama-2-7b-chat model and exposed on port 3000; it is compatible with the OpenAI API.
How can I configure the app to use this new engine?
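If the app routes through the openai-python client, pointing it at the vLLM server is mostly a base-URL change. A sketch with the 2023-era openai package; whether LLMChat exposes this as a setting is an assumption:

```python
# Sketch: send OpenAI-compatible requests to a local vLLM server.
import openai

openai.api_base = "http://localhost:3000/v1"
openai.api_key = "EMPTY"  # vLLM's server does not validate the key by default

resp = openai.ChatCompletion.create(
    model="llama-2-7b-chat",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp["choices"][0]["message"]["content"])
```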
My testing shows you need a minimum of 2 GB of RAM and Ubuntu 22.
Let's save some $$$ by implementing GPTCache.
There is a docker image, and I think it may already work with Redis:
```
docker pull zilliz/gptcache:latest
docker run -p 8000:8000 -it zilliz/gptcache:latest
```
Admin can use temperature settings to bypass the cache.
Important: the cache must maintain user privacy. Admin can add a sitewide cache. This will make FAQ generation a breeze and cost nothing each time cached info is retrieved.
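For the in-process route (as opposed to the docker server above), GPTCache ships an adapter that wraps the openai module. A minimal sketch of its simplest, exact-match mode:

```python
# Sketch: GPTCache's drop-in adapter; repeated identical prompts are
# answered from the cache instead of hitting the OpenAI API.
from gptcache import cache
from gptcache.adapter import openai

cache.init()            # default: exact match on the prompt
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is LLMChat?"}],
)
```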
OpenAI embedding is quite weak, proprietary, and token hungry. Let's move to USE (Universal Sentence Encoder). PDFGPT has an implementation of USE we can modify and build upon.
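For reference, a sketch of loading USE from TensorFlow Hub, which is how PDFGPT-style projects typically do it (the /4 model version is an assumption):

```python
# Sketch: Universal Sentence Encoder via TensorFlow Hub; outputs 512-dim vectors.
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
vectors = embed(["Hello world", "How are you?"])  # shape (2, 512)
```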
Hi,
I want to install this in production on a subdomain. Using sub.domain.tld as HOST_MAIN should work, I am guessing?
Thanks for this app; I will test it and give my 2 cents of feedback.
Ability to add a prefix/suffix instruction to the first message in a chat, ideally as a .env variable. I want to instruct OpenAI about policies. Kindly give me a pointer so I can attempt this.
Either make GPT-4 the default or allow admin to set it in .env. Users may override it with /model.
How do I use a DESCRIPTION_TMPL for OpenAI "chatroles.system" models? I see DESCRIPTION being passed as "description" in other models. GPT needs this feature; if it exists, how do I trigger it?
When I run everything in docker, the api container fails to build llama.cpp because cmake doesn't exist.
Does it require the external gcc to be 11?
>docker-compose -f docker-compose-local.yaml up api
[+] Running 3/0
✔ Container llmchat-cache-1 Running 0.0s
✔ Container llmchat-db-1 Running 0.0s
✔ Container llmchat-api-1 Created 0.0s
Attaching to llmchat-api-1, llmchat-cache-1, llmchat-db-1
llmchat-api-1 | None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
llmchat-api-1 | [2023-09-19 07:50:40,323] SQLAlchemy:CRITICAL - Current DB connection of LocalConfig: db/traffic@traffic_admin
llmchat-api-1 | INFO: Started server process [1]
llmchat-api-1 | INFO: Waiting for application startup.
llmchat-api-1 | [2023-09-19 07:50:41,191] ApiLogger:CRITICAL - ⚙️ Booting up...
llmchat-api-1 | [2023-09-19 07:50:41,191] ApiLogger:CRITICAL - MySQL DB connected!
llmchat-api-1 | [2023-09-19 07:50:41,195] ApiLogger:CRITICAL - Redis CACHE connected!
llmchat-api-1 | [2023-09-19 07:50:41,195] ApiLogger:CRITICAL - uvloop installed!
llmchat-api-1 | [2023-09-19 07:50:41,195] ApiLogger:CRITICAL - Llama CPP server monitoring started!
llmchat-api-1 | INFO: Application startup complete.
llmchat-api-1 | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
llmchat-api-1 | [2023-09-19 07:50:41,200] ApiLogger:ERROR - Llama CPP server is not available
llmchat-api-1 | [2023-09-19 07:50:41,200] ApiLogger:CRITICAL - Starting Llama CPP server
llmchat-api-1 | - Loaded .env file successfully.
llmchat-api-1 | - API_ENV: local
llmchat-api-1 | - DOCKER_MODE: True
llmchat-api-1 | - Parsing function for function calling: control_browser
llmchat-api-1 | - Parsing function for function calling: control_web_page
llmchat-api-1 | - Parsing function for function calling: web_search
llmchat-api-1 | - Parsing function for function calling: vectorstore_search
llmchat-api-1 | Using openai embeddings
llmchat-api-1 | 🦙 llama.cpp DLL not found, building it...
llmchat-api-1 | 🦙 Trying to build llama.cpp DLL: /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-cublas.sh
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-cublas.sh: line 2: cd: /app/repositories/llama_cpp/vendor/llama.cpp: No such file or directory
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-cublas.sh: line 6: cmake: command not found
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-cublas.sh: line 7: cmake: command not found
llmchat-api-1 | cp: cannot stat '/app/repositories/llama_cpp/vendor/llama.cpp/build/bin/Release/libllama.so': No such file or directory
llmchat-api-1 | 🦙 Could not build llama.cpp DLL!
llmchat-api-1 | 🦙 Trying to build llama.cpp DLL: /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-default.sh
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-default.sh: line 2: cd: /app/repositories/llama_cpp/vendor/llama.cpp: No such file or directory
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-default.sh: line 6: cmake: command not found
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-default.sh: line 7: cmake: command not found
llmchat-api-1 | cp: cannot stat '/app/repositories/llama_cpp/vendor/llama.cpp/build/bin/Release/libllama.so': No such file or directory
llmchat-api-1 | 🦙 Could not build llama.cpp DLL!
llmchat-api-1 | [2023-09-19 07:50:41,256] ApiLogger:WARNING - 🦙 Could not import llama-cpp-python repository: 🦙 Could not build llama.cpp DLL!
llmchat-api-1 | ...trying to import installed llama-cpp package...
llmchat-api-1 | INFO: 10.101.7.43:42488 - "GET / HTTP/1.1" 304 Not Modified
llmchat-api-1 | INFO: 10.101.7.43:42488 - "GET /main.dart.js HTTP/1.1" 200 OK
llmchat-api-1 | INFO: 10.101.7.43:49942 - "GET / HTTP/1.1" 304 Not Modified
llmchat-api-1 | INFO: 10.101.7.43:49942 - "GET /main.dart.js HTTP/1.1" 304 Not Modified
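The log shows both build scripts dying on `cmake: command not found`, and the missing /app/repositories/llama_cpp/vendor/llama.cpp directory suggests the llama-cpp-python submodule was never checked out (git clone --recurse-submodules). A sketch of the image-side fix, assuming a Debian/Ubuntu base image:

```dockerfile
# Sketch: install the toolchain the build-llama-cpp-*.sh scripts expect.
# Package names assume a Debian/Ubuntu base image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential cmake && \
    rm -rf /var/lib/apt/lists/*
```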
I propose we use a two-LLM approach to cut down on the cost of using GPT-4 and all expensive future variants.
This mostly applies if you are using GPT-4, but why use anything else :)
You have:
This may even get GPT-4 to be more focused and on point.
Can we remove the "new key" requirement and just load a new chatroom? It's a bit confusing.
Advice needed. I am trying to move the web frontend into a separate container and leave only FastAPI in api.
Deleted chats should be moved to a trash bin and permanently deleted after 30 days. Add a "trash" link to the sidebar; it should load a list of deleted chats in the main column. Let's mimic the Gmail interface for this feature. I think people forget OpenAI will soon change their UI to match the tried and tested email interface.
It is impossible to copy the result of a chat.
I altered the "user" method and added `content=ChatConfig.chat_role_system_message`:

```python
async def user(msg: str, translate: bool, buffer: BufferedUserContext) -> None:
    """Handle user message, including translation"""
    if len(buffer.current_user_message_histories) == 0 and UTC.check_string_valid(
        buffer.current_chat_room_name
    ):
        buffer.current_chat_room_name = msg[:20]
        await CacheManager.update_profile(
            user_chat_context=buffer.current_user_chat_context
        )
        # Add default system message at the start of a conversation
        await MessageManager.add_message_history_safely(
            user_chat_context=buffer.current_user_chat_context,
            content=ChatConfig.chat_role_system_message,
            role=ChatRoles.SYSTEM,
        )
        await SendToWebsocket.init(buffer=buffer, send_chat_rooms=True, wait_next_query=True)
```

and in ChatConfig I added:

```python
chat_role_system_message: Optional[str] = environ["CHAT_ROLE_SYSTEM_MESSAGE"]
```

Now the AI behaviour is more predictable.
This is the error I get:

```
File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/app/main.py", line 39, in <module>
    from app.common.app_settings import create_app
File "/app/app/common/app_settings.py", line 7, in <module>
    from app.auth.admin import MyAuthProvider
File "/app/app/auth/admin.py", line 5, in <module>
    from app.common.config import config
File "/app/app/common/config.py", line 177, in <module>
    config = Config.get()
File "/app/app/common/config.py", line 122, in get
    _config = {
KeyError: '"prod"'
```

(Note the key is '"prod"' with the quote characters included, which suggests the quotes in API_ENV="prod" were read as part of the value; try API_ENV=prod without quotes.)
OpenAI is beginning to enforce their trademark on "GPT".
https://techstartups.com/2023/05/04/openai-send-cease-owner-of-sitegpt-ai-forced-to-rebrand/
They may disallow API access for any site or project using that name.
Add tooltips to icons so that hovering displays hints, or add labels.