c0sogi / llmchat
A full-stack WebUI implementation of Large Language Models, such as ChatGPT or LLaMA.
License: MIT License
`QDRANT_COLLECTION: str = environ["QDRANT_COLLECTION"]`
and
`shared_vectorestore_name: str = QDRANT_COLLECTION`
This makes a single Qdrant collection shareable across instances.
"WARNING You Probably Don't Need this Docker Image: " we should follow that advice and remove Gunicorn and the requirement for forwarded-ip. You can't used fixed ip for traefik in swarm mode or Kubernetes. Also better to let swarm manage replication instead of Gunicorn workers
Why so many prompts for a simple one-sentence chat? The embeddings are fine, but I think the app is sending the same chat message to OpenAI in a loop.
The usage dashboard shows 16 requests in the 6:00 AM window:

| Dashboard time | Local time (May 21, 2023) | Model | Requests | Tokens (prompt + completion) |
|---|---|---|---|---|
| 6:05 AM | 2:05 AM | gpt-4-0314 | 1 | 271 + 6 = 277 |
| 6:05 AM | 2:05 AM | text-embedding-ada-002-v2 | 4 | 600 + 0 = 600 |
| 6:10 AM | 2:10 AM | gpt-4-0314 | 3 | 1,498 + 155 = 1,653 |
| 6:10 AM | 2:10 AM | text-embedding-ada-002-v2 | 2 | 37 + 0 = 37 |
| 6:15 AM | 2:15 AM | gpt-4-0314 | 4 | 5,737 + 148 = 5,885 |
| 6:15 AM | 2:15 AM | text-embedding-ada-002-v2 | 2 | 14 + 0 = 14 |
I have an idea for company/product. I think it's winner take all situation but it requires a first mover advantage. Btw I live in Canada
The chat box should increase in height as you add more content. A fixed height makes it hard to use long prompts.
Admin should be able to examine and delete redundant vectors by means of stored metadata. Date of insert, author, title, etc. should be saved in MySQL. Saving them in Redis may pollute the vectors?
@c0sogi Hi, thanks for working on this; I was creating something using it.
Could you help me out please? At `item: MessageFromWebsocket | str = await buffer.queue.get()`
my sender gets stuck for some reason.
I'm using a Svelte frontend client.
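A minimal sketch for narrowing this down, assuming an asyncio.Queue like the one in the snippet (the timeout wrapper is a debugging suggestion, not repo code): if the timeout fires, the websocket receiver never enqueued anything, so the client-side message format is the first thing to check.

```python
# Sketch: wrap queue.get() in a timeout to see whether the producer side
# (the websocket receiver) ever enqueues a message.
import asyncio

async def get_with_timeout(queue: asyncio.Queue, seconds: float = 5.0):
    try:
        return await asyncio.wait_for(queue.get(), timeout=seconds)
    except asyncio.TimeoutError:
        print("queue.get() timed out: nothing was enqueued by the receiver")
        return None
```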
Trying to install it on a LAN so everyone within the network can use it. I added the IP as HOST_MAIN="192.168.2.202". The chat interface is reachable, but authentication is denied.
Navigating to sub.domain.tld/chatgpt returns a 404 page not found.
The connection is secure and Let's Encrypt appears to be working.
Has anyone used it?
I launch it through the docker command, and there is a user/password login page in the left sidebar. I can't register a new user; it always reports an XMLHttpRequest error.
Then I went into the MySQL docker container, looked at the table info, and found the users/api_keys tables, so I inserted one row into the users table:
```sql
INSERT INTO users (status, email, password, marketing_agree, created_at, updated_at)
VALUES ('admin', '[email protected]', '12341234', 1, now(), now());
```
I saw the new user exists, but I still can't log in from the UI. Are there any detailed instructions?
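A likely cause: the app almost certainly stores password hashes, so a plaintext '12341234' in the password column will never match at login. A minimal sketch of generating a hash to insert instead; bcrypt as the scheme is an assumption, so check the registration code for the actual algorithm:

```python
# Sketch: produce a bcrypt hash for manual insertion into users.password.
# Whether LLMChat actually uses bcrypt is an assumption.
import bcrypt

hashed = bcrypt.hashpw(b"12341234", bcrypt.gensalt())
print(hashed.decode())  # use this value in the INSERT above
```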
In llms.py your example model reference location is model_path="./llama_models/ggml/Wizard-Vicuna-7B-Uncensored.ggmlv2.q4_1.bin".
This leads me to believe the model should be at LLMChat/app/llama_models/ggml/Wizard-Vicuna-7B-Uncensored.ggmlv2.q4_1.bin,
but the app is not able to load the model.
I created the llama_models and ggml directories, as there were none.
Thank you
We should use /imped (notice the p) to embed text into a private in-browser db. This project makes it possible: Vector Storage.
For production, which value is correct: API_ENV="prod" or API_ENV="production"?
So far we have achieved a lot with this codebase. Perhaps it's time we pause new features and refactor the code base to allow for a core/plugin structure. Adopting a modular architecture will give this project incredible flexibility. I want us to be the "Drupal" of this space, but we must structure it now before the code base grows too big.
Let's standardize what a module is and push:
This allows developers to add and maintain modules without interfering with core. The project is usable as is. I will send you an email shortly. Your thoughts?
If the email has a `.` the registration fails, e.g. [email protected] fails.
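This smells like an over-strict validator that rejects dots in the local part. A sketch of a permissive check; the pattern is illustrative only, not the repo's actual validator:

```python
# Sketch: email check that accepts dots in the local part.
import re

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

assert EMAIL_RE.match("first.last@example.com")  # dotted local part passes
assert EMAIL_RE.match("plain@example.com")
```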
Integrate FastAPI Admin or Amis version
Dear LLMChat developer,
Greetings! I am vansinhu, a community developer and volunteer at InternLM. Your work has been immensely beneficial to me, and I believe it can be effectively utilized in InternLM as well. Feel free to join our Discord: https://discord.gg/gF9ezcmtM3. I hope to get in touch with you.
Best regards,
vansinhu
`chunk_overlap: int = 0,`
The overlap should be about 100 to preserve meanings cut off by chunk_size; see the sketch below.
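A minimal pure-Python sketch of why overlap matters (the repo's splitter may differ): with overlap > 0, text straddling a chunk boundary reappears at the start of the next chunk instead of being cut in half.

```python
# Sketch: fixed-size chunking with overlap between consecutive chunks.
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```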
Let's implement Qdrant for embeddings. Use Redis for what it's good at: caching chats. Qdrant is fast and stable and excellent at search and filtering; see the benchmark.
Redis's single-threaded execution is bad for vertical scaling. Down the road we should allow BYOD (bring your own db).
Please implement it as a separate docker container for independent scaling:
```
docker run -p 6333:6333 qdrant/qdrant
```
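A minimal sketch of talking to that container from the app with the qdrant-client package; the collection name and vector size (1536 for ada-002) are placeholders:

```python
# Sketch: connect to the standalone Qdrant container and create a collection.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(host="localhost", port=6333)
client.recreate_collection(
    collection_name="shared_vectorstore",  # placeholder name
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
```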
Courtesy of chatpad
OpenAI says to use "text-embedding-ada-002" for all text embeddings. It's very cheap; gpt-3.5/4 are 1000x more expensive. `tokenizer_model: str = "text-embedding-ada-002"`
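For reference, a sketch of the embedding call with the 2023-era openai package (model name taken from the comment above):

```python
# Sketch: fetch an ada-002 embedding (openai-python < 1.0 API).
import openai

resp = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="What is LLMChat?",
)
vector = resp["data"][0]["embedding"]  # 1536 floats
```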
Possible abuse of shared memory if all authenticated users can embed text. Restrict embedding to admin/editor roles; a sketch follows.
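A minimal sketch of such a role gate as a FastAPI dependency; the role names and the get_current_user stand-in are hypothetical, not repo code:

```python
# Sketch: restrict an endpoint to given roles via a FastAPI dependency.
from dataclasses import dataclass
from fastapi import Depends, HTTPException

@dataclass
class User:
    status: str  # e.g. "admin" or "editor"; role names are assumptions

async def get_current_user() -> User:
    """Stand-in for the app's real auth dependency."""
    raise NotImplementedError

def require_role(*roles: str):
    async def checker(user: User = Depends(get_current_user)) -> User:
        if user.status not in roles:
            raise HTTPException(status_code=403, detail="Insufficient role")
        return user
    return checker

# usage: @app.post("/embed", dependencies=[Depends(require_role("admin", "editor"))])
```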
The chatroom title should be editable and default to a sensible summary of the initial prompt.
Implement /bypass to query the LLM without hitting the vector store. Otherwise chats must check the vector store for embeddings before interacting with the LLM. The purpose of this app, I believe, is to grant longer context memory. Being forced to add /query in front of every chat is tiresome. Embeddings can prompt the LLM how to behave on chat initiation.
There has to be a way to stop response generation. This is best practice.
In the chat UI, there is a long list of LLM models. The default one is GPT 3.5 Turbo, which is OpenAI's, I guess.
I configured the OpenAI API key in .env, so it should be used, as the answer is very fast.
When I try to switch to Llama 7B, it reports:
An error occurred while generating text: Model llama-7b-GGML is currently booting.
I set up another LLM engine, "vllm", based on the llama-2-7b-chat model and exposed on port 3000; it is compatible with the OpenAI API.
How can I configure the app to use this new engine?
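If the app routes through the openai-python client, pointing it at the vLLM server is mostly a base-URL change. A sketch with the 2023-era openai package; whether LLMChat exposes this as a setting is an assumption:

```python
# Sketch: send OpenAI-compatible requests to a local vLLM server.
import openai

openai.api_base = "http://localhost:3000/v1"
openai.api_key = "EMPTY"  # vLLM's server does not validate the key by default

resp = openai.ChatCompletion.create(
    model="llama-2-7b-chat",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp["choices"][0]["message"]["content"])
```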
My testing shows you need a minimum of 2 GB of RAM and Ubuntu 22.
Let's save some $$$ by implementing GPTCache.
There is a docker image, and I think it may already work with Redis:
```
docker pull zilliz/gptcache:latest
docker run -p 8000:8000 -it zilliz/gptcache:latest
```
Admin can use temperature settings to bypass the cache.
Important: the cache must maintain user privacy. Admin can add a sitewide cache. This will make FAQ generation a breeze and cost nothing each time cached info is retrieved.
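For the in-process route (as opposed to the docker server above), GPTCache ships an adapter that wraps the openai module. A minimal sketch of its simplest, exact-match mode:

```python
# Sketch: GPTCache's drop-in adapter; repeated identical prompts are
# answered from the cache instead of hitting the OpenAI API.
from gptcache import cache
from gptcache.adapter import openai

cache.init()            # default: exact match on the prompt
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is LLMChat?"}],
)
```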
OpenAI embedding is quite weak, proprietary, and token hungry. Let's move to USE (Universal Sentence Encoder). PDFGPT has an implementation of USE we can modify and build upon.
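For reference, a sketch of loading USE from TensorFlow Hub, which is how PDFGPT-style projects typically do it (the /4 model version is an assumption):

```python
# Sketch: Universal Sentence Encoder via TensorFlow Hub; outputs 512-dim vectors.
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
vectors = embed(["Hello world", "How are you?"])  # shape (2, 512)
```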
Hi,
I want to install this in production on a subdomain. Using sub.domain.tld as HOST_MAIN should work, I am guessing?
Thanks for this app; I will test it and give my 2 cents of feedback.
Ability to add a prefix/suffix instruction to the first message in a chat, ideally as a .env variable. I want to instruct OpenAI about policies. Kindly give me a pointer so I can attempt this.
Either make GPT-4 the default or allow admin to set it in .env. Users may override it with /model.
How do I use a DESCRIPTION_TMPL for OpenAI "chatroles.system" models? I see DESCRIPTION being passed as "description" in other models. GPT needs this feature; if it exists, how do I trigger it?
When I run everything in docker, the api container fails to build llama.cpp because cmake doesn't exist.
Does it require the external gcc to be 11?
>docker-compose -f docker-compose-local.yaml up api
[+] Running 3/0
✔ Container llmchat-cache-1 Running 0.0s
✔ Container llmchat-db-1 Running 0.0s
✔ Container llmchat-api-1 Created 0.0s
Attaching to llmchat-api-1, llmchat-cache-1, llmchat-db-1
llmchat-api-1 | None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
llmchat-api-1 | [2023-09-19 07:50:40,323] SQLAlchemy:CRITICAL - Current DB connection of LocalConfig: db/traffic@traffic_admin
llmchat-api-1 | INFO: Started server process [1]
llmchat-api-1 | INFO: Waiting for application startup.
llmchat-api-1 | [2023-09-19 07:50:41,191] ApiLogger:CRITICAL - ⚙️ Booting up...
llmchat-api-1 | [2023-09-19 07:50:41,191] ApiLogger:CRITICAL - MySQL DB connected!
llmchat-api-1 | [2023-09-19 07:50:41,195] ApiLogger:CRITICAL - Redis CACHE connected!
llmchat-api-1 | [2023-09-19 07:50:41,195] ApiLogger:CRITICAL - uvloop installed!
llmchat-api-1 | [2023-09-19 07:50:41,195] ApiLogger:CRITICAL - Llama CPP server monitoring started!
llmchat-api-1 | INFO: Application startup complete.
llmchat-api-1 | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
llmchat-api-1 | [2023-09-19 07:50:41,200] ApiLogger:ERROR - Llama CPP server is not available
llmchat-api-1 | [2023-09-19 07:50:41,200] ApiLogger:CRITICAL - Starting Llama CPP server
llmchat-api-1 | - Loaded .env file successfully.
llmchat-api-1 | - API_ENV: local
llmchat-api-1 | - DOCKER_MODE: True
llmchat-api-1 | - Parsing function for function calling: control_browser
llmchat-api-1 | - Parsing function for function calling: control_web_page
llmchat-api-1 | - Parsing function for function calling: web_search
llmchat-api-1 | - Parsing function for function calling: vectorstore_search
llmchat-api-1 | Using openai embeddings
llmchat-api-1 | 🦙 llama.cpp DLL not found, building it...
llmchat-api-1 | 🦙 Trying to build llama.cpp DLL: /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-cublas.sh
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-cublas.sh: line 2: cd: /app/repositories/llama_cpp/vendor/llama.cpp: No such file or directory
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-cublas.sh: line 6: cmake: command not found
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-cublas.sh: line 7: cmake: command not found
llmchat-api-1 | cp: cannot stat '/app/repositories/llama_cpp/vendor/llama.cpp/build/bin/Release/libllama.so': No such file or directory
llmchat-api-1 | 🦙 Could not build llama.cpp DLL!
llmchat-api-1 | 🦙 Trying to build llama.cpp DLL: /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-default.sh
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-default.sh: line 2: cd: /app/repositories/llama_cpp/vendor/llama.cpp: No such file or directory
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-default.sh: line 6: cmake: command not found
llmchat-api-1 | /app/repositories/llama_cpp/llama_cpp/build-llama-cpp-default.sh: line 7: cmake: command not found
llmchat-api-1 | cp: cannot stat '/app/repositories/llama_cpp/vendor/llama.cpp/build/bin/Release/libllama.so': No such file or directory
llmchat-api-1 | 🦙 Could not build llama.cpp DLL!
llmchat-api-1 | [2023-09-19 07:50:41,256] ApiLogger:WARNING - 🦙 Could not import llama-cpp-python repository: 🦙 Could not build llama.cpp DLL!
llmchat-api-1 | ...trying to import installed llama-cpp package...
llmchat-api-1 | INFO: 10.101.7.43:42488 - "GET / HTTP/1.1" 304 Not Modified
llmchat-api-1 | INFO: 10.101.7.43:42488 - "GET /main.dart.js HTTP/1.1" 200 OK
llmchat-api-1 | INFO: 10.101.7.43:49942 - "GET / HTTP/1.1" 304 Not Modified
llmchat-api-1 | INFO: 10.101.7.43:49942 - "GET /main.dart.js HTTP/1.1" 304 Not Modified
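The log shows both build scripts dying on `cmake: command not found`, and the missing /app/repositories/llama_cpp/vendor/llama.cpp directory suggests the llama-cpp-python submodule was never checked out (git clone --recurse-submodules). A sketch of the image-side fix, assuming a Debian/Ubuntu base image:

```dockerfile
# Sketch: install the toolchain the build-llama-cpp-*.sh scripts expect.
# Package names assume a Debian/Ubuntu base image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential cmake && \
    rm -rf /var/lib/apt/lists/*
```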
I propose we use a two-LLM approach to cut down on the cost of using GPT-4 and all expensive future variants.
This mostly applies if you are using GPT-4, but why use anything else :)
You have:
This may even get GPT-4 to be more focused and on point.
Can we remove the "new key" requirement and just load a new chatroom? It's a bit confusing.
Advice needed. I am trying to move the web frontend into a separate container and leave only FastAPI in api.
Deleted chats should be moved to a trash bin and permanently deleted after 30 days. Add a "trash" link to the sidebar; it should load a list of deleted chats in the main column. Let's mimic the Gmail interface for this feature. I think people forget OpenAI will soon change their UI to match the tried and tested email interface.
It is impossible to copy the result of a chat.
I altered the "user" method and added `content=ChatConfig.chat_role_system_message`:

```python
async def user(msg: str, translate: bool, buffer: BufferedUserContext) -> None:
    """Handle user message, including translation"""
    if len(buffer.current_user_message_histories) == 0 and UTC.check_string_valid(
        buffer.current_chat_room_name
    ):
        buffer.current_chat_room_name = msg[:20]
        await CacheManager.update_profile(
            user_chat_context=buffer.current_user_chat_context
        )
        # Add default system message at the start of a conversation
        await MessageManager.add_message_history_safely(
            user_chat_context=buffer.current_user_chat_context,
            content=ChatConfig.chat_role_system_message,
            role=ChatRoles.SYSTEM,
        )
        await SendToWebsocket.init(buffer=buffer, send_chat_rooms=True, wait_next_query=True)
```

and in ChatConfig I added:

```python
chat_role_system_message: Optional[str] = environ["CHAT_ROLE_SYSTEM_MESSAGE"]
```

Now the AI behaviour is more predictable.
This is the error I get:

```
File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/app/main.py", line 39, in <module>
    from app.common.app_settings import create_app
File "/app/app/common/app_settings.py", line 7, in <module>
    from app.auth.admin import MyAuthProvider
File "/app/app/auth/admin.py", line 5, in <module>
    from app.common.config import config
File "/app/app/common/config.py", line 177, in <module>
    config = Config.get()
File "/app/app/common/config.py", line 122, in get
    _config = {
KeyError: '"prod"'
```

(Note the key is '"prod"' with the quote characters included, which suggests the quotes in API_ENV="prod" were read as part of the value; try API_ENV=prod without quotes.)
OpenAI is beginning to enforce their trademark on "GPT".
https://techstartups.com/2023/05/04/openai-send-cease-owner-of-sitegpt-ai-forced-to-rebrand/
They may disallow API access for any site or project using that name.
Add tooltips to icons so that hovering displays hints, or add labels.