Comments (5)
@Johnz86 hi!
'Required memory exceeds the GPU's memory' is just a warning; it does not affect inference at all. But CONTRASTcode/3B requires ~8 GB of VRAM at full context, so a 4 GB GPU can hit OOM on large files. We agree the warning is unclear and we'll fix it in the future.
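As a rough back-of-envelope illustration (my own arithmetic, not from the refact docs): a 3B-parameter model in fp16 needs about 6 GB just to hold the weights, before activations and KV cache, which is why ~8 GB at full context versus a 4 GB card does not work out.

```python
def weights_gb(params: float, bytes_per_param: int = 2) -> float:
    """Rough memory needed just to hold the weights, in decimal GB."""
    return params * bytes_per_param / 1e9

# CONTRASTcode/3B at fp16 (2 bytes per parameter): ~6 GB for weights alone,
# so the ~8 GB figure once context overhead is added is plausible.
print(weights_gb(3e9))  # → 6.0
```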
I think your problem is in infurl: change https to http.
SERVER_API_TOKEN is not used by the new docker container; you can remove it.
from refact.
I tried it with http, and the 'Invalid HTTP request received' message no longer appears in the logs. The issue is that no inference is happening, and I cannot tell from the logs or from any response what state the process is in.
Is there any way to determine whether I should wait for inference to start, or whether the process is not working at all?
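One low-tech way to check this yourself, sketched below under the assumption that the server listens on port 8008 and serves the /v1/login endpoint that the web UI itself hits (visible in the logs): probe it with a short timeout and report whether it answers at all.

```python
import urllib.error
import urllib.request


def server_responds(url: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers at all (any HTTP status counts
    as 'alive'); False if the connection is refused or times out."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server is up, it just rejected this request
    except (urllib.error.URLError, OSError):
        return False  # unreachable, refused, or timed out

# /v1/login is the endpoint the web UI logs show returning 200
print(server_responds("http://localhost:8008/v1/login"))
```

If this returns False, there is nothing to wait for; if it returns True but completions never arrive, the problem is on the model side rather than the connection.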
@Johnz86 you can check the error that occurred in refact.ai below the chat (yellow box). Also, please share the server logs; it should be OOM or something like that.
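Until the logs are shared, a quick way to scan them for the usual out-of-memory signature is a substring check. The marker strings below are my assumption (the first is PyTorch's standard wording), not something the refact server is guaranteed to emit:

```python
# Assumed markers: PyTorch's OOM message plus its exception class name.
OOM_MARKERS = ("CUDA out of memory", "OutOfMemoryError")


def looks_like_oom(log_text: str) -> bool:
    """Heuristic: does any known out-of-memory marker appear in the logs?"""
    lowered = log_text.lower()
    return any(marker.lower() in lowered for marker in OOM_MARKERS)


# e.g. feed the output of `docker logs <container>` into this check
print(looks_like_oom("RuntimeError: CUDA out of memory. Tried to allocate 512 MiB"))  # → True
```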
Here is an example of docker container logs:
PS C:\Users\z0034zpz> docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting
9b1d43ca00bbe3f68e05876e2c18da266348a81cdd851553494643b27ae9afcc
PS C:\Users\z0034zpz> docker logs -f 9b1d
==========
== CUDA ==
==========
CUDA Version 11.8.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
20230907 09:06:14 adding job model-contrastcode-3b-multi-0.cfg
20230907 09:06:14 adding job enum_gpus.cfg
20230907 09:06:14 adding job filetune.cfg
20230907 09:06:14 adding job filetune_filter_only.cfg
20230907 09:06:14 adding job process_uploaded.cfg
20230907 09:06:14 adding job webgui.cfg
20230907 09:06:14 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi --compile
-> pid 31
20230907 09:06:14 CVD= starting python -m self_hosting_machinery.scripts.enum_gpus
-> pid 32
20230907 09:06:14 CVD= starting python -m self_hosting_machinery.webgui.webgui
-> pid 33
-- 33 -- 20230907 09:06:14 WEBUI Started server process [33]
-- 33 -- 20230907 09:06:14 WEBUI Waiting for application startup.
-- 33 -- 20230907 09:06:14 WEBUI Application startup complete.
-- 33 -- 20230907 09:06:14 WEBUI Uvicorn running on http://0.0.0.0:8008 (Press CTRL+C to quit)
-- 31 -- 20230907 09:06:19 MODEL STATUS loading model
-- 31 -- 20230907 09:07:03 MODEL STATUS test batch
20230907 09:07:46 31 finished python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi @:gpu00, retcode 0
/finished compiling as recognized by watchdog
20230907 09:07:47 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi
-> pid 111
-- 111 -- 20230907 09:07:50 MODEL STATUS loading model
-- 33 -- 20230907 09:07:51 WEBUI 172.17.0.1:41986 - "GET /v1/login HTTP/1.1" 200
-- 111 -- 20230907 09:08:17 MODEL STATUS test batch
-- 111 -- 20230907 09:08:52 MODEL STATUS serving CONTRASTcode/3b/multi
-- 111 -- 20230907 09:09:02 MODEL 10008.3ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:02 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:12 MODEL 10004.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:12 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:22 MODEL 10003.5ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:22 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 33 -- 20230907 09:09:26 WEBUI comp-SvVQSvapACW6 model resolve "gpt3.5" -> error "model is not loaded (2)" from XXX
-- 33 -- 20230907 09:09:26 WEBUI 172.17.0.1:41990 - "POST /v1/chat HTTP/1.1" 400
-- 111 -- 20230907 09:09:32 MODEL 10005.4ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:32 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:42 MODEL 10005.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:42 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:52 MODEL 10002.7ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:52 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:02 MODEL 10002.7ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:02 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:12 MODEL 10003.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:12 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:22 MODEL 10006.3ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:22 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
model is not loaded (2) -- according to the logs, it can't access the model.
I guess the right way to go about solving this is to react to configuration changes faster.
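To make "react to configuration changes faster" concrete, here is a minimal mtime-polling sketch (my own illustration, not refact's actual watchdog): remember each config file's modification time and invoke a reload callback whenever one changes.

```python
import os
from typing import Callable, Dict, Iterable


class ConfigWatcher:
    """Minimal mtime-based watcher: call poll() periodically and it
    invokes on_change(path) for every watched file that changed."""

    def __init__(self, paths: Iterable[str], on_change: Callable[[str], None]):
        self.on_change = on_change
        self._mtimes: Dict[str, float] = {p: os.path.getmtime(p) for p in paths}

    def poll(self) -> None:
        for path, last in self._mtimes.items():
            current = os.path.getmtime(path)
            if current != last:
                self._mtimes[path] = current
                self.on_change(path)
```

A real watchdog would run poll() on a timer (or use inotify instead of polling); shrinking that interval is essentially what "reacting faster" amounts to.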