Comments (5)
@Johnz86 hi!
'Required memory exceeds the GPU's memory' is just a warning; it does not affect inference at all. But CONTRASTcode/3B requires ~8 GB of VRAM at full context, so a 4 GB GPU can hit OOM on large files. We agree the warning is unclear and we'll fix it in the future.
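As a rough back-of-envelope illustration (my own arithmetic, not from the refact docs): a 3B-parameter model in fp16 needs about 6 GB just to hold the weights, before activations and KV cache, which is why ~8 GB at full context versus a 4 GB card does not work out.

```python
def weights_gb(params: float, bytes_per_param: int = 2) -> float:
    """Rough memory needed just to hold the weights, in decimal GB."""
    return params * bytes_per_param / 1e9

# CONTRASTcode/3B at fp16 (2 bytes per parameter): ~6 GB for weights alone,
# so the ~8 GB figure once context overhead is added is plausible.
print(weights_gb(3e9))  # → 6.0
```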
I think your problem is in infurl: change https to http.
SERVER_API_TOKEN is not used by the new docker container; you can remove it.
from refact.
I tried it with http, and the 'Invalid HTTP request received' message no longer appears in the logs. The issue is that no inference is happening, and I cannot tell from the logs or from any response what state the process is in.
Is there any way to determine whether I should wait for inference to start, or whether the process is not working at all?
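One low-tech way to check this yourself, sketched below under the assumption that the server listens on port 8008 and serves the /v1/login endpoint that the web UI itself hits (visible in the logs): probe it with a short timeout and report whether it answers at all.

```python
import urllib.error
import urllib.request


def server_responds(url: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers at all (any HTTP status counts
    as 'alive'); False if the connection is refused or times out."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server is up, it just rejected this request
    except (urllib.error.URLError, OSError):
        return False  # unreachable, refused, or timed out

# /v1/login is the endpoint the web UI logs show returning 200
print(server_responds("http://localhost:8008/v1/login"))
```

If this returns False, there is nothing to wait for; if it returns True but completions never arrive, the problem is on the model side rather than the connection.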
@Johnz86 you can check the error that occurred in refact.ai below the chat (yellow box). Also, please share the server logs; it should be OOM or something like that.
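Until the logs are shared, a quick way to scan them for the usual out-of-memory signature is a substring check. The marker strings below are my assumption (the first is PyTorch's standard wording), not something the refact server is guaranteed to emit:

```python
# Assumed markers: PyTorch's OOM message plus its exception class name.
OOM_MARKERS = ("CUDA out of memory", "OutOfMemoryError")


def looks_like_oom(log_text: str) -> bool:
    """Heuristic: does any known out-of-memory marker appear in the logs?"""
    lowered = log_text.lower()
    return any(marker.lower() in lowered for marker in OOM_MARKERS)


# e.g. feed the output of `docker logs <container>` into this check
print(looks_like_oom("RuntimeError: CUDA out of memory. Tried to allocate 512 MiB"))  # → True
```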
Here is an example of docker container logs:
PS C:\Users\z0034zpz> docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting
9b1d43ca00bbe3f68e05876e2c18da266348a81cdd851553494643b27ae9afcc
PS C:\Users\z0034zpz> docker logs -f 9b1d
==========
== CUDA ==
==========
CUDA Version 11.8.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
20230907 09:06:14 adding job model-contrastcode-3b-multi-0.cfg
20230907 09:06:14 adding job enum_gpus.cfg
20230907 09:06:14 adding job filetune.cfg
20230907 09:06:14 adding job filetune_filter_only.cfg
20230907 09:06:14 adding job process_uploaded.cfg
20230907 09:06:14 adding job webgui.cfg
20230907 09:06:14 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi --compile
-> pid 31
20230907 09:06:14 CVD= starting python -m self_hosting_machinery.scripts.enum_gpus
-> pid 32
20230907 09:06:14 CVD= starting python -m self_hosting_machinery.webgui.webgui
-> pid 33
-- 33 -- 20230907 09:06:14 WEBUI Started server process [33]
-- 33 -- 20230907 09:06:14 WEBUI Waiting for application startup.
-- 33 -- 20230907 09:06:14 WEBUI Application startup complete.
-- 33 -- 20230907 09:06:14 WEBUI Uvicorn running on http://0.0.0.0:8008 (Press CTRL+C to quit)
-- 31 -- 20230907 09:06:19 MODEL STATUS loading model
-- 31 -- 20230907 09:07:03 MODEL STATUS test batch
20230907 09:07:46 31 finished python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi @:gpu00, retcode 0
/finished compiling as recognized by watchdog
20230907 09:07:47 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi
-> pid 111
-- 111 -- 20230907 09:07:50 MODEL STATUS loading model
-- 33 -- 20230907 09:07:51 WEBUI 172.17.0.1:41986 - "GET /v1/login HTTP/1.1" 200
-- 111 -- 20230907 09:08:17 MODEL STATUS test batch
-- 111 -- 20230907 09:08:52 MODEL STATUS serving CONTRASTcode/3b/multi
-- 111 -- 20230907 09:09:02 MODEL 10008.3ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:02 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:12 MODEL 10004.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:12 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:22 MODEL 10003.5ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:22 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 33 -- 20230907 09:09:26 WEBUI comp-SvVQSvapACW6 model resolve "gpt3.5" -> error "model is not loaded (2)" from XXX
-- 33 -- 20230907 09:09:26 WEBUI 172.17.0.1:41990 - "POST /v1/chat HTTP/1.1" 400
-- 111 -- 20230907 09:09:32 MODEL 10005.4ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:32 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:42 MODEL 10005.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:42 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:52 MODEL 10002.7ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:52 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:02 MODEL 10002.7ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:02 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:12 MODEL 10003.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:12 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:22 MODEL 10006.3ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:22 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
model is not loaded (2) -- according to the logs, it can't access the model.
I guess the right way to go about solving this is to react to configuration changes faster.
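To make "react to configuration changes faster" concrete, here is a minimal mtime-polling sketch (my own illustration, not refact's actual watchdog): remember each config file's modification time and invoke a reload callback whenever one changes.

```python
import os
from typing import Callable, Dict, Iterable


class ConfigWatcher:
    """Minimal mtime-based watcher: call poll() periodically and it
    invokes on_change(path) for every watched file that changed."""

    def __init__(self, paths: Iterable[str], on_change: Callable[[str], None]):
        self.on_change = on_change
        self._mtimes: Dict[str, float] = {p: os.path.getmtime(p) for p in paths}

    def poll(self) -> None:
        for path, last in self._mtimes.items():
            current = os.path.getmtime(path)
            if current != last:
                self._mtimes[path] = current
                self.on_change(path)
```

A real watchdog would run poll() on a timer (or use inotify instead of polling); shrinking that interval is essentially what "reacting faster" amounts to.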