Comments (9)
Here's the trace:
from lorax.
We’re running H100s on NebiusAI Kubernetes. I’ll have to get back to you on Tuesday with info on the drivers.
Hey @karlbernard2, thanks for reporting. It sounds like there's a deadlock occurring here that may be triggered under very specific conditions (requests coming in at just the wrong time). Can you share any additional details about your setup (args to `lorax-launcher`, for example) that can help with reproducing the error?
One thing that stands out from the logs you provided is that the adapter `NextDayAI/xtraspicy1.0_13b_r32_800` was loaded, successfully processed a request, then offloaded, then loaded back, but never successfully processed any other requests. It's curious that it was offloaded at all, as it looks like only two adapters were loaded, while by default we will allow up to 128 to be loaded before doing any offloading. So whatever is causing the deadlock may be related to that behavior.
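The expected policy described here (adapters stay resident until a capacity limit, 128 by default, is exceeded) can be modeled as a simple capacity-bounded LRU cache. This is only an illustrative sketch of that policy, not LoRAX's actual loader code; the class and method names are made up for the example:

```python
from collections import OrderedDict

class AdapterCache:
    """Toy model of a capacity-bounded adapter cache: adapters are only
    offloaded (evicted) once more than `capacity` are resident."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.resident = OrderedDict()  # adapter_id -> status
        self.offloaded = []            # eviction log, oldest first

    def load(self, adapter_id):
        # Refresh recency if the adapter is already resident.
        if adapter_id in self.resident:
            self.resident.move_to_end(adapter_id)
            return
        # Evict least-recently-used adapters only when at capacity.
        while len(self.resident) >= self.capacity:
            evicted, _ = self.resident.popitem(last=False)
            self.offloaded.append(evicted)
        self.resident[adapter_id] = "Ready"

# With the default capacity of 128, loading only two adapters should
# never trigger an offload -- which is why the log above is surprising.
cache = AdapterCache(capacity=128)
cache.load("NextDayAI/xtraspicy1.0_13b_r32_800")
cache.load("__base_model__")
```

Under this model, any `offloaded` log line with only two adapters in play would indicate the eviction path is being taken when it shouldn't be.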
I'll try and take a closer look, but if there's anything you can provide to help me repro that would be helpful.
The fact that the `/health` endpoint is unresponsive but the `/info` endpoint works would suggest that there's an issue with the Python server, rather than the router. It's possible that the Python server is stuck on some operation.
Something you could try:
- Make sure your container is running in privileged mode by adding `SYS_PTRACE` to the security context of the container as shown here.
- SSH into the pod with `kubectl exec -it <pod_name> -- /bin/bash`
- Install py-spy so you can get a backtrace from the Python server: `pip install py-spy`
- Find the Python server process: `ps aux | grep python`
- Run py-spy on the Python server to obtain the backtrace: `sudo py-spy dump -p <pid>`
If you're able to run that on one of the hung pods, that would be very helpful for debugging the error.
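For reference, granting the `SYS_PTRACE` capability in a Kubernetes pod spec looks roughly like the fragment below. This is a sketch only: the container name and image match the deployment shared later in this thread, and the `securityContext.capabilities` fields are standard Kubernetes pod-spec fields.

```yaml
# Sketch: grant the container the ptrace capability so py-spy can attach.
containers:
  - name: lorax-container
    image: ghcr.io/predibase/lorax:latest
    securityContext:
      capabilities:
        add:
          - SYS_PTRACE
```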
Thanks for the detailed instructions, I'll try to do that.
Here's how I launched the container:
```yaml
containers:
  - name: lorax-container
    image: ghcr.io/predibase/lorax:latest
    ports:
      - containerPort: 8001
    env:
      - name: HUGGING_FACE_HUB_TOKEN
        value: hf_secret
      - name: PORT
        value: "8001"
      - name: ROPE_SCALING
        value: "dynamic"
      - name: ROPE_FACTOR
        value: "2.0"
    args:
      - "--max-input-length=7900"
      - "--max-total-tokens=8192"
      - "--max-batch-prefill-tokens=8192"
      - "--model-id=NextDayAI/extraspicy"
```
@tgaddair My first attempt to replicate didn't have the same issue (although earlier today I got it all the time, so I will try more).
However, since you mentioned offloading that shouldn't happen, you might find these logs strange:
```
2023-12-23T03:04:06.827779Z INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=inference.spicychat.ai:7001 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=axios/1.4.0 otel.kind=server trace_id=1b65c393d478941d6b16797446fc1519}:generate{parameters=GenerateParameters { adapter_id: Some("NextDayAI/xtraspicy1.0_13b_r32_720_adapter"), adapter_source: None, api_token: None, best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.1), top_k: Some(90), top_p: Some(0.7), typical_p: None, do_sample: true, max_new_tokens: 180, return_full_text: None, truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None } total_time="3.414291187s" validation_time="3.11ms" queue_time="46.871µs" inference_time="3.41113458s" time_per_token="18.950747ms" seed="Some(18257989878521111275)"}: lorax_router::server: router/src/server.rs:298: Success
2023-12-23T03:04:07.058713Z INFO lorax_router::loader: router/src/loader.rs:241: adapter __base_model__ offloaded
2023-12-23T03:04:07.058731Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter __base_model__ status to Downloaded
2023-12-23T03:04:07.095727Z INFO lorax_router::loader: router/src/loader.rs:197: adapter NextDayAI/xtraspicy1.0_13b_r32_760_adapter loaded
2023-12-23T03:04:07.095745Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter NextDayAI/xtraspicy1.0_13b_r32_760_adapter status to Ready
2023-12-23T03:04:07.588268Z INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=inference.spicychat.ai:7001 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=axios/1.4.0 otel.kind=server trace_id=f94f4f52e417b624f40329c484ee954f}:generate{parameters=GenerateParameters { adapter_id: Some("NextDayAI/xtraspicy1.0_13b_r32_760_adapter"), adapter_source: None, api_token: None, best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.1), top_k: Some(90), top_p: Some(0.7), typical_p: None, do_sample: true, max_new_tokens: 180, return_full_text: None, truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None } total_time="541.9253ms" validation_time="2.590788ms" queue_time="64.482779ms" inference_time="474.851989ms" time_per_token="23.742599ms" seed="Some(14941048975292732004)"}: lorax_router::server: router/src/server.rs:298: Success
2023-12-23T03:04:07.625223Z INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=inference.spicychat.ai:7001 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=axios/1.4.0 otel.kind=server trace_id=676504f7982ff96d058121f59d961dd4}:generate{parameters=GenerateParameters { adapter_id: Some("NextDayAI/xtraspicy1.0_13b_r32_800_adapter"), adapter_source: None, api_token: None, best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.1), top_k: Some(90), top_p: Some(0.7), typical_p: None, do_sample: true, max_new_tokens: 300, return_full_text: None, stop: ["\nEva:", "\nShizuka:", "\n###"], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None } total_time="1.211921872s" validation_time="778.706µs" queue_time="13.944244ms" inference_time="1.197199151s" time_per_token="21.003493ms" seed="Some(5853084379632762489)"}: lorax_router::server: router/src/server.rs:298: Success
2023-12-23T03:04:09.218011Z INFO lorax_router::loader: router/src/loader.rs:241: adapter NextDayAI/xtraspicy1.0_13b_r32_800_adapter offloaded
2023-12-23T03:04:09.218033Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter NextDayAI/xtraspicy1.0_13b_r32_800_adapter status to Downloaded
2023-12-23T03:04:09.218716Z INFO lorax_router::loader: router/src/loader.rs:241: adapter NextDayAI/xtraspicy1.0_13b_r32_720_adapter offloaded
2023-12-23T03:04:09.218739Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter NextDayAI/xtraspicy1.0_13b_r32_720_adapter status to Downloaded
2023-12-23T03:04:09.239236Z INFO lorax_router::loader: router/src/loader.rs:197: adapter NextDayAI/xtraspicy1.0_13b_r32_400_adapter loaded
2023-12-23T03:04:09.239269Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter NextDayAI/xtraspicy1.0_13b_r32_400_adapter status to Ready
2023-12-23T03:04:09.258709Z INFO lorax_router::loader: router/src/loader.rs:197: adapter NextDayAI/xtraspicy1.0_13b_r32_720_adapter loaded
2023-12-23T03:04:09.258724Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter NextDayAI/xtraspicy1.0_13b_r32_720_adapter status to Ready
2023-12-23T03:04:09.701608Z INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=inference.spicychat.ai:7001 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=axios/1.4.0 otel.kind=server trace_id=a9611795aa4b0336970442689421c739}:generate{parameters=GenerateParameters { adapter_id: Some("NextDayAI/xtraspicy1.0_13b_r32_400_adapter"), adapter_source: None, api_token: None, best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.1), top_k: Some(90), top_p: Some(0.7), typical_p: None, do_sample: true, max_new_tokens: 180, return_full_text: None, stop: ["\nYour roommate Amber :", "\nWinston:", "\n###"], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None } total_time="489.291069ms" validation_time="4.884384ms" queue_time="22.199583ms" inference_time="462.207337ms" time_per_token="24.326701ms" seed="Some(16751434335624526013)"}: lorax_router::server: router/src/server.rs:298: Success
```
A screenshot might be easier to read:
We are only dealing with 4 adapters.
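The load/offload churn is visible even mechanically: a small script can tally `loaded`/`offloaded` events per adapter from the router log lines. This is an illustrative sketch only; the regular expression just matches the loader lines as they appear in the trace above, and `tally_adapter_events` is a made-up helper name.

```python
import re
from collections import Counter

# Matches loader lines in the router log above, e.g.
# "... lorax_router::loader: router/src/loader.rs:241: adapter <id> offloaded"
EVENT_RE = re.compile(r"lorax_router::loader: \S+ adapter (\S+) (loaded|offloaded)")

def tally_adapter_events(log_lines):
    """Count load/offload events per adapter id from router log lines."""
    counts = Counter()
    for line in log_lines:
        m = EVENT_RE.search(line)
        if m:
            adapter, event = m.groups()
            counts[(adapter, event)] += 1
    return counts

# A few lines taken verbatim from the trace above.
sample = [
    "2023-12-23T03:04:07.058713Z INFO lorax_router::loader: router/src/loader.rs:241: adapter __base_model__ offloaded",
    "2023-12-23T03:04:09.218011Z INFO lorax_router::loader: router/src/loader.rs:241: adapter NextDayAI/xtraspicy1.0_13b_r32_800_adapter offloaded",
    "2023-12-23T03:04:09.258709Z INFO lorax_router::loader: router/src/loader.rs:197: adapter NextDayAI/xtraspicy1.0_13b_r32_720_adapter loaded",
]
counts = tally_adapter_events(sample)
```

Run over the full log, any nonzero `offloaded` count with only 4 adapters in rotation points at the unexpected eviction behavior.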
Thanks for the backtrace @karlbernard2, this is very helpful!
Definitely looks like the hanging is occurring in the SGMV kernel.
In the short term, you can try disabling SGMV with an environment variable: `DISABLE_SGMV=1`. That's not a great long-term solution since SGMV is very fast when you have lots of adapters, but it should at least unblock you while I try and repro the issue, and the performance hit shouldn't be very noticeable with fewer than 10 adapters.
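In a Kubernetes deployment like the one shared earlier in the thread, the workaround would be just another entry in the container's `env` list. Sketch only; `DISABLE_SGMV=1` is the variable named above, the surrounding fields are standard Kubernetes pod-spec fields:

```yaml
env:
  - name: DISABLE_SGMV
    value: "1"
```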
I'll see if I can repro this behavior with the adapters you're using here.
Hey @karlbernard2, update on this: I tried running some stress tests today with a variety of request patterns to try and replicate your setup, but was unable to trigger the hanging behavior.
Can you share a few more details about your environment?
- What GPU are you running on?
- What Nvidia device driver version are you using (from `nvidia-smi`)?
- Is this running on prem or in the cloud? If cloud, which one?
Thanks.
Hey @karlbernard2, I managed to track down the root cause of the deadlock, and it has been fixed in #156.