
Comments (9)

karlbernard2 commented on May 20, 2024

Here's the trace:

[Screenshot: CleanShot 2023-12-22 at 22 41 28@2x]

from lorax.

karlbernard2 commented on May 20, 2024

We’re running H100 on NebiusAI Kubernetes. I’ll have to get back to you on Tuesday with info on drivers.


tgaddair commented on May 20, 2024

Hey @karlbernard2, thanks for reporting. It sounds like there's a deadlock occurring here that may be triggered under very specific conditions (requests coming in at just the wrong time). Can you share any additional details about your setup (args to lorax-launcher, for example) that could help with reproducing the error?

One thing that stands out from the logs you provided is that the adapter NextDayAI/xtraspicy1.0_13b_r32_800 was loaded, successfully processed a request, then offloaded, then loaded back, but never successfully processed any other requests. It's curious that it was offloaded at all, as it looks like only two adapters were loaded, while by default we will allow up to 128 to be loaded before doing any offloading. So whatever is causing the deadlock may be related to that behavior.
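For illustration, the intended load/offload behavior can be sketched as an LRU-style pool (a simplified stand-in for the lorax scheduler, not its actual implementation; the class and method names here are hypothetical). With a capacity of 128, loading only two adapters should never trigger an offload, which is why the behavior in the logs is suspicious:

```python
from collections import OrderedDict

class AdapterPool:
    """Simplified sketch of adapter load/offload bookkeeping (illustrative only).

    The expected behavior: no adapter is offloaded until the number of
    loaded adapters exceeds ``max_active`` (128 by default in lorax).
    """

    def __init__(self, max_active=128):
        self.max_active = max_active
        self.loaded = OrderedDict()  # adapter_id -> status, in LRU order

    def request(self, adapter_id):
        """Ensure ``adapter_id`` is loaded; return any adapters offloaded to make room."""
        if adapter_id in self.loaded:
            self.loaded.move_to_end(adapter_id)  # mark most recently used
            return []
        offloaded = []
        # Offload least-recently-used adapters only when at capacity.
        while len(self.loaded) >= self.max_active:
            evicted, _ = self.loaded.popitem(last=False)
            offloaded.append(evicted)
        self.loaded[adapter_id] = "Ready"
        return offloaded
```

Under this model, two loaded adapters sit well below the 128 limit, so any offloading at that point suggests a different code path is evicting them.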

I'll try and take a closer look, but if there's anything you can provide to help me repro that would be helpful.


tgaddair commented on May 20, 2024

The fact that the /health endpoint is unresponsive but the /info endpoint works would suggest that there's an issue with the Python server, rather than the router. It's possible that the Python server is stuck on some operation.
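As an illustration, a small stdlib script can make that distinction concrete by probing both endpoints with a timeout (a sketch, not part of lorax; the host and port are assumptions based on the deployment, so adjust them to yours):

```python
import socket
import urllib.request

def probe(url, timeout=5.0):
    """Probe an HTTP endpoint with a hard timeout.

    Returns ("ok", status) on a response, ("hang", reason) on timeout,
    or ("error", reason) on any other failure. If /info responds but
    /health hangs, the Rust router is alive while the Python model
    server is likely stuck.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ("ok", resp.status)
    except socket.timeout:
        return ("hang", "timed out")
    except Exception as exc:
        return ("error", str(exc))

if __name__ == "__main__":
    # Hypothetical base URL; replace with your pod's address.
    base = "http://localhost:8001"
    for path in ("/info", "/health"):
        print(path, probe(base + path, timeout=5.0))
```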

Something you could try:

  • Make sure your container is running in privileged mode by adding SYS_PTRACE to the security context of the container as shown here.
  • SSH into the pod with kubectl exec -it <pod_name> -- /bin/bash
  • Install py-spy so you can get a backtrace from the Python server: pip install py-spy
  • Find the Python server process: ps aux | grep python
  • Run py-spy on the Python server to obtain the backtrace: sudo py-spy dump -p <pid>

If you're able to run that on one of the hung pods, that would be very helpful for debugging the error.
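For reference, granting SYS_PTRACE from the first step would look roughly like this in the pod spec (a sketch; the container name is an assumption, so adjust it to your deployment):

```yaml
containers:
- name: lorax-container
  securityContext:
    capabilities:
      add:
      - SYS_PTRACE
```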


karlbernard2 commented on May 20, 2024

Thanks for the detailed instructions, I'll try to do that.

Here's how I launched the container:

```yaml
containers:
- name: lorax-container
  image: ghcr.io/predibase/lorax:latest
  ports:
  - containerPort: 8001
  env:
  - name: HUGGING_FACE_HUB_TOKEN
    value: hf_secret
  - name: PORT
    value: "8001"
  - name: ROPE_SCALING
    value: "dynamic"
  - name: ROPE_FACTOR
    value: "2.0"
  args:
  - "--max-input-length=7900"
  - "--max-total-tokens=8192"
  - "--max-batch-prefill-tokens=8192"
  - "--model-id=NextDayAI/extraspicy"
```


karlbernard2 commented on May 20, 2024

@tgaddair My first attempt to replicate didn't hit the same issue (although earlier today I got it all the time, so I'll keep trying).

However, since you talked about offloading that shouldn't happen, you might find these logs strange:

```
2023-12-23T03:04:06.827779Z INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=inference.spicychat.ai:7001 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=axios/1.4.0 otel.kind=server trace_id=1b65c393d478941d6b16797446fc1519}:generate{parameters=GenerateParameters { adapter_id: Some("NextDayAI/xtraspicy1.0_13b_r32_720_adapter"), adapter_source: None, api_token: None, best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.1), top_k: Some(90), top_p: Some(0.7), typical_p: None, do_sample: true, max_new_tokens: 180, return_full_text: None, truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None } total_time="3.414291187s" validation_time="3.11ms" queue_time="46.871µs" inference_time="3.41113458s" time_per_token="18.950747ms" seed="Some(18257989878521111275)"}: lorax_router::server: router/src/server.rs:298: Success
2023-12-23T03:04:07.058713Z INFO lorax_router::loader: router/src/loader.rs:241: adapter __base_model__ offloaded
2023-12-23T03:04:07.058731Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter __base_model__ status to Downloaded
2023-12-23T03:04:07.095727Z INFO lorax_router::loader: router/src/loader.rs:197: adapter NextDayAI/xtraspicy1.0_13b_r32_760_adapter loaded
2023-12-23T03:04:07.095745Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter NextDayAI/xtraspicy1.0_13b_r32_760_adapter status to Ready
2023-12-23T03:04:07.588268Z INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=inference.spicychat.ai:7001 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=axios/1.4.0 otel.kind=server trace_id=f94f4f52e417b624f40329c484ee954f}:generate{parameters=GenerateParameters { adapter_id: Some("NextDayAI/xtraspicy1.0_13b_r32_760_adapter"), adapter_source: None, api_token: None, best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.1), top_k: Some(90), top_p: Some(0.7), typical_p: None, do_sample: true, max_new_tokens: 180, return_full_text: None, truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None } total_time="541.9253ms" validation_time="2.590788ms" queue_time="64.482779ms" inference_time="474.851989ms" time_per_token="23.742599ms" seed="Some(14941048975292732004)"}: lorax_router::server: router/src/server.rs:298: Success
2023-12-23T03:04:07.625223Z INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=inference.spicychat.ai:7001 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=axios/1.4.0 otel.kind=server trace_id=676504f7982ff96d058121f59d961dd4}:generate{parameters=GenerateParameters { adapter_id: Some("NextDayAI/xtraspicy1.0_13b_r32_800_adapter"), adapter_source: None, api_token: None, best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.1), top_k: Some(90), top_p: Some(0.7), typical_p: None, do_sample: true, max_new_tokens: 300, return_full_text: None, stop: ["\nEva:", "\nShizuka:", "\n###"], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None } total_time="1.211921872s" validation_time="778.706µs" queue_time="13.944244ms" inference_time="1.197199151s" time_per_token="21.003493ms" seed="Some(5853084379632762489)"}: lorax_router::server: router/src/server.rs:298: Success
2023-12-23T03:04:09.218011Z INFO lorax_router::loader: router/src/loader.rs:241: adapter NextDayAI/xtraspicy1.0_13b_r32_800_adapter offloaded
2023-12-23T03:04:09.218033Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter NextDayAI/xtraspicy1.0_13b_r32_800_adapter status to Downloaded
2023-12-23T03:04:09.218716Z INFO lorax_router::loader: router/src/loader.rs:241: adapter NextDayAI/xtraspicy1.0_13b_r32_720_adapter offloaded
2023-12-23T03:04:09.218739Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter NextDayAI/xtraspicy1.0_13b_r32_720_adapter status to Downloaded
2023-12-23T03:04:09.239236Z INFO lorax_router::loader: router/src/loader.rs:197: adapter NextDayAI/xtraspicy1.0_13b_r32_400_adapter loaded
2023-12-23T03:04:09.239269Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter NextDayAI/xtraspicy1.0_13b_r32_400_adapter status to Ready
2023-12-23T03:04:09.258709Z INFO lorax_router::loader: router/src/loader.rs:197: adapter NextDayAI/xtraspicy1.0_13b_r32_720_adapter loaded
2023-12-23T03:04:09.258724Z INFO lorax_router::queue: router/src/queue.rs:135: set adapter NextDayAI/xtraspicy1.0_13b_r32_720_adapter status to Ready
2023-12-23T03:04:09.701608Z INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=inference.spicychat.ai:7001 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=axios/1.4.0 otel.kind=server trace_id=a9611795aa4b0336970442689421c739}:generate{parameters=GenerateParameters { adapter_id: Some("NextDayAI/xtraspicy1.0_13b_r32_400_adapter"), adapter_source: None, api_token: None, best_of: None, temperature: Some(0.7), repetition_penalty: Some(1.1), top_k: Some(90), top_p: Some(0.7), typical_p: None, do_sample: true, max_new_tokens: 180, return_full_text: None, stop: ["\nYour roommate Amber :", "\nWinston:", "\n###"], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None } total_time="489.291069ms" validation_time="4.884384ms" queue_time="22.199583ms" inference_time="462.207337ms" time_per_token="24.326701ms" seed="Some(16751434335624526013)"}: lorax_router::server: router/src/server.rs:298: Success
```

Screenshot might be easier to read:
[Screenshot: CleanShot 2023-12-22 at 22 09 05@2x]

We are only dealing with 4 adapters.


tgaddair commented on May 20, 2024

Thanks for the backtrace @karlbernard2, this is very helpful!

It definitely looks like the hanging is occurring in the SGMV kernel.

In the short term, you can try disabling SGMV with an environment variable: DISABLE_SGMV=1. That's not a great long-term solution, since SGMV is very fast when you have lots of adapters, but it should at least unblock you while I try to repro the issue, and the performance hit shouldn't be very noticeable with fewer than 10 adapters.
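For example, in a Kubernetes container spec the variable could be added like this (a sketch following the deployment snippet shared earlier in the thread):

```yaml
env:
- name: DISABLE_SGMV
  value: "1"
```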

I'll see if I can repro this behavior with the adapters you're using here.


tgaddair commented on May 20, 2024

Hey @karlbernard2, update on this: I tried running some stress tests today with a variety of request patterns to try and replicate your setup, but was unable to trigger the hanging behavior.

Can you share a few more details about your environment?

  • What GPU are you running on?
  • What Nvidia device driver version are you using (from nvidia-smi)?
  • Is this running on prem or in the cloud? If cloud, which one?

Thanks.


tgaddair commented on May 20, 2024

Hey @karlbernard2, I managed to track down the root cause of the deadlock, and it has been fixed in #156.

