Comments (10)
Hey @hayleyhu, thanks for reporting this. This is a surprising error. Could you try running the same command with the environment variable RUST_BACKTRACE=1 set, and share the full log output?
Example:
docker run -e RUST_BACKTRACE=1 ...
Hello @tgaddair ,
I encountered the same problem when testing the image "ghcr.io/predibase/lorax:latest". Here are the logs:
docker run --gpus '"device=7"' -e RUST_BACKTRACE=1 --shm-size 1g -p 8081:80 -v /model_dir:/data ghcr.io/predibase/lorax:latest --model-id /data/Qwen-14B-Chat --trust-remote-code
2024-03-11T02:31:06.117503Z INFO lorax_launcher: Args { model_id: "/data/Qwen-14B-Chat", adapter_id: None, source: "hub", adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, compile: false, dtype: None, trust_remote_code: true, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_active_adapters: 128, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.0, hostname: "3ef400c8e367", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-03-11T02:31:06.117556Z WARN lorax_launcher: `trust_remote_code` is set. Trusting that model `/data/Qwen-14B-Chat` do not contain malicious code.
2024-03-11T02:31:06.117744Z INFO download: lorax_launcher: Starting download process.
2024-03-11T02:31:09.676052Z INFO lorax_launcher: cli.py:109 Files are already present on the host. Skipping download.
2024-03-11T02:31:10.721726Z INFO download: lorax_launcher: Successfully downloaded weights.
2024-03-11T02:31:10.722129Z INFO shard-manager: lorax_launcher: Starting shard rank=0
2024-03-11T02:31:20.730706Z INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=0
2024-03-11T02:31:25.287915Z INFO lorax_launcher: server.py:291 Server started at unix:///tmp/lorax-server-0
2024-03-11T02:31:25.334414Z INFO shard-manager: lorax_launcher: Shard ready in 14.611113274s rank=0
2024-03-11T02:31:25.432031Z INFO lorax_launcher: Starting Webserver
2024-03-11T02:31:25.464515Z INFO lorax_router: router/src/main.rs:202: Loading tokenizer /data/Qwen-14B-Chat
2024-03-11T02:31:25.464578Z INFO lorax_router: router/src/main.rs:206: Using local tokenizer: /data/Qwen-14B-Chat
2024-03-11T02:31:25.464601Z WARN lorax_router: router/src/main.rs:251: Could not find a fast tokenizer implementation for /data/Qwen-14B-Chat
2024-03-11T02:31:25.464605Z WARN lorax_router: router/src/main.rs:252: Rust input length validation and truncation is disabled
2024-03-11T02:31:25.464609Z WARN lorax_router: router/src/main.rs:277: no pipeline tag found for model /data/Qwen-14B-Chat
2024-03-11T02:31:25.485387Z INFO lorax_router: router/src/main.rs:296: Warming up model
2024-03-11T02:31:57.331056Z INFO lorax_launcher: flash_causal_lm.py:781 Memory remaining for kv cache: 3082.375 MB
2024-03-11T02:31:57.572087Z INFO lorax_router: router/src/main.rs:335: Setting max batch total tokens to 12128
2024-03-11T02:31:57.572120Z INFO lorax_router: router/src/main.rs:336: Connected
2024-03-11T02:31:57.572134Z WARN lorax_router: router/src/main.rs:341: Invalid hostname, defaulting to 0.0.0.0
2024-03-11T02:31:57.573058Z INFO lorax_router::server: router/src/server.rs:974: CORS: origin: Const("*"), methods: Const(Some("GET,POST")), headers: Const(Some("content-type")), expose-headers: Const(None) credentials: No
2024-03-11T02:31:57.573079Z INFO lorax_router::server: router/src/server.rs:986: CORS: CorsLayer { allow_credentials: No, allow_headers: Const(Some("content-type")), allow_methods: Const(Some("GET,POST")), allow_origin: Const("*"), allow_private_network: No, expose_headers: Const(None), max_age: Exact(None), vary: Vary(["origin", "access-control-request-method", "access-control-request-headers"]) }
thread 'tokio-runtime-worker' panicked at /usr/src/router/src/server.rs:794:26:
called `Option::unwrap()` on a `None` value
stack backtrace:
0: rust_begin_unwind
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
1: core::panicking::panic_fmt
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14
2: core::panicking::panic
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:127:5
3: core::option::Option<T>::unwrap
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/option.rs:935:21
4: lorax_router::server::request_logger::{{closure}}
at ./router/src/server.rs:794:22
5: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/core.rs:328:17
6: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/loom/std/unsafe_cell.rs:16:9
7: tokio::runtime::task::core::Core<T,S>::poll
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/core.rs:317:30
8: std::panicking::try::do_call
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:504:40
9: std::panicking::try
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:468:19
10: std::panic::catch_unwind
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panic.rs:142:14
11: tokio::runtime::task::harness::poll_future
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:473:18
12: tokio::runtime::task::harness::Harness<T,S>::poll_inner
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:208:27
13: tokio::runtime::task::harness::Harness<T,S>::poll
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:153:15
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
2024-03-11T02:31:57.855398Z ERROR lorax_launcher: Webserver Crashed
2024-03-11T02:31:57.855429Z INFO lorax_launcher: Shutting down shards
2024-03-11T02:31:57.931701Z INFO shard-manager: lorax_launcher: Shard terminated rank=0
Error: WebserverFailed
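For context on the panic in the trace above: Rust's Option::unwrap() aborts when the value is None. A minimal Python analogue of that pattern, just to illustrate the failure class and a guarded alternative (the helper names below are hypothetical, not lorax code):

```python
from typing import Optional


def unwrap(value: Optional[str]) -> str:
    """Mimic Rust's Option::unwrap(): fail loudly when the value is absent."""
    if value is None:
        raise ValueError("called unwrap() on a None value")
    return value


def unwrap_or(value: Optional[str], default: str) -> str:
    """Mimic Option::unwrap_or(): fall back to a default instead of failing."""
    return value if value is not None else default
```

A fix for this class of bug typically replaces the bare unwrap with a default or a propagated error rather than crashing the worker thread.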
Hey @Nipi64310, thanks for providing this additional context. Unfortunately, it looks like the offending call to Option::unwrap()
is still being hidden somehow. Can you try running docker pull ghcr.io/predibase/lorax:latest
to ensure you're running the latest image and set RUST_BACKTRACE=full
to get the full stack trace? Thanks.
Hi @tgaddair , thanks for getting back to me. I've now updated to the latest Docker image and I can start it now.
Hello @tgaddair ,
Loading Qwen-72B-Chat-Int4 fails with RuntimeError: CUDA error: an illegal memory access was encountered, while loading Qwen-14B-Chat-Int4 gives the correct result. Here is the error log:
docker run --gpus '"device=2,3,4,5"' -e RUST_BACKTRACE=full --shm-size 1g -p 8081:80 -v /Qwen/:/data ghcr.nju.edu.cn/predibase/lorax:latest --model-id /data/Qwen-72B-Chat-Int4 --adapter-source local --trust-remote-code --quantize gptq
2024-03-11T09:24:56.420409Z INFO lorax_launcher: Starting Webserver
2024-03-11T09:24:56.457190Z INFO lorax_router: router/src/main.rs:202: Loading tokenizer /data/Qwen-72B-Chat-Int4
2024-03-11T09:24:56.459163Z INFO lorax_router: router/src/main.rs:206: Using local tokenizer: /data/Qwen-72B-Chat-Int4
2024-03-11T09:24:56.459186Z WARN lorax_router: router/src/main.rs:251: Could not find a fast tokenizer implementation for /data/Qwen-72B-Chat-Int4
2024-03-11T09:24:56.459265Z WARN lorax_router: router/src/main.rs:252: Rust input length validation and truncation is disabled
2024-03-11T09:24:56.459270Z WARN lorax_router: router/src/main.rs:277: no pipeline tag found for model /data/Qwen-72B-Chat-Int4
2024-03-11T09:24:56.503452Z INFO lorax_router: router/src/main.rs:296: Warming up model
2024-03-11T09:24:59.348856Z ERROR lorax_launcher: interceptor.py:41 Method Warmup encountered an error.
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 89, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 330, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/lorax_server/interceptor.py", line 38, in intercept
return await response
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 80, in Warmup
max_supported_total_tokens = self.model.warmup(batch, request.max_new_tokens)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 746, in warmup
_, batch = self.generate_token(batch, is_warmup=True)
File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 878, in generate_token
raise e
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 875, in generate_token
out = self.forward(batch, adapter_data)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 833, in forward
return model.forward(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_qwen_modeling.py", line 476, in forward
hidden_states = self.transformer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_qwen_modeling.py", line 433, in forward
hidden_states, residual = layer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_qwen_modeling.py", line 358, in forward
attn_output = self.attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_qwen_modeling.py", line 227, in forward
qkv = self.c_attn(hidden_states, adapter_data)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/layers.py", line 601, in forward
result = self.base_layer(input)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/layers.py", line 399, in forward
return self.linear.forward(x)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/gptq/quant_linear.py", line 349, in forward
out = QuantLinearFunction.apply(
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 123, in decorate_fwd
return fwd(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/gptq/quant_linear.py", line 244, in forward
output = matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/gptq/quant_linear.py", line 216, in matmul248
matmul_248_kernel[grid](
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/gptq/custom_autotune.py", line 110, in run
timings = {
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/gptq/custom_autotune.py", line 111, in <dictcomp>
config: self._bench(*args, config=config, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/gptq/custom_autotune.py", line 90, in _bench
return triton.testing.do_bench(
File "/opt/conda/lib/python3.10/site-packages/triton/testing.py", line 103, in do_bench
torch.cuda.synchronize()
File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 801, in synchronize
return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-03-11T09:25:05.191066Z ERROR warmup{max_input_length=1024 max_prefill_tokens=4096 max_total_tokens=2048}:warmup: lorax_client: router/client/src/lib.rs:34: Server error: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Error: Warmup(Generation("CUDA error: an illegal memory access was encountered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n"))
2024-03-11T09:25:05.227846Z ERROR lorax_launcher: Webserver Crashed
2024-03-11T09:25:05.227884Z INFO lorax_launcher: Shutting down shards
2024-03-11T09:25:05.576928Z INFO shard-manager: lorax_launcher: Shard terminated rank=0
2024-03-11T09:25:05.599339Z INFO shard-manager: lorax_launcher: Shard terminated rank=2
2024-03-11T09:25:05.599523Z INFO shard-manager: lorax_launcher: Shard terminated rank=3
2024-03-11T09:25:05.643815Z INFO shard-manager: lorax_launcher: Shard terminated rank=1
Error: WebserverFailed
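As the error message itself suggests, CUDA reports illegal memory accesses asynchronously, so the Python stack trace may point at an unrelated call (here, torch.cuda.synchronize() during autotuning). Setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous so the trace lands on the faulting kernel. A minimal sketch of doing this from Python; in the container you would instead pass -e CUDA_LAUNCH_BLOCKING=1 to docker run:

```python
import os

# Must be set before the CUDA context is created, i.e. before the first
# CUDA call (safest: before importing torch at all).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # kernel launch errors now surface at the launching call
```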
Hey @Nipi64310, can you share the output of nvidia-smi? It looks like the warmup process is running out of memory. You may need to try reducing these values: max_input_length=1024 max_prefill_tokens=4096 max_total_tokens=2048
@hayleyhu can you try pulling the latest image and see if that resolves the unwrap() panic?
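To see why those limits matter, the warmup's token budget is roughly the free GPU memory divided by the per-token KV-cache footprint. A back-of-the-envelope sketch, using the "Memory remaining for kv cache" figure from the first log; the layer/head dimensions below are illustrative assumptions, not Qwen's actual config, and the launcher additionally rounds to block sizes:

```python
# Illustrative model dimensions -- assumptions for the sketch, not Qwen's config.
num_layers = 40
num_kv_heads = 40
head_dim = 128
bytes_per_value = 2  # fp16 KV cache

# Each token stores one key and one value vector per layer.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value

free_mb = 3082.375  # "Memory remaining for kv cache" from the log above
max_total_tokens_budget = int(free_mb * 1024 * 1024 / bytes_per_token)
```

If the largest batch the warmup tries to allocate exceeds this budget, warmup fails; lowering max_input_length, max_prefill_tokens, and max_total_tokens shrinks that batch.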
Okay, I think I see what's happening here. The unwrap error is occurring because of PR #309, which was accidentally pushing latest images during development.
cc @magdyksaleh
Let's make sure we only push dev images with a specific tag for the branch. I'll see if there's something we can do to prevent this automatically. In the meantime, I'll see if we can retag the current latest with the last commit to main.
@magdyksaleh confirmed the latest image has been fixed to be tagged from main.
Hi @tgaddair, I specified --max-input-length 128 --max-batch-prefill-tokens 512 --max-batch-total-tokens 512 --max-total-tokens 512, but I'm still getting the same error log.
docker run --gpus '"device=2,3,4,5"' -e RUST_BACKTRACE=full --shm-size 1g -p 8081:80 -v /Qwen:/data ghcr.nju.edu.cn/predibase/lorax:latest --model-id /data/Qwen-72B-Chat-Int4 --adapter-source local --quantize gptq --max-input-length 128 --max-batch-prefill-tokens 512 --max-batch-total-tokens 512 --max-total-tokens 512 --trust-remote-code
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-03-12T02:40:12.730824Z ERROR warmup{max_input_length=128 max_prefill_tokens=512 max_total_tokens=512}:warmup: lorax_client: router/client/src/lib.rs:34: Server error: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Error: Warmup(Generation("CUDA error: an illegal memory access was encountered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n"))
2024-03-12T02:40:12.751591Z ERROR lorax_launcher: Webserver Crashed
2024-03-12T02:40:12.751620Z INFO lorax_launcher: Shutting down shards
2024-03-12T02:40:13.041195Z INFO shard-manager: lorax_launcher: Shard terminated rank=2
2024-03-12T02:40:13.064553Z INFO shard-manager: lorax_launcher: Shard terminated rank=1
2024-03-12T02:40:13.091416Z INFO shard-manager: lorax_launcher: Shard terminated rank=3
2024-03-12T02:40:13.138504Z INFO shard-manager: lorax_launcher: Shard terminated rank=0
Thanks, my original question was resolved!