
Comments (8)

cx-hub commented on August 25, 2024

I don't use Ray because I can't sudo to change the process limit.
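(Editor's note: the "can't change the process limit without sudo" problem usually refers to `RLIMIT_NPROC`, i.e. `ulimit -u`. A minimal sketch, assuming that is the limit in question: a non-root user can raise the *soft* limit up to the existing *hard* limit without sudo; only raising the hard limit requires root.)

```python
import resource

# Read the current soft and hard process-count limits (RLIMIT_NPROC).
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

# Raising the soft limit up to the hard limit needs no root privileges;
# only exceeding the hard limit would require sudo.
resource.setrlimit(resource.RLIMIT_NPROC, (hard, hard))

new_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
print(new_soft == hard)
```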

from vllm.

youkaichao commented on August 25, 2024

Upgrading to 0.5.1 should help.


cx-hub commented on August 25, 2024

I will try.


cx-hub commented on August 25, 2024

upgrade to 0.5.1 should help.

It still fails:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/llama3/mybench/mbxp/my_vllm.py", line 19, in <module>
[rank0]:     llm = LLM(model=the_path, gpu_memory_utilization=0.8, tensor_parallel_size = 2)
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 149, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 414, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 243, in __init__
[rank0]:     self.model_executor = executor_class(
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/executor/distributed_gpu_executor.py", line 25, in __init__
[rank0]:     super().__init__(*args, **kwargs)
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/executor/executor_base.py", line 42, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/executor/multiproc_gpu_executor.py", line 78, in _init_executor
[rank0]:     self._run_workers("init_device")
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/executor/multiproc_gpu_executor.py", line 130, in _run_workers
[rank0]:     driver_worker_output = driver_worker_method(*args, **kwargs)
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/worker/worker.py", line 126, in init_device
[rank0]:     init_worker_distributed_environment(self.parallel_config, self.rank,
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/worker/worker.py", line 327, in init_worker_distributed_environment
[rank0]:     ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 915, in ensure_model_parallel_initialized
[rank0]:     initialize_model_parallel(tensor_model_parallel_size,
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 883, in initialize_model_parallel
[rank0]:     _TP = init_model_parallel_group(group_ranks,
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 729, in init_model_parallel_group
[rank0]:     return GroupCoordinator(
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 175, in __init__
[rank0]:     self.pynccl_comm = PyNcclCommunicator(
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/distributed/device_communicators/pynccl.py", line 89, in __init__
[rank0]:     self.comm: ncclComm_t = self.nccl.ncclCommInitRank(
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 244, in ncclCommInitRank
[rank0]:     self.NCCL_CHECK(self._funcs["ncclCommInitRank"](ctypes.byref(comm),
[rank0]:   File "/miniconda/envs/vllama/lib/python3.9/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 223, in NCCL_CHECK
[rank0]:     raise RuntimeError(f"NCCL error: {error_str}")
[rank0]: RuntimeError: NCCL error: unhandled cuda error (run with NCCL_DEBUG=INFO for details)


youkaichao commented on August 25, 2024

That's a different error; please debug it following https://docs.vllm.ai/en/latest/getting_started/debugging.html
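(Editor's note: the traceback itself suggests rerunning with `NCCL_DEBUG=INFO`. A minimal sketch: NCCL reads these environment variables when the communicator is created, so they must be set before the engine is constructed, i.e. before `LLM(...)` runs. `NCCL_DEBUG_SUBSYS` is an optional real NCCL variable that narrows the output.)

```python
import os

# Set NCCL debug variables BEFORE creating the vLLM engine, because
# NCCL reads them at communicator-initialization time.
os.environ["NCCL_DEBUG"] = "INFO"         # print NCCL's init/error log
os.environ["NCCL_DEBUG_SUBSYS"] = "INIT"  # optional: restrict to init-phase messages

# ... then construct the engine as before, e.g.:
# llm = LLM(model=the_path, gpu_memory_utilization=0.8, tensor_parallel_size=2)
```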


cx-hub commented on August 25, 2024

device_communicators/pynccl_wrapper.py", line 223, in NCCL_CHECK

I found the cause: I am using CUDA 11.8, but the runtime CUDA is 11.7.
I cannot change the runtime CUDA version, so is there any way to work around it?

init.cc:1674 NCCL WARN Cuda failure 'CUDA driver version is insufficient for CUDA runtime version'
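(Editor's note: this NCCL warning means the GPU driver's maximum supported CUDA version is older than the CUDA runtime the library was built against; the R515 driver series supports up to CUDA 11.7, so a CUDA 11.8 build fails. A minimal sketch of that version comparison; the helper name is hypothetical, not a vLLM API.)

```python
def cuda_version_insufficient(driver_cuda: str, runtime_cuda: str) -> bool:
    """Return True if the driver's max supported CUDA version (as shown
    by `nvidia-smi`) is older than the CUDA runtime the library was
    built against -- the condition behind this NCCL warning."""
    as_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return as_tuple(driver_cuda) < as_tuple(runtime_cuda)

# Driver 515.43.04 supports up to CUDA 11.7, but the build targets 11.8:
print(cuda_version_insufficient("11.7", "11.8"))  # True -> rebuild for 11.7 or update the driver
```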


cx-hub commented on August 25, 2024

Can I use torch 2.0.1 with vLLM for offline inference of Llama 3 70B or Qwen?


youkaichao commented on August 25, 2024

Nvidia driver version: 515.43.04

You need to contact your admin about updating the driver version.

