It works when I use cuda:0, but it does not work when I use cuda:1. The same happens during inference.

How to deploy with --device cuda:1? (mmdeploy issue, closed)

open-mmlab commented on August 24, 2024

How to deploy with --device cuda:1 ?

from mmdeploy.

Comments (6)

DeepBlue97 commented on August 24, 2024

Fixed by following "How to set cuda device with tensorRT python API?" (https://github.com/NVIDIA/TensorRT/issues/1050).

It works in tools/test.py and tools/deploy.py for conversion and inference, but visualizing the backend model still fails.

grimoire commented on August 24, 2024

Err, first you can try moving "with torch.cuda.device(device):" to the beginning of create_trt_engine in tensorrt/utils.py.
If that does not work, you can set CUDA_VISIBLE_DEVICES=1 when converting your model with cuda:0 and doing inference on cuda:1.

I do not have a host with multiple devices for now. So I am not sure if these two methods will work. I will try it ASAP.

DeepBlue97 commented on August 24, 2024

Both the first and the second approach failed.

Firstly:
I changed:

def create_trt_engine(onnx_model: Union[str, onnx.ModelProto],
                      input_shapes: Dict[str, Sequence[int]],
                      log_level: trt.Logger.Severity = trt.Logger.ERROR,
                      fp16_mode: bool = False,
                      int8_mode: bool = False,
                      int8_param: dict = None,
                      max_workspace_size: int = 0,
                      device_id: int = 0,
                      **kwargs) -> trt.ICudaEngine:

    device = torch.device('cuda:{}'.format(device_id))
    with torch.cuda.device(device):
        load_tensorrt_plugin()
        # create builder and network
        .................................
        engine = builder.build_engine(network, config)

    assert engine is not None, 'Failed to create TensorRT engine'
    return engine

but failed with log:
load checkpoint from local path: /home/aiuser/workspace/mmdetection/checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth
2022-01-27:08:57:24,root ERROR [utils.py:41] CUDA error: invalid device ordinal
Traceback (most recent call last):
File "/home/aiuser/workspace/lab_mmdeploy/MMDeploy/mmdeploy/utils/utils.py", line 36, in target_wrapper
result = target(*args, **kwargs)
File "/home/aiuser/workspace/lab_mmdeploy/MMDeploy/mmdeploy/apis/pytorch2onnx.py", line 87, in torch2onnx
torch_model = task_processor.init_pytorch_model(model_checkpoint)
File "/home/aiuser/workspace/lab_mmdeploy/MMDeploy/mmdeploy/codebase/mmdet/deploy/object_detection.py", line 94, in init_pytorch_model
cfg_options)
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmdet/apis/inference.py", line 51, in init_detector
model.to(device)
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 673, in to
return self._apply(convert)
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 409, in _apply
param_applied = fn(param)
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 671, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
2022-01-27 08:57:25,028 - mmdeploy - ERROR - torch2onnx failed.
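
The traceback above ends in model.to(device) inside torch2onnx, that is, before create_trt_engine is reached. "invalid device ordinal" is the CUDA error for a device index outside the range visible to the process. A minimal sketch of checking the index up front follows. check_cuda_device is a hypothetical helper, not part of mmdeploy, and visible_count is passed in so the sketch has no CUDA dependency (with PyTorch it would typically come from torch.cuda.device_count()).

```python
def check_cuda_device(device: str, visible_count: int) -> int:
    """Return the index in a 'cuda:N' string, or raise if it is not visible.

    Hypothetical helper for illustration only. Returns -1 for 'cpu'.
    """
    if device == 'cpu':
        return -1
    if device == 'cuda':
        # Assumption for this sketch: a bare 'cuda' is treated as index 0.
        index = 0
    elif device.startswith('cuda:') and device[5:].isdigit():
        index = int(device[5:])
    else:
        raise ValueError(f'Unrecognized device string: {device!r}')
    if index >= visible_count:
        # Fail early with a readable message instead of a CUDA error later.
        raise ValueError(
            f'{device} requested, but only {visible_count} CUDA device(s) '
            'are visible to this process')
    return index
```

Calling check_cuda_device('cuda:1', visible_count=1) raises a ValueError immediately, which may be easier to diagnose than the error surfacing deep inside model.to(device).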

Secondly:
I set: CUDA_VISIBLE_DEVICES=1

but failed:

result = inference_model(model_cfg, deploy_cfg, backend_models, img=img, device=device)
2022-01-27 08:59:19,222 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /home/aiuser/workspace/lab_mmdeploy/MMDeploy/build/lib/libmmdeploy_tensorrt_ops.so
2022-01-27 08:59:19,222 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /home/aiuser/workspace/lab_mmdeploy/MMDeploy/build/lib/libmmdeploy_tensorrt_ops.so
[01/27/2022-08:59:19] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/27/2022-08:59:19] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
Traceback (most recent call last):
File "", line 1, in
File "/home/aiuser/workspace/lab_mmdeploy/MMDeploy/mmdeploy/apis/inference.py", line 36, in inference_model
model_inputs, _ = task_processor.create_input(img, input_shape)
File "/home/aiuser/workspace/lab_mmdeploy/MMDeploy/mmdeploy/codebase/mmdet/deploy/object_detection.py", line 150, in create_input
data = scatter(data, [self.device])[0]
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmcv/parallel/scatter_gather.py", line 44, in scatter
return scatter_map(inputs)
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmcv/parallel/scatter_gather.py", line 34, in scatter_map
out = list(map(type(obj), zip(*map(scatter_map, obj.items()))))
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmcv/parallel/scatter_gather.py", line 29, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmcv/parallel/scatter_gather.py", line 31, in scatter_map
out = list(map(list, zip(*map(scatter_map, obj))))
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmcv/parallel/scatter_gather.py", line 19, in scatter_map
return OrigScatter.apply(target_gpus, None, dim, obj)
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 92, in forward
streams = [_get_stream(device) for device in target_gpus]
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 92, in <listcomp>
streams = [_get_stream(device) for device in target_gpus]
File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 119, in _get_stream
if _streams[device] is None:
IndexError: list index out of range
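
A likely reading of the IndexError above: CUDA_VISIBLE_DEVICES=1 hides every GPU except physical GPU 1 and renumbers it, so inside the process that GPU is cuda:0 and cuda:1 no longer exists. The sketch below illustrates the renumbering in plain Python. The function name visible_physical_ids and the total_gpus parameter are hypothetical, and only comma-separated integer ids are handled (not GPU UUIDs).

```python
from typing import List, Optional


def visible_physical_ids(env_value: Optional[str], total_gpus: int) -> List[int]:
    """Return physical GPU ids in the order the process will index them.

    Simplified illustration: position i in the result is what the process
    sees as cuda:i.
    """
    if env_value is None:
        # Variable unset: every GPU is visible, in physical order.
        return list(range(total_gpus))
    ids = []
    for token in env_value.split(','):
        token = token.strip()
        if not token.isdigit() or int(token) >= total_gpus:
            # CUDA stops at the first invalid entry; later ones are ignored.
            break
        ids.append(int(token))
    return ids


# Host with two GPUs, process started with CUDA_VISIBLE_DEVICES=1.
ids = visible_physical_ids('1', total_gpus=2)
print(ids)       # physical GPU 1 is the only visible device
print(len(ids))  # one device, so cuda:0 is the only valid index
```

Under this reading, a process launched with CUDA_VISIBLE_DEVICES=1 would pass cuda:0 as the device to reach physical GPU 1.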

grimoire commented on August 24, 2024

Cool!
Did you load 'libcudart.so' and invoke cudaSetDevice? Where did you add the code?
In theory, "with torch.cuda.device(device)" should do the very same thing as cudaSetDevice. I still can't find a machine with two GPUs, so I cannot test it for now.

DeepBlue97 commented on August 24, 2024

To use deploy.py:

I pasted this code into mmdeploy/backend/tensorrt/init_plugins.py after "ctypes.CDLL(lib_path)", plus a few other steps to obtain "device_idx" when using CUDA.

def load_tensorrt_plugin(device_idx=None) -> bool:
    """Load TensorRT plugins library.

    Returns:
        bool: True if TensorRT plugin library is successfully loaded.
    """
    lib_path = get_ops_path()
    success = False
    logger = get_root_logger()
    if os.path.exists(lib_path):

        # fixed by Peter.W: enable cuda:1 device
        ctypes.CDLL(lib_path)
        if device_idx is not None:
            from ctypes import cdll, c_char_p
            libcudart = cdll.LoadLibrary('libcudart.so')
            libcudart.cudaGetErrorString.restype = c_char_p
            def cudaSetDevice(device_idx):
                ret = libcudart.cudaSetDevice(device_idx)
                if ret != 0:
                    error_string = libcudart.cudaGetErrorString(ret)
                    # c_char_p results are bytes in Python 3; decode before concatenating
                    raise RuntimeError("cudaSetDevice: " + error_string.decode())
            cudaSetDevice(device_idx=device_idx)

        # logger.info(f'Successfully loaded tensorrt plugins from {lib_path}')
        logger.info(f'Successfully loaded tensorrt plugins from {lib_path}, device_idx: {device_idx}')
        success = True
    else:
        logger.warning(f'Could not load the library of tensorrt plugins. \
            Because the file does not exist: {lib_path}')
    return success

To use test.py:

I pasted this code into tools/test.py after "args = parse_args()":

def main():
    args = parse_args()

    # fixed by Peter.W: enable cuda:1
    from mmdeploy.utils import parse_device_id
    device_idx = parse_device_id(args.device)
    if device_idx >= 0:
        from ctypes import cdll, c_char_p
        libcudart = cdll.LoadLibrary('libcudart.so')
        libcudart.cudaGetErrorString.restype = c_char_p

        def cudaSetDevice(device_idx):
            ret = libcudart.cudaSetDevice(device_idx)
            if ret != 0:
                error_string = libcudart.cudaGetErrorString(ret)
                # c_char_p results are bytes in Python 3; decode before concatenating
                raise RuntimeError("cudaSetDevice: " + error_string.decode())

        cudaSetDevice(device_idx=device_idx)
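
Since the same cudaSetDevice wrapper is pasted into both files, it could be factored into one helper. The following is a minimal sketch under stated assumptions, not mmdeploy code: set_cuda_device and its libcudart parameter are hypothetical names, 'libcudart.so' is assumed to be on the loader path as in the snippets above, and FakeCudart exists only so the example runs on a machine without CUDA. Because c_char_p results arrive as bytes in Python 3, the message is decoded before being formatted.

```python
import ctypes


def set_cuda_device(device_idx: int, libcudart=None) -> None:
    """Select the CUDA device for the calling host thread via cudaSetDevice.

    A caller-supplied `libcudart` handle must already have
    cudaGetErrorString.restype configured.
    """
    if libcudart is None:
        # Assumes libcudart.so can be found by the dynamic loader.
        libcudart = ctypes.CDLL('libcudart.so')
        libcudart.cudaGetErrorString.restype = ctypes.c_char_p
    ret = libcudart.cudaSetDevice(device_idx)
    if ret != 0:
        message = libcudart.cudaGetErrorString(ret)
        if isinstance(message, bytes):
            message = message.decode()
        raise RuntimeError(f'cudaSetDevice({device_idx}) failed: {message}')


class FakeCudart:
    """Stand-in for libcudart on a two-GPU host, for demonstration only."""

    def __init__(self):
        self.current = 0

    def cudaSetDevice(self, idx):
        if 0 <= idx < 2:
            self.current = idx
            return 0
        return 101  # any nonzero value signals an error

    def cudaGetErrorString(self, code):
        return b'invalid device ordinal'


fake = FakeCudart()
set_cuda_device(1, libcudart=fake)
print(fake.current)
```

With a helper like this, both load_tensorrt_plugin and tools/test.py could call set_cuda_device(device_idx) instead of carrying their own copy of the wrapper.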

lvhan028 commented on August 24, 2024

Closing this since there has been no activity for quite a long time. You can reopen it if the issue still happens. Thanks.
