
Comments (6)

DeepBlue97 commented on August 24, 2024

Fixed by following "How to set cuda device with tensorRT python API?"

The fix works in tools/test.py and tools/deploy.py for conversion and inference, but visualizing the backend model still fails.

from mmdeploy.

grimoire commented on August 24, 2024

Err, first you can try moving with torch.cuda.device(device): to the beginning of create_trt_engine in tensorrt/utils.py.
If that does not work, you can set CUDA_VISIBLE_DEVICES=1 when converting your model with cuda:0 and do inference on cuda:1.

I do not have a host with multiple devices for now, so I am not sure whether these two methods will work. I will try them ASAP.


DeepBlue97 commented on August 24, 2024

Both the first and the second approach failed.

Firstly:
I changed:

def create_trt_engine(onnx_model: Union[str, onnx.ModelProto],
                      input_shapes: Dict[str, Sequence[int]],
                      log_level: trt.Logger.Severity = trt.Logger.ERROR,
                      fp16_mode: bool = False,
                      int8_mode: bool = False,
                      int8_param: dict = None,
                      max_workspace_size: int = 0,
                      device_id: int = 0,
                      **kwargs) -> trt.ICudaEngine:

    device = torch.device('cuda:{}'.format(device_id))
    with torch.cuda.device(device):
        load_tensorrt_plugin()
        # create builder and network
        .................................
        engine = builder.build_engine(network, config)

    assert engine is not None, 'Failed to create TensorRT engine'
    return engine

but failed with log:
load checkpoint from local path: /home/aiuser/workspace/mmdetection/checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth
2022-01-27:08:57:24,root ERROR [utils.py:41] CUDA error: invalid device ordinal
Traceback (most recent call last):
  File "/home/aiuser/workspace/lab_mmdeploy/MMDeploy/mmdeploy/utils/utils.py", line 36, in target_wrapper
    result = target(*args, **kwargs)
  File "/home/aiuser/workspace/lab_mmdeploy/MMDeploy/mmdeploy/apis/pytorch2onnx.py", line 87, in torch2onnx
    torch_model = task_processor.init_pytorch_model(model_checkpoint)
  File "/home/aiuser/workspace/lab_mmdeploy/MMDeploy/mmdeploy/codebase/mmdet/deploy/object_detection.py", line 94, in init_pytorch_model
    cfg_options)
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmdet/apis/inference.py", line 51, in init_detector
    model.to(device)
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 673, in to
    return self._apply(convert)
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 409, in _apply
    param_applied = fn(param)
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 671, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
2022-01-27 08:57:25,028 - mmdeploy - ERROR - torch2onnx failed.

Secondly:
I set: CUDA_VISIBLE_DEVICES=1

but failed:

result = inference_model(model_cfg, deploy_cfg, backend_models, img=img, device=device)
2022-01-27 08:59:19,222 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /home/aiuser/workspace/lab_mmdeploy/MMDeploy/build/lib/libmmdeploy_tensorrt_ops.so
2022-01-27 08:59:19,222 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /home/aiuser/workspace/lab_mmdeploy/MMDeploy/build/lib/libmmdeploy_tensorrt_ops.so
[01/27/2022-08:59:19] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/27/2022-08:59:19] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
Traceback (most recent call last):
  File "", line 1, in
  File "/home/aiuser/workspace/lab_mmdeploy/MMDeploy/mmdeploy/apis/inference.py", line 36, in inference_model
    model_inputs, _ = task_processor.create_input(img, input_shape)
  File "/home/aiuser/workspace/lab_mmdeploy/MMDeploy/mmdeploy/codebase/mmdet/deploy/object_detection.py", line 150, in create_input
    data = scatter(data, [self.device])[0]
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmcv/parallel/scatter_gather.py", line 44, in scatter
    return scatter_map(inputs)
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmcv/parallel/scatter_gather.py", line 34, in scatter_map
    out = list(map(type(obj), zip(*map(scatter_map, obj.items()))))
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmcv/parallel/scatter_gather.py", line 29, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmcv/parallel/scatter_gather.py", line 31, in scatter_map
    out = list(map(list, zip(*map(scatter_map, obj))))
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/mmcv/parallel/scatter_gather.py", line 19, in scatter_map
    return OrigScatter.apply(target_gpus, None, dim, obj)
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 92, in forward
    streams = [_get_stream(device) for device in target_gpus]
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 92, in
    streams = [_get_stream(device) for device in target_gpus]
  File "/home/aiuser/miniconda3/envs/mmdeploy/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 119, in _get_stream
    if _streams[device] is None:
IndexError: list index out of range
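
One possible reading of the IndexError above, offered as an assumption rather than a confirmed diagnosis: with CUDA_VISIBLE_DEVICES=1 the process sees a single GPU, and that GPU is renumbered as cuda:0, so a request for cuda:1 points past the end of the visible list. The sketch below models only the renumbering. The helper names visible_devices and resolve are hypothetical, the model handles integer indices only (no GPU UUIDs), and it needs no GPU to run.

```python
def visible_devices(env_value, physical_count):
    """Map logical CUDA ordinals to physical GPU indices (simplified model)."""
    if env_value is None:
        # Variable unset: every physical GPU is visible, in its natural order.
        return list(range(physical_count))
    mapping = []
    for token in env_value.split(','):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            # Only the entries before the first invalid one stay visible.
            break
        mapping.append(int(token))
    return mapping


def resolve(device, mapping):
    """Return the physical GPU behind 'cuda:N', or fail if N is not visible."""
    ordinal = int(device.split(':')[1]) if ':' in device else 0
    if ordinal >= len(mapping):
        raise IndexError(f'{device}: only {len(mapping)} visible device(s)')
    return mapping[ordinal]


mapping = visible_devices('1', physical_count=2)
print(mapping)                     # [1]
print(resolve('cuda:0', mapping))  # 1
```

Under this model, a process started with CUDA_VISIBLE_DEVICES=1 would address the second physical GPU as cuda:0, and resolve('cuda:1', mapping) raises.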


grimoire commented on August 24, 2024

Cool!
Did you load 'libcudart.so' and invoke cudaSetDevice? Where did you add the code?
In theory, with torch.cuda.device(device) should do the same thing as cudaSetDevice. I still can't find a machine with 2 GPUs, so I cannot test it for now.
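
To make the comparison concrete, here is a rough sketch of what a scoped device switch does: remember the current device, select the requested one, and restore the previous one on exit. This is an illustration of the idea only, not PyTorch's implementation. The names scoped_device and FakeRuntime are hypothetical, and the fake runtime stands in for the CUDA runtime so the example runs without a GPU.

```python
from contextlib import contextmanager


@contextmanager
def scoped_device(device_idx, get_device, set_device):
    """Select device_idx for the duration of the block, then restore."""
    previous = get_device()
    set_device(device_idx)
    try:
        yield
    finally:
        # Restore even if the body raised.
        set_device(previous)


class FakeRuntime:
    """Stand-in for the CUDA runtime: tracks the current device index."""

    def __init__(self):
        self.current = 0
        self.calls = []

    def get(self):
        return self.current

    def set(self, idx):
        self.current = idx
        self.calls.append(idx)


rt = FakeRuntime()
with scoped_device(1, rt.get, rt.set):
    inside = rt.current
print(inside, rt.current)  # 1 0
```

One practical difference from a bare cudaSetDevice call: the context manager switches back when the block ends, while a direct call leaves the new device selected for the calling thread.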


DeepBlue97 commented on August 24, 2024

To use deploy.py:

I pasted this code into mmdeploy/backend/tensorrt/init_plugins.py after "ctypes.CDLL(lib_path)" and added a few other
steps to obtain "device_idx" when using CUDA.

def load_tensorrt_plugin(device_idx=None) -> bool:
    """Load TensorRT plugins library.

    Args:
        device_idx (int, optional): CUDA device index to select with
            cudaSetDevice after the plugins are loaded. Defaults to None,
            which leaves the current device unchanged.

    Returns:
        bool: True if TensorRT plugin library is successfully loaded.
    """
    lib_path = get_ops_path()
    success = False
    logger = get_root_logger()
    if os.path.exists(lib_path):

        # fixed by Peter.W: enable cuda:1 device
        ctypes.CDLL(lib_path)
        if device_idx is not None:
            from ctypes import cdll, c_char_p
            libcudart = cdll.LoadLibrary('libcudart.so')
            libcudart.cudaGetErrorString.restype = c_char_p

            def cudaSetDevice(device_idx):
                ret = libcudart.cudaSetDevice(device_idx)
                if ret != 0:
                    # c_char_p results arrive as bytes in Python 3, so decode
                    # before concatenating with str.
                    error_string = libcudart.cudaGetErrorString(ret).decode()
                    raise RuntimeError("cudaSetDevice: " + error_string)

            cudaSetDevice(device_idx=device_idx)

        # logger.info(f'Successfully loaded tensorrt plugins from {lib_path}')
        logger.info(f'Successfully loaded tensorrt plugins from {lib_path}, device_idx: {device_idx}')
        success = True
    else:
        logger.warning(f'Could not load the library of tensorrt plugins. \
            Because the file does not exist: {lib_path}')
    return success
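
The "other steps" for obtaining device_idx are not shown above. As a minimal sketch of one way to derive an index from a device string such as "cuda:1", assuming the convention used in the test.py snippet where a negative value means "not CUDA": the function device_index_from_string is a hypothetical stand-in, not mmdeploy's parse_device_id.

```python
def device_index_from_string(device):
    """Return the CUDA index encoded in a device string, or -1 for CPU."""
    device = device.strip().lower()
    if device == 'cpu':
        return -1
    if device == 'cuda':
        # A bare 'cuda' is taken to mean the first device.
        return 0
    prefix, sep, index = device.partition(':')
    if prefix == 'cuda' and sep and index.isdigit():
        return int(index)
    raise ValueError(f'Unrecognized device string: {device!r}')


print(device_index_from_string('cuda:1'))  # 1
print(device_index_from_string('cpu'))     # -1
```

The result can then be passed as device_idx, with None or a negative value meaning that no cudaSetDevice call is needed.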

To use test.py:

I pasted this code into tools/test.py after "args = parse_args()":

def main():
    args = parse_args()

    # fixed by Peter.W: enable cuda:1
    from mmdeploy.utils import parse_device_id
    device_idx = parse_device_id(args.device)
    if device_idx >= 0:
        from ctypes import cdll, c_char_p
        libcudart = cdll.LoadLibrary('libcudart.so')
        libcudart.cudaGetErrorString.restype = c_char_p

        def cudaSetDevice(device_idx):
            ret = libcudart.cudaSetDevice(device_idx)
            if ret != 0:
                # c_char_p results arrive as bytes in Python 3, so decode
                # before concatenating with str.
                error_string = libcudart.cudaGetErrorString(ret).decode()
                raise RuntimeError("cudaSetDevice: " + error_string)

        cudaSetDevice(device_idx=device_idx)
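
Since the same ctypes block appears in both files, it could be factored into one helper. The following is a sketch under a few assumptions: the helper names load_cudart and set_cuda_device are hypothetical, the library is assumed to be loadable as 'libcudart.so' as in the snippets above, and the lib parameter exists only so the error path can be exercised without a GPU.

```python
import ctypes
from ctypes import c_char_p, c_int


def load_cudart(name='libcudart.so'):
    """Load the CUDA runtime and declare the two signatures used below."""
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        # Library not found on this machine.
        return None
    lib.cudaSetDevice.argtypes = [c_int]
    lib.cudaSetDevice.restype = c_int
    lib.cudaGetErrorString.argtypes = [c_int]
    lib.cudaGetErrorString.restype = c_char_p
    return lib


def set_cuda_device(device_idx, lib=None):
    """Call cudaSetDevice and raise RuntimeError on a non-zero status."""
    if lib is None:
        lib = load_cudart()
    if lib is None:
        raise RuntimeError('cudaSetDevice: could not load libcudart.so')
    ret = lib.cudaSetDevice(device_idx)
    if ret != 0:
        # The error string comes back as bytes; decode it for the message.
        message = lib.cudaGetErrorString(ret).decode()
        raise RuntimeError(f'cudaSetDevice({device_idx}): {message}')
```

With this in place, both tools/test.py and load_tensorrt_plugin could call set_cuda_device(device_idx) instead of repeating the block.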


lvhan028 commented on August 24, 2024

Closing this since there has been no activity for quite a long time. You can reopen it if the issue still happens. Thanks.

