A Viam MLModelService resource backed by Nvidia's Triton Server. You can read a tutorial on how to set it up on the Viam docs site.
This module is still under active development.
- An NVIDIA Jetson Orin board with JetPack 5 installed. Note that use of an NVMe SSD is strongly recommended over using an SD card.
- Ensure that the NVIDIA Container Runtime is installed: `sudo apt-get install nvidia-container`. Note that `nvidia-container` is part of `nvidia-jetpack`, so if you have JetPack installed on the board you probably already have this, but it is worth running the above command to make sure (see the quick check below).
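If you want to double-check, a quick sanity check along these lines should confirm the runtime is present. Treat this as a sketch; package names can vary slightly between JetPack releases:

```sh
# List installed NVIDIA container packages; you should see entries such as
# nvidia-container-toolkit and libnvidia-container* if the runtime is present.
dpkg -l | grep -i nvidia-container

# If you run containers with Docker, confirm that the "nvidia" runtime is registered.
sudo docker info | grep -i runtimes
```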
Follow the instructions to add a modular service to your robot: search for "triton" and select the desired version of the module from the Viam Registry.
This module currently requires manual setup of a Model Repository under the `~/.viam` directory of the user who will run `viam-server`. Here, we place the model repository under `~/.viam/triton/repository`, but the exact subpath under `~/.viam` does not matter.
For instance, to add the EfficientDet-Lite4 Object Detection model, you should have a layout like this after unpacking the model:
```
$ tree ~/.viam
~/.viam
├── cached_cloud_config_05536cf6-f8a6-464f-b05c-bf1e57e9d1d9.json
└── triton
    └── repository
        └── efficientdet-lite4-detection
            ├── 1
            │   └── model.savedmodel
            │       ├── saved_model.pb
            │       └── variables
            │           ├── variables.data-00000-of-00001
            │           └── variables.index
            └── config.pbtxt
```
The `config.pbtxt` file must exist, but at least for TensorFlow models it can be empty. The version here is `1`, but it can be any positive integer. Newer versions will be preferred by default.
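For example, the layout above could be created along the following lines. This is only a sketch: the archive name `efficientdet-lite4-detection.tar.gz` and its internal structure are hypothetical, so adjust the commands to however your model is actually distributed:

```sh
# Create the repository directory and a version directory for the model.
mkdir -p ~/.viam/triton/repository/efficientdet-lite4-detection/1

# Unpack a TensorFlow SavedModel (hypothetical archive) so that it ends up at
# ~/.viam/triton/repository/efficientdet-lite4-detection/1/model.savedmodel/
tar -xzf efficientdet-lite4-detection.tar.gz \
  -C ~/.viam/triton/repository/efficientdet-lite4-detection/1

# The model configuration file must exist, but may be empty for TensorFlow models.
touch ~/.viam/triton/repository/efficientdet-lite4-detection/config.pbtxt
```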
The next step is to create an instance of the resource this module serves. This goes in the `services` section of your robot's JSON configuration. A minimal configuration looks like:
```
...
"services": [
  ...
  {
    "type": "mlmodel",
    "attributes": {
      "model_name": "efficientdet-lite4-detection",
      "model_repository_path": "/path/to/.viam/triton/repository"
    },
    "model": "viam:mlmodelservice:triton",
    "name": "mlmodel-effdet-triton"
  },
  ...
],
...
```
A complete configuration, specifying many optional parameters, might look like:
```
...
"services": [
  ...
  {
    "type": "mlmodel",
    "attributes": {
      "backend_directory": "/opt/tritonserver/backends",
      "model_name": "efficientdet-lite4-detection",
      "model_version": 1,
      "model_repository_path": "/path/to/.viam/triton/repository",
      "preferred_input_memory_type_id": 0,
      "preferred_input_memory_type": "gpu",
      "tensor_name_remappings": {
        "outputs": {
          "output_3": "n_detections",
          "output_0": "location",
          "output_1": "score",
          "output_2": "category"
        },
        "inputs": {
          "images": "image"
        }
      }
    },
    "model": "viam:mlmodelservice:triton",
    "name": "mlmodel-effdet-triton"
  },
  ...
],
...
```
The `type` field must be `mlmodel`, and the `model` field must use the `viam:mlmodelservice:triton` tag, but the `name` of the service is up to you. The following attribute-level configuration options are available:
- `model_name` [required]: The model to be loaded from the repository.
- `model_repository_path` [required]: The (container-side) path to a model repository. Note that this must be a subdirectory of the `$HOME/.viam` directory of the user running `viam-server`.
- `backend_directory` [optional, default determined at build time]: A container-side path to the Triton Server "backend" directory. You normally do not need to override this; the build will set it to the backend directory of the Triton Server installation in the container. You may set it if you wish to use a different set of backends.
- `model_version` [optional, defaults to `-1`, meaning "newest"]: The version of the model to be loaded. If not specified, the module will use the newest version of the model named by `model_name`.
- `preferred_input_memory_type` [optional, see below for default]: One of `cpu`, `cpu-pinned`, or `gpu`. This controls the type of memory that will be allocated by the module for input tensors. If not specified, this defaults to `cpu` if no CUDA-capable devices are detected at runtime, or to `gpu` if CUDA-capable devices are found.
- `preferred_input_memory_type_id` [optional, defaults to `0`]: The CUDA device ID on which to allocate `gpu` or `cpu-pinned` input tensors. This defaults to `0`, meaning the first device. You probably don't need to change this unless you have multiple GPUs.
- `tensor_name_remappings` [optional, defaults to `{}`]: Provides two dictionaries under the `inputs` and `outputs` keys that rename the model's tensors. Higher-level services may expect tensors with particular names (e.g. the Viam vision services). Use this map to rename the tensors from the loaded model as needed to meet those requirements.
If all has gone right, you can now create a Viam vision service with a configuration like the following:
```
...
"services": [
  ...
  {
    "attributes": {
      "mlmodel_name": "mlmodel-effdet-triton"
    },
    "model": "mlmodel",
    "name": "vision-effdet-triton",
    "type": "vision"
  },
  ...
],
...
```
You can now connect this vision service to a transform camera, or get detections programmatically via any Viam SDK.
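As an illustration, a transform camera that overlays this service's detections might be configured roughly like the following in the `components` section of your robot's configuration. This is a hedged sketch based on the generic Viam transform camera model, not something specific to this module; the source camera name `my-camera` is a placeholder, and you should confirm the exact attribute schema against the Viam transform camera documentation:

```
{
  "name": "transform-effdet-triton",
  "type": "camera",
  "model": "transform",
  "attributes": {
    "source": "my-camera",
    "pipeline": [
      {
        "type": "detections",
        "attributes": {
          "detector_name": "vision-effdet-triton",
          "confidence_threshold": 0.5
        }
      }
    ]
  }
}
```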
I recommend using the `jtop` utility on Jetson boards to monitor GPU usage and validate that Triton is accelerating inference via the GPU.
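If `jtop` is not already installed, it is provided by the `jetson-stats` package. A typical installation, assuming pip is available on the board, looks like:

```sh
# Install jetson-stats, which provides the jtop monitoring utility.
sudo pip3 install -U jetson-stats

# You may need to log out and back in (or reboot) before jtop can
# connect to its background service, then run:
jtop
```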