allegroai / clearml-serving
ClearML - Model-Serving Orchestration and Repository Solution
Home Page: https://clear.ml
License: Apache License 2.0
From the docs, I can see that there are commands to add a model to an endpoint, and also to add model monitoring via the auto-update command. I can't seem to find any command to remove the model monitoring; I can only do model removal.
Is there no such capability for now, or is the doc just not updated?
Is there any way I can take the model from the ClearML server to my machine through the API so I can run it locally, without downloading it manually? Thank you.
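For reference, a minimal sketch of the standard SDK route, which does fetch a local (cached) copy of the weights rather than skipping the download entirely; the model ID below is a placeholder:

from clearml import InputModel

# placeholder model ID; use the ID shown in the ClearML UI
model = InputModel(model_id="aabbccddeeff00112233445566778899")

# downloads the weights once and caches them locally for later runs
local_weights_path = model.get_local_copy()
print(local_weights_path)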
Trying to call a model endpoint where the model is stored in a GCP bucket, I get the error:
clearml.storage - ERROR - Google cloud driver not found. Please install driver using: pip install "google-cloud-storage>=1.13.2"
After installing it manually, it works.
We use the k8s version, installed with Helm.
Have been working on model ensembles, continuing the conversation from #53; I thought it may be a better idea to create new issues for the different things I find along the way. Essentially, we want the output of the model to be an S3 path where all the results are saved as JSON.
However, it doesn't seem like clearml-serving is mapping the object datatype properly? Triton does support strings.
The issue lies here, I believe: np_to_triton_dtype. This currently maps an object to TYPE_BYTES, to be written to the config.pbtxt file (which is not a valid type as per the link above), whereas it should be TYPE_STRING.
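A minimal sketch of the fix I have in mind (the function name is taken from the issue; the surrounding mapping logic is illustrative, not quoted from the source):

import numpy as np

def np_to_triton_dtype(np_dtype):
    # ... numeric dtype mappings elided ...
    if np_dtype == np.object_:
        # TYPE_BYTES is not a valid config.pbtxt data_type;
        # Triton's model config uses TYPE_STRING for string/bytes tensors
        return "TYPE_STRING"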
Models which are located on the ClearML server (created by Task.init(..., output_uri=True)) run perfectly, while models which are located on Azure blob storage produce different problems in different scenarios:
test_model_pytorch': failed to open text file for read /models/test_model_pytorch/config.pbtxt: No such file or directory
clearml-serving-triton | Error retrieving model ID ca186e8440b84049971a0b623df36783 []
clearml-serving-triton | Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=60.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
clearml-serving-triton | Traceback (most recent call last):
clearml-serving-triton | File "clearml_serving/engines/triton/triton_helper.py", line 540, in <module>
clearml-serving-triton | main()
clearml-serving-triton | File "clearml_serving/engines/triton/triton_helper.py", line 532, in main
clearml-serving-triton | helper.maintenance_daemon(
clearml-serving-triton | File "clearml_serving/engines/triton/triton_helper.py", line 274, in maintenance_daemon
clearml-serving-triton | raise ValueError("triton-server process ended with error code {}".format(error_code))
clearml-serving-triton | ValueError: triton-server process ended with error code 1
Side note: the same problem occurs when hosting the containers on Windows and on Linux. All Azure credentials are successfully set up as environment variables in the 'clearml-serving-inference', 'clearml-serving-triton' and 'clearml-serving-statistics' containers.
Apologies if I have not understood well, as the documentation is limited.
From the README:
"Notice: If we re-run our keras training example and publish a new model in the repository, the engine will automatically update to the new model."
I tested this: I first ran my training but did not publish the model. When I start Triton, this version is still available in Triton for inference. Is this correct?
I also tried the following: after starting Triton with version 1, I retrained the same model with the same params. The Triton polling indicates no change, so it did not pull the new model over. Can I ask if this is the intended behavior?
Trying to create a custom model using Ultralytics' YOLOv8, I got this message while using Postman to test my endpoint.
body payload:
{
    "imgString": "base64encodedImage"
}
The preprocess input would be like this:
def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
    print(body)
    base64String = body.get("imgString")
    print(base64String)
    # decode the base64 payload into a BGR image
    self._image = cv2.imdecode(np.frombuffer(base64.b64decode(base64String), np.uint8), cv2.IMREAD_COLOR)
    # remember the scaling factors so detections can be mapped back to the original image
    self._scalingH, self._scalingW = self._image.shape[0]/imgSize, self._image.shape[1]/imgSize
    data = cv2.resize(self._image, (imgSize, imgSize))
    return data
The process function:
def process(
    self,
    data: Any,
    state: dict,
    collect_custom_statistics_fn: Optional[Callable[[dict], None]],
) -> Any:  # noqa
    # this is where we do the heavy lifting, i.e. run our model.
    results = self._model.predict(
        data, imgsz=imgSize,
        conf=configModel["model-config"]["conf"], iou=configModel["model-config"]["iou"],
        save=configModel["model-config"]["save-mode"], save_conf=configModel["model-config"]["save-mode"],
        save_crop=configModel["model-config"]["save-mode"], save_txt=configModel["model-config"]["save-mode"],
        device=configModel["model-config"]["device-mode"])
    return results
and the postprocess looks like this:
def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
    results = data
    classes = results[0].names
    imgDict = {}
    finalDict = {}
    dictDataEntity = {}
    for boxes in results[0].boxes:
        for box in boxes:
            labelNo = int(box.cls)
            # map box coordinates back to the original (un-resized) image
            x1 = int(box.xyxy[0][0] * self._scalingW)
            y1 = int(box.xyxy[0][1] * self._scalingH)
            x2 = int(box.xyxy[0][2] * self._scalingW)
            y2 = int(box.xyxy[0][3] * self._scalingH)
            tempCrop = self._image[y1:y2, x1:x2]
            imgDict.update({labelNo: tempCrop})
    orderedDict = OrderedDict(sorted(imgDict.items()))
    for key, value in orderedDict.items():
        for classKey, classValue in classes.items():
            if key == classKey:
                finalDict[classValue] = value
    img_v_resize = hconcat_resize(finalDict.values(), imgDelimiter)
    gray_imgResize = get_grayscale(img_v_resize)  # call the grayscaling function
    success, encoded_image = cv2.imencode('.jpg', gray_imgResize)  # save the image in memory
    BytesImage = encoded_image.tobytes()
    a = cv2.resize(img_v_resize, (960, 540))
    # cv2.imwrite("test.jpg", gray_imgResize)
    text_response = get_text_response_from_path(BytesImage)
    # ========== POST PROCESSING ================ #
    dataEntity = text_response[0].description.strip()  # show only the description info from gvision
    a = [i.split("\n") for i in dataEntity.split('PEMISAH') if i]
    value = []
    value.clear()
    for i in a:
        c = [d for d in i if d]
        listToStr = ' '.join([str(elem) for elem in c])
        stripListToStr = listToStr.strip()
        value.append(stripListToStr)
    i = 0
    for entity in classes.values():
        dictDataEntity[entity] = value[i]
        i += 1
        if len(value) == i:
            break
    for label in classes.values():
        if label not in dictDataEntity.keys():
            dictDataEntity[label] = "-"
    # dictDataEntity is a plain dict (it has no .tolist()), so return it directly
    return dict(predict=dictDataEntity)
The problem is that I want to check the logs to find which part of my code is failing, and I can't find where the log for preprocessing is. I'm pretty sure the problem is in my code, but I can't tell which line it is. Is there any way to write the log to the docker log or the terminal? Thanks.
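For what it's worth, a hedged sketch of what I would try, assuming (as with the compose setup elsewhere on this page) that the serving process's stdout/stderr surfaces in the clearml-serving-inference container output:

import sys
from typing import Any

class Preprocess(object):
    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # plain prints from preprocess.py should appear in the inference
        # container's log, viewable with: docker logs clearml-serving-inference
        print("preprocess got keys:", list(body.keys()), file=sys.stderr, flush=True)
        return body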
I've been following the example on Keras, but using a PyTorch model.
I have set up a serving instance with the following command:
clearml-serving triton --project "Caltech Birds/Deployment" --name "ResNet34 Serving"
I then added the model endpoint and the model ID of the model to be served:
clearml-serving triton --endpoint "resent34_cub200" --model-id "57ed24c1011346d292ecc9e797ccb47e"
The model was trained using an experiment script which included generating a config.pbtxt configuration file at the completion of model training. This was connected to the experiment configuration as per the Keras example, resulting in the following configuration being added to the experiment:
platform: "pytorch_libtorch"
input [
{
name: "input_layer"
data_type: TYPE_FP32
dims: [ 3, 224, 224 ]
}
]
output [
{
name: "fc"
data_type: TYPE_FP32
dims: [ 200 ]
}
]
I then created a queue on a GPU compute node (as the model requires GPU resource):
clearml-agent daemon --queue default --gpus all --detached --docker
The serving endpoint is then started with the following command:
clearml-serving launch --queue default
I can see two items in my deployment sub-project: the service I created, and a triton serving engine inference object.
On execution, the triton serving engine inference fails with the following errors:
2021-06-08 16:28:49
task f2fbb3218e8243be9f6ab37badbb4856 pulled from 2c28e5db27e24f348e1ff06ba93e80c5 by worker ecm-clearml-compute-gpu-002:0
2021-06-08 16:28:49
Running Task f2fbb3218e8243be9f6ab37badbb4856 inside docker: nvcr.io/nvidia/tritonserver:21.03-py3 arguments: ['--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002']
2021-06-08 16:28:50
Executing: ['docker', 'run', '-t', '--gpus', 'all', '--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002', '-e', 'CLEARML_WORKER_ID=ecm-clearml-compute-gpu-002:0', '-e', 'CLEARML_DOCKER_IMAGE=nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002', '-v', '/tmp/.clearml_agent.ft8vulpe.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.j9b8arhf:/root/.ssh', '-v', '/home/edmorris/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/edmorris/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/edmorris/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/edmorris/.clearml/cache:/clearml_agent_cache', '-v', '/home/edmorris/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvcr.io/nvidia/tritonserver:21.03-py3', 'bash', '-c', 'apt-get update ; apt-get install -y git ; . /opt/conda/etc/profile.d/conda.sh ; conda activate base ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring --id f2fbb3218e8243be9f6ab37badbb4856']
2021-06-08 16:28:55
docker: Error response from daemon: driver failed programming external connectivity on endpoint wonderful_galileo (0c2feca5684f2f71b11fa1e8da4550d42b23c456e52ba0069d0aae64cd75f55b): Error starting userland proxy: listen tcp4 0.0.0.0:8001: bind: address already in use.
2021-06-08 16:28:55
Process failed, exit code 125
This could be related to the parameters of the Triton docker container, which include both ipc=host and specific port mappings ('-p', '8000:8000'). These appear to be hard-coded for the Triton docker container in the ServingService.launch_service() method of the ServingService class in the clearml-serving package:
def launch_engine(self, queue_name, queue_id=None, verbose=True):
    # type: (Optional[str], Optional[str], bool) -> None
    """
    Launch serving engine on a specific queue
    :param queue_name: Queue name to launch the engine service running the inference on.
    :param queue_id: specify queue id (unique stand stable) instead of queue_name
    :param verbose: If True print progress to console
    """
    # todo: add more engines
    if self._engine_type == 'triton':
        # create the serving engine Task
        engine_task = Task.create(
            project_name=self._task.get_project_name(),
            task_name="triton serving engine",
            task_type=Task.TaskTypes.inference,
            repo="https://github.com/allegroai/clearml-serving.git",
            branch="main",
            commit="ad049c51c146e9b7852f87e2f040e97d88848a1f",
            script="clearml_serving/triton_helper.py",
            working_directory=".",
            docker="nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002",
            argparse_args=[('serving_id', self._task.id), ],
            add_task_init_call=False,
        )
        if verbose:
            print('Launching engine {} on queue {}'.format(self._engine_type, queue_id or queue_name))
        engine_task.enqueue(task=engine_task, queue_name=queue_name, queue_id=queue_id)
Hello! I am trying to play around with the gRPC configs for the triton server.
I'm using the docker-compose setup, so I'm not sure if the CLI will work for my use case (perhaps passing them as env variables would work?)
For instance, I’d like to set some variables like this:
[('grpc.max_send_message_length', 512 * 1024 * 1024), ('grpc.max_receive_message_length', 512 * 1024 * 1024)]
Is this possible currently? I’m getting an error from gRPC that my payload is more than the limit (8MB instead of 4MB…)
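For context, this is how those options are normally passed to a Python gRPC channel (a sketch of the client-side API only; how to thread them through the serving containers is exactly the open question here):

import grpc

# hedged sketch: raise the 4MB default gRPC message limits on a channel
channel = grpc.insecure_channel(
    "clearml-serving-triton:8001",  # Triton gRPC address from the compose setup
    options=[
        ("grpc.max_send_message_length", 512 * 1024 * 1024),
        ("grpc.max_receive_message_length", 512 * 1024 * 1024),
    ],
)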
Following commit b5f5d72, the fixes regarding the container arguments and the cloud service Python SDKs were resolved; however, the Triton server still cannot find the downloaded model from Azure Blob Storage locally.
This is because the name of the file is inherited from the Azure filename, rather than the expected "model.pt" that Triton is looking for. The model is placed in the correct folder structure, just not with the correct name.
I successfully resolved this in my fork by placing the following at the end of the triton_model_service_update_step method of the ServingService class.
new_target_path = Path(os.path.join(target_path.parent), 'model.pt')
shutil.move(target_path.as_posix(), new_target_path.as_posix())
Hi, I have encountered an error stating that the model was expecting input [1 28 28] but was given [1 784] when trying out the pytorch example. I think it is due to the flatten() of the array before it is returned by the preprocess method.
Can I also ask
Thanks.
I saw this line in the readme:
"Notice: If we re-run our keras training example and publish a new model in the repository, the engine will automatically update to the new model."
I have created a serving service. However, when I retrain my model with the same project and task name, and publish the model once training is done, the model version deployed in Triton is not updated.
May I know if this is a bug, or did I misunderstand some steps?
Hi, currently when I use clearml-serving, the deployed Triton service is always version 21.03. Is there a way I can configure or set it to 21.05? I need some features from 21.05.
Hello,
I have a multi-tenant application and I would like to control who has access to each endpoint with API keys. That is still a bit unclear to me. How can I authorize users before they consume an endpoint?
This question also extends to serving engines in general, like TorchServe. How do people normally control access to the inference APIs?
Thanks,
Bruno
The Triton server is now able to find the local copy of the model weight pt file and attempts to serve it, following fixes in #3.
The following error occurs when the model is served by the Triton Inference server:
Starting Task Execution:
clearml-serving - Nvidia Triton Engine Helper
ClearML results page: https://clearml-server.westeurope.cloudapp.azure.com/projects/779be4f4d83541d786eb839bb062fa93/experiments/364c73e36a454842a314169d78514034/output/log
String Triton Helper service
{'serving_id': 'b978817fa0544b94b2015b420a96f14c', 'project': 'serving', 'name': 'nvidia-triton', 'update_frequency': 10, 'metric_frequency': 1, 't_http_port': None, 't_http_thread_count': None, 't_allow_grpc': None, 't_grpc_port': None, 't_grpc_infer_allocation_pool_size': None, 't_pinned_memory_pool_byte_size': None, 't_cuda_memory_pool_byte_size': None, 't_min_supported_compute_capability': None, 't_buffer_manager_thread_count': None}
Updating local model folder: /models
[INFO]:: URL: cub200_resnet34 Endpoint: ServingService.EndPoint(serving_url='cub200_resnet34', model_ids=['57ed24c1011346d292ecc9e797ccb47e'], model_project=None, model_name=None, model_tags=None, model_config_blob='\n platform: "pytorch_libtorch"\n input [\n {\n name: "input_layer"\n data_type: TYPE_FP32\n dims: [ 3, 224, 224 ]\n }\n ]\n output [\n {\n name: "fc"\n data_type: TYPE_FP32\n dims: [ 200 ]\n }\n ]\n ', max_num_revisions=None, versions=OrderedDict())
[INFO]:: Model ID: 57ed24c1011346d292ecc9e797ccb47e Version: 1
[INFO]:: Model ID: 57ed24c1011346d292ecc9e797ccb47e Model URL: azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,447 - clearml.storage - INFO - Downloading: 5.00MB / 81.72MB @ 18.80MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,730 - clearml.storage - INFO - Downloading: 13.00MB / 81.72MB @ 28.29MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,741 - clearml.storage - INFO - Downloading: 21.00MB / 81.72MB @ 684.91MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,760 - clearml.storage - INFO - Downloading: 29.00MB / 81.72MB @ 426.19MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,791 - clearml.storage - INFO - Downloading: 37.00MB / 81.72MB @ 258.86MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,806 - clearml.storage - INFO - Downloading: 45.00MB / 81.72MB @ 535.17MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,907 - clearml.storage - INFO - Downloading: 53.00MB / 81.72MB @ 79.03MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,963 - clearml.storage - INFO - Downloading: 61.72MB / 81.72MB @ 155.64MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,968 - clearml.storage - INFO - Downloading: 69.72MB / 81.72MB @ 1502.19MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,979 - clearml.storage - INFO - Downloading: 77.72MB / 81.72MB @ 790.76MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,985 - clearml.storage - INFO - Downloaded 81.72 MB successfully from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt , saved to /clearml_agent_cache/storage_manager/global/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
[INFO] Local path to the model: /clearml_agent_cache/storage_manager/global/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
Update model v1 in /models/cub200_resnet34/1
[INFO] Target Path:: /models/cub200_resnet34/1/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
[INFO] Local Path:: /clearml_agent_cache/storage_manager/global/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
[INFO] New Target Path:: /models/cub200_resnet34/1/model.pt
Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=600.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
I0610 15:20:55.182775 671 metrics.cc:221] Collecting metrics for GPU 0: Tesla P40
I0610 15:20:55.498654 671 libtorch.cc:940] TRITONBACKEND_Initialize: pytorch
I0610 15:20:55.498688 671 libtorch.cc:950] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.498699 671 libtorch.cc:956] 'pytorch' TRITONBACKEND API version: 1.0
2021-06-10 15:20:55.688775: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0610 15:20:55.729429 671 tensorflow.cc:1880] TRITONBACKEND_Initialize: tensorflow
I0610 15:20:55.729458 671 tensorflow.cc:1890] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.729464 671 tensorflow.cc:1896] 'tensorflow' TRITONBACKEND API version: 1.0
I0610 15:20:55.729473 671 tensorflow.cc:1920] backend configuration:
{}
I0610 15:20:55.731061 671 onnxruntime.cc:1728] TRITONBACKEND_Initialize: onnxruntime
I0610 15:20:55.731085 671 onnxruntime.cc:1738] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.731095 671 onnxruntime.cc:1744] 'onnxruntime' TRITONBACKEND API version: 1.0
I0610 15:20:55.756821 671 openvino.cc:1166] TRITONBACKEND_Initialize: openvino
I0610 15:20:55.756848 671 openvino.cc:1176] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.756854 671 openvino.cc:1182] 'openvino' TRITONBACKEND API version: 1.0
I0610 15:20:56.081773 671 pinned_memory_manager.cc:205] Pinned memory pool is created at '0x7f229c000000' with size 268435456
I0610 15:20:56.082099 671 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 67108864
I0610 15:20:56.083854 671 model_repository_manager.cc:1065] loading: cub200_resnet34:1
I0610 15:20:56.184287 671 libtorch.cc:989] TRITONBACKEND_ModelInitialize: cub200_resnet34 (version 1)
I0610 15:20:56.185272 671 libtorch.cc:1030] TRITONBACKEND_ModelInstanceInitialize: cub200_resnet34 (device 0)
1623338462128 ecm-clearml-compute-gpu-002:gpuall DEBUG I0610 15:20:59.633139 671 libtorch.cc:1063] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0610 15:20:59.633184 671 libtorch.cc:1012] TRITONBACKEND_ModelFinalize: delete model state
E0610 15:20:59.633206 671 model_repository_manager.cc:1242] failed to load 'cub200_resnet34' version 1: Internal: failed to load model 'cub200_resnet34': [enforce fail at inline_container.cc:227] . file not found: archive/constants.pkl
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x68 (0x7f23c6279498 in /opt/tritonserver/backends/pytorch/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::getRecordID(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xda (0x7f23a1a23d4a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #2: caffe2::serialize::PyTorchStreamReader::getRecord(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x38 (0x7f23a1a23da8 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #3: torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&) + 0xab (0x7f23a323508b in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #4: <unknown function> + 0x3c035e5 (0x7f23a32355e5 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #5: <unknown function> + 0x3c05fd0 (0x7f23a3237fd0 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #6: torch::jit::load(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x1ab (0x7f23a32391eb in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #7: torch::jit::load(std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0xc2 (0x7f23a323b332 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #8: torch::jit::load(std::istream&, c10::optional<c10::Device>) + 0x6a (0x7f23a323b41a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #9: <unknown function> + 0x104a6 (0x7f23c67d44a6 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #10: <unknown function> + 0x12ac4 (0x7f23c67d6ac4 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #11: <unknown function> + 0x13772 (0x7f23c67d7772 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #12: TRITONBACKEND_ModelInstanceInitialize + 0x374 (0x7f23c67d7b34 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #13: <unknown function> + 0x2f8a99 (0x7f24104a8a99 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #14: <unknown function> + 0x2f927c (0x7f24104a927c in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #15: <unknown function> + 0x2f77ec (0x7f24104a77ec in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #16: <unknown function> + 0x183c00 (0x7f2410333c00 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #17: <unknown function> + 0x191581 (0x7f2410341581 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #18: <unknown function> + 0xd6d84 (0x7f240fcead84 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #19: <unknown function> + 0x9609 (0x7f2410185609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #20: clone + 0x43 (0x7f240f9d8293 in /lib/x86_64-linux-gnu/libc.so.6)
I0610 15:20:59.633540 671 server.cc:500]
+-----------------...
Originally posted by @ecm200 in #3 (comment)
Was following the tutorial for PyTorch; I was able to create an endpoint successfully, but wasn't able to get an inference result (using both ways) due to a shape mismatch error.
unexpected shape for input 'INPUT__0' for model 'test_model_pytorch'. Expected [1,28,28], got [1,784]
I then tried setting the INPUT__0 shape to 1 784, but that didn't work either. Then I realized that the preprocess function in preprocess.py flattens the data before returning it, which was causing the error. Removing the flatten() resolved my issue.
This also seems to be the case for the keras example.
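For reference, a sketch of a preprocess that keeps the declared shape (the "image" body key and the dtype are assumptions standing in for the example's actual input handling):

import numpy as np
from typing import Any

def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
    # return the [1, 28, 28] shape the endpoint declares
    # instead of flattening to [1, 784]
    data = np.array(body["image"], dtype=np.float32)
    return data.reshape(1, 28, 28)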
Hello clearml team,
Congrats on the release of clearml-serving V2 🎉
I really wanted to check it out, and I'm having difficulties running the basic setup and the scikit-learn example commands on my side.
I want to run the Installation and the Toy model (scikit learn) deployment example
I have a self-hosted ClearML Server built with the Helm chart on Kubernetes.
The environment variables of clearml-serving/docker/docker-compose.yml were defined in the myexemple.env file, which starts like this:
CLEARML_WEB_HOST="<http://localhost:8080/>"
CLEARML_API_HOST="<http://localhost:8008/>"
CLEARML_FILES_HOST="<http://localhost:8081/>"
Upon running docker-compose, both clearml-serving-inference and clearml-serving-statistics return errors:
Retrying (Retry(total=236, connect=236, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4065110310>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login
I think the issue comes from the communication with the Kafka service, but I do not know how to solve it.
Has anyone encountered this issue and solved it before, since this is the default installation from the docs?
I haven't found any related issues on any of the GitHub repos.
Thanks for the help 🤖
Hello, ClearML team!
I'm trying to understand how serving auto-scaling works.
From readme:
Scalable
Multi model per container
Multi models per serving service
Multi-service support (fully separated multiple serving services running independently)
Multi cluster support
Out-of-the-box node auto-scaling based on load/usage <---- *
I found that serving has the ability to auto-scale, but in the helm charts (triton, for example) I only found replicas: 1, and didn't find an auto-scale implementation anywhere (like this https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/, for example).
Could you please clarify the ClearML-serving scaling strategy, and where I can find the configuration files?
Thanks in advance.
I am unable to use clearml-serving for model deployment on my setup.
OS: Ubuntu 22.04 Server LTS
Python: 3.10.6
Steps:
pip install clearml-serving
clearml-serving create --name "serving example"
I get the following error:
Traceback (most recent call last):
File "/home/user_65s/.local/bin/clearml-serving", line 5, in <module>
from clearml_serving.__main__ import main
File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/__main__.py", line 9, in <module>
from clearml_serving.serving.model_request_processor import ModelRequestProcessor, CanaryEP
File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/serving/model_request_processor.py", line 18, in <module>
from .preprocess_service import BasePreprocessRequest
File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/serving/preprocess_service.py", line 247, in <module>
class TritonPreprocessRequest(BasePreprocessRequest):
File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/serving/preprocess_service.py", line 253, in TritonPreprocessRequest
np.int: 'int_contents',
File "/home/user_65s/.local/lib/python3.10/site-packages/numpy/__init__.py", line 284, in __getattr__
raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'int'. Did you mean: 'inf'?
It appears you are using np.int internally, which has been deprecated since NumPy 1.20:
DeprecationWarning: np.int is a deprecated alias for the builtin int. To silence this warning, use int by itself. Doing this will not modify any behavior and is safe. When replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
When I downgrade to numpy==1.23.* it works.
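Until a release with the fix lands, a hedged sketch of the kind of local patch this needs in preprocess_service.py (only the np.int entry is quoted from the traceback; the dict name and other entries are illustrative):

import numpy as np

# maps Python/numpy types to Triton InferTensorContents field names
_CONTENT_FIELDS = {
    bool: 'bool_contents',
    int: 'int_contents',  # was np.int, an alias removed in NumPy 1.24
    np.int32: 'int_contents',
    np.int64: 'int64_contents',
    np.float32: 'fp32_contents',
}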
Hello. I have some helper functions that are shared across the preprocess.py files, so I'd like to refactor them. However, I'm not sure where I can put them, and how to import them. The pythonpath seems to be /root/clearml, but I can't find any of the files when I start browsing there inside the inference Docker container.
Any insights?
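One workaround I can think of (a sketch, not an official mechanism): package the shared helpers as a pip-installable module, add it to the CLEARML_EXTRA_PYTHON_PACKAGES variable the containers already read, and import it normally; my_helpers and normalize below are hypothetical names:

# preprocess.py
from typing import Any
from my_helpers import normalize  # hypothetical shared helper package

class Preprocess(object):
    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # shared logic lives in the installed package instead of being
        # copy-pasted into every endpoint's preprocess.py
        return normalize(body["data"])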
Goal: Create a simple interface to serve multiple models with scalable serving engines on top of Kubernetes
Design Diagram
Features
Modules
Usage Example
Just noticed that the output type argument has a different syntax depending on which clearml-serving model command is run:
clearml-serving --id xxxxxxxxx model auto-update [...] --output-type float32
returns an error:
clearml-serving: error: unrecognized arguments: --output-type float32
but it works with --output_type.
If you run the command clearml-serving model add, it is the other way around: the argument --output_type throws an error, while --output-type works just fine.
I have a question regarding the use of multiple TensorRT engines and how ClearML addresses this issue. As you may know, TensorRT plan files need to be optimized for the compute capability of each GPU; consequently, each GPU requires a distinct plan file. Triton addresses this with a variable named cc_model_filenames in config.pbtxt, where we specify which model file will be used for each GPU, based on its compute capability. However, in ClearML, and specifically within triton_helper.py, it seems that any plan file is renamed to model.plan. This approach appears to be problematic in cases where different GPUs are used. For example, in my configuration, I have:
model-repository
|-------- Resnet50
          |-------- config.pbtxt
          |-------- 1
                    |-------- resnet50_T4.plan
                    |-------- resnet50_A100.plan
And my config.pbtxt looks like this:
cc_model_filenames [
  {
    key: "7.5"
    value: "resnet50_T4.plan"
  },
  {
    key: "8.0"
    value: "resnet50_A100.plan"
  }
]
Given the code written in triton_helper.py, is it possible to manage multiple models?
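To illustrate the kind of guard I have in mind, a hedged sketch (the function and its arguments are assumptions; triton_helper.py internals are not quoted here):

from pathlib import Path
import shutil

def place_model_file(local_path, version_dir, has_cc_model_filenames):
    # keep the original filename when cc_model_filenames is configured, since
    # Triton then selects the file by compute capability; otherwise fall back
    # to the default model.plan name the current code expects
    src = Path(local_path)
    dst_name = src.name if has_cc_model_filenames else "model.plan"
    shutil.move(str(src), str(Path(version_dir) / dst_name))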
Getting the following error when trying to deploy to ECS using docker-compose, in the Kafka service:
Unable to canonicalize address clearml-serving-zookeeper:2181 because it's not resolvable
Wondering, why are the ports commented out in the docker-compose file?
The zookeeper service seemed to be up and running on the ECS console.
Thanks!
Hello!
I use ClearML free (the one without the configuration vault stuff) + the clearml-serving module.
When I spun up docker-compose and tried to pull a model from our S3, I got an error in the tritonserver container:
2024-03-13 11:26:56,913 - clearml.storage - WARNING - Failed getting object size: ClientError('An error occurred (403) when calling the HeadObject operation: Forbidden')
2024-03-13 14:26:57
2024-03-13 11:26:57,042 - clearml.storage - ERROR - Could not download s3://<BUCKET>/<FOLDER>/<PROJECT>/<TASK_NAME>.75654091e56141199c9d9594305d6872/models/model_package.zip , err: An error occurred (403) when calling the HeadObject operation: Forbidden
But I've set the env variables in example.env (the AWS_ ones too), and I can find them in the tritonserver container via:
$ env | grep CLEARML
$ env | grep AWS
version: "3"
services:
zookeeper:
image: bitnami/zookeeper:3.7.0
container_name: clearml-serving-zookeeper
# ports:
# - "2181:2181"
environment:
- ALLOW_ANONYMOUS_LOGIN=yes
networks:
- clearml-serving-backend
kafka:
image: bitnami/kafka:3.1.1
container_name: clearml-serving-kafka
# ports:
# - "9092:9092"
environment:
- KAFKA_BROKER_ID=1
- KAFKA_CFG_LISTENERS=PLAINTEXT://clearml-serving-kafka:9092
- KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://clearml-serving-kafka:9092
- KAFKA_CFG_ZOOKEEPER_CONNECT=clearml-serving-zookeeper:2181
- ALLOW_PLAINTEXT_LISTENER=yes
- KAFKA_CREATE_TOPICS="topic_test:1:1"
depends_on:
- zookeeper
networks:
- clearml-serving-backend
prometheus:
image: prom/prometheus:v2.34.0
container_name: clearml-serving-prometheus
volumes:
- ./prometheus.yml:/prometheus.yml
command:
- '--config.file=/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/etc/prometheus/console_libraries'
- '--web.console.templates=/etc/prometheus/consoles'
- '--storage.tsdb.retention.time=200h'
- '--web.enable-lifecycle'
restart: unless-stopped
# ports:
# - "9090:9090"
depends_on:
- clearml-serving-statistics
networks:
- clearml-serving-backend
alertmanager:
image: prom/alertmanager:v0.23.0
container_name: clearml-serving-alertmanager
restart: unless-stopped
# ports:
# - "9093:9093"
depends_on:
- prometheus
- grafana
networks:
- clearml-serving-backend
grafana:
image: grafana/grafana:8.4.4-ubuntu
container_name: clearml-serving-grafana
volumes:
- './datasource.yml:/etc/grafana/provisioning/datasources/datasource.yaml'
restart: unless-stopped
ports:
- "3001:3000"
depends_on:
- prometheus
networks:
- clearml-serving-backend
clearml-serving-inference:
image: allegroai/clearml-serving-inference:1.3.1-vllm
build:
context: ../
dockerfile: clearml_serving/serving/Dockerfile
container_name: clearml-serving-inference
restart: unless-stopped
# optimize perforamnce
security_opt:
- seccomp:unconfined
ports:
- "8080:8080"
environment:
CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
CLEARML_SERVING_PORT: ${CLEARML_SERVING_PORT:-8080}
CLEARML_SERVING_POLL_FREQ: ${CLEARML_SERVING_POLL_FREQ:-1.0}
CLEARML_DEFAULT_BASE_SERVE_URL: ${CLEARML_DEFAULT_BASE_SERVE_URL:-http://127.0.0.1:8080/serve}
CLEARML_DEFAULT_KAFKA_SERVE_URL: ${CLEARML_DEFAULT_KAFKA_SERVE_URL:-clearml-serving-kafka:9092}
CLEARML_DEFAULT_TRITON_GRPC_ADDR: ${CLEARML_DEFAULT_TRITON_GRPC_ADDR:-clearml-serving-triton:8001}
CLEARML_USE_GUNICORN: ${CLEARML_USE_GUNICORN:-}
CLEARML_SERVING_NUM_PROCESS: ${CLEARML_SERVING_NUM_PROCESS:-}
CLEARML_EXTRA_PYTHON_PACKAGES: ${CLEARML_EXTRA_PYTHON_PACKAGES:-}
AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-}
AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-}
AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-}
GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-}
AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-}
AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-}
depends_on:
- kafka
- clearml-serving-triton
networks:
- clearml-serving-backend
clearml-serving-triton:
image: allegroai/clearml-serving-triton:1.3.1-vllm
build:
context: ../
dockerfile: clearml_serving/engines/triton/Dockerfile.vllm
container_name: clearml-serving-triton
restart: unless-stopped
# optimize perforamnce
security_opt:
- seccomp:unconfined
# ports:
# - "8001:8001"
environment:
CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
CLEARML_TRITON_POLL_FREQ: ${CLEARML_TRITON_POLL_FREQ:-1.0}
CLEARML_TRITON_METRIC_FREQ: ${CLEARML_TRITON_METRIC_FREQ:-1.0}
CLEARML_EXTRA_PYTHON_PACKAGES: ${CLEARML_EXTRA_PYTHON_PACKAGES:-}
AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-}
AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-}
AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-}
GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-}
AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-}
AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-}
depends_on:
- kafka
networks:
- clearml-serving-backend
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ['1']
capabilities: [gpu]
clearml-serving-statistics:
image: allegroai/clearml-serving-statistics:latest
container_name: clearml-serving-statistics
restart: unless-stopped
# optimize perforamnce
security_opt:
- seccomp:unconfined
# ports:
# - "9999:9999"
environment:
CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
CLEARML_DEFAULT_KAFKA_SERVE_URL: ${CLEARML_DEFAULT_KAFKA_SERVE_URL:-clearml-serving-kafka:9092}
CLEARML_SERVING_POLL_FREQ: ${CLEARML_SERVING_POLL_FREQ:-1.0}
depends_on:
- kafka
networks:
- clearml-serving-backend
networks:
clearml-serving-backend:
driver: bridge
CLEARML_WEB_HOST="[REDACTED]"
CLEARML_API_HOST="[REDACTED]"
CLEARML_FILES_HOST="s3://[REDACTED]"
CLEARML_API_ACCESS_KEY="<access_key_here>"
CLEARML_API_SECRET_KEY="<secret_key_here>"
CLEARML_SERVING_TASK_ID="<serving_service_id_here>"
CLEARML_EXTRA_PYTHON_PACKAGES="boto3"
AWS_ACCESS_KEY_ID="[REDACTED]"
AWS_SECRET_ACCESS_KEY="[REDACTED]"
AWS_DEFAULT_REGION="[REDACTED]"
FROM nvcr.io/nvidia/tritonserver:24.02-vllm-python-py3
ENV LC_ALL=C.UTF-8
COPY clearml_serving /root/clearml/clearml_serving
COPY requirements.txt /root/clearml/requirements.txt
COPY README.md /root/clearml/README.md
COPY setup.py /root/clearml/setup.py
RUN python3 -m pip install --no-cache-dir -r /root/clearml/clearml_serving/engines/triton/requirements.txt
RUN python3 -m pip install --no-cache-dir -U pip -e /root/clearml/
# default serving port
EXPOSE 8001
# environment variable to load Task from CLEARML_SERVING_TASK_ID, CLEARML_SERVING_PORT
WORKDIR /root/clearml/
ENTRYPOINT ["clearml_serving/engines/triton/entrypoint.sh"]
The inference task was successfully created after launching the serving services (clearml-serving launch --queue default).
However, it seems that the nvidia container failed to start, with the following error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
I am using a local worker without a GPU, attached to the publicly hosted ClearML server.
My clearml-serving deployment is stuck. No models are registered:
clearml-serving --id 7303713271b941f7a0b45760d45208dd model list
clearml-serving - CLI for launching ClearML serving engine
List model serving and endpoints, control task id=7303713271b941f7a0b45760d45208dd
Info: syncing model endpoint configuration, state hash=d3290336c62c7fb0bc8eb4046b60bc7f
Endpoints:
{}
Model Monitoring:
{}
Canary:
{}
However, old models are still somehow there; there is a leftover model that I am unable to remove:
Triton-Task:
2023-11-20 16:18:40
ClearML Task: created new task id=9b3460b62f9d4015890c7dd2c0064bcf
2023-11-20 15:18:40,452 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page: http://clearml-webserver:8080/projects/9b4bbac7f1c248e894793f5771005826/experiments/9b3460b62f9d4015890c7dd2c0064bcf/output/log
2023-11-20 16:18:40
configuration args: Namespace(inference_task_id=None, metric_frequency=1.0, name='triton engine', project=None, serving_id='7303713271b941f7a0b45760d45208dd', t_allow_grpc=None, t_buffer_manager_thread_count=None, t_cuda_memory_pool_byte_size=None, t_grpc_infer_allocation_pool_size=None, t_grpc_port=None, t_http_port=None, t_http_thread_count=None, t_log_verbose=None, t_min_supported_compute_capability=None, t_pinned_memory_pool_byte_size=None, update_frequency=1.0)
String Triton Helper service
{'serving_id': '7303713271b941f7a0b45760d45208dd', 'project': None, 'name': 'triton engine', 'update_frequency': 1.0, 'metric_frequency': 1.0, 'inference_task_id': None, 't_http_port': None, 't_http_thread_count': None, 't_allow_grpc': None, 't_grpc_port': None, 't_grpc_infer_allocation_pool_size': None, 't_pinned_memory_pool_byte_size': None, 't_cuda_memory_pool_byte_size': None, 't_min_supported_compute_capability': None, 't_buffer_manager_thread_count': None, 't_log_verbose': None}
Updating local model folder: /models
2023-11-20 15:18:41,106 - clearml.Model - ERROR - Action failed <400/201: models.get_by_id/v1.0 (Invalid model id (no such public or company model): id=0bbba86c98c54610a14350ba69e2e330, company=d1bd92a3b039400cbafc60a7a5b1e52b)> (model=0bbba86c98c54610a14350ba69e2e330)
2023-11-20 15:18:41,107 - clearml.Model - ERROR - Failed reloading task 0bbba86c98c54610a14350ba69e2e330
2023-11-20 15:18:41,115 - clearml.Model - ERROR - Action failed <400/201: models.get_by_id/v1.0 (Invalid model id (no such public or company model): id=0bbba86c98c54610a14350ba69e2e330, company=d1bd92a3b039400cbafc60a7a5b1e52b)> (model=0bbba86c98c54610a14350ba69e2e330)
2023-11-20 15:18:41,115 - clearml.Model - ERROR - Failed reloading task 0bbba86c98c54610a14350ba69e2e330
2023-11-20 16:18:41
Traceback (most recent call last):
File "clearml_serving/engines/triton/triton_helper.py", line 540, in <module>
main()
File "clearml_serving/engines/triton/triton_helper.py", line 532, in main
helper.maintenance_daemon(
File "clearml_serving/engines/triton/triton_helper.py", line 237, in maintenance_daemon
self.model_service_update_step(model_repository_folder=local_model_repo, verbose=True)
File "clearml_serving/engines/triton/triton_helper.py", line 146, in model_service_update_step
print("Error retrieving model ID {} []".format(model_id, model.url if model else ''))
File "/usr/local/lib/python3.8/dist-packages/clearml/model.py", line 341, in url
return self._get_base_model().uri
File "/usr/local/lib/python3.8/dist-packages/clearml/backend_interface/model.py", line 496, in uri
return self.data.uri
AttributeError: 'NoneType' object has no attribute 'uri'
How can a broken task be fixed without deploying a new serving instance?
Hey there,
I just tried launching a new serving instance as our demands are growing. A few months ago I committed a change that resolved a missing await, allowing us to override the process() method.
However, it seems that when pulling the latest docker image, this change is not reflected, as no new image has been pushed to Docker Hub. I'm not sure how often you release, but it seems there are other changes which may not be reflected in the images either. Could you please elaborate? Should I just create my own image from the updated source code...?
Describe the bug
After following the docker-compose-triton-gpu.yml instructions for the pytorch example, the server fails to spin up. The service fails due to the following error:
model_repository_manager.cc:1152] failed to load 'test_model_pytorch' version 1: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The service spins up without the model_repository_manager.cc:1152 error message.
Screenshots
n/a
Desktop (please complete the following information):
Output of docker --version & docker-compose --version:
Docker version 20.10.16, build aa7e414
docker-compose version 1.29.2, build 5becea4c
Additional context
See similar issue here: triton-inference-server/server#3877
I am trying to install clearml-serving on Python 3.9.
The problem seems to be related to new releases of numpy.
Here is the full stack trace:
clearml-serving create --name "serving example"
Traceback (most recent call last):
File "/Users/galleon/.pyenv/versions/maio-serving/bin/clearml-serving", line 5, in <module>
from clearml_serving.__main__ import main
File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/__main__.py", line 9, in <module>
from clearml_serving.serving.model_request_processor import ModelRequestProcessor, CanaryEP
File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/serving/model_request_processor.py", line 18, in <module>
from .preprocess_service import BasePreprocessRequest
File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/serving/preprocess_service.py", line 247, in <module>
class TritonPreprocessRequest(BasePreprocessRequest):
File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/serving/preprocess_service.py", line 253, in TritonPreprocessRequest
np.int: 'int_contents',
File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/numpy/__init__.py", line 305, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Installed packages:
pip list
Package Version
------------------ -----------
attrs 22.2.0
certifi 2022.12.7
charset-normalizer 3.1.0
clearml 1.9.3
clearml-serving 1.2.0
furl 2.1.3
idna 3.4
jsonschema 4.17.3
numpy 1.24.2
orderedmultidict 1.0.1
pathlib2 2.3.7.post1
Pillow 9.4.0
pip 23.0.1
psutil 5.9.4
PyJWT 2.4.0
pyparsing 3.0.9
pyrsistent 0.19.3
python-dateutil 2.8.2
PyYAML 6.0
requests 2.28.2
setuptools 58.1.0
six 1.16.0
urllib3 1.26.15
Hello! I am trying to use clearml-serving to serve my PyTorch pretrained model.
I deploy ClearML Server and use S3 Minio on the local network to store artifacts and pretrained weights.
There is no problem with storing and getting models using Input/Output Models; everything works correctly.
But clearml-serving (particularly the clearml-serving-triton container) cannot work with Minio, as it does not have the python module boto3.
Following the tutorial, I added S3 credentials to example.env:
CLEARML_WEB_HOST=http://192.168.3.217:8080
CLEARML_API_HOST=http://192.168.3.217:8008
CLEARML_FILES_HOST=http://192.168.3.217:8081
CLEARML_API_ACCESS_KEY=CLEARML_API_ACCESS_KEY
CLEARML_API_SECRET_KEY=CLEARML_API_SECRET_KEY
CLEARML_SERVING_TASK_ID="ccfed15e442242a19338c20772562df2"
AWS_ACCESS_KEY_ID=AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY=AWS_SECRET_ACCESS_KEY
After that it doesn't work, as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are not passed through to docker-compose. I added those variables to the clearml-serving-triton container to solve it:
clearml-serving-triton:
  image: allegroai/clearml-serving-triton:latest
  container_name: clearml-serving-triton
  restart: unless-stopped
  # optimize performance
  security_opt:
    - seccomp:unconfined
  # ports:
  #   - "8001:8001"
  environment:
    CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
    CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
    CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
    CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
    CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
    CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
    CLEARML_TRITON_POLL_FREQ: ${CLEARML_TRITON_POLL_FREQ:-1.0}
    CLEARML_TRITON_METRIC_FREQ: ${CLEARML_TRITON_METRIC_FREQ:-1.0}
    AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-ACCES_KEY}
    AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-SECRET_ACCESS_KEY}
But after that there is an error in this container:
clearml-serving-triton | 2022-12-15 05:05:45,607 - clearml.storage - ERROR - AWS S3 storage driver (boto3) not found. Please install driver using: pip install "boto3>=1.9"
I guess it can be fixed by adding "boto3>=1.9" to the container requirements.txt here:
https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/engines/triton/requirements.txt
After doing this and building a local docker image, I get the following error:
clearml-serving-triton | 2022-12-15 05:10:54,624 - clearml.storage - ERROR - Could not download s3://192.168.3.217:9000/models/test/RegNet.b04da49b696a472b94677e26762078d1/models/regnet_y_400MF.pt , err: SSL validation failed for https://192.168.3.217:9000/models/test/RegNet.b04da49b696a472b94677e26762078d1/models/regnet_y_400MF.pt [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1131)
And I don't have any idea how to disable the secure connection in this container.
Hello, I see TorchServe engine support mentioned in the Readme but cannot find any way to actually use it. Is it available?
I couldn't find any backends or configurations that support Scikit-Learn models (e.g. pickle format).
As ClearML has an integration with Scikit-Learn, there should be some option to serve such models.
Please add a workaround to support it.
Hi!
In the requirements.txt file, the requests version specifier is currently set to >=2.31.0,<2.29.0, which seems to be a mistake; it should be >=2.29.0,<2.31.0.
https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving/requirements.txt
https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch
I'm running the examples as per the readme.md, but I get the following error. What should I do?
{"detail":"Error processing request: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"Request for unknown model: 'test_model_pytorch' version 1 is not at ready state\"\n\tdebug_error_string = \"{\"created\":\"@1652700912.192078289\",\"description\":\"Error received from peer ipv4:172.25.0.5:8001\",\"file\":\"src/core/lib/surface/call.cc\",\"file_line\":1069,\"grpc_message\":\"Request for unknown model: 'test_model_pytorch' version 1 is not at ready state\",\"grpc_status\":14}\"\n>"}
Endpoints appear to be normal.
Hi everyone!
I faced a problem with ClearML-serving. I've deployed an onnx model from HuggingFace in clearml-serving, but "Error processing request: Error: Failed loading pre process code for '<>': No module named 'transformers'" appears when trying to send a request as in the example (https://github.com/allegroai/clearml-serving/tree/main/examples/huggingface).
The preprocessing file is just like in the example.
The transformers package was installed via the CLEARML_EXTRA_PYTHON_PACKAGES variable in the serving service deployment file.
Do you have any ideas?
Thanks in advance
I have created an endpoint like this:
clearml-serving --id "<>" model add --engine triton --endpoint 'conformer_joint' --model-id '<>' --preprocess 'preprocess_joint.py' --aux-config "./config.pbtxt"
The config.pbtxt file:
name: "conformer_joint"
default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
max_queue_delay_microseconds: 100
}
input: [
{
name: "encoder_outputs"
data_type: TYPE_FP32
dims: [
1,
640
]
},
{
name: "decoder_outputs"
data_type: TYPE_FP32
dims: [
640,
1
]
}
]
output: [
{
name: "outputs"
data_type: TYPE_FP32
dims: [
129
]
}
]
The preprocess_joint.py file:
from typing import Any, Union, Optional, Callable


class Preprocess(object):
    def __init__(self):
        # set internal state, this will be called only once. (i.e. not per request)
        pass

    def preprocess(
        self,
        body: Union[bytes, dict],
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]]
    ) -> Any:
        return body["encoder_outputs"], body["decoder_outputs"]

    def postprocess(
        self,
        data: Any,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]]
    ) -> dict:
        return {"data": data.tolist()}
The triton container and inference container show no errors, and I can find this triton model with the right config.pbtxt in the folder /models/conformer_joint. But when I try to make a request to the model like this:
import numpy as np
import requests

body = {
    "encoder_outputs": [np.random.randn(1, 640).tolist()],
    "decoder_outputs": [np.random.randn(640, 1).tolist()]
}
response = requests.post(f"<>/conformer_joint", json=body)
response.json()
I am getting an error:
Error processing request: object of type 'NoneType' has no len()
The model endpoint in the serving task:
conformer_joint {
  engine_type = "triton"
  serving_url = "conformer_joint"
  model_id = "<>"
  preprocess_artifact = "py_code_conformer_joint"
  auxiliary_cfg = """name: "conformer_joint"
default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
  max_queue_delay_microseconds: 100
}
input: [
  {
    name: "encoder_outputs"
    data_type: TYPE_FP32
    dims: [ 1, 640 ]
  },
  {
    name: "decoder_outputs"
    data_type: TYPE_FP32
    dims: [ 640, 1 ]
  }
]
output: [
  {
    name: "outputs"
    data_type: TYPE_FP32
    dims: [ 129 ]
  }
]
"""
}
The error occurs in the process function of TritonPreprocessRequest (https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving/preprocess_service.py#L358C9-L358C81) because the function uses endpoint params like input_name, input_type and input_size. When we create an endpoint like the one above, these parameters are placed in the auxiliary_cfg attribute.
Is there any chance to fix that error and create endpoint like above?
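(A possible workaround, sketched from the CLI flags used in a later report on this page, with sizes taken from the config above; treat the exact values as assumptions: pass the tensor specs explicitly so the endpoint itself carries input_name/input_type/input_size rather than only auxiliary_cfg:)
clearml-serving --id "<>" model add --engine triton --endpoint 'conformer_joint' --model-id '<>' --preprocess 'preprocess_joint.py' --input-size '[1, 640]' '[640, 1]' --input-name 'encoder_outputs' 'decoder_outputs' --input-type float32 float32 --output-size '[129]' --output-name 'outputs' --output-type float32 --aux-config max_batch_size=16 dynamic_batching.max_queue_delay_microseconds=100 default_model_filename=\"model.bin\"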
Hello, after some issues were raised, code was added to define triton engine args (e.g. ports, triton version), but the usage of these is not yet documented in the README nor in the --help.
Also, the purpose of certain args, like project name and name, is not very clear even after reading the --help.
Perhaps these can be documented in the README, to make them easier to use.
This is also related to my failed attempt to use my own ClearML server and Triton setup with ClearML serving.
I suspect it might be due to my unfamiliarity with these args, but there might also be gaps in the implementation. I suggest getting the args documented first, so that I can test further.
When I try to follow examples/pytorch, the triton server crashes, i.e. exits with status code -6.
This is the log from the container:
I1004 17:32:10.693691 41 grpc_server.cc:4375] Started GRPCInferenceService at 0.0.0.0:8001
I1004 17:32:10.693968 41 http_server.cc:3075] Started HTTPService at 0.0.0.0:8000
I1004 17:32:10.736035 41 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
I1004 17:34:10.746305 41 model_repository_manager.cc:994] loading: test_model_pytorch:1
I1004 17:34:10.848495 41 libtorch.cc:1355] TRITONBACKEND_ModelInitialize: test_model_pytorch (version 1)
I1004 17:34:10.852702 41 libtorch.cc:253] Optimized execution is enabled for model instance 'test_model_pytorch'
I1004 17:34:10.852761 41 libtorch.cc:271] Inference Mode is disabled for model instance 'test_model_pytorch'
I1004 17:34:10.852801 41 libtorch.cc:346] NvFuser is not specified for model instance 'test_model_pytorch'
I1004 17:34:10.856732 41 libtorch.cc:1396] TRITONBACKEND_ModelInstanceInitialize: test_model_pytorch (device 0)
terminate called after throwing an instance of 'c10::Error'
what(): isTuple()INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h":1910, please report a bug to PyTorch. Expected Tuple but got String
Exception raised from toTupleRef at /opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h:1910 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f6caf24e11c in /opt/tritonserver/backends/pytorch/libc10.so
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7f6caf22bcb4 in /opt/tri
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x53 (0x7f
frame #3: <unknown function> + 0x368a57a (0x7f6cf239657a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #4: <unknown function> + 0x368a6e9 (0x7f6cf23966e9 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #5: torch::jit::SourceRange::highlight(std::ostream&) const + 0x48 (0x7f6cefe48678 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #6: torch::jit::ErrorReport::what() const + 0x2c3 (0x7f6cefe2eeb3 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #7: <unknown function> + 0x102b9 (0x7f6cf91f92b9 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #8: <unknown function> + 0x1d4d2 (0x7f6cf92064d2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #9: <unknown function> + 0x1d9f2 (0x7f6cf92069f2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #10: TRITONBACKEND_ModelInstanceInitialize + 0x374 (0x7f6cf9206db4 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #11: <unknown function> + 0x307dee (0x7f6cfb143dee in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #12: <unknown function> + 0x3093b3 (0x7f6cfb1453b3 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #13: <unknown function> + 0x301067 (0x7f6cfb13d067 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #14: <unknown function> + 0x18a7ca (0x7f6cfafc67ca in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #15: <unknown function> + 0x1979b1 (0x7f6cfafd39b1 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #16: <unknown function> + 0xd6de4 (0x7f6cfa991de4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #17: <unknown function> + 0x9609 (0x7f6cfae0f609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)
frame #18: clone + 0x43 (0x7f6cfa67f293 in /usr/lib/x86_64-linux-gnu/libc.so.6)

Signal (6) received.
0# 0x000055E2DF079299 in tritonserver
1# 0x00007F6CFA5A3210 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
4# 0x00007F6CFA959911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
5# 0x00007F6CFA96538C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
6# 0x00007F6CFA964369 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
7# __gxx_personality_v0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
8# 0x00007F6CFA761BEF in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
9# _Unwind_Resume in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
10# 0x00007F6CEFA61C49 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so
11# 0x00007F6CF23966E9 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so
12# torch::jit::SourceRange::highlight(std::ostream&) const in /opt/tritonserver/backends/pytorch/libtorch_cpu.so
13# torch::jit::ErrorReport::what() const in /opt/tritonserver/backends/pytorch/libtorch_cpu.so
14# 0x00007F6CF91F92B9 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
15# 0x00007F6CF92064D2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
16# 0x00007F6CF92069F2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
17# TRITONBACKEND_ModelInstanceInitialize in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
18# 0x00007F6CFB143DEE in /opt/tritonserver/bin/../lib/libtritonserver.so
19# 0x00007F6CFB1453B3 in /opt/tritonserver/bin/../lib/libtritonserver.so
20# 0x00007F6CFB13D067 in /opt/tritonserver/bin/../lib/libtritonserver.so
21# 0x00007F6CFAFC67CA in /opt/tritonserver/bin/../lib/libtritonserver.so
22# 0x00007F6CFAFD39B1 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
23# 0x00007F6CFA991DE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
24# 0x00007F6CFAE0F609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
25# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

configuration args: Namespace(inference_task_id=None, metric_frequency=1.0, name='triton engine', project=None, serving_id='dd756abf5e8b42efab92dfb0cfa57a5e', t_allow_grpc=None, t_buffer_manager_threa
String Triton Helper service
{'serving_id': 'dd756abf5e8b42efab92dfb0cfa57a5e', 'project': None, 'name': 'triton engine', 'update_frequency': 1.0, 'metric_frequency': 1.0, 'inference_task_id': None, 't_http_port': None, 't_http_t

Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=60.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
Info: syncing models from main serving service
reporting metrics: relative time 60 sec
Info: syncing models from main serving service
Updating local model folder: /models
INFO: target config.pbtxt file for endpoint 'test_model_pytorch':

input: [{
dims: [1, 28, 28]
data_type: TYPE_FP32
name: "INPUT__0"
}]
output: [{
dims: [-1, 10]
data_type: TYPE_FP32
name: "OUTPUT__0"
}]
backend: "pytorch"

Update model v1 in /models/test_model_pytorch/1
Info: Models updated from main serving service
reporting metrics: relative time 120 sec
Traceback (most recent call last):
File "clearml_serving/engines/triton/triton_helper.py", line 515, in <module>
main()
File "clearml_serving/engines/triton/triton_helper.py", line 507, in main
helper.maintenance_daemon(
File "clearml_serving/engines/triton/triton_helper.py", line 248, in maintenance_daemon
raise ValueError("triton-server process ended with error code {}".format(error_code))
ValueError: triton-server process ended with error code -6
Stream closed EOF for clearml-serving/clearml-serving-triton-85779b957d-hdx7q (clearml-serving-triton)
Hi everyone! I use this command to create an endpoint:
clearml-serving --id "<>" model add --engine triton --endpoint 'conformer_joint' --model-id '<>' --preprocess 'preprocess_joint.py' --input-size '[1, 640]' '[640, 1]' --input-name 'encoder_outputs' 'decoder_outputs' --input-type float32 float32 --output-size '[100]' --output-name 'outputs' --output-type float32 --aux-config name=\"conformer_joint\" max_batch_size=16 dynamic_batching.max_queue_delay_microseconds=100 platform=\"onnxruntime_onnx\" default_model_filename=\"model.bin\"
This command creates a config.pbtxt like this (copied from the logs):
name: "conformer_joint"
platform: "onnxruntime_onnx"
default_model_filename: "model.bin"
input: [{
dims: [-1, 1, 640]
data_type: TYPE_FP32
name: "encoder_outputs"
},
{
dims: [-1, 640, 1]
data_type: TYPE_FP32
name: "decoder_outputs"
}]
output: [{
dims: [-1, 129]
data_type: TYPE_FP32
name: "outputs"
}]
Logs from k8s:
I0802 22:48:17.274440 53 model_repository_manager.cc:1206] loading: conformer_joint:1
I0802 22:48:17.274536 53 onnxruntime.cc:2560] TRITONBACKEND_ModelInitialize: conformer_joint (version 1)
I0802 22:48:17.274881 53 onnxruntime.cc:666] skipping model configuration auto-complete for 'conformer_joint': inputs and outputs already specified
I0802 22:48:17.276238 53 onnxruntime.cc:2603] TRITONBACKEND_ModelInstanceInitialize: conformer_joint (GPU device 0)
I0802 22:48:17.279143 53 model_repository_manager.cc:1352] successfully loaded 'conformer_joint' version 1
And there are no errors in clearml-serving. But when I try to make a request like this:
import numpy as np
import requests

r = requests.post(
    f"<URL>",
    json={
        "encoder_outputs": np.random.randn(1, 1, 640).tolist(),
        "decoder_outputs": np.random.randn(1, 640, 1).tolist()
    }
)
r.json()
I get this:
[2023-08-02 23:02:51 +0000] [113] [ERROR] Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 152, in jsonable_encoder
data = dict(obj)
^^^^^^^^^
ValueError: dictionary update sequence element #0 has length 129; 2 is required
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 436, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/root/clearml/clearml_serving/serving/main.py", line 31, in custom_route_handler
return await original_route_handler(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 255, in app
content = await serialize_response(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 152, in serialize_response
return jsonable_encoder(response_content)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 117, in jsonable_encoder
encoded_value = jsonable_encoder(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 160, in jsonable_encoder
raise ValueError(errors) from e
ValueError: [ValueError('dictionary update sequence element #0 has length 129; 2 is required'), TypeError('vars() argument must have __dict__ attribute')]
I think this is because of the batch size, and maybe I need to add something to config.pbtxt. Any ideas?
Thanks in advance!
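(Reading the traceback, a guess: the response handed back to FastAPI was a raw array rather than a dict, so jsonable_encoder failed on it. A minimal postprocess sketch that avoids this, modeled on the Preprocess class shown earlier; the key name "outputs" is an assumption:)
from typing import Any, Optional, Callable


class Preprocess(object):
    def postprocess(
        self,
        data: Any,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None,
    ) -> dict:
        # wrap the raw model output in a JSON-serializable dict so
        # FastAPI's jsonable_encoder never sees a bare numpy array
        return {"outputs": data.tolist()}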
We have set up clearml serving on Kubernetes, including triton support. Our triton instance has no GPU, so deploying a model leads to the following error in the triton instance:
E0718 07:41:21.083440 30 model_lifecycle.cc:596] failed to load 'distilbert-test2' version 1: Invalid argument: unable to load model 'distilbert-test2', TensorRT backend supports only GPU device
Trying to remove the model again is not possible:
clearml-serving --id 5097f44fe9cb45f7be2a917c6fe8cad9 model remove --endpoint distilbert-test2
yields the following:
clearml-serving - CLI for launching ClearML serving engine
2023-07-18 09:47:59,260 - clearml.Task - ERROR - Failed reloading task 5097f44fe9cb45f7be2a917c6fe8cad9
2023-07-18 09:47:59,290 - clearml.Task - ERROR - Failed reloading task 5097f44fe9cb45f7be2a917c6fe8cad9
Error: Task ID "5097f44fe9cb45f7be2a917c6fe8cad9" could not be found
In general, our observation is that the serving is not resilient against these kinds of problems. A broken model should not break the instance.
I have triton deployed in k8s. Am I able to link my serving service to this instance of triton?
Hello,
I deployed a model using clearml-serving, but it generates inconsistent results across identical HTTP requests.
To recreate:
1. Set up the ClearML server (allegroai/clearml:1.4.0).
2. Deploy clearml-serving with helm (helm repo: NAME allegroai/clearml-serving, CHART VERSION 0.4.1, APP VERSION 0.9.0).
Everything goes well as the readme.md from https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch instructs.
But mysteriously, the HTTP responses are not consistent! (The MNIST model occasionally returns different "digits" from the same input image)
I'm quite confused here, and have no idea whether any random process happens during model inference.
Thanks for any help!
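(One common cause of this symptom, offered as a guess: the TorchScript model was exported while still in training mode, so Dropout/BatchNorm layers stay stochastic at inference time. A minimal sketch of exporting with deterministic inference behavior; the architecture and file name are placeholders:)
import torch
import torch.nn as nn

# toy stand-in for the trained MNIST network (placeholder architecture)
model = nn.Sequential(nn.Flatten(), nn.Dropout(p=0.5), nn.Linear(28 * 28, 10))
model.eval()  # freezes Dropout/BatchNorm; without this the scripted model stays stochastic
scripted = torch.jit.script(model)
scripted.save("serving_model.pt")  # assumed filename; register this file as the served model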
Hello,
I checked the pipeline example where you use a custom engine, but it is not very complete. What if I want to run normal pytorch inference without any engine?
Is it also possible to implement my own Rest API (e.g. Flask), or at least have more control over how I process my inferences? In your README.md, it says: Customizable RestAPI for serving (i.e. allow per model pre/post-processing for easy integration). How can I really customize the RestAPI?
Thanks!
Bruno
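(On the "without any engine" question above, a minimal sketch of a custom-engine preprocess file, assuming the load/process method names follow the pattern in the repo's examples/custom; the input key and model format are placeholders:)
from typing import Any, Optional

import torch


class Preprocess(object):
    def __init__(self):
        # called once at startup, not per request
        self._model = None

    def load(self, local_file_name: str) -> Optional[Any]:
        # receives the locally cached model file; load it however you like
        self._model = torch.jit.load(local_file_name)
        self._model.eval()
        return self._model

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # full control over inference: no Triton or other engine involved
        with torch.no_grad():
            tensor = torch.tensor(data["input"], dtype=torch.float32)  # "input" key is an assumption
            return {"output": self._model(tensor).tolist()}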
Currently, once published, the model status remains "published", and Triton will only use the latest "published" model, unloading the previous version.
But this unloading of the model does not align with the "published" state and can be confusing.
May I suggest expanding the function with an unpublish option, so we can explicitly unload model versions in Triton.
This would also allow multiple (published) versions of a model to be available in Triton.
I had this issue while running the very first command:
clearml-serving triton --project "serving" --name "serving example"
It fails with:
<class 'argparse.FileType'> is a FileType class object, instance of it must be passed
Hi there,
I have been working on deploying our inference pipeline on clearml-serving using the docker-compose approach. I've hashed out most of the issues thus far thanks to the community; now I am facing another issue while loading onnx models.
I am getting the following error:
clearml-serving-triton | mmdet | UNAVAILABLE: Internal: **failed to stat file /models/mmdet/1/model.onnx**
I exec'd into the container to see what's inside /models, and under /models/mmdet/1 there was a model.bin but no model.onnx. I created the model using OutputModel. I also tried doing it through the CLI:
clearml-serving --id $SERVING_ID model upload --name "mmdet_cli" --project $PROJECT_NAME --path /mmdet/model.onnx
but got the same result. I'm guessing that when the folder structure is set up, this file gets renamed to a .bin extension. Should this be happening, or am I doing something wrong?
When I download the file from the models section in the portal, it's an onnx file, exactly the one I uploaded. So I'm not sure where this renaming is happening, tbh...
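(Worth noting, judging from the configs shown earlier on this page: clearml-serving appears to store the model under a generic file name and point Triton at it via default_model_filename in config.pbtxt, so the .bin name may be expected rather than a bug. A hedged sketch of making that explicit through the aux-config, reusing the flag style shown above; the endpoint and model id are placeholders:)
clearml-serving --id $SERVING_ID model add --engine triton --endpoint "mmdet" --model-id "<model-id>" --preprocess "preprocess.py" --aux-config platform=\"onnxruntime_onnx\" default_model_filename=\"model.bin\"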
Hi there, today I struggled with the --variable-scalar argument and the buckets in the clearml-serving metrics add command.
I think the documentation could be improved.
I already got help in the ClearML Slack:
A scalar in buckets is simply a histogram. Because if you have 1000s of requests per second, it makes no sense to display every data point. So scalars can be divided into buckets and for each minute, for example, we can calculate how much % of total traffic fell in bucket 1, bucket 2, bucket 3, etc. Then we display this histogram as a single column in a heatmap. Y axis is the buckets, color is the value ~ % of traffic in that bucket, and X is time.
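(For illustration, a hedged example of the bucket syntax based on the examples in the repo; the endpoint name, variable names, and bucket edges are assumptions:)
clearml-serving --id <service-id> metrics add --endpoint test_model_pytorch --variable-scalar x0=0,0.1,0.5,1 y=0,0.25,0.5,0.75,1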