allegroai / clearml-serving Goto Github PK

View Code? Open in Web Editor NEW

132.0 11.0 39.0 1.96 MB

ClearML - Model-Serving Orchestration and Repository Solution

Home Page: https://clear.ml

License: Apache License 2.0

Python 96.64% Dockerfile 0.99% Shell 2.37%

machine-learning mlops devops deep-learning kubernetes ai clearml model-serving serving serving-pytorch-models

clearml-serving's People

Contributors

Stargazers

Watchers

Forkers

pollfly ecm200 h4dr1en suparnaj3 manojlds vksilpa guolsnetgap okyspace fawadahmed322 ainoam thepycoder acexoul081 gaspard-bruno digitty-forks aleksandar1932 andycancado codechem edenbuaa maxvanhoucke techthiyanes hafizurcse anhngml lexuanthinh ai-jie01 besrym anon-it alsawy45 amirhmk aidinmatt3r fulankun1412 toilaluan open-ml lac2io ilyahabr galthran-wq icc-o david101-hunter ilyamescheryakov1402 stephanbertl

clearml-serving's Issues

Removing model monitoring in endpoint

From the docs, i can see that there are commands to add model to an endpoint, and also add a model monitoring via the auto-update command. I can't seem to find any commands to remove the model monitoring. I can only do model removal

Is there no such capabilities for now, or is the doc just not updated?

Take model from clearml server via API

is there any way that I can take the model from clearml server to my machine through API so I can run it locally without download it? thank you

clearml.storage - ERROR - Google cloud driver not found

Trying to call model endpoint where the model stored on GCP bucket, getting the error:

clearml.storage - ERROR - Google cloud driver not found. Please install driver using: pip install "google-cloud-storage>=1.13.2"

After installing manually it works.

We use the k8s version - installed with helm

String not supported for Triotn

Have been working on model ensemble, continuing conversations from #53, just thought it may be a better idea to create new issues for different things that I find along the way. Essentially we want the output of the model be a S3 path where all the results are saved to as a JSON.

However it doesn't seem like clearml-serving is mapping object datatype properly? Triton does support strings..

The issue lies here I believe: np_to_triton_dtype. This currently maps an object to TYPE_BYTES to be written to the config.pbtxt file (which is not a valid type as per link above), whereas it should be TYPE_STRING.

Deploying Models from Azure Blob

Models which are located on the clearML servers (created by Task.init(..., output_uri=True) ) run perfectly while models which are located on azure blob storage produce different problems in different scenarios:

start the docker container, add a model from the clearML server and afterwards add a model located on azure (on the same endpoint) -> no error, http requests are answered properly (but probably the model which was added first is used)
start the docker container with no model added and first add a model from azure -> error: test_model_pytorch': failed to open text file for read /models/test_model_pytorch/config.pbtxt: No such file or directory .
start the docker container where a model from azure was already added before -> error:

clearml-serving-triton        | Error retrieving model ID ca186e8440b84049971a0b623df36783 []
clearml-serving-triton        | Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=60.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
clearml-serving-triton        | Traceback (most recent call last):
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 540, in <module>
clearml-serving-triton        |     main()
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 532, in main
clearml-serving-triton        |     helper.maintenance_daemon(
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 274, in maintenance_daemon
clearml-serving-triton        |     raise ValueError("triton-server process ended with error code {}".format(error_code))
clearml-serving-triton        | ValueError: triton-server process ended with error code 1

Side note: The same problem occurs hosting the containers on windows and on linux. All azure credentials are succesfully set up as envioronment variables in 'clearml-serving-inference', 'clearml-serving-triton' and 'clearml-serving-statistics' containers.

Behavior of published model

Apologies that I may not have understood well as there are limited documentation.

From the read me:
"Notice: If we re-run our keras training example and publish a new model in the repository, the engine will automatically update to the new model."

I tested, when i first run my training but did not publish. When i start triton, this version is still avail in triton for inference. Is this correct?

I also tried after start triton with version 1, i retrain the same model with same params. The triton polling indicates no change, thus did not pull the models over. Can I ask if this is the intended behavior?

Where to find logging for preprocessing Custom model

Trying to create a custom model using Ultralytics' YoloV8, I got this message while using Postman for testing my endpoint.

header:

body payload:

{
"imgString": "base64encodedImage"
}

The preprocess input would be like this:

def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        print(body)
        base64String = body.get("imgString")
        print(base64String)
        self._image = cv2.imdecode(np.frombuffer(base64.b64decode(base64String), np.uint8), cv2.IMREAD_COLOR)
        self._scalingH, self._scalingW = self._image.shape[0]/imgSize, self._image.shape[1]/imgSize
        data = cv2.resize(self._image, (imgSize, imgSize))
        return data

The process

def process(
            self,
            data: Any,
            state: dict,
            collect_custom_statistics_fn: Optional[Callable[[dict], None]],
    ) -> Any:  # noqa
        

        # this is where we do the heavy lifting, i.e. run our model.
        results = self._model.predict(data, imgsz = imgSize,
                                      conf = configModel["model-config"]["conf"], iou = configModel["model-config"]["iou"],
                                      save = configModel["model-config"]["save-mode"], save_conf = configModel["model-config"]["save-mode"],
                                      save_crop = configModel["model-config"]["save-mode"], save_txt = configModel["model-config"]["save-mode"],
                                      device = configModel["model-config"]["device-mode"])
        return results

and the postprocess like this.

def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        results = data
        classes = results[0].names

        imgDict = {}
        finalDict = {}
        dictDataEntity = {}
        for boxes in results[0].boxes:
            for box in boxes:
                labelNo = int(box.cls)

                x1 = int(box.xyxy[0][0]*self._scalingW)
                y1 = int(box.xyxy[0][1]*self._scalingH)
                x2 = int(box.xyxy[0][2]*self._scalingW)
                y2 = int(box.xyxy[0][3]*self._scalingH)

                tempCrop = self._image[y1:y2, x1:x2]

                imgDict.update({labelNo:tempCrop})

        orderedDict = OrderedDict(sorted(imgDict.items()))
        for key, value in orderedDict.items():
            for classKey, classValue in classes.items(): 
                if key == classKey:
                    finalDict[classValue] = value

        img_v_resize = hconcat_resize(finalDict.values(),imgDelimiter) #
        gray_imgResize = get_grayscale(img_v_resize) # call the grayscaling function
        success, encoded_image = cv2.imencode('.jpg', gray_imgResize) # save the image in memory
        BytesImage = encoded_image.tobytes()
        a = cv2.resize(img_v_resize, (960, 540))
        #cv2.imwrite("test.jpg", gray_imgResize)

        text_response = get_text_response_from_path(BytesImage)

        #========== POST PROCESSING ================#
        dataEntity = text_response[0].description.strip() # show only the description info from gvision
        a = [i.split("\n") for i in dataEntity.split('PEMISAH') if i]
        

        value = []
        value.clear()
        for i in a:
            c = [d for d in i if d]
            listToStr = ' '.join([str(elem) for elem in c])
            stripListToStr = listToStr.strip()
            value.append(stripListToStr)

        i = 0

        for entity in classes.values():
            dictDataEntity[entity] = value[i]
            i+=1
            if len(value) == i:
                break

        for label in classes.values():
            if label not in dictDataEntity.keys():
                dictDataEntity[label] = "-"

        return dict(predict=dictDataEntity.tolist())

the problem is I want to check the logging to find which codes having problems from that and I can't find where the log is for preprocessing. because I'm pretty sure the problem is one of my codes but I cant find from what line it is. or is there any way to write the log in the docker log or terminal. Thanks

Triton inference server docker container deployment fails due to ports conflict

I've been following the example on Keras, but using a PyTorch model.
I have setup a serving instance with the following command:

clearml-serving triton --project "Caltech Birds/Deployment" --name "ResNet34 Serving"

I then added the model endpoint and the model ID of the model to be served:

clearml-serving triton --endpoint "resent34_cub200" --model-id "57ed24c1011346d292ecc9e797ccb47e"

The model was trained using an experiment script which included the generation of a config.pbtxt configuration file at the time of completion of model training. This was connected to the experiment configuration as per the Keras example, and resulted in the following configuration being added to the experiment:

            platform: "pytorch_libtorch"
            input [
                {
                    name: "input_layer"
                    data_type: TYPE_FP32
                    dims: [ 3, 224, 224 ]
                }
            ]
            output [
                {
                    name: "fc"
                    data_type: TYPE_FP32
                    dims: [ 200 ]
                }
            ]

I then created a queue on a GPU compute node (as the model requires GPU resource):

clearml-agent daemon --queue default --gpus all --detached --docker

The serving endpoint is then started with the following command:

clearml-serving launch -queue default

I can see two items in my deployment sub-project, the service I created, and a triton serving engine inference object.

On execution, the triton serving engine inference fails with the following errors:

2021-06-08 16:28:49
task f2fbb3218e8243be9f6ab37badbb4856 pulled from 2c28e5db27e24f348e1ff06ba93e80c5 by worker ecm-clearml-compute-gpu-002:0
2021-06-08 16:28:49
Running Task f2fbb3218e8243be9f6ab37badbb4856 inside docker: nvcr.io/nvidia/tritonserver:21.03-py3 arguments: ['--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002']
2021-06-08 16:28:50
Executing: ['docker', 'run', '-t', '--gpus', 'all', '--ipc=host', '-p', '8000:8000', '-p', '8001:8001', '-p', '8002:8002', '-e', 'CLEARML_WORKER_ID=ecm-clearml-compute-gpu-002:0', '-e', 'CLEARML_DOCKER_IMAGE=nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002', '-v', '/tmp/.clearml_agent.ft8vulpe.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.j9b8arhf:/root/.ssh', '-v', '/home/edmorris/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/edmorris/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/edmorris/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/edmorris/.clearml/cache:/clearml_agent_cache', '-v', '/home/edmorris/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', 'nvcr.io/nvidia/tritonserver:21.03-py3', 'bash', '-c', 'apt-get update ; apt-get install -y git ; . /opt/conda/etc/profile.d/conda.sh ; conda activate base ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring  --id f2fbb3218e8243be9f6ab37badbb4856']
2021-06-08 16:28:55
docker: Error response from daemon: driver failed programming external connectivity on endpoint wonderful_galileo (0c2feca5684f2f71b11fa1e8da4550d42b23c456e52ba0069d0aae64cd75f55b): Error starting userland proxy: listen tcp4 0.0.0.0:8001: bind: address already in use.
2021-06-08 16:28:55
Process failed, exit code 125

This could be related to the parameters of the Triton docker container, which includes both ipc=host and specific port mapping '-p', '8000:8000'. This appears to be specified in the ServingService.launch_service() method from the ServingService Class from the clearml-serving package would appear to have both been specified as hard coded for the Triton docker container:

def launch_engine(self, queue_name, queue_id=None, verbose=True):
        # type: (Optional[str], Optional[str], bool) -> None
        """
        Launch serving engine on a specific queue
        :param queue_name: Queue name to launch the engine service running the inference on.
        :param queue_id: specify queue id (unique stand stable) instead of queue_name
        :param verbose: If True print progress to console
        """
        # todo: add more engines
        if self._engine_type == 'triton':
            # create the serving engine Task
            engine_task = Task.create(
                project_name=self._task.get_project_name(),
                task_name="triton serving engine",
                task_type=Task.TaskTypes.inference,
                repo="https://github.com/allegroai/clearml-serving.git",
                branch="main",
                commit="ad049c51c146e9b7852f87e2f040e97d88848a1f",
                script="clearml_serving/triton_helper.py",
                working_directory=".",
                docker="nvcr.io/nvidia/tritonserver:21.03-py3 --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002",
                argparse_args=[('serving_id', self._task.id), ],
                add_task_init_call=False,
            )
            if verbose:
                print('Launching engine {} on queue {}'.format(self._engine_type, queue_id or queue_name))
            engine_task.enqueue(task=engine_task, queue_name=queue_name, queue_id=queue_id)

Configure options for gRPC (Triton server)

Hello! I am trying to play around with the configs for gRPC for the triton server.

I’m using the docker-compose setup, so not sure if the CLI will work for my usecase (perhaps passing them as env variables would work?)

For instance, I’d like to set some variables like this:

('grpc.max_send_message_length', 512 * 1024 * 1024), ('grpc.max_receive_message_length', 512 * 1024 * 1024)]

Is this possible currently? I’m getting an error from gRPC that my payload is more than the limit (8MB instead of 4MB…)

Triton cannot find downloaded model

Following commit b5f5d72 , the fixes regarding the container arguments and the cloud service python SDK's were resolved, howeve the Triton server still cannot find the downloaded model from Azure Blob Store locally.

This is because the name of the file is inherited from the Azure filename, rather than the expected "model.pt" that Triton is looking for. The model is placed in the correct folder structure, just not the correct name.

I successfully resolved this in my fork, by placing the following at the end of the triton_model_service_update_step method of the ServingService class.

new_target_path = Path(os.path.join(target_path.parent),'model.pt')
shutil.move(target_path.as_posix(), new_target_path.as_posix())

Issue with pytorch preprocess code

Hi, I have encountered error stating that model was expecting input [1 28 28] but given [1 784] when trying out the pytorch example. I think it is due to the flatten() of the array before return by the preprocess method.

Can I also ask

How do we update the preprocess code to the same created endpoint using command line / codes?
When we create the endpoint with the preprocess code, the code preprocess.py is stored in the clearml server. Does the inference container periodically pull from clearml server or the clearml server will push to the inference container upon any update? May I know where to access this codes that manage this behavior to better understand what's going behind this?

Thanks.

Triton Engine did not auto update to new model after retraining

I saw this line in the readme:
"Notice: If we re-run our keras training example and publish a new model in the repository, the engine will automatically update to the new model."

I have a created a serving service. However, when i retrain my model with same project and task name, and publish the model once training is done, the model version deployed in triton is not updated.

May I know if this is a bug or did I misunderstand some steps?

Error triton helper file

Hello,
I'm trying clearml serving with nvidia triton. I'm having trouble with the above error which is FileNotFoundError: [WinError 2] The system cannot find the file specified. Can someone help me?

Set Triton version

Hi, currently when I use clearml serving, when deploying serving triton service, it is always version 21.03. Is there a way that I can config or set to 21.05? I need some feature from 21.05.

Endpoint authorization with API key

Hello,

I have a multi-tenant application and I would like to control who have access to each endpoint with API Keys. That is still a bit unclear for me. How can I authorize users before they consume some endpoint?

This question also extends to serving engines in general, like TorchServe. How people are normally controlling access to the inference APIs?

Thanks,
Bruno

Triton inference server fails to load checkpointed PyTorch Ignite model

The Triton server is now able to find the local copy of the model weight pt file and attempts to serve it, following fixes in #3.

The following error occurs when the model is served by the Triton Inference server:

Starting Task Execution:

clearml-serving - Nvidia Triton Engine Helper
ClearML results page: https://clearml-server.westeurope.cloudapp.azure.com/projects/779be4f4d83541d786eb839bb062fa93/experiments/364c73e36a454842a314169d78514034/output/log
String Triton Helper service
{'serving_id': 'b978817fa0544b94b2015b420a96f14c', 'project': 'serving', 'name': 'nvidia-triton', 'update_frequency': 10, 'metric_frequency': 1, 't_http_port': None, 't_http_thread_count': None, 't_allow_grpc': None, 't_grpc_port': None, 't_grpc_infer_allocation_pool_size': None, 't_pinned_memory_pool_byte_size': None, 't_cuda_memory_pool_byte_size': None, 't_min_supported_compute_capability': None, 't_buffer_manager_thread_count': None}

Updating local model folder: /models
[INFO]:: URL: cub200_resnet34 Endpoint: ServingService.EndPoint(serving_url='cub200_resnet34', model_ids=['57ed24c1011346d292ecc9e797ccb47e'], model_project=None, model_name=None, model_tags=None, model_config_blob='\n            platform: "pytorch_libtorch"\n            input [\n                {\n                    name: "input_layer"\n                    data_type: TYPE_FP32\n                    dims: [ 3, 224, 224 ]\n                }\n            ]\n            output [\n                {\n                    name: "fc"\n                    data_type: TYPE_FP32\n                    dims: [ 200 ]\n                }\n            ]\n        ', max_num_revisions=None, versions=OrderedDict())
[INFO]:: Model ID: 57ed24c1011346d292ecc9e797ccb47e Version: 1
[INFO]:: Model ID: 57ed24c1011346d292ecc9e797ccb47e Model URL: azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,447 - clearml.storage - INFO - Downloading: 5.00MB / 81.72MB &#64; 18.80MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,730 - clearml.storage - INFO - Downloading: 13.00MB / 81.72MB &#64; 28.29MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,741 - clearml.storage - INFO - Downloading: 21.00MB / 81.72MB &#64; 684.91MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,760 - clearml.storage - INFO - Downloading: 29.00MB / 81.72MB &#64; 426.19MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,791 - clearml.storage - INFO - Downloading: 37.00MB / 81.72MB &#64; 258.86MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,806 - clearml.storage - INFO - Downloading: 45.00MB / 81.72MB &#64; 535.17MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,907 - clearml.storage - INFO - Downloading: 53.00MB / 81.72MB &#64; 79.03MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,963 - clearml.storage - INFO - Downloading: 61.72MB / 81.72MB &#64; 155.64MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,968 - clearml.storage - INFO - Downloading: 69.72MB / 81.72MB &#64; 1502.19MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,979 - clearml.storage - INFO - Downloading: 77.72MB / 81.72MB &#64; 790.76MBs from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt
2021-06-10 15:20:54,985 - clearml.storage - INFO - Downloaded 81.72 MB successfully from azure://clearmllibrary/artefacts/Caltech Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt , saved to /clearml_agent_cache/storage_manager/global/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
[INFO] Local path to the model: /clearml_agent_cache/storage_manager/global/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
Update model v1 in /models/cub200_resnet34/1
[INFO] Target Path:: /models/cub200_resnet34/1/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
[INFO] Local Path:: /clearml_agent_cache/storage_manager/global/e38f6052e6b887337635fc2821a6b5d4.cub200_resnet34_ignite_best_model_0.pt
[INFO] New Target Path:: /models/cub200_resnet34/1/model.pt
Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=600.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
I0610 15:20:55.182775 671 metrics.cc:221] Collecting metrics for GPU 0: Tesla P40
I0610 15:20:55.498654 671 libtorch.cc:940] TRITONBACKEND_Initialize: pytorch
I0610 15:20:55.498688 671 libtorch.cc:950] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.498699 671 libtorch.cc:956] 'pytorch' TRITONBACKEND API version: 1.0
2021-06-10 15:20:55.688775: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0610 15:20:55.729429 671 tensorflow.cc:1880] TRITONBACKEND_Initialize: tensorflow
I0610 15:20:55.729458 671 tensorflow.cc:1890] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.729464 671 tensorflow.cc:1896] 'tensorflow' TRITONBACKEND API version: 1.0
I0610 15:20:55.729473 671 tensorflow.cc:1920] backend configuration:
{}
I0610 15:20:55.731061 671 onnxruntime.cc:1728] TRITONBACKEND_Initialize: onnxruntime
I0610 15:20:55.731085 671 onnxruntime.cc:1738] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.731095 671 onnxruntime.cc:1744] 'onnxruntime' TRITONBACKEND API version: 1.0
I0610 15:20:55.756821 671 openvino.cc:1166] TRITONBACKEND_Initialize: openvino
I0610 15:20:55.756848 671 openvino.cc:1176] Triton TRITONBACKEND API version: 1.0
I0610 15:20:55.756854 671 openvino.cc:1182] 'openvino' TRITONBACKEND API version: 1.0
I0610 15:20:56.081773 671 pinned_memory_manager.cc:205] Pinned memory pool is created at '0x7f229c000000' with size 268435456
I0610 15:20:56.082099 671 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 67108864
I0610 15:20:56.083854 671 model_repository_manager.cc:1065] loading: cub200_resnet34:1
I0610 15:20:56.184287 671 libtorch.cc:989] TRITONBACKEND_ModelInitialize: cub200_resnet34 (version 1)
I0610 15:20:56.185272 671 libtorch.cc:1030] TRITONBACKEND_ModelInstanceInitialize: cub200_resnet34 (device 0)

1623338462128 ecm-clearml-compute-gpu-002:gpuall DEBUG I0610 15:20:59.633139 671 libtorch.cc:1063] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0610 15:20:59.633184 671 libtorch.cc:1012] TRITONBACKEND_ModelFinalize: delete model state
E0610 15:20:59.633206 671 model_repository_manager.cc:1242] failed to load 'cub200_resnet34' version 1: Internal: failed to load model 'cub200_resnet34': [enforce fail at inline_container.cc:227] . file not found: archive/constants.pkl
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x68 (0x7f23c6279498 in /opt/tritonserver/backends/pytorch/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::getRecordID(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xda (0x7f23a1a23d4a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #2: caffe2::serialize::PyTorchStreamReader::getRecord(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x38 (0x7f23a1a23da8 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #3: torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&) + 0xab (0x7f23a323508b in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #4: <unknown function> + 0x3c035e5 (0x7f23a32355e5 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #5: <unknown function> + 0x3c05fd0 (0x7f23a3237fd0 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #6: torch::jit::load(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x1ab (0x7f23a32391eb in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #7: torch::jit::load(std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0xc2 (0x7f23a323b332 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #8: torch::jit::load(std::istream&, c10::optional<c10::Device>) + 0x6a (0x7f23a323b41a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #9: <unknown function> + 0x104a6 (0x7f23c67d44a6 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #10: <unknown function> + 0x12ac4 (0x7f23c67d6ac4 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #11: <unknown function> + 0x13772 (0x7f23c67d7772 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #12: TRITONBACKEND_ModelInstanceInitialize + 0x374 (0x7f23c67d7b34 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #13: <unknown function> + 0x2f8a99 (0x7f24104a8a99 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #14: <unknown function> + 0x2f927c (0x7f24104a927c in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #15: <unknown function> + 0x2f77ec (0x7f24104a77ec in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #16: <unknown function> + 0x183c00 (0x7f2410333c00 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #17: <unknown function> + 0x191581 (0x7f2410341581 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #18: <unknown function> + 0xd6d84 (0x7f240fcead84 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #19: <unknown function> + 0x9609 (0x7f2410185609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #20: clone + 0x43 (0x7f240f9d8293 in /lib/x86_64-linux-gnu/libc.so.6)

I0610 15:20:59.633540 671 server.cc:500] 
+-----------------...</char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char></char>

Originally posted by @ecm200 in #3 (comment)

Incorrect shape size PyTorch

Was following the tutorial for PyTorch, was able to create an endpoint successfully, but wasn't able to get an inference result (using both ways) due to a shape mismatch error.

unexpected shape for input 'INPUT__0' for model 'test_model_pytorch'. Expected [1,28,28], got [1,784]

I then tried setting the INPUT__0 shape to 1 784 but that didn't work either. Then realized that the preprocess function in preprocess.py flattens the data before returning it which was causing the error. Removing the flatten() resolved my issue.

This also seems to be the case for the keras example.

[ Setup/examples ] Initial Installation Issues - docker compose errors

Hello clearml team,
Congrats on the release of clearml-serving V2 🎉

I really wanted to check it out, and I'm having difficulties running the basic setup and scikit-learn example commands on my side.
I want to run the Installation and the Toy model (scikit learn) deployment example

I have a self-hosted clearml Server built with the helm chart on Kubernetes.

The environment variables of clearml-serving/docker/docker-compose.yml where defined in the myexemple.env file, and starts like this :

CLEARML_WEB_HOST="<http://localhost:8080/>"
CLEARML_API_HOST="<http://localhost:8008/>"
CLEARML_FILES_HOST="<http://localhost:8081/>"

Upon running docker-compose , both clearml-serving-inference and clearml-serving-statistics return errors:

Retrying (Retry(total=236, connect=236, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4065110310>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login

I think the issue comes from the communication with the Kafka service, but I do not know how to solve this.
Has anyone encountered this issue and solved it before, since it's the default installation on the doc ?

Haven't found any related issues on any of the GitHub repos
Thanks for the help 🤖

Serving autoscaling strategy

Hello, ClearML team!

I'm trying to understand how serving auto-scaling works.

From readme:

Scalable
Multi model per container
Multi models per serving service
Multi-service support (fully seperated multiple serving service running independently)
Multi cluster support
Out-of-the-box node auto-scaling based on load/usage <---- *

I found that serving has ability to auto-scale, but in helm charts (triton for example) I only found replicas: 1 and didn't find an auto-scale implementation anywhere (like this https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/, for example).

Could you please clarify Clearml-serving scaling strategy and where can I find configuration files?

Thanks in advance.

clearml-serving not working with newer numpy 1.24

I am unable to use clearml-serving for model deployment on my setup.

OS: Ubuntu 22.04 Server LTS
Python: 3.10.6

Steps:

pip install clearml-serving
clearml-serving create --name "serving example"

I get the following error:

Traceback (most recent call last):
  File "/home/user_65s/.local/bin/clearml-serving", line 5, in <module>
    from clearml_serving.__main__ import main
  File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/__main__.py", line 9, in <module>
    from clearml_serving.serving.model_request_processor import ModelRequestProcessor, CanaryEP
  File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/serving/model_request_processor.py", line 18, in <module>
    from .preprocess_service import BasePreprocessRequest
  File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/serving/preprocess_service.py", line 247, in <module>
    class TritonPreprocessRequest(BasePreprocessRequest):
  File "/home/user_65s/.local/lib/python3.10/site-packages/clearml_serving/serving/preprocess_service.py", line 253, in TritonPreprocessRequest
    np.int: 'int_contents',
  File "/home/user_65s/.local/lib/python3.10/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'int'. Did you mean: 'inf'?

It appears you are using np.int internally which has been deprecated since numpy 1.20:

1: DeprecationWarning: np.int is a deprecated alias for the builtin int. To silence this warning, use int by itself. Doing this will not modify any behavior and is safe. When replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

When I downgrade to numpy==1.23.* it works.

Shared functions across preprocess file

Hello. I have some helper functions that are shared acros the preprocess.py files, so I'd like to refactor them. However, I'm not sure where I can put them, and how to import them? The pythonpath seems to be /root/clearml, but I can't find any of the files when I start browing there inside the inference Docker container.

Any insights?

ClearML serving design v2

ClearML serving design document v2.0

Goal: Create a simple interface to serve multiple models with scalable serving engines on top of Kubernetes

Design Diagram (edit here)

Features

Fully continuous model upgrade/configuration capabilities
Separate pre/post processing from model inference (serving engine)
Support custom python script per endpoint (pre/post processing)
Support multiple model inference serving engine instances
Support A/B/Canary testing per Endpoint (i.e. test new versions of the model with probability distribution)
Support model monitoring functions
Support for 3rd party monitoring plugins
Abstract Serving Engine interface
REST API with serving engine
gRPC interface between pre-processing python code and model inference
- More efficient encoding than json encode/decode (both compute and network)
Performance (i.e. global latency / throughput and model inference latency) logging
- Optional custom metric reporting
Standalone setup for debugging
- Pre-process (proxy) code (running on host machine) (launching the “Model Inference”)
- Model inference (serving engine) inside local container
Deployment support for Kubernetes
- Proxy container (with pre-processing code) has kubectl control
- Serving engine container (model inference) launched by the proxy container
Autoscaling inference model engines based on latency

Modules

ClearML serving container
- Singleton instance, acting as the proxy & load balancer
ClearML serving Task
- Stores configuration of a single instance of the Serving container
  - 3rd party plugins
  - Kubernetes config
  - Serving Engine configuration
  - Models / Endpoints
Serving Engine
- Standalone container interacting with the ClearML serving instance
- ClearML Sidecar configuring the Serving Engine (real-time) & sending reports back
ClearML model repository
- Unique ID per model
- Links to model files
- Links to modler pre/post processing code base (git)
- Supports Tags / Name
- General purpose key/value meta-data
- Queryable
Configuration CLI
- Build containers
- Configure serving system

Usage Example

CLI configuring the ClearML serving Task
- Select initial set of models / endpoints (i.e. endpoint for specific model)
- Set Kubernetes pod template YAML
  - Job YAML to be used for launching the serving engine container
CLI build Kubernetes Job YAML
- Build the Kubernetes Job YAML to be used to launch the ClearML serving container
- Add necessary credentials making sure the “ClearML serving container” will be able to launch serving containers
Kubectl launching the “ClearML serving container”
- The “ClearML serving container” will be launching the serving engine containers
Once “ClearML serving container” is up, logs are monitored in the ClearML UI
Add additional models to a running “ClearML serving container”
- Provide the “ClearML serving Task”
- Add/Remove new model UID

Inconsistent argument syntax in clearml-serving client

Just noticed that the output type argument have a different syntax depending on which clearml-serving model command is run:

clearml-serving --id xxxxxxxxx model auto-update [...] --output-type float32

Returns an error:

clearml-serving: error: unrecognized arguments: --output-type float32

but it works with --output_type.

If you run the command clearml-serving model add, is the other way around: the argument --output_type throws an error, --output-type works just fine.

Multiple TensorRT handling(plan file per gpu)

I have a question regarding the use of multiple TensorRT engines and how ClearML addresses this issue. As you may know, TensorRT plan files need to be optimized based on the compute capability of each GPU. Consequently, each GPU requires a distinct plan file. Triton addresses this by introducing a variable named cc_model_filenames in config.pbtxt, where we specify which model will be used for each GPU, based on the compute capability. However, in ClearML, and specifically within triton_helper.py, it seems that any plan file is renamed to model.plan. This approach appears to be problematic in cases where different GPUs are used. For example, in my configuration, I have:

model-repository
       | -------- Resnet50
                      | -------- config.pbtxt
                      | -------- 1
                                 | -------- resnet50_T4.plan
                                 | -------- resnet50_A100.plan

And my config.pbtxt looks like this:

cc_model_filenames [
  {
    key: "7.5"
    value: "resnet50_T4.plan"
  },
  {
    key: "8.0"
    value: "resnet50_A100.plan"
  }
]

Given the code written in triton_helper.py, is it possible to manage multiple models?

Unable to canonicalize address from Kafka

Getting the following error when trying to deploy to ECS using docker-compose in the Kafka service:

Unable to canonicalize address clearml-serving-zookeeper:2181 because it's not resolvable

Wondering, why are the ports commented out in the docker-compose file?

The zookeeper service seemed to be up and running on the ECS console.

Thanks!

Could not download model in triton container

Hello!

I use ClearML free (the one without configuration vault stuff) + clearml-serving module

When I spinned docker-compose and tried to pull model from our s3, I've got an error in tritonserver container:

2024-03-13 11:26:56,913 - clearml.storage - WARNING - Failed getting object size: ClientError('An error occurred (403) when calling the HeadObject operation: Forbidden')
2024-03-13 14:26:57
2024-03-13 11:26:57,042 - clearml.storage - ERROR - Could not download s3://<BUCKET>/<FOLDER>/<PROJECT>/<TASK_NAME>.75654091e56141199c9d9594305d6872/models/model_package.zip , err: An error occurred (403) when calling the HeadObject operation: Forbidden

But I've set env variables in example.env (AWS_ ones too) and I could find them in tritonserver container via

$ env | grep CLEARML
$ env | grep AWS

FILES

docker-compose-triton-gpu.yaml

version: "3"

services:
  zookeeper:
    image: bitnami/zookeeper:3.7.0
    container_name: clearml-serving-zookeeper
    # ports:
      # - "2181:2181"
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    networks:
      - clearml-serving-backend

  kafka:
    image: bitnami/kafka:3.1.1
    container_name: clearml-serving-kafka
    # ports:
      # - "9092:9092"
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_CFG_LISTENERS=PLAINTEXT://clearml-serving-kafka:9092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://clearml-serving-kafka:9092
      - KAFKA_CFG_ZOOKEEPER_CONNECT=clearml-serving-zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_CREATE_TOPICS="topic_test:1:1"
    depends_on:
      - zookeeper
    networks:
      - clearml-serving-backend

  prometheus:
    image: prom/prometheus:v2.34.0
    container_name: clearml-serving-prometheus
    volumes:
      - ./prometheus.yml:/prometheus.yml
    command:
      - '--config.file=/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    # ports:
      # - "9090:9090"
    depends_on:
      - clearml-serving-statistics
    networks:
      - clearml-serving-backend

  alertmanager:
    image: prom/alertmanager:v0.23.0
    container_name: clearml-serving-alertmanager
    restart: unless-stopped
    # ports:
      # - "9093:9093"
    depends_on:
      - prometheus
      - grafana
    networks:
      - clearml-serving-backend

  grafana:
    image: grafana/grafana:8.4.4-ubuntu
    container_name: clearml-serving-grafana
    volumes:
      - './datasource.yml:/etc/grafana/provisioning/datasources/datasource.yaml'
    restart: unless-stopped
    ports:
      - "3001:3000"
    depends_on:
      - prometheus
    networks:
      - clearml-serving-backend


  clearml-serving-inference:
    image: allegroai/clearml-serving-inference:1.3.1-vllm
    build:
      context: ../
      dockerfile: clearml_serving/serving/Dockerfile
    container_name: clearml-serving-inference
    restart: unless-stopped
    # optimize perforamnce
    security_opt:
      - seccomp:unconfined
    ports:
      - "8080:8080"
    environment:
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
      CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
      CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
      CLEARML_SERVING_PORT: ${CLEARML_SERVING_PORT:-8080}
      CLEARML_SERVING_POLL_FREQ: ${CLEARML_SERVING_POLL_FREQ:-1.0}
      CLEARML_DEFAULT_BASE_SERVE_URL: ${CLEARML_DEFAULT_BASE_SERVE_URL:-http://127.0.0.1:8080/serve}
      CLEARML_DEFAULT_KAFKA_SERVE_URL: ${CLEARML_DEFAULT_KAFKA_SERVE_URL:-clearml-serving-kafka:9092}
      CLEARML_DEFAULT_TRITON_GRPC_ADDR: ${CLEARML_DEFAULT_TRITON_GRPC_ADDR:-clearml-serving-triton:8001}
      CLEARML_USE_GUNICORN: ${CLEARML_USE_GUNICORN:-}
      CLEARML_SERVING_NUM_PROCESS: ${CLEARML_SERVING_NUM_PROCESS:-}
      CLEARML_EXTRA_PYTHON_PACKAGES: ${CLEARML_EXTRA_PYTHON_PACKAGES:-}
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-}
      AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-}
      GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-}
      AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-}
      AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-}
    depends_on:
      - kafka
      - clearml-serving-triton
    networks:
      - clearml-serving-backend

  clearml-serving-triton:
    image: allegroai/clearml-serving-triton:1.3.1-vllm
    build:
      context: ../
      dockerfile: clearml_serving/engines/triton/Dockerfile.vllm
    container_name: clearml-serving-triton
    restart: unless-stopped
    # optimize perforamnce
    security_opt:
      - seccomp:unconfined
    # ports:
      # - "8001:8001"
    environment:
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
      CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
      CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
      CLEARML_TRITON_POLL_FREQ: ${CLEARML_TRITON_POLL_FREQ:-1.0}
      CLEARML_TRITON_METRIC_FREQ: ${CLEARML_TRITON_METRIC_FREQ:-1.0}
      CLEARML_EXTRA_PYTHON_PACKAGES: ${CLEARML_EXTRA_PYTHON_PACKAGES:-}      
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-}
      AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-}
      GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-}
      AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-}
      AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-}
    depends_on:
      - kafka
    networks:
      - clearml-serving-backend
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: [gpu]

  clearml-serving-statistics:
    image: allegroai/clearml-serving-statistics:latest
    container_name: clearml-serving-statistics
    restart: unless-stopped
    # optimize perforamnce
    security_opt:
      - seccomp:unconfined
    # ports:
      # - "9999:9999"
    environment:
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
      CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
      CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
      CLEARML_DEFAULT_KAFKA_SERVE_URL: ${CLEARML_DEFAULT_KAFKA_SERVE_URL:-clearml-serving-kafka:9092}
      CLEARML_SERVING_POLL_FREQ: ${CLEARML_SERVING_POLL_FREQ:-1.0}
    depends_on:
      - kafka
    networks:
      - clearml-serving-backend


networks:
  clearml-serving-backend:
    driver: bridge

example.env

CLEARML_WEB_HOST="[REDACTED]"
CLEARML_API_HOST="[REDACTED]"
CLEARML_FILES_HOST="s3://[REDACTED]"
CLEARML_API_ACCESS_KEY="<access_key_here>"
CLEARML_API_SECRET_KEY="<secret_key_here>"
CLEARML_SERVING_TASK_ID="<serving_service_id_here>"
CLEARML_EXTRA_PYTHON_PACKAGES="boto3"
AWS_ACCESS_KEY_ID="[REDACTED]"
AWS_SECRET_ACCESS_KEY="[REDACTED]"
AWS_DEFAULT_REGION="[REDACTED]"

Dockerfile.vllm:

FROM nvcr.io/nvidia/tritonserver:24.02-vllm-python-py3


ENV LC_ALL=C.UTF-8

COPY clearml_serving /root/clearml/clearml_serving
COPY requirements.txt /root/clearml/requirements.txt
COPY README.md /root/clearml/README.md
COPY setup.py /root/clearml/setup.py

RUN python3 -m pip install --no-cache-dir -r /root/clearml/clearml_serving/engines/triton/requirements.txt
RUN python3 -m pip install --no-cache-dir -U pip -e /root/clearml/

# default serving port
EXPOSE 8001

# environement variable to load Task from CLEARML_SERVING_TASK_ID, CLEARML_SERVING_PORT

WORKDIR /root/clearml/
ENTRYPOINT ["clearml_serving/engines/triton/entrypoint.sh"]

Triton docker container failed to start due to unknown error

The inference task was successfully created after launching the serving services (clearml-serving launch --queue default).

However, it seems that the nvdia container failed to start with the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

Using a local worker without GPU attached to the public hosting clearml server.

serving stuck because of deleted model

My clearml serving deployment is stuck.

No models are registered,

clearml-serving --id 7303713271b941f7a0b45760d45208dd model list
clearml-serving - CLI for launching ClearML serving engine
List model serving and endpoints, control task id=7303713271b941f7a0b45760d45208dd
Info: syncing model endpoint configuration, state hash=d3290336c62c7fb0bc8eb4046b60bc7f
Endpoints:
{}
Model Monitoring:
{}
Canary:
{}

However, old models are still somehow there:

serving-task:

There is a leftover model that I am unable to remove:

Triton-Task:

2023-11-20 16:18:40
ClearML Task: created new task id=9b3460b62f9d4015890c7dd2c0064bcf
2023-11-20 15:18:40,452 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page: http://clearml-webserver:8080/projects/9b4bbac7f1c248e894793f5771005826/experiments/9b3460b62f9d4015890c7dd2c0064bcf/output/log
2023-11-20 16:18:40
configuration args: Namespace(inference_task_id=None, metric_frequency=1.0, name='triton engine', project=None, serving_id='7303713271b941f7a0b45760d45208dd', t_allow_grpc=None, t_buffer_manager_thread_count=None, t_cuda_memory_pool_byte_size=None, t_grpc_infer_allocation_pool_size=None, t_grpc_port=None, t_http_port=None, t_http_thread_count=None, t_log_verbose=None, t_min_supported_compute_capability=None, t_pinned_memory_pool_byte_size=None, update_frequency=1.0)
String Triton Helper service
{'serving_id': '7303713271b941f7a0b45760d45208dd', 'project': None, 'name': 'triton engine', 'update_frequency': 1.0, 'metric_frequency': 1.0, 'inference_task_id': None, 't_http_port': None, 't_http_thread_count': None, 't_allow_grpc': None, 't_grpc_port': None, 't_grpc_infer_allocation_pool_size': None, 't_pinned_memory_pool_byte_size': None, 't_cuda_memory_pool_byte_size': None, 't_min_supported_compute_capability': None, 't_buffer_manager_thread_count': None, 't_log_verbose': None}
Updating local model folder: /models
2023-11-20 15:18:41,106 - clearml.Model - ERROR - Action failed <400/201: models.get_by_id/v1.0 (Invalid model id (no such public or company model): id=0bbba86c98c54610a14350ba69e2e330, company=d1bd92a3b039400cbafc60a7a5b1e52b)> (model=0bbba86c98c54610a14350ba69e2e330)
2023-11-20 15:18:41,107 - clearml.Model - ERROR - Failed reloading task 0bbba86c98c54610a14350ba69e2e330
2023-11-20 15:18:41,115 - clearml.Model - ERROR - Action failed <400/201: models.get_by_id/v1.0 (Invalid model id (no such public or company model): id=0bbba86c98c54610a14350ba69e2e330, company=d1bd92a3b039400cbafc60a7a5b1e52b)> (model=0bbba86c98c54610a14350ba69e2e330)
2023-11-20 15:18:41,115 - clearml.Model - ERROR - Failed reloading task 0bbba86c98c54610a14350ba69e2e330
2023-11-20 16:18:41
Traceback (most recent call last):
  File "clearml_serving/engines/triton/triton_helper.py", line 540, in <module>
    main()
  File "clearml_serving/engines/triton/triton_helper.py", line 532, in main
    helper.maintenance_daemon(
  File "clearml_serving/engines/triton/triton_helper.py", line 237, in maintenance_daemon
    self.model_service_update_step(model_repository_folder=local_model_repo, verbose=True)
  File "clearml_serving/engines/triton/triton_helper.py", line 146, in model_service_update_step
    print("Error retrieving model ID {} []".format(model_id, model.url if model else ''))
  File "/usr/local/lib/python3.8/dist-packages/clearml/model.py", line 341, in url
    return self._get_base_model().uri
  File "/usr/local/lib/python3.8/dist-packages/clearml/backend_interface/model.py", line 496, in uri
    return self.data.uri
AttributeError: 'NoneType' object has no attribute 'uri'

How can a broken task be fixed without deploying a new serving instance?

Docker image not up-to-date

Hey there,

I just tried launching a new serving instance as our demands are growing. A few months ago I commited a change that resolved a missing await allowing us to override the process() method.

However, it seems like when pulling the latest docker image, this change is not reflected as no new image is pushed to docker hub. I'm not sure how often you release, but it seems like there are other changes which may not be reflected on the images. Could you please elaborate? Should I just create my own image from the updates source code...?

docker-compose-triton-gpu fails due to "PTX was compiled with an unsupported toolchain"

Describe the bug
After following the docker-compose-triton-gpu.yml instructions for the pytorch example the server fails to spin up. The service fails due to the following error:

model_repository_manager.cc:1152] failed to load 'test_model_pytorch' version 1: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain.

To Reproduce
Steps to reproduce the behavior:

Run the pytorch example in

clearml-serving/examples/pytorch/readme.md

Line 1 in 827905c

# Train and Deploy Keras model with Nvidia Triton Engine

Expected behavior
The service spins up without the model_repository_manager.cc:1152 error message.

Screenshots
n/a

Desktop (please complete the following information):

OS: Ubuntu 20.04.1
Virtualization version: (docker --version & docker-compose --version)
docker --version & docker-compose --version [1] 1611412 Docker version 20.10.16, build aa7e414 docker-compose version 1.29.2, build 5becea4c [1]+ Done docker --version

Additional context
See similar issue here: triton-inference-server/server#3877

AttributeError: module 'numpy' has no attribute 'int'

I am trying to install clearml serving on python 3.9.
The problem seems to be related to new releases of numpy.

Here is the full stack trace:

clearml-serving create --name "serving example"

Traceback (most recent call last):
  File "/Users/galleon/.pyenv/versions/maio-serving/bin/clearml-serving", line 5, in <module>
    from clearml_serving.__main__ import main
  File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/__main__.py", line 9, in <module>
    from clearml_serving.serving.model_request_processor import ModelRequestProcessor, CanaryEP
  File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/serving/model_request_processor.py", line 18, in <module>
    from .preprocess_service import BasePreprocessRequest
  File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/serving/preprocess_service.py", line 247, in <module>
    class TritonPreprocessRequest(BasePreprocessRequest):
  File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/clearml_serving/serving/preprocess_service.py", line 253, in TritonPreprocessRequest
    np.int: 'int_contents',
  File "/Users/galleon/.pyenv/versions/3.9.16/envs/maio-serving/lib/python3.9/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Installed packages:

pip list
Package            Version
------------------ -----------
attrs              22.2.0
certifi            2022.12.7
charset-normalizer 3.1.0
clearml            1.9.3
clearml-serving    1.2.0
furl               2.1.3
idna               3.4
jsonschema         4.17.3
numpy              1.24.2
orderedmultidict   1.0.1
pathlib2           2.3.7.post1
Pillow             9.4.0
pip                23.0.1
psutil             5.9.4
PyJWT              2.4.0
pyparsing          3.0.9
pyrsistent         0.19.3
python-dateutil    2.8.2
PyYAML             6.0
requests           2.28.2
setuptools         58.1.0
six                1.16.0
urllib3            1.26.15

AWS S3 storage driver (boto3) not found

Hello! I am trying to use clearml-serving to serve my PyTorch pretrained model.
I deploy ClearML Server and using S3 Minio on local network to store artifacts and pretrained weights.

There is no problem with storing and getting models using Input\Output Models. Everything works correctly.
But clearml-serving (particularly the clearml-serving-triton container) don't have opportunity to work with Minio as it has not python module boto3

Using tutorial I add S3 credentials to example.env:

CLEARML_WEB_HOST=http://192.168.3.217:8080
CLEARML_API_HOST=http://192.168.3.217:8008
CLEARML_FILES_HOST=http://192.168.3.217:8081
CLEARML_API_ACCESS_KEY=CLEARML_API_ACCESS_KEY
CLEARML_API_SECRET_KEY=CLEARML_API_SECRET_KEY
CLEARML_SERVING_TASK_ID="ccfed15e442242a19338c20772562df2"
AWS_ACCESS_KEY_ID=AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY=AWS_SECRET_ACCESS_KEY

After that it doesn't work as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY not be sent to docker-compose. I add those variables to clearml-serving-triton container in the way to solve it:

  clearml-serving-triton:
    image: allegroai/clearml-serving-triton:latest
    container_name: clearml-serving-triton
    restart: unless-stopped
    # optimize perforamnce
    security_opt:
      - seccomp:unconfined
    # ports:
      # - "8001:8001"
    environment:
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-https://app.clear.ml}
      CLEARML_API_HOST: ${CLEARML_API_HOST:-https://api.clear.ml}
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-https://files.clear.ml}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY}
      CLEARML_SERVING_TASK_ID: ${CLEARML_SERVING_TASK_ID:-}
      CLEARML_TRITON_POLL_FREQ: ${CLEARML_TRITON_POLL_FREQ:-1.0}
      CLEARML_TRITON_METRIC_FREQ: ${CLEARML_TRITON_METRIC_FREQ:-1.0}
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-ACCES_KEY}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-SECRET_ACCESS_KEY}

But after that there is an error in this container:

clearml-serving-triton        | 2022-12-15 05:05:45,607 - clearml.storage - ERROR - AWS S3 storage driver (boto3) not found. Please install driver using: pip install "boto3>=1.9"

I guess that it can be fixed adding "boto3>=1.9" to the container requirements.txt here:
https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/engines/triton/requirements.txt

After doing this and building using local docker image I get the following error:

clearml-serving-triton        | 2022-12-15 05:10:54,624 - clearml.storage - ERROR - Could not download s3://192.168.3.217:9000/models/test/RegNet.b04da49b696a472b94677e26762078d1/models/regnet_y_400MF.pt , err: SSL validation failed for https://192.168.3.217:9000/models/test/RegNet.b04da49b696a472b94677e26762078d1/models/regnet_y_400MF.pt [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1131)

And I don't have any ideas how to disable secure connection in this container

torchserve support?

Hello, I see TorchServe engine support mentioned in the Readme but cannot find any way to actually use it. Is it available?

Serving Scikit-Learn models

I couldn't find any backends and configurations that support Scikit-Learn models (eg. pickle format).

As Clearml is having integration with Scikit-Learn, there should be some options to serve the model.

Please add a workaround to support it.

Incorrect requests version specifier in requirements.txt

Issue Description

Hi!
In the requirements.txt file, the requests version specifier is currently set to >=2.31.0,<2.29.0, which seems to be a mistake. And should be >=2.29.0,<2.31.0.
https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving/requirements.txt

Request for unknown model: 'test_model_pytorch' version 1 is not at ready state

https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch

I'm running examples as per readme.md, but I get the following error.

What should I do?

{"detail":"Error processing request: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"Request for unknown model: 'test_model_pytorch' version 1 is not at ready state\"\n\tdebug_error_string = \"{\"created\":\"@1652700912.192078289\",\"description\":\"Error received from peer ipv4:172.25.0.5:8001\",\"file\":\"src/core/lib/surface/call.cc\",\"file_line\":1069,\"grpc_message\":\"Request for unknown model: 'test_model_pytorch' version 1 is not at ready state\",\"grpc_status\":14}\"\n>"}

Endpoints appear to be normal.

Error: Failed loading preprocess code for '<>': No module named 'transformers'

Hi everyone!

I faced the problem with ClearML-serving. I've deployed onnx model from huggingface in clearml-serving, but "Error processing request: Error: Failed loading pre process code for '<>': No module named 'transformers'" appears when trying to send a request like in example (https://github.com/allegroai/clearml-serving/tree/main/examples/huggingface).

Preprocessing file just like in example.

The transformers package has been installed with CLEARML_EXTRA_PYTHON_PACKAGES variable in serving service deployment file.

You got any ideas?

Thanks in advance

Error during creation endpoints with config.pbtxt

I have created endpoint like this:

clearml-serving --id "<>" model add --engine triton --endpoint 'conformer_joint' --model-id '<>' --preprocess 'preprocess_joint.py' --aux-config "./config.pbtxt"

config.pbtxt file:

name: "conformer_joint"
default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
    max_queue_delay_microseconds: 100
}
input: [
    {
        name: "encoder_outputs"
        data_type: TYPE_FP32
        dims: [
            1,
            640
        ]
    },
    {
        name: "decoder_outputs"
        data_type: TYPE_FP32
        dims: [
            640,
            1
        ]
    }
]
output: [
    {
        name: "outputs"
        data_type: TYPE_FP32
        dims: [
            129
        ]
    }
]

preprocess_joint.py file:

from typing import Any, Union, Optional, Callable

class Preprocess(object):
    def __init__(self):
        # set internal state, this will be called only once. (i.e. not per request)
        pass

    def preprocess(
            self,
            body: Union[bytes, dict],
            state: dict, 
            collect_custom_statistics_fn: Optional[Callable[[dict], None]]
        ) -> Any:
        return body["encoder_outputs"], body["decoder_outputs"]

    def postprocess(
            self,
            data: Any,
            state: dict, 
            collect_custom_statistics_fn: Optional[Callable[[dict], None]]
        ) -> dict:
        return {"data":data.tolist()}

triton container and inference container show no errors, and I can find this triton model with right config.pbtxt in folder /models/conformer_joint. But when I try to make a request to model like this:

import numpy as np
import requests
body={
    "encoder_outputs": [np.random.randn(1, 640).tolist()],
    "decoder_outputs": [np.random.randn(640, 1).tolist()]
}
response = requests.post(f"<>/conformer_joint", json=body)
response.json()

I am getting an error:

Error processing request: object of type 'NoneType' has no len()

Model endpoint in serving task:

conformer_joint {
  engine_type = "triton"
  serving_url = "conformer_joint"
  model_id = "<>"
  preprocess_artifact = "py_code_conformer_joint"
  auxiliary_cfg = """name: "conformer_joint"
default_model_filename: "model.bin"
max_batch_size: 16
dynamic_batching {
    max_queue_delay_microseconds: 100
}
input: [
    {
        name: "encoder_outputs"
        data_type: TYPE_FP32
        dims: [
            1,
            640
        ]
    },
    {
        name: "decoder_outputs"
        data_type: TYPE_FP32
        dims: [
            640,
            1
        ]
    }
]
output: [
    {
        name: "outputs"
        data_type: TYPE_FP32
        dims: [
            129
        ]
    }
]
"""
}

Error occurs in process function of TritonPreprocessRequest (https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving/preprocess_service.py#L358C9-L358C81) because function use endpoint params like input_name, input_type and input_size. When we create endpoint like above, this parameters placed in auxiliary_cfg attribute.

Is there any chance to fix that error and create endpoint like above?

Add on for README

Hello, after some issues raised, there are codes add on to define triton engine args (e.g. ports, triton version) but the usage of these are not updated to README nor the --help yet.

Also it is not very clear the purpose of certain args like project name, name even after reading the --help.
Perhaps these can be updated in the README, so easier to use it.

This is also related to my failed attempt to use my own ClearML server and Triton setup with ClearML serving.
Suspect it might be due to familiarisation of these args and also might have gaps in implementation. But suggest to get the args right first, so that I can test out further.

Triton server keeps crashing

When i try to follow examples/pytorch the triton server is crashing, i.e. exiting with status code -6.

this is the log from the container:

I1004 17:32:10.693691 41 grpc_server.cc:4375] Started GRPCInferenceService at 0.0.0.0:8001                                                                                                               │
│ I1004 17:32:10.693968 41 http_server.cc:3075] Started HTTPService at 0.0.0.0:8000                                                                                                                        │
│ I1004 17:32:10.736035 41 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002                                                                                                                     │
│ I1004 17:34:10.746305 41 model_repository_manager.cc:994] loading: test_model_pytorch:1                                                                                                                  │
│ I1004 17:34:10.848495 41 libtorch.cc:1355] TRITONBACKEND_ModelInitialize: test_model_pytorch (version 1)                                                                                                 │
│ I1004 17:34:10.852702 41 libtorch.cc:253] Optimized execution is enabled for model instance 'test_model_pytorch'                                                                                         │
│ I1004 17:34:10.852761 41 libtorch.cc:271] Inference Mode is disabled for model instance 'test_model_pytorch'                                                                                             │
│ I1004 17:34:10.852801 41 libtorch.cc:346] NvFuser is not specified for model instance 'test_model_pytorch'                                                                                               │
│ I1004 17:34:10.856732 41 libtorch.cc:1396] TRITONBACKEND_ModelInstanceInitialize: test_model_pytorch (device 0)                                                                                          │
│ terminate called after throwing an instance of 'c10::Error'                                                                                                                                              │
│   what():  isTuple()INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h":1910, please report a bug to PyTorch. Expected Tuple but got String                                 │
│ Exception raised from toTupleRef at /opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h:1910 (most recent call first):                                                                                  │
│ frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f6caf24e11c in /opt/tritonserver/backends/pytorch/libc10.so │
│ frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7f6caf22bcb4 in /opt/tri │
│ frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x53 (0x7f │
│ frame #3: <unknown function> + 0x368a57a (0x7f6cf239657a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)                                                                                          │
│ frame #4: <unknown function> + 0x368a6e9 (0x7f6cf23966e9 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)                                                                                          │
│ frame #5: torch::jit::SourceRange::highlight(std::ostream&) const + 0x48 (0x7f6cefe48678 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)                                                          │
│ frame #6: torch::jit::ErrorReport::what() const + 0x2c3 (0x7f6cefe2eeb3 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)                                                                           │
│ frame #7: <unknown function> + 0x102b9 (0x7f6cf91f92b9 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)                                                                                       │
│ frame #8: <unknown function> + 0x1d4d2 (0x7f6cf92064d2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)                                                                                       │
│ frame #9: <unknown function> + 0x1d9f2 (0x7f6cf92069f2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)                                                                                       │
│ frame #10: TRITONBACKEND_ModelInstanceInitialize + 0x374 (0x7f6cf9206db4 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)                                                                     │
│ frame #11: <unknown function> + 0x307dee (0x7f6cfb143dee in /opt/tritonserver/bin/../lib/libtritonserver.so)
│ frame #12: <unknown function> + 0x3093b3 (0x7f6cfb1453b3 in /opt/tritonserver/bin/../lib/libtritonserver.so)                                                                                             │
│ frame #13: <unknown function> + 0x301067 (0x7f6cfb13d067 in /opt/tritonserver/bin/../lib/libtritonserver.so)                                                                                             │
│ frame #14: <unknown function> + 0x18a7ca (0x7f6cfafc67ca in /opt/tritonserver/bin/../lib/libtritonserver.so)                                                                                             │
│ frame #15: <unknown function> + 0x1979b1 (0x7f6cfafd39b1 in /opt/tritonserver/bin/../lib/libtritonserver.so)                                                                                             │
│ frame #16: <unknown function> + 0xd6de4 (0x7f6cfa991de4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)                                                                                                     │
│ frame #17: <unknown function> + 0x9609 (0x7f6cfae0f609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)                                                                                                     │
│ frame #18: clone + 0x43 (0x7f6cfa67f293 in /usr/lib/x86_64-linux-gnu/libc.so.6)                                                                                                                          │
│                                                                                                                                                                                                          │
│ Signal (6) received.                                                                                                                                                                                     │
│  0# 0x000055E2DF079299 in tritonserver                                                                                                                                                                   │
│  1# 0x00007F6CFA5A3210 in /usr/lib/x86_64-linux-gnu/libc.so.6                                                                                                                                            │
│  2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6                                                                                                                                                       │
│  3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6                                                                                                                                                         │
│  4# 0x00007F6CFA959911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6                                                                                                                                       │
│  5# 0x00007F6CFA96538C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6                                                                                                                                       │
│  6# 0x00007F6CFA964369 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6                                                                                                                                       │
│  7# __gxx_personality_v0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6                                                                                                                                     │
│  8# 0x00007F6CFA761BEF in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1                                                                                                                                        │
│  9# _Unwind_Resume in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1                                                                                                                                            │
│ 10# 0x00007F6CEFA61C49 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so                                                                                                                             │
│ 11# 0x00007F6CF23966E9 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so
12# torch::jit::SourceRange::highlight(std::ostream&) const in /opt/tritonserver/backends/pytorch/libtorch_cpu.so                                                                                        │
│ 13# torch::jit::ErrorReport::what() const in /opt/tritonserver/backends/pytorch/libtorch_cpu.so                                                                                                          │
│ 14# 0x00007F6CF91F92B9 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so                                                                                                                        │
│ 15# 0x00007F6CF92064D2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so                                                                                                                        │
│ 16# 0x00007F6CF92069F2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so                                                                                                                        │
│ 17# TRITONBACKEND_ModelInstanceInitialize in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so                                                                                                     │
│ 18# 0x00007F6CFB143DEE in /opt/tritonserver/bin/../lib/libtritonserver.so                                                                                                                                │
│ 19# 0x00007F6CFB1453B3 in /opt/tritonserver/bin/../lib/libtritonserver.so                                                                                                                                │
│ 20# 0x00007F6CFB13D067 in /opt/tritonserver/bin/../lib/libtritonserver.so                                                                                                                                │
│ 21# 0x00007F6CFAFC67CA in /opt/tritonserver/bin/../lib/libtritonserver.so                                                                                                                                │
│ 22# 0x00007F6CFAFD39B1 in /opt/tritonserver/bin/../lib/libtritonserver.so                                                                                                                                │
│ 23# 0x00007F6CFA991DE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6                                                                                                                                       │
│ 24# 0x00007F6CFAE0F609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0                                                                                                                                      │
│ 25# clone in /usr/lib/x86_64-linux-gnu/libc.so.6                                                                                                                                                         │
│                                                                                                                                                                                                          │
│ configuration args: Namespace(inference_task_id=None, metric_frequency=1.0, name='triton engine', project=None, serving_id='dd756abf5e8b42efab92dfb0cfa57a5e', t_allow_grpc=None, t_buffer_manager_threa │
│ String Triton Helper service                                                                                                                                                                             │
│ {'serving_id': 'dd756abf5e8b42efab92dfb0cfa57a5e', 'project': None, 'name': 'triton engine', 'update_frequency': 1.0, 'metric_frequency': 1.0, 'inference_task_id': None, 't_http_port': None, 't_http_t │
│                                                                                                                                                                                                          │
│ Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=60.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']   │
│ Info: syncing models from main serving service                                                                                                                                                           │
│ reporting metrics: relative time 60 sec                                                                                                                                                                  │
│ Info: syncing models from main serving service                                                                                                                                                           │
│ Updating local model folder: /models                                                                                                                                                                     │
│ INFO: target config.pbtxt file for endpoint 'test_model_pytorch':                                                                                                                                        │
│                                                                                                                                                                                                          │
│ input: [{                                                                                                                                                                                                │
│     dims: [1, 28, 28]                                                                                                                                                                                    │
│     data_type: TYPE_FP32                                                                                                                                                                                 │
│     name: "INPUT__0"                                                                                                                                                                                     │
│   }]                                                                                                                                                                                                     │
│ output: [{                                                                                                                                                                                               │
│     dims: [-1, 10]
    data_type: TYPE_FP32                                                                                                                                                                                 │
│     name: "OUTPUT__0"                                                                                                                                                                                    │
│   }]                                                                                                                                                                                                     │
│ backend: "pytorch"                                                                                                                                                                                       │
│                                                                                                                                                                                                          │
│ Update model v1 in /models/test_model_pytorch/1                                                                                                                                                          │
│ Info: Models updated from main serving service                                                                                                                                                           │
│ reporting metrics: relative time 120 sec                                                                                                                                                                 │
│ Traceback (most recent call last):                                                                                                                                                                       │
│   File "clearml_serving/engines/triton/triton_helper.py", line 515, in <module>                                                                                                                          │
│     main()                                                                                                                                                                                               │
│   File "clearml_serving/engines/triton/triton_helper.py", line 507, in main                                                                                                                              │
│     helper.maintenance_daemon(                                                                                                                                                                           │
│   File "clearml_serving/engines/triton/triton_helper.py", line 248, in maintenance_daemon                                                                                                                │
│     raise ValueError("triton-server process ended with error code {}".format(error_code))                                                                                                                │
│ ValueError: triton-server process ended with error code -6                                                                                                                                               │
│ Stream closed EOF for clearml-serving/clearml-serving-triton-85779b957d-hdx7q (clearml-serving-triton)

ValueError: dictionary update sequence element #0 has length <>; 2 is required

Hi everyone! I use command to create entrypoint:

clearml-serving --id "<>" model add --engine triton --endpoint 'conformer_joint' --model-id '<>' --preprocess 'preprocess_joint.py' --input-size '[1, 640]' '[640, 1]' --input-name 'encoder_outputs' 'decoder_outputs' --input-type float32 float32 --output-size '[100]' --output-name 'outputs' --output-type float32 --aux-config name=\"conformer_joint\" max_batch_size=16 dynamic_batching.max_queue_delay_microseconds=100 platform=\"onnxruntime_onnx\" default_model_filename=\"model.bin\"

this command creates config.pbtxt like that (copied from logs):

name: "conformer_joint"
platform: "onnxruntime_onnx"
default_model_filename: "model.bin"
input: [{
    dims: [-1, 1, 640]
    data_type: TYPE_FP32
    name: "encoder_outputs"
  },
  {
    dims: [-1, 640, 1]
    data_type: TYPE_FP32
    name: "decoder_outputs"
  }]
output: [{
    dims: [-1, 129]
    data_type: TYPE_FP32
    name: "outputs"
  }]

Logs from k8s:

I0802 22:48:17.274440 53 model_repository_manager.cc:1206] loading: conformer_joint:1
I0802 22:48:17.274536 53 onnxruntime.cc:2560] TRITONBACKEND_ModelInitialize: conformer_joint (version 1)
I0802 22:48:17.274881 53 onnxruntime.cc:666] skipping model configuration auto-complete for 'conformer_joint': inputs and outputs already specified
I0802 22:48:17.276238 53 onnxruntime.cc:2603] TRITONBACKEND_ModelInstanceInitialize: conformer_joint (GPU device 0)
I0802 22:48:17.279143 53 model_repository_manager.cc:1352] successfully loaded 'conformer_joint' version 1

And there are no error in clearml-serving. But when I'm trying to create request like that

import numpy as np
import requests
r = requests.post(f"<URL>", json={"encoder_outputs": np.random.randn(1, 1, 640).tolist(), "decoder_outputs": np.random.randn(1, 640, 1).tolist()})
r.json()

I get this:

[2023-08-02 23:02:51 +0000] [113] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 152, in jsonable_encoder
    data = dict(obj)
           ^^^^^^^^^
ValueError: dictionary update sequence element #0 has length 129; 2 is required

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 436, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/root/clearml/clearml_serving/serving/main.py", line 31, in custom_route_handler
    return await original_route_handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 255, in app
    content = await serialize_response(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 152, in serialize_response
    return jsonable_encoder(response_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 117, in jsonable_encoder
    encoded_value = jsonable_encoder(
                    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/encoders.py", line 160, in jsonable_encoder
    raise ValueError(errors) from e
ValueError: [ValueError('dictionary update sequence element #0 has length 129; 2 is required'), TypeError('vars() argument must have __dict__ attribute')]

I think this is because of batch size and maybe I need to add something in config.pbtxt. Any ideas?

Thanks in advance!

triton model breaks serving instance

We have setup clearml serving on Kubernetes including triton support. Our triton instance has no GPU, so deploying a model leads to the following error in the triton instance:

E0718 07:41:21.083440 30 model_lifecycle.cc:596] failed to load 'distilbert-test2' version 1: Invalid argument: unable to load model 'distilbert-test2', TensorRT backend supports only GPU device

Trying to remove the model again is not possible:
clearml-serving --id 5097f44fe9cb45f7be2a917c6fe8cad9 model remove --endpoint distilbert-test2

yields the following:

`clearml-serving - CLI for launching ClearML serving engine
2023-07-18 09:47:59,260 - clearml.Task - ERROR - Failed reloading task 5097f44fe9cb45f7be2a917c6fe8cad9
2023-07-18 09:47:59,290 - clearml.Task - ERROR - Failed reloading task 5097f44fe9cb45f7be2a917c6fe8cad9

Error: Task ID "5097f44fe9cb45f7be2a917c6fe8cad9" could not be found
`

In general, our observation is that the serving is not resilient against these kind of problems. A broken model should not break the instance.

Support for already Running Triton

I have triton deployed in k8s. Am I able to link my serving service to this instance of triton?

Inconsistent inference results from clearml-serving

Hello,
I deployed a model using clearml-serving, but it generate inconsistent results across same HTTP requests.

To recreate:

I deployed a self-hosted clearml server in my local kubernetes (from docker image allegroai/clearml:1.4.0).
Reused the pytorch MNIST example from https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch.
Went through the model training process.
Installed clearml-serving with helm (helm repo: NAME allegroai/clearml-serving, CHART VERSION 0.4.1, APP VERSION 0.9.0).
Deployed the MNIST model to a serving endpoint.
Tested the endpoint "http://ip:port/serve/test_model_pytorch" using POSTMAN

Everything goes well as the readme.md from https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch instructed.
But mysteriously, the HTTP responses are not consistent! (The MNIST model occasionally returns different "digits" from the same input image)

I'm quite confused here, and have no idea if any random process happens during the model inference.
Thanks for any help!

Serving with custom engine

Hello,

I checked the pipeline example where you use custom engine, but it is not very complete. What if I want to run normal pytorch inference without any engine?

Is it also possible to implement my own Rest API (e.g. Flask) or at least have more control how I process my inferences? In your README.md, it says: Customizable RestAPI for serving (i.e. allow per model pre/post-processing for easy integration). How can a really customize the RestAPI?

Thanks!
Bruno

add aux-config parser error

Hello,
I'm trying to add a onnx model and specify the platform on clearml-serving model add aux-config but i have a parser error,

on the file triton_helper.py the parser expect an int,

thanks for your work on clearml-serving 🙌

Managing published versions

Currently once published, the model status remains as "published" and Triton will only use the latest "published" model and unload the previous version.

But this unloading of model did not align with the "published" state and can be confusing.
May I suggest to expand the function with unpublish so we can explicitly unload model versions in Triton.
This will also allow multiple versions (published) of the models to be available in Triton.

<class 'argparse.FileType'> is a FileType class object, instance of it must be passed

I had this issue while running the very first command
clearml-serving triton --project "serving" --name "serving example"

<class 'argparse.FileType'> is a FileType class object, instance of it must be passed

Unable to load onnx models into Triton

Hi there,

I have been working on deploying our inference pipeline on clearml-serving using the docker-compose approach. I've hashed out most of the issues thus far thanks to the community, now I am facing another issue while loading onnx models.

I am getting the following error:

clearml-serving-triton   | mmdet  | UNAVAILABLE: Internal: **failed to stat file /models/mmdet/1/model.onnx**

I exec'd into the container to see what's inside /models, and under /models/mmdet/1 there was a model.bin but no model.onnx. I created the modeling using the OutputModel. I also tried doing it through the CLI:

clearml-serving --id $SERVING_ID model upload --name "mmdet_cli" --project $PROJECT_NAME --path /mmdet/model.onnx

but the same thing. I'm guessing when the folder structure is getting set up, this file gets renamed to a .bin extention. Should this be happening or am I doing something wrong?

When I download the file from the models section in the portal, it's an onnx file, exactly the one I uploaded. So not sure where this renaming is happening tbh...

better explanation of the scalar type and buckets

Hi there, today I struggled with the (--variable-scalar) argument and the buckets in the clearml-serving metrics add command.

I think the documentation could be improved.

I already got help in the ClearML Slack:

A scalar in buckets is simply a histogram. Because if you have 1000s of requests per second, it makes no sense to display every data point. So scalars can be divided into buckets and for each minute, for example, we can calculate how much % of total traffic fell in bucket 1, bucket 2, bucket 3, etc. Then we display this histogram as a single column in a heatmap. Y axis is the buckets, color is the value ~ % of traffic in that bucket, and X is time.