Comments (6)
Hi @geraldstanje
Seems like you're trying to deploy a model using a SageMaker example. SageMaker uses TorchServe for model deployment, but the model artifact you're creating cannot be deployed with TorchServe directly. Is there a specific tutorial you're following?
I'm not too familiar with SageMaker itself, but the inference.py script you're providing looks like you're trying to deploy a SetFit model. You will need to integrate this into a TorchServe handler and package it with the model-archiver into a tar.gz file, which will add important meta information. Please have a look at our XGBoost example, which should be easily adaptable to your use case, as you can basically deploy any framework or library through this approach.
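As a rough illustration (an untested sketch; the file name setfit_handler.py, the use of the setfit package and SetFitModel.from_pretrained are my assumptions about your setup), such a handler could look like this:

# setfit_handler.py -- minimal sketch of a custom TorchServe handler for a SetFit model (untested)
from setfit import SetFitModel
from ts.torch_handler.base_handler import BaseHandler

class SetFitHandler(BaseHandler):
    def initialize(self, context):
        # model_dir is where TorchServe unpacks the files bundled into the archive
        model_dir = context.system_properties.get("model_dir")
        self.model = SetFitModel.from_pretrained(model_dir)
        self.initialized = True

    def preprocess(self, data):
        # TorchServe hands the handler a list of requests; each item carries its
        # payload under the "data" or "body" key
        texts = []
        for row in data:
            text = row.get("data") or row.get("body")
            if isinstance(text, (bytes, bytearray)):
                text = text.decode("utf-8")
            texts.append(text)
        return texts

    def inference(self, texts):
        return self.model.predict(texts)

    def postprocess(self, preds):
        # must return one entry per request in the batch
        return [str(p) for p in preds]

The archive created by the model-archiver then bundles this handler together with the model files.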
Let me know if you have further questions.
@mreso thanks for pointing that out - is there a simple way to convert it to run SetFit models with TorchServe? Can I copy the code I have into a BaseHandler and implement those functions?
Does SageMaker return the same data types / formats as the BaseHandler? What is required?
cc @namannandan
@geraldstanje yes, you basically follow the XGBoost example to create your own handler, or if your model is a HuggingFace model from their transformers library you can just follow one of these examples:
https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers
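For orientation, the overall workflow in those examples boils down to something like this (a rough sketch; the model name, file names and the sample input are placeholders, and the exact --extra-files you need depend on what your handler loads):

torch-model-archiver --model-name setfit --version 1.0 \
  --serialized-file setfit-test-model/model.safetensors \
  --handler setfit_handler.py \
  --extra-files "setfit-test-model/config.json,setfit-test-model/tokenizer.json" \
  --export-path model_store

# start TorchServe with the resulting setfit.mar
torchserve --start --ncs --model-store model_store --models setfit=setfit.mar

# query the inference API
curl -X POST http://localhost:8080/predictions/setfit -d "some example text"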
Let me know if you're having problems converting your example.
@mreso thanks - how is SageMaker then able to use TorchServe if it doesn't implement ts.torch_handler.base_handler? Let's say I take this as an example: https://github.com/aws/amazon-sagemaker-examples/blob/main/frameworks/pytorch/get_started_mnist_deploy.ipynb
I looked at https://github.com/pytorch/serve/tree/master/examples/xgboost_classfication.
I have the trained SetFit model here: torchserve/setfit-test-model:
ls -la torchserve/setfit-test-model/1_Pooling/
total 12
drwx------ 2 ubuntu ubuntu 4096 May 13 03:52 .
drwx------ 4 ubuntu ubuntu 4096 May 13 03:52 ..
-rw------- 1 ubuntu ubuntu 296 May 13 03:52 config.json
ls -la torchserve/setfit-test-model/
total 89728
drwx------ 4 ubuntu ubuntu 4096 May 13 03:52 .
drwx------ 3 ubuntu ubuntu 4096 May 13 03:52 ..
drwx------ 2 ubuntu ubuntu 4096 May 13 03:52 1_Pooling
drwx------ 2 ubuntu ubuntu 4096 May 13 03:52 2_Normalize
-rw------- 1 ubuntu ubuntu 7586 May 13 03:52 README.md
-rw------- 1 ubuntu ubuntu 660 May 13 03:52 config.json
-rw------- 1 ubuntu ubuntu 164 May 13 03:52 config_sentence_transformers.json
-rw------- 1 ubuntu ubuntu 116 May 13 03:52 config_setfit.json
-rw------- 1 ubuntu ubuntu 90864192 May 13 03:52 model.safetensors
-rw------- 1 ubuntu ubuntu 13431 May 13 03:52 model_head.pkl
-rw------- 1 ubuntu ubuntu 349 May 13 03:52 modules.json
-rw------- 1 ubuntu ubuntu 53 May 13 03:52 sentence_bert_config.json
-rw------- 1 ubuntu ubuntu 695 May 13 03:52 special_tokens_map.json
-rw------- 1 ubuntu ubuntu 711649 May 13 03:52 tokenizer.json
-rw------- 1 ubuntu ubuntu 1433 May 13 03:52 tokenizer_config.json
-rw------- 1 ubuntu ubuntu 231508 May 13 03:52 vocab.txt
How can I create the model.pt for the torch-model-archiver?
torch-model-archiver --model-name SetFitModel --version 1.0 \
  --serialized-file torchserve/setfit-test-model/model.pt \
  --handler ./setfit_handler_generalized.py \
  --extra-files "./torchserve/setfit-test-model/config.json,./torchserve/setfit-test-model/config_sentence_transformers.json,./torchserve/setfit-test-model/config_setfit.json,./torchserve/setfit-test-model/model.safetensors,./torchserve/setfit-test-model/model_head.pkl,./torchserve/setfit-test-model/modules.json,./torchserve/setfit-test-model/sentence_bert_config.json,./torchserve/setfit-test-model/special_tokens_map.json,./torchserve/setfit-test-model/tokenizer.json,./torchserve/setfit-test-model/tokenizer_config.json,./torchserve/setfit-test-model/vocab.txt,./1_Pooling/config.json"
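One idea I had (untested, and only an assumption about how SetFit exposes its underlying model) is to serialize the SentenceTransformer body myself and point --serialized-file at that:

# create_model_pt.py -- untested sketch; assumes SetFitModel.model_body is the underlying SentenceTransformer
import torch
from setfit import SetFitModel

model = SetFitModel.from_pretrained("torchserve/setfit-test-model")
torch.save(model.model_body.state_dict(), "torchserve/setfit-test-model/model.pt")

Or, if the handler loads the whole directory with SetFitModel.from_pretrained anyway, could --serialized-file simply point at the existing model.safetensors instead of a model.pt?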
@geraldstanje to answer your question:
How is SageMaker then able to use TorchServe if it doesn't implement ts.torch_handler.base_handler?
The PyTorch inference containers that are compatible with SageMaker install a package called the SageMaker PyTorch Inference Toolkit, which provides a handler implementation that is compatible with TorchServe and plugs in the input_fn, predict_fn and output_fn that you provide in the inference.py script above. For reference, please see the toolkit's handler implementation.
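As a rough illustration (the SetFit-specific parts are only an assumption based on your script), an inference.py for the toolkit defines functions along these lines:

# inference.py -- sketch of the functions the SageMaker PyTorch Inference Toolkit plugs
# into its TorchServe-compatible handler (SetFit specifics are assumptions)
import json
from setfit import SetFitModel

def model_fn(model_dir):
    # model_dir is where SageMaker unpacks model.tar.gz
    return SetFitModel.from_pretrained(model_dir)

def input_fn(request_body, content_type):
    if content_type == "application/json":
        return json.loads(request_body)["inputs"]
    raise ValueError(f"Unsupported content type: {content_type}")

def predict_fn(inputs, model):
    return model.predict(inputs)

def output_fn(prediction, accept):
    return json.dumps({"predictions": [str(p) for p in prediction]})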
If you'd like to create a custom Docker container that is SageMaker compatible, I would suggest starting out with a SageMaker PyTorch Inference Container as the base image and building on top of it. For example: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.2.0-gpu-py310-cu118-ubuntu20.04-sagemaker.
If you would like to use TorchServe natively on SageMaker, here is an example: https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/torchserve/mme-gpu/torchserve_multi_model_endpoint.ipynb
Also, looking at the error logs, I see from the traceback that the model load failed because the handler was unable to find a necessary module:
ModuleNotFoundError: No module named '124636bff1364f2bbc7f71a8667110af'
Could you please check if all the required dependencies to load the model are either installed in the container or included in the model archive?
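In case a missing Python package is indeed the cause, one option is to ship the dependencies with the model archive (the packages listed below are an assumption; include whatever your handler imports):

# requirements.txt
setfit
sentence-transformers

# pass it to the archiver when building the .mar
torch-model-archiver ... --requirements-file requirements.txt

# and enable per-model dependency installation in config.properties
install_py_dep_per_model=true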