Comments (12)

philschmid commented on June 12, 2024

Do you save your tokenizer as well in your training script? If not, that would explain why it cannot be found when deploying the model.

from sagemaker-huggingface-inference-toolkit.

elozano98 commented on June 12, 2024

Yes, I've tried saving the tokenizer too. But I still get the same client error message.

import argparse
import os

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-dir", type=str, default=os.environ["SM_MODEL_DIR"])
    ...
    # Save both the model weights and the tokenizer to SM_MODEL_DIR,
    # which SageMaker packages into model.tar.gz after training
    trainer.model.save_pretrained(args.model_dir)
    trainer.tokenizer.save_pretrained(args.model_dir)

philschmid commented on June 12, 2024

Can you please share the structure for your model.tar.gz?

elozano98 commented on June 12, 2024

The model.tar.gz structure is:

  • checkpoint-43
  • config.json
  • pytorch_model.bin
  • special_tokens_map.json
  • tokenizer_config.json
  • vocab.txt

philschmid commented on June 12, 2024

This should work! After adding trainer.tokenizer.save_pretrained(args.model_dir), are you still seeing the same issue?

elozano98 commented on June 12, 2024

Yes, I still have the same issue.

I think it may be related to the path the endpoint uses when it loads the tokenizer with from_pretrained.

Also, all the code that creates the estimator, trains it, deploys it, and uses the predictor for inference, is executed in my local machine. Although the model is trained, stored, and deployed successfully, this may be related to an error in the path used by the endpoint.

philschmid commented on June 12, 2024

What do you mean by:

Also, all the code that creates the estimator, trains it, deploys it, and uses the predictor for inference, is executed in my local machine.

Aren't you running the training on AWS? Is the model.tar.gz uploaded to S3?

elozano98 commented on June 12, 2024

Yes, the model is stored in the S3 bucket. The only difference is that I don't use a SageMaker notebook instance to create, train, and deploy the estimator. But that should work too, since I make sure the model is stored in the correct S3 bucket.

philschmid commented on June 12, 2024

Yeah, this shouldn't be an issue. Can you try creating a new model.tar.gz manually?
https://huggingface.co/docs/sagemaker/inference#creating-a-model-artifact-modeltargz-for-deployment

Replace step 1 with unzipping your current archive. You can also remove the checkpoint-43 folder, since it isn't needed for inference.
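Those steps can be sketched in shell (a sketch only: the first few lines simulate the current archive for demonstration, using the file names from the structure listed above):

```shell
work=$(mktemp -d) && cd "$work"

# --- simulate the current artifact; replace this with your real model.tar.gz ---
mkdir -p export/checkpoint-43
touch export/config.json export/pytorch_model.bin \
      export/special_tokens_map.json export/tokenizer_config.json export/vocab.txt
tar -czf model.tar.gz -C export .
rm -rf export
# ------------------------------------------------------------------------------

# Step 1 (replaced): unpack the existing archive instead of downloading a model
mkdir model && tar -xzf model.tar.gz -C model

# The training checkpoint is not needed for inference
rm -rf model/checkpoint-43

# Repack from inside the directory so the files sit at the archive root
(cd model && tar -czf ../model-fixed.tar.gz *)

# Inspect the result: config.json etc. should be top-level entries
tar -tzf model-fixed.tar.gz
```

The repacked model-fixed.tar.gz can then be uploaded to S3 and deployed in place of the original archive.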

segments-tobias commented on June 12, 2024

@elozano98 Did you ever find the cause of the problem? I might be facing something similar.

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message:
{
  "code": 400,
  "type": "InternalServerException",
  "message": "Can't load config for '/.sagemaker/mms/models/model'. Make sure that:
    - '/.sagemaker/mms/models/model' is a correct model identifier listed on 'https://huggingface.co/models' (make sure '/.sagemaker/mms/models/model' is not a path to a local directory with something else, in that case)
    - or '/.sagemaker/mms/models/model' is the correct path to a directory containing a config.json file"
}

philschmid commented on June 12, 2024

@segments-tobias could you describe how you have created your model.tar.gz?

segments-tobias commented on June 12, 2024

@philschmid I found the answer in another issue: I had accidentally zipped the directory itself instead of just its contents.
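For anyone else hitting this, the difference is visible with tar -tzf (a minimal sketch; model/ is an illustrative directory name):

```shell
mkdir -p model && touch model/config.json model/pytorch_model.bin

# Wrong: archiving the directory itself nests everything one level down,
# so the endpoint finds no config.json at the archive root
tar -czf wrong.tar.gz model/
tar -tzf wrong.tar.gz    # entries are model/config.json etc., one level deep

# Right: archive the directory's *contents* so config.json sits at the root
tar -czf right.tar.gz -C model .
tar -tzf right.tar.gz    # entries like ./config.json sit at the archive root
```

The inference toolkit extracts the archive into /.sagemaker/mms/models/model and expects config.json directly in that directory, which is why the nested layout triggers the 400 error above.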
