Comments (9)

krzim-aws commented on August 14, 2024

Hi @david-saeger, what machine are you deploying from? My hunch is that you're on an ARM based Mac?

If so, the issue stems from the fact that TGI only provides images built for x86_64 (the arch of our EC2 targets as well) but your local docker is configured to pull ARM based images for use on a Mac.

You may have luck running export DOCKER_DEFAULT_PLATFORM=linux/amd64 as part of your environment setup. I (and most others) have had the most success deploying LISA from Amazon Linux instances in EC2, SageMaker Notebooks, Cloud9, etc., but I'm interested in learning how you are deploying the solution.
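
A minimal sketch of that environment setup, assuming a POSIX shell and that the deployment targets are x86_64 as described above. The helper name docker_platform_for_arch is hypothetical; it maps uname -m output (arm64 on Apple Silicon, aarch64 on ARM Linux) to a Docker platform string, and the script only forces linux/amd64 when the host is not already x86_64.

```shell
#!/bin/sh
# Hypothetical helper: translate `uname -m` output into a Docker platform string.
docker_platform_for_arch() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# The EC2 targets (and the TGI images) are x86_64, so build/pull for that platform.
TARGET_PLATFORM="linux/amd64"

host_platform="$(docker_platform_for_arch "$(uname -m)")"
if [ "$host_platform" != "$TARGET_PLATFORM" ]; then
  # Non-x86_64 host (e.g. an M1 Mac): make docker default to amd64 images.
  export DOCKER_DEFAULT_PLATFORM="$TARGET_PLATFORM"
  echo "Host is $host_platform; set DOCKER_DEFAULT_PLATFORM=$DOCKER_DEFAULT_PLATFORM"
fi
```

Source the file (for example `. ./set-platform.sh`) rather than executing it, otherwise the export only lives in a child shell and the later CDK deploy will not see it.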

david-saeger commented on August 14, 2024

@krzim-aws you are right on all counts. I was attempting to deploy from an M1 Mac, and exporting the Docker platform environment variable did enable a successful deployment. Going to close this ticket out. Thanks 😄!

krzim-aws commented on August 14, 2024

Awesome! Glad to hear that resolved your issue. @petermuller we might consider adding this to the documentation?

david-saeger commented on August 14, 2024

Reopening, as I think building Docker images for linux/amd64 had further downstream effects. I was able to deploy many of the stacks, but the ECS containers all fail on startup with

exec ./src/entrypoint.sh: exec format error

I take it these are related

https://stackoverflow.com/questions/74705475/aws-ecs-exec-usr-local-bin-docker-entrypoint-sh-exec-format-error

Not sure if it is possible or worth the effort to make the images platform independent, but that would likely reduce some of this friction.
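
An exec format error at container start usually means the image was built for a different CPU architecture than the host running it. One way to confirm that before tearing anything down is to check what each locally built image actually reports. A minimal sketch, assuming Docker's image inspect template fields .Os and .Architecture; the helper names are hypothetical and the image reference is whatever you want to check.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the image's platform matches the expected one.
platform_matches() {
  # $1 = platform reported for the image, $2 = platform the ECS hosts run
  [ "$1" = "$2" ]
}

# Print "<os>/<arch>" for a local image, e.g. "linux/arm64".
image_platform() {
  docker image inspect --format '{{.Os}}/{{.Architecture}}' "$1"
}

# Report whether an image will run on the x86_64 ECS hosts.
check_image() {
  actual="$(image_platform "$1")" || return 2
  if platform_matches "$actual" "linux/amd64"; then
    echo "OK: $1 is $actual"
  else
    echo "MISMATCH: $1 is $actual, expected linux/amd64" >&2
    return 1
  fi
}

# Usage: check_image <image>
```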

krzim-aws commented on August 14, 2024

That error is indeed because the images were built for the wrong architecture. I think we'll need to go one step further here to ensure that all of the ARM based images are destroyed and rebuilt.

Can you please remove all of the CDK generated images from your local Docker images, run a docker system prune to remove the build cache, and then rerun? If the error persists, then we know this solution isn't the answer.

Example commands to remove all images (not just the CDK generated ones) and do a system prune (which removes stopped containers, unused networks, dangling images, and unused build cache; add --volumes to also remove unused volumes) would look like:

docker rmi -f $(docker images -aq)
docker system prune
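
If you would rather not wipe every image on the machine, a narrower sketch is to remove only the images the CDK built and then clear the build cache. This assumes the CDK's usual local tag prefix cdkasset- and the default bootstrap repository name pattern cdk-hnb659fds-container-assets-...; the is_cdk_repo and remove_cdk_images helpers are hypothetical, so check docker images first in case your names differ (for example with a custom bootstrap qualifier).

```shell
#!/bin/sh
# Hypothetical filter: does this repository name look CDK-generated?
# Assumes the default `cdkasset-` local tag prefix and the default bootstrap
# ECR repository name (qualifier hnb659fds); adjust for custom bootstraps.
is_cdk_repo() {
  case "$1" in
    cdkasset-*|*cdk-hnb659fds-container-assets-*) return 0 ;;
    *) return 1 ;;
  esac
}

remove_cdk_images() {
  docker images --format '{{.Repository}} {{.ID}}' |
    while read -r repo id; do
      if is_cdk_repo "$repo"; then
        echo "Removing $repo ($id)"
        # An ID with several tags may already be gone on a later pass; ignore that.
        docker rmi -f "$id" || true
      fi
    done
  # Drop the build cache too, so cached ARM layers are not reused on rebuild.
  docker builder prune -f
}

# Usage: remove_cdk_images
```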

david-saeger commented on August 14, 2024

Hmm, yeah, I cleaned up the CDK generated images and attempted to redeploy, with no joy:

The stack named owl-owl-serve-dev failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource handler returned message: "Error occurred during operation 'ECS Deployment Circuit Breaker was triggered'." (RequestToken: 3c4545df-a590-979b-b340-a6dbb26818db, HandlerErrorCode: GeneralServiceException)
    at FullCloudFormationDeployment.monitorDeployment (/Users/dsaeger/projects/LISA/node_modules/aws-cdk/lib/index.js:428:10615)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (/Users/dsaeger/projects/LISA/node_modules/aws-cdk/lib/index.js:431:182776)
    at async /Users/dsaeger/projects/LISA/node_modules/aws-cdk/lib/index.js:431:164745

david-saeger commented on August 14, 2024

Some context that I assumed would not be relevant, but figured I might as well share just in case: I was running into timeouts generating safetensors, so I downloaded those artifacts from Hugging Face and manually uploaded them to the S3 bucket. Curious if this might be tripping things up somehow.

krzim-aws commented on August 14, 2024

That shouldn't be an issue. There could be one more place where the old Docker images are hiding. Can you delete the local images, as well as any CDK generated images for this deployment in ECR (via the AWS console)?
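
For the ECR side, a sketch using the AWS CLI, assuming the images were published as standard CDK Docker assets into the default bootstrap repository cdk-hnb659fds-container-assets-<account>-<region>; the account ID and region in the usage line are placeholders and the helper names are hypothetical. Note that this repository is shared by every CDK app bootstrapped in that account and region, so emptying it forces all of them to re-push their images on the next deploy.

```shell
#!/bin/sh
# Hypothetical helper: the default CDK bootstrap container-asset repository name
# (assumes the default qualifier hnb659fds).
cdk_asset_repo() {
  # $1 = AWS account ID, $2 = region
  echo "cdk-hnb659fds-container-assets-$1-$2"
}

# Delete every image in an ECR repository, at most 100 image IDs per call
# (the batch-delete-image per-request limit).
empty_ecr_repo() {
  repo="$1"
  prev=""
  while :; do
    ids="$(aws ecr list-images --repository-name "$repo" --max-items 100 \
      --query 'imageIds[*]' --output json)" || return 1
    if [ "$ids" = "[]" ]; then
      echo "$repo is empty"
      return 0
    fi
    # Stop rather than loop forever if the previous delete made no progress.
    if [ "$ids" = "$prev" ]; then
      echo "Some images in $repo could not be deleted" >&2
      return 1
    fi
    aws ecr batch-delete-image --repository-name "$repo" --image-ids "$ids" >/dev/null || return 1
    prev="$ids"
  done
}

# Usage (placeholder account and region):
# empty_ecr_repo "$(cdk_asset_repo 123456789012 us-east-1)"
```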

Unfortunately I don't have an exact setup to replicate this on my end. The other option (and the most well tested) is to do a deployment from an EC2 instance or even SageMaker Notebook if you have access to one.

david-saeger commented on August 14, 2024

Thanks for this. At least the error messages are changing, so that is promising to some degree. Still erroring out on deploying LisaServe, but now I'm getting

"Group did not stabilize. Last scaling activity: The requested configuration is currently not supported. Please check the documentation for supported configurations. Launching EC2 instance failed."

The documentation indicates that this may happen because the "instance configuration might not be supported in your requested AWS Region or Availability Zones." It seems the requested instance type is available in my region, so perhaps there is something about its configuration that is particular to our deployment environment. I don't think it's a VPC setup issue, because we didn't deviate from the LISA VPC deployment.
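
Since that message mentions Availability Zones as well as Regions, one check is whether the instance type is offered in each AZ that the Auto Scaling group's subnets use, not just somewhere in the Region. A sketch using aws ec2 describe-instance-type-offerings and aws ec2 describe-subnets; the instance type and subnet IDs in the usage line are placeholders, and missing_azs, offered_azs, and subnet_azs are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: print each AZ in $2 that does not appear in $1.
missing_azs() {
  # Unquoted echo collapses the tabs/newlines from `--output text` into single spaces.
  offered=" $(echo $1) "
  for az in $2; do
    case "$offered" in
      *" $az "*) ;;          # instance type is offered in this AZ
      *) echo "$az" ;;       # not offered: launches into this AZ can fail
    esac
  done
}

# AZs in the current region where the instance type is offered.
offered_azs() {
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$1" \
    --query 'InstanceTypeOfferings[].Location' --output text
}

# AZs of the given subnets (e.g. the ones the Auto Scaling group launches into).
subnet_azs() {
  aws ec2 describe-subnets --subnet-ids "$@" \
    --query 'Subnets[].AvailabilityZone' --output text
}

# Usage (placeholder instance type and subnet IDs):
# missing_azs "$(offered_azs g5.xlarge)" "$(subnet_azs subnet-aaaa subnet-bbbb)"
```

Any AZ it prints is one where the requested instance type cannot launch, even though the Region as a whole offers it.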
