Comments (9)
Hi @david-saeger, what machine are you deploying from? My hunch is that you're on an ARM-based Mac?
If so, the issue stems from the fact that TGI only provides images built for x86_64 (the arch of our EC2 targets as well), but your local Docker is configured to pull ARM-based images for use on a Mac.
You may have luck running export DOCKER_DEFAULT_PLATFORM=linux/amd64
as a part of your environment setup. I (and most others) have the most success deploying LISA from Amazon Linux instances in EC2/SageMakerNotebook/Cloud9 etc. but I'm interested in learning about how you are deploying the solution.
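To avoid setting the variable on hosts that are already x86_64, the export can be made conditional on the host architecture. This is only a sketch: `needs_amd64_override` is a hypothetical helper name, not part of LISA, and it assumes `uname -m` reports `arm64` (macOS) or `aarch64` (Linux) on ARM machines.

```shell
# Hypothetical helper (not part of LISA): succeeds when the given
# machine string looks like an ARM host that needs the override.
needs_amd64_override() {
  case "$1" in
    arm64|aarch64) return 0 ;;
    *) return 1 ;;
  esac
}

# Only force amd64 image pulls and builds when deploying from an ARM host.
if needs_amd64_override "$(uname -m)"; then
  export DOCKER_DEFAULT_PLATFORM=linux/amd64
fi
```

Placed in a shell profile or a deployment wrapper script, this leaves x86_64 hosts untouched.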
from lisa.
@krzim-aws you are right on all counts. I was attempting to deploy from an M1 Mac. Exporting the Docker platform environment variable did enable a successful deployment. Going to close this ticket out. Thanks 😄 !
from lisa.
Awesome! Glad to hear that resolved your issue. @petermuller we might consider adding this to the documentation?
from lisa.
Reopening, as I think building Docker images for linux/amd64 had further downstream effects. I was able to deploy many of the stacks, but the ECS containers all fail with
exec ./src/entrypoint.sh: exec format error
on startup.
I take it these are related.
Not sure if it is possible or worth the effort for the images to be platform independent, but that would likely reduce some of this friction.
from lisa.
That error is indeed because the images were built for the wrong architecture. I think we'll need to go one step further here to ensure that all of the ARM based images are destroyed and rebuilt.
Can you please remove all of the CDK generated images from your local docker registry, run a docker prune to remove the build cache, and then rerun? If the error persists, then we know this solution isn't the answer.
Example commands to remove all images (not just the CDK generated ones) and do a system prune (which removes stopped containers, unused networks, dangling images, and unused build cache) would look like:
docker rmi -f $(docker images -aq)
docker system prune
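Before wiping everything, it may help to see which local images were actually built for the wrong architecture. The sketch below relies on `docker image inspect --format '{{.Architecture}}'`; the function name `list_non_amd64_images` is made up for illustration.

```shell
# Hypothetical helper: print "<image id> <arch>" for every local image
# whose architecture is not amd64.
list_non_amd64_images() {
  for id in $(docker images -q | sort -u); do
    arch=$(docker image inspect --format '{{.Architecture}}' "$id")
    if [ "$arch" != "amd64" ]; then
      echo "$id $arch"
    fi
  done
}
```

Running `list_non_amd64_images` after a rebuild should print nothing if every image was built for linux/amd64.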
from lisa.
Hmm, yeah, cleaned up the CDK generated images and attempted a redeploy with no joy:
The stack named owl-owl-serve-dev failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource handler returned message: "Error occurred during operation 'ECS Deployment Circuit Breaker was triggered'." (RequestToken: 3c4545df-a590-979b-b340-a6dbb26818db, HandlerErrorCode: GeneralServiceException)
at FullCloudFormationDeployment.monitorDeployment (/Users/dsaeger/projects/LISA/node_modules/aws-cdk/lib/index.js:428:10615)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Object.deployStack2 [as deployStack] (/Users/dsaeger/projects/LISA/node_modules/aws-cdk/lib/index.js:431:182776)
at async /Users/dsaeger/projects/LISA/node_modules/aws-cdk/lib/index.js:431:164745
from lisa.
Some context that I assumed would not be relevant, but figure I might as well share just in case: I was running into timeouts generating safetensors, so I downloaded those artifacts from Hugging Face and manually uploaded them to the S3 bucket. Curious if this might be tripping things up somehow.
from lisa.
That shouldn't be an issue. There could be yet one more place that the old Docker images are hiding. Can you delete the local images and also any CDK generated images for this deployment in ECR, via the AWS console?
Unfortunately I don't have an exact setup to replicate this on my end. The other option (and the most well tested) is to do a deployment from an EC2 instance or even SageMaker Notebook if you have access to one.
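If the console is inconvenient, the ECR cleanup could also be scripted with the AWS CLI. This is a rough sketch, not a tested LISA procedure: `purge_ecr_repo` is a hypothetical name, the repository name must be supplied by you (the thread does not say which repositories the deployment uses), and the delete is irreversible.

```shell
# Hypothetical helper: delete every image in one ECR repository.
# Assumes the AWS CLI is configured for the right account and region.
# Note: batch-delete-image accepts at most 100 image IDs per call,
# so a larger repository would need this to be run in batches.
purge_ecr_repo() {
  repo="$1"
  ids=$(aws ecr list-images --repository-name "$repo" \
    --query 'imageIds[*]' --output json)
  if [ -z "$ids" ] || [ "$ids" = "[]" ]; then
    echo "no images in $repo"
    return 0
  fi
  aws ecr batch-delete-image --repository-name "$repo" --image-ids "$ids"
}
```

Usage would be `purge_ecr_repo <repository-name>`, followed by a fresh deploy so the images are rebuilt and pushed for linux/amd64.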
from lisa.
Thanks for this. The error messages are changing, so that is promising to some degree. Still erroring out on deploying LisaServe, but now I'm getting
"Group did not stabilize. Last scaling activity: The requested configuration is currently not supported. Please check the documentation for supported configurations. Launching EC2 instance failed."
The documentation indicates that this may be the result of "instance configuration might not be supported in your requested AWS Region or Availability Zones." It seems the requested instance type is available in my region, so perhaps there is something about its configuration that is particular to our deployment environment. Don't think it's a VPC setup thing, because we didn't deviate from the LISA VPC deployment.
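One thing that can be checked from the CLI is whether the instance type is offered in each Availability Zone the Auto Scaling group uses, not only in the region as a whole. A sketch using `aws ec2 describe-instance-type-offerings`; the function name is hypothetical, and the instance type and region are arguments because the thread does not name them.

```shell
# Hypothetical helper: list the Availability Zones in a region that
# offer the given instance type.
azs_offering_instance_type() {
  instance_type="$1"
  region="$2"
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$instance_type" \
    --region "$region" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text
}
```

Calling `azs_offering_instance_type <instance-type> <region>` and comparing the output against the AZs of the deployment's subnets would show whether a subnet sits in a zone without that instance type. It would not rule out other causes of the same error.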
from lisa.