Comments (6)
I do not know what you mean by this, perhaps you could clarify. To be clear, I am not looking for monetary compensation to add this update -
ah, I apologize for misunderstanding. I've noticed that terraform-aws-eks is sponsored by a consulting/professional services business, and so I assumed you're here on behalf of that business.
it was more of raising it as a discussion first before going down the path of submitting a PR, and just overall align on the approach. If you are open to the idea of updating the EKS portion of the docs, I am offering my services to do so, free of charge 😬
As background, today we have a det deploy tool which uses cloudformation on AWS and terraform on GCP to spin up determined clusters on raw EC2/GCP nodes. We also have a very raw solution for GKE which sets up a GKE cluster and deploys our helm chart to it.
What I'd like to have in the long term is det deploy eks, which creates an appropriate EKS cluster and deploys our helm chart to it. If I were to break it up into milestones:
- Terraform code to create/update/maintain an EKS cluster with autoscaling for two types of instances: GPU instances of configurable type and max count for ML loads, and cheap CPU instances (e.g. m5.xlarge) for lightweight jobs. On GKE it's literally a checkbox, but I've really struggled to set this up on EKS before opening that ticket.
- Support for an RDS Postgres instance our helm chart will use for database needs.
- Support for an S3 bucket our helm chart will use for (ml model training) checkpoint storage.
- Support for a shared AWS EFS filesystem for users' home directories and so on.
- Put a helm chart on it.
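To make the milestones above a bit more concrete, here is a minimal sketch of the inputs such a deployment might take, serialized into a JSON variables file that terraform can read. Everything here is an assumption for illustration: the class name `EksDeployConfig`, every variable name, and the default GPU instance type are hypothetical and do not come from Determined or from terraform-aws-eks.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EksDeployConfig:
    """Hypothetical inputs for the milestones above (all names are assumptions)."""

    cluster_name: str
    gpu_instance_type: str = "p3.8xlarge"  # assumed example of a configurable GPU type
    gpu_max_count: int = 4                 # upper bound for GPU node autoscaling
    cpu_instance_type: str = "m5.xlarge"   # cheap CPU nodes for lightweight jobs
    cpu_max_count: int = 2                 # upper bound for CPU node autoscaling
    create_rds_postgres: bool = True       # database for the helm chart
    create_s3_checkpoint_bucket: bool = True  # checkpoint storage
    create_efs_home: bool = True           # shared filesystem for home directories

    def validate(self) -> None:
        """Reject obviously invalid values before anything is written out."""
        if not self.cluster_name:
            raise ValueError("cluster_name must not be empty")
        if self.gpu_max_count < 0 or self.cpu_max_count < 0:
            raise ValueError("max counts must be non-negative")


def to_tfvars_json(config: EksDeployConfig) -> str:
    """Serialize the config as JSON suitable for a *.tfvars.json file."""
    config.validate()
    return json.dumps(asdict(config), indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_tfvars_json(EksDeployConfig(cluster_name="demo")))
```

The terraform code itself would still need to declare matching variables; this only shows the shape of the inputs.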
from determined.
yep, I understand that's a typical approach for the terraform ecosystem. However, historically our product has targeted ML engineers who do not have any experience with terraform, but want to push a button and get a cluster in a box deployed. At the end of the day, the CLI is just a thin wrapper on top of terraform code. Some users elect to bypass the wrapper and take the raw terraform code if they want to consume it that way.
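As a rough illustration of what "thin wrapper" means here, the sketch below only builds and runs standard `terraform init` and `terraform apply` commands. It is not the actual det deploy implementation; the function names, the `infra/eks` directory, and the `eks.tfvars.json` file name are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def terraform_commands(module_dir: str, var_file: str) -> List[List[str]]:
    """Build the terraform invocations a wrapper CLI would run (illustrative only)."""
    chdir = f"-chdir={module_dir}"
    # Use an absolute path so the var file is found regardless of -chdir.
    var_path = Path(var_file).resolve()
    return [
        ["terraform", chdir, "init", "-input=false"],
        ["terraform", chdir, "apply", "-input=false", "-auto-approve",
         f"-var-file={var_path}"],
    ]


def deploy(module_dir: str, var_file: str, dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    commands = terraform_commands(module_dir, var_file)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    deploy("infra/eks", "eks.tfvars.json", dry_run=True)
```

Users who prefer to work with terraform directly can skip a wrapper like this and run the same two commands against the raw terraform code.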
from determined.
hello @bryantbiggs,
thanks a lot for addressing terraform-aws-modules/terraform-aws-eks#3027. you are right to guess that I've been investigating how we can modernize our EKS support and move from a manual setup to terraform. as an open-source product we'd be happy to take a PR for that.
reading between the lines, I assume you're looking to offer us your professional services. unfortunately we're not able to do that at this time.
from determined.
I assume you're looking to offer us your professional services. unfortunately we're not able to do that at this time.
I do not know what you mean by this, perhaps you could clarify. To be clear, I am not looking for monetary compensation to add this update - it was more of raising it as a discussion first before going down the path of submitting a PR, and just overall align on the approach. If you are open to the idea of updating the EKS portion of the docs, I am offering my services to do so, free of charge 😬
from determined.
thank you for sharing that information! I'll put it on my list to try putting together a pattern of running the Determined AI helm chart on EKS and then we can discuss how that fits into the documentation that is currently provided
One thing to keep in mind - most of the Terraform users are used to interacting with Terraform directly, and not through a wrapper CLI. So this is more along the lines of what we provide for folks to help them understand how to achieve a certain outcome. This gives them options for consumption - they can copy+paste it into their environment and deploy it, they can compare the code against their setup if trying to figure out what they may be missing, or they can simply use it as a frame of reference to guide their implementation
from determined.
@bryantbiggs can you please share what your plans and timelines are? I'd also like to work in that direction, but I don't want to repeat the same work you are doing.
from determined.