Comments (29)
@oba11 Well, you are absolutely right IF the architecture is implemented with an LB and stays the same as it is now.
I am thinking about this in a different way:
- GitHub PR sends a request to AWS Lambda
- AWS Lambda function triggers ECS Task creation & returns a reply to the GitHub PR that "Atlantis is starting now. You should see a response here in a couple minutes."
- ECS Task completes processing the request (e.g., an SQS message) and persists the response in S3 or DynamoDB
- S3 or DynamoDB event triggers an AWS Lambda function which posts a proper reply to the original GitHub PR with all the details Atlantis has produced.
While this solution CAN be implemented with several hacks using the current version of Atlantis, it would be better to make some architectural changes to Atlantis, which have to be discussed in more detail.
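The first two steps of the flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: `CLUSTER`, `TASK_DEFINITION`, the container name, and the `WEBHOOK_PAYLOAD` variable are hypothetical, and `run_task` stands in for the ECS RunTask call (for example the `run_task` method of a boto3 ECS client), injected so the sketch has no AWS dependency.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical placeholders, not values from the terraform-aws-atlantis module.
CLUSTER = "atlantis"
TASK_DEFINITION = "atlantis-on-demand"
CONTAINER_NAME = "atlantis"

ACK_MESSAGE = (
    "Atlantis is starting now. "
    "You should see a response here in a couple minutes."
)


def handle_webhook(event: Dict[str, Any], run_task: Callable[..., Any]) -> Dict[str, Any]:
    """Start one ECS task for the incoming webhook and acknowledge right away."""
    payload = event.get("body") or "{}"
    run_task(
        cluster=CLUSTER,
        taskDefinition=TASK_DEFINITION,
        launchType="FARGATE",
        overrides={
            "containerOverrides": [
                {
                    "name": CONTAINER_NAME,
                    # Assumed contract: the task reads the webhook body
                    # from this environment variable.
                    "environment": [
                        {"name": "WEBHOOK_PAYLOAD", "value": payload}
                    ],
                }
            ]
        },
    )
    # 202 Accepted: the real result is posted later by a second function.
    return {"statusCode": 202, "body": json.dumps({"message": ACK_MESSAGE})}
```

A real Fargate launch would also need a network configuration, and a large webhook body would probably be better passed by reference (the SQS message mentioned above) than through an environment variable.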
I think Atlantis should be divided into several services:
- Core - Service which does the heavy lifting (runs terraform commands according to the configuration specified) and outputs the result to STDOUT (for simplicity)
- Acceptor - Service which processes web-hooks. Now it supports VCS (GitHub, GitLab, BitBucket), but it can be more generic. The ultimate idea is to allow colleagues to trigger the same web-hooks from Slack.
- Publisher - Service which posts back to the acceptor. Currently, it is the VCS which triggered the invocation (e.g., a GitHub PR). This complements the previous service (see above).
@lkysow, what do you think about this? Should I move this discussion to https://github.com/runatlantis/atlantis or do you have something like this already in your plans?
from terraform-aws-atlantis.
@vitaliCoasy terraform runs can take more than 15 minutes (the current maximum duration of a Lambda function), so I don't think it makes much sense to migrate from ECS Fargate to a pure Lambda function.
from terraform-aws-atlantis.
Hi Everyone. I'm all for cost-saving but I don't think this will work with how Atlantis currently runs. Also, the cost savings aren't that substantial. Given us-east-1 pricing, I think Atlantis costs 45 cents a day:
- per vCPU per hour | $0.0506
- per GB per hour | $0.0127
- we're using CPU 256 and memory 512 -> 0.25 vCPU and 0.5 GB
- (0.0506 * 0.25 + 0.0127 * 0.5)*24 hours = $0.456/day
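The arithmetic above can be reproduced with a short sketch. The prices are the ones quoted in this comment; the function name and the conversion from ECS CPU units and MiB (1024 units = 1 vCPU, 1024 MiB = 1 GB) are illustrative assumptions, not part of the module.

```python
# Prices as quoted in the comment above (us-east-1 at the time of writing).
PRICE_PER_VCPU_HOUR = 0.0506
PRICE_PER_GB_HOUR = 0.0127


def fargate_daily_cost(cpu_units: int, memory_mib: int) -> float:
    """Daily cost in USD of one task that runs 24 hours a day."""
    vcpus = cpu_units / 1024        # 256 CPU units -> 0.25 vCPU
    gigabytes = memory_mib / 1024   # 512 MiB -> 0.5 GB
    hourly = PRICE_PER_VCPU_HOUR * vcpus + PRICE_PER_GB_HOUR * gigabytes
    return hourly * 24


print(round(fargate_daily_cost(256, 512), 3))  # 0.456
```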
As to the other questions:
- Atlantis keeps state on the filesystem between plan and apply. If the container is torn down then this state will be lost so I don't think it's possible to spin it up on demand right now
- I don't want to pull apart Atlantis into separate services right now. I think having a single binary makes it operationally much simpler to deploy and it makes it easier to contribute to.
from terraform-aws-atlantis.
Thanks for the feedback @lkysow!
I also won't be working on this feature myself in the near future, so I can't come up with the numerous hacks needed to get this to work.
Let's keep this issue open and come back to it when time allows, or someone wants to contribute :)
from terraform-aws-atlantis.
I don't know if this would make any sense for the current Atlantis architecture, but since AWS Lambda can now run containers, I would rather consider rebuilding the Atlantis container and adding a Lambda handler API to it, so we could just deploy the Atlantis container to Lambda and run it per Lambda call, without needing to run it in ECS.
from terraform-aws-atlantis.
I could be wrong, but I think it breaks the system design, because there would be an unhealthy load balancer on weekends. The task scheduling feature is good for tasks without a service (and load balancer). Here you need the load balancer to always be healthy and available to consume requests from the GitHub webhook.
Also, I think the ideal cold start is just tearing down the module at the end of the workweek and starting it up again at the beginning of the next one.
Like I mentioned, I could be wrong 😄
from terraform-aws-atlantis.
No doubt this will be super nice, let's see what @lkysow thinks :)
from terraform-aws-atlantis.
Where is the container_definition for the Fargate task?
from terraform-aws-atlantis.
The container definition is specified as part of the aws_ecs_task_definition resource:
terraform-aws-atlantis/main.tf, lines 446 to 456 in 9150d5a
from terraform-aws-atlantis.
There does not appear to be anything related to running the atlantis server. You don't provide command and entrypoint parameters.
from terraform-aws-atlantis.
There does not appear to be anything related to running the atlantis server. You don't provide command and entrypoint parameters.
The Atlantis Docker image will automatically run the server command if not given any args: https://github.com/runatlantis/atlantis/blob/master/Dockerfile#L29
from terraform-aws-atlantis.
Hi, Luke; thanks for the response.
What about arguments? I have to provide --repo-config-json for my use. Likewise, I need to set credentials with profile since my user handles many accounts.
from terraform-aws-atlantis.
Can you use the custom_environment_secrets and custom_environment_variables variables? Atlantis supports using environment variables for all its flags (https://www.runatlantis.io/docs/server-configuration.html#environment-variables), e.g. ATLANTIS_REPO_CONFIG_JSON.
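As a rough illustration of that naming rule (upper-case the flag, replace dashes with underscores, prefix with ATLANTIS_), here is a minimal sketch. The helper names are hypothetical, and the name/value dictionary follows the ECS container-definition environment format; how the module expects custom_environment_variables to be written should be checked against its own documentation.

```python
from typing import Dict, List


def flag_to_env(flag: str) -> str:
    """Map an Atlantis server flag to its environment variable name."""
    return "ATLANTIS_" + flag.lstrip("-").replace("-", "_").upper()


def env_entries(flags: Dict[str, str]) -> List[Dict[str, str]]:
    """Build name/value pairs in the ECS container-definition format."""
    return [{"name": flag_to_env(f), "value": v} for f, v in flags.items()]


print(flag_to_env("--repo-config-json"))  # ATLANTIS_REPO_CONFIG_JSON
```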
Sorry but I'm not too familiar with this module myself. Also maybe if you have more questions you could open up a separate issue because I think this issue is about running Atlantis on-demand via lambda so we shouldn't pollute that purpose too much.
from terraform-aws-atlantis.
OK, thanks!
from terraform-aws-atlantis.
Given us-east-1 pricing, I think Atlantis costs 45 cents a day:
@lkysow I don't think these are good defaults though. I had an apply die on me and I had to manually recover some stuff because it ran out of resources and was killed by ECS
from terraform-aws-atlantis.
:( that sucks. Curious, why did ECS kill it? Maybe we can bump up the default resources so others don't have that issue.
Yeah if you want to avoid that you must give it persistent disk. Either through kube or through an actual VM.
from terraform-aws-atlantis.
@lkysow I don't think persistent disk would have helped me here. My state is in S3, it just left a cfn stack in a bad state. Wasn't a huge pain, at least this time, and I bumped up the resources.
The problem was that it swamped the CPU enough that it took too long to respond to the health check. One possible solution is to just make the health check more forgiving. The tradeoff of course is taking longer to recover when there's a real problem.
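The tradeoff can be made concrete with a small sketch. It assumes the usual load balancer model in which a target is marked unhealthy after a number of consecutive failed checks, so detection time is roughly the check interval times the unhealthy threshold. The function name and the numbers are illustrative only, not the module's defaults.

```python
def seconds_until_unhealthy(interval_s: int, unhealthy_threshold: int) -> int:
    """Approximate time before a failing target is marked unhealthy."""
    return interval_s * unhealthy_threshold


# Illustrative settings only.
strict = seconds_until_unhealthy(interval_s=10, unhealthy_threshold=2)
forgiving = seconds_until_unhealthy(interval_s=30, unhealthy_threshold=5)

print(strict)     # 20
print(forgiving)  # 150
```

With the more forgiving settings a busy but working task has more time to answer, while a task that is really down stays in rotation correspondingly longer.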
from terraform-aws-atlantis.
@smiller171 What did you bump the resources to?
Current module defaults
container_memory_reservation = 128
ecs_task_cpu = 256
ecs_task_memory = 512
cloudposse/terraform-aws-ecs-atlantis uses the same defaults
container_cpu = 256
container_memory = 512
from terraform-aws-atlantis.
@nitrocode I ended up using:
ecs_task_cpu = 1024
ecs_task_memory = 2048
container_memory_reservation = 1536
This has worked well for me so far
from terraform-aws-atlantis.
Oh wow, so you quadrupled the CPU and memory settings. Thanks. If I see similar issues, I'll do the same.
from terraform-aws-atlantis.
@nitrocode Yeah, but this almost certainly depends on how big your stack is and how many projects are running in parallel
from terraform-aws-atlantis.
This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days
from terraform-aws-atlantis.
this coupled with #206 could work - holding from going stale
from terraform-aws-atlantis.
Yeah, I think it would make more sense to trigger a Lambda which starts an ECS job
from terraform-aws-atlantis.
This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days
from terraform-aws-atlantis.
bump :)
from terraform-aws-atlantis.
This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days
from terraform-aws-atlantis.
This issue was automatically closed because of stale in 10 days
from terraform-aws-atlantis.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
from terraform-aws-atlantis.