
Comments (29)

antonbabenko commented on August 16, 2024

@oba11 Well, you are absolutely right IF the architecture keeps the LB and stays the same as it is now.

I am thinking about this in a different way:

  1. A GitHub PR sends a request to AWS Lambda.
  2. The AWS Lambda function triggers ECS Task creation and returns a reply to the GitHub PR: "Atlantis is starting now. You should see a response here in a couple of minutes."
  3. The ECS Task finishes processing the request (e.g., an SQS message) and persists the response in S3 or DynamoDB.
  4. The S3 or DynamoDB event triggers an AWS Lambda function, which posts a proper reply to the original GitHub PR with all the details Atlantis has produced.
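
As a rough illustration of the four steps above, here is a minimal Python sketch of the two Lambda handlers. It is not part of Atlantis or of this module. The `post_comment` callable, the `COMMENTS_URL` and `COMMAND` environment variable names, the `atlantis` cluster, task and container names, and the JSON layout of the stored result are all assumptions made for the example. The `ecs` and `s3` arguments are expected to behave like boto3 clients (`run_task` and `get_object`).

```python
import json
from urllib.parse import unquote_plus

STARTING_MESSAGE = (
    "Atlantis is starting now. "
    "You should see a response here in a couple of minutes."
)


def handle_webhook(event, ecs, post_comment, network_configuration,
                   cluster="atlantis", task_definition="atlantis"):
    """Steps 1-2: accept the GitHub webhook, start an ECS task, reply to the PR."""
    payload = json.loads(event["body"])
    comments_url = payload["issue"]["comments_url"]
    ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        networkConfiguration=network_configuration,
        overrides={"containerOverrides": [{
            "name": "atlantis",  # hypothetical container name
            "environment": [
                # Hypothetical variables telling the task what to do
                # and where the final reply should go.
                {"name": "COMMENTS_URL", "value": comments_url},
                {"name": "COMMAND", "value": payload["comment"]["body"]},
            ],
        }]},
    )
    post_comment(comments_url, STARTING_MESSAGE)
    return {"statusCode": 202, "body": "accepted"}


def handle_result(event, s3, post_comment):
    """Step 4: an S3 event fires after the task (step 3) has stored its output."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Assumed result layout: {"comments_url": ..., "output": ...}
        result = json.loads(body)
        post_comment(result["comments_url"], result["output"])
```

Passing the clients and the comment poster in as arguments keeps the sketch independent of any particular GitHub client and makes the handlers easy to exercise with fakes.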

While this solution CAN be implemented with several hacks using the current version of Atlantis, it would be better to make some architectural changes to Atlantis, which need to be discussed in more detail.

I think Atlantis should be divided into several services:

  1. Core - Service which does the heavy lifting (runs terraform commands according to the specified configuration) and writes the result to STDOUT (for simplicity).
  2. Acceptor - Service which processes webhooks. It currently supports VCS providers (GitHub, GitLab, BitBucket), but it could be more generic. The ultimate idea is to allow colleagues to trigger the same webhooks from Slack.
  3. Publisher - Service which posts the result back to the source the Acceptor received the request from. Currently, that is the VCS which triggered the invocation (e.g., a GitHub PR). This complements the previous service (see above).

@lkysow, what do you think about this? Should I move this discussion to https://github.com/runatlantis/atlantis or do you have something like this already in your plans?

from terraform-aws-atlantis.

antonbabenko commented on August 16, 2024

@vitaliCoasy Terraform runs can take more than 15 minutes (the current maximum duration of a Lambda function), so I don't think it makes much sense to migrate from ECS Fargate to a pure Lambda function.


lkysow commented on August 16, 2024

Hi Everyone. I'm all for cost-saving but I don't think this will work with how Atlantis currently runs. Also, the cost savings aren't that substantial. Given us-east-1 pricing, I think Atlantis costs 45 cents a day:

  • per vCPU per hour | $0.0506
  • per GB per hour | $0.0127
  • we're using CPU 256 and memory 512 -> 0.25 vCPU and 0.5 GB
  • (0.0506 * 0.25 + 0.0127 * 0.5) * 24 hours = $0.456/day
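
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the two hourly rates are the ones quoted in this comment and may not match current Fargate pricing, and the function name is made up for the example.

```python
# Rates quoted in the comment above (us-east-1); treat them as assumptions.
VCPU_PER_HOUR = 0.0506
GB_PER_HOUR = 0.0127


def fargate_daily_cost(cpu_units, memory_mib, hours=24):
    """Estimate the daily cost of one task from its CPU units and memory in MiB."""
    vcpus = cpu_units / 1024      # 256 CPU units -> 0.25 vCPU
    gigabytes = memory_mib / 1024  # 512 MiB -> 0.5 GB
    return (VCPU_PER_HOUR * vcpus + GB_PER_HOUR * gigabytes) * hours


print(f"${fargate_daily_cost(256, 512):.3f}/day")  # prints "$0.456/day"
```

The same function can price other task sizes; for example, `fargate_daily_cost(1024, 2048)` comes to about $1.82/day at these rates.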

As to the other questions:

  1. Atlantis keeps state on the filesystem between plan and apply. If the container is torn down, this state is lost, so I don't think it's possible to spin it up on demand right now.
  2. I don't want to pull Atlantis apart into separate services right now. I think having a single binary makes it operationally much simpler to deploy and easier to contribute to.


antonbabenko commented on August 16, 2024

Thanks for the feedback @lkysow!

I also won't be working on this feature myself in the near future, so I can't come up with the numerous hacks that would be needed to get this to work.

Let's keep this issue open and come back to it when time allows, or someone wants to contribute :)


vitaliCoasy commented on August 16, 2024

I don't know if this makes sense for the current Atlantis architecture, but since AWS Lambda can now run containers, I would rather consider rebuilding the Atlantis container with a Lambda handler API in it, so we could deploy the Atlantis container to Lambda and run it per Lambda invocation, without needing to run it in ECS.


oba11 commented on August 16, 2024

I could be wrong, but I think it breaks the system design, because the load balancer would be unhealthy on weekends. The task scheduling feature is good for tasks without a service (and load balancer). Here you need the load balancer to always be healthy and available to consume requests from the GitHub webhook.
Also, I think the ideal cold start is just tearing down the module at the end of the workweek and starting it up again at the beginning of the next one.
Like I mentioned, I could be wrong 😄


oba11 commented on August 16, 2024

No doubt this would be super nice, let's see what @lkysow thinks :)


Jaff commented on August 16, 2024

Where is the container_definition for the Fargate task?


antonbabenko commented on August 16, 2024

The container definition is specified as part of the aws_ecs_task_definition resource:

resource "aws_ecs_task_definition" "atlantis" {
  family                   = "${var.name}"
  execution_role_arn       = "${aws_iam_role.ecs_task_execution.arn}"
  task_role_arn            = "${aws_iam_role.ecs_task_execution.arn}"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "${var.ecs_task_cpu}"
  memory                   = "${var.ecs_task_memory}"
  container_definitions    = "${local.container_definitions}"
}


Jaff commented on August 16, 2024

There does not appear to be anything related to running the atlantis server. You don't provide command and entrypoint parameters.


lkysow commented on August 16, 2024

> There does not appear to be anything related to running the atlantis server. You don't provide command and entrypoint parameters.

The Atlantis Docker image will automatically run the server command if not given any args: https://github.com/runatlantis/atlantis/blob/master/Dockerfile#L29


Jaff commented on August 16, 2024

Hi Luke, thanks for the response.

What about arguments? I have to provide --repo-config-json for my use case. Likewise, I need to set credentials with a profile, since my user handles many accounts.


lkysow commented on August 16, 2024

Can you use the custom_environment_secrets and custom_environment_variables variables? Atlantis supports using environment variables for all its flags (https://www.runatlantis.io/docs/server-configuration.html#environment-variables).

ex. ATLANTIS_REPO_CONFIG_JSON.

Sorry but I'm not too familiar with this module myself. Also maybe if you have more questions you could open up a separate issue because I think this issue is about running Atlantis on-demand via lambda so we shouldn't pollute that purpose too much.
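
To illustrate the flag-to-environment-variable mapping mentioned above, here is a small Python sketch. It assumes the convention described in the linked Atlantis documentation (strip the leading dashes, replace `-` with `_`, uppercase, and prefix with `ATLANTIS_`); the helper name `flag_to_env` is made up for the example.

```python
def flag_to_env(flag):
    """Convert an Atlantis server flag into its environment variable name."""
    return "ATLANTIS_" + flag.lstrip("-").replace("-", "_").upper()


print(flag_to_env("--repo-config-json"))  # prints "ATLANTIS_REPO_CONFIG_JSON"
```

The resulting name is what would be supplied through the module's custom_environment_variables or custom_environment_secrets variables.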


Jaff commented on August 16, 2024

OK, thanks!


smiller171 commented on August 16, 2024

> Given us-east-1 pricing, I think Atlantis costs 45 cents a day:

@lkysow I don't think these are good defaults, though. I had an apply die on me, and I had to manually recover some stuff, because it ran out of resources and was killed by ECS.


lkysow commented on August 16, 2024

:( that sucks. Curious, why did ECS kill it? Maybe we can bump up the default resources so others don't have that issue.

Yeah if you want to avoid that you must give it persistent disk. Either through kube or through an actual VM.


smiller171 commented on August 16, 2024

@lkysow I don't think persistent disk would have helped me here. My state is in S3, it just left a cfn stack in a bad state. Wasn't a huge pain, at least this time, and I bumped up the resources.

The problem was that it swamped the CPU enough that it took too long to respond to the health check. One possible solution is to just make the health check more forgiving. The tradeoff of course is taking longer to recover when there's a real problem.
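
The tradeoff described above can be made concrete with a rough estimate. The sketch below assumes that a target is marked unhealthy after a number of consecutive failed checks, so the detection time is approximately the check interval multiplied by that threshold (ignoring the per-check timeout). The interval and threshold values are hypothetical, not the module's defaults.

```python
def seconds_until_unhealthy(interval_s, unhealthy_threshold):
    """Approximate time before a non-responding target is marked unhealthy."""
    return interval_s * unhealthy_threshold


# Hypothetical settings: a strict check versus a more forgiving one.
strict = seconds_until_unhealthy(interval_s=10, unhealthy_threshold=2)
forgiving = seconds_until_unhealthy(interval_s=30, unhealthy_threshold=5)
print(f"strict: {strict}s, forgiving: {forgiving}s")  # prints "strict: 20s, forgiving: 150s"
```

A more forgiving check gives a CPU-starved task more time to answer, but a genuinely broken task also stays in service for longer before it is replaced.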


nitrocode commented on August 16, 2024

@smiller171 What did you bump the resources to?

Current module defaults

container_memory_reservation = 128

ecs_task_cpu    = 256
ecs_task_memory = 512

cloudposse/terraform-aws-ecs-atlantis uses the same defaults

container_cpu    = 256
container_memory = 512


smiller171 commented on August 16, 2024

@nitrocode I ended up using:

  ecs_task_cpu                 = 1024
  ecs_task_memory              = 2048
  container_memory_reservation = 1536

This has worked well for me so far


nitrocode commented on August 16, 2024

Oh wow, so you quadrupled the task CPU and memory (and raised the memory reservation even more). Thanks. If I see similar issues, I'll do the same.


smiller171 commented on August 16, 2024

@nitrocode Yeah, but this almost certainly depends on how big your stack is and how many projects are running in parallel


github-actions commented on August 16, 2024

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days


bryantbiggs commented on August 16, 2024

this coupled with #206 could work - holding from going stale


smiller171 commented on August 16, 2024

Yeah, I think it would make more sense to trigger a Lambda which starts an ECS job


github-actions commented on August 16, 2024

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days


smiller171 commented on August 16, 2024

bump :)


github-actions commented on August 16, 2024

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days


github-actions commented on August 16, 2024

This issue was automatically closed because of stale in 10 days


github-actions commented on August 16, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

