rwagner00 commented on August 19, 2024

Hello! I'm looking into this and had a couple questions.

The error "Maximum retry attempts reached" is a result of Terminus repeating a failed command in an attempt to get a successful execution but running out of attempts. If you execute the command with -v (or -vvv), you will get more information on the error. We do not deliberately block requests above a certain frequency, but we do return cached responses to systems making over 5 requests per second. With that in mind:

Do you see failures when you're performing Terminus actions outside the context of these scripted deploys?
At around how many simultaneous deploys do you begin seeing errors?
Would it be possible for you to attempt to reproduce the issue with the -v or -vvv parameters?

Thank you!

from terminus.

vinmassaro commented on August 19, 2024

@msupko Do you have the ability to authenticate once and then use GitLab CI cache to save and then restore ~/.terminus inside the runner for each site?

Not sure how your workflow is structured, but I'm experimenting with a GitHub Actions workflow where I retrieve a site list and run it through a matrix, which spins up a runner to run commands against each site.

In a setup step, I install Terminus and run terminus auth:login once to retrieve the site list, then run a cache step for /usr/local/bin/terminus and ~/.terminus and restore that cache in the matrix job that spins up a runner for each site.

If you can avoid running the auth:login step in parallel across all of those runner instances, I think you will avoid this issue.
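A minimal sketch of the matrix-job side of this idea, assuming a setup step has already run terminus auth:login and the cache has been restored. Rather than falling back to its own login (which would bring back the parallel logins), each job fails fast when no session is found. The helper name `require_cached_session` is hypothetical, and the session location (a file named `session` under `$TERMINUS_CACHE_DIR`, defaulting to `~/.terminus/cache`) is an assumption about Terminus's cache layout worth checking against your installed version.

```sh
# require_cached_session (hypothetical helper): succeed only if a Terminus
# session was restored from the CI cache. It never calls auth:login itself,
# so parallel jobs cannot pile onto the machine-token endpoint.
# ASSUMPTION: the session is stored in "$TERMINUS_CACHE_DIR/session",
# with ~/.terminus/cache as the default cache directory.
require_cached_session() {
  local cache_dir="${TERMINUS_CACHE_DIR:-$HOME/.terminus/cache}"
  if [ -s "$cache_dir/session" ]; then
    return 0
  fi
  echo "No cached Terminus session in $cache_dir; did the setup step run auth:login?" >&2
  return 1
}
```

Each matrix job would then start with `require_cached_session || exit 1` before running its other terminus commands.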

msupko commented on August 19, 2024

@rwagner00

> Do you see failures when you're performing Terminus actions outside the context of these scripted deploys?

No...or at least it's extremely uncommon.

> At around how many simultaneous deploys do you begin seeing errors?

Rough estimate: about 10. We do stagger them by a few seconds so I don't believe we would be exceeding 5 requests per second.

> Would it be possible for you to attempt to reproduce the issue with the -v or -vvv parameters?

Sure! I added this to our deploy script and will report back next time we have a chance to run it.

msupko commented on August 19, 2024

@vinmassaro

> Do you have the ability to authenticate once and then use GitLab CI cache to save and then restore ~/.terminus inside the runner for each site?

That is a fantastic suggestion; thank you! Our CI workflow already uses a multi-stage process similar to GitHub's matrix concept, so yes, we should be able to authenticate Terminus once during the setup phase, cache the .terminus folder, and carry it forward into subsequent stages. There'll be a learning curve and some trial and error involved, but we'll give it a shot and confirm whether it works.

rwagner00 commented on August 19, 2024

@msupko

> Sure! I added this to our deploy script and will report back next time we have a chance to run it.

Great, thank you!

msupko commented on August 19, 2024

@rwagner00 Ran a deployment today with -vvv flag on. Here's the output:

```text
$ terminus auth:login -vvv --machine-token="${!VARNAME}"
 [notice] Logging in via machine token.
 [info] #### REQUEST ####
Headers: {"User-Agent":"Terminus/3.0.7 (php_version=8.1.17&script=/root/terminus/terminus)","Accept":"application/json","Content-Type":"application/json","Content-Length":85}
URI: https://terminus.pantheon.io:443/api/authorize/machine-token
Method: POST
Body: {"machine_token":"**HIDDEN**","client":"terminus"}
 [warning] HTTP request POST https://terminus.pantheon.io/api/authorize/machine-token has failed: status code - 500
 [warning] Retrying POST https://terminus.pantheon.io/api/authorize/machine-token 1 out of 5 (reason: status code - 500)
 [warning] Retrying POST https://terminus.pantheon.io/api/authorize/machine-token 2 out of 5 (reason: status code - 500)
 [warning] Retrying POST https://terminus.pantheon.io/api/authorize/machine-token 3 out of 5 (reason: status code - 500)
 [warning] Retrying POST https://terminus.pantheon.io/api/authorize/machine-token 4 out of 5 (reason: status code - 500)
 [error]  HTTP request has failed with error "Maximum retry attempts reached".
```
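The log shows Terminus's own bounded retry: the POST is retried up to its limit and then abandoned with "Maximum retry attempts reached". For a deploy script that wants an outer retry of its own, here is a minimal sketch with exponential backoff; `retry_with_backoff` is a hypothetical helper, not part of Terminus. Given the limit identified later in this thread (one machine-token login per minute, with bursts of up to 10), short delays are unlikely to help, so the waits would need to grow toward a minute or more, and retries only soften the failure rather than remove its cause.

```sh
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# (hypothetical helper, not part of Terminus)
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# wait between attempts: BASE, 2*BASE, 4*BASE, ...
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Maximum retry attempts reached ($max_attempts): $*" >&2
      return 1
    fi
    echo "Attempt $attempt/$max_attempts failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

For example, `retry_with_backoff 4 15 terminus auth:login --machine-token="$TERMINUS_TOKEN"` makes up to four attempts, waiting 15, 30, and then 60 seconds between them.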

rwagner00 commented on August 19, 2024

Thank you for the update, I'm opening an internal ticket about this.

jsulz commented on August 19, 2024

@msupko, thanks for providing this information.

We're looking into this, but the next time this occurs, could you provide the date and time you encountered this so we can dive deeper into our logging and potentially unearth the root cause?

msupko commented on August 19, 2024

@jsulz We just ran a deployment today; it looks like we had a large number of failures at 2023-04-10 20:17 UTC.

namespacebrian commented on August 19, 2024

This is being tracked internally as BUGS-6057.

@msupko Thanks for giving us an exact timeframe for the errors. We were hoping that log messages recorded during your failures would help us identify the cause. We see 500s from /authorize/machine-token in the time window when you encountered them, but they don't immediately point to a cause. We'll keep investigating.

I think support was mistaken when they directed you to the GitHub issue queue, because this doesn't appear to be a problem with the Terminus code. Since it's intermittent and presents as a 500 error from the API, it seems to be something happening internally on the platform. A support ticket is linked to your Pantheon account/site, while a GitHub issue isn't, which limits our ability to investigate anything relating to the specific user/site in question.

If we decide that looking into your specific user/site would help us investigate this further, we'll come back and ask you to re-open a support ticket referencing BUGS-6057 and this issue.

Also, you might update to the latest version of Terminus (3.1.5 as I write this). I don't know of any changes between 3.0.7 and 3.1.5 that relate to your issue, so I don't expect it to actually help, but one of the first things developers do when triaging an issue is check whether it occurs with the latest version of the software. Just a suggestion.

namespacebrian commented on August 19, 2024

Update: I spoke with support, and we were able to identify the support ticket and link it to BUGS-6057.

namespacebrian commented on August 19, 2024

@msupko I'd intended to get back to you with findings but it got lost in the shuffle.

We traced the machine token authentication failures to rate limiting in Auth0's API. The API for generating a session token from a machine token is limited to one request per minute, with bursts of up to 10 requests. Nuances between the horizontal scaling in our platform and the way Auth0's API counts requests against its limits can cause some variation in how that rate limiting manifests to the Terminus end user. We are not able to change that rate limit.
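To see why roughly ten parallel deploys sit right at the edge of this limit, here is a minimal sketch that models it (one request per minute, bursts of up to 10) as a token bucket. Whether Auth0 actually implements its limit this way is an assumption, and the horizontal-scaling variation mentioned above is not modeled; `simulate_token_bucket` is a hypothetical name. To stay in integer arithmetic, the bucket is measured in seconds of refill time: a full bucket holds 10 × 60 = 600 units, each request costs 60, and one unit refills per second.

```sh
# simulate_token_bucket BURST SECONDS_PER_TOKEN T1 [T2 ...] (hypothetical)
# Replays request timestamps (integer seconds, ascending) against a token
# bucket that starts full, then prints how many requests were allowed
# and how many were rejected.
simulate_token_bucket() {
  local burst="$1" period="$2"
  shift 2
  local max="$((burst * period))" level last allowed=0 rejected=0 t
  level=$max
  last=${1:-0}
  for t in "$@"; do
    level=$((level + t - last))      # refill one unit per elapsed second
    if [ "$level" -gt "$max" ]; then
      level=$max                     # never hold more than one full burst
    fi
    last=$t
    if [ "$level" -ge "$period" ]; then
      level=$((level - period))      # spend one token on this request
      allowed=$((allowed + 1))
    else
      rejected=$((rejected + 1))     # not enough budget: rate limited
    fi
  done
  echo "allowed=$allowed rejected=$rejected"
}
```

Under these assumptions, `simulate_token_bucket 10 60 0 0 0 0 0 0 0 0 0 0 0 0` (twelve logins at the same instant) prints `allowed=10 rejected=2`. `simulate_token_bucket 10 60 $(seq 0 3 27) $(seq 300 3 327)` (ten logins 3 seconds apart, then a second pipeline doing the same five minutes later) prints `allowed=15 rejected=5`, so staggering alone does not protect back-to-back runs. One login per pipeline, `simulate_token_bucket 10 60 $(seq 0 300 3000)`, prints `allowed=11 rejected=0`.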

The solution is what has been suggested previously in this thread: use the machine token to generate a session token once, then cache and reuse that session token across the parallel jobs.

We don't use GitLab, so we don't have much experience with it, and it's not simple for us to build and test a setup like yours. We are, however, working on documenting a method for caching and reusing session tokens for common CI providers. At a high level, you would create a single "setup" job that runs before any of the parallel deployment jobs, installs Terminus, and authenticates it; the parallel jobs then kick off after that job completes and do their deployment work without each downloading Terminus and using the machine token.

Currently we have the following example code for GitLab, which has not been tested and does not demonstrate a single setup job followed by many parallel deployment jobs, but the cache section with the ~/.terminus path should hold onto and reuse the session token:

<incomplete example code removed - see comment below for updated code>

Does this help? Are you able to share an example of your code with the parallel jobs? If you're not comfortable sharing the example code here but would be willing to share it privately via support, could you open a ticket with support and share it, referencing BUGS-6057 so they know where to route the information? BUGS-6057 now tracks the documentation effort.
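The setup-then-fan-out ordering described in this comment can also be sketched inside a single shell script, for a runner that deploys several sites itself: one machine-token login, then the per-site work in parallel. This is a minimal sketch rather than Pantheon's documented method; `deploy_all` and `deploy_site` are hypothetical names, and `deploy_site` is a placeholder for whatever Terminus commands your deployment actually runs.

```sh
# deploy_site SITE (hypothetical placeholder): put your real per-site
# Terminus commands here; they reuse the session created by deploy_all.
deploy_site() {
  echo "deploying $1"
}

# deploy_all TOKEN SITE... (hypothetical): one machine-token login for the
# whole run, then every site deploys in parallel on the shared session.
deploy_all() {
  local token="$1" pids="" site pid status=0
  shift
  terminus auth:login --machine-token="$token" || return 1

  for site in "$@"; do
    deploy_site "$site" &     # background job; no login of its own
    pids="$pids $!"
  done

  for pid in $pids; do        # wait for every deployment
    wait "$pid" || status=1   # remember whether any of them failed
  done
  return "$status"
}
```

For example, `deploy_all "$TERMINUS_TOKEN" site-one site-two site-three` logs in once and returns non-zero if any site's deployment failed. With separate CI jobs instead of background processes, the CI cache plays the role of the shared session directory.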

deviantintegral commented on August 19, 2024

@msupko 👋 we are running into this with another project too! In this case, it's deployments using GitHub actions with a build matrix.

@namespacebrian how long do session tokens last for?

I'm trying to figure out where we'd need to implement retry logic. With other APIs, we've handled this by wrapping all requests in retry logic. Say the session token expires during terminus site:info: ideally, Terminus would try to regenerate a session token a few times before actually erroring. While I see a datestamp in the token file, it corresponds to the time the session token was generated, not an expiry time. There's no storage of the machine token in the cache, so I don't think automatic retries are currently possible.

Another idea: if session tokens don't expire often, especially if they are used every week, could we simplify the above to remove the login calls entirely? My thinking is that if expiries are rare, a developer could store the contents of ~/.terminus/cache/<token> as an environment secret.

vinmassaro commented on August 19, 2024

@deviantintegral Can you just auth:login at the beginning of the CI job and then cache and restore the session?

There are docs for caching authentication in CI now: https://docs.pantheon.io/terminus/scripting#authenticate-terminus-for-continuous-integration

See the platform-specific examples linked under #2.

namespacebrian commented on August 19, 2024

@msupko I got some time to try getting this working on GitLab, and the .gitlab-ci.yml from my previous comment needed some adjustments. The YAML below is working for me: the install_terminus job downloads terminus.phar and runs auth:login with the machine token, and then deployment_one and deployment_two each reuse the terminus.phar and session token without needing to download it or run auth:login again.

It's necessary to override the default folder where Terminus stores session tokens: it defaults to $HOME/.terminus, which is outside the CI build directory, and the CI cache won't keep folders outside the build directory.

```yaml
image: ubuntu:latest

variables:
  DEBIAN_FRONTEND: noninteractive  # avoid interactive prompts

before_script:
  - apt-get update -yq
  - apt-get install -y jq php curl php-xml php-mbstring

  # add current directory to $PATH
  - export PATH="${PATH}:."

  # need to store the session token inside the build directory
  - export TERMINUS_CACHE_DIR="${PWD}/terminus-session"

stages:
  - build
  - deploy_many

cache:
  paths:
    # holds the session login token for reuse in future jobs - $HOME/.terminus by default
    - terminus-session
    # the terminus phar itself, so we only need to download it once
    - terminus

install_terminus:
  stage: build
  script:
    - export TERMINUS_RELEASE=$(curl --silent "https://api.github.com/repos/pantheon-systems/terminus/releases/latest" | jq -r .tag_name)
    - echo "Installing Terminus ${TERMINUS_RELEASE}"
    # --fail stops curl from saving an HTTP error page as the phar
    - curl --fail -L "https://github.com/pantheon-systems/terminus/releases/download/${TERMINUS_RELEASE}/terminus.phar" --output terminus
    - chmod +x terminus
    - mkdir -p "${TERMINUS_CACHE_DIR}"
    # the only machine-token login in the pipeline
    - terminus -vvv auth:login --machine-token="${TERMINUS_TOKEN}"
    - terminus -vvv auth:whoami

deployment_one:
  stage: deploy_many
  # `dependencies` only selects which jobs' artifacts to download; the phar
  # and session token reach this job through the cache above
  dependencies: [install_terminus]
  script:
    - terminus -vvv auth:whoami

deployment_two:
  stage: deploy_many
  dependencies: [install_terminus]
  script:
    - terminus -vvv auth:whoami
```
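Because GitLab only caches paths inside the project directory (`$CI_PROJECT_DIR`, one of GitLab's predefined variables), a cheap safeguard is to fail the job early if the Terminus cache directory points elsewhere. A minimal sketch; `assert_inside_build_dir` is a hypothetical helper name, and it compares plain path prefixes rather than resolving symlinks or `..` segments.

```sh
# assert_inside_build_dir DIR BUILD_DIR (hypothetical helper)
# Succeeds if DIR is BUILD_DIR itself or a path underneath it. Only string
# prefixes are compared; symlinks and ".." segments are not resolved.
assert_inside_build_dir() {
  local dir="${1%/}" build="${2%/}"
  case "$dir/" in
    "$build"/*) return 0 ;;
  esac
  echo "error: $dir is outside $build, so the CI cache will not keep it" >&2
  return 1
}
```

In the `before_script` above, this could run right after the export as `assert_inside_build_dir "$TERMINUS_CACHE_DIR" "$CI_PROJECT_DIR" || exit 1`, provided the function is defined or sourced earlier in the same script.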

@deviantintegral The idea is to run auth:login in an earlier job and then use the CI environment's caching system to reuse the session token across the rest of the deployment jobs. You still auth:login with the machine token on each CI run, but only once, and all subsequent jobs reuse the token. I believe tokens expire after 24 hours, so as long as your CI pipeline doesn't take 24+ hours to finish, the session shouldn't expire.
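If a pipeline might outlive the session, a job can check the age of the cached session before using it. A minimal sketch, assuming the roughly 24-hour lifetime mentioned here (not a documented guarantee) and using the session file's modification time as a stand-in for when it was generated; `file_mtime` and `session_is_fresh` are hypothetical helpers. `stat -c %Y` is GNU coreutils syntax, with BSD/macOS `stat -f %m` as the fallback.

```sh
# file_mtime FILE: print FILE's modification time as a Unix timestamp,
# trying GNU stat first and BSD/macOS stat second.
file_mtime() {
  stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# session_is_fresh FILE [MAX_AGE_SECONDS] [NOW] (hypothetical helper)
# Succeeds if FILE exists and was modified less than MAX_AGE_SECONDS ago.
# The 86400-second default mirrors the ~24h lifetime assumed above;
# NOW defaults to the current time and is a parameter mainly for testing.
session_is_fresh() {
  local file="$1" max_age="${2:-86400}" now="${3:-$(date +%s)}" mtime
  [ -f "$file" ] || return 1
  mtime=$(file_mtime "$file") || return 1
  [ $((now - mtime)) -lt "$max_age" ]
}
```

For example, a deployment job could run `session_is_fresh "$TERMINUS_CACHE_DIR/session" || exit 1` and fail fast instead of logging in on its own; the `session` file name under the cache directory is an assumption about Terminus's cache layout.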

deviantintegral commented on August 19, 2024

> You still auth:login with the machine token on each CI run, but only once, and all subsequent jobs reuse the token.

Ah! So in that case, we'd need a job that runs first to get the session token, cache it, and then spin up the parallel jobs with that cached token.

> I believe tokens expire after 24 hours

Great, that's the answer I was looking for. Thank you!

namespacebrian commented on August 19, 2024

> So in that case, we'd need a job that runs first to get the session token, cache it, and then spin up the parallel jobs with that cached token.

This is the way :)

namespacebrian commented on August 19, 2024

@msupko have you had a chance to try out the examples?

deviantintegral commented on August 19, 2024

We're working on solving this for GitHub Actions users at https://github.com/Lullabot/terminus-auth-with-session-cache.

msupko commented on August 19, 2024

@namespacebrian Apologies for the delayed response here! Yes, we have successfully implemented a solution using GitLab's CI caching to reuse the session across our child jobs. It's basically the same approach as in your examples, and it works consistently!
