philips-labs / terraform-aws-github-runner
Terraform module for scalable GitHub action runners on AWS
Home Page: https://philips-labs.github.io/terraform-aws-github-runner/
License: MIT License
scale-up lambda function fails with error, RequestError [HttpError]: Resource not accessible by integration.
I also struggled to find the expected format for the github_app key_base64 variable (I kept getting errors like error:0909006C:PEM routines:get_name:no start line); a multi-line string (starting LS0t), which was the base64 of the PEM file, seemed to work.
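For anyone hitting the same "no start line" error, here is a minimal sketch of the encoding that worked above. It assumes the module base64-decodes the value before using it as a PEM key (which matches what I observed, but is not confirmed from the source); the key body is a placeholder, not a real key.

```typescript
// Sketch: how a PEM private key maps to the base64 string described above.
// The key body is a placeholder, not real key material.
const pem: string = [
  '-----BEGIN RSA PRIVATE KEY-----',
  'placeholder-key-material',
  '-----END RSA PRIVATE KEY-----',
  '',
].join('\n');

// Encode the whole file, header and footer lines included.
const keyBase64: string = Buffer.from(pem, 'utf8').toString('base64');

// Decoding must give back text that begins with the PEM header;
// if it does not, OpenSSL has no start line to find.
const decoded: string = Buffer.from(keyBase64, 'base64').toString('utf8');

console.log(keyBase64.startsWith('LS0t')); // "LS0t" is the base64 of "---"
console.log(decoded.startsWith('-----BEGIN'));
```

So a value that starts with LS0t is a good sign that the complete PEM file was encoded, not just the key body.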
I tried the suggestion in #203 of granting Read access to the installed app in the "Actions" repository permissions without success.
The error message shows that the URL being accessed is https://api.github.com/repos/<my organisation>/<my repo>/actions/runs?status=queued with authorization header authorization: 'token [REDACTED]', and the request is rejected by GitHub.com with '403 Forbidden'.
Please advise.
I'm sure I am just missing something documented somewhere; my integration has been made under my user account whilst I test this. Does the integration need any more permissions than:
at /var/task/index.js:15325:23
at processTicksAndRejections (internal/process/task_queues.js:97:5) {
status: 403,
headers: {
'access-control-allow-origin': '*',
'access-control-expose-headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset',
connection: 'close',
'content-encoding': 'gzip',
'content-security-policy': "default-src 'none'",
'content-type': 'application/json; charset=utf-8',
date: 'Mon, 13 Jul 2020 14:29:12 GMT',
'referrer-policy': 'origin-when-cross-origin, strict-origin-when-cross-origin',
server: 'GitHub.com',
status: '403 Forbidden',
'strict-transport-security': 'max-age=31536000; includeSubdomains; preload',
'transfer-encoding': 'chunked',
vary: 'Accept-Encoding, Accept, X-Requested-With',
'x-content-type-options': 'nosniff',
'x-frame-options': 'deny',
'x-github-media-type': 'github.v3; format=json',
'x-github-request-id': 'B73E:A1E4:B1A84D:D6FF77:5F0C6FB8',
'x-ratelimit-limit': '5000',
'x-ratelimit-remaining': '4884',
'x-ratelimit-reset': '1594651917',
'x-xss-protection': '1; mode=block'
},
request: {
method: 'GET',
url: 'https://api.github.com/repos/callum-tait-pbx/test_repository/actions/runs?status=queued',
headers: {
accept: 'application/vnd.github.v3+json',
'user-agent': 'octokit-rest.js/17.11.2 octokit-core.js/2.5.4 Node.js/12.16.3 (Linux 4.14; x64)',
authorization: 'token [REDACTED]'
},
request: { hook: [Function: bound bound register] }
},
documentation_url: 'https://developer.github.com/v3/actions/workflow_runs/#list-repository-workflow-runs'
Note I haven't attached a dummy runner to my repository yet, I was assuming I would run into a problem at some point that pointed to that and deal with it then.
scale-down.ts:113 and scale-down.ts:116 use actions.listSelfHostedRunnersForOrg and actions.listSelfHostedRunnersForRepo directly.
The returned object contains an array data.runners, containing up to 100 registered runners according to the API documentation.
If there are more than 100 runners registered, the additional runners are not considered for scaling down.
Hi, first of all, congratulations for this great project.
We have deployed github-runner successfully and it's running very well so far.
One question please. As you know, spot instances can be terminated by AWS. If a GitHub Runner EC2 instance is suddenly stopped by AWS (I mean, in the middle of a pipeline), what happens to the GitHub pipeline? Does it fail? Is there any retry/re-schedule mechanism to re-execute the build?
Thank you very much in advance.
When building this solution the GitHub API couldn't tell if a runner was busy or not, so we resorted to trying to delete each runner via the API. If that returned a 500 Internal Server Error, we would know it was busy.
Just played with the API again and saw the busy flag was added for runners. See https://docs.github.com/en/rest/reference/actions#list-self-hosted-runners-for-a-repository
Instead of trying to delete a runner we should use this flag to reduce the number of API calls to GitHub.
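A rough sketch of what using the flag could look like. The Runner interface only lists the fields used here, and idleRunners is a hypothetical helper name, not something that exists in the module today.

```typescript
// Minimal shape of a runner entry as returned by the list endpoints;
// only the fields used in this sketch are declared.
interface Runner {
  id: number;
  name: string;
  status: string;
  busy: boolean;
}

// Hypothetical helper: keep only runners that are not executing a job,
// so no delete call is needed just to find out whether a runner is busy.
function idleRunners(runners: Runner[]): Runner[] {
  return runners.filter((runner) => !runner.busy);
}

const sample: Runner[] = [
  { id: 1, name: 'i-aaa', status: 'online', busy: true },
  { id: 2, name: 'i-bbb', status: 'online', busy: false },
];

console.log(idleRunners(sample).map((runner) => runner.name));
```

Only the runners returned by such a filter would then be candidates for removal, which saves one API call per busy runner.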
Bring lambda under test, check other lambdas for examples.
If termination of an EC2 instance fails the code never cleans up the runner on the next pass.
We should handle this better and clean the EC2 instance up in a next attempt.
As a project owner I want to limit production runner access to protected branches
Repo owners setting up deployment rules
In GitLab you can tie certain runners to protected branches. This enables us to use runners with production credentials and access levels, separate from the pool of runners available for every other branch.
It provides a security model in which accidental or intentional changes to production are limited to merged code.
No proposal, this is a question.
I asked a similar question in the GitHub Actions community forum:
https://github.community/t5/GitHub-Actions/Limit-self-managed-runners-to-protected-branches/m-p/55943#M9692
Hi. I get an error in the scale-up lambda after setting up your module.
Cloudwatch logs below:
ERROR Invoke Error
{
"errorType": "Error",
"errorMessage": "Failed handling SQS event",
"stack": [
"Error: Failed handling SQS event",
" at _homogeneousError (/var/runtime/CallbackContext.js:12:12)",
" at postError (/var/runtime/CallbackContext.js:29:54)",
" at callback (/var/runtime/CallbackContext.js:41:7)",
" at /var/runtime/CallbackContext.js:104:16",
" at /var/task/index.js:16834:16",
" at Generator.throw (<anonymous>)",
" at rejected (/var/task/index.js:16816:65)",
" at processTicksAndRejections (internal/process/task_queues.js:97:5)"
]
}
at /var/task/index.js:15124:23
at processTicksAndRejections (internal/process/task_queues.js:97:5) {
status: 403,
headers: {
'access-control-allow-origin': '*',
'access-control-expose-headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset',
connection: 'close',
'content-encoding': 'gzip',
'content-security-policy': "default-src 'none'",
'content-type': 'application/json; charset=utf-8',
date: 'Tue, 17 Nov 2020 17:51:47 GMT',
'referrer-policy': 'origin-when-cross-origin, strict-origin-when-cross-origin',
server: 'GitHub.com',
status: '403 Forbidden',
'strict-transport-security': 'max-age=31536000; includeSubdomains; preload',
'transfer-encoding': 'chunked',
vary: 'Accept-Encoding, Accept, X-Requested-With',
'x-content-type-options': 'nosniff',
'x-frame-options': 'deny',
'x-github-media-type': 'github.v3; format=json',
'x-github-request-id': '93DE:E7C5:957F272:AC944E7:5FB40DB3',
'x-ratelimit-limit': '5600',
'x-ratelimit-remaining': '5598',
'x-ratelimit-reset': '1605639047',
'x-ratelimit-used': '2',
'x-xss-protection': '1; mode=block'
},
request: {
method: 'GET',
url: 'https://api.github.com/repos/RaketaApp/packer-base-ami/actions/runs?status=queued',
headers: {
accept: 'application/vnd.github.v3+json',
'user-agent': 'octokit-rest.js/18.0.6 octokit-core.js/3.1.1 Node.js/12.18.4 (linux; x64)',
authorization: 'token [REDACTED]'
},
request: { hook: [Function: bound bound register] }
},
documentation_url: 'https://docs.github.com/rest/reference/actions#list-workflow-runs-for-a-repository'
}
I have followed the README, created the GitHub App and set up the terraform modules; however, I can't get the runners created. Please see the error below. I guess it's something to do with App permissions, but I have tried them all and have been at this for a while with no luck. Not sure what I'm missing!
Run the example module and try to create a runner
Does not create a runner
Should create a runner
2020-09-08T12:53:03.620Z 61a6e96f-ddd4-5bd3-ac59-bebe5d0eb4b7 ERROR RequestError [HttpError]: Not Found
at /var/task/index.js:14863:23
at processTicksAndRejections (internal/process/task_queues.js:97:5) {
status: 404,
headers: {
'access-control-allow-origin': '*',
'access-control-expose-headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset',
connection: 'close',
'content-encoding': 'gzip',
'content-security-policy': "default-src 'none'",
'content-type': 'application/json; charset=utf-8',
date: 'Tue, 08 Sep 2020 12:53:03 GMT',
'referrer-policy': 'origin-when-cross-origin, strict-origin-when-cross-origin',
server: 'GitHub.com',
status: '404 Not Found',
'strict-transport-security': 'max-age=31536000; includeSubdomains; preload',
'transfer-encoding': 'chunked',
vary: 'Accept-Encoding, Accept, X-Requested-With',
'x-content-type-options': 'nosniff',
'x-frame-options': 'deny',
'x-github-media-type': 'github.v3; format=json',
'x-github-request-id': 'DFE0:5DBC:88B16E0:A4C558A:5F577EAF',
'x-ratelimit-limit': '5000',
'x-ratelimit-remaining': '4986',
'x-ratelimit-reset': '1599572823',
'x-ratelimit-used': '14',
'x-xss-protection': '1; mode=block'
},
request: {
method: 'POST',
url: 'https://api.github.com/orgs/theabrar/actions/runners/registration-token',
headers: {
accept: 'application/vnd.github.v3+json',
'user-agent': 'octokit-rest.js/18.0.3 octokit-core.js/3.1.1 Node.js/12.18.2 (linux; x64)',
authorization: 'token [REDACTED]',
'content-length': 0
},
request: { hook: [Function: bound bound register] }
},
documentation_url: 'https://docs.github.com/rest/reference/actions#create-a-registration-token-for-an-organization'
}
I think this issue can be resolved by automating the GitHub app creation, possibly using probot.
Currently secrets required for the lambda functions are stored in environment variables. It would be much better to secure them in KMS or SSM.
Avoid manual update of terraform docs (input / output) in readme. Options
Distribution syncer lambda sometimes does not work after a terraform apply; cause unclear. Removing the lambda and running apply again solves the issue.
Not reproducible, happens occasionally.
Currently the runner scales up whenever any workflow in the repo that sends the event is in the status queued. Scaling should happen only if the queued workflow is correlated to the received event.
Stream ec2 instance logging to cloudwatch
Make retention time configurable.
Currently app creation is a manual process. Much better if we can automate the creation of an app.
Add linting and update workflow, check webhook for examples
I'm not sure if this is one problem with the runners, or two problems and one of them is GitHub. Or maybe it's all GitHub's fault, and it's not communicating properly with the webhooks. IDK.
In the past week, I have regularly been seeing jobs getting cancelled or just not happening. The first thing I'm seeing, it seems like there might be some sort of timing issue between the close order being given to the spot request and the request picking up another job. My jobs are getting "The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled" when no one has done anything.
The other thing I'm seeing is jobs that just never run and yet the workflow fails.
I don't know. It happens all the time with my pipeline with all of my workflows and jobs. All of my jobs are running bash scripts which in turn run docker containers for everything. I do have a few differences from the default settings. I have instance type set to m5.4xlarge, and I have a post_install script that provides ecr access:
mkdir /home/ec2-user/.docker
touch /home/ec2-user/.docker/config.json
echo "{" >> /home/ec2-user/.docker/config.json
echo ' "credsStore": "ecr-login"' >> /home/ec2-user/.docker/config.json
echo "}" >> /home/ec2-user/.docker/config.json
amazon-linux-extras enable docker
yum install -y amazon-ecr-credential-helper
I just thought to try updating the lambda zips, since I'm based straight on the github repo and haven't done that since the last time I ran a terraform init. So I'll give that a shot.
In the readme the following is stated:
Go to GitHub and create a new app. Beware you can create apps your organization or for a user. For now we handle only the organization level app.
But the option to create an organization level app also forces the app to be public, so it is installable by anyone.
So if I create an organization level app for running this module, what's stopping someone else from discovering my github app installation url and using my self-hosted runners?
Currently all log levels are logged. By default only info should be logged.
Because some exports are deprecated, the webhook octokit lib cannot be upgraded above 7.5.0.
It seems like we need a -y on this line
My runners hang and don't even finish booting without it.
When following the readme, using the example configuration and adjusting the Github app permissions as per #100 (comment), the scale-up lambda fails to create the EC2 instance due to ServiceLinkedRoleCreationNotPermitted.
terraform.tfvars file with Github App credentials
terraform init && terraform apply
Github app sends webhook, webhook lambda forwards it, scaleup-lambda throws error:
...
ERROR AuthFailure.ServiceLinkedRoleCreationNotPermitted: The provided credentials do not have permission to create the service-linked role for EC2 Spot Instances.
at Request.extractError (/var/task/index.js:41424:35)
at Request.callListeners (/var/task/index.js:47771:20)
at Request.emit (/var/task/index.js:47743:10)
at Request.emit (/var/task/index.js:18467:14)
at Request.transition (/var/task/index.js:17801:10)
at AcceptorStateMachine.runTo (/var/task/index.js:26145:12)
at /var/task/index.js:26157:10
at Request.<anonymous> (/var/task/index.js:17817:9)
at Request.<anonymous> (/var/task/index.js:18469:12)
at Request.callListeners (/var/task/index.js:47781:18) {
code: 'AuthFailure.ServiceLinkedRoleCreationNotPermitted',
time: 2020-07-30T15:03:24.631Z,
requestId: 'c7bab39e-b75c-4e7d-bc29-6622b3d4ddb1',
statusCode: 403,
retryable: false,
retryDelay: 68.19342592727871
}
Scale up lambda should create EC2 instance
I'm sure this is an IAM permissions issue. I am rather new to both AWS and terraform and am not sure in which of them this needs to be solved and how to go about it.
Would be great to get some pointers.
Hello, thanks for your great work here.
Can we rename the runners_maxiumum_count variable please?
The word maxiumum has a typo. The correct name is maximum.
Thanks!
Error in scale lambda invocation having to do with the private key decoding.
The configuration:
module "runners" {
source = "philips-labs/github-runner/aws"
version = "~> 0.2"
...snip...
github_app = {
key_base64 = var.github_app_key_base64
id = var.github_app_id
client_id = var.github_app_client_id
client_secret = var.github_app_client_secret
webhook_secret = random_password.random.result
}
webhook_lambda_zip = "lambdas-download/webhook.zip"
runner_binaries_syncer_lambda_zip = "lambdas-download/runner-binaries-syncer.zip"
runners_lambda_zip = "lambdas-download/runners.zip"
enable_organization_runners = true
runner_extra_labels = "default"
}
The github_app_key_base64, which I suspect is the problem, is set as follows (PKCS#1 RSAPrivateKey):
github_app_key_base64 = <<-EOT
-----BEGIN RSA PRIVATE KEY-----
<base64 encoded>
-----END RSA PRIVATE KEY-----
EOT
scale lambda fails.
scale lambda succeeds.
ERROR Error: error:0909006C:PEM routines:get_name:no start line
at Sign.sign (internal/crypto/sig.js:105:29)
at Object.sign (/var/task/index.js:12802:45)
at Object.jwsSign [as sign] (/var/task/index.js:9637:24)
at Object.module.exports.6343.module.exports [as sign] (/var/task/index.js:36570:16)
at getToken (/var/task/index.js:1861:23)
at Object.githubAppJwt (/var/task/index.js:1882:23)
at getAppAuthentication (/var/task/index.js:1509:57)
at getInstallationAuthentication (/var/task/index.js:1630:35)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:97:5) {
library: 'PEM routines',
function: 'get_name',
reason: 'no start line',
code: 'ERR_OSSL_PEM_NO_START_LINE'
}
Kudos for the really nice work on this and for sharing with the community! :)
Is this policy needed for the runners to function? It seems like it would allow the runner to have arbitrary access to SSM values.
https://github.com/philips-labs/terraform-aws-github-runner/blob/develop/modules/runners/policies-runner.tf#L28
It looks like this one policy should be enough for the runner to access its own secret values.
Github Actions checks are not executed, instances boot up and then shut down without executing the job.
I just did the normal v2 setup
The workers boot up, but are idle, until they are shut down again.
The workers pick up the jobs and execute them in a reasonable time
Not sure if this belongs here, but have you any idea what could be the reason? The workers are definitely online and it just started randomly. The only thing I did is delete workers and unregister some that did not exist anymore and were somehow not unregistered or running without stop for multiple days.
I have an offline macOs Worker for tests not to fail, my CI runs on linux. Does this pose a problem?
I managed to fix the issues I encountered regarding Resource not accessible by integration and Not Found:
Repository permissions > Actions > Read-only (regardless of whether you're an organization or not)
Also need to add into the terraform file (via #104):
resource "aws_iam_service_linked_role" "spot" {
aws_service_name = "spot.amazonaws.com"
}
(this is for the 0.5.0 version)
Hello!
Since we are using GitHub Enterprise Server, the github.com url is not available.
I want the download-lambda module to allow me to specify a url when I download a zip.
Cheers
I just followed all the instructions, successfully deployed these services to AWS and integrated them with GitHub. When I create a new job run, GitHub contacts the webhook and the webhook successfully sends the message to the scale-up lambda, but as there were no runners at the time the job was enqueued, it immediately fails. As a result, the scale-up lambda finds 0 queued jobs when querying the repository workflows, and doesn't create any runner.
I simply followed the instructions. Tried with 0.1.0 and 0.2.0
Ideally we would be able to configure a minimum number of running runners, so we guarantee that there is always an immediately available runner.
I've had a poke at the module and I am presuming this currently only supports Linux-based runners? Any plans to add Windows runner support?
Add linting and update workflow, check webhook for examples
Lambda distribution should be configurable so the repo can be used as terraform module.
Hi,
Whenever I trigger a workflow run while there are no running EC2 runner instances, the following happens:
If I trigger another workflow run while the EC2 runner is started, it gets properly picked up and executed.
Any idea what the problem is here?
Thanks!
The current solution is unable to launch instances compatible to use with GitHub's ARM64 self-hosted runner.
Developers/Teams building for ARM64 (e.g. Raspberry Pi)
Benefit: extends support to GitHub Actions pipelines that use ARM64
PR in progress with changes to runner_architecture auto-detected from instance type, support for downloading arm64 actions-runner from GitHub, and a patch to account for lack of pre-installed ICU support in .NET Core, required by the arm64 actions-runner.
Will document how to enable arm64 support as well as some gotchas I ran into (some Graviton instances aren't available in all AZs)
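To illustrate the auto-detection idea, here is one possible heuristic. It is not necessarily what the PR does: it assumes the AWS naming convention where Graviton families are a1 or carry a "g" right after the generation number (m6g, c6gn, t4g), and architectureForInstanceType is a hypothetical name.

```typescript
type RunnerArchitecture = 'x64' | 'arm64';

// Heuristic sketch (an assumption, not the module's actual logic):
// treat "a1" and families with a "g" directly after the generation
// number (e.g. m6g, c6gn, r6gd, t4g) as Graviton, everything else as x64.
function architectureForInstanceType(instanceType: string): RunnerArchitecture {
  const family = instanceType.split('.')[0];
  return /^(a1|[a-z]+\d+g[a-z]*)$/.test(family) ? 'arm64' : 'x64';
}

for (const type of ['m5.large', 'm6g.large', 't4g.micro', 'g4dn.xlarge']) {
  console.log(type, architectureForInstanceType(type));
}
```

A lookup against the EC2 DescribeInstanceTypes data would be more robust than name matching, at the cost of an extra API call.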
Not much? Might need a test case for the change to syncer lambda.
Setting a Graviton/Graviton2 instance type in example/default/main.tf and (optionally) specifying subnet AZs in example/default/vpc.tf results in a successful stack that can launch functional ARM64 self-hosted runners.
n/a
dev-usw2-scale-up Execution result: failed
Trigger via commit in configured application with requisite github app per the docs/README in this repo and https://040code.github.io/2020/05/25/scaling-selfhosted-action-runners
ERROR Error: error:0909006C:PEM routines:get_name:no start line at Sign.sign ERROR Invoke Error
(see full error trace/out below)
The commit to configured repo causes lambda function execution and requisite scaling up of or deployment of AWS EC2 spot instance.
The most recent failure/error upon commit to configured github repo with github app configured to watch
CloudWatch: CloudWatch Logs: Log groups: /aws/lambda/dev-usw2-scale-up
available in github gist here:
gist-file-aws-lambda-dev-usw2-scale-up-error
At first glance, it appears this might be related to a cert/key error?
Requesting validation and suggestions on resolution
Thank you
Hello Again!
I've got a question: how do you support spinning up different images depending on the label? My dream solution is that runners are spun up as required and are available at the organisation level. The runner that is spun up is based on the label provided, so if it's a node-12 label, for instance, then a node 12 instance is spun up based on a node 12 launch template. How does the setup support multiple labels?
Cheers
As a developer interacting with a public repository, I want to be able to have ephemeral instances so that I can safely use self-hosted runners in a public repo.
Any public repository user where github actions are used, and the default github hosted runners do not provide sufficient resources.
How can you pass the instance type you want to build? I saw that the default instance is m5.large, but there is no explanation of how we can change that.
Hello, thanks for the great project. Everything is working fine, except the scale-down lambda. It fails with SyntaxError: Unexpected token u in JSON at position 0 errors.
here is my lambda download code:
module "lambdas" {
source = "philips-labs/github-runner/aws//modules/download-lambda"
version = "0.4.0"
lambdas = [
{
name = "webhook"
tag = "v0.4.0"
},
{
name = "runners"
tag = "v0.4.0"
},
{
name = "runner-binaries-syncer"
tag = "v0.4.0"
}
]
}
as for idle config, I'm using defaults.
here are the logs from cloudwatch:
2020-08-19T15:03:24.035Z 22fd0c68-5fde-46e1-963e-422a6ae3aa00 ERROR Unhandled Promise Rejection
{
"errorType": "Runtime.UnhandledPromiseRejection",
"errorMessage": "SyntaxError: Unexpected token u in JSON at position 0",
"reason": {
"errorType": "SyntaxError",
"errorMessage": "Unexpected token u in JSON at position 0",
"stack": [
"SyntaxError: Unexpected token u in JSON at position 0",
" at JSON.parse (<anonymous>)",
" at Object.<anonymous> (/var/task/index.js:8456:39)",
" at Generator.next (<anonymous>)",
" at /var/task/index.js:8385:71",
" at new Promise (<anonymous>)",
" at module.exports.471.__awaiter (/var/task/index.js:8381:12)",
" at Object.scaleDown (/var/task/index.js:8455:12)",
" at /var/task/index.js:16564:22",
" at Generator.next (<anonymous>)",
" at /var/task/index.js:16543:71"
]
},
"promise": {},
"stack": [
"Runtime.UnhandledPromiseRejection: SyntaxError: Unexpected token u in JSON at position 0",
" at process.<anonymous> (/var/runtime/index.js:35:15)",
" at process.emit (events.js:315:20)",
" at process.EventEmitter.emit (domain.js:482:12)",
" at processPromiseRejections (internal/process/promises.js:209:33)",
" at processTicksAndRejections (internal/process/task_queues.js:98:32)"
]
}
2020-08-19T17:03:24.075+02:00 [ERROR] [1597849404074] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 403.
Scale down lambda should work as expected and terminate idle instances after timeout.
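For context on the message itself: on this Node.js version, "Unexpected token u in JSON at position 0" is what JSON.parse reports when it is handed undefined, which gets coerced to the string "undefined". A guess, not confirmed from the source, is that the lambda parses a configuration value that was never set. A defensive sketch with a hypothetical helper name:

```typescript
// Hypothetical helper: parse a JSON configuration value that may be unset.
// Falls back to a default instead of calling JSON.parse on undefined.
function parseJsonOrDefault<T>(raw: string | undefined, fallback: T): T {
  if (raw === undefined || raw.trim() === '') {
    return fallback;
  }
  return JSON.parse(raw) as T;
}

// Example shape only; the real idle configuration may look different.
interface IdleRule {
  idleCount: number;
}

const unset = parseJsonOrDefault<IdleRule[]>(undefined, []);
const provided = parseJsonOrDefault<IdleRule[]>('[{"idleCount": 1}]', []);

console.log(unset.length, provided.length);
```

In a lambda the raw value would typically come from an environment variable, so checking that the variable is actually set on the deployed function is a good first step.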
thanks for any help!
Seeing runners fail to delete. The underlying AWS instances get purged with "orphaned runner deleted" log messages, but for some reason we are getting rate limited somewhere (I think in AWS) and then the Github runners never get removed.
If we wait long enough, we have seen as many as 800 offline runners...
Here are some relevant lambda logs from the scale down lambda:
{
"errorType": "Runtime.UnhandledPromiseRejection",
"errorMessage": "RequestLimitExceeded: Request limit exceeded.",
"reason": {
"errorType": "RequestLimitExceeded",
"errorMessage": "Request limit exceeded.",
"code": "RequestLimitExceeded",
"message": "Request limit exceeded.",
"time": "2020-10-19T12:00:12.277Z",
"requestId": "eb9ecb84-5f5a-4317-974b-10371c2df8f7",
"statusCode": 503,
"retryable": true,
"stack": [
"RequestLimitExceeded: Request limit exceeded.",
" at Request.extractError (/var/task/index.js:40075:35)",
" at Request.callListeners (/var/task/index.js:46386:20)",
" at Request.emit (/var/task/index.js:46358:10)",
" at Request.emit (/var/task/index.js:17843:14)",
" at Request.transition (/var/task/index.js:17177:10)",
" at AcceptorStateMachine.runTo (/var/task/index.js:25384:12)",
" at /var/task/index.js:25396:10",
" at Request.<anonymous> (/var/task/index.js:17193:9)",
" at Request.<anonymous> (/var/task/index.js:17845:12)",
" at Request.callListeners (/var/task/index.js:46396:18)"
]
},
"promise": {},
"stack": [
"Runtime.UnhandledPromiseRejection: RequestLimitExceeded: Request limit exceeded.",
" at process.<anonymous> (/var/runtime/index.js:35:15)",
" at process.emit (events.js:315:20)",
" at process.EventEmitter.emit (domain.js:483:12)",
" at processPromiseRejections (internal/process/promises.js:209:33)",
" at processTicksAndRejections (internal/process/task_queues.js:98:32)"
]
}
[ERROR] [1603108812400] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 403.
Any ideas what could be going on here? Thanks in advance for your help!
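Since the logged error is marked "retryable": true, one option would be to retry the throttled call with exponential backoff instead of letting the rejection go unhandled. The sketch below is an illustration, not the module's code: withRetry, backoffDelayMs and the default delays are all assumptions.

```typescript
// Delay before the next attempt: base * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 10000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical wrapper: re-run an async operation while the caller-supplied
// predicate says the error is retryable, up to maxAttempts tries in total.
async function withRetry<T>(
  operation: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt + 1 >= maxAttempts || !isRetryable(error)) {
        throw error;
      }
      const delay = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}

console.log([0, 1, 2, 3].map((attempt) => backoffDelayMs(attempt)));
```

The predicate could, for example, check for the RequestLimitExceeded code shown in the log. Spreading the terminate calls out over time may also help, since the retries alone do not lower the overall request rate.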
When there are over 30 runners in a repo/organization the scale down lambda thinks new runners are orphans and will terminate them even while running a build.
minimum_running_time_in_minutes option or 5 minutes by default)
Runners get terminated while they should not be deleted.
These runners should not be terminated in this scenario.
2020-08-26T09:30:06.050Z 7d6440bb-27ba-4d17-ba26-3edc285b88c1 INFO Runner 'i-0d90f0e61ef64b847' is orphan, and will be removed.
2020-08-26T09:30:06.272Z 7d6440bb-27ba-4d17-ba26-3edc285b88c1 DEBUG Runner terminated.i-0d90f0e61ef64b847
In modules/runners/lambdas/runners/src/scale-runners/scale-down.ts in the scaleDown function the following code is used to retrieve registered runners.
const registered = enableOrgLevel
? await githubAppClient.actions.listSelfHostedRunnersForOrg({
org: repo.repoOwner,
})
: await githubAppClient.actions.listSelfHostedRunnersForRepo({
owner: repo.repoOwner,
repo: repo.repoName,
});
This API is paginated and by default returns the first 30 runners. The page size can be upped to 100 runners, but to be sure we should get all the runners.
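A minimal sketch of collecting every page. The fetchPage callback is a hypothetical stand-in for the actual API call (it would pass page and per_page through to the list endpoint and return data.runners); listAllRunners is not an existing function in the module.

```typescript
// Hypothetical helper: keep requesting pages until a short page signals
// that there is nothing left, and return all items combined.
async function listAllRunners<T>(
  fetchPage: (page: number, perPage: number) => Promise<T[]>,
  perPage = 100,
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; ; page++) {
    const batch = await fetchPage(page, perPage);
    all.push(...batch);
    if (batch.length < perPage) {
      return all;
    }
  }
}

// Fake data source with 250 runner ids, used only to exercise the helper.
const fakeRunnerIds: number[] = Array.from({ length: 250 }, (_, index) => index + 1);
const fakeFetch = async (page: number, perPage: number): Promise<number[]> =>
  fakeRunnerIds.slice((page - 1) * perPage, page * perPage);

listAllRunners(fakeFetch).then((ids) => console.log(ids.length));
```

Octokit also ships a paginate helper, which may be the simpler option here if it is available in the client version the lambda uses.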
The maximum number of runners can be set in the lambda but cannot be configured in terraform.
Seems no instructions / directions are documented for pre commit hooks.
Hello!
This looks excellent whilst we wait for GitHub to provide a supported solution. I work in a CloudFormation shop, however, so I will be converting the Terraform. What isn't super clear to me is which env variables I need to provide to the various Lambda functions, as I will be deploying via CloudFormation and the serverless framework. Could you clarify what is required on the individual lambda functions for them to work?