
Kubernetes controller for GitHub Actions self-hosted runners

License: Apache License 2.0



Actions Runner Controller (ARC)


About

Actions Runner Controller (ARC) is a Kubernetes operator that orchestrates and scales self-hosted runners for GitHub Actions.

With ARC, you can create runner scale sets that automatically scale based on the number of workflows running in your repository, organization, or enterprise. Because controlled runners can be ephemeral and based on containers, new runner instances can scale up or down rapidly and cleanly. For more information about autoscaling, see "Autoscaling with self-hosted runners."

You can set up ARC on Kubernetes using Helm, then create and run a workflow that uses runner scale sets. For more information about runner scale sets, see "Deploying runner scale sets with Actions Runner Controller."
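As a quick sketch of that flow (chart locations and value names follow GitHub's published docs at the time of writing; verify against the current documentation before use):

helm install arc \
    --namespace arc-systems \
    --create-namespace \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

helm install arc-runner-set \
    --namespace arc-runners \
    --create-namespace \
    --set githubConfigUrl="https://github.com/<your-org>" \
    --set githubConfigSecret.github_token="<PAT>" \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set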

People

Actions Runner Controller (ARC) is an open-source project currently developed and maintained in collaboration with the GitHub Actions team, external maintainers @mumoshu and @toast-gear, various contributors, and the awesome community.

If you think the project is awesome and is adding value to your business, please consider directly sponsoring community maintainers and individual contributors via GitHub Sponsors.

If you are already the employer of one of the contributors, sponsoring via GitHub Sponsors might not be an option. Just support them by other means!

See the sponsorship dashboard for former and current sponsors.

Getting Started

To give ARC a try with just a handful of commands, please refer to the Quickstart guide.

For an overview of ARC, please refer to "About ARC."

With the introduction of autoscaling runner scale sets, the existing autoscaling modes are now legacy. The legacy modes have certain use cases and will continue to be maintained by the community only.

For further information on what is supported by GitHub and what's managed by the community, please refer to this announcement discussion.

Documentation

ARC documentation is available on docs.github.com.

Legacy documentation

The following documentation is for the legacy autoscaling modes, which continue to be maintained by the community.

Contributing

We welcome contributions from the community. For more details on contributing to the project (including requirements), please refer to "Getting Started with Contributing."

Troubleshooting

We are very happy to help you with any issues you have. Please refer to the "Troubleshooting" section for common issues.


actions-runner-controller's Issues

Repository names

Hey, your pattern check on repository names doesn't allow underscores.

For example:

my_repo, which GitHub allows as a name.

secret "controller-manager" not found

Cert manager creates the webhook-server-cert secret, but not the controller-manager secret.

✗  kubectl version --short
Client Version: v1.15.10
Server Version: v1.16.13-gke.401
✗  kubectl -n actions-runner-system get secret
NAME                  TYPE                                  DATA   AGE
default-token-nw72h   kubernetes.io/service-account-token   3      13m
webhook-server-cert   kubernetes.io/tls                     3      13m
✗  kubectl -n actions-runner-system describe pod/controller-manager-c5c7cd46-wsdnc
...
Events:
  Type     Reason       Age                    From                                                  Message
  ----     ------       ----                   ----                                                  -------
  Normal   Scheduled    7m35s                  default-scheduler                                     Successfully assigned actions-runner-system/controller-manager-c5c7cd46-wsdnc to gke-ops-cdx-default-node-pool-878eff7b-615h
  Warning  FailedMount  3m18s (x2 over 5m32s)  kubelet, gke-ops-cdx-default-node-pool-878eff7b-615h  Unable to attach or mount volumes: unmounted volumes=[controller-manager], unattached volumes=[controller-manager cert default-token-nw72h]: timed out waiting for the condition
  Warning  FailedMount  83s (x11 over 7m34s)   kubelet, gke-ops-cdx-default-node-pool-878eff7b-615h  MountVolume.SetUp failed for volume "controller-manager" : secret "controller-manager" not found
  Warning  FailedMount  64s                    kubelet, gke-ops-cdx-default-node-pool-878eff7b-615h  Unable to attach or mount volumes: unmounted volumes=[controller-manager], unattached volumes=[default-token-nw72h controller-manager cert]: timed out waiting for the condition
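For context: the controller-manager secret is not created by cert-manager at all; it holds the GitHub credentials and has to be created by hand, as in the setup steps shown elsewhere on this page:

# create the secret the controller-manager Deployment mounts
kubectl create secret generic controller-manager \
    -n actions-runner-system \
    --from-literal=github_token=<GITHUB_PERSONAL_ACCESS_TOKEN>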

Unregistering of runner is broken

When I try to delete a runner, I get:

2020-03-18T18:34:44.737Z ERROR controller-runtime.controller Reconciler error {"controller": "runner", "request": "actions-runner-system/my-repo-runner", "error": "json: cannot unmarshal object into Go value of type []controllers.GitHubRunner"}

It seems like GitHub changed something in the API.

Runner update end fails

Problem

The runner update fails and the runner stops working.

Logs

Runner update in progress, do not shutdown runner.
Downloading 2.169.0 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should back online within 10 seconds.
/runner/run.sh: line 47: /runner/bin/Runner.Listener: No such file or directory

where to set --sync-period flag?

in your docs you state that one could set that flag, but I don't get where I could pass that or which interval size that is about.
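For reference, --sync-period is a flag of the controller binary, so it is passed as a container argument on the controller-manager Deployment in the actions-runner-system namespace. A minimal sketch (the container name and the exact default interval are assumptions; check your installed manifest):

spec:
  template:
    spec:
      containers:
      - name: manager
        args:
        - "--sync-period=1m"  # assumption: how often the controller polls GitHub; shorter means faster scaling but more API calls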

Runner pod always comes back in Terminating state

I deleted a runner pod to pick up IRSA changes, but the pod always comes back in Terminating state:

kubectl get pods rev-gh-actions-sample-tkv6h-tr7d4 
NAME                                READY   STATUS        RESTARTS   AGE
rev-gh-actions-sample-tkv6h-tr7d4   0/2     Terminating   0          5s

In the manager logs, I see the following error:

2020-04-16T17:21:45.963Z        ERROR   controller-runtime.controller   Reconciler error        {"controller": "runner", "request": "actions-runners/rev-gh-actions-sample-tkv6h-tr7d4", "error": "pods \"rev-gh-actions-sample-tkv6h-tr7d4\" already exists"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/zapr@…/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
        /go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
        /go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:88

Any ideas on what I'm doing wrong? Thank you!

Volume mounted twice, cause controller to fail

Not sure if it's the only issue. I've been trying to get volume + volumeMount to work to inject a secret.

Here's what I'm getting in the logs


manager 2020-04-29T14:47:57.249Z    ERROR    controllers.Runner    Failed to create pod resource    {"runner": "default/docker-images-runners-98kht-xffvd", "error": "Pod \"docker-images-runners-98kht-xffvd\" is invalid: [spec.volumes[1].name: Duplicate value: \"artifactory-secret-volume\", spec.containers[0].volumeMounts[0].name: Not found: \"work\", spec.containers[0].volumeMounts[1].name: Not found: \"docker\", spec.containers[1].volumeMounts[0].name: Not found: \"work\", spec.containers[1].volumeMounts[1].name: Not found: \"docker\"]"}

See my runnerDeployment below

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: docker-images-runners
spec:
  replicas: 2
  template:
    spec:
      repository: myorg/docker-images
      labels:
        - custom-label
      volumes:
      - name: artifactory-secret-volume
        secret:
          secretName: artifactory-secret
      volumeMounts:
      - name: artifactory-secret-volume
        readOnly: true
        mountPath: "/etc/artifactory-secret-volume"

I believe this happens because of

https://github.com/summerwind/actions-runner-controller/blob/b96979888c0bf4075360b4c6858c8c5fe8fee00d/controllers/runner_controller.go#L363

If I'm reading this properly, it's appending the same thing twice?
I haven't tried a fix yet.

edit:

My guess is that it should be this instead

pod.Spec.Volumes = append(pod.Spec.Volumes, runner.Spec.Volumes...)

`Go to the "Install App" tab on the left side of the page` is no longer correct

I get as far as generating and automatically downloading the .pem private key. At this point I have the App ID, Client ID, and Client secret, besides the .pem key download. The docs say Go to the "Install App" tab on the left side of the page and show a picture with that on the left side. I do not have that option; it appears GitHub changed things. My side menu items are:

  • General
  • Permissions & events
  • Advanced
  • Beta features
  • Public page

Is "Install App" still a thing? If so, does anyone have a URL?

RunnerDeployment isn't recognizing min and max replicas

First, thank you very much for this controller and your readme, setup was a breeze.

Initially, I set up a basic runner and everything worked well. Then I deleted it and applied this RunnerDeployment:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: oncehub-github-runner-controller
spec:
  minReplicas: 1
  maxReplicas: 12
  scaleDownDelaySecondsAfterScaleUp: 1m
  template:
    spec:
      organization: my-org

After running this and waiting a few minutes, I saw no pods and no runners.

So I modified the file to this:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: oncehub-github-runner-controller
spec:
  replicas: 3
  minReplicas: 1
  maxReplicas: 12
  scaleDownDelaySecondsAfterScaleUp: 1m
  template:
    spec:
      organization: my-org

This successfully set up 3 runners, and 2 are idle. But it's not scaling down to 1 runner.

What am I doing wrong?
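One likely explanation, going by the HorizontalRunnerAutoscaler example that appears later on this page: in recent ARC versions, minReplicas and maxReplicas belong on a HorizontalRunnerAutoscaler that targets the RunnerDeployment, not on the RunnerDeployment itself. A sketch under that assumption (the target name matches the manifest above; the repository name is hypothetical):

apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: oncehub-github-runner-controller-autoscaler
spec:
  scaleTargetRef:
    name: oncehub-github-runner-controller
  minReplicas: 1
  maxReplicas: 12
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - my-org/some-repo  # hypothetical: the metric needs repositories to watch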

No runners

Hi

My first time setting this up; unfortunately, it's showing that I have no runners.

kay@khan:~/checkpoint/self-hosted-runner$ kubectl get runners
No resources found in default namespace.

Setup:

kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.1/cert-manager.yaml
kubectl apply -f https://github.com/summerwind/actions-runner-controller/releases/latest/download/actions-runner-controller.yaml
kubectl create secret generic controller-manager \
    -n actions-runner-system \
    --from-literal=github_token=<redacted>
kubectl apply -f runner.yml

runner.yml

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: checkpoint-runner
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleDownDelaySecondsAfterScaleUp: 1m
  template:
    spec:
      organization: <redacted>

I did not come across any errors/problems in the setup. So what have I done wrong? Why are there no runners (even though minReplicas is set to 1)?

Ah, I forgot to actually add a kind: Runner. So now I have:

kay@khan:~/checkpoint/self-hosted-runner$ kubectl get runners
NAME                ORGANIZATION   REPOSITORY   LABELS   STATUS
checkpoint-runner   <redacted>                         Running

Can someone confirm that I need both Runner and RunnerDeployment?

I ask because I updated minReplicas to 2 like in the example, but I still only have one runner (as above).

Stuck runners

Hi,

Firstly, thank you for the wonderful project. It was much needed. Everything was working well for me before I tried moving my self-hosted runners from the dev env to the production env. Now I have a bunch of runners that I can't seem to delete:

arhue@hp-laptop:~/git$ kubectl get runners
NAME                      REPOSITORY            STATUS
ga-unittest-6glhx-74d28   tectonic/awardforce   
ga-unittest-6glhx-crp9v   tectonic/awardforce   
ga-unittest-wzj5j-l6r4t   tectonic/awardforce   
ga-unittest-wzj5j-zmjnd   tectonic/awardforce   
arhue@hp-laptop:~/git/infrastructure-helm$ kubectl delete runners --force ga-unittest-6glhx-74d28
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
runner.actions.summerwind.dev "ga-unittest-6glhx-74d28" force deleted
^C

I waited almost 10 mins but they didn't get deleted. This is what my runner deployment looks like:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: ga-unittest
  namespace: default
spec:
  replicas: 2
  template:
    spec:
      repository: tectonic/awardforce
      ImagePullPolicy: Always
      image: xxx/infra:actions-runner
      resources:
        limits:
          cpu: "4.0"
          memory: "8Gi"
        requests:
          cpu: "2.0"
          memory: "4Gi"
      sidecarContainers:
        - name: mysql
          image: mysql:5.7
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: abcd1234
          securityContext:
            runAsUser: 0

No pods come up. There seems to be an error in runners:

Events:
  Type     Reason                         Age                 From               Message
  ----     ------                         ----                ----               -------
  Warning  FailedUpdateRegistrationToken  25m (x21 over 64m)  runner-controller  Updating registration token failed

I tried adding new tokens and app credentials, but that didn't change anything. I would like some help on how I can delete the runners created earlier and how I can spin up new ones. Config is managed with helmfile.

Runner container does not restart even if the job ends normally

Hi! Thank you as always for your Kubernetes operator!!

I have a question.
Even if the Actions job finishes successfully, certain pods get stuck without restarting the runner container...

RunnerDeployment's manifest:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: runner
  namespace: runner
spec:
  replicas: 3
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/instance-type
                operator: In
                values:
                - c4.2xlarge
                - c5.2xlarge
      envFrom:
      - configMapRef:
          name: ro-actions-runner
      - secretRef:
          name: ro-actions-runner
      image: <aws-id>.dkr.ecr.ap-northeast-1.amazonaws.com/custom-runner:1.15
      labels:
      - ro
      repository: <repository>

runner's status:

๐Ÿ˜€ โฏโฏโฏ k get pod -n runner
NAME                              READY   STATUS    RESTARTS   AGE
ro-runner-52zmh-4tcrt   2/2     Running   0          47m
ro-runner-52zmh-vjb2f   2/2     Running   0          136m
ro-runner-52zmh-wxt6k   1/2     Running   0          124m <<< this runner pod

๐Ÿ˜€ โฏโฏโฏ k describe pod -n runner ro-runner-52zmh-wxt6k
Name:           ro-runner-52zmh-wxt6k
Namespace:      runner
Priority:       0
Node:           ip-10-0-56-65.ap-northeast-1.compute.internal/10.0.56.65
Start Time:     Thu, 30 Jul 2020 16:44:09 +0900
Labels:         runner-template-hash=847bc54779
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Running
IP:             10.0.59.171
Controlled By:  Runner/ro-runner-52zmh-wxt6k
Containers:
  runner:
    Container ID:   docker://19080ec1adc8822bfaf916c43797e7171a1c5ff5b1666b0df2485cd9c19db976
    Image:          <aws-id>.dkr.ecr.ap-northeast-1.amazonaws.com/custom-runner:1.15
    Image ID:       docker-pullable://<aws-id>.dkr.ecr.ap-northeast-1.amazonaws.com/custom-runner@sha256:bb88570bc0bedc6c8b8321887e9b6e693c0fb6542aba83e3a839949338f99b73
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed <<< normally finish
      Exit Code:    0 <<< exit status 0
      Started:      Thu, 30 Jul 2020 16:44:10 +0900
      Finished:     Thu, 30 Jul 2020 17:05:52 +0900
    Ready:          False

I'm thinking that the runner container should be restarted when the Actions job finishes running normally, via the following code.
Is my understanding correct?

https://github.com/summerwind/actions-runner-controller/blob/ba8f61141b30268a00387795a66abdd72b60c78c/controllers/runner_controller.go#L194-L196

runner's status

๐Ÿ˜ฑ โฏโฏโฏ k get pod -n runner ro-runner-52zmh-wxt6k -o json | jq -r ".status.containerStatuses[].state"
{
  "running": {
    "startedAt": "2020-07-30T07:44:10Z"
  }
}
{
  "terminated": {
    "containerID": "docker://19080ec1adc8822bfaf916c43797e7171a1c5ff5b1666b0df2485cd9c19db976",
    "exitCode": 0,
    "finishedAt": "2020-07-30T08:05:52Z",
    "reason": "Completed",
    "startedAt": "2020-07-30T07:44:10Z"
  }
}

Thank you!

[question]: Support of pod spec fields

Hi! Thanks for this cool tool. I have a few questions: do you have a plan to add pod annotations and a serviceAccount to runners?

spec:
  replicas: 3
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: role-arn

As use cases, I'd like to build Docker images and push them to ECR, or have the ability to access the k8s cluster API.

If not, I could work on this. Thanks!

Associate runners with multiple repos

I wonder if there is any way that we can associate Actions runners with multiple repos.

(This would also allow us to reduce the committed resource usage.)

Deployment does not always restart pod after completion of a job.

Occasionally I will have a deployment that ends up in this state below:

[bagel@fort-kickass  ]$ kubectl get pods
NAME                                                READY   STATUS    RESTARTS   AGE
controller-manager-c5c7cd46-qbgrp                   2/2     Running   0          27h
helm-github-actions-runner-deployment-bc99p-8n6jt   1/2     Running   0          17m
tiller-deploy-7b56f5db68-5brkz                      1/1     Running   0          5d1

The output of the runner pod says that the job has completed and exited with code 0. It's at this point that I would expect the pod to be restarted, but it obviously is not. This results in a never-ending queued job.

I can delete this pod and another will be spawned to pick up the queued jobs.

Autoscaling is lacking Actions permission by default

The problem

The "Create GitHub Apps on your account/organization" link in the readme doesn't enable "Actions" permission by default.

https://github.com/settings/apps/new?url=http://github.com/summerwind/actions-runner-controller&webhook_active=false&public=false&administration=write

This results in "403 Resource not accessible by integration" once the autoscaling is enabled because the autoscaler tries to reach out to GitHub Actions API (https://api.github.com/repos/rotten-tech/fs-api/actions/runs).

2020-08-21T15:43:16.819Z  ERROR  controllers.HorizontalRunnerAutoscaler  Could not compute replicas

{
  "horizontalrunnerautoscaler": "actions-runner-system/actions-runner-actions-runner-deployment-autoscaler",
  "error": "GET https://api.github.com/repos/rotten-tech/fs-api/actions/runs: 403 Resource not accessible by integration []"
}

github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@…/zapr.go:128
github.com/summerwind/actions-runner-controller/controllers.(*HorizontalRunnerAutoscalerReconciler).Reconcile
	/workspace/controllers/horizontalrunnerautoscaler_controller.go:85
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:88

Possible solutions

  1. Enable "Actions" repository permission by default in the "Create GitHub Apps for your account/organization" link.
  2. Add a section to the readme that describes the need to add "Actions" permission if autoscaling is enabled.

I would be happy to provide a pull request with a resolution of the described problem.
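For illustration, option 1 would mean extending the app-creation URL's permission query parameters in the same style as the existing administration=write (the actions=read parameter is an assumption following that pattern; verify against GitHub's app-manifest docs):

https://github.com/settings/apps/new?url=http://github.com/summerwind/actions-runner-controller&webhook_active=false&public=false&administration=write&actions=read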


P.S.: Thank you for your work on actions-runner-controller. You saved me a lot of money after I ran out of free minutes on the 10th day of the month, and allowed me to test my application in the VPC without needing to connect to a VPN from GitHub-hosted runners.

Old runnerReplicaSet stays alive alongside the new runnerReplicaSet when using 'kubectl edit'

Hi, I use this tool for my project! Thank you for this great tool!!

By the way, I have a question.
When I tried a rollout of the runnerDeployment by changing the image tag, the old runnerReplicaSet stayed alive alongside the new runnerReplicaSet.
Is this behavior expected?

RunnerDeployment's manifest:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: k8s-runnerDeployment
spec:
  replicas: 3
  template:
    spec:
      image: naka-gawa/runner:1.0
      repository: naka-gawa/actions-runner-controller
      env: []

before status:

$ kubectl get runnerdeployment
NAME               DESIRED   CURRENT   READY
k8s-runnerdeploy   3

$ kubectl get runnerreplicaset
NAME                     DESIRED   CURRENT   READY
k8s-runnerdeploy-vqfbd   3         3         3

$ kubectl get runner
NAME                           REPOSITORY                            STATUS
k8s-runnerdeploy-vqfbd-5gqjw   naka-gawa/actions-runner-controller   Running
k8s-runnerdeploy-vqfbd-6jmzl   naka-gawa/actions-runner-controller   Running
k8s-runnerdeploy-vqfbd-knr9g   naka-gawa/actions-runner-controller   Running

$ kubectl get pod
NAME                           READY   STATUS    RESTARTS   AGE
k8s-runnerdeploy-vqfbd-5gqjw   2/2     Running   0          5m23s
k8s-runnerdeploy-vqfbd-6jmzl   2/2     Running   0          5m22s
k8s-runnerdeploy-vqfbd-knr9g   2/2     Running   0          5m22s

step to reproduce:

$ kubectl edit runnerDeployment k8s-runner
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: k8s-runnerDeployment
spec:
  replicas: 3
  template:
    spec:
      envFrom:
      - configMapRef:
          name: env
      image: naka-gawa/runner:1.0 -> 1.1 # Changed!!
      repository: naka-gawa/actions-runner-controller

after status:

$ kubectl get runnerdeployment
NAME               DESIRED   CURRENT   READY
k8s-runnerdeploy   3

$ kubectl get runnerreplicaset
NAME                     DESIRED   CURRENT   READY
k8s-runnerdeploy-gdwtf   3         3         3 <<< new
k8s-runnerdeploy-vqfbd   3         3         3 <<< old

$ kubectl get runner
NAME                           REPOSITORY                            STATUS
k8s-runnerdeploy-gdwtf-5zxvw   naka-gawa/actions-runner-controller   Running
k8s-runnerdeploy-gdwtf-jwcpq   naka-gawa/actions-runner-controller   Running
k8s-runnerdeploy-gdwtf-zklk7   naka-gawa/actions-runner-controller   Running
k8s-runnerdeploy-vqfbd-5gqjw   naka-gawa/actions-runner-controller   Running
k8s-runnerdeploy-vqfbd-6jmzl   naka-gawa/actions-runner-controller   Running
k8s-runnerdeploy-vqfbd-knr9g   naka-gawa/actions-runner-controller   Running

$ kubectl get pod
NAME                           READY   STATUS    RESTARTS   AGE
k8s-runnerdeploy-gdwtf-5zxvw   2/2     Running   0          118s
k8s-runnerdeploy-gdwtf-jwcpq   2/2     Running   0          118s
k8s-runnerdeploy-gdwtf-zklk7   2/2     Running   0          118s
k8s-runnerdeploy-vqfbd-5gqjw   2/2     Running   0          9m24s
k8s-runnerdeploy-vqfbd-6jmzl   2/2     Running   0          9m23s
k8s-runnerdeploy-vqfbd-knr9g   2/2     Running   0          9m23s

Environment:

EKS 1.15

Autoscaling for private repositories

It would be great if we could use the HorizontalRunnerAutoscaler for Private repositories.

I created an organization-level RunnerDeployment and a HorizontalRunnerAutoscaler which targets it, and I get the usual 404 for unauthenticated requests:

  Normal  RunnerAutoscalingFailure  2m44s (x78 over 6h22m)  horizontalrunnerautoscaler-controller  GET https://api.github.com/repos/MYORG/MYREPO/actions/runs: 404 Not Found []

Wondering if this is something that can be added, or if I'm missing a way to authenticate?

Runners become offline

Hi, I could finally get actions runner working last night. My setup involves:

  • Kubernetes v1.16
  • Built with eksctl 0.22.0
  • RunnerDeployment (with 3 replicas)
  • Organization-wide
  • Custom runner image
  • nodeSelector
  • Github Personal Access Token

However, all of a sudden it has stopped working. All the runners appear offline in the organization's Actions view (as shown below).

Screen Shot 2020-06-25 at 11 45 40

As these runners seemed to stop working roughly after 12 hours after it first started working, is it more of a token expiration issue?

What should I do to avoid this issue?

Thanks

Getting started - runner connection refused

Hello,
I really appreciate all the <3 that is going into auto-scaling self hosted runners.
I am closely following the readme and running into:

➜  summerwind kubectl apply -f runner.yaml
Error from server (InternalError): error when creating "runner.yaml": Internal error occurred: failed calling webhook "mutate.runner.actions.summerwind.dev": Post https://webhook-service.actions-runner-system.svc:443/mutate-actions-summerwind-dev-v1alpha1-runner?timeout=30s: dial tcp 10.105.1.87:443: connect: connection refused
  • I'm using k8s for Mac locally
  • I've tried standing up the cert-manager stuff using both kubectl and helm

Any thoughts would really be appreciated! thank you

Docker caching

Hi! I just installed this and am happy I can delete the dedicated VPS for my runner. I am adapting my workflows, but I am having problems with Docker builds, because it's not caching image layers. I exec'ed into the runner and saw that after each build docker images returns nothing.

What can I do to prevent each build from starting from scratch?

Thanks!

Support for bind-mounting docker socket

At the moment, runner pods always consist of a runner container and a docker container (based on dind).

Is it something to consider a configuration setting that would allow the runner to use the bind-mount instead of dind?

While I understand the reason for using dind, it's not always ideal in terms of performance (layer caching, etc) and for some cases you might want to consider mounting /var/run/docker.sock directly from the underlying platform.

The default setting could be kept as-is (dind), but setting it to mount the native socket could be added as an extra possibility.

Does it make sense to have such a setting?
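A minimal sketch of what such a setting might look like on a RunnerDeployment, assuming a flag to disable the dind sidecar plus a hostPath volume (dockerEnabled is an assumption — check whether your ARC version's CRD has an equivalent field):

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: socket-runner
spec:
  replicas: 1
  template:
    spec:
      repository: example/repo
      dockerEnabled: false        # assumption: would skip the dind sidecar
      volumeMounts:
      - name: docker-sock
        mountPath: /var/run/docker.sock
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock  # the node's own Docker socket
          type: Socket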

Helm chart contribution

Hey there!

I wrote a helm 3 chart that I'm currently using to deploy this. I'm willing to contribute it to this repo if you're interested. I just wanted to gauge your interest before working on a PR.

Effect by depreciation of "installation" and "installation_repositories" events

Thanks a lot for starting and maintaining this project.

It's widely used in my organization and makes GitHub Actions runner deployment very easy.

I just received a notification from GitHub regarding the replacement of the "installation" and "installation_repositories" events:
https://developer.github.com/changes/2020-04-15-replacing-the-installation-and-installation-repositories-events/

I just wonder whether this change will break the controller.

Allow using an in-memory emptyDir for the "/runner/_work" mount point :)

Hi everyone, I'm wondering if it's possible to change the default workdir for runners.
I'm planning to use an in-memory emptyDir for my workdir.

For now, I'm not able to change the default work mount point; the controller is not happy :)

I'm trying something like this:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: actions-runner-deployment
  namespace: actions-runner-system
spec:
  replicas: 20
  template:
    metadata:
      labels:
        name: actions-runner
        group: actions
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: name
                operator: In
                values:
                - actions-runner
              - key: group
                operator: In
                values:
                - actions
            topologyKey: "kubernetes.io/hostname"
      repository: go-aos/aos-app
      volumeMounts:
      - mountPath: /runner/_work
        name: work
      volumes:
      - name: work
        emptyDir:
          medium: "Memory"
      env:
      - name: TZ
        value: Europe/Paris
      resources:
        limits:
          cpu: "4.0"
          memory: "8Gi"
        requests:
          cpu: "2.0"
          memory: "4Gi"

Controller Logs :

2020-09-30T07:40:54.199540148Z 2020-09-30T07:40:54.199Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "runner", "request": "actions-runner-system/actions-runner-deployment-rz7vd-b22bl", "error": "Pod \"actions-runner-deployment-rz7vd-b22bl\" is invalid: spec.containers[0].volumeMounts[2].mountPath: Invalid value: \"/runner/_work\": must be unique"}
2020-09-30T07:40:54.199543734Z github.com/go-logr/zapr.(*zapLogger).Error
2020-09-30T07:40:54.199545846Z 	/go/pkg/mod/github.com/go-logr/zapr@…/zapr.go:128
2020-09-30T07:40:54.199547929Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
2020-09-30T07:40:54.199550890Z 	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:258
2020-09-30T07:40:54.199553284Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
2020-09-30T07:40:54.199555429Z 	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:232
2020-09-30T07:40:54.199557534Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
2020-09-30T07:40:54.199559687Z 	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:211
2020-09-30T07:40:54.199561885Z k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
2020-09-30T07:40:54.199563861Z 	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:152
2020-09-30T07:40:54.199565994Z k8s.io/apimachinery/pkg/util/wait.JitterUntil
2020-09-30T07:40:54.199568063Z 	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:153
2020-09-30T07:40:54.199570154Z k8s.io/apimachinery/pkg/util/wait.Until
2020-09-30T07:40:54.199572131Z 	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:88

Have a nice day !

Failed calling webhook...context deadline exceeded

Hi,

I have been trying to get this working for some time now, following the guide in the README, but it always fails at the last step.
I have tried both approaches to authentication: GitHub App as well as PAT.

I tried installing Runners as well as RunnerDeployments, but the step fails with similar errors.

Here are the steps that I followed -

# Install cert-manager
kubectl create namespace cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace actions-runner-system \
  --version v0.15.1 \
  --set installCRDs=true

# Create necessary resources in actions-runner-system namespace
kubectl apply -f https://github.com/summerwind/actions-runner-controller/releases/latest/download/actions-runner-controller.yaml

# Create secret with github PAT
kubectl create secret generic controller-manager \
    -n actions-runner-system \
    --from-literal=github_token=<GH_PAT>

# Copied the Runner.yml file from the example and changed the repo name
kubectl apply -f Runner.yaml

# Error for Runner
Error from server (InternalError): error when creating "Runner.yaml": Internal error occurred: failed calling webhook "mutate.runner.actions.summerwind.dev": Post https://webhook-service.actions-runner-system.svc:443/mutate-actions-summerwind-dev-v1alpha1-runner?timeout=30s: context deadline exceeded

# Error for RunnerDeployment
Error from server (InternalError): error when creating "RunnerDeployment.yaml": Internal error occurred: failed calling webhook "mutate.runnerdeployment.actions.summerwind.dev": Post https://webhook-service.actions-runner-system.svc:443/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment?timeout=30s: context deadline exceeded

If it is relevant, I am on Google Kubernetes Engine and we run nginx-ingress for ingress management instead of the default LoadBalancer. Do I need to add some firewall rule or ingress rule?
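If this is a GKE private cluster, one plausible cause is the control plane being unable to reach the webhook service's port on the nodes. A hedged sketch of the firewall rule (9443 is the controller-runtime default webhook port, and the network/CIDR placeholders are hypothetical — verify both against your cluster):

gcloud compute firewall-rules create allow-master-to-arc-webhook \
    --network <CLUSTER_NETWORK> \
    --source-ranges <MASTER_IPV4_CIDR> \
    --allow tcp:9443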

Question: auto scaling clarification, how to achieve quick scaling for org runners?

I've managed to get the solution up with an organisational runner, with workflows successfully running on it; I'm very impressed with the solution so far. I am now trying to get the autoscaling aspects working and am struggling a bit. Is my understanding correct?

The docs state: In the below example, actions-runner checks for pending workflow runs for each sync period, and scales to e.g. 3 if there are 3 pending jobs at sync time. As a result, if I want the number of runners to scale quickly, I need to set this period as low as possible? Is 1 minute the shortest sync period possible? It seems to me that a quick scaling convergence time would be the default experience most people would want, so I'm sure I am misunderstanding something.

My deployment is just the default stack deployed with the removal of the namespace creation (would love a Helm chart for all this as we are a Helm shop). My runner yaml is:

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: serverless-deployments-runner
spec:
  organization: my-org-name
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: serverless-deployments-deployment
spec:
  template:
    spec:
      organization: my-org-name
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: serverless-deployments-autoscaler
spec:
  scaleTargetRef:
    name: serverless-deployments-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - summerwind/actions-runner-controller

Make RunnerSpec fields more generic and flexible

So that it looks more like a Pod spec, which might give a better UX for standard Kubernetes users who are familiar with the Pod spec.

Extracted from #14 (comment)

Context:

TODOs:

  1. The runner container name should be configurable. Add spec.runnerContainerName?
  2. The runner pod containers should be merged into the default spec in a strategic-merge-like manner. See https://github.com/summerwind/actions-runner-controller/pull/14/files/8abfd207357f2825eff61cf38a637a14e2515ccc#diff-bfdff11557d0a2c88e52c5c7d0d94777

Add unzip/git to runner dependencies

I'm not sure if you're willing to consider this, but for some use cases it would be very helpful to have some additional dependencies installed in the runner. Although it is easy to get around by building your own runner image with the additional dependencies, maybe the following ones could have added value for the official runner image.

unzip

Some tool installation packages use .zip instead of .tgz, which will fail on this runner

git > 2.18

The official checkout action actions/checkout gives the following message when checking out a repository:

The repository will be downloaded using the GitHub REST API
To create a local Git repository instead, add Git 2.18 or higher to the PATH

If you want to work with a local git repo (analyze, commit, etc.) then a higher git version is needed. At the moment, the base ubuntu:18.04 image contains git 2.17.1, so in my custom image I install it via PPA (I'm not sure if this is desirable or if there's a viable alternative).

Again, I'm not sure this should be in the official image; it's also perfectly possible to do without it and use a custom runner image.

Could not run RunnerDeployment with organization setting

Hi, I just installed this for my organization and I already saw that the runners are running:

› kubectl get runner
NAME                        ORGANIZATION   REPOSITORY   LABELS   STATUS
github-runner-dn442-25m4h   lifepal                              Running
github-runner-dn442-btkzd   lifepal                              Running
github-runner-dn442-pxgh6   lifepal                              Running
github-runner-dn442-xzd2k   lifepal                              Running

And the runners are registered within the organization as well as shown below.

Screen Shot 2020-06-21 at 10 43 17

And this is how I installed the runners:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: github-runner
spec:
  replicas: 4
  template:
    spec:
      organization: <my org>

However, the actions could not run. This is the error I got from the action log:

Screen Shot 2020-06-21 at 10 39 13

It looks like I must have missed a setting somewhere?

Prevent auto-update of runner version

Whenever there is an update released by GitHub, it breaks the runners unless you update the binary to the latest version in Docker. I'm making this issue to explore whether the auto-update can be stopped, or whether it is possible to update the runner automatically.

I'm currently using a script which rebuilds the runner Docker container with a binary from curl -sL https://api.github.com/repos/:org/:repo/actions/runners/downloads -H "Authorization: token xxx" | jq -r '.[].download_url' | grep linux-x64 and restarts the pods when they stop working due to an update, which is obviously not ideal.

Controller fails to delete unregistered runner resources

Hello,

Thank you for the project. I ran into some problems when I tried to use a custom image for the runner, as my use case requires it. Here is what I added:

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: testrepoansible-runner
spec:
  image: custom_container
  repository: testrepoansible
  env: []

I did this with 2 different containers: one had a typo and failed during pulling; the other did not have a GitHub Actions runner installed at all. The idea was to check what happens if someone makes a mistake and adds a resource that does not exist.

Running kubectl delete runner testrepoansible-runner, the resource was deleted, but the runner was still there. I checked the code, and correct me if I am wrong, but if the runner does not unregister, it cannot be deleted. Here is the error from the controller:

github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@…/zapr.go:128
github.com/summerwind/actions-runner-controller/controllers.(*RunnerReconciler).Reconcile
	/workspace/controllers/runner_controller.go:100
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:88
2020-03-19T11:51:06.185Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "runner", "request": "default/example-github-actions-runner", "error": "json: cannot unmarshal object into Go value of type []controllers.GitHubRunner"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@…/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:88

To remove the resource I had to patch the runner:
kubectl patch runner example-github-actions-runner -p '{"metadata":{"finalizers":[]}}' --type=merge

Runner job interrupted due to token refresh

When runners are reconciled, a termination signal is triggered and running jobs are stopped.

Using a custom image based on the latest release (2020-07-13) of summerwind/actions-runner.


apiVersion: v1
kind: Namespace
metadata:
  name: ci-cd-test-runners
  labels:
    name: ci-cd-test-runners
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: test-runner
  namespace: ci-cd-test-runners
spec:
#  minReplicas: 1
#  maxReplicas: 12
  replicas: 3
  template:
    spec:
      repository: acme/acme
      labels:
        - test
      tolerations:
        - key: runner-taint
          operator: Exists
          effect: NoSchedule
      nodeSelector:
        cloud.google.com/gke-nodepool: ci-cd-test-runners
      ImagePullPolicy: Always
      image: gcr.io/acme/ci-cd-runner/test:latest
      resources:
        limits:
          cpu: "2.0"
          memory: "8Gi"
        requests:
          cpu: "1.5"
          memory: "5Gi"

Support ephemeral runners to minimize committed resource usage

From studying the current code base, it seems that Runner, RunnerDeployment, and RunnerReplicaSet are backed by always-on Pod resources. If this is the case, there is a concern about committed resources sitting idle within the cluster for individual projects, not to mention doubled or tripled as more concurrency is desired.

I'm open to any options on the table to reduce the amount of resources committed while being idle, not simply via ephemeral runners. Any thoughts and ideas would be appreciated!
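For what it's worth, later ARC versions grew an ephemeral option on the runner spec, where each runner unregisters after a single job and its pod is replaced. A sketch under that assumption (verify the field exists in your version's CRD):

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: ephemeral-runner
spec:
  replicas: 2
  template:
    spec:
      repository: example/repo
      ephemeral: true  # assumption: runner registers, runs one job, exits, and is recreated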

Runner scale up failed and the error is not quite clear

Today I am trying to scale up my runner fleet, and the runner container just self-restarts, like this:

Events:
  Type     Reason     Age                   From                                   Message
  ----     ------     ----                  ----                                   -------
  Normal   Scheduled  7m29s                 default-scheduler                      Successfully assigned default/meetup-android-runner-f6zfn-hvbjm to ip-10-181-3-182.ec2.internal
  Normal   Pulled     7m28s                 kubelet, ip-10-181-3-182.ec2.internal  Container image "docker:19.03.6-dind" already present on machine
  Normal   Created    7m28s                 kubelet, ip-10-181-3-182.ec2.internal  Created container docker
  Normal   Started    7m27s                 kubelet, ip-10-181-3-182.ec2.internal  Started container docker
  Warning  BackOff    2m24s                 kubelet, ip-10-181-3-182.ec2.internal  Back-off restarting failed container
  Normal   Created    2m9s (x3 over 7m28s)  kubelet, ip-10-181-3-182.ec2.internal  Created container runner
  Normal   Started    2m9s (x3 over 7m28s)  kubelet, ip-10-181-3-182.ec2.internal  Started container runner
  Normal   Pulling    2m9s (x3 over 7m28s)  kubelet, ip-10-181-3-182.ec2.internal  Pulling image "chenrui/actions-runner:v2.168.0"
  Normal   Pulled     2m9s (x3 over 7m28s)  kubelet, ip-10-181-3-182.ec2.internal  Successfully pulled image "chenrui/actions-runner:v2.168.0"

When I checked the logs, I could not find anything useful (for the runner, it is just info logs; for docker, it has something like this):

time="2020-04-15T16:30:44.882673953Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." type=io.containerd.metadata.v1
time="2020-04-15T16:30:44.882858479Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
time="2020-04-15T16:30:44.883115566Z" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "ip: can't find device 'aufs'\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n": exit status 1"
time="2020-04-15T16:30:44.883295667Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin"
...
time="2020-04-15T16:30:45.174440728Z" level=info msg="Loading containers: start."
time="2020-04-15T16:30:45.244227388Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: ip: can't find device 'bridge'\nbridge                172032  1 br_netfilter\nstp                    16384  1 bridge\nllc                    16384  2 bridge,stp\nipv6                  524288 408 nf_conntrack_ipv6,nf_nat_ipv6,bridge,[permanent]\nip: can't find device 'br_netfilter'\nbr_netfilter           24576  0 \nbridge                172032  1 br_netfilter\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n, error: exit status 1"
time="2020-04-15T16:30:45.540762014Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"

How to make the build more stable with self-hosted runner

I deployed a fleet of self-hosted runners yesterday for my team. It looks like it does the work, but the process seems a little bit flaky.

Two issues I have seen multiple times:

##[error].github/workflows/api.yml (Line: 107, Col: 16):
##[error]The template is not valid. .github/workflows/api.yml (Line: 107, Col: 16): The operation was canceled.

Or something like this (screenshot omitted).

The rerun just turned the build green.

So I am curious if there is any way that I can retry, or how to make the build process more stable. Thanks!

Runner Pod can't start up with ServiceAccount on EKS 1.15.10

I tried to deploy a runner with IAM Roles for Service Accounts (IRSA) on EKS 1.15.10, but the runner pod can't start up at all:

kubectl get pod
NAME                          READY   STATUS        RESTARTS   AGE
my-runner-lmzjc-znh8c         0/1     Terminating   0          10s

The manifest I used:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: actions-runner-system-sa
  namespace: actions-runner-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/<IAM_ROLE>
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: my-runner
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: example/repo
      serviceAccountName: actions-runner-system-sa
      automountServiceAccountToken: true
      containers:
        - name: runner
          image: example/action-runner:latest
          imagePullPolicy: Always

Looking into the events, I saw:

2s          Warning   FailedMount                pod/my-runner-zsvsd-vfdpz                Unable to mount volumes for pod "my-runner-zsvsd-vfdpz_actions-runner-system(1b78d4fb-25e6-4457-bd47-04ea06ae7d14)": timeout expired waiting for volumes to attach or mount for pod "actions-runner-system"/"my-runner-zsvsd-vfdpz". list of unmounted volumes=[aws-iam-token actions-runner-system-sa-token-8gmq9]. list of unattached volumes=[aws-iam-token actions-runner-system-sa-token-8gmq9]
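A hedged note: projected IRSA token volumes are only readable by the pod's fsGroup, so a commonly suggested fix is to set the runner pod's securityContext accordingly (the GID 1000 for the runner user is an assumption based on the default runner image; verify for custom images):

spec:
  template:
    spec:
      serviceAccountName: actions-runner-system-sa
      securityContext:
        fsGroup: 1000  # assumption: GID of the "runner" user, so it can read the projected token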

Succeeded status, but the runner cannot be queried

Today I saw a weird Succeeded status for the first time, but found the pod cannot be queried:

NAME                                       REPOSITORY                     STATUS
github-action-runner-gl5c8-zv7vh    xx/xx                  Succeeded

$ kubectl describe pod github-action-runner-gl5c8-zv7vh
Error from server (NotFound): pods "github-action-runner-gl5c8-zv7vh" not found

Some questions for setup

This looks like a super interesting project; however, I am unclear on a few things:

  1. I've got cert-manager installed, but what do I actually need to do? I assume I need to install the webhook server cert, as my pod is failing to launch because:

MountVolume.SetUp failed for volume "cert" : secret "webhook-server-cert" not found

  2. There isn't anything documented about how you expect the pod to be exposed to github.com. What should I be doing to set up the webhook endpoint required when creating the GitHub app?

  3. I'm surprised the app's client secret and client ID aren't needed?

  4. Any plans to support a webhook secret to further secure the webhook?

Runner failed to start with custom Pod spec

With version 0.4.0 (PR #7), the Runner fails to start with a custom Pod spec.

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: my-runner
  annotations:
    iam.amazonaws.com/role: my-role
spec:
  replicas: 1
  template:
    spec:
      repository: example/repo
      containers:
        - name: runner
          image: example/custom-action-runner:latest
          imagePullPolicy: Always

Error log

kubectl logs muy-runner-8gcph-v7plj -f
RUNNER_NAME must be set

A fix PR will follow soon.

Pulling an action from a private docker registry

Hello,

If I have a workflow that runs a docker image, what is the suggested way to pass the registry credentials to the dind container?

Trying to use a volume and volumeMount only mounts into the runner container, not the docker container.

For example, the following runner spec:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: self-hosted
  namespace: actions-runners
spec:
  replicas: 2
  template:
    spec:
      organization: my-private-org
      labels:
        - self-hosted
      resources:
        limits:
          cpu: 500m
          memory: 512Mi
        requests:
          cpu: 500m
          memory: 512Mi
      volumeMounts:
        - name: regcred
          mountPath: /root/.docker
          readOnly: true
      volumes:
        - name: regcred
          secret:
            secretName: regcred

with the following workflow

name: Test
on:
  - push

jobs:
  test:
    name: hello world
    runs-on: self-hosted
    steps:
      - name: Update test.yaml
        uses: docker://my-private-runner.azurecr.io/actions/gitops-update-yaml:latest

results in the following github action log

2020-06-01T07:56:12.2087927Z ##[section]Starting: Request a runner to run this job
2020-06-01T07:56:12.5058787Z Can't find any online and idle self-hosted runner in current repository that matches the required labels: 'self-hosted'
2020-06-01T07:56:12.5058855Z Found online and idle self-hosted runner in current repository's account/organization that matches the required labels: 'self-hosted'
2020-06-01T07:56:12.8158757Z ##[section]Finishing: Request a runner to run this job
2020-06-01T07:56:19.7434885Z Current runner version: '2.263.0'
2020-06-01T07:56:19.7438707Z Prepare workflow directory
2020-06-01T07:56:19.7753607Z Prepare all required actions
2020-06-01T07:56:19.7917676Z Pull down action image 'my-private-registry.azurecr.io/actions/gitops-update-yaml:latest'
2020-06-01T07:56:19.7955861Z ##[command]/usr/local/bin/docker pull my-private-registry.azurecr.io/actions/gitops-update-yaml:latest
2020-06-01T07:56:20.0672735Z Error response from daemon: Get https://my-private-registry.azurecr.io/v2/actions/gitops-update-yaml/manifests/latest: unauthorized: authentication required
2020-06-01T07:56:20.0717040Z ##[warning]Docker pull failed with exit code 1, back off 7.014 seconds before retry.
2020-06-01T07:56:27.0864577Z ##[command]/usr/local/bin/docker pull my-private-registry.azurecr.io/actions/gitops-update-yaml:latest
2020-06-01T07:56:27.2657611Z Error response from daemon: Get https://my-private-registry.azurecr.io/v2/actions/gitops-update-yaml/manifests/latest: unauthorized: authentication required
2020-06-01T07:56:27.2662491Z ##[warning]Docker pull failed with exit code 1, back off 9.645 seconds before retry.
2020-06-01T07:56:36.9146163Z ##[command]/usr/local/bin/docker pull my-private-registry.azurecr.io/actions/gitops-update-yaml:latest
2020-06-01T07:56:37.0280507Z Error response from daemon: Get https://my-private-registry.azurecr.io/v2/actions/gitops-update-yaml/manifests/latest: unauthorized: authentication required
2020-06-01T07:56:37.0315630Z ##[error]Docker pull failed with exit code 1
2020-06-01T07:56:37.0345781Z Cleaning up orphan processes
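Since the pull is executed by the docker daemon in the dind sidecar, the credentials would have to be visible to that container rather than the runner container. Later ARC versions expose a dockerVolumeMounts field for exactly this; a sketch under that assumption (verify the field exists in your version's CRD):

spec:
  template:
    spec:
      dockerVolumeMounts:        # assumption: mounts applied to the dind container
      - name: regcred
        mountPath: /root/.docker
        readOnly: true
      volumes:
      - name: regcred
        secret:
          secretName: regcred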

Make controller also look at queued jobs.

A running workflow may have multiple queued jobs. The controller could add more runners based on this.

As an example: I have a RunnerDeployment with autoscaling enabled between 3 and 20 instances. One of the workflows I run has 6 parallel jobs, of which 3 will always be queued.

I can try to make a PR with this over the next few days if you are open to it.

Autoscaling support

It would be great if pods could be adjusted automatically based on the number of pending workflows. I'm not sure if that's possible using the current version of the GitHub Actions API, or whether we should use a metrics server and try to keep the number of idle runners below X, but there should be a way to implement this feature. 🙂

Issue with running `docker run` for integration testing

I have run into an issue running `docker run xx-image` after moving part of the workflow onto a self-hosted runner. I could not figure it out, but felt it might be helpful to post it here and discuss.

The flow looks like this:

docker run -it --rm \
	-v $(PWD):/local \
	xx-image:latest \
	/local/e2e.sh

The error log that I keep getting:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"/local/e2e.sh\": stat /local/e2e.sh: no such file or directory": unknown.

While it worked before, I suspect it might be related to the dind Docker image used for the runner. Does the dind Docker engine affect the volume mount?

Please shed some light on how to fix the issue and let me know if this makes sense. Thanks!
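
The usual explanation in the dind setup: the -v bind mount is resolved by the Docker daemon, which here runs in the dind sidecar, so the host path must exist on the sidecar's filesystem, not just inside the runner container. ARC's runner pods share the work volume (mounted at /runner/_work) between the two containers, so keeping the bind-mounted path under the workspace makes it visible on both sides. (As written, $(PWD) also only expands as a variable in a Makefile; in a plain shell it would be $PWD or $(pwd).) A sketch of the workflow step under those assumptions; -it is dropped because workflow steps have no TTY:

jobs:
  e2e:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v2
      - name: Run e2e in container
        # GITHUB_WORKSPACE lives under /runner/_work, which is shared
        # with the dind sidecar, so the daemon can resolve the mount
        run: |
          docker run --rm \
            -v "$GITHUB_WORKSPACE:/local" \
            xx-image:latest \
            /local/e2e.sh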

Self-hosted runner stuck on queued

I have a self-hosted runner and it's using a custom image. This has been deployed, and I can see in the pod logs that it's listening for jobs. I can see in my organisation that there is an idle runner. But when I run my pipeline, it is stuck.

Starting your workflow run...

runner.yml

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: checkpoint-runner
spec:
  organization: org
  image: <aws_id>.dkr.ecr.us-east-2.amazonaws.com/self-hosted-runner:master

Runner status:

kay@khan:~/checkpoint/self-hosted-runner$ kubectl get runners
NAME                ORGANIZATION   REPOSITORY   LABELS   STATUS
checkpoint-runner   org                         Running

Pod logs:

> 
> --------------------------------------------------------------------------------
> |        ____ _ _   _   _       _          _        _   _                      |
> |       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
> |      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
> |      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
> |       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
> |                                                                              |
> |                       Self-hosted runner registration                        |
> |                                                                              |
> --------------------------------------------------------------------------------
> # Authentication
> √ Connected to GitHub
> # Runner Registration
> A runner exists with the same name
> √ Successfully replaced the runner
> √ Runner connection is good
> # Runner settings
> √ Settings Saved.
> √ Connected to GitHub
> 2020-07-06 16:23:10Z: Listening for Jobs

pipeline.yml

name: test-pipeline

on: [ push ]

jobs:
  build:
    runs-on: self-hosted
    steps:
    - uses: actions/checkout@v2
    - name: Run a multi-line script
      run: |
        echo Hello from self-hosted
        ls
        mysql --version

You can see the custom Docker image I am using below; it just contains the AWS and MySQL CLIs.

dockerfile.yml


FROM summerwind/actions-runner:v2.169.1

RUN sudo apt-get update

RUN sudo curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    sudo unzip awscliv2.zip && \
    sudo ./aws/install && \
    aws --version

RUN sudo apt-get -y install mysql-client && \
    mysql --version

It only became stuck like this after I added the custom image.
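
One thing worth ruling out is the runner version baked into the custom image: the base tag above is v2.169.1, while logs from this period show GitHub expecting 2.26x, and runners far behind the minimum supported version may simply never be offered jobs. Rebuilding from a current summerwind/actions-runner tag and re-pointing the Runner at the new image is a cheap test; a sketch, with the tag and registry as placeholders:

apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: checkpoint-runner
spec:
  organization: org
  # rebuilt FROM a current summerwind/actions-runner base tag
  image: <aws_id>.dkr.ecr.us-east-2.amazonaws.com/self-hosted-runner:rebuilt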

spec.replicas in body must be of type integer: "null"

I'm trying to create an autoscaling RunnerDeployment, but Kubernetes won't let me unless I set the replicas value. I've barely modified the example in the README.

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: github-actions-runner
spec:
  minReplicas: 1
  maxReplicas: 5
  template:
    spec:
      organization: ORGNAME

When I run kubectl apply -f filename.yml, I get this error:

The RunnerDeployment "github-actions-runner" is invalid: spec.replicas: Invalid value: "null": spec.replicas in body must be of type integer: "null"

replicas seems to be defined as optional in runnerdeployment_types.go, so why is this happening?
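
If the CRDs installed in the cluster are older than the controller release (the validation lives in the CRD, not the Go types), re-applying the CRDs that ship with your release is the first thing to try. As a stopgap, giving replicas an explicit integer also passes validation; a sketch, noting that a fixed replicas value may override the autoscaling bounds, so treat it as a diagnostic step:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: github-actions-runner
spec:
  replicas: 1      # explicit integer satisfies the stale CRD validation
  minReplicas: 1
  maxReplicas: 5
  template:
    spec:
      organization: ORGNAME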

Failing to update registration token with GitHub organization App

I'm currently using option 2, the "personal access token" way of authenticating. Everything works that way.

However, when I try to do a fresh install using a GitHub App created by an organization, I can't seem to get it to work at all. I'm not sure where the error is likely to be. I suspected permissions, but those all seem to be OK: I have read & write for Actions, Checks, and Workflows in addition to the mandatory ones (organization self-hosted runners is also read & write). Maybe the issue is still there?

Logs from the controller manager:

2020-08-13T16:10:07.510Z	DEBUG	controller-runtime.manager.events	Warning	{"object": {"kind":"Runner","namespace":"actions-runner-system","name":"recsyslabs-runner-deployment-z9n69-49mlq","uid":"19725e67-d746-434e-ad32-a4d922e2968e","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"27540708"}, "reason": "FailedUpdateRegistrationToken", "message": "Updating registration token failed"}
2020-08-13T16:10:07.510Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "runner", "request": "actions-runner-system/recsyslabs-runner-deployment-z9n69-49mlq", "error": "failed to create registration token: Post https://api.github.com/orgs/recsyslabs/actions/runners/registration-token: could not refresh installation id 8388012's token: request &{Method:POST URL:https://api.github.com/app/installations/8388012/access_tokens Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Accept:[application/vnd.github.machine-man-preview+json application/vnd.github.machine-man-preview+json] Authorization:[Bearer eyJ...w] Content-Type:[application/json]] Body:{Reader:} GetBody:0x7d5c40 ContentLength:5 TransferEncoding:[] Close:false Host:api.github.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr: RequestURI: TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc000114200} received non 2xx response status &{%!q(*http.body=&{0xc0004d5280 <nil> <nil> false false {0 0} false false false <nil>}) {'\\x00' '\\x00'} %!q(bool=false) <nil> %!q(func(error) error=0x7d7500) %!q(func() error=0x7d7490)} with body &{Method:POST URL:https://api.github.com/app/installations/8388012/access_tokens Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Accept:[application/vnd.github.machine-man-preview+json application/vnd.github.machine-man-preview+json] Authorization:[Bearer eyJ...w] Content-Type:[application/json]] Body:{Reader:} GetBody:0x7d5c40 ContentLength:5 TransferEncoding:[] Close:false Host:api.github.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr: RequestURI: TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc000114200} and TLS &{Version:772 HandshakeComplete:true DidResume:false CipherSuite:4865 NegotiatedProtocol:http/1.1 NegotiatedProtocolIsMutual:true ServerName: PeerCertificates:[0xc000b0f600 0xc000b0fb80] VerifiedChains:[[0xc000b0f600 0xc000b0fb80 0xc000d7b700]] SignedCertificateTimestamps:[] OCSPResponse:[] ekm:0x7303f0 TLSUnique:[]}"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88

Some more potentially relevant info:

This is on the v0.7.2 release, using my own custom runner image which is up to date (v2.272.0), on microk8s v1.18 installed on Ubuntu 18.04 with snap.

I'm not really sure what else to try. I'll continue using the personal access token for now.
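
For anyone hitting the same wall: the failing call in the log is the App-installation token exchange itself (POST /app/installations/<id>/access_tokens), which points at the App ID, installation ID, private key, or clock skew rather than the App's permissions; in particular, check that the installation ID belongs to the installation on this organization. A sketch of the expected controller secret, using the github_app_* keys ARC documents for App authentication (all values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: controller-manager
  namespace: actions-runner-system
type: Opaque
stringData:
  github_app_id: "12345"                  # the App's numeric ID
  github_app_installation_id: "8388012"   # the installation on *this* org
  github_app_private_key: |
    -----BEGIN RSA PRIVATE KEY-----
    (the App's private key PEM)
    -----END RSA PRIVATE KEY-----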
