
build-deploy-tool's People

Contributors

bomoko, cgoodwin90, rocketeerbkw, schnitzel, seanhamlin, shreddedbacon, smlx, tobybellwood


build-deploy-tool's Issues

Space out k8up backups when there are many services to backup

If there are many database services in a single Lagoon project (this particular project has 30), k8up will spawn 30 backup pods at roughly the same time.

Here you can see large spikes of backup pods in the cluster (screenshot omitted).

Drilling into the production namespace, they all seem to start within a 12-minute period, likely just due to pod scheduling (screenshot omitted).

It would be nicer if there were a way to spread the backup pods in a given project over a longer time period, e.g. 4 hours.

This occasionally leads to some backup pods getting stuck, requiring manual intervention to remove them, which in turn leads to missing database dumps.
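One possible direction, assuming the cluster runs a k8up version that supports its randomised schedule macros (the resource name and apiVersion below are illustrative, not what Lagoon currently templates): the Schedule objects could use a random macro instead of a fixed cron expression, so backups that would otherwise all fire at the same time are spread over the period.

apiVersion: backup.appuio.ch/v1alpha1   # k8up v1 CRD; adjust for the k8up version in use
kind: Schedule
metadata:
  name: k8up-lagoon-backup-schedule     # illustrative name
spec:
  backup:
    schedule: "@daily-random"           # k8up spreads these pseudo-randomly instead of a fixed time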

possibility for mid-rollout task

Use case: some sites use the pre-rollout task to enable maintenance mode for the entire rollout.
As there can be many images (or big images) to push to the registry, the time during which maintenance mode is active can be quite long.

Having a mid-rollout task that is run as soon as the images have been pushed would help reduce the time the maintenance page is active; a sketch of what this could look like follows below. A possible place to run the mid-rollout task: https://github.com/amazeeio/lagoon/blob/0c3c5e9010c3281d4b7b589fb43477df483b9882/images/kubectl-build-deploy-dind/build-deploy-docker-compose.sh#L1079
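A hedged sketch of what the configuration could look like in .lagoon.yml, assuming it mirrored the existing pre/post-rollout task syntax (the mid-rollout key is hypothetical and does not exist today; the drush commands are just an example of a maintenance-mode toggle):

tasks:
  mid-rollout:            # hypothetical: would run right after images have been pushed
    - run:
        name: enable maintenance mode
        command: drush state:set system.maintenance_mode 1   # illustrative command
        service: cli
  post-rollout:
    - run:
        name: disable maintenance mode
        command: drush state:set system.maintenance_mode 0
        service: cli

This way maintenance mode would only be enabled once the slow image pushes are done, rather than for the whole build.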

Pre/Post rollout disabling doesn't work when defined at the project level

When disabling pre-rollout tasks at the project level, the variable is not set correctly if only the project-level variable is defined.

https://github.com/amazeeio/lagoon/blob/master/images/oc-build-deploy-dind/build-deploy-docker-compose.sh#L369-L376

The build script should check this in a similar way to how we do other overrides (e.g. https://github.com/amazeeio/lagoon/blob/master/images/oc-build-deploy-dind/build-deploy-docker-compose.sh#L74-L82); see the sketch below.

This impacts the Lagoon 1 and Lagoon 2 OpenShift and Kubernetes build-deploy images.
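A hedged sketch of the kind of check intended, assuming the usual JSON payloads of {name, value} objects in the Lagoon variable environment variables and jq lookups as used elsewhere in the build script (exact variable and flag names may differ):

# read the flag from project variables first, then let an environment-level value override it
if [ -n "$LAGOON_PROJECT_VARIABLES" ]; then
  TEMP_VALUE=$(echo "$LAGOON_PROJECT_VARIABLES" | jq -r '.[] | select(.name == "LAGOON_PREROLLOUT_DISABLED") | .value')
  [ -n "$TEMP_VALUE" ] && LAGOON_PREROLLOUT_DISABLED=$TEMP_VALUE
fi
if [ -n "$LAGOON_ENVIRONMENT_VARIABLES" ]; then
  TEMP_VALUE=$(echo "$LAGOON_ENVIRONMENT_VARIABLES" | jq -r '.[] | select(.name == "LAGOON_PREROLLOUT_DISABLED") | .value')
  [ -n "$TEMP_VALUE" ] && LAGOON_PREROLLOUT_DISABLED=$TEMP_VALUE
fi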

Unexpected LAGOON_GIT_SHA in PR environments

Describe the bug

When an environment is created from a PR, the LAGOON_GIT_SHA runtime environment variable is populated with a merge commit which only exists in the current build.

There are a couple of levels of surprising behaviour here in my view.

  1. Why is the PR branch merged during the build? If the merge target has had commits added to it since the PR branch diverged then there will be:
    a) possibly a merge conflict which will fail the build; and
    b) code from commits in the merge target added to the build which isn't in the PR branch.

  2. Assuming that merging makes sense by ignoring the problems in 1., the LAGOON_GIT_SHA should be some value which is useful to the developer. A merge commit which only exists in the current build doesn't seem very useful, although it is technically accurate.

To Reproduce

Steps to reproduce the behavior:

  1. Check the LAGOON_GIT_SHA value in a PR environment.
  2. See that it doesn't exist in the git repository.

Expected behavior

I would expect Lagoon not to merge the PR during the build.

If that is required for compatibility or other reasons then the method for injecting the PR HEAD commit into a runtime variable should be documented. That would be something like this I think:

ARG LAGOON_PR_HEAD_SHA
ENV LAGOON_PR_HEAD_SHA=$LAGOON_PR_HEAD_SHA

I had a look for this snippet in the Lagoon documentation but couldn't find it.

Screenshots

n/a

Additional context

git merge during build occurs here.

Support running project specific cronjobs in polysite projects

The .lagoon.yml file requires cronjobs to be defined under environments.$environment.cronjobs, which means cronjobs do not work in a polysite setup if added as below:

poly-project1:
  environments:
    main:
      cronjobs:

The only way to run cronjobs at the moment is:

poly-project1:
  environments:
    main:
      routes:
        - nginx:
          - project1.com
environments:
  main:
    cronjobs:

This makes the same cron run on all projects in a polysite, which is not ideal.

Failure when using `LAGOON_ROUTES_JSON`

We have a customer whose Lagoon deployment fails with:

++ build-deploy-tool template ingress
Using value from environment variable MONITORING_ALERTCONTACT
Using value from environment variable PROJECT
Using value from environment variable ENVIRONMENT
Using value from environment variable BRANCH
Using value from environment variable ENVIRONMENT_TYPE
Using value from environment variable BUILD_TYPE
Using value from environment variable ACTIVE_ENVIRONMENT
Using value from environment variable LAGOON_VERSION
Using value from environment variable DEFAULT_BACKUP_SCHEDULE
Using value from environment variable HOURLY_BACKUP_DEFAULT_RETENTION
Using value from environment variable DAILY_BACKUP_DEFAULT_RETENTION
Using value from environment variable WEEKLY_BACKUP_DEFAULT_RETENTION
Using value from environment variable MONTHLY_BACKUP_DEFAULT_RETENTION
Using value from environment variable DBAAS_OPERATOR_HTTP
Using value from environment variable LAGOON_PROJECT_VARIABLES
Using value from environment variable LAGOON_ENVIRONMENT_VARIABLES
Using value from environment variable K8UP_WEEKLY_RANDOM_FEATURE_FLAG
Using value from environment variable K8UP_WEEKLY_RANDOM_FEATURE_FLAG
Using value from environment variable HOURLY_BACKUP_DEFAULT_RETENTION
Using value from environment variable DAILY_BACKUP_DEFAULT_RETENTION
Using value from environment variable WEEKLY_BACKUP_DEFAULT_RETENTION
Using value from environment variable MONTHLY_BACKUP_DEFAULT_RETENTION
Collecting routes from environment variable LAGOON_ROUTES_JSON
Templating ingress manifest for  to /kubectl-build-deploy/lagoon/services-routes/.yaml
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1216c1c]

goroutine 1 [running]:
github.com/uselagoon/build-deploy-tool/internal/templating/ingress.GenerateIngressTemplate({{0x0, 0x0}, {0x0, 0x0}, 0x0, 0x0, 0x0, {0x0, 0x0}, {{0xc00050ae00, ...}, ...}, ...}, ...)
	/home/runner/work/build-deploy-tool/build-deploy-tool/internal/templating/ingress/templates_ingress.go:60 +0x75c
github.com/uselagoon/build-deploy-tool/cmd.IngressTemplateGeneration(0x1)
	/home/runner/work/build-deploy-tool/build-deploy-tool/cmd/template_ingress.go:62 +0xbdb
github.com/uselagoon/build-deploy-tool/cmd.glob..func10(0x20e5c20?, {0x14fd75f?, 0x0?, 0x0?})
	/home/runner/work/build-deploy-tool/build-deploy-tool/cmd/template_ingress.go:17 +0x1e
github.com/spf13/cobra.(*Command).execute(0x20e5c20, {0x212a0b0, 0x0, 0x0})
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0x20e4fa0)
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
	/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
github.com/uselagoon/build-deploy-tool/cmd.Execute()
	/home/runner/work/build-deploy-tool/build-deploy-tool/cmd/root.go:66 +0x25
main.main()
	/home/runner/work/build-deploy-tool/build-deploy-tool/main.go:21 +0x17

Their LAGOON_ROUTES_JSON is the following:

{"routes":[{"domain":"customer.com","tls-acme":false,"service":"nginx","hsts":"max-age=31536000","insecure":"Allow","monitoring-path":"/","fastly":{"service-id":"xxxxxxx","watch":true}}]}

Unfortunately I don't know Go well enough, but it seems to be connected to this line:

"kubernetes.io/tls-acme": strconv.FormatBool(*route.TLSAcme),

Backup schedules define common bucket names

Describe the bug
Schedules with the same project name on different clusters will create buckets with the same name. This is a problem: one of the baas operators will succeed in claiming the bucket name and the other will return errors like:

Last reported error: Connection to S3 endpoint not possible: 400 Bad Request

To Reproduce
Steps to reproduce the behavior:

  1. Define a schedule object with the same bucket name on clusters in different regions.
  2. See errors in the backup logs of one of them.

Expected behavior
Bucket names should be globally unique. Maybe a random suffix?
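A hedged sketch of the random-suffix idea in the build script (variable names are illustrative; the suffix would need to be generated once and then persisted so that subsequent deploys keep using the same bucket):

# generate a short random suffix and build a bucket name from it
BAAS_BUCKET_SUFFIX=$(head -c 16 /dev/urandom | md5sum | cut -c1-8)
BAAS_BUCKET_NAME="baas-${PROJECT}-${BAAS_BUCKET_SUFFIX}"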

Screenshots
n/a

Additional context
n/a

Name of the database backup file downloaded from the "Backups" section in the dashboard

The project and environment names are duplicated in the filename of the database backup file downloaded from the dashboard "Backups" section,
e.g. backup-govcms8-training-master-mariadb-prebackuppod-govcms8-training-master-mariadb-prebackuppod.mariadb.sql-2020-12-22T18_02_30Z.tar.gz

The nginx backup filename does not have this issue,
e.g. backup-govcms8-training-master-nginx-2020-12-22T18_04_04Z.tar.gz

Add core compatibility check functionality

As we ramp up work on this tool and the associated build image, there will be times when we add functionality to core in specific versions. If users are still running old versions of core while consuming the bleeding-edge version of the build image, this could cause issues.

We should build in a compatibility check for the existence/value of an internal_system-scoped variable, LAGOON_SYSTEM_CORE_VERSION, which core should/will pass through. We can use this to fail builds and alert users/Lagoon administrators that they are running an out-of-date core for the build image in use.
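A hedged sketch of such a gate in the build script (the minimum version is a placeholder, and this assumes a sort implementation with version-sort support):

REQUIRED_CORE_VERSION="2.12.0"   # placeholder minimum
if [ -n "$LAGOON_SYSTEM_CORE_VERSION" ]; then
  LOWEST=$(printf '%s\n%s\n' "$REQUIRED_CORE_VERSION" "$LAGOON_SYSTEM_CORE_VERSION" | sort -V | head -n1)
  if [ "$LOWEST" != "$REQUIRED_CORE_VERSION" ]; then
    echo "This build image requires Lagoon core >= ${REQUIRED_CORE_VERSION}, but core reports ${LAGOON_SYSTEM_CORE_VERSION}"
    exit 1
  fi
fi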

pre-deploy task doesn't wait long enough for pods to scale up

The pre-deploy task seems to wait only 30 seconds for the pods to scale up. Unfortunately, in some situations (big images, a slightly loaded cluster, etc.) it can take longer than 30 seconds and the build fails (example screenshot omitted).

Is there a simple way to extend the waiting time? I would suggest something around 5 minutes.
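A hedged sketch of what a configurable wait could look like (the variable name is hypothetical, and the current implementation may not use kubectl rollout status at all):

# wait for the service to scale up, with a configurable timeout instead of a fixed 30s
PRE_ROLLOUT_WAIT_TIMEOUT="${PRE_ROLLOUT_WAIT_TIMEOUT:-300}"   # hypothetical variable, in seconds
kubectl -n "$NAMESPACE" rollout status "deployment/${SERVICE_NAME}" --timeout="${PRE_ROLLOUT_WAIT_TIMEOUT}s"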

lagoon.type change warn or error

When this tool starts to support checking the state of the existing deployed environment, it should check the lagoon.type of deployed services and should probably fail the build, as changing the lagoon.type is currently unsupported (for most type changes anyway; nginx to nginx-persistent might be OK, and any non-persistent to persistent type change might work, but not the other way around).

Use password-stdin flag when logging in to private container registries

When using a private container registry, we should pipe the password to stdin using the --password-stdin flag to prevent the warning message below:

2020-12-04T00:42:48.306229219Z WARNING! Using --password via the CLI is insecure. Use --password-stdin.
2020-12-04T00:42:49.971050129Z Error response from daemon: Get https://registry-1.docker.io/v2/: unauthorized: incorrect username or password

We use this in oc-build-deploy-dind and kubectl-build-deploy-dind in a few places.
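A hedged example of the change (variable names are illustrative, not the ones used in the build scripts):

# pass the password on stdin instead of as a CLI argument
echo "$PRIVATE_REGISTRY_PASSWORD" | docker login "$PRIVATE_REGISTRY_URL" \
  --username "$PRIVATE_REGISTRY_USERNAME" --password-stdin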

Require ability to re-deploy without cached docker layers

We are running decoupled Drupal + Gatsby as two separate Lagoon environments. When content is updated in Drupal, we need to be able to click the deploy button in the Lagoon UI so Gatsby can retrieve the updated content.

Currently this is not possible because Lagoon will not rebuild unless there is a new commit.

post-rollout task error message is difficult to decode

If a post-rollout task fails, it is difficult to identify where the error occurred based on the output. Example:

Executing Post-rollout Tasks
Evaluating task condition -  LAGOON_ENVIRONMENT != "main"
Executing task '':'/app/scripts/deploy/post-deploy.sh'Going to exec into  cli-5c9b44bcbb-rpbdh
Error: error in Stream: command terminated with exit code 1
error in Stream: command terminated with exit code 1
Usage:
  lagoon-build tasks post-rollout [flags]

The above error does not clearly indicate whether the script failed to run inside the pod or whether the exec command itself failed. Can we make this more descriptive?

Provide an env var containing the latest tag

Hi,

We tag our commit SHAs with a friendly version name but are unable to retrieve those in the environments, as only LAGOON_GIT_SHA is available.

It would be very helpful to have a LAGOON_GIT_LATEST_TAG (for example) variable containing the output of:

git describe --abbrev=0

Cheers

Fallback to db-single flag with default disable

I think we should disable the fallback to xdb-single by default, and have a flag to enable the fallback feature in:

  • .lagoon.yml
    • allowDatabaseFallback: true|false ?
  • docker-compose service label
    • lagoon.databasefallback: true|false
  • lagoon api environment variable
    • LAGOON_FEATURE_FLAG_DATABASE_FALLBACK: enabled|disabled|true|false

This would mean that builds will FAIL if there is no dbaas-provider present to take the workload. That seems desirable, as it would prevent builds from creating database pods and someone later having to figure out what went wrong and fix the environment to use a dbaas provider.

Long cronjob names can lead to issues on deployments

Describe the bug

A long cronjob name can lead to failing deployments:

* metadata.name: Invalid value: "cronjob-cli-fastly-purge-queue-worker--flush-cache-tags-": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
* spec.jobTemplate.spec.template.labels: Invalid value: "fastly-purge-queue-worker--flush-cache-tags-": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')
* spec.jobTemplate.spec.template.spec.containers[0].name: Invalid value: "cronjob-cli-fastly-purge-queue-worker--flush-cache-tags-": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')
* metadata.name: Invalid value: "cronjob-cli-fastly-purge-queue-worker--flush-cache-tags-": must be no more than 52 characters
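One possible mitigation, sketched and hedged (not how the build image currently behaves): truncate generated names to fit the limit and append a short hash so truncated names stay unique and end with an alphanumeric character.

CRONJOB_NAME="cronjob-cli-fastly-purge-queue-worker--flush-cache-tags-"   # example from above
if [ ${#CRONJOB_NAME} -gt 52 ]; then
  NAME_HASH=$(echo -n "$CRONJOB_NAME" | sha256sum | cut -c1-8)
  CRONJOB_NAME="$(echo -n "$CRONJOB_NAME" | cut -c1-43)${NAME_HASH}"   # 43 + 8 chars fits the 52-char limit
fi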

Tasks not finding `name` field

It seems that pre/post-rollout tasks aren't picking up the name field; it doesn't appear to be passed into the new struct as part of the execution. See the message below from this in the wild: the 'Executing task' output is missing the name.
Also, the 'Going to exec ...' message should probably be on a new line.

Executing Post-rollout Tasks
Evaluating task condition -  LAGOON_ENVIRONMENT != "main"
Executing task '':'/app/scripts/deploy/post-deploy.sh'Going to exec into  cli-5c9b44bcbb-rpbdh

https://github.com/uselagoon/build-deploy-tool/blob/v0.13.3/cmd/tasks_run.go#L209

Add section headers to build logs

Currently build log section information is marked by a call to https://github.com/uselagoon/build-deploy-tool/blob/main/legacy/build-deploy-docker-compose.sh#L100-L130

This is called at the end of a section to output stats for the section and effectively marks the beginning of the next section.

This makes parsing these sections difficult because there is currently no structured start-of-section marker.

What we want to do is:

  1. Duplicate the patchBuildStep function, call it something like "beginBuildStep", and change it so that it outputs "BEGIN ..." instead of "STEP ..." (see the sketch below)
  2. Add a beginBuildStep call at the beginning of every section (with the section being bookended by the current patchBuildStep output)
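A minimal sketch of what the new helper could look like (the real patchBuildStep takes more arguments and also records timing/status, so this is illustrative only):

beginBuildStep() {
  local stepName="$1"
  echo "##############################################"
  echo "BEGIN ${stepName}"
  echo "##############################################"
}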

`--dry-run` is deprecated in Lagoon build deploy

Describe the bug

There appears to be use of a deprecated API in Lagoon build deploy.

To Reproduce

Run a build and notice:

++ kubectl -n project-branch create configmap lagoon-env -o yaml --dry-run --from-env-file=/kubectl-build-deploy/values.env
++ kubectl apply -n project-branch -f -
W0804 03:37:51.774636    1346 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.
configmap/lagoon-env configured

Expected behavior

No warnings.
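The fix is to use the client-side equivalent of the deprecated flag; a hedged example based on the command above:

kubectl -n project-branch create configmap lagoon-env -o yaml --dry-run=client \
  --from-env-file=/kubectl-build-deploy/values.env | kubectl apply -n project-branch -f -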

support additional IngressClass in ingresses

As an advanced user of Lagoon managing a dedicated remote cluster, I may wish to install an additional ingress controller. The build-deploy-tool should allow me to specify which IngressClass a non-default ingress should be mapped to.
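A hedged sketch of what the generated ingress could carry when a non-default class is requested ("custom-nginx" and the resource name are illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-route
spec:
  ingressClassName: custom-nginx
  # rules/tls as currently templated by the build-deploy-tool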

Research usage of Security Contexts and PodSecurityPolicies for kubernetes clusters

While OpenShift manages security contexts fully automatically via Security Context Constraints, there are no security contexts enforced in a vanilla Kubernetes cluster.
Therefore Lagoon should make sure that pods/containers are only granted the least amount of privilege they need.

In order to achieve this we can either use Security Contexts on a Pod level or PodSecurityPolicies on a cluster level to enforce pod securities.

In this work we would like to implement:

  • Enforce best security practices for pods (see the sketch after this list):
    • no running as root
    • run as a random user id? (I think this causes more harm than real-world benefit, so I would not copy this from OpenShift)
    • drop capabilities that are not used (OpenShift drops KILL, MKNOD, SETUID, SETGID)
    • do not allow privileged pods (this is the default, but just to be sure)
  • If possible, also define a PodSecurityPolicy from Lagoon that is applied to all pods created by Lagoon, in order to make sure we have a centralized place to manage the security contexts
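A hedged sketch of a container-level security context along the lines proposed above (values are illustrative, not Lagoon's current or planned defaults):

containers:
  - name: app   # illustrative container
    securityContext:
      runAsNonRoot: true
      privileged: false
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["KILL", "MKNOD", "SETUID", "SETGID"]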

Deleted variables from API remain in configmap

Describe the bug

When deleting an environment variable from the API, it is not removed from the lagoon-env configmap. This is a problem when the existence of the variable is used to enable/disable a feature, like BASIC_AUTH_USERNAME and BASIC_AUTH_PASSWORD.

To Reproduce

Steps to reproduce the behavior:

  1. Add global|runtime variable to the API
  2. Deploy environment
  3. See variable is added to the config map
  4. Delete variable from the API
  5. Deploy environment
  6. See that deleted variable is still in the config map

Expected behavior

Deleted API variables should be removed from the configmap

Impacted customers

At least 2 large customers in APJ are impacted by this.

Allow .lagoon.yml override/merge for pre and post deploy tasks

Proposal:

It would be useful to be able to centrally control and add extra pre and post rollout tasks, over and above those explicitly defined in the .lagoon.yml.

The proposal is to allow merging of pre and post rollout tasks via two avenues.

  1. Having something like a docker compose override file, a .lagoon.override.yml (for example), whose entries will be merged into/over a given .lagoon.yml file; see the sketch below.
  2. Having an environment variable (name still TBD) which will itself contain the contents of a .lagoon.yml override. This will allow extra tasks to be specified via the API, rather than needing to add a new file to the git repo.

Any additional pre/post rollout tasks defined here will be run as part of the deployment.
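A hedged sketch of option 1, assuming the override file reuses the existing .lagoon.yml task syntax (the file name and the task itself are illustrative):

# .lagoon.override.yml (hypothetical file, merged over the project's .lagoon.yml)
tasks:
  post-rollout:
    - run:
        name: centrally managed post-rollout task
        command: /app/scripts/central-task.sh   # illustrative
        service: cli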

pre-rollout tasks should be skipped on first deployment

On the first deployment of an environment to Lagoon, there are no running services to perform pre-rollout tasks on.

Previously this was checked in the script (https://github.com/uselagoon/lagoon/blob/v2.8.0/images/kubectl-build-deploy-dind/scripts/exec-pre-tasks-run.sh#L17-L19)

This check should be replicated here to stop builds failing:

++ build-deploy-tool tasks pre-rollout
Executing Pre-rollout Tasks
Executing task : [ -f /app/vendor/bin/govcms-pre-deploy-db-update ] && /app/vendor/bin/govcms-pre-deploy-db-update || echo 'Pre Update databse is not available.'
Error: No deployments found matching label: lagoon.sh/service=cli
Usage:
  lagoon-build tasks pre-rollout [flags]

Aliases:
  pre-rollout, pre

Flags:
  -h, --help                help for pre-rollout
  -l, --lagoon-yml string   The .lagoon.yml file to read (default ".lagoon.yml")
  -n, --namespace string    The environments environment variables JSON payload

No deployments found matching label: lagoon.sh/service=cli

document requirements of a build in depth

It would be good to document or outline all the requirements that a build needs.

  • variables that are added to the build pod that are made available to the build script
  • files that are consumed (not created) by the build script
    • .lagoon.yml
    • docker-compose.yml
    • others?
  • anything else?

Lagoon builds break when users specify custom docker creds matching internal registry

Describe the bug

If a user specifies custom docker credentials that use the same registry URL as the internal Harbor registry, Lagoon will always process the internal registry docker login command first. Any subsequent docker login commands to the same registry URL leave only the last set of credentials that succeeded in logging in to the upstream repo, meaning the customer-provided credentials will be the only credentials available for the build pod to push and pull images to the internal registry. If these credentials do not have the correct access (e.g. a globally configured puller account with no push access), builds will fail because they are unable to push their artifacts to the internal registry.

To Reproduce

Steps to reproduce the behavior:

  1. Add a custom registry to an environment's .lagoon.yml which is valid for the same url as the internal Harbor registry for that region
  2. Deploy the environment
  3. See build fail because it is unable to push images to the correct repository

Expected behavior

Lagoon should not allow users to specify custom registry credentials which are valid for the same URL as the internal registry. If such a case is detected, Lagoon should fail the build with a descriptive error.

Additional context

This behavior is related to the following Docker issue: moby/moby#37569

Objects no longer in `.lagoon.yml` should be deleted upon successful deploy

Describe the bug

At the moment, if you have custom routes in your .lagoon.yml they will get created upon deploy. If, however, you later remove them, they will not be deleted. This forces people to create clunky scripts to delete all routes and trigger a new deployment.

Expected behavior

Upon successful deployment, all routes that used to exist, but no longer do in .lagoon.yml should be removed.

Migrate existing legacy build functionality from bash to golang

The existing build stages that are currently in Bash and need migrating across to Go:

Build Steps:

  • Custom route generation
  • Autogenerated route generation
  • Backup schedule/retention configuration & generation
  • DBaaS service type checks and configuration
  • Fastly secret creation
  • Creating services (possibly using https://github.com/compose-spec/compose-go)
  • Building and pushing images, secrets, arguments
  • Copy of images from external registries (configure imagecache)
  • Service rollouts & status monitoring
  • Volumes, Deployments
  • Cronjobs
  • pre/post-rollout tasks and conditionals
  • Insights generation

Scripts:

  • Git checkout & merge
  • Promote

Additional functionality to be added to the build-tool

  • AlternativeNames to add to custom routes (#25)
  • Support additional registries, proxies
  • Add resource limits and custom metrics
  • k8up v2 support
  • improved polysite support

Add support for configuring hsts via lagoon.yml on k8s

Describe the bug

Lagoon's oc-build-deploy had support for configuring HSTS headers per route in .lagoon.yml; this is currently not available when deploying via kubectl-build-deploy.

To Reproduce

Steps to reproduce the behavior:

  1. Add hsts configuration to a route definition
  2. Deploy environment with route attached
  3. No hsts headers are returned for that route

Expected behavior

HSTS headers are configured based on the configuration applied in .lagoon.yml, as described in our documentation: https://docs.lagoon.sh/lagoon/using-lagoon-the-basics/lagoon-yml#ssl-configuration-tls-acme

Possible solutions

Set custom headers via a configuration-snippet annotation in the nginx ingress controller:

metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "MyCustomHeader: Value";
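Applied to this issue specifically, the same mechanism could set the HSTS header directly; a hedged example (the max-age value is illustrative):

metadata:
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "Strict-Transport-Security: max-age=31536000; includeSubDomains";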

Add IngressClass (and possible required annotations) to ingress creation process

As of Kubernetes 1.19, both ingress-nginx and Traefik require the presence of an IngressClass reference on created Ingresses if they are to operate on a non-default ingress.

Lagoon doesn't currently support this for workloads. To make this happen we need three things: a feature flag (that can be set at the controller or project/environment level), the IngressClass resource referenced in the created ingress, and any relevant annotations (e.g. cert-manager). The first two are pretty straightforward, but we would have to find a way to define which annotations need setting, as they may differ if people don't use cert-manager.

Branch names consisting of only numbers are not handled correctly

Describe the bug
During testing, a branch named 1299 was not picking up the custom routes defined in the .lagoon.yml correctly. It looks like this issue is actually caused by shyaml's indeterminate typing for a value that could be either an integer or a string.

To Reproduce
Steps to reproduce the behavior:

  1. Create a branch with only numbers as its name, like 1299
  2. Define custom routes for that branch in the .lagoon.yml file
  3. Observe that the custom routes are not picked up during the lagoon build.

Expected behavior
Since a branch name consisting of only numbers is a valid branch name, Lagoon should support using solely digits as branch names.

Additional context
The fix for this issue will likely involve forcing shyaml to read every branch name as a string, to remove any ambiguity with regards to the type of the branch name.
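As a possible interim workaround, assuming the mismatch comes from the unquoted key being parsed as an integer, quoting the branch key in .lagoon.yml forces it to be read as a string:

environments:
  '1299':
    routes:
      - nginx:
          - example.com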

Cronjobs not always removed during deployment

Describe the bug

Sometimes when cronjobs in .lagoon.yml are replaced, the new cronjobs are created but the old cronjobs are not removed.

To Reproduce

Steps to reproduce the behavior:

  1. Deploy a project with a single cronjob named foo. Make it hourly so it is created as a native kubernetes cronjob. See that the cronjob is created correctly.
  2. Replace that cronjob with two cronjobs: foo-bar and foo-baz, and deploy again.
  3. See that you now have three cronjob objects in the environment namespace: foo, foo-bar, and foo-baz.

Expected behavior

I expect the foo cronjob to be removed from the environment namespace since it is no longer defined in the .lagoon.yml.

Screenshots

n/a

Additional context

This piece of code does the cronjob removal:

https://github.com/uselagoon/lagoon/blob/ad45d779c27f7841e9f00c1b7a90452554a17c0f/images/kubectl-build-deploy-dind/build-deploy-docker-compose.sh#L1619-L1634

For the reproduction steps above, in the deploy at step 2, test="foo-bar foo-baz" and re="\<foo\>". The test on line 1627 evaluates to true, and so the old cronjob is not removed. For example:

$ test="foo-bar foo-baz"; re="\<foo\>"; [[ "$test" =~ $re ]]; echo $?
0

The \< and \> are GNU regex extensions equivalent to \b, which matches a word boundary. Because hyphens are a word boundary and the name of the new cronjob matches the old cronjob with a hyphenated suffix, this test evaluates to true when it should evaluate to false. The same problem would occur with hyphenated prefixes.
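A hedged sketch of a safer check: compare whole names instead of relying on \< \> word boundaries, so foo does not match foo-bar.

test="foo-bar foo-baz"; old="foo"; found=no
for name in $test; do
  [ "$name" = "$old" ] && found=yes
done
echo $found   # prints "no", so the old cronjob would correctly be removed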

Document how to create cronjobs for non-standard branch names

Describe the bug

We have docs that state you must use the name of the branch in your .lagoon.yml file. However, they do not cover the use case where your branch name contains special characters.

Example of this

environments:
  'feature/long-branch-name-here':
    cronjobs:
      - name: 'drush cron --root=/app'
        schedule: '*/15 * * * *'
        command: drush cron
        service: cli

With the branch name feature/long-branch-name-here you are required to use quotes to ensure that the special characters are not evaluated in YAML.

Double quotes or single quotes will work, but single quotes are likely better to use for the docs.

Pre-rollout tasks fail when an environment is idled in kubernetes

Describe the bug

If an environment is idled in kubernetes by the new lagoon-idler, there is no way to unidle the environment without hitting the ingress first, or ssh-ing into the environment.

This can be problematic for deployments that have pre-rollout tasks that may need to talk to an idled resource.

To Reproduce

Steps to reproduce the behavior:

  1. Wait for an environment to idle.
  2. Trigger a build that contains a pre-rollout task that contacts a potentially idled deployment.
  3. Observe the build failure as it tries to contact something that is idled.

Expected behavior

The environment should be unidled if there are any pre-rollout tasks to perform.

Additional context

This problem does not exist in OpenShift as idling is done differently there, and idling was only recently rolled out to all Kubernetes clusters, so this has only been discovered recently.

Disabling autogenerated routes should remove existing ingress objects

Describe the bug

At present, it is not possible to delete existing autogenerated ingress objects by pushing the appropriate changes to your .lagoon.yml file.

Because disabling autogenerated routes is a positive action (i.e. you need to explicitly set a value in your .lagoon.yml file), it is highly unlikely to be done accidentally. From the docs, something like this is needed in your .lagoon.yml file:

routes:
  autogenerate:
    enabled: false

To Reproduce

Steps to reproduce the behavior:

  1. Push a new branch to an existing project, omitting the autogenerate code block in your .lagoon.yml file. Notice that autogenerated routes are created (as expected)
  2. Add the appropriate changes to your .lagoon.yml file to disable autogenerated routes, and push
  3. Notice how the autogenerated routes still exist post deployment

Expected behavior

Autogenerated routes are removed if disabled explicitly from the .lagoon.yml file.

Additional context

I don't think this should apply to custom domains, as there is no positive-action way to delete those.

Related to uselagoon/lagoon#2863

Multiple nginx-php deployments are not supported

Deploying multiple nginx-php or nginx-php-persistent deployments is currently not supported.

The way the templates values.yaml check works here means that it isn't possible to run more than one.

The line DEPLOYMENT_SERVICETYPE=$line is populated with nginx and php in the while loop based on what is defined in the values.yaml file in the templates here

This causes the next variable DEPLOYMENT_SERVICETYPE_IMAGE_NAME to be empty, as the values used to check the map ${SERVICE_NAME}:${DEPLOYMENT_SERVICETYPE} are set to the hardcoded values of nginx or php for DEPLOYMENT_SERVICETYPE.

For example, given a docker-compose.yml like this:

version: '2.3'
services:
  nginx:
    build:
      context: .
      dockerfile: lagoon/nginx.dockerfile
    labels:
      lagoon.type: nginx-php
  php:
    build:
      context: .
      dockerfile: lagoon/php.dockerfile
    labels:
      lagoon.type: nginx-php
      lagoon.name: nginx

  second-nginx:
    build:
      context: .
      dockerfile: lagoon/second-nginx.dockerfile
    labels:
      lagoon.type: nginx-php
  second-php:
    build:
      context: .
      dockerfile: lagoon/second-php.dockerfile
    labels:
      lagoon.type: nginx-php
      lagoon.name: second-nginx

The variable map MAP_DEPLOYMENT_SERVICETYPE_TO_IMAGENAME would contain the following indexes:

nginx:nginx
php:php
second-nginx:second-nginx
second-php:second-php

But since the exec-kubectl-resources-with-images.sh script sources the value of DEPLOYMENT_SERVICETYPE from the templates' values.yaml, the map lookups for second-nginx and second-php fail because it tries to look up these indexes in the map:

second-nginx:nginx
second-php:php

but they don't actually exist.

Since all of our other templates are single-container templates, this restriction doesn't affect them.
We should be able to support this configuration, even if it may not be ideal.

There are other issues with this too that need to be checked, around how persistent volumes are used/consumed: whether a persistent volume should be shared between both of those deployments, similar to how it is shared with the CLI deployment, or whether the second deployment should have its own persistent volume created, separate from the other deployment.

Set specific environment variables for promoted environments

It would be useful if there was an identifier to say whether the current environment is promoted or not and which git branch it was promoted from.

These variables could be consumed at build time for the Lagoon Facts; right now I have to manually pass in the promoted environment name, as LAGOON_GIT_BRANCH is empty.

Random `LAGOON_ROUTE` with multiple, auto-generate-only, services

Describe the bug

When deploying an environment that only has autogenerated routes, the LAGOON_ROUTE is not always the route of the first service listed in docker-compose.yml.

To Reproduce

  • Have a docker-compose.yml with multiple routable services (e.g., nginx and varnish)
  • Run build-deploy-tool identify primary-ingress
  • See that the primary ingress will randomly return the nginx or varnish route

Expected behavior

The LAGOON_ROUTE should always be for the first service in the docker-compose.yml, assuming no other primary route has been defined.

Additional context

The type used for unmarshalled docker compose services is a map, which is randomly ordered. Therefore, when iterating over the services to create autogenerated routes, the order will be random. When no primary route is defined, the first autogenerated route is picked as the primary route, causing the returned primary route to be random.
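A minimal sketch of one way to make the selection deterministic, assuming the services are held in a map keyed by service name (ideally the order would come from the docker-compose.yml file itself rather than alphabetical sorting):

// iterate services in a stable order instead of Go's random map order
names := make([]string, 0, len(services))
for name := range services {
    names = append(names, name)
}
sort.Strings(names) // needs "sort"; compose-file order would be preferable if available
for _, name := range names {
    // generate the autogenerated route for services[name]
}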

Setting `tls-acme` to `false` should remove existing Certificate objects that are not Ready

Describe the bug

If you set tls-acme to true by mistake (e.g. because the DNS maps to a CDN that strips Let's Encrypt challenges, such as Akamai), a pod will be created in the namespace that keeps trying to validate the HTTP challenge.

If you then realise your mistake and correct the .lagoon.yml to set tls-acme to false, this will not clean up the existing Let's Encrypt challenge, leaving manual intervention required to clean up the Certificate object.

To Reproduce

Steps to reproduce the behavior:

  1. Set tls-acme to true on a domain in .lagoon.yml
  2. Point DNS for the domain to a CDN that strips the Let's Encrypt challenges (e.g. Akamai)
  3. Correct the .lagoon.yml to set tls-acme to false
  4. Notice challenge pod remains

Expected behavior

Lagoon should delete all Certificate objects in the namespace if they are not Ready in status.
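A hedged sketch of how such a cleanup could look from the build script, assuming cert-manager Certificate resources and jq being available (the selection logic is illustrative, not an existing implementation):

# delete Certificate objects that have no Ready=True condition
kubectl -n "$NAMESPACE" get certificates -o json \
  | jq -r '.items[] | select(([.status.conditions[]? | select(.type=="Ready" and .status=="True")] | length) == 0) | .metadata.name' \
  | while read -r CERT; do
      kubectl -n "$NAMESPACE" delete certificate "$CERT"
    done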

Additional context

Potentially related to uselagoon/lagoon#2795

Prebackuppods run for a long time

Describe the bug

After uselagoon/lagoon#2307, prebackuppods can run for a long time. They will also be restarted if they die. However, a restart kills the backup process, so there is no point in restarting them. They should be specified not to restart if they die.

To Reproduce

n/a

Expected behavior

prebackuppods should not be restarted.

Screenshots

n/a

Additional context

n/a

`LAGOON_ROUTE` not generated when project has only autogenerated routes

When calling the build-deploy tool as part of a Lagoon build to identify the environment's primary ingress, if the environment has only autogenerated routes, the primary ingress (and thus the LAGOON_ROUTE variable) will be empty. The LAGOON_ROUTES variable, however, will be populated with the project's autogenerated routes as expected.

Reading through the code, it seems that an autogenerated route can never be used as an environment's primary ingress. Specifically,

if len(mainRoutes.Routes) > 0 {
only checks the main routes if there is at least one main route already defined. It seems to me that we need to add an else block to that statement, which injects the first item from the array of autogenerated routes into the array of main routes if the array of main routes is empty; a possible sketch follows.
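A hedged sketch of the suggested change (the variable and field names are assumptions based on the quoted condition, not the actual code):

if len(mainRoutes.Routes) > 0 {
    primaryIngress = mainRoutes.Routes[0].Domain
} else if len(autogenRoutes.Routes) > 0 {
    // fall back to the first autogenerated route when no main routes are defined
    primaryIngress = autogenRoutes.Routes[0].Domain
}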

Adding a new route to a service in the .lagoon.yml will not cause that service to be updated

Describe the bug
When adding a new route to a service, the configmap is updated, but the deploymentconfig is not rolled out afterwards to pick up the new configmap changes.

To Reproduce
Steps to reproduce the behavior:

  1. Create a few services and deploy them.
  2. Add a custom route to one of those services and deploy the changes.
  3. See that the build completed, but the route is not reflected in the LAGOON_ROUTES env var of any container.

Expected behavior
Adding a new route to a service should trigger a deploy of that service.

Reading environment variables from `docker-compose.yml`

Are there any plans to support specifying all required environment variables in docker-compose.yml and using it in addition to other methods of reading variables?

docker-compose.yml would become a single place for all per-site config and would allow "single-file Drupal stack with platform deployment configuration".

Moreover, this would mitigate the problem of providing values in .env or .env.default files and then replicating their default values in the docker-compose.yml file, which means there is no longer a single place of configuration.

Of course, the existing ways of overriding configuration would still work.
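A hedged sketch of what this could look like in docker-compose.yml, using the standard per-service environment key (service and variable names are illustrative):

services:
  cli:
    environment:
      PHP_MEMORY_LIMIT: "512M"
      DRUPAL_HASH_SALT: "example-value"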

Multiline cron commands failing deployments in k8s

Describe the bug

Lagoon builds fail when defining a multiline cronjob command.

To Reproduce

Steps to reproduce the behavior:

  1. Define a cronjob with a multiline command
    cronjobs:
      - name: multiline cron command
        schedule: "15 1 * * *"
        command: |-
            echo "multiline \
            cron command"
        service: cli
  2. Deploy the environment which contains the cronjob
  3. See the build error:
nativeCronjobs:
multiline-cron-command:
schedule: 15 1 * * *
command: |-
echo "multiline \
cron command"
inPodCronjobs: ""
+++ [[ -n image: ""
imagePullPolicy: Always
imagePullSecrets: [] ]]
+++ SERVICE_NAME_IMAGE=cli
+++ SERVICE_NAME_IMAGE_HASH=harbor-nginx-lagoon-master.ch.amazee.io/drupal9-example-simple/cron-debug/cli@sha256:792030cb7f375633bb01aefae4e67f7b72ab3a16aff1fa8e711f3603d67a8753
+++ cat /kubectl-build-deploy/cli-values.yaml
+++ helm template cli /kubectl-build-deploy/helmcharts/cli-persistent -f /kubectl-build-deploy/values.yaml -f /kubectl-build-deploy/cli-values.yaml [... logs truncated ...]
Error: YAML parse error on cli-persistent/templates/cronjob.yaml: error converting YAML to JSON: yaml: line 83: could not find expected ':'

Expected behavior

Multiline commands defined in YAML with valid syntax are converted to valid commands for helm to create cronjob objects.

Workarounds

You can use the YAML folded scalar syntax (>), which folds multiple lines into a single-line command:

    cronjobs:
      - name: multiline cron command
        schedule: "15 1 * * *"
        command: >
            echo "first command";
            echo "second command"
        service: cli
