ukhomeoffice / kb8or Goto Github PK

View Code? Open in Web Editor NEW

45.0 30.0 5.0 169 KB

CI Tool for deploying with kubernetes

License: MIT License

Ruby 98.43% Shell 1.57%

kb8or's People

Contributors

Stargazers

Watchers

Forkers

cloudxtreme stefancocora 40a stiller-leser uk-gov-mirror

kb8or's Issues

Add NoControllerOK setting to allow for deploys without services

Allow an option to override this message:

Invalid deployment unit (Missing controller) in dir:kb8or_deploys/*.yaml

No tests, so no living documentation

There aren't currently any tests for this tool, which will cause a few issues:

No way of knowing whether a code change breaks existing functionality
No living documentation - I created some docs to show how an example would work, but there is no way of knowing if it will still actually work, and there would be no alert if a commit broke the test

I think we should write tests :)

Missing Good Readme

A project is useless without a good readme. See here for a guide: https://gist.github.com/PurpleBooth/6f1ba788bf70fb501439, and here for an example https://gist.github.com/PurpleBooth/109311bb0361f32d87a2

Allow use of kb8or settings as Vars

There are some settings that previously could not be accessed as variables directly. More specifically, I needed to deploy a complex Pod with one image from a private registry and one not so I wanted to use the PrivateRegistry setting as a variable directly for this Pod e.g.:

      - name: checking
        image: ${ PrivateRegistry }/checking:replace.me

Allow additional noop checks

At the moment -n (no operation) will only parse the environment. It should potentially (in order of probably priority):

Ensure all kubernetes yaml input has all variables present
Potentially check all environments for a deploy have equivalent variables
Investigate if kubernetes files match a known schema (it there an API endpoint)

Missing a travis ci config

Make life easier for people submitting code to your repository. Make sure that you double check that their code works the best you can (even if that's just running a build of the package) in travis. This way potential contributors can fix bugs before you even need to look at their request. https://gist.github.com/PurpleBooth/6f1ba788bf70fb501439

Add --limit-deploy option to single resource

When troubleshooting new applications, it's really useful to use all the variables but not to have to re-deploy an entire application.

Add a --limit-deploy-csv to a list of resources to only deploy the resources listed e.g.:

--limit-deploy resourecontrollers/proxy,services/proxy

Detect when node selectors used and no nodes available...

If a replicationController is deployed with an invalid tag (or there are no nodes available), the event below is present but kb8or will not detect it.

- apiVersion: v1
  count: 20
  firstTimestamp: 2015-10-13T14:09:11Z
  involvedObject:
    apiVersion: v1
    kind: Pod
    name: es-data-eu-west-1b-v1-pdzql
    namespace: nfidd-vagrant
    resourceVersion: "91572"
    uid: f9dff11e-71b3-11e5-a6e7-080027085f30
  kind: Event
  lastTimestamp: 2015-10-13T14:13:51Z
  message: Failed for reason MatchNodeSelector and possibly others
  metadata:
    creationTimestamp: 2015-10-13T14:13:51Z
    deletionTimestamp: 2015-10-13T15:13:51Z
    name: es-data-eu-west-1b-v1-pdzql.140cc43e00a4fbe9
    namespace: nfidd-vagrant
    resourceVersion: "93092"
    selfLink: /api/v1/namespaces/nfidd-vagrant/events/es-data-eu-west-1b-v1-pdzql.140cc43e00a4fbe9
    uid: a0dd21cb-71b4-11e5-a6e7-080027085f30
  reason: failedScheduling
  source:
    component: scheduler

Failed to pull image ...: API error (500)

The following error wasn't detected by kb8or:

Failed to pull image "10.101.100.100:30000/nfidd_mysql_management:0.1.0.3-999": API error (500): invalid registry endpoint "http://10.101.100.100:30000/v0/". HTTPS attempt: unable to ping registry endpoint https://10.101.100.100:30000/v0/
v2 ping attempt failed with error: Get https://10.101.100.100:30000/v2/: dial tcp 10.101.100.100:30000: i/o timeout
 v1 ping attempt failed with error: Get https://10.101.100.100:30000/v1/_ping: dial tcp 10.101.100.100:30000: i/o timeout. HTTP attempt: unable to ping registry endpoint http://10.101.100.100:30000/v0/
v2 ping attempt failed with error: Get http://10.101.100.100:30000/v2/: dial tcp 10.101.100.100:30000: i/o timeout
 v1 ping attempt failed with error: Get http://10.101.100.100:30000/v1/_ping: dial tcp 10.101.100.100:30000: i/o timeout

Improve error handling when missing environment file

Current error handling isn't clear about what the issue is. If you specify an environment file that doesn't exist you get the following error:
E, [2015-09-06T20:35:47.688880 #1] ERROR -- : undefined method has_key?' for nil:NilClass E, [2015-09-06T20:35:47.688829 #1] ERROR -- : undefined methodhas_key?' for nil:NilClass

To improve this it should check for the existence of the environment file and report a clearer error if it doesn't exist

If a resource directory is empty - there's no error

Given an invalid directory name:

deploys:
- path: ./k8resources/missing_directory

There is no error. Add an error if the glob returns no files...

@vaijab

Be able to deploy even when current deployment is broken

kb8or previously failed to do a deploy because I tried to deploy a docker image that didn't exist. Now when I try to do a deploy with an image that doesn't exist it still fails and gives the following error:

docker run --rm -v /root/.kube:/root/.kube/ -v /var/lib/jenkins/jobs/DEV_DEPLOY/workspace/kb8or_deploys:/var/lib/deploy quay.io/ukhomeofficedigital/kb8or:v0.1.8 -e dev digital-storage.yaml
error: couldn't read version from server: the server responded with the status code 408 but did not return more information

E, [2015-09-08T13:40:07.971539 #1] ERROR -- : undefined method []' for false:FalseClass E, [2015-09-08T13:40:07.971448 #1] ERROR -- : undefined method[]' for false:FalseClass
Build step 'Execute shell' marked build as failure
Warning: you have no plugins providing access control for builds, so falling back to legacy behavior of permitting any downstream builds to be triggered
Finished: FAILURE

Handle transient api read errors from kubectl

Detect the error below from kubectl (may possibly have associated error code):

error: couldn't read version from server: Get http://10.250.1.203:8080/api: dial tcp 10.250.1.203:8080: i/o timeout

Use rolling update

Kubernetes gives us rolling update for free, which does rollback, no downtime, etc. We should use this

Allow Kb8User tag in settings

Although kubernetes will manage it's own kubectl config and context, it doesn't allow a user to be specified.

Add kb8User: setting.

nuke option

I'd like to be able to delete everything within a namespace when deleting a namespace, as opposed to failing. Something like a --nuke option would be good. The reason for this is I'll be spinning up loads of envs for CI tests and would like an easy and consistent way to get rid of them. I imagine others will come to the same conclusion.

Allow detection of resource changes for redeploy

Possibly create a md5 of the source resource and store it in a metadata item (e.g. kb8ResourceMd5) when creating / updating items so that upgrades only happen when resources actually change...

It would assume that no changes have been made to the resource that were not done by kb8or but could speed up deployments with no changes...?

Thoughts?

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: '{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"nfidd-dev","name":"es-master-v3","uid":"4859f7b5-7275-11e5-aab0-0676feffb509","apiVersion":"v1","resourceVersion":"4651582"}}'
  creationTimestamp: 2015-10-14T13:12:58Z
  generateName: es-master-v3-
  labels:
    component: elasticsearch
    kb8_deploy_id: v3
    role: master
  kb8ResourceMd5: 5936d2167939ff8fa9e7ce78b300bf9e
.
.
.

Create onbuild event for easy deployments

Deployments should just have a Dockerfile with a simple build / run which could also handle getting secrets.

Work done for SRRS can just be added to kb8or and entrypoint updated...

Add delete resource

It is possible that a resource has been renamed from a previous deploy and any new deploy will have conflicting resources: e.g.:

Error (exit code:'1') running 'kubectl create -f -':
The Service "checking-https" is invalid:spec.ports[0].nodePort: invalid value '30803': provided port is already allocated

Provide a feature for deleting an old name for a resource e.g.:

Deploys:
  - Path: ./myapp/kb8or
    Delete: services/checking-proxy

Generate context and use that over Kb8Server

To manage multiple namespaces for multiple clusters it is simpler for kb8or to have the namespace, cluster and user specified in the environment file. The settings for cluster, user and namespace should be managed separately from kb8or.

The logic can be:

Preserve current behaviour if a kb8server is set (but mark as deprecated take out of docs).
If there is a Kb8Context setting, error if kb8Server is set.

Kb8Context should have the following settings:

Kb8Context:
  cluster: production_cluster
  user: prod_user
  namespace: prod_namespace

All three should be mandatory to prevent any potential confusion.
The Kb8Context setting should only be available in the environment "context".

Add NamespacePrefix and NamespaceValue to settings

For DSP deployments, the namespaces will have to be differentiated between projects and or may already exist. Currently, namespaces will always use the environment name.

Add a default environment to a deploy yaml

To remove the requirement for an environment switch, add a default environment to a deploy file. This would allow platform resources (rather than applications) to be deployed using the correct environment / namespace etc. without having to specify which environment at the command line e.g.:

The following:

kb8or -e platform ./k8deploys/platform.yaml

Could go to:

kb8or ./k8deploys/platform.yaml

If the platform.yaml file could include the following:

K8DefaultEnv: platform

Can't patch resources when separate namespace specified

e.g. The following replication controller is deployed to the skydns namespace using the platform namespace context default...

/var/lib/kb8or-0.6.7/libs/kb8_run.rb:107:in `run': Error (exit code:'1') running 'kubectl patch ReplicationController kube-dns-v9 -p '{"metadata":{"labels":{"kb8_md5":""}}}'': (Kb8Run::KubeCtlError)
Error from server: replicationControllers "kube-dns-v9" not found

Error from server: replicationControllers "..." not found

This will happen when a replication controller resource is deployed with a rolling update and then changed to use NoRollingUpdate. Kb8or tries to patch a non-existent resources (so they'll always be redeployed)...

Can only run one kb8or deploy at a time if the same kube config file used

As the deployment currently uses kubectl config set-context, the config file will relate to a specific namespace and deployment only!

Potentially, update all commands to target a namespace or copy the config file to a tmp file before running (and delete it on exit).

Allow deploys to whitelisted environments

Currently a deploy may be deployed to any environments. This may be a problem when applications should NOT be deployed to another application's environment.

Either add a tag to a deploy:

DeployToWhitelist:
  - app_a_aceptance
  - app_a_preprod
  - app_a_prod

Or add a tag to an Environment:

WhiteListDeploys:
  - app_tier_1
  - app_tier_2

Missing a License

Without a license a user cannot use a project as they might be sued! See here for a guide: https://gist.github.com/PurpleBooth/6f1ba788bf70fb501439, and here for an example https://github.com/nevir/readable-licenses/tree/master/markdown

Missing a CONTRIBUTING file

A contributing file is a place for a user to work out how to contribute to a project, and what coding standards, and things they need to do before submitting code. It is also the place you should include a code of conduct, to make sure people know they can be expected to be treated fairly by the project. https://gist.github.com/PurpleBooth/6f1ba788bf70fb501439

Should report version in output.

To track any issues found, should report kb8or version in output.

Event, "Failed to pull image" no longer detected

- apiVersion: v1
  count: 61
  firstTimestamp: 2016-01-29T11:44:55Z
  involvedObject:
    apiVersion: v1
    fieldPath: spec.containers{registry}
    kind: Pod
    name: registry-v1-j6noi
    namespace: srrs-ci
    resourceVersion: "1765734"
    uid: b5aab60b-c67d-11e5-accd-06606734e5a7
  kind: Event
  lastTimestamp: 2016-01-29T11:54:45Z
  message: 'Failed to pull image "quay.io/ukhomeofficedigital/s3registry:v2.0.0-bob":
    image pull failed for quay.io/ukhomeofficedigital/s3registry:v2.0.0-bob, this
    may be because there are no credentials on this request.  details: (Tag v2.0.0-bob
    not found in repository quay.io/ukhomeofficedigital/s3registry)'
  metadata:
    creationTimestamp: null
    deletionTimestamp: 2016-01-29T12:54:45Z
    name: registry-v1-j6noi.142de30bef3dc23b
    namespace: srrs-ci
    resourceVersion: "1773019"
    selfLink: /api/v1/namespaces/srrs-ci/events/registry-v1-j6noi.142de30bef3dc23b
  reason: Failed
  source:
    component: kubelet
    host: ip-10-50-1-50.eu-west-1.compute.internal

Enable base64 encoding of variables

Either add an option to base64 encode file includes OR add a Function to base64 encode any variable.
Should be able to generate secrets from a bunch of files...

Would require #37 E.g:

my_var:
  Fn::FileData: /root/.secrets/secret-data-1

base64_var:
  Fn::Base64EncVar: my_var

Then in a resource:

apiVersion: v1
kind: Secret
metadata:
  name: test-secret
data:
  data-1: ${ base64_var }

Please add a LICENSE

Allow creation of secrets from a set of files

At the moment, consuming secrets in kb8or typically involves some bash scripts to base64 encode and create a variable file that can then be included as variables.

Enable the creation of secrets from a set of files directly. Note the path should be able to differ to reflect the fact that secrets are often different per environment:

FileSecrets:
  name: app_secrets
  path: ../secrets/${ env}/proxy_*

The file names would be used as the secret file names and the file contents would be base64 encoded as the secret values.

In addition, as the s3secrets utility imposes an upper limit on file size, also implement #37 and #31 (to allow for concatenation of secret data).

Add eventual replica number to a deploy (to support slow starting clusters)

Example scenario:
When deploying elsaticsearch, a single master needs to be started first before then adding multiple standby masters...

Add a key EventualReplicas to a deployment e.g.:

Deploy:
  path: ./elastic_search/
  EventualReplicas:
    Rc: se-master
    replicas: 3

Alternatively, maybe this could be dealt with using a patch file e.g.:

es_master_replicas: 3
Deploy:
  path: ./elastic_search/
  PostDeployPatch:
    ResourceName: rc-master
    PatchFile: ./replica/patch

Then patch file could embody any post deployment change e.g.:

spec:
  replicas: ${ es_master_replicas }

Can't patch resources when in failed state

May have to patch in MD5 ONLY when resource deploy successful?

Kubectl can't patch failed replication controllers?

Error (exit code:'1') running 'kubectl patch ReplicationController jenkins -p '{"metadata":{"labels":{"kb8_md5":""}}}'':
Error from server: an error on the server has prevented the request from succeeding (patch replicationcontrollers jenkins)

consider using something else for variables instead of bash variables.

So, I might want to use ${ENV_SERVICE_HOST} as the value I pass through to a container. Given that k8bor trys to evaluate this and causes problems if it does, leading me to trying to escape it.

We either figure out how to pass strings through like that in a reliable way or add something to kubernetes to ignore them or replace the variables with something else like maybe handlebars.

Rename controllers after upgrade to provide consistent logging

If the logging solution uses replication controller names for logging, we may need this feature...

As ReplicationControllers are "upgraded" with rolling updates and these require a new name, the ReplicationController names are currently suffied with a "-v2". These resource names could be renamed back after deploy to ensure there is a consistent name for logging...

Need to check if this is required by testing with the logstash logging first...

Add timeout for deploys

Allow run with just kubernetes yaml files

In some cases, it would be good to run without an environment file or defaults for simplest case deployments. Command line parameters should be added for the context to use.

This would still give the benefit of deployments with reporting on success.

Allow auto creation of namespaces when possible

At the moment, kb8or will deploy to namespaces based on kb8or environment name. It will fail on versions of kubernetes where namespaces need to be explicitly created.

Allow namespaces to be created automatically if they don't exist e.g. when deploying to nfidd-vagrant, pass kubectl this yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: nfidd-vagrant

If a resource is created in a specified namespace it can't be updated

When deleting a resource that has a namespace specified in a resource file, the resource will not be found with the current .exists? method...

Always deploy pods with `RestartPolicy: Never`

At the moment the md5 check will wrongly skip maintenance deployment jobs.

can't rolling upgrade a when in wait state.

If you have a pod in a crashLoopWait it won't let you do a rolling-upgrade and thus requires either manual intervention, or I assume luck in the timing if your request. I'd like folks not to have direct kubectl access and thus would like to deal with this.

I think we can assume that a pod in crashLoopWait isn't working and can be deleted before we try again?

Detect and handle single time error events e.g. "Failed to start with docker id..."

There are some errors which only happen once as the error string keeps changing. Kb8or uses an error event threshold which must be met before it fails.

Not sure why the symptom below ever happened but the symptom is not detected by kb8or. Simplest fix would be to have a list of regexpes for fail immediately issues.

- apiVersion: v1
  count: 1
  firstTimestamp: 2015-09-07T15:19:13Z
  involvedObject:
    apiVersion: v1
    fieldPath: spec.containers{checking}
    kind: Pod
    name: checking-8ca3h
    namespace: prod
    resourceVersion: "5369357"
    uid: fa841316-5572-11e5-8641-0af397f74e73
  kind: Event
  lastTimestamp: 2015-09-07T15:19:13Z
  message: |
    Failed to start with docker id a0a6f076516b with error: API error (500): Cannot start container a0a6f076516b3c46878985a911f822332d353431d7ed06168d01584208bc0f69: [8] System error: read parent: connection reset by peer
  metadata:
    creationTimestamp: 2015-09-07T15:19:13Z
    deletionTimestamp: 2015-09-07T16:19:13Z
    name: checking-8ca3h.1401bb2bfb50d333
    namespace: prod
    resourceVersion: "5371908"
    selfLink: /api/v1/namespaces/prod/events/checking-8ca3h.1401bb2bfb50d333
    uid: cbc75133-5573-11e5-8641-0af397f74e73
  reason: failed
  source:
    component: kubelet
    host: 10.50.1.89
- apiVersion: v1
  count: 1
  firstTimestamp: 2015-09-07T15:19:23Z
  involvedObject:
    apiVersion: v1
    fieldPath: spec.containers{checking}
    kind: Pod
    name: checking-8ca3h
    namespace: prod
    resourceVersion: "5369357"
    uid: fa841316-5572-11e5-8641-0af397f74e73
  kind: Event
  lastTimestamp: 2015-09-07T15:19:23Z
  message: Created with docker id 7e3e1ee29f02
  metadata:
    creationTimestamp: 2015-09-07T15:19:23Z
    deletionTimestamp: 2015-09-07T16:19:23Z
    name: checking-8ca3h.1401bb2e3dde0db1
    namespace: prod
    resourceVersion: "5371982"
    selfLink: /api/v1/namespaces/prod/events/checking-8ca3h.1401bb2e3dde0db1
    uid: d1906698-5573-11e5-8641-0af397f74e73
  reason: created
  source:
    component: kubelet
    host: 10.50.1.89
- apiVersion: v1
  count: 1
  firstTimestamp: 2015-09-07T15:19:23Z
  involvedObject:
    apiVersion: v1
    fieldPath: spec.containers{checking}
    kind: Pod
    name: checking-8ca3h
    namespace: prod
    resourceVersion: "5369357"
    uid: fa841316-5572-11e5-8641-0af397f74e73
  kind: Event
  lastTimestamp: 2015-09-07T15:19:23Z
  message: |
    Failed to start with docker id 7e3e1ee29f02 with error: API error (500): Cannot start container 7e3e1ee29f02742a02cdce2115612e82797f739755a02481e39bb63679be8b72: [8] System error: read parent: connection reset by peer
  metadata:
    creationTimestamp: 2015-09-07T15:19:23Z
    deletionTimestamp: 2015-09-07T16:19:23Z
    name: checking-8ca3h.1401bb2e50cb3449
    namespace: prod
    resourceVersion: "5371983"
    selfLink: /api/v1/namespaces/prod/events/checking-8ca3h.1401bb2e50cb3449
    uid: d1c19ec0-5573-11e5-8641-0af397f74e73
  reason: failed
  source:
    component: kubelet
    host: 10.50.1.89

Always upgrade pods if they use changed secrets

If a secret changes that is used by a rc / pod, redeploy even if md5 the same!

Remove the requirement for a defaults.yaml and ensure sensible default values

At the moment a defaults.yaml file is required with at least the following:

EnvFileGlobPath: ./environments/*.yaml
UsePrivateRegistry: false

Remove this requirements and set the settings above as defaults.

Enable re-deploy of unhealthy pods even when not changed

At the moment, if a pod fails (e.g. no image etc...) a subsequent deploy will pass as the failed resource is not re-deployed.

Either:

Add a function to determine health of an existing pod or controller (using existing code) to determine if a deploy needs to run when md5 matches.
Delete the md5 label for failed resources at deploy time (I think I prefer this).

Support multi-container pod log reporting

At the moment, no logs will be shown for containers with multiple containers.

e.g:

Waiting for 'checking-v8-kj7cv'...Detected restarting container:'proxy'. Backing off to check again in 10
.
Error, failing pods...
Error, failing pods...
Error messages for pod:checking-v8-kj7cv

$

Add Fn::FileData

Sometimes we just want to import data as a variable value e.g:

my_var:
  Fn::FileData: /tmp/file

Enable multiple RC creation from a template (e.g. to allow for HA)

At the moment, when a POD needs a specific node selector (to facilitate high availability data deployments), multiple RC's must be created on disc.

The following pattern would allow any resource controller named es-template to be used as a template to create three RC's called es-data-az-1, es-data-az-2, es-data-az-3 and with the variable ${ az } set appropriately as the template is added to the deployment.

Additionally, multiple resource controller entries could be specified:

For multiple RC's defined within the same directory.
For a static list of multiple RC's from the same template name (RcName is a static value).
For a dynamic list of RC's using the optional EnumName and EnumValues (RcName is generated from the EnumValues).

Deploy:
  Path: ../kb8or_deploys/elasticsearch
  MultiRC:
    Name: es-template
    Items:
    - RcName: es-data
      VarData:
        replicas: 1
        EnumName: az
        EnumValues:
        - az-1
        - az-2
        - az-3