habitat-sh / habitat-operator

A Kubernetes operator for Habitat services

License: Apache License 2.0

Go 90.03% Makefile 2.97% Shell 6.27% Dockerfile 0.09% Mustache 0.64%
kubernetes habitat operator kubernetes-cluster

habitat-operator's Introduction

Build Status Go Report Card

habitat-operator

This project is currently unstable - breaking changes may still land in the future.

Overview

The Habitat operator is a Kubernetes controller designed to run and automatically manage Habitat services on Kubernetes. It does this by making use of Custom Resource Definitions (CRDs).

To learn more about Habitat, please visit the Habitat website.

For a more detailed description of the Habitat type, have a look here.
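
To give a rough idea of what the custom resource looks like, here is a minimal, illustrative sketch in Go. The authoritative definition lives in pkg/apis/habitat/v1beta1/types.go; the field names below are assumptions for illustration, not the exact schema.

package v1beta1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Habitat is an illustrative sketch of the custom resource the operator manages.
type Habitat struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec HabitatSpec `json:"spec"`
}

// HabitatSpec captures the user-supplied parameters; field names are assumptions.
type HabitatSpec struct {
	// Image is the Docker image of the Habitat service to run.
	Image string `json:"image"`
	// Count is the desired number of instances.
	Count int `json:"count"`
	// Service holds Habitat-specific options such as the topology and group.
	Service Service `json:"service"`
}

type Service struct {
	Topology string `json:"topology"`
	Group    string `json:"group,omitempty"`
}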

Prerequisites

  • Habitat >= 0.52.0
  • Kubernetes cluster with version 1.9.x, 1.10.x or 1.11.x
  • Kubectl version 1.10.x or 1.11.x

Installing

Make sure you have the Go compiler installed. Follow the installation instructions on the download page to learn about GOPATH. Then run the following command:

go get -u github.com/habitat-sh/habitat-operator/cmd/habitat-operator

This will put the built binary in $GOPATH/bin. Make sure this directory is in your PATH so you can run the binary from anywhere.

Building manually from source directory

Clone the code locally:

go get -u github.com/habitat-sh/habitat-operator/cmd/habitat-operator
cd ${GOPATH:-$HOME/go}/src/github.com/habitat-sh/habitat-operator

Then build it:

make build

This command will create a habitat-operator binary in the source directory. Copy this file somewhere in your PATH.

Usage

Running outside of a Kubernetes cluster

Start the Habitat operator by running:

habitat-operator --kubeconfig ~/.kube/config

Running inside a Kubernetes cluster

Building image from source

First build the image:

make image

This will produce a habitat/habitat-operator image, which can then be deployed to your cluster.

The name of the generated Docker image can be changed with the IMAGE variable, for example make image IMAGE=mycorp/my-habitat-operator. If the habitat-operator name is fine, the REPO variable can be used instead, e.g. make image REPO=mycorp, to generate the mycorp/habitat-operator image. Use the TAG variable to change the tag (the default value is taken from git describe --tags --always) and the HUB variable to push somewhere other than the default Docker Hub.

Using release image

Habitat operator images are located here; they are tagged with the release version.

Deploying Habitat operator

Cluster with RBAC enabled

Make sure to give the Habitat operator the correct permissions, so it is able to create and monitor the resources it needs. To do so, use the manifest files located under the examples directory:

kubectl create -f examples/rbac

For more information, see the README file in the RBAC example.

Cluster with RBAC disabled

To deploy the operator inside the Kubernetes cluster, use the Deployment manifest file located under the examples directory:

kubectl create -f examples/habitat-operator-deployment.yml

Deploying an example

To create an example service run:

kubectl create -f examples/standalone/habitat.yml

This will create a single-pod deployment of an nginx Habitat service.

More examples are located in the examples directory.

Contributing

Dependency management

This project uses go dep >= v0.4.1 for dependency management.

If you add, remove or change an import, run:

dep ensure

Testing

To run unit tests locally, run:

make test

Clean up after the tests with:

make clean-test

Our current setup does not allow e2e tests to run locally; they are best run on a CI setup with Google Cloud.

Code generation

If you change one of the types in pkg/apis/habitat/v1beta1/types.go, run the code generation script with:

make codegen

habitat-operator's People

Contributors

asymmetric, defilan, iaguis, indradhanush, kosyfrances, krnowak, lilic, surajssd, zeenix


habitat-operator's Issues

Topology should default to `none`

Currently the topology options in the operator are standalone and leader/follower; we need to add an option for no topology. The default option in Habitat is none, and the operator should default to that as well, which means:

The difference between standalone and none is that a service with the none topology will never update itself.

See the further discussion here.

Switch to pflag in test

I tried using pflag, but it did not work because flag registration/parsing seems to get overridden. Maybe it's because other parts of the code or its dependencies parse flags through init methods, but it needs some further investigation.

My attempt:

type testFlag struct {
	image      string
	kubeconfig string
	externalIP string
}

// ... (tf is a *testFlag populated from the flags below)

flags := flag.NewFlagSet(os.Args[0], flag.ContinueOnError)

flags.StringVar(&tf.image, "image", "", "habitat operator image, e.g. 'kinvolk/habitat-operator'")
flags.StringVar(&tf.kubeconfig, "kubeconfig", "", "path to the kubeconfig file")
flags.StringVar(&tf.externalIP, "ip", "", "external IP, e.g. minikube ip")

flags.Parse(os.Args[2:]) // Parsing starts at os.Args[2:] because the preceding arguments are test-related flags.

Fix hard dependency to minikube in Makefile

If minikube is not present/running, building the project results in an error being displayed:

❯ make
E0830 11:51:01.433090   24624 ip.go:48] Error getting IP:  Host is not running
go build -i github.com/kinvolk/habitat-operator/cmd/operator

Add support for --peer-watch-file

The supervisor should be started with the --peer-watch-file flag if the CRD has the key topology: leader.

The flag should be passed as an argument to the container.
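
A minimal sketch of how this could look when building the container spec; the "leader" value check and the /habitat/peers path are illustrative assumptions, not the operator's actual implementation:

package controller

import corev1 "k8s.io/api/core/v1"

// addPeerWatchFileArg appends the --peer-watch-file flag to the supervisor
// container when the requested topology is leader. The path /habitat/peers is
// an illustrative assumption, not the operator's actual mount point.
func addPeerWatchFileArg(container *corev1.Container, topology string) {
	if topology != "leader" {
		return
	}
	container.Args = append(container.Args, "--peer-watch-file", "/habitat/peers")
}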

ConfigMap not found error

After deleting the SG with kubectl, the operator still receives events on the Pod handler.

In those handlers, we expect the ConfigMaps to be there, but they aren't (as they are deleted in the onDelete), so we get errors like

level=error component=controller msg="configmaps \"example-encrypted-service-group\" not found"

Increase number of workers

We can add a parameter to control how many workers are started. These workers will pop jobs from the workqueue in parallel, improving performance.

See this for an example implementation.
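
A minimal sketch, assuming the client-go workqueue package and a simplified controller, of how a configurable number of workers could drain the queue in parallel; names such as runWorkers, processNextItem and sync are illustrative:

package controller

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/util/workqueue"
)

// Controller is a trimmed-down, illustrative version of the operator's controller.
type Controller struct {
	queue workqueue.RateLimitingInterface
}

// runWorkers starts numWorkers goroutines that process items from the workqueue in parallel.
func (c *Controller) runWorkers(numWorkers int, stopCh <-chan struct{}) {
	for i := 0; i < numWorkers; i++ {
		// wait.Until restarts the worker loop every second until stopCh is closed.
		go wait.Until(c.worker, time.Second, stopCh)
	}
}

func (c *Controller) worker() {
	for c.processNextItem() {
	}
}

func (c *Controller) processNextItem() bool {
	key, quit := c.queue.Get()
	if quit {
		return false
	}
	defer c.queue.Done(key)

	if err := c.sync(key.(string)); err != nil {
		// Requeue with rate limiting so transient failures are retried.
		c.queue.AddRateLimited(key)
		return true
	}
	// On success, reset the item's rate-limiting history.
	c.queue.Forget(key)
	return true
}

// sync is where the reconciliation for a single object key would happen.
func (c *Controller) sync(key string) error { return nil }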

Add support for --group flag

The habitat client supports the --group flag, which allows users to start the supervisors in a specific group.

In order to support this, we need to:

  • Add a group key to the CRD
  • Pass the --group flag as an argument to the containers

Introduce workqueue

To make sure individual events don't interfere with each other, the upstream Kubernetes suggestion is for controllers to implement a workqueue.

More info can be found here.
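
A minimal sketch, assuming the client-go informer and workqueue packages, of event handlers that only enqueue object keys and leave the actual work to the worker loop; the function and variable names are illustrative:

package controller

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// newQueueingHandlers returns event handlers that only enqueue "namespace/name"
// keys; the actual processing happens later, one item at a time, in the workers.
func newQueueingHandlers(queue workqueue.RateLimitingInterface) cache.ResourceEventHandlerFuncs {
	enqueue := func(obj interface{}) {
		key, err := cache.MetaNamespaceKeyFunc(obj)
		if err != nil {
			return
		}
		queue.Add(key)
	}

	return cache.ResourceEventHandlerFuncs{
		AddFunc:    enqueue,
		UpdateFunc: func(oldObj, newObj interface{}) { enqueue(newObj) },
		DeleteFunc: func(obj interface{}) {
			// Deletions may deliver a tombstone instead of the object itself.
			key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj)
			if err != nil {
				return
			}
			queue.Add(key)
		},
	}
}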

Remove dependency on minikube

The e2e target in the Makefile calls minikube ip.

It would be good not to expect a minikube installation to be present, and get the cluster IP some other way, e.g. by parsing ~/.kube/config.

RBAC rules for Operator

Create role based access control rules for the Habitat operator.

Kubernetes RBAC was promoted to v1 in Kubernetes 1.8, and major Kubernetes distributions turn it on by default, which means that the Kubernetes apiserver denies all access to its APIs by default. RBAC is there to enable access to those APIs.

The Habitat operator makes heavy use of the Kubernetes APIs, therefore we need to document the required RBAC roles in order for users to run the Habitat operator in a secure manner.

Decrease e2e tests running time

End-to-end tests currently run for 10+ minutes.

Ideas:

  • Disable Travis' PR test
  • Find a way to only run certain tests some of the time

Errors when deleting Habitat resource

When deleting a Habitat, the following errors are displayed:

ts=2017-11-08T12:02:17.98051371+01:00 level=info component=controller msg="deleted deployment" name=example-leader-follower-habitat
ts=2017-11-08T12:02:17.983844937+01:00 level=error component=controller msg="deployments.apps \"example-leader-follower-habitat\" not found"
ts=2017-11-08T12:02:17.98428139+01:00 level=error component=controller msg="Habitat could not be synced, requeueing" msg="deployments.apps \"example-leader-follower-habitat\" not found"

The actual Habitat is removed.

This seems like it could have been introduced in #113.

Configuration flags from `glog` leak into binary

Some of the flags returned by --help seem to come from the glog library, and have no effect on our own logging (e.g. -v value).

❯ ./operator --help                                                                            
Usage of ./operator:                                                                           
  -alsologtostderr                                                                             
        log to standard error as well as files                                                 
  -kubeconfig string                                                                           
        Path to a kubeconfig. Only required if out-of-cluster.                                 
  -log_backtrace_at value                                                                      
        when logging hits line file:N, emit a stack trace                                      
  -log_dir string                                                                              
        If non-empty, write log files in this directory                                        
  -logtostderr                                                                                 
        log to standard error instead of files                                                 
  -stderrthreshold value                                                                       
        logs at or above this threshold go to stderr                                           
  -v value                                                                                     
        log level for V logs                                                                   
  -vmodule value                                                                               
        comma-separated list of pattern=N settings for file-filtered logging

Create unit tests

The unit tests should include testing individual functions in the main code base.

Use cache instead of making API calls

Whenever possible, we should use the cache returned by the cache.NewInformer function to retrieve objects, instead of making calls to the API server.

Depends on #63.
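
A minimal sketch of looking an object up in the store returned by cache.NewInformer instead of calling the apiserver; the helper name and the choice of the Deployment type are illustrative:

package controller

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/tools/cache"
)

// getDeploymentFromCache looks a Deployment up in the informer's local store
// instead of issuing a GET against the apiserver.
func getDeploymentFromCache(store cache.Store, namespace, name string) (*appsv1.Deployment, error) {
	obj, exists, err := store.GetByKey(namespace + "/" + name)
	if err != nil {
		return nil, err
	}
	if !exists {
		return nil, fmt.Errorf("deployment %s/%s not found in cache", namespace, name)
	}

	d, ok := obj.(*appsv1.Deployment)
	if !ok {
		return nil, fmt.Errorf("unexpected object type %T in cache", obj)
	}
	return d, nil
}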

Run E2E tests automatically

This issue is to explore ways in which we can make running our E2E tests part of the CI process.

  • Parallelizing Travis jobs using the build matrix
  • Using the -j flag in make
  • Using a cron job to periodically run the E2E tests on master
  • Using a custom bash script that only runs the E2E tests on master
  • Only running the E2E tests in the job that tests the PR merge commit (e.g. checking $TRAVIS_PULL_REQUEST in a bash script)

Add Deployment watcher

We should have watchers for all resources that affect Habitats.

We have a pod watcher, but we still need to add a Deployment one.

Depends on #63.

Release process

There should be a documented release process, i.e. the steps we need to take when doing a new release of the Habitat operator. Here are a few steps that come to mind right now:

  • Build the image with make linux
  • Tag the image with the release version, following the versioning strategy vx.x.x
  • Push the image under the version tag as well as latest to hub.docker.com
  • Bump the tag version in the Habitat operator deployment manifest file examples/habitat-operator-deployment.yml
  • Update CHANGELOG.md with release notes

Update deployment when Habitat object is updated

Currently, if the Habitat object is updated (e.g. the image name or the number of replicas is changed), the deployment is not updated. Our reconciler only handles creating the deployment if it doesn't exist yet. In order to do upgrades of a Habitat application on Kubernetes, we need to handle image name updates.

Add support for custom namespace

We should allow the user to create the Custom Object in a namespace of their choosing.

If none is specified, it will be created in the default namespace.

Create deployment `onAdd`

Once the operator has received a new CR, it needs to create a Deployment using the CR's parameters.
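
A minimal sketch of what building that Deployment could look like; the parameters (name, namespace, image, count) stand in for fields of the CR and are assumptions, not the operator's actual implementation:

package controller

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newDeploymentForHabitat builds a Deployment from the custom resource's
// parameters. The argument names mirror likely CR fields and are illustrative.
func newDeploymentForHabitat(name, namespace, image string, count int32) *appsv1.Deployment {
	labels := map[string]string{"habitat-name": name}

	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: namespace,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &count,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "habitat-service",
						Image: image,
					}},
				},
			},
		},
	}
}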

Switch to a different logging library?

log-kit has the disadvantage that the logger instance has to be (or should be) passed around for logging to be possible.

Other libraries, like glog, don't have this UX issue.

Should we switch? Some people also like log15.

Running a hab Service Group inside of k8s

This is about running a Service Group as a collection of manually created pods, and confirming that the supervisors in the SG are able to talk to one another, and, for example, elect a leader.

Two types of pods will have to be created, since we're still relying on the current --peer mechanism, and therefore we run supervisors with different flags.

Checklist

  • Pods can fetch from the internet
  • Leader election succeeds

Use StatefulSets instead of Deployment

Currently we are using Deployments to deploy our Habitat service, but we do not know what we are deploying or what type of service it is; it could be anything from a DB to a simple Rails application. We should not just assume our Habitat service will be stateless.

A couple of advantages of StatefulSets:

  • Graceful deployment and scaling
  • Stable network identity
  • Graceful deletion and termination
  • Stable, persistent storage

These would be very useful, especially if our service is, for example, a DB.

Handle Ring Key

The operator should auto-generate a ring key and/or accept one provided by the user. This way all the containers are secured at the gossip layer.

More info here.

Use ownerReferences with CRD

Using OwnerReferences allows us to define relationships between Resources, so that deleting an owner can automatically delete owned resources.

Currently, we use OwnerReferences to associate a ConfigMap with a Deployment.

It could be useful to make all Resources we create dependent on the CustomResource, but there might currently be problems with that.
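
As an illustration of the pattern, a minimal sketch of attaching an OwnerReference to a ConfigMap so it is garbage-collected together with its owning Deployment; the helper name is hypothetical, not the operator's actual code:

package controller

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setConfigMapOwner marks the ConfigMap as owned by the Deployment, so deleting
// the Deployment lets Kubernetes garbage-collect the ConfigMap as well.
func setConfigMapOwner(cm *corev1.ConfigMap, d *appsv1.Deployment) {
	isController := true
	cm.OwnerReferences = []metav1.OwnerReference{{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Name:       d.Name,
		UID:        d.UID,
		Controller: &isController,
	}}
}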

Demo: Bind plus initial configuration

Create a demo for the operator. The demo will showcase the following features:

  • One Service group bound to a database.
  • We override the port on which the database listens and display that port information in the first service.
  • Similar to the bind demo, but also displaying how different fields in the manifest file (Habitat features) can be used together (configuration and bind feature in this case).

Rename CRD ServiceGroup

The current name, ServiceGroup, conflicts with the concept of a group in Habitat, as well as with the concept of a Service in Kubernetes; we need to come up with a better name for the CRD.

List of ideas:
