
projectcontour / gimbal

662 stars · 41 watchers · 92 forks · 30.39 MB

Gimbal is an ingress load balancing platform capable of routing traffic to multiple Kubernetes and OpenStack clusters. Built by Heptio in partnership with Actapio.

Home Page: https://github.com/projectcontour/gimbal

License: Apache License 2.0

Languages: Go 98.77%, Makefile 0.81%, Dockerfile 0.42%
Topics: kubernetes, envoy, ingress, openstack, loadbalancer

gimbal's People

Contributors

alexbrand, bsteciuk, castrojo, davecheney, etiennecoutaud, jeremyrickard, jonasrosland, krisdock, poidag-zz, quiye, rosskukulinski, stevesloka, sunjaybhatia, tinygrasshopper, vmogilev, yutaokaz


gimbal's Issues

Discovery of K8s namespaces that do not exist in the Gimbal cluster fails

Given that the Kubernetes discoverer does not create namespaces in the Gimbal cluster, and that the discoverer is watch-based, any backend services that live in namespaces that do not exist in the Gimbal cluster will not be discovered unless the discoverer is restarted.

As a Gimbal operator, I would like the ability to define these missing namespaces after the discoverer has started and have the discoverer retry the creation of these Service/Endpoints objects.

Sample RBAC permissions for team namespace

Need to define some sample RBAC permissions that can be applied to team namespaces. These would be applied to namespaces in the Gimbal cluster and should only allow users the following permissions (a sketch follows the list):

  • Ingress: "get", "list", "watch", "create", "update", "patch", "delete"
  • Services and Endpoints: "get", "list", "watch"
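
A minimal sketch of such a Role, assuming Ingress is still served from the extensions API group (as it was at the time); the Role name and team namespace are illustrative:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-member   # illustrative name
  namespace: team-a   # illustrative team namespace
rules:
# Full control over Ingress objects in the team namespace
- apiGroups: ["extensions"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Read-only access to discovered Services and Endpoints
- apiGroups: [""]
  resources: ["services", "endpoints"]
  verbs: ["get", "list", "watch"]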

Solicit feedback on supporting upstreams with non-routable Pod/VM IPs

At launch, Gimbal requires that discovered application VMs (OpenStack) and Pods (Kubernetes) have routable IPs that can be reached from the Gimbal cluster. While this is sufficient for our initial user and other users with flat IP namespaces, it will likely prohibit other common scenarios.

We should extend Gimbal to support Kubernetes and OpenStack deployments that do not provide routable IPs. This could include clusters that use an overlay network (e.g. Weave or Flannel) or that simply do not provide routable IPs.

One proposed solution would be to configure a GRE tunnel per upstream cluster.

The goal of this issue is to solicit feedback from the community about their deployments and use cases so that we can design a viable solution.

Namespace ignore list

Add the ability for the discoverer to read a ConfigMap or argument list of namespaces that it should not watch for changes (a sketch follows).
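
As a sketch, the list could be passed as a container argument in the discoverer Deployment; note that --ignore-namespaces is a hypothetical flag name, while --cluster-name already exists:

containers:
- name: discoverer
  image: <discoverer-image>   # placeholder
  args:
  - --cluster-name=remote-cluster-a               # existing flag
  - --ignore-namespaces=kube-system,kube-public   # hypothetical flag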

Cleanup deployment directory

  • Update Contour to v0.4.1 (not the routecrd branch)
  • Ordered YAML creation for Prometheus & Grafana (e.g. 01-, 02-, …)

Allow Prometheus data to persist to PVC

Currently, Prometheus data is stored in the pod's temporary storage. We need to provide an example that mounts the data on a node's local storage. The deployment should be pinned to the same node, with PVs/PVCs used accordingly (a sketch follows).
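
A minimal sketch, assuming an illustrative namespace and storage size:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data        # illustrative name
  namespace: gimbal-monitoring # illustrative namespace
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi            # illustrative size

The Prometheus deployment would then swap its temporary volume for the claim:

volumes:
- name: prometheus-data
  persistentVolumeClaim:
    claimName: prometheus-data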

Multi-team functionality through IngressRoute Delegation

Teams should not be able to use Virtual Hosts or Services that do not belong to them.

Adding a new team

  1. Administrator creates a like-named Namespace and Tenant in all OpenStack, Kubernetes, and Gimbal clusters and provides RBAC credentials as necessary to the new team
  2. Administrator defines the Virtual Hosts the new team may use in the Discovery Custom Resource Definition
  3. Team deploys applications and then configures their Route CRD based on the discovered Services

Contour can be put into an enforcing mode where only whitelisted namespaces may have root IngressRoutes. A sketch of the delegation model follows.
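
A rough sketch using Contour's IngressRoute delegation; the FQDN, namespaces, and Service names are illustrative:

apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: team-a-root
  namespace: root-routes   # illustrative whitelisted root namespace
spec:
  virtualhost:
    fqdn: app.team-a.example.com
  routes:
  - match: /
    delegate:              # hand the route off to the team's namespace
      name: team-a-routes
      namespace: team-a
---
apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: team-a-routes
  namespace: team-a
spec:
  routes:
  - match: /
    services:
    - name: hello-origin-k8s   # a discovered Service
      port: 80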

Liveness Probes

We should look at the discoverer deployments and define probes that assist with health. We have the metrics components, but if something starts to error, it would be good to have the pods attempt to self-heal in addition to reporting error status. A sketch follows.
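
For example, if the discoverers exposed an HTTP health endpoint (the path and port here are hypothetical), each Deployment could add:

livenessProbe:
  httpGet:
    path: /healthz   # hypothetical health endpoint
    port: 8080       # hypothetical port
  initialDelaySeconds: 10
  periodSeconds: 10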

Performance benchmarking

Gimbal needs to support internet-scale workloads. As such, we should test Gimbal to ensure that it is capable of handling the following:

  • Ingress: 10s of Gbps
  • Egress: 10s of Gbps
  • X million concurrent connections
  • Latency: p99 maximum 20ms-30ms RTT

We should document a suggested hardware footprint to support this amount of traffic.

Enable route configuration across multiple services

The Kubernetes Ingress object allows only a single Service per route path. For Gimbal, users should be able to define multiple Services (from Kubernetes or OpenStack clusters) that will receive traffic for a given route, and to define the load balancing strategy (RoundRobin, Random, Least load, EWMA).

This enables a wide variety of use cases, including weight-shifting and A/B testing, as sketched below.
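
A sketch of what this could look like on an IngressRoute, assuming per-service weight fields; the Service names are illustrative:

apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: checkout
  namespace: team-a
spec:
  virtualhost:
    fqdn: shop.example.com
  routes:
  - match: /
    services:
    - name: checkout-primary-k8s        # illustrative discovered Service
      port: 80
      weight: 90                        # 90% of traffic
    - name: checkout-canary-openstack   # illustrative discovered Service
      port: 80
      weight: 10                        # 10% of traffic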

Rename --cluster-name flags on discoverers

The discovery naming conventions use "backend name" instead of "cluster name". We had a brief discussion on #102 about renaming the --cluster-name flag on the discoverers to --backend-name, so opening this issue to finish up that discussion.

Alertmanager docs

Should have examples of how to test alerts in Alertmanager so that once it's deployed, users can validate it is functioning correctly.

Additional Metrics

Is this a BUG REPORT, PERFORMANCE REPORT or FEATURE REQUEST?:

Feature Request

Need additional metrics exposed via discoverer:

  • upstream-services{cluster, tenant/namespace} -- number of upstream services, labeled by cluster and tenant/namespace
  • replicated-services{cluster, tenant/namespace} -- number of services replicated to the Gimbal cluster, labeled by cluster and tenant/namespace
  • invalid-services{cluster, tenant/namespace} -- number of services unable to be replicated, labeled by cluster and tenant/namespace
  • upstream-endpoints{cluster, tenant/namespace, service} -- number of endpoints (meaning IP:Port) in the upstream, labeled by cluster, namespace, and service name
  • replicated-endpoints{cluster, tenant/namespace, service} -- number of endpoints (meaning IP:Port) replicated to the Gimbal cluster, labeled by cluster, namespace, and service name
  • invalid-endpoints{cluster, tenant/namespace, service} -- number of endpoints (meaning IP:Port) unable to be replicated to Gimbal, labeled by namespace and service name

OpenStack discoverer logs incorrect updates

The OpenStack discoverer logs updates for services when no updates have happened. The following example shows the load balancer being added, then an update logged on each cycle even though it wasn't updated in OpenStack.

time="2018-05-08T13:10:17Z" level=info msg="Successfully handled: add service 'demo/lb1-2ad78ab3-aa94-43e4-a764-194a67601d16-openstack'"
time="2018-05-08T13:10:17Z" level=info msg="Successfully handled: add endpoints 'demo/lb1-2ad78ab3-aa94-43e4-a764-194a67601d16-openstack'"
time="2018-05-08T13:10:47Z" level=info msg="Successfully handled: update service 'demo/lb1-2ad78ab3-aa94-43e4-a764-194a67601d16-openstack'"
time="2018-05-08T13:11:17Z" level=info msg="Successfully handled: update service 'demo/lb1-2ad78ab3-aa94-43e4-a764-194a67601d16-openstack'"
time="2018-05-08T13:11:47Z" level=info msg="Successfully handled: update service 'demo/lb1-2ad78ab3-aa94-43e4-a764-194a67601d16-openstack'"
time="2018-05-08T13:12:19Z" level=info msg="Successfully handled: update service 'demo/lb1-2ad78ab3-aa94-43e4-a764-194a67601d16-openstack'"
time="2018-05-08T13:12:47Z" level=info msg="Successfully handled: update service 'demo/lb1-2ad78ab3-aa94-43e4-a764-194a67601d16-openstack'"
time="2018-05-08T13:13:21Z" level=info msg="Successfully handled: update service 'demo/lb1-2ad78ab3-aa94-43e4-a764-194a67601d16-openstack'"
time="2018-05-08T13:13:47Z" level=info msg="Successfully handled: update service 'demo/lb1-2ad78ab3-aa94-43e4-a764-194a67601d16-openstack'"
time="2018-05-08T13:14:17Z" level=info msg="Successfully handled: update service 'demo/lb1-2ad78ab3-aa94-43e4-a764-194a67601d16-openstack'"

Discoverer: Initialize gauge metrics to zero

Some of the metrics exposed by the discoverer components should probably be initialized to zero so that they show up in Grafana. An example that comes to mind is the number of discovery errors.

OpenStack discoverer: Improve handling of LB and Listener names

We use the LBaaS listener name as the port name. When the name is long enough, it can result in an invalid port name.

time="2018-04-26T14:54:23Z" level=error msg="error handling update endpoints 'vioadmin/1-61fb2abd-f007-4dd2-8784-9db807b27e4d-openstack': Endpoints "1-61fb2abd-f007-4dd2-8784-9db807b27e4d-openstack" is invalid: subsets[1].ports[0].name: Invalid value: "k8s-registry-listener-a2fd7571-5091-4a1f-8a65-610250ce8877-11000": must be no more than 63 characters"

Document sample RBAC rules for remote k8s clusters

For the Kubernetes discoverer, we should document RBAC rules that could be applied to remote clusters that are going to be discovered. These need to be documented in the docs, with samples provided in the deployment directory. A sketch follows.
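
A minimal sketch of the read-only access a discoverer needs on a remote cluster; the ClusterRole name is illustrative:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gimbal-discoverer   # illustrative name
rules:
# The discoverer only needs to read Services and Endpoints
- apiGroups: [""]
  resources: ["services", "endpoints"]
  verbs: ["get", "list", "watch"]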

Secure envoy admin interface

#89 adds network policies to limit who can access the admin interface (to gather metrics); however, it would be better to fully secure the interface. There's an open issue in Envoy; adding it here to track: envoyproxy/envoy#2763

Persist grafana dashboards and make them editable

Currently, Grafana dashboards are deployed via the dashboard provisioning feature. This is great because the dashboards come pre-loaded with the Grafana deployment. The downside, however, is that Grafana does not support modifying these dashboards and persisting the changes (we are also using ConfigMaps, which are read-only).

Ideally, the Grafana dashboards could be modified and the changes persisted to something like a PV+PVC.

Endpoints discovered from Kubernetes include nodeName and targetRef

Endpoints discovered by the kubernetes-discoverer include the nodeName and the targetRef, which reference resources that do not exist in the Contour cluster. NodeName could be helpful for future features, but I agree that targetRef should be removed.

apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: 2018-02-21T22:23:20Z
  labels:
    contour.heptio.com/cluster: origin-k8s
    contour.heptio.com/namespace: default
    contour.heptio.com/service: hello
  name: hello-origin-k8s
  namespace: default
  resourceVersion: "236374"
  selfLink: /api/v1/namespaces/default/endpoints/hello-origin-k8s
  uid: d28f23da-1755-11e8-ab00-f80f4182762e
subsets:
- addresses:
  - ip: 1.2.3.4
    nodeName: worker03
    targetRef:
      kind: Pod
      name: hello-hjn8n
      namespace: default
      resourceVersion: "905453"
      uid: 713bc282-16ce-11e8-a53a-fa163eab6398

Metrics by VHost

Is this a BUG REPORT, PERFORMANCE REPORT or FEATURE REQUEST?:

Feature Request

It would be good to view metrics from a team perspective or by VHost, so I could see requests per second against a VHost + path. Right now, metrics support allows those numbers against a backend, but that might not show the whole picture of what a team would be interested in.

Setup DNS for internal routing

Gimbal is focused on handling Ingress traffic. But another use-case is to handle internal application routing for multiple clusters.

A good example: an application is deployed to a cluster. If maintenance is required on that cluster, spin the application up on a second cluster and slowly transition work over to it. By providing a single DNS endpoint within the cluster, consumers of the application don't need to change any logic.

// cc @hhoover

Split Envoy from Contour

Each instance of Contour creates watches on Services, Endpoints, etc. Because we need to scale Envoy to handle more traffic, Contour currently scales with each instance of Envoy. We should split them apart to allow Envoy to scale separately as needed.

Contour allows us to specify the gRPC endpoint (https://github.com/heptio/contour/blob/master/cmd/contour/contour.go#L59); however, it's not secured.

Also, Envoy shouldn't run under the Contour service account, since it no longer needs the same access to the Kubernetes API.

Allowed namespace list

Add the ability for the discoverer to read a ConfigMap or argument list of namespaces that it should watch for changes (a ConfigMap-based sketch follows).
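
Complementing the args-based sketch for the ignore list, a hypothetical ConfigMap variant; the names and the watch-namespaces key are illustrative:

apiVersion: v1
kind: ConfigMap
metadata:
  name: discoverer-config      # illustrative name
  namespace: gimbal-discovery  # illustrative namespace
data:
  watch-namespaces: "team-a,team-b"   # hypothetical key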

Use kube-state-metrics for counting the number of services and endpoints discovered

The current Grafana dashboard displays the total number of services and endpoints that have been discovered. The dashboard computes these numbers based on the event_timestamp metric, which indicates the last time a service or endpoint was updated.

The problem with using this metric to compute the count is that if the discoverer is restarted, the Grafana dashboard will display zero until the services/endpoints are updated (because the "event_timestamp" metric will not be set).

Inspect OpenStack status fields when discovering services

OpenStack resources have a couple of status fields that we might have to inspect to determine whether we should route traffic to that LB or endpoint.

Load balancers have admin_state_up, provisioning_status and operating_status fields. Listeners, pools and pool members all have an admin_state_up field.

Metrics: API Latency

Need to implement an API latency metric: the milliseconds it takes for requests to return from a remote discoverer API (e.g. OpenStack).

Labels:

  • clustername
  • clustertype

Total Endpoints Discovered graph misleading

Is this a BUG REPORT, PERFORMANCE REPORT or FEATURE REQUEST?:

bug report

What happened:

Deployed master of Gimbal and looked at the Grafana dashboard for Gimbal Discovery. The Total Endpoints Discovered graph is misleading because Endpoints objects are not the same as the endpoint addresses available.

Scaling up a deployment in an upstream cluster from 1 Pod to 10 Pods results in the total endpoints graph remaining flat at 1.

What you expected to happen:

The graph should increase as remote deployments are scaled up or down.

As the term Endpoints is overloaded, we might consider renaming Total Endpoints to Remote Endpoint Addresses.

Anything else we should know:

Right now we're graphing kube_endpoint_labels, but we should be using kube_endpoint_address_available. The catch is that the kube-state-metrics code doesn't provide labels (which include the cluster name) on the kube_endpoint_address_available metric.

Documentation of service discovery naming conventions

Related to #71

We don't currently have any documentation about the naming schemes used by the discoverers. Discoverers should have a consistent naming convention, and we should document it accordingly.

e.g.

${discoverer-prefix}-${service-name}-${cluster-name}

Update dashboards

Some new metrics have been added; it would be good to show them in the included dashboards:

  • API Latency (OpenStack)
  • Cycle Duration (OpenStack)
  • Queue Size

Metrics: QueueSize

Need to implement the number of items in the process queue, with the following labels:

  • namespace
  • clustername
  • clustertype

Discoverer watch for changes to secret

The discoverer takes in a Secret containing access credentials for a remote cluster. It needs to watch for changes to this Secret and update its configuration accordingly, to allow for credential rotation. It could watch for file changes on disk or use a controller (a volume-based sketch follows).
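
One option is to mount the Secret as a volume: the kubelet eventually refreshes mounted Secret contents, so the discoverer can watch the files on disk. The Secret name and mount path are illustrative:

volumes:
- name: remote-credentials
  secret:
    secretName: remote-cluster-credentials   # illustrative name
containers:
- name: discoverer
  volumeMounts:
  - name: remote-credentials
    mountPath: /etc/gimbal/credentials       # illustrative path
    readOnly: true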

Metrics: Cycle Duration

Need to implement Cycle Duration: the milliseconds it takes for all objects to be synced from a remote discoverer API (e.g. OpenStack).

Labels:

  • clustername
  • clustertype

[Discoverers] Better handling of whitespace in --cluster-name field

It's possible to specify a cluster name that includes characters invalid for a Service (e.g. whitespace such as a newline). We should probably either strip whitespace (newlines are a common mistake in shell scripting) or refuse to start the discoverer due to an invalid cluster-name.

time="2018-04-09T01:33:26Z" level=error msg="error handling add service \"default/hello-bgp-cluster\n\": Service \"hello-bgp-cluster\\n\" is invalid: [metadata.name: Invalid value: \"hello-bgp-cluster\\n\": a DNS-1035 label must consist of lower case alphanumeric characters or '-', start with an alphabetic character, and end with an alphanumeric character (e.g. 'my-name',  or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?'), metadata.labels: Invalid value: \"bgp-cluster\\n\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')]"

Add issue templates

Create GitHub issue template(s) for common issue types: bug, performance, feature

Active Health Checking of Endpoints

While Kubernetes Services support native health checking, other upstream systems, including OpenStack, do not.

Gimbal should allow active health checking of upstream endpoints, grouped by Service (a sketch follows).
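
Contour's IngressRoute already models a per-service healthCheck block that could serve as a reference; the values here are illustrative:

routes:
- match: /
  services:
  - name: hello-origin-k8s   # a discovered Service
    port: 80
    healthCheck:
      path: /healthz             # health endpoint on the upstream
      intervalSeconds: 5
      timeoutSeconds: 2
      unhealthyThresholdCount: 3
      healthyThresholdCount: 2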

