kubernetes-sigs / cluster-capacity

Cluster capacity analysis

License: Apache License 2.0

Go 70.33% Shell 13.83% Makefile 1.24% Python 9.18% Dockerfile 0.69% Starlark 4.71%
k8s-sig-scheduling

cluster-capacity's Introduction

Cluster capacity analysis framework


Implementation of cluster capacity analysis.

Intro

As new pods get scheduled on nodes in a cluster, more resources get consumed. Monitoring the available resources in the cluster is important because it lets operators increase capacity in time, before existing resources are exhausted, or take other steps that increase the available resources.

Cluster capacity consists of capacities of individual cluster nodes. Capacity covers CPU, memory, disk space and other resources.

The overall remaining allocatable capacity is only a rough estimate, because it does not account for how resources are distributed among individual nodes. The goal is to analyze the remaining allocatable resources and estimate the available capacity that is still consumable, expressed as the number of instances of a pod with given requirements that can still be scheduled in the cluster.
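
For example, if roughly 1.95 CPU per node remains allocatable to workloads after system reservations (an illustrative figure; the actual allocatable amount depends on the cluster), a pod requesting 150m of CPU fits floor(1950m / 150m) = 13 times on each node, so a four-node cluster of that shape could take roughly 52 more instances. This is the kind of estimate the tool produces, as the demonstration below shows.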

Build and Run

Build the framework:

$ cd $GOPATH/src/sigs.k8s.io
$ git clone https://github.com/kubernetes-sigs/cluster-capacity
$ cd cluster-capacity
$ make build

and run the analysis:

$ ./cluster-capacity --kubeconfig <path to kubeconfig> --podspec=examples/pod.yaml

For more information about available options run:

$ ./cluster-capacity --help

Demonstration

Assume a cluster running 4 nodes and 1 master, with each node having 2 CPUs and 4 GB of memory, and a pod with resource requirements of 150m of CPU and 100Mi of memory.

$ ./cluster-capacity --kubeconfig <path to kubeconfig> --podspec=pod.yaml --verbose
Pod requirements:
	- cpu: 150m
	- memory: 100Mi

The cluster can schedule 52 instance(s) of the pod.
Termination reason: FailedScheduling: pod (small-pod-52) failed to fit in any node
fit failure on node (kube-node-1): Insufficient cpu
fit failure on node (kube-node-4): Insufficient cpu
fit failure on node (kube-node-2): Insufficient cpu
fit failure on node (kube-node-3): Insufficient cpu


Pod distribution among nodes:
	- kube-node-1: 13 instance(s)
	- kube-node-4: 13 instance(s)
	- kube-node-2: 13 instance(s)
	- kube-node-3: 13 instance(s)

To decrease the available resources in the cluster you can use the provided ReplicationController (examples/rc.yml):

$ kubectl create -f examples/rc.yml

E.g. to change the number of replicas to 6, you can run:

$ kubectl patch -f examples/rc.yml -p '{"spec":{"replicas":6}}'

Once the number of running pods in the cluster grows and the analysis is run again, the number of schedulable pods decreases as well:

$ ./cluster-capacity --kubeconfig <path to kubeconfig> --podspec=pod.yaml --verbose
Pod requirements:
	- cpu: 150m
	- memory: 100Mi

The cluster can schedule 46 instance(s) of the pod.
Termination reason: FailedScheduling: pod (small-pod-46) failed to fit in any node
fit failure on node (kube-node-1): Insufficient cpu
fit failure on node (kube-node-4): Insufficient cpu
fit failure on node (kube-node-2): Insufficient cpu
fit failure on node (kube-node-3): Insufficient cpu


Pod distribution among nodes:
	- kube-node-1: 11 instance(s)
	- kube-node-4: 12 instance(s)
	- kube-node-2: 11 instance(s)
	- kube-node-3: 12 instance(s)

Output format

The cluster-capacity command has a flag --output (-o) to format its output as json or yaml.

$ ./cluster-capacity --kubeconfig <path to kubeconfig> --podspec=pod.yaml -o json
$ ./cluster-capacity --kubeconfig <path to kubeconfig> --podspec=pod.yaml -o yaml

The json or yaml output is not versioned and is not guaranteed to be stable across releases.
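
For ad-hoc inspection the JSON report can be piped through standard tooling; because the schema is unversioned, scripts should avoid hard-coding field paths:

$ ./cluster-capacity --kubeconfig <path to kubeconfig> --podspec=pod.yaml -o json | jq .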

Running Cluster Capacity as a Job Inside of a Pod

Running the cluster capacity tool as a job inside a pod has the advantage that it can be run multiple times without user intervention.

Follow these example steps to run Cluster Capacity as a job:

1. Create a Container that runs Cluster Capacity

In this example we build a simple Docker image using the Dockerfile found in the root directory and tag it as cluster-capacity-image:

$ docker build -t cluster-capacity-image .

2. Setup an authorized user with the necessary permissions

$ kubectl apply -f config/rbac.yaml

3. Define and create the pod specification (pod.yaml):

apiVersion: v1
kind: Pod
metadata:
  name: small-pod
  labels:
    app: guestbook
    tier: frontend
spec:
  containers:
  - name: php-redis
    image: gcr.io/google-samples/gb-frontend:v4
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 150m
        memory: 100Mi
      requests:
        cpu: 150m
        memory: 100Mi

The input pod spec file pod.yaml is provided to the cluster capacity analysis through a ConfigMap named cluster-capacity-configmap, which is mounted as the volume test-volume at the path /test-pod.

$ kubectl create configmap cluster-capacity-configmap \
    --from-file pod.yaml

4. Create the job specification (cluster-capacity-job.yaml):

apiVersion: batch/v1
kind: Job
metadata:
  name: cluster-capacity-job
spec:
  parallelism: 1
  completions: 1
  template:
    metadata:
      name: cluster-capacity-pod
    spec:
      containers:
      - name: cluster-capacity
        image: cluster-capacity-image
        imagePullPolicy: "Never"
        volumeMounts:
        - mountPath: /test-pod
          name: test-volume
        env:
        - name: CC_INCLUSTER
          value: "true"
        command:
        - "/bin/sh"
        - "-ec"
        - |
          /bin/cluster-capacity --podspec=/test-pod/pod.yaml --verbose
      restartPolicy: "Never"
      serviceAccountName: cluster-capacity-sa
      volumes:
      - name: test-volume
        configMap:
          name: cluster-capacity-configmap

Note that the environment variable CC_INCLUSTER in the example above is required; it indicates to the cluster capacity tool that it is running inside a cluster as a pod.

The pod.yaml key of the ConfigMap matches the pod specification file name, though this is not required. With this setup, the input pod spec file can be accessed inside the pod as /test-pod/pod.yaml.

5. Run the cluster capacity image as a job in a pod:

$ kubectl create -f cluster-capacity-job.yaml

6. Check the job logs to find the number of pods that can be scheduled in the cluster:

$ kubectl logs jobs/cluster-capacity-job
small-pod pod requirements:
        - CPU: 150m
        - Memory: 100Mi

The cluster can schedule 52 instance(s) of the pod small-pod.

Termination reason: Unschedulable: No nodes are available that match all of the
following predicates:: Insufficient cpu (2).

Pod distribution among nodes:
small-pod
        - 192.168.124.214: 26 instance(s)
        - 192.168.124.120: 26 instance(s)
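
If the analysis should run on a schedule rather than once, the same pod template can be wrapped in a CronJob instead of a Job. The following is only a sketch, assuming a cluster recent enough to offer batch/v1 CronJobs; it reuses the ConfigMap, service account and image from the steps above:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: cluster-capacity-cron
spec:
  schedule: "0 * * * *"   # hourly; adjust as needed
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cluster-capacity
            image: cluster-capacity-image
            imagePullPolicy: "Never"
            env:
            - name: CC_INCLUSTER
              value: "true"
            command:
            - "/bin/sh"
            - "-ec"
            - |
              /bin/cluster-capacity --podspec=/test-pod/pod.yaml --verbose
            volumeMounts:
            - mountPath: /test-pod
              name: test-volume
          restartPolicy: "Never"
          serviceAccountName: cluster-capacity-sa
          volumes:
          - name: test-volume
            configMap:
              name: cluster-capacity-configmap

Created the same way, e.g. kubectl create -f cluster-capacity-cronjob.yaml, each scheduled run then appears as a separate job whose logs can be read as in step 6.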

Pod spec generator: genpod

genpod is a tool internal to cluster capacity that can be used to create a sample pod spec. In general, users are recommended to provide their own pod spec file as input to the analysis.

Because pods belong to a namespace with resource limits and additional constraints (e.g. a node selector forced by a namespace annotation), it is natural to analyze how many instances of a pod with maximal resource requirements can be scheduled. To generate such a pod spec, you can run:

$ genpod --kubeconfig <path to kubeconfig>  --namespace <namespace>

This assumes at least one LimitRange object is available with at least one maximum resource type per pod. If multiple LimitRange objects exist in the namespace, the minimum of all per-type maxima is taken. If the namespace is annotated with openshift.io/node-selector, the selector is set as the pod's node selector.
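
The "minimum of all maxima" rule can be illustrated with a short Go sketch. This is not the project's implementation, only the selection logic described above, written against the k8s.io/api types:

package main

import (
    corev1 "k8s.io/api/core/v1"
)

// maxPodResources returns, per resource, the minimum of all Pod-type maxima
// found across the given LimitRanges; this is the value a generated stub pod would use.
func maxPodResources(limitRanges []corev1.LimitRange) corev1.ResourceList {
    result := corev1.ResourceList{}
    for _, lr := range limitRanges {
        for _, item := range lr.Spec.Limits {
            if item.Type != corev1.LimitTypePod {
                continue
            }
            for name, max := range item.Max {
                // Keep the smallest maximum seen so far for this resource.
                if cur, ok := result[name]; !ok || max.Cmp(cur) < 0 {
                    result[name] = max
                }
            }
        }
    }
    return result
}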

Example:

Assume the cluster-capacity namespace is annotated with openshift.io/node-selector: "region=hpc,load=high" and the resource limits are created (see examples/namespace.yml and examples/limits.yml):

$ kubectl describe limits hpclimits --namespace cluster-capacity
Name:           hpclimits
Namespace:      cluster-capacity
Type            Resource        Min     Max     Default Request Default Limit   Max Limit/Request Ratio
----            --------        ---     ---     --------------- -------------   -----------------------
Pod             cpu             10m     200m    -               -               -
Pod             memory          6Mi     100Mi   -               -               -
Container       memory          6Mi     20Mi    6Mi             6Mi             -
Container       cpu             10m     50m     10m             10m             -
$ genpod --kubeconfig <path to kubeconfig>  --namespace cluster-capacity
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: cluster-capacity-stub-container
  namespace: cluster-capacity
spec:
  containers:
  - image: gcr.io/google_containers/pause:2.0
    imagePullPolicy: Always
    name: cluster-capacity-stub-container
    resources:
      limits:
        cpu: 200m
        memory: 100Mi
      requests:
        cpu: 200m
        memory: 100Mi
  dnsPolicy: Default
  nodeSelector:
    load: high
    region: hpc
  restartPolicy: OnFailure
status: {}

Roadmap

Underway:

  • analysis covering scheduler and admission controller
  • generic framework for any scheduler created by the default scheduler factory
  • continuous stream of estimations

Would like to have soon:

  • include multiple schedulers
  • accept a list (sequence) of pods
  • extend analysis with volume handling
  • define a common interface each scheduler needs to implement to be embedded in the framework

Other possibilities:

  • incorporate re-scheduler
  • incorporate preemptive scheduling
  • include more of Kubelet's behaviour (e.g. recognize memory pressure, secrets/configmap existence test)

Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the community page.

You can reach the maintainers of this project at:

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

cluster-capacity's People

Contributors

adamdang, asifdxtreme, aveshagarwal, derekwaynecarr, dhodovsk, huiwq1990, ingvagabund, juneezee, k8s-ci-robot, knelasevero, mbssaiakhil, mooncak, my-git9, nikhita, ravisantoshgudimetla, ringtail, spiffxp, sufuf3, ydcool, yltsaize


cluster-capacity's Issues

"restartPolicy" and "dnsPolicy" shouldn't be "Required value"

"restartPolicy" and "dnsPolicy" shouldn't be "Required value". For example, estimate cluster capacity with below yaml (without "restartPolicy" and "dnsPolicy")

# cat simple-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: small-pod
  namespace: cluster-capacity
  labels:
    app: guestbook
    tier: frontend
spec:
  containers:
  - name: php-redis
    image: gcr.io/google-samples/gb-frontend:v4
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 100Mi
# cc --podspec simple-pod1.yaml
Failed to parse pod spec file: Invalid pod: "Required value: spec.restartPolicy, Required value: spec.dnsPolicy"

Usage of colors in CLI

Maybe I am color blind, but I find the use of colors in the CLI output really hard to read especially if we have a customized terminal.

For example, it is nearly impossible for me to read the number in the following text:

The cluster can schedule 26 instance(s) of the pod.

Can we remove that feature?

Improve clarity of command output

$ ./cluster-capacity --kubeconfig=/home/decarr/.kube/config --podspec=examples/pod.yaml 
E0213 16:24:54.181990    4663 factory.go:583] Error scheduling cluster-capacity small-pod-26: pod (small-pod-26) failed to fit in any node
fit failure summary on nodes : Insufficient cpu (1); retrying
The cluster can schedule 26 instance(s) of the pod.
Termination reason: FailedScheduling: pod (small-pod-26) failed to fit in any node

The presence of the error text actually makes you think the command didn't work as expected. Can we remove the error printed from the scheduler?

Versioned API

The API returned on watch status should be versioned.

cluster capacity comes in two flavors

kubernetes schedules two primary things:

  1. pods to nodes
  2. pvc to pv

thoughts about dividing the capacity command into two flavors to allow future expansion for clusters that do not have dynamic storage provisioning?

$ cluster-capacity compute // similar to current command
$ cluster-capacity storage // if we ever did something w/ pvc -> pv analysis?

Need better build instructions

Not sure what I am doing, but make build does not provide enough instruction to build these components.

go build -o hypercc github.com/ingvagabund/cluster-capacity/cmd/hypercc
../go/src/github.com/ingvagabund/cluster-capacity/vendor/github.com/ghodss/yaml/yaml.go:10:2: cannot find package "gopkg.in/yaml.v2" in any of:
	/root/go/src/github.com/ingvagabund/cluster-capacity/vendor/gopkg.in/yaml.v2 (vendor tree)
	/usr/lib/golang/src/gopkg.in/yaml.v2 (from $GOROOT)
	/root/go/src/gopkg.in/yaml.v2 (from $GOPATH)
...
../go/src/github.com/ingvagabund/cluster-capacity/pkg/framework/strategy/strategy.go:9:2: cannot find package "k8s.io/kubernetes/plugin/pkg/scheduler/schedulercache" in any of:
	/root/go/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/schedulercache (vendor tree)
	/usr/lib/golang/src/k8s.io/kubernetes/plugin/pkg/scheduler/schedulercache (from $GOROOT)
	/root/go/src/k8s.io/kubernetes/plugin/pkg/scheduler/schedulercache (from $GOPATH)
make: *** [build] Error 1

Reduce code in vendor dir to minimum possible

I think there is a lot of code in the vendor dir and not all of it seems necessary for cluster capacity. Removing the unnecessary code would make dependency management easier.

Should support URL for "--podspec"

I think we should support a URL for --podspec.
For example:

# cc --podspec https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/pod-burstable.yaml
Failed to parse pod spec file: Failed to open config file: open /data/src/github.com/ingvagabund/cluster-capacity/test/https:/raw.githubusercontent.com/openshift-qe/v3-testfiles/master/quota/pod-burstable.yaml: no such file or directory
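
A minimal sketch of what URL support could look like, assuming the pod spec is fetched with a plain HTTP GET and a non-200 status is reported much like kubectl does (illustrative code, not the project's implementation):

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
    "strings"
)

// readPodSpec fetches the pod spec from a URL or, if the scheme check fails,
// reads it from a local file.
func readPodSpec(location string) ([]byte, error) {
    if strings.HasPrefix(location, "http://") || strings.HasPrefix(location, "https://") {
        resp, err := http.Get(location)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return nil, fmt.Errorf("unable to read URL %q, status code=%d", location, resp.StatusCode)
        }
        return io.ReadAll(resp.Body)
    }
    return os.ReadFile(location)
}

func main() {
    spec, err := readPodSpec("https://example.com/pod.yaml") // example URL only
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("read %d bytes of pod spec\n", len(spec))
}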

[RFE] Should support user-specified spec.nodeName in pod yaml

The pod should be scheduled to the node specified by spec.nodeName in the pod yaml, and should not be scheduled to other nodes.

For example, the pod below should only be schedulable to ec2-54-161-105-73.compute-1.amazonaws.com, but the current code does not support this. Furthermore, a pod with spec.nodeName can also end up scheduled to a tainted or unschedulable node. You can refer to my test case: https://tcms-openshift.rhcloud.com/case/6053/?from_plan=

# cat nodename-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: small-pod1
  namespace: cluster-capacity
  labels:
    app: guestbook
    tier: frontend
spec:
  containers:
  - name: php-redis
    image: gcr.io/google-samples/gb-frontend:v4
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 100Mi
  restartPolicy: "OnFailure"
  dnsPolicy: "Default"
  nodeName: ec2-54-161-105-73.compute-1.amazonaws.com
# cc --podspec nodename-pod1.yaml
E1205 01:24:41.024684   17369 factory.go:583] Error scheduling cluster-capacity small-pod-17: pod (small-pod-17) failed to fit in any node
fit failure on node (ec2-54-158-106-111.compute-1.amazonaws.com): Insufficient cpu
fit failure on node (ec2-54-161-105-73.compute-1.amazonaws.com): Insufficient cpu
; retrying
The cluster can schedule 17 instance(s) of the pod.
Termination reason: FailedScheduling: pod (small-pod-17) failed to fit in any node

"--verbose" doesn't work in watch mode

The "--verbose" option doesn't work in watch mode, for example:

# cc --podspec ./simple-pod3.yaml --period 1 
E1128 22:12:36.321478   29192 factory.go:583] Error scheduling cluster-capacity small-pod-16: pod (small-pod-16) failed to fit in any node
fit failure on node (ec2-54-165-2-61.compute-1.amazonaws.com): Insufficient cpu
fit failure on node (ec2-52-90-124-47.compute-1.amazonaws.com): Insufficient cpu
; retrying
E1128 22:12:38.514192   29192 factory.go:583] Error scheduling cluster-capacity small-pod-16: pod (small-pod-16) failed to fit in any node
fit failure on node (ec2-54-165-2-61.compute-1.amazonaws.com): Insufficient cpu
fit failure on node (ec2-52-90-124-47.compute-1.amazonaws.com): Insufficient cpu
; retrying
# cc --podspec ./simple-pod3.yaml --period 1 --verbose
E1128 22:14:46.597077   30643 factory.go:583] Error scheduling cluster-capacity small-pod-16: pod (small-pod-16) failed to fit in any node
fit failure on node (ec2-54-165-2-61.compute-1.amazonaws.com): Insufficient cpu
fit failure on node (ec2-52-90-124-47.compute-1.amazonaws.com): Insufficient cpu
; retrying
E1128 22:14:48.796596   30643 factory.go:583] Error scheduling cluster-capacity small-pod-16: pod (small-pod-16) failed to fit in any node
fit failure on node (ec2-54-165-2-61.compute-1.amazonaws.com): Insufficient cpu
fit failure on node (ec2-52-90-124-47.compute-1.amazonaws.com): Insufficient cpu
; retrying

Wrong info of "--verbose" in watch mode

There are two problems with the output of "--verbose" in watch mode:

  1. Although I have created the namespace "cluster-capacity", the output of "--verbose" in watch mode still gives the error Unable to create next pod to schedule: Pod's namespace cluster-capacity not found: Unable to create getter for namespaces: Requested namespaces resource cluster-capacity not found.

  2. I think we don't need to show users the ReplicaSet details; I suggest removing E1205 00:43:23.701296 18452 reflector.go:361] github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:481: expected type *extensions.ReplicaSet, but watch event object had type *v1beta1.ReplicaSet from the "--verbose" output, because users will not always use 2>/dev/null to filter errors.

# kc get namespace | grep cluster-capacity
cluster-capacity   Active    2h
# cc --podspec examples/pod.yaml --period 1 --verbose
Unable to create next pod to schedule: Pod's namespace cluster-capacity not found: Unable to create getter for namespaces: Requested namespaces resource cluster-capacity not found

E1205 00:43:23.701296   18452 reflector.go:361] github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:481: expected type *extensions.ReplicaSet, but watch event object had type *v1beta1.ReplicaSet
E1205 00:43:23.810525   18452 factory.go:583] Error scheduling cluster-capacity small-pod-16: pod (small-pod-16) failed to fit in any node
fit failure on node (ec2-54-158-106-111.compute-1.amazonaws.com): Insufficient cpu
fit failure on node (ec2-54-161-105-73.compute-1.amazonaws.com): Insufficient cpu
; retrying
Pod requirements:
	- CPU: 150m
	- Memory: 100Mi

The cluster can schedule 16 instance(s) of the pod.
Termination reason: FailedScheduling: pod (small-pod-16) failed to fit in any node
fit failure on node (ec2-54-158-106-111.compute-1.amazonaws.com): Insufficient cpu
fit failure on node (ec2-54-161-105-73.compute-1.amazonaws.com): Insufficient cpu

Pod distribution among nodes:
	- ec2-54-158-106-111.compute-1.amazonaws.com: 9 instance(s)
	- ec2-54-161-105-73.compute-1.amazonaws.com: 7 instance(s)

Report capacity as a metric

As a cluster administrator I rely on prometheus metrics and alert manager rules to automate responding to cluster events. I want cluster capacity results to be exposed as a metric so I can alert operations and respond. Example metric:

# HELP cluster_capacity_schedulable_count number schedulable pods
# TYPE cluster_capacity_schedulable_count gauge
cluster_capacity_schedulable_count{podspec="basic_user_pod.yaml"} 23
cluster_capacity_schedulable_count{podspec="infra.yaml"} 1

With this data I am able to create a rule like this to automate a response:

    - alert: MaxCapacityInfra
      expr: cluster_capacity_schedulable_count{podspec="infra.yaml"} < 2
      for: 5m
      labels:
        severity: critical
      annotations:
        description: "Insufficient node capacity given for podspec {{ $labels.podspec }}"
        auto_heal: true
        auto_heal_playbook: playbooks/infra_node_scaleout.yml

Reference: https://prometheus.io/docs/instrumenting/

cc @aveshagarwal @derekwaynecarr @eparis
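
A minimal sketch of how such a gauge could be exposed with the Prometheus Go client (illustrative only; the metric name follows the example above and the value is hard-coded):

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// schedulableCount mirrors the cluster_capacity_schedulable_count metric proposed above.
var schedulableCount = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "cluster_capacity_schedulable_count",
        Help: "number schedulable pods",
    },
    []string{"podspec"},
)

func main() {
    prometheus.MustRegister(schedulableCount)

    // In a real integration the value would come from a cluster-capacity run;
    // here it is hard-coded purely for illustration.
    schedulableCount.WithLabelValues("basic_user_pod.yaml").Set(23)

    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}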

update package names for this repo

For example:
github.com/ingvagabund/cluster-capacity/pkg/api

Needs to become:
github.com/kubernetes-incubator/cluster-capacity/pkg/api

"Pod requirements" in output should use resource request, and have duplicate info in output

Problem 1: The "Pod requirements" in the output uses the resource limit value rather than the resource request.
Problem 2: The output contains duplicate fit-failure info when running with --verbose. Such as:

fit failure on node (ec2-54-165-2-61.compute-1.amazonaws.com): Insufficient cpu
fit failure on node (ec2-52-90-124-47.compute-1.amazonaws.com): Insufficient cpu

How to reproduce:

  1. Create a yaml file without resource limit
# cat simple-pod3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: small-pod
  namespace: cluster-capacity
  labels:
    app: guestbook
    tier: frontend
spec:
  containers:
  - name: php-redis
    image: gcr.io/google-samples/gb-frontend:v4
    imagePullPolicy: Always
    resources:
      requests:
        cpu: 150m
        memory: 100Mi
  restartPolicy: "OnFailure"
  dnsPolicy: "Default"
  2. Estimate cluster capacity
# cc --podspec ./simple-pod3.yaml --verbose
E1128 21:52:55.638500   14226 factory.go:583] Error scheduling cluster-capacity small-pod-16: pod (small-pod-16) failed to fit in any node
fit failure on node (ec2-54-165-2-61.compute-1.amazonaws.com): Insufficient cpu
fit failure on node (ec2-52-90-124-47.compute-1.amazonaws.com): Insufficient cpu
; retrying
Pod requirements:
	- cpu: 0
	- memory: 0

The cluster can schedule 16 instance(s) of the pod.
Termination reason: FailedScheduling: pod (small-pod-16) failed to fit in any node
fit failure on node (ec2-54-165-2-61.compute-1.amazonaws.com): Insufficient cpu
fit failure on node (ec2-52-90-124-47.compute-1.amazonaws.com): Insufficient cpu

Pod distribution among nodes:
	- ec2-54-165-2-61.compute-1.amazonaws.com: 7 instance(s)
	- ec2-52-90-124-47.compute-1.amazonaws.com: 9 instance(s)

Need to check whether the namespace in `metadata.namespace` of the pod yaml exists before estimating and scheduling pods

I think we need to check whether the namespace in metadata.namespace of the pod yaml exists before estimating and scheduling pods.
In other words, the user wants to know how many pods can be scheduled in the cluster, but there is no cluster-capacity namespace in the current cluster. In fact, the pod below cannot be created because the namespace cluster-capacity does not exist.

apiVersion: v1
kind: Pod
metadata:
  name: small-pod
  namespace: cluster-capacity
  labels:
    app: guestbook
    tier: frontend
spec:
  containers:
  - name: php-redis
    image: gcr.io/google-samples/gb-frontend:v4
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 100Mi
  restartPolicy: "OnFailure"
  dnsPolicy: "Default"
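
A minimal sketch of the requested pre-check using client-go (illustrative only; the kubeconfig path and namespace are examples):

package main

import (
    "context"
    "fmt"
    "log"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
    if err != nil {
        log.Fatal(err)
    }
    client, err := kubernetes.NewForConfig(config)
    if err != nil {
        log.Fatal(err)
    }

    // Fail fast if the namespace named in the pod spec does not exist.
    ns := "cluster-capacity"
    if _, err := client.CoreV1().Namespaces().Get(context.TODO(), ns, metav1.GetOptions{}); err != nil {
        log.Fatalf("namespace %q not found, refusing to estimate: %v", ns, err)
    }
    fmt.Printf("namespace %q exists, proceeding with the estimate\n", ns)
}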

Panic observed: converting extensions.ReplicaSet with the v1 codec instead of the extensions codec.

E1122 20:10:14.453800      15 runtime.go:64] Observed a panic: &errors.errorString{s:"extensions.ReplicaSet is not suitable for converting to [\"v1\"]"} (extensions.ReplicaSet is not suitable for converting to ["v1"])
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:70
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:63
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:49
/usr/lib/golang/src/runtime/asm_amd64.s:479
/usr/lib/golang/src/runtime/panic.go:458
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/runtime/codec.go:74
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/pkg/framework/watch/watch.go:79
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/pkg/framework/restclient/restclient.go:574
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/pkg/framework/restclient/restclient.go:640
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/restclient/request.go:659
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache/listwatch.go:76
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache/listwatch.go:95
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:294
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:204
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:87
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:88
/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:49
/usr/lib/golang/src/runtime/asm_amd64.s:2086
panic: extensions.ReplicaSet is not suitable for converting to ["v1"] [recovered]
	panic: extensions.ReplicaSet is not suitable for converting to ["v1"]
goroutine 129 [running]:
panic(0x1813b20, 0xc4202e05e0)
	/usr/lib/golang/src/runtime/panic.go:500 +0x1a1
github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:56 +0x126
panic(0x1813b20, 0xc4202e05e0)
	/usr/lib/golang/src/runtime/panic.go:458 +0x243
github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/runtime.EncodeOrDie(0x7f391aa986f8, 0xc42029c1b0, 0x2669380, 0xc4201da000, 0xc42029c1b0, 0x0)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/runtime/codec.go:74 +0xce
github.com/ingvagabund/cluster-capacity/pkg/framework/watch.(*WatchBuffer).EmitWatchEvent(0xc42000d5e0, 0x1a90004, 0x5, 0x2669380, 0xc4201da000, 0xc4202e05c0, 0x1)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/pkg/framework/watch/watch.go:79 +0xa7
github.com/ingvagabund/cluster-capacity/pkg/framework/restclient.(*RESTClient).createWatchReadCloser(0xc4201a4000, 0x1a9860a, 0xb, 0x267ee00, 0xc420136de0, 0x0, 0x0, 0x0)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/pkg/framework/restclient/restclient.go:574 +0x1386
github.com/ingvagabund/cluster-capacity/pkg/framework/restclient.(*RESTClient).Do(0xc4201a4000, 0xc4201e4000, 0x0, 0x42, 0x0)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/pkg/framework/restclient/restclient.go:640 +0x188d
github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/restclient.(*Request).Watch(0xc4202f49c0, 0x0, 0x0, 0x0, 0x0)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/restclient/request.go:659 +0x1e3
github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache.NewListWatchFromClient.func2(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc420343598, ...)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache/listwatch.go:76 +0x1b6
github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache.(*ListWatch).Watch(0xc420342840, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache/listwatch.go:95 +0x5e
github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch(0xc42035cd80, 0xc42022efc0, 0x0, 0x0)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:294 +0x5f9
github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).RunUntil.func1()
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:204 +0x33
github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc42024a7c0)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:87 +0x5e
github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc42024a7c0, 0x3b9aca00, 0x0, 0x1, 0xc42022efc0)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:88 +0xad
github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/wait.Until(0xc42024a7c0, 0x3b9aca00, 0xc42022efc0)
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:49 +0x4d
created by github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).RunUntil
	/home/jchaloup/Projects/kubernetes/src/github.com/ingvagabund/cluster-capacity/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:207 +0x1c4

Cluster capacity was not limited by limitrange

I found that the limit range is not taken into account while estimating cluster capacity.
Problem description:

  1. No pod should be scheduled when the limit/request values in the pod yaml are larger than the Max of the limit range
  2. No pod should be scheduled when the limit/request values in the pod yaml are smaller than the Min of the limit range
  3. The default limit/request of the limit range should be used to schedule pods when the limit/request values are empty in the pod yaml

How to reproduce:

  1. Create namespace "cluster-capacity"
# kc create namespace cluster-capacity
  2. Create a limit range in the namespace "cluster-capacity"
# cat limits.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: hpclimits
  namespace: cluster-capacity
spec:
  limits:
  - max:
      cpu: "400m"
      memory: 200Mi
    min:
      cpu: 20m
      memory: 12Mi
    type: Pod
  - default:
      cpu: 100m
      memory: 80Mi
    defaultRequest:
      cpu: 50m
      memory: 60Mi
    max:
      cpu: "200m"
      memory: 100Mi
    min:
      cpu: 10m
      memory: 6Mi
    type: Container
# kc create -f limits.yaml
  3. Estimate cluster capacity with limit-pod1.yaml, where the limit/request values in the pod yaml are larger than the Max of the limit range.
# cat limit-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: cluster-capacity-stub-container
  namespace: cluster-capacity
spec:
  containers:
  - image: gcr.io/google_containers/pause:2.0
    imagePullPolicy: Always
    name: cluster-capacity-stub-container
    resources:
      limits:
        cpu: 800m
        memory: 600Mi
      requests:
        cpu: 700m
        memory: 500Mi
  dnsPolicy: Default
  restartPolicy: OnFailure
status: {}
# cc --podspec ./limit-pod1.yaml --verbose

Actual result:
3. Pods are still scheduled based on the limit/request in the pod yaml file, without considering the limit range.

Expected result:
3. No instance of the pod can be scheduled.
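
The expected check against the Pod-type Max of a LimitRange could look roughly like this Go sketch (illustrative only, not the project's code):

package main

import (
    corev1 "k8s.io/api/core/v1"
)

// violatesPodMax reports whether the pod's aggregate requests exceed the
// Pod-type Max of the given LimitRange.
func violatesPodMax(pod *corev1.Pod, lr *corev1.LimitRange) bool {
    // Sum requests across all containers in the pod.
    total := corev1.ResourceList{}
    for _, c := range pod.Spec.Containers {
        for name, qty := range c.Resources.Requests {
            sum := total[name]
            sum.Add(qty)
            total[name] = sum
        }
    }

    for _, item := range lr.Spec.Limits {
        if item.Type != corev1.LimitTypePod {
            continue
        }
        for name, max := range item.Max {
            if qty, ok := total[name]; ok && qty.Cmp(max) > 0 {
                return true
            }
        }
    }
    return false
}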

Implement hypercc.

Currently, running hypercc exits silently. It should be implemented as a super-command over the other subcommands (currently cluster capacity and genpod).

Use cluster-capacity programatically

Is there a way we could easily use this tool programmatically? (It would probably require migrating to Go modules.)

Use case:
Create a single binary which would call cluster-capacity programmatically and take some actions based on the output
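
Until the packages are importable as a Go module, one workaround is to shell out to the CLI and consume its (unversioned) JSON output. A rough sketch, with example paths:

package main

import (
    "fmt"
    "log"
    "os/exec"
)

func main() {
    // Invoke the cluster-capacity binary; the paths here are examples only.
    out, err := exec.Command(
        "./cluster-capacity",
        "--kubeconfig", "/path/to/kubeconfig",
        "--podspec", "examples/pod.yaml",
        "-o", "json",
    ).Output()
    if err != nil {
        log.Fatalf("cluster-capacity failed: %v", err)
    }

    // The JSON schema is unversioned (see "Output format"), so decode it
    // defensively rather than hard-coding field paths.
    fmt.Printf("raw report: %s\n", out)
}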

Not able to detect the failure message for each node

./cluster-capacity --kubeconfig=k8s.kubeconfig --podspec 1.yaml --verbose
test-0 pod requirements:
- CPU: 2
- Memory: 4Gi

The cluster can schedule 0 instance(s) of the pod test-0.

Termination reason: Unschedulable: 0/21 nodes are available: 1 node(s) didn't find available persistent volumes to bind, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod anti-affinity rules, 1 node(s) had taint {node.k8s.test/lifecycle: offline}, that the pod didn't tolerate, 18 node(s) didn't match Pod's node affinity.

cluster-capacity hangs if pod.yaml contains spec.nodeName

cluster-capacity will hang if pod.yaml contains spec.nodeName; pods should be scheduled successfully based on the spec.nodeName in the pod yaml file.
For example:

# cat nodename-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: small-pod
  namespace: cluster-capacity
  labels:
    app: guestbook
    tier: frontend
spec:
  containers:
  - name: php-redis
    image: gcr.io/google-samples/gb-frontend:v4
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 100Mi
  restartPolicy: "OnFailure"
  dnsPolicy: "Default"
  nodeName: ec2-54-91-42-3.compute-1.amazonaws.com
# ps -ef|grep nodename-pod
root     31897 14314  0 00:48 pts/1    00:00:00 /data/src/github.com/ingvagabund/cluster-capacity/cluster-capacity --kubeconfig /etc/kubernetes/kubectl.kubeconfig --master https://172.18.5.209:6443 --podspec nodename-pod1.yaml --verbose

# gdb -p 31897
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 31897
Reading symbols from /data/src/github.com/ingvagabund/cluster-capacity/hypercc...done.
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 31902]
[New LWP 31901]
[New LWP 31900]
[New LWP 31899]
[New LWP 31898]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
runtime.epollwait () at /usr/lib/golang/src/runtime/sys_linux_amd64.s:444
444		MOVL	AX, ret+24(FP)
warning: File "/usr/lib/golang/src/runtime/runtime-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py:/usr/lib/golang/src/pkg/runtime/runtime-gdb.py".
To enable execution of this file add
	add-auto-load-safe-path /usr/lib/golang/src/runtime/runtime-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7.x86_64
(gdb) where
#0  runtime.epollwait () at /usr/lib/golang/src/runtime/sys_linux_amd64.s:444
#1  0x00000000004299a4 in runtime.netpoll (block=true, ~r1=0x0) at /usr/lib/golang/src/runtime/netpoll_epoll.go:67
#2  0x00000000004333bc in runtime.findrunnable (gp=0x1, inheritTime=false) at /usr/lib/golang/src/runtime/proc.go:1958
#3  0x0000000000433b4f in runtime.schedule () at /usr/lib/golang/src/runtime/proc.go:2075
#4  0x0000000000434229 in runtime.goexit0 (gp=0xc8201b4f00) at /usr/lib/golang/src/runtime/proc.go:2210
#5  0x000000000045ad7b in runtime.mcall () at /usr/lib/golang/src/runtime/asm_amd64.s:233
#6  0x00000000033fa600 in runtime.work ()
#7  0x00007ffd1ed59190 in ?? ()
#8  0x00000000033fa600 in runtime.work ()
#9  0x0000000000431022 in runtime.mstart () at /usr/lib/golang/src/runtime/proc.go:1068
#10 0x000000000045ac18 in runtime.rt0_go () at /usr/lib/golang/src/runtime/asm_amd64.s:149
#11 0x0000000000000008 in ?? ()
#12 0x00007ffd1ed591d8 in ?? ()
#13 0x0000000000000008 in ?? ()
#14 0x00007ffd1ed591d8 in ?? ()
#15 0x0000000000000000 in ?? ()

Provide integration tests to be run as part of CI

We need integration tests to verify the CC works.

Assumptions:

  1. kubernetes cluster is already deployed

The tests should consist of:

  1. building cc
  2. running cc with various settings (pod without admissions, pod with admissions, ...)
  3. running cc as a pod in the cluster
  4. ...

Magic command to run:

make integration-test

old (beta) topology labels are going away

Hello.

The labels failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region have been deprecated for almost a year now and are scheduled to be removed from kubernetes/kubernetes after 1.20 (Q1'21).

In prep for that I scanned all our repos looking for places that reference the old names and not the new ones. There are a couple such places in this repo. Without knowing exactly what this codebase is doing, I can't say for sure if it is a problem or not, so I am just filing issues.

Please take a look and be advised of this upcoming change.

Think about reducing the number of supported admission controllers

Given the evolving nature of admission control in k8s, I think we should evaluate reducing the number of admission controllers to a fixed set rather than the entire open ended list.

For reference see:
kubernetes/community#132

The candidate list could be limited to things that default resources, and things that constrain resources.

  1. LimitRanger
  2. ResourceQuota

I am also not sure I would want to expose the full set of admission controllers at this time versus just being quota aware or not.

Items to complete before first milestone release

Focus the tool on its primary use-case around simulated scheduling decisions in preparation for an initial milestone release candidate.

Must dos

  • Remove the use of any unversioned clients as they represent support risks across Kubernetes releases that were exposed in the rebase to k8s 1.6.

  • Drop support for --admission-control flag as it uses unversioned clients

  • Drop support for --resource-mode flag as it mandated running a mock API server with upstream admission controllers that are operating against unversioned APIs. Scheduler moved to versioned objects, and so this dichotomy exposed issues in the underlying caches that drive this simulator tool. It is not safe, practical, or required to have ResourceQuota awareness in an initial milestone.

  • Make clear the output from -o yaml|json flag is not versioned/stable/guaranteed across releases.

  • Drop support for --period flag. This flag causes the tool to run a server that can be interacted via curl and do "continuous cluster capacity analysis". The API exposed needs more iteration, and running a server like this opens the risk of escalation style scenarios. This can be explored in a future iteration, but its not required to reach an initial milestone.

  • Document genpod as an internal tool to the project. User can provide their own sample pods to drive simulation behavior without growing the support space of tools offered by this project in its first iteration.

  • Drop --verbose option in favor of a -q flag. If -q is provided, just output the number of pods that can be scheduled, but not the individual failure reasons per node. This will give us stable scripting against the CLI invocation.

  • Update documentation to reflect above

  • Cut branch

  • Push an image

Tests: create a test to count the number of running goroutines after each scheduler iteration

The cluster capacity framework runs a scheduler and an admission controller inside. As both components currently suffer from non-terminated goroutines, the number of hanging goroutines grows over time. There are two PRs (kubernetes/kubernetes#37148, kubernetes/kubernetes#37137) that need to be merged to fix the problem. To avoid similar issues, it makes sense to introduce a test that monitors consumed resources.

The goal is to create a test that runs the cluster capacity framework in a loop and counts the number of running goroutines. The test passes if the number of goroutines stays constant (or within an interval) over time, e.g. run the framework 20 times and compare the number of goroutines against the first iteration.
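
A minimal sketch of such a test, with the framework call stubbed out and a small tolerance for background goroutines:

package framework_test

import (
    "runtime"
    "testing"
)

// runAnalysis stands in for a single cluster-capacity scheduling iteration;
// a real test would call into the framework here.
func runAnalysis() {}

// TestGoroutineCount runs the framework repeatedly and fails if the number of
// goroutines keeps growing between the first and the last iteration.
func TestGoroutineCount(t *testing.T) {
    runAnalysis() // warm-up iteration establishes the baseline
    baseline := runtime.NumGoroutine()

    for i := 0; i < 20; i++ {
        runAnalysis()
    }

    const tolerance = 5 // allow a small, bounded fluctuation
    if got := runtime.NumGoroutine(); got > baseline+tolerance {
        t.Fatalf("goroutine leak suspected: baseline %d, now %d", baseline, got)
    }
}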

cluster capacity not taking a namespace's resource limits into account

With the current implementation, if a namespace has resource limits/requests associated with it but the pod spec provided to cluster capacity does not specify any resource limits/requests, the output is always 110. That appears to be the default per-node pod limit, which ignores the namespace's resource limits. Is this working as designed? I thought the input pod's resource limits/requests should be populated from the namespace's resource limits. For reference, genpod creates a pod spec by taking the namespace's resource limits into account.
@derekwaynecarr @ingvagabund

Need to check whether the file exists when using a URL

Please check whether the file exists when a URL is used:

# cc --podspec https://raw.githubusercontent.com/not-exist
Failed to parse pod spec file: Invalid pod: "Required value: metadata.name, Required value: spec.containers, Required value: spec.restartPolicy, Required value: spec.dnsPolicy" 

We can refer to kubectl:

# kc create -f https://raw.githubusercontent.com/not-exist
error: unable to read URL "https://raw.githubusercontent.com/not-exist", server reported 400 Bad Request, status code=400

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

why specify --kubeconfig and --master

I appear to have to provide both.

With a valid kubeconfig, I am getting:

$ ./cluster-capacity --kubeconfig=/home/decarr/.kube/config --podspec=examples/pod.yaml
(02:41:03 PM) decarr: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined

I am not in-cluster.

Building the tool

Are there any instructions on how to solve the problem mentioned in ticket #21? I tried it with 1.7 and with 1.8, and also tried adding the vendor directory to the GOPATH (besides a directory in my home directory), which doesn't work either.
