Madhan-SWE commented on July 25, 2024

Scalability Notes:

Scalability:

1. Horizontal scaling - controller plugin (Deployment)
   - Need to enable autoscaling for the controller pod when resource usage exceeds 80%
   - Need to find how many volume requests a single controller replica can handle before usage reaches 80%
   - Need to find up to how many replicas the plugin code can run without any errors

2. Vertical scaling - node plugin (DaemonSet)
   - Since it is a DaemonSet (one pod per node), vertical scaling is the default
   - Need to find how many volume requests per node the plugin can handle

Metrics for testing:

Time of provisioning: The amount of time it takes to provision and attach 1000 volumes
   - average time across all volumes
   - median time
   - need to visualize the metrics
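A minimal sketch of the average and median computation, assuming each volume's provisioning is recorded as a hypothetical `(requested_at, attached_at)` pair of timestamps in seconds. The sample numbers are invented for illustration, not measured results.

```python
from statistics import mean, median


def provisioning_stats(samples):
    """Return (average, median) provisioning time in seconds.

    `samples` is a list of (requested_at, attached_at) timestamps;
    this input format is an assumption for illustration.
    """
    durations = [attached - requested for requested, attached in samples]
    if not durations:
        raise ValueError("no samples")
    return mean(durations), median(durations)


# Hypothetical timestamps for four volumes (not real measurements).
samples = [(0.0, 25.0), (1.0, 31.0), (2.0, 22.0), (3.0, 63.0)]
avg, med = provisioning_stats(samples)
print(f"average={avg:.2f}s median={med:.2f}s")
```

With one slow volume in the sample, the median (27.5s) stays close to the typical case while the average (33.75s) is pulled up, which is one reason to track both.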

Micro-benchmarking: Need to benchmark every operation of the plugin, e.g. controllerPublish, controllerCreate, nodePublish, nodeStage
   - can be saved for later
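If micro-benchmarking is picked up later, a harness along these lines could time repeated calls of a single operation. `benchmark` and `fake_node_stage` are hypothetical names; a real run would call the plugin's gRPC endpoint for nodeStage (or controllerPublish, etc.) instead of the placeholder.

```python
import time
from statistics import median


def benchmark(name, op, iterations=100):
    """Call `op` `iterations` times and return min/median/max latency in ms."""
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    timings_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        op()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "op": name,
        "iterations": iterations,
        "min_ms": min(timings_ms),
        "median_ms": median(timings_ms),
        "max_ms": max(timings_ms),
    }


def fake_node_stage():
    """Hypothetical stand-in for a nodeStage call (sleeps about 1 ms)."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(benchmark("nodeStage", fake_node_stage, iterations=20))
```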

Benchmarking for Kubernetes:
   - If the plugin just listens to kubelet logs, it is not expected to create any issues
   - If Kubernetes services are making requests during volume operations, we should also consider how many requests the affected service can handle
   - Even if we don't focus on Kubernetes itself, creating 100 pods will definitely affect the etcd service

Network latency testing:
 - latency testing: since the calls are gRPC calls, need to see how much time the plugin takes to respond to each request
 - no need to care about I/O and throughput, as those should be tested as part of the PowerVS cloud platform
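For the gRPC latency measurements, tail percentiles are usually more telling than the average. A minimal nearest-rank percentile helper, assuming latencies have already been collected client-side in milliseconds (how they are collected is not shown, and the sample values are invented):

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile for p in (0, 100]."""
    if not latencies_ms:
        raise ValueError("no latencies recorded")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


# Invented per-request latencies (ms) for illustration only.
latencies = [12, 15, 11, 40, 13, 14, 12, 250, 16, 13]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies, p)} ms")
```

Here the median is 13 ms while p95 and p99 are 250 ms, so a single slow request dominates the tail; the average alone (39.6 ms) would hide that.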

PowerVS cloud volume testing:
 - No need to benchmark the PowerVS cloud itself (e.g. how long the cloud takes to create disks), as the goal is to test the CSI driver



Tool for testing: kube-burner
  - https://github.com/cloud-bulldozer/kube-burner
  - Kube-burner is a tool aimed at stressing Kubernetes clusters by creating or deleting a high quantity of objects
  - can create the required number of pods and PVCs

We can also use OpenShift4-tools for benchmarking: https://github.com/RobertKrawitz/OpenShift4-tools

from ibm-powervs-block-csi-driver.

Madhan-SWE commented on July 25, 2024

/assign

Madhan-SWE commented on July 25, 2024

Need to use an existing tool for benchmarking.
Need to focus on the below operations:

Time of provisioning: The amount of time it takes to provision and attach 100 volumes
   - average time across all volumes
   - median time
   - need to visualize the metrics

Madhan-SWE commented on July 25, 2024

Create workloads using Kube-burner with the maximum disk size (100GB) and see how the system behaves.

mkumatag commented on July 25, 2024

> Create workloads using Kube-burner with the maximum disk size (100GB) and see how the system behaves.

Check the maximum volume size supported by PowerVS in the IBM Cloud PowerVS docs, then create a volume of that size and time it.

Madhan-SWE commented on July 25, 2024

The maximum supported volume size is not mentioned in the docs.
However, the UI shows the maximum size as 2TB.

Madhan-SWE commented on July 25, 2024

2000GB is the maximum supported disk size. Confirmed by Bobby.

Madhan-SWE commented on July 25, 2024

Used the command mkfs -t xfs /dev/dev-name to format the disk.
The table below shows the run time analysis:

Disk size   Run time (mkfs)
500GB       1m9s
1000GB      2m26s
1500GB      3m40s
2000GB      4m17s
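To check how mkfs time scales with disk size, the run times from the table can be converted to seconds and divided by the disk size. A small sketch; the `to_seconds` helper is illustrative and only handles the `XmYs` format used in the table.

```python
import re

# Disk size (GB) -> mkfs -t xfs run time, as reported in the table above.
MKFS_RUNTIMES = {500: "1m9s", 1000: "2m26s", 1500: "3m40s", 2000: "4m17s"}


def to_seconds(duration):
    """Convert a duration such as '4m17s' or '9s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds)


for size_gb, runtime in MKFS_RUNTIMES.items():
    secs = to_seconds(runtime)
    print(f"{size_gb:>5} GB: {secs:>4} s ({secs / size_gb:.3f} s/GB)")
```

The ratio stays between roughly 0.13 and 0.15 s/GB for all four sizes, which suggests that in these measurements mkfs time grows about linearly with disk size.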

Increasing the disk size increases the time needed to stage and publish the volume,
which in turn increases the time before the workloads are running.

The above table shows the run time for mkfs.
@mkumatag, do you think we should track the runtime of the below operations in the plugin?

Controller Plugin:

  • CreateVolume

Node Plugin:

  • blkid
  • mkfs
  • mount
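One way to track these runtimes in the node plugin is to wrap each external command in a timer and log the elapsed time. This sketch is in Python for brevity and is not the driver's code; `timed_run` is a hypothetical helper, and a harmless Python one-liner stands in for `mkfs` so the example can run anywhere.

```python
import logging
import subprocess
import sys
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("node-plugin-timing")


def timed_run(op_name, argv):
    """Run `argv`, log its duration under `op_name`, return (result, seconds).

    The elapsed time is logged even if the command fails; failures are
    re-raised as subprocess.CalledProcessError.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(argv, check=True, capture_output=True, text=True)
    finally:
        elapsed = time.monotonic() - start
        log.info("op=%s elapsed=%.3fs", op_name, elapsed)
    return result, elapsed


if __name__ == "__main__":
    # On a node this could be e.g. ["mkfs", "-t", "xfs", "/dev/dev-name"].
    _, seconds = timed_run("mkfs", [sys.executable, "-c", "pass"])
```

Logging the duration in a `finally` block means slow failures (for example a mount that eventually errors out) are still recorded.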

mkumatag commented on July 25, 2024

> Increasing the disk size increases the time needed to stage and publish the volume,
> which in turn increases the time before the workloads are running.

What's the time it takes to create the volume?

mkumatag commented on July 25, 2024

And will this time have an impact on the timeout value here: #32 (comment)?

Madhan-SWE commented on July 25, 2024

Tried to attach a 2000GB disk to a pod using Kube-burner (number of workloads = 1).
Volume creation was very fast (less than 5s), and attachVolume took 20s.

Madhan-SWE commented on July 25, 2024

> And will this time have an impact on the timeout value here: #32 (comment)?

We've given csi-provisioner a 100s timeout. We don't really need such a long timeout.

The current temporary wait loop in the code rarely goes into a second iteration (unless many disks are being created), and each iteration takes 5s.

This means that in most cases the created disk is available within 5 to 10s.
This is not yet conclusive; we need to create 100+ disks at a time and see how the system behaves.
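The wait loop described above can be sketched as "sleep 5s, then check, until a deadline". This is not the driver's actual code: `wait_for_disk` and `is_available` are hypothetical names, and the 100s default simply mirrors the csi-provisioner timeout mentioned above for illustration. The clock and sleep functions are injectable so the loop can be tried without real waiting.

```python
import time


def wait_for_disk(is_available, interval_s=5.0, timeout_s=100.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Sleep `interval_s`, then check `is_available()`, until it is True.

    Returns the number of iterations taken; raises TimeoutError once
    `timeout_s` has passed without the disk becoming available.
    """
    deadline = clock() + timeout_s
    iterations = 0
    while clock() < deadline:
        iterations += 1
        sleep(interval_s)
        if is_available():
            return iterations
    raise TimeoutError(f"disk not available after {iterations} iterations")


class FakeClock:
    """Deterministic clock for trying the loop without real sleeping."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake = FakeClock()
checks = iter([False, True])  # disk shows up on the 2nd check
print(wait_for_disk(lambda: next(checks), sleep=fake.sleep, clock=fake.time), fake.now)
# prints: 2 10.0
```

With a 5s interval, a disk found on the first check is seen after 5s and on the second after 10s, matching the 5 to 10s observed above.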

Madhan-SWE commented on July 25, 2024

20 volumes/iteration of size 10GB each took 7 min/iteration; successfully tested with 80 volumes.
30 volumes/iteration of size 1GB each took 7 min/iteration; successfully tested with 150 volumes.
60 volumes/iteration of size 1GB each took 60+ min/iteration; successfully tested with 120 volumes -> formatting and mounting 60 volumes at the same time takes a long time on the nodes.
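Converting these runs to throughput makes them easier to compare. A quick calculation using only the numbers above; because the 60-volume run took 60+ minutes, its rate is an upper bound, and the first run used 10GB disks, so it is not a like-for-like comparison with the others.

```python
# (volumes per iteration, disk size in GB, minutes per iteration) from the runs above.
# The 60-volume run took 60+ minutes, so its computed rate is an upper bound.
RUNS = [(20, 10, 7), (30, 1, 7), (60, 1, 60)]


def volumes_per_minute(volumes, minutes):
    return volumes / minutes


for volumes, size_gb, minutes in RUNS:
    rate = volumes_per_minute(volumes, minutes)
    print(f"{volumes} x {size_gb}GB: {rate:.2f} volumes/min")
```

Throughput rises from about 2.9 to 4.3 volumes/min between 20 and 30 volumes per iteration, then drops to at most 1 volume/min at 60, consistent with the formatting and mounting slowdown noted above.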

mkumatag commented on July 25, 2024

> 20 volumes/iteration of size 10GB each took 7 min/iteration; successfully tested with 80 volumes.
> 30 volumes/iteration of size 1GB each took 7 min/iteration; successfully tested with 150 volumes.
> 60 volumes/iteration of size 1GB each took 60+ min/iteration; successfully tested with 120 volumes -> formatting and mounting 60 volumes at the same time takes a long time on the nodes.

Please mention the number of workers we are trying with.

Madhan-SWE commented on July 25, 2024

> Please mention the number of workers we are trying with.

Number of worker nodes used in this test: 3.

Need to use the taint feature to schedule all volumes on a single node of the cluster using Kube-burner.

Madhan-SWE commented on July 25, 2024

Used a taint to schedule all the workloads on a single node.
Number of worker nodes: 3

Workloads/iteration   Run time/iteration   Iterations
10                    ~3m                  5
20                    ~3m                  5
30                    ~7m                  4
50                    ~10m                 2
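To relate these runs to the per-node volume limit, the total number of volumes placed on the node can be computed per row. This sketch assumes each workload (pod) mounts exactly one volume, which is an assumption rather than something stated in the table.

```python
# (workloads per iteration, approx. minutes per iteration, iterations) from the table above.
ROWS = [(10, 3, 5), (20, 3, 5), (30, 7, 4), (50, 10, 2)]
MAX_VOLUMES_PER_NODE = 127  # per-node limit mentioned in this thread


def total_volumes(workloads, iterations, volumes_per_workload=1):
    # Assumes one volume per workload unless told otherwise.
    return workloads * iterations * volumes_per_workload


for workloads, minutes, iterations in ROWS:
    total = total_volumes(workloads, iterations)
    secs_per_workload = minutes * 60 / workloads
    print(f"{workloads}/iteration x {iterations}: {total} volumes, "
          f"~{secs_per_workload:.0f}s per workload, "
          f"{MAX_VOLUMES_PER_NODE - total} below the limit")
```

Under that assumption none of these runs reached the 127-volume limit; the closest (30 x 4) placed 120 volumes on the node.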

Tried to attach 127+ volumes to a single node (127 is the maximum limit) by running 20 workloads/iteration.
On the 6th iteration, pods were stuck in Pending state.
Only 110 pods could be scheduled on the node.

[root@madhan-multinode-kubernetes-1 ~]# kubectl get pods --all-namespaces -o wide | grep madhan-multinode-kubernetes-2 | wc -l
110

Pods were in Pending state as there were too many pods on the node.

[root@madhan-multinode-kubernetes-1 ~]# kubectl describe pod app-6-8 -n kube-system-test
Name:         app-6-8
Namespace:    kube-system-test
...
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  25m                 default-scheduler  0/4 nodes are available: 4 persistentvolumeclaim "powervs-claim-6-8" not found.
  Warning  FailedScheduling  53s (x23 over 25m)  default-scheduler  0/4 nodes are available: 1 Too many pods, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had taint {key1: value1}, that the pod didn't tolerate.
  

Couldn't try attaching 127 volumes to the node, as Kubernetes didn't allow more than 110 pods per node (the default kubelet limit).

Solution
Need to modify the template so that each pod has 2 volumes, then retry with Kube-burner.
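The arithmetic behind the 2-volumes-per-pod idea, as a small sketch. `max_pods=110` matches the limit hit above; `reserved_pods` is a hypothetical allowance for pods (such as the CSI node plugin and other DaemonSets) already running on the node, and the value used here is made up.

```python
import math


def volumes_per_pod(target_volumes, max_pods=110, reserved_pods=0):
    """Minimum volumes each test pod must mount to reach target_volumes on one node."""
    available = max_pods - reserved_pods  # pod slots left for test workloads
    if available <= 0:
        raise ValueError("no pod slots left on the node")
    return math.ceil(target_volumes / available)


def pods_needed(target_volumes, per_pod):
    """Number of pods needed when each pod mounts per_pod volumes."""
    return math.ceil(target_volumes / per_pod)


per_pod = volumes_per_pod(127, reserved_pods=10)  # 10 is a made-up value
print(per_pod, pods_needed(127, per_pod))  # prints: 2 64
```

With 2 volumes per pod, 64 pods are enough to reach 127 volumes, well under the 110-pod limit even after leaving room for system pods.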

Madhan-SWE commented on July 25, 2024

> Used the command mkfs -t xfs /dev/dev-name to format the disk.
Any format that takes more than 1 minute is not acceptable. Check with the Power team.
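To see which disk sizes could meet a 1-minute budget, one option is to fit mkfs time as a straight line through the origin (seconds = k * GB) to the four measurements in the earlier mkfs table. The linear model is an assumption based on only four data points.

```python
# (disk size in GB, mkfs -t xfs seconds) from the earlier mkfs run-time table.
SAMPLES = [(500, 69), (1000, 146), (1500, 220), (2000, 257)]


def fit_seconds_per_gb(samples):
    """Least-squares slope k for the model seconds = k * size_gb (no intercept)."""
    numerator = sum(size * secs for size, secs in samples)
    denominator = sum(size * size for size, _ in samples)
    return numerator / denominator


def max_size_within(budget_s, k):
    """Largest disk size (GB) whose predicted mkfs time fits in budget_s."""
    return budget_s / k


k = fit_seconds_per_gb(SAMPLES)
print(f"k = {k:.4f} s/GB -> ~{max_size_within(60, k):.0f} GB fits in 60 s")
```

Under this model roughly 440GB is the largest disk that formats within a minute, consistent with the table, where even 500GB took 1m9s.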

Madhan-SWE commented on July 25, 2024

> 60 volumes/iteration of size 1GB each took 60+ min/iteration; successfully tested with 120 volumes -> formatting and mounting 60 volumes at the same time takes a long time on the nodes.

Check the audit logs and control plane logs from the storage. (May need to ask the Power storage team for the logs covering the relevant timeframe.)

k8s-triage-robot commented on July 25, 2024

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented on July 25, 2024

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented on July 25, 2024

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented on July 25, 2024

@k8s-triage-robot: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Madhan-SWE commented on July 25, 2024

/reopen

k8s-ci-robot commented on July 25, 2024

@Madhan-SWE: Reopened this issue.

In response to this:

/reopen

Madhan-SWE commented on July 25, 2024

/remove-lifecycle rotten

k8s-triage-robot commented on July 25, 2024

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Madhan-SWE commented on July 25, 2024

/remove-lifecycle stale

k8s-triage-robot commented on July 25, 2024

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Madhan-SWE commented on July 25, 2024

Scale tests have been added to the repo: https://github.com/kubernetes-sigs/ibm-powervs-block-csi-driver/tree/main/tests/scale/kube-burner
This issue can be closed.
A new issue can be opened in the future to re-test and document new results.

Madhan-SWE commented on July 25, 2024

/remove-lifecycle stale
