Comments (30)
Scalability Notes:
1. Horizontal scaling - controller plugin - Deployment
- Need to enable autoscaling for the controller pod when resource usage exceeds 80%
- Need to find how many volume requests a single controller replica can handle before usage reaches 80%
- Need to find up to how many replicas the plugin code can run without any errors
2. Vertical scaling - node plugin - DaemonSet
- Since it is a DaemonSet, vertical scaling is the default
- Need to find how many volume requests per node the plugin can handle
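The 80% trigger described above could be expressed as a HorizontalPodAutoscaler on the controller Deployment. A minimal sketch, assuming the Deployment name `ibm-powervs-block-csi-controller` and the `kube-system` namespace (both are assumptions; the actual names in the driver's manifests may differ):

```shell
# Hedged sketch: autoscale the controller Deployment on CPU utilization.
# The Deployment name and namespace below are assumptions, not taken from the repo.
cat > controller-hpa.yml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: powervs-csi-controller-hpa
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ibm-powervs-block-csi-controller
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # the 80% threshold discussed above
EOF
# kubectl apply -f controller-hpa.yml
```

The replica test ("up to how many replicas without errors") can then be driven by raising `maxReplicas` while load is applied.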
Metrics for testing:
Time of provisioning: The amount of time it takes to provision and attach 1000 volumes
- average of all volume times
- median of times
- need to visualize the metrics
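The average and median above can be computed with a small sort/awk pipeline; a sketch, where the file name and the sample durations are made up for illustration:

```shell
# Illustrative sketch: compute mean and median provisioning time from a
# file holding one duration (in seconds) per line. Values are sample data.
printf '12\n8\n20\n10\n15\n' > provision_times.txt
sort -n provision_times.txt | awk '
  { v[NR] = $1; sum += $1 }
  END {
    mean = sum / NR
    median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
    printf "mean=%.1f median=%.1f\n", mean, median
  }'
# → mean=13.0 median=12.0
```

The same per-volume times file can later feed a plotting tool for the visualization step.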
Micro-benchmarking: need to benchmark each and every operation of the plugin, e.g. controllerPublish, controllerCreate, nodePublish, nodeStage
- can be saved for later
Benchmarking for Kubernetes:
- If the plugin just listens to kubelet logs, it is expected not to create any issues
- If Kubernetes services make requests during volume operations, we should also consider how many requests the affected service can handle
- Even if we don't care about Kubernetes itself, creating 100 pods will definitely affect the etcd service
Network latency testing:
- Latency testing: since the calls are gRPC calls, we need to see how long the plugin takes to respond to each request
- No need to measure I/O and throughput, as those should be tested as part of the PowerVS cloud platform
PowerVS cloud volume testing:
- No need to benchmark the PowerVS cloud itself (e.g. how long the cloud takes to create disks), as the goal is to test the CSI driver
Tool for testing: kube-burner
- https://github.com/cloud-bulldozer/kube-burner
- Kube-burner is a tool aimed at stressing Kubernetes clusters by creating or deleting a high quantity of objects
- can create the required number of pods and PVCs
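A minimal sketch of a kube-burner job for this; the template path, namespace, and counts below are assumptions for illustration, not taken from the repo (see the kube-burner docs for the full config schema):

```shell
# Hedged sketch: a kube-burner job that creates 20 pod+PVC pairs per
# iteration. Paths, namespace, and counts are placeholders.
cat > pvc-density.yml <<'EOF'
jobs:
  - name: pvc-density
    jobIterations: 5
    qps: 20
    burst: 20
    namespace: kube-burner-test
    objects:
      - objectTemplate: templates/pod-with-pvc.yml
        replicas: 20
EOF
# Run against the current kubeconfig context:
# kube-burner init -c pvc-density.yml
```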
We can also use OpenShift4-tools for benchmarking: https://github.com/RobertKrawitz/OpenShift4-tools
from ibm-powervs-block-csi-driver.
/assign
Need to use an existing tool for benchmarking.
Need to focus on the below operations:
Time of provisioning: The amount of time it takes to provision and attach 100 volumes
- average of all volume times
- median of times
- need to visualize the metrics
Create workloads using Kube-burner with max disk size (100GB) and see how the system behaves.
> Create workloads using Kube-burner with max disk size (100GB) and see how the system behaves.

Check what the maximum volume size supported in PowerVS is in the IBM Cloud PowerVS docs, then try creating a volume of that size and time it.
The maximum volume size supported is not mentioned in the docs.
However, from the UI I can see the maximum size is 2TB.
2000GB is the maximum supported Disk size. Confirmed by Bobby.
Used the command `mkfs -t xfs /dev/dev-name` to format the disk.
The table below shows the run time analysis:
Disk Size | Run time |
---|---|
500GB | 1m9s |
1000GB | 2m26s |
1500GB | 3m40s |
2000GB | 4m17s |
Increasing the disk size definitely increases the time to stage and publish the volume, which in turn increases the time for the workloads to reach Running.
The above table shows the run time for `mkfs`.
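The timing method behind the table can be sketched with a sparse-file stand-in; the path and size here are illustrative, and on an actual node this would target the attached /dev device with xfsprogs installed:

```shell
# Sketch: time mkfs the same way for each size. A sparse file stands in
# for the PowerVS block device here; the path is illustrative.
img=/tmp/fmt-test.img
truncate -s 512M "$img"
# time mkfs -t xfs "$img"    # uncomment on a node with xfsprogs installed
stat -c %s "$img"            # prints 536870912 (512M)
```

Repeating this with the 500GB-2000GB devices and recording the `time` output reproduces the table above.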
@mkumatag, do you think that we should track runtime in the plugin for the below operations?
Controller Plugin:
- CreateVolume
Node Plugin:
- blkid
- mkfs
- mount
> Increasing the disk size definitely increases the time to stage and publish the volume, which in turn increases the time for the workloads to reach Running.

What's the time it takes to create the volume?
And will this time impact this timeout value here - #32 (comment)?
Tried to attach a 2000GB disk to a pod using Kube-burner (number of workloads = 1).
Create is super fast (less than 5s) and attaching the volume took 20s.
> And will this time impact this timeout value here - #32 (comment)?

We've given a 100s timeout to csi-provisioner. We don't really need that much timeout.
The current temporary wait loop in the code isn't going to a 2nd iteration most of the time (unless more disks are created), and each iteration takes 5s.
Which means, in most cases, the created disk is available in 5 to 10s.
This is not conclusive yet; we need to create 100+ disks at a time and see how the system behaves.
20 volumes/iteration of size 10GB each took 7 mins/iteration; successfully tested with 80 volumes.
30 volumes/iteration of size 1GB each took 7 mins/iteration; successfully tested with 150 volumes.
60 volumes/iteration of size 1GB each took 60+ mins/iteration; successfully tested with 120 volumes -> formatting and mounting 60 volumes at the same time takes a lot of time on the nodes.
> 20 volumes/iteration of size 10GB each took 7 mins/iteration; successfully tested with 80 volumes.
> 30 volumes/iteration of size 1GB each took 7 mins/iteration; successfully tested with 150 volumes.
> 60 volumes/iteration of size 1GB each took 60+ mins/iteration; successfully tested with 120 volumes -> formatting and mounting 60 volumes at the same time takes a lot of time on the nodes.

Please mention the number of workers we are trying with.
> 20 volumes/iteration of size 10GB each took 7 mins/iteration; successfully tested with 80 volumes.
> 30 volumes/iteration of size 1GB each took 7 mins/iteration; successfully tested with 150 volumes.
> 60 volumes/iteration of size 1GB each took 60+ mins/iteration; successfully tested with 120 volumes -> formatting and mounting 60 volumes at the same time takes a lot of time on the nodes.
> Please mention the number of workers we are trying with.

The number of worker nodes used in this test is 3.
Need to use the taint feature and schedule volumes on a single node of the cluster using Kube-burner.
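The single-node scheduling idea can be sketched as follows; the node names and the key1=value1 taint mirror the events shown later in the thread and are assumptions about this cluster:

```shell
# Hedged sketch: taint every worker except the target so that benchmark
# pods (which carry no toleration) can only land on one node. Node names
# are assumptions.
#   kubectl taint nodes worker-1 key1=value1:NoSchedule
#   kubectl taint nodes worker-3 key1=value1:NoSchedule
# Remove later with the trailing "-" form:
#   kubectl taint nodes worker-1 key1=value1:NoSchedule-

# To verify placement, count pods per node. Here canned `-o wide` output
# stands in for `kubectl get pods -n kube-system-test -o wide`:
printf 'app-1-1 1/1 Running 0 5m 10.0.0.1 worker-2\napp-1-2 1/1 Running 0 5m 10.0.0.2 worker-2\n' |
  awk '{print $7}' | sort | uniq -c
```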
Used taint to schedule all the workloads on a single node.
No of worker nodes: 3
No of workloads/iteration | Run time/iteration | No of iterations |
---|---|---|
10 | ~3m | 5 |
20 | ~3m | 5 |
30 | ~7m | 4 |
50 | ~10m | 2 |
Tried to attach 127+ volumes to a single node (127 is the max attach limit) by running 20 workloads/iteration.
On the 6th iteration, pods were in Pending state.
Could schedule only 110 pods on the node.
[root@madhan-multinode-kubernetes-1 ~]# kubectl get pods --all-namespaces -o wide | grep madhan-multinode-kubernetes-2 | wc -l
110
Pods were in Pending state as there were too many pods on the node.
[root@madhan-multinode-kubernetes-1 ~]# kubectl describe pod app-6-8 -n kube-system-test
Name: app-6-8
Namespace: kube-system-test
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 25m default-scheduler 0/4 nodes are available: 4 persistentvolumeclaim "powervs-claim-6-8" not found.
Warning FailedScheduling 53s (x23 over 25m) default-scheduler 0/4 nodes are available: 1 Too many pods, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had taint {key1: value1}, that the pod didn't tolerate.
Couldn't try attaching the 127th volume on the node, as Kubernetes doesn't allow more than 110 pods per node by default.
Solution
Need to modify template in such a way that each pod has 2 volumes and try with Kube-burner.
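The proposed template change might look like this kube-burner pod template; all names, the image, and the {{.Iteration}}/{{.Replica}} wiring are a sketch, not the repo's actual template:

```shell
# Hedged sketch: one pod mounting two PVCs, so 110 pods can drive up to
# 220 volume attachments. All names are placeholders.
cat > pod-two-pvcs.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: app-{{.Iteration}}-{{.Replica}}
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: vol-a
          mountPath: /data-a
        - name: vol-b
          mountPath: /data-b
  volumes:
    - name: vol-a
      persistentVolumeClaim:
        claimName: claim-a-{{.Iteration}}-{{.Replica}}
    - name: vol-b
      persistentVolumeClaim:
        claimName: claim-b-{{.Iteration}}-{{.Replica}}
EOF
```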
> Used the command `mkfs -t xfs /dev/dev-name` to format the disk. [...] Increasing the disk size definitely increases the time to stage and publish the volume. [...] Do you think that we should track runtime in the plugin for the below operations?

Any format that takes more than 1 minute is not acceptable. Check with the Power team.
> 20 volumes/iteration of size 10GB each took 7 mins/iteration; successfully tested with 80 volumes.
> 30 volumes/iteration of size 1GB each took 7 mins/iteration; successfully tested with 150 volumes.
> 60 volumes/iteration of size 1GB each took 60+ mins/iteration; successfully tested with 120 volumes -> formatting and mounting 60 volumes at the same time takes a lot of time on the nodes.
> The number of worker nodes used in this test is 3.
> Need to use the taint feature and schedule volumes on a single node of the cluster using Kube-burner.

Check the audit log and control-plane logs from the storage side. (May need to ask the Power storage team for the logs for the expected timeframe.)
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
In response to this:
> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, lifecycle/stale is applied
> - After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
> - After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
> You can:
> - Reopen this issue or PR with /reopen
> - Mark this issue or PR as fresh with /remove-lifecycle rotten
> - Offer to help out with Issue Triage
> Please send feedback to sig-contributor-experience at kubernetes/community.
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@Madhan-SWE: Reopened this issue.
In response to this:
> /reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Scale tests have been added to the repo: https://github.com/kubernetes-sigs/ibm-powervs-block-csi-driver/tree/main/tests/scale/kube-burner
This issue can be closed.
A new issue can be opened in the future in order to re-test and document the new results.
/remove-lifecycle stale