Comments (32)
While the expected NodeStageVolume call is being handled, another NodeStageVolume call is made for a different volume in the cluster. This could have caused the issue.
I1118 03:32:30.005319 1 node.go:92] NodeStageVolume: called with args {VolumeId:21a0ff66-bf01-45a3-add1-b4f4982854f9 PublishContext:map[wwn:6005076810830198a000000000000d09] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-41aacd4d-9d53-4168-bbc0-77cf56596e26/globalmount VolumeCapability:mount:<fs_type:"ext4" mount_flags:"rw" > access_mode:<mode:SINGLE_NODE_WRITER > Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1637056010413-8081-powervs.csi.ibm.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1118 03:32:40.031333 1 node.go:356] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1118 03:32:40.034375 1 node.go:356] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1118 03:32:40.035256 1 node.go:92] NodeStageVolume: called with args {VolumeId:1659d58d-5ae9-432c-8797-363e448a696d PublishContext:map[wwn:60050768108181d628000000000044ec] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d79ec46a-461b-4422-81c6-b99f4752c0e0/globalmount VolumeCapability:mount:<fs_type:"ext3" > access_mode:<mode:SINGLE_NODE_WRITER > Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1636700261501-8081-powervs.csi.ibm.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1118 03:32:41.953376 1 node.go:356] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1118 03:32:54.032953 1 node.go:356] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1118 03:32:54.036012 1 node.go:356] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
from ibm-powervs-block-csi-driver.
From the NodeStageVolume method, RescanSCSIBus() from the mounter is called, and there are no log lines after that call in the node plugin logs.
Explored the RescanSCSIBus() method: it runs the script /usr/bin/rescan-scsi-bus.sh to scan the SCSI bus. Ran the script manually; while scanning the SCSI bus for the 4th host, the script got stuck and the terminal is still hanging. This is why we couldn't see any logs after the RescanSCSIBus() call.
Commented out the call to RescanSCSIBus(), since its result is not used in the plugin.
When NodeStageVolume is called, it internally calls GetDevicePath to get the device path using the WWN.
GetDevicePath internally calls the Attach method from the fibrechannel library.
The following call chain created the issue:
NodeStageVolume -> mounter.GetDevicePath -> fibrechannel.Attach -> fibrechannel.searchDisk -> fibrechannel.scsiHostRescan
With RescanSCSIBus() commented out, the scsiHostRescan method gets stuck instead, and there are no logs after this call.
The scsiHostRescan method lists the directories under /sys/class/scsi_host/, goes into each folder named host1, host2, host3, host4, and writes to the scan file.
While writing to host4/scan, the method got stuck.
Couldn't write to the file manually either, due to permission issues.
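The rescan step described above can be sketched in shell. This is a hedged approximation of what fibrechannel.scsiHostRescan does (the real implementation is Go inside the fibrechannel library): write the wildcard triple "- - -" to each /sys/class/scsi_host/hostN/scan file, which asks the kernel to rescan every channel/target/LUN on that host. The directory argument is parameterised here only so the sketch can be exercised safely.

```shell
# Approximation of fibrechannel.scsiHostRescan as a shell function.
# Writing "- - -" (wildcard channel/target/LUN) to hostN/scan asks the
# kernel to rescan that SCSI host; the write blocks until the rescan
# completes, which is why a wedged host4 hangs the whole NodeStageVolume.
scsi_host_rescan() {
  local sysdir="${1:-/sys/class/scsi_host}"
  local scan
  for scan in "$sysdir"/host*/scan; do
    [ -e "$scan" ] || continue
    echo "- - -" > "$scan"
  done
}
```

Run against the real /sys/class/scsi_host this needs root, and against the wedged host4 the write would block exactly like the Go code does.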
Since scsiHostRescan was failing, tried to format the newly created disk manually.
The format command also hung and did not return any results.
[root@madhan-1-kube-1-22-2 ~]# mkfs -t ext4 -F -m0 /dev/dm-26
mke2fs 1.45.6 (20-Mar-2020)
Checking the format status also hangs.
[root@madhan-1-kube-1-22-2 powervs-csi-driver]# blkid -p -s TYPE -s PTTYPE -o export /dev/dm-26
Around 886 host rescans were running in the background.
[root@madhan-1-kube-1-22-2 ~]# ps aux | grep rescan-scsi-bus.sh | wc -l
886
Couldn't force kill the rescan script.
[root@madhan-1-kube-1-22-2 ~]# ps -eaf | grep rescan | grep 4191027
root 4191027 3631367 0 Nov18 ? 00:00:01 /bin/bash /usr/bin/rescan-scsi-bus.sh
[root@madhan-1-kube-1-22-2 ~]# kill -9 4191027
[root@madhan-1-kube-1-22-2 ~]# ps -eaf | grep rescan | grep 4191027
root 4191027 3631367 0 Nov18 ? 00:00:01 /bin/bash /usr/bin/rescan-scsi-bus.sh
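A likely explanation for kill -9 having no effect (an assumption, not confirmed in these logs): the script is blocked in an uninterruptible kernel I/O wait (process state D) on the write to host4/scan, and signals are not delivered until that I/O completes. The process state is visible in /proc; the helper below is only an illustration of how to check it.

```shell
# Print the one-letter scheduler state for a PID (R=running, S=sleeping,
# D=uninterruptible disk/IO sleep, Z=zombie). A process stuck in D cannot
# be killed, not even with SIGKILL, until the blocking I/O returns.
proc_state() {
  # /proc/<pid>/stat looks like "pid (comm) state ..."; comm may contain
  # spaces, so strip everything up to the closing parenthesis first.
  sed 's/^.*) //' "/proc/$1/stat" | cut -d' ' -f1
}
```

Running proc_state 4191027 against the stuck script above would be expected to print D, in which case a reboot is the usual way out.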
[root@madhan-1-kube-1-22-2 ~]# ps -eaf | grep rescan | grep 4191027
root 4191027 3631367 0 Nov18 ? 00:00:01 /bin/bash /usr/bin/rescan-scsi-bus.sh
What are the child processes running as part of this script? pstree may help.
[root@madhan-1-kube-1-22-2 ~]# ps -eaf | grep rescan | grep 4191027
root 4191027 3631367 0 Nov18 ? 00:00:01 /bin/bash /usr/bin/rescan-scsi-bus.sh

What are the child processes running as part of this script? pstree may help.

Restarted the host; will check the child processes if the process is still running after the restart.
Restarted the host and all the background processes were killed.
/usr/bin/rescan-scsi-bus.sh now runs without any issues.
Redeployed the CSI driver.
The controller plugin now always fails during Bluemix authentication with a timeout error.
Controller plugin logs:
[root@madhan-1-kube-1-22-2 powervs-csi-driver]# kubectl logs powervs-csi-controller-77ff978f87-5twfw -c powervs-plugin --follow
I1119 08:30:01.255610 1 driver.go:68] Driver: powervs.csi.ibm.com Version: v0.0.2
I1119 08:30:01.255669 1 controller.go:60] retrieving node info from metadata service
I1119 08:30:01.255683 1 metadata.go:27] retrieving instance data from kubernetes api
I1119 08:30:01.257671 1 metadata.go:32] kubernetes api is available
I1119 08:30:01.284241 1 controller.go:65] Metadata: &{cloudInstanceId:7845d372-d4e1-46b8-91fc-41051c984601 pvmInstanceId:9552c51d-5916-4ce5-a061-1e8bd7315ca8}
I1119 08:30:01.285915 1 controller.go:66] Cloud instance id: 7845d372-d4e1-46b8-91fc-41051c984601
I1119 08:30:01.286090 1 controller.go:68] apikey : ===========================================
I1119 08:30:01.286107 1 powervs.go:128] API Key: ===========================================
I1119 08:30:01.286124 1 powervs.go:130] session ERROR: <nil>, bxSess &{Config:0xc00062a0e0}
I1119 08:30:31.287004 1 powervs.go:136] Authentication ERROR: Post "https://iam.cloud.ibm.com/identity/token": dial tcp: i/o timeout
panic: Post "https://iam.cloud.ibm.com/identity/token": dial tcp: i/o timeout
goroutine 1 [running]:
github.com/ppc64le-cloud/powervs-csi-driver/pkg/driver.newControllerService(0xc0000a0230)
/root/e2etest/powervs-csi-driver/pkg/driver/controller.go:71 +0x55c
github.com/ppc64le-cloud/powervs-csi-driver/pkg/driver.NewDriver({0xc00056ff38, 0x4, 0x4})
/root/e2etest/powervs-csi-driver/pkg/driver/driver.go:92 +0x2b8
main.main()
/root/e2etest/powervs-csi-driver/cmd/main.go:33 +0x1a0
Error snippet:

    err = authenticateAPIKey(bxSess)
    klog.V(4).Infof("Authentication ERROR: %+v", err)
    if err != nil {
        return nil, err
    }
The node plugin, which runs the same snippet, is running on the same node without any errors.
The same version of the controller plugin is running without any issues in the new cluster.
The powervs-plugin container in the controller pod couldn't connect to the internet.
Tried to install ping and ping Google, but apt-get update itself failed due to DNS resolution issues.
[root@madhan-1-kube-1-22-2 powervs-csi-driver]# kubectl exec -it powervs-csi-controller-77ff978f87-6h6x7 -c powervs-plugin /bin/sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
# apt-get update
0% [Working]
0% [Connecting to deb.debian.org] [Connecting to security.debian.org]
0% [Connecting to deb.debian.org] [Connecting to security.debian.org]
0% [Connecting to deb.debian.org] [Connecting to security.debian.org]
Err:1 http://deb.debian.org/debian buster InRelease
Temporary failure resolving 'deb.debian.org'
Err:2 http://security.debian.org/debian-security buster/updates InRelease
Temporary failure resolving 'security.debian.org'
0% [Connecting to deb.debian.org]
0% [Connecting to deb.debian.org]
Err:3 http://deb.debian.org/debian buster-updates InRelease
Temporary failure resolving 'deb.debian.org'
Reading package lists... Done
W: Failed to fetch http://deb.debian.org/debian/dists/buster/InRelease Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://security.debian.org/debian-security/dists/buster/updates/InRelease Temporary failure resolving 'security.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/buster-updates/InRelease Temporary failure resolving 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.
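The apt failures above are pure DNS ("Temporary failure resolving"), so the quickest check is to query the resolver directly instead of going through apt. A small illustrative helper (it assumes getent is available in the Debian-based plugin image, which is a guess about this particular image):

```shell
# Classify a hostname as resolvable or not, using the container's own
# resolver configuration (/etc/resolv.conf -> cluster DNS / coredns).
check_dns() {
  if getent hosts "$1" > /dev/null; then
    echo "resolved: $1"
  else
    echo "FAILED: $1"
  fi
}
```

Running check_dns deb.debian.org (or iam.cloud.ibm.com) inside the powervs-plugin container would print FAILED here, while the same call on the node host succeeds, which points at the pod network/DNS path rather than at the driver itself.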
A controller plugin with the same image is running without any errors in another cluster.
@mkumatag, any approaches on debugging this further?

This can be a generic issue with the deployed calico/dns pods. Can you check whether you can reach the outside network via the ping command? Worst case, you may just need to restart the calico/coredns pods.
Modified the plugin image by installing iputils-ping.
Tried pinging 8.8.8.8 from the container; the ping fails, as expected.
[root@madhan-1-kube-1-22-2 powervs-csi-driver]# kubectl exec powervs-csi-controller-77ff978f87-ngzfs -c powervs-plugin -- ping 8.8.8.8
command terminated with exit code 137
[root@madhan-1-kube-1-22-2 powervs-csi-driver]#
Planning to restart calico and coredns pods.
Calico has some network issues in the latest version.
Applying the fix below, given by @bkhadars, solved the issue:
systemctl start docker
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
# Flush All Iptables Chains/Firewall rules #
iptables -F
# Delete all Iptables Chains #
iptables -X
# Flush all counters too #
iptables -Z
# Flush and delete all nat and mangle #
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -t raw -F
iptables -t raw -X
systemctl restart docker
Ran e2e test cases to reproduce and analyse the issues.
There were no issues for 30 minutes; then the cluster started showing the format issue: #14
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
/reopen
/ptal
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.