kubevirt / kubevirtci
Contains cluster definitions and client tools to quickly spin up and destroy ephemeral and scalable k8s and ocp clusters for testing
License: Apache License 2.0
+ make cluster-up
./cluster/up.sh
Downloading .......
Downloading .......
2018/05/17 21:46:46 Waiting for host: 192.168.66.101:22
provisioning node node01 failed
make: *** [cluster-up] Error 1
+ make cluster-down
./cluster/down.sh
Build step 'Execute shell' marked build as failure
An additional problem is that the output does not show any useful information. I think we can add the -x option to the ssh.sh script.
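A minimal sketch of what that could look like (exactly where in ssh.sh the flag belongs is an assumption on my part):
#!/bin/bash
set -x   # trace each command so a failed provisioning step shows up in the job output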
The okd Dockerfile has some python2 leftovers: it installs python2-pip and runs RUN pip2 install yq.
kubevirtci (getting_started) $ cat cluster-provision/okd/base/Dockerfile
FROM fedora@sha256:a66c6fa97957087176fede47846e503aeffc0441050dd7d6d2ed9e2fae50ea8e
RUN dnf install -y \
    libvirt \
    libvirt-devel \
    libvirt-daemon-kvm \
    libvirt-client \
    qemu-kvm \
    openssh-clients \
    haproxy \
    jq \
    virt-install \
    socat \
    selinux-policy \
    selinux-policy-targeted \
    httpd-tools \
    python2-pip && \
    dnf clean all
RUN pip2 install yq
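A minimal sketch of the cleanup, assuming python3-pip and a pip3-installed yq behave the same for our scripts (not verified here):
RUN dnf install -y \
    ... \
    python3-pip && \
    dnf clean all
RUN pip3 install yq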
/kind enhancement
This issue was originally filed on kubevirt (kubevirt/kubevirt#2034) and was discovered while developing features for kubevirt, trying to test them using the local cluster.
What happened:
When using the development cluster (make cluster-up) with any openshift provider, we have a hard limit on the number of pods per node. See
https://docs.openshift.com/container-platform/3.11/admin_guide/manage_nodes.html#admin-guide-max-pods-per-node for example. More proof:
1092 12:50:55 fromani@musashi2 ~/Projects/golang/src/kubevirt.io/kubevirt $ $OC describe node
Name: node01
Roles: compute,infra,master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
cpumanager=false
kubernetes.io/hostname=node01
kubevirt.io/schedulable=true
node-role.kubernetes.io/compute=true
node-role.kubernetes.io/infra=true
node-role.kubernetes.io/master=true
Annotations: kubevirt.io/heartbeat=2019-02-19T11:50:11Z
node.openshift.io/md5sum=a201251d7833413b47af5c70d28e10bb
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Sun, 09 Dec 2018 09:47:17 +0100
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Tue, 19 Feb 2019 12:51:05 +0100 Sun, 09 Dec 2018 09:47:10 +0100 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Tue, 19 Feb 2019 12:51:05 +0100 Sun, 09 Dec 2018 09:47:10 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 19 Feb 2019 12:51:05 +0100 Sun, 09 Dec 2018 09:47:10 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 19 Feb 2019 12:51:05 +0100 Sun, 09 Dec 2018 09:47:10 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 19 Feb 2019 12:51:05 +0100 Sun, 09 Dec 2018 09:50:37 +0100 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.66.101
Hostname: node01
Capacity:
cpu: 5
devices.kubevirt.io/kvm: 110
devices.kubevirt.io/tun: 110
devices.kubevirt.io/vhost-net: 110
hugepages-1Gi: 0
hugepages-2Mi: 128Mi
memory: 4912732Ki
pods: 40
Allocatable:
cpu: 5
devices.kubevirt.io/kvm: 110
devices.kubevirt.io/tun: 110
devices.kubevirt.io/vhost-net: 110
hugepages-1Gi: 0
hugepages-2Mi: 128Mi
memory: 4679260Ki
pods: 40
I'm not sure whether this is computed from the number of CPUs/cores on the machine running the test cluster or whether it is a hardcoded setting of the image.
On my laptop with 4 CPUs and 8 threads, I have 40 maximum pods, and this number is pretty much exhausted by the base openshift environment: I can't even run a VM on it, it won't be scheduled.
What you expected to happen:
In the local cluster environment, which I understand is used mostly for development/quick demos, I expect no limit on the number of pods that can run. The environment will be slow, and that's fine.
How to reproduce it (as minimally and precisely as possible):
export KUBEVIRT_PROVIDER=os-3.11.0-crio # any os-* is actually fine
make cluster-up # will be fine
make cluster-sync # will be fine
./cluster/kubectl.sh create -f cluster/examples/vm-cirros.yaml # will be fine
./cluster/virtctl.sh start vm-cirros # will silently fail
the actual reason why it will fail is the same as if you try to run any other pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 24s (x56 over 5m) default-scheduler 0/1 nodes are available: 1 Insufficient pods.
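For reference, on a plain OpenShift 3.11 node this limit comes from kubeletArguments in the node config; a hedged sketch of raising it (the path below is the stock 3.11 default, and whether the os-* provider images honor an edit there is not verified):
# in /etc/origin/node/node-config.yaml on the node
kubeletArguments:
  max-pods:
  - "250"
followed by restarting the node service on the node, e.g. systemctl restart origin-node.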
Despite KUBECONFIG being set correctly to a valid configuration, when using the external provider the execution of tests fails with this error:
panic:
Your test failed.
Ginkgo panics to prevent subsequent assertions from running.
Normally Ginkgo rescues this panic so you shouldn't see it.
But, if you make an assertion in a goroutine, Ginkgo can't capture the panic.
To circumvent this, you should call
defer GinkgoRecover()
at the top of the goroutine that caused this panic.
goroutine 1 [running]:
kubevirt.io/kubevirt/vendor/github.com/onsi/ginkgo.Fail(0xc0005a8000, 0x2d1, 0xc0001b2430, 0x1, 0x1)
/go/src/kubevirt.io/kubevirt/vendor/github.com/onsi/ginkgo/ginkgo_dsl.go:262 +0xc8
kubevirt.io/kubevirt/vendor/github.com/onsi/gomega/internal/assertion.(*Assertion).match(0xc0003d3080, 0x1c36c20, 0x2c29040, 0x0, 0x0, 0x0, 0x0, 0x2c29040)
/go/src/kubevirt.io/kubevirt/vendor/github.com/onsi/gomega/internal/assertion/assertion.go:75 +0x1f1
kubevirt.io/kubevirt/vendor/github.com/onsi/gomega/internal/assertion.(*Assertion).ToNot(0xc0003d3080, 0x1c36c20, 0x2c29040, 0x0, 0x0, 0x0, 0x0)
/go/src/kubevirt.io/kubevirt/vendor/github.com/onsi/gomega/internal/assertion/assertion.go:43 +0xc7
kubevirt.io/kubevirt/tests.GetHighestCPUNumberAmongNodes(0x1c9ee40, 0xc00036b960, 0xc0001b9dc0)
/go/src/kubevirt.io/kubevirt/tests/utils.go:2879 +0x1b8
kubevirt.io/kubevirt/tests_test.glob..func27.3.1()
/go/src/kubevirt.io/kubevirt/tests/vmi_configuration_test.go:142 +0x4a
...
Remediation by soft-linking the internal configuration location against the valid configuration (see #199) makes the tests work again.
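A rough sketch of that workaround (the internal location shown is a guess based on the _ci-configs layout used by other providers, not the actual path from #199):
mkdir -p _ci-configs/external
ln -sf "$KUBECONFIG" _ci-configs/external/.kubeconfig   # hypothetical internal location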
References:
I sometimes see the error: the path "/tmp/local-volume.yaml" does not exist on cluster-up.
Here is an example:
https://storage.googleapis.com/kubevirt-prow/pr-logs/pull/kubevirt_kubevirt/2768/pull-kubevirt-e2e-os-3.11.0-multus/1179008179694997504/build-log.txt
KUBEVIRT_NUM_NODES should set the number of nodes in the cluster, but in okd 4.* clusters it doesn't work.
Steps to reproduce:
1)
export KUBEVIRT_PROVIDER="okd-4.1"
export KUBEVIRT_NUM_NODES=4
make cluster-up
cluster-up/kubectl.sh get nodes
Actual results:
Only two nodes are created (1 master and 1 worker)
Expected results:
4 nodes are created (1 master and 3 workers)
Log from cluster building:
[ksimon@localhost kubevirtci]$ make cluster-up
./cluster-up/check.sh
[ OK ] found /dev/kvm
[ OK ] intel nested virtualization enabled
./cluster-up/up.sh
Number of workers: 3
you should provide the installer pull secret file, if you want to install additional machines
Download the image docker.io/kubevirtci/okd-4.1@sha256:e7e3a03bb144eb8c0be4dcd700592934856fb623d51a2b53871d69267ca51c86
Downloading .......
Start the container okd-4.1-cluster
Download the image docker.io/library/registry:2.7.1
Downloading .......
Start the container okd-4.1-registry
Run the cluster
+ NUM_SECONDARY_NICS=0
+ chown root:kvm /dev/kvm
+ chmod 660 /dev/kvm
+ haproxy -f /etc/haproxy/haproxy.cfg
+ virsh list
Id Name State
----------------------------------------------------
++ virsh net-list --name
++ grep -v default
+ cluster_network=test-1-mkmgt
+ virsh net-update test-1-mkmgt add dns-host '<host ip='\''192.168.126.1'\''>
<hostname>ceph</hostname>
<hostname>nfs</hostname>
<hostname>registry</hostname>
</host>' --live --config
Updated network test-1-mkmgt persistent config and live state
+ domain_number=1
++ virsh list --name --all
+ for domain in $(virsh list --name --all)
+ '[' 0 -gt 0 ']'
++ expr 1 + 1
+ domain_number=2
+ virt-xml --edit --memory 12288 test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ virt-xml --edit --cpu host-passthrough test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ [[ test-1-mkmgt-master-0 =~ master ]]
+ virt-xml --edit --vcpu 6 test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ [[ test-1-mkmgt-master-0 =~ worker ]]
+ virsh start test-1-mkmgt-master-0
Domain test-1-mkmgt-master-0 started
+ for domain in $(virsh list --name --all)
+ '[' 0 -gt 0 ']'
++ expr 2 + 1
+ domain_number=3
+ virt-xml --edit --memory 12288 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virt-xml --edit --cpu host-passthrough test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ [[ test-1-mkmgt-worker-0-xzq4q =~ master ]]
+ [[ test-1-mkmgt-worker-0-xzq4q =~ worker ]]
+ virt-xml --edit --memory 8192 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virt-xml --edit --vcpu 6 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virsh start test-1-mkmgt-worker-0-xzq4q
Domain test-1-mkmgt-worker-0-xzq4q started
++ virsh list --name --all
++ virsh list --name
+ [[ test-1-mkmgt-master-0
test-1-mkmgt-worker-0-xzq4q != \t\e\s\t\-\1\-\m\k\m\g\t\-\m\a\s\t\e\r\-\0\
\t\e\s\t\-\1\-\m\k\m\g\t\-\w\o\r\k\e\r\-\0\-\x\z\q\4\q ]]
+ export KUBECONFIG=/root/install/auth/kubeconfig
+ KUBECONFIG=/root/install/auth/kubeconfig
+ oc config set-cluster test-1 --server=https://127.0.0.1:6443
Cluster "test-1" set.
+ oc config set-cluster test-1 --insecure-skip-tls-verify=true
Cluster "test-1" set.
+ oc get nodes
Unable to connect to the server: EOF
+ sleep 5
+ oc get nodes
Unable to connect to the server: EOF
+ sleep 5
+ oc get nodes
Unable to connect to the server: EOF
+ sleep 5
+ oc get nodes
NAME STATUS ROLES AGE VERSION
test-1-mkmgt-master-0 Ready master 54d v1.13.4+c41acf267
test-1-mkmgt-worker-0-xzq4q Ready worker 54d v1.13.4+c41acf267
+ sleep 30
++ oc -n openshift-ingress get pods -o custom-columns=NAME:.metadata.name,HOST_IP:.status.hostIP,PHASE:.status.phase
++ grep route
++ grep Running
++ head -n 1
++ awk '{print $2}'
+ [[ 192.168.126.51 != '' ]]
++ oc -n openshift-ingress get pods -o custom-columns=NAME:.metadata.name,HOST_IP:.status.hostIP,PHASE:.status.phase
++ grep route
++ grep Running
++ head -n 1
++ awk '{print $2}'
+ worker_node_ip=192.168.126.51
+ [[ 192.168.126.51 != \1\9\2\.\1\6\8\.\1\2\6\.\5\1 ]]
++ oc get pods --all-namespaces --no-headers
++ grep -v revision-pruner
++ grep -v Running
++ grep -v Completed
++ wc -l
+ [[ 5 -le 3 ]]
+ echo 'waiting for pods to come online'
waiting for pods to come online
+ sleep 10
We have some helper scripts such as cli.sh, ssh.sh, container.sh and others. It would be helpful to have some jobs that run them on the PRs that change them, like a small bash test suite after cluster-up; see the sketch below.
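Something like the following could be a starting point (invocations are guessed from how the scripts are used elsewhere in these issues, so treat it as a sketch):
#!/bin/bash
set -euo pipefail
# assumes make cluster-up has already succeeded
./cluster-up/kubectl.sh get nodes            # the kubectl wrapper answers
./cluster-up/cli.sh ssh node01 echo ssh-ok   # ssh into a node and run a trivial command
./cluster-up/cli.sh ports registry           # the port-query helper answers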
In KubeVirt, we want to begin testing Pod Security Policies. However, this requires enabling the PSP admission controller, which would negatively impact projects that have not yet adopted PSPs.
To get around this, we need the ability to dynamically enable optional admission controllers during cluster-up through the use of an environment variable.
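A rough sketch of the requested interface (the variable name is hypothetical, not an existing kubevirtci option):
export KUBEVIRT_PROVIDER=k8s-1.17.0
# hypothetical: plugins appended to the kube-apiserver --enable-admission-plugins list during cluster-up
export KUBEVIRT_CUSTOM_ADMISSION_PLUGINS=PodSecurityPolicy
make cluster-up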
Under the 4.3 and 4.4 providers we have a line that requires sudo permissions; I expect that we should allow running make cluster-up without sudo permissions.
The setup script is a bit aggressive and moves all the local interfaces that are sr-iov capable into the kind node's namespace.
If the device used to connect to the host is sr-iov capable itself, what happens is that after moving it into the node's namespace the host gets disconnected.
There should be a way to filter out the interfaces not used for sr-iov testing purposes.
Two approaches could be:
(or the combination of the two).
The process of updating hashes when something changes in the containers is very error prone; for example, changing something in a base image means:
/test push-base
fix hashes
/test push-centos
fix hashes
/test providers
fix hashes
A possible solution would be to just point to "stable" all the providers and make a "post-merge" job that will update stable.
In preparation for running provision / push on prow ci,
we need a way to tag the last known good image, so the next push will not leave it tag-less and prone to be deleted, as suggested in a discussion with Roman.
The tag can be a word such as stable or a timestamp.
We need to keep in mind that we should also use a sliding window and not leave all the images tagged, or configure a garbage-collection policy on the image repo that, for example, keeps the last 5 images.
This might help: tagging without pulling (didn't try it yet):
https://dille.name/blog/2018/09/20/how-to-tag-docker-images-without-pulling-them/
We can, for example, have publish.sh tag the current image with "stable" and then tag the new one with latest (as we do now), or just always tag with a timestamp as Roman suggested. A rough sketch follows.
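For reference, the approach from the linked post boils down to re-PUTting the image manifest under a new tag; a rough sketch with curl (repository name and token are placeholders, auth is obtained out of band):
REPO=kubevirtci/okd-4.1   # placeholder repository
TOKEN=<registry bearer token>
MANIFEST=$(curl -s -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://registry-1.docker.io/v2/$REPO/manifests/latest")
curl -s -X PUT -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/vnd.docker.distribution.manifest.v2+json" \
  -d "$MANIFEST" "https://registry-1.docker.io/v2/$REPO/manifests/stable"
# neither the blobs nor the image itself are pulled; only the manifest is re-tagged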
Error response from daemon: Unable to remove volume, volume still in use: remove kubevirt-functional-tests-openshift-release0: volume is in use - [b0bbbbdf0ec8114313f9fc6f35c6733e7c6697478f37ad617a2dfd087cffce62]
make: *** [cluster-down] Error 1
+ make cluster-down
./cluster/down.sh
When we need to just update a script that is in the container, rerunning the whole provision isn't required, and it would also require more testing and integration.
Upon discussion with Roman and Quique, we can create a prow job that will:
Pull the current known good image according to images.sh
CONTAINER=$(docker create "$IMG_HASH")   # IMG_HASH: the known-good image digest from images.sh
# user block of commands, such as:
docker cp cluster-provision/okd/scripts/run.sh "$CONTAINER":/scripts
HASH=$(docker commit "$CONTAINER")
and then publish it (either in this job or in another one, which will of course need to have the image available).
The user-block code can be a file that the person pushes to the PR; in the PR the user will write
/test provide-...
It also adds the source-controlled files such as cluster-provision/okd/scripts/run.sh, which do need to be merged later and here need to be copied.
It can save lots of time and retesting when there is no need to create a whole new provider;
in case reinstalling openshift is needed, this won't apply of course.
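Completing the sketch above, publishing the patched image could look roughly like this (the tag name is a placeholder):
docker tag "$HASH" docker.io/kubevirtci/okd-4.1:patched-run-sh   # placeholder tag
docker push docker.io/kubevirtci/okd-4.1:patched-run-sh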
Is there a way to run kubevirtci on RHEL 8 with no docker?
The gocli code seems to require the Docker daemon, as it calls its API. There is no docker anymore in RHEL 8, as it was replaced by the daemonless podman.
Starting an OKD4 cluster spends most of its time downloading the image (which gets cached on the system, so that's fine) and waiting for pods to turn Ready:
19:25:04 [check-patch.hco-master.fc30.x86_64] + [[ 8 -le 3 ]]
19:25:04 [check-patch.hco-master.fc30.x86_64] + echo 'waiting for pods to come online'
Is there a way to make a snapshot of the cluster once all pods are deployed? It would make our CI much faster.
So we can ditch GOPATH, but we have to keep the vendor dir.
It appears that clusterip services are only accessible on nodes that have an endpoint for that service.
For example, take the cdi-api service. It has one endpoint on node02.
➜ containerized-data-importer git:(master) ✗ k get services -n cdi cdi-api
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cdi-api ClusterIP 10.96.239.43 <none> 443/TCP 20m
➜ containerized-data-importer git:(master) ✗ k get endpoints -n cdi cdi-api
NAME ENDPOINTS AGE
cdi-api 10.244.1.5:8443 17m
➜ containerized-data-importer git:(master) ✗ k get endpoints -n cdi cdi-api -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2020-01-16T15:58:17Z"
  creationTimestamp: "2020-01-16T15:58:11Z"
  labels:
    cdi.kubevirt.io: cdi-apiserver
    operator.cdi.kubevirt.io/createVersion: latest
  name: cdi-api
  namespace: cdi
  resourceVersion: "999"
  selfLink: /api/v1/namespaces/cdi/endpoints/cdi-api
  uid: 480ddcea-5bde-436a-b8e3-d93af2db8204
subsets:
- addresses:
  - ip: 10.244.1.5
    nodeName: node02
    targetRef:
      kind: Pod
      name: cdi-apiserver-7859f9fd8c-xjvrj
      namespace: cdi
      resourceVersion: "997"
      uid: 31cbb07b-39af-4ecc-83c3-323bad15a836
  ports:
  - port: 8443
    protocol: TCP
The service is accessible from node02
[vagrant@node02 ~]$ curl --insecure https://10.96.239.43/apis
{
  "kind": "APIGroupList",
  "apiVersion": "v1",
  "groups": [
    {
      "kind": "APIGroup",
      "apiVersion": "v1",
      "name": "upload.cdi.kubevirt.io",
      "versions": [
        {
          "groupVersion": "upload.cdi.kubevirt.io/v1alpha1",
          "version": "v1alpha1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "upload.cdi.kubevirt.io/v1alpha1",
        "version": "v1alpha1"
      },
      "serverAddressByClientCIDRs": [
        {
          "clientCIDR": "0.0.0.0/0",
          "serverAddress": ""
        }
      ]
    }
  ]
}
But it is not accessible from node01:
[vagrant@node01 ~]$ curl --insecure -m 10 https://10.96.239.43/apis
curl: (28) Connection timed out after 10001 milliseconds
The service is accessible from node01 using the pod ip. So something appears to be up with iptables/conntrack?
FWIW the same issue also applies to cdi-uploadproxy
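One way to narrow this down might be to compare what kube-proxy programmed on each node (a debugging sketch, not a fix):
# run on node01 and on node02, then diff the output
sudo iptables-save -t nat | grep 10.96.239.43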
One example:
+ make cluster-up
./cluster/up.sh
Downloading .......
Downloading ........................................................
Downloading ...................................................................................................................
/bin/bash: ssh.sh: command not found
provisioning node node02 failed
make: *** [cluster-up] Error 1
At my place (remote home office) it takes two hours for the installer step:
INFO Fetching OS image: rhcos-42.80.20191002.0-qemu.qcow2
DEBUG Unpacking OS image into "/root/.cache/openshift-install/libvirt/image/675149f237ac179efd9ab5226c6b2042"...
to finish.
The tested download speed at my place is 30 Mbit/s. The downloaded image is roughly 700 MB, so it should take around 3.5 minutes. However, the download is done each and every time the provision.sh script is started.
This slows down the iterations of creating and changing the provisioning setup. The OpenShift installer provides an env var OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE
that, if set to e.g. file://.../images/rhcos-42.80.20191002.0-qemu.qcow2,
would skip the download and use the image provided.
This, in combination with mounting the volume into the container via -v $HOME/images:$HOME/images
or the like, should speed up iteration very much; see the sketch below.
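A rough sketch of how that could be wired up (paths are examples, and whether provision.sh currently forwards the variable into the container is an assumption):
# on the host: keep a local copy of the image and mount it into the provision container
mkdir -p $HOME/images
docker run ... -v $HOME/images:$HOME/images ...
# inside the container, before openshift-install runs:
export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE="file://$HOME/images/rhcos-42.80.20191002.0-qemu.qcow2"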
In k8s-1.15.1, we get a local storage class with pre-provisioned PVs.
It would be useful to get the same with OKD-4.1.
It also means that we need to move to OS 3.10
I just updated to Fedora 31 recently, which upgrades cgroups to v2. Docker does not support cgroups v2, which results in an error when starting the docker daemon.
According to this docker issue[1] and the referenced issues, it does not sound like docker will support cgroups v2 soon. From the Fedora perspective[2] it sounds like we should rather use cgroups v2, which podman already supports. There is ongoing work to support podman
for kubevirtci [3], though an issue exists that prohibits finalization[4].
So what is the thinking about that issue here? Should we just live with the workaround or consider switching to podman? The latter will cause work in kubevirtci, even though podman claims to be a drop-in replacement, which is true for the CLI side but not for the API side.
[1] docker/for-linux#665
[2] https://linuxacademy.com/blog/containers/a-game-changer-for-containers-cgroups/
[3] https://github.com/fromanirh/pack8s
[4] ffromani/pack8s#14
Env(s): Rhel 7.5 (gcloud instance) && Centos 7.5 (local VM in vmware)
I'm following the steps outlined in the main README.md. In both environments, I get the exact same behavior from
# gocli run --random-ports --nodes 3 --background kubevirtci/k8s-1.10.3
Gives the error
{"status":"Status: Image is up to date for docker.io/registry:2"}
2018/07/30 20:55:57 Waiting for host: 192.168.66.101:22
provisioning node node01 failed
Logs from kubevirt-node01:
# docker logs 4a96ea9c8948
3: tap01: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master br0 state DOWN mode DEFAULT group default qlen 1000
link/ether 92:41:2b:e4:76:9e brd ff:ff:ff:ff:ff:ff
Creating disk "/var/run/disk/disk.qcow2 backed by /disk01.qcow2".
Formatting '/var/run/disk/disk.qcow2', fmt=qcow2 size=42949672960 backing_file=/disk01.qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
SSH will be available on container port 2201.
VNC will be available on container port 5901.
VM MAC in the guest network will be 52:55:00:d1:55:01
VM IP in the guest network will be 192.168.66.101
VM hostname will be node01
mknod: missing operand after '10' <========= looks like line 82 of base/scripts/vm.sh
Try 'mknod --help' for more information.
Is this expected and if so, could the log line be more verbose in what's missing?
When $KUBEVIRTCI_CLUSTER isn't set, the automatic resolution of it produces an extra / which causes make cluster-up to fail since it cannot find the appropriate scripts.
[alevitte@alevitte kubevirtci]$ make cluster-up
./cluster-up/check.sh
[ OK ] found /dev/kvm
./cluster-up/check.sh: line 41: [: missing `]'
[ERR ] intel nested virtualization not enabled
./cluster-up/up.sh
./cluster-up/up.sh: line 12: /home/alevitte/dev/kubevirtci/cluster-up//cluster-up/cluster/okd-4.3/provider.sh: No such file or directory
./cluster-up/up.sh: line 13: up: command not found
/home/alevitte/dev/kubevirtci/cluster-up/kubectl.sh: line 30: /home/alevitte/dev/kubevirtci/cluster-up//cluster-up/cluster/okd-4.3/provider.sh: No such file or directory
I am trying to set up a dev cluster with OpenShift. I do:
$ make cluster-down # tear down previous cluster
$ docker ps # check no kubevirt containers are running
$ export KUBEVIRT_PROVIDER=os-3.10.0
$ make cluster-up
...
TASK [Ensure openshift-ansible installer package deps are installed] ***********
ok: [node02] => (item=iproute)
ok: [node02] => (item=dbus-python)
ok: [node02] => (item=PyYAML)
ok: [node02] => (item=python-ipaddress)
ok: [node02] => (item=libsemanage-python)
ok: [node02] => (item=yum-utils)
FAILED - RETRYING: Ensure openshift-ansible installer package deps are installed (3 retries left).
FAILED - RETRYING: Ensure openshift-ansible installer package deps are installed (2 retries left).
FAILED - RETRYING: Ensure openshift-ansible installer package deps are installed (1 retries left).
failed: [node02] (item=python-docker) => {"attempts": 3, "changed": false, "item": "python-docker", "msg": "Failure talking to yum: failure: repodata/repomd.xml from my-origin: [Errno 256] No more
mirrors to try.\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001 milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001 milliseconds')"}
It tries to reach a repo located at 10.35.4.116 that is apparently not available to anyone without access to the internal Red Hat network (?).
I checked on the node in question, it has the following repos:
$ ls -1 /etc/yum.repo.d
CentOS-Base.repo
CentOS-CR.repo
CentOS-Debuginfo.repo
CentOS-fasttrack.repo
CentOS-Gluster-3.12.repo
CentOS-Media.repo
CentOS-OpenShift-Origin.repo
CentOS-Sources.repo
CentOS-Vault.repo
epel.repo
epel-testing.repo
origin-latest.repo
The repo in question is defined in origin-latest.repo:
$ cat /etc/yum.repo.d/origin-latest.repo
[my-origin]
name=Origin packages v3.10.0-rc.0
baseurl=http://10.35.4.116/v3.10.0-rc.0/
enabled=1
gpgcheck=0
The file apparently comes from the kubevirtci repo: kubevirtci/os-3.10/scripts/provision.sh (line 17 in 0a98feb).
Looks like I can't get my hands on an openshift cluster unless I am inside the Red Hat network. I think the idea of these providers is to deploy OpenShift Origin, which should be accessible to everyone, is it not?
We should probably avoid using the private IP address for any repo that kubevirtci creates.
Guess it's by mistake; the value appears here.
/kind enhancement
Problem:
While working behind a network proxy, make cluster-up fails to set up the cluster while downloading the CNI config. Also, no docker pull or kubectl apply is successful, as no image gets downloaded due to the lack of proxy settings in the docker daemon.
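For context, the proxy settings the docker daemon is missing are normally provided through a systemd drop-in along these lines (a generic sketch with a placeholder proxy address, not something cluster-up injects today):
# /etc/systemd/system/docker.service.d/http-proxy.conf on the node
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,192.168.66.101"
# then: systemctl daemon-reload && systemctl restart docker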
.
.
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 192.168.66.101:6443 --token abcdef.1234567890123456 --discovery-token-ca-cert-hash sha256:23d5e3383df6878d799c4b326f8910d521894dc9a403dcac9dd6a3800236fac3
+ kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
The connection to the server raw.githubusercontent.com was refused - did you specify the right host or port?
provisioning node node01 failed
Makefile:104: recipe for target 'cluster-up' failed
make: *** [cluster-up] Error 1
Expected Behaviour
gocli run should be able to consume custom scripts (node.sh or node<node-no>.sh etc.) if indicated explicitly.
After make cluster-up with KUBEVIRT_PROVIDER=okd-4.3:
export KUBECONFIG=$(cluster-up/kubeconfig.sh)
[alevitte@alevitte kubevirtci]$ oc get pods
error: Missing or incomplete configuration info. Please login or point to an existing, complete config file:
1. Via the command-line flag --config
2. Via the KUBECONFIG environment variable
3. In your home directory as ~/.kube/config
To view or setup config directly use the 'config' command.
[alevitte@alevitte kubevirtci]$ echo $KUBECONFIG
/home/alevitte/dev/kubevirtci/_ci-configs/okd-4.3/.kubeconfig
Note: this is NOT fixed by #198 .
The docker version that is currently available is docker-1.13.1-53.git774336d.el7.centos.x86_64.rpm, but it looks like it reintroduces the slowness.
kubevirtci seems to spin up docker containers on the docker engine. How can I make this native to an existing openshift/k8s cluster?
The error:
[rwsu@fedora30 kubevirtci]$ ./cluster-up/cli.sh ports registry
failed to found the container with name okd-4.1.0-dnsmasq
When you set INSTALLER_PULL_SECRET and run make cluster-up, the following error occurs:
Error response from daemon: invalid mount config for type "bind": bind source path does not exist
Debug output:
ERROR Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: failed to initialize the SSH agent: failed to read directory "/root/.ssh": open /root/.ssh: no such file or directory
Adding this (before cluster create) to provision.sh can fix the problem, or alternatively copy and rename an existing key pair (or use it with --keys):
ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ''
This issue is from commit fac568c3691954fabea2b0277c16b5fe2af482d6. [Reproduced on a local RHEL-8]
When using a k8s provider, e.g.
export KUBEVIRT_PROVIDER=k8s-1.17.0
make cluster-up
And then ssh to a node; e.g.
./cluster-up/ssh.sh node01
The session is not interactive, meaning anything typed into the terminal is ignored, even echo hello.
The USE_TTY environment variable is not set. Setting it manually at the beginning of the ssh.sh script does make it work.
It seems that #246 reverted the change to https://github.com/kubevirt/kubevirtci/commits/master/cluster-up/cluster/images.sh.
It seems to have reverted the OCP 4.3 sha256 from ocp-4.3@sha256:03a8c736263493961f198b5cb214d9b1fc265ece233c60bdb1c8b8b4b779ee1e back to the old ocp-4.3@sha256:8f59d625852ef285d6ce3ddd6ebd3662707d2c0fab19772b61dd0aa0f6b41e5f, which does not work anymore:
2020/02/05 12:10:05 failed to download quay.io/kubevirtci/ocp-4.3@sha256:78c2c7517b3b4d36eebb39e49631340bf6ebf7ac47957612501a43a85fb013ce: Error response from daemon: manifest for quay.io/kubevirtci/ocp-4.3@sha256:78c2c7517b3b4d36eebb39e49631340bf6ebf7ac47957612501a43a85fb013ce not found
Setting it back manually to the correct sha256 in images.sh works correctly:
Download the image quay.io/kubevirtci/ocp-4.3@sha256:03a8c736263493961f198b5cb214d9b1fc265ece233c60bdb1c8b8b4b779ee1e Downloading ..
As part of a conversation on the kubevirt gchat,
we should decide which lanes are supported and which we can drop.
I think there is a file that lists the lanes, and we can decide what needs to be maintained and what should be dropped from the code, and update it accordingly
(maybe with a markdown table that shows which lanes exist and the status of each lane).
After that we can take it to the next step: from time to time provider.sh (or another suitable file)
can carry a workaround for a specific cluster-up problem of a lane, so things stay usable more easily if we continue to maintain it.
First time using kind k8s provider.
> export KUBEVIRT_PROVIDER=kind-k8s-sriov-1.14.2
> make cluster-up
./cluster-up/up.sh
DEBU[14:48:00] Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\t{{.Label "io.k8s.sigs.kind.cluster"}}]
...
++ _kubectl get nodes
++ grep sriov-worker
++ /home/dhiller/Projects/github.com/kubevirt/kubevirt/_ci-configs/kind-k8s-sriov-1.14.2/.kubectl --kubeconfig=/home/dhiller/Projects/github.com/kubevirt/kubevirt/_ci-configs/kind-k8s-sriov-1.14.2/.kubeconfig get nodes
+ [[ -z sriov-worker Ready <none> 40s v1.14.2 ]]
+ SRIOV_NODE=sriov-worker
+ mkdir -p /var/run/netns/
mkdir: cannot create directory ‘/var/run/netns/’: Permission denied
make: *** [Makefile:101: cluster-up] Error 1
OKD providers.sh script needs refactoring. Some functions can be reused in other providers.sh scripts.
It would be useful to be able to specify additional disks, similar to KUBEVIRT_NUM_SECONDARY_NICS.
In order to deploy Rook/Ceph, available disks are needed for the OSDs.
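A sketch of the interface this could take (the variable name is hypothetical, by analogy with the existing NICs knob):
export KUBEVIRT_NUM_SECONDARY_DISKS=2   # hypothetical, mirrors KUBEVIRT_NUM_SECONDARY_NICS
make cluster-up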
I think this is a leftover from kubevirt. https://github.com/kubevirt/kubevirtci/blob/master/cluster-up/virtctl.sh#L41 contains a reference to the parent directory's _out/cmd. That is fine on kubevirt/kubevirt, but it fails on other projects. Any suggestion on how to fix it?
Support k8s-1.11.0
Run ocp-4.3 and
try to connect to the container with the make connect target;
it exits with an error.
This is a regression; it worked a week or two ago.
The reason is that the logic that allowed it to be standalone was removed and it is now used as a helper for ssh.sh, I think;
also the kubeconfig it sets was removed.
This log shows
+ kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
Unable to connect to the server: net/http: TLS handshake timeout
provisioning node node01 failed
I have been seeing that more frequently lately.
The --workers-memory param (located here: https://github.com/kubevirt/kubevirtci/blob/master/cluster-up/cluster/okd-4.1/provider.sh#L36) doesn't work in okd 4.1.
Steps to reproduce:
1)export KUBEVIRT_PROVIDER="okd-4.1"
2) update --workers-memory 8192 (here: https://github.com/kubevirt/kubevirtci/blob/master/cluster-up/cluster/okd-4.1/provider.sh#L36) to e.g. --workers-memory 16000
3) make cluster-up
4) oc get node test-1-mkmgt-worker-0-xzq4q -o json | jq '.status.capacity.memory'
Actual results:
"6099376Ki"
Expected results:
value around 16Gi
Log from cluster building:
[ksimon@localhost kubevirtci]$ make cluster-up
./cluster-up/check.sh
[ OK ] found /dev/kvm
[ OK ] intel nested virtualization enabled
./cluster-up/up.sh
Number of workers: 3
you should provide the installer pull secret file, if you want to install additional machines
Download the image docker.io/kubevirtci/okd-4.1@sha256:e7e3a03bb144eb8c0be4dcd700592934856fb623d51a2b53871d69267ca51c86
Downloading .......
Start the container okd-4.1-cluster
Download the image docker.io/library/registry:2.7.1
Downloading .......
Start the container okd-4.1-registry
Run the cluster
+ NUM_SECONDARY_NICS=0
+ chown root:kvm /dev/kvm
+ chmod 660 /dev/kvm
+ haproxy -f /etc/haproxy/haproxy.cfg
+ virsh list
Id Name State
----------------------------------------------------
++ virsh net-list --name
++ grep -v default
+ cluster_network=test-1-mkmgt
+ virsh net-update test-1-mkmgt add dns-host '<host ip='\''192.168.126.1'\''>
<hostname>ceph</hostname>
<hostname>nfs</hostname>
<hostname>registry</hostname>
</host>' --live --config
Updated network test-1-mkmgt persistent config and live state
+ domain_number=1
++ virsh list --name --all
+ for domain in $(virsh list --name --all)
+ '[' 0 -gt 0 ']'
++ expr 1 + 1
+ domain_number=2
+ virt-xml --edit --memory 12288 test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ virt-xml --edit --cpu host-passthrough test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ [[ test-1-mkmgt-master-0 =~ master ]]
+ virt-xml --edit --vcpu 6 test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ [[ test-1-mkmgt-master-0 =~ worker ]]
+ virsh start test-1-mkmgt-master-0
Domain test-1-mkmgt-master-0 started
+ for domain in $(virsh list --name --all)
+ '[' 0 -gt 0 ']'
++ expr 2 + 1
+ domain_number=3
+ virt-xml --edit --memory 12288 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virt-xml --edit --cpu host-passthrough test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ [[ test-1-mkmgt-worker-0-xzq4q =~ master ]]
+ [[ test-1-mkmgt-worker-0-xzq4q =~ worker ]]
+ virt-xml --edit --memory 16000 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virt-xml --edit --vcpu 6 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virsh start test-1-mkmgt-worker-0-xzq4q
Domain test-1-mkmgt-worker-0-xzq4q started
++ virsh list --name --all
++ virsh list --name
+ [[ test-1-mkmgt-master-0
test-1-mkmgt-worker-0-xzq4q != \t\e\s\t\-\1\-\m\k\m\g\t\-\m\a\s\t\e\r\-\0\
\t\e\s\t\-\1\-\m\k\m\g\t\-\w\o\r\k\e\r\-\0\-\x\z\q\4\q ]]
+ export KUBECONFIG=/root/install/auth/kubeconfig
+ KUBECONFIG=/root/install/auth/kubeconfig
+ oc config set-cluster test-1 --server=https://127.0.0.1:6443
Cluster "test-1" set.
+ oc config set-cluster test-1 --insecure-skip-tls-verify=true
Cluster "test-1" set.
+ oc get nodes
Unable to connect to the server: EOF
+ sleep 5
+ oc get nodes
Unable to connect to the server: EOF
+ sleep 5
+ oc get nodes
NAME STATUS ROLES AGE VERSION
test-1-mkmgt-master-0 Ready master 54d v1.13.4+c41acf267
test-1-mkmgt-worker-0-xzq4q Ready worker 54d v1.13.4+c41acf267
+ sleep 30
Certificates and CAs can be rotated. Ensure that we are ready for that, by rotating everything once per hour. Our test lanes take longer than that. That should reveal issues. Follow up of kubevirt/kubevirt#2234.
We need to run those (provision + push) even with the current kubevirtci, in order to see that the lanes are stable and usable,
so when needed we won't get stuck.
Thanks.
Right now we allow running multiple clusters at the same time by differentiating container ownership based on a prefix on the container name. Instead we should use docker labels; a sketch follows.
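A sketch of what that could look like when gocli creates and later looks up containers (the label key is a placeholder):
# when creating the node/registry containers
docker run --label io.kubevirt.kubevirtci.cluster=$CLUSTER_NAME ...
# when listing or tearing them down
docker ps -q --filter label=io.kubevirt.kubevirtci.cluster=$CLUSTER_NAME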
When calling it from something like Go code or Jenkins, it is not under a TTY and it does not return the correct error code (and possibly it does not wait for the command to finish).
To reproduce
echo "./kubevirtci/cluster-up/cli.sh ssh node01 false; echo \$?" > foo.sh
nohup bash foo.sh > nohup.log 2>&1 < /dev/null
cat nohup.log # It should be 1 but it's 0
Running directly the gocli command with and without -it
$ docker run --privileged --net=host --rm -v /var/run/docker.sock:/var/run/docker.sock kubevirtci/gocli@sha256:8571161d7956b830646216335453b995ba754e07319dde062241ccc025f5ee00 --prefix k8s-1.13.3 ssh node01 false
$ echo $?
0
$ docker run --privileged -it --net=host --rm -v /var/run/docker.sock:/var/run/docker.sock kubevirtci/gocli@sha256:8571161d7956b830646216335453b995ba754e07319dde062241ccc025f5ee00 --prefix k8s-1.13.3 ssh node01 false
$ echo $?
1
Also, it looks like without "-it" it does not wait for the command to finish:
$ time docker run --privileged --net=host --rm -v /var/run/docker.sock:/var/run/docker.sock kubevirtci/gocli@sha256:8571161d7956b830646216335453b995ba754e07319dde062241ccc025f5ee00 --prefix k8s-1.13.3 ssh node01 sleep 10
real 0m0,391s
user 0m0,064s
sys 0m0,041s
We have the following PR [1] at kubernetes-nmstate to bump k8s to 1.16, but it is failing at cluster-up since we install Open vSwitch there from cbs.centos.org; logs at [2]:
Installing Open vSwitch on nodes
2020/01/16 11:37:40 Waiting for host: 192.168.66.101:22
2020/01/16 11:37:40 Connected to tcp://192.168.66.101:22
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (6) Could not resolve host: cbs.centos.org; Unknown error
Makefile:112: recipe for target 'cluster-up' failed
make: *** [cluster-up] Error 1
+ EXIT_VALUE=2
[1] nmstate/kubernetes-nmstate#276
[2] https://storage.googleapis.com/kubevirt-prow/pr-logs/pull/nmstate_kubernetes-nmstate/276/pull-kubernetes-nmstate-e2e-k8s/1217766368921784320/build-log.txt