kubevirt / kubevirtci

Contains cluster definitions and client tools to quickly spin up and destroy ephemeral and scalable k8s and ocp clusters for testing

License: Apache License 2.0

Shell 68.57% Makefile 0.52% Go 30.21% Dockerfile 0.54% Starlark 0.16%


kubevirtci's Issues

Sometimes make cluster-up fails

+ make cluster-up
./cluster/up.sh
Downloading .......
Downloading .......
2018/05/17 21:46:46 Waiting for host: 192.168.66.101:22
provisioning node node01 failed
make: *** [cluster-up] Error 1
+ make cluster-down
./cluster/down.sh
Build step 'Execute shell' marked build as failure

An additional problem is that the output does not show any useful information. I think we can add the -x option to the ssh.sh script; a sketch follows.
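A minimal sketch of that change, assuming ssh.sh is a plain bash script (the exact placement inside the script is up to whoever implements it):

#!/usr/bin/env bash
# print every command as it runs so failing CI steps are visible in the log
set -x
# ... rest of ssh.sh unchanged ...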

python2 leftovers in okd Dockerfile

The okd Dockerfile has some python2 leftovers:

installing python2-pip
and
RUN pip2 install yq

kubevirtci (getting_started) $ cat cluster-provision/okd/base/Dockerfile
FROM fedora@sha256:a66c6fa97957087176fede47846e503aeffc0441050dd7d6d2ed9e2fae50ea8e

RUN dnf install -y \
    libvirt \
    libvirt-devel \
    libvirt-daemon-kvm \
    libvirt-client \
    qemu-kvm \
    openssh-clients \
    haproxy \
    jq \
    virt-install \
    socat \
    selinux-policy \
    selinux-policy-targeted \
    httpd-tools \
    python2-pip && \
    dnf clean all

RUN pip2 install yq
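A minimal sketch of the fix, assuming the python3 packages are available in that Fedora base image (only the changed lines are shown, the rest of the package list stays as-is):

# replace the python2 bits with their python3 equivalents
RUN dnf install -y \
    python3-pip && \
    dnf clean all

RUN pip3 install yq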

Relax the constraints on the pods per node on the local cluster (kubevirt: make cluster-up)

/kind enhancement

This issue was originally filed on kubevirt (kubevirt/kubevirt#2034) and was discovered while developing features for kubevirt and trying to test them using the local cluster.

What happened:
When using the development cluster (make cluster-up) with any openshift provider, there is a hard limit on the number of pods per node. See
https://docs.openshift.com/container-platform/3.11/admin_guide/manage_nodes.html#admin-guide-max-pods-per-node for example. More proof:

1092 12:50:55 fromani@musashi2 ~/Projects/golang/src/kubevirt.io/kubevirt $ $OC describe node
Name:               node01
Roles:              compute,infra,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    cpumanager=false
                    kubernetes.io/hostname=node01
                    kubevirt.io/schedulable=true
                    node-role.kubernetes.io/compute=true
                    node-role.kubernetes.io/infra=true
                    node-role.kubernetes.io/master=true
Annotations:        kubevirt.io/heartbeat=2019-02-19T11:50:11Z
                    node.openshift.io/md5sum=a201251d7833413b47af5c70d28e10bb
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Sun, 09 Dec 2018 09:47:17 +0100
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Tue, 19 Feb 2019 12:51:05 +0100   Sun, 09 Dec 2018 09:47:10 +0100   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Tue, 19 Feb 2019 12:51:05 +0100   Sun, 09 Dec 2018 09:47:10 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 19 Feb 2019 12:51:05 +0100   Sun, 09 Dec 2018 09:47:10 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 19 Feb 2019 12:51:05 +0100   Sun, 09 Dec 2018 09:47:10 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 19 Feb 2019 12:51:05 +0100   Sun, 09 Dec 2018 09:50:37 +0100   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.66.101
  Hostname:    node01
Capacity:
 cpu:                            5
 devices.kubevirt.io/kvm:        110
 devices.kubevirt.io/tun:        110
 devices.kubevirt.io/vhost-net:  110
 hugepages-1Gi:                  0
 hugepages-2Mi:                  128Mi
 memory:                         4912732Ki
 pods:                           40
Allocatable:
 cpu:                            5
 devices.kubevirt.io/kvm:        110
 devices.kubevirt.io/tun:        110
 devices.kubevirt.io/vhost-net:  110
 hugepages-1Gi:                  0
 hugepages-2Mi:                  128Mi
 memory:                         4679260Ki
 pods:                           40

I'm not sure whether this is computed from the number of cpus/cores on the machine running the test cluster or whether it is a hardcoded setting of the image.

On my laptop with 4 cpus and 8 threads, I have a maximum of 40 pods, and that number is pretty much exhausted by the base openshift environment: I can't even run a VM on it; it won't be scheduled.

What you expected to happen:
In the local cluster environment, which I understand is used mostly for development/quick demos, I expect no limit on the number of pods that can run. The environment will be slow, and that's fine.

How to reproduce it (as minimally and precisely as possible):

export KUBEVIRT_PROVIDER=os-3.11.0-crio  # any os-* is actually fine
make cluster-up  # will be fine
make cluster-sync  # will be fine
./cluster/kubectl.sh create -f cluster/examples/vm-cirros.yaml  # will be fine
./cluster/virtctl.sh start vm-cirros  # will silently fail

The actual reason it fails is the same as if you tried to run any other pod:

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  24s (x56 over 5m)  default-scheduler  0/1 nodes are available: 1 Insufficient pods.
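For reference, a minimal sketch of how the limit could be raised on an os-3.11 node (the mechanism follows the linked docs; the exact config file location and node service name inside this provider's image are assumptions):

# on the node (e.g. via the provider's ssh helper), raise the kubelet pod limits
# in the node configuration, e.g. /etc/origin/node/node-config.yaml:
#
#   kubeletArguments:
#     pods-per-core:
#       - "40"
#     max-pods:
#       - "250"
#
# then restart the node service so the kubelet picks up the new values
systemctl restart origin-node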

kubevirt/kubevirt tests can't find configuration if provider external is used under openshift-ci

Despite KUBECONFIG being set correctly to a valid configuration, when using the external provider the test execution fails with this error:

panic: 
Your test failed.
Ginkgo panics to prevent subsequent assertions from running.
Normally Ginkgo rescues this panic so you shouldn't see it.
But, if you make an assertion in a goroutine, Ginkgo can't capture the panic.
To circumvent this, you should call
	defer GinkgoRecover()
at the top of the goroutine that caused this panic.
goroutine 1 [running]:
kubevirt.io/kubevirt/vendor/github.com/onsi/ginkgo.Fail(0xc0005a8000, 0x2d1, 0xc0001b2430, 0x1, 0x1)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/onsi/ginkgo/ginkgo_dsl.go:262 +0xc8
kubevirt.io/kubevirt/vendor/github.com/onsi/gomega/internal/assertion.(*Assertion).match(0xc0003d3080, 0x1c36c20, 0x2c29040, 0x0, 0x0, 0x0, 0x0, 0x2c29040)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/onsi/gomega/internal/assertion/assertion.go:75 +0x1f1
kubevirt.io/kubevirt/vendor/github.com/onsi/gomega/internal/assertion.(*Assertion).ToNot(0xc0003d3080, 0x1c36c20, 0x2c29040, 0x0, 0x0, 0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/onsi/gomega/internal/assertion/assertion.go:43 +0xc7
kubevirt.io/kubevirt/tests.GetHighestCPUNumberAmongNodes(0x1c9ee40, 0xc00036b960, 0xc0001b9dc0)
	/go/src/kubevirt.io/kubevirt/tests/utils.go:2879 +0x1b8
kubevirt.io/kubevirt/tests_test.glob..func27.3.1()
	/go/src/kubevirt.io/kubevirt/tests/vmi_configuration_test.go:142 +0x4a
...

Soft-linking the internal configuration location to the valid configuration (see #199) makes the tests work again, as sketched below.
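A minimal sketch of that remediation (the internal path is an assumption; see #199 for the actual location used):

# make the provider-internal config location point at the config openshift-ci provides
mkdir -p _ci-configs/external
ln -sf "${KUBECONFIG}" _ci-configs/external/.kubeconfig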

References:

KUBEVIRT_NUM_NODES variable doesn't work in okd 4.*

KUBEVIRT_NUM_NODES should set the number of nodes in the cluster, but with okd 4.* providers it doesn't work.

Steps to reproduce:

  1. export KUBEVIRT_PROVIDER="okd-4.1"
     export KUBEVIRT_NUM_NODES=4
  2. make cluster-up
  3. cluster-up/kubectl.sh get nodes

Actual results:
Only two nodes are created (1 master and 1 worker)

Expected results:
4 nodes are created (1 master and 3 workers)

Log from cluster building:

[ksimon@localhost kubevirtci]$ make cluster-up
./cluster-up/check.sh
[ OK ] found /dev/kvm
[ OK ] intel nested virtualization enabled
./cluster-up/up.sh
Number of workers: 3
you should provide the installer pull secret file, if you want to install additional machines
Download the image docker.io/kubevirtci/okd-4.1@sha256:e7e3a03bb144eb8c0be4dcd700592934856fb623d51a2b53871d69267ca51c86
Downloading .......
Start the container okd-4.1-cluster
Download the image docker.io/library/registry:2.7.1
Downloading .......
Start the container okd-4.1-registry
Run the cluster
+ NUM_SECONDARY_NICS=0
+ chown root:kvm /dev/kvm
+ chmod 660 /dev/kvm
+ haproxy -f /etc/haproxy/haproxy.cfg
+ virsh list
 Id    Name                           State
----------------------------------------------------

++ virsh net-list --name
++ grep -v default
+ cluster_network=test-1-mkmgt
+ virsh net-update test-1-mkmgt add dns-host '<host ip='\''192.168.126.1'\''>
  <hostname>ceph</hostname>
  <hostname>nfs</hostname>
  <hostname>registry</hostname>
</host>' --live --config
Updated network test-1-mkmgt persistent config and live state
+ domain_number=1
++ virsh list --name --all
+ for domain in $(virsh list --name --all)
+ '[' 0 -gt 0 ']'
++ expr 1 + 1
+ domain_number=2
+ virt-xml --edit --memory 12288 test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ virt-xml --edit --cpu host-passthrough test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ [[ test-1-mkmgt-master-0 =~ master ]]
+ virt-xml --edit --vcpu 6 test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ [[ test-1-mkmgt-master-0 =~ worker ]]
+ virsh start test-1-mkmgt-master-0
Domain test-1-mkmgt-master-0 started

+ for domain in $(virsh list --name --all)
+ '[' 0 -gt 0 ']'
++ expr 2 + 1
+ domain_number=3
+ virt-xml --edit --memory 12288 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virt-xml --edit --cpu host-passthrough test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ [[ test-1-mkmgt-worker-0-xzq4q =~ master ]]
+ [[ test-1-mkmgt-worker-0-xzq4q =~ worker ]]
+ virt-xml --edit --memory 8192 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virt-xml --edit --vcpu 6 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virsh start test-1-mkmgt-worker-0-xzq4q
Domain test-1-mkmgt-worker-0-xzq4q started

++ virsh list --name --all
++ virsh list --name
+ [[ test-1-mkmgt-master-0
test-1-mkmgt-worker-0-xzq4q != \t\e\s\t\-\1\-\m\k\m\g\t\-\m\a\s\t\e\r\-\0\
\t\e\s\t\-\1\-\m\k\m\g\t\-\w\o\r\k\e\r\-\0\-\x\z\q\4\q ]]
+ export KUBECONFIG=/root/install/auth/kubeconfig
+ KUBECONFIG=/root/install/auth/kubeconfig
+ oc config set-cluster test-1 --server=https://127.0.0.1:6443
Cluster "test-1" set.
+ oc config set-cluster test-1 --insecure-skip-tls-verify=true
Cluster "test-1" set.
+ oc get nodes
Unable to connect to the server: EOF
+ sleep 5
+ oc get nodes
Unable to connect to the server: EOF
+ sleep 5
+ oc get nodes
Unable to connect to the server: EOF
+ sleep 5
+ oc get nodes
NAME                          STATUS   ROLES    AGE   VERSION
test-1-mkmgt-master-0         Ready    master   54d   v1.13.4+c41acf267
test-1-mkmgt-worker-0-xzq4q   Ready    worker   54d   v1.13.4+c41acf267
+ sleep 30
++ oc -n openshift-ingress get pods -o custom-columns=NAME:.metadata.name,HOST_IP:.status.hostIP,PHASE:.status.phase
++ grep route
++ grep Running
++ head -n 1
++ awk '{print $2}'
+ [[ 192.168.126.51 != '' ]]
++ oc -n openshift-ingress get pods -o custom-columns=NAME:.metadata.name,HOST_IP:.status.hostIP,PHASE:.status.phase
++ grep route
++ grep Running
++ head -n 1
++ awk '{print $2}'
+ worker_node_ip=192.168.126.51
+ [[ 192.168.126.51 != \1\9\2\.\1\6\8\.\1\2\6\.\5\1 ]]
++ oc get pods --all-namespaces --no-headers
++ grep -v revision-pruner
++ grep -v Running
++ grep -v Completed
++ wc -l
+ [[ 5 -le 3 ]]
+ echo 'waiting for pods to come online'
waiting for pods to come online
+ sleep 10

test: add smoke test for helper scripts

We have some helper scripts such as cli.sh, ssh.sh, container.sh and others. It would be helpful to have jobs that exercise them on the PRs that change them, e.g. a small bash test suite run after cluster-up (a sketch follows).
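A minimal sketch of such a smoke test, assuming it runs right after make cluster-up (the exact per-script invocations are assumptions):

#!/usr/bin/env bash
set -euo pipefail

# the helper scripts should all succeed against a freshly provisioned cluster
./cluster-up/kubectl.sh get nodes
./cluster-up/ssh.sh node01 "echo ssh-ok"
./cluster-up/cli.sh ssh node01 "echo cli-ok"

echo "helper script smoke test passed"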

need ability to dynamically enable admission controllers during "cluster-up"

In KubeVirt, we want to begin testing Pod Security Policies. However, this requires enabling the PSP admission controller, which will negatively impact projects that have not yet adopted PSPs.

To get around this, we need the ability to dynamically enable optional admission controllers during cluster-up through the use of an environment variable.
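A hypothetical interface for this (the variable name and the plumbing are assumptions, not an existing kubevirtci feature):

# hypothetical usage
export KUBEVIRT_PROVIDER=k8s-1.17.0
export KUBEVIRT_ADMISSION_CONTROLLERS="PodSecurityPolicy"   # hypothetical variable
make cluster-up

# the provider would append the listed plugins to the apiserver's
# --enable-admission-plugins flag when setting up the control plane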

Sriov provider disconnects the node if the interface used to reach it is sriov-capable

The setup script is a bit aggressive and moves all the local interfaces that are sr-iov capable into the kind node's namespace.

If the device used to connect to the host is itself sr-iov capable, moving it into the node's namespace disconnects the host.
There should be a way to filter out the interfaces not used for sr-iov testing purposes.

Two approaches could be:

  • provide a static list through an env variable
  • skip all the PFs that already own an IP address

(or a combination of the two); see the sketch below.
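A minimal sketch of the second approach (assuming the setup script walks the PFs by interface name):

# skip SR-IOV capable PFs that already carry an IPv4 address (likely the host's uplink)
for pf in /sys/class/net/*; do
    dev=$(basename "${pf}")
    [ -e "${pf}/device/sriov_totalvfs" ] || continue        # not SR-IOV capable
    if ip -4 addr show dev "${dev}" | grep -q "inet "; then
        echo "skipping ${dev}: it already owns an IP address"
        continue
    fi
    echo "moving ${dev} into the kind node's namespace"
    # ... existing logic that moves the PF ...
done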

Automate container hash updates

The process of updating hashes when something changes in the containers is very error prone; for example, changing something in a base image means:
/test push-base
fix hashes
/test push-centos
fix hashes
/test providers
fix hashes

A possible solution would be to just point all the providers to "stable" and add a "post-merge" job that updates stable.

Tag the latest known good image before a new push, so it won't get deleted once a new latest is pushed

In preparation for running provision / push on prow CI, we need a way to tag the last known good image so the next push will not make it tag-less and prone to deletion, as suggested in a discussion with Roman. The tag can be a word such as stable, or a timestamp.

Keep in mind that we also need to use a sliding window and not leave all the images tagged, or configure a garbage-collection policy on the image repo that, for example, keeps the last 5 images.

This might help (tagging without pulling; haven't tried it yet):
https://dille.name/blog/2018/09/20/how-to-tag-docker-images-without-pulling-them/

We can, for example, have publish.sh tag the current image with "stable" and then tag the new one with latest (as we do now), or just always tag with a timestamp, as Roman suggested; a sketch follows.
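A minimal sketch of the publish.sh idea (image names are placeholders; this variant pulls first, while the linked post describes tagging without a pull):

# preserve the current latest under a stable tag before pushing a new latest
docker pull docker.io/kubevirtci/okd-4.1:latest
docker tag docker.io/kubevirtci/okd-4.1:latest docker.io/kubevirtci/okd-4.1:stable
docker push docker.io/kubevirtci/okd-4.1:stable

# or tag with a timestamp to keep a sliding window of known-good images
docker tag docker.io/kubevirtci/okd-4.1:latest "docker.io/kubevirtci/okd-4.1:$(date +%Y%m%d%H%M%S)"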

Sometimes gocli rm fails to remove docker volume

Error response from daemon: Unable to remove volume, volume still in use: remove kubevirt-functional-tests-openshift-release0: volume is in use - [b0bbbbdf0ec8114313f9fc6f35c6733e7c6697478f37ad617a2dfd087cffce62]
make: *** [cluster-down] Error 1
+ make cluster-down
./cluster/down.sh

Create a prow job that will allow fast provision of ocp, in case the update involves static changes only

When we need to just update a script that lives in the container,
rerunning the whole provision isn't required, and it also requires more testing and integration.
Following a discussion with Roman and Quique, we can create a prow job that will
pull the current known good image according to images.sh and then, roughly:

CONTAINER=$(docker create "$IMG_HASH")

# user block of commands, such as:
docker cp cluster-provision/okd/scripts/run.sh "$CONTAINER":/scripts

HASH=$(docker commit "$CONTAINER")

and then publish it (either in this job or in another one, which of course needs the image to be available).

The user-block code can be a file that the person pushes to the PR; in the PR the user will write
/test provide-...
The PR also adds the source-controlled files such as cluster-provision/okd/scripts/run.sh, which need to be merged later and, here, need to be copied.

This can save a lot of time and retesting when there is no need to create a whole new provider;
when reinstalling openshift is needed, this of course won't apply.

Podman (and RHEL 8) support

Is there a way to run kubevirtci on RHEL 8 with no docker?

The gocli code seems to require the Docker daemon, as it calls its API. However, there is no docker anymore in RHEL 8; it was replaced by the daemon-less podman.

Make snapshot with pods started

Starting an OKD4 cluster spends most of its time on downloading the image (which gets cached on the system, so that's fine) and waiting for pods to turn Ready:

19:25:04 [check-patch.hco-master.fc30.x86_64] + [[ 8 -le 3 ]]
19:25:04 [check-patch.hco-master.fc30.x86_64] + echo 'waiting for pods to come online'

Is there a way to make a snapshot of the cluster once all pods are deployed? It would make our CI much faster.

ClusterIP services may not be accessible in a multi-node 1.17.0 environment

It appears that clusterip services are only accessible on nodes that have an endpoint for that service.

For example, take the cdi-api service. It has one endpoint on node02.

➜  containerized-data-importer git:(master) ✗ k get services -n cdi cdi-api
NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
cdi-api   ClusterIP   10.96.239.43   <none>        443/TCP   20m

➜  containerized-data-importer git:(master) ✗ k get endpoints -n cdi cdi-api
NAME      ENDPOINTS         AGE
cdi-api   10.244.1.5:8443   17m

➜  containerized-data-importer git:(master) ✗ k get endpoints -n cdi cdi-api -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2020-01-16T15:58:17Z"
  creationTimestamp: "2020-01-16T15:58:11Z"
  labels:
    cdi.kubevirt.io: cdi-apiserver
    operator.cdi.kubevirt.io/createVersion: latest
  name: cdi-api
  namespace: cdi
  resourceVersion: "999"
  selfLink: /api/v1/namespaces/cdi/endpoints/cdi-api
  uid: 480ddcea-5bde-436a-b8e3-d93af2db8204
subsets:
- addresses:
  - ip: 10.244.1.5
    nodeName: node02
    targetRef:
      kind: Pod
      name: cdi-apiserver-7859f9fd8c-xjvrj
      namespace: cdi
      resourceVersion: "997"
      uid: 31cbb07b-39af-4ecc-83c3-323bad15a836
  ports:
  - port: 8443
    protocol: TCP

The service is accessible from node02

[vagrant@node02 ~]$ curl --insecure https://10.96.239.43/apis
{
 "kind": "APIGroupList",
 "apiVersion": "v1",
 "groups": [
  {
   "kind": "APIGroup",
   "apiVersion": "v1",
   "name": "upload.cdi.kubevirt.io",
   "versions": [
    {
     "groupVersion": "upload.cdi.kubevirt.io/v1alpha1",
     "version": "v1alpha1"
    }
   ],
   "preferredVersion": {
    "groupVersion": "upload.cdi.kubevirt.io/v1alpha1",
    "version": "v1alpha1"
   },
   "serverAddressByClientCIDRs": [
    {
     "clientCIDR": "0.0.0.0/0",
     "serverAddress": ""
    }
   ]
  }
 ]
}

But not accessible from node01

[vagrant@node01 ~]$ curl --insecure -m 10 https://10.96.239.43/apis
curl: (28) Connection timed out after 10001 milliseconds

The service is accessible from node01 using the pod ip. So something appears to be up with iptables/conntrack?

FWIW the same issue also applies to cdi-uploadproxy
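Some commands that may help narrow this down on node01 (a sketch; it assumes kube-proxy runs in iptables mode):

# is the service programmed in the node's NAT rules?
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.96.239.43

# is there conntrack state for the service IP?
sudo conntrack -L -d 10.96.239.43

# compare kube-proxy logs between node01 and node02
kubectl -n kube-system logs -l k8s-app=kube-proxy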

Provisioning of nodes sometimes fails on the ssh.sh command

One of examples:

+ make cluster-up
./cluster/up.sh
Downloading .......
Downloading ........................................................
Downloading ...................................................................................................................
/bin/bash: ssh.sh: command not found
provisioning node node02 failed
make: *** [cluster-up] Error 1

Add support for OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE in gocli

At my place (remote home office) it takes two hours for the installer step:

INFO Fetching OS image: rhcos-42.80.20191002.0-qemu.qcow2                                                                                               
DEBUG Unpacking OS image into "/root/.cache/openshift-install/libvirt/image/675149f237ac179efd9ab5226c6b2042"... 

to finish.

The tested download speed at my place is 30 MBit/s. The downloaded image is roughly 700 MB, so it should take around 3.5 minutes. However, the download is done each and every time the provision.sh script is started.

This slows down iterations on creating and changing the provisioning setup. The OpenShift installer provides an env var, OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE, which, if set to e.g. file://.../images/rhcos-42.80.20191002.0-qemu.qcow2, would skip the download and use the provided image.

This, in combination with mounting the volume into the container via -v $HOME/images:$HOME/images or the like, should speed up iterations considerably; a sketch follows.
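A minimal sketch of that workflow (the mount path and how gocli would forward the variable are assumptions):

# download the RHCOS image once and keep it outside the container
mkdir -p "$HOME/images"
# (copy or download rhcos-42.80.20191002.0-qemu.qcow2 into $HOME/images)

# inside the provisioning container, with $HOME/images mounted via -v $HOME/images:$HOME/images:
export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE="file://$HOME/images/rhcos-42.80.20191002.0-qemu.qcow2"
# the installer then uses the local file instead of fetching the OS image again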

no cgroupsv2 support for docker

I recently updated to Fedora 31, which upgrades cgroups to v2. Docker does not support cgroups v2, which results in an error when starting the docker daemon.

According to this docker issue [1] and the referenced issues, it does not sound like docker will support cgroups v2 soon. From the Fedora perspective [2] it sounds like we should rather use cgroups v2, which podman already supports. There is ongoing work to support podman in kubevirtci [3], though an issue exists that prevents finalizing it [4].

So what is the thinking about that issue here? Should we just live with the workaround or consider switching to podman? The latter will cause work in kubevirtci, even though podman claims to be a drop-in replacement, which is true for the CLI side but not for the API side.
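For reference, the commonly documented Fedora 31 workaround is to boot back into cgroups v1 (a sketch; requires a reboot):

sudo dnf install -y grubby
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
sudo reboot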

[1] docker/for-linux#665
[2] https://linuxacademy.com/blog/containers/a-game-changer-for-containers-cgroups/
[3] https://github.com/fromanirh/pack8s
[4] ffromani/pack8s#14

Node provisioning fails consistently

Env(s): RHEL 7.5 (gcloud instance) and CentOS 7.5 (local VM in VMware)

I'm following the steps outlined in the main README.md. In both environments, I get the exact same behavior from

# gocli run --random-ports --nodes 3 --background kubevirtci/k8s-1.10.3

Gives the error

{"status":"Status: Image is up to date for docker.io/registry:2"}
2018/07/30 20:55:57 Waiting for host: 192.168.66.101:22
provisioning node node01 failed

Logs from kubevirt-node01:

# docker logs 4a96ea9c8948
3: tap01: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master br0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 92:41:2b:e4:76:9e brd ff:ff:ff:ff:ff:ff
Creating disk "/var/run/disk/disk.qcow2 backed by /disk01.qcow2".
Formatting '/var/run/disk/disk.qcow2', fmt=qcow2 size=42949672960 backing_file=/disk01.qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
SSH will be available on container port 2201.
VNC will be available on container port 5901.
VM MAC in the guest network will be 52:55:00:d1:55:01
VM IP in the guest network will be 192.168.66.101
VM hostname will be node01
mknod: missing operand after '10'   <========= looks like line 82 of base/scripts/vm.sh
Try 'mknod --help' for more information.

Is this expected, and if so, could the log line be more verbose about what is missing?

Syntax error when `$KUBEVIRTCI_CLUSTER` is not set

When $KUBEVIRTCI_CLUSTER isn't set, its automatic resolution produces an extra /, which causes make cluster-up to fail since it cannot find the appropriate scripts.

[alevitte@alevitte kubevirtci]$ make cluster-up 
./cluster-up/check.sh
[ OK ] found /dev/kvm
./cluster-up/check.sh: line 41: [: missing `]'
[ERR ] intel nested virtualization not enabled
./cluster-up/up.sh
./cluster-up/up.sh: line 12: /home/alevitte/dev/kubevirtci/cluster-up//cluster-up/cluster/okd-4.3/provider.sh: No such file or directory
./cluster-up/up.sh: line 13: up: command not found
/home/alevitte/dev/kubevirtci/cluster-up/kubectl.sh: line 30: /home/alevitte/dev/kubevirtci/cluster-up//cluster-up/cluster/okd-4.3/provider.sh: No such file or directory
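A minimal defensive sketch of the resolution (variable names here are illustrative; the real logic lives in the cluster-up scripts):

# default the cluster before composing paths, and strip any trailing slash
KUBEVIRTCI_CLUSTER="${KUBEVIRTCI_CLUSTER:-${KUBEVIRT_PROVIDER}}"
provider_path="${KUBEVIRTCI_PATH%/}/cluster/${KUBEVIRTCI_CLUSTER}/provider.sh"
if [ ! -f "${provider_path}" ]; then
    echo "provider script not found: ${provider_path}" >&2
    exit 1
fi
source "${provider_path}"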

Can't cluster-up with provider os-3.10.0 due to inaccessible address in origin-latest.repo

I am trying to set up a dev cluster with OpenShift. I do:

$ make cluster-down # tear down previous cluster
$ docker ps # check no kubevirt containers are running
$ export KUBEVIRT_PROVIDER=os-3.10.0
$ make cluster-up
...
TASK [Ensure openshift-ansible installer package deps are installed] ***********
ok: [node02] => (item=iproute)
ok: [node02] => (item=dbus-python)
ok: [node02] => (item=PyYAML)
ok: [node02] => (item=python-ipaddress)
ok: [node02] => (item=libsemanage-python)
ok: [node02] => (item=yum-utils)
FAILED - RETRYING: Ensure openshift-ansible installer package deps are installed (3 retries left).
FAILED - RETRYING: Ensure openshift-ansible installer package deps are installed (2 retries left).
FAILED - RETRYING: Ensure openshift-ansible installer package deps are installed (1 retries left).
failed: [node02] (item=python-docker) => {"attempts": 3, "changed": false, "item": "python-docker", "msg": "Failure talking to yum: failure: repodata/repomd.xml from my-origin: [Errno 256] No more
mirrors to try.\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001 milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001
milliseconds')\nhttp://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: [Errno 12] Timeout on
http://10.35.4.116/v3.10.0-rc.0/repodata/repomd.xml: (28, 'Connection timed out after 30001 milliseconds')"}

It tries to reach a repo located at 10.35.4.116 that is apparently not available to anyone without access to the internal Red Hat network (?).

I checked on the node in question, it has the following repos:

$ ls -1 /etc/yum.repos.d
CentOS-Base.repo
CentOS-CR.repo
CentOS-Debuginfo.repo
CentOS-fasttrack.repo
CentOS-Gluster-3.12.repo
CentOS-Media.repo
CentOS-OpenShift-Origin.repo
CentOS-Sources.repo
CentOS-Vault.repo
epel.repo
epel-testing.repo
origin-latest.repo

The repo in question is defined in origin-latest.repo:

$ cat /etc/yum.repos.d/origin-latest.repo
[my-origin]
name=Origin packages v3.10.0-rc.0
baseurl=http://10.35.4.116/v3.10.0-rc.0/
enabled=1
gpgcheck=0

The file apparently comes from kubevirtci repo:

baseurl=http://10.35.4.116/v3.10.0-rc.0/

Looks like I can't get my hands on an openshift cluster unless I am inside the Red Hat network. I think the idea of these providers is to deploy OpenShift Origin, which should be accessible to everyone, is it not?

We should probably avoid using the private IP address for any repo that kubevirtci creates.

Option to pass a custom provision file during make cluster-up

/kind enhancement

Problem:
While working behind a network proxy, make cluster-up fails to set up the cluster while downloading the CNI config. Also, no docker pull or kubectl apply is successful, as no image gets downloaded due to the lack of proxy settings in the docker daemon.

.
.
Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.66.101:6443 --token abcdef.1234567890123456 --discovery-token-ca-cert-hash sha256:23d5e3383df6878d799c4b326f8910d521894dc9a403dcac9dd6a3800236fac3

+ kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
The connection to the server raw.githubusercontent.com was refused - did you specify the right host or port?
provisioning node node01 failed
Makefile:104: recipe for target 'cluster-up' failed
make: *** [cluster-up] Error 1

Expected Behaviour

  • modify provision scripts like node.sh or node<node-no>.sh etc.
  • gocli run should be able to consume custom scripts if indicated explicitly (see the sketch below)
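A sketch of what a custom provision snippet could look like for the proxy case (the hook mechanism and file name are hypothetical; the docker drop-in path is the standard systemd one):

#!/usr/bin/env bash
# hypothetical custom node provision step: propagate proxy settings into the node
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,192.168.66.101"
EOF
systemctl daemon-reload
systemctl restart docker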

Missing .kubeconfig with okd-4.3

After make cluster-up with KUBEVIRT_PROVIDER=okd-4.3:

export KUBECONFIG=$(cluster-up/kubeconfig.sh)
[alevitte@alevitte kubevirtci]$ oc get pods
error: Missing or incomplete configuration info.  Please login or point to an existing, complete config file:

  1. Via the command-line flag --config
  2. Via the KUBECONFIG environment variable
  3. In your home directory as ~/.kube/config

To view or setup config directly use the 'config' command.
[alevitte@alevitte kubevirtci]$ echo $KUBECONFIG 
/home/alevitte/dev/kubevirtci/_ci-configs/okd-4.3/.kubeconfig

Note: this is NOT fixed by #198 .

how to run on existing ocp?

kubevirtci seems to spin up docker containers on a docker engine. How can I make this run natively against an existing openshift/k8s cluster?

Unable to spin up OCP-4.3 when an OCP pull secret is provided

When you set INSTALLER_PULL_SECRET and run make cluster-up, the following error occurs:
Error response from daemon: invalid mount config for type "bind": bind source path does not exist

Debug output:

  • docker run --privileged --net=host --rm -v /var/run/docker.sock:/var/run/docker.sock kubevirtci/gocli@sha256:e48c7285ac9e4e61fe0f89f35ac5f9090497ea7c8165deeadb61e464c88d8afd run okd --container-registry quay.io --installer-pull-secret-file /home/dorzheho/.kube/pull-secret --random-ports --background --prefix ocp-4.3 --master-cpu 6 --workers-cpu 6 --workers-memory 8192 --secondary-nics 0 --registry-volume kubevirt_registry --workers 1 kubevirtci/ocp-4.3@sha256:03a8c736263493961f198b5cb214d9b1fc265ece233c60bdb1c8b8b4b779ee1e --container-registry-user dorzheho --container-registry-password MYPASSWORD
    Download the image quay.io/kubevirtci/ocp-4.3@sha256:03a8c736263493961f198b5cb214d9b1fc265ece233c60bdb1c8b8b4b779ee1e
    Downloading ..................
    Error response from daemon: invalid mount config for type "bind": bind source path does not exist

No keys exist in the /root/.ssh folder; provision gives an error in case a previous error occurred

ERROR Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: failed to initialize the SSH agent: failed to read directory "/root/.ssh": open /root/.ssh: no such file or directory

Adding this to provision.sh (before cluster create) can fix the problem, or alternatively copy and rename an existing key pair (or use it with --keys):

mkdir -p /root/.ssh   # ensure the directory exists first
ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ''

Non-interactive session when ssh to a node in k8s-* provider

This issue is from commit fac568c3691954fabea2b0277c16b5fe2af482d6.

[Reproduced on a local RHEL-8]

When using a k8s provider, e.g.

export KUBEVIRT_PROVIDER=k8s-1.17.0
make cluster-up

And then ssh to a node; e.g.

./cluster-up/ssh.sh node01

The session is not interactive, meaning anything typed into the terminal is ignored, even echo hello.

The USE_TTY environment variable is not set. Setting it manually at the beginning of the ssh.sh script does make it work.
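A minimal workaround until that is fixed (assuming the script only checks that the variable is non-empty):

# force an interactive session
USE_TTY=true ./cluster-up/ssh.sh node01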

OCP 4.3 and 4.4 image entries were removed

It seems that #246 reverted the change to https://github.com/kubevirt/kubevirtci/commits/master/cluster-up/cluster/images.sh.

It seems to have reverted the OCP 4.3 sha256 from ocp-4.3@sha256:03a8c736263493961f198b5cb214d9b1fc265ece233c60bdb1c8b8b4b779ee1e to the old ocp-4.3@sha256:8f59d625852ef285d6ce3ddd6ebd3662707d2c0fab19772b61dd0aa0f6b41e5f which does not work anymore :

2020/02/05 12:10:05 failed to download quay.io/kubevirtci/ocp-4.3@sha256:78c2c7517b3b4d36eebb39e49631340bf6ebf7ac47957612501a43a85fb013ce: Error response from daemon: manifest for quay.io/kubevirtci/ocp-4.3@sha256 :78c2c7517b3b4d36eebb39e49631340bf6ebf7ac47957612501a43a85fb013ce not found

Setting it back manually to the correct sha256 in images.sh works correctly:

Download the image quay.io/kubevirtci/ocp-4.3@sha256:03a8c736263493961f198b5cb214d9b1fc265ece233c60bdb1c8b8b4b779ee1e Downloading ..

Decide which lanes we support, and which we can drop

As part of a conversation in the kubevirt gchat,
we should decide which lanes are supported and which we can drop.
I think there is a file that lists the lanes, and we can decide what needs to be maintained and what should be dropped from the code, and update it accordingly
(maybe with a markdown table that shows which lanes exist and what the status of each lane is).

After that we can take it to the next step: from time to time provider.sh (or another suitable file)
can carry a workaround for a specific cluster-up problem of a lane, so things stay usable more easily if we continue to maintain it.

kind k8s sriov 1.14 provider throws permission denied after cluster up first time

First time using kind k8s provider.

> export KUBEVIRT_PROVIDER=kind-k8s-sriov-1.14.2
> make cluster-up                                                                                                                                       
./cluster-up/up.sh                                                                                                                                      
DEBU[14:48:00] Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\t{{.Label "io.k8s.sigs.k
ind.cluster"}}]                                                                                                                                         
...
++ _kubectl get nodes
++ grep sriov-worker
++ /home/dhiller/Projects/github.com/kubevirt/kubevirt/_ci-configs/kind-k8s-sriov-1.14.2/.kubectl --kubeconfig=/home/dhiller/Projects/github.com/kubevirt/kubevirt/_ci-configs/kind-k8s-sriov-1.14.2/.kubeconfig get nodes
+ [[ -z sriov-worker          Ready    <none>   40s   v1.14.2 ]]
+ SRIOV_NODE=sriov-worker
+ mkdir -p /var/run/netns/
mkdir: cannot create directory ‘/var/run/netns/’: Permission denied
make: *** [Makefile:101: cluster-up] Error 1

Refactor OKD providers

The OKD provider.sh script needs refactoring. Some functions can be reused in the provider.sh scripts of other providers.

Enable additional disks on nodes

It would be useful to be able to specify additional disks, similar to KUBEVIRT_NUM_SECONDARY_NICS.

In order to deploy Rook/Ceph, available disks are needed for the OSDs.

"make connect" stopped working with ocp 4.3

Run ocp-4.3,
try to connect to the container with the make connect target,
and it exits with an error.

This is a regression; it worked a week or two ago.

The reason is that the logic that allowed it to be standalone was removed and it is now used as a helper for ssh.sh, I think;
the kubeconfig it used to set was also removed.

Sometimes starting k8s fails because of no connection to the apiserver

This log shows

+ kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
Unable to connect to the server: net/http: TLS handshake timeout
provisioning node node01 failed

I have seen this more frequently lately.

--workers-memory param doesn't work in okd 4.1

--workers-memory param (located here: https://github.com/kubevirt/kubevirtci/blob/master/cluster-up/cluster/okd-4.1/provider.sh#L36) doesn't work in okd 4.1

Steps to reproduce:

  1. export KUBEVIRT_PROVIDER="okd-4.1"
  2. update --workers-memory 8192 (here: https://github.com/kubevirt/kubevirtci/blob/master/cluster-up/cluster/okd-4.1/provider.sh#L36) to e.g. --workers-memory 16000
  3. make cluster-up
  4. oc get node test-1-mkmgt-worker-0-xzq4q -o json | jq '.status.capacity.memory'

Actual results:
"6099376Ki"
Expected results:
value around 16Gi

Log from cluster building:

[ksimon@localhost kubevirtci]$ make cluster-up
./cluster-up/check.sh
[ OK ] found /dev/kvm
[ OK ] intel nested virtualization enabled
./cluster-up/up.sh
Number of workers: 3
you should provide the installer pull secret file, if you want to install additional machines
Download the image docker.io/kubevirtci/okd-4.1@sha256:e7e3a03bb144eb8c0be4dcd700592934856fb623d51a2b53871d69267ca51c86
Downloading .......
Start the container okd-4.1-cluster
Download the image docker.io/library/registry:2.7.1
Downloading .......
Start the container okd-4.1-registry
Run the cluster
+ NUM_SECONDARY_NICS=0
+ chown root:kvm /dev/kvm
+ chmod 660 /dev/kvm
+ haproxy -f /etc/haproxy/haproxy.cfg
+ virsh list
 Id    Name                           State
----------------------------------------------------

++ virsh net-list --name
++ grep -v default
+ cluster_network=test-1-mkmgt
+ virsh net-update test-1-mkmgt add dns-host '<host ip='\''192.168.126.1'\''>
  <hostname>ceph</hostname>
  <hostname>nfs</hostname>
  <hostname>registry</hostname>
</host>' --live --config
Updated network test-1-mkmgt persistent config and live state
+ domain_number=1
++ virsh list --name --all
+ for domain in $(virsh list --name --all)
+ '[' 0 -gt 0 ']'
++ expr 1 + 1
+ domain_number=2
+ virt-xml --edit --memory 12288 test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ virt-xml --edit --cpu host-passthrough test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ [[ test-1-mkmgt-master-0 =~ master ]]
+ virt-xml --edit --vcpu 6 test-1-mkmgt-master-0
Domain 'test-1-mkmgt-master-0' defined successfully.
+ [[ test-1-mkmgt-master-0 =~ worker ]]
+ virsh start test-1-mkmgt-master-0
Domain test-1-mkmgt-master-0 started

+ for domain in $(virsh list --name --all)
+ '[' 0 -gt 0 ']'
++ expr 2 + 1
+ domain_number=3
+ virt-xml --edit --memory 12288 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virt-xml --edit --cpu host-passthrough test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ [[ test-1-mkmgt-worker-0-xzq4q =~ master ]]
+ [[ test-1-mkmgt-worker-0-xzq4q =~ worker ]]
+ virt-xml --edit --memory 16000 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virt-xml --edit --vcpu 6 test-1-mkmgt-worker-0-xzq4q
Domain 'test-1-mkmgt-worker-0-xzq4q' defined successfully.
+ virsh start test-1-mkmgt-worker-0-xzq4q
Domain test-1-mkmgt-worker-0-xzq4q started

++ virsh list --name --all
++ virsh list --name
+ [[ test-1-mkmgt-master-0
test-1-mkmgt-worker-0-xzq4q != \t\e\s\t\-\1\-\m\k\m\g\t\-\m\a\s\t\e\r\-\0\
\t\e\s\t\-\1\-\m\k\m\g\t\-\w\o\r\k\e\r\-\0\-\x\z\q\4\q ]]
+ export KUBECONFIG=/root/install/auth/kubeconfig
+ KUBECONFIG=/root/install/auth/kubeconfig
+ oc config set-cluster test-1 --server=https://127.0.0.1:6443
Cluster "test-1" set.
+ oc config set-cluster test-1 --insecure-skip-tls-verify=true
Cluster "test-1" set.
+ oc get nodes
Unable to connect to the server: EOF
+ sleep 5
+ oc get nodes
Unable to connect to the server: EOF
+ sleep 5
+ oc get nodes
NAME                          STATUS   ROLES    AGE   VERSION
test-1-mkmgt-master-0         Ready    master   54d   v1.13.4+c41acf267
test-1-mkmgt-worker-0-xzq4q   Ready    worker   54d   v1.13.4+c41acf267
+ sleep 30

Calling cli.sh ssh or ssh.sh without TTY does not return the correct error code

When called from something like golang code or Jenkins, it's not under a TTY, and it does not return the correct error code (and possibly it does not wait for the command to finish).

To reproduce

echo "./kubevirtci/cluster-up/cli.sh ssh node01 false; echo \$?" > foo.sh
nohup bash foo.sh  > nohup.log 2>&1 < /dev/null
cat nohup.log # It should be 1 but it's 0

Running the gocli command directly, with and without -it:

$ docker run --privileged --net=host --rm -v /var/run/docker.sock:/var/run/docker.sock kubevirtci/gocli@sha256:8571161d7956b830646216335453b995ba754e07319dde062241ccc025f5ee00 --prefix k8s-1.13.3 ssh node01 false
$ echo $?
0

$ docker run --privileged -it --net=host --rm -v /var/run/docker.sock:/var/run/docker.sock kubevirtci/gocli@sha256:8571161d7956b830646216335453b995ba754e07319dde062241ccc025f5ee00 --prefix k8s-1.13.3 ssh node01 false
$ echo $?
1

Also, it looks like without "-it" it does not wait for the command to finish:

$ time docker run --privileged --net=host --rm -v /var/run/docker.sock:/var/run/docker.sock kubevirtci/gocli@sha256:8571161d7956b830646216335453b995ba754e07319dde062241ccc025f5ee00 --prefix k8s-1.13.3 ssh node01 sleep 10

real	0m0,391s
user	0m0,064s
sys	0m0,041s

Cannot access cbs.centos.org at k8s 1.16 provider at prow

We have the following PR [1] at kubernetes-nmstate to bump k8s to 1.16, but it is failing at cluster-up since we install openvswitch there from cbs.centos.org; logs at [2]:

Installing Open vSwitch on nodes
2020/01/16 11:37:40 Waiting for host: 192.168.66.101:22
2020/01/16 11:37:40 Connected to tcp://192.168.66.101:22
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: cbs.centos.org; Unknown error
Makefile:112: recipe for target 'cluster-up' failed
make: *** [cluster-up] Error 1
+ EXIT_VALUE=2

[1] nmstate/kubernetes-nmstate#276
[2] https://storage.googleapis.com/kubevirt-prow/pr-logs/pull/nmstate_kubernetes-nmstate/276/pull-kubernetes-nmstate-e2e-k8s/1217766368921784320/build-log.txt
