
kubernetes-retired / kubeadm-dind-cluster

Stars: 1.1K  Watchers: 58  Forks: 275  Size: 1.1 MB

[EOL] A Kubernetes multi-node test cluster based on kubeadm

License: Apache License 2.0

Shell 98.14% Dockerfile 1.68% Makefile 0.18%
kubernetes kubeadm dind k8s k8s-sig-cluster-lifecycle

kubeadm-dind-cluster's People

Contributors

akihirosuda, bzub, ccojocar, cscetbon, djschny, everpeace, heschlie, hoegaarden, huang-wei, ivan4th, jeffersonjhunt, jellonek, k8s-ci-robot, kwojcicki, lukaszo, lwr20, mariantalla, morningspace, paulczar, pigmej, pmichali, researchiteng, savamilovanovic, sttts, tallaxes, vefimova, vrovachev, wallrj, webwurst, wk8


kubeadm-dind-cluster's Issues

Is there any way to enable --net=host on all docker containers in the v1.6 script?

I'm back... I have a question: is there a way to enable the docker arg --net=host first on each node, and then on all of their child docker containers? This is specifically for the v1.6 script, and I am going to need it for the rbd commands, which apparently need a clear networking path to the host. Do you have any pointers for enabling this?

I'm hoping I can continue leveraging this script; I'm trying to avoid going straight to kubeadm :(

If this is not the best way to ask questions, please let me know of other avenues.

Thanks,
aramis

Error waiting for kube-proxy and the nodes

Hi there, I am trying to build a dev machine with 1.8, but it's failing at kube-proxy.
Any help is really appreciated.

./dind-cluster-v1.8.sh up

WARNING: cluster glitch: proxy pods aren't removed; pods may 'blink' for some time after restore
NAME READY STATUS RESTARTS AGE
etcd-kube-master 1/1 Running 0 17s
kube-dns-545bc4bfd4-vnq5z 2/3 Terminating 0 59s
kube-dns-855bdc94cb-gj8q9 0/3 Terminating 0 25s
kube-proxy-mfx66 0/1 Terminating 2 59s
kube-scheduler-kube-master 0/1 Pending 0 2s

  • Waiting for kube-proxy and the nodes
    .......................................................................................................................................................................................................Error waiting for kube-proxy and the nodes

kubectl get pods --all-namespaces

NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-kube-master 1/1 Running 1 3m
kube-system kube-apiserver-kube-master 1/1 Running 1 1m
kube-system kube-controller-manager-kube-master 1/1 Running 1 1m
kube-system kube-proxy-d7dlb 0/1 CrashLoopBackOff 4 2m
kube-system kube-proxy-gwdfl 0/1 CrashLoopBackOff 4 2m
kube-system kube-proxy-t49zz 0/1 Error 4 1m
kube-system kube-scheduler-kube-master 1/1 Running 1 3m

kubectl logs kube-proxy-t49zz -n kube-system

W0118 14:16:26.111384 1 server.go:191] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
time="2018-01-18T14:16:26Z" level=warning msg="Running modprobe ip_vs failed with message: ``, error: exec: "modprobe": executable file not found in $PATH"
W0118 14:16:26.123933 1 server_others.go:268] Flag proxy-mode="" unknown, assuming iptables proxy
I0118 14:16:26.126071 1 server_others.go:122] Using iptables Proxier.
W0118 14:16:26.134066 1 proxier.go:476] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0118 14:16:26.134240 1 server_others.go:157] Tearing down inactive rules.
I0118 14:16:26.194376 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0118 14:16:26.194605 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0118 14:16:26.201140 1 conntrack.go:83] Setting conntrack hashsize to 32768
error: write /sys/module/nf_conntrack/parameters/hashsize: operation not supported

Don't force dashboard installation

For our use case we don't require the Kubernetes dashboard, so it would be useful for us (and save time) to have a flag that skips installing it and waiting for it to come up before completing a run / restore.

Upgrade etcd version

According to the kubeadm docs, kubeadm uses etcd v3.0.17, but the etcd image in mirantis/kubeadm-dind-cluster v1.6 and v1.7 is still v2.2.5. Could you upgrade the etcd image to v3.0.17?
I didn't see how the save.tar.lz4 was created, so I don't know how to upgrade this myself.

Services deleted when re-upping the dind cluster with the script

The first time I used the dind-cluster-v1.8.sh script with up, I installed a sample application; services, deployments, etc. were working fine. After some time the VM got restarted. We tried to bring the dind cluster back up with the script's reup command, but the services, deployments, everything were deleted.

How can we bring up a high-availability dind cluster?

dind-cluster.sh script sometimes fails to detect moby Linux

The reason is a race condition in the docker info | grep -q pipeline used as the detection mechanism.

This is the detection mechanism used to detect moby Linux:

if docker info | grep -q '^Kernel Version: .*-moby$'; then
  is_moby_linux=1
fi

Since grep -q exits immediately with a zero status as soon as a match is found, it sometimes exits while docker info is still writing to the pipe. With no reader left (grep has exited), docker info receives a SIGPIPE signal from the kernel and exits with a status of 141.
Now, given that the pipefail shell option is enabled, the pipeline's return status is the value of the last (rightmost) command to exit with a non-zero status, i.e. docker info's exit status of 141; hence the if condition is not true in these cases.
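
One possible fix is to buffer the docker info output before grepping, so grep's early exit can no longer deliver SIGPIPE to a still-running docker info. A minimal sketch, assuming bash with pipefail:

kernel_info="$(docker info 2>/dev/null | grep '^Kernel Version:' || true)"
if [[ "${kernel_info}" == *-moby ]]; then
  is_moby_linux=1
fi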

Enable arm and arm64 support.

kubeadm-dind works great on amd64 machines; it would be awesome if the mirantis-dind container ran on arm and arm64 arches too.

Config Not Being Respected

I am using the v1.6 script to deploy a cluster. It works, but it does not respect config changes that I apply to config.sh. For example, I tried updating DIND_SUBNET and deploying, but the cluster was deployed with the default 10.192.0.0 network. I then ran down/clean, updated NUM_NODES to 1, and ran up, but 2 nodes were deployed.

$ git diff -w
diff --git a/config.sh b/config.sh
index d735523..c399885 100644
--- a/config.sh
+++ b/config.sh
@@ -1,5 +1,5 @@
 # DIND subnet (/16 is always used)
-DIND_SUBNET=10.192.0.0
+DIND_SUBNET=2001::
 
 # Apiserver port
 APISERVER_PORT=${APISERVER_PORT:-8080}
@@ -7,7 +7,7 @@ APISERVER_PORT=${APISERVER_PORT:-8080}
 # Number of nodes. 0 nodes means just one master node.
 # In case of NUM_NODES=0 'node-role.kubernetes.io/master' taint is removed
 # from the master node.
-NUM_NODES=${NUM_NODES:-2}
+NUM_NODES=${NUM_NODES:-1}
 
 # Use non-dockerized build
 # KUBEADM_DIND_LOCAL=

However, changes are respected when the settings are updated directly in the v1.6 script:

$ git diff dind-cluster-v1.6.sh 
diff --git a/fixed/dind-cluster-v1.6.sh b/fixed/dind-cluster-v1.6.sh
index 968a85e..75ee043 100755
--- a/fixed/dind-cluster-v1.6.sh
+++ b/fixed/dind-cluster-v1.6.sh
@@ -47,7 +47,7 @@ if [[ ! ${EMBEDDED_CONFIG:-} ]]; then
 fi
 
 CNI_PLUGIN="${CNI_PLUGIN:-bridge}"
-DIND_SUBNET="${DIND_SUBNET:-10.192.0.0}"
+DIND_SUBNET="${DIND_SUBNET:-2001::}"
 dind_ip_base="$(echo "${DIND_SUBNET}" | sed 's/\.0$//')"
 DIND_IMAGE="${DIND_IMAGE:-}"
 BUILD_KUBEADM="${BUILD_KUBEADM:-}"

$ ./dind-cluster-v1.6.sh up
* Making sure DIND image is up to date 
v1.6: Pulling from mirantis/kubeadm-dind-cluster
Digest: sha256:b81a47264b1992bfeb76f0407e886feded413edd7f5fcbab02ea296831b43db2
Status: Image is up to date for mirantis/kubeadm-dind-cluster:v1.6
* Saving a copy of docker host's /lib/modules 
* Starting DIND container: kube-master
docker: Error response from daemon: invalid IPv4 address: 2001::.2.

How to enable RBAC on the cluster?

I'm curious to know if it's possible to start the cluster with RBAC enabled, to allow for "--clusterrole" flag functionality and the like.

Up of a fresh kubeadm-dind-cluster fails for newest (2017-01-12) docker images

Running on Docker for Mac 17.12.0 (and I think the same problem occurs with 17.09.1).

  • When re-upping an existing cluster created with the pre-2017-01-12 docker v1.8 image, the cluster restarted properly (upgrading from v1.8.4 to v1.8.6).
  • However, after running clean on the prior cluster and then running up, it fails. Running get pods --all-namespaces shows the kube-dns and kube-proxy pods failing.

The logs from kube-proxy look a lot like this issue #50:

I0113 07:50:07.318108       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0113 07:50:07.325346       1 conntrack.go:83] Setting conntrack hashsize to 32768
error: write /sys/module/nf_conntrack/parameters/hashsize: operation not supported

Reverting to the prior version of mirantis/kubeadm-dind-cluster:v1.8 fixed the problem.

(We had to hack in our docker engine to find the prior layers in order to restore. It would be quite helpful if Docker Hub had tags for all releases instead of just replacing: e.g. tag v1.8 could be the most recent, but it would also have been helpful to have explicit tags v1.8.6, v1.8.4, ... for restoring in case of problems. Also, it would be useful if we could rebuild it ourselves in cases like this, but we weren't sure how to.)

Error starting daemon: error initializing graphdriver: driver not supported

I am unable to deploy a kubeadm-dind-cluster on Ubuntu 16.04.2 or CentOS 7. I get the following error:

Jun 22 19:23:34 kube-master systemd[1]: Starting Docker Application Container Engine...
Jun 22 19:23:34 kube-master rundocker[180]: Trying to load overlay module (this may fail)
Jun 22 19:23:34 kube-master rundocker[180]: time="2017-06-22T19:23:34.197080260Z" level=info msg="libcontainerd: new containerd process, pid: 189"
Jun 22 19:23:35 kube-master rundocker[180]: time="2017-06-22T19:23:35.208001616Z" level=fatal msg="Error starting daemon: error initializing graphdriver: driver not supported"
Jun 22 19:23:35 kube-master systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 22 19:23:35 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Jun 22 19:23:35 kube-master systemd[1]: docker.service: Unit entered failed state.
Jun 22 19:23:35 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.

Here are details of my CentOS 7 setup:

$ docker version
Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:20:01 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:20:01 2017
 OS/Arch:      linux/amd64

$ rpm --query centos-release
centos-release-7-3.1611.el7.centos.x86_64

# docker info|grep Storage
Storage Driver: overlay

I have tried the overlay and overlay2 drivers, but I hit the same error.

Here are the details of my Ubuntu setup:

# cat /etc/os-release 
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

# docker version
Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 22:00:43 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 22:00:43 2016
 OS/Arch:      linux/amd64

# docker info | grep Stor
WARNING: No swap limit support
Storage Driver: aufs

@pmichali is experiencing the same problem.

We are trying to use kubeadm-dind-cluster for k8s IPv6 e2e testing. More details on the issue can be found at kubernetes/kubernetes#47666

hostPort not working

Hello,

I seem to be having an issue getting hostPort working. I am trying to get the off-the-shelf nginx ingress controller working; I am including the YAML for reference. Since this is powered by kubeadm, I assumed I would have to do the hostNetwork: true trick to get hostPort to function correctly.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: default-http-backend
  labels:
    k8s-app: default-http-backend
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: default-http-backend
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: default-http-backend
        # Any image is permissable as long as:
        # 1. It serves a 404 page at /
        # 2. It serves 200 on a /healthz endpoint
        image: gcr.io/google_containers/defaultbackend:1.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
---
apiVersion: v1
kind: Service
metadata:
  name: default-http-backend
  namespace: kube-system
  labels:
    k8s-app: default-http-backend
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    k8s-app: default-http-backend
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  labels:
    k8s-app: nginx-ingress-controller
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: nginx-ingress-controller
    spec:
      # hostNetwork makes it possible to use ipv6 and to preserve the source IP correctly regardless of docker configuration
      # however, it is not a hard dependency of the nginx-ingress-controller itself and it may cause issues if port 10254 already is taken on the host
      # that said, since hostPort is broken on CNI (https://github.com/kubernetes/kubernetes/issues/31307) we have to use hostNetwork where CNI is used
      # like with kubeadm
      hostNetwork: true
      terminationGracePeriodSeconds: 60
      containers:
      - image: gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.3
        name: nginx-ingress-controller
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 1
        ports:
        - containerPort: 80
          hostPort: 80
        - containerPort: 443
          hostPort: 443
        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/default-http-backend

CentOS 7: Docker App Container Engine startup fails - error initializing graphdriver: driver not supported

When trying to start up DinD, during kubeadm init, a failure is seen for Docker Application Container Engine, saying driver not supported.

When checking the master container, docker is not running due to this error. If an attempt is made to manually start docker, the same error is seen.

This works on Ubuntu 16.04. In comparing the two, we see that for Ubuntu the host and the master container use the "overlay2" driver. Both are running Ubuntu. For CentOS 7, the host is using "devicemapper" (overlay2 is not supported), and the master container is using "overlay2". However, the container OS is RHEL 4.8.5-11.

Is it possible that RHEL doesn't support the "overlay2" driver? If that's the case, how can we force a different driver that does work?

Add IPv6 support to DinD

This is to track efforts to add support in DinD for Kubernetes clusters running in IPv6 mode (initially IPv6 only, later dual-stack), connected to an IPv4 network. The goal is to be able to use DinD for E2E testing of IPv6 functionality in Kubernetes.

Note: IPv6 support in Kubernetes is a WIP, and as such, for now, custom Kubernetes repos are used that have patches/changes that are in-flight for kubernetes.

A DinD PR will be provided, based on the fix-1.8+ branch that is currently available (and based on the latest on master branch).

dind::at-least-kubeadm-1-8 function fails

I am using master for kdc and k8s and the dind::at-least-kubeadm-1-8 function does not work properly for me. My deployment completes successfully, but kubeadm init does not reference kubeadm.conf. When I remove the reference to dind::at-least-kubeadm-1-8 in the dind::init function, kubeadm init works properly by using kubeadm.conf. Here are a few details from my deployment.

I have to use sudo -E when calling the gce-setup.sh script, otherwise I get permission errors when running kdc:

$ . /Users/daneyonhansen/code/go/src/github.com/Mirantis/kubeadm-dind-cluster/gce-setup.sh
<SNIP>
chown: /Users/daneyonhansen/code/go/src/k8s.io/kubernetes/_output/images/kube-build:build-f0510dd6c7-5-v1.9.1-1/Dockerfile: Operation not permitted
<SNIP>

When I run with sudo -E, my deployment completes, but does not use kubeadm.conf:

$ sudo -E /Users/daneyonhansen/code/go/src/github.com/Mirantis/kubeadm-dind-cluster/gce-setup.sh
<SNIP>
* Starting DIND container: kube-master
* Running kubeadm: init --pod-network-cidr=10.244.0.0/16 --skip-preflight-checks
<SNIP>

It appears the regex used in dind::at-least-kubeadm-1-8 does not work:

$ docker exec kube-master kubeadm version -o short 2>/dev/null|sed 's/^\(v[0-9]*\.[0-9]*\).*')
-bash: syntax error near unexpected token `)'

This is what I get without the regex:

$ sudo -E docker exec kube-master kubeadm version -o short
v1.9.0-alpha.2.852+cbdd18eee97369
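
For comparison, a sed expression that keeps only the major.minor prefix would look roughly like this; this is a sketch of the intended behavior, not the project's actual function body:

docker exec kube-master kubeadm version -o short 2>/dev/null | sed 's/^\(v[0-9]*\.[0-9]*\).*/\1/'
# e.g. v1.9.0-alpha.2.852+cbdd18eee97369 -> v1.9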

allow version to be specified on command line and use non-custom kubernetes builds

I noticed two things that concerned me when going to use this:

  • the kubernetes versions being used appear to be non-official versions
  • when executing the scripts, they appear to hard-code specific bugfix versions of kubernetes

I would expect the default behavior of the v1.6 script, for example, to be to use the latest bugfix version of v1.6, with the option of overriding it to a different one when starting up.

gce deploy failure

I am trying to deploy kubeadm-dind-cluster to GCE using the GCE setup script. The deployment fails because the docker engine does not start: the systemd drop-in references /usr/bin/docker daemon instead of /usr/bin/dockerd. As soon as I update the drop-in, reload, and restart the daemon, docker starts successfully.

I am running docker-machine version 0.10.0, build 76ed2a6. Here are the details of the issue. Do you have any recommendations for making docker-machine write the proper daemon name into the drop-in?
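
For reference, the manual fix described above amounts to roughly the following; this is a sketch only, and the drop-in path is the usual docker-machine location, assumed here:

sudo sed -i 's|/usr/bin/docker daemon|/usr/bin/dockerd|' /etc/systemd/system/docker.service.d/10-machine.conf
sudo systemctl daemon-reload
sudo systemctl restart docker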

1.8 Release

With the release of k8s 1.8, are you planning to release a 1.8 version of kubeadm-dind-cluster? If so, what is the planned release date?

sha1sum calculation fails on first run (at least on alpine)

On Alpine, including when running in the standard "docker:latest" image, the checksum calculation of kubectl fails on first run. This appears to be because the echo piped into sha1sum -c is missing an extra space:

echo "${kubectl_sha1} ${path}" | sha1sum -c

should be (note the two spaces rather than one)

echo "${kubectl_sha1}  ${path}" | sha1sum -c

I haven't tested running it on different Linux variants; as this might be a busybox / Alpine specific problem, I'm filing an issue rather than a PR.

IPv6 Route Logic Fails

I see the following error in a gce ipv6 deployment:

dind-cluster.sh: line 486: ip: command not found

Here is the related function in the script:

function dind::ensure-nat {
    if [[  ${IP_MODE} = "ipv6" ]]; then
        if ! docker ps | grep tayga >&/dev/null; then
            docker run -d --name tayga --hostname tayga --net kubeadm-dind-net --label mirantis.kubeadm_dind_cluster \
		   --sysctl net.ipv6.conf.all.disable_ipv6=0 --sysctl net.ipv6.conf.all.forwarding=1 \
		   --privileged=true --ip 172.18.0.200 --ip6 ${LOCAL_NAT64_SERVER} --dns ${REMOTE_DNS64_V4SERVER} --dns ${dns_server} \
		   -e TAYGA_CONF_PREFIX=${DNS64_PREFIX_CIDR} -e TAYGA_CONF_IPV4_ADDR=172.18.0.200 \
		   danehans/tayga:latest >/dev/null
	    # TODO Way to add route w/o sudo? Need to check/create, as "clean" may remove route
	    local route="$(ip route | egrep "^172.18.0.128/25")"
	    if [[ -z "${route}" ]]; then
		if [[ "${GCE_HOSTED}" = true ]]; then
		    docker-machine ssh k8s-dind sudo ip route add 172.18.0.128/25 via 172.18.0.200
		else
		    sudo ip route add 172.18.0.128/25 via 172.18.0.200
		fi
	    fi
	fi
    fi
}

I believe the route check should depend on $GCE_HOSTED (see the sketch after the commands below). If $GCE_HOSTED = true, then:

docker-machine ssh k8s-dind sudo ip route | egrep "^172.18.0.128/25"

else:

ip route | egrep "^172.18.0.128/25"
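
Put together, the conditional proposed above would look roughly like this (a sketch; variable names are taken from the script excerpt):

if [[ "${GCE_HOSTED}" = true ]]; then
  route="$(docker-machine ssh k8s-dind sudo ip route | egrep '^172.18.0.128/25' || true)"
else
  route="$(ip route | egrep '^172.18.0.128/25' || true)"
fi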

cc @pmichali

Creating the master node takes a long time.

Run "./dind-cluster-v1.6.sh up", can create the cluster succesful.

NAME          STATUS    AGE       VERSION
kube-master   Ready     9m        v1.6.6
kube-node-1   Ready     6m        v1.6.6
kube-node-2   Ready     6m        v1.6.6
* Access dashboard at: http://localhost:8080/ui

But it took a very long time; checking the output, I found "[apiclient] All control plane components are healthy after 35581.805504 seconds".

Where can I get the full log, so I can analyze what happened during those "35581.805504 seconds"? Thanks.

[init] Using Kubernetes version: v1.6.7
[init] Using Authorization mode: RBAC
[preflight] Skipping pre-flight checks
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [kube-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.192.0.2]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 35581.805504 seconds
[apiclient] Waiting for at least one node to register
[apiclient] First node has registered after 2.537246 seconds
[token] Using token: 7561fb.eea50f2e1a373de4
[apiconfig] Created RBAC rules
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

Consider adding a developer workflow example

Hello! I think people would see the obvious benefit of your approach if you documented what a workflow for using kubeadm-dind-cluster would look like for a kubernetes core developer. As an example:

  1. Someone asks "hey check out my PR!"
  2. you git pull foo
  3. Fire up the cluster with this branch.
  4. You find a bug, fix it.
  5. Restart the cluster to test.

It appears to me that steps 3-5 are very well suited to kubeadm-dind-cluster and probably much faster than using a VM. I think that by giving people an example like this, they can see first hand how it would work.

Failed to start control plane with dind-cluster-v1.8.sh due to unknown flag(s)

kube-apiserver and kube-controller-manager failed to start.

CONTAINER ID        IMAGE                                      COMMAND                  CREATED             STATUS                      PORTS                                          NAMES
45dd162516f4        40c77120c3b5                               "kube-apiserver --ins"   11 seconds ago      Exited (2) 9 seconds ago                                                   k8s_kube-apiserver_kube-apiserver-kube-master_kube-system_e8e89663840a3c709cb6fb6c80d6114a_5
root@kube-master:/etc/kubernetes/manifests# docker logs 45dd162516f4
unknown flag: --insecure-port
Usage of kube-apiserver:
CONTAINER ID        IMAGE                                      COMMAND                  CREATED             STATUS                      PORTS                                          NAMES
b8cc58d08dd5        40c77120c3b5                               "kube-controller-mana"   11 seconds ago      Exited (2) 10 seconds ago                                                  k8s_kube-controller-manager_kube-controller-manager-kube-master_kube-system_504f2ed899e013f03c12d042973e8167_6
root@kube-master:/etc/kubernetes/manifests# docker logs b8cc58d08dd5
unknown flag: --root-ca-file
Usage of kube-controller-manager:

Add Kubenet Support

Kubenet support is needed for k8s e2e. Here is how you configure kubenet networking (see the sketch after this list):

  1. Do not set up a CNI conf file and do not call any CNI plugin. Kubenet wraps the CNI bridge and local-ipam plugins and dynamically creates the CNI conf file based on the controller-manager args below.

  2. Update all kubelet configs to include:

Environment="KUBELET_NETWORK_ARGS=--network-plugin=kubenet --non-masquerade-cidr=$CLUSTER_CIDR"

  3. Update the controller-manager pod manifest to include:
    - --allocate-node-cidrs=true
    - --cluster-cidr=$CLUSTER_CIDR

  4. Kubenet does not provide node-to-node connectivity, so you must create static routes on each node. For example, if $CLUSTER_CIDR = 10.10.0.0/16, $MASTER_IP=192.168.100.10 and $NODE_IP=192.168.100.20:
    On master: sudo route add -net 10.10.1.0 netmask 255.255.255.0 gw 192.168.100.20
    On node: sudo route add -net 10.10.0.0 netmask 255.255.255.0 gw 192.168.100.10
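
A minimal sketch of step 2 as a systemd drop-in; the file path and the concrete CIDR below are assumptions for illustration only:

cat <<'EOF' | sudo tee /etc/systemd/system/kubelet.service.d/20-kubenet.conf
[Service]
Environment="KUBELET_NETWORK_ARGS=--network-plugin=kubenet --non-masquerade-cidr=10.10.0.0/16"
EOF
sudo systemctl daemon-reload && sudo systemctl restart kubelet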

cc: @pmichali

Cannot access service from the local host

I installed kubeadm-dind-cluster successfully on my local host using dind-cluster-v1.8.sh up, created a deployment, and exposed a service. I can access that service inside the master container, but not from the local host. Can anybody help me, please?

Fixes to run on OSX

Hi,

I ran a kubernetes workshop, and some attendees worked out a fix to run kubeadm-dind on OSX:

https://www.meetup.com/Docker-Belgium/events/240356311/comments/480799431/?read=1&_af=event&_af_eid=240356311&itemTypeToken=COMMENT&https=on

"one1zero1one
Done also in OSX 10.11.6, had to do the following: brew install md5shasum, created a /boot folder in OSX and added it to docker preferences file sharing, then did a chmod a+x kubectl-v1.6.1 in the /Uses/user/.kubeadm-dind­-cluster - after that the script runs fine and you get teh cluster."

I don't have OSX myself, but I am gonna try to reproduce it via a vagrant OSX box using KVM and nested virtualization.

I think he used Docker-on-mac.
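
For reference, the comment above boils down to roughly these steps; a sketch only: the Homebrew formula is usually named md5sha1sum (the comment writes md5shasum), and the kubectl path/version are taken from the comment.

brew install md5sha1sum
sudo mkdir -p /boot   # then add /boot to Docker for Mac's file-sharing preferences
chmod a+x ~/.kubeadm-dind-cluster/kubectl-v1.6.1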

Conntrack work-around no longer works

In wrapkubeadm, the script passes --conntrack-max=0 and --conntrack-max-per-core=0 to kube-proxy, in an attempt to tell it to skip trying to update the hashsize when there is a large conntrack max configured on the host, in order to avoid docker issue moby/moby#24000.

Unfortunately, recent changes to kube-proxy have it read the CLI args, and then attempt to read from a config file. If there is a config file (there always is), the CLI args are ignored. As a result, with the reading of the config file, there are no conntrack settings so the default conntrack max-per-core value of 32768 is used. When on a system with 32 CPUs, the resulting conntrack value (1048576) can be more than four times larger than the hashsize (e.g. 262144), which causes kube-proxy to attempt to increase the hashsize and hits the docker issue.

DinD needs to be able to set both conntrack max and max-per-core to zero in the config file, which tells kube-proxy to ignore attempting to modify the hashsize in the condition where there is a large number of conntracks.
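
The fields that would need to end up in kube-proxy's config file look roughly like this (a sketch; the API group/version is the one kube-proxy used around 1.8/1.9, and how DinD would inject it is not shown here):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
conntrack:
  max: 0
  maxPerCore: 0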

run kubeadm-dind-cluster behind a corporate proxy

I'm trying to run your awesome project behind a corporate proxy.

I first ran $ ./dind-cluster-v1.8.sh up to see what I would get, and it failed when trying to pull mirantis/kubeadm-dind-cluster from Docker Hub. I didn't configure the proxy on my Docker daemon, as I'm proxying any call to Docker Hub through an internal Docker registry. So I updated the names of the images to pull in dind-cluster-v1.8.sh with the prefix of my internal Docker registry.

I ran $ ./dind-cluster-v1.8.sh up another time. Pulling mirantis/kubeadm-dind-cluster worked, but then I got another issue:

...
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /lib/systemd/system/kubelet.service.
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
unable to get URL "https://dl.k8s.io/release/stable-1.8.txt": Get https://dl.k8s.io/release/stable-1.8.txt: dial tcp 23.236.58.218:443: i/o timeout

More attempts to reach this URL follow and quite logically fail, so I have to exit the process manually at this stage.

Do you have any idea what I need to configure in order to be able to run the cluster from behind a corporate proxy (or whether this has at least already been tested)?

enable offline running

Assuming you've already run ./dind-cluster-v1.6.sh up successfully once, it would be nice to allow the cluster to still start up offline with no internet connection (airplane, park, etc.).

Turn off wifi on your laptop and run ./dind-cluster-v1.6.sh up

$ ./dind-cluster-v1.6.sh up
* Making sure DIND image is up to date 
Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 192.168.65.1:53: read udp 192.168.65.2:43749->192.168.65.1:53: i/o timeout

support Kubernetes v1.7

I have not yet looked at the extent to which a new script and other changes would be needed to support v1.7, but I'm putting a placeholder here so that it can be tracked, if that's OK.

Exposing an External IP Address to Access an Application in a Cluster

I successfully installed dind-cluster on my laptop using the 1.8 script. I then followed the steps from the k8s tutorial:
https://kubernetes.io/docs/tutorials/stateless-application/expose-external-ip-address/

It is possible to access the service from inside the cluster.
However, the external port never gets assigned; EXTERNAL-IP stays in pending forever:

sava@DellXPS:/mnt/c/k8s/helloworld$ kubectl get services my-service
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
my-service   LoadBalancer   10.104.13.14   <pending>     8080:31339/TCP   21m

Could somebody give me a hint whether / how it would be possible to expose a LoadBalancer service on the host when running kubeadm-dind-cluster? Could https://github.com/Mirantis/k8s-externalipcontroller be used?

Enhancements for gce-setup.sh and error in dind-cluster.sh

To allow DinD to be easily used by developers in different locations, it would be nice to be able to specify the "zone", or have it read in via config settings (like the project is set).

May also want to allow the VM name to be overridden, so that the same script could be used from multiple machines.

In dind_cluster.sh, when E2E command is performed, it ends up running e2e.go and passing -check_version_skew, which appears to have been renamed to -check-version-skew.

If wanted, I could do a pull request with these changes.

Issue creating 1.6 K8s dind cluster

So I have successfully used the dind-cluster-v1.5.sh script to install a 1.5 version of kubernetes in my docker container.

However, when I use the dind-cluster-v1.6.sh script to install version 1.6, it hangs at the following:
" * Waiting for kube-proxy and the nodes "

At this point kubectl commands are not executing from the k8s host or the master node. Is this a known issue?

Can't pull mirantis/kubeadm-dind-cluster:v1.9 on cluster up

When starting a 1.9 cluster using the fixed version script, we fail to pull the image from Docker Hub:

./fixed/dind-cluster-v1.9.sh up
* Making sure DIND image is up to date
Error response from daemon: manifest for mirantis/kubeadm-dind-cluster:v1.9 not found

The v1.9 tag hasn't been pushed upstream yet: https://hub.docker.com/r/mirantis/kubeadm-dind-cluster/tags/

Solution is to tag and publish this image in the mirantis repo.

Temp workarounds for others:

build your own image locally -- use the non-fixed script:

build/build-local.sh
DIND_IMAGE=mirantis/kubeadm-dind-cluster:local ./dind-cluster.sh up

use mine:

DIND_IMAGE="stealthybox/kubeadm-dind-cluster:v1.9" ./dind-cluster.sh up

Fix /boot and /lib/modules handling

As of now, /boot and /lib/modules are not copied from the host, and we don't mount them with -v either, because this is not compatible with Moby Linux-based Docker (e.g. Mac OS X). The proper solution is to copy these directories from the host Linux by means of e.g. a busybox image + nsenter. This is needed for Virtlet to work on kubeadm-dind-cluster.

Support prebuilt k8s DIND images

This will make it easier to use kubeadm-dind-cluster for Virtlet devenv & CI.
Provide prebuilt image for k8s stable (1.5.x).
Try to make an image for k8s 1.4 if this is not too difficult.

devicemapper support.

Hi,

This is an awesome project. I tried it on "CentOS Linux release 7.3.1611 (Core)" with docker 1.12.6, but got the following error info.

It looks like devicemapper, which is the default graphdriver on CentOS, is not supported. Am I right?

[root@localhost kubeadm-dind-cluster]# ./dind-cluster-v1.5.sh up
WARNING: Usage of loopback devices is strongly discouraged for production use. Use --storage-opt dm.thinpooldev to specify a custom block storage device.
WARNING: Usage of loopback devices is strongly discouraged for production use. Use --storage-opt dm.thinpooldev to specify a custom block storage device.

  • Making sure DIND image is up to date
    Trying to pull repository docker.io/mirantis/kubeadm-dind-cluster ...
    v1.5: Pulling from docker.io/mirantis/kubeadm-dind-cluster
    952132ac251a: Already exists
    82659f8f1b76: Already exists
    c19118ca682d: Already exists
    8296858250fe: Already exists
    24e0251a0e2c: Already exists
    2545d638d973: Already exists
    e0b45d7ea196: Already exists
    8d7d40f3e602: Already exists
    216f5a138844: Already exists
    c71de27d6b60: Already exists
    b4905a66b05c: Already exists
    88d9c6d89a0e: Already exists
    5b20a29e0052: Already exists
    096f47601f48: Already exists
    cb5873b128e5: Already exists
    90aa4e16a184: Pull complete
    b9cbd586a93b: Pull complete
    fe48e937b7c1: Pull complete
    1a7ea6f613e5: Pull complete
    0720888e3849: Pull complete
    0e6e0fb90af6: Pull complete
    6352f14208e8: Pull complete
    5e25d1c1f645: Pull complete
    93db4372e6a5: Pull complete
    Digest: sha256:051af9b28a1cb767e91a678d89bbaa36007606b39d1242da68f5a069481d016e
  • Removing container: 87af074de5cb
    87af074de5cb
  • Starting DIND container: kube-master
  • Running kubeadm: init --skip-preflight-checks
    Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
    docker failed to start. Diagnostics below:
    ● docker.service - Docker Application Container Engine
    Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/docker.service.d
    └─10-dind.conf, 20-fs.conf
    Active: failed (Result: exit-code) since Tue 2017-03-28 06:47:20 UTC; 19ms ago
    Docs: https://docs.docker.com
    Process: 205 ExecStart=/usr/local/bin/rundocker $DOCKER_EXTRA_OPTS (code=exited, status=1/FAILURE)
    Main PID: 205 (code=exited, status=1/FAILURE)

Mar 28 06:47:19 kube-master systemd[1]: Starting Docker Application Container Engine...
Mar 28 06:47:19 kube-master rundocker[205]: Trying to load overlay module (this may fail)
Mar 28 06:47:19 kube-master rundocker[205]: time="2017-03-28T06:47:19.483431970Z" level=info msg="libcontainerd: new containerd process, pid: 217"
Mar 28 06:47:20 kube-master rundocker[205]: time="2017-03-28T06:47:20.492311655Z" level=fatal msg="Error starting daemon: error initializing graphdriver: driver not supported"
Mar 28 06:47:20 kube-master systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Mar 28 06:47:20 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Mar 28 06:47:20 kube-master systemd[1]: docker.service: Unit entered failed state.
Mar 28 06:47:20 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.
*** kubeadm failed
[root@localhost kubeadm-dind-cluster]#

Add possibility for the local host to act as repository.

Basically the goal is to replace minikube with this. Minikube is cpu and memory intensive and has only one master node and no workers.

The dind-approach is by far better, just not as mature.

One thing that doesn't work (or I found no way of doing it) is building images locally without pushing them to any repository, and then using them in the dind cluster.

That would be awesome.
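
For what it's worth, one registry-free approach is to stream a locally built image straight into each DinD node's inner docker. A sketch; the image and node names are illustrative:

docker save myapp:dev | docker exec -i kube-node-1 docker load
docker save myapp:dev | docker exec -i kube-node-2 docker load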

script v1.6 fails under Alpine 3.7 with permission denied

Hi,

I am running the script under Alpine 3.7, and I hit the following issue:

root@host# ls -1 dind-cluster-v1.6.sh
dind-cluster-v1.6.sh
root@host# docker run --privileged -v $PWD:/mnt -it alpine:3.7 /bin/sh 
root@container# apk update && apk add curl bash docker
root@container# ./dind-cluster-v1.6.sh up
[...]
No resources found
* Setting cluster config 
./dind-cluster-v1.6.sh: line 615: /root/.kubeadm-dind-cluster/kubectl: Permission denied

If I do a chmod +x on it, it then works fine.

Add Support for kubeadm extra args

kubeadm.conf supports setting extra args for tailoring a kubeadm deployment. See the following example for using extra args to set k8s component IPs:

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
apiServerExtraArgs:
  etcd-servers: "http://${APISERVER_ADVERTISE_ADDRESS}:2379"
controllerManagerExtraArgs:
  address: "${APISERVER_ADVERTISE_ADDRESS}"
schedulerExtraArgs:
  address: "${APISERVER_ADVERTISE_ADDRESS}"
etcd:
  extraArgs:
    listen-client-urls: "http://${APISERVER_ADVERTISE_ADDRESS}:2379"

Pod network is not changed when using Calico CNI

There's a typo in the bash scripts due to which $POD_NETWORK_CIDR is not changed to "192.168.0.0/16". I'm not quite sure why such a CIDR adjustment is needed, but it does not work as expected.

Steps to reproduce:

SKIP_SNAPSHOT=true CNI_PLUGIN=calico-kdd ./dind-cluster-v1.8.sh up

and check Pods IPs

Expected result:
Pods have IPs from the "192.168.0.0/16" network

Actual result:
Pod IPs are from the "10.244.0.0/16" network
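
A hedged sketch of the intended adjustment; the variable and plugin names come from the issue, but where exactly this lives in the scripts is an assumption:

if [[ "${CNI_PLUGIN}" = calico || "${CNI_PLUGIN}" = calico-kdd ]]; then
  POD_NETWORK_CIDR="192.168.0.0/16"
fi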

Unable to access EXTERNAL_IP of K8s service from k8s node

Hello,

Yes, it's that time again. I ran into an issue where I used your v1.5 script to set up a 2-node k8s cluster. I then created a service which uses an external IP, like:

NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
web 10.106.52.236 172.17.4.201 80:32023/TCP 1h

I can hit the service from the master k8s node like so:
curl 10.106.52.236:80 --> success 200
curl 0.0.0.0:32023 --> success 200
curl 127.0.0.1:32023 --> success 200

However, whether I am on the master or the docker host, I cannot hit that EXTERNAL-IP of 172.17.4.201. Is there any reason you can think of why this would be an issue?

Thanks,
aramisjohnson

Routing to a service ip does not work from the kernel

When running DinD, if a kernel module attempts to target an ip address that is a Kubernetes service IP, the route does not get forwarded.

I've tried the different networking options of bridge, calico, flannel, and weave, and they all have the same behavior: the kernel module fails to reach the service IP.

The GCE documentation indicates that they had to set iptables rules and sysctl net.ipv4.ip_forward=1 so the kernel works correctly with bridged containers. It seems like we need something similar for DinD to work with the kernel.
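
A sketch of the host-side setting mentioned above; the GCE docs call out net.ipv4.ip_forward, while the exact iptables rules DinD would need are still an open question in this issue:

sudo sysctl -w net.ipv4.ip_forward=1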

For example, in Rook we have a service defined:

NAME             CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
rook-ceph-mon0   10.101.104.208   <none>        6790/TCP   42m

The service routes to the pod:

NAME                              READY     STATUS    RESTARTS   AGE       IP           NODE
rook-ceph-mon0-kqtgv              1/1       Running   0          42m       10.244.1.5   kube-node-1

The routing works perfectly from other pods / user mode. However, the rbd kernel module is not able to route via the service IP. Any input on this would be appreciated!

The connection to the server localhost:8080 was refused - did you specify the right host or port?

So this worked like a charm: I booted an ubuntu container with DinD enabled on it, then ran ./dind-cluster-v1.5.sh up. The cluster/node came up in a matter of minutes. I can docker exec into the newly created container/node and start issuing kubectl commands like normal.

The only problem is, if I don't docker exec into the container/node, kubectl commands return the following error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

How can I enable it so kubectl commands work on the parent container?
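
For what it's worth, one generic way to point kubectl at the cluster from outside the node is to copy the kubeconfig kubeadm writes on the master; a sketch, assuming the standard admin.conf location, not necessarily how this project wires things up:

docker cp kube-master:/etc/kubernetes/admin.conf ./admin.conf
export KUBECONFIG="$PWD/admin.conf"
kubectl get nodes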

Again, great job with this, and any help you can provide will be much appreciated...

Ty
