hobby-kube / guide Goto Github PK

Kubernetes clusters for the hobbyist.

License: MIT License

kubernetes guide terraform security setup automation cluster digitalocean scaleway hetzner-cloud

guide's Introduction

Kubernetes clusters for the hobbyist

This guide answers the question of how to setup and operate a fully functional, secure Kubernetes cluster on a cloud provider such as Hetzner Cloud, DigitalOcean or Scaleway. It explains how to overcome the lack of external ingress controllers, fully isolated secure private networking and persistent distributed block storage.

Be aware, that the following sections might be opinionated. Kubernetes is an evolving, fast paced environment, which means this guide will probably be outdated at times, depending on the author's spare time and individual contributions. Due to this fact contributions are highly appreciated.

This guide is accompanied by a fully automated cluster setup solution in the shape of well structured, modular Terraform recipes. Links to contextually related modules are spread throughout the guide, visually highlighted using the Terraform icon.

If you find this project helpful, please consider supporting its future development on GitHub Sponsors.

Cluster size
Choosing a cloud provider
Choosing an operating system
Security
Installing Kubernetes
Access and operations
- Role-Based Access Control
- Deploying services
Bringing traffic to the cluster
Distributed block storage
Where to go from here

Cluster size

The professional hobbyist cluster operators aim for resilience—a system's ability to withstand and recover from failure. On the other hand, they usually have a limited amount of funds they can or want to spend on a basic cluster. It's therefore crucial to find a good balance between resilience and cost.

After experimenting with various setups and configurations a good reference point is, that a basic cluster can be operated on as little as two virtual hosts with 1GB memory each. At this point it's worth mentioning that Kubernetes does not include swap memory in its calculations and will evict pods pretty brutally when reaching memory limits (reference). As opposed to memory, raw CPU power doesn't matter that much, although it should be clear that the next Facebook won't be running on two virtual CPU cores.

For a Kubernetes cluster to be resilient it's recommended that it consists of at least three hosts. The main reason behind this is that etcd, which itself is an essential part of any Kubernetes setup, is only fault tolerant with a minimum of three cluster members (reference).

Choosing a cloud provider

provider/hcloud provider/digitalocean provider/scaleway

At this point it's time to choose a cloud provider based on a few criteria such as trustworthiness, reliability, pricing and data center location. The very best offer at this time is definitely from Hetzner Cloud (referral link, get €20), where one gets a suitable three node cluster up and running for around €13.50/month (3x2GB, respectively 3x4GB with arm64 CPUs).

DigitalOcean (referral link, get $100) is known for their great support and having data centers around the globe which is definitely a plus. A three node cluster will cost $18/month (3x1GB).

Scaleway's instances start at around €5. A three node cluster will cost around €15/month (3x1GB, with 20GB disk space each).

Linode, Vultr and a couple of other providers with similar offers are other viable options. While they all have their advantages and disadvantages, they should be perfectly fine for hosting a Kubernetes cluster.

While pricing for virtual private servers has generally increased in the past years, the rise of arm64 based CPUs has opened doors for less expensive options. This guide results in a setup that can be operated on x86, as well as arm64 based systems.

Choosing an operating system

While Linux comes in many flavors, Ubuntu (LTS) is the distribution of choice for hosting our cluster. This may seem opinionated—and it is—but then again, Ubuntu has always been a first class citizen in the Kubernetes ecosystem.

CoreOS would be a great option as well, because of how it embraces the use of containers. On the other hand, not everything we might need in the future is readily available. Some essential packages are likely to be missing at this point, or at least there's no support for running them outside of containers.

That being said, feel free to use any Linux distribution you like. Just be aware that some of the sections in this guide may differ substantially depending on your chosen operating system.

Security

Securing hosts on both public and private interfaces is an absolute necessity.

This is a tough one. Almost every single guide fails to bring the security topic to the table to the extent it deserves. One of the biggest misconceptions is that private networks are secure, but private does not mean secure. In fact, private networks are more often than not shared between many customers in the same data center. This might not be the case with all providers. It's generally good advise to gain absolute certainty, what the actual conditions of a private network are.

Firewall

security/ufw

While there are definitely some people out there able to configure iptables reliably, the average mortal will cringe when glancing at the syntax of the most basic rules. Luckily, there are more approachable solutions out there. One of those is UFW, the uncomplicated firewall—a human friendly command line interface offering simple abstractions for managing complex iptables rules.

Assuming the secure public Kubernetes API runs on port 6443, SSH daemon on 22, plus 80 and 443 for serving web traffic, results in the following basic UFW configuration:

ufw allow ssh # sshd on port 22, be careful to not get locked out!
ufw allow 6443 # remote, secure Kubernetes API access
ufw allow 80
ufw allow 443
ufw default deny incoming # deny traffic on every other port, on any interface
ufw enable

This ruleset will get slightly expanded in the upcoming sections.

Secure private networking

Kubernetes cluster members constantly exchange data with each other. A secure network overlay between hosts is not only the simplest, but also the most secure solution for making sure that a third party occupying the same network as our hosts won't be able to eavesdrop on their private traffic. It's a tedious job to secure every single service, as this task usually requires creating and distributing certificates across hosts, managing secrets in one way or another and, last but not least, configuring services to actually use encrypted means of communication. That's why setting up a network overlay using VPN—which itself is a one-time effort requiring very little know how, and which naturally ensures secure inter-host communication for every possible service running now and in the future—is simply the best solution to address this problem.

When talking about VPN, there are generally two types of solutions:

Traditional VPN services, running in userland, typically providing a tunnel interface
IPsec, which is part of the Kernel and enables authentication and encryption on any existing interface

VPN software running in userland has in general a huge negative impact on network throughput as opposed to IPsec, which is much faster. Unfortunately, it's quite a challenge to understand how the latter works. strongSwan is certainly one of the more approachable solutions, but setting it up for even the most basic needs is still accompanied by a steep learning curve.

Complexity is security's worst contender.

A project called WireGuard supplies the best of both worlds at this point. Running as a Kernel module, it not only offers excellent performance, but is dead simple to set up and provides a tunnel interface out of the box. It may be disputed whether running VPN within the Kernel is a good idea, but then again alternatives running in userland such as tinc or fastd aren't necessarily more secure. However, they are an order of magnitude slower and typically harder to configure.

WireGuard setup

security/wireguard

Let's start off by installing WireGuard. Follow the instructions found here: WireGuard Installation.

apt install wireguard

Once WireGuard has been installed, it's time to create the configuration files. Each host should connect to its peers to create a secure network overlay via a tunnel interface called wg0. Let's assume the setup consists of three hosts and each one will get a new VPN IP address in the 10.0.1.1/24 range:

Host	Private IP address (ethN)	VPN IP address (wg0)
kube1	10.8.23.93	10.0.1.1
kube2	10.8.23.94	10.0.1.2
kube3	10.8.23.95	10.0.1.3

Please note that Hetzner Cloud doesn't provide a private network interface, but it's perfectly fine to run WireGuard on the public interface. Just make sure to use the public IP addresses and the public network interface (eth0).

In this scenario, a configuration file for kube1 would look like this:

# /etc/wireguard/wg0.conf
[Interface]
Address = 10.0.1.1
PrivateKey = <PRIVATE_KEY_KUBE1>
ListenPort = 51820

[Peer]
PublicKey = <PUBLIC_KEY_KUBE2>
AllowedIps = 10.0.1.2/32
Endpoint = 10.8.23.94:51820

[Peer]
PublicKey = <PUBLIC_KEY_KUBE3>
AllowedIps = 10.0.1.3/32
Endpoint = 10.8.23.95:51820

To simplify the creation of private and public keys, the following command can be used to generate and print the necessary key-pairs:

for i in 1 2 3; do
  private_key=$(wg genkey)
  public_key=$(echo $private_key | wg pubkey)
  echo "Host $i private key: $private_key"
  echo "Host $i public key:  $public_key"
done

After creating a file named /etc/wireguard/wg0.conf on each host containing the correct IP addresses and public and private keys, configuration is basically done.

What's left is to add the following firewall rules:

# open VPN port on private network interface (use eth0 on Hetzner Cloud)
ufw allow in on eth1 to any port 51820
# allow all traffic on VPN tunnel interface
ufw allow in on wg0
ufw reload

Before starting WireGuard we need to make sure that ip forwarding and a few other required network settings are enabled:

echo br_netfilter > /etc/modules-load.d/kubernetes.conf
modprobe br_netfilter

echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
echo "net.bridge.bridge-nf-call-iptables=1" >> /etc/sysctl.conf

sysctl -p # apply settings from /etc/sysctl.conf

Executing the command systemctl start wg-quick@wg0 on each host will start the VPN service and, if everything is configured correctly, the hosts should be able to establish connections between each other. Traffic can now be routed securely using the VPN IP addresses (10.0.1.1–10.0.1.3).

In order to check whether the connections are established successfully, wg show comes in handy:

$ wg show
interface: wg0
  public key: 5xKk9...
  private key: (hidden)
  listening port: 51820

peer: HBCwy...
  endpoint: 10.8.23.199:51820
  allowed ips: 10.0.1.1/32
  latest handshake: 25 seconds ago
  transfer: 8.76 GiB received, 25.46 GiB sent

peer: KaRMh...
  endpoint: 10.8.47.93:51820
  allowed ips: 10.0.1.3/32
  latest handshake: 59 seconds ago
  transfer: 41.86 GiB received, 25.09 GiB sent

Last but not least, run systemctl enable wg-quick@wg0 to launch the service whenever the system boots.

Installing Kubernetes

service/kubernetes

There are plenty of ways to set up a Kubernetes cluster from scratch. At this point however, we settle on kubeadm. This dramatically simplifies the setup process by automating the creation of certificates, services and configuration files.

Before getting started with Kubernetes itself, we need to take care of setting up two essential services that are not part of the actual stack, namely containerd and etcd. We've been using Docker in the past, but containerd is now preferred.

Containerd setup

containerd is robust container runtime. Hints regarding supported versions are available in the official container runtimes guide. Let's install a supported containerd version:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmour -o /etc/apt/keyrings/docker.gpg
echo "deb [signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
  > /etc/apt/sources.list.d/docker.list

apt-get install -y containerd.io=1.6.15-1 # Kubernetes 1.26 requires at least containerd v1.6

Kubernetes recommends running containerd with the cgroup driver. This can be done by creating a containerd config file and setting the required configuration flag:

# write default containerd config
containerd config default > /etc/containerd/config.toml
# set systemd cgroup flag to true
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

# enable containerd and restart
systemctl enable containerd
systemctl restart containerd

Etcd setup

service/etcd

etcd is a highly-available key value store, which Kubernetes uses for persistent storage of all of its REST API objects. It is therefore a crucial part of the cluster. kubeadm would normally install etcd on a single node. Depending on the number of hosts available, it would be rather stupid not to run etcd in cluster mode. As mentioned earlier, it makes sense to run at least a three node cluster due to the fact that etcd is fault tolerant only from this size on.

Even though etcd is generally available with most package managers, it's recommended to manually install a more recent version:

export ETCD_VERSION="v3.5.6"
mkdir -p /opt/etcd
curl -L https://storage.googleapis.com/etcd/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz \
  -o /opt/etcd-${ETCD_VERSION}-linux-amd64.tar.gz
tar xzvf /opt/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -C /opt/etcd --strip-components=1

In an insecure environment configuring etcd typically involves creating and distributing certificates across nodes, whereas running it within a secure network makes this process a whole lot easier. There's simply no need to make use of additional security layers as long as the service is bound to an end-to-end secured VPN interface.

This section is not going to explain etcd configuration in depth, refer to the official documentation instead. All that needs to be done is creating a systemd unit file on each host. Assuming a three node cluster, the configuration for kube1 would look like this:

# /etc/systemd/system/etcd.service
[Unit]
Description=etcd
After=network.target [email protected]

[Service]
Type=notify
ExecStart=/opt/etcd/etcd --name kube1 \
  --data-dir /var/lib/etcd \
  --listen-client-urls "http://10.0.1.1:2379,http://localhost:2379" \
  --advertise-client-urls "http://10.0.1.1:2379" \
  --listen-peer-urls "http://10.0.1.1:2380" \
  --initial-cluster "kube1=http://10.0.1.1:2380,kube2=http://10.0.1.2:2380,kube3=http://10.0.1.3:2380" \
  --initial-advertise-peer-urls "http://10.0.1.1:2380" \
  --heartbeat-interval 200 \
  --election-timeout 5000
Restart=always
RestartSec=5
TimeoutStartSec=0
StartLimitInterval=0

[Install]
WantedBy=multi-user.target

It's important to understand that each flag starting with --initial does only apply during the first launch of a cluster. This means for example, that it's possible to add and remove cluster members at any time without ever changing the value of --initial-cluster.

After the files have been placed on each host, it's time to start the etcd cluster:

systemctl enable etcd.service # launch etcd during system boot
systemctl start etcd.service

Executing /opt/etcd/etcdctl member list should show a list of cluster members. If something went wrong check the logs using journalctl -u etcd.service.

Kubernetes setup

Now that containerd is configured and etcd is running, it's time to deploy Kubernetes. The first step is to install the required packages on each host:

# https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management
# (`xenial` is correct even for newer Ubuntu versions)
curl -fsSLo /etc/apt/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" \
  > /etc/apt/sources.list.d/kubernetes.list

apt-get update

apt-get install -y kubelet=1.26.0-00 kubeadm=1.26.0-00 kubectl=1.26.0-00 # kubernetes-cni package comes as dependency of the others
# Pin Kubernetes major version since there are breaking changes between releases.
# For example, Kubernetes 1.26 requires a newer containerd (https://kubernetes.io/blog/2022/11/18/upcoming-changes-in-kubernetes-1-26/#cri-api-removal).
apt-mark hold kubelet kubeadm kubectl kubernetes-cni

Initializing the master node

Before initializing the master node, we need to create a manifest on kube1 which will then be used as configuration in the next step:

# /tmp/master-configuration.yml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.0.1.1
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
certificatesDir: /etc/kubernetes/pki
apiServer:
  certSANs:
  - <PUBLIC_IP_KUBE1>
etcd:
  external:
    endpoints:
      - http://10.0.1.1:2379
      - http://10.0.1.2:2379
      - http://10.0.1.3:2379
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
cgroupDriver: systemd

Then we run the following command on kube1:

kubeadm init --config /tmp/master-configuration.yml --ignore-preflight-errors=Swap,NumCPU

After the setup is complete, kubeadm prints a token such as 818d5a.8b50eb5477ba4f40. It's important to write it down, we'll need it in a minute to join the other cluster nodes.

Kubernetes is built around openness, so it's up to us to choose and install a suitable pod network. This is required as it enables pods running on different nodes to communicate with each other. One of the many options is Cilium. It requires little configuration and is considered stable and well-maintained:

# create symlink for the current user in order to gain access to the API server with kubectl
[ -d $HOME/.kube ] || mkdir -p $HOME/.kube
ln -s /etc/kubernetes/admin.conf $HOME/.kube/config

# install Cilium
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH="$(arch | sed 's/x86_64/amd64/; s/aarch64/arm64/')"
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz*

cilium install --version 1.14.1 --set ipam.mode=cluster-pool --set ipam.operator.clusterPoolIPv4PodCIDRList=10.96.0.0/16

# allow traffic on the newly created Cilium network interface
ufw allow in on cilium_vxlan
ufw reload

Cilium will not readily work with our current cluster configuration because traffic will be routed via the wrong network interface. This can be fixed by running the following command on each host:

ip route add 10.96.0.0/16 dev $VPN_INTERFACE src $VPN_IP

# on kube1:
ip route add 10.96.0.0/16 dev wg0 src 10.0.1.1
# on kube2:
ip route add 10.96.0.0/16 dev wg0 src 10.0.1.2
# on kube3:
ip route add 10.96.0.0/16 dev wg0 src 10.0.1.3

The added route will not survive a reboot as it is not persistent. To ensure that the route gets added after a reboot, we have to add a systemd service unit on each node which will wait for the WireGuard interface to come up and after that adds the route. For kube1 it would look like this:

# /etc/systemd/system/overlay-route.service
[Unit]
Description=Overlay network route for WireGuard
[email protected]

[Service]
Type=oneshot
User=root
ExecStart=/sbin/ip route add 10.96.0.0/16 dev wg0 src 10.0.1.1

[Install]
WantedBy=multi-user.target

After that we have to enable it by running following command:

systemctl enable overlay-route.service

Finally, we can check if everything works:

cilium status --wait

Joining the cluster nodes

All that's left is to join the cluster with the other nodes. Run the following command on each host:

kubeadm join --token="<TOKEN>" 10.0.1.1:6443 \
  --discovery-token-unsafe-skip-ca-verification \
  --ignore-preflight-errors=Swap

That's it, a Kubernetes cluster is ready at our disposal.

Access and operations

service/kubernetes

As soon as the cluster is running, we want to be able to access the Kubernetes API remotely. This can be done by copying /etc/kubernetes/admin.conf from kube1 to your own machine. After installing kubectl locally, execute the following commands:

# create local config folder
mkdir -p ~/.kube
# backup old config if required
[ -f ~/.kube/config ] && cp ~/.kube/config ~/.kube/config.backup
# copy config from master node
scp root@<PUBLIC_IP_KUBE1>:/etc/kubernetes/admin.conf ~/.kube/config
# change config to use correct IP address
kubectl config set-cluster kubernetes --server=https://<PUBLIC_IP_KUBE1>:6443

You're now able to remotely access the Kubernetes API. Running kubectl get nodes should show a list of nodes similar to this:

NAME    STATUS   ROLES                  AGE     VERSION
kube1   Ready    control-plane,master   5h11m   v1.21.1
kube2   Ready    <none>                 5h11m   v1.21.1
kube3   Ready    <none>                 5h11m   v1.21.1

Role-Based Access Control

As of version 1.6, kubeadm configures Kubernetes with RBAC enabled. Because our hobby cluster is typically operated by trusted people, we should enable permissive RBAC permissions to be able to deploy any kind of services using any kind of resources. If you're in doubt whether this is secure enough for your use case, please refer to the official RBAC documentation.

kubectl create clusterrolebinding permissive-binding \
  --clusterrole=cluster-admin \
  --user=admin \
  --user=kubelet \
  --group=system:serviceaccounts

Deploying services

Services can now be deployed remotely by calling kubectl -f apply <FILE>. It's also possible to apply multiple files by pointing to a folder, for example:

$ ls dashboard/
deployment.yml  service.yml

$ kubectl apply -f dashboard/
deployment "kubernetes-dashboard" created
service "kubernetes-dashboard" created

This guide will make no further explanations in this regard. Please refer to the official documentation on kubernetes.io.

Bringing traffic to the cluster

There are downsides to running Kubernetes outside of well integrated platforms such as AWS or GCE. One of those is the lack of external ingress and load balancing solutions. Fortunately, it's fairly easy to get an NGINX powered ingress controller running inside the cluster, which will enable services to register for receiving public traffic.

Ingress controller setup

Because there's no load balancer available with most cloud providers, we have to make sure the NGINX server is always running on the same host, accessible via an IP address that doesn't change. As our master node is pretty much idle at this point, and no ordinary pods will get scheduled on it, we make kube1 our dedicated host for routing public traffic.

We already opened port 80 and 443 during the initial firewall configuration, now all we have to do is to write a couple of manifests to deploy the NGINX ingress controller on kube1:

One part requires special attention. In order to make sure NGINX runs on kube1—which is a tainted control-plane node and no pods will normally be scheduled on it—we need to specify a toleration:

# from ingress/deployment.yml
tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Equal
  effect: NoSchedule

Specifying a toleration doesn't make sure that a pod is getting scheduled on any specific node. For this we need to add a node affinity rule. As we have just a single control-plane node, the following specification is enough to schedule a pod on kube1:

# from ingress/deployment.yml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists

Running kubectl apply -f ingress/ will apply all manifests in this folder. First, a namespace called ingress is created, followed by the NGINX deployment, plus a default backend to serve 404 pages for undefined domains and routes including the necessary service object. There's no need to define a service object for NGINX itself, because we configure it to use the host network (hostNetwork: true), which means that the container is bound to the actual ports on the host, not to some virtual interface within the pod overlay network.

Services are now able to make use of the ingress controller and receive public traffic with a simple manifest:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  ingressClassName: "nginx"
  rules:
  - host: service.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: example-service
          servicePort: http

The NGINX ingress controller is quite flexible and supports a whole bunch of configuration options.

DNS records

dns/cloudflare dns/google dns/aws dns/digitalocean

At this point we could use a domain name and put some DNS entries into place. To serve web traffic it's enough to create an A record pointing to the public IP address of kube1 plus a wildcard entry to be able to use subdomains:

Type	Name	Value
A	example.com	<PUBLIC_IP_KUBE1>
CNAME	*.example.com	example.com

Once the DNS entries are propagated our example service would be accessible at http://service.example.com. If you don't have a domain name at hand, you can always add an entry to your hosts file instead.

Additionally, it might be a good idea to assign a subdomain to each host, e.g. kube1.example.com. It's way more comfortable to ssh into a host using a domain name instead of an IP address.

Obtaining SSL/TLS certificates

Thanks to Let’s Encrypt and a project called cert-manager it's incredibly easy to obtain free certificates for any domain name pointing at our Kubernetes cluster. Setting this service up takes no time and it plays well with the NGINX ingress controller we deployed earlier. These are the related manifests:

Before deploying cert-manager using the manifests above, make sure to replace the email address in ingress/tls/cert-issuer.yml with your own.

To enable certificates for a service, the ingress manifest needs to be slightly extended:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt" # enable certificates
spec:
  ingressClassName: "nginx"
  tls: # specify domains to fetch certificates for
  - hosts:
    - service.example.com
    secretName: example-service-tls
  rules:
  - host: service.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: example-service
          servicePort: http

After applying this manifest, cert-manager will try to obtain a certificate for service.example.com and reload the NGINX configuration to enable TLS. Make sure to check the logs of the cert-manager pod if something goes wrong.

NGINX will automatically redirect clients to HTTPS whenever TLS is enabled. In case you still want to serve traffic on HTTP, add nginx.ingress.kubernetes.io/ssl-redirect: "false" to the list of annotations.

Deploying the Kubernetes Dashboard

Now that everything is in place, we are able to expose services on specific domains and automatically obtain certificates for them. Let's try this out by deploying the Kubernetes Dashboard with the following manifests:

Optionally, the following manifests can be used to get resource utilization graphs within the dashboard using metrics-server:

What's new here is that we enable basic authentication to restrict access to the dashboard. The following annotations are supported by the NGINX ingress controller, and may or may not work with other solutions:

# from dashboard/ingress.yml
annotations:
  # ...
  nginx.ingress.kubernetes.io/auth-type: basic
  nginx.ingress.kubernetes.io/auth-secret: kubernetes-dashboard-auth
  nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"

# dashboard/secret.yml
apiVersion: v1
kind: Secret
metadata:
  name: kubernetes-dashboard-auth
  namespace: kube-system
data:
  auth: YWRtaW46JGFwcjEkV3hBNGpmQmkkTHYubS9PdzV5Y1RFMXMxMWNMYmJpLw==
type: Opaque

This example will prompt a visitor to enter their credentials (user: admin / password: test) when accessing the dashboard. Secrets for basic authentication can be created using htpasswd, and need to be added to the manifest as a base64 encoded string.

Distributed block storage

Data in containers is ephemeral, as soon as a pod gets stopped, crashes, or is for some reason rescheduled to run on another node, all its data is gone. While this is fine for static applications such as the Kubernetes Dashboard (which obtains its data from persistent sources running outside of the container), persisting data becomes a non-optional requirement as soon as we deploy databases on our cluster.

Kubernetes supports various types of volumes that can be attached to pods. Only a few of these match our requirements. There's the hostPath type which simply maps a path on the host to a path inside the container, but this won't work because we don't know on which node a pod will be scheduled.

Persistent volumes

There's a concept within Kubernetes of how to separate storage management from cluster management. To provide a layer of abstraction around storage there are two types of resources deeply integrated into Kubernetes, PersistentVolume and PersistentVolumeClaim. When running on well integrated platforms such as GCE, AWS or Azure, it's really easy to attach a persistent volume to a pod by creating a persistent volume claim. Unfortunately, we don't have access to such solutions.

Our cluster consists of multiple nodes and we need the ability to attach persistent volumes to any pod running on any node. There are a couple of projects and companies emerging around the idea of providing hyper-converged storage solutions. Some of their services are running as pods within Kubernetes itself, which is certainly the perfect way of managing storage on a small cluster such as ours.

Choosing a solution

Currently there are a couple of interesting solutions matching our criteria, but they all have their downsides:

Rook.io is an open source project based on Ceph. Even though it's in an early stage, it offers good documentation and is quite flexible.
gluster-kubernetes is an open source project built around GlusterFS and Heketi. Setup seems tedious at this point, requiring some kind of schema to be provided in JSON format.
Portworx is a commercial project that offers a free variant of their proprietary software, providing great documentation and tooling.

Rook and Portworx both shine with a simple setup and transparent operations. Rook is our preferred choice because it offers a little more flexibility and is open source in contrast to Portworx, even though the latter wins in simplicity by launching just a single pod per instance.

Deploying Rook

As we run only a three node cluster, we're going to deploy Rook on all three of them by adding a control-plane toleration to the Rook cluster definition.

Before deploying Rook we need to either provide a raw, unformatted block device or specify a directory that will be used for storage on each host. On a typical Ubuntu installation, the volume on which the operating system is installed is called /dev/vda. Attaching another volume will be available as /dev/vdb.

Make sure to edit the cluster manifest as shown below and choose the right configuration depending on whether you want to use a directory or a block device available in your environment for storage:

# storage/cluster.yml
  # ...
  storage:
    useAllNodes: true
    useAllDevices: false
    storeConfig:
      databaseSizeMB: 1024
      journalSizeMB: 1024
    # Uncomment the following line and replace it with the name of block device used for storage:
    #deviceFilter: vdb
    # Uncomment the following lines when using a directory for storage:
    #directories:
    #- path: /storage/data

As mentioned earlier, Rook is using Ceph under the hood. Run apt-get install ceph-common on each host to install the Ceph common utilities. Afterwards, apply the storage manifests in the following order:

storage/00-namespace.yml
storage/operator.yml
storage/cluster.yml (wait for the rook-ceph-mon pods to be deployed kubectl -n rook get pods before continuing)
storage/storageclass.yml
storage/tools.yml

It's worth mentioning that the storageclass manifest contains the configuration for the replication factor:

# storage/storageclass.yml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook
spec:
  failureDomain: host
  replicated:
    size: 2 # replication factor
---
# ...

In order to operate on the storage cluster simply run commands within the Rook tools pod, such as:

# show ceph status
kubectl -n rook exec -it rook-tools -- ceph status

# show volumes
kubectl -n rook exec -it rook-tools -- rbd list replicapool

# show volume information
kubectl -n rook exec -it rook-tools -- rbd info replicapool/<volume>

Further commands are listed in the Rook Tools documentation.

Consuming storage

The storage class we created can be consumed with a persistent volume claim:

# minio/pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-persistent-storage
spec:
  storageClassName: rook-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

This will create a volume called minio-persistent-storage with 5GB storage capacity.

In this example we're deploying Minio, an Amazon S3 compatible object storage server, to create and mount a persistent volume:

minio/deployment.yml
minio/ingress.yml
minio/secret.yml (MINIO_ACCESS_KEY: admin / MINIO_SECRET_KEY: admin.minio.secret.key)
minio/service.yml
minio/pvc.yml

The volume related configuration is buried in the deployment manifest:

# from minio/deployment.yml
containers:
- name: minio
  volumeMounts:
  - name: data
    mountPath: /data
# ...
volumes:
- name: data
  persistentVolumeClaim:
    claimName: minio-persistent-storage

The minio-persistent-storage volume will live as long as the persistent volume claim is not deleted (e.g. kubectl delete -f minio/pvc.yml ). The Minio pod itself can be deleted, updated or rescheduled without data loss.

Where to go from here

This was hopefully just the beginning of your journey. There are many more things to explore around Kubernetes. Feel free to leave feedback or raise questions at any time by opening an issue here.

guide's People

Contributors

Stargazers

Watchers

Forkers

moondev vishnudxb dockernuts rahulkrishnanfs ivanovb maleck13 tuananh robertpd bcoryat jamoy barkinet jmprusi kevinwojo muharremokutan cartertsai hambster nixtish snambi 7agustibm etopian sysbot sandramadhu tforrest joemocha sampwing gernest kod3r dantodor spillai mario21ic llbuaa2015 malisetti johnsonc cloud-robotics junooberlin rgaidot folpindo parthmonga omoalpha wlmgithub sync-mp4 guoyu07 ghosthamlet defstack dyhpoon mikalv ellerbrock montana hhy5277 cibuildorg geekul 40a arenstar pbgodwin githubch gerred ismailmechbal gourao michaelbernstein rubythonode prafulrana zengxianhui mhaleywgtn ip-2014 felixhummel johnko ninefive schoolofdevops superliuwr segmond vasekpurchart stakater unkaktus parallaxisjones rodrigolins manav-rai proshik grynienp vaidyanath julientant codixis vagmi faaaaabi claudiouzelac traustitj bluescurry mihaibogdaneugen zeroows harsh-px cyberj skymoon08 sarjeet2013 occultism dshamanthreddy sindile kmriyad danibram rizplate josephbolus grahamkennery

guide's Issues

had some trouble with newer versions of terraform

Once I found the proper version (0.9.6, not higher) I finally got it to build the entire cluster.

I continued and used all the manifests too. Everything seems to be covered, I now have a very nice running k8s cluster. Great guide!

upgrading hobby kube cluster

Hey, Wondering what the options are for upgrading a hobby kube cluster? Any thoughts on this, is there a way to set the version of kubernetes? If this was an option then tearing it down and back up would be an option.

Consider securing etcd

Compromised containers could access and leak important data stored in etcd.

Related comment on Hacker News: https://news.ycombinator.com/item?id=14291817

Update DigitalOcean Pricing

DigitalOcean has updated their pricing for 2018: https://blog.digitalocean.com/new-droplet-plans/

1 vCPU/1GB VMs are now only $5, so running a 3-node cluster is only $15/mo instead of $30/mo.

DigitalOcean has also enhanced their security for private networks: https://www.digitalocean.com/community/tutorials/digitalocean-private-networking-faq

I can gladly submit a PR with an updated README.md to reflect the pricing changes.

Disclaimer: Not affiliated with DO in any way (until now I've been using Vultr).

heapster UI url

Hi, someone know the url UI for heapster?
after run:
kubectl apply -f dashboard/heapster/
nothing are displayed on:
https://dashboard.example.com/#!/workload?
thanks in advance,

Hetzner Cloud provider

Hetzner Cloud just launched with great offers starting from €2.49 and has the potential to become the cheapest option for running a small scale Kubernetes cluster. An official Terraform provider has been released[1].

So far everything runs well, including automated provisioning using Terraform[2].

[1] https://github.com/hetznercloud/terraform-provider-hcloud
[2] hobby-kube/provisioning#14

Issues with cluster on Hetzner cloud - Pods stuck in "creating container"

Hi together

First: Thank you for this nice hobby-kube!!! :-)

I build one on 3 hetzner cloud vm's today. (two times, first on ubuntu 18.04 then on ubuntu 16.04)
I used this guide https://github.com/hobby-kube/guide to build it manually.

If I deploy something it hangs on "ContainerCreating" and after some time I can see the error message:
Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/76e1d121d2aedd44c3652fd285428241770a7ae2c46dc26bab853c05a025c84b: dial tcp 127.0.0.1:6784: connect: connection refused

Maybe I did something wrong or is there something missing in the guide?

Any help is much appreciated.

Some output... hope it helps:

root@kube01 ~/deployments # kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
kube01    Ready     master    33m       v1.10.2
kube02    Ready     <none>    29m       v1.10.2
kube03    Ready     <none>    29m       v1.10.2

root@kube01 ~/deployments # kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS              RESTARTS   AGE
kube-system   kube-apiserver-kube01                   1/1       Running             0          32m
kube-system   kube-controller-manager-kube01          1/1       Running             0          33m
kube-system   kube-dns-86f4d74b45-r9sbg               3/3       Running             0          33m
kube-system   kube-proxy-4cg67                        1/1       Running             0          30m
kube-system   kube-proxy-m7nmc                        1/1       Running             0          33m
kube-system   kube-proxy-xc729                        1/1       Running             0          30m
kube-system   kube-scheduler-kube01                   1/1       Running             0          33m
kube-system   kubernetes-dashboard-7f87cb5646-6qfp7   0/1       ContainerCreating   0          26m
kube-system   weave-net-kkbvj                         2/2       Running             0          6m
kube-system   weave-net-p2q5s                         2/2       Running             0          6m
kube-system   weave-net-sw7tz                         2/2       Running             0          6m

root@kube01 ~/deployments # kubectl describe pod -n kube-system kubernetes-dashboard-7f87cb5646-6qfp7
Name:           kubernetes-dashboard-7f87cb5646-6qfp7
Namespace:      kube-system
Node:           kube03/88.198.93.160
Start Time:     Tue, 08 May 2018 22:17:29 +0200
Labels:         app=kubernetes-dashboard
                pod-template-hash=3943761202
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicaSet/kubernetes-dashboard-7f87cb5646
Containers:
  kubernetes-dashboard:
    Container ID:
    Image:          gcr.io/google_containers/kubernetes-dashboard-amd64:v1.8.3
    Image ID:
    Port:           9090/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lvlj7 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-lvlj7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lvlj7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   SuccessfulMountVolume   27m                kubelet, kube03    MountVolume.SetUp succeeded for volume "default-token-lvlj7"
  Normal   Scheduled               27m                default-scheduler  Successfully assigned kubernetes-dashboard-7f87cb5646-6qfp7 to kube03
  Warning  FailedCreatePodSandBox  19m (x2 over 23m)  kubelet, kube03    Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/889d732df3746c0516c9d8616b5b96046911b0ee6593ec21db5f3121f3a26046: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/571392aa28b5a762e58043bb2b6e3e3683c9a5336a76112c2cd75eeab1ef7564: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/9544ecb695bdd48ce0b4f580d44514c38ab1209f42c64fbb19266d40a49f7579: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/183eba1061d3a3547c23dec81814650c419a04a4b1d52a2dab8c0e27c823eb1e: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/f12e5320b522996ed4937ad3ec64e255fe53125a61941e28541110bdb070bf68: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/208a57ca1209b449b577d7244078a27f8235f776fdc4e42cb770cc3dfb93f427: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/c87965c136f9c033e716234c6f658e0350cbfaa24f3541399eec300e800ad062: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/76e1d121d2aedd44c3652fd285428241770a7ae2c46dc26bab853c05a025c84b: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  11m (x4 over 15m)  kubelet, kube03    (combined from similar events): Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   SandboxChanged          7m (x28 over 23m)  kubelet, kube03    Pod sandbox changed, it will be killed and re-created.

[ERROR Swap]: running with swap on is not supported. Please disable swap

The terraform apply command got stuck at a point when it was trying to initialize all the hosts in the cluster. It kept on saying "Waiting for API server to respond".

I am using scaleway as a provider with VC1M instance type from image Ubuntu Xenial (16.04 latest)

See logs

module.kubernetes.null_resource.kubernetes.0: Still creating... (20s elapsed)
module.kubernetes.null_resource.kubernetes.2: Still creating... (20s elapsed)
module.kubernetes.null_resource.kubernetes.1: Still creating... (20s elapsed)
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 47%
module.kubernetes.null_resource.kubernetes[0] (remote-exec): 	[WARNING FileExisting-crictl]: crictl not found in system path
module.kubernetes.null_resource.kubernetes[0] (remote-exec): [preflight] Some fatal errors occurred:
module.kubernetes.null_resource.kubernetes[0] (remote-exec): 	[ERROR Swap]: running with swap on is not supported. Please disable swap
module.kubernetes.null_resource.kubernetes[0] (remote-exec): [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 54%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 54%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 75%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 78%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 78%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 82%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 82%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 86%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 86%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 90%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 90%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 92%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 92%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 94%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 94%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 97%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 97%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 98%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 98%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 99%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 99%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 99%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 99%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 99%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 99%
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 99%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 99%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... Done
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 0%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 100%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... Done
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Building dependency tree... 0%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Building dependency tree... 0%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Building dependency tree... 50%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Building dependency tree... 50%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Building dependency tree
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading state information... 0%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading state information... 0%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading state information... Done
module.kubernetes.null_resource.kubernetes[1] (remote-exec): docker.io is already the newest version (1.13.1-0ubuntu1~16.04.2).
module.kubernetes.null_resource.kubernetes[1] (remote-exec): 0 upgraded, 0 newly installed, 0 to remove and 107 not upgraded.
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 0%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... 100%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading package lists... Done
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Building dependency tree... 0%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Building dependency tree... 0%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Building dependency tree... 50%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Building dependency tree... 50%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Building dependency tree
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading state information... 0%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading state information... 0%
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Reading state information... Done
module.kubernetes.null_resource.kubernetes[1] (remote-exec): kubeadm is already the newest version (1.9.0-00).
module.kubernetes.null_resource.kubernetes[1] (remote-exec): kubectl is already the newest version (1.9.0-00).
module.kubernetes.null_resource.kubernetes[1] (remote-exec): kubelet is already the newest version (1.9.0-00).
module.kubernetes.null_resource.kubernetes[1] (remote-exec): kubernetes-cni is already the newest version (0.6.0-00).
module.kubernetes.null_resource.kubernetes[1] (remote-exec): 0 upgraded, 0 newly installed, 0 to remove and 107 not upgraded.
module.kubernetes.null_resource.kubernetes[1]: Provisioning with 'remote-exec'...
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Connecting to remote host via SSH...
module.kubernetes.null_resource.kubernetes[1] (remote-exec):   Host: 51.15.90.200
module.kubernetes.null_resource.kubernetes[1] (remote-exec):   User: root
module.kubernetes.null_resource.kubernetes[1] (remote-exec):   Password: false
module.kubernetes.null_resource.kubernetes[1] (remote-exec):   Private key: false
module.kubernetes.null_resource.kubernetes[1] (remote-exec):   SSH Agent: true
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Connected!
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.2: Still creating... (30s elapsed)
module.kubernetes.null_resource.kubernetes.1: Still creating... (30s elapsed)
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.1: Still creating... (40s elapsed)
module.kubernetes.null_resource.kubernetes.2: Still creating... (40s elapsed)
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.1: Still creating... (50s elapsed)
module.kubernetes.null_resource.kubernetes.2: Still creating... (50s elapsed)
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.2: Still creating... (1m0s elapsed)
module.kubernetes.null_resource.kubernetes.1: Still creating... (1m0s elapsed)
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.2: Still creating... (1m10s elapsed)
module.kubernetes.null_resource.kubernetes.1: Still creating... (1m10s elapsed)
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.2: Still creating... (1m20s elapsed)
module.kubernetes.null_resource.kubernetes.1: Still creating... (1m20s elapsed)
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.1: Still creating... (1m30s elapsed)
module.kubernetes.null_resource.kubernetes.2: Still creating... (1m30s elapsed)
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.1: Still creating... (1m40s elapsed)
module.kubernetes.null_resource.kubernetes.2: Still creating... (1m40s elapsed)
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond

For further investigation, i logged on to the kube1 and ran the following command which is mentioned in the guides to start kubeadm. It gave me swap error.

root@kube1:/tmp# cat master-configuration.yml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  advertiseAddress: 10.0.1.1
etcd:
  endpoints:
  - http://10.0.1.1:2379
  - http://10.0.1.2:2379
  - http://10.0.1.3:2379
apiServerCertSANs:
  - 51.15.59.16
root@kube1:/tmp# kubeadm init --config /tmp/master-configuration.yml
[init] Using Kubernetes version: v1.9.0
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
	[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Some fatal errors occurred:
	[ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

Userspace proxy?

Hi, I work on Weave Net, and I wonder if you could give a few pointers how you came to this conclusion:

To cut a long story short kube-proxy needs to be patched to run in userspace mode.

I must have built a thousand Kubernetes clusters, and don't recall ever switching kube-proxy into userspace mode.

Waiting for API server to respond (Digital Ocean)

I could have sworn that I was able to provision a cluster a few weeks ago using these terraform modules but I'm now receiving the following error, and it just continues to print out Waiting for API server to respond...

module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[0] (remote-exec): [init] using Kubernetes version: v1.11.2
module.kubernetes.null_resource.kubernetes[0] (remote-exec): [preflight] running pre-flight checks
module.kubernetes.null_resource.kubernetes[0] (remote-exec): 	[WARNING Swap]: running with swap on is not supported. Please disable swap
module.kubernetes.null_resource.kubernetes[0] (remote-exec): I0814 00:10:05.192904   14243 kernel_validator.go:81] Validating kernel version
module.kubernetes.null_resource.kubernetes[0] (remote-exec): I0814 00:10:05.193231   14243 kernel_validator.go:96] Validating kernel config
module.kubernetes.null_resource.kubernetes.2: Still creating... (50s elapsed)
module.kubernetes.null_resource.kubernetes.1: Still creating... (50s elapsed)
module.kubernetes.null_resource.kubernetes.0: Still creating... (50s elapsed)
module.kubernetes.null_resource.kubernetes[0] (remote-exec): [preflight] Some fatal errors occurred:
module.kubernetes.null_resource.kubernetes[0] (remote-exec): 	[ERROR ExternalEtcdVersion]: this version of kubeadm only supports external etcd version >= 3.2.17. Current version: 3.2.13
module.kubernetes.null_resource.kubernetes[0] (remote-exec): 	[ERROR ExternalEtcdVersion]: this version of kubeadm only supports external etcd version >= 3.2.17. Current version: 3.2.13
module.kubernetes.null_resource.kubernetes[0] (remote-exec): 	[ERROR ExternalEtcdVersion]: this version of kubeadm only supports external etcd version >= 3.2.17. Current version: 3.2.13
module.kubernetes.null_resource.kubernetes[0] (remote-exec): [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.2: Still creating... (1m0s elapsed)
module.kubernetes.null_resource.kubernetes.1: Still creating... (1m0s elapsed)
module.kubernetes.null_resource.kubernetes.0: Still creating... (1m0s elapsed)

Add note on integrating with external DNS provider

Hi!

Just found this project, fantastic work! I'm interested in trying it out, but I already have a DNS provider, and I was a little confused as to exactly how I would integrate my existing provider with this setup. Could you add a little note explaining what records to set? I see the example records, but the addition of the terraform files for google/aws/cloudflare makes me wonder if there's anything more I need to add.

Thanks!

Have you considered network namespaces?

Hello,

While researching wireguard and docker I came across the documentation page about wireguard routing and linux network namespaces. I think it can improve security since all docker containers can only access other hosts inside wireguard. This is also true for other applications started inside the namespace.

https://www.wireguard.com/netns/
https://medium.com/@ApsOps/an-illustrated-guide-to-kubernetes-networking-part-1-d1ede3322727
https://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/
https://medium.com/google-cloud/understanding-kubernetes-networking-pods-7117dd28727

Include OVH?

OVH is one of the best vale/price VPS providers. Their cheapest offering - VPS SSD 1 is $3.49 for 1 core and 2 GB RAM. It has a data center in Beauharnois, Canada which should provide much better latency than Scaleway.

Changing hosts count makes subsequent terraform apply to fail because of etcd

Wonderful guide, thank you!

Quick question - if I have a running cluster of e.g. 3 nodes and change that to 4, then at the etcd step the terraform apply fails because etcd requires a somewhat stateful process to add a new node - ssh to existing node, add new one and on the new one start etcd with the env variables from previous step, more details on etcd cluster reconfiguration. What would be a good way to handle this? Alternatively I'd like to limit etcd to 3 nodes, while there could be many more k8s nodes.

HA on Distributed Block Storage

As per Distributed Block storage section, I understand that the devices on the nodes will be used to create PVs and if a pod goes down, it will again attach to the previous PV it was attached to. One thing i am quite not sure about is if a node goes down (along with its attached device), will its PV be still visible/available because the repl count is set to 2? Which means solutions like portworxcan separate out PVs from devices and PVCs will still continue to work even if a single disk fails?

Ingress with rewrite-target annotation (or any)

Hello. I've followed this guide for the most part, including the ingress setup, and the issue I've now encountered is that the nginx controller does not seem to be responding to any annotations I add to my Ingress.

I'm checking with kubectl exec -it -n ingress nginx-ingress-controller-8cd96ddb8-sqfvl cat /etc/nginx/nginx.conf what generated the nginx.conf contains, and have confirmed with diff that adding or removing annotations have no impact on the nginx.conf. Changing the path property in yaml does however change the respective location <path> { declaration in the config so the changes are being applied, just not the annotations.

This my Ingress currently, where the intent is to have the path /fillaripollari send requests to my API as if they were coming from the root (hence the need for rewrite).

Aside from rewrite not getting applied, the ingress is working perfectly from what I can tell.

Any ideas how to

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: fillaripolleri
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/rewrite-target: "/"
spec:
  rules:
  - host: mydomain.com
    http:
      paths:
      - path: /fillaripolleri
        backend:
          serviceName: fillaripolleri-server
          servicePort: 3000

Name:             fillaripolleri
Namespace:        default
Address:
Default backend:  default-http-backend:80 (<none>)
Rules:
  Host        Path  Backends
  ----        ----  --------
  mydomain.com
              /fillaripolleri   fillaripolleri-server:3000 (<none>)
Annotations:
  kubectl.kubernetes.io/last-applied-configuration:  {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"kubernetes.io/ingress.class":"nginx","nginx.ingress.kubernetes.io/rewrite-target":"/"},"name":"fillaripolleri","namespace":"default"},"spec":{"rules":[{"host":"mydomain.com","http":{"paths":[{"backend":{"serviceName":"fillaripolleri-server","servicePort":3000},"path":"/fillaripolleri"}]}}]}}

  kubernetes.io/ingress.class:                 nginx
  nginx.ingress.kubernetes.io/rewrite-target:  /
Events:
  Type    Reason  Age   From                Message
  ----    ------  ----  ----                -------
  Normal  CREATE  24s   ingress-controller  Ingress default/fillaripolleri

Hyperlinks in Table of Contents dont work

Ingress nginx-ingress-controller readiness probe failed

Disclaimer: New to kubernetes and terraform.

I am working through your guide and have successfully deployed kubernetes to Vultr by way of the provision repository.

I am currently running into an issue however with the ingress/nginx-ingress-controller. According to the event log it is failing due to the readiness checks (http://<ip>:10254/healthz).

Any assistance here would be greatly appreciated.

  Normal   Scheduled  13m                 default-scheduler     Successfully assigned ingress/nginx-ingress-controller-68c4654b64-mmn45 to vultr.guest
  Warning  Unhealthy  12m (x3 over 12m)   kubelet, vultr.guest  Readiness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  12m (x3 over 12m)   kubelet, vultr.guest  Readiness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Normal   Pulling    12m (x2 over 13m)   kubelet, vultr.guest  pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
  Warning  Unhealthy  12m (x3 over 12m)   kubelet, vultr.guest  Liveness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    12m                 kubelet, vultr.guest  Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled     12m (x2 over 12m)   kubelet, vultr.guest  Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
  Normal   Started    12m (x2 over 12m)   kubelet, vultr.guest  Started container
  Normal   Created    12m (x2 over 12m)   kubelet, vultr.guest  Created container
  Normal   Pulled     12m (x2 over 12m)   kubelet, vultr.guest  Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
  Normal   Started    12m (x2 over 12m)   kubelet, vultr.guest  Started container
  Normal   Created    12m (x2 over 12m)   kubelet, vultr.guest  Created container
  Warning  Unhealthy  12m (x3 over 12m)   kubelet, vultr.guest  Liveness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Normal   Pulling    12m (x2 over 13m)   kubelet, vultr.guest  pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
  Normal   Killing    12m                 kubelet, vultr.guest  Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled     12m (x2 over 12m)   kubelet, vultr.guest  Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
  Normal   Created    12m (x2 over 12m)   kubelet, vultr.guest  Created container
  Normal   Started    12m (x2 over 12m)   kubelet, vultr.guest  Started container
  Warning  Unhealthy  11m (x6 over 12m)   kubelet, vultr.guest  Liveness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Normal   Pulling    11m (x3 over 13m)   kubelet, vultr.guest  pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0"
  Normal   Killing    11m (x2 over 12m)   kubelet, vultr.guest  Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  11m (x8 over 12m)   kubelet, vultr.guest  Readiness probe failed: Get http://<ip>:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    3m2s (x26 over 9m)  kubelet, vultr.guest  Back-off restarting failed container

$ kubectl logs --follow -n ingress deployment/nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.21.0
  Build:      git-b65b85cd9
  Repository: https://github.com/aledbf/ingress-nginx
-------------------------------------------------------------------------------

nginx version: nginx/1.15.6
W0114 02:45:19.180427       7 client_config.go:548] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0114 02:45:19.180730       7 main.go:196] Creating API client for https://10.96.0.1:443
I0114 02:45:19.201933       7 main.go:240] Running in Kubernetes cluster version v1.13 (v1.13.2) - git (clean) commit cff46ab41ff0bb44d8584413b598ad8360ec1def - platform linux/amd64
I0114 02:45:19.204756       7 main.go:101] Validated ingress/default-http-backend as the default backend.
I0114 02:45:19.383896       7 nginx.go:258] Starting NGINX Ingress controller
I0114 02:45:19.394616       7 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress", Name:"nginx-ingress-controller", UID:"4bd3da96-17a6-11e9-9a3f-560001d7c2e6", APIVersion:"v1", ResourceVersion:"10661", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress/nginx-ingress-controller
I0114 02:45:20.588427       7 nginx.go:279] Starting NGINX process
I0114 02:45:20.590132       7 leaderelection.go:187] attempting to acquire leader lease  ingress/ingress-controller-leader-nginx...
W0114 02:45:20.591452       7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
I0114 02:45:20.591559       7 controller.go:172] Configuration changes detected, backend reload required.
I0114 02:45:20.615190       7 leaderelection.go:196] successfully acquired lease ingress/ingress-controller-leader-nginx
I0114 02:45:20.615996       7 status.go:148] new leader elected: nginx-ingress-controller-68c4654b64-mmn45
I0114 02:45:20.776051       7 controller.go:190] Backend successfully reloaded.
I0114 02:45:20.776242       7 controller.go:202] Initial sync, sleeping for 1 second.
[14/Jan/2019:02:45:21 +0000]TCP200000.001
W0114 02:45:24.350352       7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
W0114 02:45:27.705918       7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
141.105.109.219 - [141.105.109.219] - - [14/Jan/2019:02:45:29 +0000] "GET / HTTP/1.1" 404 153 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7" 423 0.000 [-] - - - - 22b6d6cdd9c481da64779f3f0cd8b5e9
W0114 02:45:31.017150       7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint
W0114 02:45:34.350531       7 controller.go:373] Service "ingress/default-http-backend" does not have any active Endpoint

Warning: `/etc/wireguard/wg0.conf' is world accessible

$ umask u=rwx,go= && cat > /etc/wireguard/wg0.conf << _EOF
[Interface]
Address = 10.0.1.1
PrivateKey = <PRIVATE_KEY_KUBE1>
ListenPort = 51820

[Peer]
PublicKey = <PUBLIC_KEY_KUBE2>
AllowedIps = 10.0.1.2/32
Endpoint = 10.8.23.94:51820

[Peer]
PublicKey = <PUBLIC_KEY_KUBE3>
AllowedIps = 10.0.1.3/32
Endpoint = 10.8.23.95:51820
_EOF

$ sudo chmod u=rwx,go= /etc/wireguard/wg0.conf
$ sudo systemctl restart wg-quick@wg0

kube-lego is in maintenance mode only

According to https://github.com/jetstack/kube-lego, the project is in maintenance mode.

Someone pointed it out here: https://news.ycombinator.com/item?id=17054466, just wanted to make sure you know about it because it's easy to miss comment replies on HN :)

Digital Ocean Load Balancer

Returns 503 when load balanced across the node IPs.

The nginx-ingress controller is already working as it's showing

default backend - 404

when accessed directly, however when using DO's LB, it returns 503.

I'm not sure why.

Scaleway doesn't allow you to add more SSDs

I was about to set up persistent volumes for my cluster, but it looks like scaleway doesn't allow you to add more than 50GB storage to a single instance. If you try to start your instance with more than 50GB storage attached, you will get an error:

The total volume size of VC1S instances must be equal or below 50GB

I think it's worth mentioning in the README that PVs are not a possibility on scaleway (haven't tried digital ocean yet).

Dashboard Password

Hi,

Thanks for your great guide.
I managed to run everything. however when I visit the dashboard's URL, it requires a username/password. How do I get those?

Thanks

PostUp/PreDown in wg0.conf instead systemd service for setting routes

It's possible to configure PostUp/PreDown commands in the Wireguard configuration. It's maybe the better place to configure the routes instead of heaving a separate systemd service for this.

See https://git.zx2c4.com/WireGuard/about/src/tools/man/wg-quick.8

Consider updating to use Packer for Scaleway

While I have not tried myself, after reading the latest Scaleway blog post, it looks like it's now possible to use Packer. Perhaps that would help with installation of wireguard? Would make installation even easier.

Updated pricing

Hetzner and Scaleway have new prices now. It would be nice for those price changes to reflect on the ReadMe.

502 Bad Gateway → Deploying the Kubernetes Dashboard

Hi, someone know how fix it? all services are working...

502 Bad Gateway
nginx/1.13.2

Getting kubectl states....
NAME                          STATUS    AGE       VERSION
mas-00                        Ready     6h        v1.7.0
min-00                        Ready     6h        v1.7.0
vmi124777.contaboserver.net   Ready     6h        v1.7.0
NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE
ingress       default-http-backend-726995137-rpvsg       1/1       Running   1          5h
ingress       kube-lego-2933009699-gjb48                 1/1       Running   1          35m
ingress       nginx-ingress-controller-588775257-4gn3q   1/1       Running   2          5h
kube-system   heapster-3875886179-gbnq2                  1/1       Running   1          40m
kube-system   kube-apiserver-mas-00                      1/1       Running   1          6h
kube-system   kube-controller-manager-mas-00             1/1       Running   1          6h
kube-system   kube-dns-2425271678-xsh8f                  3/3       Running   3          6h
kube-system   kube-proxy-hkw96                           1/1       Running   1          6h
kube-system   kube-proxy-k63sl                           1/1       Running   1          6h
kube-system   kube-proxy-q6x71                           1/1       Running   1          6h
kube-system   kube-scheduler-mas-00                      1/1       Running   1          6h
kube-system   kubernetes-dashboard-4079053634-fkz3m      1/1       Running   1          40m
kube-system   weave-net-32zwn                            2/2       Running   4          6h
kube-system   weave-net-lqss1                            2/2       Running   3          6h
kube-system   weave-net-w9zsg                            2/2       Running   3          6h
NAMESPACE     NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
ingress       default-http-backend       1         1         1            1           5h
ingress       kube-lego                  1         1         1            1           35m
ingress       nginx-ingress-controller   1         1         1            1           5h
kube-system   heapster                   1         1         1            1           40m
kube-system   kube-dns                   1         1         1            1           6h
kube-system   kubernetes-dashboard       1         1         1            1           40m
NAMESPACE     NAME                   CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
default       kubernetes             10.96.0.1        <none>        443/TCP         6h
ingress       default-http-backend   10.110.68.42     <none>        80/TCP          5h
ingress       kube-lego-nginx        10.108.198.188   <none>        8080/TCP        35m
kube-system   heapster               10.100.5.228     <none>        80/TCP          40m
kube-system   kube-dns               10.96.0.10       <none>        53/UDP,53/TCP   6h
kube-system   kubernetes-dashboard   10.97.124.69     <none>        80/TCP          40m

thanks in advance

Add guide to non-vendorcloud (i.e. on prem Bare Metal) installs

As a hobbyist, I don't want my projects to unnecessarily burn cash by consuming cost-per-resource-over-time whilst iterating over configurations. Using locally available commodity hardware which I and most hobbyists I know generally have available is a more efficient way to host projects such as this.

I appreciate this is a challenge with Terraform not having a provider for bare metal without being opinionated (the Digital Rebar or Matchbox providers seem the best option) but perhaps the most agnostic way to accomplish this would be via the Salt-Masterless provisioner? Another approach that might simplify the process somewhat whilst introducing a thin layer of abstraction could be to use LXC/D, perhaps in conjunction with Packer images?

Kubelet binds external IP address as internalP

When I joined nodes to my k8s-cluster I noticed, that kubelet assigns the nodes external IP to the node:

$ kubectl describe node some-node2

Name:               some-node2
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=some-node2
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Mon, 12 Feb 2018 12:21:44 +0100
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Wed, 14 Feb 2018 09:52:05 +0100   Mon, 12 Feb 2018 12:21:44 +0100   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Wed, 14 Feb 2018 09:52:05 +0100   Mon, 12 Feb 2018 12:21:44 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 14 Feb 2018 09:52:05 +0100   Mon, 12 Feb 2018 12:21:44 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready            True    Wed, 14 Feb 2018 09:52:05 +0100   Mon, 12 Feb 2018 12:22:05 +0100   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  195.201.XXX.XXX
  Hostname:    some-node2
Capacity:

This results in all pods using this IP address. Also the the kube-proxy and weave-net pod:

$ kubectl get pods -n kube-system -o wide

.
.
.
kube-proxy-rqwv7                                  1/1       Running   0          10d       10.0.1.2         master
kube-proxy-rvz4z                                  1/1       Running   0          1d        195.201.XXX.XXX   some-node2
kube-proxy-rwr7j                                   1/1       Running   5          9d        10.0.1.3         some-node3
kube-proxy-vg8lb                                  1/1       Running   0          10d       10.0.1.8        some-node4
weave-net-7fx72                                   2/2       Running   1          10d       10.0.1.1         some-node1
weave-net-pczwt                                   2/2       Running   0          1d        195.201.XXX.XXX   some-node2
.
.
.

Is this the expected behavior?
Because, as I understand it, one would expose pods through services or ingresses to the outside world. As mentioned in:

And it feels a little bit strange to see the weave-net pod exposed with a public IP address.

For the nodes to expose their internal IP, I set extra environment args (--node-ip) to the kubelet system unit. Here for the node with the internalIP 10.0.1.2:

# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki"
Environment="KUBELET_EXTRA_ARGS=--node-ip=10.0.1.2"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS

What do you think?

Include private docker registry in the guide

First of all, thanks for the guide! I'm almost finishing to configure my own kubernetes too :)
One thing that I think it is missing in the guide, is the usage of a private docker registry (either on the internet or self-hosted).
While searching, I've found treescale.com. Just go there, register and you're ready to push your own images :)
For the download of images from a private repository, I've found this guide: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
It's actually very easy to make it work with your kubernetes.
I hope I have the time to send a pull request to add the private docker registry to the guide, but if I don't manage to do it, you should be able to catch up just by reading this isssue.

Note: I'm not related to treescale in any form.

Integrate external storage providers : e.g. nfs-provisioner

"nfs-provisioner is an out-of-tree dynamic provisioner for Kubernetes 1.4. You can use it to quickly & easily deploy shared storage that works almost anywhere. "

https://github.com/kubernetes-incubator/external-storage

Especially for the hobbyist external storage is a big problem or very expensive. It would be fantastic if you could integrate this nfs-provisioner in your tutorial.

Maintenance burden

I am trying to estimate how resource-intensive is to maintain this cluster. I'm pretty decent with linux, shells, networking, the concept, I could even fall back to iptables without a hiccup. But my current job is relatively far from devops.

So setting this cluster up would take several hours, up to a day, I assume. That I can afford. But I wonder how up-to-date is it important to be, and furthermore, how frequent are updates to various parts of the project. By using Google's containers, I have a feeling every time I log in to my computer there's an update to gcloud and kubectl packages.

I would maintain this cluster purely for personal reasons - playing with it a bit, maybe learning etc. But if I need to spend a lot of time just maintaining this, then I'm probably better off using Scaleway or Hetzner directly.

Can anybody share any insights into this topic? How often do you have to do updates etc to this system?

What do you think about Rancher Longhorn distributed block storage solution?

Hi,

I didn't see Rancher Longhorn in Distributed block storage solution list. Do you think that Longhorn can be an alternative to portworx?

Best regards,
Stéphane

Rook deployment and command namespace discrepancy

In the rook documentation the operator is deployed on the default namespace

metadata:
  name: rook-operator
  namespace: default

But in the docs the check command uses the rook namespace

kubectl -n rook get pods

Documentation: Emphasize that deviating from the defaults leads to pain, suffering, and hate

I am very pleased to have a running K8s cluster, and to report that doing so is very straightforward using the default values in https://github.com/hobby-kube/provisioning!

My original plan was not to follow all the defaults. After all, why pay for 4GB instances when 2GB instances are sufficient and half the price? Whether the underlying issues that I've run across are worth addressing is debatable, but having spent >10h trying to make what I'd figured were relatively minor modifications without success, I'd like to save others the same trouble.

In particular, the README here points to Scaleway's 2GB instances: "The very best offer at this time is definitely Scaleway where one gets a three node cluster up and running with enough memory (3x2GB) for less than €10/month." I'd inferred that using those 2GB instances with this project would be straightforward, and while I believe it it still possible, the tradeoff for the time investment in re-writing module.swap would be significant.

Among the things I tried with Scaleway before throwing up my hands and using all-default values (except for credentials):

Using Ubuntu Zesty instead of Xenial

Everything runs and terraform apply reports success, but kubectl reports that nodes never go ready due to DNS issues, and further investigation indicates that CNI isn't fully configured on the instances.

Using Scaleway ARM C1 instances

This doesn't work because the C1 instances are ARMv7l, which are 32-bit CPUs. Golang has known issues on 32-bit architectures, so etcd won't run without modification to the runtime environment.

Using Scaleway ARM ARM64-2GB instances

This doesn't work because (a) the install script for etcd is currently hardcoded for arch amd64 (granted, not too hard to correct) and (b) despite initial appearances to the contrary, much like the VC1S type instances, ARM64-2GB Scaleway instances are limited to a combined 50GB of attached volumes at boot. (It was my belief that this latter was only true for VC1S that had me pressing on to get things working under ARM.)

Using Scaleway x86-64 VC1S instances

This doesn't work because you can't have more than 50GB of volumes attached to Scaleway 2GB volumes upon boot, and the provided boot images are already 50GB. This is already being tracked at #4 .

Starting etcd.service hangs attempting to contact peers

I'm at the step where the etcd cluster is brought up by starting the etcd.service on each host. I've validated that the VPN connections are ok using wg show.
`
root@kube1:~# wg show
interface: wg0
public key: qSQ/xnfVbSYIjo77TZeswUVV2nB4V9gO6Q0uVIEgdCY=
private key: (hidden)
listening port: 51820

peer: AcTw0SN6h9fqfIB25zIqpEEob7Qjum+r29qOHFzrdUY=
endpoint: 10.8.23.94:51820
allowed ips: 10.0.1.2/32
transfer: 0 B received, 21.54 KiB sent

peer: dCu52IH4TgIxbk+MP11PwO8oIzrv9dH4K/ZjIOy9mmo=
endpoint: 10.8.23.95:51820
allowed ips: 10.0.1.3/32
transfer: 0 B received, 21.39 KiB sent
`

The systemctl start etcd.service call never returns. Looking at journalctl -u etcd.service shows the following lines over and over till i killed the start call.

May 09 20:16:06 kube1 etcd[32490]: health check for peer 910054c0ee2bca8d could not connect: dial tcp 10.0.1.3:2380: i/o timeout May 09 20:16:06 kube1 etcd[32490]: health check for peer 940feae903dd7834 could not connect: dial tcp 10.0.1.2:2380: i/o timeout
For context I'm attempting this using virtualbox on OSX with ubuntu 16.04 server and bridged network card settings.

kubectl connection

I'm trying to do this guide by simulating a cloud environment locally. I have 3 Virtualbox Ubuntu servers and following the guide to create a 3 node cluster. Everything was working fine until I got to the Weave Net configuration. When I invoke kubectl to install Weave Net I get:
root@kube1:/etc/kubernetes# kubectl apply -f https://git.io/weave-kube-1.6
The connection to the server kube1:8080 was refused - did you specify the right host or port?

any suggestions to what may be wrong?
Thanks.

hcloud provider: ssh: unable to authenticate

Hello,

Thank you for this project, it is awesome! But I run into problem I can't resolve.
When trying to provision hcloud server it stuck on connecting to newly created server.
Here is output of terraform:

module.provider.hcloud_server.host: Creating...
 datacenter:   "" => "<computed>"
 image:        "" => "ubuntu-16.04"
 ipv4_address: "" => "<computed>"
 ipv6_address: "" => "<computed>"
 ipv6_network: "" => "<computed>"
 keep_disk:    "" => "false"
 location:     "" => "nbg1"
 name:         "" => "node-1"
 server_type:  "" => "cx11"
 ssh_keys.#:   "" => "1"
 ssh_keys.0:   "" => "work"
 status:       "" => "<computed>"
module.provider.hcloud_server.host: Still creating... (10s elapsed)
module.provider.hcloud_server.host: Still creating... (20s elapsed)
module.provider.hcloud_server.host: Provisioning with 'remote-exec'...
module.provider.hcloud_server.host (remote-exec): Connecting to remote host via SSH...
module.provider.hcloud_server.host (remote-exec):   Host: 159.69.206.29
module.provider.hcloud_server.host (remote-exec):   User: root
module.provider.hcloud_server.host (remote-exec):   Password: false
module.provider.hcloud_server.host (remote-exec):   Private key: false
module.provider.hcloud_server.host (remote-exec):   SSH Agent: false
module.provider.hcloud_server.host (remote-exec):   Checking Host Key: false


.....


module.provider.hcloud_server.host: Still creating... (5m0s elapsed)
module.provider.hcloud_server.host: Still creating... (5m10s elapsed)
module.provider.hcloud_server.host (remote-exec): Connecting to remote host via SSH...
module.provider.hcloud_server.host (remote-exec):   Host: 159.69.206.29
module.provider.hcloud_server.host (remote-exec):   User: root
module.provider.hcloud_server.host (remote-exec):   Password: false
module.provider.hcloud_server.host (remote-exec):   Private key: false
module.provider.hcloud_server.host (remote-exec):   SSH Agent: false
module.provider.hcloud_server.host (remote-exec):   Checking Host Key: false
module.provider.hcloud_server.host: Still creating... (5m20s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* module.provider.hcloud_server.host: timeout - last error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain

as displayed above I'm passing work ssh key to ssh_keys of "hcloud_server" "host" resource, this key is also added to hetzner account and I can ssh to server. So it seems that remote-exec do not use key work.

I'm using 0.11.10 version of terraform and 1.4.0 of hcloud provider.

Problems deploying Portworx

I tried to deploy portworx, but forgot to change the variables PX_STORAGE_SERVICE which was set to /dev/vdb instead of /dev/vda.
So I manually deleted the daemon_set and the storage class, and redeployed using kubectl apply -f storage.
However it seems that smth is broken as I end up with 3 pods in CrashLoopBackOff state.
Here is the log from one of the pod:

container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"rootfs_linux.go:54: mounting \\\"/var/lib/kubelet/pods/aad88e78-09c2-11e8-8395-de2b44396007/containers/portworx-storage-1/ccd4ada6\\\" to rootfs \\\"/var/lib/docker/overlay2/04e4c34c1d0f67a09e9f3b4feef3cadd6e4b1e8e016fae48762b250798769803/merged\\\" at \\\"/var/lib/docker/overlay2/04e4c34c1d0f67a09e9f3b4feef3cadd6e4b1e8e016fae48762b250798769803/merged/dev/termination-log\\\" caused \\\"no such file or directory\\\"\""

Is it possible to have multiple ingress controllers?

Is it possible to have multiple ingress controllers, one per node?

I was thinking about designing a multi-cloud cluster with one node per different cloud provider and using global server load balancing.

Have you looked at the CIS Security Guide for Kubernetes yet?

Have you looked at the CIS Security Guide for Kubernetes yet? If not, it is getting released next week. It is in the preview as of now. You may signup for this. There are quite a few security configuration items that might help you to improve the guide.
To sign up check - https://cavirin.com/kubernetes.html

How to register another user account to dashboard

Thanks for this great well-explained guide on how to setup a Kubernetes cluster on Scaleway.
Being a very early hobbyist, I can't figure out how to change the credentials with my dashboard.
I did another user using htpasswd, but now what should be changed to switch identification?

using DO the wireguard hangs forever

I'm using the DO provider and looks like if use the wireguard module it not works as is in the DO.

Any tip? I'm also debugging what need to do in order to make it work.

...
module.wireguard.null_resource.wireguard.1: Still creating... (34m10s elapsed)
module.wireguard.null_resource.wireguard.0: Still creating... (34m10s elapsed)
module.wireguard.null_resource.wireguard.2: Still creating... (34m20s elapsed)
module.wireguard.null_resource.wireguard.1: Still creating... (34m20s elapsed)
module.wireguard.null_resource.wireguard.0: Still creating... (34m20s elapsed)
module.wireguard.null_resource.wireguard.0: Still creating... (34m30s elapsed)
module.wireguard.null_resource.wireguard.1: Still creating... (34m30s elapsed)
module.wireguard.null_resource.wireguard.2: Still creating... (34m30s elapsed)
module.wireguard.null_resource.wireguard.2: Still creating... (34m40s elapsed)
module.wireguard.null_resource.wireguard.0: Still creating... (34m40s elapsed)
module.wireguard.null_resource.wireguard.1: Still creating... (34m40s elapsed)
module.wireguard.null_resource.wireguard.0: Still creating... (34m50s elapsed)
...

and thanks so much for the guide!!

Question: why not run etcd in docker ?

Maybe this is a stupid question but i'm wondering if it wouldn't be easier to just create an etcd container which restarts on each node, rather than using another systemd service.

Did you consider this option ?
What are your thoughts about it ?
Thanks ! Great guide btw ;)

WireGuard already ships systemd unit for wg-quick

Hey,

Great guide. Thanks for writing it and for the inclusion of WireGuard. Nice to see.

I noticed that you're providing instructions on creating a custom unit for systemd. Actually, WireGuard already ships with the [email protected] unit, which is ± exactly what you have in here. Thus, rather than instructing users to create a new unit, you may simple instruct them to systemctl enable wg-quick@wg0 and systemctl start wg-quick@wg0. The Ubuntu packages should be installing this automatically, and if you're compiling and installing manually, make install will automatically install this unit file too.

Jason

Kube-DNS do not start

With scaleway provider, using VC1M instance, I couldn't ssh to my server, seems like the problem is related to the option enable local boot which is disabled by default when using terraform.
I could terraform the machines till the end of the process without errors but the kube-dns pod remains pending on containerCreating state.
From my research it could be related to a bug with weave-net, I tried to move to kube-flannel but I need to re-init the master node with kubeadmin which raise other problems.

kubectl get pods --all-namespaces

NAMESPACE     NAME                            READY     STATUS              RESTARTS   AGE
kube-system   kube-apiserver-kube1            1/1       Running             0          16m
kube-system   kube-controller-manager-kube1   1/1       Running             0          17m
kube-system   kube-dns-86f4d74b45-7f8j2       0/3       ContainerCreating   0          17m
kube-system   kube-proxy-gvn78                1/1       Running             0          17m
kube-system   kube-proxy-p8k7z                1/1       Running             0          17m
kube-system   kube-proxy-wtw2p                1/1       Running             0          17m
kube-system   kube-scheduler-kube1            1/1       Running             0          16m
kube-system   weave-net-fwx9c                 2/2       Running             1          17m
kube-system   weave-net-hjcpp                 2/2       Running             0          17m
kube-system   weave-net-vcn74                 2/2       Running             0          17m

kubectl describe pods kube-dns -n kube-system

Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Warning  FailedScheduling        19m (x4 over 19m)  default-scheduler  0/1 nodes are available: 1 node(s) were not ready.
  Warning  FailedScheduling        19m (x2 over 19m)  default-scheduler  0/2 nodes are available: 2 node(s) were not ready.
  Warning  FailedScheduling        19m (x3 over 19m)  default-scheduler  0/3 nodes are available: 3 node(s) were not ready.
  Normal   Scheduled               18m                default-scheduler  Successfully assigned kube-dns-86f4d74b45-7f8j2 to kube3
  Normal   SuccessfulMountVolume   18m                kubelet, kube3     MountVolume.SetUp succeeded for volume "kube-dns-config"
  Normal   SuccessfulMountVolume   18m                kubelet, kube3     MountVolume.SetUp succeeded for volume "kube-dns-token-t47ql"
  Warning  FailedCreatePodSandBox  2m (x4 over 14m)   kubelet, kube3     Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   SandboxChanged          2m (x4 over 14m)   kubelet, kube3     Pod sandbox changed, it will be killed and re-created.

PCV

Im trying to create a PCV with this yml

---
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: portworx
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
  priority_io: "high"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
   name: postgres-data-staging
   namespace: lb-staging
spec:
  storageClassName: portworx
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

but im getting this error Failed to provision volume with StorageClass "portworx": services "portworx-service" not found

Name:          postgres-data-staging
Namespace:     lb-staging
StorageClass:  portworx
Status:        Pending
Volume:
Labels:        <none>
Annotations:   kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"postgres-data-staging","namespace":"lb-staging"},"spec":{"access...
               volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/portworx-volume
Capacity:
Access Modes:
Events:
  Type     Reason              Age              From                         Message
  ----     ------              ----             ----                         -------
  Warning  ProvisioningFailed  1s (x2 over 6s)  persistentvolume-controller  Failed to provision volume with StorageClass "portworx": services "portworx-service" not found

I have one node not well configured in my storage but the pods are ok:

/opt/pwx/bin/pxctl status
PX is not running on this host
root@node01:~# /opt/pwx/bin/pxctl status
Status: PX is operational
License: PX-Developer
Node ID: 11d2c5b5-6818-4264-87cf-1e241762c674
        IP: 10.0.1.2
        Local Storage Pool: 1 pool
        POOL    IO_PRIORITY     RAID_LEVEL      USABLE  USED    STATUS  ZONE    REGION
        0       MEDIUM          raid0           20 GiB  2.0 GiB Online  default default
        Local Storage Devices: 1 device
        Device  Path            Media Type              Size            Last-Scan
        0:1     /dev/sda        STORAGE_MEDIUM_MAGNETIC 20 GiB          23 Mar 18 22:12 UTC
        total                   -                       20 GiB
Cluster Summary
        Cluster ID: portworx-storage-0
        Cluster UUID: fb678d93-fcad-4f3b-a85e-188cce5fa9ef
        Nodes: 3 node(s) with storage (2 online)
        IP              ID                                      StorageNode     Used            Capacity        Status
        10.0.1.4        4fd9660b-4206-4b2a-8a41-4e5a96e1d073    Yes             2.0 GiB         20 GiB          Online
        10.0.1.3        3bf347f3-0761-492b-9854-9a0ff17f8c80    Yes             Unavailable     Unavailable     Offline
        10.0.1.2        11d2c5b5-6818-4264-87cf-1e241762c674    Yes             2.0 GiB         20 GiB          Online   (This node)
Global Storage Pool
        Total Used      :  4.0 GiB
        Total Capacity  :  40 GiB

Any idea how to fix this?

If I create the service with this yml:

kind: Service
apiVersion: v1
metadata:
  name: portworx-service
  namespace: kube-system
spec:
  selector:
    name: portworx
  ports:
    - protocol: TCP
      port: 9001
      targetPort: 9001

when I try to create the PVC I got:

Name:          postgres-data-staging
Namespace:     lb-staging
StorageClass:  portworx
Status:        Pending
Volume:
Labels:        <none>
Annotations:   kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"postgres-data-staging","namespace":"lb-staging"},"spec":{"access...
               volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/portworx-volume
Capacity:
Access Modes:
Events:
  Type     Reason              Age   From                         Message
  ----     ------              ----  ----                         -------
  Warning  ProvisioningFailed  0s    persistentvolume-controller  Failed to provision volume with StorageClass "portworx": Get http://10.106.100.183:9001/v1/osd-volumes/versions: dial tcp 10.106.100.183:9001: i/o timeout

this are the ports I have working on those nodes related to px

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.1.2:9002           0.0.0.0:*               LISTEN      5274/px
tcp        0      0 10.0.1.2:9003           0.0.0.0:*               LISTEN      5257/px-storage
tcp        0      0 0.0.0.0:9004            0.0.0.0:*               LISTEN      4895/px-ns
tcp6       0      0 :::9001                 :::*                    LISTEN      5274/px
tcp6       0      0 :::6060                 :::*                    LISTEN      5274/px
tcp6       0      0 :::9005                 :::*                    LISTEN      5274/px
tcp6       0      0 :::9008                 :::*                    LISTEN      5274/px
tcp6       0      0 :::9009                 :::*                    LISTEN      5257/px-storage
tcp6       0      0 :::9013                 :::*                    LISTEN      4895/px-ns

is like 9001 is not exposed properly

Waiting for API server to respond (DO)

I know that a fix was pushed yesterday but unfortunately this is still not working. It gets further along but ultimately ends up here:

module.kubernetes.null_resource.kubernetes[2]: Provisioning with 'remote-exec'...
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Connecting to remote host via SSH...
module.kubernetes.null_resource.kubernetes[2] (remote-exec):   Host: 139.59.159.48
module.kubernetes.null_resource.kubernetes[2] (remote-exec):   User: root
module.kubernetes.null_resource.kubernetes[2] (remote-exec):   Password: false
module.kubernetes.null_resource.kubernetes[2] (remote-exec):   Private key: false
module.kubernetes.null_resource.kubernetes[2] (remote-exec):   SSH Agent: true
module.kubernetes.null_resource.kubernetes[2] (remote-exec):   Checking Host Key: false
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Connected!
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.1: Still creating... (2m40s elapsed)
module.kubernetes.null_resource.kubernetes.0: Still creating... (2m40s elapsed)
module.kubernetes.null_resource.kubernetes.2: Still creating... (2m40s elapsed)
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.2: Still creating... (2m50s elapsed)
module.kubernetes.null_resource.kubernetes.0: Still creating... (2m50s elapsed)
module.kubernetes.null_resource.kubernetes.1: Still creating... (2m50s elapsed)
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[2] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes[1] (remote-exec): Waiting for API server to respond
module.kubernetes.null_resource.kubernetes.1: Still creating... (3m0s elapsed)
module.kubernetes.null_resource.kubernetes.0: Still creating... (3m0s elapsed)
module.kubernetes.null_resource.kubernetes.2: Still creating... (3m0s elapsed)

Need to allow port for weave ?

I had a problem that kube-dns never change the status from ContainerCreating to Running, and it was because weave couldn't establish connections like the following:

$ weave status connections
-> 10.8.57.9:6783        failed      cannot connect to ourself, retry: never
-> 10.8.166.137:6783     retrying    dial tcp4 :0->10.8.166.137:6783: getsockopt: connection timed out
-> 10.8.55.201:6783      retrying    dial tcp4 :0->10.8.55.201:6783: getsockopt: connection timed out

And it was fixed when I allowed the port 6783 on firewall on all nodes like:

ufw allow 6783
ufw reload

Is it actually required to add to the guide, or I wonder if I was missing some configurations.

Question on opening ports in ufw