squat / kilo
Kilo is a multi-cloud network overlay built on WireGuard and designed for Kubernetes (k8s + wg = kg)
Home Page: https://kilo.squat.ai
License: Apache License 2.0
Hello,
I was wondering if it is possible to run Kilo with wireguard-go or boringtun in userspace (instead of relying on the kernel module). Is this on the roadmap?
Thanks!
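For context, the difference is just who implements the interface: the kernel module or a userspace process such as wireguard-go. A rough sketch of the detection such a fallback could do (this is not current Kilo behavior, just an illustration):

```shell
# Prefer the kernel module; otherwise fall back to a userspace implementation.
# (Illustration only: Kilo does not currently do this.)
if modprobe -n wireguard 2>/dev/null; then
  impl="kernel module"
else
  impl="userspace (e.g. wireguard-go or boringtun)"
fi
echo "WireGuard implementation to use: $impl"
```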
I'm using k3s with flannel, and Kilo as an add-on, using in-cluster authorization without mounting a kubeconfig, and with the following args:
--cni=false
--local=false
--encapsulate=never
--subnet=10.40.0.0/16
--hostname=$(NODE_NAME)
--compatibility=flannel
I have a master server in location A and a node in location B.
The master has a public IP and the node is behind a NAT.
I'm setting the following annotations on master:
kilo.squat.ai/force-endpoint: <master_public_ip>:51820
kilo.squat.ai/leader: "true"
kilo.squat.ai/location: A
And these on node:
kilo.squat.ai/force-endpoint: <nat_public_ip>:51820
kilo.squat.ai/persistent-keepalive: 5
kilo.squat.ai/location: B
I'm booting the master, then the node, and after both servers become ready, neither host shows the other's endpoint (using the wg command).
Checking the Kilo logs on the master, I see these connection timed out errors at the end:
{"caller":"mesh.go:219","component":"kilo","level":"warn","msg":"no private key found on disk; generating one now","ts":"2020-02-26T10:06:11.347867897Z"}
{"caller":"main.go:217","msg":"Starting Kilo network mesh '12220b790da5ab7fbdcfb1db9d899bec9602261e-dirty'.","ts":"2020-02-26T10:06:12.017759649Z"}
E0226 10:06:12.145623 1 reflector.go:126] pkg/k8s/backend.go:391: Failed to list *v1alpha1.Peer: the server could not find the requested resource (get peers.kilo.squat.ai)
{"caller":"mesh.go:664","component":"kilo","level":"info","msg":"WireGuard configurations are different","ts":"2020-02-26T10:06:13.609772774Z"}
E0226 10:12:51.783433 1 streamwatcher.go:109] Unable to decode an event from the watch stream: read tcp 10.30.2.83:48064->10.43.0.1:443: read: connection timed out
E0226 10:13:20.455365 1 streamwatcher.go:109] Unable to decode an event from the watch stream: read tcp 10.30.2.83:48062->10.43.0.1:443: read: connection timed out
The :1107/health endpoint of Kilo on the master responds with 200 OK.
The logs of Kilo on the node seem OK:
{"caller":"mesh.go:219","component":"kilo","level":"warn","msg":"no private key found on disk; generating one now","ts":"2020-02-26T10:18:01.36172736Z"}
{"caller":"main.go:217","msg":"Starting Kilo network mesh '12220b790da5ab7fbdcfb1db9d899bec9602261e-dirty'.","ts":"2020-02-26T10:18:01.387390299Z"}
{"caller":"mesh.go:664","component":"kilo","level":"info","msg":"WireGuard configurations are different","ts":"2020-02-26T10:18:01.853855818Z"}
To resolve this I had to delete the Kilo pod on the master; the new pod then configured the wg endpoint correctly.
Hi,
I have one master node in region A with a public ip and a worker node in region B behind a NAT (two separate networks).
After deploying Kilo I annotated both nodes to force the external IP (the master with its own public IP and the worker with the NAT public IP) and to set the location on each (master: region-a, worker: region-b).
Checking the WireGuard peers on the master, with the wg command, I can see the worker's peer with the NAT public IP as the endpoint, but the port is different from the WireGuard listen port set on the worker node.
I can also see that a handshake was made successfully, but after approximately 30s Kilo recreates the peer because it detects configuration differences (log: 'WireGuard configurations are different') due to the endpoint port, interrupting existing connections.
How can I solve this?
Thanks in advance.
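For what it's worth, a differing endpoint port is typical NAT behavior: the NAT device rewrites the UDP source port, so the port the master observes is not the listen port configured on the worker. A sketch of the comparison, using made-up sample values in place of real wg show output:

```shell
# Made-up values standing in for real output: what `wg show` on the master
# reports as the peer endpoint vs. the listen port configured on the worker.
observed_endpoint="203.0.113.7:37014"  # sample endpoint seen on the master
configured_port=51820                  # listen port set on the worker
observed_port=${observed_endpoint##*:}
if [ "$observed_port" != "$configured_port" ]; then
  echo "NAT rewrote the source port: $observed_port != $configured_port"
fi
```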
Hi,
I was wondering: how does Kilo compare to service-mesh software such as Linkerd, Linkerd2 (formerly Conduit), Consul, and Istio?
Service-mesh software list source: https://kubedex.com/istio-vs-linkerd-vs-linkerd2-vs-consul/
I heard about Kilo by reading the schedule of the 2019 Open Networking Summit Europe conference:
Connecting Kubernetes Clusters Across Clouds With Kilo
Source: https://events.linuxfoundation.org/events/open-networking-summit-europe-2019/
I think I understand the difference between Kilo and Container Network Interface (CNI) plugins like Calico and Flannel: a CNI plugin's scope is a single node; it provides networking for the containers/pods running on that node.
But I don't understand the difference between Kilo and Service Mesh softwares like Istio.
My initial understanding is that Kilo is different from Istio & co.: Kilo connects K8s nodes that can span data centers and public clouds via a WireGuard VPN.
Therefore, thanks to Kilo, it is as if the apps and services running on these distributed K8s nodes across different cloud providers were running on the same "virtual (overlay) cloud".
So maybe Kilo works at a lower level than service-mesh solutions like Istio?
But I am not sure about this.
Thank you for any input you can share on this question!
Sorry if this post/question is not a good fit for GitHub issues.
--
Nicop311
Problem:
I'm using squat/kilo:amd64-c93fa1e5b194e5d0a847f0775033bed92251f4d6
After adding a new node, ten-vm1, it got a wrong kilo.squat.ai/internal-ip of 127.0.0.1, which caused a broken route on the other node:
[root@hw-vm1 ~]# route -n |grep kilo
127.0.0.1 10.4.0.3 255.255.255.255 UGH 0 0 0 kilo0
Resolution:
For now I just use kubectl annotate node ten-vm1 kilo.squat.ai/internal-ip="172.21.0.xx/32" --overwrite=true to fix the wrong value.
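A hedged sketch of that workaround with a guard against the loopback value (the IP below is a made-up example, since the real one is elided above; the command is printed rather than executed):

```shell
node="ten-vm1"
ip="172.21.0.42"  # example value; substitute the node's real internal IP
case $ip in
  127.*)
    # Refuse the loopback address that caused the broken route above.
    echo "refusing loopback internal IP: $ip" ;;
  *)
    echo "kubectl annotate node $node kilo.squat.ai/internal-ip=${ip}/32 --overwrite=true" ;;
esac
```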
Hello,
I am trying to set up a VPN only kilo installation on a Rancher RKE / Canal based cluster.
I ended up using the manifest suggested in #30, making sure it is set to go to a single node. It installs fine, and the kilo pod is running.
When I add a peer using the suggested manifest (filling in public key), the kilo pod has an error as follows:
{"caller":"main.go:217","msg":"Starting Kilo network mesh 'ba00b6c180d40bd73fc94af5be3bbf8f85789bf9'.","ts":"2020-02-22T20:54:17.234829806Z"}
{"caller":"cni.go:58","component":"kilo","err":"failed to read CNI config list file: error reading /etc/cni/net.d/10-kilo.conflist: open /etc/cni/net.d/10-kilo.conflist: no such file or directory","level":"warn","msg":"failed to get CIDR from CNI file; overwriting it","ts":"2020-02-22T20:54:17.336024857Z"}
{"caller":"cni.go:66","component":"kilo","level":"info","msg":"CIDR in CNI file is empty","ts":"2020-02-22T20:54:17.336108147Z"}
{"CIDR":"10.42.3.0/24","caller":"cni.go:71","component":"kilo","level":"info","msg":"setting CIDR in CNI file","ts":"2020-02-22T20:54:17.336140049Z"}
{"caller":"cni.go:73","component":"kilo","err":"failed to read CNI config list file: open /etc/cni/net.d/10-kilo.conflist: no such file or directory","level":"warn","msg":"failed to set CIDR in CNI file","ts":"2020-02-22T20:54:17.33616623Z"}
{"caller":"mesh.go:482","component":"kilo","event":"add","level":"info","peer":{"AllowedIPs":[{"IP":"10.5.0.1","Mask":"/////w=="}],"Endpoint":null,"PersistentKeepalive":10,"PublicKey":"<...>","Name":"squat"},"ts":"2020-02-22T20:54:17.486646157Z"}
{"caller":"mesh.go:717","component":"kilo","error":"failed to delete configuration file: remove /var/lib/kilo/conf: no such file or directory","level":"error","ts":"2020-02-22T20:54:17.486913746Z"}
{"caller":"mesh.go:727","component":"kilo","error":"failed to clean up node backend: failed to patch node: the server rejected our request due to an error in our request","level":"error","ts":"2020-02-22T20:54:17.493099325Z"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x11f6048]
goroutine 31 [running]:
github.com/squat/kilo/pkg/mesh.(*Mesh).resolveEndpoints(0xc00023c580, 0x16, 0x0)
/kilo/pkg/mesh/mesh.go:764 +0x2f8
github.com/squat/kilo/pkg/mesh.(*Mesh).applyTopology(0xc00023c580)
/kilo/pkg/mesh/mesh.go:560 +0xc7
github.com/squat/kilo/pkg/mesh.(*Mesh).syncPeers(0xc00023c580, 0xc00035c060)
/kilo/pkg/mesh/mesh.go:483 +0x51d
github.com/squat/kilo/pkg/mesh.(*Mesh).Run(0xc00023c580, 0x0, 0x0)
/kilo/pkg/mesh/mesh.go:350 +0x632
main.Main.func4(0x0, 0x0)
/kilo/cmd/kg/main.go:218 +0x14a
github.com/oklog/run.(*Group).Run.func1(0xc00049f740, 0xc0003a7ca0, 0xc0002aa860)
/kilo/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
/kilo/vendor/github.com/oklog/run/group.go:37 +0xbb
According to the manifest I am using, CNI is not enabled (--cni=false), but Kilo is still looking for this file. Any suggestions on how to get this working?
Thanks,
Ben
Hi squat,
In issue #9, which I posted,
I first made a WireGuard connection between VmOnAWS and VmOnGCP,
then created a K8s cluster with those nodes,
and applied Kilo as a final step.
But in the issue #8, you said that
For Kilo to pick up an existing Wireguard interface on the host is not supported.
So I wanted to re-create a K8s cluster with nodes in different cloud service providers,
and here is the point where my questions arise.
When I follow the Kilo installation guide:
I am using Ubuntu as the guest VM's OS, so I installed WireGuard with apt install wireguard.
root@VmOnAWS# which wg
/usr/bin/wg
And for a clean state, I destroyed the existing WireGuard connection between VmOnAWS and VmOnGCP.
# wg-quick down ./wg0.conf
[#] wg showconf wg0
[#] ip link delete dev wg0
I opened UDP port 51820 for my AWS SecurityGroup and GCP SecurityGroup.
The instructions ask me to kubectl annotate the k8s nodes,
but I have no k8s cluster since I started from a clean state,
nor can I create one (without using WireGuard or something similar) since one node is in AWS and the other is in GCP.
One of my guesses is:
1. Create the k8s cluster on AWS and a VM on GCP.
2. Make the VM on GCP a worker node of the k8s cluster on AWS.
Do you think this will work / make sense..?
==[vm-route]===============
[root@ali-vm1 v070]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.16.175.253 0.0.0.0 UG 0 0 0 eth0
2.3.0.0 0.0.0.0 255.255.255.0 U 0 0 0 br-6d0b493fbd9b
7.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
7.0.1.0 10.4.0.2 255.255.255.0 UG 0 0 0 kilo0
10.4.0.0 0.0.0.0 255.255.0.0 U 0 0 0 kilo0
169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 eth0
172.16.160.0 0.0.0.0 255.255.240.0 U 0 0 0 eth0
172.17.91.0 0.0.0.0 255.255.255.0 U 0 0 0 docker0
172.19.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-6cec6875d930
192.168.0.105 10.4.0.2 255.255.255.255 UGH 0 0 0 kilo0
[root@hw-vm1 v070]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.0.1 0.0.0.0 UG 0 0 0 eth0
7.0.0.0 10.4.0.1 255.255.255.0 UG 0 0 0 kilo0
10.4.0.0 0.0.0.0 255.255.0.0 U 0 0 0 kilo0
169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 eth1
169.254.0.0 0.0.0.0 255.255.0.0 U 1004 0 0 eth2
169.254.169.254 192.168.0.254 255.255.255.255 UGH 0 0 0 eth0
172.16.168.255 10.4.0.1 255.255.255.255 UGH 0 0 0 kilo0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-f5da666f520e
172.19.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-873639ae95ca
172.20.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-d9e7fbf26b47
172.21.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-2be4dd4a63ad
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth2
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
==[podOK]==================
kube-system rbac-manager-79bdb8757d-hdhgq 1/1 Running 1 34h 7.0.0.58 ali-vm1
[root@ali-vm1 v070]# ping 7.0.0.58
PING 7.0.0.58 (7.0.0.58) 56(84) bytes of data.
64 bytes from 7.0.0.58: icmp_seq=1 ttl=64 time=0.054 ms
^C
[root@hw-vm1 v070]# ping 7.0.0.58
PING 7.0.0.58 (7.0.0.58) 56(84) bytes of data.
64 bytes from 7.0.0.58: icmp_seq=1 ttl=63 time=26.6 ms
64 bytes from 7.0.0.58: icmp_seq=2 ttl=63 time=26.6 ms
==[svc]================
[root@(⎈ |default:kube-system) t-nat]$ kc get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default agola-gateway NodePort 6.7.9.161 <none> 8000:30002/TCP 3d21h
[root@ali-vm1 v070]# yum install nc
[root@ali-vm1 v070]# nc -vz 6.7.9.161 8000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 6.7.9.161:8000.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
[root@ali-vm1 v070]# curl 6.7.9.161:8000
<!DOCTYPE html><html lang=en><head><meta
[root@ali-vm1 v070]# curl 6.7.8.1:443
Client sent an HTTP request to an HTTPS server.
[root@hw-vm1 v070]# nc -vz 6.7.9.161 8000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 6.7.9.161:8000.
Ncat: 0 bytes sent, 0 bytes received in 0.04 seconds.
[root@hw-vm1 v070]# curl 6.7.9.161:8000
<!DOCTYPE html><html lang=en><head><meta
[root@hw-vm1 v070]# curl 6.7.8.1:443
^C
Installing Kilo on 4 nodes across 2 different public network spaces: does Kilo encrypt comms between nodes?
How does one validate this encryption? Meaning, if it "automagically" encrypts node communications, how can I verify it from node to node?
Also, can my remote "workstation" (laptop) far away be a client of the cluster, and also utilize the VPN for internet access along with management? Sorry, the docs weren't so clear to me. Kilo is installed.
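On the verification question: one common approach is to watch the physical (underlay) interface while pods on different nodes talk; only WireGuard UDP datagrams should appear there, never the pod IPs in cleartext. A sketch, where the interface name (eth0), the port, and the 2-second window are assumptions:

```shell
# Capture a few packets on the underlay interface; with Kilo encrypting
# node-to-node traffic, only UDP to/from the WireGuard port should show up.
out=$(timeout 2 tcpdump -ni eth0 'udp port 51820' -c 5 2>/dev/null \
  || echo "no capture; run this on a node while pinging a pod on a remote node")
echo "$out"
```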
Feature: nodes check the connection to each other, for example by ping, and show the TTL in kgctl and expose it as metrics.
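The requested check could start as small as pinging each peer's WireGuard IP and recording the result, which kgctl or the metrics endpoint could then surface. A sketch (the peer IP 10.4.0.2 is an assumption):

```shell
peer=10.4.0.2  # a peer's WireGuard IP; placeholder for illustration
# Extract the round-trip time from a single ping, if the peer answers.
rtt=$(ping -c 1 -W 1 "$peer" 2>/dev/null | sed -n 's/.*time=\([0-9.]*\).*/\1/p')
if [ -n "$rtt" ]; then
  status="rtt ${rtt}ms"
else
  status="unreachable"
fi
echo "peer $peer: $status"
```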
Using Kilo as the CNI in kubeadm (the kilo/manifests/kilo-kubeadm.yaml file), I'm getting the following error on all nodes (Ubuntu 18.04 and CentOS 7):
{"caller":"mesh.go:618","component":"kilo","error":"failed to add rule: failed to add iptables chain: running [/sbin/ip6tables -t nat -N KILO-NAT --wait]: exit status 3: modprobe: can't change directory to '/lib/modules': No such file or directory\nip6tables v1.8.4 (legacy): can't initialize ip6tables table `nat': Table does not exist (do you need to insmod?)\nPerhaps ip6tables or your kernel needs to be upgraded.\n","level":"error","ts":"2020-05-07T13:59:26.463594297Z"}
But after I execute the command /sbin/ip6tables -t nat -N KILO-NAT --wait directly on the node, Kilo starts working, correctly configuring the pod network and the WireGuard conf.
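The modprobe error suggests the Kilo container cannot see the host's kernel modules, so ip6tables cannot load the nat table module. Mounting /lib/modules from the host into the DaemonSet (which some manifests may already do) avoids the manual step; a sketch of such a patch, written to a file here (the volume name is an assumption):

```shell
# Write a patch that mounts the host's /lib/modules into the kilo container.
cat > /tmp/kilo-lib-modules-patch.yaml <<'EOF'
spec:
  template:
    spec:
      containers:
      - name: kilo
        volumeMounts:
        - name: lib-modules
          mountPath: /lib/modules
          readOnly: true
      volumes:
      - name: lib-modules
        hostPath:
          path: /lib/modules
EOF
# Apply with, e.g.: kubectl -n kube-system patch ds kilo -p "$(cat /tmp/kilo-lib-modules-patch.yaml)"
echo "patch written ($(wc -l < /tmp/kilo-lib-modules-patch.yaml) lines)"
```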
Feature request:
publish kgctl under Releases for Linux/macOS/Windows.
Meanwhile:
kgctl is available as a Docker image => https://hub.docker.com/r/mrhein/kgctl
It doesn't look like there is any way for packets to hit the last 6 rules, since the catch-all MASQUERADE rule precedes them:
-A KILO-NAT -d 172.30.12.0/22 -m comment --comment "Kilo: do not NAT packets destined for the local Pod subnet" -j RETURN
-A KILO-NAT -d 172.28.128.0/24 -m comment --comment "Kilo: do not NAT packets destined for the Kilo subnet" -j RETURN
-A KILO-NAT -d 10.255.255.254/32 -m comment --comment "Kilo: do not NAT packets destined for the local private IP" -j RETURN
-A KILO-NAT -m comment --comment "Kilo: NAT remaining packets" -j MASQUERADE
-A KILO-NAT -s 172.30.12.0/22 -d 172.28.129.1/32 -m comment --comment "Kilo: do not NAT packets from local pod subnet to peers" -j RETURN
-A KILO-NAT -s 172.30.12.0/22 -d 192.168.1.0/24 -m comment --comment "Kilo: do not NAT packets from local pod subnet to peers" -j RETURN
-A KILO-NAT -s 172.30.12.0/22 -d 172.30.4.0/22 -m comment --comment "Kilo: do not NAT packets from local pod subnet to remote pod subnets" -j RETURN
-A KILO-NAT -s 172.30.12.0/22 -d 172.30.0.0/22 -m comment --comment "Kilo: do not NAT packets from local pod subnet to remote pod subnets" -j RETURN
-A KILO-NAT -s 172.30.12.0/22 -d 172.30.8.0/22 -m comment --comment "Kilo: do not NAT packets from local pod subnet to remote pod subnets" -j RETURN
-A KILO-NAT -s 172.30.12.0/22 -d 172.30.16.0/22 -m comment --comment "Kilo: do not NAT packets from local pod subnet to remote pod subnets" -j RETURN
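That reading looks right: iptables evaluates a chain top to bottom, and the MASQUERADE rule has no match criteria, so every packet reaching it is masqueraded and traversal ends; the six RETURN rules appended after it can never match. Appending the catch-all MASQUERADE last would restore them. A toy first-match walk over the targets in the order listed above (for a packet not matched by the first three RETURN rules):

```shell
# First-match walk over the KILO-NAT targets in the order listed above.
targets="RETURN RETURN RETURN MASQUERADE RETURN RETURN RETURN RETURN RETURN RETURN"
pos=0
for t in $targets; do
  pos=$((pos + 1))
  if [ "$t" = "MASQUERADE" ]; then
    break  # catch-all rule: matches everything, traversal stops here
  fi
done
total=$(echo "$targets" | wc -w)
echo "traversal ends at rule $pos; $((total - pos)) rules below are unreachable"
```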
Here's the stacktrace:
{"caller":"mesh.go:617","component":"kilo","error":"failed to delete rule: failed to clear iptables chain: running [/sbin/iptables -t filter -F KILO-IPIP --wait]: exit status 4: iptables: Resource temporarily unavailable.\n","level":"error","ts":"2020-04-19T18:33:48.316238028Z"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x104fe51]
goroutine 42 [running]:
github.com/squat/kilo/pkg/iptables.(*Controller).reconcile(0xc00011b810, 0x0, 0x0)
/kilo/pkg/iptables/iptables.go:246 +0xe1
github.com/squat/kilo/pkg/iptables.(*Controller).Run.func1(0xc00011b810, 0xc000088180)
/kilo/pkg/iptables/iptables.go:230 +0x11f
created by github.com/squat/kilo/pkg/iptables.(*Controller).Run
/kilo/pkg/iptables/iptables.go:222 +0xc3
After setting up Kilo on k3s, I noticed that although Kilo is running, the istio-ingressgateway pod can't seem to reach the istiod pod. In the logs you'll notice that the connection is refused.
I'm using coredns with kilo's CNI (no flannel).
Any advice would be greatly appreciated!
2020-06-12T23:29:50.578577Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-06-12T23:29:50.989714Z warn cache resource:default request:54d7ab7d-1db1-4fb5-ae9b-1b33c85912ee CSR failed with error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.43.0.10:53: read udp 10.42.2.9:51000->10.43.0.10:53: read: connection refused", retry in 6400 millisec
2020-06-12T23:29:50.991787Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamSecrets gRPC config stream closed: 14, connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.43.0.10:53: read udp 10.42.2.9:51000->10.43.0.10:53: read: connection refused"
2020-06-12T23:29:50.990179Z error citadelclient Failed to create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.43.0.10:53: read udp 10.42.2.9:51000->10.43.0.10:53: read: connection refused"
2020-06-12T23:29:50.990266Z error cache resource:default request:54d7ab7d-1db1-4fb5-ae9b-1b33c85912ee CSR retrial timed out: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.43.0.10:53: read udp 10.42.2.9:51000->10.43.0.10:53: read: connection refused"
2020-06-12T23:29:50.990327Z error cache resource:default failed to generate secret for proxy: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.43.0.10:53: read udp 10.42.2.9:51000->10.43.0.10:53: read: connection refused"
2020-06-12T23:29:50.990366Z error sds resource:default Close connection. Failed to get secret for proxy "router~10.42.2.9~istio-ingressgateway-74d4d8d459-wlt8p.istio-system~istio-system.svc.cluster.local" from secret cache: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.43.0.10:53: read udp 10.42.2.9:51000->10.43.0.10:53: read: connection refused"
Trying to set up Kilo on a k3s cluster with three local nodes and one remote node.
All four nodes are showing this in the logs over and over:
{"caller":"mesh.go:639","component":"kilo","error":"address already in use","level":"error","ts":"2020-04-08T02:37:36.742997397Z"}
{"caller":"mesh.go:639","component":"kilo","error":"address already in use","level":"error","ts":"2020-04-08T02:37:44.982264977Z"}
{"caller":"mesh.go:639","component":"kilo","error":"address already in use","level":"error","ts":"2020-04-08T02:39:57.25050876Z"}
{"caller":"mesh.go:639","component":"kilo","error":"address already in use","level":"error","ts":"2020-04-08T02:39:57.402066518Z"}
{"caller":"mesh.go:639","component":"kilo","error":"address already in use","level":"error","ts":"2020-04-08T02:39:59.314872301Z"}
{"caller":"mesh.go:639","component":"kilo","error":"address already in use","level":"error","ts":"2020-04-08T02:40:09.406198759Z"}
{"caller":"mesh.go:639","component":"kilo","error":"address already in use","level":"error","ts":"2020-04-08T02:40:16.246194963Z"}
{"caller":"mesh.go:639","component":"kilo","error":"address already in use","level":"error","ts":"2020-04-08T02:40:19.455699414Z"}
I'm not really sure what address is even in use so I can't debug on my end.
Any guidance?
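"address already in use" here usually points at the WireGuard listen port or the interface address already being taken, e.g. by a leftover wg interface from a previous run or another WireGuard instance on the node. A quick check sketch (the port is the WireGuard default, an assumption):

```shell
port=51820  # default WireGuard listen port; adjust if overridden
# Look for an existing UDP listener on the port; also worth checking
# `ip link` for leftover WireGuard interfaces.
listeners=$( (ss -lun 2>/dev/null || true) | grep ":$port" || true)
if [ -n "$listeners" ]; then
  echo "UDP $port is already bound:"
  echo "$listeners"
else
  echo "no UDP listener found on $port; check ip link for stale wg interfaces"
fi
```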
Hi @squat,
I have a node in location B with all its peers created correctly, but its own peer is not being created on the leader side in location A, with this message showing in the Kilo leader's logs:
{"caller":"mesh.go:382","component":"kilo","event":"update","level":"debug","msg":"received incomplete node","node":{"Endpoint":{"DNS":"","IP":"192.168.50.13","Port":51820},"Key":"MHFPNW0zR3oxMFBlRkV0UUNRNGcxNTNvcTFnLzZnbE15WUJ1K2Q2d1JDTT0=","InternalIP":{"IP":"192.168.50.13","Mask":"////AA=="},"LastSeen":1594277590,"Leader":false,"Location":"admin-c7z87-oce-systel-ne-1","Name":"admin-c7z87-oce-systel-ne-1","PersistentKeepalive":5,"Subnet":{"IP":"10.42.39.0","Mask":"////AA=="},"WireGuardIP":{"IP":"10.40.0.2","Mask":"//8AAA=="}},"ts":"2020-07-11T10:52:59.048513869Z"}
Do you know what may be causing this?
Thanks in advance.
Hi Lucas!
Thank you for this great project!
I am trying to set up a multi provider k3s cluster using kilo. The machines roughly look like:
- oci location - 2 machines (both only have local IP addresses assigned to the local interfaces; external IPs are managed via the internet gateway of the cloud provider)
- gcp location - 1 machine
I haven't got to doing a multi-provider setup yet. I am still trying to get the 2 machines in oci to talk to each other.
I am trying to use Kilo as the CNI directly; the network configuration is as follows:
oci-master - internal IP 10.1.20.3, external IP <ext-master-ip> (using placeholders here)
oci-worker - internal IP 10.1.20.2, external IP <ext-worker-ip>
The machines can ping each other directly using the 10.1.20.x addresses.
My issue is that, once they come up, I can't get the pods launched on each machine to talk to each other.
I can ping pods from the machine that runs them, but not from master -> worker or vice versa.
on my laptop
> kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
my-nginx-74f94c7795-j7kzv 1/1 Running 0 99m 10.42.1.5 oci-worker <none> <none>
but on oci-master
> ping 10.42.1.5
PING 10.42.1.5 (10.42.1.5): 56 data bytes
^C--- 10.42.1.5 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
I think I should be able to reach every pod from any node in the cluster (AFAIK).
Please let me know if there is additional info that would be helpful to include!
I provisioned the machines as follows:
oci-master
:
k3sup install \
--ip <ext-master-ip> \
--k3s-version 'v1.17.0+k3s.1' \
--k3s-extra-args '--no-flannel --no-deploy metrics-server --no-deploy servicelb --no-deploy traefik --default-local-storage-path /k3s-local-storage --node-name oci-master --node-external-ip <ext-master-ip> --node-ip 10.1.20.3'
kubectl annotate node oci-master \
kilo.squat.ai/force-external-ip="<ext-ip-master>/32" \
kilo.squat.ai/force-internal-ip="10.1.20.3/24" \
kilo.squat.ai/location="oci" \
kilo.squat.ai/leader="true"
oci-worker
:
k3sup join \
--ip <ext-worker-ip> \
--server-ip <ext-master-ip> \
--k3s-version 'v1.17.0+k3s.1' \
--k3s-extra-args '--no-flannel --node-name oci-worker --node-external-ip <ext-worker-ip> --node-ip 10.1.20.2'
kubectl annotate node oci-worker \
kilo.squat.ai/force-external-ip="<ext-worker-ip>/32" \
kilo.squat.ai/force-internal-ip="10.1.20.2/24" \
kilo.squat.ai/location="oci"
Finally, setting up Kilo:
kubectl apply -f k3s-kilo.yaml
I had to make the same changes suggested in #11 and #27 to make sure that the Kilo pods have the correct permissions, but I was able to get the pods to come up correctly.
I can see logs like these in the pod logs (with log-level=debug):
on oci-master
{"caller":"mesh.go:410","component":"kilo","event":"update","level":"debug","msg":"syncing nodes","ts":"2020-02-09T09:12:46.095414595Z"}
{"caller":"mesh.go:412","component":"kilo","event":"update","level":"debug","msg":"processing local node","node":{"ExternalIP":{"IP":"<ext-ip-master>","Mask":"/////w=="},"Key":"<key>","InternalIP":{"IP":"10.1.20.3","Mask":"////AA=="},"LastSeen":1581239566,"Leader":true,"Location":"oci","Name":"oci-master","Subnet":{"IP":"10.42.0.0","Mask":"////AA=="},"WireGuardIP":{"IP":"10.4.0.1","Mask":"//8AAA=="}},"ts":"2020-02-09T09:12:46.095454981Z"}
on oci-worker
{"caller":"mesh.go:410","component":"kilo","event":"update","level":"debug","msg":"syncing nodes","ts":"2020-02-09T10:44:48.564218597Z"}
{"caller":"mesh.go:508","component":"kilo","level":"debug","msg":"successfully checked in local node in backend","ts":"2020-02-09T10:45:18.478913052Z"}
{"caller":"mesh.go:675","component":"kilo","level":"debug","msg":"local node is not the leader","ts":"2020-02-09T10:45:18.4804814Z"}
{"caller":"mesh.go:410","component":"kilo","event":"update","level":"debug","msg":"syncing nodes","ts":"2020-02-09T10:45:18.481320232Z"}
{"caller":"mesh.go:412","component":"kilo","event":"update","level":"debug","msg":"processing local node","node":{"ExternalIP":{"IP":"<ext-ip-worker>","Mask":"/////w=="},"Key":"<key>","InternalIP":{"IP":"10.1.20.2","Mask":"////AA=="},"LastSeen":1581245118,"Leader":false,"Location":"oci","Name":"oci-worker","Subnet":{"IP":"10.42.1.0","Mask":"////AA=="},"WireGuardIP":null},"ts":"2020-02-09T10:45:18.481367592Z"}
oci-master
> ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.1.20.3 netmask 255.255.255.0 broadcast 10.1.20.255
inet6 fe80::200:17ff:fe02:2f31 prefixlen 64 scopeid 0x20<link>
ether 00:00:17:02:2f:31 txqueuelen 1000 (Ethernet)
RX packets 945623 bytes 2361330833 (2.3 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 851708 bytes 304538145 (304.5 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
kilo0: flags=209<UP,POINTOPOINT,RUNNING,NOARP> mtu 1420
inet 10.4.0.1 netmask 255.255.0.0 destination 10.4.0.1
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 1354843 bytes 457783326 (457.7 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1354843 bytes 457783326 (457.7 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
tunl0: flags=193<UP,RUNNING,NOARP> mtu 8980
inet 10.42.0.1 netmask 255.255.255.255
tunnel txqueuelen 1000 (IPIP Tunnel)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 5 bytes 420 (420.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
> ip route
default via 10.1.20.1 dev ens3
default via 10.1.20.1 dev ens3 proto dhcp src 10.1.20.3 metric 100
10.1.20.0/24 dev ens3 proto kernel scope link src 10.1.20.3
10.4.0.0/16 dev kilo0 proto kernel scope link src 10.4.0.1
10.42.1.0/24 via 10.1.20.2 dev tunl0 proto static onlink
oci-worker
> ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.1.20.2 netmask 255.255.255.0 broadcast 10.1.20.255
inet6 fe80::200:17ff:fe02:1682 prefixlen 64 scopeid 0x20<link>
ether 00:00:17:02:16:82 txqueuelen 1000 (Ethernet)
RX packets 231380 bytes 781401888 (781.4 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 221393 bytes 29979034 (29.9 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
kube-bridge: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.42.1.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::38f7:34ff:fed9:897e prefixlen 64 scopeid 0x20<link>
ether 26:d7:aa:ce:37:f8 txqueuelen 1000 (Ethernet)
RX packets 21865 bytes 10732037 (10.7 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 19269 bytes 7046706 (7.0 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 78258 bytes 29977684 (29.9 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 78258 bytes 29977684 (29.9 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
tunl0: flags=193<UP,RUNNING,NOARP> mtu 8980
inet 10.42.1.1 netmask 255.255.255.255
tunnel txqueuelen 1000 (IPIP Tunnel)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 10 bytes 840 (840.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth5ee1a633: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::24d7:aaff:fece:37f8 prefixlen 64 scopeid 0x20<link>
ether 26:d7:aa:ce:37:f8 txqueuelen 0 (Ethernet)
RX packets 12748 bytes 10219673 (10.2 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 9890 bytes 4818258 (4.8 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth965708c2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::9cfc:9dff:fef1:dc7a prefixlen 64 scopeid 0x20<link>
ether 9e:fc:9d:f1:dc:7a txqueuelen 0 (Ethernet)
RX packets 22 bytes 1636 (1.6 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 21 bytes 1754 (1.7 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vethd34408af: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::5077:76ff:fe3a:1b01 prefixlen 64 scopeid 0x20<link>
ether 52:77:76:3a:1b:01 txqueuelen 0 (Ethernet)
RX packets 9091 bytes 816526 (816.5 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 9442 bytes 2233086 (2.2 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
> ip route
default via 10.1.20.1 dev ens3
default via 10.1.20.1 dev ens3 proto dhcp src 10.1.20.2 metric 100
10.1.20.0/24 dev ens3 proto kernel scope link src 10.1.20.2
10.4.0.1 via 10.1.20.3 dev tunl0 proto static onlink
10.42.0.0/24 via 10.1.20.3 dev tunl0 proto static onlink
10.42.1.0/24 dev kube-bridge proto kernel scope link src 10.42.1.1
169.254.0.0/16 dev ens3 proto dhcp scope link src 10.1.20.2 metric 100
Interestingly, setting up another machine in a different region, I was able to see that the WireGuard interfaces come up with the correct allowed-ips, and I was even able to ping 10.1.20.2 (oci-worker) directly over WireGuard.
Presumably that goes gcp-worker -> oci-master (leader for the oci location) -> oci-worker.
Hi squat!
I encountered a crash on the latest develop (c93fa1e):
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x123ae3e]
goroutine 64 [running]:
github.com/squat/kilo/pkg/route.(*Table).Run.func1(0xc000420390, 0xc00007f2c0, 0xc00007e780)
/kilo/pkg/route/route.go:96 +0x24e
created by github.com/squat/kilo/pkg/route.(*Table).Run
/kilo/pkg/route/route.go:80 +0x1d9
It happens when I run:
sudo systemctl restart systemd-networkd
The event that is breaking kilo is (with my ipv6 gateway redacted):
{Ifindex: 2 Dst: <nil> Src: <nil> Gw: 0123:4567:89ab:cdef:: Flags: [] Table: 254}
Once kilo restarts, the old interface still exists so kilo tries to create a second interface and fails miserably unless I intervene:
{"caller":"mesh.go:666","component":"kilo","error":"address already in use","level":"error","ts":"2019-08-27T15:31:41.993388636Z"}
There are two issues here:
1. Kilo crashes on this route event.
2. Why does Kilo try to create a new kilo0 interface every time instead of reusing the existing one?
every time?I have two subnets in two different locations I would like to connect. Each subnet has two nodes each. I use k3s, so I tried applying the kilo-k3s.yaml
manifest to connect one node in each location. When I tried running pods that pinged each other on different nodes, the pods could not connect to other nodes. So, I tried enabling the --mesh-granularity=full
flag and applied the manifest again. It went through without any problems and my pods could talk to each other.
Any idea how I can debug the problem? I would like to only have one node in each logical group connected.
Hi squat!
Is there a way to completely disable the private IP? I have hosts that do not have a private interface. Currently I'm forcing the private IP to a random IP that doesn't exist, but it is still added to the allowed-ips list.
Is it correct that the video, https://www.youtube.com/watch?v=iPz_DAOOCKA, isn't referenced anywhere?
If so, that would be a shame, shall I make a PR?
Gravity is a platform that allows us to build K8s clusters declaratively, and is a pretty powerful tool I've started experimenting with as part of my devops toolkit.
It has its own implementation of wireguard (wormhole) that helps create a mesh, similar to kilo, but kilo provides easy peering functionality with kgctl
.
I'd love to start a conversation about how we can make a .yaml deployment for Gravity clusters. I'm able to get Kilo up and running on Gravity pretty seamlessly; the only issue right now is that although the WireGuard kilo interface shows up, it appears that kilo/kgctl is never able to pull the nodes and properly apply the WireGuard config.
Hi Squat!
It would be great if it were possible to run kg
as a peer. Basically, I want a process that watches Kubernetes for updates and auto-updates my WireGuard interface with the new routes.
I've hacked it into the codebase here:
master...SerialVelocity:kg-peer
but it isn't complete or very pretty code!
Hi
I am trying to apply Kilo to a K8s cluster, but an error appears when I run kubectl apply -f kilo.yaml.
root@VmOnAWS# wg show all
(WireGuard Server, 10.10.10.10
)
interface: wg0
public key: (masked)
private key: (hidden)
listening port: 51820
peer: (masked)
endpoint: 34.97.48.xxx:51820
allowed ips: 10.10.10.12/32
latest handshake: 1 minute, 30 seconds ago
transfer: 297.40 MiB received, 1.47 GiB sent
root@VmOnGCP# wg show all
(WireGuard Client, 10.10.10.12
)
interface: wg0
public key: (masked)
private key: (hidden)
listening port: 51820
peer: (masked)
endpoint: 15.164.170.xxx:51820
allowed ips: 10.10.10.10/32
latest handshake: 1 minute, 24 seconds ago
transfer: 1.47 GiB received, 297.44 MiB sent
persistent keepalive: every 25 seconds
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-172-31-17-xx NotReady master 24m v1.15.0 172.31.17.xxx <none> Ubuntu 18.04.2 LTS 4.15.0-1044-aws docker://18.9.7
nodes-wg2 NotReady <none> 21m v1.15.0 10.174.0.xxx <none> Ubuntu 18.04.2 LTS 4.15.0-1037-gcp docker://18.9.7
ip-172-31-17-xx is on AWS. nodes-wg2 is on GCP. Both nodes are NotReady.
.# kubectl annotate node ip-172-31-17-xx kilo.squat.ai/location="aws"
node/ip-172-31-17-xx annotated
# kubectl annotate node nodes-wg2 kilo.squat.ai/location="gcp"
node/nodes-wg2 annotated
# kubectl apply -f https://raw.githubusercontent.com/squat/kilo/master/manifests/kilo-kubeadm.yaml
configmap/kilo created
serviceaccount/kilo created
clusterrole.rbac.authorization.k8s.io/kilo created
clusterrolebinding.rbac.authorization.k8s.io/kilo created
error: error validating "https://raw.githubusercontent.com/squat/kilo/master/manifests/kilo-kubeadm.yaml": error validating data: ValidationError(DaemonSet.spec): missing required field "selector" in
io.k8s.api.apps.v1.DaemonSetSpec; if you choose to ignore these errors, turn validation off with --validate=false
# kubectl apply -f https://raw.githubusercontent.com/squat/kilo/master/manifests/kilo-kubeadm.yaml --validate=false
configmap/kilo unchanged
serviceaccount/kilo unchanged
clusterrole.rbac.authorization.k8s.io/kilo unchanged
clusterrolebinding.rbac.authorization.k8s.io/kilo unchanged
The DaemonSet "kilo" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"app.kubernetes.io/name":"kilo"}: `selector` does not match template `labels`
Is there a way to resolve this issue/problem?
If I add the kilo.squat.ai/persistent-keepalive
annotation, or update its value, on an existing node already running Kilo, the WireGuard config is not updated.
sudo kubectl apply -f https://raw.githubusercontent.com/squat/kilo/master/manifests/kilo-k3s-flannel.yaml
serviceaccount/kilo created
clusterrole.rbac.authorization.k8s.io/kilo created
clusterrolebinding.rbac.authorization.k8s.io/kilo created
daemonset.apps/kilo created
sudo kubectl logs -f kilo-cz64w -n kube-system
failed to create Kubernetes config: Error loading config file "/etc/kubernetes/kubeconfig": read /etc/kubernetes/kubeconfig: is a directory
I think the problem is with kilo-k3s-flannel.yaml:99
.
Hi
First of all thank you for your awesome work with this project, much appreciated.
We are currently testing Kilo with clusters that we deploy with RKE and later import into Rancher. We use it as the CNI provider in a full-mesh layout.
We used kilo-k3s.yaml as our reference and had to lower the mtu
setting in the cni-conf.json ConfigMap to 1300. The rancher-node-agent tries to open a wss://
connection to the Rancher server, which did not succeed with the original 1420 setting. The value 1300 was just our first lucky shot; it might be worth testing how high it can go, but we have had no problems with this setting so far. Do you think this is worth documenting in this project? If yes, could you suggest a good place (maybe another file in manifests
) so that I can suggest a PR?
While setting up kilo on a k3s cluster I noticed that it uses -kubeconfig
, or -master
to get the config that is used when interfacing with the cluster. This code can be seen here.
This seems like a security problem - why should kilo require access to my kubeconfig, which contains credentials that have the power to do anything to the cluster? Moreover, it seems redundant: I looked through kilo-k3s-flannel.yaml
(which is what I used to get it working) and noticed that a service account is created for kilo with all of the permissions it should need.
This example (see main.go) uses this function to get the config. Can kilo not use this function instead?
I'm new to interfacing applications with Kubernetes clusters, so my apologies if I'm missing something. If it would be welcome, I'd be happy to submit a pull request for this.
Will Kilo pick up an existing Wireguard interface on the host?
To be clear, I'd consider it a feature (though it's conceivably an anti-feature I suppose) - it would mean having it bound to a physical interface in a different network namespace was automatically supported.
Hi
I've seen this commit, which adds support for ARM architectures, but it doesn't work. It returns this:
no matching manifest for linux/arm/v7 in the manifest list entries
Am I doing anything wrong?
Is it possible to specify the internal IP that a node should use?
Or, if not that granular (or not possible/desirable for some reason?) - a CIDR block for the location?
Project looks great by the way - I'm currently running Wireguard on the hosts and using kube-router for CNI, so tempted to collapse that into just kilo.
kubectl apply -f https://raw.githubusercontent.com/squat/modulus/master/wireguard/daemonset.yaml
error: unable to recognize "https://raw.githubusercontent.com/squat/modulus/master/wireguard/daemonset.yaml": no matches for kind "DaemonSet" in version "extensions/v1beta1"
Hi @squat, some new feedback:
1. How is a node's Kilo public key saved, and when does it change?
At home I connect to the cluster over a WireGuard VPN. After I moved my cluster's master from ali-vm1 to hw-vm1 (the etcd data unchanged, just moved), both nodes were recreated (k3s in Docker).
Then I found that the Kilo public key of each node had changed, while all the other info stayed the same.
the former key:
[Peer]
AllowedIPs = 7.0.0.0/24, 172.16.168.255/32, 10.4.0.1/32
Endpoint = 47.98.xxx.xxx:51820
PersistentKeepalive = 10
PublicKey = nOWT---------N1GnXXz+0UiseSOYOrq14Nz4=
[Peer]
AllowedIPs = 7.0.1.0/24, 192.168.0.105/32, 10.4.0.2/32
Endpoint = 139.159.xxx.xxx:51820
PersistentKeepalive = 10
PublicKey = fQzIcE5--------------MZJdH9wzq9eKGogDO9fWmc=
the new key:
[Peer]
AllowedIPs = 7.0.0.0/24, 172.16.168.255/32, 10.4.0.1/32
Endpoint = 47.98.xxx.xxx:51820
PersistentKeepalive = 10
PublicKey = b/q4LgcU--------------Hpj/fJVzjfn3bNygZE2cwqwE=
[Peer]
AllowedIPs = 7.0.1.0/24, 192.168.0.105/32, 10.4.0.2/32
Endpoint = 139.159.xxx.xxx:51820
PersistentKeepalive = 10
PublicKey = 9lKn---------2zwlrCJjcXXthXnKWWRrp8LWlU=
2. Kilo's network devices aren't cleaned up automatically when a node restarts/recreates or Kilo is redeployed.
When a node reboots or is recreated, Kilo is recreated too, but the former kilo0 device is not cleaned up and its routes still exist; this causes connections between nodes to fail.
1) I clean up by hand with:
ifconfig kilo0 down
ip link delete dev kilo0
2) But wg still shows up in the process list. How do I clean up these processes?
[root@ali-vm1 ~]# ps -ef |grep wg
root 7659 2 0 14:17 ? 00:00:00 [wg-crypt-kilo0]
root 10670 2 0 19:20 ? 00:00:00 [kworker/u2:1-wg]
root 15086 2 0 16:15 ? 00:00:00 [wg-crypt-kilo2]
root 20223 2 0 19:40 ? 00:00:01 [kworker/0:2-wg-]
root 22389 2 0 19:44 ? 00:00:01 [kworker/0:4-wg-]
root 25012 2 0 19:50 ? 00:00:01 [kworker/0:5-wg-]
root 27801 26236 0 19:55 pts/0 00:00:00 grep --color=auto wg
Stacktrace:
panic: runtime error: invalid memory address or nil pointer dereference
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x10985f3]
goroutine 11 [running]:
github.com/squat/kilo/pkg/iptables.deleteFromIndex(0x0, 0xc00051dee0, 0xc00051ded8, 0xc0003660c0)
/kilo/pkg/iptables/iptables.go:226 +0x93
github.com/squat/kilo/pkg/iptables.(*Controller).CleanUp(0xc00051dec0, 0x0, 0x0)
/kilo/pkg/iptables/iptables.go:265 +0x83
github.com/squat/kilo/pkg/mesh.(*Mesh).cleanUp(0xc0002066e0)
/kilo/pkg/mesh/mesh.go:708 +0x47
panic(0x138a7a0, 0x26d5c10)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/squat/kilo/pkg/iptables.(*Controller).Set(0xc00051dec0, 0xc000232f00, 0x13, 0x14, 0x0, 0x0)
/kilo/pkg/iptables/iptables.go:243 +0x364
github.com/squat/kilo/pkg/mesh.(*Mesh).applyTopology(0xc0002066e0)
/kilo/pkg/mesh/mesh.go:635 +0x12e0
github.com/squat/kilo/pkg/mesh.(*Mesh).syncNodes(0xc0002066e0, 0xc00032bc40)
/kilo/pkg/mesh/mesh.go:445 +0x751
github.com/squat/kilo/pkg/mesh.(*Mesh).Run(0xc0002066e0, 0x0, 0x0)
/kilo/pkg/mesh/mesh.go:350 +0x735
main.Main.func4(0x0, 0x0)
/kilo/cmd/kg/main.go:218 +0x168
github.com/oklog/run.(*Group).Run.func1(0xc0000a8300, 0xc0003d1220, 0xc000414c30)
/kilo/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
/kilo/vendor/github.com/oklog/run/group.go:37 +0xbe
Looks like this is wrong:
Lines 225 to 230 in 3facc9f
I think (*rules)[j]
should be (*rules)[i+j]
. You've got this in multiple places in the file. It might be worth switching to a standard for loop to make it a little clearer.
What if nodes have a dynamic public IP? Is Kilo intelligent enough to auto-refresh the node configuration with the new public IP?
Is there any way to auto-detect datacenters, so that nodes in the same DC can route directly without WireGuard (ip route)?
This only works on Kubernetes, I suppose.
Check out my project; maybe it can help you with ideas ;) Good work!!
https://github.com/segator/wireguard-dynamic
Hi - I came across your project and was curious: would it be possible to run the nodes in client mode without an exposed port? I know this is possible in WireGuard, and it surprises me that I can't seem to do it with this or wormhole.
I've been looking into using Kubernetes in an at-edge setting. In this type of deployment I'd be setting up nodes behind other people's NATed networks. Kubernetes' API and CRDs make a lot of things I need to do (DaemonSets, service meshes, config management, etc.) very simple. WireGuard would provide a transparent security layer. In my application I don't mind the high latency of communications with the API server. One thing I don't control in my deployment is the router at each location: I can guarantee there will be a network connection able to reach my API server, but I cannot forward ports.
I noticed in your documentation that you must provide at least one public IP in each region. Is there some way to use Kilo while avoiding this constraint? Where does this constraint come from? Is it an inherent feature of WG?
Feature request: add a controller with an HTTP endpoint that serves a web view of the topology graph.
Is it possible to use AWS VPC peering between two regions? I have two separate k8s clusters in different regions, and both VPCs are connected via VPC peering.
Can I use the node internal IP for WireGuard connectivity?
There are no releases of kgctl on GitHub. I suppose it has to be compiled from source?
I have a cluster in two locations, A and B, with one server in each location successfully connected with Kilo.
Each server has one pod, and both pods are connected to each other.
The server in location A has the kilo.squat.ai/leader=true
annotation, but if I add a new server in location A, also with this annotation, Kilo elects the new server as the leader, updating the wg configuration on the server in location B and dropping the current connection between the pod on the old leader and the pod on the server in location B.
Is microk8s compatibility on the roadmap? microk8s uses flannel as its CNI by default; I've tested Kilo on it but with no success.
The Kilo pods start well and no errors are printed in the logs (with log-level=all), but the sudo wg
command doesn't show anything (no public key nor endpoint), and on the node the kilo.squat.ai/wireguard-ip
annotation shows no IP.
Can you please take a look at microk8s? I think it is interesting now that the microk8s stable version has the clustering option. Thanks in advance.
So it appears something has changed with k8s 1.8.1 or flannel.
Running in full mesh, one node sees the two others, while the two others only see one another and not all three.
All three nodes have a public IP interface, and after applying kilo-kubeadm-flannel it appears pods lose their valid running state and become "unreachable":
default mailu-roundcube-7b49b94446-ltlcz 0/1 OOMKilled 0 175m
default mailu-rspamd-685df75db8-85thh 0/1 Running 0 175m
Will tail 3 logs...
kilo-djr4w
kilo-ds78k
kilo-fxw8w
[kilo-ds78k] {"caller":"mesh.go:631","component":"kilo","level":"info","msg":"WireGuard configurations are different","ts":"2020-04-22T03:34:40.710140705Z"}
[kilo-ds78k] {"caller":"mesh.go:631","component":"kilo","level":"info","msg":"WireGuard configurations are different","ts":"2020-04-22T03:35:10.781458565Z"}
[kilo-djr4w] {"caller":"main.go:236","msg":"caught interrupt; gracefully cleaning up; see you next time!","ts":"2020-04-22T03:37:13.310050264Z"}
[kilo-djr4w] {"caller":"mesh.go:696","component":"kilo","error":"failed to clean up node backend: failed to patch node: nodes "node2" is forbidden: User "system:serviceaccount:kube-system:kilo" cannot patch resource "nodes" in API group "" at the cluster scope","level":"error","ts":"2020-04-22T03:37:13.311905223Z"}
[kilo-ds78k] {"caller":"main.go:236","msg":"caught interrupt; gracefully cleaning up; see you next time!","ts":"2020-04-22T03:35:40.370479773Z"}
[kilo-fxw8w] {"caller":"main.go:236","msg":"caught interrupt; gracefully cleaning up; see you next time!","ts":"2020-04-22T03:37:13.280341465Z"}
[kilo-fxw8w] {"caller":"mesh.go:696","component":"kilo","error":"failed to clean up node backend: failed to patch node: nodes "node3" is forbidden: User "system:serviceaccount:kube-system:kilo" cannot patch resource "nodes" in API group "" at the cluster scope","level":"error","ts":"2020-04-22T03:37:13.284032892Z"}
[kilo-ds78k] {"caller":"mesh.go:696","component":"kilo","error":"failed to clean up node backend: failed to patch node: Unauthorized","level":"error","ts":"2020-04-22T03:35:40.383739615Z"}
Is it possible to run it alongside Calico? Has anyone tried it?
Hey,
first, thanks for this awesome project! I am just getting started with it, and I was able to get it up and running quite easily.
I only want to use it within a Rancher 2 / Canal (Flannel + Calico) setup, to access cluster-local services via a VPN from an external site.
I ran the image with the following options:
image: squat/kilo
args:
- --hostname=$(NODE_NAME)
- --cni=false
- --encapsulation=never
- --compatibility=flannel
- --local=false
(as I am running in-cluster, I don't need the kubeconfig
option).
Then, I added a new peer and extracted its config via kgctl
.
-> Observation: in my case the allowed IPs look like: 10.4.0.1/32, 10.19.0.5/32, 10.19.0.6/32, 10.19.0.7/32, 10.42.0.0/24, 10.42.1.0/24, 10.42.2.0/24, 10.43.0.0/24
However, the services are in the 10.43
IP range; so they are not included in the allowed IPs.
Do you have any hint how to debug further why the Service IP range is not included in the allowed IPs?
Thank you and all the best ❤️
Sebastian
I've followed README.md and docs/vpn.md for the following settings.
apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
name: sam
spec:
allowedIPs:
- 10.5.0.1/32 # Example IP address on the peer's interface.
publicKey: FLS------hzpNFbJ/JUiN4He8pTxLmFC5ZtQLK5Oc0A= #- replace 6 char
persistentKeepalive: 10
[root@ali-vm1 ~]# route -n |grep kilo
7.0.1.0 10.4.0.2 255.255.255.0 UG 0 0 0 kilo0
10.4.0.0 0.0.0.0 255.255.0.0 U 0 0 0 kilo0
10.5.0.1 0.0.0.0 255.255.255.255 UH 0 0 0 kilo0
192.168.0.105 10.4.0.2 255.255.255.255 UGH 0 0 0 kilo0
[root@hw-vm1 ~]# route -n |grep kilo
7.0.0.0 10.4.0.1 255.255.255.0 UG 0 0 0 kilo0
10.4.0.0 0.0.0.0 255.255.0.0 U 0 0 0 kilo0
10.5.0.1 0.0.0.0 255.255.255.255 UH 0 0 0 kilo0
172.16.168.255 10.4.0.1 255.255.255.255 UGH 0 0 0 kilo0
root@deb10:/home/sam# lsmod |grep wire
wireguard 221184 0
ip6_udp_tunnel 16384 2 wireguard,vxlan
udp_tunnel 16384 2 wireguard,vxlan
root@deb10:/home/sam# ip a |grep wg
4455: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
inet 10.5.0.1/32 scope global wg0
root@deb10:/home/sam# route -n |grep wg
7.0.1.0 0.0.0.0 255.255.255.0 U 0 0 0 wg0
10.4.0.1 0.0.0.0 255.255.255.255 UH 0 0 0 wg0
10.4.0.2 0.0.0.0 255.255.255.255 UH 0 0 0 wg0
172.16.168.255 0.0.0.0 255.255.255.255 UH 0 0 0 wg0
192.168.0.105 0.0.0.0 255.255.255.255 UH 0 0 0 wg0
dev wg0 (ListenPort = 5555, no firewall; running on deb10):
root@deb10:/home/sam# cat /etc/wireguard/wg0.conf
[Interface]
Address = 10.5.0.1/32
PrivateKey = +Dsm------FVL3e83lTIVC9dI1rYwjEI7ljI9wbyFWk= #replace 6 char
ListenPort = 5555
peer.ini:
[root@(⎈ |default:default) ~]$ kgctl showconf peer sam
[Peer]
AllowedIPs = 7.0.0.0/24, 172.16.168.255/32, 10.4.0.1/32
Endpoint = 47.98.xxx.xxx:51820
PersistentKeepalive = 0
PublicKey = nOW------dKxE0NDuCxN1GnXXz+0UiseSOYOrq14Nz4=
[Peer]
AllowedIPs = 7.0.1.0/24, 192.168.0.105/32, 10.4.0.2/32
Endpoint = 139.159.xxx.xxx:51820
PersistentKeepalive = 0
PublicKey = fQz------H70oWHUWzSGiMZJdH9wzq9eKGogDO9fWmc=
IFACE=wg0
wg-quick up $IFACE
wg setconf $IFACE peer.ini
ip route add 10.4.0.2/32 dev wg0
...
Current annotation: kilo.squat.ai/force-external-ip="a.b.c.d/32"
But sometimes we only get a dynamic IP from the network provider, so we use a DDNS domain to obtain the correct WAN IP.
How ready is this project to be used in production environments?
OK, enlighten me again! The docs are sketchy. For a VPN, now that Kilo is running on 3 nodes, I'd like to
connect my remote laptop for administration and web access. Where are we deriving the keys from? I imagine this is the "client" public key, and I can get the server/cluster side with wg showconf?
I do appreciate the assistance, but yes, I will say the docs are a bit lacking. That being said, point me in the right direction and I can help with documentation once I get the connection sorted.
VPN
Kilo also enables peers outside of a Kubernetes cluster to connect to the VPN, allowing cluster applications to securely access external services and permitting developers and support to securely debug cluster resources. In order to declare a peer, start by defining a Kilo peer resource:
cat <<'EOF' | kubectl apply -f -
apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
name: squat
spec:
allowedIPs: