costela / wesher

wireguard overlay mesh network manager

License: GNU General Public License v3.0

Languages: Go 85.62%, Shell 12.42%, Makefile 1.25%, Dockerfile 0.72%
Topics: wireguard, vpn, mesh-networks, go, golang, overlay-network, encryption, networking

wesher's Introduction


wesher

wesher creates and manages an encrypted mesh overlay network across a group of nodes, using wireguard.

Its main use-case is adding low-maintenance security to public-cloud networks or connecting different cloud providers.

⚠ WARNING: since mesh membership is controlled by a mesh-wide pre-shared key, this effectively downgrades some of the security benefits from wireguard. See security considerations below for more details.

Quickstart

  1. Before starting:

    1. Make sure the wireguard kernel module is available on all nodes. It is bundled with Linux 5.6 and newer, and can otherwise be installed following the instructions here.

    2. The following ports must be accessible between all nodes (see configuration options to change these; an example firewall setup is sketched right after this quickstart):

      • 51820 UDP
      • 7946 UDP and TCP
  2. Download the latest release for your architecture:

    $ wget -O wesher https://github.com/costela/wesher/releases/latest/download/wesher-$(go env GOARCH)
    $ chmod a+x wesher
    
  3. On the first node:

    # ./wesher
    

    This will start the wesher daemon in the foreground and - when running on a terminal - will currently output a generated cluster key as follows:

    new cluster key generated: XXXXX
    

    Note: to avoid accidentally leaking it in the logs, the created key will only be displayed if running on a terminal. When started via other means (e.g. a desktop session manager or init system), the key can be retrieved with grep ClusterKey /var/lib/wesher/state.json.

  4. Lastly, on any further node:

    # wesher --cluster-key XXXXX --join x.x.x.x
    

    Where XXXXX is the base64 encoded 256 bit key printed by the step above, and x.x.x.x is the hostname or IP of any of the nodes already joined to the mesh cluster.
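The exact commands for opening the ports listed in step 1 depend on your distribution and firewall; as a minimal sketch, assuming the default ports and a host managed with ufw, it could look like this:

# ufw allow 51820/udp   # wireguard traffic
# ufw allow 7946/udp    # membership gossip (memberlist)
# ufw allow 7946/tcp    # membership gossip (memberlist)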

Permissions

Note that wireguard - and therefore wesher - needs root access to work properly.

It is also possible to give the wesher binary enough capabilities to manage the wireguard interface via:

# setcap cap_net_admin=eip wesher

This enables running as an unprivileged user, but some functionality (like automatically adding peer entries to /etc/hosts; see configuration options below) will not work.
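As a hedged sketch of this capability-based setup (assuming the binary sits in the current directory; the unprivileged user also needs write access to the state directory /var/lib/wesher, and /etc/hosts management is disabled since it would require root):

# setcap cap_net_admin=eip ./wesher
# install -d -o wesher-user /var/lib/wesher   # "wesher-user" is a hypothetical unprivileged account
$ ./wesher --no-etc-hosts --cluster-key XXXXX --join x.x.x.x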

(optional) systemd integration

A minimal systemd unit file is provided under the dist folder and can be copied to /etc/systemd/system:

# wget -O /etc/systemd/system/wesher.service https://raw.githubusercontent.com/costela/wesher/master/dist/wesher.service
# systemctl daemon-reload
# systemctl enable wesher

The provided unit file assumes wesher is installed to /usr/local/sbin.

Note that, as mentioned above, the initial cluster key will not be displayed in the journal. It can either be initialized by running wesher manually once, or by pre-seeding via /etc/default/wesher as the WESHER_CLUSTER_KEY environment var (see configuration options below).
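A minimal sketch of such a pre-seeded environment file; the key is a placeholder, and WESHER_JOIN is only needed when the node should join an existing cluster:

# cat /etc/default/wesher
WESHER_CLUSTER_KEY=XXXXX
WESHER_JOIN=x.x.x.x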

Installing from source

There are a couple of ways of installing wesher from sources:

Preferred:

$ git clone https://github.com/costela/wesher.git
$ cd wesher
$ make

This method can build a binary bit-for-bit identical to the released ones, assuming the same Go version is used to build the respective git tag.

Alternatively:

$ GO111MODULE=on go get github.com/costela/wesher

Note: this method will not provide a meaningful output for --version.

Features

The wesher tool builds a cluster and manages the configuration of wireguard on each node to create peer-to-peer connections between all nodes, thus forming a full mesh VPN. This approach may not scale for hundreds of nodes (benchmarks accepted 😉), but is sufficiently performant to join several nodes across multiple cloud providers, or simply to secure inter-node communication in a single public cloud.

Automatic Key management

The wireguard private keys are created on startup for each node and the respective public keys are then broadcast across the cluster.

The control-plane cluster communication is secured with a pre-shared AES-256 key. This key can be automatically created during startup of the first node in a cluster, or it can be provided (see configuration). The cluster key must then be sent to other nodes via an out-of-band secure channel (e.g. ssh, cloud-init, etc.). Once set, the cluster key is saved locally and reused on the next startup.
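If you would rather generate the cluster key out-of-band than let the first node create it, any 32-byte base64-encoded value matches the documented format; for example (assuming openssl is available):

$ openssl rand -base64 32   # prints a value usable as --cluster-key / WESHER_CLUSTER_KEY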

Automatic IP address management

The overlay IP address of each node is automatically selected out of a private network (10.0.0.0/8 by default; MUST be different from the underlying network used for cluster communication) and is consistently hashed based on the peer's hostname.

The use of consistent hashing means a given node will always receive the same overlay IP address (see limitations of this approach below).

Note: the node's hostname is also used by the underlying cluster management (using memberlist) to identify nodes and must therefore be unique in the cluster.

Automatic /etc/hosts management

To ease inter-node communication, wesher also adds entries to /etc/hosts for each peer in the mesh. This enables using the nodes' hostnames to ensure communication over the secured overlay network (assuming files is the first entry for hosts in /etc/nsswitch.conf).

See configuration below for how to disable this behavior.
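To check that hostname resolution will actually prefer these entries, verify that files comes before dns in the hosts line of /etc/nsswitch.conf; on most distributions the command below prints something like "hosts: files dns":

$ grep '^hosts:' /etc/nsswitch.conf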

Seamless restarts

If a node in the cluster is restarted, it will attempt to re-join the last-known nodes using the same cluster key. This means a restart requires no manual intervention.

Configuration options

All options can be passed either as command-line flags or environment variables:

  • --cluster-key KEY (env: WESHER_CLUSTER_KEY): shared key for cluster membership; must be 32 bytes base64 encoded; will be generated if not provided. Default: autogenerated/loaded.
  • --join HOST,... (env: WESHER_JOIN): comma-separated list of hostnames or IP addresses of existing cluster members; if not provided, wesher will attempt to resume any known state or otherwise wait for further members. Default: none.
  • --init (env: WESHER_INIT): whether to explicitly (re)initialize the cluster; any known state from previous runs will be forgotten. Default: false.
  • --bind-addr ADDR (env: WESHER_BIND_ADDR): IP address to bind to for cluster membership (cannot be used with --bind-iface). Default: autodetected.
  • --bind-iface IFACE (env: WESHER_BIND_IFACE): interface to bind to for cluster membership (cannot be used with --bind-addr). Default: none.
  • --cluster-port PORT (env: WESHER_CLUSTER_PORT): port used for membership gossip traffic (both TCP and UDP); must be the same across the cluster. Default: 7946.
  • --wireguard-port PORT (env: WESHER_WIREGUARD_PORT): port used for wireguard traffic (UDP); must be the same across the cluster. Default: 51820.
  • --overlay-net ADDR/MASK (env: WESHER_OVERLAY_NET): the network in which to allocate addresses for the overlay mesh network (CIDR format); smaller networks increase the chance of IP collisions. Default: 10.0.0.0/8.
  • --interface DEV (env: WESHER_INTERFACE): name of the wireguard interface to create and manage. Default: wgoverlay.
  • --no-etc-hosts (env: WESHER_NO_ETC_HOSTS): whether to skip writing hosts entries for each node in the mesh. Default: false.
  • --log-level LEVEL (env: WESHER_LOG_LEVEL): set the verbosity (one of debug/info/warn/error). Default: warn.
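Flags and environment variables are interchangeable; for example, per the list above, the following two invocations are equivalent:

# wesher --wireguard-port 51821 --log-level info
# WESHER_WIREGUARD_PORT=51821 WESHER_LOG_LEVEL=info wesher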

Running multiple clusters

To make a node be a member of multiple clusters, simply start multiple wesher instances.
Each instance must have different values for the following settings:

  • --interface
  • either --cluster-port, or --bind-addr or --bind-iface
  • --wireguard-port

The following settings are not required to be unique, but distinct values are recommended:

  • --overlay-net (to reduce the chance of node address conflicts; see Overlay IP collisions)
  • --cluster-key (as a sensible security measure)
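As a rough sketch (interface names, ports, keys and networks below are purely illustrative), two co-existing instances on one node might be started like this:

# wesher --interface wgoverlay --cluster-port 7946 --wireguard-port 51820 --overlay-net 10.0.0.0/8 --cluster-key KEY1
# wesher --interface wgoverlay2 --cluster-port 7947 --wireguard-port 51821 --overlay-net 192.168.0.0/16 --cluster-key KEY2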

Security considerations

The decision of whom to allow in the mesh is made by memberlist and is secured by a cluster-wide pre-shared key. Compromise of this key will allow an attacker to:

  • access services exposed on the overlay network
  • impersonate and/or disrupt traffic to/from other nodes

It will not, however, allow the attacker to decrypt the traffic between other nodes.

This pre-shared key is currently static, set up during cluster bootstrapping, but will - in a future version - be rotated for improved security.

Current known limitations

Overlay IP collisions

Since the assignment of IPs on the overlay network is currently decided by the individual node and implemented as a naive hashing of the hostname, there can be no guarantee two hosts will not generate the same overlay IPs. This limitation may be worked around in a future version.

Split-brain

Once a cluster is joined, there is currently no way to distinguish a failed node from an intentionally removed one. This is partially by design: growing and shrinking your cluster dynamically (e.g. via autoscaling) should be as easy as possible.

However, this does mean that a longer connection loss between any two parts of the cluster (e.g. across a WAN link between different cloud providers) can lead to a split-brain scenario, where each side thinks the other side is simply "gone".

There is currently no clean solution for this problem, but one could work around it by designating edge nodes which periodically restart wesher with the --join option pointing to the other side. Future versions might include the notion of a "static" node to more cleanly avoid this.
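As a hedged sketch of that workaround, assuming the join target is pre-seeded via WESHER_JOIN in /etc/default/wesher and wesher runs under the provided systemd unit, an edge node could periodically force a re-join with a cron entry (hypothetical file name) such as:

# cat /etc/cron.d/wesher-rejoin
0 */6 * * * root systemctl restart wesher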

wesher's People

Contributors

costela, deepsourcebot, dependabot-preview[bot], dependabot[bot], kaiyou


wesher's Issues

Issue with generating cluster key

Expected Behavior

Running sudo ./wesher and seeing new cluster key generated: XZYCLUSTERKEY

Actual Behavior

After running sudo ./wesher nothing happened (blank prompt). I can only see the cluster key when running the command as a common user, but then I receive permission denied on /var/lib/wesher/state.json, and after terminating with CTRL+C I got the errors below [0035].

WARN[0000] could not open state in /var/lib/wesher/state.json: open /var/lib/wesher/state.json: permission denied 
new cluster key generated: SOMECLUSTERKEABC

^CERRO[0035] could not remove stale hosts entries: could not open /etc/hosts for reading: open /etc/hosts: permission denied 
ERRO[0035] could not down interface: operation not permitted

(You can observe ^C where I escaped)

Steps to Reproduce the Problem

Download wesher using the commands below as a common user:

wget -O wesher https://github.com/costela/wesher/releases/latest/download/wesher-$(go env GOARCH)
chmod a+x wesher
sudo ./wesher

Specifications

Version: v0.2.4
Platform: Ubuntu 18.04.3 LTS

Unable to create wgoverlay

I am trying wesher on Digital Ocean running CentOS8. I installed wireguard and then launched wesher on the first node with:
./wesher --overlay-net 172.16.201.0/24 --log-level debug

On the second node I ran:
./wesher --cluster-key --join

I then instantly saw on both hosts:
ERRO[0000] could not up interface: could not create interface wgoverlay: operation not supported
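A hedged note: "operation not supported" when creating a wireguard interface usually means the wireguard kernel module is not available (CentOS 8 ships a kernel older than 5.6, so the module has to be installed separately); a quick check on both nodes would be:

# modprobe wireguard
# lsmod | grep wireguard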

systemd service

Hey there,

I really like your project. It works absolutely fine with multiple servers.

Would it be possible to provide a systemd service which automatically starts the wesher daemon and connects to the existing mesh? Or what would be the right way to do so?

Thank you very much in advance

Add option to "join-and-exit"

To aid in automatically setting up wesher noninteractively, it would be very nice if there was an option to join and then exit. Then I would provide the initial IP address to join and the cluster key, without specifying --join which - looking at the documentation at least - prevents wesher from resuming from the previous known state of other known hosts that were in the cluster.

Then, after it's exited, I could start it for real - e.g. via systemd.

netlink errors on linux >= 5.2

Hi,

I'm trying to get two wesher instances up and running. I noticed that wesher succeeds on one side, and seems to succeed on the other side. Unfortunately, it seems like there are some netlink errors that aren't being handled properly. Here is an strace of the side with the netlink failure. Notice the error=-EINVAL.

INFO[0000] cluster members:
INFO[0000]      addr: 192.168.1.198, overlay: {10.170.2.37 ffffffff}, pubkey: fa9Ve+i9SSB/C3/k2P0+OmRAu3lrbW1DkSKkIj5d8E8=
) = 0
sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=192, type=wireguard, flags=NLM_F_REQUEST|NLM_F_ACK, seq=2417833881, pid=4958}, "\x01\x01\x00\x00\x0e\x00\x02\x00\x77\x67\x6f\x76\x65\x72\x6c\x61\x79\x00\x00\x00\x24\x00\x03\x00\x80\x57\x2d\xac\xf6\x84\x8b\x2b"...}, iov_len=192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 192
futex(0xc000040848, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc0000412c8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xf49ae8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=112->12, msg_iov=[{iov_base={{len=212, type=NLMSG_ERROR, flags=0, seq=2417833881, pid=4958}, {error=-EINVAL, msg={{len=192, type=wireguard, flags=NLM_F_REQUEST|NLM_F_ACK, seq=2417833881, pid=4958}, "\x01\x01\x00\x00\x0e\x00\x02\x00\x77\x67\x6f\x76\x65\x72\x6c\x61\x79\x00\x00\x00\x24\x00\x03\x00\x80\x57\x2d\xac\xf6\x84\x8b\x2b"...}}}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK) = 212
futex(0xc000064bc8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000040848, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xf49ae8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=112->12, msg_iov=[{iov_base={{len=212, type=NLMSG_ERROR, flags=0, seq=2417833881, pid=4958}, {error=-EINVAL, msg={{len=192, type=wireguard, flags=NLM_F_REQUEST|NLM_F_ACK, seq=2417833881, pid=4958}, "\x01\x01\x00\x00\x0e\x00\x02\x00\x77\x67\x6f\x76\x65\x72\x6c\x61\x79\x00\x00\x00\x24\x00\x03\x00\x80\x57\x2d\xac\xf6\x84\x8b\x2b"...}}}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 212

Does Wesher manipulate Allowed IPs

Just curious, but in regard to wireguard's "allowed-ips": does Wesher do anything with each node's config regarding inserting IP addresses into a node's "allowed-ips" list?

Dependabot can't resolve your Go dependency files

Dependabot can't resolve your Go dependency files.

As a result, Dependabot couldn't update your dependencies.

The error Dependabot encountered was:


If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.

View the update logs.

can private network address pool be changed from 10.x.x.x to 172.16-31 or 192.168.x.x

The README file states:

The overlay IP address of each node is automatically selected out of a private network (10.0.0.0/8 by default; MUST be different from the underlying network used for cluster communication) and is consistently hashed based on the peer's hostname.

Is there any way to configure the private network pool to something other than 10.x.x.x, such as 192.168.x.x or 172.16.x.x?

I am using LXD containers on all of my servers and they use the 10.x.x.x network for the containers.

not able to auto rejoin cluster in openwrt

scenario:
in the lab, 3 openwrt routers running wesher

A: 192.168.11.30/24 (master node)
B: 192.168.11.40/24
C: 192.168.11.50/24
wireguard-port 12000

First, start a wireguard mesh with wesher:

A: wesher --init --cluster-key cluster_key --wireguard-port 12000 --log-level debug

B: wesher --cluster-key cluster_key --join 192.168.11.30 --wireguard-port 12000

C: wesher --cluster-key cluster_key --join 192.168.11.30 --wireguard-port 12000

log from A

DEBU[0000] waiting for cluster events                   
DEBU[0023] 2020/06/03 07:27:41 [DEBUG] memberlist: Stream connection from=192.168.11.40:49278 
INFO[0023] node OpenWrt-14 joined                       
INFO[0023] cluster members:                             
INFO[0023] 	addr: 192.168.11.40, overlay: {10.247.72.110 ffffffff}, pubkey: node_B_pubkey= 
INFO[0023] writing entry for 10.247.72.110 ([OpenWrt-14]) 
DEBU[0053] 2020/06/03 07:28:10 [DEBUG] memberlist: Stream connection from=192.168.11.50:57776 
INFO[0053] node OpenWrt-15 joined                       
INFO[0053] cluster members:                             
INFO[0053] 	addr: 192.168.11.40, overlay: {10.247.72.110 ffffffff}, pubkey: node_B_pubkey= 
INFO[0053] 	addr: 192.168.11.50, overlay: {10.247.73.169 ffffffff}, pubkey: node_C_pubkey= 
INFO[0053] writing entry for 10.247.72.110 ([OpenWrt-14]) 
INFO[0053] writing entry for 10.247.73.169 ([OpenWrt-15]) 

OK, now I have a VPN made up of three nodes.

Let's try what happens if one of the slave nodes crashes/reboots/is replaced, whatever.

So I just kill the wesher process on B and try to rejoin the cluster:

root@OpenWrt-14:~# wesher --join 192.168.11.30 --log-level debug 
DEBU[0000] 2020/06/03 08:11:24 [DEBUG] memberlist: Initiating push/pull sync with:  192.168.11.30:7946 
DEBU[0000] waiting for cluster events                   
INFO[0000] node OpenWrt-13 joined                       
INFO[0000] node OpenWrt-15 joined                       
INFO[0000] cluster members:                             
INFO[0000] 	addr: 192.168.11.50, overlay: {10.247.73.169 ffffffff}, pubkey: node_C_pubkey
INFO[0000] 	addr: 192.168.11.30, overlay: {10.247.76.31 ffffffff}, pubkey: node_A_pubkey 
INFO[0000] writing entry for 10.247.73.169 ([OpenWrt-15]) 
INFO[0000] writing entry for 10.247.76.31 ([OpenWrt-13]) 
INFO[0000] cluster members:                             
INFO[0000] 	addr: 192.168.11.50, overlay: {10.247.73.169 ffffffff}, pubkey: node_C_pubkey
INFO[0000] 	addr: 192.168.11.30, overlay: {10.247.76.31 ffffffff}, pubkey: node_A_pubkey 
INFO[0000] writing entry for 10.247.73.169 ([OpenWrt-15]) 
INFO[0000] writing entry for 10.247.76.31 ([OpenWrt-13]) 
^CINFO[0008] terminating...        

### process terminated ###
                       
DEBU[0009] 2020/06/03 08:11:33 [ERR] memberlist: Failed to send gossip to 192.168.11.30:7946: write udp [::]:7946->192.168.11.30:7946: use of closed network connection

and the debug log shows node B left

INFO[2655] node OpenWrt-14 left                         
INFO[2655] cluster members:                             
INFO[2655] 	addr: 192.168.11.50, overlay: {10.247.73.169 ffffffff}, pubkey: node_C_pubkey= 
INFO[2655] writing entry for 10.247.73.169 ([OpenWrt-15]) 

In the README, under "Configuration options", it says that if the HOST is not provided to --join, wesher will attempt to resume any known state.
So, on node B, I ran wesher --join, but it failed to rejoin:

root@OpenWrt-14:~# wesher --join  --log-level debug 
DEBU[0000] 2020/06/03 08:11:42 [WARN] memberlist: Failed to resolve true: lookup true on 127.0.0.1:53: no such host 
ERRO[0000] could not join cluster, retrying in 670.867025ms  error="1 error occurred:\n\t* Failed to resolve true: lookup true on 127.0.0.1:53: no such host\n\n"
DEBU[0001] 2020/06/03 08:11:43 [WARN] memberlist: Failed to resolve true: lookup true on 127.0.0.1:53: no such host 
ERRO[0001] could not join cluster, retrying in 450.026747ms  error="1 error occurred:\n\t* Failed to resolve true: lookup true on 127.0.0.1:53: no such host\n\n"
DEBU[0001] 2020/06/03 08:11:43 [WARN] memberlist: Failed to resolve true: lookup true on 127.0.0.1:53: no such host 
ERRO[0001] could not join cluster, retrying in 574.33029ms  error="1 error occurred:\n\t* Failed to resolve true: lookup true on 127.0.0.1:53: no such host\n\n"

let's check the wgoverlay.json

root@OpenWrt-14:~# cat /var/lib/wesher/wgoverlay.json 
{
  "ClusterKey": "cluster_key",
  "Nodes": [
    {
      "Name": "OpenWrt-15",
      "Addr": "192.168.11.50",
      "Meta": "Mv+BAwEBCG5vZGVNZXRhAf+CAAECAQtPdmVybGF5QWRkcgH/hAABBlB1YktleQEMAAAAI/+DAwEBBUlQTmV0Af+EAAECAQJJUAEKAAEETWFzawEKAAAAP/+CAQEECvdJqQEE/////wABLFV4aW9RRW03YnJTTEV5UkVaanBlTXJSSmZPSklCczgzM3lUREZYbURGazQ9AA==",
      "OverlayAddr": {
        "IP": "",
        "Mask": null
      },
      "PubKey": ""
    },
    {
      "Name": "OpenWrt-13",
      "Addr": "192.168.11.30",
      "Meta": "Mv+BAwEBCG5vZGVNZXRhAf+CAAECAQtPdmVybGF5QWRkcgH/hAABBlB1YktleQEMAAAAI/+DAwEBBUlQTmV0Af+EAAECAQJJUAEKAAEETWFzawEKAAAAP/+CAQEECvdMHwEE/////wABLFluKytiMms0TmZzek1LaFB5M3paanhCd0VCZjhicDhjM2VPMUNsY2g4aHM9AA==",
      "OverlayAddr": {
        "IP": "",
        "Mask": null
      },
      "PubKey": ""
    }
  ]
}root@OpenWrt-14:~# 

The json file looks just fine (or not?).
I have no idea why wesher tries to look up something on 127.0.0.1:53 when trying to rejoin.

Of course I can fix the problem by joining the cluster manually, but I want it to be fully automatic after any node reboots or crashes.

Am I misunderstanding the rejoin process, or where should I pay more attention, or did I maybe just run the wrong command/parameters?

should bind on ipv4 and ipv6 per default

I am having a hard time joining the mesh network started by the designated VPS.

The error message is always either 'connection refused', or 'i/o timeout'.

I was connecting both through IPv4 and IPv6 and got the same results.

Ports are opened.

investigate adding NAT traversal support

Right now wesher requires all nodes in the mesh to be directly accessible. It should be possible to use uPnP+IGD to make nodes accessible behind a NAT gateway.

Use cases include:

  • joining a workstation to a cluster for secure access to services
  • slightly increasing multi-cloud security by not requiring constantly open ports (dynamically and selectively opening ports instead)

Some refactoring ideas

Here are some refactoring ideas that I intend to take on while I work on wesher in the next couple of weeks (I need some features implemented and am willing to contribute).

Feel free to object on any of them, so I only go with what you'd be willing to merge :)

  • Move the code related to a node meta (anything that is not pure cluster management) out of cluster.go, probably in a new node.go
  • Remove inter-dependency between wireguard.go and cluster.go, the only glue should be in main.go, and common types maybe in aforementioned node.go
  • In cluster.go, limit the complexity of newCluster by splitting it

Maybe go further with:

  • Move the cluster code to a module and split it between cluster management and state file management
  • Move the wireguard code to a module (3 files currently)

Those modules would not be properly reusable, and this might increase complexity by requiring that the common types also be moved to a module. So I am unsure about those last two.

Drop wg-quick

Instead of naively doing wg-quick down/wg-quick up, which may lead to unnecessary packet drops, we should just use ip and wg directly.
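For reference, a minimal sketch of what driving the interface with ip and wg directly might look like (interface name, key path and address are illustrative placeholders):

# ip link add dev wgoverlay type wireguard
# wg set wgoverlay listen-port 51820 private-key /var/lib/wesher/privkey   # hypothetical key path
# ip address add 10.x.y.z/8 dev wgoverlay
# ip link set up dev wgoverlay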

Default port 7946 collides with Docker Swarm port

Hi,

I tried to set up docker swarm over wesher and ran into a port conflict. According to the docker swarm docs, swarm uses TCP and UDP port 7946.
Because docker swarm is a common project, I vote to change the default wesher port to something else instead, e.g. 7680.

As a workaround you can of course use --cluster-port 7980 to set up the mesh on another port. Doing so works like a charm with swarm.

Make the etchosts banner configurable

The provided etchosts module supports providing a banner to distinguish between multiple writers to the /etc/hosts file.

In order to support running multiple wesher instances on the same host, this banner would need to be configurable.

apt repository

Hello!

This is not an issue. I just wanted to let you know that I have packaged wesher in my apt repository.

You can find it here: https://apt.starbeamrainbowlabs.com/

This is done automatically - so when a new release comes out I just set a job going on my CI server (in future this will be completely automated).

This issue can be closed - just wanted to let you know :-)

Why does network need to be multiple of 8?

Hi,

Is there any reason why wesher is set up so that the network size needs to be a multiple of 8? This currently restricts you to having your network block be a minimum of a /24. Since a random IP from the network is assigned, there is no way to know if a host's IP will end in 1 or 95.

node subnets are not included in "allowed ips"?

I have wesher creating my mesh using wireguard OK, but I'm finding that subnets on various nodes are not reachable.

On some of my NODEs I have created bridged subnets.

With regular WireGuard I could normally include those Subnets (example 192.168.75.0/24) as an "allowed ips"

But running Wesher, if I check /var/lib/wesher/state.json, the IP subnets are not included!

Is there an undocumented command line option for specifying the subnets someone wants included in the WireGuard config?

thanks

Ability to add additional subnets to wireguard's AllowIPs

I wonder if we can add additional subnets to wireguard's AllowIPs in wesher's configuration.

Right now wesher assigns a /128 or a /32 to nodes in the mesh.

However if there is a subnet behind the mesh, it would not be accessible by the nodes not physically connected to the subnet.

The solution I can think of is wg showconf wgoverlay > wesher.conf, then add the subnets in wesher.conf, then wg setconf wgoverlay wesher.conf; but the work compounds as the mesh grows, and I haven't really tried whether it works.

Another use case would be adding ::/0 and/or 0.0.0.0/0 to a node and let the node act like a VPN gateway.
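Until something like this is built in, individual peers can be adjusted with wg directly instead of dumping and re-loading the whole config; a hedged sketch (the public key and subnet are placeholders, allowed-ips replaces the peer's whole list, and wesher may overwrite the change on its next update):

# wg set wgoverlay peer PEER_PUBLIC_KEY allowed-ips 10.x.y.z/32,192.168.75.0/24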

is it possible to run wesher in openwrt ?

I already built a VPN network using wireguard on openwrt, but I'm very interested in wesher, and I wonder whether wesher runs on openwrt, since it's compiled with golang?
If so, I don't need to prepare another machine to run wireguard on; I can just run wesher on the openwrt router and assign clients' gateway to openwrt. In that case it would be much more convenient.
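Since wesher is plain Go, cross-compiling it for a router from a source checkout should work; a rough sketch for a little-endian MIPS device (adjust GOARCH/GOMIPS for your hardware; the router's kernel still needs the wireguard module, e.g. kmod-wireguard on OpenWrt):

$ git clone https://github.com/costela/wesher.git
$ cd wesher
$ GOOS=linux GOARCH=mipsle GOMIPS=softfloat go build -o wesher .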

Try to avoid isolated node split-brain

There are two kinds of split-brain:

  • a fairly rare case of half-half split-brain, where half the nodes are completely disconnected from the other half for long enough that a split-brain occurs; this might sometimes happen if entire LANs are connected using wesher;
  • a much more common case of isolated-node split-brain, where a single node loses connection to the rest of the cluster and acts as an isolated node from then on.

There would be a general fix for both of these, which involves keeping track of some super-nodes (maybe all known nodes?) and regularly trying to join these nodes to the memberlist with some kind of backoff mechanism, maybe forgetting them after some (fairly long) time.

This would probably require some complex code, should not be run from inside the main loop to avoid deadlocking, and quite frankly: it sounds scary to me. I would love to get into it later, but I am not familiar enough with the wesher code for now.

However, the second, more common case has a quick and (not so) dirty fix. If the memberlist becomes empty, it is usually safe to consider we are facing a split-brain, and more generally we know for sure we are at a dead end (until some node leaves/joins, that is). So I think it would be safe to simply fatal-exit. Then it is the service manager's responsibility to handle restarting if required by the admin.

I have a patch working for this, and have tested it using systemd unit files with success. I need to isolate the changes and provide a PR.

Converging with another of my projects

Hello,

I have contributed some features and refactoring to wesher in the past and have been using it for production clusters every day since. Needless to say, I have come to love this project, its simple architecture and philosophy.

As a separate project I have been building a Kubernetes distribution, starting with duct-taped proofs of concept binding wesher and k3s together as independent processes. I have reached a point where binding the pieces together is better achieved in Go directly, benefiting from the nice features of the goroutine scheduler and memory manager.

In the end, I would very much like to embed wesher as a library for setting up the mesh vpn. I would use the wireguard abstractions and common types but probably not the cluster abstractions as is, since I require them for exchanging more than wireguard keys. For prototyping I have implemented what I believe is a much needed library for abstracting memberlist delegate patterns and providing simple usable event channels. I think it has become generic enough to serve as a basis for other projects.

My proposal would be:

  1. Publish that library (branded sml, as in simpler memberlists), which is largely inspired by wesher cluster package
  2. Make wesher packages a bit more ergonomic for using as a library
  3. Move wesher on top of sml for easier integration in my current project
  4. Move my own codebase on top of both wesher and sml.

I am exposing this plan for us to assess if it is worth the time and energy, or if I should just stick with a separate wireguard mesh implementation which is 80% copy pasted from wesher yet weighs a couple dozen lines of Go only.

Cheers 🤗

IPv6 Support

Hello,

I love the idea and would like to know whether IPv6 is currently supported by wesher (both underlay network and within wireguard)?

add optional cluster key rotation

It would be nice to have the memberlist key be rotated automatically.
This would increase security a bit, at the cost of making ad-hoc joining less practical.
We'd also need a way to get the current key.

Misunderstanding instructions

The README says:

  1. On the first node:

# ./wesher

Running the command above on a terminal will currently output a generated cluster key as follows:

new cluster key generated: XXXXX

Note: the created key will only be shown if running on a terminal, to avoid keys leaking via logs.

3. Lastly, on any further node:

# wesher --cluster-key XXXXX --join x.x.x.x

Where XXXXX is the base64 encoded 256 bit key printed by the step above, and x.x.x.x is the hostname or IP of any of the nodes already joined to the mesh cluster.

I have wireguard installed on all 3 nodes.

  • wget installed wesher
  • chmod a+x ./wesher
  • sudo ./wesher
  • saved key

But Step #3 above stumps me... If I just generated the "cluster-key" how can the next step say:

# wesher --cluster-key XXXXX --join x.x.x.x

Where XXXXX is the base64 encoded 256 bit key printed by the step above, and x.x.x.x is the hostname or IP of any of the nodes already joined to the mesh cluster.

How do you get a first node into the cluster, so additional nodes can be added like that?

Is step #2 supposed to do that? When I run:

# ./wesher

I get the...

new cluster key generated: XXXXX

but the command never completes or goes back to a prompt?

how about add config-file/log-file parameter ?

I run wesher on a couple of openwrt routers to test a wireguard mesh vpn.
Although it's only node to node so far, it's really wonderful.
So I'm trying to add wesher to /etc/rc.local, and I don't want to save the key in rc.local.

I read the instructions and found out that wesher should be able to read its config from /etc/default/wesher,
so I created /etc/default (which is not present in the openwrt default setup) and added a wesher file like this:

root@OpenWrt-13:~# cat /etc/default/wesher 
WESHER_CLUSTER_KEY = cluster_key_here
WESHER_JOIN = 1.1.1.1,2.2.2.2
WESHER_WIREGUARD_PORT = 12000
WESHER_LOG_LEVEL = debug
root@OpenWrt-13:~#

But I do not know how to make wesher read the config; there's no parameter to specify a config file, and I also don't know whether wesher tries to read a default config or not, as it is not shown in the debug log.

And please consider adding a log-file parameter to save the wesher log for later debugging or audit; it's not easy to find the log in stdout.

what do you think ?
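A hedged note on the env-file approach: wesher itself only reads environment variables; /etc/default/wesher works in the systemd setup because the unit injects it (presumably via an EnvironmentFile= directive), so on OpenWrt the file has to be loaded explicitly, and shell-style env files must not have spaces around the "=". A possible rc.local snippet, assuming such a file and the binary in /usr/local/sbin:

set -a
. /etc/default/wesher
set +a
/usr/local/sbin/wesher &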

Make the stored state path configurable

Currently the stored state is written to a hardcoded path.

This prevents running multiple wesher instances on the same machine.

I have already implemented and tested a patch for this; I need to isolate the changes and contribute a PR.

Internal Floating IP

In some cloud or bare-metal environments, so-called floating or failover IPs are provided. Those can be assigned to any server as required. Using a daemon that tracks the liveness of the servers, like keepalived, one can easily create a simple high-availability environment. This is primarily useful to connect a static IP to a domain.

For some specific use cases it might be useful to have a similar feature inside a VPN: an IP always pointing to some server that is alive.

A real-world use case:

Below is some typical self-made HA kubernetes cluster setup.

  • LB Node 1:
    • HAProxy
    • keepalived master
    • Public IP: 76.0.0.11
    • Public Floating IP assigned: 100.0.0.3
    • VPN IP 10.0.0.11
    • VPN Floating IP assigned: 10.0.0.99
  • LB Node 2:
    • HAProxy
    • keepalived master
    • Public IP: 76.0.0.12
    • VPN IP 10.0.0.12
  • Master Node 1:
    • k8s control plane
    • VPN IP 10.0.0.21
  • Master Node 2:
    • k8s control plane
    • VPN IP 10.0.0.22
  • Master Node 3:
    • k8s control plane
    • VPN IP 10.0.0.23
  • Worker Node 1
    • k8s worker
    • VPN IP 10.0.0.31
  • Worker Node 2
    • k8s worker
    • VPN IP 10.0.0.32

Every k8s component has to communicate with the control plane through a load balancer. If you want all traffic routed through the VPN, you need to make sure that the load balancer nodes are always reachable. Both LB nodes run some keepalive daemon (e.g. keepalived). If one of the nodes goes offline, the other automatically assigns the VPN floating IP to itself.

Is this possible somehow with wireguard and wesher already? Starting a second wesher instance on a connected server understandably failed, because the port is still in use.

REQ: Add ability to generate and export a WG config for a (different) node

I'd like to be able to add mobile devices (phones, tablets) to a Wesher-configured network. I assume that, since wesher sets up WireGuard networks, this would be possible using e.g. the Android WireGuard app with the right config.

I'm imagining something like this:

1. On a desktop node: `wesher --cluster-key XXXXX --join x.x.x.x --export > mobile.txt`
2. Copy `mobile.txt` to mobile device
3. Import `mobile.txt` in Wireguard app
4. Enable interface

Wesher does some hostfile magic, among other things, and this would obviously be missing on the mobile device, but this would be a start to making it easier for mobile apps to join a Wesher network.

Is UseIPAsName really used?

This setting is not documented in README.md and is explicitly marked as a testing setting in config.go. However, I cannot foresee a testing use case for it, especially when e2e.sh makes no use of it for automated e2e testing.

Is it really used? And if not, should we get rid of it?

could not enable interface wgoverlay

I run wesher on one of my VPSes; wesher can be activated normally:

root@warp:/var/lib/wesher# wesher --wireguard-port 12000 --cluster-port 10000
new cluster key generated: 3W6Xn6yfuNUQX94nyPh+IC2sxCjPTeGQMH4BlLiD5AM=

But when another node wants to join the cluster, there is an error message.

second node

chchang@hqdc039:~/git$ sudo wesher --cluster-key 3W6Xn6yfuNUQX94nyPh+abcdeCjPTeGQMH4BlLiD5AM= --join 123.123.123.123 --wireguard-port 12000 --cluster-port 10000

wesher console

root@warp:/var/lib/wesher# wesher --wireguard-port 12000 --cluster-port 10000
new cluster key generated: 3W6Xn6yfuNUQX94nyPh+IC2sxCjPTeGQMH4BlLiD5AM=
ERRO[0069] could not up interface: could not enable interface wgoverlay: address already in use 
ERRO[0110] could not up interface: could not enable interface wgoverlay: address already in use 

Both nodes are running Ubuntu Focal x64.
Any suggestions? Or any logs I can provide?


retry joins

Currently we give up if the join fails to contact any host. This should be retried to make wesher more robust when, for instance, whole clusters are powercycled.

how to use --bind-addr

How do I list all the IPs connected through the mesh?

And I get this when I try to use --bind-addr:

FATA[0000] could not create cluster: Could not set up network transport: failed to obtain an address: Failed to start TCP listener on "x.x.x.x" port 7946: listen tcp x.x.x.x:7946: bind: cannot assign requested address

Edit: forgot to mention, here's my /etc/hosts on the first node:

127.0.0.1   localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost   ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
127.0.1.1   flyingfish  flyingfish
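Two hedged pointers that may help here: --bind-addr must be an address already assigned to a local interface (the "cannot assign requested address" error suggests it is not), and the peers currently in the mesh can be listed straight from wireguard:

# ip -brief addr show   # confirm the address given to --bind-addr exists locally
# wg show wgoverlay     # lists peers, endpoints and allowed (overlay) IPs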

Dependabot can't resolve your Go dependency files

Dependabot can't resolve your Go dependency files.

As a result, Dependabot couldn't update your dependencies.

The error Dependabot encountered was:

go: golang.zx2c4.com/wireguard/[email protected] requires
	golang.zx2c4.com/[email protected]: reading golang.zx2c4.com/wireguard/go.mod at revision v0.0.20200121: unknown revision

If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.

View the update logs.

How does routing work in wesher?

I will use an example to help describe the question.

Let’s say I have 1 public node in a hosting provider and two nodes at my home. The nodes at home are behind a NAT.
Let’s also say that wesher has set up a mesh between all the nodes. There should be 3 tunnels.
If this is 10.0.0.0/24, they are all on the same subnet.

How does wireguard know to route to the correct node?

Also.
now let’s say you add another node. Let’s say it is only reachable from your home network via the public node.

Will your system route home traffic through the public node to reach this new node (And vice versa)?

Split Brain Question

The readme is a bit light on split-brain information.

If one cluster in a DC goes down and then comes back up, will it or will it not automatically recover? Say I have 10 servers in DC A and 10 in DC B, and the link between them goes down. When the link recovers, will the two sides rediscover each other without manual intervention?
