
memberlist's Introduction

memberlist GoDoc CircleCI

memberlist is a Go library that manages cluster membership and member failure detection using a gossip based protocol.

The use cases for such a library are far-reaching: all distributed systems require membership, and memberlist is a re-usable solution to managing cluster membership and node failure detection.

memberlist is eventually consistent but converges quickly on average. The speed at which it converges can be heavily tuned via various knobs on the protocol. Node failures are detected and network partitions are partially tolerated by attempting to communicate to potentially dead nodes through multiple routes.

Building

If you wish to build memberlist you'll need Go version 1.2+ installed.

Please check your installation with:

go version

Usage

Memberlist is surprisingly simple to use. An example is shown below:

/* Create the initial memberlist from a safe configuration.
   Please reference the godoc for other default config types.
   http://godoc.org/github.com/hashicorp/memberlist#Config
*/
list, err := memberlist.Create(memberlist.DefaultLocalConfig())
if err != nil {
	panic("Failed to create memberlist: " + err.Error())
}

// Join an existing cluster by specifying at least one known member.
n, err := list.Join([]string{"1.2.3.4"})
if err != nil {
	panic("Failed to join cluster: " + err.Error())
}

// Ask for members of the cluster
for _, member := range list.Members() {
	fmt.Printf("Member: %s %s\n", member.Name, member.Addr)
}

// Continue doing whatever you need, memberlist will maintain membership
// information in the background. Delegates can be used for receiving
// events when members join or leave.
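When the process shuts down, the node can be removed cleanly as well. A minimal sketch continuing the example above: Leave broadcasts the departure and waits up to the given timeout for it to propagate, then Shutdown stops the background listeners.

// Gracefully announce our departure before stopping.
if err := list.Leave(10 * time.Second); err != nil {
	fmt.Println("error leaving cluster:", err)
}
if err := list.Shutdown(); err != nil {
	fmt.Println("error shutting down:", err)
}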

The most difficult part of memberlist is configuring it since it has many available knobs in order to tune state propagation delay and convergence times. Memberlist provides a default configuration that offers a good starting point, but errs on the side of caution, choosing values that are optimized for higher convergence at the cost of higher bandwidth usage.
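As a rough illustration of the knobs involved, here is a hedged sketch that starts from one of the default configurations and touches a few commonly tuned fields (the values shown are illustrative, not recommendations; imports omitted):

conf := memberlist.DefaultLANConfig()
conf.Name = "node-1"                         // must be unique within the cluster
conf.BindAddr = "0.0.0.0"
conf.BindPort = 7946                         // memberlist uses this port for both TCP and UDP
conf.GossipInterval = 200 * time.Millisecond // how often gossip messages go out
conf.GossipNodes = 3                         // how many peers receive each gossip round
conf.ProbeInterval = 1 * time.Second         // failure-detection probe frequency
conf.PushPullInterval = 30 * time.Second     // periodic full state sync over TCP; 0 disables the periodic sync

if _, err := memberlist.Create(conf); err != nil {
	panic("Failed to create memberlist: " + err.Error())
}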

For complete documentation, see the associated Godoc.

Protocol

memberlist is based on "SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol". However, we extend the protocol in a number of ways:

  • Several extensions are made to increase propagation speed and convergence rate.
  • Another set of extensions, that we call Lifeguard, are made to make memberlist more robust in the presence of slow message processing (due to factors such as CPU starvation, and network delay or loss).

For details on all of these extensions, please read our paper "Lifeguard : SWIM-ing with Situational Awareness", along with the memberlist source. We welcome any questions related to the protocol on our issue tracker.

memberlist's People

Contributors

armon, banks, bwaters, cultureulterior, derekchiang, dnephin, erikdubbelboer, grubernaut, hanshasselberg, huikang, jacksontj, jefferai, jmurret, jsternberg, kyhavlov, magiconair, markan, mattbostock, mitchellh, orivej, picatz, pierresouchay, preetapan, rboyer, ryanuber, sakateka, sean-, slackpad, tyler, wolfeidau

memberlist's Issues

Allow installing keys to the keyring even when encryption is disabled

There are two parts to communicating with encryption in a cluster: encrypting and decrypting. Currently, you must have an encryption key registered to do both. Since it's possible to have multiple keys in the keyring and to switch the primary key, it should also be possible to install a key to the keyring (for decryption) without necessarily installing a primary key for encryption.
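For context, a rough sketch of the existing Keyring API that this request builds on; whether it can be used with encryption otherwise disabled is exactly what the issue is asking for. Key values below are placeholders, and keys must be 16, 24, or 32 bytes (AES-128/192/256):

primary := []byte("0123456789abcdef") // 16-byte placeholder key
extra := []byte("fedcba9876543210")   // additional key, usable for decryption only

// The first argument is the full key set; the second is the primary key
// used to encrypt outbound messages.
ring, err := memberlist.NewKeyring([][]byte{primary, extra}, primary)
if err != nil {
	panic(err)
}

conf := memberlist.DefaultLANConfig()
conf.Keyring = ring // conf is then passed to memberlist.Create as usual

// Later, a new key can be added for decryption and optionally promoted
// to primary for encryption.
newKey := []byte("abcdef0123456789")
if err := ring.AddKey(newKey); err != nil {
	panic(err)
}
if err := ring.UseKey(newKey); err != nil { // promote to primary
	panic(err)
}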

Question about broadcasts

I have read #10 and have the following questions:

  1. If I use QueueBroadcast() on a memberlist.TransmitLimitedQueue, will it send the messages to every single node on the network? If so, why does serf's implementation rebroadcast the message after calling handleUserEvent(): https://github.com/hashicorp/serf/blob/555e0dcbb180ecbd03431adc28226bb3192558bc/serf/delegate.go#L62 ?

  2. Under what circumstances will a message be broadcast? I am trying to broadcast a protobuf message that's 18 bytes; however, much of the time the message does not seem to be broadcast after it is queued.
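For context, a minimal sketch of how user broadcasts are usually wired up (names are illustrative, imports omitted). The queue only retransmits each message a limited number of times, scaling with RetransmitMult and the log of the cluster size, so delivery is probabilistic rather than guaranteed, and anything queued is only sent if the Delegate's GetBroadcasts hands it back to memberlist.

// simpleBroadcast is an illustrative memberlist.Broadcast implementation.
type simpleBroadcast []byte

func (b simpleBroadcast) Message() []byte                       { return b }
func (b simpleBroadcast) Finished()                             {}
func (b simpleBroadcast) Invalidates(memberlist.Broadcast) bool { return false }

func queueExample(list *memberlist.Memberlist) *memberlist.TransmitLimitedQueue {
	q := &memberlist.TransmitLimitedQueue{
		NumNodes:       func() int { return list.NumMembers() },
		RetransmitMult: 3,
	}
	// The payload is piggybacked onto gossip messages; the Delegate's
	// GetBroadcasts must return q.GetBroadcasts(...) for anything queued
	// here to actually be sent.
	q.QueueBroadcast(simpleBroadcast([]byte("hello")))
	return q
}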

Question: assigning consecutive IDs to members

Does memberlist make it possible for each node to know which member it is in the list? For example, would the second node that joined know it's #2? I'm hoping to have each node assigned a consecutive ID when joining, and for that list to update automatically as membership changes.

10.12.19.3 joins -> given ID #1 -> NumMembers() is 1
1.53.12.2 joins -> given ID #2 -> NumMembers() is 2
4.122.4.14 joins -> given ID #3 -> NumMembers() is 3
1.53.12.2 dies -> NumMembers() is 2 -> IDs are consecutively reassigned

Is this something that memberlist could do? Or serf? Or consul?

Thanks

gossip broadcasts different state to the random nodes in one iteration

Reading the code here https://github.com/hashicorp/memberlist/blob/master/state.go#L403, it looks like the broadcast queue is read separately for each of the k random nodes (which re-sorts the broadcast queue), so every node selected to receive a gossip update gets a different set of data in a single gossip round. Is this intentional? I would have thought the intention was to gossip the exact same state to all of the random nodes in a single iteration.

Are there docker ports to open?

If I have memberlist instances inside Docker containers, are there ports I need to EXPOSE for memberlist to work correctly? Right now they seem to come up ok, but then they both report timeouts and eventually stop talking altogether.

Here is my sample docker-compose file:

version: '2'

services:

  app1:
    environment:
      APP_NAME: app1
      CLUSTER_ADDR: app1
      CLUSTER_PORT: 63000
      CLUSTER_MEMBERS: app1:63000
      MAX_CLUSTER_NODES: 50
    hostname: app1
    container_name: app1
    image: myimage
    ports:
      - 8080:8080

  app2:
    environment:
      APP_NAME: app2
      CLUSTER_ADDR: app2
      CLUSTER_PORT: 63001
      CLUSTER_MEMBERS: app1:63000
      MAX_CLUSTER_NODES: 50
    hostname: app2
    container_name: app2
    image: myimage
    ports:
      - 8081:8080
    depends_on:
      - app1

Unable to disable push/pull sync

Configuration:

    // Configure memberlist
    conf := memberlist.DefaultWANConfig()
    conf.AdvertisePort = port
    conf.AdvertiseAddr = ip
    conf.BindPort = port
    conf.BindAddr = ip
    conf.Name = name
    conf.Events = delegate
    conf.GossipNodes = 6
    conf.GossipInterval = conf.GossipInterval * 5 // 2.5s
    conf.PushPullInterval = 0 // this should disable PushPull based on https://github.com/hashicorp/memberlist/blob/master/config.go#L66
    conf.DisableTcpPings = true
INFO[0015] memberlist - [DEBUG] memberlist: Initiating push/pull sync with: 127.0.0.1:23333 
INFO[0015] memberlist - [DEBUG] memberlist: TCP connection from=127.0.0.1:59993 

I am trying to minimize or eliminate TCP connections between nodes.

However, when I start the server, it still creates TCP connections.

Does memberlist always sync state when it is joining a cluster?

Could it broadcast messages to every node?

From the code in [0] and [1], the memberlist broadcast mechanism cannot guarantee that a message is sent to every node of the cluster, even if [2] is set greater than the number of nodes. As the comments in the util function kRandomNodes explain, len(kNodes) can end up less than k even when k is (much) less than n, because a random node is chosen on every loop iteration without excluding nodes that have already been chosen.

So if I'm right, is there any method to broadcast messages to every node in the cluster?

[0]

memberlist/state.go

Lines 485 to 500 in 3d8438d

kNodes := kRandomNodes(m.config.GossipNodes, m.nodes, func(n *nodeState) bool {
	if n.Name == m.config.Name {
		return true
	}
	switch n.State {
	case stateAlive, stateSuspect:
		return false
	case stateDead:
		return time.Since(n.StateChange) > m.config.GossipToTheDeadTime
	default:
		return true
	}
})

[1]

memberlist/util.go

Lines 126 to 154 in 3d8438d

func kRandomNodes(k int, nodes []*nodeState, filterFn func(*nodeState) bool) []*nodeState {
	n := len(nodes)
	kNodes := make([]*nodeState, 0, k)
OUTER:
	// Probe up to 3*n times, with large n this is not necessary
	// since k << n, but with small n we want search to be
	// exhaustive
	for i := 0; i < 3*n && len(kNodes) < k; i++ {
		// Get random node
		idx := randomOffset(n)
		node := nodes[idx]

		// Give the filter a shot at it.
		if filterFn != nil && filterFn(node) {
			continue OUTER
		}

		// Check if we have this node already
		for j := 0; j < len(kNodes); j++ {
			if node == kNodes[j] {
				continue OUTER
			}
		}

		// Append the node
		kNodes = append(kNodes, node)
	}
	return kNodes
}

[2]
GossipNodes int

data race warning

WARNING: DATA RACE
Write by goroutine 17:
github.com/hashicorp/memberlist.(*Memberlist).resetNodes()
/Users/lsaint/go/src/github.com/hashicorp/memberlist/state.go:284 +0x1c2
github.com/hashicorp/memberlist.(*Memberlist).probe()
/Users/lsaint/go/src/github.com/hashicorp/memberlist/state.go:178 +0x116
github.com/hashicorp/memberlist.*Memberlist.(github.com/hashicorp/memberlist.probe)·fm()
/Users/lsaint/go/src/github.com/hashicorp/memberlist/state.go:70 +0x33
github.com/hashicorp/memberlist.(*Memberlist).triggerFunc()
/Users/lsaint/go/src/github.com/hashicorp/memberlist/state.go:106 +0xd5

Previous read by goroutine 18:
github.com/hashicorp/memberlist.(*Memberlist).pushPullTrigger()
/Users/lsaint/go/src/github.com/hashicorp/memberlist/state.go:130 +0xd7

Goroutine 17 (running) created at:
github.com/hashicorp/memberlist.(*Memberlist).schedule()
/Users/lsaint/go/src/github.com/hashicorp/memberlist/state.go:70 +0x237
github.com/hashicorp/memberlist.Create()
/Users/lsaint/go/src/github.com/hashicorp/memberlist/memberlist.go:141 +0xe8
...
...

Goroutine 18 (running) created at:
github.com/hashicorp/memberlist.(*Memberlist).schedule()
/Users/lsaint/go/src/github.com/hashicorp/memberlist/state.go:76 +0x3e8
github.com/hashicorp/memberlist.Create()
/Users/lsaint/go/src/github.com/hashicorp/memberlist/memberlist.go:141 +0xe8
...

...

Metrics counter for "alive" isn't accurate

The metrics counter for processed alive messages isn't accurate. The current logic checks the incarnation of the alive node received and bails if the incarnation is less than or equal to what is currently known for other nodes, but when the alive message is for the local node it only bails when the incarnation is strictly less. This is so the current node can refute an alive message that has the wrong information, but the logic causes a problem with metrics.

When an alive message is received for yourself, it always increments the counter even if nothing was done with the message. When a push/pull request happens with another node, the current node rediscovers itself and later discards the message when it sees it isn't new. But by the time it discards the message, the node has already reported that it saw an alive message. Since a push/pull happens roughly every 30 seconds, you constantly see ~2 alive messages per node. This implies there is activity in the system even though there isn't.

I think the counter should only be incremented when an alive message is actually going to be broadcast, rather than on every message received.

pr #100 introduces failure in TestSuspicion_Timer

at tip 9800c50 on osx 10.11.6, using go1.8beta2:

*   9800c50 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #100 from hashicorp/smooth-suspicion-timeout
|\
| * b339ecf Change suspicion timeout function to be smooth instead of stepwise
|/
*   a6968aa Merge pull request #99 from hashicorp/f-gossip-dead-timeout
|\

I see the following test failure, which does not happen one merge back at a6968aa:

=== RUN   TestSuspicion_Timer
2017/01/22 21:32:46 [INFO] memberlist: Marking 127.0.0.143 as failed, suspect timeout reached (0 peer confirmations)
2017/01/22 21:32:46 [INFO] memberlist: Marking 127.0.0.147 as failed, suspect timeout reached (0 peer confirmations)
2017/01/22 21:32:51 [INFO] memberlist: Marking test as failed, suspect timeout reached (0 peer confirmations)
2017/01/22 21:32:51 [INFO] memberlist: Marking test2 as failed, suspect timeout reached (0 peer confirmations)
2017/01/22 21:32:51 [INFO] memberlist: Marking test3 as failed, suspect timeout reached (0 peer confirmations)
--- FAIL: TestSuspicion_Timer (12.56s)
	suspicion_test.go:136: case 5: should not have fired ( 0.500286)

Does/should the BindAddr config support host names?

With recent changes to the network transport, it seems that hostnames are no longer correctly supported in config.BindAddr.

For example, if I create a config with BindAddr = "localhost" I get an error attempting to create the TCP listener, since the port gets bound on another, non-localhost interface. If I create the config with BindAddr = "127.0.0.1", no error occurs and things work as normal.

This appears to be the offending line of code, https://github.com/hashicorp/memberlist/blob/master/net_transport.go#L79

If addr is not an IPv4 or IPv6 string, then ip is nil and tcpAddr tries to listen on the port with an empty IP, which I believe means all interfaces. That is not what was specified, and in my case it errors.

I see two solutions to this:

  • Change the line above to resolve the address via net.ResolveIPAddr.
  • Change config.BindAddr to be of type net.IP to avoid confusion

I can submit a PR to fix this; I would just like some direction on which is the better fix.
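In the meantime, one hedged workaround is to resolve the hostname before handing it to the config, so BindAddr is always an IP literal (sketch, imports omitted):

// Resolve a hostname such as "localhost" to a concrete IP before assigning
// it, since the transport parses BindAddr as an IP literal.
addr, err := net.ResolveIPAddr("ip", "localhost")
if err != nil {
	panic(err)
}

conf := memberlist.DefaultLANConfig()
conf.BindAddr = addr.IP.String()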

util.privateBlocks contains loopback address range

When determining the advertiseAddr the IP of each network interface is examined to see if it is a privateIP.

The set of privateIp ranges that the addresses are evaluated against include the loopback address range (127.0.0.0/8).

When iterating over the interface addresses, the loopback address (127.0.0.1) is always returned first on my system (Ubuntu 14.04- 3.13.0-100-generic #147-Ubuntu).

As a result, 127.0.0.1 is always selected as the privateIP address to advertise to peers, which results in the peers being unable to communicate with each other.

I really think that the loopback range should be removed from privateBlocks, or at the very least moved to the end of the slice so that all other private IP ranges are checked first.
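Until that is changed, a common hedged workaround is to set the advertise address explicitly so the interface scan is skipped entirely (the address below is a placeholder):

conf := memberlist.DefaultLANConfig()
conf.BindAddr = "0.0.0.0"
conf.AdvertiseAddr = "10.0.1.15" // placeholder: the address peers should dial, never 127.0.0.1
conf.AdvertisePort = conf.BindPort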

Deadlock when Shutting down

Using memberlist as a library I occasionally get a deadlock when shutting down the memberlist instance.

The deadlock seems to be caused by this read lock https://github.com/hashicorp/memberlist/blob/master/net.go#L605 and the fact that Shutdown holds the same write lock.

Specifically the Shutdown process waits on packetListen to finish, but packetListen is blocked on the above RLock because ingestPacket eventually calls down to rawSendMsgPacket.

Here are the relevant call stacks from my application; as you can see, I am consuming memberlist indirectly through serf.

### packetListen blocked on RLock
goroutine 424 [semacquire]:
sync.runtime_Semacquire(0xc42052423c)
        /usr/lib/go/src/runtime/sema.go:47 +0x34
sync.(*RWMutex).RLock(0xc420524230)
        /usr/lib/go/src/sync/rwmutex.go:43 +0x95
./github.com/hashicorp/memberlist.(*Memberlist).rawSendMsgPacket(0xc4205241e0, 0xc4203eeb50, 0xf, 0x0, 0xc420f7c580, 0x24b, 0x2bc, 0xb, 0x0)
        ./github.com/hashicorp/memberlist/net.go:605 +0xb83
./github.com/hashicorp/memberlist.(*Memberlist).sendMsg(0xc4205241e0, 0xc4203eeb50, 0xf, 0xc42052ea20, 0x96, 0x102, 0xc420af6368, 0xc4207e37b0)
        ./github.com/hashicorp/memberlist/net.go:579 +0x371
./github.com/hashicorp/memberlist.(*Memberlist).encodeAndSendMsg(0xc4205241e0, 0xc4203eeb50, 0xf, 0x2, 0xc62ea0, 0xc420af6360, 0x0, 0xc41ffc0164)
        ./github.com/hashicorp/memberlist/net.go:549 +0x117
./github.com/hashicorp/memberlist.(*Memberlist).handlePing(0xc4205241e0, 0xc420a7d029, 0x34, 0xdd7, 0x10f8180, 0xc420802990)
        ./github.com/hashicorp/memberlist/net.go:412 +0x41b
./github.com/hashicorp/memberlist.(*Memberlist).handleCommand(0xc4205241e0, 0xc420a7d029, 0x34, 0xdd7, 0x10f8180, 0xc420802990, 0xed0ba4d11, 0x2db42fb9, 0x113d920)
        ./github.com/hashicorp/memberlist/net.go:321 +0x105
./github.com/hashicorp/memberlist.(*Memberlist).handleCompound(0xc4205241e0, 0xc420a7d001, 0x45f, 0xdff, 0x10f8180, 0xc420802990, 0xed0ba4d11, 0xc42db42fb9, 0x113d920)
        ./github.com/hashicorp/memberlist/net.go:392 +0x140
./github.com/hashicorp/memberlist.(*Memberlist).handleCommand(0xc4205241e0, 0xc420a7d001, 0x45f, 0xdff, 0x10f8180, 0xc420802990, 0xed0ba4d11, 0x2db42fb9, 0x113d920)
        ./github.com/hashicorp/memberlist/net.go:316 +0x613
./github.com/hashicorp/memberlist.(*Memberlist).handleCompressed(0xc4205241e0, 0xc420a2a001, 0x2de, 0xffff, 0x10f8180, 0xc420802990, 0xed0ba4d11, 0x2db42fb9, 0x113d920)
        ./github.com/hashicorp/memberlist/net.go:540 +0x277
./github.com/hashicorp/memberlist.(*Memberlist).handleCommand(0xc4205241e0, 0xc420a2a001, 0x2de, 0xffff, 0x10f8180, 0xc420802990, 0xed0ba4d11, 0x2db42fb9, 0x113d920)
        ./github.com/hashicorp/memberlist/net.go:318 +0x6ce
./github.com/hashicorp/memberlist.(*Memberlist).ingestPacket(0xc4205241e0, 0xc420a2a000, 0x2df, 0x10000, 0x10f8180, 0xc420802990, 0xed0ba4d11, 0x2db42fb9, 0x113d920)
        ./github.com/hashicorp/memberlist/net.go:304 +0x11a
./github.com/hashicorp/memberlist.(*Memberlist).packetListen(0xc4205241e0)
        ./github.com/hashicorp/memberlist/net.go:272 +0x1b1
created by ./github.com/hashicorp/memberlist.newMemberlist
        ./github.com/hashicorp/memberlist/memberlist.go:146 +0x857

### NetTransport.Shutdown() blocked waiting on packetListen
### MemberList.Shutdown is holding the write lock causing the deadlock.
goroutine 530 [semacquire]:
sync.runtime_Semacquire(0xc4209fa87c)
        /usr/lib/go/src/runtime/sema.go:47 +0x34
sync.(*WaitGroup).Wait(0xc4209fa870)
        /usr/lib/go/src/sync/waitgroup.go:131 +0xc2
./github.com/hashicorp/memberlist.(*NetTransport).Shutdown(0xc4209fa850, 0xd92dc0, 0xc420524230)
        ./github.com/hashicorp/memberlist/net_transport.go:216 +0x14f
./github.com/hashicorp/memberlist.(*Memberlist).Shutdown(0xc4205241e0, 0x0, 0x0)
        ./github.com/hashicorp/memberlist/memberlist.go:618 +0xfd
./github.com/hashicorp/serf/serf.(*Serf).Shutdown(0xc420200f00, 0x0, 0x0)
        ./github.com/hashicorp/serf/serf/serf.go:808 +0x125

Calling Leave removes the local node from it's own list

This may or may not be intended behavior - if so, I'd appreciate guidance on the right way to do this.

When calling Leave, the current node is marked "dead" and that is broadcast to the rest of the cluster, while our listeners are kept active, so that we can receive updates but not be considered "alive". This helps the cluster heal more rapidly once we re-join.

That makes sense, but I have a different use case for Leave, which is not working as intended.

I'd like to be able to tell a node "Hey, leave the cluster and go back to just being on your own - Return to your initial state of a membership of 1". This does not seem to be supported by Leave.

Is there another method that I'm missing here? Is my only option to tear down and re-create a memberlist object from scratch in this case? It seems like it should be possible to cleanly perform a natural, and expected, full excision of a given node from a cluster, leaving it in a working state as it would have been before being joined.

Any help would be appreciated

Multiple Memberlists on a single machine

I apologize if I'm having a total brain fart here (stemming from either a Thanksgiving food coma or the fact that this is my first foray into Go), but I'm attempting to launch two instances of a service that uses the memberlist library on my local machine, using separate ports for memberlist, and I'm hitting an error:

[ERR] Conflicting address for mbp-chrisc-lt. Mine: 192.168.206.249:8001 Theirs: 192.168.206.249:8002

Based on the block below (line 586) in the state.go file, it appears as though this configuration may not be supported by Memberlist:

// Check if this address is different than the existing node
if !reflect.DeepEqual([]byte(state.Addr), a.Addr) || state.Port != a.Port {
	m.logger.Printf("[ERR] Conflicting address for %s. Mine: %v:%d Theirs: %v:%d",
		state.Name, state.Addr, state.Port, net.IP(a.Addr), a.Port)
	return
}

If I read that right, it will print that error and return if the two addresses are NOT the same OR the ports are NOT the same. Should it be the other way around?
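For what it's worth, multiple instances on one machine do work, but each instance needs a unique Name in addition to its own port; the default configs use the hostname as the Name, so two local instances collide exactly as in the error above. A hedged sketch (ports are placeholders, imports omitted):

func newLocalInstance(name string, port int) (*memberlist.Memberlist, error) {
	conf := memberlist.DefaultLocalConfig()
	conf.Name = name     // must be unique per instance; the hostname default collides
	conf.BindPort = port // each instance needs its own port
	conf.AdvertisePort = port
	return memberlist.Create(conf)
}

// Usage:
//   a, _ := newLocalInstance("node-a", 8001)
//   b, _ := newLocalInstance("node-b", 8002)
//   b.Join([]string{"127.0.0.1:8001"})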

Use of ChannelEventDelegate is not clear

I want to use it as the Events field of a default config, but in my application I cannot read events from the send-only channel Ch.

What is the intended purpose?
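For what it's worth, the usual pattern is to create an ordinary bidirectional channel yourself, hand it to ChannelEventDelegate (its Ch field is send-only only from memberlist's side), and read from your own reference. A minimal hedged sketch (imports omitted):

events := make(chan memberlist.NodeEvent, 16) // buffered so memberlist doesn't block

conf := memberlist.DefaultLANConfig()
conf.Events = &memberlist.ChannelEventDelegate{Ch: events}

if _, err := memberlist.Create(conf); err != nil {
	panic(err)
}

go func() {
	for e := range events {
		switch e.Event {
		case memberlist.NodeJoin:
			fmt.Println("join:", e.Node.Name)
		case memberlist.NodeLeave:
			fmt.Println("leave:", e.Node.Name)
		case memberlist.NodeUpdate:
			fmt.Println("update:", e.Node.Name)
		}
	}
}()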

Remove use of PKCS7 padding

Since we are using AES-GCM instead of AES-CBC we no longer need to pad the messages. We will introduce a protocol version bump and drop it in that version.

Broadcasting message to cluster

Is there an easy way to broadcast a payload to all other members of the cluster? I've looked through the documentation and the only way to do it seems to be to implement Delegate; however, I'm having difficulty understanding what the majority of the interface functions are supposed to do. It would be useful to have an example use case of the library beyond just connecting to a cluster.
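To make the interface less opaque, here is a hedged skeleton of a Delegate wired to a TransmitLimitedQueue; only NotifyMsg and GetBroadcasts matter for simple payload broadcasting, and the rest can be no-ops (names are illustrative, imports omitted):

// broadcastDelegate is an illustrative memberlist.Delegate.
type broadcastDelegate struct {
	queue *memberlist.TransmitLimitedQueue
}

// NodeMeta returns metadata advertised about the local node (unused here).
func (d *broadcastDelegate) NodeMeta(limit int) []byte { return nil }

// NotifyMsg is called with each user payload received from the cluster.
func (d *broadcastDelegate) NotifyMsg(msg []byte) {
	fmt.Printf("received broadcast: %s\n", msg)
}

// GetBroadcasts hands queued user messages back to memberlist so they can
// be piggybacked onto gossip packets.
func (d *broadcastDelegate) GetBroadcasts(overhead, limit int) [][]byte {
	return d.queue.GetBroadcasts(overhead, limit)
}

// LocalState / MergeRemoteState are for full state sync; no-ops here.
func (d *broadcastDelegate) LocalState(join bool) []byte            { return nil }
func (d *broadcastDelegate) MergeRemoteState(buf []byte, join bool) {}

The delegate is installed via conf.Delegate before Create, and payloads are then broadcast by calling queue.QueueBroadcast with a Broadcast implementation, as in the sketch in the broadcasts question above.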

Request: dependency injection for transport

IMHO, a problem with memberlist is that it mixes the control and transport planes. The library creates and maintains sockets, while it should focus on the control plane and leave the transport to the user. The current situation makes it very difficult to integrate the library in some environments (e.g. NATed) or with other transports (e.g. custom protocols, ZeroMQ, etc.). For example, a common practice for NATed peers to join each other is to send UDP packets repeatedly until they can finally connect, so the Dial in this scenario involves something more than a simple DialUDP. I think that memberlist would be a perfect match for many distributed systems, but it should focus on the control plane and work the same way regardless of the underlying transport.

So I would suggest modifying the library to add some kind of dependency injection for the transport. A new parameter could be added to the config for memberlist.Create() where users could specify a transport factory providing connection/connectionless objects that follow the corresponding interfaces. memberlist.Default*Config() could initialize the config with TCP/UDP dialers/listeners, so it would hopefully not break most existing code...
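For reference, later versions of the library did add exactly this: Config gained a Transport field, and a custom transport satisfies an interface roughly like the sketch below (reconstructed from memory, so treat the precise signatures as an assumption and check transport.go in the version you vendor; imports omitted):

// Approximation of memberlist's Transport interface; exact signatures may differ.
type Transport interface {
	// FinalAdvertiseAddr is given the configured address and port, and
	// returns the address to advertise to peers.
	FinalAdvertiseAddr(ip string, port int) (net.IP, int, error)

	// WriteTo sends a best-effort (UDP-like) packet to the given address.
	WriteTo(b []byte, addr string) (time.Time, error)

	// PacketCh delivers inbound packets to memberlist.
	PacketCh() <-chan *memberlist.Packet

	// DialTimeout opens a reliable (TCP-like) stream connection.
	DialTimeout(addr string, timeout time.Duration) (net.Conn, error)

	// StreamCh delivers inbound stream connections to memberlist.
	StreamCh() <-chan net.Conn

	// Shutdown tears the transport down.
	Shutdown() error
}

A custom implementation is then plugged in via conf.Transport before calling memberlist.Create(conf).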

Perform on-join DNS requests over TCP

If we make the request via TCP we can more easily get a larger initial join list from the DNS server. Right now the default UDP client will only get a short list of responses (~3). If we pull SRV records we can get the port information as well.

A lot of tests failed

        memberlist_test.go:898: err: Failed to start TCP listener. Err: listen tcp 127.0.0.38:7946: bind: can't assign requested address
--- FAIL: TestMemberlist_conflictDelegate (0.00 seconds)
        memberlist_test.go:939: err: Failed to start TCP listener. Err: listen tcp 127.0.0.39:7946: bind: can't assign requested address
--- FAIL: TestHandleCompoundPing (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.41:8045: bind: can't assign requested address
--- FAIL: TestHandlePing (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.42:8045: bind: can't assign requested address
--- FAIL: TestHandlePing_WrongNode (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.43:8045: bind: can't assign requested address
--- FAIL: TestHandleIndirectPing (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.44:8045: bind: can't assign requested address
--- FAIL: TestTCPPushPull (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.45:8045: bind: can't assign requested address
--- FAIL: TestSendMsg_Piggyback (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.46:8045: bind: can't assign requested address
--- FAIL: TestMemberList_Probe (0.00 seconds)
        state_test.go:20: failed to get memberlist: Failed to start TCP listener. Err: listen tcp 127.0.0.47:7946: bind: can't assign requested address
--- FAIL: TestMemberList_ProbeNode_Suspect (0.00 seconds)
        state_test.go:20: failed to get memberlist: Failed to start TCP listener. Err: listen tcp 127.0.0.49:7946: bind: can't assign requested address
--- FAIL: TestMemberList_ProbeNode (0.00 seconds)
        state_test.go:20: failed to get memberlist: Failed to start TCP listener. Err: listen tcp 127.0.0.53:7946: bind: can't assign requested address
--- FAIL: TestMemberList_ResetNodes (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.55:8045: bind: can't assign requested address
--- FAIL: TestMemberList_AliveNode_NewNode (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.56:8045: bind: can't assign requested address
--- FAIL: TestMemberList_AliveNode_SuspectNode (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.57:8045: bind: can't assign requested address
--- FAIL: TestMemberList_AliveNode_Idempotent (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.58:8045: bind: can't assign requested address
--- FAIL: TestMemberList_AliveNode_ChangeMeta (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.59:8045: bind: can't assign requested address
--- FAIL: TestMemberList_AliveNode_Refute (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.60:8045: bind: can't assign requested address
--- FAIL: TestMemberList_SuspectNode_NoNode (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.61:8045: bind: can't assign requested address
--- FAIL: TestMemberList_SuspectNode (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.62:8045: bind: can't assign requested address
--- FAIL: TestMemberList_SuspectNode_DoubleSuspect (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.63:8045: bind: can't assign requested address
--- FAIL: TestMemberList_SuspectNode_OldSuspect (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.64:8045: bind: can't assign requested address
--- FAIL: TestMemberList_SuspectNode_Refute (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.65:8045: bind: can't assign requested address
--- FAIL: TestMemberList_DeadNode_NoNode (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.66:8045: bind: can't assign requested address
--- FAIL: TestMemberList_DeadNode (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.67:8045: bind: can't assign requested address
--- FAIL: TestMemberList_DeadNode_Double (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.68:8045: bind: can't assign requested address
--- FAIL: TestMemberList_DeadNode_OldDead (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.69:8045: bind: can't assign requested address
--- FAIL: TestMemberList_DeadNode_Refute (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.70:8045: bind: can't assign requested address
--- FAIL: TestMemberList_MergeState (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.71:8045: bind: can't assign requested address
--- FAIL: TestMemberlist_Gossip (0.00 seconds)
        state_test.go:20: failed to get memberlist: Failed to start TCP listener. Err: listen tcp 127.0.0.72:7946: bind: can't assign requested address
--- FAIL: TestMemberlist_PushPull (0.00 seconds)
        state_test.go:20: failed to get memberlist: Failed to start TCP listener. Err: listen tcp 127.0.0.74:7946: bind: can't assign requested address
--- FAIL: TestVerifyProtocol (0.00 seconds)
        memberlist_test.go:103: failed to start: Failed to start TCP listener. Err: listen tcp 127.0.0.76:8045: bind: can't assign requested address
FAIL

go version go1.2 darwin/amd64

suspicion timeouts are quantized by whole-number multiples of Log10(cluster size)

The current logic (in util.go:suspicionTimeout) for computing the suspicion timeouts is:

nodeScale := math.Ceil(math.Log10(float64(n + 1)))
timeout := time.Duration(suspicionMult) * time.Duration(nodeScale) * interval

The Ceil operator causes nodeScale to always be a whole number (the ceiling of Log10(n+1)), with the result that the timeout increases as a step function:

           n < 10    =>  timeout = 1 x suspicionMult x interval
     10 <= n < 100   =>  timeout = 2 x suspicionMult x interval
    100 <= n < 1000  =>  timeout = 3 x suspicionMult x interval

This change:

nodeScale := math.Max(1.0, math.Log10(math.Max(1.0, float64(n))))

would make it rise smoothly (but still not dropping below 1x if n < 10):

      n = 5       =>  timeout = 1 x suspicionMult x interval
      n = 10      =>  timeout = 1 x suspicionMult x interval
      n = 50      =>  timeout = 1.7 x suspicionMult x interval
      n = 100     =>  timeout = 2.0 x suspicionMult x interval
      n = 500     =>  timeout = 2.7 x suspicionMult x interval
      n = 1000    =>  timeout = 3.0 x suspicionMult x interval

It still guards against n == 0, which I believe was the intention of the +1 in (n + 1).
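A small self-contained sketch of the arithmetic above, comparing the stepwise and smooth scales side by side (this mirrors the formulas in the issue, not the library's unexported helper; imports omitted):

// stepwiseScale mirrors the current Ceil-based behaviour.
func stepwiseScale(n int) float64 {
	return math.Ceil(math.Log10(float64(n + 1)))
}

// smoothScale mirrors the proposed change: rises continuously with cluster
// size but never drops below 1.
func smoothScale(n int) float64 {
	return math.Max(1.0, math.Log10(math.Max(1.0, float64(n))))
}

// suspicionTimeouts returns the timeout under both formulas for comparison.
func suspicionTimeouts(suspicionMult, n int, interval time.Duration) (time.Duration, time.Duration) {
	stepwise := time.Duration(float64(suspicionMult) * stepwiseScale(n) * float64(interval))
	smooth := time.Duration(float64(suspicionMult) * smoothScale(n) * float64(interval))
	return stepwise, smooth
}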

Memberlist logger should be exported

Hi there,

I think the logger should be exported instead of the LogOutput field in the configuration. I'm using a custom logger with a custom format, but I have to hack together a custom io.Writer to make memberlist use it.

Is there functionality for rejoining the cluster when a node restarts?

Let's say a node is running on a computer that goes down for a given amount of time; is there functionality built into this library to rejoin the cluster it was part of?

The first thing that comes to mind is just saving one or more of the IPs of other nodes in a file, but there's probably a more intuitive way of doing this.

I know this seems like a bit of a lazy question; I did look through the code, I just want to make sure :)
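For what it's worth, memberlist itself doesn't persist peers (serf layers a snapshotter on top for this), so the file-of-addresses approach is reasonable. A hedged sketch, with error handling kept minimal and imports omitted:

// savePeers writes the current members' addresses to a file, one per line.
func savePeers(list *memberlist.Memberlist, path string) error {
	var addrs []string
	for _, m := range list.Members() {
		addrs = append(addrs, fmt.Sprintf("%s:%d", m.Addr, m.Port))
	}
	return os.WriteFile(path, []byte(strings.Join(addrs, "\n")), 0o600)
}

// rejoin reads the saved addresses back and attempts to join them.
func rejoin(list *memberlist.Memberlist, path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	_, err = list.Join(strings.Split(strings.TrimSpace(string(data)), "\n"))
	return err
}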

If one side's merge delegate fails, the other side can think it joined

While testing things that led to hashicorp/consul#3451, I realized that if one side's merge delegate fails, that side will refuse to let the new node join its side of the pool, but the other node will think it joined. We should make sure we propagate an error to the new node.

This makes transitions cleaner where the new node can get an error and exit, even if it doesn't have an identical merge delegate to the other side.

Can't join after leaving the cluster

Why can't a member rejoin a cluster after leaving it, unless I restart the member?

Can anyone help me? Thanks in advance.

memberlist.Join is slow if many hosts cannot be reached - running in serial?

I'm trying to use the start_join configuration on my Consul clients with a list of all the potential LAN server IPs (of which there are 33) as they are assigned via DHCP. I am running 3 Consul servers in that range.

I've noticed that the initial join is very slow with this many IPs, as if it is trying to join them in serial. It takes about a minute for it to check through 10 IP addresses and get to one that is actually hosting a server. I couldn't find any configuration in Consul that seemed relevant for reducing what I presume is a long connection timeout, or for trying to join the start_join addresses in parallel.

I think this is the relevant source code, though I am not very familiar with golang:

for _, exist := range existing {

It appears that this attempts to join the existing array in serial. Is that correct? If so, would it be reasonable to attempt to join these members in parallel?

Data Races

@slackpad a data race on Incarnation. Maybe you have an idea on how to address this:

WARNING: DATA RACE
Write at 0x00c424b902d0 by goroutine 5112:
  github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist.(*Memberlist).refute()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist/state.go:807 +0x7f
  github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist.(*Memberlist).suspectNode()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist/state.go:1033 +0xa41
  github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist.(*Memberlist).handleSuspect()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist/net.go:518 +0x297
  github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist.(*Memberlist).packetHandler()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist/net.go:384 +0x360

Previous read at 0x00c424b902d0 by goroutine 168:
  github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist.(*Memberlist).Leave()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist/memberlist.go:580 +0x269
  github.com/hashicorp/consul/vendor/github.com/hashicorp/serf/serf.(*Serf).Leave()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/serf/serf/serf.go:690 +0x2fc
  github.com/hashicorp/consul/agent/consul.(*Client).Leave()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/agent/consul/client.go:180 +0xf9
  github.com/hashicorp/consul/agent/consul.TestLeader_LeftMember()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/agent/consul/leader_test.go:178 +0x25b
  testing.tRunner()
      /Users/frank/go1.9/src/testing/testing.go:746 +0x16c

Goroutine 5112 (running) created at:
  github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist.newMemberlist()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist/memberlist.go:180 +0x895
  github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist.Create()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/memberlist/memberlist.go:190 +0x3c
  github.com/hashicorp/consul/vendor/github.com/hashicorp/serf/serf.Create()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/serf/serf/serf.go:395 +0x13b7
  github.com/hashicorp/consul/agent/consul.(*Client).setupSerf()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/agent/consul/client_serf.go:47 +0xe8b
  github.com/hashicorp/consul/agent/consul.NewClientLogger()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/agent/consul/client.go:138 +0xa80
  github.com/hashicorp/consul/agent/consul.NewClient()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/agent/consul/client.go:80 +0x41
  github.com/hashicorp/consul/agent/consul.testClientWithConfig()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/agent/consul/client_test.go:57 +0x78
  github.com/hashicorp/consul/agent/consul.testClient()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/agent/consul/client_test.go:39 +0x85
  github.com/hashicorp/consul/agent/consul.TestLeader_LeftMember()
      /Users/frank/go/consul/src/github.com/hashicorp/consul/agent/consul/leader_test.go:157 +0xc7

Ping a node that tries to join the cluster before letting it join

This problem happened in the context of Consul, but the issue is with memberlist inside of Consul.

When a single member of the cluster is unreachable but still able to send messages, it can pollute the gossip cluster with inaccurate and irrelevant information. This can easily happen by accident, for example by forgetting to open a port on an EC2 security group. The problem gets worse the more nodes in the cluster aren't cooperating.

When the appropriate node doesn't refute the message fast enough, it gets kicked from the cluster. For our high-availability components this isn't a big problem, albeit annoying, but for our single-instance nodes it causes a big problem for DNS.

This gets even worse when this uncooperative node just starts telling the cluster that one of the servers is dead. This can end up causing a leader election if the server doesn't refute the message fast enough.

It would be nice if, on join, memberlist attempted to detect a misbehaving node and rejected it before it can cause a cluster outage. This could be as simple as performing a ping and requiring a response. Since this would be done when joining the cluster, it's probably better to avoid probeNode (which sends other messages) and only send the ping message.

Clarify that it is not safe to call any of the methods of the Memberlist type inside the EventDelegate methods.

The problem I'm experiencing is that I call Memberlist.Members() inside the NotifyLeave delegate of that memberlist. This will block everything on the Memberlist.nodeLock.

My instance was waiting in the following places; I didn't have enough time to look for where the lock was actually held.
https://github.com/hashicorp/memberlist/blob/master/memberlist.go#L403
https://github.com/hashicorp/memberlist/blob/master/state.go#L298
https://github.com/hashicorp/memberlist/blob/master/state.go#L168
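A hedged sketch of the usual workaround: copy what you need out of the *Node inside the callback and do any calls back into the Memberlist from another goroutine, so the delegate returns without needing nodeLock (names are illustrative, imports omitted):

// leaveLogger is an illustrative memberlist.EventDelegate that never calls
// back into the Memberlist from inside a notification.
type leaveLogger struct {
	left chan string // node names, consumed elsewhere
}

func (d *leaveLogger) NotifyJoin(n *memberlist.Node)   {}
func (d *leaveLogger) NotifyUpdate(n *memberlist.Node) {}

func (d *leaveLogger) NotifyLeave(n *memberlist.Node) {
	// Only copy data out; calling list.Members() here can block on nodeLock.
	select {
	case d.left <- n.Name:
	default: // drop rather than block the gossip goroutine
	}
}

The consumer reading from d.left can then safely call list.Members() or anything else on the Memberlist.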

IPv6 addresses with zones not supported in resolveAddr()

An IPv6 address can have a "%zone" section on the end, where the zone is (typically) an interface name. This is mandatory when link-local addresses are in use (which can be desirable if one wants a configuration-free transport with non-routable addresses).

Note that net.SplitHostPort() puts the zone on the host string it emits, and that any such value should eventually be in the Zone element of any net.IPAddr struct or derivative.

At present, resolveAddr() yields a "no such host" for such addresses.

memberlist without tcp

I'd like to use the memberlist library with UDP only.
My protocol will use the SendToUDP() interface exclusively and I'd like the membership list returned to report only UDP connectivity.
It appears that some TCP-related actions (opening a port, DNS, etc.) are performed regardless of the value of DisableTcpPings.
If I work on a patch to exclude all TCP actions when DisableTcpPings is true, would it be possible to merge it upstream?
Alan

What happens to the TCP ackRespMsg?

In net.go at line 256: if a TCP ping packet is received, a TCP ack packet is sent back, but nothing handles that ack message! Only the UDP ping and ack packets are handled.

queue size for nodes that leave the cluster

I want to use the gossip protocol for my hand-made DNS servers to speed up updating zones (instead of waiting for the TTL to expire). If a node leaves the cluster, how many messages can it receive after re-joining?
How large a queue do I need for this?

Question: Roles of nodes in memberlist

Hi @armon and @mitchellh ,

Do you think it would be possible to designate a role or additional metadata for a node when it joins a cluster? I understand that it's currently not possible, but would you be interested in a PR that allows a user to configure the role of a node and query memberlist for nodes with specific metadata?

We are currently using the name field, packing extra information into the name and parsing it to decipher the roles of various nodes in a cluster.
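For context, memberlist's Delegate does let each node advertise a small metadata blob: the value returned by NodeMeta(limit) is gossiped with the node and shows up as Node.Meta on peers, which is essentially how serf implements tags. A hedged sketch of encoding a role that way (JSON is just one possible encoding; imports omitted):

// roleDelegate is an illustrative memberlist.Delegate that advertises a role.
type roleDelegate struct {
	role string
}

// NodeMeta must return at most `limit` bytes; the blob is gossiped with the
// node and exposed as Node.Meta on peers.
func (d *roleDelegate) NodeMeta(limit int) []byte {
	meta, _ := json.Marshal(map[string]string{"role": d.role})
	if len(meta) > limit {
		return nil
	}
	return meta
}

func (d *roleDelegate) NotifyMsg([]byte)                           {}
func (d *roleDelegate) GetBroadcasts(overhead, limit int) [][]byte { return nil }
func (d *roleDelegate) LocalState(join bool) []byte                { return nil }
func (d *roleDelegate) MergeRemoteState(buf []byte, join bool)     {}

// findByRole scans the member list for nodes advertising a given role.
func findByRole(list *memberlist.Memberlist, role string) []*memberlist.Node {
	var out []*memberlist.Node
	for _, m := range list.Members() {
		var meta map[string]string
		if json.Unmarshal(m.Meta, &meta) == nil && meta["role"] == role {
			out = append(out, m)
		}
	}
	return out
}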

User broadcasts are not propagated through the network

Maybe I'm missing something, but I've found that when I return a broadcast message from a custom-defined Delegate, the message is sent to only one other node, and from that node it is not propagated to the other nodes.

So I need to write some kind of propagation mechanism to achieve this in an ugly way.

Is it a bug or a feature?

Failed UDP Ping

I have two nodes running locally on my machine.
One uses the default memberlist port, and the other is assigned the first free port.

I'm seeing this in the logs:

2016/11/11 21:54:50 [DEBUG] memberlist: Initiating push/pull sync with: 127.0.0.1:7946
2016/11/11 21:54:50 Member: seed_se-lpt-0002:7946 169.254.87.13
2016/11/11 21:54:50 Member: member_se-lpt-0002:28210 169.254.87.13
Running
2016/11/11 21:54:51 [DEBUG] memberlist: Failed UDP ping: seed_se-lpt-0002:7946 (timeout reached)
2016/11/11 21:54:52 [INFO] memberlist: Suspect seed_se-lpt-0002:7946 has failed, no acks received
2016/11/11 21:54:53 [DEBUG] memberlist: Failed UDP ping: seed_se-lpt-0002:7946 (timeout reached)
2016/11/11 21:54:55 [INFO] memberlist: Suspect seed_se-lpt-0002:7946 has failed, no acks received
2016/11/11 21:54:55 [INFO] memberlist: Marking seed_se-lpt-0002:7946 as failed, suspect timeout reached (0 peer confirmations)

Why are UDP pings failing even though both nodes run on the same machine?
The nodes can clearly see each other early on, but then drop contact due to missing acks.

Any ideas why?

Failing tests

Hi,

I have some tests failing while building this package for Debian. There are a few timing-dependent tests that I had to either disable or whose timeouts I had to increase heavily, and one where the results come back unordered, so I had to sort the messages before comparing.

Here is my current diff:

--- a/memberlist_test.go
+++ b/memberlist_test.go
@@ -731,7 +731,7 @@
 
 	m1.Shutdown()
 
-	time.Sleep(10 * time.Millisecond)
+	time.Sleep(10000 * time.Millisecond)
 
 	if len(m2.Members()) != 1 {
 		t.Fatalf("should have 1 nodes! %v", m2.Members())
@@ -925,7 +925,7 @@
 	}
 
 	// Wait for a little while
-	time.Sleep(3 * time.Millisecond)
+	time.Sleep(300 * time.Millisecond)
 
 	// Ensure we got the messages
 	if len(d1.msgs) != 2 {
@@ -1099,6 +1099,7 @@
 			t.Fatalf("unexpected err: %s", err)
 		}
 
+		time.Sleep(300 * time.Millisecond)
 		// Check the hosts
 		if len(m2.Members()) != 2 {
 			t.Fatalf("should have 2 nodes! %v", m2.Members())
@@ -1295,6 +1296,7 @@
 	}
 
 	yield()
+	time.Sleep(300 * time.Millisecond)
 
 	// Ensure we were notified
 	if mock.other == nil {
--- a/state_test.go
+++ b/state_test.go
@@ -602,6 +602,7 @@
 }
 
 func TestMemberList_ProbeNode_Awareness_MissedNack(t *testing.T) {
+	t.Skip("Skipping timing-dependent test.")
 	addr1 := getBindAddr()
 	addr2 := getBindAddr()
 	addr3 := getBindAddr()
@@ -1771,7 +1772,7 @@
 	for i := 0; i < 2; i++ {
 		select {
 		case <-ch:
-		case <-time.After(10 * time.Millisecond):
+		case <-time.After(10000 * time.Millisecond):
 			t.Fatalf("timeout")
 		}
 	}
--- a/transport_test.go
+++ b/transport_test.go
@@ -1,7 +1,8 @@
 package memberlist
 
 import (
-	"bytes"
+	"sort"
+	"strings"
 	"testing"
 	"time"
 )
@@ -116,9 +117,14 @@
 	}
 	time.Sleep(100 * time.Millisecond)
 
-	received := bytes.Join(d1.msgs, []byte("|"))
-	expected := []byte("SendTo|SendToUDP|SendToTCP|SendBestEffort|SendReliable")
-	if !bytes.Equal(received, expected) {
+	msgs := make([]string, len(d1.msgs))
+	for i := range(d1.msgs) {
+		msgs[i] = string(d1.msgs[i][:])
+	}
+	sort.Strings(msgs)
+	received := strings.Join(msgs, "|")
+	expected := string("SendBestEffort|SendReliable|SendTo|SendToTCP|SendToUDP")
+	if received != expected {
 		t.Fatalf("bad: %s", received)
 	}
 }
--- a/suspicion_test.go
+++ b/suspicion_test.go
@@ -30,6 +30,7 @@
 }
 
 func TestSuspicion_Timer(t *testing.T) {
+	t.Skip("Skipping timing-dependent test.")
 	const k = 3
 	const min = 500 * time.Millisecond
 	const max = 2 * time.Second

Race found with Node.PMax

Hi hashis,

Found this data race while using memberlist as of commit 23ad4b7:

WARNING: DATA RACE
Read at 0x00c4211acf49 by goroutine 660:
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.(*Memberlist).rawSendMsgUDP()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/net.go:662 +0x63b
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.(*Memberlist).gossip()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/state.go:509 +0x733
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.(*Memberlist).(github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.gossip)-fm()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/state.go:94 +0x41
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.(*Memberlist).triggerFunc()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/state.go:118 +0x159

Previous write at 0x00c4211acf49 by goroutine 705:
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.(*Memberlist).aliveNode()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/state.go:952 +0xef2
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.(*Memberlist).handleAlive()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/net.go:558 +0x32f
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.(*Memberlist).udpHandler()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/net.go:410 +0x308

Goroutine 660 (running) created at:
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.(*Memberlist).schedule()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/state.go:94 +0x423
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.Create()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/memberlist.go:164 +0xcd
  github.com/fastly/checkup/vendor/github.com/hashicorp/serf/serf.Create()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/serf/serf/serf.go:398 +0x13d5
  github.com/fastly/checkup/cluster.New()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/cluster/cluster.go:126 +0x6e0
  github.com/fastly/checkup/checkup.New()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/checkup/checkup.go:206 +0x13a7
  github.com/fastly/checkup/test.StartCheckup()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/test/helper.go:62 +0x36f
  command-line-arguments_test.startManyCheckupsAndFakeVarnishes()
      /home/vagrant/devel/go-workspace/checkup/fakevarnish/scale_test.go:333 +0x534
  command-line-arguments_test.TestManyCheckupsWithDifferentBackends()
      /home/vagrant/devel/go-workspace/checkup/fakevarnish/scale_test.go:166 +0x20f
  testing.tRunner()
      /home/vagrant/devel/go-workspace/go/src/testing/testing.go:657 +0x107

Goroutine 705 (running) created at:
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.newMemberlist()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/memberlist.go:146 +0xde0
  github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist.Create()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/memberlist/memberlist.go:156 +0x3c
  github.com/fastly/checkup/vendor/github.com/hashicorp/serf/serf.Create()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/vendor/github.com/hashicorp/serf/serf/serf.go:398 +0x13d5
  github.com/fastly/checkup/cluster.New()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/cluster/cluster.go:126 +0x6e0
  github.com/fastly/checkup/checkup.New()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/checkup/checkup.go:206 +0x13a7
  github.com/fastly/checkup/test.StartCheckup()
      /home/vagrant/devel/go-workspace/src/github.com/fastly/checkup/test/helper.go:62 +0x36f
  command-line-arguments_test.startManyCheckupsAndFakeVarnishes()
      /home/vagrant/devel/go-workspace/checkup/fakevarnish/scale_test.go:333 +0x534
  command-line-arguments_test.TestManyCheckupsWithDifferentBackends()
      /home/vagrant/devel/go-workspace/checkup/fakevarnish/scale_test.go:166 +0x20f
  testing.tRunner()
      /home/vagrant/devel/go-workspace/go/src/testing/testing.go:657 +0x107
==================

I'm not sure if nodeLock (nodeLock sync.RWMutex) is intended to protect just nodeMap in aliveNode(), or also the nodeState / Nodes from the map and their data (such as PMax).

Looks like potential races also in probeNode:
https://github.com/hashicorp/memberlist/blob/master/state.go#L326
https://github.com/hashicorp/memberlist/blob/master/state.go#L347
