I'm running into issues when doing a rolling update of my ringpop cluster.
During a rolling update, the old ringpop services are stopped one by one and new ringpop services are started with different IPs. When a new ringpop service starts, the hosts list it bootstraps from may therefore contain a mix of old and new IPs.
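For context, the bootstrap on each node is wired up roughly like the minimal sketch below. It assumes ringpop-go with the statichosts discover provider; the app name and host list are placeholders, and the 120s MaxJoinDuration matches the maxJoinDuration value in the logs:

```go
package main

import (
	"log"
	"time"

	"github.com/uber/ringpop-go"
	"github.com/uber/ringpop-go/discovery/statichosts"
	"github.com/uber/ringpop-go/swim"
	"github.com/uber/tchannel-go"
)

func main() {
	// Gossip address this node announces to the ring (placeholder).
	const gossipAddr = "10.244.3.5:18080"

	ch, err := tchannel.NewChannel("myapp", nil)
	if err != nil {
		log.Fatalf("could not create channel: %v", err)
	}
	if err := ch.ListenAndServe(gossipAddr); err != nil {
		log.Fatalf("could not listen on %s: %v", gossipAddr, err)
	}

	rp, err := ringpop.New("myapp", ringpop.Channel(ch))
	if err != nil {
		log.Fatalf("could not create ringpop: %v", err)
	}

	// During a rolling update this static list can still contain IPs of
	// pods that were already torn down, so joins against them never succeed.
	opts := &swim.BootstrapOptions{
		DiscoverProvider: statichosts.New("10.244.3.5:18080", "10.244.1.7:18080"),
		MaxJoinDuration:  120 * time.Second, // maxJoinDuration=120000000000 in the logs
	}
	if _, err := rp.Bootstrap(opts); err != nil {
		// This is the "ringpop bootstrap failed" fatal seen below.
		log.Fatalf("ringpop bootstrap failed: %v", err)
	}
}
```

The first new node (10.244.3.5) never manages to join a peer, and its bootstrap times out: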
{"level":"info","msg":"GossipAddr: 10.244.3.5:18080","time":"2016-08-18T17:43:46Z"}
{"level":"error","msg":"unable to count members of the ring for statting: \"ringpop is not bootstrapped\"","time":"2016-08-18T17:43:46Z"}
{"cappedDelay":60000,"initialDelay":100000000,"jitteredDelay":58434,"level":"warning","local":"10.244.3.5:18080","maxDelay":60000000000,"minDelay":51200,"msg":"ringpop join attempt delay reached max","numDelays":10,"time":"2016-08-18T17:45:01Z","uncappedDelay":102400}
{"joinDuration":134374138254,"level":"warning","local":"10.244.3.5:18080","maxJoinDuration":120000000000,"msg":"max join duration exceeded","numFailed":12,"numJoined":0,"startTime":"2016-08-18T17:43:46.377647091Z","time":"2016-08-18T17:46:00Z"}
{"err":"join duration of 2m14.374138254s exceeded max 2m0s","level":"error","local":"10.244.3.5:18080","msg":"bootstrap failed","time":"2016-08-18T17:46:00Z"}
{"error":"join duration of 2m14.374138254s exceeded max 2m0s","level":"info","msg":"bootstrap failed","time":"2016-08-18T17:46:00Z"}
{"level":"fatal","msg":"[ringpop bootstrap failed: join duration of 2m14.374138254s exceeded max 2m0s]","time":"2016-08-18T17:46:00Z"}
The other new node (10.244.1.7) reports a successful bootstrap against the first one, but then every periodic health check and heal attempt against it fails:
{"level":"info","msg":"GossipAddr: 10.244.1.7:18080","time":"2016-08-18T17:43:46Z"}
{"level":"error","msg":"unable to count members of the ring for statting: \"ringpop is not bootstrapped\"","time":"2016-08-18T17:43:46Z"}
{"level":"error","msg":"unable to count members of the ring for statting: \"ringpop is not bootstrapped\"","time":"2016-08-18T17:43:46Z"}
{"joined":["10.244.3.5:18080","10.244.3.5:18080"],"level":"info","msg":"bootstrap complete","time":"2016-08-18T17:43:49Z"}
{"level":"info","msg":"Running on :8080 using 1 processes","time":"2016-08-18T17:43:49Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"attempt heal","target":"10.244.0.6:18080","time":"2016-08-18T17:43:49Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"ping request target unreachable","target":"10.244.3.5:18080","time":"2016-08-18T17:43:49Z"}
{"error":"join timed out","failure":0,"level":"warning","local":"10.244.1.7:18080","msg":"heal attempt failed (10 in total)","time":"2016-08-18T17:43:50Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"ping request target unreachable","target":"10.244.3.5:18080","time":"2016-08-18T17:43:50Z"}
{"level":"info","local":"10.244.1.7:18080","member":"10.244.3.5:18080","msg":"executing scheduled transition for member","state":"suspect","time":"2016-08-18T17:43:54Z"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:43:54Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"attempt heal","target":"10.244.3.5:18080","time":"2016-08-18T17:44:20Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"reincarnate nodes before we can merge the partitions","target":"10.244.3.5:18080","time":"2016-08-18T17:44:20Z"}
{"error":"JSON call failed: map[type:error message:node is not ready to handle requests]","failure":0,"level":"warning","local":"10.244.1.7:18080","msg":"heal attempt failed (10 in total)","time":"2016-08-18T17:44:20Z"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:44:20Z"}
{"latency":"1.323232ms","level":"info","method":"GET","msg":"","request_id":"e21bcc3f05fa04449ab4b9f0520e0933","time":"2016-08-18T17:44:30Z","url":"/_internal/cluster-info"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:44:30Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"attempt heal","target":"10.244.3.5:18080","time":"2016-08-18T17:44:50Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"reincarnate nodes before we can merge the partitions","target":"10.244.3.5:18080","time":"2016-08-18T17:44:50Z"}
{"error":"JSON call failed: map[message:node is not ready to handle requests type:error]","failure":0,"level":"warning","local":"10.244.1.7:18080","msg":"heal attempt failed (10 in total)","time":"2016-08-18T17:44:50Z"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:44:50Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"attempt heal","target":"10.244.3.5:18080","time":"2016-08-18T17:45:20Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"reincarnate nodes before we can merge the partitions","target":"10.244.3.5:18080","time":"2016-08-18T17:45:20Z"}
{"error":"JSON call failed: map[type:error message:node is not ready to handle requests]","failure":0,"level":"warning","local":"10.244.1.7:18080","msg":"heal attempt failed (10 in total)","time":"2016-08-18T17:45:20Z"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:45:20Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"attempt heal","target":"10.244.3.5:18080","time":"2016-08-18T17:45:50Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"reincarnate nodes before we can merge the partitions","target":"10.244.3.5:18080","time":"2016-08-18T17:45:50Z"}
{"error":"JSON call failed: map[type:error message:node is not ready to handle requests]","failure":0,"level":"warning","local":"10.244.1.7:18080","msg":"heal attempt failed (10 in total)","time":"2016-08-18T17:45:50Z"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:45:50Z"}
Eventually the first pod times out, is restarted by the cluster manager, and then successfully connects to the second pod.
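One possible mitigation (an untested sketch of my own, not something from the ringpop docs): use a discover provider that re-resolves the current pod IPs on every call, plus an application-level retry around Bootstrap, so a node that comes up mid-rollout eventually joins on its own instead of dying and waiting for the cluster manager to restart it. The DNS name and retry timings below are hypothetical, and whether Bootstrap can safely be re-invoked after a failed attempt may depend on the ringpop-go version:

```go
package discover

import (
	"fmt"
	"log"
	"net"
	"time"

	"github.com/uber/ringpop-go"
	"github.com/uber/ringpop-go/swim"
)

// DNSProvider satisfies ringpop-go's discovery.DiscoverProvider
// interface (Hosts() ([]string, error)). It resolves a DNS name on
// every call, so each join attempt sees the pods that exist right now
// instead of a host list frozen at startup.
type DNSProvider struct {
	service string // hypothetical headless-service name, e.g. "myapp-gossip"
	port    int    // gossip port, 18080 in the logs above
}

func (p *DNSProvider) Hosts() ([]string, error) {
	ips, err := net.LookupIP(p.service)
	if err != nil {
		return nil, err
	}
	hosts := make([]string, 0, len(ips))
	for _, ip := range ips {
		hosts = append(hosts, fmt.Sprintf("%s:%d", ip, p.port))
	}
	return hosts, nil
}

// BootstrapWithRetry retries failed bootstraps in-process instead of
// treating "max join duration exceeded" as fatal, so a node started
// mid-rollout can outlive the window where only stale IPs resolve.
func BootstrapWithRetry(rp *ringpop.Ringpop, provider *DNSProvider) {
	opts := &swim.BootstrapOptions{
		DiscoverProvider: provider,
		MaxJoinDuration:  30 * time.Second, // fail fast per attempt (assumption)
	}
	for attempt := 1; ; attempt++ {
		_, err := rp.Bootstrap(opts)
		if err == nil {
			return
		}
		log.Printf("bootstrap attempt %d failed: %v; retrying", attempt, err)
		time.Sleep(5 * time.Second)
	}
}
```

The Hosts() ([]string, error) interface is the same one statichosts implements, so this plugs into the same DiscoverProvider field of BootstrapOptions as in the sketch above.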