I'm running into issues when doing a rolling update of my ringpop cluster.
During a rolling update, the old ringpop services are stopped one by one and new ringpop services are started with different IPs. When a new ringpop service starts, the hosts list it bootstraps from may therefore contain a mix of old and new IPs.
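For context, the bootstrap on each node is wired up roughly like the minimal sketch below. It assumes ringpop-go with the statichosts discover provider; the app name and host list are placeholders, and the 120s MaxJoinDuration matches the maxJoinDuration value in the logs:

```go
package main

import (
	"log"
	"time"

	"github.com/uber/ringpop-go"
	"github.com/uber/ringpop-go/discovery/statichosts"
	"github.com/uber/ringpop-go/swim"
	"github.com/uber/tchannel-go"
)

func main() {
	// Gossip address this node announces to the ring (placeholder).
	const gossipAddr = "10.244.3.5:18080"

	ch, err := tchannel.NewChannel("myapp", nil)
	if err != nil {
		log.Fatalf("could not create channel: %v", err)
	}
	if err := ch.ListenAndServe(gossipAddr); err != nil {
		log.Fatalf("could not listen on %s: %v", gossipAddr, err)
	}

	rp, err := ringpop.New("myapp", ringpop.Channel(ch))
	if err != nil {
		log.Fatalf("could not create ringpop: %v", err)
	}

	// During a rolling update this static list can still contain IPs of
	// pods that were already torn down, so joins against them never succeed.
	opts := &swim.BootstrapOptions{
		DiscoverProvider: statichosts.New("10.244.3.5:18080", "10.244.1.7:18080"),
		MaxJoinDuration:  120 * time.Second, // maxJoinDuration=120000000000 in the logs
	}
	if _, err := rp.Bootstrap(opts); err != nil {
		// This is the "ringpop bootstrap failed" fatal seen below.
		log.Fatalf("ringpop bootstrap failed: %v", err)
	}
}
```

The first new node (10.244.3.5) never manages to join a peer, and its bootstrap times out: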
{"level":"info","msg":"GossipAddr: 10.244.3.5:18080","time":"2016-08-18T17:43:46Z"}
{"level":"error","msg":"unable to count members of the ring for statting: \"ringpop is not bootstrapped\"","time":"2016-08-18T17:43:46Z"}
{"cappedDelay":60000,"initialDelay":100000000,"jitteredDelay":58434,"level":"warning","local":"10.244.3.5:18080","maxDelay":60000000000,"minDelay":51200,"msg":"ringpop join attempt delay reached max","numDelays":10,"time":"2016-08-18T17:45:01Z","uncappedDelay":102400}
{"joinDuration":134374138254,"level":"warning","local":"10.244.3.5:18080","maxJoinDuration":120000000000,"msg":"max join duration exceeded","numFailed":12,"numJoined":0,"startTime":"2016-08-18T17:43:46.377647091Z","time":"2016-08-18T17:46:00Z"}
{"err":"join duration of 2m14.374138254s exceeded max 2m0s","level":"error","local":"10.244.3.5:18080","msg":"bootstrap failed","time":"2016-08-18T17:46:00Z"}
{"error":"join duration of 2m14.374138254s exceeded max 2m0s","level":"info","msg":"bootstrap failed","time":"2016-08-18T17:46:00Z"}
{"level":"fatal","msg":"[ringpop bootstrap failed: join duration of 2m14.374138254s exceeded max 2m0s]","time":"2016-08-18T17:46:00Z"}
The other new node (10.244.1.7) reports a successful bootstrap against the first one, but then every periodic health check and heal attempt against it fails:
{"level":"info","msg":"GossipAddr: 10.244.1.7:18080","time":"2016-08-18T17:43:46Z"}
{"level":"error","msg":"unable to count members of the ring for statting: \"ringpop is not bootstrapped\"","time":"2016-08-18T17:43:46Z"}
{"level":"error","msg":"unable to count members of the ring for statting: \"ringpop is not bootstrapped\"","time":"2016-08-18T17:43:46Z"}
{"joined":["10.244.3.5:18080","10.244.3.5:18080"],"level":"info","msg":"bootstrap complete","time":"2016-08-18T17:43:49Z"}
{"level":"info","msg":"Running on :8080 using 1 processes","time":"2016-08-18T17:43:49Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"attempt heal","target":"10.244.0.6:18080","time":"2016-08-18T17:43:49Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"ping request target unreachable","target":"10.244.3.5:18080","time":"2016-08-18T17:43:49Z"}
{"error":"join timed out","failure":0,"level":"warning","local":"10.244.1.7:18080","msg":"heal attempt failed (10 in total)","time":"2016-08-18T17:43:50Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"ping request target unreachable","target":"10.244.3.5:18080","time":"2016-08-18T17:43:50Z"}
{"level":"info","local":"10.244.1.7:18080","member":"10.244.3.5:18080","msg":"executing scheduled transition for member","state":"suspect","time":"2016-08-18T17:43:54Z"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:43:54Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"attempt heal","target":"10.244.3.5:18080","time":"2016-08-18T17:44:20Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"reincarnate nodes before we can merge the partitions","target":"10.244.3.5:18080","time":"2016-08-18T17:44:20Z"}
{"error":"JSON call failed: map[type:error message:node is not ready to handle requests]","failure":0,"level":"warning","local":"10.244.1.7:18080","msg":"heal attempt failed (10 in total)","time":"2016-08-18T17:44:20Z"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:44:20Z"}
{"latency":"1.323232ms","level":"info","method":"GET","msg":"","request_id":"e21bcc3f05fa04449ab4b9f0520e0933","time":"2016-08-18T17:44:30Z","url":"/_internal/cluster-info"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:44:30Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"attempt heal","target":"10.244.3.5:18080","time":"2016-08-18T17:44:50Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"reincarnate nodes before we can merge the partitions","target":"10.244.3.5:18080","time":"2016-08-18T17:44:50Z"}
{"error":"JSON call failed: map[message:node is not ready to handle requests type:error]","failure":0,"level":"warning","local":"10.244.1.7:18080","msg":"heal attempt failed (10 in total)","time":"2016-08-18T17:44:50Z"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:44:50Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"attempt heal","target":"10.244.3.5:18080","time":"2016-08-18T17:45:20Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"reincarnate nodes before we can merge the partitions","target":"10.244.3.5:18080","time":"2016-08-18T17:45:20Z"}
{"error":"JSON call failed: map[type:error message:node is not ready to handle requests]","failure":0,"level":"warning","local":"10.244.1.7:18080","msg":"heal attempt failed (10 in total)","time":"2016-08-18T17:45:20Z"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:45:20Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"attempt heal","target":"10.244.3.5:18080","time":"2016-08-18T17:45:50Z"}
{"level":"info","local":"10.244.1.7:18080","msg":"reincarnate nodes before we can merge the partitions","target":"10.244.3.5:18080","time":"2016-08-18T17:45:50Z"}
{"error":"JSON call failed: map[type:error message:node is not ready to handle requests]","failure":0,"level":"warning","local":"10.244.1.7:18080","msg":"heal attempt failed (10 in total)","time":"2016-08-18T17:45:50Z"}
{"level":"warning","local":"10.244.1.7:18080","msg":"no pingable members","time":"2016-08-18T17:45:50Z"}
Eventually the first pod times out, is restarted by the cluster manager, and then successfully connects to the second pod.
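One possible mitigation (an untested sketch of my own, not something from the ringpop docs): use a discover provider that re-resolves the current pod IPs on every call, plus an application-level retry around Bootstrap, so a node that comes up mid-rollout eventually joins on its own instead of dying and waiting for the cluster manager to restart it. The DNS name and retry timings below are hypothetical, and whether Bootstrap can safely be re-invoked after a failed attempt may depend on the ringpop-go version:

```go
package discover

import (
	"fmt"
	"log"
	"net"
	"time"

	"github.com/uber/ringpop-go"
	"github.com/uber/ringpop-go/swim"
)

// DNSProvider satisfies ringpop-go's discovery.DiscoverProvider
// interface (Hosts() ([]string, error)). It resolves a DNS name on
// every call, so each join attempt sees the pods that exist right now
// instead of a host list frozen at startup.
type DNSProvider struct {
	service string // hypothetical headless-service name, e.g. "myapp-gossip"
	port    int    // gossip port, 18080 in the logs above
}

func (p *DNSProvider) Hosts() ([]string, error) {
	ips, err := net.LookupIP(p.service)
	if err != nil {
		return nil, err
	}
	hosts := make([]string, 0, len(ips))
	for _, ip := range ips {
		hosts = append(hosts, fmt.Sprintf("%s:%d", ip, p.port))
	}
	return hosts, nil
}

// BootstrapWithRetry retries failed bootstraps in-process instead of
// treating "max join duration exceeded" as fatal, so a node started
// mid-rollout can outlive the window where only stale IPs resolve.
func BootstrapWithRetry(rp *ringpop.Ringpop, provider *DNSProvider) {
	opts := &swim.BootstrapOptions{
		DiscoverProvider: provider,
		MaxJoinDuration:  30 * time.Second, // fail fast per attempt (assumption)
	}
	for attempt := 1; ; attempt++ {
		_, err := rp.Bootstrap(opts)
		if err == nil {
			return
		}
		log.Printf("bootstrap attempt %d failed: %v; retrying", attempt, err)
		time.Sleep(5 * time.Second)
	}
}
```

The Hosts() ([]string, error) interface is the same one statichosts implements, so this plugs into the same DiscoverProvider field of BootstrapOptions as in the sketch above.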