
ringpop-node's Introduction

ringpop-node

(This project is no longer under active development.)

Ringpop is a library that brings cooperation and coordination to distributed applications. It maintains a consistent hash ring on top of a membership protocol and provides request forwarding as a routing convenience. It can be used to shard your application in a way that's scalable and fault tolerant.
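The following is a minimal consistent-hash ring with replica points and a handle-or-forward decision, included to make the hash-ring idea concrete. It is not Ringpop's implementation. MD5 stands in for the farmhash dependency, and SimpleRing, route and the '#' separator are hypothetical names chosen for illustration.

```javascript
// Minimal consistent-hash ring sketch (not ringpop's implementation).
// Assumptions: MD5 stands in for farmhash; SimpleRing, route and the '#'
// separator between server name and replica index are hypothetical.
var crypto = require('crypto');

function hash32(str) {
    // First 4 bytes of the MD5 digest as an unsigned 32-bit integer.
    return crypto.createHash('md5').update(str).digest().readUInt32BE(0);
}

function SimpleRing(replicaPoints) {
    this.replicaPoints = replicaPoints;
    this.points = []; // kept sorted by hash: { hash, server }
}

SimpleRing.prototype.addServer = function addServer(server) {
    for (var i = 0; i < this.replicaPoints; i++) {
        this.points.push({ hash: hash32(server + '#' + i), server: server });
    }
    this.points.sort(function byHash(a, b) { return a.hash - b.hash; });
};

// Assumes a non-empty ring.
SimpleRing.prototype.lookup = function lookup(key) {
    var h = hash32(key);
    // Binary search for the first point whose hash is >= h.
    var lo = 0;
    var hi = this.points.length;
    while (lo < hi) {
        var mid = (lo + hi) >>> 1;
        if (this.points[mid].hash < h) lo = mid + 1; else hi = mid;
    }
    // Past the last point, wrap around to the first one.
    return this.points[lo % this.points.length].server;
};

// Handle the key locally if this node owns it, otherwise forward it.
function route(ring, me, key) {
    var owner = ring.lookup(key);
    return owner === me ? { action: 'handle' } : { action: 'forward', to: owner };
}
```

Replica points spread each server over many positions, so adding or removing one server only moves the keys adjacent to its points.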

Requirements

  • Node 0.10 (0.10.32 or higher)

Installation

To install Ringpop for usage as a library:

npm install ringpop

Prepare the current directory for development:

npm install

To run the tests, make sure your open file limit is at least 4096:

ulimit -n 4096

Tick Cluster

An example application, tools/tick-cluster.js, is included in the ringpop-common repository. It launches a ringpop cluster of a given size. Using it is the quickest way to start a ringpop cluster.

git clone https://github.com/uber/ringpop-common.git
./ringpop-common/tools/tick-cluster.js --interpreter node main.js

Example

Run a 2-node Ringpop cluster from the command line. Install Ringpop and TChannel, paste the code below into your editor, and run it!

var Ringpop = require('ringpop');
var TChannel = require('tchannel');

function Cluster(opts) {
    this.name = opts.name;
    this.size = opts.size;
    this.basePort = opts.basePort;
    this.bootstrapNodes = [];

    // Create the bootstrap list of nodes that'll
    // be used to seed Ringpop for its join request.
    for (var i = 0; i < this.size; i++) {
        this.bootstrapNodes.push('127.0.0.1:' + (this.basePort + i));
    }
}

Cluster.prototype.launch = function launch(callback) {
    var self = this;
    var done = after(self.size, callback);

    for (var i = 0; i < this.size; i++) {
        var addr = this.bootstrapNodes[i];
        var addrParts = addr.split(':');

        var tchannel = new TChannel();
        var ringpop = new Ringpop({
            app: this.name,
            hostPort: addr,
            channel: tchannel.makeSubChannel({
                serviceName: 'ringpop',
                trace: false
            })
        });
        ringpop.setupChannel();

        // First make sure TChannel is accepting connections.
        tchannel.listen(+addrParts[1], addrParts[0], listenCb(ringpop));
    }


    function listenCb(ringpop) {
        // When TChannel is listening, bootstrap Ringpop. It'll
        // try to join its friends in the bootstrap list.
        return function onListen() {
            ringpop.bootstrap(self.bootstrapNodes, done);
        };
    }
};

// IGNORE THIS! It's a little utility function that invokes
// a callback after a specified number of invocations
// of its shim.
function after(count, callback) {
    var countdown = count;

    return function shim(err) {
        if (typeof callback !== 'function') return;

        if (err) {
            callback(err);
            callback = null;
            return;
        }

        if (--countdown === 0) {
            callback();
            callback = null;
        }
    };
}

if (require.main === module) {
    // Launch a Ringpop cluster of arbitrary size.
    var cluster = new Cluster({
        name: 'mycluster',
        size: 2,
        basePort: 3000
    });

    // When all nodes have been bootstrapped, your
    // Ringpop cluster will be ready for use.
    cluster.launch(function onLaunch(err) {
        if (err) {
            console.error('Error: failed to launch cluster');
            process.exit(1);
        }

        console.log('Ringpop cluster is ready!');
    });
}

Documentation

Interested in where to go from here? Read the docs at ringpop.readthedocs.org.

ringpop-node's People

Contributors

alexhauser23, benfleis, bhenhsi, btromanova, charliezhang, corgiman, dansimau, davejn, jcorbin, jonjs, jwolski, jwolski2, kriskowal, lupie, markyen, mennopruijssers, motiejus, raynos, sashahilton00, severb, shannili, thanodnl, toddsifleet, uberesch, weikai77, yulunli

ringpop-node's Issues

Use logical clocks for incarnation numbers

Using the system clock to represent a member's incarnation number can lead to mishaps: system clocks are susceptible to misconfiguration and drift. Replace them with logical clocks.
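The following shows what counter-based incarnation numbers could look like. It assumes SWIM-style override rules: a higher incarnation wins, and at equal incarnation suspect beats alive and faulty beats both. LocalMember, refute and overrides are hypothetical names, and the rules are simplified compared with the full SWIM paper.

```javascript
// Sketch: logical (counter-based) incarnation numbers instead of wall-clock
// timestamps. Names are hypothetical; override rules are simplified SWIM:
// a higher incarnation wins, and at equal incarnation the "worse" status wins.
var STATUS_RANK = { alive: 0, suspect: 1, faulty: 2 };

function LocalMember(address) {
    this.address = address;
    this.incarnation = 0; // starts at 0, never read from the system clock
}

// Called when this member learns that others suspect it: bump the counter
// past the suspected incarnation so the refuting 'alive' update wins.
LocalMember.prototype.refute = function refute(suspectedIncarnation) {
    this.incarnation = Math.max(this.incarnation, suspectedIncarnation) + 1;
    return { address: this.address, status: 'alive', incarnation: this.incarnation };
};

// Returns true if update `next` should replace the currently known state `cur`.
function overrides(next, cur) {
    if (next.incarnation !== cur.incarnation) {
        return next.incarnation > cur.incarnation;
    }
    return STATUS_RANK[next.status] > STATUS_RANK[cur.status];
}
```

Only the member itself ever increments its own counter, so no clock agreement between hosts is required.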

Catching handleOrProxy errors when request is sent to node that has failed.

While testing Ringpop in a small cluster, I have been checking how it handles node failures. When a node goes down and a request is handleOrProxy()'d, there is a period of approximately 5 seconds during which Ringpop continues to forward requests to the dead node. While this is expected, it produces a rather ugly error response: tchannel socket error (ECONNREFUSED from connect): connect ECONNREFUSED 10.2.0.17:9880. Could an extra argument be added to the handleOrProxy() method, to be called when such an error occurs, so that we can send a formatted response our application can read? E.g. handleOrProxy(req, res, errCallback), where errCallback is something like the example below:

// Proposed signature: handleOrProxy(req, res, errCallback).
// Assumes err is an Error carrying a Node-style `code` such as 'ECONNREFUSED'.
function errCallback(err, req, res) {
    switch (err && err.code) {
        case 'ECONNREFUSED':
            res.end(JSON.stringify({status: 'failed', error: 'connection_refused', message: 'Could not connect to the correct server.', retry_in: 5}));
            break;

        case 'ETIMEDOUT':
            res.end(JSON.stringify({status: 'failed', error: 'connection_timeout', message: 'The connection timed out.', retry_in: 20}));
            break;

        // etc.
    }
}

Persistent full-syncs

In the wild, we've seen persistent full-syncs where the cluster cannot converge. We need a dump of cluster stats when this happens so that we can evaluate the membership state against the membership update rules.
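The following is one kind of dump that could be captured when a full sync fails to converge: each side's membership checksum, plus the members whose status or incarnation disagree. The checksum string format is an assumption, MD5 is a stand-in hash, and fullSyncDump, checksum and diffMembership are hypothetical helpers.

```javascript
// Sketch of a diagnostic dump for a non-converging full sync.
// Assumptions: checksum format (address + status + incarnation, sorted,
// joined by ';') and MD5 as the hash; all names are hypothetical.
var crypto = require('crypto');

function checksum(members) {
    var str = members
        .map(function (m) { return m.address + m.status + m.incarnation; })
        .sort()
        .join(';');
    return crypto.createHash('md5').update(str).digest('hex');
}

// List every member whose state differs between the two views.
function diffMembership(local, remote) {
    var byAddr = {};
    remote.forEach(function (m) { byAddr[m.address] = m; });
    var diffs = [];
    local.forEach(function (m) {
        var r = byAddr[m.address];
        if (!r || r.status !== m.status || r.incarnation !== m.incarnation) {
            diffs.push({ address: m.address, local: m, remote: r || null });
        }
        delete byAddr[m.address];
    });
    // Whatever is left is known only to the remote side.
    Object.keys(byAddr).forEach(function (addr) {
        diffs.push({ address: addr, local: null, remote: byAddr[addr] });
    });
    return diffs;
}

function fullSyncDump(local, remote) {
    return {
        localChecksum: checksum(local),
        remoteChecksum: checksum(remote),
        diffs: diffMembership(local, remote)
    };
}
```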

Start the cluster with nodes offline

Hi, first of all, thanks for the awesome work behind ringpop.
Just one question: we are experimenting with the framework and noticed that bootstrap won't invoke its callback if one of the nodes is down (we are currently using 3 nodes). Is this by design, or is it unexpected behaviour? If the nodes come up, the cluster starts correctly, but in our infrastructure not all nodes spawn at the same time (some can take up to 5 minutes to boot), so the cluster could hang for a long time.

Thanks
Mattia
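An application could cope with seed nodes that come up late by retrying the join in rounds until enough peers answer. This is a sketch under stated assumptions, not Ringpop's bootstrap API. joinOne is a hypothetical async join function supplied by the caller, and minJoins, maxAttempts and retryDelayMs are illustrative options.

```javascript
// Sketch: retry joining the seed list in rounds until at least `minJoins`
// peers have answered or `maxAttempts` rounds have passed. `joinOne(host, cb)`
// is a hypothetical caller-supplied join; this is not ringpop's API.
function joinWithRetry(hosts, joinOne, opts, callback) {
    var attempt = 0;
    var joined = {};

    function round() {
        attempt++;
        // Only retry hosts that have not answered yet.
        var pending = hosts.filter(function (h) { return !joined[h]; });
        var remaining = pending.length;
        if (remaining === 0) return finish();
        pending.forEach(function (host) {
            joinOne(host, function (err) {
                if (!err) joined[host] = true;
                if (--remaining === 0) finish();
            });
        });
    }

    function finish() {
        var count = Object.keys(joined).length;
        if (count >= opts.minJoins) return callback(null, Object.keys(joined));
        if (attempt >= opts.maxAttempts) {
            return callback(new Error('joined ' + count + ' of ' + opts.minJoins + ' required peers'));
        }
        setTimeout(round, opts.retryDelayMs);
    }

    round();
}
```

Setting minJoins below the seed count would let the cluster start with nodes still missing. The trade-off is that those nodes then have to join later on their own.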

hashring replica point concatenation can create slightly imbalanced rings in particular circumstances

Replica points are added by concatenating the replica index onto the server name, which is currently of the form a.b.c.d:X. If you have a=1.2.3.4:55 and b=1.2.3.4:555, some points overlap: e.g. with replica points 55 and 5, respectively, a + '55' === b + '5' (both are '1.2.3.4:5555'), and therefore they hash identically.

This is probably an atypical configuration and probably not a major concern, but as we consider a future with logical IDs, it could become a bigger question mark.

Probably the simplest fix is to insert a separator such as '/' or '#' between the server name and the replica index. Under any consistent naming scheme in which the separator never appears inside a server name, this would make all points unique.

Some trivial code to demonstrate this is below. No matter how much you adjust the offset, 's1:11' is always underrepresented.

// Assumes tape and a HashRing module; adjust the require path to your checkout.
var test = require('tape');
var HashRing = require('../lib/ring');

test('random hosts with good points show good distribution', function t(assert) {
    var ring = new HashRing({ replicaPoints: 100 });
    var servers = ['s1:1', 's1:11', 's2:2', 's2:5'];
    var counts = {};
    for (var i = 0; i < servers.length; i++) {
        var server = servers[i];
        ring.addServer(server);
        counts[server] = 0;
    }

    // Look up one million keys and count how many land on each server.
    var offset = 20 * 1000000;
    for (var j = 0; j < 1000000; j++) {
        var s = ring.lookup((j + offset) + '');
        counts[s]++;
    }
    console.log(counts);
    assert.end();
});
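The overlap can also be shown without a hash ring. The code builds replica point names the way the issue describes (server name plus replica index) and counts names claimed by two different servers, with and without a separator. replicaNames and countCollisions are hypothetical helpers.

```javascript
// Count replica point names shared by two different servers, where each
// name is server + sep + replica index (as described in the issue).
function replicaNames(server, points, sep) {
    var names = [];
    for (var i = 0; i < points; i++) names.push(server + sep + i);
    return names;
}

function countCollisions(servers, points, sep) {
    var owner = {};
    var collisions = 0;
    servers.forEach(function (server) {
        replicaNames(server, points, sep).forEach(function (name) {
            if (owner[name] !== undefined && owner[name] !== server) collisions++;
            else owner[name] = server;
        });
    });
    return collisions;
}

var servers = ['s1:1', 's1:11', 's2:2', 's2:5'];
console.log(countCollisions(servers, 100, ''));  // 10 (plain concatenation)
console.log(countCollisions(servers, 100, '#')); // 0 ('#' separator)
```

With plain concatenation, points 10 to 19 of 's1:1' coincide with points 0 to 9 of 's1:11'. The separator removes them only as long as it never appears inside a server name.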

faster membership convergence

Look at membership convergence via membership update fanouts, with (at least) two known possible mechanisms:

  • burst increase of ping rate
  • manual/alternate additional update messages, possibly with NOOP instead of PING as the underlying message?

Initial baselining is being done via #56. Note that it is localhost-only, and we will ultimately need something closer to reality.
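The first mechanism, a burst increase of ping rate, could work as a scheduler that shortens the protocol period for a few rounds after a membership change. PingScheduler and its options are hypothetical, and the numbers in the test are placeholders, not tuned values.

```javascript
// Sketch of a "burst" ping rate: after a membership change, use a shorter
// interval for a fixed number of protocol periods, then fall back.
// All names and option values are hypothetical.
function PingScheduler(opts) {
    this.baseIntervalMs = opts.baseIntervalMs;
    this.burstIntervalMs = opts.burstIntervalMs;
    this.burstPeriods = opts.burstPeriods;
    this.burstRemaining = 0;
}

// Call whenever a local membership change needs to be disseminated.
PingScheduler.prototype.onMembershipChange = function onMembershipChange() {
    this.burstRemaining = this.burstPeriods;
};

// Delay before the next protocol period starts.
PingScheduler.prototype.nextDelay = function nextDelay() {
    if (this.burstRemaining > 0) {
        this.burstRemaining--;
        return this.burstIntervalMs;
    }
    return this.baseIntervalMs;
};
```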

Bootstrap can occur without listen

(More or less duplicate of uber/ringpop-go#146)

Ringpop currently allows bootstrap to occur without a listening tchannel underneath. This was confirmed by running tick-cluster with nodes that bootstrap but do not listen, first on a single node and then on all nodes (patch below).

Behaviorally, a single node failing to listen is the worst case. It has continuous 1-way interactions with other nodes, and seems to create a continuous cycle of other nodes marking it suspect. This is possible in real life during a rolling upgrade, or if bootstrap/listen handling is incorrect in some cases.

If all nodes fail to listen, they all simply fail to bootstrap, as expected.

Our current code pretty consistently calls listen() before bootstrap(), but given these failure modes we ought to be more defensive: either confirm that the tchannel is already listening, or call this.channel.listen() ourselves.

Behavior was confirmed by watching tick-cluster logs, and running ringpop-admin dump on one of the live nodes.


diff --git a/main.js b/main.js
index dc8fd2c..bd4f101 100755
--- a/main.js
+++ b/main.js
@@ -61,10 +61,15 @@ function main(args) {
     var listenParts = listen.split(':');
     var port = Number(listenParts[1]);
     var host = listenParts[0];
-    tchannel.listen(port, host, onListening);

-    function onListening() {
+    if (port === 3000) {
         ringpop.bootstrap(program.hosts);
+    } else {
+        tchannel.listen(port, host, onListening);
+
+        function onListening() {
+            ringpop.bootstrap(program.hosts);
+        }
     }
 }
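The defensive check could bootstrap immediately if the channel is already listening, and otherwise listen first. The channel object is a stand-in: we assume it exposes a boolean listening flag and listen(port, host, cb). Check your TChannel version for the real equivalent before relying on this. bootstrapWhenListening is a hypothetical helper.

```javascript
// Sketch: never bootstrap on a channel that is not listening.
// Assumption: `channel.listening` (boolean) and `channel.listen(port, host, cb)`
// are hypothetical stand-ins for whatever your TChannel version exposes.
function bootstrapWhenListening(channel, port, host, bootstrap, callback) {
    if (channel.listening) {
        bootstrap(callback);
        return;
    }
    // Not listening yet: start listening first, then bootstrap.
    channel.listen(port, host, function onListening() {
        bootstrap(callback);
    });
}
```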

Improve stats during partition healing

We should add more stats to partition healing.
For healing through the discover provider:

  • timer for the complete heal operation
  • count of successful heal operations
  • count of failed heal operations (max failures reached)

For heal attempts:

  • timer for a complete heal attempt
  • count of successes
  • count of errors
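A heal operation, or a single heal attempt, could be wrapped to record a timer and success/error counts. The stats client (statsd-style timing/increment) and the metric names are assumptions.

```javascript
// Sketch: time a heal operation or attempt and count its outcome.
// Assumptions: `stats` is a statsd-style client with timing(name, ms) and
// increment(name); metric names are hypothetical.
function instrumentedHeal(stats, prefix, healFn, callback) {
    var start = Date.now();
    healFn(function onHealed(err, result) {
        stats.timing(prefix + '.duration', Date.now() - start);
        stats.increment(prefix + (err ? '.error' : '.success'));
        callback(err, result);
    });
}
```

Using one prefix for the discover-provider operation and another for individual attempts yields both sets of stats from the same wrapper.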

(this is the ringpop-node counterpart of uber/ringpop-go#143)

flappy node damping

Ringpop needs the ability to detect slow or flappy nodes and intentionally evict them from the cluster.

Can we introduce a damping period akin to the suspect period? The period would be initiated if a node within the cluster had to assert itself as an 'alive' member more often than some yet-to-be-defined threshold. This node can mark itself 'damped' and generate an initial damping score based on the degree of flappiness. That score can then grow or decay based on the behavior of the cluster over the damping period window. If the score does not decay enough, the 'damped' node will eventually be marked as 'evicted'. Evicted nodes must be removed from the ring (or their ownership of the keyspace significantly reduced), but may still be part of the cluster membership and thus pinged periodically (though less frequently) as part of the protocol period.
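The following shows how such a damping score could grow and decay. It borrows exponential decay from BGP route flap damping (RFC 2439). DampingScore, its options and the thresholds are hypothetical placeholders for the yet-to-be-defined values. Deciding when a damped node becomes 'evicted' is left to the caller.

```javascript
// Sketch of a flap-damping score with exponential decay (idea borrowed from
// BGP route flap damping, RFC 2439). Names and thresholds are hypothetical.
function DampingScore(opts) {
    this.penalty = opts.penalty;             // added each time the node flaps
    this.halfLifeMs = opts.halfLifeMs;       // score halves every halfLifeMs
    this.suppressLimit = opts.suppressLimit; // above this, the node is 'damped'
    this.score = 0;
    this.updatedAt = 0;
}

// Decay the score up to time `now` (milliseconds) and return it.
DampingScore.prototype.decay = function decay(now) {
    var elapsed = now - this.updatedAt;
    this.score *= Math.pow(0.5, elapsed / this.halfLifeMs);
    this.updatedAt = now;
    return this.score;
};

// Record a flap, e.g. the node had to re-assert itself as alive.
DampingScore.prototype.recordFlap = function recordFlap(now) {
    this.decay(now);
    this.score += this.penalty;
    return this.score;
};

DampingScore.prototype.isDamped = function isDamped(now) {
    return this.decay(now) > this.suppressLimit;
};
```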

Potential inspiration:

Cannot send to destroyed tchannel

Error: cannot send() to destroyed tchannel
    at TChannel.send [as request] (/home/raynos/uber/autobahn/node_modules/tchannel/index.js:286:15)
    at TChannel.send (/home/raynos/uber/autobahn/node_modules/tchannel/index.js:278:20)
    at new PingSender (/home/raynos/uber/autobahn/node_modules/ringpop/lib/swim.js:84:23)
    at RingPop.sendPing (/home/raynos/uber/autobahn/node_modules/ringpop/index.js:695:12)
    at RingPop.protocolPingReq (/home/raynos/uber/autobahn/node_modules/ringpop/index.js:457:10)
    at RingPopTChannel.protocolPingReq (/home/raynos/uber/autobahn/node_modules/ringpop/lib/tchannel.js:183:18)
    at runHandler (/home/raynos/uber/autobahn/node_modules/tchannel/index.js:764:14)
    at process._tickDomainCallback (node.js:459:13)

It looks like if we close ringpop while a ping is in flight, the protocolPingReq method does not check whether ringpop has been destroyed before sending.
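The missing guard could check a destroyed flag before sending and fail the callback asynchronously instead of throwing. PingNode, sendPing and the transport function are hypothetical stand-ins, not ringpop's internals.

```javascript
// Sketch: refuse to send once destroyed, and report the failure through the
// callback (asynchronously) rather than throwing. Names are hypothetical.
function PingNode(send) {
    this.send = send;       // transport function: (target, cb)
    this.destroyed = false;
}

PingNode.prototype.destroy = function destroy() {
    this.destroyed = true;
};

PingNode.prototype.sendPing = function sendPing(target, callback) {
    if (this.destroyed) {
        process.nextTick(function () {
            callback(new Error('ringpop is destroyed; dropping ping to ' + target));
        });
        return;
    }
    this.send(target, callback);
};
```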

zalgo.

All callbacks must be asynchronous.

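var Ringpop = require('ringpop');

function createRingpop(callback) {
  var r = new Ringpop({ /* ... */ });

  r.bootstrap(callback);

  return r;
}

var r = createRingpop(function (err) {
  if (err) {
    // do stuff with r
    // OOPS ZALGO: if the callback runs synchronously, it is called
    // before createRingpop returns, so r is still undefined here.
  }
});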

If we do not defer callbacks to the next tick (e.g. with process.nextTick), we get horrible race conditions.
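One common fix is to wrap the callback so it is always deferred with process.nextTick, which means it runs only after the current synchronous code finishes. This is similar in spirit to the dezalgo npm module. alwaysAsync, bootstrapLike and createThing are hypothetical names.

```javascript
// Sketch: guarantee a callback never runs in the same tick as the call that
// registered it, by deferring it with process.nextTick.
function alwaysAsync(callback) {
    return function deferred() {
        var args = arguments;
        process.nextTick(function () {
            callback.apply(null, args);
        });
    };
}

// May finish synchronously (no seeds), yet still calls back asynchronously.
function bootstrapLike(seeds, callback) {
    var done = alwaysAsync(callback);
    if (seeds.length === 0) {
        return done(new Error('no seeds'));
    }
    return done(null);
}

// Mirrors createRingpop above: `r` is assigned before the callback runs.
function createThing(callback) {
    var thing = { ready: false };
    bootstrapLike([], callback);
    return thing;
}

var r = createThing(function (err) {
    console.log(r !== undefined, err.message); // prints "true no seeds"
});
```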

npm install ringpop build error

OS: Mac OS X El Capitan
Node versions: v6.2.1, v4.4.5, v4.2.4, 0.12.9
Steps to reproduce:
1. Open a terminal.
2. Run npm install ringpop (or npm install ringpop --save).


> node-gyp rebuild

  CXX(target) Release/obj.target/toobusy/toobusy.o
../toobusy.cc:25:29: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
Handle<Value> TooBusy(const Arguments& args) {
                            ^~~~~~~~~
                            v8::internal::Arguments
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:139:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../toobusy.cc:37:20: error: no matching function for call to 'True'
    return block ? True() : False();
                   ^~~~
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:8139:16: note: candidate function not viable: requires single argument 'isolate', but no arguments were provided
Local<Boolean> True(Isolate* isolate) {
               ^
../toobusy.cc:37:29: error: no matching function for call to 'False'
    return block ? True() : False();
                            ^~~~~
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:8148:16: note: candidate function not viable: requires single argument 'isolate', but no arguments were provided
Local<Boolean> False(Isolate* isolate) {
               ^
../toobusy.cc:40:30: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
Handle<Value> ShutDown(const Arguments& args) {
                             ^~~~~~~~~
                             v8::internal::Arguments
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:139:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../toobusy.cc:45:12: error: no matching function for call to 'Undefined'
    return Undefined();
           ^~~~~~~~~
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:315:27: note: candidate function not viable: requires single argument 'isolate', but no arguments were provided
  friend Local<Primitive> Undefined(Isolate* isolate);
                          ^
../toobusy.cc:48:25: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
Handle<Value> Lag(const Arguments& args) {
                        ^~~~~~~~~
                        v8::internal::Arguments
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:139:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../toobusy.cc:49:17: error: calling a protected constructor of class 'v8::HandleScope'
    HandleScope scope;
                ^
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:885:13: note: declared protected here
  V8_INLINE HandleScope() {}
            ^
../toobusy.cc:50:18: error: no member named 'Close' in 'v8::HandleScope'
    return scope.Close(Integer::New(s_currentLag));
           ~~~~~ ^
../toobusy.cc:50:49: error: too few arguments to function call, expected 2, have 1
    return scope.Close(Integer::New(s_currentLag));
                       ~~~~~~~~~~~~             ^
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:2499:3: note: 'New' declared here
  static Local<Integer> New(Isolate* isolate, int32_t value);
  ^
../toobusy.cc:53:35: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
Handle<Value> HighWaterMark(const Arguments& args) {
                                  ^~~~~~~~~
                                  v8::internal::Arguments
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:139:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../toobusy.cc:54:17: error: calling a protected constructor of class 'v8::HandleScope'
    HandleScope scope;
                ^
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:885:13: note: declared protected here
  V8_INLINE HandleScope() {}
            ^
../toobusy.cc:56:13: error: member access into incomplete type 'const v8::internal::Arguments'
    if (args.Length() >= 1) {
            ^
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:139:7: note: forward declaration of 'v8::internal::Arguments'
class Arguments;
      ^
../toobusy.cc:57:18: error: type 'const v8::internal::Arguments' does not provide a subscript operator
        if (!args[0]->IsNumber()) {
             ~~~~^~
../toobusy.cc:58:24: error: no member named 'ThrowException' in namespace 'v8'
            return v8::ThrowException(
                   ~~~~^
../toobusy.cc:60:33: error: no member named 'New' in 'v8::String'
                    v8::String::New("expected numeric first argument")));
                    ~~~~~~~~~~~~^
../toobusy.cc:62:23: error: type 'const v8::internal::Arguments' does not provide a subscript operator
        int hwm = args[0]->Int32Value();
                  ~~~~^~
../toobusy.cc:64:24: error: no member named 'ThrowException' in namespace 'v8'
            return v8::ThrowException(
                   ~~~~^
../toobusy.cc:66:33: error: no member named 'New' in 'v8::String'
                    v8::String::New("maximum lag should be greater than 10ms")));
                    ~~~~~~~~~~~~^
../toobusy.cc:71:18: error: no member named 'Close' in 'v8::HandleScope'
    return scope.Close(Number::New(HIGH_WATER_MARK_MS));
           ~~~~~ ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [Release/obj.target/toobusy/toobusy.o] Error 1
gyp ERR! build error 
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack     at ChildProcess.onExit (/Users/xxxx/.nvm/versions/node/v4.2.4/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:270:23)
gyp ERR! stack     at emitTwo (events.js:87:13)
gyp ERR! stack     at ChildProcess.emit (events.js:172:7)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)
gyp ERR! System Darwin 15.5.0
gyp ERR! command "/Users/xxxx/.nvm/versions/node/v4.2.4/bin/node" "/Users/xxxx/.nvm/versions/node/v4.2.4/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /Users/xxxx/work/testing/node_modules/ringpop/node_modules/toobusy
gyp ERR! node -v v4.2.4
gyp ERR! node-gyp -v v3.0.3
gyp ERR! not ok 

> [email protected] install /Users/xxxx/work/testing/node_modules/ringpop/node_modules/farmhash
> node-gyp rebuild

  CXX(target) Release/obj.target/farmhash/src/upstream/farmhash.o
../src/upstream/farmhash.cc:684:23: warning: unused function 'Shuffle2031' [-Wunused-function]
STATIC_INLINE __m128i Shuffle2031(__m128i x) {
                      ^
1 warning generated.
  CXX(target) Release/obj.target/farmhash/src/bindings.o
In file included from ../src/bindings.cc:4:
../node_modules/nan/nan.h:324:27: error: redefinition of 'NanEnsureHandleOrPersistent'
  NAN_INLINE v8::Local<T> NanEnsureHandleOrPersistent(const v8::Local<T> &val) {
                          ^
../node_modules/nan/nan.h:319:17: note: previous definition is here
  v8::Handle<T> NanEnsureHandleOrPersistent(const v8::Handle<T> &val) {
                ^
../node_modules/nan/nan.h:344:27: error: redefinition of 'NanEnsureLocal'
  NAN_INLINE v8::Local<T> NanEnsureLocal(const v8::Handle<T> &val) {
                          ^
../node_modules/nan/nan.h:334:27: note: previous definition is here
  NAN_INLINE v8::Local<T> NanEnsureLocal(const v8::Local<T> &val) {
                          ^
../node_modules/nan/nan.h:757:13: error: no member named 'smalloc' in namespace 'node'
    , node::smalloc::FreeCallback callback
      ~~~~~~^
../node_modules/nan/nan.h:768:12: error: no matching function for call to 'New'
    return node::Buffer::New(v8::Isolate::GetCurrent(), data, size);
           ^~~~~~~~~~~~~~~~~
/Users/xxxx/.node-gyp/4.2.4/include/node/node_buffer.h:31:40: note: candidate function not viable: no known conversion from 'uint32_t' (aka 'unsigned int') to
      'enum encoding' for 3rd argument
NODE_EXTERN v8::MaybeLocal<v8::Object> New(v8::Isolate* isolate,
                                       ^
/Users/xxxx/.node-gyp/4.2.4/include/node/node_buffer.h:43:40: note: candidate function not viable: 2nd argument ('const char *') would lose const qualifier
NODE_EXTERN v8::MaybeLocal<v8::Object> New(v8::Isolate* isolate,
                                       ^
/Users/xxxx/.node-gyp/4.2.4/include/node/node_buffer.h:28:40: note: candidate function not viable: requires 2 arguments, but 3 were provided
NODE_EXTERN v8::MaybeLocal<v8::Object> New(v8::Isolate* isolate, size_t length);
                                       ^
/Users/xxxx/.node-gyp/4.2.4/include/node/node_buffer.h:36:40: note: candidate function not viable: requires 5 arguments, but 3 were provided
NODE_EXTERN v8::MaybeLocal<v8::Object> New(v8::Isolate* isolate,
                                       ^
In file included from ../src/bindings.cc:4:
../node_modules/nan/nan.h:772:12: error: no viable conversion from returned value of type 'v8::MaybeLocal<v8::Object>' to function return type 'v8::Local<v8::Object>'
    return node::Buffer::New(v8::Isolate::GetCurrent(), size);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:210:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from
      'v8::MaybeLocal<v8::Object>' to 'const v8::Local<v8::Object> &' for 1st argument
class Local {
      ^
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:210:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from
      'v8::MaybeLocal<v8::Object>' to 'v8::Local<v8::Object> &&' for 1st argument
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:214:13: note: candidate template ignored: could not match 'Local' against 'MaybeLocal'
  V8_INLINE Local(Local<S> that)
            ^
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:326:13: note: candidate template ignored: could not match 'S *' against 'v8::MaybeLocal<v8::Object>'
  V8_INLINE Local(S* that)
            ^
In file included from ../src/bindings.cc:4:
../node_modules/nan/nan.h:779:26: error: no member named 'Use' in namespace 'node::Buffer'
    return node::Buffer::Use(v8::Isolate::GetCurrent(), data, size);
           ~~~~~~~~~~~~~~^
In file included from ../src/bindings.cc:2:
In file included from /Users/xxxx/.node-gyp/4.2.4/include/node/node.h:42:
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:221:5: error: assigning to 'v8::Primitive *volatile' from incompatible type 'v8::Value *'
    TYPE_CHECK(T, S);
    ^~~~~~~~~~~~~~~~
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:180:37: note: expanded from macro 'TYPE_CHECK'
    *(static_cast<T* volatile*>(0)) = static_cast<S*>(0);      \
                                    ^ ~~~~~~~~~~~~~~~~~~
../node_modules/nan/nan.h:501:12: note: in instantiation of function template specialization 'v8::Local<v8::Primitive>::Local<v8::Value>' requested here
    return NanEscapeScope(NanNew(v8::Undefined(v8::Isolate::GetCurrent())));
           ^
../node_modules/nan/nan.h:483:30: note: expanded from macro 'NanEscapeScope'
# define NanEscapeScope(val) scope.Escape(Nan::imp::NanEnsureLocal(val))

                     ^

In file included from ../src/bindings.cc:2:
In file included from /Users/xxxx/.node-gyp/4.2.4/include/node/node.h:42:
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:221:5: error: assigning to 'v8::Boolean *volatile' from incompatible type 'v8::Value *'
    TYPE_CHECK(T, S);
    ^~~~~~~~~~~~~~~~
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:180:37: note: expanded from macro 'TYPE_CHECK'
    *(static_cast<T* volatile*>(0)) = static_cast<S*>(0);      \
                                    ^ ~~~~~~~~~~~~~~~~~~
../node_modules/nan/nan.h:511:12: note: in instantiation of function template specialization 'v8::Local<v8::Boolean>::Local<v8::Value>' requested here
    return NanEscapeScope(NanNew(v8::True(v8::Isolate::GetCurrent())));
           ^
../node_modules/nan/nan.h:483:30: note: expanded from macro 'NanEscapeScope'
# define NanEscapeScope(val) scope.Escape(Nan::imp::NanEnsureLocal(val))

                     ^

In file included from ../src/bindings.cc:2:
In file included from /Users/xxxx/.node-gyp/4.2.4/include/node/node.h:42:
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:221:5: error: assigning to 'v8::Function *volatile' from incompatible type 'v8::Value *'
    TYPE_CHECK(T, S);
    ^~~~~~~~~~~~~~~~
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:180:37: note: expanded from macro 'TYPE_CHECK'
    *(static_cast<T* volatile*>(0)) = static_cast<S*>(0);      \
                                    ^ ~~~~~~~~~~~~~~~~~~
../node_modules/nan/nan.h:1645:12: note: in instantiation of function template specialization 'v8::Local<v8::Function>::Local<v8::Value>' requested here
    return NanEscapeScope(NanNew(handle)->Get(kCallbackIndex)
           ^
../node_modules/nan/nan.h:483:30: note: expanded from macro 'NanEscapeScope'
# define NanEscapeScope(val) scope.Escape(Nan::imp::NanEnsureLocal(val))

                     ^

In file included from ../src/bindings.cc:2:
In file included from /Users/xxxx/.node-gyp/4.2.4/include/node/node.h:42:
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:221:5: error: assigning to 'v8::Object *volatile' from incompatible type 'v8::Value *'
    TYPE_CHECK(T, S);
    ^~~~~~~~~~~~~~~~
/Users/xxxx/.node-gyp/4.2.4/include/node/v8.h:180:37: note: expanded from macro 'TYPE_CHECK'
    *(static_cast<T* volatile*>(0)) = static_cast<S*>(0);      \
                                    ^ ~~~~~~~~~~~~~~~~~~
../node_modules/nan/nan.h:1776:12: note: in instantiation of function template specialization 'v8::Local<v8::Object>::Local<v8::Value>' requested here
    return NanEscapeScope(
           ^
../node_modules/nan/nan.h:483:30: note: expanded from macro 'NanEscapeScope'
# define NanEscapeScope(val) scope.Escape(Nan::imp::NanEnsureLocal(val))

                     ^

10 errors generated.
make: *** [Release/obj.target/farmhash/src/bindings.o] Error 1
gyp ERR! build error 
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack     at ChildProcess.onExit (/Users/xxxx/.nvm/versions/node/v4.2.4/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:270:23)
gyp ERR! stack     at emitTwo (events.js:87:13)
gyp ERR! stack     at ChildProcess.emit (events.js:172:7)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)
gyp ERR! System Darwin 15.5.0
gyp ERR! command "/Users/xxxx/.nvm/versions/node/v4.2.4/bin/node" "/Users/xxxx/.nvm/versions/node/v4.2.4/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /Users/xxxx/work/testing/node_modules/ringpop/node_modules/farmhash
gyp ERR! node -v v4.2.4
gyp ERR! node-gyp -v v3.0.3
gyp ERR! not ok 
npm ERR! Darwin 15.5.0
npm ERR! argv "/Users/xxxx/.nvm/versions/node/v4.2.4/bin/node" "/Users/xxxx/.nvm/versions/node/v4.2.4/bin/npm" "install" "ringpop" "--save"
npm ERR! node v4.2.4
npm ERR! npm  v2.14.12
npm ERR! code ELIFECYCLE

npm ERR! [email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] install script 'node-gyp rebuild'.
npm ERR! This is most likely a problem with the toobusy package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node-gyp rebuild
npm ERR! You can get their info via:
npm ERR!     npm owner ls toobusy
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR!     /Users/xxxx/work/testing/npm-debug.log

We have tried on two different Macs and are still seeing this issue.

adaptive timeouts

An aggressive yet reliable adaptive timeout for pings could lead to earlier fault detection and faster convergence when failures occur. We have the safety net of SWIM's suspect subprotocol to reduce potential false positives. SWIM states that, relative to a round-trip time estimate, ping timeouts should be 1x, ping-req timeouts 2x, and the protocol period at least the ping timeout plus the ping-req timeout. We should strive to get closer to these ideals than the current configuration: a minimum 200ms protocol period with a 1500ms ping timeout and a 5000ms ping-req timeout.
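The following computes an adaptive ping timeout from measured round-trip times. The estimator is the TCP retransmission-timeout algorithm from RFC 6298 (alpha = 1/8, beta = 1/4, K = 4). It is swapped in here as one plausible choice, not something ringpop ships. The ping-req and protocol-period ratios follow the SWIM figures quoted in the issue. AdaptiveTimeouts and its minMs/maxMs clamps are hypothetical.

```javascript
// Sketch: adaptive ping timeouts using the RFC 6298 RTT estimator.
// Ratios from the issue: ping-req = 2x ping timeout, and the protocol period
// = ping timeout + ping-req timeout. Names and clamps are hypothetical.
function AdaptiveTimeouts(opts) {
    this.minMs = opts.minMs;
    this.maxMs = opts.maxMs;
    this.srtt = null;   // smoothed round-trip time
    this.rttvar = null; // round-trip time variation
}

AdaptiveTimeouts.prototype.observe = function observe(rttMs) {
    if (this.srtt === null) {
        this.srtt = rttMs;
        this.rttvar = rttMs / 2;
    } else {
        // RFC 6298 order: update RTTVAR with the old SRTT, then SRTT.
        this.rttvar = 0.75 * this.rttvar + 0.25 * Math.abs(this.srtt - rttMs);
        this.srtt = 0.875 * this.srtt + 0.125 * rttMs;
    }
};

AdaptiveTimeouts.prototype.pingTimeout = function pingTimeout() {
    if (this.srtt === null) return this.maxMs; // no samples yet: be conservative
    var rto = this.srtt + 4 * this.rttvar;
    return Math.min(this.maxMs, Math.max(this.minMs, rto));
};

AdaptiveTimeouts.prototype.pingReqTimeout = function pingReqTimeout() {
    return 2 * this.pingTimeout();
};

AdaptiveTimeouts.prototype.protocolPeriod = function protocolPeriod() {
    return this.pingTimeout() + this.pingReqTimeout();
};
```

The minMs floor keeps a run of very fast local samples from producing timeouts too tight to survive an occasional GC pause. The suspect subprotocol then absorbs the remaining false positives.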

References and sources of inspiration:

Merge partitioned clusters

Ringpop should be able to handle network partitions that temporarily cause two or more clusters to form. Currently a ringpop instance never attempts to rejoin faulty members, but it could, since faulty members are kept in the membership list.
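Since faulty members stay in the membership list, a node could periodically pick a few of them and try to rejoin. If one answers, the two sides could exchange membership and merge. pickRejoinTargets is a hypothetical sketch of the selection step only. It takes an injectable random function so the choice is testable.

```javascript
// Sketch: choose up to `maxTargets` distinct faulty members to try rejoining.
// `random` is injectable (e.g. Math.random); names are hypothetical.
function pickRejoinTargets(members, maxTargets, random) {
    var faulty = members.filter(function (m) { return m.status === 'faulty'; });
    // Partial Fisher-Yates shuffle over the first maxTargets slots.
    for (var i = 0; i < Math.min(maxTargets, faulty.length); i++) {
        var j = i + Math.floor(random() * (faulty.length - i));
        var tmp = faulty[i];
        faulty[i] = faulty[j];
        faulty[j] = tmp;
    }
    return faulty.slice(0, maxTargets).map(function (m) { return m.address; });
}
```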

the request proxy needs to time out.

RequestProxy.handleRequest needs to time out and clean itself up.

Having looked at heap snapshots a lot, I have a feeling that we are leaking the mock responses.

We should time them out and tear them down. I think the leak causes a closure leak in tchannel.

cc @jwolski
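A completion guard could fire a callback at most once, with either the real response or a timeout error. It then drops the callback reference so its closure can be collected. withTimeout is a hypothetical helper, not RequestProxy's API.

```javascript
// Sketch: complete a request exactly once, either with the response or with
// a timeout error, then release the callback. Names are hypothetical.
function withTimeout(timeoutMs, callback) {
    var timer = setTimeout(function onTimeout() {
        finish(new Error('request timed out after ' + timeoutMs + 'ms'));
    }, timeoutMs);

    function finish(err, res) {
        if (!callback) return; // already completed; ignore late calls
        clearTimeout(timer);
        var cb = callback;
        callback = null;       // drop the reference so the closure can be GC'd
        cb(err, res);
    }

    return finish;
}
```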

ringpop for {go,python,java,etc.}

So, is there a protocol spec for other ringpop implementations? That is, if somebody wanted to implement ringpop in, e.g., Go, how would they go about doing that?

New nodes joining (big) clusters cause more traffic in ping responses than needed

A new node joining an existing cluster adds the complete membership list to the list of updates in its own dissemination component. These updates should already be propagating within the cluster, so this node does not need to disseminate them.

Since ringpop currently transfers the complete changes list on ping responses, the network traffic this creates grows with the number of nodes. This violates the claim made in section 2 of the SWIM paper that message size does not increase with the number of nodes in the cluster.

Suggested solution is to not use the membership list a new node receives upon joining as updates in the Dissemination protocol.
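The suggested solution could apply the membership list received on join directly, and queue for dissemination only the changes learned afterwards. Membership, applyJoinResponse and applyUpdate are hypothetical names. The override rule is simplified to "newer incarnation wins".

```javascript
// Sketch: join-response members are applied without being queued for
// dissemination; only later changes are. Names are hypothetical and the
// override rule is simplified to "newer incarnation wins".
function Membership() {
    this.members = {};        // address -> { status, incarnation }
    this.pendingChanges = []; // changes to piggyback on future pings
}

Membership.prototype.applyJoinResponse = function applyJoinResponse(list) {
    var self = this;
    list.forEach(function (m) {
        // The rest of the cluster already knows these members.
        self.members[m.address] = { status: m.status, incarnation: m.incarnation };
    });
};

Membership.prototype.applyUpdate = function applyUpdate(change) {
    var cur = this.members[change.address];
    if (cur && cur.incarnation >= change.incarnation) return false; // stale
    this.members[change.address] = { status: change.status, incarnation: change.incarnation };
    this.pendingChanges.push(change);
    return true;
};
```

The joining node's own 'alive' announcement would still need to be disseminated. Only its copy of everyone else's state is skipped.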

Ping-req should be hardened against connection errors

Ping-req might lead to false-positive suspects if requests to all k members fail because of socket-level errors. If three ping-req members X, Y and Z are chosen to ping member A and all three fail, not because A is unreachable but because the requests to X, Y and Z hit socket-level failures, then A is considered a suspect. This seems wrong, or at least too aggressive. At the very least, it's worth collecting more data to see how often these false positives occur.

Some potential options are:

  • Leave the impl the way it is
  • Move on to the next protocol period and let others discover that A is really down
  • Ping-req retries and select different group of k members.
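The distinction the issue draws can be made explicit by classifying ping-req results: a helper that answered "could not reach A" is evidence, while a socket error talking to the helper is not. The result shapes and `evaluatePingReq` are hypothetical, not ringpop's wire format; an `'inconclusive'` outcome leaves room for either the second or third option above.

```javascript
'use strict';

// Hypothetical sketch: socket-level failures talking to helper members
// (X, Y, Z) are not counted as evidence that the target (A) is down.
// Result shapes are illustrative:
//   { kind: 'response', targetReached: true|false }  helper answered
//   { kind: 'error' }                                 could not reach helper
function evaluatePingReq(results) {
    var answered = 0;
    for (var i = 0; i < results.length; i++) {
        var r = results[i];
        if (r.kind === 'response') {
            answered++;
            if (r.targetReached) {
                return 'alive'; // one indirect ack is enough
            }
        }
    }
    if (answered > 0) {
        return 'suspect'; // at least one helper actually tried and failed
    }
    // Every request to a helper failed at the socket level, so we learned
    // nothing about A. The caller can retry with a different group of k
    // members or wait for the next protocol period.
    return 'inconclusive';
}

console.log(evaluatePingReq([{ kind: 'error' }, { kind: 'error' }, { kind: 'error' }]));
```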

Installation issue with Node 10 - 13

I tried to install tchannel locally on macOS using multiple Node versions (10 through 13). None of them works; I get the same error each time. I would like to use Ringpop, and tchannel is a dependency of it. Thanks!

npm install tchannel

[email protected] install /Users/congwang/node/node_modules/sse4_crc32
node-gyp rebuild

CXX(target) Release/obj.target/crc32c_sse42/src/crc32c_sse42.o
LIBTOOL-STATIC Release/crc32c_sse42.a
libtool: unrecognized option `-static'
libtool: Try `libtool --help' for more information.
make: *** [Release/crc32c_sse42.a] Error 1
gyp ERR! build error
gyp ERR! stack Error: make failed with exit code: 2
gyp ERR! stack at ChildProcess.onExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:194:23)
gyp ERR! stack at ChildProcess.emit (events.js:219:5)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:274:12)
gyp ERR! System Darwin 18.7.0
gyp ERR! command "/usr/local/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /Users/congwang/node/node_modules/sse4_crc32
gyp ERR! node -v v13.3.0
gyp ERR! node-gyp -v v5.0.5
gyp ERR! not ok
npm WARN [email protected] No description
npm WARN [email protected] No repository field.

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] install: node-gyp rebuild
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! /Users/congwang/.npm/_logs/2019-12-11T23_30_29_987Z-debug.log

Allow Nodes to mark other nodes Rejected only if not itself in Suspect / Faulty lists

While Issue #42 is still pending, I just wanted to check: does RingPop currently ignore all 'Reject M' messages from a node N when N itself is in every other node's suspect/faulty list?

(The use case: partitioned sets mark nodes in the other partitions as faulty. Assume the network then recovers; the Reject messages would pass over to the other partition, thereby marking alive nodes in the opposite partition.)

(I have just come from watching the RingPop@Rackspace video and reading the SWIM paper, so pardon me if I am missing the elephant in the room.)
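The rule being asked about could look like the check below: before applying a claim that member M is faulty, look up the claim's source N in the local membership and ignore the claim if N is itself suspect or faulty. This is a hypothetical illustration of the proposal, not existing ringpop behaviour; `shouldApplyFaultyClaim` and the `source`/`subject` fields are assumed names.

```javascript
'use strict';

// Hypothetical sketch of the proposed rule: ignore a claim that member M is
// faulty when the node making the claim (N) is itself suspect or faulty in
// our local view. Field names are illustrative, not ringpop's update format.
function shouldApplyFaultyClaim(localMembers, claim) {
    var source = localMembers[claim.source];
    if (source && (source.status === 'suspect' || source.status === 'faulty')) {
        return false;
    }
    return true;
}

// Usage
var localView = {
    '10.0.0.1:3000': { status: 'faulty' },
    '10.0.0.2:3000': { status: 'alive' }
};
console.log(shouldApplyFaultyClaim(localView, { source: '10.0.0.1:3000', subject: '10.0.0.9:3000' }));
console.log(shouldApplyFaultyClaim(localView, { source: '10.0.0.2:3000', subject: '10.0.0.9:3000' }));
```

The trade-off is that a correct claim from a node we merely suspect is dropped too; in SWIM, incarnation numbers are what let a member refute suspicion about itself.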

add internal event tracing capabilities

For diagnostics, analysis, and even when ringpop is scaffolding for other tools, it would be beneficial to have a generic mechanism for forwarding ringpop internal/protocol events.
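A generic mechanism could be a small tracer that subscribes to named events on an internal `EventEmitter` and forwards timestamped records to any number of sinks. `EventTracer` and the `'membership.update'` event name are hypothetical; only Node's built-in `events` module is real here.

```javascript
'use strict';

var EventEmitter = require('events').EventEmitter;

// Hypothetical sketch: forward named internal events from an emitter to
// registered sinks (a logger, a test harness, another tool).
function EventTracer(source) {
    this.source = source;
    this.sinks = [];
    this.listeners = {};
}

EventTracer.prototype.trace = function trace(eventName) {
    var self = this;
    if (self.listeners[eventName]) {
        return; // already tracing this event
    }
    self.listeners[eventName] = function forward(payload) {
        var record = { event: eventName, ts: Date.now(), payload: payload };
        self.sinks.forEach(function send(sink) {
            sink(record);
        });
    };
    self.source.on(eventName, self.listeners[eventName]);
};

EventTracer.prototype.untrace = function untrace(eventName) {
    var listener = this.listeners[eventName];
    if (listener) {
        this.source.removeListener(eventName, listener);
        delete this.listeners[eventName];
    }
};

EventTracer.prototype.addSink = function addSink(sink) {
    this.sinks.push(sink);
};

// Usage
var internals = new EventEmitter();
var tracer = new EventTracer(internals);
tracer.addSink(function log(record) {
    console.log(record.event, JSON.stringify(record.payload));
});
tracer.trace('membership.update');
internals.emit('membership.update', { address: '127.0.0.1:3001', status: 'suspect' });
```

Tracing is opt-in per event name, so untraced events cost nothing beyond the normal emit.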

Maintain 2 checksums: membership and ring

The reason this is being proposed is because proxied requests are refused when membership checksums differ even if the underlying server/replicas in the ring are the same. That's because member status and incarnation number are part of the checksum. This results in more retries for proxied requests than desired.

When we validate source/destination checksums on proxy, it is to make sure that hashring lookups are consistent. And they would be if member status like suspect was not part of the checksum.

We rely heavily on membership checksums for full syncs, etc., so we'd need two different checksums: one for membership and another for the ring.
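The two checksums could differ only in what they hash: the membership checksum covers address, status, and incarnation number, while the ring checksum covers only the set of addresses that own ring positions. This sketch assumes alive and suspect members are in the ring, and it uses SHA-1 from Node's `crypto` module as a stand-in for whatever hash ringpop actually uses.

```javascript
'use strict';

var crypto = require('crypto');

// Hypothetical sketch. Assumptions: members look like
// { address, status, incarnationNumber }, alive and suspect members own
// ring positions, and SHA-1 stands in for ringpop's real hash function.
function hash(str) {
    return crypto.createHash('sha1').update(str).digest('hex');
}

// Membership checksum: includes status and incarnation number; used for
// full syncs and convergence checks.
function membershipChecksum(members) {
    return hash(members
        .map(function fmt(m) {
            return m.address + ' ' + m.status + ' ' + m.incarnationNumber;
        })
        .sort()
        .join(';'));
}

// Ring checksum: only which servers are in the ring; used to validate that
// proxied requests will be routed consistently.
function ringChecksum(members) {
    return hash(members
        .filter(function inRing(m) {
            return m.status === 'alive' || m.status === 'suspect';
        })
        .map(function addr(m) { return m.address; })
        .sort()
        .join(';'));
}

// Usage: a suspect status and incarnation bump change only the membership checksum.
var viewA = [
    { address: '10.0.0.1:3000', status: 'alive', incarnationNumber: 1 },
    { address: '10.0.0.2:3000', status: 'alive', incarnationNumber: 1 }
];
var viewB = [
    { address: '10.0.0.1:3000', status: 'alive', incarnationNumber: 1 },
    { address: '10.0.0.2:3000', status: 'suspect', incarnationNumber: 2 }
];
console.log(membershipChecksum(viewA) === membershipChecksum(viewB));
console.log(ringChecksum(viewA) === ringChecksum(viewB));
```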

benchmark convergence time and other metrics

It would be good to benchmark convergence time and other metrics that we care about.

It may not reflect the real situation in production, but it can give a rough idea of performance characteristics and serve as a baseline to prevent regressions under the different scenarios we can simulate.

The benchmark should be automated under different ringpop/swim configurations and scenarios like single node failure, catastrophic failure, parallel restart, rolling restart, capacity increase, packet loss, latency increase, etc.

I made a simple prototype at https://github.com/mrhooray/swim-js/tree/master/bench that measures convergence time under a single-node failure scenario. It does not include much automation, but it can be tuned with CLI arguments and produces results like:

configuration:
- cycles 20
- workers 30
- codec msgpack
- dissemination factor 15
- interval 20 ms
- join timeout 100 ms
- ping timeout 4 ms
- ping req timeout 12 ms
- ping req group size 3
- max dgram size 512 bytes
convergence time under single node failure
histogram data:
- count 20
- min 94
- max 198
- mean 130.4
- median 126
- variance 807.8315789473684
- std dev 28.422378136731773
- p75 143.75
- p95 196.74999999999997
- p99 198

Is this something we'd like to have? suggestions/ideas? @jwolski @Raynos
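For an automated benchmark, the histogram statistics in the report above could be computed from raw convergence samples as follows. The conventions are assumptions, not taken from the prototype: variance is the sample variance (dividing by n - 1), and percentiles interpolate at the 1-based rank p * (n + 1), clamped to the min and max.

```javascript
'use strict';

// Hypothetical sketch: summary statistics over convergence times (ms).
// Assumed conventions: sample variance (n - 1), percentiles interpolated
// at rank p * (n + 1). The prototype's exact definitions may differ.
function summarize(samples) {
    var sorted = samples.slice().sort(function byNum(a, b) { return a - b; });
    var n = sorted.length;
    var mean = sorted.reduce(function sum(acc, x) { return acc + x; }, 0) / n;
    var sq = sorted.reduce(function sumSq(acc, x) {
        return acc + (x - mean) * (x - mean);
    }, 0);
    var variance = n > 1 ? sq / (n - 1) : 0;

    function percentile(p) {
        var pos = p * (n + 1); // 1-based fractional rank
        if (pos < 1) return sorted[0];
        if (pos >= n) return sorted[n - 1];
        var lower = sorted[Math.floor(pos) - 1];
        var upper = sorted[Math.floor(pos)];
        return lower + (pos - Math.floor(pos)) * (upper - lower);
    }

    return {
        count: n,
        min: sorted[0],
        max: sorted[n - 1],
        mean: mean,
        median: percentile(0.5),
        variance: variance,
        stdDev: Math.sqrt(variance),
        p75: percentile(0.75),
        p95: percentile(0.95),
        p99: percentile(0.99)
    };
}

console.log(summarize([100, 120, 130, 150, 200]));
```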

Emit fewer ringChanged events by taking into account previous status

Emit ringChanged events only when servers are added to or removed from the ring. Today, the event is emitted even when an alive member's incarnation number is bumped, which is incorrect. The top-level ringpop object should simply forward the added and removed events from its internal ring object.
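The intended behaviour can be sketched by deriving ring changes from a member's previous and new status, so an alive-to-alive incarnation bump emits nothing. This models the ring's contents by status (assuming alive and suspect members are in the ring) rather than using ringpop's internal ring object; `RingTracker` and `onMemberUpdate` are hypothetical names.

```javascript
'use strict';

var EventEmitter = require('events').EventEmitter;
var util = require('util');

// Hypothetical sketch: emit ringChanged only when a server enters or
// leaves the ring. Assumes alive and suspect members are in the ring.
function inRing(status) {
    return status === 'alive' || status === 'suspect';
}

function RingTracker() {
    EventEmitter.call(this);
    this.statuses = {};
}
util.inherits(RingTracker, EventEmitter);

RingTracker.prototype.onMemberUpdate = function onMemberUpdate(address, newStatus) {
    var wasIn = inRing(this.statuses[address]);
    var isIn = inRing(newStatus);
    this.statuses[address] = newStatus;
    if (!wasIn && isIn) {
        this.emit('ringChanged', { added: [address], removed: [] });
    } else if (wasIn && !isIn) {
        this.emit('ringChanged', { added: [], removed: [address] });
    }
    // Otherwise (e.g. an incarnation bump on an alive member) the ring is unchanged.
};

// Usage
var tracker = new RingTracker();
tracker.on('ringChanged', function log(e) {
    console.log('ringChanged', JSON.stringify(e));
});
tracker.onMemberUpdate('10.0.0.1:3000', 'alive');  // added
tracker.onMemberUpdate('10.0.0.1:3000', 'alive');  // incarnation bump, no event
tracker.onMemberUpdate('10.0.0.1:3000', 'faulty'); // removed
```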

fanout membership updates

Fan out membership updates when they are first detected to expedite convergence. We can use the piggyback count on the update recorded in the dissemination component to aid fanout controls. We may not want to fan out all updates with a piggyback count of 0 either; we may want to take into account both the piggyback count and the source of the update. Only the source should fan out an update, and only the first N times it is disseminated.
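The fanout control described above reduces to a small predicate: fan out only if this node is the update's source and the update's piggyback count is still below a limit N. `shouldFanout`, `fanoutTargets`, and the `source`/`piggybackCount` fields are assumed names for illustration, not ringpop's actual update format.

```javascript
'use strict';

// Hypothetical sketch: only the node that originated an update fans it out,
// and only while its piggyback count is below `maxFanoutCount` (N).
function shouldFanout(update, localAddress, maxFanoutCount) {
    return update.source === localAddress &&
        update.piggybackCount < maxFanoutCount;
}

// Returns the addresses to send the update to directly (everyone but us).
function fanoutTargets(update, localAddress, memberAddresses, maxFanoutCount) {
    if (!shouldFanout(update, localAddress, maxFanoutCount)) {
        return [];
    }
    return memberAddresses.filter(function notSelf(address) {
        return address !== localAddress;
    });
}

// Usage
var update = {
    address: '10.0.0.3:3000',
    status: 'faulty',
    source: '10.0.0.1:3000',
    piggybackCount: 0
};
var allMembers = ['10.0.0.1:3000', '10.0.0.2:3000', '10.0.0.3:3000'];
console.log(fanoutTargets(update, '10.0.0.1:3000', allMembers, 1));
```

In a large cluster, a real implementation would probably cap the number of targets rather than sending to every member.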

ringpop install issue

Hi, I am installing ringpop on Linux with Node v5.0.0 and I am getting the error below.

At first I get this error; after the installation progresses further, I get another error, shown at the end.

In file included from ../src/bindings.cc:4:0:
../../nan/nan.h:324:27: error: redefinition of ‘template v8::Local Nan::imp::NanEnsureHandleOrPersistent(const v8::Local&)’
NAN_INLINE v8::Local NanEnsureHandleOrPersistent(const v8::Local &val) {
^
../../nan/nan.h:319:17: error: ‘template v8::Handle Nan::imp::NanEnsureHandleOrPersistent(v8::Handle&)’ previously declared here
v8::Handle NanEnsureHandleOrPersistent(const v8::Handle &val) {
^
../../nan/nan.h:344:27: error: redefinition of ‘template v8::Local Nan::imp::NanEnsureLocal(v8::Handle&)’
NAN_INLINE v8::Local NanEnsureLocal(const v8::Handle &val) {
^
../../nan/nan.h:334:27: error: ‘template v8::Local Nan::imp::NanEnsureLocal(const v8::Local&)’ previously declared here
NAN_INLINE v8::Local NanEnsureLocal(const v8::Local &val) {
^
../../nan/nan.h:757:13: error: ‘node::smalloc’ has not been declared
, node::smalloc::FreeCallback callback
^
../../nan/nan.h:757:35: error: expected ‘,’ or ‘...’ before ‘callback’
, node::smalloc::FreeCallback callback
^
../../nan/nan.h: In function ‘v8::Local<v8::Object> NanNewBufferHandle(char*, size_t, int)’:
../../nan/nan.h:761:50: error: ‘callback’ was not declared in this scope
v8::Isolate::GetCurrent(), data, length, callback, hint);
^
../../nan/nan.h:761:60: error: ‘hint’ was not declared in this scope
v8::Isolate::GetCurrent(), data, length, callback, hint);
^
../../nan/nan.h: In function ‘v8::Local<v8::Object> NanNewBufferHandle(const char*, uint32_t)’:
../../nan/nan.h:768:67: error: call of overloaded ‘New(v8::Isolate*, const char*&, uint32_t&)’ is ambiguous
return node::Buffer::New(v8::Isolate::GetCurrent(), data, size);
^
../../nan/nan.h:768:67: note: candidates are:
In file included from ../../nan/nan.h:25:0,
from ../src/bindings.cc:4:
/root/.node-gyp/5.0.0/include/node/node_buffer.h:31:40: note: v8::MaybeLocal<v8::Object> node::Buffer::New(v8::Isolate*, v8::Local<v8::String>, node::encoding)
NODE_EXTERN v8::MaybeLocal<v8::Object> New(v8::Isolate* isolate,
^
/root/.node-gyp/5.0.0/include/node/node_buffer.h:31:40: note: no known conversion for argument 3 from ‘uint32_t {aka unsigned int}’ to ‘node::encoding’
/root/.node-gyp/5.0.0/include/node/node_buffer.h:43:40: note: v8::MaybeLocal<v8::Object> node::Buffer::New(v8::Isolate*, char*, size_t)
NODE_EXTERN v8::MaybeLocal<v8::Object> New(v8::Isolate* isolate,
^
/root/.node-gyp/5.0.0/include/node/node_buffer.h:43:40: note: no known conversion for argument 2 from ‘const char*’ to ‘char*’
In file included from ../src/bindings.cc:4:0:
../../nan/nan.h: In function ‘v8::Local<v8::Object> NanNewBufferHandle(uint32_t)’:
../../nan/nan.h:772:61: error: could not convert ‘node::Buffer::New(v8::Isolate::GetCurrent(), ((size_t)size))’ from ‘v8::MaybeLocal<v8::Object>’ to ‘v8::Local<v8::Object>’
return node::Buffer::New(v8::Isolate::GetCurrent(), size);
^
../../nan/nan.h: In function ‘v8::Local<v8::Object> NanBufferUse(char*, uint32_t)’:
../../nan/nan.h:779:12: error: ‘Use’ is not a member of ‘node::Buffer’
return node::Buffer::Use(v8::Isolate::GetCurrent(), data, size);
^
make: *** [Release/obj.target/farmhash/src/bindings.o] Error 1
make: Leaving directory `/home/nowconfer/NowConfer-1.0.1.0/node_modules/ringpop/node_modules/farmhash/build'
gyp ERR! build error
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack at ChildProcess.onExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:270:23)
gyp ERR! stack at emitTwo (events.js:87:13)
gyp ERR! stack at ChildProcess.emit (events.js:172:7)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)
gyp ERR! System Linux 3.10.0-123.el7.x86_64
gyp ERR! command "/usr/local/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /home/nowconfer/NowConfer-1.0.1.0/node_modules/ringpop/node_modules/farmhash
gyp ERR! node -v v5.0.0
gyp ERR! node-gyp -v v3.0.3
gyp ERR! not ok

Please help me solve this issue. Thanks in advance!

npm ERR! Linux 3.10.0-123.el7.x86_64
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "install" "ringpop" "-save"
npm ERR! node v5.0.0
npm ERR! npm v3.5.0
npm ERR! code ELIFECYCLE

npm ERR! [email protected] install: node-gyp rebuild
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] install script 'node-gyp rebuild'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the farmhash package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! node-gyp rebuild
npm ERR! You can get their info via:
npm ERR! npm owner ls farmhash
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR! /home/nowconfer/NowConfer-1.0.1.0/npm-debug.log
