tiered-storage's People

Contributors

cw75, jhellerstein, saurav-c, vsreekanti

tiered-storage's Issues

Clean up pass-by-ref vs pass-by-val

There are a few places where we're needlessly passing around references to things like integers. Change those to pass by value and make them const when possible.
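
A minimal sketch of the kind of change this issue describes. The function names `next_thread` and `total_length` are hypothetical and are not taken from the codebase.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Before: a reference to an integer buys nothing and adds an indirection.
//   unsigned next_thread(const unsigned& tid, const unsigned& thread_count);

// After: small, trivially copyable types are passed by value. The const
// qualifier marks them as read-only inside the function body.
unsigned next_thread(const unsigned tid, const unsigned thread_count) {
  return (tid + 1) % thread_count;
}

// Larger objects such as strings and containers still go by const reference
// to avoid a copy.
std::size_t total_length(const std::vector<std::string>& keys) {
  std::size_t total = 0;
  for (const std::string& key : keys) {
    total += key.size();
  }
  return total;
}
```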

Better error checking on bad requests

Currently, if the user sends a malformed request to the proxy, the proxy segfaults. For example, typing PUT a 1 instead of PUT a:1 should produce an error, not a crash. :-)
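
One possible shape for the validation, as a sketch only. `ParsedRequest` and `parse_request` are hypothetical names, and the `GET key` form is an assumption; only the `PUT key:value` form appears in this issue.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical parsed form of a user command; not the project's actual type.
struct ParsedRequest {
  std::string type;   // "GET" or "PUT"
  std::string key;
  std::string value;  // empty for GET
};

// Returns false on any malformed input instead of indexing into tokens that
// may not exist. On success, fills in *out.
bool parse_request(const std::string& line, ParsedRequest* out) {
  std::istringstream in(line);
  std::string type, arg, extra;
  if (!(in >> type >> arg) || (in >> extra)) {
    return false;  // wrong number of tokens, e.g. "PUT a 1"
  }
  if (type == "GET") {  // assumed syntax: GET key
    *out = ParsedRequest{type, arg, ""};
    return true;
  }
  if (type == "PUT") {
    const std::size_t sep = arg.find(':');
    if (sep == std::string::npos || sep == 0 || sep + 1 == arg.size()) {
      return false;  // missing separator, key, or value
    }
    *out = ParsedRequest{type, arg.substr(0, sep), arg.substr(sep + 1)};
    return true;
  }
  return false;  // unknown request type
}
```

The caller can then print a usage message whenever `parse_request` returns false, rather than forwarding the request.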

Better client interface

Let the Indy client proxy accept a script containing the commands, rather than requiring us to type them in.

consider changing request lifecycle

A few months back the request lifecycle looked something like this:

  1. User sends request to routing layer.
  2. Routing layer forwards the request to the server.
  3. Server responds to routing.
  4. Routing responds to user.

We concluded that this was bad because the (potentially large) value made two hops: server to routing, then routing to user. The simple solution at the time was to have the routing layer respond only with the addresses of the correct servers and have the user communicate directly with a server. The result was a request lifecycle that looks like this:

  1. User sends a key to the routing layer.
  2. The routing layer responds with the addresses of the set of servers responsible for this key.
  3. The user caches these addresses and sends the request directly to the server.
  4. The server responds.
  5. The user relies on its cache for future requests until a server receives a message for a key it's not responsible for. The server then tells the user to invalidate its cache for that key and return to step 1.
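
The user-side cache in steps 3 and 5 can be sketched roughly as follows. `AddressCache` is a hypothetical name, and addresses are assumed to be plain strings; the real client may store them differently.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Maps a key to the addresses of the servers responsible for it.
class AddressCache {
 public:
  // Returns the cached addresses for `key`, or nullptr on a miss, in which
  // case the user must ask the routing layer (step 1).
  const std::vector<std::string>* lookup(const std::string& key) const {
    const auto it = cache_.find(key);
    return it == cache_.end() ? nullptr : &it->second;
  }

  // Records the routing layer's answer (step 3).
  void store(const std::string& key, std::vector<std::string> addresses) {
    cache_[key] = std::move(addresses);
  }

  // Called when a server reports it is not responsible for `key` (step 5).
  void invalidate(const std::string& key) { cache_.erase(key); }

 private:
  std::unordered_map<std::string, std::vector<std::string>> cache_;
};
```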

However, we've since changed the architecture of the user and routing components to make them asynchronous. When a user sends a request, it includes its IP address in the request so that the routing layer knows where to send the response, because the routing layer might first have to make a request of its own to determine the correct replication factor for the key. In light of that change, we should change the request pattern:

  1. User sends request to routing layer.
  2. Routing layer forwards request along to one of the correct servers.
  3. Server responds directly to user.

Pros:

  • This reduces the maximum number of messages per request from 4 to 3.
  • This could potentially be even faster if the routing layer is colocated with the storage layer (see #67).

Cons:

  • Forces the routing layer to be invoked on every single request, i.e., the minimum number of messages per request goes from 2 to 3. Whether this is a net negative or positive depends on the workload.
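
One rough way to see where the break-even lies, counting only messages. This assumes (hypothetically) that a fraction h of requests hit the user's address cache under the current design, that a hit costs 2 messages and a miss costs 4, and that invalidations are rare enough to ignore:

```latex
\begin{aligned}
E[\text{messages}]_{\text{current}}  &= 2h + 4(1 - h) = 4 - 2h \\
E[\text{messages}]_{\text{proposed}} &= 3 \\
4 - 2h > 3 &\iff h < \tfrac{1}{2}
\end{aligned}
```

Under these assumptions the proposed pattern sends fewer messages only when fewer than half of requests hit the cache. The estimate says nothing about message size or about the extra load on the routing layer.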

cc @cw75

consider having routing and server in a single process

These are ostensibly separated because there's much more serving traffic than routing traffic, so we can deploy many fewer routing nodes. Still, it doesn't seem very harmful to have that lightly used service embedded in the server process.

Style

#60 made me realize that we have inconsistent styles, for example, around #define guards. By default, I usually go with Google's style guides, but I don't know a ton about "good" C++ development, so I'm happy to defer to anyone else with stronger opinions on the subject. Either way, we are inconsistent right now, so we should fix that.

Once we finish this refactor, we should also probably run clang-format on the code because I'm pretty sure we have some weird indentation stuff going on, includes are not alphabetized, etc. I'll eventually put this into the Travis build as well.

@cw75 @jhellerstein: Tagging you guys, so that we can come to a decision on this soon and move forward.

Clean up protobuf definitions

  • Typenames should be PascalCased
  • Look into collapsing duplicate definitions of sub-messages (e.g., Global and Local in ReplicationFactorRequest and ReplicationFactor)
  • Consider reuse of existing messages instead of specialized messages.

Protobuf Question

Which Protocol Buffer version should I use? proto2? proto3? Doesn't matter?

Better failure handling when no storage servers exist

When there are no storage servers, the routing node fails with a floating point exception because it tries to hash into the global hash ring when the ring is empty. It just prints error! and the routing process dies without any further explanation. This is obviously bad, and we should instead return a more useful error message to the user.
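
A floating point exception in this situation is commonly the result of an integer modulo by zero, although the issue does not confirm that this is the cause here. The sketch below shows the guard on a deliberately simplified stand-in for the ring (a plain vector indexed by hash modulo size); `pick_server` is a hypothetical name and the real hash ring is more involved.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Picks a server by hashing the key modulo the number of servers. Returns
// false when the ring is empty so the caller can report an error to the user
// instead of evaluating `hash % 0`.
bool pick_server(const std::vector<std::string>& ring, const std::string& key,
                 std::string* server) {
  if (ring.empty()) {
    return false;
  }
  const std::size_t index = std::hash<std::string>{}(key) % ring.size();
  *server = ring[index];
  return true;
}
```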

Scalable Client Proxies

If we want to have multiple client proxies, there exists a race condition where a server joins and a client proxy is added immediately after and therefore doesn't get the newest server in the list of servers that it uses to construct a hash ring. We can avoid this by having the clients gossip their lists of servers every time the list is updated or a new client joins.
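
The merge step of such a gossip exchange might look like the following sketch. `merge_server_list` is a hypothetical name, servers are assumed to be identified by address strings, and only joins are handled; departures would need extra bookkeeping.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Merges a peer's server list into ours. Returns true if we learned of at
// least one new server, which is the cue to rebuild the local hash ring.
bool merge_server_list(std::set<std::string>& local,
                       const std::set<std::string>& from_peer) {
  const std::size_t before = local.size();
  local.insert(from_peer.begin(), from_peer.end());
  return local.size() != before;
}
```

Because set union is idempotent and order-independent, proxies that exchange lists in any order end up with the same set of servers.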

Storage server seg faults when `SERVER_TYPE` is unset

In the Kubernetes deployment, we rely on an environment variable called SERVER_TYPE to tell the storage server process whether it should start as a memory or disk tier node. If no such variable is set when the server process is started (with ./build/src/bedrock/server), the server segfaults and fails to start. We should either print a more graceful error message or fall back to a default (it probably makes sense to default to being a memory tier server?) instead of crashing without explanation.
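
`std::getenv` returns a null pointer when the variable is unset, and using that pointer as a string is a typical way to crash. A minimal sketch of the defaulting option follows; `env_or_default` is a hypothetical helper, and the `"memory"` value in the usage comment is a placeholder, since the actual values SERVER_TYPE accepts are not given in this issue.

```cpp
#include <cstdlib>
#include <string>

// Returns the value of the environment variable `name`, or `fallback` when
// the variable is unset. Never dereferences a null pointer.
std::string env_or_default(const char* name, const std::string& fallback) {
  const char* value = std::getenv(name);
  return value == nullptr ? fallback : std::string(value);
}

// Usage (placeholder default value):
//   const std::string server_type = env_or_default("SERVER_TYPE", "memory");
```

If a default is not wanted, the same null check can instead print an error naming the missing variable and exit with a nonzero status.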