thebes's People

Contributors

aarondav, alig, jhellerstein, pbailis

thebes's Issues

figure out distributed transaction protocol

do we ship the transaction in its entirety to the remote datacenter and have it execute there? do we do 2PC and 2PL in the remote DC and then send the result back to the client?

todo

add tag support

AARON:
run experiments with no TM, small
test cross-cluster with new AMI

PETER:
write scripts to parse logs
functional testing of RR, RC
figure out percentile composition

once aaron's code works, run full load on
1.) bunch of m1.large on us-east-1; 2 clusters
2.) bunch of m1.xlarge across us-east-1 and us-west-2

expand interface for replication

need to add timestamp
likely: make a thrift "writerequest" datatype

need to add an additional thrift service for intra-replica communication

Dynamic master configuration

It would be nice to dynamically configure masters so we can play with failure modes under 2PL. One way to do this is via ZooKeeper.

There are other issues here, like durability (see #26).

Add Transaction Manager Proxy (single cluster)

Long-haul WAN latencies will be expensive for individual operations; build a service that allows clients to send their entire transaction logic to a coordinator node.

This will require a wire protocol that allows clients to express their entire transaction, then ship it to the Transaction Manager. The Transaction Manager will look a lot like the previous implementation of the TwoPLClient.

For this milestone, consider the case where all masters are in a single cluster. #24 changes this.
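As a rough illustration of "ship the entire transaction to a coordinator", the sketch below models a transaction as an ordered list of operations that a coordinator executes in one call. The names `TxnOp` and `TransactionManagerSketch` are hypothetical, the in-memory map stands in for the masters, and the `synchronized` method is only a placeholder for real two-phase locking. The actual wire format (presumably thrift) is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One step of a client transaction (hypothetical type): GET reads a key, PUT writes a value. */
final class TxnOp {
    enum Kind { GET, PUT }

    final Kind kind;
    final String key;
    final String value; // null for GET

    private TxnOp(Kind kind, String key, String value) {
        this.kind = kind;
        this.key = key;
        this.value = value;
    }

    static TxnOp get(String key) { return new TxnOp(Kind.GET, key, null); }
    static TxnOp put(String key, String value) { return new TxnOp(Kind.PUT, key, value); }
}

/** Hypothetical coordinator: runs a whole transaction in a single round trip. */
final class TransactionManagerSketch {
    // Stand-in for the masters; a real TM would contact them over the network.
    private final Map<String, String> store = new HashMap<String, String>();

    /** Executes ops in order and returns the values read by GETs, in order. */
    synchronized List<String> execute(List<TxnOp> ops) {
        List<String> reads = new ArrayList<String>();
        for (TxnOp op : ops) {
            if (op.kind == TxnOp.Kind.GET) {
                reads.add(store.get(op.key));
            } else {
                store.put(op.key, op.value);
            }
        }
        return reads;
    }
}
```

The client pays one WAN round trip for the whole list instead of one per operation, which is the point of the proxy.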

Java 6 Compatibility

Some of the Config code requires Java 7 features. To preserve backwards compatibility, we should remove them.

test YCSB

run one YCSB process per physical host (1:1 mapping between servers and clients)
vary number of threads per process to vary throughput
measure YCSB reported throughput and latency
each thread gets a separate DB instance, so no problem with synchronization

ycsb load thebes -threads 10 -p fieldlength=1 -p fieldcount=1 -p operationcount=10000 -p recordcount=10000 -t

ycsb run thebes -threads 10 -p fieldlength=1 -p fieldcount=1 -p operationcount=10000 -p recordcount=10000 -t

Durable storage over WAN before acknowledgment

Write to a majority of non-master replicas for a given cluster before acknowledging to the client.

This is what Spanner accomplishes via Paxos-replicated log writes. Omitting it from experiments only makes them look better.
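A minimal sketch of the "wait for a majority before acknowledging" rule, assuming acks arrive as callbacks from the non-master replicas. `MajorityAck` and its method names are hypothetical and not part of the Thebes code; the replication transport itself is left out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Tracks replica acks for one write and reports when a majority has been reached. */
final class MajorityAck {
    private final CountDownLatch remaining;

    MajorityAck(int replicaCount) {
        if (replicaCount < 1) {
            throw new IllegalArgumentException("need at least one replica");
        }
        remaining = new CountDownLatch(majority(replicaCount));
    }

    /** Smallest number of replicas that forms a strict majority. */
    static int majority(int replicaCount) {
        return replicaCount / 2 + 1;
    }

    /** Called once per replica that has durably stored the write. */
    void onReplicaAck() {
        remaining.countDown();
    }

    /** Non-blocking check: has a majority acked yet? */
    boolean isDurable() {
        return remaining.getCount() == 0;
    }

    /** Blocks until a majority has acked or the timeout expires. */
    boolean awaitDurable(long timeout, TimeUnit unit) throws InterruptedException {
        return remaining.await(timeout, unit);
    }
}
```

The master would reply to the client only after `awaitDurable` returns true; what to do on timeout is left open here.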

ycsb integration

we need to integrate as a database in YCSB. we should also fork the YCSB codebase to include a "transaction" construct.

Two Phase Locking Overview

There are several steps:
Implement a Local Lock Manager #20
Set partition masters in yaml configuration #21
TwoPL clients directly contact masters to perform transactions #22
Add Transaction Manager Proxy (single cluster) #23
Add cross-cluster transactional support with TransactionManagers #24

Stretch goals:
Dynamic master configuration #25
Durable acks over WAN #26

Add cross-cluster TransactionManager support

We'll want to run cross-cluster transactions if masters live in different clusters. We should come up with a heuristic to choose an optimal TransactionManager to do the proxying.

A first cut at this is to simply select the TM in the cluster with the most masters for a given transaction's data items.

We can develop latency-specific heuristics later.
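The first-cut heuristic could look roughly like this. It assumes a lookup from key to the id of the cluster hosting that key's master (`keyToMasterCluster`, a hypothetical name) and breaks ties toward the lowest cluster id, which is an arbitrary choice the issue does not specify.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TmChooser {
    /**
     * Returns the id of the cluster that masters the most keys in the transaction,
     * or -1 if the transaction touches no keys. Ties go to the lowest cluster id.
     */
    static int chooseCluster(List<String> txnKeys, Map<String, Integer> keyToMasterCluster) {
        Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
        int best = -1;
        int bestCount = 0;
        for (String key : txnKeys) {
            Integer cluster = keyToMasterCluster.get(key);
            if (cluster == null) {
                throw new IllegalArgumentException("no master known for key: " + key);
            }
            Integer prev = counts.get(cluster);
            int count = (prev == null ? 0 : prev) + 1;
            counts.put(cluster, count);
            if (count > bestCount || (count == bestCount && cluster < best)) {
                best = cluster;
                bestCount = count;
            }
        }
        return best;
    }
}
```

A latency-aware variant would replace the count with a per-cluster cost estimate, but the selection loop would stay the same shape.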

add garbage collection of partial order metadata

one way to do this is to use async handlers for anti-entropy messages; then, when all handlers have acked, send a notification to clients.

the problem here is that now clients have to run a server.
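The "notify once every handler has acked" step might be sketched as a simple countdown with a completion callback. `AllAcked` is a hypothetical helper; how the notification actually reaches clients (the open problem noted above) is not addressed here.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Fires a callback exactly once, after every neighbor has acked an anti-entropy message. */
final class AllAcked {
    private final AtomicInteger pending;
    private final Runnable onComplete;

    AllAcked(int neighborCount, Runnable onComplete) {
        if (neighborCount < 1) {
            throw new IllegalArgumentException("need at least one neighbor");
        }
        this.pending = new AtomicInteger(neighborCount);
        this.onComplete = onComplete;
    }

    /** Called from each async handler when its neighbor acks. */
    void onAck() {
        // Only the ack that brings the count to exactly zero triggers the callback.
        if (pending.decrementAndGet() == 0) {
            onComplete.run();
        }
    }
}
```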

how do we set up isolation level settings and architect for them

my temptation is to do this in the yaml configuration. this will require shutting down the cluster between options.

ansi_isolation: {repeatableread, readcommitted, readuncommitted}
transactional_visibility: true

// partial orders need to be kept separate--or do they
partial_orders: [{causal, explicit, monotonicwrites, monotonicreads}]

with the exception of explicit causality, we shouldn't have to change the API

also, need to figure out how to achieve this with modularity of the code.

support basic anti-entropy between nodes

change "ReplicaService"

add "InternalReplicaService":
* add thrift service definition
* optional: add thrift server port to configuration
* start service on server boot
* change configuration in AntiEntropyService

set up actual anti-entropy
* call AntiEntropyService.sendToNeighbors on ReplicaService.put()

add "timestamp" field to put request--hold off for now.

support for version vectors

I need to add version vectors to the Thebes API to track causality (effectively going from a scalar (clientId, timestamp) pair to a Map<serverId, timestamp>). How do you want to handle this?

My proposal is that we can change Version to become VersionVector and discard it in TwoPLServer (effectively what we do anyway with dependencies!)

To induce a total ordering between vectors, we can pick the one with the highest timestamp.
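A minimal sketch of what such a `VersionVector` might look like, assuming integer server ids and long timestamps (neither is specified in this issue). It shows the causal comparison and the "highest timestamp wins" rule. A missing entry is treated as lower than any present timestamp, and ties in `pickWinner` fall to the first argument, so a real implementation would need a deterministic tie-breaker.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class VersionVector {
    private final Map<Integer, Long> clock; // serverId -> timestamp

    VersionVector(Map<Integer, Long> clock) {
        this.clock = Collections.unmodifiableMap(new HashMap<Integer, Long>(clock));
    }

    /** True if every entry here is <= the matching entry in other and the vectors differ. */
    boolean happensBefore(VersionVector other) {
        for (Map.Entry<Integer, Long> e : clock.entrySet()) {
            Long theirs = other.clock.get(e.getKey());
            if (theirs == null || e.getValue() > theirs) {
                return false;
            }
        }
        return !clock.equals(other.clock);
    }

    /** Highest timestamp in the vector (Long.MIN_VALUE if the vector is empty). */
    long maxTimestamp() {
        long max = Long.MIN_VALUE;
        for (long t : clock.values()) {
            max = Math.max(max, t);
        }
        return max;
    }

    /** Arbitrates between two vectors by highest timestamp; ties go to a. */
    static VersionVector pickWinner(VersionVector a, VersionVector b) {
        return a.maxTimestamp() >= b.maxTimestamp() ? a : b;
    }
}
```

Two vectors for which `happensBefore` is false in both directions are concurrent; `pickWinner` is what resolves those cases.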

Thoughts? I am planning to do this soon.

reconfigure command line options

since we're shipping the client library as a jar, its dependence on command-line options is going to cause a problem. what we probably want to do instead is set java environment variables.

i think that using environment variables is still better than setting configuration file parameters, simply because the latter causes a lot of headaches when running experiments and/or quickly changing parameters (especially across a cluster!). this is open for discussion.
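One way this could look, as a sketch only: a small lookup helper that checks a JVM system property (`-Dname=value`) first, then an OS environment variable, then falls back to a default. `EnvConfig` is a hypothetical helper, and whether "java environment variables" means system properties, OS variables, or both is an assumption here.

```java
final class EnvConfig {
    /** Looks up a setting: -D system property first, then environment variable, then default. */
    static String get(String name, String defaultValue) {
        String value = System.getProperty(name);
        if (value == null) {
            value = System.getenv(name);
        }
        return value != null ? value : defaultValue;
    }

    /** Integer variant; throws NumberFormatException if the value is set but not a number. */
    static int getInt(String name, int defaultValue) {
        String value = get(name, null);
        return value != null ? Integer.parseInt(value.trim()) : defaultValue;
    }
}
```

Since nothing is parsed from `main(String[] args)`, the client jar can be embedded in another program without fighting over the command line.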
