jamestran201 / mit-distributed-systems-labs
Implementation of the labs for MIT distributed systems course
In this lab you'll implement Raft as a Go object type with associated methods, meant to be used as a module in a larger service. A set of Raft instances talk to each other with RPC to maintain replicated logs. Your Raft interface will support an indefinite sequence of numbered commands, also called log entries. The entries are numbered with index numbers. The log entry with a given index will eventually be committed. At that point, your Raft should send the log entry to the larger service for it to execute.
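The hand-off of committed entries to the larger service can be sketched as follows. This is a minimal illustration, not the lab's skeleton: the `LogEntry` and `ApplyMsg` types, the `applyCommitted` helper, and the use of a channel are all assumptions made for the example.

```go
package main

import "fmt"

// LogEntry is one numbered command in the replicated log.
type LogEntry struct {
	Index   int
	Command interface{}
}

// ApplyMsg is what this sketch sends to the larger service (hypothetical type).
type ApplyMsg struct {
	CommandIndex int
	Command      interface{}
}

// applyCommitted sends every entry in (lastApplied, commitIndex] to applyCh in
// index order and returns the new lastApplied. log[0] is a placeholder so that
// slice positions match 1-based log indexes.
func applyCommitted(log []LogEntry, lastApplied, commitIndex int, applyCh chan<- ApplyMsg) int {
	for lastApplied < commitIndex && lastApplied+1 < len(log) {
		lastApplied++
		e := log[lastApplied]
		applyCh <- ApplyMsg{CommandIndex: e.Index, Command: e.Command}
	}
	return lastApplied
}

func main() {
	log := []LogEntry{{}, {1, "x=1"}, {2, "x=2"}, {3, "x=3"}}
	applyCh := make(chan ApplyMsg, len(log))

	// Entries 1 and 2 are committed; entry 3 is not yet.
	lastApplied := applyCommitted(log, 0, 2, applyCh)
	close(applyCh)

	for msg := range applyCh {
		fmt.Printf("apply %d: %v\n", msg.CommandIndex, msg.Command)
	}
	fmt.Println("lastApplied =", lastApplied)
}
```

Entries are delivered strictly in index order, and an entry that is in the log but not yet committed is never sent to the service.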
Follow the design in the extended Raft paper (https://pdos.csail.mit.edu/6.824/papers/raft-extended.pdf), paying particular attention to Figure 2. The lab does not implement "cluster membership changes" (Section 6).
Resources:
A cluster consists of:
When a file is created, it is divided into chunks, and the master assigns each chunk a globally unique ID. Chunkservers store chunks as Linux files. The master holds the mapping from file names to chunk IDs, along with other metadata.
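A toy version of the master's file-to-chunk mapping might look like the sketch below. The `Master` type, its fields, and `CreateFile` are hypothetical names for this example, and the chunk size is passed in as a parameter rather than taken from GFS.

```go
package main

import "fmt"

// ChunkID is the globally unique identifier the master assigns to a chunk.
type ChunkID uint64

// Master holds a toy version of the file-name-to-chunk-ID mapping.
type Master struct {
	chunkSize int64                // bytes per chunk (a parameter of this sketch)
	nextID    ChunkID              // last ID handed out; IDs are never reused
	files     map[string][]ChunkID // file name -> ordered chunk IDs
}

func NewMaster(chunkSize int64) *Master {
	return &Master{chunkSize: chunkSize, files: make(map[string][]ChunkID)}
}

// CreateFile divides a file of the given size into chunks and assigns each
// chunk a fresh ID.
func (m *Master) CreateFile(name string, size int64) []ChunkID {
	n := (size + m.chunkSize - 1) / m.chunkSize // round up
	ids := make([]ChunkID, 0, n)
	for i := int64(0); i < n; i++ {
		m.nextID++
		ids = append(ids, m.nextID)
	}
	m.files[name] = ids
	return ids
}

func main() {
	m := NewMaster(64 << 20)
	fmt.Println(m.CreateFile("/logs/a", 130<<20)) // three chunks
	fmt.Println(m.CreateFile("/logs/b", 1))       // one chunk
}
```

A single counter is enough to keep IDs unique here because only one master hands them out.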
Clients often ask the master for metadata, but they talk to the chunkservers to read/write data.
Chunk size: around 64 MB.
Small files may only have 1 chunk. Chunkservers storing these chunks may become hot spots if many clients are reading the same file.
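With a fixed chunk size, locating data is simple arithmetic. The sketch below assumes exactly 64 MB chunks; `locate` and `chunkCount` are illustrative helper names, not part of any GFS API.

```go
package main

import "fmt"

// chunkSize is assumed to be exactly 64 MB for this sketch.
const chunkSize int64 = 64 << 20

// locate translates a byte offset within a file into a chunk index and an
// offset inside that chunk.
func locate(offset int64) (index, within int64) {
	return offset / chunkSize, offset % chunkSize
}

// chunkCount returns how many chunks a file of the given size occupies.
func chunkCount(size int64) int64 {
	return (size + chunkSize - 1) / chunkSize
}

func main() {
	idx, within := locate(200 << 20)
	fmt.Println(idx, within>>20)     // prints "3 8": chunk 3, 8 MB into it
	fmt.Println(chunkCount(4 << 10)) // prints "1": a 4 KB file is a single chunk
}
```

Any file smaller than the chunk size maps to a single chunk, so every reader of that file ends up at the same few chunkservers.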
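The master stores 3 main types of metadata:
1. The file and chunk namespaces
2. The mapping from files to chunks
3. The locations of each chunk's replicas
All 3 are stored in memory. 1 and 2 are also persisted to logs to help with recovery when the master crashes. Data for 3 is gathered when the master starts up or when a new chunkserver joins the cluster.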
Storing metadata in memory
Pros:
The master knows where to find the chunk locations because:
The operation log contains a historical record of critical metadata changes. Files and chunks are uniquely and eternally identified by the logical times at which they were created, as recorded in the log.
The operation log is replicated across many machines. The master will not respond to a client operation until the log has been flushed to disk locally and remotely. The master batches several log records together for flushing to reduce the overall impact on throughput.
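The flush-before-reply rule with batching can be sketched like this. `OpLog`, `memSink`, and the record strings are hypothetical; an in-memory writer stands in for both the local disk and the remote replicas.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// memSink is an in-memory stand-in for a disk or a remote log replica.
type memSink struct{ data []byte }

func (s *memSink) Write(p []byte) (int, error) {
	s.data = append(s.data, p...)
	return len(p), nil
}

// OpLog buffers records and writes them to every sink in one batch.
type OpLog struct {
	pending []string
	sinks   []io.Writer // sinks[0] plays the local disk, the rest play remote replicas
}

func NewOpLog(sinks ...io.Writer) *OpLog { return &OpLog{sinks: sinks} }

// Append queues a record. It is not durable yet, so no reply may be sent.
func (l *OpLog) Append(rec string) { l.pending = append(l.pending, rec) }

// Flush writes all pending records to every sink as one batch and returns how
// many records became durable. Only after Flush succeeds would the master
// answer the clients whose operations are in the batch.
func (l *OpLog) Flush() (int, error) {
	var batch bytes.Buffer
	for _, rec := range l.pending {
		batch.WriteString(rec)
		batch.WriteByte('\n')
	}
	for _, s := range l.sinks {
		if _, err := s.Write(batch.Bytes()); err != nil {
			return 0, err
		}
	}
	n := len(l.pending)
	l.pending = l.pending[:0]
	return n, nil
}

func main() {
	local, remote := &memSink{}, &memSink{}
	l := NewOpLog(local, remote)
	l.Append("create /a")
	l.Append("create /b")
	n, err := l.Flush()
	fmt.Println(n, err, string(local.data) == string(remote.data))
}
```

Two client operations cost one write per sink instead of two, which is the throughput benefit of batching.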
The master recovers its state by using a checkpoint and replaying the operations in the log. Once the log reaches a certain size, the master will produce a checkpoint which will be replicated to remote machines. The log can then be reduced to only contain operations after the checkpoint.
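Recovery from a checkpoint plus the log tail can be sketched as below. The state is reduced to a map from file name to chunk count, and `Op`, `Checkpoint`, `recoverState`, and `truncate` are names invented for this example.

```go
package main

import "fmt"

// Op is one logged metadata change in this toy model: it sets the number of
// chunks recorded for a file.
type Op struct {
	Seq    int // position in the operation log
	File   string
	Chunks int
}

// Checkpoint is a snapshot of the state as of log record LastSeq.
type Checkpoint struct {
	LastSeq int
	State   map[string]int
}

// recoverState rebuilds the state from a checkpoint plus the log records that
// come after it.
func recoverState(cp Checkpoint, log []Op) map[string]int {
	state := make(map[string]int, len(cp.State))
	for k, v := range cp.State {
		state[k] = v
	}
	for _, op := range log {
		if op.Seq > cp.LastSeq {
			state[op.File] = op.Chunks
		}
	}
	return state
}

// truncate drops the log records that the checkpoint already covers.
func truncate(cp Checkpoint, log []Op) []Op {
	var kept []Op
	for _, op := range log {
		if op.Seq > cp.LastSeq {
			kept = append(kept, op)
		}
	}
	return kept
}

func main() {
	log := []Op{{1, "/a", 1}, {2, "/b", 2}, {3, "/a", 3}, {4, "/c", 1}}
	cp := Checkpoint{LastSeq: 2, State: map[string]int{"/a": 1, "/b": 2}}

	fmt.Println(recoverState(cp, log)) // state after replaying records 3 and 4
	fmt.Println(len(truncate(cp, log)), "records kept after the checkpoint")
}
```

Replay cost depends only on the number of records after the checkpoint, which is why the log can be cut down once a checkpoint exists.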
Question: How do they process writes to the metadata data structure while checkpointing is going on?
File namespace mutations (e.g., file creation) are atomic.
The state of a file region after a data mutation depends on the type of mutation, whether it succeeds or fails, and whether there are concurrent mutations.
A file region is consistent if all clients will always see the same data, regardless of which replicas they read from.
A region is defined after a file data mutation if it is consistent and clients will see what the mutation writes in its entirety.
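The two definitions can be written down as predicates over replica contents. This is only an illustration of the definitions: `consistent` and `defined` are hypothetical helpers, and a real system cannot compare replicas byte for byte like this.

```go
package main

import (
	"bytes"
	"fmt"
)

// consistent reports whether every replica holds identical bytes for the
// region, so a client sees the same data whichever replica it reads.
func consistent(replicas [][]byte) bool {
	for i := 1; i < len(replicas); i++ {
		if !bytes.Equal(replicas[0], replicas[i]) {
			return false
		}
	}
	return true
}

// defined reports whether the region is consistent and holds exactly what the
// mutation wrote, in its entirety.
func defined(replicas [][]byte, written []byte) bool {
	return len(replicas) > 0 && consistent(replicas) && bytes.Equal(replicas[0], written)
}

func main() {
	written := []byte("AAAA")

	// Successful write with no interference: consistent and defined.
	clean := [][]byte{[]byte("AAAA"), []byte("AAAA"), []byte("AAAA")}
	fmt.Println(consistent(clean), defined(clean, written))

	// Replicas agree, but the data mixes fragments from two concurrent writers.
	mixed := [][]byte{[]byte("AAbb"), []byte("AAbb"), []byte("AAbb")}
	fmt.Println(consistent(mixed), defined(mixed, written))

	// Replicas disagree, as after a failed mutation: inconsistent.
	failed := [][]byte{[]byte("AAAA"), []byte("AA"), []byte("AAAA")}
	fmt.Println(consistent(failed), defined(failed, written))
}
```

The middle case shows why the two terms are separate: all replicas match, yet no single writer's data is visible in full.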
Because clients cache chunk locations, they may read from a stale replica. The window for this is limited by the cache entry's timeout and by refreshing the cache whenever a file is re-opened.
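A client-side location cache with both limits might be sketched as follows. `LocationCache` and its methods are hypothetical, and time is passed in as plain seconds so the example stays deterministic.

```go
package main

import "fmt"

// chunkKey identifies one chunk of one file.
type chunkKey struct {
	file  string
	index int64
}

type cacheEntry struct {
	replicas []string
	expires  int64 // time (in seconds) after which the entry is stale
}

// LocationCache is a toy client-side cache of chunk replica locations.
type LocationCache struct {
	ttl     int64
	entries map[chunkKey]cacheEntry
}

func NewLocationCache(ttl int64) *LocationCache {
	return &LocationCache{ttl: ttl, entries: make(map[chunkKey]cacheEntry)}
}

// Put records the replicas the master reported at time now.
func (c *LocationCache) Put(file string, index int64, replicas []string, now int64) {
	c.entries[chunkKey{file, index}] = cacheEntry{replicas, now + c.ttl}
}

// Get returns the cached replicas, or false if the entry is missing or has
// timed out, in which case the client would ask the master again.
func (c *LocationCache) Get(file string, index int64, now int64) ([]string, bool) {
	e, ok := c.entries[chunkKey{file, index}]
	if !ok || now >= e.expires {
		return nil, false
	}
	return e.replicas, true
}

// Reopen drops everything cached for a file, as on re-opening it.
func (c *LocationCache) Reopen(file string) {
	for k := range c.entries {
		if k.file == file {
			delete(c.entries, k)
		}
	}
}

func main() {
	c := NewLocationCache(60)
	c.Put("/a", 0, []string{"cs1", "cs2"}, 100)

	_, ok := c.Get("/a", 0, 130)
	fmt.Println("fresh:", ok)
	_, ok = c.Get("/a", 0, 160)
	fmt.Println("after timeout:", ok)

	c.Put("/a", 0, []string{"cs1", "cs3"}, 200)
	c.Reopen("/a")
	_, ok = c.Get("/a", 0, 201)
	fmt.Println("after reopen:", ok)
}
```

A shorter timeout narrows the stale-read window at the cost of more requests to the master.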
Summarize the requirements for the coordinator and worker in lab 1.
Output goes into files named mr-out-*; the file for reduce task X is mr-out-X.
There are nReduce reduce tasks, where nReduce is the argument that main/mrcoordinator.go passes to MakeCoordinator(). Each mapper should create nReduce intermediate files for consumption by the reduce tasks.
Each mr-out-X file should contain one line per Reduce function output. The line should be generated with the Go "%v %v" format, called with the key and value. Have a look in main/mrsequential.go for the line commented "this is the correct format". The test script will fail if your implementation deviates too much from this format.
main/mrcoordinator.go expects mr/coordinator.go to implement a Done() method that returns true when the MapReduce job is completely finished; at that point, mrcoordinator.go will exit.
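The requirements above can be sketched in a few helpers. Everything here is an assumption made for illustration: the `bucket` function (FNV-1a hash modulo nReduce), the `Coordinator` fields, and the helper names are not taken from the lab's code. Only the "%v %v" format, the mr-out-X naming, and the Done() contract come from the notes.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// bucket picks the reduce task for a key. Hashing with FNV-1a modulo nReduce
// is an assumption of this sketch.
func bucket(key string, nReduce int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()&0x7fffffff) % nReduce
}

// outputName returns the output file name for reduce task x.
func outputName(x int) string { return fmt.Sprintf("mr-out-%d", x) }

// formatLine renders one Reduce output using the "%v %v" format.
func formatLine(key, value string) string { return fmt.Sprintf("%v %v\n", key, value) }

// Coordinator tracks how many tasks remain (hypothetical fields).
type Coordinator struct {
	mu          sync.Mutex
	mapsLeft    int
	reducesLeft int
}

// Done returns true once every map and reduce task has finished.
func (c *Coordinator) Done() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.mapsLeft == 0 && c.reducesLeft == 0
}

func main() {
	nReduce := 3
	for _, key := range []string{"apple", "banana", "cherry"} {
		fmt.Printf("%s -> %s\n", key, outputName(bucket(key, nReduce)))
	}
	fmt.Print(formatLine("apple", "3"))

	c := &Coordinator{mapsLeft: 0, reducesLeft: 1}
	fmt.Println("done:", c.Done())
}
```

Because the bucket depends only on the key, every occurrence of a key lands in the same reduce task regardless of which mapper emitted it. The mutex matters because Done() may be called while worker RPC handlers update the counters.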