feup-sdis's Introduction

Hello 👋, I'm Daniel Silva.

I'm from Porto, Portugal, and currently a Developer @ konkConsulting.

feup-sdis's People

Contributors

dannyps, fabiodrg

feup-sdis's Issues

Generic Listen class

It's very likely that MCListen, MDBListen and MDRListen will share fields and behavior. Maybe it makes sense to create a generic class to handle common behavior. Related to #18

Implement Backup sub protocols messages

Considering the existing Message class it makes some sense in my head to create individual classes for each type of message, although I also feel I am complicating the problem...

The classes could have support for creating the messages and parsing received messages.

  • Create a PUTCHUNK message
  • Parse a PUTCHUNK message
  • Create a STORED message
  • Parse a STORED message

Enhance ChunkInfo class to track peers who stored the chunk

Tracking only the number of peers who stored a given chunk is not enough. The initiator peer may send the same chunk multiple times if the desired replication degree is not met (for example, because STORED messages were lost). In that scenario, every peer that already stored the chunk sends the STORED message again, so a plain counter overstates the backup degree.
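
A minimal sketch of the idea, assuming peers are identified by a string sender id. The class name ChunkInfoSketch and its methods are hypothetical and not taken from the repository; the point is only that a set makes repeated STORED messages from the same peer harmless.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for ChunkInfo: remembers WHICH peers stored the chunk. */
final class ChunkInfoSketch {
    private final int desiredReplicationDegree;
    // A set makes duplicate STORED messages from the same peer idempotent.
    private final Set<String> storingPeers = ConcurrentHashMap.newKeySet();

    ChunkInfoSketch(int desiredReplicationDegree) {
        this.desiredReplicationDegree = desiredReplicationDegree;
    }

    /** Records a STORED message; returns true only if the peer was not known yet. */
    boolean addStoringPeer(String senderId) {
        return storingPeers.add(senderId);
    }

    /** Backup degree: number of distinct peers that reported STORED. */
    int getBackupDegree() {
        return storingPeers.size();
    }

    boolean isReplicationMet() {
        return getBackupDegree() >= desiredReplicationDegree;
    }
}
```

The same set also helps when handling REMOVED messages later, since the sender can simply be removed from it.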

Create ChunkMessage

Add support for CHUNK messages with the following properties:

  • Version
  • SenderId
  • FileId
  • ChunkNo
  • Chunk Data (Body)

Rethink how file hashes are stored

Storing the hash as a string, which results from a raw conversion of the 32-byte hash to (most likely) a UTF-16 string, is just dumb...

  • Store the file id as byte[]
  • When printing the file id, display in the hexadecimal format
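
A possible sketch of the display side, assuming the file id is kept as a byte[]. The class FileIdHex and the method toHex are illustrative names only.

```java
/** Hypothetical helper: renders a raw hash as uppercase hexadecimal for display. */
final class FileIdHex {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;       // treat the byte as unsigned
            out[2 * i] = HEX[v >>> 4];     // high nibble first
            out[2 * i + 1] = HEX[v & 0x0F];
        }
        return new String(out);
    }
}
```

For a 32-byte SHA256 hash this yields a 64-character string.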

Add method to load local stored chunks

Add a method on Peer to load the locally stored chunks. Either this method is called every time the information is needed, or it runs once when the program starts and the internal data structure is then updated for every additionally stored chunk.

Create the DeleteWorker

Upon a DELETE message, the peer launches the DeleteWorker which will delete the chunks from its own file system

Create DeleteMessage

Message to handle DELETE protocol messages. This message is sent through the MC channel.

Initiator peer must listen for STORED messages and ensure the replication degree

  1. Initiator peer sends the PUTCHUNK message
  2. Waits for 1 second and collects the STORED messages, i.e., it requires a data structure that maps a given pair (fileId,chunkNo) to the peers who stored the chunk.
  3. If the replication degree is not met after that 1 second, doubles the waiting time and re-sends the PUTCHUNK message.

This loop is done at most 5 times per chunk.
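
The steps above can be sketched as a simple blocking loop. This is only an illustration: PutChunkRetrySketch, backupChunk and its parameters are hypothetical, sending is abstracted as a Runnable, and the number of collected STORED messages as an IntSupplier. The initial wait is a parameter so it can be set to 1000 ms in the real protocol.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of the PUTCHUNK retry loop with a doubling wait. */
final class PutChunkRetrySketch {
    static final int MAX_ATTEMPTS = 5;

    /** Returns true once the desired degree is observed, false after 5 failed attempts. */
    static boolean backupChunk(Runnable sendPutChunk, IntSupplier storedCount,
                               int desiredDegree, long initialWaitMillis) {
        long wait = initialWaitMillis;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            sendPutChunk.run();                   // step 1: send PUTCHUNK
            try {
                Thread.sleep(wait);               // step 2: collect STORED messages meanwhile
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (storedCount.getAsInt() >= desiredDegree) {
                return true;
            }
            wait *= 2;                            // step 3: double the wait and retry
        }
        return false;
    }
}
```

With an initial wait of 1 second, the worst case blocks the calling thread for 1 + 2 + 4 + 8 + 16 = 31 seconds.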

Implement MCListen

Similar to MDBListen, create a class to listen to the Control channel.

Implement a Message class

It may make sense to have another class to generate and parse Messages. Since control messages are exchanged over UDP, this class could receive DatagramPackets and generate DatagramPackets according to the subprotocol being used.

Generic format

All messages have the following header: <Message type> <Version> <SenderId> <FileId> <ChunkNo> <ReplicationDeg> <CRLF>

  • Not all control messages have all fields, but appear in the relative order as illustrated above.
  • Moreover, all fields are separated by one or more spaces.
  • The header always terminates with an extra <CRLF> sequence, in addition to the <CRLF> shown above. There might be spaces in between, but no other characters.
  • Strings in Java are UTF-16! The fields must be ASCII characters, therefore each character should take a single byte!
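
A small sketch of how a header could be serialized under these rules, assuming the fields are already encoded as ASCII strings. HeaderSketch and buildHeader are hypothetical names; getBytes and StandardCharsets.US_ASCII are standard Java.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: turns header fields into the bytes sent on the wire. */
final class HeaderSketch {
    static final String CRLF = "\r\n";

    static byte[] buildHeader(String... fields) {
        // One space between fields, then the header line's <CRLF>
        // followed by the terminating <CRLF>.
        String header = String.join(" ", fields) + CRLF + CRLF;
        // US_ASCII guarantees one byte per character.
        return header.getBytes(StandardCharsets.US_ASCII);
    }
}
```

For messages that carry a body, such as PUTCHUNK, the chunk data would be appended after these bytes.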

Common fields

Every message has:

  • MessageType
  • Version
  • SenderId

Other fields are specific to sub-protocols, which could extend this Message class; however, that might not be worth it.

Utility methods

Some fields have specific encodings, thus this class should offer protected methods to handle it.

Tip: String class offers the method getBytes, which allows to specify the encoding. Also see StandardCharsets

  • MessageType: Just the type of message. A sequence of ASCII characters. Variable length.
  • Version: Three ASCII characters in format <n>.<m>, where n and m are ASCII digits.
  • SenderId: Variable length of the sender ID.
  • FileId: The file SHA256 hash. This hash takes 32 bytes, however, this field should take 64 bytes! Each byte of the hash is encoded as two ASCII characters. For instance, a byte 0xB2 from the hash should be encoded as B2 in the field. Handle both uppercase or lowercase characters, i.e, B is the same as b. Represented in big-endian order.
  • ChunkNo: The chunk number, an integer, encoded as the sequence of ASCII characters of its decimal digits, most significant digit first. For example, the number 123 is encoded as the ASCII characters 1, 2, 3. It can't be longer than 6 characters, therefore the maximum file size is 64GB.
  • ReplicationDeg: Single byte. ASCII character for the digit (which means replication degree ranges from 0 to 9).
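
The field rules above could be handled by utility methods along these lines. This is a sketch under the constraints listed, not the project's code: FieldCodecSketch and all of its method names are made up for illustration.

```java
/** Hypothetical field helpers following the encoding rules listed above. */
final class FieldCodecSketch {
    /** Parses the 64-character FileId field (either letter case) into 32 bytes. */
    static byte[] parseFileId(String field) {
        if (field.length() != 64) {
            throw new IllegalArgumentException("FileId must be 64 hex characters");
        }
        byte[] id = new byte[32];
        for (int i = 0; i < id.length; i++) {
            int hi = hexValue(field.charAt(2 * i));
            int lo = hexValue(field.charAt(2 * i + 1));
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid hex digit in FileId");
            }
            id[i] = (byte) ((hi << 4) | lo);
        }
        return id;
    }

    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** ChunkNo: decimal digits, at most 6 characters. */
    static String encodeChunkNo(int chunkNo) {
        if (chunkNo < 0 || chunkNo > 999_999) {
            throw new IllegalArgumentException("ChunkNo must fit in 6 digits");
        }
        return Integer.toString(chunkNo);
    }

    /** ReplicationDeg: a single ASCII digit. */
    static String encodeReplicationDeg(int degree) {
        if (degree < 0 || degree > 9) {
            throw new IllegalArgumentException("ReplicationDeg must be 0..9");
        }
        return Integer.toString(degree);
    }

    /** Version: <n>.<m> with n and m ASCII digits. */
    static boolean isValidVersion(String version) {
        return version.matches("[0-9]\\.[0-9]");
    }
}
```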

Implement RegularFile class

A class responsible for receiving a file name, splitting the file into Chunks, generating its own identifier (#1), and later on maybe reconstructing the original File from all Chunks.

Compute a hash for files

Method to compute a hash to be used as a unique identifier for a given file. This identifier should change upon file content modifications.

  • Computing the hash for the file content might be slow
  • Use file metadata (file name, date modified, owner, date creation)
  • Later, investigate more efficient ways (picking random blocks of data for example?)
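
One way the metadata approach could look, as a sketch only. FileIdSketch, computeFileId and sha256 are hypothetical names; the owner is left out here to keep the example portable. Note that this identifier only changes with the content if the modification time changes too.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: derives a file id from metadata instead of the content. */
final class FileIdSketch {
    static byte[] computeFileId(Path file) {
        try {
            BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
            // Assumption: name + modification time + creation time identify a file version.
            String metadata = file.getFileName() + "|"
                    + attrs.lastModifiedTime().toMillis() + "|"
                    + attrs.creationTime().toMillis();
            return sha256(metadata);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] sha256(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return digest.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```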

ChunkSenderWorker attempts to read files that may not exist

Recovery hangs when some peer does not have the requested chunk (NullPointerException when reading a non-existent file). Attempts with large files (>1.5MB) and 4 peers were enough to reproduce this condition.

Basically, the MCListen launches the ChunkSenderWorker upon receiving a GETCHUNK message. Instead of ensuring the chunk is stored locally, it attempts to get the chunk from the filesystem which causes the exception.

Path file2Send = Paths.get(ServiceFileSystem.getBackupChunkPath(msg.getFileIdHexStr(), msg.getChunkNo()));
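
A possible guard, sketched independently of the project's classes: check that the chunk exists before reading it, and let the worker silently skip the GETCHUNK otherwise. ChunkReaderSketch and readChunkIfPresent are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical sketch: reads a chunk only if it is stored locally. */
final class ChunkReaderSketch {
    static Optional<byte[]> readChunkIfPresent(Path chunkPath) {
        if (!Files.isRegularFile(chunkPath)) {
            return Optional.empty();   // this peer never stored the chunk
        }
        try {
            return Optional.of(Files.readAllBytes(chunkPath));
        } catch (IOException e) {
            // The file may have been deleted between the check and the read.
            return Optional.empty();
        }
    }
}
```

The worker would then send a CHUNK message only when the returned Optional is present.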

Review testing code to investigate lack of reliability

When a Peer sends messages to itself (with a single peer running), the results are a mess. Sometimes the sent message matches the received message. Sometimes larger fields such as the file id or the data are broken. Sometimes a partial datagram is received. Sure, UDP is not reliable, but this is running locally. It doesn't feel right at all.

Add local service state interface for Client

One of the interfaces every peer must provide to the client is the retrieval of local service state information. Note that the Peer must send this information to the Client application instead of simply printing it. A class grouping all the needed information sounds good: the client requests the state information, the Peer populates an object with it, and sends it back to the client through RMI, which is as simple as returning the object instance.

Relevant information

The backup services instantiated by the Peer

For each local file the peer has instantiated its backup, it must show:

  • Pathname
  • The hash file identifier
  • Desired replication degree
  • List the chunks
    • The chunk identifier
    • The actual replication degree (what we call backup degree)

The local stored chunks as requested by other Peers

For each chunk display:

  • Chunk identifier
  • The size in KB
  • The backup replication degree

Storage capacity

  • Maximum amount of disk space to be used for storing chunks
  • Amount of that space already used
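
The information listed above could be grouped roughly as follows. This is a sketch with hypothetical class and field names (PeerStateSketch and its nested classes); the only hard requirement is that everything returned through RMI is Serializable.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical snapshot of a peer's local state, returned to the client over RMI. */
final class PeerStateSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    /** A chunk of a file whose backup this peer initiated. */
    static final class ChunkState implements Serializable {
        private static final long serialVersionUID = 1L;
        final int chunkNo;
        final int backupDegree;            // actual replication degree
        ChunkState(int chunkNo, int backupDegree) {
            this.chunkNo = chunkNo;
            this.backupDegree = backupDegree;
        }
    }

    /** A local file whose backup this peer initiated. */
    static final class BackedUpFile implements Serializable {
        private static final long serialVersionUID = 1L;
        final String pathname;
        final String fileIdHex;
        final int desiredReplicationDegree;
        final List<ChunkState> chunks = new ArrayList<>();
        BackedUpFile(String pathname, String fileIdHex, int desiredReplicationDegree) {
            this.pathname = pathname;
            this.fileIdHex = fileIdHex;
            this.desiredReplicationDegree = desiredReplicationDegree;
        }
    }

    /** A chunk stored locally on behalf of another peer. */
    static final class StoredChunk implements Serializable {
        private static final long serialVersionUID = 1L;
        final String chunkId;
        final int sizeKB;
        final int backupDegree;
        StoredChunk(String chunkId, int sizeKB, int backupDegree) {
            this.chunkId = chunkId;
            this.sizeKB = sizeKB;
            this.backupDegree = backupDegree;
        }
    }

    final List<BackedUpFile> backedUpFiles = new ArrayList<>();
    final List<StoredChunk> storedChunks = new ArrayList<>();
    long maxStorageKB;    // maximum disk space for storing chunks
    long usedStorageKB;   // amount of that space already used
}
```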

Use the actual filename for computing the file id

Another mistake... I am using the path to the file as if it were the actual filename. This is wrong because the program can be launched from different directories, so the same file gets different relative paths, resulting in different hashes.

Create GetChunkMessage

Extend Message class to handle GETCHUNK messages. The required attributes are:

  • Version
  • SenderId
  • FileId
  • ChunkNo

Implement Chunk class

  • Must be serializable
  • Contain file id (hash, start with SHA256)
  • Contain the chunk id (hash)
  • Replication degree
  • ...

Add Backup worker to handle PUTCHUNK messages

The client requests the backup service. The Peer splits the file into chunks, and each chunk is sent, one by one, to all the peers. Considering #17, listening for STORED messages and ensuring the desired replication degree may take a while (5 attempts, up to ~31 seconds). It makes sense to dispatch service requests, such as backup requests, to workers. That is, the Peer splits the file into chunks itself, since that is a fast operation, and then launches a worker (thread) responsible for creating the message, awaiting responses, and ensuring the replication degree. The Peer is then ready for incoming backup or other service requests.

Enhance the BackupWorker threadpool

The worst-case scenario is when a given chunk's replication degree is not met. The worker thread stays in the ThreadPool while it sleeps, wasting computing resources. It seems there's no easy/native way to put a sleeping thread back in a queue so that another waiting task can run. Once the thread pool starts executing a task, the thread stays occupied until the task finishes.
Using a ScheduledThreadPool with periodic scheduling is not a perfect solution either. It re-runs the task at a fixed period, which is not what we want at all. One-shot delayed scheduling (the schedule method) may be worth investigating.
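
One option that might be worth trying, sketched under assumptions: ScheduledExecutorService.schedule runs a task once after a delay, so each check can re-schedule itself with a doubled delay and no thread sleeps in between. ScheduledBackupSketch and its parameters are hypothetical; sending and counting STORED messages are abstracted away.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** Hypothetical sketch: retries PUTCHUNK without parking a pool thread. */
final class ScheduledBackupSketch {
    private static final int MAX_ATTEMPTS = 5;

    static CompletableFuture<Boolean> backupChunk(ScheduledExecutorService pool,
            Runnable sendPutChunk, IntSupplier storedCount,
            int desiredDegree, long initialWaitMillis) {
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        attempt(pool, sendPutChunk, storedCount, desiredDegree, initialWaitMillis, 1, result);
        return result;
    }

    private static void attempt(ScheduledExecutorService pool, Runnable sendPutChunk,
            IntSupplier storedCount, int desiredDegree, long waitMillis,
            int attemptNo, CompletableFuture<Boolean> result) {
        try {
            sendPutChunk.run();
        } catch (RuntimeException e) {
            result.completeExceptionally(e);
            return;
        }
        // One-shot delayed check; the thread is free while the delay elapses.
        pool.schedule(() -> {
            try {
                if (storedCount.getAsInt() >= desiredDegree) {
                    result.complete(true);
                } else if (attemptNo >= MAX_ATTEMPTS) {
                    result.complete(false);
                } else {
                    attempt(pool, sendPutChunk, storedCount, desiredDegree,
                            waitMillis * 2, attemptNo + 1, result);
                }
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        }, waitMillis, TimeUnit.MILLISECONDS);
    }
}
```

The returned future completes with true once the desired degree is observed, or false after the fifth attempt, so a small pool can serve many chunks at once.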

Simplify how messages are sent

Repeated code alert 🚨

Add methods to the Peer for sending messages through the channels. The argument can be a datagram for example. Inside the method, the Peer handles setting the socket address when needed and sends the message through the socket.

Keep track of backed up chunks

Each peer must be aware of backed up chunks on its file system. Relevant information is the replication degree desired and the actual replication degree on the system.

  • Once the Peer receives the PUTCHUNK, it stores data on some table with key (fileId, chunkNo) and initializes the relevant data.
  • Subsequent STORED messages should update this information.
  • REMOVED messages also affect this information
  • DELETE messages should remove all information regarding the backed up file
  • The data structure used for this purpose should be serializable and load upon Peer launching and saved periodically on disk. See #34 and #35
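
A rough sketch of the load/save part, assuming the table maps a "fileId:chunkNo" key to the actual replication degree. The class ChunkTableStoreSketch, the key format and the value type are all assumptions made for illustration; the real table would probably hold richer objects.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: persists the chunk table with Java serialization. */
final class ChunkTableStoreSketch {
    /** Assumed key format: hexadecimal file id, a colon, and the chunk number. */
    static String key(String fileIdHex, int chunkNo) {
        return fileIdHex + ":" + chunkNo;
    }

    static void save(ConcurrentHashMap<String, Integer> table, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(table);
        }
    }

    @SuppressWarnings("unchecked")
    static ConcurrentHashMap<String, Integer> load(Path file)
            throws IOException, ClassNotFoundException {
        if (!Files.exists(file)) {
            return new ConcurrentHashMap<>();   // first launch: nothing saved yet
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (ConcurrentHashMap<String, Integer>) in.readObject();
        }
    }
}
```

The Peer would call load once at launch and call save from a periodic task.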

Review Peer command line arguments

Note: The "name" of each multicast channel consists of the IP multicast address and port, and should be configurable via a pair of command line arguments of the server program. The "name" of the channels should be provided in the following order MC, MDB, MDR. These arguments must follow immediately the first three command line arguments, which are the protocol version, the server id and the service access point. - Source

The current order of the Peer command line arguments does not respect this. It should be:
<protocol version> <server id> <service AP> <MC> <MDB> <MDR>.

Right, @Dannyps ?
