dannyps / feup-sdis
SDIS @ FEUP
It's very likely that MCListen, MDBListen and MDRListen will share fields and behavior. Maybe it makes sense to create a generic class to handle the common behavior. Related to #18
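A minimal sketch of such a generic class follows. It assumes that each listener owns a socket and that subclasses differ only in how they handle a received message. In practice that socket is a MulticastSocket, which is a DatagramSocket. ChannelListener and handle are hypothetical names. The 65,507-byte buffer is simply the largest UDP payload over IPv4.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;

// Hypothetical base class for MCListen, MDBListen and MDRListen.
// It owns the receive loop; subclasses only decide what to do with each message.
abstract class ChannelListener implements Runnable {
    private static final int MAX_DATAGRAM = 65_507; // largest UDP payload over IPv4
    private final DatagramSocket socket;             // a MulticastSocket in the real peer
    private volatile boolean running = true;

    protected ChannelListener(DatagramSocket socket) {
        this.socket = socket;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[MAX_DATAGRAM];
        while (running) {
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            try {
                socket.receive(packet);
            } catch (IOException e) {
                if (running) e.printStackTrace(); // closing the socket in stop() also lands here
                break;
            }
            // Copy only the bytes that belong to this datagram.
            byte[] data = Arrays.copyOfRange(packet.getData(), packet.getOffset(),
                    packet.getOffset() + packet.getLength());
            handle(data);
        }
    }

    public void stop() {
        running = false;
        socket.close(); // unblocks a pending receive()
    }

    // Channel-specific behavior, e.g. MC reacting to STORED, GETCHUNK or DELETE.
    protected abstract void handle(byte[] message);
}
```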
Considering the existing Message class, it makes some sense to me to create individual classes for each type of message, although I also feel I am overcomplicating the problem...
The classes could support both creating messages and parsing received messages.
PUTCHUNK message
STORED message
Tracking only the number of peers that stored a given chunk is not enough. The initiator peer may send the same chunk multiple times if the desired replication degree is not met (for example, because some STORED messages were lost). In that case, every peer that had already stored the chunk sends the STORED message again, which inflates the perceived replication degree.
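One way to fix this is to remember which peers sent STORED for each chunk and to count only distinct senders. The sketch below assumes integer sender IDs and a FileId kept as a hex String. ChunkReplication is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-chunk bookkeeping: stores *which* peers confirmed a chunk,
// so a repeated STORED from the same peer is only counted once.
class ChunkReplication {
    private final Map<String, Set<Integer>> storers = new ConcurrentHashMap<>();

    private static String key(String fileId, int chunkNo) {
        return fileId + "#" + chunkNo;
    }

    // Called for every STORED message received on the MC channel.
    void onStored(String fileId, int chunkNo, int senderId) {
        storers.computeIfAbsent(key(fileId, chunkNo), k -> ConcurrentHashMap.newKeySet())
               .add(senderId);
    }

    // Perceived replication degree = number of distinct peers that stored the chunk.
    int degree(String fileId, int chunkNo) {
        Set<Integer> peers = storers.get(key(fileId, chunkNo));
        return peers == null ? 0 : peers.size();
    }
}
```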
Add support for CHUNK messages with the following properties:
Storing the hash as a String, obtained from a raw conversion of the 32-byte hash into (most likely) a UTF-16 string, is just a bad idea... It should be stored as a byte[].
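A bare byte[] has a pitfall of its own: arrays use identity equality, so two identical hashes are not equal() and do not work as map keys. A minimal wrapper, with the hypothetical name FileId, could look like this:

```java
import java.util.Arrays;

// Hypothetical value class: keeps the raw 32-byte SHA-256 hash as byte[]
// but gives it value semantics, which a bare byte[] lacks
// (byte[].equals compares references, not contents).
final class FileId {
    private final byte[] hash;

    FileId(byte[] hash) {
        if (hash.length != 32) throw new IllegalArgumentException("SHA-256 hash must be 32 bytes");
        this.hash = hash.clone(); // defensive copy
    }

    byte[] bytes() {
        return hash.clone();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof FileId && Arrays.equals(hash, ((FileId) o).hash);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(hash);
    }
}
```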
It should be more flexible
Add a method to Peer that loads the locally stored chunks. Either this method is called every time the information is needed, or the chunks are loaded once when the program starts and the internal data structure is then updated for every newly stored chunk.
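The sketch below shows the load-once-then-update option. It assumes, hypothetically, that chunks are stored as <root>/<fileId>/<chunkNo>. LocalChunks and its methods are illustrative names, not the project's code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical chunk index, assuming chunks are stored as <root>/<fileId>/<chunkNo>.
// Loaded once at startup, then updated as new chunks are stored.
class LocalChunks {
    private final Map<String, Set<Integer>> chunks = new ConcurrentHashMap<>();

    static LocalChunks load(Path root) throws IOException {
        LocalChunks index = new LocalChunks();
        if (!Files.isDirectory(root)) return index; // nothing stored yet
        try (DirectoryStream<Path> fileDirs = Files.newDirectoryStream(root)) {
            for (Path fileDir : fileDirs) {
                if (!Files.isDirectory(fileDir)) continue;
                try (DirectoryStream<Path> parts = Files.newDirectoryStream(fileDir)) {
                    for (Path chunk : parts) {
                        try {
                            index.add(fileDir.getFileName().toString(),
                                      Integer.parseInt(chunk.getFileName().toString()));
                        } catch (NumberFormatException ignored) {
                            // not a chunk file, skip it
                        }
                    }
                }
            }
        }
        return index;
    }

    // Call this after every successfully stored chunk.
    void add(String fileId, int chunkNo) {
        chunks.computeIfAbsent(fileId, k -> ConcurrentHashMap.newKeySet()).add(chunkNo);
    }

    boolean has(String fileId, int chunkNo) {
        Set<Integer> s = chunks.get(fileId);
        return s != null && s.contains(chunkNo);
    }
}
```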
Upon a DELETE message, the peer launches the DeleteWorker, which deletes the chunks from its own file system.
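A minimal sketch of such a worker follows, using the same hypothetical <root>/<fileId>/<chunkNo> layout. It lists the chunk files first and deletes them afterwards, so the directory is not modified while it is being iterated.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical DeleteWorker, assuming chunks are stored as <root>/<fileId>/<chunkNo>.
// Runs on its own thread so the MC listener is not blocked by disk I/O.
class DeleteWorker implements Runnable {
    private final Path root;
    private final String fileId;

    DeleteWorker(Path root, String fileId) {
        this.root = root;
        this.fileId = fileId;
    }

    @Override
    public void run() {
        Path fileDir = root.resolve(fileId);
        if (!Files.isDirectory(fileDir)) return; // no chunks of this file stored here
        try {
            List<Path> chunks = new ArrayList<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(fileDir)) {
                for (Path p : entries) chunks.add(p); // collect first, delete afterwards
            }
            for (Path p : chunks) Files.deleteIfExists(p);
            Files.deleteIfExists(fileDir); // the directory is empty now
        } catch (IOException e) {
            e.printStackTrace(); // a real peer would log this and carry on
        }
    }
}
```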
As you suggested @Dannyps, although I am not having any success with this.
Perhaps the ChunkReceiverWatcher should launch yet another thread to merge the chunks and write them to disk.
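The sketch below illustrates that idea. It assumes the received chunks are collected in a map keyed by chunk number. A single-thread executor, the hypothetical ChunkMerger, then writes the chunks in order, and it rejects the input up front if a chunk number is missing.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical merge step: ChunkReceiverWatcher hands over the received chunks and
// a dedicated writer thread assembles them on disk in chunk-number order.
class ChunkMerger {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    Future<Path> merge(SortedMap<Integer, byte[]> chunks, Path target) {
        SortedMap<Integer, byte[]> snapshot = new TreeMap<>(chunks); // ascending chunkNo
        int expected = 0;
        for (int no : snapshot.keySet()) { // chunk numbers must be 0, 1, 2, ... without gaps
            if (no != expected) throw new IllegalArgumentException("missing chunk " + expected);
            expected++;
        }
        return writer.submit(() -> {
            try (OutputStream out = Files.newOutputStream(target)) {
                for (byte[] body : snapshot.values()) out.write(body);
            }
            return target;
        });
    }

    void shutdown() {
        writer.shutdown();
    }
}
```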
I didn't notice that such a class was already in my working directory, so I used Strings here and there for the protocol version. It might be useful to use this class.
Message to handle DELETE protocol messages. This message is sent through the MC channel.
Depends on #31
This is a common task, so it makes sense for Message to be able to generate a DatagramPacket for itself.
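Here is a minimal sketch. It assumes subclasses provide the header fields without the terminating <CRLF><CRLF>, and that only messages such as PUTCHUNK or CHUNK carry a body. The method names are hypothetical. US-ASCII is used because all header fields are ASCII.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a message knows how to serialize itself, so callers only
// choose the channel (multicast address + port) to send it on.
abstract class Message {
    protected abstract String header();                // e.g. "STORED 1.0 3 <fileId> 7", no CRLFs
    protected byte[] body() { return new byte[0]; }    // overridden by messages with a body

    byte[] toBytes() {
        byte[] head = (header() + "\r\n\r\n").getBytes(StandardCharsets.US_ASCII);
        byte[] body = body();
        byte[] out = new byte[head.length + body.length];
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(body, 0, out, head.length, body.length);
        return out;
    }

    DatagramPacket toDatagramPacket(InetAddress group, int port) {
        byte[] data = toBytes();
        return new DatagramPacket(data, data.length, group, port);
    }
}
```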
This loop is done at most 5 times per chunk.
See the specification of the message format, or #5.
Similar to MDBListen, create a class to listen to the Control channel.
It may make sense to have another class to generate and parse Messages. Since control messages are exchanged over UDP, this class could receive DatagramPackets and generate DatagramPackets according to the subprotocol being used.
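A minimal parsing sketch follows. It finds the <CRLF><CRLF> that ends the header, splits the header on one or more spaces, and keeps the rest as raw bytes. The body is kept as bytes because a chunk is binary and must not go through a String. ParsedMessage is a hypothetical name. The packet data is assumed to start at offset 0, so the call is parse(packet.getData(), packet.getLength()).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical parser for received datagrams: splits the header (terminated by
// <CRLF><CRLF>) into space-separated fields and keeps the remaining bytes as body.
final class ParsedMessage {
    final String[] fields; // fields[0] = MessageType, [1] = Version, [2] = SenderId, ...
    final byte[] body;

    private ParsedMessage(String[] fields, byte[] body) {
        this.fields = fields;
        this.body = body;
    }

    static ParsedMessage parse(byte[] data, int length) {
        int end = indexOfHeaderEnd(data, length);
        if (end < 0) throw new IllegalArgumentException("no <CRLF><CRLF> in message");
        String header = new String(data, 0, end, StandardCharsets.US_ASCII).trim();
        String[] fields = header.split(" +"); // fields may be separated by several spaces
        byte[] body = Arrays.copyOfRange(data, end + 4, length);
        return new ParsedMessage(fields, body);
    }

    private static int indexOfHeaderEnd(byte[] d, int length) {
        for (int i = 0; i + 3 < length; i++) {
            if (d[i] == '\r' && d[i + 1] == '\n' && d[i + 2] == '\r' && d[i + 3] == '\n') return i;
        }
        return -1;
    }
}
```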
All messages have the following header: <MessageType> <Version> <SenderId> <FileId> <ChunkNo> <ReplicationDeg> <CRLF>
The header ends with one more <CRLF>, in addition to the <CRLF> shown above (i.e., <CRLF><CRLF>). Fields may have spaces in between, but no other characters.
Every message has:
MessageType
Version
SenderId
Other fields are specific to sub-protocols, which could extend this Message class; however, that might not be worth it.
Some fields have specific encodings, so this class should offer protected methods to handle them.
Tip: the String class offers the method getBytes, which allows specifying the encoding. Also see StandardCharsets.
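For example, the field rules listed in this issue for Version, ChunkNo and ReplicationDeg could become protected helpers like the ones below. The header bytes are then produced with getBytes(StandardCharsets.US_ASCII). HeaderFields and its methods are hypothetical names, and a single space is assumed as the separator.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical protected helpers a base Message class could offer, following the
// field rules in this issue. All header text is encoded as US-ASCII.
class HeaderFields {
    protected static String version(int n, int m) {
        if (n < 0 || n > 9 || m < 0 || m > 9) throw new IllegalArgumentException("version digits must be 0-9");
        return n + "." + m; // exactly three ASCII characters, e.g. "1.0"
    }

    protected static String chunkNo(int chunkNo) {
        String s = Integer.toString(chunkNo); // most significant digit first
        if (chunkNo < 0 || s.length() > 6) throw new IllegalArgumentException("ChunkNo must have at most 6 digits");
        return s;
    }

    protected static String replicationDeg(int degree) {
        if (degree < 0 || degree > 9) throw new IllegalArgumentException("ReplicationDeg is a single digit");
        return Integer.toString(degree);
    }

    protected static byte[] header(String... fields) {
        // Fields separated by one space; header terminated by <CRLF><CRLF>.
        return (String.join(" ", fields) + "\r\n\r\n").getBytes(StandardCharsets.US_ASCII);
    }
}
```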
MessageType: just the type of message. A sequence of ASCII characters. Variable length.
Version: three ASCII characters in the format <n>.<m>, where n and m are ASCII digits.
SenderId: the sender ID. Variable length.
FileId: the file's SHA256 hash. This hash takes 32 bytes; however, this field takes 64 bytes! Each byte of the hash is encoded as two ASCII characters. For instance, a byte 0xB2 from the hash should be encoded as B2 in the field. Handle both uppercase and lowercase characters, i.e., B is the same as b. Represented in big-endian order.
ChunkNo: the chunk number, an integer, encoded as a sequence of ASCII characters, one per digit, starting at the most significant digit. I.e., the number 123 is encoded with the ASCII characters that represent 1, 2 and 3. It can't be longer than 6 characters; therefore, the maximum file size is 64GB (10^6 chunks of 64KB each).
ReplicationDeg: a single byte holding the ASCII character for the digit (which means the replication degree ranges from 0 to 9).
A class responsible for receiving a file name, splitting the file into Chunks, generating its own identifier (#1) and, later on, maybe reconstructing the original File from all Chunks.
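Here is a sketch of the splitting part. The 64,000-byte chunk size is an assumption, chosen to match the 64GB limit stated for ChunkNo. Another assumption is the convention of appending an empty last chunk when the file size is a multiple of the chunk size, which lets the restore side know where the file ends. FileSplitter is a hypothetical name, and the whole file is held in memory for simplicity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical splitter. CHUNK_SIZE = 64,000 bytes is an assumption consistent with
// "6-digit ChunkNo => 64GB max file size" (10^6 chunks x 64KB).
final class FileSplitter {
    static final int CHUNK_SIZE = 64_000;

    static List<byte[]> split(Path file) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[CHUNK_SIZE];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // readNBytes blocks until CHUNK_SIZE bytes are read or EOF; returns 0 at EOF.
            while ((n = in.readNBytes(buffer, 0, CHUNK_SIZE)) > 0) {
                chunks.add(Arrays.copyOf(buffer, n));
                if (n < CHUNK_SIZE) return chunks; // a short chunk is the last one
            }
        }
        // Assumed convention: size is a multiple of CHUNK_SIZE (or 0), so append an
        // empty chunk to mark the end of the file.
        chunks.add(new byte[0]);
        return chunks;
    }
}
```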
Method to compute a hash to be used as a unique identifier for a given file. This identifier should change upon file content modifications.
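The sketch below uses the JDK's MessageDigest and hashes the file contents. That choice is an assumption; the identifier could also mix in metadata such as the file name. Hashing the contents rather than a path string means the ID changes when the content changes, but not when the same file is opened through a different relative path. toHex and fromHex implement the 64-character FileId encoding and accept either case. FileIds is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical identifier: SHA-256 over the file *contents*.
final class FileIds {
    static byte[] sha256(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java SE platform must support SHA-256
        }
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) md.update(buf, 0, n);
        return md.digest(); // 32 bytes
    }

    static byte[] sha256(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sha256(in);
        }
    }

    // FileId field: each byte -> two hex characters, most significant nibble first.
    static String toHex(byte[] hash) {
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) sb.append(String.format("%02X", b & 0xFF));
        return sb.toString();
    }

    // Accepts upper- or lowercase digits ("B2" and "b2" decode to the same byte).
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) throw new IllegalArgumentException("odd length");
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) throw new IllegalArgumentException("not a hex digit");
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

A consequence of this design choice is that two files with identical contents get the same identifier.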
This is necessary to perform the recovery of the files
Recovery fails when some peer does not have the requested chunk, because that peer hangs (NullPointerException when reading a non-existent file). Attempts with large files (>1.5MB) and 4 peers were enough to reproduce this condition.
Basically, MCListen launches the ChunkSenderWorker upon receiving a GETCHUNK message. Instead of first checking that the chunk is stored locally, the worker tries to read the chunk from the file system, which causes the exception.
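A minimal guard follows, assuming the same hypothetical <root>/<fileId>/<chunkNo> layout. The worker only sends a CHUNK when the read succeeds, and stays silent otherwise. ChunkStore is an illustrative name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical guard for ChunkSenderWorker: if the chunk is not stored locally,
// return empty and send nothing instead of failing.
final class ChunkStore {
    private final Path root;

    ChunkStore(Path root) {
        this.root = root;
    }

    Optional<byte[]> read(String fileId, int chunkNo) {
        Path chunk = root.resolve(fileId).resolve(Integer.toString(chunkNo));
        if (!Files.isRegularFile(chunk)) return Optional.empty(); // not ours: stay silent
        try {
            return Optional.of(Files.readAllBytes(chunk));
        } catch (IOException e) {
            return Optional.empty(); // e.g. deleted between the check and the read
        }
    }
}
```

The worker would then do something like store.read(fileId, chunkNo).ifPresent(body -> ...), sending the CHUNK message only inside that callback.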
When a peer sends messages to itself (with a single peer running), the results are a mess. Sometimes the sent message matches the received one; sometimes larger fields, such as the file ID or the data, are broken; sometimes only a partial datagram is received. Sure, UDP is not reliable, but this is running locally. It doesn't feel right at all.
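One plausible cause, not confirmed in this issue, is reusing a DatagramPacket. After receive(), the packet's length is set to the size of the datagram just received, so a later, larger datagram is truncated unless the length is reset. Reading getData() without getLength() also picks up leftover bytes from earlier messages. Here is a receive helper that avoids both problems (Receiver is a hypothetical name):

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;

// Sketch of a receive helper that avoids two classic DatagramPacket pitfalls.
final class Receiver {
    private final DatagramSocket socket;
    private final byte[] buffer = new byte[65_507];
    private final DatagramPacket packet = new DatagramPacket(buffer, buffer.length);

    Receiver(DatagramSocket socket) {
        this.socket = socket;
    }

    byte[] receive() throws IOException {
        // Pitfall 1: receive() shrinks the packet's length to the last datagram's size,
        // so later, larger datagrams are silently truncated unless it is reset.
        packet.setLength(buffer.length);
        socket.receive(packet);
        // Pitfall 2: only the first getLength() bytes belong to this datagram;
        // the rest of the buffer still holds bytes from older messages.
        return Arrays.copyOfRange(packet.getData(), packet.getOffset(),
                packet.getOffset() + packet.getLength());
    }
}
```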
After successfully storing the chunk, the peer must send a STORED message through the MC channel. This message is sent after a random delay uniformly distributed between 0 and 400 milliseconds.
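The sketch below implements the delay with a ScheduledExecutorService, so the storing thread is not blocked while waiting. StoredReplier is a hypothetical name, and sendStored stands in for the code that actually sends the message.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Sketch: schedule the STORED reply after a uniform random delay in [0, 400] ms
// without blocking the thread that stored the chunk.
final class StoredReplier {
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    static long randomDelayMs() {
        return ThreadLocalRandom.current().nextLong(0, 401); // upper bound is exclusive
    }

    // sendStored is a placeholder for whatever actually sends the message on MC.
    ScheduledFuture<?> replyLater(Runnable sendStored) {
        return scheduler.schedule(sendStored, randomDelayMs(), TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdown();
    }
}
```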
One of the interfaces every peer must provide to the client is the retrieval of local service state information. Note that the Peer must send its information to the client application instead of simply printing it. A class grouping all the needed information sounds good: the client requests the state information, the Peer populates an object with it and sends it back to the client through RMI, which is as simple as returning the object instance.
For each local file whose backup the peer has initiated, it must show:
For each chunk, display:
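A minimal sketch of the RMI side follows, assuming a remote interface PeerService with a state() method (both hypothetical). Since PeerState is Serializable, returning it from the remote method makes RMI copy it to the client. The ChunkInfo fields are illustrative only.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote interface: the client calls state() and receives a copy.
interface PeerService extends Remote {
    PeerState state() throws RemoteException;
}

// Hypothetical state object; returning a Serializable object is enough for RMI.
final class PeerState implements Serializable {
    private static final long serialVersionUID = 1L;

    // Illustrative fields only; the issue lists what must actually be reported.
    static final class ChunkInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String fileId;
        final int chunkNo;
        final int perceivedDegree;

        ChunkInfo(String fileId, int chunkNo, int perceivedDegree) {
            this.fileId = fileId;
            this.chunkNo = chunkNo;
            this.perceivedDegree = perceivedDegree;
        }

        @Override
        public String toString() {
            return fileId + "#" + chunkNo + " (perceived degree " + perceivedDegree + ")";
        }
    }

    final List<ChunkInfo> chunks = new ArrayList<>();
}
```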
Another mistake... I am using the file's path as if it were the actual file name, which is wrong because the program can be launched from different directories: the same file then has different relative paths, resulting in different hashes.
Extend the Message class to handle GETCHUNK messages. The required attributes are:
The client requests the backup service. The file is split into chunks, and each chunk is sent, one by one, to all the peers. Considering #17, listening for STORED messages and ensuring the desired replication degree may take a while (5 retries, up to ~31 seconds). It makes sense to dispatch service requests, such as backup requests, to workers. That is, the Peer splits the file into chunks as it does now (a fast operation), but then launches a worker (thread) responsible for creating the message, awaiting responses and ensuring the replication degree. The Peer is then ready for incoming backup or other service requests.
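The sketch below shows such a worker's retry loop. It assumes the wait doubles on each attempt, starting at 1 second: 1 + 2 + 4 + 8 + 16 = 31 seconds over 5 attempts, which matches the ~31 seconds above. perceivedDegree is assumed to count the distinct peers that sent STORED. BackupRetry is a hypothetical name, and initialWaitMs would be 1000 in practice.

```java
import java.util.function.IntSupplier;

// Hypothetical blocking retry loop for one chunk. Assumes the wait doubles on every
// attempt: with initialWaitMs = 1000 the waits are 1, 2, 4, 8, 16 s (31 s in total).
final class BackupRetry {
    static final int MAX_ATTEMPTS = 5;

    // Returns the attempt on which the desired degree was reached, or -1 if it never was.
    static int run(Runnable sendPutchunk, IntSupplier perceivedDegree,
                   int desiredDegree, long initialWaitMs) throws InterruptedException {
        long waitMs = initialWaitMs;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            sendPutchunk.run();   // (re)send PUTCHUNK
            Thread.sleep(waitMs); // STORED messages are counted meanwhile by the MC listener
            if (perceivedDegree.getAsInt() >= desiredDegree) return attempt;
            waitMs *= 2;
        }
        return -1;
    }
}
```

Submitting one such task per chunk to an ExecutorService keeps the Peer free for new requests, at the cost of keeping a pool thread busy while it sleeps.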
The worst-case scenario is when a given chunk's replication degree is not met: the thread stays in the ThreadPool, wasting computing resources. There seems to be no easy/native way to put a sleeping thread back in a queue in order to run another waiting thread. Once the thread pool starts executing a thread, it must stay there until it dies.
Using ScheduledThreadPool is not a perfect solution either: its periodic methods (scheduleAtFixedRate, scheduleWithFixedDelay) run tasks repeatedly, which is not what we want at all.
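For what it's worth, ScheduledExecutorService.schedule(task, delay, unit) runs a task once, not periodically. Each retry can therefore be a one-shot task that sends PUTCHUNK and schedules the next check, and no thread is held between checks. Here is a sketch; ScheduledBackup and its parameters are hypothetical, and the doubling wait is an assumed backoff.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical non-blocking retry: each attempt sends PUTCHUNK and schedules a one-shot
// check; no pool thread sleeps while STORED messages arrive.
final class ScheduledBackup {
    private final ScheduledExecutorService scheduler;
    private final Runnable sendPutchunk;
    private final IntSupplier perceivedDegree;
    private final int desiredDegree;
    private final int maxAttempts;
    private final Runnable onDone; // called once, on success or after the last attempt

    ScheduledBackup(ScheduledExecutorService scheduler, Runnable sendPutchunk,
                    IntSupplier perceivedDegree, int desiredDegree,
                    int maxAttempts, Runnable onDone) {
        this.scheduler = scheduler;
        this.sendPutchunk = sendPutchunk;
        this.perceivedDegree = perceivedDegree;
        this.desiredDegree = desiredDegree;
        this.maxAttempts = maxAttempts;
        this.onDone = onDone;
    }

    void start(long initialWaitMs) {
        attempt(1, initialWaitMs);
    }

    private void attempt(int n, long waitMs) {
        sendPutchunk.run();
        // schedule(...) runs this check exactly once, after waitMs.
        scheduler.schedule(() -> {
            if (perceivedDegree.getAsInt() >= desiredDegree || n == maxAttempts) {
                onDone.run();
            } else {
                attempt(n + 1, waitMs * 2); // next attempt waits twice as long (assumed backoff)
            }
        }, waitMs, TimeUnit.MILLISECONDS);
    }
}
```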
Repeated code alert 🚨
Add methods to the Peer for sending messages through the channels. The argument can be a datagram, for example. Inside the method, the Peer sets the socket address when needed and sends the message through the socket.
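Here is a minimal sketch. It assumes one DatagramSocket is used for sending on all three channels and that each channel's address and port are configured once at startup. ChannelSender and Channel are hypothetical names.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical centralised send path: callers pass only the channel and the bytes;
// the Peer fills in the channel's address and port.
final class ChannelSender {
    enum Channel { MC, MDB, MDR }

    private static final class Endpoint {
        final InetAddress address;
        final int port;

        Endpoint(InetAddress address, int port) {
            this.address = address;
            this.port = port;
        }
    }

    private final DatagramSocket socket; // assumed: one sending socket for all channels
    private final Map<Channel, Endpoint> endpoints = new EnumMap<>(Channel.class);

    ChannelSender(DatagramSocket socket) {
        this.socket = socket;
    }

    void configure(Channel channel, InetAddress address, int port) {
        endpoints.put(channel, new Endpoint(address, port));
    }

    void send(Channel channel, byte[] message) throws IOException {
        Endpoint e = endpoints.get(channel);
        if (e == null) throw new IllegalStateException(channel + " not configured");
        socket.send(new DatagramPacket(message, message.length, e.address, e.port));
    }
}
```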
Each peer must be aware of the backed-up chunks on its file system. The relevant information is the desired replication degree and the actual replication degree in the system.
Note: The "name" of each multicast channel consists of the IP multicast address and port, and should be configurable via a pair of command line arguments of the server program. The "name" of the channels should be provided in the following order MC, MDB, MDR. These arguments must follow immediately the first three command line arguments, which are the protocol version, the server id and the service access point. - Source
The order of the Peer command-line arguments does not respect this. It should be: <protocol version> <server id> <service AP> <MC> <MDB> <MDR>.
Right, @Dannyps?
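The sketch below parses that order. It reads each channel as an (address, port) pair, as the quoted note describes, which gives 9 arguments in total. PeerArgs is a hypothetical name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of the argument order from the quoted note:
// <version> <server id> <service AP> <MC addr> <MC port> <MDB addr> <MDB port> <MDR addr> <MDR port>
final class PeerArgs {
    final String version;
    final int serverId;
    final String accessPoint;
    final InetAddress[] addresses = new InetAddress[3]; // index 0 = MC, 1 = MDB, 2 = MDR
    final int[] ports = new int[3];

    PeerArgs(String[] args) throws UnknownHostException {
        if (args.length != 9) throw new IllegalArgumentException(
                "usage: <version> <server id> <service AP> <MC addr> <MC port> "
                + "<MDB addr> <MDB port> <MDR addr> <MDR port>");
        version = args[0];
        serverId = Integer.parseInt(args[1]);
        accessPoint = args[2];
        for (int i = 0; i < 3; i++) {
            addresses[i] = InetAddress.getByName(args[3 + 2 * i]); // no DNS lookup for IP literals
            ports[i] = Integer.parseInt(args[4 + 2 * i]);
            if (!addresses[i].isMulticastAddress())
                throw new IllegalArgumentException(args[3 + 2 * i] + " is not a multicast address");
        }
    }
}
```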
e3178d0 closed part of this issue. However, the structure used to detect that other peers have already sent chunks must be purged after some time. A TODO comment has been added in ad067f8.
Originally posted by @Dannyps in #30 (comment)
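Here is a sketch of a time-based purge, assuming each entry only needs to be remembered for a fixed TTL. The current time is passed in explicitly to keep the class testable; System.currentTimeMillis() would supply it in practice. SeenChunks is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical "another peer already sent this CHUNK" structure with expiry,
// so entries are purged after a TTL instead of growing forever.
final class SeenChunks {
    private final Map<String, Long> seenAt = new ConcurrentHashMap<>();
    private final long ttlMs;

    SeenChunks(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    void markSeen(String fileId, int chunkNo, long nowMs) {
        seenAt.put(fileId + "#" + chunkNo, nowMs);
    }

    boolean wasSeen(String fileId, int chunkNo, long nowMs) {
        Long t = seenAt.get(fileId + "#" + chunkNo);
        return t != null && nowMs - t <= ttlMs;
    }

    // Call periodically (e.g. from a scheduled task) to drop stale entries.
    void purge(long nowMs) {
        seenAt.values().removeIf(t -> nowMs - t > ttlMs);
    }

    int size() {
        return seenAt.size();
    }
}
```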
This is a very useful method for debugging: override toString in some objects in order to display their inner data.