jgantunes / pulsarcast
A pub-sub system for the distributed web - my master thesis @ IST
We need to set up a test harness with the right specs for this (see ipfs/notes#280)
Investigate the following options:
Focused on reading. Relevant literature covered:
An Efficient Multicast Protocol for Content-Based Publish-Subscribe Systems
A solution for content-based routing at the application level.
Application-Level Multicast using Content-Addressable Networks
An overlay-based solution for content-based routing at the application level (introducing the notion of CANs)
Overcast: reliable multicasting with an overlay network
Multicast solution with a single producer, with a strong focus on saving bandwidth
Relevant literature that may be covered in the upcoming weeks:
What would be the main benefits of having package managers over IPFS? A truly distributed package manager: no need for complex infrastructure to deal with a centralized source, which can easily become a bottleneck. There are already some implementations, like gx, which might be a good starting point to look at, though it probably lacks a well-defined and established package manager to compare it to (unlike Node and npm, for example, or dpkg and Debian). In the end, one of the main goals here would be to compare any possible implementation(s) over IPFS with a real live one in terms of resilience, performance, correctness, etc.
It would probably be useful to have a sub-topic created by default for each topic (one that would never change) to disseminate any relevant information about the actual topic. This would be especially relevant given that this meta information would benefit from all the properties Pulsarcast has to offer, essentially creating a log of all the changes and allowing any node to rebuild and get the amount of information it needs.
Possible examples are:
If needed, this information could even be spread across different sub-topics, giving every node the possibility to fine-tune the information it's interested in.
In the related work chapter the network overlays are introduced without making any correlation to the pub-sub systems that use them.
The report should:
Our work should:
what the parent link implies

Presentation:
My goal will be to build a pub/sub communication system over IPFS, a peer-to-peer hypermedia protocol. The solution I would like to propose would be in line with the IPFS philosophy of having a decoupled structure of small components, allowing developers and users to use what best fits their specific needs, like opting between reliability and speed, or choosing whether to favour routing over a structured network. By the end I would like to have a libp2p specification that could be integrated into IPFS core.
Related work:
So far I've been covering:
IPFS structure and its whitepaper
XL peer-to-peer pub/sub systems (quite large, haven't finished it yet), from which I've extracted:
Meghdoot, a topic-based pub-sub system over an unstructured network
Sub-2-Sub, a content-based pub-sub system over DHTs
What I've been up to:
Looking at the current IPFS implementation and testing it. I would like to test some things over the current setup (like tracking the number of messages exchanged between peers), and my aim would be to have some small tests that allow me to see how the network behaves.
Questions:
Diving into the current pubsub (aka floodsub) implementation. It seems to be a simple flooding algorithm, as the name states: each node emits the message to itself if interested and forwards it to all the peers in its list(?) (not sure what structure is used here). Messages have some sort of id which prevents a node from forwarding previously seen messages (each node only forwards once). How does this lib relate to IPFS, and how does IPFS use it?
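To make the flooding behaviour described above concrete, here's a toy model of it (my own sketch, not the actual floodsub code; names like `FloodNode` are made up, and topic/interest filtering is left out):

```javascript
// Toy model of flood-based pub-sub: every node forwards each message to
// all of its peers, and a "seen" cache keyed by message id guarantees a
// node forwards any given message at most once, so the flood terminates.
class FloodNode {
  constructor (id) {
    this.id = id
    this.peers = []        // direct connections
    this.seen = new Set()  // message ids already handled
    this.delivered = []    // messages handed to the application ("emit to self")
  }

  connect (peer) {
    this.peers.push(peer)
    peer.peers.push(this)
  }

  receive (msg) {
    if (this.seen.has(msg.id)) return // already handled, stop the flood here
    this.seen.add(msg.id)
    this.delivered.push(msg)
    for (const peer of this.peers) peer.receive(msg)
  }
}
```

Even with redundant paths (e.g. three nodes in a triangle), the seen cache means every node delivers each message exactly once, at the cost of duplicate traffic on every extra link.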
Looking into the IPFS paper for some clarity on blocks vs. files: are they the same?
On using IPLD to map pub/sub messages.
My initial thought (from what I managed to grasp from the literature around IPFS PubSub and the usage of IPLD) was to map messages to DAG nodes that would have references (merkle-links) to their parents. This would create a tree giving the system a desirable set of properties, namely:
So as an example we could have:
Where each node in the Merkle tree would be a message and each link would be a merkle-link to the parent message.
If we were to represent this in JSON, it could be something like:
{
"from": <peer-id>,
"payload": {
(...)
},
"parent": {
"/": "QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"
}
}
**Note:** this is just an example; more relevant fields will probably need to be added to this message schema, but for now consider the relevant `parent` key.
These merkle trees may be a good solution to create a message hierarchy, but how would these relate with the subscription model used?
So, picturing a topic based model, we could have:
So a couple of questions arise here:
The notion of hierarchy is independent of topic, we could have different tree branches under the same topic. Is this a good thing?
Should the root node under a topic be able to point to a message of a different topic as its parent? Or only for sub-topics for example? (as depicted in the figure above)
Picking up on the previous question, I think it would depend on what we think the parent link represents, right? Is it a notion of causality only? Something else?
What would the root nodes actually represent? Would they just contain information on the topic that tree is representing? Or would the root node just be the first message for a given topic? Since they wouldn't actually have anything useful to give to the peers, as they wouldn't be able to resolve the rest of the tree from there.
Out of the box, the tree allows peers to detect missing nodes from the middle of the tree they're trying to build. This is trickier for leaves though, right? Since it's not easy to detect missing leaf nodes. Take for example:
Where the red nodes are missing from the tree state. The only way this could be detected would be through some background process where peers would share their leaf nodes with each other from time to time?
Would there be a need for a consensus algorithm here? My guess is that there won't, since each new message is unique and points to its parent. That is, except for the root node of a new topic: given that I'm publishing the first message in a topic, how could I guarantee that no other peer is doing the same, at the same time, on some other end of the network, leading in the end to two different trees?
When a new peer subscribes to a topic and wants to build the state tree on its side, how could this be done? Since there's no clear "entry point" to the topic, the easiest way would be for it to ask its peers for the leaf nodes of that topic, and with those it could build the rest of the tree by recursively requesting parent nodes.
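The leaf-driven reconstruction just described could be sketched like this (my own toy version: `fetch` stands in for whatever network lookup — DHT, bitswap — would actually retrieve a node by id):

```javascript
// Rebuild a topic's message tree starting from the leaf ids a peer
// learned from its neighbours: walk every parent merkle-link back to the
// root(s), fetching each node we haven't seen yet.
function buildTree (leafIds, fetch) {
  const nodes = new Map() // id -> message
  const queue = [...leafIds]
  while (queue.length > 0) {
    const id = queue.pop()
    if (nodes.has(id)) continue // shared ancestor, already fetched
    const msg = fetch(id)       // stand-in for a network lookup
    nodes.set(id, msg)
    if (msg.parent) queue.push(msg.parent['/'])
  }
  return nodes
}
```

Note that shared ancestors are only fetched once, so the cost is proportional to the number of distinct nodes, not the number of leaf-to-root paths.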
As this tree keeps growing, and virtually any new node can point to any message along the way, it could be useful to create a notion of stale leaves? Take the following example:
Where the yellow leaves represent messages produced/received more than 24 hours ago. These messages probably aren't relevant for a newly joining node. As such, we would avoid that, after some time, a newly joining node requesting the leaf messages of the tree would be swarmed with thousands of messages, some clearly outdated and maybe irrelevant to it. This could however pose a threat to the persistence of the system, with old messages tending not to be replicated and maybe disappearing?
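The stale-leaf filter is trivial to sketch (the 24-hour window is just the example figure from the notes, and `receivedAt` is a made-up local timestamp, not a field of the real message schema):

```javascript
// When answering a newly joining subscriber, only hand back leaf messages
// seen within a freshness window, so the new node gets a bounded set of
// recent entry points instead of every leaf ever produced.
const DAY_MS = 24 * 60 * 60 * 1000

function freshLeaves (leaves, now = Date.now(), maxAge = DAY_MS) {
  return leaves.filter((leaf) => now - leaf.receivedAt <= maxAge)
}
```

This only changes what is advertised to new joiners; the stale leaves still exist in the tree, which is exactly why the persistence concern above remains open.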
Still picking up on the way the tree represents topics: another approach could be to use different graphs to represent the messages and the actual topic hierarchy and semantics. As an example:
The messages could then have merkle-links to the topics like:
{
"from": <peer-id>,
"payload": {
(...)
},
"parent": {
"/": "QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"
},
"topic": {
"/": "QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"
}
}
However, could messages point to parents from different topics? Would this make sense? If we wanted to avoid this, how could we prevent peers from wrongfully linking to parents outside what's specified? In fact, how could we guarantee that each peer would even create a rightfully structured message? My guess is that the answer would lie somewhere in the authenticated/certified realm, which I'm not entirely sure is something we should address in our work.
/ipfs/<msg-hash>/parent/parent/parent/...
right? Is this a nice, clean approach? Would a different scheme help out in this path resolution?

The archival of big datasets on IPFS could be a way to offload some of the burden to the peers, while adding other important features like replication (and maybe even having specific content closer to where it's useful, although this might be well beyond the focus here). A possible goal could be to develop a solution where one could make a big dataset available without putting too much pressure on the network, while also guaranteeing some other nice-to-have features such as replication(?)
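Coming back to the `/ipfs/<msg-hash>/parent/parent/...` path form above, a toy resolver over an in-memory DAG could look like this (my own sketch, not the real IPFS path resolver; the `/ipfs/` prefix is assumed to be already stripped, and `fetch` stands in for a block lookup):

```javascript
// Resolve a path like "<msg-id>/parent/parent": follow each segment
// through the object, dereferencing { '/': <id> } merkle-links with a
// lookup function, the way an IPLD path resolver would.
function resolvePath (path, fetch) {
  const [rootId, ...segments] = path.split('/').filter(Boolean)
  let node = fetch(rootId)
  for (const segment of segments) {
    node = node[segment]
    if (node && node['/']) node = fetch(node['/']) // cross the merkle-link
  }
  return node
}
```

Each `parent` segment costs one lookup, so walking n generations up the tree is n sequential fetches; that linearity is part of why the question of a better scheme is worth asking.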
With #28 it's increasingly important to support signing of the topic and event descriptors by both the author and the publisher.
This is just a summary of the notes from the meetings. There's a lot of scattered stuff going on, so there may be a lot of loose ends/inconsistencies and probably stuff that just doesn't make sense.
Self-updatable web assembly applications with IPFS
Videostreaming
Causal consistency over IPFS
Cache blocks on IPFS (cluster?)
Replicas / geo-distributed replicas based on popularity/requests/availability/SLA
Fast VM/Docker start
Scattered resources
Still focused on reading. Relevant literature: "TERA: Topic-based Event Routing for peer-to-peer Architectures" (one of the solutions covered in XL pubsub). Next efforts will be put into more recent literature and the creation of a hack project around integrating IPFS pubsub into a multiplayer web-based game.
Umbrella issue to cover the work done under the evaluation section
On the literature side, focused on reading:
Also investigated possible games to hack, as per #7. Some possibilities I'm considering:
As per @tomgco's suggestion, this might come in handy - https://github.com/jepsen-io/jepsen
After discussion with @luisveiga some changes were performed to the specification of Pulsarcast. These changes have been implemented in https://github.com/JGAntunes/js-pulsarcast already.
The Allowed Publishers, the ability to Request to Publish and the Linking of events are dictated by the topic descriptor. This is what we call the event topology, given that it dictates how the event tree is structured.
The new Topic Descriptor:
// metadata must be declared before topicDescriptor references it
const metadata = Joi.object().keys({
  created: Joi.date().iso().required(),
  protocolVersion: Joi.string().required(),
  allowedPublishers: Joi.object().keys({
    enabled: Joi.boolean(),
    peers: Joi.alternatives()
      .when('enabled', {is: true, then: Joi.array().items(Joi.binary()).min(1)})
  }).required(),
  requestToPublish: Joi.object().keys({
    enabled: Joi.boolean(),
    peers: Joi.array().items(Joi.binary())
  }).required(),
  eventLinking: Joi.string().valid(['LAST_SEEN', 'CUSTOM']).required()
}).required()

const topicDescriptor = Joi.object().keys({
  name: Joi.string().required(),
  author: Joi.binary().required(),
  parent: Joi.object().keys({
    '/': Joi.binary()
  }).required(),
  '#': Joi.object().pattern(Joi.string(), Joi.object().keys({
    '/': Joi.binary().required()
  })).required(),
  metadata
})
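For illustration, here's a plain-JavaScript example of a document shaped like the topic descriptor above, plus a hand-rolled check of its trickiest rule (the Joi `when` clause: if `allowedPublishers.enabled` is true, at least one peer must be listed). All field values are made up, not taken from a real topic:

```javascript
// Example document matching the shape of the topic descriptor schema.
// The peer-id byte values are illustrative placeholders only.
const exampleTopic = {
  name: 'my-topic',
  author: Buffer.from('example-author-peer-id'),
  parent: { '/': null }, // no parent topic
  '#': { // sub-topics, keyed by name
    meta: { '/': Buffer.from('example-meta-topic-cid') }
  },
  metadata: {
    created: new Date().toISOString(),
    protocolVersion: '0.0.1',
    allowedPublishers: {
      enabled: true,
      peers: [Buffer.from('example-author-peer-id')]
    },
    requestToPublish: { enabled: true, peers: [] },
    eventLinking: 'LAST_SEEN'
  }
}

// Hand-rolled version of the conditional rule: enabled implies >= 1 peer.
function validAllowedPublishers ({ enabled, peers }) {
  return !enabled || (Array.isArray(peers) && peers.length >= 1)
}
```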
The event descriptor remains unchanged, with the exception of two distinct fields in its metadata: author and publisher.
This also introduces a new kind of RPC, REQUEST_TO_PUBLISH, which essentially sends an event without a publisher, looking for someone willing to publish it.
The essential idea is to add support for multiple scenarios where we want to control who's allowed to publish and how events are linked.
Examples:
If we wanted to have an ordering guarantee in our event tree (essentially creating a chain of linked events), we could have a topic with only its author as an allowed publisher, that allows requests to publish, and where event linking is done based on the last seen event.
If we wanted to leave it up to the application how events are linked, we could allow custom event linking, giving the ability to set a custom parent on the event at publish time.
The introduction should cover the motivation not only for pub-sub systems, but for all of our objectives: specifically, reliability, consistency and persistence in pub-sub systems for the web.
The IPFS section of the related work chapter lacks a more elaborate description of IPFS's inner workings. E.g.:
Motivate why using IPFS is a good approach
Allow for the propagation of information in a structured manner using a pub-sub pattern. The challenges could possibly be: how to route information across peers without putting too much pressure on the network (either way, pub->sub or sub->pub), while giving basic guarantees like reliability and being fast enough to be used in a real-world scenario. It might be interesting to pursue the notion of "authenticated streams" mentioned here.