Coder Social home page Coder Social logo

jgantunes / pulsarcast Goto Github PK

View Code? Open in Web Editor NEW
39.0 11.0 6.0 42.84 MB

A pub-sub system for the distributed web - my master thesis @ IST

Makefile 0.08% Shell 0.26% TeX 99.66%
pubsub thesis delivery-guarantees reliability p2p decentralized libp2p scalability persistence

pulsarcast's Introduction

Pulsarcast

Scalable and reliable pub-sub over P2P networks

Implementation

JS-Pulsarcast - A javascript implementation of pulsarcast, using libp2p

Project Description

Abstract

The publish-subscribe paradigm is a wildly popular form of communication in complex distributed systems. The properties offered by it make it an ideal solution for a multitude of applications, ranging from social media to content streaming and stock exchange platforms. Consequently, a lot of research exists around it, with solutions ranging from centralised message brokers, to fully decentralised scenarios (peer to peer).

Within the pub-sub realm not every solution is the same of course and trade-offs are commonly made between the ability to distribute content as fast as possible or having the assurance that all the members of the network will receive the content they have subscribed to. Delivery guarantees is something quite common within the area of centralised pub-sub solutions, there is, however, a clear lack of decentralised systems accounting for this. Specifically, a reliable system with the ability to provide message delivery guarantees and, more importantly, persistence guarantees. To this end, we present Pulsarcast, a decentralised, highly scalable, pub-sub, topic based system seeking to give guarantees that are traditionally associated with a centralised architecture such as persistence and eventual delivery guarantees.

The aim of Pulsarcast is to take advantage of the network infrastructure and protocols already in place. Relying on a structured overlay and a graph based data structure, we build a set of dissemination trees through which our events will be distributed. Our work also encompasses a software module that implements Pulsarcast, with our experimental results showing that is a viable and quite promising solution within the pub-sub and peer to peer ecosystem.

Documents

Publications & Talks

Acknowledgements

This work was developed at INESC-ID Lisboa (Distributed Systems Group), Instituto Superior Técnico, Universidade de Lisboa.

A very special thank you to Microsoft Azure (and specially @palma21) for supporting our evaluation efforts.

pulsarcast's People

Contributors

daviddias avatar fabioantunes avatar jgantunes avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pulsarcast's Issues

Changes to the event topology

After discussion with @luisveiga some changes were performed to the specification of Pulsarcast. These changes have been implemented in https://github.com/JGAntunes/js-pulsarcast already.

Changes

The Allowed Publishers, ability to Request to Publish and the Linking of events are dictated by the topic descriptor. This is what we call event topology given it dictates how the event tree is structured.

The new Topic Descriptor:

const topicDescriptor = Joi.object().keys({
  name: Joi.string().required(),
  author: Joi.binary().required(),
  parent: Joi.object().keys({
    '/': Joi.binary()
  }).required(),
  '#': Joi.object().pattern(Joi.string(), Joi.object().keys({
    '/': Joi.binary().required()
  })).required(),
  metadata
})

const metadata = Joi.object().keys({
  created: Joi.date().iso().required(),
  protocolVersion: Joi.string().required(),
  allowedPublishers: Joi.object().keys({
    enabled: Joi.boolean(),
    peers: Joi.alternatives()
      .when('enabled', {is: true, then: Joi.array().items(Joi.binary()).min(1)})
  }).required(),
  requestToPublish: Joi.object().keys({
    enabled: Joi.boolean(),
    peers: Joi.array().items(Joi.binary())
  }).required(),
  eventLinking: Joi.string().valid(['LAST_SEEN', 'CUSTOM']).required()
}).required()

The event descriptor remains unchanged with the exception of two distinct fields in its metadata which is author and publisher.

This also introduces a new kind of RPC which is REQUEST_TO_PUBLISH essentially sending an event without a publisher, looking for someone willing to publish it.

Consequences

The essential idea is to add support for multiple scenarios where we want to control who's allowed to publish and how are the events linked.

Examples:

If we wanted to have order guarantee in our event tree (essentially creating a chain of linked events) we could have a topic with only its author as an allowed publisher, that would allow requests to publish and where the event linking is done based on the last seen event.

If we wanted to leave it up to application how the events are to be linked, we could allow custom event linking, giving the ability to set a custom parent to the event at the time of publish.

Archiving large files using IPFS

The archival of big datasets on IPFS could be a way to offload some of the burden to the peers while adding other important features like replication (and even maybe have specific content closer to where it's useful, although this might be way out of hand to what the focus here is). A possible goal could be to develop a solution where one could make a big dataset available without putting too much pressure on the network and that could also guarantee some other nice to have features such as replication(?)

October 7

On the literature side, focused on reading:

  • A.-M. Kermarrec and P. Triantafillou, “XL peer-to-peer pub/sub systems,” ACM Comput. Surv., vol. 46, no. 2, pp. 1–45, 2013.

Also investigated a bit on possible games to hack, as per #7. Some possibilities I'm considering would be:

IPLD to map the message structure

On using IPLD to map pub/sub messages.

Approach

My initial thought (from what I managed to grasp on the literature around IPFS PubSub and the usage of IPLD) was to map messages to DAG nodes that would have references (merkle-links) to its parents. This would create a tree that would give a desirable set of properties to the system, that being:

  • Fault-tolerance, where subscribers would be able to detect missing messages that due to multiple circumstances may not arrive at the subscriber. This subscriber could, through a newer message, detect the missing piece of information in its stream and request it from its peers.
  • Persistence. We can provide guarantees that new subscriptions, or peers that have remained inactive for a certain period of time, are able rebuild the entire Merkle tree on their side by requesting all the messages they need from the network. This comes hand in hand with the above as we can say fault-tolerance is enabled by persistence.
  • De-duplication of events.
  • Causality tracking. (as it was nicely pointed by professor Luís Veiga)

So as an example we could have:

ipfs_stack_diagram

Where each node in the Merkle tree would be a message and each link would be a merkle-link to the parent message.

If we would to represent this in JSON it could be something like:

{
  "from": <peer-id>,
  "payload": {
    (...)
  },
  "parent": {
    "/": "QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"
  }
}

** Note: ** this is just an example, probably more relevant fields will need to be added to this message schema but for now consider the more relevant parent key.

Subscription Model

These merkle trees may be a good solution to create a message hierarchy, but how would these relate with the subscription model used?

Topic based

So, picturing a topic based model, we could have:

ipfs_stack_diagram 1

So a couple of questions rise here:

  • The notion of hierarchy is independent of topic, we could have different tree branches under the same topic. Is this a good thing?

  • Should the root node under a topic be able to point to a message of a different topic as its parent? Or only for sub-topics for example? (as depicted in the figure above)

  • Picking up on the previous question, I think it would depend on what we think the parent link represents right? Is it a notion of causality only? Something else?

  • What would the root nodes actually represent? Would they just contain information on the topic that tree is representing? Or would the root node just be the first message for a given topic Since they wouldn't actually have anything useful to give to the peers, as they wouldn't be able to resolve the rest of the tree from here.

  • Out of the box the tree allows peers to detect missing nodes from the mid of the tree they're trying to build. This however is trickier for leafs right? Since it's not easy to detect missing leaf nodes. Take for example:
    ipfs_stack_diagram 2
    Where the red nodes are missing nodes from a tree state. The only way this could be detected would be through some background process where the peers would share their leaf nodes with each other from time to time?

  • Would there be a need for a consensus algorithm here? My guess is that it won't since each new message is unique and pointing to its parent. That is, except for the root node of a new topic. Given I'm publishing the first message in a topic, how could I guarantee that no other peer would be doing the same, at the same time, on some other end of the network, leading in the end to two different trees?

  • When a new peer subscribes to a new topic and wants to build the state tree on its side how could this be done? Since there's no clear "entry point" to the topic, the easiest way would be for him to ask his peers to give him the leaf nodes for that topic and with that he could build the rest of the tree, recursively requesting parent nodes.

  • As this tree keeps on growing and virtually any new node can point to any message along the way it could be useful to create a notion of stale leafs? Take the following example:
    ipfs_stack_diagram 3
    Where the yellow leafs represent messages produced/received more then 24 hours ago. These messages probably aren't relevant for a new joining node, as such we would be avoiding that after some time, whenever a new joining node would request the leaf messages of the tree, he would be swarmed with thousands of messages with some being clearly outdated and maybe irrelevant to him. This could however pose a treat to the persistence of the system, with old messages having a tendency to not being replicated and maybe disappear?

  • Still picking up on the way the tree represents topics. Another approach could be to use different graphs to represent messages and the actual topic hierarchy and semantics. As an example:
    ipfs_stack_diagram 4
    The messages could then have merkle-links to the topics like:

{
  "from": <peer-id>,
  "payload": {
    (...)
  },
  "parent": {
    "/": "QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"
  }
  "topic": {
    "/": "QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"
  }
}

However could messages point to parents from different topics? Would this make sense? If we wanted to avoid this how could we avoid having peers wrongfully linking to parents outside what's specified? In fact how could we guarantee that each peer would even create a rightfully structured message? My guess is that would lay somewhere in the authenticated/certified realm, which I'm not entirely sure if it is something we should address in our work.

  • About the path resolution that's used by IPLD. If we followed the scheme defined so far, given a message, we could resolve its hierarchy through something like /ipfs/<msg-hash>/parent/parent/parent/... right? Is this a nice, clean approach? Would a different scheme help out in this path resolution?
  • Still on the hierarchy resolution. A peer that aims at getting messages higher in the hierarchy would need to recursively request and resolve each parent link. This can easily become a pain in the network. It would be desirable to have a way to resolve links in bulk right? Any ideas on how could this be achieved?

Define persistence and consistency

The report should:

  • Define persistence in the scope of pub-sub systems
  • Define consistency in the scope of pub-sub systems
    Both of these should probably reside in a missing section of the relevant systems (which should cover QoS)

Our work should:

  • Define the consistency we intend to offer, i.e. the causality that the parent link implies

2017-05-02 meeting sum up

This is just a sum up of the notes from the meetings. A lot of scattered stuff going on so there may be a lot of loose ends/incoherences and probably stuff that just doesn't make sense.

  • Self-updatable web assembly applications with IPFS

    • Same with JS
    • Notion that this updates can be anything, even worlds in a metaverse?
  • Videostreaming

  • Causal consistency over IPFS

    • CRDTS
    • Dynamo?
  • Cache blocks on IPFS (cluster?)

    • CDN
    • Snapshots/checkpoints (consistent cut?)
  • Replicas / geo-distributed replicas based on popularity/requests/availability/SLA

  • Fast VM/Docker start

    • Sharded ...
    • Subscription list on IPFS
  • Scatered resources

Expand on the selected topics

Expand on #2 #3 #4

Format:
Goals for our solution (what would we be doing if this were to be selected)
Current shortcomings (what's lacking nowadays that doesn't allow for this)
End goals for IPFS (what could we achieve with this?)

More details on how IPFS works

The IPFS section in the related work section lacks a more elaborate description on all the IPFS inner workings. E.g.:

  • Merkle DAG and the immutability it offers
  • IPNS
  • The high scalability it offers and how we intend to use it

Motivate why using IPFS is a good approach

September 22

Focused on reading. Relevant literature covered:

  • An Efficient Multicast Protocol for Content-Based Publish-Subscribe Systems
    Solution for content based routing at the application level,

  • Application-Level Multicast using Content-Addressable Networks
    Overlay based solution for content based routing at the application level (introducing the notion of CAN's)

  • Overcast: reliable multicasting with an overlay network
    Multicast solution with a single producer, with a strong focus on saving bandwidth

Relevant literature that may be covered on the upcoming weeks:

  • A Case for End System Multicast
  • Scattercast: an adaptable broadcast distribution framework
  • Yoid: Extending the Internet Multicast Architecture - é um white paper mas tenho encontrado bastantes referencias para ele.
  • Bayeux: An Architecture for Scalable and Fault-tolerant Wide-area Data Dissemination Shelley

Assembling a reliable and fast pub/sub network on top of IPFS

Allow for the propagation of information in a structured manner using a pub sub pattern. The challenges could possibly be, how to route information across peers without putting too much pressure on the network (either way pub->sub, sub->pub) while giving basic guarantees like basic reliability and fast enough to be used in a real world scenario. Might be interesting to pursue the notion of "authenticated streams" mentioned here.

Meta sub-topic

It would probably be useful to have a sub-topic created by default at each topic (that would never change) to disseminate any relevant information relative to the actual topic. This would be specially relevant given that the actual meta information would benefit from all the properties pulsarcast has to offer, essentially creating a log of all the changes, allowing any node to rebuild and get the amount of information it needs.

Possible examples are:

  • Disseminating updates to the topic descriptor (which would essentially mean a new topic descriptor)
  • New subscribers(? - probably not that relevant but still a possibility)
  • New sub-topics added (? - this would essentially translate in the first bullet point)

If needed this information could even be spread across different sub-topics, giving the possibility to every node to fine grain the information it's interested in.

2017-06-20 meeting

  • Presentation:
    My goal will be to build a pub/sub communication system over IPFS, a peer-to-peer hypermedia protocol. The solution I would like to propose would be inline with what is the IPFS philosophy to have a decoupled structure of small components that allow developers and users to use what’s best fit for their specific needs, like opting between reliability vs speed, or favouring routing over a structured network or not. By the end I would like to have a lip2p specification that could be integrated into IPFS core.

  • Related work:
    So far I've been covering:
    IPFS structure and its whitepaper
    XL peer-to-peer pub/sub systems (quite large haven't finished) from where I've extracted:
    Megdhoot a topic based pub sub system over an unstructured network
    Sub-2-Sub a content based pub sub system over DHT’s

  • What I've been up to:
    Looking at IPFS current implementation and testing it. I would like to test some stuff over the current setup (like tracking the num of messages between peers) and my aim would be to have some small tests that could allow me to see how the network behaves.

  • Questions:
    Diving into the current pubsub (aka floodsub) implementation. Seems to be a simple flood alg as the name states, each node emits the message to self if interested and forwards to all peers in his list(?) (not sure what's the structure used here). The messages have some sort of id which prevents the node to forward previously seen messages (each node only forwards once). How does this lib relate with ipfs and how does ipfs use it?

Looking into the ipfs paper for some clarity on blocks vs files, are they the same?

September 29

Still focused on reading. Relevant literature - "TERA: Topic-based Event Routing for peer-to-peer Architectures" (one of the solutions covered in XL pubsub). Next efforts will be put into more recent literature and the creation of hack project around integrating IPFS pubsub in a multiplayer web based game.

Package managers over IPFS

What would be the main benefits of having package managers over IPFS? A truly distributed package manager, no need for complex infrastructures to deal with a centralized source which can easily become a bottleneck. There are already some implementations like gx which might be a good starting point to look at but probably lacks in terms of having a well defined and established package manager to compare it to (unlike Node and NPM for example, or dpkg and debian). By the end, one of the main goals here would be to compare any possible implementation(s) over IPFS with a real live one in terms of resilience, performance, correctness, etc.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.