Coder Social home page Coder Social logo

revtr's Introduction

*** What is this project all about? ***

Primary Goal
* Make the reverse traceroute system better by minimizing the number of RR-pings required,
* and by maximizing the amount of information we get from each one.

Why Rank VPs?
* First of all, to do reverse traceroute, we need to find VPs that are RR-reachable
* i.e. there are empty slots in the RR header by the time that the VP->Dst ping reaches
* Dst.  Second of all, we do not want to issue so many RR-pings into a network that
* rate-limiting is triggered.  Third of all, it would be great if the VPs chosen were
* as close as possible to the destination, so as to maximize the number of free slots
* in the RR header. 

VP Ranking Methods
* 1) Destination Cover
    + Overview:
        - Say we have a new destination with prefix 'P'.  Given a set 'S' of VPs for which we 
        - already have some RR-reachable measurements to other destinations within P, rank 
        - each VP 'v' in S in terms of how many such destinations it reaches. This is a greedy
        - set-covering approximation algorithm which just picks the "set" v that has the largest
        - number of RR-reachable destinations in P.
    + Drawbacks:
        - 1) It has no "terminating condition"; i.e. just because it supplies us with a rank-ordering
        -    of VPs, we may still have to issue probes from each VP moving down the list towards our
        -    new target destination. This could end up having us send pings from every single VP in S!
        - 2) It does not take into account the number of hops away from the destination a VP sits.
        -    By extension, it does not maximize the amount of information we can get back from an RR-ping.
* 2) Top-K / Restricted Set
    + Overview:
        - A small number of M-Lab sites can to a large majority of RR-reachable destinations. As shown
        - in Brian's paper, using just the top 10 M-Lab sites reached 95% of RR-reachable destinations
        - , within the same number of hops, as reached by the full set of M-Lab + PlanetLab nodes.
        - In light of this, restricting the set of VPs to just the top-K can give most of the results,
        - while greatly reducing the amount of probing required.
    + Drawbacks:
        - ???
* 3) Ingress-Based
    + Overview:
        - Define some notion of "network" -- we choose BGP-routable prefix in our work -- and find for each
        - RR-pingable measurement the point of "ingress."  That could be either the last hop before entering
        - the given prefix, or the first hop after entering the prefix -- as long as the choice is consistent.
        - Now just deliver a ranking based on the number of hops away from an ingress that a VP sits.  If several
        - VPs enter the network via the same ingress, just keep the one with the smallest hop-count, and discard
        - the rest.  This is okay because Internet routing is destination-based: any distinct routes from S to D 
        - going through some common router R will likely be exactly the same from R to D.  
    + Benefits:
        - 1) This approach does have a "terminating condition" in that many VPs from the original set S will be
        -    eliminated from the delivered list of rankings. This contributes towards the overall goal of minimizing
        -    the number of probes sent overall during the RTR process.
        - 2) The distance from VP to ingress is taken into account for these rankings. This effectively maximizes
        -    the amount of information we are likely to receive from a successful RR-ping when using top-ranked VPs.
    + Drawbacks:
        - ???

Evaluation
* We will use existing measurements from 2011 in conjunction with BGP-dumps from the same time period to evaluate the
* efficacy of each of the three aforementioned methods.  For each BGP-routable prefix for which the VPs in this
* dataset have RR-reachable measurements, the measurements will be split into test and training data.  The training
* set is used to create VP rankings via each of the three approaches.  The test set will be used to emulate actual
* RR-pings to never-before-seen destinations.  In the process of emulation for each method we can record metrics:
*  - NumProbes:   the number of probes we had to send before receiving a successful stamping
*  - NumHops:     the number of hops it takes to reach the destination on a successful probe
*  - VPChosen:    ???
*  - AltVPExists: ???
* With these metrics in hand, we'll be able to make a first-order judgement about the relative effectiveness of each 
* ranking method.  Results will include: 
*  - CDF showing the fraction of RR-reachable test-destinations for which k probes were required to receive a stamp;
*    legend: ranking methods
*  - CDF showing the fraction of RR-reachable test-destinations for which the destination was reached within k hops;
*    legend: ranking methods
*  - ...

Challenges
*) Large datasets can take a long time to process, so writing efficient fast code was an important consideration
*) ...

revtr's People

Contributors

aistein avatar

Watchers

James Cloos avatar  avatar

Forkers

littlespace

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.