vibhaa / hashpipe

Heavy hitter detection algorithm that is entirely in the dataplane

License: Apache License 2.0

Java 0.39% HTML 99.14% CSS 0.01% C++ 0.01% Shell 0.03% Python 0.07% Makefile 0.01% TeX 0.34% C 0.01% P4 0.01%

hashpipe's Introduction

This project identifies the flows that contribute the majority of packets between two points on a given network. The final paper can be found here.

All evaluations used CAIDA traces from https://data.caida.org/datasets/passive-2016/equinix-chicago/20160218-130000.UTC/ and datacenter traces from http://pages.cs.wisc.edu/~tbenson/IMC10_Data.html. The traces were converted with tshark to .csv files containing the srcip, dstip, protocol, srcport and dstport of each packet. They were then split into chunks of 10M packets each, and TopKIdentifierFlowId was run on each chunk. TopKIdentifierFlowId also needs a file with the actual flow sizes, which can be obtained by running FindFlowSize. FindFlowSize groups the packets in the csv by 5-tuple, giving a 5-tuple flowid and the associated packet count for each flow.
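The grouping step described above can be sketched in a few lines of Python. This is only an illustration of counting packets per 5-tuple, not the repository's FindFlowSize code; the function name `count_flow_sizes`, the column order (srcip, dstip, protocol, srcport, dstport) and the sample rows are assumptions.

```python
import csv
import io
from collections import Counter


def count_flow_sizes(rows):
    """Count packets per 5-tuple flow id.

    `rows` is any iterable of CSV rows whose first five fields are assumed
    to be srcip, dstip, protocol, srcport, dstport (hypothetical layout).
    """
    counts = Counter()
    for row in rows:
        if len(row) < 5:
            continue  # skip blank or malformed lines
        flow_id = tuple(field.strip() for field in row[:5])
        counts[flow_id] += 1
    return counts


# Made-up sample data: two packets of one flow, one packet of another.
sample = io.StringIO(
    "10.0.0.1,10.0.0.2,6,1234,80\n"
    "10.0.0.3,10.0.0.2,17,5353,53\n"
    "10.0.0.1,10.0.0.2,6,1234,80\n"
)
sizes = count_flow_sizes(csv.reader(sample))
for flow_id, packets in sizes.most_common():
    print(",".join(flow_id), packets)
```

For a real trace chunk, the same function could be fed `csv.reader(open(path, newline=""))` instead of the in-memory sample.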

The output of TopKIdentifierFlowId consists of rows of data, one per experiment (a run of HashPipe on that particular trace with those settings). Each row reports the table size (number of flowid, counter pairs), the K value, D (number of stages), the false negative rate, the false positive rate, weighted versions of the same (by the size of the flows), the number of duplicates in the table, the reported fraction of heavy hitters, the number of reported heavy hitters, the average deviation from the actual size across flows and so on. The header identifies the precise metrics reported. The variable "FlowSizes" holds the map from each flowid to its actual size.
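As a rough illustration of the two headline metrics, the sketch below compares a set of reported flows against the true top-K flows. The definitions used here are assumptions, not taken from the repository: the false negative rate is the fraction of true top-K flows that were not reported, and the false positive rate is the fraction of reported flows that are not true top-K flows. The names `top_k` and `error_rates` are hypothetical.

```python
def top_k(flow_sizes, k):
    """Return the k largest flows by actual size (ties broken by flow id)."""
    ranked = sorted(flow_sizes, key=lambda flow: (-flow_sizes[flow], flow))
    return set(ranked[:k])


def error_rates(actual_sizes, reported_flows, k):
    """Return (false_negative_rate, false_positive_rate).

    Assumed definitions:
      false negatives = true top-k flows missing from the report
      false positives = reported flows that are not true top-k flows
    """
    true_heavy = top_k(actual_sizes, k)
    reported = set(reported_flows)
    false_neg = len(true_heavy - reported) / len(true_heavy) if true_heavy else 0.0
    false_pos = len(reported - true_heavy) / len(reported) if reported else 0.0
    return false_neg, false_pos


# Made-up example: the true top-2 flows are "a" and "b"; the report has "a" and "c".
actual = {"a": 100, "b": 50, "c": 10, "d": 1}
rates = error_rates(actual, {"a", "c"}, k=2)
print(rates)
```

The weighted variants mentioned above would replace the set sizes with sums of flow sizes over the same sets.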

The runTopKIdentificationTrials function runs the experiments in a single run that covers all the K values of interest (for HashPipe and all of the incremental steps before it). To use it, select SummaryStructureType.RollingMinSingleLookup. That is, run TopKIdentifierFlowId with the flow size file (actual flow size per flow), the trace file, "runTrial" and "Single" as its arguments to run HashPipe as per the algorithm in the paper.
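For readers unfamiliar with the algorithm, here is a minimal Python sketch of a HashPipe-style update. It is not the repository's Java implementation. It assumes the commonly described behaviour: the first stage always inserts the incoming flow and evicts whatever it displaces, and each later stage keeps the larger counter and carries the smaller one forward, dropping it after the last stage. The class name, the CRC32-based hashing and the table sizes are all hypothetical choices for illustration.

```python
import zlib


class HashPipeSketch:
    """Illustrative pipeline of d hash tables holding (flow id, counter) pairs."""

    def __init__(self, stages, slots_per_stage):
        self.stages = stages
        self.slots = slots_per_stage
        self.tables = [[None] * slots_per_stage for _ in range(stages)]

    def _index(self, stage, key):
        # A different hash per stage, derived by salting CRC32 with the stage number.
        return zlib.crc32(f"{stage}:{key}".encode()) % self.slots

    def update(self, key):
        # Stage 0: always insert the incoming key.
        i = self._index(0, key)
        slot = self.tables[0][i]
        if slot is not None and slot[0] == key:
            self.tables[0][i] = (key, slot[1] + 1)
            return
        self.tables[0][i] = (key, 1)
        if slot is None:
            return
        carried = slot  # evicted entry travels down the pipeline

        # Later stages: merge on match, fill empty slots, else keep the larger counter.
        for stage in range(1, self.stages):
            i = self._index(stage, carried[0])
            slot = self.tables[stage][i]
            if slot is None:
                self.tables[stage][i] = carried
                return
            if slot[0] == carried[0]:
                self.tables[stage][i] = (slot[0], slot[1] + carried[1])
                return
            if carried[1] > slot[1]:
                self.tables[stage][i] = carried
                carried = slot
        # The entry still carried after the last stage is dropped.

    def estimate(self, key):
        # A flow may appear in several stages (duplicates), so sum its counters.
        total = 0
        for stage in range(self.stages):
            slot = self.tables[stage][self._index(stage, key)]
            if slot is not None and slot[0] == key:
                total += slot[1]
        return total


pipe = HashPipeSketch(stages=4, slots_per_stage=16)
for _ in range(100):
    pipe.update("flowA")
print(pipe.estimate("flowA"))
```

Because evicted entries can be dropped at the end of the pipeline, estimates never exceed the true packet count but can undercount flows that were evicted.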

runTrialsPerK, on the other hand, runs the experiment once for every K, which is what Sample and Hold and the Count-Min Sketch need. To use it, run TopKIdentifierFlowId with the flow size file (actual flow size per flow), the trace file, "PerThreshold" and either "CM" or "SampleAndHold" as its arguments to run the Count-Min Sketch or Sample and Hold respectively.
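The two baselines can be sketched briefly in Python. These are textbook-style illustrations under stated assumptions, not the repository's Java code: the Count-Min Sketch uses salted CRC32 as its row hashes, and Sample and Hold samples each packet of an untracked flow with a fixed probability and then counts every later packet of that flow exactly. All names and parameters here are hypothetical.

```python
import random
import zlib


class CountMinSketch:
    """depth rows of width counters; an estimate is the minimum over rows."""

    def __init__(self, depth, width):
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash per row, derived by salting CRC32 with the row number.
        return zlib.crc32(f"{row}:{key}".encode()) % self.width

    def update(self, key, count=1):
        for row, counters in enumerate(self.rows):
            counters[self._index(row, key)] += count

    def estimate(self, key):
        # Collisions only add to counters, so the minimum never undercounts.
        return min(
            counters[self._index(row, key)]
            for row, counters in enumerate(self.rows)
        )


def sample_and_hold(packets, probability, rng):
    """Track a flow once one of its packets is sampled, then count all later packets."""
    held = {}
    for key in packets:
        if key in held:
            held[key] += 1
        elif rng.random() < probability:
            held[key] = 1
    return held


# Made-up packet stream: 5 packets of flow "a", 2 of flow "b".
stream = ["a", "b", "a", "a", "b", "a", "a"]

cms = CountMinSketch(depth=3, width=64)
for key in stream:
    cms.update(key)
print(cms.estimate("a"))

held = sample_and_hold(stream, probability=0.5, rng=random.Random(0))
print(held)
```

The Count-Min Sketch can only overestimate a flow, while Sample and Hold can only underestimate it, since packets seen before a flow is first sampled are never counted.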

hashpipe's People

Contributors

vibhaa, antoninbas
