Coder Social home page Coder Social logo

quantiles's Introduction

quantiles - Optimal Quantile Approximation in Streams

GoDoc

This is a translation of TensorFlow's quantile helper class, it aims to compute approximate quantiles with error bound guarantees for weighted data sets. This implementation is an adaptation of techniques from the following papers:

The key ideas at play are the following:

  • Maintain an in-memory multi-level quantile summary in a way to guarantee a maximum approximation error of eps * W per bucket where W is the total weight across all points in the input dataset.
  • Two base operations are defined: MERGE and COMPRESS. MERGE combines two summaries guaranteeing a epsNew = max(eps1, eps2). COMPRESS compresses a summary to b + 1 elements guaranteeing epsNew = epsOld + 1/b.
  • b * sizeof(summary entry) must ideally be small enough to fit in an average CPU L2 cache.
  • To distribute this algorithm with maintaining error bounds, we need the worker-computed summaries to have no more than eps / h error where h is the height of the distributed computation graph which is 2 for an MR with no combiner.

We mainly want to max out IO bw by ensuring we're not compute-bound and using a reasonable amount of RAM.

Complexity:

  • Compute: O(n * log(1/eps * log(eps * n))).
  • Memory: O(1/eps * log^2(eps * n)) <- for one worker streaming through the entire dataset.

An epsilon value of zero would make the algorithm extremely inefficent and therefore, is disallowed.

Example Usage

package quantiles_test

import (
	"fmt"

	"github.com/axiomhq/quantiles"
)

func Example() {
	sketch := quantiles.NewDefault()
	for i := 0.0; i < 1e6; i++ {
		if err := sketch.Push(i, 1.0); err != nil {
			panic(err)
		}
	}
	fmt.Print("ApproximationError:") 	
	fmt.Println(sketch.ApproximationError(1))  // 0 <nil>

	fmt.Print("Finalize:") 
	fmt.Println(sketch.Finalize())            // <nil>

 
	fmt.Print("GenerateQuantiles(4):")         
	fmt.Println(sketch.GenerateQuantiles(4))  // [0 251865 503730 746595 999999] <nil>


	fmt.Print("GenerateQuantiles(10):")
	fmt.Println(sketch.GenerateQuantiles(10)) // [0 98946 197892 296838 395789 503730 602676 701622 800568 899514 999999] <nil>

	sum, err := sketch.FinalSummary()
	if err != nil {
		panic(err)
	}
	fmt.Print("GenerateQuantiles(4):")
	fmt.Println(sum.GenerateQuantiles(4))     // [0 251865 503730 746595 999999]
}

TODO

  • Implement an online estimator without the need of finalizing the stream
  • Add proper documentation
  • Benchmark
  • Add serialization

quantiles's People

Contributors

seiflotfy avatar ucirello avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.