Coder Social home page Coder Social logo

go-stream's Introduction

go-stream

This library is a framework for stream processing analysis. It is meant to be used as a library for go programs that need to do stream processing of large volumes of data.

It is made up of a graph connecting a source to 1 or more operators, terminating at a sink. Operators pass data from one to another with go channels. An example graph to encode objects to snappy is:

var from *util.MemoryBuffer
// fill up from

var to *util.MemoryBuffer

ch := stream.NewOrderedChain()
ch.Add(source.NewNextReaderSource(from))
timingOp, _, dur := timing.NewTimingOp()
ch.Add(timingOp)
ch.Add(compress.NewSnappyEncodeOp())
ch.Add(sink.NewWriterSink(to))

ch.Start()

log.Printf("RES: Compress Snappy.\t\tRatio %v", float64(to.ByteSize())/float64(from.ByteSize()))
log.Printf("RES: Compress Snappy.\t\tBuffered Items: %d\tItem: %v\ttook: %v\trate: %d\tsize: %E\tsize/item: %E", to.Len(), *counter, *dur, int(    float64(*counter)/(*dur).Seconds()), float64(to.ByteSize()), float64(to.ByteSize())/float64(*counter))

Operators are the main components of a chain. They process tuples to produce results. Sources are operators with no output. Sinks are operators with no input. Operators implement stream.Operator. If it takes input implements stream.In; if it produces output implements stream.Out.

Mappers give a simple way to implement operators. mapper.NewOp() takes a function of the form func(input stream.Object, out Outputer) which processes the input and outputs it to the Outputer object. Mappers are automatically parallelized. Generators give a way to give mappers thread-local storage through closures. You can also give mappers special functionality after they have finished processing the last tuple.

You can also split the data of a chain into other chains. stream.Fanout takes input and copies them to N other chains. Distributor takes input and puts it onto 1 of N chains according to a mapping function.

Chains can be ordered or unordered. Ordered chains preserve the order of tuples from input to output (although the operators still use parallelism).
Installing:

go get "github.com/cevian/go-stream/stream

Compiling: go build

Testing: go test

go-stream's People

Contributors

cevian avatar sgichohi avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

go-stream's Issues

fanin.Setdest

fanin.Setdest is a bit weird. It conflicts with ch.Add()

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.