Coder Social home page Coder Social logo

filteringexample's Introduction

Filtering Examples

DedupeFilteringJob

This shows an example of how to remove duplicates in a stream. It essentially uses an LRU cache and filters out duplicate messages that are seen within a set amount of time. Have a look at the DedupeFilterFunction.

In this example there is a stream of TweetImpressions except (just to show the deduplication) there are lots of duplicate Tweet IDs. The LRU cache is setup for keys to expire out every 1 second by default.

The tweets are generated randomly for tweets with IDs in the range [1..10]

This filtering is done in a fully fault-tolerant way since the state of the DedupeFilterFunction is checkpointed and will be restored in the event of failure.

TweetImpressionFilteringJob

This job consumes a stream of TweetImpressions and a stream of TweetSubscriptions from customers. There is a separate output sink for each customer and each customer receives only those TweetImpressions for tweets that they have subscribed to. Note that both the TweetImpression stream and the TweetSubscription stream are both keyed by TweetID. This naturally partitions the streams correctly across any number of hosts.

The TweetImpressions stream contains many duplicates so the the stream is first deduplicated using the DedupeFilterFunction.

The TweetImpressions are generated randomly for tweets with IDs in the range [1..10]

The tweet subscriptions are read from a socket. The format of the messages is just:

customer_name tweetId

For example:

Google 5

Indicates that customer "Google" wants to receive impressions data for the tweet with id "5"

To run this example pass the host name and port of the socket server where the subscription messages can be read from. For example:

flink run -c com.dataartisans.TweetImpressionFilteringJob FilteringExample-1.0-SNAPSHOT.jar --host localhost --port 9999

A great way to test this is to use netcat to provide the socket server. For example:

nc -lk 9999

Then type messages to the console to enable TweetSubscriptions in the format described above

Also note the set of possible customers is fixed. Customer must be one of:

Google, Twitter, Facebook, Apple, Amazon

filteringexample's People

Contributors

jgrier avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.