
agilemobiledev / marseille


This project forked from bhomass/marseille


A real time streaming implementation of markov chain based fraud detection

Home Page: http://bhomass.blogspot.com/2014/12/a-real-time-streaming-implementation-of.html

License: Other

Scala 80.93% Java 19.07%

marseille's Introduction

An example implementation of Markov chain based credit card fraud detection

More information on our blog: http://bhomass.blogspot.com/2014/12/a-real-time-streaming-implementation-of.html

In order to run the example, the following components need to be installed: Spark (standalone), Kafka, HBase, Hadoop (for HDFS), and sbt.

Building the examples:

$ sbt assembly

This creates [installation_dir]/marseille/target/scala-2.10/spark-kafka-fraud-detection-assembly-1.0.jar.
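For reference, the jar name above follows sbt-assembly's default <name>-assembly-<version>.jar convention. The repository ships its own build definition; the sketch below only illustrates the kind of build.sbt and plugin setup that produces such a jar, and the exact plugin and dependency versions shown are assumptions.

    // project/plugins.sbt -- the sbt-assembly plugin must be enabled (version is an assumption)
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

    // build.sbt -- illustrative only; the repository's actual build file may differ
    name := "spark-kafka-fraud-detection"

    version := "1.0"

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"            % "1.1.0" % "provided",
      "org.apache.spark" %% "spark-streaming"       % "1.1.0" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka" % "1.1.0"
    )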

Let's start all the components running.

Start Spark in standalone mode

$ [spark_installation_dir]/sbin/start-all.sh

You should now be able to bring up the Spark monitoring page at http://localhost:8080. The page should indicate that there is one worker node (the default setting).

Start HBase

$ [hbase_installation_dir]/bin/start-hbase.sh

This also starts the ZooKeeper instance bundled with HBase.

Start Kafka. It reuses the ZooKeeper instance started with HBase; do not start a second ZooKeeper instance.

$ [kafka_installation_dir]/bin/kafka-server-start.sh config/server.properties

Start HDFS

$ [hadoop_installation_dir]/sbin/start-all.sh

Use jps to check that Spark, HBase, Kafka, and Hadoop are all running

$ jps

You should see lines like the following:

10461 Kafka
10322 HRegionServer
14175 Jps
13669 Main
97904
10222 HMaster
13893 Master
13546 SecondaryNameNode
13452 DataNode
13377 NameNode

Now start all the monitoring tools to keep an eye on each stage of execution.

Create three Kafka topics:

$ [kafka_installation_dir]/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic real-time-cc
$ [kafka_installation_dir]/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic user-trans-history
$ [kafka_installation_dir]/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic candidate-fraud-trans

Start three listeners for these topics using Kafka's console consumer:

$ [kafka_installation_dir]/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic real-time-cc
$ [kafka_installation_dir]/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic user-trans-history
$ [kafka_installation_dir]/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic candidate-fraud-trans

The first task is to create a training data set.

Run the TrainerApp to create the training set. The number of customers is set to 500 by default in the app_config.prop file.

$ /opt/spark-1.1.0-bin-hadoop2.4/bin/spark-submit --class com.jcalc.app.TrainerApp --master spark://yourhostname:7077 spark-kafka-fraud-detection-assembly-1.0.jar

This creates three HDFS files:

/data/streaming_analysis/markovmodel.txt - the model file; it contains the transition probability for all state pairs
/data/streaming_analysis/training_sequence - the raw transaction sequence
/data/streaming_analysis/training_transaction - the transaction sequences grouped per customer
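To make the model file's contents concrete, the toy Scala sketch below shows one common way such transition probabilities can be derived from state sequences: count the (from, to) state pairs and normalize each row. It is an illustration only; TrainerApp's actual implementation (which runs on Spark) may differ in details such as smoothing and output format.

    // Toy illustration: estimate a Markov transition-probability table from
    // state sequences by counting (from, to) pairs and normalizing per source state.
    object TransitionMatrixSketch {
      def main(args: Array[String]): Unit = {
        // Example sequences in the same three-letter state style as the cchistory records.
        val sequences = Seq(
          Seq("NNL", "LHN", "NNN", "NNL"),
          Seq("LNS", "NNL", "NNS", "LHL", "NNN")
        )

        // Count transitions (from -> to) across all sequences.
        val counts: Map[(String, String), Double] = sequences
          .flatMap(seq => seq.zip(seq.tail))
          .groupBy(identity)
          .map { case (pair, occurrences) => pair -> occurrences.size.toDouble }

        // Total outgoing count per source state, used to normalize each row.
        val totals: Map[String, Double] = counts
          .groupBy { case ((from, _), _) => from }
          .map { case (from, entries) => from -> entries.values.sum }

        val probabilities = counts.map { case ((from, to), c) => (from, to) -> c / totals(from) }

        probabilities.toSeq.sortBy(_._1).foreach { case ((from, to), p) =>
          println(f"$from -> $to : $p%.2f")
        }
      }
    }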

Use spark-submit to start MarkovPredictor, which listens on the real-time-cc topic and publishes to the user-trans-history and candidate-fraud-trans topics.

$ /opt/spark-1.1.0-bin-hadoop2.4/bin/spark-submit --class com.jcalc.feed.MarkovPredictor --master spark://yourhostname:7077 spark-kafka-fraud-detection-assembly-1.0.jar
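To show the shape of such a streaming job, the sketch below contains the minimal Spark Streaming / Kafka wiring for the Spark 1.1-era APIs used here. It only consumes real-time-cc and prints the messages; the scoring against the Markov model and the publishing to the other two topics are MarkovPredictor's own logic and are not reproduced here.

    // Minimal wiring sketch (Spark 1.1-era APIs, spark-streaming-kafka artifact).
    // It is NOT the repository's MarkovPredictor; scoring and publishing are omitted.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object PredictorWiringSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("markov-predictor-sketch")
        val ssc  = new StreamingContext(conf, Seconds(2))

        // Consume the raw transaction stream from the real-time-cc topic via ZooKeeper.
        val transactions = KafkaUtils
          .createStream(ssc, "localhost:2181", "predictor-sketch", Map("real-time-cc" -> 1))
          .map(_._2) // keep the message value, drop the Kafka key

        // Placeholder: the real job would score each customer's recent state
        // sequence against the Markov model and publish candidate frauds.
        transactions.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }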

Finally, feed simulated credit card transaction data into the real-time-cc topic to drive MarkovPredictor. 50 is a typical number of customers to use. (A sketch of a hand-rolled Kafka producer for this topic appears at the end of this walkthrough.)

$ java -cp spark-kafka-fraud-detection-assembly-1.0.jar com.jcalc.feed.CCStreamingFeed 50

When you feed this simulated data, you should see data being printed on all three topics. In addition, you can check that the transaction history is recorded in the HBase 'cchistory' table. Use the HBase shell to see the records.

$ [hbase_install_dir]/bin/hbase shell
hbase(main)> scan 'cchistory'

The HBase record dump looks like the following:

 UHRMHKBO1D                               column=transactions:records, timestamp=1418713010152, value=LNS,NNL,NNS,LHL,NNN,LHL,NNL,LNN,NNN,NNS,LNN                 
 UM2RPYPHRX                               column=transactions:records, timestamp=1418713010327, value=NNL,LHN,NNN,NNL,NNN,NNL,NHL,NNN,NNS,NNL,NNN,LHN,LNL         
 W19EW2550B                               column=transactions:records, timestamp=1418713010198, value=NNN,HHN,NNN,NHL,NNS,NNL,NHN,NNL,HNL,LNS,LNL                 
 X30RNTW3KG                               column=transactions:records, timestamp=1418713010215, value=NNN,LNN,NNL,LHS,LNN,NNL,NNN,LHN,LNN                         
 XM7YLRIDL8                               column=transactions:records, timestamp=1418713010173, value=LNN,LNL,NNL,NNL,LNL,NNL,HNN,NNN,NNN,NNN,LNL                 
 Y3LNSKDK02                               column=transactions:records, timestamp=1418713010207, value=NNS,NNN,NNN,NNN,HNN,LNN,HNL,NNL,NNS,NNL,NNL,LNN,LNN,LHL,NNN 
50 row(s) in 0.0520 seconds
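As an alternative to the shell, the records above can also be read programmatically. The sketch below uses the classic (pre-2.0) HBase client API with the table and column names shown in the dump; it is an illustration, not code from the repository.

    // Sketch: scan the cchistory table with the classic HBase client API.
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.{HTable, Scan}
    import org.apache.hadoop.hbase.util.Bytes

    object HistoryScanSketch {
      def main(args: Array[String]): Unit = {
        val table   = new HTable(HBaseConfiguration.create(), "cchistory")
        val scanner = table.getScanner(new Scan())
        try {
          var result = scanner.next()
          while (result != null) {
            val customer = Bytes.toString(result.getRow)
            val records  = Bytes.toString(
              result.getValue(Bytes.toBytes("transactions"), Bytes.toBytes("records")))
            println(s"$customer -> $records")
            result = scanner.next()
          }
        } finally {
          scanner.close()
          table.close()
        }
      }
    }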

Repeat the simulated data feed as often as you like.
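If you want to push hand-crafted test messages into the real-time-cc topic instead of (or alongside) CCStreamingFeed, the sketch below uses the Kafka 0.8 producer API. The record format shown ("customerId,stateCode") is only a guess for illustration and may not match what MarkovPredictor actually expects.

    // Sketch: publish test messages to the real-time-cc topic with the Kafka 0.8 producer API.
    import java.util.Properties
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

    object FeedSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("metadata.broker.list", "localhost:9092")
        props.put("serializer.class", "kafka.serializer.StringEncoder")

        val producer = new Producer[String, String](new ProducerConfig(props))

        // Hypothetical record format: customer id plus a three-letter state code.
        (1 to 50).map(i => f"CUST$i%05d").foreach { id =>
          producer.send(new KeyedMessage[String, String]("real-time-cc", id, s"$id,NNL"))
        }
        producer.close()
      }
    }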

