
This solution performs anomaly detection with statistical modeling on Spark. Detection is based on z-scores computed from CPU usage data collected from servers.

Home Page: https://thirdeyedata.io/open-source-software-oss-solutions/anomaly-detection-using-apache-spark/

Topics: anomaly, outlier-detection, spark, chombo, avenir, scala, machine-learning, fraud-detection, cybersecurity


Anomaly Detection using Apache Spark

Assumptions

  1. Git is installed

  2. JDK (1.8+) is installed

  3. Scala (2.1+) is installed

  4. Maven is installed

  5. SBT (the Scala build tool) is installed

  6. A Spark cluster (with 1 master and at least 1 worker) is up, running, and accessible

  7. The following environment variable is set:

       SPARK_HOME is set in ~/.bashrc
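For example, the ~/.bashrc entries might look like this (the install path is an assumption; point it at your own Spark distribution):

```shell
# Example ~/.bashrc entries; adjust the path to your own Spark install
export SPARK_HOME=/opt/spark
export PATH="$SPARK_HOME/bin:$PATH"
```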
    

Steps to set up the project on your local system

  1. Clone the repositories avenir, beymani, chombo, and hoidla with git.
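The clones can be scripted in one pass; the URLs below assume the upstream repositories live under the pranab GitHub account, so adjust the account if you are cloning forks instead:

```shell
# Clone the four dependencies (the upstream account is an assumption)
for repo in hoidla chombo avenir beymani; do
  git clone "https://github.com/pranab/${repo}.git"
done
```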

  2. Navigate to the folder named hoidla and execute the commands below:

      mvn clean install
      
      sbt publishLocal 
    
  3. Navigate to the folder named chombo and follow the sequence below:

Build chombo first in master branch with

      mvn clean install 
      
      sbt publishLocal 

Build chombo-spark in chombo/spark directory

      sbt clean package 
      
      sbt publishLocal
  4. Navigate to the folder named avenir and execute the command below:

     mvn clean install

  5. Navigate to the folder named beymani and execute the commands below:

      mvn clean install

      sbt publishLocal

  6. Build beymani-spark in the beymani/spark directory:

       sbt clean package

       sbt publishLocal

  7. Navigate to the folder named beymani/resource and execute:

      ant -f beymani_spark.xml

  8. Navigate to the folder named chombo/resource and execute the command below:

      ant -f chombo_spark.xml

  9. Navigate to the folder named beymani/resource and edit the and_spark.sh file to reflect the paths on your local system:

      Set the project home path    ( PROJECT_HOME )
      Set the Spark home path      ( SPARK_HOME )
      Set the Spark master URL     ( MASTER )
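For instance, after editing, the variables at the top of and_spark.sh might read as follows (all three values are machine-specific assumptions):

```shell
# Illustrative values only; substitute your own paths and master URL
PROJECT_HOME=/home/user/projects
SPARK_HOME=/opt/spark
MASTER=spark://localhost:7077
```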
    

You are now ready to run and_spark.sh. The steps below describe the parameters to use for each run:

Step 1 : Create base normal data

./and_spark.sh crInput <num_of_days> <reading_interval> <num_servers> <output_file>

where

num_of_days = number of days, e.g. 10

reading_interval = reading interval in seconds, e.g. 300

num_servers = number of servers, e.g. 40

output_file = output file; c.txt is used from now on

        ./and_spark.sh crInput 10 300 40 c.txt

Step 2 : Copy modeling data

  • insert outliers

./and_spark.sh insOutliers <normal_data_file> <with_outlier_data_file>

where

normal_data_file = normal data file (c.txt)

with_outlier_data_file = data file with outliers (cusage.txt)

        ./and_spark.sh insOutliers c.txt cusage.txt
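What insOutliers does conceptually is spike a small fraction of otherwise normal readings; a toy sketch of that idea (the 5% rate and 3x multiplier are illustrative assumptions, not beymani's actual parameters):

```shell
# Toy outlier injection: occasionally multiply a reading by 3.
# The rate and multiplier are assumptions for illustration only.
seq 40 60 | awk 'BEGIN { srand(7) } {
  if (rand() < 0.05) $1 = $1 * 3   # inject a spike
  print
}'
```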

  • copy

./and_spark.sh cpModData <with_outlier_data_file>

where

with_outlier_data_file = data file with outliers (cusage.txt)

        ./and_spark.sh cpModData cusage.txt

Step 3 : Run Spark job for stats

        ./and_spark.sh numStat

Step 4 : Copy and consolidate stats file

        ./and_spark.sh crStatsFile

Step 5 : Run Spark job to detect outliers

Set output.outliers = true and rem.outliers = true

        ./and_spark.sh olPred
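Conceptually, olPred flags any reading whose z-score, relative to the stats computed earlier, exceeds a threshold. A minimal awk sketch of that logic (the sample values and the threshold of 2 are illustrative assumptions, not beymani's parameters):

```shell
# Flag readings whose |z-score| exceeds 2 (threshold assumed for illustration)
printf '%s\n' 45 50 48 52 47 95 | awk '
  { x[NR] = $1; s += $1 }
  END {
    m = s / NR                          # mean
    for (i = 1; i <= NR; i++) v += (x[i] - m) ^ 2
    sd = sqrt(v / NR)                   # population standard deviation
    for (i = 1; i <= NR; i++) {
      z = (x[i] - m) / sd
      printf "%s z=%.2f %s\n", x[i], z, (z > 2 || z < -2) ? "OUTLIER" : "normal"
    }
  }'
```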

Step 6 : Copy and consolidate clean file

        ./and_spark.sh crCleanFile

Step 7 : Copy test data

  • insert outliers

./and_spark.sh insOutliers <normal_data_file> <with_outlier_data_file>

where

normal_data_file = normal data file (c.txt)

with_outlier_data_file = data file with outliers (cusage.txt)

        ./and_spark.sh insOutliers c.txt cusage.txt

  • copy

./and_spark.sh cpTestData <with_outlier_data_file>

where

with_outlier_data_file = data file with outliers (cusage.txt)

        ./and_spark.sh cpTestData cusage.txt

Step 8 : Run Spark job for stats again with clean data

        ./and_spark.sh numStat

Step 9 : Copy and consolidate stats file

        ./and_spark.sh crStatsFile

Step 10 : Run Spark job to detect outliers

Set output.outliers = true and rem.outliers = true

        ./and_spark.sh olPred

Configuration

Configuration is in and.conf and and1.conf.
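The output.outliers and rem.outliers flags referenced in steps 5 and 10 are set in these files; a sketch of the relevant fragment (the exact nesting is an assumption; check your copy of and.conf):

```
olPred {
  output.outliers = true
  rem.outliers = true
}
```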


Contributors

djdas, moyukh26

