Coder Social home page Coder Social logo

dspa-semester-project's Introduction

DSPA 2019 - Semester Project

Course Project for Data Stream Processing and Analytics FS2019

Report for the project is available here: Report
Information about the project is available here: Info

How to SETUP the project:

Prerequisites:

  • git
  • Java version 8 (tested with java version 1.8.0_212 java openjdk amd64)
  • Maven (tested on Maven 3.6.0) For Flink, Scala binary on version 2.11:
  • Kafka 2.11-2.1.0: This is very important, currently Kafka is on version 2.2.0, but at the beginning of the semester it was on version 2.1.0. There has been some changes in the properties for connecting to zookeeper and we never changed them. With Kafka 2.1.0 it should run well.
  • Zookeeper, should be included in kafka download.

Setup:

  1. Checkout repository from gitlab. Download the data.

  2. Install Kafka into a custom directory. Note that we assume it will run on the same machine the same as we had in the exercise sessions.

  3. Open terminal on the same level as the pom.xml (and this README) and run:

mvn clean package

This should download the necessary libraries automatically.

  1. Unzip data into a well known location. Rename that folder to "input" and on the same level create a new folder "working". You should end up with something like this:

/root/data/ /input/ /streams/ /tables/ /working/

Our code will copy everything from input to working directory and make some small changes. If you need to make changes to the data, change them in the input and NOT in the working directory. Everything in the working directory is deleted.

  1. Open config.properties which is located in the folder with pom.xml (and this README). First change inputDirectory and workingDirectory according to the last step (4). Check the field writeToKafka, it should be true if not change it. The configuration file also gives access to the assignment specific configuration which we don't have to change right now.

  2. Copy the scripts from tools/kafka into the kafka installation folder. Run the following scripts in this order: startup_script.sh ; create_topics.sh ;

  3. Run main class src/main/java/main/Main once. This will take a while as it now writes all the data from the input directory into Kafka. Main class can be stopped after receiving events in the console, that means that the application is successfully reading from Kafka.

  4. Since we don't want to write to Kafka all the time we change the according configuration in config.properties: writeToKafka = false.

Second Startup: When you start Kafka a second time, for example with the scripts, then you only need to check 5-8 again. Make sure that you write at least once to Kafka every time you shutdown your computer. When you do not shut down your computer you don't need to do any of these steps at all.

After the setup, each task (Task1.java, Task2.java, Task3.java) can be run in order to get the results. Results will be streamed to Kafka topics, the names are configurable as properties or as constants. Default output Kafka topic names are:

"activityCounter" : Task 1 "task2OutputTopic" : Task 2 "fradulentUserDetection" : Task 3

dspa-semester-project's People

Contributors

pfu3tz avatar orhunozbek avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.