Authors: Rafael Moczalla, Moritz Manner
Date: 08.03.2021
To clone the project you need the version control system Git, and to build and run it you need Java (OpenJDK). The project was tested with Java 11. On Ubuntu, both can be installed with the following command:
sudo apt install openjdk-11-jdk-headless git
This project is built with the IntelliJ IDE. To install IntelliJ, run the following commands:
wget -O ideaIC-2020.3.2.tar.gz https://download.jetbrains.com/idea/ideaIC-2020.3.2.tar.gz?_ga=2.189001837.737502174.1614963016-977816646.1614963016
sudo tar -xzf ideaIC-2020.3.2.tar.gz -C /opt
echo $'\n# add IntelliJ to path\nexport PATH=$PATH:/opt/idea-IC-203.7148.57/bin' >> ~/.bashrc
export PATH=$PATH:/opt/idea-IC-203.7148.57/bin
- After cloning the project, open its root folder in IntelliJ via File -> Open.... IntelliJ will automatically detect the Gradle build file and prepare the project for you.
- Build the project by right-clicking the root folder in the Project window on the left side of the IDE and selecting Build module DEBS-2021. All dependencies are downloaded and the project is built.
- Run the main class to run your code. Open the main class by clicking on src/java/main/de/hpi/debs/Main.java in the Project window, then right-click into the file and select Run Main.main().
Our solution for the DEBS 2021 Grand Challenge is built on top of Apache Flink. There are several ways to run the solution. In every case, you need to set several environment variables for it to run successfully. The following example shows such a configuration in the format used by IntelliJ run configurations:
DEBS_API_KEY=YourDebsChallengeApiKeyHere;CHECKPOINTING_INTERVAL=0;PARALLELISM=5;BATCH_SIZE=10000;BENCHMARK_TYPE=Evaluation;BENCHMARK_NAME_PREFIX=SomePrefix;NR_OF_BATCHES=0
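The IntelliJ-style string above is a semicolon-separated list of KEY=VALUE pairs. The following minimal sketch shows how such a string could be split into a map, for example to reuse a run configuration outside the IDE. The class EnvString is hypothetical and not part of our project. When the solution itself runs, the variables simply come from the process environment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of the project): parses "KEY=VALUE;KEY=VALUE" strings. */
public final class EnvString {
    private EnvString() {}

    /** Splits on ';' into KEY=VALUE pairs, skipping empty entries and keeping insertion order. */
    public static Map<String, String> parse(String s) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String entry : s.split(";")) {
            if (entry.isBlank()) {
                continue; // tolerate stray or trailing separators
            }
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Not a KEY=VALUE pair: " + entry);
            }
            // Split at the first '=', so values may themselves contain '='.
            vars.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return vars;
    }

    public static void main(String[] args) {
        Map<String, String> vars = parse("PARALLELISM=5;BATCH_SIZE=10000;NR_OF_BATCHES=0");
        System.out.println(vars);
    }
}
```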
CHECKPOINTING_INTERVAL sets the checkpointing interval, and a value of 0 disables checkpointing. PARALLELISM sets the parallelism of the Flink job. If you run out of memory, you can add more physical nodes to your cluster and increase the parallelism. A good setup is to set PARALLELISM to the number of physical nodes running a Flink TaskManager. BATCH_SIZE sets the size of the batches our solution requests. NR_OF_BATCHES sets the number of batches you want to receive, and a value of 0 disables this limit. In that case our solution keeps requesting new batches until the final batch is received. Each batch carries a flag that indicates whether it is the final batch.
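The semantics described above can be sketched in Java. This is a minimal, hypothetical sketch and not our actual implementation. The names DebsConfigSketch, Config, Batch and requestBatches are invented for illustration. The batch source is faked with a Supplier instead of calling the challenge API. Missing variables are treated as errors. In a real run the map would be System.getenv().

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: interprets the environment variables and drives the batch-request loop. */
public final class DebsConfigSketch {

    /** Settings read from the environment variables described above. */
    static final class Config {
        final long checkpointingInterval; // 0 disables checkpointing
        final int parallelism;            // e.g. number of nodes running a TaskManager
        final int batchSize;              // size of each requested batch
        final int nrOfBatches;            // 0 disables the batch limit

        Config(Map<String, String> env) {
            checkpointingInterval = Long.parseLong(require(env, "CHECKPOINTING_INTERVAL"));
            parallelism = Integer.parseInt(require(env, "PARALLELISM"));
            batchSize = Integer.parseInt(require(env, "BATCH_SIZE"));
            nrOfBatches = Integer.parseInt(require(env, "NR_OF_BATCHES"));
        }

        /** Assumption of this sketch: every variable must be set explicitly. */
        private static String require(Map<String, String> env, String key) {
            String value = env.get(key);
            if (value == null) {
                throw new IllegalStateException("Missing environment variable " + key);
            }
            return value;
        }

        boolean checkpointingEnabled() { return checkpointingInterval > 0; }

        boolean batchLimitEnabled() { return nrOfBatches > 0; }
    }

    /** Stand-in for a batch from the challenge platform; only the final-batch flag matters here. */
    static final class Batch {
        final long seqId;
        final boolean last;

        Batch(long seqId, boolean last) {
            this.seqId = seqId;
            this.last = last;
        }
    }

    /**
     * Requests batches until a batch with the final-batch flag arrives or, if
     * NR_OF_BATCHES > 0, until that many batches were received. Returns the count.
     */
    static int requestBatches(Config cfg, Supplier<Batch> nextBatch) {
        int received = 0;
        while (!cfg.batchLimitEnabled() || received < cfg.nrOfBatches) {
            Batch batch = nextBatch.get(); // a real client would request batches of cfg.batchSize
            received++;
            if (batch.last) {
                break;
            }
        }
        return received;
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of(
                "CHECKPOINTING_INTERVAL", "0", "PARALLELISM", "5",
                "BATCH_SIZE", "10000", "NR_OF_BATCHES", "0");
        Config cfg = new Config(env); // in a real run: new Config(System.getenv())
        long[] nextId = {0};
        // Fake source: the batch with seqId 3 carries the final-batch flag.
        int n = requestBatches(cfg, () -> {
            long id = nextId[0]++;
            return new Batch(id, id == 3);
        });
        System.out.println("checkpointing=" + cfg.checkpointingEnabled() + ", batches=" + n);
    }
}
```

With NR_OF_BATCHES=0 the loop stops only at the batch whose final-batch flag is set, so this example prints checkpointing=false, batches=4. With NR_OF_BATCHES=2 it would stop after two batches regardless of the flag.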
To run our solution on a real cluster, you first need to set up the cluster. Flink must be installed on each node and a TaskManager started there. In addition, one node must run the JobManager.
A runnable standalone jar can be generated with gradle clean shadowJar. This can also be done from within IntelliJ. After generating the jar, upload it to the node that runs the JobManager. The solution can then be started with flink run, followed by the path to the jar.
For the DEBS 2021 Grand Challenge, we provide the script debsChallenge.sh, which automates these steps. Its help option briefly describes how to use it. A typical session looks as follows. First, make sure that you have logged into each node via ssh at least once. Otherwise the script will not succeed, because ssh usually asks whether you want to add the host's identity to your list of known hosts. Afterwards, run
DEBS_API_KEY=YourDebsChallengeApiKeyHere bash ./debsChallenge.sh deploy
to upload the script and the jars to the cluster. Then you can run our solution with
DEBS_API_KEY=YourDebsChallengeApiKeyHere bash ./debsChallenge.sh run
The script logs into the JobManager node and starts our solution in a separate Linux screen session.
If you would like to use the Flink dashboard to monitor our solution, forward the dashboard port (8081 by default) of the JobManager node to your local machine via ssh and open localhost:8081 in a web browser. If a TaskManager crashes and does not restart automatically, run
DEBS_API_KEY=YourDebsChallengeApiKeyHere bash ./debsChallenge.sh stop
followed by
DEBS_API_KEY=YourDebsChallengeApiKeyHere bash ./debsChallenge.sh deployScripts
The first command tries to stop the Flink cluster. The second command deploys the script to the cluster again and restarts the Flink cluster.