The Spark connection is implemented with PySpark. To connect to Spark, run this notebook on your master node, where Spark is installed.
Because the notebook uses Python 3, make sure all worker nodes run Python 3 as well.
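
As a minimal sketch of the setup described above, the snippet below pins the Python interpreter used by the Spark workers and then opens a session. `PYSPARK_PYTHON` is the environment variable PySpark reads to choose the worker interpreter. The helper names `pin_worker_python` and `create_spark_session`, the application name, and the assumption that `python3` is on the `PATH` of every node are illustrative and not taken from the notebook.

```python
import os
import sys


def pin_worker_python(executable="python3"):
    """Tell PySpark which interpreter the worker processes should start."""
    if sys.version_info[0] < 3:
        raise RuntimeError("This notebook expects the driver to run Python 3.")
    # Must be set before the SparkSession (and its SparkContext) is created.
    os.environ["PYSPARK_PYTHON"] = executable
    return os.environ["PYSPARK_PYTHON"]


def create_spark_session(app_name="sensor-data-evaluation"):
    """Create (or reuse) a SparkSession; the app name is a placeholder."""
    # Imported lazily so the helper above also works where PySpark is missing.
    from pyspark.sql import SparkSession

    pin_worker_python()
    return SparkSession.builder.appName(app_name).getOrCreate()
```

Calling `spark = create_spark_session()` in a notebook cell on the master node would then return a session that picks up the cluster settings configured for that Spark installation.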
The HDFS connection is implemented with PyArrow.
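
The following sketch shows one way such an HDFS read could look with PyArrow's `pyarrow.fs.HadoopFileSystem`. It assumes the Hadoop client libraries are available on the node and that the host `"default"` resolves to the `fs.defaultFS` entry of the local Hadoop configuration. The function name `read_sensor_csv`, the CSV format, and the example path are hypothetical; the notebook may connect and parse differently.

```python
import csv
import io


def read_sensor_csv(path, filesystem=None):
    """Read a CSV file from HDFS and return its rows as lists of strings."""
    if filesystem is None:
        # Imported lazily: connecting requires PyArrow plus a Hadoop client.
        from pyarrow import fs

        # "default" asks libhdfs to use fs.defaultFS from core-site.xml.
        filesystem = fs.HadoopFileSystem("default")

    # open_input_stream returns a readable binary stream.
    with filesystem.open_input_stream(path) as stream:
        text = stream.read().decode("utf-8")

    return list(csv.reader(io.StringIO(text)))
```

A call such as `rows = read_sensor_csv("/data/sensors/readings.csv")` (placeholder path) would return the header row followed by the data rows. Passing a different `filesystem` object with the same `open_input_stream` method makes the function usable without a cluster.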
NumPy and Matplotlib are used for data visualization.
The notebook is currently used to evaluate sensor data gathered from an Arduino and is tailored to that use case.
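
To illustrate the visualization step, here is a small sketch that smooths a series of readings with NumPy and plots raw and smoothed values with Matplotlib. The moving-average smoothing, the function names, the axis labels, and the window size are assumptions chosen for illustration; the actual sensor columns and plots depend on the Arduino data used in the notebook.

```python
import numpy as np


def moving_average(values, window=5):
    """Return the simple moving average over a sliding window."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # "valid" keeps only positions where the window fits completely,
    # so the result has len(values) - window + 1 entries.
    return np.convolve(values, kernel, mode="valid")


def plot_readings(timestamps, values, window=5):
    """Plot raw readings together with their moving average."""
    # Imported lazily so moving_average works without Matplotlib.
    import matplotlib.pyplot as plt

    timestamps = np.asarray(timestamps)
    smoothed = moving_average(values, window)

    fig, ax = plt.subplots()
    ax.plot(timestamps, values, alpha=0.4, label="raw")
    # The first smoothed value belongs to the end of the first full window.
    ax.plot(timestamps[window - 1:], smoothed, label=f"{window}-sample mean")
    ax.set_xlabel("time")
    ax.set_ylabel("sensor value")
    ax.legend()
    return fig
```

In a notebook cell, `plot_readings(timestamps, values)` would render the figure inline once the readings have been loaded into two equally long sequences.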