This is a study project that runs queries using Spark, MapReduce, and Hive and compares the execution time of all three platforms. It also performs sentiment analysis on the comments of a particular YouTube video and classifies how many comments are positive and how many are negative.
-
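
As a rough illustration of the execution-time comparison described above, the sketch below times an external command and picks the fastest platform. It is not the code in cron.py: the names `time_command` and `fastest` are hypothetical, and the placeholder command only starts a Python interpreter that exits immediately. In a real run it would be replaced by the actual Spark, MapReduce, and Hive invocations.

```python
import subprocess
import sys
from timeit import default_timer  # wall-clock timer available on Python 2 and 3


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit_code, seconds_elapsed)."""
    start = default_timer()
    exit_code = subprocess.call(cmd)
    return exit_code, default_timer() - start


def fastest(timings):
    """Given {platform_name: seconds}, return the name with the lowest time."""
    return min(timings, key=timings.get)


# Placeholder command (assumption): stands in for a real query submission.
placeholder = [sys.executable, "-c", "pass"]

timings = {}
for name in ("spark", "mapreduce", "hive"):
    exit_code, seconds = time_command(placeholder)
    timings[name] = seconds
```
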
Clone this project
-
Set up Hadoop 2.x.x (2.7.3 used for testing). The HADOOP_HOME variable should be set
-
Set up Spark 2.x.x (2.1.0 used for testing). Either extract the Spark folder into your home folder and rename it to spark, or set the SPARK_HOME variable and change '~/spark' in cron.py (line 10) to SPARK_HOME
-
Set up Hive 2.x.x (2.1.0 used for testing). The HIVE_HOME variable should be set
-
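
Before running anything, it can help to confirm that the three installations are discoverable. The following is a minimal sketch, assuming the conventions listed in the setup steps: HADOOP_HOME and HIVE_HOME must be set, and Spark lives either at SPARK_HOME or at ~/spark. The function name `resolve_homes` is hypothetical and is not part of the project.

```python
import os


def resolve_homes(env=None):
    """Return a dict of install directories, raising if a required one is missing."""
    env = os.environ if env is None else env
    homes = {}
    for var in ("HADOOP_HOME", "HIVE_HOME"):
        if not env.get(var):
            raise RuntimeError(var + " is not set")
        homes[var] = env[var]
    # Spark: prefer SPARK_HOME, otherwise fall back to the ~/spark convention.
    homes["SPARK_HOME"] = env.get("SPARK_HOME") or os.path.expanduser("~/spark")
    return homes
```
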
Move youtubeData.txt to HDFS as /youtubeData.txt
-
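
The upload itself is normally done with the standard `hdfs dfs -put` command. The sketch below wraps that command in Python, assuming youtubeData.txt is in the current directory and that the `hdfs` launcher sits under $HADOOP_HOME/bin. The helper names `hdfs_put_command` and `upload` are hypothetical.

```python
import os
import subprocess


def hdfs_put_command(local_path, hdfs_path="/youtubeData.txt", hadoop_home=None):
    """Build the argument list for copying a local file into HDFS."""
    hadoop_home = hadoop_home or os.environ.get("HADOOP_HOME", "")
    # Use $HADOOP_HOME/bin/hdfs when known, otherwise rely on hdfs being on PATH.
    hdfs_bin = os.path.join(hadoop_home, "bin", "hdfs") if hadoop_home else "hdfs"
    return [hdfs_bin, "dfs", "-put", local_path, hdfs_path]


def upload(local_path="youtubeData.txt"):
    """Copy the file into HDFS; raises CalledProcessError on a non-zero exit.

    Note: -put fails if the destination already exists in HDFS.
    """
    subprocess.check_call(hdfs_put_command(local_path))
```
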
Install Java
-
Install Python (sudo apt-get install python)
-
Install pip (sudo apt-get install python-pip)
-
Run the command pip install -r requirements.txt
-
Run the cron.py file to run all the queries (python cron.py)
-
Run the Django server (python manage.py runserver)
-
Open localhost:8000/app/ in a browser
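
To check from a script that the development server is answering, something like the following could be used. It is a sketch only: it assumes the default host and port shown above, and the helpers `app_url` and `is_up` are hypothetical names that do not exist in the project.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def app_url(host="localhost", port=8000):
    """Build the URL of the app page served by the Django development server."""
    return "http://{0}:{1}/app/".format(host, port)


def is_up(url, timeout=5):
    """Return True if the URL answers with HTTP 200, False on an I/O error."""
    try:
        response = urlopen(url, timeout=timeout)
    except IOError:  # covers connection failures and HTTP error statuses
        return False
    try:
        return response.getcode() == 200
    finally:
        response.close()
```
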