This is the working directory for our project.
Explanation of all files ---
-
Images output of result_visualization.py. Contains all images produced by the visualizations run on the labelled files.
-
bingnews.py script to scrape news articles. (Comments still need to be added.)
-
BJP_Labelled.csv and INC_Labelled.csv output of vaderSentiment.py. Contain the BJP and INC datasets labelled POS, NEG, or NEU by the program.
-
Logistic Regression, SVM, RandomForest, XGBoost, and Naive Bayes are the programs for the respective ML models. Comments have been added to the Logistic Regression file to aid understanding; the other files contain very similar code.
-
preprocess.py preprocessing and cleaning of data for the ML models. Comments added.
-
procBJPtweets.csv and procINCtweets.csv are the datasets of BJP and INC tweets. (PS: more INC tweets are needed; the current set is not enough.)
-
result_visualization.py visualizations done on BJP_Labelled.csv and INC_Labelled.csv. Preprocessing included. Comments added.
-
train.csv 60k tweets for training all ML models. A small part of the 16 million tweet dataset uploaded to Google Drive in the TRAINING DATASET folder.
-
tweets.py script to fetch Twitter data.
-
vaderSentiment.py classifies tweets in the dataset as POS, NEG, or NEU using the VADER lexicon-based approach.
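
As a rough illustration of the labelling step, the sketch below maps a VADER compound score to one of the three labels. It assumes the commonly recommended cutoffs of +0.05 and -0.05; whether vaderSentiment.py uses these exact thresholds is an assumption, and `label_from_compound` is a hypothetical helper name, not a function from the script.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to POS, NEG, or NEU.

    The 0.05 threshold is the conventional VADER cutoff and is an
    assumption here, not a value taken from vaderSentiment.py.
    """
    if compound >= threshold:
        return "POS"
    if compound <= -threshold:
        return "NEG"
    return "NEU"


# Hypothetical compound scores, for illustration only. In practice each
# would come from SentimentIntensityAnalyzer().polarity_scores(text)["compound"].
scores = [0.62, -0.41, 0.0]
labels = [label_from_compound(s) for s in scores]
```

Keeping the threshold as a parameter makes it easy to widen or narrow the neutral band if too many tweets end up labelled NEU.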
-
visual.py contains visualizations done on train.csv. ##TEST VISUALIZATIONS##
-
data.csv dataset required for naive_bayes.py. Contains 40.5k tweets (equal ratio of pos and neg).
-
pred.py consolidated code with various graphical representations. Uses LS_2.0.csv as its dataset.
-
LS_2.0.csv Dataset for kaggle.py
-
accuracy.py compares the predicted winners with the actual winners to measure prediction accuracy.
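
The comparison can be pictured with a minimal sketch like the one below. It assumes predictions and results are keyed by constituency name; the function name `winner_accuracy`, the seat names, and the data layout are hypothetical and not taken from accuracy.py or Winners.csv.

```python
def winner_accuracy(predicted, actual):
    """Return the fraction of shared constituencies where the predicted
    winner matches the actual winner."""
    shared = predicted.keys() & actual.keys()
    if not shared:
        raise ValueError("no constituencies in common")
    correct = sum(1 for seat in shared if predicted[seat] == actual[seat])
    return correct / len(shared)


# Hypothetical seats and outcomes, for illustration only.
predicted = {"Seat A": "BJP", "Seat B": "INC", "Seat C": "BJP"}
actual = {"Seat A": "BJP", "Seat B": "BJP", "Seat C": "BJP"}
accuracy = winner_accuracy(predicted, actual)  # 2 of 3 seats match
```

Restricting the comparison to constituencies present in both mappings avoids counting a missing prediction as a wrong one; a stricter script might instead treat missing seats as errors.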
-
Winners.csv consolidated dataset of all winners of the Karnataka 2019 elections.