
aristotelispap / dow-jones-industrial-average-djia-market-index-prediction-using-reddit-news-


Used NLP to extract important features from Reddit News and combined those features with Random Forest and Boosting algorithms to predict the DJIA market index movement.

License: MIT License

Jupyter Notebook 100.00%


Dow-Jones-Industrial-Average-DJIA-market-index-Prediction-using-Reddit-News-

In this project, we predict the movement of the Dow Jones Industrial Average (DJIA) market index using news and the index's own price history. More specifically, we combine Reddit News data with the past 'Open', 'High', 'Low' and 'Close' prices of the index and predict whether the index price will increase or decrease. We therefore formulate the task as a binary classification problem.
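As a minimal sketch of the binary formulation, the snippet below turns a series of closing prices into up/down labels. The function name `movement_labels`, the sample prices, and the convention that an unchanged close counts as "up" are assumptions made for illustration; the notebooks may define the label differently.

```python
from typing import List


def movement_labels(closes: List[float]) -> List[int]:
    """Label each day (from the second one on) by comparing it to the previous close.

    Assumption: 1 means the close rose or stayed flat, 0 means it fell.
    """
    return [1 if today >= prev else 0 for prev, today in zip(closes, closes[1:])]


# Made-up closing prices, for illustration only.
closes = [100.0, 101.5, 101.5, 99.0, 102.0]
print(movement_labels(closes))  # [1, 1, 0, 1]
```

The first day has no label because there is no earlier close to compare against.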

The news data consists of the 25 most important topics of each day on Reddit over 8 consecutive years. To preprocess the text, we used NLP techniques to extract the 'Subjectivity' and 'Objectivity' of the news, as well as its 'Positive', 'Neutral' and 'Negative' sentiment, as features for predicting the stock price index.
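The idea of turning a day's headlines into sentiment features can be sketched as follows. This is not the project's actual pipeline: the tiny word lists, the function names, and the choice to report the share of positive, neutral and negative headlines per day are all assumptions for illustration. A real implementation would rely on an NLP library rather than a hand-written lexicon.

```python
from typing import Dict, List

# Toy word lists (assumption): stand-ins for a real sentiment model.
POSITIVE = {"gain", "growth", "peace", "record", "win"}
NEGATIVE = {"crisis", "war", "loss", "crash", "attack"}


def headline_sentiment(headline: str) -> str:
    """Classify one headline by counting positive and negative words."""
    words = [w.strip(".,!?;:'\"") for w in headline.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def daily_sentiment_features(headlines: List[str]) -> Dict[str, float]:
    """Return the share of positive, neutral and negative headlines for one day."""
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for headline in headlines:
        counts[headline_sentiment(headline)] += 1
    total = len(headlines)
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: count / total for label, count in counts.items()}


# Made-up headlines; the real data has 25 per day.
day = [
    "Markets post record gain",
    "Debt crisis deepens",
    "Parliament meets on Tuesday",
    "Trade war fears trigger crash",
]
print(daily_sentiment_features(day))
```

Each day then contributes one small feature vector, which can be joined with the price features for the same date.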

Additionally, to predict each day's 'Close' price movement, we use the 'Open', 'High', 'Low' and 'Close' prices of the index from the previous 4 days. In our approach, each algorithm also updates its weights every fixed number of days in order to keep up with the latest prices and news. The update interval is itself tuned, i.e. the algorithm learns how often it should update its weights to improve its predictions.
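The two ideas in this paragraph, lagged price windows and periodic refitting, can be sketched as below. The names `lagged_features` and `walk_forward_accuracy` are hypothetical, and a majority-class predictor stands in for the real classifiers so the example stays dependency-free; only the 4-day window comes from the text.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float, float, float]  # (open, high, low, close)


def lagged_features(ohlc: Sequence[Row], n_lags: int = 4) -> List[List[float]]:
    """For each day t >= n_lags, flatten the OHLC rows of days t-n_lags .. t-1."""
    features = []
    for t in range(n_lags, len(ohlc)):
        window = ohlc[t - n_lags:t]
        features.append([value for row in window for value in row])
    return features


def walk_forward_accuracy(labels: Sequence[int], first_test_day: int,
                          refit_every: int) -> float:
    """Predict day by day, refitting on all past labels every `refit_every` days.

    Stand-in model (assumption): always predict the majority class seen so far.
    """
    majority = 0
    correct = 0
    for t in range(first_test_day, len(labels)):
        if (t - first_test_day) % refit_every == 0:
            history = labels[:t]  # only data strictly before day t
            majority = 1 if 2 * sum(history) >= len(history) else 0
        correct += int(majority == labels[t])
    return correct / (len(labels) - first_test_day)


# Made-up data, for illustration only.
rows = [(float(i), i + 1.0, i - 1.0, i + 0.5) for i in range(6)]
print(len(lagged_features(rows)), len(lagged_features(rows)[0]))  # 2 16

labels = [1, 1, 0, 1, 0, 0, 0, 1]
candidates = [1, 2, 3]
best = max(candidates, key=lambda k: walk_forward_accuracy(labels, 4, k))
print(best)
```

Choosing `best` over a set of candidate intervals mirrors the idea of learning the update frequency; in practice that choice should be made on validation data, not on the final test set.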

To solve the problem, we apply several machine learning algorithms: Logistic Regression with L2 regularization, Random Forest, AdaBoost, and Support Vector Machines with linear and Gaussian kernels. Performance is evaluated using test set accuracy as well as ROC curves and AUC, which are the de facto evaluation metrics for this kind of problem.
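For reference, the two scalar metrics can be computed as in the sketch below. It uses the pairwise-ranking definition of AUC (the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half). The function names and the sample scores are illustrative assumptions; a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and classifier scores.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))  # 0.6
print(roc_auc(y_true, scores))   # 0.5
```

An AUC of 0.5 corresponds to random ranking, which is a useful baseline when reading AUC values for a binary up/down task.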

The evaluation shows that simpler models such as Logistic Regression and Support Vector Machines with a linear kernel perform better on this dataset than the more sophisticated ones, mainly because of the dataset's limited size. The Generalization Bound Inequality is then used to mathematically derive an upper bound for the test accuracy and to pick the best of the models used [see Project Report]. Finally, a comparison with existing techniques from Kaggle shows the superior performance of our approach [see Project Report].
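The exact inequality is given in the Project Report and is not reproduced here. As a hedged illustration of how such a bound works, the sketch below uses the standard Hoeffding bound for a finite set of models evaluated on a held-out test set: with probability at least 1 - delta, the true error is at most the test error plus sqrt(ln(2M / delta) / (2N)). Treating the report's bound as this Hoeffding form is an assumption, and the test set size of 400, the number of models and the value of delta are hypothetical.

```python
import math


def hoeffding_test_bound(test_error: float, n_test: int,
                         n_models: int = 1, delta: float = 0.05) -> float:
    """Upper bound on the true error that holds with probability >= 1 - delta.

    Assumption: Hoeffding's inequality with a union bound over `n_models`
    candidate models, all evaluated on the same `n_test` held-out examples.
    """
    epsilon = math.sqrt(math.log(2 * n_models / delta) / (2 * n_test))
    return test_error + epsilon


# Hypothetical setting: 5 candidate models, 400 test days, 95% confidence.
for name, err in [("model A", 0.37), ("model B", 0.47)]:
    print(name, round(hoeffding_test_bound(err, n_test=400, n_models=5), 3))
```

Because the added term depends only on the test set size, the number of models and delta, the bound preserves the ranking by test error. It mainly shows how much of the observed gap between models could be explained by a small test set.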

Table: Final Results

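| Model                                      | Test Error | AUC   |
|--------------------------------------------|------------|-------|
| Logistic Regression with L2 Regularization | 37.33%     | 0.69  |
| Random Forest                              | 46.66%     | 0.533 |
| AdaBoost                                   | 49.06%     | 0.563 |
| SVM with Linear Kernel                     | 40.21%     | 0.682 |
| SVM with Gaussian Kernel                   | 49.59%     | 0.547 |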
