
Stock trajectory analysis and prediction using machine learning techniques like sentiment analysis and long short-term memory (LSTM) neural nets. Frontend implemented as a Discord bot.


ACE Stock Prediction Engine

Table of Contents

  1. Overview
  2. Getting Started
  3. Sentiment Analysis
  4. Contact
  5. License

Overview

The ACE Stock Prediction Engine combines sentiment and quantitative analysis to provide valuable insights into stock behavior. It assists users in making informed investment decisions by making these insights available through an interactive Discord bot.

This project uses this template for creating a serverless Discord bot using Python. Review the instructions in the template's README (specifically, the "Development" section) for some useful background.

Add our bot to your server!

Getting Started

Using Git

You'll first need to fork this repository. Then, to clone the fork to your local machine, use the following command:
git clone https://github.com/your-username/forked-repository.git

Once done, you'll want to add the original repo as an upstream target:
git remote add upstream https://github.com/UF-ACE/stock-prediction.git

To sync the forked repo with changes in the original, use the following command:
git pull upstream branch-name

where branch-name is the name of a branch in the original repo. Note that this pulls branch-name's remote changes into the branch of the fork you currently have checked out.

To submit changes to the original repo, follow the steps below:

  • Navigate to the fork on your local machine
    • cd forked-repo
  • Create a new branch for your changes
    • git checkout -b yourname-branchname
  • ...make changes...
  • Stage, commit, and push your changes
    • git add .
    • git commit -m "..."
    • git push --set-upstream origin yourname-branchname
  • Submit a pull request -> be sure to select the correct base branch in the original repo

Project Environment + Contributing

This project uses Python 3.9, which you can download here.

Additional commands should be added to files in the /commands directory. Reference the template linked above for more information on command function structure.

Helper functions harnessed by commands should be added to /utils. These functions should be unit tested.

Given the nature of this project, it is difficult to test new functionality locally. Changes are only reflected in our bot once they are pushed to the master branch of this repo. With that in mind, ensure the following before submitting a pull request:

  • All new command functionality is thoroughly unit tested. Discord-specific context aside, this will ensure that a function you've created produces the expected output for a given input. This is especially important for functions that rely on external data sources, such as the stock market API.
  • All new command functionality is documented in the README.md file. This includes a description of the command, its usage, and any other relevant information.
  • All command requirements are noted in the requirements.txt file. This includes any new packages that need to be installed for the command to work.
  • All relevant API keys are noted in .env_sample, README, and described in the notes of the pull request. This will ensure these variables can be added to the production environment when the time comes.

API Keys

This project relies on several external APIs. Local data retrieval (execution of some of the functions in /utils) will require that keys for these APIs are specified in a .env file on your local machine.

Sentiment Analysis

Overview

Sentiment analysis involves mapping a user's query to a valid stock ticker, collecting headlines related to that stock, and analyzing the sentiment of those headlines. The sentiment analysis is performed using the VADER library, which is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. The analysis returns a compound sentiment score, which is a value in the range [-1, 1] that represents the overall sentiment of the text. A score of -1 indicates extremely negative sentiment, while a score of 1 indicates extremely positive sentiment.
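The mapping from per-headline compound scores to an overall verdict can be sketched in plain Python. The ±0.05 cutoffs below are the conventional VADER thresholds for labeling text positive/negative; the function name is illustrative, not taken from this codebase:

```python
def label_sentiment(compound_scores):
    """Map a list of VADER compound scores (each in [-1, 1]) to an
    overall label, using the conventional VADER cutoffs of +/-0.05.
    (Illustrative helper; scores would come from
    SentimentIntensityAnalyzer().polarity_scores(text)["compound"].)"""
    if not compound_scores:
        return None
    avg = sum(compound_scores) / len(compound_scores)
    if avg >= 0.05:
        return "positive"
    if avg <= -0.05:
        return "negative"
    return "neutral"
```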

News headlines are collected from FinnHub, News API, Google News, and Yahoo Finance.

Commands

/sentiment [type] [company] [interval]

  • Description: Collects and (optionally) analyzes sentiment data for a given stock ticker over a given time interval.
  • Usage:
    • [type] is one of collect or analyze, and specifies whether the command should (only) collect data or run an analysis
    • [company] is a valid stock ticker or company name
    • [interval] is an integer in the range [1, 30] and specifies the number of days of data to collect
  • Returns: A total number of headlines collected, and three sample headlines. If the analyze option is specified, the command will also return a sentiment analysis of the collected headlines. The user will be warned if the command fails to find more than 25 headlines.
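Validation of these arguments against the usage spec above could look like the following sketch (the function name is hypothetical; the bot's actual parsing may differ):

```python
def parse_sentiment_args(type_, company, interval):
    """Validate /sentiment arguments per the usage spec.
    Illustrative helper, not the bot's actual parser."""
    if type_ not in ("collect", "analyze"):
        raise ValueError("type must be 'collect' or 'analyze'")
    interval = int(interval)
    if not 1 <= interval <= 30:  # spec: interval in [1, 30] days
        raise ValueError("interval must be in [1, 30]")
    return type_, company.strip(), interval
```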

Contact

Join our Discord!

If you have any more specific questions, comments, or concerns, our developers can be reached at [email protected]. Please include "Stock Prediction" in the subject line.

License

This project is licensed under the terms of the MIT license.

stock-prediction's People

Contributors

jaarke, hadiplays


stock-prediction's Issues

Retrieve quantitative stock data

The first step to building out our quantitative analysis functionality is to collect relevant data so it is available to downstream functions.

The Yahoo Finance API might be useful for this, but I encourage contributors to experiment with different data providers to see which provide the most functionality.

Once the provider is chosen, add a file in utils/quant/ with functions for data collection. These functions should accept a ticker symbol and timeframe as arguments and return data retrieved from the chosen API.

Don't worry about data cleansing at this point. This will be figured out when we start work on our LSTM model.
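A minimal sketch of such a collection function, assuming the third-party yfinance package as the provider (one candidate; contributors should still compare providers), might look like:

```python
def fetch_history(ticker, period="1mo"):
    """Fetch historical OHLCV data for `ticker` over `period`.
    Sketch using yfinance (pip install yfinance); the provider
    and signature are assumptions, not settled choices."""
    if not ticker:
        raise ValueError("ticker is required")
    import yfinance as yf  # imported lazily; third-party dependency
    # Returns a pandas DataFrame indexed by date
    return yf.download(ticker, period=period, progress=False)
```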

Add support for downloading sentiment data

Users might want to manually sift through some of the news/social media data we retrieve about a specific stock. To allow for this, we should add another command to cogs/sentiment_cog.py, allowing the user to query for sentiment data and download that data in CSV format. This will require the following (in a new command function within the sentiment cog):

  • Calling backend functions to retrieve the list of data to be saved
  • Formatting data in CSV format -> the user should be able to discern if a given data point is a headline, a tweet, or a Reddit comment.
  • Uploading the CSV file to Discord

Note that this command doesn't require any data analysis. It is just a way for the user to interface with our data retrieval system.
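The CSV formatting step might be sketched as follows; the (source, text, date) row shape is an assumption made for illustration:

```python
import csv
import io

def to_csv(datapoints):
    """Serialize mixed sentiment datapoints to a CSV string, tagging
    each row with its source ('headline', 'tweet', 'reddit', ...) so
    users can tell the data streams apart. Each datapoint is assumed
    to be a (source, text, date) tuple -- an illustrative shape."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source", "text", "date"])
    writer.writerows(datapoints)
    return buf.getvalue()
```

The resulting string could then be handed to Discord, e.g. by wrapping its encoded bytes in a discord.File for upload.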

Embed sample articles/posts in sentiment analysis results

Currently, users are only given the relevant sentiment scores for their desired stock based on our distinct data streams. It might be nice to further provide them with sample data points used to conduct the analysis. This would mean giving a small list of news headlines and social media content, probably no more than 5 of each. This will require modifying the sentiment commands in cogs/sentiment_cog.py to make the result embedding contain some of the retrieved data.

Fix get_reddit_posts()

This function is meant to use the Reddit API to retrieve submissions regarding a search query, so that the content of those submissions can be used as data points for sentiment analysis. However, it's been difficult to retrieve any useful information from the function. Most of the posts are advertisements or links to external websites.

We'll need to modify this function with some more powerful filtering so that the posts we retrieve from Reddit can actually be used as input for sentiment analysis. We might look into using a third-party API (e.g., PushShift, PRAW, PSAW, PMAW, ...) or adding filtering on top of the Reddit API.
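A first pass at the extra filtering layer could be a heuristic like the one below. The dict field names mirror Reddit's submission fields (is_self, selftext), but the exact shape of the posts and the thresholds are assumptions:

```python
def filter_posts(posts, min_length=40):
    """Heuristic filter for Reddit submissions (sketch). Keeps
    self posts with substantive text; drops link posts and
    low-content posts that are likely ads or link dumps."""
    kept = []
    for p in posts:
        if not p.get("is_self"):      # drop pure link posts
            continue
        text = (p.get("selftext") or "").strip()
        if len(text) < min_length:    # drop low-content posts
            continue
        kept.append(p)
    return kept
```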

Add sentiment analysis

Now that sentiment data retrieval is finished, we'll need to analyze the data we're able to collect and display the results to the user. The functions relevant for conducting analysis should be contained in a utils/sentiment/analysis.py file. We will use the VADER model for sentiment analysis (see the VADER GitHub page).

The following tutorial provides a general overview of how this process should go: Stock News Sentiment Analysis with Python!

We should provide the eventual results to the user with as much information as possible. Barring any performance limitations, we should also differentiate between headline sentiment scores and social media sentiment scores.

Add LSTM prediction utility function(s)

Once we have decided on an LSTM model configuration, we should add support for making predictions on a specific stock. Specifically, we should add function(s) in utils/quant/ that accept a stock ticker, a data collection timeframe, and a prediction timeframe as parameters and return a price prediction for each day in the prediction timeframe.

The sequence of events inside these function(s) is as follows:

  • Receive historic data about the queried stock price over the given collection timeframe (see this issue).
  • Cleanse / transform the data to fit our needs.
  • Train the model on the stock's price data.
  • Make predictions based on the prediction timeframe.

In addition to the daily predictions, it might be nice to return any performance metrics available after the analysis.
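The steps above can be sketched as a skeleton like the following. The naive last-value "model" is a placeholder standing in for the trained LSTM, and the signature (history passed in directly, eliding retrieval) is illustrative:

```python
def predict_prices(ticker, history, prediction_days):
    """Skeleton of the prediction utility. `history` is a list of
    closing prices; the last-value predictor is a placeholder for
    the eventual LSTM model."""
    # 1. Receive historic data (passed in here for illustration)
    if not history:
        raise ValueError("no price history for " + ticker)
    # 2. Cleanse / transform: drop missing or non-positive entries
    prices = [p for p in history if p and p > 0]
    # 3. "Train" the model -- placeholder: remember the last price
    last = prices[-1]
    # 4. Predict one value per day in the prediction timeframe
    return [last for _ in range(prediction_days)]
```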

Keep README up-to-date

Major contributions to the codebase should be accompanied by updates to our README. Every semester, if not more frequently, we should revisit our documentation to ensure the project's code can be understood and used.

Add user-facing commands for quantitative analysis

As we build out functionality for conducting quantitative analysis, we'll want to add commands to make that functionality available for bot users. This should be done iteratively, to reflect each of our steps toward a fully functional analysis engine. The steps, as I see them, are as follows:

  • Add command for retrieving (and downloading) historic stock data over a given timeframe (this issue).
  • Add command for prediction via LSTM (this issue).
  • Add commands for prediction via other techniques (this issue).
  • Add command for conducting all supported analyses and displaying all results.

These commands should each provide a well-formatted embedding to the user displaying data entries and/or prediction results, as well as any relevant performance metrics.

Add support for different time intervals (sentiment analysis)

There is currently no way for a user to specify the interval of time they'd like the system to retrieve sentiment data for. By default, the bot pulls data from the last 7 days and conducts its analysis. This functionality is already supported by our backend data retrieval functions (via the start parameter); it is just a matter of adding frontend support. This will include doing the following (in each relevant command in cogs/sentiment_cog.py):

  • Parse an interval argument before company name (these should look like: 1d, 10d, 1m, 3m, 1y)
  • Map the interval argument to a date of the form "Year-Month-Day" (so if the user enters "7d", and the current date is 03/26/2023, the mapped date should be 2023-03-19)
  • Call backend functions get_headlines() and get_social_media() with the mapped date

It may be useful to take a look at utils/sentiment/headlines.py to see how datetime objects can be manipulated, subtracted, and converted to strings.

Note: Due to API limitations, the maximum interval should be 1 year (1y). The default should still be 7 days, and we should allow users not to specify any interval for this default to be used. We do not need to add support for querying arbitrary intervals (i.e., the end of the interval should always be the current date).
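The interval-to-date mapping could be sketched as below; the fixed 30/365-day month and year lengths are a simplifying assumption, and the function name is hypothetical:

```python
from datetime import date, timedelta

UNIT_DAYS = {"d": 1, "m": 30, "y": 365}  # approximate month/year lengths

def interval_to_start_date(interval="7d", today=None):
    """Map an interval string like '1d', '10d', '1m', '3m', '1y'
    to a 'Year-Month-Day' start date. Defaults to 7 days and caps
    the lookback at 1 year per the API limitation noted above."""
    today = today or date.today()
    n, unit = int(interval[:-1]), interval[-1]
    days = min(n * UNIT_DAYS[unit], 365)  # cap at 1y
    return (today - timedelta(days=days)).isoformat()
```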

Add support for other prediction techniques

LSTM is just one potential method for forecasting stock price changes. We may want to allow our users to choose from various analysis techniques for some more thorough insight into how a stock's price might change.

To do this, we'll want to add utility functions inside utils/quant/ that perform different types of analysis. Like with LSTM analysis, these functions should accept a stock ticker, data collection timeframe, and prediction timeframe as parameters, and return predictions for each day in the prediction timeframe.

See this issue, regarding LSTM implementation, for a more in-depth description of how analysis functions in general should work.

Sentiment data cleansing

Currently, sentiment data cleansing is very rudimentary. To make our results more reliable, we should try to add functions that vet our retrieved data according to the following principles:

  • No duplicate news headlines
  • News headlines should be in English
  • News headlines should talk explicitly and exclusively about the stock being queried

The cleansing should occur in utils/sentiment/headlines.py.
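The first and third principles can be sketched as below (the English-language check would need a language-detection library and is left as a TODO; the function name and matching heuristic are illustrative):

```python
def cleanse_headlines(headlines, ticker, company_name):
    """Deduplicate headlines and keep only those that explicitly
    mention the queried stock. Sketch: substring matching on the
    ticker or company name is a crude heuristic, and the English-
    language check is deliberately omitted here (TODO)."""
    seen, kept = set(), []
    ticker_l, name_l = ticker.lower(), company_name.lower()
    for h in headlines:
        key = h.strip().lower()
        if key in seen:
            continue  # drop duplicate headlines
        if ticker_l not in key and name_l not in key:
            continue  # must mention the stock explicitly
        seen.add(key)
        kept.append(h.strip())
    return kept
```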

Construct, tune, store LSTM model

After data retrieval, we will need to construct and tune an LSTM model to conduct forecasting.

A walkthrough guide for creating and training an LSTM model can be found here and here. We should experiment and research different layer configurations to find what achieves the best results. This is probably best done in a Jupyter notebook.

After deciding on a model, it should be stored, untrained, using Python's pickle library and made available in utils/quant/.
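Persisting and reloading the untrained model with pickle might look like the sketch below. Note that for a Keras model specifically, Keras's own save format may be preferable to pickling; the helpers here are generic and illustrative:

```python
import pickle

def save_model(model, path):
    """Persist an (untrained) model object to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path):
    """Reload a pickled model object from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)
```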

Add support for specifying a data stream (sentiment analysis)

There is currently only one, general-use "sentiment" command supported by the bot. This command is intended to pull data from all supported sources and conduct an analysis. To give users a bit more control, we should allow them to specify a single data stream. For example, a user could use a sentiment_news command to conduct an analysis only using news headlines, or a sentiment_social command to conduct an analysis on only social media data. This enhancement will require:

  • Adding the commands named above to cogs/sentiment_cog.py
  • In each of the commands, call relevant backend functions to retrieve only the desired data
  • Conduct sentiment analysis only on the desired data

Since the new commands are likely to mirror the general sentiment command, we should encapsulate shared functionality in methods.
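That shared core might be factored as a function like the following sketch, where each command simply passes the data streams it cares about (the names and shapes here are assumptions, not the cog's actual API):

```python
def run_sentiment(sources, fetchers, analyze):
    """Shared core for the sentiment commands (sketch). `fetchers`
    maps a source name ('news', 'social', ...) to a zero-argument
    retrieval function, and `analyze` scores the combined data.
    Each command supplies only the sources it supports."""
    data = []
    for name in sources:
        data.extend(fetchers[name]())
    return analyze(data)
```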
