Related Paper about Deep Learning on Stock Price Prediction

Deep learning-based feature engineering for stock price movement prediction.pdf

This paper introduces an end-to-end model, the multi-filters neural network (MFNN), designed for feature
extraction from financial time series samples and for the price movement prediction task. Convolutional
and recurrent neurons are integrated in the architecture so that information from
different feature spaces and market views can be obtained.

Link to the Overleaf editor of the project report

Please use the link to access the shared file of the project report:

https://www.overleaf.com/9415485413hvcpkgggwddv

What features should be considered

Price, return rate, trading volume or other features
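
As a rough illustration of how the candidate features above could be derived from raw data, the sketch below computes simple returns, log returns, and relative volume changes. The function names and the sample numbers are hypothetical and are not taken from the project code.

```python
import math

def simple_returns(prices):
    """Period-over-period simple returns: p[t] / p[t-1] - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]

def log_returns(prices):
    """Log returns: ln(p[t] / p[t-1]); these add up across periods."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def volume_change(volumes):
    """Relative change in trading volume between consecutive periods."""
    return [curr / prev - 1.0 for prev, curr in zip(volumes, volumes[1:])]

# Hypothetical closing prices and volumes, for illustration only.
closes = [100.0, 102.0, 99.96]
volumes = [1000, 1500, 1200]
print(simple_returns(closes))
print(log_returns(closes))
print(volume_change(volumes))
```

Returns are often preferred over raw prices as model inputs because they are closer to scale-free across stocks, but that choice is a design assumption to be checked in experiments.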

Proposal - integrated

Team: Infinite Alpha

Facebook Project: No

Project Title: Alpha Portfolio Construction with Deep Learning

Project Summary:

Predicting the prices of stocks and market derivatives has been a hot topic in both academia and the finance industry because of its vast profit potential. It is considered hard because of the low signal-to-noise ratio in the data, and the Efficient Market Hypothesis (EMH) even implies that it is impossible to consistently generate alpha through stock price prediction. At the same time, market dynamics are affected by the deployment of successful strategies, which in turn invalidates those strategies over time. The difficulty and the potential upside together make it an interesting and exciting problem to solve. This project aims to build a benchmark-beating US equity portfolio based on price predictions. Several backtesting metrics, including annual return, Sharpe ratio, and alpha, will be used to measure the final performance.
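
The backtesting metrics named in the summary could be computed along the following lines. This is a minimal sketch, not the project's evaluation code: it assumes daily returns, 252 trading days per year, and a single-factor regression against one benchmark for alpha. All names and sample values are hypothetical.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed number of trading days per year

def annualized_return(daily_returns):
    """Compound the daily returns, then scale the growth to one year."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0

def sharpe_ratio(daily_returns, daily_risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample std."""
    excess = [r - daily_risk_free for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)

def alpha_beta(portfolio, benchmark):
    """Least-squares fit of portfolio = alpha + beta * benchmark (daily)."""
    mean_p = statistics.mean(portfolio)
    mean_b = statistics.mean(benchmark)
    cov = sum((p - mean_p) * (b - mean_b) for p, b in zip(portfolio, benchmark))
    var = sum((b - mean_b) ** 2 for b in benchmark)
    beta = cov / var
    alpha = mean_p - beta * mean_b  # daily alpha, not annualized
    return alpha, beta

# Hypothetical daily returns, for illustration only.
benchmark = [0.01, -0.01, 0.02, 0.0]
portfolio = [2.0 * b + 0.001 for b in benchmark]
print(annualized_return(portfolio))
print(sharpe_ratio(portfolio))
print(alpha_beta(portfolio, benchmark))
```

A positive alpha in this regression means the portfolio earned more than its benchmark exposure (beta) alone would explain, which is the sense in which the project aims to beat the benchmark.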

Approach:

First, we will implement basic neural networks on a set of stock prices and related features/financial instruments to establish a baseline for later model improvements. Based on the results, we may try combinations of CNN and RNN (LSTM/GRU) to extract features from price data and predict future stock prices. At the same time, we will probably try feeding the prediction results to an RL agent that decides the optimal allocation of assets. What we try afterwards will largely depend on the results of the previous experiments, but one idea is to encode a history of stock prices with a Transformer and cluster the encoded stocks to find pairs trading opportunities.

Resources/Related Work:

For this problem, the SOTA algorithms are usually developed by large hedge funds (e.g. Citadel) and are kept as trade secrets. Most published results are either once-successful strategies that have become outdated or strategies that are impractical. According to https://paperswithcode.com/task/stock-market-prediction, models trained purely on stock prices seem to use some combination of embeddings, CNNs, RNNs, and graph neural networks.

[1] P. Gao, R. Zhang, X. Yang, "The Application of Stock Index Price Prediction with Neural Network", Mathematical and Computational Applications Vol. 25, ISSN 2297-8747, 2020
[2] Z. Hu, Y. Zhao, M. Khushi, "A Survey of Forex and Stock Price Prediction Using Deep Learning", Applied System Innovation Vol. 4, ISSN 2571-5577, 2021
[3] S. Mehtab, J. Sen, A. Dutta, "Stock Price Prediction Using Machine Learning and LSTM-Based Deep Learning Methods", arXiv:2009.10819 [q-fin.ST]
[4] H. Rezaei, H. Faaljou, G. Mansourfar, "Stock price prediction using deep learning and frequency decomposition", Expert Systems With Applications Vol. 169, 2021
[5] W. Long, Z. Lu, L. Cui, "Deep learning-based feature engineering for stock price movement prediction", Knowledge-Based Systems Vol. 164, 15 Jan. 2019, Pages 163-173

Datasets:
US end-of-day stock price data from Quandl
US stock options daily data: https://www.optionistics.com/secure/subscribe
NASDAQ-100 Index: https://www.nasdaq.com/market-activity/quotes/nasdaq-ndx-index
US equity historical price and fundamental data at Yahoo Finance: https://help.yahoo.com/kb/SLN2311.html

Team Members:
Xinyun Hu
Ziwei Zeng
Ruize Luo
Boxuan Li

Looking for more members:
No

Multiagent RL

https://www.kaggle.com/c/hungry-geese/overview/evaluation

Multi-agent Snake; it looks like a lot of fun

What should the sequence length and the output length of the LSTM be?
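
To make the two lengths in this question concrete, here is a small sketch of how a price series could be cut into training samples. `seq_len` is the number of past steps fed to the LSTM and `out_len` is the number of future steps to predict; both names and the helper itself are hypothetical, and the right values are exactly what still needs to be tuned.

```python
def make_windows(series, seq_len, out_len):
    """Split a series into (input, target) pairs.

    Each input holds seq_len consecutive values; each target holds the
    out_len values that immediately follow that input.
    """
    if seq_len < 1 or out_len < 1:
        raise ValueError("seq_len and out_len must be positive")
    pairs = []
    last_start = len(series) - seq_len - out_len
    for start in range(last_start + 1):
        x = series[start:start + seq_len]
        y = series[start + seq_len:start + seq_len + out_len]
        pairs.append((x, y))
    return pairs

# Hypothetical series, for illustration only.
for x, y in make_windows([10, 11, 12, 13, 14, 15], seq_len=3, out_len=2):
    print(x, "->", y)
```

Longer inputs give the model more context but yield fewer samples from the same history, so the two lengths are probably best treated as hyperparameters and compared on a validation split.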

[Draft] Project Proposal - Richard

Project Proposal

Team Name

Infinite Alpha

Is this a Facebook project?

No

Project Title

Automatic Alpha-Generating Portfolio Construction with Deep Learning

Project summary (4-5+ sentences).

Fill in your problem and background/motivation (why do you want to solve it? Why is it interesting?). This should provide some detail (don't just say "I'll be working on object detection")

Predicting the prices of stocks and market derivatives has been a hot topic in both academia and the finance industry because of its vast profit potential. It is considered hard because of the low signal-to-noise ratio in the data, and the Efficient Market Hypothesis (EMH) even implies that it is impossible to consistently generate alpha through stock price prediction. At the same time, market dynamics are affected by the deployment of successful strategies, which in turn invalidates those strategies over time. The difficulty and the potential upside together make it an interesting and exciting problem to solve.

What you will do (Approach, 4-5+ sentences)

Be specific about what you will implement and what existing code you will use. Describe what you actually plan to implement or the experiments you might try, etc. Again, provide sufficient information describing exactly what you'll do. One of the key things to note is that just downloading code and running it on a dataset is not sufficient for a description or a project! Some thorough implementation, analysis, theory, etc. has to be done for the project.

First, we will implement basic neural networks on a set of stock prices to establish a baseline for later model improvements. Based on the results, we may try combinations of CNN and RNN (LSTM/GRU) to extract features from price data and predict future stock prices. At the same time, we will try feeding the prediction results to an RL agent that decides the optimal allocation of assets. What we try afterwards will largely depend on the results of the previous experiments, but one idea is to encode a history of stock prices with a Transformer and cluster the encoded stocks to find pairs trading opportunities.

Resources / Related Work & Papers (4-5+ sentences).

What is the state of art for this problem? Note that it is perfectly fine for this project to implement approaches that already exist. This part should show you've done some research about what approaches exist.

For this problem, the SOTA algorithms are usually developed by large hedge funds (e.g. Citadel) and are kept as trade secrets. Most published results are either once-successful strategies that have become outdated or strategies that are impractical. According to https://paperswithcode.com/task/stock-market-prediction, models trained purely on stock prices seem to use some combination of embeddings, CNNs, RNNs, and graph neural networks.

Related Papers:
[1] P. Gao, R. Zhang, X. Yang, "The Application of Stock Index Price Prediction with Neural Network", Mathematical and Computational Applications Vol. 25, ISSN 2297-8747, 2020
[2] Z. Hu, Y. Zhao, M. Khushi, "A Survey of Forex and Stock Price Prediction Using Deep Learning", Applied System Innovation Vol. 4, ISSN 2571-5577, 2021
[3] S. Mehtab, J. Sen, A. Dutta, "Stock Price Prediction Using Machine Learning and LSTM-Based Deep Learning Methods", arXiv:2009.10819 [q-fin.ST]
[4] H. Rezaei, H. Faaljou, G. Mansourfar, "Stock price prediction using deep learning and frequency decomposition", Expert Systems With Applications Vol. 169, 2021

Datasets (Provide a Link to the dataset).

This is crucial! Deep learning is data-driven, so what datasets you use is crucial. One of the key things is to make sure you don't try to create and especially annotate your own data! Otherwise the project will be taken over by this.
US end-of-day stock price data from Quandl
US stock options daily data https://www.optionistics.com/secure/subscribe

List your Group members.

Xinyun Hu
Ziwei Zeng
Ruize Luo
Boxuan Li

Are you looking for more members?

No

CAMELYON17 Data Set

Overview

Built on the success of its predecessor, CAMELYON17 is the second grand challenge in pathology organised by the Computational Pathology Group of the Radboud University Medical Center (Radboudumc) in Nijmegen, The Netherlands.

The goal of this challenge is to evaluate new and existing algorithms for automated detection and classification of breast cancer metastases in whole-slide images of histological lymph node sections. This task has high clinical relevance and would normally require extensive microscopic assessment by pathologists. The presence of metastases in lymph nodes has therapeutic implications for breast cancer patients. Therefore, an automated solution would hold great promise to reduce the workload of pathologists while at the same time reducing the subjectivity in diagnosis.

For the complete description of the challenge and the data set please visit the challenge website.

Data

Images

The data in this challenge contains a total of 1000 whole-slide images (WSIs) of sentinel lymph nodes from 5 different medical centers in the Netherlands: Radboud University Medical Center in Nijmegen, Canisius-Wilhelmina Hospital in Nijmegen, University Medical Center Utrecht, Rijnstate Hospital in Arnhem, and Laboratorium Pathologie Oost-Nederland in Hengelo.

The data set is divided into training and testing sets with 20 patients from each center in both sets. For each patient the shared 5 whole-slide images are zipped together into a single ZIP file. The patient pN-stages and the slide-level labels in the training set are shared in the stage_labels.csv file.

The slides are converted to generic TIFF (Tagged Image File Format) using an open-source file converter, part of the ASAP package.

Annotations

From each center 10 slides are exhaustively annotated and the annotations are shared in XML format. The XML files are compatible with the ASAP software. You may download this software and visualize the annotations overlaid on the whole slide image.

The provided XML files may have two groups of annotations ("metastases", or "normal") which can be accessed from the "PartOfGroup" attribute of the Annotation node in the XML file. Annotations belonging to group "metastases" represent tumor areas and annotations within group "normal" are non-tumor areas which have been cut-out from the original annotations in the "metastases" group.
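
The grouping described above could be read with Python's standard XML parser roughly as follows. Only the `Annotation` node and its `PartOfGroup` attribute come from the description; the rest of the sample layout (the `Coordinates`/`Coordinate` elements and their `Order`, `X`, `Y` attributes) is an assumption about the ASAP file format and should be checked against a real annotation file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical annotation file; element and attribute names other than
# Annotation and PartOfGroup are assumptions, not taken from the README.
SAMPLE_XML = """
<ASAP_Annotations>
  <Annotations>
    <Annotation Name="Annotation 0" PartOfGroup="metastases">
      <Coordinates>
        <Coordinate Order="0" X="100.0" Y="200.0" />
        <Coordinate Order="1" X="150.0" Y="200.0" />
        <Coordinate Order="2" X="150.0" Y="260.0" />
      </Coordinates>
    </Annotation>
    <Annotation Name="Annotation 1" PartOfGroup="normal">
      <Coordinates>
        <Coordinate Order="0" X="120.0" Y="210.0" />
        <Coordinate Order="1" X="130.0" Y="210.0" />
        <Coordinate Order="2" X="130.0" Y="220.0" />
      </Coordinates>
    </Annotation>
  </Annotations>
</ASAP_Annotations>
"""

def polygons_by_group(xml_text):
    """Return {group name: [polygon, ...]}, each polygon a list of (x, y)."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for annotation in root.iter("Annotation"):
        group = annotation.get("PartOfGroup", "ungrouped")
        # Sort vertices by their Order attribute to preserve the outline.
        points = sorted(annotation.iter("Coordinate"),
                        key=lambda c: int(c.get("Order")))
        groups[group].append([(float(c.get("X")), float(c.get("Y")))
                              for c in points])
    return dict(groups)

groups = polygons_by_group(SAMPLE_XML)
print({name: len(polys) for name, polys in groups.items()})
```

When building a tumor mask from such polygons, the "normal" polygons would be subtracted from the "metastases" polygons, since they mark cut-out non-tumor areas.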

Integrity

The checksums.md5 file contains the MD5 checksums of all the shared CAMELYON17 files. The downloaded files can be checked against this list with md5sum.
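
Where md5sum is not available, the same check could be scripted. The sketch below assumes checksums.md5 uses the usual md5sum line layout (hex digest, whitespace, file name); the helper names are hypothetical.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large slide archives fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def parse_checksum_lines(lines):
    """Parse md5sum-style lines '<hex digest>  <file name>' into a dict."""
    expected = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading '*' on the file name.
        expected[name.lstrip("*")] = digest.lower()
    return expected

def verify(path, name, expected):
    """True if the file at path matches the digest listed under name."""
    return md5_of_file(path) == expected.get(name)
```

A typical use would be to call `parse_checksum_lines(open("checksums.md5"))` once and then `verify(...)` for each downloaded ZIP file.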

Licensing

See license.txt for licensing information.

Image segmentation in city landscape

Project Title

Image segmentation in city landscape

Project summary (4-5+ sentences).

Fill in your problem and background/motivation (why do you want to solve it? Why is it interesting?). This should provide some detail (don't just say "I'll be working on object detection")

For a self-driving car, it is necessary to separate the different objects in a scene. In addition, object shapes and silhouettes can help improve object tracking, resulting in more accurate input for both steering and acceleration.

What you will do (Approach, 4-5+ sentences)

Be specific about what you will implement and what existing code you will use. Describe what you actually plan to implement or the experiments you might try, etc. Again, provide sufficient information describing exactly what you'll do. One of the key things to note is that just downloading code and running it on a dataset is not sufficient for a description or a project! Some thorough implementation, analysis, theory, etc. has to be done for the project.

We will use the dataset linked in the Datasets section and apply transfer learning to deep learning models. We will implement different image segmentation methods, such as U-Net and DeepLab, compare their performance, and evaluate them using Intersection over Union (IoU) and the Dice coefficient. We will also study how different loss functions, such as conventional cross-entropy loss, focal loss, and Dice loss, affect the results.
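
For reference, the evaluation metrics and one of the losses named above can be sketched on flattened binary masks as follows. This is a plain-Python illustration under simplifying assumptions (two classes, no class weighting in the focal loss), not the training code; a real implementation would operate on tensors and multiple classes.

```python
import math

def iou(pred, target):
    """Intersection over Union of two binary masks (flat 0/1 sequences)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0  # two empty masks agree fully

def dice(pred, target):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 2.0 * inter / total if total else 1.0

def dice_loss(pred, target):
    """Dice loss is one minus the Dice coefficient."""
    return 1.0 - dice(pred, target)

def focal_loss(prob_foreground, label, gamma=2.0):
    """Per-pixel binary focal loss: -(1 - p_t)**gamma * log(p_t).

    With gamma = 0 this reduces to the usual cross-entropy loss.
    """
    p_t = prob_foreground if label == 1 else 1.0 - prob_foreground
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# Hypothetical 4-pixel masks, for illustration only.
pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
print(iou(pred, target), dice(pred, target), dice_loss(pred, target))
print(focal_loss(0.9, 1), focal_loss(0.9, 1, gamma=0.0))
```

The focal loss down-weights pixels the model already classifies confidently, which is one reason it is worth comparing against cross-entropy on street scenes where small objects are rare.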

Resources / Related Work & Papers (4-5+ sentences).

What is the state of art for this problem? Note that it is perfectly fine for this project to implement approaches that already exist. This part should show you've done some research about what approaches exist.

Deep learning-based segmentation models, such as U-Net, FU net, and DeepLab, are used for image segmentation in self-driving scenarios.

Papers:

https://ieeexplore.ieee.org/abstract/document/9356353?casa_token=ubIvqwRFp0sAAAAA:qT3J0N-7lpxAED0WFAdf8KkRnW8tZ-u-Zei0ZEq2aT-8-qkm6KN0S2SXSSThVHyKMIWncuvt
https://link.springer.com/chapter/10.1007/978-981-15-8391-9_1

Datasets (Provide a Link to the dataset).

This is crucial! Deep learning is data-driven, so what datasets you use is crucial. One of the key things is to make sure you don't try to create and especially annotate your own data! Otherwise the project will be taken over by this.

https://www.kaggle.com/c/cvpr-2018-autonomous-driving

List your Group members.
Are you looking for more members?

Need input data, training module is complete

Terminal - from Correlation One

https://terminal.c1games.com/home

CNN + RL

+ No data/labeling needed, the game generates data
+ Algorithm automatically gets played against other players
- Not much existing domain knowledge / research

r-luo / gatech-cs7643-project-group
