uditgupta10 / gpt-investar

Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models

License: MIT License

Python 5.30% Jupyter Notebook 94.70%

gpt-investar's Introduction

GPT-InvestAR


This repository contains a set of tools and scripts designed to enhance stock investment strategies through the analysis of annual reports using Large Language Models. The components in this repository are organized as follows:

download_10k.py: This Python script downloads 10-K filings of companies from the SEC website, which contain crucial financial information.
convert_html_to_pdf.py: Converts HTML files to PDF files. PDFs are preferred due to their token efficiency for further analysis.
make_targets.py: Generates a DataFrame of stock tickers with target values of different time resolutions, which can be used as investment targets for a Machine Learning model.
embeddings_save.py: Generates embeddings of PDF files and saves them using ChromaDB. These embeddings are numerical representations of the textual content in annual reports.
gpt_scores_as_features.py: Utilizes saved embeddings to query all questions for each annual report using a Large Language Model (LLM) such as GPT-3.5, and uses the scores or answers as features.
modeling_and_return_estimation.ipynb: This Jupyter Notebook contains the core modeling process. It uses machine learning techniques, specifically Linear Regression, to model the dataset and estimate returns. The goal is to create a portfolio of top-k predicted stocks and compare their returns with the S&P 500 index.
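
The top-k comparison described for the notebook can be illustrated with a minimal sketch. This is not the notebook's code: the function name top_k_portfolio_return, the tickers, and all numbers are hypothetical, and equal weighting is an assumption. It only shows the idea of ranking stocks by predicted return, holding the top k, and comparing the result with a benchmark return.

```python
from statistics import mean

def top_k_portfolio_return(predicted, realized, k):
    """Pick the k tickers with the highest predicted return and
    compute their equal-weighted realized return (assumed weighting)."""
    # Rank tickers by predicted return, highest first.
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    chosen = ranked[:k]
    return chosen, mean(realized[ticker] for ticker in chosen)

# Hypothetical numbers for illustration only (not results from the paper).
predicted = {"AAA": 0.12, "BBB": 0.03, "CCC": 0.08, "DDD": -0.02}
realized = {"AAA": 0.10, "BBB": 0.01, "CCC": 0.06, "DDD": 0.04}
benchmark_return = 0.05  # stand-in for the S&P 500 return over the same period

chosen, portfolio_return = top_k_portfolio_return(predicted, realized, k=2)
excess_return = portfolio_return - benchmark_return
print(chosen, round(portfolio_return, 4), round(excess_return, 4))
```

In practice the predicted values would come from the fitted regression model, and the realized values from the targets produced by make_targets.py.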

By following the sequence of these components, you can analyze annual reports, generate embeddings, and build predictive models to potentially enhance stock investment strategies.

Feel free to explore each component for more details and usage instructions.

Dependencies

1. LlamaIndex (and related dependencies)
2. OpenBB (and related dependencies)
3. Scikit-Learn
4. PDFKit (and related dependencies)

It is recommended to install libraries 1 and 2 (LlamaIndex and OpenBB) in separate virtual (conda) environments. None of the Python scripts mentioned above requires both libraries in the same environment.

Citation

If you use the code or find this repository helpful, please consider citing the paper:

GPT-InvestAR: Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models
Udit Gupta
Publication Links: https://arxiv.org/abs/2309.03079

@article{GPT-InvestAR,
  author = {Udit Gupta},
  title = {GPT-InvestAR: Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models},
  journal = {arXiv e-prints},
  year = {2023},
  eprint = {arXiv:2309.03079},
  url = {https://arxiv.org/abs/2309.03079},
}


gpt-investar's Issues

Template to create the questions

I tried some questions, such as: "profit": "Does the company got a good profit in this year? Does the company do good in this year?"
But the LLM does not return a numeric answer such as '0.8'. Instead, it returns {"score":"Yes"}.
Are there any tips to make sure the questions get a numeric answer rather than "Yes", "No", or "Uncertain"? Thanks so much!
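
One possible approach, sketched here under assumptions, is to state the expected numeric format in the prompt and to validate the reply before using it as a feature. The helper names build_scored_prompt and parse_score are hypothetical and are not part of this repository. The prompt wording is only an example, and the 0 to 1 range is an assumption based on the '0.8' mentioned above.

```python
import json
import re

def build_scored_prompt(question):
    """Wrap a question with explicit instructions to answer with a number."""
    return (
        f"{question}\n"
        "Answer with a JSON object of the form {\"score\": <number>} where the "
        "number is between 0 and 1. Do not answer with words such as Yes or No."
    )

def parse_score(raw_answer):
    """Return the score as a float in [0, 1], or None if no valid number is found."""
    try:
        score = float(json.loads(raw_answer)["score"])
    except (ValueError, KeyError, TypeError):
        # Not valid JSON or not a numeric score: fall back to the
        # first number that appears anywhere in the reply text.
        match = re.search(r"-?\d+(?:\.\d+)?", raw_answer)
        if match is None:
            return None
        score = float(match.group())
    # Reject values outside the assumed range (this also rejects NaN).
    return score if 0.0 <= score <= 1.0 else None

print(parse_score('{"score": 0.8}'))
print(parse_score('{"score":"Yes"}'))
```

A reply that parses to None could be retried with the same prompt or recorded as a missing value, depending on how the features are used downstream.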

Correct the requirements.txt

I've been unable to install the packages listed in requirements.txt.
I tried going through each conda and pip install manually, but I ran into so many issues that I gave up.
Python=3.9 got me the furthest, but it was still problematic.

It would be great if you could offer a clearer list for users.

Thanks.

Can't find the 27 dimensions question list

Hi, I recently had the opportunity to read your paper and found it incredibly insightful and thought-provoking. However, I noticed that the paper references a comprehensive list of questions covering 27 dimensions. We are very keen to delve deeper into your research, but after reviewing the materials on your GitHub, it appears that only a portion of this list is available (specifically, in question.json).

Would it be possible for you to share the complete list of questions? Having access to this information would greatly assist us in fully understanding and potentially replicating your study.

Thank you in advance for your time and assistance. Your work is greatly appreciated and we are looking forward to exploring it further.

Only one question?

I'm curious which features work well for the non-negative logistic regression, but I found only one question under questions.json. Are there other features used for the modeling?

Issue with the CONFIG_PATH in download_10k.py

Hi Udit,

I have followed the dependency notes in the README, but when I try to run the script I get the following error:

usage: ipykernel_launcher.py [-h] --config_path CONFIG_PATH
ipykernel_launcher.py: error: the following arguments are required: --config_path

Any help appreciated.

Thank you
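
The usage line names ipykernel_launcher.py, which suggests the script was run inside a Jupyter kernel, where the command line does not include --config_path. The sketch below is an assumption about the cause, not the actual contents of download_10k.py. It builds a parser that mirrors the usage line in the error message and passes the argument list explicitly. The path config.json is a placeholder.

```python
import argparse

def build_parser():
    # Mirrors the usage line in the error message; the real script
    # may define its parser differently or accept more options.
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_path", required=True)
    return parser

# Inside a notebook, sys.argv holds the kernel's own arguments, so the
# argument list is passed explicitly instead of being read from sys.argv.
# "config.json" is a placeholder, not a file known to exist in the repository.
args = build_parser().parse_args(["--config_path", "config.json"])
print(args.config_path)
```

When the script is run from a terminal instead, supplying --config_path followed by the path to the configuration file on the command line should satisfy the same requirement.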