hungchun-lin / stock-price-prediction-using-gan

In this project, we compare two algorithms for stock prediction. First, we use a Long Short-Term Memory (LSTM) network for stock market prediction. LSTM is a powerful method capable of learning order dependence in sequence prediction problems. Second, we use a Generative Adversarial Network (GAN) to make the prediction, with an LSTM as the generator and a CNN as the discriminator. In addition, Natural Language Processing (NLP) is used to analyze the influence of news on stock prices.

License: MIT License

Python 0.51% Jupyter Notebook 99.49%

stock-prediction gan lstm gru python

stock-price-prediction-using-gan's Issues

WGAN extreme results

After running the code many times, I find that the WGAN model gives extreme predictions a high percentage of the time. It sometimes produces good results (as reported in the paper), but much of the time I get results like those shown in the graph. What explains this?

I also get inverse symmetries like this:

Basic GAN test data plot / missing plot functions

Hello,
would you please add the "Basic GAN test data plot" (Fig. 7 in jcssp.2021.188.196.pdf) missing plot functions?
I.e.:
def plot_testdataset_with2020_result(X_test, y_test):
....
test_with2020_RMSE = plot_testdataset_with2020_result(X_test, y_test)
print("----- Test_RMSE_LSTM_with2020 -----", test_with2020_RMSE)
Thank you for your time,
LM

jcssp.2021.188.196.pdf

A dumb question

Hello hungchun! I have a dumb question. For your basic GAN code, may I ask what the variable yc is? For instance, x represents the independent variables while y represents the dependent variables. How about yc?

I understand that it is used to compute the loss of the model. However, I do not understand what yc itself represents, or how it relates to the theoretical derivation of the loss for a GAN, given that in theory only the original data and noise are supposed to be fed into the discriminator and generator.

Thank you for the clarification!
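
The thread does not answer this question, so the following is only one plausible reading: yc may be the window of past target values that lines up with each input window, which a conditional-style discriminator sees together with either the real or the generated next value. The sketch below illustrates that reading with NumPy. The helper `make_windows`, its parameters, and the toy data are hypothetical and are not taken from the repository.

```python
import numpy as np

def make_windows(features, target, n_in, n_out):
    """Hypothetical windowing helper; not the repository's get_X_y."""
    X, y, yc = [], [], []
    for start in range(len(features) - n_in - n_out + 1):
        end = start + n_in
        X.append(features[start:end])       # past feature window (generator input)
        yc.append(target[start:end])        # past target window (the assumed "yc")
        y.append(target[end:end + n_out])   # future target values to predict
    return np.array(X), np.array(y), np.array(yc)

features = np.arange(20, dtype=float).reshape(10, 2)  # 10 days, 2 features
target = np.arange(10, dtype=float).reshape(10, 1)    # 10 days, 1 target column
X, y, yc = make_windows(features, target, n_in=3, n_out=1)

# Under this reading, the discriminator scores "history + next step" sequences:
# yc followed by the real y, or yc followed by the generator's output.
real_sequence = np.concatenate([yc, y], axis=1)
print(X.shape, y.shape, yc.shape, real_sequence.shape)
```

If this reading is right, the conditioning on yc is what separates the setup from the textbook GAN, where the discriminator sees only real or generated samples.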

WGAN test data plot / missing plot functions

Hello,
would you please add the "WGAN test data plot" (Fig. 9 in jcssp.2021.188.196.pdf) missing plot functions?
I.e.:
def plot_testdataset_with2020_result(X_test, y_test):
....
test_with2020_RMSE = plot_testdataset_with2020_result(X_test, y_test)
print("----- Test_RMSE_LSTM_with2020 -----", test_with2020_RMSE)
Thank you for your time,
LM

jcssp.2021.188.196.pdf

Autoencoder.py

Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.

RESTART: C:\Users\sujan\Documents\2020_Papers\Data_Solar\2020_paper1\P5\Stock-price-prediction-using-GAN-Capstone-Group1-master\Code\2. Autoencoder.py
Number of training days: 1747. Number of test days: 750.
VAE(
(encoder): HybridSequential(
(0): Dense(None -> 400, linear)
(1): GELU(

)
(2): Dense(None -> 4, linear)
(3): Dense(None -> 400, linear)
(4): GELU(

)
(5): Dense(None -> 4, linear)
(6): Dense(None -> 400, linear)
(7): GELU(

)
(8): Dense(None -> 4, linear)

)
(decoder): HybridSequential(
(0): Dense(None -> 400, linear)
(1): GELU(

)
(2): Dense(None -> 35, Activation(sigmoid))
(3): Dense(None -> 400, linear)
(4): GELU(

)
(5): Dense(None -> 35, Activation(sigmoid))
(6): Dense(None -> 400, linear)
(7): GELU(

)
(8): Dense(None -> 35, Activation(sigmoid))

)
)

Training completed in 47 seconds.
The shape of the newly created (from the autoencoder) features is (2497, 35).
------ pca.n_components_ ------
3
[0.66342705 0.0908217 0.05160403]
Traceback (most recent call last):
  File "C:\Users\sujan\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals.py", line 4857, in create_block_manager_from_blocks
    placement=slice(0, len(axes[0])))]
  File "C:\Users\sujan\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals.py", line 3205, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "C:\Users\sujan\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals.py", line 125, in __init__
    '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
ValueError: Wrong number of items passed 3, placement implies 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\sujan\Documents\2020_Papers\Data_Solar\2020_paper1\P5\Stock-price-prediction-using-GAN-Capstone-Group1-master\Code\2. Autoencoder.py", line 160, in <module>
    VAE_features = get_autoencoder("Apple")
  File "C:\Users\sujan\Documents\2020_Papers\Data_Solar\2020_paper1\P5\Stock-price-prediction-using-GAN-Capstone-Group1-master\Code\2. Autoencoder.py", line 147, in get_autoencoder
    VAE_features = pd.DataFrame(principalComponents, columns = ['VAE_PCA_1', 'VAE_PCA_2', 'VAE_PCA_3', 'VAE_PCA_4'])
  File "C:\Users\sujan\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 379, in __init__
    copy=copy)
  File "C:\Users\sujan\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 536, in _init_ndarray
    return create_block_manager_from_blocks([values], [columns, index])
  File "C:\Users\sujan\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals.py", line 4866, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "C:\Users\sujan\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals.py", line 4843, in construction_error
    passed, implied))
ValueError: Shape of passed values is (3, 2497), indices imply (4, 2497)
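
The log reports that PCA kept 3 components (`pca.n_components_` printed 3), while line 147 hard-codes four column names, which is what the ValueError says. A minimal sketch of one way around it is to derive the column names from the array that PCA actually returned. The random array below is only a stand-in for `principalComponents`, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for the PCA output in the log: 2497 rows, 3 retained components.
principal_components = np.random.default_rng(0).normal(size=(2497, 3))

# Build one column name per component that was actually returned,
# instead of assuming there are always four.
n_components = principal_components.shape[1]
columns = [f"VAE_PCA_{i + 1}" for i in range(n_components)]

vae_features = pd.DataFrame(principal_components, columns=columns)
print(vae_features.shape)
```

Later code that expects a `VAE_PCA_4` column would still need adjusting. If the PCA was configured with a variance threshold (an assumption, since the script is not shown here), the number of components can change with the data.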

some questions

Since the code has no comments, I would like to ask: where can I find train_predict_index.npy?

Training error

I get an error during training, with 8 dimensions and 17 features.
:34 train_step *
    real_y_reshape = tf.reshape(real_y, [real_y.shape[0], real_y.shape[1],1])
C:\Users\sujan\anaconda3\lib\site-packages\tensorflow\python\framework\tensor_shape.py:887 __getitem__
    return self._dims[key].value

IndexError: list index out of range
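
An IndexError on `real_y.shape[1]` suggests that `real_y` arrived with a single dimension, so there is no second axis to read. This is an inference from the traceback, not something the thread confirms. A minimal NumPy sketch of a guard that gives the targets an explicit output-step axis before training follows. `ensure_2d` is a hypothetical helper.

```python
import numpy as np

def ensure_2d(y):
    """Return y with shape (samples, output_steps); hypothetical helper."""
    y = np.asarray(y)
    if y.ndim == 1:
        y = y.reshape(-1, 1)  # treat each sample as a single output step
    return y

y_flat = np.zeros(128)        # shape (128,): y_flat.shape[1] would raise IndexError
y_fixed = ensure_2d(y_flat)   # shape (128, 1)

# The reshape from the traceback now has both axes available.
y_3d = y_fixed.reshape(y_fixed.shape[0], y_fixed.shape[1], 1)
print(y_fixed.shape, y_3d.shape)
```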

‘requirements.txt’ request

Hi Hungchun,

I am very interested in your project. Thank you for sharing it.

However, I encountered a problem when running 4.Basic_GAN.py, in the function ''train_step(self, real_x, real_y, yc)'' at line 83.
The error message is: "Input 'b' of 'MatMul' Op has type float32 that does not match type float64 of argument 'a'."

In #7, limoon20 told me that he/she did not see the same error.

So I think it is an environment configuration problem, meaning that the Python environment on my PC is not suitable for your project.

Could you share the 'requirements.txt' with me so that I can set up the right environment to run the code?
If you can, please contact me by email: [email protected].
Thanks in advance.

Best regards,
再也不敢
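
Independent of the environment, the MatMul message means that a float64 tensor met float32 weights. NumPy creates float64 arrays by default, while Keras layers default to float32, so casting the inputs to one dtype before training is a common remedy. The sketch below shows the cast with NumPy only. `to_float32` and the array shapes are illustrative assumptions, and whether float32 is the right choice depends on how the model in the script was built.

```python
import numpy as np

def to_float32(*arrays):
    """Cast every array to float32 so all inputs share one dtype (hypothetical helper)."""
    return tuple(np.asarray(a, dtype=np.float32) for a in arrays)

# Toy stand-ins for the training arrays; NumPy defaults to float64.
X_train = np.random.default_rng(0).normal(size=(16, 3, 5))
y_train = np.zeros((16, 1))
yc_train = np.zeros((16, 3, 1))

X_train, y_train, yc_train = to_float32(X_train, y_train, yc_train)
print(X_train.dtype, y_train.dtype, yc_train.dtype)
```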

Reinforcement learning for hyperparameter optimization

Hi,

I'm trying to improve this software by adding reinforcement learning for hyperparameter optimization: Rainbow, which is based on Q-learning, and Proximal Policy Optimization (PPO).

Has anyone ever worked on this?

FileNotFoundError: [Errno 2] No such file or directory: 'yc_train.npy'

Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.

RESTART: C:\Users\sujan\Documents\2020_Papers\Data_Solar\2020_paper1\P5\Stock-price-prediction-using-GAN-Capstone-Group1-master\Code\wgan_gp.py
Traceback (most recent call last):
  File "C:\Users\sujan\Documents\2020_Papers\Data_Solar\2020_paper1\P5\Stock-price-prediction-using-GAN-Capstone-Group1-master\Code\wgan_gp.py", line 21, in <module>
    yc_train = np.load("yc_train.npy", allow_pickle=True)
  File "C:\Users\sujan\AppData\Local\Programs\Python\Python37\lib\site-packages\numpy\lib\npyio.py", line 428, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'yc_train.npy'
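
`np.load("yc_train.npy")` resolves the name against the current working directory, so the error means the file was never generated there or the script ran from a different folder. A small pre-flight check can report every missing input at once. In the sketch below, only `yc_train.npy` comes from the log. The other file names and the helper `missing_files` are assumptions.

```python
from pathlib import Path

def missing_files(names, directory="."):
    """Return the names that do not exist as files in `directory` (hypothetical helper)."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]

# Only yc_train.npy appears in the log; the other two names are assumed.
required = ["X_train.npy", "y_train.npy", "yc_train.npy"]

missing = missing_files(required)
if missing:
    print("Missing input files:", ", ".join(missing))
    print("Looked in:", Path(".").resolve())
```

Printing the resolved directory makes it easier to tell a preprocessing step that never ran from a script that was started in the wrong folder.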

attribute error

When running this code from 4.basic_GAN.py:

if __name__ == '__main__':
    input_dim = X_train.shape[1]
    feature_size = X_train.shape[2]
    output_dim = y_train.shape[1]

    # For Bayesian
    opt = {"lr": 0.00016, "epoch": 165, 'bs': 128}
    generator = make_generator_model(X_train.shape[1], output_dim, X_train.shape[2])
    discriminator = make_discriminator_model()
    gan = GAN(generator, discriminator, opt)
    Predicted_price, Real_price, RMSPE = gan.train(X_train, y_train, yc_train, opt)

The error shown:

AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1848/1524428345.py in <module>
      4 discriminator = make_discriminator_model()
      5 gan = GAN(generator, discriminator, opt)
----> 6 Predicted_price, Real_price, RMSPE = gan.train(X_train, y_train, yc_train, opt)

AttributeError: 'GAN' object has no attribute 'train'
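
The thread gives no diagnosis. One common cause of this message when code is pasted into a notebook is that the `train` method lost its indentation and ended up as a module-level function instead of a method of the class. The toy classes below are hypothetical and only reproduce that failure mode.

```python
class BrokenGAN:
    def __init__(self, generator, discriminator):
        self.generator = generator
        self.discriminator = discriminator

# Dedented by mistake: this is a plain module-level function, not a method.
def train(self, data):
    return len(data)

class FixedGAN:
    def __init__(self, generator, discriminator):
        self.generator = generator
        self.discriminator = discriminator

    # Indented inside the class body, so instances have a train attribute.
    def train(self, data):
        return len(data)

print(hasattr(BrokenGAN(None, None), "train"))  # False
print(hasattr(FixedGAN(None, None), "train"))   # True
```

Checking `hasattr(gan, "train")` or `dir(gan)` in the notebook shows quickly whether this is what happened. A stale class definition from an earlier cell is another possibility.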

Missing files

Amazing project. Thanks for the effort. Some files are missing, for example train_predict_index.npy and test_predict_index.npy:

train_predict_index = np.load("train_predict_index.npy", allow_pickle=True)
test_predict_index = np.load("test_predict_index.npy", allow_pickle=True)

In the prediction, the model file is missing: gen_GRU_model_89.h5

G_model = tf.keras.models.load_model('gen_GRU_model_89.h5')

I would really appreciate it if you could upload them. Thanks. :)

Data leakage when normalizing the train and test data together?

dataset = pd.read_csv('Finaldata_with_Fourier.csv', parse_dates=['Date'])
...
y_value = pd.DataFrame(dataset.iloc[:, 3])
y_scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaler.fit(y_value)
y_scale_dataset = y_scaler.fit_transform(y_value)
X, y, yc = get_X_y(X_scale_dataset, y_scale_dataset)
y_train, y_test, = split_train_test(y)
yc_train, yc_test, = split_train_test(yc)
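
The concern is that fitting the scaler on the whole series lets the test period's minimum and maximum influence how the training data is scaled. A minimal sketch of the usual alternative is to learn the scaling from the training rows only and then apply it unchanged to the test rows. The NumPy function below mimics a min-max scaling to (-1, 1). It is an illustration, not the repository's code, and the toy prices and split point are made up.

```python
import numpy as np

def fit_minmax(train, low=-1.0, high=1.0):
    """Learn per-column min/max from a 2-D array of training rows only."""
    data_min = train.min(axis=0)
    data_range = train.max(axis=0) - data_min
    data_range[data_range == 0] = 1.0  # avoid division by zero for constant columns

    def transform(values):
        return (values - data_min) / data_range * (high - low) + low

    return transform

prices = np.linspace(100.0, 200.0, 11).reshape(-1, 1)  # toy price series
split = 8                                               # hypothetical split point
train, test = prices[:split], prices[split:]

scale = fit_minmax(train)  # statistics come from the training rows only
train_scaled, test_scaled = scale(train), scale(test)
print(train_scaled.min(), train_scaled.max(), test_scaled.max())
```

With scikit-learn's MinMaxScaler the same pattern is `fit` on the training rows followed by `transform` on the test rows. Test values can then fall outside the (-1, 1) range, which is expected when prices move beyond what was seen in training.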

DATA.CSV

Hello
I want to train the model on recent data.
Would you please provide the script used to compile DATA.csv?

Outstanding project + a naive comment

Hello Hungchun,
thank you for sharing this outstanding tool.
I have added two lines to data_preprocessing.py (on my PC, do not panic :-) ), i.e.
np.save('test_predict_index.npy', test_predict_index)
np.save('train_predict_index.npy', train_predict_index)
because "baseline_LSTM.py" was complaining that it could not find these files.
Please let me know if there is a more elegant way to solve this issue.
Thank you for your time,
LM
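
One possible alternative to adding the `np.save` calls by hand is a small load-or-create helper, so that any script needing the index arrays either reads the cached file or rebuilds and saves it. The sketch below is only an illustration. `load_or_create` is a hypothetical helper, and the lambda stands in for however the indices are actually computed in data_preprocessing.py.

```python
import tempfile
from pathlib import Path

import numpy as np

def load_or_create(path, build):
    """Load the array at `path`, or call build(), save the result, and return it."""
    path = Path(path)
    if path.is_file():
        return np.load(path, allow_pickle=True)
    array = np.asarray(build())
    np.save(path, array)
    return array

# Demonstration in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "train_predict_index.npy"
    first = load_or_create(target, lambda: np.arange(5))    # builds and saves
    second = load_or_create(target, lambda: np.arange(99))  # file exists, so it is loaded

print(first.tolist(), second.tolist())
```

The trade-off is that a stale cache is silently reused, so the files need deleting whenever the preprocessing changes.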