
leci37 / tensorflow-stocks-prediction-machine-learning-realtime


Predict stock operation points (buy/sell) from past technical patterns, using machine-learning libraries such as Sklearn.RandomForest, Sklearn.GradientBoosting, XGBoost, Google TensorFlow, and Google TensorFlow LSTM. Real-time Twitter:

Home Page: https://twitter.com/Whale__Hunters

Python 99.95% Batchfile 0.05%
deep-learning machine-learning python stocks tensorflow trade trader-bot

tensorflow-stocks-prediction-machine-learning-realtime's People

Contributors

leci37, makovez, theharold


tensorflow-stocks-prediction-machine-learning-realtime's Issues

[Bug Report] Mismatch between the length of the header and the actual data read into a pandas DataFrame

There is a mismatch between the length of the header or column names and the actual data being read into a pandas DataFrame. This could result in data loss or incorrect parsing of the CSV file.

Output given:
year1month10
UBER ==== https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol=UBER&interval=15min&slice=year1month10&apikey=FXZ0
/Users/Grau/Desktop/trade/stocks-prediction-Machine-learning-RealTime-telegram-master/0_API_alphavantage_get_old_history.py:89: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
df_a_time = pd.read_csv(io.StringIO(raw_response.text), index_col=False, sep=',')
UBER df: (2, 1)
year1month11
UBER ==== https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol=UBER&interval=15min&slice=year1month11&apikey=FXZ0
Traceback (most recent call last):
  File "/Users/Grau/Desktop/trade/stocks-prediction-Machine-learning-RealTime-telegram-master/0_API_alphavantage_get_old_history.py", line 89, in <module>
    df_a_time = pd.read_csv(io.StringIO(raw_response.text), index_col=False, sep=',')
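A (2, 1)-shaped frame like the one above usually means alphavantage returned a JSON notice (e.g. a rate-limit message) instead of CSV rows. A minimal defensive sketch (the column check and the message handling are assumptions, not the repo's actual fix):

import io
import pandas as pd
import requests

# YOUR_KEY is a placeholder for a real alphavantage API key
url = ("https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED"
       "&symbol=UBER&interval=15min&slice=year1month10&apikey=YOUR_KEY")
raw_response = requests.get(url)
df_a_time = pd.read_csv(io.StringIO(raw_response.text), index_col=False, sep=',')
# A valid slice starts with a 'time' column; anything else is a notice, not price data
if 'time' not in df_a_time.columns:
    print("Skipping slice, API returned a notice:", raw_response.text[:120])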

Step 3: Error importing set_session

When running 3_Model_creation_models_for_a_stock.py I'm receiving the following error:

Traceback (most recent call last):
  File "3_Model_creation_models_for_a_stock.py", line 6, in <module>
    from keras.backend import set_session
ImportError: cannot import name 'set_session' from 'keras.backend'

Is set_session needed? Line 14 appears to be commented out.

I'm using the following package versions:

  • keras 2.13.1
  • tensorflow 2.13.0

Note that if I change

from tensorflow import keras
from keras.backend import set_session

to

from tensorflow.python.keras.backend import set_session

then set_session is imported successfully.

To me, this implies that keras should not be listed in the requirements file, since it ships as part of tensorflow.
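If set_session is actually needed, a minimal sketch (assuming TensorFlow 2.x, where it lives under the tf.compat.v1 namespace rather than keras.backend) would be:

import tensorflow as tf

# In TF 2.x, set_session moved out of keras.backend into the compat.v1 namespace
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # illustrative session option
tf.compat.v1.keras.backend.set_session(tf.compat.v1.Session(config=config))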

distplot is deprecated and will be removed in seaborn v0.14.0.

When running get_technical_indicators.py, the warning below appears. It's fine for now, but we'll have to update this at some point.

C:\Users\admin\projects\stocks-prediction-Machine-learning-RealTime-telegram\Utils\Utils_plotter.py:311: UserWarning:

distplot is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either displot (a figure-level function with
similar flexibility) or histplot (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(df[ele_B][(df[main_label] == 0)], bins=50, label="Nothing", color="#FFCC33") # egg-yolk yellow

The same UserWarning is raised again at Utils_plotter.py:313 and :315 for:

sns.distplot(df[ele_B][(df[main_label] == 100)], bins=50, label="Point of Buy", color="#00CC00") # green
sns.distplot(df[ele_B][(df[main_label] == -100)], bins=50, label="Point of Sale ", color="#FF00FF") # magenta
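A sketch of the migration the warning suggests, for the three calls above (assuming the same df, ele_B and main_label variables from Utils_plotter.py; histplot is the axes-level replacement named in the warning):

import seaborn as sns

# histplot replaces distplot; kde=True roughly restores the old density overlay
sns.histplot(df[ele_B][df[main_label] == 0], bins=50, label="Nothing", color="#FFCC33", kde=True)
sns.histplot(df[ele_B][df[main_label] == 100], bins=50, label="Point of Buy", color="#00CC00", kde=True)
sns.histplot(df[ele_B][df[main_label] == -100], bins=50, label="Point of Sale", color="#FF00FF", kde=True)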

Numpy requirement conflict during pip install.

https://github.com/Leci37/TensorFlow-stocks-prediction-Machine-learning-RealTime/blame/386a580efbb2828dd366b20df2a031d45e5f9252/requirements_x.y.z.txt#L6

When installing from requirements_x.y.z.txt I get the following error (numpy cannot satisfy all requirements):

ERROR: Cannot install -r requirements_x.y.z.txt (line 2), -r requirements_x.y.z.txt (line 3), -r requirements_x.y.z.txt (line 6) and numpy~=1.21.6 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested numpy~=1.21.6
    pandas 1.3.5 depends on numpy>=1.17.3; platform_machine != "aarch64" and platform_machine != "arm64" and python_version < "3.10"
    numba 0.56.4 depends on numpy<1.24 and >=1.18
    finplot 1.8.4 depends on numpy>=1.22.3
    The user requested numpy~=1.21.6
    pandas 1.3.5 depends on numpy>=1.17.3; platform_machine != "aarch64" and platform_machine != "arm64" and python_version < "3.10"
    numba 0.56.4 depends on numpy<1.24 and >=1.18
    finplot 1.8.3 depends on numpy>=1.22.3
    The user requested numpy~=1.21.6
    pandas 1.3.5 depends on numpy>=1.17.3; platform_machine != "aarch64" and platform_machine != "arm64" and python_version < "3.10"
    numba 0.56.4 depends on numpy<1.24 and >=1.18
    finplot 1.8.2 depends on numpy>=1.22.3

By removing the finplot version requirement, pip fell back to finplot 1.8.1 and installed successfully.
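In requirements terms, the workaround amounts to pins like the following (a sketch consistent with the resolver output above, not verified against the full dependency tree):

numpy~=1.21.6
finplot==1.8.1   # 1.8.2+ requires numpy>=1.22.3, which conflicts with numpy~=1.21.6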

get_technical_indicators.py issue

When I run python get_technical_indicators.py, it shows the error below.

python get_technical_indicators.py
Traceback (most recent call last):
File "get_technical_indicators.py", line 3, in
import yhoo_history_stock
File "/stocks-prediction-Machine-learning-RealTime-telegram/yhoo_history_stock.py", line 12, in
from technical_indicators.talib_technical_class_object import TechData
File "/stocks-prediction-Machine-learning-RealTime-telegram/technical_indicators/talib_technical_class_object.py", line 1, in
from Utils import Utils_Yfinance, Utils_buy_sell_points
File "/stocks-prediction-Machine-learning-RealTime-telegram/Utils/Utils_buy_sell_points.py", line 11, in
from Utils import Utils_plotter
File "/stocks-prediction-Machine-learning-RealTime-telegram/Utils/Utils_plotter.py", line 2, in
import finplot as fplt
File "/usr/local/lib/python3.8/site-packages/finplot/init.py", line 24, in
import pyqtgraph as pg
File "/usr/local/lib/python3.8/site-packages/pyqtgraph/init.py", line 18, in
from .colors import palette
File "/usr/local/lib/python3.8/site-packages/pyqtgraph/colors/palette.py", line 1, in
from ..Qt import QtGui
File "/usr/local/lib/python3.8/site-packages/pyqtgraph/Qt/init.py", line 54,
Screenshot from 2023-06-07 19-41-29
in
raise Exception("PyQtGraph requires one of PyQt5, PyQt6, PySide2 or PySide6; none of these packages could be imported.")
Exception: PyQtGraph requires one of PyQt5, PyQt6, PySide2 or PySide6; none of these packages could be imported.
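The exception text suggests (this is an inference from the traceback, not a maintainer-confirmed fix) that installing any one of the accepted Qt bindings resolves it, e.g.:

pip install PyQt5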

possible collaboration?

Hi, I really appreciate your work; it is well written and explains each step in detail.

Would it be possible to speak with you via telegram?

My telegram user is @sbongown

Stuck at the first try: Alpaca is not working and alphavantage isn't either, so I switched to yahoo finance, but the technical indicator step is not reading the data.

import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta

import _KEYS_DICT

def get_bars(symbol, start_date, end_date, interval='15m'):
    start_date = datetime.strptime(start_date, "%Y-%m-%d")
    end_date = datetime.strptime(end_date, "%Y-%m-%d")

    all_data = []

    while start_date < end_date:
        temp_end_date = start_date + timedelta(days=7)  # Fetching data in 7-day chunks
        df = yf.download(symbol, start=start_date.strftime("%Y-%m-%d"), end=temp_end_date.strftime("%Y-%m-%d"), interval=interval)
        all_data.append(df)
        start_date = temp_end_date

    final_df = pd.concat(all_data, axis=0)
    return final_df[['Open', 'High', 'Low', 'Close', 'Volume']]  # Only select the columns you need

# Set your parameters

START_DATE = '2023-07-05'
END_DATE = '2023-09-01'
CSV_NAME = "@chill"
stocks_list = _KEYS_DICT.DICT_COMPANYS[CSV_NAME]

for symbol in stocks_list:
    # Fetch the data
    print("Starting data fetching process Stock: ", symbol)
    df = get_bars(symbol, START_DATE, END_DATE)
    print("Data fetching process completed df.shape: ", df.shape)

    # Check that the dataframe is not empty and that there are no NaT values in the index
    if not df.empty and not df.index.isna().any():
        # Save the data as a CSV file
        max_recent_date = df.index.max().strftime("%Y%m%d")
        min_recent_date = df.index.min().strftime("%Y%m%d")
        print("d_price/RAW_alpha/alpha_" + symbol + '_' + '1Min' + "_" + max_recent_date + "__" + min_recent_date + ".csv")
        df.to_csv("d_price/RAW_alpha/alpha_" + symbol + '_' + '1Min' + "_" + max_recent_date + "__" + min_recent_date + ".csv", sep="\t", index=True)
        print("\tSTART: ", str(df.index.min()), "  END: ", str(df.index.max()), " shape: ", df.shape, "\n")
    else:
        print("error: no data for stock: ", symbol)

Files are attached; what am I doing wrong?
(crypto_ml) C:\Users\dj_m0\Documents\PythonScripts\stocks-prediction-Machine-learning-RealTime-TensorFlow>python 1_Get_technical_indicators.py
[h5py, matplotlib and yfinance DEBUG output omitted]
py_ti\helper_loops.py (lines 6, 18, 33, 100, 138): NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details. (raised for wilders_loop, kama_loop, psar_loop, supertrend_loop and fib_loop)
yfinance - [DEBUG]{MainThread} - history() - AAPL: yfinance returning OHLC: 2023-06-13 04:00:00-04:00 -> 2023-09-07 19:45:00-04:00
MONTH_3_ADD_LO The action history is searched for Files:
Read historical data from: d_price/RAW_alpha/alpha_AAPL_1Min_20230905__20230712.csv
Traceback (most recent call last):
  File "1_Get_technical_indicators.py", line 46, in <module>
    df_download = yhoo_history_stock.get_favs_SCALA_csv_stocks_history_Download_list(list_stocks, CSV_NAME, opion, GENERATED_JSON_RELATIONS = GENERATED_JSON_RELATIONS)
  File "C:\Users\dj_m0\Documents\PythonScripts\stocks-prediction-Machine-learning-RealTime-TensorFlow\yhoo_history_stock.py", line 221, in get_favs_SCALA_csv_stocks_history_Download_list
    df_all_generate_history, df_l = get_favs_SCALA_csv_stocks_history_Download_One(df_all_generate_history, l, opion)
  File "C:\Users\dj_m0\Documents\PythonScripts\stocks-prediction-Machine-learning-RealTime-TensorFlow\yhoo_history_stock.py", line 246, in get_favs_SCALA_csv_stocks_history_Download_One
    df_l, df_RAW = get_stock_history_Tech_download(l, opion, get_technical_data=True,
  File "C:\Users\dj_m0\Documents\PythonScripts\stocks-prediction-Machine-learning-RealTime-TensorFlow\yhoo_history_stock.py", line 85, in get_stock_history_Tech_download
    df_his = __select_dowload_time_config(interval, opion, prepost, stockId)
  File "C:\Users\dj_m0\Documents\PythonScripts\stocks-prediction-Machine-learning-RealTime-TensorFlow\yhoo_history_stock.py", line 183, in __select_dowload_time_config
    df_path_raw = pd.read_csv("d_price/RAW_alpha/" + patH_raw, index_col='Date', sep='\t')
  File "C:\Users\dj_m0\anaconda3\envs\crypto_ml\lib\site-packages\pandas\io\parsers\readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\dj_m0\anaconda3\envs\crypto_ml\lib\site-packages\pandas\io\parsers\readers.py", line 583, in _read
    return parser.read(nrows)
  File "C:\Users\dj_m0\anaconda3\envs\crypto_ml\lib\site-packages\pandas\io\parsers\readers.py", line 1704, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "C:\Users\dj_m0\anaconda3\envs\crypto_ml\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 335, in read
    index, column_names = self._make_index(date_data, alldata, names)
  File "C:\Users\dj_m0\anaconda3\envs\crypto_ml\lib\site-packages\pandas\io\parsers\base_parser.py", line 363, in _make_index
    simple_index = self._get_simple_index(alldata, columns)
  File "C:\Users\dj_m0\anaconda3\envs\crypto_ml\lib\site-packages\pandas\io\parsers\base_parser.py", line 395, in _get_simple_index
    i = ix(idx)
  File "C:\Users\dj_m0\anaconda3\envs\crypto_ml\lib\site-packages\pandas\io\parsers\base_parser.py", line 390, in ix
    raise ValueError(f"Index {col} invalid")
ValueError: Index Date invalid

alpha_AAPL_1Min_20230905__20230712.csv
alpha_TSLA_1Min_20230905__20230712.csv
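A plausible cause (inferred from the traceback, not confirmed): for intraday bars, yfinance names the index 'Datetime', while __select_dowload_time_config reads the CSV back with index_col='Date'. Renaming the index before saving may fix the read:

df.index.name = 'Date'  # yfinance uses 'Datetime' for intraday data; the loader expects 'Date'
df.to_csv(csv_path, sep="\t", index=True)  # csv_path: the d_price/RAW_alpha/... path built above (hypothetical name)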

0_API_alphavantage_get_old_history.py not working since the alphavantage API URL changed

0_API_alphavantage_get_old_history.py is not working since the alphavantage API URL changed, and none of these files work.
If possible, please rebuild this excellent project and upload the new version. The install also failed on mitmproxy.

When I print the columns, the data looks like:
[' "1. Information": "Intraday (15min) open', ' high', ' low', ' close prices and volume"', 'Unnamed: 4']

but the code expects something different:

Traceback (most recent call last):
  File "0_API_alphavantage_get_old_history.py", line 107, in <module>
    df_S_all = df_S_all.sort_values(['Date'], ascending=True)
  ...
    raise KeyError(key)
KeyError: 'Date'
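A small guard before the sort would make the failure explicit (a sketch assuming the notice payload shown above, not the repo's actual handling):

if 'Date' not in df_S_all.columns:
    # alphavantage returned an information/error notice instead of price rows
    raise RuntimeError("Unexpected alphavantage payload, columns: " + str(list(df_S_all.columns)))
df_S_all = df_S_all.sort_values(['Date'], ascending=True)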

Respond to Multidimensional arrays feeding in tensorflow

Respond to https://stackoverflow.com/questions/40964817/multidimensional-arrays-feeding-in-tensorflow/75948534#75948534

To feed a network with multidimensional arrays in tensorflow:
Here is a good example of how to read a .csv (a 2D array: one time row per GT reference),
balance and SMOTE the ground-truth (GT) data, and have TF process and train it as either:

  • Multidimensional, a 3D array (with time windows): N previous time rows per GT reference. Explanation HERE
  • Monodimensional, a 2D array: one time row per GT reference. Explanation NOT here

A very good, simple example: https://github.com/Leci37/stocks-prediction-Machine-learning-RealTime-telegram/blob/master/Tutorial/RUN_buy_sell_Tutorial_3W_5min_RT.py

3D data training from .csv information
For TF to fit() multidimensional 3D arrays, code like the following is required.
Full code here: https://github.com/Leci37/stocks-prediction-Machine-learning-RealTime-telegram/blob/master/Model_train_TF_multi_onBalance.py
train_features is a 3D array.

# Imbalanced data: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data
def train_TF_Multi_dimension_onBalance(multi_data: Data_multidimension, model_h5_name, model_type : _KEYS_DICT.MODEL_TF_DENSE_TYPE_MULTI_DIMENSI):

    # 1.0  LOAD the data in TF format (split, 3D, SMOTE and balance)
    array_aux_np, train_labels, val_labels, test_labels, train_features, val_features, test_features, bool_train_labels = multi_data.get_all_data()
    # TRAIN
    neg, pos = np.bincount(array_aux_np)  # (df[Y_TARGET])
    initial_bias = np.log([pos / neg])

    # 2.0  EARLY STOPPING: create a CustomEarlyStopping to avoid overfitting
    resampled_steps_per_epoch = np.ceil(2.0 * neg / BATCH_SIZE)
    early_stopping = Utils_model_predict.CustomEarlyStopping(patience=8)

    # 3.0 TRAIN: get the model from the list of model objects and train it
    model = multi_data.get_dicts_models_multi_dimension(model_type)
    model_history = model.fit(
          x=train_features, y=train_labels,
          epochs=EPOCHS,
          steps_per_epoch=resampled_steps_per_epoch,
          callbacks=[early_stopping],  # callbacks=[early_stopping, early_stopping_board],
          validation_data=(val_features, val_labels), verbose=0)

    # 3.1 Save the model to a .h5 file for reuse
    model.save(MODEL_FOLDER_TF_MULTI + model_h5_name)
    print(" Save model Type MULTI TF: " + model_type.value + "  Path:  ", MODEL_FOLDER_TF_MULTI + model_h5_name)

    # 4.0 Evaluate the model with test_features; this data was split out and the .h5 model never saw it
    predit_test = model.predict(test_features).reshape(-1,)
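For intuition, a standalone toy sketch of the 3D layout fit() receives (shapes follow the (5077, 10, 12) example in Data_multidimension.py below; the data here is random and purely illustrative):

import numpy as np

BACHT_SIZE_LOOKBACK = 10  # time rows per GT reference (name follows the repo's spelling)
n_features = 12

# (samples, timesteps, features): 5077 windows of 10 time steps x 12 indicators
train_features = np.random.rand(5077, BACHT_SIZE_LOOKBACK, n_features).astype('float32')
train_labels = np.random.randint(0, 2, size=(5077, 1)).astype('float32')
print(train_features.shape)  # (5077, 10, 12)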

Getting the data into the correct 3D array format requires the following steps (the example uses imbalanced GT data and applies balancing and SMOTE corrections):
Full code here: https://github.com/Leci37/stocks-prediction-Machine-learning-RealTime-telegram/blob/master/Data_multidimension.py

  • 1.0 Get a 2D array from the .csv with one GT column and create the MULTIDIMENSION

  • 2.0 SCALER: scale the data first, and save a .scal file (it will be used to scale inputs for future predictions)

  • 2.1 Put the real ground-truth Y_TARGET in a copy of the scaled dataset

  • 3.0 SPLIT: split into three sets: train, val and test

  • 4.0 SMOTE train_df to balance the data; since there are few positive inputs, generate "neighbors" of positive inputs, only in df_train

  • 5 PREPARE the data to be fed into TF with the correct (3D array) dimensions

  • 5.1 Insert Y_TARGET labels into the 2D array required by TF

  • 6 DISPLAY: show the df format before handing it to TF

def load_split_data_multidimension(self):
    df = Utils_model_predict.load_and_clean_DF_Train_from_csv(self.path_CSV, self.op_buy_sell, self.columns_selection)
    # SMOTE and Tomek links
    # The SMOTE oversampling approach can generate noisy samples since it creates synthetic data. To solve this, after SMOTE we can use undersampling techniques to clean up. We'll use the Tomek links undersampling technique in this example.
    # Utils_plotter.plot_2d_space(df.drop(columns=[Y_TARGET]).iloc[:,4:5] , df[Y_TARGET], path = "SMOTE_antes.png")
    array_aux_np = df[Y_TARGET]  # TODO: before or after balancing?
    self.array_aux_np = array_aux_np
    print("1.0 ADD MULTIDIMENSION  Get 2D array, with BACHT_SIZE_LOOKBACK 'backward glances'.")
    # Values go from (10000 rows, 10 columns) to (10000 rows, (10-1[groundTruth] * 10 dimensions) columns); for the moment it does not become a 3D array, it remains 2D.
    # df.shape: (1000, 10) to (1000, 90)
    arr_mul_labels, arr_mul_features = Utils_model_predict.df_to_df_multidimension_array_2D(df.reset_index(drop=True), BACHT_SIZE_LOOKBACK = self.BACHT_SIZE_LOOKBACK)
    shape_imput_3d = (-1, self.BACHT_SIZE_LOOKBACK, len(df.columns)-1)  # (-1, 10, 12)
    print("1.1 validate the structure of the data (this can be improved)")
    arr_vali = arr_mul_features.reshape(shape_imput_3d)  # 5077, 10, 12
    for i in range(1, arr_vali.shape[0], self.BACHT_SIZE_LOOKBACK * 3):
        list_fails_dates = [x for x in arr_vali[i][:, 0] if not (2018 <= datetime.fromtimestamp(x).year <= 2024)]
        if list_fails_dates:
            Logger.logr.error("The dates of the new 2D array do not appear in the first column. ")
            raise ValueError("The dates of the new 2D array do not appear in the first column. ")
    print("2.0 SCALER  scale the data first, save a .scal file (it will be used to scale inputs for future predictions)")
    # Do we have to scale now, or can we wait until after the split?
    # You can scale between the values _KEYS_DICT.MIN_SCALER and _KEYS_DICT.MAX_SCALER
    # "... that you learn for your scaling so that doing scaling before or after may give you the same results (but this depends on the actual scaling function)."  https://datascience.stackexchange.com/questions/71515/should-i-scale-data-before-or-after-balancing-dataset
    # TODO: verify the correct "scale, split, SMOTE" order. Sure: SMOTE is only applied to train_df
    arr_mul_features = Utils_model_predict.scaler_min_max_array(arr_mul_features, path_to_save= _KEYS_DICT.PATH_SCALERS_FOLDER+self.name_models_stock+".scal")
    arr_mul_labels = Utils_model_predict.scaler_min_max_array(arr_mul_labels.reshape(-1,1))
    print("2.1 Put the real ground-truth Y_TARGET in a copy of the scaled dataset")
    df_with_target = pd.DataFrame(arr_mul_features)
    df_with_target[Y_TARGET] = arr_mul_labels.reshape(-1,)
    print("3.0 SPLIT  we should split into three sets: train, val and test")
    # "you divide your data first and then apply synthetic sampling SMOTE on the training data only" https://datascience.stackexchange.com/questions/15630/train-test-split-after-performing-smote
    # CAUTION: SMOTE generates twice as many rows
    train_df, test_df = train_test_split(df_with_target, test_size=0.18, shuffle=self.will_shuffle)  # Shuffle in a time series? hmmm
    train_df, val_df = train_test_split(train_df, test_size=0.35, shuffle=self.will_shuffle)  # Shuffle in a time series? hmmm
    # Be careful not to touch test_df or val_df
    # Apply SMOTE only to train_df, but first remove Y_TARGET from train_df
    print("3.1 Create 2D arrays from the dfs. Remove Y_TARGET from train_df, because that's what we want to predict and keeping it would be cheating")
    train_df_x = np.asarray(train_df.drop(columns=[Y_TARGET]))
    # In train_df_y we drop everything except Y_TARGET
    train_df_y = np.asarray(train_df[Y_TARGET])
    print("4.0 SMOTE train_df to balance the data; since there are few positive inputs, generate 'neighbors' of positive inputs, only in df_train.")
    # Now we can SMOTE only train_df. SMOTE works with 2D data; with 3D it is not possible.
    X_smt, y_smt = Utils_model_predict.prepare_to_split_SMOTETomek_01(train_df_x, train_df_y)
    print("4.1 Put the real ground-truth Y_TARGET in a copy of the scaled dataset")
    train_cleaned_df_target = pd.DataFrame(X_smt)
    train_cleaned_df_target[Y_TARGET] = y_smt.reshape(-1,)
    # SMOTE leaves the positives very close together
    train_cleaned_df_target = shuffle(train_cleaned_df_target)
    print("5 PREPARE the data to be fed into TF with the correct dimensions")
    print("5.1 pass Y_TARGET labels to the 2D array required by TF")
    train_labels = np.asarray(train_cleaned_df_target[Y_TARGET]).astype('float32').reshape((-1, 1))  # already 2D
    bool_train_labels = (train_labels != 0).reshape((-1))
    val_labels = np.asarray(val_df[Y_TARGET]).astype('float32').reshape((-1, 1))  # already 2D
    test_labels = np.asarray(test_df[Y_TARGET]).astype('float32').reshape((-1, 1))  # already 2D
    print("5.2 all array windows that were in 2D format (to get through the SCALER and SMOTE methods)")
    # must be reshaped to 3D for TF, using the shape held in shape_imput_3d
    train_features = np.array(train_cleaned_df_target.drop(columns=[Y_TARGET])).reshape(shape_imput_3d)
    test_features = np.array(test_df.drop(columns=[Y_TARGET])).reshape(shape_imput_3d)
    val_features = np.array(val_df.drop(columns=[Y_TARGET])).reshape(shape_imput_3d)
    print("6 DISPLAY show the df format before handing it to TF")
    Utils_model_predict.log_shapes_trains_val_data(test_features, test_labels, train_features, train_labels, val_features, val_labels)

    self.imput_shape = (train_features.shape[1], train_features.shape[2])

    self.train_labels = train_labels
    self.val_labels = val_labels
    self.test_labels = test_labels
    self.train_features = train_features
    self.val_features = val_features
    self.test_features = test_features
    self.bool_train_labels = bool_train_labels

def get_all_data(self):
    return self.array_aux_np, self.train_labels, self.val_labels, self.test_labels, self.train_features, self.val_features, self.test_features, self.bool_train_labels
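The key trick above is that the scaler and SMOTE only accept 2D input while TF wants 3D windows, so the code flattens each window and later restores the window axis. A minimal round-trip sketch of that reshape (standalone numpy, illustrative names):

import numpy as np

lookback, n_feat = 10, 12
arr_2d = np.arange(5 * lookback * n_feat, dtype='float32').reshape(5, lookback * n_feat)  # 2D for SCALER/SMOTE
arr_3d = arr_2d.reshape(-1, lookback, n_feat)  # 3D windows for TF
assert np.array_equal(arr_3d.reshape(5, -1), arr_2d)  # the reshape is lossless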

No code?

Is the code available in some different branch? I don't see any!

Timescales

It will take me a while to read and digest your excellent readme notes and instructions, but I think you mention 15-min candles. Would this code also work on higher timescales? For example, could I set it up to test using end-of-day prices, or would that make no sense because of the other real-time inputs?

Package pyine is yanked. Is it necessary?

I receive the following when trying to install pyine 1.1.2 from requirements_x.y.z.txt:

ERROR: Ignored the following yanked versions: 1.0.1, 1.1.0, 1.1.1, 1.1.2
ERROR: Could not find a version that satisfies the requirement pyine~=1.1.2 (from versions: none)
ERROR: No matching distribution found for pyine~=1.1.2

I haven't installed and tested the project yet. I see little use of pyine, but if it is required, a yanked version can be forced with an exact pin in the requirements file:
pyine==1.1.1

Requires Statement.py and realtime_model_POOL_driver.py

Statement.py
realtime_model_POOL_driver.py
I'm very excited about this stock prediction project! The features it brings are truly special and appealing. The project has not only run experiments with 36 different models, but also provides multiple options for combining features. This allows for easy scalability with technologies like TensorFlow, XGBoost, Sklearn, LSTM, GRU, Dense, LINEAR, and many other models. I can't resist my curiosity to explore more details about this project and the potential it holds. It may be a breakthrough in stock prediction. While working through the project, I found I needed the two files mentioned above. Can you provide me with full access to the project?

Proposal and some questions

Hello. I really like your project; I have reviewed all the readme and tutorials in detail. I have a few questions and maybe some suggestions for improvement:

  • What is the difference between raw, raw alpha, and the other folders in d_price?
  • Which of these folders should I put my OHLCV data in, and what file names should I give the files (or how can I change this in the program)?
  • As far as I understand, you get news data from twitter (is that correct?). Suggestion: add basic functionality for parsing data from listed RSS feeds.
  • How do I calculate technical indicators with your script using my own OHLCV data? (some cryptocurrencies are missing on alphavantage and yahoo)
  • As far as I understand, you use yahoo data from the last 6-7 days for forecasting. If I use another data source, what frequency should this data have (1 minute, ticks, 10 minutes)? How often should this data be updated relative to the present (offset)?
  • Forecasting doesn't work at all for overnight and evening data, as those periods were missing from the datasets. Do you have any thoughts or intentions to fix this?

Checklist

  • I have checked that there is no similar issue in the repo (required)

Thanks, I would be happy to get feedback
