rl-mldm / alphagen

Generating sets of formulaic alpha (predictive) stock factors via reinforcement learning.

Python 99.37% Cython 0.63%
quantitative-trading reinforcement-learning symbolic-regression

alphagen's People

Contributors

chlorie, xuehongyanl

alphagen's Issues

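Test IC and test rank IC look too good when the number of factors is small

[image]
Hello, why do the few factors produced when the number of PPO steps is small perform so well on the test set? Also, most test IC and test rank IC values seem to start out as negative numbers with large absolute values, and only move toward large positive values as the number of steps grows. I hope you can help clear up this confusion. Thanks!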

ModuleNotFoundError: No module named 'qlib.utils'

Traceback (most recent call last):
File "/mnt/d/develop/workspace/myRepository/alphagen/data_collection/fetch_baostock_data.py", line 14, in
from qlib_dump_bin import DumpDataAll
File "/mnt/d/develop/workspace/myRepository/alphagen/data_collection/qlib_dump_bin.py", line 17, in
from qlib.utils import fname_to_code, code_to_fname
ModuleNotFoundError: No module named 'qlib.utils'

OS: WSL (Ubuntu 20), Python 3.8
Is this a qlib version issue?

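The price-adjustment code appears to be wrong

As shown in the image below, df.index at the location marked 1 is not a date index but the default integer index (0, 1, 2, ...). Shouldn't it be the date? If so, df.set_index("date") should be called first.

[image]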

ValueError: instrument not exists

File "~/qlib/data/storage/file_storage.py", line 73, in check
raise ValueError(f"{self.storage_name} not exists: {self.uri}")
ValueError: instrument not exists: ~/.qlib/qlib_data/cn_data_rolling/instruments/csi300.txt

There is only all.txt in ~/.qlib/qlib_data/cn_data_rolling/instruments/
The version of qlib is v0.9.3

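AlphaPool's device parameter defaults to cpu; can it be changed to gpu?

In the main function of train_maskable_ppo.py there is the following call. It does not set AlphaPool's device parameter, so the default, device: torch.device = torch.device('cpu'), is used. My question: if the machine has a GPU, should device be set here to use the GPU?

    pool = AlphaPool(
        capacity=pool_capacity,
        calculator=calculator_train,
        ic_lower_bound=None,
        l1_alpha=5e-3
    )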

1

1

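Memory usage too high and running time too long

I would like to know how long a single run takes on CPU and on GPU. To reproduce the KDD paper with 10 random seeds and factor pool sizes of [10, 20, 50, 100], running in multiple processes needs far more memory than I have. How can this be solved? Thank you very much!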

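[Feat] Single-factor mining

If I do not want to mine a factor pool and only want to find the single best factor, what would I need to change?

Perhaps, once the factor pool is combined as a weighted sum, it is itself a single factor.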

test_calculator=calculator_train?

In train_maskable_ppo.py:

    checkpoint_callback = CustomCallback(
        save_freq=10000,
        show_freq=10000,
        save_path='/path/for/checkpoints',
        valid_calculator=calculator_valid,
        test_calculator=calculator_train,
        name_prefix=name_prefix,
        timestamp=timestamp,
        verbose=1,
    )

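Is the validation data unused?

In train_maskable_ppo.py the data is split into training, validation, and test sets. The training set is used in AlphaPool, while the validation and test sets are passed as arguments to CustomCallback.

Looking at the CustomCallback code, though, it only uses the test set's test_calculator to compute IC and does nothing with valid_calculator. So it seems to me that the validation data is never used by the system. Is that right? Thanks.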

MaskablePPO model.learn Issue

[images: debug1, debug2]

When I run train_maskable_ppo.py, I encounter an error in the following call:

    model.learn(
        total_timesteps=steps,
        callback=checkpoint_callback,
        tb_log_name=f'{name_prefix}_{timestamp}',
    )

The error is shown in the attached images.

df is an empty DataFrame

Hello!
When I run train_maskable_ppo.py, the data for data_train, data_valid, and data_test loads fine:

    data_train = StockData(instrument=instruments,
                           start_time='2009-01-01',
                           end_time='2018-12-31')
    data_valid = StockData(instrument=instruments,
                           start_time='2019-01-01',
                           end_time='2019-12-31')
    data_test = StockData(instrument=instruments,
                          start_time='2020-01-01',
                          end_time='2021-12-31')

But when I run trade_decision.py, the program calls

    StockData(instrument=instruments,
              start_time='2022-11-18',
              end_time='2023-11-17'
              )

and no data is returned.
What is the reason? Is fetching the most recent data not supported yet?

ValueError: too many values to unpack

[1262:MainThread](2023-08-17 09:03:31,182) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: too many values to unpack (expected 2)].
File "train_maskable_ppo.py", line 176, in
main(seed=seed, instruments=instruments, pool_capacity=capacity, steps=steps[capacity])
File "train_maskable_ppo.py", line 159, in main
model.learn(
File "/home/users/zhangzh/anaconda3/envs/pythonProject/lib/python3.8/site-packages/sb3_contrib/ppo_mask/ppo_mask.py", line 514, in learn
total_timesteps, callback = self._setup_learn(
File "/home/users/zhangzh/anaconda3/envs/pythonProject/lib/python3.8/site-packages/sb3_contrib/ppo_mask/ppo_mask.py", line 239, in _setup_learn
self._last_obs = self.env.reset()
File "/home/users/zhangzh/anaconda3/envs/pythonProject/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 76, in reset
obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx])
ValueError: too many values to unpack (expected 2)
I hit this error when running train_maskable_ppo.py. What causes it?
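One possible cause, offered as an assumption rather than a confirmed diagnosis: the traceback shows DummyVecEnv unpacking the result of reset() into two values (an observation and an info dict), so an environment whose reset() returns only an observation would fail in exactly this way. The sketch below reproduces the failure with two hypothetical stand-in functions, legacy_reset and two_value_reset; neither is part of alphagen or stable-baselines3.

```python
# Hypothetical stand-ins; not the alphagen environment itself.
def legacy_reset():
    # Older-style reset: returns only the observation (here, 20 tokens).
    return [0] * 20


def two_value_reset(seed=None):
    # Reset in the style the traceback expects: (observation, info).
    return [0] * 20, {}


def try_unpack(reset_fn):
    """Mimic `obs, info = env.reset(...)` and report what happens."""
    try:
        obs, info = reset_fn()
    except ValueError as exc:
        return f"ValueError: {exc}"
    return "ok"


if __name__ == "__main__":
    print(try_unpack(legacy_reset))     # unpacking 20 items into 2 names fails
    print(try_unpack(two_value_reset))  # a 2-tuple unpacks cleanly
```

If this is indeed the cause, the usual remedies are to align the installed gym / stable-baselines3 versions with the ones the repository was written against, or to make the environment's reset() return an (observation, info) pair.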

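stock data loading problems

Hello,

We ran into a problem while trying to run your code. When running train_maskable_ppo.py, the line data_train = StockData(........) raises an error: cannot reshape an array of size 7915950 into shape (6,630).

We are very interested in the theory and applications in your paper and hope to hear back from you, including answers to some questions about the paper.

Best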

gp.py

Where is the initial cache in gp.py loaded from?

[image: gp]

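How do I run the examples on a machine with integrated graphics and no GPU?

My machine has integrated graphics. Running train_maskable_ppo.py gives the following error:
[image]
It looks like a GPU is required.

My question: what needs to be changed so that this example runs on a machine with integrated graphics? Would switching to a particular version of torch be enough?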
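A common pattern for machines without a CUDA-capable GPU is to fall back to the CPU when CUDA is unavailable. The sketch below assumes PyTorch is the only component that needs a device; pick_device and detect_cuda are hypothetical helpers, not part of alphagen, and whether every script in the repository accepts a device argument is not confirmed here.

```python
def pick_device(cuda_available: bool, index: int = 0) -> str:
    """Return a torch device string, preferring CUDA when it is usable."""
    return f"cuda:{index}" if cuda_available else "cpu"


def detect_cuda() -> bool:
    """Check CUDA availability; treat a missing torch install as 'no CUDA'."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    name = pick_device(detect_cuda())
    print(name)  # pass this string to torch.device(name) when torch is installed
```

The resulting string can be wrapped in torch.device and handed to constructors that take a device, such as StockData(..., device=device) as used in other issues on this page.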

values.reshape error

data = StockData(instrument=instruments,
start_time='2009-01-01',
end_time='2018-12-31')

stock_data.py", line 72, in _get_data
values = values.reshape((-1, len(features), values.shape[-1])) # type: ignore

ValueError: cannot reshape array of size 8015930 into shape (6,626)
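The numbers in the error message can be checked by hand. reshape((-1, 6, 626)) only works if the row count (total size divided by the number of stocks) is a multiple of the number of features. The sketch below does that arithmetic; diagnose_reshape is a hypothetical helper, and the reading that the loaded table contains an incomplete group of feature rows is an assumption, not a confirmed diagnosis.

```python
def diagnose_reshape(total_size: int, n_features: int, n_stocks: int):
    """Return (rows, leftover_rows) for reshape((-1, n_features, n_stocks))."""
    rows, partial = divmod(total_size, n_stocks)
    if partial:
        raise ValueError("total size is not a multiple of the stock count")
    # leftover_rows must be 0 for the reshape to succeed
    return rows, rows % n_features


if __name__ == "__main__":
    print(diagnose_reshape(8015930, 6, 626))  # → (12805, 1)
    print(diagnose_reshape(7915950, 6, 630))  # → (12565, 1)
```

For both sizes reported on this page (8015930 with 626 stocks, 7915950 with 630 stocks) the row count is exactly one more than a multiple of 6, which is why the reshape is rejected.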

negative days occur

[image]
[image]

While running calculator_test, the project tries to create a tensor with an invalid shape, because days equals -93:

An exception has been raised[RuntimeError: Trying to create tensor with negative dimension -93: [-93, 300]].

After copying csi100.txt, csi300.txt, and csi500.txt from cn_data to cn_data_rolling, I ran train_maskable.py with (seed=0, code='csi300', pool=10, step=200_000), and it raised the error "RuntimeError: Trying to create tensor with negative dimension -93: [-93, 300]". In the debugger, the expression "days = period.stop - period.start - 1 + data.n_days" evaluates to -93.

Please let me know how to fix it.

gp.py fails on cpu

I set device to cpu, and running gp.py produces the following error. How can I fix it? Thanks!
[17036:MainThread](2023-11-15 22:18:57,732) INFO - qlib.Initialization - [init.py:74] - qlib successfully initialized based on client settings.
[17036:MainThread](2023-11-15 22:18:57,733) INFO - qlib.Initialization - [init.py:76] - data_path={'__DEFAULT_FREQ': WindowsPath('G:/qlibtutor/qlib_data/rq_cn_data_h5')}
[30284:MainThread](2023-11-15 22:18:57,759) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.].

File "", line 1, in
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "E:\anaconda3\envs\qlib230908\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "E:\anaconda3\envs\qlib230908\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "E:\anaconda3\envs\qlib230908\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "e:\myquant\alphagen-master\gp_qtb.py", line 29, in
data_train = StockData(instruments, '2009-01-01', '2018-12-31', device=device)
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 37, in init
self.data, self._dates, self._stock_ids = self._get_data()
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 67, in _get_data
df = self._load_exprs(features)
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 62, in _load_exprs
return (QlibDataLoader(config=exprs) # type: ignore
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\dataset\loader.py", line 143, in load
df = self.load_group_df(instruments, exprs, names, start_time, end_time)
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\dataset\loader.py", line 217, in load_group_df
df = D.features(instruments, exprs, start_time, end_time, freq=freq, inst_processors=inst_processors)
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\data.py", line 1191, in features
return DatasetD.dataset(instruments, fields, start_time, end_time, freq, inst_processors=inst_processors)
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\data.py", line 924, in dataset
data = self.dataset_processor(
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\data.py", line 578, in dataset_processor
ParallelExt(n_jobs=workers, backend=C.joblib_backend, maxtasksperchild=C.maxtasksperchild)(task_l),
File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib\parallel.py", line 1854, in call
n_jobs = self._initialize_backend()
File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib\parallel.py", line 1332, in _initialize_backend
n_jobs = self._backend.configure(n_jobs=self.n_jobs, parallel=self,
File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib_parallel_backends.py", line 526, in configure
self._pool = MemmappingPool(n_jobs, **memmappingpool_args)
[26132:MainThread](2023-11-15 22:18:57,769) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.].

File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib\pool.py", line 323, in init
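The message in the traceback describes the standard fix itself: on platforms that start child processes with spawn (such as Windows), module-level code that ends up launching worker processes has to sit under an if __name__ == '__main__': guard. A minimal sketch of that idiom follows; square and main are hypothetical placeholders for the script's real work (for example, the module-level StockData construction in gp_qtb.py that appears in the traceback).

```python
import multiprocessing as mp


def square(x):
    # Work executed in a child process; must be importable at module level.
    return x * x


def main():
    # Everything that starts worker processes lives inside main().
    with mp.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Child processes re-import this module; the guard stops them from
    # re-running main() and spawning workers recursively.
    mp.freeze_support()  # only matters for frozen Windows executables
    print(main())
```

Applied to the script in question, this would mean moving the data loading and training calls into a function and invoking it only from within the guard.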

wrong requirements

In the requirements, you pin qlib==0.0.2.dev20, which does not exist among the Microsoft pyqlib releases. I am on Windows 10; should I switch to Linux?

use np.linalg.lstsq to _optimize?

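    def _optimize(self, alpha: float, lr: float, n_iter: int) -> np.ndarray:
        # Solve mutual_ics @ w = single_ics directly (alpha, lr, n_iter are unused).
        a = self.mutual_ics[:self.size, :self.size]
        b = self.single_ics[:self.size]
        try:
            return np.linalg.lstsq(a, b, rcond=None)[0]
        except np.linalg.LinAlgError:
            return self.weights[:self.size]  # fall back to the current weights

This is very fast.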
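To make the idea concrete: the least-squares call treats the mutual-IC matrix as A and the single-IC vector as b, and solves A w = b for the weights in one step. The sketch below uses made-up numbers for a two-factor pool (they are illustrative, not results from alphagen), and it applies no L1 penalty such as the l1_alpha passed to AlphaPool, so the weights may differ from an iteratively optimized, regularized solution.

```python
import numpy as np

# Made-up values for illustration only.
mutual_ics = np.array([[1.0, 0.5],
                       [0.5, 1.0]])  # assumed meaning: pairwise ICs between factors
single_ics = np.array([0.3, 0.2])    # assumed meaning: each factor's IC with the target

# Least-squares solution of mutual_ics @ weights = single_ics.
weights, residuals, rank, _ = np.linalg.lstsq(mutual_ics, single_ics, rcond=None)

if __name__ == "__main__":
    print(weights)
    print(np.allclose(mutual_ics @ weights, single_ics))
```

When the matrix is well conditioned, as here, the result matches the exact solution of the linear system; for nearly collinear factors lstsq returns the minimum-norm solution instead.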

Unable to reproduce the code in steps due to qlib data issues

➜ python train.py 
[160896:MainThread](2023-06-05 11:04:42,018) INFO - qlib.Initialization - [config.py:416] - default_conf: client.
[160896:MainThread](2023-06-05 11:04:42,509) INFO - qlib.Initialization - [__init__.py:74] - qlib successfully initialized based on client settings.
[160896:MainThread](2023-06-05 11:04:42,509) INFO - qlib.Initialization - [__init__.py:76] - data_path={'__DEFAULT_FREQ': PosixPath('/home/ray/.qlib/qlib_data/cn_data')}
[160896:MainThread](2023-06-05 11:04:49,568) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: cannot reshape array of size 7915950 into shape (6,630)].
  File "train.py", line 175, in <module>
    main(seed=seed, instruments=instruments, pool_capacity=capacity, steps=steps[capacity])
  File "train.py", line 105, in main
    data = StockData(instrument=instruments,
  File "/home/ray/workspace/alphagen/alphagen_qlib/stock_data.py", line 37, in __init__
    self.data, self._dates, self._stock_ids = self._get_data()
  File "/home/ray/workspace/alphagen/alphagen_qlib/stock_data.py", line 72, in _get_data
    values = values.reshape((-1, len(features), values.shape[-1]))  # type: ignore
ValueError: cannot reshape array of size 7915950 into shape (6,630)

Bug: wrong data dimensions for a single instrument

    from alphagen.data.expression import *
    from alphagen_generic.features import *
    import torch

    if __name__ == "__main__":
        # device = torch.device('cuda:0')
        device = torch.device('cpu')

        data = StockData(instrument=["SH600038"], start_time="2021-01-01", end_time="2023-06-30",
                         max_backtrack_days=0, max_future_days=0, device=device)
        print(data.make_dataframe(data._get_data()[0]))

Running the code above produces the error below. With instrument="csi300" the result is correct, so the data dimensions appear to be wrong when a single instrument is specified:
[22432:MainThread](2023-12-29 16:34:42,113) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: number of stocks in the provided tensor (6) doesn't match that of the current StockData (1)].
File "e:/myquant/alphagen-master/test2.py", line 16, in
print(data.make_dataframe(data._get_data()[0]))
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 121, in make_dataframe
raise ValueError(f"number of stocks in the provided tensor ({n_stocks}) doesn't "
ValueError: number of stocks in the provided tensor (6) doesn't match that of the current StockData (1)

How should I resolve conflicts when installing dependencies?

Hello,
Installing the dependencies from requirements.txt produces a conflict. How can I resolve it?
The conflict is as follows:
The conflict is caused by:
    The user requested numpy==1.20.1
    gym 0.26.2 depends on numpy>=1.18.0
    matplotlib 3.3.4 depends on numpy>=1.15
    pandas 1.2.4 depends on numpy>=1.16.5
    stable-baselines3 2.0.0 depends on numpy>=1.20
    shimmy 1.1.0 depends on numpy>=1.18.0
    gymnasium 0.28.1 depends on numpy>=1.21.0
I look forward to your answer.
Best wishes
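Reading the list above, the pin numpy==1.20.1 satisfies every lower bound except the one from gymnasium 0.28.1 (numpy>=1.21.0). The sketch below checks this with plain tuple comparison; it is a simplified illustration that only handles >= bounds on numeric dotted versions, not a replacement for pip's resolver, and parse and violated are hypothetical helper names.

```python
def parse(version: str) -> tuple:
    """Turn '1.20.1' into (1, 20, 1); numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


PINNED = "1.20.1"
LOWER_BOUNDS = {  # numpy lower bounds copied from the conflict report
    "gym 0.26.2": "1.18.0",
    "matplotlib 3.3.4": "1.15",
    "pandas 1.2.4": "1.16.5",
    "stable-baselines3 2.0.0": "1.20",
    "shimmy 1.1.0": "1.18.0",
    "gymnasium 0.28.1": "1.21.0",
}


def violated(pinned: str, bounds: dict) -> list:
    """List the packages whose numpy lower bound the pin does not meet."""
    return [pkg for pkg, low in bounds.items() if parse(pinned) < parse(low)]


if __name__ == "__main__":
    print(violated(PINNED, LOWER_BOUNDS))  # → ['gymnasium 0.28.1']
```

So either the numpy pin has to be raised to at least 1.21.0, or a gymnasium / stable-baselines3 combination that accepts numpy 1.20.1 has to be installed; which of the two the project intends is not stated here.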

lookforward bias

Forward adjustment introduces look-ahead bias; you should use backward adjustment or unadjusted prices instead.

FileNotFoundError: [Errno 2] No such file or directory: '/.qlib/qlib_data/cn_data/instruments/all.txt'

Hello, I ran the following command from the command line: python fetch_baostock_data.py
It produced the following error:
Forward adjust date: 2023-11-04
Loading A-Shares stock list
Traceback (most recent call last):
File "fetch_baostock_data.py", line 276, in
dm.fetch_and_save_data()
File "fetch_baostock_data.py", line 253, in fetch_and_save_data
self._load_all_a_shares()
File "fetch_baostock_data.py", line 89, in _load_all_a_shares
self._load_all_a_shares_base()
File "fetch_baostock_data.py", line 81, in _load_all_a_shares_base
lines = _read_all_text(f"{self._qlib_path}/instruments/all.txt").split('\n')
File "fetch_baostock_data.py", line 18, in _read_all_text
with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/.qlib/qlib_data/cn_data/instruments/all.txt'
Do I need to create this all.txt file myself?
Thank you!
