rl-mldm / alphagen


Generating sets of formulaic alpha (predictive) stock factors via reinforcement learning.

Python 99.37% Cython 0.63%

quantitative-trading reinforcement-learning symbolic-regression


alphagen's Issues

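Test IC and test rank IC look too good when the factor pool is small

Hello, why do the few factors generated when the number of PPO steps is small perform so well on the test set? Also, most test IC and test rank IC values seem to start out as negative numbers with large absolute values, and only move toward large positive values as the number of steps grows. I hope you can help me clear up this confusion. Thanks!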

ModuleNotFoundError: No module named 'qlib.utils'

Traceback (most recent call last):
File "/mnt/d/develop/workspace/myRepository/alphagen/data_collection/fetch_baostock_data.py", line 14, in
from qlib_dump_bin import DumpDataAll
File "/mnt/d/develop/workspace/myRepository/alphagen/data_collection/qlib_dump_bin.py", line 17, in
from qlib.utils import fname_to_code, code_to_fname
ModuleNotFoundError: No module named 'qlib.utils'

OS: WSL (Ubuntu 20), Python 3.8
Is it a qlib version issue?

AlphaPool init error

AlphaPool is initialized with the wrong parameters.

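Possible error in the price-adjustment step

As shown in the figure below, at location 1 df.index is not a date index but the default integer index (0, 1, 2, ...). Shouldn't it be the date? If so, df.set_index("date") should be called first.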

ValueError: instrument not exists

File "~/qlib/data/storage/file_storage.py", line 73, in check
raise ValueError(f"{self.storage_name} not exists: {self.uri}")
ValueError: instrument not exists: ~/.qlib/qlib_data/cn_data_rolling/instruments/csi300.txt

There is only all.txt in ~/.qlib/qlib_data/cn_data_rolling/instruments/
The version of qlib is v0.9.3
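
A quick way to see which instrument lists a data directory actually provides is to list its instruments folder before initializing qlib. The sketch below is a minimal check and not part of alphagen or qlib: the helper name `available_instruments` is hypothetical, and the demo builds a temporary directory that mimics the layout described in the report above.

```python
import tempfile
from pathlib import Path
from typing import List


def available_instruments(data_root: Path) -> List[str]:
    """Return the instrument-list names (file stems) found under data_root/instruments."""
    inst_dir = data_root / "instruments"
    if not inst_dir.is_dir():
        return []
    return sorted(p.stem for p in inst_dir.glob("*.txt"))


# Demo on a throwaway directory that contains only all.txt, as in the report.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "cn_data_rolling"
    (root / "instruments").mkdir(parents=True)
    (root / "instruments" / "all.txt").write_text("")
    found = available_instruments(root)
    has_csi300 = "csi300" in found

print(found, has_csi300)
```

If the requested name is missing from the list, either the instrument file has to be supplied or the script has to be pointed at a name that exists.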

LSTMSharedNet assert error

ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: can't find a freq from [] that can resample to day!].

This error occurs when I run train_maskable_ppo.py.

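AlphaPool's device parameter defaults to cpu; can it be changed to gpu?

The main function in train_maskable_ppo.py contains the call below. It does not set AlphaPool's device parameter, so the default device: torch.device = torch.device('cpu') is used. My question: if the machine has a GPU, should device be set to use the GPU here?
    pool = AlphaPool(
        capacity=pool_capacity,
        calculator=calculator_train,
        ic_lower_bound=None,
        l1_alpha=5e-3
    )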
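
Since the question notes that AlphaPool accepts a `device` argument, one option is to choose the device once and pass it through. The helper below is a hypothetical sketch: it only builds the device string, so it runs without torch installed. In a real script the flag would come from `torch.cuda.is_available()` and the result would be wrapped in `torch.device(...)`. Whether the other objects in the training script need to live on the same device is an assumption to verify against the repository.

```python
def pick_device_name(cuda_available: bool, index: int = 0) -> str:
    """Return 'cuda:<index>' when a CUDA GPU is usable, otherwise 'cpu'."""
    return f"cuda:{index}" if cuda_available else "cpu"


# With torch installed, the wiring would look like this (not executed here):
#   device = torch.device(pick_device_name(torch.cuda.is_available()))
#   pool = AlphaPool(..., device=device)  # `device` is the parameter quoted above
cpu_name = pick_device_name(False)
gpu_name = pick_device_name(True)
print(cpu_name, gpu_name)
```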

1 About the environment

Does this require a TensorFlow 1.x environment?

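High memory usage and long running time

I would like to know how long a single run takes on CPU and on GPU. To reproduce the KDD paper with 10 random seeds and factor pool sizes of [10, 20, 50, 100], running in multiple processes needs far more memory than I have. How can this be solved? Thank you very much!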

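[Feat] Single-factor mining

If I don't want to mine a factor pool and only want to find the single best factor, what should I change?

Or perhaps the weighted sum of a factor pool is itself just one factor.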

test_calculator=calculator_train?

In train_maskable_ppo.py:
    checkpoint_callback = CustomCallback(
        save_freq=10000,
        show_freq=10000,
        save_path='/path/for/checkpoints',
        valid_calculator=calculator_valid,
        test_calculator=calculator_train,
        name_prefix=name_prefix,
        timestamp=timestamp,
        verbose=1,
    )

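Is the validation set unused?

In train_maskable_ppo.py the data is split into training, validation, and test sets. The training set is used in AlphaPool, while the validation and test sets are passed as arguments to CustomCallback.

Looking at the CustomCallback code, however, it only uses test_calculator to compute IC and does nothing with valid_calculator. So it seems to me that the validation data is never used by the system. Is that correct? Thanks.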

MaskablePPO model.learn Issue

When I run train_maskable_ppo.py, I encounter an error in this call:
    model.learn(
        total_timesteps=steps,
        callback=checkpoint_callback,
        tb_log_name=f'{name_prefix}_{timestamp}',
    )
The error is shown in the attached pictures.

df is an empty DataFrame

Hello!
When I run train_maskable_ppo.py, data_train, data_valid, and data_test are all loaded successfully:
    data_train = StockData(instrument=instruments,
                           start_time='2009-01-01',
                           end_time='2018-12-31')
    data_valid = StockData(instrument=instruments,
                           start_time='2019-01-01',
                           end_time='2019-12-31')
    data_test = StockData(instrument=instruments,
                          start_time='2020-01-01',
                          end_time='2021-12-31')
However, when I run trade_decision.py, the program calls
    StockData(instrument=instruments,
              start_time='2022-11-18',
              end_time='2023-11-17'
    )
and no data is returned.
What is the reason? Is fetching the most recent data not supported yet?
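
One thing worth checking in a case like this is whether the local dataset extends as far as the requested dates. The sketch below compares a requested range against a list of trading dates. It assumes the dates are available as ISO strings (how they are read from the qlib data directory is left out), and `covered_range` is a hypothetical helper, not an alphagen or qlib function.

```python
from datetime import date
from typing import List, Optional, Tuple


def covered_range(calendar: List[str], start: str, end: str) -> Optional[Tuple[str, str]]:
    """Return the part of [start, end] that the calendar covers, or None if nothing overlaps."""
    days = sorted(date.fromisoformat(d) for d in calendar)
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    inside = [d for d in days if lo <= d <= hi]
    if not inside:
        return None
    return inside[0].isoformat(), inside[-1].isoformat()


# Hypothetical calendar that stops in mid-2022.
calendar = ["2022-06-28", "2022-06-29", "2022-06-30"]
no_overlap = covered_range(calendar, "2022-11-18", "2023-11-17")
partial = covered_range(calendar, "2022-06-29", "2022-12-31")
print(no_overlap, partial)
```

A result of None for the requested window would explain an empty DataFrame without any bug in the loading code.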

ValueError: too many values to unpack

[1262:MainThread](2023-08-17 09:03:31,182) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: too many values to unpack (expected 2)].
File "train_maskable_ppo.py", line 176, in
main(seed=seed, instruments=instruments, pool_capacity=capacity, steps=steps[capacity])
File "train_maskable_ppo.py", line 159, in main
model.learn(
File "/home/users/zhangzh/anaconda3/envs/pythonProject/lib/python3.8/site-packages/sb3_contrib/ppo_mask/ppo_mask.py", line 514, in learn
total_timesteps, callback = self._setup_learn(
File "/home/users/zhangzh/anaconda3/envs/pythonProject/lib/python3.8/site-packages/sb3_contrib/ppo_mask/ppo_mask.py", line 239, in _setup_learn
self._last_obs = self.env.reset()
File "/home/users/zhangzh/anaconda3/envs/pythonProject/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 76, in reset
obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx])
ValueError: too many values to unpack (expected 2)
I hit this error when running train_maskable_ppo.py. What is causing it?
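
The failing line unpacks `reset()` into two values, which matches the Gymnasium convention of returning `(observation, info)`. If the environment still returns only the observation (the older Gym convention), an array-like observation with more than two entries would produce this kind of message. That is a plausible cause, not a confirmed one. The sketch below shows the adaptation in plain Python; `LegacyEnv` and `ResetAdapter` are hypothetical stand-ins, not classes from alphagen or stable-baselines3.

```python
from typing import Any, Dict, List, Optional, Tuple


class LegacyEnv:
    """Stand-in for an environment whose reset() returns only the observation."""

    def reset(self) -> List[int]:
        return [0, 0, 0, 0]


class ResetAdapter:
    """Wraps a legacy environment so reset() returns (observation, info)."""

    def __init__(self, env: LegacyEnv) -> None:
        self.env = env

    def reset(self, seed: Optional[int] = None) -> Tuple[List[int], Dict[str, Any]]:
        # The legacy reset() takes no seed, so the argument is accepted and ignored.
        return self.env.reset(), {}


# Unpacking the legacy result into two names fails in the same way as the traceback.
message = ""
try:
    obs, info = LegacyEnv().reset()
except ValueError as exc:
    message = str(exc)

obs, info = ResetAdapter(LegacyEnv()).reset(seed=0)
print(message)
print(obs, info)
```

In practice the cleaner fix is usually to install library versions that agree on one convention, rather than wrapping the environment.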

stock data loading problems

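Hello,

We ran into some problems when trying to run your code. When running train_maskable_ppo.py, the line data_train = StockData(...) fails with: cannot reshape an array of size 7915950 into shape (6, 630).

We are very interested in the theory and applications of your paper, and hope to hear back from you with some answers about it.

Best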

gp.py

Where is the initial cache in gp.py loaded from?

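My machine has integrated graphics and no GPU. How do I run the example?

My machine has integrated graphics, and running train_maskable_ppo.py gives the following error:

It looks like a GPU is required.

My question: what changes are needed to run this example on a machine with integrated graphics? Would switching to a particular version of torch be enough?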

values.reshape error

data = StockData(instrument=instruments,
                 start_time='2009-01-01',
                 end_time='2018-12-31')

stock_data.py", line 72, in _get_data
values = values.reshape((-1, len(features), values.shape[-1])) # type: ignore

ValueError: cannot reshape array of size 8015930 into shape (6,626)
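
The reshape in stock_data.py needs the number of rows to be a multiple of the feature count. Working backwards from the sizes reported in this and the related issues shows that this is not the case: both leave a remainder of one row. The check below is only arithmetic on the numbers from the error messages. Reading 630 and 626 as the column count and 6 as the feature count is an interpretation of the target shape `(-1, len(features), values.shape[-1])`, and what causes the extra row is not established here.

```python
def reshape_remainder(size: int, n_features: int, n_cols: int) -> int:
    """Rows left over when `size` values are grouped into (n_features, n_cols) blocks."""
    if size % n_cols != 0:
        raise ValueError("size is not a multiple of the column count")
    n_rows = size // n_cols
    return n_rows % n_features


# Sizes and column counts taken from the two error messages.
for size, n_cols in [(7915950, 630), (8015930, 626)]:
    rows = size // n_cols
    print(size, rows, reshape_remainder(size, 6, n_cols))
```

A nonzero remainder means the loaded table does not contain a whole number of feature groups, so the input data is the place to look rather than the reshape call itself.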

negative days occur

While running calculator_test, the project tries to create a tensor with an invalid shape because days equals -93.

csi300.txt not found; ValueError: instrument not exists: C:\Users\User_name\.qlib\qlib_data\cn_data_rolling\instruments\csi300.txt

I fetched the data with data_collection/fetch_baostock_data.py, and the data directory contains only:

Why is that?

Which TensorFlow version does dso use?

I get errors with TF 2.x, for example AttributeError: module 'tensorflow' has no attribute 'Session'

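How to deal with insufficient memory when training on the CSI All Share index

When I train on the CSI All Share index, it reports that a very large amount of memory is needed for storage. How can I solve this? Is it possible to skip the storage and only output the expressions?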

An exception has been raised[RuntimeError: Trying to create tensor with negative dimension -93: [-93, 300]].

After copying csi100.txt, csi300.txt, and csi500.txt from cn_data to cn_data_rolling,
I ran train_maskable.py with (seed=0, code='csi300', pool=10, step=200_000),
and it raised the error "RuntimeError: Trying to create tensor with negative dimension -93: [-93, 300]". When I debug, "days = period.stop - period.start - 1 + data.n_days" evaluates to -93.

Please let me know how to fix it.
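
The quoted formula depends on `data.n_days`; for a one-step period (stop minus start equal to 1) it reduces to `n_days` itself, so a negative result suggests that `n_days` is negative. One way this could happen, assuming `n_days` is the number of loaded days minus the look-back and look-ahead margins (`max_backtrack_days` and `max_future_days` appear as StockData parameters elsewhere in these issues), is that the local data covers far fewer days than the requested period. The sketch below makes that arithmetic explicit. The margins 100 and 30 and the loaded-day count 37 are made-up numbers chosen to reproduce -93.

```python
def usable_days(loaded_days: int, max_backtrack_days: int, max_future_days: int) -> int:
    """Days left for evaluation after reserving the look-back and look-ahead margins."""
    return loaded_days - max_backtrack_days - max_future_days


def require_usable_days(loaded_days: int, max_backtrack_days: int, max_future_days: int) -> int:
    """Same computation, but fail early with a readable message instead of a tensor error."""
    n_days = usable_days(loaded_days, max_backtrack_days, max_future_days)
    if n_days <= 0:
        raise ValueError(
            f"only {loaded_days} days loaded, but {max_backtrack_days + max_future_days} "
            "are reserved as margins; check the date range against the local data"
        )
    return n_days


# Made-up numbers chosen so that the result matches the reported -93.
print(usable_days(37, 100, 30))
error_text = ""
try:
    require_usable_days(37, 100, 30)
except ValueError as exc:
    error_text = str(exc)
print(error_text)
```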

gp.py fails on CPU

I set device to use the CPU, and running gp.py gives the following error. How can I fix it? Thanks!
[17036:MainThread](2023-11-15 22:18:57,732) INFO - qlib.Initialization - [init.py:74] - qlib successfully initialized based on client settings.
[17036:MainThread](2023-11-15 22:18:57,733) INFO - qlib.Initialization - [init.py:76] - data_path={'__DEFAULT_FREQ': WindowsPath('G:/qlibtutor/qlib_data/rq_cn_data_h5')}
[30284:MainThread](2023-11-15 22:18:57,759) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.].

File "", line 1, in
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "E:\anaconda3\envs\qlib230908\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "E:\anaconda3\envs\qlib230908\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "E:\anaconda3\envs\qlib230908\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "e:\myquant\alphagen-master\gp_qtb.py", line 29, in
data_train = StockData(instruments, '2009-01-01', '2018-12-31', device=device)
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 37, in init
self.data, self._dates, self._stock_ids = self._get_data()
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 67, in _get_data
df = self._load_exprs(features)
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 62, in _load_exprs
return (QlibDataLoader(config=exprs) # type: ignore
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\dataset\loader.py", line 143, in load
df = self.load_group_df(instruments, exprs, names, start_time, end_time)
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\dataset\loader.py", line 217, in load_group_df
df = D.features(instruments, exprs, start_time, end_time, freq=freq, inst_processors=inst_processors)
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\data.py", line 1191, in features
return DatasetD.dataset(instruments, fields, start_time, end_time, freq, inst_processors=inst_processors)
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\data.py", line 924, in dataset
data = self.dataset_processor(
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\data.py", line 578, in dataset_processor
ParallelExt(n_jobs=workers, backend=C.joblib_backend, maxtasksperchild=C.maxtasksperchild)(task_l),
File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib\parallel.py", line 1854, in call
n_jobs = self._initialize_backend()
File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib\parallel.py", line 1332, in _initialize_backend
n_jobs = self._backend.configure(n_jobs=self.n_jobs, parallel=self,
File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib_parallel_backends.py", line 526, in configure
self._pool = MemmappingPool(n_jobs, **memmappingpool_args)
[26132:MainThread](2023-11-15 22:18:57,769) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.].

File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib\pool.py", line 323, in init

How much compute does it take to run this?

wrong requirements

The requirements pin qlib==0.0.2.dev20, which does not exist among the published Microsoft pyqlib versions. I am on Windows 10; should I switch to Linux?

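Can all of qlib's operators be replaced with alphagen's operators?

qlib's built-in operators lack cross-sectional operators. Can all of qlib's operators be replaced with alphagen's?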

use np.linalg.lstsq to _optimize?

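    def _optimize(self, alpha: float, lr: float, n_iter: int) -> np.ndarray:
        try:
            # Closed-form least-squares solve of mutual_ics @ w = single_ics.
            return np.linalg.lstsq(self.mutual_ics[:self.size, :self.size],
                                   self.single_ics[:self.size], rcond=None)[0]
        except np.linalg.LinAlgError:
            # Fall back to the current weights if the solver does not converge.
            return self.weights[:self.size]

This is very fast.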
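
The suggestion amounts to solving the linear system `mutual_ics @ w = single_ics` in the least-squares sense instead of iterating. A minimal sketch with made-up numbers follows. It assumes NumPy is installed, and it ignores the `alpha` argument of `_optimize`, so the result may differ from what the iterative optimizer returns if that argument acts as a regularization strength.

```python
import numpy as np

# Made-up 2x2 example: pairwise correlations between two factors and
# each factor's individual IC against the target.
mutual_ics = np.array([[1.0, 0.5],
                       [0.5, 1.0]])
single_ics = np.array([0.06, 0.04])

# rcond=None selects NumPy's current default cutoff for small singular values.
weights, residuals, rank, singular_values = np.linalg.lstsq(mutual_ics, single_ics, rcond=None)
print(weights, rank)
```

Because the example matrix has full rank, the returned weights satisfy the system exactly; for a rank-deficient matrix `lstsq` returns the minimum-norm solution instead.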

Unable to reproduce the code in steps due to qlib data issues

➜ python train.py 
[160896:MainThread](2023-06-05 11:04:42,018) INFO - qlib.Initialization - [config.py:416] - default_conf: client.
[160896:MainThread](2023-06-05 11:04:42,509) INFO - qlib.Initialization - [__init__.py:74] - qlib successfully initialized based on client settings.
[160896:MainThread](2023-06-05 11:04:42,509) INFO - qlib.Initialization - [__init__.py:76] - data_path={'__DEFAULT_FREQ': PosixPath('/home/ray/.qlib/qlib_data/cn_data')}
[160896:MainThread](2023-06-05 11:04:49,568) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: cannot reshape array of size 7915950 into shape (6,630)].
  File "train.py", line 175, in <module>
    main(seed=seed, instruments=instruments, pool_capacity=capacity, steps=steps[capacity])
  File "train.py", line 105, in main
    data = StockData(instrument=instruments,
  File "/home/ray/workspace/alphagen/alphagen_qlib/stock_data.py", line 37, in __init__
    self.data, self._dates, self._stock_ids = self._get_data()
  File "/home/ray/workspace/alphagen/alphagen_qlib/stock_data.py", line 72, in _get_data
    values = values.reshape((-1, len(features), values.shape[-1]))  # type: ignore
ValueError: cannot reshape array of size 7915950 into shape (6,630)

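Which result file holds the best factor pool found, and how do I load it into Python?

Hello,
Is the JSON from the last checkpoint the best factor pool?
How can the factor strings in the JSON be loaded into Python as executable expr objects, in order to verify the factor computation, compare against the results in the paper,
and make further use of these factors?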
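
Reading the file itself only needs the standard `json` module. The sketch below assumes a hypothetical layout with two parallel lists under the keys "exprs" and "weights"; the actual keys, and the expression strings used here, should be checked against a real checkpoint file. Turning each string into an executable expression object is left out, since it depends on the repository's own expression classes.

```python
import json
from typing import Dict

# Hypothetical checkpoint content; a real file would be read with open(path).read().
checkpoint_text = '{"exprs": ["Div($close,$open)", "Mean($volume,20)"], "weights": [0.7, -0.3]}'


def load_pool(text: str) -> Dict[str, float]:
    """Map each expression string to its weight, checking that the lists line up."""
    raw = json.loads(text)
    exprs, weights = raw["exprs"], raw["weights"]
    if len(exprs) != len(weights):
        raise ValueError("exprs and weights have different lengths")
    return dict(zip(exprs, weights))


pool = load_pool(checkpoint_text)
for expr, weight in pool.items():
    print(f"{weight:+.2f}  {expr}")
```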

Bug: wrong data dimensions for a single instrument

    from alphagen.data.expression import *
    from alphagen_generic.features import *
    import torch

    if __name__ == "__main__":
        # device = torch.device('cuda:0')
        device = torch.device('cpu')

        data = StockData(instrument=["SH600038"], start_time="2021-01-01", end_time="2023-06-30",
                         max_backtrack_days=0, max_future_days=0, device=device)
        print(data.make_dataframe(data._get_data()[0]))

Running the code above produces the following error. With instrument="csi300" the result is correct. It looks like the data dimensions are wrong when a single instrument is specified:
[22432:MainThread](2023-12-29 16:34:42,113) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: number of stocks in the provided tensor (6) doesn't match that of the current StockData (1)].
File "e:/myquant/alphagen-master/test2.py", line 16, in
print(data.make_dataframe(data._get_data()[0]))
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 121, in make_dataframe
raise ValueError(f"number of stocks in the provided tensor ({n_stocks}) doesn't "
ValueError: number of stocks in the provided tensor (6) doesn't match that of the current StockData (1)

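Dependency conflicts during installation; how should they be resolved?

Hello,
Installing the dependencies from requirements.txt produces a conflict. How should I resolve it?
The conflict is as follows: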
The conflict is caused by:
The user requested numpy==1.20.1
gym 0.26.2 depends on numpy>=1.18.0
matplotlib 3.3.4 depends on numpy>=1.15
pandas 1.2.4 depends on numpy>=1.16.5
stable-baselines3 2.0.0 depends on numpy>=1.20
shimmy 1.1.0 depends on numpy>=1.18.0
gymnasium 0.28.1 depends on numpy>=1.21.0
I hope to get your answer.
Best wishes
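
Checking the pinned version against each lower bound in the message shows which requirement cannot be met. The sketch below does this with plain tuple comparison on the numbers quoted above. It handles only simple dotted numeric versions and `>=` bounds, which is enough for this message but is not a general version-specifier parser.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.20' or '1.20.1' into a 3-part tuple so comparisons are consistent."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


pinned_numpy = "1.20.1"
# Lower bounds on numpy copied from the pip message above.
lower_bounds: Dict[str, str] = {
    "gym 0.26.2": "1.18.0",
    "matplotlib 3.3.4": "1.15",
    "pandas 1.2.4": "1.16.5",
    "stable-baselines3 2.0.0": "1.20",
    "shimmy 1.1.0": "1.18.0",
    "gymnasium 0.28.1": "1.21.0",
}


def unsatisfied(pinned: str, bounds: Dict[str, str]) -> List[str]:
    """Return the packages whose minimum numpy version is above the pinned one."""
    have = parse_version(pinned)
    return [pkg for pkg, minimum in bounds.items() if have < parse_version(minimum)]


conflicts = unsatisfied(pinned_numpy, lower_bounds)
print(conflicts)
```

Only the gymnasium bound is above the pin, so the conflict is between numpy==1.20.1 and gymnasium 0.28.1. Whether the project runs correctly with a newer numpy is not verified here.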

How to reproduce the experimental results in the paper?

I ran the results based on the time and indices in the paper, but they are far from as good as those in the paper. I wonder if there is a large difference in parameter settings from yours?

Look-ahead bias

Using forward-adjusted prices introduces look-ahead bias; you should use backward-adjusted or unadjusted prices instead.
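
The concern can be illustrated with a toy series. The sketch assumes the usual definitions: forward adjustment rescales history so that the latest adjusted price equals the latest raw price, while backward adjustment keeps the earliest price fixed and rescales later ones. The prices and the cumulative factor (a 2-for-1 split on the third day) are made up for illustration.

```python
from typing import List


def adjust(raw: List[float], cum_factor: List[float], mode: str) -> List[float]:
    """Apply a cumulative adjustment factor in 'forward' or 'backward' mode."""
    if mode == "backward":
        return [p * f for p, f in zip(raw, cum_factor)]
    if mode == "forward":
        latest = cum_factor[-1]  # depends on the most recent corporate action
        return [p * f / latest for p, f in zip(raw, cum_factor)]
    raise ValueError(f"unknown mode: {mode}")


raw = [10.0, 10.0, 5.0]        # raw closes; the split happens on day 3
cum_factor = [1.0, 1.0, 2.0]   # cumulative split factor per day

# What each method reports for days 1-2 before and after the split is known.
forward_before = adjust(raw[:2], cum_factor[:2], "forward")
forward_after = adjust(raw, cum_factor, "forward")
backward_before = adjust(raw[:2], cum_factor[:2], "backward")
backward_after = adjust(raw, cum_factor, "backward")
print(forward_before, forward_after)
print(backward_before, backward_after)
```

The forward-adjusted values for the first two days change once the split is known, while the backward-adjusted values do not. Day-over-day returns are the same under both methods, so the difference matters for features that depend on price levels.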

Qlib Issue in M series CPU Mac with HPC (solved)

Using pip install pyqlib may cause multiple issues.

Directly cloning from GitHub repository might be helpful.

$ pip install numpy
$ pip install --upgrade cython
$ git clone https://github.com/microsoft/qlib.git && cd qlib
$ python setup.py install

https://qlib.readthedocs.io/en/latest/start/installation.html

FileNotFoundError: [Errno 2] No such file or directory: '/.qlib/qlib_data/cn_data/instruments/all.txt'

Hello, I ran this command on the command line: python fetch_baostock_data.py
and got the following error:
Forward adjust date: 2023-11-04
Loading A-Shares stock list
Traceback (most recent call last):
File "fetch_baostock_data.py", line 276, in
dm.fetch_and_save_data()
File "fetch_baostock_data.py", line 253, in fetch_and_save_data
self._load_all_a_shares()
File "fetch_baostock_data.py", line 89, in _load_all_a_shares
self._load_all_a_shares_base()
File "fetch_baostock_data.py", line 81, in _load_all_a_shares_base
lines = _read_all_text(f"{self._qlib_path}/instruments/all.txt").split('\n')
File "fetch_baostock_data.py", line 18, in _read_all_text
with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/.qlib/qlib_data/cn_data/instruments/all.txt'
Do I need to create this all.txt file myself?
Thank you!!!