rl-mldm / alphagen
Generating sets of formulaic alpha (predictive) stock factors via reinforcement learning.
Traceback (most recent call last):
File "/mnt/d/develop/workspace/myRepository/alphagen/data_collection/fetch_baostock_data.py", line 14, in <module>
from qlib_dump_bin import DumpDataAll
File "/mnt/d/develop/workspace/myRepository/alphagen/data_collection/qlib_dump_bin.py", line 17, in <module>
from qlib.utils import fname_to_code, code_to_fname
ModuleNotFoundError: No module named 'qlib.utils'
OS: WSL (Ubuntu 20), Python 3.8
Is it a qlib version issue?
File "~/qlib/data/storage/file_storage.py", line 73, in check
raise ValueError(f"{self.storage_name} not exists: {self.uri}")
ValueError: instrument not exists: ~/.qlib/qlib_data/cn_data_rolling/instruments/csi300.txt
There is only all.txt in ~/.qlib/qlib_data/cn_data_rolling/instruments/
The version of qlib is v0.9.3
When I run train_maskable_ppo.py, this error occurs.
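In the main function of train_maskable_ppo.py there is the following call. It does not set AlphaPool's device parameter, so the default device: torch.device = torch.device('cpu') is used. My question: if the machine has a GPU, shouldn't device be set to the GPU here?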
pool = AlphaPool(
    capacity=pool_capacity,
    calculator=calculator_train,
    ic_lower_bound=None,
    l1_alpha=5e-3
)
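A minimal sketch of picking the device at runtime, assuming that passing device= to AlphaPool (the parameter quoted above) is all that is needed and that StockData's device argument (used in other reports on this page) should be set to the same value. The helper pick_device_name is hypothetical; it falls back to "cpu" when PyTorch is not installed so the snippet runs anywhere.

```python
def pick_device_name() -> str:
    """Return "cuda:0" if PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed: only the CPU is usable
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device_name())
```

In train_maskable_ppo.py this would become device = torch.device(pick_device_name()), passed as AlphaPool(..., device=device) and StockData(..., device=device). Whether keeping the pool on the GPU actually speeds up training is not established here.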
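Does this require a TensorFlow 1.x environment?
I would like to know how long a single run takes on CPU and on GPU. To reproduce the KDD paper with 10 random-seed runs and pool sizes of [10, 20, 50, 100], running them in multiple processes simply runs out of memory. How can this be solved? Any help would be greatly appreciated!
If I don't want to mine a factor pool but only the single best factor, what should I change?
Perhaps the weighted sum of the factor pool is itself just one factor.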
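To make the last remark concrete: if the pool holds factors f_1, ..., f_k with weights w_1, ..., w_k, its output for each stock on a given day is the single value w_1*f_1 + ... + w_k*f_k, i.e. one combined factor. The sketch below is a generic linear combination over one day's cross-section, not alphagen's AlphaPool code; combine_factors is a hypothetical name.

```python
from typing import List, Sequence


def combine_factors(factor_values: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Weighted sum of factors for one trading day.

    factor_values[k][i] is the value of factor k for stock i on that day.
    Returns one combined value per stock.
    """
    if len(factor_values) != len(weights):
        raise ValueError("need exactly one weight per factor")
    n_stocks = len(factor_values[0])
    return [sum(w * f[i] for w, f in zip(weights, factor_values))
            for i in range(n_stocks)]
```

For example, combine_factors([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.25]) returns [1.25, 2.0].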
in train_maskable_ppo.py:
checkpoint_callback = CustomCallback(
    save_freq=10000,
    show_freq=10000,
    save_path='/path/for/checkpoints',
    valid_calculator=calculator_valid,
    test_calculator=calculator_train,
    name_prefix=name_prefix,
    timestamp=timestamp,
    verbose=1,
)
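train_maskable_ppo.py splits the data into training, validation, and test sets. The training data is used in
AlphaPool, while the validation and test sets are passed as parameters to CustomCallback.
However, looking at the CustomCallback code, it only uses the test set (test_calculator) to compute IC and does nothing with valid_calculator. So it seems the validation data is never used. Is that right? Thanks.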
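For comparison, a common way to use a validation split (an assumption about intent, not a description of what CustomCallback does) is to pick the checkpoint with the highest validation IC and report only that checkpoint's test IC. A minimal sketch, taking IC as the Pearson correlation between one day's factor values and forward returns (a daily series of these is usually averaged); pearson_ic and pick_by_validation are hypothetical helpers.

```python
import math
from typing import Dict, Sequence


def pearson_ic(factor: Sequence[float], returns: Sequence[float]) -> float:
    """Pearson correlation of factor values with forward returns on one day.

    Undefined (ZeroDivisionError) if either input is constant.
    """
    n = len(factor)
    mx = sum(factor) / n
    my = sum(returns) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(factor, returns))
    vx = sum((x - mx) ** 2 for x in factor)
    vy = sum((y - my) ** 2 for y in returns)
    return cov / math.sqrt(vx * vy)


def pick_by_validation(valid_ic: Dict[str, float]) -> str:
    """Name of the checkpoint whose validation IC is highest."""
    return max(valid_ic, key=valid_ic.get)
```

Only after a checkpoint is chosen this way would its IC on the test set be computed, so the test data plays no part in model selection.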
Hello!
When running train_maskable_ppo.py, I can load the data for data_train, data_valid, and data_test:
data_train = StockData(instrument=instruments,
                       start_time='2009-01-01',
                       end_time='2018-12-31')
data_valid = StockData(instrument=instruments,
                       start_time='2019-01-01',
                       end_time='2019-12-31')
data_test = StockData(instrument=instruments,
                      start_time='2020-01-01',
                      end_time='2021-12-31')
However, when running trade_decision.py, the program calls
StockData(instrument=instruments,
          start_time='2022-11-18',
          end_time='2023-11-17')
and no data is returned in this case.
What is the reason? Is fetching the latest data not supported yet?
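One thing worth checking (an assumption about the cause, not a confirmed diagnosis) is whether the local qlib data actually extends to 2023-11-17. In the standard qlib layout the trading calendar is a text file such as ~/.qlib/qlib_data/cn_data/calendars/day.txt with one date per line; the sketch below reads such a file and tests whether a requested range is covered. The function names are hypothetical, and the path should be adjusted to whichever data directory qlib is initialized with.

```python
from datetime import date
from pathlib import Path
from typing import List


def load_calendar(path: Path) -> List[date]:
    """Read a qlib-style calendar file: one YYYY-MM-DD date per line."""
    lines = path.read_text().splitlines()
    return [date.fromisoformat(s.strip()) for s in lines if s.strip()]


def covers(calendar: List[date], start: date, end: date) -> bool:
    """True if the sorted calendar spans the whole [start, end] range."""
    return bool(calendar) and calendar[0] <= start and calendar[-1] >= end


if __name__ == "__main__":
    cal_path = Path.home() / ".qlib" / "qlib_data" / "cn_data" / "calendars" / "day.txt"
    cal = load_calendar(cal_path) if cal_path.exists() else []
    if cal:
        print("data ends on", cal[-1])
    print("covers 2022-11-18..2023-11-17:",
          covers(cal, date(2022, 11, 18), date(2023, 11, 17)))
```

A calendar that covers the range is necessary but not sufficient: each instrument's own feature files must also contain those dates.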
[1262:MainThread](2023-08-17 09:03:31,182) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: too many values to unpack (expected 2)].
File "train_maskable_ppo.py", line 176, in <module>
main(seed=seed, instruments=instruments, pool_capacity=capacity, steps=steps[capacity])
File "train_maskable_ppo.py", line 159, in main
model.learn(
File "/home/users/zhangzh/anaconda3/envs/pythonProject/lib/python3.8/site-packages/sb3_contrib/ppo_mask/ppo_mask.py", line 514, in learn
total_timesteps, callback = self._setup_learn(
File "/home/users/zhangzh/anaconda3/envs/pythonProject/lib/python3.8/site-packages/sb3_contrib/ppo_mask/ppo_mask.py", line 239, in _setup_learn
self._last_obs = self.env.reset()
File "/home/users/zhangzh/anaconda3/envs/pythonProject/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 76, in reset
obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx])
ValueError: too many values to unpack (expected 2)
I ran into this error when running train_maskable_ppo.py. What causes it?
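The failing line in stable_baselines3's dummy_vec_env.py unpacks reset() into (observation, info), which is the Gymnasium-style API that stable-baselines3 2.x expects. If the environment's reset() still follows the older Gym convention and returns only the observation, Python tries to unpack the observation itself and raises exactly this ValueError. The toy classes below are hypothetical and only reproduce the mechanism; likely remedies are to make the environment's reset(seed=None, options=None) return (obs, info), or to install the package versions pinned in the repository's requirements.

```python
from typing import Any, Dict, List, Optional, Tuple


class OldStyleEnv:
    """Toy env following the old Gym API: reset() returns only the observation."""

    def reset(self, seed: Optional[int] = None) -> List[float]:
        return [0.0, 0.0, 0.0, 0.0]


class NewStyleEnv:
    """Toy env following the Gymnasium API: reset() returns (observation, info)."""

    def reset(self, seed: Optional[int] = None,
              options: Optional[dict] = None) -> Tuple[List[float], Dict[str, Any]]:
        return [0.0, 0.0, 0.0, 0.0], {}


def vec_env_reset(env: Any) -> Tuple[Any, Dict[str, Any]]:
    # Same unpacking pattern as the failing line in dummy_vec_env.py
    obs, info = env.reset(seed=0)
    return obs, info
```

vec_env_reset(NewStyleEnv()) succeeds, while vec_env_reset(OldStyleEnv()) raises "ValueError: too many values to unpack" because the four-element observation is unpacked into two names.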
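Hello,
We ran into some problems while trying to run your code: when running train_maskable_ppo.py, an error is raised at data_train = StockData(........). It says: cannot reshape an array of size 7915950 into shape (6,630).
We are very interested in the theory and applications of your paper and hope to hear back from you, along with some explanations of the paper.
Best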
data = StockData(instrument=instruments,
                 start_time='2009-01-01',
                 end_time='2018-12-31')
stock_data.py", line 72, in _get_data
values = values.reshape((-1, len(features), values.shape[-1])) # type: ignore
ValueError: cannot reshape array of size 8015930 into shape (6,626)
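The reshape to (-1, len(features), values.shape[-1]) only succeeds if the number of rows in values is a multiple of len(features). Working the numbers from both reports on this page shows that the sizes divide evenly by the day count but leave exactly one row over: 7915950 = 12565 × 630 and 8015930 = 12805 × 626, while 12565 = 6 × 2094 + 1 and 12805 = 6 × 2134 + 1. So the loaded frame has one row more than a complete stocks × features grid; where that extra row comes from is not established here. The hypothetical helper below just reproduces the arithmetic.

```python
from typing import Dict


def diagnose_reshape(total_size: int, n_features: int, n_days: int) -> Dict[str, int]:
    """Explain why total_size values cannot be reshaped to (-1, n_features, n_days)."""
    rows, leftover_values = divmod(total_size, n_days)   # rows of length n_days
    stocks, leftover_rows = divmod(rows, n_features)     # complete stock blocks
    return {
        "rows": rows,
        "leftover_values": leftover_values,
        "complete_stocks": stocks,
        "leftover_rows": leftover_rows,
    }


if __name__ == "__main__":
    print(diagnose_reshape(7915950, 6, 630))
    print(diagnose_reshape(8015930, 6, 626))
```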
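I get errors with TensorFlow 2.x, for example AttributeError: module 'tensorflow' has no attribute 'Session'.
When I train on the CSI All Share index, it needs a very large amount of memory for storage. How can I solve this? Is it possible to skip the storage and only output the expressions?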
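A rough way to see where memory goes is to multiply out tensor shapes. The sketch assumes float32 values (4 bytes) and dense tensors; the shapes are placeholders to be replaced with the actual number of trading days, features, and stocks in the universe, and whether the raw data or the evaluated factor values dominate in alphagen is not established here. tensor_bytes and to_gib are hypothetical helpers.

```python
from math import prod
from typing import Sequence


def tensor_bytes(shape: Sequence[int], bytes_per_value: int = 4) -> int:
    """Memory of a dense tensor with the given shape (float32 by default)."""
    return prod(shape) * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2 ** 30
```

A raw-data tensor of shape (n_days, 6, n_stocks) costs tensor_bytes((n_days, 6, n_stocks)); if every evaluated factor is also held as an (n_days, n_stocks) tensor (an assumption about caching), that cost is multiplied by the number of factors kept in memory, which grows with the size of the stock universe.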
After copying csi100.txt, csi300.txt, and csi500.txt from cn_data to cn_data_rolling,
I ran train_maskable.py with (seed=0, code='csi300', pool=10, step=200_000),
and it raised the error "RuntimeError: Trying to create tensor with negative dimension -93: [-93, 300]". When I debug it, "days = period.stop - period.start - 1 + data.n_days" evaluates to -93 there.
Please let me know how to fix it.
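The expression quoted in the report, days = period.stop - period.start - 1 + data.n_days, is negative only when data.n_days < period.start - period.stop + 1; for a one-day period such as slice(0, 1) that means data.n_days itself is negative. Assuming n_days counts the usable trading days left after the look-back and look-ahead margins are removed, this suggests the copied cn_data_rolling setup loads too few trading days for the requested range. A guard like the hypothetical one below (not alphagen code) would turn the tensor-creation error into a clearer message.

```python
def evaluation_days(period: slice, n_days: int) -> int:
    """Day count from the formula in the report, with an explicit check.

    Raises ValueError instead of letting tensor creation fail on a negative size.
    """
    days = period.stop - period.start - 1 + n_days
    if days <= 0:
        raise ValueError(
            f"non-positive day count {days} for period={period} and n_days={n_days}; "
            "check that the loaded data covers the requested date range"
        )
    return days
```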
I set device to CPU, and running gp.py gives the following error. How can I fix it? Thanks!
[17036:MainThread](2023-11-15 22:18:57,732) INFO - qlib.Initialization - [__init__.py:74] - qlib successfully initialized based on client settings.
[17036:MainThread](2023-11-15 22:18:57,733) INFO - qlib.Initialization - [__init__.py:76] - data_path={'__DEFAULT_FREQ': WindowsPath('G:/qlibtutor/qlib_data/rq_cn_data_h5')}
[30284:MainThread](2023-11-15 22:18:57,759) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.].
File "<string>", line 1, in <module>
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "E:\anaconda3\envs\qlib230908\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "E:\anaconda3\envs\qlib230908\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "E:\anaconda3\envs\qlib230908\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "E:\anaconda3\envs\qlib230908\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "e:\myquant\alphagen-master\gp_qtb.py", line 29, in <module>
data_train = StockData(instruments, '2009-01-01', '2018-12-31', device=device)
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 37, in __init__
self.data, self._dates, self._stock_ids = self._get_data()
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 67, in _get_data
df = self._load_exprs(features)
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 62, in _load_exprs
return (QlibDataLoader(config=exprs) # type: ignore
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\dataset\loader.py", line 143, in load
df = self.load_group_df(instruments, exprs, names, start_time, end_time)
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\dataset\loader.py", line 217, in load_group_df
df = D.features(instruments, exprs, start_time, end_time, freq=freq, inst_processors=inst_processors)
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\data.py", line 1191, in features
return DatasetD.dataset(instruments, fields, start_time, end_time, freq, inst_processors=inst_processors)
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\data.py", line 924, in dataset
data = self.dataset_processor(
File "E:\anaconda3\envs\qlib230908\lib\site-packages\qlib\data\data.py", line 578, in dataset_processor
ParallelExt(n_jobs=workers, backend=C.joblib_backend, maxtasksperchild=C.maxtasksperchild)(task_l),
File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib\parallel.py", line 1854, in __call__
n_jobs = self._initialize_backend()
File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib\parallel.py", line 1332, in _initialize_backend
n_jobs = self._backend.configure(n_jobs=self.n_jobs, parallel=self,
File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib\_parallel_backends.py", line 526, in configure
self._pool = MemmappingPool(n_jobs, **memmappingpool_args)
[26132:MainThread](2023-11-15 22:18:57,769) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.].
File "E:\anaconda3\envs\qlib230908\lib\site-packages\joblib\pool.py", line 323, in __init__
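On Windows, multiprocessing (used here by qlib through joblib) starts workers with the spawn method, which re-imports the main script in every child. Because gp_qtb.py builds StockData at module level (line 29 in the traceback), each child tries to load data and start its own workers, which is what the RuntimeError describes. The error message itself names the fix: move the top-level work into a function that is only called under if __name__ == '__main__':. The sketch shows the pattern with a stand-in task; square and main are placeholders for the script's own data loading and GP run.

```python
import multiprocessing as mp
from typing import List


def square(x: int) -> int:
    # Stand-in for per-worker work (in gp.py this would be qlib's data loading)
    return x * x


def main() -> List[int]:
    # Everything that starts processes lives inside a function, so that
    # re-importing this module in a spawned child does not start more workers.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    mp.freeze_support()  # only matters for frozen executables; harmless otherwise
    print(main())
```

Applied to gp.py, the StockData(...) calls and everything after them would move into main().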
In requirements.txt you pin qlib==0.0.2.dev20, which does not exist among Microsoft's pyqlib releases. I use Windows 10; should I switch to Linux?
Qlib's built-in operators lack cross-sectional operators. Can all of qlib's operators be replaced with alphagen's operators?
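import numpy as np

def _optimize(self, alpha: float, lr: float, n_iter: int) -> np.ndarray:
    # Solve mutual_ics @ w = single_ics by least squares (alpha, lr, n_iter unused)
    try:
        return np.linalg.lstsq(self.mutual_ics[:self.size, :self.size],
                               self.single_ics[:self.size], rcond=None)[0]
    except np.linalg.LinAlgError:
        # Fall back to the current weights if the SVD does not converge
        return self.weights[:self.size]

This is very fast.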
➜ python train.py
[160896:MainThread](2023-06-05 11:04:42,018) INFO - qlib.Initialization - [config.py:416] - default_conf: client.
[160896:MainThread](2023-06-05 11:04:42,509) INFO - qlib.Initialization - [__init__.py:74] - qlib successfully initialized based on client settings.
[160896:MainThread](2023-06-05 11:04:42,509) INFO - qlib.Initialization - [__init__.py:76] - data_path={'__DEFAULT_FREQ': PosixPath('/home/ray/.qlib/qlib_data/cn_data')}
[160896:MainThread](2023-06-05 11:04:49,568) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: cannot reshape array of size 7915950 into shape (6,630)].
File "train.py", line 175, in <module>
main(seed=seed, instruments=instruments, pool_capacity=capacity, steps=steps[capacity])
File "train.py", line 105, in main
data = StockData(instrument=instruments,
File "/home/ray/workspace/alphagen/alphagen_qlib/stock_data.py", line 37, in __init__
self.data, self._dates, self._stock_ids = self._get_data()
File "/home/ray/workspace/alphagen/alphagen_qlib/stock_data.py", line 72, in _get_data
values = values.reshape((-1, len(features), values.shape[-1])) # type: ignore
ValueError: cannot reshape array of size 7915950 into shape (6,630)
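Hello,
Is the best factor pool the one in the JSON from the last checkpoint?
How can I load the factor strings in the JSON into Python as executable Exprs, to verify the factor computation, compare against the results in the paper,
and make further use of these factors?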
from alphagen.data.expression import *
from alphagen_generic.features import *
from alphagen_qlib.stock_data import StockData
import torch

if __name__ == "__main__":
    # device = torch.device('cuda:0')
    device = torch.device('cpu')
    data = StockData(instrument=["SH600038"], start_time="2021-01-01", end_time="2023-06-30",
                     max_backtrack_days=0, max_future_days=0, device=device)
    print(data.make_dataframe(data._get_data()[0]))
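Running the code above gives the error below; with instrument="csi300" the result is correct. It looks like the data dimensions are wrong when a single instrument is given: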
[22432:MainThread](2023-12-29 16:34:42,113) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: number of stocks in the provided tensor (6) doesn't match that of the current StockData (1)].
File "e:/myquant/alphagen-master/test2.py", line 16, in <module>
print(data.make_dataframe(data._get_data()[0]))
File "e:\myquant\alphagen-master\alphagen_qlib\stock_data.py", line 121, in make_dataframe
raise ValueError(f"number of stocks in the provided tensor ({n_stocks}) doesn't "
ValueError: number of stocks in the provided tensor (6) doesn't match that of the current StockData (1)
Hello,
When installing the dependencies from requirements.txt I get a conflict. How can I resolve it?
The conflict is as follows:
The conflict is caused by:
The user requested numpy==1.20.1
gym 0.26.2 depends on numpy>=1.18.0
matplotlib 3.3.4 depends on numpy>=1.15
pandas 1.2.4 depends on numpy>=1.16.5
stable-baselines3 2.0.0 depends on numpy>=1.20
shimmy 1.1.0 depends on numpy>=1.18.0
gymnasium 0.28.1 depends on numpy>=1.21.0
I hope to hear from you.
Best wishes
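Checking the pinned numpy==1.20.1 against each lower bound in the conflict report shows that only gymnasium 0.28.1 (numpy>=1.21.0) rules it out, so relaxing the numpy pin to a release satisfying numpy>=1.21.0 is one likely way out. Other packages may impose upper bounds not shown in this excerpt, so pip's resolver has the final say. The sketch compares plain numeric versions only; pip itself follows PEP 440, for which the packaging library's Version class is the robust choice.

```python
from typing import Dict, List, Tuple

# numpy lower bounds copied from the pip conflict report
NUMPY_LOWER_BOUNDS: Dict[str, str] = {
    "gym 0.26.2": "1.18.0",
    "matplotlib 3.3.4": "1.15",
    "pandas 1.2.4": "1.16.5",
    "stable-baselines3 2.0.0": "1.20",
    "shimmy 1.1.0": "1.18.0",
    "gymnasium 0.28.1": "1.21.0",
}


def parse_version(v: str) -> Tuple[int, int, int]:
    """Plain numeric versions only, padded to three parts ("1.15" -> (1, 15, 0))."""
    parts = [int(p) for p in v.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def violated_by(pinned: str) -> List[str]:
    """Packages whose numpy lower bound the pinned version does not meet."""
    pv = parse_version(pinned)
    return [pkg for pkg, low in NUMPY_LOWER_BOUNDS.items() if pv < parse_version(low)]
```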
I ran experiments with the time periods and indices used in the paper, but the results are far worse than those reported. Could my parameter settings differ substantially from yours?
Using forward-adjusted prices introduces look-ahead bias; you should use backward-adjusted or unadjusted prices instead.
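A toy example of why forward adjustment leaks future information: forward adjustment keeps the most recent price and rescales earlier prices by every later split or dividend, so a historical value changes once a later event enters the data. Backward adjustment keeps the earliest price and rescales later prices by past events only. The sketch uses a single 2-for-1 split (ratio 2.0 on the day it takes effect) and ignores dividends; forward_adjust and backward_adjust are hypothetical helpers.

```python
from typing import List, Sequence


def forward_adjust(prices: Sequence[float], ratios: Sequence[float]) -> List[float]:
    """Keep the last price; divide earlier prices by all later split ratios.

    ratios[t] is the split ratio taking effect on day t (1.0 = no event).
    """
    adjusted = [0.0] * len(prices)
    factor = 1.0
    for t in range(len(prices) - 1, -1, -1):
        adjusted[t] = prices[t] / factor
        factor *= ratios[t]  # an event on day t rescales the days before t
    return adjusted


def backward_adjust(prices: Sequence[float], ratios: Sequence[float]) -> List[float]:
    """Keep the first price; multiply later prices by all split ratios so far."""
    adjusted = []
    factor = 1.0
    for price, ratio in zip(prices, ratios):
        factor *= ratio  # an event on day t applies from day t onward
        adjusted.append(price * factor)
    return adjusted
```

With prices [10.0, 10.0, 5.0] and ratios [1.0, 1.0, 2.0], forward adjustment gives day 0 a value of 5.0 when day 2 is in the data but 10.0 when it is not, so a feature computed on day 0 already "knows" about the day-2 split. Backward adjustment gives day 0 the value 10.0 in both cases.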
Using pip install pyqlib may cause multiple issues.
Installing directly from a clone of the GitHub repository might help:
$ pip install numpy
$ pip install --upgrade cython
$ git clone https://github.com/microsoft/qlib.git && cd qlib
$ python setup.py install
https://qlib.readthedocs.io/en/latest/start/installation.html
Hello, I ran the command python fetch_baostock_data.py from the command line
and got the following error:
Forward adjust date: 2023-11-04
Loading A-Shares stock list
Traceback (most recent call last):
File "fetch_baostock_data.py", line 276, in <module>
dm.fetch_and_save_data()
File "fetch_baostock_data.py", line 253, in fetch_and_save_data
self._load_all_a_shares()
File "fetch_baostock_data.py", line 89, in _load_all_a_shares
self._load_all_a_shares_base()
File "fetch_baostock_data.py", line 81, in _load_all_a_shares_base
lines = _read_all_text(f"{self._qlib_path}/instruments/all.txt").split('\n')
File "fetch_baostock_data.py", line 18, in _read_all_text
with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/.qlib/qlib_data/cn_data/instruments/all.txt'
Do I need to create this all.txt file myself?
Thanks!
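all.txt is normally part of an existing qlib dataset rather than written by hand; qlib's dump tools (the DumpDataAll class this script imports) generate it. In the standard qlib layout each line of instruments/all.txt holds a stock code, a start date, and an end date separated by tabs. A minimal parser for that format, assuming exactly those three fields; parse_instruments and Instrument are hypothetical names.

```python
from datetime import date
from typing import List, NamedTuple


class Instrument(NamedTuple):
    code: str
    start: date
    end: date


def parse_instruments(text: str) -> List[Instrument]:
    """Parse qlib-style instrument lines: CODE<TAB>START<TAB>END."""
    result = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        code, start, end = line.split("\t")
        result.append(Instrument(code, date.fromisoformat(start), date.fromisoformat(end)))
    return result
```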
Is class CSRank(UnaryOperator) the operator for generating cross-sectional factors? Can it be used in qlib?
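As a generic illustration of what a cross-sectional operator computes (not alphagen's CSRank implementation, and not a qlib operator): on each day, values are ranked across all stocks, so the output depends on the whole cross-section rather than on one stock's own history. This is why such an operator needs the data for all instruments at once. The sketch uses percentile ranks in (0, 1] and breaks ties by position; cs_rank is a hypothetical name.

```python
from typing import List, Sequence


def cs_rank(values: Sequence[Sequence[float]]) -> List[List[float]]:
    """Cross-sectional percentile rank.

    values[d][i] is the value for stock i on day d. Each day is ranked
    independently across stocks; ranks lie in (0, 1], ties keep input order.
    """
    result = []
    for row in values:
        n = len(row)
        order = sorted(range(n), key=lambda i: row[i])  # stable sort
        ranks = [0.0] * n
        for position, stock in enumerate(order):
            ranks[stock] = (position + 1) / n
        result.append(ranks)
    return result
```

For example, cs_rank([[3.0, 1.0, 2.0]]) returns [[1.0, 1/3, 2/3]]: the largest value on that day gets rank 1.0 regardless of its absolute level.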
I used the command python train_maskable_ppo.py --seed=5 --pool=101 --code='csi300' --step=100
and in the end only 3 factors were generated...
I found the interesting structure of alphagen.models.model.ExpressionGenerator, but I didn't find any description or usage of it. What did you use it for, or what would you use it for, in the overall pipeline?