gpquant's Introduction

gpquant

Introduction

gpquant ("genetic programming for quant") is a modification of gplearn, the Python genetic programming package, adapted for factor mining.

Modules

Function

The functions used to compute factors are implemented with the functor class Function: 23 basic functions and 37 time-series functions. Every function is conceptually a scalar function, but because computation is vectorized, both inputs and outputs are vectors.
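A minimal sketch of the functor idea, assuming a hypothetical Function wrapper that stores a name and an arity alongside a NumPy-vectorized callable. The is_ts flag, the ts_mean name, and passing the window length d as an extra argument are illustrative assumptions, not gpquant's actual API.

```python
import numpy as np


class Function:
    """Hypothetical functor: a vectorized callable plus its name and arity."""

    def __init__(self, function, name, arity, is_ts=False):
        self.function = function
        self.name = name
        self.arity = arity  # number of series inputs
        self.is_ts = is_ts  # True if the function also takes a window length d

    def __call__(self, *args):
        return self.function(*args)


def _ts_mean(x, d):
    """Rolling mean over the last d observations; the first d-1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    if 0 < d <= len(x):
        csum = np.cumsum(np.insert(x, 0, 0.0))
        out[d - 1:] = (csum[d:] - csum[:-d]) / d
    return out


# One basic function and one time-series function, both operating on whole vectors
add = Function(np.add, "add", arity=2)
ts_mean = Function(_ts_mean, "ts_mean", arity=1, is_ts=True)
```

Called on equal-length arrays, add(np.array([1, 2]), np.array([3, 4])) returns array([4, 6]), and ts_mean(np.array([1.0, 2.0, 3.0, 4.0]), 2) returns [nan, 1.5, 2.5, 3.5].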

Fitness

Fitness metrics are implemented with the functor class Fitness, which provides several fitness functions; the main one used is the Sharpe ratio ("sharpe_ratio").
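As an illustration of a Sharpe-ratio fitness, here is a sketch that scores an asset (equity) curve. The annualization factor of 252 periods per year and the zero default risk-free rate are assumptions; gpquant's own sharpe_ratio may be defined differently.

```python
import numpy as np


def sharpe_ratio(asset, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio of an equity curve (hypothetical sketch).

    asset: 1-D sequence of portfolio values over time.
    risk_free: per-period risk-free return (assumed 0 by default).
    """
    asset = np.asarray(asset, dtype=float)
    returns = np.diff(asset) / asset[:-1]  # simple per-period returns
    excess = returns - risk_free
    std = excess.std(ddof=1)
    if std == 0 or np.isnan(std):
        return np.nan  # undefined for a flat or too-short curve
    return np.sqrt(periods_per_year) * excess.mean() / std
```

A higher value means more return per unit of volatility, so a search using this fitness maximizes it.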

Backtester

Backtester is a vectorized factor-backtesting framework. It first applies a user-defined strategy function to turn the incoming "factor" into a "signal", then applies a signal-processing function to turn the signal into an "asset" series, on which the backtest is evaluated. Both steps are combined in the functor class Backtester.
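The two-step flow can be sketched as follows. The sign-based strategy, the next-bar holding rule, and all function names here are hypothetical stand-ins chosen for illustration; they show the factor → signal → asset pipeline, not gpquant's actual strategy or signal-processing functions.

```python
import numpy as np
import pandas as pd


def sign_strategy(factor: pd.Series) -> pd.Series:
    """Hypothetical strategy: factor -> signal (+1 long, -1 short, 0 flat)."""
    return np.sign(factor)


def hold_next_bar(signal: pd.Series, close: pd.Series, init_cash: float = 1.0) -> pd.Series:
    """Hypothetical signal processor: signal -> asset curve.

    The signal observed at bar t is held over bar t+1 to avoid look-ahead.
    """
    ret = close.pct_change().fillna(0.0)
    position = signal.shift(1).fillna(0.0)
    return init_cash * (1.0 + position * ret).cumprod()


class Backtester:
    """Functor combining a strategy and a signal processor."""

    def __init__(self, strategy, processor):
        self.strategy = strategy
        self.processor = processor

    def __call__(self, factor: pd.Series, close: pd.Series) -> pd.Series:
        signal = self.strategy(factor)
        return self.processor(signal, close)


bt = Backtester(sign_strategy, hold_next_bar)
close = pd.Series([100.0, 110.0, 99.0, 108.9])
factor = pd.Series([1.0, -1.0, 1.0, 0.5])
asset = bt(factor, close)
```

With these inputs the positions held are [0, 1, -1, 1] and the asset curve is [1.0, 1.1, 1.21, 1.331]. Because both steps are plain vectorized operations on whole series, swapping in a different strategy only means passing a different callable to Backtester.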

SyntaxTree

Each factor's formula is written in prefix notation and represented as a formula tree, SyntaxTree. Each tree represents one factor and is composed of Node objects; each Node stores its own data, its parent node, and its child nodes. A Node's data can be a Function, a variable, a constant, or a time-series constant.
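A minimal sketch of a prefix-notation tree, assuming a tiny hypothetical function table (FUNCS) and a recursive builder; gpquant's actual Node and SyntaxTree classes carry more information. It builds the tree for the test formula used in the Usage section and evaluates it on named variables.

```python
import numpy as np

# Hypothetical function table: name -> (arity, vectorized callable)
FUNCS = {
    "add": (2, np.add),
    "sub": (2, np.subtract),
    "mul": (2, np.multiply),
    "square": (1, np.square),
}


class Node:
    def __init__(self, data, parent=None):
        self.data = data      # function name, variable name, or numeric constant
        self.parent = parent  # None for the root
        self.children = []


def build(tokens):
    """Build a tree from a prefix-notation token list."""
    it = iter(tokens)

    def _build(parent):
        tok = next(it)
        node = Node(tok, parent)
        if isinstance(tok, str) and tok in FUNCS:
            # a function node consumes as many following subtrees as its arity
            node.children = [_build(node) for _ in range(FUNCS[tok][0])]
        return node

    return _build(None)


def evaluate(node, X):
    """Evaluate the tree on a dict mapping variable names to arrays."""
    d = node.data
    if isinstance(d, str) and d in FUNCS:
        return FUNCS[d][1](*[evaluate(c, X) for c in node.children])
    if isinstance(d, str):
        return np.asarray(X[d], dtype=float)  # variable
    return float(d)  # constant, broadcast by NumPy


# X0^2 - X1^2 + X1 - 1 in prefix notation:
# sub(add(sub(square(X0), square(X1)), X1), 1)
tokens = ["sub", "add", "sub", "square", "X0", "square", "X1", "X1", 1.0]
tree = build(tokens)
```

For example, evaluate(tree, {"X0": np.array([0.5]), "X1": np.array([0.2])}) gives [-0.59], matching 0.25 - 0.04 + 0.2 - 1.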

A formula tree can undergo crossover (crossover), subtree mutation (subtree_mutate), hoist mutation (hoist_mutate), point mutation (point_mutate), or reproduction (reproduce); see gplearn for the underlying logic.
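The gplearn-style operators work on the flattened prefix form of a tree, where a subtree is a contiguous slice whose end can be found by counting arities. The sketch below shows crossover and hoist mutation on such lists; the ARITY table and the uniform choice of nodes are simplifying assumptions (gplearn, for instance, biases node selection toward functions).

```python
import random

# Hypothetical arity table; anything not listed (variables, constants) is a leaf
ARITY = {"add": 2, "sub": 2, "mul": 2, "div": 2, "square": 1}


def subtree_end(program, start):
    """Return the end index (exclusive) of the subtree rooted at program[start]."""
    pending = 1  # number of subtrees still to be consumed
    end = start
    while pending > 0:
        pending += ARITY.get(program[end], 0) - 1
        end += 1
    return end


def crossover(parent, donor, rng):
    """Replace a random subtree of parent with a random subtree of donor."""
    s = rng.randrange(len(parent))
    e = subtree_end(parent, s)
    ds = rng.randrange(len(donor))
    de = subtree_end(donor, ds)
    return parent[:s] + donor[ds:de] + parent[e:]


def hoist_mutation(program, rng):
    """Pick a subtree, then one of its own subtrees, and hoist the latter into its place."""
    s = rng.randrange(len(program))
    e = subtree_end(program, s)
    sub = program[s:e]
    hs = rng.randrange(len(sub))
    he = subtree_end(sub, hs)
    return program[:s] + sub[hs:he] + program[e:]
```

Hoist mutation never makes a program longer, which is why it is useful for controlling bloat; crossover can grow or shrink it.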

SymbolicRegressor

SymbolicRegressor is the symbolic regression class. Factor mining in gpquant is essentially a symbolic regression problem solved with a genetic algorithm, and this class defines the parameters of the evolutionary process, such as the population size (population_size) and the number of generations (generations).
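Two pieces of the evolutionary loop that these parameters control can be sketched as follows: tournament selection (tournament_size) and choosing a genetic operation from p_crossover, p_subtree_mutate, p_hoist_mutate and p_point_mutate, with the leftover probability going to reproduction as in gplearn. This is an illustrative sketch, not gpquant's implementation.

```python
import random


def tournament(fitness, tournament_size, rng, greater_is_better=False):
    """Return the index of the best of tournament_size randomly drawn programs."""
    contenders = rng.sample(range(len(fitness)), tournament_size)

    def key(i):
        return fitness[i]

    return max(contenders, key=key) if greater_is_better else min(contenders, key=key)


def choose_operation(rng, p_crossover=0.7, p_subtree_mutate=0.1,
                     p_hoist_mutate=0.1, p_point_mutate=0.05):
    """Pick a genetic operation; whatever probability is left goes to reproduction."""
    r = rng.random()
    for name, p in (("crossover", p_crossover),
                    ("subtree_mutate", p_subtree_mutate),
                    ("hoist_mutate", p_hoist_mutate),
                    ("point_mutate", p_point_mutate)):
        if r < p:
            return name
        r -= p
    return "reproduce"
```

With the probabilities from the Usage example, reproduction receives 1 - (0.7 + 0.1 + 0.1 + 0.05) = 0.05. For an error metric such as mean absolute error, lower fitness is better, hence greater_is_better=False by default.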

Usage

Import

Install the gpquant package (pip install gpquant) and import the SymbolicRegressor class:

from gpquant.SymbolicRegressor import SymbolicRegressor

Test

As in the gplearn example, running symbolic regression of $y=X_0^2 - X_1^2 + X_1 - 1$ on $X_0$ and $X_1$ finds the correct expression at around the 9th generation.

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.utils import check_random_state
from gpquant.SymbolicRegressor import SymbolicRegressor


# Step 1: plot the ground-truth surface y = X0^2 - X1^2 + X1 - 1 on a grid over [-1, 1)
x0 = np.arange(-1, 1, 1 / 10.0)
x1 = np.arange(-1, 1, 1 / 10.0)
x0, x1 = np.meshgrid(x0, x1)
y_truth = x0**2 - x1**2 + x1 - 1

# Figure.gca(projection=...) was removed in newer Matplotlib; create the 3D axes with add_subplot
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.set_xlim(-1, 1)
ax.set_ylim(-1, 1)
surf = ax.plot_surface(x0, x1, y_truth, rstride=1, cstride=1, color="green", alpha=0.5)
plt.show()

# Step 2: draw 50 random training and 50 random testing samples from the same function
rng = check_random_state(0)

# training samples, as a DataFrame whose column names match variable_set below
X_train = rng.uniform(-1, 1, 100).reshape(50, 2)
y_train = X_train[:, 0] ** 2 - X_train[:, 1] ** 2 + X_train[:, 1] - 1
X_train = pd.DataFrame(X_train, columns=["X0", "X1"])
y_train = pd.Series(y_train)

# testing samples, held out and stored in the same format as the training data
X_test = rng.uniform(-1, 1, 100).reshape(50, 2)
y_test = X_test[:, 0] ** 2 - X_test[:, 1] ** 2 + X_test[:, 1] - 1
X_test = pd.DataFrame(X_test, columns=["X0", "X1"])
y_test = pd.Series(y_test)

# Step 3: configure the genetic search and fit it to the training data
sr = SymbolicRegressor(
    population_size=2000,
    tournament_size=20,
    generations=20,
    stopping_criteria=0.01,
    p_crossover=0.7,
    p_subtree_mutate=0.1,
    p_hoist_mutate=0.1,
    p_point_mutate=0.05,
    init_depth=(6, 8),
    init_method="half and half",
    function_set=["add", "sub", "mul", "div", "square"],
    variable_set=["X0", "X1"],
    const_range=(0, 1),
    ts_const_range=(0, 1),
    build_preference=[0.75, 0.75],
    metric="mean absolute error",
    parsimony_coefficient=0.01,
)

sr.fit(X_train, y_train)

# Step 4: print the best formula tree found
print(sr.best_estimator)

gpquant's People

Contributors

uepg-21

gpquant's Issues

Evolution method

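Hello, expert. Does the roulette-wheel selection in the code simply pick at random each time, without preserving the high-quality formulas from the previous generation for further mutation and evolution?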

import SymbolicRegressor failed

On a CentOS server (CentOS Linux release 7.9.2009) with Python 3.8, importing SymbolicRegressor fails.
Error details:
[baikai@localhost gp]$ python3
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gpquant.SymbolicRegressor import SymbolicRegressor
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/baikai/.local/lib/python3.8/site-packages/gpquant/SymbolicRegressor.py", line 10, in <module>
    from .SyntaxTree import SyntaxTree
  File "/home/baikai/.local/lib/python3.8/site-packages/gpquant/SyntaxTree.py", line 61, in <module>
    class SyntaxTree:
  File "/home/baikai/.local/lib/python3.8/site-packages/gpquant/SyntaxTree.py", line 173, in SyntaxTree
    def __flatten(self) -> list[Node]:
TypeError: 'type' object is not subscriptable
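The error comes from the annotation list[Node]: subscripting the built-in list (PEP 585) is only supported at runtime from Python 3.9, and on 3.8 the return annotation is evaluated when the class body runs. Upgrading to Python 3.9 or later avoids it; alternatively, the annotation can use typing.List, or the module can start with from __future__ import annotations so annotations are not evaluated. The sketch below shows the typing.List variant on a simplified, hypothetical stand-in for SyntaxTree (the real class is larger and its method is named __flatten).

```python
from typing import List


class Node:
    """Minimal hypothetical stand-in for gpquant's Node."""

    def __init__(self, data, children=None):
        self.data = data
        self.children = children or []


class SyntaxTree:
    def __init__(self, root: Node):
        self.root = root

    # "-> list[Node]" raises TypeError at class-definition time on Python 3.8;
    # typing.List[Node] is accepted on older versions as well.
    def _flatten(self) -> List[Node]:
        """Return the nodes in prefix (pre-order) order."""
        out, stack = [], [self.root]
        while stack:
            node = stack.pop()
            out.append(node)
            stack.extend(reversed(node.children))  # leftmost child is visited first
        return out
```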

How to use backtester?

Do I understand correctly that to use metrics such as annual return or the Sharpe ratio, I need to set the parameter 'transformer' to 'quantile'? I am also wondering how to specify 'transformer_kwargs' in order to use that metric. And how do we set up the DataFrame (df) and the signal for the backtester?

Thanks!

No more generations

After passing in the data, there seems to be no iteration, and the time-series functions do not seem to get called.

gpquant does not seem to support multiprocessing

gpquant does not seem to support multiprocessing: I cannot find a parameter like gplearn's 'n_jobs' to control concurrency.
The SymbolicRegressor logic runs inside a single process, which is very slow on large datasets.

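Fitting multiple series

How can I fit multiple stock series? This project seems able to fit only one stock at a time. Is there a trick, such as pandas groupby, for finding factors whose fitness is good across multiple stocks simultaneously?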
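One possible approach, assuming long-format data with hypothetical columns code, factor and ret (the forward return of each bar), is to compute the fitness of a candidate factor separately for each stock with groupby and then aggregate the per-stock scores. This is a sketch of the idea, not a built-in gpquant feature; the sign-based position rule and the mean aggregation are assumptions.

```python
import numpy as np
import pandas as pd


def sharpe(returns: pd.Series, periods_per_year: int = 252) -> float:
    std = returns.std(ddof=1)
    if std == 0 or np.isnan(std):
        return np.nan
    return np.sqrt(periods_per_year) * returns.mean() / std


def panel_fitness(df: pd.DataFrame, agg: str = "mean"):
    """Score one factor on many stocks.

    df: long-format frame with columns 'code', 'factor', 'ret', sorted by time
    within each code; 'ret' is the return over the bar after the factor is observed.
    Returns (aggregated score, per-stock Sharpe ratios).
    """
    strat_ret = np.sign(df["factor"]) * df["ret"]  # long if factor > 0, short if < 0
    per_stock = strat_ret.groupby(df["code"]).apply(sharpe)
    return per_stock.agg(agg), per_stock
```

Aggregating with "median" or "min" instead of "mean" would favor factors that work consistently rather than ones driven by a few stocks.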

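The problem with sharpe_ratio

What does the adjusted Sharpe ratio in the code refer to, and what is the purpose of the adjustment? Would it be more reasonable to simply lower the risk-free rate so that the Sharpe ratio is positive and therefore suitable as an optimization target? Thanks for your reply!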

Orthogonal factors

Hello, may I ask how to generate multiple orthogonal factors that try to fit y with X? It seems that the current implementation only supports fitting one expression for one y, so some manual decomposition of y is required. If you are too busy to implement this, please share how you would tackle it. Here is my idea:

Implement another fitness evaluation function: sharpe - sum(correlation(prev_results)).

Hope to hear from you soon, nice work!

Best regards,
JU PING
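The proposed fitness can be sketched as follows, assuming the candidate factor and each previously accepted factor are aligned arrays of the same length. This sketch uses absolute correlation (so strongly negatively correlated factors are also penalized) and an illustrative weight parameter; both are assumptions layered on the formula in the issue.

```python
import numpy as np


def sharpe(returns, periods_per_year=252):
    returns = np.asarray(returns, dtype=float)
    std = returns.std(ddof=1)
    return np.sqrt(periods_per_year) * returns.mean() / std if std > 0 else np.nan


def orthogonal_fitness(candidate, strat_returns, accepted, weight=1.0):
    """sharpe - weight * sum(|corr(candidate, f)|) over previously accepted factors.

    candidate: factor values of the new expression (1-D array)
    strat_returns: strategy returns produced by the candidate factor
    accepted: list of factor arrays already mined (same length as candidate)
    """
    penalty = sum(abs(np.corrcoef(candidate, f)[0, 1]) for f in accepted)
    return sharpe(strat_returns) - weight * penalty
```

With an empty accepted list this reduces to the plain Sharpe ratio; each factor that duplicates an already accepted one (correlation ±1) costs weight units of fitness.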
