fin's Issues

Use the index as an indirection table toward the actual data rows.

cdef Column _index

It is confusing to have the index as a column separate from the data columns.

A different approach would be to handle the index data as an ordinary column but use the index itself as an indirection to retrieve the rows in a given order. The extra level of indirection should be negligible for the C/Cython code. For Python code, this will require an extra level of __getitem__.
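
To make the idea concrete, here is a minimal, purely illustrative sketch (not the library's actual API) of a column whose values are always reached through the index used as an indirection table:

# Illustration only: a simplified model of the proposed design. `cells` holds
# the data in storage order; `remapping` is the index used as an indirection
# table that yields the rows in the desired order.
class IndirectColumn:
    def __init__(self, cells, remapping):
        self._cells = tuple(cells)
        self._remapping = tuple(remapping)

    def __len__(self):
        return len(self._remapping)

    def __getitem__(self, i):
        # The extra level of __getitem__: resolve the logical row number
        # to a physical row number first.
        return self._cells[self._remapping[i]]

col = IndirectColumn("ABCD", remapping=(3, 1, 0, 2))
print([col[i] for i in range(len(col))])   # ['D', 'B', 'A', 'C']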

`Serie.from_csv()` should warn if a cell does not contain valid data

Serie.from_csv() should warn if a cell does not contain valid data.

Maybe Serie.from_data() should do the same.

fin/fin/seq/serie.pyx

Lines 156 to 182 in d79b4f0

import csv
from fin import datetime

cdef Serie serie_from_csv(iterator, str formats, fieldnames, str delimiter, dict kwargs):
    """
    Create a new serie by iterating over CSV data rows.
    """
    rows = []
    types = parse_types(formats)
    reader = csv.reader(iterator, delimiter=delimiter)
    if fieldnames is not None:
        heading = [str(fieldname) for fieldname in fieldnames]
    else:
        # default to first line
        heading = [fieldname.strip() for fieldname in next(reader)]
    rows = list(reader)
    cols = []
    names = []
    for name, tps, col in zip(heading, types, zip(*rows)):
        names.append(name)
        cols.append(tps.parse_string_sequence(col))
    result = serie_from_data(cols, names, types, kwargs)
    # if select:
    #     result = result.select(*select)
    return result

Optionally, we may consider adding a keyword parameter to specify the authorized "n/a" values that will be converted to None/NaN.
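
As a rough sketch of the conversion side (the `na_values` parameter name and the helper below are hypothetical, not part of the current API):

# Hypothetical helper, for illustration only.
DEFAULT_NA_VALUES = {"", "n/a", "na", "null", "-"}

def clean_cell(cell, na_values=DEFAULT_NA_VALUES):
    """Return None for an authorized "n/a" marker, the stripped cell otherwise."""
    stripped = cell.strip()
    return None if stripped.lower() in na_values else stripped

row = ["2023-07-17", "286.630005", "n/a", "131569600"]
print([clean_cell(cell) for cell in row])
# ['2023-07-17', '286.630005', None, '131569600']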

Consider typing columns

Since the addition of formatters, columns are implicitly typed. However, this may lead to implementation problems like those in `serie_from_csv`, where the formatter-related code looks like a hack.

Consider explicitly typing the columns instead. The exact formatter, if not provided, would then be inferred from the type and possibly the data (e.g., to auto-detect the precision of floating-point numbers).

Run code snippets in tests

We should change the Makefile to run the code snippets in docs/snippets and check that their output is unchanged.
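
A possible shape for that check, sketched as a small Python driver the Makefile could invoke; the docs/snippets layout and the *.expected naming convention are assumptions:

# Sketch only: assumes each snippet docs/snippets/foo.py has a companion
# docs/snippets/foo.expected file holding its reference output.
import pathlib
import subprocess
import sys

def check_snippets(root="docs/snippets"):
    failures = 0
    for snippet in sorted(pathlib.Path(root).glob("*.py")):
        expected = snippet.with_suffix(".expected").read_text()
        actual = subprocess.run(
            [sys.executable, str(snippet)],
            capture_output=True, text=True, check=True,
        ).stdout
        if actual != expected:
            failures += 1
            print(f"FAIL {snippet}")
    return failures

if __name__ == "__main__":
    sys.exit(1 if check_snippets() else 0)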

Make it easy to calculate values on a complete column.

We should make it easy to calculate values on a complete column.
Currently, if you want the mean of a column in a series, you have to write:

from fin.seq.serie import Serie
from fin.seq import fc, ag

ser = Serie.from_csv_file(
        "tests/_fixtures/MCD-20200103-20230103.csv",
        format="dnnnnni"
    ).group_by(
        fc.constant(True),
        (ag.first, "Date"),
        (ag.avg, "Open", "Close"),
    )

print(ser)
avg_open = ser["Open"].columns[0][-1]
print(avg_open)

We can clearly do better. This may imply:

  • Adding a way to directly access data columns by name (FWIW, indices are stored independently from data columns)
  • Adding a way to apply a function directly to a whole column.

We may leverage aggregate functions for that.
Ideally, aggregate functions would also be usable as window functions. This would be less efficient than specifically designed code, but it would avoid code duplication.

Currently, the problem is that, for efficiency reasons, aggregate functions are designed to apply to a set of columns rather than to individual columns. Notice the `for col in cols` list comprehension in the code below:

fin/fin/seq/ag/core.py

Lines 25 to 30 in 6ba6041

class _Avg(AggregateFunction):
    def type_for(self, column):
        return coltypes.Float()

    def __call__(self, *cols):
        return [sum(col)/len(col) for col in cols]
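
As a standalone illustration of the aggregate-as-window-function idea (plain Python lists rather than the library's Column/AggregateFunction types), a generic adapter could reuse any aggregate over a sliding window:

# Standalone sketch: `agg` is any function mapping a sequence to a scalar
# (e.g. an average); `as_window` reuses it over a sliding n-row window,
# padding the first n-1 rows with None.
def as_window(agg, n):
    def _window(values):
        result = [None] * (n - 1)
        for stop in range(n, len(values) + 1):
            result.append(agg(values[stop - n:stop]))
        return result
    return _window

avg = lambda col: sum(col) / len(col)
print(as_window(avg, 3)([1, 2, 3, 4, 5]))
# [None, None, 2.0, 3.0, 4.0]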

Check if we can return a t-expr from a function

Check if we can return a t-expr from a function. This would make it possible to define high-level functions from lower-level blocks:

def sma(rowcount, col):
    return (div, (sum, col), rowcount)

Is this a desirable feature?

Serie to string conversion should default to tabular mode

fin/README.md

Lines 52 to 56 in 1d7dbf8

Date, Open, High, Low, Close, Adj Close, Volume
2023-07-17, 286.630005, 292.230011, 283.570007, 290.380005, 290.380005, 131569600
2023-07-18, 290.149994, 295.26001, 286.01001, 293.339996, 293.339996, 112434700
2023-07-19, 296.040009, 299.290009, 289.519989, 291.26001, 291.26001, 142355400
2023-07-20, 279.559998, 280.929993, 261.200012, 262.899994, 262.899994, 175158300

The default output format should be tabular, not csv.

One option would be to add (if feasible?) a second parameter to the Serie.__str__ method.

Auto-normalize solver's constraints

With both RandomSolver and ParticleSwarmSolver, when an equilibrium function gains significantly more weight than the others, the constraint solver fails to find an optimal solution.

We should provide a way to roughly normalize the different constraints so that they evaluate close to the [-1, +1] range.
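
One rough, illustrative approach (names and sampling strategy are assumptions, not the solver's API): estimate a typical magnitude for each equilibrium function by sampling its domains, then divide by it.

# Standalone sketch of the normalization idea.
import random

def normalize(fct, domains, samples=100):
    """Wrap `fct` so its values roughly fall within [-1, +1] over `domains`."""
    scale = 0.0
    for _ in range(samples):
        point = [random.uniform(lo, hi) for lo, hi in domains]
        scale = max(scale, abs(fct(*point)))
    scale = scale or 1.0
    return lambda *args: fct(*args) / scale

# A constraint whose raw values are orders of magnitude larger than the
# others would otherwise dominate the overall score.
f = lambda x, y: 1000 * (x - y)
g = normalize(f, [(0, 1), (0, 1)])
print(g(0.9, 0.1))   # roughly within [-1, +1] instead of ~800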

Cythonize `column.Column`

One path to improve performance is to use Cython to perform calculations natively, without the overhead of Python float objects.

As an experiment, we may try to implement column.Column with Cython to store values as a native array of float.
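
A minimal Cython sketch of what the experiment could look like (illustrative only; this is not the actual column.Column API):

# cython: language_level=3
# Illustrative sketch: the values live in a C array of doubles, so computations
# can run without boxing every value as a Python float.
from libc.stdlib cimport malloc, free

cdef class FloatColumn:
    cdef double* _data
    cdef Py_ssize_t _length

    def __cinit__(self, values):
        self._length = len(values)
        self._data = <double*>malloc(self._length * sizeof(double))
        if self._data == NULL:
            raise MemoryError()
        cdef Py_ssize_t i
        for i in range(self._length):
            self._data[i] = values[i]

    def __dealloc__(self):
        free(self._data)

    def __len__(self):
        return self._length

    def __getitem__(self, Py_ssize_t i):
        if not 0 <= i < self._length:
            raise IndexError(i)
        return self._data[i]

    def sum(self):
        # Native loop: no Python float object is created inside the loop.
        cdef double acc = 0.0
        cdef Py_ssize_t i
        for i in range(self._length):
            acc += self._data[i]
        return acc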

The ParticleSwarmSolver seems to fail to find the best solution when constraints are unsolvable

When using unsolvable constraints, we might hope for the solver to find the "best" answer in the domain range.
Here, we obtain many zero values:

Score 0.0
Maximum loss        : 0.0
Take profit         : 0.0
Entry price         : 0.0
Stop price          : 0.0
Quantity            : 8.204826221431059

The example above was produced by the following code:

"""
Evaluate a position size, stop price, and take profit from a maximum loss
and entry price.

Usage:
    PYTHONPATH="$PWD" python3 examples/fin/model/pricing.py
"""
from fin.model.complexmodel import ComplexModel

UNKNOWN = (0, 1000)
QTY = (1,10)
ENTRY_PRICE = 0.9024
STOP_PRICE = 0.8614
TAKE_PROFIT = 0.9498
MAX_LOSS = 21

def fees(price):
    return price*0.1/100

def balance(debit, credit):
    return (credit-fees(credit))-(debit+fees(debit))

model = ComplexModel()
eq1 = model.register(
        lambda entry, stop, qty, maxloss : balance(entry*qty, stop*qty) - -maxloss,
        dict(name="entry", description="Entry price", domain=ENTRY_PRICE),
        dict(name="stop", description="Stop price", domain=STOP_PRICE),
        dict(name="qty", description="Quantity", domain=QTY),
        dict(name="maxloss", description="Maximum loss", domain=MAX_LOSS),
    )
eq2 = model.register(
        lambda entry, stop, qty, tp : balance(entry*qty, stop*qty)*2 + balance(entry*qty, tp*qty),
        dict(name="entry", description="Entry price", domain=ENTRY_PRICE),
        dict(name="stop", description="Stop price", domain=STOP_PRICE),
        dict(name="qty", description="Quantity", domain=QTY),
        dict(name="tp", description="Take profit", domain=TAKE_PROFIT),
    )

model.bind(eq1, "entry", eq2, "entry")
model.bind(eq1, "stop", eq2, "stop")
model.bind(eq1, "qty", eq2, "qty")

params, domains, eqs = model.export()
# from pprint import pprint
# pprint(params)
# pprint(domains)
# pprint(eqs)

from fin.model.solvers import ParticleSwarmSolver
solver = ParticleSwarmSolver(5,300)
score, result = solver.solve(domains, eqs)

print(f"Score {score}")
for param, value in zip(params, result):
    print(f"{param['description']:20s}: {value}")

XY plots use the row number instead of the actual X value

element = _GNUPlotDataElement(kind, ["(column(0))", y])

This was designed that way because of dates: when plotting financial charts, the usual practice is to ignore non-trading days (weekends, holidays, ...).

Ideas:

  • Add an option in plots to choose between row number and x values
  • Require the user to explicitly plot against row number when required

Distinguish between `from_sequence` and `import`

fin/fin/seq/column.pyx

Lines 384 to 392 in 16ec607

@staticmethod
def from_sequence(sequence, *, convert=True, **kwargs):
    """
    Create a Column from a sequence of Python objects.
    """
    cdef Column column = Column(**kwargs)
    column._py_values = column._type.from_sequence(sequence) if convert else tuple(sequence)
    return column

The Column.from_sequence method seems to have different purposes depending on the value of the convert parameter.
We may distinguish between from_sequence, which creates a column from Python objects, and import, which creates a column from the textual representation of the values.

Implement the `drawdown` function

We should implement the drawdown function.

A strategy suffers a drawdown whenever it has lost money recently. A drawdown at a given time t is defined as the difference between the current equity value (assuming no redemption or cash infusion) of the portfolio and the global maximum of the equity curve occurring on or before time t. The maximum drawdown is the difference between the global maximum of the equity curve with the global minimum of the curve after the occurrence of the global maximum (time order matters here: the global minimum must occur later than the global maximum).
— Quantitative Trading (Chan), p. 21
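
Following that definition, a standalone sketch (plain Python lists; the in-library version would follow the column conventions):

# Drawdown at time t = running maximum of the equity curve minus the current
# equity value; the maximum drawdown is the largest such gap.
def drawdown(equity):
    result = []
    peak = float("-inf")
    for value in equity:
        peak = max(peak, value)
        result.append(peak - value)
    return result

equity = [100, 110, 105, 120, 90, 95]
dd = drawdown(equity)
print(dd)        # [0, 0, 5, 0, 30, 25]
print(max(dd))   # 30 -> maximum drawdown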

Consider a custom container for storing column's Python values

Consider a custom container for storing column's Python values.

Column's data storage has unique requirements that are not completely satisfied by the standard tuple type. We may consider a custom container with the following features (see the sketch after this list):

  • Immutability
  • Allow zero-copy row remapping/range selection
  • Direct access to elements by index
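
For illustration only, a simplified model of such a container (not a proposed implementation): the backing tuple is immutable and shared, and remapping or range selection only builds a new index mapping, so no cell is copied.

class ColumnStore:
    __slots__ = ("_cells", "_mapping")

    def __init__(self, cells, mapping=None):
        self._cells = tuple(cells)            # immutable backing storage
        self._mapping = range(len(self._cells)) if mapping is None else mapping

    def __len__(self):
        return len(self._mapping)

    def __getitem__(self, i):                 # direct access by index
        return self._cells[self._mapping[i]]

    def remap(self, mapping):                 # zero-copy row remapping
        return ColumnStore(self._cells, mapping)

    def select(self, start, stop):            # zero-copy range selection
        return ColumnStore(self._cells, self._mapping[start:stop])

store = ColumnStore("ABCDE")
view = store.remap((4, 2, 0)).select(0, 2)
print([view[i] for i in range(len(view))])    # ['E', 'C']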

Update README

Update the README to add the following:

  • dependencies
  • example of download + calculation
  • example of plot

Implement `Serie.group_by` and aggregate functions

Implement Serie.group_by and aggregate functions.

Note that the group_by function has different semantics from SQL's: here, rows are grouped by consecutive sequences of values matching a condition.

Required by #27.

Possible syntax:

serie.group_by(<cond>, <aggregate fct, ...>)

Where <cond> is an arbitrary column expression (possibly a single column name) and <aggregate fct, ...> is a non-empty sequence of aggregate functions.

Example:

serie.group_by(
    (fc.gt, "CLOSE", "OPEN"),
    (ag.first, "NAME", "ID", "DATE"),
    (ag.count, (fc.named("COUNT"), "NAME")),
    (ag.avg, (fc.named("AVG PRICE"), "CLOSE")),
)

Implement several volatility estimators

Implement (in Cython) several volatility estimators:

  • Close-to-close
  • Parkinson
  • Garman-Klass

Currently, we only have the Python implementation of the close-to-close volatility:

fin/fin/seq/algo.py

Lines 188 to 212 in 78913aa

def volatility(n, tau=1/252):
    """
    Compute the Annualized Historical Volatility over a n-period window.

    In practice this is the standard deviation of the day-to-day return.

    Parameters:
    n: the number of periods in the window. Often 20 or 21 for daily data
       (corresponding to the number of trading days in one month)
    tau: inverse of the number of periods in one year
    """
    stddev = standard_deviation(n)
    log = math.log
    k = math.sqrt(1/tau)
    vol = lambda stddev : stddev*k
    def _volatility(rowcount, values):
        # 1. Continuously compounded return for each period
        ui = map_change(lambda curr, prev: log(curr/prev))(rowcount, values)
        # 2. Standard deviation
        result = stddev(rowcount, ui)
        # 3. Annualized values
        return map(vol)(rowcount, result)
    return _volatility
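
For reference, standalone Python sketches of the two missing estimators, using their usual closed-form definitions and the same sqrt(1/tau) annualization as above (the Cython versions would follow the library's column conventions):

import math

def parkinson(highs, lows, tau=1/252):
    """Parkinson: sigma^2 = mean(ln(H/L)^2) / (4 ln 2)."""
    n = len(highs)
    var = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows)) / (4 * math.log(2) * n)
    return math.sqrt(var / tau)

def garman_klass(opens, highs, lows, closes, tau=1/252):
    """Garman-Klass: sigma^2 = mean(0.5 ln(H/L)^2 - (2 ln 2 - 1) ln(C/O)^2)."""
    n = len(opens)
    var = sum(
        0.5 * math.log(h / l) ** 2 - (2 * math.log(2) - 1) * math.log(c / o) ** 2
        for o, h, l, c in zip(opens, highs, lows, closes)
    ) / n
    return math.sqrt(var / tau)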

There is no obvious way to specify the aggregate function for the index.

There is no obvious way to specify the aggregate function for the index.

I wonder if implicitly adding the index in select/group_by isn't an error. Maybe we should make the index reference mandatory. Would it be desirable to check that the index values are sorted in ascending order?

Working on that in https://github.com/s-leroux/fin/tree/exp/make-index-selection-explicit

Originally posted by @s-leroux in #30 (comment)

Other things to consider:

  • The index selection is also implicit for join operations.
  • We may introduce a new sort predicate to change the index.
  • OTOH, some use cases seem hard to express at first sight without explicit index selection. For example: "Calculate the average prices per month"

Unicode characters break the test runner on GitHub

Unicode characters break the test runner on GitHub.

The example below was caused by '…' (\u2026) in the heading while printing a Serie instance.

======================================================================
ERROR: test_adj (tests.fin.seq.fc.test_adj.TestAdjustQuote)
<class 'fin.model.solvers.particle.ParticleSwarmSolver'> Use case #1 4.565761130040028e-16 (3.9999999928003915, 7.999999995255896)
<class 'fin.model.solvers.random.RandomSolver'> Use case #1 0.05296359406606747 (4.078120375554134, 8.045150938464865)
<class 'fin.model.solvers.particle.ParticleSwarmSolver'> Use case #2 2.3216148996009935e-18 (2.0000000009396457, 2.9999999995592344)
<class 'fin.model.solvers.random.RandomSolver'> Use case #2 0.02055660096305517 (1.9510313914860744, 3.0927051599569175)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/fin/fin/tests/fin/seq/fc/test_adj.py", line 32, in test_adj
    print(res)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2026' in position 7: ordinal not in range(128)

----------------------------------------------------------------------
Ran 206 tests in 1.765s

Consider adding a tool to extract fundamental data from web pages

There is a lot of free fundamental data available on web pages. We already have experience with a web scraper:
162c944

The code referenced above was written specifically for Investing.com.
Can we have something more generic to parse table-like data?

The requirement is to be able to parse table elements, but possibly also pseudo-tables made of div/span constructs.

Adjust quotes

Yahoo Finance (and other data providers) return the quotes as (open, high, low, close, adj close). Only the close price is adjusted for splits and dividends. We should provide a function to rescale the open, high, and low accordingly.

An option is to express the open, high, and low relative to (unadjusted) close, then apply these ratios to the adjusted close price to find the other adjusted prices.
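
A minimal sketch of that ratio approach (standalone, not the library API):

# Rescale open/high/low by the adjustment factor implied by close vs adj close.
def adjust(open_, high, low, close, adj_close):
    k = adj_close / close
    return open_ * k, high * k, low * k, adj_close

print(adjust(100.0, 110.0, 95.0, 105.0, 52.5))
# (50.0, 55.0, 47.5, 52.5)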

Review the differences between `algo.map()` and `expr.map()`

Review the differences between algo.map() and expr.map():

fin/fin/seq/algo.py

Lines 562 to 573 in 78913aa

def map(fct):
    """
    Map data using a user-provided function.

    Handle None gracefully (as opposed to `builtins.map`)

    Formally, y_i = f(u_i)
    """
    def _map(rowcount, values):
        return Column(None, [fct(x) if x is not None else None for x in values])
    return _map

fin/fin/seq/expr.py

Lines 18 to 19 in 78913aa

def map(f, *, name=None):
    return lambda rowcount, *args : Column(name, [f(*row) for row in zip(*args)])

Check if:

  • we can merge both of them
  • or rename one or the other to avoid confusion

Allow row selection using subscript notation

Currently, subscript notation in Serie only allows the selection of columns:

fin/fin/seq/serie.pyx

Lines 400 to 432 in 16ec607

def __getitem__(self, selector):
    t = type(selector)
    if t is tuple:
        return self.c_get_items(selector)
    elif t is int:
        return self.c_get_item_by_index(selector)
    elif t is str:
        return self.c_get_item_by_name(selector)
    else:
        raise TypeError(f"serie indices cannot be {t}")

cdef Serie c_get_items(self, tuple seq):
    # Should we implement this using a recursive-descend parser to allow nested tuples?
    cdef list columns = []
    cdef object i
    cdef type t
    for i in seq:
        t = type(i)
        if t is int:
            columns.append(serie_get_column_by_index(self, i))
        elif t is str:
            columns.append(serie_get_column_by_name(self, i))
        else:
            raise TypeError(f"serie indices cannot be {t}")
    return serie_bind(self._index, tuple(columns), self.name)

cdef Serie c_get_item_by_index(self, int idx):
    return serie_bind(self._index, (serie_get_column_by_index(self, idx),), self.name)

cdef Serie c_get_item_by_name(self, str name):
    return serie_bind(self._index, (serie_get_column_by_name(self, name),), self.name)

We may extend the supported notation to allow row selection as well. At a minimum, we may allow row selection based on index and index range. Possibly required by #30.

Floating point data columns are sometimes displayed as ternary values

In some circumstances, floating-point number columns are displayed as ternary columns:

from fin.api.yf import Client
from fin.seq import fc

ticker = "^FCHI"
duration = dict(days=5)

client = Client()
data = client.historical_data(ticker, duration)

# Yahoo! Finance has dirty data. Do some clean-up
data = data.where(
        (fc.all, "Open", "High", "Low", "Close", "Adj Close"),
    )

print(data)

Displays:

      Date | Open     | High     | Low      | Close    | Adj Clo… |   Volume
---------- | -------- | -------- | -------- | -------- | -------- | --------
2024-05-03 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 69643700
2024-05-06 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 43781100
2024-05-07 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 58688300
2024-05-08 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |        0

This probably happens when the same column has several cached representations: we do not have a mechanism to select the "most appropriate" one.

Add the `union` operator

Add the union operator to combine two (or more?) series that have the same columns.

This may be useful when data is loaded in chunks and you want to combine them into a single series.

Implement a cache strategy for end-of-day data

We should locally cache EOD data instead of systematically retrieving them from the provider.

Maybe, something along the lines of:

client = Cache(yf.Client())
t = client.historical_data(...)
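
A minimal in-memory sketch of the idea (a real implementation would persist to disk and expire stale entries); it only assumes the wrapped client exposes historical_data(ticker, duration) as in the examples above:

class Cache:
    def __init__(self, client):
        self._client = client
        self._store = {}

    def historical_data(self, ticker, duration):
        # `duration` is a dict (e.g. dict(days=5)); build a hashable key from it.
        key = (ticker, tuple(sorted(duration.items())))
        try:
            return self._store[key]
        except KeyError:
            result = self._store[key] = self._client.historical_data(ticker, duration)
            return result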

Review the *t-expr* semantics

Review the t-expr semantics:

def reval(self, head, *tail):

Specifically:

  • Consider dropping the support for constants in favor of expr.constant
  • Consider accepting native Python sequences as columns (i.e., t.reval([1,2,3,4]) would return one column with the given values instead of four constant columns)

Implement Sharpe ratio

There is a preliminary implementation of the Sharpe ratio. It needs to be fixed.

fin/fin/seq/algo.py

Lines 256 to 276 in 46a55e6

def _basic_sharpe_ratio(rowcount, values):
    s = iter(stddev(rowcount, values))
    result = [None]*(n-1)
    push = result.append
    i = iter(values)
    for _, _, _ in zip(range(n-1), i, s):
        pass
    j = iter(values)
    for x_i, x_j, s_i in zip(i, j, s):
        try:
            ret = (x_i - x_j) # TODO replace by average daily return
            push(ret/s_i)
        except TypeError:
            push(None)
    return Column(f"BSHARPE({n}), {get_column_name(values)}", result)
return _basic_sharpe_ratio

See Risk Management in Trading (Edwards), p. 109.
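
For reference, a standalone sketch of the usual formulation (annualized mean excess return over the annualized standard deviation of returns); the in-library version should follow the column conventions:

import math
import statistics

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    mean = statistics.fmean(excess) * periods_per_year
    stddev = statistics.stdev(excess) * math.sqrt(periods_per_year)
    return mean / stddev

print(sharpe_ratio([0.001, -0.002, 0.003, 0.0005, 0.002]))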

Renaming ternary columns doesn't work

This fails on the print statement:

from fin.seq.serie import Serie
from fin.seq import fc

ser = Serie.create(
        (fc.named("T"), fc.range(10)),
        (fc.named("X"), "T"),
        (fc.named("Y"), fc.all, "X"),
    )
print(ser)

But this works as expected:

from fin.seq.serie import Serie
from fin.seq import fc

ser = Serie.create(
        (fc.named("T"), fc.range(10)),
        (fc.named("X"), "T"),
        (fc.all, "X"),
    )
print(ser)

Tables should know if/how their rows are sorted

Some table functions (like join()) require the table rows to be sorted. We could enforce that.

Idea: when calling table.sort(), the Table instance could remember the key used so we could check that when required.

Infinite recursion with `Table.add_column`

The code below produces an infinite recursion:

t = table.Table(361)
t.add_column((range))
  File "/home/sylvain/fin/fin/seq/table.py", line 322, in reval_item
    return self.reval(*it)
  File "/home/sylvain/fin/fin/seq/table.py", line 304, in reval
    result += self.reval(tail)
  File "/home/sylvain/fin/fin/seq/table.py", line 302, in reval
    result = self.reval_item(head)
  File "/home/sylvain/fin/fin/seq/table.py", line 322, in reval_item
    return self.reval(*it)
  File "/home/sylvain/fin/fin/seq/table.py", line 302, in reval
    result = self.reval_item(head)
  File "/home/sylvain/fin/fin/seq/table.py", line 324, in reval_item
    return [ Column(None, [item]*self._rows) ]
RecursionError: maximum recursion depth exceeded while calling a Python object

It is not obvious if add_column should accept the range function as a valid argument. Nevertheless, it shouldn't produce an infinite recursion.

In some circumstances, the column mini-language requires wrapping the argument in a 1-tuple

(fc.named("PRICE"), fc.add, (fc.constant(model_put['s_0']),), "PC"),

In some circumstances, the column mini-language requires wrapping the argument in a 1-tuple. This is confusing.

In practice, the following code is ambiguous:

(callable, callable, "X")

Using the infix notation, it can be parsed either as callable(callable(), "X") or callable(callable("X")) or even callable(callable()), "X".
