fin's Issues

Use the index as an indirection table toward the actual data rows.

cdef Column _index

It is confusing to have the index as a column separate from the data columns.

A different approach would be to handle the index data as an ordinary column but use the index itself as an indirection to retrieve the rows in a given order. The extra level of indirection should be negligible for the C/Cython code. For Python code, this will require an extra level of __getitem__.
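
To make the idea concrete, here is a minimal, purely illustrative sketch (not the library's actual API) of a column whose values are always reached through the index used as an indirection table:

# Illustration only: a simplified model of the proposed design. `cells` holds
# the data in storage order; `remapping` is the index used as an indirection
# table that yields the rows in the desired order.
class IndirectColumn:
    def __init__(self, cells, remapping):
        self._cells = tuple(cells)
        self._remapping = tuple(remapping)

    def __len__(self):
        return len(self._remapping)

    def __getitem__(self, i):
        # The extra level of __getitem__: resolve the logical row number
        # to a physical row number first.
        return self._cells[self._remapping[i]]

col = IndirectColumn("ABCD", remapping=(3, 1, 0, 2))
print([col[i] for i in range(len(col))])   # ['D', 'B', 'A', 'C']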

`Serie.from_csv()` should warn if a cell does not contain valid data

Serie.from_csv() should warn if a cell does not contain valid data.

Maybe Serie.from_data() should do the same.

fin/fin/seq/serie.pyx

Lines 156 to 182 in d79b4f0

import csv
from fin import datetime

cdef Serie serie_from_csv(iterator, str formats, fieldnames, str delimiter, dict kwargs):
    """
    Create a new serie by iterating over CSV data rows.
    """
    rows = []
    types = parse_types(formats)
    reader = csv.reader(iterator, delimiter=delimiter)
    if fieldnames is not None:
        heading = [str(fieldname) for fieldname in fieldnames]
    else:
        # default to first line
        heading = [fieldname.strip() for fieldname in next(reader)]
    rows = list(reader)
    cols = []
    names = []
    for name, tps, col in zip(heading, types, zip(*rows)):
        names.append(name)
        cols.append(tps.parse_string_sequence(col))
    result = serie_from_data(cols, names, types, kwargs)
    # if select:
    #     result = result.select(*select)
    return result

Optionally, we may consider adding a keyword parameter to specify the authorized "n/a" values that will be converted to None/NaN.
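
As a rough sketch of the conversion side (the `na_values` parameter name and the helper below are hypothetical, not part of the current API):

# Hypothetical helper, for illustration only.
DEFAULT_NA_VALUES = {"", "n/a", "na", "null", "-"}

def clean_cell(cell, na_values=DEFAULT_NA_VALUES):
    """Return None for an authorized "n/a" marker, the stripped cell otherwise."""
    stripped = cell.strip()
    return None if stripped.lower() in na_values else stripped

row = ["2023-07-17", "286.630005", "n/a", "131569600"]
print([clean_cell(cell) for cell in row])
# ['2023-07-17', '286.630005', None, '131569600']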

Consider typing columns

Since the addition of formatters, columns are implicitly typed. However, this may lead to implementation problems like those in `serie_from_csv`, where the formatter-related code looks like a hack.

Consider explicitly typing the columns instead. The exact formatter, if not provided, would then be inferred from the type and possibly the data (e.g., to auto-detect the precision of floating-point numbers).

Run code snippets in tests

We should change the Makefile to run the code snippets in docs/snippets and check that their output is unchanged.
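
A possible shape for that check, sketched as a small Python driver the Makefile could invoke; the docs/snippets layout and the *.expected naming convention are assumptions:

# Sketch only: assumes each snippet docs/snippets/foo.py has a companion
# docs/snippets/foo.expected file holding its reference output.
import pathlib
import subprocess
import sys

def check_snippets(root="docs/snippets"):
    failures = 0
    for snippet in sorted(pathlib.Path(root).glob("*.py")):
        expected = snippet.with_suffix(".expected").read_text()
        actual = subprocess.run(
            [sys.executable, str(snippet)],
            capture_output=True, text=True, check=True,
        ).stdout
        if actual != expected:
            failures += 1
            print(f"FAIL {snippet}")
    return failures

if __name__ == "__main__":
    sys.exit(1 if check_snippets() else 0)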

Make it easy to calculate values on a complete column.

We should make it easy to calculate values on a complete column.
Currently, if you want the mean of a column in a series, you have to write:

from fin.seq.serie import Serie
from fin.seq import fc, ag

ser = Serie.from_csv_file(
        "tests/_fixtures/MCD-20200103-20230103.csv",
        format="dnnnnni"
    ).group_by(
        fc.constant(True),
        (ag.first, "Date"),
        (ag.avg, "Open", "Close"),
    )

print(ser)
avg_open = ser["Open"].columns[0][-1]
print(avg_open)

We can clearly do better. This may imply:

  • Adding a way to directly access data columns by name (FWIW, indices are stored independently from data columns)
  • Adding a way to apply a function directly to a whole column.

We may leverage aggregate functions for that.
Ideally, aggregate functions would also be usable as window functions. This would be less efficient than specifically designed code, but it would avoid code duplication.

Currently, the problem is that, for efficiency reasons, aggregate functions are designed to apply to a set of columns rather than to individual columns. Notice the `for col in cols` list comprehension in the code below:

fin/fin/seq/ag/core.py

Lines 25 to 30 in 6ba6041

class _Avg(AggregateFunction):
    def type_for(self, column):
        return coltypes.Float()

    def __call__(self, *cols):
        return [sum(col)/len(col) for col in cols]
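
As a standalone illustration of the aggregate-as-window-function idea (plain Python lists rather than the library's Column/AggregateFunction types), a generic adapter could reuse any aggregate over a sliding window:

# Standalone sketch: `agg` is any function mapping a sequence to a scalar
# (e.g. an average); `as_window` reuses it over a sliding n-row window,
# padding the first n-1 rows with None.
def as_window(agg, n):
    def _window(values):
        result = [None] * (n - 1)
        for stop in range(n, len(values) + 1):
            result.append(agg(values[stop - n:stop]))
        return result
    return _window

avg = lambda col: sum(col) / len(col)
print(as_window(avg, 3)([1, 2, 3, 4, 5]))
# [None, None, 2.0, 3.0, 4.0]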

Check if we can return a t-expr from a function

Check if we can return a t-expr from a function. This would make it possible to define high-level functions from lower-level blocks:

def sma(rowcount, col):
    return (div, (sum, col), rowcount)

Is this a desirable feature?

Serie to string conversion should default to tabular mode

fin/README.md

Lines 52 to 56 in 1d7dbf8

Date, Open, High, Low, Close, Adj Close, Volume
2023-07-17, 286.630005, 292.230011, 283.570007, 290.380005, 290.380005, 131569600
2023-07-18, 290.149994, 295.26001, 286.01001, 293.339996, 293.339996, 112434700
2023-07-19, 296.040009, 299.290009, 289.519989, 291.26001, 291.26001, 142355400
2023-07-20, 279.559998, 280.929993, 261.200012, 262.899994, 262.899994, 175158300

The default output format should be tabular, not csv.

One option would be to add (if feasible?) a second parameter to the Serie.__str__ method.

Auto-normalize solver's constraints

With both RandomSolver and ParticleSwarmSolver, when an equilibrium function gains significantly more weight than the others, the constraint solver fails to find an optimal solution.

We should provide a way to roughly normalize the different constraints so that they evaluate close to the [-1, +1] range.
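
One rough, illustrative approach (names and sampling strategy are assumptions, not the solver's API): estimate a typical magnitude for each equilibrium function by sampling its domains, then divide by it.

# Standalone sketch of the normalization idea.
import random

def normalize(fct, domains, samples=100):
    """Wrap `fct` so its values roughly fall within [-1, +1] over `domains`."""
    scale = 0.0
    for _ in range(samples):
        point = [random.uniform(lo, hi) for lo, hi in domains]
        scale = max(scale, abs(fct(*point)))
    scale = scale or 1.0
    return lambda *args: fct(*args) / scale

# A constraint whose raw values are orders of magnitude larger than the
# others would otherwise dominate the overall score.
f = lambda x, y: 1000 * (x - y)
g = normalize(f, [(0, 1), (0, 1)])
print(g(0.9, 0.1))   # roughly within [-1, +1] instead of ~800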

Cythonize `column.Column`

One path to improve performance is to use Cython to perform calculations natively, without the overhead of Python float objects.

As an experiment, we may try to implement column.Column with Cython to store values as a native array of float.
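
A minimal Cython sketch of what the experiment could look like (illustrative only; this is not the actual column.Column API):

# cython: language_level=3
# Illustrative sketch: the values live in a C array of doubles, so computations
# can run without boxing every value as a Python float.
from libc.stdlib cimport malloc, free

cdef class FloatColumn:
    cdef double* _data
    cdef Py_ssize_t _length

    def __cinit__(self, values):
        self._length = len(values)
        self._data = <double*>malloc(self._length * sizeof(double))
        if self._data == NULL:
            raise MemoryError()
        cdef Py_ssize_t i
        for i in range(self._length):
            self._data[i] = values[i]

    def __dealloc__(self):
        free(self._data)

    def __len__(self):
        return self._length

    def __getitem__(self, Py_ssize_t i):
        if not 0 <= i < self._length:
            raise IndexError(i)
        return self._data[i]

    def sum(self):
        # Native loop: no Python float object is created inside the loop.
        cdef double acc = 0.0
        cdef Py_ssize_t i
        for i in range(self._length):
            acc += self._data[i]
        return acc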

The ParticleSwarmSolver seems to fail to find the best solution when constraints are unsolvable

When using unsolvable constraints, we might hope for the solver to find the "best" answer in the domain range.
Here, we obtain many zero values:

Score 0.0
Maximum loss        : 0.0
Take profit         : 0.0
Entry price         : 0.0
Stop price          : 0.0
Quantity            : 8.204826221431059

The example above was produced by the following code:

"""
Evaluate a position size, stop price, and take profit from a maximum loss
and entry price.

Usage:
    PYTHONPATH="$PWD" python3 examples/fin/model/pricing.py
"""
from fin.model.complexmodel import ComplexModel

UNKNOWN = (0, 1000)
QTY = (1,10)
ENTRY_PRICE = 0.9024
STOP_PRICE = 0.8614
TAKE_PROFIT = 0.9498
MAX_LOSS = 21

def fees(price):
    return price*0.1/100

def balance(debit, credit):
    return (credit-fees(credit))-(debit+fees(debit))

model = ComplexModel()
eq1 = model.register(
        lambda entry, stop, qty, maxloss : balance(entry*qty, stop*qty) - -maxloss,
        dict(name="entry", description="Entry price", domain=ENTRY_PRICE),
        dict(name="stop", description="Stop price", domain=STOP_PRICE),
        dict(name="qty", description="Quantity", domain=QTY),
        dict(name="maxloss", description="Maximum loss", domain=MAX_LOSS),
    )
eq2 = model.register(
        lambda entry, stop, qty, tp : balance(entry*qty, stop*qty)*2 + balance(entry*qty, tp*qty),
        dict(name="entry", description="Entry price", domain=ENTRY_PRICE),
        dict(name="stop", description="Stop price", domain=STOP_PRICE),
        dict(name="qty", description="Quantity", domain=QTY),
        dict(name="tp", description="Take profit", domain=TAKE_PROFIT),
    )

model.bind(eq1, "entry", eq2, "entry")
model.bind(eq1, "stop", eq2, "stop")
model.bind(eq1, "qty", eq2, "qty")

params, domains, eqs = model.export()
# from pprint import pprint
# pprint(params)
# pprint(domains)
# pprint(eqs)

from fin.model.solvers import ParticleSwarmSolver
solver = ParticleSwarmSolver(5,300)
score, result = solver.solve(domains, eqs)

print(f"Score {score}")
for param, value in zip(params, result):
    print(f"{param['description']:20s}: {value}")

XY plots use the row number instead of the actual X value

element = _GNUPlotDataElement(kind, ["(column(0))", y])

This was designed that way because of dates: when plotting financial charts, the usual practice is to ignore non-trading days (weekends, holidays, ...).

Ideas:

  • Add an option in plots to choose between row number and x values
  • Require the user to explicitly plot against row number when required

Distinguish between `from_sequence` and `import`

fin/fin/seq/column.pyx

Lines 384 to 392 in 16ec607

@staticmethod
def from_sequence(sequence, *, convert=True, **kwargs):
    """
    Create a Column from a sequence of Python objects.
    """
    cdef Column column = Column(**kwargs)
    column._py_values = column._type.from_sequence(sequence) if convert else tuple(sequence)
    return column

The Column.from_sequence method seems to have different purposes depending on the value of the convert parameter.
We may distinguish between from_sequence, which creates a column from Python objects, and import, which creates a column from the textual representation of the values.

Implement the `drawdown` function

We should implement the drawdown function.

A strategy suffers a drawdown whenever it has lost money recently. A drawdown at a given time t is defined as the difference between the current equity value (assuming no redemption or cash infusion) of the portfolio and the global maximum of the equity curve occurring on or before time t. The maximum drawdown is the difference between the global maximum of the equity curve with the global minimum of the curve after the occurrence of the global maximum (time order matters here: the global minimum must occur later than the global maximum).
— Quantitative Trading (Chan), p. 21
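
Following that definition, a standalone sketch (plain Python lists; the in-library version would follow the column conventions):

# Drawdown at time t = running maximum of the equity curve minus the current
# equity value; the maximum drawdown is the largest such gap.
def drawdown(equity):
    result = []
    peak = float("-inf")
    for value in equity:
        peak = max(peak, value)
        result.append(peak - value)
    return result

equity = [100, 110, 105, 120, 90, 95]
dd = drawdown(equity)
print(dd)        # [0, 0, 5, 0, 30, 25]
print(max(dd))   # 30 -> maximum drawdown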

Consider a custom container for storing column's Python values

Consider a custom container for storing column's Python values.

Column's data storage has unique requirements that are not completely satisfied by the standard tuple type. We may consider a custom container with the following features (see the sketch after this list):

  • Immutability
  • Allow zero-copy row remapping/range selection
  • Direct access to elements by index
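
For illustration only, a simplified model of such a container (not a proposed implementation): the backing tuple is immutable and shared, and remapping or range selection only builds a new index mapping, so no cell is copied.

class ColumnStore:
    __slots__ = ("_cells", "_mapping")

    def __init__(self, cells, mapping=None):
        self._cells = tuple(cells)            # immutable backing storage
        self._mapping = range(len(self._cells)) if mapping is None else mapping

    def __len__(self):
        return len(self._mapping)

    def __getitem__(self, i):                 # direct access by index
        return self._cells[self._mapping[i]]

    def remap(self, mapping):                 # zero-copy row remapping
        return ColumnStore(self._cells, mapping)

    def select(self, start, stop):            # zero-copy range selection
        return ColumnStore(self._cells, self._mapping[start:stop])

store = ColumnStore("ABCDE")
view = store.remap((4, 2, 0)).select(0, 2)
print([view[i] for i in range(len(view))])    # ['E', 'C']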

Update README

Update the README to add the following:

  • dependencies
  • example of download + calculation
  • example of plot

Implement `Serie.group_by` and aggregate functions

Implement Serie.group_by and aggregate functions.

Note that the group_by function has different semantics from SQL's: here, rows are grouped by consecutive sequences of values matching a condition.

Required by #27.

Possible syntax:

serie.group_by(<cond>, <aggregate fct, ...>)

Where <cond> is an arbitrary column expression (possibly a single column name) and <aggregate fct, ...> is a non-empty sequence of aggregate functions.

Example:

serie.group_by(
    (fc.gt, "CLOSE", "OPEN"),
    (ag.first, "NAME", "ID", "DATE"),
    (ag.count, (fc.named("COUNT"), "NAME")),
    (ag.avg, (fc.named("AVG PRICE"), "CLOSE")),
)

Implement several volatility estimators

Implement (in Cython) several volatility estimators:

  • Close-to-close
  • Parkinson
  • Garman-Klass

Currently, we only have the Python implementation of the close-to-close volatility:

fin/fin/seq/algo.py

Lines 188 to 212 in 78913aa

def volatility(n, tau=1/252):
    """
    Compute the Annualized Historical Volatility over a n-period window.

    In practice this is the standard deviation of the day-to-day return.

    Parameters:
    n: the number of periods in the window. Often 20 or 21 for daily data
       (corresponding to the number of trading days in one month)
    tau: inverse of the number of periods in one year
    """
    stddev = standard_deviation(n)
    log = math.log
    k = math.sqrt(1/tau)
    vol = lambda stddev : stddev*k
    def _volatility(rowcount, values):
        # 1. Continuously compounded return for each period
        ui = map_change(lambda curr, prev: log(curr/prev))(rowcount, values)
        # 2. Standard deviation
        result = stddev(rowcount, ui)
        # 3. Annualized values
        return map(vol)(rowcount, result)
    return _volatility
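
For reference, standalone Python sketches of the two missing estimators, using their usual closed-form definitions and the same sqrt(1/tau) annualization as above (the Cython versions would follow the library's column conventions):

import math

def parkinson(highs, lows, tau=1/252):
    """Parkinson: sigma^2 = mean(ln(H/L)^2) / (4 ln 2)."""
    n = len(highs)
    var = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows)) / (4 * math.log(2) * n)
    return math.sqrt(var / tau)

def garman_klass(opens, highs, lows, closes, tau=1/252):
    """Garman-Klass: sigma^2 = mean(0.5 ln(H/L)^2 - (2 ln 2 - 1) ln(C/O)^2)."""
    n = len(opens)
    var = sum(
        0.5 * math.log(h / l) ** 2 - (2 * math.log(2) - 1) * math.log(c / o) ** 2
        for o, h, l, c in zip(opens, highs, lows, closes)
    ) / n
    return math.sqrt(var / tau)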

There is no obvious way to specify the aggregate function for the index.

There is no obvious way to specify the aggregate function for the index.

I wonder if implicitly adding the index in select/group_by isn't an error. Maybe we should make the index reference mandatory. Would it be desirable to check that the index values are sorted in ascending order?

Working on that in https://github.com/s-leroux/fin/tree/exp/make-index-selection-explicit

Originally posted by @s-leroux in #30 (comment)

Other things to consider:

  • The index selection is also implicit for join operations.
  • We may introduce a new sort predicate to change the index.
  • OTOH, some use cases seem hard to express at first sight without explicit index selection. For example: "Calculate the average prices per month"

Unicode characters break the test runner on GitHub

Unicode characters break the test runner on GitHub.

The example below was caused by '…' (\u2026) in the heading while printing a Serie instance.

======================================================================
ERROR: test_adj (tests.fin.seq.fc.test_adj.TestAdjustQuote)
<class 'fin.model.solvers.particle.ParticleSwarmSolver'> Use case #1 4.565761130040028e-16 (3.9999999928003915, 7.999999995255896)
<class 'fin.model.solvers.random.RandomSolver'> Use case #1 0.05296359406606747 (4.078120375554134, 8.045150938464865)
<class 'fin.model.solvers.particle.ParticleSwarmSolver'> Use case #2 2.3216148996009935e-18 (2.0000000009396457, 2.9999999995592344)
<class 'fin.model.solvers.random.RandomSolver'> Use case #2 0.02055660096305517 (1.9510313914860744, 3.0927051599569175)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/fin/fin/tests/fin/seq/fc/test_adj.py", line 32, in test_adj
    print(res)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2026' in position 7: ordinal not in range(128)

----------------------------------------------------------------------
Ran 206 tests in 1.765s

Consider adding a tool to extract fundamental data from web pages

There is a lot of free fundamental data available on web pages. We already have experience with a web scraper:
162c944

The code referenced above was written specifically for Investing.com.
Can we have something more generic to parse table-like data?

The requirement is to be able to parse table elements, but possibly also pseudo-tables made of div/span constructs.

Adjust quotes

Yahoo Finance (and other data providers) return the quotes as (open, high, low, close, adj close). Only the close price is adjusted for splits and dividends. We should provide a function to rescale the open, high, and low accordingly.

An option is to express the open, high, and low relative to (unadjusted) close, then apply these ratios to the adjusted close price to find the other adjusted prices.
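
A minimal sketch of that ratio approach (standalone, not the library API):

# Rescale open/high/low by the adjustment factor implied by close vs adj close.
def adjust(open_, high, low, close, adj_close):
    k = adj_close / close
    return open_ * k, high * k, low * k, adj_close

print(adjust(100.0, 110.0, 95.0, 105.0, 52.5))
# (50.0, 55.0, 47.5, 52.5)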

Review the differences between `algo.map()` and `expr.map()`

Review the differences between algo.map() and expr.map():

fin/fin/seq/algo.py

Lines 562 to 573 in 78913aa

def map(fct):
    """
    Map data using a user-provided function.

    Handle None gracefully (as opposed to `builtins.map`)

    Formally, y_i = f(u_i)
    """
    def _map(rowcount, values):
        return Column(None, [fct(x) if x is not None else None for x in values])
    return _map

fin/fin/seq/expr.py

Lines 18 to 19 in 78913aa

def map(f, *, name=None):
    return lambda rowcount, *args : Column(name, [f(*row) for row in zip(*args)])

Check if:

  • we can merge both of them
  • or rename one or the other to avoid confusion

Allow row selection using subscript notation

Currently, subscript notation in Serie only allows the selection of columns:

fin/fin/seq/serie.pyx

Lines 400 to 432 in 16ec607

def __getitem__(self, selector):
    t = type(selector)
    if t is tuple:
        return self.c_get_items(selector)
    elif t is int:
        return self.c_get_item_by_index(selector)
    elif t is str:
        return self.c_get_item_by_name(selector)
    else:
        raise TypeError(f"serie indices cannot be {t}")

cdef Serie c_get_items(self, tuple seq):
    # Should we implement this using a recursive-descend parser to allow nested tuples?
    cdef list columns = []
    cdef object i
    cdef type t
    for i in seq:
        t = type(i)
        if t is int:
            columns.append(serie_get_column_by_index(self, i))
        elif t is str:
            columns.append(serie_get_column_by_name(self, i))
        else:
            raise TypeError(f"serie indices cannot be {t}")
    return serie_bind(self._index, tuple(columns), self.name)

cdef Serie c_get_item_by_index(self, int idx):
    return serie_bind(self._index, (serie_get_column_by_index(self, idx),), self.name)

cdef Serie c_get_item_by_name(self, str name):
    return serie_bind(self._index, (serie_get_column_by_name(self, name),), self.name)

We may extend the supported notation to allow row selection as well. At a minimum, we may allow row selection based on index and index range. Possibly required by #30.

Floating point data columns are sometimes displayed as ternary values

In some circumstances, floating-point number columns are displayed as ternary columns:

from fin.api.yf import Client
from fin.seq import fc

ticker = "^FCHI"
duration = dict(days=5)

client = Client()
data = client.historical_data(ticker, duration)

# Yahoo! Finance has dirty data. Do some clean-up
data = data.where(
        (fc.all, "Open", "High", "Low", "Close", "Adj Close"),
    )

print(data)

Displays:

      Date | Open     | High     | Low      | Close    | Adj Clo… |   Volume
---------- | -------- | -------- | -------- | -------- | -------- | --------
2024-05-03 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 69643700
2024-05-06 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 43781100
2024-05-07 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 58688300
2024-05-08 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |        0

This probably happens when the same column has several cached representations: we do not have a mechanism to select the "most appropriate" one.

Add the `union` operator

Add the union operator to combine two (or more?) series that have the same columns.

This may be useful when data is loaded in chunks and you want to combine them into a single series.

Implement a cache strategy for end-of-day data

We should locally cache EOD data instead of systematically retrieving them from the provider.

Maybe, something along the lines of:

client = Cache(yf.Client())
t = client.historical_data(...)
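
A minimal in-memory sketch of the idea (a real implementation would persist to disk and expire stale entries); it only assumes the wrapped client exposes historical_data(ticker, duration) as in the examples above:

class Cache:
    def __init__(self, client):
        self._client = client
        self._store = {}

    def historical_data(self, ticker, duration):
        # `duration` is a dict (e.g. dict(days=5)); build a hashable key from it.
        key = (ticker, tuple(sorted(duration.items())))
        try:
            return self._store[key]
        except KeyError:
            result = self._store[key] = self._client.historical_data(ticker, duration)
            return result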

Review the *t-expr* semantics

Review the t-expr semantics:

def reval(self, head, *tail):

Specifically:

  • Consider dropping the support for constants in favor of expr.constant
  • Consider accepting native Python sequences as columns (i.e., t.reval([1,2,3,4]) would return one column with the given values instead of four constant columns)

Implement Sharpe ratio

There is a preliminary implementation of the Sharpe ratio. It needs to be fixed.

fin/fin/seq/algo.py

Lines 256 to 276 in 46a55e6

def _basic_sharpe_ratio(rowcount, values):
    s = iter(stddev(rowcount, values))
    result = [None]*(n-1)
    push = result.append
    i = iter(values)
    for _, _, _ in zip(range(n-1), i, s):
        pass
    j = iter(values)
    for x_i, x_j, s_i in zip(i, j, s):
        try:
            ret = (x_i - x_j) # TODO replace by average daily return
            push(ret/s_i)
        except TypeError:
            push(None)
    return Column(f"BSHARPE({n}), {get_column_name(values)}", result)
return _basic_sharpe_ratio

See Risk Management in Trading (Edwards), p. 109.
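
For reference, a standalone sketch of the usual formulation (annualized mean excess return over the annualized standard deviation of returns); the in-library version should follow the column conventions:

import math
import statistics

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    mean = statistics.fmean(excess) * periods_per_year
    stddev = statistics.stdev(excess) * math.sqrt(periods_per_year)
    return mean / stddev

print(sharpe_ratio([0.001, -0.002, 0.003, 0.0005, 0.002]))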

Renaming ternary columns doesn't work

This fails on the print statement:

from fin.seq.serie import Serie
from fin.seq import fc

ser = Serie.create(
        (fc.named("T"), fc.range(10)),
        (fc.named("X"), "T"),
        (fc.named("Y"), fc.all, "X"),
    )
print(ser)

But this works as expected:

from fin.seq.serie import Serie
from fin.seq import fc

ser = Serie.create(
        (fc.named("T"), fc.range(10)),
        (fc.named("X"), "T"),
        (fc.all, "X"),
    )
print(ser)

Tables should know if/how their rows are sorted

Some table functions (like join()) require the table rows to be sorted. We could enforce that.

Idea: when calling table.sort(), the Table instance could remember the key used so we could check that when required.

Infinite recursion with `Table.add_column`

The code below produces an infinite recursion:

t = table.Table(361)
t.add_column((range))
  File "/home/sylvain/fin/fin/seq/table.py", line 322, in reval_item
    return self.reval(*it)
  File "/home/sylvain/fin/fin/seq/table.py", line 304, in reval
    result += self.reval(tail)
  File "/home/sylvain/fin/fin/seq/table.py", line 302, in reval
    result = self.reval_item(head)
  File "/home/sylvain/fin/fin/seq/table.py", line 322, in reval_item
    return self.reval(*it)
  File "/home/sylvain/fin/fin/seq/table.py", line 302, in reval
    result = self.reval_item(head)
  File "/home/sylvain/fin/fin/seq/table.py", line 324, in reval_item
    return [ Column(None, [item]*self._rows) ]
RecursionError: maximum recursion depth exceeded while calling a Python object

It is not obvious if add_column should accept the range function as a valid argument. Nevertheless, it shouldn't produce an infinite recursion.

In some circumstances, the column mini-language requires wrapping the argument in a 1-tuple

(fc.named("PRICE"), fc.add, (fc.constant(model_put['s_0']),), "PC"),

In some circumstances, the column mini-language requires wrapping the argument in a 1-tuple. This is confusing.

In practice, the following code is ambiguous:

(callable, callable, "X")

Using the infix notation, it can be parsed either as callable(callable(), "X") or callable(callable("X")) or even callable(callable()), "X".
