pberkes / big_o
Python module to estimate big-O time complexity from execution time
License: BSD 3-Clause "New" or "Revised" License
In the zipp project (part of CPython's stdlib), I've used big_o to test the complexity of zipp.CompleteDirs._implied_dirs
. The tests fail intermittently, particularly on slower platforms (macOS, Windows, PyPy), but occasionally even on Ubuntu in CI. Locally, the tests succeed fairly reliably. I'm fairly confident the implementation is linear, yet sometimes the test reports linearithmic or cubic.
I'm wondering if you have any advice on how I might tune the test to be more reliable. Perhaps garbage collection is at play, or perhaps the problem is simply the variability of shared compute resources.
Failing example available: https://github.com/nickbreen/big_o_test
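Not big_o-specific, but one way to probe the garbage-collection hypothesis is to time with the stdlib timeit module, which disables GC during measurement, and to keep only the minimum of several repeats so that runs perturbed by the OS or shared CI hardware are discarded. A minimal sketch (the workload and helper name are illustrative, not part of zipp or big_o):

```python
import timeit

def min_time(func, arg, repeats=30):
    """Return the minimum of several single-shot timings of func(arg).

    timeit disables garbage collection while timing, and taking the
    minimum discards runs inflated by background load on the machine.
    """
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeats, number=1))

# illustrative linear workload, standing in for the function under test:
t_small = min_time(sum, range(10_000))
t_large = min_time(sum, range(1_000_000))
```

If the minima across repeats still fluctuate enough to flip the inferred complexity class, the noise is probably coming from the shared hardware rather than from GC.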
Trying this in a minimal Python 3.5 virtual environment produces errors.
test.py
(abridged)
import unittest

import big_o

class Test(unittest.TestCase):
    def testPerf(self):
        gen = lambda n: big_o.datagen.large_integers(n)
        best, others = big_o.big_o(solution, gen, n_repeats=100)
        self.assertEqual(big_o.complexities.Linear, best)
pytest test.py
Output:
================================================= test session starts =================================================
platform linux -- Python 3.5.2, pytest-3.3.2, py-1.5.2, pluggy-0.6.0
rootdir: /home/nick/Code/big_o_p35, inifile:
collected 0 items / 1 errors
======================================================= ERRORS ========================================================
______________________________________________ ERROR collecting test.py _______________________________________________
ImportError while importing test module '/home/nick/Code/big_o_p35/test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test.py:2: in <module>
import big_o
venv/lib/python3.5/site-packages/big_o/__init__.py:6: in <module>
import datagen
E ImportError: No module named 'datagen'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================== 1 error in 0.08 seconds ===============================================
@joepartridge reported in #12 :
"""
Couldn't figure out how to make [big_o] work with other simple functions: selection sorting, bisection search, fibonacci, and others.
Could you please put some more examples on your README?
"""
One of the tests is fragile and often fails due to random fluctuations, e.g.
F....
======================================================================
FAIL: test_big_o (big_o.test.test_big_o.TestBigO)
----------------------------------------------------------------------
Traceback (most recent call last):
File "big_o/test/test_big_o.py", line 57, in test_big_o
self.assertEqual(class_, res_class.__class__)
AssertionError: <class 'big_o.complexities.Linear'> != <class 'big_o.complexities.Linearithmic'>
----------------------------------------------------------------------
Ran 5 tests in 1.661s
FAILED (failures=1)
Hi,
Thanks for sharing this module.
I am trying to run my custom code and would like to know the time complexity of the following function:
def nxtGreaterQuery(arr, qur):
    for i in range(len(qur)):
        # print(i)
        s = []
        s.append(arr[qur[i]])
        for j in range(qur[i] + 1, len(arr)):
            if s[-1] < arr[j]:
                print(arr[j])
                s.pop()
                break
        if len(s) != 0:
            print(-1)
I call this function as:
nxtGreaterQuery([3,4,2,7,5,8,10,6],[3,6,1])
How can I replicate this using the big_o module? I am stuck at the following code:
import big_o
# How to pass two list as an argument in the following line ???
best, others = big_o.big_o(nxtGreaterQuery,big_o.datagen.range_n,big_o.datagen.range_n, n_measures=100)
print(best)
Thank you,
Saurabh
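One way to handle a two-argument function (a sketch, not an official big_o recipe): big_o passes whatever the data generator returns as the single argument of the measured function, so bundle both lists into one tuple and unpack them in a wrapper. The generator, the wrapper, and the return-instead-of-print variant below are illustrative choices:

```python
import random

def nxtGreaterQuery(arr, qur):
    # variant of the function above that returns its answers
    # instead of printing them, so results can be checked
    out = []
    for q in qur:
        for j in range(q + 1, len(arr)):
            if arr[j] > arr[q]:
                out.append(arr[j])
                break
        else:
            out.append(-1)
    return out

# big_o calls func(data_generator(n)): the generator's return value is
# passed as the single argument, so bundle both lists into one tuple
def data_generator(n):
    arr = [random.randrange(100) for _ in range(n)]
    qur = [random.randrange(n) for _ in range(max(1, n // 10))]
    return arr, qur

def wrapped(args):
    arr, qur = args
    return nxtGreaterQuery(arr, qur)

# then, assuming big_o is installed:
# best, others = big_o.big_o(wrapped, data_generator, n_measures=20)
```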
Having to type the following:
best, others = big_o.big_o(heapify, data_generator_heapify, max_n=10**7)
print(best)
print()
for other in others:
print(other)
print()
is quite a drag. Also, there needs to be a clearer separation between best and others.
It would be nice if there were a print_report function that prints it all for you, with a clear indication of which solution is likely best. This library looks like it's mostly for diagnostic usage anyway.
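A sketch of such a print_report helper (hypothetical, not part of big_o), assuming others is the dictionary mapping each candidate complexity class to its fit residuals, as returned by big_o.big_o:

```python
def print_report(best, others):
    """Hypothetical helper: show the winning complexity class first,
    then every candidate ordered by residuals (lower = better fit),
    so the likely best solution is obvious at a glance."""
    print('Best fit: {}'.format(best))
    print('-' * 40)
    for klass, residuals in sorted(others.items(), key=lambda kv: kv[1]):
        print('{!s:<40} (residuals: {:.3g})'.format(klass, residuals))
```

With that in place, the whole block above collapses to `print_report(best, others)`.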
The library title on GitHub is big_O, but one needs pip install big-O to install it. This ambiguity can easily be resolved by adding a short installation section to the README. I will try to do it later, but feel free to beat me to it.
The formulas printed with the complexity classes have no reference units.
Consider rescaling coefficients to milliseconds instead of seconds.
Complexity classes with more parameters will always fit the data better.
Use cross-validation methods, or just an AIC correction, to take that into account.
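A sketch of the AIC idea (the helper is illustrative, not big_o API): for a least-squares fit, the criterion can be computed from the residual sum of squares and the number of free parameters, and the extra-parameter penalty makes equally-good fits lose to simpler classes.

```python
import numpy as np

def aic(n_points, rss, n_params):
    """Akaike information criterion for a least-squares fit:
    AIC = n * ln(RSS / n) + 2k.  Lower is better; the 2k term
    penalises complexity classes with more free parameters."""
    return n_points * np.log(rss / n_points) + 2 * n_params

# with equal residuals, the model with fewer parameters wins:
two_param = aic(50, 1.0, 2)
three_param = aic(50, 1.0, 3)
```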
I have this implementation of the two-sum problem, and I want to test its time complexity. If I try to use it as in the documentation, it tells me there is no argument target. Could you tell me how I might use this library to test functions such as this one?
def two_sum(arr, target):
    for i in arr:
        for j in arr:
            if (i + j == target) and (i != j) and (i < j):
                print((i, j))
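One way to do this (a sketch, not an official big_o recipe): big_o times a one-argument callable, so bind target up front, for example with functools.partial, and let the data generator produce only the array. The generator, the bound value, and the return-instead-of-print variant below are illustrative choices:

```python
import random
from functools import partial

def two_sum(arr, target):
    # same search as above, returning the pairs instead of printing them
    return [(i, j) for i in arr for j in arr
            if i + j == target and i < j]

# big_o measures a function of a single argument, so fix target here:
measured = partial(two_sum, target=10)

def data_generator(n):
    return [random.randrange(20) for _ in range(n)]

# then, assuming big_o is installed:
# best, others = big_o.big_o(measured, data_generator, n_measures=20)
```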
I installed this module with pip install big_O, and when I try to import big_O it says No module named 'big_O'.
big_o.complexities.Polynomial.compute() and big_o.complexities.Exponential.compute() consistently return unreasonable values.
This issue is present both when the class is used by itself and when it is returned by big_o.big_o().
Run the following example code to test the Polynomial class:
import big_o
import numpy as np
polynomial_complexity = big_o.complexities.Polynomial()
n = np.array(range(1,11)) # array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
t = 3.7*n**4.5
# array([3.70000000e+00, 8.37214429e+01, 5.19095627e+02, 1.89440000e+03,
# 5.17090720e+03, 1.17457932e+04, 2.35040609e+04, 4.28653788e+04,
# 7.28271000e+04, 1.17004273e+05])
polynomial_complexity.fit(n, t) # 2.753078336902666e-29
t_computed = polynomial_complexity.compute(n)
# array([ 1.30833282, 4.42749513, 6.25208812, 7.54665744, 8.55080343,
# 9.37125043, 10.06492849, 10.66581976, 11.19584342, 11.66996574])
Observe that t_computed is very different from the values of t.
The issue can be plotted as follows:
import matplotlib.pyplot as plt
plt.plot(n, t, "-", n, t_computed, "--")
plt.show()
In blue is t, the original function. In orange is t_computed.
Observe that, while t is consistent with a polynomial function, the t_computed line is negative and roughly linear/constant. The t_computed line is not consistent with a polynomial.
The results of big_o.complexities.Polynomial() and big_o.complexities.Exponential() should resemble a polynomial/exponential function that matches the fitted data.
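The numbers in the report are consistent with compute() returning log-space values: 1.30833282 is ln(3.7), 4.42749513 is ln(83.72…), and so on. A minimal numpy sketch of the hypothesized behaviour, without using big_o itself: fit a straight line in log-log space, as Polynomial does, but never exponentiate on the way back.

```python
import numpy as np

n = np.arange(1, 11)
t = 3.7 * n ** 4.5

# Polynomial fits a straight line to log(t) versus log(n)
slope, intercept = np.polyfit(np.log(n), np.log(t), 1)

# hypothesized buggy compute(): evaluates the fitted line in log space
# and returns it without undoing the log transform
t_computed = intercept + slope * np.log(n)
```

Under this hypothesis t_computed equals np.log(t), which matches the array in the report, and np.exp(t_computed) would recover the polynomial curve.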
Hey everyone,
I came across the big_o library and found it super useful and very cool. While doing some tests to understand how it works, I ran into the following problem using the same code snippets found at https://pypi.org/project/big-O/:
When I write:
big_o.big_o(np.zeros, big_o.datagen.n_, max_n=100000, n_repeats=100)
I get all sorts of outputs: Linear, Quadratic, Linearithmic, Cubic, ... . This is disturbing, because it looks like the result is not stable: it might be right, but it might also be wrong.
I noticed this is particularly true if I have already assessed the time complexity of some other method. That is, previous calls to big_o seem to make subsequent calls less stable. I don't think this makes any sense, but it was a pattern I noticed (though it might have been a coincidence).
Has anyone experienced similar variations of the result?
See this snippet as an example of getting wrong results:
Running this code under Python 3.8 produced the following traceback. Note that max_n=10**7 seems to be related to this bug, because it didn't happen before I used that value.
Traceback (most recent call last):
File "heapz.py", line 115, in <module>
best, others = big_o.big_o(heapify, data_generator_heapify, max_n=10**6)
File "C:\Program Files\Python38\lib\site-packages\big_o\big_o.py", line 157, in big_o
return infer_big_o_class(ns, time, classes, verbose=verbose)
File "C:\Program Files\Python38\lib\site-packages\big_o\big_o.py", line 97, in infer_big_o_class
residuals = inst.fit(ns, time)
File "C:\Program Files\Python38\lib\site-packages\big_o\complexities.py", line 38, in fit
return residuals[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
Not Python 2.7 as it is now. This is 2018!
Convert the TravisCI testing task to GitHub Actions.
Also, remove the TravisCI task.
Following the discussion in #37 , we need to replace the current big_O examples in the documentation with new ones that are less sensitive to the state of the machine they run on.
We also need to add a paragraph discussing the situations in which time measurements are likely to be too noisy to be meaningful.
Nothing major, just a warning.
big_o/complexities.py:36: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
coeff, residuals, rank, s = np.linalg.lstsq(x, y)
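The warning comes from calling np.linalg.lstsq without the rcond argument; passing it explicitly silences the warning, as the message itself suggests. A minimal sketch with illustrative data:

```python
import numpy as np

# illustrative design matrix (columns [n, 1]) and measurements
x = np.vander(np.arange(1.0, 7.0), 2)
y = 2.0 * np.arange(1.0, 7.0) + 3.0

# rcond=None opts in to the new default and silences the FutureWarning;
# rcond=-1 would keep the old behaviour instead
coeff, residuals, rank, s = np.linalg.lstsq(x, y, rcond=None)
```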
It would be nice if all of the raw data could be passed back when requested, instead of only the minimum (for executions with n_repeats > 1).
Line 61 in d700425
So one could calculate the standard deviation of the runs or perform other analyses with it.
While at it, it might be a good idea to return the raw data in an extra variable instead of adding it to the fitted dictionary:
Lines 171 to 175 in d700425
Idea:
return_values = (best, fitted)
if return_raw_data:
    return_values = (best, fitted, {
        'measures': ns,
        'times': time,
    })
return return_values
Happy to provide a PR!
Aim at supporting 2.7, 3.4, 3.5, and 3.6, to allow wider adoption.
I am getting the error 'xrange' is not defined. Does it support Python 3.x?
Unexpected dot on line: "from .complexities import ALL_CLASSES"
Missing: import datagen
With those minor fixes it works... with your examples.
Couldn't figure out how to make it work with other simple functions: selection sorting, bisection search, fibonacci, and others.
Could you please put some more examples on your README?
big_O appears to require numpy; however, this requirement isn't declared in setup.py, leading to an error on import: ModuleNotFoundError: No module named 'numpy'.
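A sketch of the fix: declare numpy as a runtime dependency via install_requires, so pip installs it automatically. The other setup() arguments below are placeholders; install_requires is the point.

```python
from setuptools import setup

setup(
    name='big_O',
    packages=['big_o'],
    install_requires=['numpy'],  # declares the dependency currently missing
)
```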
How can I calculate the big_o complexity of a feedforward neural network class?