pberkes / big_o
Python module to estimate big-O time complexity from execution time
License: BSD 3-Clause "New" or "Revised" License
In the zipp project (part of CPython's stdlib), I've used big_o to test the complexity of zipp.CompleteDirs._implied_dirs
. The tests fail intermittently, particularly on slower platforms (macOS, Windows, PyPy), but occasionally even on Ubuntu in CI. Locally, the tests succeed fairly reliably. I'm fairly confident the implementation is linear, yet sometimes the test reports linearithmic or cubic.
I'm wondering if you have any advice on how I might tune the test to be more reliable. Perhaps garbage collection is at play, or perhaps the problem is simply the variability of shared compute resources.
Failing example available: https://github.com/nickbreen/big_o_test
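Not big_o-specific, but one way to probe the garbage-collection hypothesis is to time with the stdlib timeit module, which disables GC during measurement, and to keep only the minimum of several repeats so that runs perturbed by the OS or shared CI hardware are discarded. A minimal sketch (the workload and helper name are illustrative, not part of zipp or big_o):

```python
import timeit

def min_time(func, arg, repeats=30):
    """Return the minimum of several single-shot timings of func(arg).

    timeit disables garbage collection while timing, and taking the
    minimum discards runs inflated by background load on the machine.
    """
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeats, number=1))

# illustrative linear workload, standing in for the function under test:
t_small = min_time(sum, range(10_000))
t_large = min_time(sum, range(1_000_000))
```

If the minima across repeats still fluctuate enough to flip the inferred complexity class, the noise is probably coming from the shared hardware rather than from GC.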
Trying this in a minimal Python 3.5 virtual environment produces errors.
test.py
(abridged)
import unittest

import big_o

class Test(unittest.TestCase):
    def testPerf(self):
        gen = lambda n: big_o.datagen.large_integers(n)
        best, others = big_o.big_o(solution, gen, n_repeats=100)
        self.assertEqual(big_o.complexities.Linear, best)
pytest test.py
Output:
================================================= test session starts =================================================
platform linux -- Python 3.5.2, pytest-3.3.2, py-1.5.2, pluggy-0.6.0
rootdir: /home/nick/Code/big_o_p35, inifile:
collected 0 items / 1 errors
======================================================= ERRORS ========================================================
______________________________________________ ERROR collecting test.py _______________________________________________
ImportError while importing test module '/home/nick/Code/big_o_p35/test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test.py:2: in <module>
import big_o
venv/lib/python3.5/site-packages/big_o/__init__.py:6: in <module>
import datagen
E ImportError: No module named 'datagen'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================== 1 error in 0.08 seconds ===============================================
@joepartridge reported in #12 :
"""
Couldn't figure out how to make [big_o] work with other simple functions: selection sorting, bisection search, fibonacci, and others.
Could you please put some more examples on your README?
"""
One of the tests is fragile and often fails due to random fluctuations, e.g.
F....
======================================================================
FAIL: test_big_o (big_o.test.test_big_o.TestBigO)
----------------------------------------------------------------------
Traceback (most recent call last):
File "big_o/test/test_big_o.py", line 57, in test_big_o
self.assertEqual(class_, res_class.__class__)
AssertionError: <class 'big_o.complexities.Linear'> != <class 'big_o.complexities.Linearithmic'>
----------------------------------------------------------------------
Ran 5 tests in 1.661s
FAILED (failures=1)
Hi,
Thanks for sharing this module.
I am trying to run my custom code and would like to know the time complexity of the following function:
def nxtGreaterQuery(arr, qur):
    for i in range(len(qur)):
        # print(i)
        s = []
        s.append(arr[qur[i]])
        for j in range(qur[i] + 1, len(arr)):
            if s[-1] < arr[j]:
                print(arr[j])
                s.pop()
                break
        if len(s) != 0:
            print(-1)
I call this function as:
nxtGreaterQuery([3,4,2,7,5,8,10,6],[3,6,1])
How can I replicate this using the big_o module? I am stuck at the following code:
import big_o
# How to pass two list as an argument in the following line ???
best, others = big_o.big_o(nxtGreaterQuery,big_o.datagen.range_n,big_o.datagen.range_n, n_measures=100)
print(best)
Thank you,
Saurabh
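One way to handle a two-argument function (a sketch, not an official big_o recipe): big_o passes whatever the data generator returns as the single argument of the measured function, so bundle both lists into one tuple and unpack them in a wrapper. The generator, the wrapper, and the return-instead-of-print variant below are illustrative choices:

```python
import random

def nxtGreaterQuery(arr, qur):
    # variant of the function above that returns its answers
    # instead of printing them, so results can be checked
    out = []
    for q in qur:
        for j in range(q + 1, len(arr)):
            if arr[j] > arr[q]:
                out.append(arr[j])
                break
        else:
            out.append(-1)
    return out

# big_o calls func(data_generator(n)): the generator's return value is
# passed as the single argument, so bundle both lists into one tuple
def data_generator(n):
    arr = [random.randrange(100) for _ in range(n)]
    qur = [random.randrange(n) for _ in range(max(1, n // 10))]
    return arr, qur

def wrapped(args):
    arr, qur = args
    return nxtGreaterQuery(arr, qur)

# then, assuming big_o is installed:
# best, others = big_o.big_o(wrapped, data_generator, n_measures=20)
```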
Having to type the following:
best, others = big_o.big_o(heapify, data_generator_heapify, max_n=10**7)
print(best)
print()
for other in others:
print(other)
print()
is quite a drag. Also, there needs to be a clearer separation between best and others.
It would be nice if there were a print_report function that prints it all for you, with a clear indication of which solution is likely best. This library looks like it's mostly for diagnostic usage anyway.
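A sketch of such a print_report helper (hypothetical, not part of big_o), assuming others is the dictionary mapping each candidate complexity class to its fit residuals, as returned by big_o.big_o:

```python
def print_report(best, others):
    """Hypothetical helper: show the winning complexity class first,
    then every candidate ordered by residuals (lower = better fit),
    so the likely best solution is obvious at a glance."""
    print('Best fit: {}'.format(best))
    print('-' * 40)
    for klass, residuals in sorted(others.items(), key=lambda kv: kv[1]):
        print('{!s:<40} (residuals: {:.3g})'.format(klass, residuals))
```

With that in place, the whole block above collapses to `print_report(best, others)`.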
The library title on GitHub is big_O, but one needs pip install big-O to install it. This ambiguity can easily be resolved by adding a short installation section to the README. I will try to do it later, but feel free to beat me to it.
The formulas printed with the complexity classes have no reference units.
Consider rescaling coefficients to milliseconds instead of seconds.
Complexity classes with more parameters will always fit the data better.
Use cross-validation methods, or just an AIC correction, to take that into account.
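A sketch of the AIC idea (the helper is illustrative, not big_o API): for a least-squares fit, the criterion can be computed from the residual sum of squares and the number of free parameters, and the extra-parameter penalty makes equally-good fits lose to simpler classes.

```python
import numpy as np

def aic(n_points, rss, n_params):
    """Akaike information criterion for a least-squares fit:
    AIC = n * ln(RSS / n) + 2k.  Lower is better; the 2k term
    penalises complexity classes with more free parameters."""
    return n_points * np.log(rss / n_points) + 2 * n_params

# with equal residuals, the model with fewer parameters wins:
two_param = aic(50, 1.0, 2)
three_param = aic(50, 1.0, 3)
```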
I have this implementation of the two-sum problem, and I want to test its time complexity. If I try to use it as in the documentation, it tells me there is no argument target. Could you tell me how I might use this library to test functions such as this one?
def two_sum(arr, target):
    for i in arr:
        for j in arr:
            if (i + j == target) and (i != j) and (i < j):
                print((i, j))
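One way to do this (a sketch, not an official big_o recipe): big_o times a one-argument callable, so bind target up front, for example with functools.partial, and let the data generator produce only the array. The generator, the bound value, and the return-instead-of-print variant below are illustrative choices:

```python
import random
from functools import partial

def two_sum(arr, target):
    # same search as above, returning the pairs instead of printing them
    return [(i, j) for i in arr for j in arr
            if i + j == target and i < j]

# big_o measures a function of a single argument, so fix target here:
measured = partial(two_sum, target=10)

def data_generator(n):
    return [random.randrange(20) for _ in range(n)]

# then, assuming big_o is installed:
# best, others = big_o.big_o(measured, data_generator, n_measures=20)
```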
I installed this module with pip install big_O, and when I try to import big_O it says No module named 'big_O'.
big_o.complexities.Polynomial.compute() and big_o.complexities.Exponential.compute() consistently return unreasonable values.
This issue is present both when the class is used by itself and when it is returned by big_o.big_o().
Run the following example code to test the Polynomial class:
import big_o
import numpy as np
polynomial_complexity = big_o.complexities.Polynomial()
n = np.array(range(1,11)) # array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
t = 3.7*n**4.5
# array([3.70000000e+00, 8.37214429e+01, 5.19095627e+02, 1.89440000e+03,
# 5.17090720e+03, 1.17457932e+04, 2.35040609e+04, 4.28653788e+04,
# 7.28271000e+04, 1.17004273e+05])
polynomial_complexity.fit(n, t) # 2.753078336902666e-29
t_computed = polynomial_complexity.compute(n)
# array([ 1.30833282, 4.42749513, 6.25208812, 7.54665744, 8.55080343,
# 9.37125043, 10.06492849, 10.66581976, 11.19584342, 11.66996574])
Observe that t_computed is very different from the values of t.
The issue can be plotted as follows:
import matplotlib.pyplot as plt
plt.plot(n, t, "-", n, t_computed, "--")
plt.show()
In blue is t, the original function. In orange is t_computed.
Observe that, while t is consistent with a polynomial function, the t_computed line is negative and roughly linear/constant. The t_computed line is not consistent with a polynomial.
The results of big_o.complexities.Polynomial() and big_o.complexities.Exponential() should resemble a polynomial/exponential function that matches the fitted data.
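The numbers in the report are consistent with compute() returning log-space values: 1.30833282 is ln(3.7), 4.42749513 is ln(83.72…), and so on. A minimal numpy sketch of the hypothesized behaviour, without using big_o itself: fit a straight line in log-log space, as Polynomial does, but never exponentiate on the way back.

```python
import numpy as np

n = np.arange(1, 11)
t = 3.7 * n ** 4.5

# Polynomial fits a straight line to log(t) versus log(n)
slope, intercept = np.polyfit(np.log(n), np.log(t), 1)

# hypothesized buggy compute(): evaluates the fitted line in log space
# and returns it without undoing the log transform
t_computed = intercept + slope * np.log(n)
```

Under this hypothesis t_computed equals np.log(t), which matches the array in the report, and np.exp(t_computed) would recover the polynomial curve.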
Hey everyone,
I came across the big_o library and found it super useful and very cool. While doing some tests to understand how it works, I ran into the following problem using the same code snippets found at https://pypi.org/project/big-O/:
When I write:
big_o.big_o(np.zeros, big_o.datagen.n_, max_n=100000, n_repeats=100)
I get all sorts of outputs: Linear, Quadratic, Linearithmic, Cubic, ... . This is disturbing, because it looks like the result is not stable: it might be right, but it might also be wrong.
I noticed this is particularly true if I have already assessed the time complexity of some other method. That is, previous calls to big_o seem to make subsequent calls less stable. I don't think this makes any sense, but it was a pattern I noticed (though it might have been a coincidence).
Has anyone experienced similar variations of the result?
See this snippet as an example of getting wrong results:
Running this code under Python 3.8 produced the following traceback. Note that max_n=10**7 seems to be related to this bug, because it didn't happen before I used that value.
Traceback (most recent call last):
File "heapz.py", line 115, in <module>
best, others = big_o.big_o(heapify, data_generator_heapify, max_n=10**6)
File "C:\Program Files\Python38\lib\site-packages\big_o\big_o.py", line 157, in big_o
return infer_big_o_class(ns, time, classes, verbose=verbose)
File "C:\Program Files\Python38\lib\site-packages\big_o\big_o.py", line 97, in infer_big_o_class
residuals = inst.fit(ns, time)
File "C:\Program Files\Python38\lib\site-packages\big_o\complexities.py", line 38, in fit
return residuals[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
Not Python 2.7 as it is now. This is 2018!
Convert the TravisCI testing task to GitHub Actions.
Also, remove the TravisCI task.
Following the discussion in #37 , we need to replace the current big_O examples in the documentation with new ones that are less sensitive to the state of the machine they run on.
We also need to add a paragraph discussing the situations in which time measurements are likely to be too noisy to be meaningful.
Nothing major, just a warning.
big_o/complexities.py:36: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
coeff, residuals, rank, s = np.linalg.lstsq(x, y)
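The warning comes from calling np.linalg.lstsq without the rcond argument; passing it explicitly silences the warning, as the message itself suggests. A minimal sketch with illustrative data:

```python
import numpy as np

# illustrative design matrix (columns [n, 1]) and measurements
x = np.vander(np.arange(1.0, 7.0), 2)
y = 2.0 * np.arange(1.0, 7.0) + 3.0

# rcond=None opts in to the new default and silences the FutureWarning;
# rcond=-1 would keep the old behaviour instead
coeff, residuals, rank, s = np.linalg.lstsq(x, y, rcond=None)
```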
It would be nice if all of the raw data could be passed back when requested, instead of only the minimum (for executions with n_repeats > 1).
Line 61 in d700425
So one could calculate the standard deviation of the runs or perform other analyses with it.
While at it, it might be a good idea to return the raw data in an extra variable instead of adding it to the fitted dictionary:
Lines 171 to 175 in d700425
Idea:
return_values = (best, fitted)
if return_raw_data:
    return_values = (best, fitted, {
        'measures': ns,
        'times': time,
    })
return return_values
Happy to provide a PR!
Aim at supporting 2.7, 3.4, 3.5, and 3.6, to allow wider adoption.
I am getting the error 'xrange' is not defined. Does it support Python 3.x?
Unexpected dot on line: "from .complexities import ALL_CLASSES"
Missing: import datagen
With those minor fixes it works... with your examples.
Couldn't figure out how to make it work with other simple functions: selection sorting, bisection search, fibonacci, and others.
Could you please put some more examples on your README?
big_O appears to require numpy; however, this requirement isn't declared in setup.py, leading to an error on import: ModuleNotFoundError: No module named 'numpy'.
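A sketch of the fix: declare numpy as a runtime dependency via install_requires, so pip installs it automatically. The other setup() arguments below are placeholders; install_requires is the point.

```python
from setuptools import setup

setup(
    name='big_O',
    packages=['big_o'],
    install_requires=['numpy'],  # declares the dependency currently missing
)
```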
How can I calculate the big_o complexity of a feedforward neural network class?