big_o's Issues

Seeking advice on making test more repeatable

In the zipp project (the standalone project behind CPython's zipfile.Path), I've employed big_o to test the complexity of zipp.CompleteDirs._implied_dirs. The tests fail intermittently, particularly on slower platforms (macOS, Windows, PyPy), but occasionally even on Ubuntu in CI. They pass fairly reliably locally. I'm fairly confident the implementation is linear, but the test sometimes reports linearithmic or even cubic.

I'm wondering if you have any advice on how I might be able to tune the test to be more reliable. I'm wondering if garbage collection is at play, or maybe the problem is simply the variability from using shared compute resources.
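One cheap way to make such a test more repeatable (a sketch using a hypothetical helper, not a big_o feature) is to run the measurement several times and require a majority verdict, so that a single GC pause or noisy-neighbor episode cannot flip the result:

```python
# Sketch of a retry wrapper for a flaky complexity assertion. The names
# here (majority_complexity, measure_once) are hypothetical.
import itertools
from collections import Counter

def majority_complexity(measure_once, runs=5):
    """measure_once() returns the name of the best-fit complexity class;
    the most common answer across `runs` attempts wins."""
    votes = Counter(measure_once() for _ in range(runs))
    return votes.most_common(1)[0][0]

# Demo with a stub that reports a wrong class 1 time in 5:
noisy = itertools.cycle(['Linear', 'Linear', 'Linearithmic', 'Linear', 'Linear'])
result = majority_complexity(lambda: next(noisy), runs=5)
# result == 'Linear'
```

In practice, measure_once would call big_o.big_o and return the name of the best class; raising n_repeats and taking minima over repeated timings should also help.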

Python 3.5 issues with imports

Failing example available: https://github.com/nickbreen/big_o_test

When trying this in a trivial Python 3.5 virtual environment there are errors.

test.py (abridged)

import unittest

import big_o

class Test(unittest.TestCase):

    def testPerf(self):
        gen = lambda n: big_o.datagen.large_integers(n)
        best, others = big_o.big_o(solution, gen, n_repeats=100)
        self.assertEqual(big_o.complexities.Linear, best)

pytest test.py

Output:

================================================= test session starts =================================================
platform linux -- Python 3.5.2, pytest-3.3.2, py-1.5.2, pluggy-0.6.0
rootdir: /home/nick/Code/big_o_p35, inifile:
collected 0 items / 1 errors                                                                                          

======================================================= ERRORS ========================================================
______________________________________________ ERROR collecting test.py _______________________________________________
ImportError while importing test module '/home/nick/Code/big_o_p35/test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test.py:2: in <module>
    import big_o
venv/lib/python3.5/site-packages/big_o/__init__.py:6: in <module>
    import datagen
E   ImportError: No module named 'datagen'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================== 1 error in 0.08 seconds ===============================================

More complex examples in README

@joepartridge reported in #12 :
"""
Couldn't figure out how to make [big_o] work with other simple functions: selection sorting, bisection search, fibonacci, and others.
Could you please put some more examples on your README?
"""

Fragile test fails at random half of the time

One of the tests is fragile, and fails often due to random fluctuations, e.g.

F....
======================================================================
FAIL: test_big_o (big_o.test.test_big_o.TestBigO)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "big_o/test/test_big_o.py", line 57, in test_big_o
    self.assertEqual(class_, res_class.__class__)
AssertionError: <class 'big_o.complexities.Linear'> != <class 'big_o.complexities.Linearithmic'>

----------------------------------------------------------------------
Ran 5 tests in 1.661s

FAILED (failures=1)

How to pass predefined list as an argument to measure time complexity?

Hi,

Thanks for sharing this module.

I am trying to run my custom code and would like to know "time complexity" of the following code:

def nxtGreaterQuery(arr,qur):
  for i in range(len(qur)):
      #print(i)
      s = []
      s.append(arr[qur[i]])
      for j in range(qur[i]+1,len(arr)):
          if s[-1] < arr[j]:
              print(arr[j])
              s.pop()
              break
      if len(s) != 0:
          print(-1)

I call this function as:
nxtGreaterQuery([3,4,2,7,5,8,10,6],[3,6,1])
How can I replicate this using the big_o module? I'm stuck on the following code:

import big_o
# How do I pass two lists as arguments in the following line?
best, others = big_o.big_o(nxtGreaterQuery,big_o.datagen.range_n,big_o.datagen.range_n, n_measures=100) 
print(best)

Thank you,
Saurabh
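One way to handle the two arguments is to have the data generator return them bundled in a tuple and unpack them in a single-argument wrapper, since big_o passes the generator's return value straight to the measured function. A sketch (make_input and the print-free variant of the function are illustrative assumptions, not big_o API):

```python
import random

def nxtGreaterQuery(arr, qur):
    # Same logic as the original, but collecting results instead of
    # printing, so timing isn't dominated by I/O.
    out = []
    for i in range(len(qur)):
        s = [arr[qur[i]]]
        for j in range(qur[i] + 1, len(arr)):
            if s[-1] < arr[j]:
                out.append(arr[j])
                s.pop()
                break
        if len(s) != 0:
            out.append(-1)
    return out

def make_input(n):
    # Hypothetical generator: one n-element array plus ~n/10 query indices.
    arr = [random.randrange(10 * n + 1) for _ in range(n)]
    qur = [random.randrange(n) for _ in range(max(1, n // 10))]
    return arr, qur

def wrapped(data):
    arr, qur = data
    return nxtGreaterQuery(arr, qur)

# best, others = big_o.big_o(wrapped, make_input, n_measures=20)
```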

Show a nice report

Having to type the following:

best, others = big_o.big_o(heapify, data_generator_heapify, max_n=10**7)
print(best)
print()
for other in others:
    print(other)
print()

is quite a drag. There also needs to be a clearer separation between best and others.

It would be nice if there were a print_report function that printed all of this for you, with a clear indication of which fit is likely best. This library looks like it's mostly for diagnostic use anyway.
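A minimal sketch of such a helper (format_report is a hypothetical name; it assumes others maps each candidate complexity class to its fit residuals, which is what big_o.big_o returns):

```python
def format_report(best, others):
    # Hypothetical helper: turns the (best, others) pair returned by
    # big_o.big_o into a single printable report string.
    lines = ['Best fit:', '  %s' % best, '', 'Other candidates (fit residuals):']
    for klass, residuals in others.items():
        lines.append('  %s  (residuals: %.3g)' % (klass, residuals))
    return '\n'.join(lines)

# Demo with dummy data standing in for real big_o output:
report = format_report('Linear: time = 0.1 + 2E-06*n',
                       {'Quadratic': 0.42, 'Cubic': 1.7})
print(report)
```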

Add installation instructions to README

The library title on GitHub is big_O, but one needs pip install big-O to install it. This ambiguity could easily be resolved by a short installation section in the README. I will try to add it later, but am noting it here in case someone wants to do it first.

Add units to string representation

The formulas printed with the complexity classes have no reference units.
Consider rescaling coefficients to milliseconds instead of seconds.
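A sketch of what millisecond-scaled formatting could look like for a linear fit (format_linear_ms is a hypothetical helper, not part of big_o's API):

```python
def format_linear_ms(a, b):
    """Format a linear fit `time = a + b*n` (coefficients in seconds)
    with explicit millisecond units."""
    return 'time = %.3g ms + %.3g ms * n' % (a * 1e3, b * 1e3)

print(format_linear_ms(0.0021, 3.4e-07))
# prints "time = 2.1 ms + 0.00034 ms * n"
```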

Passing function arguments to measure complexity

I have this implementation of the two-sum problem and want to test its time complexity. If I use it as shown in the documentation, it tells me there is no argument target. Could you tell me how I might use this library to test functions like this?

def two_sum(arr, target):
    for i in arr:
        for j in arr:
            if (i + j == target) and (i != j) and (i < j):
                print((i, j))
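One way to handle the extra argument is to freeze target before handing the function to big_o, e.g. with functools.partial, so big_o only has to supply arr. A sketch (the return-based variant and target=100 are illustrative choices):

```python
from functools import partial

def two_sum(arr, target):
    # Same logic as above, collecting pairs instead of printing them.
    pairs = []
    for i in arr:
        for j in arr:
            if (i + j == target) and (i != j) and (i < j):
                pairs.append((i, j))
    return pairs

# Freeze the extra argument; big_o then sees a one-argument function.
probe = partial(two_sum, target=100)

# best, others = big_o.big_o(probe, some_list_generator)  # generator returning a list of n ints
```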

Issue in importing the big_O

I installed this module with pip install big_O, and when I try to import big_O it says there is no module named big_O.

Polynomial.compute() and Exponential.compute() Return Incorrect Results

Description

big_o.complexities.Polynomial.compute() and big_o.complexities.Exponential.compute() consistently return unreasonable values.

This issue is present both when the class is used by itself, and when it is returned by big_o.big_o()

Steps to reproduce:

Run the following example code to test the Polynomial class:

import big_o
import numpy as np

polynomial_complexity = big_o.complexities.Polynomial()
n = np.array(range(1,11)) # array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

t = 3.7*n**4.5 
# array([3.70000000e+00, 8.37214429e+01, 5.19095627e+02, 1.89440000e+03,
#        5.17090720e+03, 1.17457932e+04, 2.35040609e+04, 4.28653788e+04,
#        7.28271000e+04, 1.17004273e+05])

polynomial_complexity.fit(n, t)  # returns the fit residuals: 2.753078336902666e-29

t_computed = polynomial_complexity.compute(n)
# array([ 1.30833282,  4.42749513,  6.25208812,  7.54665744,  8.55080343,
#         9.37125043, 10.06492849, 10.66581976, 11.19584342, 11.66996574])

Observe t_computed is very different from the values t

The issue can be plotted as follows:

import matplotlib.pyplot as plt
plt.plot(n, t, "-", n, t_computed, "--")
plt.show()

[plot: t in blue, t_computed in orange]

In blue is t, the original function; in orange is t_computed.
Observe that, while t is consistent with a polynomial function, the t_computed curve stays near zero at this scale and is not consistent with a polynomial.

Expected Behavior:

The results of big_o.complexities.Polynomial.compute() and big_o.complexities.Exponential.compute() should resemble the polynomial/exponential function that was fitted to the data.
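The t_computed values above are exactly np.log(t), which suggests compute() is returning values in the log-log space in which the power-law fit is performed (an inference from the numbers in this report, not confirmed from the source). A numpy sketch of such a log-log fit, showing that exponentiating recovers the original times:

```python
import numpy as np

n = np.arange(1, 11)
t = 3.7 * n ** 4.5

# A power law t = a*n**b is typically fitted as log(t) = log(a) + b*log(n):
b, log_a = np.polyfit(np.log(n), np.log(t), 1)

t_log_space = log_a + b * np.log(n)  # matches the t_computed values above
t_recovered = np.exp(t_log_space)    # exponentiate to get actual times back
```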

Output time complexity changes in different calls

Hey everyone,

So I came across the big_o library and I found it super useful and very cool. I was doing some tests, trying to understand how it works and I came across the following problem using the same code snippets as can be found in https://pypi.org/project/big-O/:

When I write:
big_o.big_o(np.zeros, big_o.datagen.n_, max_n=100000, n_repeats=100)

I get all sorts of outputs: Linear, Quadratic, Linearithmic, Cubic, ... . This is worrying, because it suggests the result is not stable: it might be right, but it might just as well be wrong.

I noticed this is particularly true if I have already assessed the time complexity of some other method. That is, it looks like previous calls to big_o make subsequent calls less stable. I don't think this makes any sense, but it was a pattern I noticed (though it might have been a coincidence).

Has anyone experienced similar variations of the result?

See this screenshot for an example of wrong results:

[screenshot omitted]

IndexError: index 0 is out of bounds for axis 0 with size 0

Running this code under Python 3.8 produced the following traceback. max_n=10**7 seems to be related to this bug, because the error didn't occur before I used that value.

Traceback (most recent call last):
  File "heapz.py", line 115, in <module>
    best, others = big_o.big_o(heapify, data_generator_heapify, max_n=10**6)
  File "C:\Program Files\Python38\lib\site-packages\big_o\big_o.py", line 157, in big_o
    return infer_big_o_class(ns, time, classes, verbose=verbose)
  File "C:\Program Files\Python38\lib\site-packages\big_o\big_o.py", line 97, in infer_big_o_class
    residuals = inst.fit(ns, time)
  File "C:\Program Files\Python38\lib\site-packages\big_o\complexities.py", line 38, in fit
    return residuals[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
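np.linalg.lstsq only returns a non-empty residuals array when the system is overdetermined and full-rank; with too few (or degenerate) measurement points the array comes back empty, and residuals[0] then raises exactly this IndexError. A small sketch of the failure mode and a possible guard:

```python
import numpy as np

# Two points, two unknowns: the linear fit is exact, so lstsq returns an
# EMPTY residuals array and residuals[0] would raise IndexError.
x = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
coeff, residuals, rank, s = np.linalg.lstsq(x, y, rcond=None)
assert residuals.size == 0

# Guard: fall back to computing the sum of squared residuals explicitly.
resid = residuals[0] if residuals.size > 0 else float(np.sum((x @ coeff - y) ** 2))
```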

Add better examples in the documentation and discuss potential timing issues

Following the discussion in #37 , we need to substitute the current big_O examples in the documentation with new ones that are not so sensitive to the state of the machine they are running on.

Also, we need to add a paragraph discussing the situations in which the time measurements are likely to be too noisy to be meaningful.
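Such a paragraph could also point at a min-of-repeats timing strategy, which the timeit documentation recommends for filtering out machine noise. A sketch with a hypothetical helper:

```python
import timeit

def min_time(func, arg, repeats=5, number=10):
    """Time func(arg), taking the minimum over several repeats: the minimum
    is the measurement least contaminated by GC pauses, scheduling, and
    other machine noise."""
    return min(timeit.repeat(lambda: func(arg), repeat=repeats, number=number)) / number

t = min_time(sorted, list(range(100)))
```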

FutureWarning in complexities.py

Nothing major, just a warning.

big_o/complexities.py:36: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
  coeff, residuals, rank, s = np.linalg.lstsq(x, y)
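The warning itself says how to silence it: pass rcond=None to opt in to the future default. A self-contained sketch:

```python
import numpy as np

# Fit y = a + b*x for the line y = 1 + 2x; rcond=None opts in to the new
# default cutoff and silences the FutureWarning.
x = np.vstack([np.ones(5), np.arange(5.0)]).T
y = 2.0 * np.arange(5.0) + 1.0
coeff, residuals, rank, s = np.linalg.lstsq(x, y, rcond=None)
# coeff is approximately [1.0, 2.0]
```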

`return_raw_data` should include all timings (for `n_repeats`)

It would be nice if all of the raw data could be passed back on request, instead of only the minimum (for executions with n_repeats > 1):

execution_time[i] = np.min(measurements)

That way one could calculate the standard deviation across runs or perform other analyses with it.

While we're at it, it might be a good idea to pass the raw data back in an extra variable instead of adding it to the fitted dictionary:

big_O/big_o/big_o.py

Lines 171 to 175 in d700425

if return_raw_data:
    fitted['measures'] = ns
    fitted['times'] = time
return best, fitted

Idea:

return_values = (best, fitted)
if return_raw_data:
    return_values = (best, fitted, {
        'measures': ns,
        'times': time,
    })
return return_values

Happy to provide a PR!

found some bugs on big_o.py

Unexpected dot on line: "from .complexities import ALL_CLASSES"
Missing: import datagen

With those minor fixes it works... with your examples.
I couldn't figure out how to make it work with other simple functions: selection sorting, bisection search, fibonacci, and others.
Could you please put some more examples in your README?

Required packages not installed

big_O appears to require numpy, but this requirement isn't declared in setup.py, which leads to an error on import: ModuleNotFoundError: No module named 'numpy'.
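A setup.py sketch (assuming setuptools; the version and package list are placeholders) showing the install_requires declaration that would make pip install numpy automatically:

```python
# setup.py sketch: declaring numpy in install_requires makes pip
# pull it in when the package is installed.
from setuptools import setup

setup(
    name='big_O',
    version='0.0.0',          # placeholder
    packages=['big_o'],
    install_requires=['numpy'],
)
```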
