Hi, Thanks for sharing this module. I am trying to

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

How to pass predefined list as an argument to measure time complexity? about big_o HOT 7 CLOSED

pberkes commented on September 18, 2024 2

How to pass predefined list as an argument to measure time complexity?

from big_o.

Comments (7)

pberkes commented on September 18, 2024 1

@ericrommel

There are a couple of problems with the way you measure the performance of your code:

The data you use to measure performance is not sorted, but your function expects sorted data, use sorted, i.e.
test_data = lambda n: sorted(big_o.datagen.integers(n, 1, 10**4))
I assume you want to test the average performance of your code, but the way you select the target means that you are often testing a corner case: your input numbers lies between 1 and 10^4, but the target is between -10^4 and 10^4, so you often test the case where the target is smaller than the smallest number. This can be improved with

def aux_function(arr):
    # Select one of the targets in the array
    target_idx = random.randint(0, len(arr))
    target = arr[target_idx]
    return binary_search(nums=arr, target=target)

Since you're testing your function with random targets, you will need to test each N with several repetitions to measure the average behavior at that N, e.g. n_repeats=10
The default N ranges from 100 to 100000, but these limits are very small for this task, which completes very quickly even for very large lists. In order to cover a range of Ns where the algorithmic time dominates, I would use
test_data = lambda n: sorted(big_o.datagen.integers(n * 100, 1, 10**8)). (I also increased the range of the random integers to avoid duplicates)

So the code looks like this:

import big_o
import pytest
import random
from time import sleep


def binary_search(nums: list[int], target: int) -> int:
    if len(nums) == 0:
        return -1
    left, right = 0, len(nums) - 1
    t = 0
    while left <= right:
        t += 1
        sleep(0.01)
        
        mid = (left + right) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1


def aux_function(arr):
    # Select one of the targets in the array
    target_idx = random.randint(0, len(arr))
    target = arr[target_idx]
    return binary_search(nums=arr, target=target)


# Calculating the Time complexity
test_data = lambda n: sorted(big_o.datagen.integers(n * 100, 1, 10**8))

best, others = big_o.big_o(aux_function, test_data, n_measures=10, n_repeats=10, return_raw_data=True)
print(f"\n{big_o.reports.big_o_report(best, others)}")

and the report I get on my machine is

Best : Logarithmic: time = 0.59 + 0.19*log(n) (sec)                 
Constant: time = 2.5 (sec)                                      (res: 1.3)
Linear: time = 2 + 8.5E-06*n (sec)                              (res: 0.61)
Quadratic: time = 2.2 + 6.4E-11*n^2 (sec)                       (res: 0.9)
Cubic: time = 2.3 + 5.5E-16*n^3 (sec)                           (res: 1)
Polynomial: time = 0.95 * x^0.093 (sec)                         (res: 0.036)
Logarithmic: time = 0.59 + 0.19*log(n) (sec)                    (res: 0.023)
Linearithmic: time = 2.1 + 7.1E-07*n*log(n) (sec)               (res: 0.65)
Exponential: time = 2 * 1^n (sec)                               (res: 0.69)

from big_o.

pberkes commented on September 18, 2024 1

@ericrommel

Ah yes it was debugging code, sorry I forgot to remove it.

The bisection algorithm is very efficient, for a list of 100 elements you can expect ~7 iterations, and for 100000 elements ~17 iterations.

So now it depends why you want to compute the complexity:

If you want to study the behavior of your code in the regime of relatively low list sizes (<= 100000 elements, maybe that's the list length you're going to work with), then you've got your answer: the overhead of calling aux_function dominates over the extra 10 loops, and the complexity is approximately constant (you can verify that by plotting the times). The call to random.randint in aux_function might be contributing to that overhead, and you could improve the situation by getting rid of it by looking at the worst-case scenario and setting target_idx always to 0.
If you want to check that the code follows the theoretical asymptotic behavior of log(N), then you need to create lists of a size where the number of iterations increases significantly (but the size necessary increases with 2^number of iterations!), or like I did add a significant load in the for loop (the sleep command)

The reason why you're getting Polynomial instead of Logarithmic is similar: your computer might a bit slower or faster than mine, and in the range of list lengths you're considering log(N) and N^0.08 must by very close

from big_o.

amoynahan commented on September 18, 2024

I would like to see an answer to this question

from big_o.

TheMatt2 commented on September 18, 2024

You can solve this by writing a custom data generator and testing only one input at a time.

At the current time, this module does not support multivariable big-o function guessing (perhaps a feature request if there is interest?). This module only supports testing a single N at a time.

However, you can still test your function by holding one of the inputs constant, and testing the other.

import big_o

def nxtGreaterQuery(arr,qur):
  for i in range(len(qur)):
      s = []
      s.append(arr[qur[i]])
      for j in range(qur[i]+1,len(arr)):
          if s[-1] < arr[j]:
              s.pop()
              break

# Generate custom data generators
def nxtGreaterQueryArrOnly(arr):
    nxtGreaterQuery(arr, [3,6,1])

def nxtGreaterQueryQurOnly(qur):
    # qur seems to be indexes into arr
    # range_n will happily produces indexes that are too big
    # So use module to shrink them
    arr = [3,4,2,7,5,8,10,6]
    qur = [n % len(arr) for n in qur]
    nxtGreaterQuery([3,4,2,7,5,8,10,6], qur)

# Run Big-O
best, others = big_o.big_o(nxtGreaterQueryArrOnly,big_o.datagen.range_n,n_measures=100)
print("Complexity of Arr:", best)

best, others = big_o.big_o(nxtGreaterQueryQurOnly, big_o.datagen.range_n,n_measures=100)
print("Complexity of Qur:", best)

When I run this, I get the result:

Complexity of Arr: Constant: time = 3E-05 (sec)
Complexity of Qur: Linear: time = 0.0039 + 1.6E-06*n (sec)

So it would seem your function has linear complexity with respect to Qur, and constant complexity with respect to Arr.

from big_o.

ericrommel commented on September 18, 2024

Nice workaround but I think the result can be wrong sometimes. For example, the average case time complexity of the binary search algorithm is O(log N). I tried as below and I'm always getting Constant. Maybe I'm doing something wrong? My sample to test:

import big_o
import pytest
import random

class Sample:
    def binary_search(self, nums: list[int], target: int) -> int:
        if len(nums) == 0:
            return -1
        left, right = 0, len(nums) - 1
        while left <= right:
            mid = (left + right) // 2
            if nums[mid] == target:
                return mid
            elif nums[mid] < target:
                left = mid + 1
            else:
                right = mid - 1
        return -1

@pytest.mark.parametrize("test_input, test_target, expected", [
    ([-1, 0, 3, 5, 9, 12], 9, 4),
    ([-1, 0, 3, 5, 9, 12], 2, -1),
])
def test_search(test_input, test_target, expected):
    assert Sample().binary_search(nums=test_input, target=test_target) == expected

# Big-O support function
def aux_function(arr):
    target = random.randint(-10 ** 4, 10 ** 4)
    Sample().binary_search(nums=arr, target=target)

def test_complexity(capsys):
    with capsys.disabled():
        test_data = lambda n: big_o.datagen.integers(n, 1, 10**4)

        # Calculating the Time complexity
        best, others = big_o.big_o(aux_function, test_data, n_measures=100)
        print(f"\n{big_o.reports.big_o_report(best, others)}")

Output:

Process finished with exit code 0
PASSED [ 33%]
PASSED [ 66%]
PASSED [100%]
Best : Constant: time = 1.1E-05 (sec)
Constant: time = 1.1E-05 (sec) (res: 3.1E-09)
Linear: time = 8.5E-06 + 4.7E-11n (sec) (res: 2.9E-09)
Quadratic: time = 9.4E-06 + 4.5E-16n^2 (sec) (res: 2.9E-09)
Cubic: time = 9.7E-06 + 4.6E-21n^3 (sec) (res: 2.9E-09)
Polynomial: time = 2E-06 * x^0.15 (sec) (res: 3E-09)
Logarithmic: time = -1.2E-06 + 1.1E-06log(n) (sec) (res: 2.9E-09)
Linearithmic: time = 8.7E-06 + 4E-12nlog(n) (sec) (res: 2.9E-09)
Exponential: time = 7.2E-06 * 1^n (sec) (res: 3E-09)

from big_o.

ericrommel commented on September 18, 2024

@pberkes
Thank you so much for the feedback and explanation. Totally makes sense.

I made the changes and was still getting a Constant time after applying them. Then, checking again your changes, I saw that you added a calculation (t+=1) and a sleep(0.01). Adding them, I'm getting a polynomial time most of the time. From 5 executions, just as an example, I got logarithmic time only once. What are your hints here?

# Without calculation and sleep:
Best : Constant: time = 0.00012 (sec)
Best : Constant: time = 0.00011 (sec)
Best : Constant: time = 8.7E-05 (sec)

# With calculation and sleep:
Best : Polynomial: time = 0.88 * x^0.089 (sec)
Best : Logarithmic: time = 0.6 + 0.16*log(n) (sec)
Best : Polynomial: time = 0.93 * x^0.084 (sec)
Best : Polynomial: time = 1 * x^0.075 (sec)
Best : Polynomial: time = 0.97 * x^0.08 (sec)

from big_o.

ericrommel commented on September 18, 2024

@pberkes

Gotcha... Thanks for answering...

from big_o.

How to pass predefined list as an argument to measure time complexity? about big_o HOT 7 CLOSED

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent