Coder Social home page Coder Social logo

Comments (7)

pberkes avatar pberkes commented on September 18, 2024 1

@ericrommel

There are a couple of problems with the way you measure the performance of your code:

  • The data you use to measure performance is not sorted, but your function expects sorted data, use sorted, i.e.
    test_data = lambda n: sorted(big_o.datagen.integers(n, 1, 10**4))
  • I assume you want to test the average performance of your code, but the way you select the target means that you are often testing a corner case: your input numbers lies between 1 and 10^4, but the target is between -10^4 and 10^4, so you often test the case where the target is smaller than the smallest number. This can be improved with
def aux_function(arr):
    # Select one of the targets in the array
    target_idx = random.randint(0, len(arr))
    target = arr[target_idx]
    return binary_search(nums=arr, target=target)
  • Since you're testing your function with random targets, you will need to test each N with several repetitions to measure the average behavior at that N, e.g. n_repeats=10
  • The default N ranges from 100 to 100000, but these limits are very small for this task, which completes very quickly even for very large lists. In order to cover a range of Ns where the algorithmic time dominates, I would use
    test_data = lambda n: sorted(big_o.datagen.integers(n * 100, 1, 10**8)). (I also increased the range of the random integers to avoid duplicates)

So the code looks like this:

import big_o
import pytest
import random
from time import sleep


def binary_search(nums: list[int], target: int) -> int:
    if len(nums) == 0:
        return -1
    left, right = 0, len(nums) - 1
    t = 0
    while left <= right:
        t += 1
        sleep(0.01)
        
        mid = (left + right) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1


def aux_function(arr):
    # Select one of the targets in the array
    target_idx = random.randint(0, len(arr))
    target = arr[target_idx]
    return binary_search(nums=arr, target=target)


# Calculating the Time complexity
test_data = lambda n: sorted(big_o.datagen.integers(n * 100, 1, 10**8))

best, others = big_o.big_o(aux_function, test_data, n_measures=10, n_repeats=10, return_raw_data=True)
print(f"\n{big_o.reports.big_o_report(best, others)}")

and the report I get on my machine is

Best : Logarithmic: time = 0.59 + 0.19*log(n) (sec)                 
Constant: time = 2.5 (sec)                                      (res: 1.3)
Linear: time = 2 + 8.5E-06*n (sec)                              (res: 0.61)
Quadratic: time = 2.2 + 6.4E-11*n^2 (sec)                       (res: 0.9)
Cubic: time = 2.3 + 5.5E-16*n^3 (sec)                           (res: 1)
Polynomial: time = 0.95 * x^0.093 (sec)                         (res: 0.036)
Logarithmic: time = 0.59 + 0.19*log(n) (sec)                    (res: 0.023)
Linearithmic: time = 2.1 + 7.1E-07*n*log(n) (sec)               (res: 0.65)
Exponential: time = 2 * 1^n (sec)                               (res: 0.69)

from big_o.

pberkes avatar pberkes commented on September 18, 2024 1

@ericrommel

Ah yes it was debugging code, sorry I forgot to remove it.

The bisection algorithm is very efficient, for a list of 100 elements you can expect ~7 iterations, and for 100000 elements ~17 iterations.

So now it depends why you want to compute the complexity:

  • If you want to study the behavior of your code in the regime of relatively low list sizes (<= 100000 elements, maybe that's the list length you're going to work with), then you've got your answer: the overhead of calling aux_function dominates over the extra 10 loops, and the complexity is approximately constant (you can verify that by plotting the times). The call to random.randint in aux_function might be contributing to that overhead, and you could improve the situation by getting rid of it by looking at the worst-case scenario and setting target_idx always to 0.
  • If you want to check that the code follows the theoretical asymptotic behavior of log(N), then you need to create lists of a size where the number of iterations increases significantly (but the size necessary increases with 2^number of iterations!), or like I did add a significant load in the for loop (the sleep command)

The reason why you're getting Polynomial instead of Logarithmic is similar: your computer might a bit slower or faster than mine, and in the range of list lengths you're considering log(N) and N^0.08 must by very close

from big_o.

amoynahan avatar amoynahan commented on September 18, 2024

I would like to see an answer to this question

from big_o.

TheMatt2 avatar TheMatt2 commented on September 18, 2024

You can solve this by writing a custom data generator and testing only one input at a time.

At the current time, this module does not support multivariable big-o function guessing (perhaps a feature request if there is interest?). This module only supports testing a single N at a time.

However, you can still test your function by holding one of the inputs constant, and testing the other.

import big_o

def nxtGreaterQuery(arr,qur):
  for i in range(len(qur)):
      s = []
      s.append(arr[qur[i]])
      for j in range(qur[i]+1,len(arr)):
          if s[-1] < arr[j]:
              s.pop()
              break

# Generate custom data generators
def nxtGreaterQueryArrOnly(arr):
    nxtGreaterQuery(arr, [3,6,1])

def nxtGreaterQueryQurOnly(qur):
    # qur seems to be indexes into arr
    # range_n will happily produces indexes that are too big
    # So use module to shrink them
    arr = [3,4,2,7,5,8,10,6]
    qur = [n % len(arr) for n in qur]
    nxtGreaterQuery([3,4,2,7,5,8,10,6], qur)

# Run Big-O
best, others = big_o.big_o(nxtGreaterQueryArrOnly,big_o.datagen.range_n,n_measures=100)
print("Complexity of Arr:", best)

best, others = big_o.big_o(nxtGreaterQueryQurOnly, big_o.datagen.range_n,n_measures=100)
print("Complexity of Qur:", best)

When I run this, I get the result:

Complexity of Arr: Constant: time = 3E-05 (sec)
Complexity of Qur: Linear: time = 0.0039 + 1.6E-06*n (sec)

So it would seem your function has linear complexity with respect to Qur, and constant complexity with respect to Arr.

from big_o.

ericrommel avatar ericrommel commented on September 18, 2024

Nice workaround but I think the result can be wrong sometimes. For example, the average case time complexity of the binary search algorithm is O(log N). I tried as below and I'm always getting Constant. Maybe I'm doing something wrong? My sample to test:

import big_o
import pytest
import random

class Sample:
    def binary_search(self, nums: list[int], target: int) -> int:
        if len(nums) == 0:
            return -1
        left, right = 0, len(nums) - 1
        while left <= right:
            mid = (left + right) // 2
            if nums[mid] == target:
                return mid
            elif nums[mid] < target:
                left = mid + 1
            else:
                right = mid - 1
        return -1

@pytest.mark.parametrize("test_input, test_target, expected", [
    ([-1, 0, 3, 5, 9, 12], 9, 4),
    ([-1, 0, 3, 5, 9, 12], 2, -1),
])
def test_search(test_input, test_target, expected):
    assert Sample().binary_search(nums=test_input, target=test_target) == expected

# Big-O support function
def aux_function(arr):
    target = random.randint(-10 ** 4, 10 ** 4)
    Sample().binary_search(nums=arr, target=target)

def test_complexity(capsys):
    with capsys.disabled():
        test_data = lambda n: big_o.datagen.integers(n, 1, 10**4)

        # Calculating the Time complexity
        best, others = big_o.big_o(aux_function, test_data, n_measures=100)
        print(f"\n{big_o.reports.big_o_report(best, others)}")

Output:

Process finished with exit code 0
PASSED [ 33%]
PASSED [ 66%]
PASSED [100%]
Best : Constant: time = 1.1E-05 (sec)
Constant: time = 1.1E-05 (sec) (res: 3.1E-09)
Linear: time = 8.5E-06 + 4.7E-11n (sec) (res: 2.9E-09)
Quadratic: time = 9.4E-06 + 4.5E-16
n^2 (sec) (res: 2.9E-09)
Cubic: time = 9.7E-06 + 4.6E-21n^3 (sec) (res: 2.9E-09)
Polynomial: time = 2E-06 * x^0.15 (sec) (res: 3E-09)
Logarithmic: time = -1.2E-06 + 1.1E-06
log(n) (sec) (res: 2.9E-09)
Linearithmic: time = 8.7E-06 + 4E-12nlog(n) (sec) (res: 2.9E-09)
Exponential: time = 7.2E-06 * 1^n (sec) (res: 3E-09)

from big_o.

ericrommel avatar ericrommel commented on September 18, 2024

@pberkes
Thank you so much for the feedback and explanation. Totally makes sense.

I made the changes and was still getting a Constant time after applying them. Then, checking again your changes, I saw that you added a calculation (t+=1) and a sleep(0.01). Adding them, I'm getting a polynomial time most of the time. From 5 executions, just as an example, I got logarithmic time only once. What are your hints here?

# Without calculation and sleep:
Best : Constant: time = 0.00012 (sec)
Best : Constant: time = 0.00011 (sec)
Best : Constant: time = 8.7E-05 (sec)
# With calculation and sleep:
Best : Polynomial: time = 0.88 * x^0.089 (sec)
Best : Logarithmic: time = 0.6 + 0.16*log(n) (sec)
Best : Polynomial: time = 0.93 * x^0.084 (sec)
Best : Polynomial: time = 1 * x^0.075 (sec)
Best : Polynomial: time = 0.97 * x^0.08 (sec)

from big_o.

ericrommel avatar ericrommel commented on September 18, 2024

@pberkes

Gotcha... Thanks for answering...

from big_o.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.