Comments (7)
There are a couple of problems with the way you measure the performance of your code:
- The data you use to measure performance is not sorted, but your function expects sorted data. Use sorted, i.e.
test_data = lambda n: sorted(big_o.datagen.integers(n, 1, 10**4))
- I assume you want to test the average performance of your code, but the way you select the target means that you are often testing a corner case: your input numbers lie between 1 and 10^4, but the target is between -10^4 and 10^4, so you often test the case where the target is smaller than the smallest number. This can be improved with
def aux_function(arr):
    # Select one of the values in the array as the target.
    # randint is inclusive on both ends, so the upper bound
    # must be len(arr) - 1 to avoid an IndexError.
    target_idx = random.randint(0, len(arr) - 1)
    target = arr[target_idx]
    return binary_search(nums=arr, target=target)
- Since you're testing your function with random targets, you will need to test each N with several repetitions to measure the average behavior at that N, e.g. n_repeats=10
- The default N ranges from 100 to 100000, but these limits are very small for this task, which completes very quickly even for very large lists. In order to cover a range of Ns where the algorithmic time dominates, I would use
test_data = lambda n: sorted(big_o.datagen.integers(n * 100, 1, 10**8))
(I also increased the range of the random integers to avoid duplicates.)
So the code looks like this:
import big_o
import pytest
import random
from time import sleep

def binary_search(nums: list[int], target: int) -> int:
    if len(nums) == 0:
        return -1
    left, right = 0, len(nums) - 1
    t = 0
    while left <= right:
        t += 1
        sleep(0.01)
        mid = (left + right) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

def aux_function(arr):
    # Select one of the values in the array as the target.
    # randint is inclusive on both ends, so the upper bound
    # must be len(arr) - 1 to avoid an IndexError.
    target_idx = random.randint(0, len(arr) - 1)
    target = arr[target_idx]
    return binary_search(nums=arr, target=target)

# Calculating the time complexity
test_data = lambda n: sorted(big_o.datagen.integers(n * 100, 1, 10**8))
best, others = big_o.big_o(aux_function, test_data, n_measures=10, n_repeats=10, return_raw_data=True)
print(f"\n{big_o.reports.big_o_report(best, others)}")
and the report I get on my machine is
Best : Logarithmic: time = 0.59 + 0.19*log(n) (sec)
Constant: time = 2.5 (sec) (res: 1.3)
Linear: time = 2 + 8.5E-06*n (sec) (res: 0.61)
Quadratic: time = 2.2 + 6.4E-11*n^2 (sec) (res: 0.9)
Cubic: time = 2.3 + 5.5E-16*n^3 (sec) (res: 1)
Polynomial: time = 0.95 * x^0.093 (sec) (res: 0.036)
Logarithmic: time = 0.59 + 0.19*log(n) (sec) (res: 0.023)
Linearithmic: time = 2.1 + 7.1E-07*n*log(n) (sec) (res: 0.65)
Exponential: time = 2 * 1^n (sec) (res: 0.69)
Ah yes it was debugging code, sorry I forgot to remove it.
The bisection algorithm is very efficient: for a list of 100 elements you can expect ~7 iterations, and for 100000 elements ~17 iterations.
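These expected iteration counts are just ceil(log2(n)), since binary search halves the remaining range at each step; a quick standard-library check reproduces both figures:

```python
import math

# Binary search halves the search range each iteration, so a
# sorted list of n elements needs about ceil(log2(n)) iterations.
for n in (100, 100_000):
    print(n, math.ceil(math.log2(n)))
# 100 -> 7 iterations, 100000 -> 17 iterations
```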
So now it depends why you want to compute the complexity:
- If you want to study the behavior of your code in the regime of relatively low list sizes (<= 100000 elements, maybe that's the list length you're going to work with), then you've got your answer: the overhead of calling aux_function dominates over the extra ~10 loop iterations, and the complexity is approximately constant (you can verify that by plotting the times). The call to random.randint in aux_function might be contributing to that overhead, and you could get rid of it by looking at the worst-case scenario and setting target_idx always to 0.
- If you want to check that the code follows the theoretical asymptotic behavior of log(N), then you need to create lists of a size where the number of iterations increases significantly (but the necessary size grows as 2^(number of iterations)!), or, like I did, add a significant load inside the loop (the sleep call).
The reason why you're getting Polynomial instead of Logarithmic is similar: your computer might be a bit slower or faster than mine, and in the range of list lengths you're considering, log(N) and N^0.08 must be very close.
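To see how hard these two curves are to tell apart, compare their growth factors over the measured range of list sizes (with test_data producing lists of 10^4 to 10^7 elements, and 0.08 being roughly the exponent the Polynomial fits report):

```python
import math

n1, n2 = 10**4, 10**7  # smallest and largest list sizes measured

growth_log = math.log(n2) / math.log(n1)  # factor by which log(n) grows
growth_pow = (n2 / n1) ** 0.08            # factor by which n**0.08 grows

print(growth_log, growth_pow)  # ~1.75 vs ~1.74
```

Over three decades of n, both curves grow by almost exactly the same factor, so a little noise in the timings is enough to flip the best fit between Logarithmic and Polynomial.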
I would like to see an answer to this question
You can solve this by writing a custom data generator and testing only one input at a time.
At the current time, this module does not support multivariable big-O function guessing (perhaps a feature request, if there is interest?); it only supports testing a single N at a time.
However, you can still test your function by holding one of the inputs constant, and testing the other.
import big_o

def nxtGreaterQuery(arr, qur):
    for i in range(len(qur)):
        s = []
        s.append(arr[qur[i]])
        for j in range(qur[i] + 1, len(arr)):
            if s[-1] < arr[j]:
                s.pop()
                break

# Wrap the function so that only one input varies at a time
def nxtGreaterQueryArrOnly(arr):
    nxtGreaterQuery(arr, [3, 6, 1])

def nxtGreaterQueryQurOnly(qur):
    # qur seems to be indexes into arr.
    # range_n will happily produce indexes that are too big,
    # so use modulo to shrink them.
    arr = [3, 4, 2, 7, 5, 8, 10, 6]
    qur = [n % len(arr) for n in qur]
    nxtGreaterQuery(arr, qur)

# Run big_o
best, others = big_o.big_o(nxtGreaterQueryArrOnly, big_o.datagen.range_n, n_measures=100)
print("Complexity of Arr:", best)
best, others = big_o.big_o(nxtGreaterQueryQurOnly, big_o.datagen.range_n, n_measures=100)
print("Complexity of Qur:", best)
When I run this, I get the result:
Complexity of Arr: Constant: time = 3E-05 (sec)
Complexity of Qur: Linear: time = 0.0039 + 1.6E-06*n (sec)
So it would seem your function has linear complexity with respect to Qur, and constant complexity with respect to Arr.
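As an aside, the hold-one-input-constant pattern above can also be written with functools.partial instead of hand-written wrappers. The helper below is a simplified, hypothetical stand-in for nxtGreaterQuery that returns its results so the behavior is visible:

```python
from functools import partial

def next_greater_query(arr, qur):
    # For each queried index, find the first value to its right
    # that is greater than the queried value, or -1 if none exists.
    out = []
    for q in qur:
        nxt = -1
        for j in range(q + 1, len(arr)):
            if arr[j] > arr[q]:
                nxt = arr[j]
                break
        out.append(nxt)
    return out

# Hold qur fixed so that only the array argument varies; big_o can
# then call arr_only(data) with a single generated input.
arr_only = partial(next_greater_query, qur=[3, 6, 1])

print(arr_only([3, 4, 2, 7, 5, 8, 10, 6]))  # [8, -1, 7]
```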
Nice workaround, but I think the result can sometimes be wrong. For example, the average-case time complexity of the binary search algorithm is O(log N). I tried it as below and I'm always getting Constant. Maybe I'm doing something wrong? My sample to test:
import big_o
import pytest
import random

class Sample:
    def binary_search(self, nums: list[int], target: int) -> int:
        if len(nums) == 0:
            return -1
        left, right = 0, len(nums) - 1
        while left <= right:
            mid = (left + right) // 2
            if nums[mid] == target:
                return mid
            elif nums[mid] < target:
                left = mid + 1
            else:
                right = mid - 1
        return -1

@pytest.mark.parametrize("test_input, test_target, expected", [
    ([-1, 0, 3, 5, 9, 12], 9, 4),
    ([-1, 0, 3, 5, 9, 12], 2, -1),
])
def test_search(test_input, test_target, expected):
    assert Sample().binary_search(nums=test_input, target=test_target) == expected

# Big-O support function
def aux_function(arr):
    target = random.randint(-10 ** 4, 10 ** 4)
    Sample().binary_search(nums=arr, target=target)

def test_complexity(capsys):
    with capsys.disabled():
        test_data = lambda n: big_o.datagen.integers(n, 1, 10**4)
        # Calculating the time complexity
        best, others = big_o.big_o(aux_function, test_data, n_measures=100)
        print(f"\n{big_o.reports.big_o_report(best, others)}")
Output:
Process finished with exit code 0
PASSED [ 33%]
PASSED [ 66%]
PASSED [100%]
Best : Constant: time = 1.1E-05 (sec)
Constant: time = 1.1E-05 (sec) (res: 3.1E-09)
Linear: time = 8.5E-06 + 4.7E-11*n (sec) (res: 2.9E-09)
Quadratic: time = 9.4E-06 + 4.5E-16*n^2 (sec) (res: 2.9E-09)
Cubic: time = 9.7E-06 + 4.6E-21*n^3 (sec) (res: 2.9E-09)
Polynomial: time = 2E-06 * x^0.15 (sec) (res: 3E-09)
Logarithmic: time = -1.2E-06 + 1.1E-06*log(n) (sec) (res: 2.9E-09)
Linearithmic: time = 8.7E-06 + 4E-12*n*log(n) (sec) (res: 2.9E-09)
Exponential: time = 7.2E-06 * 1^n (sec) (res: 3E-09)
@pberkes
Thank you so much for the feedback and explanation. Totally makes sense.
I made the changes and was still getting a Constant time after applying them. Then, checking your changes again, I saw that you had also added a counter (t += 1) and a sleep(0.01). After adding those too, I'm getting a Polynomial time most of the time. From 5 executions, just as an example, I got a Logarithmic time only once. What are your hints here?
# Without calculation and sleep:
Best : Constant: time = 0.00012 (sec)
Best : Constant: time = 0.00011 (sec)
Best : Constant: time = 8.7E-05 (sec)
# With calculation and sleep:
Best : Polynomial: time = 0.88 * x^0.089 (sec)
Best : Logarithmic: time = 0.6 + 0.16*log(n) (sec)
Best : Polynomial: time = 0.93 * x^0.084 (sec)
Best : Polynomial: time = 1 * x^0.075 (sec)
Best : Polynomial: time = 0.97 * x^0.08 (sec)
Gotcha... Thanks for answering...