Coder Social home page Coder Social logo

lab-timeit2's Introduction

Lab: timeit (Part 2)

Part I - Completing last week's lab

  1. Find a partner.

  2. Try to fork your partner's repo from last week. (Recall that this repo was forked from my upstream repo https://github.com/mikeizbicki/lab-timeit, and so the url will look the same just with their username.)

    You'll notice that GitHub doesn't let you fork this repo: you've already forked my repo, and GitHub doesn't let you have multiple forks of the same repo. To work around this problem, you will have to:

    1. Create a duplicate copy of your own lab-timeit repo that is not a fork of my repo. Follow the instructions at https://docs.github.com/en/repositories/creating-and-managing-repositories/duplicating-a-repository. It does not matter what you name your duplicate repo.

    2. Once your partner has their duplicate repo created, fork it.

  3. Clone the newly created fork onto the lambda server.

  4. Edit the README.md file to:

    1. Remove the comments around the big-O notation table.
    2. Fill in the correct runtimes in each box of the table using big-O notation.
  5. Submit a pull request to your partner's repo, and accept the pull request that your partner submits to your repo.

Part II

(You no longer need a partner, but you're encouraged to keep working together.)

For the second part of this lab, you will measure the runtime performance of the sequential and binary search algorithms in the notes.py file. There will be two tables to complete. Fork this repo, and make all changes in your own forked repo. When done, ensure that you push the changes back onto github.

Runtime vs N

Recall that sequential search has a worst case runtime of $\Theta(n)$ and binary search has a worst case runtime of $\Theta(\log n)$. We will prove these facts in class next week. In this assignment we will see just how much better the logarithmic runtime is than the linear runtime as $n$ grows large.

The following terminal command measures the runtime of the binary_search_itr function from the notes.py file on a list of length n=65536:

$ python3 -m timeit \
    'import notes; n = 65536; xs = list(range(-n,n))' \
    'notes.binary_search_itr(xs,5)'

NOTE: The backslash \ in the command above signifies that the command continues onto the next line. Recall that the \ is used to "escape" characters to change their meaning. In this case, the character that follows the \ is a newline character (rendered as \n in python), and the \ above signifies that the newline character should be interpreted as ordinary whitespace and not the end of a command. Therefore, the command above is functionally equivalent to

$ python3 -m timeit -s 'import notes; n = 65536; xs = list(range(-n,n))' 'notes.binary_search_rec(xs,5)'

But the first command is easier to read.

For each cell in the table below: Modify the command above for the corresponding search function and value of n; measure the runtime and enter it into the table.

sequential_search_itr binary_search_rec
n=2**0 0.152 usec 0.727 usec
n=2**1 0.189 usec 0.993 usec
n=2**2 0.264 usec 1.24 usec
n=2**3 0.399 usec 1.11 usec
n=2**4 0.558 usec 1.34 usec
n=2**5 0.87 usec 1.51 usec
n=2**6 1.5 usec 1.83 usec
n=2**7 2.74 usec 1.97 usec
n=2**8 5.23 usec 2.37 usec
n=2**9 10 usec 2.47 usec
n=2**10 19.6 usec 2.67 usec
n=2**11 37.9 usec 3.24 usec
n=2**12 78.4 usec 3.53 usec
n=2**13 152 usec 3.8 usec
n=2**14 307 usec 4.06 usec
n=2**15 624 usec 4.33 usec
n=2**16 1.28 msec 4.74 usec
n=2**17 2.56 msec 4.9 usec
n=2**18 5.04 msec 5.2 usec
n=2**19 10.2 msec 5.15 usec
n=2**20 20.5 msec 5.42 usec
n=2**21 41.1 msec 5.89 usec
n=2**22 82.6 msec 6.23 usec

HINT: You don't have to run all of these tests manually. The bash shell has a built-in for loop feature that you can use. To see how this feature works, run the command

$ for i in $(seq 0 22); do
    echo "i=$i"
done

Notice:

  1. When you enter a multiline command in bash, your prompt will probably change to >. It is traditional not to display this "multiline prompt" when writing commands.
  2. The $i gets substituted with each value between 0 and 22. This is different than python, where the last value in the range is included).

We can put the timeit command inside of this for loop as well to run all of the appropriate timeit calls. The final command will look something like:

$ for i in $(seq 0 22); do
    echo "i=$i"
    python3 -m timeit \
        -s "import notes; n = 2**$i; xs = list(range(-n,n))" \
        "notes.binary_search_rec(xs,5)"
done

But you'll have to modify it for the other table column.

You should observe that:

  1. Binary search is much faster for large $n$, but for small $n$ sequential search may be faster.
  2. Multiplying n times 2 gives a multiplicative slowdown for sequential search (by a factor of 2), but an additive slowdown for binary search.

You should ensure that this multiplicative vs additive slowdown makes sense to you based on the properties of logarithms.

At FAANG-type companies, they are searching through datasets of size n>1000000000000000 (15+ zeros). It should hopefully be clear from these examples that the logarithmic runtime is absolutely essential for any realtime queries of datasets of this size.

Measuring combinations of data structure / implementation

We will now compare the runtime of binary search on four of python's container types: list, deque, tuple, and array.

We've already covered the list/deque types in class. The tuple type is one that you've probably also seen. In python, it's denoted using parentheses instead of square brackets.

>>> xs = (1, 2, 3, 4, 5)

Tuples can also be created by omitting the parenthesis entirely or using the tuple function like so:

>>> xs = 1, 2, 3, 4, 5
>>> xs = tuple(range(1,6))

Tuples can be indexed and sliced just like lists in python, but they are immutable. This means that they cannot be updated (for example with the append method), and are therefore slightly more efficient.

The array type is likely one that you haven't seen before, since it is not usually introduced in intro programming courses. The array type is included in the numpy library. You create an array by first importing the library, and then calling the array constructor on an iterable (i.e. list-like container):

>>> import numpy
>>> numpy.array([1, 2, 3, 4, 5])
array([1, 2, 3, 4, 5])
>>> numpy.array(range(1,6))
array([1, 2, 3, 4, 5])

The array supports a very similar interface as a list. For example, you can index and slice just like in a list:

>>> xs = numpy.array(range(1,6))
>>> xs[3]
4
>>> xs[3:]
array([4, 5])

The purpose of the array is to support numerical computations from linear algebra, and it behaves differently than lists with respect to the + and * operators. Lists use "container algebra" operations:

>>> [1, 2] + [3, 4]
[1, 2, 3, 4]
>>> [1, 2]*2
[1, 2, 1, 2]

and arrays use "vector algebra" operations:

>>> [1, 2] + [3, 4]
[4, 6]
>>> [1, 2]*2
[2, 4]

In this problem, the important difference will be that:

  1. list slices make a copy and take time O(k), where k is the size of the slice;
  2. array slices do not make a copy and take time O(1). To see that array slices do not make a copy, run the following sequence of commands:
>>> xs = numpy.array([1, 2, 3, 4, 5])
>>> xs
array([1, 2, 3, 4, 5])
>>> ys = xs[3:]
>>> ys
array([4, 5])
>>> ys[0] = -1
>>> ys[1] = -2
>>> xs
array([ 1,  2,  3, -1, -2])

NOTE: If it's not obvious to you how the commands above would generate different output if xs were a list, then you should also run them for xs = [1, 2, 3, 4, 5] before continuing.

The following terminal command measures the runtime of the binary_search_itr command from the notes.py file on an array of length n=65536:

$ python3 -m timeit \
    -s 'import notes; import numpy; n = 65536; xs = numpy.array(range(-n,n))' \
    'notes.binary_search_itr(xs,5)'

For each cell in the table below: Modify the command above for the corresponding search function and container type; measure the runtime and enter it into the table.

array list tuple deque
sequential_search_itr 8.83 msec 0.151 usec 1.16 msec 1.33 msec
sequential_search_itr2 11.3 msec 0.328 usec 2.34 msec 109 msec
sequential_search_rec --- --- --- ---
binary_search_itr 9.97 usec 0.346 usec 2.31 usec 172 usec
binary_search_rec 11.6 usec 0.711 usec 4.14 usec 174 usec
binary_search_rec2 9.39 usec 0.535 usec 426 usec ---

You should notice that:

  1. for the array container, all implementations of binary search work well
  2. for the list container, the binary search that relies on slicing is slow
  3. the tuple container behaves just like the list container
  4. binary search provides no speed up for the deque container; the deque container also does not support slicing, and so the binary_search_rec2 function will have a type error
  5. the sequential_search_rec gets a RecursionError for large n values; this is one of the reasons we tend to prefer for loops over recursion when possible

We will prove all of these statements formally next week in class by showing that the runtimes are:

array list tuple deque
sequential_search_itr O(n) O(n) O(n) O(n)
sequential_search_itr2 O(n) O(n) O(n) O(n^2)
sequential_search_rec --- --- --- ---
binary_search_itr O(log n) O(log n) O(log n) O(n)
binary_search_rec O(log n) O(log n) O(log n) O(n)
binary_search_rec2 O(log n) O(n) O(n) ---

HINT: You'll notice that the binary_search_rec function has the best runtimes overall. This is the function you should use as the basis for your homework problems.

Submission

Submit the url to your new lab-timeit repo and your forked version of this repo to sakai.

lab-timeit2's People

Contributors

mikeizbicki avatar npcrites avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.