In many parts of the codebase we opt to use OrderedDict instead of standard dictionari

Replace OrderedDict with standard dictionaries where possible about pygsti HOT 12 OPEN

coreyostrove commented on July 26, 2024

Replace OrderedDict with standard dictionaries where possible

from pygsti.

Comments (12)

sserita commented on July 26, 2024 1

We no longer support versions of Python where dicts do not remember insertion order. I'm fine with this change!

from pygsti.

enielse commented on July 26, 2024

Yeah, I go back and forth on this. I agree the readability is improved in that it's less clunky and cleaner syntax overall, but using OrderedDict explicitly indicates that the ordering within the dictionary is important and we're relying on that, which isn't always that case, and maybe is typically not the case. I could be convinced either way :)

from pygsti.

sserita commented on July 26, 2024

That's a fair counterpoint, the signaling of OrderedDict is much clearer that insertion order is important.Although I'll note that I'm not sure that distinction is immediately obvious to users. I was pair programming something yesterday, we got to an OrderedDict, and I could literally see their thought process settling on just using a regular dict instead.

But I can definitely see the value in signaling to developers that insertion order matters for that object, and not to manipulate it in careless ways... Hmm, maybe I'm changing my stance from pro this change to indecision on it

from pygsti.

rileyjmurray commented on July 26, 2024

I've come to see value in using OrderedDict instead of dict to flag that order preservation is important. So I could be convinced either way.

from pygsti.

sserita commented on July 26, 2024

I think my stance is now that I am for keeping OrderedDict in cases where the order preservation is specifically important. In my mind, this does come with a task of doing an audit of the codebase where we use OrderedDict and migrating to a dict when we don't care about insertion order to ensure this distinction is actually maintained. Probably this is scriptable somehow - if we are using methods that check the order explicitly, it's needed, or something like that.

But assuming that most places an OrderedDict was used pre-3.7 was a place where it was important, presumably most of them are legit uses of the class and so this is probably a relatively low priority thing to check.

from pygsti.

coreyostrove commented on July 26, 2024

Would you be swayed at all by cold hard performance measurements?

While I didn't state this explicitly, part of the subtext behind my suggestion is that I had heard that nowadays dicts are more performant than OrderedDict. (In fact, as I recall hearing the story told, the change in behavior in 3.6 was actually related to a re-implementation intended to improve performance, and the fact that the new implementation now remembered insertion order was a happy side-effect the devs decided to roll with and make permanent). This is hinted at in the documentation for OrderedDict:

Ordered dictionaries are just like regular dictionaries but have some extra capabilities relating to ordering operations. They have become less important now that the built-in dict class gained the ability to remember insertion order (this new behavior became guaranteed in Python 3.7).

Some differences from dict still remain:

The regular dict was designed to be very good at mapping operations. Tracking insertion order was secondary.

The OrderedDict was designed to be good at reordering operations. Space efficiency, iteration speed, and the performance of update operations were secondary.

The OrderedDict algorithm can handle frequent reordering operations better than dict. As shown in the recipes below, this makes it suitable for implementing various kinds of LRU caches.

Anyhow, I'd never actually tested this myself to confirm this was true in practice, so here are some timings on iterating through the two types of dictionaries:

from collections import OrderedDict
test_dict_100 = {key:val for key,val in zip(range(100), range(100))}
test_dict_1000 = {key:val for key,val in zip(range(1000), range(1000))}
test_dict_10000= {key:val for key,val in zip(range(10000), range(10000))} 
test_dict_100000= {key:val for key,val in zip(range(100000), range(100000))}
test_ordered_dict_100 = OrderedDict(test_dict_100)
test_ordered_dict_1000 = OrderedDict(test_dict_1000)
test_ordered_dict_10000 = OrderedDict(test_dict_10000)
test_ordered_dict_100000 = OrderedDict(test_dict_100000)

%%timeit
for _, _ in test_dict_100.items():
    pass

%%timeit
for _, _ in test_dict_1000.items():
    pass

%%timeit
for _, _ in test_dict_10000.items():
    pass

%%timeit
for _, _ in test_dict_100000.items():
    pass

%%timeit
for _, _ in test_ordered_dict_100.items():
    pass

%%timeit
for _, _ in test_ordered_dict_1000.items():
    pass

%%timeit
for _, _ in test_ordered_dict_10000.items():
    pass

%%timeit
for _, _ in test_ordered_dict_100000.items():
    pass

The results for the regular dict were:

1.92 µs ± 11.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
18.2 µs ± 238 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
186 µs ± 3.3 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
1.99 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and for OrderedDict:

3.75 µs ± 39.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
36.8 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
372 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
4.04 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So, based on just this test regular dictionaries are almost exactly 2X faster for iteration. I suspect that we'd find the OrderedDict more performant for re-ordering operations and the like (as indicated by the docs), but I'd also posit we don't use these all that often in practice and that the primary historical use case for OrderedDict was to ensure we iterated over dictionaries in a predictable order.

Have I swayed anyone back to the dark side?

from pygsti.

sserita commented on July 26, 2024

Performance metrics are usually a good way to sway me; however, it also needs to be paired with how frequently an operation is being done in normal workflows. Is there an OrderedDict that we are hitting a lot where you think this performance matters? :)

from pygsti.

rileyjmurray commented on July 26, 2024

I think OrderedDict objects are almost as common as dicts in pyGSTi. And dictionary accesses happen everywhere in pyGSTi. So these performance numbers push me strongly in favor of switching OrderedDict to dict.

from pygsti.

sserita commented on July 26, 2024

Thoughts on an alternative way to signal that the ordering matters in that case? Maybe a naming scheme where the dict has a _ordered or maybe setting a order_matters attribute. Just wondering if we should have a convention for we want to alert ourselves to not mess with the order too much? Or would that end up having too much user exposure and be too annoying? We could also just do a comment where we instantiate, but that seems likely to be ignored/hard to maintain.

from pygsti.

robinbk commented on July 26, 2024

Does python have an equivalent of C's "typedef" statement? It creates an alias for a type, so "typedef float MyType" creates a new type "MyType" that is isomorphic to (and wholly compatible with) float. This would let you use dicts for speed, but keep the purely semantic reminder that this *particular* object cares about order.

…

On Tue, May 7, 2024, 3:59 PM Stefan Seritan ***@***.***> wrote: Thoughts on an alternative way to signal that the ordering matters in that case? Maybe a naming scheme where the dict has a _ordered or maybe setting a order_matters attribute. Just wondering if we should have a convention for we want to alert ourselves to not mess with the order too much? Or would that end up having too much user exposure and be too annoying? We could also just do a comment where we instantiate, but that seems likely to be ignored/hard to maintain. — Reply to this email directly, view it on GitHub <#427 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACNUZ5IOIETRYKTHJZ4EZJDZBFMDHAVCNFSM6AAAAABGWANNN2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJZGQ2DIMJYGE> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

from pygsti.

sserita commented on July 26, 2024

Ooh I like this possibility. In Python, this is called type aliasing: https://docs.python.org/3/library/typing.html#type-aliases

Would using a type alias like this be an agreeable way to keep the semantic reminder but have increased readability @coreyostrove? Or maybe if you wanted to just have a = {some dict logic} instead of a = OrderedDict(); some dict processing, then just having the type alias as a typing hint: type ordereddict = Dict; a: ordereddict = {some dict logic}?

from pygsti.

coreyostrove commented on July 26, 2024

Marking this via type annotations works for me 👍

from pygsti.

Replace OrderedDict with standard dictionaries where possible about pygsti HOT 12 OPEN

Comments (12)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent