Coder Social home page Coder Social logo

Comments (9)

rhettinger avatar rhettinger commented on August 17, 2024 6

However, it's a breaking chnage that happened without a deprecation period.

I would describe it differently. It was a change in non-tested, non-guaranteed behavior. Effectively, the downstream test suite was relying on an implementation detail that was not well-known, discussed, documented, tested, or intended.

Going forward, we can document, test and guarantee a definition of dataclass equality. Unlike what we have now, that would allow downstream users to make accurate predictions of what will happen with exotic objects like NaN.

FWIW that will make dataclasses better specified than most other tools in Python (for example, we don't promise exactly what list.sort() will do in the presence of a NaN or with objects that don't define a total ordering).

Unless my imagination is lacking, if we were to back out the optimization and make the change in the next release (or in a few releases from now), we couldn't generate a deprecation warning without drastically slowing down the code.

We make changes in implementation details all the time without going through a deprecation period. That can break overspecified tests but is not considered a "breaking change" (like changing a function signature) . IMO nothing good would come from reverting, deprecating, and reimplementing. It would be unnecessarily disruptive and it would deprive two generations of users of the more reasonable behavior that we all seem to prefer.

from cpython.

rhettinger avatar rhettinger commented on August 17, 2024 2

Roughly, this is the essential difference in the generated code:

class a:

    def __init__(self, a):
        self.a = a

    def old_eq(self, other):
        return (self.a,) == (other.a,)  # Wrap data in a tuple

    def new_eq(self, other):
        return self.a == other.a        # Don't wrap data in a tuple

I don't think the prior behavior of NaNs comparing equal on the basis of identity was an intended or desired behavior. That said, the change in behavior wasn't intended either. NaN identity comparisons weren't considered at all.

Personally, I think the new behavior is the best behavior for dataclasses. The norm for pure python classes is for == to mean whatever __eq__ returns. It matches how most people would write the code by hand.

The old behavior is a bit weird. Think about it. Two distinct dataclass instances compare as equal but their components compare as unequal even though there are no special field modifiers.

There is a case to be made either way. According to IEEE-741, NaNs aren't "supposed to" compare equal even when they are identical. Built-in types such as sets, dicts, lists, and tuples don't follow that rule but most user-defined classes do.

from cpython.

terryjreedy avatar terryjreedy commented on August 17, 2024 1

Bugfixes and internal changes are not listed in What's New except by reference to the changelob.

I am pretty sure you found the right issue. Built-in collections intentionally compares elements by is before == so that collections are equal to themselves and contain their elements even when they contain NANs.

>>> nan = float('nan')
>>> (nan,) == (nan,)
True
>>> nan == nan
False
>>> nan is nan
True
>>> {nan: nan} == {nan: nan}
True
>>> nan in (nan,)
True
>>> nan in {nan:nan}
True

These considerations do not normally apply to class instances, so I think that 'regression' versus 'bugfix' may be a decision (not by me) rather than a fact. NAN are nasty. But thank you for noticing and raising the issue.

from cpython.

rhettinger avatar rhettinger commented on August 17, 2024 1

@rhettinger, any thoughts?

If it were up to me, I would leave the code as-is and would add documention, "dataclass instances
compare equal when the types exactly match and the corresponding field values are equal".

My rationale is 1) the expected behavior of NaNs is to never compare equal, 2) the typical use case for a NaN is as a placeholder for missing data so there is no basis for saying the values are equal, and 3) optimizing memory use with a singleton NaN ideally shouldn't change what an application does.

Consider comparing two student budgets where neither student knows what their tuition bill is going to be:

student1 = Budget(rent=202.10, tuition=float('nan'), food=45.67)
student2 = Budget(rent=202.10, tuition=float('nan'), food=45.67)

Those budgets cannot be said to be equal because possibly differing data values are missing. Also the result of student1 == student2 shouldn't change with a speed and space optimization that replaces thefloat('nan') instantiations with a math.nan singleton.

from cpython.

ericvsmith avatar ericvsmith commented on August 17, 2024 1

Unless my imagination is lacking, if we were to back out the optimization and make the change in the next release (or in a few releases from now), we couldn't generate a deprecation warning without drastically slowing down the code. The best we could do is document it. I don't know if anyone would read the deprecation documentation and realize this applies to them.

I'm still mulling it over.

from cpython.

ericvsmith avatar ericvsmith commented on August 17, 2024

Yes, you're probably correct that this is related to #104904. @rhettinger, any thoughts?

from cpython.

hroncok avatar hroncok commented on August 17, 2024

As said, the 3.13 behavior indeed makes sense. However, it's a breaking chnage that happened without a deprecation period.

from cpython.

terryjreedy avatar terryjreedy commented on August 17, 2024

That is true of every bugfix. Eric can decide.

from cpython.

hroncok avatar hroncok commented on August 17, 2024

That is true of every bugfix.

What bugfix do you mean? This behavior change happened accidentally during an optimization change, didn't it?

from cpython.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.