Coder Social home page Coder Social logo

Comments (24)

gvanrossum avatar gvanrossum commented on May 26, 2024 1

Okay, fixed the typo in the original.

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

Important note: ideas marked as rejected there are not really rejected, I just wanted to summarize all alternatives, on many of them I was 50-50.

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

Btw, it is interesting how close are the proposals, it is a good sign!

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

After quickly scrolling through issues, it seems to me all ideas where we diverge are in the 50-50 zone I mentioned.

Couple exceptions:

  • I think it is important to include a detailed spec on how __match__() is generated for dataclasses in the PEP
  • I think we should add @typing.sealed, see #18

from patma.

gvanrossum avatar gvanrossum commented on May 26, 2024
  • I like the idea of adding sealed (and in general thinking things through from the POV of a static type checker).
  • Our proposed __match__ API (and class pattern semantics) is so different from yours that dataclasses are very simple to support for us (see below).

Let me summarize what I see as the main differences.

  • Differences around as vs. case, indentation, multiple as clauses per block (#2).
  • Our proposal has no else (#6) and is a pass if nothing matches (#5). We use case _: instead of else:, and _ is the universal "don't care" pattern.
  • We have | for OR patterns. (We decided against AND patterns, #14.)
  • We have somewhat different rules for variables (#1). This is controversial, but we ended up with a compromise where "single" names starting with a lowercase letter are variables but if a name starts with an upper case letter it's an expression; and dotted names are always expressions (solving some of your issues).
  • We are adopting semantics closer to sequence unpacking, so [a, b] and (a, b) are patterns with exactly the same meaning, and they match any sequence. (I guess if you really wanted to check for a tuple you could use tuple([a, b]).
  • Our API for __match__ is entirely different (#8). See below for a summary.

UPDATE: Here are some more differences:

  • You write unstructured matches as C(...); we just write C(). (see __match__ discussion below.)
  • You bind names only after the guard succeeds. Our proposal leaves the name bound even if the guard fails (in fact it may leave some names in a pattern bound even when a later sub-pattern fails). (Our proposal is inspired by the behavior of := here.)
  • You propose special semantics for coinciding names, so e.g. (x, x) would match a pair of equal values. In our proposal this is just a user error, and to match a pair you'd write case (x, y) if x == y:.
  • You have a special syntax for one-off matches. We should discuss this separately.

About the __match__ method

A pattern of the form C(<whatever>) calls C.__match__(target) (where target is the value from match target:), which should return an object whose attributes are used to match <whatever> or None if there's no match, or None if there's no match. The __match__() method is responsible for checking isinstance(), so that it can be used for duck typing.

In the general case __match__() can return some proxy that does attribute mapping and filtering. In many cases this can just return target unchanged. There's one magic attribute that the returned proxy may have, __match_args__, used for positional argument matching (see below). Usually __match_args__ is a class attribute.

The built-in pattern matching machinery is responsible of processing the return of `match():

  • If the pattern has the form C(), a non-None return from __match__() means success.

  • If the pattern uses keyword args, those are looked up as attributes on the proxy returned by __match__(). For example, Point(x=x, y=0) matches points whose y coordinate is zero, and binds x to the pattern's x coordinate.

  • If the pattern uses positional arguments, they are mapped to names using the __match_args__ attribute of the proxy.

  • For dataclasses we can have __match__() return self (after isinstance() check) and have a class attribute __match_args__ equal to list(C.__annotations__).

  • Not all positional arguments need to be given in the pattern, so Point(x) means the same as Point(x, _).

  • Positional and keyword args may be mixed. Examples: Point(x, y), Point(x, y=y), Point(y=y, x=x) all mean the same thing.

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

Our API for __match__ is entirely different

It is actually not that different. The general logic is like this: interpreter asks an object: "Can you represent/unpack yourself as something like this?" and the object answers: "yes/no" and in "yes" how exactly. The details of how the answer is represented, as a custom object, or tuple + dictionary is less important. What is more important here is how much context you give to the object. There are two extremes here:

  • One is IIUC your proposal: give no context at all, so essentially in your case interpreter just asks "How can you represent/unpack yourself?"
  • The other extreme is in rejected ideas in my draft: give full nested pattern object, and ask the object "Can you represent/unpack yourself as this nested structure?"

I think both extremes are sub-optimal. For the second there are some arguments in the draft, for the first extreme I think the issue is that it is not flexible enough. I think the best option is somewhere in the middle, so that you ask an object "can you represent/unpack yourself like this: two things and z=third thing?"

In my proposal I pass the literal patterns as part of the context, but maybe it is still to much context. I think we should at least pass this data to __match__():

  • How many positional matches were requested
  • Which keys were requested
  • Are there * and/or **

One of the motivation for this is that I believe __match__() should raise an exception for impossible or ambiguous matches, and interpreter can't detect this because it doesn't know the mapping between named and positional representation of an object. In particular I think Point3D(1, 2) should raise an exception because it is ambiguous between Point(1, 2, _) and Point(_, 1, 2). Also Point2D(1, 2, y=1) should raise an exception, it can hide a hard to debug error where a different class was written (like it should probably be Point3D in the last example).

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

Forgot about one thing: moving isinstance() to the __match__ is definitely a very good idea, +1 on this.

About other things:

  • I don't have strong attachment to coinciding names, exception is also a good option (you see I like exceptions :-))
  • Uppercase names as constants are probably fine. I was also thinking of this, but was stopped by the fact that there are no precedents in Python where name case influences semantics.
  • as vs case, I really like as more, but it may be personal.
  • +1 on name binding vs guards, let's be consistent with existing precedents.
  • (a, b) vs [a, b] probably also OK, it is good to be consistent with existing logic.
  • I was actually on the fence about allowing |, so I am fine with this. Note this may complicate the spec quite a bit, e.g. do we allow name | other_name, can one apply | to named sub-matches etc.
  • No else:, I am also +1

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

One more deviation you forgot, what to do in case if all patterns fail? I propose to raise an exception. The main motivation is that most cases I have seen are rather fall where if match would work, or where I would want an exception. So probably whether to raise an exception or not may depend on whether we include if match syntax.

from patma.

gvanrossum avatar gvanrossum commented on May 26, 2024

About __match__:

We considered various APIs for __match__ but they were all pretty complex -- like yours, which has positional args, keyword args, *args, and **kwds. And most classes will only care about the first two.

Regarding your specific example, I don't see the ambiguity at all. Clearly Point3D(1, 2) intends to leave additional arguments of the end, not off the beginning. This is just like for regular calls -- you can leave arguments off the end (assuming they have default values) but not off the beginning (range(10) notwithstanding :-).

The matching of Point2D(1, 2, y=1) is first translated (at runtime, given Point2D.__match_args__ == ['x', 'y']) to Point2D(x=1, y=2, y=1) and the duplicate for y can be detected.

Do we at least agree that Point3D(1) is valid, and matches all 3D points whose x coordinate equals 1?

Apart from the complex signature, there's also the fact that in your proposal, every class must implement quite a bit of __match__ machinery (unless it needs nothing beyond the default object.__match__). In our proposal all the machinery of matching (including matching sub-patterns, or e.g. 0|1|2) is handled by the framework, and the __match__ implementation can focus on telling the framework which attributes exist and in which order. A proxy doesn't have to do any work unless a specific attribute is requested.

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

This is just like for regular calls -- you can leave arguments off the end (assuming they have default values)

Yes, but what if all three arguments are required? I mean if one literally defines:

@dataclass
class Point3D:
    x: int
    y: int
    z: int

To me Point3D(1) looks a bit weird in such context. Of course the object can just return a bit more context to the interpreter, something like __required_match_args__ that would be an integer. And then interpreter can decide.

In general I agree that we should put as much burden on the interpreter as we can here. But give it the chance to detect all possible ambiguous/impossible matches.

from patma.

gvanrossum avatar gvanrossum commented on May 26, 2024
  • what to do in case if all patterns fail?

    See #5. We discussed this quite a bit; in the end @Tobias-Kohn convinced us that it should follow the lead of if ... elif ... elif .... But I can see that from a static checker's POV it's nicer if we fail hard here. It'll probably remain a point of discussion in the final PEP...

  • as vs. case: I can live with as. I agree that the extra level of indentation is tedious, but it's how every other compound statement in Python is done. Allowing multiple as clauses without intervening blocks feels like a syntax error (compare an if with an empty block -- we require pass to force the user to think about it), but if we decide to forego the extra indentation I could live with it.

  • x|y -- we can specify that the arguments to | must all be constants, or we can specify that each of the arms must bind the same set of variables. The latter does not feel hard to check.

  • Point3D with three required attributes: Even if you require all in the constructor, you might still have uses for omitting some. Certainly when using keywords it must be possible to leave some out. Consider a class User with mandatory id and name attributes -- I should be able to match User(id=42) or User(name='drew'). So I think it should be okay for positionals too.

from patma.

gvanrossum avatar gvanrossum commented on May 26, 2024

I don't have strong attachment to coinciding names, exception is also a good option (you see I like exceptions :-))

Compromise: we could specify that a pattern is only allowed to bind a name once (or only once in each branch of |, if we go that route). The "match two equal values" use case for (x, x) feels pretty theoretical, and it looks like it would require a somewhat complex implementation (essentially translating to (x, _x) if x == _x).

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

Allowing multiple as clauses without intervening blocks feels like a syntax error

OK, let's drop it. It is not really needed if we have |.

we can specify that the arguments to | must all be constants [...]

Potentially we can also allow str() | int() as a shorthand for isinstance(..., (int, str)). So i would rather vote for the second option "each of the arms must bind the same set of variables".

So I think it should be okay for positionals too.

I totally agree about the keyword args, but not so much about positional. I think we should some opportunity to the implementer to add some strictness here, like __required_match_args__, it is fine not set it, then Point3D(1) will be accepted. Anyway, you at least convinced me that your approach is extensible in the direction I want (i.e. __required_match_args__ can be added later without breaking backwards compatibility).

from patma.

gvanrossum avatar gvanrossum commented on May 26, 2024
  • str() | int()

    Seems nice, happy to go this way.

  • __required_match_args__

    I can live with this. (Perhaps rename to __match_args_required__ so they all sort together? There's a bunch of bikeshedding to do here about the precise API.)

from patma.

gvanrossum avatar gvanrossum commented on May 26, 2024

Uppercase names as constants are probably fine. I was also thinking of this, but was stopped by the fact that there are precedents in Python where name case influences semantics.

Probably you meant "there are no precedents"?

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

Perhaps rename to __match_args_required__ so they all sort together?

πŸ‘

Probably you meant "there are no precedents"?

Oops, sorry, yes that was a bad typo :-)

from patma.

viridia avatar viridia commented on May 26, 2024

I would not expect Point(1) to match Point(1, 2, 3) - however, I would expect Point(1, ...) or Point(1, _, _) to match.

from patma.

viridia avatar viridia commented on May 26, 2024

Also, as far as name matching goes: EIBTI. I still favor something like ?name or name? for variables, even though it's an extra character, I think clarity beats brevity in this case.

(It's interesting that in LISP, there's this concept of 'quoting' a name, so that you can refer to the name without dereferencing what it holds - much like the C++ reference operator, &. I thought about proposing a generalization of this for use with name matching, but also usable standalone; but on second thought given how many beginning programmers struggle with the concept of pointers in C, perhaps adding pointers to Python is not such a good idea. :) )

BTW one of the things I appreciated reading Ivan's PEP was the analysis of existing code patterns, specifically the statistics around usage of isinstance(). Any far-reaching change to a mature and popular language needs to be well-grounded in fact, and not just airy theorizing :)

I think the main points of divergence surround the semantics of __match__, and the dividing line between efficiency and flexibility; nor do I think we have exhausted the possible space of designs here. The VM is always going to be able to crawl through data structures much faster than end-user code, and will likely create fewer temporary objects along the way.

from patma.

gvanrossum avatar gvanrossum commented on May 26, 2024

I would not expect Point(1) to match Point(1, 2, 3)

Hm, then we definitely need to set __match_args_required__ = 3 in the Point class.

Point(1, ...) or Point(1, _, _).

I'm not a fan of using ellipses as part of pattern syntax. It's too ambiguous in docs etc. So let's use the second form.

Also, as far as name matching goes: EIBTI.

Then let me propose that the default is that a plain name is a variable binding for value extraction, while all dotted names are named constants, and if you really need to use a plain name as a named constants (e.g. example 5 in EXAMPLES.md uses none_rprimitive), you can mark it somehow.

Why mark named constants instead of bound variables? Because in 95% of the cases you'll want the latter. Match statements will be littered with things like Point(x, y) or [a, b, c] -- this is the reason why we have patterns at all -- whereas named constants without dots are relatively rare (and hopefully people will mostly use enums, which naturally have dots in their name, e.g. Color.red, or import modules, e.g. _locale.CHAR_MAX).

Also, let me propose that the marking would be a leading . -- this is unobtrusive, easy to type (no shift key needed), and extends the rule "names with a dot are constants".

As to data, example 5 is the only of the six motivating examples chosen by Brandt from real code that uses a dot-free name.

(But I'm still in favor of the lowercase/uppercase rule myself.)

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

Also, let me propose that the marking would be a leading .

I really like this idea. I was lately thinking that requiring pattern to be dotted to be considered a reference rather than target will give some push towards using enums instead of plain integer constants, which is probably a good thing. But supporting only enums would be definitely too much, so using leading . (instead of something more cryptic like $ or ^) would be a good compromise. Some pros are:

  • It is explicit, rather than implicit like using upper-case
  • It is consistent, so is easier to remember

from patma.

Tobias-Kohn avatar Tobias-Kohn commented on May 26, 2024

Also, let me propose that the marking would be a leading .

I also agree that the single dot might be a good compromise here. It also feels not too far off as similar syntax is used for imports.

In principle, however, we could also consider to have something like $, but then not as a load-marker, but rather as a proper evaluation operator similar to string interpolation (as I just described in issue #1).

from patma.

Tobias-Kohn avatar Tobias-Kohn commented on May 26, 2024

First, please let me also welcome Ivan to the team.

It was very interesting to read Ivan's draft PEP. I like the different perspective and found the analysis of use cases brilliant. However, there are a few elements with which I find myself rather disagreeing (apart from many points in which we probably all agree).

Overall, I have the impression that Ivan's proposal is primarily coming from types, ADTs, and the perspective of performance (please correct me if I am wrong, though). Many of the proposal's elements are about strictness in one way or another. For instance, performing an isinstance()-check before calling __match__, the @sealed decorator, raising an UnmatchedValue exception, or making sure that the number of positional arguments fit. For Python, however, I feel we should rather follow a much more lenient "duck typing" approach, more in line with that type annotations are never enforced by the compiler, but only give hints to type checkers.


Let me pick up a few specific and minor points (I will add more comments to the corresponding issues).

  • Indentation level: there has been quite a bit of discussion about the identation of pattern matching with many proposals preferring a mode where indentation is "saved". So, this is certainly a point we will have to discuss. To be honest, I do not see the problem with indentation here in the first place or why it is something that should be avoided. One the one hand, we have "doubly" indented code in case of classes already as well. One the other hand, I do not think that indentation here is such a big concern that it is worth giving up the existing consistency.
  • Ellipsis: one of the reasons why I really like the underscore as the "don't care" wildcard (apart from it being used in almost all other pattern matching languages) is that it is a legal name and therefore (in principle) does not require any special syntax or treatment as such. The ellipsis, on the other hand, is a legal value in Python that might be used in production code. Using the ellipsis as a marker for not caring about the value seems thus similarly problematic to me like using None to indicate the absence of a value in other cases (although that value could be None itself). Finally, the ellipsis always looks to me like there are several elements missing, whereas the underscore feels more precise.
  • as vs. case: I am not a fan of the "fewer keystrokes" argument and not convinced that in a case like this, it really is valid. That case makes it more similar to switch-statements might even be an advantage as I believe there are more Python programmers with a background in C than with a background in functional languages. However, that as is already an existing keyword and would therefore work better is a very good and valid point. Since the case (or as) would only occur as keyword in well defined positions, this is probably not that much of an issue, though. I am more worried about match there. But it is a good point to consider, nonetheless. (I personally lean more towards case, but I am open to as as well).
  • the initial literal values passed in should not be included in the return: I think that a constraint like this makes the interface rather difficult and hard to use.
  • c(): I originally had (almost) the same syntax in my proposal, allowing int() to mean that the value must be expressible as an intβ€”in addition to _: int. I think for the sake of consistency, we might want to support this.

from patma.

ilevkivskyi avatar ilevkivskyi commented on May 26, 2024

@Tobias-Kohn some of your comments are in a sense outdated, because Guido already convinced me. The updated version (that I propose to take as a starting point) is in PR #25

from patma.

Tobias-Kohn avatar Tobias-Kohn commented on May 26, 2024

Yes, I know that some of it is already outdated or really just a minor issue. But I felt your PEP deserves a proper reaction, even if it ends up being basically just a "I agree with Guido as well" :-).

from patma.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.