Comments (24)
Okay, fixed the typo in the original.
from patma.
Important note: ideas marked as rejected there are not really rejected. I just wanted to summarize all alternatives; on many of them I was 50-50.
from patma.
Btw, it is interesting how close the proposals are, it is a good sign!
from patma.
After quickly scrolling through issues, it seems to me all ideas where we diverge are in the 50-50 zone I mentioned.
A couple of exceptions:
- I think it is important to include a detailed spec in the PEP on how `__match__()` is generated for dataclasses.
- I think we should add `@typing.sealed`, see #18.
from patma.
- I like the idea of adding `sealed` (and in general thinking things through from the POV of a static type checker).
- Our proposed `__match__` API (and class pattern semantics) is so different from yours that dataclasses are very simple to support for us (see below).
Let me summarize what I see as the main differences.
- Differences around `as` vs. `case`, indentation, multiple `as` clauses per block (#2).
- Our proposal has no `else` (#6) and is a pass if nothing matches (#5). We use `case _:` instead of `else:`, and `_` is the universal "don't care" pattern.
- We have `|` for OR patterns. (We decided against AND patterns, #14.)
- We have somewhat different rules for variables (#1). This is controversial, but we ended up with a compromise where "single" names starting with a lowercase letter are variables but if a name starts with an upper case letter it's an expression; and dotted names are always expressions (solving some of your issues).
- We are adopting semantics closer to sequence unpacking, so `[a, b]` and `(a, b)` are patterns with exactly the same meaning, and they match any sequence. (I guess if you really wanted to check for a tuple you could use `tuple([a, b])`.)
- Our API for `__match__` is entirely different (#8). See below for a summary.
UPDATE: Here are some more differences:
- You write unstructured matches as `C(...)`; we just write `C()`. (See the `__match__` discussion below.)
- You bind names only after the guard succeeds. Our proposal leaves the name bound even if the guard fails (in fact it may leave some names in a pattern bound even when a later sub-pattern fails). (Our proposal is inspired by the behavior of `:=` here.)
- You propose special semantics for coinciding names, so e.g. `(x, x)` would match a pair of equal values. In our proposal this is just a user error, and to match a pair you'd write `case (x, y) if x == y:`.
- You have a special syntax for one-off matches. We should discuss this separately.
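The `:=` precedent mentioned in the list above can be checked in Python 3.8+: a name assigned inside a condition stays bound even when the condition turns out false. A minimal illustration, with variable names made up for this example; it shows only the walrus behavior, not the proposed match statement:

```python
values = [3, 4]

# The walrus assignment runs before the comparison is evaluated,
# so `first` is bound even though the test fails.
if (first := values[0]) > 10:
    size = "big"
else:
    size = "small"

print(first, size)
```

By analogy, a name captured by a pattern would stay bound after its guard fails.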
About the `__match__` method
A pattern of the form `C(<whatever>)` calls `C.__match__(target)` (where `target` is the value from `match target:`), which should return an object whose attributes are used to match `<whatever>`, or None if there's no match. The `__match__()` method is responsible for checking `isinstance()`, so that it can be used for duck typing.
In the general case `__match__()` can return some proxy that does attribute mapping and filtering. In many cases this can just return `target` unchanged. There's one magic attribute that the returned proxy may have, `__match_args__`, used for positional argument matching (see below). Usually `__match_args__` is a class attribute.
The built-in pattern matching machinery is responsible for processing the return value of `__match__()`:
- If the pattern has the form `C()`, a non-None return from `__match__()` means success.
- If the pattern uses keyword args, those are looked up as attributes on the proxy returned by `__match__()`. For example, `Point(x=x, y=0)` matches points whose `y` coordinate is zero, and binds `x` to the point's `x` coordinate.
- If the pattern uses positional arguments, they are mapped to names using the `__match_args__` attribute of the proxy.
- For dataclasses we can have `__match__()` return `self` (after an `isinstance()` check) and have a class attribute `__match_args__` equal to `list(C.__annotations__)`.
- Not all positional arguments need to be given in the pattern, so `Point(x)` means the same as `Point(x, _)`.
- Positional and keyword args may be mixed. Examples: `Point(x, y)`, `Point(x, y=y)`, `Point(y=y, x=x)` all mean the same thing.
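The rules above can be sketched in plain Python. This is only a toy model of the protocol as described in this comment, not the final PEP or any released Python behavior: `match_class` and `Bind` are hypothetical names invented here, and every sub-pattern that is not a `Bind` is treated as a literal compared with `==`.

```python
class Bind:
    """Hypothetical marker for a capture sub-pattern such as `x`."""

    def __init__(self, name):
        self.name = name


class Point:
    # Positional sub-patterns are mapped to attribute names in this order.
    __match_args__ = ["x", "y"]

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def __match__(cls, target):
        # The class does its own isinstance() check and returns a proxy
        # (here the target itself), or None when there is no match.
        return target if isinstance(target, cls) else None


def match_class(cls, target, *positional, **keywords):
    """Framework side (hypothetical): return captured names, or None on failure."""
    proxy = cls.__match__(target)
    if proxy is None:
        return None
    names = getattr(proxy, "__match_args__", [])
    if len(positional) > len(names):
        raise TypeError("too many positional sub-patterns")
    # Omitted trailing positionals are not checked: Point(x) == Point(x, _).
    wanted = dict(zip(names, positional))
    for name, sub in keywords.items():
        if name in wanted:
            raise TypeError(f"attribute {name!r} specified twice")
        wanted[name] = sub
    captured = {}
    for name, sub in wanted.items():
        if not hasattr(proxy, name):
            return None
        value = getattr(proxy, name)
        if isinstance(sub, Bind):
            captured[sub.name] = value
        elif value != sub:  # any other sub-pattern is treated as a literal
            return None
    return captured


p = Point(3, 0)
print(match_class(Point, p))                    # models Point()
print(match_class(Point, p, x=Bind("x"), y=0))  # models Point(x=x, y=0)
print(match_class(Point, p, Bind("x")))         # models Point(x)
print(match_class(Point, "not a point"))        # isinstance() check fails
```

Note that `Point.__match__` only decides whether the target is acceptable and which object exposes the attributes; all sub-pattern handling stays in the framework function.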
from patma.
> Our API for `__match__` is entirely different

It is actually not that different. The general logic is like this: the interpreter asks an object: "Can you represent/unpack yourself as something like this?" and the object answers "yes/no", and in the "yes" case how exactly. The details of how the answer is represented, as a custom object or as a tuple + dictionary, are less important. What is more important here is how much context you give to the object. There are two extremes here:
- One is IIUC your proposal: give no context at all, so essentially in your case the interpreter just asks "How can you represent/unpack yourself?"
- The other extreme is in rejected ideas in my draft: give the full nested pattern object, and ask the object "Can you represent/unpack yourself as this nested structure?"
I think both extremes are sub-optimal. For the second there are some arguments in the draft, for the first extreme I think the issue is that it is not flexible enough. I think the best option is somewhere in the middle, so that you ask an object "can you represent/unpack yourself like this: two things and z=third thing?"
In my proposal I pass the literal patterns as part of the context, but maybe it is still too much context. I think we should at least pass this data to `__match__()`:
- How many positional matches were requested
- Which keys were requested
- Are there `*` and/or `**`

One of the motivations for this is that I believe `__match__()` should raise an exception for impossible or ambiguous matches, and the interpreter can't detect this because it doesn't know the mapping between the named and positional representations of an object. In particular I think `Point3D(1, 2)` should raise an exception because it is ambiguous between `Point(1, 2, _)` and `Point(_, 1, 2)`. Also `Point2D(1, 2, y=1)` should raise an exception; it can hide a hard-to-debug error where a different class was intended (it should probably be `Point3D` in the last example).
from patma.
Forgot about one thing: moving `isinstance()` to `__match__` is definitely a very good idea, +1 on this.
About other things:
- I don't have strong attachment to coinciding names, exception is also a good option (you see I like exceptions :-))
- Uppercase names as constants are probably fine. I was also thinking of this, but was stopped by the fact that there are no precedents in Python where name case influences semantics.
- `as` vs `case`, I really like `as` more, but it may be personal.
- +1 on name binding vs guards, let's be consistent with existing precedents.
- `(a, b)` vs `[a, b]` probably also OK, it is good to be consistent with existing logic.
- I was actually on the fence about allowing `|`, so I am fine with this. Note this may complicate the spec quite a bit, e.g. do we allow `name | other_name`, can one apply `|` to named sub-matches etc.
- No `else:`, I am also +1
from patma.
One more deviation you forgot: what to do if all patterns fail? I propose to raise an exception. The main motivation is that most cases I have seen are either ones where `if match` would work, or ones where I would want an exception. So whether to raise an exception or not may depend on whether we include the `if match` syntax.
from patma.
About `__match__`:
We considered various APIs for `__match__` but they were all pretty complex -- like yours, which has positional args, keyword args, `*args`, and `**kwds`. And most classes will only care about the first two.
Regarding your specific example, I don't see the ambiguity at all. Clearly `Point3D(1, 2)` intends to leave additional arguments off the end, not off the beginning. This is just like for regular calls -- you can leave arguments off the end (assuming they have default values) but not off the beginning (`range(10)` notwithstanding :-).
The matching of `Point2D(1, 2, y=1)` is first translated (at runtime, given `Point2D.__match_args__ == ['x', 'y']`) to `Point2D(x=1, y=2, y=1)` and the duplicate for `y` can be detected.
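That translation step could look roughly like the following. This is a sketch under assumptions: `to_keywords` is a hypothetical helper, and keyword sub-patterns are passed as a list of `(name, sub_pattern)` pairs so that a repeated name remains visible.

```python
def to_keywords(match_args, positional, keywords):
    """Map positional sub-patterns to attribute names, rejecting duplicates."""
    if len(positional) > len(match_args):
        raise TypeError("too many positional sub-patterns")
    # Point2D(1, 2, y=1) becomes [("x", 1), ("y", 2), ("y", 1)].
    pairs = list(zip(match_args, positional)) + list(keywords)
    seen = set()
    for name, _ in pairs:
        if name in seen:
            raise TypeError(f"multiple sub-patterns for attribute {name!r}")
        seen.add(name)
    return dict(pairs)


match_args = ["x", "y"]  # stands in for Point2D.__match_args__
print(to_keywords(match_args, [1], [("y", 2)]))  # models Point2D(1, y=2)
try:
    to_keywords(match_args, [1, 2], [("y", 1)])  # models Point2D(1, 2, y=1)
except TypeError as exc:
    print("rejected:", exc)
```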
Do we at least agree that `Point3D(1)` is valid, and matches all 3D points whose x coordinate equals 1?
Apart from the complex signature, there's also the fact that in your proposal, every class must implement quite a bit of `__match__` machinery (unless it needs nothing beyond the default `object.__match__`). In our proposal all the machinery of matching (including matching sub-patterns, or e.g. `0|1|2`) is handled by the framework, and the `__match__` implementation can focus on telling the framework which attributes exist and in which order. A proxy doesn't have to do any work unless a specific attribute is requested.
from patma.
> This is just like for regular calls -- you can leave arguments off the end (assuming they have default values)

Yes, but what if all three arguments are required? I mean if one literally defines:

    @dataclass
    class Point3D:
        x: int
        y: int
        z: int

To me `Point3D(1)` looks a bit weird in such a context. Of course the object can just return a bit more context to the interpreter, something like `__required_match_args__` that would be an integer. And then the interpreter can decide.
In general I agree that we should put as much burden on the interpreter as we can here. But give it the chance to detect all possible ambiguous/impossible matches.
from patma.
- > what to do in case if all patterns fail?

  See #5. We discussed this quite a bit; in the end @Tobias-Kohn convinced us that it should follow the lead of `if ... elif ... elif ...`. But I can see that from a static checker's POV it's nicer if we fail hard here. It'll probably remain a point of discussion in the final PEP...
- `as` vs. `case`: I can live with `as`. I agree that the extra level of indentation is tedious, but it's how every other compound statement in Python is done. Allowing multiple `as` clauses without intervening blocks feels like a syntax error (compare an `if` with an empty block -- we require `pass` to force the user to think about it), but if we decide to forgo the extra indentation I could live with it.
- `x|y` -- we can specify that the arguments to `|` must all be constants, or we can specify that each of the arms must bind the same set of variables. The latter does not feel hard to check.
- `Point3D` with three required attributes: Even if you require all in the constructor, you might still have uses for omitting some. Certainly when using keywords it must be possible to leave some out. Consider a class `User` with mandatory `id` and `name` attributes -- I should be able to match `User(id=42)` or `User(name='drew')`. So I think it should be okay for positionals too.
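To illustrate why the second option for `|` (every arm binds the same set of variables) is easy to check, here is a toy sketch. The pattern representation is an assumption made for this example only: `Var` marks a capture, tuples and lists are sequence patterns, and anything else is a literal.

```python
class Var:
    """Hypothetical capture pattern."""

    def __init__(self, name):
        self.name = name


def bound_names(pattern):
    """Collect the names a toy pattern would bind."""
    if isinstance(pattern, Var):
        return {pattern.name}
    if isinstance(pattern, (tuple, list)):
        names = set()
        for sub in pattern:
            names |= bound_names(sub)
        return names
    return set()  # literals bind nothing


def check_or_pattern(*arms):
    """Raise if the arms of `a | b | ...` do not bind the same names."""
    expected = bound_names(arms[0])
    for arm in arms[1:]:
        if bound_names(arm) != expected:
            raise SyntaxError("alternatives bind different names")
    return expected


print(check_or_pattern(0, 1, 2))                         # models 0|1|2
print(check_or_pattern((Var("x"), 0), (0, Var("x"))))    # models (x, 0) | (0, x)
try:
    check_or_pattern((Var("x"), 0), (0, Var("y")))       # models (x, 0) | (0, y)
except SyntaxError as exc:
    print("rejected:", exc)
```

The "all constants" option is the special case where every arm binds the empty set.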
from patma.
> I don't have strong attachment to coinciding names, exception is also a good option (you see I like exceptions :-))

Compromise: we could specify that a pattern is only allowed to bind a name once (or only once in each branch of `|`, if we go that route). The "match two equal values" use case for `(x, x)` feels pretty theoretical, and it looks like it would require a somewhat complex implementation (essentially translating to `(x, _x) if x == _x`).
from patma.
> Allowing multiple `as` clauses without intervening blocks feels like a syntax error

OK, let's drop it. It is not really needed if we have `|`.

> we can specify that the arguments to `|` must all be constants [...]

Potentially we can also allow `str() | int()` as a shorthand for `isinstance(..., (int, str))`. So I would rather vote for the second option "each of the arms must bind the same set of variables".

> So I think it should be okay for positionals too.

I totally agree about the keyword args, but not so much about positional. I think we should give the implementer some opportunity to add some strictness here, like `__required_match_args__`; it is fine not to set it, and then `Point3D(1)` will be accepted. Anyway, you at least convinced me that your approach is extensible in the direction I want (i.e. `__required_match_args__` can be added later without breaking backwards compatibility).
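One way such an opt-in check might work is sketched below. The thread does not pin down the semantics, so this rests on assumptions: `check_positional_count` is a hypothetical framework helper, an unset `__required_match_args__` means no minimum, and the minimum applies only when at least one positional sub-pattern is given, so that `C()` and keyword-only patterns stay valid.

```python
class Point3D:
    __match_args__ = ["x", "y", "z"]
    __required_match_args__ = 3  # opt-in strictness


class User:
    __match_args__ = ["id", "name"]  # no minimum set


def check_positional_count(cls, n_positional):
    """Raise TypeError if the number of positional sub-patterns is not allowed."""
    match_args = getattr(cls, "__match_args__", [])
    required = getattr(cls, "__required_match_args__", 0)
    if n_positional > len(match_args):
        raise TypeError("too many positional sub-patterns")
    # Assumption: the minimum is enforced only once positionals are used.
    if 0 < n_positional < required:
        raise TypeError(
            f"{cls.__name__} requires {required} positional sub-patterns, "
            f"got {n_positional}"
        )


check_positional_count(User, 1)     # accepted: no minimum declared
check_positional_count(Point3D, 3)  # accepted: all three given
check_positional_count(Point3D, 0)  # accepted: models Point3D() or keywords only
try:
    check_positional_count(Point3D, 1)  # models Point3D(1)
except TypeError as exc:
    print("rejected:", exc)
```

A class that never sets the attribute keeps the lenient behavior, which is what makes the addition backwards compatible.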
from patma.
- > `str() | int()`

  Seems nice, happy to go this way.
- > `__required_match_args__`

  I can live with this. (Perhaps rename to `__match_args_required__` so they all sort together? There's a bunch of bikeshedding to do here about the precise API.)
from patma.
> Uppercase names as constants are probably fine. I was also thinking of this, but was stopped by the fact that there are precedents in Python where name case influences semantics.

Probably you meant "there are no precedents"?
from patma.
> Perhaps rename to `__match_args_required__` so they all sort together?

> Probably you meant "there are no precedents"?

Oops, sorry, yes that was a bad typo :-)
from patma.
I would not expect `Point(1)` to match `Point(1, 2, 3)` - however, I would expect `Point(1, ...)` or `Point(1, _, _)` to match.
from patma.
Also, as far as name matching goes: EIBTI. I still favor something like `?name` or `name?` for variables; even though it's an extra character, I think clarity beats brevity in this case.
(It's interesting that in LISP, there's this concept of 'quoting' a name, so that you can refer to the name without dereferencing what it holds - much like the C++ reference operator, `&`. I thought about proposing a generalization of this for use with name matching, but also usable standalone; but on second thought given how many beginning programmers struggle with the concept of pointers in C, perhaps adding pointers to Python is not such a good idea. :) )
BTW one of the things I appreciated reading Ivan's PEP was the analysis of existing code patterns, specifically the statistics around usage of isinstance(). Any far-reaching change to a mature and popular language needs to be well-grounded in fact, and not just airy theorizing :)
I think the main points of divergence surround the semantics of `__match__`, and the dividing line between efficiency and flexibility; nor do I think we have exhausted the possible space of designs here. The VM is always going to be able to crawl through data structures much faster than end-user code, and will likely create fewer temporary objects along the way.
from patma.
> I would not expect `Point(1)` to match `Point(1, 2, 3)`

Hm, then we definitely need to set `__match_args_required__ = 3` in the `Point` class.

> `Point(1, ...)` or `Point(1, _, _)`.

I'm not a fan of using ellipses as part of pattern syntax. It's too ambiguous in docs etc. So let's use the second form.
> Also, as far as name matching goes: EIBTI.

Then let me propose that the default is that a plain name is a variable binding for value extraction, while all dotted names are named constants, and if you really need to use a plain name as a named constant (e.g. example 5 in EXAMPLES.md uses `none_rprimitive`), you can mark it somehow.
Why mark named constants instead of bound variables? Because in 95% of the cases you'll want the latter. Match statements will be littered with things like `Point(x, y)` or `[a, b, c]` -- this is the reason why we have patterns at all -- whereas named constants without dots are relatively rare (and hopefully people will mostly use enums, which naturally have dots in their name, e.g. `Color.red`, or import modules, e.g. `_locale.CHAR_MAX`).
Also, let me propose that the marking would be a leading `.` -- this is unobtrusive, easy to type (no shift key needed), and extends the rule "names with a dot are constants".
As to data, example 5 is the only one of the six motivating examples chosen by Brandt from real code that uses a dot-free name.
(But I'm still in favor of the lowercase/uppercase rule myself.)
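The leading-dot rule proposed in this comment is simple enough to state as a small classifier. This is only an illustration: `classify_name` is a hypothetical helper, and treating `_` as the wildcard follows the earlier description of `_` as the universal "don't care" pattern.

```python
def classify_name(token):
    """Classify a name pattern under the proposed leading-dot rule."""
    if token == "_":
        return "wildcard"
    if "." in token:
        # Covers dotted names (Color.red) and the leading-dot marker (.name).
        return "constant"
    return "capture"


for token in ["x", "_", "Color.red", "_locale.CHAR_MAX", ".none_rprimitive"]:
    print(token, "->", classify_name(token))
```

A single test for the presence of a dot covers both cases, which is the sense in which the marker extends the rule "names with a dot are constants".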
from patma.
> Also, let me propose that the marking would be a leading `.`

I really like this idea. I was lately thinking that requiring a pattern to be dotted to be considered a reference rather than a target will give some push towards using enums instead of plain integer constants, which is probably a good thing. But supporting only enums would definitely be too much, so using a leading `.` (instead of something more cryptic like `$` or `^`) would be a good compromise. Some pros are:
- It is explicit, rather than implicit like using upper-case
- It is consistent, so is easier to remember
from patma.
> Also, let me propose that the marking would be a leading `.`

I also agree that the single dot might be a good compromise here. It also feels not too far off, as similar syntax is used for imports.
In principle, however, we could also consider having something like `$`, but then not as a load-marker, but rather as a proper evaluation operator similar to string interpolation (as I just described in issue #1).
from patma.
First, please let me also welcome Ivan to the team.
It was very interesting to read Ivan's draft PEP. I like the different perspective and found the analysis of use cases brilliant. However, there are a few elements with which I find myself rather disagreeing (apart from many points in which we probably all agree).
Overall, I have the impression that Ivan's proposal is primarily coming from types, ADTs, and the perspective of performance (please correct me if I am wrong, though). Many of the proposal's elements are about strictness in one way or another. For instance, performing an `isinstance()`-check before calling `__match__`, the `@sealed` decorator, raising an `UnmatchedValue` exception, or making sure that the number of positional arguments fits. For Python, however, I feel we should rather follow a much more lenient "duck typing" approach, more in line with the fact that type annotations are never enforced by the compiler, but only give hints to type checkers.
Let me pick up a few specific and minor points (I will add more comments to the corresponding issues).
- Indentation level: there has been quite a bit of discussion about the indentation of pattern matching with many proposals preferring a mode where indentation is "saved". So, this is certainly a point we will have to discuss. To be honest, I do not see the problem with indentation here in the first place or why it is something that should be avoided. On the one hand, we have "doubly" indented code in case of classes already as well. On the other hand, I do not think that indentation here is such a big concern that it is worth giving up the existing consistency.
- Ellipsis: one of the reasons why I really like the underscore as the "don't care" wildcard (apart from it being used in almost all other pattern matching languages) is that it is a legal name and therefore (in principle) does not require any special syntax or treatment as such. The ellipsis, on the other hand, is a legal value in Python that might be used in production code. Using the ellipsis as a marker for not caring about the value thus seems as problematic to me as using `None` to indicate the absence of a value in other cases (although that value could be `None` itself). Finally, the ellipsis always looks to me like there are several elements missing, whereas the underscore feels more precise.
- as vs. case: I am not a fan of the "fewer keystrokes" argument and not convinced that in a case like this, it really is valid. That `case` makes it more similar to `switch`-statements might even be an advantage as I believe there are more Python programmers with a background in C than with a background in functional languages. However, that `as` is already an existing keyword and would therefore work better is a very good and valid point. Since the `case` (or `as`) would only occur as keyword in well defined positions, this is probably not that much of an issue, though. I am more worried about `match` there. But it is a good point to consider, nonetheless. (I personally lean more towards `case`, but I am open to `as` as well).
- the initial literal values passed in should not be included in the return: I think that a constraint like this makes the interface rather difficult and hard to use.
- c(): I originally had (almost) the same syntax in my proposal, allowing `int()` to mean that the value must be expressible as an `int`, in addition to `_: int`. I think for the sake of consistency, we might want to support this.
from patma.
@Tobias-Kohn some of your comments are in a sense outdated, because Guido already convinced me. The updated version (that I propose to take as a starting point) is in PR #25
from patma.
Yes, I know that some of it is already outdated or really just a minor issue. But I felt your PEP deserves a proper reaction, even if it ends up being basically just an "I agree with Guido as well" :-).
from patma.