
patma's People

Contributors

bowugit, brandtbucher, carreau, delta456, fperez, gvanrossum, holdenweb, ilevkivskyi, lysnikolaou, maximsmolskiy, natelust, petersaalbrink, saeta, stefanfoulis, tobias-kohn, umcconnell, viridia, yang-wei-ting


patma's Issues

Variable length Sequence matching

The obvious syntax for this would be case [a, b, *rest] or even case [a, b, *rest, y, z]. Here rest would always receive a list, similar to how tuple unpacking works.

Should we allow arbitrary patterns on either side of '|'?

In Scala, the | pattern operator only allows constant patterns. We could follow this lead, or we could allow arbitrary patterns.

If we allow arbitrary patterns, should we require that both sides bind the same variables? I.e. should we allow this?

match target:
    case [] | [x]:
        ...

I'm inclined to require that both sides bind the same set of variables, except that _ is not considered a variable.
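
For the record, this is the rule that was eventually adopted: in CPython 3.10+ the alternatives of an or-pattern must bind the same set of names, and a violation is rejected at compile time. A quick check, assuming a 3.10+ interpreter:

```python
# Alternatives in an or-pattern must bind the same names; CPython rejects
# `case [] | [x]` with a SyntaxError at compile time.
src = """
match []:
    case [] | [x]:
        pass
"""
try:
    compile(src, "<demo>", "exec")
    print("compiled")
except SyntaxError as e:
    print("rejected:", e.msg)
```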

Opening up the repo?

I've been telling selected folks about this, and when they requested access I've let them into the repo (without the assumption that they'll become co-authors, of course, but with the hope that they'll participate in the discussion).

Should we just open up the repo? It won't attract a lot of attention unless we specifically advertise it, but I know that sometimes people just browse my GitHub user page looking for interesting repos. Hopefully the repo name doesn't give away too much. ;-)

Authors and emails

I propose the following list of authors:

Brandt Bucher, Tobias Kohn, Ivan Levkivskyi, Guido van Rossum, Talin

@Tobias-Kohn What email do you want me to use? (For everyone else I'll look it up in the PEP index.)

'continue' and 'break' in case blocks

@brandtbucher proposed to support continue in case blocks, with the semantics of skipping forward to the next case. This could be used instead of guards (#3). E.g.

match [randint(1, 6), randint(1, 6)]:
    case [a, b]:
        if a != b: continue
        print("Pair:", a)
    ...

If we were to support this we could also support break, which would exit the match statement.

Note that continue could also be used to emulate C's "fall-through" behavior (which should definitely not be the default):

match x:
    case Foo(a, b): continue
    case Bar(a, b): print("Either a Foo or a Bar", a, b)

Should Sequence patterns allow both (x, y) and [x, y]?

Sequence patterns aren't quite the same as sequence unpacking assignments (the behavior for iterators and strings is different), but nevertheless we may follow their lead and allow both (x, y) and [x, y].

Or perhaps not: it may be somewhat confusing that case (x, y) matches any sequence, not just tuples.

Distinguishing Loads and Stores

@brandtbucher:

...one could write from (x): if there are no captures, or from (x) -> (y, z): to have every occurrence of y or z in a match be a value capture.

@gvanrossum:

This looks verbose to me, and is likely to cause surprises when a new case is added but the new extraction variable isn’t added to the header.

I agree it's a bit verbose, but it seems like it clears up a lot of headaches for both the user and the compiler. Example 5, rewritten in the prevailing keyword flavor:

match rtype as (a, b):
    case RUnion(items=[none_rprimitive, a]):
        return a
    case RUnion(items=[b, none_rprimitive]):
        return b
    else:
        return None

Otherwise, to me, it's not entirely clear where none_rprimitive, a, b, and maybe even RUnion are coming from / going to. Likely, this ambiguity would need to be resolved in the eval loop (like LOAD_NAME does), which is unfortunate. Here, everything is known at compile-time.

(This could open the door to capturing types like RUnion as well... but that's a separate discussion).

My opinion is that the headaches a construct like this saves vastly outnumber those due to mistakes like the one you mentioned, which a linter could probably catch in most cases.

Sample programs for match statement

I have been trying to think of an example program that I could write once the reference implementation is at a point where the match statement is actually usable. It would have to be short, make use of match in multiple places, and do something sensible.

I don't want to do something like a full compiler because it would be too big, nor a text adventure because it would need a lot of non-match code.

So far the best idea I have come up with is a program to do simple algebraic manipulations like simplification and differentiation. I'd have a simple operator-precedence parser that converts an input equation into an AST - whether the AST consists of actual objects, or just nested tuples is TBD. The rest is just pattern matching and transformation. Printing out the result would also be performed by match.

Any other ideas?

Allow missing __match_args__ to accept a single positional arg

It is useful to be able to match against both the type and the value of a target. Currently, the only way to do this is to manually assign self to some attribute on the returned proxy, or use a guard.

How about this: if __match_args__ is None or missing from the proxy (as it is from object, by default), we allow a single positional argument that is matched against the proxy itself (i.e., tested for equality). As a result, all of these examples will just work by default:

from numbers import Real

specials = {"spam", "eggs", "cheese"}

match x:
    case bool(True):  # Doesn't match 1 or 1.0.
        ...
    case Real(42):
        ...
    case set(.specials):  # Doesn't match frozensets.
        ...

Currently, they must be spelled as:

match x:
    case bool(real=1):
        ...
    case Real() if x == 42:
        ...
    case set() if x == specials:
        ...

If a proxy wants to disable this behavior, it should just assign __match_args__ = [].

Guards

A "guard" is a Boolean expression which can be combined with a case. For example, you could write

match [randint(1, 6), randint(1, 6)]:
    case [a, b] if a == b: print("Pair:", a)
    ...

It's important that the extraction variables (a and b here) are bound before the guard is evaluated.

An alternative (due to @brandtbucher) is to use continue as a way to continue checking cases from inside the block:

match [randint(1, 6), randint(1, 6)]:
    case [a, b]:
        if a != b: continue
        print("Pair:", a)
    ...

Personally I think guards are very good value and easily understood, so I don't see why we shouldn't have them. (But we could also implement continue. See #9.)

Meta-issue: how to mark discussion status per issue

It would be nice if one could see at a glance which items are still up for discussion and which have been decided, and how that decision ended up (e.g. accepted, rejected, open, postponed).

We could for example close rejected issues, or we could keep them open but add a "Rejected" label, or we could add a notation like "[REJ]" or some Unicode character or emoji to the subject line.

I think that for each issue we should have some text in the PEP -- either in the main specification, or in the "Open Issues" section, or under "Rejected Ideas". Maybe we should also encode in whatever form of annotation we choose whether a good summary of the idea (whether accepted or not) has been included in the PEP draft.

There may also be a fourth section, "Future Extensions", listing ideas we chose not to adopt currently but which are reasonable extensions in the future. Note that one-way-door issues typically must be decided before the PEP is accepted, but rejected two-way-door issues may be placed in this section. For example, whether to use case or as is a one-way-door (we must pick one, and we can't change our mind after the code has been released), but whether to add the "one-off" syntactic shortcut is a two-way-door -- if we decided to skip this for now we could always add it later. (Of course, if we do implement it now we can't take it back later.)

Consider one-off match syntax

In @ilevkivskyi's proposal there's an interesting shorthand for a match with one case only:

if match <target> as <pattern> [and <guard>]:
    <block>

This would be a shorthand for (using Ivan's syntax):

match <target>:
    case <pattern> [if <guard>]:
        <block>
    else:
        pass

This is nice and compact. (And it is not a special case of an if statement -- it's a special case of a match statement.)

Question: Should it allow an else clause? (Ivan's PEP doesn't, but it makes some sense). I could imagine allowing

if match <target> as <pattern> [and <guard>]:
    <block>
else:
    <other-block>

Type annotations or not?

We could support type annotations in patterns, like so:

match x:
    case [a: int, b: str]: print(f"An int {a} and a string {b}")
    case [a: int, b: int, c: int]: print("Three ints", a, b, c)
    ...

However this makes the syntax for patterns fundamentally different from that of expressions. Also, it allows multiple ways of saying the same thing, since presumably we could also write

match x:
    case [int(a), str(b)]: print(f"An int {a} and a string {b}")
    case [int(a), int(b), int(c)]: print("Three ints", a, b, c)
    ...

or:

match x:
    case [a := int(), b := str()]: print(f"An int {a} and a string {b}")
    case [a := int(), b := int(), c := int()]: print("Three ints", a, b, c)
    ...

I still like the clarity of [a: int, b: str] over the other variations.

Another downside would be that you need extra parentheses if you're matching a single value of a given type (not a sequence of length 1):

match x:
    case str(a): ...

or

match x:
    case a := str(): ...

while with an annotation you need parentheses:

match x:
    case (a: str): ...

This becomes especially bothersome inside other patterns, e.g.

match x:
    case Foo(arg1=(a: str), arg2=(b: str)): ...

compared to

match x:
    case Foo(arg1=str(a), arg2=str(b)): ...

Match use cases and design patterns

This isn't really an issue for the PEP or the implementation, but it is something that I have been thinking about related to matching: in terms of best practices, when should match be used?

Thesis: the introduction of match statements into mainstream languages like Rust and Swift represents, IMAO, a general shift away from the strict, dogmatic OOP philosophy of the 1990s that I outlined in my essay, The Rise And Fall of Object-Oriented Programming.

I've seen evidence of a rejection of OOP in a number of engineering contexts; most recently I've learned about the shift away from OOP in high-performance game programming in favor of something called Entity Component Systems.

I believe that match statements are in alignment with this trend, in that matching helps externalize the 'business logic' of operations on complex object taxonomies. Instead of having a dedicated virtual method for each potential operation on a polymorphic set of classes, the logic for dispatching is moved outside of the object and instead placed in the match statement. This offers a number of advantages over traditional OOP vtables:

  • It allows 'ad-hoc' operations on objects without polluting the base class with a bunch of extra methods that are only needed in exotic situations.
  • It recognizes that many kinds of operations represent the interaction between two classes, and don't properly 'belong' to either - for example, if I shoot your game character with a bullet, is the act of applying damage properly a method of the bullet or of the target? The answer is 'neither', and the decision to embed the logic inside one or the other is arbitrary - leading to complexification of what should normally be a simple data structure.
  • It allows other kinds of data - non-objects such as lists, maps and tuples - to participate in the polymorphic dispatch mechanism. Forcing the programmer to create a named class for even simple data types is not always the best approach - there are many cases where the algorithm is more important than the data structure, and excessive naming becomes a distraction rather than a help.
  • In some languages, the match design pattern may allow increased performance and concurrency, although these benefits probably aren't applicable in Python.

The poster-child example here is writing a compiler: basically what compilers do (other than parsing) is traverse complex graphs of heterogeneous objects (ASTs and such). Having to embed the logic for the various compiler passes into the ASTs themselves leads to terrible architecture (I speak from personal trauma) that is both bloated and hard to test. Visitor patterns are the tool of choice here, and the match statement makes it much easier to throw together a visitor on the fly without a bunch of architectural infrastructure supporting it.

Other examples of application domains that feature deep, heterogeneous object graphs include game worlds, planning simulations, and rich document formats such as HTML and Office Open XML.

However, there are also some cases where match statements are not an obvious good fit, even when dealing with heterogeneous data. An example is cases where the data types are statically known. For example, a 3D game object may be assembled at runtime from many attributes/sub-objects (physics, shape, color, material, behavior), but these attributes are not polymorphic and can be determined without dynamic dispatch.

Also, I would still favor traditional OOP approaches to things like user interface libraries. The reason for this, I think, is because no matter how exotic and strange the set of widgets you are dealing with, the set of interactions and events applied to those widgets is fairly standardized - it's mainly just the set of environmental signals that can be produced by the user's input actions and/or the operating system, and that set tends not to expand irrespective of the application author's ingenuity. So the interaction interface - the set of valid methods - is fairly well-defined.

Another characteristic of UIs is that the class hierarchy closely models our intuitions about different kinds of user interaction patterns. It's not like we're trying to model highway traffic or the US Senate, where the classes are only a blurry approximation of reality.

Another advantage of traditional OOP is that vtable dispatch is generally going to be more performant than sequential testing of type tags, since it's just a pointer indirection.

One reason for writing this is to identify some common use cases that could be used as examples; unfortunately none of the application types I have mentioned (compilers, games) are small enough to be useful as examples. Those folks who have looked at a lot of code containing calls to isinstance might be better able to supply some suggestions for this.

User-defined patterns

Since pattern matching is runtime behavior anyway, I wonder if we could have a way for users to construct their own patterns (without having to use exec()). For example:

P1 = ConstantPattern(int(input()))
x = int(input())
match x:
    case P1: print("A match!")
    case _: print("Alas")

There could be various constructors for pattern objects that could be dynamically created and combined.

I'm not sure how to handle value extractions -- maybe case P1(x) would assign an extracted value to x? The value extracted would be up to the object P1.

But now it looks just like Object matching (case Point(x, y, 0) etc.). So maybe there's nothing we have to do? A pattern object would just be a class instance with an appropriate __match__ method. The first example would have to be changed to case P1():.

Allow binary +/- for complex numbers?

We already allow a single unary - for negation of numbers. Should we also allow binary + if the LHS is real and the RHS is imaginary, so patterns like 1+2j are possible?

I think yes.

Do walruses only bind on a successful match?

I've just added named pattern (x := <pattern>) support to the reference implementation.

While working on this, I realized that the PEP is a bit vague. Should we always bind the value to x, or only on a successful match? I believe the intent is to bind only on success (and this is how it's implemented), but I wonder if others disagree.

Add syntax for multiple-dispatch?

(Adapted from a comment in #26.)

def check(stuff):
    match stuff:
        case [x, .x, .x]:
            return f"Three of a kind: {x}!"
        case [x, .x, _] | [x, _, .x] | [_, x, .x]:
            return f"Pair: {x}!"
        case [_, _, _]:
            return "Junk!"
        case _:
            raise TypeError("Nice try!")

Looking at the example above, I wonder if there is some way we could omit the match stuff: line and unindent. So case blocks starting at function-scope just use the named arguments by default, as a form of multi-dispatch:

def check(stuff):
    case [x, .x, .x]:
        return f"Three of a kind: {x}!"
    case [x, .x, _] | [x, _, .x] | [_, x, .x]:
        return f"Pair: {x}!"
    case [_, _, _]:
        return "Junk!"
    case _:
        raise TypeError("Nice try!")

@Tobias-Kohn commented:

The proposal to omit the match stuff: line seems to remove a single line at the expense of much added complexity in the grammar and compiler. We would also have to make sure that we do not inadvertently open the door to some strange corner cases.

On the flip side, multi dispatch has been on the wish list of a couple of people for quite some time. But they would then probably want to go at least one step further and do proper multi dispatch, arriving at something like:

def fact:
    case 0 | 1:
        return 1
    case int(n) if n > 0:
        return n * fact(n-1)

And now we are back to discussing some fundamental issues: clearly, calling fact(1.5) has no matching implementation. Is this a TypeError because of incompatible arguments, or do we just silently return None, since we decided that pattern matching does not need to be exhaustive?

Having multi-dispatch sure would be a great thing, but I think it is rather an entirely new discussion and not something that we should just bring in as an aside to pattern matching.

I agree that it's probably out-of-scope for the PEP. However, I think we should at least anticipate the request and list it as a deferred idea. If we add some commentary, we could probably help focus the resulting discussion as well.

Range matching patterns?

Some languages have range cases, so you could write e.g. case 1...6: print("Dice"). This seems tricky, and quite specialized. Let's not do this?

What should the pattern [x, x] mean?

There's disagreement over whether this checks for a tuple of two equal items, blindly assigns x the second item, or gets rejected statically. Please discuss.

Match variable scoping

I know that there is already a consensus on this, but I thought that the reasons for rejecting alternatives, such as case block scoping, should be written down, especially since this is an area where Python deviates from the other languages we're modeling this feature on.

What should happen if no case matches?

There are two schools of thought about this:

  • do nothing (similar to current if-elif-elif syntax without else)
  • raise an exception (I think this is common in functional languages)

I currently personally favor "do nothing" -- if people want completeness they can add a catch-all case with an assert False.

If we end up raising, what should it raise?

Allow '**_' and '_ :='?

**_ in a mapping pattern and _ := in a named pattern don't seem like they could ever make sense.

I think we should be consistent, and always ignore stores to _ in patterns. However, we should probably warn or something when a user tries to do **_ or _ :=. Thoughts?

Does underscore bind?

I know that we're special-casing the name _ to be exempt from the "same names in or-patterns" rule. Is the target still bound to _, though? The PEP doesn't say.

I'm inclined to say "no". This should be thought of as an anonymous case, and if a user wants to bind a name, they should use a "real" one.

'Mapping' destructuring

Just like case [a, b] is a Sequence destructuring, we could define case {key: value, ...} as a Mapping destructuring. Semantics:

  • The most useful would probably be to ignore extra keys, so e.g. {"a": 1, "b": 2} is a match for the pattern {"a": a}.
  • It would match anything that subclasses collections.abc.Mapping.
  • For simple cases we might also write case dict(key1=a, key2=b), (implemented by dict.__match__()) but this would probably only match actual dict instances (or dict subclass instances).

Questions:

  • Do the keys have to be constants? Else it would be a bit tricky to decide what case {k: v} would match.
  • Could the keys at least be alternatives, e.g. case {"a"|"A": a}?
  • Is there a need for extracting the remaining keys, e.g. case {"a": a, **rest}?

Defining a custom match protocol (__match__)

When we write

match x:
    case Foo(a, b, 1|2): ...

this should probably invoke a protocol on Foo, for example Foo.__match__(...).

What should be the signature of __match__()? And should there be a default type.__match__()?

@brandtbucher proposes a library of helpers in the stdlib that can be used for various cases, e.g.

  • Bool(True) or Bool(False) for truthiness checking
  • Len(num)
  • Is(something)
  • TypeIs(some_type)
  • Subclass(some_type)

Alternatively you could write all of those as guards:

match x:
    case a if a: ...
    case a if not a: ...
    case Collection(a) if len(a) == num: ...
    case a if a is something: ...
    case a if type(a) is some_type: ...
    case a := some_type(): ...

Meta: next steps

I'd like to talk about what are the next steps towards getting this PEP adopted.

Here's the current status as I see it:

  • Most of the major issues have converged to a consensus. There are a few remaining bikeshed issues outstanding.
  • Work has begun on a reference implementation.

Here's some things that need to happen, in no particular order:

  • A few issues are not yet labeled. I have labeled as "open" the ones that do not appear to have come to a conclusion. We should label the remaining ones.
  • Get all of the remaining "open" issues into a resolved (accepted, rejected, postponed) state. Label as appropriate. (Hmmm: https://github.com/apps/polls)
  • Finish the reference implementation.
  • Post the PEP proposal to a wider audience.
    • Guido mentioned to me that he doesn't want to open up this discussion until after 3.9 is in the can, due to the potential distraction that would likely result.
  • Get an officially assigned PEP number for this.
  • Gather public feedback and evangelize.
  • Get steering council approval
  • Create or gather more code examples using the new feature.
  • Write Sphinx docs for the Python language manual (that is something I can help out with, but it seems premature to work on it until after the PEP has been ratified).

Anything missing from that list?

Consider allowing negative match pattern

Probably not very important, but while reading code it looks like a negative match may be useful, for example where people write something like not isinstance(x, Foo) or x.y != "bar". The general rule would be that the match succeeds if the pattern fails. Example syntax might use !:

match expr:
    as BinaryOp(left=!IntExpr(value=0)):
        ...

We may consider this case in addition to | and &.

Parametrization

Despite the immense power of pattern matching to express ideas, there is one kind of use case that is hardly captured so far: parametrized patterns. The idea is that in addition to a value to match against, a match function could also take some additional constant parameters.

The interface for __match__ in Ivan's proposal implicitly included the possibility for parametrisation, and Ivan gave an example in issue #23 with ranges. Consider the following example:

match x:
    case InRange(3, 8): ...

The InRange pattern intends to check 3 <= x <= 8. But the actual pattern matching semantics would stipulate that the above means:

wrapper = InRange.__match__(x)
if wrapper.positional[0] == 3 and wrapper.positional[1] == 8: ...

This is almost impossible for InRange.__match__ to get right (even though Ivan has found some nice work around ^_^).

An example from the literature cites regular expressions as a motivator for this, i.e. something like:

match x:
    case RegEx("[0-9]*"): 
        print("Looks like a number to me")

Again, just given the (string) input x, it is impossible for RegEx.__match__ to create the correct pattern to be checked against "[0-9]*" as given here.

The truth is that both of these are not applications of plain pattern matching as such, but rather instances of parametrised pattern matching, where 3 and 8, or "[0-9]*" are actually parameters (and not value tests).

As mentioned before, Ivan's PEP (see issue #19) proposed to pass such values on to the __match__ function, anyway, thereby allowing it to work with these matches.

In F#, they basically allow __match__ to return a function and would then express the above as something of the form (once again, the syntax is adapted rather freely):

match x:
    case (3, 8) => InRange(): ...

or

match x:
    case "[0-9]*" => RegEx(): ...

I think, although powerful, Ivan's semantics does not fully do justice to the difference between value matched and parameters, as they are kind of intermingled. But the idea of returning a function might actually work, albeit with different syntax than in F#, perhaps like so:

match x:
    case InRange(3, 8)(): ...

or

match x:
    case RegEx("[0-9]*")(): ...

I think this is very similar to how decorators work and as such would have a precedent in Python. Moreover, parametrized patterns would make it substantially more versatile and powerful and is therefore something worth considering.

Should we allow `*rest` in a class pattern?

The PEP currently allows Rectangle(*coordinates, painted=True), but does not specify semantics.

Is coordinates here an expression that provides constants to match some positional arguments? Or is it a name pattern that sets variable coordinates to all the positional arguments (which can be found via __match_args__)?

I think @ilevkivskyi probably meant the latter, but I misread it as the former. Maybe we're better off without it? That would also strengthen the argument against allowing **rest here -- if we have *args, why not **kwds? **kwds is hard to compute since we would have to enumerate all the attributes of the proxy returned by __match__(), and I don't expect it to be a common use case. For that matter I don't expect *args to be commonly needed either.

A utilities module for pattern matching

This discussion was started in #8, but now that issue is mostly about __match__() API. So I would like to open a separate issue about providing some utilities/helpers for pattern matching. There are two questions:

  • Should we add a new stdlib module or add to existing one?
  • What should go into that module?

For the first one I think it is better to have a new module; I don't see a good fit among the existing ones. For the second, I think we should probably add a helper wrapper class for constructing match proxies, something like MatchWrapper. The class would use __getattr__() and allow hiding/adding attributes for match purposes, setting __match_args__, and maybe something else. For example:

from patterns import MatchWrapper

class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y
        self._processed = False

    @classmethod
    def __match__(cls, obj):
        if isinstance(obj, cls):
            return MatchWrapper(obj, ['x', 'y'], coordinates=[obj.x, obj.y])

p = Point(1, 2)
match p:
    as Point(x, y):  # Works
        ...
    as Point(coordinates=[1, 2]):  # Also works
        ...
    as Point(_processed=False):  # Not included in allowed attributes, raises
        ...

Also it would be useful to have some custom patterns (as initially proposed in #8). I would like to have these two there: Len(x) and InRange(x, y). The latter may be tricky with the current "pattern-agnostic" API.

I would propose to gather more ideas about possible predefined patterns here.

Sealed classes to facilitate ADT pattern

I propose to add a @typing.sealed class decorator that would do nothing at runtime, but would indicate to static type checkers that all subclasses must be defined in the same module. This allows static checkers to interpret such a class as a union of all its children, so they can do static exhaustiveness checks. Effectively, sealed classes together with dataclasses would provide nice support for the algebraic data types pattern. See this (rather long) example:

from dataclasses import dataclass
from typing import sealed

@sealed
class Node:
    ...

class Expression(Node):
    ...

class Statement(Node):
    ...

@dataclass
class Name(Expression):
    name: str

@dataclass
class Operation(Expression):
    left: Expression
    op: str
    right: Expression

@dataclass
class Assignment(Statement):
    target: str
    value: Expression

@dataclass
class Print(Statement):
    value: Expression

def dump(node: Node) -> str:
    match node:
        as Assignment(target, value):
            return f"{target} = {dump(value)}"
        as Print(value):
            return f"print({dump(value)})"
        as Operation(left, op, right):
            return f"({dump(left)} {op} {dump(right)})"
        # Error here because not all cases were checked!

Disallow "x := _"?

x := _ can always be replaced with x, and it's clearly misguided. Should we outlaw it?

I think yes, especially since we're banning _ := x.

Related: what about chained assignments, like y := (x := None) or y := x? I think these should be fine, since they seem like the only way to bind a value to two or more names. I can imagine legitimate use cases for this to be done in the pattern itself, given our name-balancing rules.

Should a string of length N match a list pattern of that length?

For example,

match "ab":
  case [a, b]: print("Two values", a, b)
  case c: print("One value:", c)

Which branch should it take? My current prototype chooses case c. My reasoning is that many people have complained over the years that strings being iterable is a pain in various contexts, and I suspect this is one of those contexts.

Allow the implementation some flexibility when compiling patterns.

We may be able to make more powerful optimizations possible in the future if we clearly and explicitly state that there are no guarantees as to the order (or number of times) that different parts of the matching machinery are invoked on patterns, as long as the order and semantics of name bindings, guards, and bodies remain unchanged.

For example, here is a case where we could one day get away with:

  • Only checking if stuff is a length-one sequence and extracting the contained value once.
  • Only using Bar.__match__ and friends once.
  • Only using Foo.__match__ and friends once.
match stuff:
    case [Bar(100) | Foo(100)] | [Foo(42)]:
       ...
    case [Bar(42)]:
        ...

A somewhat weaker variant of this would be to allow individual patterns to be rewritten, but for no data to be shared or reordered between distinct cases. The first case in the example above would still benefit from this, but not the second.

Consider object.__match__()

From my understanding of #8 the current proposal is that there is no object.__match__() so that by default trying to match against Foo() raises an error. In my draft notes I have some spec for object.__match__() because I think it is important to provide some "out of the box" experience, so that we don't force owners of existing codebases to add dummy @matchable class decorators to dozens of their classes.

I agree that in its current version my spec is an overkill (like it uses __slots__ as __match_args__ etc). But I think there is a way to provide a minimal safe object.__match__() implementation if we make one tweak to the spec: missing attribute should raise an exception rather than fail a match.

The main argument is that failing a match would hide hard-to-debug typos. Essentially, my idea is that matching x against Foo(bar=12) should be equivalent to isinstance(x, Foo) and x.bar == 12, and the latter would raise an exception in case of a typo in the attribute name. Also, it is rare for classes to leave optional attributes unset; they are much more often set to some default value (like None). Finally, it is easy to override this behaviour by explicitly adding an attribute to __match_args__, indicating we shouldn't raise if such an attribute was requested.

Note: this would be different from matching against a mapping, where not having a key is just a failed match. But I think this is normal, classes can behave stricter than just syntax sugar for dictionaries, more like TypedDicts in typing terminology for which the set of allowed keys is fixed.

If we agree with the stricter semantics then IIUC object.__match__() would just perform an isinstance() check and return self. So that match by name will work, but positional match will raise an exception. Btw I am not sure the latter is a conclusion of #8, but I think it probably should be. If the proxy object doesn't specify __match_args__, the positional match should raise (with an error message like <class 'Foo'> doesn't support match by position) rather than fail the match.

Can we find a better name for reference patterns?

I find the term "reference patterns" confusing (somehow I thought that was a fancy name for "name patterns" :-). Can we rename these to "constant patterns"? While they can technically be used for non-constant variables, their main use case ought to be for named constants (in the wider sense of the word -- anything that could be declared "Final" would be fair game; though I don't think we should call them "final patterns" :-).

How to spell anonymous default case?

We could either use case _: ... or else: ....

In favor of else, this is what you use for if-elif-elif.

Against else:

  • We already have too many different uses of else (for-else/while-else, if-else, try-except-else).
  • You don't need it, since you can write case _: ...
  • The special meaning of _ as a pattern is useful inside nested patterns too, e.g.
match x:
    case [1, 2, _]: print("Starts with 1, 2")
    ...

Reference Implementation

Although the PEP currently defers it, I think this idea is just mature enough to start work on designing and testing a CPython implementation. In addition to a head start (if we're really targeting 3.10, that's only a year to get this working), this will hopefully also inform some of the more technical aspects of the proposal.

I'm comfortable working on everything in the compilation pipeline from the AST through the VM. So if others agree that this is a good idea, we'd need at least one parser guru on-board to make any real headway. πŸ˜‰

Should we use 'case' or 'as'?

We've been using case, like this:

match target:
    case Point(x, y):
        ...
    case Rectangle(x0, y0, x1, y1):
        ...

@ilevkivskyi proposes as, like this:

match target:
    as Point(x, y):
        ...
    as Rectangle(x0, y0, x1, y1):
        ...

Let's decide.

Should we allow unpacking raw iterators?

For example,

def throw(n=1):
    for i in range(n):
        yield randint(1, 6)

match throw(2):
    case [1, 1]: print("Snake eyes")
    case [4, 2] | [2, 4]: print("Easy six")
    case x: print("Some throw:", list(x))

We sure would want the last case to print two numbers. But that would require some kind of buffering of the match values.

Less INDENT/DEDENT?

Arguably, case statements should not be indented under the match header:

  1. Other "decision statements" - if/elif/else and try/except - line up vertically. Code like the following from asyncio closely resembles a match statement (the except clauses dispatch on type, much as case clauses would dispatch via isinstance), but of course it is not indented:
        try:
            nbytes = sock.recv_into(buf)
        except (BlockingIOError, InterruptedError):
            return  # try again next time
        except (SystemExit, KeyboardInterrupt):
            raise
        except BaseException as exc:
            fut.set_exception(exc)
        else:
            fut.set_result(nbytes)
  2. Additional indentation in decision statements is a visual indication of complexity; an indented match statement would break this assumption. And of course Python developers, myself included, frequently try to avoid increasing indentation by returning early.

  3. Minimization of diffs is a good thing. This is especially true when we consider that we want to encourage refactoring, at the very least in the stdlib - or the occasional need to run git blame. Compare the following:

2,3c2,4
<     if isinstance(value, str):
<         self.simple_element("string", value)
---
>     match value:
>         case str():
>             self.simple_element("string", value)
5,6c6,7
<     elif value is True:
<         self.simple_element("true")
---
>         case True:
>             self.simple_element("true")
8,9c9,10
<     elif value is False:
<         self.simple_element("false")
---
>         case False:
>             self.simple_element("false")
11,15c12,16
<     elif isinstance(value, int):
<         if -1 << 63 <= value < 1 << 64:
<             self.simple_element("integer", "%d" % value)
<         else:
<             raise OverflowError(value)
---
>         case int():
>             if -1 << 63 <= value < 1 << 64:
>                 self.simple_element("integer", "%d" % value)
>             else:
>                 raise OverflowError(value)
17,18c18,19
<     elif isinstance(value, float):
<         self.simple_element("real", repr(value))
---
>         case float():
>             self.simple_element("real", repr(value))
20,21c21,22
<     elif isinstance(value, dict):
<         self.write_dict(value)
---
>         case dict():
>             self.write_dict(value)
23,24c24,25
<     elif isinstance(value, (bytes, bytearray)):
<         self.write_bytes(value)
---
>         case bytes() | bytearray():
>             self.write_bytes(value)
26,27c27,28
<     elif isinstance(value, datetime.datetime):
<         self.simple_element("date", _date_to_string(value))
---
>         case datetime.datetime():
>             self.simple_element("date", _date_to_string(value))
29,30c30,31
<     elif isinstance(value, (tuple, list)):
<         self.write_array(value)
---
>         case tuple() | list():
>             self.write_array(value)
32,33c33,34
<     else:
<         raise TypeError("unsupported type: %s" % type(value))
---
>         case _:
>             raise TypeError("unsupported type: %s" % type(value))

versus

2c2,3
<     if isinstance(value, str):
---
>     match value:
>     case str():
5c6
<     elif value is True:
---
>     case True:
8c9
<     elif value is False:
---
>     case False:
11c12
<     elif isinstance(value, int):
---
>     case int():
17c18
<     elif isinstance(value, float):
---
>     case float():
20c21
<     elif isinstance(value, dict):
---
>     case dict():
23c24
<     elif isinstance(value, (bytes, bytearray)):
---
>     case bytes() | bytearray():
26c27
<     elif isinstance(value, datetime.datetime):
---
>     case datetime.datetime():
29c30
<     elif isinstance(value, (tuple, list)):
---
>     case tuple() | list():
32c33
<     else:
---
>     case _:

Looking at the evolution of the codebase with the latter diff, we are much more confident on what has happened - the new match statement has been adopted, with corresponding simplification.

I'd like a better term than 'match arm'

@ilevkivskyi's PEP currently uses "match arm" or simply "arm" for what I think of as "case". I find "arm" a rather esoteric term (for a while I thought it referred to the two sides of a | operator).

I would like to call it just "case", even if in the end we spell it using as.

Thoughts? Anyone got a better idea?

Tooling support

Assuming this PEP gets approved and adopted, it is going to have a major impact on the dozens of support tools out there: linters, IDE plugins, code formatters, syntax highlighters and so on. The introduction of a new statement type and two new "soft" keywords (match and case) will require adjustment to these tools.

Although I don't think this group needs to do anything about this or make any actual decisions, I think it's worth noting what the impacts will be.

As a way of thinking about this, I will divide the world of Python source tools into two groups, which I'll call "shallow" and "deep".

  • Shallow tools do not have a complete understanding of Python grammar, but instead scan the text looking for patterns, often using regular expressions. Examples of this kind of tool are TextMate and the Python plugin for VSCode: they understand the basic keywords, as well as indent/dedent and lines ending with a colon, but don't actually parse expressions or statement structure. This makes sense considering that the source code is frequently syntactically invalid while being edited, so a strict approach would fail most of the time.

  • Deep tools are ones that completely parse the Python grammar. An example of this is black, which we discussed in another context.

For the shallow tools, adding support for match basically consists of adding two new entries to the keyword table. Depending on how they scan the text, they may or may not have to add support for distinguishing soft keywords at the start of a line from other uses of that name.

Here's an example from the vscode-python repo: https://github.com/microsoft/vscode-python/blob/master/src/client/language/languageConfiguration.ts#L52

For the deep tools, obviously they will have to add the match statement syntax to their grammar definition, including support for soft keywords.

Matching URLs?

Apologies if what I am about to propose is too weird or beyond the pale. I'm not that sure of it myself, but I wanted to throw it out there for discussion.

A common pattern in web application frameworks (both Python and Node.js) is to dispatch to a handler method based on the incoming URL. This is often done using a decorator to associate a URL pattern with a given method:

class Handler:
  @get("/project/{id}") # Note parameter in the pattern
  def get_project(self, request):
    return db.lookup_project(request.params.id)

  @get("/account/profile")
  def get_profile(self, request):
    return db.lookup_profile(request.current_user)

Now, this works perfectly well and there's no compelling reason to improve on it, but I was curious whether the match statement was flexible enough that it could be used in a similar way.

Obviously URL paths are literals and can be matched using the literal pattern. However, if we want to do a prefix match (since that is typically how URLs are matched) we'll need a custom match pattern, but that's fairly easy to do with our proposed __match__ protocol:

match url:
  as UrlPattern("/account/profile")():
    return db.lookup_profile(request.current_user)    

More difficult is handling the parameters embedded in the patterns. The problem is that the compiler can't 'see' the variable names embedded in the strings, so there's no way for the match machinery to bind those names in the context of the match arms.

However, recall that (a) match patterns are the inverse of expressions, in the same way that scanf is the mirror of printf, and (b) there is one type of Python expression that directly supports embedding of variables (or more specifically, expressions) in strings: f-strings.

I am not proposing, of course, that the interpreter should directly recognize f-strings as match patterns - for one thing, there is a choice of algorithms to be used when matching a string - regex, greedy scanning, etc - and we wouldn't want to presuppose a particular one. But perhaps we could consider, at least, scanning the string to decide which match variable names should be bound.

To actually perform the match, the f-string would have to be wrapped in a custom matcher. The matcher would return a proxy containing the attributes that it discovered when scanning the string.

Actually, I probably wouldn't literally use f-strings for this, because there are cases where you might actually want to use an f-string as a literal. Instead, I would probably introduce a new type of string, let's call it 'i-string' for 'input string', which has the same format of an f-string, but its interpolation fields represent inputs rather than outputs. Again - scanf vs. printf.

So you can imagine something like:

match url:
  as UrlPattern(i"/projects/{id}")():
    return db.lookup_project(request)

  as UrlPattern("/account/profile")():
    return db.lookup_profile(request.current_user)    

Again, I'm not suggesting that anyone would actually want to do this. However, URL-matching is something that happens a lot - an entire industry sector is built on it - so I don't want to dismiss the use case too readily. And although I suspect this idea is pretty ugly, I wonder if it could lead to something less so.

The more general question, I think, is do we want to support other means of determining the set of names to be bound, other than named constructor arguments?

Top-level syntax

My leading candidate is (roughly):

match <expression>:
    case <pattern> [if <expression>]: <block>
    # repeated cases

But other versions have been proposed too. I propose that we look at this separate from the definition of patterns.
