Despite the immense power of pattern matching to express ideas, there is one kind of u



Parametrization, about gvanrossum/patma

Comments (30)

viridia commented on May 26, 2024 1

I've created a PR #56 which talks about this issue.

However, I would still like to consider this issue further. It seems to me that we don't actually need a complex extended matching protocol in order to achieve the flexibility that @Tobias-Kohn is looking for; we just need to find a palatable syntax.

Essentially what we need is 'instance matching' - that is, the ability to explicitly (not implicitly!) test against a pattern that is a class instance rather than a class. The 'parameters' mentioned previously can simply be arguments to that class's constructor.

Since we have already established that the syntax X(args) is used for class matching, we would need something different to indicate an instance match. Several ideas spring to mind:

The first is the already-mentioned double-call:

  Pattern(constructor-args)(match-vars)

...however, some folks opined that this looked ugly.

Some other possible variants:

  InRange<0, 6>
  InRange[0, 6]
  InRange{0, 6}

Note that since we don't allow binary operators in match expressions, the use of angle-brackets should not cause a syntactical ambiguity.

In my thinking, what is between the angle brackets would be the constructor arguments; the match variables would be optional:

  # Both are valid
  InRange<0, 6>
  InRange<0, 6>(value)

A longer example:

match m:
  case InRange<0, 7>:
    print("Half dozen")
  case InRange<6, 13>:
    print("A dozen")
  case 13:
    print("Baker's dozen")
  case _:
    print("Lots")

This is different than the earlier proposal in that what I am doing is using syntax, rather than an extended protocol, to indicate that the pattern behaves differently than other types of patterns.
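To make the intended semantics concrete, here is a minimal sketch in today's Python that emulates the instance-matching idea without any new syntax. The `InRange` class, its half-open bounds, its `matches` method, and the `describe` helper are all assumptions made for illustration; none of them is part of the proposal or of any library.

```python
class InRange:
    """Hypothetical pattern object; the constructor arguments are the parameters."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi

    def matches(self, value):
        # Assumed semantics: half-open interval lo <= value < hi, integers only.
        return isinstance(value, int) and self.lo <= value < self.hi


def describe(m):
    # Each entry plays the role of one `case InRange<lo, hi>:` clause.
    cases = [
        (InRange(0, 7), "Half dozen"),
        (InRange(6, 13), "A dozen"),
    ]
    for pattern, label in cases:  # the first matching clause wins
        if pattern.matches(m):
            return label
    if m == 13:
        return "Baker's dozen"
    return "Lots"
```

Note that the pattern objects are rebuilt on every call to `describe`, which is the construction overhead raised later in the thread.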

The main objection here is that it looks odd to be using angle-brackets to do what is effectively a function call or class construction which is ordinarily done using parens. However, since we've already reserved the use of parens for declaring match variables, then that's off the table.

Or is it? A different approach would be to flip them: Have () always mean 'call', even in match patterns, and then use one of the other operators, such as angle-brackets, to declare the pattern variables. The syntax between the angle brackets would be exactly the same as we've specified elsewhere, allowing '**rest' and so on.

So some examples:

Point<x, y> # class match pattern
Point(1, 2) # instance match pattern
Point(1, 2)<x, y> # Instance match pattern with explicit pattern variables

There is some justification for this: all of the other enclosing delimiters '()', '[]', and '{}' have well-established meanings in Python; angle brackets do not. The downside of this is that there are no other bracket characters available in the ASCII character set, so once we use this there are none left for other purposes. Also, angle brackets have a well-established meaning in other languages (C++) which is not the same as what is being proposed here.

from patma.

ilevkivskyi commented on May 26, 2024

I think Guido convinced me that use cases where current proposal doesn't work (i.e. requires "trivial" guarded matches like x if re.match("[0-9]*", x) or x if 1 <= x < 9) will be rare. So maybe instead of trying to extend the current API we can focus on future-proofing it?

One possible way could be to define the __match__ signature as:

def __match__(cls, obj, context): ...

where context in the current proposal would contain only some basic info, like the number of positional matches and the names of named matches. Most implementations will just ignore this argument.

If in the future people ask for something more, the context object can grow some new attributes (like what literals have been passed, or kinds of sub-patterns, etc.) without breaking backwards compatibility (and without requiring __match_ex__(), as happened with the pickle protocol).

It is still less powerful than callbacks, but IMO should be pretty flexible.
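A minimal sketch of what such a context argument could look like. Everything here is hypothetical: `MatchContext`, its two fields, the example `Point` class, and the `interpreter_match` helper that stands in for the interpreter are assumptions for illustration, not part of any agreed protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchContext:
    """Hypothetical context carrying only the two pieces of info named above."""
    n_positional: int = 0
    names: tuple = ()


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def __match__(cls, obj, context):
        # As most implementations would, this one ignores the context.
        return obj if isinstance(obj, cls) else None


def interpreter_match(cls, obj, n_positional=0, names=()):
    # Stand-in for what the interpreter would do for `case cls(...)`.
    return cls.__match__(obj, MatchContext(n_positional, tuple(names)))
```

New attributes could later be added to `MatchContext` with defaults, so existing `__match__` implementations would keep working.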


Tobias-Kohn commented on May 26, 2024

I am not particularly happy with passing in this context: to my eyes it violates the idea of separation of concerns. But more importantly, I think it is not correct semantically to just treat parameters the same way as variable bindings. One of the most obvious differences is that a parameter can never be left blank or bound to a variable, i.e. you cannot do re.match(_, x), say.

Also, there must be a misunderstanding: I do not propose callbacks in any way.

If I recall correctly, parametrisation was also briefly mentioned some time before, concerning types. Should it be possible to check if x has a type like list[int]? With parametrisation, list[int] could itself return the class with an appropriate __match__ method, allowing to do something like, e.g.:

match x:
    case list[int]():
        print("A list with all integers.")
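A rough sketch of how subscription could return a class with an appropriate `__match__` method. `TypedList` is a hypothetical stand-in (the built-in `list[int]` returns a generic alias that has no such method), and the convention that `__match__` returns the object on success and `None` on failure is an assumption.

```python
class TypedList:
    """Hypothetical stand-in for list; TypedList[int] yields a matcher class."""

    item_type = object

    def __class_getitem__(cls, item_type):
        # Build a specialised subclass that remembers the parameter.
        name = f"{cls.__name__}[{item_type.__name__}]"
        return type(name, (cls,), {"item_type": item_type})

    @classmethod
    def __match__(cls, obj):
        # Succeed only for lists whose items all have the requested type.
        if isinstance(obj, list) and all(isinstance(x, cls.item_type) for x in obj):
            return obj
        return None
```

With this in place, `TypedList[int].__match__([1, 2, 3])` succeeds while `TypedList[int].__match__([1, "a"])` returns `None`. Each subscription creates a new class, so a real implementation would probably want to cache the result.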


gvanrossum commented on May 26, 2024

I have to agree that passing an extra context object looks inelegant. OTOH this is one of the decisions that is worth extra consideration, because it's not easy to revert in the future. This is sometimes called a "one-way door" decision -- you can't go back.[1]

@Tobias-Kohn has shown two possible ways out: we could allow syntax like InRange(3, 8)() or list[int]() (or both). I think if we ever decided that patterns were so ubiquitous that we needed parameterized patterns, they could be written using one of those syntactic forms, and there won't be enough other future use cases for passing an extra context object.

Then again, InRange(3, 8)() is also pretty ugly -- I find it easy to overlook the extra pair of parentheses, and not everyone is familiar with this "higher-order functions" ( :-) ) style of coding. And list[int]() is no better.

All in all, I'm -0.5 on passing context to __match__ -- in part because it's ugly, in part because I think we won't need it. And I'm -1 on supporting InRange(3, 8)() or list[int]() in the first release.

[1] Some frameworks get away with adding optional parameters to callbacks by introspecting the called function. But that's pretty fragile (it only really works for vanilla pure-Python functions) and hence I really don't like that.


viridia commented on May 26, 2024

A different extensibility mechanism would be to reserve a different magic method name - let's call it __match_ext__ for sake of argument - for more flexible matching with a different protocol. The interpreter would first test for the presence of this method, and if not present, fall back to using __match__.

__match_ext__ differs from __match__ in that it completely handles the matching operation and leaves little for the interpreter to do, whereas __match__ is higher-performance and lets the interpreter do most of the heavy lifting.

Most 'regular' classes and dataclasses would simply use __match__.

Even if you decide not to support __match_ext__ in the initial release, it might make sense to reserve the name to avoid future conflicts.

The main issue is, I don't know how much of a performance penalty there would be for testing for the existence of a method - I imagine 95% of the time it would be a miss.
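The lookup order described above can be sketched as follows. The `run_match` helper stands in for the interpreter, and the signatures and return conventions of both `__match__` and `__match_ext__` are assumptions, since the thread does not fix them.

```python
def run_match(pattern_cls, target):
    """Prefer the hypothetical __match_ext__, falling back to __match__."""
    ext = getattr(pattern_cls, "__match_ext__", None)
    if ext is not None:
        return ext(target)
    return pattern_cls.__match__(target)


class Plain:
    @classmethod
    def __match__(cls, target):
        return target if isinstance(target, cls) else None


class Even:
    @classmethod
    def __match_ext__(cls, target):
        # Handles the whole match itself; there is no __match__ to fall back to.
        if isinstance(target, int) and target % 2 == 0:
            return target
        return None
```

For a class like `Plain`, the `getattr` call is the "miss" whose cost is being asked about.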


gvanrossum commented on May 26, 2024

All dunder names are implicitly reserved. :-)

That check would cost less than the cost of calling a no-op Python function.

I worry about another cost though. We don't just have to check for the presence of __match_ext__. We also have to be able to construct the argument. Even if it's rarely going to be used, this might double the amount of code generated for a typical pattern.

(For an idea on what code to generate for patterns, search for translate in patma.py.)


viridia commented on May 26, 2024

I'm not sure I understand your comment about cost and additional code. Isn't __match__ a method on the match expression, not the object being matched (it took me a little while to grasp this distinction BTW)? And if that is the case, shouldn't __match_ext__ only be defined on match expressions that actually need that extra flexibility? And as far as constructing the argument goes, can't that be done after we determine that __match_ext__ exists? Maybe I am not understanding how this works.

In any case, my suggestion was meant to be an alternative to adding context as an additional parameter for future extensibility. If we feel that the existing __match__ protocol has sufficient flexibility (and at this point I am not sure which __match__ protocol we are talking about - the original one in this repo, the proxy object one, or Ivan's original one) then I'll happily withdraw the suggestion. :)


gvanrossum commented on May 26, 2024

Hm... Let's say we have two classes:

class A:
    def __match__(...): ...
class B:
    def __match_ext__(...): ...

Now if we have some match statement,

match target:
    case A(x): ...
    case B(x): ...

Do you expect the bytecode compiler to generate different code for the first case than for the second? The problem with that idea is that the bytecode compiler does not know what any name means (not even locals or builtins!), so it cannot generate different code.

So yeah, the argument doesn't have to be constructed for the second case, but the bytecode needed to construct it must still be generated, causing code object size bloat. That's what I meant, anyway.
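The point that the compiler cannot tell names apart can be checked with the `dis` module. This sketch uses an ordinary call expression rather than a pattern, on the assumption that name resolution works the same way in both; `opnames` is a helper introduced only for this illustration.

```python
import dis


def opnames(source):
    """Opcode names the compiler emits for an expression, ignoring operands."""
    code = compile(source, "<sketch>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


# Neither A nor B is defined anywhere, yet both expressions compile,
# and the emitted instruction sequence has the same shape for both.
same_shape = opnames("A(x)") == opnames("B(x)")

# The compiler only records the names; what they refer to is decided at run time.
names_a = compile("A(x)", "<sketch>", "eval").co_names
```

Since the generated code cannot depend on what `A` or `B` turn out to be, any code needed to support an extended protocol would have to be emitted for every class pattern.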

The __match__ protocol we're talking about is the proxy one that you first proposed. I think I wrote up a spec here.


Tobias-Kohn commented on May 26, 2024

Yes, I fully agree that InRange(3, 8)() is quite ugly, indeed (and there is whole lot more to say about it not being a particularly good choice)—although I was not really concerned with exact syntax at the moment, but more with the concept behind it.

After having given it some more thought, we could actually quite easily do it with the machinery already in place according to our current protocols:

InRange = CreateInRange(3, 8)
DigitRegEx = RegEx("[0-9]*")
match x:
    case InRange(): ...
    case DigitRegEx(): ...

This works because Python has to look up the names of the classes involved, anyway, much in line with Guido's remark:

The problem with that idea is that the bytecode compiler does not know what any name means (not even locals or builtins!), so it cannot generate different code.

And, assuming that a pattern match is called fairly frequently, if we define stuff like InRange and DigitRegEx just once, there will probably be a significant gain in performance over any other solution.

So, let me change my stance and reformulate this as a new hypothesis: there is no actual need for context objects being passed to __match__ or a __match_ex__ method. There is a better solution to solve cases where this seems necessary. And this probably also makes it clearer that the 3, 8 and "[0-9]*" are parameters to the patterns and not something that should appear inside the actual case-match itself.


gvanrossum commented on May 26, 2024

That does not convince me. At this point it would probably be cleaner to use a regular ‘if’ with the parameters to the test explicitly in there, or a guard if there are other cases that need the power of patterns. What you’ve shown is just that you can shoehorn other tests into the pattern matching machinery, but not that it would be the right solution.

That said, I still find the context parameter ugly.


ilevkivskyi commented on May 26, 2024

That said, I still find the context parameter ugly.

Probably off-topic for this issue, but there is one important argument why context may be needed even if we don't consider parametrization: Python classes often have overloaded constructors. For example, let us consider a mystery class called Gid with two possible ways to instantiate it: Gid(hex: bytes) and Gid(colo: int, id: int). There are two issues:

1. For all such classes match by position is essentially off-limits, because the x in Gid(x) and the x in Gid(x, y) are different values. IOW such classes would want to put different values in the corresponding positions depending on how many of them were requested.
2. "Pseudo-constants" can be misleading/confusing. Imagine the authors of the class actually choose one of the two constructors as the canonical one, for example Gid(colo, id). Then the point is that when Gid(b"0000") appears somewhere in a pattern it is a pattern, not a value, so users will be surprised when they discover Gid(b"0000") doesn't match Gid(b"0000").

Also probably context is not a good name. A better choice may be something like:

def __match__(instance, signature):
    ...

On a more philosophical level, IMO pattern matching should be considered as an inverse of the constructor, not as unpacking an object. In simple cases these two are the same, but there are real use cases where these two are different.


gvanrossum commented on May 26, 2024

But then we again have to answer the question what data structure is passed into signature, and how much about the pattern it reveals.

I would personally just not set __match_args__ (or set it to None or ()), which will require the user to write either Gid(colo=..., id=...) or Gid(hex=...).


ilevkivskyi commented on May 26, 2024

But then we again have to answer the question what data structure is passed into signature, and how much about the pattern it reveals.

I think as little as possible; it will be easier to add information than to remove it. From what I see now, it looks like the number of positional matches should be there. Actually, maybe it is the only thing that should be there? Then it will be kind of symmetric: we call __match__(obj, n) and expect __match_args__ to be of length n. This also removes the need for __match_args_required__.
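A sketch of the `__match__(obj, n)` idea applied to the Gid example. The thread never shows the class, so this `Gid`, its byte encoding, and the convention that `__match__` returns a tuple of attribute names (or `None` on failure) are all assumptions; `positional_values` stands in for the interpreter.

```python
class Gid:
    """Hypothetical class with two constructor forms: Gid(hex) and Gid(colo, id)."""

    def __init__(self, *args):
        if len(args) == 1:
            # Assumed encoding: 4 hex digits of colo followed by 8 hex digits of id.
            text = args[0].decode("ascii")
            self.colo, self.id = int(text[:4], 16), int(text[4:], 16)
        elif len(args) == 2:
            self.colo, self.id = args
        else:
            raise TypeError("expected Gid(hex) or Gid(colo, id)")

    @property
    def hex(self):
        return b"%04x%08x" % (self.colo, self.id)

    @classmethod
    def __match__(cls, obj, n):
        """Return attribute names for n positional sub-patterns, or None."""
        if not isinstance(obj, cls):
            return None
        by_count = {0: (), 1: ("hex",), 2: ("colo", "id")}
        if n not in by_count:
            raise TypeError(f"Gid() accepts at most 2 positional sub-patterns ({n} given)")
        return by_count[n]


def positional_values(cls, obj, n):
    # Stand-in for the interpreter: ask for n names, then read those attributes.
    names = cls.__match__(obj, n)
    return None if names is None else tuple(getattr(obj, name) for name in names)
```

Here a pattern with one positional sub-pattern would see the `hex` value, and a pattern with two would see `colo` and `id`, mirroring the two constructor forms.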

I would personally just not set __match_args__ (or set it to None or ()), which will require the user to write either Gid(colo=..., id=...) or Gid(hex=...).

I think users (and then authors) of various popular libraries will complain that they can't use match by position. And, as you said, not giving any context/signature information is a one-way decision.


gvanrossum commented on May 26, 2024

Honestly I think the design of Gid() is wrong. Or else we should look for a proper solution for overloaded signatures that doesn’t complicate life for everyone else.


ilevkivskyi commented on May 26, 2024

Honestly I think the design of Gid() is wrong.

This is just an example. Do you think having overloaded constructors is wrong in general?

Or else we should look for a proper solution for overloaded signatures that doesn’t complicate life for everyone else.

What kind of complication do you mean here? Typing an extra argument that people are free to not use if they don't need to? __exit__() for example requires three arguments that many people don't use and I didn't hear any complaints. Anyway, it doesn't look like we are going to find an agreement here.


gvanrossum commented on May 26, 2024

Honestly I think the design of Gid() is wrong.

This is just an example. Do you think having overloaded constructors is wrong in general?

Yeah, I think they're problematic. Since @overload only works for the type checker, there's no clean way to implement them: it's usually awkward checking of len(args) and/or the type of some arguments (actually a match args with various cases might be of help here :-). Even something as simple as range(n) vs. range(lo, hi) is ugly to implement. The better pattern is class methods like dict.fromkeys() and datetime.fromtimestamp().

In addition, what if the signature overloading isn't on the number of arguments but on the type of an argument? E.g. if we had Gid(hex: bytes) and Gid(id: int), presumably __match__(target, 1) should raise an exception (for any type of target!), in order to force the user to use Gid(hex=...) or Gid(id=...). But then we might as well use the __match_args_required__ mechanism (and leave it unset or zero).
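The class-method style mentioned above can be sketched as follows. `dict.fromkeys()` and `datetime.fromtimestamp()` are real standard-library calls; the `Gid` class, its `from_hex` method, and the byte encoding are hypothetical and only illustrate the shape of the design.

```python
from datetime import datetime, timezone

# Two standard-library examples of alternate constructors.
counts = dict.fromkeys(["a", "b"], 0)
epoch = datetime.fromtimestamp(0, tz=timezone.utc)


class Gid:
    """Hypothetical Gid with one primary signature and a named alternate."""

    def __init__(self, colo, id):
        self.colo = colo
        self.id = id

    @classmethod
    def from_hex(cls, hex):
        # Assumed encoding: 4 hex digits of colo followed by 8 hex digits of id.
        text = hex.decode("ascii")
        return cls(int(text[:4], 16), int(text[4:], 16))
```

With a single constructor signature there is no need to inspect `len(args)` or argument types inside `__init__`.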

Or else we should look for a proper solution for overloaded signatures that doesn’t complicate life for everyone else.

What kind of complication do you mean here? Typing an extra argument that people are free to not use if they don't need to? __exit__() for example requires three arguments that many people don't use and I didn't hear any complaints. Anyway, it doesn't look like we are going to find an agreement here.

Yes, I do think that the extra argument is a nuisance. For example, I recently worked on some changes to traceback.print_exception(). For historical reasons this requires three arguments, one of which is no longer used, and one of which is allowed to be None but must still be present. Each time after I had been distracted by a meal or nap I started out forgetting this. (And if we were to design __exit__() from the ground up for Python 3, we might pass it just a single argument, the exception -- the exception type is just exc.__class__, and the traceback is exc.__traceback__.)

But indeed, I'm happy to agree to disagree. I also understand that there may be other things in our future that will force us to pass more info to __match__(). But it would definitely complicate everything for some edge case.


ilevkivskyi commented on May 26, 2024

But indeed, I'm happy to agree to disagree.

Let me try to clarify where exactly we disagree. IIUC we agree that the majority of people who will use pattern matching will not write a custom __match__() at all. They will be fine with just object.__match__() or using dataclasses or setting class-level __match_args__. For the minority of people who will use __match__() I see these situations where they would need it:

1. Add some attributes to the proxy that are not normally accessible on instances.
2. Adjust __match_args__ depending on how many positional matches were requested.
3. Error on a particular combination of named matches, like Gid(hex=..., id=...), or a particular number of positional matches (accept only one or three), etc.

So you think most people (among those who will actually use __match__()) will fall into the first bucket, while I think the numbers in all three buckets will be roughly the same. Is this right? I mean the first one can be implemented without knowing the signature, while the other two can't be. Or am I missing something?


gvanrossum commented on May 26, 2024

I think we need to look at some more examples in detail to see how important this is.

I think we shouldn't be distracted too much by the idea that a class pattern must match the constructor signature -- this is a handy guideline, not a hard rule (just like the rule that repr() ought to look like a constructor call).

Given what I recall of the Gid class, I think that (except perhaps in some tests) there will be few situations where people would match on specific values of hex, id or colo (that's something where a regular if works fine); more likely people will have received a value that represents a Gid and they might want to use match to figure out what they got:

# Normalize gid to a Gid instance:
match gid:
    case Gid():
        return gid
    case bytes(hex):
        return Gid(hex)
    case (colo, id):
        return Gid(colo, id)

And honestly, case Gid(hex=hex, id=id, colo=colo) seems a fine way to extract both the colo/id pair and the hex value out of a Gid object (e.g. for logging).

But that's just that example. In the stdlib classes with such subtle signatures are rare, or at least I can't think of any other than range(). The more common pattern is alternate constructors like date.from_keys(), and this proposal isn't going to help there.

I feel that when in doubt, patterns should use keyword args rather than positional args (which the default object.__match__() favors anyway), and I don't think we should encourage classes to disallow certain combinations of attributes. To me, the pattern Class(attr1=pat1, attr2=pat2) just means isinstance(target, Class) followed by either matching on attr1/attr2 or extracting those.


ilevkivskyi commented on May 26, 2024

A real-life example is Column from sqlalchemy: it is either Column(String) or Column("name", String). Also it is traditionally called by position; I have never seen Column(type_=String). Various sqlalchemy related functions are often very "polymorphic": they accept either a literal, or a column, or whatnot. So code like this will likely be common:

match subs:
    as int():
        ...
    as str():
        ...
    as Column(Integer):
        ...
    as Column(String):
        ...

and people may be annoyed if they are forced to write Column(type_=...) everywhere in matches. I could imagine other ORMs may have similar situations.


Tobias-Kohn commented on May 26, 2024

Without having any knowledge of sqlalchemy, it seems to me like the example above could easily be expressed as:

match subs:
    ...
    case Column(x := int()):
        ...
    case Column(x := str()):
        ...

In general, however, I completely agree with Guido on this, and I think it is a bad idea to support such subtle and rather fragile patterns. A much better design would be to use the analogue to alternate constructors and have something like Column.withName, say for the example with a name argument. This could fairly easily be implemented along the lines of:

class Column:
    ...
    class withName:
        def __match__(self, value):
            ...

We could probably add a decorator @match_function to the utilities module that takes any method or function and creates a class with the same name and a __match__-method, saving some indentation when compared to the above solution.
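A minimal sketch of such a decorator. The name `match_function` comes from the suggestion above, but its behaviour here is an assumption, as are the stand-in `Column` class (not sqlalchemy's) and the convention that `__match__` returns the extracted data or `None`.

```python
def match_function(func):
    """Hypothetical helper: build a class named after func whose __match__ is func."""
    return type(func.__name__, (), {"__match__": staticmethod(func)})


class Column:
    """Stand-in for an ORM column; not sqlalchemy's actual class."""

    def __init__(self, *args):
        # Accepts Column(type_) or Column(name, type_).
        self.name, self.type_ = (None, *args) if len(args) == 1 else args

    @match_function
    def withName(value):
        # Match only columns that carry a name; return the extracted data.
        if isinstance(value, Column) and value.name is not None:
            return (value.name, value.type_)
        return None
```

After decoration, `Column.withName` is a class, so it could appear in a pattern wherever a class name is expected.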


Tobias-Kohn commented on May 26, 2024

IMHO, the __match__ method should really just be concerned about de-constructing a given object, i.e. check if it recognises its structure and if so extract the data stored in the object. The __match__ should not be concerned about how it is used or applied inside a match statement.

The code guidelines I am familiar with all stress the importance of being explicit and use named arguments whenever there is a possible ambiguity. Allowing patterns to subtly change matching behaviour on positional arguments does not seem like a good idea to me.

Even in case of something like a vector where you want to stress that there need to be exactly three positional arguments for it to make sense, I think it would be cleaner to put them into a list and express it like so:

match v:
    case vector([x, y, z]): 
        ...

rather than force any use of vector as a pattern to specify at least/exactly three arguments.

In Ivan's original proposal, the context would capture which positional arguments were values, and which had to be extracted by the __match__ method. This would mean that we have to be extra careful about evaluation order. In the following example, y would be evaluated before a call to foo.__match__ so that its value could be passed on to the match function, whereas z() clearly needs to be evaluated afterwards.

match x:
    case foo(.y, z()):

It is, admittedly, just a small matter, but I think it shows that violating the separation of concerns has further ramifications and is not necessarily the way to make pattern matching easier or more usable.

What I was trying to convey further up: any such thing as a context, signature, etc. is a parameter that we pass into the match-function and therefore works entirely differently to extract data and comparing it to a given value. If we went for such parameters, they should be explicitly marked and visible as such.


gvanrossum commented on May 26, 2024

I think there's a way out of this dilemma -- we can make __match__ future proof without having to resort to adding __match_ext__ and trying both __match_ext__ and __match__ for backwards compatibility.

Instead, if in the future we feel the need to pass extra context to __match__, we can define a decorator in our handy match helpers module (#23) that can be used to decorate a __match__ function. The decorator can set a function attribute on the decorated function that we can check, and we will only pass the extra argument(s) if that attribute is set.

For example,

class MyClass:
    @match_with_context
    def __match__(self, context):
        ...

This is more verbose for users who need it, but is more extensible than defining __match_ext__, and allows us to see this issue as a two-way door.
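The flag-checking mechanism could be sketched like this. The attribute name `__match_with_context__`, the `call_match` helper standing in for the interpreter, and the use of `staticmethod` to keep the example runnable are all assumptions.

```python
def match_with_context(func):
    """Hypothetical decorator: flag func as wanting the extra context argument."""
    func.__match_with_context__ = True
    return func


def call_match(cls, target, context):
    # Stand-in for the interpreter: pass the context only if the flag is set.
    method = cls.__match__
    if getattr(method, "__match_with_context__", False):
        return method(target, context)
    return method(target)


class Simple:
    @staticmethod
    def __match__(target):
        return target


class Contextual:
    @staticmethod
    @match_with_context
    def __match__(target, context):
        return (target, context)
```

Existing `__match__` implementations such as `Simple.__match__` keep their one-argument signature, which is what would make this a two-way door.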


brandtbucher commented on May 26, 2024

The decorator can set a function attribute on the decorated function that we can check, and we will only pass the extra argument(s) if that attribute is set.

What would this look like for C extensions? A flag on the type?


gvanrossum commented on May 26, 2024

The decorator can set a function attribute on the decorated function that we can check, and we will only pass the extra argument(s) if that attribute is set.

What would this look like for C extensions? A flag on the type?

Good question. I don't think it's easy to simulate function attributes in C code -- and come to think of it it's not so easy to check for it. So, honestly, none of my alternatives are any better than the __match_ext__ pattern. And that's fine. Tobias gave some fine arguments; most strongly:

The __match__ should not be concerned about how it is used or applied inside a match statement.


viridia commented on May 26, 2024

Need to add to the PEP why this was postponed, and the fact that it is a two-way door.


Tobias-Kohn commented on May 26, 2024

The idea to clearly visually and syntactically distinguish between parameters and match-variables is very nice. I am usually not much of a fan of these angle-brackets, but here they seem like a viable option. I particularly like this visual separation between parameters and match-variables, although it probably still won't win a beauty contest.

As far as I am aware, the angle-brackets are mostly used for type-parametrisation, which does not seem too far off from what we are doing here. Whether I say List<Integer> in Java or range<2, 5> in future Python, we are getting a specialised instance of a more general pattern in both cases.

Parametrisation is certainly a rather rare (although powerful) case, even more so when compared to the usual match deconstructor. I would therefore rather use range<2, 5>(x) than range(2, 5)<x>, notwithstanding that your argument for it being the other way round makes sense (i.e. that the range(2, 5) is an actual call whereas the Point<x, y> is something new). Up to a certain point, the use of the same syntax for the constructor and de-constructor is not only intentional, but can really help in understanding the syntax. In that way, it is an extension of x == (2, 3), where we construct an object on the RHS for the single purpose of comparison.

In order to ameliorate the performance issue of instantiating a new object for each case clause, we could also pick up @ilevkivskyi's proposal and pass in the parameters as a tuple to the __match__ method. __match__ would then still be free to create a new instance if so desired.

The main difference to @ilevkivskyi's original proposal is that I would not want to pass the context of what we are matching into __match__ and thereby mix parameters with match-variables. But if we clearly limit ourselves to parameters, this seems like a viable alternative.

The question is: is it worth introducing this new syntax for parametrisation? In the end, I am a bit torn here. Of course, I would absolutely love to have such a parametrisation feature included. But given the issues and concerns, I am also ok if we just drop it for the time being and perhaps mention it in the PEP.


viridia commented on May 26, 2024

Well, the same argument that applies to angle brackets also applies to square brackets: although they are normally used for array / map lookups, in Python type annotations they mean exactly what angle brackets mean in C++: type parameters. So if you are going to make the argument that these brackets are analogous to "parameters" in the type sense rather than in the function call sense, then square brackets are probably more Pythonic.

In fact, let's generalize this: with the introduction of patterns, Python now has 3 syntactical contexts where the parsing rules are different (four if you count the content of docstrings):

Normal context
Type annotation context
Pattern context

Each of these contexts has its own expression grammar, and the meaning of the various operators and delimiters is context-dependent. At the same time, they are all homologues: the various grammars are related to each other in intuitive meaning, although not identical.

One of the other advantages of using syntax to pass in these additional parameters is that the overhead of construction is only paid by objects that opt-in, as opposed to having it paid in every custom match.

I agree that we don't need to decide this now, but that's no reason for the brainstorm to stop, especially if it's fun :). Although I admit that the logic of these syntactical arguments borders on the Jesuitical at times...


Tobias-Kohn commented on May 26, 2024

Please don't get me wrong: I actually enjoy this discussion, and there is nothing wrong with wondering how many angels might dance on the head of a pin 😉. I am just trying not to continue pressing for something that has not been that well received.

The square brackets might be more Pythonic, indeed. But my argument was rather that the current use of angle brackets in C-languages is not too far away from our idea of parametrisation. So, I don't expect that people familiar with generic types would be confused by Python's potential use of the syntax. Scala consistently uses square brackets for type parameters, which forces it to use regular parentheses even for list/array access. This took me a while to get used to, but in the end it works just fine. So, long story short: I am just saying that using angle brackets really should be fine.

Coincidentally, your example contains an example where I feel that parametrisation would actually help. This is something that I have come across before, and it kind of irks me that the regular expression matcher does not simply return the "token-type" and "value", but must be interrogated.

  m = TOKENS.match(self.input, self.pos)
  if m:
      self.pos = m.end()
      if m.group("ws"):
          continue
      elif value := m.group("ident"):
          self.token = (TokenType.IDENT, value)
          return
      elif value := m.group("number"):
          self.token = (TokenType.NUMBER, value)
          return
      elif value := m.group("oper"):
          self.token = (TokenType.OPER, value)
          return

To my eyes, the above code would be nicer if we could do something like this:

m = TOKENS.match(self.input, self.pos)
if m:
    self.pos = m.end()
    match m:
        case group<"ws">():
            continue
        case group<"ident">(value):
            self.token = (TokenType.IDENT, value)
            return
        case group<"number">(value):
            self.token = (TokenType.NUMBER, value)
            return
        case group<"oper">(value):
            self.token = (TokenType.OPER, value)
            return

However, the group<"ws">() thing is not necessarily a constructor, but more a case where we want to pass in some information in the sense @ilevkivskyi proposed.

When no parameters are present, I just pictured __match__() receiving None or an empty tuple as its second parameter. I feel this would not be much of a price to pay: the actual construction of a tuple to hold the parameters is still on an opt-in basis.
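A sketch of how `group<"ident">(value)` might be served by a `__match__` that receives the parameters as a tuple. The `group` class, the two-argument `__match__` signature, its return convention, the `TOKENS` regular expression, and the `next_token` helper that plays the interpreter's role are all assumptions; only the `re` calls are real.

```python
import re

# Hypothetical tokenizer pattern with one named group per token type.
TOKENS = re.compile(
    r"(?P<ws>\s+)|(?P<ident>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<oper>[-+*/=])"
)


class group:
    """Hypothetical parametrized pattern standing in for group<"name">(value)."""

    @staticmethod
    def __match__(m, params=()):
        # params holds what would be written between the angle brackets.
        (name,) = params
        value = m.group(name)  # None if that group did not participate
        return None if value is None else (value,)


def next_token(text, pos=0):
    m = TOKENS.match(text, pos)
    if not m:
        return None
    # Each iteration plays the role of one `case group<"...">(value):` clause.
    for kind in ("ws", "ident", "number", "oper"):
        result = group.__match__(m, (kind,))
        if result is not None:
            return (kind, result[0])
```

No `group` instance is created per clause here; only the small parameter tuple is built, which is the opt-in cost mentioned above.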


viridia commented on May 26, 2024

So I realized another problem with the idea of having two separate parameter lists, which is that there is additional runtime overhead. With most of the other pattern flavors, all of the matching code can be inlined (except for actually calling __match__) by the compiler.

However, for class instance patterns (which is what I am calling these), the actual pattern object needs to be constructed at the time of the match statement, and then immediately thrown away. It can't be compiled in advance, because the constructor arguments might not be constants. (I suppose if they are constants, the compiler could construct a static instance of the pattern, but then you have to worry about polluting state.) So you are creating a new object for each case clause.

Part of the conundrum here (pardon the digression) is that Python doesn't have constants. This not only makes this hard case to optimize, it makes the match statement, and many other things in Python, hard to optimize as well. (Although not as hard to optimize as PHP, which allows even local variable name lookups to be overridden.)

And it's hard to imagine how you could add constants to Python without making it a completely different language. Yes, you could probably introduce a 'final import' statement to tell the compiler that the imported symbols will never be overwritten, and make an Enum metaclass that lets the compiler know that the members of the enum will never be changed. But because the Python compiler only sees a single source file at a time, there's no way to know that a.b is constant if a is imported from another file.

And even if you managed to solve these problems, there's another impact: every programmer who cares about maximizing performance - or who thinks that downstream users of their library care about performance - would start using constants everywhere. A beginner looking at a random sample of Python code in GitHub would be scratching their head and saying to themselves, "what's all this const const const I see everywhere?"


gvanrossum commented on May 26, 2024

FWIW we do have a way to indicate finality, but only for static type checkers. See PEP 591:

from typing import Final, final

foo: Final = 42
bar: Final[list[int]] = []
class C:
    @final
    def method(self): ...  # Cannot be overridden
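A short sketch of what "only for static type checkers" means in practice: the interpreter itself does not enforce `Final`, so the compiler cannot rely on it. The variable name `limit` is just an example.

```python
from typing import Final

limit: Final = 42

# A static type checker reports this reassignment as an error,
# but at run time it is an ordinary assignment and succeeds.
limit = 43
```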

