Pattern Matching for C# 7
Pattern matching extensions for C# enable many of the benefits of algebraic data types and pattern matching from functional languages, but in a way that smoothly integrates with the feel of the underlying language. The basic features are: record types, which are types whose semantic meaning is described by the shape of the data; and pattern matching, which is a new expression form that enables extremely concise multilevel decomposition of these data types. Elements of this approach are inspired by related features in the programming languages F# and Scala.
Is expression
The `is` operator is extended to test an expression against a pattern.
relational_expression
: relational_expression 'is' pattern
;
This form of relational_expression is in addition to the existing forms in the C# specification. It is a compile-time error if the relational_expression to the left of the `is` token does not designate a value or does not have a type.
Every identifier of the pattern introduces a new local variable that is definitely assigned after the `is` operator is `true` (i.e. definitely assigned when true).
Note: There is technically an ambiguity between type in an `is-expression` and constant_pattern, either of which might be a valid parse of a qualified identifier. We try to bind it as a type for compatibility with previous versions of the language; only if that fails do we resolve it as we do in other contexts, to the first thing found (which must be either a constant or a type). This ambiguity is only present on the right-hand side of an `is` expression.
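The sketch below illustrates the "definitely assigned when true" rule. The class and method names are hypothetical. The pattern variable can be read anywhere the compiler knows the match succeeded, such as the right operand of `&&` or the `true` branch of an `if`.

```csharp
using System;

static class DefiniteAssignmentDemo
{
    // 's' is definitely assigned when 'o is string s' is true, so it may be
    // read in the right operand of '&&', which runs only after a match.
    public static bool IsNonEmptyString(object o) => o is string s && s.Length > 0;

    public static int LengthOrMinusOne(object o)
    {
        if (o is string s)
        {
            // Inside the branch taken when the match is true: 's' is assigned.
            return s.Length;
        }
        // 's' is still in scope here, but reading it would be a compile-time
        // error because it is not definitely assigned when the test is false.
        return -1;
    }
}
```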
Patterns
Patterns are used in the `is` operator and in a switch_statement to express the shape of data against which incoming data is to be compared. Patterns may be recursive so that parts of the data may be matched against sub-patterns.
pattern
: declaration_pattern
| constant_pattern
| var_pattern
;
declaration_pattern
: type simple_designation
;
constant_pattern
: shift_expression
;
var_pattern
: 'var' simple_designation
;
Declaration pattern
The declaration_pattern both tests that an expression is of a given type and casts it to that type if the test succeeds. If the simple_designation is an identifier, it introduces a local variable of the given type named by the given identifier. That local variable is definitely assigned when the result of the pattern-matching operation is true.
declaration_pattern
: type simple_designation
;
The runtime semantics of this expression are that it tests the runtime type of the left-hand relational_expression operand against the type in the pattern. If it is of that runtime type (or some subtype), the result of the `is` operator is `true`. It declares a new local variable named by the identifier that is assigned the value of the left-hand operand when the result is `true`.
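The sketch below makes the runtime semantics concrete using a hypothetical `Animal`/`Dog`/`Puppy` hierarchy (names invented for illustration). A `Puppy` matches the pattern `Dog d` because it is a subtype of `Dog`. A plain `Animal` does not match, and neither does `null`.

```csharp
using System;

class Animal { }
class Dog : Animal { public string Bark() => "woof"; }
class Puppy : Dog { }

static class DeclarationPatternDemo
{
    // Matches when the runtime type of 'a' is Dog or any subtype of Dog;
    // 'd' then holds the same object, typed as Dog. A null never matches.
    public static string Speak(Animal a) => a is Dog d ? d.Bark() : "...";
}
```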
Certain combinations of static type of the left-hand side and the given type are considered incompatible and result in a compile-time error. A value of static type `E` is said to be pattern compatible with the type `T` if there exists an identity conversion, an implicit reference conversion, a boxing conversion, an explicit reference conversion, or an unboxing conversion from `E` to `T`. It is a compile-time error if an expression of type `E` is not pattern compatible with the type in a type pattern that it is matched with.
Note: In C# 7.1 we extend this to permit a pattern-matching operation if either the input type or the type `T` is an open type. This paragraph is replaced by the following:
Certain combinations of static type of the left-hand side and the given type are considered incompatible and result in a compile-time error. A value of static type `E` is said to be pattern compatible with the type `T` if there exists an identity conversion, an implicit reference conversion, a boxing conversion, an explicit reference conversion, or an unboxing conversion from `E` to `T`, or if either `E` or `T` is an open type. It is a compile-time error if an expression of type `E` is not pattern compatible with the type in a type pattern that it is matched with.
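The C# 7.1 relaxation matters mostly inside generic code, where the input has an open type parameter `T`. The sketch below uses a hypothetical method name. It would be rejected under the original rule, but it compiles under the revised one because `T` is an open type. It therefore assumes C# 7.1 or later.

```csharp
using System;

static class OpenTypeDemo
{
    // 'value' has the open type T, so these declaration patterns are
    // permitted from C# 7.1 onward.
    public static string Classify<T>(T value)
    {
        if (value is int n) return "int " + n;
        if (value is string s) return "string " + s;
        return "other";
    }
}
```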
The declaration pattern is useful for performing run-time type tests of reference types, and replaces the idiom
var v = expr as Type;
if (v != null) {
    // code using v
}
with the slightly more concise
if (expr is Type v) {
    // code using v
}
It is an error if type is a nullable value type.
The declaration pattern can be used to test values of nullable types: a value of type `Nullable<T>` (or a boxed `T`) matches a type pattern `T2 id` if the value is non-null and the type of `T2` is `T`, or some base type or interface of `T`. For example, in the code fragment
int? x = 3;
if (x is int v) {
    // code using v
}
the condition of the `if` statement is `true` at runtime and the variable `v` holds the value `3` of type `int` inside the block.
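The nullable rule is illustrated in the small sketch below, which uses hypothetical method names. An `int?` that holds a value matches `int v`, while an empty one does not. A boxed `int` matches the same pattern.

```csharp
using System;

static class NullableDemo
{
    // A Nullable<int> matches the pattern 'int v' only when it holds a value.
    // (Writing 'x is int? v' instead would be a compile-time error.)
    public static string Show(int? x) => x is int v ? "value " + v : "no value";

    // A boxed int matches the same pattern; other boxed types do not.
    public static string ShowBoxed(object o) => o is int v ? "value " + v : "no value";
}
```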
Constant pattern
constant_pattern
: shift_expression
;
A constant pattern tests the value of an expression against a constant value. The constant may be any constant expression, such as a literal, the name of a declared `const` variable, an enumeration constant, or a `typeof` expression.
Given an input expression e and a constant c: if both e and c are of integral types, the pattern is considered matched if the result of the expression `e == c` is `true`.
Otherwise the pattern is considered matched if `object.Equals(e, c)` returns `true`. In this case it is a compile-time error if the static type of e is not pattern compatible with the type of the constant.
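The sketch below shows both matching rules at work; the method names are hypothetical. A `long` input falls under the integral rule, so it is compared with `==` and matches the `int` constant `3`. An `object` input falls under the second rule. A boxed `long` 3 is therefore not equal to the `int` constant `3` and does not match. String, `null`, and enum constants are also shown.

```csharp
using System;

static class ConstantPatternDemo
{
    // Integral input and integral constant: compared with '=='.
    public static bool LongIsThree(long value) => value is 3;

    // Object input: a boxed long 3 is not equal to the int constant 3.
    public static bool ObjectIsThree(object value) => value is 3;

    // Other kinds of constants: null, a string literal, an enum member.
    public static string Name(object value)
    {
        if (value is null) return "null";
        if (value is "hello") return "greeting";
        if (value is DayOfWeek.Sunday) return "sunday";
        return "other";
    }
}
```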
Var pattern
var_pattern
: 'var' simple_designation
;
An expression e always matches a var_pattern. In other words, a match to a var pattern always succeeds. If the simple_designation is an identifier, then at runtime the value of e is bound to a newly introduced local variable. The type of the local variable is the static type of e.
It is an error if the name `var` binds to a type.
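Because a var pattern always succeeds, it is mainly useful for naming a value in the middle of an expression. The sketch below uses hypothetical names. In it, `x` has static type `object` and is bound even when the input is `null`, and `sq` names an intermediate product.

```csharp
using System;

static class VarPatternDemo
{
    // 'o is var x' always succeeds, even for null; x has static type object.
    public static string Describe(object o) => o is var x ? "matched " + (x ?? "null") : "unreachable";

    // Name an intermediate value and test it in the same expression.
    public static bool IsSmallSquare(int n) => n * n is var sq && sq < 100;
}
```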
Switch statement
The `switch` statement is extended to select for execution the first block having an associated pattern that matches the switch expression.
switch_label
: 'case' complex_pattern case_guard? ':'
| 'case' constant_expression case_guard? ':'
| 'default' ':'
;
case_guard
: 'when' expression
;
The order in which patterns are matched is not defined. A compiler is permitted to match patterns out of order, and to reuse the results of already matched patterns to compute the result of matching of other patterns.
If a case_guard is present, its expression is of type `bool`. It is evaluated as an additional condition that must be satisfied for the case to be considered satisfied.
It is an error if a switch_label can have no effect at runtime because its pattern is subsumed by previous cases. [TODO: We should be more precise about the techniques the compiler is required to use to reach this judgment.]
A pattern variable declared in a switch_label is definitely assigned in its case block if and only if that case block contains precisely one switch_label.
[TODO: We should specify when a switch block is reachable.]
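The sketch below combines type patterns, case guards, and a `null` constant pattern in one `switch`; the names are hypothetical. The first case label whose pattern and guard both hold selects the block to execute.

```csharp
using System;

static class SwitchDemo
{
    public static string Describe(object o)
    {
        switch (o)
        {
            // The guard is an extra condition: a negative int is handled here,
            // and any other int is handled by the next case label.
            case int n when n < 0:
                return "negative int";
            case int n:
                return "int " + n;
            case string s when s.Length == 0:
                return "empty string";
            case string s:
                return "string of length " + s.Length;
            case null:
                return "null";
            default:
                return "other";
        }
    }
}
```

Swapping the first two case labels would make the guarded case unreachable, because it would be subsumed by `case int n:`. That is the compile-time error described above. Each `n` and `s` is scoped to its own case block, which is why the names can be reused.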
Scope of pattern variables
The scope of a variable declared in a pattern is as follows:
- If the pattern is a case label, then the scope of the variable is the case block.
Otherwise the variable is declared in an is_pattern expression, and its scope is based on the construct immediately enclosing the expression containing the is_pattern expression as follows:
- If the expression is in an expression-bodied lambda, its scope is the body of the lambda.
- If the expression is in an expression-bodied method or property, its scope is the body of the method or property.
- If the expression is in a `when` clause of a `catch` clause, its scope is that `catch` clause.
- If the expression is in an iteration_statement, its scope is just that statement.
- Otherwise if the expression is in some other statement form, its scope is the scope containing the statement.
For the purpose of determining the scope, an embedded_statement is considered to be in its own scope. For example, the grammar for an if_statement is
if_statement
: 'if' '(' boolean_expression ')' embedded_statement
| 'if' '(' boolean_expression ')' embedded_statement 'else' embedded_statement
;
So if the controlled statement of an if_statement declares a pattern variable, its scope is restricted to that embedded_statement:
if (x) M(y is var z);
In this case the scope of `z` is the embedded statement `M(y is var z);`.
Other cases are errors for other reasons (e.g. in a parameter's default value or an attribute, both of which are an error because those contexts require a constant expression).
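The "scope containing the statement" rule enables an early-exit style. The minimal sketch below uses hypothetical names. `i` is declared in the `if` condition but is still in scope after the `if`. It is also definitely assigned there, because the only way past the `if` is a successful match.

```csharp
using System;

static class ScopeDemo
{
    public static int DoubleIfInt(object o)
    {
        // An ordinary 'if' statement: 'i' is scoped to the enclosing block
        // (the method body), not only to the 'if' statement.
        if (!(o is int i)) return -1;

        // Reached only when the match succeeded, so 'i' is definitely assigned.
        return i * 2;
    }
}
```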
In C# 7.3 we added the following contexts in which a pattern variable may be declared:
- If the expression is in a constructor initializer, its scope is the constructor initializer and the constructor's body.
- If the expression is in a field initializer, its scope is the equals_value_clause in which it appears.
- If the expression is in a query clause that is specified to be translated into the body of a lambda, its scope is just that expression.
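The sketch below covers two of the C# 7.3 contexts: a field initializer and a constructor initializer. The `Base` and `Derived` classes and their members are hypothetical, and the query-clause case is not shown. The code assumes C# 7.3 or later.

```csharp
using System;

class Base
{
    public Base(int value) { Value = value; }
    public int Value { get; }
}

class Derived : Base
{
    public static readonly object Sample = 5;

    // Field initializer: 'n' is scoped to this equals_value_clause.
    public static readonly string Kind = Sample is int n ? "int " + n : "other";

    // Constructor initializer: 'i' is scoped to the initializer and the body.
    public Derived(object o) : base(o is int i ? i : -1) { }
}
```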
Changes to syntactic disambiguation
There are situations involving generics where the C# grammar is ambiguous, and the language spec says how to resolve those ambiguities:
7.6.5.2 Grammar ambiguities
The productions for simple-name (§7.6.3) and member-access (§7.6.5) can give rise to ambiguities in the grammar for expressions. For example, the statement:
F(G<A,B>(7));
could be interpreted as a call to `F` with two arguments, `G < A` and `B > (7)`. Alternatively, it could be interpreted as a call to `F` with one argument, which is a call to a generic method `G` with two type arguments and one regular argument.
If a sequence of tokens can be parsed (in context) as a simple-name (§7.6.3), member-access (§7.6.5), or pointer-member-access (§18.5.2) ending with a type-argument-list (§4.4.1), the token immediately following the closing `>` token is examined. If it is one of
( ) ] } : ; , . ? == != | ^
then the type-argument-list is retained as part of the simple-name, member-access or pointer-member-access and any other possible parse of the sequence of tokens is discarded. Otherwise, the type-argument-list is not considered to be part of the simple-name, member-access or pointer-member-access, even if there is no other possible parse of the sequence of tokens. Note that these rules are not applied when parsing a type-argument-list in a namespace-or-type-name (§3.8). The statement
F(G<A,B>(7));
will, according to this rule, be interpreted as a call to `F` with one argument, which is a call to a generic method `G` with two type arguments and one regular argument. The statements
F(G < A, B > 7);
F(G < A, B >> 7);
will each be interpreted as a call to `F` with two arguments. The statement
x = F < A > +y;
will be interpreted as a less than operator, greater than operator, and unary plus operator, as if the statement had been written `x = (F < A) > (+y)`, instead of as a simple-name with a type-argument-list followed by a binary plus operator. In the statement
x = y is C<T> + z;
the tokens `C<T>` are interpreted as a namespace-or-type-name with a type-argument-list.
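The C# 6 rule can be checked with a small program. In the sketch below, the classes `A` and `B`, the methods `F` and `G`, and the wrapper classes are all hypothetical. In the first call, the token after `>` is `(`, so `G<A, B>` stays a generic method name. In the second call, `G`, `A`, and `B` are `int` locals and the token after `>` is `7`. The second call therefore receives two `bool` arguments.

```csharp
using System;

class A { }
class B { }

static class GenericCall
{
    static int G<T1, T2>(int x) => x + 1;
    static int F(int x) => x;

    // '(' follows the closing '>', so G<A, B> keeps its type-argument-list:
    // F receives one argument, the result of the generic call G<A, B>(7).
    public static int Run() => F(G<A, B>(7));
}

static class TwoComparisons
{
    static (bool, bool) F(bool first, bool second) => (first, second);

    public static (bool, bool) Run()
    {
        int G = 1, A = 2, B = 3;
        // '7' follows the '>', which is not a disambiguating token,
        // so this is F(G < A, B > 7): two bool arguments.
        return F(G < A, B > 7);
    }
}
```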
There are a number of changes being introduced in C# 7 that make these disambiguation rules no longer sufficient to handle the complexity of the language.
Out variable declarations
It is now possible to declare a variable in an out argument:
M(out Type name);
However, the type may be generic:
M(out A<B> name);
Since the language grammar for the argument uses expression, this context is subject to the disambiguation rule. In this case the closing `>` is followed by an identifier, which is not one of the tokens that permits it to be treated as a type-argument-list. I therefore propose to add identifier to the set of tokens that triggers the disambiguation to a type-argument-list.
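The small, self-contained sketch below shows an `out` argument declaring a variable of a generic type. The `Fill` helper is hypothetical. The identifier `numbers` after the closing `>` is what selects the type-argument-list reading.

```csharp
using System;
using System.Collections.Generic;

static class OutVarDemo
{
    static void Fill(out List<int> list) => list = new List<int> { 1, 2, 3 };

    public static int Count()
    {
        // The identifier after the closing '>' makes List<int> a type here.
        Fill(out List<int> numbers);
        return numbers.Count;
    }
}
```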
Tuples and deconstruction declarations
A tuple literal runs into exactly the same issue. Consider the tuple expression
(A < B, C > D, E < F, G > H)
Under the old C# 6 rules for parsing an argument list, this would parse as a tuple with four elements, starting with `A < B` as the first. However, when this appears on the left of a deconstruction, we want the disambiguation triggered by the identifier token as described above:
(A<B,C> D, E<F,G> H) = e;
This is a deconstruction declaration which declares two variables, the first of which is of type `A<B,C>` and named `D`. In other words, the tuple literal contains two expressions, each of which is a declaration expression.
For simplicity of the specification and compiler, I propose that this tuple literal be parsed as a two-element tuple wherever it appears (whether or not it appears on the left-hand-side of an assignment). That would be a natural result of the disambiguation described in the previous section.
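Current compilers accept a deconstruction declaration whose element types have comma-separated type arguments, as in the `(A<B,C> D, E<F,G> H)` example. The sketch below uses standard collection types in place of the placeholders, and the `Build` helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class DeconstructionDemo
{
    static (Dictionary<string, int>, List<string>) Build() =>
        (new Dictionary<string, int> { ["a"] = 1 }, new List<string> { "x", "y" });

    public static int TotalCount()
    {
        // Two declaration expressions: the identifier after each closing '>'
        // selects the type-argument-list reading.
        (Dictionary<string, int> map, List<string> names) = Build();
        return map.Count + names.Count;
    }
}
```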
Pattern-matching
Pattern matching introduces a new context where the expression-type ambiguity arises. Previously the right-hand side of an `is` operator was a type. Now it can be a type or expression, and if it is a type it may be followed by an identifier. This can, technically, change the meaning of existing code:
var x = e is T < A > B;
This could be parsed under C#6 rules as
var x = ((e is T) < A) > B;
but under C#7 rules (with the disambiguation proposed above) it would be parsed as
var x = e is T<A> B;
which declares a variable `B` of type `T<A>`. Fortunately, the native and Roslyn compilers have a bug whereby they give a syntax error on the C#6 code. Therefore this particular breaking change is not a concern.
Pattern-matching introduces additional tokens that should drive the ambiguity resolution toward selecting a type. The following examples of existing valid C#6 code would be broken without additional disambiguation rules:
var x = e is A<B> && f; // &&
var x = e is A<B> || f; // ||
var x = e is A<B> & f; // &
var x = e is A<B>[]; // [
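The four C# 6 forms above keep compiling because the proposal adds `&&`, `||`, `&`, and `[` to the disambiguating tokens. The sketch below wraps each form in a method, with `List<int>` standing in for `A<B>`. The method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class IsTypeFollowedByOperator
{
    // In each method, the token after '>' keeps List<int> parsed as a type.
    public static bool AndAlso(object e, bool f) => e is List<int> && f;
    public static bool OrElse(object e, bool f) => e is List<int> || f;
    public static bool And(object e, bool f) => e is List<int> & f;
    public static bool IsArray(object e) => e is List<int>[];
}
```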
Proposed change to the disambiguation rule
I propose to revise the specification to change the list of disambiguating tokens from
( ) ] } : ; , . ? == != | ^
to
( ) ] } : ; , . ? == != | ^ && || & [
And, in certain contexts, we treat identifier as a disambiguating token. Those contexts are where the sequence of tokens being disambiguated is immediately preceded by one of the keywords `is`, `case`, or `out`, or arises while parsing the first element of a tuple literal (in which case the tokens are preceded by `(` or `:` and the identifier is followed by a `,`) or a subsequent element of a tuple literal.
Modified disambiguation rule
The revised disambiguation rule would be something like this:
If a sequence of tokens can be parsed (in context) as a simple-name (§7.6.3), member-access (§7.6.5), or pointer-member-access (§18.5.2) ending with a type-argument-list (§4.4.1), the token immediately following the closing `>` token is examined, to see if it is
- One of `( ) ] } : ; , . ? == != | ^ && || & [`; or
- One of the relational operators `< > <= >= is as`; or
- A contextual query keyword appearing inside a query expression; or
- In certain contexts, we treat identifier as a disambiguating token. Those contexts are where the sequence of tokens being disambiguated is immediately preceded by one of the keywords `is`, `case` or `out`, or arises while parsing the first element of a tuple literal (in which case the tokens are preceded by `(` or `:` and the identifier is followed by a `,`) or a subsequent element of a tuple literal.
If the following token is among this list, or an identifier in such a context, then the type-argument-list is retained as part of the simple-name, member-access or pointer-member-access and any other possible parse of the sequence of tokens is discarded. Otherwise, the type-argument-list is not considered to be part of the simple-name, member-access or pointer-member-access, even if there is no other possible parse of the sequence of tokens. Note that these rules are not applied when parsing a type-argument-list in a namespace-or-type-name (§3.8).
Breaking changes due to this proposal
No breaking changes are known due to this proposed disambiguation rule.
Interesting examples
Here are some interesting results of these disambiguation rules:
- The expression `(A < B, C > D)` is a tuple with two elements, each a comparison.
- The expression `(A<B,C> D, E)` is a tuple with two elements, the first of which is a declaration expression.
- The invocation `M(A < B, C > D, E)` has three arguments.
- The invocation `M(out A<B,C> D, E)` has two arguments, the first of which is an `out` declaration.
- The expression `e is A<B> C` uses a declaration expression.
- The case label `case A<B> C:` uses a declaration expression.
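Several of these results can be exercised with ordinary library types. In the sketch below, `Dictionary<string, int>` and `List<int>` play the roles of `A<B,C>` and `A<B>`, and the method names are hypothetical. The upper-case parameter names are used only so that `(A < B, C > D)` appears verbatim.

```csharp
using System;
using System.Collections.Generic;

static class InterestingExamples
{
    static void M(out Dictionary<string, int> d, int e) => d = new Dictionary<string, int> { ["k"] = e };

    // Two arguments: an out declaration of type Dictionary<string, int>, then 'e'.
    public static int OutDeclaration(int e)
    {
        M(out Dictionary<string, int> d, e);
        return d["k"];
    }

    // 'case List<int> list:' is a declaration pattern in a case label.
    public static int CountIfList(object o)
    {
        switch (o)
        {
            case List<int> list: return list.Count;
            default: return -1;
        }
    }

    // 'D' is followed by ')', not ',', so this is a tuple of two comparisons.
    public static (bool, bool) Comparisons(int A, int B, int C, int D) => (A < B, C > D);
}
```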
Some examples of pattern matching
Is-As
We can replace the idiom
var v = expr as Type;
if (v != null) {
// code using v
}
with the slightly more concise and direct
if (expr is Type v) {
// code using v
}
Testing nullable
We can replace the idiom
Type? v = x?.y?.z;
if (v.HasValue) {
var value = v.GetValueOrDefault();
// code using value
}
with the slightly more concise and direct
if (x?.y?.z is Type value) {
// code using value
}
Arithmetic simplification
Suppose we define a set of recursive types to represent expressions (per a separate proposal):
abstract class Expr;
class X() : Expr;
class Const(double Value) : Expr;
class Add(Expr Left, Expr Right) : Expr;
class Mult(Expr Left, Expr Right) : Expr;
class Neg(Expr Value) : Expr;
Now we can define a function to compute the (unreduced) derivative of an expression:
Expr Deriv(Expr e)
{
switch (e) {
case X(): return Const(1);
case Const(*): return Const(0);
case Add(var Left, var Right):
return Add(Deriv(Left), Deriv(Right));
case Mult(var Left, var Right):
return Add(Mult(Deriv(Left), Right), Mult(Left, Deriv(Right)));
case Neg(var Value):
return Neg(Deriv(Value));
}
}
An expression simplifier demonstrates positional patterns:
Expr Simplify(Expr e)
{
switch (e) {
case Mult(Const(0), *): return Const(0);
case Mult(*, Const(0)): return Const(0);
case Mult(Const(1), var x): return Simplify(x);
case Mult(var x, Const(1)): return Simplify(x);
case Mult(Const(var l), Const(var r)): return Const(l*r);
case Add(Const(0), var x): return Simplify(x);
case Add(var x, Const(0)): return Simplify(x);
case Add(Const(var l), Const(var r)): return Const(l+r);
case Neg(Const(var k)): return Const(-k);
default: return e;
}
}
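The two functions above use syntax from a separate records proposal and `*` wildcards, neither of which shipped in that form. For comparison, the sketch below writes the same derivative and simplifier with features that did ship: records (C# 9) and positional and discard patterns (C# 8). It therefore assumes C# 9 or later. The `Calculus` class name is invented. Unlike the original `Deriv`, this version throws for unknown node types instead of falling off the end of the switch.

```csharp
using System;

abstract record Expr { }
sealed record X : Expr { }
sealed record Const(double Value) : Expr;
sealed record Add(Expr Left, Expr Right) : Expr;
sealed record Mult(Expr Left, Expr Right) : Expr;
sealed record Neg(Expr Value) : Expr;

static class Calculus
{
    // Unreduced derivative with respect to X.
    public static Expr Deriv(Expr e) => e switch
    {
        X => new Const(1),
        Const => new Const(0),
        Add(var l, var r) => new Add(Deriv(l), Deriv(r)),
        Mult(var l, var r) => new Add(new Mult(Deriv(l), r), new Mult(l, Deriv(r))),
        Neg(var v) => new Neg(Deriv(v)),
        _ => throw new ArgumentException("unknown expression", nameof(e)),
    };

    // One-level simplifier using positional patterns; '_' replaces '*'.
    public static Expr Simplify(Expr e) => e switch
    {
        Mult(Const(0), _) => new Const(0),
        Mult(_, Const(0)) => new Const(0),
        Mult(Const(1), var x) => Simplify(x),
        Mult(var x, Const(1)) => Simplify(x),
        Mult(Const(var l), Const(var r)) => new Const(l * r),
        Add(Const(0), var x) => Simplify(x),
        Add(var x, Const(0)) => Simplify(x),
        Add(Const(var l), Const(var r)) => new Const(l + r),
        Neg(Const(var k)) => new Const(-k),
        _ => e,
    };
}
```

Records compare by value, so results can be checked with `Equals`. For example, `Calculus.Simplify(new Mult(new Const(2), new Const(3)))` equals `new Const(6)`.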