Split list patterns on enumerables (#4793)

This commit is contained in:
Julien Couvreur 2021-05-30 22:02:01 -07:00 committed by GitHub
parent bcd1ace777
commit 5cb472d6b4
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 227 additions and 202 deletions

View file

@ -0,0 +1,224 @@
# List patterns on enumerables
## Summary
Lets you to match an enumerable with a sequence of patterns e.g. `enumerable is { 1, 2, 3 }` will match a sequence of the length three with 1, 2, 3 as its elements.
## Detailed design
The pattern syntax is unchanged:
```antlr
primary_pattern
: list_pattern
| length_pattern
| slice_pattern
| // all of the pattern forms previously defined
;
```
### Pattern compatibility
A *length_pattern* is now also compatible with any type that is not *countable* but is *enumerable* — it can be used in `foreach`.
A *list_pattern* is now also compatible with any type that is not *indexable* but is *enumerable*.
A *slice_pattern* without a subpattern is compatible with any type that is compatible with a *list_pattern*.
```
enumerable is { 1, 2, .. } // okay
enumerable is { 1, 2, ..var x } // error
```
### Semantics
If the input type is *enumerable* but not *countable*, then the *length_pattern* is checked on the number of elements obtained from enumerating the collection.
If the input type is *enumerable* but not *indexable*, then the *list_pattern* enumerates elements from the collection and checks them against the listed patterns:
Patterns at the start of the *list_pattern* — that are before the `..` *slice_pattern* if one is present, or all otherwise — are matched against the elements produced at the start of the enumeration.
If the collection does not produce enough elements to get a value corresponding to a starting pattern, the match fails. So the *constant_pattern* `3` in `{ 1, 2, 3, .. }` doesn't match when the collection has fewer than 3 elements.
Patterns at the end of the *list_pattern* (that are following the `..` *slice_pattern* if one is present) are matched against the elements produced at the end of the enumeration.
If the collection does not produce enough elements to get values corresponding to the ending patterns, the *slice_pattern* does not match. So the *slice_pattern* in `{ 1, .., 3 }` doesn't match when the collection has fewer than 2 elements.
A *list_pattern* without a *slice_pattern* only matches if the number of elements produced by complete enumeration and the number of patterns are equals. So `{ _, _, _ }` only matches when the collection produces exactly 3 elements.
Note that those implicit checks for number of elements in the collection are unaffected by the collection type being *countable*. So `{ _, _, _ }` will not make use of `Length` or `Count` even if one is available.
When multiple *list_patterns* are applied to one input value the collection will be enumerated once at most:
```
_ = collection switch
{
{ 1 } => ...,
{ 2 } => ...,
{ .., 3 } => ...,
};
_ = collectionContainer switch
{
{ Collection: { 1 } } => ...,
{ Collection: { 2 } } => ...,
{ Collection: { .., 3 } } => ...,
};
```
It is possible that the collection will not be completely enumerated. For example, if one of the patterns in the *list_pattern* doesn't match or when there are no ending patterns in a *list_pattern* (e.g. `collection is { 1, 2, .. }`).
If an enumerator is produced when a *list_pattern* is applied to an enumerable type and that enumerator is disposable it will be disposed when a top-level pattern containing the *list_pattern* successfully matches, or when none of the patterns match (in the case of a `switch` statement or expression). It is possible for an enumerator to be disposed more than once and the enumerator must ignore all calls to `Dispose` after the first one.
```
// any enumerator used to evaluate this switch statement is disposed at the indicated locations
_ = collection switch
{
{ 1 } => /* here */ ...,
_ => /* here */ ...,
};
/* here too, with a spilled try/finally around the switch expression */
```
### Lowering on enumerable type
> **Open question**: Need to investigate how to reduce allocation for the end circular buffer. `stackalloc` is bad in loops. Maybe we'll just have to fall back to locals and a `switch`. (see [`params` feature discussion](https://github.com/dotnet/csharplang/blob/main/proposals/format.md#extending-params) also)
Although a helper type is not necessary, it helps simplify and illustrate the logic.
```
class ListPatternHelper
{
// Notes:
// We could inline this logic to avoid creating a new type and to handle the pattern-based enumeration scenarios.
// We may only need one element in start buffer, or maybe none at all, if we can control the order of checks in the patterns DAG.
// We could emit a count check for a non-terminal `..` and economize on count checks a bit.
private EnumeratorType enumerator;
private int count;
private ElementType[] startBuffer;
private ElementType[] endCircularBuffer;
public ListPatternHelper(EnumerableType enumerable, int startPatternsCount, int endPatternsCount)
{
count = 0;
enumerator = enumerable.GetEnumerator();
startBuffer = startPatternsCount == 0 ? null : new ElementType[startPatternsCount];
endCircularBuffer = endPatternsCount == 0 ? null : new ElementType[endPatternsCount];
}
// targetIndex = -1 means we want to enumerate completely
private int MoveNextIfNeeded(int targetIndex)
{
int startSize = startBuffer?.Length ?? 0;
int endSize = endCircularBuffer?.Length ?? 0;
Debug.Assert(targetIndex == -1 || (targetIndex >= 0 && targetIndex < startSize));
while ((targetIndex == -1 || count <= targetIndex) && enumerator.MoveNext())
{
if (count < startSize)
startBuffer[count] = enumerator.Current;
if (endSize > 0)
endCircularBuffer[count % endSize] = enumerator.Current;
count++;
}
return count;
}
public bool Last()
{
return !enumerator.MoveNext();
}
public int Count()
{
return MoveNextIfNeeded(-1);
}
// fulfills the role of `[index]` for start elements when enough elements are available
public bool TryGetStartElement(int index, out ElementType value)
{
Debug.Assert(startBuffer is not null && index >= 0 && index < startBuffer.Length);
MoveNextIfNeeded(index);
if (count > index)
{
value = startBuffer[index];
return true;
}
value = default;
return false;
}
// fulfills the role of `[^hatIndex]` for end elements when enough elements are available
public ElementType GetEndElement(int hatIndex)
{
Debug.Assert(endCircularBuffer is not null && hatIndex > 0 && hatIndex <= endCircularBuffer.Length);
int endSize = endCircularBuffer.Length;
Debug.Assert(endSize > 0);
return endCircularBuffer[(count - hatIndex) % endSize];
}
}
```
`collection is [3]` is lowered to
```
@{
var helper = new ListPatternHelper(collection, 0, 0);
helper.Count() == 3
}
```
`collection is { 0, 1 }` is lowered to
```
@{
var helper = new ListPatternHelper(collection, 2, 0);
helper.TryGetStartElement(index: 0, out var element0) && element0 is 0 &&
helper.TryGetStartElement(1, out var element1) && element1 is 1 &&
helper.Last()
}
```
`collection is { 0, 1, .. }` is lowered to
```
@{
var helper = new ListPatternHelper(collection, 2, 0);
helper.TryGetStartElement(index: 0, out var element0) && element0 is 0 &&
helper.TryGetStartElement(1, out var element1) && element1 is 1
}
```
`collection is { .., 3, 4 }` is lowered to
```
@{
var helper = new ListPatternHelper(collection, 0, 2);
helper.Count() >= 2 && // `..` with 2 ending patterns
helper.GetEndElement(hatIndex: 2) is 3 && // [^2] is 3
helper.GetEndElement(1) is 4 // [^1] is 4
}
```
`collection is { 1, 2, .., 3, 4 }` is lowered to
```
@{
var helper = new ListPatternHelper(collection, 2, 2);
helper.TryGetStartElement(index: 0, out var element0) && element0 is 1 &&
helper.TryGetStartElement(1, out var element1) && element1 is 2 &&
helper.Count() >= 4 && // `..` with 2 starting patterns and 2 ending patterns
helper.GetEndElement(hatIndex: 2) is 3 &&
helper.GetEndElement(1) is 4
}
```
The same way that a `Type { name: pattern }` *property_pattern* checks that the input has the expected type and isn't null before using that as receiver for the property checks, so can we have the `{ ..., ... }` *list_pattern* initialize a helper and use that as the pseudo-receiver for element accesses.
This should allow merging branches of the patterns DAG, thus avoiding creating multiple enumerators.
Note: async enumerables are out-of-scope for C# 10. (Confirmed in LDM 4/12/2021)
Note: sub-patterns are disallowed in slice-patterns on enumerables for now despite some desirable uses: `e is { 1, 2, ..[var count] }` (LDM 4/12/2021)
## Unresolved questions
1. Should we limit the list-pattern to `IEnumerable` types? Then we could allow `{ 1, 2, ..var x }` (`x` would be an `IEnumerable` we would cook up) (answer [LDM 4/12/2021]: no, we'll disallow sub-pattern in slice pattern on enumerable for now)
2. Should we try and optimize list-patterns like `{ 1, _, _ }` on a countable enumerable type? We could just check the first enumerated element then check `Length`/`Count`. Can we assume that `Count` agrees with enumerated count?
3. Should we try to cut the enumeration short for length-patterns on enumerables in some cases? (computing min/max acceptable count and checking partial count against that)
What if the enumerable type has some sort of `TryGetNonEnumeratedCount` API?
4. Can we detect at runtime that the input type is sliceable, so as to avoid enumeration? .NET 6 may be adding some LINQ methods/extensions that would help.

View file

@ -59,8 +59,6 @@ There are three new patterns:
- The *length_pattern* is used to match the length.
- A *slice_pattern* is only permitted once and only directly in a *list_pattern_clause* and discards _**zero or more**_ elements.
> **Open question**: Should we accept a general *pattern* following `..` in a *slice_pattern*? (answer [LDM 2021-05-26]: Yes, any pattern is allowed after a slice.)
Notes:
- Due to the ambiguity with *property_pattern*, a *list_pattern* cannot be empty and a *length_pattern* should be used instead to match a list with the length of zero, e.g. `[0]`.
@ -72,64 +70,14 @@ Notes:
#### Pattern compatibility
A *length_pattern* is compatible with any type that is *countable* — it has an accessible property getter that returns an `int` and has the name `Length` or `Count`. If both properties are present, the former is preferred.
A *length_pattern* is also compatible with any type that is *enumerable* — it can be used in `foreach`.
A *list_pattern* is compatible with any type that is *countable* as well as *indexable* — it has an accessible indexer that takes an `Index` as an argument or otherwise an accessible indexer with a single `int` parameter. If both indexers are present, the former is preferred.
A *list_pattern* is also compatible with any type that is *enumerable*.
A *slice_pattern* with a subpattern is compatible with any type that is *countable* as well as *sliceable* — it has an accessible indexer that takes a `Range` as an argument or otherwise an accessible `Slice` method with two `int` parameters. If both are present, the former is preferred.
A *slice_pattern* without a subpattern is compatible with any type that is compatible with a *list_pattern*.
```
enumerable is { 1, 2, .. } // okay
enumerable is { 1, 2, ..var x } // error
```
This set of rules is derived from the [***range indexer pattern***](https://github.com/dotnet/csharplang/blob/master/proposals/csharp-8.0/ranges.md#implicit-index-support).
#### Semantics on enumerable type
If the input type is *enumerable* but not *countable*, then the *length_pattern* is checked on the number of elements obtained from enumerating the collection.
If the input type is *enumerable* but not *indexable*, then the *list_pattern* enumerates elements from the collection and checks them against the listed patterns:
Patterns at the start of the *list_pattern* — that are before the `..` *slice_pattern* if one is present, or all otherwise — are matched against the elements produced at the start of the enumeration.
If the collection does not produce enough elements to get a value corresponding to a starting pattern, the match fails. So the *constant_pattern* `3` in `{ 1, 2, 3, .. }` doesn't match when the collection has fewer than 3 elements.
Patterns at the end of the *list_pattern* (that are following the `..` *slice_pattern* if one is present) are matched against the elements produced at the end of the enumeration.
If the collection does not produce enough elements to get values corresponding to the ending patterns, the *slice_pattern* does not match. So the *slice_pattern* in `{ 1, .., 3 }` doesn't match when the collection has fewer than 2 elements.
A *list_pattern* without a *slice_pattern* only matches if the number of elements produced by complete enumeration and the number of patterns are equals. So `{ _, _, _ }` only matches when the collection produces exactly 3 elements.
Note that those implicit checks for number of elements in the collection are unaffected by the collection type being *countable*. So `{ _, _, _ }` will not make use of `Length` or `Count` even if one is available.
When multiple *list_patterns* are applied to one input value the collection will be enumerated once at most:
```
_ = collection switch
{
{ 1 } => ...,
{ 2 } => ...,
{ .., 3 } => ...,
};
_ = collectionContainer switch
{
{ Collection: { 1 } } => ...,
{ Collection: { 2 } } => ...,
{ Collection: { .., 3 } } => ...,
};
```
It is possible that the collection will not be completely enumerated. For example, if one of the patterns in the *list_pattern* doesn't match or when there are no ending patterns in a *list_pattern* (e.g. `collection is { 1, 2, .. }`).
If an enumerator is produced when a *list_pattern* is applied to an enumerable type and that enumerator is disposable it will be disposed when a top-level pattern containing the *list_pattern* successfully matches, or when none of the patterns match (in the case of a `switch` statement or expression). It is possible for an enumerator to be disposed more than once and the enumerator must ignore all calls to `Dispose` after the first one.
```
// any enumerator used to evaluate this switch statement is disposed at the indicated locations
_ = collection switch
{
{ 1 } => /* here */ ...,
_ => /* here */ ...,
};
/* here too, with a spilled try/finally around the switch expression */
```
#### Subsumption checking
Subsumption checking works just like [positional patterns with `ITuple`](https://github.com/dotnet/csharplang/blob/main/proposals/csharp-8.0/patterns.md#positional-pattern) - corresponding subpatterns are matched by position plus an additional node for testing length.
@ -147,10 +95,8 @@ case {.., 1, _}: // expr.Length is >= 2 && expr[^2] is 1
```
The order in which subpatterns are matched at runtime is unspecified, and a failed match may not attempt to match all subpatterns.
> **Open question**: By this definition, the pattern `{..}` tests for `expr.Length >= 0`. Should we omit such test, assuming `Length` is always non-negative? (answer [LDM 2021-05-26]: `{ .. }` will not emit a Length check)
#### Lowering on countable/indexeable/sliceable type
#### Lowering
A pattern of the form `expr is {1, 2, 3}` is equivalent to the following code (if compatible via implicit `Index` support):
```cs
@ -168,153 +114,8 @@ expr.Length is >= 2
```
The *input type* for the *slice_pattern* is the return type of the underlying `this[Range]` or `Slice` method with two exceptions: For `string` and arrays, `string.Substring` and `RuntimeHelpers.GetSubArray` will be used, respectively.
#### Lowering on enumerable type
> **Open question**: Need to investigate how to reduce allocation for the end circular buffer. `stackalloc` is bad in loops. Maybe we'll just have to fall back to locals and a `switch`. (see [`params` feature discussion](https://github.com/dotnet/csharplang/blob/main/proposals/format.md#extending-params) also)
Although a helper type is not necessary, it helps simplify and illustrate the logic.
```
class ListPatternHelper
{
// Notes:
// We could inline this logic to avoid creating a new type and to handle the pattern-based enumeration scenarios.
// We may only need one element in start buffer, or maybe none at all, if we can control the order of checks in the patterns DAG.
// We could emit a count check for a non-terminal `..` and economize on count checks a bit.
private EnumeratorType enumerator;
private int count;
private ElementType[] startBuffer;
private ElementType[] endCircularBuffer;
public ListPatternHelper(EnumerableType enumerable, int startPatternsCount, int endPatternsCount)
{
count = 0;
enumerator = enumerable.GetEnumerator();
startBuffer = startPatternsCount == 0 ? null : new ElementType[startPatternsCount];
endCircularBuffer = endPatternsCount == 0 ? null : new ElementType[endPatternsCount];
}
// targetIndex = -1 means we want to enumerate completely
private int MoveNextIfNeeded(int targetIndex)
{
int startSize = startBuffer?.Length ?? 0;
int endSize = endCircularBuffer?.Length ?? 0;
Debug.Assert(targetIndex == -1 || (targetIndex >= 0 && targetIndex < startSize));
while ((targetIndex == -1 || count <= targetIndex) && enumerator.MoveNext())
{
if (count < startSize)
startBuffer[count] = enumerator.Current;
if (endSize > 0)
endCircularBuffer[count % endSize] = enumerator.Current;
count++;
}
return count;
}
public bool Last()
{
return !enumerator.MoveNext();
}
public int Count()
{
return MoveNextIfNeeded(-1);
}
// fulfills the role of `[index]` for start elements when enough elements are available
public bool TryGetStartElement(int index, out ElementType value)
{
Debug.Assert(startBuffer is not null && index >= 0 && index < startBuffer.Length);
MoveNextIfNeeded(index);
if (count > index)
{
value = startBuffer[index];
return true;
}
value = default;
return false;
}
// fulfills the role of `[^hatIndex]` for end elements when enough elements are available
public ElementType GetEndElement(int hatIndex)
{
Debug.Assert(endCircularBuffer is not null && hatIndex > 0 && hatIndex <= endCircularBuffer.Length);
int endSize = endCircularBuffer.Length;
Debug.Assert(endSize > 0);
return endCircularBuffer[(count - hatIndex) % endSize];
}
}
```
`collection is [3]` is lowered to
```
@{
var helper = new ListPatternHelper(collection, 0, 0);
helper.Count() == 3
}
```
`collection is { 0, 1 }` is lowered to
```
@{
var helper = new ListPatternHelper(collection, 2, 0);
helper.TryGetStartElement(index: 0, out var element0) && element0 is 0 &&
helper.TryGetStartElement(1, out var element1) && element1 is 1 &&
helper.Last()
}
```
`collection is { 0, 1, .. }` is lowered to
```
@{
var helper = new ListPatternHelper(collection, 2, 0);
helper.TryGetStartElement(index: 0, out var element0) && element0 is 0 &&
helper.TryGetStartElement(1, out var element1) && element1 is 1
}
```
`collection is { .., 3, 4 }` is lowered to
```
@{
var helper = new ListPatternHelper(collection, 0, 2);
helper.Count() >= 2 && // `..` with 2 ending patterns
helper.GetEndElement(hatIndex: 2) is 3 && // [^2] is 3
helper.GetEndElement(1) is 4 // [^1] is 4
}
```
`collection is { 1, 2, .., 3, 4 }` is lowered to
```
@{
var helper = new ListPatternHelper(collection, 2, 2);
helper.TryGetStartElement(index: 0, out var element0) && element0 is 1 &&
helper.TryGetStartElement(1, out var element1) && element1 is 2 &&
helper.Count() >= 4 && // `..` with 2 starting patterns and 2 ending patterns
helper.GetEndElement(hatIndex: 2) is 3 &&
helper.GetEndElement(1) is 4
}
```
The same way that a `Type { name: pattern }` *property_pattern* checks that the input has the expected type and isn't null before using that as receiver for the property checks, so can we have the `{ ..., ... }` *list_pattern* initialize a helper and use that as the pseudo-receiver for element accesses.
This should allow merging branches of the patterns DAG, thus avoiding creating multiple enumerators.
Note: async enumerables are out-of-scope. (Confirmed in LDM 4/12/2021)
Note: sub-patterns are disallowed in list-patterns on enumerables for now despite some desirable uses: `e is { 1, 2, ..[var count] }` (LDM 4/12/2021)
## Unresolved questions
1. Should we support multi-dimensional arrays? (answer [LDM 2021-05-26]: Not supported. If we want to make a general MD-array focused release, we would want to revisit all the areas they're currently lacking, not just list patterns.)
1. Should we limit the list-pattern to `IEnumerable` types? Then we could allow `{ 1, 2, ..var x }` (`x` would be an `IEnumerable` we would cook up) (answer [LDM 4/12/2021]: no, we'll disallow sub-pattern in slice pattern on enumerable for now)
2. Should we try and optimize list-patterns like `{ 1, _, _ }` on a countable enumerable type? We could just check the first enumerated element then check `Length`/`Count`. Can we assume that `Count` agrees with enumerated count?
3. Should we try to cut the enumeration short for length-patterns on enumerables in some cases? (computing min/max acceptable count and checking partial count against that)
What if the enumerable type has some sort of `TryGetNonEnumeratedCount` API?
4. Can we detect at runtime that the input type is sliceable, so as to avoid enumeration? .NET 6 may be adding some LINQ methods/extensions that would help.
2. Should we accept a general *pattern* following `..` in a *slice_pattern*? (answer [LDM 2021-05-26]: Yes, any pattern is allowed after a slice.)
3. By this definition, the pattern `{..}` tests for `expr.Length >= 0`. Should we omit such test, assuming `Length` is always non-negative? (answer [LDM 2021-05-26]: `{..}` will not emit a Length check)