2019-04-01 01:00:29 +02:00
|
|
|
# Index and Range Changes
|
|
|
|
|
|
|
|
## Summary
|
|
|
|
This proposes to make several changes to the
|
|
|
|
[index and range design](https://github.com/dotnet/csharplang/blob/master/proposals/csharp-8.0/ranges.md) based on
|
|
|
|
customer feedback: particularly from the CoreFX team and their experiences adding Index / Range support to .NET Core.
|
|
|
|
The change is not to the syntax but rather how the langauge maps the syntax to APIs.
|
|
|
|
|
|
|
|
## Motivation
|
2019-04-01 05:39:22 +02:00
|
|
|
The level of API churn necessary to adopt `Index` and `Range` is quite high today. Every collection type must have at
|
|
|
|
least two new indexers added for `Index` and `Range` respectively. The implementation of `Index` is exactly the same
|
|
|
|
for virtually every collection in .NET. The implementation for `Range` is likewise often similar by simply deferring to
|
|
|
|
the underlying collection type.
|
2019-04-01 01:00:29 +02:00
|
|
|
|
2019-04-01 05:39:22 +02:00
|
|
|
This work must be done by the collection author. The reliance on indexers as the primary use case means developers can't
|
|
|
|
update existing collections using extension methods. This means the adoption of the feature will be slowed by waiting
|
|
|
|
for library authors to update their types.
|
|
|
|
|
|
|
|
The use of `Index` does come with a small performance hit. Typically a consumer of `Index` will pay an extra branch
|
2019-04-06 01:47:51 +02:00
|
|
|
check (`IsFromEnd` and bounds) as they would with an `int` parameter (bounds only). Moving to `Range` doubles this cost
|
2019-04-01 05:39:22 +02:00
|
|
|
as it occurs on each `Index`.
|
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
This penalty is small and insignificant in the vast majority of code. For highly optimized code though, this extra
|
2019-04-01 05:39:22 +02:00
|
|
|
branching can be unacceptable. It simply can't be thought of as a syntactic transformation from a call to
|
2019-04-06 01:47:51 +02:00
|
|
|
`span.Slice(start, end)` to `span[start..end]`. Instead, the performance implications must be evaluated for each
|
2019-04-01 05:39:22 +02:00
|
|
|
usage. This will likely lead to the feature being banned in certain portions of code bases.
|
|
|
|
|
|
|
|
This proposals seeks to remove the abstraction penalties around `Index` and `Range`: both at an API and performance
|
|
|
|
layer. The goal being that most collection types just work today with minimal change and the feature can be adopted
|
|
|
|
in existing code without fear of lost performance.
|
2019-04-01 01:00:29 +02:00
|
|
|
|
2019-04-01 05:39:22 +02:00
|
|
|
This proposal specifically does not want to change how `Index` and `Range` type binding occurs. The index and range
|
|
|
|
expressions continue to have the same syntax and types.
|
|
|
|
|
|
|
|
## Detailed Design
|
2019-04-05 02:01:49 +02:00
|
|
|
### Countable Types
|
2019-04-06 01:47:51 +02:00
|
|
|
Any type which has a property named `Length` or `Count` with an accessible getter and a return type of `int` is
|
2019-04-05 02:01:49 +02:00
|
|
|
considered Countable. The language can make use of this property to convert an expression of type `Index` into an `int`
|
2019-04-06 01:47:51 +02:00
|
|
|
at the point of the expression without the need to use the type `Index` at all. In case both `Length` and `Count`
|
|
|
|
are present, `Length` will be preferred.
|
2019-04-01 04:03:26 +02:00
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
For simplicity going forward, the proposal will use the name `Length` to represent `Count` or `Length`.
|
2019-04-01 04:03:26 +02:00
|
|
|
|
2019-04-01 05:39:22 +02:00
|
|
|
### Index and Range implementations are known
|
|
|
|
The implementations of `Index` and `Range` are considered to be known and side effect free. Much like
|
2019-04-06 01:47:51 +02:00
|
|
|
`ValueTuple<T1, T2>`, the language can assume a standard implementation and emit code inline which represents the
|
2019-04-01 05:39:22 +02:00
|
|
|
implementation of methods on `Index` and `Range`. This implementation includes methods like `GetOffset` or conversions
|
|
|
|
like the implicit conversion from `int` to `Index`.
|
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
All arithmetic operations will be emitted using an `unchecked` context. That matches the context
|
2019-04-01 05:39:22 +02:00
|
|
|
in which `Index` and `Range` are compiled in.
|
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
### Implicit Index support
|
|
|
|
The language will provide an instance indexer member with a single parameter of type `Index` for types which meet the
|
|
|
|
following criteria:
|
2019-04-01 04:03:26 +02:00
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
- The type is Countable.
|
2019-04-05 02:01:49 +02:00
|
|
|
- The type has an accessible instance indexer which takes a single `int` as the argument.
|
2019-04-08 17:56:51 +02:00
|
|
|
- The type does not have an accessible instance indexer which takes an `Index` as the first parameter. The `Index`
|
|
|
|
must be the only parameter or the remaining parameters must be optional.
|
2019-04-01 04:03:26 +02:00
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
For such types, the language will act as if there is an index member of the form `T this[Index index]` where `T` is
|
2019-04-05 02:01:49 +02:00
|
|
|
the return type of the `int` based indexer including any `ref` style annotations. The new member will have the same
|
|
|
|
`get` and `set` members with matching accessibilty as the `int` indexer.
|
2019-04-01 04:03:26 +02:00
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
The new indexer will be implemented by converting the argument of type `Index` into an `int` and emitting a call
|
2019-04-06 01:47:51 +02:00
|
|
|
to the `int` based indexer. For discussion purposes, lets use the example of `receiver[expr]`. The conversion of
|
2019-04-05 02:01:49 +02:00
|
|
|
`expr` to `int` will occur as follows:
|
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
- When the argument is of the form `^expr2` and the type of `expr2` is `int`, it will be translated to
|
2019-04-05 02:01:49 +02:00
|
|
|
`receiver.Length - expr2`.
|
2019-04-06 01:47:51 +02:00
|
|
|
- Otherwise, it will be translated as `expr.GetOffset(receiver.Length)`.
|
2019-04-05 02:01:49 +02:00
|
|
|
|
|
|
|
This allows for developers to use the `Index` feature on existing types without the need for modification. For example:
|
|
|
|
|
|
|
|
``` csharp
|
|
|
|
List<char> list = ...;
|
|
|
|
var value = list[^1];
|
|
|
|
|
|
|
|
// Gets translated to
|
2019-04-08 17:56:51 +02:00
|
|
|
var value = list[list.Count - 1];
|
2019-04-05 02:01:49 +02:00
|
|
|
```
|
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
The `receiver` and `Length` expressions will be spilled as appropriate to ensure any side effects are only executed
|
2019-04-01 04:03:26 +02:00
|
|
|
once. For example:
|
|
|
|
|
|
|
|
``` csharp
|
2019-04-05 02:01:49 +02:00
|
|
|
class Collection {
|
|
|
|
private int[] _array = new[] { 1, 2, 3 };
|
|
|
|
|
|
|
|
int Length {
|
|
|
|
get {
|
|
|
|
Console.Write("Length ");
|
|
|
|
return _array.Length;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
int this[int index] => _array[index];
|
|
|
|
}
|
|
|
|
|
2019-04-01 04:03:26 +02:00
|
|
|
class SideEffect {
|
2019-04-05 02:01:49 +02:00
|
|
|
Collection Get() {
|
2019-04-01 04:03:26 +02:00
|
|
|
Console.Write("Get ");
|
2019-04-05 02:01:49 +02:00
|
|
|
return new Collection();
|
2019-04-01 04:03:26 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
void Use() {
|
2019-04-05 02:01:49 +02:00
|
|
|
int i = Get()[^1];
|
2019-04-01 04:03:26 +02:00
|
|
|
Console.WriteLine(i);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
This code will print "Get Length 3".
|
2019-04-01 04:03:26 +02:00
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
### Implicit Range support
|
|
|
|
The language will provide an instance indexer member with a single parameter of type `Range` for types which meet the
|
|
|
|
following criteria:
|
2019-04-01 04:03:26 +02:00
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
- The type is Countable.
|
2019-04-05 21:54:59 +02:00
|
|
|
- The type has an accessible member named `Slice` which has two parameters of type `int`.
|
2019-04-08 17:56:51 +02:00
|
|
|
- The type does not have an instance indexer which takes a single `Range` as the first parameter. The `Range`
|
|
|
|
must be the only parameter or the remaining parameters must be optional.
|
2019-04-01 05:39:22 +02:00
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
For such types, the language will bind as if there is an index member of the form `T this[Range range]` where `T` is
|
2019-04-05 02:01:49 +02:00
|
|
|
the return type of the `Slice` method including any `ref` style annotations. The new member will also have matching
|
|
|
|
accessibility with `Slice`.
|
2019-04-01 05:39:22 +02:00
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
When the `Range` based indexer is bound on an expression named `receiver`, it will be lowered by converting the `Range`
|
|
|
|
expression into two values that are then passed to the `Slice` method. For discussion purposes, lets use the example of
|
2019-04-05 02:01:49 +02:00
|
|
|
`receiver[expr]`.
|
|
|
|
|
|
|
|
The first argument of `Slice` will be obtained by converting the typed expression in the following way:
|
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
- When `expr` is of the form `expr1..expr2` (where `expr2` can be omitted) and `expr1` has type `int`, then it will be
|
2019-04-05 02:01:49 +02:00
|
|
|
emitted as `expr1`.
|
2019-04-06 01:47:51 +02:00
|
|
|
- When `expr` is of the form `^expr1..expr2` (where `expr2` can be omitted), then it will be emitted as
|
|
|
|
`receiver.Length - expr1`.
|
|
|
|
- When `expr` is of the form `..expr2` (where `expr2` can be omitted), then it will be emitted as `0`.
|
|
|
|
- Otherwise, it will be emitted as `expr.Start.GetOffset(receiver.Length)`.
|
2019-04-05 02:01:49 +02:00
|
|
|
|
|
|
|
This value will be re-used in the calculation of the second `Slice` argument. When doing so it will be referred to as
|
|
|
|
`start`. The second argument of `Slice` will be obtained by converting the range typed expression in the following way:
|
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
- When `expr` is of the form `expr1..expr2` (where `expr1` can be omitted) and `expr2` has type `int`, then it will be
|
2019-04-05 02:01:49 +02:00
|
|
|
emitted as `expr2 - start`.
|
2019-04-06 01:47:51 +02:00
|
|
|
- When `expr` is of the form `expr1..^expr2` (where `expr1` can be omitted), then it will be emitted as
|
|
|
|
`(receiver.Length - expr2) - start`.
|
|
|
|
- When `expr` is of the form `expr1..` (where `expr1` can be omitted), then it will be emitted as `receiver.Length`.
|
|
|
|
- Otherwise, it will be emitted as `expr.End.GetOffset(receiver.Length) - start`.
|
2019-04-05 02:01:49 +02:00
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
The `receiver`, `Length` and `expr` expressions will be spilled as appropriate to ensure any side effects are only
|
2019-04-05 02:01:49 +02:00
|
|
|
executed once. For example:
|
2019-04-01 05:39:22 +02:00
|
|
|
|
|
|
|
``` csharp
|
|
|
|
class Collection {
|
2019-04-05 02:01:49 +02:00
|
|
|
private int[] _array = new[] { 1, 2, 3 };
|
|
|
|
|
|
|
|
int Length {
|
|
|
|
get {
|
|
|
|
Console.Write("Length ");
|
|
|
|
return _array.Length;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
int[] Slice(int start, int length) {
|
|
|
|
var slice = new int[length];
|
|
|
|
Array.Copy(_array, start, slice, 0, length);
|
|
|
|
return slice;
|
|
|
|
}
|
2019-04-01 05:39:22 +02:00
|
|
|
}
|
2019-04-01 01:00:29 +02:00
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
class SideEffect {
|
|
|
|
Collection Get() {
|
|
|
|
Console.Write("Get ");
|
|
|
|
return new Collection();
|
|
|
|
}
|
2019-04-01 05:39:22 +02:00
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
void Use() {
|
|
|
|
var array = Get()[0..2];
|
|
|
|
Console.WriteLine(array.length);
|
2019-04-01 05:39:22 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
This code will print "Get Length 2".
|
2019-04-01 05:39:22 +02:00
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
The language will special case the following known types:
|
2019-04-01 05:39:22 +02:00
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
- `string`: the method `Substring` will be used instead of `Slice`.
|
|
|
|
- `array`: the method `System.Reflection.CompilerServices.GetSubArray` will be used instead of `Slice`.
|
2019-04-01 01:00:29 +02:00
|
|
|
|
|
|
|
## Open Issues
|
|
|
|
|
2019-04-08 17:56:51 +02:00
|
|
|
### Declared vs. Instantiated type
|
|
|
|
|
2019-04-01 01:00:29 +02:00
|
|
|
## Considerations
|
|
|
|
|
|
|
|
### Detect Indexable based on ICollection
|
|
|
|
The inspiration for this proposal was collection initializers. Using the structure of a type to convey that it had
|
|
|
|
opted into a feature. In the case of collection initializers types can opt into the feature by implementing the
|
|
|
|
interface `IEnumerable` (non generic).
|
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
This proposal initially required that types implement `ICollection` in order to qualify as Indexable. That required a number of special cases though:
|
2019-04-01 01:00:29 +02:00
|
|
|
|
|
|
|
- `ref struct`: these cannot implement interfaces yet types like `Span<T>` are ideal for index / range support.
|
|
|
|
- `string`: does not implement `ICollection` and adding that `interface` has a large cost.
|
|
|
|
|
|
|
|
This means to support key types special casing is already needed. The special casing of `string` is less interesting
|
|
|
|
as the language does this in other areas (`foreach` lowering, constants, etc ...). The special casing of `ref struct`
|
2019-04-05 21:54:59 +02:00
|
|
|
is more concerning as it's special casing an entire class of types. They get labeled as Indexable if they simply have
|
2019-04-01 01:00:29 +02:00
|
|
|
a property named `Count` with a return type of `int`.
|
|
|
|
|
2019-04-01 04:03:26 +02:00
|
|
|
After consideration the design was normalized to say that any type which has a property `Count` / `Length` with a
|
|
|
|
return type of `int` is Indexable. That removes all special casing, even for `string` and arrays.
|
|
|
|
|
|
|
|
### Detect just Count
|
|
|
|
Detecting on the property names `Count` or `Length` does complicate the design a bit. Picking just one to standardize
|
|
|
|
though is not sufficient as it ends up excluding a large number of types:
|
|
|
|
|
|
|
|
- Use `Length`: excludes pretty much every collection in System.Collections and sub-namespaces. Those tend to derive
|
|
|
|
from `ICollection` and hence prefer `Count` over length.
|
|
|
|
- Use `Count`: excludes `string`, arrays, `Span<T>` and most `ref struct` based types
|
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
The extra complication on the initial detection of Indexable types is outweighed by its simplification in other
|
2019-04-01 04:03:26 +02:00
|
|
|
aspects.
|
|
|
|
|
2019-04-08 17:56:51 +02:00
|
|
|
### Choice of Slice as a name
|
2019-04-01 05:39:22 +02:00
|
|
|
The name `Slice` was chosen as it's the de-facto standard name for slice style operations in .NET. Starting with
|
|
|
|
netcoreapp2.1 all span style types use the name `Slice` for slicing operations. Prior to netcoreapp2.1 there really
|
|
|
|
aren't any examples of slicing to look to for an example. Types like `List<T>`, `ArraySegment<T>`, `SortedList<T>`
|
|
|
|
would've been ideal for slicing but the concept didn't exist when types were added.
|
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
Thus, `Slice` being the sole example, it was chosen as the name.
|
2019-04-01 01:00:29 +02:00
|
|
|
|
2019-04-05 02:01:49 +02:00
|
|
|
### Index target type conversion
|
2019-04-06 01:47:51 +02:00
|
|
|
Another way to view the `Index` transformation in an indexer expression is as a target type conversion. Instead
|
|
|
|
of binding as if there is a member of the form `return_type this[Index]`, the language instead assigns a target
|
2019-04-05 02:01:49 +02:00
|
|
|
typed conversion to `int`.
|
|
|
|
|
|
|
|
This concept could be generalized to all member access on Countable types. Whenever an expression with type `Index` is
|
|
|
|
used as an argument to an instance member invocation and the receiver is Countable then the expression will have a
|
|
|
|
target type conversion to `int`. The member invocations applicable for this conversion include methods, indexers,
|
|
|
|
properties, extension methods, etc ... Only constructors are excluded as they have no receiver.
|
|
|
|
|
|
|
|
The target type conversion will be implemented as follows for any expression which has a type of `Index`. For
|
|
|
|
discussion purposes lets use the example of `receiver[expr]`:
|
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
- When `expr` is of the form `^expr2` and the type of `expr2` is `int`, it will be translated to
|
2019-04-05 02:01:49 +02:00
|
|
|
`receiver.Length - expr2`.
|
2019-04-06 01:47:51 +02:00
|
|
|
- Otherwise, it will be translated as `expr.GetOffset(receiver.Length)`.
|
2019-04-05 02:01:49 +02:00
|
|
|
|
2019-04-06 01:47:51 +02:00
|
|
|
The `receiver` and `Length` expressions will be spilled as appropriate to ensure any side effects are only executed
|
2019-04-05 02:01:49 +02:00
|
|
|
once. For example:
|
|
|
|
|
|
|
|
``` csharp
|
|
|
|
class Collection {
|
|
|
|
private int[] _array = new[] { 1, 2, 3 };
|
|
|
|
|
|
|
|
int Length {
|
|
|
|
get {
|
|
|
|
Console.Write("Length ");
|
|
|
|
return _array.Length;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
int GetAt(int index) => _array[index];
|
|
|
|
}
|
|
|
|
|
|
|
|
class SideEffect {
|
|
|
|
Collection Get() {
|
|
|
|
Console.Write("Get ");
|
|
|
|
return new Collection();
|
|
|
|
}
|
|
|
|
|
|
|
|
void Use() {
|
|
|
|
int i = Get().GetAt(^1);
|
|
|
|
Console.WriteLine(i);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
|
|
|
This code will print "Get Length 3".
|
|
|
|
|
|
|
|
This feature would be beneficial to any member which had a parameter that represented an index. For example
|
|
|
|
`List<T>.InsertAt`. This also has the potential for confusion as the language can't give any guidance as to whether
|
2019-04-06 01:47:51 +02:00
|
|
|
or not an expression is meant for indexing. All it can do is convert any `Index` expression to `int` when invoking
|
2019-04-05 02:01:49 +02:00
|
|
|
a member on a Countable type.
|
|
|
|
|
|
|
|
Restrictions:
|
|
|
|
|
|
|
|
- This conversion is only applicable when the expression with type `Index` is directly an argument to the member. It
|
|
|
|
would not apply to any nested expressions.
|
2019-04-01 01:00:29 +02:00
|
|
|
|
|
|
|
## Related Issues
|
|
|
|
- https://github.com/dotnet/csharplang/blob/master/proposals/csharp-8.0/ranges.cs
|
|
|
|
- https://github.com/dotnet/csharplang/blob/master/proposals/csharp-8.0/ranges.md
|
|
|
|
|
|
|
|
## Design Meetings
|
2019-04-05 02:01:49 +02:00
|
|
|
- https://github.com/dotnet/csharplang/blob/master/meetings/2019/LDM-2019-04-01.md
|
2019-04-17 09:46:15 +02:00
|
|
|
|
|
|
|
|
|
|
|
## Decisions made during implementation
|
|
|
|
|
|
|
|
- All members in the pattern must be instance members
|
|
|
|
- If a Length method is found but it has the wrong return type, continue looking for Count
|
|
|
|
- The indexer used for the Index pattern must have exactly one int parameter
|
|
|
|
- The Slice method used for the Range pattern must have exactly two int parameters
|
|
|
|
- When looking for the pattern members, we look for original definitions, not constructed members
|