# C# Language Design Notes for Jan 18, 2018 ## Agenda We discussed the range operator in C# and the underlying types for it. 1. Scope of the feature 2. Range types 3. Type name 4. Open-ended ranges 5. Empty ranges 6. Enumerability 7. Language questions # Scope of the feature The scenario we are eager to address right now relates to indexing and slicing, where the elements of the range are contiguous integral indices into some data structure. We want to design the language feature and API so that we address this scenario in the best possible way, without cutting off future support for range syntax in other scenarios, such as math, inclusion tests, etc., which require more generality. # Range types We've talked about having a couple of types, say `Range` and `LongRange` representing general-purpose ranges over ints and longs, respectively. However a `Range` type that ranges over all ints might have a length that is greater than what an `int` can contain! We could represent the `Length` property with a `uint`, but then it doesn't interoperate well with all the `int`-based types and logic in the language and framework. And when we think about the core scenario, this pain would all be for naught: for indexing purposes, indices are always non-negative `int`s, and ranges of them would therefore never be longer than `int.MaxValue`. It therefore seems that we should build a `Range` type specifically for indexing purposes, and have language support *only* for that, but in a way that we can generalize later. Specifically we can support conversion of `x..y` expressions only to `Range` for now, but open up for user-defined `..` operators later. ## Conclusion Design just an indexing-oriented `Range` type for now. # Type name Should the indexing-specific range type be called `IndexRange` or just `Range`? On the one hand it is a bit dangerous to use the good name for a special-purpose range type. On the other it is going to be *very* common, probably much more so than any other future range type. It is possible that we will someday have a `Range` type that represents a more abstract mathematical notion of ranges. It will then be a bit odd for `Range` and `Range` to be only loosely related to each other. ## Conclusion We can live with that risk. We think it is ok for this type to be called `Range`. # Open-ended ranges Should we allow open-ended range values, as represented by missing endpoints in the language syntax: `x..`, `..y` and `..`? It seems a significant loss of expressiveness if we don't. Scenarios for open-at-one-end are common, and scenarios for open-at-both-ends include multi-dimensional slicing, where we want a concise way to keep "all of" a given dimension: `tensor[.., ..1, ..]`. Open-ended ranges are semantically different from just having `0` and `int.MaxValue` at the ends, because indexing/slicing APIs will require range arguments to be "within range" of the target data structure. Where `m..int.MaxValue` would therefore often yield an exception, an open-ended `n..` would just mean "from `n` to the end, wherever that is", and be legal as long as `n` is within range. Also, we can't just represent open-ended ranges as closed ones with the target data structure's min and max index substituted in, because range values can exist independently of a given target data structure. Internally, the `Range` type can e.g. use negative ints to represent open ends, so as not to waste space. This representation would not be visible from the outside though. Can we use nullable int in the constructor, to represent endpoints that may not be there? Then `null` means open in that end. We could have an `(int, int)` constructor overload for efficiency. Proposal for this to be written offline. ## Conclusion Yes, support open-ended ranges. # Empty ranges Can the `Range` type represent empty ranges? In that case, does the start point matter or not, or are all empty ranges the same? They might behave differently when you slice, depending of whether the start point is in range in the target data structure. (If not, they will throw). Can you index an empty array with an empty range? Yes. As long as it's in range; otherwise it throws. This is consistent with slices, spans and strings. ## Conclusion It's important to support empty ranges. The start point should matter even for empty ranges, in that indexing/slicing would throw if the start point of an empty range wasn't in range of the target data structure. # Enumerability `Range` should implement `IEnumerable`, and needs to support the enumerable pattern of having a struct enumerator type. This is extra bulk in the type, but that's just how we do things. Enumeration on open ranges should probably throw. # Language questions Given this there are a couple of questions on the language support, which we gathered here, but did not yet answer. ## Should range expressions have `Range` as a natural type, or should they just convert to it? If they have `Range` as a natural type, then you can infer `Range` as the type of a range expression: `var r = 3..5;`. And support for `foreach` naturally falls out: `foreach (var x in 3..5) { ... }`. However, this locks in `Range` as the "primary" range type for the language forever. It would mean that applicable overloads taking the `Range` type would always win over other overloads taking range expressions (once those are allowed). Also, the `foreach` support based on this natural type wouldn't be very expressive: It wouldn't allow starting at negative numbers, or going backwards. ## Other notations We're fairly certain we want an inclusive `start..end` syntax. Presumably, open-ended ranges are expressed by leaving out one or the other endpoint. But should there be other notations? In particular, it seems that a syntax for giving an index and a *count* would handle a common scenario; possibly *more* common. All current APIs are like that, and many (most?) use cases would take a given number of elements at a time. It's not obvious what such a syntax should be, though. `start:count` and `start::count` both would have some ambiguity with existing syntax. Also, at some point we might want to consider ranges that are exclusive of their endpoints. That's not a big deal with `Range` as the only target range type, where exclusive is just a question of `+1` or `-1`. But if at some point we start considering ranges of floats etc., inclusion or exclusion would need first class representation in the range types, and in the language's notation.