csharplang/meetings/2018/LDM-2018-01-18.md
2018-03-20 17:38:03 -07:00

6.4 KiB

C# Language Design Notes for Jan 18, 2018

Agenda

We discussed the range operator in C# and the underlying types for it.

  1. Scope of the feature
  2. Range types
  3. Type name
  4. Open-ended ranges
  5. Empty ranges
  6. Enumerability
  7. Language questions

Scope of the feature

The scenario we are eager to address right now relates to indexing and slicing, where the elements of the range are contiguous integral indices into some data structure.

We want to design the language feature and API so that we address this scenario in the best possible way, without cutting off future support for range syntax in other scenarios, such as math, inclusion tests, etc., which require more generality.

Range types

We've talked about having a couple of types, say Range and LongRange representing general-purpose ranges over ints and longs, respectively.

However a Range type that ranges over all ints might have a length that is greater than what an int can contain! We could represent the Length property with a uint, but then it doesn't interoperate well with all the int-based types and logic in the language and framework.

And when we think about the core scenario, this pain would all be for naught: for indexing purposes, indices are always non-negative ints, and ranges of them would therefore never be longer than int.MaxValue.

It therefore seems that we should build a Range type specifically for indexing purposes, and have language support only for that, but in a way that we can generalize later. Specifically we can support conversion of x..y expressions only to Range for now, but open up for user-defined .. operators later.

Conclusion

Design just an indexing-oriented Range type for now.

Type name

Should the indexing-specific range type be called IndexRange or just Range? On the one hand it is a bit dangerous to use the good name for a special-purpose range type. On the other it is going to be very common, probably much more so than any other future range type. It is possible that we will someday have a Range<T> type that represents a more abstract mathematical notion of ranges. It will then be a bit odd for Range and Range<T> to be only loosely related to each other.

Conclusion

We can live with that risk. We think it is ok for this type to be called Range.

Open-ended ranges

Should we allow open-ended range values, as represented by missing endpoints in the language syntax: x.., ..y and ..? It seems a significant loss of expressiveness if we don't. Scenarios for open-at-one-end are common, and scenarios for open-at-both-ends include multi-dimensional slicing, where we want a concise way to keep "all of" a given dimension: tensor[.., ..1, ..].

Open-ended ranges are semantically different from just having 0 and int.MaxValue at the ends, because indexing/slicing APIs will require range arguments to be "within range" of the target data structure. Where m..int.MaxValue would therefore often yield an exception, an open-ended n.. would just mean "from n to the end, wherever that is", and be legal as long as n is within range. Also, we can't just represent open-ended ranges as closed ones with the target data structure's min and max index substituted in, because range values can exist independently of a given target data structure.

Internally, the Range type can e.g. use negative ints to represent open ends, so as not to waste space. This representation would not be visible from the outside though.

Can we use nullable int in the constructor, to represent endpoints that may not be there? Then null means open in that end. We could have an (int, int) constructor overload for efficiency. Proposal for this to be written offline.

Conclusion

Yes, support open-ended ranges.

Empty ranges

Can the Range type represent empty ranges? In that case, does the start point matter or not, or are all empty ranges the same? They might behave differently when you slice, depending of whether the start point is in range in the target data structure. (If not, they will throw).

Can you index an empty array with an empty range? Yes. As long as it's in range; otherwise it throws. This is consistent with slices, spans and strings.

Conclusion

It's important to support empty ranges. The start point should matter even for empty ranges, in that indexing/slicing would throw if the start point of an empty range wasn't in range of the target data structure.

Enumerability

Range should implement IEnumerable<T>, and needs to support the enumerable pattern of having a struct enumerator type. This is extra bulk in the type, but that's just how we do things.

Enumeration on open ranges should probably throw.

Language questions

Given this there are a couple of questions on the language support, which we gathered here, but did not yet answer.

Should range expressions have Range as a natural type, or should they just convert to it?

If they have Range as a natural type, then you can infer Range as the type of a range expression: var r = 3..5;. And support for foreach naturally falls out: foreach (var x in 3..5) { ... }.

However, this locks in Range as the "primary" range type for the language forever. It would mean that applicable overloads taking the Range type would always win over other overloads taking range expressions (once those are allowed).

Also, the foreach support based on this natural type wouldn't be very expressive: It wouldn't allow starting at negative numbers, or going backwards.

Other notations

We're fairly certain we want an inclusive start..end syntax. Presumably, open-ended ranges are expressed by leaving out one or the other endpoint.

But should there be other notations? In particular, it seems that a syntax for giving an index and a count would handle a common scenario; possibly more common. All current APIs are like that, and many (most?) use cases would take a given number of elements at a time.

It's not obvious what such a syntax should be, though. start:count and start::count both would have some ambiguity with existing syntax.

Also, at some point we might want to consider ranges that are exclusive of their endpoints. That's not a big deal with Range as the only target range type, where exclusive is just a question of +1 or -1. But if at some point we start considering ranges of floats etc., inclusion or exclusion would need first class representation in the range types, and in the language's notation.