
C# Design Notes for Apr 12-22, 2016

These notes summarize discussions across a series of design meetings in April on several topics related to tuples and patterns:

  • Tuple syntax for non-tuple types
  • Tuple deconstruction
  • Tuple conversions
  • Deconstruction and patterns
  • Out vars and their scope

There's still much more to do here, but lots of progress.

Tuple syntax for other types

We are introducing a new family of System.ValueTuple<...> types to support tuples in C#. However, there are already types that are tuple-like, such as System.Tuple<...> and KeyValuePair<...>. Not only are these often used throughout C# code, but the former is also what's targeted by F#'s tuple mechanism, which we'd want our tuple feature to interoperate well with.

Additionally you can imagine allowing other types to benefit from some, or all, of the new tuple syntax.

The obvious kinds of interop would be tuple construction and deconstruction. Since tuple literals are target typed, consider a tuple literal that is assigned to another tuple-like type:

System.Tuple<int, string> t = (5, null);

We could think of this as calling System.Tuple's constructor, or as a conversion, or maybe something else. Similarly, tuples will allow deconstruction and "decorated" names tracked by the compiler. Could we allow those on other types as well?
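As a purely hypothetical illustration (none of this syntax for non-tuple types is decided), tuple-style construction and deconstruction of a KeyValuePair<...> might look something like this:

// Hypothetical only: tuple syntax applied to KeyValuePair<...>
KeyValuePair<string, int> pair = ("age", 42);  // tuple literal target-typed to KeyValuePair
(string key, int value) = pair;                // deconstruction into fresh variables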

There are several levels of support we could decide to land on:

  1. Only tuples are tuples. Other types are on their own.
  2. Specific well-known tuple-like things are also tuples (probably Tuple<...> and KeyValuePair<...>).
  3. The author of a type can make it tuple-like through certain API patterns.
  4. Anyone can make a type work with tuple syntax without being its author.
  5. All types just work with it.

Level 2 would be enough to give us F# interop and improve the experience with existing APIs using Tuple<...> and KeyValuePair<...>. Level 3 could rely on any kind of declaration in the type, whereas level 4 would limit that to instance-method patterns that someone else could add through extension methods.

It is hard to see how level 5 could work for deconstruction, but it might work for construction, simply by treating a tuple literal as an argument list to the type's constructor. One might consider it invasive that a type's constructor can be called without the new keyword or any mention of the type! On the other hand, this might also be seen as a really nifty abbreviation. One problem would be how to use it with constructors that take zero or one argument; so far we haven't opted to add syntax for 0-tuples and 1-tuples.
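For instance, under level 5 a tuple literal might simply stand in for a constructor call; the Point type below is a placeholder used purely for illustration:

// Hypothetical only: tuple literal treated as an argument list to Point's constructor
Point p = (3, 4);   // would mean roughly: Point p = new Point(3, 4);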

We haven't yet decided which level we want to target, except we want to at least make Tuple<...> and KeyValuePair<...> work with tuple syntax. Whether we want to go further is a decision we probably cannot put off for a later version, since a later addition of capabilities might clash with user-defined conversions.

Tuple deconstruction

Whether deconstruction works for other types or not, we at least want to do it for tuples. There are three contexts in which we consider tuple deconstruction:

  1. Assignment: Assign a tuple's element values into existing variables
  2. Declaration: Declare fresh variables and initialize them with a tuple's element values
  3. Pattern matching: Recursively apply patterns to each of a tuple's element values

We would like to add forms of all three.

Deconstructing assignments

It should be possible to assign the individual values of a tuple to existing variables, fields, properties, array elements, etc.:

(x, y) = currentFrame.Crop(x, y); // x and y are existing variables
(a[x], a[x+1], a[x+2]) = GetCoordinates();

We need to be careful with the evaluation order. For instance, a swap should work just fine:

(a, b) = (b, a); // Yay!

A core question is whether this is a new syntactic form, or just a variation of assignment expressions. The latter is attractive for uniformity reasons, but it does raise some questions. Normally, the type and value of an assignment expression is that of its left hand side after assignment. But in these cases, the left hand side has multiple values and types. Should we construct a tuple from those and yield that? That seems contrary to this being about deconstructing not constructing tuples!
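For illustration, here is the kind of code the question is about (syntax and behavior both undecided):

// If a deconstructing assignment were an expression, what would it yield?
var result = ((x, y) = GetCoordinates()); // a freshly constructed (x, y) tuple? no value at all?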

This is something we need to ponder further. As a fallback we can say that this is a new form of assignment statement, which doesn't produce a value.

Deconstructing declarations

In most of the places where local variables can be introduced and initialized, we'd like to allow deconstructing declarations - where multiple variables are declared, but assigned collectively from a single tuple (or tuple-like value):

(var x, var y) = GetCoordinates();             // locals in declaration statements
foreach ((var x, var y) in coordinateList) ... // iteration variables in foreach loops
from (x, y) in coordinateList ...              // range variables in queries
M(out (var x, var y));                         // tuple out parameters

For range variables in queries, this would depend on clever use of transparent identifiers over tuples.

For out parameters this may require some form of post-call assignment, like VB has for properties passed to out parameters. That may or may not be worth it.

For syntax, there are two general approaches: "Types-with-variables" or "types-apart".

// Types-with-variables:
(string first, string last) = GetName(); // Types specified
(var first, var last) = GetName();       // Types inferred
var (first, last) = GetName();           // Optional shorthand for all var

// Types-apart:
(string, string) (first, last) = GetName(); // Types specified
var (first, last) = GetName();              // All types inferred
(var, var) (first, last) = GetName();       // Optional long hand for types inferred

This is mostly a matter of intuition and taste. For now we've opted for the types-with-variables approach, allowing the single var shorthand. One benefit is that this looks more similar to what we envision deconstruction in tuples to look like. Feedback may change our mind on this.

Multiple variables won't make sense everywhere. For instance they don't seem appropriate or useful in using-statements. We'll work through the various declaration contexts one by one.

Other deconstruction questions

Should it be possible to deconstruct a longer tuple into fewer variables, discarding the rest? As a starting point, we don't think so, until we see scenarios for it.

Should we allow optional tuple member names on the left hand side of a deconstruction? If you put them in, it would be checked that the corresponding tuple element had the name you expected:

(x: a, y: b) = GetCoordinates(); // Error if names in return tuple aren't x and y 

This may be useful or confusing. It is also something that can be added later. We made no immediate decision on it.

Tuple conversions

Viewed as generic structs, tuple types aren't inherently covariant. Moreover, struct covariance is not supported by the CLR. And yet it seems entirely reasonable and safe that tuple values be allowed to be assigned to more accommodating tuple types:

(byte, short) t1 = (1, 2);
(int, int) t2 = t1; // Why not?

The intuition is that tuple conversions should be thought of in a pointwise manner: a tuple type is convertible (in a given manner) to another tuple type if each of its element types is convertible (in the same manner) to the corresponding element type of the other.

However, if we are to allow this we need to build it into the language specifically - sometimes implementing tuple assignment as assigning the elements one-by-one, when the CLR doesn't allow the wholesale assignment of the tuple value.
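As an illustration only (the actual compilation strategy is an implementation detail), an assignment between differently typed tuples might be lowered element by element roughly like this:

(byte, short) t1 = (1, 2);
(int, int) t2 = t1;
// could compile to something like:
// ValueTuple<int, int> t2 = new ValueTuple<int, int>(t1.Item1, t1.Item2);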

In essence we'd be looking at a situation similar to when we introduced nullable value types in C# 2. Those are implemented in terms of generic structs, but the language adds extensive special semantics to these generic structs, allowing operations - including covariant conversions - that do not automatically fall out from the underlying representation.

This language-level relaxation comes with some subtle breaks that can happen on upgrade. Consider the following code, where C.dll is a C# 6 consumer of C# 7 libraries A.dll and B.dll:

A.dll:

(int, long) Foo()... // ValueTuple<int, long> Foo();

B.dll:

void Bar(object o);
void Bar((int?, long?) t);

C.dll:

Bar(Foo());

Because C# 6 has no knowledge of tuple conversions, it would pick the first overload of Bar for the call. However, when the owner of C.dll upgrades to C# 7, relaxed tuple conversion rules would make the second overload applicable, and a better pick.

It is important to note that such breaks are esoteric. Exactly parallel examples could be constructed for when nullable value types were introduced; yet we never saw them in practice. Should they occur they are easy to work around. As long as the underlying type (in our case ValueTuple<...>) and the conversion rules are introduced at the same time, the risk of programmers getting them mixed in a dangerous manner is minimal.
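For instance, if the original overload is the intended one, the call site can be disambiguated with an explicit cast (illustrative):

Bar((object)Foo()); // forces the Bar(object) overload regardless of tuple conversions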

Another concern is that "pointwise" tuple conversions, just like nullable value types, are a pervasive change to the language that affects many parts of the spec and implementation. Is it worth the trouble? After all, it is pretty hard to come up with compelling examples where conversions between two tuple types (as opposed to conversions from tuple literals or to individual variables in a deconstruction) are needed.

We feel that tuple conversions are an important part of the intuition around tuples: that they are primarily "groups of values" rather than "values in and of themselves". It would be highly surprising to developers if these conversions didn't work. Consider the baffling difference between these two pieces of code if tuple conversions didn't work:

(long, long) tuple = (1, 1);

var tmp = (1, 1);
(long, long) tuple = tmp; // Doesn't work??!?

All in all we feel that pointwise tuple conversions are worth the effort. Furthermore it is crucial that they be added at the same time as the tuples themselves. We cannot add them later without significant breaking changes.

Tuples vs ValueTuple

In accordance with this philosophy we can now say more about the relationship between language-level tuple types and the underlying ValueTuple<...> types.

Just as Nullable<T> is equivalent to T?, so ValueTuple<T1, T2, T3> should be in every way equivalent to the unnamed (T1, T2, T3). That means the pointwise conversions also work when tuple types are specified using the generic syntax.

If the tuple is bigger than the limit of 7, the implementation will nest the "tail" as a tuple into the eighth element recursively. This nesting is visible by accessing the Rest field of a tuple, but that field is considered an implementation detail, and is hidden from e.g. auto-completion, just as the ItemX field names are hidden but allowed when a tuple has named elements.

A well formed "big tuple" will have names Item1 etc. all the way up to the number of tuple elements, even though the underlying type doesn't physically have those fields directly defined. The same goes for the tuple returned from the Rest field, only with the numbers "shifted" appropriately. All this says is that the tuple in the Rest field is treated the same as all other tuples.
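As an illustration (member names follow the pattern described above; exactly how much of this surfaces in tooling is not settled):

var big = (1, 2, 3, 4, 5, 6, 7, 8);  // ValueTuple<int, int, int, int, int, int, int, ValueTuple<int>>
var last = big.Item8;                // language-level name for the underlying big.Rest.Item1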

Deconstructors and patterns

Whether or not we allow arbitrary types to opt in to the unconditional tuple deconstruction described above, we know we want to enable such positional deconstruction in recursive patterns:

if (o is Person("George", var last)) ...

The question is: how exactly does a type like Person specify how to be positionally deconstructed in such cases? There are a number of dimensions to this question, along with a number of options for each:

  • Static or instance/extension member?
  • GetValues method or new is operator?
  • Return tuple or tuple out parameter or several out parameters?

Selecting between these, we have to observe a number of different tradeoffs:

  1. Overloadable or not?
  2. Yields tuple or individual values?
  3. Growth path to "active patterns"?
  4. Can be applied to existing types without modifying them?

Let's look at these in turn.

Overloadability

If the multiple extracted values are returned as a tuple from a method (whether static or instance), then that method cannot be overloaded.

public (string firstName, string lastName) GetValues() { ... }

The deconstructor is essentially canonical. That may not be a big deal from a usability perspective, but it does hamper the evolution of the type. If it ever adds another member and wants to enable access to it through deconstruction, it needs to replace the deconstructor; it cannot just add a new overload. This seems unfortunate.

A method that yields its results through one or more out parameters can be overloaded in C#. Also, for a new kind of user-defined operator we can decide the overloading rules whichever way we like. For instance, conversion operators today can be overloaded on return type.
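For contrast with the tuple-returning shape above, here is a sketch of what an out-parameter-based deconstructor might look like (the member name and exact shape are not decided):

public void GetValues(out string firstName, out string lastName) { ... }
public void GetValues(out string firstName, out string lastName, out int age) { ... } // overloading is possible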

Tuple or individual values

If a deconstructor yields a tuple, then that confers special status to tuples for deconstruction. Essentially tuples would have their own built-in deconstruction mechanism, and all other types would defer to that mechanism by supplying a tuple.

Even if we rely on multiple out parameters, tuples cannot just use the same mechanism. In order to do so, long tuples would need to be enhanced by the compiler with an implementation that hides the nested nature of such tuples.

There doesn't seem to be any strong benefit to yielding a single tuple over multiple values (in out parameters).

Growing up to active patterns

There's a proposal where one type gets to specify deconstruction semantics for another, even including logic to determine whether the pattern applies or not. We do not plan to support that in the first go-around, but it is worth considering whether the deconstruction mechanism lends itself to such an extension.

In order to do so it would need to be static (so that it can specify behavior for an object of another type), and would benefit from an out-parameter-based approach, so that the return position could be reserved for returning a boolean when the pattern is conditional.
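A sketch of the shape such an active pattern might take, purely speculative (the member name Match is invented here for illustration):

// Static, so it can describe another type; the bool return signals whether the pattern applies
public static bool Match(Person p, out string firstName, out string lastName) { ... }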

There is a lot of speculation involved in making such concessions now, and we could reasonably rely on our future selves to invent a separate specification mechanism for active patterns without us having to accommodate it now.

Conclusion

This was an exploration of the design space. The actual decision is left to a future meeting.

Out vars and their scope

We are in favor of reviving a restricted version of the declaration expressions that were considered for C# 6. This would allow methods following the TryFoo pattern to behave similarly to the new pattern-based is-expressions in conditions:

if (int.TryParse(s, out var i) && i > 0) ...

We call these "out vars", even though it is perfectly fine to specify a type instead of var. The scope rules for variables introduced in such contexts would be the same as for variables coming from a pattern: they would generally be in scope within all of the nearest enclosing statement, except when that is an if-statement, where they would not be in scope in the else-branch.
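For illustration of that scope rule (Use is a placeholder method):

if (int.TryParse(s, out var i) && i > 0) Use(i); // i in scope in the condition and here
else Console.WriteLine("no positive number");    // i not in scope in the else-branch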

On top of that there are some relatively esoteric positions we need to decide on.

If an out var occurs in a field initializer, where should it be in scope? Just within the declarator where it occurs, not even in subsequent declarators of the same field declaration.
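A sketch of what that would mean for a field declaration with two declarators (illustrative only):

// Fields in a class; i is scoped to the first declarator only
int a = int.TryParse("42", out var i) ? i : -1,
    b = 0; // i would not be in scope in this initializer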

If an out var occurs in a constructor initializer (this(...) or base(...)) where should it be in scope? Let's not even allow that - there's no way you could have written equivalent code yourself.