csharplang/meetings/2016/LDM-2016-10-25-26.md

151 lines
6.3 KiB
Markdown
Raw Normal View History

C# Language Design Notes for Oct 25 and 26, 2016
================================================
Agenda
------
- Declaration expressions as a generalizing concept
- Irrefutable patterns and definite assignment
- Allowing tuple-returning deconstructors
- Avoiding accidental reuse of out variables
- Allowing underbar as wildcard character
Declaration expressions
=======================
In C# 6.0 we embraced, but eventually abandoned, a very general notion of "declaration expressions" - the idea that a variable could be introduced *as* an expression `int x`.
The full generality of that proposal had some problems:
* A variable introduced like `int x` is unassigned and therefore unusable in most contexts. It also leads to syntactic ambiguities
* We therefore allowed an optional initializer `int x = e`, so that it could be used elsewhere. But that lead to weirdness around when `= e` meant assignment (the result of which is a *value*) and when it meant initialization (the result of which is a *variable* that can be assigned to or passed by ref).
However, there's value in looking at C#'s new deconstruction and out variable features through the lens of declaration expressions.
Currently we have
* Deconstructing *assignments* of the form `(x, y) = e`
* Deconstructing *declarations* of the form `(X x, Y y) = e`
* Out variables of the form `M(out X x)`
This calls for generalization. What if we said that
* Tuple expressions `(e1, e2)` can be lvalues if all their elements are lvalues, and will cause deconstruction when assigned to
* There are declaration expressions `X x` that can only occur in positions where an unassigned variable is allowed, that is
* In or as part of the left hand side of an assignment
* In or as part of an out argument
Then all the above features - and more - can be expressed in terms of combinations of the two:
* Deconstructing assignments are just a tuple on the left hand side of an assignment
* Deconstructing declarations are just deconstructing assignments where all the nested variables are declaration expressions
* Out variables are just declaration expressions passed as an out argument
Given the short time left for fixes to C# 7.0 we could still keep it to these three special cases for now. However, if those restrictions were later to be lifted, it would lead to a number of other things being expressible:
* Using a "deconstructing declaration" as an expression (today it is a statement)
* Mixing existing and new variables in a deconstructing assignment `(x, int y) = e`
* deconstruction in out context `M(out (x, y))`
* single declaration expression on the left hand side of assignment `int x = e`
The work involved in moving to this model *without* adding this functionality would be to modify the representation in the Roslyn API, so that it can be naturally generalized later.
Conclusion
----------
Let's re-cast the currently planned C# 7.0 features in turns of declaration expressions and tuple expressions, and then plan to later add some or all of the additional expressiveness this enables.
Irrefutable patterns
====================
Some patterns are "irrefutable", meaning that they are known by the compiler to be always fulfilled. We currently make very limited use of this property in the "subsumption" analysis of switch cases. But we could use it elsewhere:
``` c#
if (GetInt() is int i) { ... }
UseInt(i); // Currently an error. We could know that i is definitely assigned here
```
This might not seem to be of much value - why are you applying a pattern if it never fails? But in a future with recursive patterns, this may become more useful:
``` c#
if (input is Assignment(Expression left, var right) && left == right) { ... }
... // condition failed, but left and right are still assigned
```
Conclusion
----------
This seems harmless, and will grow more useful over time. It's a small tweak that we should do.
Tuple-returning deconstructors
==============================
It's a bit irksome that deconstructors must be written with out parameters. This is to allow overloading on arity, but in practice most types would only declare one. We could at least optionally allow one of them to be defined with a return tuple instead:
``` c#
(int x, int y) Deconstruct() => (X, Y);
```
Instead of:
``` c#
void Deconstruct(out int x, out int y) => (x, y) = (X, Y);
```
Evidence is inconclusive as to which is more efficient: it depends on circumstances.
Conclusion
----------
Not worth it.
Definite assignment for out vars
================================
With the new scope rules, folks have been running into this:
``` c#
if (int.TryParse(s1, out var i)) { ... i ... }
if (int.TryParse(s2, out var j)) { ... i ... } // copy/paste bug - still reading i instead of j
```
It works the same as when people had to declare their own variables outside the `if`, but it seems a bit of a shame that we can't do better now. Could we do something with definite assignment of out variables that could prevent this situation?
Conclusion
----------
Whatever we could do here would be too specific for a language solution. It would be a great idea for a Roslyn analyzer, which can
- identify `TryFoo` methods by looking for "Try" in the name and the `bool` return type
- find this bug in existing code declaring the variable ahead of time, *as well as* in new code using out variables
Underbar wunderbar
==================
Can we make `_` the wildcard instead of `*`? We'd need to be sneaky in order to preserve current semantics (`_` is a valid identifier) while allowing it as a wildcard in new kinds of code.
``` c#
M(out var _);
int _ = e.M(_ => ..._...x...); // outer _ is still a named variable?
(int x, var _) = e.M(_ => ..._...x...);
(_, _) = ("x", 2);
```
We would need to consider rules along the following lines:
* Make `_` always a wildcard in deconstructions and patterns
* Make `_` always a wildcard in declaration expressions and patterns
* Allow `_` as a wildcard in other l-value situations (out arguments at least, but maybe assignment) when it's not already defined
* Allow `_` to be declared more than once in the same scope, in which case it is a wildcard when mentioned
* Allow `_` to be "redeclared" in an inner scope, and then it's a wildcard in the inner scope
"Once a wildcard, always a wildcard!"
Conclusion
----------
This is worth pursuing. More thinking is needed!