csharplang/meetings/2016/LDM-2016-07-15.md
2017-01-31 10:38:26 -08:00

5.8 KiB

C# Design Language Notes for Jul 15, 2016

Agenda

In this meeting we took a look at what the scope rules should be for variables introduced by patterns and out vars.

Scope of locals introduced in expressions

So far in C#, local variables have only been:

  1. Introduced by specific kinds of statements, and
  2. Scoped by specific kinds of statements

for, foreach and using statements are all able to introduce locals, but at the same time also constitute their scope. Declaration statements can introduce local variables into their immediate surroundings, but those surroundings are prevented by grammar from being anything other than a block {...}. So for most statement forms, questions of scope are irrelevant.

Well, not anymore! Expressions can now contain patterns and out arguments that introduce fresh variables. For any statement form that can contain expressions, we therefore need to decide how it relates to the scope of such variables.

Current design

Our default approach has been fairly restrictive:

  • All such variables are scoped to the nearest enclosing statement (even if that is a declaration statement)
  • Variables introduced in the condition of an if statement aren't in scope in the else clause (to allow reuse of variable names in nested else ifs)

This approach caters to the "positive" scenarios of is expressions with patterns and invocations of Try... style methods with out parameters:

if (o is bool b) ...b...; // b is in scope
else if (o is byte b) ...b...; // fine because bool b is out of scope
...; // no b's in scope here

It doesn't handle unconditional uses of out vars, though:

GetCoordinates(out var x, out var y);
...; // x and y not in scope :-(

It also fits poorly with the "negative" scenarios embodied by what is sometimes called the "bouncer pattern", where a method body starts out with a bunch of tests (of parameters etc.) and jumps out if the tests fail. At the end of the test you can write code at the highest level of indentation that can assume that all the tests succeeded:

void M(object o)
{
  if (o == null) throw new ArgumentNullException(nameof(o));
  ...; // o is known not to be null
}

However, the strict scope rules above make it intractable to extend the bouncer pattern to use patterns and out vars:

void M(object o)
{
  if (!(o is int i)) throw new ArgumentException("Not an int", nameof(o));
  ...; // we know o is int, but i is out of scope :-(
}

Guard statements

In Swift, this scenario was found so important that it earned its own language feature, guard, that acts like an inverted if, except that a) variables introduced in the conditions are in scope after the guard statement, and b) there must be an else branch that leaves the enclosing scope. In C# it might look something like this:

void M(object o)
{
  guard (o is int i) else throw new ArgumentException("Not an int", nameof(o)); // else must leave scope
  ...i...; // i is in scope because the guard statement is specially leaky
}

A new statement form seems like a heavy price to pay. And a guard statement wouldn't deal with non-error bouncers that correct the problem instead of bailing out:

void M(object o)
{
  if (!(o is int i)) i = 0;
  ...; // would be nice if i was in scope and definitely assigned here
}

(In the bouncer analogy I guess this is equivalent to the bouncer lending the non-conforming customer a tie instead of throwing them out for not wearing one).

Looser scope rules

It would seem better to address the scenarios and avoid a new statement form by adopting more relaxed scoping rules for these variables.

How relaxed, though?

Option 1: Expression variables are only scoped by blocks

This is as lenient as it gets. It would create some odd situations, though:

for (int i = foo(out int j);;); 
// j in scope but not i?

It seems that these new variables should at least be scoped by the same statements as old ones:

Option 2: Expression variables are scoped by blocks, for, foreach and using statements, just like other locals:

This seems more sane. However, it still leads to possibly confusing and rarely useful situations where a variable "bleeds" out many levels:

if (...)
  if (...)
    if (o is int i) ...i...
...; // i is in scope but almost certainly not definitely assigned 

It is unlikely that the inner if intended i to bleed out so aggressively, since it would almost certainly not be useful at the outer level, and would just pollute the scope.

One could say that this can easily be avoided by the guidance of using curly brace blocks in all branches and bodies, but it is unfortunate if that changes from being a style guideline to having semantic meaning.

Option 3: Expression variables are scoped by blocks, for, foreach and using statements, as well as all embedded statements:

What is meant by an embedded statement here, is one that is used as a nested statement in another statement - except inside a block. Thus the branches of an if statement, the bodies of while, foreach, etc. would all be considered embedded.

The consequence is that variables would always escape the condition of an if, but never its branches. It's as if you put curlies in all the places you were "supposed to".

Conclusion

While a little subtle, we will adopt option 3. It strikes a good balance:

  • It enables key scenarios, including out vars for non-Try methods, as well as patterns and out vars in bouncer if-statements.
  • It doesn't lead to egregious and counter-intuitive multi-level "spilling".

It does mean that you will get more variables in scope than the current restrictive regime. This does not seem dangerous, because definite assignment analysis will prevent uninitialized use. However, it prevents the variable names from being reused, and leads to more names showing in completion lists. This seems like a reasonable tradeoff.