csharplang/meetings/2014/LDM-2014-04-21.md
2017-02-09 09:25:09 -08:00

20 KiB
Raw Permalink Blame History

C# Language Design Notes for Apr 21, 2014

Agenda

In this design meeting we looked at some of the most persistent feedback on the language features showcased in the BUILD CTP, and fixed up many of the most glaring issues.

  1. Indexed members <lukewarm response, feature withdrawn>
  2. Initializer scope <new scope solves all kinds of problems with initialization>
  3. Primary constructor bodies <added syntax for a primary constructor body>
  4. Assignment to getter-only auto-properties from constructors <added>
  5. Separate accessibility for type and primary constructor <not worthy of new syntax>
  6. Separate doc comments for field parameters and fields <not worthy of new syntax>
  7. Left associative vs short circuiting null propagation <short circuiting>

Indexed members

The indexed member feature e.$x and new C { $x = e } has been received less than enthusiastically. People arent super happy with the syntax, but most of all they arent very excited about the feature.

We came to this feature in a roundabout way, where it started out having much more expressiveness. For instance, it was the way you could declaratively create an object with values at given indices. But given the dictionary initializer syntax new C { ["x"] = e } the $ syntax is again just thin sugar for using string literals in indexers. Is that worth new syntax? It seems not.

Conclusion

Well pull the feature. Theres little love for it, and we shouldnt litter the language unnecessarily. If this causes an outcry, well thats different feedback, and we can then act on that.

Initializer scope

There are a couple of things around primary constructors and scopes that are currently annoying:

  1. You frequently want a constructor parameter and a (private) field with the same name. In fact we have a feature just for that the so-called field parameters, where primary constructors annotated with an accessibility modifier cause a field to be also emitted. However, if you try to declare this manually, we give an error because members and primary constructor parameters are in the same declaration space.

  2. We have special rules for primary constructor parameters, making it illegal to use them after initialization time, even though they are “in scope”.

So in this code:

public class ConfigurationException(Configuration configuration, string message) 
    : Exception(message)
{
    private Configuration configuration = configuration;
    public override string ToString() => message + "(" + configuration + ")";
}

The declaration of the field configuration is currently an error, because it clashes with the parameter of the same name in the same declaration space, but it would be nice if it just worked.

The use of message in a method body is and should be an error, but it would be preferable if that was a more natural consequence of existing scoping rules, instead of new specific restrictions.

An idea to fix this is to introduce what well call the initialization scope. This is a scope and declaration space that is nested within the type declarations scope and declaration space, and which includes the parameters and base initializer arguments of a primary constructor (if any) and the expressions in all member initializers of the type.

That immediately means that this line becomes legal and meaningful:

    private Configuration configuration = configuration;

The field configuration no longer clashes with the parameter configuration, because they are no longer declared in the same declaration space: the latters is nested within the formers. Moreover the reference to configuration in the initializer refers to the parameter, not the field, because while both are in scope, the parameter is nearer.

Some would argue that a line like the above is a little confusing. You are using the same name to mean different things. That is a fair point. The best way to think of it is probably the corresponding line in a normal constructor body:

    this.configuration = configuration;

Which essentially means the same thing. Just as weve gotten used to this disambiguating that line, well easily get used to the leading modifier and type of the field declaration disambiguating the field initializer.

The initialization scope also means that this line is naturally disallowed:

    public override string ToString() => message + "(" + configuration + ")";

Because the reference to message does not appear within the initialization scope, and the parameter is therefore not in scope. If there was a field with that name, the field would get picked up instead; it wouldnt be shadowed by a parameter which would be illegal to reference.

A somewhat strange aspect of the initialization scope is that it is textually discontinuous: it is made up of bits and pieces throughout a type declaration. Hopefully this is not too confusing: conceptually it maps quite clearly to the notion of “initialization time”. Essentially, the scope is made up of “the code that runs when the object is initialized”.

There are some further desirable consequences of introducing the initialization scope:

Field parameters

The feature of field parameters is currently the only way to get a primary constructor parameter and a field of the same name:

public class ConfigurationException(private Configuration configuration, )

With the initialization scope, the feature is no longer special magic, but just thin syntactic sugar over the field declaration above. If for some reason field parameters dont work for you, you can easily fall back to an explicit field declaration with the same name as the parameter.

It raises the question of whether wed even need field parameters, but they still seem like a nice shorthand.

The scope of declaration expressions in initializers

In the current design, each initializer provides its own isolated scope for declaration expressions: there was no other choice really. With the initialization scope, however, declaration expressions in initializers would naturally use that as their scope, allowing the use of locals to flow values between initializers. This may not be common, but you can certainly imagine situations where that comes in handy:

public class ConfigurationException(Configuration configuration, string message) 
    : Exception(message)
{
    private Configuration configuration = configuration;
    public bool IsRemote { get; } = (var settings = configuration.Settings)["remote"];
    public bool IsAsync { get; } = settings["async"];
    public override string ToString() => Message + "(" + configuration + ")";
}

The declaration expression in IsRemotes initializer captures the result of evaluating configuration.Settings into the local variable settings, so that it can be reused in the initializer for IsAsync.

We need to be a little careful about partial types. Since the “textual order” between different parts of a partial type is not defined, it does not seem reasonable to share variables from declaration expressions between different parts. Instead we should introduce a scope within each part of a type containing the field and property initializers contained in that part. This scope is then nested within the initializer scope, which itself covers all the parts.

A similar issue needs to be addressed around the argument to the base initializer. Textually it occurs before the member initializers, but it is evaluated after. To avoid confusion, the argument list needs to be in its own scope, nested inside the scope that contains the field and property initializers (of that part of the type). That way, locals introduced in the argument list will not be in scope in the initializers, and members introduced in the initializers cannot be used in the argument list (because their use would textually precede their declaration).

Primary constructors in VB

Importantly, the notion of the initialization scope would also make it possible to introduce primary constructors in VB. The main impediment to this has been that, because of case insensitivity, the restriction that primary constructor parameters could not coexist with members of the same name became too harsh. If you need both a parameter, a backing field and a property, you quickly run out of names!

The initialization scope helps with that, by introducing a separate scope that the parameters can live in, so they no longer clash with other names.

Unlike C#, VB today allows initializers to reference previously initialized fields. With the initialization scope this would still be possible, as long as theres not a primary constructor parameter shadowing that field. And if there is, you probably want the parameter anyway. An if you dont, you can always get at the field through Me (VBs version of this).

It is up to the VB design team whether to actually add primary constructors this time around, but it is certainly nice to have a model that will work in both languages.

Conclusion

The initialization scope solves many problems, and leaves the language cleaner and with less magic. This clearly outweighs the slight oddities it comes with.

Primary constructor bodies

By far the most commonly reported reason why people cannot use primary constructors is that they dont allow for easy argument validation: there is simply no “body” within which to perform checks and throw exceptions.

We could certainly change that. The simplest thing, syntactically, is to just let you write a block directly in the type body, and that block then gets executed when the object is constructed:

public class ConfigurationException(Configuration configuration, string message) 
    : Exception(message)
{
    {
        if (configuration == null) 
        {
            throw new ArgumentNullException(nameof(configuration));
        }
    }
    private Configuration configuration = configuration;
    public override string ToString() => Message + "(" + configuration + ")";
}

This looks nice, but there is a core question we need to answer: when exactly is that block executed? There seem to be two coherent answers to that, and we need to choose:

  1. The block is an initializer body. It runs before the base call, following the same textual order as the surrounding field and property initializers. You could even imagine allowing multiple of them interspersed with field initialization, and they can occur regardless of whether there is a primary constructor.

  2. The block is a constructor body. It is the body of the primary constructor and therefore runs after the base call. You can only have one, and only if there is a primary constructor that it can be part of.

Both approaches have pros and cons. The initializer body corresponds to a similar feature in Java, and has the advantage that you can weed out bad parameters before you start digging into them or pass them to the base initializer (though arguments passed to the base initializer should probably be validated by the base initializer rather than in the derived class anyway).

As an example of this issue, our previous example where an initializer digs into the contents of a primary constructor parameter, wouldnt work if the validation was done in a constructor body, after initialization (here in a simplified version):

    public bool IsRemote { get; } = configuration.Settings["remote"];

If the passed-in configuration is null, this would yield a null reference exception before a constructor body would have a chance to complain (by throwing a better exception). Instead, in a constructor body interpretation, the initialization of IsRemote would either have to happen in the constructor body as well, following the check, or it would have to make copious use of the null propagating operator that were also adding:

    public bool IsRemote { get; } = configuration?.Settings?["remote"] ?? false;

On the other hand, the notion of a constructor body is certainly more familiar, and it is easy to understand that the block is stitched together with the parameter list and the base initializer to produce the constructor declaration underlying the primary constructor.

Moreover, a constructor body has access to fields and members, while this access during initialization time is prohibited. Therefore, a constructor body can call helper methods etc. on the instance under construction; also a common pattern.

Conclusion

At the end of the day we have to make a choice. Here, familiarity wins. While the initializer body approach has allure, it is also very much a new thing. Constructor bodies on the other hand work the way they work. The downsides have workarounds. So a constructor body it is.

In a partial type, the constructor body must be in the same part as the primary constructor. Scope-wise, the constructor body is nested within the scope for the primary constructors base arguments, which in turn is nested within the scope for the field and property initializers of that part, which in turn is nested within the initialization scope that contains the primary constructor parameters:

partial class C(int x1) : B(int x3 = x1 /* x2 in scope but cant be used */)
{
    public int X0 { get; } = (int x2 = x1);
    {
        int x4 = X0 + x1 + x2 + x3;
    }
}

Lets look at the scopes (and corresponding declaration spaces) nested in each other here:

  • The scope S4 spans the primary constructor body. It directly contains the local variable x4, and is nested within S3.
  • The scope S3 spans S4 plus the argument list to the primary constructors base initializer. It directly contains the local variable x3, and is nested within S2.
  • The scope S2 spans S3 plus all field and property initializers in this part of the type declaration. It directly contains the local variable x2, and is nested within S1.
  • The scope S1 spans S2 plus similar “S2s” from other parts of the type declaration, plus the parameter list of the primary constructor. It directly contains the parameter x1, and is nested within S0.
  • The scope S0 spans all parts of the whole type declaration, including S1. It directly contains the property X0.

On top of this, the usual rule applies for local variables, that they cannot be used in a position that textually precedes their declaration.

Assignment to getter-only auto-properties

There are situations where you cannot use a primary constructor. We have to make sure that you do not fall off too steep of a cliff when you are forced to abandon primary constructor syntax and use an ordinary constructor.

One of the main nuisances that has been pointed out is that the only way to initialize a getter-only auto-property is with an initializer. If you want to initialize it from constructor parameters, you therefore need to have a primary constructor, so those parameters can be in scope for initialization. If you cannot have a primary constructor, then the property cannot be a getter-only auto-property: You have to fall back to existing, more lengthy and probably less fitting property syntax.

Thats a shame. The best way to level the playing field here is to allow assignment to getter-only auto-properties from within constructors:

public class ConfigurationException : Exception
{
    private Configuration configuration;
    public bool IsRemote { get; }
    public ConfigurationException(Configuration configuration, string message) 
        : base(message)
    {
        if (configuration == null) 
        {
            throw new ArgumentNullException(nameof(configuration));
        }
        this.configuration = configuration;
        IsRemote = configuration.Settings["remote"];
    }
}

The assignment to IsRemote would go directly to the underlying field (since there is no setter to call). Thus, semantics are a little different from assignment to get/set auto-properties, where the setter is called even if you assign from a constructor. The difference is observable if the property is virtual. We could restore symmetry by changing the meaning of assignment to a get/set auto-property to also go directly to the backing field, but that would be a breaking change.

Conclusion

Lets allow assignment to getter-only auto-properties from constructor bodies. It translates into assignment directly to the underlying field (which is readonly). We are ok with the slight difference in semantics from get/set auto-property assignment.

Separate accessibility on type and primary constructor

There are scenarios where you dont want the constructors of your type to have the same accessibility as the type. A common case is where the type is public, but the constructor is private or protected, object construction being exposed only through factories.

Should we invent syntax so that a primary constructor can get a different accessibility than its type?

Conclusion

No. There is no elegant way to address this. This is a fine example of a scenario where developers should just fall back to normal constructor syntax. With the previous decisions above, weve done our best to make sure that that cliff isnt too steep.

Separate doc comments for field parameters and their fields

Doc comments for a primary constructor parameter apply to the parameter. If the parameter is a field parameter, there is no way to add a doc comment that goes on the field itself. Should there be?

Conclusion

No. If the field needs separate doc comments, it should just be declared as a normal field. With the introduction of initialization scopes above, this is now not only possible but easy.

Null propagating operator associativity

What does the following mean?

var x = a?.b.c;

People gravitate to two interpretations, which each side maintains is perfectly intuitive and the only thing that makes sense.

One interpretation is that ?. is an operator much like .. It is left associative, and so the meaning of the above is roughly the same as

var x = ((var tmp = a) == null ? null : tmp.b).c;

In other words, we access b only if a is not null, but c is accessed regardless. This is obviously likely to lead to a null reference exception; after all the use of the null propagating operator indicates that theres a likelihood that a is null. So advocates of the “left associative” interpretation would put a diagnostic on the code above, warning that this is probably bad, and pushing people to write, instead:

var x = a?.b?.c;

With a null-check again before accessing c.

The other interpretation has been called “right associative”, but that isnt exactly right (no pun intended): better to call it “short circuiting”. It holds that the null propagating operator should short circuit past subsequent member access (and invocation and indexing) when the receiver is null, essentially pulling those subsequent operations into the conditional:

var x = ((var tmp = a) == null ? null : tmp.b.c);

There are long discussions about this, which I will no attempt to repeat here. The “short circuiting” interpretation is slightly more efficient, and probably more useful. On the other hand it is more complicated to fit into the language, because it needs to “suck up” subsequent operations in a way those operations arent “used to”: since when would the evaluation of e.x not necessarily lead to x being accessed on e? So wed need to come up with alternative versions of remote access, indexing and invocation that can represent being part of a short-circuited chain following a null propagating operator.

Conclusion

Despite the extra complexity and some disagreement on the design team, weve settled on the “short circuiting” interpretation.