# C# Language Design Notes for May 21, 2014 ## Agenda 1. Limit the nameof feature? <_keep current design_> 2. Extend params IEnumerable? <_keep current design_> 3. String interpolation <_design nailed down_> ## Limit the nameof feature? The current design of the `nameof(x)` feature allows for the named entity to reference entities that are not uniquely identified by the (potentially dotted) name given: methods overloaded on signature, and types overloaded on generic arity. This was rather the point of the feature: since you are only interested in the name, why insist on binding uniquely to just one symbol? As long as there’s at least one entity with the given name, it seems fine to yield the name without error. This sidesteps all the issues with the mythical `infoof()` feature (that would take an entity and return reflective information for it) of coming up with language syntax to uniquely identify overloads. (Also there’s no need to worry about distinguishing generic instantiations from uninstantiated generics, etc.: they all have the same name). The design, however, does lead to some interesting challenges for the tooling experience: ``` c# public void M(); public void M(int i); public void M(string s); WriteLine(nameof(M)); // writes the string "M" ``` Which definition of `M` should “Go To Definition” on `M` go to? When does renaming an overload cause a rename inside of the `nameof`? Which of the overloads does the occurrence of `nameof(M)` count as a reference to? Etc. The ambiguity is a neat trick at the language level, but a bit of a pain at the tooling level. Should we limit the application of `nameof` to situations where it is unambiguous? ### Conclusion No. Let’s keep the current design. We can come up with reasonable answers for the tooling challenges. Hobbling the feature would hurt real scenarios. ## Extend params IEnumerable? By current design, params is extended to apply to `IEnumerable` parameters. The feature still works by the call site generating a `T[]` with the arguments in it; but that array is of course available inside the method only as an `IEnumerable`. It has been suggested that we might as well make this feature work for other generic interfaces (or even all types) that arrays are implicitly reference convertible to, instead of just `IEnumerable`. It would certainly be straightforward to do, though there are quite a lot of such types. We could even infer an element type for the array from the passed-in arguments for the cases where the collection type does not have an element type of its own. On the other hand, it is usually bad practice for collection-expecting public APIs to take anything more specific than `IEnumerable`. That is especially true if the API is not intending to modify the collection, and no meaningful params method would do so: after all, if your purpose is to cause a side effect on a passed-in collection, why would you give the caller the option not to pass one? ### Conclusion Params only really makes sense on `IEnumerable`. If we were designing the language from scratch today we wouldn’t even have params on arrays, but only on `IEnumerable`. So let’s keep the design as is. ## String interpolation There have been a number of questions around how to add string interpolation to C#, some a matter of ambition versus simplicity, some just a matter of syntax. In the following we settle on these different design aspects. ### Safety Concatenation of strings with contents of variables has a long tradition for leading to bugs or even attack vectors, when the resulting string is subsequently parsed up and used as a command. Presumably if you make string concatenation easier, you are more vulnerable to such issues – or at least, by having a dedicated string interpolation features, you have a natural place in the language to help address such problems. Consequently, string interpolation in the upcoming EcmaScript 6 standard allows the user to indicate a function which will be charged with producing the result, based on compiler-generated lists of string fragments and expression results to be filled in. A given trusted function can prevent SQL injection or ensure the well-formedness of a URI. #### Conclusion We don’t think accommodating custom interpolators in C# is the sweet spot at this point. Most people are probably just looking for simpler and more readable syntax for filling out holes in strings. However, as we settle on syntax we should keep an eye on our ability to extend for this in the future. ### Culture In .NET there’s a choice between rendering values in the current culture or an invariant culture. This determines how common values such as dates and even floating point numbers are shown in text. The default is current culture, which even language-recognized functions such as `ToString()` make use of. Current culture is great if what you’re producing is meant to be read by humans in the same culture as the program is run in. If you get more ambitious than that with human readers, the next step up is to localize in some more elaborate fashion: looking up resources and whatnot. At that point, you are reaching for heavier hammers than the language itself should probably provide. There’s an argument that when a string is produced for machine consumption it is better done in the invariant culture. After all, it is quite disruptive to a comma-separated list of floating point values if those values are rendered with commas instead of dots as the decimal point! Should a string interpolation feature default to current or invariant culture, or maybe provide a choice? #### Conclusion We think this choice has already been made for us, with the language and .NET APIs broadly defaulting to current culture. That is probably the right choice for most quick-and-easy scenarios. If we were to accommodate custom interpolators in the future, there could certainly be one for culture-invariant rendering. ### Syntax The general format is strings with “holes”, the holes containing expressions to be “printed” in that spot. We’d like the syntax to stress the analogy to `String.Format` as much as possible, and we therefore want to use curly braces `{…}` in the delimiting of holes. We’ll return to what exactly goes in the curly braces, but for now there is one central question: how do we know to do string interpolation at all? There are two approaches we can think of: 1. Provide new syntax around the holes 2. Provide new syntax around the string itself To the first approach, we previously settled on escaping the initial curly brace of each hole to mean this was a string interpolation hole, and the contents should be interpreted as expression syntax: ``` c# "Hello, \{name}, you have \{amount} donut{s} left." ``` Here, `name` and `amount` refer to variables in scope, whereas `{s}` is just part of the text. This has a few drawbacks. It doesn’t look that much like a format string, because of the backslash characters in front of the curlies. You also need to visually scan the string to see if it is interpolated. Finally there’d be no natural place for us to indicate a custom interpolation function in the future. An example of the second approach would be to add a prefix to the string to trigger interpolation, e.g.: ``` c# $"Hello, {name}, you have {amount} donut\{s\} left." ``` Now the holes can be expressed with ordinary braces, and just like format strings you have to escape braces to actually get them in the text (though we are eager to use backslash escapes instead of the double braces that format strings use). You can see up front that the string is interpolated, and if we ever add support for custom interpolators, the function can be put immediately before or after the `$`; whichever we decide: ``` c# LOC$"Hello, {name}, you have {amount} donut\{s\} left." SQL$"…" URI$"…" ``` The prefix certainly doesn’t have to be a `$`, but that’s the character we like best for it. We don’t actually have to do it with a prefix. JavaScript is going to use back ticks to surround the string. But prefix certainly seems better than yet another kind of string delimiter. #### Conclusion The prefix approach seems better and more future proof. We are happy to use `$`. It wouldn’t compose with the `@` sign used for verbatim strings; it would be either one or the other. ### Format specifiers Format strings for `String.Format` allow various format specifiers in the placeholders introduced by commas and colons. We could certainly allow similar specifiers in interpolated strings. The semantics would be for the compiler to just turn an interpolated string into a call to `String.Format`, passing along any format specifiers unaltered: ``` c# $"Hello, {name}, you have {amount,-3} donut\{s\} left." ``` This would be translated to ``` c# String.Format("Hello, {0}, you have {1,-3} donut{{s}} left.", name, amount) ``` (Note that formatting of literal curlies needs to change if we want to keep our backslash escape syntax, which, tentatively, we do). The compiler would be free to not call `String.Format`, if it knows how to do things more optimally. This would typically be the case when there are no format specifiers in the string. #### Conclusion Allow all format specifiers that are allowed in the format strings of `String.Format`, and just pass them on. ### Expressions in the holes The final – important – question is which expressions can be put between the curly braces. In principle, we could imagine allowing almost any expression, but it quickly gets weird, both from a readability and from an implementation perspective. What if the expression itself has braces or strings in it? We wouldn’t be able to just lex our way past it (when to stop?), and similarly a reader, even with the help of colorization, would get mightily confused about what gets closed out when exactly. Additionally the choice to allow format specifiers limits the kinds of expressions that can unambiguously precede those. ``` c# $"{a as b ? – c : d}" // ?: or nullable type and format specifier? ``` The other extreme is to allow just a very limited set of expressions. The common case is going to be simple variables anyway, and anything can be expressed by first assigning into variables and then using those in the string. #### Conclusion We want to be quite cautious here, at least to begin with. We can always extend the set of expressions allowed, but for now we want to be close to the restrictive extreme and allow only simple and dotted identifiers.