csharplang/meetings/2014/LDM-2014-05-21.md
2017-02-09 09:25:09 -08:00

10 KiB
Raw Permalink Blame History

C# Language Design Notes for May 21, 2014

Agenda

  1. Limit the nameof feature? <keep current design>
  2. Extend params IEnumerable? <keep current design>
  3. String interpolation <design nailed down>

Limit the nameof feature?

The current design of the nameof(x) feature allows for the named entity to reference entities that are not uniquely identified by the (potentially dotted) name given: methods overloaded on signature, and types overloaded on generic arity.

This was rather the point of the feature: since you are only interested in the name, why insist on binding uniquely to just one symbol? As long as theres at least one entity with the given name, it seems fine to yield the name without error. This sidesteps all the issues with the mythical infoof() feature (that would take an entity and return reflective information for it) of coming up with language syntax to uniquely identify overloads. (Also theres no need to worry about distinguishing generic instantiations from uninstantiated generics, etc.: they all have the same name).

The design, however, does lead to some interesting challenges for the tooling experience:

public void M();
public void M(int i);
public void M(string s);
WriteLine(nameof(M)); // writes the string "M"

Which definition of M should “Go To Definition” on M go to? When does renaming an overload cause a rename inside of the nameof? Which of the overloads does the occurrence of nameof(M) count as a reference to? Etc. The ambiguity is a neat trick at the language level, but a bit of a pain at the tooling level.

Should we limit the application of nameof to situations where it is unambiguous?

Conclusion

No. Lets keep the current design. We can come up with reasonable answers for the tooling challenges. Hobbling the feature would hurt real scenarios.

Extend params IEnumerable?

By current design, params is extended to apply to IEnumerable<T> parameters. The feature still works by the call site generating a T[] with the arguments in it; but that array is of course available inside the method only as an IEnumerable<T>.

It has been suggested that we might as well make this feature work for other generic interfaces (or even all types) that arrays are implicitly reference convertible to, instead of just IEnumerable<T>.

It would certainly be straightforward to do, though there are quite a lot of such types. We could even infer an element type for the array from the passed-in arguments for the cases where the collection type does not have an element type of its own.

On the other hand, it is usually bad practice for collection-expecting public APIs to take anything more specific than IEnumerable<T>. That is especially true if the API is not intending to modify the collection, and no meaningful params method would do so: after all, if your purpose is to cause a side effect on a passed-in collection, why would you give the caller the option not to pass one?

Conclusion

Params only really makes sense on IEnumerable<T>. If we were designing the language from scratch today we wouldnt even have params on arrays, but only on IEnumerable<T>. So lets keep the design as is.

String interpolation

There have been a number of questions around how to add string interpolation to C#, some a matter of ambition versus simplicity, some just a matter of syntax. In the following we settle on these different design aspects.

Safety

Concatenation of strings with contents of variables has a long tradition for leading to bugs or even attack vectors, when the resulting string is subsequently parsed up and used as a command. Presumably if you make string concatenation easier, you are more vulnerable to such issues or at least, by having a dedicated string interpolation features, you have a natural place in the language to help address such problems.

Consequently, string interpolation in the upcoming EcmaScript 6 standard allows the user to indicate a function which will be charged with producing the result, based on compiler-generated lists of string fragments and expression results to be filled in. A given trusted function can prevent SQL injection or ensure the well-formedness of a URI.

Conclusion

We dont think accommodating custom interpolators in C# is the sweet spot at this point. Most people are probably just looking for simpler and more readable syntax for filling out holes in strings. However, as we settle on syntax we should keep an eye on our ability to extend for this in the future.

Culture

In .NET theres a choice between rendering values in the current culture or an invariant culture. This determines how common values such as dates and even floating point numbers are shown in text. The default is current culture, which even language-recognized functions such as ToString() make use of.

Current culture is great if what youre producing is meant to be read by humans in the same culture as the program is run in. If you get more ambitious than that with human readers, the next step up is to localize in some more elaborate fashion: looking up resources and whatnot. At that point, you are reaching for heavier hammers than the language itself should probably provide.

Theres an argument that when a string is produced for machine consumption it is better done in the invariant culture. After all, it is quite disruptive to a comma-separated list of floating point values if those values are rendered with commas instead of dots as the decimal point!

Should a string interpolation feature default to current or invariant culture, or maybe provide a choice?

Conclusion

We think this choice has already been made for us, with the language and .NET APIs broadly defaulting to current culture. That is probably the right choice for most quick-and-easy scenarios. If we were to accommodate custom interpolators in the future, there could certainly be one for culture-invariant rendering.

Syntax

The general format is strings with “holes”, the holes containing expressions to be “printed” in that spot. Wed like the syntax to stress the analogy to String.Format as much as possible, and we therefore want to use curly braces {…} in the delimiting of holes. Well return to what exactly goes in the curly braces, but for now there is one central question: how do we know to do string interpolation at all?

There are two approaches we can think of:

  1. Provide new syntax around the holes
  2. Provide new syntax around the string itself

To the first approach, we previously settled on escaping the initial curly brace of each hole to mean this was a string interpolation hole, and the contents should be interpreted as expression syntax:

"Hello, \{name}, you have \{amount} donut{s} left."

Here, name and amount refer to variables in scope, whereas {s} is just part of the text. This has a few drawbacks. It doesnt look that much like a format string, because of the backslash characters in front of the curlies. You also need to visually scan the string to see if it is interpolated. Finally thered be no natural place for us to indicate a custom interpolation function in the future.

An example of the second approach would be to add a prefix to the string to trigger interpolation, e.g.:

$"Hello, {name}, you have {amount} donut\{s\} left."

Now the holes can be expressed with ordinary braces, and just like format strings you have to escape braces to actually get them in the text (though we are eager to use backslash escapes instead of the double braces that format strings use). You can see up front that the string is interpolated, and if we ever add support for custom interpolators, the function can be put immediately before or after the $; whichever we decide:

LOC$"Hello, {name}, you have {amount} donut\{s\} left."
SQL$"…"
URI$"…"

The prefix certainly doesnt have to be a $, but thats the character we like best for it.

We dont actually have to do it with a prefix. JavaScript is going to use back ticks to surround the string. But prefix certainly seems better than yet another kind of string delimiter.

Conclusion

The prefix approach seems better and more future proof. We are happy to use $. It wouldnt compose with the @ sign used for verbatim strings; it would be either one or the other.

Format specifiers

Format strings for String.Format allow various format specifiers in the placeholders introduced by commas and colons. We could certainly allow similar specifiers in interpolated strings. The semantics would be for the compiler to just turn an interpolated string into a call to String.Format, passing along any format specifiers unaltered:

$"Hello, {name}, you have {amount,-3} donut\{s\} left."

This would be translated to

String.Format("Hello, {0}, you have {1,-3} donut{{s}} left.", name, amount)

(Note that formatting of literal curlies needs to change if we want to keep our backslash escape syntax, which, tentatively, we do). The compiler would be free to not call String.Format, if it knows how to do things more optimally. This would typically be the case when there are no format specifiers in the string.

Conclusion

Allow all format specifiers that are allowed in the format strings of String.Format, and just pass them on.

Expressions in the holes

The final important question is which expressions can be put between the curly braces. In principle, we could imagine allowing almost any expression, but it quickly gets weird, both from a readability and from an implementation perspective. What if the expression itself has braces or strings in it? We wouldnt be able to just lex our way past it (when to stop?), and similarly a reader, even with the help of colorization, would get mightily confused about what gets closed out when exactly.

Additionally the choice to allow format specifiers limits the kinds of expressions that can unambiguously precede those.

$"{a as b ?  c : d}" // ?: or nullable type and format specifier?

The other extreme is to allow just a very limited set of expressions. The common case is going to be simple variables anyway, and anything can be expressed by first assigning into variables and then using those in the string.

Conclusion

We want to be quite cautious here, at least to begin with. We can always extend the set of expressions allowed, but for now we want to be close to the restrictive extreme and allow only simple and dotted identifiers.