csharplang/meetings/2021/LDM-2021-05-17.md
2021-05-18 16:27:15 -07:00

8 KiB

C# Language Design Meeting for May 17th, 2021

Agenda

  1. Raw string literals

Quote(s) of the Day

  • "You have a mongolian vowel separator" "Dear god"
  • "I was expecting [redacted] to flip the table and say I'm out of here" "When you flip the table at home, how do you clean that up?" "Exactly, I don't want to have to clean that up" "Wait wait, that sounds like a new teams feature"
  • "Pretty easy to interpolate those results" "Is that a correct use of the word interpolate?" "No"

Discussion

https://github.com/dotnet/csharplang/issues/4304

Today we took a look through the proposal for raw string literals. This would add a new type of string literal to C#, specifically for working with embedded language scenarios that are difficult to work with in C# today. A number of other languages have added similar features, though our colleagues on Java are perhaps the most direct inspiration for this proposal (particularly with the whitespace trimming feature).

Overall, the LDM is universally in favor of the feature. Some details still remain to be worked out, but we believe this is a feature that will benefit C#, despite the complexity in adding yet another string literal format.

Single-line vs multi-line

The proposal suggests both single-line and multi-line forms for this new literal format. While multi-line literals are the obvious headline feature here, we do think there's a use-case for a single-line form: embedded languages like regex are often single-line, used inline in a method call or similar. They suffer from all the same problems as other embedded languages today (frequent need for quotes, but using quotes is both hard and results in hard-to-read code).

Conclusion

We would like a single-line version.

Whitespace trimming

Verbatim string literals today have behavior that make them somewhat unpleasant for multiline strings because, if whitespace is not desired at the start of the line, the literal content has to be fully start-aligned in the file. This leads to a zig-zag code pattern in the file which breaks up the flow of the file and can make subsequent indentation hard to judge. Trimming solves this by removing whitespace from the start of every line, determined by the whitespace preceding the final quotes. This feature has a couple of levers we can tune to increase or decrease the requirements on the user, namely around handling blank lines. The proposal currently says that blank lines must have a prefix of the removed whitespace on the line. That means that, if the whitespace to be removed is <tab> <tab> <space>, a completely empty line is fine, as is a blank line with <tab> <tab>. Both are prefixes of the whitespace to be removed. However, a line with <tab> <space> is not fine, because that is not a prefix of the whitespace to be removed. We could make this more strict by saying that a blank line must either have the full whitespace to be removed, or no content at all. While this is more strict, it could end up being a better user experience: whitespace is, by nature, hard to visualize, and providing a good diagnostic experience around "this whitespace isn't a prefix of that whitespace" is not something we're eager to tackle. On the other hand, trailing whitespace is pretty easy to accidentally insert today, and we don't make that a compile error by default anywhere else.

Conclusion

We'll go with the stricter behavior for now: blank lines must either contain the entire whitespace being removed, or no text at all. We can relax this later if it makes the experience better.

Language fences

We also looked at allowing a language fence in literals, similar to how markdown works. In multiline markdown strings, ```identifier marks the code block as having a specific language, which the markdown renderer can use to render the text inline. In C# string literals today, there's fragmented support for this implemented in a number of tools: VS, for example, detects when a string is being passed directly to a regex API, and also has support for a comment syntax to turn on highlighting for different locations. We could standardize this for C# raw string literals: the proposal is specifically worded such that text is required to start on the line after the open quotes to allow us to include this feature in the future. One immediate question, however, is how would we support this for single-line literals. Our existing syntax highlighting support is specifically for regex, which is the exact use case for the single-line version, but the language specifier doesn't work there. We could potentially support a trailing language specifier for this case, such as """.*"?"""regex, but it would limit the number of things that can be put in the space.

Conclusion

We're mixed on language fences, leaning against supporting them for now. More debate is needed.

Interpolation

This proposal didn't originally have interpolation, but after a large pushback from the community, interpolation was added. Because the goal of the proposal is to represent all strings without escaping, the immediate next question is how we represent interpolation holes without requiring escaping of braces. The proposal suggests we use the number of $s in front of the raw string to represent the number of braces required to trigger an interpolation hole: $$"""{}""" would be the string {}, because {{}} is needed to be counted as an interpolation hole. IDE experience is going to be very important here: context-sensitive interpolation holes are going to be somewhat harder to keep track of, and refactorings to add additional braces to an existing string if needed will be very helpful for users to avoid tedious and potentially error-prone manual changes when suddenly the user needs to use the existing number of braces as an actual string component.

We also considered a slightly different form, $"""{{, where the number of braces after the triple quotes controls how interpolations work. This form, while providing a more direct representation of the number of curlies required, doesn't work for single-line strings and cannot be applied to all interpolated strings. We further thought about using the number of quotes to control both the number of braces required for interpolation holes and the number of quotes required to close the string; while this would work for the single-line form, it would require that all interpolation holes are a minimum of 3 braces, but most scenarios we can think of either don't need braces, or only need to represent a single open/close brace. It also cannot be extended to all interpolated strings.

Conclusion

We want interpolation, and we're ok with using the number of $s to control the number of braces.

Alternative Quotes

We considered whether to use ``` or ''' instead of or in addition to """. The ` symbol is concerning because it can be hard for non-English keyboard layouts to hit; while there are symbols in C# today that are already difficult for these layouts to hit, we don't want to deliberately introduce more pain for these users. We also don't like the complexity of either having multiple symbols that can start strings in C#: this proposal is already adding an axis of complexity (for gain we feel is worth it), but we don't think the additional axes of complexity is worth the tradeoff here. It does mean that the single-line version cannot represent a string that starts with a ", but we think this is an OK tradeoff.

Conclusion

Quotes are the only way to start strings. Users that need to start a string with a quote must use the multi-line version.

Miscellaneous conclusions

Even though we could support parsing long strings of quotes on a non-blank line inside a raw string literal, we will require that if a user wants to use 4 quotes in a string, the raw string delimiters must be at least 5 quotes long.

Strings like this are supported:

var z = $$"""
{{{1 + 1}}}
""";

The innermost braces are the interpolation holes, the resulting value here would be {2}.