csharplang/meetings/2021/LDM-2021-06-02.md
2021-07-27 12:34:38 -07:00

8.8 KiB

C# Language Design Meeting for June 2nd, 2021

Agenda

  1. Enhanced #line directives
  2. Lambda return type parsing
  3. Records with circular references

Quote of the Day

  • If you make it .., does that mean the end is included or not?

Discussion

Enhanced #line directives

https://github.com/dotnet/csharplang/issues/4747

This proposal covers a set of enhancements for the #line directive that solve several problems that the razor team has had over the years with the existing directive causing mismapping of code in the debugger. While the razor team is the main user and driver of this scenario, these enhancements will help any DSL authors that embed C# in their code and want to have a debugging experience. The issue today is that nested expressions don't have sequence points emitted correctly, and putting the expression on a newline is often not a sufficient solution to ensure that they end up correctly represented.

We brainstormed a few different ways to determine the span of an expression for this mapping. Inherently, the enhanced directive needs to be able to specify the length of the mapped line, because users might end up putting multiple separate fragments of C# code on the same line in the original text file. To be able to represent that accurately, we therefore need to ensure that the exact length is represented. We considered whether we could do this via the total length of the mapped expression, rather than specifying the end line/column of the directive. However, there are a few concerns with that approach:

  1. Files are often written and checked into source control with different encodings/line endings, most commonly differing between \r\n and \n for line endings. For maps that span multiple lines, therefore, the total character count could end up being wrong after compilation depending on what system created the mapping and what built it.
  2. For humans attempting to author and debug generators that use these directives, line and column are easier to map to the real code that they're writing. Every editor includes this information when editing a file based on the current cursor location. This makes understanding what is happening easier.

We also considered how the line and column information is defined. Today, C# in general defines specific characters that are newlines (in part to ensure that the existing #line directives are well-defined). This doesn't change with the new proposal, and some editors don't always agree on what constitutes a new line (such as vertical tab), this proposal doesn't change these inconsistencies. For character offsets, we define it as UTF-16 characters, following with the rest of the .NET ecosystem here. If the ecosystem ever wants to attempt to change to a different encoding, it will be a much larger effort and adjusting the semantics of #line should be trivial by comparison.

Finally, we looked at a few different syntax options for the directive, as we felt that just 5 numbers next to each other was unnecessarily difficult to read:

  1. #line (10,20)-(11,22) 3 "a.razor"
  2. #line (10:20)-(11:22) 3 "a.razor"
  3. #line 10:20-11:22 3 "a.razor"
  4. #line 10,20-11,22 3 "a.razor"
  5. #line (10,20)-(11,22):3 "a.razor"
  6. #line (10,20) (11,22) 3 "a.razor"
  7. #line (10,20):(11,22) 3 "a.razor"
  8. #line (10,20)..(11,22) 3 "a.razor"

We think options without the parens are a bit confusing, because - binds more tightly than , or : in c#, making it look like 10 , (20-11) , 22. We also think that , is the more common line/column separator. Given this, we'll go with option 1.

Conclusions

Existing proposal is accepted with the following clarifications:

  1. Column info is described as UTF-16 characters.
  2. The syntax is #line (number , number) - (number , number) number string.

Lambda return type parsing

https://github.com/dotnet/csharplang/blob/main/proposals/csharp-10.0/lambda-improvements.md

In a previous LDM, we decided that lambda return types should go before the parameter list, as in ReturnType (ParamType param) => { ... }. This has presented some challenging parsing scenarios.

F(b ? () => a : c); // Conditional expression or lambda that returns 'b?'?

This issue is very similar to generics and greater-than/less-than parsing, and can benefit from the same solution: we specifically define a set of characters after the lambda body in the language and use them to determine, at that point, whether the expression is a ternary or a lambda with a return type. For example, if token after the lambda is a ), it couldn't have been a ternary. This will take effort, but should be doable.

Another parsing problem comes from multiplication:

F(x * () => y) // Is it multiplication, or a lambda with a return type of x*?

Here, our saving grace is that * binds more tightly than the lambda expression today, so this code does not parse. This should allow us to disambiguate the cases here. Again, this will take effort, but should be doable.

While considering these cases, we also thought about lambda naming as a potential easier disambiguation point. We could allow lambdas to be named (perhaps just with a _ to start), and require them to have a _ to use the return type. Further, we need to consider this now because the identifier () => {} syntax can only mean one thing: identifier will either need to be the return type, or the name of the lambda. As we continue to narrow the gap between lambda expressions and local functions, it stands to reason that we may at some point want to give them names (despite the phrase "named lambda" being a bit of an oxymoron). Further, choosing to make the single-identifier case mean the type instead of the name breaks with the precedent we have around lambda parameters, where the name can be specified without the parameter type.
On the other hand, we do feel that the return type is the more common thing to want to specify for a lambda. While named lambdas could be useful for recursive scenarios, we don't see this as a driving need, while the return type is explicitly being driven to enable .NET 6 scenarios. Additionally, if we allowed lambda names for recursion we'd sign ourselves up for recursive type inference, as we'd have to start inferring the return type of lambdas that can call themselves. By saying that the single-identifier case is for specifying the return type, we separate that out into a different feature, and we have a natural type to allow as the return type of the lambda for this case: var. In order to protect this design space, we'll disallow var as the explicit return type of a lambda expression.

Conclusion

We think the existing difficulties should be possible to parse, with a bit of effort. We'll reserve var as an explicit return type to protect our design space around recursive type inference in lambdas, if we ever decide to try and tackle that challenge.

Records with circular references

https://github.com/dotnet/roslyn/issues/48646

In the brief time left at the end of the meeting, we wanted to re-examine our previous position around circularity in record types. There are several ways that users can end up with circular data structures in records, in both obvious (and detectable) ways of direct mutable recursion, such as in a doubly-linked list, and indirectly, via nested mutation. Our initial position in records was that this isn't something we want to try and solve in the language, but could be solved via source generators. There are a number of points where users might want to customize aspects of record code generation, such as using a different equality comparison option for strings, or doing sequence/set equality for collections. The question we want to answer, therefore, is whether we should move the cliff to generators up a bit further and add a method of excluding specific fields from record semantics to the language, and whether we should add a new warning wave warning to inform users when they're setting themselves up for direct recursion issues. This warning can't be perfect, however: it cannot catch mutually-recursive types where one type has a mutable field of the other type, particularly when fields of a non-sealed type are involved. For example, record A has a mutable field of type IDisposable. Record B has an immutable field of type A and implements IDisposable. A is constructed, then B is constructed with that instance of A, then A's mutable field is set to B. Unless we warn on every mutable field of a non-sealed type, that case is impossible to detect, and we're cautious of overwarning and then causing users to miss real issues.

Conclusion

A smaller working group will meet to explore this in more depth, and make a recommendation to the LDM about the approach we should take.