Add design notes

Mads Torgersen 2018-07-09 17:07:16 -07:00
parent 2fa167f506
commit 962dfeab6c
7 changed files with 386 additions and 0 deletions

# C# Language Design Notes for May 23, 2018
***Warning: These are raw notes, and still need to be cleaned up. Read at your own peril!***
# Working with data
Collections of data with heterogeneous types.
It's *not* objects: the data is decoupled from the operations on it.
- tuples
- anonymous types
- classes
All have a mutability issue, just in different ways.
Modeling enumerated types is complex and doesn't provide exhaustiveness.
Also, you cannot efficiently switch over different types. You do tricks like visitors, abstract kind properties, etc. Performance, correctness and succinctness are all in conflict. Type patterns help, because they look good, but they are still not as efficient or as safe as they could be.
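To make the tension concrete, here is a minimal sketch (the `Shape` hierarchy is invented for the example) of a C# 7 type-pattern switch: it reads well, but nothing checks that the cases are exhaustive, and each case still involves a runtime type test.
``` c#
using System;

abstract class Shape { }
class Circle : Shape { public double Radius; }
class Square : Shape { public double Side; }

static class Geometry
{
    public static double Area(Shape s)
    {
        switch (s)
        {
            case Circle c: return Math.PI * c.Radius * c.Radius;
            case Square q: return q.Side * q.Side;
            // No exhaustiveness checking: adding a new Shape subclass
            // compiles silently and only fails here at runtime.
            default: throw new InvalidOperationException("unknown shape");
        }
    }
}
```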
Technically speaking, mutability and value equality are a dangerous combo on classes. If the object mutates, its equality and hashcode change, meaning you could lose track of them in dictionaries, etc.
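A minimal sketch of that hazard (the `Point` type is invented for the example): the object is added to a hash-based collection, then mutated, and the collection can no longer find it.
``` c#
using System;
using System.Collections.Generic;

var set = new HashSet<Point>();
var p = new Point { X = 1, Y = 2 };
set.Add(p);                          // stored in the bucket for the current hash code
p.X = 42;                            // mutation changes the hash code...
Console.WriteLine(set.Contains(p));  // ...so this prints False: p is "lost"

class Point
{
    public int X, Y;
    public override bool Equals(object o) => o is Point q && q.X == X && q.Y == Y;
    public override int GetHashCode() => (X, Y).GetHashCode();
}
```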
On the other hand, C# is mutable by default: should we bend over backwards to not support this combo?
Object initializers aren't strictly necessary, but they jibe well with `with` expressions and give a less positional view of construction.
Object initializers are really popular today. But it's unclear whether that's because declaration sites don't provide constructors, instead doing the "easy" thing of just offering auto-properties.
Withers: we keep talking about them for readonly data, but they might very well be useful on mutable data as well.
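For reference, a hand-written "wither" on immutable data today looks something like this (a sketch; what any of these proposals would actually generate is still open):
``` c#
class Person
{
    public string Name { get; }
    public int Age { get; }
    public Person(string name, int age) { Name = name; Age = age; }

    // A "wither": a copy with one property replaced; the original is untouched.
    public Person WithAge(int age) => new Person(Name, age);
}

// var older = person.WithAge(person.Age + 1); // `person` itself is unchanged
```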
The "data classes" proposal puts weight on *not* being positional. It could even reorder members alphabetically in generated positional constructs such as constructors.
Separately, there is an idea of "named tuples", which are what have previously been proposed as "records": an evolution story for tuples.
This may be an overload of concepts. It may be that we can view them as aspects of the same feature; one may be an evolution of the other.
Discriminated union syntax: enum classes. It starts very simple, and probably has ways of letting you grow up. That gets messy, though, with the nesting, but we could use `partial` to separate things.
Kind fields must necessarily be unspeakable, and discoverable only by the compiler. Otherwise they can't be efficient.
Records and data classes are extensions of existing constructs. Enum classes are more of a new concept.
We kind of agree on the simple cases. It's how they grow up and fit into the existing world that's fraught with questions.
Interestingly, you sometimes want your "enums" *not* to be exhaustive. This is the case when you expect it to evolve with more cases.

# C# Language Design Notes for May 30, 2018
***Warning: These are raw notes, and still need to be cleaned up. Read at your own peril!***
# Annotating parts of a project
Roslyn starts out with about 2000 warnings. We may want to support nullable annotations for only a part of a project, so that you can gradually ease into them.
This touches on a core question: should there be separate opt-ins for *consuming* nullability (give me warnings, please) and *producing* nullability (consider unannotated reference types to be non-nullable; "URTANN" for short)?
If we have a scoped "URTANN", then turning on warnings for the whole file would still have limited impact, until annotations start becoming abundant.
But we may also want warnings to be turned on in a scoped way, at a certain granularity. Source files? Fine-grained program elements? It might be better not to have this, though, as it comes with the risk of introducing more problems (through annotation) without discovering them (because the consumption side has warnings off).
``` c#
T M(string s) => M2(s); // No warning because s is oblivious
[URTANN]
T M2(string s) => ...
```
The opt-in to warnings could be a "fake" error code that represents a category. We may not need a brand new dedicated command line option, but we're willing to go there.
1. We agree that we are comfortable (for now at least) in opting in to the warnings at the whole project level only
2. We agree that there should be a separate mechanism for opting in/out of non-null annotations (URTANN)
3. We agree that URTANN should be attribute-based, and able to be applied (or unapplied, with a `false` argument) at multiple levels of program elements
Decision:
Let's have `URTANN(true)` implicit by default, unless you have another module-level URTANN. We can maybe auto-generate it if you don't have it already. It should be called `NonNullTypesAttribute(bool)`.
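A sketch of how the attribute might be applied and unapplied at different levels (the attribute name follows the decision above; everything else here is illustrative):
``` c#
[module: NonNullTypes(true)]   // the module-level default

[NonNullTypes(false)]          // opt a legacy class back out
class LegacyCode
{
    [NonNullTypes(true)]       // ...and opt a single member back in
    public string Normalize(string s) => s.Trim();
}
```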

# C# Language Design Notes for Jun 4, 2018
***Warning: These are raw notes, and still need to be cleaned up. Read at your own peril!***
# Nullable flow analysis
``` c#
static void F(object? x)
{
object y = null;
Action f = () => y = x; // warning?
if (x == null) return;
f();
y.ToString(); // warning?
}
```
A proposal is to take the most nullable state the variable can ever be in, and use that state inside a lambda that captures it.
(This is only relevant for top-level nullability, as nested nullability is not tracked through flow.)
There's a slight concern that we could run into cycles, where "the most nullable state" depends on "the most nullable state". We think that it is not going to be an issue.
``` c#
void M(string? x, string? y)
{
    Action f = () => y = x;
    x = null;
    //M1(f);
    x = "";
    M2(f);
    int n = y.Length; // warning?
}
```
It seems a shame that optimizing like this, by caching the delegate, could lead to more nullable warnings. We could imagine tracking delegates through locals and basing the analysis on where they are used (called or passed).
Envisioned safe tightenings:
* Only care about null assignments that happen after the lambda becomes "effective"
* If a lambda goes into a local variable, then it is only "effective" when that local variable is used
* Such a local, when invoked directly, does not depend on future null states
Another option is to be looser about the whole thing and allow there to be holes. Specifically, we could assume that the lambda is executed either where it appears or not at all. The hole is that this does not accurately account for deferred execution.
Thinking about the effect of the lambda above on `y`: it can happen at any time after the lambda. So at any point after that, the null state of `y` would be a "superposition" of null states. So if a lambda makes a variable `MayBeNull`, it would be irreversibly `MayBeNull` for the remainder of the method. *Even right after a null check!* It's on perpetual lockdown!
This seems draconian. It feels stronger than our position on dotted names, which lets us assume that the null state remains stable between observing and using. That suggests we should be at least somewhat loose. We could maybe just assume that the lambda is conditionally executed at the point of capture, and we assume that mutations don't happen later.
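Under that looser rule, the capture site is treated as the (conditional) execution point, and later null checks work again. A sketch:
``` c#
void M(string? x)
{
    string y = "";
    Action f = () => y = x;  // analyzed as if `y = x` may run right here
    y.ToString();            // warning: y is maybe-null after the capture
    if (y != null)
        y.ToString();        // no warning: a later check restores non-null
}
```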
Aside: lambdas that assign captured variables aren't as esoteric as they may seem. For instance, our whole ecosystem around analyzers relies on a recommended pattern that does that.
## Conclusion
We analyze a lambda as if it is conditionally executed only at the point of capture into a delegate. It does rely on order of execution to an uncomfortable degree. Some of the refinements we've considered could be introduced later.
For instance, these two examples would behave differently if `y` comes in non-null and `x` comes in maybe-null:
``` c#
void M1(Action a, ref string s);
M1(() => y = x, ref y);
void M2(ref string s, Action a);
M2(ref y, () => y = x);
```
Side note: There's a similar granularity issue with the nullability attributes, whether they should apply to parameters as soon as that parameter is encountered, or only after the whole argument list.
# Local functions
When a local function is captured into a delegate, we just do the same as for lambdas: assume a conditional execution at the point of capture.
When local functions are called, abstractly speaking we should do the same as for definite assignment, where the *requirements* and the *effect* are inferred for each local function, and then applied in place where the function is called. The difficulty of course comes in when functions are recursive. For definite assignment, we can prove that a recursive analysis terminates, because of the monotonicity of definite assignment (you never become *un*assigned). Can we make a similar argument/proof for nullability?
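As an illustration of inferring requirements and effects (a sketch of the aspiration, not a committed design):
``` c#
void M(string? x)
{
    void EnsureX()
    {
        if (x == null) throw new ArgumentNullException(nameof(x));
    }
    // Inferred effect of EnsureX: on normal return, x is not null.
    EnsureX();
    x.ToString(); // ideally no warning: the inferred effect applies here
}
```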
# Conditional attributes and special annotations
Let's change `EnsuresTrueAttribute` to `AssertsTrueAttribute`, and always take them into account, even when the condition is false.
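Roughly, the idea looks like this (the attribute placement on a parameter is a guess; the notes only name the attribute):
``` c#
[AttributeUsage(AttributeTargets.Parameter)]
class AssertsTrueAttribute : Attribute { }

static class Verify
{
    // Flow analysis would assume `condition` is true after this returns,
    // even when the check doesn't actually run (e.g. in release builds).
    public static void Assert([AssertsTrue] bool condition) { }
}

void M(string? s)
{
    Verify.Assert(s != null);
    s.ToString(); // no warning: the assertion vouches for s
}
```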

# C# Language Design Meeting
***Warning: These are raw notes, and still need to be cleaned up. Read at your own peril!***
## Agenda
Expression trees
# Scenario
## Big data
Bing scenario: sending code for ad hoc queries would be too heavyweight via traditional compilation and shipping of assemblies. Also, it's not just execution but intense analysis, which feeds into where the query ends up running, etc. Introspection.
Today they work around various limitations. Async calls may be part of the query, and there's a lot of code manipulation for that.
## Machine learning and GPU
These are other important scenarios: capturing code for auto-differentiation (ML) or for translation to GPU-executable code.
# Restriction as a feature
Would allowing more nodes remove a useful restriction that people depend on today? For instance, most new nodes we add may not be sensible in EF. We've always had this problem, and it has always been possible to write something that fails at runtime.
Partial solutions:
- provide alternative factories with a pluggable builder pattern, driven by the target type (just like we now do for task-like types)
- offer analyzers that are domain specific.
Provider model: it's probably not that unrealistic to implement your own factory. Plugging it through the language probably requires some extra work, but it seems doable. This would allow static checking of shape, and would give analyzers something to latch onto for further restriction of nodes.
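A very rough sketch of what plugging in a factory might look like, by analogy with task-like types (the attribute, the tree type, and the factory shape are all hypothetical):
``` c#
using System;

// Hypothetical: points the compiler at a factory for this tree type.
[AttributeUsage(AttributeTargets.Class)]
class ExpressionBuilderAttribute : Attribute
{
    public ExpressionBuilderAttribute(Type factoryType) { }
}

[ExpressionBuilder(typeof(MyExprFactory))]
class MyExpr<TDelegate> { }

static class MyExprFactory
{
    // The compiler would emit calls to factory methods like this one instead
    // of the System.Linq.Expressions ones when the target type is MyExpr<T>.
    // A missing factory method would mean "node not supported", which could
    // be diagnosed statically or by a domain-specific analyzer.
    public static MyExpr<T> Lambda<T>(object body, object[] parameters) => new MyExpr<T>();
}
```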
The expression lambda rewriter today is nicely isolated in the Roslyn codebase and only ~1100 lines.
There's a further step we could take, which is to freeze current expression trees in time. Then whoever wants to understand new language features needs to use a new expression type.
# Reduction
Expression trees have been significantly expanded since the language tied into them. There is an extension story, where a node can be reducible. It then offers to translate into other nodes. This helps older providers still work, as well as lets you not have to do the work to implement all the nodes, only the irreducible ones.
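Reduction already exists as the extension mechanism in `System.Linq.Expressions`; a minimal sketch of a reducible custom node:
``` c#
using System;
using System.Linq.Expressions;

// A custom node ("increment an int expression") that old visitors can still
// process, because it reduces to pre-existing node types.
class IncrementExpression : Expression
{
    private readonly Expression _operand; // assumed to be of type int
    public IncrementExpression(Expression operand) => _operand = operand;

    public override ExpressionType NodeType => ExpressionType.Extension;
    public override Type Type => typeof(int);
    public override bool CanReduce => true;

    // Reduce to `operand + 1`, built purely from standard nodes.
    public override Expression Reduce() => Add(_operand, Constant(1));
}
```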
We don't want to overdo reduction. We probably wouldn't reduce async lambdas to trees representing the state machine. That would overly commit us to some very specific implementation choices, and probably also leave performance opportunities on the table.
# Philosophy
It's a big question whether we want to commit to expression trees following the language in perpetuity. But we don't need to commit to that, as long as we feel each upgrade is useful.
# Extending existing nodes
This is not just about adding new nodes, but also expanding existing ones with new expressiveness. We'd have to be careful to keep generating old factory method calls for existing situations.
# Bart's experiment
github.com/bartdesmet/ExpressionFutures/tree
Inherits and shadows `System.Linq.Expressions.Expression`.
Supports `dynamic` and async. It needed some hacks because it is a separate library; if it goes into the BCL it would have access to the right things.
Supports null-conditional, discard.
Statements are generally supported up to C# 6.0.
Since this is a shadow library over `System.Linq.Expressions`, there's an updated Roslyn compiler that also understands the new nodes.
# Plan
What we would like to do:
* Use factory methods as the well-defined interface between compiler and API
* Feel good about incrementally improving without having to do all of the language at once
* Add provider model/builder pattern for other factories
* Get to a prioritized list of specific language constructs to start supporting
* Evolve compiler and API together for specific constructs
* Check in with existing providers such as EF to ensure the compat story is good and their scenarios are taken care of
For now, let's mull it over for a couple of months, pursue information, crisp up the plan. Only after a while will we be able to free up compiler resources.

# C# Language Design Notes for Jun 25, 2018
***Warning: These are raw notes, and still need to be cleaned up. Read at your own peril!***
## Agenda
1. Target-typed new-expressions
# Target-typed new-expressions
## Syntax
``` c#
C c = new (...){ ... };
```
You can leave off either the constructor parameters `(...)` or the initializer `{ ... }`, but not both, just as when the type is given.
## Conversion
This will only work if a) we can determine a unique constructor for `C` through overload resolution, and b) the object/collection initializer binds appropriately.
But are these errors part of *conversion* or part of the expression itself? It doesn't matter in a simple example like this, but it matters in overload resolution.
## Overload resolution
There are two philosophies we can take on what happens when a target-typed new-expression is passed to an overloaded method.
### "late filter" approach
Don't try to weed out overload candidates that won't work with the new-expression, thus possibly causing an ambiguity down the line, or selecting a candidate that won't work. If we make it through, we will do a final check to bind the constructor and object initializer, and if we can't, we'll issue an error.
This reintroduces the notion of "conversion exists with errors" which we just removed in C# 7.3.
### "early filter" approach
Consider arguments to constructor, as well as member names in object initializer, as part of applicability of a given overload. Could even consider conversions to members in object initializer. The question is how far to go.
### Trade-off
The "early filter" approach is more likely to ultimately succeed - it weeds out things that will fail later before they get picked. It does mean that it relies more on the specifics of the chosen target type for overload resolution, so it is more vulnerable to changes to those specifics.
``` c#
struct S1 { public int x; }
struct S2 {}
M(S1 s1);
M(S2 s2);
M(new () { x = 43 }); // ambiguous with late filter, resolved with early. What does the IDE show?
```
Adding constructors to the candidate types can break both models. Adding fields, properties, members called `Add`, implementing `IEnumerable` can all potentially break in the early filter model.
``` c#
M2(Func<S1> f);
M2(Func<S2> f);
M2(() => new () { x = 43 });
S1 Foo() => new () { x = 43 };
```
Even if we did late filtering, this would probably work (i.e. the `S2` overload would fail), because "conversion with error" would give an error in the lambda, which in itself rules out the overload.
We're having a hard time thinking of practical scenarios where the difference really matters. Only if we go to the "extremely early" position where the expression could contribute even to type inference. We've previously considered:
``` c#
M<T>(C<T> c);
M(new C (...) { ... });
```
Where the type arguments to `C` could be left off and inferred from the `new` expression. This would take it a bit further and allow
``` c#
M (new (...) {...});
```
In that same setup, contributing to type inference from the innards of an implicit `new` expression.
## Conclusion
We are good with late checking for now. This does mean that we reintroduce the notion of conversion with errors.
## Breaking change
As mentioned this introduces a new kind of breaking change in source code, where adding a constructor can influence overload resolution where a target-typed new expression is used in the call.
## Unconstructable types
That said, we could define a set of types which can never be target types for `new` expressions. Overloads taking such types are not subject to the worries in the discussion above, where the innards of the `new` expression could potentially affect overload resolution: no implicit `new` expression could ever work for them.
Candidates for unconstructable types:
* Pointer types
* array types
* abstract classes
* interfaces
* enums
Tuples *are* constructable. You can use `ValueTuple` overloads.
Delegates are constructable.
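For instance, under the proposed syntax (a sketch):
``` c#
(int x, int y) point = new(1, 2);          // binds to the ValueTuple constructor
Func<int, int> square = new(n => n * n);   // delegate creation, target-typed
```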
## Nullable value types
Without special treatment, they would only allow the constructors of nullable itself. Not very useful. Should they instead drive constructors of the underlying type?
``` c#
S? s = new () { };
```
### Conclusion
Yes
## Natural type
Target-typed new doesn't have a natural type. In the IDE experience we will drive completion and errors from the target type, offering constructor overloads and members (for object initializers) based on that.
## Stand-alone `new`
Should we allow stand-alone `new` without any type, constructor arguments or initializers?
No. We don't allow `new C` either.
## Dynamic
We don't allow `new dynamic()`, so we shouldn't allow `new()` with `dynamic` as a target type.
For constructor parameters that are `dynamic` there is no new/special problem.

# C# Language Design Notes for Jul 9, 2018
## Agenda (planned)
New `using` features

# C# Language Design Notes for Jul 16, 2018
## Agenda (planned)
Null-related features
1. Null-coalescing assignment (`??=`) https://github.com/dotnet/csharplang/issues/34
* Also in this proposal are `||=` and `&&=`, but it's not clear if they're actually championed.
2. Null-conditional await (`await?`) https://github.com/dotnet/csharplang/issues/35
* As part of this, I think we should also consider other constructs that could benefit. On my list here are `yield` and `foreach`.
3. Null operator support for pointer types https://github.com/dotnet/csharplang/issues/418
* Array dereferences (`p?[a]`), pointer dereferences (`p?->a`), and null coalescing operators (`p ?? q`) are on the list here.
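Rough sketches of the proposed meanings, written as their expansions (none of these operators existed yet, so only the expanded forms compile):
``` c#
using System.Threading.Tasks;

async Task M(string? name, Task? task)
{
    // 1. `name ??= "default";` would be shorthand for:
    if (name == null) name = "default";

    // 2. `await? task;` would be shorthand for:
    if (task != null) await task;

    // 3. For a pointer `p`, `p?->a` and `p?[i]` would similarly short-circuit
    //    to no-ops when `p` is null, and `p ?? q` would pick `q` in that case.
}
```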