Add design notes

This commit is contained in:
Mads Torgersen 2018-10-09 16:24:16 -07:00
parent f5d6fb7fa6
commit 349fd796e9
3 changed files with 265 additions and 10 deletions

View file

@ -0,0 +1,61 @@
# C# Language Design Notes for Oct 1, 2018
## Agenda
1. Nullable type inference
2. Type parameters and nullability context
# Nullable type inference
Generic type inference is also used to determine "best common type", e.g. in anonymously typed arrays, finding the inferred return type of lambdas with multiple returns, etc.
Type inference roughly proceeds as follows:
1. Gather up all the types of the constituent expressions (some, e.g. `null`, may not have a type to contribute)
2. Among those, gather the ones to which all the constituent expressions have an appropriate implicit conversion
3. If these are all identity convertible to each other, construct the result from them
This third step is a little cryptic. Most of the time, when types are identity convertible to each other, they are the *same* type. But there are two exceptions in the language today:
- `object` and `dynamic` are identity convertible to each other
- Tuple types with pairwise identity convertible element types are identity convertible, regardless of element names
In step 3 above, in places where the identity convertible types differ by `object` vs `dynamic`, choose dynamic. Where they differ by tuple element names, have the tuple element be unnamed.
This is all relevant to nullable reference types, because we are about to introduce a third way in which non-identical types can be identity convertible to each other:
- Identity conversion between reference types is determined without regard to their nullability (though if the conversion is performed, it may lead to a warning)
This means we need to say how to construct the result of type inference with regard to nullability of reference types. Where the identity convertible types differ by nullability, we'll determine the nullability based on the variance of the type's position:
- **Covariant**: A type is in a covariant position if it is
- a top level type,
- a type argument to a covariant type parameter of a generic type in a covariant position
- a type argument to a contravariant type parameter if a generic type in a contravariant positition
- **Contravariant**: A type is in a contravariant position if it is:
- a type argument to a contravariant type parameter of a generic type in a covariant position
- a type argument to a covariant type parameter if a generic type in a contravariant positition
- **Invariant**: A type is in an invariant position if it is:
- a type argument to an invariant type parameter
- a type argument to a type parameter of a generic type in an invariant position
For a given type position in the result type, we'll always pick among the nullabilities present in that position, with one exception.
- In a covariant position pick nullable if present, otherwise oblivious if present, otherwise nonnullable
- In a contravariant position pick nonnullable if present, otherwise oblivious if present, otherwise nullable
- In an invariant position:
- if both nullable and nonnullable are present, then yield a warning and pick oblivious (this is the one exception)
- otherwise if either nullable or nonnullable is present, pick that one
- otherwise pick oblivious
This leads to nice and symmetric rules, where nullable and nonnullable are treated equally, and oblivious isn't too infectious. As far as we can tell, the rules are associative and can be expressed in a pairwise manner, without causing order dependence. If oblivious had dominated nullable and nonnullable in the invariant case, that would have thwarted associativity.
The one thing that's a little odd is where nullable and nonnullable clash in an invariant position. Ideally this would lead to an error, but we only want to allow nullability to lead to warnings, not errors, so we need to have some answer for what comes out. Oblivious seems the right choice, in that we've already warned that something is wrong, and it will lead to suppression of further warnings caused by that. What's odd about it is that oblivious normally comes from legacy code that's explicitly opted out of the nullability feature. This is the only place where it can occur in new code that is all "opted in".
# Type parameters declared and used in different nullability contexts
Allowing a more granular in-file choice between nullability contexts (whether nullable annotations are on or off) leads to new kinds of situations. For instance, a type parameter can be declared in an "off" context (oblivious to nullability) but used in an "on" context. This is the topic of [Roslyn issue 30214](https://github.com/dotnet/roslyn/issues/30214).
The context where the type parameter is declared determines whether it should be sensitive to the nullability implications of its constraints. In the example in the issue, the type parameters are oblivious, and should not lead to nullability diagnostics, because they are declared in a "legacy" context.

View file

@ -0,0 +1,189 @@
# C# Language Design Notes for Oct 3, 2018
## Agenda
1. How is the nullable context expressed?
2. Async streams - which interface shape?
# Nullability context
In order to accommodate "null-oblivious" legacy source code while it is under transition, we want to allow regional changes to the context for how nullability is handled. There are actually two interesting "nullability contexts":
1. Annotation context: should an unannotated type reference in the context be considered nonnullable or oblivious?
2. Warning context: If null annotations are violated within the context, should a warning be given?
We have learned the hard way that we cannot use regular attributes due to circularities in binding. Essentially, modifying semantics with attributes is a bad idea not just from a "moral" perspective, but, as it turns out, a technical one: you need to do binding to understand the attributes. If the attributes themselves affect binding (as these would), well hmmm.
This is causing us to rethink the context switching experience. We have three general candidate ideas for describing regional changes to the nullability context:
1. A new modifier
2. A "fake" or pseudo-attribute
3. Compiler directives
We don't have even a strawman-level idea for good modifier keywords, so we are going to drop that one from the discussion. For the two others, there are strawman proposals below for the purposes of discussion.
## Pseudo-attributes
The idea is to keep the currently implemented attribute-based user experience, but discover the attributes specially, in an earlier phase of the compiler, rather than through normal binding (where it is too late).
Nullable annotations and warnings are controlled by the same attribute, `[NonNullTypes]`. It has a positional boolean parameter that controls annotations, and defaults to true. It has an additional named boolean parameter that controls warnings and defaults to true.
The attributes override each other hierarchically, and can be applied at the module, type and member levels at least.
Here is an example where nullable annotations are turned off for some members, and warnings are turned off for another member:
``` c#
[module:NonNullTypes]
public class Dictionary<TKey, TValue> : ICollection<KeyValuePair<TKey, TValue>>, IEnumerable<KeyValuePair<TKey, TValue>>, IEnumerable, IDictionary<TKey, TValue>, IReadOnlyCollection<KeyValuePair<TKey, TValue>>, IReadOnlyDictionary<TKey, TValue>, ICollection, IDictionary, IDeserializationCallback, ISerializable
{
public Dictionary() { }
public Dictionary(IDictionary<TKey, TValue> dictionary) { }
public Dictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection) { }
public Dictionary(IEqualityComparer<TKey>? comparer) { }
[NonNullTypes(false)] public Dictionary(int capacity) { }
[NonNullTypes(false)] public Dictionary(IDictionary<TKey, TValue> dictionary, IEqualityComparer<TKey> comparer) { }
[NonNullTypes(false)] public Dictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection, IEqualityComparer<TKey> comparer) { }
public Dictionary(int capacity, IEqualityComparer<TKey>? comparer) { }
[NonNullTypes(warn = false)] protected Dictionary(SerializationInfo info, StreamingContext context) { }
}
```
Some consequences of the attribute being "fake":
- argument expressions would have to be literals
- there's generally a trade off between expressiveness and effort
- maybe we wouldn't even be able to allow the named parameter
- What are implications to the semantic model in the Roslyn API? Should you be able to tell the difference?
## Directives
The idea is to control nullable annotations and warnings as separate `#`-prefixed compiler directives that apply lexically to all source code until undone by another directive:
- `#pragma warning disable null` and `#pragma warning restore null` for turning warnings off and on (simply using the `null` keyword as a special diagnostic name for the existing feature)
- `#nonnull disable` and `#nonnull restore` for turning nullable annotations off and on (reusing the `disable` and `restore` keywords from pragma warnings)
Here is the same example as above, using the directives approach:
``` c#
public class Dictionary<TKey, TValue> : ICollection<KeyValuePair<TKey, TValue>>, IEnumerable<KeyValuePair<TKey, TValue>>, IEnumerable, IDictionary<TKey, TValue>, IReadOnlyCollection<KeyValuePair<TKey, TValue>>, IReadOnlyDictionary<TKey, TValue>, ICollection, IDictionary, IDeserializationCallback, ISerializable
{
public Dictionary() { }
public Dictionary(IDictionary<TKey, TValue> dictionary) { }
public Dictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection) { }
public Dictionary(IEqualityComparer<TKey>? comparer) { }
#nonnull disable
public Dictionary(int capacity) { }
public Dictionary(IDictionary<TKey, TValue> dictionary, IEqualityComparer<TKey> comparer) { }
public Dictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection, IEqualityComparer<TKey> comparer) { }
#nonnull restore
public Dictionary(int capacity, IEqualityComparer<TKey>? comparer) { }
#pragma warning disable null
protected Dictionary(SerializationInfo info, StreamingContext context) { }
#pragma warning restore null
}
```
A consequence:
- Might require a compiler switch for the global opt-in - which we were happy to get rid of when we first adopted the attribute approach.
## Discussion
The purpose of the feature is to toggle *contextual information* for a region of code: a) whether unannotated type references in the region should be interpreted as nonnullable, and b) whether warnings should be yielded for violations of nullable intent within the region.
It is uncommon for attributes to affect context. Modifiers sometimes do (unsafe, checked/unchecked), and directives sometimes do (#pragma warning). Attributes usually directly affect the entity to which they are attached, rather than elements (declarations, expressions, etc.) within it.
It is also uncommon - and usually considered undesirable - for attributes to affect semantics. In fact, it is affecting semantics that is causing the problem in the first place, because attributes also *depend* on semantics.
Syntactically, directives stand out from the syntax, and generally indent to the left margin. They are *about* the source code, not part of it. Attributes are enmeshed with the code itself, and stand out less. Which signal do we want to send? "The rules of the game have changed in this region of source code" vs "This class or member is special with regards to nullability"?
Attributes are syntactically limited in their *granularity* they can only apply to certain nodes in the syntax tree. Directives are free to appear between any two tokens (as long as there are line breaks between them), including at both a smaller scale (in between statements and expressions) and a larger scale (around multiple types or even namespaces) than attributes. They can also modify top-level non-metadata constructs such as using aliases.
Wherever we *infer* obliviousness for a specific declaration, directives would let you make that same declaration explicitly (albeit awkwardly), whereas attributes would generally be unable to.
On the other hand, in practice your desired granularity would often be at the member or type level, where attributes would do just fine. The ability of directives to turn off and then on *at different syntactic nesting levels* is just weird, and hardly useful. Maybe the natural prevention attributes provide against such anomalies is a good thing. Then again, other directives already have this ability, and that doesnt seem to cause trouble in practice.
Inside of method bodies, change of the warning context seems more likely than the annotation context.
Attributes are the means by which we would have to encode the context in metadata regardless. So using attributes in syntax would be more direct than having to come up with a scheme for generating attributes from weirdly placed directives. While changing context around a using alias or in the middle of a member body wouldnt need to have direct metadata effects, the same is not the case for a context change inside, say, a constraints clause. We would need to either invent an encoding scheme, or error on such places.
Directives would more likely allow editor-config integration.
Attributes would maybe complicate the semantic model.
Finally, "pseudo-attributes", recognized specially by the compiler, are a new concept to the language. Is it worth it? In practice, though, the seams are mostly not going to show: a user wont generally need to worry that its not a regular attribute. Conversely, `#nonnull` would be a new directive: Is it worth it?
## Conclusion
Given all this, we are in favor of the directive approach. We will start from the strawman and refine over the coming weeks.
# Async streams
The idea behind async streams is to make them analogous to synchronous `IEnumerable<T>`s, allowing them to be consumed by `foreach` in asynchronous methods, and produced by `yield return`ing asynchronous iterator methods. The natural shape of the `IAsyncEnumerable<T>` interface is therefore simply a straightforwardly "async'ified" version of `IEnumerable<T>`, like this:
``` c#
public interface IAsyncEnumerable<out T>
{
IAsyncEnumerator<T> GetAsyncEnumerator();
}
public interface IAsyncEnumerator<out T> : IAsyncDisposable
{
ValueTask<bool> MoveNextAsync();
T Current { get; }
}
public interface IAsyncDisposable
{
ValueTask DisposeAsync();
}
```
This works well. The main difference (other than `Async` occurring in names) is that `MoveNextAsync` is asynchronous, so that you need to await it to learn if there is a next value, and before `Current` can be assumed to contain that value. Just as the semantics of `foreach` can be described (and *is* described in the language spec) as an expansion to a while loop, `foreach await` is an almost identical expansion, except with some `await`ing going on.
It is also generally quite efficient. The use of `ValueTask` instead of `Task` allows implementers (including the ones produced from iterators by the compiler) to avoid any allocations during iteration time, storing any state needed to track the asynchrony alongside the iteration state in the enumerator object that implements `IAsyncEnumerator<T>` (which can often in turn be shared with the implementation of `IAsyncEnumerable<T>` itself). Thus, the number of allocations needed to iterate an `IAsyncEnumerable<T>` will typically be zero, and occasionally (in the case of concurrent iterations) one (for a second enumerator that can't be shared with the enumerable).
One problem remains, performance-wise, and it is shared with the original synchronous `IEnumerable<T>`: It requires *two* interface calls per loop iteration, whereas with clever tricks that could be brought down to "usually one" in cases where the next value is most often available synchronously. The best `IAsyncEnumerator<T>` design we can come up with that satisfies those properties is the following:
``` c#
public interface IAsyncEnumerator<out T> : IAsyncDisposable
{
ValueTask<bool> WaitForNextAsync();
T TryGetNext(out bool success);
}
```
Here the `TryGetNext` method can be used to loop synchronously through all the readily available values. Only when it yields `false` do we need to fall back to an outer loop that awaits calls to `WaitForNextAsync()` until more data is available.
In the degenerate case where `TryGetNext` always yields `false`, this is not faster: you fall back to the outer loop and always have two interface calls. However, the more often `TryGetNext` yields true, the more often we can stay in the inner loop and skip an interface call.
The interface has several drawbacks, though:
- It is meaningfully different from the synchronous `IAsyncEnumerator<T>`
- `TryGetNext` is unidiomatic, in that it switches its success result and its value result, in order to allow the type parameter to be covariant, making it annoying to manually consume
- The meaning of the two methods is less intuitively clear
- The double loop consumption is more complicated
It's a dilemma between simple and fast. The performance benefit of the fast version can be up to 2x, when most elements are available synchronously, and the work in the loop is small enough to be dominated by the interface calls. But it really is a lot harder to use manually. While that doesn't matter when you use `foreach` and iterators, there are still enough residual scenarios where you really need to produce or consume the interface directly. An example is the implementation of `Zip`.
There are ways to have the two approaches coexist. If we support both of them statically, we'd end up exploding e.g. async LINQ with vast amounts of surface area. That'd be terrible and confusing.
But there's also a dynamic approach. Already today, many LINQ query operators are implemented to do type checks on their inputs to see if they support a faster implementation. For instance, `Count` on an array can just access the `Length` instead of iterating to count the elements.
Similarly we could have a fast async enumerator interface that generally lives under the hood. But implementations of e.g. LINQ methods can type check for the fast interface and use it if applicable. The code generated for foreach could even do that too, though that probably leads to complicated code. We can also generate iterators that implement both interfaces.
## Conclusion
We want to stick with the simple `IEnumerator<T>`-like interface in general. We'll keep the "fast under the hood" option in our back pocket and decide after the first preview whether to apply that, but the surface area that people see and trade in should be the simple, straightforward "port" of `IEnumerator<T>` to "async space".

View file

@ -129,6 +129,21 @@ Triage:
5. `params Span<T>`
6. Nullable reference type features on `Nullable<T>`
## Oct 1, 2018
[C# Language Design Notes for Oct 1, 2018](LDM-2018-10-01.md)
1. Nullable type inference
2. Type parameters and nullability context
## Oct 3, 2018
[C# Language Design Notes for Oct 3, 2018](LDM-2018-10-03.md)
1. How is the nullable context expressed?
2. Async streams - which interface shape?
# Upcoming meetings
## Sep 24, 2018
@ -139,16 +154,6 @@ Triage:
- Warning waves (Jason, Tom)
## Oct 1, 2018
- Nullable type inference with respect to null-oblivious types (Chuck, Fred)
- Null toggle syntax
## Oct 3, 2018
- Null toggle syntax
- Async streams (Steve, Immo)
## Oct 10, 2018
- Pattern matching open questions (Neal)