# C# Language Design Notes for Oct 3, 2018

## Agenda

  1. How is the nullable context expressed?
  2. Async streams - which interface shape?

## Nullability context

In order to accommodate "null-oblivious" legacy source code while it is under transition, we want to allow regional changes to the context for how nullability is handled. There are actually two interesting "nullability contexts":

  1. Annotation context: should an unannotated type reference in the context be considered nonnullable or oblivious?
  2. Warning context: If null annotations are violated within the context, should a warning be given?

We have learned the hard way that we cannot use regular attributes, due to circularities in binding. Essentially, modifying semantics with attributes is a bad idea not just from a "moral" perspective but, as it turns out, a technical one: you need to do binding to understand the attributes. If the attributes themselves affect binding (as these would), you have a circularity.

This is causing us to rethink the context switching experience. We have three general candidate ideas for describing regional changes to the nullability context:

  1. A new modifier
  2. A "fake" or pseudo-attribute
  3. Compiler directives

We don't have even a strawman-level idea for good modifier keywords, so we are going to drop that one from the discussion. For the two others, there are strawman proposals below for the purposes of discussion.

### Pseudo-attributes

The idea is to keep the currently implemented attribute-based user experience, but discover the attributes specially, in an earlier phase of the compiler, rather than through normal binding (where it is too late).

Nullable annotations and warnings are controlled by the same attribute, `[NonNullTypes]`. It has a positional boolean parameter that controls annotations and defaults to `true`. It has an additional named boolean parameter that controls warnings and defaults to `true`.

The attributes override each other hierarchically, and can be applied at least at the module, type, and member levels.

Here is an example where nullable annotations are turned off for some members, and warnings are turned off for another member:

```csharp
[module:NonNullTypes]

public class Dictionary<TKey, TValue> : ICollection<KeyValuePair<TKey, TValue>>, IEnumerable<KeyValuePair<TKey, TValue>>, IEnumerable, IDictionary<TKey, TValue>, IReadOnlyCollection<KeyValuePair<TKey, TValue>>, IReadOnlyDictionary<TKey, TValue>, ICollection, IDictionary, IDeserializationCallback, ISerializable
{
    public Dictionary() { }
    public Dictionary(IDictionary<TKey, TValue> dictionary) { }
    public Dictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection) { }
    public Dictionary(IEqualityComparer<TKey>? comparer) { }
    [NonNullTypes(false)] public Dictionary(int capacity) { }
    [NonNullTypes(false)] public Dictionary(IDictionary<TKey, TValue> dictionary, IEqualityComparer<TKey> comparer) { }
    [NonNullTypes(false)] public Dictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection, IEqualityComparer<TKey> comparer) { }
    public Dictionary(int capacity, IEqualityComparer<TKey>? comparer) { }
    [NonNullTypes(warn = false)] protected Dictionary(SerializationInfo info, StreamingContext context) { }
}
```

Some consequences of the attribute being "fake":

- Argument expressions would have to be literals.
- There's generally a trade-off between expressiveness and effort.
- Maybe we wouldn't even be able to allow the named parameter.
- What are the implications for the semantic model in the Roslyn API? Should you be able to tell the difference?

### Directives

The idea is to control nullable annotations and warnings as separate `#`-prefixed compiler directives that apply lexically to all source code until undone by another directive:

- `#pragma warning disable null` and `#pragma warning restore null` for turning warnings off and on (simply using the `null` keyword as a special diagnostic name for the existing feature)
- `#nonnull disable` and `#nonnull restore` for turning nullable annotations off and on (reusing the `disable` and `restore` keywords from pragma warnings)

Here is the same example as above, using the directives approach:

```csharp
public class Dictionary<TKey, TValue> : ICollection<KeyValuePair<TKey, TValue>>, IEnumerable<KeyValuePair<TKey, TValue>>, IEnumerable, IDictionary<TKey, TValue>, IReadOnlyCollection<KeyValuePair<TKey, TValue>>, IReadOnlyDictionary<TKey, TValue>, ICollection, IDictionary, IDeserializationCallback, ISerializable
{
    public Dictionary() { }
    public Dictionary(IDictionary<TKey, TValue> dictionary) { }
    public Dictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection) { }
    public Dictionary(IEqualityComparer<TKey>? comparer) { }

#nonnull disable
    public Dictionary(int capacity) { }
    public Dictionary(IDictionary<TKey, TValue> dictionary, IEqualityComparer<TKey> comparer) { }
    public Dictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection, IEqualityComparer<TKey> comparer) { }
#nonnull restore

    public Dictionary(int capacity, IEqualityComparer<TKey>? comparer) { }

#pragma warning disable null
    protected Dictionary(SerializationInfo info, StreamingContext context) { }
#pragma warning restore null
}
```

A consequence:

- It might require a compiler switch for the global opt-in, which we were happy to get rid of when we first adopted the attribute approach.

### Discussion

The purpose of the feature is to toggle contextual information for a region of code: a) whether unannotated type references in the region should be interpreted as nonnullable, and b) whether warnings should be issued for violations of nullable intent within the region.

It is uncommon for attributes to affect context. Modifiers sometimes do (`unsafe`, `checked`/`unchecked`), and directives sometimes do (`#pragma warning`). Attributes usually directly affect the entity to which they are attached, rather than elements (declarations, expressions, etc.) within it.

It is also uncommon - and usually considered undesirable - for attributes to affect semantics. In fact, it is affecting semantics that is causing the problem in the first place, because attributes also depend on semantics.

Syntactically, directives stand out from the surrounding code and are generally indented to the left margin. They are about the source code, not part of it. Attributes are enmeshed with the code itself and stand out less. Which signal do we want to send: "the rules of the game have changed in this region of source code", or "this class or member is special with regards to nullability"?

Attributes are syntactically limited in their granularity: they can only apply to certain nodes in the syntax tree. Directives are free to appear between any two tokens (as long as there are line breaks between them), including at both a smaller scale (in between statements and expressions) and a larger scale (around multiple types or even namespaces) than attributes. They can also modify top-level non-metadata constructs such as using aliases.

Wherever we infer obliviousness for a specific declaration, directives would let you make that same declaration explicitly (albeit awkwardly), whereas attributes would generally be unable to.

On the other hand, in practice your desired granularity would often be at the member or type level, where attributes would do just fine. The ability of directives to turn off and then on at different syntactic nesting levels is just weird, and hardly useful. Maybe the natural prevention attributes provide against such anomalies is a good thing. Then again, other directives already have this ability, and that doesn't seem to cause trouble in practice.

Inside of method bodies, change of the warning context seems more likely than the annotation context.
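
For example, with the strawman directives above, suppressing nullable warnings for a couple of statements in a body might look like this (a sketch; the `Deserialize` method and its body are invented for illustration):

```csharp
void Deserialize(string? json)
{
#pragma warning disable null
    string payload = json; // no nullable warning in this region
#pragma warning restore null
    // ...
}
```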

Attributes are the means by which we would have to encode the context in metadata regardless. So using attributes in syntax would be more direct than having to come up with a scheme for generating attributes from weirdly placed directives. While changing context around a using alias or in the middle of a member body wouldn't need to have direct metadata effects, the same is not the case for a context change inside, say, a constraints clause. We would need to either invent an encoding scheme or give an error in such places.

Directives would more likely allow EditorConfig integration.

Attributes would maybe complicate the semantic model.

Finally, "pseudo-attributes", recognized specially by the compiler, are a new concept to the language. Is it worth it? In practice, though, the seams are mostly not going to show: a user wont generally need to worry that its not a regular attribute. Conversely, #nonnull would be a new directive: Is it worth it?

### Conclusion

Given all this, we are in favor of the directive approach. We will start from the strawman and refine over the coming weeks.

## Async streams

The idea behind async streams is to make them analogous to synchronous `IEnumerable<T>`s, allowing them to be consumed by `foreach` in asynchronous methods, and produced by `yield return`ing asynchronous iterator methods. The natural shape of the `IAsyncEnumerable<T>` interface is therefore simply a straightforwardly "async'ified" version of `IEnumerable<T>`, like this:

```csharp
public interface IAsyncEnumerable<out T>
{
    IAsyncEnumerator<T> GetAsyncEnumerator();
}

public interface IAsyncEnumerator<out T> : IAsyncDisposable
{
    ValueTask<bool> MoveNextAsync();
    T Current { get; }
}

public interface IAsyncDisposable
{
    ValueTask DisposeAsync();
}
```

This works well. The main difference (other than `Async` occurring in names) is that `MoveNextAsync` is asynchronous, so you need to await it to learn whether there is a next value, and before `Current` can be assumed to contain that value. Just as the semantics of `foreach` can be described (and is described in the language spec) as an expansion to a `while` loop, `foreach await` is an almost identical expansion, except with some awaiting going on.
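
A rough sketch of that expansion (illustrative only; the exact lowering, including pattern-based binding and struct enumerators, is not pinned down here):

```csharp
// foreach await (T item in source) { ...body... } expands roughly to:
IAsyncEnumerator<T> e = source.GetAsyncEnumerator();
try
{
    while (await e.MoveNextAsync())
    {
        T item = e.Current;
        // ...body...
    }
}
finally
{
    await e.DisposeAsync();
}
```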

It is also generally quite efficient. The use of `ValueTask` instead of `Task` allows implementers (including the ones produced from iterators by the compiler) to avoid any allocations during iteration, storing any state needed to track the asynchrony alongside the iteration state in the enumerator object that implements `IAsyncEnumerator<T>` (which can often in turn be shared with the implementation of `IAsyncEnumerable<T>` itself). Thus, the number of allocations needed to iterate an `IAsyncEnumerable<T>` will typically be zero, and occasionally (in the case of concurrent iterations) one, for a second enumerator that can't be shared with the enumerable.
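
To make the sharing concrete, here is a minimal hand-written sketch (the `CountdownAsync` type is invented for illustration):

```csharp
// One object serves as both the enumerable and, for the first (non-concurrent)
// iteration, the enumerator, so a typical iteration allocates nothing extra.
class CountdownAsync : IAsyncEnumerable<int>, IAsyncEnumerator<int>
{
    private int _current = 3;
    private bool _enumeratorInUse;

    public IAsyncEnumerator<int> GetAsyncEnumerator()
    {
        if (!_enumeratorInUse) { _enumeratorInUse = true; return this; }
        return new CountdownAsync(); // concurrent iteration: one extra allocation
    }

    public int Current => _current;

    // Completing synchronously through ValueTask<bool> avoids allocating a Task.
    public ValueTask<bool> MoveNextAsync() => new ValueTask<bool>(--_current >= 0);

    public ValueTask DisposeAsync() => default;
}
```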

One problem remains, performance-wise, and it is shared with the original synchronous `IEnumerable<T>`: it requires two interface calls per loop iteration, whereas with clever tricks that could be brought down to "usually one" in cases where the next value is most often available synchronously. The best `IAsyncEnumerator<T>` design we can come up with that satisfies those properties is the following:

```csharp
public interface IAsyncEnumerator<out T> : IAsyncDisposable
{
    ValueTask<bool> WaitForNextAsync();
    T TryGetNext(out bool success);
}
```

Here the `TryGetNext` method can be used to loop synchronously through all the readily available values. Only when it yields `false` do we need to fall back to an outer loop that awaits calls to `WaitForNextAsync()` until more data is available.
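
Given an enumerator `e` of this shape, manual consumption looks roughly like this (`Use` stands in for the loop body):

```csharp
// Outer loop: await until more data is available (or the stream ends).
while (await e.WaitForNextAsync())
{
    // Inner loop: drain synchronously available values without awaiting.
    while (true)
    {
        T item = e.TryGetNext(out bool success);
        if (!success) break;
        Use(item);
    }
}
```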

In the degenerate case where `TryGetNext` always yields `false`, this is not faster: you fall back to the outer loop and always have two interface calls. However, the more often `TryGetNext` yields `true`, the more often we can stay in the inner loop and skip an interface call.

The interface has several drawbacks, though:

- It is meaningfully different from the synchronous `IEnumerator<T>`.
- `TryGetNext` is unidiomatic: it swaps its success result and its value result (to allow the type parameter to remain covariant), which makes it annoying to consume manually.
- The meaning of the two methods is less intuitively clear.
- The double-loop consumption pattern is more complicated.

It's a dilemma between simple and fast. The performance benefit of the fast version can be up to 2x, when most elements are available synchronously and the work in the loop is small enough to be dominated by the interface calls. But it really is a lot harder to use manually. While that doesn't matter when you use `foreach` and iterators, there are still enough residual scenarios where you really need to produce or consume the interface directly. An example is the implementation of `Zip`, sketched below.
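
With the simple interface, `Zip` has to drive two enumerators by hand, since `foreach` only advances one stream at a time. A minimal sketch, assuming async iterator methods work as proposed:

```csharp
static class AsyncZip
{
    public static async IAsyncEnumerable<(T1, T2)> Zip<T1, T2>(
        IAsyncEnumerable<T1> first, IAsyncEnumerable<T2> second)
    {
        IAsyncEnumerator<T1> e1 = first.GetAsyncEnumerator();
        IAsyncEnumerator<T2> e2 = second.GetAsyncEnumerator();
        try
        {
            // Advance both streams in lockstep; stop when either runs out.
            while (await e1.MoveNextAsync() && await e2.MoveNextAsync())
                yield return (e1.Current, e2.Current);
        }
        finally
        {
            await e1.DisposeAsync();
            await e2.DisposeAsync();
        }
    }
}
```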

There are ways to have the two approaches coexist. If we support both of them statically, we'd end up exploding e.g. async LINQ with vast amounts of surface area. That'd be terrible and confusing.

But there's also a dynamic approach. Already today, many LINQ query operators are implemented to do type checks on their inputs to see if they support a faster implementation. For instance, `Count` on an array can just access the `Length` instead of iterating to count the elements.

Similarly, we could have a fast async enumerator interface that generally lives under the hood, but that implementations of e.g. LINQ methods can type-check for and use when applicable. The code generated for `foreach` could even do that too, though that probably leads to complicated code. We can also generate iterators that implement both interfaces.
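
For instance, a `Count` operator might look like this (`IFastAsyncEnumerator<T>` is a hypothetical name for the under-the-hood interface, here assumed to extend the simple one):

```csharp
// Hypothetical "under the hood" fast interface (name invented), with the
// fast shape discussed above:
public interface IFastAsyncEnumerator<out T> : IAsyncEnumerator<T>
{
    ValueTask<bool> WaitForNextAsync();
    T TryGetNext(out bool success);
}

static class AsyncCount
{
    public static async ValueTask<int> CountAsync<T>(this IAsyncEnumerable<T> source)
    {
        IAsyncEnumerator<T> e = source.GetAsyncEnumerator();
        try
        {
            int count = 0;
            if (e is IFastAsyncEnumerator<T> fast)
            {
                // Fast path: count synchronously available values without awaiting.
                while (await fast.WaitForNextAsync())
                {
                    while (true)
                    {
                        fast.TryGetNext(out bool success);
                        if (!success) break;
                        count++;
                    }
                }
            }
            else
            {
                // Simple path: one awaited interface call per element.
                while (await e.MoveNextAsync()) count++;
            }
            return count;
        }
        finally
        {
            await e.DisposeAsync();
        }
    }
}
```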

### Conclusion

We want to stick with the simple `IEnumerator<T>`-like interface in general. We'll keep the "fast under the hood" option in our back pocket and decide after the first preview whether to apply it, but the surface area that people see and trade in should be the simple, straightforward "port" of `IEnumerator<T>` to "async space".