Add new "nominal" records proposal

Andy Gocke 2019-07-18 21:27:18 -07:00
parent 6a09af1205
commit 856c335cc5

proposals/recordsv2.md

@ -0,0 +1,354 @@
# Records v2
In the past we've thought about records as a feature to enable working with data.
"Working with data" is a big group with a number of facets, so it may be interesting to look at
each in isolation. Let's start by looking at an example of records today and some of its drawbacks.
For instance, a simple record would be defined today as follows:
```C#
public class UserInfo
{
    public string Username { get; set; }
    public string Email { get; set; }
    public bool IsAdmin { get; set; } = false;
}
```
and the usage would read:
```C#
void M()
{
    var userInfo = new UserInfo()
    {
        Username = "andy",
        Email = "angocke@microsoft.com",
        IsAdmin = true
    };
}
```
There are significant advantages in this code:
1. The definition is version resilient: properties can easily be added or moved.
2. Properties can be set in any order, and the names in the initialization always
match the accessors.
3. Properties with default values can simply be skipped.

The first flaw is that the properties must now be mutable.
# Mutability
What we'd like is for C# to provide a way to set a `readonly` member in object initializers.
Since some types may not have been designed with this initialization in mind, we'd also like
it to be opt-in.
The proposed solution is a new modifier, `initonly`, that can be applied to
properties and fields:
```C#
public class UserInfo
{
    public initonly string Username { get; }
    public initonly string Email { get; }
    public initonly bool IsAdmin { get; } = false;
}
```
The codegen for this is surprisingly straightforward: we just set the readonly field.
Specifically, the lowered properties would look like:
```C#
public class UserInfo
{
    private readonly string <Backing>_username;
    public string get_Username() => <Backing>_username;
    [return: modreq(initonly)]
    public void set_Username(string value) { <Backing>_username = value; }
    ...
}
```
The CLR considers setting readonly fields to be unverifiable, but not unsafe. To support
a more advanced verifier, the following rule is proposed: a readonly field can be modified
only inside `initonly` methods, or on a new object that is on the CLR stack and has not been
published via a store or method call.
This should solve many of the problems with mutability in the `UserInfo` example, while not
requiring complicated or brittle emit strategies. However, immutability does present a new
problem: easily constructing an object with changes.
# With-ing
When programming with immutability, making changes to an object is done by constructing a
copy with changes instead of making the changes directly on the object. Unfortunately, there's
no convenient way to do this in C#, even with current-style records. It's been previously
proposed that some kind of autogenerated "With" method be provided for records that implements
that functionality. If we have such a mechanism, it's important that it work with `initonly`
members. To achieve this, it's proposed that we add a `with` expression, analogous to an object
initializer. Sample usage would be as follows:
```C#
var userInfo = new UserInfo()
{
    Username = "andy",
    Email = "angocke@microsoft.com",
    IsAdmin = true
};
var newUserName = userInfo with { Username = "angocke" };
```
The resulting `newUserName` object would be a copy of `userInfo`, with `Username` set to "angocke".
The codegen on the `with` expression would also be similar to the object initializer: a new object
is constructed, and then the `initonly` `Username` setter would be called in the method body.
Of course, the difference here is that the new object being constructed is not a simple new object
creation, it is a duplicate of the original object. To provide this functionality, we require that
the object provide a "With constructor" that provides a duplicate object. A sample `With` constructor
would look like:
```C#
class UserInfo
{
    ...
    [WithConstructor] // placeholder syntax, up for debate
    public UserInfo With()
    {
        return new UserInfo() { Username = this.Username, Email = this.Email, IsAdmin = this.IsAdmin };
    }
}
```
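As a rough illustration of the copy-then-set sequence described above, the following sketch emulates it in C# without any of the proposed features. Since `initonly` does not exist yet, `private` setters stand in for it, and the helper `WithUsername` is a hypothetical, hand-written substitute for the code the compiler would emit for `userInfo with { Username = ... }`. Neither is part of the proposal.

```C#
using System;

public class UserInfo
{
    // Assumption: private setters stand in for the proposed `initonly` members.
    public string Username { get; private set; }
    public string Email { get; private set; }
    public bool IsAdmin { get; private set; }

    public UserInfo(string username, string email, bool isAdmin = false)
    {
        Username = username;
        Email = email;
        IsAdmin = isAdmin;
    }

    // Plays the role of the "With constructor": returns a fresh duplicate.
    public UserInfo With() => new UserInfo(Username, Email, IsAdmin);

    // Hypothetical helper approximating `userInfo with { Username = value }`:
    // duplicate the object first, then set the member on the copy
    // before the copy is handed to any other code.
    public UserInfo WithUsername(string value)
    {
        var copy = With();
        copy.Username = value;
        return copy;
    }
}
```

The original object is left untouched; only the unpublished copy is modified, which is the property the verification rule relies on.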
Notably, the `with` expression will set `initonly` members, just like the object initializer, so to
support verification we must ensure that the object cannot have been published before the `initonly`
members are set. To enforce this, the `WithConstructor` attribute (or equivalent syntax) will enforce
a new rule for the method: all return statements must directly contain an object creation expression,
possibly with an object initializer.
If the `With` constructor requires validation, the user may introduce a constructor to do that validation,
e.g.
```C#
class UserInfo
{
    ...
    private UserInfo(UserInfo original)
    {
        // validation code
    }
    [WithConstructor]
    public UserInfo With() => new UserInfo(this);
}
```
The last piece of complexity associated with `With` is inheritance. If your record is extensible, you
will need to provide a new `With` for the subclass. This can be achieved as follows:
```C#
class Base
{
    ...
    protected Base(Base original)
    {
        // validation
    }
    [WithConstructor]
    public virtual Base With() => new Base(this);
}
class Derived : Base
{
    ...
    protected Derived(Derived original)
        : base(original)
    {
        // validation
    }
    [WithConstructor]
    public override Derived With() => new Derived(this);
}
```
Note one additional piece of complexity here: in order to override the `With` constructor with
the derived type, the language will also need to support covariant return types in overrides.
There is already a separate proposal for this feature
[here](https://github.com/dotnet/csharplang/blob/725763343ad44a9251b03814e6897d87fe553769/proposals/covariant-returns.md).

**Drawbacks**
- Making all return statements in `WithConstructor`s contain new object expressions is restrictive.
This could possibly be mitigated by flow analysis that ensures the new object doesn't escape
the method.
- Supporting variance in overrides through compiler tricks will require stub methods, which will
grow quadratically with the inheritance depth. The need for a stub method is due to a runtime
requirement that override signatures match exactly. If the runtime requirement were loosened,
the stub methods would not be required at all.
- Using chained constructors of the form `Type(Type original)` effectively reserves that constructor
for the use of the pattern. Since constructors have unique semantics and cannot be renamed, this
could be limiting.
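
To make the stub-method drawback more concrete, here is one way the covariant `With` could be emulated by hand in C# without covariant returns. This is only a sketch of the general technique, not necessarily what the compiler would emit. The name `WithCore` and the parameterless constructors are assumptions added for illustration.

```C#
public class Base
{
    public Base() { }
    protected Base(Base original) { /* validation */ }

    // Public entry point; forwards to the virtual stub.
    public Base With() => WithCore();

    // Virtual method whose signature every override must repeat exactly.
    protected virtual Base WithCore() => new Base(this);
}

public class Derived : Base
{
    public Derived() { }
    protected Derived(Derived original) : base(original) { /* validation */ }

    // Hides Base.With so callers holding a Derived get a Derived back.
    public new Derived With() => new Derived(this);

    // Stub: must return Base to match the overridden signature,
    // so it simply forwards to the more specific method above.
    protected override Base WithCore() => With();
}
```

Each further level of inheritance needs its own hiding method plus a stub like this, which is the overhead the drawback above refers to.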
## Wrapping it all up: Records
The above features enable a style of programming that was very difficult before. But even with
the new features it could be quite verbose and error prone to annotate everything yourself. There
are also a few items, like `Equals` and `GetHashCode`, which can already be written today but are laborious to write by hand.
Moreover, a significant flaw in implementing equality on top of these new primitives is that
structural equality should change with your data type as new data is added, but
when it is maintained manually the two can easily get out of sync.
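
For reference, hand-written structural equality for the `UserInfo` example might look like the following sketch in C# as it exists without the proposed features. The constructor and the particular hash-combining scheme are choices made for this illustration and are not taken from the proposal.

```C#
using System;

public class UserInfo : IEquatable<UserInfo>
{
    public string Username { get; }
    public string Email { get; }
    public bool IsAdmin { get; }

    public UserInfo(string username, string email, bool isAdmin)
    {
        Username = username;
        Email = email;
        IsAdmin = isAdmin;
    }

    // Every structural member must be listed here by hand...
    public bool Equals(UserInfo other) =>
        other != null
        && Username == other.Username
        && Email == other.Email
        && IsAdmin == other.IsAdmin;

    public override bool Equals(object obj) => Equals(obj as UserInfo);

    // ...and again here, so a newly added property is easy to miss.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Username?.GetHashCode() ?? 0);
            hash = hash * 31 + (Email?.GetHashCode() ?? 0);
            hash = hash * 31 + IsAdmin.GetHashCode();
            return hash;
        }
    }
}
```

Adding a fourth property to this class compiles without any warning even if `Equals` and `GetHashCode` are left unchanged, which is the kind of drift described above.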
Therefore, it is proposed that C# support new syntax for records, not for providing new features,
but for setting defaults and generating code designed for use in records. Example syntax would
look like
```C#
data class UserInfo
{
    public string Username { get; }
    public string Email { get; }
    public bool IsAdmin { get; } = false;
}
```
The generated code for this class would regard all public fields and auto-properties as structural
members of the record. Record members could be customized using a new `RecordMember(bool)` attribute
that could be used to either include or exclude members. Record members would be `initonly` by default
and equality would be autogenerated for the class based on the record members. At any point the behavior
of these members could be customized simply by declaring them in source. The user-written implementation
would replace the default implementation in all pattern usage.
Note that equality in the face of inheritance is complex, but seems to have been
adequately solved in the [other records proposal](records.md).
## Primary constructors
Previous record proposals have also included a new syntax for a parameter list on the type itself, e.g.
```C#
class Point(int X, int Y);
```
In the new design, the parameter list would be an orthogonal C# feature, which could be cleanly integrated
with records. If a primary constructor is included in a record, it would have new defaults, just like
public fields and auto-properties: the parameters in the primary constructor would be used to generate
public record-member properties with the same name. In addition, the primary constructor could now be
used to auto-generate a deconstructor.
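
A deconstructor of this kind can already be written by hand. The sketch below shows a manually written `Point` with a `Deconstruct` method and how a caller would consume it. The `PointDemo` class and its `SumOfCoordinates` method are hypothetical names used only for this illustration.

```C#
public class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Hand-written equivalent of the deconstructor a record would generate.
    public void Deconstruct(out int x, out int y)
    {
        x = X;
        y = Y;
    }
}

public static class PointDemo
{
    // The deconstruction declaration below binds to Point.Deconstruct.
    public static int SumOfCoordinates(Point p)
    {
        var (x, y) = p;
        return x + y;
    }
}
```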
For example, the following record with a primary constructor
```C#
data class Point(int X, int Y);
```
would be equivalent to
```C#
data class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
    public void Deconstruct(out int X, out int Y)
    {
        X = this.X;
        Y = this.Y;
    }
}
```
and the final generation of the above would be
```C#
class Point
{
    public initonly int X { get; }
    public initonly int Y { get; }
    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
    protected Point(Point other)
        : this(other.X, other.Y)
    { }
    public virtual Point With() => new Point(this);
    public void Deconstruct(out int X, out int Y)
    {
        X = this.X;
        Y = this.Y;
    }
    // Generated equality
}
```
Note that we've taken one other piece of information into account
for a data class with a primary constructor: instead of setting
the primary fields inside the generated protected constructor, we delegate
to the primary constructor. If the Point class had another non-primary
record member, e.g.
```C#
data class Point(int X, int Y)
{
    public int Z { get; }
}
```
then that would change the generated protected constructor as follows:
```C#
class Point
{
    // ...
    protected Point(Point other)
        : this(other.X, other.Y)
    {
        Z = other.Z;
    }
    // ...
}
```
Notably, this doesn't answer what to do about inheritance of records
with primary constructors. For instance,
```C#
data class A(int X, int Y);
data class B(int X, int Y, int Z) : A;
```
Rather than resolving this in an arbitrary manner, a more explicit approach
could require that a parameter list be provided with the base list, e.g.
```C#
data class A(int X, int Y);
data class B(int X, int Y, int Z) : A(X, Y);
```
The parameter list in the base list would then be applied to a `base` call
in the generated primary constructor:
```C#
class B : A
{
    // ..
    public B(int x, int y, int z)
        : base(x, y)
    // ..
}
```
As for what a primary constructor could mean outside of a record, that is still open to further proposal.