csharplang/proposals/rejected/recordsv2.md

Records v2

In the past we've thought about records as a feature to enable working with data.

"Working with data" is a big group with a number of facets, so it may be interesting to look at each in isolation. Let's start by looking at an example of records today and some of its drawbacks.

For instance, a simple record would be defined today as follows:

public class UserInfo
{
    public string Username { get; set; }
    public string Email { get; set; }
    public bool IsAdmin { get; set; } = false;
}

and the usage would read:

void M()
{
    var userInfo = new UserInfo() 
    {
        Username = "andy",
        Email = "angocke@microsoft.com",
        IsAdmin = true
    };
}

There are significant advantages in this code:

  1. The definition is version-resilient: properties can easily be added or moved
  2. Properties can be set in any order, and the names in the initialization always match the accessors
  3. Properties with default values can simply be skipped

The first flaw is that the properties must now be mutable.

Mutability

What we'd like is for C# to provide a way to set a readonly member in object initializers. Since some types may not have been designed with this initialization in mind, we'd also like it to be opt-in.
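To make the motivation concrete, here is a minimal sketch of what an immutable UserInfo looks like in C# without such a feature: get-only properties assigned through a constructor. The constructor shape, the parameter names, and the ImmutableUsage helper are illustrative assumptions, not part of the proposal.

```csharp
// Sketch: an immutable UserInfo built only from get-only auto-properties
// and a constructor. Nothing here relies on proposed syntax.
public class UserInfo
{
    public string Username { get; }
    public string Email { get; }
    public bool IsAdmin { get; }

    // Assumed constructor: every new property changes this signature.
    public UserInfo(string username, string email, bool isAdmin = false)
    {
        Username = username;
        Email = email;
        IsAdmin = isAdmin;
    }
}

// Hypothetical helper that only demonstrates the call site.
public static class ImmutableUsage
{
    public static UserInfo Create()
    {
        // Named arguments read a little like an object initializer, but the
        // names are parameter names, which can drift from the property names.
        return new UserInfo(username: "andy", email: "angocke@microsoft.com", isAdmin: true);
    }
}
```

The type is now immutable, but the advantages listed earlier are weakened: the names at the call site no longer necessarily match the accessors, and adding a property means changing the constructor signature.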

The proposed solution is a new modifier, initonly, that can be applied to properties and fields:

public class UserInfo
{
    public initonly string Username { get; }
    public initonly string Email { get; }
    public initonly bool IsAdmin { get; } = false;
}

The codegen for this is surprisingly straightforward: we just set the readonly field. Specifically, the lowered properties would look like:

public class UserInfo
{
    private readonly string <Backing>_username;
    public string get_Username() => <Backing>_username;
    [return: modreq(initonly)]
    public void set_Username(string value) { <Backing>_username = value; }
    ...
}

The CLR considers setting readonly fields to be unverifiable, but not unsafe. To support a more advanced verifier, the following rule is proposed: a readonly field can be modified only inside initonly methods, or on a new object that is on the CLR stack and has not been published via a store or method call.

This should solve many of the problems with mutability in the UserInfo example, while not requiring complicated or brittle emit strategies. However, immutability does present a new problem: easily constructing an object with changes.

With-ing

When programming with immutability, making changes to an object is done by constructing a copy with changes instead of making the changes directly on the object. Unfortunately, there's no convenient way to do this in C#, even with current-style records. It's been previously proposed that some kind of autogenerated "With" method be provided for records that implements that functionality. If we have such a mechanism, it's important that it work with initonly members. To achieve this, it's proposed that we add a with expression, analogous to an object initializer. Sample usage would be as follows:

var userInfo = new UserInfo() 
{
    Username = "andy",
    Email = "angocke@microsoft.com",
    IsAdmin = true
};
var newUserName = userInfo with { Username = "angocke" };

The resulting newUserName object would be a copy of userInfo, with Username set to "angocke". The codegen on the with expression would also be similar to the object initializer: a new object is constructed, and then the initonly Username setter would be called in the method body.
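For comparison, the hand-written equivalent in current C# is typically a copy-and-update method per property. The sketch below assumes a constructor-initialized UserInfo; the WithUsername method and the WithUsage helper are hypothetical names that do not appear in the proposal.

```csharp
// Sketch: copy-and-update written by hand, without a with expression.
public class UserInfo
{
    public string Username { get; }
    public string Email { get; }
    public bool IsAdmin { get; }

    public UserInfo(string username, string email, bool isAdmin = false)
    {
        Username = username;
        Email = email;
        IsAdmin = isAdmin;
    }

    // Hypothetical helper: copies every member and replaces only Username.
    // A similar method would be needed for each property.
    public UserInfo WithUsername(string username) => new UserInfo(username, Email, IsAdmin);
}

// Hypothetical helper that only demonstrates the call site.
public static class WithUsage
{
    public static UserInfo Rename()
    {
        var userInfo = new UserInfo("andy", "angocke@microsoft.com", isAdmin: true);
        return userInfo.WithUsername("angocke");
    }
}
```

Each such method must list every member of the type, so it has to be revisited whenever a member is added. The with expression is meant to remove that per-property boilerplate.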

Of course, the difference here is that the new object being constructed is not a simple new object creation, it is a duplicate of the original object. To provide this functionality, we require that the object provide a "With constructor" that provides a duplicate object. A sample With constructor would look like:

class UserInfo
{
    ...
    [WithConstructor] // placeholder syntax, up for debate
    public UserInfo With()
    {
        return new UserInfo() { Username = this.Username, Email = this.Email, IsAdmin = this.IsAdmin };
    }
}

Notably, the with expression will set initonly members, just like the object initializer, so to support verification we must ensure that the object cannot have been published before the initonly members are set. To enforce this, the WithConstructor attribute (or equivalent syntax) will enforce a new rule for the method: all return statements must directly contain an object creation expression, possibly with an object initializer.
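As a rough illustration of that rule, the sketch below contrasts a method shape that would satisfy it with one that would not. It compiles in current C# only because it declares its own placeholder WithConstructorAttribute and uses ordinary setters as a stand-in for initonly members; the WithViaLocal method is a hypothetical counterexample, and today's compiler enforces none of this.

```csharp
using System;

// Placeholder attribute so the sketch compiles; the proposal leaves the
// real syntax open.
[AttributeUsage(AttributeTargets.Method)]
public sealed class WithConstructorAttribute : Attribute { }

public class UserInfo
{
    // Ordinary setters stand in for the proposed initonly members.
    public string Username { get; set; }
    public string Email { get; set; }
    public bool IsAdmin { get; set; }

    // Would satisfy the proposed rule: the return statement directly
    // contains an object creation expression with an object initializer.
    [WithConstructor]
    public UserInfo With()
    {
        return new UserInfo { Username = Username, Email = Email, IsAdmin = IsAdmin };
    }

    // Would violate the proposed rule: the new object is stored in a local
    // first, so it could be published before the caller sets its members.
    [WithConstructor]
    public UserInfo WithViaLocal()
    {
        var copy = new UserInfo { Username = Username, Email = Email, IsAdmin = IsAdmin };
        return copy;
    }
}
```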

If the With constructor requires validation, the user may introduce a constructor to do that validation, e.g.

class UserInfo
{
    ...
    private UserInfo(UserInfo original)
    {
        // validation code
    }
    [WithConstructor]
    public UserInfo With() => new UserInfo(this);
}

The last piece of complexity associated with With is inheritance. If your record is extensible, you will need to provide a new With for the subclass. This can be achieved as follows:

class Base
{
    ...
    protected Base(Base original)
    {
        // validation
    }
    [WithConstructor]
    public virtual Base With() => new Base(this);
}
class Derived : Base
{
    ...
    protected Derived(Derived original)
    : base(original)
    {
        // validation
    }
    [WithConstructor]
    public override Derived With() => new Derived(this);
}

Note one additional piece of complexity here: in order to override the With constructor with the derived type, the language will also need to support covariant return types in overrides. There is already a separate proposal for this feature.

Drawbacks

  • Making all return statements in WithConstructors contain new object expressions is restrictive. This could possibly be mitigated by flow analysis that ensures the new object doesn't escape the method.
  • Supporting variance in overrides through compiler tricks will require stub methods, which will grow quadratically with the inheritance depth. The need for a stub method is due to a runtime requirement that override signatures match exactly. If the runtime requirement were loosened, the stub methods would not be required at all.
  • Using chained constructors of the form Type(Type original) effectively reserves that constructor for the use of the pattern. Since constructors have unique semantics and cannot be renamed, this could be limiting.
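The stub-method drawback can be illustrated by emulating a covariant return by hand. The sketch below is a source-level pattern for a compiler without covariant override support, not the emit strategy the proposal has in mind; WithCore is a hypothetical name.

```csharp
// Sketch: emulating a covariant With() when override signatures must
// match exactly.
public class Base
{
    // Public entry point with the base return type.
    public Base With() => WithCore();

    // The only virtual slot; every override must return Base.
    protected virtual Base WithCore() => new Base();
}

public class Derived : Base
{
    // Stub: hides Base.With to expose the more specific return type and
    // forwards to the virtual implementation.
    public new Derived With() => (Derived)WithCore();

    protected override Base WithCore() => new Derived();
}
```

Even in this simplified form, each level of the hierarchy carries an extra forwarding method whose only purpose is to adjust the return type.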

Wrapping it all up: Records

The above features enable a style of programming that was very difficult before. But even with the new features, it could be quite verbose and error-prone to annotate everything yourself. There are also a few items, like Equals and GetHashCode, that can already be written today but are laborious to write. Moreover, a significant flaw in implementing equality by hand on top of these new primitives is that structural equality should change with the data type as new data is added, and a manually maintained implementation can easily get out of sync.
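A minimal sketch of the hand-written equality in question follows, assuming a sealed, constructor-initialized UserInfo. The hash-combining constants are a common convention chosen for illustration, not something the proposal specifies.

```csharp
using System;

// Sketch: structural equality maintained by hand.
public sealed class UserInfo : IEquatable<UserInfo>
{
    public string Username { get; }
    public string Email { get; }
    public bool IsAdmin { get; }

    public UserInfo(string username, string email, bool isAdmin = false)
    {
        Username = username;
        Email = email;
        IsAdmin = isAdmin;
    }

    // Every member must be listed here; a newly added property that is
    // forgotten silently drops out of the comparison.
    public bool Equals(UserInfo other) =>
        !ReferenceEquals(other, null)
        && Username == other.Username
        && Email == other.Email
        && IsAdmin == other.IsAdmin;

    public override bool Equals(object obj) => Equals(obj as UserInfo);

    // The same member list has to be repeated here and kept in sync.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Username == null ? 0 : Username.GetHashCode());
            hash = hash * 31 + (Email == null ? 0 : Email.GetHashCode());
            hash = hash * 31 + IsAdmin.GetHashCode();
            return hash;
        }
    }
}
```

The member list appears in three places (the declarations, Equals, and GetHashCode), which is exactly where a later change can leave them out of sync.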

Therefore, it is proposed that C# support new syntax for records, not for providing new features, but for setting defaults and generating code designed for use in records. Example syntax would look like

data class UserInfo
{
    public string Username { get; }
    public string Email { get; }
    public bool IsAdmin { get; } = false;
}

The generated code for this class would regard all public fields and auto-properties as structural members of the record. Record members could be customized using a new RecordMember(bool) attribute that could be used to either include or exclude members. Record members would be initonly by default and equality would be autogenerated for the class based on the record members. At any point the behavior of these members could be customized simply by declaring them in source. The user-written implementation would replace the default implementation in all pattern usage.

Note that equality in the face of inheritance is complex, but seems to have been adequately solved in the other records proposal.

Primary constructors

Previous record proposals have also included a new syntax for a parameter list on the type itself, e.g.

class Point(int X, int Y);

In the new design, the parameter list would be an orthogonal C# feature, which could be cleanly integrated with records. If a primary constructor is included in a record, it would have new defaults, just like public fields and auto-properties: the parameters in the primary constructor would be used to generate public record-member properties with the same name. In addition, the primary constructor could now be used to auto-generate a deconstructor.

For example, the following record with a primary constructor

data class Point(int X, int Y);

would be equivalent to

data class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    public void Deconstruct(out int X, out int Y)
    {
        X = this.X;
        Y = this.Y;
    }
}

and the final generation of the above would be

class Point
{
    public initonly int X { get; }
    public initonly int Y { get; }

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    protected Point(Point other)
    : this(other.X, other.Y)
    { }

    [WithConstructor]
    public virtual Point With() => new Point(this);

    public void Deconstruct(out int X, out int Y)
    {
        X = this.X;
        Y = this.Y;
    }

    // Generated equality
}

Note that we've taken one other piece of information into account for a data class with a primary constructor: instead of setting the primary fields inside the generated protected constructor, we delegate to the primary constructor. If the Point class had another non-primary record member, e.g.

data class Point(int X, int Y)
{
    public int Z { get; }
}

then that would change the generated protected constructor as follows:

class Point
{
    // ...
    protected Point(Point other)
    : this(other.X, other.Y)
    {
        Z = other.Z;
    }
    // ...
}
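To show how the generated pieces would fit together at a call site, here is an approximation that compiles in current C#. It is a sketch under stated assumptions: Z uses an ordinary setter as a stand-in for an initonly member, the [WithConstructor] marker and generated equality are omitted, and the PointUsage helper is hypothetical.

```csharp
// Sketch: a hand-written approximation of the generated Point with Z.
public class Point
{
    public int X { get; }
    public int Y { get; }
    public int Z { get; set; } // stand-in for an initonly record member

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Copy constructor: delegates to the primary constructor for X and Y,
    // then copies the non-primary member.
    protected Point(Point other)
        : this(other.X, other.Y)
    {
        Z = other.Z;
    }

    public virtual Point With() => new Point(this);

    public void Deconstruct(out int x, out int y)
    {
        x = X;
        y = Y;
    }
}

// Hypothetical helper that only demonstrates the call site.
public static class PointUsage
{
    public static int Sum()
    {
        var p = new Point(1, 2) { Z = 3 };
        var q = p.With();   // duplicate produced by the copy constructor
        var (x, y) = q;     // uses Deconstruct; only primary members appear
        return x + y + q.Z; // 1 + 2 + 3
    }
}
```

Note that Deconstruct yields only the primary constructor's members, so Z has to be read through its property.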

Notably, this doesn't answer what to do about inheritance of records with primary constructors. For instance,

data class A(int X, int Y);
data class B(int X, int Y, int Z) : A;

Rather than resolving this in an arbitrary manner, a more explicit approach could require that a parameter list be provided with the base list, e.g.

data class A(int X, int Y);
data class B(int X, int Y, int Z) : A(X, Y);

The parameter list in the base list would then be applied to a base call in the generated primary constructor:

class B : A
{
    // ...
    public B(int x, int y, int z)
    : base(x, y)
    // ...
}

As for what a primary constructor could mean outside of a record, that is still open to further proposal.