Implementing IXmlSerializable in readonly structs

Background

There are three things you need to know to start with:

Operations on read-only variables which are value types copy the variable value first. I’ve written about this before on this blog. C# 7.2 addresses this by introducing the readonly modifier for structs. See the language proposal for more details. I was touched to see that the proposal references my blog post :)

The ref readonly local variables and in parameter features in C# 7.2 mean that “read-only variables” are likely to be more common in C# than they have been in the past.

Noda Time includes many value types which implement IXmlSerializable. Noda Time implements IXmlSerializable.ReadXml by assigning to this: fundamentally IXmlSerializable assumes a mutable type. I use explicit interface implementation to make this less likely to be used directly on an unboxed variable. With a generic method using an interface constraint you can observe a simple method call mutating a non-read-only variable, but that’s generally harmless.

Adding the readonly modifier to Noda Time structs

I relished news of the readonly modifier for struct declarations. At last, I can remove my ReadWriteForEfficiency attribute! Hmm. Not so much. To be fair, some of the structs (7 out of 18) were fine. But every struct that implements IXmlSerializable gave me this error:

Cannot assign to ‘this’ because it is read-only

That’s reasonable: in members of a readonly struct, this effectively becomes an in parameter instead of a ref parameter. But how can we fix that? Like any sensible developer, I turned to Stack Overflow, which has precisely the question I’m interested in. It even has an answer! Unfortunately, it amounts to a workaround: assigning to this via unsafe code.

Violating readonly with unsafe code

To give a concrete example of the answer from Stack Overflow, here’s my current LocalTime code:

void IXmlSerializable.ReadXml([NotNull] XmlReader reader)
{
    Preconditions.CheckNotNull(reader, nameof(reader));
    var pattern = LocalTimePattern.ExtendedIso;
    string text = reader.ReadElementContentAsString();
    this = pattern.Parse(text).Value;
}

Here’s the code that compiles when LocalTime is marked as readonly, after enabling unsafe blocks:

unsafe void IXmlSerializable.ReadXml([NotNull] XmlReader reader)
{
    Preconditions.CheckNotNull(reader, nameof(reader));
    var pattern = LocalTimePattern.ExtendedIso;
    string text = reader.ReadElementContentAsString();
    fixed (LocalTime* thisAddr = &this)
    {
        *thisAddr = pattern.Parse(text).Value;
    }
}

Essentially, the unsafe code is bypassing the controls around read-only structs. Just for kicks, let’s apply the same change
throughout Noda Time, and think about what would happen…

As it happens, that fix doesn’t work universally: ZonedDateTime is a “managed type” because it contains a reference (to a DateTimeZone) which means you can’t create a pointer to it. That’s a pity, but if we can make everything else readonly, that’s a good start. Now let’s look at the knock-on effects…

Safe code trying to abuse the unsafe code

Let’s try to abuse our “slightly dodgy” implementations. Here’s a class we’ll put in a “friendly” assembly which is trying to be as helpful as possible:

using NodaTime;

public class AccessToken
{
    private readonly Instant expiry;
    public ref readonly Instant Expiry => ref expiry;

    public AccessToken(Instant expiry) => this.expiry = expiry;
}

Great – it lets you get at the expiry time of an access token without even copying the value.

The big test is: can we break this friendly’s code’s assumptions about expiry really not changing its value?

Here’s code I expected to mutate the access token:

static void MutateAccessToken(AccessToken accessToken)
{
    ref readonly Instant expiry = ref accessToken.Expiry;
    string xml = "<evil>2100-01-01T00:00:00Z</evil>";
    EvilReadXml(in expiry, xml);
}

static void EvilReadXml<T>(in T value, string xml) where T : IXmlSerializable
{
    var reader = XmlReader.Create(new StringReader(xml));
    reader.Read();
    value.ReadXml(reader);
}

We have an in parameter in EvilReadXml, so the expiry variable is being passed by reference, and then we’re calling ReadXml on that parameter… so doesn’t that mean we’ll modify the parameter, and thus the underlying expiry field in the object?

Nope. Thinking about it, the compiler doesn’t know when it compiles EvilReadXml that T is a readonly struct – it could be a regular struct. So it has to create a copy of the value before calling ReadXml.

Looking the the spec proposal, there’s one interesting point in the section on ref extension methods:

However any use of an in T parameter will have to be done through an interface member. Since all interface members are considered mutating, any such use would require a copy.

Hooray! That suggests it’s safe – at least for now. I’m still worried though: what if the C# team adds a readonly struct generic constraint in C# 8? Would that allow a small modification of the above code to break? And if interface methods are considered mutating anyway, why doesn’t the language know that when I’m trying to implement the interface?

But hey, now that we know unsafe code is able to work around this in our XML implementation, what’s to stop nasty code from using the same trick directly?

Unsafe code abusing safe code

Imagine we didn’t support XML serialization at all. Could unsafe code mutate AccessToken? It turns out it can, very easily:

static void MutateAccessToken(AccessToken accessToken)
{
    ref readonly Instant expiry = ref accessToken.Expiry;
    unsafe
    {
        fixed (Instant* expiryAddr = &expiry)
        {
            *expiryAddr = Instant.FromUtc(2100, 1, 1, 0, 0);
        }
    }
}

This isn’t too surprising – unsafe code is, after all, unsafe. I readily admit I’m not an expert on .NET security, and I know the
landscape has changed quite a bit over time. These days I believe the idea of a sandbox has been somewhat abandoned – if you care about executing code you don’t really trust, you do it in a container and use that as your security boundary. That’s about the limit of my knowledge though, which could be incorrect anyway.

Where do we go from here?

At this point, I’m stuck making a choice between several unpleasant ones:

  • Leave Noda Time public structs as non-read-only. That prevents users from writing efficient code to use them without copying.
  • Remove XML serialization from Noda Time 3.0. We’re probably going to remove binary serialization anyway on the grounds that Microsoft is somewhat-discouraging it going forward, and it’s generally considered a security problem. However, I don’t think XML serialization is considered to be as bad, so long as you use it carefully, preventing arbitrary type loading. (See this paper for more information.)
  • Implement XML serialization using unsafe code as shown above. Even though my attempt at abusing it failed, that doesn’t provide very much comfort. I don’t know what other ramifications there might be for including unsafe code. Does that limit where the code can be deployed? It also doesn’t help with my concern about a future readonly struct constraint.

Thoughts very welcome. Reassurances from the C# team that “readonly struct” constraints won’t happen particularly welcome… along with alternative ways of implementing XML serialization.

7 thoughts on “Implementing IXmlSerializable in readonly structs”

  1. Maybe when “readonly struct” constraints happen they’ll bring Shapes along for the ride, so we can have something like

    public shape SXmlSerializable
    {
    void WriteXml(XmlWriter writer);
    static T ReadXml(XmlReader reader);
    }

    Like

  2. Ideas:
    – Implement XML serialization without using IXmlSerializable (e.g. with a factory function for read as shown above).
    – Introduce a 2nd type (that may implement IXmlSerializable) and offer from/to options for your struct.

    In all the years I’ve been using XML serialization I’ve never needed IXmlSerializable. It’s no loss at all to not support it.

    Like

    1. Both of those options seem to make it harder for the users of my library to serialize their data. They expect to have data models with LocalDate properties etc, and for those to be serialized transparently for them. How would either of your suggestions help there? I think there’s a huge difference between an application author who wants to serialize their own data models, and a library author who wants application authors to be able to use their components in a natural way.

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s