Category Archives: C#

Versioning limitations in .NET

This is a blog post I’ve intended to write for a very long time. (Other blog posts in that category include a recipe for tiramisu ice cream, and “knights and allies”.) It’s one of those things that’s grown in my mind over time, becoming harder and harder to start. However, there have been three recent incidents that have brought it back into focus:

TL;DR: Versioning is inherently hard, but the way that .NET infrastructure is set up makes it harder than it needs to be, I suspect.

The sample code for this blog post is available on GitHub.

Refresher: SemVer

NuGet is the de facto standard for distribution of packages now, and it supports semantic versioning, also known as SemVer for short. SemVer version strings (ignoring pre-release versions) are of the form major.minor.patch.

The rules of SemVer sound straightforward from the perspective of a package producer:

  • If you make a breaking change, you need to bump the major version
  • If you make backward compatible additions, you need to bump the minor version
  • If you make backward and forward compatible changes (basically internal implementation changes or documentation changes) you bump the patch version

It also sounds straightforward from the perspective of a package consumer, considering moving from one version to another of a package:

  • If you move to a different major version, your existing code may not work (because everything can change between major versions)
  • If you move to a later minor version within the same major version, your code should still work
  • If you move to an earlier minor version within the same major version, your existing code may not work (because you may be using something that was introduced in the latest minor version)
  • If you move to a later or earlier patch version within the same major/minor version, your code should still work

Things aren’t quite as clear as they sound though. What counts as a breaking change? What kind of bug fix can go into just a patch version? If a change can be detected, it can break someone, in theory at least.

The .NET Core team has a set of rules about what’s considered breaking or not. That set of rules may not be appropriate for every project. I’d love to see:

  • Tooling to tell you what kind of changes you’ve made between two commits
  • A standard format for rules so that the tool from the first bullet can then suggest what your next version number should be; your project can then advertise that it’s following those rules
  • A standard format to record the kinds of changes made between versions
  • Tooling to check for “probable compatibility” of the new version of a library you’re consuming, given your codebase and the record of changes

With all that in place, we would all hopefully be able to follow SemVer reliably.

Importantly, this makes the version number a purely technical decision, not a marketing one. If the current version of your package is (say) 2.3.0, and you add a bunch of features in a backward-compatible way, you should release the new version as 2.4.0, even if it’s a “major” version in terms of the work you’ve put in. Use whatever other means you have to communicate marketing messages: keep the version number technical.

Even with packages that follow SemVer predictably and clearly, that’s not enough for peace and harmony in the .NET ecosystem, unfortunately.

The diamond dependency problem

The diamond dependency problem is not new to .NET, and most of the time we manage to ignore it – but it’s still real, and is likely to become more of an issue over time.

The canonical example of a diamond dependency is where an application depends on two libraries, each of which depends on a common third library, like this:

Common diamond dependency

(I’m using NodaTime as an example so I can refer to specific versions in a moment.)

It doesn’t actually need to be this complicated – we don’t need Lib2 here. All we need is two dependencies on the same library, and one of those can be from the application:

Simplified diamond dependency

Multiple dependencies on the same library are fine, so long as they depend on compatible versions. For example, from our discussion of SemVer above, it should be fine for Lib1 to depend on NodaTime 1.3.0, and App to depend on NodaTime 1.2.0. We expect the tooling to resolve all the dependencies and determine that 1.3.0 is the version to use, and the App code should be fine with that – after all, 1.3.0 is meant to be backward-compatible with 1.2.0. The same is true the other way round, if App depends on later version than Lib1, so long as they’re using the same major version.

(Note: there are potential problems even within a minor version number – if App depends on 1.3.0 and Lib1 depends on 1.3.1 which contains a bug fix, but App has a workaround for the bug which then fails under 1.3.1 when the bug is no longer present. Things like that can definitely happen, but I’ll ignore that kind of problem for the rest of this post, and assume that everything conforms to idealized SemVer.)

Diamond dependencies become a problem under SemVer when the dependencies are two different major versions of the same library. To give a concrete example from the NodaTime package, consider the IClock interface. The 1.4.x version contains a single property, Now. The 2.0.x version has the same functionality, but as a method, GetCurrentInstant(). (This was basically a design failing on my part in v1 – I followed the BCL example of DateTime.Now without thinking clearly enough about whether it should have been a property.)

Now suppose App is built with the .NET Core SDK, and depends on NodaTime 2.0.0, and Lib1 depends on NodaTime 1.3.1 – and let’s imagine a world where that was the only breaking change in NodaTime 2.x. (It wasn’t.) When we build the application, we’d expect 2.0 to be used at execution time. If Lib1 never calls IClock.Now, all is well. Under .NET Core tooling, assembly binding redirects are handled automatically so when Lib1 “requests” NodaTime 1.3.1, it gets NodaTime 2.0.0. (The precise way in which this is done depends on the runtime executing the application. In .NET Core, there’s an App.deps.json file; in desktop .NET it’s App.exe.config. Fortunately this doesn’t matter much at the level of this blog post. It may well make a big difference to what’s viable in the future though.)

If Lib1 does call IClock.Now, the runtime will throw a MissingMethodException. Ouch. (Sample code.)

The upshot is that if the transitive set of “package + version” tuples for your entire application contains more than one major version for the same package, it’s entirely possible that you’ll get an exception at execution time such as MissingMethodException, MissingFieldException, TypeNotFoundException or similar.

If that doesn’t sound too likely, please consider that the Newtonsoft.Json package (Json .NET) has 12 major versions as I’m writing this blog post. I suspect that James Newton-King has kept the breaking changes to an absolute minimum, but even so, it’s pretty scary.

Non-proposals

I’d like to propose some enhancements to tooling that might help to address the issue. Before we look at what I am suggesting, I’d like to mention a few options that I’m not suggesting.

Ignore the problem

I’m surprised that few people seem as worried about versioning as I am. I’ve presented talks on versioning a couple of times, but I don’t remember seeing anyone else do so – and certainly not in a .NET-specific deep-dive way. (My talk isn’t that, either.) It’s possible that there are lots of people who are worried, and they’re just being quiet about it.

This blog post is just part of me trying to agitate the community – including but not limited to Microsoft – into taking this problem seriously. If it turns out that there are already smart people working on this, that’s great. It’s also possible that we can live on the edge of versioning anarchy forever and it will always be a potential nightmare, but only cause a small enough number of failures that we decide we can live with it. That feels like a risk we should at least take consciously though.

Build at head, globally

In 2017, Titus Winters presented C++ as a live at head language at CppCon. It’s a great talk; go watch it. (Yes, it’s an hour and a half long. It’s still worth it. It also states a bunch of what I’ve said above in a slightly different way, so it may be helpful in that sense.) The idea is for everyone to build their application based on full source code, and provide tooling to automatically update consumer code based on library changes.

To go back to the Noda Time IClock example, if I build all the code for my application locally (App, Lib1 and NodaTime) then when NodaTime changes from the IClock.Now property to IClock.GetCurrentInstant(), the code in Lib1 that uses IClock.Now can automatically be changed to use IClock.GetCurrentInstant(), and everyone is happy with the same version. The Abseil project is a library (or collection of libraries) for C++ that embrace this concept.

It’s possible that this could eventually be a good solution for .NET. I don’t know of any technical aspects that mean it could work for C++ but not for .NET. However, it’s so far from our current position that I don’t believe it’s a practical choice at the moment, and I think it makes sense to try this experiment in one language first for a few years, then let other languages consider whether it makes sense for them.

I want to make it very clear that I’m not disagreeing with anything Titus said. He’s a very smart guy, and I explicitly do agree with almost everything I’ve heard him say. If I ever decide that I disagree with some aspect and want to make a public debate about it, I’ll be a lot more specific. Vague arguments are irritating for everyone. But the .NET ecosystem does depend on binary distribution of packages at the moment, and that’s an environment Titus deliberately doesn’t try to address. If someone wants to think about all the practical implications of all the world’s .NET consumers living at head in a source-driven (rather than binary-driven) world, I’d be interested in reading the results of that thinking. It’s certainly more feasible now than it was before .NET Core. But I’m not going there right now.

Never make breaking changes in any library

If we never make any changes that will break anyone, none of this is a problem.

I gave the example of Newtonsoft.Json earlier, and that it’s on major version 12. My guess is that that means there really have been 11 sets of breaking changes, but that they’re sufficiently unlikely to cause real failure that we’ve survived.

In the NodaTime package, I know I have made real breaking changes – it’s currently at version 2.4.x, and I’m planning on a 3.0 release some time after C# 8 comes out. I’ve made (or I’m considering) breaking changes in at least three different ways:

  • Adding members to public interfaces. If you implement those interfaces yourself (which is relatively unlikely) your code will be broken. On the other hand, everyone who wants the functionality I’ve added gets to use it in a clean way.
  • Removing functionality which is either no longer desirable (binary serialization) or shouldn’t have been present to start with. If you still want that functionality, I can only recommend that you stay on old versions.
  • Refactoring existing functionality, e.g. the IClock.Now => IClock.GetCurrentInstant() change, or fixing a typo in a method name. It’s annoying for existing consumers, but better for future consumers.

I want to be able to make all of these changes. They’re all good things in the long run, I believe.

So, those are options I don’t want to take. Let’s look at a few that I think we should pursue.

Proposals

Firstly, well done and thank you for making it this far. Before any editing, we’re about 2000 words into the post at this point. A smarter person might have got this far quicker without any loss of important information, but I hope the background has been useful.

Prerequisite: multi-version support

My proposals require that the runtime support loading multiple assemblies with the same name at the same time. Obviously I want to support .NET Core, so this mustn’t require the use of multiple AppDomains. As far as I’m aware, this is already the case, and I have a small demo of this, running with both net471 and netcoreapp2.0 targets:

// Call SystemClock.Instance.Now in NodaTime 1.3.1
string path131 = Path.GetFullPath("NodaTime-1.3.1.dll");
Assembly nodaTime131 = Assembly.LoadFile(path131);
dynamic clock131 = nodaTime131
    .GetType("NodaTime.SystemClock")
    // Instance is a field 1.x
    .GetField("Instance")
    .GetValue(null);
Console.WriteLine(clock131.Now);

// Call SystemClock.Instance.GetCurrentInstant() in NodaTime 2.0.0
string path200 = Path.GetFullPath("NodaTime-2.0.0.dll");
Assembly nodaTime200 = Assembly.LoadFile(path200);
dynamic clock200 = nodaTime200
    .GetType("NodaTime.SystemClock")
    // Instance is a property in 2.x
    .GetProperty("Instance")
    .GetValue(null);
Console.WriteLine(clock200.GetCurrentInstant());

I’ve used dynamic typing here to avoid having to call the Now property or GetCurrentInstant() method using hand-written reflection, but we have to obtain the clock with reflection as it’s accessed via a static member. This is in a project that doesn’t depend on Noda Time at all in a compile-time sense. It’s possible that introducing a compile-time dependency could lead to some interesting problems, but I suspect those are fixable with the rest of the work below.

On brief inspection, it looks like it’s also possible to load two independent copies of the same version of the same assembly, so long as they’re stored in different files. That may be important later on, as we’ll see.

Proposal: execute with the expected major version

The first part of my proposal underlies all the rest. We should ensure that each library ends up executing against a dependency version that has the same major version it requested. If Lib1 depends on Noda Time 1.3.1, tooling should make sure it always gets >= 1.3.1 and = 1.3.1″ which appears to be the default at the moment, but I don’t mind too much if I have to be explicit. The main point is that when different dependencies require different major versions, the result needs to be multiple assemblies present at execution time, rather than either a build error or the approach of “let’s just hope that Lib1 doesn’t use anything removed in 2.0”. (Of course, Lib1 should be able to declare that it is compatible with both NodaTime 1.x and NodaTime 2.x. It would be good to make that ease to validate, too.)

If the rest of the application already depends on NodaTime 1.4.0 (for example) then it should be fine to stick to the simple situation of loading a single copy of the NodaTime assembly. But if the rest of the application is using 2.0.0 but Lib1 depends on 1.3.1, we should make that work by loading both major versions 1 and 2.

This proposal then leads to other problems in terms of how libraries communicate with each other; the remaining proposals attempt to address that.

Proposal: private dependencies

When describing the diamond dependency problem, there’s one aspect I didn’t go into. Sometimes a library will take a dependency as a pure implementation detail. For example, Lib1 could use NodaTime internally, but expose an API that’s purely in terms of DateTime. On the other hand, Lib1 could expose its use of NodaTime via its public (and protected) API, using NodaTime types for some properties, method parameters, method return types, generic type arguments, base types and so on.

Both scenarios are entirely reasonable, but they have different versioning concerns. If Lib1 uses NodaTime as a “private dependency” then App shouldn’t (in an ideal world) need to care which version of NodaTime Lib1 uses.

However, if Lib1 exposes method with an IClock parameter, the method caller really needs to know that it’s using a 1.3.1. They’ll need to have a “1.3.1 IClock” to pass in. That means App needs to be aware of the version of NodaTime that Lib1 depends on.

I propose that the author of Lib1 should be able to make a decision about whether NodaTime is a “public” or “private” dependency, and express that decision within the NuGet package.

The compiler should be able to validate that a private dependency really isn’t exposed in the public API anywhere. Ideally, I’d like this to be part of the C# language eventually; I think versioning is important enough to be a language concern. It’s reasonable to assert that that ship has sailed, however, and that it’s reasonable to just have a Roslyn analyzer for this. Careful thought is required in terms of transitive dependencies, by the way. How should the compiler/analyzer treat a situation where Lib1 privately depends on NodaTime 1.3.1, but publicly depends on Lib2 that publicly depends on NodaTime 2.0.0? I confess I haven’t thought this through in detail; I first want to get enough people interested that the detailed work is worth doing.

Extern aliases for packages

Private dependencies are relatively simple to think about, I believe. They’re implementation details that should – modulo a bunch of caveats – not impact consumers of the library that has the private dependencies.

Public dependencies are trickier. If App wants to use NodaTime 2.0.0 for almost everything, but needs to pass in a 1.3.1 clock to a method in Lib1, then App effectively needs to depend on both 1.3.1 and 2.0.0. Currently, as far as I’m aware, there’s no way of representing this in a project file. C# as a language supports the idea of multiple assemblies exposing the same types, via extern aliases… but we’re missing a way of expressing that in project files.

There’s already a GitHub issue requesting this, so I know I’m not alone in wanting it. We might have something like:

<ProjectReference Include="NodaTime" Version="1.3.1" ExternAlias="noda1" />
<ProjectReference Include="NodaTime" Version="2.0.0" ExternAlias="noda2" />

then in the C# code you might use:

using noda2::NodaTime;
// Use NodaTime types as normal, using NodaTime 2.0.0

// Then pass a 1.3.1 clock into a Lib1 method:
TypeFromLib1.Method(noda1::NodaTime.SystemClock.Instance);

There’s an assumption here: that each package contains a single assembly. That definitely doesn’t have to be true, and a full solution would probably need to address that, allowing more complex syntax for per-assembly aliasing.

It’s worth noting that it would be feasible for library authors to providing “bridging” packages too. For example, I could provide a NodaTime.Bridging package which allowed you to convert between NodaTime 1.x and NodaTime 2.x types. Sometimes those conversions may be lossy, but they’re at least feasible. The visible immutability of almost every type in Noda Time is a big help here, admittedly – but packages like this could really help consumers.

Here be dragons: shared state

So far I’ve thought of two significant problems with the above proposals, and both involve shared state – but in opposite directions.

Firstly, consider singletons that we really want to be singletons. SystemClock.Instance is a singleton in Noda Time. But if multiple assemblies are loaded, one per major version, then it’s really “singleton per major version.” For SystemClock that’s fine, but imagine if your library decided that it would use grab a process-wide resource in its singleton, assuming that it was okay to do so because after all there’s only be one of them. Maybe you’d have an ID generator which would guarantee uniqueness by incrementing a counter. That doesn’t work if there are multiple instances.

Secondly, we need to consider mutable shared state, such as some sort of service locator that code registered implementations in. Two different libraries with supposedly private dependencies on the same service locator package might each want to register the same type in the service locator. At that point, things work fine if they depend on different major versions of the service locator package, but start to conflict if the implementations happen to depend on the same major version, and end up using the same assembly. Our isolation of the private dependency isn’t very isolated after all.

While it’s reasonable to argue that we should avoid this sort of shared state as far as possible, it’s unreasonable to assume that it doesn’t exist, or that it shouldn’t be considered as part of this kind of versioning proposal. At the very least, we need to consider how users can diagnose issues stemming from this with some ease, even if I suspect it’ll always be slightly tricky.

As noted earlier, it’s possible to introduce more isolation by loading the same assembly multiple times, so potentially each private dependency could really be private. That helps in the second case above, but hurts more in the first case. It also has a performance impact in terms of duplication of code etc.

Here be unknown dragons

I’m aware that versioning is really complicated. I’ve probably thought about it more than most developers, but I know there’s a lot I’m unaware of. I don’t expect my proposals to be “ready to go” without serious amounts of detailed analysis and work. While I would like to help with that work, I suspect it will mostly be done by others.

I suspect that even this detailed analysis won’t be enough to get things right – I’d expect that when there’s a prototype, exposing it to real world dependencies will find a bunch more issues.

Conclusion

I believe the .NET ecosystem has a versioning problem that’s currently not being recognized and addressed.

The intention isn’t that these proposals are final, concrete design docs – the intention is that they help either start the ball rolling, or give an already-rolling-slightly ball a little more momentum. I want the community to openly discuss the problems we’re seeing now, so we get a better handle on the problem, and then work together to alleviate those problems as best we can, while recognizing that perfection is unlikely to be possible.

Lying to the compiler

This morning I tweeted this:

Just found a C# 8 nullable reference types warning in Noda Time. Fixing it by changing Foo(x, x?.Bar) to Foo(x, x?.Bar!) which looks really dodgy… anyone want to guess why it’s okay?

This attracted more interest than I expected, so I thought I’d blog about it.

First let’s unpack what x?.Bar! means. x?.Bar means “if x is non-null, the value of x.Bar; otherwise, the corresponding null value”. The ! operator at the end is introduced in C# 8, and it’s the damn-it operator (more properly the “null-forgiving operator”, but I expect to keep calling it the damn-it operator forever). It tells the compiler to treat the preceding expression as “definitely not null” even if the compiler isn’t sure for whatever reason. Importantly, this does not emit a null check in the IL – it’s a compile-time only change.

When talking about the damn-it operator, I’ve normally given two scenarios where it makes sense:

  • When testing argument validation
  • When you have invariants in your code which allow you to know more than the compiler does about nullability. This is a little bit like a cast: you’re saying you know more than the compiler. Remember that it’s not like a cast in terms of behaviour though; it’s not checked at execution time.

My tweet this morning wasn’t about either of these cases. It’s in production code, and I absolutely believe that it’s possible for x?.Bar to be null. I’m lying to the compiler to get it to stop it emitting a warning. The reason is that in the case where the value is null, it won’t matter that it’s null.

The actual code is in this Noda Time commit, but the code below provides a simplified version. We have three classes:

  • Person, with a name and home address
  • Address, with some properties I haven’t bothered showing here
  • Delivery, with a recipient and an address to deliver to
using System;

public sealed class Address
{
    // Properties here
}

public sealed class Person
{
    public string Name { get; }
    public Address HomeAddress { get; }

    public Person(string name, Address homeAddress)
    {
        Name = name ??
            throw new ArgumentNullException(nameof(name));
        HomeAddress = homeAddress ??
            throw new ArgumentNullException(nameof(homeAddress));
    }
}

public sealed class Delivery
{
    public Person Recipient { get; }
    public Address Address { get; }

    public Delivery(Person recipient)
        : this(recipient, recipient?.HomeAddress!)
    {
    }

    public Delivery(Person recipient, Address address)
    {
        Recipient = recipient ??
            throw new ArgumentNullException(nameof(recipient));
        Address = address ??
            throw new ArgumentNullException(nameof(address));
    }
}

The interesting part is the Delivery(Person) constructor, that delegates to the Delivery(Person, Address) constructor.

Here’s a version that would compile with no warnings:

public Delivery(Person recipient)
    : this(recipient, recipient.HomeAddress)

However, now if recipient is null, that will throw NullReferenceException instead of the (preferred) ArgumentNullException. Remember that nullable reference checking in C# 8 is really just advisory – the compiler does nothing to stop a non-nullable reference variable from actually having a value of null. This means we need to keep all the argument validation we’ve already got.

We could validate recipient before we pass it on to the other constructor:

public Delivery(Person recipient)
    : this(recipient ?? throw new ArgumentNullException(...),
           recipient.HomeAddress)

That will throw the right exception, but it’s ugly and more code than we need. We know that the constructor we’re delegating to already validates recipient – we just need to get that far. That’s where the null-conditional operator comes in. So we can write:

public Delivery(Person recipient)
    : this(recipient, recipient?.HomeAddress)

That will behave as we want it to – if recipient is null, we’ll pass null values as both arguments to the other constructor, and it will validate them. But now the compiler warns that the second argument could be null, and the parameter is meant to be non-null. The solution is to use the damn-it operator:

public Delivery(Person recipient)
    : this(recipient, recipient?.HomeAddress!)

Now we get the behaviour we want, with no redundant code, and no warnings. We’re lying to the compiler and satisfied that we’re doing so sensibly, because recipient?.HomeAddress is only null if recipient is null, and we know that that will be validated first anyway.

I’ve added a comment, as it’s pretty obscure otherwise – but part of me just enjoys the oddity of it all :)

NullableAttribute and C# 8

Background: Noda Time and C# 8

Note: this blog post was written based on experimentation with Visual Studio 2019 preview 2.2. It’s possible that some of the details here will change over time.

C# 8 is nearly here. At least, it’s close enough to being “here” that there are preview builds of Visual Studio 2019 available that support it. Unsurprisingly, I’ve been playing with it quite a bit.

In particular, I’ve been porting the Noda Time source code to use the new C# 8 features. The master branch of the repo is currently the code for Noda Time 3.0, which won’t be shipping (as a GA release) until after C# 8 and Visual Studio 2019 have fully shipped, so it’s a safe environment in which to experiment.

While it’s possible that I’ll use other C# 8 features in the future, the two C# 8 features that impact Noda Time most are nullable reference types and switch expressions. Both sets of changes are merged into master now, but the pull requests are still available so you can see just the changes:

The switch expressions PR is much simpler than the nullable reference types one. It’s entirely an implementation detail… although admittedly one that confused docfx, requiring a few of those switch expressions to be backed out or moved in a later PR.

Nullable reference types are a much, much bigger deal. They affect the public API, so they need to be treated much more carefully, and the changes end up being spread far wide throughout the codebase. That’s why the switch expression PR is a single commit, whereas nullable reference types is split into 14 commits – mostly broken up by project.

Reviewing the public API of a nullable reference type change

So I’m now in a situation where I’ve got nullable reference type support in Noda Time. Anyone consuming the 3.0 build (and there’s an alpha available for experimentation purposes) from C# 8 will benefit from the extra information that can now be expressed about parameters and return values. Great!

But how can I be confident in the changes to the API? My process for making the change in the first place was to enable nullable reference types and see what warnings were created. That’s a great starting point, but it doesn’t necessarily catch everything. In particular, although I started with the main project (the one that creates NodaTime.dll), I found that I needed to make more changes later on, as I modified other projects.

Just because your code compiles without any warnings with nullable reference types enabled doesn’t mean it’s “correct” in terms of the API you want to expose.

For example, consider this method:

public static string Identity(string input) => input;

That’s entirely valid C# 7 code, and doesn’t require any changes to compile, warning-free, in C# 8 with nullable reference types enabled. But it may not be what you actually want to expose. I’d argue that it should look like one of these:

// Allowing null input, and producing nullable output
public static string? Identity(string? input) => input;

// Preventing null input, and producing non-nullable output
public static string Identity(string input)
{
    // Convenience method for nullity checking.
    Preconditions.CheckNotNull(input, nameof(input));
    return input;
}

If you were completely diligent when writing tests for the code before C# 8, it should be obvious which is required – because you’d presumably have something like:

[Test]
public void Identity_AcceptsNull()
{
    Assert.IsNull(Identity(null));
}

That test would have produced a warning in C# 8, and would have suggested that the null-permissive API is the one you wanted. But maybe you forgot to write that test. Maybe the test you would have written was one that would have shown up a need to put that precondition in. It’s entirely possible that you write much more comprehensive tests than I do, but I suspect most of us have some code that isn’t explicitly tested in terms of its null handling.

The important part take-away here is that even code that hasn’t changed in appearance can change meaning in C# 8… so you really need to review any public APIs. How do you do that? Well, you could review the entire public API surface you’re exposing, of course. For many libraries that would be the simplest approach to take, as a “belt and braces” attitude to review. For Noda Time that’s less appropriate, as so much of the API only deals in value types. While a full API review would no doubt be useful in itself, I just don’t have the time to do it right now.

Instead, what I want to review is any API element which is impacted by the C# 8 change – even if the code itself hasn’t changed. Fortunately, that’s relatively easy to do.

Enter NullableAttribute

The C# 8 compiler applies a new attribute to every API element which is affected by nullability. As an example of what I mean by this, consider the following code which uses the #nullable directive to control the nullable context of the code.

public class Test
{
#nullable enable
    public void X(string input) {}

    public void Y(string? input) {}
#nullable restore

#nullable disable
    public void Z(string input) {}
#nullable restore
}

The C# 8 compiler creates an internal NullableAttribute class within the assembly (which I assume it wouldn’t if we were targeting a framework that already includes such an attribute) and applies the attribute anywhere it’s relevant. So the above code compiles to the same IL as this:

using System.Runtime.CompilerServices;

public class Test
{
    public void X([Nullable((byte) 1)] string input) {}    

    public void Y([Nullable((byte) 2)] string input) {}

    public void Z(string input) {}}
}

Note how the parameter for Z doesn’t have the attribute at all, because that code is still oblivious to nullable reference types. But both X and Y have the attribute applied to their parameters – just with different arguments to describe the nullability. 1 is used for not-null; 2 is used for nullable.

That makes it relatively easy to write a tool to display every part of a library’s API that relates to nullable reference types – just find all the members that refer to NullableAttribute, and filter down to public and protected members. It’s slightly annoying that NullableAttribute doesn’t have any properties; code to analyze an assembly needs to find the appropriate CustomAttributeData and examine the constructor arguments. It’s awkward, but not insurmountable.

I’ve started doing exactly that in the Noda Time repository, and got it to the state where it’s fine for Noda Time’s API review. It’s a bit quick and dirty at the moment. It doesn’t show protected members, or setter-only properties, or handle arrays, and there are probably other things I’ve forgotten about. I intend to improve the code over time and probably move it to my Demo Code repository at some point, but I didn’t want to wait until then to write about NullableAttribute.

But hey, I’m all done, right? I’ve just explained how NullableAttribute works, so what’s left? Well, it’s not quite as simple as I’ve shown so far.

NullableAttribute in more complex scenarios

It would be oh-so-simple if each parameter or return type could just be nullable or non-nullable. But life gets more complicated than that, with both generics and arrays. Consider a method called GetNames() returning a list of strings. All of these are valid:

// Return value is non-null, and elements aren't null
List<string> GetNames()

// Return value is non-null, but elements may be null
List<string?> GetNames()

// Return value may be null, but elements aren't null
List<string>? GetNames()

// Return value may be null, and elements may be null
List<string?>? GetNames()

So how are those represented in IL? Well, NullableAttribute has one constructor accepting a single byte for simple situations, but another one accepting byte[] for more complex ones like this. Of course, List<string> is still relatively simple – it’s just a single top-level generic type with a single type argument. For a more complex example, imagine Dictionary<List<string?>, string[]?> . (A non-nullable reference to a dictionary where each key is a not-null list of nullable strings, and each value is a possibly-null array of non-nullable elements. Ouch.)

The layout of NullableAttribute in these cases can be thought of in terms of a pre-order traversal of a tree representing the type, where generic type arguments and array element types are leaves in the tree. The above example could be thought of as this tree:

         Dictionary<,> (not null)
            /               \
           /                 \
 List<> (not null)      Array (nullable)
        |                     |
        |                     |
 string (nullable)      string (not null)

The pre-order traversal of that tree gives us these values:

  • Not null (dictionary)
  • Not null (list)
  • Nullable (string)
  • Nullable (array)
  • Not null (string)

So a parameter declared with that type would be decorated like this:

[Nullable(new byte[] { 1, 1, 2, 2, 1 })]

But wait, there’s more!

NullableAttribute in simultaneously-complex-and-simple scenarios

The compiler has one more trick up its sleeve. When all the elements in the tree are “not null” or all elements in the tree are “nullable”, it simply uses the constructor with the single-byte parameter instead. So Dictionary<List<string>, string[]> would be decorated with Nullable[(byte) 1] and Dictionary<List<string?>?, string?[]?>? would be decorated with Nullable[(byte) 2].

(Admittedly, Dictionary<,> doesn’t permit null keys anyway, but that’s an implementation detail.)

Conclusion

The C# 8 feature of nullable reference types is a really complicated one. I don’t think we’ve seen anything like this since async/await. This post has just touched on one interesting implementation detail. I’m sure there’ll be more posts on nullability over the next few months…

Backward compatibility and overloading

I started writing a blog post about versioning in July 2017. I’ve mostly abandoned it, because I think the topic is too vast for a single post. It potentially needs a whole site/wiki/repository devoted to it. I hope to come back to it at some point, because I believe this is a hugely important topic that doesn’t get as much attention as it deserves.

In particular, the .NET ecosystem is mostly embracing semantic versioning – which sounds great, but does rely on us having a common understanding of what’s meant by a “breaking change”. That’s something I’ve been thinking about quite a lot. One aspect which has struck me forcefully recently is how hard it is to avoid breaking changes when using method overloading. That’s what this post is about, mostly because it’s fun.

First, a quick definition…

Source and binary compatibility

If I can recompile my client code with a new version of the library and it all works fine, that’s source compatible. If I can redeploy my existing client binary with a new version of the library without recompiling, that’s binary compatible. Neither of these is a superset of the other:

  • Some changes are both source and binary incompatible, such as removing a whole public type that you depended on.
  • Some changes are source compatible but binary incompatible, such as changing a public static read-only field into a property.
  • Some changes are binary compatible but source incompatible, such as adding an overload which could cause compile-time ambiguity.
  • Some changes are source and binary compatible, such as reimplementing the body of a method.

So what are we talking about?

I’m going to assume that we have a public library at version 1.0, and we wish to add some overloads in version 1.1. We’re following semantic versioning, so we need to be backward compatible. What does that mean we can and can’t do, and is it a simple binary choice?

In various cases, I’ll present library code at version 1.0 and version 1.1, then “client” code (i.e. code that is using the library) which could be broken by the change. I’m not presenting method bodies or class declarations, as they’re largely irrelevant – focus on the signatures. It should be easy to reproduce any of this if you’re interested though. We’ll imagine that all the methods I present are in a class called Library.

Simplest conceivable change, foiled by method group conversions

The simplest example I can imagine would be adding a parameterized method when there’s a parameterless one already:

// Library version 1.0
public void Foo()

// Library version 1.1
public void Foo()
public void Foo(int x)

Even that’s not completely compatible. Consider this client code:

// Client
static void Method()
{
    var library = new Library();
    HandleAction(library.Foo);
}

static void HandleAction(Action action) {}
static void HandleAction(Action<int> action) {}

In library version 1.0, that’s fine. The call to HandleAction performs a method group conversion of library.Foo to create an Action. In library version 1.1, it’s ambiguous: the method group can be converted to either Action or Action<int>. So it’s not source compatible, if we’re going to be strict about it.

At this point you might be tempted to give up and go home, resolving never to add any overloads, ever again. Or maybe we can say that this is enough of a corner case to not consider it breaking. Let’s call method group conversions out of scope for now.

Unrelated reference types

We get into a different kind of territory when we have overloads with the same number of parameters. You might expect this library change to be non-breaking:

// Library version 1.0
public void Foo(string x)

// Library version 1.1
public void Foo(string x)
public void Foo(FileStream x)

That feels like it should be reasonable. The original method still exists, so we won’t be breaking binary compatibility. The simplest way of breaking source compatibility is to have a call that either works in v1.0 but doesn’t in v1.1, or works in both but does something different in v1.1 than it did in v1.0.

How can a call break between v1.0 and v1.1? We’d have to have an argument that’s compatible with both string and FileStream. But they’re unrelated reference types…

The first failure is if we have a user-defined implicit conversion to both string and FileStream:

// Client
class OddlyConvertible
{
    public static implicit operator string(OddlyConvertible c) => null;
    public static implicit operator FileStream(OddlyConvertible c) => null;
}

static void Method()
{
    var library = new Library();
    var convertible = new OddlyConvertible();
    library.Foo(convertible);
}

Hopefully the problem is obvious: what used to be unambiguous via a conversion to string is now ambiguous as the OddlyConvertible type can be implicitly converted to both string and FileStream. (Both overloads are applicable, neither is better than the other.)

It may be reasonable to exclude user-defined conversions… but there’s a far simpler way of making this fail:

// Client
static void Method()
{
    var library = new Library();
    library.Foo(null);
}

The null literal is implicitly convertible to any reference type or any nullable value type… so again, the call becomes ambiguous in the library v1.1. Let’s try again…

Reference type and non-nullable value type parameters

If we don’t mind user-defined conversions, but don’t like null literals causing a problem, how about introducing an overload with a non-nullable value type?

// Library version 1.0
public void Foo(string x)

// Library version 1.1
public void Foo(string x)
public void Foo(int x)

This looks good – library.Foo(null) will be fine in v1.1. So is it safe? Not in C# 7.1…

// Client
static void Method()
{
    var library = new Library();
    library.Foo(default);
}

The default literal is like the null literal, but for any type. It’s really useful – and a complete pain when it comes to overloading and compatibility :(

Optional parameters

Optional parameters bring their own kind of pain. Suppose we have one optional parameter, but wish to add a second. We have three options, shown as 1.1a, 1.1b and 1.1c below.

// Library version 1.0
public void Foo(string x = "")

// Library version 1.1a
// Keep the existing method, but add another one with two optional parameters.
public void Foo(string x = "")
public void Foo(string x = "", string y = "")

// Library version 1.1b
// Just add the parameter to the existing method.
public void Foo(string x = "", string y = "")

// Library version 1.1c
// Keep the old method but make the parameter required, and add a new method
// with both parameters optional.
public void Foo(string x)
public void Foo(string x = "", string y = "")

Let’s think about a client that makes two calls:

// Client
static void Method()
{
    var library = new Library();
    library.Foo();
    library.Foo("xyz");
}

Library 1.1a keeps binary compatiblity, but breaks source compatibility: the library.Foo() is now ambiguous. The C# overloading rules prefer a method that doesn’t need the compiler to “fill in” any optional parameters, but it doesn’t have any preference in terms of how many optional parameters are filled in.

Library 1.1b keeps source compatibility, but breaks binary compatibility. Existing compiled code will expect to call a method with a single parameter – and that method no longer exists.

Library 1.1c keeps binary compatibility, but is potentially odd around source compatibility. The library.Foo() call now resolves to the two-parameter method, whereas library.Foo("xyz") resolves to the one-parameter method (which the compiler prefers over the two-parameter method because it doesn’t need to fill in any optional parameters). That may very well be okay, if the one-parameter version simply delegates to the two-parameter version using the same default value. It feels odd for the meaning of the first call to change though, when the method it used to resolve to still exists.

Optional parameters get even hairer when you don’t want to add a new one at the end, but in the middle – e.g. if you’re trying to follow a convention of keeping an optional CancellationToken parameter at the end. I’m not going to dive into this…

Generics

Type inference is a tricky beast at the best of times. With overload resolution it goes into full-on nightmare mode.

Let’s have a single non-generic method in v1.0, and then add a generic method in v1.1.

// Library version 1.0
public void Foo(object x)

// Library version 1.1
public void Foo(object x)
public void Foo<T>(T x)

That doesn’t seem too awful… but let’s look closely at what happens to client code:

// Client
static void Method()
{
    var library = new Library();
    library.Foo(new object());
    library.Foo("xyz");
}

In library v1.0, both calls resolve to Foo(object) – the only method that exists.

Library v1.1 is binary-compatible: if we use a client executable compiled against v1.0 but running against v1.1, both calls will still use Foo(object). But if we recompile, the second call (and only the second one) will change to using the generic method. Both methods are applicable for both calls.

In the first call, T would be inferred to be object, so the argument-to-parameter-type conversion is just object to object in both cases. Great. The compiler applies a tie-break rule that prefers non-generic methods over generic methods.

In the second call, T would be inferred to be string, so the argument-to-parameter-type conversion is string to object for the original method and string to string for the generic method. The latter is a “better” conversion, so the second method is picked.

If the two methods behave the same way, that’s fine. If they don’t, you’ve broken compatibility in a very subtle way.

Inheritance and dynamic typing

I’m sorry: I just don’t have the energy. Both inheritance and dynamic typing would interact with overload resolution in “fun” and obscure ways.

If you add a method in one level of the inheritance hierarchy which overloads a method in a base class, the new method will be examined first, and picked over the base class method even when the base class method is more specific in terms of argument-to-parameter-type conversions. There’s lots of scope for messing things up.

Likewise with dynamic typing (within the client code), to some extent all bets are off. You’re already sacrificing a lot of compile-time safety… it shouldn’t come as a surprise when things break.

Conclusion

I’ve tried to keep the examples reasonably simple here. It can get really complicated really quickly as soon as you have multiple optional parameters etc.

Versioning is hard and makes my head hurt.

Implementing IXmlSerializable in readonly structs

Background

There are three things you need to know to start with:

Operations on read-only variables which are value types copy the variable value first. I’ve written about this before on this blog. C# 7.2 addresses this by introducing the readonly modifier for structs. See the language proposal for more details. I was touched to see that the proposal references my blog post :)

The ref readonly local variables and in parameter features in C# 7.2 mean that “read-only variables” are likely to be more common in C# than they have been in the past.

Noda Time includes many value types which implement IXmlSerializable. Noda Time implements IXmlSerializable.ReadXml by assigning to this: fundamentally IXmlSerializable assumes a mutable type. I use explicit interface implementation to make this less likely to be used directly on an unboxed variable. With a generic method using an interface constraint you can observe a simple method call mutating a non-read-only variable, but that’s generally harmless.

Adding the readonly modifier to Noda Time structs

I relished news of the readonly modifier for struct declarations. At last, I can remove my ReadWriteForEfficiency attribute! Hmm. Not so much. To be fair, some of the structs (7 out of 18) were fine. But every struct that implements IXmlSerializable gave me this error:

Cannot assign to ‘this’ because it is read-only

That’s reasonable: in members of a readonly struct, this effectively becomes an in parameter instead of a ref parameter. But how can we fix that? Like any sensible developer, I turned to Stack Overflow, which has precisely the question I’m interested in. It even has an answer! Unfortunately, it amounts to a workaround: assigning to this via unsafe code.

Violating readonly with unsafe code

To give a concrete example of the answer from Stack Overflow, here’s my current LocalTime code:

void IXmlSerializable.ReadXml([NotNull] XmlReader reader)
{
    Preconditions.CheckNotNull(reader, nameof(reader));
    var pattern = LocalTimePattern.ExtendedIso;
    string text = reader.ReadElementContentAsString();
    this = pattern.Parse(text).Value;
}

Here’s the code that compiles when LocalTime is marked as readonly, after enabling unsafe blocks:

unsafe void IXmlSerializable.ReadXml([NotNull] XmlReader reader)
{
    Preconditions.CheckNotNull(reader, nameof(reader));
    var pattern = LocalTimePattern.ExtendedIso;
    string text = reader.ReadElementContentAsString();
    fixed (LocalTime* thisAddr = &this)
    {
        *thisAddr = pattern.Parse(text).Value;
    }
}

Essentially, the unsafe code is bypassing the controls around read-only structs. Just for kicks, let’s apply the same change
throughout Noda Time, and think about what would happen…

As it happens, that fix doesn’t work universally: ZonedDateTime is a “managed type” because it contains a reference (to a DateTimeZone) which means you can’t create a pointer to it. That’s a pity, but if we can make everything else readonly, that’s a good start. Now let’s look at the knock-on effects…

Safe code trying to abuse the unsafe code

Let’s try to abuse our “slightly dodgy” implementations. Here’s a class we’ll put in a “friendly” assembly which is trying to be as helpful as possible:

using NodaTime;

public class AccessToken
{
    private readonly Instant expiry;
    public ref readonly Instant Expiry => ref expiry;

    public AccessToken(Instant expiry) => this.expiry = expiry;
}

Great – it lets you get at the expiry time of an access token without even copying the value.

The big test is: can we break this friendly’s code’s assumptions about expiry really not changing its value?

Here’s code I expected to mutate the access token:

static void MutateAccessToken(AccessToken accessToken)
{
    ref readonly Instant expiry = ref accessToken.Expiry;
    string xml = "<evil>2100-01-01T00:00:00Z</evil>";
    EvilReadXml(in expiry, xml);
}

static void EvilReadXml<T>(in T value, string xml) where T : IXmlSerializable
{
    var reader = XmlReader.Create(new StringReader(xml));
    reader.Read();
    value.ReadXml(reader);
}

We have an in parameter in EvilReadXml, so the expiry variable is being passed by reference, and then we’re calling ReadXml on that parameter… so doesn’t that mean we’ll modify the parameter, and thus the underlying expiry field in the object?

Nope. Thinking about it, the compiler doesn’t know when it compiles EvilReadXml that T is a readonly struct – it could be a regular struct. So it has to create a copy of the value before calling ReadXml.

Looking the the spec proposal, there’s one interesting point in the section on ref extension methods:

However any use of an in T parameter will have to be done through an interface member. Since all interface members are considered mutating, any such use would require a copy.

Hooray! That suggests it’s safe – at least for now. I’m still worried though: what if the C# team adds a readonly struct generic constraint in C# 8? Would that allow a small modification of the above code to break? And if interface methods are considered mutating anyway, why doesn’t the language know that when I’m trying to implement the interface?

But hey, now that we know unsafe code is able to work around this in our XML implementation, what’s to stop nasty code from using the same trick directly?

Unsafe code abusing safe code

Imagine we didn’t support XML serialization at all. Could unsafe code mutate AccessToken? It turns out it can, very easily:

static void MutateAccessToken(AccessToken accessToken)
{
    ref readonly Instant expiry = ref accessToken.Expiry;
    unsafe
    {
        fixed (Instant* expiryAddr = &expiry)
        {
            *expiryAddr = Instant.FromUtc(2100, 1, 1, 0, 0);
        }
    }
}

This isn’t too surprising – unsafe code is, after all, unsafe. I readily admit I’m not an expert on .NET security, and I know the
landscape has changed quite a bit over time. These days I believe the idea of a sandbox has been somewhat abandoned – if you care about executing code you don’t really trust, you do it in a container and use that as your security boundary. That’s about the limit of my knowledge though, which could be incorrect anyway.

Where do we go from here?

At this point, I’m stuck making a choice between several unpleasant ones:

  • Leave Noda Time public structs as non-read-only. That prevents users from writing efficient code to use them without copying.
  • Remove XML serialization from Noda Time 3.0. We’re probably going to remove binary serialization anyway on the grounds that Microsoft is somewhat-discouraging it going forward, and it’s generally considered a security problem. However, I don’t think XML serialization is considered to be as bad, so long as you use it carefully, preventing arbitrary type loading. (See this paper for more information.)
  • Implement XML serialization using unsafe code as shown above. Even though my attempt at abusing it failed, that doesn’t provide very much comfort. I don’t know what other ramifications there might be for including unsafe code. Does that limit where the code can be deployed? It also doesn’t help with my concern about a future readonly struct constraint.

Thoughts very welcome. Reassurances from the C# team that “readonly struct” constraints won’t happen particularly welcome… along with alternative ways of implementing XML serialization.

Using .NET Core 2.0 SDK on Travis

This is just a brief post that I’m hoping may help some people migrate to use .NET Core 2.0 SDK on Travis. TL;DR: see the end of the post for a sample configuration.

Yesterday (August 15th), .NET Core 2.0 was fully released. Wonderfully, Travis already supports it. You just need dotnet: 2.0.0 in your YAML file.

I decided to experiment with upgrading the Noda Time build to require .NET Core 2.0 SDK. To be clear, I’m not doing anything in the code that requires 2.0, but it simplifies my build scripts:

Additionally, supporting netcoreapp2.0 means I’ll be able to run my benchmarks against that as well, which is going to be very interesting. However, my tests still target netcoreapp1.0, and that’s where I ran into problems.

Having done the bare minimum to try using 2.0 (edit global.json and .travis.yml) I ran into this error:

The specified framework 'Microsoft.NETCore.App', version '1.0.5' was not found.
  - Check application dependencies and target a framework version installed at:
      /
  - Alternatively, install the framework version '1.0.5'.

That makes sense. Although netcoreapp2.0 is compatible with netstandard1.0 (i.e. you can use libraries targeted to netstandard1.0 in a 2.0 environment) an application targeting netcoreapp1.0 really needs a 1.0 runtime.

So, we need to install just the runtime as well. I’d expected this to be potentially painful, but it’s really not. You just need an addons section in the YAML file:

addons:
  apt:
    sources:
    - sourceline: 'deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-ubuntu-trusty-prod trusty main'
      key_url: 'https://packages.microsoft.com/keys/microsoft.asc'
    packages:
    - dotnet-sharedframework-microsoft.netcore.app-1.0.5

Note that in my case, I want netcoreapp1.0 – if you need netcoreapp1.1, you’d probably install dotnet-sharedframework-microsoft.netcore.app-1.1.2.

So, aside from comments etc, my new Travis configuration will look like this:

language: csharp
mono: none
dotnet: 2.0.0
dist: trusty

addons:
  apt:
    sources:
    - sourceline: 'deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-ubuntu-trusty-prod trusty main'
      key_url: 'https://packages.microsoft.com/keys/microsoft.asc'
    packages:
    - dotnet-hostfxr-1.0.1
    - dotnet-sharedframework-microsoft.netcore.app-1.0.5

script:
  - build/travis.sh

I can now build with the 2.0 SDK, and run tests under both netcoreapp1.0 and netcoreapp2.0.

I’m hoping it’s just as simple on AppVeyor when that supports 2.0 as well…

Surprise! Creating an instance of an open generic type

This is a brief post documenting a very weird thing I partly came up with on Stack Overflow today.

The context is this question. But to skip to the shock, we end up with code like this:

object x = GetWeirdValue();
// This line prints True. Be afraid - be very afraid!
Console.WriteLine(x.GetType().GetTypeInfo().IsGenericTypeDefinition);

That just shouldn’t happen. You shouldn’t be able to create an instance of an open type – a type that still contains generic type parameters. What does a List<T> (rather than a List<string> or List<int>) mean? It’s like creating an instance of an abstract class.

Before today, I’d have expected it to be impossible – the CLR should just not allow such an object to exist. I now know one – and only one – way to do it. While you can’t get normal field values for an open generic type, you can get constants… after all, they’re constant values, right? That’s fine for most constants, because those can’t be generic types – int, string etc. The only type of constant with a user-defined type is an enum. Enums themselves aren’t generic, of course… but what if it’s nested inside another generic type, like this:

class Generic<T>
{
    enum GenericEnum
    {
        Foo = 0
    }
}

Now Generic<>.GenericEnum is an open type, because it’s nested in an open type. Using Enum.GetValues(typeof(Generic<>.GenericEnum)) fails in the expected way: the CLR complains that it can’t create instances of the open type. But if you use reflection to get at the constant field representing Foo, the CLR magically converts the underlying integer (which is what’s in the IL of course) into an instance of the open type.

Here’s the complete code:

using System;
using System.Reflection;

class Program
{
    static void Main(string[] args)
    {
        object x = GetWeirdValue();
        // This line prints True
        Console.WriteLine(x.GetType().GetTypeInfo().IsGenericTypeDefinition);
    }

    static object GetWeirdValue() =>
        typeof(Generic<>.GenericEnum).GetTypeInfo()
            .GetDeclaredField("Foo")
            .GetValue(null);

    class Generic<T>
    {
        public enum GenericEnum
        {
            Foo = 0
        }
    }
}

… and the corresponding project file, to prove it works for both the desktop and .NET Core…

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>netcoreapp1.0;net45</TargetFrameworks>
  </PropertyGroup>

</Project>

Use this at your peril. I expect that many bits of code dealing with reflection would be surprised if they were provided with a value like this…

It turns out I’m not the first one to spot this. (That would be pretty unlikely, admittedly.) Kirill Osenkov blogged two other ways of doing this, discovered by Vladimir Reshetnikov, back in 2014.