First steps with nullable reference types

This blog post is effectively a log of my experience with the preview of the C# 8 nullable reference types feature.

There are lots of caveats here: it’s mostly “as I go along” so there may well be backtracking. I’m not advising the right thing to do, as I’m still investigating that myself. And of course the feature is still changing. Oh, and this blog post is inconsistent about its tense. Sometimes I write in the present tense as I go along, sometimes I wrote in the past tense afterwards without worrying about it. I hope this isn’t/wasn’t/won’t be too annoying.

I decided that the best way of exploring the feature would be to try to use it with Noda Time. In particular:

  • Does it find any existing bugs?
  • Do my existing attributes match what Roslyn expects?
  • Does the feature get in the way, or improve my productivity?

Installation

I started at the preview page on GitHub. There are two really important points to note:

  • Do this on a non-production machine. I used an old laptop, but presumably you can use a VM instead.
  • Uninstall all versions of Visual Studio other than VS2017 first

I ended up getting my installation into a bad state, and had to reinstall VS2017 (and then the preview) before it would work again. Fortunately that takes a lot less time than it used to.

Check it works

The preview does not work with .NET Core projects or the command-line csc

It’s only for old-style MSBuild projects targeting the .NET Framework, and only from Visual Studio.

So to test your installation:

  • Create a new .NET Framework desktop console app
  • Edit Program.cs to include: string? x = null;
  • Build

If you get an error CS0453 (“The type ‘string’ must be a non-nullable value type…”) then it’s not working. If it builds with maybe a warning about the variable not being used, you’re good to go.

First steps with Noda Time

The first thing I needed to do was convert Noda Time to a desktop project. This didn’t require the preview to be installed, so I was able to do it on my main machine.

I created a new solution with three desktop projects (NodaTime, NodaTime.Test and NodaTime.Testing), and added the dependencies between the projects and external ones. I then copied these project files over the regular Noda Time ones.

Handy tip: if you add <Compile Include="**\*.cs" /> in an MSBuild file and open it in Visual Studio, VS will replace it with all the C# files it finds. No need for tedious “Add existing” all over the place.
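As a sketch, the wildcard sits in an item group in the old-style project file (the exact placement here is from memory, so treat it as an assumption):

```xml
<ItemGroup>
  <!-- On load, Visual Studio expands this into individual <Compile> items -->
  <Compile Include="**\*.cs" />
</ItemGroup>
```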

A small amount of fiddling was required for signing and resources, and then I had a working copy of Noda Time targeting just .NET 4.5. All tests passed :)

For anyone wanting to follow my progress, the code is in a branch of my fork of Noda Time although I don’t know how long I’ll keep it for.

Building with the preview

After fetching that branch onto my older laptop, it built first time – with 228 warnings, most of which were “CS8600: Cannot convert null to non-nullable reference.” Hooray – this is exactly what we want. Bear in mind that this is before I’ve made any changes to the code.

The warnings were split between the three projects like this:

  • NodaTime: 94
  • NodaTime.Testing: 0
  • NodaTime.Test: 134

Follow the annotations

Noda Time already uses [CanBeNull] and [NotNull] attributes for both parameters and return types to indicate expectations. The first obvious step is to visit each application of [CanBeNull] and use a nullable reference type there.

To make it easier to see what’s going on, I first unloaded the NodaTime.Test project. This was so that I could concentrate on making NodaTime self-consistent to start with.

Just doing that actually raised the number of warnings from 94 to 110. Clearly I’m not as consistent as I’d like to be. I suspect I’ve got plenty of parameters which can actually be null but which I didn’t apply the annotation to. It’s time to start actually looking at the warnings.

Actually fix things

I did this in a completely haphazard fashion: fix one warning, move on to another.

I’ve noticed a pattern that was already feasible before, but has extra benefits in the nullable reference type world. Instead of this:

// Old code
string id = SomeMethodThatCanReturnNull();
if (id == null)
{
    throw new SomeException();
}
// Use id knowing it's not null

… I can use the ?? operator with the C# 7 feature of throw expressions:

// New code
string id = SomeMethodThatCanReturnNull() ??
    throw new SomeException();
// Use id knowing it's not null

That avoids having a separate local variable of type string?, which can be very handy.

I did find a few places where the compiler could do better at working out nullity. For example:

// This is fine
string? x = SomeExpressionThatCanReturnNull();
if (x == null)
{
    return;
}
string y = x;

// This creates a warning: the compiler doesn't know that x
// can't be null on the last line
string? x = SomeExpressionThatCanReturnNull();
if (ReferenceEquals(x, null))
{
    return;
}
string y = x;

The preview doc talks about this in the context of string.IsNullOrEmpty; the ReferenceEquals version is a little more subtle as we can’t determine nullity just from the output – it’s only relevant if the other argument is a constant null. On the other hand, that’s such a fundamental method that I’m hopeful it’ll get fixed.

Fixing these warnings didn’t take very long, but it was definitely like playing whack-a-mole. You fix one warning, and that causes another. For example, you might make a return type nullable to make a return null; statement work – and that affects all the callers.

I found that rebuilding would sometimes find more warnings, too. At one point I thought I was done (for the time being) – after rebuilding, I had 26 warnings.

I ran into one very common problem: implementing IEquatable<T> (for a concrete reference type T). In every case, I ended up making it implement IEquatable<T?>. I think that’s the right thing to do… (I didn’t do it consistently though, as I found out later on. And IEqualityComparer<T> is trickier, as I’ll talk about later.)
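To illustrate the IEquatable<T?> shape with a hypothetical type (not Noda Time’s actual code), a minimal implementation looks like this:

```csharp
using System;

// Hypothetical type implementing IEquatable<T?> for a concrete reference type.
public sealed class Currency : IEquatable<Currency?>
{
    public string Code { get; }
    public Currency(string code) => Code = code;

    // Nullable parameter: callers can pass null without a warning,
    // and a null argument simply compares unequal.
    public bool Equals(Currency? other) => other != null && other.Code == Code;

    public override bool Equals(object? obj) => Equals(obj as Currency);
    public override int GetHashCode() => Code.GetHashCode();
}
```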

Reload the test project

So, after about an hour of fixing warnings in the main project, what would happen when I reload the test project? We previously had 134 warnings in the test project. After reloading… I was down to 123.

Fixing the test project involved fixing a lot more of the production code, interestingly enough. And that led me to find a limitation not mentioned on the preview page:

public static NodaFormatInfo GetInstance(IFormatProvider? provider)
{
    switch (provider)
    {
        case null:
            return ...;
        case CultureInfo cultureInfo:
            return ...;
        case DateTimeFormatInfo dateTimeFormatInfo:
            return ...;
        default:
            throw new ArgumentException($"Cannot use provider of type {provider.GetType()}");
    }
}

This causes a warning of a possible dereference in the default case – despite the fact that provider clearly can’t be null, as otherwise it would match the null case. Will try to provide a full example in a bug report.

The more repetitive part is fixing all the tests that ensure a method throws an ArgumentNullException if called with a null argument. To get past the new compile-time checking, the argument needs to be null!, meaning “I know it’s really null, but pretend it isn’t.” The syntax makes me chuckle, but it’s tedious to fix every occurrence.
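As a sketch (hypothetical method, not Noda Time’s API), the shape of such a test ends up like this – in the real tests it’s NUnit’s Assert.Throws, but the null! part is the same:

```csharp
using System;

// Hypothetical method with a non-nullable parameter.
static string Identify(string id) =>
    id ?? throw new ArgumentNullException(nameof(id));

// null! means "I know it's really null, but pretend it isn't" -
// without it, passing null to a non-nullable parameter produces a warning.
try
{
    Identify(null!);
    Console.WriteLine("no exception - the test would fail");
}
catch (ArgumentNullException)
{
    Console.WriteLine("threw as expected");
}
```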

IEqualityComparer<T>

I have discovered an interesting problem. It’s hard to implement IEqualityComparer<T> properly. The signatures on the interface are pretty trivial:

public interface IEqualityComparer<in T>
{
    bool Equals(T x, T y);
    int GetHashCode(T obj);
}

But problems lurk beneath the surface. The documentation for the Equals() method doesn’t state what should happen if x or y is null. I’ve typically treated this as valid, and just used the normal equality rules (two null references are equal to each other, but nothing else is equal to a null reference). Compare that with GetHashCode(), where it’s explicitly documented that the method should throw ArgumentNullException if obj is null.

Now think about a type I’d like to implement an equality comparer for – Period for example. Should I write:

public class PeriodComparer : IEqualityComparer<Period?>

This allows x and y to be null – but it also allows obj to be null without any compiler warning, leading to an ArgumentNullException at execution time, which is exactly the kind of surprise this language feature is trying to eradicate.

I could implement the non-nullable version instead:

public class PeriodComparer : IEqualityComparer<Period>

Now the compiler will check that you’re not passing a possibly-null value to GetHashCode() – but it will also warn about passing a possibly-null value to Equals(), despite that being okay.

This feels like it’s a natural but somewhat unwelcome result of the feature arriving so much later than the rest of the type system. I’ve chosen to implement the nullable form, but still throw the
exception in GetHashCode(). I’m not sure that’s the right solution, but I’d be interested to hear what others think.
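A sketch of the option I chose (the Period stub here is a stand-in just to keep the example self-contained; the real type is Noda Time’s):

```csharp
using System;
using System.Collections.Generic;

// Stand-in for Noda Time's Period, purely for illustration.
public sealed class Period : IEquatable<Period?>
{
    public int Months { get; }
    public Period(int months) => Months = months;
    public bool Equals(Period? other) => other != null && other.Months == Months;
    public override bool Equals(object? obj) => Equals(obj as Period);
    public override int GetHashCode() => Months;
}

// Nullable type argument, so Equals can accept nulls without warnings -
// but GetHashCode still throws on null, matching its documented contract.
public sealed class PeriodComparer : IEqualityComparer<Period?>
{
    public bool Equals(Period? x, Period? y) =>
        ReferenceEquals(x, y) || (x != null && x.Equals(y));

    public int GetHashCode(Period? obj) =>
        obj?.GetHashCode() ?? throw new ArgumentNullException(nameof(obj));
}
```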

Found bugs in Noda Time!

One of the things I was interested in finding out with all of this was how consistent Noda Time is in terms of its nullity handling. Until you have a tool like this, it’s hard to tell. I’m very pleased to say that most of it hangs together nicely – although so far that’s only the result of getting down to no warnings, rather than a file-by-file check through the code, which I suspect I’ll want to do eventually.

I did find two bugs, however. Noda Time tries to handle the case where TimeZoneInfo.Local returns a null reference, because we’ve seen that happen in the wild. (Hopefully that’s been fixed now in Mono, but even so it’s nice to know we can cope.) It turns out that we have code to cope with it in one place, but there are two places where we don’t… and the C# 8 tooling found that. Yay!

Found a bug in the preview!

To be clear, I didn’t expect the preview code to be perfect. As noted earlier, there are a few places I think it can be smarter. But I found a nasty bug that would hang Visual Studio and cause csc.exe to fail when building. It turns out that if you have a type parameter T with a constraint of T : class, IEquatable<T?>, that causes a stack overflow. I’ve reported the bug (now filed on GitHub thanks to diligent Microsoft folks) so hopefully it’ll be fixed long before the final version.

Admittedly the constraint is interesting in itself – it’s not necessarily clear what it means, if T is already a nullable reference type. I’ll let smarter people than myself work that out.

Conclusion

Well, that was a jolly exercise. My first impressions are:

  • We really need class library authors to embrace this as soon as C# 8 comes out, in order to make it as useful as possible early. Noda Time has no further dependencies, fortunately.
  • It didn’t take as long as I feared it might to do a first pass at annotating Noda Time, although I’m sure I missed some things while doing it.
  • A few bugs aside, the tooling is generally in very good shape; the warnings it produced were relevant and easy to understand.
  • It’s going to take me a while to get used to things like IList<string?>? for a nullable list of nullable strings.

Overall I’m very excited by all of this – I’m really looking forward to the final release. I suspect more blog posts will come over time…

Backward compatibility and overloading

I started writing a blog post about versioning in July 2017. I’ve mostly abandoned it, because I think the topic is too vast for a single post. It potentially needs a whole site/wiki/repository devoted to it. I hope to come back to it at some point, because I believe this is a hugely important topic that doesn’t get as much attention as it deserves.

In particular, the .NET ecosystem is mostly embracing semantic versioning – which sounds great, but does rely on us having a common understanding of what’s meant by a “breaking change”. That’s something I’ve been thinking about quite a lot. One aspect which has struck me forcefully recently is how hard it is to avoid breaking changes when using method overloading. That’s what this post is about, mostly because it’s fun.

First, a quick definition…

Source and binary compatibility

If I can recompile my client code with a new version of the library and it all works fine, that’s source compatible. If I can redeploy my existing client binary with a new version of the library without recompiling, that’s binary compatible. Neither of these is a superset of the other:

  • Some changes are both source and binary incompatible, such as removing a whole public type that you depended on.
  • Some changes are source compatible but binary incompatible, such as changing a public static read-only field into a property.
  • Some changes are binary compatible but source incompatible, such as adding an overload which could cause compile-time ambiguity.
  • Some changes are source and binary compatible, such as reimplementing the body of a method.
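The second bullet deserves a concrete example. Changing a public static read-only field into a property leaves client source looking identical, but the compiled client refers to a field that no longer exists (the two versions are shown in separate namespaces here just so the sketch compiles as one file):

```csharp
namespace V1
{
    public class Library
    {
        // Field: compiled client code accesses this with a field reference.
        public static readonly string Name = "Library v1.0";
    }
}

namespace V2
{
    public class Library
    {
        // Property: the same client source (Library.Name) still compiles,
        // but an existing client binary still looks for the field and
        // fails at execution time.
        public static string Name { get; } = "Library v1.1";
    }
}
```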

So what are we talking about?

I’m going to assume that we have a public library at version 1.0, and we wish to add some overloads in version 1.1. We’re following semantic versioning, so we need to be backward compatible. What does that mean we can and can’t do, and is it a simple binary choice?

In various cases, I’ll present library code at version 1.0 and version 1.1, then “client” code (i.e. code that is using the library) which could be broken by the change. I’m not presenting method bodies or class declarations, as they’re largely irrelevant – focus on the signatures. It should be easy to reproduce any of this if you’re interested though. We’ll imagine that all the methods I present are in a class called Library.

Simplest conceivable change, foiled by method group conversions

The simplest example I can imagine would be adding a parameterized method when there’s a parameterless one already:

// Library version 1.0
public void Foo()

// Library version 1.1
public void Foo()
public void Foo(int x)

Even that’s not completely compatible. Consider this client code:

// Client
static void Method()
{
    var library = new Library();
    HandleAction(library.Foo);
}

static void HandleAction(Action action) {}
static void HandleAction(Action<int> action) {}

In library version 1.0, that’s fine. The call to HandleAction performs a method group conversion of library.Foo to create an Action. In library version 1.1, it’s ambiguous: the method group can be converted to either Action or Action<int>. So it’s not source compatible, if we’re going to be strict about it.

At this point you might be tempted to give up and go home, resolving never to add any overloads, ever again. Or maybe we can say that this is enough of a corner case to not consider it breaking. Let’s call method group conversions out of scope for now.

Unrelated reference types

We get into a different kind of territory when we have overloads with the same number of parameters. You might expect this library change to be non-breaking:

// Library version 1.0
public void Foo(string x)

// Library version 1.1
public void Foo(string x)
public void Foo(FileStream x)

That feels like it should be reasonable. The original method still exists, so we won’t be breaking binary compatibility. The simplest way of breaking source compatibility is to have a call that either works in v1.0 but doesn’t in v1.1, or works in both but does something different in v1.1 than it did in v1.0.

How can a call break between v1.0 and v1.1? We’d have to have an argument that’s compatible with both string and FileStream. But they’re unrelated reference types…

The first failure is if we have a user-defined implicit conversion to both string and FileStream:

// Client
class OddlyConvertible
{
    public static implicit operator string(OddlyConvertible c) => null;
    public static implicit operator FileStream(OddlyConvertible c) => null;
}

static void Method()
{
    var library = new Library();
    var convertible = new OddlyConvertible();
    library.Foo(convertible);
}

Hopefully the problem is obvious: what used to be unambiguous via a conversion to string is now ambiguous as the OddlyConvertible type can be implicitly converted to both string and FileStream. (Both overloads are applicable, neither is better than the other.)

It may be reasonable to exclude user-defined conversions… but there’s a far simpler way of making this fail:

// Client
static void Method()
{
    var library = new Library();
    library.Foo(null);
}

The null literal is implicitly convertible to any reference type or any nullable value type… so again, the call becomes ambiguous in library v1.1. Let’s try again…

Reference type and non-nullable value type parameters

If we don’t mind user-defined conversions, but don’t like null literals causing a problem, how about introducing an overload with a non-nullable value type?

// Library version 1.0
public void Foo(string x)

// Library version 1.1
public void Foo(string x)
public void Foo(int x)

This looks good – library.Foo(null) will be fine in v1.1. So is it safe? Not in C# 7.1…

// Client
static void Method()
{
    var library = new Library();
    library.Foo(default);
}

The default literal is like the null literal, but for any type. It’s really useful – and a complete pain when it comes to overloading and compatibility :(
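An affected client can get itself un-broken by spelling out the type instead of using the bare literal – that’s my suggestion rather than anything official. A sketch (the methods just report which overload ran):

```csharp
using System;

var library = new Library();
// library.Foo(default);  // would be CS0121 in v1.1: ambiguous between the overloads
Console.WriteLine(library.Foo(default(string)));  // typed default: picks Foo(string)
Console.WriteLine(library.Foo(0));                // exactly matches Foo(int)

public class Library
{
    public string Foo(string x) => "Foo(string)";
    public string Foo(int x) => "Foo(int)";
}
```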

Optional parameters

Optional parameters bring their own kind of pain. Suppose we have one optional parameter, but wish to add a second. We have three options, shown as 1.1a, 1.1b and 1.1c below.

// Library version 1.0
public void Foo(string x = "")

// Library version 1.1a
// Keep the existing method, but add another one with two optional parameters.
public void Foo(string x = "")
public void Foo(string x = "", string y = "")

// Library version 1.1b
// Just add the parameter to the existing method.
public void Foo(string x = "", string y = "")

// Library version 1.1c
// Keep the old method but make the parameter required, and add a new method
// with both parameters optional.
public void Foo(string x)
public void Foo(string x = "", string y = "")

Let’s think about a client that makes two calls:

// Client
static void Method()
{
    var library = new Library();
    library.Foo();
    library.Foo("xyz");
}

Library 1.1a keeps binary compatibility, but breaks source compatibility: the library.Foo() call is now ambiguous. The C# overloading rules prefer a method that doesn’t need the compiler to “fill in” any optional parameters, but they don’t express any preference in terms of how many optional parameters are filled in.

Library 1.1b keeps source compatibility, but breaks binary compatibility. Existing compiled code will expect to call a method with a single parameter – and that method no longer exists.

Library 1.1c keeps binary compatibility, but is potentially odd around source compatibility. The library.Foo() call now resolves to the two-parameter method, whereas library.Foo("xyz") resolves to the one-parameter method (which the compiler prefers over the two-parameter method because it doesn’t need to fill in any optional parameters). That may very well be okay, if the one-parameter version simply delegates to the two-parameter version using the same default value. It feels odd for the meaning of the first call to change though, when the method it used to resolve to still exists.

Optional parameters get even hairier when you don’t want to add a new one at the end, but in the middle – e.g. if you’re trying to follow a convention of keeping an optional CancellationToken parameter at the end. I’m not going to dive into this…

Generics

Type inference is a tricky beast at the best of times. With overload resolution it goes into full-on nightmare mode.

Let’s have a single non-generic method in v1.0, and then add a generic method in v1.1.

// Library version 1.0
public void Foo(object x)

// Library version 1.1
public void Foo(object x)
public void Foo<T>(T x)

That doesn’t seem too awful… but let’s look closely at what happens to client code:

// Client
static void Method()
{
    var library = new Library();
    library.Foo(new object());
    library.Foo("xyz");
}

In library v1.0, both calls resolve to Foo(object) – the only method that exists.

Library v1.1 is binary-compatible: if we use a client executable compiled against v1.0 but running against v1.1, both calls will still use Foo(object). But if we recompile, the second call (and only the second one) will change to using the generic method. Both methods are applicable for both calls.

In the first call, T would be inferred to be object, so the argument-to-parameter-type conversion is just object to object in both cases. Great. The compiler applies a tie-break rule that prefers non-generic methods over generic methods.

In the second call, T would be inferred to be string, so the argument-to-parameter-type conversion is string to object for the original method and string to string for the generic method. The latter is a “better” conversion, so the second method is picked.

If the two methods behave the same way, that’s fine. If they don’t, you’ve broken compatibility in a very subtle way.
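A quick way to see both resolutions in action – a sketch where the methods just report which overload ran:

```csharp
using System;

var library = new Library();
// Identical conversions (object to object), so the tie-break
// prefers the non-generic method:
Console.WriteLine(library.Foo(new object()));
// string-to-string beats string-to-object, so the generic method wins:
Console.WriteLine(library.Foo("xyz"));

public class Library
{
    public string Foo(object x) => "Foo(object)";
    public string Foo<T>(T x) => "Foo<T>";
}
```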

Inheritance and dynamic typing

I’m sorry: I just don’t have the energy. Both inheritance and dynamic typing would interact with overload resolution in “fun” and obscure ways.

If you add a method in one level of the inheritance hierarchy which overloads a method in a base class, the new method will be examined first, and picked over the base class method even when the base class method is more specific in terms of argument-to-parameter-type conversions. There’s lots of scope for messing things up.
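A minimal sketch of that hijacking, with hypothetical classes:

```csharp
using System;

var d = new Derived();
// Base.Process(string) is the more specific match for "xyz", but
// overload resolution finds an applicable method declared on Derived
// first and never considers the base class method.
Console.WriteLine(d.Process("xyz"));

public class Base
{
    public string Process(string x) => "Base.Process(string)";
}

public class Derived : Base
{
    public string Process(object x) => "Derived.Process(object)";
}
```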

Likewise with dynamic typing (within the client code), to some extent all bets are off. You’re already sacrificing a lot of compile-time safety… it shouldn’t come as a surprise when things break.

Conclusion

I’ve tried to keep the examples reasonably simple here. It can get really complicated really quickly as soon as you have multiple optional parameters etc.

Versioning is hard and makes my head hurt.