Using “git bash” from AppVeyor

TL;DR: If your AppVeyor build starts breaking because it’s started using WSL bash, change the path in your YAML file – see the end of the post for an example.

For years now, I’ve used bash scripts for all kinds of automation in Windows projects. The version of bash I use is the one that comes with Git for Windows – I believe its origins include Cygwin, MSYS2, and MinGW-w64. (I don’t know enough about the differences between those projects or which builds on which to say more. Fortunately, I don’t need to.) This version of bash is installed by default on AppVeyor, the CI system I typically use for Windows builds, so I don’t need to do anything else.

Importantly, I don’t want to use Windows Subsystem for Linux (WSL) on Windows builds. The point of doing the build is to use the Windows toolchains. I use Travis for Linux builds.

On October 11th 2019, my Noda Time AppVeyor build failed with this error:

build/appveyor.sh: line 11: dotnet: command not found

It turns out this is because AppVeyor has started shipping WSL with its Visual Studio 2019 images. The bash from WSL is earlier in the path than the version of bash from git, so that one is used, and everything starts failing.

It took a little while to diagnose this, but the fix is pretty easy – you just need to put git bash earlier in your path. I chose to do this in the “install” part of appveyor.yml:

install:
  # Make sure we get the bash that comes with git, not WSL bash
  - ps: $env:Path = "C:\Program Files\Git\bin;$env:Path"

Using just that change, the build started working again. Hooray!

Versioning limitations in .NET

This is a blog post I’ve intended to write for a very long time. (Other blog posts in that category include a recipe for tiramisu ice cream, and “knights and allies”.) It’s one of those things that’s grown in my mind over time, becoming harder and harder to start. However, three recent incidents have finally brought it back into focus.

TL;DR: Versioning is inherently hard, but the way that .NET infrastructure is set up makes it harder than it needs to be, I suspect.

The sample code for this blog post is available on GitHub.

Refresher: SemVer

NuGet is the de facto standard for distribution of packages now, and it supports semantic versioning, known as SemVer for short. SemVer version strings (ignoring pre-release versions) are of the form major.minor.patch.

The rules of SemVer sound straightforward from the perspective of a package producer:

  • If you make a breaking change, you need to bump the major version
  • If you make backward compatible additions, you need to bump the minor version
  • If you make backward and forward compatible changes (basically internal implementation changes or documentation changes) you bump the patch version

It also sounds straightforward from the perspective of a package consumer, considering moving from one version to another of a package:

  • If you move to a different major version, your existing code may not work (because everything can change between major versions)
  • If you move to a later minor version within the same major version, your code should still work
  • If you move to an earlier minor version within the same major version, your existing code may not work (because you may be using something that was introduced in the later minor version)
  • If you move to a later or earlier patch version within the same major/minor version, your code should still work
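Those consumer-side rules boil down to a small compatibility predicate. Here’s a sketch assuming idealized SemVer; the class and method names are mine, not from any real tooling:

```csharp
using System;

public static class SemVerCompat
{
    // Should existing code built against version "from" still work when it
    // ends up running against version "to"? (Idealized SemVer only.)
    public static bool ShouldStillWork(Version from, Version to)
    {
        // Different major version: anything may have changed.
        if (from.Major != to.Major)
        {
            return false;
        }
        // Same major: fine unless we move to an earlier minor, which could
        // be missing APIs introduced later. Patch differences don't matter.
        return to.Minor >= from.Minor;
    }

    public static void Main()
    {
        Console.WriteLine(ShouldStillWork(new Version(1, 2, 0), new Version(1, 3, 0))); // True
        Console.WriteLine(ShouldStillWork(new Version(1, 3, 1), new Version(2, 0, 0))); // False
    }
}
```

Real tooling would also need to handle pre-release versions, which this sketch ignores.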

Things aren’t quite as clear as they sound though. What counts as a breaking change? What kind of bug fix can go into just a patch version? If a change can be detected, it can break someone, in theory at least.

The .NET Core team has a set of rules about what’s considered breaking or not. That set of rules may not be appropriate for every project. I’d love to see:

  • Tooling to tell you what kind of changes you’ve made between two commits
  • A standard format for rules so that the tool from the first bullet can then suggest what your next version number should be; your project can then advertise that it’s following those rules
  • A standard format to record the kinds of changes made between versions
  • Tooling to check for “probable compatibility” of the new version of a library you’re consuming, given your codebase and the record of changes

With all that in place, we would all hopefully be able to follow SemVer reliably.

Importantly, this makes the version number a purely technical decision, not a marketing one. If the current version of your package is (say) 2.3.0, and you add a bunch of features in a backward-compatible way, you should release the new version as 2.4.0, even if it’s a “major” version in terms of the work you’ve put in. Use whatever other means you have to communicate marketing messages: keep the version number technical.

Even with packages that follow SemVer predictably and clearly, that’s not enough for peace and harmony in the .NET ecosystem, unfortunately.

The diamond dependency problem

The diamond dependency problem is not new to .NET, and most of the time we manage to ignore it – but it’s still real, and is likely to become more of an issue over time.

The canonical example of a diamond dependency is where an application depends on two libraries, each of which depends on a common third library, like this:

Common diamond dependency

(I’m using NodaTime as an example so I can refer to specific versions in a moment.)

It doesn’t actually need to be this complicated – we don’t need Lib2 here. All we need is two dependencies on the same library, and one of those can be from the application:

Simplified diamond dependency

Multiple dependencies on the same library are fine, so long as they depend on compatible versions. For example, from our discussion of SemVer above, it should be fine for Lib1 to depend on NodaTime 1.3.0, and App to depend on NodaTime 1.2.0. We expect the tooling to resolve all the dependencies and determine that 1.3.0 is the version to use, and the App code should be fine with that – after all, 1.3.0 is meant to be backward-compatible with 1.2.0. The same is true the other way round – if App depends on a later version than Lib1 does – so long as they’re using the same major version.

(Note: there are potential problems even within a minor version number – if App depends on 1.3.0 and Lib1 depends on 1.3.1 which contains a bug fix, but App has a workaround for the bug which then fails under 1.3.1 when the bug is no longer present. Things like that can definitely happen, but I’ll ignore that kind of problem for the rest of this post, and assume that everything conforms to idealized SemVer.)

Diamond dependencies become a problem under SemVer when the dependencies are two different major versions of the same library. To give a concrete example from the NodaTime package, consider the IClock interface. The 1.4.x version contains a single property, Now. The 2.0.x version has the same functionality, but as a method, GetCurrentInstant(). (This was basically a design failing on my part in v1 – I followed the BCL example of DateTime.Now without thinking clearly enough about whether it should have been a property.)

Now suppose App is built with the .NET Core SDK, and depends on NodaTime 2.0.0, and Lib1 depends on NodaTime 1.3.1 – and let’s imagine a world where that was the only breaking change in NodaTime 2.x. (It wasn’t.) When we build the application, we’d expect 2.0 to be used at execution time. If Lib1 never calls IClock.Now, all is well. Under .NET Core tooling, assembly binding redirects are handled automatically so when Lib1 “requests” NodaTime 1.3.1, it gets NodaTime 2.0.0. (The precise way in which this is done depends on the runtime executing the application. In .NET Core, there’s an App.deps.json file; in desktop .NET it’s App.exe.config. Fortunately this doesn’t matter much at the level of this blog post. It may well make a big difference to what’s viable in the future though.)

If Lib1 does call IClock.Now, the runtime will throw a MissingMethodException. Ouch. (Sample code.)

The upshot is that if the transitive set of “package + version” tuples for your entire application contains more than one major version for the same package, it’s entirely possible that you’ll get an exception at execution time such as MissingMethodException, MissingFieldException, TypeLoadException or similar.
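The reason a removed property surfaces as MissingMethodException rather than a compile error is that a C# property compiles down to a generated getter method, and call sites bind to that method by name at execution time. A self-contained sketch (IClockV1 is a hypothetical stand-in for the 1.x interface, not the real NodaTime type):

```csharp
using System;
using System.Linq;

// A stand-in for NodaTime 1.x's IClock; the name is hypothetical.
public interface IClockV1
{
    DateTime Now { get; }
}

public static class Demo
{
    public static void Main()
    {
        // A C# property compiles to a generated getter method, and a call
        // site built against the 1.x interface binds to get_Now by name.
        // If a later major version removes the property, that method no
        // longer exists, and binding fails with MissingMethodException.
        var names = typeof(IClockV1).GetMethods().Select(m => m.Name);
        Console.WriteLine(string.Join(", ", names)); // get_Now
    }
}
```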

If that doesn’t sound too likely, please consider that the Newtonsoft.Json package (Json.NET) has 12 major versions as I’m writing this blog post. I suspect that James Newton-King has kept the breaking changes to an absolute minimum, but even so, it’s pretty scary.

Non-proposals

I’d like to propose some enhancements to tooling that might help to address the issue. Before we look at what I am suggesting, I’d like to mention a few options that I’m not suggesting.

Ignore the problem

I’m surprised that so few people seem to be as worried about versioning as I am. I’ve presented talks on versioning a couple of times, but I don’t remember seeing anyone else do so – and certainly not in a .NET-specific deep-dive way. (My talk isn’t that, either.) It’s possible that there are lots of people who are worried, and they’re just being quiet about it.

This blog post is just part of me trying to agitate the community – including but not limited to Microsoft – into taking this problem seriously. If it turns out that there are already smart people working on this, that’s great. It’s also possible that we can live on the edge of versioning anarchy forever and it will always be a potential nightmare, but only cause a small enough number of failures that we decide we can live with it. That feels like a risk we should at least take consciously though.

Build at head, globally

In 2017, Titus Winters presented C++ as a live at head language at CppCon. It’s a great talk; go watch it. (Yes, it’s an hour and a half long. It’s still worth it. It also states a bunch of what I’ve said above in a slightly different way, so it may be helpful in that sense.) The idea is for everyone to build their application based on full source code, and provide tooling to automatically update consumer code based on library changes.

To go back to the Noda Time IClock example, if I build all the code for my application locally (App, Lib1 and NodaTime) then when NodaTime changes from the IClock.Now property to IClock.GetCurrentInstant(), the code in Lib1 that uses IClock.Now can automatically be changed to use IClock.GetCurrentInstant(), and everyone is happy with the same version. The Abseil project is a library (or collection of libraries) for C++ that embraces this concept.

It’s possible that this could eventually be a good solution for .NET. I don’t know of any technical aspects that mean it could work for C++ but not for .NET. However, it’s so far from our current position that I don’t believe it’s a practical choice at the moment, and I think it makes sense to try this experiment in one language first for a few years, then let other languages consider whether it makes sense for them.

I want to make it very clear that I’m not disagreeing with anything Titus said. He’s a very smart guy, and I explicitly do agree with almost everything I’ve heard him say. If I ever decide that I disagree with some aspect and want to make a public debate about it, I’ll be a lot more specific. Vague arguments are irritating for everyone. But the .NET ecosystem does depend on binary distribution of packages at the moment, and that’s an environment Titus deliberately doesn’t try to address. If someone wants to think about all the practical implications of all the world’s .NET consumers living at head in a source-driven (rather than binary-driven) world, I’d be interested in reading the results of that thinking. It’s certainly more feasible now than it was before .NET Core. But I’m not going there right now.

Never make breaking changes in any library

If we never make any changes that will break anyone, none of this is a problem.

I gave the example of Newtonsoft.Json earlier, and that it’s on major version 12. My guess is that that means there really have been 11 sets of breaking changes, but that they’re sufficiently unlikely to cause real failure that we’ve survived.

In the NodaTime package, I know I have made real breaking changes – it’s currently at version 2.4.x, and I’m planning on a 3.0 release some time after C# 8 comes out. I’ve made (or I’m considering) breaking changes in at least three different ways:

  • Adding members to public interfaces. If you implement those interfaces yourself (which is relatively unlikely) your code will be broken. On the other hand, everyone who wants the functionality I’ve added gets to use it in a clean way.
  • Removing functionality which is either no longer desirable (binary serialization) or shouldn’t have been present to start with. If you still want that functionality, I can only recommend that you stay on old versions.
  • Refactoring existing functionality, e.g. the IClock.Now => IClock.GetCurrentInstant() change, or fixing a typo in a method name. It’s annoying for existing consumers, but better for future consumers.

I want to be able to make all of these changes. They’re all good things in the long run, I believe.

So, those are options I don’t want to take. Let’s look at a few that I think we should pursue.

Proposals

Firstly, well done and thank you for making it this far. Before any editing, we’re about 2000 words into the post at this point. A smarter person might have got this far quicker without any loss of important information, but I hope the background has been useful.

Prerequisite: multi-version support

My proposals require that the runtime support loading multiple assemblies with the same name at the same time. Obviously I want to support .NET Core, so this mustn’t require the use of multiple AppDomains. As far as I’m aware, this is already the case, and I have a small demo of this, running with both net471 and netcoreapp2.0 targets:

using System;
using System.IO;
using System.Reflection;

// Call SystemClock.Instance.Now in NodaTime 1.3.1
string path131 = Path.GetFullPath("NodaTime-1.3.1.dll");
Assembly nodaTime131 = Assembly.LoadFile(path131);
dynamic clock131 = nodaTime131
    .GetType("NodaTime.SystemClock")
    // Instance is a field in 1.x
    .GetField("Instance")
    .GetValue(null);
Console.WriteLine(clock131.Now);

// Call SystemClock.Instance.GetCurrentInstant() in NodaTime 2.0.0
string path200 = Path.GetFullPath("NodaTime-2.0.0.dll");
Assembly nodaTime200 = Assembly.LoadFile(path200);
dynamic clock200 = nodaTime200
    .GetType("NodaTime.SystemClock")
    // Instance is a property in 2.x
    .GetProperty("Instance")
    .GetValue(null);
Console.WriteLine(clock200.GetCurrentInstant());

I’ve used dynamic typing here to avoid having to call the Now property or GetCurrentInstant() method using hand-written reflection, but we have to obtain the clock with reflection as it’s accessed via a static member. This is in a project that doesn’t depend on Noda Time at all in a compile-time sense. It’s possible that introducing a compile-time dependency could lead to some interesting problems, but I suspect those are fixable with the rest of the work below.

On brief inspection, it looks like it’s also possible to load two independent copies of the same version of the same assembly, so long as they’re stored in different files. That may be important later on, as we’ll see.

Proposal: execute with the expected major version

The first part of my proposal underlies all the rest. We should ensure that each library ends up executing against a dependency version that has the same major version it requested. If Lib1 depends on NodaTime 1.3.1, tooling should make sure it always gets a version that is >= 1.3.1 and < 2.0.0. I’d prefer that to the interpretation of “>= 1.3.1” which appears to be the default at the moment, but I don’t mind too much if I have to be explicit. The main point is that when different dependencies require different major versions, the result needs to be multiple assemblies present at execution time, rather than either a build error or the approach of “let’s just hope that Lib1 doesn’t use anything removed in 2.0”. (Of course, Lib1 should be able to declare that it is compatible with both NodaTime 1.x and NodaTime 2.x. It would be good to make that easy to validate, too.)

If the rest of the application already depends on NodaTime 1.4.0 (for example) then it should be fine to stick to the simple situation of loading a single copy of the NodaTime assembly. But if the rest of the application is using 2.0.0 but Lib1 depends on 1.3.1, we should make that work by loading both major versions 1 and 2.

This proposal then leads to other problems in terms of how libraries communicate with each other; the remaining proposals attempt to address that.

Proposal: private dependencies

When describing the diamond dependency problem, there’s one aspect I didn’t go into. Sometimes a library will take a dependency as a pure implementation detail. For example, Lib1 could use NodaTime internally, but expose an API that’s purely in terms of DateTime. On the other hand, Lib1 could expose its use of NodaTime via its public (and protected) API, using NodaTime types for some properties, method parameters, method return types, generic type arguments, base types and so on.

Both scenarios are entirely reasonable, but they have different versioning concerns. If Lib1 uses NodaTime as a “private dependency” then App shouldn’t (in an ideal world) need to care which version of NodaTime Lib1 uses.

However, if Lib1 exposes a method with an IClock parameter, the method caller really needs to know that it’s a 1.3.1 IClock. They’ll need to have a “1.3.1 IClock” to pass in. That means App needs to be aware of the version of NodaTime that Lib1 depends on.

I propose that the author of Lib1 should be able to make a decision about whether NodaTime is a “public” or “private” dependency, and express that decision within the NuGet package.
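To make the idea concrete, the project file might grow a way to express this. The attribute below is entirely hypothetical syntax – nothing like it exists today:

```xml
<!-- Hypothetical syntax: the PrivateDependency attribute does not exist. -->
<PackageReference Include="NodaTime" Version="1.3.1" PrivateDependency="true" />
```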

The compiler should be able to validate that a private dependency really isn’t exposed in the public API anywhere. Ideally, I’d like this to be part of the C# language eventually; I think versioning is important enough to be a language concern. It’s fair to argue that that ship has sailed, however, and that a Roslyn analyzer would be good enough. Careful thought is required in terms of transitive dependencies, by the way. How should the compiler/analyzer treat a situation where Lib1 privately depends on NodaTime 1.3.1, but publicly depends on Lib2, which publicly depends on NodaTime 2.0.0? I confess I haven’t thought this through in detail; I first want to get enough people interested that the detailed work is worth doing.

Extern aliases for packages

Private dependencies are relatively simple to think about, I believe. They’re implementation details that should – modulo a bunch of caveats – not impact consumers of the library that has the private dependencies.

Public dependencies are trickier. If App wants to use NodaTime 2.0.0 for almost everything, but needs to pass in a 1.3.1 clock to a method in Lib1, then App effectively needs to depend on both 1.3.1 and 2.0.0. Currently, as far as I’m aware, there’s no way of representing this in a project file. C# as a language supports the idea of multiple assemblies exposing the same types, via extern aliases… but we’re missing a way of expressing that in project files.

There’s already a GitHub issue requesting this, so I know I’m not alone in wanting it. We might have something like:

<PackageReference Include="NodaTime" Version="1.3.1" ExternAlias="noda1" />
<PackageReference Include="NodaTime" Version="2.0.0" ExternAlias="noda2" />

then in the C# code you might use:

using noda2::NodaTime;
// Use NodaTime types as normal, using NodaTime 2.0.0

// Then pass a 1.3.1 clock into a Lib1 method:
TypeFromLib1.Method(noda1::NodaTime.SystemClock.Instance);

There’s an assumption here: that each package contains a single assembly. That definitely doesn’t have to be true, and a full solution would probably need to address that, allowing more complex syntax for per-assembly aliasing.

It’s worth noting that it would be feasible for library authors to provide “bridging” packages too. For example, I could provide a NodaTime.Bridging package which allowed you to convert between NodaTime 1.x and NodaTime 2.x types. Sometimes those conversions may be lossy, but they’re at least feasible. The visible immutability of almost every type in Noda Time is a big help here, admittedly – but packages like this could really help consumers.

Here be dragons: shared state

So far I’ve thought of two significant problems with the above proposals, and both involve shared state – but in opposite directions.

Firstly, consider singletons that we really want to be singletons. SystemClock.Instance is a singleton in Noda Time. But if multiple assemblies are loaded, one per major version, then it’s really “singleton per major version.” For SystemClock that’s fine, but imagine if your library decided to grab a process-wide resource in its singleton, assuming that was okay because, after all, there’ll only be one of them. Maybe you’d have an ID generator which would guarantee uniqueness by incrementing a counter. That doesn’t work if there are multiple instances.
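The ID generator case is easy to sketch. Here two instances of an illustrative class (not from any real package) stand in for the per-major-version copies of the “singleton”:

```csharp
using System;
using System.Threading;

// Sketch: a "process-wide" ID generator that is really only unique per
// loaded copy of its assembly. The class is illustrative, not real.
public sealed class IdGenerator
{
    public static readonly IdGenerator Instance = new IdGenerator();
    private long counter;
    public long NextId() => Interlocked.Increment(ref counter);
}

public static class Demo
{
    public static void Main()
    {
        // Two instances stand in for the two copies of the "singleton"
        // that exist when the assembly is loaded once per major version.
        var copyForV1 = new IdGenerator();
        var copyForV2 = new IdGenerator();
        Console.WriteLine(copyForV1.NextId()); // 1
        Console.WriteLine(copyForV2.NextId()); // 1 again: no longer unique
    }
}
```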

Secondly, we need to consider mutable shared state, such as some sort of service locator that code registers implementations in. Two different libraries with supposedly private dependencies on the same service locator package might each want to register the same type in the service locator. At that point, things work fine if they depend on different major versions of the service locator package, but start to conflict if the implementations happen to depend on the same major version, and end up using the same assembly. Our isolation of the private dependency isn’t very isolated after all.

While it’s reasonable to argue that we should avoid this sort of shared state as far as possible, it’s unreasonable to assume that it doesn’t exist, or that it shouldn’t be considered as part of this kind of versioning proposal. At the very least, we need to consider how users can diagnose issues stemming from this with some ease, even if I suspect it’ll always be slightly tricky.

As noted earlier, it’s possible to introduce more isolation by loading the same assembly multiple times, so potentially each private dependency could really be private. That helps in the second case above, but hurts more in the first case. It also has a performance impact in terms of duplication of code etc.

Here be unknown dragons

I’m aware that versioning is really complicated. I’ve probably thought about it more than most developers, but I know there’s a lot I’m unaware of. I don’t expect my proposals to be “ready to go” without serious amounts of detailed analysis and work. While I would like to help with that work, I suspect it will mostly be done by others.

I suspect that even this detailed analysis won’t be enough to get things right – I’d expect that when there’s a prototype, exposing it to real world dependencies will find a bunch more issues.

Conclusion

I believe the .NET ecosystem has a versioning problem that’s currently not being recognized and addressed.

The intention isn’t that these proposals are final, concrete design docs – the intention is that they help either start the ball rolling, or give an already-rolling-slightly ball a little more momentum. I want the community to openly discuss the problems we’re seeing now, so we get a better handle on the problem, and then work together to alleviate those problems as best we can, while recognizing that perfection is unlikely to be possible.

Lying to the compiler

This morning I tweeted this:

Just found a C# 8 nullable reference types warning in Noda Time. Fixing it by changing Foo(x, x?.Bar) to Foo(x, x?.Bar!) which looks really dodgy… anyone want to guess why it’s okay?

This attracted more interest than I expected, so I thought I’d blog about it.

First let’s unpack what x?.Bar! means. x?.Bar means “if x is non-null, the value of x.Bar; otherwise, the corresponding null value”. The ! operator at the end is introduced in C# 8, and it’s the damn-it operator (more properly the “null-forgiving operator”, but I expect to keep calling it the damn-it operator forever). It tells the compiler to treat the preceding expression as “definitely not null” even if the compiler isn’t sure for whatever reason. Importantly, this does not emit a null check in the IL – it’s a compile-time only change.

When talking about the damn-it operator, I’ve normally given two scenarios where it makes sense:

  • When testing argument validation
  • When you have invariants in your code which allow you to know more than the compiler does about nullability. This is a little bit like a cast: you’re saying you know more than the compiler. Remember that it’s not like a cast in terms of behaviour though; it’s not checked at execution time.
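That “not checked at execution time” point is worth seeing concretely, because ! changes only the compile-time analysis. In this sketch (the method and variable names are mine), the operator silences the warning but provides no runtime protection:

```csharp
#nullable enable
using System;

public static class Demo
{
    public static int Length(string text) => text.Length;

    public static void Main()
    {
        // Reading an unset environment variable yields null.
        string? maybeNull = Environment.GetEnvironmentVariable("NO_SUCH_VARIABLE");
        // Length(maybeNull) would produce a nullability warning.
        // Length(maybeNull!) compiles cleanly, but ! emits no check:
        // if the value really is null, we still get NullReferenceException.
        try
        {
            Console.WriteLine(Length(maybeNull!));
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("null after all"); // the ! didn't make it safe
        }
    }
}
```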

My tweet this morning wasn’t about either of these cases. It’s in production code, and I absolutely believe that it’s possible for x?.Bar to be null. I’m lying to the compiler to stop it emitting a warning. The reason is that in the case where the value is null, it won’t matter that it’s null.

The actual code is in this Noda Time commit, but the code below provides a simplified version. We have three classes:

  • Person, with a name and home address
  • Address, with some properties I haven’t bothered showing here
  • Delivery, with a recipient and an address to deliver to

using System;

public sealed class Address
{
    // Properties here
}

public sealed class Person
{
    public string Name { get; }
    public Address HomeAddress { get; }

    public Person(string name, Address homeAddress)
    {
        Name = name ??
            throw new ArgumentNullException(nameof(name));
        HomeAddress = homeAddress ??
            throw new ArgumentNullException(nameof(homeAddress));
    }
}

public sealed class Delivery
{
    public Person Recipient { get; }
    public Address Address { get; }

    public Delivery(Person recipient)
        : this(recipient, recipient?.HomeAddress!)
    {
    }

    public Delivery(Person recipient, Address address)
    {
        Recipient = recipient ??
            throw new ArgumentNullException(nameof(recipient));
        Address = address ??
            throw new ArgumentNullException(nameof(address));
    }
}

The interesting part is the Delivery(Person) constructor, that delegates to the Delivery(Person, Address) constructor.

Here’s a version that would compile with no warnings:

public Delivery(Person recipient)
    : this(recipient, recipient.HomeAddress)

However, now if recipient is null, that will throw NullReferenceException instead of the (preferred) ArgumentNullException. Remember that nullable reference checking in C# 8 is really just advisory – the compiler does nothing to stop a non-nullable reference variable from actually having a value of null. This means we need to keep all the argument validation we’ve already got.

We could validate recipient before we pass it on to the other constructor:

public Delivery(Person recipient)
    : this(recipient ?? throw new ArgumentNullException(...),
           recipient.HomeAddress)

That will throw the right exception, but it’s ugly and more code than we need. We know that the constructor we’re delegating to already validates recipient – we just need to get that far. That’s where the null-conditional operator comes in. So we can write:

public Delivery(Person recipient)
    : this(recipient, recipient?.HomeAddress)

That will behave as we want it to – if recipient is null, we’ll pass null values as both arguments to the other constructor, and it will validate them. But now the compiler warns that the second argument could be null, and the parameter is meant to be non-null. The solution is to use the damn-it operator:

public Delivery(Person recipient)
    : this(recipient, recipient?.HomeAddress!)

Now we get the behaviour we want, with no redundant code, and no warnings. We’re lying to the compiler and satisfied that we’re doing so sensibly, because recipient?.HomeAddress is only null if recipient is null, and we know that that will be validated first anyway.

I’ve added a comment, as it’s pretty obscure otherwise – but part of me just enjoys the oddity of it all :)

Storing UTC is not a silver bullet

Note: this is a pretty long post. If you’re not interested in the details, the conclusion at the bottom is intended to be read in a standalone fashion. There’s also a related blog post by Lau Taarnskov – if you find this one difficult to read for whatever reason, maybe give that a try.

When I read Stack Overflow questions involving time zones, there’s almost always someone giving the advice to only ever store UTC. Convert to UTC as soon as you can, and convert back to a target time zone as late as you can, for display purposes, and you’ll never have a time zone issue again, they say.

This blog post is intended to provide a counterpoint to that advice. I’m certainly not saying storing UTC is always the wrong thing to do, but it’s not always the right thing to do either.

Note on simplifications: this blog post does not go into supporting non-Gregorian calendar systems, or leap seconds. Hopefully developers writing applications which need to support either of those are already aware of their requirements.

Background: EU time zone rule changes

The timing of this blog post is due to recent European Parliament proceedings that look like they will probably end the clocks changing twice a year into “summer time” or “winter time” within EU member states. The precise details are yet to be finalized and are unimportant to the bigger point, but for the purpose of this blog post I’ll assume that each member state has to decide whether they will “spring forward” one last time on March 28th 2021, then staying in permanent “summer time”, or “fall back” one last time on October 31st 2021, then staying in permanent “winter time”. So from November 1st 2021 onwards, the UTC offset of each country will be fixed – but there may be countries which currently always have the same offset as each other, and will have different offsets from some point in 2021. (For example, France could use winter time and Germany could use summer time.)

The larger point is that time zone rules change, and that applications should expect that they will change. This isn’t a corner case, it’s the normal way things work. There are usually multiple sets of rule changes (as released by IANA) each year. At least in the European changes, we’re likely to have a long notice period. That often isn’t the case – sometimes we don’t find out about rule changes until a few days before they happen.

Application example

For the sake of making everything concrete, I’m going to imagine that we’re writing an application to help conference organizers. A conference organizer can create a conference within the application, specifying when and where it’s happening, and (amongst other things) the application will display a countdown timer of “the number of hours left before the start of the conference”. Obviously a real application would have a lot more going on than this, but that’s enough to examine the implementation options available.

To get even more concrete, we’ll assume that a conference organizer has registered a conference called “KindConf” and has said that it will start at 9am in Amsterdam, on July 10th 2022. They perform this registration on March 27th 2019, when the most recently published IANA time zone database is 2019a, which predicts that the offset observed in Amsterdam on July 10th 2022 will be UTC+2.

For the sake of this example, we’ll assume that the Netherlands decides to fall back on October 31st 2021 for one final time, leaving them on a permanent offset of UTC+1. Just to complete the picture, we’ll assume that this decision is taken on February 1st 2020, and that IANA publishes the changes on March 14th 2020, as part of release 2020c.

So, what can the application developer do? In all the options below, I have not gone into details of the database support for different date/time types. This is important, of course, but probably deserves a separate blog post in its own right, on a per-database basis. I’ll just assume we can represent the information we want to represent, somehow.

Interlude: requirements

Before we get to the implementations, I’ll just mention a topic that’s been brought up a few times in the comments and on Twitter. I’ve been assuming that the conference does still occur at 9am on July 10th 2022… in other words, that the “instant in time at which the conference starts” changes when the rules change.

It’s unlikely that this would ever show up in a requirements document. I don’t remember ever being in a meeting with a product manager where they’d done this type of contingency planning. If you’re lucky, someone would work out that there’s going to be a problem long before the rules actually change. At that point, you’d need to go through the requirements and do the implementation work. I’d argue that this isn’t a new requirement – it’s a sort of latent, undiscovered requirement you’ve always had, but you hadn’t known about before.

Now, back to the options…

Option 1: convert to UTC and just use that forever

The schema for the Conferences table in the database might look like this:

  • ID: auto-incremented integer
  • Name: string
  • Start: date/time in UTC
  • Address: string

The entry for KindConf would look like this:

  • ID: 1
  • Name: KindConf
  • Start: 2022-07-10T07:00:00Z
  • Address: Europaplein 24, 1078 GZ Amsterdam, Netherlands

That entry is then preserved forever, without change. So what happens to our countdown timer?

Result

The good news is that anyone observing the timer will see it smoothly count down towards 0, with no jumps. The bad news is that when it reaches 0, the conference won’t actually start – there’ll be another hour left. This is not good.

Option 2: convert to UTC immediately, but reconvert after rule changes

The schema for the Conferences table would preserve the time zone ID. (I’m using the IANA ID for simplicity, but it could be the Windows system time zone ID, if absolutely necessary.) Alternatively, the time zone ID could be derived each time it’s required – more on that later.

  • ID: auto-incremented integer
  • Name: string
  • Start: date/time in UTC
  • Address: string
  • Time zone ID: string

The initial entry for KindConf would look like this:

  • ID: 1
  • Name: KindConf
  • Start: 2022-07-10T07:00:00Z
  • Address: Europaplein 24, 1078 GZ Amsterdam, Netherlands
  • TimeZoneId: Europe/Amsterdam

On March 14th 2020, when the new time zone database is released, that entry could be changed to make the start time accurate again:

  • ID: 1
  • Name: KindConf
  • Start: 2022-07-10T08:00:00Z
  • Address: Europaplein 24, 1078 GZ Amsterdam, Netherlands
  • TimeZoneId: Europe/Amsterdam

But what does that “change” procedure look like? We need to convert the UTC value back to the local time, and then convert back to UTC using different rules. So which rules were in force when that entry was created? It looks like we actually need an extra field in the schema somewhere: TimeZoneRulesVersion. This could potentially be a database-wide value, although that’s only going to be reasonable if you can update all entries and that value atomically. Allowing a value per entry (even if you usually expect all entries to be updated at roughly the same time) is likely to make things simpler.

So our original entry was actually:

  • ID: 1
  • Name: KindConf
  • Start: 2022-07-10T07:00:00Z
  • Address: Europaplein 24, 1078 GZ Amsterdam, Netherlands
  • TimeZoneId: Europe/Amsterdam
  • TimeZoneRules: 2019a

And the modified entry is:

  • ID: 1
  • Name: KindConf
  • Start: 2022-07-10T08:00:00Z
  • Address: Europaplein 24, 1078 GZ Amsterdam, Netherlands
  • TimeZoneId: Europe/Amsterdam
  • TimeZoneRules: 2020c

Of course, the entry could have been updated many times over the course of time, for 2019b, 2019c, …, 2020a, 2020b. Or maybe we only actually update the entry if the start time changes. Either way works.

Result

Now, anyone refreshing the countdown timer for the event will see the counter increase by an hour when the entry is updated. That may look a little odd – but it means that when the countdown timer reaches 0, the conference is ready to start. I’m assuming this is the desired behaviour.

Implementation

Let’s look at roughly what would be needed to perform this update in C# code. I’ll assume the use of Noda Time to start with, but then we’ll consider what happens if you’re not using Noda Time.

public class Conference
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public Instant Start { get; set; }
    public string TimeZoneId { get; set; }
    public string TimeZoneRules { get; set; }
}

// In other code... some parameters might be fields in the class.
public void UpdateStartTime(
    Conference conference,
    Dictionary<string, IDateTimeZoneProvider> timeZoneProvidersByVersion,
    string latestRules)
{
    // Map the start instant into the time zone using the old rules
    IDateTimeZoneProvider oldProvider = timeZoneProvidersByVersion[conference.TimeZoneRules];
    DateTimeZone oldZone = oldProvider[conference.TimeZoneId];
    ZonedDateTime oldZonedStart = conference.Start.InZone(oldZone);   

    IDateTimeZoneProvider newProvider = timeZoneProvidersByVersion[latestRules];
    DateTimeZone newZone = newProvider[conference.TimeZoneId];
    // Preserve the local time, but with the new time zone rules
    ZonedDateTime newZonedStart = oldZonedStart.LocalDateTime.InZoneLeniently(newZone);

    // Update the conference entry with the new information
    conference.Start = newZonedStart.ToInstant();
    conference.TimeZoneRules = latestRules;
}

The InZoneLeniently call is going to be a common issue – we’ll look at that later (“Ambiguous and skipped times”).

This code would work, and Noda Time would make it reasonably straightforward to build that dictionary of time zone providers, as we publish all the “NZD files” we’ve ever created from 2013 onwards on the project web site. If the code is being updated with the latest stable version of the NodaTime NuGet package, the latestRules parameter wouldn’t be required – DateTimeZoneProviders.Tzdb could be used instead. (And IDateTimeZoneProvider.VersionId could obtain the current version.)

However, this approach has three important requirements:

  • The concept of “version of time zone rules” has to be available to you
  • You have to be able to load a specific version of the time zone rules
  • You have to be able to use multiple versions of the time zone rules in the same application

If you’re using C# but relying on TimeZoneInfo then… good luck with any of those three. (It’s no doubt feasible, but far from simple out of the box, and it may require an external service providing historical data.)

I can’t easily comment on other platforms in any useful way, but I suspect that dealing with multiple versions of time zone data is not something that most developers come across.

Option 3: preserve local time, using UTC as derived data to be recomputed

Spoiler alert: this is my preferred option.

In this approach, the information that the conference organizer supplied (“9am on July 10th 2022”) is preserved and never changed. There is additional information in the entry that is changed when the time zone database is updated: the converted UTC instant. We can also preserve the version of the time zone rules used for that computation, as a way of allowing the process of updating entries to be restarted after a failure without starting from scratch, but it’s not strictly required. (It’s also probably useful as diagnostic information, too.)

The UTC instant is only stored at all for convenience. Having a UTC representation makes it easier to provide total orderings of when things happen, and also to compute the time between “right now” and the given instant, for the countdown timer. Unless it’s actually useful to you, you could easily omit it entirely. (My Noda Time benchmarks suggest it’s unlikely that doing the conversion on every request would cause a bottleneck. A single local-to-UTC conversion on my not-terribly-fast benchmark machine only takes ~150ns. In most environments that’s close to noise. But for cases where it’s relevant, it’s fine to store the UTC as described below.)
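Whichever way the UTC instant is obtained – stored, or recomputed on each request – the countdown itself is just instant arithmetic, roughly:

```csharp
// Computing "time left before the conference starts"; the start instant
// here is the converted value from the example entry.
Instant utcStart = Instant.FromUtc(2022, 7, 10, 7, 0);
Instant now = SystemClock.Instance.GetCurrentInstant();
Duration remaining = utcStart - now;
```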

So the schema would have:

  • ID: auto-incremented integer
  • Name: string
  • Local start: date/time in the specified time zone
  • Address: string
  • Time zone ID: string
  • UTC start: derived field for convenience
  • Time zone rules version: for optimization purposes

So our original entry is:

  • ID: 1
  • Name: KindConf
  • LocalStart: 2022-07-10T09:00:00
  • Address: Europaplein 24, 1078 GZ Amsterdam, Netherlands
  • TimeZoneId: Europe/Amsterdam
  • UtcStart: 2022-07-10T07:00:00Z
  • TimeZoneRules: 2019a

On March 14th 2020, when the time zone database 2020c is released, this is modified to:

  • ID: 1
  • Name: KindConf
  • LocalStart: 2022-07-10T09:00:00
  • Address: Europaplein 24, 1078 GZ Amsterdam, Netherlands
  • TimeZoneId: Europe/Amsterdam
  • UtcStart: 2022-07-10T08:00:00Z
  • TimeZoneRules: 2020c

Result

This is the same as option 2: after the update, there’s a jump of an hour, but when it reaches 0, the conference starts.

Implementation

This time, we don’t need to convert our old UTC value back to a local value: the “old” time zone rules version and “old” UTC start time are irrelevant. That simplifies matters significantly:

public class Conference
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public LocalDateTime LocalStart { get; set; }
    public string TimeZoneId { get; set; }
    public Instant UtcStart { get; set; }
    public string TimeZoneRules { get; set; }
}

// In other code... some parameters might be fields in the class.
public void UpdateUtcStart(
    Conference conference,
    IDateTimeZoneProvider latestZoneProvider)
{
    DateTimeZone newZone = latestZoneProvider[conference.TimeZoneId];
    // Preserve the local time, but with the new time zone rules
    ZonedDateTime newZonedStart = conference.LocalStart.InZoneLeniently(newZone);

    // Update the conference entry with the new information
    conference.UtcStart = newZonedStart.ToInstant();
    conference.TimeZoneRules = latestZoneProvider.VersionId;
}

As the time zone rules version is now optional, this code could be ported to use TimeZoneInfo instead. Obviously from my biased perspective the code wouldn’t be as pleasant, but it would be at least reasonable. The same is probably true on other platforms.

So I prefer option 3, but is it really so different from option 2? We’re still storing the UTC value, right? That’s true, but I believe the difference is important because the UTC value is an optimization, effectively.

Principle of preserving supplied data

For me, the key difference between the options is that in option 3, we store and never change what the conference organizer entered. The organizer told us that the event would start at the given address in Amsterdam, at 9am on July 10th 2022. That’s what we stored, and that information never needs to change (unless the organizer wants to change it, of course). The UTC value is derived from that “golden” information, but can be re-derived if the context changes – such as when time zone rules change.

In option 2, we don’t store the original information – we only store derived information (the UTC instant). We need to store information to tell us all the context about how we derived it (the old time zone rules version) and when updating the entry, we need to get back to the original information before we can re-derive the UTC instant using the new rules.

If you’re going to need the original information anyway, why not just store that? The implementation ends up being simpler, and it means it doesn’t matter whether or not we even have the old time zone rules.

Representation vs information

It’s important to note that I’m only talking about preserving the core information that the organizer entered. For the purposes of this example at least, we don’t need to care about the representation they happened to use. Did they enter it as “July 10 2022 09:00” and we then parsed that? Did they use a calendar control that provided us with “2022-07-10T09:00”? I don’t think that’s important, as it’s not part of the core information.

It’s often a useful exercise to consider what aspects of the data you’re using are “core” and which are incidental. If you’re receiving data from another system as text for example, you probably don’t want to store the complete XML or JSON, as that choice between XML and JSON isn’t relevant – the same data could be represented by an XML file and a JSON file, and it’s unlikely that anything later will need to know or care.

A possible option 4?

I’ve omitted a fourth option which could be useful here, which is a mixture of 2 and 3. If you store a “date/time with UTC offset” then you’ve effectively got both the local start time and the UTC instant in a single field. To show the values again, you’d start off with:

  • ID: 1
  • Name: KindConf
  • Start: 2022-07-10T09:00:00+02:00
  • Address: Europaplein 24, 1078 GZ Amsterdam, Netherlands
  • TimeZoneId: Europe/Amsterdam
  • TimeZoneRules: 2019a

On March 14th 2020, when the time zone database 2020c is released, this is modified to:

  • ID: 1
  • Name: KindConf
  • Start: 2022-07-10T09:00:00+01:00
  • Address: Europaplein 24, 1078 GZ Amsterdam, Netherlands
  • TimeZoneId: Europe/Amsterdam
  • TimeZoneRules: 2020c

In systems that support “date/time with UTC offset” well in both the database and the languages using it, this might be an attractive solution. It’s important to note that the time zone ID is still required (unless you derive it from the address whenever you need it) – there’s a huge difference between knowing the time zone that’s applied, and knowing the UTC offset in one specific situation.

Personally I’m not sure I’m a big fan of this option, as it combines original and derived data in a single field – the local part is the original data, and the offset is derived. I like the separation between original and derived data in option 3.
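For completeness, a sketch of the option 4 update, assuming the Conference class from earlier with its Start property changed to an OffsetDateTime:

```csharp
public void UpdateOffset(
    Conference conference,
    IDateTimeZoneProvider latestZoneProvider)
{
    DateTimeZone newZone = latestZoneProvider[conference.TimeZoneId];
    // Keep the local part (the original data); recompute the offset
    // (the derived data) under the latest rules.
    conference.Start = conference.Start.LocalDateTime
        .InZoneLeniently(newZone)
        .ToOffsetDateTime();
    conference.TimeZoneRules = latestZoneProvider.VersionId;
}
```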

With all those options presented, let’s look at a few of the corner cases I’ve mentioned in the course of the post.

Ambiguous and skipped times

In both of the implementations I’ve shown, I’ve used the InZoneLeniently method from Noda Time. While the mapping from UTC instant to local time is always completely unambiguous for a single time zone, the reverse mapping (from local time to UTC instant) is not always unambiguous.

As an example, let’s take the Europe/London time zone. On March 31st 2019, at 1am local time, we will “spring forward” to 2am, changing offset from UTC+0 to UTC+1. On October 27th 2019, at 2am local time, we will “fall back” to 1am, changing offset from UTC+1 to UTC+0. That means that 2019-03-31T01:30 does not happen at all in the Europe/London time zone, and 2019-10-27T01:30 occurs twice.

Now it’s reasonable to validate this when a conference organizer specifies the starting time of a conference, either prohibiting it if the given time is skipped, or asking for more information if the given time is ambiguous. I should point out that this is highly unlikely for a conference, as transitions are generally done in the middle of the night – but other scenarios (e.g. when to schedule an automated backup) may well fall into this.

That’s fine at the point of the first registration, but it’s also possible that a previously-unambiguous local time could become ambiguous under new time zone rules. InZoneLeniently handles that in the way documented by Resolvers.LenientResolver. That may well not be the appropriate choice for any given application, so developers should consider it carefully – and write tests.
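In Noda Time, that registration-time validation can use MapLocal, which reports how many instants a given local time maps to in a zone. A sketch, using the fall-back example above (ZoneLocalMapping is in the NodaTime.TimeZones namespace):

```csharp
DateTimeZone zone = DateTimeZoneProviders.Tzdb["Europe/London"];
LocalDateTime local = new LocalDateTime(2019, 10, 27, 1, 30);

ZoneLocalMapping mapping = zone.MapLocal(local);
switch (mapping.Count)
{
    case 0:
        // Skipped time (e.g. 01:30 on March 31st 2019): reject the input.
        break;
    case 1:
        // Unambiguous: use the single mapping directly.
        Instant start = mapping.Single().ToInstant();
        break;
    case 2:
        // Ambiguous (as here): ask the organizer whether they mean the
        // earlier occurrence (mapping.First()) or the later (mapping.Last()).
        break;
}
```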

Recurrent events

The example I’ve given so far is for a single event. Recurrent events – such as weekly meetings – end up being trickier still, as a change to time zone rules can change the offsets for some instances but not others. Likewise meetings may well be attended by people from more than a single time zone – so it’s vital that the recurrence would have a single coordinating time zone, but offsets may need to be recomputed for every time zone involved, and for every occurrence. Application developers have to think about how this can be achieved within performance requirements.
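As a sketch of that recomputation for a simple weekly recurrence in a single coordinating time zone (the names and values are illustrative):

```csharp
DateTimeZone coordinatingZone = DateTimeZoneProviders.Tzdb["Europe/Amsterdam"];
LocalDateTime firstOccurrence = new LocalDateTime(2022, 1, 5, 14, 0);

// Recompute the UTC instant for each occurrence; a single rule change
// can affect some occurrences but not others.
var occurrences = new List<Instant>();
for (int week = 0; week < 10; week++)
{
    occurrences.Add(firstOccurrence.PlusWeeks(week)
        .InZoneLeniently(coordinatingZone)
        .ToInstant());
}
```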

Time zone boundary changes and splits

So far we’ve only considered time zone rules changing. In options 2-4, we stored a time zone ID within the entry. That assumes that the time zone associated with the event will not change over time. That assumption may not be valid.

As far as I’m aware, time zone rules change more often than changes to which time zone any given location is in – but it’s entirely possible for things to change over time. Suppose the conference wasn’t in Amsterdam itself, but Rotterdam. Currently Rotterdam uses the Europe/Amsterdam time zone, but what if the Netherlands splits into two countries between 2019 and 2022? It’s feasible that by the time the conference occurs, there could be a Europe/Rotterdam time zone, or something equivalent.

To that end, a truly diligent application developer might treat the time zone ID as derived data based on the address of the conference. As part of checking each entry when the time zone database is updated, they might want to find the time zone ID of the address of the conference, in case that’s changed. There are multiple services that provide this information, although it may need to be a multi-step process, first converting the address into a latitude/longitude position, and then finding the time zone for that latitude/longitude.

Past vs recent past

This post has all been about future date/time values. In Twitter threads discussing time zone rule changes, there’s been a general assertion that it’s safe to only store the UTC instant related to an event in the past. I would broadly agree with that, but with one big caveat: as I mentioned earlier, sometimes governments adopt time zone rule changes with almost no notice at all. Additionally, there can be a significant delay between the changes being published and them being available within applications. (That delay can vary massively based on your platform.)

This means that while a conversion to UTC for a value more than (say) a year ago will probably stay valid, if you’re recording a date and time of “yesterday”, it’s quite possible that you’re using incorrect rules without knowing it. (Even very old rules can change, but that’s rarer in my experience.)

Do you need to account for this? That depends on your application, like so many other things. I’d at least consider the principle described above – and unless it’s much harder for you to maintain the real source information for some reason, I’d default to doing that.

Conclusion

The general advice of “just convert all local date/time data to UTC and store that” is overly broad in my view. For future and near-past events, it doesn’t take into account that time zone rules change, making the initial conversion potentially inaccurate. Part of the point of writing this blog post is to raise awareness, so that even if people do still recommend storing UTC, they can add appropriate caveats rather than treating it as a universal silver bullet.

I should explicitly bring up timestamps at this point. Machine-generated timestamps are naturally instants in time, recording “the instant at which something occurred” in an unambiguous way. Storing those in UTC is entirely reasonable – potentially with an offset or time zone if the location at which the timestamp was generated is relevant. Note that in this case the source of the data isn’t “a local time to be converted”.

That’s the bigger point, that goes beyond dates and times and time zones: choosing what information to store, and how. Any time you discard information, that should be a conscious choice. Are you happy discarding the input format that was used to enter a date? Probably – but it’s still a decision to make. Defaulting to “convert to UTC” is a default to discarding information which in some cases is valid, but not all. Make it a conscious choice, and ensure you store all the information you think may be needed later. You might also want to consider whether and how you separate “source” information from “derived” information – this is particularly relevant when it comes to archiving, when you may want to discard all the derived data to save space. That’s much easier to do if you’re already very aware of which data is derived.

My experience is that developers either don’t think about date/time details nearly enough when coding, or are aware of some of the pitfalls but decide that means it’s just too hard to contemplate. Hopefully this worked example of real life complexity shows that it can be done: it takes a certain amount of conscious thought, but it’s not rocket science.

Hosting ASP.NET Core behind https in Google Kubernetes Engine

Side-note: this may be one of the clumsiest titles I’ve ever written for a blog post. But it does what it says on the tin. Oh, and the space after “ASP” in “ASP .NET Core” everywhere is to avoid auto-linking. While I could use a different dot or a zero-width non-breaking space to avoid it, I’m not sure I trust WordPress to do the right thing with those…

Background

Over the past few weeks, I’ve moved nodatime.org, csharpindepth.com and jonskeet.uk over to Google Kubernetes Engine. (They all used to be hosted on Azure.)

I’ve done this for a few reasons:

  • As my job is primarily making .NET developers more productive on Google Cloud Platform, it feels natural to run my own code there. I want to see where there are friction points, so I can help fix them.
  • I wanted more redundancy, particularly for nodatime.org; Kubernetes felt a simple way of managing that at a reasonable cost.
  • HTTPS certificate management (via Let’s Encrypt) has been a bit painful for me on Azure; I could have automated more, but that would have taken extra time I don’t have. (It may also have improved since I last looked.)

The first of these is the most important, by a long way. But the HTTPS management part – and then the knock-on effects – is what I’m looking at in this blog post.

Basic hosting

Hosting an ASP .NET Core application in Google Kubernetes Engine (GKE from now on) is really simple, at least once you’ve understood the Kubernetes concepts.

In each case, the ASP .NET Core application is built with a vanilla Dockerfile which would not look unusual to anyone who’s hosted ASP .NET Core in Docker anywhere else.
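As a sketch (the base image tag and assembly name here are assumptions, in keeping with the .NET Core 2.x era), such a Dockerfile might look like:

```dockerfile
FROM mcr.microsoft.com/dotnet/core/aspnet:2.2
WORKDIR /app
# Copy the output of "dotnet publish" into the image.
COPY bin/Release/netcoreapp2.2/publish/ .
ENTRYPOINT ["dotnet", "MyApp.dll"]
```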

I happen to use Google Cloud Build to build the Docker images, and Google Container Registry to host the images, but neither of those are required. (For csharpindepth.com and jonskeet.uk there are simple triggers in Google Cloud Build to build and deploy on GitHub pushes; for nodatime.org it’s a bit more complicated as the documentation build currently has some Windows dependencies. I have a machine at home that polls GitHub every half hour, and pushes the built application to Google Cloud Build for packaging when necessary.)

So, that gets HTTP hosting sorted. I dare say there are some aspects I’ve not configured as well as I could have done, but it was reasonably straightforward to get going.

HTTPS with Google-managed certificates

With HTTP working, it’s time to shoot for HTTPS. It’s important to note that the apps I’m talking about are all hobby projects, not commercial ones – I’m already paying for hosting, so I don’t want to have to pay for SSL certificates as well. Enter Let’s Encrypt, of course.

A while ago I used Let’s Encrypt to set up HTTPS on Azure, and while it was free and I didn’t have to write any code, it wasn’t exactly painless. I followed two guides at the same time, because neither of them exactly matched the Azure portal I was looking at. There were lots of bits of information to grab from various different bits of the portal, and it took a couple of attempts to get right… but I got there. I also set up a web job to renew the certificates, but didn’t go through the hoops required to run those web jobs periodically. (There were instructions, but it looked like they’d take a while to work through compared with just manually running the web job every couple of months or so. I decided to take the pragmatic approach, knowing that I was expecting to move to GKE anyway. If Azure had been the expected permanent home for the apps, I’d have gone through the steps and I’m sure they’d have worked fine.) I don’t know which guide I worked through at the time, but if I were starting today I’d probably try Scott Hanselman’s guide.

So, what can I do on Google Cloud Platform instead? I decided to terminate the SSL connection at the load balancer, using Google-managed certificates. To be really clear, these are currently in beta, but have worked well for me so far. Terminating the SSL connection at the load balancer means that the load balancer forwards the request to the Kubernetes service as an HTTP request, not HTTPS. The ASP .NET Core app itself only exposes an HTTP port, so it doesn’t need to know any details of certificates.

The steps to achieve this are simple, assuming you have the Google Cloud SDK (gcloud) installed already:

  • Create the certificate, e.g.
    gcloud beta compute ssl-certificates create nodatime-org --domains nodatime.org
  • Attach the certificate to the load balancer, via the Kubernetes ingress in my case, with an annotation in the ingress metadata:
    ingress.gcp.kubernetes.io/pre-shared-cert: "nodatime-org"
  • Apply the modifications to the ingress:
    kubectl apply -f ingress.yaml
  • Wait for the certificate to become valid (the provisioning procedure takes a little while, and I’ve seen some interesting errors while that’s taking place)
  • Enjoy HTTPS support, with auto-renewing certificates!
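Putting the annotation in context, the ingress manifest might look roughly like this (the resource and service names are assumptions; extensions/v1beta1 was the current ingress API version at the time):

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nodatime-ingress
  annotations:
    ingress.gcp.kubernetes.io/pre-shared-cert: "nodatime-org"
spec:
  backend:
    serviceName: nodatime-web   # assumed service name
    servicePort: 80
```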

There are only two downsides to this that I’ve experienced so far:

  • Currently each certificate can only be associated with a single domain. For example, I have different certificates for nodatime.org, www.nodatime.org and test.nodatime.org. (More about the last of these later.) This is a minor annoyance, but the ingress supports multiple pre-shared certificates, so it has no practical implications for me.
  • I had to accept some downtime on HTTPS when transferring from Azure to GKE, while the certificate was provisioning after I’d transferred the DNS entry. This was a one-time issue of course, and one that wouldn’t affect most users.

Beyond the basics

At this point I had working HTTPS URLs – but any visitor using HTTP would stay that way. (At the time of writing this is still true for csharpindepth.com and jonskeet.uk.) Obviously I’d like to encourage secure browsing, so I’d like to use the two pieces of functionality provided by ASP .NET Core:

  • Redirection of HTTP requests via app.UseHttpsRedirection()
  • HSTS support via app.UseHsts()

I should note here that the Microsoft documentation was fabulously useful throughout. It didn’t quite get me all the way, but it was really close.

Now, I could have just added those calls into the code and deployed straight to production. Local testing would have worked – it would have redirected from localhost:5000 on HTTP to localhost:5001 on HTTPS with no problems. It would also have failed massively for reasons we’ll look at in a minute. Just for a change, I happened to do the right thing…

For hosting changes, always use a test deployment first

In Azure, I had a separate AppService I could deploy to, called nodatimetest. It didn’t have a fancy URL, but it worked okay. That’s where I tested Azure-specific changes before deploying to the real AppService. Unfortunately, it wouldn’t have helped in this situation, as it didn’t have a certificate.

Fortunately, creating a new service in Kubernetes, adding it to the ingress, and creating a managed certificate is so easy that I did do this for the new hosting – and I’m so glad I did so. I use a small script to publish the local ASP .NET Core build to Google Cloud Build which does the Docker packaging, pushes it to Google Container Registry and updates the Kubernetes deployment. As part of that script, I add a small text file containing the current timestamp so I can check that I’m really looking at the deployment I expect. It takes just under two minutes to build, push, package, deploy – not a tight loop you’d want for everyday development, but pretty good for the kind of change that can’t be tested locally.

So, I made the changes to use HTTPS redirection and HSTS, deployed, and… there was no obvious change.

Issue 1: No HTTPS port to redirect to

Remember how the ASP .NET Core app in Kubernetes is only listening on HTTP? That means it doesn’t know which port to redirect users to for HTTPS. Oops. While I guess it would be reasonable to guess 443 if it didn’t know any better, the default of “don’t redirect if you haven’t been told a port” means that your application doesn’t stop working if you get things wrong – it just doesn’t redirect.

This is easily fixed in ConfigureServices:

services.AddHttpsRedirection(options => options.HttpsPort = 443);

… but I’ve added conditional code so it doesn’t do that in the development environment, as otherwise it would try to redirect from localhost:5000 to localhost:443, which wouldn’t work. This is a bit hacky at the moment, which is a common theme – I want to clean up all the configuration at some point quite soon (moving things into appsettings.json as far as possible), but it’s just about hanging together for now.
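One way of sketching that conditional configuration, using the IHostingEnvironment injected into Startup (ASP .NET Core 2.x style; the surrounding class shape is an assumption):

```csharp
public class Startup
{
    private readonly IHostingEnvironment environment;

    public Startup(IHostingEnvironment environment) =>
        this.environment = environment;

    public void ConfigureServices(IServiceCollection services)
    {
        // Locally the app listens on 5000/5001, so only force the
        // redirect port when not running in development.
        if (!environment.IsDevelopment())
        {
            services.AddHttpsRedirection(options => options.HttpsPort = 443);
        }
        // ... other service configuration ...
    }
}
```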

So, make the change, redeploy to test, and… observed infinite redirection in the browser. What?

Issue 2: Forwarding proxied headers

Remember again how the ASP .NET Core app is only listening on HTTP? We want it to behave differently depending on whether the end user made a request to the load balancer on HTTP or HTTPS. That means using headers forwarded from the proxy (in our case the load balancer) to determine the original request scheme. Fortunately, again there’s documentation on hand for this.

There are two parts to configuring this:

  • Configuring the ForwardedHeadersOptions in ConfigureServices
  • Calling app.UseForwardedHeaders() in Configure

(At least, that’s the way that’s documented. I’m sure there are myriad alternatives, but my experience level of ASP .NET Core is such that I’m still in “follow the examples verbatim, changing as little as possible” at the moment.)

I won’t go into the gory details of exactly how many times I messed up the forwarded headers options, but I will say:

  • The example which just changes options.ForwardedHeaders is probably fine if your proxy server is local to the application, but otherwise you will need to do extra work
  • The troubleshooting part of the documentation is spectacularly useful
  • There are warnings logged if you get things wrong, and those logs will help you – but they’re at a debug log level, so you may need to update your logging settings. (I only realized this after I’d fixed the problem, partly thanks to Twitter.)

Lesson to learn: when debugging a problem, turn on debugging logs. Who’d have thought?
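The forwarded headers middleware lives in the Microsoft.AspNetCore.HttpOverrides namespace, so (if I have the logging category right) turning its debug logs on in appsettings.json looks something like:

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore.HttpOverrides": "Debug"
    }
  }
}
```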

Configuring this properly is an area where you really need to understand your deployment and how a request reaches you. In my case, the steps are:

  • The user’s HTTPS request is terminated by the load balancer
  • The load balancer makes a request to the Kubernetes service
  • The Kubernetes service makes a request to the application running on one of the suitable nodes

This leads to a relatively complex configuration, as there are two networks to trust (Google Cloud load balancers, and my internal Kubernetes network) and we need to allow two “hops” of proxying. So my configuration code looks like this:

services.Configure<ForwardedHeadersOptions>(options =>
{
    options.KnownNetworks.Clear();
    // Google Cloud Platform load balancers
    options.KnownNetworks.Add(new IPNetwork(IPAddress.Parse("130.211.0.0"), 22));
    options.KnownNetworks.Add(new IPNetwork(IPAddress.Parse("35.191.0.0"), 16));
    // GKE service which proxies the request as well.
    options.KnownNetworks.Add(new IPNetwork(IPAddress.Parse("10.0.0.0"), 8));
    options.ForwardedHeaders = ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
    options.ForwardLimit = 2;
});

(The call to KnownNetworks.Clear() probably isn’t necessary. The default is to include the loopback, which is safe enough to leave in the list.)

Okay, deploy that to the test environment. Everything will work now, right? Well, sort of…

Issue 3: make sure health checks are healthy!

As it happens, when I’d finished fixing issue 2, I needed to help at a birthday party for a family we’re friends with. Still, I went to the party happily knowing everything was fine.

I then came home and found the test deployment was broken. Really broken. “502 Bad Gateway” broken. For both HTTP and HTTPS. This is not good.

I tried adding more logging, but it looked like none of my requests were getting through to the application. I could see in the logs (thank you, Stackdriver!) that requests were being made, always to just “/” on HTTP. They were all being redirected to HTTPS via a 307, as I’d expect.

This confused me for a while. I honestly can’t remember what gave me the lightbulb moment of “Ah, these are load balancer health checks, and it thinks they’re failing!” but I checked with the load balancer in the Google Cloud Console and sure enough, I had multiple working backends, and one broken one – my test backend. The reason I hadn’t seen this before was that I’d only checked the test deployment for a few minutes – not long enough for the load balancer to deem the backend unhealthy.

I was stuck at this point for a little while. I considered reconfiguring the load balancer to make the health check over HTTPS, but I don’t think that could work as the app isn’t serving HTTPS itself – I’d need to persuade it to make the request as if it were a user-generated HTTPS request, with appropriate X-Forwarded-Proto etc headers. However, I saw that I could change which URL the load balancer would check. So how about we add a /healthz URL that would be served directly without being redirected? (The “z” at the end is a bit of Googler heritage. Just /health would be fine too, of course.)

I started thinking about adding custom inline middleware to do this, but fortunately didn’t get too far before realizing that ASP .NET Core provides health checking already… so all I needed to do was add the health check middleware before the HTTPS redirection middleware, and all would be well.

So in ConfigureServices, I added a no-op health check service:

services.AddHealthChecks();

And in Configure I added the middleware at an appropriate spot:

app.UseHealthChecks("/healthz");

After reconfiguring the health check on the load balancer, I could see /healthz requests coming in and receiving 200 (OK) responses… and the load balancer was then happy to use the backend again. Hooray!
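Putting the pieces together, the relevant part of the Configure method ends up with an ordering along these lines (a sketch rather than my exact code, and UseMvc stands in for whatever terminal middleware the app actually uses):

```csharp
public void Configure(IApplicationBuilder app)
{
    // Trust the forwarded headers first, so later middleware sees the
    // original scheme of the user's request.
    app.UseForwardedHeaders();

    // Health checks are served directly, before any redirection, so the
    // load balancer always gets a plain 200 over HTTP.
    app.UseHealthChecks("/healthz");

    // Everything else gets redirected to HTTPS where appropriate.
    app.UseHttpsRedirection();

    app.UseMvc();
}
```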

After giving the test service long enough to fail, I deployed to production, changed the load balancer health check URL, and all was well. I did the two parts of this quickly enough so that it never failed – a safer approach would have been to add the health check handler but without the HTTPS redirection first, deploy that, change the health check URL, then turn on HTTPS.

But the end result is, all is working! Hooray!

Conclusion

Moving the service in the first place has been a long process, mostly due to a lack of time to spend on it, but the HTTPS redirection has been its own interesting bit of simultaneous pleasure and frustration. I’ve learned a number of lessons along the way:

  • The combination of Google Kubernetes Engine, Google Cloud Load Balancing, Google Cloud Build and Google Container Registry is pretty sweet.
  • Managed SSL certificates are wonderfully easy to use, even if there is a bit of a worrying delay while provisioning
  • It’s really, really important to be able to test deployment changes (such as HTTPS redirection) in an environment which is very similar to production, but which no-one is depending on. (Obviously if you have a site which few people care about anyway, there’s less risk. But as it’s easy to set up a test deployment on GKE, why not do it anyway?)
  • HTTPS redirection caused me three headaches, all predictable:
    • ASP .NET Core needs to know the HTTPS port to redirect to.
    • You need to configure forwarded headers really carefully, and know your deployment model thoroughly.
    • Be aware of health checks! Make sure you leave a test deployment “live” for long enough for the health checks to mark it as unhealthy if you’ve done something wrong, before you deploy to production.
  • When debugging, turn on debug logging. Sounds obvious in retrospect, doesn’t it? (Don’t start trying to copy middleware source code into your own application so you can add logging, rather than using the logging already there…)

I also have some future work to do:

  • There’s a particular URL (http://nodatime.org/tzdb/latest.txt) which is polled by applications in order to spot time zone information changes. That’s the bulk of the traffic to the site. It currently redirects to HTTPS along with everything else, which leads to the total traffic being nearly double what it was before, for no real benefit. I’ve encouraged app authors to use HTTPS instead, but I’ve also filed a feature request against myself to consider serving that particular URL without the redirect. It looks like that’s non-trivial though.
  • I have a bunch of hard-coded information which should really be in appsettings.json. I want to move all of that, but I need to learn more about the best way of doing it first.

All in all, this has been a very positive experience – I hope the details above are useful to anyone else hosting ASP .NET Core apps in Google Kubernetes Engine.

NullableAttribute and C# 8

Background: Noda Time and C# 8

Note: this blog post was written based on experimentation with Visual Studio 2019 preview 2.2. It’s possible that some of the details here will change over time.

C# 8 is nearly here. At least, it’s close enough to being “here” that there are preview builds of Visual Studio 2019 available that support it. Unsurprisingly, I’ve been playing with it quite a bit.

In particular, I’ve been porting the Noda Time source code to use the new C# 8 features. The master branch of the repo is currently the code for Noda Time 3.0, which won’t be shipping (as a GA release) until after C# 8 and Visual Studio 2019 have fully shipped, so it’s a safe environment in which to experiment.

While it’s possible that I’ll use other C# 8 features in the future, the two C# 8 features that impact Noda Time most are nullable reference types and switch expressions. Both sets of changes are merged into master now, but the pull requests are still available so you can see just the changes:

The switch expressions PR is much simpler than the nullable reference types one. It’s entirely an implementation detail… although admittedly one that confused docfx, requiring a few of those switch expressions to be backed out or moved in a later PR.

Nullable reference types are a much, much bigger deal. They affect the public API, so they need to be treated much more carefully, and the changes end up being spread far and wide throughout the codebase. That’s why the switch expression PR is a single commit, whereas nullable reference types is split into 14 commits – mostly broken up by project.

Reviewing the public API of a nullable reference type change

So I’m now in a situation where I’ve got nullable reference type support in Noda Time. Anyone consuming the 3.0 build (and there’s an alpha available for experimentation purposes) from C# 8 will benefit from the extra information that can now be expressed about parameters and return values. Great!

But how can I be confident in the changes to the API? My process for making the change in the first place was to enable nullable reference types and see what warnings were created. That’s a great starting point, but it doesn’t necessarily catch everything. In particular, although I started with the main project (the one that creates NodaTime.dll), I found that I needed to make more changes later on, as I modified other projects.

Just because your code compiles without any warnings with nullable reference types enabled doesn’t mean it’s “correct” in terms of the API you want to expose.

For example, consider this method:

public static string Identity(string input) => input;

That’s entirely valid C# 7 code, and doesn’t require any changes to compile, warning-free, in C# 8 with nullable reference types enabled. But it may not be what you actually want to expose. I’d argue that it should look like one of these:

// Allowing null input, and producing nullable output
public static string? Identity(string? input) => input;

// Preventing null input, and producing non-nullable output
public static string Identity(string input)
{
    // Convenience method for nullity checking.
    Preconditions.CheckNotNull(input, nameof(input));
    return input;
}
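Preconditions.CheckNotNull here is a Noda Time helper; a minimal version of such a helper (a sketch, not Noda Time’s exact implementation) might look like this:

```csharp
using System;

internal static class Preconditions
{
    // Throws if the argument is null; otherwise returns it, which allows
    // use in expression-bodied members as well as as a plain statement.
    internal static T CheckNotNull<T>(T argument, string paramName) where T : class =>
        argument ?? throw new ArgumentNullException(paramName);
}
```

Because the helper returns its argument, the Identity method above could equally be written as `return Preconditions.CheckNotNull(input, nameof(input));`.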

If you were completely diligent when writing tests for the code before C# 8, it should be obvious which is required – because you’d presumably have something like:

[Test]
public void Identity_AcceptsNull()
{
    Assert.IsNull(Identity(null));
}

That test would have produced a warning in C# 8, and would have suggested that the null-permissive API is the one you wanted. But maybe you forgot to write that test. Maybe the test you would have written was one that would have shown up a need to put that precondition in. It’s entirely possible that you write much more comprehensive tests than I do, but I suspect most of us have some code that isn’t explicitly tested in terms of its null handling.

The important take-away here is that even code that hasn’t changed in appearance can change meaning in C# 8… so you really need to review any public APIs. How do you do that? Well, you could review the entire public API surface you’re exposing, of course. For many libraries that would be the simplest approach to take, as a “belt and braces” attitude to review. For Noda Time that’s less appropriate, as so much of the API only deals in value types. While a full API review would no doubt be useful in itself, I just don’t have the time to do it right now.

Instead, what I want to review is any API element which is impacted by the C# 8 change – even if the code itself hasn’t changed. Fortunately, that’s relatively easy to do.

Enter NullableAttribute

The C# 8 compiler applies a new attribute to every API element which is affected by nullability. As an example of what I mean by this, consider the following code which uses the #nullable directive to control the nullable context of the code.

public class Test
{
#nullable enable
    public void X(string input) {}

    public void Y(string? input) {}
#nullable restore

#nullable disable
    public void Z(string input) {}
#nullable restore
}

The C# 8 compiler creates an internal NullableAttribute class within the assembly (which I assume it wouldn’t if we were targeting a framework that already includes such an attribute) and applies the attribute anywhere it’s relevant. So the above code compiles to the same IL as this:

using System.Runtime.CompilerServices;

public class Test
{
    public void X([Nullable((byte) 1)] string input) {}

    public void Y([Nullable((byte) 2)] string input) {}

    public void Z(string input) {}
}

Note how the parameter for Z doesn’t have the attribute at all, because that code is still oblivious to nullable reference types. But both X and Y have the attribute applied to their parameters – just with different arguments to describe the nullability. 1 is used for not-null; 2 is used for nullable.

That makes it relatively easy to write a tool to display every part of a library’s API that relates to nullable reference types – just find all the members that refer to NullableAttribute, and filter down to public and protected members. It’s slightly annoying that NullableAttribute doesn’t have any properties; code to analyze an assembly needs to find the appropriate CustomAttributeData and examine the constructor arguments. It’s awkward, but not insurmountable.

I’ve started doing exactly that in the Noda Time repository, and got it to the state where it’s fine for Noda Time’s API review. It’s a bit quick and dirty at the moment. It doesn’t show protected members, or setter-only properties, or handle arrays, and there are probably other things I’ve forgotten about. I intend to improve the code over time and probably move it to my Demo Code repository at some point, but I didn’t want to wait until then to write about NullableAttribute.
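In the meantime, here’s a stripped-down sketch of the approach (simplified to public types and their methods’ parameters and return values; this is not the actual code in the Noda Time repository):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

static class NullableApiScanner
{
    // Describes every parameter or return value on a public type that is
    // decorated with the compiler-generated NullableAttribute.
    public static List<string> Scan(Assembly assembly)
    {
        var results = new List<string>();
        foreach (var type in assembly.GetTypes().Where(t => t.IsPublic))
        {
            var methods = type.GetMethods(
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static | BindingFlags.DeclaredOnly);
            foreach (var method in methods)
            {
                foreach (var parameter in method.GetParameters())
                {
                    var data = FindNullableAttribute(parameter);
                    if (data != null)
                    {
                        results.Add($"{type.Name}.{method.Name}({parameter.Name}): {Describe(data)}");
                    }
                }
                // Return values carry the attribute too.
                var returnData = FindNullableAttribute(method.ReturnParameter);
                if (returnData != null)
                {
                    results.Add($"{type.Name}.{method.Name} returns: {Describe(returnData)}");
                }
            }
        }
        return results;
    }

    private static CustomAttributeData FindNullableAttribute(ParameterInfo parameter) =>
        parameter.GetCustomAttributesData()
            .FirstOrDefault(a => a.AttributeType.Name == "NullableAttribute");

    private static string Describe(CustomAttributeData data)
    {
        // NullableAttribute has no properties; the constructor argument is
        // either a single byte, or a byte[] for complex generic/array types.
        var argument = data.ConstructorArguments[0];
        return argument.Value is IReadOnlyList<CustomAttributeTypedArgument> bytes
            ? string.Join(",", bytes.Select(b => b.Value))
            : argument.Value?.ToString() ?? "?";
    }
}
```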

But hey, I’m all done, right? I’ve just explained how NullableAttribute works, so what’s left? Well, it’s not quite as simple as I’ve shown so far.

NullableAttribute in more complex scenarios

It would be oh-so-simple if each parameter or return type could just be nullable or non-nullable. But life gets more complicated than that, with both generics and arrays. Consider a method called GetNames() returning a list of strings. All of these are valid:

// Return value is non-null, and elements aren't null
List<string> GetNames()

// Return value is non-null, but elements may be null
List<string?> GetNames()

// Return value may be null, but elements aren't null
List<string>? GetNames()

// Return value may be null, and elements may be null
List<string?>? GetNames()

So how are those represented in IL? Well, NullableAttribute has one constructor accepting a single byte for simple situations, but another one accepting byte[] for more complex ones like this. Of course, List<string> is still relatively simple – it’s just a single top-level generic type with a single type argument. For a more complex example, imagine Dictionary<List<string?>, string[]?>. (A non-nullable reference to a dictionary where each key is a not-null list of nullable strings, and each value is a possibly-null array of non-nullable elements. Ouch.)

The layout of NullableAttribute in these cases can be thought of in terms of a pre-order traversal of a tree representing the type, where generic type arguments and array element types are leaves in the tree. The above example could be thought of as this tree:

         Dictionary<,> (not null)
            /               \
           /                 \
 List<> (not null)      Array (nullable)
        |                     |
        |                     |
 string (nullable)      string (not null)

The pre-order traversal of that tree gives us these values:

  • Not null (dictionary)
  • Not null (list)
  • Nullable (string)
  • Nullable (array)
  • Not null (string)

So a parameter declared with that type would be decorated like this:

[Nullable(new byte[] { 1, 1, 2, 2, 1 })]
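To make the traversal concrete, here’s a small sketch that pairs each node of the (runtime) type tree with the next flag byte, mirroring the pre-order layout above. It’s deliberately simplified: it skips value types entirely, whereas the real encoding assigns them a byte too, and the nullability itself lives only in the flags, since the runtime type makes no distinction.

```csharp
using System;
using System.Collections.Generic;

static class NullableFlagWalker
{
    // Pairs each reference type in a pre-order traversal of the type tree
    // with the next flag byte (1 = not null, 2 = nullable).
    public static List<(string Type, byte Flag)> Walk(Type type, Queue<byte> flags)
    {
        var result = new List<(string Type, byte Flag)>();
        Visit(type, flags, result);
        return result;
    }

    private static void Visit(Type type, Queue<byte> flags, List<(string Type, byte Flag)> result)
    {
        if (type.IsValueType) return; // simplification; see note above
        result.Add((type.Name, flags.Dequeue()));
        if (type.IsArray)
        {
            // Array element types are leaves (or subtrees) in the tree.
            Visit(type.GetElementType(), flags, result);
        }
        else if (type.IsGenericType)
        {
            // Generic type arguments, left to right.
            foreach (var argument in type.GetGenericArguments())
            {
                Visit(argument, flags, result);
            }
        }
    }
}
```

Walking typeof(Dictionary<List<string>, string[]>) with the flags { 1, 1, 2, 2, 1 } pairs them up exactly as in the bullet list above.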

But wait, there’s more!

NullableAttribute in simultaneously-complex-and-simple scenarios

The compiler has one more trick up its sleeve. When all the elements in the tree are “not null” or all elements in the tree are “nullable”, it simply uses the constructor with the single-byte parameter instead. So Dictionary<List<string>, string[]> would be decorated with [Nullable((byte) 1)] and Dictionary<List<string?>?, string?[]?>? would be decorated with [Nullable((byte) 2)].

(Admittedly, Dictionary<,> doesn’t permit null keys anyway, but that’s an implementation detail.)

Conclusion

The C# 8 feature of nullable reference types is a really complicated one. I don’t think we’ve seen anything like this since async/await. This post has just touched on one interesting implementation detail. I’m sure there’ll be more posts on nullability over the next few months…

Farewell, Daisy Shipton

This is more of a quick, explanatory “heads-up” post than anything else.

On March 31st 2018, I started an experiment: I created a new Stack Overflow user called “Daisy Shipton” with no picture and a profile that just read “Love coding in C#” (or similar). I wanted to see how a new user presenting with a traditionally-female name would be treated, while posting the same content that I normally would. This experiment was only a small part of my thinking around the culture of Stack Overflow, and I expect to write more on that subject, touching on the experience of “Daisy”, at another time.

I let a few people in on the secret as I went along – people who I fully expected to recognize my writing style fairly quickly. A single person emailed me to ask whether Daisy and I were the same person – well done to them for spotting it. (Once someone had the idea, the evidence was pretty compelling – the “Jon Skeet” account went into a decline in posting answers at the same time that the “Daisy Shipton” account was created, and Daisy just happened to post about C#, Noda Time, Protocol Buffers, time zones and Google Cloud Platform client libraries for .NET. I really wasn’t trying to cover my tracks.)

By the time Daisy reached a reputation of about 12,000 points, there was little point in continuing the experiment, so I asked for “her” account to be merged into my regular one. So if you see comments on my posts referring to @DaisyShipton, that’s why.

There’s one aspect of experimentation that never happened: Daisy never asked a question. Next time I want to ask a question on Stack Overflow, I’ll probably create another account to see how a question I think is good is received when posted from a 1-rep account.

It’s been fun, but it’ll also be nice to only have one account to manage now…