Category Archives: Evil code

When is a constant not a constant? When it’s a decimal…

A comment on a Stack Overflow post recently got me delving into constants a bit more thoroughly than I have done before.

Const fields

I’ve been aware for a while that although you can specify decimal field as a const in C#, it’s not really const as far as the CLR is concerned. Let’s consider this class to start with:

class Test
{
    const int ConstInt32 = 5;
    const decimal ConstDecimal = 5;
}

Firstly, csc gives us a warning about ConstDecimal but not about ConstInt32:

Test.cs(4,19): warning CS0414: The field ‘Test.ConstDecimal’ is assigned but its value is never used

The Roslyn compiler (rcsc) doesn’t warn about either of the fields. This is just a curiosity, really – but it already shows that there’s some difference in how they’re compiled. When we delve into the IL, the difference is much more pronounced:

.class private auto ansi beforefieldinit Test
       extends [mscorlib]System.Object
{
  .field private static literal int32 ConstInt32 = int32(0x00000005)
  .field private static initonly valuetype [mscorlib]System.Decimal ConstDecimal
  .custom instance void [mscorlib]DecimalConstantAttribute::.ctor
      (uint8, uint8, uint32, uint32, uint32) = 
      ( 01 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00 00 00 ) 

  // Skip the parameterless constructor

  .method private hidebysig specialname rtspecialname static 
          void  .cctor() cil managed
  {
    // Code size       12 (0xc)
    .maxstack  8
    IL_0000:  ldc.i4.5
    IL_0001:  newobj     instance void [mscorlib]System.Decimal::.ctor(int32)
    IL_0006:  stsfld     valuetype [mscorlib]System.Decimal Test::ConstDecimal
    IL_000b:  ret
  } // end of method Test::.cctor

} // end of class Test

First things to note:

ConstInt32 has the literal constraint in IL. From ECMA 335, I.8.6.1.2:

The literal constraint promises that the value of the location is actually a fixed value
of a built-in type. The value is specified as part of the constraint. Compilers are
required to replace all references to the location with its value, and the VES therefore
need not allocate space for the location. This constraint, while logically applicable to
any location, shall only be placed on static fields of compound types. Fields that are
so marked are not permitted to be referenced from CIL (they shall be in-lined to their
constant value at compile time), but are available using reflection and tools that
directly deal with the metadata.

Whereas ConstDecimal only has the initonly constraint. Again, from ECMA 335:

The init-only constraint promises (hence, requires) that once the location has been
initialized, its contents never change. Namely, the contents are initialized before any
access, and after initialization, no value can be stored in the location. The contents are
always identical to the initialized value (see §I.8.2.3). This constraint, while logically
applicable to any location, shall only be placed on fields (static or instance) of
compound types.

(Here “compound type” just means “not an array type” – although in quite a confusing manner. I would ignore it if I were you.)

Next, note that there’s an attribute (System.Runtime.CompilerServices.DecimalConstantAttribute – I’ve taken the namespace off in the listing for the sake of readability) applied to the field, which tells anything consuming the assembly that it should be a constant, and what its value is. Indeed, if you’re very careful, you can create your own “constants” like this:

[DecimalConstant((byte)0, (byte)0, (uint)0, (uint)0, (uint) 5)]
public static readonly decimal ConstDecimal;

That field declaration will be treated as a constant in other assembles – but not within the same assembly. So printing ConstDecimal within the same assembly will result in 0 (unless you change it to another value in the static initializer) whereas printing Test.ConstDecimal in a different assembly will result in 5. (It won’t even touch the field at execution time.) I’m sure I can work out some nasty ways of abusing that, if I try hard enough.

Note that the casts to uint are important – if you accidentally call the attribute constructor with a (byte, byte, int, int, int), the compiler doesn’t recognize it. (Interesting, the latter was only introduced in .NET 2.0. I’ve no idea why.)

Amusingly, you can combine the two:

[DecimalConstant((byte)0, (byte)0, (uint)0, (uint)0, (uint) 5)]
public const decimal ConstDecimal = 10;

In this case, the IL contains both DecimalConstant attributes, despite the fact that it’s only legal to apply one. (AllowMultiple is false on its AttributeUsage.) The compiler appears to pick the one specified by the value rather than manually applied, which is slightly disappointing, but of no real importance.

Optional parameters

In the case of const fields, we’ve really only cared about what the compiler does. Let’s try something where both the compiler and the framework can get involved: optional parameters.

Again, let’s write a little class to demonstrate how the default values of optional parameters are encoded in IL:

public class Test
{
    public void PrintInt32(int x = 10)
    {
        Console.WriteLine(x);
    }

    public void PrintDecimal(decimal d = 10m)
    {
        Console.WriteLine(d);
    }
}

The important bits of the generated IL are:

.method public hidebysig instance void PrintInt32([opt] int32 x) cil managed
{
    .param [1] = int32(0x0000000A)
    ...
} // end of method Test::PrintInt32

.method public hidebysig instance void PrintDecimal(
    [opt] valuetype [mscorlib]System.Decimal d) cil managed
{
    .param [1]
    .custom instance void [mscorlib] DecimalConstantAttribute::.ctor
    (uint8, uint8, uint32, uint32, uint32) =
    ( 01 00 00 00 00 00 00 00 00 00 00 00 0A 00 00 00 00 00 ) 
    ...
} // end of method Test::PrintDecimal

Again, we have a DecimalConstantAttribute to describe the default value for the decimal parameter, whereas . If you call the method but don’t specify an argument, the compiler notes the DecimalConstantAttribute applied to the method parameter, and constructs the value in the calling code. That’s not the only way of observing the default value, however – you can do it with reflection, too:

public static void Main()
{
    var method = typeof(Test).GetMethod("PrintDecimal");
    var parameter = method.GetParameters()[0];
    Console.WriteLine("Default value: {0}", parameter.DefaultValue);
}

As you’d expect, that prints a default value of 10. There’s no direct equivalent of DefaultValue for FieldInfo – there’s GetRawConstantValue() but that only works for constants that the CLR is directly aware of – it fails for a field like const decimal Foo = 10m , with an InvalidOperationException. I’ll talk more about CLR constants later.

Now let’s try something a bit more tricksy though… C# doesn’t support DateTime literals, unlike VB – but there’s a DateTimeConstantAttribute – what happens if we try to apply that ourselves? Let’s see…

public void PrintDateTime
    ([Optional, DateTimeConstant(635443315962469079L)] DateTime date)
{
    Console.WriteLine(date);
}

So if we call PrintDateTime(), what does that print? Well (leaving aside the formatting – the examples below use the UK default formatting):

  • With csc.exe (the “old” C# compiler), with the call in the same assembly as the method declaration, it prints 01/01/0001 00:00:00
  • With csc.exe, with the call in a different assembly to the method declaration, it prints 22/08/2014 19:13:16
  • With rcsc.exe (Roslyn), it prints 22/08/2014 19:13:16 regardless of where it’s called from
  • If you call it dynamically (dynamic d = new Test(); d.PrintDateTime();) it prints 22/08/2014 19:13:16 regardless of the compiler – at least with the version of .NET I’m using. It may well vary by version.

In every case, printing out the ParameterInfo.DefaultValue gives the right answer: the framework doesn’t care whether or not the compiler understands the attribute.

In case you’re wondering why I didn’t mention this possibility for constant fields – I tried it, and it didn’t work, even in Roslyn. For some reason optional parameters are treated as more “special” than constant fields.

Having got this far, why stop with DateTime? The DateTimeConstantAttribute class derives from CustomConstantAttribute (whereas DecimalConstantAttribute just derives from Attribute). So can I introduce my own constant attributes? Noda Time seems to be an obvious candidate for this – let’s try for a LocalDateConstantAttribute:

[AttributeUsage(AttributeTargets.Parameter | AttributeTargets.Field)]
public class LocalDateConstantAttribute : CustomConstantAttribute
{
    private readonly LocalDate value;

    public LocalDateConstantAttribute(int year, int month, int day)
    {
        value = new LocalDate(year, month, day);
    }

    public override object Value { get { return value; } }
}
...
public void PrintLocalDate(
    [Optional, LocalDateConstant(2014, 8, 22)] LocalDate date)
{
    Console.WriteLine(date);
}

How does this fare? Not so well, unfortunately:
– With a normal method call (regardless of assembly), it prints 01 January 1970 (the default value for a LocalDate in Noda Time 1.3)
– With a dynamic method call it prints 01 January 1970
– With reflection (i.e. using ParameterInfo.DefaultValue) the framework does construct the appropriate LocalDate, which seems logical as it’s presumably just using the Value property of the attribute

So, there’s still work to be done there. I think it would be feasible to do this in a general way, if it’s acceptable for an exception to be thrown if the Value property returns an incompatible type of object to the parameter type. The great thing is that Roslyn is open source, so I should be able to spike this myself, right? How hard can it be? (Cue howls of derisive laughter from the Roslyn team, who will know much better than I how hard it’s likely to really be.)

CLR constants and attribute arguments

So with constant fields, it was all down to the compiler, really. For optional parameters, it’s mostly still down to the compiler, but with framework support for reflection. What about attribute arguments? They’re the other notable aspect of C# which requires compile-time constants. For example, this is fine:

[Description("This is a constant")]

But this is not:

[Description("Initialized at " + DateTime.Now)]

Intriguingly, this is fine too:

[ContractClass(typeof(Foo))]

… despite the fact that

const Type ConstType = typeof(Foo);

isn’t valid. The only constant expression which is valid for type Type is null. So in section 17.2 of the C# 5 specification, Type is explicitly called out:

An expression E is an attribute-argument-expression if all of the following statements are true:

  • The type of E is an attribute parameter type (§17.1.3).
  • At compile-time, the value of E can be resolved to one of the following:
    • A constant value.
    • A System.Type object.
    • A one-dimensional array of attribute-argument-expressions.

(Interestingly, there’s no indication that I can see that the value of E has to be obtained via typeof, in the spec – clearly [ContractClass(Type.GetType("TodayIs" + DateTime.Today.Day))] should be invalid, but I can’t currently see which bit of the spec that violates. Something to be tightened up, potentially.)

And the “attribute parameter type” part – section 17.1.3 – looks like this:

The types of positional and named parameters for an attribute class are limited to the attribute parameter types, which are:

  • One of the following types: bool, byte, char, double, float, int, long, sbyte, short, string, uint, ulong, ushort.
  • The type object.
  • The type System.Type.
  • An enum type, provided it has public accessibility and the types in which it is nested (if any) also have public accessibility (§17.2).
  • Single-dimensional arrays of the above types.

Oops – no decimal. Note that the type of E that has to be one of those types, as well as the parameter type… so it doesn’t help to have a parameter of type object and then try to pass a constant decimal value as the argument.

Basically, attributes arguments are sufficiently low-level that the CLR itself wants to know about them – and while it has a deep knowledge of Type, string and the primitive value types, it doesn’t have much knowledge about decimal (or at least, it isn’t required to). Attribute arguments can’t use funky “custom constant” attributes to specify values like decimal or DateTime – you really are limited to the types that the CLR knows about. In a future version it’s not inconceivable that this could be broadened, but at the moment it’s pretty strict.

Conclusion

So, it turns out the idea of a constant isn’t terribly constant in itself. We have:

  • Constant fields, which are primarily a compile-time concern, and therefore language-specific.
  • Optional parameter default values, which feel like they ought to be just like constant fields (in that a value specified in one place is substituted in another) but apparently have a bit more support in the C# compiler… and more reflection support too.
  • Attribute arguments, which are the strictest form of constant I’ve found so far, in that they have to correspond to a small set of CLR “special” types.

I didn’t even talk about constant expressions (section 7.19 of the C# 5 spec) much in this post – maybe I’ll delve into those in more detail in another post.

Unlike my normal day-dreaming about changing the compiler, I think I really might have a crack at Roslyn for supporting arbitrary optional parameters – it feels like it could potentially be genuinely useful (which is also unlike most of my idle speculation).

So next time you ask yourself whether something is a constant, you should start off by asking yourself what you mean by “constant” in the first place.

The BobbyTables culture

I started writing a post like this a long time ago, but somehow never finished it.

Countless posts on Stack Overflow are vulnerable to SQL injection attacks. Along with several other users, I always raise this when it shows up – this is something that really just shouldn’t happen these days. It’s a well-understood issue,and parameterized SQL is a great solution in almost all cases. (No, it doesn’t work if you want to specify an column or table name dynamically. Yes, whitelisting is the solution there.)

The response usually falls into one of three camps:

  • Ah – I didn’t know about that. Great, I’ll fix it now. Thanks!
  • This is just a prototype. I’ll fix it for the real thing. (Ha! Like that ever happens.)
  • Well yes, in theory – but I’m just using numbers. That’s not a problem, is it?

Now personally I feel that you should just get the habit of using parameterized queries all the time, even when you could get away without it. This post is a somewhat tongue-in-cheek counterargument to the last of these responses. If you haven’t seen Bobby Tables, you really should. It’s the best 10-second explanation of SQL injection that I’ve ever seen, and I almost always drop a link to it when I’m adding a comment on a vulnerable query on Stack Overflow.

So in honour of Bobby, here’s a little program. See if you can predict the output.

using System;
using System.Globalization;
using System.Threading;

class Test
{
    static void Main()
    {
        string sql = "SELECT * FROM Foo WHERE BarDate > '" + DateTime.Today + "'";
        // Imagine you're executing the query here...
        Console.WriteLine(sql);

        int bar = -10;
        sql = "SELECT * FROM Foo WHERE BarValue = " + bar;
        // Imagine you're executing the query here...
        Console.WriteLine(sql);
    }

    // Some other code here...
}

Does that look okay? Not great, admittedly – but not too bad, right? Well, the output of the program is:

SELECT * FROM Foo WHERE BarDate > '2014-08-08' OR ' '=' '
SELECT * FROM Foo WHERE BarValue = 1 OR 1=1 OR 1=10

Yikes! Our queries aren’t filtering out anything!

Of course, the black magic is in “Some other code here” part:

static Test()
{
    InstallBobbyTablesCulture();
}

static void InstallBobbyTablesCulture()
{
    CultureInfo bobby = (CultureInfo) CultureInfo.InvariantCulture.Clone();
    bobby.DateTimeFormat.ShortDatePattern = @"yyyy-MM-dd'' OR ' '=''";
    bobby.DateTimeFormat.LongTimePattern = "";
    bobby.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";
    Thread.CurrentThread.CurrentCulture = bobby;
}

Neither numbers (well, negative numbers in this case) nor dates are safe. And of course if your database permissions aren’t set correctly, the queries could do a lot more than just remove any filtering. For extra fun, you can subvert some custom format strings – by changing the DateSeparator property, for example.

Even in sensible cultures, if the database expects you to use . for the decimal separator and you’re in a European culture that uses , instead, do you know how your database will behave? If you sanitize your input based on the numeric value, but then that isn’t the value that the database sees due to a string conversion, how comfortable are you that your application is still safe? It may not allow direct damage, but it could potentially reveal more data than you originally expected – which is definitely a vulnerability in a form.

Now the chances of me getting onto your system and installing the Bobby Tables culture – let alone making it the system default – are pretty slim, and if that happens you’ve probably got bigger problems anyway… but it’s the principle of the thing. You don’t care about a text representation of your values: you just want to get them to the database intact.

Parameterized SQL: just say yes.

Micro-optimization: the surprising inefficiency of readonly fields

Introduction

Recently I’ve been optimizing the heck out of Noda Time. Most of the time this has been a case of the normal measurement, find bottlenecks, carefully analyse them, lather, rinse, repeat. Yesterday I had a hunch about a particular cost, and decided to experiment… leading to a surprising optimization.

Noda Time’s core types are mostly value types – date/time values are naturally value types, just as DateTime and DateTimeOffset are in the BCL. Noda Time’s types are a bit bigger than most value types, however – the largest being ZonedDateTime, weighing in at 40 bytes in an x64 CLR at the moment. (I can shrink it down to 32 bytes with a bit of messing around, although it’s not terribly pleasant to do so.) The main reason for the bulk is that we have two reference types involved (the time zone and the calendar system), and in Noda Time 2.0 we’re going to have nanosecond resolution instead of tick resolution (so we need 12 bytes just to store a point in time). While this goes against the Class Library Design Guidelines, it would be odd for the smaller types (LocalDate, LocalTime) to be value types and the larger ones to be reference types. Overall, these still feel like value types.

A lot of these value types are logically composed of each other:

  • A LocalDate is a YearMonthDay and a CalendarSystem reference
  • A LocalDateTime is a LocalDate and a LocalTime
  • An OffsetDateTime is a LocalDateTime and an Offset
  • A ZonedDateTime is an OffsetDateTime and a DateTimeZone reference

This leads to a lot of delegation, potentially – asking a ZonedDateTime for its Year could mean asking the OffsetDateTime, which would ask the LocalDateTime, which would ask the LocalDate, which would ask the YearMonthDay. Very nice from a code reuse point of view, but potentially inefficient due to copying data.

Why would there be data copying involved? Well, that’s where this blog post comes in.

Behaviour of value type member invocations

When an instance member (method or property) belonging to a value type is invoked, the exact behaviour depends on the kind of expression it is called on. From the C# 5 spec, section 7.5.5 (where E is the expression the member M is invoked on, and the type declaring M is a value type):

If E is not classified as a variable, then a temporary local variable of E’s type is created and the value of E is assigned to that variable. E is then reclassified as a reference to that temporary local variable. The temporary variable is accessible as this within M, but not in any other way. Thus, only when E is a true variable is it possible for the caller to observe the changes that M makes to this.

So when is a variable not a variable? When it’s readonly… from section 7.6.4 (emphasis mine) :

If T is a struct-type and I identifies an instance field of that class-type:

  • If E is a value, or if the field is readonly and the reference occurs outside an instance constructor of the struct in which the field is declared, then the result is a value, namely the value of the field I in the struct instance given by E.

(There’s a very similar bullet for T being a class-type; the important part is that the field type is a value type

The upshot is that if you have a method call of:

int result = someField.Foo();

then it’s effectively converted into this:

var tmp = someField;
int result = tmp.Foo();

Now if the type of the field is quite a large value type, but Foo() doesn’t modify the value (which it never does within my value types), that’s performing a copy completely unnecessarily.

To see this in action outside Noda Time, I’ve built a little sample app.

Show me the code!

Our example is a simple 256-bit type, composed of 4 Int64 values. The type itself doesn’t do anything useful – it just holds the four values, and exposes them via properties. We then measure how long it takes to sum the four properties lots of times.

using System;
using System.Diagnostics;

public struct Int256
{
    private readonly long bits0;
    private readonly long bits1;
    private readonly long bits2;
    private readonly long bits3;
    
    public Int256(long bits0, long bits1, long bits2, long bits3)
    {
        this.bits0 = bits0;
        this.bits1 = bits1;
        this.bits2 = bits2;
        this.bits3 = bits3;
    }
    
    public long Bits0 { get { return bits0; } }
    public long Bits1 { get { return bits1; } }
    public long Bits2 { get { return bits2; } }
    public long Bits3 { get { return bits3; } }
}

class Test
{
    private readonly Int256 value;

    public Test()
    {
        value = new Int256(1L, 5L, 10L, 100L);
    }
    
    public long TotalValue 
    { 
        get 
        {
            return value.Bits0 + value.Bits1 + value.Bits2 + value.Bits3; 
        }
    }
    
    public void RunTest()
    {
        // Just make sure it’s JITted…
        var sample = TotalValue;
        Stopwatch sw = Stopwatch.StartNew();
        long total = 0;
        for (int i = 0; i < 1000000000; i++)
        {
            total += TotalValue;
        }
        sw.Stop();
        Console.WriteLine("Total time: {0}ms", sw.ElapsedMilliseconds);
    }
    
    static void Main()
    {
        new Test().RunTest();
    }
}

Building this from the command line with /o+ /debug- and running (in a 64-bit CLR, but no RyuJIT) this takes about 20 seconds to run on my laptop. We can make it much faster with just one small change:

class Test
{
    private Int256 value;

    // Code as before
}

The same test now takes about 4 seconds – a 5-fold speed improvement, just by making a field non-readonly. If we look at the IL for the TotalValue property, the copying becomes obvious. Here it is when the field is readonly:

.method public hidebysig specialname instance int64 
        get_TotalValue() cil managed
{
  // Code size       60 (0x3c)
  .maxstack  2
  .locals init (valuetype Int256 V_0,
           valuetype Int256 V_1,
           valuetype Int256 V_2,
           valuetype Int256 V_3)
  IL_0000:  ldarg.0
  IL_0001:  ldfld      valuetype Int256 Test::’value’
  IL_0006:  stloc.0
  IL_0007:  ldloca.s   V_0
  IL_0009:  call       instance int64 Int256::get_Bits0()
  IL_000e:  ldarg.0
  IL_000f:  ldfld      valuetype Int256 Test::’value’
  IL_0014:  stloc.1
  IL_0015:  ldloca.s   V_1
  IL_0017:  call       instance int64 Int256::get_Bits1()
  IL_001c:  add
  IL_001d:  ldarg.0
  IL_001e:  ldfld      valuetype Int256 Test::’value’
  IL_0023:  stloc.2
  IL_0024:  ldloca.s   V_2
  IL_0026:  call       instance int64 Int256::get_Bits2()
  IL_002b:  add
  IL_002c:  ldarg.0
  IL_002d:  ldfld      valuetype Int256 Test::’value’
  IL_0032:  stloc.3
  IL_0033:  ldloca.s   V_3
  IL_0035:  call       instance int64 Int256::get_Bits3()
  IL_003a:  add
  IL_003b:  ret
} // end of method Test::get_TotalValue

And here it is when the field’s not readonly:

.method public hidebysig specialname instance int64 
        get_TotalValue() cil managed
{
  // Code size       48 (0x30)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  ldflda     valuetype Int256 Test::’value’
  IL_0006:  call       instance int64 Int256::get_Bits0()
  IL_000b:  ldarg.0
  IL_000c:  ldflda     valuetype Int256 Test::’value’
  IL_0011:  call       instance int64 Int256::get_Bits1()
  IL_0016:  add
  IL_0017:  ldarg.0
  IL_0018:  ldflda     valuetype Int256 Test::’value’
  IL_001d:  call       instance int64 Int256::get_Bits2()
  IL_0022:  add
  IL_0023:  ldarg.0
  IL_0024:  ldflda     valuetype Int256 Test::’value’
  IL_0029:  call       instance int64 Int256::get_Bits3()
  IL_002e:  add
  IL_002f:  ret
} // end of method Test::get_TotalValue

Note that it’s still loading the field address (ldflda) four times. You might expect that copying the field onto the stack once via a temporary variable would be faster, but that ends up at about 6.5 seconds on my machine.

There is an optimization which is even faster – moving the totalling property into Int256. That way (with the non-readonly field, still) the total time is less than a second – twenty times faster than the original code!

Conclusion

This isn’t an optimization I’d recommend in general. Most code really doesn’t need to be micro-optimized this hard, and most code doesn’t deal with large value types like the ones in Noda Time. However, I regard Noda Time as a sort of "system level" library, and I don’t ever want someone to decide not to use it on  performance grounds. My benchmarks show that for potentially-frequently-called operations (such as the properties on ZonedDateTime) it really does make a difference, so I’m going to go for it.

I intend to apply a custom attribute to each of these "would normally be readonly" fields to document the intended behaviour of the field – and then when Roslyn is fully released, I’ll probably write a test to validate that all of these fields would still compile if the field were made readonly (e.g. that they’re never assigned to outside the constructor).

Aside from anything else, I find the subtle difference in behaviour between a readonly field and a read/write field fascinating… it’s something I’d been vaguely aware of in the past, but this is the first time that it’s had a practical impact on me. Maybe it’ll never make any difference to your code… but it’s probably worth being aware of anyway.

Extension methods, explicitly implemented interfaces and collection initializers

This post is the answer to yesterday’s brainteaser. As a reminder, I was asking what purpose this code might have:

public static class Extensions 

    public static void Add<T>(this ICollection<T> source, T item) 
    { 
        source.Add(item); 
    } 
}

There are plenty of answers, varying from completely incorrect (sorry!) to pretty much spot on.

As many people noticed, ICollection<T> already has an Add method taking an item of type T. So what difference could this make? Well, consider LinkedList<T>, which implements ICollection<T>, used as below:

// Invalid
LinkedList<int> list = new LinkedList<int>();
list.Add(10);

That’s not valid code (normally)…. whereas this is:

// Valid
ICollection<int> list = new LinkedList<int>();
list.Add(10);

The only difference is the compile-time type of the list variable – and that changes things because LinkedList<T> implements ICollection<T>.Add using explicit interface implementation. (Basically you’re encouraged to use AddFirst and AddLast instead, to make it clear which end you’re adding to. Add is equivalent to AddLast.)

Now consider the invalid code above, but with the brainteaser extension method in place. Now it’s a perfectly valid call to the extension method, which happens to delegate straight to the ICollection<T> implementation. Great! But why bother? Surely we can just cast list if we really want to:

LinkedList<int> list = new LinkedList<int>();
((IList<int>)list).Add(10);

That’s ugly (really ugly) – but it does work. But what about situations where you can’t cast? They’re pretty rare, but they do exist. Case in point: collection initializers. This is where the C# 6 connection comes in. As of C# 6 (at least so far…) collection initializers have changed so that an appropriate Add extension method is also permitted. So for example:

// Sometimes valid :)
LinkedList<int> list = new LinkedList<int> { 10, 20 };

That’s invalid in C# 5 whatever you do, and it’s only valid in C# 6 when you’ve got a suitable extension method in place, such as the one in yesterday’s post. There’s nothing to say the extension method has to be on ICollection<T>. While it might feel nice to be general, most implementations of ICollection<T> which use explicit interface implementation for ICollection<T>.Add do so for a very good reason. With the extension method in place, this is valid too…

// Valid from a language point of view…
ReadOnlyCollection<int> collection = new ReadOnlyCollection<int>(new[] { 10, 20 }) { 30, 40 };

That will compile, but it’s obviously not going to succeed at execution time. (It throws NotSupportedException.)

Conclusion

I don’t think I’d ever actually use the extension method I showed yesterday… but that’s not quite the same thing as it being useless, particularly when coupled with C# 6’s new-and-improved collection initializers. (The indexer functionality means you can use collection initializers with ConcurrentDictionary<,> even without extension methods, by the way.)

Explicit interface implementation is an interesting little corner to C# which is easy to forget about when you look at code – and which doesn’t play nicely with dynamic typing, as I’ve mentioned before.

And finally…

Around the same time as I posted the brainteaser yesterday, I also remarked on how unfortunate it was that StringBuilder didn’t implement IEnumerable<char>. It’s not that I really want to iterate over a StringBuilder… but if it implemented IEnumerable, I could use it with a collection initializer, having added some extension methods. This would have been wonderfully evil…

using System;
using System.Text; 

public static class Extensions  
{  
    public static void Add(this StringBuilder builder, string text)
    {  
        builder.AppendLine(text);
    }  

    public static void Add(this StringBuilder builder, string format, params object[] arguments)
    {  
        builder.AppendFormat(format, arguments);
        builder.AppendLine();
    }  

class Test
{
    static void Main()
    {
        // Sadly invalid :(
        var builder = new StringBuilder
        {
            "Just a plain message",
            { "A message with formatting, recorded at {0}", DateTime.Now }
        };
    }
}

Unfortunately it’s not to be. But watch this space – I’m sure I’ll find some nasty ways of abusing C# 6…

Quick brainteaser

Just a really quick one today…

What’s the point of this code? Does it have any point at all?

public static class Extensions
{
    public static void Add<T>(this ICollection<T> source, T item)
    {
        source.Add(item);
    }
}

Bonus marks if you can work out what made me think about it.

I suggest you ROT-13 answers to avoid spoilers for other readers.

More fun with DateTime

(Note that this is deliberately not posted in the Noda Time blog. I reckon it’s of wider interest from a design perspective, and I won’t be posting any of the equivalent Noda Time code. I’ll just say now that we don’t have this sort of craziness in Noda Time, and leave it at that…)

A few weeks ago, I was answering a Stack Overflow question when I noticed an operation around dates and times which should have been losing information apparently not doing so. I investigated further, and discovered some "interesting" aspects of both DateTime and TimeZoneInfo. In an effort to keep this post down to a readable length (at least for most readers; certain WebDriver developers who shall remain nameless have probably given up by now already) I’ll save the TimeZoneInfo bits for another post.

Background: daylight saving transitions and ambiguous times

There’s one piece of inherent date/time complexity you’ll need to understand for this post to make sense: sometimes, a local date/time occurs twice. For the purposes of this post, I’m going to assume you’re in the UK time zone. On October 28th 2012, at 2am local time (1am UTC), UK clocks will go back to 1am local time. So 1:20am local time occurs twice – once at 12:20am UTC (in daylight saving time, BST), and once at 1:20am UTC (in standard time, GMT).

If you want to run any of the code in this post and you’re not in the UK, please adjust the dates and times used to a similar ambiguity for when your clocks go back. If you happen to be in a time zone which doesn’t observe daylight savings, I’m afraid you’ll have to adjust your system time zone in order to see the effect for yourself.

DateTime.Kind and conversions

As you may already know, as of .NET 2.0, DateTime has a Kind property, of type DateTimeKind – an enum with the following values:

  • Local: The DateTime is considered to be in the system time zone. Not an arbitrary "local time in some time zone", but in the specific current system time zone.
  • Utc: The DateTime is considered to be in UTC (corollary: it always unambiguously represents an instant in time)
  • Unspecified: This means different things in different contexts, but it’s a sort of "don’t know" kind; this is closer to "local time in some time zone" which is represented as LocalDateTime in Noda Time.

DateTime provides three methods to convert between the kinds:

  • ToUniversalTime: if the original kind is Local or Unspecified, convert it from local time to universal time in the system time zone. If the original kind is Utc, this is a no-op.
  • ToLocalTime: if the original kind is Utc or Unspecified, convert it from UTC to local time. If the original kind is Local, this is a no-op.
  • SpecifyKind: keep the existing date/time, but just change the kind. (So 7am stays as 7am, but it changes the meaning of that 7am effectively.)

(Prior to .NET 2.0, ToUniversalTime and ToLocalTime were already present, but always assumed the original value needed conversion – so if you called x.ToLocalTime().ToLocalTime().ToLocalTime() the result would probably end up with the appropriate offset from UTC being applied three times!)

Of course, none of these methods change the existing value – DateTime is immutable, and a value type – instead, they return a new value.

DateTime’s Deep Dark Secret

(The code in this section is presented in several chunks, but it forms a single complete piece of code – later chunks refer to variables in earlier chunks. Put it all together in a Main method to run it.)

Armed with the information in the previous sections, we should be able to make DateTime lose data. If we start with 12:20am UTC and 1:20am UTC on October 28th as DateTimes with a kind of Utc, when we convert them to local time (on a system in the UK time zone) we should get 1:20am in both cases due to the daylight saving transition. Indeed, that works:

// Start with different UTC values around a DST transition
var original1 = new DateTime(2012, 10, 28, 0, 20, 0, DateTimeKind.Utc);
var original2 = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Utc);

// Convert to local time
var local1 = original1.ToLocalTime();
var local2 = original2.ToLocalTime();

// Result is the same for both values. Information loss?
var expected = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Local);
Console.WriteLine(local1 == expected); // True
Console.WriteLine(local2 == expected); // True
Console.WriteLine(local1 == local2);   // True

If we’ve started with two different values, applied the same operation to both, and ended up with equal values, then we must have lost information, right? That doesn’t mean that operation is "bad" any more than "dividing by 2" is bad. You ought to be aware of that information loss, that’s all.

So, we ought to be able to demonstrate that information loss further by converting back from local time to universal time. Here we have the opposite problem: from our local time of 1:20am, we have two valid universal times we could convert to – either 12:20am UTC or 1:20am UTC. Both answers would be correct – they are universal times at which the local time would be 1:20am. So which one will get picked? Well… here’s the surprising bit:

// Convert back to UTC
var roundTrip1 = local1.ToUniversalTime(); 
var roundTrip2 = local2.ToUniversalTime();

// Values round-trip correctly! Information has been recovered…
Console.WriteLine(roundTrip1 == original1);  // True
Console.WriteLine(roundTrip2 == original2);  // True
Console.WriteLine(roundTrip1 == roundTrip2); // False

Somehow, each of the local values knows which universal value it came from. The The information has been recovered, so the reverse conversion round-trips each value back to its original one. How is that possible?

It turns out that DateTime actually has four potential kinds: Local, Utc, Unspecified, and "local but treat it as the earlier option when resolving ambiguity". A DateTime is really just a 64-bit number of ticks, but because the range of DateTime is only January 1st 0001 to December 31st 9999. That range can be represented in 62 bits, leaving 2 bits "spare" to represent the kind. 2 bits gives 4 possible values… the three documented ones and the shadowy extra one.

Through experimentation, I’ve discovered that the kind is preserved if you perform arithmetic on the value, too… so if you go to another "fall back" DST transition such as October 30th 2011, the ambiguity resolution works the same way as before:

var local3 = local1.AddYears(-1).AddDays(2); 
var local4 = local2.AddYears(-1).AddDays(2);        
Console.WriteLine(local3.ToUniversalTime().Hour); // 0
Console.WriteLine(local4.ToUniversalTime().Hour); // 1

If you use DateTime.SpecifyKind with DateTimeKind.Local, however, it goes back to the "normal" kind, even though it looks like it should be a no-op:

// Should be a no-op?
var local5 = DateTime.SpecifyKind(local1, local1.Kind); 
Console.WriteLine(local5.ToUniversalTime().Hour); // 1

Is this correct behaviour? Or should it be a no-op, just like calling ToLocalTime on a "local" DateTime is? (Yes, I’ve checked – that doesn’t lose the information.) It’s hard to say, really, as this whole business appears to be undocumented… at least, I haven’t seen anything in MSDN about it. (Please add a link in the comments if you find something. The behaviour actually goes against what’s documented, as far as I can tell.)

I haven’t looked into whether various forms of serialization preserve values like this faithfully, by the way – but you’d have to work hard to reproduce it in non-framework code. You can’t explicitly construct a DateTime with the "extra" kind; the only ways I know of to create such a value are via a conversion to local time or through arithmetic on a value which already has the kind. (Admittedly if you’re serializing a DateTime with a Kind of Local, you’re already on potentially shaky ground, given that you could be deserializing it on a machine with a different system time zone.)

Unkind comparisons

I’ve misled you a little, I have to admit. In the code above, when I compared the "expected" value with the results of the first conversions, I deliberately specified DateTimeKind.Local in the constructor call. After all, that’s the kind we do expect. Well, yes – but I then printed the result of comparing this value with local1 and local2… and those comparisons would have been the same regardless of the kind I’d specified in the constructor.

All comparisons between DateTimes ignore the Kind property. It’s not just restricted to equality. So for example, consider this comparison:

// In June: Local time is UTC+1, so 8am UTC is 9am local
var dt1 = new DateTime(2012, 6, 1, 8, 0, 0, DateTimeKind.Utc); 
var dt2 = new DateTime(2012, 6, 1, 8, 30, 0, DateTimeKind.Local); 
Console.WriteLine(dt1 < dt2); // True

When viewed in terms of "what instants in time do these both represent?" the answer here is wrong – when you convert both values into the same time zone (in either direction), dt1 occurs after dt2. But a simple look at the properties tells a different story. In practice, I suspect that most comparisons between DateTime values of different kinds involve code which is at best sloppy and is quite possibly broken in a meaningful way.

Of course, if you bring Kind=Unspecified into the picture, it becomes impossible to compare meaningfully in a kind-sensitive way. Is 12am UTC before or after 1am Unspecified? It depends what time zone you later use.

To be clear, it is a hard-to-resolve issue, and one that we don’t do terribly well at in Noda Time at the moment for ZonedDateTime. (And even with just LocalDateTime you’ve got issues between calendars.) This is a situation where providing separate Comparer<T> implementations works nicely – so you can explicitly say what kind of comparison you want.

Conclusions

There’s more fun to be had with a similar situation when we look at TimeZoneInfo, but for now, a few lessons:

  • Giving a type different "modes" which make it mean fairly significantly different things is likely to cause headaches
  • Keeping one of those modes secret (and preventing users from even constructing a value in that mode directly) leads to even more fun and games
  • If two instances of your type are considered "equal" but behave differently, you should at least consider whether there’s something smelly going on
  • There’s always more fun to be had with DateTime…

Type initializer circular dependencies

To some readers, the title of this post may induce nightmarish recollections of late-night debugging sessions. To others it may be simply the epitome of jargon. Just to break the jargon down a bit:

  • Type initializer: the code executed to initialize the static variables of a class, and the static constructor
  • Circular dependency: two bits of code which depend on each other – in this case, two classes whose type initializers each require that the other class is initialized

A quick example of the kind of problem I’m talking about would be helpful here. What would you expect this code to print?

using System;

class Test
{    
    static void Main()
    {
        Console.WriteLine(First.Beta);
    }
}

class First
{
    public static readonly int Alpha = 5;
    public static readonly int Beta = Second.Gamma;
}

class Second
{
    public static readonly int Gamma = First.Alpha;
}

Of course, without even glancing at the specification, any expectations are pretty irrelevant. Here’s what the spec (section 10.5.5.1 of the C# 4 version):

The static field variable initializers of a class correspond to a sequence of assignments that are executed in the textual order in which they appear in the class declaration. If a static constructor (§10.12) exists in the class, execution of the static field initializers occurs immediately prior to executing that static constructor. Otherwise, the static field initializers are executed at an implementation-dependent time prior to the first use of a static field of that class.

In addition to the language specification, the CLI specification gives more details about type initialization in the face of circular dependencies and multiple threads. I won’t post the details here, but the gist of it is:

  • Type initialization acts like a lock, to prevent more than one thread from initializing a type
  • If the CLI notices that type A needs to be initialized in order to make progress, but it’s already in the process of initializing type A in the same thread, it continues as if the type were already initialized.

So here’s what you might expect to happen:

  1. Initialize Test: no further action required
  2. Start running Main
  3. Start initializing First (as we need First.Beta)
  4. Set First.Alpha to 5
  5. Start initializing Second (as we need Second.Gamma)
  6. Set Second.Gamma to First.Alpha (5)
  7. End initializing Second
  8. Set First.Beta to Second.Gamma (5)
  9. End initializing First
  10. Print 5

Here’s what actually happens – on my box, running .NET 4.5 beta. (I know that type initialization changed for .NET 4, for example. I don’t know of any changes for .NET 4.5, but I’m not going to claim it’s impossible.)

  1. Initialize Test: no further action required
  2. Start running Main
  3. Start initializing First (as we need First.Beta)
  4. Start initializing Second (we will need Second.Gamma)
  5. Set Second.Gamma to First.Alpha (0)
  6. End initializing Second
  7. Set First.Alpha to 5
  8. Set First.Beta to Second.Gamma (0)
  9. End initializing First
  10. Print 0

Step 5 is the interesting one here. We know that we need First to be initialized, in order to get First.Alpha, but this thread is already initializing First (we started in step 3) so we just access First.Alpha and hope that it’s got the right value. As it happens, that variable initializer hasn’t been executed yet. Oops.

(One subtlety here is that I could have declared all these variables as constants instead using "const" which would have avoided all these problems.)

Back in the real world…

Hopefully that example makes it clear why circular dependencies in type initializers are nasty. They’re hard to spot, hard to debug, and hard to test. Pretty much your classic Heisenbug, really. It’s important to note that if the program above had happened to initialize Second first (to access a different variable, for example) we could have ended up with a different result. In particular, it’s easy to get into a situation where running all your unit tests can cause a failure – but if you run just the failing test, it passes.

One way of avoiding all of this is never to use any type initializers for anything, of course. In many cases that’s exactly the right solution – but often there are natural uses, particularly for well-known values such as Encoding.Utf8, TimeZoneInfo.Utc and the like. Note that in both of those cases they are static properties, but I would expect them to be backed by static fields. I’m somewhat ambivalent between using public static readonly fields and public static get-only properties – but as we’ll see later, there’s a definite advantage to using properties.

Noda Time has quite a few values like this – partly because so many of its types are immutable. It makes sense to create a single UTC time zone, a single ISO calendar system, a single "pattern" (text formatter/parser) for each of a variety of common cases. In addition to the publicly visible values, there are various static variables used internally, mostly for caching purposes. All of this definitely adds complexity – and makes it harder to test – but the performance benefits can be significant.

Unfortunately, a lot of these values end up with fairly natural circular dependencies – as I discovered just recently, where adding a new static field caused all kinds of breakage. I was able to fix the immediate cause, but it left me concerned about the integrity of the code. I’d fixed the one failure I knew about – but what about any others?

Testing type initialization

One of the biggest issues with type initialization is the order-sensitivity – combined with the way that once a type has been initialized once, that’s it for that AppDomain. As I showed earlier, it’s possible that initializing types in one particular order causes a problem, but a different order won’t.

I’ve decided that for Noda Time at least, I want to be reasonably sure that type initialization circularity isn’t going to bite me. So I want to validate that no type initializers form cycles, whatever order the types are initialized in. Logically if we can detect a cycle starting with one type, we ought to be able to detect it starting with any of the other types in that cycle – but I’m sufficiently concerned about weird corner cases that I’d rather just take a brute force approach.

So, as a rough plan:

  • Start with an empty set of dependencies
  • For each type in the target assembly:
    • Create a new AppDomain
    • Load the target assembly into it
    • Force the type to be initialized
    • Take a stack trace at the start of each type initializer and record any dependencies
  • Look for cycles in the complete set of dependencies

Note that we’ll never spot a cycle within any single AppDomain, due to the way that type initialization works. We have to put together the results for multiple initialization sequences to try to find a cycle.

A description of the code would probably be harder to follow than the code itself, but the code is relatively long – I’ve included it at the end of this post to avoid intefering with the narrative flow. For more up-to-date versions in the future, look at the Noda Time repository.

This isn’t a terribly nice solution, for various reasons:

  • Creating a new AppDomain and loading assemblies into it from a unit test runner isn’t as simple as it might be. My code doesn’t currently work with NCrunch; I don’t know how it finds its assemblies yet. When I’ve fixed that, I’m sure other test runners would still be broken. Likewise I’ve had to explicitly filter types to get TeamCity (the continuous integration system Noda Time uses) to work properly. Currently, you’d need to edit the test code to change the filters. (It’s possible that there are better ways of filtering, of course.)
  • It relies on each type within the production code which has an "interesting" type initializer to have a line like this:
    private static readonly int TypeInitializationChecking = NodaTime.Utility.TypeInitializationChecker.RecordInitializationStart();
  • Not only does the previous line need to be added to the production code – it clearly gets executed each time, and takes up heap space per type. It’s only 4 bytes for each type involved, and it does no real work when we’re not testing, but it’s a nuisance anyway. I could use preprocessor directives to remove the code from non-debug or non-test-targeted builds, but that would look even messier.
  • It only picks up cycles which occur when running the version of .NET the tests happen to execute on. Given that there are ordering changes for different versions, I wouldn’t like to claim this is 100% bullet-proof. Likewise if there are only cycles when you’re running in some specific cultures (or other environmental features), it won’t necessarily pick those up.
  • I’ve deliberately not tried to make the testing code thread-safe. That’s fine in Noda Time – I don’t have any asynchronous operations or new threads in Noda Time at all – but other code may need to make this more robust.

So with all these caveats, is it still worth it? Absolutely: it’s already found bugs which have now been fixed.

In fact, the test didn’t get as far as reporting cycles to start with – it turned out that if you initialized one particular type first, the type initializer would fail with a NullReferenceException. Ouch! Once I’d fixed that, there were still quite a few problems to fix. Somewhat coincidentally, fixing them improved the design too – although the user-visible API didn’t change at all.

Fixing type initializer cycles

In the past, I’ve occasionally "fixed" type initialization ordering problems by simply moving fields around. The cycles still existed, but I figured out how to make them harmless. I can say now that this approach does not scale, and is more effort than it’s worth. The code ends up being brittle, hard to think about, and once you’ve got more than a couple of types involved it’s really error-prone, at least for my brain. It’s much better to break the cycle completely. To this end, I’ve ended up using a fairly simple technique to defer initialization of static variables. It’s a poor-man’s Lazy<T>, to some extent – but I’d rather not have to write Lazy<T> myself, and we’re currently targeting .NET 3.5…

Basically, instead of exposing a public static readonly field which creates the cycle, you expose a public static readonly property – which returns an internal static readonly field in a nested, private static class. We still get the nice thread-safe once-only initialization of a type initializer, but the nested type won’t be initialized until it needs to be. (In theory it could be initialized earlier, but a static constructor would ensure it isn’t.) So long as nothing within the rest of the type initializer for the containing class uses that property, we avoid the cycle.

So instead of this:

// Requires Bar to be initialized – if Bar also requires Foo to be
// initialized, we have a problem…
public static readonly Foo SimpleFoo = new Foo(Bar.Zero);

We might have:

public static readonly Foo SimpleFoo { get { return Constants.SimpleFoo; } }

private static class Constants
{
    private static readonly int TypeInitializationChecking = NodaTime.Utility.TypeInitializationChecker.RecordInitializationStart(); 

    // This requires both Foo and Bar to be initialized, but that’s okay
    // so long as neither of them require Foo.Constants to be initialized.
    // (The unit test would spot that.)
    internal static readonly Foo SimpleFoo = new Foo(Bar.Zero);
}

I’m currently undecided about whether to include static constructors in these classes to ensure lazy initialization. If the type initializer for Foo triggered the initializer of Foo.Constants, we’d be back to square one… but adding static constructors into each of these nested classes sounds like a bit of a pain. The nested classes should call into the type initialization checking as well, to validate they don’t cause any problems themselves.

Conclusion

I have to say, part of me really doesn’t like either the testing code or the workaround. Both smack of being clever, which is never a good thing. It’s definitely worth considering whether you could actually just get rid of the type initializer (or part of it) entirely, avoiding maintaining so much static state. It would be nice to be able to detect these type initializer cycles without running anything, simply using static analysis – I’m going to see whether NDepend could do that when I get a chance. The workaround doesn’t feel as neat as Lazy<T>, which is really what’s called for here – but I don’t trust myself to implement it correctly and efficiently myself.

So while both are somewhat hacky, they’re better than the alternative: buggy code. That’s what I’m ashamed to say I had in Noda Time, and I don’t think I’d ever have spotted all the cycles by inspection. It’s worth a try on your own code – see whether you’ve got problems lurking…

 

 

Appendix: Testing code

As promised earlier, here’s the code for the production and test classes.

TypeInitializationChecker

This is in NodaTime.dll itself.

internal sealed class TypeInitializationChecker : MarshalByRefObject
{
    private static List<Dependency> dependencies = null;

    private static readonly MethodInfo EntryMethod = typeof(TypeInitializationChecker).GetMethod("FindDependencies");

    internal static int RecordInitializationStart()
    {
        if (dependencies == null)
        {
            return 0;
        }
        Type previousType = null;
        foreach (var frame in new StackTrace().GetFrames())
        {
            var method = frame.GetMethod();
            if (method == EntryMethod)
            {
                break;
            }
            var declaringType = method.DeclaringType;
            if (method == declaringType.TypeInitializer)
            {
                if (previousType != null)
                {
                    dependencies.Add(new Dependency(declaringType, previousType));
                }
                previousType = declaringType;
            }
        }
        return 0;
    }

    /// <summary>
    /// Invoked from the unit tests, this finds the dependency chain for a single type
    /// by invoking its type initializer.
    /// </summary>
    public Dependency[] FindDependencies(string name)
    {
        dependencies = new List<Dependency>();
        Type type = typeof(TypeInitializationChecker).Assembly.GetType(name, true);
        RuntimeHelpers.RunClassConstructor(type.TypeHandle);
        return dependencies.ToArray();
    }

    /// <summary>
    /// A simple from/to tuple, which can be marshaled across AppDomains.
    /// </summary>
    internal sealed class Dependency : MarshalByRefObject
    {
        public string From { get; private set; }
        public string To { get; private set; }
        internal Dependency(Type from, Type to)
        {
            From = from.FullName;
            To = to.FullName;
        }
    }
}

TypeInitializationTest

This is within NodaTime.Test:

[TestFixture]
public class TypeInitializationTest
{
    [Test]
    public void BuildInitializerLoops()
    {
        Assembly assembly = typeof(TypeInitializationChecker).Assembly;
        var dependencies = new List<TypeInitializationChecker.Dependency>();
        // Test each type in a new AppDomain – we want to see what happens where each type is initialized first.
        // Note: Namespace prefix check is present to get this to survive in test runners which
        // inject extra types. (Seen with JetBrains.Profiler.Core.Instrumentation.DataOnStack.)
        foreach (var type in assembly.GetTypes().Where(t => t.FullName.StartsWith("NodaTime")))
        {
            // Note: this won’t be enough to load the assembly in all test runners. In particular, it fails in
            // NCrunch at the moment.
            AppDomainSetup setup = new AppDomainSetup { ApplicationBase = AppDomain.CurrentDomain.BaseDirectory };
            AppDomain domain = AppDomain.CreateDomain("InitializationTest" + type.Name, AppDomain.CurrentDomain.Evidence, setup);
            var helper = (TypeInitializationChecker)domain.CreateInstanceAndUnwrap(assembly.FullName,
                typeof(TypeInitializationChecker).FullName);
            dependencies.AddRange(helper.FindDependencies(type.FullName));
        }
        var lookup = dependencies.ToLookup(d => d.From, d => d.To);
        // This is less efficient than it might be, but I’m aiming for simplicity: starting at each type
        // which has a dependency, can we make a cycle?
        // See Tarjan’s Algorithm in Wikipedia for ways this could be made more efficient.
        // http://en.wikipedia.org/wiki/Tarjan’s_strongly_connected_components_algorithm
        foreach (var group in lookup)
        {
            Stack<string> path = new Stack<string>();
            CheckForCycles(group.Key, path, lookup);
        }
    }

    private static void CheckForCycles(string next, Stack<string> path, ILookup<string, string> dependencyLookup)
    {
        if (path.Contains(next))
        {
            Assert.Fail("Type initializer cycle: {0}-{1}", string.Join("-", path.Reverse().ToArray()), next);
        }
        path.Push(next);
        foreach (var candidate in dependencyLookup[next].Distinct())
        {
            CheckForCycles(candidate, path, dependencyLookup);
        }
        path.Pop();
    }
}

Eduasync part 15: implementing COMEFROM with a horrible hack

Ages ago when I wrote my previous Eduasync post, I said we’d look at a pipeline model of coroutines. I’ve decided to skip that, as I do want to cover the topic of this post, and I’ve got some more "normal" async ideas to write about too. If you want to look at the pipeline coroutines code, it’s project 20 in the source repository. Have fun, and don’t blame me if you get confused reading it – so do I.

The code I am going to write about is horrible too. It’s almost as tricky to understand, and it does far nastier things. Things that the C# 5 specification explicitly says you shouldn’t do.

If it makes you feel any better when your head hurts reading this code, spare a thought for me – I haven’t looked at it in over six months, and I don’t have a blog post explaining how it’s meant to work. I just have almost entirely uncommented code which is designed to be hard to understand (in terms of the main program flow).

On no account should any code like this ever be used for anything remotely serious.

With that health warning out of the way, let’s have a look at it…

COMEFROM at the caller level

The idea is to implement the COMEFROM control structure, which is sort of the opposite of GOTO (or in my implementation, more of a GOSUB). There are two operations, effectively:

  • ComeFrom(label): Register interest in a particular label.
  • Label(label): If anyone has registered interested in the given label, keep going from their registration point (which will be within a method), then continue from where we left off afterwards.

In some senses it’s a little like the observer pattern, with labels taking the place of events. However, it looks entirely different and is much harder to get your head round, because instead of having a nicely-encapsulated action which is subscribed to an event, we just have a ComeFrom call which lets us jump back into a method somewhat arbitrarily.

I have two implementations, in project 22 and project 23 in source control. Project 22 is almost sane; a little funky, but not too bad. Project 23 is where the fun really happens. In addition to the operations listed above, there’s an Execute operation which is sort of an implementation detail – it allows an async method containing ComeFrom calls to be executed without returning earlier than we might want.

Let’s look at some code and the output, and try to work out what’s going on.

internal class Program
{
    private static void Main(string[] args)
    {
        Coordinator coordinator = new Coordinator(SimpleEntryPoint);
        coordinator.Start();
    }

    private static async void SimpleEntryPoint(Coordinator coordinator)
    {
        await coordinator.Execute(SimpleOtherMethod);

        Console.WriteLine("First call to Label(x)");
        await coordinator.Label("x");

        Console.WriteLine("Second call to Label(x)");
        await coordinator.Label("x");

        Console.WriteLine("Registering interesting in y");
        bool firstTime = true;
        await coordinator.ComeFrom("y");

        Console.WriteLine("After ComeFrom(y). FirstTime={0}", firstTime);

        if (firstTime)
        {
            firstTime = false;
            await coordinator.Label("y");
        }
        Console.WriteLine("Finished");
    }

    private static async void SimpleOtherMethod(Coordinator coordinator)
    {
        Console.WriteLine("Start of SimpleOtherMethod");

        int count = 0;
        await coordinator.ComeFrom("x");

        Console.WriteLine("After ComeFrom x in SimpleOtherMethod. count={0}. Returning.",
                          count);
        count++;
    }
}

The reason for the "Simple" prefix on the method names is that there’s another example in the same file, with a more complex control flow.

Here’s the output – then we can look at why we’re getting it…

Start of SimpleOtherMethod
After ComeFrom x in SimpleOtherMethod. count=0. Returning.
First call to Label(x)
After ComeFrom x in SimpleOtherMethod. count=1. Returning.
Second call to Label(x)
After ComeFrom x in SimpleOtherMethod. count=2. Returning.
Registering interesting in y
After ComeFrom(y). FirstTime=True
After ComeFrom(y). FirstTime=False
Finished

So, the control flow is a bit like this:

  • Start SimpleEntryPoint
    • Call into SimpleOtherMethod
      • Log "Start of SimpleOtherMethod"
      • Initialize the "count" variable with value 0
      • Register interest in x; ComeFrom remembers the continuation but keeps going.
      • Log "After ComeFrom x in SimpleOtherMethod. count=0. Returning."
      • Increment count to 1.
      • Return.
    • Return takes us back to SimpleEntryPoint…
  • Log "First call to Label(x)"
  • Call Label("x")…
    • … which takes us back into SimpleOtherMethod (remember, the method we thought we’d finished executing?) just after ComeFrom
      • Log AfterComeFrom x in SimpleOtherMethod. count=1. Returning.
      • Increment count to 2.
      • Return.
    • Return takes us back to SimpleEntryPoint…
  • Log "Second call to Label(x)"
  • Call Label("x")…
    • … which takes us back into SimpleOtherMethod again
      • Log AfterComeFrom x in SimpleOtherMethod. count=2. Returning.
      • Increment count to 3.
      • Return.
    • Return takes us back to SimpleEntryPoint…
  • Log "Registering interest in y"
  • Initialize the "firstTime" variable with value true.
  • Register interest in y; ComeFrom remembers the continuation and keeps going
  • Log "After ComeFrom(y). FirstTime=True"
  • Check the value of firstTime… It’s true, so:
    • Set firstTime to false
    • Call Label("y")
  • … which takes us back to earlier in the method (just after ComeFrom), like a normal looping construct…
  • Log "After ComeFrom(y). FirstTime=False"
  • Check the value of firstTime… It’s false, so:
    • Log "Finished"
    • Exit!

Doing all of this has a few interesting challenges. Let’s look at them one at a time… and I would strongly advise you not to try to pay too much attention to the details.

Noting a continuation and continuing regardless…

Just as a quick reminder before we get cracking, it’s worth remembering that all of this is entirely synchronous, despite being implemented with async. There’s only a single user thread involved here. As with previous parts, we maintain a stack of actions to call, and basically keep calling from the top until we’re done – but the actions we call can create extra stack entries, of course.

ComeFrom has unusual semantics in terms of async. We want to remember the continuation and keep executing as if we didn’t need to wait. We can easily do one side or the other. If we wanted to just keep going without needing to know about the continuation, we could just return true from IsCompleted. If we just want to remember the continuation, we can make the awaiter’s IsCompleted property return false, and remember the continuation when it’s passed to OnCompleted. How do we do both?

Well, effectively we want to remember the continuation and then call it immediately. But we can’t just call it directly from OnCompleted, as otherwise each ComeFrom call would end up in a "real" execution stack from, whereas our execution stack is stored as a Stack<Action>. So instead, we need to remember the continuation and immediately put it at the top of the stack.

However, that only works if as soon as the generated code returns from the async method containing the ComeFrom call, we go back into the state machine. If we’d just called SimpleOtherMethod directly in SimpleEntryPoint, we would have continued within SimpleEntryPoint with the new stack entry just waiting around. This is why we need the Executor method: that does exactly the same thing, effectively shuffling the stack around. When it’s given something to execute, it puts its own continuation on the action stack, then the action it’s been asked to execute, then returns. The top level code will then pick up the original action, and we’re away.

So, here’s the code for Execute, which is the simplest part of the coordinator:

public ExecuteAwaiter Execute(Action<Coordinator> action)
{
    return new ExecuteAwaiter(() => action(this), this);
}

public class ExecuteAwaiter
{
    private readonly Action action;
    private readonly Coordinator coordinator;

    internal ExecuteAwaiter(Action action, Coordinator coordinator)
    {
        this.action = action;
        this.coordinator = coordinator;
    }

    public ExecuteAwaiter GetAwaiter()
    {
        return this;
    }

    // Always yield
    public bool IsCompleted { get { return false; } }

    public void OnCompleted(Action callerContinuation)
    {
        // We want to execute the action continuation, then get back here,
        // allowing any extra continuations put on the stack *within* the action
        // to be executed.
        coordinator.stack.Push(callerContinuation);
        coordinator.stack.Push(action);
    }

    public void GetResult()
    {
    }
}

All the awaitables in this project return themselves as the awaiter – when you don’t need any other state, it’s an easy step to take.

That’s all we need to say about Execute, but how exactly are we capturing the continuation in ComeFrom?

Capturing continuations

Once we’ve got the action stack shuffling under our belts, there are two more problems with ComeFrom:

  • What happens if we ComeFrom the same label twice?
  • How do we really capture a continuation?

The first point didn’t come up in the sample I’ve shown here, but it does come up in the more complex example – imagine if SimpleOtherMethod had two ComeFrom calls; when we jump back to the first one, we’ll execute the second one again. I made a simple policy decision to only allow a single "return point" for any label – if a ComeFrom call tries to register the existing continuation point for a label, we ignore it; otherwise we throw an exception. So we only need to care about a single continuation for any label, which makes life easier.

The second point is trickier. If you remember back to earlier posts in this series, we saw that the state machine generated for async only really contains a single entry point (MoveNext) which is used for all continuations. A variable in the state machine is responsible for remembering where we were within it between calls. So in order to really make the continuation remember the point at which it needs to continue, we need to remember that state. We need to store an object for the continuation, which contains the delegate to invoke, and the state of the state machine when we were first passed the continuation. I’ve created a class for this, unimaginatively called Continuation, which looks like this:

/// <summary>
/// This hack allows a continuation to be executed more than once,
/// contrary to the C# spec. It does this using reflection to store the
/// value of the "state" field within the generated class. NEVER, EVER, EVER
/// try to use this in real code. It’s purely for fun.
/// </summary>
internal sealed class Continuation : IEquatable<Continuation>
{
    private readonly int savedState;
    private readonly object target;
    private readonly FieldInfo field;
    private readonly Action action;

    internal Continuation(Action action)
    {
        target = action.Target;
        field = target.GetType().GetField("<>1__state", BindingFlags.Instance | BindingFlags.NonPublic);
        savedState = (int) field.GetValue(target);
        this.action = action;
    }

    internal void Execute()
    {
        field.SetValue(target, savedState);
        action();
    }

    // Snip Equals/GetHashCode
}

Yes, we use reflection to fish out the <>1__state variable initially, and poke the state machine with the same value when we next want to execute the continuation. All highly implementation-specific, of course.

Now the ComeFrom method is reasonably straightforward – all we need is a dictionary mapping labels to continuations. Oh, and the same action stack shuffling as for Execute:

// In the coordinator
private readonly Dictionary<string, Continuation> labels = new Dictionary<string, Continuation>();

public ComeFromAwaiter ComeFrom(string label)
{
    return new ComeFromAwaiter(label, this);
}

public struct ComeFromAwaiter
{
    private readonly string label;
    private readonly Coordinator coordinator;

    internal ComeFromAwaiter(string label, Coordinator coordinator)
    {
        this.label = label;
        this.coordinator = coordinator;
    }

    public ComeFromAwaiter GetAwaiter()
    {
        return this;
    }

    // We *always* want to be given the continuation
    public bool IsCompleted { get { return false; } }

    public void OnCompleted(Action action)
    {
        Continuation newContinuation = new Continuation(action);
        Continuation oldContinuation;
        if (!coordinator.labels.TryGetValue(label, out oldContinuation))
        {
            // First time coming from this label. Always succeeds.
            coordinator.labels[label] = newContinuation;
        }
        else
        {
            // Current semantics are to prohibit two different ComeFrom calls for the same label.
            // An alternative would be to just replace the existing continuation with the new one,
            // in which case we wouldn’t need any of this – we could just use
            // coordinator.labels[label] = newContinuation;
            // unconditionally.
            if (!oldContinuation.Equals(newContinuation))
            {
                throw new InvalidOperationException("Additional continuation detected for label " + label);
            }
            // Okay, we’ve seen this one before. Nothing to see here, move on.
        }
        // We actually want to continue from where we were: we’re only really marking the
        // ComeFrom point.
        coordinator.stack.Push(action);
    }

    public void GetResult()
    {
    }
}

There’s one interesting point here which is somewhat subtle, and screwed me up for a bit…

The default value of a struct is always valid…

You may have noticed that ComeFromAwaiter is a struct. That’s pretty unusual for me. However, it’s also absolutely critical. Without it, we’d get a NullReferenceException when we execute the continuation the second time.

Normally, the flow of async methods looks a bit like this, for an await expression taking the "long" route (i.e. IsCompleted is false):

  • Call GetAwaiter() and assign the result to an awaiter field
  • Call IsCompleted (which returns false in this scenario)
  • Set the state variable to remember where we’d got to
  • Call OnCompleted
  • Return
  • … When we continue…
  • Set state to 0 (running)
  • Call GetResult() on the awaiter
  • Set the awaiter field to default(TypeOfAwaiter)
  • Continue

Now that’s fine when we’re only continuing once – but if we need to jump into the middle of that sequence a second time, we’re going to call GetAwaiter() on the awaiter field after it’s been set to the default value of the awaiter type. If the default value is null, we’ll go bang. So we must use a struct.

Fortunately, our GetResult() call doesn’t need any of the state in the awaiter – it’s purely there to satisfy the normal flow of things. So we’re quite happy with a "default" ComeFrom awaiter.

Finally, labels…

We’ve now done all the hard work. The final piece of the puzzle is Label, which just needs to check whether there’s a continuation to jump to, and shuffle the action stack in the way we’re now painfully accustomed to:

public LabelAwaiter Label(string label)
{
    Continuation continuation;
    labels.TryGetValue(label, out continuation);
    return new LabelAwaiter(continuation, this);
}

public class LabelAwaiter
{
    private readonly Continuation continuation;
    private readonly Coordinator coordinator;

    internal LabelAwaiter(Continuation continuation, Coordinator coordinator)
    {
        this.continuation = continuation;
        this.coordinator = coordinator;
    }

    public LabelAwaiter GetAwaiter()
    {
        return this;
    }

    // If there’s no continuation to execute, just breeze through.
    public bool IsCompleted { get { return continuation == null; } }

    public void OnCompleted(Action action)
    {
        // We want to execute the ComeFrom continuation, then get back here.
        coordinator.stack.Push(action);
        coordinator.stack.Push(continuation.Execute);
    }

    public void GetResult()
    {
    }
}

Almost painfully simple, really.

So that looks like all the code that’s used, right? Not quite

Reusable builders?

As we saw in the sample code, we can end up finishing the same async method multiple times (SimpleOtherMethod completes three times). That’s going to call SetResult on the AsyncVoidMethodBuilder three times… which feels like it should go bang. Indeed, when I revisited my code earlier I wondered why it didn’t go bang – it’s the sort of illegal state transition the framework is usually pretty good at picking up on.

Then I remembered – this isn’t the framework’s AsyncVoidMethodBuilder – it’s mine. And my SetResult method in this project does absolutely nothing. How convenient!

Make it stop, make it stop!

Okay thiat was a pretty quick tour of some horrible code. You’ll never have to do anything like this with async in sane code, but it certainly made me painfully familiar with how it all worked. Just to recap on the oddities involved:

  • We needed to capture a continuation and then immediately keep going, almost as if the awaiter had said the awaitable had completed already. This involved shenanigans with the execution model and an extra method (Execute)
  • We needed to remember the state of a continuation, which we did with reflection.
  • We needed to make awaiter.GetResult() a valid call after awaiter had been reset to the default value for the type
  • We needed to ensure that the builder created in the skeleton method could have SetResult called on it multiple times

That’s all on continuations and co-routines, I promise.

Next time (hopefully soon) I’ll look at an example of how composition works so neatly in async, and then show how we can unit test async methods – at least sometimes.

Upcoming speaking engagements

It’s just occurred to me that I’ve forgotten to mention a few of the things I’ll be up to in the near-ish future. (I’ve talked about next week’s Progressive .NET session before.) This is just a quick rundown – follow the links for more blurb and details.

.NET Developer Network – Bristol, September 21st (evening)

I’ll be talking about async in Bristol – possibly at a high level, possibly in detail, depending on the audience experience. This is my first time talking with this particular user group, although I’m sure there’ll be some familiar faces. Come along if you’re in the area.

Øredev 2011 – Malmö, November 9th

It’s a whistle-stop trip to Sweden as I’m running out of vacation days; I’m flying out on the Tuesday evening and back on the Wednesday evening, but while I’m there I’ll give two talks:

  • Async 101 (yes, more async; I wonder at what point I’ll have given as many talks about it as Mads)
  • Effective technical communication (not a particularly technical talk, but definitely specific to technical communication)

Last year I had an absolute blast – looking forward to this year, even though I won’t have as much time for socializing.

Stack Overflow Dev Days 2011 – London, November 14th – cancelled!

Update: Dev Days has been cancelled. I’m still hoping to do something around this topic, and there may be small-scale meet-ups in London anyway.

Two years ago I talked about how humanity had let the world of software engineering down. This was one of the best talks I’ve ever given, and introduced the world to Tony the Pony. Unfortunately that puts the bar relatively high for this year’s talk – at least, high by my own pretty low standards.

In a somewhat odd topic for a Christian and a happy employee of a company with a code of conduct which starts "Don’t be evil," this year’s talk is entitled "Thinking in evil." As regular readers are no doubt aware, I love torturing the C# language and forcing the compiler to work with code which would make any right-thinking software engineer cringe. I was particularly gratified recently when Eric Lippert commented on one of my Stack Overflow answers that this was "the best abuse of C# I’ve seen in a while." I’m looking forward to talking about why I think it’s genuinely a good idea to think about nasty code like this – not to use it, but to get to know your language of choice more intimately. Like last time, I have little idea of exactly what this talk will be like, but I’m really looking forward to it.

Of memory and strings

This post was provoked by a recent Stack Overflow question which asked whether there was an efficient representation of ASCII strings in .NET.

In particular, the questioner wanted to store hundreds of thousands – possibly millions – of strings in memory, and knowing (or assuming) that they all consisted of ASCII characters, he wanted to avoid the waste of space that comes from storing each character in a .NET string as a UTF-16 code unit.

My answer to the question mostly consisted of saying that I didn’t think it would be worth the effort, and giving some reasons. But the more reasons I gave, the more I thought about the subtleties involved, and that it’s actually quite an interesting case study into memory use in .NET.

How are we going to measure object size?

If we’re going to work out any sort of benefit from a more compact string representation, we’ll need to be able to calculate how much memory our objects are taking to start with. Rather than work this out in a purely theoretical way, I’ve been running tests using code like this:

using System;

class Program
{
    static void Main(string[] args)
    {
        int size = 10000000;
        object[] array = new object[size];

        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < size; i++)
        {
            array[i] = new object();
        }
        long after = GC.GetTotalMemory(true);

        double diff = after – before;

        Console.WriteLine(“Per object: “ + diff / size);

        // Stop the GC from messing up our measurements
        GC.KeepAlive(array);
    }
}

Obviously that doesn’t take into account factors such as memory used by JITting, or anything that could be going on in other threads, but by using a suitably large number of objects, and by performing the division in floating point arithmetic (to avoid a slight variation making an 11.99999 come out as an 11, when it’s really just a “12 with something else going on”, we can work out the size of objects pretty clearly. The sample above measures the size of a vanilla object, but the code can be adapted very easily.

The first important thing to point out is that C# doesn’t guarantee the results of this – it isn’t responsible for determining how all of an object is laid out in memory; that’s the runtime’s job. While there are attributes to affect the layout and padding of the data members of a type in memory, there are other aspects that are out of your control. In this post I won’t use any of the layout attributes – we’ll just use the defaults.

Not all runtimes are created equal, either. On my laptop I’ve got Mono 2.8, .NET 3.5 and .NET 4, with the two versions of .NET each having the 32-bit (x86) and 64-bit (x64) CLRs. For the sake of simplicity, I’m going to stick with .NET 4 for this post, but I’ll give results for both the x64 and x86 CLRs. To test each of them, I’m compiling with “/platform:x64” or “/platform:x86”.

Some simple starter measurements

Before I start creating my own types, let’s try a few built-in types, including strings:

Type x86 size x64 size
object 12 24
object[] 16 + length * 4 32 + length * 8
int[] 12 + length * 4 28 + length * 4
byte[] 12 + length 24 + length
string 14 + length * 2 26 + length * 2

Note that all the x86 sizes are rounded up to the nearest 4 bytes, and all x64 sizes are rounded up to the nearest 8 bytes.

The string numbers are interesting, because strings are the only non-array types in .NET which vary in size. A long string consists of a single large object in memory. Compare this with Java, where a String is a “normal” type in terms of memory consumption, containing an offset and length into a char array – so a long string consists of a small object referring to a large char array. This distinction will be very important when we come to build an AsciiString type. Another point about measuring string sizes is that it’s relatively tricky to measure the size of an empty string – because even if you use the “new string(‘x’, 0)” constructor, the result is still cached – the same reference is returned each time.

You might be forgiven for looking at the numbers above and thinking that the “overhead” of an object is 12 bytes in x86 and 24 in x64… but that’s not quite right. Let’s build our own straightforward classes and measure them…

Custom classes containing primitives

Here are six classes, all of which are measured with the same simple test code:

class Empty {}
class OneInt32 { int x; }
class TwoInt32 { int x, y; }
class ThreeInt32 { int x, y, z; }

class Mixed1
{
    int x;
    byte b1, b2, b3, b4;
    int y, z;
}

class Mixed2
{
    int x;
    byte b1;
    int y, z;
    byte b2, b3, b4;
}

The last case is mostly to check how the CLR handles an “awkward” class declaration, where the int variables won’t naturally be aligned on 4-byte boundaries. The results look odd at first, but we’ll make sense of them in a minute:

Type x86 size x64 size
Empty 12 24
OneInt32 12 24
TwoInt32s 16 24
ThreeInt32s 20 32
Mixed1 24 32
Mixed2 24 32

A few interesting things to note here:

  • There’s a “base” overhead of 8 bytes per object in x86 and 16 per object in x64… given that we can store an Int32 of “real” data in x86 and still have an object size of 12, and likewise we can store two Int32s of real data in x64 and still have an object of x64.
  • There’s a “minimum” size of 12 bytes and 24 bytes respectively. In other words, you can’t have a type which is just the overhead. Note how the “Empty” class takes up the same size as creating instances of Object… there’s effectively some spare room, because the CLR doesn’t like operating on an object with no data. (Note that a struct with no fields takes up space too, even for local variables.)
  • The x86 objects are padded to 4 byte boundaries; on x64 it’s 8 bytes (just as before)
  • By default, the CLR is happy to pack fields pretty densely – Mixed2 only took as much space as ThreeInt32. My guess is that it reorganized the in-memory representation so that the bytes all came after the ints… and that’s what a quick bit of playing around with unsafe pointers suggests too… but I’m not sufficiently comfortable with this sort of thing to say for sure. Frankly, I don’t care… so long as it all works, what we’re interested in is the overall size, not the precise layout.

So what does an ASCII string look like?

In this blog post I’m not actually going to implement an ASCII string at all (well, not much). I’m merely pointing out what the data structures would look like. However, it’s worth working out what desirable qualities it should have. As far as possible, it should feel like System.String. In particular:

  • It should be immutable.
  • It should have fast access to individual characters, and the length.
  • It should mostly “feel” like an immutable reference type, in that passing a value of type AsciiString around should be cheap, like copying a reference.
  • It should use as little memory as possible… less than the equivalent string, or it’s pointless.
    • One caveat to this: in theory that could mean storing 8 characters in every 7 bytes, as ASCII really only uses 7 bits per character. I’m not going to those extremes, but you can think about them if you want.

We’re going to store the characters as a byte array. We have three options as to exactly how we handle that byte array:

  • We could go the Java way, where several strings share references to the same array. Each string then has an offset and a length to say which bit of the array they’re interested in.
    • Pros: Substring becomes really cheap
    • Cons: You can end up having just a tiny substring responsible for keeping a huge character array alive
  • We could go the .NET way, where each string has its own character data, but the buffer may be longer than necessary… so it stores the length too. (A bit like a List<T>.)
    • Pros: Can potentially make building strings cheap, if you just keep whatever potentially oversized buffer you’ve already got.
    • Cons: Wasted space for the unused part of the array, and a field for the length.
  • We could just have a byte array of exactly the right size – and it already knows its size.

I’m going to assume the third option here. So all the data our type needs is a byte array. That’s going to be pretty cheap… we hope. Let’s look at what we can build.

Option 1: A simple class

To give a flavour of the implementation, I’ve decided to implement four members for each option:

  • A way of creating an AsciiString from a regular string
  • The Substring overload with both a start and length
  • The Length property
  • The indexer returning a char

Hopefully that will give enough of an idea of what’s going on to be useful. Note that these aren’t production-quality implementations at all… none of the code has ever been run at all. I have made sure it compiles, so just be grateful for that :)

using System;
using System.Text;

public sealed class AsciiString
{
    private readonly byte[] data;

    public AsciiString(string text)
    {
        // TODO: Rather more data validation etc!
        data = Encoding.ASCII.GetBytes(text);
    }

    private AsciiString(byte[] data)
    {
        // This constructor is private: we can trust that it’s been called
        // by code which isn’t going to modify the contents of the array
        // afterwards.
        this.data = data;
    }

    public AsciiString Substring(int startIndex, int length)
    {
        if (startIndex < 0 || startIndex > data.Length)
        {
            throw new ArgumentOutOfRangeException(“startIndex”);
        }
        if (startIndex + length > data.Length)
        {
            throw new ArgumentOutOfRangeException(“length”);
        }
        byte[] newData = new byte[length];
        Buffer.BlockCopy(data, startIndex, newData, 0, length);
        return new AsciiString(newData);
    }

    public int Length { get { return data.Length; } }

    public char this[int position] { get { return (char) data[position]; } }
    // etc…
}

Hopefully this is pretty straightforward – it’s meant to be the most “obvious” solution. Note that we’ve not got the nice locality of reference which the real String class has – it’s possible that the an AsciiString could end up with its backing array a long way away in memory, so a indexer operation for a single character could end up with three cache misses – one for the AsciiString object, one for part of the data array storing the length (for argument validation) and one for the part of the data array containing the character we’re looking for. That may be unlikely, and it’s not the kind of thing I normally think about – but it’s probably the kind of thing the BCL team pay a lot of attention to.

We get the same “immutable reference type” behaviour present in the normal string type, however – you can have a null AsciiString reference just as normal, any assignments will just be reference assignments, etc.

What about the size though? There are two objects to consider:

  • The array, of size 12 + length or 24 + length (x86 and x64 respectively; rounded up to 4 or 8 bytes as well)
  • The object itself, of size 12 or 24

So we’ve got a total size of 24 + length or 48 + length, depending on architecture. To show how the break-even point works, here’s a little table showing the sizes of string and AsciiString for various sizes on both architectures:

Length string-x86 string-x64 AsciiString-x86 AsciiString-x64
0 16 32 24 48
1 16 32 28 56
2 20 32 28 56
3 20 32 28 56
4 24 40 28 56
5 24 40 32 56
6 28 40 32 56
7 28 40 32 56
8 32 48 32 56
9 32 48 36 64
10 36 48 36 64
..
16 48 64 40 64
24 64 80 48 72
32 80 96 56 80

As you can see, the break-even point in x86 is at length 10; in x64 it’s at length 16. After that, we start winning – as we’d expect. The penalty for very small strings is quite hefty though – you’d really better hope you didn’t have lots of single-character strings, taking 56 bytes each in x64.

Let’s see if we can do better…

Option 2: A simple struct

A lot of the overhead here has come from the fact that we’ve got an object which only has a single field. The field is all we’re interested in… why are we bothering with all the overhead of the object? Let’s make it a struct instead, effectively inlining that field wherever we use the type. Assignment, passing arguments to methods etc will still only be copying a reference – it’s just the reference will be the byte array rather than a wrapper object.

It all sounds good, but there are two snags:

  • The value can never be null; that at least diverges from the familiar string behaviour
  • We won’t be able to prevent code from creating an instance of our struct with new AsciiString() – and that won’t be good.

We can actually pit these two downsides against each other by making the “default” value a pseudo-null value… we can even throw NullReferenceException just as if it were a reference type. We don’t even need to do any work in order to get that NullReferenceException – every member is going to use the data array anyway, and dereferencing that will automatically throw an exception. We might want to change things around a bit to make that the very first thing that can throw an exception, but that’s all.

It’s nasty, but it appeals very slightly. In an evil kind of way. It makes things slightly more familiar, but at the cost of being generally weird in other ways.

We still need to be able to check whether an AsciiString value is the logical null value. I’ll add an IsNull property for that purpose. (An alternative would be HasValue, but that would be confusing with Nullable<T>.)

Most of the code remains untouched – it looks like this:

public struct AsciiString
{
    private readonly byte[] data;
    public bool IsNull { get { return data == null; } }

    // Remainder of code as before
}

Now let’s look at the sizes, which should be a lot more favourable than before. Note that I had to change the size-checking code to create an array of type AsciiStruct[] instead of object[] to avoid boxing. Should we take the size of the array itself into consideration when computing the size of the AsciiString? We haven’t when working out the size of string… in each case the size of any individual element will be the size of a reference. For the table below, I haven’t included it… but bear in mind that this form of measurement would count the size of most value types (int etc) as 0. It just goes to show that when you talk about the size of a data type, you really need to be very precise in what you mean.

Length string-x86 string-x64 AsciiString-x86 AsciiString-x64
0 16 32 12 24
1 16 32 16 32
2 20 32 16 32
3 20 32 16 32
4 24 40 16 32
5 24 40 20 32
6 28 40 20 32
7 28 40 20 32
8 32 48 20 32
9 32 48 24 40
10 36 48 24 40
..
16 48 64 28 40
24 64 80 36 48
32 80 96 44 56

This time, unsurprisingly, AsciiString is always more space-efficient than the normal string. It just takes a certain amount of holding our noses. Speaking of which…

Option 3: Extension methods on byte[]

Suppose we really, really want to have “proper” null references. We don’t really need the struct. We could treat any byte array as an array of ASCII characters, with extension methods like this:

public static class ByteArrayExtensions
{
    public static byte[] Substring(this byte[] data, int startIndex, int length)
    {
        if (startIndex < 0 || startIndex > data.Length)
        {
            throw new ArgumentOutOfRangeException(“startIndex”);
        }
        if (startIndex + length > data.Length)
        {
            throw new ArgumentOutOfRangeException(“length”);
        }
        byte[] newData = new byte[length];
        Buffer.BlockCopy(data, startIndex, newData, 0, length);
        return newData;
    }
}

The size is the same as with option 2 – in both cases there’s just the byte array, basically. This option is truly horrible in many ways though:

  • You can no longer tell in your code (just through typing) what’s meant to be an AsciiString and what isn’t
  • Kiss immutability goodbye
  • We can’t guarantee that all the characters will be valid ASCII any more
  • We can’t add extension properties or extension indexers
  • We can’t make it implement the interfaces we want it to

Obviously, we’d never take this route. I just thought I’d include it for a regular dose of evil-ness.

Conclusion

Implementing an ASCII-only string representation sounds like it should be an obvious win in terms of memory, at the cost of doing a lot of work that’s already been done for us in String. However, the most obvious implementation takes a while to break even in memory usage, compared with the normal string type, due to the “special” nature of string within the CLR. We can’t mimic the “stretchiness” of string ourselves. The BCL/CLR teams could, of course, if they really wanted to. I’m not holding my breath though.

If we’re dead keen on saving space at the cost of some design wonkiness, we can use a value type instead of a reference type. Other than nullity, it works pretty well… but you have all the disadvantages which go with value types, such as the unavoidable parameterless constructor and the need to watch out for boxing.

Aside from anything else, I hope this was useful as a delve into how much space objects actually take up in .NET – and as a way of highlighting the extra memory used when running in the x64 CLR.